Features of the Resource-Oriented Architecture - Statelessness (Page 2 of 4 )
Addressability is one of the four main features of the ROA. The second is statelessness. I’ll give you two definitions of statelessness: a somewhat general definition and a more practical definition geared toward the ROA.
Statelessness means that every HTTP request happens in complete isolation. When the client makes an HTTP request, it includes all information neccessary for the server to fulfill that request. The server never relies on information from previous requests. If that information was important, the client would have sent it again in this request.
More practically, consider statelessness in terms of addressability. Addressability says that every interesting piece of information the server can provide should be exposed as a resource, and given its own URI. Statelessness says that the possible states of the server are also resources, and should be given their own URIs. The client should not have to coax the server into a certain state to make it receptive to a certain request.
On the human web, you often run into situations where your browser’s back button doesn’t work correctly, and you can’t go back and forth in your browser history. Sometimes this is because you performed an irrevocable action, like posting a weblog entry or buying a book, but often it’s because you’re at a web site that violates the principle of statelessness. Such a site expects you to make requests in a certain order: A, B, then C. It gets confused when you make request B a second time instead of moving on to request C.
Let’s take the search example again. A search engine is a web service with an infinite number of possible states: at least one for every string you might search for. Each state has its own URI. You can ask the service for a directory of resources about mice: http:// www.google.com/search?q=mice. You can ask for a directory of resources about jellyfish: http://www.google.com/search?q=jellyfish. If you’re not comfortable creating a URI from scratch, you can ask the service for a form to fill out: http://www.google.com/.
When you ask for a directory of resources about mice or jellyfish, you don’t get the whole directory. You get a single page of the directory: a list of the 10 or so items the search engine considers the best matches for your query. To get more of the directory you must make more HTTP requests. The second and subsequent pages are distinct states of the application, and they need to have their own URIs: something like http:// www.google.com/search?q=jellyfish&start=10. As with any addressable resource, you can transmit that state of the application to someone else, cache it, or bookmark it and come back to it later.
This is a stateless application because every time the client makes a request, it ends up back where it started. Each request is totally disconnected from the others. The client can make requests for these resources any number of times, in any order. It can request page 2 of “mice” before requesting page 1 (or not request page 1 at all), and the server won’t care.
By way of contrast, Figure 4-2 shows the same states arranged statefully, with states leading sensibly into each other. Most desktop applications are designed this way.
That’s a lot better organized, and if HTTP were designed to allow stateful interaction, HTTP requests could be a lot simpler. When the client started a session with the search engine it could be automatically fed the search form. It wouldn’t have to send any request data at all, because the first response would be predetermined. If the client was looking at the first 10 entries in the mice directory and wanted to see entries 11–20, it
Figure 4-1.A stateless search engine
Figure 4-2. A stateful search engine
could just send a request that said “start=10”. It wouldn’t have to send /search?q=mice&start=10, repeating the intitial assertions: “I’m searching, and searching for mice in particular.”
FTP works this way: it has a notion of a “working directory” that stays constant over the course of a session unless you change it. You might log in to an FTP server, cd to a certain directory, and get a file from that directory. You can get another file from the same directory, without having to issue a second cd command. Why doesn’t HTTP support this?
State would make individual HTTP requests simpler, but it would make the HTTP protocol much more complicated. An FTP client is much more complicated than an HTTP client, precisely because the session state must be kept in sync between client and server. This is a complex task even over a reliable network, which the Internet is not.
To eliminate state from a protocol is to eliminate a lot of failure conditions. The server never has to worry about the client timing out, because no interaction lasts longer than a single request. The server never loses track of “where” each client is in the application, because the client sends all neccessary information with each request. The client never ends up performing an action in the wrong “working directory” due to the server keeping some state around without telling the client.
Statelessness also brings new features. It’s easier to distribute a stateless application across load-balanced servers. Since no two requests depend on each other, they can be handled by two different servers that never coordinate with each other. Scaling up is as simple as plugging more servers into the load balancer. A stateless application is also easy to cache: a piece of software can decide whether or not to cache the result of an HTTP request just by looking at that one request. There’s no nagging uncertainty that state from a previous request might affect the cacheability of this one.
The client benefits from statelessness as well. A client can process the “mice” directory up to page 50, bookmark /search?q=mice&start=50, and come back a week later without having to grind through dozens of predecessor states. A URI that works when you’re hours deep into an HTTP session will work the same way as the first URI sent in a new session.
To make your service addressabile you have to put in some work, dissect your application’s data into sets of resources. HTTP is an intrinsically stateless protocol, so when you write web services, you get statelessness by default. You have to do something to break it.
The most common way to break statelessness is to use your framework’s version of HTTP sessions. The first time a user visits your site, he gets a unique string that identifies his session with the site. The string may be kept in a cookie, or the site may propagate a unique string through all the URIs it serves a particular client. Here’s an session cookie being set by a Rails application:
The important thing is, that nonsensical hex or decimal number is not the state. It’s a key into a data structure on the server, and the data structure contains the state. There’s nothing unRESTful about stateful URIs: that’s how the server communicates possible next states to the client. However, there is something unRESTful about cookies, as I discuss in “The Trouble with Cookies.” To use a web browser analogy, cookies break a web service client’s back button.
Think of the query variable start=10 in a URI, embedded in an HTML page served by the Google search engine. That’s the server sending a possible next state to the client.
But those URIs need to contain the state, not just provide a key to state stored on the server. start=10 means something on its own, and PHPSESSID=27314962133 doesn’t. RESTfulness requires that the state stay on the client side, and be transmitted to the server for every request that needs it. The server can nudge the client toward new states, by sending stateful links for the client to follow, but it can’t keep any state of its own.