Author: Kai Voigt <kai@roxen.de>
Last modified: 2000-11-15 15:35:00
The World Wide Web is a perfect platform for distributed applications. All a user needs is a webbrowser for input and output; everything else is managed on the webserver. This article is about that "everything else". The author wrote a module for the Roxen Webserver that adds a distinct set of variables for each browser session. The article starts with a look at the current technical background and the problems that arise when developing webbased applications. It then presents solutions to those problems and explains how sessions and session variables work in the Roxen module. Finally, some examples show how to make use of session variables.
As said before, we want to use the World Wide Web as an environment for webbased applications. The concept behind such applications is a client-server model. The client is the webbrowser, which acts as an intelligent terminal that displays the application's output and sends input data back to the server. We will of course use HTML for the display data and HTTP for data transport.
The webserver is the application server. It runs the code, receives input from the client, computes the required output and sends it back to the client. This model is similar to the X Window System on UNIX systems. The server needs to distinguish between different instances of the same application and keep track of the state of each of those sessions. These are the core tasks of the session module presented in this article.
The client-server model offers some advantages.
- One code base: Since the application itself runs only on the webserver, you don't have to distribute the software to every possible user. This makes maintaining your software easy, as you can make changes quickly in one place.
- Global access: By using HTML and HTTP as open standards, users can access the application from any webbrowser that is connected to the network. You don't have to rely on a specific operating system or hardware. Even a WAP phone or a PDA will suffice.
While PHP4, Microsoft's ASP and Apple's WebObjects already support sessions, the Roxen Webserver still ships without session features. That's why the author created such a module for the Roxen Webserver as his diploma thesis.
The core of the module consists of three important tasks. The first is the identification of different browsers, which results in two subtasks: a session identifier with some required qualities has to be created, and it then has to be transported to the webbrowser. The webbrowser will send this identifier with every HTTP request so that the server can identify the session.
The biggest task is to handle the session variables of each webbrowser across individual HTTP requests. An application consists of the code itself, and each instance of the application (a session) has a unique state that needs to be stored on the server side. This state is placed into session variables that have to be kept for the duration of the session.
While a UNIX based application has the exit() system call to end a process, there is no termination in HTTP. To prevent session variables that are no longer used from consuming storage space, we need to delete old session variables regularly. There are several ways to decide which variables are deleted at what time.
Each webbrowser needs to be identified by the webserver so that it gets a separate session of the application. The session identifier needs to be both unique and secure. In UNIX, a process identifier is simply a counter starting at 1. While this is a unique identifier, it is not secure, because it is guessable: a malicious user could send a session identifier that belongs to some other session and hijack that entire session and its content.
The solution is to combine a configurable passphrase that is known only to the server administrator, a counter and a timestamp, and to feed this data into the MD5 hash algorithm. The output is a 128-bit value with the required properties: it is hard to guess and, for practical purposes, unique.
The following function computes the session identifier.
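    string sessionid_create() {
      object md5 = Crypto.md5();
      // Mix in the administrator's passphrase, a counter and the current time.
      md5->update(query("secret"));
      md5->update(sprintf("%d", roxen->increase_id()));
      md5->update(sprintf("%d", time(1)));
      // Return the 128-bit digest as 32 hexadecimal characters.
      return(Crypto.string_to_hex(md5->digest()));
    }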
An md5 value looks like this in its hexadecimal notation:
e389d67a67e6c344b278d8eefc56ed7c
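For readers who do not have a Roxen installation at hand, the same idea can be sketched outside of Pike. The following Python version is only an illustration: SECRET and the module-level counter are hypothetical stand-ins for the module's "secret" setting and for roxen->increase_id(), and they are not part of the original module.

```python
import hashlib
import itertools
import time

# Assumption: stands in for the passphrase configured by the administrator.
SECRET = "change-me"
# Assumption: stands in for Roxen's process-wide counter.
_counter = itertools.count(1)


def sessionid_create(secret: str = SECRET) -> str:
    """Hash passphrase, counter and timestamp into 32 hexadecimal characters."""
    md5 = hashlib.md5()
    md5.update(secret.encode("utf-8"))
    md5.update(str(next(_counter)).encode("ascii"))
    md5.update(str(int(time.time())).encode("ascii"))
    return md5.hexdigest()


if __name__ == "__main__":
    print(sessionid_create())
```

Because the counter changes with every call, two identifiers created within the same second still differ.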
The next problem is to transport the generated session identifier to the browser and to have the browser transmit it with each request. The most convenient way is to store a cookie in the browser's memory. The cookie travels in the HTTP headers: the webserver sets it in a response, and the webbrowser returns it with every subsequent request. This way, the webserver is able to identify the webbrowser with each request.
Another way is to encode the session identifier into the URL of the website, either into the request path, or into the hostname. The URLs would look like this:
    http://www.myapplication.com/(SessionID=deadbeef)/
    http://deadbeef.myapplication.com/
The hostname alternative is a patented algorithm and thus not present in the session module. Encoding the session identifier as a Roxen prestate is implemented as an alternative to cookies.
The algorithm of the module is simple. For each request, the webserver checks whether the webbrowser sent a session identifier, either as a cookie or in the URL. If there is no identifier, a new one is generated with the method described above. A cookie with this new identifier is then sent to the webbrowser, together with a redirect to a URL containing the session identifier as a prestate. If the webbrowser doesn't support cookies, it can still send the identifier in the URL.
For cosmetic reasons, the prestate is removed with another redirect if the session identifier arrives in both places, cookie and prestate.
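The decision taken for each request can be summarised as a small pure function. This is a Python sketch of the steps described above, not the module's Pike code; the names Decision and handle_request and the redirect labels are made up for the illustration, and preferring the cookie when both values are present is an assumption.

```python
from typing import Callable, NamedTuple, Optional


class Decision(NamedTuple):
    session_id: str
    set_cookie: bool         # send a cookie carrying the identifier
    redirect: Optional[str]  # "add-prestate", "strip-prestate" or None


def handle_request(cookie_id: Optional[str],
                   prestate_id: Optional[str],
                   new_id: Callable[[], str]) -> Decision:
    """Decide how to answer a request, given where the identifier was found."""
    if cookie_id is None and prestate_id is None:
        # Unknown browser: create an identifier, try a cookie and
        # redirect to a URL that carries the identifier as a prestate.
        return Decision(new_id(), True, "add-prestate")
    if cookie_id is not None and prestate_id is not None:
        # Cookies work, so the prestate is only clutter: redirect it away.
        # Assumption: the cookie value wins if the two ever differ.
        return Decision(cookie_id, False, "strip-prestate")
    # Exactly one transport is in use: serve the request normally.
    sid = cookie_id if cookie_id is not None else prestate_id
    return Decision(sid, False, None)
```

A browser that accepts cookies therefore sees two redirects on its first visit and a clean URL afterwards, while a browser without cookies keeps the prestate in every URL.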
Now it's possible to identify each webbrowser with each request. The next step is to provide variables that we can fill with content and whose values remain available during subsequent requests. The module uses the mapping id->misc->session_variables to let the document access the session variables.
It should be possible to store any kind of data in the mapping id->misc->session_variables, such as integers, floats, strings, arrays and mappings. More difficult data types, such as objects, programs, and file or socket descriptors, are not covered by this module.
Pike offers builtin functions to encode variables into a string. Since this string is not plain ASCII but an 8-bit string, base64 encoding is used to obtain an ASCII string. With the following code, we can encode id->misc->session_variables into a plain string that we can store after the request has been fully processed.
    string tmp = encode_value(id->misc->session_variables);
    string serialized_string = MIME.encode_base64(tmp);
Of course, the opposite method is applied at the start of each request. The encoded ASCII string is decoded into the session variables.
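The round trip in both directions can be illustrated as follows. This is a Python sketch rather than Pike: pickle is used here as a rough analogue of Pike's binary encoding, and the function names serialize and deserialize are chosen for the example only.

```python
import base64
import pickle


def serialize(session_variables: dict) -> str:
    """Encode the variables into bytes, then into a plain ASCII string."""
    raw = pickle.dumps(session_variables)
    return base64.b64encode(raw).decode("ascii")


def deserialize(serialized_string: str) -> dict:
    """Reverse of serialize(); only use on strings the server stored itself."""
    raw = base64.b64decode(serialized_string)
    return pickle.loads(raw)


if __name__ == "__main__":
    stored = serialize({"counter": 3, "cart": ["Roxen Platform"]})
    print(stored)
    print(deserialize(stored))
```

The encoded string never leaves the server, which matters here because decoding untrusted pickle data is unsafe.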
Now, the plain string containing the encoded session variables for a single request needs to be stored somewhere. And it has to be restored when the same webbrowser sends the next request. There are some ways to store the session variables on the server.
A simple solution is to keep all session variables in memory. This is a fast and portable method, and encoding the variables into a plain ASCII string is not even required. But memory has the big disadvantage that all variables are lost when the server is restarted, whether due to a crash or for administrative reasons. Memory is also limited in size.
To gain persistent storage, a database can be used. It is still fast enough and makes load balancing possible, since multiple servers running the application can access the same storage of session variables. It can also store huge amounts of data.
Another storage method is to use simple files. This is not very fast, but it is portable and persistent. Which method to choose depends on the requirements of the application. Usually, the database method will be the best compromise.
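One way to keep the choice of storage open is to hide it behind a small common interface. The following Python sketch shows a memory backend and a database backend with the same two methods; the class names, the table layout and the use of SQLite are assumptions made for the illustration and do not describe the module's actual storage code.

```python
import sqlite3
from typing import Optional


class MemoryStore:
    """Fast, but everything is lost when the server restarts."""

    def __init__(self) -> None:
        self._data = {}

    def save(self, session_id: str, serialized: str) -> None:
        self._data[session_id] = serialized

    def load(self, session_id: str) -> Optional[str]:
        return self._data.get(session_id)


class SqliteStore:
    """Persistent when given a file path instead of ':memory:'."""

    def __init__(self, path: str = ":memory:") -> None:
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS sessions "
            "(id TEXT PRIMARY KEY, vars TEXT NOT NULL)"
        )

    def save(self, session_id: str, serialized: str) -> None:
        # The connection context manager commits the transaction.
        with self._db:
            self._db.execute(
                "INSERT OR REPLACE INTO sessions (id, vars) VALUES (?, ?)",
                (session_id, serialized),
            )

    def load(self, session_id: str) -> Optional[str]:
        row = self._db.execute(
            "SELECT vars FROM sessions WHERE id = ?", (session_id,)
        ).fetchone()
        return row[0] if row else None
```

The rest of the application only calls save() and load(), so switching from memory to a database is a configuration decision rather than a code change.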
Each webbrowser creates a distinct set of session variables on the webserver. Eventually, the webbrowser will stop sending requests. Since HTTP is stateless, there is nothing like the exit() system call in UNIX. That's why the webserver needs to clean up the storage regularly and delete unused session variables. Otherwise, the storage will keep growing until it reaches its physical limit.
The point in time at which to run a garbage collection can be determined in different ways. A simple approach is to run it every 10, 100 or 1000 requests. A more sophisticated idea is to start a garbage collection depending on the server load and the size of the stored session variables. The current version of the module only supports the simple method.
The next question is what to delete. One option is to delete every session that has not been active during the past 10 or 20 minutes. Another is to delete the oldest sessions until the remaining, most recent ones consume no more than 10 or 20 MB of storage. Again, the current version of the module implements the first method.
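The combination the module is described as using, a collection every N requests that removes sessions idle for too long, might look like this. It is a Python sketch under assumed names (SessionTable, gc_every, max_idle) and keeps everything in memory for brevity.

```python
import time
from typing import Optional


class SessionTable:
    def __init__(self, gc_every: int = 100, max_idle: float = 20 * 60) -> None:
        self.gc_every = gc_every   # run a collection every N requests
        self.max_idle = max_idle   # seconds a session may stay unused
        self._requests = 0
        self._sessions = {}        # session id -> (last access, variables)

    def touch(self, session_id: str, now: Optional[float] = None) -> dict:
        """Record a request for this session and return its variables."""
        now = time.time() if now is None else now
        _, variables = self._sessions.get(session_id, (now, {}))
        self._sessions[session_id] = (now, variables)
        self._requests += 1
        if self._requests % self.gc_every == 0:
            self.collect(now)
        return variables

    def collect(self, now: float) -> int:
        """Delete all sessions idle longer than max_idle; return the count."""
        cutoff = now - self.max_idle
        stale = [sid for sid, (last, _) in self._sessions.items() if last < cutoff]
        for sid in stale:
            del self._sessions[sid]
        return len(stale)
```

The optional now argument exists only so that the behaviour can be tried out without waiting for real time to pass.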
The first example is a simple counter. This is good enough to check if the module is working correctly. For each request, id->misc->session_variables->counter should be incremented by 1. This can be done with the following piece of code.
Counter = <pike> id->misc->session_variables->counter++; </pike>
With each request to a document containing these lines, the counter is incremented. If you are running Roxen 2.0 or above, you can use the new RXML tags to get the same result. The functionality is the same, but the code is easier to read and understand.
    <inc variable="session.counter" />
    Counter = &session.counter;
Another more complex example is to create a simple shopping cart. Adding an item to a virtual cart is done by adding an element to a session variable, like the following lines will do.
id->misc->session_variables->cart += (["quantity":5, "product": "Roxen Platform"]);
The cart mapping will contain all the items that are stored and it's easy to create some smart RXML tags to work on the content of the shopping cart.
    <cart_add item="Roxen Platform" quantity="5" />
    <cart_edit />
    <cart_init />
    <cart_dump />
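What such cart operations might do to the session variables can be sketched in a few lines. This Python version borrows the tag names as function names, but the behaviour shown (a list of entries, quantities of the same product being merged, a plain text dump) is an assumption for illustration and not the documented behaviour of the tags.

```python
def cart_init(session_variables: dict) -> None:
    """Start with an empty cart."""
    session_variables["cart"] = []


def cart_add(session_variables: dict, item: str, quantity: int = 1) -> None:
    """Add an item; an existing entry for the same product is increased."""
    cart = session_variables.setdefault("cart", [])
    for entry in cart:
        if entry["product"] == item:
            entry["quantity"] += quantity
            return
    cart.append({"product": item, "quantity": quantity})


def cart_dump(session_variables: dict) -> str:
    """Render the cart as one line of text per entry."""
    return "\n".join(
        f'{entry["quantity"]} x {entry["product"]}'
        for entry in session_variables.get("cart", [])
    )
```

Since the cart lives inside the session variables, it is stored and restored together with them on every request.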
A session is meant to be something temporary, much like a UNIX process. You start a session and eventually end it. But sometimes you want to store data for a longer period of time and access it again in any new session. This data is usually bound to a specific user, like preferences. For example, a newspaper website could let a user configure its content, such as sport headlines and the latest cartoons.
To do this, a user has to be attached to a running session, and this needs to be protected by some form of authentication. The session module can use any existing user authentication module, backed for example by an SQL database, the system's password file or an LDAP server. If the user needs to log in to a protected document, a form based authentication takes place. After successful authentication, the username is attached to the session and the document can use the user variables from previous sessions.
Because this does not rely on HTTP authentication, where the browser keeps resending the credentials until it is closed, it's even possible to do a clean logout.
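Attaching a user to a session and detaching it again can be sketched as follows. This is a Python illustration under stated assumptions: UserDirectory is a hypothetical stand-in for whatever authentication backend is configured, and the "user" key in the session variables is chosen for the example.

```python
import hashlib
import hmac
import os
from typing import Optional


def hash_password(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


class UserDirectory:
    """Hypothetical stand-in for an SQL, password file or LDAP backend."""

    def __init__(self) -> None:
        self._users = {}       # username -> (salt, password hash)
        self.preferences = {}  # username -> long-lived user variables

    def add(self, username: str, password: str) -> None:
        salt = os.urandom(16)
        self._users[username] = (salt, hash_password(password, salt))
        self.preferences.setdefault(username, {})

    def check(self, username: str, password: str) -> bool:
        record = self._users.get(username)
        if record is None:
            return False
        salt, expected = record
        return hmac.compare_digest(expected, hash_password(password, salt))


def login(session_variables: dict, users: UserDirectory,
          username: str, password: str) -> bool:
    """Form based login: attach the username to the session on success."""
    if not users.check(username, password):
        return False
    session_variables["user"] = username
    return True


def logout(session_variables: dict) -> None:
    """Clean logout: the session simply forgets the user."""
    session_variables.pop("user", None)


def user_variables(session_variables: dict, users: UserDirectory) -> Optional[dict]:
    """Return the persistent variables of the logged-in user, if any."""
    username = session_variables.get("user")
    return users.preferences[username] if username else None
```

The user variables outlive the session because they are keyed by username rather than by session identifier.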
Documentation, the module code, examples and the module in action can be found at http://session.123.org/.
Further documentation about the Roxen Webserver is available at http://roxen.com/.