
The History of Variable Substitution

Author: Martin Nilsson <nilsson@roxen.com>
Last modified: 2000-12-12 4:34:46


It didn't take long before it became apparent that the World Wide Web could be so much more than static pages of information. By hiding some parts of the page, showing other parts and modifying yet other parts depending on who saw the page, the time of day and countless other factors, you would get an entirely new medium, never seen before... In this article I will take a look at the development of server-side dynamic web pages, with a special focus on page-accessible variables.

The Brute Force Approach

The first way to create dynamic content was of course to create a program that generates the page, given the parameters that the content should depend on, and send the page to the client. This is a natural way for any programmer to solve the problem because it takes a minimum of effort and yields the required result. It's also a very efficient solution, technically speaking, but given the average logic on a web page, especially at the time, that's not an issue.

A more problematic issue is the manageability of the content. If a web page is generated by program code, then you typically need a programmer to alter the code if you want to alter the page. Another problem is the virtually nonexistent reusability of components. If a program generates a complete, dynamic page, then you cannot use that program for any purpose other than generating that page. Both of these problems were addressed by SSI, Server Side Includes, which came with the NCSA server.


Server Side Includes

The basic idea with SSI is that if you can get the person who writes and edits web pages to learn HTML, then it should be possible to throw in a few more tags that insert the results of our programs. The page would then be a sort of template or frame around the dynamic content. Hence everything but the actual dynamic parts can be modified by a non-programmer, and every dynamic component can potentially be used for several pages.

SSI was designed to look like HTML comments. The reason for this might have been to make a compatible extension to HTML, but it could really have looked any way its designers liked, because it would be evaluated before the page was sent from the server. An SSI tag is built like this:



<!--#command parameter="value"-->

where the original commands included "include", "exec" and "echo".

The include directive simply loaded another file and inserted it where the SSI tag was located. This is useful for, e.g., page headers and footers or a site navigation menu. It could also be an automatically generated file, e.g. a log file or a file made by a cron job. You'll find the same functionality in a Roxen WebServer with the <insert file> tag.

The exec directive is the one that actually runs an external program and inserts the result into the page. The echo directive inserts the value of an environment variable. Although they all insert data into the page, there are several differences that I'd like to point out. Even though both exec and include trigger a disk operation, I'd like to group echo and include into "passive components" and exec into "active components". Further, I would like to group exec and include into "parametric components", but I realize that the difference between echo and include is very thin in this respect and needs a bit more justification. The include command has two different parameters to identify the file, called "file" and "virtual", which operate on different filesystems. One can further think of the path as a kind of parameter for the file, whereas in the case of echo, the variable is chosen from a flat structure. This might seem a bit far-fetched, but you'll see how these things matter in the next section.
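To make the directive syntax concrete, here is a minimal Python sketch of how the two passive directives, echo and include, might be expanded. It is not how the NCSA server did it: the function name expand_ssi and the use of plain dictionaries in place of the environment and the filesystem are assumptions made for illustration, and exec is deliberately left unhandled.

```python
import re

# Matches directives of the form <!--#command parameter="value"-->
SSI_TAG = re.compile(r'<!--#(\w+)\s+(\w+)="([^"]*)"\s*-->')

def expand_ssi(page, variables, files):
    """Expand echo and include directives; leave everything else untouched.

    `variables` stands in for the environment and `files` for the
    filesystem. Both are plain dicts, which is an assumption of this sketch.
    """
    def replace(match):
        command, parameter, value = match.groups()
        if command == "echo" and parameter == "var":
            return variables.get(value, "")
        if command == "include" and parameter in ("file", "virtual"):
            return files.get(value, "")
        return match.group(0)  # exec and unknown commands are not handled here
    return SSI_TAG.sub(replace, page)

page = '<!--#include file="header.html"--><p><!--#echo var="DOCUMENT_NAME"--></p>'
result = expand_ssi(page,
                    {"DOCUMENT_NAME": "index.html"},
                    {"header.html": "<h1>Welcome</h1>"})
print(result)
```

Note how echo looks its value up in a flat table by name alone, while include needs both a parameter name and a path.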

The Next Section

Although SSI proved to be a good solution for the moment, it was nowhere near a perfect solution. Integration between static and dynamic content was still a problem; it had just shrunk from static vs. dynamic pages to static vs. dynamic blocks. What was initially a strength of SSI, its similarity to HTML, had also become a problem. Let's say we had a program that generated a link to an image. In order to use it in an <img> tag we would have to do something like this:



<img src='<!--#exec cmd="generate_url.pike"-->'>

which more or less defeats the purpose of having an HTML syntax for the language, since the correct way to write it would have been


<img src='&lt;!--#exec cmd="generate_url.pike"--&gt;'>

One solution that Roxen implemented addresses both of these issues, although the second one wasn't really intentional. The basic idea is to identify the value you want to insert in a non-parametric way, i.e. only by name, as with the echo command in SSI. Then, if the parser only has to find the name, we don't need as much framework around the identifier, only some special character signature that doesn't otherwise appear in the page.



<form action="<!--#echo var="DOCUMENT_NAME"-->">

could be

<form action="#DOCUMENT_NAME#">

This was used in the old Roxen Challenger output tags and is still used in Cold Fusion tags.
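As a rough illustration of signature replacement, the following Python sketch replaces every name wrapped in a signature character with a value looked up by name only. The function name substitute and its quote parameter are hypothetical, and the choice to leave unknown names in place is an assumption of the sketch rather than documented Roxen behaviour.

```python
import re

def substitute(template, variables, quote="#"):
    """Replace quote + NAME + quote with the value of NAME, looked up by name only."""
    q = re.escape(quote)
    pattern = re.compile(q + r"(\w+)" + q)
    # Unknown names are left in place so that they stay visible in the output.
    return pattern.sub(lambda m: variables.get(m.group(1), m.group(0)), template)

form = substitute('<form action="#DOCUMENT_NAME#">', {"DOCUMENT_NAME": "index.html"})
print(form)
```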

Now, how does this increase the interaction between static and dynamic content, which I said was the real reason for using this method? Consider a component that queries a database and returns the result, which you want to display in a table. In the SSI model you would have to do the formatting in the database component, unless you know how many rows the database query will return and create a table with one SSI tag for each cell.



<table>
<tr><td><!--#exec cgi="/cgi-bin/db.pike?row=1&col=1"--></td>
    <td><!--#exec cgi="/cgi-bin/db.pike?row=1&col=2"--></td></tr>
<tr><td><!--#exec cgi="/cgi-bin/db.pike?row=2&col=1"--></td>
    <td><!--#exec cgi="/cgi-bin/db.pike?row=2&col=2"--></td></tr>
</table>

Note that the actual query is not sent to the database component in this example, which of course is something that you would also want to do in order to keep the component truly general. With the signature replacement method the same example might look like this:


<table>
<sqloutput query="SELECT name,phone FROM user">
  <tr><td>#name#</td><td>#phone#</td></tr>
</sqloutput>
</table>

With this solution we both achieve sufficient integration between static and dynamic content and solve the doesn't-look-like-valid-HTML issue. This is of course a subjective conclusion, but several independent solutions have taken this approach to content integration. Do note both that this solution sends the query to the sqloutput function, which makes it a more general component, and that the answer can have any number of rows.
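The row-repeating behaviour of such an output tag can be sketched in a few lines of Python. The names render_rows and rows are made up for this example, and a list of dictionaries stands in for the result of the database query, so no real database is involved.

```python
import re

def render_rows(body, rows, quote="#"):
    """Repeat `body` once per row, replacing quote + column + quote in each copy."""
    q = re.escape(quote)
    pattern = re.compile(q + r"(\w+)" + q)
    rendered = []
    for row in rows:
        # Bind `row` as a default argument so each pass uses its own row.
        rendered.append(
            pattern.sub(lambda m, row=row: str(row.get(m.group(1), m.group(0))), body))
    return "".join(rendered)

# Stand-in for the result of "SELECT name,phone FROM user" (invented data).
rows = [{"name": "Ann", "phone": "555-0100"},
        {"name": "Bo", "phone": "555-0101"}]
body = "<tr><td>#name#</td><td>#phone#</td></tr>\n"
table = "<table>\n" + render_rows(body, rows) + "</table>"
print(table)
```

The template author controls all of the markup, and the number of rows does not have to be known in advance.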

New World - New Problems

With the new flexibility we can now begin to combine several components, or several instances of the same component, and do some sort of high-level programming in the web pages. The following example illustrates a problem inherent in the signature replacement method: you must come up with unique signatures that can be replaced. This also means that code that uses signature replacement isn't context free. You must ensure that your code isn't surrounded by other signature-replacement-dependent code that uses your signature. In effect you cannot build high-level components.



<table>
<sqloutput query="SELECT name FROM customer" quote="#">
  <tr><td>#name#</td><td>
    <sqloutput query="SELECT date FROM order WHERE customer='#name#'" quote="§">
    §date§<br>
    </sqloutput>
  </td></tr>
</sqloutput>
</table>

Let me give you an even worse example that illustrates an issue that we haven't raised at all so far. In the following example, the output of one function is used as input to the next function. Note the §#Field#§ construction and make sure you understand how it works. Now make sure that you understand why it will fail if a Field, or column, in the database is called anything with an "§" in it. The problem from the parser's point of view is that it may be impossible to determine the intention of the programmer's code, because one could easily create a situation where we want a newly inserted signature to be replaced by an outer function.



<table>
<sqloutput query="DESCRIBE the_table" quote="#">
  <tr>
  <sqloutput query="SELECT #Field# FROM the_table WHERE id=1" quote="§">
    <td>§#Field#§</td>
  </sqloutput>
  </tr>
</sqloutput>
</table>
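The failure mode can be reproduced with a small Python sketch of two nested replacement passes. The helper names substitute and render_cell are hypothetical, and dictionaries stand in for the query results.

```python
import re

def substitute(template, variables, quote):
    """Replace quote + NAME + quote with the value of NAME."""
    q = re.escape(quote)
    pattern = re.compile(q + r"(\w+)" + q)
    return pattern.sub(lambda m: variables.get(m.group(1), m.group(0)), template)

def render_cell(column, row):
    # Outer pass: put the column name between the inner signature characters.
    inner_template = substitute("<td>§#Field#§</td>", {"Field": column}, "#")
    # Inner pass: replace §column§ with the value from the selected row.
    return substitute(inner_template, row, "§")

good = render_cell("price", {"price": "42"})
bad = render_cell("a§b", {"a§b": "7"})
print(good)
print(bad)
```

With an ordinary column name the cell is filled in as intended. When the column name itself contains "§", the inner pass can no longer tell where the name starts and ends, so the value is never inserted.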

This might look like a problem only for code that uses signature replacement, but the general issue, quoting forbidden characters, is something that must be dealt with in all methods described so far, although until now the problem has been fairly trivial. With completely generated pages the problem is no different from normal HTML quoting, except that the programmer might need to quote characters reserved in that programming language. With SSI the problem is a bit worse, since it can output the data in different contexts. Let's say that our program outputs "< x". If the text is output where normal text should be, it should be inserted as "&lt; x", but if we output it in the src attribute of an <img> tag it should be inserted as "%3c%20x". Half of the problem, encoding the data for the environment it is inserted into, was solved in Roxen by making it possible to add parameters to the signature. Examples of such signatures are #name:quote=mysql# and #link:quote=url#.
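A minimal Python sketch of a signature with a quote parameter might look as follows. The names ENCODERS and substitute_quoted are assumptions, only the html and url encodings are implemented (mysql is left out), and signatures without a parameter are inserted unencoded in this sketch. Python's urllib writes the hexadecimal digits in upper case, which is equivalent to the lower-case form.

```python
import html
import re
from urllib.parse import quote

# Encoders selectable with #name:quote=...# (only two are sketched here).
ENCODERS = {
    "html": html.escape,
    "url": lambda value: quote(value, safe=""),
}

SIGNATURE = re.compile(r"#(\w+)(?::quote=(\w+))?#")

def substitute_quoted(template, variables):
    def replace(match):
        name, encoding = match.groups()
        if name not in variables:
            return match.group(0)
        value = variables[name]
        if encoding is None:
            return value              # no parameter: insert the raw value
        encoder = ENCODERS.get(encoding)
        if encoder is None:
            return match.group(0)     # unknown encoding: leave the signature alone
        return encoder(value)
    return SIGNATURE.sub(replace, template)

page = substitute_quoted(
    '<p>#v:quote=html#</p><img src="/find?q=#v:quote=url#">', {"v": "< x"})
print(page)
```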

Two dot Zero

One of the issues in the initial specification for Roxen Challenger 1.4 was to "end the quoting from hell" and to prevent users from making so many fatal mistakes when using variables. To do this we needed a new RXML parser, and this came to be the reason why Roxen Challenger 1.4 took at least four months longer to develop than planned and ended up as version 2.0 instead. Even now, one and a half years after work on 1.4 began, not all of the features in the new parser are fully deployed.

Let's get down to business about what we did, and why. To begin with, signature-replacement-based variable insertion had already proven to be fairly successful at integrating static and dynamic content. The drawback was that when doing nested insertions one had to come up with new signatures, and the signatures one came up with could be present in the data. Each bucket of variables was named a scope, which was a concept already present in Challenger 1.3, although hardly used at the time. An initial thought was that one could number the scopes in creation order, e.g. #1:name# would fetch the value of the variable "name" from the first scope, but with that solution you would still have context-dependent code.

Finally, the choice for the variable references came to be something similar to HTML entities, because that syntax was already well integrated in HTML/XML and in software that handles HTML/XML. To indicate which scope to operate in, two methods were created. One is to call the scopes by name, which makes it possible to write code with absolute references to different scopes and thereby make it relocatable. The other is a special-case method: to operate in the present scope you use the name "_", which makes it possible to write code that is relocatable into any scope.



&scope.variable;

It is also possible to quote the dot in the entity by doubling it, so that entities and scopes can have "." in their names, e.g. &form.submit..x; for submit buttons that use an image.
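The reference syntax, including the doubled dot and the "_" scope, can be illustrated with a short Python sketch. It is not the RXML parser: the names split_reference and expand_entities are invented, scopes are plain dictionaries, and references that do not resolve (such as ordinary HTML entities) are simply left alone.

```python
import re

ENTITY = re.compile(r"&([\w.]+);")

def split_reference(reference):
    """Split on single dots; a doubled dot stands for a literal dot."""
    marker = "\0"  # placeholder that cannot occur in a matched reference
    return [part.replace(marker, ".")
            for part in reference.replace("..", marker).split(".")]

def expand_entities(text, scopes, current=None):
    """Expand &scope.variable; where "_" refers to the `current` scope."""
    def replace(match):
        parts = split_reference(match.group(1))
        if len(parts) != 2:
            return match.group(0)  # e.g. &lt; is an ordinary HTML entity
        scope_name, variable = parts
        scope = current if scope_name == "_" else scopes.get(scope_name)
        if scope is None or variable not in scope:
            return match.group(0)
        return scope[variable]
    return ENTITY.sub(replace, text)

scopes = {"form": {"submit.x": "12"}}
output = expand_entities("x=&form.submit..x; &lt; &_.name;", scopes,
                         current={"name": "Ann"})
print(output)
```

Because the template refers to its own scope as "_", the same fragment can be moved into any surrounding scope without renaming anything.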

Another goal outlined in the Roxen Challenger 1.4 specification was to solve unintentional reevaluation of RXML, e.g. when an RXML function outputs new RXML code which gets parsed again. This is potentially very dangerous if the output data originated from the user. One of the solutions for this was to make the RXML parser parse the page only once, i.e. never the output of an RXML tag. In the same way, variable expansion was also done only once, so there is no need to worry about an entity outputting another entity. There was however still the issue of an entity outputting RXML code.
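The difference between expanding once and expanding until nothing changes can be shown with a small Python sketch. The function names and the example data are invented, with the value in the form scope playing the role of user-supplied input.

```python
import re

ENTITY = re.compile(r"&(\w+)\.(\w+);")

def expand_once(text, scopes):
    """Single pass: inserted values are never scanned for further entities."""
    def replace(match):
        scope, name = match.groups()
        return scopes.get(scope, {}).get(name, match.group(0))
    return ENTITY.sub(replace, text)

def expand_repeatedly(text, scopes, limit=10):
    """Re-parse the output until it stops changing (the behaviour to avoid)."""
    for _ in range(limit):
        expanded = expand_once(text, scopes)
        if expanded == text:
            break
        text = expanded
    return text

scopes = {
    "form": {"comment": "&secret.password;"},   # hostile user input
    "secret": {"password": "hunter2"},
}
safe = expand_once("<p>&form.comment;</p>", scopes)
unsafe = expand_repeatedly("<p>&form.comment;</p>", scopes)
print(safe)
print(unsafe)
```

With a single pass the user's text comes out as the text the user typed. With repeated parsing the user gets to read a variable that the page author never meant to expose.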

The possibility for a variable representation to take parameters, as seen with the output tags, was reduced to the ability to select an encoding. The rationale for not taking general parameters was, among other things, that it looked ugly and cluttered, but there were also technical and conceptual reasons.



&scope.variable:encoding;

The encoding parameter was supplemented with "context types", i.e. one can declare the type of each possible insertion point of entity values, which gives the parser a good default encoding. This feature is not fully deployed yet. Although it is fairly complete at the backend level, making all the types and declaring the input and output types for all the RXML tags still remains to be done.
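How a default encoding and an explicit override could interact is sketched below in Python. The encoding names none, html and url, the table ENCODERS and the default_encoding argument are assumptions made for this example and do not describe the actual RXML type system.

```python
import html
import re
from urllib.parse import quote

# Hypothetical encoding names for this sketch.
ENCODERS = {
    "none": lambda value: value,
    "html": html.escape,
    "url": lambda value: quote(value, safe=""),
}

ENTITY = re.compile(r"&(\w+)\.(\w+)(?::(\w+))?;")

def expand(text, scopes, default_encoding):
    """Expand &scope.variable:encoding; using the context's default if none is given."""
    def replace(match):
        scope, name, encoding = match.groups()
        values = scopes.get(scope)
        if values is None or name not in values:
            return match.group(0)
        encoder = ENCODERS.get(encoding or default_encoding)
        if encoder is None:
            return match.group(0)   # unknown encoding: leave the entity alone
        return encoder(values[name])
    return ENTITY.sub(replace, text)

scopes = {"form": {"q": "< x"}}
text_out = expand("<p>&form.q;</p>", scopes, "html")
attr_out = expand('<a href="/s?q=&form.q:url;">', scopes, "html")
print(text_out)
print(attr_out)
```

The page author only has to name an encoding where the default for the context is not the right one.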

Loose ends

I have not presented the more technical aspects of variables, entities, scopes and types for a good reason: it would easily make this article several times longer. If you do want to know how to use scopes and entities in your own program, the server/etc/modules/RXML.pmod/module.pike file is a good starting point. Read, for example, the class definitions for Scope and Entity. You'll find plenty of down-to-the-details documentation there as well. Then move on to the SSI module, the rxmltags module and server/etc/modules/Roxen.pmod to see examples of how scopes and entities are registered.

Finally, a comprehensive fortune cookie study has shown that in coming versions of Roxen WebServer, you will be able to handle more complex data types. E.g. a mapping could be indexed like this:



&scope.variable.index;

The already existing framework for RXML types will be developed and several new, cool types will be added, making it possible to move high level data objects like tables and images between tags in RXML variables. Stay tuned...