It didn't take long before it became apparent that the World Wide Web
could be so much more than static pages of information. By hiding some parts
of the page, showing other parts and modifying yet other parts depending on
who saw the page, the time of day and countless other factors, you would get
an entirely new medium, never seen before. In this article I will take a look
at the development of server side dynamic web pages, with a special focus on
page accessible variables.
The Brute Force Approach
The first way to create dynamic content was of course to create a program
that generates the page, given the parameters that the content should depend on,
and send the page to the client. This is a natural way for any programmer to
solve the problem because it takes a minimum of effort and yields the required
result. It's also a very efficient solution, technically speaking, although with
the modest amount of logic on a typical web page, especially at the time, efficiency
was hardly the deciding factor.
A more problematic issue is that of manageability of the content. If a web page
is generated by program code, then you typically need a programmer to alter the
code, if you want to alter the page. Another problem is the virtually nonexistent
reusability of components. If a program generates a complete, dynamic page, then
you cannot use that program for any other purpose than to generate that page.
Both these problems were addressed by SSI, Server Side Includes, which came with
the NCSA server.
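To make the brute force approach concrete, here is a minimal sketch in Python (my choice for illustration; the programs of the time were typically CGI programs in other languages). The function name `render_page` and its parameters are hypothetical. Note how the markup is buried inside the program, which is exactly why a programmer is needed to change the page.

```python
from datetime import datetime
from html import escape


def render_page(user: str, now: datetime) -> str:
    """Build the complete page in code; markup and logic are inseparable."""
    greeting = "Good morning" if now.hour < 12 else "Good evening"
    return (
        "<html><body>"
        f"<h1>{greeting}, {escape(user)}!</h1>"
        f"<p>Generated at {now:%H:%M}.</p>"
        "</body></html>"
    )


# The page depends on who is looking and on the time of day.
print(render_page("Alice", datetime(2000, 1, 1, 9, 30)))
```

Changing so much as the heading level means editing and redeploying the program, and nothing in it can be reused for a different page.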
Server Side Includes
The basic idea with SSI is that if you can get the person that writes and
edits web pages to learn HTML, then it should be possible to throw in a few
more tags that insert the results from our programs. The page would then be
a sort of template or frame around the dynamic content. Hence everything but
the actual dynamic parts can be modified by a non-programmer and every dynamic
component can potentially be used for several pages.
SSI tags were designed to look like HTML comments. The reason for
this might have been to make a compatible extension to HTML, but they
could really have looked any way the designers liked, because they are
evaluated before the page is sent from the server. An SSI tag is
built like this:
<!--#command parameter="value"-->
where the original commands included "include", "exec" and
"echo".
The include directive simply loads another file and
inserts it where the SSI tag is located. This is useful for, e.g., page headers and
footers or a site navigation menu. It could also be an automatically
generated file, e.g. a log file or a file made by a cron job. You'll find the
same functionality in a Roxen WebServer with the <insert file> tag.
The exec directive is the one that actually runs an external program
and inserts the result into the page. The echo directive inserts the value
of an environment variable. Although they all insert data into the page, there are
several differences that I'd like to point out. Even though both exec and
include trigger a disk operation, I'd like to group echo and
include into "passive components" and exec into "active components".
Further, I would like to group exec and include into "parametric
components", but I realize that the difference between echo and include
is very thin in this respect and needs a bit more justification. The include
command has two different parameters to identify the file, called "file" (a path
in the real file system) and "virtual" (a path in the server's virtual file system).
One can further think of
the path as a kind of parameter for the file, whereas in the case of echo the
variable is chosen from a flat structure. This might seem a bit far-fetched, but
you'll see how these things matter in the next section.
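The two passive directives are simple enough to sketch. The following Python fragment is an illustration only, not the NCSA or Roxen implementation: the function `expand_ssi` is hypothetical, files and environment variables are plain dictionaries instead of a real file system and process environment, and exec is deliberately left unexpanded.

```python
import re

# Matches the general form <!--#command parameter="value"-->
SSI_TAG = re.compile(r'<!--#(\w+)\s+(\w+)="([^"]*)"\s*-->')


def expand_ssi(page: str, files: dict[str, str], env: dict[str, str]) -> str:
    """Expand include and echo; leave every other directive untouched."""

    def handle(match: re.Match) -> str:
        command, parameter, value = match.groups()
        if command == "include" and parameter == "file":
            return files.get(value, "")  # passive, identified by a path
        if command == "echo" and parameter == "var":
            return env.get(value, "")    # passive, picked from a flat namespace
        return match.group(0)            # e.g. exec: not handled in this sketch

    return SSI_TAG.sub(handle, page)


page = ('<!--#include file="header.html"-->'
        '<p>You are viewing <!--#echo var="DOCUMENT_NAME"-->.</p>')
result = expand_ssi(
    page,
    files={"header.html": "<h1>My site</h1>"},
    env={"DOCUMENT_NAME": "index.html"},
)
print(result)
```

The include branch looks its value up by path while the echo branch looks it up by bare name, which is the distinction between parametric and non-parametric components drawn above.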
The Next Section
Although SSI proved to be a good solution for the moment, it was nowhere near a
perfect solution. Integration between static and dynamic content was still a problem;
it had just shrunk from static vs. dynamic pages to static vs. dynamic blocks. What
was initially a strength for SSI, its similarity to HTML, had also become a problem.
Let's say we had a program that generated a link to an image. In order to use it in
an <img> tag we would have to do something like this:
<img src='<!--#exec cmd="generate_url.pike"-->'>
which more or less defeats the purpose of having an HTML syntax for the language,
since the correct way to write it would have been
<img src='&lt;!--#exec cmd="generate_url.pike"--&gt;'>
One solution that Roxen implemented addresses both the integration issue and the
HTML syntax issue, although the second one wasn't really intentional. The basic idea is to identify the
value you want to insert in a non-parametric way, i.e. only by name, as with
the echo command in SSI. Then, if the parser only has to find the name,
we don't need as much framework around the identifier, only some special character
signature that doesn't otherwise appear in the page.
<form action="<!--#echo var="DOCUMENT_NAME"-->">
could be
<form action="#DOCUMENT_NAME#">
This was used in the old Roxen Challenger output tags and is still used in Cold Fusion
tags.
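Signature replacement amounts to very little code, which is part of its appeal. The sketch below is in Python and is an illustration under my own assumptions, not Roxen's or Cold Fusion's parser: the function name `replace_signatures` is hypothetical, and unknown names are left as written.

```python
import re


def replace_signatures(template: str, variables: dict[str, str],
                       quote: str = "#") -> str:
    """Replace every <quote>NAME<quote> with the value of NAME."""
    pattern = re.compile(re.escape(quote) + r"(\w+)" + re.escape(quote))
    # Unknown names are left untouched rather than replaced with nothing.
    return pattern.sub(lambda m: variables.get(m.group(1), m.group(0)), template)


form = replace_signatures('<form action="#DOCUMENT_NAME#">',
                          {"DOCUMENT_NAME": "index.html"})
print(form)
```

Because the value is identified by name only, the signature fits inside an attribute value without disturbing the surrounding HTML.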
Now, how does this increase the interaction between static and dynamic content, which
I said was the real reason for using this method? Consider a component
that queries a database and returns the result, which you want to display in a table.
In the SSI model you would have to do the formatting in the database component, unless
you know how many rows the database query will return and create a table with one SSI
tag for each cell.
<table>
<tr><td><!--#exec cgi="/cgi-bin/db.pike?row=1&col=1"--></td>
<td><!--#exec cgi="/cgi-bin/db.pike?row=1&col=2"--></td></tr>
<tr><td><!--#exec cgi="/cgi-bin/db.pike?row=2&col=1"--></td>
<td><!--#exec cgi="/cgi-bin/db.pike?row=2&col=2"--></td></tr>
</table>
Note that the actual query is not sent to the database component in this example, which of course
is something that you also want to do in order to keep it truly general. With the signature
replacement method the same example might look like this:
<table>
<sqloutput query="SELECT name,phone FROM user">
<tr><td>#name#</td><td>#phone#</td></tr>
</sqloutput>
</table>
With this solution we both achieve sufficient integration between static and dynamic
content and solve the doesn't-look-like-valid-HTML issue. This is of course a subjective
conclusion, but several independent solutions have taken this approach to content integration.
Do note that this solution sends the query to the sqloutput function, which
makes it a more general component, and that the answer can have any number of rows.
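What a tag like sqloutput does can be sketched as "run the query, then apply signature replacement to the tag body once per row". The Python below is only an illustration: `sql_output` is a hypothetical function, an in-memory SQLite database stands in for whatever database the server talks to, and the values are inserted without any quoting.

```python
import re
import sqlite3


def sql_output(conn: sqlite3.Connection, query: str, row_template: str) -> str:
    """Repeat row_template once per result row, replacing #column# signatures."""
    cursor = conn.execute(query)
    columns = [description[0] for description in cursor.description]
    rendered = []
    for row in cursor:
        values = dict(zip(columns, map(str, row)))
        rendered.append(re.sub(
            r"#(\w+)#",
            lambda m: values.get(m.group(1), m.group(0)),  # raw, unquoted value
            row_template,
        ))
    return "".join(rendered)


# Stand-in data; the table and rows are invented for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (name TEXT, phone TEXT)")
conn.executemany("INSERT INTO user VALUES (?, ?)",
                 [("Ann", "555-0100"), ("Bob", "555-0101")])

rows_html = sql_output(conn, "SELECT name,phone FROM user ORDER BY name",
                       "<tr><td>#name#</td><td>#phone#</td></tr>")
print("<table>" + rows_html + "</table>")
```

The page author decides the markup of a row, the component decides how many rows there are, and neither needs to know about the other.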
New World - New Problems
With the new flexibility we can now begin to combine several components, or several
instances of the same component, and do some sort of high level programming in the web pages.
The following example illustrates a problem inherent in the signature replacement method:
you must come up with unique signatures that can be replaced. This also means that code that
uses signature replacement isn't context free. You must ensure that your code isn't surrounded
by other signature replacement dependent code that uses your signature. In effect, you
cannot build high level components.
<table>
<sqloutput query="SELECT name FROM customer" quote="#">
<tr><td>#name#</td><td>
<sqloutput query="SELECT date FROM order WHERE customer='#name#'" quote="§">
§date§<br>
</sqloutput>
</td></tr>
</sqloutput>
</table>
Let me give you an even worse example that illustrates an issue that we haven't raised at
all so far. In the following example, the output of one function is used as input to the
next function. Note the §#Field#§ construction and make sure you understand how it
works. Now make sure that you understand why it will fail if a Field, or column, in the
database is called anything with an "§" in it. The problem from the parser's point of view
is that it may be impossible to determine the intention of the programmer's code, because one
could easily create a situation where we want a newly inserted signature to be replaced
by an outer function.
<table>
<sqloutput query="DESCRIBE the_table" quote="#">
<tr>
<sqloutput query="SELECT #Field# FROM the_table WHERE id=1" quote="§">
<td>§#Field#§</td>
</sqloutput>
</tr>
</sqloutput>
</table>
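The failure is easy to reproduce with two replacement passes. This Python sketch is an illustration under my own assumptions (a hypothetical `replace_signatures` helper, made-up column names and values); it mimics the outer pass with quote "#" followed by the inner pass with quote "§".

```python
import re


def replace_signatures(template: str, variables: dict[str, str], quote: str) -> str:
    """Replace <quote>NAME<quote>, where NAME is the shortest run between quotes."""
    pattern = re.compile(re.escape(quote) + "(.+?)" + re.escape(quote))
    return pattern.sub(lambda m: variables.get(m.group(1), m.group(0)), template)


def render_cell(column: str, value: str) -> str:
    # Outer pass: #Field# becomes the column name.
    inner_template = replace_signatures("<td>§#Field#§</td>", {"Field": column}, "#")
    # Inner pass: §column§ should now become the column's value.
    return replace_signatures(inner_template, {column: value}, "§")


good = render_cell("price", "42")  # well-behaved column name
bad = render_cell("a§b", "42")     # column name containing the inner signature
print(good)
print(bad)
```

With the column "a§b" the inner pass sees "§a§b§", reads "a" as the variable name, finds nothing, and the value is never inserted. The parser has no way of telling which "§" came from the page and which came from the data.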
This might look like a problem only for code that uses signature replacement, but
the general issue, quoting forbidden characters, is something that must be dealt with
in all the methods described so far, although until now the problem has been fairly trivial.
With completely generated pages the problem is no different from normal HTML quoting,
except that the programmer might need to quote characters reserved in that programming
language. With SSI the problem is a bit worse, since it can output the data in
different contexts. Let's say that our program outputs "< x". If the text
is output where normal text should be, it should be inserted as "&lt; x", but if it is
output in the src attribute of an <img> tag it should be
inserted as "%3c%20x". Half of the problem, encoding the data for the environment it is
inserted into, was solved in Roxen by adding the possibility to add parameters to
the signature. Examples of such signatures are #name:quote=mysql# and
#link:quote=url#.
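A signature with a quote parameter can be sketched as follows. This is Python written for illustration, not Roxen's code: the helper `replace_signatures` and the encoder table are hypothetical, "quote=html" is a name I assume for the example (the article itself only mentions mysql and url), and a signature without a parameter is inserted raw.

```python
import re
from html import escape
from urllib.parse import quote

# Hypothetical encoder table; only "url" is a name taken from the article.
ENCODERS = {
    "html": escape,                      # for text between tags
    "url": lambda s: quote(s, safe=""),  # for URL-valued attributes
}

SIGNATURE = re.compile(r"#(\w+)(?::quote=(\w+))?#")


def replace_signatures(template: str, variables: dict[str, str]) -> str:
    def expand(m: re.Match) -> str:
        name, encoding = m.groups()
        if name not in variables:
            return m.group(0)             # unknown name: leave as written
        value = variables[name]
        if encoding is None:
            return value                  # assumption: no parameter means raw
        return ENCODERS[encoding](value)  # unknown encodings raise KeyError

    return SIGNATURE.sub(expand, template)


encoded = replace_signatures(
    '<p>#data:quote=html#</p><img src="#data:quote=url#">',
    {"data": "< x"},
)
print(encoded)
```

Python's `quote` emits upper case hex digits ("%3C%20x"), which is equivalent to the "%3c%20x" in the text. The same value ends up encoded in two different ways within one page, depending on where it is inserted.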
Two dot Zero
One of the issues in the initial specification for Roxen Challenger 1.4 was to
"end the quoting from hell" and to prevent users from making so many fatal mistakes
when using variables. To do this we needed a new RXML parser, and this came to be
the reason why Roxen Challenger 1.4 took at least four months longer to develop than
planned, and ended up as version 2.0 instead. Even now, one and a half years
after work on 1.4 began, not all the features in the new parser are fully deployed.
Let's get down to business about what we did, and why. To begin with, signature
replacement based variable insertion had already proven fairly successful
at integrating static and dynamic content. The drawback was that
when doing nested insertions one had to come up with new signatures, and the
signatures one came up with could be present in the data. Each bucket of variables
was named a scope, which was a concept already present in Challenger 1.3, although
hardly used at the time. An initial thought was that one could number the scopes in creation order,
e.g. #1:name# would fetch the value of the variable "name" from the first
scope, but with that solution you would still have context dependent code.
Finally, the choice for the variable references came to be something similar to
HTML entities, because that syntax was already well integrated in HTML/XML and in software that
handles HTML/XML. To indicate which scope to operate in, two methods were created.
One is to call the scopes by name, which makes it possible to write code with absolute
references to different scopes, so that the code can be moved without changing its meaning.
The other method is a special case. To operate in the present scope you use the name "_", which
makes it possible to write code that is relocatable into any scope. A reference thus
takes the form &scope.variable;, or &_.variable; for the present scope.
It is also possible to quote the dot in the entity by doubling it, so that entities
and scopes can have "." in their names, e.g. &form.submit..x; for submit
buttons that use an image.
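The lookup rules for scoped entities, including "_" and the doubled dot, can be sketched like this. The Python below is my own illustration, not the code in the RXML parser: the names `split_reference` and `expand_entities` are hypothetical, scopes are nested dictionaries, and anything that does not resolve (ordinary HTML entities included) is left as written.

```python
import re

# An entity reference that contains at least one dot, e.g. &form.name;
ENTITY = re.compile(r"&([^\s&;]+\.[^\s&;]+);")


def split_reference(reference: str) -> tuple[str, str]:
    """Split 'scope.variable' at the first single dot; '..' is a literal dot."""
    placeholder = "\0"
    scope, _, variable = reference.replace("..", placeholder).partition(".")
    return scope.replace(placeholder, "."), variable.replace(placeholder, ".")


def expand_entities(template: str, scopes: dict[str, dict[str, str]],
                    current: str) -> str:
    def expand(m: re.Match) -> str:
        scope, variable = split_reference(m.group(1))
        if scope == "_":
            scope = current       # "_" names whatever scope we are in
        try:
            return scopes[scope][variable]
        except KeyError:
            return m.group(0)     # unknown reference: leave as written

    return ENTITY.sub(expand, template)


scopes = {
    "form": {"name": "Ann", "submit.x": "17"},
    "sql": {"name": "Bob"},
}
expanded = expand_entities("&form.name; &_.name; &form.submit..x; &amp;",
                           scopes, current="sql")
print(expanded)
```

Since every reference carries its scope, no signature has to be invented for nested components, and code that uses only "_" gives the same kind of result wherever it is pasted.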
Another goal outlined in the Roxen Challenger 1.4 specification was to prevent
unintentional reevaluation of RXML, e.g. when an RXML function outputs new RXML
code which gets parsed again. This is potentially very dangerous if the output
data originated from the user. One of the solutions for this was to make the RXML
parser parse the page only once, i.e. never the output of an RXML tag. In the same
way, variable expansion is also done only once, so there is no need to worry about an
entity outputting another entity. There was, however, still the issue of an entity
outputting RXML code.
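The difference between expanding once and expanding until nothing changes is worth seeing side by side. This Python sketch is illustrative only: both functions are hypothetical, the entity syntax is simplified to word characters, and the "secret" value is invented.

```python
import re

ENTITY = re.compile(r"&(\w+)\.(\w+);")


def expand_once(template: str, scopes: dict[str, dict[str, str]]) -> str:
    # re.sub scans the template once; inserted values are never rescanned.
    return ENTITY.sub(lambda m: scopes[m.group(1)][m.group(2)], template)


def expand_repeatedly(template: str, scopes: dict[str, dict[str, str]]) -> str:
    # The dangerous variant: keep expanding until the text stops changing.
    previous = None
    while previous != template:
        previous, template = template, expand_once(template, scopes)
    return template


scopes = {
    "form": {"comment": "&server.secret;"},  # text supplied by a user
    "server": {"secret": "hunter2"},
}
safe = expand_once("<p>&form.comment;</p>", scopes)
leaked = expand_repeatedly("<p>&form.comment;</p>", scopes)
print(safe)
print(leaked)
```

With a single pass the user's text stays text. With repeated passes the user gets to read any variable the page can reach.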
The possibility for a variable reference to take parameters, as seen with the
output tags, was reduced to the ability to select an encoding. The rationale for not
taking general parameters was, among other things, that it looked ugly and cluttered,
but there were also technical and conceptual reasons.
&scope.variable:encoding;
The encoding parameter was supplemented with "context types", i.e. one can declare the
type of each possible insertion point for entity values, which gives the parser a
good default encoding. This feature is not fully deployed yet. It is fairly
complete at the backend level, but creating all the types and declaring the input and
output types for all the RXML tags still remains to be done.
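The interplay between an explicit encoding and a default given by the context can be sketched as below. This is a Python illustration under stated assumptions, not the RXML type system: the encoding names "none", "html" and "url" and the `default_encoding` argument (standing in for a declared context type) are hypothetical.

```python
import re
from html import escape
from urllib.parse import quote

# Hypothetical encodings; the real set is defined by the RXML parser.
ENCODERS = {
    "none": str,
    "html": escape,
    "url": lambda s: quote(s, safe=""),
}

ENTITY = re.compile(r"&(\w+)\.(\w+)(?::(\w+))?;")


def expand_entities(template: str, scopes: dict[str, dict[str, str]],
                    default_encoding: str) -> str:
    """Expand &scope.variable:encoding;, using the default when none is given."""

    def expand(m: re.Match) -> str:
        scope, variable, encoding = m.groups()
        return ENCODERS[encoding or default_encoding](scopes[scope][variable])

    return ENTITY.sub(expand, template)


scopes = {"form": {"q": "< x"}}
as_text = expand_entities("<p>&form.q;</p>", scopes, "html")
as_url = expand_entities('<a href="/search?q=&form.q:url;">', scopes, "html")
as_raw = expand_entities("&form.q:none;", scopes, "html")
print(as_text, as_url, as_raw)
```

The point of a context-given default is that the common case, plain &form.q;, is the safe one, and the author only has to say something when the insertion point is special.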
Loose ends
I have not presented the more technical aspects of variables, entities, scopes and
types for a good reason: it would easily make this article several times longer. If
you do want to know how to use scopes and entities in your own program, the
server/etc/modules/RXML.pmod/module.pike file is a good starting point. For example, read the
class definitions for Scope and for Entity. You'll find plenty of down-to-the-details
documentation there as well. Then move on to the SSI module, the rxmltags module and
server/etc/modules/Roxen.pmod to see examples of how scopes and entities are registered.
Finally, a comprehensive fortune cookie study has shown that in coming versions of
Roxen WebServer, you will be able to handle more complex data types. E.g. a mapping
could be indexed like this:
The already existing framework for RXML types will be developed and several new, cool
types will be added, making it possible to move high level data objects like tables and
images between tags in RXML variables. Stay tuned...