(Or "Tags interacting" or "The liver of RXML" or "Together we are <strong>" or
"Grown up tags doing things together")
In my previous article The soul of RXML I showed
some of the basic RXML tag syntax, from a module point of view, and showed a few
ways to get the new tag system to mimic the old one. In this article I will begin to
demonstrate why the new parser took half a year longer to develop than initially
planned. Let's get serious.
Context
I somehow managed to write an entire article about how to program RXML tags
while mentioning virtually nothing about the parser, but we won't get much
further without some basic understanding of it. Much like the WebServer
keeps all variables and objects that concern a specific request in the RequestID
object, the parser keeps most of its state and variables in a context object.
This object is then continually updated and queried during the parsing and
evaluation of a page.
The difference between parsing and evaluation is not something that really
matters on this level of programming, and the terms are already mixed up in
RXML. There is one tag called <noparse> while the tag with the
inverse function is called <eval>. I'll try to clarify the
distinction with an example. When the parser finds the character "&" in
an XML document it knows it has found the beginning of an entity. The process
of finding this is called parsing. Then the parser finds the ending ";" character,
and the characters "form.foo" in between. Still parsing. That string can be divided
into the scope name "form" and the variable name "foo". Still parsing. That entity
can however be replaced with the value found in the variable foo in the
scope form. Evaluation.
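The split between the two phases can be sketched outside Roxen. The following is a minimal standalone model in Python, not the RXML parser itself: the regular expression, the function names and the dictionary standing in for the context are all assumptions made for illustration.

```python
import re

# Hypothetical pattern for an RXML-style entity such as "&form.foo;".
# The real parser's grammar is not described here; this is an assumption.
ENTITY = re.compile(r"&([A-Za-z_][\w-]*)\.([\w.-]+);")


def parse_entities(text):
    """Parsing: find each entity and split it into (scope, variable)."""
    return [(m.group(1), m.group(2)) for m in ENTITY.finditer(text)]


def evaluate(text, context):
    """Evaluation: replace each entity with the value held by the context."""
    def lookup(match):
        scope, variable = match.group(1), match.group(2)
        # In this sketch unknown scopes or variables are left untouched.
        return str(context.get(scope, {}).get(variable, match.group(0)))
    return ENTITY.sub(lookup, text)


# A plain dictionary stands in for the context object.
context = {"form": {"foo": "bar"}}

print(parse_entities("Value: &form.foo;"))     # only structure, no values
print(evaluate("Value: &form.foo;", context))  # values come from the context
```

Note that parse_entities never touches the context. Only evaluate needs to know what the variable currently holds.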
It is during the evaluation that the RXML parser asks the context if there is a
variable "foo" in the scope "form". If we make a tag that creates a new scope, it
will be put into the context and removed when we exit the scope of our tag. If we
create a new scope with the same name as an already existing scope, we will replace
that scope with our own during the evaluation of our tag. The following simple RXML
example demonstrates temporary addition
of a variable scope.
<pre>
<insert scopes="plain"/>
<emit source="values" values="a">
<insert scopes="plain"/>
</emit>
<insert scopes="plain"/>
</pre>
Which will yield something similar to the following:
This exemplifies temporary addition of a variable scope.
client, cookie, form, page, roxen and var
client, cookie, form, page, roxen, values and var
client, cookie, form, page, roxen and var
You can add scope="form" to the emit tag and verify, e.g. by
using <insert variables="full" scope="form"/>, that overloading
of scopes works.
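The behaviour described above, a scope that exists only while the tag is being evaluated and that may shadow an existing scope of the same name, can be modelled in a few lines. This is a toy sketch in Python, not Roxen code. The class Context and its method enter_scope are hypothetical names, and the real context object certainly keeps more state than this.

```python
from contextlib import contextmanager


class Context:
    """Toy model of a parser context holding named variable scopes."""

    def __init__(self, scopes):
        self.scopes = dict(scopes)

    @contextmanager
    def enter_scope(self, name, variables):
        # Remember what the name referred to before (possibly nothing).
        missing = object()
        previous = self.scopes.get(name, missing)
        self.scopes[name] = variables
        try:
            yield self
        finally:
            # Leaving the tag: remove our scope or restore the shadowed one.
            if previous is missing:
                del self.scopes[name]
            else:
                self.scopes[name] = previous

    def scope_names(self):
        return sorted(self.scopes)


ctx = Context({"client": {}, "cookie": {}, "form": {"foo": "from request"},
               "page": {}, "roxen": {}, "var": {}})

print(ctx.scope_names())
with ctx.enter_scope("values", {"value": "a"}):
    print(ctx.scope_names())      # "values" is visible only in here
print(ctx.scope_names())

with ctx.enter_scope("form", {"value": "a"}):
    print(ctx.scopes["form"])     # our scope shadows the original "form"
print(ctx.scopes["form"])         # the original is back afterwards
```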
Behind the magic
Let's have a look at the interface for creating new variable scopes from a
tag module. We start with a dummy-tag:
class TagDummy {
  inherit RXML.Tag;
  constant name = "dummy";
  class Frame {
    inherit RXML.Frame;
    array do_enter(RequestID id) {
    }
  }
}
Unsurprisingly this tag doesn't do anything. Note that I don't have to have
the empty do_enter function in the dummy tag for it to work. I have it there to
make the next picture more instructive. In it I have added two variables, vars
and scope_name, as well as code that initializes them.
class TagDummy {
  inherit RXML.Tag;
  constant name = "dummy";
  class Frame {
    inherit RXML.Frame;
    mapping(string:mixed) vars;
    string scope_name;
    array do_enter(RequestID id) {
      scope_name = args->scope || "dummy";
      vars = ([ "one":"eins", "two":"zwei" ]);
    }
  }
}
The string scope_name is of course the name of the scope, and the
mapping vars maps the name of each variable to its content.
In this case we have a static mapping that would make <dummy>&_.one; &dummy.two;</dummy>
produce eins zwei. One might think that it should be ok to initialize
the contents of the mapping vars only once, where it is declared. Remember from
the previous article, however, that frames might be reused, and someone might have altered
the values of the vars mapping in the meantime, e.g. by using the <set> tag.
Loops
And now for something completely different. The new tag API provides a way
to easily create iterations. The most basic way is to repeat the
content a predefined number of times, which is accomplished by setting the
integer do_iterate to the number of times the content should be iterated.
class TagLoop {
  inherit RXML.Tag;
  constant name = "loop";
  class Frame {
    inherit RXML.Frame;
    int do_iterate;
    array do_enter(RequestID id) {
      do_iterate = (int)args->loops;
    }
  }
}
Compare that implementation with the following:
class TagLoop {
  inherit RXML.Tag;
  constant name = "loop";
  class Frame {
    inherit RXML.Frame;
    array do_return(RequestID id) {
      result = content * (int)args->loops;
    }
  }
}
The latter implementation differs in functionality in two important ways.
First the content is not re-evaluated for each iteration, which can be both good
and bad, of course, depending on what you are trying to accomplish. Secondly,
and not as apparent, the second solution can not stream. The new parser has,
although currently not activated, support for streaming, and can in the first
case iterate a few times, send the intermediate result to the client, iterate a
few times more, and so on instead of processing and sending the whole batch in
one big chunk. This can lead to a more responsive system, since you won't have
to wait for the entire page to be parsed and evaluated before you can start
receiving and seeing its contents render in your browser.
Instead of having a fixed integer for the number of loops, one can use a function
do_iterate() that returns the number of loops to perform before it is called again.
Returning zero means that the iterations are done. This is obviously more flexible,
since we can now create conditional loops, e.g. while-loops, and we are also able
to perform operations between each loop. This is how the <emit> tag
works. The following little piece of code creates an emit-like tag that outputs parts of
the multiplication table.
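class TagMultab {
  inherit RXML.Tag;
  constant name = "multab";
  class Frame {
    inherit RXML.Frame;
    mapping(string:int) vars;
    string scope_name;
    int counter;
    array do_enter(RequestID id) {
      scope_name = args->scope_name || "multab";
      // One extra step, since do_iterate increments the counter before
      // comparing it; this yields rows for 0 up to and including "to".
      args->to = (int)args->to+2;
      // Reset per entry, since the frame might be reused.
      counter = 0;
    }
    int do_iterate(RequestID id) {
      // Fill the scope with the row for the current counter value.
      vars = ([ "one":counter,
                "two":counter*2,
                "three":counter*3,
                "four":counter*4,
                "five":counter*5,
             ]);
      counter++;
      // 1 means "evaluate the content once more", 0 means "done".
      return counter<args->to;
    }
  }
}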
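To make the calling convention concrete, here is a small standalone model in Python of how a parser could drive such a frame. It is a sketch under stated assumptions, not the Roxen implementation: run_frame and MultabFrame are hypothetical names, and the loop simply follows the description above (call do_iterate, evaluate the content that many times, repeat until zero is returned). Using a generator also hints at why this style can stream, since each piece of output is available before the next iteration starts.

```python
def run_frame(frame, evaluate_content):
    """Yield the evaluated content piece by piece, one piece per iteration."""
    frame.do_enter()
    while True:
        count = frame.do_iterate()
        if not count:          # zero means the iterations are done
            break
        for _ in range(count):
            yield evaluate_content(frame.vars)


class MultabFrame:
    """Python counterpart of the multab frame, for illustration only."""

    def __init__(self, to):
        self.to = to

    def do_enter(self):
        self.counter = 0
        self.limit = self.to + 2   # same off-by-two trick as the tag above

    def do_iterate(self):
        c = self.counter
        self.vars = {"one": c, "two": c * 2, "three": c * 3,
                     "four": c * 4, "five": c * 5}
        self.counter += 1
        return int(self.counter < self.limit)


def row(variables):
    # Stands in for evaluating the tag content against the current scope.
    return "{one} {two} {three}".format(**variables)


for piece in run_frame(MultabFrame(3), row):
    print(piece)
```

Because do_iterate is called again between iterations, the frame gets a chance to update vars before each pass over the content. The fixed-integer variant offers no such hook.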
Plugins
Assuming that you know a decent amount of RXML, you know that there
are several emit sources. Why choose to have several emit "sources" instead
of just making more tags? To begin with, you would typically want to be able
to perform a bunch of generic table operations on all these tags, such as
limiting the number of output rows, skipping a certain number of preceding rows,
sorting the rows etc. You could inherit a generic emit tag into all the tags,
but it would probably end up having a method "get_data" or similar where
the actual information retrieval occurred, leaving the do_enter method
untouched. Also, we run the risk that someone would indeed alter the
do_enter method so that it would operate slightly differently from the other
emit tags, making it harder to use RXML. One solution is to use the
plugin architecture available in the new parser.
The plugin system provides two different roles for tags: socket tags and
plugin tags. The socket tag is the tag that uses the plugins, so let us start
by making a tag with socket support. First we must
set the flag RXML.FLAG_SOCKET_TAG in the flags constant. Secondly we
call get_plugins() to receive a mapping with the available plugins.
class TagDummy {
  inherit RXML.Tag;
  constant name = "dummy";
  constant flags = RXML.FLAG_SOCKET_TAG;
  class Frame {
    inherit RXML.Frame;
    array do_enter(RequestID id) {
      mapping(string:RXML.Tag) plugins = get_plugins();
    }
  }
}
After that we are on our own. Once we get hold of the plugin tag objects
there is no defined API. It's up to the socket tag programmer to decide upon
an API. In the emit case it is quite straightforward; all plugins have a
method called get_dataset that returns an array of mappings. These mappings
will then be placed as the vars mapping when the emit tag iterates over the
array.
The plugin part is not really difficult either. Set the name to that of the
tag it is a plugin for and add a constant plugin_name naming the plugin, i.e. the
name that will be the tag object's index in the mapping returned by get_plugins.
The multab tag turned into an emit plugin might look like this:
class TagEmitMulTab {
  inherit RXML.Tag;
  constant name = "emit";
  constant plugin_name = "multab";
  // Returns one mapping per row, for 0 up to and including "to".
  array(mapping(string:int)) get_dataset(mapping(string:string) args,
                                         RequestID id) {
    array response = ({});
    for(int i = 0; i<(int)args->to+1; i++)
      response += ({ ([ "one":i,
                        "two":i*2,
                        "three":i*3,
                        "four":i*4,
                        "five":i*5,
                     ]) });
    return response;
  }
}
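The division of labour between socket and plugin can be modelled without Roxen. The Python sketch below is an illustration only: EmitSocket, register and the attribute names "skip" and "limit" are hypothetical, chosen to show where the generic table operations mentioned earlier would live. Only get_plugins, plugin_name and get_dataset are taken from the text above.

```python
class EmitSocket:
    """Toy socket tag: owns the generic row handling, delegates the data."""

    def __init__(self):
        self._plugins = {}

    def register(self, plugin):
        # Plugins are indexed by their plugin_name, as in get_plugins().
        self._plugins[plugin.plugin_name] = plugin

    def get_plugins(self):
        return dict(self._plugins)

    def emit(self, args, template):
        plugin = self.get_plugins()[args["source"]]
        rows = plugin.get_dataset(args)
        # Generic operations, written once for every source.
        # "skip" and "limit" are hypothetical attribute names.
        rows = rows[int(args.get("skip", 0)):]
        if "limit" in args:
            rows = rows[:int(args["limit"])]
        return [template.format(**r) for r in rows]


class MultabPlugin:
    """Toy plugin: knows only how to produce its dataset."""

    plugin_name = "multab"

    def get_dataset(self, args):
        return [{"one": i, "two": i * 2, "three": i * 3}
                for i in range(int(args["to"]) + 1)]


socket = EmitSocket()
socket.register(MultabPlugin())
args = {"source": "multab", "to": "4", "skip": "1", "limit": "2"}
print(socket.emit(args, "{one} x 2 = {two}"))
```

The plugin never sees the skip or limit handling, which is the point: every source gets the same behaviour for free, and nobody can accidentally make one source behave differently.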
Temporary tags
For this performance's last trick I have saved something special. This is one
of the more useful features in the new parser and certainly the most useful
in this article. Often you would like to have tags that are not globally defined,
e.g. the <td> and <tr> tags are pretty useless outside <table>
tags. Given the functional properties of RXML you would typically want to be able
to place any kind of RXML inside your tag and see if it produces "<td>":s
and "<tr>":s. Since you want all normal RXML to work as before, the new tags
have to be added to the parser, instead of e.g. being parsed by Parser.HTML in the code of
the top tag (the "<table>" tag).
Just as the context holds all the variables and controls variable overloading,
it holds the tags and controls tag overloading. The collection of currently active
tags is called a tagset. One can easily create a personal tagset like this:
RXML.TagSet my_tagset = RXML.TagSet("The name of the tagset",
                                    ({ TagDummy(), TagEmitMulTab() }));
To add your tagset to the tagset in the context for the duration of the evaluation
of the contents of your tag, just put the tagset in the additional_tags variable in the Frame
class. To locally replace the tagset, use the variable local_tags instead.
That is however seldom useful and is, as far as I know, only used in one place in Roxen WebServer.
class TagTest {
  inherit RXML.Tag;
  constant name = "test";
  class TagReverse {
    inherit RXML.Tag;
    constant name = "reverse";
    class Frame {
      inherit RXML.Frame;
      array do_return(RequestID id) {
        result = reverse(content);
      }
    }
  }
  RXML.TagSet internal = RXML.TagSet("TagTest.internal",
                                     ({ TagReverse() }));
  class Frame {
    inherit RXML.Frame;
    RXML.TagSet additional_tags = internal;
    array do_return(RequestID id) {
      result = "START " + content + " STOP";
    }
  }
}
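The difference between additional_tags and local_tags can be pictured with a toy model. The Python sketch below is not how RXML.TagSet is implemented: the TagSet class, the active_tags function and the rule that an additional tag overrides a same-named existing tag are all assumptions made for illustration.

```python
class TagSet:
    """Toy tagset: a name (for debugging) plus a mapping of tag handlers."""

    def __init__(self, name, tags):
        self.name = name
        self.tags = dict(tags)


def active_tags(current, additional=None, local=None):
    """Return the tags visible while a frame's content is evaluated."""
    if local is not None:
        # local_tags: the surrounding tags are replaced altogether.
        return dict(local.tags)
    merged = dict(current.tags)
    if additional is not None:
        # additional_tags: extend the surrounding tags; in this sketch a
        # same-named tag is overridden for the duration of the content.
        merged.update(additional.tags)
    return merged


global_set = TagSet("global", {
    "test": lambda content: "START " + content + " STOP",
    "emit": lambda content: content,
})
internal = TagSet("TagTest.internal", {
    "reverse": lambda content: content[::-1],
})

inside_test = active_tags(global_set, additional=internal)
print(sorted(inside_test))               # reverse is available here
print(sorted(global_set.tags))           # but not outside the test tag
print(inside_test["reverse"]("abc"))
print(sorted(active_tags(global_set, local=internal)))
```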
This kind of tag, if anything, shows that Martin Stjernholm likes
object orientation. You don't have to put the internal tag classes inside the
top tag class, but if you put them outside they must not be named Tag-something.
You can call the tagset whatever you want (the name is really only used to
help humans when debugging), but it is a convenient naming convention
to take the tag class name and put a ".internal" on the end.
Even more tag registration
I ended the first new-RXML-parser-article with some extra information about tag
registration, so why not do that again, to demonstrate that I'm not telling you
even close to everything in these articles. Those of you who have programmed tag
modules for Roxen Challenger 1.3 know that in order to register a tag you had to
add a query_tag_callers method (and a query_container_callers method, since tags and
containers were treated as different things back then), which returned a mapping(string:function)
that mapped the tag name to its function. Though this was mostly trouble compared to
today's interface (you usually forgot to add your new tags to the query_tag_callers mapping),
it is of course powerful to be able to let the module decide upon loading what tags to register.
For newstyle tags this is done with the method query_tag_set, which should return a tagset, as
discussed above. For simpletags the interface is in a way more complex than for newstyle tags. There you
have the methods query_simpletag_callers and query_simple_pi_tag_callers, both returning
a mapping from the tag names to an array where the first element is the flags associated with the tag and
the second element is the function playing the role of the "do_return" function, although with different arguments.
An example of a simpletag registration:
mapping(string:array(int|function)) query_simpletag_callers() {
  return ([ "my-emit" : ({ RXML.FLAG_SOCKET_TAG, my_emit_tag }),
            "another_tag" : ({ RXML.FLAG_NONE, another_tag }) ]);
}
Those of you who really want to know the inner workings of Roxen WebServer should look in
server/base_server/module.pike, which is inherited by all modules, and see what the default query methods
look like. You might also want to take a look inside server/base_server/rxml.pike, where the query methods
are actually called, and of course look in server/etc/modules/RXML.pmod/module.pmod, where the RXML parser
lives. There is plenty of documentation in that file. I hope you all feel as exhausted after reading this
article as I felt after writing it. Thank you for your time. Live long, and prosper.