This article will introduce you to the wonderful world
of tag programming. Initially I was going to name this
article "The heart of RXML", but I'll reserve that one for
an article about the RXML parser. I'll try to be as verbose as
possible, at least in the beginning, so that those of you whose
Pike programming experience is limited to our Pike
Tutorial can follow.
A Tag Module Outline
Before we can begin to experiment with different tag programming techniques,
we need a tag module to put our code in.
Open a new file in your server's module search path, e.g. in local/modules, and
enter the following.
|
|
| |
#include <module.h>
inherit "module";
constant module_type = MODULE_TAG;
constant module_name = "Experimental Tag Module";
constant module_doc = "A module dedicated to experiments on tag programming.";
|
|
|
|
The first two lines are needed in all modules and define the module framework
as well as some helper functions. The three constants are simply identifiers: module_type
identifies this module as a tag module, module_name assigns a human-readable name to
the module, and module_doc adds a short description. The internal identifier of the
module is, however, its filename, so it may not have the same name as another module,
even if that module is in another directory.
When you have done this you should be able to add this module to any virtual server in
your Roxen WebServer, as with any other module. A good way to develop tag modules is to
add them to a server and reload the module between every modification that you want to try out.
Simpletags
Let's say we want a tag that does something trivial, like uppercasing the text within it.
To get one, we add the following to the file.
| |
string simpletag_uppercase(string tagname, mapping(string:string) args,
                           string content, RequestID id) {
  return upper_case(content);
}
|
|
|
|
Now reload the module and "<uppercase></>" will appear in the list of tags defined
in the module. The API is pretty straightforward. Any function whose name begins with "simpletag_"
will be registered as a tag, using the rest of the function name, with "_" replaced by "-",
as the tag name. To keep a function that begins with "simpletag_" from being registered as a tag,
declare it as static or private.
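The naming rule can be sketched in plain Pike, outside the server. This is only an illustration of the rule as described above, not Roxen's actual registration code; the helper name tag_name_from_function is made up for the example.

```pike
// Hypothetical helper that mirrors the naming rule described in the
// text. It is not part of Roxen.
string tag_name_from_function(string function_name)
{
  string prefix = "simpletag_";
  // Functions without the prefix are not simpletags at all.
  if (function_name[..sizeof(prefix) - 1] != prefix)
    return 0;
  // Drop the prefix and turn every "_" in the rest into "-".
  return replace(function_name[sizeof(prefix)..], "_", "-");
}

int main()
{
  write(sprintf("%O\n", tag_name_from_function("simpletag_uppercase")));
  write(sprintf("%O\n", tag_name_from_function("simpletag_argument_test")));
  return 0;
}
```

So a function called simpletag_argument_test ends up as the tag argument-test, which is the tag used in the next example.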
The argument tagname will contain the name of the tag,
"uppercase" in this case. The mapping args will contain all
the attributes given to the RXML tag. The string content will
contain the contents of the tag. To take a closer look at these variables you
can try the following code and, for example, compare the results of
<argument-test></argument-test> and
<argument-test/>.
| |
string simpletag_argument_test(string tagname, mapping(string:string) args,
                               string content, RequestID id) {
  return sprintf("<pre>Tagname %s\nArguments %O\nContent %s</pre>", tagname, args, content);
}
|
|
|
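Once you have seen what the args mapping looks like, using an attribute is a plain mapping lookup. The following is a minimal sketch to add to the test module; the tag name greeting and its attribute name are invented for this example.

```pike
// Sketch only: "greeting" and its "name" attribute are made up here.
string simpletag_greeting(string tagname, mapping(string:string) args,
                          string content, RequestID id) {
  // Indexing a mapping with a missing key gives zero, so || supplies
  // a default when the attribute was left out.
  string name = args->name || "world";
  return "Hello, " + name + "!";
}
```

With this in place, <greeting name="Roxen"/> should insert a greeting to Roxen, while <greeting/> falls back to the default.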
Newstyle Tags
The Roxen WebServer has three different kinds of tags. First there are the old tags, which is what was
used in 1.3 and before. The old tag interface is very similar to the next type of tag API, the simpletag API.
Finally there is what we call the newstyle tag API, which really is what this article is all about.
The anatomy of a generic newstyle tag is two classes, as follows.
| |
class TagExampleTag {
  inherit RXML.Tag;
  constant name = "exampletag";
  class Frame {
    inherit RXML.Frame;
  }
}
|
|
|
|
A generic newstyle tag thus consists of two classes, one within the other, as shown above. The outer class is named Tag*,
where * can be any name you choose. This triggers magic similar to that of simpletags, but unlike with simpletags
the name of the class does not affect the name of the resulting tag. The inner class is always called Frame.
One can view the outer class as a tag declaration, while the inner class is a tag instance in a page. The parser
only clones one tag object, but it can clone several frame objects if there are
several concurrent requests to the server. Hence, to be thread safe, all variables that are private
to each tag instance in the page should be in the Frame class. Note that the Frame object might not be destroyed
between the parsing of different tags, so all variables should be explicitly initialized if needed.
E.g. if the object-global variable x in the following frame class has to be zero, you cannot trust that
it is when you begin to use the object, as you otherwise can in Pike, since the object may be a recycled one.
| |
class Frame {
  inherit RXML.Frame;
  int x;
  array do_return(RequestID id) {
    x=0;  // Set explicitly; a recycled frame may still hold an old value.
    return 0;
  }
}
|
|
|
|
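The effect of recycling can be shown with plain Pike objects. The class below is only a stand-in invented for this illustration; it does not inherit RXML.Frame and says nothing about how the parser actually reuses frames.

```pike
// Stand-in for a frame class. It only demonstrates that an object
// that is used twice keeps its variables between the uses.
class RecycledFrame {
  int counter;  // Zero in a freshly cloned object only.

  // Trusts that counter starts at zero.
  int use_without_reset() { return ++counter; }

  // Initializes counter itself, so reuse is harmless.
  int use_with_reset() { counter = 0; return ++counter; }
}

int main()
{
  RecycledFrame frame = RecycledFrame();
  frame->use_without_reset();                          // First use: 1.
  write(sprintf("%d\n", frame->use_without_reset()));  // Second use: 2.
  write(sprintf("%d\n", frame->use_with_reset()));     // Always 1.
  return 0;
}
```

The second call to use_without_reset sees the value left behind by the first, which is exactly the kind of surprise an explicit initialization avoids.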
The following example does exactly the same as the simpletag uppercase in the previous section. Note
that the constant name defines the name of the tag, not the characters after Tag in the class name. Perhaps it might
seem weird to use this much code to accomplish the same job as the simpletag version. Well...
| |
class TagUpperCase {
  inherit RXML.Tag;
  constant name = "uppercase";
  class Frame {
    inherit RXML.Frame;
    array do_return(RequestID id) {
      result = upper_case(content);
      return 0;
    }
  }
}
|
|
|
RXML.Tag
The outer tag class, as I mentioned above, defines static and global information about the tag.
We have already seen the name constant, but there is more. Apart from the name constant, the flags
field is the most used "extra feature" of the Tag class. With it you can adjust several aspects of
the parser behavior when parsing the tag. In the following list the most common and useful flags
are shown. To see the complete list, take a look at the inline documentation in the RXML parser,
found in server/etc/modules/RXML.pmod/module.pmod.
| |
Constants for the bit field RXML.Tag.flags:

FLAG_DEBUG
  Write a lot of debug output during the execution of the tag, showing what
  type conversions are done, what callbacks are being called etc.
  Note that the DEBUG define must be defined for the debug printouts to be
  compiled in (normally enabled with the --debug flag to Roxen).

FLAG_PROC_INSTR
  Flags this as a processing instruction tag (i.e. one parsed with
  the <?name ... ?> syntax in XML). The string from the tag name to
  the ending separator constitutes the content of the tag. Arguments
  are not used. The real benefit of using processing instructions is that
  all character combinations except for "?>" are allowed inside them. Since
  "<", ">" and "&" aren't uncommon in programming languages such
  as Pike, this tag type is well suited for <?pike ?>,
  <?perl ?> and similar.

FLAG_EMPTY_ELEMENT
  If set, the tag does not use any content. E.g. with an HTML parser
  this defines whether the tag is a container or not, and in XML
  parsing the parser will signal an error if the tag has anything
  but "" as content.

FLAG_SOCKET_TAG
  Declare the tag to be a socket tag, which accepts plugin tags. This will
  be described in a coming article.

FLAG_DONT_RECOVER
  If set, RXML errors are never recovered when parsing the content
  in the tag. If any occurs, it will instead abort the execution of
  this tag too, to propagate the error to the parent tag.
  When an error occurs, the parser aborts tags upward in the frame
  stack until it comes to one which looks like it can accept an
  error report in its content. The parser then reports the error
  there and continues.
|
|
|
|
So, with the following modification
the uppercase tag becomes a processing instruction tag instead. That is,
if you add this code to the test module, <?uppercase foobar ?>
will render "FOOBAR".
| |
class TagUpperCase {
  inherit RXML.Tag;
  constant name = "uppercase";
  constant flags = RXML.FLAG_PROC_INSTR;
  class Frame {
    inherit RXML.Frame;
    array do_return(RequestID id) {
      result = upper_case(content);
      return 0;
    }
  }
}
|
|
|
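The other flags in the list are set in the same way. As a minimal sketch, here is a tag that declares itself an empty element; the tag name server-time is invented for this example, and the body only uses the standard Pike functions time and ctime.

```pike
// Sketch only: the tag name "server-time" is made up for this example.
class TagServerTime {
  inherit RXML.Tag;
  constant name = "server-time";
  // The tag uses no content. Since flags is a bit field, several
  // flags can be combined with the | operator.
  constant flags = RXML.FLAG_EMPTY_ELEMENT;

  class Frame {
    inherit RXML.Frame;
    array do_return(RequestID id) {
      // ctime() ends its result with a newline, which is removed here.
      result = ctime(time()) - "\n";
      return 0;
    }
  }
}
```

Written as <server-time/> in a page, this should insert the current time of the server as a plain string.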
do_stuff
Now, let's have a closer look at the inner class, the Frame class. The frame
object is where all the real action is. In this introduction we will only look
at two simple but important methods, do_enter and do_return. do_enter is called
when the parser is moving into the tag, and hence there isn't any content available
yet, and do_return is called when the end tag is found. If the tag is an
empty element tag both methods will still be called.
Both methods get the RequestID object as their argument and both are expected to return
either an array or zero. The array would contain either text that should be added to
the result or new frames that should be parsed. The preferred way to return
data is, however, to assign it to the result variable. The arguments
are available in the mapping args and the content is available in the
variable content. These variables are part of the frame class that you
inherit into your own, so it's not as magical as it may look. I'll demonstrate all of this with a simple example.
It is once again the <uppercase> tag, but now it sets the variable form.foo
to the value given in the attribute bar. E.g. the RXML <uppercase bar="roxen">
&form.foo;</uppercase> would return ROXEN,
leaving the value roxen in the variable form.foo:
| |
class TagUpperCase {
  inherit RXML.Tag;
  constant name = "uppercase";
  class Frame {
    inherit RXML.Frame;
    array do_enter(RequestID id) {
      // Variable name and scope are given as separate arguments.
      RXML.set_var("foo", args->bar, "form");
      return 0;
    }
    array do_return(RequestID id) {
      result = upper_case(content);
      return 0;
    }
  }
}
|
|
|
|
A more elegant solution would be to save the old form.foo value in do_enter and
restore it in do_return, instead of leaving
garbage (roxen) in the form.foo variable:
| |
class TagUpperCase {
  inherit RXML.Tag;
  constant name = "uppercase";
  class Frame {
    inherit RXML.Frame;
    mixed old_foo;
    array do_enter(RequestID id) {
      // Remember the current value before overwriting it.
      old_foo = RXML.get_var("foo", "form");
      RXML.set_var("foo", args->bar, "form");
      return 0;
    }
    array do_return(RequestID id) {
      // Put the saved value back.
      RXML.set_var("foo", old_foo, "form");
      result = upper_case(content);
      return 0;
    }
  }
}
|
|
|
|
When dealing with variable names that are decided by the user, which typically is the case,
you can use RXML.user_get_var and RXML.user_set_var instead.
I'll illustrate how they are used with the following short example, which implements a simple
set tag. You can use both the <myset variable="form.foo" value="bar"/> and
the <myset variable="foo" scope="form" value="bar"/> notation.
RXML.parse_error and RXML.run_error are used to throw RXML errors
from a tag. These are described on a more abstract level in a previous article.
| |
string simpletag_myset(string tagname, mapping(string:string) args,
                       string content, RequestID id) {
  if(!args->variable || !args->value)
    RXML.parse_error("Missing attribute variable or value.");
  RXML.user_set_var(args->variable, args->value, args->scope);
  return "";  // The tag itself produces no output.
}
|
|
|
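A matching tag for reading a variable could look like the sketch below. The tag name myget is invented for this example, and the code assumes that RXML.user_get_var takes the variable name and an optional scope in the same order as RXML.user_set_var does above; check the inline documentation of the RXML parser before relying on that.

```pike
// Sketch only: "myget" is a made-up tag, and the argument order of
// RXML.user_get_var is assumed to mirror RXML.user_set_var.
string simpletag_myget(string tagname, mapping(string:string) args,
                       string content, RequestID id) {
  if(!args->variable)
    RXML.parse_error("Missing attribute variable.");
  mixed value = RXML.user_get_var(args->variable, args->scope);
  // Only string values are inserted; anything else, including an
  // undefined variable, gives an empty result.
  return stringp(value) ? value : "";
}
```

Under these assumptions, <myset variable="form.foo" value="bar"/> followed by <myget variable="form.foo"/> should insert bar.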
More tag registration
As you've seen, the newstyle tag system gives the tag programmer many
new and powerful possibilities. Almost everything that I have told you
so far about newstyle tags is, however, possible to do with simpletags too.
By declaring an integer variable with the same name as your tag function, with "_flags"
appended, you can control the flags of the generated tag.
| |
int simpletag_uppercase_flags = RXML.FLAG_PROC_INSTR;
string simpletag_uppercase(string tagname, mapping(string:string) args,
                           string content, RequestID id) {
  return upper_case(content);
}
|
|
|
|
Incidentally, there is already a shortcut for the above construction called
simple_pi_tag, which is the same thing as a simpletag, but with the FLAG_PROC_INSTR
flag set.
| |
string simple_pi_tag_uppercase(string tagname, mapping(string:string) args,
                               string content, RequestID id) {
  return upper_case(content);
}
|
|
|
Wrap up, sort of
In this article you'll find very little or nothing concerning RXML variables,
scope creation, RXML types, sockets and plugins, iterating tags, streaming tags,
query_tag_callers, parse order manipulation and internal tags. It is hard to
find a part that is small enough to be covered in an article and that doesn't
have too many dependencies on other parts of the system. I hope I succeeded in
making this introduction understandable for almost everybody. We'll see how the follow-up
is going to turn out...