
The soul of RXML

Author: Martin Nilsson <nilsson@roxen.com>
Last modified: 2001-09-16 1:37:44


This article will introduce you to the wonderful world of tag programming. Initially I was going to name this article "The heart of RXML", but I'll reserve that one for an article about the RXML parser. I'll try to be as verbose as possible, at least in the beginning, so that those of you whose Pike programming experience is limited to our Pike Tutorial can follow.

A Tag Module Outline

The first thing we need before we can begin to experiment with different tag programming techniques is a tag module to place our code in. Open a new file in your server's module search path, e.g. in local/modules, and enter the following.




#include <module.h>
inherit "module";

constant module_type = MODULE_TAG;
constant module_name = "Experimental Tag Module";
constant module_doc  = "A module dedicated to experiments on tag programming.";

The first two lines are needed in all modules and define the module framework as well as some helper functions. The three constants are simply identifiers. module_type identifies this module as a tag module, module_name assigns a human readable name to the module and module_doc adds a short description. The internal identifier of the module will however be its filename, so it may not have the same name as another module, even if it is in another directory.

When you have done this you should be able to add this module to any virtual server in your Roxen WebServer, as with any other module. A good way to develop tag modules is to add them to a server and reload the module between every modification that you want to try out.

Simpletags

Let's say we want a tag that does something trivial, like uppercasing the text within it. To get one, we add the following to the file.



string simpletag_uppercase(string tagname, mapping(string:string) args,
                           string content, RequestID id) {
  return upper_case(content);
}

Now reload the module and "<uppercase></>" will appear in the list of tags defined in the module. The API is pretty straightforward. Any function whose name begins with "simpletag_" will be registered as a tag, using the rest of the function name, with "_" replaced by "-", as the tag name. To keep a function that begins with "simpletag_" from being registered as a tag, declare it static or private.
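The naming rule can be mimicked in plain Pike. The following is only a sketch of the mapping from function name to tag name as described above. tag_name_from_function is a hypothetical helper invented for illustration. It is not part of the Roxen API, and Roxen's real registration code may well look different.

```pike
// Hypothetical helper (not a Roxen function): derives the tag name that,
// according to the text, a "simpletag_" function is registered under.
string tag_name_from_function(string function_name)
{
  string prefix = "simpletag_";
  // Functions without the prefix are not simpletags; return "" for them.
  if (!has_prefix(function_name, prefix)) return "";
  // Drop the prefix and replace "_" with "-" in what remains.
  return replace(function_name[sizeof(prefix)..], "_", "-");
}

int main()
{
  write("%s\n", tag_name_from_function("simpletag_uppercase"));
  write("%s\n", tag_name_from_function("simpletag_argument_test"));
  return 0;
}
```

Run as a standalone Pike script, this prints uppercase and argument-test, one per line.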

The argument tagname will contain the name of the tag, "uppercase" in this case. The mapping args will contain all the attributes given to the RXML tag. The string content will contain the contents of the tag. To look closer at these variables you can try the following code, e.g. compare the result between <argument-test></argument-test> and <argument-test/>.



string simpletag_argument_test(string tagname, mapping(string:string) args,
                               string content, RequestID id) {
  return sprintf("<pre>Tagname %s\nArguments %O\nContent %s</pre>", tagname, args, content);
}

Newstyle Tags

The Roxen WebServer has three different kinds of tags. First there are the old tags, which are what was used in 1.3 and before. The old tag interface is very similar to the second kind, the simpletag API. Finally there is what we call the newstyle tag API, which is really what this article is all about. The anatomy of a generic newstyle tag is two classes, as follows.



class TagExampleTag {
  inherit RXML.Tag;
  constant name = "exampletag";

  class Frame {
    inherit RXML.Frame;
  }
}

A generic newstyle tag thus consists of two classes, one within the other. The outer class is named Tag*, where * can be any name you choose. This triggers magic similar to that of simpletags, but unlike with simpletags, the name of the class does not affect the name of the resulting tag. The inner class is always called Frame. One can view the outer class as a tag declaration, while the inner class is a tag instance in a page. The parser only clones one tag object, but it can clone several frame objects if there are several concurrent requests to the server. Hence, to be thread safe, all variables that are private to each tag instance in the page should be in the Frame class.

Note that a Frame object might not be destroyed between the parsing of different tags, so all variables should be initialized explicitly if needed. E.g. if the variable x in the following frame class has to be zero, you cannot trust that it is when you begin to use the object, as you otherwise can in Pike, since the object may be a recycled one.



class Frame {
  inherit RXML.Frame;
  int x;

  array do_return(RequestID id) {
    x = 0;  // Reset explicitly; a recycled frame may still hold an old value.
    return 0;
  }
}
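The recycling problem can be demonstrated without Roxen. The sketch below uses a made-up class, StandInFrame, as a stand-in for a frame object that gets used twice. It does not inherit RXML.Frame and says nothing about when the parser actually recycles frames. It only shows why a value left over from the first use must be reset.

```pike
// Hypothetical stand-in for a recycled frame; not an RXML.Frame.
class StandInFrame {
  int x;  // Zero in a freshly cloned object only.

  // Trusts that x is zero, which fails on a reused object.
  int use_without_reset() { return ++x; }

  // Resets x first, so the result is the same on every use.
  int use_with_reset() { x = 0; return ++x; }
}

int main()
{
  StandInFrame frame = StandInFrame();
  write("%d\n", frame->use_without_reset());  // First use: 1
  write("%d\n", frame->use_without_reset());  // Reuse: 2, stale state
  write("%d\n", frame->use_with_reset());     // Reuse with reset: 1
  return 0;
}
```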

The following example does exactly the same as the simpletag uppercase in the previous section. Note that the constant name defines the name of the tag, not the characters after Tag. Perhaps it might seem weird to use this much code to accomplish the same job as the simpletag version. Well...



class TagUpperCase {
  inherit RXML.Tag;
  constant name = "uppercase";

  class Frame {
    inherit RXML.Frame;

    array do_return(RequestID id) {
      result = upper_case(content);
      return 0;
    }
  }
}

RXML.Tag

The outer tag class, as I mentioned above, defines static and global information about the tag. We have already seen the name constant, but there is more. Apart from the name constant, the flags field is the most used "extra feature" of the Tag class. With it you can adjust several aspects of the parser behavior when parsing the tag. In the following list the most common and useful flags are shown. To see the complete list, take a look at the inline documentation in the RXML parser, found in server/etc/modules/RXML.pmod/module.pmod.



Constants for the bit field RXML.Tag.flags:

FLAG_DEBUG
Write a lot of debug output during the execution of the tag, showing
what type conversions are done, what callbacks are being called etc.
Note that the DEBUG define must be defined for the debug printouts to be
compiled in (normally enabled with the --debug flag to Roxen).

FLAG_PROC_INSTR
Flags this as a processing instruction tag (i.e. one parsed with
the <?name ... ?> syntax in XML). The string after the tag name to
the ending separator constitutes the content of the tag. Arguments
are not used. The real benefit of using processing instructions is that
all character combinations except for "?>" are allowed inside them. Since
"<", ">" and "&" aren't uncommon in programming languages such
as Pike, this tag type is well suited for <?pike ?>,
<?perl ?> and similar.

FLAG_EMPTY_ELEMENT
If set, the tag does not use any content. E.g. with an HTML parser
this defines whether the tag is a container or not, and in XML
parsing the parser will signal an error if the tag has anything
but "" as content.
   
FLAG_SOCKET_TAG
Declare the tag to be a socket tag, which accepts plugin tags. This will
be described in a coming article.

FLAG_DONT_RECOVER
If set, RXML errors are never recovered when parsing the content
in the tag. If any occurs, it will instead abort the execution of
this tag too to propagate the error to the parent tag.

When an error occurs, the parser aborts tags upward in the frame
stack until it comes to one which looks like it can accept an
error report in its content. The parser then reports the error
there and continues.

So, with the following modifications the uppercase tag will be a processing instruction tag instead. That is, if you add this code to the test module, <?uppercase foobar ?> will render "FOOBAR".



class TagUpperCase {
  inherit RXML.Tag;
  constant name = "uppercase";
  constant flags = RXML.FLAG_PROC_INSTR;

  class Frame {
    inherit RXML.Frame;

    array do_return(RequestID id) {
      result = upper_case(content);
      return 0;
    }
  }
}
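Because flags is a bit field, it seems reasonable to expect that several of the constants above can be set at once by OR-ing them together. The following is a sketch under that assumption, not code taken from the Roxen distribution. The tag name hello and the class name TagHello are made up for illustration, and the fragment only makes sense inside a tag module such as the experimental one.

```pike
// Sketch only: a hypothetical <hello/> tag for the experimental module.
class TagHello {
  inherit RXML.Tag;
  constant name = "hello";
  // Assumption: flags from the list above combine with bitwise OR,
  // as is usual for bit fields.
  constant flags = RXML.FLAG_EMPTY_ELEMENT | RXML.FLAG_DEBUG;

  class Frame {
    inherit RXML.Frame;

    array do_return(RequestID id) {
      // The tag takes no content, so the result is a fixed string.
      result = "Hello world";
      return 0;
    }
  }
}
```

With this in the module, <hello/> should render the fixed string, and with the DEBUG define enabled the parser should also write debug output while executing the tag.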

do_stuff

Now, let's have a closer look at the inner class, the Frame class. The frame object is where all the real action is. In this introduction we will only look at two simple but important methods, do_enter and do_return. do_enter is called when the parser is moving into the tag, and hence there isn't any content available to parse, and do_return is called when the end tag is found. If the tag is an empty element tag both methods will still be called.

Both methods get the RequestID object as argument, and both are expected to return either an array or zero. The array would contain either text that should be added to the result or new frames that should be parsed. The preferred way to return data is however to assign it to the result variable. The arguments are available in the mapping args, and the content is available in the variable content. These variables are part of the frame class that you inherit into your own, so it's not as magical as it may look. I'll demonstrate all of this with a simple example. It is once again the <uppercase> tag, but now it sets the variable form.foo to the value given as attribute bar. E.g. the RXML <uppercase bar="roxen"> &form.foo;</uppercase> would return ROXEN, leaving the value roxen in the variable form.foo:



class TagUpperCase {
  inherit RXML.Tag;
  constant name = "uppercase";

  class Frame {
    inherit RXML.Frame;

    array do_enter(RequestID id) {
      // Variable name and scope are given separately, as in the next example.
      RXML.set_var("foo", args->bar, "form");
      return 0;
    }

    array do_return(RequestID id) {
      result = upper_case(content);
      return 0;
    }
  }
}

A more elegant solution would be to save the old form.foo value and restore it in do_return, instead of leaving garbage (roxen) in the form.foo variable:



class TagUpperCase {
  inherit RXML.Tag;
  constant name = "uppercase";

  class Frame {
    inherit RXML.Frame;
    mixed old_foo;  // Value of form.foo before the tag was entered.

    array do_enter(RequestID id) {
      old_foo = get_var("foo", "form");
      set_var("foo", args->bar, "form");
      return 0;
    }

    array do_return(RequestID id) {
      // Restore the saved value before producing the result.
      set_var("foo", old_foo, "form");
      result = upper_case(content);
      return 0;
    }
  }
}

When the variable name is decided by the user, which is typically the case, you can use RXML.user_get_var and RXML.user_set_var instead. I'll illustrate how they are used with the following short example that implements a simple set tag. You can use both the <myset variable="form.foo" value="bar"/> and the <myset variable="foo" scope="form" value="bar"/> notation. RXML.parse_error and RXML.run_error are used to throw RXML errors from a tag. These are described on a more abstract level in a previous article.



string simpletag_myset(string tagname, mapping(string:string) args,
                       string content, RequestID id) {
  if(!args->variable || !args->value)
    RXML.parse_error("Missing attribute variable or value.");
  RXML.user_set_var(args->variable, args->value, args->scope);
  return "";  // The tag itself produces no output.
}
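The reading counterpart, RXML.user_get_var, is mentioned above but not shown. The sketch below is a hypothetical <myget> tag that mirrors <myset>. It assumes that user_get_var takes the variable name followed by an optional scope, in the same way user_set_var does in the example above. Check the inline documentation in the RXML parser before relying on that signature.

```pike
// Sketch of a hypothetical <myget> tag, the counterpart of <myset>.
string simpletag_myget(string tagname, mapping(string:string) args,
                       string content, RequestID id) {
  if(!args->variable)
    RXML.parse_error("Missing attribute variable.");
  // Assumption: same argument order as user_set_var, minus the value.
  mixed value = RXML.user_get_var(args->variable, args->scope);
  // Only string values are returned in this sketch; anything else,
  // including an unset variable, gives an empty result.
  return stringp(value) ? value : "";
}
```

Under these assumptions, <myget variable="form.foo"/> and <myget variable="foo" scope="form"/> would both insert the value of form.foo.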

More tag registration

As you've seen, the newstyle tag system gives the tag programmer many new and powerful possibilities. Almost everything that I have told you so far about newstyle tags is however possible to do with simpletags. By declaring an integer variable with the same name as your tag function, with "_flags" appended, you are able to control the flags of the generated tag.



int simpletag_uppercase_flags = RXML.FLAG_PROC_INSTR;
string simpletag_uppercase(string tagname, mapping(string:string) args,
                           string content, RequestID id) {
  return upper_case(content);
}

Incidentally, there is already a shortcut for the above construction called simple_pi_tag, which is the same thing as a simpletag, but with the PROC_INSTR flag set.



string simple_pi_tag_uppercase(string tagname, mapping(string:string) args,
                               string content, RequestID id) {
  return upper_case(content);
}

Wrap up, sortof

In this article you'll find very little or nothing concerning RXML variables, scope creation, RXML types, sockets and plugins, iterating tags, streaming tags, query_tag_callers, parse order manipulation and internal tags. It is hard to find a part that is small enough to be covered in an article and that doesn't have too many dependencies on other parts of the system. I hope I succeeded in making this introduction understandable for almost everybody. We'll see how the follow-up is going to turn out...