
The groove of RXML

Author: Martin Nilsson <nilsson@roxen.com>
Last modified: 2001-09-16 1:37:44


(Or "Tags interacting" or "The liver of RXML" or "Together we are <strong>" or "Grown up tags doing things together") In my previous article, "The soul of RXML", I showed some of the basic RXML tag syntax from a module point of view, and showed a few ways to get the new tag system to mimic the old one. In this article I will begin to demonstrate why the new parser took half a year longer to develop than initially planned. Let's get serious.

Context

I somehow managed to write an entire article about how to program RXML tags while mentioning virtually nothing about the parser, but we won't get much further without some basic understanding of it. Much like the WebServer keeps all variables and objects that concern a specific request in the RequestID object, the parser keeps most of its state and variables in a context object. This object is then continually updated and queried during the parsing and evaluation of a page.

The difference between parsing and evaluation is not something that really matters on this level of programming, and the terms are already mixed up in RXML. There is one tag called <noparse> while the tag with the inverse function is called <eval>. I'll try to clarify the distinction with an example. When the parser finds the character "&" in an XML document it knows it has found the beginning of an entity. The process of finding this is called parsing. Then the parser finds the ending ";" character, and the characters "form.foo" in between. Still parsing. That string can be divided into the scope name "form" and the variable name "foo". Still parsing. That entity can however be replaced with the value found in the variable foo in the scope form. Evaluation.

It is during the evaluation that the RXML parser asks the context if there is a variable "foo" in the scope "form". If we make a tag that creates a new scope, it will be put into the context and removed when we exit the scope of our tag. If we create a new scope with the same name as an already existing scope, we will replace that scope with our own during the evaluation of our tag. The following simple RXML example demonstrates temporary addition of a variable scope.




<pre>
<insert scopes="plain"/>
<emit source="values" values="a">
  <insert scopes="plain"/>
</emit>
<insert scopes="plain"/>
</pre>

Which will yield something similar to the following:

client, cookie, form, page, roxen and var

  client, cookie, form, page, roxen, values and var

client, cookie, form, page, roxen and var

This exemplifies the temporary addition of a variable scope.

You can add scope="form" to the emit tag and verify, e.g. by using <insert variables="full" scope="form"/>, that overloading of scopes works.
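The bookkeeping described in this section can be sketched in plain Pike, without Roxen. This is only a toy model: ToyContext, enter_scope, leave_scope and scope_names are names invented for the illustration and are not the real RXML context API. It shows a scope being added and removed again, and an existing scope being overloaded and then restored.

```pike
// Toy model only: ToyContext and its methods are hypothetical and are
// not part of the real RXML context API.
class ToyContext {
  // Scope name -> (variable name -> value).
  mapping(string:mapping(string:mixed)) scopes =
    ([ "form": ([ "foo": "from the request" ]) ]);

  // Install a scope; return the scope it replaced, or 0 if there was none.
  mapping(string:mixed) enter_scope(string name, mapping(string:mixed) vars) {
    mapping(string:mixed) old = scopes[name];
    scopes[name] = vars;
    return old;
  }

  // Undo enter_scope(): restore the old scope or remove the name again.
  void leave_scope(string name, mapping(string:mixed) old) {
    if (old) scopes[name] = old;
    else m_delete(scopes, name);
  }

  // Alphabetical list of the scopes that are visible right now.
  string scope_names() {
    return sort(indices(scopes)) * ", ";
  }
}

int main() {
  ToyContext ctx = ToyContext();
  write("before: %s\n", ctx->scope_names());

  // A new scope name is added, then removed again.
  mapping old = ctx->enter_scope("values", ([ "value": "a" ]));
  write("inside: %s\n", ctx->scope_names());
  ctx->leave_scope("values", old);
  write("after:  %s\n", ctx->scope_names());

  // An existing scope name is overloaded, then restored.
  old = ctx->enter_scope("form", ([ "foo": "from the tag" ]));
  write("form.foo inside: %s\n", ctx->scopes->form->foo);
  ctx->leave_scope("form", old);
  write("form.foo after:  %s\n", ctx->scopes->form->foo);
  return 0;
}
```

The value returned by enter_scope is what makes overloading reversible: the caller hands it back on exit, so the toy context never has to remember who replaced what.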

Behind the magic

Let's have a look at the interface for creating new variable scopes from a tag module. We start with a dummy tag:



class TagDummy {
  inherit RXML.Tag;
  constant name = "dummy";

  class Frame {
    inherit RXML.Frame;

    array do_enter(RequestID id) {
    }
  }
}

Unsurprisingly, this tag doesn't do anything. Note that the dummy tag does not need the empty do_enter function in order to work. I have it there to make the next listing more instructive. In it I have added two variables, vars and scope_name, as well as code that initializes them.



class TagDummy {
  inherit RXML.Tag;
  constant name = "dummy";

  class Frame {
    inherit RXML.Frame;
    mapping(string:mixed) vars;
    string scope_name;

    array do_enter(RequestID id) {
      scope_name = args->scope || "dummy";
      vars = ([ "one":"eins", "two":"zwei" ]);
    }
  }
}

The string scope_name is of course the name of the scope, and the mapping vars is a mapping from the name of each variable to its content. In this case we have a static mapping that would make <dummy>&_.one; &dummy.two;</dummy> produce eins zwei. One might think that it should be ok to set the contents of the mapping vars only once, where it is declared. Remember from the previous article, however, that frames might be reused, and someone might have altered the values of the vars mapping, e.g. by using the <set> tag.
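The reuse problem is easy to reproduce outside Roxen. The following is a minimal sketch in plain Pike, assuming only that "reuse" means the same frame object is entered twice. InitOnceFrame, InitOnEnterFrame and second_use are hypothetical names made up for this illustration, and the toy frames do not inherit RXML.Frame.

```pike
// Toy frames, not RXML.Frame: they only model "one object, entered twice".
class InitOnceFrame {
  // Filled in a single time, when the object is created.
  mapping(string:string) vars = ([ "one": "eins", "two": "zwei" ]);
  void do_enter() { }
}

class InitOnEnterFrame {
  mapping(string:string) vars;
  // Rebuilt on every entry, as in the TagDummy listing.
  void do_enter() { vars = ([ "one": "eins", "two": "zwei" ]); }
}

// Enter the frame, let the "page" overwrite a variable, then enter the
// same object again and report what the second use sees.
string second_use(object frame) {
  frame->do_enter();
  frame->vars->one = "uno";
  frame->do_enter();
  return frame->vars->one;
}

int main() {
  write("init once:     %s\n", second_use(InitOnceFrame()));
  write("init on enter: %s\n", second_use(InitOnEnterFrame()));
  return 0;
}
```

The first frame still reports the altered value on its second use, while the second frame is back to eins.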

Loops

And now for something completely different. The new tag API provides a way to easily create iterations. The most basic way is to repeat the content a predefined number of times, which is accomplished by setting the integer do_iterate to the number of times the content should be iterated.



class TagLoop {
  inherit RXML.Tag;
  constant name = "loop";

  class Frame {
    inherit RXML.Frame;

    array do_enter(RequestID id) {
      do_iterate = (int)args->loops;
    }

    int do_iterate;
  }
}

Compare that implementation with the following:



class TagLoop {
  inherit RXML.Tag;
  constant name = "loop";

  class Frame {
    inherit RXML.Frame;

    array do_return(RequestID id) {
      result = content * (int)args->loops;
    }
  }
}

The latter implementation differs in functionality in two important ways. First, the content is not re-evaluated for each iteration, which can be both good and bad, depending on what you are trying to accomplish. Secondly, and less obviously, the second solution cannot stream. The new parser has support for streaming, although it is currently not activated. In the first case it can iterate a few times, send the intermediate result to the client, iterate a few more times, and so on, instead of processing and sending the whole batch in one big chunk. This can lead to a more responsive system, since you won't have to wait for the entire page to be parsed and evaluated before you can start receiving its contents and seeing them render in your browser.
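The first difference can be illustrated in plain Pike, without the parser. In this sketch evaluate_content is a hypothetical stand-in for "evaluating the tag content" that returns a different string each time it runs. The two helper functions are likewise invented for the illustration and only mimic the two loop tags.

```pike
int evaluations = 0;

// Hypothetical stand-in for evaluating the content of a tag. The result
// changes on every call so that re-evaluation becomes visible.
string evaluate_content() {
  evaluations++;
  return sprintf("[%d]", evaluations);
}

// Mimics the do_iterate variant: the content is evaluated once per loop.
string repeat_by_iteration(int loops) {
  string result = "";
  for (int i = 0; i < loops; i++)
    result += evaluate_content();
  return result;
}

// Mimics "content * loops": one evaluation, then plain string repetition.
string repeat_by_multiplication(int loops) {
  return evaluate_content() * loops;
}

int main() {
  evaluations = 0;
  write("iteration:      %s\n", repeat_by_iteration(3));      // [1][2][3]
  evaluations = 0;
  write("multiplication: %s\n", repeat_by_multiplication(3)); // [1][1][1]
  return 0;
}
```

If the content contains something that changes between evaluations, such as a counter, only the iterating variant will show it.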

Instead of having a fixed integer for the number of loops, one can use a function do_iterate() returning the number of loops that should be performed before it is called again. Returning zero means that the iterations are done. This is obviously more flexible, since we can now create conditional loops, e.g. while-loops, but we are also able to perform operations between each loop. This is how the <emit> tag works. The following small piece of code creates an emit-like tag that outputs parts of the multiplication table.



class TagMultab {
  inherit RXML.Tag;
  constant name = "multab";

  class Frame {
    inherit RXML.Frame;

    mapping(string:int) vars;
    string scope_name;
    int counter;

    array do_enter(RequestID id) {
      scope_name = args->scope_name || "multab";
      args->to = (int)args->to+2;
      counter = 0;
    }

    int do_iterate(RequestID id) {
      vars = ([ "one":counter,
                "two":counter*2,
                "three":counter*3,
                "four":counter*4,
                "five":counter*5,
             ]);
      counter++;
      return counter<args->to;
    }
  }
}
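To see which rows the counter logic above produces, it can help to drive it by hand. The following plain Pike sketch assumes the calling convention described earlier: do_iterate is called, the content is evaluated that many times, and this repeats until zero is returned. ToyMultab copies only the counter logic of the frame, and run is a hypothetical stand-in for the parser's loop, not the real implementation.

```pike
// Toy model: ToyMultab mirrors the counter logic of the multab frame,
// and run() is a hypothetical stand-in for the parser's loop.
class ToyMultab {
  mapping(string:int) vars;
  int counter = 0;
  int limit;

  // Corresponds to "args->to = (int)args->to+2" in do_enter.
  void create(int to) { limit = to + 2; }

  int do_iterate() {
    vars = ([ "one": counter, "two": counter * 2, "three": counter * 3 ]);
    counter++;
    return counter < limit;
  }
}

// Call do_iterate(), "evaluate the content" that many times, and repeat
// until do_iterate() returns zero.
array(string) run(ToyMultab frame) {
  array(string) output = ({});
  int n;
  while ((n = frame->do_iterate()) > 0)
    for (int i = 0; i < n; i++)
      output += ({ sprintf("%d %d %d", frame->vars->one,
                           frame->vars->two, frame->vars->three) });
  return output;
}

int main() {
  write("%s\n", run(ToyMultab(3)) * "\n");
  return 0;
}
```

With to set to 3 this toy loop yields the rows for 0 through 3. The final call to do_iterate prepares a row for 4 but returns zero, so that row is never evaluated.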

Plugins

Assuming that you know a decent amount of RXML, you know that there are several emit sources. Why choose to have several emit "sources" instead of just making more tags? To begin with, you would typically want to be able to perform a bunch of generic table operations on all these tags, such as limiting the number of output rows, skipping a certain number of preceding rows, sorting the rows etc. You could inherit a generic emit tag into all the tags, but it would probably end up having a method "get_data" or similar where the actual information retrieval occurred, leaving the do_enter method untouched. We would also run the risk that someone alters the do_enter method so that it operates slightly differently from the other emit tags, making RXML harder to use. One solution is to use the plugin architecture available in the new parser.

The plugin system provides two different roles for tags: socket tags and plugin tags. The socket tag is the tag that uses the plugins, so let us start by making a tag with socket support. First we must set the flag RXML.FLAG_SOCKET_TAG in the flags constant. Secondly we call get_plugins() to receive a mapping with the available plugins.



class TagDummy {
  inherit RXML.Tag;
  constant name = "dummy";
  constant flags = RXML.FLAG_SOCKET_TAG;

  class Frame {
    inherit RXML.Frame;

    array do_enter(RequestID id) {
      mapping(string:RXML.Tag) plugins = get_plugins();
    }
  }
}

After that we are on our own. Once we get hold of the plugin tag objects there is no defined API. It's up to the socket tag programmer to decide upon an API. In the emit case it is quite straightforward; all plugins have a method called get_dataset that returns an array of mappings. These mappings will then be placed as the vars mapping when the emit tag iterates over the array.

The plugin part is not really difficult either. Set the name to that of the tag it is a plugin for and add a constant plugin_name naming the plugin, i.e. the name that will be the tag object's index in the mapping returned by get_plugins. The multab tag turned into an emit plugin might look like this:



class TagEmitMulTab {
  inherit RXML.Tag;
  constant name = "emit";
  constant plugin_name = "multab";

  array(mapping(string:int)) get_dataset(mapping(string:string) args,
                                         RequestID id) {
    array response = ({});
    for(int i; i<(int)args->to+1; i++)
      response += ({ ([ "one":i,
                        "two":i*2,
                        "three":i*3,
                        "four":i*4,
                        "five":i*5,
      ]) });
    return response;
  }
}
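The division of labour between socket and plugin can be sketched in plain Pike. The sketch assumes a socket that looks a plugin up by name, asks it for its dataset and then applies the generic row operations itself. MultabPlugin, plugins and toy_emit are hypothetical names for this illustration, the plugins mapping merely stands in for what get_plugins() would return, and the skiprows and maxrows argument names are assumptions rather than a description of the real emit tag.

```pike
// Toy plugin: same data as the multab listing, but a plain class rather
// than an RXML.Tag.
class MultabPlugin {
  array(mapping(string:int)) get_dataset(mapping(string:string) args) {
    array(mapping(string:int)) response = ({});
    for (int i = 0; i < (int)args->to + 1; i++)
      response += ({ ([ "one": i, "two": i * 2 ]) });
    return response;
  }
}

// Stand-in for the mapping a socket tag would get from get_plugins().
mapping(string:object) plugins = ([ "multab": MultabPlugin() ]);

// Toy socket: fetch rows from the named plugin, then apply the generic
// operations in one place. "skiprows" and "maxrows" are assumed names.
array(mapping) toy_emit(mapping(string:string) args) {
  object plugin = plugins[args->source];
  if (!plugin) error("Unknown source %O\n", args->source);
  array(mapping) rows = plugin->get_dataset(args);
  int skip = (int)args->skiprows;
  if (skip > 0) rows = rows[skip..];
  int maxrows = (int)args->maxrows;
  if (maxrows > 0) rows = rows[..maxrows - 1];
  return rows;
}

int main() {
  array(mapping) rows = toy_emit(([ "source": "multab", "to": "5",
                                    "skiprows": "2", "maxrows": "3" ]));
  foreach (rows, mapping row)
    write("%d %d\n", row->one, row->two);
  return 0;
}
```

Because skipping and limiting live in the socket, every plugin gets them for free and none of them can implement them differently.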

Temporary tags

For this performance's last trick I have saved something special. This is one of the more useful features in the new parser and certainly the most useful in this article. Often you would like to have tags that are not globally defined, e.g. the <td> and <tr> tags are pretty useless outside <table> tags. Given the functional properties of RXML you would typically want to be able to place any kind of RXML inside your tag and see if it produces <td> and <tr> tags. Since you want all normal RXML to work as before, the new tags have to be added to the parser, instead of e.g. being parsed by Parser.HTML in the code of the top tag (the <table> tag).

Just as the context holds all the variables and controls variable overloading, it holds the tags and controls tag overloading. The collection of currently active tags is called a tagset. One can easily create a personal tagset like this:



  RXML.TagSet my_tagset = RXML.TagSet("The name of the tagset",
                                      ({ TagDummy(), TagEmitMulTab() }));

To add your tagset to the tagset in the context for the duration of the evaluation of the contents of your tag, just put the tagset in the additional_tags variable in the Frame class. To locally replace the tagset, use the variable local_tags instead. That is however seldom useful and is only used in one place in Roxen WebServer that I know of.



class TagTest {
  inherit RXML.Tag;
  constant name = "test";

  class TagReverse {
    inherit RXML.Tag;
    constant name = "reverse";

    class Frame {
      inherit RXML.Frame;

      array do_return(RequestID id) {
        result = reverse(content);
      }
    }
  }

  RXML.TagSet internal = RXML.TagSet("TagTest.internal",
                                     ({ TagReverse() }));

  class Frame {
    inherit RXML.Frame;
    RXML.TagSet additional_tags = internal;

    array do_return(RequestID id) {
      result = "START " + content + " STOP";
    }
  }
}

This kind of tag, if anything, shows that Martin Stjernholm likes object orientation. You don't have to put the internal tag classes inside the top tag class, but if you put them outside they must not be named Tag-something. You can call the tagset whatever you want (the name is really only used to help humans when debugging), but it is a convenient naming convention to take the tag class name and put ".internal" on the end.

Even more tag registration

I ended the first new-RXML-parser article with some extra information about tag registration, so why not do that again, to demonstrate that I'm not telling you even close to everything in these articles. Those of you who have programmed tag modules for Roxen Challenger 1.3 know that in order to register a tag you had to add a query_tag_callers method (and a query_container_callers method, since tags and containers were treated as different things back then), which returned a mapping(string:function) that mapped the tag name to its function. Though this was mostly trouble compared to today's interface (you usually forgot to add your new tags to the query_tag_callers mapping), it is of course powerful to let the module decide at load time which tags to register.

For newstyle tags this is done with the method query_tag_set, which should return a tagset, as discussed above. For simpletags the interface is in a way more complex than for newstyle tags. There you have the methods query_simpletag_callers and query_simple_pi_tag_callers, both returning a mapping from the tag names to an array where the first element is the flags associated with the tag and the second element is the function that plays the role of "do_return", although with different arguments. An example of a simpletag registration:



mapping(string:array(int|function)) query_simpletag_callers() {
  return ([ "my-emit" : ({ RXML.FLAG_SOCKET_TAG, my_emit_tag }),
            "another_tag" : ({ RXML.FLAG_NONE, another_tag }) ]);
}

Those of you who really want to know the inner workings of Roxen WebServer should look in server/base_server/module.pike, which is inherited by all modules, and see what the default query methods look like. You may also want to take a look inside server/base_server/rxml.pike, where the query methods are actually called, and of course in server/etc/modules/RXML.pmod/module.pmod, where the RXML parser lives. There is plenty of documentation in that file. I hope you all feel as exhausted reading this article as I felt writing it. Thank you for your time. Live long, and prosper.