Convert grm files to xgrm/xlex files

Since the parser processes only the XML format of the grammars, text grammar files must first be converted to the XML form. There are two ways to convert these grammars: the first uses a combination of ANT tasks, and the second uses Cocoon components. Both do the same work and differ mainly in how efficient they are.

What makes the process more complicated than it might otherwise be is that the text grammar files contain context-dependent tokens, in this case regular expressions. Regular expressions have a different syntax from the rest of the grammar file, so the conversion must be split into two parsing stages, which gives the following steps:

  • Parse the text grammar files.
  • Mark the Regex expressions for the second parsing process.
  • Parse the Regex expressions.
  • Convert this output into xlex files.
  • Convert this output into xgrm files.

The conversion is split into separate steps, rather than written as a single tool, so that you can use your own grammar format. If you want your own format, you only need to change the 'text grammar' grammar and the stylesheets.

Using ANT to convert these files

The conversion process with ANT works with temporary files, which can be removed once the process has finished. In this example these files are called *.tmp1, *.tmp2 and *.tmp3.

First, the parser task must be registered.

   <taskdef name="chaperon"
            classname="net.sourceforge.chaperon.adapter.ant.ParserTask">
    <classpath refid="myclasspath"/>
   </taskdef>
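
The taskdef above refers to a classpath with the id "myclasspath", which has to be declared earlier in the build file. A minimal sketch, assuming the Chaperon jar and its dependencies sit in a hypothetical lib directory:

```xml
<!-- Hypothetical location: point "lib" at the directory that
     holds the Chaperon jar and the libraries it depends on -->
<path id="myclasspath">
 <fileset dir="lib">
  <include name="*.jar"/>
 </fileset>
</path>
```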

In the next step, the chaperon task is used to parse the complete text grammar files with grm.xlex/grm.xgrm, and the output is written to *.tmp1 files.

   <chaperon srcdir="${resource.dir}/grammars"
             destdir="${build.tmp}"
             lexicon="${resource.dir}/grammars/grm.xlex"
             flatten="true"
             msglevel="WARN"
             grammar="${resource.dir}/grammars/grm.xgrm">
    <include name="*.grm"/>
    <mapper type="glob" from="*.grm" to="*.tmp1"/>
   </chaperon>

Next, the regular expressions must be marked. This is done with an XSLT stylesheet and the <xslt> task.

   <xslt basedir="${build.tmp}"
         destdir="${build.tmp}"
         extension=".tmp2"
         style="${resource.dir}/stylesheets/text4regex.xsl">
    <outputproperty name="encoding" value="iso8859_1"/>
    <include name="*.tmp1"/>
   </xslt>

In this step, the marked parts are parsed with regex.xlex/regex.xgrm.

   <chaperon srcdir="${build.tmp}"
             destdir="${build.tmp}"
             lexicon="${resource.dir}/grammars/regex.xlex"
             flatten="true"
             msglevel="WARN"
             inputtype="xml"
             grammar="${resource.dir}/grammars/regex.xgrm">
    <include name="*.tmp2"/>
    <mapper type="glob" from="*.tmp2" to="*.tmp3"/>
   </chaperon>

In the last step, the output is transformed into the destination formats *.xlex and *.xgrm.

   <xslt basedir="${build.tmp}"
         destdir="${build.grammars}"
         extension=".xgrm"
         style="${resource.dir}/stylesheets/grm2xgrm.xsl">
    <outputproperty name="encoding" value="iso8859_1"/>
    <include name="*.tmp3"/>
   </xslt>

   <xslt basedir="${build.tmp}"
         destdir="${build.grammars}"
         extension=".xlex"
         style="${resource.dir}/stylesheets/grm2xlex.xsl">
    <outputproperty name="encoding" value="iso8859_1"/>
    <include name="*.tmp3"/>
   </xslt>
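
Once the *.xlex and *.xgrm files have been written, the intermediate files are no longer needed. A minimal sketch of the cleanup using ANT's standard <delete> task, assuming the same ${build.tmp} directory as in the steps above:

```xml
<!-- Remove the intermediate files left behind by the conversion -->
<delete>
 <fileset dir="${build.tmp}" includes="*.tmp1,*.tmp2,*.tmp3"/>
</delete>
```

Leaving this step out during development can be useful, since the intermediate files show what each stage produced.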

Some XSLT libraries do not handle namespaces correctly, for example older versions of Xalan or the package that was shipped with Sun's JDK 1.4. To work around this, you can specify an alternative transformer. For example, to use the XSLTC transformer of Xalan, specify transformer="org.apache.xalan.xsltc.trax.TransformerFactoryImpl" within the chaperon task. It is also sometimes important to declare a cache directory to improve performance, for example with cachedir="${build.tmp}". A working example of the conversion process can be found in the 'grammars' target of src/targets/project.xtarget.
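
As a sketch, the first parsing step with both attributes added might look like the following. The attribute names and values are the ones given in the paragraph above; the rest repeats the earlier example, and it is assumed that the XSLTC classes are available on the task's classpath.

```xml
<chaperon srcdir="${resource.dir}/grammars"
          destdir="${build.tmp}"
          lexicon="${resource.dir}/grammars/grm.xlex"
          grammar="${resource.dir}/grammars/grm.xgrm"
          flatten="true"
          msglevel="WARN"
          cachedir="${build.tmp}"
          transformer="org.apache.xalan.xsltc.trax.TransformerFactoryImpl">
 <!-- cachedir: directory for cached data, to improve performance -->
 <!-- transformer: alternative transformer factory (here XSLTC) -->
 <include name="*.grm"/>
 <mapper type="glob" from="*.grm" to="*.tmp1"/>
</chaperon>
```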

Using Cocoon to convert these files

Using Cocoon to convert these files on the fly is much simpler than using ANT. Two pipelines are used to convert the text grammar files into the XML grammars. The output is not written to disk; instead it is held in the cache. It is therefore important that the pipelines are completely cacheable, otherwise the grammars will be rebuilt every time they are used.

  <map:pipeline type="caching" internal-only="true">

   <map:match pattern="grammars/*.xlex">
    <map:generate  type="text"   src="grammars/{1}.grm"/>
    <map:transform type="lexer"  src="grammars/grm.xlex"/>
    <map:transform type="parser" src="grammars/grm.xgrm"/>
    <map:transform               src="stylesheets/text4regex.xsl"/>
    <map:transform type="lexer"  src="grammars/regex.xlex"/>
    <map:transform type="parser" src="grammars/regex.xgrm"/>
    <map:transform               src="stylesheets/grm2xlex.xsl"/>
    <map:serialize type="xml"/>
   </map:match>

   <map:match pattern="grammars/*.xgrm">
    <map:generate  type="text"   src="grammars/{1}.grm"/>
    <map:transform type="lexer"  src="grammars/grm.xlex"/>
    <map:transform type="parser" src="grammars/grm.xgrm"/>
    <map:transform               src="stylesheets/text4regex.xsl"/>
    <map:transform type="lexer"  src="grammars/regex.xlex"/>
    <map:transform type="parser" src="grammars/regex.xgrm"/>
    <map:transform               src="stylesheets/grm2xgrm.xsl"/>
    <map:serialize type="xml"/>
   </map:match>

  </map:pipeline>

To use the converted grammars, refer to them through the Cocoon protocol 'cocoon:/'.

   <map:match pattern="wiki/*.html">
    <map:generate  type="text"    src="wiki/{1}.txt"/>
    <map:transform type="lexer"   src="cocoon:/grammars/wiki.xlex"/>
    <map:transform type="parser"  src="cocoon:/grammars/wiki.xgrm"/>
    <map:transform                src="wiki2html.xsl" label="xdoc"/>
    <map:serialize type="html"/>
   </map:match>

For small grammars it is good practice to let Cocoon convert these files. For larger grammars it is better to convert them outside of Cocoon, because otherwise they must be converted almost every time you restart the web application.
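
When the grammars are converted outside of Cocoon, the pipeline can read the resulting files directly instead of going through 'cocoon:/'. A minimal sketch, assuming the generated wiki.xlex and wiki.xgrm have been copied into a grammars directory next to the sitemap:

```xml
<map:match pattern="wiki/*.html">
 <map:generate  type="text"    src="wiki/{1}.txt"/>
 <!-- Assumed location: pre-built files read from disk,
      so no conversion happens inside Cocoon -->
 <map:transform type="lexer"   src="grammars/wiki.xlex"/>
 <map:transform type="parser"  src="grammars/wiki.xgrm"/>
 <map:transform                src="wiki2html.xsl" label="xdoc"/>
 <map:serialize type="html"/>
</map:match>
```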

by Stephan Michels