Using Jakarta Ant
The Chaperon project includes an Ant task. With this task you can parse a set of
text files and convert them to XML. You select the input files with the srcdir
attribute and include patterns. Using the given lexicon and grammar files, the text
files are lexically analysed and parsed into XML files. If you don't specify a grammar,
the input files are only lexically analysed. The task supports mappers to map the
input files to the output files.
<taskdef name="chaperon"
         classname="net.sourceforge.chaperon.adapter.ant.ParserTask"/>
<chaperon srcdir="src/examples/"
          destdir="build/"
          lexicon="src/grammars/test1.xlex"
          grammar="src/grammars/test1.xgrm">
  <include name="*.txt"/>
  <mapper type="glob" from="*.txt" to="*.xml"/>
</chaperon>
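When only lexical analysis is needed, the grammar attribute can be left out. A minimal sketch of that variant, reusing the paths from the example above:
<!-- No grammar attribute: the input files are only lexically analysed. -->
<chaperon srcdir="src/examples/"
          destdir="build/"
          lexicon="src/grammars/test1.xlex">
  <include name="*.txt"/>
  <mapper type="glob" from="*.txt" to="*.xml"/>
</chaperon>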
The following attributes can be used.
Attribute | Description | Required |
srcdir | Location of the input files. | yes |
destdir | Location to store the XML files. | yes |
cachedir | Location to store intermediate files, which improves performance. | no |
lexicon | Location of the lexicon file. | yes |
grammar | Location of the grammar file. If you don't specify this attribute, the files are only lexically analysed. | no |
includes | Comma-separated list of patterns of files that must be included; all files are included when omitted. | no |
excludes | Comma-separated list of patterns of files that must be excluded; no files (except default excludes) are excluded when omitted. | no |
indent | Whether the generated XML files should be indented. | no |
msglevel | Logging level (DEBUG / INFO / WARN / ERROR / FATAL). | no |
encoding | Encoding of the text input documents. | no |
inputtype | Whether the task consumes text files or XML files (text / xml). If you choose XML files, the task processes the text fragments marked with <text> elements. | no |
flatten | Whether the task should produce a flatter XML hierarchy, in which elements with the same name are collapsed. | no |
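As a rough illustration, several of the optional attributes could be combined as follows. The cache directory, the encoding name, and the value "true" for indent are assumptions chosen for this sketch; the table does not state which values indent accepts.
<!-- Sketch only: cachedir path, encoding name and indent="true" are assumed values. -->
<chaperon srcdir="src/examples/"
          destdir="build/"
          cachedir="build/cache/"
          lexicon="src/grammars/test1.xlex"
          grammar="src/grammars/test1.xgrm"
          includes="*.txt"
          indent="true"
          msglevel="INFO"
          encoding="ISO-8859-1">
  <mapper type="glob" from="*.txt" to="*.xml"/>
</chaperon>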
The following nested elements can be used.
Element | Attributes | Description | Required |
include | name | Pattern of files that must be included. | no |
exclude | name | Pattern of files that must be excluded. | no |
mapper | type, from, to | Maps the input files to the output files. | no |
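To show how the nested elements fit together, here is a sketch of a complete build file. The project name, the target name, and the exclude pattern are hypothetical, and the sketch assumes the Chaperon classes are already available on Ant's classpath.
<project name="chaperon-example" default="parse" basedir=".">
  <!-- Assumes the Chaperon classes are already on Ant's classpath. -->
  <taskdef name="chaperon"
           classname="net.sourceforge.chaperon.adapter.ant.ParserTask"/>
  <target name="parse">
    <chaperon srcdir="src/examples/"
              destdir="build/"
              lexicon="src/grammars/test1.xlex"
              grammar="src/grammars/test1.xgrm">
      <include name="*.txt"/>
      <!-- Hypothetical pattern: skip files whose names start with "draft-". -->
      <exclude name="draft-*.txt"/>
      <!-- Each matched .txt file is written as an .xml file of the same name. -->
      <mapper type="glob" from="*.txt" to="*.xml"/>
    </chaperon>
  </target>
</project>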
by Stephan Michels