Using Apache Cocoon
The Chaperon project contains a generator and two transformers for the Apache Cocoon project. They enable Cocoon to read and transform text documents. The project provides three main components: the TextGenerator, the LexicalTransformer and the ParserTransformer.

TextGenerator

The TextGenerator creates a SAX stream from a text file. It simply reads the text file and wraps its content in an XML element. To use the generator, you must add the generator declaration to the sitemap, as in the following example.
<map:generators>
[...]
<map:generator name="text"
src="org.apache.cocoon.generation.TextGenerator"
logger="sitemap.generator.textgenerator"/>
[...]
</map:generators>
The generator can then be used in a pipeline to produce the SAX stream.
<map:match pattern="*.xml">
<map:generate type="text" src="example/{1}.txt"/>
<map:serialize type="xml"/>
</map:match>
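For orientation, the declaration and the pipeline fragment could sit together in one sitemap roughly as sketched here. This is a minimal sketch, assuming a Cocoon 2.1-style sub-sitemap that inherits the default matcher and a serializer named `xml` from its parent sitemap; neither assumption comes from the Chaperon documentation itself.

```xml
<?xml version="1.0"?>
<map:sitemap xmlns:map="http://apache.org/cocoon/sitemap/1.0">

  <map:components>
    <!-- Declares the generator under the name "text" -->
    <map:generators>
      <map:generator name="text"
                     src="org.apache.cocoon.generation.TextGenerator"
                     logger="sitemap.generator.textgenerator"/>
    </map:generators>
    <!-- Assumption: matchers and the "xml" serializer are inherited
         from the parent sitemap and therefore not declared here -->
  </map:components>

  <map:pipelines>
    <map:pipeline>
      <!-- A request for foo.xml reads example/foo.txt -->
      <map:match pattern="*.xml">
        <map:generate type="text" src="example/{1}.txt"/>
        <map:serialize type="xml"/>
      </map:match>
    </map:pipeline>
  </map:pipelines>

</map:sitemap>
```

The `type` attribute of `map:generate` refers to the `name` given in the declaration, so the two must match.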
The generator produces the following output:
<text xmlns="http://chaperon.sourceforge.net/schema/text/1.0">
My text is not a text, if a text is a ....
</text>

LexicalTransformer

The LexicalTransformer uses these text elements to split the text into lexemes. To use this transformer, you must add the following declaration to the sitemap.
<map:transformers>
[...]
<map:transformer name="lexer"
src="org.apache.cocoon.transformation.LexicalTransformer"
logger="sitemap.transformer.lexicaltransformer"/>
[...]
</map:transformers>
<map:pipelines>
<map:pipeline>
<map:match pattern="*.xml">
<map:generate type="text" src="example/{1}.txt"/>
<map:transform type="lexer" src="lexicon.xlex"/>
<map:serialize type="xml"/>
</map:match>
</map:pipeline>
</map:pipelines>
The output of the transformer has the following structure.
<lexemes xmlns="http://chaperon.sourceforge.net/schema/lexemes/1.0">
<lexeme symbol="word" text="My"/>
<lexeme symbol="word" text="text"/>
<lexeme symbol="word" text="is"/>
</lexemes>
Optionally, you can use a parameter to specify the encoding that should be used.
<map:transformer name="lexer"
src="org.apache.cocoon.transformation.LexicalTransformer"
logger="sitemap.transformer.lexicaltransformer">
<map:parameter name="encoding" value="ISO-8859-1"/>
</map:transformer>
The following parameters can be used.

ParserTransformer

The ParserTransformer uses these lexemes to build the syntax tree.

Warning! With large grammars, the transformer can take minutes to start up. It needs this time to build the parser automaton once, so that later parsing is as fast as possible.
<map:transformers>
[...]
<map:transformer name="parser"
src="org.apache.cocoon.transformation.ParserTransformer"
logger="sitemap.transformer.parsertransformer"/>
[...]
</map:transformers>
<map:pipelines>
<map:pipeline>
<map:match pattern="*.xml">
<map:generate type="text" src="example/{1}.txt"/>
<map:transform type="lexer" src="lexicon.xlex"/>
<map:transform type="parser" src="grammar.xgrm"/>
<map:serialize type="xml"/>
</map:match>
</map:pipeline>
</map:pipelines>
The output of the transformer has the following structure.
<sentence xmlns="http://chaperon.sourceforge.net/schema/syntaxtree/1.0">
<preposition>
<word>My</word>
</preposition>
<subject>
<word>text</word>
</subject>
<verb>
<word>is</word>
</verb>
[...]
</sentence>
An additional parameter, 'flatten', reduces the depth of the produced XML hierarchy. It collapses nested elements that have the same symbol.
<map:transformer name="parser"
src="org.apache.cocoon.transformation.ParserTransformer"
logger="sitemap.transformer.parsertransformer">
<map:parameter name="flatten" value="true"/>
</map:transformer>
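To illustrate what 'flatten' is described as doing, consider a recursive grammar rule that nests an element inside another element with the same symbol. The element names `list` and `item` are hypothetical, and both fragments are hand-written sketches rather than captured transformer output; the actual result depends on the grammar.

```xml
<!-- Sketch: without flatten, each recursion step of the
     hypothetical "list" rule adds one nesting level -->
<list xmlns="http://chaperon.sourceforge.net/schema/syntaxtree/1.0">
  <list>
    <list>
      <item>a</item>
    </list>
    <item>b</item>
  </list>
  <item>c</item>
</list>

<!-- Sketch: with flatten set to "true", nested elements that share
     the same symbol are assumed to collapse into one level -->
<list xmlns="http://chaperon.sourceforge.net/schema/syntaxtree/1.0">
  <item>a</item>
  <item>b</item>
  <item>c</item>
</list>
```

A flatter tree like the second one is usually easier to process in a later transformation step, because the stylesheet does not have to walk through the recursion levels.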
The following parameters can be used.