Using Apache Cocoon
The Chaperon project contains generators and transformers for the Apache Cocoon project, which enable Cocoon to read and transform text documents. The project provides three main components: the TextGenerator, the LexicalTransformer and the ParserTransformer.

TextGenerator

The TextGenerator creates a SAX stream from a text file. It simply reads the text file and wraps its content in an XML element. To use the generator, include its declaration in the sitemap, as in the following example.

    <map:generators>
      [...]
      <map:generator name="text"
                     src="org.apache.cocoon.generation.TextGenerator"
                     logger="sitemap.generator.textgenerator"/>
      [...]
    </map:generators>

It can then be used in a pipeline to generate the SAX stream.

    <map:match pattern="*.xml">
      <map:generate type="text" src="example/{1}.txt"/>
      <map:serialize type="xml"/>
    </map:match>

The generator produces the following output:

    <text xmlns="http://chaperon.sourceforge.net/schema/text/1.0">
      My text is not a text, if a text is a ....
    </text>

LexicalTransformer

The LexicalTransformer uses these text elements to split the text into lexemes. To use this transformer, add the following declaration to the sitemap and reference it in a pipeline.

    <map:transformers>
      [...]
      <map:transformer name="lexer"
                       src="org.apache.cocoon.transformation.LexicalTransformer"
                       logger="sitemap.transformer.lexicaltransformer"/>
      [...]
    </map:transformers>

    <map:pipelines>
      <map:pipeline>
        <map:match pattern="*.xml">
          <map:generate type="text" src="example/{1}.txt"/>
          <map:transform type="lexer" src="lexicon.xlex"/>
          <map:serialize type="xml"/>
        </map:match>
      </map:pipeline>
    </map:pipelines>

The output of the transformer has the following structure.

    <lexemes xmlns="http://chaperon.sourceforge.net/schema/lexemes/1.0">
      <lexeme symbol="word" text="My"/>
      <lexeme symbol="word" text="text"/>
      <lexeme symbol="word" text="is"/>
    </lexemes>

Optionally, you can use a parameter to specify the encoding to be used.

    <map:transformer name="lexer"
                     src="org.apache.cocoon.transformation.LexicalTransformer"
                     logger="sitemap.transformer.lexicaltransformer">
      <map:parameter name="encoding" value="ISO-8859-1"/>
    </map:transformer>

The following parameters can be used.

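
To make the lexeme output described above more concrete, here is a minimal Python sketch of what a lexical analysis step does conceptually. It is not Chaperon's implementation: the `LEXICON` table, its `word` and `punctuation` patterns, and the helper names `tokenize` and `to_lexemes_xml` are assumptions made for illustration. A real lexicon is defined in the `.xlex` file referenced by the sitemap.

```python
import re
from xml.sax.saxutils import quoteattr

LEXEMES_NS = "http://chaperon.sourceforge.net/schema/lexemes/1.0"

# Hypothetical lexicon: (symbol, pattern) pairs tried in order.
# A real Chaperon lexicon lives in an .xlex file instead.
LEXICON = [
    ("word", re.compile(r"[A-Za-z]+")),
    ("punctuation", re.compile(r"[.,;!?]")),
]


def tokenize(text):
    """Yield (symbol, matched_text) pairs, skipping whitespace."""
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        for symbol, pattern in LEXICON:
            match = pattern.match(text, pos)
            if match:
                yield symbol, match.group()
                pos = match.end()
                break
        else:
            # No lexicon entry matched the input at this position.
            raise ValueError(f"no lexeme matches at position {pos}")


def to_lexemes_xml(text):
    """Render the lexemes in the same shape as the sample output above."""
    lines = [f'<lexemes xmlns="{LEXEMES_NS}">']
    for symbol, value in tokenize(text):
        lines.append(
            f"  <lexeme symbol={quoteattr(symbol)} text={quoteattr(value)}/>"
        )
    lines.append("</lexemes>")
    return "\n".join(lines)


print(to_lexemes_xml("My text is"))
```

Running the sketch on the string `My text is` yields three `lexeme` elements with the symbol `word`, mirroring the structure of the transformer output shown earlier.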
ParserTransformer

The ParserTransformer uses these lexemes to build the syntax tree.

Warning! With large grammars, the transformer can take minutes to start up. It needs this time to build the parser automaton, which happens only once, so that parsing afterwards is as fast as possible.

Declare the transformer in the sitemap and use it in a pipeline after the lexer.

    <map:transformers>
      [...]
      <map:transformer name="parser"
                       src="org.apache.cocoon.transformation.ParserTransformer"
                       logger="sitemap.transformer.parsertransformer"/>
      [...]
    </map:transformers>

    <map:pipelines>
      <map:pipeline>
        <map:match pattern="*.xml">
          <map:generate type="text" src="example/{1}.txt"/>
          <map:transform type="lexer" src="lexicon.xlex"/>
          <map:transform type="parser" src="grammar.xgrm"/>
          <map:serialize type="xml"/>
        </map:match>
      </map:pipeline>
    </map:pipelines>

The output of the transformer has the following structure.

    <sentence xmlns="http://chaperon.sourceforge.net/schema/syntaxtree/1.0">
      <preposition>
        <word>My</word>
      </preposition>
      <subject>
        <word>text</word>
      </subject>
      <verb>
        <word>is</word>
      </verb>
      [...]
    </sentence>

An additional parameter is 'flatten', which reduces the depth of the produced XML hierarchy. It resolves nested elements that have the same symbol.

    <map:transformer name="parser"
                     src="net.sourceforge.chaperon.adapter.cocoon.ParserTransformer"
                     logger="sitemap.transformer.parsertransformer">
      <map:parameter name="flatten" value="true"/>
    </map:transformer>

The following parameters can be used.

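
The effect of 'flatten' can be pictured with a small Python sketch. It rests on one reading of the description above, namely that an element nested directly inside an element with the same symbol is replaced by its own children. This is an assumption, not Chaperon's actual code, and the `list` and `item` element names are hypothetical.

```python
import xml.etree.ElementTree as ET


def flatten(element):
    """Splice children that share their parent's tag into the parent.

    Assumes the spliced elements hold only child elements (no mixed text),
    as is typical for nonterminal nodes of a syntax tree.
    """
    new_children = []
    for child in list(element):
        flatten(child)  # flatten the subtree first
        if child.tag == element.tag:
            # Same symbol as the parent: keep the grandchildren, drop the wrapper.
            new_children.extend(list(child))
        else:
            new_children.append(child)
    # Replace the old child list with the flattened one.
    for child in list(element):
        element.remove(child)
    element.extend(new_children)


# A left-recursive rule such as "list := list item | item" tends to
# produce a deeply nested tree like this one.
nested = ET.fromstring(
    "<list><list><list><item>a</item></list><item>b</item></list>"
    "<item>c</item></list>"
)
flatten(nested)
print(ET.tostring(nested, encoding="unicode"))
# prints: <list><item>a</item><item>b</item><item>c</item></list>
```

Under this reading, the three levels of `list` collapse into one, which is the kind of depth reduction the parameter description suggests.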