Using Apache Cocoon
The Chaperon project contains generators and transformers for the Apache Cocoon project. They enable Cocoon to read and transform text documents. The project provides three main components: the TextGenerator, the LexicalTransformer and the ParserTransformer.

TextGenerator

The TextGenerator creates a SAX stream from a text file. It simply reads the text file and wraps its content in an XML element. To use the generator, add its declaration to the sitemap, as in the following example.

    <map:generators>
     [...]
     <map:generator name="text"
                    src="org.apache.cocoon.generation.TextGenerator"
                    logger="sitemap.generator.textgenerator"/>
     [...]
    </map:generators>

The generator can then be used in a pipeline to produce the SAX stream.

    <map:match pattern="*.xml">
     <map:generate type="text" src="example/{1}.txt"/>
     <map:serialize type="xml"/>
    </map:match>

The generator produces the following output:

    <text xmlns="http://chaperon.sourceforge.net/schema/text/1.0">
     My text is not a text, if a text is a ....
    </text>

LexicalTransformer

The LexicalTransformer takes these text elements and splits the text into lexemes. To use this transformer, add the following declaration to the sitemap.

    <map:transformers>
     [...]
     <map:transformer name="lexer"
                      src="org.apache.cocoon.transformation.LexicalTransformer"
                      logger="sitemap.transformer.lexicaltransformer"/>
     [...]
    </map:transformers>

    <map:pipelines>
     <map:pipeline>
      <map:match pattern="*.xml">
       <map:generate type="text" src="example/{1}.txt"/>
       <map:transform type="lexer" src="lexicon.xlex"/>
       <map:serialize type="xml"/>
      </map:match>
     </map:pipeline>
    </map:pipelines>

The output of the transformer has the following structure.

    <lexemes xmlns="http://chaperon.sourceforge.net/schema/lexemes/1.0">
     <lexeme symbol="word" text="My"/>
     <lexeme symbol="word" text="text"/>
     <lexeme symbol="word" text="is"/>
    </lexemes>

Optionally, a parameter can be used to specify the encoding of the text.
    <map:transformer name="lexer"
                     src="org.apache.cocoon.transformation.LexicalTransformer"
                     logger="sitemap.transformer.lexicaltransformer">
     <map:parameter name="encoding" value="ISO-8859-1"/>
    </map:transformer>

The following parameters can be used.
ParserTransformer

The ParserTransformer uses these lexemes to build the syntax tree.

Warning! With large grammars the transformer can take minutes to start up. It needs this time to build the parser automaton, which happens only once, so that later parsing is as fast as possible.
    <map:transformers>
     [...]
     <map:transformer name="parser"
                      src="org.apache.cocoon.transformation.ParserTransformer"
                      logger="sitemap.transformer.parsertransformer"/>
     [...]
    </map:transformers>

    <map:pipelines>
     <map:pipeline>
      <map:match pattern="*.xml">
       <map:generate type="text" src="example/{1}.txt"/>
       <map:transform type="lexer" src="lexicon.xlex"/>
       <map:transform type="parser" src="grammar.xgrm"/>
       <map:serialize type="xml"/>
      </map:match>
     </map:pipeline>
    </map:pipelines>

The output of the transformer has the following structure.

    <sentence xmlns="http://chaperon.sourceforge.net/schema/syntaxtree/1.0">
     <preposition>
      <word>My</word>
     </preposition>
     <subject>
      <word>text</word>
     </subject>
     <verb>
      <word>is</word>
     </verb>
     [...]
    </sentence>

An additional parameter, 'flatten', reduces the depth of the produced XML hierarchy. It collapses nested elements that have the same symbol.

    <map:transformer name="parser"
                     src="net.sourceforge.chaperon.adapter.cocoon.ParserTransformer"
                     logger="sitemap.transformer.parsertransformer">
     <map:parameter name="flatten" value="true"/>
    </map:transformer>

The following parameters can be used.
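
The fragments on this page can be combined into a single sitemap. The following is a minimal sketch, not a tested configuration. It assumes the Cocoon 2.x sitemap namespace, assumes that the "xml" serializer and the default matcher are inherited from a parent sitemap, and uses the org.apache.cocoon.transformation class names from the first examples; the 'flatten' example above shows a net.sourceforge.chaperon.adapter.cocoon class name instead, so use whichever your Chaperon release actually ships. The encoding value is likewise only an illustration.

```xml
<?xml version="1.0"?>
<!-- Sketch only: assembles the declarations shown on this page. -->
<map:sitemap xmlns:map="http://apache.org/cocoon/sitemap/1.0">

 <map:components>
  <map:generators>
   <!-- Wraps the content of a text file in a <text> element -->
   <map:generator name="text"
                  src="org.apache.cocoon.generation.TextGenerator"
                  logger="sitemap.generator.textgenerator"/>
  </map:generators>

  <map:transformers>
   <!-- Splits the <text> content into <lexeme> elements -->
   <map:transformer name="lexer"
                    src="org.apache.cocoon.transformation.LexicalTransformer"
                    logger="sitemap.transformer.lexicaltransformer">
    <!-- Assumption: the input files are ISO-8859-1 encoded -->
    <map:parameter name="encoding" value="ISO-8859-1"/>
   </map:transformer>

   <!-- Builds the syntax tree from the lexemes -->
   <map:transformer name="parser"
                    src="org.apache.cocoon.transformation.ParserTransformer"
                    logger="sitemap.transformer.parsertransformer">
    <map:parameter name="flatten" value="true"/>
   </map:transformer>
  </map:transformers>

  <!-- Assumption: serializers and matchers come from the parent sitemap -->
 </map:components>

 <map:pipelines>
  <map:pipeline>
   <!-- text file -> <text> -> <lexemes> -> syntax tree -> XML -->
   <map:match pattern="*.xml">
    <map:generate type="text" src="example/{1}.txt"/>
    <map:transform type="lexer" src="lexicon.xlex"/>
    <map:transform type="parser" src="grammar.xgrm"/>
    <map:serialize type="xml"/>
   </map:match>
  </map:pipeline>
 </map:pipelines>

</map:sitemap>
```

With this sketch, a request for foo.xml would read example/foo.txt, run it through the lexer and the parser in that order, and serialize the resulting syntax tree as XML. The order matters: the parser consumes the lexemes that the lexer produces.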