| Getting started |
This documentation shows how to get started using the Chaperon parser within your own Java code. This example can also be found under src/java/net/sourceforge/chaperon/common/SimpleParser.java. The Chaperon components use the Logger from the Apache Avalon project. This logger is optional, but it gives you plenty of information.
// Create logger
Logger logger = new ConsoleLogger(ConsoleLogger.LEVEL_WARN);
Next, you need an XML parser to read the lexicon and grammar files.
// Create factory for SAX parser
SAXParserFactory parserFactoryImpl = SAXParserFactory.newInstance();
parserFactoryImpl.setNamespaceAware(true);
// Get a SAX parser
XMLReader xmlparser = parserFactoryImpl.newSAXParser().getXMLReader();
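To see what the namespace-aware setting changes, here is a minimal sketch that uses only JDK classes. The class name `NamespaceAwareDemo`, the helper `elementNames`, and the namespace URI are made up for illustration and are not part of Chaperon. With namespace awareness enabled, the parser reports the namespace URI and local name of each element to the content handler.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class NamespaceAwareDemo {

    // Parses the XML text and returns "{namespaceURI}localName" for each start tag.
    static List<String> elementNames(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        final List<String> names = new ArrayList<>();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                names.add("{" + uri + "}" + localName);
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return names;
    }

    public static void main(String[] args) throws Exception {
        // The namespace URI is a placeholder, not the real Chaperon namespace.
        String xml = "<lexicon xmlns=\"urn:example:lexicon\"><lexeme/></lexicon>";
        System.out.println(elementNames(xml));
    }
}
```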
To read the lexicon file and create a lexicon model, you can use the Configuration classes of the Apache Avalon project. There are also other ways to read these files.
// Create a lexicon model for a given lexicon file
NamespacedSAXConfigurationHandler handler = new NamespacedSAXConfigurationHandler();
xmlparser.setContentHandler(handler);
xmlparser.parse(lexiconFile.toString());
Configuration lexiconconfig = handler.getConfiguration();
Lexicon lexicon = LexiconFactory.createLexicon(lexiconconfig);
You use the LexicalAutomatonBuilder to compile the lexicon model into an automaton, which compresses the information about the lexicon.
// Build an automaton from the lexicon model
LexicalAutomaton lexicalautomaton = (new LexicalAutomatonBuilder(lexicon,
                                     logger)).getLexicalAutomaton();
The next thing you need is the processor, which executes the logic encoded in the automaton. In this case, the LexicalProcessor analyses the text stream and cuts it into pieces (tokens).
// Create a processor for the lexicon
LexicalProcessor lexer = new LexicalProcessor();
lexer.setLogger(logger);
lexer.setLexicalAutomaton(lexicalautomaton);
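To illustrate what "cutting a text stream into tokens" means, here is a small sketch based on `java.util.regex`. It is not how Chaperon's automaton works internally. The token classes (`number`, `id`, `op`) and the class name `TokenSketch` are hypothetical and chosen only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenSketch {

    // Hypothetical token classes for a small arithmetic language.
    private static final Pattern TOKEN = Pattern.compile(
        "(?<number>[0-9]+)|(?<id>[A-Za-z]+)|(?<op>[-+*/()])|(?<ws>\\s+)");

    // Splits the text into "class:text" tokens; whitespace is skipped.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        int pos = 0;
        while (pos < text.length()) {
            // Require a token to start exactly at the current position.
            m.region(pos, text.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException(
                    "Unexpected character at offset " + pos);
            }
            if (m.group("number") != null) {
                tokens.add("number:" + m.group());
            } else if (m.group("id") != null) {
                tokens.add("id:" + m.group());
            } else if (m.group("op") != null) {
                tokens.add("op:" + m.group());
            }
            pos = m.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("3 + 4*x"));
    }
}
```

In Chaperon the token classes come from the lexicon file instead of being hard-coded, and the resulting tokens are passed on to the next handler rather than collected in a list.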
You create an automaton and a processor for the grammar in a similar way.
// Get a SAX parser
xmlparser = parserFactoryImpl.newSAXParser().getXMLReader();
// Create a grammar model for a given grammar file
handler = new NamespacedSAXConfigurationHandler();
xmlparser.setContentHandler(handler);
xmlparser.parse(grammarFile.toString());
Configuration grammarconfig = handler.getConfiguration();
Grammar grammar = GrammarFactory.createGrammar(grammarconfig);
// Build an automaton from the grammar model
ParserAutomaton parserautomaton = (new ParserAutomatonBuilder(grammar,
                                   logger)).getParserAutomaton();
// Create a processor for the grammar
ParserProcessor parser = new ParserProcessor();
parser.setLogger(logger);
parser.setParserAutomaton(parserautomaton);
The ParserHandlerAdapter creates a SAX stream, which can be used to write the output into an XML file. Other output formats are also possible. For example, there is an adapter that creates an AST (abstract syntax tree).
// Create an adapter to produce a SAX stream as result
ParserHandlerAdapter parseradapter = new ParserHandlerAdapter();
You need a serializer to write the SAX output to a file.
// Create factory for SAX transformer
SAXTransformerFactory transformerFactoryImpl = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
// Create serializer to write the SAX stream into a file
TransformerHandler serializer = transformerFactoryImpl.newTransformerHandler();
serializer.setResult(new StreamResult(outFile));
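The serializer can be tried on its own, without the rest of the pipeline. The following sketch uses only JDK classes and feeds a few SAX events into a TransformerHandler by hand, writing to a string instead of a file. The class name `SerializerSketch` and the element name are made up for illustration. In the real pipeline, the ParserHandlerAdapter produces these events.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

class SerializerSketch {

    // Serializes a single element with text content and returns the XML text.
    static String serialize(String elementName, String text) throws Exception {
        SAXTransformerFactory factory =
            (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler serializer = factory.newTransformerHandler();

        // Set output properties before the result is attached.
        serializer.getTransformer()
                  .setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        serializer.setResult(new StreamResult(out));

        // Hand-written SAX events standing in for the parser adapter's output.
        serializer.startDocument();
        serializer.startElement("", elementName, elementName, new AttributesImpl());
        char[] chars = text.toCharArray();
        serializer.characters(chars, 0, chars.length);
        serializer.endElement("", elementName, elementName);
        serializer.endDocument();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize("output", "1+2"));
    }
}
```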
Now you connect these components.
// Connect components into a pipeline
lexer.setLexicalHandler(parser);
parser.setParserHandler(parseradapter);
parseradapter.setContentHandler(serializer);
The next part looks complex, but the only thing that complicates it is the use of the locator. The locator is a class that holds the current position in the input file. For example, if an error occurs, this class helps to locate it.
// Push text into this pipeline
// Create a locator, which helps to find possible syntax errors
TextLocator locator = new TextLocator();
locator.setURI(inFile.toURL().toString());
locator.setLineNumber(1);
locator.setColumnNumber(1);
lexer.handleLocator(locator);
// Start document
lexer.handleStartDocument();
LineNumberReader reader = new LineNumberReader(new InputStreamReader(new FileInputStream(inFile)));
String line, newline = null;
String separator = System.getProperty("line.separator");
// Push text
while (true)
{
  if (newline==null)
    line = reader.readLine();
  else
    line = newline;
  if (line==null)
    break;
  newline = reader.readLine();
  line = (newline!=null) ? line+separator : line;
  locator.setLineNumber(reader.getLineNumber());
  locator.setColumnNumber(1);
  lexer.handleText(line);
  if (newline==null)
    break;
}
reader.close();
// End document
lexer.handleEndDocument();
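The loop above reads one line ahead so that every line except the last is pushed with a trailing line separator. The following sketch isolates that read-ahead logic using only JDK classes. It collects the chunks in a list instead of calling the lexer, and it leaves out the locator updates. The class name `LineChunker` and the helper `chunks` are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class LineChunker {

    // Returns the text chunks in the order they would be pushed:
    // each line keeps a trailing separator, except the last one.
    static List<String> chunks(Reader input, String separator) throws IOException {
        List<String> result = new ArrayList<>();
        BufferedReader reader = new BufferedReader(input);
        String line = reader.readLine();
        while (line != null) {
            String next = reader.readLine(); // look ahead by one line
            result.add(next != null ? line + separator : line);
            line = next;
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        for (String chunk : chunks(new StringReader("1+2\n3*4"), "\n")) {
            // Make the separator visible when printing.
            System.out.println("[" + chunk.replace("\n", "\\n") + "]");
        }
    }
}
```

Note that with this scheme a line separator at the very end of the input is not passed on, because `readLine()` does not report it as an additional line.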
Finally, you can test this class with the following command.
java -cp build/chaperon/chaperon.jar:lib/core/avalon-framework-4.1.3.jar:tools/centipede/lib/xalan.jar \
  net.sourceforge.chaperon.common.SimpleParser src/resources/grammars/mathexp.xlex \
  src/resources/grammars/mathexp.xgrm src/resources/examples/mathexp.txt output.xml
A last hint: the output document should contain a namespace. For some reason, some Xalan versions do not emit the namespace declaration, including the version that comes with Sun's JDK 1.4. |