Getting started
This document shows how to get started with the Chaperon parser in your own Java code. The complete example can also be found in src/java/net/sourceforge/chaperon/common/SimpleParser.java.

The Chaperon components use the Logger from the Apache Avalon project. The logger is optional, but it gives you plenty of useful information.

    // Create logger
    Logger logger = new ConsoleLogger(ConsoleLogger.LEVEL_WARN);

Next, you need an XML parser to read the lexicon and grammar files.

    // Create factory for SAX parser
    SAXParserFactory parserFactoryImpl = SAXParserFactory.newInstance();
    parserFactoryImpl.setNamespaceAware(true);

    // Get a SAX parser
    XMLReader xmlparser = parserFactoryImpl.newSAXParser().getXMLReader();

To read the lexicon file and create a lexicon model, you can use the Configuration classes of the Apache Avalon project, but there are also other ways to read these files.

    // Create a lexicon model for a given lexicon file
    NamespacedSAXConfigurationHandler handler = new NamespacedSAXConfigurationHandler();
    xmlparser.setContentHandler(handler);
    xmlparser.parse(lexiconFile.toString());
    Configuration lexiconconfig = handler.getConfiguration();
    Lexicon lexicon = LexiconFactory.createLexicon(lexiconconfig);

Use the LexicalAutomatonBuilder to compile the lexicon model into an automaton, a compact representation of the lexicon.

    // Build an automaton from the lexicon model
    LexicalAutomaton lexicalautomaton =
        (new LexicalAutomatonBuilder(lexicon, logger)).getLexicalAutomaton();

Next, you need a processor, which executes the logic contained in the automaton. Here, the LexicalProcessor analyses the text stream and cuts it into pieces (tokens).

    // Create a processor for the lexicon
    LexicalProcessor lexer = new LexicalProcessor();
    lexer.setLogger(logger);
    lexer.setLexicalAutomaton(lexicalautomaton);

The automaton and the processor for the grammar are created in the same way.
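To make "cutting the text into tokens" a little more concrete, here is a toy tokenizer for a small arithmetic expression. It is only an illustration of the idea: it uses java.util.regex instead of a compiled lexical automaton, it is not how Chaperon's LexicalProcessor works internally, and the token names (number, operator, paren) are hypothetical rather than taken from mathexp.xlex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Toy illustration of a lexer: cuts an input string into tokens.
 * NOT Chaperon's LexicalProcessor; the token types below are hypothetical.
 */
class ToyTokenizer {

    // One named group per hypothetical token type; whitespace is matched but skipped.
    private static final Pattern TOKEN = Pattern.compile(
        "(?<number>\\d+(?:\\.\\d+)?)|(?<operator>[+\\-*/])|(?<paren>[()])|(?<space>\\s+)");

    // Token types that are reported; "space" is deliberately missing.
    private static final String[] TYPES = {"number", "operator", "paren"};

    /** Returns the tokens as "type:text" strings, or throws on an unexpected character. */
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            // Restrict matching to the rest of the input and require a match at its start.
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException(
                    "Unexpected character '" + input.charAt(pos) + "' at column " + (pos + 1));
            }
            for (String type : TYPES) {
                if (m.group(type) != null) {
                    tokens.add(type + ":" + m.group(type));
                }
            }
            pos = m.end(); // end() is an absolute index into the input
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("3 + 4*(2-1)"));
    }
}
```

Every alternative in the pattern matches at least one character, so the loop always advances; an input such as "3 $ 4" stops with an error that names the offending column.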
    // Get a SAX parser
    xmlparser = parserFactoryImpl.newSAXParser().getXMLReader();

    // Create a grammar model for a given grammar file
    handler = new NamespacedSAXConfigurationHandler();
    xmlparser.setContentHandler(handler);
    xmlparser.parse(grammarFile.toString());
    Configuration grammarconfig = handler.getConfiguration();
    Grammar grammar = GrammarFactory.createGrammar(grammarconfig);

    // Build an automaton from the grammar model
    ParserAutomaton parserautomaton =
        (new ParserAutomatonBuilder(grammar, logger)).getParserAutomaton();

    // Create a processor for the grammar
    ParserProcessor parser = new ParserProcessor();
    parser.setLogger(logger);
    parser.setParserAutomaton(parserautomaton);

The ParserHandlerAdapter creates a SAX stream, which can be used to write the output into an XML file. Other output formats are also possible; for example, there is an adapter that creates an AST (Abstract Syntax Tree).

    // Create an adapter to produce a SAX stream as result
    ParserHandlerAdapter parseradapter = new ParserHandlerAdapter();

You need a serializer to write the SAX output to a file.

    // Create factory for SAX transformer
    SAXTransformerFactory transformerFactoryImpl =
        (SAXTransformerFactory) SAXTransformerFactory.newInstance();

    // Create serializer to write the SAX stream into a file
    TransformerHandler serializer = transformerFactoryImpl.newTransformerHandler();
    serializer.setResult(new StreamResult(outFile));

Now connect these components.

    // Connect components into a pipeline
    lexer.setLexicalHandler(parser);
    parser.setParserHandler(parseradapter);
    parseradapter.setContentHandler(serializer);

The next part may look complex, but the only thing that makes it so is the use of the locator. The locator is a class that holds the current position in the input file; if an error occurs, for example, it helps to find where the error is.
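The pushing loop in the next part relies on a one-line lookahead: it reads the following line before pushing the current one, so it knows whether to append a line separator, and it sets the locator's line number before each push. The sketch below isolates just that logic in plain Java. TextSink is a hypothetical interface standing in for "update the locator, then call lexer.handleText()"; it is not part of Chaperon.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/** Sketch of the line-by-line push loop; hypothetical, not Chaperon code. */
class LinePusher {

    /** Hypothetical stand-in for "set the locator's line number, then call lexer.handleText()". */
    interface TextSink {
        void handleText(int lineNumber, String text);
    }

    /**
     * Pushes the input line by line. Every line except the last gets the
     * separator appended, which is decided by reading one line ahead.
     */
    static void push(Reader in, String separator, TextSink sink) throws IOException {
        try (BufferedReader reader = new BufferedReader(in)) {
            int lineNumber = 0;
            String line = reader.readLine();
            while (line != null) {
                lineNumber++;                       // 1-based number of the line about to be pushed
                String next = reader.readLine();    // lookahead: is there another line?
                sink.handleText(lineNumber, next != null ? line + separator : line);
                line = next;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        push(new StringReader("1+2\n3*4\n"), System.lineSeparator(),
             (lineNumber, text) -> System.out.print("line " + lineNumber + ": " + text));
        System.out.println();
    }
}
```

For the input "1+2\n3*4\n" this pushes "1+2" plus a separator as line 1 and "3*4" without one as line 2, so the chunks join back into the input lines (a trailing line terminator is not reproduced). Counting lines explicitly keeps the reported number tied to the line being pushed rather than to the lookahead line.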
    // Push text into this pipeline

    // Create locator, which helps to find possible syntax errors
    TextLocator locator = new TextLocator();
    locator.setURI(inFile.toURI().toURL().toString());
    locator.setLineNumber(1);
    locator.setColumnNumber(1);
    lexer.handleLocator(locator);

    // Start document
    lexer.handleStartDocument();

    BufferedReader reader =
        new BufferedReader(new InputStreamReader(new FileInputStream(inFile)));
    String separator = System.getProperty("line.separator");

    // Push text line by line. Reading one line ahead tells us whether a
    // separator must be appended (every line except the last one gets one).
    int lineNumber = 0;
    String line = reader.readLine();
    while (line != null) {
      lineNumber++;
      String nextLine = reader.readLine();
      // Point the locator at the line being pushed, not at the lookahead line
      locator.setLineNumber(lineNumber);
      locator.setColumnNumber(1);
      lexer.handleText((nextLine != null) ? line + separator : line);
      line = nextLine;
    }
    reader.close();

    // End document
    lexer.handleEndDocument();

Finally, you can test this class with the following command:

    java -cp build/chaperon/chaperon.jar:lib/core/avalon-framework-4.1.3.jar:tools/centipede/lib/xalan.jar:tools/centipede/lib/xalan.jar net.sourceforge.chaperon.common.SimpleParser src/resources/grammars/mathexp.xlex src/resources/grammars/mathexp.xgrm src/resources/examples/mathexp.txt output.xml

A last hint: the output document should contain a namespace declaration. For some reason, some Xalan versions, including the one that ships with Sun's JDK 1.4, do not write this declaration.
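If the declaration is missing, one thing worth checking is whether the SAX stream reports the namespace through startPrefixMapping before the first element, since serializers generally rely on these events to write xmlns attributes. The sketch below feeds hand-written SAX events into the same kind of TransformerHandler serializer used above, using only JAXP classes. The namespace URI urn:example:hypothetical and the element names are placeholders, not what Chaperon produces, and whether this avoids the problem with the Xalan version in JDK 1.4 has not been verified.

```java
import java.io.StringWriter;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

/** Serializes a tiny namespaced SAX stream; the namespace and element names are placeholders. */
class NamespacedSerializer {

    // Placeholder namespace; NOT the namespace Chaperon actually uses.
    static final String NS = "urn:example:hypothetical";

    static String serialize() throws TransformerConfigurationException, SAXException {
        SAXTransformerFactory factory = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        TransformerHandler serializer = factory.newTransformerHandler(); // identity transform
        StringWriter out = new StringWriter();
        serializer.setResult(new StreamResult(out));

        AttributesImpl noAttributes = new AttributesImpl();
        serializer.startDocument();
        // Report the default namespace explicitly so the serializer can write xmlns="..."
        serializer.startPrefixMapping("", NS);
        serializer.startElement(NS, "output", "output", noAttributes);
        serializer.startElement(NS, "token", "token", noAttributes);
        char[] text = "42".toCharArray();
        serializer.characters(text, 0, text.length);
        serializer.endElement(NS, "token", "token");
        serializer.endElement(NS, "output", "output");
        serializer.endPrefixMapping("");
        serializer.endDocument();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize());
    }
}
```

On a current JDK the printed document declares xmlns="urn:example:hypothetical" on the output element. If a serializer still drops the declaration, comparing its output for this small stream is a quick way to tell whether the serializer or the upstream SAX events are at fault.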