Getting started

This document shows how to get started using the Chaperon parser within your own Java code. The example can also be found under src/java/net/sourceforge/chaperon/common/SimpleParser.java.

The Chaperon components use the Logger from the Apache Avalon project. The logger is optional, but it gives you plenty of useful information.

    // Create logger
    Logger logger = new ConsoleLogger(ConsoleLogger.LEVEL_WARN);

Next, you need an XML parser to read the lexicon and grammar files.

    // Create factory for SAX parser
    SAXParserFactory parserFactoryImpl = SAXParserFactory.newInstance();
    parserFactoryImpl.setNamespaceAware(true);
    
    // Get a SAX parser
    XMLReader xmlparser = parserFactoryImpl.newSAXParser().getXMLReader();
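
To see what the namespace-aware setting changes, here is a minimal, self-contained sketch that uses only the JDK. It parses a small XML string and records the namespace URI and local name reported for each element. The class name NamespaceAwareDemo, the namespace urn:example:lexicon and the element names are placeholders chosen for this illustration; they are not taken from the Chaperon sources.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

final class NamespaceAwareDemo {

    /** Parses the XML text and returns "{namespaceURI}localName" for every start tag. */
    static List<String> elementNames(String xml, boolean namespaceAware) {
        final List<String> names = new ArrayList<>();
        try {
            // Same setup as in the tutorial: factory first, then the XMLReader
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            XMLReader reader = factory.newSAXParser().getXMLReader();

            // Record what the parser reports for each element
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    names.add("{" + uri + "}" + localName);
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // Keep the sketch short: wrap parser and I/O errors
            throw new IllegalStateException("Could not parse XML", e);
        }
        return names;
    }

    public static void main(String[] args) {
        // Placeholder document; the namespace is hypothetical
        String xml = "<lexicon xmlns=\"urn:example:lexicon\"><lexeme/></lexicon>";
        System.out.println(elementNames(xml, true));
    }
}
```

With the namespace-aware flag set, each element is reported together with the namespace it belongs to, which is the information a namespace-based handler works with.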

To read the lexicon file and create a lexicon model, you can use the Configuration classes of the Apache Avalon project, but there are also other ways to read these files.

    // Create a lexicon model for a given lexicon file
    NamespacedSAXConfigurationHandler handler = new NamespacedSAXConfigurationHandler();
    xmlparser.setContentHandler(handler);
    xmlparser.parse(lexiconFile.toString());
    Configuration lexiconconfig = handler.getConfiguration();
    Lexicon lexicon = LexiconFactory.createLexicon(lexiconconfig);

You use the LexicalAutomatonBuilder to compile the lexicon model into an automaton, which compresses the information about the lexicon.

    // Build an automaton from the lexicon model
    LexicalAutomaton lexicalautomaton = (new LexicalAutomatonBuilder(lexicon,
            logger)).getLexicalAutomaton();

The next thing you need is the processor, which executes the logic contained in the automaton. In this case, the LexicalProcessor analyses the text stream and cuts it into pieces (tokens).

    // Create a processor for the lexicon
    LexicalProcessor lexer = new LexicalProcessor();
    lexer.setLogger(logger);
    lexer.setLexicalAutomaton(lexicalautomaton);
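
If you have not worked with a lexer before, the following toy example may help to picture what "cutting the text into tokens" means. It does not use Chaperon at all: it swaps in a plain java.util.regex pattern for the lexical automaton, and the class name ToyLexer and the token names are invented for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ToyLexer {

    // Each named group plays the role of one token definition in a lexicon
    private static final Pattern TOKEN = Pattern.compile(
        "(?<number>[0-9]+)|(?<operator>[-+*/])|(?<paren>[()])|(?<space>\\s+)");

    /** Cuts the text into "type:text" tokens; whitespace is recognized but dropped. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        int pos = 0;
        while (pos < text.length()) {
            // Try to match a token exactly at the current position
            m.region(pos, text.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException(
                    "Unexpected character at column " + (pos + 1));
            }
            if (m.group("number") != null) {
                tokens.add("number:" + m.group());
            } else if (m.group("operator") != null) {
                tokens.add("operator:" + m.group());
            } else if (m.group("paren") != null) {
                tokens.add("paren:" + m.group());
            }
            // Continue behind the token that was just recognized
            pos = m.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("3 * (4 + 5)"));
    }
}
```

The real LexicalProcessor does not return a list. It passes each token on to the handler you register with it, as shown later when the components are connected.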

You create an automaton and a processor for the grammar in a similar way.

    // Get a SAX parser
    xmlparser = parserFactoryImpl.newSAXParser().getXMLReader();
    
    // Create a grammar model for a given grammar file
    handler = new NamespacedSAXConfigurationHandler();
    xmlparser.setContentHandler(handler);
    xmlparser.parse(grammarFile.toString());
    Configuration grammarconfig = handler.getConfiguration();
    Grammar grammar = GrammarFactory.createGrammar(grammarconfig);
    
    // Build an automaton from the grammar model
    ParserAutomaton parserautomaton = (new ParserAutomatonBuilder(grammar,
              logger)).getParserAutomaton();

    // Create a processor for the grammar
    ParserProcessor parser = new ParserProcessor();
    parser.setLogger(logger);
    parser.setParserAutomaton(parserautomaton);

The ParserHandlerAdapter creates a SAX stream, which can be used to write the output into an XML file. Other output formats are also possible. For example, there is an adapter that creates an AST (Abstract Syntax Tree).

    // Create an adapter to produce a SAX stream as result
    ParserHandlerAdapter parseradapter = new ParserHandlerAdapter();

You need a serializer to write the SAX output to a file.

    // Create factory for SAX transformer
    SAXTransformerFactory transformerFactoryImpl = (SAXTransformerFactory) SAXTransformerFactory.newInstance();

    // Create serializer to write the SAX stream into a file
    TransformerHandler serializer = transformerFactoryImpl.newTransformerHandler();
    serializer.setResult(new StreamResult(outFile));
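
The serializer can be tried out on its own, without any Chaperon classes. The sketch below feeds a few hand-written SAX events into a TransformerHandler and collects the result in a string instead of a file. The class name SerializerDemo and the element name output are made up for this example.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

final class SerializerDemo {

    /** Serializes a single element with the given text content and returns the XML. */
    static String serialize(String elementName, String text) {
        StringWriter out = new StringWriter();
        try {
            SAXTransformerFactory factory =
                (SAXTransformerFactory) SAXTransformerFactory.newInstance();
            TransformerHandler serializer = factory.newTransformerHandler();

            // Configure the output first, then attach the result
            serializer.getTransformer()
                      .setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            serializer.setResult(new StreamResult(out));

            // These events would normally come from the parser adapter
            char[] chars = text.toCharArray();
            serializer.startDocument();
            serializer.startElement("", elementName, elementName, new AttributesImpl());
            serializer.characters(chars, 0, chars.length);
            serializer.endElement("", elementName, elementName);
            serializer.endDocument();
        } catch (Exception e) {
            // Keep the sketch short: wrap transformer and SAX errors
            throw new IllegalStateException("Could not serialize", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(serialize("output", "3 + 4"));
    }
}
```

Because a TransformerHandler is an ordinary SAX ContentHandler, anything that produces SAX events can write to it.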

Now you connect these components.

    // Connect components into a pipeline
    lexer.setLexicalHandler(parser);
    parser.setParserHandler(parseradapter);
    parseradapter.setContentHandler(serializer);

The next part looks complex, but the only thing that makes it so is the use of the locator. The locator is a class that holds the current position in the input file. If an error occurs, for example, the locator helps you to find it.

    // Push text into this pipeline
    // Create a locator, which helps to find possible syntax errors
    TextLocator locator = new TextLocator();
    locator.setURI(inFile.toURI().toURL().toString());
    locator.setLineNumber(1);
    locator.setColumnNumber(1);
    lexer.handleLocator(locator);
    
    // Start document
    lexer.handleStartDocument();
    
    LineNumberReader reader = new LineNumberReader(new InputStreamReader(new FileInputStream(inFile)));

    String line, newline = null;
    String separator = System.getProperty("line.separator");
    
    // Push text 
    while (true)
    {
      if (newline==null)
        line = reader.readLine();
      else
        line = newline;

      if (line==null)
        break;

      newline = reader.readLine();

      line = (newline!=null) ? line+separator : line;

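      // The reader is already one line ahead whenever a following line was read
      locator.setLineNumber((newline!=null) ? reader.getLineNumber()-1 : reader.getLineNumber());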
      locator.setColumnNumber(1);
      lexer.handleText(line);

      if (newline==null)
        break;
    }
    
    reader.close();

    // End document
    lexer.handleEndDocument();
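
The loop above reads one line ahead, so that it knows whether a line separator has to be appended to the current line. The same idea can be sketched in a compact, self-contained form. The TextSink interface below is a hypothetical stand-in for the lexer and locator, and the sketch keeps its own line counter instead of using a LineNumberReader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class LinePusher {

    /** Hypothetical stand-in for the lexer: receives each chunk with its line number. */
    interface TextSink {
        void handleText(int lineNumber, String text);
    }

    /** Pushes the input line by line; every line except the last gets the separator. */
    static void push(Reader input, String separator, TextSink sink) {
        try (BufferedReader reader = new BufferedReader(input)) {
            String line = reader.readLine();
            int lineNumber = 1;
            while (line != null) {
                // Look ahead to find out whether another line follows
                String next = reader.readLine();
                sink.handleText(lineNumber, (next != null) ? line + separator : line);
                line = next;
                lineNumber++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Records the calls as "lineNumber:text" so they can be inspected. */
    static List<String> trace(String text) {
        List<String> calls = new ArrayList<>();
        push(new StringReader(text), "\n", (n, t) -> calls.add(n + ":" + t));
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(trace("1+2\n3*4"));
    }
}
```

In the real code, the body of the sink corresponds to updating the locator and then calling lexer.handleText(line).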

Finally, you can test this class with the following command.

    java -cp build/chaperon/chaperon.jar:lib/core/avalon-framework-4.1.3.jar:tools/centipede/lib/xalan.jar \
      net.sourceforge.chaperon.common.SimpleParser src/resources/grammars/mathexp.xlex \
      src/resources/grammars/mathexp.xgrm src/resources/examples/mathexp.txt output.xml

A last hint: the output document should contain a namespace. For some reason, some Xalan versions do not write the namespace declaration, including the version that comes with Sun's JDK 1.4.

by Stephan Michels

Copyright © 2003 Chaperon Project. All rights reserved.