The extended Chaperon parser

The extended parser was introduced because the standard parser design falls short in many cases. The goal of the new design is to simplify grammar writing and to solve common problems of the standard parser.

Grammar writing

The standard parser design makes a cut between lexical and syntactical analysis. This causes several problems. For example, the lexer cannot distinguish tokens according to the current parser state, even though each state allows some tokens and not others.
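
The problem can be illustrated with a minimal Python sketch. It is not Chaperon code: the token names, the patterns and the `allowed` parameter are assumptions made for this example, and `allowed` merely stands in for the set of tokens a parser state would permit.

```python
import re

# Hypothetical token patterns, tried in order (longest operator first).
PATTERNS = [
    ("SHIFT", re.compile(r">>")),
    ("GT", re.compile(r">")),
    ("LT", re.compile(r"<")),
    ("ID", re.compile(r"[A-Za-z_]\w*")),
]

def next_token(text, pos, allowed=None):
    """Return (kind, lexeme) at pos, considering only the allowed kinds."""
    for kind, pattern in PATTERNS:
        if allowed is not None and kind not in allowed:
            continue
        match = pattern.match(text, pos)
        if match:
            return kind, match.group()
    raise SyntaxError(f"no token at position {pos}")

def tokenize(text, allowed=None):
    """Tokenize text; `allowed` stands in for what the parser state permits."""
    pos, tokens = 0, []
    while pos < len(text):
        kind, lexeme = next_token(text, pos, allowed)
        tokens.append((kind, lexeme))
        pos += len(lexeme)
    return tokens

# A separate lexer has no context and always reads ">>" as one operator.
context_free = tokenize("List<List<T>>")
# With knowledge of the parser state, SHIFT can be excluded here.
context_aware = tokenize("List<List<T>>", allowed={"LT", "GT", "ID"})
```

In the first call the input ends with a single SHIFT token, in the second with two GT tokens. A scannerless parser makes this kind of decision implicitly, because only the tokens valid in the current state are ever considered.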

Also, the productions in the standard parser don't allow Kleene operators such as one-or-more, zero-or-more, or the optional operator. Writing a grammar should be as easy as writing a regular expression (regex).
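
To show what the missing operators cost the grammar writer, here is a small Python sketch of the usual workaround: every `*`, `+` or `?` has to be expanded by hand into helper productions. The function name, the suffix notation and the generated nonterminal names are assumptions for illustration only and do not reflect Chaperon's grammar syntax.

```python
def desugar(symbol):
    """Expand a symbol with a trailing Kleene operator into plain productions.

    Returns (new_symbol, productions); each production is (lhs, rhs_list),
    and an empty rhs_list denotes the empty production.
    """
    base, op = symbol[:-1], symbol[-1]
    if op == "*":  # zero-or-more: empty, or the list followed by one more item
        name = base + "_list_opt"
        return name, [(name, []), (name, [name, base])]
    if op == "+":  # one-or-more: one item, or the list followed by one more
        name = base + "_list"
        return name, [(name, [base]), (name, [name, base])]
    if op == "?":  # optional: empty, or exactly one item
        name = base + "_opt"
        return name, [(name, []), (name, [base])]
    return symbol, []  # plain symbol, nothing to expand

star_symbol, star_rules = desugar("arg*")
plus_symbol, plus_rules = desugar("arg+")
opt_symbol, opt_rules = desugar("arg?")
```

Each operator costs one extra nonterminal and two productions. With operators available in the grammar itself, this bookkeeping would be done by the parser generator instead of the author.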

This proposal again offers both an XML and a text version of the grammar. The XML version borrows many elements from RELAX NG and should look very similar to it.


In the past, many people had a lot of problems with state conflicts, such as the well-known shift/reduce and reduce/reduce conflicts. These become even more complicated with a scannerless approach like the one in this proposal. So the Chaperon project decided to use a GLR parser based on Tomita's algorithm.

This algorithm follows both alternatives at a conflict and finds all syntax trees for the given grammar. The Chaperon parser includes several disambiguation filters to select the best syntax tree in case more than one tree is possible.
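
The idea of "all syntax trees plus a filter" can be sketched in a few lines of Python. Note that this is not Tomita's algorithm and not Chaperon's implementation: it is a brute-force enumeration for one hard-coded ambiguous grammar, E -> E '+' E | 'n', and the left-associativity filter is a hypothetical example of a disambiguation filter.

```python
from functools import lru_cache

def all_trees(tokens):
    """Enumerate every syntax tree for the grammar E -> E '+' E | 'n'."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def parse(start, end):
        # All trees that derive tokens[start:end] from E.
        trees = []
        if end - start == 1 and tokens[start] == "n":
            trees.append("n")
        # Try every '+' as the top-level operator. Each choice corresponds
        # to one way through a conflict that an LR parser would have to
        # resolve up front.
        for mid in range(start + 1, end - 1):
            if tokens[mid] == "+":
                for left in parse(start, mid):
                    for right in parse(mid + 1, end):
                        trees.append((left, "+", right))
        return tuple(trees)

    return list(parse(0, len(tokens)))

def has_right_nesting(tree):
    """True if any '+' node in the tree has another '+' node as right operand."""
    if tree == "n":
        return False
    left, _, right = tree
    return right != "n" or has_right_nesting(left)

def prefer_left_assoc(trees):
    """Hypothetical disambiguation filter: keep only left-associative trees."""
    return [tree for tree in trees if not has_right_nesting(tree)]

candidates = all_trees("n + n + n".split())
best = prefer_left_assoc(candidates)
```

For the input `n + n + n` the enumeration yields two trees, and the filter keeps the left-associative one. A GLR parser reaches the same set of trees far more economically by sharing common parts of the parse stacks and of the resulting trees.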


Nevertheless, this approach is very experimental and needs to be tested. The algorithm is also always slower than the standard parser and needs to be improved.

We would be glad to hear about further improvements and to receive reports.

by Stephan Michels