XML lexicon format

Structure

The grammar format consists of several definitions, each describing a token that can be recognized in the input stream. It is similar to an ordinary lexicon, which defines the words of a natural language. Each token is represented by a symbol and a definition.

<grammar>
  <definition name="...">[definition content]</definition>
  <definition name="...">[definition content]</definition>
  <definition name="...">[definition content]</definition>
</grammar>

Definitions

Every definition is an entry that is mapped to a name. This name determines the name of the resulting XML element.

<definition name="Name of the XML element">
 [definition of the element]
</definition>

For the content of a definition, Chaperon uses a structure similar to regular expressions. It consists of alternations, concatenations, character classes, and so on.

Alternations

Alternation means that one of the contained elements must match.

<definition name="Name of the XML element">
 <choice>
  [element 1]
  [element 2]
  [element 3]
 </choice>
</definition>
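
As a concrete illustration, the following sketch defines a sign that is either a plus or a minus character. The name "sign" is hypothetical and chosen only for this example; the char element is the one described under Characters.

<definition name="sign">
 <!-- Hypothetical example: matches "+" or "-" -->
 <choice>
  <char value="+"/>
  <char value="-"/>
 </choice>
</definition>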

Concatenations

Concatenation means that all elements in a sequence must match.

<definition name="Name of the XML element">
 <sequence>
  [element 1]
  [element 2]
  [element 3]
 </sequence>
</definition>
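
For instance, a keyword can be written as a sequence of its characters. The sketch below assumes a definition named "if", a name invented for this example, which matches the two characters "i" and "f" in that order.

<definition name="if">
 <!-- Hypothetical example: matches "i" followed by "f" -->
 <sequence>
  <char value="i"/>
  <char value="f"/>
 </sequence>
</definition>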

Characters

The char element matches exactly one character in the input, given by its value attribute. Instead of a literal character, you can specify a character by its Unicode value, introduced by "#".

<definition name="Name of the XML element">
 <char value="a"/>
 <char value="#13"/>
</definition>
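
The "#" notation is mainly useful for characters that cannot be typed directly. The following sketch describes a line break made of a carriage return followed by a line feed. It assumes that the number after "#" is a decimal value, so that "#13" is carriage return and "#10" is line feed; the name "linebreak" is hypothetical.

<definition name="linebreak">
 <!-- Assumption: "#13" and "#10" are decimal character values (CR and LF) -->
 <sequence>
  <char value="#13"/>
  <char value="#10"/>
 </sequence>
</definition>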

Repeatable and optional subexpressions

If a subexpression should be repeatable or optional, you can use the zero-or-more, one-or-more, and optional elements.

<definition name="Name of the XML element">
 <zero-or-more>
  [subexpression]
 </zero-or-more>

 <one-or-more>
  [subexpression]
 </one-or-more>

 <optional>
  [subexpression]
 </optional>
</definition>
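
These elements are usually combined with the other constructs. The sketch below describes an integer as an optional minus sign followed by one or more digits. It assumes that the elements can be nested freely and that an interval includes both of its end points; the name "integer" is hypothetical.

<definition name="integer">
 <!-- Hypothetical example: optional "-" followed by at least one digit -->
 <sequence>
  <optional>
   <char value="-"/>
  </optional>
  <one-or-more>
   <class>
    <interval>
     <char value="0"/>
     <char value="9"/>
    </interval>
   </class>
  </one-or-more>
 </sequence>
</definition>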

Nested element

To nest another element from the grammar inside a definition, use the element element and refer to the other definition by its name.

<definition name="Name of the XML element">
 <element name="Name of the other XML element"/>
</definition>
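
A small sketch of how one definition might reuse another: "digit" describes a single digit, and "number" refers to it by name. Both names are invented for this example, and the sketch assumes that the element element may appear inside one-or-more.

<definition name="digit">
 <!-- Hypothetical example: a single character from "0" to "9" -->
 <class>
  <interval>
   <char value="0"/>
   <char value="9"/>
  </interval>
 </class>
</definition>

<definition name="number">
 <!-- Hypothetical example: one or more nested "digit" elements -->
 <one-or-more>
  <element name="digit"/>
 </one-or-more>
</definition>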

Character classes

A character class compares an input character with the characters that the class contains. There are two kinds: a regular character class and an exclusive character class. An exclusive character class matches only if the input character matches none of the characters in the class.

<definition name="Name of the XML element">
 <class>
  [Characters, which should match]
 </class>

 <class exclusive="true">
  [Characters, which shouldn't match]
 </class>
</definition>

The character class can contain two elements:

Characters
Character intervals

<definition name="Name of the XML element">
 <class>
  <char value="a"/>
  <interval>
   <char value="e"/>
   <char value="z"/>
  </interval>
 </class>
</definition>

A character interval defines a range between two characters.
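
Put together, characters and intervals can describe something like a simple identifier. The following sketch matches a lowercase letter followed by any number of lowercase letters, digits, or underscores. It assumes that intervals include both end points; the name "identifier" is hypothetical.

<definition name="identifier">
 <!-- Hypothetical example: [a-z] followed by zero or more of [a-z0-9_] -->
 <sequence>
  <class>
   <interval>
    <char value="a"/>
    <char value="z"/>
   </interval>
  </class>
  <zero-or-more>
   <class>
    <interval>
     <char value="a"/>
     <char value="z"/>
    </interval>
    <interval>
     <char value="0"/>
     <char value="9"/>
    </interval>
    <char value="_"/>
   </class>
  </zero-or-more>
 </sequence>
</definition>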

Universal characters

The universal element matches any single character, including carriage return and line feed.

<definition name="Name of the XML element">
 <universal/>
</definition>
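
One possible use is an escape sequence, where a backslash may be followed by any character at all. The name "escape" is hypothetical, and the sketch assumes that universal can be used inside a sequence like any other element.

<definition name="escape">
 <!-- Hypothetical example: a backslash followed by any one character -->
 <sequence>
  <char value="\"/>
  <universal/>
 </sequence>
</definition>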

by Stephan Michels

Copyright © 2003 Chaperon Project. All rights reserved.