XML lexicon format
Structure

The lexicon format defines a set of definitions that can be recognized in the input stream. It is similar to an ordinary lexicon, which defines the words of a natural language. Each token is represented by a symbol and a definition.

    <lexicon>
      <definition name="...">[definition content]</definition>
      <definition name="...">[definition content]</definition>
      <definition name="...">[definition content]</definition>
    </lexicon>

Definitions

Every definition has an entry and is mapped to a name. The name determines the name of the resulting XML element.

    <definition name="Name of the XML element">
      [definition of the element]
    </definition>

For the definition content, Chaperon uses a structure similar to regular expressions. It contains alternations, concatenations, character classes, etc.

Alternations

Alternation means that one of the contained elements must match.

    <definition name="Name of the XML element">
      <choice>
        [element 1]
        [element 2]
        [element 3]
      </choice>
    </definition>

Concatenations

Concatenation means that all elements in a sequence must match.

    <definition name="Name of the XML element">
      <sequence>
        [element 1]
        [element 2]
        [element 3]
      </sequence>
    </definition>

Characters

The character must match the character in the input. Instead of a literal attribute value, you can specify a single character by its Unicode code point, introduced by "#".

    <definition name="Name of the XML element">
      <char value="a"/>
      <char value="#13"/>
    </definition>

Repeatable and optional subexpressions

If a subexpression should be repeatable or optional, use the zero-or-more, one-or-more, and optional elements.

    <definition name="Name of the XML element">
      <zero-or-more>
        [subexpression]
      </zero-or-more>
      <one-or-more>
        [subexpression]
      </one-or-more>
      <optional>
        [subexpression]
      </optional>
    </definition>

Nested element

If you want a particular element from the grammar to be nested in the definition, use the element element.

    <definition name="Name of the XML element">
      <element name="Name of the other XML element"/>
    </definition>

Character classes

A character class compares an input character with the characters the class contains. There are two variants: a regular character class and an exclusive character class. An exclusive character class matches only if the input character matches none of the characters in the class.

    <definition name="Name of the XML element">
      <class>
        [Characters, which should match]
      </class>
      <class exclusive="true">
        [Characters, which shouldn't match]
      </class>
    </definition>

The character class can contain two kinds of elements, single characters and character intervals:

    <definition name="Name of the XML element">
      <class>
        <char value="a"/>
        <interval>
          <char value="e"/>
          <char value="z"/>
        </interval>
      </class>
    </definition>

The character interval defines an interval between two characters.

Universal characters

The universal character matches any character, including carriage return and line feed.

    <definition name="Name of the XML element">
      <universal/>
    </definition>
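
To show how the constructs described above can be combined, here is a minimal sketch of a complete lexicon. It uses only the elements introduced on this page. The definition names (digit, integer, newline, comment, any) are hypothetical and chosen purely for illustration. The sketch assumes that the root element is lexicon, that the number after "#" is a decimal code point (so #13 is carriage return and #10 is line feed), and that no namespace declaration is needed. Check these assumptions against your Chaperon version.

```xml
<lexicon>

  <!-- hypothetical name: a single decimal digit, 0 through 9 -->
  <definition name="digit">
    <class>
      <interval>
        <char value="0"/>
        <char value="9"/>
      </interval>
    </class>
  </definition>

  <!-- hypothetical name: an optional minus sign followed by one or more
       nested digit elements -->
  <definition name="integer">
    <sequence>
      <optional>
        <char value="-"/>
      </optional>
      <one-or-more>
        <element name="digit"/>
      </one-or-more>
    </sequence>
  </definition>

  <!-- hypothetical name: a line break, either #13 followed by #10
       or a lone #10 (assumed to be decimal code points) -->
  <definition name="newline">
    <choice>
      <sequence>
        <char value="#13"/>
        <char value="#10"/>
      </sequence>
      <char value="#10"/>
    </choice>
  </definition>

  <!-- hypothetical name: a semicolon followed by any run of characters
       that are neither #13 nor #10 -->
  <definition name="comment">
    <sequence>
      <char value=";"/>
      <zero-or-more>
        <class exclusive="true">
          <char value="#13"/>
          <char value="#10"/>
        </class>
      </zero-or-more>
    </sequence>
  </definition>

  <!-- hypothetical name: any single character at all -->
  <definition name="any">
    <universal/>
  </definition>

</lexicon>
```

Under these assumptions, an input such as -42 would be recognized as an integer containing two nested digit elements. Whether a definition may reference another one declared later in the file, and how competing definitions are prioritized, is not covered here.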