Text grammar format
Introduction

The XML representation of the grammar is not intended to be read or written by humans; it is designed to be easy for the Chaperon components to read. It is recommended that you write grammars in this text format and convert them to the XML representation.

Structure

The text grammar consists of two parts, separated by "%%". The first part contains the token definitions and the special instruction declarations. The second part contains the productions.

    [tokens]
    [special instructions]
    %start "Symbol of the production" ;
    %%
    [productions]

The "%start" declaration names the root production for the result document.

Lexical tokens

The tokens are similar to the tokens of the XML grammar. Token definitions in the text grammar use regular expressions:

    %token WORD "[A-Za-z][a-z]*";

If you use '%left' or '%right' instead of '%token', the token gets left or right associativity, respectively:

    %right WORD "[A-Za-z][a-z]*";
    %left PUNCTUATION "[\.,\;\?!]";

The token that occurs first gets a higher priority than the tokens that follow it.

Alternations

An alternation means that one of the contained elements must match:

    %token CHAR "[A-Za-z] | [0-9]";

Concatenations

A concatenation means that all elements of a sequence must match:

    %token IDENTIFIER "[A-Za-z] [A-Za-z0-9_]*";

Character classes

A character class compares a single character against the set of characters the class contains. There are two kinds: a normal character class, and a negated character class, which matches any character that is not in the set.
    %token PUNCTUATION "[\.,\;\?!]";
    %token NOTNUMBER "[^0-9]";

Universal character

The universal character "." matches any character except carriage return and line feed:

    %token COMMENT "// .*";

Begin of line

The "^" symbol matches the beginning of a line:

    %token NOTE "^ \[ [0-9]+ \]";

End of line

The "$" symbol matches the end of a line:

    %token BREAK "\\ \\ $";

Abbreviations

If a regular expression is used often, you can declare an abbreviation for it with '%ab' and refer to it by name in angle brackets:

    %ab NUMBER "[0-9]";
    %token FLOAT "<NUMBER>+ \. <NUMBER>+";
    %token INT "<NUMBER>+";

Comments and whitespace

Comments and whitespace are two special tokens that can appear at any position in the parsed text. They are declared with '%ignore'; the parser reads these tokens and then discards them.

    %ignore whitespace "[\n\r\ ]";
    %ignore comment "// .*";

Productions

Productions are handled similarly to the productions in the XML grammar. More than one definition of a symbol can be declared through an alternation:

    [Symbol of the production] : [Symbol1] [Symbol2] [..]
                               | [Symbol1] [..]
                               ;

To set the precedence of a production, use "%prec":

    example : WORD float %prec PLUS
            | WORD
            ;

Error productions

Error productions let you control the level of error recovery. These productions use the special "error" symbol as a placeholder, which can hold the text that could not be parsed by any other part of the grammar:

    line : error CR ;
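To see what the token patterns in this section accept, they can be approximated with Python's `re` module. This is a minimal sketch, not Chaperon's lexer, and it rests on stated assumptions: that a space between elements means concatenation (mapped here onto `re.VERBOSE`, which drops unescaped whitespace outside character classes), that `<NAME>` abbreviations can be expanded textually, and that the priority rule means "the first declared pattern that matches wins". The helper names `expand`, `compile_pattern`, `full_match`, and `tokenize` are hypothetical.

```python
import re

# Minimal sketch, NOT Chaperon's implementation. Assumptions:
# - a space between elements means concatenation (re.VERBOSE ignores
#   unescaped whitespace outside character classes);
# - <NAME> abbreviations expand textually;
# - the first declared pattern that matches wins (assumed priority rule).
# Known difference: Python's "." excludes only "\n", whereas the universal
# character also excludes "\r".

ABBREVIATIONS = {"NUMBER": "[0-9]"}  # %ab NUMBER "[0-9]";

# (kind, name, pattern) in declaration order, mirroring %ignore / %token.
DECLARATIONS = [
    ("ignore", "whitespace", r"[\n\r\ ]"),
    ("ignore", "comment", r"// .*"),
    ("token", "FLOAT", r"<NUMBER>+ \. <NUMBER>+"),
    ("token", "INT", r"<NUMBER>+"),
    ("token", "WORD", r"[A-Za-z][a-z]*"),
    ("token", "PUNCTUATION", r"[\.,\;\?!]"),
]


def expand(pattern):
    """Replace each <NAME> with its abbreviation, wrapped in a non-capturing group."""
    return re.sub(r"<(\w+)>", lambda m: "(?:" + ABBREVIATIONS[m.group(1)] + ")", pattern)


def compile_pattern(pattern, flags=0):
    """Compile a text-grammar pattern as a verbose Python regex."""
    return re.compile(expand(pattern), re.VERBOSE | flags)


def full_match(pattern, text):
    """Return True if the whole text matches the pattern."""
    return compile_pattern(pattern).fullmatch(text) is not None


COMPILED = [(kind, name, compile_pattern(p)) for kind, name, p in DECLARATIONS]


def tokenize(text):
    """Return (name, lexeme) pairs; ignored tokens are read and discarded."""
    pos, tokens = 0, []
    while pos < len(text):
        for kind, name, regex in COMPILED:
            m = regex.match(text, pos)
            if m and m.end() > pos:  # skip empty matches
                if kind == "token":
                    tokens.append((name, m.group()))
                pos = m.end()
                break
        else:
            raise SyntaxError(f"no token matches at offset {pos}: {text[pos]!r}")
    return tokens


if __name__ == "__main__":
    # Alternation (CHAR), concatenation (IDENTIFIER), negated class (NOTNUMBER).
    print(full_match(r"[A-Za-z] | [0-9]", "7"))
    print(full_match(r"[A-Za-z] [A-Za-z0-9_]*", "var_1"))
    print(full_match(r"[^0-9]", "5"))
    # Begin of line: MULTILINE lets "^" match after every newline.
    note = compile_pattern(r"^ \[ [0-9]+ \]", re.MULTILINE)
    print(note.findall("text\n[12] a note\nmore [3]"))
    print(tokenize("Pi is 3.14, roughly! // aside\nDone 42."))
```

Because FLOAT is declared before INT in this sketch, "3.14" becomes a single FLOAT token, while "42." falls back to INT followed by PUNCTUATION; the space, newline, and comment are consumed by the ignore patterns and never reach the token list.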