Text grammar format

Introduction

The XML representation of a grammar is not meant to be read or written by hand; it is designed to be easy for the Chaperon components to process. The recommended approach is to write the grammar in this text format and convert it to the XML representation.

Structure

Unlike the text grammar of the standard parser, the new text format no longer separates tokens from definitions. Definitions, abbreviations, and special instructions can be freely mixed.

%ab int : "Integer" ;

integers : int ( ws int )*;

%start "integers" ;

The declaration "%start" declares the root definition for the result document.

Definition

Each definition describes an XML element that appears in the parser output.

WORD : [A-Za-z] [a-z]* ;

A definition that occurs earlier in the grammar has a higher priority than the definitions that follow it.
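
How priority can resolve an ambiguity is sketched below in Python. This is an illustration under assumptions, not Chaperon's implementation: the definition names KEYWORD and WORD, the helper classify, and the rule that priority decides between definitions matching the same text are all hypothetical.

```python
import re

# Hypothetical definitions, listed in the order they would appear in a grammar.
DEFINITIONS = [
    ("KEYWORD", re.compile(r"if|else")),
    ("WORD", re.compile(r"[A-Za-z][a-z]*")),
]

def classify(text):
    """Return the name of the first definition that matches the whole text."""
    for name, pattern in DEFINITIONS:
        if pattern.fullmatch(text):
            return name
    return None
```

Both patterns match the text "if", but KEYWORD is listed first, so classify("if") reports KEYWORD rather than WORD.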

Alternations

Alternation means that one of the contained elements must match.

CHAR : [A-Za-z] | [0-9];
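
For readers more familiar with ordinary regular expressions, the same idea can be sketched with Python's re module. This is only an analogy, assuming the "|" operator behaves like regex alternation; the helper is_char is a hypothetical name.

```python
import re

# Either branch may match: a single letter or a single digit.
CHAR = re.compile(r"[A-Za-z]|[0-9]")

def is_char(text):
    """True if the whole text is matched by one of the two alternatives."""
    return CHAR.fullmatch(text) is not None
```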

Concatenations

Concatenation means that all elements in a sequence must match.

IDENTIFIER : [A-Za-z] [A-Za-z0-9_]*;

Character classes

A character class matches a single character against the set of characters it contains. There are two forms: a plain character class and a negated character class. A negated character class, written with a leading "^", matches any character that is not in the set.

PUNCTUATION : [.,;?!] ;
NOTNUMBER   : [^0-9] ;
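
The two forms can be tried out with Python's re module, which uses the same bracket notation. This is a sketch for experimentation, assuming the classes behave like their regex counterparts; the helper matches_one is a hypothetical name.

```python
import re

PUNCTUATION = re.compile(r"[.,;?!]")  # plain class: one of the listed characters
NOTNUMBER = re.compile(r"[^0-9]")     # negated class: anything but a digit

def matches_one(pattern, char):
    """True if the single character is accepted by the class."""
    return pattern.fullmatch(char) is not None
```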

Universal character

The universal character "." matches any character except carriage return and line feed.

COMMENT : "//" .*;
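
A rough Python equivalent of such a comment rule is sketched below. By default Python's "." excludes only the line feed, so the sketch spells out a negated class to exclude the carriage return as well. The helper leading_comment is a hypothetical name, and the code is an analogy, not part of Chaperon.

```python
import re

# "//" followed by any run of characters other than CR and LF.
COMMENT = re.compile(r"//[^\r\n]*")

def leading_comment(text):
    """Return the comment at the start of text, or None if there is none."""
    match = COMMENT.match(text)
    return match.group(0) if match else None
```

The match stops at the end of the line, so a comment never swallows the text that follows it on the next line.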

Abbreviations

If a regular expression is used often, you can define an abbreviation for it with "%ab" and refer to it by name.

%ab NUMBER : [0-9];
FLOAT      : NUMBER+ \. NUMBER+;
INT        : NUMBER+;
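
The effect of an abbreviation can be imitated in Python by splicing a named fragment into larger patterns. This is a sketch under the assumption that an abbreviation is expanded in place wherever its name is used; it does not show how Chaperon handles abbreviations internally.

```python
import re

# The abbreviation is an ordinary string that is spliced into later patterns.
NUMBER = r"[0-9]"

FLOAT = re.compile(rf"(?:{NUMBER})+\.(?:{NUMBER})+")  # digits, a dot, digits
INT = re.compile(rf"(?:{NUMBER})+")                   # one or more digits
```

Changing NUMBER in one place changes both FLOAT and INT, which is the main benefit of naming the fragment.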

by Stephan Michels

Copyright © 2003 Chaperon Project. All rights reserved.