Textbook Example
In many tutorials that talk about compiler design, the example used is parsing a math expressions consist of variables (identifiers), +
, *
, and parenthese ((
and )
). For example, the following are valid expressions:
a + b * c
(a + b) * c
(a + b) * (c + d)
The grammar can be described verbally as follows:
- An expression (
E
) is a list of terms (T
) separated by+
. - A term (
T
) is a list of factors (F
) separated by*
. - A factor (
F
) is either an identifier or an expression enclosed in parentheses.
You may also seen the following used to describe the above.
(It's called Backus-Naur Form or BNF. It's ok if you don't understand this for now.
Some places might have ::=
instead of =>
.)
E => T E'
E' => + T E' | ε
T => F T'
T' => * F T' | ε
F => ( E ) | id
This can almost be translated directly into Rust code with teleparse
.
Some helper structs are needed to avoid loops when calculating trait requirements,
which are annotated with comments.
use teleparse::prelude::*; #[derive_lexicon] #[teleparse(ignore(r#"\s+"#))] pub enum TokenType { #[teleparse(regex(r#"\w+"#), terminal(Ident))] Ident, #[teleparse(terminal(OpAdd = "+", OpMul = "*",))] Op, #[teleparse(terminal(ParenOpen = "(", ParenClose = ")"))] Paren, } #[derive_syntax] #[teleparse(root)] struct E { first: T, rest: Eprime, } // Eplus has to be a separate struct because of recursion when // computing trait satisfaction. See chapter 3.2 Note on Recursion for more details #[derive_syntax] struct Eprime(tp::Option<Eplus>); #[derive_syntax] struct Eplus { op: OpAdd, _t: T, rest: Box<Eprime>, } #[derive_syntax] struct T { first: F, rest: Tprime, } #[derive_syntax] struct Tprime(tp::Option<Tstar>); #[derive_syntax] struct Tstar { op: OpMul, _f: F, rest: Box<Tprime>, } #[derive_syntax] enum F { Ident(Ident), Paren((ParenOpen, Box<E>, ParenClose)), } #[test] fn main() -> Result<(), teleparse::GrammarError> { let source = "(a+b)*(c+d)"; let _ = E::parse(source)?; Ok(()) }