Textbook Example

In many tutorials that talk about compiler design, the example used is parsing a math expressions consist of variables (identifiers), +, *, and parenthese (( and )). For example, the following are valid expressions:

a + b * c
(a + b) * c
(a + b) * (c + d)

The grammar can be described verbally as follows:

  • An expression (E) is a list of terms (T) separated by +.
  • A term (T) is a list of factors (F) separated by *.
  • A factor (F) is either an identifier or an expression enclosed in parentheses.

You may also seen the following used to describe the above. (It's called Backus-Naur Form or BNF. It's ok if you don't understand this for now. Some places might have ::= instead of =>.)

E  => T E'
E' => + T E' | ε
T  => F T'
T' => * F T' | ε
F  => ( E ) | id

This can almost be translated directly into Rust code with teleparse. Some helper structs are needed to avoid loops when calculating trait requirements, which are annotated with comments.

use teleparse::prelude::*;

#[derive_lexicon]
#[teleparse(ignore(r#"\s+"#))]
pub enum TokenType {
    #[teleparse(regex(r#"\w+"#), terminal(Ident))]
    Ident,
    #[teleparse(terminal(OpAdd = "+", OpMul = "*",))]
    Op,
    #[teleparse(terminal(ParenOpen = "(", ParenClose = ")"))]
    Paren,
}

#[derive_syntax]
#[teleparse(root)]
struct E {
    first: T,
    rest: Eprime,
}

// Eplus has to be a separate struct because of recursion when
// computing trait satisfaction. See chapter 3.2 Note on Recursion for more details
#[derive_syntax]
struct Eprime(tp::Option<Eplus>);

#[derive_syntax]
struct Eplus {
    op: OpAdd,
    _t: T,
    rest: Box<Eprime>,
}

#[derive_syntax]
struct T {
    first: F,
    rest: Tprime,
}

#[derive_syntax]
struct Tprime(tp::Option<Tstar>);

#[derive_syntax]
struct Tstar {
    op: OpMul,
    _f: F,
    rest: Box<Tprime>,
}

#[derive_syntax]
enum F {
    Ident(Ident),
    Paren((ParenOpen, Box<E>, ParenClose)),
}

#[test]
fn main() -> Result<(), teleparse::GrammarError> {
    let source = "(a+b)*(c+d)";
    let _ = E::parse(source)?;

    Ok(())
}