Syntax Analysis

The #[derive_syntax] macro turns a struct or enum into a node of the Abstract Syntax Tree (AST). These nodes become the non-terminal symbols of your language.

Structs

Structs are used to define sequences of symbols. For example, this production:

X => A B C

This means that to produce X, you need to produce A, B, and C, in that order.

For a more realistic example, say we want to parse an assignment statement like x = 1. For simplicity, we will assume the right-hand side is always a number.

AssignmentStatement => Ident OpAssign Number

Suppose Ident, OpAssign, and Number are all terminals created with #[derive_lexicon]. We can then create AssignmentStatement like this:

use teleparse::prelude::*;

#[derive_syntax]
pub struct AssignmentStatement(pub Ident, pub OpAssign, pub Number);

Named fields work as well:

use teleparse::prelude::*;

#[derive_syntax]
pub struct AssignmentStatement {
    pub ident: Ident,
    pub op_assign: OpAssign,
    pub number: Number,
}

When the parser is expecting an AssignmentStatement, it will try to parse Ident, then OpAssign, then Number, and put them in the struct.
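To make the "sequence" idea concrete, here is a minimal hand-written sketch of what parsing Ident, then OpAssign, then Number amounts to. It uses only the standard library and is not the code that #[derive_syntax] generates: the Token enum, the plain Ident, OpAssign, and Number stand-in types, and the parse_assignment function are all hypothetical names introduced for illustration.

```rust
// Illustration only: a hand-rolled parser for `Ident OpAssign Number`.
// `Token`, the stand-in node types, and `parse_assignment` are hypothetical;
// they are not part of teleparse.

#[derive(Debug, PartialEq)]
pub enum Token {
    Ident(String),
    OpAssign,
    Number(String),
}

#[derive(Debug, PartialEq)]
pub struct Ident(pub String);

#[derive(Debug, PartialEq)]
pub struct OpAssign;

#[derive(Debug, PartialEq)]
pub struct Number(pub String);

#[derive(Debug, PartialEq)]
pub struct AssignmentStatement(pub Ident, pub OpAssign, pub Number);

/// Tries to read the three symbols, in order, from the front of `tokens`.
/// Returns the node plus the unconsumed tokens, or `None` if any symbol
/// is missing or out of order.
pub fn parse_assignment(tokens: &[Token]) -> Option<(AssignmentStatement, &[Token])> {
    match tokens {
        [Token::Ident(name), Token::OpAssign, Token::Number(value), rest @ ..] => Some((
            AssignmentStatement(Ident(name.clone()), OpAssign, Number(value.clone())),
            rest,
        )),
        _ => None,
    }
}
```

Given the tokens for x = 1, parse_assignment returns an AssignmentStatement holding the three symbols and an empty remainder. If the sequence starts with anything other than an identifier, it returns None.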

Enums

Enums are used to define choices (unions) of productions. Continuing our example with AssignmentStatement, suppose we want to create a Statement that can either be an assignment or a function call.

This can be denoted as:

Statement => AssignmentStatement | FunctionCallStatement

We can create Statement like this:

use teleparse::prelude::*;

#[derive_syntax]
pub enum Statement {
    Assignment(AssignmentStatement),
    FunctionCall(FunctionCallStatement),
}

When the parser is expecting a Statement, it will try to parse either an AssignmentStatement or a FunctionCallStatement, and create a Statement with the corresponding variant. We will cover how the parser decides which path to take in the next section.
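Because the result is an ordinary Rust enum, code that consumes the tree can branch on the variant with match. The sketch below shows this with plain stand-in types and no teleparse dependency. The fields target, value, and callee and the describe function are hypothetical and chosen only for illustration; the real nodes would hold the derived terminals instead of strings.

```rust
// Illustration only: consuming a `Statement` by matching on its variant.
// The field names and `describe` are hypothetical stand-ins.

pub struct AssignmentStatement {
    pub target: String,
    pub value: String,
}

pub struct FunctionCallStatement {
    pub callee: String,
}

pub enum Statement {
    Assignment(AssignmentStatement),
    FunctionCall(FunctionCallStatement),
}

/// Produces a short human-readable summary of a statement.
pub fn describe(statement: &Statement) -> String {
    match statement {
        Statement::Assignment(a) => format!("assign {} to {}", a.value, a.target),
        Statement::FunctionCall(c) => format!("call {}", c.callee),
    }
}
```

The compiler checks that every variant is handled, so adding a third kind of statement to the enum later will flag each match that needs updating.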

Root

With the terminals and non-terminals, we can build the data structures for the entire language. At the outermost level, there will be one symbol that is the "target" to parse. We will refer to it as the root. For example, for a programming language, the root might be the syntax for a file.

To mark a symbol as the root, use the #[teleparse(root)] attribute:

use teleparse::prelude::*;

#[derive_syntax]
#[teleparse(root)]
pub struct File {
    ...
}

This derives the Root trait, which has a parse() function that can be called to parse an input string into the root symbol. For more complex usage, you can use the Parser object.