# Lexical Analysis
The `#[derive_lexicon]` macro is used to declare token types and lexical analyzer rules (the lexer rules) using an enum. It was already showcased at the beginning of the book in the full example. Let's take a closer look here.
```rust
use teleparse::prelude::*;

#[derive_lexicon]
#[teleparse(ignore(r"\s"))] // ignore whitespaces
pub enum TokenType {
    /// Numbers in the expression
    #[teleparse(regex(r"\d+"), terminal(Integer))]
    Integer,
    /// The 4 basic operators
    #[teleparse(terminal(
        OpAdd = "+",
        OpMul = "*",
    ))]
    Operator,
    /// Parentheses
    #[teleparse(terminal(ParenOpen = "(", ParenClose = ")"))]
    Paren,
}
```
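To make the lexer's behavior concrete, here is a minimal hand-rolled sketch in plain Rust (std only, not using teleparse, and not the code the macro actually generates) of the tokenization the lexicon above describes: whitespace is skipped, digit runs become `Integer` tokens, and the operator and parenthesis characters become their own token kinds.

```rust
// Sketch only: illustrates the behavior declared by the lexicon above,
// not teleparse's actual generated code.

#[derive(Debug, PartialEq)]
enum Token {
    Integer(String),
    OpAdd,
    OpMul,
    ParenOpen,
    ParenClose,
}

fn tokenize(input: &str) -> Result<Vec<Token>, char> {
    let mut tokens = Vec::new();
    let mut chars = input.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            // skipped between tokens, like #[teleparse(ignore(r"\s"))]
            chars.next();
        } else if c.is_ascii_digit() {
            // greedy digit run, like regex(r"\d+")
            let mut num = String::new();
            while let Some(&d) = chars.peek() {
                if d.is_ascii_digit() {
                    num.push(d);
                    chars.next();
                } else {
                    break;
                }
            }
            tokens.push(Token::Integer(num));
        } else {
            chars.next();
            tokens.push(match c {
                '+' => Token::OpAdd,
                '*' => Token::OpMul,
                '(' => Token::ParenOpen,
                ')' => Token::ParenClose,
                other => return Err(other), // no rule matches
            });
        }
    }
    Ok(tokens)
}

fn main() {
    let tokens = tokenize("1 + (2 * 34)").unwrap();
    println!("{:?}", tokens);
}
```

Running this on `"1 + (2 * 34)"` yields seven tokens; the whitespace never appears in the output, which is exactly what the `ignore` attribute arranges for.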
Attributes on the enum:
- `#[derive_lexicon]` is the entry point, and processes the other `teleparse` attributes.
- `#[teleparse(ignore(...))]` defines the patterns that the lexer should skip between tokens.
  - You can specify multiple regexes, like `#[teleparse(ignore(r"\s+", r"\n+"))]`.
Attributes on the variants:
- `#[teleparse(terminal(...))]` generates structs that can be used in the syntax tree.
  - The example generates the `Integer`, `OpAdd`, `OpMul`, `ParenOpen`, and `ParenClose` structs.
  - Some have a specific literal value to match. For example, `OpAdd` will only match a token of type `Operator` whose content is the `+` character.
- `#[teleparse(regex(...))]` defines the pattern to match for the token type.