Note on Recursion
Consider again the example shown in the beginning of the book:
E => T E'
E' => + T E' | ε
T => F T'
T' => * F T' | ε
F => ( E ) | id
Note that this grammar involves recursions. For example, the productions of E'
and T'
involve themselves; and production of E
involves F
indirectly, which in turn involves E
.
There are 2 things to note when implement a grammar that has recursion:
- The data structure must have a finite size. This can be easily achieved by using a
Box
. - There must NOT be recursion when computing if trait requirements are satisfied.
We will go through an example to illustrate both points. Suppose we are implementing E'
as:
#![allow(unused)] fn main() { // E' => + T E' | ε #[derive_syntax] struct Eprime(tp::Option<(OpAdd, T, Eprime)>); }
Note we are using tp::Option
to represent the ε
case. This is a utility type used
to represent optional values. Utility types will be covered in the next chapter.
The first error you will get is:
recursive type `Eprime` has infinite size
This is because Eprime
contains Eprime
itself, creating an infinite recursion when calculating the size.
To fix this, we can box the recursive part:
#![allow(unused)] fn main() { #[derive_syntax] struct Eprime(tp::Option<(OpAdd, T, Box<Eprime>)>); }
Now we will see the second error:
Overflow evaluating the requirement `Eprime: Produce`
This error message is a little cryptic. One might think it's related to
having Eprime
referenced in itself. However, the root cause is that
derive_syntax
infers the lexicon type by looking at the first fields of structs
and first variants of enums. We can walk through how Rust is evaluating this:
- The lexicon type of
Eprime
is the same as the lexicon type oftp::Option<(OpAdd, T, Box<Eprime>)>
. - The lexicon type of
tp::Option<(OpAdd, T, Box<Eprime>)>
is the same as that of(OpAdd, T, Box<Eprime>)
. - The lexicon type of
(OpAdd, T, Box<Eprime>)
is the same as that ofOpAdd
,T
, andBox<Eprime>
, and they must be the same (according to the trait implementation for tuples) - The lexicon type of
OpAdd
isTokenType
. - The lexicon type of
T
isTokenType
, which is the same as that ofOpAdd
. - The lexicon type of
Box<Eprime>
is the same as that ofEprime
. - We are already calculating the lexicon type of
Eprime
, resulting in overflow
To fix this, the example in the beginning of the book uses a separate struct Eplus
to avoid the loop:
#![allow(unused)] fn main() { #[derive_syntax] struct Eprime(tp::Option<Eplus>); #[derive_syntax] struct Eplus { op: OpAdd, _t: T, rest: Box<Eprime>, } }
In this case, the lexicon type of Eplus
does not depend on Eprime
, because
derive_syntax
generates an implementation for Eplus
instead of relying on
blanket trait implementation as in the tuple case.
Another place where this error can appear is in enums.
In the following example, Paren
must not be the first variant of the enum.
#![allow(unused)] fn main() { #[derive_syntax] enum F { Ident(Ident), Paren((ParenOpen, Box<E>, ParenClose)), } }
Remove Recursion with Utility Types
In the first example above, the grammar requires recursion in E'
to implement
a list-like structure (terms separated by +
). Teleparse provides utility types
like tp::Split
to simplify the implementation and provide out-of-the-box support
for panic recovery. Utility types are covered in Chapter 4.