Lingo is a C++20 single-header library for building recursive descent parsers. Following a simplicity before performance philosophy, Lingo is intended for parsing small to medium-sized source files or structured text.
Lingo is currently under development. Expect strange things.
With Lingo, you can express grammars in a syntax resembling EBNF (Extended Backus–Naur Form) directly in C++ code using operator overloading. There are no external dependencies — just drop include/lingo.h into your project.
Here's an example grammar for parsing a list of numbers separated by commas:
<comma> = ","
<digit> = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
<number> = <digit>+
<num_list> = <number> (<comma> <number>)*
In Lingo:
lingo::rule comma(',');
lingo::rule digit('0', '9');
lingo::rule number = lingo::repeat(digit, 1); // at least one digit
lingo::rule num_list = number + lingo::repeat(comma + number, 0);You build grammars using lingo::rule and lingo::placeholder. Placeholders allow recursive grammars by acting as forward declarations.
| Form | Matches |
|---|---|
rule('a') |
single character a |
rule('a', 'z') |
any character in range a–z |
rule({'x','y','z'}) |
any character from the list |
rule("hello") |
the literal string hello |
| Operator | Meaning |
|---|---|
a + b |
sequence: a followed by b |
a | b |
alternation: a or b |
!a |
negation lookahead: succeeds if a does not match (does not consume input) |
repeat(r, min) |
repeat r at least min times |
repeat(r, min, max) |
repeat r between min and max times (0 = unlimited) |
<expression> = ["+"|"-"] <term> {("+"|"-") <term>}
<term> = <factor> {("*"|"/") <factor>}
<factor> = <number> | "(" <expression> ")"
In Lingo:
lingo::rule lparen('('), rparen(')');
lingo::rule sign({'+', '-'});
lingo::rule plus('+'), minus('-'), mul('*'), div('/');
lingo::rule digit('0', '9');
lingo::rule number = lingo::repeat(digit, 1);
lingo::placeholder expression_placeholder;
lingo::rule factor = number | (lparen + expression_placeholder + rparen);
lingo::rule term = factor + lingo::repeat((mul | div) + factor, 0);
lingo::rule expression =
lingo::repeat(sign, 0, 1) +
term +
lingo::repeat((plus | minus) + term, 0);
expression_placeholder.set(expression);To produce meaningful error messages, assign names to rules using the named constructor variants:
lingo::rule anon_digit('0', '9'); // anonymous
lingo::rule named_digit("digit", '0', '9'); // named
lingo::rule named_number("number", lingo::repeat(digit, 1));Only some rules need names — diagnostics reference the nearest named ancestor.
Lingo includes a lightweight source class that owns a string and tracks position, line, and column.
// From a string
lingo::source src("1 + 2 * 3");
// From a file
lingo::source src = lingo::source::from_file("data/example.txt");Parsing is done by constructing a lingo::grammar object from a rule, then calling one of its methods against a lingo::source.
lingo::source src("1 + 2 * 3");
lingo::grammar g(expression);
if (!g.check(src))
std::cerr << "parse failed\n";lingo::source src("1 + * 3");
std::string error;
if (!g.check(src, error))
std::cerr << error << '\n';
// unexpected '*', expected number at line 1 column 5Error messages include the unexpected character, the nearest named rule that was expected, and the line and column.
lingo::parse_node tree;
std::string error;
if (g.parse(src, tree, error)) {
// tree contains named nodes with source spans
}parse_node stores:
name— the rule name (only named rules appear in the tree)start/end— byte offsets into the sourcechildren— child named nodes
Helper methods on parse_node:
node.text(src); // matched source text
node.find("name"); // first direct child by name
node.all("name"); // all direct children by name
node.find_deep("name"); // first descendant DFS
node.collect("name"); // all descendants DFSLingo does not provide built-in evaluation. You call grammar::parse() to get a named parse tree, then walk it yourself. The samples/ directory contains two complete working examples that illustrate the two main approaches.
A recursive descent evaluator for arithmetic expressions with operator precedence, unary minus, and parentheses. It demonstrates the direct walk pattern: the parse tree is walked in a single recursive function without building any intermediate representation.
The grammar produces named nodes number, sign, addop, mulop, unary, factor, term, and expression. Because the grammar is left-recursive by construction, children of expression and term already alternate value/operator/value:
expression
├─ term ← left operand
├─ addop ("+")
└─ term ← right operand
The entire evaluator is a single eval() function that pattern-matches on the node name:
double eval(const parse_node &n, const source &src)
{
if (n.name == "number") return std::stod(n.text(src));
if (n.name == "factor") return eval(n.children[0], src);
if (n.name == "unary") return sign * eval(n.children[1], src);
// expression / term: fold [v, op, v, op, ...]
double result = eval(n.children[0], src);
for (size_t i = 1; i + 1 < n.children.size(); i += 2) { ... }
return result;
}This approach works well when the parse tree structure maps cleanly onto the computation — no intermediate AST is needed.
A tiny BASIC interpreter supporting LET, PRINT, IF/THEN, GOTO, and END. It demonstrates the two-phase pattern: first build an intermediate AST from the parse tree, then execute the AST.
An intermediate AST is necessary here because:
- Variables need to be resolved at runtime, not at parse time.
- GOTO needs a map from line number to program index, which requires all lines to be collected first.
The two phases are:
-
Parse → AST:
build_program_ast()walks the parse tree usingcollect("line"), then for each line usesfind("statement")/find("identifier")/all("expression")etc. to buildline_astandstatement_aststructs. -
Execute: A simple interpreter loop with a program counter (
pc), a variable map (vars), and a line-number index (index_by_line) for GOTO resolution.
Use the two-phase approach when you need to look ahead, resolve references across the tree, or separate parsing concerns cleanly from execution.
lingo
├── README.md ← This file
├── LICENSE ← MIT license
├── CMakeLists.txt ← Build configuration
├── include/
│ └── lingo.h ← Single header (drop this into your project)
├── samples/
│ ├── eval.cpp ← Arithmetic expression parser/evaluator
│ ├── eval-main.cpp ← CLI entry point
│ ├── basic.cpp ← Tiny BASIC interpreter
│ └── basic-main.cpp ← CLI entry point
├── tests/
│ ├── parse-tests.cpp ← Core parse tests
│ ├── basic-tests.cpp ← BASIC grammar tests
│ └── data/ ← Input files for tests
└── build/ ← CMake build output
Lingo is organized in four layers, each with a single responsibility:
┌─────────────────────────────────────────────────────────┐
│ rule / placeholder (grammar layer) │
│ User-facing combinators. Build a shared_ptr<node> │
│ tree via operator+, operator|, operator!, repeat(). │
│ rule describes structure — it never parses anything. │
├─────────────────────────────────────────────────────────┤
│ node hierarchy (grammar representation) │
│ literal, or_node, and_node, not_node, repeat_node, │
│ placeholder_node. Each node implements accept() for │
│ the visitor pattern. Nodes are ref-counted and shared │
│ across rules to avoid copying. │
├─────────────────────────────────────────────────────────┤
│ detail::parser (parsing engine) │
│ Implements node_visitor. Walks the grammar node tree │
│ against a source, tracks the furthest failure for │
│ error messages, and — when a root parse_node is │
│ supplied — incrementally builds the named parse tree │
│ with checkpoint/rollback on backtrack. │
├─────────────────────────────────────────────────────────┤
│ grammar (user-facing facade) │
│ Owns the root node (shared_ptr). Exposes check() and │
│ parse() which construct a detail::parser internally │
│ and run it. Users never touch detail::parser directly. │
└─────────────────────────────────────────────────────────┘
Key design decisions:
ruleis immutable after construction — combinators return new rules, never modify existing ones.- Only named nodes (
rule("name", ...)) appear in the parse tree. Unnamed nodes are transparent glue — they match input but produce no tree output. detail::parseris stateless between calls;grammar::check()/grammar::parse()construct one per invocation.placeholderbreaks recursive grammar cycles: it holds ashared_ptr<node>that is filled in after the recursive rule is defined.
No dependencies — GoogleTest is fetched automatically by CMake.
cmake -B build
cmake --build buildRun tests:
./bin/test-lingo| Option | Default | Description |
|---|---|---|
BUILD_TESTS |
ON |
Build the test executable |
BUILD_SAMPLES |
ON |
Build the sample executables |
./bin/sample-evaluator "1 + 2 * (3 + 4)"
./bin/sample-basic path/to/program.bas