SciLex
A header-only C++20 lexer built on REAL
Optional indentation layout: insert NEWLINE / INDENT / DEDENT tokens.
#include <cstddef>
#include <limits>
#include <span>
#include <stdexcept>
#include <string>
#include <vector>
#include "token.hpp"
Classes
| class | scilex::layout_error | Thrown when a line's indentation matches no enclosing level. |
Namespaces
| namespace | scilex | The SciLex public API (scilex::lexer, scilex::rule, scilex::token). |
Functions
| std::vector< token > | scilex::layout (std::span< const token > tokens, const std::vector< bool > &mode_significant={}) | Rewrites tokens with NEWLINE / INDENT / DEDENT inserted. |
Variables
| constexpr int | scilex::newline {std::numeric_limits<int>::min() + 1} | Reserved kind: end of a logical line. |
| constexpr int | scilex::indent {std::numeric_limits<int>::min() + 2} | Reserved kind: indentation increased (start of a deeper block). |
| constexpr int | scilex::dedent {std::numeric_limits<int>::min() + 3} | Reserved kind: indentation decreased (end of a block). |
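The three reserved kinds sit just above the smallest int, presumably so that they stay clear of the kinds a grammar defines for itself. The sketch below mirrors the documented values in a local namespace and adds a small predicate for spotting layout tokens. The namespace demo_kinds and the helper is_layout_kind are hypothetical names for illustration only; they are not part of SciLex.

```cpp
#include <limits>

namespace demo_kinds {

// Values copied from the documented definitions of the reserved kinds.
constexpr int newline = std::numeric_limits<int>::min() + 1;
constexpr int indent  = std::numeric_limits<int>::min() + 2;
constexpr int dedent  = std::numeric_limits<int>::min() + 3;

// Hypothetical helper: true for the three kinds inserted by the layout pass.
constexpr bool is_layout_kind(int kind) {
    return kind == newline || kind == indent || kind == dedent;
}

// The reserved kinds are pairwise distinct and all negative.
static_assert(newline != indent && indent != dedent && newline != dedent);
static_assert(newline < 0 && indent < 0 && dedent < 0);

}  // namespace demo_kinds
```

A parser consuming the output of the pass could use such a predicate to separate structural tokens from ordinary ones.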
Optional indentation layout: insert NEWLINE / INDENT / DEDENT tokens.
Some languages (Python-like, e.g. SciLang) make indentation significant. This opt-in pass turns a flat token stream into a layout-aware one: it inserts a scilex::newline at each logical line end, and scilex::indent / scilex::dedent where the leading indentation changes.
It works purely from token positions — every scilex::token already carries its source line and (byte) column — so the base lexer needs no change and may keep skipping whitespace. Lines with no token (blank or comment-only) carry no structure and are naturally ignored.
Indentation width is the byte column of a line's first token, so a tab and a space each count as one column. The pass does not police mixed tabs and spaces, and on its own it provides no implicit line continuation inside brackets.
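The positional idea can be sketched with a stack of enclosing indentation columns. What follows is a minimal illustration under stated assumptions, not the SciLex implementation: demo_layout::tok is a hypothetical stand-in for scilex::token, the value chosen for end_of_input is made up, the first line is assumed to set the base level, and the exact order in which the real pass emits NEWLINE and DEDENT may differ.

```cpp
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <vector>

namespace demo_layout {

// Hypothetical stand-in for scilex::token: only the fields this sketch needs.
struct tok {
    int kind;
    std::size_t line;    // source line of the token
    std::size_t column;  // byte column of the token's first byte
};

constexpr int newline = std::numeric_limits<int>::min() + 1;
constexpr int indent  = std::numeric_limits<int>::min() + 2;
constexpr int dedent  = std::numeric_limits<int>::min() + 3;
constexpr int end_of_input = -1;  // assumed value, for this sketch only

// Stand-in for scilex::layout_error.
struct layout_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Inserts newline / indent / dedent into an end-of-input-terminated sequence.
inline std::vector<tok> layout(const std::vector<tok>& in) {
    std::vector<tok> out;
    std::vector<std::size_t> levels;  // columns of the enclosing blocks
    bool have_line = false;           // has any logical line started yet?
    std::size_t cur_line = 0;

    for (const tok& t : in) {
        if (t.kind == end_of_input) {
            if (have_line) out.push_back({newline, t.line, t.column});
            while (levels.size() > 1) {  // close every block still open
                levels.pop_back();
                out.push_back({dedent, t.line, t.column});
            }
            out.push_back(t);  // the terminal token is preserved
            return out;
        }
        if (!have_line || t.line != cur_line) {  // first token of a line
            if (have_line) out.push_back({newline, t.line, t.column});
            if (levels.empty()) {
                levels.push_back(t.column);  // assumption: first line is the base
            } else if (t.column > levels.back()) {
                levels.push_back(t.column);
                out.push_back({indent, t.line, t.column});
            } else {
                while (levels.size() > 1 && t.column < levels.back()) {
                    levels.pop_back();
                    out.push_back({dedent, t.line, t.column});
                }
                if (t.column != levels.back())
                    throw layout_error("indentation matches no enclosing level");
            }
            have_line = true;
            cur_line = t.line;
        }
        out.push_back(t);
    }
    throw std::invalid_argument("sequence is not end-of-input-terminated");
}

}  // namespace demo_layout
```

Lines without tokens never appear in the input, so blank and comment-only lines are skipped without any special handling. A line whose first token sits at a column between two enclosing levels raises the error, which mirrors the documented purpose of scilex::layout_error.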
This pass is positional. With no significance policy it is mode-blind: every token shapes indentation, which reproduces the original behaviour byte for byte.
A per-mode significance policy (Layout Awareness Level A) lets a mode be marked insignificant, so that its tokens pass through without affecting layout. This is how a multi-line flow collection (examples/yaml.hpp) and implicit line continuation inside brackets (examples/python.hpp) avoid spurious structure. A mode marked insignificant must be self-delimited, that is, entered and left by its own tokens. Block scalars (| and >) and heredocs are a deeper case (they involve a reference indent in the frame); that is Layout Awareness Level B, still to come.
Two invariants hold:
(1) An empty policy yields the positional pass byte for byte, at zero cost.
(2) The mode is the single source of truth for the policy. There is no per-rule flag (e.g. ignore_layout); significance is derived from the mode, never beside it.
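The effect of a significance policy can be illustrated by asking which lines still contribute structure once a mode is marked insignificant. The sketch below is an assumption-laden illustration, not the SciLex code: it supposes that each token records the index of the mode that produced it (the mode field of demo_policy::tok is hypothetical), that the policy vector is indexed by that mode, and that a mode beyond the end of the vector counts as significant. How the real pass treats a line that begins inside an insignificant mode is not stated in this page; the sketch simply takes the first significant token on each line.

```cpp
#include <cstddef>
#include <vector>

namespace demo_policy {

// Hypothetical token: assumes each token records the mode it was lexed in.
struct tok {
    int kind;
    std::size_t mode;
    std::size_t line;
    std::size_t column;
};

// An empty policy makes every mode significant (the mode-blind pass).
// Assumption: modes not covered by the vector are significant as well.
inline bool shapes_layout(const std::vector<bool>& mode_significant,
                          std::size_t mode) {
    return mode >= mode_significant.size() || mode_significant[mode];
}

// Lines whose indentation would be examined: those holding at least one
// token from a significant mode.
inline std::vector<std::size_t> structural_lines(
        const std::vector<tok>& toks,
        const std::vector<bool>& mode_significant) {
    std::vector<std::size_t> lines;
    for (const tok& t : toks) {
        if (!shapes_layout(mode_significant, t.mode)) continue;
        if (lines.empty() || lines.back() != t.line) lines.push_back(t.line);
    }
    return lines;
}

}  // namespace demo_policy
```

For a bracketed list spread over lines 1 to 3 and lexed in mode 1, followed by an ordinary statement on line 4, the policy {true, false} leaves only lines 1 and 4 structural, while an empty policy keeps all four.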
Input must be an end-of-input-terminated token sequence (the lexer's eof_policy::append); the terminal scilex::end_of_input is preserved.
Definition in file layout.hpp.