SciLex
A header-only C++20 lexer built on REAL
layout.hpp File Reference

Optional indentation layout: insert NEWLINE / INDENT / DEDENT tokens.

#include <cstddef>
#include <limits>
#include <span>
#include <stdexcept>
#include <string>
#include <vector>
#include "token.hpp"

Classes

class  scilex::layout_error
 Thrown when a line's indentation matches no enclosing level.
 

Namespaces

namespace  scilex
 The SciLex public API (scilex::lexer, scilex::rule, scilex::token).
 

Functions

std::vector< token > scilex::layout (std::span< const token > tokens, const std::vector< bool > &mode_significant={})
 Rewrites tokens with NEWLINE / INDENT / DEDENT inserted.
 

Variables

constexpr int scilex::newline {std::numeric_limits<int>::min() + 1}
 Reserved kind: end of a logical line.
 
constexpr int scilex::indent {std::numeric_limits<int>::min() + 2}
 Reserved kind: indentation increased (start of a deeper block).
 
constexpr int scilex::dedent {std::numeric_limits<int>::min() + 3}
 Reserved kind: indentation decreased (end of a block).
 

Detailed Description

Optional indentation layout: insert NEWLINE / INDENT / DEDENT tokens.

Some languages (Python-like, e.g. SciLang) make indentation significant. This opt-in pass turns a flat token stream into a layout-aware one: it inserts a scilex::newline at each logical line end, and scilex::indent / scilex::dedent where the leading indentation changes.
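As a rough illustration of what a consumer can do with the three inserted kinds, the sketch below recovers the block depth of every ordinary token from a layout-aware stream of kinds. The namespace consumer_sketch and the function block_depths are hypothetical names, not part of SciLex; only the three constant values are taken from the Variables list of this file.

```cpp
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <vector>

namespace consumer_sketch {

// Same values as scilex::newline / indent / dedent in this file.
constexpr int newline = std::numeric_limits<int>::min() + 1;
constexpr int indent  = std::numeric_limits<int>::min() + 2;
constexpr int dedent  = std::numeric_limits<int>::min() + 3;

// Hypothetical helper: returns the nesting depth of each non-layout kind,
// in stream order. Layout kinds themselves produce no entry.
inline std::vector<std::size_t> block_depths(const std::vector<int>& kinds) {
    std::vector<std::size_t> depths;
    std::size_t depth = 0;
    for (int k : kinds) {
        if (k == indent) {
            ++depth;                       // a deeper block starts
        } else if (k == dedent) {
            if (depth == 0) throw std::logic_error("dedent without indent");
            --depth;                       // the innermost block ends
        } else if (k == newline) {
            // end of a logical line: no effect on depth
        } else {
            depths.push_back(depth);       // any other kind is an ordinary token
        }
    }
    return depths;
}

}  // namespace consumer_sketch
```

A parser would normally react to the kinds directly instead of materialising depths; the vector is only there to make the nesting visible.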

It works purely from token positions — every scilex::token already carries its source line and (byte) column — so the base lexer needs no change and may keep skipping whitespace. Lines with no token (blank or comment-only) carry no structure and are naturally ignored.

Indentation width is the byte column of a line's first token, so a tab and a space each count as one column. The pass does not police mixed tabs/spaces, and on its own (with no significance policy) it performs no implicit line continuation inside brackets.
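The positional idea described above can be sketched with a stack of enclosing columns. This is a minimal illustration under stated assumptions, not the code in layout.hpp: the struct tok, the function insert_layout, and the value chosen for end_of_input are hypothetical; std::runtime_error stands in for scilex::layout_error; and the sketch assumes the first line sets the base level and that open blocks are closed with dedents before end of input, neither of which this page states.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <vector>

namespace layout_sketch {

// Hypothetical stand-in for scilex::token: only the fields the pass reads.
struct tok {
    int kind;
    std::size_t line;
    std::size_t column;  // byte column of the token's first byte
};

// The three layout values match the Variables list of this file;
// the value of end_of_input is an assumption made for this sketch.
constexpr int end_of_input = std::numeric_limits<int>::min();
constexpr int newline = std::numeric_limits<int>::min() + 1;
constexpr int indent  = std::numeric_limits<int>::min() + 2;
constexpr int dedent  = std::numeric_limits<int>::min() + 3;

inline std::vector<tok> insert_layout(const std::vector<tok>& in) {
    // Precondition from the text: the input is end-of-input-terminated.
    assert(!in.empty() && in.back().kind == end_of_input);

    std::vector<tok> out;
    std::vector<std::size_t> levels;  // columns of the enclosing blocks
    std::size_t current_line = 0;

    for (const tok& t : in) {
        if (t.kind == end_of_input) {
            if (!levels.empty()) out.push_back({newline, t.line, t.column});
            while (levels.size() > 1) {  // assumption: close open blocks
                levels.pop_back();
                out.push_back({dedent, t.line, t.column});
            }
            out.push_back(t);  // the terminator is preserved
            break;
        }
        if (levels.empty()) {
            levels.push_back(t.column);  // assumption: first line is the base
        } else if (t.line != current_line) {
            out.push_back({newline, t.line, t.column});  // previous line ended
            if (t.column > levels.back()) {
                levels.push_back(t.column);
                out.push_back({indent, t.line, t.column});
            } else {
                while (levels.size() > 1 && t.column < levels.back()) {
                    levels.pop_back();
                    out.push_back({dedent, t.line, t.column});
                }
                if (t.column != levels.back())
                    throw std::runtime_error("indentation matches no enclosing level");
            }
        }
        current_line = t.line;
        out.push_back(t);
    }
    return out;
}

}  // namespace layout_sketch
```

Because only lines that contain a token ever reach the loop, blank and comment-only lines drop out without any special handling, which is the property the description relies on.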

This pass is positional. With no significance policy it is mode-blind: every token shapes indentation, which is byte-for-byte the original behaviour.

A per-mode significance policy (Layout Awareness Level A) lets a mode be marked insignificant, so its tokens pass through without affecting layout. This is how a multi-line flow collection (examples/yaml.hpp) and implicit line continuation inside brackets (examples/python.hpp) avoid spurious structure. A mode marked insignificant must be self-delimited (entered and left by its own tokens).

Block scalars (| and >) and heredocs are a deeper case, involving a reference indent in the frame; that is Layout Awareness Level B, still to come.

Two invariants hold:

(1) An empty policy gives byte-for-byte the positional pass, at zero cost.
(2) The mode is the single source of truth for the policy: there is no per-rule flag (e.g. ignore_layout); significance is derived from the mode, never beside it.
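To make the significance policy concrete, the sketch below shows one way a per-mode vector of flags could decide which tokens are allowed to open a logical line. Everything named here is hypothetical: the struct tok with a mode index, and the helpers shapes_layout and line_openers. In particular, it is an assumption that a token exposes the index of the mode that produced it, and that modes beyond the end of the vector count as significant.

```cpp
#include <cstddef>
#include <vector>

namespace policy_sketch {

// Hypothetical token view: position plus the index of the lexer mode that
// produced the token (an assumption, not the documented scilex::token).
struct tok {
    std::size_t line;
    std::size_t column;
    std::size_t mode;
};

// An empty policy makes every mode significant (the mode-blind pass).
// Modes past the end of the vector are treated as significant as well;
// that default is an assumption of this sketch.
inline bool shapes_layout(const std::vector<bool>& mode_significant,
                          std::size_t mode) {
    return mode >= mode_significant.size() || mode_significant[mode];
}

// Indices of the tokens that open a logical line: the first significant
// token on each source line. Insignificant tokens never open a line, so
// they cannot trigger a newline, indent, or dedent.
inline std::vector<std::size_t> line_openers(
        const std::vector<tok>& tokens,
        const std::vector<bool>& mode_significant) {
    std::vector<std::size_t> openers;
    bool seen = false;
    std::size_t current_line = 0;
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        if (!shapes_layout(mode_significant, tokens[i].mode)) continue;
        if (!seen || tokens[i].line != current_line) openers.push_back(i);
        seen = true;
        current_line = tokens[i].line;
    }
    return openers;
}

}  // namespace policy_sketch
```

With a bracketed list spread over several lines and the bracket mode marked insignificant, only the statement before the list and the statement after it open logical lines; with an empty policy every source line that has a token opens one, as in the purely positional pass.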

Input must be an end-of-input-terminated token sequence (the lexer's eof_policy::append); the terminal scilex::end_of_input is preserved.

Definition in file layout.hpp.