The Constellation: A Porous Node Among Typed Models

The Adaptive Domain Models paper, collected in A Deeper Dive, describes a constellation of domain models, each typed by the structure of its domain. A model of rigid-body kinematics carries the grade types of projective geometric algebra; a model of spacetime dynamics carries the Lorentz structure of spacetime algebra; a dimensional finance model carries currency dimensions and the fractional-time dimension of volatility. Each is correct by construction in its domain, and each exchanges values with the others through a shared substrate that preserves those types.

A language component cannot be a citizen of that kind. The prior structure of language admits no compact formal specification, so there is no grade to type and no invariant to discharge. The entire section to this point has been an argument that this does not exile the language component from the constellation; it changes the terms of its membership. The component is precise by construction, through the arithmetic, and bounded by the compiler, through the constraint layer, and it shares the substrate, the provenance discipline, and the numeric format of every typed model around it. It is a citizen that wears no type but observes every other law. The right image for it is a porous node.

Porous in a specific sense

Porosity is the architectural property that makes the arrangement work, and it has a precise meaning here, drawn from the framework’s reactive resource model. A typed domain model is closed: its inputs and outputs carry types the system checks, its internal structure is fixed by its domain, and nothing enters or leaves except through typed ports. The language node is the opposite. It accepts unstructured input (natural-language intent, an underspecified goal, a partial program) and produces structured output that must then be checked. Its boundary is porous because meaning crosses it in both directions without a type to carry that meaning across, and the checking happens at the boundary rather than being guaranteed by the channel.

The porosity is bounded, and the containment sits at the boundary, in the grammar and the BAREWire contract. On the way out of the node, the grammar-constrained decoder holds whatever the node emits to syntactically valid Clef, a static artifact that stands at the boundary on its own. Semantic fit is held by the contract on the far side: the request crosses to each typed domain model over BAREWire, whose fixed layout both ends were built to read, carrying the model’s dimensional annotations, so a request that honors them lands and a mismatch surfaces at the message fabric. The node’s meaning may be wrong, which is a matter the typed models settle by their contracts, but the form it emits is always well-shaped Clef, so the typed models downstream take only what their types admit. Porosity in, structure out, with the form held by the grammar and the fit held by the BAREWire contract.
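The fit check at the boundary can be pictured with a small sketch. Everything in it is an assumption made for illustration: the two-field layout, the dimension tags, and the `encode` and `accept` helpers are hypothetical and are not BAREWire's actual format or API. It shows only the general idea of a fixed layout that both ends read and a receiver that admits a value only when its dimensional annotation matches the port.

```python
import struct

# Hypothetical fixed layout both ends agree on: u8 dimension tag, f64 magnitude.
LAYOUT = struct.Struct("<Bd")
DIMENSIONS = {"length_m": 1, "currency_usd": 2}  # illustrative tags only


def encode(dimension, value):
    """Pack a request into the fixed layout."""
    return LAYOUT.pack(DIMENSIONS[dimension], value)


def accept(message, expected_dimension):
    """Receiver side: admit the value only if its annotation matches the port."""
    if len(message) != LAYOUT.size:
        raise ValueError("malformed message: wrong size for the fixed layout")
    tag, value = LAYOUT.unpack(message)
    if tag != DIMENSIONS[expected_dimension]:
        raise TypeError("dimension mismatch at the boundary")
    return value


ok = accept(encode("length_m", 2.5), "length_m")
try:
    accept(encode("currency_usd", 2.5), "length_m")
    rejected = False
except TypeError:
    rejected = True
```

A well-formed message with the wrong dimension is still refused, which is the sense in which form and fit are held separately.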

This is why the language node can route among typed models without corrupting them. It translates unstructured intent into typed requests the domain models can satisfy, reads their typed responses, and composes them into a result, and at every point where its untyped interior meets their typed exterior, the boundary machinery supplies the structure the node itself cannot guarantee. The node is the system’s interface to the unstructured world; the typed models are its guarantees; and the boundary layer is the membrane that lets intent pass while holding form.

The boundary between emergent and constructed structure

The deepest reason to look closely at the language node is that it makes visible a boundary the whole framework is organized around: the line between structure that is constructed and structure that emerges. Inside a typed domain model, structure is constructed. It is present before any data arrives, enforced by types, discharged by the verifier, and exact. Inside the language node, structure is emergent. It is absent before training, approached only as the derived architecture’s optimization descends toward it, and sharp only to the degree the arithmetic allows. The constellation is a system with both kinds of structure in it, and the porous node is where they meet.

The framework’s claim is that this meeting is governed, not chaotic, because both kinds of structure reduce to the same mathematical object where they touch. The positional-encoding subsystem is the clearest instance, and it is the proof that the boundary is real and crossable.

One subsystem where the two structures are provably the same

A result from outside the framework, Puranik’s group-theoretic analysis of positional encodings and the related GRAPE work of Zhang et al., supplies the bridge. Any positional encoding satisfying linearity, translation invariance, and continuity must take the form of a one-parameter matrix group, the exponential of a fixed generator. The design space of positional encodings reduces to the choice of that generator, and the generators classify by canonical form: a generator with a real negative eigenvalue gives the exponential decay of linear-attention variants, a complex-pair generator gives the constant-frequency rotation of RoPE, a generator with both gives damped RoPE, and a defective nilpotent generator gives the linear-in-position behavior of ALiBi.
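The four canonical classes named above can be written out in the smallest dimension where each appears. This is a sketch in standard linear-algebra notation; the symbols $G$ (generator), $p$ (position), $\lambda$ (decay rate), and $\omega$ (frequency) are chosen here for illustration and are not taken from the cited works.

```latex
\[
R(p) = \exp(pG), \qquad R(p+q) = R(p)\,R(q), \qquad R(0) = I .
\]
\[
\begin{aligned}
G &= (-\lambda),\ \lambda > 0
  &&\Rightarrow\quad \exp(pG) = e^{-\lambda p}
  &&\text{(exponential decay)} \\
G &= \begin{pmatrix} 0 & -\omega \\ \omega & 0 \end{pmatrix}
  &&\Rightarrow\quad \exp(pG) = \begin{pmatrix} \cos\omega p & -\sin\omega p \\ \sin\omega p & \cos\omega p \end{pmatrix}
  &&\text{(constant-frequency rotation, RoPE)} \\
G &= \begin{pmatrix} -\lambda & -\omega \\ \omega & -\lambda \end{pmatrix}
  &&\Rightarrow\quad \exp(pG) = e^{-\lambda p} \begin{pmatrix} \cos\omega p & -\sin\omega p \\ \sin\omega p & \cos\omega p \end{pmatrix}
  &&\text{(damped rotation)} \\
G &= \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}
  &&\Rightarrow\quad \exp(pG) = I + pG = \begin{pmatrix} 1 & p \\ 0 & 1 \end{pmatrix}
  &&\text{(linear in position, the ALiBi class)}
\end{aligned}
\]
```

In the last case the series stops after the linear term because $G^2 = 0$.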

This is exactly the object the ADM substrate is built on. A rotor is the exponential of a bivector; the group action is the sandwich product; equivariance is the exactness of that one-parameter group under the algebra’s metric. The external classification, reached by generic linear algebra, is the grade decomposition of the generator reached by geometric algebra. RoPE is a blockwise bivector exponential. The external result’s non-interacting subspaces are the framework’s structural zeros, the provably-zero off-block entries of a block-diagonal exponential. And the external result’s unexplored defective class, the one its author marks as probably impractical, is the translator subgroup that projective geometric algebra represents natively, where a null bivector squares to zero and the translator truncates to its linear term.
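In geometric-algebra notation the same two cases read as follows. This is a sketch under one common sign convention (others differ by the sign of the exponent); $B$, $N$, $\theta$, and $d$ are symbols introduced here for illustration.

```latex
\[
B^2 = -1:\qquad
R(\theta) = \exp\!\left(-\tfrac{\theta}{2}B\right)
          = \cos\tfrac{\theta}{2} - B\,\sin\tfrac{\theta}{2},
\qquad x \mapsto R\,x\,\widetilde{R}
\]
\[
N^2 = 0:\qquad
T(d) = \exp\!\left(-\tfrac{d}{2}N\right) = 1 - \tfrac{d}{2}N
\]
```

The first line is the rotor and its sandwich action. The second is the translator, whose exponential series terminates at the linear term because the null bivector squares to zero.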

So positional encoding is a subsystem of an attention model where the structure is known with certainty before any training example arrives: it is a small, group-theoretically closed family characterized by a single graded generator. That is precisely the ADM admissibility condition. The generator can be typed as a graded element, the one-parameter-group constraint enforced at design time, and the encoding made grade-preserving, sparsity-stable, and exactly equivariant through forward-mode and quire training, with no claim made about the rest of the model. The constructed-structure discipline reaches exactly this far into the attention model and no further, and the external result marks where the boundary sits.

This is the boundary made concrete. The positional-encoding generator is typed and exact, a piece of constructed structure living inside a model whose bulk is emergent. The line between the two is not a vague gradient; it falls at a specific subsystem, on one side of which the framework’s type discipline applies in full and on the other side of which it deliberately does not. The porous node is the general case of this specific boundary: a model that is emergent in bulk, with islands of constructed structure where the domain admits a formal specification, and a deterministic membrane at its output where form is imposed even though meaning cannot be.

This subsystem is also where the two readings of the book, set out in Architecture and Arithmetic, stop being a matter of disposition and become a matter of proof. The book’s §5 derives a causal CRATE whose attention carries positional information; the common reading takes whatever positional encoding the model learns and trains it in floating point along with everything else. But positional encoding is precisely the component that the external group-theoretic result proves is a closed, graded family known before training. So here the framework-informed reading is not merely a different attitude toward the same artifact; it is demonstrably available where the common reading leaves structure on the table. A model built the common way is free to let the positional encoding’s block structure drift under floating-point training; the framework types the generator so the block structure is held within its decomposition. The divergence the section claims in general is, on this one subsystem, a theorem rather than a preference, which is why it is worth stating the book’s reading and the framework’s reading side by side exactly here.

Why the drift argument matters here

The constructed side is not merely tidier; it is the side that survives training. A fixed-frequency positional encoding is safe because its generator does not move. A learned, data-dependent generator of the kind the state-space lineage uses drifts under floating-point training, and the block structure that should stay separated suffers cross-block contamination, the same way a grade decomposition corrupts under imprecise arithmetic. This is not hypothetical drift in an abandoned design: Mamba-3, the current frontier of that lineage, reintroduces complex-valued (rotational) state transitions and bridges them explicitly to RoPE, which makes the rotational generator central to the newest models and makes its stability through training a live concern rather than an academic one. The type discipline plus the quire is what holds a learned generator inside its admissible decomposition through training, and the rotor-normalization obligation, discharged by the verifier, is designed to keep the rotational blocks exactly orthogonal where floating point would let them drift.
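The difference between holding a generator and holding the matrix it produces can be seen in a toy comparison. This sketch is an analogy only: it does not model floating-point rounding, the quire, or the verifier, and the gradients are arbitrary stand-in numbers. It shows that a 2x2 rotational block updated entry by entry leaves the rotation group, while the same block updated through its single generator parameter stays orthogonal by construction.

```python
import math


def rotation(theta):
    """2x2 rotation block: the exponential of the generator [[0, -theta], [theta, 0]]."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def orthogonality_error(m):
    """Largest absolute entry of (M^T M - I); zero for an exactly orthogonal block."""
    (a, b), (c, d) = m
    off_diagonal = a * b + c * d
    return max(abs(a * a + c * c - 1), abs(off_diagonal), abs(b * b + d * d - 1))


def step_entries(m, grad, lr):
    """Unconstrained update: every matrix entry moves independently."""
    return [[m[i][j] - lr * grad[i][j] for j in range(2)] for i in range(2)]


def step_generator(theta, grad_theta, lr):
    """Generator update: only the single rotation parameter moves."""
    return theta - lr * grad_theta


grad = [[0.3, -0.1], [0.2, 0.4]]  # arbitrary stand-in gradient, not from any model
free = rotation(0.5)
theta = 0.5
for _ in range(100):
    free = step_entries(free, grad, lr=0.01)
    theta = step_generator(theta, grad_theta=0.7, lr=0.01)

free_error = orthogonality_error(free)
typed_error = orthogonality_error(rotation(theta))
```

After the loop `free_error` is large while `typed_error` sits at rounding level, because the second block is rebuilt from its generator at every step.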

This generalizes to the whole constellation. The typed domain models hold their structure through training because the framework’s arithmetic and verifier keep it exact. The language node cannot hold structure that way, because it has no structure to hold, so it holds the next best thing: a sharp emergent structure, kept sharp by the same arithmetic, bounded by the same compiler. The constellation is coherent because every node, typed or porous, runs on the substrate that keeps structure from drifting, and the porous node simply has less structure to protect.

The sub-quadratic thread

The drift argument has a direct consequence, one of the framework’s clearest contrasts with the field. The reason attention is expensive is that full pairwise attention costs quadratically in sequence length: every token attends to every other. The sub-quadratic families the field has converged on, the linear-attention and state-space lineages, buy their lower cost by replacing full pairwise attention with a learned, data-dependent generator, the decay-and-rotation structure the positional-encoding result above classifies. The recurrence runs in linear time precisely because the generator summarizes the past instead of re-attending to all of it. This is not a settled or abandoned line of work; it is the live frontier of efficient sequence modeling as of 2026, with Mamba-3 (Cartesia, CMU, Princeton, and Together, ICLR 2026), xLSTM, RWKV-7, Gated DeltaNet, and Liquid AI’s structured-operator LFM family all active, and hybrid designs such as Jamba folding state-space layers into otherwise-transformer stacks.
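The trade between pairwise attention and a summarizing recurrence can be checked directly on a toy case. The sketch below assumes a single fixed scalar decay and no softmax, which is simpler than the data-dependent generators the models named above use; the function names and the sample data are illustrative. Both functions compute the same causal outputs, one by revisiting every earlier token and one by carrying a running state.

```python
def quadratic_form(q, k, v, decay):
    """y_t = sum over s <= t of decay^(t-s) * (q_t . k_s) * v_s, pairwise: O(T^2)."""
    out = []
    for t in range(len(q)):
        y = [0.0] * len(v[0])
        for s in range(t + 1):
            w = decay ** (t - s) * sum(a * b for a, b in zip(q[t], k[s]))
            y = [yi + w * vi for yi, vi in zip(y, v[s])]
        out.append(y)
    return out


def recurrent_form(q, k, v, decay):
    """Same outputs from a running state S_t = decay * S_{t-1} + k_t v_t^T: O(T)."""
    dk, dv = len(k[0]), len(v[0])
    state = [[0.0] * dv for _ in range(dk)]
    out = []
    for qt, kt, vt in zip(q, k, v):
        state = [
            [decay * state[i][j] + kt[i] * vt[j] for j in range(dv)]
            for i in range(dk)
        ]
        out.append([sum(qt[i] * state[i][j] for i in range(dk)) for j in range(dv)])
    return out


q = [[1.0, 0.5], [0.2, -1.0], [0.7, 0.3], [-0.4, 0.9]]
k = [[0.3, 0.1], [-0.6, 0.8], [1.1, -0.2], [0.5, 0.5]]
v = [[1.0, 0.0, 2.0], [0.5, -1.0, 0.3], [0.0, 0.7, -0.4], [1.2, 0.1, 0.9]]

pairwise = quadratic_form(q, k, v, decay=0.9)
running = recurrent_form(q, k, v, decay=0.9)
max_diff = max(
    abs(a - b) for row_p, row_r in zip(pairwise, running) for a, b in zip(row_p, row_r)
)
```

The state has a fixed size regardless of sequence length, which is where the linear cost comes from.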

Where that frontier has arrived is the point for the framework. Mamba-3’s central advance is the reintroduction of complex-valued state transitions, a move the lineage had deliberately abandoned: complex dynamics were found empirically unhelpful for language modeling and phased out after Mamba-1, and Mamba-3 is, by its authors’ account, the first modern recurrent model to bring them back. It does so by way of a theoretical bridge connecting the complex-valued state transition to data-dependent rotary position embeddings, RoPE. Read against the positional-encoding analysis above, this is the field arriving, empirically and from the state-space side, at precisely the object the framework types from the geometric-algebra side. The complex-pair generator that Mamba-3 found to give it expressivity and state-tracking ability is the rotor, the grade-2 bivector exponential, and the RoPE bridge it constructs is the blockwise rotor action the framework already names. The convergence is exact: two derivations, one by experiment and one by type, on the same generator.
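The link between a rotating complex state and rotary position embeddings can be illustrated in the simplest possible setting. This sketch assumes one complex feature and one fixed frequency `THETA`; it is not Mamba-3's construction, which is data-dependent, and the helper names are hypothetical. It checks two things: that causal sums of RoPE-rotated scores equal the readout of a complex state rotated once per step, and that a RoPE score depends only on relative position.

```python
import cmath

THETA = 0.3  # illustrative rotation frequency, an assumption of this sketch


def rope(z, pos):
    """Rotate a complex feature by an angle proportional to its position."""
    return z * cmath.exp(1j * THETA * pos)


def rope_scores(q, k):
    """Causal sums of RoPE-rotated scores, computed pairwise."""
    return [
        sum((rope(q[t], t) * rope(k[s], s).conjugate()).real for s in range(t + 1))
        for t in range(len(q))
    ]


def recurrent_scores(q, k):
    """Same sums read out from a complex state that is rotated once per step."""
    step = cmath.exp(1j * THETA)
    h = 0j
    out = []
    for qt, kt in zip(q, k):
        h = step * h + kt.conjugate()  # state accumulates conjugated keys
        out.append((qt * h).real)
    return out


q = [1 + 2j, 0.5 - 1j, -0.3 + 0.8j, 2 - 0.1j]
k = [0.7 - 0.2j, -1 + 1j, 0.4 + 0.4j, 1.5 + 0j]
max_gap = max(abs(a - b) for a, b in zip(rope_scores(q, k), recurrent_scores(q, k)))

# Relative-position property: shifting both positions leaves the score unchanged.
shift_gap = abs(
    (rope(q[0], 5) * rope(k[1], 2).conjugate()).real
    - (rope(q[0], 105) * rope(k[1], 102).conjugate()).real
)
```

The per-step multiplier `step` is the exponential of the complex-pair generator, so the recurrence and the positional rotation are the same one-parameter group seen from two sides.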

That convergence is the main point, and it carries a sharper one that should be stated with care. The lineage discarded complex dynamics because, in floating-point training without a structural account, they were hard to make useful; it rediscovered them a model generation later by trial, and now reaches for RoPE to stabilize them. The framework holds those same complex-rotational dynamics for a principled reason the empirical path does not supply: the generator’s grade structure says the rotational component belongs, and typing it is designed to keep it inside its admissible decomposition through training rather than relying on the optimizer to keep it useful. Where the field phased complex dynamics out, struggled, and phased them back in, the framework’s reading explains why they were always the right structure and supplies the substrate designed to make them stable. This is not picking up discarded art; it is providing the formal account for the thing the field has only just rediscovered.

Two open problems the newest work still names are the two the framework’s substrate addresses by construction. The first is state tracking: linear models have historically failed tasks like parity and arithmetic, and Mamba-3’s complex dynamics are aimed squarely at that weakness. A generator typed by its grade structure and kept exact through training is a more direct route to the state tracking the rotational component is meant to provide, because the structure that enables it does not degrade. The second is the gap the Mamba-3 authors state plainly, that theoretically linear inference remains hardware-inefficient in practice. This is the framework’s home ground: a typed, known-sparse generator lowered through the Program Hypergraph has its computation shape fixed at compile time, which is designed to close the distance between linear-in-theory and efficient-in-hardware.

The framework’s reading gets the speed and the exactness together. The same typed generator that makes the positional encoding exact is the sub-quadratic mechanism: typing it as a graded element, holding its decomposition through training with forward-mode and the quire, and discharging the rotor-normalization obligation with the verifier is designed to produce a linear-time recurrence whose generator stays inside its decomposition. The sub-quadratic cost comes from the same structure the field uses; the exactness comes from typing that structure instead of hoping it survives. This is a constructive claim, not a contrast alone: the framework does not propose an alternative to sub-quadratic attention, it proposes the sub-quadratic mechanism the field has converged on, built so the structure it depends on is a fact rather than an aspiration.

The payoff compounds with the rest of the section’s economy. A sub-quadratic model is cheaper in sequence length; a derived architecture is smaller in parameters; and a typed, known-sparse structure runs on simpler hardware because its computation shape is fixed at compile time rather than discovered at runtime. Sub-quadratic, small, and structurally simple are three independent reductions, and they multiply: a model that is linear in context, lean in parameters, and sparse in structure asks far less of its hardware than a dense quadratic transformer of comparable competence. That compound is the hardware argument Why Adaptive Domain Models makes for the typed domain models, carried now into the language node’s own attention mechanism.

What the arrangement buys

A constellation of typed domain models alone cannot talk to the unstructured world; it has no surface for natural-language intent or underspecified goals. A language model alone can talk to that world but guarantees nothing about what it produces. The arrangement gives each what it lacks. The typed models gain an interface to intent, a node that translates the unstructured into typed requests they can satisfy. The language node gains guarantees it cannot supply itself, because its output is bounded by the compiler and its numeric behavior is held by the substrate, and because the typed models it routes to are correct by construction in their domains. A request the language node cannot itself verify becomes a request to a dimensional finance model that will reject a currency-dimension error at the type level, or to a kinematics model whose equivariance is SMT-discharged. The node does not need to be correct about the domain; it needs to route to the model that is.
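The division of labor described above can be sketched in a few lines. All names here (`Money`, `finance_model`, `ROUTES`, `route`) are hypothetical, and the runtime check in `__add__` is only a stand-in for what the framework is described as enforcing at the type level. The point is that the router makes no judgment about the domain; the node it routes to refuses the malformed request.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Money:
    amount: float
    currency: str

    def __add__(self, other):
        # Runtime stand-in for a type-level currency-dimension check.
        if not isinstance(other, Money) or other.currency != self.currency:
            raise TypeError("currency-dimension mismatch")
        return Money(self.amount + other.amount, self.currency)


def finance_model(request):
    """Hypothetical typed node: sums amounts, refusing mixed currencies."""
    total = request[0]
    for item in request[1:]:
        total = total + item
    return total


ROUTES = {"finance": finance_model}  # hypothetical routing table


def route(domain, request):
    """The router only hands the request to the node that can check it."""
    return ROUTES[domain](request)


good = route("finance", [Money(10.0, "USD"), Money(5.0, "USD")])
try:
    route("finance", [Money(10.0, "USD"), Money(5.0, "EUR")])
    caught = False
except TypeError:
    caught = True
```

A request the router cannot itself verify is still safe to forward, because the rejection happens inside the node that owns the dimension.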

This is the paradigm. Intelligence in this picture is not a single large model that is trusted because it is large. It is a constellation of models, most of them correct by construction in narrow domains, one of them porous and emergent and bounded, composed so that the porous node handles the unstructured world and the typed nodes handle the parts where correctness can be guaranteed. The language model is essential, yet it is not load-bearing for correctness, which is exactly the role a component without a formal specification should play.

Open questions

Whether the routing the porous node performs can itself be made reliable enough, given that the node’s interior is emergent and only its boundary is bounded, is the central practical question and the one this section leaves most open.

Whether the islands of constructed structure inside the language model extend beyond positional encoding, and whether other attention subsystems admit a compact formal specification the way positional encoding does, is a research direction the positional-encoding analysis opens rather than closes.

Whether a typed domain model and the porous node can share more than substrate, for instance a common geometric representation at their interface, or whether the boundary must always be mediated by the deterministic layer, is the question the reversibility article takes up from one specific angle.