The Constellation

The Model Constellation

The Adaptive Domain Models paper, collected in A Deeper Dive, describes a constellation of domain models, each typed by the structure of its domain. A model of rigid-body kinematics carries the grade types of projective geometric algebra; a model of spacetime dynamics carries the Lorentz structure of spacetime algebra; a dimensional finance model carries currency dimensions and the fractional-time dimension of volatility. Each is correct by construction in its domain, and each exchanges values with the others through a shared substrate that preserves those types.

module Kinematics =
    [<Measure>] type m
    [<Measure>] type s

    type Point = Grade1<Pga, m>          // a dimensioned position
    type Motor = Even<Pga>               // a screw motion
    type Twist = Grade2<Pga, m/s>        // a velocity bivector

    type Step  = { pose: Point; twist: Twist; dt: float<s> }
    type Model = DomainModel<Step, Point>

    let act (g: Motor) (p: Point) : Point = sandwich g p   // grade-preserving sandwich

type RelativityStep = { frame: Even<Sta>; event: Vector<Sta> }   // Lorentz (1,3)

[<Measure>] type USD
[<Measure>] type yr
type OptionQuote<[<Measure>] 'ccy> =
    { spot: float<'ccy>; vol: float<'ccy^0 / yr^(1/2)> }   // volatility carries yr^-(1/2)

A language component cannot be a citizen of that kind. The prior structure of language admits no compact formal specification, so there is no grade to type and no invariant to discharge. This does not exile the language component from the constellation. It changes the terms of its membership. The component is precise by construction, through the arithmetic, and bounded by the compiler, through the constraint layer, and it shares the substrate, the provenance discipline, and the numeric format of every domain model around it. It is a citizen that wears no type but observes every other law. We model it as a porous node.

Porous in a specific sense

Porosity has a precise meaning here, the open-loop porosity set out in our structured recurrence. A well-structured domain model is closed: its inputs and outputs carry types the system checks, its internal structure is fixed by its domain, and nothing enters or leaves except through structured ports. The language node is the opposite. It accepts unstructured input (natural-language intent, an underspecified goal, a partial program) and produces structured output that must then be checked. Its boundary is porous because meaning crosses it in both directions without a type to carry that meaning across, and the checking happens at the boundary rather than being guaranteed by the channel.

The porosity is bounded, and the containment sits at the boundary, in the grammar and the BAREWire contract. On the way out of the node, the grammar-constrained decoder holds whatever the node emits to syntactically valid Clef, a static artifact that stands at the boundary on its own. Semantic fit is held by the contract on the far side. The request crosses to each domain model over BAREWire, whose fixed layout both ends were built to read, and it carries the model’s dimensional annotations, so it reaches a model built to honor them and a mismatch surfaces at the message fabric. The node’s meaning may be wrong, a matter the domain models settle by their contracts, but the form it emits is always well-shaped Clef, so the models downstream take only what their types admit.
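The way a grammar can hold form while leaving meaning open can be sketched with a toy decoder. This is a minimal illustration under loud assumptions: the grammar is balanced parentheses rather than Clef, and `allowed`, `decode`, `is_balanced`, and `wants_to_close` are hypothetical names, not part of any Fidelity or BAREWire API. The scorer stands in for the node's emergent preferences; the mask is the static artifact at the boundary.

```python
END = "<end>"

def allowed(prefix, max_len):
    """Tokens that keep `prefix` completable to a balanced string of at most max_len symbols."""
    depth = prefix.count("(") - prefix.count(")")
    remaining = max_len - len(prefix)
    options = []
    if depth == 0 and prefix:
        options.append(END)              # a non-empty balanced prefix may stop
    if remaining - 1 >= depth + 1:
        options.append("(")              # only open if there is room to close again
    if depth > 0:
        options.append(")")
    return options

def decode(score, max_len=6):
    """Greedy decoding: the scorer ranks tokens, the grammar decides which are eligible.

    Assumes max_len >= 2 so that the first step always has an eligible token.
    """
    prefix = ""
    while True:
        options = allowed(prefix, max_len)
        token = max(options, key=lambda tok: score(prefix, tok))
        if token == END:
            return prefix
        prefix += token

def is_balanced(text):
    """Independent check of the form the grammar is meant to guarantee."""
    depth = 0
    for ch in text:
        depth += 1 if ch == "(" else -1
        if depth < 0:
            return False
    return depth == 0

def wants_to_close(prefix, token):
    """A scorer that always prefers ')' even where that would be ill-formed."""
    return {")": 3, "(": 2, END: 1}[token]
```

Whatever the scorer prefers, the output is well-formed, because ill-formed continuations are never eligible. Whether the well-formed string means what was intended is a separate matter that the mask does not address.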

This is why the language node can route among domain models without corrupting them. It translates unstructured intent into structured requests the domain models can satisfy, reads their structured responses, and composes them into a result. At every point where its emergent interior meets a domain model’s structured exterior, the grammar and the BAREWire contract impose the form the node cannot promise on its own. The node is our system’s interface to the unstructured world. The domain models carry the guarantees. The boundary layer is the membrane that lets intent pass while holding form.

The boundary between emergent and constructed structure

The language node makes visible a boundary our framework is organized around: the line between structure that is constructed and structure that emerges. Inside a domain model, structure is constructed. It is present before any data arrives, enforced by types, discharged by the verifier, and exact. Inside the language node, structure is emergent. It is absent before training, descended toward by the derived architecture’s optimization, and sharp only to the degree the arithmetic allows. The constellation is a system with both kinds of structure in it, and the porous node is where they meet.

Our claim is that this meeting is governed, not chaotic, because both kinds of structure reduce to the same mathematical object where they touch. The positional-encoding subsystem is where this reduction is provable, which makes the boundary real and crossable.

Proven convergence at one subsystem

A result from outside the framework, Puranik’s group-theoretic analysis of positional encodings and the related GRAPE work of Zhang et al., supplies the bridge. Any positional encoding satisfying linearity, translation invariance, and continuity must take the form of a one-parameter matrix group, the exponential of a fixed generator. The design space of positional encodings reduces to the choice of that generator, and the generators classify by canonical form: a generator with a real negative eigenvalue gives the exponential decay of linear-attention variants, a complex-pair generator gives the constant-frequency rotation of RoPE, a generator with both gives damped RoPE, and a defective nilpotent generator gives the linear-in-position behavior of ALiBi.
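The four canonical classes can be checked numerically on 2x2 generators. The sketch below is an illustration only: the generator values are arbitrary assumptions, and `mat_exp` is a plain truncated power series, not a method from the cited papers. It confirms that each generator's exponential has the stated behavior and that the family obeys the one-parameter group law exp((s+t)A) = exp(sA) exp(tA).

```python
import math

def mat_mul(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_exp(gen, t, terms=40):
    """exp(t * gen) for a 2x2 generator, by truncated power series."""
    scaled = [[t * gen[i][j] for j in range(2)] for i in range(2)]
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for n in range(1, terms):
        term = [[x / n for x in row] for row in mat_mul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result

def close(a, b, tol=1e-9):
    """Entrywise comparison of two 2x2 matrices."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

RATE, FREQ = 0.5, 0.7                        # illustrative values, not from the source
DECAY     = [[-RATE, 0.0], [0.0, -RATE]]     # real negative eigenvalue
ROTATION  = [[0.0, -FREQ], [FREQ, 0.0]]      # complex-conjugate pair
DAMPED    = [[-RATE, -FREQ], [FREQ, -RATE]]  # both at once
NILPOTENT = [[0.0, 1.0], [0.0, 0.0]]         # defective: squares to zero

def expected(kind, t):
    """Closed forms for the four classes at position t."""
    c, s, d = math.cos(FREQ * t), math.sin(FREQ * t), math.exp(-RATE * t)
    return {
        "decay": [[d, 0.0], [0.0, d]],
        "rotation": [[c, -s], [s, c]],
        "damped": [[d * c, -d * s], [d * s, d * c]],
        "nilpotent": [[1.0, t], [0.0, 1.0]],   # linear in position
    }[kind]
```

The nilpotent case is the one where the series terminates after its linear term, which is why its effect grows linearly with position rather than oscillating or decaying.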

This is exactly the object our ADM substrate is built on. A rotor is the exponential of a bivector; the group action is the sandwich product; equivariance is the exactness of that one-parameter group under the algebra’s metric. The external classification, reached by generic linear algebra, is the grade decomposition of the generator reached by geometric algebra. RoPE is a blockwise bivector exponential. The external result’s non-interacting subspaces are our structural zeros, the provably-zero off-block entries of a block-diagonal exponential. And the external result’s unexplored defective class, the one its author marks as probably impractical, is the translator subgroup that projective geometric algebra represents natively, where a null bivector squares to zero and the translator truncates to its linear term.

The Clef here is illustrative of the idiom rather than a finalized API surface.

// The rotor case: a block-diagonal bivector generator.
let rotor (b: BlockDiagonal<Bivector<Pga>>) : Even<Pga> =
    Ga.exp b                                 // block-diagonal in, block-diagonal out; off-block has no storage

// The defective case PGA carries natively: a null bivector.
let translate (n: NullBivector<Pga>) : Motor<Pga> =
    Motor.one + n                            // n*n = 0, so exp(n) = 1 + n exactly
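The blockwise reading of RoPE can also be checked outside Clef. The Python sketch below is an assumption-laden illustration: `rope`, the frequency list, and the vectors are hypothetical. It stores one angle per 2x2 block and nothing for the off-block entries, then relies on the two properties a blockwise rotation should have: it preserves norms, and the inner product of a rotated query and key depends only on their position offset.

```python
import math

def rope(vec, pos, freqs):
    """Rotate each consecutive pair of coordinates by the angle pos * freq.

    Only one frequency per block is stored; entries that would couple
    different blocks are never materialized.
    """
    out = []
    for block, freq in enumerate(freqs):
        x, y = vec[2 * block], vec[2 * block + 1]
        c, s = math.cos(pos * freq), math.sin(pos * freq)
        out += [c * x - s * y, s * x + c * y]
    return out

def dot(a, b):
    """Euclidean inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

FREQS = [1.0, 0.1]                  # illustrative per-block frequencies
QUERY = [0.3, -1.2, 0.8, 0.5]
KEY   = [1.1, 0.4, -0.7, 0.9]

# Same offset (4) at two different absolute positions.
near = dot(rope(QUERY, 3, FREQS), rope(KEY, 7, FREQS))
far  = dot(rope(QUERY, 10, FREQS), rope(KEY, 14, FREQS))
```

Here `near` and `far` agree up to rounding, which is the translation invariance the group-theoretic classification starts from.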

So positional encoding is a subsystem of an attention model where the structure is known with certainty before any training example arrives: it is a small, group-theoretically closed family characterized by a single graded generator. That is the ADM admissibility condition. The generator can be typed as a graded element, the one-parameter-group constraint enforced at design time, and the encoding made grade-preserving, sparsity-stable, and exactly equivariant through forward-mode and quire training, with no claim made about the rest of the model. The constructed-structure discipline reaches this far into the attention model and no further, and the external result marks where the boundary sits.

This is the boundary made concrete. The positional-encoding generator is typed and exact, a piece of constructed structure living inside a model whose bulk is emergent. The line between the two falls at a specific subsystem, on one side of which the framework’s type discipline applies in full and on the other side of which it deliberately does not. The porous node is the general case of this specific boundary: a model that is emergent in bulk, with islands of constructed structure where the domain admits a formal specification, and a deterministic membrane at its output where form is imposed even though meaning cannot be.

This subsystem turns the two readings of the book, set out in Architecture and Arithmetic, from a matter of disposition into a matter of proof. The book’s §5 derives a causal CRATE whose attention carries positional information; the common reading takes whatever positional encoding the model learns and trains it in floating point along with everything else. But positional encoding is the component that the external group-theoretic result proves is a closed, graded family known before training. So here our reading is not merely a different attitude toward the same artifact; it is demonstrably available where the common reading leaves structure on the table. In a model built the common way, nothing holds the positional encoding’s block structure, so it drifts under floating-point training; our reading types the generator so the block structure is held within its decomposition. On this one subsystem the two readings differ provably rather than by preference.

Drift and the learned generator

Constructed structure holds through training where emergent structure would drift. A fixed-frequency positional encoding is safe because its generator does not move. A learned, data-dependent generator of the kind the state-space lineage uses drifts under floating-point training, and the block structure that should stay separated suffers cross-block contamination, the same way a grade decomposition corrupts under imprecise arithmetic. This is not hypothetical drift in an abandoned design: Mamba-3 (Lahoti et al.), the current frontier of that lineage, reintroduces complex-valued (rotational) state transitions and bridges them explicitly to RoPE, which makes the rotational generator central to the newest models and makes its stability through training a live concern rather than an academic one. The type discipline plus the quire is what holds a learned generator inside its admissible decomposition through training, and the rotor-normalization obligation, discharged by the verifier, is designed to keep the rotational blocks exactly orthogonal where floating point would let them drift.
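A simplified way to see why the parameterization matters is sketched below, under stated assumptions. It does not model floating-point rounding, the quire, or the verifier. It only contrasts two ways of training a single 2x2 rotational block against the same fixed stand-in gradient `GRAD`: updating the four matrix entries freely, and updating only the generator coefficient `theta`. All names are hypothetical.

```python
import math

def rotation(theta):
    """The 2x2 rotation generated by angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def orthogonality_error(r):
    """Largest absolute entry of R^T R - I."""
    rtr = [[sum(r[k][i] * r[k][j] for k in range(2)) for j in range(2)]
           for i in range(2)]
    return max(abs(rtr[i][j] - (1.0 if i == j else 0.0))
               for i in range(2) for j in range(2))

def step_entries(r, grad, lr):
    """Unconstrained update: all four entries are free parameters."""
    return [[r[i][j] - lr * grad[i][j] for j in range(2)] for i in range(2)]

def step_generator(theta, grad, lr):
    """Constrained update: the chain rule pushes the gradient onto theta alone."""
    c, s = math.cos(theta), math.sin(theta)
    d_rot = [[-s, -c], [c, -s]]          # dR/dtheta
    d_theta = sum(grad[i][j] * d_rot[i][j] for i in range(2) for j in range(2))
    return theta - lr * d_theta

GRAD = [[0.3, -0.1], [0.2, 0.4]]         # fixed stand-in gradient, not from the source

def run(steps=100, lr=0.01, theta0=0.3):
    """Return (error of the free block, error of the generator-typed block)."""
    free, theta = rotation(theta0), theta0
    for _ in range(steps):
        free = step_entries(free, GRAD, lr)
        theta = step_generator(theta, GRAD, lr)
    return orthogonality_error(free), orthogonality_error(rotation(theta))
```

The free block leaves the rotation group after a handful of steps, while the block rebuilt from its generator stays orthogonal to within rounding however far `theta` moves. Rounding-level drift in a learned generator is a finer effect than this sketch shows.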

This generalizes to the whole constellation. The domain models hold their structure through training because our arithmetic and verifier keep it exact. The language node cannot hold structure that way, because it has no structure to hold, so it holds the next best thing: a sharp emergent structure, kept sharp by the same arithmetic, bounded by the same compiler. The constellation is coherent because every node, structured or porous, runs on the substrate that keeps structure from drifting, and the porous node has less structure to protect.

The sub-quadratic thread

The drift argument has a direct consequence, one of our clearest contrasts with the field. The reason attention is expensive is that full pairwise attention costs quadratically in sequence length: every token attends to every other. The sub-quadratic families the field has converged on, the linear-attention and state-space lineages, buy their lower cost by replacing full pairwise attention with a learned, data-dependent generator, the decay-and-rotation structure the positional-encoding result above classifies. The recurrence runs in linear time because the generator summarizes the past instead of re-attending to all of it. This is not a settled or abandoned line of work; it is the live frontier of efficient sequence modeling as of 2026, with Mamba-3 (Cartesia, CMU, Princeton, and Together, ICLR 2026), xLSTM, RWKV-7, Gated DeltaNet, and Liquid AI’s structured-operator LFM family all active, and hybrid designs such as Jamba folding state-space layers into otherwise-transformer stacks.
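The cost difference can be made concrete with a minimal sketch of linear attention with a scalar decay, assuming no softmax and scalar values for brevity. The names and the data are illustrative. `pairwise_readout` re-reads every earlier step, so its work grows quadratically with length; `recurrent_readout` carries a fixed-size state forward, so its work grows linearly. The two compute the same outputs.

```python
def dot(a, b):
    """Euclidean inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

def pairwise_readout(queries, keys, values, decay):
    """y_t = sum over s <= t of decay^(t-s) * (q_t . k_s) * v_s, computed pair by pair."""
    outputs = []
    for t, q in enumerate(queries):
        total = 0.0
        for s in range(t + 1):
            total += decay ** (t - s) * dot(q, keys[s]) * values[s]
        outputs.append(total)
    return outputs

def recurrent_readout(queries, keys, values, decay):
    """Same outputs from a running state: S_t = decay * S_(t-1) + k_t * v_t, y_t = q_t . S_t."""
    state = [0.0] * len(keys[0])
    outputs = []
    for q, k, v in zip(queries, keys, values):
        state = [decay * s_i + k_i * v for s_i, k_i in zip(state, k)]
        outputs.append(dot(q, state))
    return outputs

QUERIES = [[0.2, 1.0], [0.5, -0.3], [1.1, 0.4], [-0.6, 0.9]]
KEYS    = [[1.0, 0.0], [0.3, 0.7], [-0.2, 0.5], [0.8, -0.1]]
VALUES  = [1.0, -2.0, 0.5, 3.0]
```

The state here is a vector the size of one key, independent of how many steps have been seen, which is the sense in which the generator summarizes the past.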

The frontier has arrived at a structure our framework already accounts for. Mamba-3’s central advance is the reintroduction of complex-valued state transitions, which the lineage had deliberately abandoned: complex dynamics were found empirically unhelpful for language modeling and phased out after Mamba-1, and Mamba-3 is, by its authors’ account, the first modern recurrent model to bring them back. It does so by way of a theoretical bridge connecting the complex-valued state transition to data-dependent rotary position embeddings, RoPE. Read against the positional-encoding analysis above, this is the field arriving, empirically and from the state-space side, at the object our framework types from the geometric-algebra side. The complex-pair generator that Mamba-3 found to give it expressivity and state-tracking ability is the rotor, the grade-2 bivector exponential, and the RoPE bridge it constructs is the blockwise rotor action our framework already names. The convergence is exact: two derivations, one by experiment and one by type, arrive at the same generator.
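The identity underneath that bridge is elementary and can be sketched directly. What follows is a generic illustration, not Mamba-3's formulation: a complex state updated by a data-dependent unit phase is the same computation as a real 2x2 rotation acting on a two-component state, and unrolling it moves the rotation onto the inputs and the readout as a cumulative phase, which is the rotary pattern. Function names and data are assumptions.

```python
import cmath
import math

def complex_scan(angles, inputs):
    """h_t = exp(i * theta_t) * h_(t-1) + x_t with a complex state."""
    h, out = 0j, []
    for theta, x in zip(angles, inputs):
        h = cmath.exp(1j * theta) * h + x
        out.append(h)
    return out

def rotation_scan(angles, inputs):
    """The same recurrence written as a real 2x2 rotation of the pair (a, b)."""
    a, b, out = 0.0, 0.0, []
    for theta, x in zip(angles, inputs):
        c, s = math.cos(theta), math.sin(theta)
        a, b = c * a - s * b + x.real, s * a + c * b + x.imag
        out.append((a, b))
    return out

def rotary_form(angles, inputs):
    """Unrolled form: counter-rotate each input by its cumulative phase, sum, rotate back."""
    phase, acc, out = 0.0, 0j, []
    for theta, x in zip(angles, inputs):
        phase += theta
        acc += cmath.exp(-1j * phase) * x
        out.append(cmath.exp(1j * phase) * acc)
    return out

ANGLES = [0.4, -1.3, 2.2, 0.7, -0.5]                      # data-dependent in a real model
INPUTS = [1 + 0j, 0.5 - 1j, -2 + 0.3j, 0.1 + 0.1j, 1j]
```

All three produce the same trajectory up to rounding. The third form is where position-like rotations appear, with the angle accumulated from the data rather than fixed per position.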

The convergence has an implication beyond the matching itself. The lineage discarded complex dynamics because, in floating-point training without a structural account, they were hard to make useful. It rediscovered them a model generation later by trial, and now reaches for RoPE to stabilize them. Our framework holds those same complex-rotational dynamics for a principled reason the empirical path does not supply: the rotational component is a graded part of the generator, and typing it is designed to keep it inside its admissible decomposition through training rather than relying on the optimizer to keep it useful. Where the field phased complex dynamics out, struggled, and phased them back in, our reading explains why they were always the right structure and supplies the substrate that makes them stable. We supply the formal account for a structure the field abandoned and then rediscovered by trial.

Two open problems the newest work still names are the two our substrate addresses by construction. The first is state tracking: linear models have historically failed tasks like parity and arithmetic, and Mamba-3’s complex dynamics are aimed squarely at that weakness. A generator typed by its grade structure and kept exact through training is a more direct route to the state-tracking the rotational component is meant to provide, because the structure that enables it does not degrade. The second is the gap the Mamba-3 authors state plainly, that theoretically linear inference remains hardware-inefficient in practice. This is the framework’s home ground: a known-sparse generator lowered through the Program Hypergraph has its computation shape fixed at compile time, which is what closes the distance between linear-in-theory and efficient-in-hardware.
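Parity makes the state-tracking point small enough to run. The sketch below is an illustration under assumptions, not a description of any published model: a one-dimensional state multiplied by a rotation through pi for each 1 bit flips sign exactly when parity flips, whereas a state multiplied only by non-negative decay factors can never change sign and so cannot carry the parity bit.

```python
import cmath
import math

def parity_by_rotation(bits):
    """Track parity with a rotational transition: rotate by pi on each 1, by 0 on each 0."""
    h = complex(1.0, 0.0)
    for b in bits:
        h *= cmath.exp(1j * math.pi * b)
    return 0 if h.real > 0 else 1

def decay_state(bits, keep=1.0, damp=0.5):
    """A purely real, non-negative transition: the state shrinks but never changes sign."""
    h = 1.0
    for b in bits:
        h *= damp if b else keep
    return h

SAMPLES = [
    [],
    [1],
    [1, 1],
    [0, 1, 0, 1, 1],
    [1] * 1001,
]
```

For every sample, `parity_by_rotation` agrees with `sum(bits) % 2`, while `decay_state` stays non-negative throughout. The rotational result depends on the phase staying close to a multiple of pi, which is the kind of structure the paragraph above argues should not be left to degrade.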

Our reading gets the speed and the exactness together. The same typed generator that makes the positional encoding exact is the sub-quadratic mechanism: typing it as a graded element, holding its decomposition through training with forward-mode and the quire, and discharging the rotor-normalization obligation with the verifier is designed to produce a linear-time recurrence whose generator stays inside its decomposition. The sub-quadratic cost comes from the same structure the field uses; the exactness comes from typing that structure so it stays inside its admissible form under training. We adopt the sub-quadratic mechanism the field has converged on rather than offering an alternative to it, built so the structure it depends on is a fact rather than an aspiration.

The payoff compounds with the rest of the section’s economy. A sub-quadratic model is cheaper in sequence length; a derived architecture is smaller in parameters; and a known-sparse structure runs on simpler hardware because its computation shape is fixed at compile time rather than discovered at runtime. Sub-quadratic, small, and structurally simple are three independent reductions, and they multiply: a model that is linear in context, lean in parameters, and sparse in structure requires far less of its hardware than a dense quadratic transformer of comparable competence. That compound is the hardware argument The Utility of Adaptive Domain Models makes for the domain models, carried now into the language node’s own attention mechanism.

What the arrangement buys

A constellation of correct-by-construction domain models alone cannot talk to the unstructured world. It has no surface for natural-language intent or underspecified goals. A language model alone can talk to that world but guarantees nothing about what it produces. The arrangement gives each what it lacks. The domain models gain an interface to intent, a node that translates the unstructured into well-formed requests they can satisfy. The language node gains guarantees it cannot supply itself, because its output is bounded by the compiler and its numeric behavior is held by the substrate, and because the models it routes to are correct by construction in their domains. A request the language node cannot itself verify becomes a request to a dimensional finance model that will reject a currency-dimension error at the type level, or to a kinematics model whose equivariance is SMT-discharged. The node does not need to be correct about the domain; it needs to route to the model that is.
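The division of labor can be sketched in a few lines. This is a runtime stand-in written in Python, so it can only approximate what the text describes as a type-level rejection; `Quantity`, `DimensionError`, `finance_model`, and `route` are hypothetical names, not part of the framework. The routing step makes no judgment about currencies. It forwards the request and reports what the domain model decides.

```python
from dataclasses import dataclass

class DimensionError(TypeError):
    """Raised when two quantities with different units are combined."""

@dataclass(frozen=True)
class Quantity:
    value: float
    unit: str

    def __add__(self, other):
        # Runtime analogue of a units-of-measure check; a typed language would
        # reject the mismatch before the program runs.
        if not isinstance(other, Quantity) or other.unit != self.unit:
            other_unit = getattr(other, "unit", type(other).__name__)
            raise DimensionError(f"unit mismatch: {self.unit} vs {other_unit}")
        return Quantity(self.value + other.value, self.unit)

def finance_model(legs):
    """Hypothetical domain model: totals the legs of a position in one currency."""
    total = legs[0]
    for leg in legs[1:]:
        total = total + leg
    return total

def route(request):
    """Hypothetical porous-node step: forward the request, surface the model's verdict."""
    try:
        return ("ok", finance_model(request))
    except DimensionError as err:
        return ("rejected", str(err))
```

A request mixing `Quantity(100.0, "USD")` with `Quantity(50.0, "EUR")` comes back as rejected without the router having known anything about currencies.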

This is the paradigm. Intelligence in this picture is not a single large model that is trusted because of its size. It is a constellation of models, most of them correct by construction in narrow domains, one of them porous and emergent and bounded, composed so that the porous node handles the unstructured world and the adaptive domain nodes handle the parts where correctness can be effectively supported. The language model is essential and it is not load-bearing for correctness, which is the role a component without a formal specification should play.

A recent system reaches sparse activation from the opposite direction. Apple’s AFM Core Advanced holds twenty billion parameters and activates one to four billion per request, a single dense model gated down to what a prompt needs. It prunes a dense whole. The constellation composes domain specialists that were never densely built, so the parameters a task does not call on are absent rather than trained and gated, and the saving lands at construction rather than at inference.

Our reading is that the constructive direction reaches past the parameter count. In Ma’s framing a pruned model is sparse only as far as the one device it was gated to fit, and its sparsity is an optimization laid over the over-parameterization we identified as endemic in the industry. Our speculative contrast is that our constellation is an additive set of well-structured actors joined by BAREWire, and because that contract holds its meaning across a hardware boundary as cleanly as within one, the specialists need not share a single substrate. A component can run on the primary chip, on a remote node inside the operating organization’s trust boundary, on an FPGA or a neuromorphic part where its computation is most advantaged, or on legacy hardware that runs one tool well. The composition stays coherent because every message arrives in the appropriate structure regardless of whether it travels within a process, between processes, or over the wire. The compute a constellation can draw on then widens past its primary source, to wherever each part is best served, with the whole presented to the user as one coherent system.

Gradient ascent on the coding-rate objective, taken over K subspaces rather than one, produces one operator per subspace, a shape Ma flags as a possible mixture of experts. The constellation is that shape with the membership settled by grade: each domain specialist is the operator for its subspace, and which subspace it owns is fixed in the type before a request arrives.
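For readers who want the operator-per-subspace shape spelled out, the following is a sketch in notation commonly used in the rate-reduction literature. The symbols are assumptions and may not match the book's own: Z is a d-by-n matrix of features, Π_k is a diagonal membership matrix for subspace k, and ε is the coding precision.

```latex
\begin{aligned}
R(Z) &= \tfrac{1}{2}\log\det\!\left(I + \alpha\, Z Z^{\top}\right),
  \qquad \alpha = \frac{d}{n\,\varepsilon^{2}},\\[4pt]
\Delta R(Z,\Pi) &= R(Z) \;-\; \sum_{k=1}^{K} \frac{\gamma_k}{2}
  \log\det\!\left(I + \alpha_k\, Z \Pi_k Z^{\top}\right),
  \qquad \gamma_k = \frac{\operatorname{tr}\Pi_k}{n},\quad
  \alpha_k = \frac{d}{\operatorname{tr}(\Pi_k)\,\varepsilon^{2}},\\[4pt]
\frac{\partial\,\Delta R}{\partial Z} &=
  \underbrace{\alpha\left(I + \alpha\, Z Z^{\top}\right)^{-1}}_{E}\, Z
  \;-\; \sum_{k=1}^{K} \gamma_k\,
  \underbrace{\alpha_k\left(I + \alpha_k\, Z \Pi_k Z^{\top}\right)^{-1}}_{C_k}\, Z\, \Pi_k .
\end{aligned}
```

An ascent step Z ← Z + η ∂ΔR/∂Z therefore applies one shared expansion operator E and one compression operator C_k per subspace, each acting only on the features its membership Π_k selects.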

Our least charitable view is that Apple will place its models only on its latest hardware because it is, at the end of the day, primarily a hardware concern. The Fidelity Framework has a broader goal: to reach all types of hardware, meet them where they are, and place each workload in whatever context can support the use case.

Open questions

Whether the routing the porous node performs can itself be made reliable enough, given that the node’s interior is emergent and only its boundary is bounded, is the central practical question and the one this section leaves most open.

Whether the islands of constructed structure inside the language model extend beyond positional encoding, and whether other attention subsystems admit a compact formal specification the way positional encoding does, is a research direction the positional-encoding analysis opens rather than closes.

Whether a domain model and the porous node can share more than substrate, for instance a common geometric representation at their interface, or whether the boundary must always be mediated by the deterministic layer, is the question the reversibility article takes up from one specific angle.