05 Validation Plan

05 — Validation Plan: Building the Instrument That Tells Us Whether the Inference Is Possible

The stance: the instrument and the result co-evolve

The four prior notes in this scaffold ask whether a coexponential session type can be reconstructed from ordinary actor code without an annotation. This note is about how we would find out. The honest answer up front: we likely cannot settle the question on paper alone, and the reason is structural, not a matter of effort.

A pure paper result would name a fragment of session types and prove that every member of it is inferable by some abstract algorithm. That is the right move when the inferable fragment is defined by the syntax of the source calculus. It is the wrong move here, because the fragment we care about is not defined by the syntax. It is defined partly by what an actual analysis over our Program Semantic Graph can observe. A coexponential reconstruction that an idealized algorithm could perform on a whiteboard is not the same object as a reconstruction our CCS front end can perform against the PSG that elaboration actually produces. The boundary between the two is the thing under study, and it is a property of the instrument.

We have a precedent for exactly this in the framework, and it is load-bearing for the whole plan. The fixed-point scaffolding work states the discipline plainly for width inference: “a node whose range the analysis cannot observe is not guessed but reported, the error asking the source for an annotation.” The inferable fragment for widths is not “every value whose range is mathematically determined.” It is “every value whose range the analysis can see in the graph it walks.” A value whose range is determined by a fact the analysis does not carry is outside the fragment, and the system says so by failing toward an annotation request rather than guessing. The fragment is drawn by the observability of the analysis, and the analysis is software.

So the inferable fragment for coexponential sessions is, in the same way, partly an empirical fact about a piece of software we have not built. This is why the validation requires implementation. The Fidelity Framework may itself need to become the expression mechanism that tells us whether the inference is possible. Building the analyzer is not the step after we know the answer. Building the analyzer is a large part of how we find the answer. The instrument and the result co-evolve, and the plan below is structured around that fact instead of pretending around it.

The known-art the inference can lean on

The plan is not starting from nothing. Three results in the framework are shipped or implemented, and they bound the solution space the coexponential inference would sit inside. They are stated here in present tense because they are real; the coexponential inference is stated in architectural verbs throughout because it is not.

HelloArty proves that structural inference plus a design-time budget check works end to end. The machine classification is inferred from the step function’s dependency graph, surfaced as “Inferred Mixed,” not declared. The width inference reads the Counter’s range from its modulus bound, [0, 399,999,999], and fixes 29 bits; it reads PeriodMs from its clamp, [100, 2000], and fixes 11 bits, all from operations and none from an annotation. The timing budget is analyzed before synthesis: a combinational-depth pass walks the PSG and emits the CCS0100 warning when depth exceeds the clock threshold, with --warnaserror promoting it to a hard stop. The pattern is one we can copy directly: infer a structural property, check it against a budget at design time, and surface a steerable diagnostic that lets the developer fix, relax, or proceed knowingly.

The wait-for analyzer proves that part of the concurrency space already yields to this treatment. It collects synchronous-RPC suspensions into a wait-for relation W, a may-wait over-approximation that is sound and never under-claims. Acyclicity reduces to a rank function r with r(u) < r(v) on every edge, a QF_LIA goal Z3 discharges at Tier 2. The WaitClass ladder is the shape we expect the coexponential ladder to mirror: AcyclicStatic discharges silently, OrderedCyclic infers the rank with zero ceremony, and Unresolved (the value-carried callee, undecidable in general) downgrades visibly to a supervised timeout. The deadlock-freedom note already draws the connection across the two precedents in one line: “Width is inferred from type structure … the synchronous-action priority is the same inference one axis over.”

The width-inference coeffect is implemented in our Alex middle end, and it defines the discipline. narrowType pulls the design-time width coeffect and fixes the integer representation from the inferred range, and the node whose range is unobservable is reported, not guessed. The discipline is the coeffect calculus of Petricek, Orchard, and Mycroft: the requirement is settled during analysis, recorded as codata on the PSG, and witnessed by a zipper traversal at lowering with no second analysis. A coexponential inference that lives in the framework would be a coeffect of exactly this kind, settled once during analysis and carried as PSG codata to the seam.

These three are the floor the plan stands on. The wait-for slice in particular is the fallback the risk register returns to: even if the coexponential reconstruction yields nothing useful, the wait-for floor is already the liveness guarantee, and it stays.

The staged plan

The four stages move from paper to instrument to measurement and back to paper. The order matters: each stage produces the input the next one needs, and the last stage feeds the first.

Stage (a): characterize the fragment on paper as far as it goes

Carry the paper characterization to its natural limit before writing analyzer code. The coexponential connectives of Qian, Kavvos, and Birkedal are defined by a fixpoint isomorphism that unfolds a stateful server into a choice between termination and a further client interaction that threads state forward. The paper question is which actor behaviors admit a coexponential session type as their interface, expressed structurally and not as a dependent proposition.

This stage produces a candidate fragment description and, more usefully, a list of the facts the reconstruction would need to read off the graph. Each fact is a hypothesis about analyzer observability: “to place the fixpoint unfold, the analysis must observe the state-threading recursion in the actor loop,” “to orient the connective, the analysis must observe which endpoint holds the client pool,” and so on. The value of the paper stage is not a theorem that closes the question. It is a precise inventory of what the instrument in stage (b) would have to see. The paper stage is designed to run until it stops being able to predict observability, which is the point where it hands off.

Stage (b): implement a session-inference coeffect over the PSG for the statically resolvable fragment

Build a session-inference coeffect, narrow to start, over the PSG, reusing the graph machinery the wait-for analyzer already runs. The wait-for relation rides the joint-constraint axis of our Program Hypergraph, the axis that already carries region, lifetime, and actor-lifetime hyperedges, and the strongly-connected-component traversal over W is machinery the coexponential pass would reconstruct against the same axis with a different edge label and a richer node payload.

The coeffect would settle a candidate session structure during analysis, record it as PSG codata beside the dimension, grade, and escape class, and a node whose session the analysis cannot observe would be reported, not guessed, the error asking the source for an annotation. That sentence is lifted from the width-inference discipline on purpose: the design intent is that the coexponential coeffect inherits the same observability-bounded failure mode. The first implementation target is the statically resolvable fragment, the actors whose client interface and state thread are visible in the graph without value-carried indirection, because that is where the wait-for analyzer already succeeds and where the machinery is reusable with the least new risk.

This stage is where the instrument starts to exist, and where the inferable fragment starts to be defined by the running analyzer instead of by the paper guess. Architectural verbs govern this stage absolutely: the coeffect is designed to settle a candidate session, the traversal would reconstruct it from the graph, the pass admits as target the statically resolvable fragment. None of it infers anything as present fact until it is built and measured.

Stage (c): measure what fraction of real actor code falls in each ladder case

With a running analyzer, measure. Take real actor code, the framework’s own actors and external Clef actor code as it accumulates, and classify each actor’s interface into the ladder the wait-for analyzer’s WaitClass prefigures: cheaply inferable, inferable with developer assistance, or fallen back to the wait-for slice already in place. The measurement is the point of the whole exercise, and it is the part that cannot be done on paper.

This is the HelloArty calibration loop applied to inference instead of timing. HelloArty’s Layer 2 Vivado post-route WNS is the ground truth that back-annotates the Layer 1 structural heuristic: the ns_per_weight_unit constant for the Arty A7-100T binding, 1.6, was calibrated from a real post-route run where weighted depth 8 produced 12.635 ns. The Layer 1 heuristic does not get to assert where the real timing boundary is. The Layer 2 ground truth tells it. Here, the implemented analyzer is the ground truth, and what it tells us is where the real inferable boundary is. The fraction of real actors in each ladder case is the analogue of WNS: the empirical number that no amount of paper reasoning produces, and the number the rest of the plan calibrates against.

Stage (d): let the empirical fragment back-annotate the theory

Feed stage (c) back into stage (a). The measured fragment, the actors the running analyzer actually reconstructs, is the empirical inferable fragment, and it will differ from the paper candidate. Where the analyzer succeeds on actors the paper did not predict, the paper characterization was too narrow and the observability inventory missed a fact the graph in fact carries. Where the analyzer reports an annotation request on actors the paper called inferable, the paper characterization was too generous and assumed an observability the instrument does not have. Either way, the empirical fragment back-annotates the theory, and the next iteration of the paper characterization is calibrated to what the instrument can see, the same way the per-fabric ns_per_weight_unit constant gets refined as more designs are compiled across more device families.

This is the loop closing. The paper guides the instrument, the instrument measures the fragment, the fragment corrects the paper, and the corrected paper guides the next instrument. The coexponential inferable fragment is not a fixed target we are trying to hit. It is a moving boundary the loop is designed to locate, and locating it is the result.

The HelloArty calibration loop as the methodological template

The reason HelloArty is the template and not just a precedent is that it already solved the methodological problem this note faces: a structural property whose true value is only knowable downstream, with an upstream estimate that must be calibrated against the downstream ground truth. HelloArty’s upstream estimate is the Layer 1 depth heuristic. Its downstream ground truth is Layer 2 Vivado WNS. The constant that links them is calibrated, not assumed, and it is refined as ground-truth data accumulates.

Map the roles. The upstream estimate here is the paper characterization of the inferable fragment. The downstream ground truth is the implemented analyzer’s behavior on real code. The thing being calibrated is the fragment boundary itself. And the refinement-over-time property is the same: one actor corpus is the first calibration data point, the way HelloArty is the first calibration data point for the Arty binding, and the boundary sharpens as more corpora are measured. The methodological claim is that we should treat the inferable fragment the way we treat a per-fabric timing constant, as an empirical quantity calibrated against a running ground truth, not as a theorem we prove once and freeze.

One caution the HelloArty precedent also teaches: the Layer 1 heuristic is sound in a specific direction. It rounds ns_per_weight_unit conservatively, 1.58 up to 1.6, so it flags too early rather than too late. The coexponential analyzer should inherit a soundness direction in the same spirit, and the wait-for analyzer already shows which one: the may-wait relation over-approximates, so it never claims a guarantee an execution could violate. The session-inference coeffect should over-approximate toward “report and ask for an annotation,” never under-approximate toward a silently guessed session, so that the empirical fragment it reports is a sound under-statement of what is inferable rather than an unsound over-claim. A measured boundary that is too conservative is calibration data. A measured boundary that is too generous is a bug that ships a wrong session type, and that is the one failure mode the discipline is built to forbid.

Near-term experiments and what each would settle

These are concrete, each scoped to settle one question, and each leaning on machinery that already exists.

Experiment 1: reuse the wait-for graph traversal to extract state-threading recursion. Take the SCC traversal the wait-for analyzer runs over W and point it at the actor loop’s state thread instead of the suspension edges. The question it settles: can the existing graph machinery observe the fixpoint recursion that the coexponential’s unfold isomorphism corresponds to, on real actor loops, without new graph infrastructure? A yes means stage (b) reuses more than expected and the statically resolvable fragment is closer than the paper guess. A no localizes exactly which graph fact is missing, which is the first entry in the observability inventory that stage (d) would correct.

Experiment 2: classify a small corpus of real actors against the WaitClass ladder, by hand, before automation. Before the coeffect is built, hand-classify the framework’s own actors into cheaply inferable, assistance-needed, and wait-for-fallback. The question it settles: roughly what fraction of real actor code even has a coexponential-shaped interface, as opposed to a fire-and-forget Tell topology that carries no synchronous wait edge at all and needs no session reconstruction. If most real actors are fire-and-forget, the inferable-fragment question is smaller and lower-stakes than the paper framing suggests, and the wait-for floor already covers more of the ground than expected. This experiment is cheap and would re-scope the whole effort if the answer is surprising.

Experiment 3: run the width-inference observability failure on a contrived coexponential actor. Construct an actor whose session structure is determined by a value-carried callee, the Unresolved case the wait-for ladder already names, and confirm that the analyzer’s correct behavior is to report and request an annotation, not to guess. The question it settles: does the observability-bounded failure mode transfer cleanly from widths to sessions, producing an FPGA0001-style “range unobservable” error in the session register, an actionable diagnostic rather than a silent miss. This is the experiment that validates the soundness direction before any inference result is trusted.

Experiment 4: back-annotate one paper-predicted fragment with one measured result. Take a single fragment the stage (a) paper characterization predicts is inferable, run the stage (b) analyzer over the corpus members that fall in it, and record the agreement and disagreement. The question it settles: is the calibration loop of stage (d) actually closing, that is, does measured behavior diverge from predicted behavior in a way that refines the paper, the way HelloArty’s WNS refined the depth constant. One data point does not calibrate a constant, but it confirms the loop runs, which is the methodological precondition for everything after.

The honest risk register

The plan can fail to yield a useful inferable fragment, and the ways it can fail are worth naming plainly.

The fragment could be empirically tiny. The statically resolvable, coexponential-shaped, state-threading actors might be a small slice of real code, with most actors either fire-and-forget (no session to reconstruct) or value-carried (Unresolved, outside the static guarantee by construction). If experiment 2 returns that answer, the coexponential inference is a niche result, not a general one. What survives: the wait-for floor is the liveness guarantee for the synchronous fraction regardless, and that floor does not depend on any coexponential reconstruction succeeding.

The observability gap could be irreducible. The graph might not carry the facts the unfold isomorphism needs, and adding them might require analysis as expensive as the reconstruction itself, defeating the cheap-inference goal. If stage (b) finds that observability costs as much as the answer, the inferred-with-override posture collapses into annotate-or-fall-back. What survives: annotate-or-fall-back is still a coherent design, the same posture managed mutability and the wait-for ladder already take at their hard cases, and the developer who wants a session guarantee writes one while everyone else keeps the wait-for floor.

The soundness direction could prove hard to hold. If the analyzer cannot reliably distinguish “session observably absent” from “session present but unobserved,” it cannot safely report rather than guess, and the discipline’s one forbidden failure mode reappears. If experiment 3 shows the failure mode does not transfer cleanly, the inference is not shippable until it does. What survives: nothing wrong ships, because the floor below is the wait-for guarantee, which has its own established soundness posture (the may-wait over-approximation) and asserts no session type at all.

The calibration loop could fail to converge. Measured fragments across corpora might not stabilize, leaving no refinable boundary to name, the way an ns_per_weight_unit that varied wildly run to run would be useless. If stage (d) does not converge, there is no empirical fragment to back-annotate the theory with, and the result stays a per-corpus curiosity. What survives: per-corpus measurement is still honest data about specific systems, and the framework loses nothing it had before the experiment.

The through-line of the register is the same in every row. The fallback is not a degraded version of the coexponential inference. It is the wait-for floor that is already implemented and already the liveness guarantee, and it preserves deadlock freedom for the precisely drawn, common, statically resolvable fragment the wait-for analyzer already covers. The coexponential inference is an enrichment of the protocol layer above that floor, and if the enrichment yields little, the floor is exactly where it was. That is the property that makes building the instrument a safe way to find out: the downside of a null result is a null result, and the guarantee underneath does not move.

What this note commits to

The commitment is methodological, not a result. We commit to treating the coexponential inferable fragment as an empirical quantity to be located by a running analyzer, calibrated the way HelloArty calibrates a timing constant against post-route ground truth, with the paper characterization as the upstream estimate and the implemented analyzer as the downstream truth. We commit to the observability-bounded failure mode the width-inference coeffect already ships, where an unobservable session is reported and not guessed. And we commit to the wait-for floor as the guarantee that holds whether or not the enrichment succeeds. The Fidelity Framework becoming the expression mechanism that validates its own inference is not a detour around the question. It is the design we will keep building toward as the actor runtime matures and the inference boundary comes into focus, and building it is how we find out.