Program Semantic Graph

Program Semantic Graph

Status: Normative Last Updated: 2026-01-21 (Added Section 13: Enrichment, Section 14: Coeffect Analysis)

1. Overview

The Program Semantic Graph (PSG) is the unified intermediate representation produced by CCS (Clef Compiler Service). It carries semantic information from type checking through to code generation, preserving the meaning of F# programs in a form suitable for native compilation.

1.1 Architectural Heritage

The PSG draws on the nanopass tradition established in Standard ML and Scheme compiler research. Where traditional compilers make large, monolithic transformations between representations, nanopass architecture decomposes compilation into many small, single-purpose passes, each doing one thing well.

This heritage distinguishes Clef from the F# Compiler Services (FCS) design. FCS was built for .NET integration, where semantic information flows to the CLR runtime. The PSG instead preserves semantic information through to native code generation, carrying proofs about types, memory, and execution that the CLR would normally handle implicitly.

Key insight: Traversal information is itself semantic information. The classification of what executes during module initialization versus what constitutes a definition versus what serves as an entry point: these are semantic facts about the program that flow through the pipeline as first-class data.

1.2 Relationship to FCS

The PSG consumes both the syntax tree (SynExpr) and typed tree (FSharpExpr) from FCS:

FCS OutputPSG Usage
SynExprStructural skeleton, source ranges
FSharpExprResolved types, SRTP resolution, semantic edges

The typed tree overlay via zipper captures information that exists only after type inference and constraint solving, particularly SRTP (Statically Resolved Type Parameter) resolution, which the syntax tree cannot express.

2. SemanticGraph Structure

The SemanticGraph record is the complete result of PSG construction:

type SemanticGraph = {
    Nodes: Map<NodeId, SemanticNode>
    EntryPoints: NodeId list
    Modules: Map<ModulePath, NodeId list>
    Types: Lazy<Map<string, NodeId>>
    Platform: PlatformContext option
    ModuleClassifications: Lazy<Map<NodeId, ModuleClassification>>
}

2.1 Field Specifications

FieldTypeDescription
NodesMap<NodeId, SemanticNode>All semantic nodes indexed by unique identifier
EntryPointsNodeId listNodes marked as program entry points
ModulesMap<ModulePath, NodeId list>Module contents indexed by path
TypesLazy<Map<string, NodeId>>Type definitions, lazily indexed
PlatformPlatformContext optionTarget platform for NTU resolution
ModuleClassificationsLazy<Map<NodeId, ModuleClassification>>Module structure analysis, lazily computed

2.2 Lazy Coeffects

The Types and ModuleClassifications fields use the codata pattern: they are computed on first observation rather than eagerly during construction.

This design serves two purposes:

  1. Efficiency: Classification is only computed when Alex needs it
  2. Separation of concerns: CCS builds the graph; consumers decide what derived information they need

The lazy fields represent coeffects, that is, observations about the graph that can be computed from its structure but are not part of the primary construction.

3. SemanticNode Structure

Each node in the PSG carries rich semantic information:

type SemanticNode = {
    Id: NodeId
    Kind: SemanticKind
    Range: SourceRange
    Type: NativeType
    SRTPResolution: WitnessResolution option
    ArenaAffinity: ArenaAffinity
    LayoutHint: TypeLayout option
    Children: NodeId list
    Parent: NodeId option
    Metadata: Map<string, MetadataValue>
    IsReachable: bool
    EmissionStrategy: EmissionStrategy
}

3.1 Field Specifications

FieldTypeDescription
IdNodeIdUnique identifier within the graph
KindSemanticKindWhat construct this node represents
RangeSourceRangeSource location for diagnostics
TypeNativeTypeResolved type (attached during construction)
SRTPResolutionWitnessResolution optionResolved SRTP constraint, if any
ArenaAffinityArenaAffinityMemory region affinity
LayoutHintTypeLayout optionSize/alignment hints
ChildrenNodeId listStructural children
ParentNodeId optionParent for navigation
MetadataMap<string, MetadataValue>Typed extensible metadata
IsReachableboolSoft-delete marker for reachability
EmissionStrategyEmissionStrategyHow to emit during code generation

3.2 Type Attachment

NORMATIVE: Types SHALL be attached to nodes during PSG construction, not post-hoc.

This ensures that every node carries its resolved type from the moment of creation, enabling downstream passes to make decisions based on complete type information.

3.3 SRTP Resolution

The SRTPResolution field captures the compile-time resolution of statically resolved type parameters:

type WitnessResolution = {
    Operator: string              // e.g., "$", "+"
    ArgType: NativeType           // The type on which it's resolved
    ResolvedMember: string        // Fully qualified member name
    Implementation: WitnessImplementation
}

This information comes from the typed tree (FSharpExpr.TraitCall) and is not available in the syntax tree. The PSG captures it during the typed tree overlay phase.

4. SemanticKind

The SemanticKind discriminated union classifies each node. Key categories include:

4.1 Bindings and Control

KindDescription
BindingLet binding with mutability and entry point flags
LambdaFunction with parameters, captures, and context
PatternBindingVariable bound by pattern matching

4.2 Core Expressions

KindDescription
LiteralConstant value
VarRefReference to a binding
ApplicationFunction application
TypeAnnotationExplicit type annotation

4.3 Control Flow

KindDescription
MatchPattern matching expression
IfThenElseConditional expression
WhileLoopWhile loop
ForLoopNumeric for loop
ForEachCollection iteration
SequentialSequential composition
TryWithException handling
TryFinallyCleanup handler

4.4 Data Structures

KindDescription
TupleExprTuple construction
ArrayExprArray literal
ListExprList literal
RecordExprRecord construction
UnionCaseUnion case construction

4.5 Compiler Features

KindDescription
IntrinsicCCS intrinsic operation
TraitCallSRTP trait invocation
PlatformBindingPlatform-specific binding
LazyExprLazy computation (PRD-14)
SeqExprSequence expression (PRD-15)

5. Emission Strategy

The EmissionStrategy field tells Alex how to emit each node:

type EmissionStrategy =
    | Inline
    | SeparateFunction of captureCount: int
    | MainPrologue
StrategyDescription
InlineEmit inline at point of use
SeparateFunctionEmit as separate function (lambdas, seq expressions)
MainPrologueEmit in main function prologue (module initialization)

NORMATIVE: Emission strategy SHALL be determined by CCS during construction. Alex SHALL observe this strategy without recomputing it.

This architectural decision moves traversal logic upstream to CCS, where semantic context is fully available.

6. Module Classification

Module classification analyzes the structure of each module:

type ModuleClassification = {
    Name: string
    ModuleInit: NodeId list
    Definitions: NodeId list
    EntryPoint: NodeId option
}
FieldDescription
NameModule name
ModuleInitBindings that execute at module initialization
DefinitionsRegular definitions (functions, types)
EntryPointThe module’s entry point, if any

6.1 Classification Algorithm

Module classification is computed from EmissionStrategy:

  1. Nodes with MainPrologue strategy → ModuleInit
  2. Nodes with Inline or SeparateFunction strategy → Definitions
  3. Bindings marked isEntryPointEntryPoint

This classification is semantic traversal information: it describes how the program executes, not just what it contains.

7. Intrinsic Metadata

Intrinsic operations carry structured metadata:

type IntrinsicInfo = {
    Module: IntrinsicModule
    Operation: string
    Category: IntrinsicCategory
    FullName: string
}

NORMATIVE: Intrinsic dispatch SHALL use structured IntrinsicInfo, not string matching on symbol names.

This ensures that Alex can dispatch on intrinsics without knowledge of namespace structure or symbol naming conventions.

7.1 Intrinsic Modules

ModuleDescriptionProvider
SysSystem callsCompiler
NativePtrPointer operationsCompiler
NativeStrString operationsCompiler
ArrayArray operationsCompiler
MathMathematical functionsCompiler
LazyLazy computation (PRD-14)Compiler
SeqSequences (PRD-15)Compiler
Signal, Effect, Memo, BatchReactive primitivesLibrary

The distinction between compiler-provided and library-backed intrinsics is significant: compiler-provided intrinsics have implementations generated by Alex, while library-backed intrinsics dispatch to native library functions.

8. Reachability

8.1 Soft-Delete Model

NORMATIVE: Unreachable nodes SHALL be marked via IsReachable = false, not physically removed.

This preserves the graph structure for:

  • Typed tree zipper navigation
  • Diagnostic reporting
  • Debugging and inspection

8.2 Reachability Computation

Reachability flows from entry points through structural and semantic edges:

  1. Entry points are reachable
  2. Children of reachable nodes are reachable
  3. Definitions referenced by reachable VarRef nodes are reachable
  4. Type definitions used by reachable nodes are reachable

9. Platform Context

The platform context carries target-specific information. Width dimensions are resolved via a Dimensions map rather than discrete WordSize/PointerSize fields, aligning with the parameterized width model in NTU types (see ntu-types.md):

type PlatformContext = {
    PlatformId: string
    /// Width dimension resolutions (bits)
    Dimensions: Map<WidthDimension, int>  // Pointer → 64, Register → 64, etc.
    PointerAlign: int
    PlatformLibraryPath: string option
    Predicates: Map<PlatformPredicate, bool>
    FreestandingStartup: FreestandingStartup option
}

module PlatformContext =
    /// Resolve an NTUWidth to concrete bits
    let resolveWidth (ctx: PlatformContext) (width: NTUWidth) : int =
        match width with
        | NTUWidth.Fixed bits -> bits
        | NTUWidth.Resolved dim -> ctx.Dimensions.[dim]

    /// Convenience: pointer width in bytes
    let pointerSize (ctx: PlatformContext) = ctx.Dimensions.[WidthDimension.Pointer] / 8

    /// Convenience: machine word width in bytes
    let wordSize (ctx: PlatformContext) = ctx.Dimensions.[WidthDimension.Register] / 8

This information flows from the project file through CCS to the PSG, enabling platform-aware type resolution and code generation. Alex uses resolveWidth to determine concrete MLIR types for Resolved widths.

10. Invariants

The following invariants SHALL hold for any valid PSG:

  1. Node identity: Each NodeId maps to exactly one SemanticNode
  2. Parent consistency: If node A lists B as parent, B lists A as child
  3. Type completeness: Every node has a resolved NativeType
  4. Reachability closure: If A is reachable and references B, B is reachable
  5. Entry point marking: Entry points are marked in both EntryPoints list and node Kind
  6. Emission strategy presence: Every node has an EmissionStrategy

11. Traversal

The PSG supports multiple traversal patterns:

PatternUse Case
foldWithLambdaPreBindFollow semantic dependencies (definitions before uses)
foldWithSCFRegionsRespect SCF region boundaries for control flow
Pre-order structuralVisit parents before children
Post-order structuralVisit children before parents

Alex uses these traversal patterns to generate MLIR in the correct order, respecting both structural and semantic dependencies.

12. PSG Saturation

12.1 The Saturation Principle

NORMATIVE: CCS SHALL saturate the PSG with all semantic structure required for compilation, including synthetic constructs not directly expressed in source code.

The term saturation refers to making the PSG semantically complete, containing all information needed for downstream compilation without requiring structure synthesis during code generation.

12.2 Motivation

Certain F# constructs require runtime structures that don’t appear directly in source code:

Source ConstructSynthetic StructureLocation in PSG
seq { }MoveNext state machineSeqStateMachine nodes
async { }Continuation state machine(Future) AsyncStateMachine nodes
Pattern matchingDecision tree / switchMatch decomposition

Without saturation, code generators must synthesize these structures during emission, leading to:

  • Mutable state tracking during emission
  • Structure-building logic in the wrong architectural layer
  • Coupling between emission and semantic analysis
  • SSA allocation during emission rather than in nanopasses

12.3 Architectural Separation

The saturation principle enforces clean layer separation:

LayerResponsibilityNOT Responsible For
CCS/PSGSaturationBuild all semantic structureTarget-specific details
PSGElaborationAdd target-specific coeffects (SSA, platform bindings)Semantic structure
Alex/CCSTransferWitness and emitStructure building, SSA allocation

Key insight: Structure building is a semantic concern. CCS knows F# semantics. Code generators should witness structure, not build it.

12.4 Saturation Pass Architecture

PSG construction flows through saturation phases:

Phase 1: Structural Construction    SynExpr → PSG with basic nodes
Phase 2: Symbol Correlation         + FSharpSymbol attachments
Phase 3: Reachability Analysis      + IsReachable marks
Phase 4: Typed Tree Overlay         + Types, SRTP resolution
Phase 5: PSG Saturation             + Synthetic structures (MoveNext, etc.)

The saturation phase (Phase 5) transforms high-level semantic constructs into explicit operational structure:

SeqExpr(body, captures)
    ↓ PSG Saturation
SeqStateMachine {
    StateVariables: [...]
    Captures: [...]
    InitBlock: NodeId
    CheckBlock: NodeId
    YieldBlock: NodeId
    PostYieldBlock: NodeId
    DoneBlock: NodeId
}

12.5 Sequence Expression Saturation

For seq { } expressions, saturation produces explicit state machine structure:

Input (pre-saturation):

SeqExpr of body: NodeId * captures: CaptureInfo list

Output (post-saturation):

SeqStateMachine of {
    BodyKind: SeqBodyKind           // Sequential or WhileBased
    StateVariables: StateVar list   // Internal mutable state
    Captures: CaptureInfo list      // Captured values
    InitBlock: BlockSpec            // State 0 initialization
    CheckBlock: BlockSpec           // While condition evaluation
    YieldBlock: BlockSpec           // Value production
    PostYieldBlock: BlockSpec       // Post-yield state transitions
    ConditionalYield: ConditionSpec option  // If-guarded yield
}

This explicit structure enables:

  • SSAAssignment to walk block nodes and assign SSAs normally
  • CCSTransfer to witness structure without synthesis
  • Clear separation between semantic analysis and code generation

12.6 Comparison to F# Compiler

Clef’s saturation phase parallels LowerSequenceExpressions.fs in the F# compiler:

F# CompilerClefPurpose
CheckSequenceExpressions.fsPSG BuilderInitial semantic structure
LowerSequenceExpressions.fsPSG SaturationState machine elaboration
IlxGen.fsCCSTransferTarget code emission

The key insight from nanopass architecture: saturation happens at the language level (PSG), not in code generation.

12.7 Invariants for Saturated PSG

After saturation, the following additional invariants SHALL hold:

  1. Synthetic completeness: All constructs requiring runtime structure have explicit nodes
  2. Block structure: State machine blocks are explicit, traversable nodes
  3. No deferred synthesis: Code generators need not build structure during emission
  4. SSA assignability: All nodes can have SSAs assigned by walking structure

13. Enrichment

13.1 Terminology

Enrichment is the parent concept encompassing two related processes:

TermOriginDescription
ElaborationPLTMaking implicit program structure explicit
SaturationFidelityFilling the PSG with all information needed for lowering

Both processes synthesize PSG nodes that do not appear directly in source code.

13.2 Enrichment Categories

Two categories of enrichment are defined, distinguished by metadata:

CategoryElaboration.KindPurpose
Intrinsic Elaboration"Intrinsic"Implement intrinsic operation semantics
Baker Saturation"Baker"Decompose language features to primitives

13.2.1 Intrinsic Elaboration

Intrinsic elaboration synthesizes PSG structure to implement the semantics of intrinsic operations.

Example: Console.write "Hello" may elaborate to:

  • String length extraction nodes
  • Buffer pointer calculation nodes
  • System call invocation nodes

13.2.2 Baker Saturation

Baker saturation decomposes higher-order language constructs into primitive operations:

Source ConstructBaker Decomposition
List.map f xsRecursion using isEmpty, head, tail, cons
seq { ... }State machine structure (§12.5)
lazy exprThunk structure with memoization

13.3 Enrichment Metadata

NORMATIVE: Enriched nodes SHALL carry metadata identifying their origin.

module ElaborationMetadata =
    /// What kind of enrichment: "Intrinsic" or "Baker"
    [<Literal>]
    let Kind = "Elaboration.Kind"

    /// What construct triggered enrichment (e.g., "List.map", "Console.write")
    [<Literal>]
    let For = "Elaboration.For"

    /// Links related nodes from the same enrichment expansion
    [<Literal>]
    let Id = "Elaboration.Id"

13.4 Source-Based vs Enriched Nodes

Node TypeMetadataMeaning
Source-basedNoneDirect from user’s source code
Intrinsic-elaboratedKind = "Intrinsic"Synthesized for intrinsic implementation
Baker-saturatedKind = "Baker"Synthesized for language feature decomposition

NORMATIVE: The absence of Elaboration.Kind metadata SHALL indicate a source-based node.

13.5 Expansion ID Linking

When enrichment creates multiple related nodes, they share an Elaboration.Id:

Node 42: Kind="Baker", For="List.map", Id=7
Node 43: Kind="Baker", For="List.map", Id=7   ← Same expansion
Node 44: Kind="Baker", For="List.map", Id=7   ← Same expansion
Node 45: (no metadata)                         ← Source-based

This enables:

  • Grouping related enriched nodes in tooling
  • “Pierce the veil” debugging (showing source vs synthesized)
  • Verification that enrichment preserves semantics

14. Coeffect Analysis

14.1 Definition

Coeffect analysis computes metadata about PSG structure to inform lowering decisions. Unlike enrichment, coeffect analysis does NOT create new PSG nodes; it computes mappings, indices, and tables.

AspectEnrichmentCoeffect Analysis
Creates nodesYesNo
Modifies PSG structureYesNo
OutputEnriched PSGMetadata/indices
PurposeImplement semanticsInform lowering

14.2 Pipeline Position

NORMATIVE: Coeffect analysis SHALL execute AFTER enrichment completes.

PSG Construction → Enrichment → Coeffect Analysis → Lowering

This ordering is required because analyses must see ALL nodes, including those synthesized by enrichment.

14.3 Standard Coeffect Analyses

14.3.1 SSA Assignment

SSA (Static Single Assignment) analysis computes unique names for each binding:

type SSAAssignment = {
    NodeToSSA: Map<NodeId, SSA>
    BindingVersions: Map<string, int>
}
  • Each binding definition receives a unique SSA name
  • Mutable bindings receive versioned names (x_0, x_1, …)

14.3.2 Mutability Analysis

Mutability analysis identifies mutable state patterns:

type MutabilityAnalysis = {
    AddressedMutableBindings: Set<NodeId>
    ModifiedVarsInLoopBodies: Set<NodeId>
    ModuleLevelMutableBindings: (string * NodeId * NodeId) list
}

14.3.3 Yield State Analysis

For sequence expressions, yield state analysis computes state machine layout:

type YieldStateAnalysis = {
    SeqExprId: NodeId
    NumYields: int
    BodyKind: SeqBodyKind
    Yields: (NodeId * int * NodeId) list
    InternalState: (string * NodeId * int) list
}

14.3.4 Pattern Binding Analysis

For match expressions, pattern binding analysis computes binding scopes:

type PatternBindingAnalysis = {
    EntryPatternBindings: (NodeId * string * NativeType) list
    CasePatternBindings: Map<int, (NodeId * string * NativeType) list>
}

14.4 Demand-Driven Computation

NORMATIVE: Coeffect analyses SHOULD be demand-driven.

PSG ContainsRequired Analysis
Seq expressionsYield state analysis
Mutable bindingsMutability analysis
Match expressionsPattern binding analysis
Any bindingsSSA assignment

This principle (“only pay for what you use”) ensures compilation efficiency.

14.5 The Control-Flow / Dataflow Pivot

Coeffect analysis enables the control-flow / dataflow pivot:

  • F# source: Declarative, compositional (dataflow style)
  • Native code: Imperative, sequential (control-flow oriented)

Coeffect analysis provides metadata enabling the compiler to pivot between representations while preserving semantics.

Example: With yield state analysis, a seq expression can lower to:

  • State machine (control-flow emphasis)
  • Vectorized operation (dataflow emphasis, if applicable)

See Also