Clef Type Universe Specification

Clef Type Universe Specification

Status: Draft Phase: A (Type Universe) - Part of Clef principled type system redesign Last Updated: 2024-12-30

Overview

This document specifies the native type universe for Clef, the F# native compiler. The design follows ML/OCaml foundations while preserving familiar F# developer experience.

Core Principles

  1. Familiar Design-Time Experience: Use F# type names (string, option, int), not foreign alternatives
  2. Absolute Null-Freedom: Everything is voption<'T> - no null values anywhere
  3. Null-Freedom Cascades Through APIs: Sentinel values become voption returns
  4. Leverage Existing F# Machinery: Reuse FSharp.Core where possible
  5. Spec Before Scaffold: Document before implementing

Relationship to Other Documents

  • ccs-specification.md: Defines Clef Compiler Service, including:
    • SRTP resolution against native witness hierarchy (Part 3)
    • Platform bindings convention (Part 6)
    • Memory region enforcement rules (Parts 9-10)
    • Native-specific diagnostics (Appendix D)
  • beyond-fslang-spec.md: Extensions beyond standard F# semantics
  • fsharp-features-in-fidelity.md: Feature coverage matrix

Part 1: Foundational Principles

1.1 Type Universe Axioms

The native type universe is built on three primitive concepts from ML/OCaml:

  1. Products (tuples, records) - Combine multiple values into one
  2. Sums (discriminated unions) - Choose between alternatives
  3. Functions - Transform values

Everything else is derived from these primitives.

1.2 Memory Guarantees

  • No implicit boxing: Value types remain on stack unless explicitly requested
  • Deterministic layout: Memory representation is predictable and specified
  • No null: All types are non-nullable by construction

1.3 Type Equivalences

User WritesUnderlying ImplementationNotes
option<'T>voption<'T>Stack-allocated, non-null
stringUTF-8 fat pointer{ptr: *u8, len: usize}
intPlatform wordnativeint semantics

Part 2: Primitive Types

CCS Resolution: See ccs-specification.md Part 1.1 for how CCS resolves these types at compile-time.

OCaml Provenance: Primitives follow OCaml’s value-oriented representation (unboxed by default) while eliminating GC-oriented overhead. See Appendix E for detailed provenance analysis.

2.1 Unit

type unit = ()
PropertyValueNotes
MemoryZero-sized (ZST)No runtime representation
Alignment1 byteTrivially aligned
MLIR(elided)Not materialized in generated code

OCaml Provenance: Identical to OCaml’s unit - the canonical “no information” type. In OCaml, () shares representation with [] (empty list) as the integer 0. In Clef, unit is truly zero-sized - not even allocated.

Usage:

let doSideEffect () : unit = Console.WriteLine "Hello"
let ignoredResult = ignore 42  // : unit

2.2 Boolean

type bool = true | false
PropertyValueNotes
Memory1 byteNot 1 bit - for alignment
Representationi8false = 0, true = 1
Alignment1 byteNatural alignment
MLIRi1 or i8Context-dependent

OCaml Provenance: OCaml represents booleans as unboxed integers (false = 0, true = 1). Clef preserves this representation but uses a full byte for alignment efficiency in arrays and structs.

Design Decision: Booleans occupy 1 byte rather than 1 bit because:

  1. Sub-byte addressing is inefficient on modern hardware
  2. Array indexing requires byte-addressable elements
  3. Cache line efficiency favors byte alignment

No Boolean Coercion:

// ILLEGAL in Clef - no implicit int-to-bool
let x = if 1 then "yes" else "no"  // Error: expected bool, got int

// LEGAL - explicit comparison
let x = if 1 <> 0 then "yes" else "no"

2.3 Integer Family

TypeF# NameSizeSignedMLIR TypeOCaml Equivalent
Platform intintwordYesindexint (but 63-bit)
Platform uintuintwordNoindexN/A
8-bitint8, uint81 byteYes/Noi8N/A
16-bitint16, uint162 bytesYes/Noi16N/A
32-bitint32, uint324 bytesYes/Noi32int32
64-bitint64, uint648 bytesYes/Noi64int64
Nativenativeint, unativeintwordYes/Noindexnativeint

Design Decisions:

  1. int = platform word: Follows native compilation conventions (Rust, C), not F#’s 32-bit default. Array indexing and pointer arithmetic use platform word naturally. This aligns with Rust’s isize/usize philosophy.

  2. No GC tagging overhead: Unlike OCaml’s 63-bit tagged integers (which reserve 1 bit for runtime GC discrimination), Clef integers use full precision. Compile-time type safety eliminates the need for runtime type tags.

    Systemint precisionTag overhead
    OCaml (64-bit)63 bits1 bit for GC
    F# (.NET)32 bitsNone (boxed separately)
    Clef64 bits (full word)None
  3. int and nativeint are synonyms: Both map to MLIR index type. This differs from F# where int is always 32-bit.

  4. Fixed-width types for interop: int32, int64 provide explicit sizing for FFI and serialization. These match OCaml’s Int32.t and Int64.t.

Alignment and Cache Considerations:

TypeAlignmentCache Line Fit (64 bytes)
int81 byte64 values
int162 bytes32 values
int324 bytes16 values
int648 bytes8 values
int (64-bit)8 bytes8 values

For cache-critical code, arrays of smaller integer types pack more values per cache line.

Overflow Behavior:

ContextBehavior
DefaultWrapping (two’s complement)
Checked moduleReturns voption on overflow
Debug buildsOptional overflow traps
// Default: wrapping arithmetic
let wrapped = System.Int32.MaxValue + 1  // Wraps to MinValue

// Checked: explicit overflow handling
let checked = Checked.add System.Int32.MaxValue 1  // voption.None

2.4 Floating Point Family

TypeF# NameSizeMLIR TypeIEEE 754
Singlefloat32, single4 bytesf32binary32
Doublefloat, double8 bytesf64binary64

OCaml Provenance: OCaml’s float is always 64-bit (boxed in most contexts). Clef follows this default but provides float32 for memory-constrained scenarios.

Design Decisions:

  1. float = 64-bit: Matches OCaml and F# convention. Double precision is the default for scientific computing.

  2. IEEE 754 compliance: All floating-point operations follow IEEE 754 semantics including NaN propagation, infinities, and signed zeros.

  3. No implicit float-int conversion:

    let x : float = 42    // Error: expected float, got int
    let x : float = 42.0  // OK
    let x : float = float 42  // OK - explicit conversion

Alignment:

TypeAlignmentNotes
float324 bytesNatural alignment
float648 bytesNatural alignment

Special Values:

let inf = infinity        // Positive infinity
let ninf = -infinity      // Negative infinity
let nan = nan             // Not a Number
let isNan x = x <> x      // NaN property: NaN  NaN

2.5 Character

type char = (* Unicode scalar value *)
PropertyValueNotes
Memory4 bytesFull Unicode scalar value
Representationi32UTF-32 codepoint (U+0000 to U+10FFFF, excluding surrogates)
Alignment4 bytesNatural alignment
MLIRi32Direct mapping

Design Decision (RESOLVED): Characters are UTF-32 codepoints (4 bytes), not UTF-16 code units.

Rationale:

  1. String encoding is UTF-8: Clef strings are UTF-8 fat pointers (see Part 4.1)
  2. Iteration yields codepoints: When iterating over a UTF-8 string, each char is a decoded Unicode scalar value
  3. No surrogate pairs: Unlike UTF-16, a single char always represents a complete character
  4. Consistency with Rust: Rust’s char is also a 32-bit Unicode scalar value

OCaml Divergence: OCaml’s char is a single byte (Latin-1 only, 0-255). Clef explicitly supports full Unicode.

String-Character Interaction:

let s = "Hello, 世界!"  // UTF-8 string

// Iteration yields Unicode codepoints
for c in String.chars s do
    printfn "U+%04X" (int c)  // U+0048, U+0065, ..., U+4E16, U+754C, U+0021

// Indexing by codepoint (not byte offset)
let c = String.charAt 7 s  // voption.Some '世' (U+4E16)

// Byte length vs character length
String.byteLength s   // 15 bytes (UTF-8 encoded)
String.charLength s   // 10 characters

Invalid Codepoints: Surrogate codepoints (U+D800 to U+DFFF) are not valid char values. Attempting to construct such values results in a compile-time or runtime error.


Part 3: Structural Types

Foundation: All composite types are built from products (tuples, records) and sums (discriminated unions). This follows ML/OCaml tradition.

OCaml Provenance: OCaml’s structural assembly semantics are preserved - products are contiguous, sums are tagged. However, OCaml’s runtime block headers (for GC) are eliminated. See Appendix E.

3.1 Tuples (Anonymous Products)

let pair : int * string = (42, "hello")
let triple : int * string * float = (1, "x", 3.14)

Memory Layout:

Tuple: int * string (64-bit platform)
┌─────────────┬─────────────────────────────┐
│ int (word)  │ string (ptr + len = 2 words)│
└─────────────┴─────────────────────────────┘
     8 bytes           16 bytes              = 24 bytes total
PropertyValue
Structural typingint * stringint * string regardless of context
AllocationStack by default, arena when escaping
AlignmentNatural alignment (largest field alignment)
Nesting(a * b) * ca * (b * c) (different memory layouts)
MLIRtuple<index, !fidelity.str>

OCaml Comparison:

AspectOCamlClef
Header8-byte block header (GC info)None
FieldsWord-sized slotsNaturally aligned
BoxingFloats often boxedNever boxed

Design Decision: Tuples are unboxed - no heap allocation, no indirection, no headers. Fields are laid out contiguously with natural alignment.

Padding and Alignment Rules:

// This tuple has internal padding for alignment
let t : int8 * int64 * int8 = (1y, 100L, 2y)
Memory Layout (with padding):
┌────────┬─────────┬──────────┬────────┬─────────┐
│ int8   │ padding │ int64    │ int8   │ padding │
└────────┴─────────┴──────────┴────────┴─────────┘
  1 byte   7 bytes   8 bytes    1 byte   7 bytes  = 24 bytes

Cache Considerations:

Tuple SizeCache Lines (64 bytes)Notes
≤ 64 bytes1Single cache line - optimal
65-128 bytes2May span two lines
> 128 bytes3+Consider record with explicit layout

3.2 Records (Named Products)

type Person = { Name: string; Age: int }
type Point = { X: float; Y: float }

Memory Layout:

Record: Person (64-bit platform)
┌─────────────────────────────┬─────────────┐
│ Name: string (2 words)      │ Age: int    │
└─────────────────────────────┴─────────────┘
           16 bytes               8 bytes    = 24 bytes total
PropertyValue
Nominal typingPerson{ Name: string; Age: int } (different types)
Field orderDeclaration order determines memory layout
AllocationStack by default, arena when escaping
AlignmentNatural alignment per field, struct alignment = max field alignment
MLIR!fidelity.record<"Person", name: !fidelity.str, age: index>

OCaml Comparison:

AspectOCamlClef
Header8-byte block headerNone
Field accessOffset from headerDirect offset
Field orderDeclaration orderDeclaration order

Mutable Fields:

type MutablePoint = { mutable X: float; mutable Y: float }
PropertyMutableImmutable
Memory layoutIdenticalIdentical
Compile-timeAllows <- assignmentDisallows <-
Thread safetyRequires synchronizationNaturally thread-safe

Cache-Aware Record Design:

For high-performance scenarios, consider field ordering for cache efficiency:

// HOT fields together - accessed frequently
type OptimizedActor = {
    // Hot path - first cache line (64 bytes)
    MessageCount: int64          // 8 bytes, offset 0
    LastProcessed: int64         // 8 bytes, offset 8
    State: ActorState            // 16 bytes, offset 16
    mutable Flags: uint32        // 4 bytes, offset 32
    // padding: 28 bytes to fill cache line
    
    // Cold path - second cache line
    Name: string                 // 16 bytes, offset 64
    CreatedAt: int64             // 8 bytes, offset 80
    Config: ActorConfig          // varies
}

False Sharing Prevention:

When mutable fields are accessed from multiple threads, place them on separate cache lines:

[<Struct>]
type ThreadCounters = {
    [<CacheLineAligned>]  // Attribute hint to compiler
    mutable Counter1: int64
    
    [<CacheLineAligned>]  // Separate cache line
    mutable Counter2: int64
}

3.3 Discriminated Unions (Named Sums)

type Shape =
    | Circle of radius: float
    | Rectangle of width: float * height: float
    | Point  // No payload

Memory Layout:

Union: Shape (64-bit platform)
┌──────────┬─────────┬────────────────────────────────────┐
│ Tag (i8) │ padding │ Payload (size of largest variant)  │
└──────────┴─────────┴────────────────────────────────────┘
   1 byte    7 bytes              16 bytes                 = 24 bytes total
PropertyValue
Tag sizei8 for ≤256 variants, i16 for ≤65536
Payload sizeSize of largest variant (all variants same size)
Payload alignmentMax alignment of any variant’s fields
Total alignmentMax(tag alignment, payload alignment)
MLIR!fidelity.union<"Shape", tag: i8, payload: ...>

OCaml Comparison:

AspectOCamlClef
No-arg constructorsUnboxed integer (0, 1, 2…)Tag byte only
With-arg constructorsBlock with tag byte in headerTag + payload
Tag range0-245 (246-255 reserved)0-255 (full i8)
Block header8 bytes (includes tag)None

Tag Assignment (declaration order):

type Color = Red | Green | Blue
// Red = 0, Green = 1, Blue = 2

type Result<'T, 'E> = Ok of 'T | Error of 'E
// Ok = 0, Error = 1

Variant-Specific Layouts:

type Message =
    | Ping                              // Tag only: 1 byte + padding
    | Data of payload: array<byte>      // Tag + fat pointer: 1 + 7 + 16 = 24 bytes
    | Error of code: int * msg: string  // Tag + int + string: 1 + 7 + 8 + 16 = 32 bytes
Union: Message (largest variant determines size)
┌──────────┬─────────┬────────────────────────────────────┐
│ Tag (i8) │ padding │ 32 bytes (Error variant size)      │
└──────────┴─────────┴────────────────────────────────────┘
= 40 bytes total (all variants padded to this size)

Single-Case Unions (newtypes):

type UserId = UserId of int
type Email = Email of string
PropertyValue
OptimizationTag elided - same representation as wrapped type
Runtime sizesizeof(int) for UserId, sizeof(string) for Email
Type safetyCompile-time distinction, zero runtime overhead
UserId = int (no wrapping, no tag)
┌─────────────┐
│ int (word)  │
└─────────────┘
    8 bytes

Recursive Unions:

type List<'T> = Nil | Cons of head: 'T * tail: List<'T>
type Tree<'T> = Leaf of 'T | Node of left: Tree<'T> * value: 'T * right: Tree<'T>
PropertyValue
RecursionPointer indirection for recursive field
Nil/LeafMay be optimized to null pointer (special case)
Cons/NodeAllocated in arena or stack
Cons cell: List<int>
┌──────────┬─────────┬──────────┬────────────────────┐
│ Tag (i8) │ padding │ head: T  │ tail: ptr<List<T>> │
└──────────┴─────────┴──────────┴────────────────────┘
   1 byte    7 bytes   8 bytes        8 bytes         = 24 bytes

Pattern Matching Compilation:

Pattern matching on DUs compiles to efficient branch tables:

match shape with
| Circle r -> computeCircleArea r
| Rectangle (w, h) -> w * h
| Point -> 0.0

Compiles to:

switch (shape.tag) {
    case 0: goto circle_branch;   // Circle
    case 1: goto rect_branch;     // Rectangle
    case 2: goto point_branch;    // Point
}

Exhaustiveness: Pattern matching is verified exhaustive at compile time. Missing cases produce warnings/errors.


Part 4: Reference Types

Principle: Reference types use fat pointers (pointer + length). No null pointers - empty is represented by length 0.

CCS Resolution: See ccs-specification.md Parts 1.2-1.4 for compiler-level type resolution.

4.1 String

let greeting : string = "Hello, World!"
let empty : string = ""

Memory Layout (UTF-8 fat pointer):

string
┌─────────────────┬─────────────────┐
│ ptr: *u8        │ len: usize      │
└─────────────────┴─────────────────┘
     8 bytes           8 bytes       = 16 bytes (64-bit)
PropertyValue
EncodingUTF-8 (NOT UTF-16)
Length semanticsByte count, not character count
Empty string{ ptr: valid, len: 0 } - NOT null
MLIR!fidelity.str or tuple<ptr<i8>, index>

Why UTF-8?

  1. Native interop - Rust, C, and most systems APIs use UTF-8
  2. Compact - 1 byte per ASCII character
  3. Web/JSON native - no transcoding overhead
  4. Embedded-friendly - no UTF-16 surrogate handling

Character Iteration (codepoints, not bytes):

// Iterating over Unicode scalar values
for c in String.chars s do
    printfn "%c" c  // c : char (UTF-32 codepoint, 4 bytes)

Zero-Copy Slicing: The fat pointer representation (pointer + length) enables zero-copy string slicing - substrings reference the same underlying bytes with adjusted pointer/length.

See: Appendix E for encoding comparison with OCaml (Latin-1) and .NET (UTF-16).

API Changes (Null-Freedom Cascades):

BCL PatternClef PatternRationale
s.IndexOf(c)-1String.indexOf c svoption<int>No sentinel values
s.Substring(i, len) throwsString.slice i len svoption<string>No exceptions
s.[i] throwsString.tryItem i svoption<char>Bounds-safe
s.Split(...)string[]String.split ... sarray<string>Native array
String.IsNullOrEmpty(s)String.isEmpty sNo null possible

4.2 Array

let numbers : array<int> = [| 1; 2; 3; 4; 5 |]
let empty : array<int> = [| |]

Memory Layout (fat pointer):

array<'T>
┌─────────────────┬─────────────────┐
│ ptr: *T         │ len: usize      │
└─────────────────┴─────────────────┘
     8 bytes           8 bytes       = 16 bytes (header)
                                     + len * sizeof<'T> (elements)
PropertyValue
Element layoutContiguous, naturally aligned
Bounds checkingAlways (no unsafe indexing by default)
Empty array{ ptr: valid, len: 0 } - NOT null
MLIR!fidelity.array<T> or tuple<ptr<T>, index>

Monomorphized Layout: Unlike uniform representations that box generic elements, Clef arrays are monomorphized - array<int> stores unboxed integers contiguously. Sequential access is cache-optimal (8 int64 or 16 int32 values per 64-byte cache line).

Fixed Size After Creation:

let arr = Array.create 10 0  // 10 elements, all 0
// arr.Length is immutable - no resizing

API Changes (Null-Freedom Cascades):

BCL PatternClef Pattern
arr.[i] throwsArray.tryItem i arrvoption<'T>
Array.find pred arr throwsArray.tryFind pred arrvoption<'T>
Array.head arr throwsArray.tryHead arrvoption<'T>

Array Module Intrinsics (CCS Layer 1):

These operations are fundamental to the array type and are emitted directly by CCS. They cannot be expressed in pure F# because they require memory allocation and element size knowledge.

FunctionSignatureDescription
Array.zeroCreateint -> array<'T>Allocate n elements, zero-initialized
Array.createint -> 'T -> array<'T>Allocate n elements, all set to value
Array.initint -> (int -> 'T) -> array<'T>Allocate n elements, initialized by function
Array.copyarray<'T> -> array<'T>Create a copy of the array
Array.lengtharray<'T> -> intReturn the length of the array
Array.getarray<'T> -> int -> 'TGet element at index (bounds-checked)
Array.setarray<'T> -> int -> 'T -> unitSet element at index (bounds-checked)
Array.tryItemint -> array<'T> -> voption<'T>Safe indexed access returning voption
Array.isEmptyarray<'T> -> boolReturn true if array has zero length

Allocation Semantics: Arrays are allocated in the current memory region (stack arena or actor arena). The allocation strategy is determined by the memory region context, not by the Array function.

4.3 Span and ReadOnlySpan

let span : Span<int> = Span(arr, 2, 3)  // View into arr[2..4]
let roSpan : ReadOnlySpan<byte> = ReadOnlySpan(bytes)

Memory Layout (borrowed view):

Span<'T>
┌─────────────────┬─────────────────┐
│ ptr: *T         │ len: usize      │
└─────────────────┴─────────────────┘
PropertyValue
OwnershipBorrowed (does not own memory)
LifetimeMust not outlive source
Stack onlyCannot be stored in heap structures
MLIRSame as array header (no ownership)

Zero-Copy Views: Spans provide borrowed views into arrays, strings, and memory regions without allocation. The stack-only constraint prevents lifetime escape (similar to Rust’s slice borrowing). Maps directly to MLIR memref with dynamic offset.

Note: Leverage FSharp.Core Span types - already stack-allocated and native-friendly.


Part 5: Parameterized Types

Principle: Parameterized types follow familiar F# syntax. CCS resolves native semantics at compile-time.

CCS Resolution: See ccs-specification.md Part 1.3 for option type resolution and null-free guarantees.

5.1 Option

let maybeValue : int option = Some 42
let nothing : string option = None

Memory Layout (stack-allocated tagged union):

option<'T>  (voption semantics)
┌──────────┬────────────────────┐
│ Tag (i8) │ Payload: 'T        │
└──────────┴────────────────────┘
   1 byte     sizeof<'T>         + padding
PropertyValue
Tag valuesNone = 0, Some = 1
Stack allocatedAlways (never heap)
Null-freedomNone is tag 0, NOT null pointer
MLIR!fidelity.option<T>

Stack-Only Guarantee: Unlike heap-allocated options (cf. OCaml blocks, .NET reference types), Clef options are always stack-allocated with no GC involvement. This enables predictable memory layout for embedded targets and eliminates heap fragmentation from frequent option use.

See: Appendix E for detailed OCaml/Rust comparison.

CCS Resolution:

// User writes familiar F# syntax:
let x : int option = Some 42
let y : int option = None

// CCS compiles with voption<int> semantics:
// - Stack allocated
// - Tag-based discrimination
// - No null anywhere

API (Null-Freedom Cascades):

BCL PatternClef Pattern
opt.Value throwsOption.get opt (or pattern match)
opt.IsSomeOption.isSome opt
Option.defaultValue v optSame (works identically)

Note: CCS provides native option semantics directly - option<'T> maps to stack-allocated voption at compile time.

5.2 Result

let success : Result<int, string> = Ok 42
let failure : Result<int, string> = Error "not found"

Memory Layout (tagged union):

Result<'T, 'E>
┌──────────┬────────────────────────────────────┐
│ Tag (i8) │ Payload: max(sizeof<'T>, sizeof<'E>)│
└──────────┴────────────────────────────────────┘
PropertyValue
Tag valuesOk = 0, Error = 1
Stack allocatedAlways
MLIR!fidelity.result<T, E>

Stack Allocation: Like option, Result is always stack-allocated with zero heap overhead.

Why Result Over Exceptions: Result makes error handling explicit in type signatures, enables compile-time exhaustiveness checking, has zero runtime overhead, and can cross FFI boundaries via BAREWire. Exceptions are not supported for control flow in Clef.

Preferred Error Handling: Result is the idiomatic error handling pattern in Clef. Exceptions are not supported for control flow.

// Idiomatic error handling
let divide x y : Result<int, string> =
    if y = 0 then Error "division by zero"
    else Ok (x / y)

// Chaining with Result.map, Result.bind
divide 10 2
|> Result.map (fun x -> x * 2)
|> Result.defaultValue 0

5.3 List

let numbers : int list = [1; 2; 3; 4; 5]
let empty : int list = []

Memory Layout (cons cells):

list<'T>
┌────────────────────┬─────────────────────┐
│ head: 'T           │ tail: ptr<list<'T>> │
└────────────────────┴─────────────────────┘
     sizeof<'T>            8 bytes (pointer)
PropertyValue
Empty listSpecial tag, no allocation
ImmutableAlways (structural sharing)
AllocationArena or stack (not GC heap)
MLIR!fidelity.list<T>

Arena Allocation: List cons cells are allocated in arenas (not GC heap), providing better cache locality and batch deallocation at scope end. The immutable structure enables structural sharing as in OCaml.

When to Use:

  • Pattern matching on head/tail
  • Recursive algorithms
  • Functional transformations (map, filter, fold)

When NOT to Use (prefer array):

  • Random access by index
  • Performance-critical loops
  • Large collections with mutations

5.4 Map

let lookup : Map<string, int> = Map.ofList [("a", 1); ("b", 2)]
let empty : Map<int, string> = Map.empty

Memory Layout (AVL tree nodes):

Map<'K, 'V>  (AVL tree node)
┌────────────────────┬────────────────────┬─────────────────────┬─────────────────────┬──────────┐
│ key: 'K            │ value: 'V          │ left: ptr<Map>      │ right: ptr<Map>     │ height: i8│
└────────────────────┴────────────────────┴─────────────────────┴─────────────────────┴──────────┘
     sizeof<'K>           sizeof<'V>            8 bytes              8 bytes            1 byte
PropertyValue
Empty mapNull pointer (no allocation)
StructureSelf-balancing AVL tree
ImmutableAlways (structural sharing on update)
AllocationArena or stack (not GC heap)
Key constraint'K : comparison
MLIR!fidelity.map<K, V>

AVL Balance Property: Height difference between left and right subtrees is at most 1. Rebalancing occurs on Map.add when this property would be violated.

When to Use:

  • Key-value associations with O(log n) lookup
  • Ordered iteration by key
  • Immutable dictionary semantics

When NOT to Use (prefer Dictionary or array):

  • Frequent updates (mutable dictionary better)
  • Small fixed key sets (array with enum index)
  • Hash-based O(1) lookup needed

See: Map Representation for detailed AVL algorithms and HOF specifications.

5.5 Set

let numbers : Set<int> = Set.ofList [1; 2; 3]
let empty : Set<string> = Set.empty

Memory Layout (AVL tree nodes):

Set<'T>  (AVL tree node)
┌────────────────────┬─────────────────────┬─────────────────────┬──────────┐
│ value: 'T          │ left: ptr<Set>      │ right: ptr<Set>     │ height: i8│
└────────────────────┴─────────────────────┴─────────────────────┴──────────┘
     sizeof<'T>            8 bytes              8 bytes            1 byte
PropertyValue
Empty setNull pointer (no allocation)
StructureSelf-balancing AVL tree
ImmutableAlways (structural sharing on update)
AllocationArena or stack (not GC heap)
Element constraint'T : comparison
MLIR!fidelity.set<T>

Relationship to Map: Set<'T> is structurally equivalent to Map<'T, unit> but with optimized layout (no value field).

When to Use:

  • Membership testing with O(log n) lookup
  • Ordered unique elements
  • Set operations (union, intersect, difference)

When NOT to Use (prefer HashSet or array):

  • Very frequent membership tests (hash-based O(1) better)
  • When order doesn’t matter and hash is cheaper

See: Set Representation for detailed AVL algorithms and HOF specifications.


Part 6: Function Types

Principle: Functions are first-class values. Inline-by-default semantics (fsil) eliminates most closure overhead.

6.1 Pure Functions

let add : int -> int -> int = fun x y -> x + y
let apply : ('a -> 'b) -> 'a -> 'b = fun f x -> f x

Representation Strategies:

CaseRepresentation
Known call siteDirect call (no indirection)
Inline functionInlined at call site (fsil default)
First-class valueFunction pointer or closure
Captures environmentClosure struct

6.2 Closure Representation

When a function captures variables from its environment:

let makeAdder n =
    fun x -> x + n  // Captures 'n'

Memory Layout:

Closure
┌─────────────────────┬─────────────────────┐
│ fn_ptr: ptr<fn>     │ env: captured values│
└─────────────────────┴─────────────────────┘
     8 bytes              sizeof<env>
PropertyValue
EnvironmentStruct containing captured values
Invocationfn_ptr(env, args...)
MLIR!fidelity.closure<(args) -> ret, env>

Closure Allocation: Closures are allocated on stack or in arenas (not GC heap). Small closures (<64 bytes) fit within one cache line for efficient invocation.

6.3 Inline Semantics (fsil Absorption)

Most functions are inlined by default:

// These are inlined at call site - no closure allocation
let inline double x = x * 2
let inline (|>) x f = f x
Function TypeDefault Behavior
Simple arithmeticAlways inlined
Pipe operatorsAlways inlined
SRTP-constrainedAlways inlined
RecursiveNot inlined (requires call)
Large bodyCompiler decides

6.4 Partial Application

let add x y = x + y
let add5 = add 5  // Partial application

Representation: Creates a closure capturing applied arguments:

add5 = { fn_ptr: add_impl, env: { x = 5 } }

Currying Optimization (fsil): Fully-applied curried calls compile to direct multi-argument calls (no intermediate closures). Partial application creates flat closures capturing applied arguments. Higher-order uses like List.map f are typically inlined at call sites.

See: ClefExpr § 5.2 Curried Call Flattening for the normative specification of how the SemanticGraph represents curried applications.


Part 7: Mutable State

Principle: Mutable state is explicit and controlled. All mutation is visible in the type system or syntax.

7.1 Ref Cells

let counter : int ref = ref 0
counter := !counter + 1  // Mutation
let value = !counter     // Dereference

Memory Layout:

ref<'T>
┌────────────────────┐
│ contents: 'T       │
└────────────────────┘
    sizeof<'T>
PropertyValue
RepresentationSingle-field mutable record
AllocationStack or arena (not GC)
MLIR!fidelity.ref<T> or ptr<T>

Stack/Arena Allocation: Refs are allocated on stack or in arenas (not GC heap), eliminating allocation pressure in loops and providing predictable memory behavior for embedded targets.

7.2 Mutable Bindings

let mutable x = 0
x <- x + 1  // Direct mutation
PropertyValue
ScopeLocal to function
RepresentationStack slot
Cannot escapeCannot be captured by closures

Escape Prevention: Mutable bindings cannot be captured by closures (use ref instead). This restriction enables guaranteed stack allocation.

7.3 Mutable Record Fields

type Counter = { mutable Value: int }

let c = { Value = 0 }
c.Value <- c.Value + 1
PropertyValue
LayoutSame as immutable field
MutabilityCompile-time property

Cache Line Isolation: For concurrent access, isolate frequently-written mutable fields using [<CacheLinePadded>] to prevent false sharing.


Part 8: Memory Region Types (UMX Integration)

Core Fidelity Principle: Memory region types are intrinsic to Clef - as fundamental as voption. They carry semantic meaning through the entire compilation pipeline, guiding every memory layout decision. Fidelity makes ALL memory layout decisions - MLIR/LLVM never determine layout.

Erasure at Last Lowering: These types ARE erased - but at the last possible lowering stage, after Fidelity has made all memory layout decisions. By the time code reaches LLVM, “the type information that guided every transformation has done its job and compiled away to nothing.” This is the entire point of “Fidelity” - preserving type fidelity through compilation so the F# compiler controls memory layout.

CCS Enforcement: See ccs-specification.md Parts 9-10 for region constraint enforcement and diagnostic codes.

8.1 Memory Regions

Memory region types define where memory lives and how it behaves. They guide Fidelity’s code generation decisions throughout the pipeline:

RegionUse CaseVolatileCacheableCode Generation Effect
StackThread-local automaticNoYesStack allocation, automatic cleanup
ArenaBulk allocation, batch freeNoYesArena allocator calls, scope-bounded
PeripheralMemory-mapped I/OYesNoVolatile loads/stores, no reordering
SramGeneral RAMNoYesStandard memory access
FlashRead-only program memoryNoYesRead-only access, link-time placement
// These types guide memory layout decisions through compilation
type Stack      // Thread-local, automatic lifetime
type Arena      // Compiler-managed bulk allocation
type Peripheral // Memory-mapped I/O (volatile semantics)
type Sram       // General-purpose RAM
type Flash      // Read-only storage

Why This Matters - The “Fidelity” in Fidelity Framework:

The entire point of Fidelity is that the F# compiler controls memory layout, not MLIR, not LLVM. Memory region types:

  1. Carry semantic meaning through the entire compilation pipeline
  2. Guide Alex’s code generation - Peripheral → volatile LLVM instructions
  3. Determine allocation strategy - Stack vs Arena vs explicit
  4. Enable compile-time safety - region mismatches are type errors
  5. Erase at final lowering - only after ALL decisions are made by Fidelity

This is fundamentally different from letting MLIR/LLVM “figure out” memory layout. Fidelity dictates; LLVM implements.

8.2 Access Kinds

Access kinds are also intrinsic - they determine what operations are legal AND affect code generation:

KindReadWriteCMSISCode Generation Effect
ReadOnlyYesNo__INo store instructions generated
WriteOnlyNoYes__ONo load instructions generated
ReadWriteYesYes__IOBoth permitted
// Intrinsic access kind types
type ReadOnly   // Input registers, flash, const data
type WriteOnly  // Output registers, write-only buffers
type ReadWrite  // General mutable access

8.3 Region-Typed Pointers

Pointers carry region and access information as part of their type:

type Ptr<'T, 'Region, 'Access>

// Examples - region and access are part of the type, not annotations
let gpioReg : Ptr<uint32, Peripheral, ReadWrite> = ...
let flashData : Ptr<byte, Flash, ReadOnly> = ...
let stackBuffer : Ptr<int, Stack, ReadWrite> = ...

Compile-Time Safety:

// ERROR: Cannot write to readOnly pointer
let writeFlash (p: Ptr<byte, Flash, ReadOnly>) =
    Ptr.write p 0uy  // Compile error!

// OK: Can read from readOnly
let readFlash (p: Ptr<byte, Flash, ReadOnly>) =
    Ptr.read p  // OK

Cache Behavior: Stack, Arena, Sram, and Flash regions are cacheable with normal load/store semantics. Peripheral access bypasses cache and uses memory barriers - essential for hardware registers where timing and order matter.

8.4 Arena as CCS Intrinsic Type

Status (January 2026): Arena is implemented as an CCS intrinsic type.

Type Definition:

Arena<[<Measure>] 'lifetime>

Memory Layout (NTUCompound 3):

Arena<'lifetime>
┌─────────────────┬─────────────────┬─────────────────┐
│ Base: nativeint │ Capacity: int   │ Position: int   │
└─────────────────┴─────────────────┴─────────────────┘
     8 bytes           8 bytes           8 bytes       = 24 bytes (64-bit)
PropertyValue
CCS TypeIntrinsic with NTUCompound(3)
Lifetime paramMeasure type for tracking
AllocationStack-backed (typical) or heap-backed
MLIRThree-word struct with InsertValue/ExtractValue

CCS Intrinsic Operations:

OperationType Signature
Arena.fromPointernativeint -> int -> Arena<'lifetime>
Arena.allocArena<'lifetime> byref -> int -> nativeint
Arena.allocAlignedArena<'lifetime> byref -> int -> int -> nativeint
Arena.remainingArena<'lifetime> -> int
Arena.resetArena<'lifetime> byref -> unit

Usage Pattern:

// Arena backed by stack memory
let arenaMem = NativePtr.stackalloc<byte> 4096
let mutable arena = Arena.fromPointer (NativePtr.toNativeInt arenaMem) 4096

// Allocate from arena
let buffer = Arena.alloc &arena 256

// Arena freed when stack frame exits

Byref Parameter: Operations that mutate arena state (alloc, reset) take Arena<'lifetime> byref to enable in-place position updates without copying the 24-byte struct.

Lifetime Inference Principle: Arena demonstrates the three-level approach:

  • Level 3 (Explicit): Current - full manual control
  • Level 2 (Hints): Future - arena { } computation expressions
  • Level 1 (Inferred): Future - compiler escape analysis

See: memory-regions.md for detailed Arena semantics.

8.5 Hardware Peripheral Descriptors

See: Farscape documentation for peripheral binding generation.

[<PeripheralDescriptor("GPIO", 0x48000000UL)>]
type GPIO_TypeDef = {
    [<Register("MODER", 0x00u, "rw")>]
    MODER: Ptr<uint32, peripheral, readWrite>
    
    [<Register("IDR", 0x10u, "r")>]
    IDR: Ptr<uint32, peripheral, readOnly>
    
    [<Register("ODR", 0x14u, "rw")>]
    ODR: Ptr<uint32, peripheral, readWrite>
}

Part 9: Coeffects and Resource Tracking

DEFERRED TO PHASE B: This section will be completed after QuantumCredential PoC.

9.1 Pure vs Effectful Classification

TBD - Continuation-based RAII model

9.2 Resource Coeffects

TBD - ResourceCoeffect type integration


Part 10: Interactive Development (fsni)

Spec Reference: See interactive-development.md for full specification.

The type universe must account for interactive development scenarios where types are defined and evaluated incrementally.

10.1 Interactive Session Types

In fsni (Clef Interactive), types are resolved in an evolving environment:

> type Point = { x: int; y: int };;
type Point

> let origin = { x = 0; y = 0 };;
val origin : Point

Type Resolution in Interactive Mode:

  • Types defined in the session are immediately available
  • Types from #require directives are loaded into the type environment
  • CCS resolves types against both session-local and loaded definitions

10.2 Arena Semantics in Interactive Mode

Interactive sessions use arena-based allocation:

AspectCompiled ProgramInteractive Session
Arena LifetimeProgram-controlledSession-controlled
ResetExplicit#arena reset directive
PersistenceN/A#persist for cross-reset values

Type Implications:

> let data = [1..1000000];;
val data : int list  // Allocated in session arena

> #arena reset;;
// data is no longer valid - type checker knows this

10.3 Interpretation vs Compilation Type Semantics

ModeType CheckingExecution
InterpretFull CCSInterpreted
CompileFull CCSNative code
HybridFull CCSMode-dependent

All modes use identical type semantics. The difference is only in execution:

> #mode compile;;
> let rec fib n = if n < 2 then n else fib (n-1) + fib (n-2);;
val fib : int -> int  // Same type in all modes

10.4 Script File Type Semantics

Script files (.fsnx) follow the same type semantics as compiled modules:

// script.fsnx
let greeting : string = "Hello"  // string with native UTF-8 fat pointer semantics
let maybe : int option = Some 42  // voption<int>, non-null

10.5 Cross-Compilation Type Considerations

When targeting a different platform:

> #target linux-arm64;;
Target: linux-arm64 (cross-compiling from linux-x64)

> sizeof<nativeint>;;
val it : int = 8  // Reflects target, not host

Platform-Specific Types:

  • nativeint/unativeint size reflects target platform
  • Pointer types respect target architecture
  • Type layouts follow target ABI

Appendix A: Type Mapping to MLIR

F# TypeMLIR Type
unit(none - ZST)
booli8
intindex
int32i32
int64i64
floatf64
float32f32
string!fidelity.str
option<'T>!fidelity.option<T>
'a * 'btuple<A, B>
Record!fidelity.record<...>
DU!fidelity.union<...>
'a -> 'b!fidelity.fn<A, B> or direct (A) -> B

Appendix B: OCaml Type System Reference

Source: OCaml Basic Data Types, Memory Representation, Real World OCaml

B.1 OCaml Uniform Memory Representation

OCaml uses a “uniform memory representation” where every variable is a single word containing either an immediate integer or a pointer. The runtime distinguishes them using the lowest bit:

  • Bit = 1: Integer value (unboxed)
  • Bit = 0: Pointer to heap-allocated data

This tagging allows integers to remain “unboxed” (stored directly without heap allocation), making them fast and register-friendly. The tradeoff is losing one bit of precision in integer values.

B.2 OCaml Primitive Types

TypeOCaml RepresentationSizeNotes
intTagged integer63-bit (64-bit platform)One bit reserved for GC tag
floatBoxed IEEE 754 double8 bytes + headerNo single-precision built-in
boolUnboxed integer1 wordtrue=1, false=0
charUnboxed integer1 byte valueLatin-1 only (0-255)
stringBlock with String_tagVariableWord-aligned, explicit length
unitUnboxed integer 00Same representation as []
'a optionVariant1 word or boxedNone=0, Some x=block

B.3 OCaml Composite Types

TypeRepresentationTag
TupleBlock with fields0
RecordBlock with fields0
ArrayBlock with variable fields0
Float arrayUnboxed contiguous254 (Double_array_tag)
Variant (no param)Unboxed integerN/A
Variant (with param)Block with tag + fields0-245

B.4 Key OCaml Design Decisions Clef Diverges From

OCaml ChoiceClef ChoiceRationale
63-bit tagged intFull word nativeintNo GC tag needed (compile-time safety)
Latin-1 charUTF-32 codepointModern Unicode support
Byte-sequence stringUTF-8 fat pointerText correctness + Rust interop
Boxed option (sometimes)Always stack voptionNull-freedom guarantee
Runtime type taggingCompile-time onlyNo runtime overhead

Appendix C: Comparison with OCaml/F#/Rust

AspectOCamlF# (.NET)ClefRust
String encodingByte sequenceUTF-16UTF-8UTF-8
Option'a option (heap)'a option (heap)voption (stack)Option<T> (stack)
Default int63-bit tagged32-bitPlatform wordPlatform word
NullNo nullNullable refsNo nullNo null
Memory safetyRuntime GCRuntime GCCompile-timeCompile-time
Type representationRuntime taggedRuntime boxedCompile-timeCompile-time

Appendix D: Migration from Shadow Types

Note: CCS provides native type resolution directly. No shadow types or library workarounds are needed.

FSharp.Core Types Leveraged

TypeLocationNotes
voption<'T>FSharp.CoreTHE underlying option implementation
nativeintFSharp.CorePlatform-sized integer
nativeptr<'T>FSharp.NativePtrNative pointer operations
Span<'T>FSharp.CoreContiguous memory view

CCS Type Resolution

CCS (Clef Compiler Service) resolves types to native representations at the source:

  • string → UTF-8 fat pointer
  • option<'T>voption (stack-allocated)
  • array<'T> → fat pointer with native element layout
  • int → platform word

Appendix E: OCaml Provenance and Fidelity Extensions

Design Philosophy: Clef draws provenance from OCaml’s direct memory layout idioms - concepts that F#/.NET lacks entirely because the CLR abstracts memory away. However, OCaml is desktop-centric and non-cache-aware. Fidelity extends OCaml’s foundation with modern hardware realities while incorporating Rust’s RAII principles.

E.1 What OCaml Provides (KEEP)

OCaml contemplates direct memory layout in ways F#/.NET never does:

OCaml ConceptValue for ClefNotes
Products/Sums/Functions as primitivesFoundationType universe axioms (Part 1)
Value-oriented structural assemblyCore principleRecords/tuples laid out contiguously
Deterministic tag layout for DUsDirectly usableTag = 0,1,2… in declaration order
Fat pointer conceptDirectly usable{ptr, len} for strings/arrays
No null philosophyCore principleEverything representable without sentinel values
Unboxed by defaultCore principleNo implicit heap allocation

These concepts form the bedrock of Clef’s type universe. The structural assembly semantics, deterministic tag assignment, and fat pointer model are adopted with minimal modification.

E.2 What OCaml Lacks (SET ASIDE)

OCaml’s desktop-centric, non-cache-aware limitations that Clef explicitly diverges from:

OCaml LimitationFidelity RequirementGap
63-bit tagged integersFull-width integersGC tag overhead eliminated - compile-time safety replaces runtime discrimination
No cache line awareness64-byte alignment mattersWorking set calculation, false sharing prevention
No memory region typesStack/Arena/Peripheral/Sram/FlashCompile-time placement decisions
No access kindsReadOnly/WriteOnly/ReadWriteHardware register semantics for embedded targets
Desktop “sufficient RAM” assumptionEmbedded/constrained targetsMemory-mapped peripherals, limited stack
No NUMA awarenessMulti-socket topologyActor placement strategies (Prospero)
Runtime type discriminationCompile-time onlyNo runtime overhead
No prefetch/cache bypass semanticsProcessor-specific strategiesIntel vs AMD vs ARM optimization

OCaml assumes a desktop environment with ample RAM and a garbage collector managing memory. Clef targets the full spectrum from microcontrollers to GPU clusters.

E.3 Rust RAII Guideposts (ADAPT)

Rust’s ownership model provides valuable guideposts, adapted to F#’s functional idioms:

Rust ConceptFidelity AdaptationImplementation
OwnershipMemory region types + coeffectsPtr<'T, 'Region, 'Access> carries placement semantics
Borrow checkerType-guided analysisLess invasive than Rust’s explicit lifetimes
Drop semanticsContinuation-bounded resourcesRAII via delimited continuation completion
Deterministic cleanupArena/scope-bounded lifetimesResources released at scope exit, not GC
Move semanticsLinear types (Phase B)Ownership transfer without copying

Rust’s insight that ownership can be tracked statically is preserved, but expressed through F#’s type system rather than explicit lifetime annotations.

E.4 Fidelity Extensions Beyond Both

Fidelity’s cache-aware compilation doctrine extends beyond both OCaml and Rust:

Cache Hierarchy Awareness

ConcernOCamlRustFidelity
Cache line alignment (64 bytes)NoManualFirst-class - [<CacheLineAligned>]
False sharing preventionNoManualAutomatic for mutable fields
Working set calculationNoNoCompile-time via BAREWire layouts
L1/L2/L3 tier placementNoNoArena configuration strategies

Memory Hierarchy Management

// Fidelity-specific: Cache-conscious arena configuration
let configureArena (profile: ActorProfile) =
    match profile.WorkingSetSize with
    | size when size <= 32<KB> -> L1Resident    // Fits in L1
    | size when size <= 256<KB> -> L2Optimized  // Fits in L2
    | size when size <= 8<MB> -> L3Optimized    // Fits in L3
    | _ -> StreamingMode                         // Bypass cache

Processor-Specific Optimization

FeaturePurposeOCaml/Rust
Prefetch distance calibrationIntel vs AMD vs ARM strategiesNo
Non-temporal movesBypass cache for streaming dataManual only
NUMA-aware placementMulti-socket actor placementNo
TLB/huge page selectionBased on access densityOS-level only

Hardware Memory Regions

Clef introduces memory regions unknown to both OCaml and Rust:

RegionUse CaseOCamlRustClef
StackThread-local automaticImplicitImplicitExplicit type
ArenaBulk allocation, batch freeNoManualFirst-class
PeripheralMemory-mapped I/O (volatile)NoManual unsafeType-safe
SramGeneral RAMN/AN/AExplicit
FlashRead-only program memoryN/AN/AExplicit

E.5 The Synthesis

Clef’s type universe represents a synthesis:

  1. From OCaml: Products, sums, functions as primitives; value-oriented assembly; fat pointers; no null
  2. Set Aside from OCaml: GC tagging overhead; desktop assumptions; runtime discrimination
  3. From Rust: Deterministic cleanup; ownership tracking (adapted to coeffects); RAII semantics
  4. Fidelity Original: Memory regions; access kinds; cache hierarchy awareness; processor-specific optimization

This synthesis provides provenance (we didn’t start from scratch) while forging a path suited to native compilation across the full hardware spectrum - from embedded microcontrollers to GPU clusters.

E.6 References


Changelog

  • 2024-12-30: Session 1 (Unit, Bool, Integers) - Comprehensive primitive type specification with OCaml provenance, cache alignment considerations, resolved char design decision (UTF-32 codepoints)
  • 2024-12-30: Added Appendix E (OCaml Provenance and Fidelity Extensions) documenting design heritage and extensions
  • 2024-12-30: Initial skeleton created from plan