Native Type Mappings

This chapter defines how F# types map to native representations in Clef compilation.

Overview

Clef uses familiar F# syntax with native semantics. The compiler (CCS) resolves types to native representations at compile time, not to BCL types.

Principle: Users write standard F# type names. CCS provides native semantics transparently.

The Universal Base Type obj Is Not Available

In managed F#, all types inherit from System.Object (aliased as obj). This enables:

  • Boxing value types to heap-allocated objects
  • Runtime type information and reflection
  • Heterogeneous collections (obj list)
  • Generic %A formatting via runtime inspection

Clef eliminates obj entirely. There is no universal base type. The compiler SHALL reject any code that references obj or System.Object.

Rationale

| Managed F# Capability | Why It Requires obj | Clef Alternative |
| --- | --- | --- |
| Boxing (box x) | Wraps value in heap object | Not needed; value types stay value types |
| Unboxing (unbox x) | Extracts value from object | Not available; no boxed values exist |
| %A / %O formatting | Runtime type inspection | SRTP-based formatting with compile-time dispatch |
| obj list | Heterogeneous collection | Discriminated union with explicit cases |
| Downcasting (:?>) | Runtime type check | Pattern matching on discriminated unions |
| typeof<'T> | Runtime type token | Not available; types are compile-time only |

Why obj Cannot Exist in Native Compilation

  1. No runtime type information: Native binaries do not carry type metadata. There is no mechanism to inspect a value’s type at runtime.

  2. No garbage collector: The obj type implies heap allocation with GC-managed lifetime. Clef uses deterministic, scope-based memory management.

  3. Full static resolution: All types are resolved at compile time. Generic type parameters are monomorphized (specialized at each call site). Type erasure to obj is unnecessary and would lose type safety.

  4. SRTP replaces runtime dispatch: Where managed F# uses obj and runtime dispatch (like printf "%A"), Clef uses statically resolved type parameters with compile-time method resolution.

Migrating Code That Uses obj

Code using obj must be refactored to use type-safe alternatives:

Heterogeneous collections:

// DOES NOT COMPILE in Clef
let values : obj list = [box 1; box "hello"; box 3.14]

// Use discriminated union instead
type Value = 
    | Int of int 
    | Str of string 
    | Float of float
let values : Value list = [Int 1; Str "hello"; Float 3.14]

Polymorphic formatting:

// DOES NOT COMPILE in Clef  
let show (x: obj) = sprintf "%A" x

// Use SRTP with operator overloading
type Showable = Showable
    with static member inline ($) (Showable, x: int) = intToString x
         static member inline ($) (Showable, x: string) = x
         // ... additional overloads

let inline show x = Showable $ x

Type-based dispatch:

// DOES NOT COMPILE in Clef
// (named dispatch because process is a reserved identifier in standard F#)
let dispatch (x: obj) =
    match x with
    | :? int as i -> handleInt i
    | :? string as s -> handleString s
    | _ -> handleOther ()

// Use discriminated union with exhaustive matching
type Input = IntInput of int | StringInput of string
let dispatch (x: Input) =
    match x with
    | IntInput i -> handleInt i
    | StringInput s -> handleString s

Compile-Time Metaprogramming

The absence of obj and System.Reflection does not leave Clef without metaprogramming capabilities. Three F# features provide typed, compile-time metaprogramming that surpasses what reflection-based approaches can offer:

| Feature | Role | Reflection Equivalent |
| --- | --- | --- |
| Quotations (Expr<'T>) | Encode program fragments as inspectable data | MethodInfo, Expression<T> |
| Active Patterns | Compositional structural recognition | GetType(), type discrimination |
| Computation Expressions | Continuation capture as notation | Callback-based async, monadic patterns |

Why This Matters

Other native-compiled ML-family languages lack typed metaprogramming:

| Capability | OCaml | Rust | Clef |
| --- | --- | --- | --- |
| Typed quotations | No | No | Yes |
| Pattern-based recognition | Match only | Match only | Active patterns |
| Continuation notation | No | No | Computation expressions |
| Metaprogramming | PPX (untyped AST) | proc_macro (token-based) | Quotations (typed) |

F# quotations carry full type information through transformations. OCaml’s PPX rewriters operate on the untyped syntax tree and Rust’s procedural macros operate on token streams; neither carries the type information that quotations provide.

Quotations as Semantic Carriers

Quotations encode constraints and metadata as compile-time data that the compiler can inspect:

// Peripheral descriptor carried as typed quotation
let gpioDescriptor: Expr<PeripheralDescriptor> = <@
    { Name = "GPIO"
      BaseAddress = 0x48000000un
      MemoryRegion = Peripheral }
@>

The compiler extracts semantic information from quotations during PSG construction. No runtime reflection is needed - the information is available at compile time and can guide code generation (e.g., emitting volatile loads for peripheral access).

Active Patterns for Structural Recognition

Active patterns enable compositional matching without type discrimination hierarchies:

// Recognize SRTP dispatch in PSG nodes
let (|SRTPDispatch|_|) (node: PSGNode) =
    match node.TypeCorrelation with
    | Some { SRTPResolution = Some srtp } -> Some srtp
    | _ -> None

// Composable usage
match currentNode with
| SRTPDispatch srtp -> emitResolvedCall srtp
| PeripheralAccess info -> emitVolatileAccess info
| _ -> emitDefault currentNode

Active patterns compose with & and |, can be tested in isolation, and encapsulate recognition logic - capabilities that runtime type inspection cannot match.
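
The composition with & and | can be sketched in ordinary F#. The example below is an illustration only: the patterns Even, Positive, and Small and the Classification type are hypothetical and work on plain integers rather than PSG nodes.

```fsharp
// Hypothetical result type and patterns over plain integers (not PSG nodes).
type Classification =
    | PositiveEven of int
    | SmallOrEven
    | Other

let (|Even|_|) (n: int) = if n % 2 = 0 then Some () else None
let (|Positive|_|) (n: int) = if n > 0 then Some n else None
let (|Small|_|) (n: int) = if abs n < 10 then Some () else None

let classify (n: int) : Classification =
    match n with
    | Even & Positive p -> PositiveEven p   // '&': both patterns must match
    | Small | Even -> SmallOrEven           // '|': either pattern may match
    | _ -> Other
```

Each pattern is an ordinary function, so it can be exercised on its own before being combined in a match.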

Computation Expressions as Continuation Capture

Every let! in a computation expression captures a continuation:

maybe {
    let! x = someOption    // Bind(someOption, fun x -> ...)
    let! y = otherOption   // Bind(otherOption, fun y -> ...)
    return x + y
}

This desugaring to nested lambdas provides continuation semantics as notation. The compilation strategy depends on the computation pattern:

| Pattern | Compilation Strategy |
| --- | --- |
| Sequential effects (async, state) | Preserve continuations (DCont dialect) |
| Parallel pure (validated, reader) | Compile to data flow (Inet dialect) |
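
The desugaring of let! into nested lambdas can be made concrete with a minimal builder. This is a sketch in ordinary F#: the MaybeBuilder type is an assumption (the source does not define the maybe builder), and it says nothing about how CCS lowers the result to either dialect.

```fsharp
// Hypothetical minimal builder; the source does not show how `maybe` is defined.
type MaybeBuilder() =
    member _.Bind(m: 'a option, k: 'a -> 'b option) : 'b option =
        match m with
        | Some v -> k v      // run the captured continuation
        | None -> None       // discard the continuation
    member _.Return(x: 'a) : 'a option = Some x

let maybe = MaybeBuilder()

// Computation expression notation.
let sugared (a: int option) (b: int option) =
    maybe {
        let! x = a
        let! y = b
        return x + y
    }

// The same computation written as the nested lambdas the notation stands for.
let desugared (a: int option) (b: int option) =
    maybe.Bind(a, fun x ->
        maybe.Bind(b, fun y ->
            maybe.Return(x + y)))
```

Both functions return the same result for every input; the continuation passed to Bind is simply never invoked when an operand is None.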

Normative Requirements

NORMATIVE: System.Reflection and all reflection-based APIs SHALL NOT be available in Clef. The compiler SHALL reject any code that references reflection types or methods.

NORMATIVE: Quotations, active patterns, and computation expressions SHALL be fully supported. These features operate at compile time and impose no runtime overhead.

NORMATIVE: Quotation-based metaprogramming SHALL NOT require runtime evaluation. All quotation inspection and transformation occurs during compilation.

Primitive Types

Numeric Types

| F# Syntax | Native Representation | Size | Notes |
| --- | --- | --- | --- |
| unit | Zero-sized type | 0 | No runtime representation |
| bool | i8 | 1 byte | 0 = false, non-zero = true |
| int | isize | Platform word | 4 bytes (32-bit), 8 bytes (64-bit) |
| uint | usize | Platform word | Unsigned platform word |
| int8 / sbyte | i8 | 1 byte | Signed 8-bit |
| uint8 / byte | u8 | 1 byte | Unsigned 8-bit |
| int16 | i16 | 2 bytes | Signed 16-bit |
| uint16 | u16 | 2 bytes | Unsigned 16-bit |
| int32 | i32 | 4 bytes | Signed 32-bit |
| uint32 | u32 | 4 bytes | Unsigned 32-bit |
| int64 | i64 | 8 bytes | Signed 64-bit |
| uint64 | u64 | 8 bytes | Unsigned 64-bit |
| nativeint | isize | Platform word | Signed pointer-sized |
| unativeint | usize | Platform word | Unsigned pointer-sized |

Floating Point Types

| F# Syntax | Native Representation | Size | Notes |
| --- | --- | --- | --- |
| float / double | f64 | 8 bytes | IEEE 754 double precision |
| float32 / single | f32 | 4 bytes | IEEE 754 single precision |

Character and String Types

| F# Syntax | Native Representation | Size | Notes |
| --- | --- | --- | --- |
| char | i32 | 4 bytes | UTF-32 codepoint (Unicode scalar value) |
| string | {ptr: *u8, len: usize} | 16 bytes (64-bit platform) | UTF-8 fat pointer |

Composite Types

Tuples

Tuples are laid out as contiguous structs with natural alignment:

let pair : int * float = (42, 3.14)

Layout:

┌─────────┬─────────┬─────────┐
│ int (8) │ pad (0) │ float(8)│
└─────────┴─────────┴─────────┘
Total: 16 bytes (64-bit platform)
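
The layout above can be reproduced with a small offset calculation. This is a sketch under stated assumptions: the Field record, alignUp, and layout are hypothetical names, sizes are those of a 64-bit target, and the padding rule (pad each field to its own alignment, then pad the total to the largest alignment) is the conventional C-style rule rather than something the source spells out.

```fsharp
// Hypothetical field descriptor: size and alignment in bytes.
type Field = { Name: string; Size: int; Align: int }

// Round offset up to the next multiple of align (align must be a power of two).
let alignUp (offset: int) (align: int) : int =
    (offset + align - 1) &&& ~~~(align - 1)

// Returns (field offsets in declaration order, total size, struct alignment).
let layout (fields: Field list) =
    let place (placed, offset) (f: Field) =
        let start = alignUp offset f.Align          // insert padding if needed
        ((f.Name, start) :: placed, start + f.Size)
    let placed, endOffset = List.fold place ([], 0) fields
    let structAlign = fields |> List.fold (fun a (f: Field) -> max a f.Align) 1
    (List.rev placed, alignUp endOffset structAlign, structAlign)

// int * float on a 64-bit target: both fields are 8 bytes with 8-byte alignment.
let pairFields =
    [ { Name = "Item1"; Size = 8; Align = 8 }
      { Name = "Item2"; Size = 8; Align = 8 } ]

let offsets, total, align = layout pairFields
```

For the pair above the calculation places the fields at offsets 0 and 8 with no padding and a total of 16 bytes, which agrees with the diagram.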

Records

Records are named product types with field-order layout:

type Point = { X: float; Y: float }

Layout: Same as tuple of fields in declaration order.

Discriminated Unions

Discriminated unions use tagged representation:

type Option<'T> = None | Some of 'T

Layout:

┌──────────┬────────────────────────┐
│ Tag (i8) │ Payload (size of 'T)   │
└──────────┴────────────────────────┘

| Property | Value |
| --- | --- |
| Tag size | i8 for ≤256 variants |
| Tag values | 0, 1, 2… in declaration order |
| Payload | Size of largest variant |
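
As a rough illustration of how the tag and the largest payload combine into a total size, the sketch below computes a union size in ordinary F#. The Payload record and unionSize are hypothetical, and the padding between the 1-byte tag and the payload is an assumption based on conventional alignment rules; the source does not state where padding is placed.

```fsharp
// Hypothetical payload descriptor: size and alignment in bytes.
type Payload = { Size: int; Align: int }

// Round offset up to the next multiple of align (align must be a power of two).
let alignUp (offset: int) (align: int) : int =
    (offset + align - 1) &&& ~~~(align - 1)

// Size of a tagged union: 1-byte tag, padding up to the payload alignment
// (an assumption), then room for the largest payload, rounded to the alignment.
let unionSize (variants: Payload list) : int =
    let maxSize = variants |> List.fold (fun s (p: Payload) -> max s p.Size) 0
    let maxAlign = variants |> List.fold (fun a (p: Payload) -> max a p.Align) 1
    let payloadOffset = alignUp 1 maxAlign
    alignUp (payloadOffset + maxSize) maxAlign

// Option<int> on a 64-bit target: None carries nothing, Some carries 8 bytes.
let optionIntSize = unionSize [ { Size = 0; Align = 1 }; { Size = 8; Align = 8 } ]
```

Under these assumptions an Option of a 64-bit int occupies 16 bytes, while an Option of a single byte occupies 2.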

Single-Case Unions (Newtypes)

Single-case unions have no tag overhead:

type UserId = UserId of int

Layout: Same as wrapped type (int).

Struct Alignment

Default Alignment

Structs use natural alignment based on their largest field:

| Largest Field | Default Alignment |
| --- | --- |
| i8, u8 | 1 byte |
| i16, u16 | 2 bytes |
| i32, u32, f32 | 4 bytes |
| i64, u64, f64, pointer | 8 bytes |

Explicit Alignment

The [<Align(n)>] attribute requests specific alignment:

[<Align(64)>]
[<Struct>]
type CacheAligned = { Value: int64 }

NORMATIVE: The compiler SHALL respect alignment requests that are:

  • Powers of two
  • Greater than or equal to natural alignment
  • Less than or equal to the platform page size (typically 4096 bytes)

NORMATIVE: Alignment requests that cannot be satisfied SHALL produce a compile-time error.
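
The three conditions can be expressed as a small check. This is a sketch only: validateAlignment and its parameters are hypothetical names, and the error strings are illustrative rather than actual compiler diagnostics.

```fsharp
// True when n is a positive power of two.
let isPowerOfTwo (n: int) : bool = n > 0 && (n &&& (n - 1)) = 0

// Hypothetical validator mirroring the three conditions listed above.
let validateAlignment (requested: int) (naturalAlign: int) (pageSize: int) : Result<int, string> =
    if not (isPowerOfTwo requested) then
        Error "alignment must be a power of two"
    elif requested < naturalAlign then
        Error "alignment is below the natural alignment of the type"
    elif requested > pageSize then
        Error "alignment exceeds the platform page size"
    else
        Ok requested
```

With a natural alignment of 8 and a 4096-byte page, a request of 64 is accepted, while 48, 4, and 8192 are each rejected by a different rule.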

Alignment and SIMD

For SIMD operations, alignment affects performance significantly:

| Vector Width | Recommended Alignment |
| --- | --- |
| 128-bit (SSE, NEON) | 16 bytes |
| 256-bit (AVX2) | 32 bytes |
| 512-bit (AVX-512) | 64 bytes |

Misaligned vector loads may incur penalties or faults depending on the instruction.

Stack and Arena Allocation

NORMATIVE: Stack-allocated aligned types SHALL be placed at appropriately aligned addresses.

NORMATIVE: Arena allocators SHALL provide an alignment-aware allocation function:

Arena.allocAligned<'T> : Arena -> alignment:int -> count:int -> nativeptr<'T>
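
One way to picture alignment-aware arena allocation is a bump allocator that rounds its cursor up before handing out space. The sketch below is a model, not the Arena API: ArenaModel and bumpAligned are hypothetical, addresses are modeled as integer offsets instead of nativeptr values, and the element size is passed explicitly.

```fsharp
// Hypothetical arena model: a fixed capacity and a bump cursor, both in bytes.
type ArenaModel = { Capacity: int; mutable Used: int }

// Reserve count elements of elemSize bytes at the requested alignment.
// Returns the starting offset, or None when the arena is exhausted.
let bumpAligned (arena: ArenaModel) (alignment: int) (elemSize: int) (count: int) : int option =
    let start = (arena.Used + alignment - 1) &&& ~~~(alignment - 1)  // round cursor up
    let finish = start + elemSize * count
    if finish > arena.Capacity then
        None
    else
        arena.Used <- finish
        Some start
```

Starting from a cursor at byte 3, a request for two 8-byte elements at 16-byte alignment begins at offset 16; the 13 skipped bytes are the cost of honoring the alignment.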

Intrinsic Operations

Certain operations have direct hardware support that F# loops cannot match. CCS intrinsics provide guaranteed-efficient implementations.

Bit Manipulation Intrinsics

| Function | LLVM Intrinsic | Description |
| --- | --- | --- |
| clz : uint32 -> int | llvm.ctlz.i32 | Count leading zeros |
| clz64 : uint64 -> int | llvm.ctlz.i64 | Count leading zeros (64-bit) |
| ctz : uint32 -> int | llvm.cttz.i32 | Count trailing zeros |
| ctz64 : uint64 -> int | llvm.cttz.i64 | Count trailing zeros (64-bit) |
| popcount : uint32 -> int | llvm.ctpop.i32 | Population count |
| popcount64 : uint64 -> int | llvm.ctpop.i64 | Population count (64-bit) |
| bswap : uint32 -> uint32 | llvm.bswap.i32 | Byte swap |
| bswap64 : uint64 -> uint64 | llvm.bswap.i64 | Byte swap (64-bit) |

NORMATIVE: These functions SHALL emit the corresponding LLVM intrinsic, not loop-based implementations.

Arithmetic Intrinsics

| Function | LLVM Intrinsic | Description |
| --- | --- | --- |
| mulhi : uint64 -> uint64 -> uint64 | (platform-specific) | High 64 bits of 128-bit product |
| addCarry : uint64 -> uint64 -> uint64 -> struct(uint64 * uint64) | llvm.uadd.with.overflow | Add with carry in/out |

NORMATIVE: Multi-word arithmetic operations SHALL use carry-propagating instructions where available.
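
The intended semantics of the two arithmetic intrinsics can be described with portable reference implementations. These are sketches in ordinary F#, not the CCS intrinsics: the names addCarryRef and mulhiRef are hypothetical, and the result order (sum, carryOut) is an assumption because the source gives only the type of addCarry.

```fsharp
// Reference semantics for add-with-carry; carryIn is expected to be 0 or 1.
// Returns struct (sum, carryOut); the component order is an assumption.
let addCarryRef (a: uint64) (b: uint64) (carryIn: uint64) : struct (uint64 * uint64) =
    let s1 = a + b                           // wraps modulo 2^64
    let c1 = if s1 < a then 1UL else 0UL     // wrapped iff the sum is below an operand
    let s2 = s1 + carryIn
    let c2 = if s2 < s1 then 1UL else 0UL
    struct (s2, c1 + c2)

// High 64 bits of the 128-bit product, built from 32-bit halves.
let mulhiRef (a: uint64) (b: uint64) : uint64 =
    let aLo, aHi = a &&& 0xFFFFFFFFUL, a >>> 32
    let bLo, bHi = b &&& 0xFFFFFFFFUL, b >>> 32
    let ll = aLo * bLo
    let lh = aLo * bHi
    let hl = aHi * bLo
    let hh = aHi * bHi
    // Middle column: carry out of the low word plus the low halves of the cross terms.
    let mid = (ll >>> 32) + (lh &&& 0xFFFFFFFFUL) + (hl &&& 0xFFFFFFFFUL)
    hh + (lh >>> 32) + (hl >>> 32) + (mid >>> 32)
```

Chaining addCarryRef from the least significant word upward, feeding each carryOut into the next carryIn, gives multi-word addition; a hardware carry flag performs the same propagation without the comparisons.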

Usage

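let extractRegime (bits: uint32) =
    let shifted = bits <<< 1
    let leadingZeros = clz shifted  // Emitted as llvm.ctlz.i32 (a single instruction where supported), not a loop
    // ... regime extraction logic
    leadingZeros  // Placeholder result so the fragment is a complete function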

Fallback Behavior

On targets without hardware support for specific intrinsics:

NORMATIVE: The compiler SHALL emit efficient software fallbacks that match the semantic behavior.

NORMATIVE: The compiler MAY emit warnings when intrinsics fall back to software implementation on performance-critical targets.
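
A software fallback has to reproduce the semantics of the intrinsic exactly. The sketch below shows what such fallbacks might look like for clz and popcount in ordinary F#. The names clzFallback and popcountFallback are hypothetical, these are not the routines CCS emits, and returning 32 for a zero input is an assumption because the source does not specify that case.

```fsharp
// Count leading zeros by binary search over the bit width.
// Returns 32 for an input of 0 (an assumption; the source does not define this case).
let clzFallback (x: uint32) : int =
    if x = 0u then 32
    else
        let mutable v = x
        let mutable n = 0
        for shift in [ 16; 8; 4; 2; 1 ] do
            // If the top `shift` bits are all zero, count them and shift them out.
            if (v >>> (32 - shift)) = 0u then
                n <- n + shift
                v <- v <<< shift
        n

// Population count by repeatedly clearing the lowest set bit.
let popcountFallback (x: uint32) : int =
    let mutable v = x
    let mutable count = 0
    while v <> 0u do
        v <- v &&& (v - 1u)   // clears the lowest set bit
        count <- count + 1
    count
```

The leading-zero search takes a fixed five steps regardless of input, whereas the population count loop runs once per set bit.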

Reference Types

Arrays

Arrays use fat pointer representation:

let numbers : array<int> = [| 1; 2; 3 |]

Layout:

Header (16 bytes):
┌─────────────────┬─────────────────┐
│ ptr: *T         │ len: usize      │
└─────────────────┴─────────────────┘

Elements (contiguous):
┌─────┬─────┬─────┐
│ [0] │ [1] │ [2] │
└─────┴─────┴─────┘

Strings

Strings use UTF-8 fat pointer representation:

Layout:

┌─────────────────┬─────────────────┐
│ ptr: *u8        │ len: usize      │
└─────────────────┴─────────────────┘
16 bytes (64-bit platform)

| Property | Value |
| --- | --- |
| Encoding | UTF-8 |
| Length | Byte count (not character count) |
| Empty string | {ptr: valid, len: 0} |
| Null | Not representable |
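
Because the length field counts bytes, it differs from the character count whenever the text contains non-ASCII characters. The sketch below computes the UTF-8 byte length of a sequence of Unicode scalar values; utf8Len and byteLength are hypothetical helpers for illustration, not part of the string API.

```fsharp
// Number of UTF-8 bytes needed to encode one Unicode scalar value.
let utf8Len (codepoint: int) : int =
    if codepoint < 0x80 then 1
    elif codepoint < 0x800 then 2
    elif codepoint < 0x10000 then 3
    else 4

// Byte length of a string given as a list of scalar values.
let byteLength (codepoints: int list) : int =
    codepoints |> List.sumBy utf8Len

// "héllo": five characters, but é (U+00E9) needs two bytes.
let hello = [ 0x68; 0xE9; 0x6C; 0x6C; 0x6F ]
let helloBytes = byteLength hello
```

Here the text has five characters but a byte length of six, so code that indexes by character cannot use the length field directly.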

Parameterized Types

Option

Option types use voption (value option) semantics:

let maybe : int option = Some 42

Layout: Stack-allocated tagged union (see Discriminated Unions).

| Property | Value |
| --- | --- |
| None tag | 0 |
| Some tag | 1 |
| Heap allocation | Never |
| Null | Not representable |

Result

Result types are stack-allocated tagged unions:

let result : Result<int, string> = Ok 42

Layout: Tag + max(sizeof Ok payload, sizeof Error payload).

List

Lists use cons cell representation:

let numbers : int list = [1; 2; 3]

Layout (per cons cell):

┌─────────────────┬─────────────────────┐
│ head: 'T        │ tail: ptr<list<'T>> │
└─────────────────┴─────────────────────┘

Function Types

Direct Functions

Known call sites compile to direct calls:

let add x y = x + y
add 1 2  // Direct call, no closure

Closures

Functions capturing environment use closure representation:

let makeAdder n = fun x -> x + n

Layout:

┌─────────────────────┬─────────────────────┐
│ fn_ptr: ptr<fn>     │ env: captured values│
└─────────────────────┴─────────────────────┘
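
The closure layout can be modeled explicitly as a record holding a code reference and an environment. This is an illustration in ordinary F# of the general closure-conversion idea; the Closure type, invoke, and adderCode are hypothetical and do not describe the exact representation CCS emits.

```fsharp
// Hypothetical explicit closure: a code reference plus the captured environment.
type Closure<'Env, 'Arg, 'Res> =
    { Code: 'Env -> 'Arg -> 'Res   // stands in for fn_ptr
      Env: 'Env }                  // captured values

// Calling a closure passes the stored environment to the code.
let invoke (c: Closure<'Env, 'Arg, 'Res>) (arg: 'Arg) : 'Res =
    c.Code c.Env arg

// Top-level function that receives its captured value explicitly.
let adderCode (n: int) (x: int) : int = x + n

// Explicit counterpart of: let makeAdder n = fun x -> x + n
let makeAdder (n: int) : Closure<int, int, int> =
    { Code = adderCode; Env = n }

let fifteen = invoke (makeAdder 5) 10
```

The lambda body becomes an ordinary function, and the only per-closure state is the captured value n stored beside the code reference.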

MLIR Type Mappings

| F# Type | MLIR Type |
| --- | --- |
| unit | (none - ZST) |
| bool | i8 |
| int | index |
| int32 | i32 |
| int64 | i64 |
| float | f64 |
| float32 | f32 |
| char | i32 |
| string | !fidelity.str |
| option<'T> | !fidelity.option<T> |
| Tuple | tuple<...> |
| Record | !fidelity.record<...> |
| DU | !fidelity.union<...> |
| Function | !fidelity.fn<A, B> |

Why IL Infrastructure Is Removed from CCS

Clef Compiler Services (CCS) targets native compilation via MLIR (Multi-Level Intermediate Representation), not CLR bytecode. Although CCS originated from the F# Compiler Services (FCS) codebase, its type universe, compilation passes, and concurrency primitives are independently defined. Consequently, all IL-based infrastructure has been removed from the typed tree operations.

The Architecture Boundary

┌─────────────────────────────────────────────────┐
│  CCS (Clef Compiler Services)                   │
│  - Type checking, resolution, inference         │
│  - Produces typed tree with native types        │
│  - NO code generation, NO IL                    │
└─────────────────────────────────────────────────┘
                      │
                      ▼ Typed Tree (native types)
┌─────────────────────────────────────────────────┐
│  Alex (Code Generation)                         │
│  - PSG traversal via Zipper                     │
│  - Platform bindings for syscalls               │
│  - MLIR emission                                │
└─────────────────────────────────────────────────┘
                      │
                      ▼ MLIR
┌─────────────────────────────────────────────────┐
│  MLIR Optimization Passes                       │
│  - Loop optimization (SCF dialect)              │
│  - Arithmetic optimization (arith dialect)      │
│  - Memory optimization                          │
└─────────────────────────────────────────────────┘
                      │
                      ▼ LLVM IR → Native Binary

Why IL Operations Are Not Stubbed

The original FCS contains IL-based operations for loop optimization, null handling, and arithmetic. These were initially stubbed during the CCS fork, but stubs produce semantically wrong results:

| Stubbed Function | Wrong Behavior | Why It’s Wrong |
| --- | --- | --- |
| mkAsmExpr | Returns Coerce/identity | Should compute arithmetic |
| mkILAsmCeq, mkILAsmClt | Returns constant false | Should compare values |
| mkGetStringLength | Returns constant 0 | Should return actual length |
| mkDecr | Returns expression unchanged | Should decrement value |

Principle: “Delete, don’t stub” - Broken stubs hide defects and produce silent wrong behavior. Complete removal makes missing functionality explicit.

What Functionality Moves Downstream

| IL Infrastructure | Native Equivalent | Location |
| --- | --- | --- |
| TOp.ILAsm (arithmetic) | MLIR arith dialect ops | Alex code generation |
| TOp.ILCall (method calls) | MLIR func.call / platform bindings | Alex code generation |
| Loop optimization | MLIR SCF dialect transforms | MLIR optimization passes |
| String length/concat | Native string fat pointer ops | Alex code generation |
| Integer conversions | MLIR arith.extsi/extui/trunci | Alex type lowering |
| Null handling | Not needed - Clef has no null | See below |

Null Is Not Representable

NORMATIVE: Clef has no null values. The null keyword and null checking operations are not available.

  • mkNull, mkNullTest, mkNonNullTest, mkNonNullCond - all removed
  • Option types (voption) replace nullable references
  • Pattern matching replaces null checks

This is consistent with Clef’s safety guarantees: no null pointer dereferences are possible because null cannot be expressed.
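
In practice, an API that would return null in managed code returns an option instead, and the caller handles both cases by pattern matching. A minimal sketch, with tryFindUser and greeting as hypothetical functions:

```fsharp
// Hypothetical lookup: absence is an ordinary value, not a null reference.
let tryFindUser (id: int) : string option =
    if id = 1 then Some "ada" else None

// The match must cover None, so the missing case cannot be forgotten silently.
let greeting (id: int) : string =
    match tryFindUser id with
    | Some name -> "hello, " + name
    | None -> "unknown user"
```

Omitting the None branch produces an incomplete-match diagnostic at compile time, which is the static counterpart of a forgotten null check.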

Removed IL Infrastructure

The following were removed from TypedTreeOps.fs:

IL Instruction Stubs:

  • ILDataType type
  • AI_ldnull, AI_cgt_un, AI_clt_un, AI_add, AI_sub, AI_div_un, etc.
  • ILInstr module
  • mkAsmExpr function

Loop Optimization (vestigial - no callers):

  • DetectAndOptimizeForEachExpression
  • mkOptimizedRangeLoop, mkRangeCount
  • mkFastForLoop
  • Pattern matchers: Int32Expr, RangeInt32Step, CompiledForEachExpr, etc.
  • IntegralConst module, IntegralRange, EmptyRange, ConstCount patterns

Null Operations:

  • mkNull, mkNullTest, mkNonNullTest, mkNonNullCond

Broken Comparison Stubs:

  • mkILAsmCeq, mkILAsmClt, mkDecr, mkGetStringLength

The Key Insight

IL-based loop optimization at the typed tree level was premature optimization at the wrong layer. Native loop optimization belongs in MLIR passes where the target architecture is known and appropriate loop transformations (vectorization, unrolling, tiling) can be applied.

See Also