forge — a systems language
A hand-written compiler for a small statically-typed systems language. Started as a toy, currently chasing self-hosting on a custom RISC-V backend. No LLVM, no libc, no runtime — just text in, machine code out.
First commit. The plan: build a small statically-typed systems language, in Rust, from scratch — lexer, parser, type checker, code generator, the whole thing. No grammar generator. No LLVM. Eventually I want it to compile itself.
Working name: forge. Syntax leans on Rust and ML, but the type system will start out much simpler: monomorphic, no traits, no lifetimes. The hard parts get added later, once the easy parts work.
The first goal is unambitious: tokenise this and print the tokens.
fn add(a: int, b: int) -> int { a + b }
That’s it. No parsing, no semantics. Just turn characters into a
stream of Ident("fn"), Ident("add"),
LParen, Ident("a"), etc. If I can’t get
that working cleanly, the rest is doomed.
Lexer done. ~600 lines, hand-written state machine, no regex. Handles identifiers, integer + float literals, string literals with escape sequences, comments (line + block, nested), and 22 punctuation tokens.
The fun bit was getting source locations right. Every token carries a
Span { start: u32, end: u32 } indexing into the source.
Errors later will need this — a parser error at column 47 is much less
useful than one that says “right here ↑”.
Bench: 1.2 MB/s on a debug build, 18 MB/s release. Plenty for now.
Parser is up. Recursive descent with Pratt-style precedence for expressions. Currently parses functions, let-bindings, if-else, while loops, and arithmetic.
The AST is more verbose than I’d like — every node carries its
Span and a placeholder TypeId. The verbosity
will pay off once type errors start pointing at the right thing, but for
now it just slows me down.
One annoying decision: should if be an expression or a
statement? Rust says expression, C says statement, Python says
statement-with-ternary. I went with expression because pattern matching
will be one too and I want them to compose. We’ll see if I regret
it.
fn main() -> int { 1 + 2 } compiles, links, runs, and
exits with status 3.
It does this by emitting LLVM IR, piping it to llc, then
ld. The compiler is currently ~3000 lines and the actual
codegen step is maybe 200 of them — LLVM is doing a staggering
amount of work that I’m not.
That’s fine for now. The point of the LLVM backend is to defer codegen complexity so I can focus on the language itself. I’ll write a real backend later. Maybe.
Type inference, Hindley-Milner-style. Took two weeks longer than I budgeted. The algorithm is famously elegant in papers and famously brutal to debug in practice.
The hardest part wasn’t unification — it was building a substitution that survives generalization without leaking type variables across function boundaries. I rewrote the inference engine twice before settling on a level-based generalization scheme à la OCaml.
Now this works:
fn identity(x) { x }
fn pair(a, b) { (a, b) }
let p = pair(identity(1), identity("hello"));
identity is correctly inferred as
fn<T>(T) -> T and the two call sites instantiate
it at int and str respectively. Eight months
ago I would have called this magic.
Pattern matching + algebraic data types. Both arrived in the same week because they really only make sense together.
enum Tree {
Leaf,
Node(int, Tree, Tree),
}
fn sum(t: Tree) -> int {
match t {
Tree::Leaf => 0,
Tree::Node(v, l, r) => v + sum(l) + sum(r),
}
}
The exhaustiveness checker is naive — it expands patterns into a
decision tree and looks for uncovered leaves. Works fine on
Tree, falls over on anything with more than ~6
constructors. Need to read up on Maranget’s algorithm for the proper
version.
Also realised: with ADTs working, the compiler can finally represent its own AST in its own type system. That’s a long way from self-hosting but it’s the first time I’ve thought about it as a real goal rather than a fantasy.
Mostly bug fixes this month. The interesting one: match
on a tuple of bools was producing wrong code because the decision-tree
compiler was de-duplicating subtrees that looked identical but
had different binding contexts.
I lost almost a week to this bug because it manifested as a runtime miscompile, not a type error. Every test passed. The output binary just did the wrong thing on inputs that exercised the deduplication.
Lesson re-learned: a type system catches a lot of things, but it
cannot catch a bug in the compiler that violates its own invariants. The
fix was a property test (quickcheck-style) that compiles a random
match expression, runs it, and compares against a
tree-walking interpreter. Found three more bugs the same evening.
Made a hard decision: I’m dropping LLVM.
The reasons piled up. LLVM is slow (link times are creeping into the tens of seconds even for trivial programs), the IR has a hundred edge cases the documentation glosses over, and — most damningly — depending on it makes self-hosting a much bigger lift. If forge has to bring LLVM along to bootstrap itself, the bootstrap binary balloons from kilobytes to a third of a gigabyte.
So: I’m writing a real backend. Starting with x86-64, naive register allocation, no peephole passes. The generated code will be embarrassing. That’s fine — the point is to own the path from AST to bytes-on-disk.
I’m bracing for this to take a long time.
x86-64 backend, first end-to-end run. Generates assembly, assembles
with nasm, links with ld. Code quality is
roughly what you’d expect from a compiler written by a person who has
never written a compiler before: every variable spills to the stack,
every operation reloads, every call is preceded by a flurry of
movs that exist for no reason.
fib(30) is 11× slower than the LLVM build. Eleven. I’m
leaving it that way for now — premature optimisation is the root of all
PRs that never land. The semantics are what matter.
One nice surprise: forge’s binaries are tiny. The fib executable is 4.8 KB. The LLVM equivalent was 16 KB before stripping, 8 KB after.
Added a register allocator. Linear scan, the variant from Poletto & Sarkar 1999. Three days of reading, four days of implementing, six days of bug-hunting.
The bugs were almost all about liveness — getting the live-range computation right when you have loops, when you have ϕ-nodes (which I don’t have because I’m not in SSA, but I have a pre-SSA equivalent), when a variable is technically dead at a join point but a sibling branch still uses it. Standard fare for anyone who’s done this before. New territory for me.
fib(30) is now 2.1× slower than LLVM. Calling that a
win.
Started a RISC-V backend.
The plan is bigger than just “another target”. I want forge to be bootable — to compile down to a single ELF that runs on a bare-metal RISC-V machine (or an emulator), with no operating system, no libc, no runtime. Just the instructions you wrote, plus whatever the language needs to run them.
This is the kind of project that quietly absorbs a year. Worth it for the educational value alone.
The RISC-V ISA is genuinely a delight to target after x86. Three operand format, no flags register, no instruction prefixes. The whole base instruction set fits on two pages. I keep them taped to my desk.
It boots.
I have a forge program compiled to a RISC-V ELF, loaded
by a tiny boot stub I wrote in assembly, running on
qemu-system-riscv64 with no kernel. It prints “hello” to
the UART and halts. The whole binary is 1.4 KB.
The boot stub is 38 instructions: set up the stack pointer, zero the
BSS, jump to main. That’s it. There’s no libc to link
against — when the program needs to print a character it writes directly
to the UART memory-mapped IO address.
This unlocks something I’ve been daydreaming about: forge can be its own runtime. The standard library — once I have one — will be written in forge and compiled by forge, with no C dependency anywhere in the stack.
Spent the holidays on the standard library. Currently have:
- A minimal allocator (bump, then a free-list pass)
- Slices, strings (UTF-8, length-prefixed)
- A
Vecequivalent - File I/O (only on hosted targets — bare metal has no concept)
- Basic
print/printlnvia the UART driver on RISC-V, via awrite(2)syscall on Linux
All written in forge. The whole thing is about 1800 lines. The compiler is now ~14,000 lines of Rust.
The ratio is going to flip eventually.
The lexer compiles itself.
I rewrote forge’s lexer in forge. It produces byte-identical token streams to the Rust lexer on every test input I’ve thrown at it (1.2 GB of corpus across the Rust stdlib, the Linux kernel headers, and a bunch of forge programs). The two-language detour through Rust to bootstrap the forge lexer was worth it.
I am — for the first time — running forge code generated by the forge compiler on forge programs.
Next: parser. Then type checker. Then codegen. Each of those is a multi-week project. Self-hosting is not one milestone; it’s a sequence of them, and any one of them stalling can stall the whole thing.
Parser compiles itself.
The parser was the part I was most nervous about. Lexers are mechanical; parsers have opinions. The forge-in-forge parser has to handle precedence the same way, recover from errors the same way, build the same AST shape. Subtle drift here would compound through the rest of the pipeline.
I cheated: the forge-in-forge parser generates the AST as a stream of tags and offsets, which the forge-in-Rust compiler can read directly. So I can compile half with forge and half with Rust and check the outputs at every boundary.
This is going to be the last entry before something either dramatic happens or I get stuck for a long time. The type checker is the hardest part. If it self-hosts, the codegen will follow within weeks.
Quick aside while I’m deep in the type-checker rewrite: forge is now fast enough that I’m starting to care about its build times again.
The current type checker is dominated by string interning during name
resolution. Profiled it last weekend; 38% of the time is in
HashMap<&str, SymbolId>. Going to swap it for an
open-addressed hash table with a tighter probing scheme and see how much
that buys me.
This kind of micro-optimisation work is what got me writing the base64 post — the same pattern applies. Most of the wins came from looking at the structure of the data, not from tuning the loop. The type-checker rewrite is the same shape of problem at a different scale.
Status of self-hosting: type checker is ~70% rewritten in forge. Codegen still untouched. Best guess: the whole pipeline self-hosts by autumn. The Rust compiler will then live on as a reference implementation, but production builds will be forge-on-forge.
That’s the goal. Whether I get there in 2026 or 2027 is no longer the interesting question.