forge — a systems language — Oskar Behzadpour

Sep 15, 2024

First commit. The plan: build a small statically-typed systems language, in Rust, from scratch — lexer, parser, type checker, code generator, the whole thing. No grammar generator. No LLVM. Eventually I want it to compile itself.

Working name: forge. Syntax leans on Rust and ML, but the type system will start out much simpler: monomorphic, no traits, no lifetimes. The hard parts get added later, once the easy parts work.

The first goal is unambitious: tokenise this and print the tokens.

fn add(a: int, b: int) -> int { a + b }

That’s it. No parsing, no semantics. Just turn characters into a stream of Ident("fn"), Ident("add"), LParen, Ident("a"), etc. If I can’t get that working cleanly, the rest is doomed.

Oct 8, 2024

Lexer done. ~600 lines, hand-written state machine, no regex. Handles identifiers, integer + float literals, string literals with escape sequences, comments (line + block, nested), and 22 punctuation tokens.

The fun bit was getting source locations right. Every token carries a Span { start: u32, end: u32 } indexing into the source. Errors later will need this — a parser error at column 47 is much less useful than one that says “right here ↑”.

Bench: 1.2 MB/s on a debug build, 18 MB/s release. Plenty for now.

Nov 2, 2024

Parser is up. Recursive descent with Pratt-style precedence for expressions. Currently parses functions, let-bindings, if-else, while loops, and arithmetic.

The AST is more verbose than I’d like — every node carries its Span and a placeholder TypeId. The verbosity will pay off once type errors start pointing at the right thing, but for now it just slows me down.

One annoying decision: should if be an expression or a statement? Rust says expression, C says statement, Python says statement-with-ternary. I went with expression because pattern matching will be one too and I want them to compose. We’ll see if I regret it.

Nov 28, 2024

fn main() -> int { 1 + 2 } compiles, links, runs, and exits with status 3.

It does this by emitting LLVM IR, piping it to llc, then ld. The compiler is currently ~3000 lines and the actual codegen step is maybe 200 of them — LLVM is doing a staggering amount of work that I’m not.

That’s fine for now. The point of the LLVM backend is to defer codegen complexity so I can focus on the language itself. I’ll write a real backend later. Maybe.

Dec 15, 2024

Type inference, Hindley-Milner-style. Took two weeks longer than I budgeted. The algorithm is famously elegant in papers and famously brutal to debug in practice.

The hardest part wasn’t unification — it was building a substitution that survives generalization without leaking type variables across function boundaries. I rewrote the inference engine twice before settling on a level-based generalization scheme à la OCaml.

Now this works:

fn identity(x) { x }
fn pair(a, b) { (a, b) }

let p = pair(identity(1), identity("hello"));

identity is correctly inferred as fn<T>(T) -> T and the two call sites instantiate it at int and str respectively. Eight months ago I would have called this magic.

Jan 22, 2025

Pattern matching + algebraic data types. Both arrived in the same week because they really only make sense together.

enum Tree {
    Leaf,
    Node(int, Tree, Tree),
}

fn sum(t: Tree) -> int {
    match t {
        Tree::Leaf => 0,
        Tree::Node(v, l, r) => v + sum(l) + sum(r),
    }
}

The exhaustiveness checker is naive — it expands patterns into a decision tree and looks for uncovered leaves. Works fine on Tree, falls over on anything with more than ~6 constructors. Need to read up on Maranget’s algorithm for the proper version.

Also realised: with ADTs working, the compiler can finally represent its own AST in its own type system. That’s a long way from self-hosting but it’s the first time I’ve thought about it as a real goal rather than a fantasy.

Feb 10, 2025

Mostly bug fixes this month. The interesting one: match on a tuple of bools was producing wrong code because the decision-tree compiler was de-duplicating subtrees that looked identical but had different binding contexts.

I lost almost a week to this bug because it manifested as a runtime miscompile, not a type error. Every test passed. The output binary just did the wrong thing on inputs that exercised the deduplication.

Lesson re-learned: a type system catches a lot of things, but it cannot catch a bug in the compiler that violates its own invariants. The fix was a property test (quickcheck-style) that compiles a random match expression, runs it, and compares against a tree-walking interpreter. Found three more bugs the same evening.

Apr 12, 2025

Understanding Recursion in Functional Programming

How recursive thinking shapes clean, elegant solutions — and when to reach for it.

Apr 12, 2025 · 7 min read ·

Programming Functional Theory

May 3, 2025

Made a hard decision: I’m dropping LLVM.

The reasons piled up. LLVM is slow (link times are creeping into the tens of seconds even for trivial programs), the IR has a hundred edge cases the documentation glosses over, and — most damningly — depending on it makes self-hosting a much bigger lift. If forge has to bring LLVM along to bootstrap itself, the bootstrap binary balloons from kilobytes to a third of a gigabyte.

So: I’m writing a real backend. Starting with x86-64, naive register allocation, no peephole passes. The generated code will be embarrassing. That’s fine — the point is to own the path from AST to bytes-on-disk.

I’m bracing for this to take a long time.

Jun 30, 2025

x86-64 backend, first end-to-end run. Generates assembly, assembles with nasm, links with ld. Code quality is roughly what you’d expect from a compiler written by a person who has never written a compiler before: every variable spills to the stack, every operation reloads, every call is preceded by a flurry of movs that exist for no reason.

fib(30) is 11× slower than the LLVM build. Eleven. I’m leaving it that way for now — premature optimisation is the root of all PRs that never land. The semantics are what matter.

One nice surprise: forge’s binaries are tiny. The fib executable is 4.8 KB. The LLVM equivalent was 16 KB before stripping, 8 KB after.

Jul 20, 2025

Added a register allocator. Linear scan, the variant from Poletto & Sarkar 1999. Three days of reading, four days of implementing, six days of bug-hunting.

The bugs were almost all about liveness — getting the live-range computation right when you have loops, when you have ϕ-nodes (which I don’t have because I’m not in SSA, but I have a pre-SSA equivalent), when a variable is technically dead at a join point but a sibling branch still uses it. Standard fare for anyone who’s done this before. New territory for me.

fib(30) is now 2.1× slower than LLVM. Calling that a win.

Sep 14, 2025

Started a RISC-V backend.

The plan is bigger than just “another target”. I want forge to be bootable — to compile down to a single ELF that runs on a bare-metal RISC-V machine (or an emulator), with no operating system, no libc, no runtime. Just the instructions you wrote, plus whatever the language needs to run them.

This is the kind of project that quietly absorbs a year. Worth it for the educational value alone.

The RISC-V ISA is genuinely a delight to target after x86. Three operand format, no flags register, no instruction prefixes. The whole base instruction set fits on two pages. I keep them taped to my desk.

Nov 8, 2025

It boots.

I have a forge program compiled to a RISC-V ELF, loaded by a tiny boot stub I wrote in assembly, running on qemu-system-riscv64 with no kernel. It prints “hello” to the UART and halts. The whole binary is 1.4 KB.

The boot stub is 38 instructions: set up the stack pointer, zero the BSS, jump to main. That’s it. There’s no libc to link against — when the program needs to print a character it writes directly to the UART memory-mapped IO address.

This unlocks something I’ve been daydreaming about: forge can be its own runtime. The standard library — once I have one — will be written in forge and compiled by forge, with no C dependency anywhere in the stack.

Dec 22, 2025

Spent the holidays on the standard library. Currently have:

A minimal allocator (bump, then a free-list pass)
Slices, strings (UTF-8, length-prefixed)
A Vec equivalent
File I/O (only on hosted targets — bare metal has no concept)
Basic print/println via the UART driver on RISC-V, via a write(2) syscall on Linux

All written in forge. The whole thing is about 1800 lines. The compiler is now ~14,000 lines of Rust.

The ratio is going to flip eventually.

Feb 14, 2026

The lexer compiles itself.

I rewrote forge’s lexer in forge. It produces byte-identical token streams to the Rust lexer on every test input I’ve thrown at it (1.2 GB of corpus across the Rust stdlib, the Linux kernel headers, and a bunch of forge programs). The two-language detour through Rust to bootstrap the forge lexer was worth it.

I am — for the first time — running forge code generated by the forge compiler on forge programs.

Next: parser. Then type checker. Then codegen. Each of those is a multi-week project. Self-hosting is not one milestone; it’s a sequence of them, and any one of them stalling can stall the whole thing.

Apr 29, 2026

Parser compiles itself.

The parser was the part I was most nervous about. Lexers are mechanical; parsers have opinions. The forge-in-forge parser has to handle precedence the same way, recover from errors the same way, build the same AST shape. Subtle drift here would compound through the rest of the pipeline.

I cheated: the forge-in-forge parser generates the AST as a stream of tags and offsets, which the forge-in-Rust compiler can read directly. So I can compile half with forge and half with Rust and check the outputs at every boundary.

This is going to be the last entry before something either dramatic happens or I get stuck for a long time. The type checker is the hardest part. If it self-hosts, the codegen will follow within weeks.

May 8, 2026

Decoding Base64 at 4 GB/s with SIMD

How vectorized lookup and a couple of bit-packing tricks turn a memory-bound loop into a memcpy-fast monster.

May 11, 2026 · 14 min read ·

Systems C Performance SIMD

May 11, 2026

Quick aside while I’m deep in the type-checker rewrite: forge is now fast enough that I’m starting to care about its build times again.

The current type checker is dominated by string interning during name resolution. Profiled it last weekend; 38% of the time is in HashMap<&str, SymbolId>. Going to swap it for an open-addressed hash table with a tighter probing scheme and see how much that buys me.

This kind of micro-optimisation work is what got me writing the base64 post — the same pattern applies. Most of the wins came from looking at the structure of the data, not from tuning the loop. The type-checker rewrite is the same shape of problem at a different scale.

Status of self-hosting: type checker is ~70% rewritten in forge. Codegen still untouched. Best guess: the whole pipeline self-hosts by autumn. The Rust compiler will then live on as a reference implementation, but production builds will be forge-on-forge.

That’s the goal. Whether I get there in 2026 or 2027 is no longer the interesting question.