Lexers, parsers, intermediate representations, optimisation, code generation. The Dragon Book. LLVM. JIT. From PCC to GCC to Clang — the long apprenticeship of translating one language into another.
A compiler is a program that translates a program. The input is source code in one language; the output is an executable program — usually in machine code, sometimes in another high-level language, sometimes in bytecode for a virtual machine.
The translation is not literal. A good compiler restructures the program along the way: laying out memory, allocating registers, eliminating dead code, unrolling loops, fusing operations into vector instructions. The output is faster than a literal transcription would be — sometimes orders of magnitude faster.
Compilers are the unsung infrastructure of computing. Every operating system, every browser, every game engine and database, was produced by a compiler. The history of programming languages is, in large part, the history of compilers learning new tricks.
The classical pipeline, the one taught in every undergraduate course:
The lexer (or scanner) reads characters and emits tokens — keywords, identifiers, literals, punctuation. The parser reads tokens and builds an abstract syntax tree. The semantic analyser resolves names, checks types, and decorates the tree with attributes. The intermediate representation (IR) is a language-neutral and target-neutral form on which optimisation is performed. The code generator walks the optimised IR and emits assembly or machine code for a specific architecture.
Real compilers add stages: macro expansion, monomorphisation, register allocation, instruction scheduling, link-time optimisation. The five-phase model is a teaching abstraction. The substance is in the details of each stage.
The first compiler — by anyone's loose definition of the word — was Grace Hopper's A-0 System for the UNIVAC I, written in 1951–52. A-0 was really a linker: it concatenated assembly subroutines from a tape library according to a pseudocode-driven recipe. Translation in the modern sense came later.
Hopper's larger contribution was FLOW-MATIC (1955), a business-oriented language whose syntax was deliberately English-like. It directly inspired COBOL, of which Hopper was a principal architect. Her insistence that programmers should not have to think in machine code is the founding ideology of high-level languages.
Hopper rose to the rank of Rear Admiral and remained an active proselytiser into the 1980s. Her famous nanosecond — a foot of wire, the distance light travels in a billionth of a second — was the most memorable visual aid in the history of computing.
The first real compiler — translating a high-level language into machine code competitive with hand-written assembly — was FORTRAN I, released by IBM in 1957. The team, led by John Backus, spent eighteen man-years on the project. It nearly killed them.
Backus's bet was that scientists would tolerate a high-level language only if the generated code was fast. So the FORTRAN I compiler was built around a then-radical optimiser: flow analysis, register allocation, basic loop optimisation. The result ran within ten percent of hand-tuned assembly on the IBM 704 — performance that converted skeptics overnight.
FORTRAN was an instant hit and made the case for compiler-based language design irreversibly. Backus would later receive the 1977 Turing Award for it, and (separately) for inventing BNF, the standard notation for context-free grammars.
The lexer reduces a stream of characters to a stream of tokens. if (x <= 10) becomes [IF, LPAREN, IDENT("x"), LE, INTLIT(10), RPAREN]. The lexer also discards whitespace and comments and tracks line numbers for error reporting.
The theory is regular languages. Each token class is defined by a regular expression; the union of these regexes is compiled to a deterministic finite automaton (DFA). The DFA reads the input one character at a time, transitioning between states; on accepting states, it emits a token.
lex (Mike Lesk, 1975) and its modern descendant flex automate this. You write the regexes; the tool emits a C function. The technique is so well understood that lexers are now hand-written mainly for performance, not correctness — Clang's lexer is hand-written, and is among the fastest in any major compiler.
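The same idea, sketched in Python with the standard re module standing in for a generated DFA. The token classes mirror the if (x <= 10) example above and are illustrative only; a real lexer would also track line and column for diagnostics.

```python
import re

# One named group per token class; order matters (keywords before identifiers).
TOKEN_SPEC = [
    ("IF",     r"\bif\b"),
    ("INTLIT", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("LE",     r"<="),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP",   r"[ \t\n]+"),   # whitespace: matched, then discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))

def lex(src):
    """Yield (kind, text) pairs; reject anything the token spec does not cover."""
    pos = 0
    while pos < len(src):
        m = MASTER.match(src, pos)
        if not m:
            raise SyntaxError(f"unexpected character {src[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":
            yield (m.lastgroup, m.group())
        pos = m.end()

print(list(lex("if (x <= 10)")))
# [('IF', 'if'), ('LPAREN', '('), ('IDENT', 'x'), ('LE', '<='), ('INTLIT', '10'), ('RPAREN', ')')]
```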
A grammar is a finite set of rewriting rules. Backus–Naur Form — invented for ALGOL 60 by John Backus and Peter Naur — is the standard notation:
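A representative fragment (the textbook expression grammar, not drawn from any real language's standard):

```
<expr>   ::= <expr> "+" <term>   | <term>
<term>   ::= <term> "*" <factor> | <factor>
<factor> ::= "(" <expr> ")"      | <number>
```

Left-recursive rules like these encode left associativity; a top-down parser has to rewrite them first, while a bottom-up parser consumes them directly.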
The grammar generates a language: the set of all strings derivable by repeated rewriting from the start symbol. Parsing inverts this: given a string, find the (unique, ideally) derivation tree.
Context-free grammars are strictly more powerful than regular expressions but strictly weaker than context-sensitive grammars. They cover virtually every programming language's syntax. General CFG parsing runs in O(n³); the LL and LR subclasses, to which almost every real language's grammar belongs, parse in linear time.
Two parsing strategies dominate. Top-down (LL): start from the start symbol, predict productions one at a time. Bottom-up (LR): start from the input, reduce by productions until the start symbol is reached. Both run in linear time on suitable grammars.
LR(1) parsing — invented by Donald Knuth in 1965 — handles essentially all useful programming languages. The tables are large; LALR(1) (Frank DeRemer, 1969) compresses them at the cost of slightly more grammar restrictions, and is what yacc (Steve Johnson, 1975) and its descendant bison generate.
Modern compilers often use hand-written recursive descent (a top-down style) for control over error messages. GCC replaced its yacc-generated C++ parser with recursive descent in 2004 and its C parser soon after; Clang has used recursive descent from the start. The gain in error recovery and IDE-friendliness usually outweighs the loss of grammar formality.
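A hand-written recursive-descent parser for the expression grammar sketched earlier, in Python: one parsing function per nonterminal, with the left-recursive rules turned into loops. The token and node shapes are illustrative; real parsers carry source locations for error reporting.

```python
def parse_expr(tokens, pos=0):
    node, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "+":
        rhs, pos = parse_term(tokens, pos + 1)
        node = ("+", node, rhs)            # left-associative by construction
    return node, pos

def parse_term(tokens, pos):
    node, pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "*":
        rhs, pos = parse_factor(tokens, pos + 1)
        node = ("*", node, rhs)
    return node, pos

def parse_factor(tokens, pos):
    if tokens[pos] == "(":
        node, pos = parse_expr(tokens, pos + 1)
        assert tokens[pos] == ")", f"expected ')' at token {pos}"
        return node, pos + 1
    return ("num", int(tokens[pos])), pos + 1

ast, _ = parse_expr("2 + 3 * 4".split())
print(ast)   # ('+', ('num', 2), ('*', ('num', 3), ('num', 4)))
```

Precedence falls out of the grammar's shape: expr handles +, term handles *, factor handles parentheses and literals.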
The parser's output is rarely the parse tree — it is the abstract syntax tree, a stripped-down version with the irrelevant punctuation discarded. 2 + 3 * 4 becomes a tree with + at the root, 2 as left child, and a * subtree on the right. Parentheses do not appear; precedence is captured by tree shape.
The AST is the substrate for everything downstream. Type checkers walk it; optimisers transform it; code generators emit instructions from it. Most modern compilers represent ASTs as object graphs in their implementation language: a class hierarchy in C++ or Java, an algebraic data type in OCaml, Haskell, or Rust.
The visitor pattern — a class with one method per AST node type — is the standard way to walk an AST in object-oriented compilers. Functional compilers use pattern matching, which is the same idea expressed more concisely.
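Both styles, sketched in Python: dataclasses as the node hierarchy, structural pattern matching as the concise equivalent of a visitor. The node shapes are illustrative, with evaluation standing in for whatever walk a real compiler performs.

```python
from dataclasses import dataclass

@dataclass
class Num:
    value: int

@dataclass
class BinOp:
    op: str
    left: object
    right: object

def evaluate(node):
    # One case per node type: the pattern-matching analogue of a visitor's visit_* methods.
    match node:
        case Num(value):
            return value
        case BinOp("+", left, right):
            return evaluate(left) + evaluate(right)
        case BinOp("*", left, right):
            return evaluate(left) * evaluate(right)
    raise TypeError(f"unknown node: {node!r}")

# 2 + 3 * 4, with precedence captured by tree shape rather than parentheses
tree = BinOp("+", Num(2), BinOp("*", Num(3), Num(4)))
print(evaluate(tree))   # 14
```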
The semantic analyser checks that the program is well-typed: that x + y doesn't add an integer to a string, that f(a, b) calls a function with arguments of the right types, that the value returned from a function matches its declared return type.
For statically-typed languages with explicit annotations (C, Java), type checking is a tree walk. For languages with type inference (ML, Haskell, Rust, Swift), the analyser must derive types that the programmer left implicit. The classical algorithm is Hindley–Milner, published independently by Roger Hindley (1969) and Robin Milner (1978); Milner's 1991 Turing Award cited ML and its polymorphic type inference among his achievements.
Hindley–Milner gives a most-general type to every well-typed expression in time linear in program size for typical inputs. Most modern type-inferencing languages descend from it. Polymorphism, generics, and traits or type-classes are all extensions of the Hindley–Milner core.
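The engine inside Hindley–Milner is unification: solving type equations by binding type variables. A minimal sketch of just that step in Python, with lowercase names as type variables and capitalised names as base types; generalisation, let-polymorphism, and the occurs check are all omitted.

```python
# Types: "a", "b" are type variables; "Int" is a base type; ("->", arg, res) is a function type.
def resolve(t, subst):
    """Follow substitution chains until t is no longer a bound type variable."""
    while isinstance(t, str) and t.islower() and t in subst:
        t = subst[t]
    return t

def unify(t1, t2, subst):
    """Extend subst so that t1 and t2 become equal, or fail."""
    t1, t2 = resolve(t1, subst), resolve(t2, subst)
    if t1 == t2:
        return subst
    if isinstance(t1, str) and t1.islower():      # t1 is a free type variable
        return {**subst, t1: t2}
    if isinstance(t2, str) and t2.islower():
        return {**subst, t2: t1}
    if isinstance(t1, tuple) and isinstance(t2, tuple) and len(t1) == len(t2):
        for a, b in zip(t1, t2):
            subst = unify(a, b, subst)
        return subst
    raise TypeError(f"cannot unify {t1} with {t2}")

# Applying f : a -> a to a literal of type Int forces the result type b to be Int.
subst = unify(("->", "a", "a"), ("->", "Int", "b"), {})
print(resolve("b", subst))   # Int
```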
Optimisation works best on a representation that is close to machine code but free of target-specific details. Three-address code — every instruction has at most three operands — is the canonical pedagogical IR.
SSA — Static Single Assignment — augments three-address code with the rule that every variable is assigned exactly once. Where the same variable would have been re-assigned, a fresh name is introduced; control-flow merges use phi-functions to select the right name. SSA was developed at IBM by Cytron, Ferrante, Rosen, Wegman, and Zadeck in 1991.
SSA makes data-flow analyses simpler and more powerful. Almost every modern industrial compiler — GCC, LLVM, V8, HotSpot, Cranelift — uses SSA as its primary IR.
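What the rule looks like on a small example, in schematic three-address form (the syntax is illustrative, not LLVM's or GCC's):

```
# source
x = a
if p:
    x = x + 1
use(x)

# after conversion to SSA: every name is assigned exactly once, and the
# join point selects between the two definitions that can reach it
x1 = a
if not p goto L2
x2 = x1 + 1
L2: x3 = phi(x1, x2)
use(x3)
```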
An optimiser is a pipeline of passes, each a small transformation that preserves meaning while improving some metric (usually runtime or code size).
Constant folding: evaluate 2 + 3 at compile time. Constant propagation: replace uses of a variable that holds a constant. Dead code elimination: remove computations whose results are never used. Common subexpression elimination: if a*b is computed twice with the same operands, compute it once.
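A sketch of the first of these, constant folding, as a pass over the toy AST nodes from earlier. A production pass would also worry about overflow, floating-point semantics, and side effects.

```python
from dataclasses import dataclass

@dataclass
class Num:
    value: int

@dataclass
class BinOp:
    op: str
    left: object
    right: object

OPS = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}

def fold(node):
    """Rebuild the tree bottom-up, folding operators whose operands are both literals."""
    match node:
        case BinOp(op, left, right):
            left, right = fold(left), fold(right)
            if isinstance(left, Num) and isinstance(right, Num):
                return Num(OPS[op](left.value, right.value))
            return BinOp(op, left, right)
        case _:
            return node                     # literals and variables pass through unchanged

# (2 + 3) * x  becomes  5 * x   (the variable "x" is just a placeholder leaf here)
tree = BinOp("*", BinOp("+", Num(2), Num(3)), "x")
print(fold(tree))   # BinOp(op='*', left=Num(value=5), right='x')
```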
Loop-invariant code motion: hoist computations out of loops if they don't depend on the loop variable. Strength reduction: replace expensive operations with cheap ones (x*2 → x+x or x<<1). Loop unrolling, function inlining, tail-call elimination, vectorisation: each is its own pass, each compounds with the others.
The art is in the ordering. Inlining exposes more opportunity for constant propagation, which exposes dead code, which exposes more inlining. Compilers run their pass pipelines twice or thrice for this reason.
The back end emits target machine code. Three classical sub-problems: instruction selection, instruction scheduling, register allocation.
Instruction selection: rewrite IR operations as target instructions. The naive approach gives one instruction per IR node; the good approach matches whole subtrees against a hand-written or generated table of target idioms (e.g. on x86, an array access can become a single mov rax, [rbx + 8*rcx]).
Instruction scheduling reorders independent instructions to keep the pipeline full and avoid stalls. Register allocation assigns each virtual register to a physical register or, when registers run out, to a stack slot — typically by graph colouring (Chaitin et al., 1981) or by linear scan (Poletto and Sarkar, 1999).
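A minimal linear scan in the spirit of Poletto and Sarkar, sketched in Python: walk live intervals in order of start point, expire the ones that have ended, and spill by furthest end point when the register file is full. Interval construction, move insertion, and the many refinements production allocators rely on are omitted.

```python
def linear_scan(intervals, num_regs):
    """intervals: (name, start, end) live ranges, end exclusive.
    Returns a mapping from name to a register number or 'spill'."""
    allocation, active, free = {}, [], list(range(num_regs))
    for name, start, end in sorted(intervals, key=lambda iv: iv[1]):
        # Expire intervals that ended before this one starts, releasing their registers.
        for old in [iv for iv in active if iv[2] <= start]:
            active.remove(old)
            free.append(allocation[old[0]])
        if free:
            allocation[name] = free.pop()
            active.append((name, start, end))
        else:
            # Spill heuristic: evict whichever live interval ends furthest in the future.
            victim = max(active + [(name, start, end)], key=lambda iv: iv[2])
            if victim[0] != name:
                active.remove(victim)
                allocation[name] = allocation[victim[0]]
                active.append((name, start, end))
            allocation[victim[0]] = "spill"
    return allocation

# Three values competing for two registers: 'a' lives longest, so it gets spilled.
print(linear_scan([("a", 0, 8), ("b", 1, 3), ("c", 2, 6)], num_regs=2))
# {'a': 'spill', 'b': 0, 'c': 1}
```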
Register allocation is NP-hard in the general case; production allocators rely on heuristics that work well in practice.
Compilers: Principles, Techniques, and Tools — by Alfred Aho, Monica Lam, Ravi Sethi, and Jeffrey Ullman — has been the canonical compiler textbook for nearly four decades. The first edition appeared in 1986; the second (with Lam added) in 2006. The cover features a knight battling a dragon labeled "complexity of compiler design," which gave the book its nickname.
The Dragon Book pioneered the canonical pipeline view of compilation and codified the names by which we still know the parts: lexer, parser, syntax-directed translation, type checking, code generation, optimisation. Subsequent textbooks — Appel's Modern Compiler Implementation, Cooper and Torczon's Engineering a Compiler, Muchnick's Advanced Compiler Design and Implementation — extend or contest the framework, but every one of them frames itself in relation to the Dragon Book.
Aho and Ullman shared the 2020 Turing Award.
The first C compiler was written by Dennis Ritchie at Bell Labs in 1972 to bootstrap Unix. It targeted only the PDP-11. Stephen C. Johnson's Portable C Compiler (PCC, 1978) was the next leap: a retargetable compiler whose front-end and back-end were cleanly separated, and which could be ported to a new architecture by writing a few hundred lines of machine description.
PCC was the standard Unix C compiler through the 1980s. It shipped with every Unix from AT&T and the Berkeley distributions. Its design — front-end emits a tree-shaped IR, back-end pattern-matches against target idioms — set the template for two generations of compilers.
By 1990 PCC had been overtaken by GCC for portability and code quality, and by vendor compilers (Sun, IBM, DEC) for native performance. But the architectural ideas survived: every modern retargetable compiler is, structurally, PCC's grandchild.
The GNU Compiler Collection began as Richard Stallman's GCC 1.0 (1987), a free C compiler for the GNU project. Within a decade it had become the de-facto compiler for Linux and most BSDs and a serious cross-platform engineering project — multiple front ends (C, C++, Fortran, Ada, Objective-C, later D and Go), multiple back ends (every architecture under the sun, from Intel x86 to obscure embedded controllers).
The EGCS fork (1997) — pushed by Cygnus Solutions and others frustrated with the slow pace of mainline GCC — accelerated development and was eventually merged back as GCC 2.95 (1999). The episode was a milestone in open-source governance.
GCC's internal representation is layered. The original RTL (Register Transfer Language) survives as the low-level IR; GCC 4.0 (2005) added GIMPLE, a tree-based SSA form on which most optimisation now happens, with RTL retained for the back end. By 2024, GCC remained among the most widely deployed compilers on the planet.
LLVM began in 2000 as Chris Lattner's master's project at the University of Illinois under Vikram Adve. The animating idea was a clean, well-specified intermediate representation — LLVM IR — that could be the lingua franca of multiple front-ends and back-ends, with optimisation as a separate, reusable library.
Apple hired Lattner in 2005 and bet the company's compiler stack on the project. Clang, the C/C++/Objective-C front-end for LLVM, became the default compiler in Xcode 4 (2011). By 2015, Apple had retired GCC entirely from macOS in favour of Clang. Google, Sony, ARM, AMD, NVIDIA, and a long tail of others adopted LLVM as their backend of choice.
LLVM's modular design made it the platform for a generation of new languages: Rust, Swift (also Lattner's creation), Julia, Zig. Without LLVM, none of these would have matured as quickly or arrived as polished. It is the most influential compiler infrastructure since GCC, and arguably more influential than GCC itself.
Clang is the LLVM project's C, C++, and Objective-C front-end. Its design priorities — a hand-written recursive-descent parser, a careful AST that preserves source-level information, fast compilation, and extraordinary error messages — made it a darling of tooling developers as much as of end users.
The error-message improvement deserves particular credit. Where GCC would say error: expected ';', Clang would point at the exact column with a fix-it hint and a caret. The bar moved permanently. By 2015, GCC had spent years catching up on diagnostics quality, partly in response.
Clang's library-first design enabled libclang, used by IDEs (Xcode, CLion, VS Code with clangd) for syntax highlighting, refactoring, and code completion. The language server protocol ecosystem around clangd is now the standard way to bring C++ tooling to any editor.
JIT compilers translate code at runtime, usually starting from a bytecode interpreter and promoting hot methods to native code. The technique was anticipated by Smalltalk in the 1980s but reached maturity with Self (David Ungar and Craig Chambers, 1987–95), whose dynamic-dispatch optimisations directly inspired the Java HotSpot JVM.
HotSpot (1999) brought JIT into the mainstream. It was followed by V8 (Google's JavaScript engine, 2008), which made browser JavaScript 10–100× faster overnight; JavaScriptCore in WebKit; SpiderMonkey in Firefox; PyPy for Python. .NET's CLR has been JIT-based since its 2002 launch.
JIT trades startup latency for steady-state speed, and works best on programs that run long enough to amortise the compilation cost. The tiered compilation trick — interpret first, JIT-compile hot methods at progressively higher optimisation levels — is now standard in every major managed runtime.
JavaScript was untyped, dynamic, and reputedly hopeless to compile efficiently. V8's 2008 launch in Chrome — led by Lars Bak, fresh from HotSpot and Self — disproved that. Within a few releases, hot JavaScript code ran within a small factor of equivalent C on favourable benchmarks.
The trick was aggressive type speculation. The compiler observes the actual types passed at runtime, generates code specialised to those types, and inserts cheap guards that deoptimise back to the interpreter if the assumptions fail. Hidden classes (an idea borrowed from Self) make object-property lookups predictable.
The other engines followed: SpiderMonkey's IonMonkey, JavaScriptCore's FTL/B3, and a steady arms race. The discipline of writing fast JavaScript today — monomorphic call sites, stable object shapes — is essentially the discipline of staying on the JIT's fast path.
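A toy rendering of the speculate, guard, deoptimise loop in Python. Real engines emit machine code specialised against hidden classes and use multi-entry inline caches; here int.__add__ and str.__add__ merely stand in for type-specialised code.

```python
def generic_add(a, b):
    # The interpreter's fully general path: dynamic dispatch on every operation.
    return a + b

def speculate(fast_paths):
    """One-entry inline cache: specialise on the first observed argument types,
    guard on them, and fall back to the generic path when the guard fails."""
    state = {"seen": None}

    def call(a, b):
        if state["seen"] is None:
            state["seen"] = (type(a), type(b))        # profile: record observed types
        ta, tb = state["seen"]
        if type(a) is ta and type(b) is tb:           # cheap guard on the recorded types
            return fast_paths.get((ta, tb), generic_add)(a, b)
        state["seen"] = None                          # guard failed: deoptimise, reset the cache
        return generic_add(a, b)
    return call

add = speculate({(int, int): int.__add__, (str, str): str.__add__})
print(add(1, 2))       # records (int, int); subsequent int calls take the specialised path
print(add("a", "b"))   # guard fails, falls back to the generic path and resets the cache
```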
WebAssembly — Wasm — became a W3C standard in December 2019, after development at Mozilla, Google, Microsoft, and Apple. It is a portable binary instruction format with strict structural validation and a precisely specified semantics. Wasm code can be compiled to native code by the host browser and runs at near-native speed.
Wasm's compile pipeline is a story in itself. C, C++, and Rust compile to Wasm via LLVM's Wasm back-end; the browser parses the binary, validates it, and re-compiles it to native — twice-compiled code that nonetheless runs faster than equivalent JavaScript.
The implications go beyond the web. Wasm runtimes (Wasmtime, Wasmer, WAMR) run Wasm binaries server-side as a sandboxed execution format — a successor, in some sense, to the JVM and CLR, with no language ownership. Edge platforms (Cloudflare Workers, Fastly Compute@Edge) make Wasm their native execution model.
The deepest theory in modern loop optimisation. The polyhedral model represents a loop nest as an integer polyhedron (the iteration domain), maps statements to points in it (the schedule), and treats loop transformations — interchange, tiling, fusion, parallelisation — as affine transformations of the polyhedron.
The mathematics goes back to Paul Feautrier's 1991 work and was developed by Alain Darte, Cédric Bastoul, Albert Cohen, and others through the 1990s and 2000s. Production implementations include Polly (Tobias Grosser, on top of LLVM) and isl (Sven Verdoolaege).
The model handles the affine fragment of a program — loop bounds and array indices that are linear in loop variables — exactly. Outside that fragment, fallback heuristics apply. The polyhedral model is what makes today's high-performance computing compilers extract parallelism that hand-tuning rarely catches.
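The flavour of one such transformation, loop tiling, sketched in Python. The polyhedral machinery is what proves the rewrite legal from the dependence structure and picks the tile size; both are simply assumed here.

```python
N, TILE = 1024, 64

# Original loop nest: the iteration domain is the square {(i, j) : 0 <= i, j < N}.
def scale(a, b):
    for i in range(N):
        for j in range(N):
            b[i][j] = a[i][j] * 0.5

# Tiled loop nest: the same points, visited tile by tile for cache locality.
# The transformation is an affine change of schedule; the loop body is untouched.
def scale_tiled(a, b):
    for ii in range(0, N, TILE):
        for jj in range(0, N, TILE):
            for i in range(ii, min(ii + TILE, N)):
                for j in range(jj, min(jj + TILE, N)):
                    b[i][j] = a[i][j] * 0.5
```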
Modern CPUs have SIMD units — SSE, AVX2, AVX-512, ARM Neon, SVE — that operate on 4, 8, 16, or 32 lanes of data at once. The compiler's job is to recognise loops that can be rewritten to use these units, and to do it without programmer intervention.
The classical algorithm is loop vectorisation: take a loop iterating n times over scalars, transform it into a loop iterating n/k times over vectors of width k, plus a scalar epilogue for the remainder. Dependence analysis decides whether the rewrite preserves meaning.
LLVM's LoopVectorize pass and GCC's tree-vect pass do this for non-trivial loops. SLP vectorisation (Superword-Level Parallelism, Larsen and Amarasinghe 2000) catches straight-line code that vectorises without a loop. Both passes have improved year over year; vectorisation now extracts SIMD speedups out of code its authors never thought about.
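The shape of the rewrite, sketched in Python with a vector width of 4. In real output the wide body becomes a handful of SIMD instructions; dependence analysis must first prove the iterations independent.

```python
K = 4   # vector width: e.g. four 32-bit lanes of a 128-bit SSE or Neon register

# Scalar loop: n iterations, one element each.
def saxpy_scalar(n, alpha, x, y):
    for i in range(n):
        y[i] = alpha * x[i] + y[i]

# Vectorised loop: n // K iterations of a K-wide body, plus a scalar epilogue
# for the n % K leftover elements.
def saxpy_vectorised(n, alpha, x, y):
    main = n - n % K
    for i in range(0, main, K):
        for lane in range(K):              # one K-wide SIMD operation in real output
            y[i + lane] = alpha * x[i + lane] + y[i + lane]
    for i in range(main, n):               # epilogue
        y[i] = alpha * x[i] + y[i]
```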
NVIDIA's CUDA (2007) brought C-like programming to the GPU. Behind it sits nvcc, a compiler that splits a CUDA source file into host code (compiled by the system C++ compiler) and device code (compiled to PTX, NVIDIA's virtual ISA, then JIT-compiled to a specific GPU's binary at runtime).
PTX is to GPUs what LLVM IR is to CPUs: a target-neutral representation that decouples the language frontend from the rapidly-evolving hardware. OpenCL, HIP (AMD), SYCL, and oneAPI (Intel) chase the same model with different vendor flavours.
The compiler challenges are unusual. GPU programs run on thousands of threads in lockstep within a warp; divergent branches serialise. Memory has many levels (registers, shared memory, L1, L2, global, host) with vast bandwidth differences. Kernels are launched at runtime with shape parameters known only then. Modern GPU compilers do aggressive inlining, loop unrolling, and shared-memory layout optimisation to extract performance from this regime.
The 2010s machine-learning explosion produced a wave of tensor-computation frameworks (TensorFlow, PyTorch, JAX), each with its own compiler. MLIR — Multi-Level Intermediate Representation — was Lattner's response, started at Google in 2018 and contributed to LLVM in 2019.
MLIR generalises LLVM's IR concept by allowing multiple coexisting dialects in a single module. A graph of TensorFlow ops can be lowered through a chain of dialects — high-level tensor ops, then linear-algebra ops, then loop nests, then LLVM IR — with each lowering step a well-defined pass.
The framework is now the substrate of XLA, IREE, Mojo, and parts of TensorFlow and PyTorch. It also powers experimental compilers for hardware accelerators (TPUs, Cerebras, Graphcore). The pattern of "one IR, many dialects" looks likely to be the pattern of the next decade of compiler infrastructure.
A compiler written in language X compiling itself: the rite of passage of every serious language. NELIAC is usually credited with the first bootstrap, in 1958, and Hart and Levin's Lisp compiler followed in 1962. The Unix C compiler bootstrapped in 1973, and the discipline of self-hosting has been routine ever since.
Self-hosting forces the language designers to use the language they are designing. The benefits — sharpened taste, exposed ergonomic weaknesses, accumulated dogfood — usually outweigh the bootstrap cost. Modern languages (Rust, Go, TypeScript) typically self-host within their first few years.
Ken Thompson's Turing Award lecture, Reflections on Trusting Trust (1984), exposed a darker corollary: a malicious modification to a self-hosting compiler can persist invisibly across rebuilds, recreating itself in each new compiler binary even after the source has been cleaned. It remains a foundational result in software supply-chain security.
Compiler error messages are a user interface. The 2010s saw a wave of attention to this fact. Clang set the bar with caret-anchored diagnostics, fix-it hints, and template-instantiation backtraces. Elm (Evan Czaplicki, 2012) made friendly errors a feature: each error includes a textual explanation of what went wrong and what the programmer probably meant.
Rust (created by Graydon Hoare, then shaped by a long line of contributors, Aaron Turon and Niko Matsakis among them) inherited and extended the tradition. Rust's borrow-checker errors are some of the most thoughtfully-designed diagnostics in any compiler — they include code spans, lifetime annotations, suggested edits, and pointers to the relevant section of the Rust book.
The compiler-as-IDE trend is part of the same evolution. rust-analyzer, clangd, gopls, TypeScript Server, HLS (for Haskell): every modern language ships a language-server-protocol implementation. The compiler is no longer just a build-step program; it is the brain of the editor.
Can a compiler be proved correct? CompCert — by Xavier Leroy and collaborators at INRIA, started in 2003 — is a C compiler whose entire pipeline is proved, in the Coq proof assistant, to preserve program semantics from source to assembly.
CompCert is mainly used in safety-critical settings (avionics, medical devices). Its generated code is slower than GCC's -O2 but faster than GCC's -O0, and uniquely it carries a machine-checkable certificate of correctness. The 2011 paper Finding and Understanding Bugs in C Compilers (Yang et al.) found bugs in every major production C compiler; CompCert was the only one in which the only bugs found were in the unverified front-end's parsing, not in the verified middle-end.
The lesson generalises. Verified compilation is rare in mainstream practice but increasingly important for cryptography (where mistranslation is a security failure), and is a steady source of theoretical advances in semantics, type theory, and proof engineering.
By 2025, three compiler families dominate. GCC remains the default Linux toolchain and the canonical free Fortran/Ada compiler. LLVM/Clang is the platform of choice for new languages and for Apple, Google, and most of the embedded and game-development worlds. MSVC remains the default for Windows-native development, with its own SSA-based middle-end (UTC).
Beyond the C family: HotSpot and OpenJ9 for the JVM; V8, JavaScriptCore, and SpiderMonkey for JavaScript; RyuJIT for .NET; Cranelift as a fast non-LLVM Wasm and Rust backend; Cython, Numba, PyPy, and now Mojo for Python and its successors.
The frontier is in AI compilers: XLA, IREE, TVM, Triton, MLX. The frontier is in verified compilation: CompCert, CakeML, RustBelt. The frontier is in the compiler as tooling: rust-analyzer, clangd, the compiler-as-IDE. There has never been a more interesting decade for the field.
Aho, Lam, Sethi, Ullman, Compilers: Principles, Techniques, and Tools (2006) is still the right textbook to start with. Then Appel's Modern Compiler Implementation. For LLVM, Lattner's original master's thesis (2002) is short and lucid. Bob Nystrom's Crafting Interpreters (free online) is the best practical introduction written this century.
Compilers — Volume VII, Deck 11 of The Deck Catalog. Set in Iowan Old Style with monospaced accents in SF Mono and Menlo. Slate background #0b0d10; mint accent #7df9c4; warm amber for headings.
Thirty-two leaves on the long apprenticeship of compilation — from Hopper's A-0 in 1952 to MLIR in 2019, by way of FORTRAN, the Dragon Book, GCC, LLVM, V8, and the verified compilers of the present.
Vol. VII · Technology · Deck 11