Copyright 2023 Brian Davis - CC-BY-NC-SA

The Ladder of Abstraction

A major question when creating a programming language is whether it will be compiled, and to what target. Even interpreted languages are typically compiled to some form of bytecode for a particular virtual machine, and compiler toolchains like LLVM mean that new language compilers aren't typically spitting out machine code themselves. This illustrates an idea that I call the Ladder of Abstraction. I may have read it somewhere, but if so, I can't seem to find where.

In simplified terms, computers execute machine code, but machine code is difficult for humans to write directly: it is composed of binary numbers and far too verbose for expressing complex concepts, so humans created abstractions. The first abstraction was assembly, which did little more than replace the numbers of machine code with words. The next rung on the ladder was quickly added with early programming languages. These early languages provided a few simple abstractions that will be familiar to programmers even today: if/else branching, loops, local variables, and procedures. They were followed by even more abstract, more expressive languages offering dynamic types, objects, and so on.
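Those early abstractions are so familiar that it's easy to forget they were once new. A minimal sketch in Python, packing each one into a few lines (the function and its name are my own illustration, not from any particular language's history):

```python
# The early-language abstractions named above: branching, a loop,
# local variables, and a procedure -- each of which the compiler or
# interpreter ultimately lowers toward machine code on our behalf.

def classify_and_sum(numbers):
    """Procedure: a named, reusable unit of code."""
    total = 0                      # local variable
    for n in numbers:              # loop
        if n % 2 == 0:             # if/else branching
            total += n
        else:
            total -= n
    return total

print(classify_and_sum([1, 2, 3, 4]))  # 2 + 4 - 1 - 3 = 2
```

Each of these constructs once had to be hand-built from jumps and registers; the ladder exists so that nobody has to do that twice.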

Another abstraction was that languages started targeting virtual machines. A virtual machine (VM) is a simple program that executes instructions, just like a real computer, but it abstracts over the physical hardware. A program can be written once and run on many different kinds of computers, so long as the virtual machine has been ported to each architecture.
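The core of a VM really is that simple. Here is a toy sketch of a stack-based VM in Python, with a hypothetical three-opcode instruction set of my own invention; the point is that the same bytecode runs anywhere this loop runs:

```python
# A toy stack-based virtual machine. Bytecode is a list of
# (opcode, argument) pairs; real VMs encode this as bytes.

def run(bytecode):
    stack = []
    for op, arg in bytecode:
        if op == "PUSH":                     # push a constant
            stack.append(arg)
        elif op == "ADD":                    # pop two, push sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":                    # pop two, push product
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack.pop()

# (2 + 3) * 4, written once as portable bytecode:
program = [("PUSH", 2), ("PUSH", 3), ("ADD", None),
           ("PUSH", 4), ("MUL", None)]
print(run(program))  # 20
```

Porting this "VM" to a new architecture means porting the interpreter loop, not every program ever written for it.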

Recent developments in this area include WASM and V8 (the engine behind Node.js), but virtual machines have been around a long time. The new and shiny is often attractive to programmers, but don't discount the massive number of engineering hours that have been sunk into the JVM or .NET.

Compiler writers made a similar abstraction, creating intermediate representations (IRs) that both enable target-independent optimizations and decouple compiler front ends from back ends. Front ends are now typically written to translate a program's source into an IR, and back ends then compile that IR to platform-specific machine code.
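The decoupling can be sketched with a toy IR of nested tuples (my own stand-in, far simpler than LLVM's IR). A hypothetical front end would emit this IR; the optimization pass below never sees the source language at all:

```python
# Toy IR: an expression is either an int constant, a variable name,
# or an (op, left, right) tuple. Constant folding is a classic
# IR-level optimization that any front end benefits from for free.

def fold_constants(node):
    """Recursively evaluate subtrees whose operands are all constants."""
    if isinstance(node, (int, str)):
        return node
    op, left, right = node
    left, right = fold_constants(left), fold_constants(right)
    if isinstance(left, int) and isinstance(right, int):
        return left + right if op == "add" else left * right
    return (op, left, right)

# IR for (1 + 2) * x -- we never see, or care, what source produced it.
ir = ("mul", ("add", 1, 2), "x")
print(fold_constants(ir))  # ('mul', 3, 'x')
```

Because the pass works on the IR, one implementation serves every front end, and every back end receives already-optimized input.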

Idea 1: Physical Targets Don't Matter

We are at a juncture where we're moving up the ladder of abstraction. Languages of the next couple of decades will all target an IR or a virtual machine. With Emscripten, we are already seeing compiler back ends that target virtual machines. Convergent evolution may well make translation between VM bytecodes possible. I think new languages can safely discount the importance of choosing a compilation target: pick an IR or VM with a lot of inertia and assume that there will be a path forward.

Writing a custom VM can enable really neat language features, but I think it should be considered a separate effort from designing a new language.

Idea 2: Tooling Matters More Than Design Now

Programmers need a good testing library, a formatter, a linter, a debugger, a logger, and a packaging system. These quality-of-life tools may well be more important than language design.

Idea 3: Batteries Included or Good FFI

A new language needs a good standard library. The only practical path forward for a small team is to provide a really good (productive) foreign function interface (FFI). This allows programmers to use libraries written in other languages from your new language.
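For a concrete feel of what an FFI buys you, here is a sketch using Python's `ctypes`, assuming a POSIX system where the C math library can be located (the `libm.so.6` fallback is a glibc/Linux assumption):

```python
# Calling C's sqrt from libm via Python's ctypes FFI. The function
# comes from the C ecosystem, not from Python's standard library --
# a language with a good FFI inherits decades of existing libraries.
import ctypes
import ctypes.util

# find_library may return None on minimal systems; fall back to the
# common glibc soname (an assumption that holds on typical Linux).
libm = ctypes.CDLL(ctypes.util.find_library("m") or "libm.so.6")
libm.sqrt.argtypes = [ctypes.c_double]   # declare the C signature
libm.sqrt.restype = ctypes.c_double

print(libm.sqrt(9.0))  # 3.0
```

The cost of an FFI is declaring foreign signatures by hand; the payoff is that a small team never has to reimplement `sqrt`, or anything else C already provides.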

Note that for the WASM use case, executable file size is an important consideration, so linking external libraries needs to be intelligent.