Copyright 2023 Brian Davis - CC-BY-NC-SA

The Ladder of Abstraction

A major question when creating a programming language is whether it will be compiled and, if so, to what target. Interpreted languages are typically compiled to a bytecode of some form for a particular virtual machine. Compiler tools like LLVM mean that new language compilers typically aren't spitting out native machine code directly but rather an intermediate representation (IR).

I have an idea that I call the Ladder of Abstraction. I may have read it somewhere, but if so, I can't seem to find where.

Computers execute machine code, but machine code is difficult for humans to write directly. It's too verbose for expressing complex concepts, so humans created abstractions. The first abstraction was assembly, which did little more than replace the numbers of machine code with words. The next rung on the ladder was added quickly with early programming languages. These early languages provided a few simple abstractions that will be familiar to programmers even today: if/else branching, loops, local variables, and procedures. They were followed by even more abstract, more expressive languages with dynamic types, objects, and so on.

Another abstraction came when languages started targeting virtual machines. A virtual machine (VM) is a fairly simple program that executes instructions just like a real computer, but it abstracts away the physical hardware, so a program can be written once and run on many different kinds of computers, as long as the virtual machine has been ported to each new architecture.
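To make that concrete, here is a minimal sketch of the core of a stack-based virtual machine, written in Python with an invented instruction set purely for illustration. Real VMs like the JVM add typed instructions, garbage collection, and just-in-time compilation, but the basic dispatch loop is the same idea: the same bytecode runs anywhere the interpreter itself runs.

    # Toy stack-based VM with an invented instruction set (illustrative only).
    PUSH, ADD, MUL, PRINT = "PUSH", "ADD", "MUL", "PRINT"

    def run(program):
        stack = []
        for opcode, operand in program:
            if opcode == PUSH:
                stack.append(operand)
            elif opcode == ADD:
                b, a = stack.pop(), stack.pop()
                stack.append(a + b)
            elif opcode == MUL:
                b, a = stack.pop(), stack.pop()
                stack.append(a * b)
            elif opcode == PRINT:
                print(stack.pop())

    # Computes and prints (2 + 3) * 4 on any machine that can run the VM itself.
    run([(PUSH, 2), (PUSH, 3), (ADD, None), (PUSH, 4), (MUL, None), (PRINT, None)])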

Recent developments in this area include WASM, V8, and Node.js, but virtual machines have been around a long time. A massive number of engineering hours has been sunk into the JVM and .NET.

Compiler writers made a similar abstraction, creating intermediate representations (IRs) that both allow for certain optimizations and decouple compiler front ends from back ends. Front ends are now typically written to translate a program's source code into an IR, and back ends then compile that IR to platform-specific machine code.
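As a rough illustration of that split, the sketch below (again in Python, with an invented three-address-style IR rather than any real compiler's format) shows the kind of lowering a front end performs: an expression tree goes in, and a flat list of IR instructions comes out, which a back end could then translate to machine code for whatever target it knows about.

    import itertools

    # Toy front end pass: lower an expression tree into a flat, three-address-style
    # IR. The IR format is invented for illustration; real IRs such as LLVM's carry
    # types, control-flow graphs, and far more detail.
    def lower(node, instructions, temps=None):
        """Emit IR for `node` and return the name of the temporary holding its value."""
        if temps is None:
            temps = itertools.count()
        if isinstance(node, int):                      # literal value
            name = f"%t{next(temps)}"
            instructions.append(f"{name} = const {node}")
            return name
        op, left, right = node                         # ("add" | "mul", lhs, rhs)
        lhs = lower(left, instructions, temps)
        rhs = lower(right, instructions, temps)
        name = f"%t{next(temps)}"
        instructions.append(f"{name} = {op} {lhs}, {rhs}")
        return name

    ir = []
    lower(("mul", ("add", 2, 3), 4), ir)               # (2 + 3) * 4
    print("\n".join(ir))
    # %t0 = const 2
    # %t1 = const 3
    # %t2 = add %t0, %t1
    # %t3 = const 4
    # %t4 = mul %t2, %t3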

When Physical Targets Don't Matter

Despite robust virtual machines, true cross-platform software is still hard. Even if Java code can theoretically run on both laptops and cell phones, these devices have wildly different capabilities: touch screen versus touchpad, for example. Designing a program to work well with either is a real challenge.

But we're starting to see cross-platform toolkits that help tackle the challenges of running on wildly different hardware, and a huge driver in this area is web technology. More and more, programs are targeting the web, via JavaScript or WASM, and need to work well both on the tiny touch screen of a cell phone and on a full workstation.

I think we are at a juncture where we're moving up the ladder of abstraction. Languages of the next couple of decades will all target an IR or a virtual machine. Compiler back ends will be able to target either native hardware or virtual machines. Convergent evolution may well result in translation between VM bytecodes. New languages can safely discount the importance of choosing a compilation target by picking an IR or VM with a lot of inertia and assuming that there will be a path forward.