An Introduction to Computer Programming: Data
Copyright 2023 Brian Davis - CC-BY-NC-SA
There are only two things and they're the same.
Computer programs are composed of two things: Instructions and Data. Since we know a computer only ever does exactly what we tell it to do, there must be a way to communicate our intent, to provide it instructions. Computer instructions may be written by humans in a variety of languages and dialects, none of which are natural human languages. A computer language must be completely free of ambiguity, its grammar and vocabulary must cover the sorts of things that computers can do, and its syntax is often constructed to make it easier for the computer to parse. In this work you will be introduced to many of these computer languages, and we will discuss their differences and similarities. Looking at computer languages through a polyglot lens can be constructive, as it highlights the fundamental concepts that underpin all computer languages.
Data, on the other hand, are (or is, depending on your grammatical proclivities) pieces of information you wish the computer to do things to. Consider, for example:
2 + 1
This line contains an instruction, +, and two pieces of data, the numbers 2 and 1. Taken together, this line instructs the computer to add 2 and 1. Some languages will have a problem with the fact that we haven't told the computer what to do with the result, while others operate with a standing assumption that an expression like this, whose result isn't used, will be ignored or perhaps printed to the screen. The behavior will always follow the rules, but those rules may change between environments.
We're going to review how data are represented in computer languages first and dive into instructions second. Even going fairly deep, this section should be by far the shorter of the two.
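Here is a minimal Python sketch of that difference. The name total is mine, purely for illustration. Run as a script, Python evaluates the bare expression and throws the result away; typed at Python's interactive prompt, the same line would print 3.

```python
# A bare expression: evaluated, but the result is discarded in a script.
2 + 1

# Give the result a name so we can do something with it.
total = 2 + 1
print(total)  # 3
```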
Binary
You may have heard that computers store everything as ones and zeros, or bits. This is true. The mechanisms that make up all modern computers are two-state devices, like a switch, that at any given moment are either in a state that means one (like a closed switch) or zero (like an open one). There are also transistors and capacitors and flash cells and magnetic memory cells and all kinds of these that work in different ways, but they are all used to store ones and zeros.
When you store numbers as ones and zeros we call that system the binary number system. You are probably used to the decimal number system where each decimal place represents a power of ten like below. It might be helpful to remind you that x^0 = 1, that is, any number raised to the zero power is 1.
1 = 1 * 10^0
23 = 2 * 10^1 + 3 * 10^0
157 = 1 * 10^2 + 5 * 10^1 + 7 * 10^0
That may seem a ridiculously complicated way to rewrite numbers, but below is that same sort of thing done for three number systems that are useful for computers. Each number system has a base, that is, the number whose powers each digit place stands for, like ten in good old decimal. When I write 0b01 I mean a binary number (base-2) with the numerals 01. 0x4E would be the hexadecimal (base-16) number with the numerals 4E, and 0o45 the octal (base-8) number with the numerals 45. This prefix scheme is handy because it is compact, and several computer languages have adopted it.
Binary numerals: 0, 1
Octal numerals: 0, 1, 2, 3, 4, 5, 6, 7
Decimal numerals: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
Hexadecimal numerals: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F
0b101 = 1 * 2^2 + 0 * 2^1 + 1 * 2^0 = 4 + 1 = 5
0b1111 = 1 * 2^3 + 1 * 2^2 + 1 * 2^1 + 1 * 2^0 = 8 + 4 + 2 + 1 = 15
0o17 = 1 * 8^1 + 7 * 8^0 = 8 + 7 = 15
0xF = 15 * 16^0 = 15
0xFF = 15 * 16^1 + 15 * 16^0 = 15 * 16 + 15 = 255
0b11111111 = Do you really want me to write this one all the way out? = 255
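If you'd rather let the computer do the writing out, the expansions above can be sketched as a small Python function. The name place_value and its parameters are hypothetical, made up for this illustration; it simply adds up digit times base to the power of the digit's position.

```python
def place_value(numerals: str, base: int) -> int:
    """Sum digit * base**position, the same expansion written out by hand above."""
    digits = "0123456789ABCDEF"
    total = 0
    # Walk the numerals right to left so position 0 is the rightmost digit.
    for position, numeral in enumerate(reversed(numerals.upper())):
        digit = digits.index(numeral)  # raises ValueError for unknown numerals
        if digit >= base:
            raise ValueError(f"{numeral!r} is not a base-{base} numeral")
        total += digit * base ** position
    return total


print(place_value("157", 10))      # 157
print(place_value("101", 2))       # 5
print(place_value("17", 8))        # 15
print(place_value("FF", 16))       # 255
print(place_value("11111111", 2))  # 255
```

Python's built-in int("FF", 16) does the same job; the loop is only here to show the arithmetic.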
Notice how 255 in binary is much longer than in hexadecimal. The point I'm trying to make is that computers use binary but it's so ugh boring that humans use other things. If you only care about the numbers, decimal will do fine. Other humans have already programmed your computer to convert your decimal numbers to binary. If you care about which digits are one or zero (we'll get to why you might) hexadecimal or octal are way more convenient and your computer has already been programmed to convert those too.
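As one example of that already-done programming, Python accepts the 0b, 0o and 0x prefixes directly and ships with built-in functions to convert back. This is a short sketch; the variable names are mine.

```python
n = 255

# Decimal in, other bases out (the results are strings).
print(bin(n))  # 0b11111111
print(oct(n))  # 0o377
print(hex(n))  # 0xff

# Literals written in different bases are the same number underneath.
same = 0b1111 == 0o17 == 0xF == 15
print(same)  # True

# Numerals from the text, converted back to decimal.
print(int("4E", 16))  # 78
print(int("45", 8))   # 37
```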
Types of Data
So far we've only talked about numbers but humans don't just care about numbers, we care about words and music and pictures (of cats mostly) and movies and all sorts of data. Computers don't care about anything, their cold calculating mechanisms only deal with numbers, binary numbers, ones and zeros. Fortunately humans have devised many clever ways to represent words and music and cat pictures and all the rest as binary numbers[1]. But that means when you add data to your programs the computer has to also store the type of that data.
The type tells the computer how the data translate between the binary world of ones and zeros and the programmer's world. Let's explore some common types in various computer languages.
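To make that concrete, here is a small Python sketch that takes the same four bytes and reads them as three different types. The variable names are mine, and I'm assuming big-endian byte order just to keep the example simple. The bits never change; only the type we apply to them does.

```python
import struct

raw = b"Cat!"  # four bytes, 32 bits: 0x43 0x61 0x74 0x21

# Read the bytes as ASCII text.
as_text = raw.decode("ascii")

# Read the very same bytes as one 32-bit unsigned integer (big-endian).
as_unsigned = struct.unpack(">I", raw)[0]

# Read them again as one 32-bit floating point number (big-endian).
as_float = struct.unpack(">f", raw)[0]

print(as_text)           # Cat!
print(hex(as_unsigned))  # 0x43617421
print(as_float)          # some number that has nothing to do with cats
```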
Boolean
A boolean data value is simply true or false. It is the result of a test or condition, and is used to control branching or as a flag to modify program behavior. In most languages the literal words true and false are used, while Python made the strange choice to capitalize them. As the Pythonistas say,
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
C (with <stdbool.h>, or C23 and later), JavaScript, Rust
true
false
TODO: Add default casts
Python
True
False
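A quick Python sketch of booleans at work, with made-up names and values: a test produces a boolean, an if statement branches on it, and a flag changes how the program behaves.

```python
temperature = 72  # hypothetical reading, just for the example

# A test (comparison) produces a boolean value.
is_warm = temperature > 65
print(is_warm)        # True
print(type(is_warm))  # <class 'bool'>

# The boolean controls which branch runs.
if is_warm:
    message = "leave the jacket"
else:
    message = "bring a jacket"

# A boolean used as a flag to modify program behavior.
verbose = False
if verbose:
    print("temperature was", temperature)
print(message)  # leave the jacket
```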
Integer
Integers are whole numbers. As such, no language I've ever used has made the perverse choice to represent them in any way other than just as a series of numerals. As hinted at above, you can sometimes write them in number systems other than decimal, but mostly you'll just use decimal.
Where languages differ is in how many bits are used to store common integers. The more bits, the larger the integer can get: the largest unsigned (non-negative) value that fits in N bits is 2^N - 1.
8-bit = 255
16-bit = 65,535
32-bit = 4,294,967,295
64-bit = 18,446,744,073,709,551,615
C leaves the size up to the compiler and platform: a plain int is 32 bits on most current desktop systems, while a long may be 32 or 64 bits. JavaScript treats most numbers as floating point (see below) until they need to be integers, such as for bitwise operations, at which time they are treated as 32-bit. Rust allows you to specify the number of bits; if you don't, an integer defaults to 32 bits (i32), and only the pointer-sized types isize and usize are 32 or 64 bits depending on your computer. Python integers have unlimited size, which means an integer value can keep growing until it fills the computer's available memory.
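Those limits come straight from the 2^N - 1 formula, as this short Python sketch shows. The dictionary name max_unsigned is mine.

```python
# Largest unsigned value for each common integer width: 2**bits - 1.
max_unsigned = {bits: 2 ** bits - 1 for bits in (8, 16, 32, 64)}

for bits, largest in max_unsigned.items():
    print(f"{bits}-bit = {largest:,}")
```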
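Here is a Python sketch of the contrast. Python happily goes past the 64-bit limit, while the helper wrap_u32 (a hypothetical name of mine) uses a bit mask to imitate what happens when a value has to fit in 32 unsigned bits. It is an illustration of fixed-width wraparound, not a claim about how any particular language implements it.

```python
# Python integers just keep growing.
big = 2 ** 64             # one more than the largest 64-bit unsigned value
print(big)                # 18446744073709551616
print(big.bit_length())   # 65


def wrap_u32(value: int) -> int:
    """Keep only the low 32 bits, like a 32-bit unsigned integer would."""
    return value & 0xFFFFFFFF


print(wrap_u32(4_294_967_295))      # 4294967295, still fits
print(wrap_u32(4_294_967_295 + 1))  # 0, wrapped around
```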
Floating Point
Strings
Null
Representations
JSON
YAML
MessagePack
Protobuf
[1] In a later article I'd like to talk about the details of common data formats.