A Brief History of Computation

This article is based on this talk about the theory of computation by Advait Shinde. I very much recommend it.

A little bit of theory

Axiom Towers

You can see a system like an axioms tower where blocks, or theorems, are built one upon another. The axiom is the cornerstone and then you theorize hypothesis (theorems) from that base axiom and try to build them keeping a supportive, loose, cohesive and coherent relationship between each one of them.

For example, multiplication is built upon addition. The same is for kinds of numbers (natural, rational, imaginary, etc, etc). Computers operate on binary (on, off, add, subtract) and everything else is built upon these two operations.

We can extrapolate this to every aspect of software engineering: micro services that work with each other, frontends, dependencies, components, classes, services and endpoints, etc, etc. We must see beyond the immediate impact and surface when creating and modifying pieces of our systems and make sure that what we are doing does not affect and is in sync with every other piece, otherwise we risk crumbling it all and, sometimes, silently, which is even worse.

Rules & Constraints

When operating within a given system we do so by following certain rules and having in mind certain constraints. But how are these represented? With symbols, but these same symbols don't mean anything without the rules and the rules only make sense and can be represented in terms of symbols.

Think of computers that paint pixels (characters) in the screen while we are typing. They make sense because of the meaning we give them as symbols; language itself. But for a more down to earth analogy, think of programming languages, architecture and design patterns, software frameworks, that we use and implement while programming.

They make sense within a certain context, with a certain set of rules and under certain constraints. We call this function because of what it does in a given language or library. We implement this pattern because of what it might give us. We use this or that framework because it has certain features that fits our use-case.

Nevertheless, theorems (blocks) are discovered, not invented (specially for math), and for that we have to try things, new things. Experiment. That's how axiom towers are built, and new sets of rules and constraints come to life.

Question the common sense of things, strip everything down to its fundamentals, to what we know is undeniably true. This is tackling problems from first principles, to reduce complexity, to refuse anything that cannot be independently verified and rebuild our understanding from the ground up. This way we can build, create new things.

Axiom Towers: Computation

Like the axioms proposed for math by Peano, computation had its own proposed by Alan Turing, Alonzo Church and Kurt Gödel (recursive functions), circa 1930. We will focus on the first two: Turing Machines and Lambda Calculus.

They were proposed at roughly the same time and try to represent the same thing, the computation of an algorithm (a set of instructions to reach a desired result/state). They differ in how they do it, meaning, in the rules they apply:

Turing Machines

it's like a tape with contiguous boxes one next to each other, starts blank with a list of instructions
it has a HEAD (a box) that holds the initial value
boxes can be marked and unmarked (X), i.e modifying the tape (state)
depending on if the box is marked/unmarked (state), go to N (jump, goto), i.e state determines how the program jumps/runs
the behavior of the program is changed with every tape/state modification
to reason about the behavior of the program one must understand the state of the tape at every moment of modification
HEAD can be moved to left and right
you could think of it like a linked list

Turing Machines Tower

What we have today was mainly built upon Turing Machines, but Von Neumann model is considered the base axiom. Turing Machines are so popular because VN Instructions are easily implementable in hardware (efficient).

The tower would be something like this:

C++ (1985) - Classes, Objects
C (1972) - Functions, Structs, Malloc
Fortran (1957) - If Statements, Loops
Assembly (1949) - Human Readable, Primary Instructions (ADD, SUBTRACT, MULTIPLY, DIV)
Von Neumann Model (1945) - VN Instructions
Turing Machines (1936) - TM Instructions

Lambda Calculus

minimal set of rules:
- programs are expressions (aka lambda terms or λ-terms)
- expressions can either be a named variable, a function (formally called an abstraction), or a function call (formally an application)
a more functional approach, literally you just built everything with functions
declare a function
call a function
modify the parameters/arguments of a function by currying to make it more modular and reusable (calling a function returned from another function)
recursion not allowed, although the Y Combinator, a higher-order function (a function that takes a function and returns a function) in functional programming, enables recursion in anonymous functions
can be used to represent minimal models of Turing Complete machines

Lambda Calculus Tower

Lambda Calculus is the Assembly of functional programming, but it's hard to implement in hardware.

React (2013) - View is Pure Function of State
Elm (2012) - Flux/Redux Pattern
Haskell (1985) - Pure
ML/OCAML (1973) - Pattern Matching
System F (1972) - Lambda Calculus with Types
LISP (1958) - Meta-Circular, GC
Lambda Calculus (1936)

You could say that React is to jQuery what Lambda Calculus is to Turing Machines:

jQuery
- the DOM is your state
- anything can make modifications to the DOM
- the DOM ends up in unexpected states because of unforeseen state modifications
React
- the state is explicitly defined
- your view is a pure function of state
- you don't modify state, you create new state

React's constraints make it easier to reason about state and the DOM. Functional programming's constraints make it easier to reason about programs.

So...

If an algorithm can be encoded as any of these two, then is computable. With any of the two you can represent a program in a computer.

It's said that a machine "is Turing Complete" when it can simulate a Turing Machine and represent any algorithm that is computable, for example Conway's Game of Life. The same can be said of programming languages, like general purpose languages. The opposite are DSL (Domain Specific Languages) that serve to a very specific endeavor, goal or domain.

Lambda Calculus has no tape (state), all you have are input arguments to functions. Functions are pure, like in math, i.e with the same input you always get the same output (deterministic). All values are immutable, though you can generate new values from existing ones. There are no loops (requires modifying counter variables), so recursion is used instead. And finally functions are your unit of composition, and because there is no global state, it's much easier to reason about (composition).

So, Turing Machines and Lambda Calculus are equivalent:

Turing Machines can be rewritten as Lambda expressions
Lambda expressions can be rewritten as Turing Machines
they also are equivalent to recursive functions

All the way from theory to architecture

Von Neumann Architecture

This axiom came around 1945, designed by physicist John Von Neumann. It's a foundational computer design where program instructions and data are stored together in the same memory. It consists of a CPU (with an ALU and Control Unit), memory, and input/output devices, operating sequentially via a single bus. Despite being developed in the 1940s, this design remains the basis for most modern general-purpose computers (servers, desktop, laptop, mobile devices, microcontrollers).

Instructions

load X from memory cell P
store X to memory cell P
add, subtract, multiply
if memory cell P = 0, go to N
if memory cell P != 0, go to N

The main characteristics are:

Shared Memory: both data and instructions are stored in the same main memory (RAM)
Central Processing Unit (CPU): comprises an Arithmetic Logic Unit (ALU) for calculations, a Control Unit (CU) to manage instructions, and registers for temporary storage
Sequential Execution: the CPU fetches, decodes, and executes instructions one at a time (the instruction cycle)
Input/Output (I/O): mechanisms to connect the computer with external devices (keyboards, mouses, etc)

The Von Neumann Bottleneck

A critical limitation is that the CPU can only perform one instruction or data fetch at a time because they share the same bus. This, known as the Von Neumann bottleneck, means that processing speed is often limited by memory access times, especially when data transfer is slower than computations, often limiting efficiency in modern applications.

Some solutions or workarounds are optimizing caching (closer to the CPU), designing processors with greater bandwidth and new architectures that process data within memory, avoiding data transfer.

Today, almost all modern computers utilize a multi-bus architecture. While early computers often used a single system bus, modern high-performance systems require multiple, specialized buses to prevent data bottlenecks and allow simultaneous operations between the CPU, memory, and peripheral devices.

Multi-Bus Architecture

This is the standard today:

Performance Bottleneck Elimination: A single bus creates a "bottleneck" because only one device can send data at a time. Separating memory, I/O, and peripheral traffic onto separate buses allows simultaneous communication, greatly increasing speed
Separation of Speeds: High-speed components like RAM (using a dedicated memory bus) need to communicate much faster than slow components like hard drives or USB ports (which use peripheral buses like PCIe or SATA)
Internal vs. External Buses: Modern computers distinguish between an internal/local bus (for communication between the CPU and cache/memory) and external/peripheral buses (for connecting to peripherals)
Modern Interconnects: Classic system buses have been replaced in modern computers by high-performance technologies like PCI Express (PCIe) and Intel QuickPath Interconnect (QPI) or similar technologies that facilitate multiple parallel communication paths

Key Takeaways:

Multi-bus is standard: even small embedded systems or smartphones often use multiple bus structures internally (e.g., AHB/APB buses in ARM architecture)
Specialized Buses: computers use specialized, high-speed buses for data-intensive tasks (memory bus) and different buses for input/output
Von Neumann remains: despite using multiple buses for speed, most computers still follow the general Von Neumann architecture, where data and instructions share the same memory, but this memory is accessed through optimized, complex bus structures