From the first artificial neuron sketched in 1943 to large language models trained on most of the public internet — eight decades of one idea, scaled.
Neurophysiologist Warren McCulloch and logician Walter Pitts publish "A Logical Calculus of the Ideas Immanent in Nervous Activity." They prove that simple threshold units, wired together, can compute any function expressible in propositional logic.
If Σ w·x ≥ θ, fire. Otherwise, stay silent.
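That rule fits in a few lines. A minimal sketch of a McCulloch-Pitts unit, with illustrative weights and thresholds (the specific logic gates below are examples, not drawn from the 1943 paper):

```python
# A McCulloch-Pitts threshold unit: fire (1) if the weighted sum of
# inputs reaches the threshold theta, otherwise stay silent (0).
def mp_neuron(inputs, weights, theta):
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= theta else 0

# AND: both inputs must be active to reach threshold 2.
def AND(a, b):
    return mp_neuron([a, b], [1, 1], theta=2)

# OR: a single active input is enough to reach threshold 1.
def OR(a, b):
    return mp_neuron([a, b], [1, 1], theta=1)
```

Wiring such units together composes these primitives into arbitrary propositional formulas, which is the paper's core result.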
A summer gathering at Dartmouth College, organized by John McCarthy, Marvin Minsky, Nathaniel Rochester, and Claude Shannon, declares a new field. The proposal claims that "every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it."
Intelligence as computation.
10 researchers, 8 weeks, no guarantees.
A name, a community, and 70 years of follow-up.
Frank Rosenblatt, at Cornell Aeronautical Laboratory, builds the Mark I Perceptron — a physical machine of motors, potentiometers, and a 20×20 photocell array. Crucially, it adjusts its own weights from examples.
w ← w + η(y − ŷ)x
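The update rule above can be run end to end in a few lines. A sketch that learns logical OR, a linearly separable task; the learning rate, epoch count, and bias term are illustrative choices, not details of Rosenblatt's hardware:

```python
# Perceptron learning: w <- w + lr * (y - y_hat) * x, applied per example.
def train_perceptron(data, lr=0.1, epochs=20):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            # Predict with the current weights (threshold at zero).
            y_hat = 1 if w[0] * x[0] + w[1] * x[1] + b >= 0 else 0
            err = y - y_hat  # -1, 0, or +1
            # Nudge weights toward the correct answer.
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

or_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w, b = train_perceptron(or_data)

def predict(x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b >= 0 else 0
```

On separable data the perceptron convergence theorem guarantees this loop settles on a correct boundary; on XOR, as the next slide notes, no such boundary exists.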
The book Perceptrons proves a single-layer perceptron cannot learn XOR — or any non-linearly-separable function. The technical result was narrow. Its cultural effect was not.
Rumelhart, Hinton, and Williams publish "Learning representations by back-propagating errors" in Nature. Multilayer networks become trainable: the chain rule, applied recursively, sends a useful gradient back through every layer.
IBM's Deep Blue defeats the reigning world chess champion 3.5 to 2.5 in a six-game match. It is a triumph of brute-force search — 200 million positions per second — paired with hand-crafted evaluation. Symbolic AI's last great stand.
positions evaluated per second.
match score, May 11, 1997.
neural networks involved.
Krizhevsky, Sutskever, and Hinton train an 8-layer convolutional network on two consumer GPUs. Top-5 error on ImageNet drops from 26% to 15.3%. Within five years, every serious computer-vision system is a deep neural network.
Vaswani et al. at Google publish the Transformer: a sequence model built entirely from self-attention, with no recurrence and no convolution. It parallelizes beautifully on GPUs, scales gracefully with data, and quietly becomes the backbone of nearly every modern AI system.
Each row: how much one token attends to every other.
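Scaled dot-product self-attention, the mechanism behind those rows, is compact enough to sketch. Shapes and random inputs below are a toy example:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """One attention head: each row of the returned weight matrix
    says how much one token attends to every other."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (tokens, tokens)
    # Softmax each row so attention weights sum to 1 per token.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                         # 4 tokens, model dim 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out, attn = self_attention(X, Wq, Wk, Wv)
```

Because every token's output is computed from the same matrix products, the whole sequence is processed in parallel, which is what lets the architecture saturate GPUs in a way recurrent models could not.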
OpenAI shows that an autoregressive Transformer, trained on much of the public internet, becomes a capable few-shot learner without any task-specific fine-tuning. Kaplan et al. quantify the trend: loss falls as a clean power law in compute, data, and parameters.
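The scaling laws have a simple closed form. As reported by Kaplan et al. (2020), test loss falls as a power law in each resource when the other two are not the bottleneck (exponents approximate):

```latex
L(N) = \left(\frac{N_c}{N}\right)^{\alpha_N}, \quad \alpha_N \approx 0.076
  \quad \text{(parameters)}
L(D) = \left(\frac{D_c}{D}\right)^{\alpha_D}, \quad \alpha_D \approx 0.095
  \quad \text{(dataset size)}
L(C) = \left(\frac{C_c}{C}\right)^{\alpha_C}, \quad \alpha_C \approx 0.050
  \quad \text{(compute)}
```

The practical reading: to cut loss meaningfully, multiply the resource, don't add to it.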
Released as a "research preview" on November 30, 2022, ChatGPT becomes the fastest-growing consumer application in history. The breakthrough was not raw capability — GPT-3.5 had existed for months — but interface: a chat box, free, with a model tuned by RLHF to be useful and to refuse less.
to 1 million users.
to 100 million users.
cost to try, at launch.
Eight decades in, the field is louder than ever and less certain than it sounds. Three threads worth watching:
How do you make a system that pursues the goal you actually meant — including in situations its training data did not cover?
Are LLMs a stepping stone to AGI, or a powerful but bounded technology that needs new ingredients (memory, planning, embodiment)?
Does capability accelerate smoothly, or does recursive self-improvement create a discontinuous jump? The answer changes everything else.
A short, opinionated list. Each YouTube link is a search query — the top results stay reasonably current.