How a hominid throat, a social brain, and a vocabulary of perhaps fifty thousand items diverged from every other primate communication system. Theories, evidence, and the irreducibly speculative core.
No fossil preserves a sentence. No artefact records the first word. The origin of language is the central problem of human evolution and the one for which direct evidence is scarcest.
What we have is indirect: comparative anatomy, archaeological proxies, the genetics of speech-related disorders, the developmental sequence in modern children, and the structure of the ~7,000 surviving languages. From these we triangulate.
This deck surveys the major theories — Chomsky's faculty, the gradualist alternatives, Bickerton's protolanguage, Tomasello's cooperative origins — and the empirical anchors: the FOXP2 gene, the descended larynx debate, the Nicaraguan sign-language emergence, the symbolic-revolution evidence at Blombos and Sibudu.
Language is not communication. Bees communicate; vervets communicate; the immune system communicates. What language adds to communication is discrete infinity — the capacity, from a finite vocabulary and a finite set of combinatorial rules, to produce and understand an unbounded set of expressions, including ones never heard before.
Hockett's 1960 list of design features identified thirteen properties, extended to sixteen in later versions; the durable subset is six: arbitrariness (no resemblance between sound and meaning), productivity (novel utterances), displacement (reference to the absent and the imagined), cultural transmission (learned, not innate-as-content), duality of patterning (meaningless sounds combine into meaningful units), and reflexivity (language can talk about itself).
No animal communication system has all of these. A few have one or two. The gap is large enough that the question of origins is not "how was the gap closed gradually" but "what made the gap possible at all."
Noam Chomsky's position, defended for sixty years and refined many times, is that humans possess a language faculty — a biological cognitive system, distinct from general intelligence, that contains the abstract architecture of grammar. The claim is genetic, not cultural.
The 2002 Hauser-Chomsky-Fitch paper The Faculty of Language: What Is It, Who Has It, and How Did It Evolve? sharpened the position. The "narrow" faculty (FLN) is the recursive combinatorial engine — perhaps a single computational primitive, Merge. The "broad" faculty (FLB) includes the sensory-motor and conceptual-intentional systems that most other animals share in some form.
The strong Chomskyan claim is that FLN emerged once, recently, in a single mutation or small set, in the line that produced anatomically modern humans. This is unfashionable and contested. The case for it: the apparent uniformity of human grammatical capacity, the speed of child acquisition, the absence of intermediate forms.
Universal Grammar — the abstract structural template Chomsky argued every human language must conform to — was the most ambitious linguistic claim of the twentieth century. The "principles and parameters" framework (1981) proposed that children come pre-equipped with a small set of switches; experience sets them.
The empirical content of UG has shrunk over time. The minimalist programme, launched in the 1990s and sharpened by the 2002 Hauser-Chomsky-Fitch paper, reduced it nearly to Merge alone. Critics have argued the universals are weaker than UG requires: Daniel Everett (Don't Sleep, There Are Snakes, 2008) on Pirahã's apparent absence of recursion, and Nick Evans and Stephen Levinson's 2009 Behavioral and Brain Sciences paper on the diversity of languages.
The fight is partly empirical, partly definitional. Even most critics of UG accept that human children acquire language with extraordinary uniformity from radically different inputs, and that this uniformity wants explanation.
Derek Bickerton's Language and Species (1990) and Adam's Tongue (2009) proposed that full syntactic language was preceded by protolanguage — a stage in which hominins combined symbolic vocabulary without grammatical structure. Words, but no sentences in the modern sense.
The evidence Bickerton drew on: the early speech of human children (telegraphic, content words only); the late-acquired language of feral or severely deprived children (Genie); the pidgins generated when adult speakers of mutually unintelligible languages must communicate; the trained-ape vocabularies of Kanzi and Washoe.
Protolanguage gives a gradualist alternative to the single-mutation Chomskyan account. The transition from protolanguage to full language might still have been rapid — Bickerton put it after Homo erectus, perhaps with Homo sapiens — but the protolanguage stage is taken to extend back as far as 1.5 million years.
Michael Tomasello's Origins of Human Communication (2008) and A Natural History of Human Thinking (2014) argue that the prerequisite for language is not a grammar module but a particular kind of social cognition: shared intentionality. The capacity to represent that another agent represents the world, and to coordinate joint action around shared goals.
Children begin pointing declaratively at around one year of age, sharing attention with no instrumental request, which chimpanzees raised in the same human environments never do. This pointing-to-share, Tomasello argues, is the missing precursor: language extends a pre-linguistic system of cooperative reference that other apes do not possess.
The Tomasello account makes the origin of language partly a matter of evolved sociality. The cognitive machinery that enables grammar might be relatively cheap once the cooperative platform is in place.
The KE family in west London were studied through the 1990s. About half of three generations had a severe speech and language disorder — articulation deficits, grammatical impairments, low verbal IQ. In 2001 Cecilia Lai, Simon Fisher, and colleagues identified the cause: a single mutation in FOXP2, on chromosome 7.
FOXP2 is a transcription factor — it regulates other genes — and is found across vertebrates. The human version differs from the chimpanzee version by two amino acid substitutions, both fixed in the human population. Initial 2002 work suggested the substitutions were swept to fixation within the last 200,000 years. Later analysis — Krause et al. 2007 — found the same human-type FOXP2 in Neanderthals.
FOXP2 is not "the language gene." It is necessary for normal speech-motor development; mutations disrupt language acquisition; and it shows signatures of recent selection in the human lineage. It is the strongest single piece of genetic evidence we have for an evolved speech apparatus.
The human larynx sits low in the throat compared to other primates. The descent enlarges the supralaryngeal vocal tract and enables the vowel space — the contrast between [i], [a], and [u] — that human speech depends on. Philip Lieberman's research from the 1970s argued the descent was a recent and uniquely human adaptation, requiring full speech-readiness only in Homo sapiens.
The Lieberman position has been substantially weakened. Tecumseh Fitch's work on dynamic larynx descent during vocalisation — observed in deer, dogs, goats, and chimpanzees — shows the anatomical configuration is not exclusively human. Reconstructed Neanderthal vocal tracts, once argued to preclude full speech, now look adequate.
What remains: human speech requires fine motor control of the breath, tongue, and lips, and a corticobulbar pathway capable of voluntary phonation. This control is much more developed in humans than in any other primate. The anatomy is permissive; the neural control is the harder lift.
The hyoid is a small horseshoe-shaped bone in the throat, suspended in soft tissue, anchoring the tongue. It rarely fossilises. Three fossil finds matter for the origins question.
The Kebara Cave hyoid (Israel, ~60 kya), Neanderthal: described in 1989 and indistinguishable in shape from a modern human's. The Atapuerca SH hyoids (~430 kya), Homo heidelbergensis: also modern in form. The Dikika australopithecine infant hyoid (~3.3 mya): chimp-like, indicating air sacs of the kind seen in great apes today.
The inference: the modern hyoid configuration is at least Middle Pleistocene. Whatever speech adaptations the bone supports, they are not unique to Homo sapiens. Combined with the Neanderthal FOXP2 finding, this suggests a deep shared substrate of speech-readiness across the later Homo lineage.
The Blombos Cave site on the South African coast yielded engraved ochre pieces dated 75–100 kya: geometric crosshatched patterns, manufactured pigment, perforated marine shells likely strung as beads. Comparable evidence, some of it older, comes from Sibudu, Diepkloof, and Pinnacle Point.
The argument: symbolic behaviour of this kind requires conventional shared meaning. A bead has no functional value; it works only as a sign. Pigment use likewise. Where there is convention, there is communicative practice; where there is communicative practice of this complexity, there is plausibly language.
The Blombos evidence pushes the symbolic threshold back to roughly the emergence of anatomically modern humans — perhaps earlier. The older view of a "creative explosion" at 40 kya with the European Upper Palaeolithic transition is now read as a regional rather than species-wide event.
Michael Corballis (From Hand to Mouth, 2002) and Michael Arbib argue language began in gesture and migrated to vocalisation. The case rests on the mirror-neuron system in primates — Rizzolatti's discovery in macaques (1992) — which links action observation and action production, and on the manual dexterity already present in Homo erectus.
Sign languages show that the cognitive content of language is medium-independent: deaf children acquiring ASL pass through the same developmental milestones as hearing children acquiring spoken language. Whatever language is at the level of the brain, it is not specifically vocal.
The gesture-first hypothesis remains contested. Vocalisation has its own evolutionary advantages — it works in the dark, around obstacles, while hands are occupied with tools. The likely answer is that both modalities co-evolved, with vocal control accelerating later in the Homo lineage.
Steven Mithen's The Singing Neanderthals (2005) revives an older proposal: language and music descend from a shared ancestral system — what Alison Wray and Mithen call "Hmmmmm" (Holistic, manipulative, multi-modal, musical, mimetic). Pre-linguistic hominins produced affectively charged whole utterances, not yet decomposed into discrete combinable units.
The argument explains shared features of language and music — pitch, rhythm, prosody — and the emotional power of song to coordinate group action. The line of descent splits when the phonological system fractures the holistic units into recombinable phonemes, leaving music as the residue of the older system.
Like all origin theories, this one is hard to test. It does plausibly account for why language is universally rhythmic and why infant-directed speech (motherese) is so musical. The hypothesis is not mainstream but is not foolish.
When adults of different languages must communicate without a shared tongue, they generate pidgins — reduced systems with limited vocabulary, no inflection, simple word order. Pidgins are nobody's first language; they are scaffolding.
The next generation, raised in pidgin-using communities, generates creoles — fully grammatical languages with the full range of features pidgins lack. Tense, aspect, embedding, agreement. Bickerton's claim, controversial but influential: creoles converge on a common grammar (the "bioprogram") because children, given impoverished input, fall back on innate structural defaults.
The pidgin-creole cycle is sometimes invoked as a model of language origin: protolanguage is a pidgin-like state; full language is creolisation. The analogy is imperfect — pidgin speakers already have full languages of their own — but it is the closest natural experiment we have on language emergence under impoverished input.
The closest thing to language emergence ever observed by linguists. In 1977 a special-education school that enrolled deaf children opened in Managua, and after the 1979 revolution the Sandinista government expanded deaf schooling. The students arrived with home signs: idiosyncratic gestures invented within their families. Within a decade the children, conversing daily, generated a full language.
Judy Kegl, Ann Senghas, and Marie Coppola documented the emergence in real time. The first cohort produced an unstable pidgin-like system. Successive cohorts each added grammatical structure: spatial agreement, classifiers, embedded clauses. By the early 2000s Idioma de Señas Nicaragüense (ISN) was a fully grammatical sign language with native signers, and the world's youngest language with a documented birthdate.
Implications: language emerges from communities of children, not from individual invention; grammatical structure accumulates across generations rather than being installed once; the cognitive resources to generate language are present even in environments with no prior linguistic input.
Eric Lenneberg's Biological Foundations of Language (1967) proposed a critical period for first-language acquisition, closing around puberty. The cases that bear on it are tragic and rare.
Genie, the California girl confined in isolation until age thirteen, never acquired full grammar despite years of intensive teaching. Late-acquired ASL signers show systematic grammatical deficits compared to native signers exposed from birth. Children adopted internationally past age seven retain detectable accent and subtler structural traces from the donor language environment, even with decades in the new language.
The data fit a sensitive-period model: the brain is maximally plastic for language acquisition through the early years, with declining ability through adolescence. This is one of the strongest behavioural arguments for an evolved, biologically scheduled language faculty: the species has a critical window for language learning, the way songbirds have a critical window for song learning.
Recursion, the embedding of a structure within a structure of the same kind, is the property the Hauser-Chomsky-Fitch paper proposed as the unique core of the language faculty. A standard example is the centre-embedded sentence "The cat that the dog that the boy saw chased ran": productive nesting of clauses within clauses.
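The nesting in the cat-dog-boy sentence can be made concrete with a small recursive rule. What follows is a minimal sketch, not a model of any linguist's grammar: the function name centre_embed and the toy rule NP -> "the" N ["that" NP V] are assumptions introduced here purely for illustration.

```python
def centre_embed(nouns, verbs):
    """Build a centre-embedded sentence from paired nouns and verbs.

    nouns[i] is the subject of verbs[i]; nouns[0] and verbs[0] form the
    outermost clause. Toy rule (an illustrative assumption):
        NP -> "the" N
        NP -> "the" N "that" NP V
    """
    if not nouns or len(nouns) != len(verbs):
        raise ValueError("need equally many nouns and verbs, at least one each")

    def noun_phrase(i):
        head = f"the {nouns[i]}"
        if i == len(nouns) - 1:
            # Base case: the innermost noun carries no relative clause.
            return head
        # Recursive case: a noun phrase nested inside a noun phrase,
        # closed off by the verb belonging to the embedded subject.
        return f"{head} that {noun_phrase(i + 1)} {verbs[i + 1]}"

    sentence = f"{noun_phrase(0)} {verbs[0]}"
    return sentence[0].upper() + sentence[1:] + "."


print(centre_embed(["cat", "dog", "boy"], ["ran", "chased", "saw"]))
```

The same finite rule accepts lists of any length, which is the sense in which nesting is productive: adding a noun-verb pair adds one more level of embedding without any new rule.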
Daniel Everett's claim that Pirahã, an Amazonian language he lived with for years, lacks recursion was the highest-profile challenge to UG in the 2000s. Pirahã, on Everett's analysis, expresses what other languages embed using parallel sentences and discourse structure — no centre-embedding, no relative clauses in the standard sense.
The fight has been bitter. Whether Pirahã genuinely lacks recursion at the syntactic level (and not just at the surface), and whether one language without recursion would falsify universal recursion, remain open. The deeper point: testing claims about universal grammar against the full diversity of the world's ~7,000 languages is harder than the early UG framework assumed.
Six decades of attempts to teach great apes language. Washoe (chimpanzee, ASL, from 1966); Koko (gorilla, ASL, from 1972); Kanzi (bonobo, lexigram board, from the 1980s); Nim Chimpsky (chimpanzee, ASL, named in deliberate parody, 1973–1977).
The findings, sober and stable: apes can acquire vocabularies of several hundred symbols; they understand spoken English at near-toddler levels (Kanzi's most striking achievement); they can request, refuse, name. They show no convincing evidence of grammatical productivity. Nim's signed utterances were almost entirely imitative repetitions of his trainers; the famous Project Nim re-analysis (Terrace et al. 1979) was devastating.
The honest summary: the conceptual prerequisites for language are partly present in great apes — symbolic reference, social cognition, intentional communication. The grammatical machinery is not. Whatever the language faculty consists in, it is not present in our nearest relatives.
The most useful animal model of vocal learning is not in primates but in songbirds. Zebra finches and Bengalese finches acquire songs from a tutor during a critical period; they show babbling-like subsong; the genes implicated overlap substantially with human speech genes — including FOXP2.
The convergent evolution of vocal learning in songbirds, parrots, hummingbirds, cetaceans, bats, and humans, together with its absence in our nearest primate relatives, is one of the deeper puzzles in the field. The neural circuitry differs in detail but shares a common architecture: a forebrain pathway that projects directly to the brainstem motor neurons driving the vocal organ (the larynx in mammals, the syrinx in birds), present in vocal learners and absent in non-learners.
What humans share with finches and not with chimpanzees: this pathway, the genes that build it, and the developmental capacity to learn vocalisations from a tutor. The implication for language origins is that the speech-motor side of language is grafted onto a relatively old vertebrate system, repurposed.
Tecumseh Fitch's framework distinguishes two requirements for language-readiness. The vocal-learning system — the capacity to imitate sounds and acquire a vocabulary — and the conceptual-syntactic system — the capacity to combine units according to abstract rules.
Vocal learning has evolved in several vertebrate clades but is absent in primates other than humans. Conceptual-syntactic capacity, in some primitive form, is present in primates: chimpanzees can form abstract categories, plan, deceive, infer.
The two systems map onto different parts of the brain (basal ganglia and motor cortex for vocal learning; frontal and temporal cortex for syntax) and have different evolutionary histories. Language as we know it requires both, and the requirement that they be integrated. The integration may be the critical recent step.
Paul Broca's 1861 patient "Tan" — who could understand speech but could only utter a single syllable — localised speech production to a region of the left frontal cortex now bearing Broca's name. Carl Wernicke's 1874 work identified a left posterior temporal region whose damage produced fluent but semantically empty speech.
The Broca-Wernicke localisation was the founding map of language in the brain. Modern imaging has complicated it: language draws on a distributed network across both hemispheres, with the classical regions central but not exclusive. The arcuate fasciculus, the white-matter tract connecting Broca's and Wernicke's areas, is dramatically expanded in humans relative to chimpanzees.
The lateralisation is striking. About 95% of right-handers and 70% of left-handers have left-hemisphere-dominant language. The depth of this asymmetry — present from infancy, with structural correlates in the planum temporale — is one of the strongest neural arguments for an evolved specialisation.
Modern humans left Africa in successive waves from roughly 70,000 years ago, reaching Australia by 65 kya, Europe by 45 kya, the Americas by 16 kya. They carried language with them. By the time the dispersal completed, every continent had spoken languages — and every language was internally complex.
The deep-time linguistic question: was there a single ancestral human language ("proto-Sapiens") spoken before the dispersal, from which all 7,000 modern languages descend? Most historical linguists are sceptical. The comparative method can reconstruct families with reliability up to ~6,000–8,000 years; beyond that, signal degrades into noise. A 70,000-year-old common ancestor is not recoverable by orthodox methods.
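Why the signal fades can be illustrated with a toy decay model. This is a sketch under loud assumptions and not the comparative method itself: it borrows the constant-retention idea from classical glottochronology (itself a contested technique), supposes that each of two sister languages independently keeps a fixed fraction of a core word list per millennium, and uses 0.86 as an illustrative retention rate. The function name and its default value are hypothetical choices made for this example.

```python
def expected_shared_fraction(millennia, retention=0.86):
    """Expected fraction of a core word list still cognate in two sister
    languages after `millennia` thousand years of separation.

    Assumption (illustrative only): each language independently retains
    `retention` of the list per millennium, so a word survives in both
    with probability retention ** (2 * millennia).
    """
    if millennia < 0:
        raise ValueError("millennia must be non-negative")
    if not 0 < retention <= 1:
        raise ValueError("retention must be in (0, 1]")
    return retention ** (2 * millennia)


# Compare a shallow family, the usual reconstruction ceiling, and the
# time depth of the dispersal out of Africa.
for t in (2, 8, 70):
    print(f"{t:>3} millennia: {expected_shared_fraction(t):.2e}")
```

Under these assumptions fewer than one in ten core words would still match after 8,000 years, and effectively none after 70,000, which gives an intuition for why a common ancestor at that depth lies beyond orthodox reconstruction.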
Some have argued (Joseph Greenberg, Merritt Ruhlen, the contested Nostratic and Borean families) that traces of deeper relations survive in mass lexical comparison. Mainstream historical linguistics rejects the methodology. The honest answer: we cannot tell whether language was invented once or many times.
In 1866 the Linguistic Society of Paris famously banned papers on the origin of language. The ban was a reaction to a flood of bow-wow theories ("language began as imitation of natural sounds"), pooh-pooh theories ("language began as exclamations"), yo-he-ho theories ("language began as work-coordination chants"), and ding-dong theories ("language began as resonant sympathy with the world").
The 1866 verdict was that none of these were testable. A century and a half later, with FOXP2, brain imaging, comparative animal cognition, archaeological evidence of symbolic behaviour, and the natural experiment of Nicaragua, the field has more empirical purchase than it did. But the central event — what happened, in which population, when, and why — remains conjectural.
Steven Pinker's The Language Instinct (1994) made the field popular again; the 2002 Hauser-Chomsky-Fitch paper made it scientific again. The honest summary today: language emerged in our lineage, somewhere in the last 500,000 years, and we have a partial map of the cognitive and anatomical changes that made it possible.
Steven Pinker's The Language Instinct (1994) defended a Chomskyan view to a wide audience and revived adaptationist evolutionary thinking about language. The 1990 Pinker-Bloom paper Natural Language and Natural Selection argued that language is a complex adaptation shaped by gradual natural selection — not a side-effect of brain enlargement.
The Pinker-Bloom argument was directed at Chomsky's own scepticism that language could be the product of standard adaptive evolution. (Chomsky's 1972 view, never fully retracted, was that language might be a side-effect of cognitive complexity, no more selected for than ear-wiggling.) Pinker and Bloom said: complex adaptations require selection; language is a complex adaptation; therefore language was shaped by selection.
The Pinker view dominated through the 1990s and remains influential. The 2002 Hauser-Chomsky-Fitch paper partly answered it by distinguishing FLN (potentially non-adaptive, possibly recent) from FLB (clearly adaptive, with deep evolutionary history). The argument continues.
Some Amazonian languages lack precise words for larger numbers: Pirahã has none above two or three, and Mundurukú none above about five. Speakers asked to match arrays of objects perform well at exact small numerosities and at approximate large ones, and badly at exact numerosities beyond the reach of their number words. The key sources are Peter Gordon's 2004 Pirahã work, Pierre Pica's Mundurukú studies (2004), and Stanislas Dehaene's broader research programme.
The implication for the origin question: precise counting requires linguistic scaffolding; without number words, the cognitive capacity for exact large numerosity is absent. Language is not just a tool for expressing pre-existing thought; it enables certain kinds of thought.
This is the strong Whorfian claim restricted to a domain where it is empirically defensible. Language as cognitive technology — what the Soviet psychologist Vygotsky proposed in the 1930s and what Lera Boroditsky's lab has continued to develop. Whether the same holds for non-numerical domains is more contested.
Edward Sapir and Benjamin Whorf's 1930s claim — that the structure of one's language shapes the structure of one's thought — has gone in and out of fashion. The strong version (you can't think what your language can't say) was refuted decades ago. The weak version (your language influences which distinctions are easy and habitual) is empirically supported.
Cases that hold up: colour categorisation (speakers of languages with distinct words for blue and green discriminate more accurately at that boundary than speakers of languages with a single "grue" term; Davidoff's work on Berinmo); spatial reference (Tzeltal and Guugu Yimithirr speakers, who use absolute cardinal frames, perform differently from English speakers on non-linguistic spatial tasks; Stephen Levinson); temporal metaphor (Mandarin speakers' vertical metaphors for time produce measurable differences in time-judgement tasks).
The relevance for origins: if language shapes thought, then evolving language reshaped human cognition. The species that emerged from the language transition is a different cognitive animal from the one that entered it.
Robin Dunbar's Grooming, Gossip, and the Evolution of Language (1996) proposed a social-cohesion origin. Primate groups maintain bonds through grooming — the time-cost limits group size to about 50 in chimpanzees. Human groups are larger (Dunbar's number, ~150). Language allows social bonding at a distance and with multiple partners simultaneously: gossip is grooming at scale.
The argument predicts that early language was for social information — who is with whom, who is reliable, who is the troublemaker — rather than for tool-coordination or hunting. Most surveyed conversation, in modern humans, is in fact about social topics. Dunbar takes this as evidence for the original function.
The grooming hypothesis is testable in part: it predicts a correlation between neocortex size and group size across primates (Dunbar's social brain hypothesis, well-supported), and predicts language emerged when group sizes exceeded the grooming-limit threshold. The chronology fits Homo erectus reasonably well.
What sixty years of research has established with reasonable confidence:
1. Language is a biologically based capacity, with dedicated neural specialisation, a critical-period acquisition window, and clear genetic substrates (FOXP2 and others).
2. The capacity is uniquely human in its full form, though many of its components — vocal learning, symbolic reference, theory of mind — are present in attenuated form in other species.
3. The relevant anatomical adaptations (descended larynx, modern hyoid, expanded arcuate fasciculus) accumulated over the last 500,000 years; the FOXP2-Neanderthal evidence places at least some of the apparatus deep in the genus Homo.
4. Symbolic behaviour at archaeological scale dates to at least 100 kya in Africa, consistent with full language being present in modern humans by their dispersal.
5. Children, given any input — even degraded pidgin or home-sign — generate full grammar within a generation or two. The Nicaraguan case is the cleanest demonstration.
What we do not know: the precise sequence of changes, the relative roles of culture and biology, the depth of any single ancestral language. The honest answer is mixed; the honest answer is also enough.
The field's live questions, circa 2026:
Genetics beyond FOXP2. Genome-wide association studies on dyslexia, specific language impairment, and developmental verbal dyspraxia are turning up loci. CNTNAP2, ROBO1, ATP2C2, KIAA0319, DCDC2 each have small, robust effects. The genetic architecture of language is polygenic; FOXP2 was the first found, not the only one.
Ancient DNA and Neanderthal speech. Whether Neanderthals had language remains contested; the FOXP2 and hyoid evidence is permissive but not decisive. The interbreeding evidence (Pääbo, Reich, Slatkin) — modern non-Africans carry 1–4% Neanderthal DNA — opens questions about cultural-linguistic exchange across the species boundary.
The role of self-domestication. Cieri, Hare, and Wrangham have argued modern humans are domesticated forms of an earlier robust ancestor, selected for prosociality and reduced aggression. The hypothesis predicts language and prosociality co-evolved. Evidence: anatomical feminisation, reduced sexual dimorphism, retained juvenile features.
Computational modelling. Simon Kirby's iterated learning experiments at Edinburgh — humans learning artificial languages from previous-generation outputs — show how compositional structure emerges from transmission alone. Language structure is not all in the genes; some of it self-organises across generations.
Origins of Language — Volume XVI, Deck 1 of The Deck Catalog. Set in Iowan Old Style with Trajan capitals. Bone #f3eddc; ochre and sage accents.
Thirty-two leaves on the question that has no fossils. We end where the field ends — with FOXP2, with Nicaragua, with Blombos, and with the irreducibly speculative core.