From Mendel's peas to a personalised cancer vaccine in seventy years; from a $3 billion genome to a $200 one in twenty.
Genomics is the systematic study of an organism's complete DNA — every base, every gene, every regulatory element — and the relationships between sequence, function, and disease.
The discipline did not exist before 1990. The Human Genome Project produced the first reference sequence in 2003 at a cost of about $2.7 billion. By 2008 the cost per genome had crossed below the Moore's Law line and kept falling. The price now sits near $200, with two-day turnaround, and is on its way to a hundred dollars.
This deck covers the science from Mendel through the next decade's clinical genomics: sequencing technology, the studies that translate sequence into knowledge, and the ethical and political infrastructure that has grown around population-scale DNA data.
Gregor Mendel's pea-plant crosses (Brünn, 1856–63; published 1866) established that inheritance is particulate. The paper was ignored until Hugo de Vries, Carl Correns, and Erich von Tschermak independently rediscovered it in 1900.
James Watson and Francis Crick at Cambridge, drawing on X-ray diffraction work by Rosalind Franklin and Maurice Wilkins at King's College London (including Franklin's Photo 51), published the double-helix structure of DNA in Nature on 25 April 1953, alongside companion papers from the King's groups. The structure carried an obvious copying mechanism within it.
Frederick Sanger's 1977 dideoxy chain-termination method made sequencing routine. Sanger had already won a 1958 Nobel for protein sequencing; the 1980 Nobel (with Walter Gilbert and Paul Berg) was for DNA. Sanger sequencing held the field for nearly thirty years; the first generation of sequencers, including those used by the Human Genome Project, were Sanger machines.
The Human Genome Project was funded by the US Department of Energy and the NIH, with major contributions from the UK Wellcome Trust, France, Germany, Japan, and China. It officially launched on 1 October 1990 with James Watson as its first director (he resigned in 1992 over patents). Francis Collins took over in 1993 and led the project to completion.
Twenty centres across six countries divided the genome by chromosome. The Sanger Centre (Cambridge, UK) under John Sulston contributed the largest single share of the finished sequence, about a third. The strategy was hierarchical shotgun: clone large fragments into bacterial artificial chromosomes, map them, then sequence each.
The working draft was announced jointly with Celera at the White House on 26 June 2000; the draft sequence was published in Nature on 15 February 2001; the finished sequence was announced on 14 April 2003. Total cost: approximately $2.7 billion. Under the Bermuda Principles (1996), the data were placed in the public domain, with new sequence released within 24 hours of generation.
A mathematician by training (Princeton, Oxford), Eric Lander taught economics at Harvard Business School before genomics found him in the late 1980s. He founded the Whitehead/MIT Center for Genome Research in 1990, which became the Broad Institute in 2003 with $100 million from Eli and Edythe Broad.
The Whitehead-MIT centre generated about 30 percent of the public draft sequence, the largest share of draft data from any single institution. Lander was the lead author on the 2001 Nature paper announcing the sequence.
The Broad has become the most productive single genomic-research organisation in the world. It led the HapMap, the 1000 Genomes Project, the Cancer Genome Atlas's mutation calling, and many of the foundational CRISPR papers (with Feng Zhang in residence). Lander served as Joe Biden's science adviser and director of the Office of Science and Technology Policy from 2021 to 2022.
The HGP's outside agitator. Craig Venter, a former NIH researcher and Vietnam veteran, founded Celera Genomics in May 1998 with the explicit aim of sequencing the human genome faster and cheaper than the public consortium — and selling subscription access to the data.
Venter's strategy was whole-genome shotgun: skip the mapping step, fragment the entire genome at once, sequence in massive parallel, and reassemble computationally. Critics — Sulston and Lander chief among them — said it would not work for a genome the size and repetitive complexity of the human's. It did work, helped considerably by access to public consortium data for scaffolding.
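The shotgun idea can be made concrete with a toy example. The sketch below is a hypothetical illustration, not Celera's assembler: it cuts a made-up 30-base sequence into overlapping 10-base reads, shuffles them so their positions are lost, and stitches them back together with a simple greedy overlap merge. The function names (`shotgun_reads`, `overlap`, `greedy_assemble`) and the sequence itself are assumptions chosen for the example.

```python
import random


def shotgun_reads(genome, read_len, step, seed=0):
    """Cut overlapping reads from the genome, then shuffle away their order."""
    reads = [genome[i:i + read_len]
             for i in range(0, len(genome) - read_len + 1, step)]
    random.Random(seed).shuffle(reads)
    return reads


def overlap(a, b, min_len):
    """Length of the longest suffix of a that is also a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of contigs with the longest overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # nothing overlaps any more; leave separate contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Made-up sequence in which every 3-base word is unique (no repeats).
GENOME = "ACGTAGCTTGACCATGGCAATCGGATTCCT"
reads = shotgun_reads(GENOME, read_len=10, step=5)
contigs = greedy_assemble(reads)
print(contigs == [GENOME])
```

The toy sequence was chosen so that every 3-base word occurs only once, which is why the greedy merge cannot go wrong here. Real genomes contain repeats longer than a read, and that is where the critics expected the approach to fail.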
The 26 June 2000 White House announcement was a brokered tie: Collins and Venter, side by side, declared simultaneous "completion" of working drafts. Celera's commercial model collapsed when the public data were free; Venter left in 2002. He has since founded the J. Craig Venter Institute, sequenced his own genome, and built the first bacterial cell controlled by a chemically synthesised genome (Mycoplasma mycoides JCVI-syn1.0, 2010).
The Celera-public competition, ugly as it was politically, accelerated the science by years. The threat of a privately-owned reference sequence forced the public consortium to compress its timeline; the public release of all consortium data forced Celera to publish rather than sell.
The 2001 papers (Lander et al. in Nature and Venter et al. in Science) between them estimated that the human genome contained roughly 30,000–40,000 protein-coding genes, far fewer than the 100,000 most biologists had expected; the finished sequence later cut the estimate to 20,000–25,000. Most of the genome was non-coding; much of what was once called "junk DNA" turned out to have regulatory function.
The reference is now the Telomere-to-Telomere (T2T) consortium's 2022 release, which closed the last 8 percent of gaps left by HGP — the centromeres, the heterochromatin, the long tandem repeats — using long-read sequencing. The first complete, gap-free human genome arrived 19 years after the first "complete" one.
The NHGRI has tracked sequencing cost since 2001. The curve is the steepest sustained price decline in the history of any technology — steeper than the Moore's Law line for transistor density, against which it is conventionally compared.
The first genome cost $2.7 billion. The hundred-thousandth cost about $1,000. The ten-millionth cost a few hundred. The clinical implication is that whole-genome sequencing has been moving from a research procedure to a routine clinical assay since around 2018.
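The comparison with Moore's Law is a piece of arithmetic that is easy to check. The sketch below takes the two prices quoted in the text and assumes a steady exponential decline over twenty years; the 24-month halving period used for Moore's Law is a common statement of it and is an assumption here, as are the function names.

```python
import math


def halving_time_months(start_cost, end_cost, years):
    """Months needed for the cost to halve, assuming steady exponential decline."""
    halvings = math.log2(start_cost / end_cost)
    return years * 12 / halvings


def cost_after(start_cost, years, halving_months):
    """Cost after `years` if it halves every `halving_months` months."""
    return start_cost / 2 ** (years * 12 / halving_months)


FIRST_GENOME = 2.7e9   # dollars, from the text
CURRENT_GENOME = 200   # dollars, from the text
YEARS = 20             # assumed elapsed time
MOORE_HALVING = 24     # months; a common statement of Moore's Law (assumption)

observed = halving_time_months(FIRST_GENOME, CURRENT_GENOME, YEARS)
moore_price = cost_after(FIRST_GENOME, YEARS, MOORE_HALVING)
print(f"observed halving time: {observed:.1f} months")
print(f"price today at a 24-month halving: ${moore_price:,.0f}")
```

On these assumptions the price halved roughly every ten months. A genome tracking a 24-month halving would still cost more than two million dollars today.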
Solexa, a Cambridge UK startup, developed sequencing-by-synthesis on a flow cell from 2004. Bridge amplification clones each fragment in a tight spot, then fluorescently-labelled bases are added one at a time and imaged. Hundreds of millions of reads per run, in parallel.
Illumina acquired Solexa for $600 million in January 2007. The HiSeq 2000 (2010) brought a per-run yield of 200 gigabases. The HiSeq X Ten (2014, $10 million for ten machines) crossed the $1,000-genome line. The NovaSeq X series (2023) delivers up to 16 terabases per run on the X Plus, supporting clinical-scale population sequencing.
For most of the 2010s, Illumina commanded around 70 percent of global sequencing-instrument revenue. The competitor that mattered, eventually, was China's MGI Tech (BGI Group), whose DNBSEQ platform reached price parity in the early 2020s and broke Illumina's monopoly in some markets.
Illumina reads are short — typically 150 bases. They reassemble well in unique regions but miss large structural variants and cannot bridge long repeats. Two long-read technologies emerged.
PacBio (Pacific Biosciences, founded 2004): single-molecule real-time (SMRT) sequencing in zero-mode waveguides; reads of 10–20 kilobases routinely, with circular consensus accuracy approaching Illumina's. Founder Stephen Turner; commercialised 2011; the Revio (2023) brought the cost of a human genome into a clinically reasonable range.
Oxford Nanopore (founded 2005, spun out from the University of Oxford). DNA threaded through a protein nanopore; bases identified by changes in ionic current. The MinION sequencer (2014) is the size of a USB stick and runs from a laptop. Reads exceed 100 kilobases routinely; the longest single read on record is over 4 megabases.
The 2022 T2T-CHM13 reference sequence, the first complete human genome, was made possible by combining PacBio HiFi and Oxford Nanopore ultra-long reads. Long reads are now standard for de novo assembly, structural-variant calling, and methylation profiling.
A genome-wide association study compares the frequency of single-nucleotide polymorphisms (SNPs) in cases and controls. The first major one was the Wellcome Trust Case Control Consortium (2007), which examined 14,000 cases of seven common diseases against 3,000 controls.
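The case-control comparison comes down to one small statistical test repeated at every SNP. The sketch below shows one common form of it, an allelic chi-square test on a 2x2 table of allele counts. The counts are hypothetical, and the 5e-8 threshold is the conventional genome-wide significance level, which the text does not itself state.

```python
import math


def allelic_association(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Chi-square test (1 degree of freedom) on a 2x2 table of allele counts."""
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
    odds_ratio = (a * d) / (b * c)
    return chi2, p_value, odds_ratio


# Hypothetical counts: 2,000 cases and 3,000 controls, two alleles each.
chi2, p, odds = allelic_association(case_alt=1200, case_ref=2800,
                                    ctrl_alt=1500, ctrl_ref=4500)
GENOME_WIDE = 5e-8  # conventional significance threshold (assumption here)
print(f"chi2={chi2:.1f}  p={p:.1e}  OR={odds:.2f}  significant={p < GENOME_WIDE}")
```

The odds ratio here is below 1.3, yet it takes thousands of samples to clear the threshold. That shows why weak-effect variants need very large cohorts.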
The early years produced thin gruel — a handful of weak-effect variants for each disease, explaining only a few percent of heritability. The "missing heritability" problem became a literature of its own.
The fix turned out to be scale. Biobank-linked GWAS — UK Biobank's 500,000 participants, FinnGen's 500,000, Million Veteran Program's 1 million, All of Us's million — produced thousands of robust associations. Modern psychiatric GWAS (schizophrenia: 287 loci as of 2022) and cardiovascular GWAS (coronary artery disease: 250+ loci) have transformed the genetic architecture of common disease from a black box into a map.
23andMe (founded 2006 by Anne Wojcicki, Linda Avey, and Paul Cusenza) launched at $999 with what was, on day one, the largest genotyping array offered to consumers. It dropped to $99 in 2012 and 2 million customers had been tested by 2017. The company went public via SPAC in 2021 at $3.5 billion.
The FDA halted 23andMe's health reports in November 2013 over validation concerns; reports were reinstated incrementally from 2015 to 2017. The trajectory since: ancestry-only product, then carefully-validated single-variant disease reports, then polygenic risk scores.
The 2023 data breach exposed the genetic data of about 6.9 million users. The company filed for Chapter 11 bankruptcy in March 2025; the data of its roughly 15 million customers became, in legal terms, an asset of the bankruptcy estate. The episode crystallised two long-standing critiques: that 23andMe's business model depended on monetising customer data through pharma partnerships (the GSK deal, 2018), and that the regulatory regime for consumer genomic data has never caught up with the technology.
The clinical translation of GWAS. A polygenic risk score (PRS) sums the effects of thousands or millions of SNPs, weighted by their GWAS effect sizes, to produce a single number predicting disease risk for an individual.
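The weighted sum described above is short enough to write out. This is a minimal sketch with hypothetical SNP names, effect sizes, and genotypes; real scores involve many more variants and are usually standardised against a reference population before use.

```python
def polygenic_risk_score(dosages, weights):
    """Sum of effect-allele dosages (0, 1 or 2) weighted by GWAS effect sizes.

    SNPs with no genotype for this person are skipped (counted as zero).
    """
    return sum(beta * dosages.get(snp, 0) for snp, beta in weights.items())


# Hypothetical per-SNP effect sizes (log odds ratios) and one person's genotypes.
weights = {"snp_a": 0.12, "snp_b": -0.05, "snp_c": 0.30}
person = {"snp_a": 2, "snp_b": 1, "snp_c": 0}

score = polygenic_risk_score(person, weights)
print(round(score, 2))
```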
For coronary artery disease, breast cancer, type 2 diabetes, and several other cancers, PRS adds clinically meaningful information beyond traditional risk factors. The 2018 paper by Amit Khera and Sekar Kathiresan, which used UK Biobank to identify the top 8 percent of CAD-risk individuals at threefold-elevated risk, was the inflection point in clinical credibility.
The unsolved problem is portability. PRS derived from predominantly European-ancestry GWAS data perform substantially worse in African, East Asian, and South Asian individuals. The 2019 Martin et al. paper in Nature Genetics quantified the gap; subsequent biobank investment, including the Million Veteran Program (roughly 30 percent non-European), All of Us (50 percent target), and the African genomics initiatives, has begun to close it.
Drugs work differently in different genomes. Codeine is converted to morphine by the enzyme CYP2D6; carriers of certain variants are ultra-rapid metabolisers and can overdose at standard doses (the FDA black-boxed it for paediatric use after several deaths). Clopidogrel requires CYP2C19 activation; poor metabolisers get little or no antiplatelet effect.
The HLA-B*57:01 allele predicts severe hypersensitivity to abacavir; pre-prescription testing has been standard of care since 2008. HLA-B*15:02 predicts Stevens-Johnson syndrome from carbamazepine in Han Chinese populations; testing is mandatory in Hong Kong and Taiwan.
Two resources, CPIC (the Clinical Pharmacogenetics Implementation Consortium) and the PharmGKB knowledge base, maintain and curate the prescribing guidelines. The FDA's Table of Pharmacogenomic Biomarkers in Drug Labelling now lists more than 460 drugs with relevant variants. Pre-emptive panel testing for 50–100 actionable pharmacogenes is now offered routinely at major academic medical centres in the US and Europe.
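In software terms, pre-emptive testing amounts to a lookup performed at the moment of prescribing. The sketch below encodes only the four gene-drug pairs named in the text. The table layout and the `flags_for` function are hypothetical, and the entries illustrate the idea only; they are not prescribing guidance.

```python
# Gene-drug pairs taken from the examples in the text. Illustrative only.
PGX_FLAGS = {
    ("CYP2D6", "ultra-rapid metaboliser"):
        ("codeine", "risk of overdose at standard doses"),
    ("CYP2C19", "poor metaboliser"):
        ("clopidogrel", "little or no antiplatelet effect"),
    ("HLA-B*57:01", "carrier"):
        ("abacavir", "risk of severe hypersensitivity"),
    ("HLA-B*15:02", "carrier"):
        ("carbamazepine", "risk of Stevens-Johnson syndrome"),
}


def flags_for(drug, patient_results):
    """Return warnings for `drug` given a patient's (gene, result) pairs."""
    warnings = []
    for gene, result in patient_results:
        entry = PGX_FLAGS.get((gene, result))
        if entry and entry[0] == drug:
            warnings.append(f"{gene} {result}: {entry[1]}")
    return warnings


# Hypothetical patient with two stored pharmacogenomic results.
patient = [("CYP2D6", "ultra-rapid metaboliser"), ("HLA-B*57:01", "carrier")]
print(flags_for("codeine", patient))
print(flags_for("carbamazepine", patient))
```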
Tumour sequencing has been the genomic discipline's most consequential clinical application. The Cancer Genome Atlas (2006–2018) characterised the somatic mutations in 33 cancer types across 11,000 tumours.
BRCA1/BRCA2 testing: Mary-Claire King's 1990 mapping of BRCA1 to chromosome 17q21 led to Myriad Genetics' commercial test in 1996. The 2013 Supreme Court decision in Association for Molecular Pathology v. Myriad Genetics held that naturally occurring human gene sequences cannot be patented; the test market opened. Today, BRCA testing guides surgical decisions, PARP inhibitor prescribing, and family screening.
FoundationOne CDx (Foundation Medicine, FDA-approved 2017) sequences 324 tumour genes and provides a clinical report on actionable variants and tumour mutational burden. Tempus and Caris are the major US competitors. Comprehensive tumour sequencing is now standard of care in most US academic oncology centres for advanced cancers and in trials for earlier-stage disease.
Robert Guthrie's bacterial-inhibition assay for phenylketonuria (1962) is the origin of population-scale newborn screening. The dried blood spot (the Guthrie card) has been collected from nearly every baby born in the United States and most of the developed world since the 1960s.
The Recommended Uniform Screening Panel in the US includes 38 conditions. The UK's NHS panel has nine. Screening detects rare metabolic and genetic disorders — phenylketonuria, congenital hypothyroidism, sickle cell disease, cystic fibrosis, severe combined immunodeficiency — early enough for treatment.
The BabySeq study at Brigham and Women's Hospital (started 2015) and NC NEXUS at UNC have begun adding whole-exome or whole-genome sequencing to newborn screening as research. The Generation Study in the UK, launching 2024, will sequence 100,000 newborns. The clinical, ethical, and social policy questions — what to screen for, what to disclose, what to do about adult-onset variants found in babies — are unresolved.
The protein-coding portion of the genome — the exome — accounts for about 1.5 percent of total DNA but contains an estimated 85 percent of known disease-causing variants. Sequencing only the exome (whole-exome sequencing, WES) was, for most of the 2010s, the cost-efficient choice for clinical diagnostics.
The trade-offs as of 2025: WES is roughly half the cost of WGS but misses non-coding variants and most structural variants and copy-number changes. WGS has more uniform coverage across the genome and can detect all major variant classes; the analytic burden is larger.
The diagnostic yield of WES for paediatric rare disease is around 30 percent. WGS adds 5–10 percentage points beyond WES. As cost converges, WGS is replacing WES at most academic centres. The UK Genomic Medicine Service has used WGS first-line for rare disease since 2020.
The 100,000 Genomes Project was announced by David Cameron in December 2012 and delivered by Genomics England (a Department of Health-owned company) under chief scientist Mark Caulfield. The aim: sequence 100,000 whole genomes from NHS patients with rare disease or cancer, by 2018. The target was reached in December 2018.
The project produced two operational achievements that mattered more than the sequence itself. First, an NHS Genomic Medicine Service launched in October 2018 — the first national health system to offer whole-genome sequencing as routine clinical care. Seven Genomic Laboratory Hubs handle the workflow.
Second, a secure research environment: the de-identified data sit in a single data centre that researchers query without ever downloading individual records. The architecture has become the template for analogous projects (FinnGen, Estonian Biobank, the planned Our Future Health 5-million-Briton biobank).
Ancestry.com's DNA service launched in 2012. By 2019 it had sold over 15 million kits — the largest consumer genomic database in the world. AncestryDNA, 23andMe, MyHeritage, and FamilyTreeDNA together held the genotypes of an estimated 30+ million people by 2020.
The science is mostly reliable: the population reference panels are large enough that broad regional ancestry inference (sub-Saharan African, European, East Asian, South Asian, Native American) is robust. Sub-regional resolution within Europe is decent; within Africa it remains poor due to under-sampling. Ethnicity estimates change as reference panels expand — the 2020 update to AncestryDNA's reference reshuffled millions of users' assigned heritage.
The cultural impact has been significant. Family secrets — adopted children, undisclosed paternity, NPEs ("non-paternity events") — have become statistically common discoveries among test-takers, with downstream consequences for families and for the testing companies' customer-service operations.
Genomic data is uniquely sticky: it cannot be changed, it identifies relatives, it persists for centuries in storage. The privacy risks are unlike those of other health data.
Identifiability. A 2013 paper by Gymrek et al. demonstrated that a surname could be inferred from Y-chromosome data and a public surname-Y-haplotype database; combined with publicly known age and state, the surname inference re-identified nearly 50 supposedly anonymous research participants and relatives whose genomes were in public 1000 Genomes-linked datasets.
Inference about relatives. Your genome reveals roughly 50 percent of each parent's, 25 percent of each grandparent's, 12.5 percent of each first cousin's. A consenting individual's genomic data exposes non-consenting relatives.
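The percentages follow from a simple rule: each parent-child link halves the expected shared DNA, and contributions through different common ancestors add up. A minimal sketch of that rule is below; the function name is an assumption, and the results are expected averages, since actual sharing varies around them for every relationship except parent and child.

```python
def relatedness(path_lengths):
    """Expected fraction of DNA shared between two relatives.

    `path_lengths` holds, for each line of descent connecting them, the
    number of parent-child links on that line.
    """
    return sum(0.5 ** links for links in path_lengths)


parent = relatedness([1])           # one link
grandparent = relatedness([2])      # two links
first_cousin = relatedness([4, 4])  # four links via each shared grandparent
full_sibling = relatedness([2, 2])  # two links via each parent

print(parent, grandparent, first_cousin, full_sibling)
```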
Storage. Genomic data sits in commercial databases (23andMe, Ancestry), public ones (dbGaP, EGA), national ones (Genomics England, FinnGen), and private clouds. Each has its own security regime; breaches happen.
The Genetic Information Nondiscrimination Act was signed by George W. Bush on 21 May 2008, after thirteen years of advocacy led by Louise Slaughter (D-NY). It prohibits health insurers from using genetic information to set premiums, deny coverage, or decide eligibility, and prohibits employers from using it in hiring, firing, or promotion.
The gaps. GINA does not cover life insurance, disability insurance, or long-term-care insurance. It does not cover the military. It does not apply to small employers (under 15 employees). The Affordable Care Act's prohibition on pre-existing-condition discrimination, since 2014, fills the health-insurance gap independently.
The European framework is the GDPR (2018), which classifies genetic data as a "special category" requiring explicit consent. The UK's Concordat between the government and the Association of British Insurers (since 2001, renewed periodically) prohibits insurers from using predictive genetic test results except for Huntington's disease above a £500,000 policy threshold.
On 24 April 2018, Sacramento authorities arrested Joseph DeAngelo, a 72-year-old former police officer, for the Golden State Killer rapes and murders of 1976–86. Investigators identified him by uploading crime-scene DNA to GEDmatch, a free genealogy database, finding distant cousins of the unknown perpetrator, and reconstructing the family tree until DeAngelo was the only candidate matching the geographic and demographic constraints.
The technique is now standard. The genetic genealogist CeCe Moore and her team at Parabon NanoLabs have helped solve over 250 cases. The arrest of Bryan Kohberger for the 2022 University of Idaho murders was a forensic-genealogy case.
The legal and ethical framework is unsettled. GEDmatch changed its default to opt-in for law enforcement queries in 2019; FamilyTreeDNA confirmed it had been allowing FBI access. Most state legislatures have not addressed the issue directly. The DOJ issued advisory guidelines (2019) but no statute governs.
The 1000 Genomes Project (2008–2015) catalogued common variation across 26 populations. The Genome Aggregation Database (gnomAD, Konrad Karczewski et al., Broad), in its v4 release, contains 730,947 exomes and 76,215 genomes — the variation reference for clinical genetics.
Major underrepresentation persists. As of 2021, 78 percent of GWAS participants were of European descent, against ~16 percent of the global population. Investment in non-European biobanks is correcting it. H3Africa (Human Heredity and Health in Africa, NIH and Wellcome, 2010) supports 51 projects across 30 African countries. GenomeAsia 100K (Singapore-led) is sequencing across South, Southeast, and East Asian populations. BioBank Japan, China Kadoorie Biobank, and Mexico City Prospective Study add hundreds of thousands of non-European participants each.
The findings have been substantive. Population-specific risk variants for type 2 diabetes (Pima, Han Chinese), for hypertension-attributed kidney disease (Black populations: APOL1 nephropathy variants), and for adverse drug response (HLA alleles) are largely invisible to European-only studies.
Cells release fragments of DNA into the bloodstream; tumours release more. Cell-free DNA (cfDNA) sequencing — "liquid biopsy" — has become a clinical tool over the past decade.
Non-invasive prenatal testing (NIPT), introduced clinically in 2011, sequences foetal cfDNA from maternal blood to screen for trisomies 13, 18, and 21. It has largely displaced serum screening as the first-line aneuploidy screen and sharply reduced the need for diagnostic amniocentesis; uptake exceeds 50 percent of US pregnancies.
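One common way to turn cfDNA reads into a trisomy call is to count the share of reads that map to the chromosome of interest and compare it with a reference set of unaffected pregnancies. The sketch below illustrates that counting approach under stated assumptions: the reference fractions, the 10 percent foetal fraction, and the z > 3 cut-off are hypothetical values for illustration, not figures from the text or any specific commercial test.

```python
import statistics


def expected_fraction(baseline, fetal_fraction):
    """Expected chr21 read share if the foetus carries a third copy.

    Only the foetal part of the cfDNA gains the extra half-dose.
    """
    return baseline * (1 + fetal_fraction / 2)


def z_score(sample_fraction, reference_fractions):
    """How many standard deviations the sample sits from the reference mean."""
    mean = statistics.mean(reference_fractions)
    sd = statistics.stdev(reference_fractions)
    return (sample_fraction - mean) / sd


# Hypothetical chr21 read shares from ten unaffected pregnancies.
REFERENCE = [0.0130, 0.0131, 0.0129, 0.0130, 0.0132,
             0.0128, 0.0130, 0.0131, 0.0129, 0.0130]
CUTOFF = 3.0  # assumed flagging threshold

trisomy_sample = expected_fraction(0.0130, fetal_fraction=0.10)
z_trisomy = z_score(trisomy_sample, REFERENCE)
z_unaffected = z_score(0.0131, REFERENCE)
print(f"trisomy-like sample: z={z_trisomy:.1f}, flagged={z_trisomy > CUTOFF}")
print(f"unaffected sample:   z={z_unaffected:.1f}, flagged={z_unaffected > CUTOFF}")
```

The signal is a shift of only a few percent in one chromosome's read share, so the method needs millions of reads and a tight reference distribution.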
Tumour cfDNA for treatment selection: Guardant360 (FDA-approved 2020) and competitors offer 70-gene tumour-mutation panels from a blood draw. Useful when tissue biopsy is impossible.
Multi-cancer early detection: the holy grail. GRAIL's Galleri test (commercial 2021) screens for 50+ cancer types from cfDNA methylation patterns. The NHS-Galleri trial, enrolling 140,000 UK participants, will read out in 2026 — the first major test of population-scale liquid-biopsy screening.
Genomics identifies the target; gene therapy delivers the fix. The two disciplines have converged in the 2020s.
Spinraza (nusinersen, an antisense oligonucleotide, 2016) for spinal muscular atrophy. Luxturna (voretigene neparvovec, 2017) for RPE65-mediated retinal dystrophy, the first FDA-approved AAV gene therapy. Zolgensma (onasemnogene, 2019) for SMA, at $2.1 million per dose. Hemgenix (etranacogene, 2022) for haemophilia B at $3.5 million. Casgevy and Lyfgenia (December 2023) for sickle cell disease; Casgevy is the first CRISPR-edited therapy on the market, while Lyfgenia uses a lentiviral vector.
The pricing is the unresolved question. Curative therapies for diseases of small populations price at million-dollar levels because the development cost cannot be amortised across many patients. Insurers, governments, and the manufacturers are negotiating outcome-based payment models in real time. The cost of not treating sickle cell — lifetime medical costs estimated at $1–2 million per patient — clarifies the calculation.
The Human Genome Project allocated 5 percent of its budget to studying the ethical, legal, and social implications of its own work. The ELSI programme, launched 1990, was the first of its kind in any large science project. Its questions remain open.
Incidental findings. A genome sequenced for one indication often contains a clinically actionable finding for another, such as a BRCA mutation found while sequencing for cardiomyopathy. The ACMG's secondary-findings list (81 genes as of version 3.2 in 2023, periodically expanded) is the consensus on what should be returned.
Reproductive genetics. Pre-implantation genetic testing for monogenic disease has been routine for two decades. The 2023 expansion to polygenic embryo screening (Genomic Prediction, Orchid) has reopened the eugenics conversation in earnest.
Germline editing. The 2018 He Jiankui CRISPR babies were a regulatory and ethical disaster; the international moratorium that followed has held. The 2023 Third International Summit on Human Genome Editing reaffirmed the consensus that heritable editing is not yet acceptable.
Mukherjee's The Gene (2016) is the literate one-volume history. Sulston's The Common Thread (2002) is the public-consortium memoir. For a primer on the post-HGP science, Euan Ashley's The Genome Odyssey (2021).
Pangenomes. A single linear reference sequence misrepresents a species whose individual genomes differ at millions of sites. The Human Pangenome Reference Consortium released a draft pangenome (47 phased assemblies from a globally diverse panel) in May 2023; the target is 350 by 2026. The pangenome is graph-shaped, not linear, and requires new alignment and variant-calling algorithms.
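The difference between a linear reference and a graph can be shown with a very small example. The sketch below is a simplified, hypothetical data structure, not the consortium's format: nodes hold short sequences, each haplotype is a walk through the nodes, and a site of variation shows up as nodes that only some walks visit.

```python
# Node ids map to short sequences; each haplotype is a walk through the nodes.
NODES = {1: "ACGT", 2: "A", 3: "G", 4: "TTCA"}
PATHS = {
    "haplotype_1": [1, 2, 4],
    "haplotype_2": [1, 3, 4],
}


def spell(path):
    """Reconstruct a haplotype's linear sequence from its walk."""
    return "".join(NODES[node] for node in path)


def variable_nodes(paths):
    """Nodes visited by some haplotypes but not all: the sites of variation."""
    node_sets = [set(path) for path in paths.values()]
    shared = set.intersection(*node_sets)
    return set.union(*node_sets) - shared


for name, path in PATHS.items():
    print(name, spell(path))
print(sorted(variable_nodes(PATHS)))
```

A read carrying either allele aligns to the graph without a mismatch, whereas a linear reference would have to treat one of the two as the exception.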
Spatial and single-cell genomics. 10x Genomics' Visium and the next generation of subcellular-resolution platforms (MERFISH, Stereo-seq, Slide-seq) are mapping gene expression across tissue at near-cellular resolution. The Human Cell Atlas project aims to characterise every cell type in the human body — somewhere between several thousand and tens of thousands.
Long-read clinical. Oxford Nanopore's promise of a $200, 24-hour clinical-grade whole genome at the bedside is approaching delivery. The implications for emergency-department genetics and for global access — sequencing in regions without cold chains or refrigerated transport — are large.
Genomics has not delivered the medical revolution that was advertised in 2000. Common diseases turn out to be more polygenic, more environmentally modulated, and more refractory to gene-by-gene mechanism than the early enthusiasm suggested. Cancer was the easy case; cardiovascular disease, psychiatric disease, and metabolic disease are harder.
What it has delivered is substantial. Tens of thousands of rare-disease patients now receive a molecular diagnosis. Pharmacogenomic prescribing prevents predictable adverse events. Tumour profiling guides targeted therapy. Carrier screening and prenatal testing are routine. Forensic genetic genealogy has solved hundreds of cold cases. Population biobanks are restructuring epidemiology.
The cost trajectory is the underlying point. Sequencing has fallen from $3 billion to $200. The implication is not that everything in medicine is a sequencing problem, but that sequencing is no longer a constraint. The constraint, increasingly, is the data: how to interpret it, how to share it, how to protect it, how to make it useful in clinical workflows that have not changed nearly as fast as the data has accumulated.
Genomics — Volume VI, Health, of The Deck Catalog. Set in Inter and Tiempos Text with JetBrains Mono for sequence and apparatus. Paper at #fafaf6; ink in deep navy; accents in helix-cyan and warm orange.
Thirty-one leaves on the discipline that turns the longest molecule in the body into a clinical document.