From Bacon's induction to Popper's falsification, Kuhn's revolutions, and the Bayesian present. The discipline that asks what makes a claim scientific — and why the answer keeps changing.
A second-order discipline. Philosophy of science does not do science; it studies what scientists do, what they should do, and why we are entitled to believe them.
The questions are old and unfinished. What distinguishes a scientific theory from a pseudoscientific one? When does evidence justify belief? Are scientific theories descriptions of the world or convenient fictions? Do scientific revolutions amount to progress, or only to change?
Four hundred years of attempts to give clean answers have produced something less tidy: a series of frameworks, each illuminating, none final. This deck moves through that sequence — Bacon and Hume, the logical positivists and Popper, Kuhn and his successors, the science wars, and the contemporary Bayesian and naturalist turns.
Francis Bacon's Novum Organum (1620) replaced Aristotle's syllogistic Organon with a programme for systematic empirical inquiry. Knowledge was to be built upward from carefully collected observations rather than deduced downward from received first principles.
Bacon's method asked the natural philosopher to compile tables of presence, absence, and degree — every case where a phenomenon occurred, every case where it did not, every variation in intensity — and to read causes off the patterns. The idols he warned against (of the Tribe, the Cave, the Marketplace, the Theatre) were the cognitive and social biases that corrupted unaided observation.
The Royal Society of London (founded 1660) took Bacon as its patron saint. Its motto, Nullius in verba — "take nobody's word for it" — translates Bacon into institutional practice. The modern conception of science as cumulative, public, empirical inquiry begins here.
David Hume's An Enquiry Concerning Human Understanding (1748) raised a question that Bacon had not seen. We infer that the future will resemble the past — that the sun will rise tomorrow because it rose today — but the inference cannot be deductively justified, because nothing in past observations entails anything about the future.
Nor can it be inductively justified without circularity: any argument that induction has worked before, and so will work again, presupposes the very principle in question.
The result, Hume argued, is that our confidence in nature's uniformity is a habit of mind, not a logical conclusion. Custom, not reason, supplies the link.
The problem of induction has not been solved. Every subsequent philosophy of science is partly a response to it — an attempt to ground scientific inference on something other than the bare appeal to past success.
Auguste Comte's Cours de philosophie positive (1830–1842) coined "positivism" and proposed a stadial history of human thought: theological, metaphysical, positive. In the positive stage, knowledge consists of laws describing observable regularities; explanations referring to unobservable essences or final causes are abandoned.
Comte ranked the sciences in order of increasing complexity — mathematics, astronomy, physics, chemistry, biology, sociology — each resting on those that precede it in the series. Sociology, his coinage, would complete the system.
The programme was ambitious and authoritarian; Comte's later "Religion of Humanity," with its calendar of secular saints, embarrassed even sympathetic readers. But the core thesis — that meaningful claims must be tied to observation, that metaphysics is to be replaced by science — set the agenda for a century. The Vienna Circle would inherit it directly.
From 1924 the philosopher Moritz Schlick chaired a Thursday-evening seminar in Vienna whose participants — Rudolf Carnap, Otto Neurath, Hans Hahn, Friedrich Waismann, Herbert Feigl, the young Kurt Gödel — became the Vienna Circle. The 1929 manifesto Wissenschaftliche Weltauffassung ("The Scientific Conception of the World") announced the programme.
The Circle drew on three sources: the empiricism of Hume and Mach; the new mathematical logic of Frege, Russell, and the early Wittgenstein; and the physics of Einstein, whose general relativity was taken as the model of how a mature science should look.
The political climate ended the Circle. Schlick was murdered on the steps of the University of Vienna by a former student in 1936. The rise of Nazism scattered the rest. Carnap, Feigl, and others emigrated to the United States, where logical positivism became the dominant English-language philosophy of science through the 1950s.
Two doctrines defined the movement. The first was the verification principle: a sentence is meaningful if and only if it is either analytic (true by definition) or empirically verifiable. Metaphysics, theology, and large parts of traditional philosophy were not false — they were strictly meaningless.
The second was the programme of logical reconstruction. Carnap's Der logische Aufbau der Welt (1928) attempted to derive the language of physical objects from a base of immediate sensory experience using only the logical apparatus of Principia Mathematica. The aspiration was to show that scientific knowledge could in principle be cashed out in observation reports plus logic.
A. J. Ayer's Language, Truth and Logic (1936) gave the doctrine a punchy English statement and made positivism a sensation in Oxford common rooms.
The verification principle had a self-application problem. Was the principle itself analytic? Plainly not. Was it empirically verifiable? Plainly not. By its own standard, the principle was meaningless.
Other difficulties accumulated. Universal generalisations — "all copper conducts electricity" — could never be conclusively verified by any finite set of observations. Existential statements about the unobservable past or about regions beyond the light cone could not be verified at all. Successive weakenings of the principle (confirmability, testability, partial verification) lost the original sharpness without recovering plausibility.
By the late 1950s logical positivism was widely regarded as a programme that had failed on its own terms. Carnap continued to refine the formal apparatus through the 1960s, but the centre of gravity had moved. The successor question was no longer "what makes a sentence meaningful?" but "what makes a theory scientific?"
Karl Popper's Logik der Forschung (1934, English as The Logic of Scientific Discovery, 1959) cut the knot differently. Verification was the wrong target. The mark of a scientific theory was not that it could be confirmed — every theory could find some confirming instance — but that it could be falsified.
"All swans are white" cannot be conclusively verified by any number of white-swan sightings; it can be conclusively refuted by a single black swan. The asymmetry between verification and falsification, Popper argued, was the logical basis of empirical science.
A scientific theory must therefore make risky predictions — claims that forbid certain observations. The theory's content is measured by what it rules out. Einstein's general relativity, predicting a specific value for the bending of starlight at the 1919 eclipse, was the model: a definite, testable, falsifiable claim.
The asymmetry has a clean logical statement. From "all F are G" together with "a is F" and "a is not G," the universal claim is refuted by modus tollens. From the universal claim plus any number of confirming instances, the universal claim is not entailed. Refutation is deductively valid; confirmation is at best inductive.
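In a standard first-order schematization (the notation is illustrative, not Popper's own), the two situations look like this:

$$Fa,\ \neg Ga \ \vdash\ \neg\,\forall x\,(Fx \rightarrow Gx)$$

$$Fa_1 \wedge Ga_1,\ Fa_2 \wedge Ga_2,\ \dots,\ Fa_n \wedge Ga_n \ \nvdash\ \forall x\,(Fx \rightarrow Gx)$$

A single counterexample deductively entails that the universal claim is false; no finite stock of positive instances entails that it is true.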
Popper drew the moral broadly. Theories are conjectures. We never have proof; we have, at best, theories that have survived serious attempts to refute them. A surviving theory is corroborated, not confirmed; corroboration measures past performance, not future likelihood.
The view was austere. It denied that evidence could ever make a theory probable. Critics — Carnap, later Bayesians — found this implausible: surely a thousand successful tests should make us more confident than one. The Popper–Carnap dispute over the role of probability in science ran for decades and is in some ways still open.
Popper's most influential application of the criterion was a critique. Marxism, Freudian psychoanalysis, and Adlerian individual psychology presented themselves as scientific theories. Their proponents found confirming instances everywhere. Popper argued this was their weakness, not their strength.
A theory that explained with equal ease both a man's pushing a child into the water to drown it and a man's sacrificing his life to save one (feelings of inferiority in each case, for Adler; repression and sublimation, for Freud) ruled nothing out. Marxist predictions of capitalist collapse, when they failed, were rescued by ad hoc auxiliary hypotheses. Both theories absorbed every possible observation.
This was not a knockdown argument that the theories were false. It was an argument that, in the form their defenders practised them, they were not playing the scientific game at all.
The critique stung. It is one of the most-cited demarcation arguments in the literature. It is also more nuanced than its slogans: Popper allowed that an unfalsifiable theory might be metaphysically interesting or generative of testable successors.
Thomas Kuhn's The Structure of Scientific Revolutions (1962) rivals Popper's Logic of Scientific Discovery as the most influential book in the field. A short volume — under 200 pages — written for the International Encyclopedia of Unified Science, it inverted the standard picture of scientific progress.
Kuhn was a physicist by training who turned to history. Reading Aristotle's physics, he was struck not by its errors but by its internal coherence — the sense that, given Aristotle's questions, his answers were reasonable. Modern physics had not corrected Aristotle so much as replaced him; the questions had changed.
From this came the book's central proposal. Science alternates between long periods of normal science, conducted within an accepted framework, and brief crises in which one framework is replaced by another. The replacements are scientific revolutions. Newton replacing Aristotle, Lavoisier replacing phlogiston, Einstein replacing Newton, Darwin replacing fixed species: each was a wholesale shift, not a cumulative refinement.
Kuhn's central term was paradigm — used, his critic Margaret Masterman counted, in twenty-one different senses in the book. The two main uses: a paradigm as a shared disciplinary matrix (assumptions, methods, standards) and as an exemplar (a concrete problem-solution that students learn to imitate).
Normal science is puzzle-solving within a paradigm. Most working scientists, Kuhn argued, do not test the paradigm; they apply it, extend it, and clean up the loose ends it generates. Anomalies accumulate. A few become acute.
When anomalies cannot be absorbed, a discipline enters crisis. Multiple competing alternatives appear. Eventually one prevails — not because it is proven, but because younger scientists adopt it, the older generation retires, and the field reorganises around the new exemplars. Max Planck's quip — that science advances funeral by funeral — is the underlying picture.
The hardest of Kuhn's claims. Across a paradigm shift, the meanings of central terms change. "Mass" in Newtonian mechanics and "mass" in relativistic mechanics are not the same concept; the words have different conditions of application, different connections to other terms, different roles in laws.
If meanings change, then sentences from one paradigm cannot be straightforwardly translated into sentences of another. Old and new theories are incommensurable — they lack a common measure. The implication some readers drew: there is no theory-neutral standpoint from which to judge between paradigms; rational comparison is impossible.
Kuhn always insisted he meant something weaker — that translation was difficult, not impossible; that comparison required interpretive labour, not that it was illegitimate. The strong reading nevertheless took on a life of its own and fed into the relativist readings of the 1970s and 1980s.
Structure sold over a million copies in English and was translated into more than thirty languages. Outside the philosophy-of-science seminar, "paradigm shift" became a piece of general intellectual furniture, used loosely to mean any large change in thinking.
Inside the field, the reception was mixed. Popper's circle disliked the apparent abandonment of rational comparison between theories. Imre Lakatos tried to combine Popper's normative ambitions with Kuhn's fidelity to the actual history of science. Paul Feyerabend took Kuhn further toward methodological pluralism than Kuhn himself wanted to go.
The 1969 second edition added a postscript clarifying — and partially walking back — the strong incommensurability thesis. Kuhn spent the rest of his career insisting he was less radical than his readers had taken him to be. The original book remained more influential than its corrections.
Imre Lakatos's "Falsification and the Methodology of Scientific Research Programmes" (1970) tried to rescue Popperian rationality without ignoring Kuhnian history. Lakatos's unit of evaluation was not the single theory but the research programme.
A programme has a hard core of central commitments held immune from refutation by methodological decision; a protective belt of auxiliary hypotheses that absorb anomalies; and a positive heuristic directing how to extend the programme. A programme is progressive if its modifications predict novel facts that turn out to be true. It is degenerating if modifications merely accommodate known anomalies.
The Newtonian programme, Lakatos argued, was wildly progressive — predicting the return of Halley's comet, the existence of Neptune. By the late nineteenth century its attempts to accommodate the anomalous advance of Mercury's perihelion were degenerating. Einstein's programme replaced it not in a single moment but through years of comparative progressiveness.
Paul Feyerabend's Against Method (1975) argued that no methodology — Popperian, Kuhnian, Lakatosian, or any other — fits the actual practice of successful science. Galileo's defence of Copernicanism, Feyerabend showed at length, violated almost every rule the prescriptivists endorsed: he ignored counter-evidence, used rhetoric, made auxiliary assumptions for which he had no independent support.
The provocative thesis was epistemological anarchism: "the only principle that does not inhibit progress is: anything goes." Feyerabend did not literally mean that any method was as good as any other. He meant that no fixed methodology had ever survived contact with the history of science, and that fixed methodologies were a threat to scientific creativity.
The book was deliberately performative. Feyerabend in person was a brilliant teacher and a serious historian; Against Method is partly a satire on the ambition to systematise. It has been read both as a contribution to philosophy of science and as the demolition of the project.
Sociology of scientific knowledge — the empirical study of how scientific beliefs actually get formed in laboratories and journals — emerged in the 1970s as a distinct discipline. Its most influential statement was David Bloor's Knowledge and Social Imagery (1976) and the Strong Programme developed at Edinburgh.
The programme had four tenets: causality (look for causes of belief), impartiality (treat true and false beliefs symmetrically), symmetry (use the same kinds of cause to explain both), and reflexivity (apply the same method to the sociology of science itself).
The symmetry principle was the controversial one. Traditional history of science had explained accepted theories by reference to evidence and reason, and rejected theories by reference to social factors (prejudice, politics, error). The Strong Programme insisted that the same kinds of social explanation applied to both. Truth and falsity dropped out as explanatory variables.
By the 1990s, science studies had grown into a substantial academic field — the SSK tradition (Sociology of Scientific Knowledge), the Paris-school actor-network theory of Bruno Latour, feminist epistemologies, and various postmodern approaches to scientific knowledge.
Working scientists noticed. The biologist Paul Gross and the mathematician Norman Levitt published Higher Superstition: The Academic Left and Its Quarrels with Science (1994), an angry attack on what they took to be wilful incompetence in science studies. The 1995 New York Academy of Sciences conference "The Flight from Science and Reason" hardened the lines.
The dispute was real, but the framing was crude on both sides. Few science-studies scholars actually denied the existence of a mind-independent world; few scientists actually believed their disciplines were free of social shaping. The "Science Wars" produced more heat than philosophical clarity, but they put epistemological questions back on the public agenda.
In 1996 the physicist Alan Sokal submitted "Transgressing the Boundaries: Towards a Transformative Hermeneutics of Quantum Gravity" to the cultural-studies journal Social Text. The article was a parody — a string of fashionable postmodern claims, deliberate misuses of physics terminology, and unfounded political conclusions, all dressed in plausible-sounding academic prose. Social Text published it.
On the day the issue appeared, Sokal revealed the hoax in Lingua Franca. The point, he argued, was that the journal had failed to apply elementary critical standards to a paper that flattered its editors' politics. The follow-up book with Jean Bricmont, published in French as Impostures intellectuelles (1997) and in English as Fashionable Nonsense (1998), documented misuses of mathematics and physics in Lacan, Kristeva, Irigaray, Baudrillard, Deleuze, and others.
The hoax did not refute science studies. It did make a specific accusation stick: that some of the field had a problem with intellectual standards, particularly when crossing into the natural sciences. The episode reshaped the public reputation of cultural theory for a decade.
While the philosophy-of-science debate was running, W. V. O. Quine at Harvard had been quietly remaking the field's foundations. "Two Dogmas of Empiricism" (1951) attacked the analytic/synthetic distinction on which logical positivism rested. "Epistemology Naturalized" (1969) proposed that the study of how knowledge is acquired should itself become an empirical science — a chapter of psychology and cognitive science, not an a priori philosophical inquiry.
Quine's holism — the thesis that no single statement faces the tribunal of experience alone, that we revise our web of beliefs at the points of least resistance — undercut Popper's clean falsification logic. There is always somewhere else in the web to make the change.
The naturalist turn has been the dominant late-twentieth-century development. Most working philosophers of science now treat their subject as continuous with cognitive psychology, history of science, and the sciences themselves rather than as an a priori inquiry into rationality.
The other dominant late-twentieth-century framework is Bayesian. Scientific inference is modelled as conditional probability update: given a prior probability for a hypothesis and the likelihood of the evidence under that hypothesis, the posterior probability is fixed by Bayes's theorem.
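In the usual notation, with H the hypothesis and E the evidence:

$$P(H \mid E) \;=\; \frac{P(E \mid H)\,P(H)}{P(E)}, \qquad P(E) \;=\; P(E \mid H)\,P(H) + P(E \mid \neg H)\,P(\neg H)$$

E confirms H exactly when P(H | E) > P(H), which (provided H is neither certain nor certainly false) holds just in case E is more probable given H than given its negation.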
The framework absorbs much of what Popper resisted. Confirmation comes in degrees; theories become more probable as evidence accumulates; auxiliary hypotheses can be assigned their own priors and updated independently. Colin Howson and Peter Urbach's Scientific Reasoning: The Bayesian Approach (1989, 3rd ed. 2006) is the standard book-length treatment.
The chief objections concern the priors. Where do they come from? Two scientists with different starting probabilities will, given enough evidence, converge — but the speed of convergence and the meaning of starting probabilities for one-off historical events remain contested. The problem of old evidence (how can already-known facts confirm a new theory?) is a perennial puzzle.
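A toy sketch of the convergence claim, under assumptions chosen purely for convenience (a coin-tossing model with conjugate Beta priors; every number is hypothetical): two agents who start far apart end up nearly indistinguishable once the evidence piles up.

```python
# Two agents with different Beta priors over a coin's bias update on the
# same tosses. The posterior mean of Beta(a + heads, b + tails) is
# (a + heads) / (a + b + n), so convergence can be read off directly.
import random

random.seed(0)
TRUE_THETA = 0.7                      # hypothetical "true" bias of the coin
priors = {"optimist": (8.0, 2.0),     # Beta(8, 2): prior mean 0.8
          "sceptic":  (1.0, 9.0)}     # Beta(1, 9): prior mean 0.1

heads = 0
for n in range(1, 1001):
    heads += random.random() < TRUE_THETA
    if n in (10, 100, 1000):
        means = {name: round((a + heads) / (a + b + n), 3)
                 for name, (a, b) in priors.items()}
        print(f"n = {n:4d}  posterior means: {means}")
```

After ten tosses the posterior means still differ by about a third; after a thousand, by less than a hundredth. The contested questions concern how fast this happens and what the starting numbers mean, not the limiting behaviour.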
Gilbert Harman's 1965 paper "The Inference to the Best Explanation" named a pattern common to scientific and ordinary reasoning: from the fact that a hypothesis would, if true, best explain a body of evidence, we infer the hypothesis.
The pattern is everywhere — in detective work, in palaeontology, in cosmology. Peter Lipton's Inference to the Best Explanation (1991, 2nd ed. 2004) gave the canonical defence. The strategy distinguishes likeliest from loveliest explanations and argues that scientific practice does, and should, weight loveliness — explanatory virtues such as unification, simplicity, and depth — alongside fit with evidence.
Bayesians have argued IBE either reduces to Bayesian update or, where it does not, fails. Defenders argue that explanatory considerations enter directly into the assignment of likelihoods and priors. The dispute is unresolved. In practice, most working scientists reason by IBE without troubling themselves about its formal foundations.
Are mature scientific theories approximately true descriptions of a mind-independent world, or are they merely useful instruments for organising and predicting observations?
Scientific realism's standard argument is the no-miracles argument (Hilary Putnam, 1975): the predictive success of mature science would be a miracle if its theories were not at least approximately true. Electrons behave the way they do because there are electrons.
Antirealism takes various forms. Instrumentalism treats theories as calculation devices. Constructive empiricism (next leaf) accepts theories as literally describing observable phenomena while withholding belief about unobservables. Structural realism (John Worrall, 1989) splits the difference: we should believe the mathematical structure of mature theories, even if the entities described change across revolutions.
The debate has the unusual property of being philosophically central and practically inert. Working physicists rarely take a position; the experiments do not change.
Underdetermination of theory by evidence, usually traced to the Duhem–Quine thesis: for any finite body of observational evidence, multiple incompatible theories are consistent with it. Choice between them must rest on something other than the evidence — simplicity, conservatism, elegance — none of which is obviously truth-tracking.
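The stock illustration is curve-fitting. If a function f fits every observation made so far, at points x_1, …, x_n, then so does

$$g(x) \;=\; f(x) + c\prod_{i=1}^{n}(x - x_i)$$

for any constant c: the two agree on every data point collected and, for c ≠ 0, disagree about almost every case not yet observed.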
The pessimistic meta-induction (Larry Laudan, 1981): the history of science is a graveyard of once-successful theories — phlogiston, caloric, the luminiferous ether, Newtonian absolute space, the fixed continents. By induction on the historical record, current theories are likely also false. Predictive success has not, historically, been a reliable guide to truth.
Realists have responded by distinguishing parts of theories that genuinely contributed to predictive success (which tend to be retained) from idle wheels (which get discarded). The exchange — Laudan's list of false-but-successful theories, the realist's analysis of which parts were really doing the work — is one of the field's longest-running debates.
Bas van Fraassen's The Scientific Image (1980) gave the most rigorous antirealist position of the late twentieth century. Science aims at empirical adequacy — saving the phenomena, getting the observable consequences right — and acceptance of a theory commits one only to belief that it is empirically adequate, not that its claims about unobservables are true.
The key move is the observable/unobservable distinction, which van Fraassen draws by reference to unaided human perception. Electrons are unobservable; the fossil record is observable, even though we did not observe its formation. The line is anthropocentric and admittedly fuzzy at the edges.
Realists object that the line is arbitrary and that our reasons for believing in observable past objects are continuous with our reasons for believing in unobservable present ones. Van Fraassen replies that the asymmetry is not arbitrary — it is the line beyond which we lose the capacity for direct observational check.
The book remains the high-water mark of sophisticated antirealism.
The original demarcation question — what distinguishes science from pseudoscience? — has not been answered to general satisfaction. Popperian falsifiability, the once-popular sharp criterion, fails on both sides: serious sciences make claims that are difficult to falsify cleanly (string theory, multiverse cosmology), and serious pseudosciences (creationism rebranded as intelligent design) can be dressed in the form of falsifiable predictions.
Most contemporary philosophers accept that demarcation is a cluster question. Scientific status is a matter of multiple features — testability against independent evidence, integration with other sciences, willingness to revise in light of evidence, openness to outside checking, the absence of immune-from-criticism core dogmas — held to varying degrees.
The practical demarcation question matters in courts (the 1993 Daubert decision on expert testimony), in school curricula (the 2005 Kitzmiller decision on intelligent design), and in public-health contexts. The philosophical literature has fed those decisions; it has not produced a single clean criterion.
Beginning around 2011, a series of large-scale replication projects in psychology (the Open Science Collaboration's Estimating the reproducibility of psychological science, Science, 2015), cancer biology, and economics found that a substantial fraction of published findings did not replicate.
The methodological diagnosis was reasonably clear. Underpowered studies, flexible analysis pipelines (the "garden of forking paths," Andrew Gelman), publication bias toward positive results, p-hacking, and the absence of pre-registration combined to produce a literature richer in false positives than its formal statistics suggested.
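A minimal simulation of that diagnosis, with parameter values invented for illustration rather than estimated from any field: when power is low, true effects are rare, and only significant results reach print, most published positives are false.

```python
# Publication bias plus low power yields a published literature dominated
# by false positives. All parameter values are illustrative.
import random

random.seed(1)
N_STUDIES = 100_000
P_TRUE    = 0.10    # fraction of tested hypotheses that are actually true
ALPHA     = 0.05    # chance of a significant result when the effect is absent
POWER     = 0.20    # chance of a significant result when the effect is present

published_true = published_false = 0
for _ in range(N_STUDIES):
    effect_real = random.random() < P_TRUE
    significant = random.random() < (POWER if effect_real else ALPHA)
    if significant:                      # only "positive" findings get published
        if effect_real:
            published_true += 1
        else:
            published_false += 1

total = published_true + published_false
print(f"published findings:   {total}")
print(f"false-positive share: {published_false / total:.2f}")  # about 0.69 with these numbers
```

With these numbers roughly 0.69 of the published record is false, even though every individual test respects its nominal 5 per cent error rate.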
The philosophical implications are still being absorbed. Popperian falsificationism took for granted that failed predictions led to abandoned theories; the replication crisis showed that whole subfields could maintain false beliefs for decades. Bayesian frameworks need to take seriously how priors and likelihoods are actually estimated in practice. The naturalised epistemologist's claim — that we should study scientific practice as it is — has gained empirical force.
The traditional view — defended by Carnap, by Popper, and by most twentieth-century philosophers — distinguished sharply between epistemic values (truth, predictive accuracy, simplicity) which belonged to science, and non-epistemic values (political, ethical, social) which did not.
The contemporary literature has eroded the distinction. Heather Douglas's Science, Policy, and the Value-Free Ideal (2009) argues that decisions about how much evidence is enough to publish, how to set significance thresholds, how to weight false positives against false negatives — decisions internal to scientific practice — depend on judgements about the costs of error, which are unavoidably non-epistemic.
Feminist philosophers of science (Helen Longino, Sandra Harding, Donna Haraway) had argued similar points earlier from different premises. The convergent conclusion: the ideal of value-free science is unattainable. The serious question is how to make scientific value-judgements transparent, accountable, and revisable, not how to eliminate them.
Kuhn's Structure, Popper's Conjectures and Refutations, and Hacking's Representing and Intervening — in that order. Read Kuhn for the picture, Popper for the criterion, Hacking for the antidote to too much theory.
Philosophy of Science — Volume II, Deck 12 of The Deck Catalog. Set in Tiempos Text with Inter for small-caps section markers. Off-white #f5f5f0, deep ink, with deep blue #2a4a8a and burnt orange #c46028 accents.
Thirty-one leaves on the discipline that asks what we are doing when we do science. Four centuries of attempts at clean answers. None final. The work continues.