For most of the history of academic publishing, a citation carried a basic implicit guarantee. It meant someone had read a source, engaged with it, and was directing the reader to a real thing. That guarantee has eroded quietly. A new audit makes the scale of the erosion visible.
The audit, published in The Lancet and summarised in Nature, scanned approximately 2.47 million biomedical papers and 97 million references from PubMed Central between January 2023 and February 2026. It found 4,046 fabricated references across 2,810 papers. The rate climbed from around 4 per 10,000 papers in early 2023 to 56.9 per 10,000 by early 2026, a twelvefold increase in three years.
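The headline figures can be turned into a few whole-period rates with simple arithmetic. The sketch below uses only the numbers quoted above. The paper count is approximate, so the results are rough, and the variable names are illustrative.

```python
# Figures as quoted in the text; the paper count is approximate.
papers_scanned = 2_470_000
references_scanned = 97_000_000
fabricated_refs = 4_046
affected_papers = 2_810

# Papers with at least one fabricated reference, per 10,000 papers scanned.
overall_rate_per_10k = affected_papers / papers_scanned * 10_000

# Average number of fabricated references within an affected paper.
refs_per_affected_paper = fabricated_refs / affected_papers

# Fabricated references per million references scanned.
fabricated_per_million_refs = fabricated_refs / references_scanned * 1_000_000

print(f"{overall_rate_per_10k:.1f} affected papers per 10,000")
print(f"{refs_per_affected_paper:.2f} fabricated references per affected paper")
print(f"{fabricated_per_million_refs:.1f} fabricated per million references")
```

The whole-period average comes out at roughly 11.4 affected papers per 10,000. That sits between the early-2023 and early-2026 rates, as expected when the rate rises over the window.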
That rate of increase matters more than the absolute count. It strongly suggests AI-assisted writing is the driver, and it means the problem is getting worse faster than publishing infrastructure can respond to it.
What fabricated citations actually look like
The word "fabricated" implies obvious fakery. It shouldn't. These references are not garbled nonsense or obviously wrong titles. They are plausible: correct formatting for the journal style, real-looking author names, believable publication years, titles that fit the topic of the paper citing them. They pass the eyeball test.
This is what makes them particularly dangerous. A reviewer skimming a reference list isn't checking whether each citation resolves to a real document. They're checking whether the reference list looks credible, and a well-formed fabricated reference looks exactly as credible as a real one.
The audit's detection method classified a reference as fabricated when the listed title couldn't be found across major databases: PubMed, Crossref, OpenAlex, and Google Scholar. That's a powerful check at scale, though not without limitations: metadata errors, title variants, non-indexed sources, and translation artefacts can produce false positives. At the scale and trajectory the audit documents, however, the signal is hard to dismiss.
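To illustrate why title variants matter, here is a minimal sketch of a tolerant title lookup. It is not the audit's actual method. It assumes the candidate titles have already been retrieved from a database search, and the function names, the 0.9 threshold, and the example titles are all hypothetical.

```python
import re
import unicodedata
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


def normalize_title(title: str) -> str:
    """Casefold, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", title)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[\W_]+", " ", no_marks.casefold()).strip()


def best_title_match(cited: str, candidates: List[str]) -> Tuple[Optional[str], float]:
    """Return the candidate most similar to the cited title, with its score."""
    target = normalize_title(cited)
    if not target:
        # An empty title cannot be matched; never count it as found.
        return None, 0.0
    best, best_score = None, 0.0
    for candidate in candidates:
        normalized = normalize_title(candidate)
        if not normalized:
            continue
        score = SequenceMatcher(None, target, normalized).ratio()
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


def title_found(cited: str, candidates: List[str], threshold: float = 0.9) -> bool:
    """Treat a title as found if any candidate clears the similarity threshold."""
    _, score = best_title_match(cited, candidates)
    return score >= threshold


# Made-up titles, for illustration only.
search_results = ["Sleep duration and cardiovascular risk: A cohort study."]
print(title_found("Sleep Duration and Cardiovascular Risk – a cohort study", search_results))
print(title_found("Gut microbiome signatures in early sepsis", search_results))
```

The threshold is a trade-off. Set it too strict and title variants show up as false positives; set it too loose and a fabricated title can match a real paper on the same topic.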
Why this is worse in review articles
The audit found that review articles had higher fabrication rates than primary research papers. This is the detail that makes the problem genuinely serious rather than merely embarrassing.
Review articles are not read the same way primary studies are. They're used as shorthand, as authoritative summaries of what a body of literature says. Clinical guidelines are often built on them. A fabricated citation in a primary study is an isolated integrity failure. A fabricated citation in a review that shapes a clinical guideline is a failure that propagates downstream through every decision made on the basis of that guideline.
This is the same cascade logic we described in the previous piece on corroboration versus amplification: one weak node in the evidence chain becomes invisible once downstream citations treat it as established. Fabricated references take that failure mode to its logical extreme: the node doesn't just have weak provenance, it has no provenance. It doesn't exist.
What "citation present" used to mean, and doesn't now
The implicit contract of academic citation has always been: if I cite something, it exists, and it supports the point I'm making. The audit shows the first part of that contract breaking down at scale. Citation presence is no longer a meaningful guarantee of source existence.
But source existence was never the same as claim support. Even before AI-generated fabrications became a material concern, a real citation could support a claim weakly, out of context, as a third-hand derivative of the original evidence, or not at all. The fabrication problem makes a bad situation explicit rather than creating it from scratch.
The three-part distinction matters:
Source exists ≠ source supports the claim
Source supports the claim ≠ independent corroboration
Independent corroboration ≠ final truth
Most research tools, including AI-assisted ones, operate implicitly at step 1 or between steps 1 and 2. The Lancet audit is a reminder that step 1 now requires explicit verification, not assumption.
What reference integrity checking actually requires
Checking whether a reference is real turns out to be more layered than it sounds. A genuine reference integrity check has to answer several distinct questions, not just one:
Does the reference exist at all?
Does a document with this title, these authors, this venue, and this year appear in major bibliographic databases such as PubMed, Crossref, OpenAlex, arXiv, or Google Scholar? This is the check the Lancet audit operationalised. It's necessary but not sufficient.
Do the identifiers match?
A DOI or PMID should resolve to a specific document. Does the identifier in the citation match the title and authors claimed? A real DOI pointing to a different paper is still a fabricated citation for the purpose it's being used for.
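A minimal sketch of this identifier check follows. It assumes the DOI has already been resolved by some external lookup into a record with a title and author surnames. The `Record` shape, the function names, the threshold, and the example DOI are all hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Tuple

DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


@dataclass(frozen=True)
class Record:
    doi: str
    title: str
    author_surnames: Tuple[str, ...]


def normalize_doi(doi: str) -> str:
    """Strip common URL and prefix forms; DOIs are compared case-insensitively."""
    value = doi.strip().lower()
    for prefix in DOI_PREFIXES:
        if value.startswith(prefix):
            value = value[len(prefix):]
            break
    return value.strip()


def identifier_matches(cited: Record, resolved: Record,
                       min_title_similarity: float = 0.9) -> bool:
    """True only if the DOI, the title, and at least one author surname agree."""
    same_doi = normalize_doi(cited.doi) == normalize_doi(resolved.doi)
    title_score = SequenceMatcher(
        None, cited.title.lower(), resolved.title.lower()
    ).ratio()
    cited_names = {name.lower() for name in cited.author_surnames}
    resolved_names = {name.lower() for name in resolved.author_surnames}
    shares_author = bool(cited_names & resolved_names)
    return same_doi and title_score >= min_title_similarity and shares_author


# Made-up citation and made-up resolver output, for illustration only.
cited = Record("https://doi.org/10.1234/EXAMPLE.5678",
               "Sleep duration and cardiovascular risk", ("Okafor",))
resolved = Record("10.1234/example.5678",
                  "Soil nitrogen dynamics in temperate grasslands", ("Lindqvist",))
print(identifier_matches(cited, resolved))  # the DOI resolves, but to a different paper
```

Requiring agreement on all three fields is a deliberately strict choice. A real pipeline would probably want to report which field disagreed rather than return a single boolean.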
Is the source accessible and verifiable?
A metadata record existing is not the same as the full text being verifiable. For claims that depend on specific findings, the source needs to be accessible enough to confirm it actually contains what's attributed to it.
Does the source contain the cited claim?
This is where reference integrity shades into claim support verification. A real paper that exists and resolves correctly can still be misrepresented: the cited finding might be a minor caveat in the original, or contradict the use being made of it, or come from a different study referenced within the cited paper rather than the paper itself.
Is the source independent?
Even a real, verifiable, claim-supporting source can be part of a citation cascade, traced back to a single original that all subsequent citations are derived from. Independence has to be assessed at the level of research lineage, not just document existence.
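The independence question can be sketched as a graph walk: follow each source back to where its evidence originates, then count distinct origins rather than distinct documents. The example below assumes a hand-built `derived_from` mapping. That structure, the function names, and the source labels are all hypothetical, and constructing such a mapping from real papers is the hard part.

```python
from typing import Dict, List, Set


def root_sources(source: str, derived_from: Dict[str, List[str]]) -> Set[str]:
    """Follow 'derived from' links back to sources with no recorded parent."""
    roots: Set[str] = set()
    seen: Set[str] = set()
    stack = [source]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; also guards against cycles
        seen.add(current)
        parents = derived_from.get(current, [])
        if not parents:
            roots.add(current)
        stack.extend(parents)
    return roots


def independent_origins(sources: List[str],
                        derived_from: Dict[str, List[str]]) -> Set[str]:
    """Distinct original sources behind a set of citations."""
    origins: Set[str] = set()
    for source in sources:
        origins |= root_sources(source, derived_from)
    return origins


# Made-up lineage: three documents that all trace back to one trial.
lineage = {
    "guideline_2025": ["review_2024"],
    "editorial_2025": ["review_2024"],
    "review_2024": ["trial_2019"],
}
cited = ["guideline_2025", "editorial_2025", "review_2024"]
print(len(cited), "citations,", len(independent_origins(cited, lineage)), "origin(s)")
```

Three citations that collapse to one origin are amplification, not corroboration, however credible each document looks on its own.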
What this means for AI-assisted research tools
The irony is clean: AI tools are generating fabricated citations, and AI tools are being used to do research that relies on those citations. A research assistant that confidently summarises a body of literature has no intrinsic mechanism to check whether the papers it's drawing on are real. It was trained on text. If the text contained fabricated references, those references are now part of what it learned from.
This isn't a problem that more powerful language models solve. A more capable model that fabricates citations less often is still fabricating at some rate, and one fabricated citation in a clinical guideline context is one too many. The solution isn't a better model. It's a verification step that sits outside the model and checks its outputs against authoritative bibliographic sources.
The audit's detection method, cross-checking titles against PubMed, Crossref, OpenAlex, and Google Scholar, is exactly this kind of external verification step. It works because it doesn't trust the model's output; it checks it. That's the architectural posture that matters: treat model-generated citations as candidates to be verified, not facts to be accepted.
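That posture can be sketched as a small gate that sits between the model and whatever consumes its output. The example assumes an `exists` callable that wraps a lookup against an external bibliographic source; that callable, the class names, and the example titles are hypothetical. Treating lookup errors as failures is a design choice of this sketch, not something the audit prescribes.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Citation:
    title: str
    doi: str = ""


@dataclass
class GateResult:
    verified: List[Citation] = field(default_factory=list)
    rejected: List[Citation] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        """The gate passes only when nothing was rejected."""
        return not self.rejected


def verify_citations(candidates: Iterable[Citation],
                     exists: Callable[[Citation], bool]) -> GateResult:
    """Check every model-proposed citation against an external lookup."""
    result = GateResult()
    for citation in candidates:
        try:
            ok = exists(citation)
        except Exception:
            ok = False  # fail closed: an unverifiable citation does not pass
        (result.verified if ok else result.rejected).append(citation)
    return result


# Stub lookup standing in for a real database query; titles are made up.
known_titles = {"sleep duration and cardiovascular risk"}


def stub_exists(citation: Citation) -> bool:
    return citation.title.lower() in known_titles


proposed = [
    Citation("Sleep duration and cardiovascular risk"),
    Citation("Gut microbiome signatures in early sepsis"),
]
outcome = verify_citations(proposed, stub_exists)
print(outcome.passed, [c.title for c in outcome.rejected])
```

Because the lookup is injected rather than built in, the gate makes no claim about which database is authoritative. It only enforces that nothing reaches the reader unchecked.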
How the community is responding, and why it's not enough yet
The audit found that at the time of scanning, 98.4% of the affected papers had received no response from their publishers. The fabrications were present, documented in many cases by the researchers, and largely ignored by the publishing infrastructure meant to catch them.
Some platforms are moving. arXiv tightened its sanctions for unchecked LLM output in manuscripts, including hallucinated sources, threatening offending authors with a one-year ban. An analysis of accepted NeurIPS 2025 papers found that even top AI conferences struggle to catch fabricated citations reliably, so the problem isn't confined to lower-tier publications or less rigorous fields.
CiteAudit, an open-source tool for automated citation checking, offers one practical countermeasure. It also illustrates the underlying difficulty: commercial language models are poor at catching their own reference problems. The tool that generates fabricated citations is structurally ill-placed to verify them. External checking, against authoritative bibliographic databases rather than model memory, is what the task requires.
The researchers themselves recommend four steps: automated reference checks before peer review, integrity metadata embedded in article datasets, retroactive screening of already-published papers, and a dedicated "fabricated references" category in research integrity databases. It's worth noting that the researchers used Claude for code development and grammar checking during the study, and the distinction matters: AI assistance for well-defined tasks with verifiable outputs is not the same as AI generation of citations, which have no intrinsic verification mechanism.
The broader pattern
The fabrication problem is new in scale but not in kind. Research has always had to grapple with citations used out of context, secondary sources misrepresenting primary findings, and evidence cascades where volume substitutes for independence. What AI generation has done is industrialise the production of a failure mode that previously required human effort to create.
The appropriate response isn't to stop using AI in research workflows. It's to build verification into those workflows at the right points, not as an optional quality check, but as a mandatory gate that outputs don't pass until they've cleared it.
A citation used to carry an implicit guarantee. That guarantee is gone. Explicit verification is what replaces it.