A citation used to be evidence. It isn't anymore, Epistamate

A Lancet audit scanned 2.5 million biomedical papers and found fabricated citations rising 12x in three years. The fake references look real: correct formatting, genuine author names, plausible titles. Peer review isn't catching them. Neither are most AI research tools. Here's what reference integrity checking actually requires.

For most of the history of academic publishing, a citation carried a basic implicit guarantee. It meant someone had read a source, engaged with it, and was directing the reader to a real thing. That guarantee has eroded quietly. A new audit makes the scale of the erosion visible.

The audit, published in The Lancet and summarised in Nature, scanned approximately 2.47 million biomedical papers and 97 million references from PubMed Central between January 2023 and February 2026. It found 4,046 fabricated references across 2,810 papers. The rate climbed from around 4 per 10,000 papers in early 2023 to 56.9 per 10,000 by early 2026, a twelvefold increase in three years.

2.5M biomedical papers scanned across PubMed Central

12× rise in fabricated citation rate, 2023 to early 2026

4,046 fabricated references found across 2,810 papers

That rate of increase matters more than the absolute count. It strongly suggests AI-assisted writing is the driver, and it means the problem is getting worse faster than publishing infrastructure can respond to it.

What fabricated citations actually look like

The word "fabricated" implies obvious fakery. It shouldn't. These references are not garbled nonsense or obviously wrong titles. They are plausible: correct formatting for the journal style, real-looking author names, believable publication years, titles that fit the topic of the paper citing them. They pass the eyeball test.

This is what makes them particularly dangerous. A reviewer skimming a reference list isn't checking whether each citation resolves to a real document. They're checking whether the reference list looks credible, and a well-formed fabricated reference looks exactly as credible as a real one.

The audit's detection method classified a reference as fabricated when the listed title couldn't be found across major databases: PubMed, Crossref, OpenAlex, and Google Scholar. That's a powerful check at scale, though not without limitations: metadata errors, title variants, non-indexed sources, and translation artefacts can produce false positives. At the scale and trajectory the audit documents, however, the signal is hard to dismiss.
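To make the title-variant problem concrete, here is a minimal sketch of a tolerant title comparison. It is not the audit's actual method. The function names and the 0.9 threshold are assumptions chosen for illustration, and a real pipeline would tune the threshold against labelled data.

```python
import re
import unicodedata
from difflib import SequenceMatcher
from typing import Optional

# Assumed cut-off for "same title"; not taken from the audit.
MATCH_THRESHOLD = 0.9


def normalize_title(title: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", title)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return re.sub(r"[^a-z0-9]+", " ", without_accents.lower()).strip()


def best_title_match(
    cited: str, candidates: list[str]
) -> tuple[Optional[str], float]:
    """Return the closest candidate title and its similarity (0.0 to 1.0)."""
    target = normalize_title(cited)
    best, best_score = None, 0.0
    for candidate in candidates:
        score = SequenceMatcher(None, target, normalize_title(candidate)).ratio()
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


def title_found(
    cited: str, candidates: list[str], threshold: float = MATCH_THRESHOLD
) -> bool:
    """True when some candidate is close enough to count as the same title."""
    _, score = best_title_match(cited, candidates)
    return score >= threshold
```

Normalising before comparing reduces false positives from punctuation, casing, and accent differences. It does nothing for non-indexed sources or translated titles, which still need human review.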

Why this is worse in review articles

The audit found that review articles had higher fabrication rates than primary research papers. This is the detail that makes the problem genuinely serious rather than merely embarrassing.

Review articles are not read the same way primary studies are. They're used as shorthand, as authoritative summaries of what a body of literature says. Clinical guidelines are often built on them. A fabricated citation in a primary study is an isolated integrity failure. A fabricated citation in a review that shapes a clinical guideline is a failure that propagates downstream through every decision made on the basis of that guideline.

This is the same cascade logic we described in the previous piece on corroboration versus amplification: one weak node in the evidence chain becomes invisible once downstream citations treat it as established. Fabricated references take that failure mode to its logical extreme: the node doesn't just have weak provenance, it has no provenance. It doesn't exist.

What "citation present" used to mean, and doesn't now

The implicit contract of academic citation has always been: if I cite something, it exists, and it supports the point I'm making. The audit shows the first part of that contract breaking down at scale. Citation presence is no longer a meaningful guarantee of source existence.

But source existence was never the same as claim support. Even before AI-generated fabrications became a material concern, a real citation could support a claim weakly, out of context, as a third-hand derivative of the original evidence, or not at all. The fabrication problem makes a bad situation explicit rather than creating it from scratch.

The three-part distinction matters:

The evidence ladder: each step is necessary, none is sufficient

1. Reference exists: the cited work is a real document

The fabrication problem sits here. Most tools don't check this.

2. Source supports the claim: the document actually contains the cited finding

A real paper can be misrepresented. Existence doesn't imply relevance or support.

3. Independent corroboration: separate research paths arrive at the same conclusion

Volume of citation doesn't imply independence. A cascade of real papers can all trace to one weak original.

Reference exists ≠ source supports the claim
Source supports the claim ≠ independent corroboration
Independent corroboration ≠ final truth

Most research tools, including AI-assisted ones, operate implicitly at step 1 or between steps 1 and 2. The Lancet audit is a reminder that step 1 now requires explicit verification, not assumption.
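One way to keep the ladder honest in software is to refuse to credit a higher step when a lower one hasn't been cleared. The sketch below is illustrative only; `EvidenceLevel` and `CitationChecks` are hypothetical names, not part of any existing tool.

```python
from dataclasses import dataclass
from enum import IntEnum


class EvidenceLevel(IntEnum):
    UNVERIFIED = 0
    REFERENCE_EXISTS = 1
    SUPPORTS_CLAIM = 2
    INDEPENDENTLY_CORROBORATED = 3


@dataclass
class CitationChecks:
    """Outcome of each check for one citation (hypothetical structure)."""
    reference_exists: bool = False
    supports_claim: bool = False
    independently_corroborated: bool = False

    def level(self) -> EvidenceLevel:
        # Walk the ladder from the bottom: a failed lower step caps the
        # level, whatever the later checks claim.
        if not self.reference_exists:
            return EvidenceLevel.UNVERIFIED
        if not self.supports_claim:
            return EvidenceLevel.REFERENCE_EXISTS
        if not self.independently_corroborated:
            return EvidenceLevel.SUPPORTS_CLAIM
        return EvidenceLevel.INDEPENDENTLY_CORROBORATED
```

Under this design, a citation flagged as supporting its claim but never confirmed to exist still reports `UNVERIFIED`. Even the top level says nothing about final truth, only that the three checks passed.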

What reference integrity checking actually requires

Checking whether a reference is real turns out to be more layered than it sounds. A genuine reference integrity check has to answer several distinct questions, not just one:

Does the reference exist at all?

Does a document with this title, these authors, this venue, and this year appear in major bibliographic databases such as PubMed, Crossref, OpenAlex, arXiv, or Google Scholar? This is the check the Lancet audit operationalised. It's necessary but not sufficient.
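As a rough illustration of an existence lookup, the sketch below queries one database, the public Crossref REST API, through its `/works` endpoint and `query.bibliographic` parameter. The helper names are assumptions made for this example, and it is not the audit's pipeline. A search hit is only a candidate: the returned titles still have to be compared against the cited one.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CROSSREF_WORKS = "https://api.crossref.org/works"


def build_crossref_query(title: str, rows: int = 5) -> str:
    """Build a bibliographic search URL for the cited title."""
    params = {"query.bibliographic": title, "rows": rows}
    return f"{CROSSREF_WORKS}?{urlencode(params)}"


def extract_candidates(response: dict) -> list[dict]:
    """Pull DOI and first title from each item of a works-list response."""
    items = response.get("message", {}).get("items", [])
    candidates = []
    for item in items:
        titles = item.get("title") or []
        if titles:  # skip records that carry no title at all
            candidates.append({"doi": item.get("DOI"), "title": titles[0]})
    return candidates


def fetch_candidates(title: str, timeout: float = 10.0) -> list[dict]:
    """Network call: return candidate records for a cited title."""
    with urlopen(build_crossref_query(title), timeout=timeout) as resp:
        return extract_candidates(json.load(resp))
```

An empty candidate list from one database is weak evidence on its own, so the same lookup would need repeating against the other indexes before a reference is flagged.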

Do the identifiers match?

A DOI or PMID should resolve to a specific document. Does the identifier in the citation match the title and authors claimed? A real DOI pointing to a different paper is still a fabricated citation for the purpose it's being used for.
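A minimal sketch of that comparison follows, assuming the record the identifier resolves to has already been fetched by some other means. `Record`, `identifier_mismatches`, and the 0.9 title threshold are hypothetical choices for illustration.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Record:
    """Title and author surnames, as cited or as resolved (hypothetical)."""
    title: str
    author_surnames: list[str] = field(default_factory=list)


def _norm(text: str) -> str:
    return " ".join(text.lower().split())


def identifier_mismatches(
    claimed: Record, resolved: Record, title_threshold: float = 0.9
) -> list[str]:
    """List the fields where the citation disagrees with what the ID resolves to."""
    problems = []
    similarity = SequenceMatcher(
        None, _norm(claimed.title), _norm(resolved.title)
    ).ratio()
    if similarity < title_threshold:
        problems.append("title")
    claimed_names = {_norm(name) for name in claimed.author_surnames}
    resolved_names = {_norm(name) for name in resolved.author_surnames}
    # Require at least one shared surname; citations often list only some authors.
    if claimed_names and not (claimed_names & resolved_names):
        problems.append("authors")
    return problems
```

An empty list means the identifier and the cited metadata agree on these two fields. Any entry means the identifier is real but points somewhere other than the citation claims.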

Is the source accessible and verifiable?

A metadata record existing is not the same as the full text being verifiable. For claims that depend on specific findings, the source needs to be accessible enough to confirm it actually contains what's attributed to it.

Does the source contain the cited claim?

This is where reference integrity shades into claim support verification. A real paper that exists and resolves correctly can still be misrepresented: the cited finding might be a minor caveat in the original, or contradict the use being made of it, or come from a different study referenced within the cited paper rather than the paper itself.

Is the source independent?

Even a real, verifiable, claim-supporting source can be part of a citation cascade, traced back to a single original that all subsequent citations are derived from. Independence has to be assessed at the level of research lineage, not just document existence.
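Lineage can be sketched as a graph walk. The example below assumes a hand-built mapping from each paper to the sources it derives the claim from. Producing that mapping is the hard part and is not shown; the function names and paper labels are invented for illustration.

```python
def root_sources(paper: str, derived_from: dict[str, list[str]]) -> set[str]:
    """Follow 'derived from' links upstream until reaching papers with no source."""
    roots: set[str] = set()
    seen: set[str] = set()
    stack = [paper]
    while stack:
        node = stack.pop()
        if node in seen:  # guards against circular citation chains
            continue
        seen.add(node)
        upstream = derived_from.get(node, [])
        if upstream:
            stack.extend(upstream)
        else:
            roots.add(node)  # nothing upstream: this is an original source
    return roots


def independent_origins(
    citations: list[str], derived_from: dict[str, list[str]]
) -> set[str]:
    """Distinct original sources behind a set of citations."""
    origins: set[str] = set()
    for citation in citations:
        origins |= root_sources(citation, derived_from)
    return origins


# Invented example: three citations that all trace back to one study.
lineage = {
    "review_2024": ["study_b", "study_c"],
    "study_b": ["study_a"],
    "study_c": ["study_a"],
    "study_a": [],
}
origins = independent_origins(["review_2024", "study_b", "study_c"], lineage)
```

Here three citations collapse to a single origin, `study_a`. The number of distinct origins is a better proxy for corroboration than the number of citations.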

The practical implication: Any research workflow that treats "citation present" as a validity signal is now operating on an assumption that has demonstrably broken down. Reference existence needs to become an explicit checkpoint, not a background assumption, and that checkpoint is only the first in a sequence that ends at independent corroboration.

What this means for AI-assisted research tools

The irony is clean: AI tools are generating fabricated citations, and AI tools are being used to do research that relies on those citations. A research assistant that confidently summarises a body of literature has no intrinsic mechanism to check whether the papers it's drawing on are real. It was trained on text. If the text contained fabricated references, those references are now part of what it learned from.

This isn't a problem that more powerful language models solve. A more capable model that fabricates citations less often is still fabricating at some rate, and one fabricated citation in a clinical guideline context is one too many. The solution isn't a better model. It's a verification step that sits outside the model and checks its outputs against authoritative bibliographic sources.

The audit's detection method, cross-checking titles against PubMed, Crossref, OpenAlex, and Google Scholar, is exactly this kind of external verification step. It works because it doesn't trust the model's output; it checks it. That's the architectural posture that matters: treat model-generated citations as candidates to be verified, not facts to be accepted.
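That posture can be sketched as a gate that sits between the model and everything downstream. The `verify` callable stands in for whatever external lookup a workflow uses; it and `gate_citations` are hypothetical names for this example.

```python
from typing import Callable, Iterable

# A verifier takes one citation string and says whether an external
# bibliographic source confirms it. Its implementation is assumed.
Verifier = Callable[[str], bool]


def gate_citations(
    candidates: Iterable[str], verify: Verifier
) -> tuple[list[str], list[str]]:
    """Split model-generated citations into verified and rejected lists."""
    accepted: list[str] = []
    rejected: list[str] = []
    for citation in candidates:
        try:
            ok = verify(citation)
        except Exception:
            # Fail closed: a lookup error is not evidence the source exists.
            ok = False
        (accepted if ok else rejected).append(citation)
    return accepted, rejected


# Usage with a stand-in index; a real verifier would query external databases.
known_titles = {"a title that the index confirms"}
accepted, rejected = gate_citations(
    ["a title that the index confirms", "a title nothing confirms"],
    lambda citation: citation in known_titles,
)
```

The key design choice is failing closed: a citation that can't be checked is treated the same as one that failed the check, so nothing unverified flows through by default.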

How the community is responding, and why it's not enough yet

The audit found that at the time of scanning, 98.4% of the affected papers had received no response from their publishers. The fabrications were present, documented in many cases by the researchers, and largely ignored by the publishing infrastructure meant to catch them.

Some platforms are moving. arXiv tightened its sanctions for unchecked LLM output in manuscripts, including hallucinated sources, threatening offending authors with a one-year ban. An analysis of accepted NeurIPS 2025 papers found that even top AI conferences struggle to catch fabricated citations reliably. The problem isn't confined to lower-tier publications or less rigorous fields.

CiteAudit, an open-source tool for automated citation checking, offers one practical countermeasure. It also illustrates the underlying difficulty: commercial language models are poor at catching their own reference problems. The tool that generates fabricated citations is structurally ill-placed to verify them. External checking, against authoritative bibliographic databases rather than model memory, is what the task requires.

The researchers themselves recommend four steps: automated reference checks before peer review, integrity metadata embedded in article datasets, retroactive screening of already-published papers, and a dedicated "fabricated references" category in research integrity databases. It's worth noting that the researchers used Claude for code development and grammar checking during the study, a distinction that matters. AI assistance for well-defined tasks with verifiable outputs is not the same as AI generation of citations, which have no intrinsic verification mechanism.

The response gap: 98.4% of papers with fabricated citations had received no publisher response at the time of audit. The detection methods exist. The institutional response infrastructure does not yet match the scale of the problem.

The broader pattern

The fabrication problem is new in scale but not in kind. Research has always had to grapple with citations used out of context, secondary sources misrepresenting primary findings, and evidence cascades where volume substitutes for independence. What AI generation has done is industrialise the production of a failure mode that previously required human effort to create.

The appropriate response isn't to stop using AI in research workflows. It's to build verification into those workflows at the right points, not as an optional quality check, but as a mandatory gate that outputs don't pass until they've cleared it.

A citation used to carry an implicit guarantee. That guarantee is gone. Explicit verification is what replaces it.
