Research sessions end, files are archived, and the next person starts from zero. A claim that should have been contested gets into the final report because nobody tracked the conflicting source. Six months later, the same question is being answered again.
This is not a technology problem. It is a structural problem with how evidence-based work is done — in consulting, in policy, in investigations, in accountability work. The format changes; the problem doesn't.
Epistamate is one implementation of a more general engine. This page describes the engine.
Claims are structured objects — text, status, computed confidence, credibility tier, citations, evidence type — not free-text summaries. Each claim is individually addressable: it can be verified, contested, weakened, or carried forward independently.
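As an illustration of what "structured object" can mean here, the following is a minimal sketch of a claim record. The field names, the status vocabulary beyond VERIFIED and contested, and the integer tier scale are assumptions made for the example, not the engine's actual schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ClaimStatus(Enum):
    # Hypothetical vocabulary; the page itself names only VERIFIED and contested.
    UNVERIFIED = "unverified"
    VERIFIED = "verified"
    CONTESTED = "contested"
    WEAKENED = "weakened"


@dataclass
class Claim:
    claim_id: str           # stable address so the claim can be handled on its own
    text: str
    status: ClaimStatus
    confidence: int         # computed score, not model self-report
    credibility_tier: int   # assumed scale: 1 is the most trusted tier
    evidence_type: str
    citations: List[str] = field(default_factory=list)


claim = Claim(
    claim_id="c-001",
    text="Example assertion drawn from a source.",
    status=ClaimStatus.UNVERIFIED,
    confidence=50,
    credibility_tier=2,
    evidence_type="survey",
    citations=["source-a"],
)

# Each claim changes state independently of every other claim.
claim.status = ClaimStatus.CONTESTED
```

Because every claim has its own identifier and status, contesting one leaves the rest of the brief untouched.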
Confidence is computed from a deterministic formula (source tier, consensus across providers, adversarial challenge outcome, evidence recency), not from LLM self-report. Scores are bounded to the range [5, 95]. LLM-reported confidence is recorded in audit metadata but never used in the score.
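A deterministic score of this kind might look like the sketch below. Only the four inputs and the [5, 95] bounds come from the description above; the function name, the weights, and the way the inputs combine are illustrative assumptions, not the engine's formula.

```python
def compute_confidence(source_tier, provider_agreement, survived_challenge, evidence_age_days):
    """Return a confidence score in [5, 95] from four deterministic inputs.

    All weights below are hypothetical and chosen only for illustration.
    """
    # Assumed tier scale: 1 is the most trusted; unknown tiers get the lowest base.
    tier_base = {1: 70, 2: 50, 3: 30}.get(source_tier, 15)
    # provider_agreement is the fraction of providers that agree, in [0, 1].
    consensus_bonus = round(20 * provider_agreement)
    # Bonus for surviving the adversarial challenge; lost on failure.
    challenge_bonus = 10 if survived_challenge else 0
    # Five points per full year of age, capped at fifteen.
    recency_penalty = min(15, (evidence_age_days // 365) * 5)

    raw = tier_base + consensus_bonus + challenge_bonus - recency_penalty
    return max(5, min(95, raw))  # clamp to the documented range


strong = compute_confidence(1, 1.0, True, 0)
middling = compute_confidence(2, 0.5, True, 400)
weak = compute_confidence(9, 0.0, False, 3650)
```

The same inputs always produce the same score, and no model-reported confidence appears anywhere in the calculation.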
A mandatory adversarial challenge phase (Phase 3) runs before synthesis. Claims that don't survive lose their Socratic bonus and are reassessed. Challenges are persisted as typed output, not discarded after scoring. The brief reflects what survived scrutiny, not what sounded best.
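The idea of keeping challenges as typed records, rather than only their verdicts, can be sketched as follows. The `Challenge` type, the `challenge_phase` function, and the toy challenger are all hypothetical stand-ins; a real challenge step would be far more involved.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Challenge:
    claim_id: str
    objection: str
    survived: bool


def challenge_phase(
    claims: Dict[str, str],
    challenger: Callable[[str, str], Challenge],
) -> Tuple[List[str], List[Challenge]]:
    """Challenge every claim; return survivor ids plus the full challenge records."""
    challenges = [challenger(cid, text) for cid, text in claims.items()]
    survivors = [c.claim_id for c in challenges if c.survived]
    return survivors, challenges


def toy_challenger(claim_id: str, text: str) -> Challenge:
    # Stand-in for a real adversarial step: flags claims that name no source.
    survived = "according to" in text.lower()
    objection = "" if survived else "No source is named for this assertion."
    return Challenge(claim_id, objection, survived)


claims = {
    "c-1": "According to the 2023 survey, uptake rose.",
    "c-2": "Uptake rose everywhere.",
}
survivors, challenges = challenge_phase(claims, toy_challenger)
```

Synthesis would consume `survivors`, while `challenges` stays available for audit, including the objections raised against claims that failed.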
Knowledge gaps are first-class objects with importance ratings. They accumulate across sessions and narrow as evidence arrives. The reader knows where the brief stops being reliable — not as a disclaimer, but as a structured finding.
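A gap treated as a first-class object could be as small as the sketch below. The field names, the 1 to 5 importance scale, and the `open_gaps` helper are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class KnowledgeGap:
    gap_id: str
    question: str
    importance: int  # assumed scale: 1 (minor) to 5 (critical)
    evidence_claim_ids: List[str] = field(default_factory=list)
    closed: bool = False


def open_gaps(gaps: List[KnowledgeGap]) -> List[KnowledgeGap]:
    """Return the gaps still open, most important first."""
    return sorted((g for g in gaps if not g.closed), key=lambda g: g.importance, reverse=True)


gaps = [
    KnowledgeGap("g-1", "What is the effect after 2022?", importance=3),
    KnowledgeGap("g-2", "Does the finding hold outside urban areas?", importance=5),
    KnowledgeGap("g-3", "Who funded the original study?", importance=2),
]

# A later session brings evidence that settles one gap.
gaps[2].evidence_claim_ids.append("c-017")
gaps[2].closed = True

remaining = [g.gap_id for g in open_gaps(gaps)]
```

A brief can then list the remaining gaps in order of importance instead of ending with a generic caveat.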
VERIFIED claims from session N reduce re-verification burden in session N+1. The knowledge graph accumulates with use. Contradictions between sessions are preserved, not silently resolved. Session five builds on sessions one through four.
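One way to picture the reduced re-verification burden is a lookup against earlier sessions before any checking starts. The sketch below matches claims by normalised text, which is a deliberate simplification; the function name and the status strings other than VERIFIED are assumptions.

```python
from typing import Dict, List, Tuple


def split_by_prior_status(
    candidate_texts: List[str],
    prior_status: Dict[str, str],
) -> Tuple[List[str], List[str]]:
    """Split candidates into (reusable, needs_verification).

    prior_status maps a normalised claim text to its status in earlier sessions.
    """
    reusable, needs_verification = [], []
    for text in candidate_texts:
        key = text.strip().lower()
        if prior_status.get(key) == "VERIFIED":
            reusable.append(text)
        else:
            # Contested or unseen claims are checked again; nothing is silently resolved.
            needs_verification.append(text)
    return reusable, needs_verification


prior = {
    "the programme reached 12 districts.": "VERIFIED",
    "costs fell by a third.": "CONTESTED",
}
session_claims = [
    "The programme reached 12 districts.",
    "Costs fell by a third.",
    "Enrolment doubled in year two.",
]
reusable, needs_verification = split_by_prior_status(session_claims, prior)
```

Only the contested and the new claim go back through verification; the previously verified one is carried forward.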
Synthesis direction (Question → Brief) and Verification direction (Document → Decision Record) share the same graph, formula, and adversarial mechanism. Ingest an authoritative report; its claims enter the same evidence quality system as retrieved sources.
Source trust hierarchy, claim type vocabulary, scoring weights, and output format are runtime parameters — not prompts or hardcoded logic. The same binary runs policy research, investment due diligence, and regulatory compliance with no architectural change.
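Treating domain rules as runtime parameters might look like the following sketch, where two profiles drive the same function. The `DomainProfile` type, the profile contents, and the fallback rule for unknown sources are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class DomainProfile:
    name: str
    source_tiers: Dict[str, int]       # source kind mapped to tier (1 is most trusted)
    claim_types: Tuple[str, ...]
    scoring_weights: Dict[str, float]
    output_format: str


def tier_for(profile: DomainProfile, source_kind: str) -> int:
    # Unknown source kinds fall one tier below the lowest configured tier.
    lowest = max(profile.source_tiers.values())
    return profile.source_tiers.get(source_kind, lowest + 1)


POLICY = DomainProfile(
    name="policy-research",
    source_tiers={"peer_reviewed": 1, "government_report": 2, "news": 3},
    claim_types=("statistic", "causal", "forecast"),
    scoring_weights={"tier": 0.5, "consensus": 0.3, "recency": 0.2},
    output_format="brief",
)

DUE_DILIGENCE = DomainProfile(
    name="investment-due-diligence",
    source_tiers={"audited_filing": 1, "news": 2, "company_statement": 3},
    claim_types=("financial", "legal", "operational"),
    scoring_weights={"tier": 0.6, "consensus": 0.2, "recency": 0.2},
    output_format="decision-record",
)
```

The code path is identical for both profiles; only the data handed to it differs, which is the sense in which the domain is a parameter rather than a fork.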
None of these domains currently has a tool that does what the engine does. Each is doing the equivalent work manually — in spreadsheets, committee documents, and institutional reports that nobody reads systematically three years later.
Dozens of UN agencies produce simultaneous situation reports on the same crisis. The claims conflict. Nobody tracks which ones are established. OCHA's Humanitarian Needs Overviews are evidence synthesis documents built under time pressure with no structured memory between crises.
A claim circulates in 40 sources. All 40 trace back to one original. That's amplification, not corroboration, but standard tools can't tell the difference. Fact-checking organisations and digital forensics labs do structured evidence work that must itself be defensible.
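The distinction between amplification and corroboration can be made concrete by tracing each source back to its origin and counting distinct origins. The sketch below assumes a `derived_from` map recording which source each item was taken from; how such a map would be built in practice is outside the example.

```python
from typing import Dict, Iterable, Set


def root_origin(source: str, derived_from: Dict[str, str]) -> str:
    """Follow derivation links until reaching a source with no recorded parent."""
    seen = set()
    while source in derived_from and source not in seen:
        seen.add(source)  # guards against circular citation chains
        source = derived_from[source]
    return source


def independent_origins(sources: Iterable[str], derived_from: Dict[str, str]) -> Set[str]:
    """Collapse a set of sources to the distinct origins they trace back to."""
    return {root_origin(s, derived_from) for s in sources}


# 39 reposts plus the original they all copy: 40 sources, one origin.
derived_from = {f"repost-{i}": "original" for i in range(39)}
sources = ["original"] + list(derived_from)
origins = independent_origins(sources, derived_from)

# Add one genuinely separate study and the count of origins rises to two.
with_second_study = independent_origins(sources + ["independent-study"], derived_from)
```

Counting origins rather than sources means forty copies of one report weigh the same as the report alone.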
Government surveys, community testimonies, environmental studies, and corporate reports all exist in the same dispute. Contradictions between them are resolved by institutional power, not by evidence quality.
Treaty bodies assess state compliance claims cycle after cycle. Previous findings sit in PDF reports. Each new cycle starts from near-zero institutional memory. States report; bodies assess. Currently this is manual evidence synthesis with no compounding knowledge.
A company claims 95% accuracy. An independent study finds 60% on a specific demographic. Both claims exist in the public record. Neither is resolved — they just accumulate. Civil society and government auditors assessing AI harm need structured evidence work.
Government and defence procurement decisions are made over 18 months, across teams, based on claims that need to be traceable to source when the auditor arrives two years later. Capability, price, risk, compliance — each requires provenance, each is subject to challenge.
The best strategy research already works the way the engine works: individual claims are sourced and graded, contradictions between data points are noted, gaps in the evidence are named, and the final recommendation is honest about its confidence level. What it doesn't do is carry that structure forward to the next engagement, the next client, the next analyst who joins the team.
The structured brief a senior consultant produces for a board is a claim vault — it just doesn't look like one, and it evaporates when the project ends. The engine is what that process looks like when the institutional memory is preserved rather than PDF'd into an archive.
The EU AI Act, UNESCO's AI Ethics Recommendation, and a growing number of national frameworks share one underlying requirement: AI used in high-stakes contexts must be explainable and traceable, not just technically but epistemically. The question is not only what the system output, but what evidence it drew on, where that evidence conflicted, and what was uncertain when the decision was made.
Every claim carries its source tier, citations, and confidence derivation. Not a summary — a structured assertion with provenance.
Weak and contested findings surface explicitly. The brief reflects what the evidence supports — not what sounds most authoritative.
When a decision is logged, the full evidence state at that moment is preserved: which claims were verified, which were contested, and which gaps were acknowledged. Record-keeping of the kind described in Article 12 of the EU AI Act comes as a byproduct.
When a researcher marks a claim as evidence-grounded, its confidence rises. The system reflects researcher judgment, not just model output.
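Preserving the evidence state at the moment of a decision amounts to taking an immutable snapshot. The following is a minimal sketch under that reading; the `log_decision` function and the record layout are hypothetical.

```python
import copy
from datetime import datetime, timezone


def log_decision(decision, claims, gaps):
    """Return a decision record holding a deep copy of the evidence state."""
    return {
        "decision": decision,
        "logged_at": datetime.now(timezone.utc).isoformat(),
        # Deep copies, so later edits to live claims cannot alter the record.
        "claims": copy.deepcopy(claims),
        "gaps": copy.deepcopy(gaps),
    }


claims = {
    "c-1": {"status": "VERIFIED", "confidence": 80},
    "c-2": {"status": "CONTESTED", "confidence": 40},
}
gaps = ["No independent data after 2022."]

record = log_decision("Proceed with option A", claims, gaps)

# Evidence keeps evolving after the decision; the record does not.
claims["c-2"]["status"] = "VERIFIED"
gaps.clear()
```

An auditor reading `record` later sees that one claim was still contested and one gap still open when the decision was taken, whatever the live graph says by then.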
The claim extraction, confidence scoring, contradiction detection, gap tracking, and decision logging are domain-agnostic. What changes between implementations is the source tier definitions, the claim type vocabulary, and the output artefact format.
We're in active development and talking to organisations where defensible, compounding evidence matters. If you recognise your domain in this page, we'd like to hear about the specific problem before we describe the solution.