Proof-Carrying Generation for Multi-Agent Systems
Every accepted answer is split into atomic claims and audited along five channels.
Submit a question with optional evidence — pasted text, a URL, or any uploaded document — and watch the demo-runtime PCG-MAS pipeline produce per-claim certificates, mask-and-replay responsibility scores, audit envelopes, and a calibrated risk-aware action in real time.
Drag & drop a file here, or click the button above.
PDF, DOCX, CSV, XLSX, MD, TXT, JSON, HTML
Currently only one file can be uploaded at a time.
V_I Integrity (evidence commitment) · V_R Replay (support-pipeline) · V_D Drift (semantic replay drift) · V_Ch Checker (entailment + execution contract) · V_Cov Coverage (cited support covers the claim)
Run PCG-MAS to extract atomic claims and audit them along the 5 channels.
No certificate yet. Run PCG-MAS first.
Per-claim verdict matrix
| Claim | V_I | V_R | V_D | V_Ch | V_Cov | Action | Hash |
|---|---|---|---|---|---|---|---|
Full structured certificate
Runs the selected model on the same input under 1 ORIGINAL + 8 canonical adversarial perturbations. Each variant goes through the full 5-channel pipeline. Output rows show the risk action chosen by the controller.
Click the button above to start.
For each accepted claim, every cited component is masked one at a time and the certificate is re-evaluated. Components whose removal flips the verdict to reject are the ones responsible for the acceptance. Estimates carry distribution-free Hoeffding confidence intervals.
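The mask-and-replay loop described above can be sketched roughly as follows. This is a minimal illustration, not the Space's `responsibility.py`: the `accepts` callback, the `trials` count, and the function names are all assumptions made for the example. The interval uses the standard two-sided Hoeffding bound for a mean of values in [0, 1].

```python
import math
from typing import Callable, Dict, Sequence, Tuple


def hoeffding_interval(flips: int, trials: int, delta: float = 0.05) -> Tuple[float, float]:
    """Two-sided Hoeffding interval for a flip rate, valid with probability >= 1 - delta."""
    if trials <= 0 or not 0.0 < delta < 1.0:
        raise ValueError("need trials > 0 and 0 < delta < 1")
    mean = flips / trials
    half_width = math.sqrt(math.log(2.0 / delta) / (2.0 * trials))
    # Clip to [0, 1] because a flip rate cannot leave that range.
    return max(0.0, mean - half_width), min(1.0, mean + half_width)


def mask_and_replay(
    components: Sequence[str],
    accepts: Callable[[Sequence[str]], bool],  # hypothetical verifier: True if the claim is still accepted
    trials: int = 20,
    delta: float = 0.05,
) -> Dict[str, Tuple[float, Tuple[float, float]]]:
    """Mask each cited component in turn and estimate how often acceptance flips to reject."""
    scores = {}
    for masked in components:
        remaining = [c for c in components if c != masked]
        flips = sum(0 if accepts(remaining) else 1 for _ in range(trials))
        scores[masked] = (flips / trials, hoeffding_interval(flips, trials, delta))
    return scores
```

With a verifier that accepts only while a component named `e1` is present, masking `e1` yields a flip rate of 1.0 and masking any other component yields 0.0, so `e1` alone is flagged as responsible.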
Expected cost per action
| Action | Expected cost C(a,r) | Residual risk |
|---|---|---|
Confidence intervals on each channel's pass rate across this run's claims. Small N uses the Wilson-score interval; larger N switches to a paired bootstrap (see the paper appendix).
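For reference, a Wilson-score interval on a channel pass rate can be computed as in the sketch below. The function name and the default `z = 1.96` (roughly 95% coverage) are assumptions for illustration; the bootstrap branch and the N at which the demo switches methods are not shown because the page does not specify them.

```python
import math
from typing import Tuple


def wilson_interval(passes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson-score interval for a binomial pass rate (passes out of n claims)."""
    if n <= 0 or not 0 <= passes <= n:
        raise ValueError("need n > 0 and 0 <= passes <= n")
    p = passes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return center - half_width, center + half_width
```

Unlike the plain normal approximation, this interval stays inside [0, 1] and does not collapse to zero width when every claim passes or every claim fails, which is why it suits runs with only a handful of claims.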
Run any input — text, URL, or file — and compare the raw LLM answer with the full PCG-MAS pipeline's verified answer + certificate.
Curated comparisons auto-load the question + the source document. Or fill the fields above manually.
Run the comparison above to see output.
Each atomic claim is audited along five independent channels. A claim is accepted only if all five channels pass.
| Symbol | Channel | Role |
|---|---|---|
| V_I | Integrity | Evidence-commitment check: recomputes SHA-256 of cited supports; fails on any tamper. |
| V_R | Replay | Re-runs the support-pipeline proposer; fails if the replay cannot reconstruct an answer. |
| V_D | Drift | Semantic-equivalence judge between original claim and replay; fails on drift below the threshold. |
| V_Ch | Checker | Entailment + execution-contract: cited evidence must logically entail the claim, no contradictions. |
| V_Cov | Coverage | Substantive coverage judge: cited evidence must address the claim's propositions, not merely overlap topically. |
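
The all-channels-must-pass rule and the V_I integrity check can be illustrated with a short sketch. This assumes supports are plain strings committed as hex SHA-256 digests and that verdicts arrive as a dict keyed by channel symbol; both are simplifications for the example, not the schema used in `schemas.py` or `channels.py`.

```python
import hashlib
from typing import Dict, Sequence

CHANNELS = ("V_I", "V_R", "V_D", "V_Ch", "V_Cov")


def commit(support: str) -> str:
    """Hex SHA-256 digest of one cited support (assumed to be UTF-8 text)."""
    return hashlib.sha256(support.encode("utf-8")).hexdigest()


def integrity_passes(supports: Sequence[str], committed: Sequence[str]) -> bool:
    """V_I sketch: recompute each digest and compare it with the committed one."""
    if len(supports) != len(committed):
        return False
    return all(commit(s) == h for s, h in zip(supports, committed))


def accepted(verdicts: Dict[str, bool]) -> bool:
    """A claim is accepted only if every one of the five channels passed."""
    # A missing channel counts as a failure rather than a silent pass.
    return all(verdicts.get(channel, False) for channel in CHANNELS)
```

Changing a single character of a cited support changes its digest, so `integrity_passes` returns `False` and `accepted` rejects the claim regardless of the other four channels.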
One /api/run call streams these events over SSE, in this order:
N is the number of atomic claims extracted. channel events fire twice per
(claim, channel) pair: first pending, then pass or fail.
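A client might fold the stream of channel events into a verdict matrix along the lines of the sketch below. The `(claim index, channel symbol, status)` tuple is an assumed shape chosen for the example; the actual SSE payload format of /api/run is not specified here.

```python
from typing import Dict, Iterable, Tuple

# Assumed event shape: (claim index, channel symbol, status).
ChannelEvent = Tuple[int, str, str]


def fold_channel_events(events: Iterable[ChannelEvent]) -> Dict[Tuple[int, str], str]:
    """Track each (claim, channel) pair from 'pending' to its final 'pass' or 'fail'."""
    matrix: Dict[Tuple[int, str], str] = {}
    for claim, channel, status in events:
        key = (claim, channel)
        if status == "pending":
            matrix[key] = "pending"
        elif status in ("pass", "fail"):
            # The terminal event is expected to follow a pending event for the same pair.
            if matrix.get(key) != "pending":
                raise ValueError(f"{status!r} for {key} arrived before 'pending'")
            matrix[key] = status
        else:
            raise ValueError(f"unknown status {status!r}")
    return matrix
```

Because each pair produces two events, a run with N claims and five channels emits 10N channel events in total.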
Posterior false-accept risk r ∈ [0,1] is computed from the
per-claim acceptance pattern and confidences. The threshold policy
(Theorem 3, part ii) selects one of four actions by
minimising expected cost C(a, r):
| Range of r | Action | Meaning |
|---|---|---|
| r < 0.20 | Answer | Risk is low; emit the answer directly. |
| 0.20 ≤ r < 0.55 | Verify | Run a second-pass verification before committing. |
| 0.55 ≤ r < 0.85 | Escalate | Hand off to a stronger verifier or human review. |
| r ≥ 0.85 | Refuse | Decline to answer; residual risk too high. |
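
The threshold table translates directly into a lookup. The sketch below hard-codes the three cut points from the table; the function name is hypothetical, and it does not reproduce the cost function C(a, r) from which the paper derives these thresholds.

```python
def choose_action(r: float) -> str:
    """Map posterior false-accept risk r in [0, 1] to one of the four actions."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie in [0, 1]")
    if r < 0.20:
        return "Answer"
    if r < 0.55:
        return "Verify"
    if r < 0.85:
        return "Escalate"
    return "Refuse"
```

Each boundary value belongs to the higher-risk action, so `choose_action(0.20)` returns `"Verify"` and `choose_action(0.85)` returns `"Refuse"`, matching the half-open ranges in the table.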
PCG-MAS — Proof-Carrying Generation for Multi-Agent Systems — splits every answer into atomic claims and audits each claim along five independent channels. Accepted answers carry a tamper-evident certificate; a risk-aware controller then chooses between Answer, Verify, Escalate, and Refuse.
Scope of this Space
This Hugging Face Space runs a demo-runtime PCG-MAS pipeline
that mirrors the paper algorithms over a compact certificate schema. It is
intentionally lean so it deploys quickly and is easy to audit. The full
experimental implementations used for R1–R5 evaluation remain in
src/pcg/ in the source repository and are used for benchmark
reproduction.
Demo runtime modules: schemas.py, claim_extractor.py,
channels.py, responsibility.py, risk_control.py,
audit_envelopes.py, redundancy.py, pipeline.py.
Privacy & data
No telemetry, no analytics, no key persistence. API keys live in the browser session and in process memory for the duration of a single request only.
Source code
Public repository: github.com/supratik-sarkar/proof-carrying-multi-agents