Proof-Carrying Generation for Multi-Agent Systems

Every accepted answer is split into atomic claims and audited along five channels.

Submit a question with optional evidence — pasted text, a URL, or any uploaded document — and watch the demo-runtime PCG-MAS pipeline produce per-claim certificates, mask-and-replay responsibility scores, audit envelopes, and a calibrated risk-aware action in real time.

demo runtime · 5 channels · BYOK · no telemetry

Backend

API key · Top-k evidence

Run V_R replay

Input

Question / claim

Context (optional)

Or load a curated example

Aggregate pipeline

V_I · Integrity

V_R · Replay

V_D · Drift

V_Ch · Checker

V_Cov · Coverage

idle · running · verified · rejected

V_I Integrity (evidence commitment) · V_R Replay (support-pipeline) · V_D Drift (semantic replay drift) · V_Ch Checker (entailment + execution contract) · V_Cov Coverage (cited support covers the claim)

Run PCG-MAS to extract atomic claims and audit them along the 5 channels.

Certificate Inspector

Per-claim verdict matrix

Claim	V_I	V_R	V_D	V_Ch	V_Cov	Action	Hash

Full structured certificate

Risk-aware decision

Action chosen —

Posterior risk r —

Dominant failure —

Expected cost per action

Action	Expected cost C(a,r)	Residual risk

Per-channel audit envelopes

Bootstrap / Wilson-score confidence intervals on each channel's pass rate across this run's claims. At small N the intervals use the Wilson score; at larger N they switch to a paired bootstrap (see the paper appendix).
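As a rough illustration of the small-N case, the Wilson score interval for a channel's pass rate can be computed as below. This is a minimal sketch, not the Space's own code: the function name `wilson_interval` and the default `z = 1.96` (about 95% coverage) are assumptions, and the paired-bootstrap branch used at larger N is not shown.

```python
import math


def wilson_interval(passes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial pass rate.

    passes: number of claims that passed the channel (hypothetical input)
    n:      number of claims audited on that channel
    z:      normal quantile; 1.96 is an assumed default (~95% coverage)
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= passes <= n:
        raise ValueError("passes must lie in [0, n]")
    p_hat = passes / n
    denom = 1.0 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    # Clamp to [0, 1] to absorb floating-point overshoot at the extremes.
    return max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    low, high = wilson_interval(4, 5)
    print(f"4/5 passes: [{low:.3f}, {high:.3f}]")
```

Unlike the plain normal approximation, the Wilson interval stays inside [0, 1] and remains informative when every claim passes or every claim fails, which is the usual reason to prefer it for a handful of claims.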

Five-channel checker

Each atomic claim is audited along five independent channels. A claim is accepted only if all five channels pass.

Symbol	Channel	Role
`V_I`	Integrity	Evidence-commitment check: recomputes SHA-256 of cited supports; fails on any tamper.
`V_R`	Replay	Re-runs the support-pipeline proposer; fails if the replay cannot reconstruct an answer.
`V_D`	Drift	Semantic-equivalence judge between original claim and replay; fails when equivalence falls below the threshold.
`V_Ch`	Checker	Entailment + execution-contract: cited evidence must logically entail the claim, no contradictions.
`V_Cov`	Coverage	Substantive coverage judge: cited evidence must address the claim's propositions, not merely overlap topically.
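
The acceptance rule and the `V_I` check in the table above can be sketched in a few lines. This is a hedged illustration only: the helper names (`commit`, `integrity_passes`, `claim_accepted`), the use of hex digests, and the representation of verdicts as a dict of booleans are assumptions, not the Space's actual data model.

```python
import hashlib

# Channel symbols as listed in the table above.
CHANNELS = ("V_I", "V_R", "V_D", "V_Ch", "V_Cov")


def commit(support: str) -> str:
    """SHA-256 commitment of one cited support (hex digest is an assumption)."""
    return hashlib.sha256(support.encode("utf-8")).hexdigest()


def integrity_passes(supports: list[str], commitments: list[str]) -> bool:
    """V_I sketch: recompute each hash and fail on any mismatch."""
    if len(supports) != len(commitments):
        return False
    return all(commit(s) == c for s, c in zip(supports, commitments))


def claim_accepted(verdicts: dict[str, bool]) -> bool:
    """A claim is accepted only if all five channels pass.

    A missing channel is treated as a failure (a conservative assumption).
    """
    return all(verdicts.get(channel, False) for channel in CHANNELS)


if __name__ == "__main__":
    supports = ["Water boils at 100 °C at sea level."]
    commitments = [commit(s) for s in supports]
    print(integrity_passes(supports, commitments))           # untouched evidence
    print(integrity_passes(["tampered text"], commitments))  # altered evidence
```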

Event choreography

One /api/run call streams these events over SSE, in this order:

start → evidence → claim → channel × 5N → claim_cert × N → responsibility × N → audit_envelope × 5 → risk → certificate → done

N is the number of atomic claims extracted. Each of the 5N (claim, channel) pairs emits two channel events: first pending, then pass or fail.
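
A client consuming the stream might sanity-check the choreography described above. The sketch below works only on event names and channel states that have already been parsed; it does not read the SSE wire format, and it assumes that `certificate` and `done` are two separate events. The function names and the tuple layout `(claim_id, channel, state)` are hypothetical.

```python
# Stage names in the documented order (certificate and done assumed separate).
STAGE_ORDER = [
    "start", "evidence", "claim", "channel", "claim_cert",
    "responsibility", "audit_envelope", "risk", "certificate", "done",
]


def in_documented_order(names: list[str]) -> bool:
    """True if event names never move back to an earlier stage."""
    rank = {name: i for i, name in enumerate(STAGE_ORDER)}
    last = 0
    for name in names:
        if name not in rank or rank[name] < last:
            return False
        last = rank[name]
    return True


def channel_states_valid(events: list[tuple[str, str, str]]) -> bool:
    """True if every (claim, channel) pair goes pending, then pass or fail."""
    seen: dict[tuple[str, str], list[str]] = {}
    for claim_id, channel, state in events:
        seen.setdefault((claim_id, channel), []).append(state)
    return all(
        len(states) == 2 and states[0] == "pending" and states[1] in ("pass", "fail")
        for states in seen.values()
    )


if __name__ == "__main__":
    names = ["start", "evidence", "claim", "channel", "channel",
             "claim_cert", "responsibility", "audit_envelope",
             "risk", "certificate", "done"]
    print(in_documented_order(names))
```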

Risk-aware control regimes

Posterior false-accept risk r ∈ [0,1] is computed from the per-claim acceptance pattern and confidences. The threshold policy (Theorem 3, part ii) selects one of four actions by minimising expected cost C(a, r):

Range of r	Action	Meaning
`r < 0.20`	Answer	Risk is low; emit the answer directly.
`0.20 ≤ r < 0.55`	Verify	Run a second-pass verification before committing.
`0.55 ≤ r < 0.85`	Escalate	Hand off to a stronger verifier or human review.
`r ≥ 0.85`	Refuse	Decline to answer; residual risk too high.
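
The table above translates directly into a threshold lookup. The sketch below encodes only the published cut-offs; it does not compute the expected cost C(a, r), whose form is not given here, and the function name `choose_action` is an assumption.

```python
def choose_action(r: float) -> str:
    """Map posterior false-accept risk r in [0, 1] to one of four actions.

    Thresholds (0.20, 0.55, 0.85) are taken from the table; each range is
    closed on the left and open on the right, except the last.
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie in [0, 1]")
    if r < 0.20:
        return "Answer"
    if r < 0.55:
        return "Verify"
    if r < 0.85:
        return "Escalate"
    return "Refuse"


if __name__ == "__main__":
    for risk in (0.05, 0.20, 0.60, 0.85):
        print(risk, choose_action(risk))
```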

Scope of this Space

Privacy & data

Source code