Scan Model Output
Check generated model output before users, workflows, or agents act on it.
Goal
Scan AI output before it reaches the user or a downstream tool.
Use this for chat responses, generated summaries, extraction results, agent tool output, draft emails, risk decisions, and claim recommendations.
Architecture
- Scan the user input with `scan_phase=input`.
- Run the model only if input routing allows it.
- Scan the model output with `scan_phase=output`.
- Reuse the input scan's `scan_group_id`.
- Show, redact, review, or block the output.
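A minimal sketch of that sequence, assuming a hypothetical `scanUserInput` helper (shaped like `scanModelOutput` below) and a hypothetical `callModel` LLM client; `scanModelOutput` and `routeModelOutput` are defined later on this page:

```ts
// Sketch only: scanUserInput and callModel stand in for your own
// input-scan helper and LLM client. The field names on inputScan
// follow the scan response described below.
export async function answerSafely(prompt: string, sessionId: string) {
  const inputScan = await scanUserInput({ input: prompt, sessionId });
  if (inputScan.action === "BLOCK") {
    return { type: "blocked_input" as const };
  }

  const output = await callModel(prompt);

  // Reuse the input scan's group id so both scans stay connected.
  const outputScan = await scanModelOutput({
    output,
    originalPrompt: prompt,
    scanGroupId: inputScan.scan_group_id,
    sessionId,
  });

  return routeModelOutput(outputScan);
}
```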
Request And Response
```ts
export async function scanModelOutput({
output,
originalPrompt,
scanGroupId,
sessionId,
dataSensitivity = "strict",
}: {
output: string;
originalPrompt: string;
scanGroupId: string;
sessionId: string;
dataSensitivity?: "standard" | "tolerant" | "strict";
}) {
const response = await fetch("https://gateway.trymighty.ai/v1/scan", {
method: "POST",
headers: {
Authorization: `Bearer ${process.env.MIGHTY_API_KEY}`,
"Content-Type": "application/json",
},
body: JSON.stringify({
content: output,
content_type: "text",
scan_phase: "output",
mode: "secure",
focus: "both",
profile: "ai_safety",
data_sensitivity: dataSensitivity,
scan_group_id: scanGroupId,
session_id: sessionId,
original_prompt: originalPrompt,
}),
});
if (!response.ok) {
throw new Error(`Mighty output scan failed with ${response.status}`);
}
return response.json();
}
```

Use `data_sensitivity=strict` for public user-visible output. Use `data_sensitivity=tolerant` only for internal summaries, claim notes, or reviewer-only output where business PII is expected.
`threats` is an array of objects with `category`, `confidence`, an optional `evidence` excerpt (the substring that triggered the rule), and a human-readable `reason`. When `redacted_output` is present and policy allows substitution, prefer it over the raw model text.
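A sketch of consuming that shape, assuming the field names above (confirm them against the actual scan response before relying on this interface):

```ts
// Sketch only: Threat mirrors the description above, not a published type.
interface Threat {
  category: string;
  confidence: number;
  evidence?: string; // substring that triggered the rule
  reason: string;
}

function pickDisplayText(
  scan: { threats?: Threat[]; redacted_output?: string },
  rawOutput: string
): string {
  // Log each detected threat for audit before choosing what to show.
  for (const threat of scan.threats ?? []) {
    console.warn(`scan threat: ${threat.category} (${threat.reason})`);
  }
  // Prefer the redacted text over the raw model output when available.
  return scan.redacted_output ?? rawOutput;
}
```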
Routing Logic
```ts
export function routeModelOutput(scan: {
action: string;
redacted_output?: string;
}) {
if (scan.action === "ALLOW") {
return { type: "show_original" as const };
}
if (scan.action === "WARN") {
return { type: "show_safe_fallback_and_review" as const };
}
if (scan.redacted_output) {
return { type: "show_redacted" as const, content: scan.redacted_output };
}
return { type: "block" as const };
}
```

Output Tolerance
Generated output needs a tolerance setting because the same text can be acceptable in an internal review note and unacceptable in a public assistant answer.
| Output surface | Suggested settings | Routing |
|---|---|---|
| Public assistant answer | `profile=ai_safety`, `data_sensitivity=strict` | Show on ALLOW. Show `redacted_output` when returned. Block otherwise. |
| Internal claim summary | `profile=balanced`, `data_sensitivity=tolerant` | Show on ALLOW. Send WARN to review. Stop automation on BLOCK. |
| OCR or IDP summary | `focus=both`, `data_sensitivity=tolerant` | Keep file scan, OCR scan, and summary scan connected by `scan_group_id`. |
| Agent tool output | `profile=ai_safety` or `code_assistant`, `data_sensitivity=standard` | Keep WARN and BLOCK output out of model context unless reviewed. |
| High-stakes recommendation | `profile=strict`, `data_sensitivity=strict` | Route WARN, BLOCK, and indeterminate results to review. |
`tolerant` does not mean unsafe output is allowed. It means that expected business PII should not, on its own, trigger a block.
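One way to encode the table above as a helper. The `OutputSurface` names are made up for this sketch and are not part of the Mighty API:

```ts
// Sketch only: maps the output surfaces from the table above to the
// suggested scan settings. Adjust the surface names to your own domain.
type OutputSurface =
  | "public_answer"
  | "internal_summary"
  | "ocr_summary"
  | "agent_tool_output"
  | "high_stakes_recommendation";

function scanSettingsFor(surface: OutputSurface) {
  switch (surface) {
    case "public_answer":
      return { profile: "ai_safety", data_sensitivity: "strict" };
    case "internal_summary":
      return { profile: "balanced", data_sensitivity: "tolerant" };
    case "ocr_summary":
      return { focus: "both", data_sensitivity: "tolerant" };
    case "agent_tool_output":
      return { profile: "ai_safety", data_sensitivity: "standard" };
    case "high_stakes_recommendation":
      return { profile: "strict", data_sensitivity: "strict" };
  }
}
```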
Common Mistakes
- Scanning only the prompt. Model output can leak secrets, repeat unsafe instructions, or turn suspicious OCR into trusted wording.
- Creating a new `scan_group_id` for output. Reuse the input scan group.
- Showing output while the output scan is still pending.
- Using `data_sensitivity=tolerant` for public AI output.
- Failing open on scan errors in high-risk flows (see the fail-closed sketch after this list).
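A minimal fail-closed sketch for that last point, wrapping the `scanModelOutput` and `routeModelOutput` helpers above:

```ts
// Sketch only: if the output scan itself fails, fall back to a safe
// canned response instead of showing unscanned model text.
async function routeOrFallback(args: Parameters<typeof scanModelOutput>[0]) {
  try {
    const scan = await scanModelOutput(args);
    return routeModelOutput(scan);
  } catch (error) {
    console.error("output scan failed, failing closed", error);
    return { type: "show_safe_fallback_and_review" as const };
  }
}
```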
Production Checklist
- Always keep input and output scans connected with `scan_group_id`.
- Store `original_prompt` when useful for audit.
- Use `profile=ai_safety` for public AI surfaces.
- Use `data_sensitivity=strict` for public output.
- Use `data_sensitivity=tolerant` only for internal PII-heavy output.
- Use a safe fallback if the output scan fails.
- Do not show BLOCK output to users.
- Add tests for redacted output and unsafe output paths (see the sketch below).
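A test sketch for those paths, assuming Vitest (any test runner works) and the `routeModelOutput` helper above:

```ts
import { describe, expect, it } from "vitest";
import { routeModelOutput } from "./routeModelOutput"; // hypothetical path

// Sketch only: covers the routing paths the checklist calls out.
describe("routeModelOutput", () => {
  it("shows the original text on ALLOW", () => {
    expect(routeModelOutput({ action: "ALLOW" })).toEqual({
      type: "show_original",
    });
  });

  it("falls back and flags review on WARN", () => {
    expect(routeModelOutput({ action: "WARN" })).toEqual({
      type: "show_safe_fallback_and_review",
    });
  });

  it("prefers redacted output on BLOCK when present", () => {
    expect(
      routeModelOutput({ action: "BLOCK", redacted_output: "[REDACTED]" })
    ).toEqual({ type: "show_redacted", content: "[REDACTED]" });
  });

  it("blocks when nothing safe is available", () => {
    expect(routeModelOutput({ action: "BLOCK" })).toEqual({ type: "block" });
  });
});
```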
Ready to scan real traffic?
Create an API key, keep it on your server, then wire Mighty into the workflow that handles untrusted material.
AI-Agent Prompt
Paste this into Cursor, Codex, Claude Code, or Windsurf.
```text
Add Mighty output scanning around model responses.
Requirements:
- The app already scans user input or must add that first.
- Reuse the input scan_group_id when scanning output.
- Send output to POST https://gateway.trymighty.ai/v1/scan.
- Use content_type=text, scan_phase=output, mode=secure, focus=both, profile=ai_safety.
- Use data_sensitivity=strict for public output.
- Use data_sensitivity=tolerant only for internal output where business PII is expected.
- Include original_prompt when available.
- Route ALLOW to show output.
- Route WARN to safe fallback plus review.
- Route BLOCK to redacted_output if present, otherwise block.
Acceptance criteria:
- Output never reaches the user before scan routing.
- Tests cover ALLOW, WARN, BLOCK, redacted_output, and scan failure.
- Logs connect input and output scans by scan_group_id.
```