Scan Text And OCR Output
Use Mighty on plain text, extracted text, OCR output, and IDP fields before downstream automation trusts them.
Goal
Scan text before it reaches AI, search, indexing, workflow automation, or a human review queue.
This is the right guide when you already have text. That text can come from a form, chat message, OCR engine, IDP pipeline, email parser, PDF extractor, or agent tool.
Architecture
- Receive text from the user or pipeline.
- Call POST /v1/scan with content_type=text.
- Store scan_id, request_id, scan_group_id, and action.
- Route ALLOW, WARN, or BLOCK.
- Only send safe or reviewed text to downstream AI or automation.
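The store-and-route steps above can be sketched as a small pure function. This is a minimal sketch, not Mighty's SDK: `ScanResponse` is an assumed subset of the response that contains only the identifiers named in the list, and `WorkflowScanRecord`, `buildScanRecord`, `workflow_id`, and `forward_to_downstream` are hypothetical names for illustration.

```typescript
// Assumed subset of the scan response: only the identifiers listed above.
interface ScanResponse {
  scan_id: string;
  request_id: string;
  scan_group_id: string;
  action: string;
}

// Hypothetical record to store next to the workflow.
interface WorkflowScanRecord extends ScanResponse {
  workflow_id: string;
  forward_to_downstream: boolean;
}

export function buildScanRecord(
  workflowId: string,
  scan: ScanResponse,
): WorkflowScanRecord {
  return {
    workflow_id: workflowId,
    scan_id: scan.scan_id,
    request_id: scan.request_id,
    scan_group_id: scan.scan_group_id,
    action: scan.action,
    // Only ALLOW is forwarded automatically. WARN, BLOCK, and any
    // unrecognized action wait for review.
    forward_to_downstream: scan.action === "ALLOW",
  };
}
```

Keeping the forwarding decision on the stored record means a later audit can see both what the scan returned and what the pipeline did with it.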
Request And Response
export async function scanText(content: string, workflowId: string) {
  const response = await fetch("https://gateway.trymighty.ai/v1/scan", {
    method: "POST",
    headers: {
      // Read the key from the server environment; never ship it to the browser.
      Authorization: `Bearer ${process.env.MIGHTY_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      content,
      content_type: "text",
      scan_phase: "input",
      mode: "secure",
      focus: "both",
      profile: "balanced",
      data_sensitivity: "tolerant",
      // Reuse the workflow ID so scans from the same workflow share a session.
      session_id: workflowId,
      metadata: {
        source: "ocr_output",
      },
    }),
  });
  // Treat any non-2xx status as a failed scan.
  if (!response.ok) {
    throw new Error(`Mighty scan failed with ${response.status}`);
  }
  return response.json();
}
Use data_sensitivity=tolerant when OCR text normally contains names, addresses, phone numbers, policy numbers, invoice numbers, or claim details.
In the response, threats is an array of objects, each with a category, a confidence, an optional evidence excerpt, and a human-readable reason.
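A typed view of one threats entry can make review queues easier to build. The sketch below is an assumption-laden illustration: the property names `evidence` and `reason` and the numeric type of `confidence` are guesses based on the description above, not confirmed field names, and `summarizeThreats` is a hypothetical helper.

```typescript
// Assumed shape of one threats entry. Property names and the numeric
// confidence are inferred from the prose description, not from an API spec.
interface Threat {
  category: string;
  confidence: number;
  evidence?: string;
  reason: string;
}

// Hypothetical helper: one readable line per threat for a reviewer.
export function summarizeThreats(threats: Threat[]): string[] {
  return threats.map((threat) => {
    const base = `${threat.category} (${threat.confidence.toFixed(2)}): ${threat.reason}`;
    // Include a short evidence excerpt only when the scan returned one.
    return threat.evidence
      ? `${base} [evidence: ${threat.evidence.slice(0, 80)}]`
      : base;
  });
}
```

Truncating the evidence excerpt keeps long OCR passages out of queue listings while the full text stays available in the stored scan record.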
Routing Logic
export function routeScannedText(scan: { action: string }) {
  switch (scan.action) {
    case "ALLOW":
      return { decision: "continue" as const };
    case "WARN":
      return { decision: "queue_review" as const };
    case "BLOCK":
      return { decision: "stop" as const };
    default:
      // Unrecognized action: fall back to human review.
      return { decision: "queue_review" as const };
  }
}
AI Fraud And OCR
OCR text can carry hidden or altered instructions. A PDF can contain normal visible text plus hidden text that says to ignore policy, approve a claim, or exfiltrate data. Mighty helps catch those signals before your AI or IDP system treats the text as trusted.
Honest wording for your product: "This item was flagged for review." Avoid saying "This item is fraudulent" unless your own review process confirms it.
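One way to keep that wording consistent is to derive user-facing text from the routing decision in a single place. This is a minimal sketch: `reviewerMessage` and the `fraudConfirmedByReview` flag are hypothetical, and the confirmed-fraud sentence is illustrative wording rather than text from Mighty.

```typescript
type Decision = "continue" | "queue_review" | "stop";

// Hypothetical helper. Only the "flagged for review" sentence comes from
// the guidance above; the confirmed-fraud sentence is illustrative.
export function reviewerMessage(
  decision: Decision,
  fraudConfirmedByReview = false,
): string | null {
  // Nothing to show when the scan let the item through.
  if (decision === "continue") return null;
  // Only claim fraud after your own review process has confirmed it.
  if (fraudConfirmedByReview) return "Review confirmed this item as fraudulent.";
  return "This item was flagged for review.";
}
```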
Production Checklist
- Keep original text and scanned text versioned if your retention policy allows it.
- Store scan_group_id with the workflow record.
- Send model output from the same workflow with scan_phase=output.
- Log request_id and scan_id.
- Route unknown scan errors to review when the workflow is high risk.
- Use data_sensitivity=tolerant for expected contact PII.
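Several checklist items can be sketched as small helpers. The following is a hedged illustration, not an official client: `buildOutputScanBody`, `formatScanLog`, and `decisionOnScanError` are hypothetical names, the output-phase body simply mirrors the input-phase request with scan_phase set to output, and the low-risk fallback is left to the caller because this guide does not prescribe one.

```typescript
// Request body for scanning model output. It mirrors the input-phase body
// and reuses the workflow ID as session_id (an assumption about how the
// two phases are tied together).
export function buildOutputScanBody(modelOutput: string, workflowId: string) {
  return {
    content: modelOutput,
    content_type: "text",
    scan_phase: "output",
    mode: "secure",
    focus: "both",
    profile: "balanced",
    data_sensitivity: "tolerant",
    session_id: workflowId,
  };
}

// Structured log line carrying the two identifiers the checklist asks for.
// The "event" label is illustrative.
export function formatScanLog(scan: {
  request_id: string;
  scan_id: string;
  action: string;
}): string {
  return JSON.stringify({
    event: "mighty_scan",
    request_id: scan.request_id,
    scan_id: scan.scan_id,
    action: scan.action,
  });
}

type ErrorDecision = "continue" | "queue_review" | "stop";

// High-risk workflows send scan errors to review; other workflows use
// whatever fallback the caller chooses.
export function decisionOnScanError(
  highRisk: boolean,
  lowRiskFallback: ErrorDecision,
): ErrorDecision {
  return highRisk ? "queue_review" : lowRiskFallback;
}
```

A caller would typically wrap the scan request in try/catch and pass the result of `decisionOnScanError` to the same routing code that handles ALLOW, WARN, and BLOCK.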
AI-Agent Prompt
Paste this into Cursor, Codex, Claude Code, or Windsurf.
Add Mighty scanning to the OCR or IDP pipeline.
Requirements:
- Use server env MIGHTY_API_KEY.
- Call POST https://gateway.trymighty.ai/v1/scan.
- Send extracted text with content_type=text and scan_phase=input.
- Use mode=secure, focus=both, data_sensitivity=tolerant.
- Store scan_id, request_id, scan_group_id, session_id, action, risk_score, and threats.
- Route ALLOW to continue.
- Route WARN to human review.
- Route BLOCK to stop or require manual override.
- Do not send WARN or BLOCK content to downstream AI without review.
Acceptance criteria:
- Unit tests cover ALLOW, WARN, BLOCK.
- Integration code never exposes MIGHTY_API_KEY to the browser.
- Logs include request_id and scan_id.