Scan Text And OCR Output

Use Mighty on plain text, extracted text, OCR output, and IDP fields before downstream automation trusts them.

Goal

Scan text before it reaches AI, search, indexing, workflow automation, or a human review queue.

This is the right guide when you already have text. That text can come from a form, chat message, OCR engine, IDP pipeline, email parser, PDF extractor, or agent tool.

Architecture

  1. Receive text from the user or pipeline.
  2. Call POST /v1/scan with content_type=text.
  3. Store scan_id, request_id, scan_group_id, and action.
  4. Route on action: ALLOW, WARN, or BLOCK.
  5. Only send safe or reviewed text to downstream AI or automation.
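Steps 3 and 5 can be sketched as two small helpers. This is a minimal sketch, not Mighty's SDK: the field names come from this guide, but their string types, the ScanAuditRecord shape, and the reviewerApproved flag are assumptions made for illustration.

```typescript
// Fields named in this guide; the string types are an assumption.
export interface ScanResult {
  scan_id: string;
  request_id: string;
  scan_group_id: string;
  action: string;
}

// Hypothetical record shape for your own datastore.
export interface ScanAuditRecord extends ScanResult {
  workflow_id: string;
}

// Step 3: copy only the identifiers and the action into the stored record.
export function toAuditRecord(workflowId: string, result: ScanResult): ScanAuditRecord {
  return {
    workflow_id: workflowId,
    scan_id: result.scan_id,
    request_id: result.request_id,
    scan_group_id: result.scan_group_id,
    action: result.action,
  };
}

// Step 5: forward only ALLOW text, or text that a reviewer has approved.
export function canSendDownstream(action: string, reviewerApproved: boolean): boolean {
  return action === "ALLOW" || reviewerApproved;
}
```

With these helpers, WARN or BLOCK text stays out of downstream AI until your own review step sets reviewerApproved.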

Request And Response

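// Sends extracted text to Mighty and returns the parsed scan response.
// workflowId is sent as session_id.
export async function scanText(content: string, workflowId: string) {
  const response = await fetch("https://gateway.trymighty.ai/v1/scan", {
    method: "POST",
    headers: {
      // Read the key from the server environment; never ship it to the browser.
      Authorization: `Bearer ${process.env.MIGHTY_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      content,
      content_type: "text",
      scan_phase: "input",
      mode: "secure",
      focus: "both",
      profile: "balanced",
      // OCR text is expected to contain contact PII.
      data_sensitivity: "tolerant",
      session_id: workflowId,
      metadata: {
        source: "ocr_output",
      },
    }),
  });

  // Surface HTTP failures so the caller can decide how to route them.
  if (!response.ok) {
    throw new Error(`Mighty scan failed with ${response.status}`);
  }

  return response.json();
}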

Use data_sensitivity=tolerant when OCR text normally contains names, addresses, phone numbers, policy numbers, invoice numbers, or claim details.

In the response, threats is an array of objects, each with a category, a confidence, an optional evidence excerpt, and a human-readable reason.
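As an illustration of working with threats, the sketch below types one entry and turns the array into short lines for a review queue. The property names follow the description above, but the numeric confidence, the string evidence, and the category values in the usage note are assumptions; check them against a real response before relying on them.

```typescript
// One entry of the threats array. Types are assumed, not confirmed.
export interface Threat {
  category: string;
  confidence: number; // assumed to be numeric
  evidence?: string;  // optional excerpt
  reason: string;
}

// Builds one display line per threat, highest confidence first.
export function summarizeThreats(threats: Threat[]): string[] {
  return [...threats]
    .sort((a, b) => b.confidence - a.confidence)
    .map((t) => {
      // Truncate long excerpts so review-queue rows stay readable.
      const excerpt = t.evidence ? ` (evidence: "${t.evidence.slice(0, 80)}")` : "";
      return `${t.category} [${t.confidence.toFixed(2)}]: ${t.reason}${excerpt}`;
    });
}
```

For example, an entry with the hypothetical category "prompt_injection", confidence 0.9, and reason "Hidden instruction" is rendered as a line that starts with prompt_injection [0.90]: Hidden instruction.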

Routing Logic

// Maps the scan action to a workflow decision.
export function routeScannedText(scan: { action: string }) {
  switch (scan.action) {
    case "ALLOW":
      return { decision: "continue" as const };
    case "WARN":
      return { decision: "queue_review" as const };
    case "BLOCK":
      return { decision: "stop" as const };
    default:
      // Unknown or missing action: fail closed and send to review.
      return { decision: "queue_review" as const };
  }
}
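One way to act on the routing decision is to dispatch it to handlers that your workflow supplies. The sketch below assumes the three decision values returned above; the Handlers interface and its method names are hypothetical stand-ins for your own forwarding, review-queue, and rejection code.

```typescript
export type Decision = "continue" | "queue_review" | "stop";

// Hypothetical hooks into your own workflow.
export interface Handlers {
  forward: (text: string) => void;       // send to downstream AI or automation
  enqueueReview: (text: string) => void; // hold for a human reviewer
  reject: (text: string) => void;        // stop processing
}

// Calls exactly one handler for each decision.
export function applyDecision(decision: Decision, text: string, handlers: Handlers): void {
  switch (decision) {
    case "continue":
      handlers.forward(text);
      break;
    case "queue_review":
      handlers.enqueueReview(text);
      break;
    case "stop":
      handlers.reject(text);
      break;
  }
}
```

Injecting the handlers keeps the routing testable without a queue or a model behind it.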

AI Fraud And OCR

OCR text can carry hidden or altered instructions. A PDF can contain normal visible text plus hidden text that says to ignore policy, approve a claim, or exfiltrate data. Mighty helps catch those signals before your AI or IDP system treats the text as trusted.

Honest wording for your product: "This item was flagged for review." Avoid saying "This item is fraudulent" unless your own review process confirms it.
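The wording rule can be enforced in code so that user-facing copy never claims more than the workflow knows. This is a sketch only: the function name and both flags are hypothetical, and every string other than "This item was flagged for review." is placeholder copy to replace with your own.

```typescript
// Chooses user-facing copy from what the workflow actually knows.
// flagged: the scan returned WARN or BLOCK.
// confirmedByReview: your own review process confirmed fraud.
export function statusMessage(flagged: boolean, confirmedByReview: boolean): string {
  if (!flagged) {
    return "No issues were flagged."; // placeholder copy
  }
  if (confirmedByReview) {
    return "Review confirmed this item as fraudulent."; // placeholder copy
  }
  return "This item was flagged for review.";
}
```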

Production Checklist

  • Keep original text and scanned text versioned if your retention policy allows it.
  • Store scan_group_id with the workflow record.
  • Send model output from the same workflow with scan_phase=output.
  • Log request_id and scan_id.
  • Route unknown scan errors to review when the workflow is high risk.
  • Use data_sensitivity=tolerant for expected contact PII.
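Two checklist items, scanning model output and routing scan errors, can be sketched as follows. The request fields mirror the request example in this guide. Reusing the same session_id to tie an output scan to its workflow, and the lowRiskFallback parameter, are assumptions rather than documented behavior.

```typescript
export type ScanPhase = "input" | "output";
export type Decision = "continue" | "queue_review" | "stop";

// Builds the request body for either phase of the same workflow.
export function buildScanBody(content: string, workflowId: string, phase: ScanPhase) {
  return {
    content,
    content_type: "text",
    scan_phase: phase,
    mode: "secure",
    focus: "both",
    profile: "balanced",
    data_sensitivity: "tolerant",
    session_id: workflowId, // assumed to link input and output scans
  };
}

// High-risk workflows send scan errors to review; the fallback for
// low-risk workflows is the caller's choice.
export function decisionOnScanError(highRisk: boolean, lowRiskFallback: Decision): Decision {
  return highRisk ? "queue_review" : lowRiskFallback;
}
```

Calling buildScanBody(modelAnswer, workflowId, "output") gives a body that matches the input scan apart from scan_phase and content.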

Next step

Ready to scan real traffic?

Create an API key, keep it on your server, then wire Mighty into the workflow that handles untrusted material.

AI-Agent Prompt

Add OCR output scanning

Paste this into Cursor, Codex, Claude Code, or Windsurf.

Add Mighty scanning to the OCR or IDP pipeline.

Requirements:
- Use server env MIGHTY_API_KEY.
- Call POST https://gateway.trymighty.ai/v1/scan.
- Send extracted text with content_type=text and scan_phase=input.
- Use mode=secure, focus=both, data_sensitivity=tolerant.
- Store scan_id, request_id, scan_group_id, session_id, action, risk_score, and threats.
- Route ALLOW to continue.
- Route WARN to human review.
- Route BLOCK to stop or require manual override.
- Do not send WARN or BLOCK content to downstream AI without review.

Acceptance criteria:
- Unit tests cover ALLOW, WARN, BLOCK.
- Integration code never exposes MIGHTY_API_KEY to the browser.
- Logs include request_id and scan_id.