Config Decisions

Choose mode, focus, scan phase, profile, data sensitivity, request IDs, and scan groups.

Mighty config should explain intent. Start with safe defaults, then tighten only when the workflow needs it.

Recommended default:

{
  "content_type": "auto",
  "scan_phase": "input",
  "mode": "secure",
  "focus": "both",
  "profile": "balanced",
  "data_sensitivity": "standard"
}
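
One way to keep these defaults in a single place is a small helper that layers per-request overrides on top of them. This is only a sketch: `RECOMMENDED_DEFAULTS` and `build_scan_request` are hypothetical names, not part of any Mighty SDK, and the field names are the ones shown above.

```python
import json

# Recommended defaults from this page; tighten per workflow as needed.
RECOMMENDED_DEFAULTS = {
    "content_type": "auto",
    "scan_phase": "input",
    "mode": "secure",
    "focus": "both",
    "profile": "balanced",
    "data_sensitivity": "standard",
}


def build_scan_request(content, **overrides):
    """Return a request body: defaults first, then explicit overrides."""
    body = dict(RECOMMENDED_DEFAULTS)  # copy so the defaults are never mutated
    body.update(overrides)
    body["content"] = content
    return body


if __name__ == "__main__":
    request = build_scan_request("Claim note text", content_type="text")
    print(json.dumps(request, indent=2))
```

Keeping overrides explicit at the call site makes each deviation from the defaults visible in code review.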

`content_type`

What it does: tells Mighty which modality you are sending.

When to use it:

Value	Use when
`auto`	Your server does not know the type yet, or you want Mighty to detect it.
`text`	Chat text, OCR text, extracted fields, model output, tool output, or notes.
`image`	Damage photos, identity images, receipt photos, screenshots, or visual evidence.
`pdf`	PDF claim packets, invoices, estimates, forms, or statements.
`document`	Office documents or uploaded business documents.

Default value: `auto`.

Example request:

{
  "content": "Extracted OCR text",
  "content_type": "text",
  "scan_phase": "input",
  "mode": "secure"
}

Common mistake: using `text` for an uploaded image or PDF before extraction. Scan the original file when possible, then scan the extracted text with the same `scan_group_id`.

Supported Uploads And Limits

Use `content_type` to describe the material you send to Mighty. If a PDF contains images, still send it as `pdf`. Mighty scans page content and accounts for unique embedded images separately.

Limits can differ by plan and deployment. These are the product defaults developers should design around.

Material	Common inputs	Use `content_type`	Limits and billing notes
Text	JSON strings, chat messages, OCR text, extracted fields, model output, tool output, `.txt`, SVG text	`text`	Text bills as 1 SCU per 1,000 tokens, rounded up. Base64 decoded content shares the 50 MB decoded payload limit.
Images	`.jpg`, `.jpeg`, `.png`, `.webp`, `.gif`, `.bmp`, `.tif`, `.tiff`, `.heic`, `.heif`, `.ico`, `.cur`	`image` or `auto`	Standalone images bill as 4 SCU per image. Default upload limit is 50 MB. Default image cap is 100,000,000 pixels. Default GIF cap is 200 frames.
PDFs	`.pdf`	`pdf` or `auto`	PDFs bill as 2 SCU per page plus 4 SCU per unique embedded image. Pro allows up to 1,000 pages and 100 unique embedded images per PDF. Free preview allows 4 pages and 1 unique embedded image.
Documents	`.docx`, `.xlsx`, `.pptx`, `.odt`, `.ods`, `.odp`, `.rtf`, `.html`, `.htm`, `.csv`, `.tsv`, `.ipynb`, `.eml`, mail-like `.msg`	`document` or `auto`	Default upload limit is 50 MB. Default unzipped document safety cap is 50 MB. Macro-enabled, encrypted, legacy Office, add-in, and template files can be rejected.
Audio	Transcript text today. Audio file scanning is closed beta.	`text` for transcripts	Do not send audio files unless your account is beta-enabled. Scan transcripts as text and set metadata like `source=audio_transcript`.
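
The billing notes in the table can be turned into a rough cost estimate before you send a scan. The sketch below assumes only the default rates quoted above. The function names are hypothetical, and it does not model how Mighty counts tokens, any minimum charge, or plan-specific pricing.

```python
# Default rates quoted in the table above; confirm against your plan.
SCU_PER_1K_TEXT_TOKENS = 1
SCU_PER_IMAGE = 4
SCU_PER_PDF_PAGE = 2
SCU_PER_PDF_EMBEDDED_IMAGE = 4


def text_scu(token_count: int) -> int:
    """1 SCU per 1,000 tokens, rounded up (integer ceiling division)."""
    return -(-token_count // 1000) * SCU_PER_1K_TEXT_TOKENS


def image_scu(image_count: int = 1) -> int:
    """4 SCU per standalone image."""
    return image_count * SCU_PER_IMAGE


def pdf_scu(pages: int, unique_embedded_images: int = 0) -> int:
    """2 SCU per page plus 4 SCU per unique embedded image."""
    return (pages * SCU_PER_PDF_PAGE
            + unique_embedded_images * SCU_PER_PDF_EMBEDDED_IMAGE)


if __name__ == "__main__":
    print(text_scu(1500))  # 1,500 tokens round up to 2 SCU
    print(pdf_scu(10, 3))  # 10 pages and 3 unique images: 20 + 12 = 32 SCU
```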

When the type is unknown, use `auto`. When your server already knows the type, set the explicit value. Explicit values produce clearer failures and make routing, billing, and logs easier to understand.

Common rejections:

Status	Code	What it means
`400`	`invalid_pdf`, `invalid_document`, invalid image format, or unsupported enum value	The file does not match the declared type, the parser cannot safely process it, or a config value is invalid.
`402`	`tier_cap_exceeded`, `tier_pdf_pages_exceeded`, `tier_pdf_embedded_images_exceeded`	The scan is valid, but the current plan does not allow that request size or billing state.
`413`	`payload_too_large`, `image_pixel_limit`, `gif_frame_limit`, `pdf_page_limit`, `document_unzip_limit`	The file is too large or too complex for the configured safety limits.
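
A client can route each rejection family to a different follow-up instead of retrying blindly. The sketch below is an assumption-heavy illustration: `next_step_for_rejection` and its return labels are hypothetical, and it takes the HTTP status and error code as plain arguments because this page does not specify the shape of the error response.

```python
# Hypothetical follow-up labels keyed by the statuses in the table above.
FOLLOW_UP_BY_STATUS = {
    400: "fix_request",     # wrong declared type, unparseable file, bad enum
    402: "check_plan",      # valid scan, but the plan does not allow it
    413: "reduce_payload",  # file too large or too complex for safety limits
}


def next_step_for_rejection(status: int, code: str = "") -> str:
    """Map a rejection to a follow-up label; unknown statuses are escalated."""
    step = FOLLOW_UP_BY_STATUS.get(status, "escalate")
    # Keep the error code alongside the label so logs stay searchable.
    return f"{step}:{code}" if code else step


if __name__ == "__main__":
    print(next_step_for_rejection(413, "pdf_page_limit"))
```

Resending the same request unchanged is unlikely to help in any of these three cases, since each one points at the request itself or the plan.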

Common mistake: converting a PDF to plain text to reduce cost, then treating the result as equivalent. That can miss embedded images, hidden text, suspicious layout signals, and document-level attack surfaces. If cost matters, scan the original file for high-risk workflows and scan extracted text for lower-risk enrichment.

`mode`

What it does: chooses the scan depth and latency target.

When to use it:

Value	Use when	Tradeoff
`fast`	Inline chat or low-risk text needs a quick decision.	Lowest latency, less depth.
`secure`	Production default for most apps.	Balanced latency and coverage.
`comprehensive`	Deep image or PDF review is worth more latency.	More depth, higher cost, required for async.

Default value: `secure`.

Example request:

{
  "content": "Claim note text",
  "content_type": "text",
  "scan_phase": "input",
  "mode": "secure"
}

Common mistake: using `comprehensive` for every chat message. Start with `secure` and reserve deep review for images, PDFs, high-value cases, or suspicious workflows.
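
The guidance above can be written down as a small routing function so the choice of mode is explicit and testable. This is a sketch of one possible policy, not Mighty's own logic: `choose_mode` and its `high_risk` and `latency_sensitive` flags are hypothetical inputs your application would supply.

```python
def choose_mode(content_type: str, *, high_risk: bool = False,
                latency_sensitive: bool = False) -> str:
    """Pick a scan mode from app-level signals (illustrative policy only)."""
    if high_risk:
        # High-value or suspicious material is worth the extra latency and cost.
        return "comprehensive"
    if content_type == "text" and latency_sensitive:
        # Inline chat or low-risk text that needs a quick decision.
        return "fast"
    # Production default for everything else.
    return "secure"


if __name__ == "__main__":
    print(choose_mode("text", latency_sensitive=True))
    print(choose_mode("pdf", high_risk=True))
    print(choose_mode("image"))
```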

Mode is not tolerance. `mode` controls scan depth. `profile`, `data_sensitivity`, and your routing policy control how strict the product is after Mighty returns a result. See Modes And Tolerance before you tune production routing.

`focus`

What it does: chooses which family of checks gets priority.

When to use it:

Value	Use when
`standard`	You mainly need threat and safety checks.
`ai`	You mainly need AI authenticity or AI fraud signals.
`both`	You need threat checks and AI signals together.

Default value: `standard`.

Example request:

{
  "content": "Base64-encoded image",
  "content_type": "image",
  "scan_phase": "input",
  "mode": "secure",
  "focus": "both"
}

Common mistake: treating `focus=ai` results as a fraud verdict. Mighty flags suspicious evidence. It does not prove fraud by itself.

AI Involvement Metadata

What it does: preserves workflow context that is useful for review, logs, and AI coding agents.

When to use it:

Metadata key	Use when
`ai_involved`	The material will be used by a model, agent, OCR automation, or AI review step.
`submitted_as_ai_generated`	Your app asks the submitter whether the material was AI-generated or edited.
`workflow`	You need to distinguish chat, claims, OCR, image review, invoices, or agent tools.

Default value: none.

Example request:

{
  "content": "Uploaded image or extracted text",
  "content_type": "auto",
  "scan_phase": "input",
  "focus": "both",
  "metadata": {
    "workflow": "damage_photo_review",
    "ai_involved": "true",
    "submitted_as_ai_generated": "unknown"
  }
}

Common mistake: treating app metadata as a detection result. Metadata is your app's context. Mighty response fields like `authenticity`, `forensics`, `threats`, and `risk_score` are scan evidence.

`scan_phase`

What it does: tells Mighty where the material sits in your workflow.

When to use it:

Value	Use when
`input`	A user, customer, vendor, claimant, partner, or upstream system submitted the material.
`output`	A model, OCR pipeline, extraction pipeline, agent, or automation generated the material.

Default value: none. This field is required.

Example request:

{
  "content": "Generated answer shown to a user",
  "content_type": "text",
  "scan_phase": "output",
  "scan_group_id": "9b3e4f8d-96c9-4f42-8338-8cf9571c1c70"
}

Common mistake: scanning output without `scan_group_id`. Output scans need the group returned by the input scan.
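
To make that mistake hard to commit, the output request can be built directly from the input scan's response. The sketch below assumes the input response exposes the group as a top-level `scan_group_id` field, which this page implies but does not show. `build_output_scan` is a hypothetical helper name.

```python
def build_output_scan(input_response: dict, generated_text: str) -> dict:
    """Build an output-phase request that reuses the input scan's group."""
    # Assumption: the input scan response carries a top-level scan_group_id.
    group_id = input_response.get("scan_group_id")
    if not group_id:
        # Fail loudly rather than send an output scan with no group.
        raise ValueError("input scan response has no scan_group_id")
    return {
        "content": generated_text,
        "content_type": "text",
        "scan_phase": "output",
        "scan_group_id": group_id,
    }


if __name__ == "__main__":
    input_response = {"scan_group_id": "9b3e4f8d-96c9-4f42-8338-8cf9571c1c70"}
    print(build_output_scan(input_response, "Generated answer shown to a user"))
```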

`profile`

What it does: chooses the risk posture.

When to use it:

Value	Use when
`balanced`	Most production apps.
`strict`	Regulated, financial, insurance, legal, healthcare, or high-value workflows.
`permissive`	Low-risk internal workflows where false positives are more costly.
`code_assistant`	Developer tools and agent code workflows.
`ai_safety`	AI output, public assistants, or agentic systems.

Default value: `balanced`.

Example request:

{
  "content": "Agent tool output",
  "content_type": "text",
  "scan_phase": "output",
  "mode": "secure",
  "profile": "ai_safety",
  "scan_group_id": "9b3e4f8d-96c9-4f42-8338-8cf9571c1c70"
}

Common mistake: setting `permissive` because a workflow is noisy. If the noise comes from expected PII, set `data_sensitivity` to `tolerant` instead.

`data_sensitivity`

What it does: controls how expected PII affects blocking.

When to use it:

Value	Use when
`standard`	Default. PII can block unless context allows it.
`tolerant`	Business workflows expect contact details, addresses, claim numbers, or invoices.
`strict`	PII and credentials should block aggressively.

Default value: `standard`.

Example request:

{
  "content": "Customer: Jane Doe, phone: 555-0100",
  "content_type": "text",
  "scan_phase": "input",
  "mode": "secure",
  "data_sensitivity": "tolerant"
}

Common mistake: using `tolerant` to bypass credential detection. Credentials and secrets should still be treated as high risk.

Sensitive Data And Redaction

What it does: separates expected business PII from unsafe disclosure paths.

When to use it:

Need	Setting or response field
Claims, invoices, healthcare, identity, or support workflows contain normal PII	`data_sensitivity=tolerant`
Public AI output should not reveal secrets or sensitive text	`data_sensitivity=strict` and `scan_phase=output`
Mighty returns safer replacement text	Use `redacted_output` instead of the original output.
Mighty blocks output and no redaction is available	Do not show the original output.

Default value: `data_sensitivity=standard`. `redacted_output` appears only when available.

Example response:

{
  "action": "BLOCK",
  "risk_score": 91,
  "risk_level": "CRITICAL",
  "threats": [
    {
      "category": "secrets_exposure",
      "confidence": 0.96,
      "evidence": "sk_live_8f1c9d4e2ab3",
      "reason": "Output contains a live API key pattern."
    }
  ],
  "redacted_output": "I cannot share that sensitive value.",
  "scan_phase": "output"
}

Common mistake: assuming every block can be safely rewritten. Use `redacted_output` only when Mighty returns it and your product policy allows it.
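
The rules in the table above can be sketched as a single display decision. This is an illustration under stated assumptions: `text_to_display` and `allow_redaction` are hypothetical names, only the `BLOCK` action is documented on this page, and passing the original text through for any other action is a placeholder policy you should replace with your own.

```python
from typing import Optional


def text_to_display(original: str, scan_result: dict,
                    allow_redaction: bool = True) -> Optional[str]:
    """Return the text that is safe to show, or None to withhold everything."""
    redacted = scan_result.get("redacted_output")
    if redacted is not None:
        # Mighty returned a replacement: use it only if product policy allows.
        return redacted if allow_redaction else None
    if scan_result.get("action") == "BLOCK":
        # Blocked with no redaction available: never show the original.
        return None
    # Assumption: any other action means the original can be shown.
    return original


if __name__ == "__main__":
    result = {
        "action": "BLOCK",
        "redacted_output": "I cannot share that sensitive value.",
    }
    print(text_to_display("Here is the key: ...", result))
```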

IDs And Idempotency

What they do: connect scans to logs, sessions, retries, and downstream review.

Field	Use it for
`request_id`	One unique ID per request. Use it for idempotency and logs.
`scan_id`	The scan result ID returned by Mighty. Use it for audit and polling.
`scan_group_id`	Connect input scans, output scans, and derived evidence.
`session_id`	Keep a chat, claim, case, or workflow together over time.

Default value: Mighty can generate missing `request_id`, `scan_group_id`, and `session_id` for input scans.

Example request:

{
  "content": "Uploaded invoice text",
  "content_type": "text",
  "scan_phase": "input",
  "request_id": "ab82f4ad-8d64-4bb4-b4ed-77df63291198",
  "scan_group_id": "9b3e4f8d-96c9-4f42-8338-8cf9571c1c70",
  "session_id": "claim_18422"
}

Common mistake: generating a new `scan_group_id` for model output. Reuse the input scan group so the evidence stays connected.
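
If you generate `request_id` yourself, create it once per logical request and resend the same value on retry. The sketch below shows one way to do that with Python's standard `uuid` module. `prepare_scan` is a hypothetical helper, and how Mighty treats a repeated `request_id` is not described on this page beyond its use for idempotency and logs.

```python
import uuid
from typing import Optional


def prepare_scan(body: dict, *, session_id: Optional[str] = None,
                 scan_group_id: Optional[str] = None) -> dict:
    """Attach IDs once; resend the returned dict unchanged on retry."""
    prepared = dict(body)  # copy so the caller's dict is not mutated
    # One unique request_id per logical request, kept if already present.
    prepared.setdefault("request_id", str(uuid.uuid4()))
    if scan_group_id is not None:
        prepared["scan_group_id"] = scan_group_id
    if session_id is not None:
        prepared["session_id"] = session_id
    return prepared


if __name__ == "__main__":
    first_attempt = prepare_scan(
        {"content": "Uploaded invoice text", "content_type": "text",
         "scan_phase": "input"},
        session_id="claim_18422",
    )
    retry = prepare_scan(first_attempt)  # same request_id as the first attempt
    print(first_attempt["request_id"] == retry["request_id"])
```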

Next step

Ready to scan real traffic?

Create an API key, keep it on your server, then wire Mighty into the workflow that handles untrusted material.
