# Config Decisions

Choose mode, focus, scan phase, profile, data sensitivity, request IDs, and scan groups.

Source URL: https://trymighty.ai/docs/concepts/configs

Mighty config should explain intent. Start with safe defaults, then tighten only when the workflow needs it.

Recommended starting config. It sets `focus` to `both`, which differs from the field default of `standard`:

```json
{
  "content_type": "auto",
  "scan_phase": "input",
  "mode": "secure",
  "focus": "both",
  "profile": "balanced",
  "data_sensitivity": "standard"
}
```
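As a minimal sketch, a server could keep the recommended config in one place and layer per-request overrides on top. The names `DEFAULT_CONFIG` and `build_scan_request` are hypothetical helpers, not part of Mighty; only the field names and values come from this page.

```python
from typing import Any

# Recommended starting values copied from this page (not fetched from Mighty).
DEFAULT_CONFIG: dict[str, str] = {
    "content_type": "auto",
    "scan_phase": "input",
    "mode": "secure",
    "focus": "both",
    "profile": "balanced",
    "data_sensitivity": "standard",
}


def build_scan_request(content: str, **overrides: Any) -> dict[str, Any]:
    """Hypothetical helper: start from the defaults, then apply overrides."""
    body: dict[str, Any] = {"content": content, **DEFAULT_CONFIG}
    body.update(overrides)
    return body


# Tighten only the fields this workflow needs.
request = build_scan_request("Claim note text", content_type="text", profile="strict")
```

Keeping the defaults in one constant makes every deviation visible at the call site, which matches the goal of config that explains intent.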

## `content_type`

What it does: tells Mighty which modality you are sending.

When to use it:

| Value | Use when |
| --- | --- |
| `auto` | Your server does not know the type yet, or you want Mighty to detect it. |
| `text` | Chat text, OCR text, extracted fields, model output, tool output, or notes. |
| `image` | Damage photos, identity images, receipt photos, screenshots, or visual evidence. |
| `pdf` | PDF claim packets, invoices, estimates, forms, or statements. |
| `document` | Office documents or uploaded business documents. |

Default value: `auto`.

Example request:

```json
{
  "content": "Extracted OCR text",
  "content_type": "text",
  "scan_phase": "input",
  "mode": "secure"
}
```

Common mistake: using `text` for an uploaded image or PDF before extraction. Scan the original file when possible, then scan extracted text with the same `scan_group_id`.

## Supported Uploads And Limits

Use `content_type` for the material you send to Mighty. If a PDF contains images, still send it as `pdf`. Mighty scans page content and accounts for unique embedded images separately.

Limits can differ by plan and deployment. These are the product defaults developers should design around.

| Material | Common inputs | Use `content_type` | Limits and billing notes |
| --- | --- | --- | --- |
| Text | JSON strings, chat messages, OCR text, extracted fields, model output, tool output, `.txt`, SVG text | `text` | Text bills as 1 SCU per 1,000 tokens, rounded up. Base64-decoded content counts toward the 50 MB decoded payload limit. |
| Images | `.jpg`, `.jpeg`, `.png`, `.webp`, `.gif`, `.bmp`, `.tif`, `.tiff`, `.heic`, `.heif`, `.ico`, `.cur` | `image` or `auto` | Standalone images bill as 4 SCU per image. Default upload limit is 50 MB. Default image cap is 100,000,000 pixels. Default GIF cap is 200 frames. |
| PDFs | `.pdf` | `pdf` or `auto` | PDFs bill as 2 SCU per page plus 4 SCU per unique embedded image. Pro allows up to 1,000 pages and 100 unique embedded images per PDF. Free preview allows 4 pages and 1 unique embedded image. |
| Documents | `.docx`, `.xlsx`, `.pptx`, `.odt`, `.ods`, `.odp`, `.rtf`, `.html`, `.htm`, `.csv`, `.tsv`, `.ipynb`, `.eml`, mail-like `.msg` | `document` or `auto` | Default upload limit is 50 MB. Default unzipped document safety cap is 50 MB. Macro-enabled, encrypted, legacy Office, add-in, and template files can be rejected. |
| Audio | Transcript text today. Audio file scanning is closed beta. | `text` for transcripts | Do not send audio files unless your account is beta-enabled. Scan transcripts as text and set metadata like `source=audio_transcript`. |

When the type is unknown, use `auto`. When your server already knows the type, set the explicit value. Explicit values produce clearer failures and make routing, billing, and logs easier to understand.
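One way to set an explicit value when the server already knows the file name is a lookup by extension, falling back to `auto`. This is a sketch: the function name `content_type_for` is hypothetical, the extension sets are copied from the table above, and an extension alone does not prove the file's real type.

```python
from pathlib import Path

# Extension lists copied from the table above.
IMAGE_EXTENSIONS = {
    ".jpg", ".jpeg", ".png", ".webp", ".gif", ".bmp",
    ".tif", ".tiff", ".heic", ".heif", ".ico", ".cur",
}
DOCUMENT_EXTENSIONS = {
    ".docx", ".xlsx", ".pptx", ".odt", ".ods", ".odp", ".rtf",
    ".html", ".htm", ".csv", ".tsv", ".ipynb", ".eml", ".msg",
}


def content_type_for(filename: str) -> str:
    """Hypothetical helper: map a file name to a `content_type` value."""
    ext = Path(filename).suffix.lower()
    if ext == ".pdf":
        return "pdf"
    if ext in IMAGE_EXTENSIONS:
        return "image"
    if ext in DOCUMENT_EXTENSIONS:
        return "document"
    if ext == ".txt":
        return "text"
    # Unknown or missing extension: let Mighty detect the type.
    return "auto"
```

If the declared type turns out not to match the file, expect a `400` rejection rather than a silent fallback, so log the declared value alongside the file name.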

Common rejections:

| Status | Code | What it means |
| --- | --- | --- |
| `400` | `invalid_pdf`, `invalid_document`, invalid image format, or unsupported enum value | The file does not match the declared type, the parser cannot safely process it, or a config value is invalid. |
| `402` | `tier_cap_exceeded`, `tier_pdf_pages_exceeded`, `tier_pdf_embedded_images_exceeded` | The request is valid, but the current plan or billing state does not allow a scan of that size. |
| `413` | `payload_too_large`, `image_pixel_limit`, `gif_frame_limit`, `pdf_page_limit`, `document_unzip_limit` | The file is too large or too complex for the configured safety limits. |
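
The statuses in the table above call for different follow-up work, so a client can route on the status code before it looks at the specific error code. The sketch below assumes nothing about the response body; the step labels (`fix_request`, `review_plan`, `reduce_payload`, `unhandled`) are hypothetical names chosen for illustration.

```python
# Hypothetical step labels; the status codes come from the table above.
NEXT_STEP_BY_STATUS: dict[int, str] = {
    400: "fix_request",     # wrong declared type, unparseable file, or invalid config value
    402: "review_plan",     # the current plan does not allow this scan
    413: "reduce_payload",  # file too large or too complex for the safety limits
}


def next_step_for_rejection(status: int) -> str:
    """Return a coarse follow-up label for a rejected scan request."""
    return NEXT_STEP_BY_STATUS.get(status, "unhandled")
```

None of these three statuses is likely to succeed on a blind retry with the same payload, so treat them as signals to change the request or the plan.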

Common mistake: converting a PDF to plain text to reduce cost, then treating the result as equivalent. That can miss embedded images, hidden text, suspicious layout signals, and document-level attack surfaces. If cost matters, scan the original file for high-risk workflows and scan extracted text for lower-risk enrichment.
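
To make the cost tradeoff concrete, the default billing rules from the limits table can be written as simple arithmetic. This is an estimate sketch only: the function names are hypothetical, limits and billing can differ by plan, and this page does not specify how tokens are counted.

```python
def text_scu(tokens: int) -> int:
    """1 SCU per 1,000 tokens, rounded up (integer ceiling division)."""
    return (tokens + 999) // 1000


def image_scu(images: int = 1) -> int:
    """Standalone images bill as 4 SCU each."""
    return 4 * images


def pdf_scu(pages: int, unique_embedded_images: int = 0) -> int:
    """2 SCU per page plus 4 SCU per unique embedded image."""
    return 2 * pages + 4 * unique_embedded_images


# A 12-page PDF with 3 unique embedded images, versus its extracted text
# at an assumed 4,500 tokens.
original_cost = pdf_scu(pages=12, unique_embedded_images=3)  # 2*12 + 4*3 = 36
text_only_cost = text_scu(4500)                              # ceil(4.5) = 5
```

The text-only scan is cheaper in this example, but it covers only the extracted words, not the embedded images or layout signals in the original file.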

## `mode`

What it does: chooses the scan depth and latency target.

When to use it:

| Value | Use when | Tradeoff |
| --- | --- | --- |
| `fast` | Inline chat or low-risk text needs a quick decision. | Lowest latency, less depth. |
| `secure` | Production default for most apps. | Balanced latency and coverage. |
| `comprehensive` | Deep image or PDF review is worth more latency. | More depth and higher cost. Required for async scans. |

Default value: `secure`.

Example request:

```json
{
  "content": "Claim note text",
  "content_type": "text",
  "scan_phase": "input",
  "mode": "secure"
}
```

Common mistake: using `comprehensive` for every chat message. Start with `secure` and reserve deep review for images, PDFs, high-value cases, or suspicious workflows.

Mode is not tolerance. `mode` controls scan depth. `profile`, `data_sensitivity`, and your routing policy control how strict the product is after Mighty returns a result. See [Modes And Tolerance](/docs/concepts/modes-tolerance) before you tune production routing.
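
The guidance above can be sketched as a small selection function. The name `choose_mode` and its two flags are hypothetical; how your app decides that a case needs deep review, or that a message is low-risk inline text, is your own routing policy.

```python
def choose_mode(
    content_type: str,
    *,
    needs_deep_review: bool = False,
    inline_low_risk: bool = False,
) -> str:
    """Pick a scan depth; strictness is still set by profile and data_sensitivity."""
    if needs_deep_review:
        # Reserved for images, PDFs, high-value cases, or suspicious workflows.
        return "comprehensive"
    if inline_low_risk and content_type == "text":
        # Quick decision for inline chat or low-risk text.
        return "fast"
    # Production default for most apps.
    return "secure"
```

Note that the function returns only a `mode`. It deliberately leaves `profile` and `data_sensitivity` alone, because depth and tolerance are separate decisions.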

## `focus`

What it does: chooses which family of checks gets priority.

When to use it:

| Value | Use when |
| --- | --- |
| `standard` | You mainly need threat and safety checks. |
| `ai` | You mainly need AI authenticity or AI fraud signals. |
| `both` | You need threat checks and AI signals together. |

Default value: `standard`.

Example request:

```json
{
  "content": "Base64 image or extracted text",
  "content_type": "image",
  "scan_phase": "input",
  "mode": "secure",
  "focus": "both"
}
```

Common mistake: using `focus=ai` as a fraud verdict. Mighty flags suspicious evidence. It does not prove fraud by itself.

## AI Involvement Metadata

What it does: preserves workflow context that is useful for review, logs, and AI coding agents.

When to use it:

| Metadata key | Use when |
| --- | --- |
| `ai_involved` | The material will be used by a model, agent, OCR automation, or AI review step. |
| `submitted_as_ai_generated` | Your app asks the submitter whether the material was AI-generated or edited. |
| `workflow` | You need to distinguish chat, claims, OCR, image review, invoices, or agent tools. |

Default value: none.

Example request:

```json
{
  "content": "Uploaded image or extracted text",
  "content_type": "auto",
  "scan_phase": "input",
  "focus": "both",
  "metadata": {
    "workflow": "damage_photo_review",
    "ai_involved": "true",
    "submitted_as_ai_generated": "unknown"
  }
}
```

Common mistake: treating app metadata as a detection result. Metadata is your app's context. Mighty response fields like `authenticity`, `forensics`, `threats`, and `risk_score` are scan evidence.

## `scan_phase`

What it does: tells Mighty where the material sits in your workflow.

When to use it:

| Value | Use when |
| --- | --- |
| `input` | A user, customer, vendor, claimant, partner, or upstream system submitted the material. |
| `output` | A model, OCR pipeline, extraction pipeline, agent, or automation generated the material. |

Default value: none. This field is required.

Example request:

```json
{
  "content": "Generated answer shown to a user",
  "content_type": "text",
  "scan_phase": "output",
  "scan_group_id": "9b3e4f8d-96c9-4f42-8338-8cf9571c1c70"
}
```

Common mistake: scanning output without `scan_group_id`. Output scans need the `scan_group_id` returned by the input scan.
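
A sketch of linking the two phases: build the output scan from the input scan's response so the group ID cannot be forgotten. The helper name `build_output_scan` is hypothetical, and the sketch assumes the input response exposes the group under the key `scan_group_id`.

```python
from typing import Any


def build_output_scan(input_response: dict[str, Any], generated_text: str) -> dict[str, Any]:
    """Hypothetical helper: create an output-phase request tied to the input scan."""
    # Assumption: the input scan response carries the group as "scan_group_id".
    group_id = input_response.get("scan_group_id")
    if not group_id:
        raise ValueError("input scan response has no scan_group_id; cannot link the output scan")
    return {
        "content": generated_text,
        "content_type": "text",
        "scan_phase": "output",
        "scan_group_id": group_id,
    }


# Example with a stand-in input response.
input_response = {"scan_group_id": "9b3e4f8d-96c9-4f42-8338-8cf9571c1c70"}
output_request = build_output_scan(input_response, "Generated answer shown to a user")
```

Failing loudly when the group is missing is a design choice: an unlinked output scan would otherwise succeed quietly and leave the evidence disconnected.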

## `profile`

What it does: chooses the risk posture.

When to use it:

| Value | Use when |
| --- | --- |
| `balanced` | Most production apps. |
| `strict` | Regulated, financial, insurance, legal, healthcare, or high-value workflows. |
| `permissive` | Low-risk internal workflows where false positives are more costly. |
| `code_assistant` | Developer tools and agent code workflows. |
| `ai_safety` | AI output, public assistants, or agentic systems. |

Default value: `balanced`.

Example request:

```json
{
  "content": "Agent tool output",
  "content_type": "text",
  "scan_phase": "output",
  "mode": "secure",
  "profile": "ai_safety",
  "scan_group_id": "9b3e4f8d-96c9-4f42-8338-8cf9571c1c70"
}
```

Common mistake: setting `permissive` because a workflow is noisy. If the noise comes from expected PII, set `data_sensitivity=tolerant` instead.

## `data_sensitivity`

What it does: controls how expected PII affects blocking.

When to use it:

| Value | Use when |
| --- | --- |
| `standard` | Default. PII can trigger a block unless context allows it. |
| `tolerant` | Business workflows expect contact details, addresses, claim numbers, or invoices. |
| `strict` | PII and credentials should be blocked aggressively. |

Default value: `standard`.

Example request:

```json
{
  "content": "Customer: Jane Doe, phone: 555-0100",
  "content_type": "text",
  "scan_phase": "input",
  "mode": "secure",
  "data_sensitivity": "tolerant"
}
```

Common mistake: using `tolerant` to bypass credential detection. Credentials and secrets should still be treated as high risk.

## Sensitive Data And Redaction

What it does: separates expected business PII from unsafe disclosure paths.

When to use it:

| Need | Setting or response field |
| --- | --- |
| Claims, invoices, healthcare, identity, or support workflows contain normal PII | `data_sensitivity=tolerant` |
| Public AI output should not reveal secrets or sensitive text | `data_sensitivity=strict` and `scan_phase=output` |
| Mighty returns safer replacement text | Use `redacted_output` instead of the original output. |
| Mighty blocks output and no redaction is available | Do not show the original output. |

Default value: `data_sensitivity=standard`. `redacted_output` appears only when available.

Example response:

```json
{
  "action": "BLOCK",
  "risk_score": 91,
  "risk_level": "CRITICAL",
  "threats": [
    {
      "category": "secrets_exposure",
      "confidence": 0.96,
      "evidence": "sk_live_8f1c9d4e2ab3",
      "reason": "Output contains a live API key pattern."
    }
  ],
  "redacted_output": "I cannot share that sensitive value.",
  "scan_phase": "output"
}
```

Common mistake: assuming every block can be safely rewritten. Use `redacted_output` only when Mighty returns it and your product policy allows it.
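
The rules in the table above can be sketched as one display decision. The function name `text_to_display`, the `allow_redaction` flag, and the fallback message are hypothetical. Only the `BLOCK` action appears on this page, so treating every other action as displayable is a simplifying assumption to check against the API reference.

```python
from typing import Any

FALLBACK_MESSAGE = "This response is unavailable."  # hypothetical product copy


def text_to_display(original: str, scan: dict[str, Any], *, allow_redaction: bool = True) -> str:
    """Decide what the user sees after an output scan."""
    redacted = scan.get("redacted_output")
    if redacted is not None:
        # Mighty returned safer replacement text; use it only if policy allows.
        return redacted if allow_redaction else FALLBACK_MESSAGE
    if scan.get("action") == "BLOCK":
        # Blocked and no redaction available: never show the original.
        return FALLBACK_MESSAGE
    # Assumption: any non-BLOCK action without redaction permits the original.
    return original


scan_result = {
    "action": "BLOCK",
    "redacted_output": "I cannot share that sensitive value.",
    "scan_phase": "output",
}
shown = text_to_display("Here is the key: ...", scan_result)
```

When policy disallows redaction, the sketch falls back to a fixed message instead of the original text, on the reasoning that a returned redaction signals the original is unsafe to show.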

## IDs And Idempotency

What they do: connect scans to logs, sessions, retries, and downstream review.

| Field | Use it for |
| --- | --- |
| `request_id` | One unique ID per request. Use it for idempotency and logs. |
| `scan_id` | The scan result ID returned by Mighty. Use it for audit and polling. |
| `scan_group_id` | Connect input scans, output scans, and derived evidence. |
| `session_id` | Keep a chat, claim, case, or workflow together over time. |

Default value: Mighty can generate missing `request_id`, `scan_group_id`, and `session_id` for input scans.

Example request:

```json
{
  "content": "Uploaded invoice text",
  "content_type": "text",
  "scan_phase": "input",
  "request_id": "ab82f4ad-8d64-4bb4-b4ed-77df63291198",
  "scan_group_id": "9b3e4f8d-96c9-4f42-8338-8cf9571c1c70",
  "session_id": "claim_18422"
}
```

Common mistake: generating a new `scan_group_id` for model output. Reuse the input scan group so the evidence stays connected.
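
A sketch of how these IDs fit together on the client: mint `request_id` once per logical request, reuse it on every retry, and pass an existing `scan_group_id` through instead of creating a new one. Both helper names are hypothetical, and `send` stands in for whatever HTTP call your app makes to Mighty.

```python
import uuid
from typing import Any, Callable, Optional


def new_scan_request(
    content: str,
    *,
    scan_phase: str,
    scan_group_id: Optional[str] = None,
    session_id: Optional[str] = None,
) -> dict[str, Any]:
    """Hypothetical helper: one request body with one unique request_id."""
    body: dict[str, Any] = {
        "content": content,
        "scan_phase": scan_phase,
        "request_id": str(uuid.uuid4()),
    }
    if scan_group_id:
        body["scan_group_id"] = scan_group_id  # reuse the input scan's group
    if session_id:
        body["session_id"] = session_id
    return body


def send_with_retry(
    body: dict[str, Any],
    send: Callable[[dict[str, Any]], dict[str, Any]],
    attempts: int = 3,
) -> dict[str, Any]:
    """Retry the same body so every attempt carries the same request_id."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        try:
            return send(body)
        except ConnectionError as exc:
            last_error = exc
    assert last_error is not None
    raise last_error
```

Building the body outside the retry loop is the point of the design: a new `request_id` per attempt would make each retry look like a fresh request and defeat idempotency.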
