Errors

Two distinct failure surfaces: HTTP errors on the request itself, and per-document errors that surface as a failed job. Both carry a stable code so your recovery logic can switch on it.

Error shape

HTTP errors are returned as a JSON body with a string detail:

```json
{ "detail": "UNSUPPORTED_FILE_TYPE: application/zip" }
```

The portion before the colon is the stable code. Treat the portion after the colon as a human-readable hint that may change.
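Splitting the detail string can be sketched as follows. The helper name `split_detail` is hypothetical, and the sketch assumes that a detail with no colon should be treated as a bare code with an empty hint.

```python
from typing import Tuple


def split_detail(detail: str) -> Tuple[str, str]:
    """Split an HTTP error detail into (stable code, human-readable hint)."""
    # partition() splits on the first colon only, so hints that contain
    # further colons stay intact.
    code, _, hint = detail.partition(":")
    return code.strip(), hint.strip()


body = {"detail": "UNSUPPORTED_FILE_TYPE: application/zip"}
code, hint = split_detail(body["detail"])
# Branch on `code`; log `hint` but never match against it.
```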

Per-document failures surface on the job itself:

```json
{
  "status": "failed",
  "error": {
    "code": "PDF_PASSWORD_PROTECTED",
    "message": "Source PDF requires a password we don't have."
  }
}
```
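Reading the code off a job payload might look like the sketch below. It assumes the job has already been fetched and decoded into a dict shaped like the example above; `job_error_code` is a hypothetical helper name.

```python
from typing import Optional


def job_error_code(job: dict) -> Optional[str]:
    """Return the stable error code of a failed job, or None otherwise."""
    if job.get("status") != "failed":
        return None
    # Tolerate a missing or null "error" object instead of raising.
    return (job.get("error") or {}).get("code")


job = {
    "status": "failed",
    "error": {
        "code": "PDF_PASSWORD_PROTECTED",
        "message": "Source PDF requires a password we don't have.",
    },
}
failure = job_error_code(job)
```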

HTTP errors

| Status | Code | How to recover |
| --- | --- | --- |
| 400 | UNSUPPORTED_FILE_TYPE | Send PDF / image / PPTX. Convert first if your source is something else. |
| 400 | FILE_TOO_LARGE | Max 100 MB. Split the document upstream. |
| 400 | EMPTY_FILE | 0 bytes uploaded. Re-read the file before sending. |
| 400 | INVALID_OPTIONS_JSON | options field is not valid JSON. |
| 400 | INVALID_OPTIONS | JSON parsed but failed schema validation. Detail names the offending field. |
| 400 | INVALID_SCOPES | You requested API key scopes outside the allow-list. |
| 401 | MISSING_API_KEY | No Authorization header. |
| 401 | INVALID_AUTH_FORMAT | Header was present but not Bearer <token>. |
| 401 | INVALID_API_KEY | Bearer doesn't match a live key, or the key was revoked. |
| 402 | INSUFFICIENT_BALANCE | Lifetime free pages used up AND wallet balance below the minimum reserve. Top up in Billing. |
| 403 | INSUFFICIENT_SCOPE | Key lacks the scope this endpoint needs. Create a new key with broader scopes. |
| 404 | JOB_NOT_FOUND | Wrong job ID, or it belongs to a different customer. Same response either way. |
| 404 | API_KEY_NOT_FOUND | Tried to revoke a key that doesn't exist or is already revoked. |
| 409 | JOB_ALREADY_COMPLETED / FAILED / CANCELLED | Job is terminal; can't cancel. |
| 429 | RATE_LIMITED | Back off per Retry-After. Use idempotency keys to make retries safe. |
| 500 | INTERNAL_ERROR | Our fault. Safe to retry. Mail support if it persists. |
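For the 429 row, one way to pick a wait time is sketched below. It assumes Retry-After arrives as a whole number of seconds; any other form (or a missing header) falls back to exponential backoff. The function name, base delay, and cap are illustrative choices, not values defined by the API.

```python
from typing import Optional


def retry_delay(retry_after: Optional[str], attempt: int,
                base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            # Honour the server's hint when it is a plain integer.
            return float(max(0, int(retry_after.strip())))
        except ValueError:
            pass  # Not an integer (e.g. an HTTP date); use the fallback.
    # Exponential backoff: base, 2*base, 4*base, ... up to `cap`.
    return min(cap, base * 2 ** attempt)
```

A caller would sleep for `retry_delay(response_headers.get("Retry-After"), attempt)` before resending, where `response_headers` stands for whatever header mapping your HTTP client exposes.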

Per-document errors

Returned on the job's error.code when the file uploaded fine but extraction couldn't complete.

| Code | Cause | How to recover |
| --- | --- | --- |
| PDF_PASSWORD_PROTECTED | Source PDF is encrypted. | Decrypt before upload. |
| PDF_MALFORMED | Header/xref unreadable. | Re-export the PDF. |
| PPTX_MALFORMED | PPTX archive is corrupt. | Re-save as .pptx from PowerPoint or Keynote. |
| PPTX_CONVERSION_UNAVAILABLE | Office conversion step couldn't run for this deck (typically an exotic .ppt shape we can't open). | Re-save as a modern .pptx. |
| OCR_LOW_CONFIDENCE | Page-level signal: scan was too noisy for reliable OCR. The text is still returned but blocks carry verified: false. | Rescan at higher DPI, or filter by verified downstream. |
| MULTI_COLUMN_AMBIGUOUS | Page-level signal: column detection wasn't confident. Reading order may not match the visual layout. | Use bbox coordinates if visual order matters. |
| EXTRACTION_TIMEOUT | Worker hit the per-job time budget. | Split very large documents, or retry; many timeouts are transient. |
| SERVICE_TEMPORARILY_UNAVAILABLE | Upstream model or infra blip. | Retry with backoff. Same Idempotency-Key keeps billing safe. |
| EXTRACTION_FAILED | Generic fallback when none of the above matched. | Retry once; if it persists, send us the file. |
| INTERNAL_ERROR | Worker crashed mid-pipeline. | Safe to retry with the same Idempotency-Key; you won't be double-billed. |
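Filtering by verified downstream, as suggested for OCR_LOW_CONFIDENCE, could look like this sketch. The result shape is an assumption: it treats the extracted blocks as a list of dicts, each optionally carrying a boolean `verified` field, and it treats a block without that field as verified. Check both against the actual response schema.

```python
from typing import Dict, List


def verified_blocks(blocks: List[Dict]) -> List[Dict]:
    """Keep only blocks that are not flagged verified: false."""
    # Assumption: a block with no "verified" key counts as verified.
    return [b for b in blocks if b.get("verified", True)]


# Hypothetical blocks; the "text" field name is illustrative only.
blocks = [
    {"text": "Invoice 1042", "verified": True},
    {"text": "T0ta1 du3", "verified": False},
    {"text": "Thank you"},
]
kept = verified_blocks(blocks)
```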

Recovery strategy

A simple rule that covers 95% of cases:

  • 4xx (other than 429) = your fault. Fix the request; don't resend it unchanged.
  • 429 and 500 = retry with exponential backoff, capped at ~5 attempts.
  • Job-level failed with a *_PROTECTED / *_MALFORMED code = fix the file. Retrying won't help.
  • SERVICE_TEMPORARILY_UNAVAILABLE, EXTRACTION_TIMEOUT, INTERNAL_ERROR, EXTRACTION_FAILED = retry with the same Idempotency-Key.
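
The rule above can be sketched as two small classifiers. The function names and the returned action labels are hypothetical; job codes the rule does not mention (for example the page-level signals) fall through to a catch-all label rather than being guessed at.

```python
# Job codes the rule lists as retryable with the same Idempotency-Key.
RETRYABLE_JOB_CODES = {
    "SERVICE_TEMPORARILY_UNAVAILABLE",
    "EXTRACTION_TIMEOUT",
    "INTERNAL_ERROR",
    "EXTRACTION_FAILED",
}
MAX_ATTEMPTS = 5  # The "~5 attempts" cap from the rule.


def http_action(status: int) -> str:
    """Map an HTTP status to a recovery action."""
    if status in (429, 500):
        return "retry_with_backoff"
    if 400 <= status < 500:
        return "fix_request"
    return "no_action"


def job_action(code: str) -> str:
    """Map a job-level error code to a recovery action."""
    if code.endswith(("_PROTECTED", "_MALFORMED")):
        return "fix_file"
    if code in RETRYABLE_JOB_CODES:
        return "retry_same_idempotency_key"
    return "check_error_table"  # Not covered by the simple rule.
```

Keeping the decision separate from the retry loop lets a caller count attempts against `MAX_ATTEMPTS` in one place and reuse the same Idempotency-Key on every resend.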

Need a code we haven't documented? Open an issue or mail support — we'll either add it here or fix the bug that caused it.