/v1/jobs/{id}

Poll a job, fetch the extracted document, or cancel work that hasn't finished yet.

GET /v1/jobs/{id}

Returns the job's current status. When status is "completed", the document and markdown fields are populated. When status is "failed" or "cancelled", error is populated.

bash
curl https://api.ocrqueen.com/v1/jobs/5f8a... \
  -H "Authorization: Bearer pk_test_xxx"

Poll loop

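Recommended cadence: respect the Retry-After header from the original 202. A simple loop, which reads Retry-After from each poll response and falls back to 2 seconds when the header is absent: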

python
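import time

import httpx

HEADERS = {"Authorization": "Bearer pk_test_xxx"}
job_id = "5f8a..."  # the job_id returned by the original 202 response

while True:
    r = httpx.get(f"https://api.ocrqueen.com/v1/jobs/{job_id}", headers=HEADERS)
    r.raise_for_status()
    job = r.json()
    # Stop as soon as the job reaches a terminal status.
    if job["status"] in ("completed", "failed", "cancelled"):
        break
    # Wait as long as the server asks; default to 2 seconds without the header.
    time.sleep(int(r.headers.get("Retry-After", "2")))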
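
The loop above passes the header straight to int(), which covers the delay-in-seconds form. HTTP also allows Retry-After to carry an HTTP-date, and this page does not say which form the API sends, so a more defensive parser may be worth having. The following is a sketch only: retry_after_seconds, its 2-second default, and its 30-second cap are illustrative choices, not part of the API.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, default=2.0, cap=30.0, now=None):
    """Turn a Retry-After header value into a sleep duration in seconds."""
    if value is None:
        return default
    value = value.strip()
    try:
        delay = int(value)  # delay-seconds form, e.g. "5"
    except ValueError:
        try:
            when = parsedate_to_datetime(value)  # HTTP-date form
        except (TypeError, ValueError):
            return default  # unparseable header: fall back
        if when.tzinfo is None:
            when = when.replace(tzinfo=timezone.utc)
        now = now or datetime.now(timezone.utc)
        delay = (when - now).total_seconds()
    # Never sleep a negative amount, and never longer than the cap.
    return min(max(delay, 0.0), cap)
```

In the loop, this would replace the int(...) expression: time.sleep(retry_after_seconds(r.headers.get("Retry-After"))).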

Status values

| Status | Meaning |
| --- | --- |
| queued | Enqueued, no worker has picked it up yet. |
| processing | Worker is extracting. |
| completed | Done. document + markdown populated. |
| failed | Terminal error. See error.code. |
| cancelled | Customer cancelled before completion. |

Response shape

json
{
  "job_id": "5f8a...",
  "status": "completed",
  "file_info": { "filename": "doc.pdf", "size_bytes": 184321, "mime_type": "application/pdf" },
  "created_at": "2026-05-14T12:00:00Z",
  "started_at": "2026-05-14T12:00:01Z",
  "completed_at": "2026-05-14T12:00:08Z",
  "estimated_seconds": 0,
  "status_url": "https://api.ocrqueen.com/v1/jobs/5f8a...",
  "expires_at": "2026-05-15T12:00:00Z",
  "document": { /* see below */ },
  "markdown": "# Title\n\n…",
  "cache_hit": false,
  "error": null
}
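
The three timestamps in the response are enough to work out how long a job waited in the queue and how long extraction took. The sketch below assumes the timestamps are UTC strings in the format shown in the sample, and that a timestamp may be null or absent before the job reaches that stage; neither point is confirmed here. parse_ts and job_timings are hypothetical helper names.

```python
from datetime import datetime


def parse_ts(value):
    """Parse a timestamp such as "2026-05-14T12:00:00Z" into an aware datetime."""
    # Older Python versions reject the trailing "Z", so spell out the offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def job_timings(job):
    """Return (queue_seconds, processing_seconds); None where a timestamp is missing."""
    created = job.get("created_at")
    started = job.get("started_at")
    completed = job.get("completed_at")
    queue = None
    processing = None
    if created and started:
        queue = (parse_ts(started) - parse_ts(created)).total_seconds()
    if started and completed:
        processing = (parse_ts(completed) - parse_ts(started)).total_seconds()
    return queue, processing


# Timestamps copied from the sample response above.
sample = {
    "created_at": "2026-05-14T12:00:00Z",
    "started_at": "2026-05-14T12:00:01Z",
    "completed_at": "2026-05-14T12:00:08Z",
}
print(job_timings(sample))  # (1.0, 7.0)
```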

The document object

The canonical extraction result. JSON is the source of truth; markdown is a deterministic render of the same JSON.

json
{
  "source": {
    "kind": "pdf",                 // pdf | pptx | image
    "filename": "doc.pdf",
    "file_hash": "sha256:...",
    "page_count": 13,
    "page_sizes": [[8.5, 11], ...] // inches
  },
  "extraction": {
    "model": "ocrqueen-advanced",
    "profile": "advanced",
    "duration_ms": 8120,
    "faithfulness_score": 0.987,
    "per_page_failures": []
  },
  "pages": [
    {
      "page": 1,
      "blocks": [
        {
          "id": "blk_a3f9c2",
          "type": "heading",        // see Type taxonomy
          "role": "title",
          "page": 1,
          "bbox": [0.082, 0.044, 0.610, 0.082],
          "bbox_units": "normalized",
          "reading_order_index": 0,
          "text": "Quarterly Report",
          "text_source": "verified",  // verified | inferred | derived
          "confidence": 1.0,
          "verified": true
        }
      ]
    }
  ]
}
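
A common first step with this structure is walking the blocks in reading order. The sketch below sorts by page and then by reading_order_index, which gives the same result whether the index restarts on each page or runs across the whole document (this page does not say which). blocks_in_reading_order and plain_text are hypothetical helpers, and the sample document is made up for illustration.

```python
def blocks_in_reading_order(document):
    """Yield every block, ordered by page number and then reading_order_index."""
    for page in sorted(document["pages"], key=lambda p: p["page"]):
        for block in sorted(page["blocks"], key=lambda b: b["reading_order_index"]):
            yield block


def plain_text(document):
    """Join the text of all blocks that carry a non-empty text field."""
    return "\n".join(
        block["text"] for block in blocks_in_reading_order(document) if block.get("text")
    )


# Deliberately out-of-order input to show the sorting.
document = {
    "pages": [
        {"page": 2, "blocks": [{"reading_order_index": 0, "text": "Page two"}]},
        {"page": 1, "blocks": [
            {"reading_order_index": 1, "text": "Body"},
            {"reading_order_index": 0, "text": "Quarterly Report"},
        ]},
    ]
}
print(plain_text(document))
```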

Block type taxonomy

Every block has the same identity / position / provenance metadata. Content fields vary by type:

| Type | Extra content fields |
| --- | --- |
| heading / paragraph / list / caption / footnote / page_header / page_footer / callout / code | text |
| table | headers[], rows[][] |
| image | url or image_b64 (one is set), alt, caption, image_type, embedded_text |
| diagram | nodes[], edges[], image_url, image_b64 |
| chart | chart_type, title, categories[], series[], data_table[] (when source is native PPTX), image_b64 |
| formula | latex, rendered_image_b64 |
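
Because content fields depend on type, consumers usually branch on type before reading them. As one example, a table block can be turned into one record per row. The sketch assumes headers is a flat list of strings and each entry in rows is a list of cell values in the same column order; the exact cell format is not specified here. table_to_records is a hypothetical helper and the sample values are invented.

```python
def table_to_records(block):
    """Convert a table block into one dict per row, keyed by header."""
    if block.get("type") != "table":
        raise ValueError(f"expected a table block, got {block.get('type')!r}")
    headers = block["headers"]
    # zip() stops at the shorter sequence, so ragged rows lose trailing cells.
    return [dict(zip(headers, row)) for row in block["rows"]]


table_block = {
    "type": "table",
    "headers": ["Quarter", "Revenue"],
    "rows": [["Q1", "1.2M"], ["Q2", "1.4M"]],
}
records = table_to_records(table_block)
print(records[0]["Revenue"])  # 1.2M
```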

text_source values

| Value | Trust class |
| --- | --- |
| verified | Byte-exact from a source we can prove against (text layer, native PPTX shape, faithfulness check passed). |
| inferred | Produced by AI. Filter on verified + confidence for stricter pipelines. |
| derived | Structural block with no text payload. |
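
A stricter pipeline, as suggested for inferred blocks, might keep only blocks that are both verified and above a confidence threshold. The sketch below is one way to express that filter; strict_blocks and the 0.95 threshold are illustrative, and the right cut-off depends on the pipeline.

```python
def strict_blocks(blocks, min_confidence=0.95):
    """Keep blocks that are verified and meet the confidence threshold."""
    return [
        block
        for block in blocks
        if block.get("verified") is True
        and block.get("confidence", 0.0) >= min_confidence
    ]


# Invented sample blocks using the fields shown in the document object.
blocks = [
    {"id": "blk_1", "text_source": "verified", "verified": True, "confidence": 1.0},
    {"id": "blk_2", "text_source": "inferred", "verified": False, "confidence": 0.99},
    {"id": "blk_3", "text_source": "verified", "verified": True, "confidence": 0.80},
]
print([block["id"] for block in strict_blocks(blocks)])  # ['blk_1']
```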

DELETE /v1/jobs/{id}

Cancels a queued or processing job. Returns 204 No Content on success. A job that has already reached a terminal status (completed, failed, or cancelled) cannot be cancelled; the API returns 409 in that case.

bash
curl -X DELETE https://api.ocrqueen.com/v1/jobs/5f8a... \
  -H "Authorization: Bearer pk_test_xxx"

| Status | Code | When |
| --- | --- | --- |
| 204 | | Cancelled. |
| 404 | JOB_NOT_FOUND | Wrong ID, or belongs to another customer. |
| 409 | JOB_ALREADY_COMPLETED / FAILED / CANCELLED | Terminal job. |
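
For cleanup code, a 409 is often not an error worth raising: the job has already stopped on its own. The sketch below maps the documented status codes to short outcome labels and wraps the call using only the standard library. cancel_outcome, cancel_job, and the label strings are hypothetical names, and the 10-second timeout is an arbitrary choice.

```python
import urllib.error
import urllib.request

API = "https://api.ocrqueen.com/v1/jobs"


def cancel_outcome(status_code):
    """Map a DELETE /v1/jobs/{id} status code to a short outcome label."""
    if status_code == 204:
        return "cancelled"
    if status_code == 404:
        return "not_found"
    if status_code == 409:
        return "already_terminal"  # completed, failed, or cancelled earlier
    return "unexpected"


def cancel_job(job_id, api_key):
    """Send the cancel request and return the outcome label."""
    request = urllib.request.Request(
        f"{API}/{job_id}",
        method="DELETE",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return cancel_outcome(response.status)
    except urllib.error.HTTPError as exc:
        # urllib raises for 4xx/5xx responses; the status code is on the exception.
        return cancel_outcome(exc.code)
```

A caller that only needs the job to stop can treat both "cancelled" and "already_terminal" as success.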

POST /v1/jobs/{id}/purge

Hard-erases the source bytes from object storage and clears the extracted result + request options from our database. The job row remains as a billing tombstone (id, customer, pages, timestamps) so usage reports stay accurate. Idempotent — a second call is a no-op.

Requires the jobs:write scope. This is deliberately distinct from extract:write so you can issue read-only keys to dashboards / pipelines that fetch results but cannot delete them. Full contract: Data retention & deletion.

bash
curl -X POST https://api.ocrqueen.com/v1/jobs/5f8a.../purge \
  -H "Authorization: Bearer pk_test_xxx"

| Status | Code | When |
| --- | --- | --- |
| 204 | | Purged, or already purged (idempotent). |
| 403 | INSUFFICIENT_SCOPE | Key is missing jobs:write. |
| 404 | JOB_NOT_FOUND | Wrong ID, or belongs to another customer. |
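
Because purge is idempotent, it is safe to repeat a request whose response was lost to a network failure. The sketch below retries connection-level errors only and surfaces 403 and 404 immediately, since those will not change on retry. Everything here is illustrative: purge_with_retry is a hypothetical helper, and send is assumed to be a zero-argument callable that performs the POST, returns the HTTP status code, and raises OSError (or a subclass) on network failure.

```python
import time


def purge_with_retry(send, attempts=3, backoff=0.5, sleep=time.sleep):
    """Call send() until it returns a status code, retrying network errors only."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            status = send()
        except OSError:  # timeouts, connection resets, DNS failures
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the error
            sleep(backoff * 2 ** attempt)  # 0.5 s, 1 s, 2 s, ...
            continue
        if status == 204:
            return True  # purged now, or already purged by an earlier call
        # Any other status (403, 404, ...) is not transient, so do not retry.
        raise RuntimeError(f"purge rejected with HTTP {status}")


# Simulated transport: fails twice with a connection error, then succeeds.
calls = []

def flaky_send():
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("connection reset")
    return 204

print(purge_with_retry(flaky_send, sleep=lambda seconds: None))  # True
```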