/v1/jobs/{id}
Poll a job, fetch the extracted document, or cancel work that hasn't finished yet.
GET /v1/jobs/{id}
Returns the job's current status. When status == "completed" the document and markdown fields are populated. When failed or cancelled, error is populated.
curl https://api.ocrqueen.com/v1/jobs/5f8a... \
  -H "Authorization: Bearer pk_test_xxx"
Poll loop
Recommended cadence: respect the Retry-After header from the original 202. A simple loop:
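import time
import httpx
HEAD = {"Authorization": "Bearer pk_test_xxx"}
job_id = "5f8a..."  # ID of the job returned with the original 202
while True:
    r = httpx.get(f"https://api.ocrqueen.com/v1/jobs/{job_id}", headers=HEAD)
    r.raise_for_status()
    job = r.json()
    # Stop once the job reaches a terminal status.
    if job["status"] in ("completed", "failed", "cancelled"):
        break
    # Wait as long as the server asks; fall back to 2 seconds if the header is absent.
    time.sleep(int(r.headers.get("Retry-After", "2")))
Status values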
| Status | Meaning |
|---|---|
| queued | Enqueued, no worker has picked it up yet. |
| processing | Worker is extracting. |
| completed | Done. document + markdown populated. |
| failed | Terminal error. See error.code. |
| cancelled | Customer cancelled before completion. |
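The table above splits into two non-terminal statuses (queued, processing) and three terminal ones. The following is a minimal sketch of a polling helper built on that split, with an overall deadline so a stuck job cannot block the caller indefinitely. The names `wait_for_job`, `JobTimeout`, `fetch`, and `timeout_s` are hypothetical and not part of the API; `fetch` is assumed to perform the GET and return the parsed job body together with the Retry-After value in seconds (or None when the header is missing).

```python
import time
from typing import Callable, Optional, Tuple

# Terminal statuses, taken from the status table.
TERMINAL = {"completed", "failed", "cancelled"}


class JobTimeout(Exception):
    """Hypothetical error: the job did not reach a terminal status in time."""


def wait_for_job(
    fetch: Callable[[], Tuple[dict, Optional[float]]],
    timeout_s: float = 300.0,       # assumed overall budget, not an API limit
    default_delay_s: float = 2.0,   # used when no Retry-After is available
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll via fetch() until the job is terminal or the deadline would pass."""
    deadline = clock() + timeout_s
    while True:
        job, retry_after = fetch()
        if job["status"] in TERMINAL:
            return job
        delay = retry_after if retry_after is not None else default_delay_s
        # Give up before sleeping if the next poll would land past the deadline.
        if clock() + delay > deadline:
            raise JobTimeout(f"job still {job['status']} after {timeout_s}s")
        sleep(delay)
```

Injecting `sleep` and `clock` keeps the helper testable without real waiting; in production code the defaults can be left alone.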
Response shape
{
  "job_id": "5f8a...",
  "status": "completed",
  "file_info": { "filename": "doc.pdf", "size_bytes": 184321, "mime_type": "application/pdf" },
  "created_at": "2026-05-14T12:00:00Z",
  "started_at": "2026-05-14T12:00:01Z",
  "completed_at": "2026-05-14T12:00:08Z",
  "estimated_seconds": 0,
  "status_url": "https://api.ocrqueen.com/v1/jobs/5f8a...",
  "expires_at": "2026-05-15T12:00:00Z",
  "document": { /* see below */ },
  "markdown": "# Title\n\n…",
  "cache_hit": false,
  "error": null
}
The document object
The canonical extraction result. JSON is the source of truth; markdown is a deterministic render of the same JSON.
{
  "source": {
    "kind": "pdf", // pdf | pptx | image
    "filename": "doc.pdf",
    "file_hash": "sha256:...",
    "page_count": 13,
    "page_sizes": [[8.5, 11], ...] // inches
  },
  "extraction": {
    "model": "ocrqueen-advanced",
    "profile": "advanced",
    "duration_ms": 8120,
    "faithfulness_score": 0.987,
    "per_page_failures": []
  },
  "pages": [
    {
      "page": 1,
      "blocks": [
        {
          "id": "blk_a3f9c2",
          "type": "heading", // see Type taxonomy
          "role": "title",
          "page": 1,
          "bbox": [0.082, 0.044, 0.610, 0.082],
          "bbox_units": "normalized",
          "reading_order_index": 0,
          "text": "Quarterly Report",
          "text_source": "verified", // verified | inferred | derived
          "confidence": 1.0,
          "verified": true
        }
      ]
    }
  ]
}
Block type taxonomy
Every block has the same identity / position / provenance metadata. Content fields vary by type:
| Type | Extra content fields |
|---|---|
| heading / paragraph / list / caption / footnote / page_header / page_footer / callout / code | text |
| table | headers[], rows[][] |
| image | url or image_b64 (one is set), alt, caption, image_type, embedded_text |
| diagram | nodes[], edges[], image_url, image_b64 |
| chart | chart_type, title, categories[], series[], data_table[] (when source is native PPTX), image_b64 |
| formula | latex, rendered_image_b64 |
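Because the content fields depend on the block type, consumers typically dispatch on `type`. The sketch below flattens a document object into plain text as one possible illustration. It assumes `headers` is a list of cell values, `rows` is a list of such lists, and that sorting by `reading_order_index` within each page gives the reading order; the function names `block_to_text` and `document_to_text` are hypothetical.

```python
# Block types whose only content field is `text`, per the taxonomy table.
TEXT_TYPES = {
    "heading", "paragraph", "list", "caption", "footnote",
    "page_header", "page_footer", "callout", "code",
}


def block_to_text(block: dict) -> str:
    """Return a plain-text rendering of one block, or "" if it has none."""
    kind = block["type"]
    if kind in TEXT_TYPES:
        return block.get("text", "")
    if kind == "table":
        # Tab-separated header line followed by one line per row.
        lines = ["\t".join(str(h) for h in block.get("headers", []))]
        lines += ["\t".join(str(c) for c in row) for row in block.get("rows", [])]
        return "\n".join(lines)
    if kind == "formula":
        return block.get("latex", "")
    if kind == "image":
        return block.get("alt", "")
    if kind == "chart":
        return block.get("title", "")
    return ""  # diagram or unknown type: no single text field is assumed


def document_to_text(document: dict) -> str:
    """Join the text of all blocks, page by page, in reading order."""
    parts = []
    for page in document["pages"]:
        blocks = sorted(page["blocks"], key=lambda b: b["reading_order_index"])
        parts.extend(t for t in (block_to_text(b) for b in blocks) if t)
    return "\n\n".join(parts)
```

Unknown types fall through to an empty string rather than raising, so the sketch keeps working if new block types appear.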
text_source values
| Value | Trust class |
|---|---|
| verified | Byte-exact from a source we can prove against (text layer, native PPTX shape, faithfulness check passed). |
| inferred | Produced by AI. Filter on verified + confidence for stricter pipelines. |
| derived | Structural block with no text payload. |
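For stricter pipelines, the filtering on `verified` and `confidence` mentioned above could look like the following sketch. The function name `trusted_blocks` and the 0.9 default threshold are assumptions chosen for illustration, not values recommended by the API.

```python
from typing import Iterator


def trusted_blocks(document: dict, min_confidence: float = 0.9) -> Iterator[dict]:
    """Yield blocks that are verified or meet the confidence threshold.

    The 0.9 default is an arbitrary illustrative cut-off.
    """
    for page in document["pages"]:
        for block in page["blocks"]:
            if block.get("text_source") == "derived":
                continue  # structural block, no text payload to trust
            if block.get("verified") or block.get("confidence", 0.0) >= min_confidence:
                yield block
```

Raising `min_confidence` to 1.0 narrows the output to verified blocks plus any inferred block reported with full confidence.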
DELETE /v1/jobs/{id}
Cancels a queued or processing job. Returns 204 No Content on success. Jobs that have already reached a terminal status (completed, failed, or cancelled) cannot be cancelled; the request returns 409 in that case.
curl -X DELETE https://api.ocrqueen.com/v1/jobs/5f8a... \
  -H "Authorization: Bearer pk_test_xxx"
| Status | Code | When |
|---|---|---|
| 204 | — | Cancelled. |
| 404 | JOB_NOT_FOUND | Wrong ID, or belongs to another customer. |
| 409 | JOB_ALREADY_COMPLETED / FAILED / CANCELLED | Terminal job. |
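A client can map the three documented statuses to explicit outcomes instead of treating every non-2xx response as a failure. This is a minimal sketch; `CancelOutcome` and `interpret_cancel` are hypothetical names, and any status not listed in the table is treated as unexpected.

```python
from enum import Enum


class CancelOutcome(Enum):
    CANCELLED = "cancelled"                # 204
    NOT_FOUND = "not_found"                # 404
    ALREADY_TERMINAL = "already_terminal"  # 409


def interpret_cancel(status_code: int) -> CancelOutcome:
    """Translate the HTTP status of DELETE /v1/jobs/{id} into an outcome."""
    if status_code == 204:
        return CancelOutcome.CANCELLED
    if status_code == 404:
        return CancelOutcome.NOT_FOUND
    if status_code == 409:
        return CancelOutcome.ALREADY_TERMINAL
    # Anything else is outside the documented table.
    raise RuntimeError(f"unexpected status {status_code} from DELETE /v1/jobs/{{id}}")
```

With httpx, the status code would come from something like `httpx.delete(url, headers=HEAD).status_code`. A 409 is often harmless for callers that only want the job to stop consuming work, since the job has already finished one way or another.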
POST /v1/jobs/{id}/purge
Hard-erases the source bytes from object storage and clears the extracted result + request options from our database. The job row remains as a billing tombstone (id, customer, pages, timestamps) so usage reports stay accurate. Idempotent — a second call is a no-op.
Requires the jobs:write scope. This is deliberately distinct from extract:write so you can issue read-only keys to dashboards / pipelines that fetch results but cannot delete them. Full contract: Data retention & deletion.
curl -X POST https://api.ocrqueen.com/v1/jobs/5f8a.../purge \
  -H "Authorization: Bearer pk_test_xxx"
| Status | Code | When |
|---|---|---|
| 204 | — | Purged, or already purged (idempotent). |
| 403 | INSUFFICIENT_SCOPE | Key is missing jobs:write. |
| 404 | JOB_NOT_FOUND | Wrong ID, or belongs to another customer. |
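Because purge is idempotent, repeating the call after a dropped connection is safe. The sketch below retries on network failure and maps the documented statuses to exceptions. Everything named here is hypothetical: `purge_with_retry`, the `send` callable (assumed to issue the POST and return the HTTP status code, raising the built-in `ConnectionError` on network failure), and the retry count and backoff values.

```python
import time
from typing import Callable


def purge_with_retry(
    send: Callable[[], int],
    attempts: int = 3,        # assumed retry budget
    backoff_s: float = 1.0,   # assumed base delay, grows linearly per attempt
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Call send() until it returns a status, retrying network failures."""
    for attempt in range(1, attempts + 1):
        try:
            status = send()
        except ConnectionError:
            if attempt == attempts:
                raise  # out of retries, surface the network error
            sleep(backoff_s * attempt)
            continue
        if status == 204:
            return  # purged, or already purged
        if status == 403:
            raise PermissionError("API key is missing the jobs:write scope")
        if status == 404:
            raise LookupError("job not found, or it belongs to another customer")
        raise RuntimeError(f"unexpected status {status} from purge")
```

If `send` is built on httpx, it would need to translate httpx's transport errors into `ConnectionError`, since httpx raises its own exception types.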
