Cookbook

Batch processing with webhooks

Submit everything in parallel, do other work, let the results land on your webhook receiver as they finish.

The pattern

  1. Submit every file with callback_url pointing at your receiver.
  2. client.submit(...) returns immediately — no polling.
  3. Your receiver gets one POST per finished job, with the signature in the OCRQueen-Signature header so you can verify it.

Submitting in parallel — Python

python
from concurrent.futures import ThreadPoolExecutor
from ocrqueen import OCRQueen

client = OCRQueen(api_key="pk_live_xxx")
WEBHOOK = "https://api.acme.com/hooks/ocrqueen"

def submit_one(path: str) -> str:
    job = client.submit(path, callback_url=WEBHOOK)
    return job.job_id

# 8 concurrent submits. Throughput is bounded by the per-key rate limit, so
# raise max_workers only as far as your plan's RPS limit allows.
with ThreadPoolExecutor(max_workers=8) as pool:
    job_ids = list(pool.map(submit_one, ["doc1.pdf", "doc2.pdf", ...]))

# Record job_ids → your DB so you can correlate webhook payloads later.
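
One way to keep that correlation is sketched below. It relies on the fact that pool.map returns results in input order, so each job ID can be paired with the file it came from. The submit_one here is a stand-in so the snippet runs without network access, and the fake ID format is made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def submit_one(path: str) -> str:
    # Stand-in for the real submit_one above (hypothetical ID format),
    # so this sketch runs offline.
    return f"job-{path}"


paths = ["doc1.pdf", "doc2.pdf", "doc3.pdf"]

with ThreadPoolExecutor(max_workers=8) as pool:
    # pool.map yields results in the same order as its input,
    # even when the submits finish out of order.
    job_ids = list(pool.map(submit_one, paths))

# job_id -> source file; persist this mapping before the first webhook arrives.
job_to_path = dict(zip(job_ids, paths))
```

In a real batch you would write job_to_path to your database rather than keep it in memory, since webhooks can arrive after the submitting process has exited.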

Submitting in parallel — Node

typescript
import { OCRQueen } from "ocrqueen";

const client = new OCRQueen({ apiKey: "pk_live_xxx" });
const WEBHOOK = "https://api.acme.com/hooks/ocrqueen";

const files = ["doc1.pdf", "doc2.pdf", /* ... */];

// Promise.all — simple but unbounded. For real batches, use a worker pool
// (p-limit, p-queue, etc.) so you don't exceed your plan's rate limit.
const jobIds = await Promise.all(
  files.map(async (path) => {
    const job = await client.submit(path, { callbackUrl: WEBHOOK });
    return job.job_id;
  })
);

Receiving + verifying webhooks

Every webhook body is HMAC-SHA256 signed using your webhook_secret (visible at /dashboard/settings). Verify before you trust the payload.

Python (FastAPI) receiver

python
import hmac, hashlib, os
from fastapi import FastAPI, Header, HTTPException, Request

WEBHOOK_SECRET = os.environ["OCRQUEEN_WEBHOOK_SECRET"]
app = FastAPI()

@app.post("/hooks/ocrqueen")
async def receive(
    request: Request,
    ocrqueen_signature: str = Header(alias="OCRQueen-Signature"),
):
    body = await request.body()
    expected = hmac.new(
        WEBHOOK_SECRET.encode(),
        body,
        hashlib.sha256,
    ).hexdigest()
    if not hmac.compare_digest(expected, ocrqueen_signature):
        raise HTTPException(401, "invalid signature")

    payload = await request.json()
    # payload is the same JobResponse shape returned by /v1/jobs/{id}
    print(payload["job_id"], payload["status"])
    # → enqueue downstream processing, write to DB, etc.
    return {"ok": True}
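
To exercise a receiver like this locally, you can sign a test body yourself. The sketch below assumes, as the receiver code implies, that the header carries a bare hex-encoded HMAC-SHA256 digest of the raw request body. The sign and verify helpers, the secret value, and the payload fields are illustrative, not part of the SDK.

```python
import hashlib
import hmac
import json


def sign(body: bytes, secret: str) -> str:
    """Hex HMAC-SHA256 of the raw body, computed the same way as `expected` above."""
    return hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()


def verify(body: bytes, signature: str, secret: str) -> bool:
    """Constant-time comparison of the computed and received signatures."""
    return hmac.compare_digest(sign(body, secret), signature)


secret = "test_only_secret"  # hypothetical value; never hard-code the real one
body = json.dumps({"job_id": "job_123", "status": "failed"}).encode()
signature = sign(body, secret)  # send this as the OCRQueen-Signature header
```

Sign the exact bytes you send. Re-serializing the JSON on either side can change whitespace or key order, and the digest will no longer match.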

Idempotency in your receiver

We retry failed deliveries up to 3 times with exponential backoff (1s, 5s, 25s). Your receiver should be idempotent on job_id — if you write the result to a DB, use an INSERT ... ON CONFLICT (job_id) DO NOTHING (or equivalent) so a retried delivery doesn't double-record.
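
A minimal sketch of that idempotent write, using SQLite from the standard library so it runs anywhere. The job_results table, its columns, and the record_result helper are assumptions for illustration; the same ON CONFLICT clause works in PostgreSQL.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE job_results ("
    " job_id TEXT PRIMARY KEY,"
    " status TEXT NOT NULL,"
    " payload TEXT NOT NULL)"
)


def record_result(payload: dict) -> bool:
    """Store a webhook payload once. Returns False if job_id was already recorded."""
    cur = conn.execute(
        "INSERT INTO job_results (job_id, status, payload) VALUES (?, ?, ?) "
        "ON CONFLICT (job_id) DO NOTHING",
        (payload["job_id"], payload["status"], json.dumps(payload)),
    )
    conn.commit()
    # rowcount is 0 when the conflict clause skipped the insert.
    return cur.rowcount == 1
```

The return value lets the receiver skip downstream work for a duplicate delivery while still answering 200, so the retry stops.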

Failures

If extraction itself fails (e.g. PDF_PASSWORD_PROTECTED), you still get a webhook: status is failed and error is populated. To retry programmatically, switch on error.code:

  • PDF_PASSWORD_PROTECTED → surface to the human user; ask them to upload an unlocked copy.
  • OCR_LOW_CONFIDENCE → consider re-uploading with profile=advanced (more thorough OCR, at slightly higher cost).
  • Internal errors (5xx class) → retry the submit; they usually clear within a minute.
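
The decision list above could be sketched as a small dispatcher. This assumes the payload's error field is an object with a code key, as error.code suggests. The action names and the INTERNAL_ERROR code are hypothetical placeholders; substitute the codes your account actually receives.

```python
# Hypothetical code name for internal errors; the docs above do not list one.
RETRYABLE_CODES = {"INTERNAL_ERROR"}


def next_action(payload: dict) -> str:
    """Map a webhook payload to a follow-up action (action names are illustrative)."""
    if payload.get("status") != "failed":
        return "process"
    code = (payload.get("error") or {}).get("code", "")
    if code == "PDF_PASSWORD_PROTECTED":
        return "ask_user"  # needs a human to upload an unlocked copy
    if code == "OCR_LOW_CONFIDENCE":
        return "resubmit_advanced"  # re-upload with profile=advanced
    if code in RETRYABLE_CODES:
        return "retry_submit"  # transient; cap the number of attempts
    return "investigate"  # unknown code: do not retry blindly
```

Sending unknown codes to a manual queue instead of retrying them avoids a resubmit loop on a failure that will never clear.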
