Cookbook
Batch processing with webhooks
Submit everything in parallel, do other work, let the results land on your webhook receiver as they finish.
The pattern
- Submit every file with callback_url pointing at your receiver. client.submit(...) returns immediately, so there is no polling.
- Your receiver gets one POST per finished job, signed with OCRQueen-Signature for verification.
Submitting in parallel — Python
from concurrent.futures import ThreadPoolExecutor
from ocrqueen import OCRQueen
client = OCRQueen(api_key="pk_live_xxx")
WEBHOOK = "https://api.acme.com/hooks/ocrqueen"
def submit_one(path: str) -> str:
    job = client.submit(path, callback_url=WEBHOOK)
    return job.job_id
# 8 concurrent submits — bounded by the per-key rate limit. Bump the pool
# up to your plan's RPS limit.
with ThreadPoolExecutor(max_workers=8) as pool:
    job_ids = list(pool.map(submit_one, ["doc1.pdf", "doc2.pdf", ...]))
# Record job_ids → your DB so you can correlate webhook payloads later.
Submitting in parallel — Node
import { OCRQueen } from "ocrqueen";
const client = new OCRQueen({ apiKey: "pk_live_xxx" });
const WEBHOOK = "https://api.acme.com/hooks/ocrqueen";
const files = ["doc1.pdf", "doc2.pdf", /* ... */];
// Promise.all — simple but unbounded. For real batches, use a worker pool
// (p-limit, p-queue, etc.) so you don't exceed your plan's rate limit.
const jobIds = await Promise.all(
  files.map(async (path) => {
    const job = await client.submit(path, { callbackUrl: WEBHOOK });
    return job.job_id;
  })
);
Receiving + verifying webhooks
Every webhook body is HMAC-SHA256 signed using your webhook_secret (visible at /dashboard/settings). Verify before you trust the payload.
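The check can also be tried in isolation, which is handy for signing test payloads locally. This is a minimal sketch that assumes the header carries the hex-encoded HMAC-SHA256 digest of the raw request body, which is what the receiver on this page compares against. The helper names sign and verify, the secret, and the sample payload are hypothetical and not part of the ocrqueen SDK.

```python
import hashlib
import hmac


def sign(secret: str, body: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of the raw body (assumed header format)."""
    return hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()


def verify(secret: str, body: bytes, signature: str) -> bool:
    """Constant-time comparison of the expected and received signatures."""
    return hmac.compare_digest(sign(secret, body), signature)


# Hypothetical secret and payload, for local testing only.
secret = "test_secret"
body = b'{"job_id": "job_123"}'

good = sign(secret, body)
print(verify(secret, body, good))         # True
print(verify(secret, body + b" ", good))  # False: any change to the bytes breaks it
```

Note that the signature covers the exact bytes received, so verify against the raw body and not a re-serialized copy of the parsed JSON. The FastAPI receiver applies the same comparison to each incoming POST.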
import hmac, hashlib, os
from fastapi import FastAPI, Header, HTTPException, Request
WEBHOOK_SECRET = os.environ["OCRQUEEN_WEBHOOK_SECRET"]
app = FastAPI()
@app.post("/hooks/ocrqueen")
async def receive(
    request: Request,
    ocrqueen_signature: str = Header(alias="OCRQueen-Signature"),
):
    body = await request.body()
    expected = hmac.new(
        WEBHOOK_SECRET.encode(),
        body,
        hashlib.sha256,
    ).hexdigest()
    if not hmac.compare_digest(expected, ocrqueen_signature):
        raise HTTPException(401, "invalid signature")
    payload = await request.json()
    # payload is the same JobResponse shape returned by /v1/jobs/{id}
    print(payload["job_id"], payload["status"])
    # → enqueue downstream processing, write to DB, etc.
    return {"ok": True}
Idempotency in your receiver
We retry failed deliveries up to 3 times with exponential backoff (1s, 5s, 25s). Your receiver should be idempotent on job_id — if you write the result to a DB, use an INSERT ... ON CONFLICT (job_id) DO NOTHING (or equivalent) so a retried delivery doesn't double-record.
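As an illustration of the ON CONFLICT approach, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table name, columns, and the record_result helper are hypothetical; the only assumption about the payload is that it carries job_id and status, as in the receiver example. Your own schema and database will differ.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results (job_id TEXT PRIMARY KEY, status TEXT, payload TEXT)"
)


def record_result(payload: dict) -> bool:
    """Store a webhook payload once; return False if job_id was already stored."""
    cur = conn.execute(
        "INSERT INTO results (job_id, status, payload) VALUES (?, ?, ?) "
        "ON CONFLICT (job_id) DO NOTHING",
        (payload["job_id"], payload["status"], json.dumps(payload)),
    )
    conn.commit()
    # rowcount is 1 for a new row and 0 when the conflict clause skipped the insert.
    return cur.rowcount == 1


# Simulate an original delivery followed by a retried one.
delivery = {"job_id": "job_123", "status": "failed"}
first = record_result(delivery)
second = record_result(delivery)
count = conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]
print(first, second, count)  # True False 1
```

Returning whether the row was new lets the receiver skip downstream work on a duplicate while still answering the retry with a success response.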
Failures
If extraction itself fails (e.g. PDF_PASSWORD_PROTECTED), you'll still get a webhook: status will be failed and error will be populated. If you need to retry programmatically, switch on error.code and decide:
- PDF_PASSWORD_PROTECTED → surface to the human user; ask them to upload an unlocked copy.
- OCR_LOW_CONFIDENCE → consider re-uploading with profile=advanced (more thorough OCR, slightly more cost).
- 5xx-ish internal errors → retry the submit; they usually clear within a minute.
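The decision above could be sketched as a small dispatch function. This assumes the payload nests the code as payload["error"]["code"], which the text implies but does not spell out. The returned action labels are hypothetical placeholders for your own handlers, and treating every unrecognized code as a transient internal error is a simplifying assumption.

```python
def next_action(payload: dict) -> str:
    """Map a webhook payload to a follow-up action (labels are hypothetical)."""
    if payload.get("status") != "failed":
        return "process_result"

    # Assumed shape: {"status": "failed", "error": {"code": "..."}}
    code = (payload.get("error") or {}).get("code")
    if code == "PDF_PASSWORD_PROTECTED":
        return "ask_user_for_unlocked_copy"
    if code == "OCR_LOW_CONFIDENCE":
        return "resubmit_with_advanced_profile"
    # Assumption: anything else is treated as a transient internal error.
    return "retry_submit"


failed = {
    "job_id": "job_123",
    "status": "failed",
    "error": {"code": "PDF_PASSWORD_PROTECTED"},
}
print(next_action(failed))  # ask_user_for_unlocked_copy
```

In practice you would likely cap the number of retry_submit attempts per file so a persistent failure does not loop forever.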
Next
- RAG ingest: what to do with the JSON once it's in your DB.
- Quickstart: back to the basics.
