Data retention & deletion
Default retention is 24 hours. For ephemeral processing, set retain_hours and result_retain_hours to 0 and receive results by webhook. Call POST /v1/jobs/{id}/purge to erase on demand. We bill from a tombstone row that holds no document content.
What we store, and for how long
Every extraction job creates three logically separate things. Each has its own retention window. You set the first two independently per request; the billing tombstone is kept indefinitely.
| Artifact | Where it lives | Default lifetime | Controlled by |
|---|---|---|---|
| Source file (your PDF/PPT/etc.) | Cloudflare R2 (or your BYOS bucket) | 24 hours | retain_hours |
| Extracted content (markdown + structured JSON) | Our Postgres | 24 hours (matches retain_hours) | result_retain_hours |
| Billing tombstone (job id, customer, pages, timestamps) | Our Postgres | Indefinite | Tax + audit retention; no document content |
A tombstone is the row that remains after we delete your data. It carries the fields we need to produce invoices and usage reports (such as id, customer_id, pages_extracted, completed_at; the full list is under "What stays after a purge") and no document content: no filename, no hash, no result.
Ephemeral processing (recommended for sensitive data)
Set both retention windows to zero and have the result delivered to you via webhook. Our database holds your content for the duration of the extraction and the webhook delivery, then it's gone.
curl -X POST https://api.ocrqueen.com/v1/extract \
-H "Authorization: Bearer pk_..." \
-F "file=@confidential.pdf" \
-F 'options={
"retain_hours": 0,
"result_retain_hours": 0,
"callback_url": "https://your-app.com/ocrqueen-webhook"
}'

The webhook payload contains the full extraction result. You're responsible for persisting whatever you need from there: once we deliver, we forget.
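As a rough illustration of the receiving side, here is a minimal webhook endpoint sketch using only the Python standard library. The payload field names `job_id` and `result` are assumptions made for this example (the payload schema is not specified on this page), and the in-memory `RESULTS` dict stands in for your own storage.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# In-memory stand-in for your own database; replace with real storage.
RESULTS = {}


def persist_result(raw_body, store):
    """Parse a webhook body and keep the result under its job id.

    The field names "job_id" and "result" are assumptions for illustration.
    """
    payload = json.loads(raw_body)
    job_id = payload["job_id"]
    store[job_id] = payload.get("result")
    return job_id


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            persist_result(body, RESULTS)
        except (ValueError, KeyError, TypeError):
            # Malformed JSON or missing job id: reject without storing.
            self.send_response(400)
            self.end_headers()
            return
        # Acknowledge only after the result has been stored.
        self.send_response(204)
        self.end_headers()


def serve(port=8080):
    """Run the receiver until interrupted."""
    HTTPServer(("0.0.0.0", port), WebhookHandler).serve_forever()
```

Calling `serve()` starts the listener. With both retention windows at zero the webhook is the only copy you will get, so it is worth writing to durable storage before acknowledging.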
Erase on demand
For GDPR right-to-erasure requests, end-of-customer-engagement cleanups, or any one-off deletion: hit the purge endpoint. It's idempotent — calling it twice on the same job is harmless.
curl -X POST https://api.ocrqueen.com/v1/jobs/{job_id}/purge \
-H "Authorization: Bearer pk_..."
# 204 No Content. The source bytes are deleted from R2,
# the extracted result + request options are nulled, the
# row remains as a billing tombstone.

Requires the jobs:write scope on the API key. Destructive operations are deliberately gated behind a scope separate from extract:write, so you can issue keys that submit jobs or fetch results without being able to delete them.
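The same purge call can be sketched in Python with the standard library. The helper names `build_purge_request` and `purge_job` are hypothetical; the URL, method, and Authorization header mirror the curl example above.

```python
import urllib.request

API_BASE = "https://api.ocrqueen.com/v1"


def build_purge_request(job_id, api_key):
    """Build (but do not send) the POST /v1/jobs/{id}/purge request."""
    return urllib.request.Request(
        url=f"{API_BASE}/jobs/{job_id}/purge",
        method="POST",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def purge_job(job_id, api_key):
    """Send the purge and return the HTTP status (204 on success)."""
    with urllib.request.urlopen(build_purge_request(job_id, api_key)) as resp:
        return resp.status
```

Because the endpoint is idempotent, retrying `purge_job` after a timeout should be safe. Note that `urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses, so a key without the required scope would surface as an exception rather than a return value.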
Which scope does what
| Scope | What it permits |
|---|---|
| extract:read | Read job status & results, cancel queued jobs |
| extract:write | Submit new extraction jobs |
| jobs:write | Hard-purge a job (source + result). Separate from extract:write on purpose. |
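When provisioning keys, it can help to check the table above in code. The sketch below uses hypothetical operation names and assumes scopes are independent of one another (for example, that extract:write does not imply extract:read), which this page does not state either way.

```python
# Operation names are hypothetical labels; scope names come from the table.
SCOPE_FOR_OPERATION = {
    "read_status": "extract:read",
    "read_result": "extract:read",
    "cancel_queued": "extract:read",
    "submit_job": "extract:write",
    "purge_job": "jobs:write",
}


def missing_scopes(key_scopes, operations):
    """Return the scopes a key still needs to perform all given operations."""
    needed = {SCOPE_FOR_OPERATION[op] for op in operations}
    return needed - set(key_scopes)
```

For instance, `missing_scopes({"extract:read"}, ["read_result", "purge_job"])` reports that the key lacks `jobs:write`.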
What stays after a purge
The job row remains so that:
- You can still see purged jobs in your usage reports.
- We can produce accurate invoices and tax records.
- An idempotent retry of the same request returns a clear “this job was purged” response instead of silently re-running and re-billing.
The fields we keep are: job id, customer id, pages extracted, file size in bytes (no filename, no hash), status, created / started / completed / purged timestamps, and the usage events that drive billing.
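To make the tombstone concrete, here is an illustrative model of a job row before and after a purge. It is a sketch, not our actual schema: field names other than id, customer_id, pages_extracted, and completed_at are assumptions, and usage events are left out for brevity.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class JobRow:
    # Fields that survive a purge (billing and audit metadata).
    id: str
    customer_id: str
    pages_extracted: int
    file_size_bytes: int
    status: str
    created_at: datetime
    started_at: Optional[datetime]
    completed_at: Optional[datetime]
    purged_at: Optional[datetime] = None
    # Fields that a purge nulls out.
    result: Optional[dict] = None
    request_options: Optional[dict] = None


def purge(row, now):
    """Return the tombstone for a row; purging a tombstone is a no-op."""
    if row.purged_at is not None:
        return row
    return replace(row, result=None, request_options=None, purged_at=now)


# Example: purge a finished job one day after it completed.
t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
job = JobRow("job_1", "cust_1", 3, 1024, "completed", t0, t0, t0,
             result={"markdown": "# Title"},
             request_options={"retain_hours": 24})
tombstone = purge(job, now=datetime(2024, 1, 2, tzinfo=timezone.utc))
```

The early return in `purge` mirrors the idempotency of the endpoint: a second purge leaves the original purged timestamp untouched.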
BYOS and retention
If you provided a BYOS destination for output artifacts (extracted images, page renders), those live in your bucket and OCRQueen never touches them again, including on purge. Lifecycle there is governed by your bucket's own policies.
Source-side BYOS (uploading directly to your bucket instead of ours) is not available yet; it is on the roadmap. Until then, the source file is uploaded to our R2 and deleted per retain_hours.
What we never persist
- Plaintext API keys. We hash on creation and store only the hash + a short prefix for display.
- Plaintext BYOS credentials. Encrypted at rest with our KMS-backed envelope encryption; workers decrypt to memory only, never to disk.
- Request bodies in logs. Structured logs record metadata (job id, page count, latency, error code) and never the document contents.
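The "hash plus short prefix" approach for API keys can be sketched as follows. This is a generic illustration of the technique, not our implementation: SHA-256, the 8-character prefix length, and the helper names are assumptions.

```python
import hashlib
import hmac
import secrets

PREFIX_LEN = 8  # hypothetical display-prefix length


def create_api_key():
    """Return (plaintext, record). Only the record would be stored."""
    plaintext = "pk_" + secrets.token_urlsafe(32)
    record = {
        "prefix": plaintext[:PREFIX_LEN],
        "sha256": hashlib.sha256(plaintext.encode()).hexdigest(),
    }
    return plaintext, record


def verify_api_key(presented, record):
    """Hash the presented key and compare it to the stored hash."""
    digest = hashlib.sha256(presented.encode()).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(digest, record["sha256"])
```

The plaintext is shown to the key owner once at creation; afterwards only the prefix is available for display, and verification works purely from the hash.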
Changes to this policy
We'll announce material changes here and to API-key owners by email at least 30 days before they take effect. Tighter defaults (shorter retention) ship immediately; looser defaults (longer retention) require notice.
