Cookbook
Extract patents — figures, claims, references
The standard profile handles claims and bibliography fine. For figure understanding, switch to profile="advanced".
What the advanced profile gives you
- Diagram graphs — flowcharts and block diagrams come back as typed nodes+edges, not just images. LLMs can reason over the structure.
- Image alt-text — every figure gets a verified caption. Useful when an LLM needs to know what's in a figure without ingesting the pixels.
- Reference numeral linking — text that says “mobile terminal 200” gets resolved to the actual element labeled 200 in the figure.
- Embedded text OCR — labels inside figures (“CPU”, “DRAM”) come through as searchable text.
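To make the "typed nodes+edges" point concrete, here is a minimal sketch that flattens a diagram graph into a text outline an LLM prompt could carry. The key names `id`, `source`, and `target` are assumptions made for illustration (only `label` appears in this page's own example), and the sample data is hand-written rather than real API output, so check the actual payload before relying on it.

```python
from collections import defaultdict


def diagram_outline(nodes, edges):
    """Render nodes and edges as one "label -> targets" line per node.

    Assumed (hypothetical) shapes: each node is a dict with "id" and
    "label"; each edge is a dict with "source" and "target" node ids.
    """
    labels = {n["id"]: n.get("label") or str(n["id"]) for n in nodes}
    children = defaultdict(list)
    for edge in edges:
        children[edge["source"]].append(edge["target"])

    lines = []
    for node_id, label in labels.items():
        targets = ", ".join(labels.get(t, str(t)) for t in children[node_id])
        lines.append(f"{label} -> {targets}" if targets else label)
    return "\n".join(lines)


# Hand-written sample data, not real API output.
sample_nodes = [
    {"id": "n1", "label": "CPU"},
    {"id": "n2", "label": "DRAM"},
    {"id": "n3", "label": "Display"},
]
sample_edges = [
    {"source": "n1", "target": "n2"},
    {"source": "n1", "target": "n3"},
]

print(diagram_outline(sample_nodes, sample_edges))
```

An outline like this is usually far cheaper to put in a prompt than the figure itself, which is the practical benefit of getting structure instead of pixels.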
Python — extract and walk the structure
from ocrqueen import OCRQueen
from collections import defaultdict
client = OCRQueen(api_key="pk_live_xxx")
result = client.extract(
    "US-11847293-B2.pdf",
    profile="advanced",
    max_wait_seconds=600,  # patents can take longer
)
# ── Pull each kind of structure separately ────────────────────────────
claims = [b for b in result.blocks if b.type == "paragraph"
          and (b.raw.get("role") or "").lower() == "claims"]
figures = [b for b in result.blocks if b.type in ("image", "diagram")]
# ── Walk diagrams as graphs ──────────────────────────────────────────
for fig in figures:
    if fig.type == "diagram":
        nodes = fig.raw.get("nodes") or []
        edges = fig.raw.get("edges") or []
        print(f"Page {fig.page} diagram: {len(nodes)} nodes, {len(edges)} edges")
        for node in nodes:
            print(f"  [{node.get('label')}] ref# {node.get('reference_numeral')}")
# ── Cross-link references: where does 'element 200' appear? ──────────
refs = defaultdict(list)
for block in result.blocks:
    for ref in (block.raw.get("reference_numerals") or []):
        refs[ref].append({"page": block.page, "type": block.type})
# refs["200"] → every block mentioning element 200
Cost note
Advanced runs roughly 2-4× the cost of standard because it makes additional Gemini Vision calls per figure. For text-only patents, stick with standard — you'll get the same text quality for much less.
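As a rough budgeting aid, the 2-4× figure above can be turned into a cost range for a mixed corpus. This is a sketch under stated assumptions: the per-document cost unit and the corpus split are hypothetical placeholders, and the multipliers are only the approximate range quoted in this note, not a pricing formula.

```python
def advanced_cost_range(standard_cost, low=2.0, high=4.0):
    """Return the (low, high) advanced-profile estimate for a standard cost."""
    return (standard_cost * low, standard_cost * high)


def corpus_estimate(n_text_only, n_with_figures, standard_cost_per_doc):
    """Estimate total cost when only figure-bearing patents run as advanced."""
    base = n_text_only * standard_cost_per_doc
    low, high = advanced_cost_range(n_with_figures * standard_cost_per_doc)
    return (base + low, base + high)


# Hypothetical corpus: 800 text-only patents, 200 with figures,
# priced at 1.0 cost unit per standard extraction.
print(corpus_estimate(800, 200, 1.0))  # (1200.0, 1600.0)
```

With the same hypothetical numbers, running all 1,000 documents as advanced would land between 2,000 and 4,000 units, which is why routing text-only patents to standard is worth the extra step.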
Bibliography fields you usually want
meta = result.document.metadata or {}
print({
    "title": meta.get("title"),
    "inventors": meta.get("authors"),
    "assignee": meta.get("assignee"),
    "filed": meta.get("filed_date"),
    "issued": meta.get("issued_date"),
    "patent_no": meta.get("patent_number"),
    "ipc_classes": meta.get("classification") or [],
})
Bibliographic fields are best-effort and only populated for documents the metadata extractor recognizes. Falling back to parsing the first page heading is always a reasonable Plan B.
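One way the Plan B might look, as a minimal sketch: pull the patent number out of the first-page text with a regular expression when the metadata field is empty. The helper names `fallback_patent_number` and `patent_number` are hypothetical, the pattern only covers US utility patent numbers with an A or B kind code, and how you obtain the first-page text (for example by joining the text of page-1 blocks) is left as an assumption.

```python
import re

# Matches forms such as "US 11,847,293 B2", "US-11847293-B2", "US9123456B1".
PATENT_NO = re.compile(r"\bUS[\s-]?(\d{1,2}(?:,?\d{3}){2})[\s-]?([AB]\d)\b")


def fallback_patent_number(first_page_text):
    """Return a normalized "US-<digits>-<kind>" string, or None if not found."""
    match = PATENT_NO.search(first_page_text)
    if not match:
        return None
    digits = match.group(1).replace(",", "")
    return f"US-{digits}-{match.group(2)}"


def patent_number(meta, first_page_text):
    """Prefer the extracted metadata field; fall back to the heading text."""
    return meta.get("patent_number") or fallback_patent_number(first_page_text)


# Hand-written sample heading, not real extractor output.
sample_heading = "(12) United States Patent\n(10) Patent No.: US 11,847,293 B2"
print(patent_number({}, sample_heading))  # US-11847293-B2
```

Keeping the metadata value first means the regex only runs for documents the extractor did not recognize, so a loose pattern cannot override a good field.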
Next
- Batch with webhooks — to process a corpus of patents without polling.
- RAG ingest — to chunk these extractions for retrieval.
