Cookbook

Extract patents — figures, claims, references

The standard profile handles claims and bibliography well. For figure understanding, switch to profile="advanced".

What advanced profile gives you

Diagram graphs: flowcharts and block diagrams come back as typed nodes and edges, not just images, so an LLM can reason over the structure.
Image alt-text: every figure gets a verified caption. Useful when an LLM needs to know what's in a figure without ingesting the pixels.
Reference numeral linking: text that says “mobile terminal 200” gets resolved to the actual element labeled 200 in the figure.
Embedded text OCR: labels inside figures (“CPU”, “DRAM”) come through as searchable text.
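
To make "typed nodes and edges" concrete, here is a minimal sketch of walking a diagram as a graph. The payload is a hand-written stand-in, and the "id", "source", and "target" keys are assumptions made for illustration; check the actual diagram payload for the real field names before relying on them.

```python
from collections import defaultdict, deque

# Hypothetical diagram payload. The "id", "source", and "target" keys are
# assumptions for this sketch, not a confirmed schema.
diagram = {
    "nodes": [
        {"id": "n1", "label": "Antenna", "reference_numeral": "210"},
        {"id": "n2", "label": "CPU", "reference_numeral": "220"},
        {"id": "n3", "label": "DRAM", "reference_numeral": "230"},
    ],
    "edges": [
        {"source": "n1", "target": "n2"},
        {"source": "n2", "target": "n3"},
    ],
}


def build_adjacency(diagram):
    """Map each source node id to the list of node ids it points at."""
    adjacency = defaultdict(list)
    for edge in diagram.get("edges") or []:
        adjacency[edge["source"]].append(edge["target"])
    return adjacency


def reachable_labels(diagram, start_id):
    """Breadth-first walk: labels of nodes reachable from start_id, in visit order."""
    labels = {n["id"]: n.get("label") for n in diagram.get("nodes") or []}
    adjacency = build_adjacency(diagram)
    seen, queue, order = {start_id}, deque([start_id]), []
    while queue:
        current = queue.popleft()
        for nxt in adjacency.get(current, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(labels.get(nxt))
                queue.append(nxt)
    return order


print(reachable_labels(diagram, "n1"))  # ['CPU', 'DRAM']
```

The same adjacency map can be serialized into a prompt ("Antenna feeds CPU, CPU feeds DRAM") when an LLM needs the structure but not the image.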

Python — extract and walk the structure

python

from ocrqueen import OCRQueen
from collections import defaultdict

client = OCRQueen(api_key="pk_live_xxx")
result = client.extract(
    "US-11847293-B2.pdf",
    profile="advanced",
    max_wait_seconds=600,   # patents can take longer
)

# ── Pull each kind of structure separately ────────────────────────────

claims = [b for b in result.blocks if b.type == "paragraph"
          and (b.raw.get("role") or "").lower() == "claims"]

figures = [b for b in result.blocks if b.type in ("image", "diagram")]

# ── Walk diagrams as graphs ──────────────────────────────────────────
for fig in figures:
    if fig.type == "diagram":
        nodes = fig.raw.get("nodes") or []
        edges = fig.raw.get("edges") or []
        print(f"Page {fig.page} diagram: {len(nodes)} nodes, {len(edges)} edges")
        for node in nodes:
            print(f"  [{node.get('label')}] ref# {node.get('reference_numeral')}")

# ── Cross-link references: where does 'element 200' appear? ──────────
refs = defaultdict(list)
for block in result.blocks:
    for ref in (block.raw.get("reference_numerals") or []):
        refs[ref].append({"page": block.page, "type": block.type})

# refs["200"] → every block mentioning element 200
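
Once the index exists, a typical question is "which pages of the description discuss element 200?". The sketch below rebuilds the same index over stand-in blocks (plain `SimpleNamespace` objects that mimic `page`, `type`, and `raw`), so it runs without an API call. The helper name `mentions_outside_figures` is hypothetical.

```python
from collections import defaultdict
from types import SimpleNamespace

# Stand-in blocks mimicking the attributes used in the recipe (type, page, raw).
blocks = [
    SimpleNamespace(type="paragraph", page=4, raw={"reference_numerals": ["200", "210"]}),
    SimpleNamespace(type="diagram", page=2, raw={"reference_numerals": ["200"]}),
    SimpleNamespace(type="paragraph", page=5, raw={"reference_numerals": ["200"]}),
    SimpleNamespace(type="paragraph", page=6, raw={}),
]


def index_references(blocks):
    """Build numeral -> list of {"page", "type"} hits, as in the recipe above."""
    refs = defaultdict(list)
    for block in blocks:
        for ref in (block.raw.get("reference_numerals") or []):
            refs[ref].append({"page": block.page, "type": block.type})
    return refs


def mentions_outside_figures(refs, numeral):
    """Sorted, de-duplicated pages of non-figure blocks that mention the numeral."""
    return sorted({
        hit["page"]
        for hit in refs.get(numeral, [])
        if hit["type"] not in ("image", "diagram")
    })


refs = index_references(blocks)
print(mentions_outside_figures(refs, "200"))  # [4, 5]
```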

Cost note

Advanced runs roughly 2-4× the cost of standard because it makes additional Gemini Vision calls per figure. For text-only patents, stick with standard — you'll get the same text quality for much less.
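
For budgeting, the multiplier can be turned into a rough range. This is only arithmetic on the 2-4× figure above; the per-document price of 0.10 is a made-up placeholder, not a published rate.

```python
def advanced_cost_range(standard_cost, documents, low=2.0, high=4.0):
    """Estimate (low, high) total cost of running `documents` through advanced.

    standard_cost is the per-document cost on the standard profile; low and
    high are the rough multipliers quoted in the cost note.
    """
    base = standard_cost * documents
    return (base * low, base * high)


# Placeholder price: 0.10 per document on standard is an assumption.
low, high = advanced_cost_range(standard_cost=0.10, documents=500)
print(f"advanced estimate: {low:.2f} to {high:.2f}")
```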

Bibliography fields you usually want

python

meta = result.document.metadata or {}
print({
    "title":       meta.get("title"),
    "inventors":   meta.get("authors"),
    "assignee":    meta.get("assignee"),
    "filed":       meta.get("filed_date"),
    "issued":      meta.get("issued_date"),
    "patent_no":   meta.get("patent_number"),
    "ipc_classes": meta.get("classification") or [],
})

Bibliographic fields are best-effort and only populated for documents the metadata extractor recognizes. Falling back to parsing the first page heading is always a reasonable Plan B.
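
One way to sketch that Plan B for the patent number: prefer the metadata field, and fall back to a regular expression over the first page's text. The helper names are hypothetical, the pattern only covers granted US patents with a kind code (for example "US 11,847,293 B2"), and how you obtain the first-page text is left to you.

```python
import re

# Matches "US 11,847,293 B2", "US11847293B2", and 7-digit variants.
# Application publications (e.g. "US 2020/0123456 A1") are deliberately not covered.
PATENT_NO = re.compile(r"\bUS\s?(\d{1,2}(?:,?\d{3}){2})\s?([AB]\d)\b")


def patent_number_from_text(first_page_text):
    """Return a normalized 'US-<digits>-<kind>' string, or None if nothing matches."""
    match = PATENT_NO.search(first_page_text)
    if not match:
        return None
    digits = match.group(1).replace(",", "")
    return f"US-{digits}-{match.group(2)}"


def resolve_patent_number(meta, first_page_text):
    """Prefer the extracted metadata field; fall back to the regex."""
    return (meta or {}).get("patent_number") or patent_number_from_text(first_page_text)


print(resolve_patent_number({}, "(10) Patent No.: US 11,847,293 B2"))  # US-11847293-B2
```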

Next

Batch with webhooks: process a corpus of patents without polling.
RAG ingest: chunk these extractions for retrieval.
