Cookbook

Extract patents — figures, claims, references

The standard profile handles claims and bibliography well. For figure understanding, switch to profile="advanced".

What the advanced profile gives you

  • Diagram graphs: flowcharts and block diagrams come back as typed nodes and edges, not just images, so an LLM can reason over the structure.
  • Image alt-text: every figure gets a verified caption. Useful when an LLM needs to know what's in a figure without ingesting the pixels.
  • Reference numeral linking: text that says “mobile terminal 200” gets resolved to the actual element labeled 200 in the figure.
  • Embedded text OCR: labels inside figures (“CPU”, “DRAM”) come through as searchable text.

Python — extract and walk the structure

python
from ocrqueen import OCRQueen
from collections import defaultdict

client = OCRQueen(api_key="pk_live_xxx")
result = client.extract(
    "US-11847293-B2.pdf",
    profile="advanced",
    max_wait_seconds=600,   # patents can take longer
)

# ── Pull each kind of structure separately ────────────────────────────

claims = [b for b in result.blocks if b.type == "paragraph"
          and (b.raw.get("role") or "").lower() == "claims"]

figures = [b for b in result.blocks if b.type in ("image", "diagram")]

# ── Walk diagrams as graphs ──────────────────────────────────────────
for fig in figures:
    if fig.type == "diagram":
        nodes = fig.raw.get("nodes") or []
        edges = fig.raw.get("edges") or []
        print(f"Page {fig.page} diagram: {len(nodes)} nodes, {len(edges)} edges")
        for node in nodes:
            print(f"  [{node.get('label')}] ref# {node.get('reference_numeral')}")

# ── Cross-link references: where does 'element 200' appear? ──────────
refs = defaultdict(list)
for block in result.blocks:
    for ref in (block.raw.get("reference_numerals") or []):
        refs[ref].append({"page": block.page, "type": block.type})

# refs["200"] → every block mentioning element 200
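
Once the nodes and edges are in hand, one way to let an LLM reason over a diagram is to flatten the graph into short text lines. The sketch below is illustrative only: the key names "id", "source", and "target" are assumptions (the example above only confirms "label" and "reference_numeral"), and diagram_to_outline is a hypothetical helper, not part of the client library. Check the actual payload before relying on it.

```python
from collections import defaultdict


def diagram_to_outline(nodes, edges):
    """Render a diagram as 'label -> targets' lines suitable for a prompt.

    ASSUMPTION: each node dict has "id" and "label"; each edge dict has
    "source" and "target". These key names are hypothetical.
    """
    # Map node id to a readable label, falling back to the id itself.
    labels = {n.get("id"): n.get("label") or str(n.get("id")) for n in nodes}

    # Group edge targets by their source node.
    outgoing = defaultdict(list)
    for edge in edges:
        outgoing[edge.get("source")].append(edge.get("target"))

    lines = []
    for node_id, label in labels.items():
        targets = [labels.get(t, str(t)) for t in outgoing.get(node_id, [])]
        rendered = ", ".join(targets) if targets else "(no outgoing edges)"
        lines.append(f"{label} -> {rendered}")
    return lines


# Made-up sample data, shaped like the assumed payload.
sample_nodes = [{"id": "n1", "label": "CPU"}, {"id": "n2", "label": "DRAM"}]
sample_edges = [{"source": "n1", "target": "n2"}]
outline = diagram_to_outline(sample_nodes, sample_edges)
```

With real data you would pass fig.raw.get("nodes") and fig.raw.get("edges") from the loop above and join the returned lines into the prompt.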

Cost note

Advanced runs roughly 2-4× the cost of standard because it makes additional Gemini Vision calls per figure. For text-only patents, stick with standard — you'll get the same text quality for much less.
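
To make the trade-off concrete, here is a small sketch of the arithmetic. Both helpers are hypothetical and not part of the client library; the 2x and 4x bounds are simply the rough multiplier quoted above, and standard_cost is in whatever unit you budget in.

```python
def advanced_cost_range(standard_cost):
    """Return a (low, high) estimate for the advanced profile.

    ASSUMPTION: applies the rough 2-4x multiplier from the cost note;
    actual cost depends on how many figures the document contains.
    """
    return (2 * standard_cost, 4 * standard_cost)


def choose_profile(needs_figures):
    """Pick the cheaper profile unless figure understanding is required."""
    return "advanced" if needs_figures else "standard"


low, high = advanced_cost_range(10)       # budget units are up to you
profile = choose_profile(needs_figures=False)
```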

Bibliography fields you usually want

python
meta = result.document.metadata or {}
print({
    "title":       meta.get("title"),
    "inventors":   meta.get("authors"),
    "assignee":    meta.get("assignee"),
    "filed":       meta.get("filed_date"),
    "issued":      meta.get("issued_date"),
    "patent_no":   meta.get("patent_number"),
    "ipc_classes": meta.get("classification") or [],
})

Bibliographic fields are best-effort and only populated for documents the metadata extractor recognizes. If a field comes back empty, parsing the first-page heading yourself is a reasonable fallback.
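
As a rough illustration of that fallback, the sketch below pulls a US patent number out of first-page text with a regular expression. It is a minimal example under stated assumptions: patent_number_from_text is a hypothetical helper, the pattern only covers US-style numbers with an A or B kind code, and how you obtain the first-page text from the result is left to you.

```python
import re

# Matches e.g. "US 11,847,293 B2" or "US11847293B2".
# ASSUMPTION: US-style numbering only; other offices need their own pattern.
_US_PATENT_RE = re.compile(
    r"\bUS\s?(\d{1,2}(?:,\d{3}){2}|\d{7,8})\s?([AB]\d)\b"
)


def patent_number_from_text(first_page_text):
    """Return a normalized number like 'US-11847293-B2', or None if not found."""
    match = _US_PATENT_RE.search(first_page_text)
    if match is None:
        return None
    digits = match.group(1).replace(",", "")  # drop thousands separators
    return f"US-{digits}-{match.group(2)}"


# Made-up heading text for illustration.
sample_heading = "(12) United States Patent\n(10) Patent No.: US 11,847,293 B2"
fallback_number = patent_number_from_text(sample_heading)
```

A typical use is meta.get("patent_number") or patent_number_from_text(first_page_text), so the regex only runs when the extractor leaves the field empty.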
