Google Document AI OCR
Extract text, entities, and structured data from PDFs, receipts, invoices, and images using Google Document AI. No credentials or project IDs needed -- the tool uses a backend service account automatically.
Overview
This tool processes documents through Google Document AI with specialized processors for different document types. Provide a file via URL, cloud file ID, or base64-encoded content, and receive extracted text, structured entities (dates, amounts, names, line items), and per-page statistics. Multiple images can be batched into a single multi-page document for processing.
Actions
process_document
Extract text and structured data from a document.
Required parameters (exactly one of):
- file_urls (array of strings) -- URL(s) to process. One URL for a single file, or up to 10 image URLs to batch into a multi-page document.
- file_ids (array of strings) -- Cloud file ID(s) to process. One ID for a single file, or up to 10 image IDs to batch into a multi-page document.
- content_base64 (string) -- Base64-encoded file content to process (single file only).
Optional parameters:
- document_type (string, default: "general") -- Selects the specialized processor. Options: general, bank_statement, expense, invoice, drivers_license, passport, utility, w2, w9.
- mime_type (string) -- MIME type of the input (e.g., application/pdf, image/png). Auto-detected from URL headers if omitted; defaults to application/pdf when unresolvable.
- max_text_chars (integer, default: 12000, min: 200, max: 250000) -- Max characters of extracted text to return.
- max_entities (integer, default: 200, min: 1, max: 2000) -- Max extracted entities to return.
- include_pages (boolean, default: true) -- Include per-page summary data (page dimensions, token/line/paragraph/block/table/form field counts).
- include_entities (boolean, default: true) -- Include extracted entities (type, mention text, confidence, normalized value).
- include_raw_document (boolean, default: false) -- Include the full raw Document AI response object.
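The parameter rules above can be checked on the caller's side before a request is sent. The following is a minimal sketch, not the tool's own validation: the helper name build_request is hypothetical, and the limits are copied from the parameter list.

```python
# Allowed values and ranges, copied from the parameter list above.
DOCUMENT_TYPES = {
    "general", "bank_statement", "expense", "invoice",
    "drivers_license", "passport", "utility", "w2", "w9",
}
RANGES = {"max_text_chars": (200, 250_000), "max_entities": (1, 2000)}
MAX_BATCH = 10  # at most 10 URLs or file IDs per request


def build_request(file_urls=None, file_ids=None, content_base64=None,
                  document_type="general", **options):
    """Build a process_document payload (hypothetical helper).

    Raises ValueError when the documented constraints are violated.
    """
    sources = {
        "file_urls": file_urls,
        "file_ids": file_ids,
        "content_base64": content_base64,
    }
    # Empty or missing sources count as "not provided".
    provided = {name: value for name, value in sources.items() if value}
    if len(provided) != 1:
        raise ValueError(
            "provide exactly one of file_urls, file_ids, content_base64")
    if document_type not in DOCUMENT_TYPES:
        raise ValueError(f"unknown document_type: {document_type!r}")
    for name in ("file_urls", "file_ids"):
        if len(provided.get(name, [])) > MAX_BATCH:
            raise ValueError(f"{name} accepts at most {MAX_BATCH} items")
    for name, (low, high) in RANGES.items():
        if name in options and not low <= options[name] <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
    return {
        "action": "process_document",
        "document_type": document_type,
        **provided,
        **options,
    }
```

Calling build_request(file_urls=["https://example.com/receipt.jpg"], document_type="expense") returns a dict that can be serialized with json.dumps and sent as the request body.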
Document Types
| document_type | Best for | Extracts |
|---|---|---|
| general (default) | Any document or image | Raw OCR text only |
| bank_statement | Bank statements | Transactions, balances, dates, account info |
| expense | Receipts, expense reports | Line items, totals, tax, vendor, date |
| invoice | Invoices | Line items, amounts, due dates, vendor, PO numbers |
| drivers_license | US driver's licenses | Name, DOB, address, license number, expiry |
| passport | US passports | Name, DOB, nationality, passport number, expiry |
| utility | Utility bills | Account number, billing period, charges, usage |
| w2 | W-2 tax forms | Employer info, wages, tax withheld, SSN |
| w9 | W-9 tax forms | Name, business name, TIN, address, tax classification |
Example: Basic OCR from URL
{
"action": "process_document",
"file_urls": ["https://example.com/document.pdf"]
}
Example: Receipt with structured extraction
{
"action": "process_document",
"document_type": "expense",
"file_urls": ["https://example.com/receipt.jpg"]
}
Example: Invoice from base64
{
"action": "process_document",
"document_type": "invoice",
"content_base64": "JVBERi0xLjQK...",
"mime_type": "application/pdf"
}
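For a local file, the content_base64 payload could be assembled along these lines. This is a minimal sketch: the helper name base64_request is hypothetical, the 20 MB limit is taken from the Notes, and applying that limit to the raw (pre-encoding) size is an assumption.

```python
import base64
import mimetypes
from pathlib import Path

# 20 MB input limit from the Notes (assumed here to apply to the raw bytes).
MAX_FILE_BYTES = 20 * 1024 * 1024


def base64_request(path, document_type="general"):
    """Build a process_document payload from a local file (hypothetical helper)."""
    data = Path(path).read_bytes()
    if len(data) > MAX_FILE_BYTES:
        raise ValueError(f"{path} is {len(data)} bytes, above the 20 MB limit")
    payload = {
        "action": "process_document",
        "document_type": document_type,
        "content_base64": base64.b64encode(data).decode("ascii"),
    }
    # Send mime_type only when it can be guessed from the file extension;
    # otherwise omit it and rely on the tool's own fallback.
    mime_type, _ = mimetypes.guess_type(str(path))
    if mime_type:
        payload["mime_type"] = mime_type
    return payload
```

Setting mime_type explicitly is useful for base64 input, since there are no URL headers to infer it from.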
Example: Batch multiple images into one document
{
"action": "process_document",
"file_urls": [
"https://example.com/page1.jpg",
"https://example.com/page2.jpg",
"https://example.com/page3.jpg"
]
}
Example: Process from cloud file ID with custom output limits
{
"action": "process_document",
"document_type": "w2",
"file_ids": ["abc123"],
"max_text_chars": 50000,
"include_pages": false
}
Example: Get full raw response
{
"action": "process_document",
"file_ids": ["abc123"],
"include_raw_document": true
}
Workflows
Extract text from a scanned document
- Call process_document with file_urls pointing to the scanned PDF or image.
- Read result.text_excerpt for the extracted text content.
Parse a receipt for expense reporting
- Call process_document with document_type: "expense" and the receipt file.
- Read result.entities for structured line items, totals, tax, vendor, and date.
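Once a response comes back, the entities in result.entities can be grouped by type for expense reporting. The sketch below assumes each entity is a dict with "type", "mention_text", and "confidence" keys. The documentation lists these fields but not their exact key names, so treat the keys and the sample data as illustrative.

```python
from collections import defaultdict


def entities_by_type(response, min_confidence=0.0):
    """Group entity mention texts by entity type.

    Assumes entities are dicts with "type", "mention_text", and
    "confidence" keys; adjust to the actual response shape.
    """
    grouped = defaultdict(list)
    for entity in response.get("result", {}).get("entities", []):
        if entity.get("confidence", 0.0) >= min_confidence:
            key = entity.get("type", "unknown")
            grouped[key].append(entity.get("mention_text", ""))
    return dict(grouped)


# Made-up response used only to exercise the helper.
sample = {
    "result": {
        "entities": [
            {"type": "vendor", "mention_text": "Corner Cafe", "confidence": 0.97},
            {"type": "total", "mention_text": "18.40", "confidence": 0.93},
            {"type": "line_item", "mention_text": "Latte 4.50", "confidence": 0.88},
            {"type": "line_item", "mention_text": "Smudge", "confidence": 0.21},
        ]
    }
}

print(entities_by_type(sample, min_confidence=0.5))
```

Filtering on confidence drops low-quality matches such as the last sample entry before they reach an expense report.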
Process a multi-page document from images
- Provide up to 10 image URLs in file_urls.
- The images are fetched in parallel, combined into a single multi-page PDF, and processed as one document.
- Keep include_pages: true (the default) to get per-page statistics.
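When a scan has more than 10 page images, one way to stay within the batch limit is to split the URLs into several requests. This is a minimal sketch with a hypothetical helper name. Each request is processed as a separate document, so results have to be stitched together by the caller, and the 20 MB combined-size limit is not checked here.

```python
MAX_BATCH = 10  # documented limit on images per batch request


def batch_requests(image_urls, document_type="general", batch_size=MAX_BATCH):
    """Split image URLs into process_document payloads of at most 10 URLs each."""
    if not 1 <= batch_size <= MAX_BATCH:
        raise ValueError(f"batch_size must be between 1 and {MAX_BATCH}")
    return [
        {
            "action": "process_document",
            "document_type": document_type,
            # Slicing preserves the original page order within each batch.
            "file_urls": image_urls[start:start + batch_size],
            "include_pages": True,
        }
        for start in range(0, len(image_urls), batch_size)
    ]


urls = [f"https://example.com/page{n}.jpg" for n in range(1, 24)]
for request in batch_requests(urls):
    print(len(request["file_urls"]), "pages starting at", request["file_urls"][0])
```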
Extract data from tax forms
- Use document_type: "w2" or "w9" with the tax form file.
- Read result.entities: W-2 forms yield employer info, wages, tax withheld, and SSN; W-9 forms yield name, business name, TIN, address, and tax classification.
Notes
- Supported file types: PDF, PNG, JPEG, TIFF, GIF, BMP, WebP.
- Maximum input file size: 20 MB (including combined PDF in batch mode).
- Maximum pages: 10 pages per PDF, or 10 images in batch mode.
- Input source: Exactly one of file_urls, file_ids, or content_base64 must be provided. Providing multiple sources returns an error.
- Batch mode: When 2+ URLs or file IDs are provided, all images are downloaded in parallel, combined into a single multi-page PDF (one image per page), and sent to Document AI as one request.
- MIME type auto-detection: When mime_type is omitted, it is inferred from URL response headers or file metadata. Falls back to application/pdf if unresolvable.
- Text truncation: Extracted text is truncated to max_text_chars characters. Increase this value for long documents.
- Entity truncation: Entities are truncated to max_entities. Increase for documents with many structured fields.
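Because text is cut off at max_text_chars, a caller may want to detect a likely truncation and retry with a larger limit. The sketch below assumes that an excerpt whose length reaches the limit was probably truncated. The documentation does not describe an explicit truncation flag, and the helper name is hypothetical.

```python
MAX_TEXT_CHARS = 250_000  # documented upper bound for max_text_chars


def next_text_limit(text, current_limit):
    """Return a larger max_text_chars to retry with, or None if no retry applies.

    Heuristic: text shorter than the limit is treated as complete.
    """
    if len(text) < current_limit:
        return None  # the full text fit within the limit
    if current_limit >= MAX_TEXT_CHARS:
        return None  # already at the documented maximum
    # Double the limit, capped at the documented maximum.
    return min(current_limit * 2, MAX_TEXT_CHARS)
```

The same doubling approach could be applied to max_entities, whose documented maximum is 2000.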






