Document OCR Agent

Model

Available Actions

Each successful request consumes credits as outlined below.

process_document: 20 credits

Details

Hire the OCR AI model to extract text, structured entities, and page-level data from scanned documents, receipts, invoices, PDFs, and images. Supports OCR text extraction from photos of receipts, handwritten notes, printed forms, business cards, shipping labels, contracts, and other document types. Identifies structured fields such as dates, amounts, addresses, line items, tax totals, and vendor names. Accepts input via base64 content, public URL, or AgentPMT file storage ID. Ideal for expense tracking, invoice processing, receipt scanning, document digitization, data entry automation, bookkeeping ingestion, form parsing, and archival workflows.

Use Cases

Receipt OCR and text extraction, Invoice parsing and field extraction, PDF document text extraction, Scanned image OCR, Handwritten note digitization, Business card scanning, Expense report data capture, Automated bookkeeping ingestion, Contract and legal document text extraction, Shipping label and barcode text reading, Tax form field extraction, Medical record digitization, Insurance claim document processing, Bank statement parsing, Purchase order data extraction, Form field recognition, ID and passport text extraction, Utility bill parsing, Restaurant receipt itemization, Real estate document processing

Connect Your Agent In 5 Min

Watch the setup guide for your platform

Or Install Locally

STDIO connector for Claude Code, Codex, Cursor, Zed, and other LLM clients that require STDIO or custom connections. This lightweight connector routes requests to https://api.agentpmt.com/mcp. All tool execution happens in the cloud, and the server cannot edit any files on your computer.

npm install -g @agentpmt/mcp-router
agentpmt-setup

Actions (1)

process_document: 20 credits, 10 parameters

Extract text, entities, and structured data from a document using Google Document AI. Provide exactly one input source: file_urls, file_ids, or content_base64.

document_type (string)

Document type. Use 'general' for plain OCR, or a specialized type to extract structured fields (dates, amounts, line items, etc).

Values:

general, bank_statement, expense, invoice, drivers_license, passport, utility, w2, w9

Default: general

file_urls (array)

URL(s) to process. One URL for a single file, or up to 10 image URLs to batch into a multi-page document.

Array of: string

file_ids (array)

Cloud file ID(s) to process. One ID for a single file, or up to 10 image IDs to batch into a multi-page document.

Array of: string

content_base64 (string)

Base64-encoded file content to process.

mime_type (string)

MIME type of the input (e.g. application/pdf, image/png). Auto-detected if omitted.

max_text_chars (integer)

Max characters of extracted text to return.

Default: 12000

Range: 200 - 250000

max_entities (integer)

Max extracted entities to return.

Default: 200

Range: 1 - 2000

include_pages (boolean)

Include per-page summary data.

Default: true

include_entities (boolean)

Include extracted entities.

Default: true

include_raw_document (boolean)

Include full raw Document AI response object.

Default: false

curl -X POST "https://api.agentpmt.com/products/purchase" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ********" \
  -d '{
    "product_id": "69858a64269243768b447d6d",
    "parameters": {
      "action": "process_document",
      "file_urls": ["https://example.com/document.pdf"],
      "document_type": "general",
      "max_text_chars": 12000,
      "max_entities": 200,
      "include_pages": true,
      "include_entities": true
    }
  }'

import requests

url = "https://api.agentpmt.com/products/purchase"

headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer ********"
}

data = {
    "product_id": "69858a64269243768b447d6d",
    "parameters": {
        "action": "process_document",
        # Exactly one input source is required; this URL is a placeholder.
        "file_urls": ["https://example.com/document.pdf"],
        "document_type": "general",
        "max_text_chars": 12000,
        "max_entities": 200,
        # Python booleans are capitalized (True/False, not true/false).
        "include_pages": True,
        "include_entities": True
    }
}

# json=data serializes the dict and sends it as the request body.
response = requests.post(url, headers=headers, json=data)
print(response.status_code)
print(response.json())

const url = "https://api.agentpmt.com/products/purchase";

const headers = {
  "Content-Type": "application/json",
  "Authorization": "Bearer ********"
};

const data = {
  product_id: "69858a64269243768b447d6d",
  parameters: {
    "action": "process_document",
    "file_urls": ["https://example.com/document.pdf"],
    "document_type": "general",
    "max_text_chars": 12000,
    "max_entities": 200,
    "include_pages": true,
    "include_entities": true
  }
};

fetch(url, {
  method: "POST",
  headers,
  body: JSON.stringify(data)
})
  .then(response => response.json())
  .then(data => console.log(data))
  .catch(error => console.error("Error:", error));

const axios = require('axios');

const url = "https://api.agentpmt.com/products/purchase";

const headers = {
  "Content-Type": "application/json",
  "Authorization": "Bearer ********"
};

const data = {
  product_id: "69858a64269243768b447d6d",
  parameters: {
    "action": "process_document",
    "file_urls": ["https://example.com/document.pdf"],
    "document_type": "general",
    "max_text_chars": 12000,
    "max_entities": 200,
    "include_pages": true,
    "include_entities": true
  }
};

axios.post(url, data, { headers })
  .then(response => {
    console.log(response.status);
    console.log(response.data);
  })
  .catch(error => {
    console.error("Error:", error.message);
  });

Log in to view your API and budget keys. The examples above use placeholder values; sign in to see personalized code with your bearer token.

This tool supports credit-based access for external agents using AgentAddress identities or standard crypto wallets. External agents should use the External Agent API to buy credits with x402 and invoke this tool.

1. Buy Credits

Purchase credits via x402 payment (500 credit minimum, 100 credits = $1).
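
As a rough worked example of the pricing stated above (100 credits = $1, a 500-credit minimum purchase, and 20 credits per process_document request), the following Python sketch converts credits to dollars and counts how many requests a purchase covers. The function and constant names are illustrative only and are not part of the AgentPMT API.

```python
CREDITS_PER_DOLLAR = 100      # stated rate: 100 credits = $1
MIN_PURCHASE_CREDITS = 500    # stated x402 minimum purchase
PROCESS_DOCUMENT_COST = 20    # credits per successful process_document request


def credits_to_usd(credits: int) -> float:
    """Convert a credit amount to US dollars at the stated rate."""
    return credits / CREDITS_PER_DOLLAR


def calls_covered(credits: int, cost_per_call: int = PROCESS_DOCUMENT_COST) -> int:
    """Number of whole requests a credit balance can pay for."""
    return credits // cost_per_call


if __name__ == "__main__":
    print(credits_to_usd(MIN_PURCHASE_CREDITS))  # 5.0
    print(calls_covered(MIN_PURCHASE_CREDITS))   # 25
```

Under these figures, the minimum purchase costs $5 and covers 25 successful process_document requests.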

# Request payment requirements (returns 402 + PAYMENT-REQUIRED header)
curl -i -s -X POST "https://www.agentpmt.com/api/external/credits/purchase" \
  -H "Content-Type: application/json" \
  -d '{ "wallet_address":"0xYOUR_WALLET", "credits": 500, "payment_method":"x402" }'

# Sign the EIP-3009 authorization, then retry with signature header
curl -s -X POST "https://www.agentpmt.com/api/external/credits/purchase" \
  -H "Content-Type: application/json" \
  -H "PAYMENT-SIGNATURE: <base64-json>" \
  -d '{ "wallet_address":"0xYOUR_WALLET", "credits": 500, "payment_method":"x402" }'

2. Create a Session Nonce (used in signed balance and invoke requests)

curl -s -X POST "https://www.agentpmt.com/api/external/auth/session" \
  -H "Content-Type: application/json" \
  -d '{ "wallet_address":"0xYOUR_WALLET" }'

3. Invoke This Tool

Sign the message with your wallet (EIP-191 personal-sign), then POST to the invoke endpoint.

# Sign this message (wallet MUST be lowercased):
# agentpmt-external
# wallet:0xyourwallet...
# session:<session_nonce>
# request:<request_id>
# action:invoke
# product:69858a64269243768b447d6d
# payload:<sha256(canonical_json(parameters))>

curl -s -X POST "https://www.agentpmt.com/api/external/tools/69858a64269243768b447d6d/invoke" \
  -H "Content-Type: application/json" \
  -d '{
    "wallet_address": "0xYOUR_WALLET",
    "session_nonce": "<session_nonce>",
    "request_id": "invoke-uuid",
    "signature": "0x<signature>",
    "parameters": {
      "action": "your_action",
      "your_param": "value"
    }
  }'
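
The message layout above could be assembled along the following lines. This is a minimal sketch under several assumptions that the page does not confirm: canonical_json is taken to mean JSON with sorted keys and no extra whitespace, the SHA-256 digest is hex-encoded, and the message lines are joined with a single newline. The helper names are hypothetical. The EIP-191 signing step itself is left to a wallet library and is not shown.

```python
import hashlib
import json


def canonical_json(parameters: dict) -> str:
    # ASSUMPTION: "canonical" means sorted keys and compact separators.
    return json.dumps(parameters, sort_keys=True, separators=(",", ":"))


def build_invoke_message(wallet: str, session_nonce: str, request_id: str,
                         product_id: str, parameters: dict) -> str:
    """Build the text to sign, following the field order listed above."""
    # ASSUMPTION: the payload hash is the hex digest of the canonical JSON.
    payload_hash = hashlib.sha256(
        canonical_json(parameters).encode("utf-8")
    ).hexdigest()
    lines = [
        "agentpmt-external",
        f"wallet:{wallet.lower()}",   # the wallet must be lowercased
        f"session:{session_nonce}",
        f"request:{request_id}",
        "action:invoke",
        f"product:{product_id}",
        f"payload:{payload_hash}",
    ]
    # ASSUMPTION: lines are separated by "\n" with no trailing newline.
    return "\n".join(lines)
```

If the server rejects a signature, the canonicalization and line-joining assumptions are the first things to check against the External Agent API documentation.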

Usage Instructions

Usage guidance provided directly by the developer for this product.

Google Document AI OCR

Extract text, entities, and structured data from PDFs, receipts, invoices, and images using Google Document AI. No credentials or project IDs needed -- the tool uses a backend service account automatically.

Overview

This tool processes documents through Google Document AI with specialized processors for different document types. Provide a file via URL, cloud file ID, or base64-encoded content, and receive extracted text, structured entities (dates, amounts, names, line items), and per-page statistics. Multiple images can be batched into a single multi-page document for processing.

Actions

process_document

Extract text and structured data from a document.

Required parameters (exactly one of):

file_urls (array of strings) -- URL(s) to process. One URL for a single file, or up to 10 image URLs to batch into a multi-page document.
file_ids (array of strings) -- Cloud file ID(s) to process. One ID for a single file, or up to 10 image IDs to batch into a multi-page document.
content_base64 (string) -- Base64-encoded file content to process (single file only).

Optional parameters:

document_type (string, default: "general") -- Selects the specialized processor. Options: general, bank_statement, expense, invoice, drivers_license, passport, utility, w2, w9.
mime_type (string) -- MIME type of the input (e.g., application/pdf, image/png). Auto-detected from URL headers if omitted; defaults to application/pdf when unresolvable.
max_text_chars (integer, default: 12000, min: 200, max: 250000) -- Max characters of extracted text to return.
max_entities (integer, default: 200, min: 1, max: 2000) -- Max extracted entities to return.
include_pages (boolean, default: true) -- Include per-page summary data (page dimensions, token/line/paragraph/block/table/form field counts).
include_entities (boolean, default: true) -- Include extracted entities (type, mention text, confidence, normalized value).
include_raw_document (boolean, default: false) -- Include the full raw Document AI response object.
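
The constraints listed above can be checked on the client before a request is sent, which avoids requests that would be rejected. The following Python sketch mirrors the documented rules (exactly one input source, at most 10 items per array, the documented ranges and document types). The function name validate_parameters is hypothetical; the service's own validation is authoritative.

```python
INPUT_SOURCES = ("file_urls", "file_ids", "content_base64")
DOCUMENT_TYPES = {
    "general", "bank_statement", "expense", "invoice",
    "drivers_license", "passport", "utility", "w2", "w9",
}


def validate_parameters(params: dict) -> list:
    """Return a list of problems; an empty list means all checks passed."""
    problems = []

    # Exactly one of the three input sources must be present and non-empty.
    provided = [name for name in INPUT_SOURCES if params.get(name)]
    if len(provided) != 1:
        problems.append(f"expected exactly one input source, got {len(provided)}")

    # Array inputs accept one item, or up to 10 images in batch mode.
    for name in ("file_urls", "file_ids"):
        value = params.get(name)
        if value and len(value) > 10:
            problems.append(f"{name} accepts at most 10 items")

    doc_type = params.get("document_type", "general")
    if doc_type not in DOCUMENT_TYPES:
        problems.append(f"unknown document_type: {doc_type}")

    if not 200 <= params.get("max_text_chars", 12000) <= 250000:
        problems.append("max_text_chars must be between 200 and 250000")

    if not 1 <= params.get("max_entities", 200) <= 2000:
        problems.append("max_entities must be between 1 and 2000")

    return problems
```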

Document Types

`document_type`	Best for	Extracts
`general` (default)	Any document or image	Raw OCR text only
`bank_statement`	Bank statements	Transactions, balances, dates, account info
`expense`	Receipts, expense reports	Line items, totals, tax, vendor, date
`invoice`	Invoices	Line items, amounts, due dates, vendor, PO numbers
`drivers_license`	US driver's licenses	Name, DOB, address, license number, expiry
`passport`	US passports	Name, DOB, nationality, passport number, expiry
`utility`	Utility bills	Account number, billing period, charges, usage
`w2`	W-2 tax forms	Employer info, wages, tax withheld, SSN
`w9`	W-9 tax forms	Name, business name, TIN, address, tax classification

Example: Basic OCR from URL

{
  "action": "process_document",
  "file_urls": ["https://example.com/document.pdf"]
}

Example: Receipt with structured extraction

{
  "action": "process_document",
  "document_type": "expense",
  "file_urls": ["https://example.com/receipt.jpg"]
}

Example: Invoice from base64

{
  "action": "process_document",
  "document_type": "invoice",
  "content_base64": "JVBERi0xLjQK...",
  "mime_type": "application/pdf"
}
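
For the base64 input shown above, the content_base64 value can be produced with Python's standard base64 module. This is a small sketch; the helper name build_base64_parameters is hypothetical, and in practice the bytes would come from reading a file in binary mode.

```python
import base64


def build_base64_parameters(file_bytes: bytes, mime_type: str,
                            document_type: str = "general") -> dict:
    """Build process_document parameters for a single in-memory file."""
    return {
        "action": "process_document",
        "document_type": document_type,
        # b64encode returns bytes; decode to get a JSON-safe string.
        "content_base64": base64.b64encode(file_bytes).decode("ascii"),
        "mime_type": mime_type,
    }


if __name__ == "__main__":
    # The first bytes of a PDF file encode to the prefix used in the example.
    params = build_base64_parameters(b"%PDF-1.4\n", "application/pdf", "invoice")
    print(params["content_base64"])  # JVBERi0xLjQK
```

Passing mime_type explicitly is a reasonable habit with base64 input, since there are no URL headers from which the type could be inferred.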

Example: Batch multiple images into one document

{
  "action": "process_document",
  "file_urls": [
    "https://example.com/page1.jpg",
    "https://example.com/page2.jpg",
    "https://example.com/page3.jpg"
  ]
}

Example: Process from cloud file ID with custom output settings

{
  "action": "process_document",
  "document_type": "w2",
  "file_ids": ["abc123"],
  "max_text_chars": 50000,
  "include_pages": false
}

Example: Get full raw response

{
  "action": "process_document",
  "file_ids": ["abc123"],
  "include_raw_document": true
}

Workflows

Extract text from a scanned document

Call process_document with file_urls pointing to the scanned PDF or image.
Read result.text_excerpt for the extracted text content.

Parse a receipt for expense reporting

Call process_document with document_type: "expense" and the receipt file.
Read result.entities for structured line items, totals, tax, vendor, and date.
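
Once result.entities is available, a client might group the extracted values by type and drop low-confidence matches. The sketch below is illustrative only: the page says each entity carries a type, mention text, confidence, and normalized value, but the exact key names are not documented here, so type, mention_text, and confidence are assumptions, as is the 0.8 threshold.

```python
def confident_entities(result: dict, min_confidence: float = 0.8) -> dict:
    """Group mention texts by entity type, skipping low-confidence entities.

    ASSUMPTION: each entity is a dict with "type", "mention_text", and
    "confidence" keys; adjust the names to match the real response.
    """
    grouped = {}
    for entity in result.get("entities", []):
        if entity.get("confidence", 0.0) < min_confidence:
            continue
        entity_type = entity.get("type", "unknown")
        grouped.setdefault(entity_type, []).append(entity.get("mention_text", ""))
    return grouped
```

Grouping by type keeps repeated fields such as line items together, which tends to be convenient when filling an expense report.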

Process a multi-page document from images

Provide up to 10 image URLs in file_urls.
The images are fetched in parallel, combined into a single multi-page PDF, and processed as one document.
Use include_pages: true to get per-page statistics.
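
When a scan has more than 10 page images, one option is to split the URLs into several requests of at most 10 each. The Python sketch below shows that chunking; the function name batch_parameters is hypothetical, and each resulting request would be processed (and billed) as a separate document.

```python
MAX_IMAGES_PER_REQUEST = 10  # documented batch limit


def batch_parameters(image_urls: list, document_type: str = "general") -> list:
    """Split image URLs into process_document parameter sets of <= 10 URLs."""
    batches = []
    for start in range(0, len(image_urls), MAX_IMAGES_PER_REQUEST):
        batches.append({
            "action": "process_document",
            "document_type": document_type,
            "file_urls": image_urls[start:start + MAX_IMAGES_PER_REQUEST],
            "include_pages": True,  # request per-page statistics
        })
    return batches
```

Page numbers in each response would then be relative to that batch, so a caller stitching results together needs to track the offset of each chunk.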

Extract data from tax forms

Use document_type: "w2" or "w9" with the tax form file.
Entities will include employer info, wages, tax withheld, TIN, etc.

Notes

Supported file types: PDF, PNG, JPEG, TIFF, GIF, BMP, WebP.
Maximum input file size: 20 MB (including combined PDF in batch mode).
Maximum pages: 10 pages per PDF, or 10 images in batch mode.
Input source: Exactly one of file_urls, file_ids, or content_base64 must be provided. Providing multiple sources returns an error.
Batch mode: When 2+ URLs or file IDs are provided, all images are downloaded in parallel, combined into a single multi-page PDF (one image per page), and sent to Document AI as one request.
MIME type auto-detection: When mime_type is omitted, it is inferred from URL response headers or file metadata. Falls back to application/pdf if unresolvable.
Text truncation: Extracted text is truncated to max_text_chars characters. Increase this value for long documents.
Entity truncation: Entities are truncated to max_entities. Increase for documents with many structured fields.
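
The file-type and size limits above can be checked locally before uploading. The following Python sketch maps a filename extension to a MIME type and rejects oversized or unsupported files. The names preflight and MIME_BY_EXTENSION are hypothetical, and "20 MB" is interpreted here as 20 * 1024 * 1024 bytes, which is an assumption.

```python
from pathlib import Path

MAX_FILE_BYTES = 20 * 1024 * 1024  # ASSUMPTION: "20 MB" read as 20 MiB

# Extensions for the supported types: PDF, PNG, JPEG, TIFF, GIF, BMP, WebP.
MIME_BY_EXTENSION = {
    ".pdf": "application/pdf",
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".tif": "image/tiff",
    ".tiff": "image/tiff",
    ".gif": "image/gif",
    ".bmp": "image/bmp",
    ".webp": "image/webp",
}


def preflight(filename: str, size_bytes: int) -> str:
    """Return the MIME type to send, or raise ValueError if the file won't fit."""
    mime = MIME_BY_EXTENSION.get(Path(filename).suffix.lower())
    if mime is None:
        raise ValueError(f"unsupported file type: {filename}")
    if size_bytes > MAX_FILE_BYTES:
        raise ValueError(f"file exceeds 20 MB: {size_bytes} bytes")
    return mime
```

Sending the returned value as mime_type avoids relying on auto-detection and its application/pdf fallback.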

Dependencies

3 dependencies will be automatically added when you enable this product.
