
Overview

This tutorial walks you through parsing a document with the API and then extracting specific fields from the parsed output. In this tutorial, we will:
  • Parse this PDF: Wire Transfer Form
  • Extract these fields: Bank Name and Total Invoice Amount
These examples require the Python or TypeScript client library. Before running a script, set your API key (the Python client reads it from the VISION_AGENT_API_KEY environment variable) and install the client library and any required dependencies.
The scripts have been tested with PDF and PNG files and may work with other supported file types.
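The tutorial script's comment notes that the Python client reads its key from VISION_AGENT_API_KEY. If you would rather have the script fail early with a readable message than with a client error, a minimal pre-flight check might look like the sketch below. `require_api_key` is a hypothetical helper, not part of the client library.

```python
import os


def require_api_key(var_name: str = "VISION_AGENT_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error if it is missing."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set. Export your API key before running the script, "
            f"for example: export {var_name}=<your-key>"
        )
    return key
```

Calling `require_api_key()` at the top of the script, before `LandingAIADE()` is created, surfaces a missing key immediately.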

1. Download the Document to Process

Download the Wire Transfer Form and save it to a local directory as wire-transfer.pdf, the filename the script expects.
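A browser or download tool can occasionally save an HTML error page under a .pdf name. As an optional sanity check, you could look at the file's leading "magic" bytes: PDF files normally begin with `%PDF-`, and PNG files begin with a fixed 8-byte signature. The sketch below covers only the two types the scripts were tested with; `detect_file_type` is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional

# Leading "magic" bytes that identify PDF and PNG files.
SIGNATURES = {
    b"%PDF-": "pdf",
    b"\x89PNG\r\n\x1a\n": "png",
}


def detect_file_type(path: Path) -> Optional[str]:
    """Return 'pdf' or 'png' based on the file's first bytes, or None if unrecognized."""
    with path.open("rb") as f:
        head = f.read(8)  # the longest signature above is 8 bytes
    for magic, kind in SIGNATURES.items():
        if head.startswith(magic):
            return kind
    return None
```

For example, `detect_file_type(Path("wire-transfer.pdf"))` should report `"pdf"` for a correctly downloaded form; `None` suggests the download needs another look.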

2. Create the Script

Copy the script for your language and save it as parse-extract.py or parse-extract.ts in the same directory as the PDF.
```python
import json
from pathlib import Path
from landingai_ade import LandingAIADE

# Initialize client (uses VISION_AGENT_API_KEY environment variable)
client = LandingAIADE()

# Define the extraction schema and serialize it to a JSON string
schema = json.dumps({
    "type": "object",
    "properties": {
        "bank_name": {
            "description": "The official name of the bank where the account is held.",
            "x-alternativeNames": ["Name of Bank", "Financial Institution", "Bank"],
            "type": "string"
        },
        "total_invoice_amount": {
            "description": "The total monetary amount of the invoice, including all charges and taxes.",
            "x-alternativeNames": ["Grand Total", "Amount Due", "Invoice Total"],
            "type": "number"
        }
    }
})

# Parse the document
# save_to is optional; it writes the full parse response to the output folder,
# which is useful for keeping a record and for downstream processing
parse_response = client.parse(
    document=Path('wire-transfer.pdf'),
    model='dpt-2-latest',
    save_to='output'
)

# Extract fields from the Markdown returned by the parse step
extract_response = client.extract(
    schema=schema,
    markdown=parse_response.markdown,
    model='extract-latest'
)

# Save the extract results to a JSON file (create the folder if it doesn't exist)
output_dir = Path('output')
output_dir.mkdir(exist_ok=True)
with open(output_dir / 'wire-transfer_extract_output.json', 'w') as f:
    json.dump(extract_response.to_dict(), f, indent=2)
```
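The schema is a JSON Schema object serialized to a string with `json.dumps`. Each property carries a `description`, a `type`, and an `x-alternativeNames` list of other labels the field may appear under in the document. If you plan to add fields, a small helper can keep every entry in the same shape. This is a sketch: `field` and `build_schema` are hypothetical names, and the result is equivalent to the schema in the script.

```python
import json
from typing import Dict, List


def field(description: str, json_type: str, alternative_names: List[str]) -> Dict:
    """Build one property entry in the same shape the tutorial's schema uses."""
    return {
        "description": description,
        "x-alternativeNames": alternative_names,
        "type": json_type,
    }


def build_schema(properties: Dict[str, Dict]) -> str:
    """Wrap property entries in a top-level object schema and serialize it."""
    return json.dumps({"type": "object", "properties": properties})


schema = build_schema({
    "bank_name": field(
        "The official name of the bank where the account is held.",
        "string",
        ["Name of Bank", "Financial Institution", "Bank"],
    ),
    "total_invoice_amount": field(
        "The total monetary amount of the invoice, including all charges and taxes.",
        "number",
        ["Grand Total", "Amount Due", "Invoice Total"],
    ),
})
```

Adding a field then means adding one more `field(...)` entry to the dictionary passed to `build_schema`.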

3. Run the Script

Run the script from the same directory:
```shell
python parse-extract.py
```

4. View Extraction Output

The script saves its results to the output folder in the same directory. Open output/wire-transfer_extract_output.json to view the extracted fields and their metadata:
```json
{
  "extraction": {
    "bank_name": "JPMorgan Chase Bank, N.A.",
    "total_invoice_amount": 15750.0
  },
  "extraction_metadata": {
    "bank_name": {
      "references": [
        "4f64f8d9-ff3a-4c47-aeb5-2ab6eaa9ce7a"
      ],
      "value": "JPMorgan Chase Bank, N.A."
    },
    "total_invoice_amount": {
      "references": [
        "deeb001e-6b3e-4c4e-96b1-6f321521ad4f",
        "0-h"
      ],
      "value": 15750.0
    }
  },
  "metadata": {
    "credit_usage": 0.5396,
    "duration_ms": 11536,
    "filename": "upload.md",
    "job_id": "bec005b58d144096b0525af3aa6ed12d",
    "org_id": null,
    "version": "extract-20260314",
    "fallback_model_version": null,
    "schema_violation_error": null,
    "warnings": []
  }
}
```
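To use these results in later code, you could read the saved file back and separate the values from their references, refusing results that report a schema violation. The sketch below relies only on the keys visible in the sample output (`extraction`, `extraction_metadata`, `metadata.schema_violation_error`, `metadata.warnings`); `load_extraction` is a hypothetical helper. The `references` entries appear to be identifiers pointing back into the parsed document, which is what Link Extracted Data to Document Locations builds on.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Tuple


def load_extraction(path: Path) -> Tuple[Dict[str, Any], Dict[str, List[str]]]:
    """Read a saved extract response and return (field values, field -> references).

    Raises ValueError if the response reports a schema violation.
    """
    data = json.loads(path.read_text())
    metadata = data.get("metadata", {})
    violation = metadata.get("schema_violation_error")
    if violation is not None:
        raise ValueError(f"Extraction did not match the schema: {violation}")
    # Surface any warnings instead of silently ignoring them.
    for warning in metadata.get("warnings", []):
        print(f"warning: {warning}")
    values = data["extraction"]
    references = {
        name: entry.get("references", [])
        for name, entry in data.get("extraction_metadata", {}).items()
    }
    return values, references
```

Usage: `values, refs = load_extraction(Path("output/wire-transfer_extract_output.json"))`, after which `values["bank_name"]` holds the extracted bank name.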

Next Steps

Now that you have a working script, you can:
  • Replace wire-transfer.pdf with any document you want to parse and extract from.
  • Modify the schema dictionary to extract different fields. For guidance, see Extraction Schema (JSON).
  • Use the Playground to build and test a schema before adding it to your code. See Schema Wizard.
  • Link extracted fields back to their locations in the original document. See Link Extracted Data to Document Locations.
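To run the same parse and extract steps over several documents, one option is to wrap the tutorial's two calls in a loop. This is a sketch under a few assumptions: `parse_and_extract_all` is a hypothetical helper; it takes the client as an argument, so you can pass `LandingAIADE()` or a stand-in for testing; and it reuses the model names and the `<name>_extract_output.json` file naming from the tutorial script.

```python
import json
from pathlib import Path
from typing import Any, Iterable, List


def parse_and_extract_all(client: Any, documents: Iterable[Path], schema: str,
                          out_dir: Path = Path("output")) -> List[Path]:
    """Parse and extract each document, writing one extract JSON file per document.

    `client` only needs the parse(...) and extract(...) calls used in the tutorial.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for doc in documents:
        parsed = client.parse(document=doc, model="dpt-2-latest", save_to=str(out_dir))
        extracted = client.extract(schema=schema, markdown=parsed.markdown,
                                   model="extract-latest")
        # Name each result after its source file, e.g. invoice.pdf -> invoice_extract_output.json
        out_path = out_dir / f"{doc.stem}_extract_output.json"
        out_path.write_text(json.dumps(extracted.to_dict(), indent=2))
        written.append(out_path)
    return written
```

For example, `parse_and_extract_all(LandingAIADE(), sorted(Path(".").glob("*.pdf")), schema)` would process every PDF in the current directory. Passing the client in, rather than creating it inside the function, keeps the loop easy to test without making API calls.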