
Overview

When you run an extraction, the API response includes an extraction_metadata field with reference IDs that connect each extracted value back to its location in the original document. This tutorial shows you how to use those references to find each value on the page. The Python script below uses the landingai_ade client library and pymupdf. In this tutorial, we will:
  • Parse this PDF: Pay Stub
  • Extract these fields: Employee Name and Gross Pay
  • Save a PNG of each extracted field’s location
  • Save bounding box coordinates for each extracted field to a JSON file
These examples require the Python or TypeScript client library. Before running a script, set your API key and install the library and any required dependencies.
The Python script has been tested with PDF and PNG files and may work with other supported file types. The TypeScript script is written specifically for PDF files. If you need to export crops from other file types, use it as a reference and adapt the image export logic.
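Before running the script, it can help to confirm that the prerequisites are in place. The following is a minimal sketch, not part of the tutorial script: the helper name check_prerequisites is hypothetical, and it assumes only what the script itself shows, namely that the client reads the VISION_AGENT_API_KEY environment variable and that the script imports landingai_ade and pymupdf.

```python
import importlib.util
import os


def check_prerequisites(env=None, modules=("landingai_ade", "pymupdf")):
    """Return a list of problems; an empty list means the script can run.

    check_prerequisites is a hypothetical helper, not part of any library.
    """
    env = os.environ if env is None else env
    problems = []
    # The tutorial script's client reads the key from this variable.
    if not env.get("VISION_AGENT_API_KEY"):
        problems.append("VISION_AGENT_API_KEY is not set")
    # find_spec returns None when a top-level module cannot be imported.
    for name in modules:
        if importlib.util.find_spec(name) is None:
            problems.append(f"module {name!r} is not installed")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("Missing:", problem)
```

If the script prints nothing, the API key is set and both modules can be imported.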

1. Download the Document to Process

Download the Pay Stub and save it to a local directory.

2. Create the Script

Copy the script for your language and save it as grounding.py or grounding.ts in the same directory as the PDF.
import json
import pymupdf
from pathlib import Path
from landingai_ade import LandingAIADE

# Initialize client (uses VISION_AGENT_API_KEY environment variable)
client = LandingAIADE()

# Define the extraction schema
schema = json.dumps({
    "type": "object",
    "properties": {
        "employee_name": {
            "description": "The employee's full name",
            "type": "string"
        },
        "gross_pay": {
            "description": "The gross pay amount",
            "type": "number"
        }
    }
})

# Parse the document
# save_to is optional, but saves the full parse response, which is useful for
# keeping a record and for other downstream processing tasks
parse_response = client.parse(
    document=Path("pay-stub.pdf"),
    model="dpt-2-latest",
    save_to="output"
)

# Extract data
extract_response = client.extract(
    schema=schema,
    markdown=parse_response.markdown,
    model="extract-latest"
)

# Save the extraction results
with open("output/pay-stub_extract_output.json", "w") as f:
    json.dump(extract_response.to_dict(), f, indent=2)

# Open the PDF for PNG export
pdf = pymupdf.open("pay-stub.pdf")

# Link each extracted field to its location in the document
grounding_results = {}

for field_name, field_data in extract_response.extraction_metadata.items():
    for chunk_id in field_data["references"]:
        # Skip table cell IDs not present in grounding
        if chunk_id not in parse_response.grounding:
            continue
        grounding = parse_response.grounding[chunk_id]

        # Collect extracted value and bounding box coordinates
        grounding_results[field_name] = {
            "value": extract_response.extraction[field_name],
            "page": grounding.page,
            "location": {
                "left": round(grounding.box.left, 3),
                "top": round(grounding.box.top, 3),
                "right": round(grounding.box.right, 3),
                "bottom": round(grounding.box.bottom, 3)
            }
        }

        # Crop the chunk and save as a PNG
        page_image = pdf[grounding.page].get_pixmap(dpi=150)
        left = int(grounding.box.left * page_image.width)
        right = int(grounding.box.right * page_image.width)
        top = int(grounding.box.top * page_image.height)
        bottom = int(grounding.box.bottom * page_image.height)
        crop = page_image.pil_image().crop((left, top, right, bottom))
        crop.save(f"output/{field_name}.png")

pdf.close()

# Save grounding results to a JSON file
with open("output/pay-stub_grounding_output.json", "w") as f:
    json.dump(grounding_results, f, indent=2)
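Note that the loop in the script stores one entry per field name, so if a field lists several reference IDs, each later reference overwrites the earlier entry and PNG. If you want to keep every location, one option is to collect a list per field. The sketch below illustrates the idea with plain dictionaries as stand-ins for the API responses; the function name collect_locations and the sample IDs are hypothetical, and the real grounding entries are objects with page and box attributes rather than dictionaries.

```python
def collect_locations(extraction_metadata, grounding):
    """Map each field name to a list of every grounded reference.

    Both arguments are plain dicts here (an assumption for illustration).
    """
    results = {}
    for field_name, field_data in extraction_metadata.items():
        for chunk_id in field_data["references"]:
            # Same rule as the tutorial script: skip IDs missing from grounding.
            if chunk_id not in grounding:
                continue
            entry = {"chunk_id": chunk_id, **grounding[chunk_id]}
            results.setdefault(field_name, []).append(entry)
    return results


# Hypothetical stand-in data with two grounded references and one unknown ID.
metadata = {"gross_pay": {"references": ["chunk-a", "chunk-b", "cell-9"]}}
grounding = {
    "chunk-a": {"page": 0, "left": 0.306, "top": 0.331},
    "chunk-b": {"page": 0, "left": 0.306, "top": 0.512},
}

locations = collect_locations(metadata, grounding)
print(len(locations["gross_pay"]))
```

With this structure, the crops could be saved under indexed names such as gross_pay_0.png and gross_pay_1.png so that no file is overwritten.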

3. Run the Script

Run the script from the same directory:
python grounding.py

4. View Output

The script saves the following files to the output folder:
File | Description
pay-stub_parse_output.json | Full parse response, including all chunks and grounding data.
pay-stub_extract_output.json | Extraction results, including extracted values and reference IDs.
pay-stub_grounding_output.json | Extracted values and bounding box coordinates for each field.
employee_name.png | Cropped image of the chunk where the employee name was found.
gross_pay.png | Cropped image of the chunk where the gross pay was found.
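To confirm that a run produced everything listed above, a small check like the following may be useful. It is a sketch only: the helper name missing_outputs is hypothetical, and the file names are taken from the list above.

```python
from pathlib import Path

# File names as listed in this tutorial's output description.
EXPECTED_FILES = [
    "pay-stub_parse_output.json",
    "pay-stub_extract_output.json",
    "pay-stub_grounding_output.json",
    "employee_name.png",
    "gross_pay.png",
]


def missing_outputs(output_dir="output", expected=EXPECTED_FILES):
    """Return the expected file names that are not present in output_dir."""
    folder = Path(output_dir)
    return [name for name in expected if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = missing_outputs()
    print("All files present" if not missing else f"Missing: {missing}")
```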

Chunk Coordinates

Each entry in pay-stub_grounding_output.json includes the page number and bounding box coordinates. Coordinates are normalized values between 0 and 1, relative to the page dimensions:
{
  "employee_name": {
    "value": "JANE HARPER",
    "page": 0,
    "location": {
      "left": 0.08,
      "top": 0.785,
      "right": 0.933,
      "bottom": 0.837
    }
  },
  "gross_pay": {
    "value": 452.43,
    "page": 0,
    "location": {
      "left": 0.306,
      "top": 0.331,
      "right": 0.438,
      "bottom": 0.345
    }
  }
}
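Because the coordinates are normalized, you multiply them by the rendered page size to get pixel positions, which is what the script does before cropping. The sketch below repeats that arithmetic on the employee_name entry shown above. The helper name to_pixels is hypothetical, and the 1275 x 1650 page size is an assumption (a US Letter page rendered at 150 DPI), not a value taken from the pay stub.

```python
def to_pixels(location, page_width, page_height):
    """Convert a normalized box to integer pixel edges (left, top, right, bottom)."""
    return (
        int(location["left"] * page_width),
        int(location["top"] * page_height),
        int(location["right"] * page_width),
        int(location["bottom"] * page_height),
    )


# Values copied from the employee_name entry in the sample output.
employee_name = {"left": 0.08, "top": 0.785, "right": 0.933, "bottom": 0.837}

# Assumed page size: 8.5 x 11 inches at 150 DPI.
box = to_pixels(employee_name, 1275, 1650)
print(box)
```

The resulting tuple has the same (left, top, right, bottom) order that the script passes to crop().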

Next Steps

Now that you have a working script, you can:
  • Replace pay-stub.pdf with any document you want to parse and extract from.
  • Modify the schema dictionary to extract different fields. For guidance, see Extraction Schema (JSON).
  • Use the Playground to build and test a schema before adding it to your code. See Schema Wizard.
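As an illustration of modifying the schema, the sketch below builds the same JSON schema string as the script and adds one more field. The helper name build_schema and the net_pay field are hypothetical examples; whether a given field can be extracted depends on your document.

```python
import json


def build_schema(fields):
    """Build a JSON schema string from a mapping of name -> (type, description)."""
    return json.dumps({
        "type": "object",
        "properties": {
            name: {"description": description, "type": json_type}
            for name, (json_type, description) in fields.items()
        },
    })


schema = build_schema({
    "employee_name": ("string", "The employee's full name"),
    "gross_pay": ("number", "The gross pay amount"),
    "net_pay": ("number", "The net pay amount"),  # hypothetical extra field
})
print(schema)
```

The resulting string can be passed as the schema argument in the same way as the schema defined in the script.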