> ## Documentation Index
> Fetch the complete documentation index at: https://docs.landing.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Link Extracted Data to Document Locations

export const adeTypeScriptLibrary = 'ade-typescript';

export const adePythonLibrary = 'ade-python';

export const extract = 'ADE Extract';

export const parse = 'ADE Parse';

export const ade = 'Agentic Document Extraction';

## Overview

When you run the {extract} API, the response includes an `extraction_metadata` field with reference IDs that connect each extracted value back to its location in the original document. This tutorial shows you how to use those references.

This tutorial uses the [{adePythonLibrary} library](./ade-python) and [{adeTypeScriptLibrary} library](./ade-typescript).

In this tutorial, we will:

* Parse this PDF: <a href="/examples/pay-stub.pdf" download="pay-stub.pdf">Pay Stub</a>
* Extract these fields: **Employee Name** and **Gross Pay**
* Save a PNG of each extracted field's location
* Save bounding box coordinates for each extracted field to a JSON file

<Info>
  These examples require the [Python](./ade-python) or [TypeScript](./ade-typescript) client library. Before running a script, set your API key and install the library and any required dependencies.
</Info>

<Info>
  The Python script has been tested with PDF and PNG files and may work with other file types supported by {ade}. The TypeScript script is written specifically for PDF files. If you need to export crops from other file types, use it as a reference and adapt the image export logic.
</Info>

## 1. Download the Document to Process

Download the <a href="/examples/pay-stub.pdf" download="pay-stub.pdf">Pay Stub</a> and save it to a local directory.

## 2. Create the Script

Copy the script for your language and save it as `grounding.py` or `grounding.ts` in the same directory as the PDF.

<CodeGroup>
  ```python Python [expandable] theme={null}
  import json
  import pymupdf
  from pathlib import Path
  from landingai_ade import LandingAIADE

  # Initialize client (uses VISION_AGENT_API_KEY environment variable)
  client = LandingAIADE()

  # Define the extraction schema
  schema = json.dumps({
      "type": "object",
      "properties": {
          "employee_name": {
              "description": "The employee's full name",
              "type": "string"
          },
          "gross_pay": {
              "description": "The gross pay amount",
              "type": "number"
          }
      }
  })

  # Parse the document
  # save_to is optional, but saves the full parse response, which is useful for
  # keeping a record and for other downstream processing tasks
  parse_response = client.parse(
      document=Path("pay-stub.pdf"),
      model="dpt-2-latest",
      save_to="output"
  )

  # Extract data
  extract_response = client.extract(
      schema=schema,
      markdown=parse_response.markdown,
      model="extract-latest"
  )

  # Save the extraction results
  with open("output/pay-stub_extract_output.json", "w") as f:
      json.dump(extract_response.to_dict(), f, indent=2)

  # Open the PDF for PNG export
  pdf = pymupdf.open("pay-stub.pdf")

  # Link each extracted field to its location in the document
  grounding_results = {}

  for field_name, field_data in extract_response.extraction_metadata.items():
      for chunk_id in field_data["references"]:
          # Skip table cell IDs not present in grounding
          if chunk_id not in parse_response.grounding:
              continue
          grounding = parse_response.grounding[chunk_id]

          # Collect extracted value and bounding box coordinates
          grounding_results[field_name] = {
              "value": extract_response.extraction[field_name],
              "page": grounding.page,
              "location": {
                  "left": round(grounding.box.left, 3),
                  "top": round(grounding.box.top, 3),
                  "right": round(grounding.box.right, 3),
                  "bottom": round(grounding.box.bottom, 3)
              }
          }

          # Crop the chunk and save as a PNG
          page_image = pdf[grounding.page].get_pixmap(dpi=150)
          left = int(grounding.box.left * page_image.width)
          right = int(grounding.box.right * page_image.width)
          top = int(grounding.box.top * page_image.height)
          bottom = int(grounding.box.bottom * page_image.height)
          crop = page_image.pil_image().crop((left, top, right, bottom))
          crop.save(f"output/{field_name}.png")

  pdf.close()

  # Save grounding results to a JSON file
  with open("output/pay-stub_grounding_output.json", "w") as f:
      json.dump(grounding_results, f, indent=2)
  ```

  ```typescript TypeScript [expandable] theme={null}
  import * as mupdf from "mupdf";
  import LandingAIADE, { toFile } from "landingai-ade";
  import fs from "fs";

  // Initialize client (uses VISION_AGENT_API_KEY environment variable)
  const client = new LandingAIADE();

  // Define the extraction schema
  const schema = JSON.stringify({
    type: "object",
    properties: {
      employee_name: {
        description: "The employee's full name",
        type: "string"
      },
      gross_pay: {
        description: "The gross pay amount",
        type: "number"
      }
    }
  });

  // Parse the document
  // saveTo is optional, but saves the full parse response, which is useful for
  // keeping a record and for other downstream processing tasks
  const parseResponse = await client.parse({
    document: fs.createReadStream("pay-stub.pdf"),
    model: "dpt-2-latest",
    saveTo: "output"
  });

  // Extract data
  const extractResponse = await client.extract({
    schema: schema,
    markdown: await toFile(Buffer.from(parseResponse.markdown), "document.md"),
    model: "extract-latest"
  });

  // Save the extraction results
  fs.mkdirSync("output", { recursive: true });
  fs.writeFileSync(
    "output/pay-stub_extract_output.json",
    JSON.stringify(extractResponse, null, 2)
  );

  // Open the PDF for PNG export
  const pdfBuffer = fs.readFileSync("pay-stub.pdf");
  const mupdfDoc = mupdf.Document.openDocument(pdfBuffer, "application/pdf").asPDF()!;

  // Link each extracted field to its location in the document
  const groundingResults: Record<string, unknown> = {};

  for (const [fieldName, fieldData] of Object.entries(extractResponse.extraction_metadata)) {
    for (const chunkId of fieldData.references) {
      if (!parseResponse.grounding?.[chunkId]) continue;
      const grounding = parseResponse.grounding[chunkId];
      groundingResults[fieldName] = {
        value: (extractResponse.extraction as Record<string, unknown>)[fieldName],
        page: grounding.page,
        location: {
          left: parseFloat(grounding.box.left.toFixed(3)),
          top: parseFloat(grounding.box.top.toFixed(3)),
          right: parseFloat(grounding.box.right.toFixed(3)),
          bottom: parseFloat(grounding.box.bottom.toFixed(3))
        }
      };

      // Crop the chunk and save as a PNG
      const scaleFactor = 150 / 72;
      const page = mupdfDoc.loadPage(grounding.page);
      const bounds = page.getBounds();
      const fullWidth = Math.round((bounds[2] - bounds[0]) * scaleFactor);
      const fullHeight = Math.round((bounds[3] - bounds[1]) * scaleFactor);
      const left = Math.round(grounding.box.left * fullWidth);
      const top = Math.round(grounding.box.top * fullHeight);
      const right = Math.round(grounding.box.right * fullWidth);
      const bottom = Math.round(grounding.box.bottom * fullHeight);
      const cropPixmap = new mupdf.Pixmap(mupdf.ColorSpace.DeviceRGB, [left, top, right, bottom], false);
      cropPixmap.clear(255);
      const device = new mupdf.DrawDevice(mupdf.Matrix.scale(scaleFactor, scaleFactor), cropPixmap);
      page.run(device, mupdf.Matrix.identity);
      device.close();
      fs.writeFileSync(`output/${fieldName}.png`, cropPixmap.asPNG());
    }
  }

  // Save grounding results to a JSON file
  fs.writeFileSync(
    "output/pay-stub_grounding_output.json",
    JSON.stringify(groundingResults, null, 2)
  );
  ```
</CodeGroup>

## 3. Run the Script

Run the script from the same directory:

<CodeGroup>
  ```bash Run Python theme={null}
  python grounding.py
  ```

  ```bash Run TypeScript theme={null}
  npx tsx grounding.ts
  ```
</CodeGroup>

## 4. View Output

The script saves the following files to the `output` folder:

| File                             | Description                                                       |
| -------------------------------- | ----------------------------------------------------------------- |
| `pay-stub_parse_output.json`     | Full parse response, including all chunks and grounding data.     |
| `pay-stub_extract_output.json`   | Extraction results, including extracted values and reference IDs. |
| `pay-stub_grounding_output.json` | Extracted values and bounding box coordinates for each field.     |
| `employee_name.png`              | Cropped image of the chunk where the employee name was found.     |
| `gross_pay.png`                  | Cropped image of the chunk where the gross pay was found.         |

### Chunk Coordinates

Each entry in `pay-stub_grounding_output.json` includes the page number and bounding box coordinates. Coordinates are normalized values between 0 and 1, relative to the page dimensions:

```json theme={null}
{
  "employee_name": {
    "value": "JANE HARPER",
    "page": 0,
    "location": {
      "left": 0.08,
      "top": 0.785,
      "right": 0.933,
      "bottom": 0.837
    }
  },
  "gross_pay": {
    "value": 452.43,
    "page": 0,
    "location": {
      "left": 0.306,
      "top": 0.331,
      "right": 0.438,
      "bottom": 0.345
    }
  }
}
```

## Next Steps

Now that you have a working script, you can:

* Replace `pay-stub.pdf` with any document you want to parse and extract from.
* Modify the `schema` dictionary to extract different fields. For guidance, see [Extraction Schema (JSON)](./ade-extract-schema-json).
* Use the Playground to build and test a schema before adding it to your code. See [Schema Wizard](./ade-extract-playground).
