
Decoupled Parse and Extraction APIs

In our original launch of ADE, the field extraction function was part of the parsing function; every time you wanted to run extraction, you had to run parsing, even if you had already parsed the document. In September 2025, we introduced two new endpoints that separate these functions: ADE Parse (/v1/ade/parse) and ADE Extract (/v1/ade/extract). These APIs decouple the parsing and extraction workflows for greater flexibility. You can parse the document once with the ADE Parse API, and then use the ADE Extract API to run field extraction on that output multiple times. This is helpful if you want to experiment with different extraction schemas or you have multiple extraction tasks.

Availability

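The ADE Parse and ADE Extract APIs are available:
  • in the Playground
  • by calling the endpoints directly: https://api.va.landing.ai/v1/ade/parse and https://api.va.landing.ai/v1/ade/extract
  • when using the library
The ADE Parse and ADE Extract APIs are not available in the agentic-doc library.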

Process Overview

  1. Create an extraction schema. The easiest way is to use the Schema Wizard in the Playground to generate a schema.
  2. Run the ADE Parse API. This returns the parsed content as Markdown.
  3. Run a script to save the returned Markdown text as a Markdown file.
  4. Run the ADE Extract API on the Markdown output from the ADE Parse API.
  5. If needed, connect the extracted fields to their original locations in the document.

Use ADE Parse to Parse Documents

See the full API reference here. Use the ADE Parse API to parse data from documents.
Rotation detection can be enabled upon request. To request this feature, contact support@landing.ai.

Specify Documents to Parse

The ADE Parse API offers two parameters for specifying the document you want to parse:
  • document: Specify the actual file you want to parse.
  • document_url: Include the URL to the file that you want to parse.
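As a rough sketch of how the two parameters might be used with the requests library, the helper below builds the keyword arguments for a requests.post call. The helper name build_parse_kwargs and the placeholder URL are hypothetical, and sending document_url as an ordinary form field is an assumption; check the API reference for the exact request format.

```python
def build_parse_kwargs(api_key, file_obj=None, document_url=None):
    """Build keyword arguments for requests.post() against ADE Parse.

    Exactly one of file_obj (an open binary file) or document_url must be set.
    Assumption: document_url is sent as an ordinary form field.
    """
    if (file_obj is None) == (document_url is None):
        raise ValueError("Provide exactly one of file_obj or document_url")

    kwargs = {"headers": {"Authorization": f"Bearer {api_key}"}}
    if file_obj is not None:
        # Upload the file itself in the 'document' field
        kwargs["files"] = {"document": file_obj}
    else:
        # Point the API at a hosted file instead of uploading it
        kwargs["data"] = {"document_url": document_url}
    return kwargs


# Example: parse a file hosted at a placeholder URL
kwargs = build_parse_kwargs(
    "YOUR_API_KEY", document_url="https://example.com/wire-transfer.pdf"
)
# response = requests.post("https://api.va.landing.ai/v1/ade/parse", **kwargs)
```

Keeping the choice in one helper makes it harder to send both parameters by accident.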

Set Up Splits for Parsing

By default, the full document is parsed as a single split when you call the ADE Parse API. However, you can set the split parameter to page to parse each page of the document separately. When this is selected, the splits object in the API output contains a set of data for each page.
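A minimal sketch of reading per-page output, assuming the response follows the splits structure described in this article. The helper name markdown_by_page is hypothetical, the sample values for class and identifier are placeholders rather than values confirmed by the API, and passing split as a form field is also an assumption.

```python
def markdown_by_page(parse_response):
    """Map each page number to the Markdown of the split that contains it.

    Assumes the document was parsed with split set to page, so every
    entry in 'splits' covers exactly one page.
    """
    pages = {}
    for split in parse_response.get("splits", []):
        for page in split["pages"]:
            pages[page] = split["markdown"]
    return pages


# Assumed request shape (not executed here):
# requests.post(url, files=files, data={"split": "page"}, headers=headers)

# Hypothetical two-page response, trimmed to the fields used above
sample = {
    "splits": [
        {"class": "page", "identifier": "page_0", "pages": [0],
         "markdown": "# WIRE TRANSFER FORM", "chunks": []},
        {"class": "page", "identifier": "page_1", "pages": [1],
         "markdown": "Reference Number: WT-2025-0847", "chunks": []},
    ]
}
pages = markdown_by_page(sample)
print(pages[1])
```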

Parsed Output

When you run ADE Parse, the API returns this response in JSON:
{
  "markdown": "<string>", // Markdown for the full document; the start of a chunk is marked with the chunk's unique ID in this format: <a id={chunk_id}></a>
  "chunks": [
    {
      "markdown": "<string>",
      "type": "<string>",
      "id": "<string>", // Unique ID of the chunk in this format: <a id={chunk_id}></a>
      "grounding": {
        "box": {
          "left": 123,
          "top": 123,
          "right": 123,
          "bottom": 123
        },
        "page": 123
      }
    }
  ],
  "splits": [
    {
      "class": "<string>", // Name of the split. 
      "identifier": "<string>",
      "pages": [
        123
      ], // The page numbers of the pages included in the split
      "markdown": "<string>", // The full Markdown for all chunks in the split.
      "chunks": [
        "<string>"
      ] // The unique IDs of each chunk in the split
    }
  ],
  "metadata": {
    "filename": "<string>",
    "org_id": "<string>",
    "page_count": 123,
    "duration_ms": 123,
    "credit_usage": 123, // The number of credits consumed by the API call
    "version": "<string>"
  }
}
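The sketch below shows one way to work with this structure: it reads the chunk IDs from the anchors in the document-level Markdown and builds a lookup table from chunk ID to chunk. The helper names are hypothetical, and the regular expression assumes the anchors look like the format described above, with or without quotes around the ID.

```python
import re

# Matches <a id='...'></a>, <a id="..."></a>, and <a id=...></a>
ANCHOR = re.compile(r"<a id=['\"]?([^'\">]+)['\"]?></a>")


def chunk_ids_in_order(markdown):
    """Return the chunk IDs in the order their anchors appear in the Markdown."""
    return ANCHOR.findall(markdown)


def index_chunks(parse_response):
    """Build a dictionary that maps each chunk ID to its chunk object."""
    return {chunk["id"]: chunk for chunk in parse_response["chunks"]}


# Tiny hand-made response with placeholder IDs
sample = {
    "markdown": "<a id='a1'></a>\n\n# Title\n\n<a id='b2'></a>\n\nBody",
    "chunks": [
        {"id": "a1", "type": "text", "markdown": "# Title"},
        {"id": "b2", "type": "text", "markdown": "Body"},
    ],
}
ids = chunk_ids_in_order(sample["markdown"])
by_id = index_chunks(sample)
print([by_id[i]["markdown"] for i in ids])
```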

Differences Between the Parsed Output from Legacy API & New API

If you’ve been calling the legacy API endpoint (https://api.va.landing.ai/v1/tools/agentic-document-analysis), you will notice that the output for the new API is different. If you’re switching from that endpoint to the new endpoint, you may need to update any scripts you have that interact with the parsed output. Here are some key ways in which the output is different:
  • The output doesn’t include any extraction data, because the ADE Parse API doesn’t perform extraction.
  • The output is not wrapped in a data object.
  • Each chunks object now has a markdown attribute.
  • The chunk type is defined in the type attribute. (The legacy endpoint defines this in chunk_type.)
  • The chunk ID is defined in the id attribute. (The legacy endpoint defines this in chunk_id.)
  • The coordinates of each chunk’s bounding box are now spelled out in the box attribute: left, top, right, bottom. (The legacy API abbreviates the coordinates: l, t, r, b.)
  • The output includes a splits object that shows how the document was split during the parsing process.
  • The output includes a metadata object that includes important information about the parsing process.
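If an existing script expects the legacy attribute names, a small adapter can bridge the gap while you migrate. The sketch below only renames the attributes listed above; the helper name to_legacy_names is hypothetical, and the overall layout of a legacy chunk is not reproduced here, so treat the output shape as an assumption.

```python
def to_legacy_names(chunk):
    """Rename the attributes of a new-style chunk to the legacy names.

    Only the renames listed in this article are applied:
    type -> chunk_type, id -> chunk_id, left/top/right/bottom -> l/t/r/b.
    """
    box = chunk["grounding"]["box"]
    return {
        "chunk_type": chunk["type"],
        "chunk_id": chunk["id"],
        "box": {
            "l": box["left"],
            "t": box["top"],
            "r": box["right"],
            "b": box["bottom"],
        },
        "page": chunk["grounding"]["page"],
    }


# Hand-made chunk with placeholder values
new_chunk = {
    "markdown": "# Title",
    "type": "text",
    "id": "a1",
    "grounding": {
        "box": {"left": 0.1, "top": 0.2, "right": 0.9, "bottom": 0.3},
        "page": 0,
    },
}
legacy = to_legacy_names(new_chunk)
print(legacy["chunk_type"], legacy["box"]["l"])
```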

Use ADE Extract to Extract Fields from Markdown

See the full API reference here. Use the ADE Extract API to extract data from the Markdown output created by the ADE Parse API.

Specify Documents to Run Extraction On

The ADE Extract API offers two parameters for specifying the Markdown you want to run extraction on:
  • markdown: Specify the actual Markdown file you want to run extraction on.
  • markdown_url: Include the URL to the Markdown file you want to run extraction on.

Set the Extraction Schema

Set the extraction schema in the schema parameter. This must be a valid JSON schema. To learn more about extraction schemas and how to create them, go to Overview: Extract Data.
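A minimal sketch of preparing the schema parameter in Python, assuming the schema is sent as a JSON string in a form field named schema (as in the tutorial scripts in this article). Building the schema as a dictionary and serializing it with json.dumps guarantees that the string is syntactically valid JSON; it does not check that the API accepts the schema. The field shown is an illustration.

```python
import json

# Extraction schema written as a Python dictionary
schema = {
    "type": "object",
    "properties": {
        "bankName": {
            "title": "Bank Name",
            "description": "The name of the beneficiary bank.",
            "type": "string",
        }
    },
    "required": ["bankName"],
}

# Serialize to the JSON string that goes into the 'schema' form field
schema_content = json.dumps(schema)
data = {"schema": schema_content}

# Round-trip check: raises json.JSONDecodeError if the string is not valid JSON
json.loads(schema_content)
print(schema_content)
```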

Extracted Output

When you run ADE Extract, the API returns this response in JSON:
{
  "extraction": {}, // The extracted key-value pairs
  "extraction_metadata": {}, // Set of data for each key-value pair. Includes the chunk id for each chunk.
  "metadata": {
    "filename": "<string>",
    "org_id": "<string>",
    "duration_ms": 123,
    "credit_usage": 123,
    "version": "<string>"
  }
}

Differences Between the Extracted Output from Legacy API & New API

If you’ve been calling the legacy API endpoint (https://api.va.landing.ai/v1/tools/agentic-document-analysis), you will notice that the output for the new API is different. If you’re switching from that endpoint to the new endpoint, you may need to update any scripts you have that interact with the extraction output. Here are the key ways in which the output is different:
  • The output doesn’t include confidence scores.
  • The output doesn’t contain the coordinates of the bounding boxes for each chunk. Instead, it contains the unique ID (id) of the chunk that an extracted key-value pair comes from. If you need to locate the source of a key-value pair, you can create a script that connects the id to the bounding box coordinates from the ADE Parse output. To get a sample script that does this, go to End-to-End Workflow: Parse, Extract, and Visually Ground Extracted Fields.
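The lookup itself can be sketched without any network calls. The helper below (locate_fields is a hypothetical name) joins the chunk IDs listed under references in extraction_metadata to the grounding data of the parsed chunks. It assumes the response shapes shown in the examples in this article.

```python
def locate_fields(chunks, extraction_metadata):
    """Map each extracted field to the grounding data of its source chunks.

    chunks: the 'chunks' list from the ADE Parse response.
    extraction_metadata: the 'extraction_metadata' object from ADE Extract.
    """
    by_id = {chunk["id"]: chunk for chunk in chunks}
    located = {}
    for field_name, meta in extraction_metadata.items():
        located[field_name] = [
            by_id[ref]["grounding"]
            for ref in meta.get("references", [])
            if ref in by_id  # skip IDs that are not in this parse result
        ]
    return located


# Trimmed sample with a placeholder chunk ID and coordinates
chunks = [
    {"id": "c1", "type": "text",
     "grounding": {"box": {"left": 0.1, "top": 0.27, "right": 0.9, "bottom": 0.5},
                   "page": 0}},
]
extraction_metadata = {
    "bankName": {"value": "JPMorgan Chase Bank, N.A.", "references": ["c1"]},
}
located = locate_fields(chunks, extraction_metadata)
print(located["bankName"][0]["page"])
```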

End-to-End Workflow: Parse and Extract

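This tutorial walks you through how to parse a document with the ADE Parse API and then extract a subset of fields from it using the ADE Extract API. We provide a separate script for each endpoint, so you can choose to skip the extraction steps if you don’t need them. Scenario and materials: we parse a Wire Transfer form (wire-transfer.pdf) and extract two fields from it, Bank Name and Bank Account Number.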

1. Parse and Save Content as a Markdown File

First, run the script below to parse the document and save the response to a Markdown file (similar to Markdown for Wire Transfer).
import requests

headers = {
    'Authorization': 'Bearer YOUR_API_KEY'
}

url = 'https://api.va.landing.ai/v1/ade/parse'

# Upload the document; the with block closes the file when the request finishes
with open('wire-transfer.pdf', 'rb') as document:
    files = {'document': document}
    response = requests.post(url, files=files, headers=headers)

# Stop here with an HTTP error if the request failed
response.raise_for_status()
response_data = response.json()

# Print the full response
print(response_data)

# Extract and save the markdown content
if 'markdown' in response_data:
    markdown_content = response_data['markdown']

    # Save markdown content to file
    with open('markdown-wire-transfer.md', 'w', encoding='utf-8') as f:
        f.write(markdown_content)

    print("\nMarkdown content saved to a Markdown file.")
else:
    print("No 'markdown' field found in the response")
The full response will be similar to the output below, which is shown as the printed Python dictionary. Notice that each chunk has an id. For example, the first chunk is the text "# WIRE TRANSFER FORM". The id for that chunk is 33335548-e7c3-40bd-898e-4f23d6c99d34.
{
   'markdown':"<a id='33335548-e7c3-40bd-898e-4f23d6c99d34'></a>\n\n# WIRE TRANSFER FORM\n\n<a id='0777dc07-855b-4b83-b422-5e8063405249'></a>\n\nInvoice Information\n\nInvoice Description: Professional consulting services - Q3 2025\n\nTotal Invoice Amount: $15,750.00 USD\n\n<a id='7c56b114-cc66-4fe4-99cb-9425a5210747'></a>\n\nBeneficiary Bank Information\n\nBank Name: JPMorgan Chase Bank, N.A.\n\nBank Address: 270 Park Avenue, New York, NY 10017, USA\n\nBank Account Number: 4578923456789012\n\nSWIFT Code: CHASUS33\n\nABA Routing Number: 021000021\n\nACH Routing Number: 021000021\n\n<a id='b95955a2-3f1d-4b96-be12-d5af677efd60'></a>\n\nInvoice Line Items\n<table><thead><tr><th>Description</th><th>Amount</th></tr></thead><tbody><tr><td>Strategic planning consultation (40 hours @ $150/hr)</td><td>$6,000.00</td></tr><tr><td>Market analysis report preparation</td><td>$3,500.00</td></tr><tr><td>Implementation roadmap development</td><td>$2,250.00</td></tr><tr><td>Executive presentation materials</td><td>$1,500.00</td></tr><tr><td>Follow-up consultation sessions (15 hours @ $150/hr)</td><td>$2,250.00</td></tr><tr><td>Travel expenses (reimbursable)</td><td>$250.00</td></tr><tr><td>TOTAL</td><td>$15,750.00</td></tr></tbody></table>\n\n<a id='d9296cc1-f804-43e2-9f0f-99e7c62eec48'></a>\n\nWire Transfer Instructions\n\nPayment Method: International Wire Transfer\nCurrency: USD (United States Dollars)\nBeneficiary Name: ABC Consulting Services LLC\n\n<a id='f2b8a1d4-4436-4e05-9467-bdaf5ca4bd3b'></a>\n\nBeneficiary Address: 1234 Business Park Drive, Suite 500, Los Angeles, CA 90210, USA\n\nPurpose of Payment: Payment for professional consulting services as per Invoice #INV-2025-0847\n\n<a id='1d536fff-e204-48d4-a53a-8e524665aec5'></a>\n\n- Special Instructions:\n  - Please include invoice number INV-2025-0847 in the payment reference\n  - All bank charges to be borne by the sender\n  - Payment should be received within 3-5 business days\n  - Please send wire confirmation receipt to accounting@abcconsulting.com\n  - For any questions regarding this transfer, contact: +1 (555) 123-4567\n\n<a id='fb34e8c2-0aa6-4866-895d-060c07b717ea'></a>\n\n**Urgency:** Standard processing (3-5 business days acceptable)\n\n<a id='7c686aab-8142-4da2-a7e7-dae4495aade5'></a>\n\nForm completed on: September 3, 2025\n\nReference Number: WT-2025-0847",
   'chunks':[
      {
         'markdown':'# WIRE TRANSFER FORM',
         'type':'text',
         'id':'33335548-e7c3-40bd-898e-4f23d6c99d34',
         'grounding':{
            'box':{
               'left':0.2622728943824768,
               'top':0.07604080438613892,
               'right':0.7369285821914673,
               'bottom':0.10924206674098969
            },
            'page':0
         }
      },
      {
         'markdown':'Invoice Information\n\nInvoice Description: Professional consulting services - Q3 2025\n\nTotal Invoice Amount: $15,750.00 USD',
         'type':'text',
         'id':'0777dc07-855b-4b83-b422-5e8063405249',
         'grounding':{
            'box':{
               'left':0.10331332683563232,
               'top':0.13015401363372803,
               'right':0.8966385126113892,
               'bottom':0.2544138431549072
            },
            'page':0
         }
      },
      {
         'markdown':'Beneficiary Bank Information\n\nBank Name: JPMorgan Chase Bank, N.A.\n\nBank Address: 270 Park Avenue, New York, NY 10017, USA\n\nBank Account Number: 4578923456789012\n\nSWIFT Code: CHASUS33\n\nABA Routing Number: 021000021\n\nACH Routing Number: 021000021',
         'type':'text',
         'id':'7c56b114-cc66-4fe4-99cb-9425a5210747',
         'grounding':{
            'box':{
               'left':0.10399597883224487,
               'top':0.2693082094192505,
               'right':0.895996630191803,
               'bottom':0.5048781633377075
            },
            'page':0
         }
      },
      {
         'markdown':'Invoice Line Items\n<table><thead><tr><th>Description</th><th>Amount</th></tr></thead><tbody><tr><td>Strategic planning consultation (40 hours @ $150/hr)</td><td>$6,000.00</td></tr><tr><td>Market analysis report preparation</td><td>$3,500.00</td></tr><tr><td>Implementation roadmap development</td><td>$2,250.00</td></tr><tr><td>Executive presentation materials</td><td>$1,500.00</td></tr><tr><td>Follow-up consultation sessions (15 hours @ $150/hr)</td><td>$2,250.00</td></tr><tr><td>Travel expenses (reimbursable)</td><td>$250.00</td></tr><tr><td>TOTAL</td><td>$15,750.00</td></tr></tbody></table>',
         'type':'table',
         'id':'b95955a2-3f1d-4b96-be12-d5af677efd60',
         'grounding':{
            'box':{
               'left':0.10457819700241089,
               'top':0.5198298096656799,
               'right':0.8970209956169128,
               'bottom':0.8072096705436707
            },
            'page':0
         }
      },
      {
         'markdown':'Wire Transfer Instructions\n\nPayment Method: International Wire Transfer\nCurrency: USD (United States Dollars)\nBeneficiary Name: ABC Consulting Services LLC',
         'type':'text',
         'id':'d9296cc1-f804-43e2-9f0f-99e7c62eec48',
         'grounding':{
            'box':{
               'left':0.10443270206451416,
               'top':0.8223555088043213,
               'right':0.8968669176101685,
               'bottom':0.974624514579773
            },
            'page':0
         }
      },
      {
         'markdown':'Beneficiary Address: 1234 Business Park Drive, Suite 500, Los Angeles, CA 90210, USA\n\nPurpose of Payment: Payment for professional consulting services as per Invoice #INV-2025-0847',
         'type':'text',
         'id':'f2b8a1d4-4436-4e05-9467-bdaf5ca4bd3b',
         'grounding':{
            'box':{
               'left':0.11186572909355164,
               'top':0.022329870611429214,
               'right':0.8772550821304321,
               'bottom':0.09824278950691223
            },
            'page':1
         }
      },
      {
         'markdown':'- Special Instructions:\n  - Please include invoice number INV-2025-0847 in the payment reference\n  - All bank charges to be borne by the sender\n  - Payment should be received within 3-5 business days\n  - Please send wire confirmation receipt to accounting@abcconsulting.com\n  - For any questions regarding this transfer, contact: +1 (555) 123-4567',
         'type':'text',
         'id':'1d536fff-e204-48d4-a53a-8e524665aec5',
         'grounding':{
            'box':{
               'left':0.11558690667152405,
               'top':0.10176733136177063,
               'right':0.8238765001296997,
               'bottom':0.20318034291267395
            },
            'page':1
         }
      },
      {
         'markdown':'**Urgency:** Standard processing (3-5 business days acceptable)',
         'type':'text',
         'id':'fb34e8c2-0aa6-4866-895d-060c07b717ea',
         'grounding':{
            'box':{
               'left':0.11588779091835022,
               'top':0.204525887966156,
               'right':0.6877880096435547,
               'bottom':0.23076602816581726
            },
            'page':1
         }
      },
      {
         'markdown':'Form completed on: September 3, 2025\n\nReference Number: WT-2025-0847',
         'type':'text',
         'id':'7c686aab-8142-4da2-a7e7-dae4495aade5',
         'grounding':{
            'box':{
               'left':0.35991770029067993,
               'top':0.26450976729393005,
               'right':0.641823947429657,
               'bottom':0.3033582866191864
            },
            'page':1
         }
      }
   ],
   'splits':[
      {
         'class':'full',
         'identifier':'full',
         'pages':[
            0,
            1
         ],
         'markdown':"<a id='33335548-e7c3-40bd-898e-4f23d6c99d34'></a>\n\n# WIRE TRANSFER FORM\n\n<a id='0777dc07-855b-4b83-b422-5e8063405249'></a>\n\nInvoice Information\n\nInvoice Description: Professional consulting services - Q3 2025\n\nTotal Invoice Amount: $15,750.00 USD\n\n<a id='7c56b114-cc66-4fe4-99cb-9425a5210747'></a>\n\nBeneficiary Bank Information\n\nBank Name: JPMorgan Chase Bank, N.A.\n\nBank Address: 270 Park Avenue, New York, NY 10017, USA\n\nBank Account Number: 4578923456789012\n\nSWIFT Code: CHASUS33\n\nABA Routing Number: 021000021\n\nACH Routing Number: 021000021\n\n<a id='b95955a2-3f1d-4b96-be12-d5af677efd60'></a>\n\nInvoice Line Items\n<table><thead><tr><th>Description</th><th>Amount</th></tr></thead><tbody><tr><td>Strategic planning consultation (40 hours @ $150/hr)</td><td>$6,000.00</td></tr><tr><td>Market analysis report preparation</td><td>$3,500.00</td></tr><tr><td>Implementation roadmap development</td><td>$2,250.00</td></tr><tr><td>Executive presentation materials</td><td>$1,500.00</td></tr><tr><td>Follow-up consultation sessions (15 hours @ $150/hr)</td><td>$2,250.00</td></tr><tr><td>Travel expenses (reimbursable)</td><td>$250.00</td></tr><tr><td>TOTAL</td><td>$15,750.00</td></tr></tbody></table>\n\n<a id='d9296cc1-f804-43e2-9f0f-99e7c62eec48'></a>\n\nWire Transfer Instructions\n\nPayment Method: International Wire Transfer\nCurrency: USD (United States Dollars)\nBeneficiary Name: ABC Consulting Services LLC\n\n<a id='f2b8a1d4-4436-4e05-9467-bdaf5ca4bd3b'></a>\n\nBeneficiary Address: 1234 Business Park Drive, Suite 500, Los Angeles, CA 90210, USA\n\nPurpose of Payment: Payment for professional consulting services as per Invoice #INV-2025-0847\n\n<a id='1d536fff-e204-48d4-a53a-8e524665aec5'></a>\n\n- Special Instructions:\n  - Please include invoice number INV-2025-0847 in the payment reference\n  - All bank charges to be borne by the sender\n  - Payment should be received within 3-5 business days\n  - Please send wire confirmation receipt to accounting@abcconsulting.com\n  - For any questions regarding this transfer, contact: +1 (555) 123-4567\n\n<a id='fb34e8c2-0aa6-4866-895d-060c07b717ea'></a>\n\n**Urgency:** Standard processing (3-5 business days acceptable)\n\n<a id='7c686aab-8142-4da2-a7e7-dae4495aade5'></a>\n\nForm completed on: September 3, 2025\n\nReference Number: WT-2025-0847",
         'chunks':[
            '33335548-e7c3-40bd-898e-4f23d6c99d34',
            '0777dc07-855b-4b83-b422-5e8063405249',
            '7c56b114-cc66-4fe4-99cb-9425a5210747',
            'b95955a2-3f1d-4b96-be12-d5af677efd60',
            'd9296cc1-f804-43e2-9f0f-99e7c62eec48',
            'f2b8a1d4-4436-4e05-9467-bdaf5ca4bd3b',
            '1d536fff-e204-48d4-a53a-8e524665aec5',
            'fb34e8c2-0aa6-4866-895d-060c07b717ea',
            '7c686aab-8142-4da2-a7e7-dae4495aade5'
         ]
      }
   ],
   'metadata':{
      'filename':'wire-transfer.pdf',
      'org_id':None,
      'page_count':2,
      'duration_ms':7861,
      'credit_usage':6.0,
      'version':'latest'
   }
}
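Before moving on to extraction, it can help to sanity-check the parsed result. The sketch below counts chunks per page and per chunk type using only fields shown in the response above. The helper name summarize_chunks is hypothetical, and the sample is trimmed to three chunks from that response.

```python
from collections import Counter


def summarize_chunks(chunks):
    """Count chunks per page and per chunk type."""
    return {
        "per_page": dict(Counter(c["grounding"]["page"] for c in chunks)),
        "per_type": dict(Counter(c["type"] for c in chunks)),
    }


# Three chunks from the response above, reduced to the fields used here
chunks = [
    {"id": "33335548-e7c3-40bd-898e-4f23d6c99d34", "type": "text",
     "grounding": {"page": 0}},
    {"id": "b95955a2-3f1d-4b96-be12-d5af677efd60", "type": "table",
     "grounding": {"page": 0}},
    {"id": "7c686aab-8142-4da2-a7e7-dae4495aade5", "type": "text",
     "grounding": {"page": 1}},
]
summary = summarize_chunks(chunks)
print(summary)
```

With the real response you would pass response_data['chunks'] instead of the trimmed sample.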

2. Create a JSON Extraction Schema

As a reminder, we want to extract these fields from the Wire Transfer form: Bank Name and Bank Account Number. To do this, create a JSON extraction schema that identifies these fields. We will use this JSON file when we run the ADE Extract API in the next step. We’ve created the JSON schema below for you to use. You can also download this schema here: Schema for Wire Transfer.
{
  "type": "object",
  "properties": {
    "bankName": {
      "title": "Bank Name",
      "description": "The name of the beneficiary bank as listed in the wire transfer form.",
      "type": "string"
    },
    "bankAccountNumber": {
      "title": "Bank Account Number",
      "description": "The account number of the beneficiary bank as listed in the wire transfer form.",
      "type": "number"
    }
  },
  "required": [
    "bankName",
    "bankAccountNumber"
  ]
}
To learn more about extraction schemas and how to create them, go to Overview: Extract Data.
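Once you have an extraction result, the required list in the schema gives you a simple completeness check. The sketch below reports required fields that are absent or null in the extraction object. The helper name missing_required is hypothetical, and whether the API itself enforces required is not covered here.

```python
def missing_required(schema, extraction):
    """Return the required field names that have no value in the extraction."""
    return [
        name
        for name in schema.get("required", [])
        if extraction.get(name) is None
    ]


schema = {
    "type": "object",
    "properties": {
        "bankName": {"type": "string"},
        "bankAccountNumber": {"type": "number"},
    },
    "required": ["bankName", "bankAccountNumber"],
}

# Example extraction in which one required field was not found
extraction = {"bankName": "JPMorgan Chase Bank, N.A."}
missing = missing_required(schema, extraction)
print(missing)
```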

3. Use the Extraction Schema to Extract Data from the Markdown File

Now that we have the parsed output in a Markdown file and a JSON extraction schema, we’re ready to extract these fields: Bank Name and Bank Account Number. To do this, run the script below.
import requests

headers = {
    'Authorization': 'Bearer YOUR_API_KEY'
}

url = 'https://api.va.landing.ai/v1/ade/extract'

# Read the schema file as string
with open('schema-wire-transfer.json', 'r') as f:
    schema_content = f.read()

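# Prepare the schema as form data
data = {'schema': schema_content}

# Upload the Markdown file and run extraction; the with block closes the file afterward
with open('markdown-wire-transfer.md', 'rb') as markdown_file:
    files = {'markdown': markdown_file}
    response = requests.post(url, files=files, data=data, headers=headers)

# Stop here with an HTTP error if the request failed
response.raise_for_status()

# Return the results
print(response.json())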
The extracted fields and other metadata are included in the API response:
{
   'extraction':{
      'bankName':'JPMorgan Chase Bank, N.A.',
      'bankAccountNumber':4578923456789012
   },
   'extraction_metadata':{
      'bankName':{
         'value':'JPMorgan Chase Bank, N.A.',
         'references':[
            '7c56b114-cc66-4fe4-99cb-9425a5210747'
         ]
      },
      'bankAccountNumber':{
         'value':4578923456789012,
         'references':[
            '7c56b114-cc66-4fe4-99cb-9425a5210747'
         ]
      }
   },
   'metadata':{
      'filename':'markdown-wire-transfer.md',
      'org_id':None,
      'duration_ms':1018,
      'credit_usage':0.6,
      'version':'latest'
   }
}
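Because parsing and extraction are decoupled, the same Markdown can be sent to ADE Extract several times with different schemas. The sketch below illustrates that loop under the same request shape as the script above. The function name extract_with_schemas and the upload filename parsed.md are hypothetical, and the post argument is injected so that the loop can be exercised without network access; in real use you would pass requests.post.

```python
import json

EXTRACT_URL = "https://api.va.landing.ai/v1/ade/extract"


def extract_with_schemas(markdown, schemas, post, api_key):
    """Run one extraction per schema against the same parsed Markdown.

    markdown: the Markdown string returned by ADE Parse.
    schemas: dictionary that maps a label to a schema (as a Python dict).
    post: a callable with the same interface as requests.post.
    """
    headers = {"Authorization": f"Bearer {api_key}"}
    results = {}
    for label, schema in schemas.items():
        response = post(
            EXTRACT_URL,
            # 'parsed.md' is a placeholder filename for the upload
            files={"markdown": ("parsed.md", markdown.encode("utf-8"))},
            data={"schema": json.dumps(schema)},
            headers=headers,
        )
        response.raise_for_status()
        results[label] = response.json()["extraction"]
    return results


# Real usage (not executed here):
# import requests
# results = extract_with_schemas(markdown, schemas, requests.post, "YOUR_API_KEY")
```

The document is parsed once, and each additional schema costs only an extraction call.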

End-to-End Workflow: Parse, Extract, and Visually Ground Extracted Fields

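This tutorial walks you through how to parse a document, extract a subset of fields, and then connect the fields back to their original locations in the document. We provide a single script for the full workflow. Running this script saves images of the locations of the fields as PNGs. Scenario and materials: the script uses the same Wire Transfer form (wire-transfer.pdf) and extraction schema (schema-wire-transfer.json) as the previous tutorial, and it requires the requests, pymupdf, and Pillow packages.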
import requests
import io
import pymupdf

# Start the parse process
headers = {
    'Authorization': 'Bearer YOUR_API_KEY'
}

# Define the parsing endpoint
url_parse = 'https://api.va.landing.ai/v1/ade/parse'

# Upload and parse the document; the with block closes the file afterward
with open('wire-transfer.pdf', 'rb') as document:
    files = {'document': document}
    response = requests.post(url_parse, files=files, headers=headers)

# Parse the JSON response
response_parse = response.json()

# Create variables that store the fields from the response
markdown = response_parse['markdown']
chunks = response_parse['chunks']

# Start the extraction process
# Define the extract endpoint
url_extract = 'https://api.va.landing.ai/v1/ade/extract'

# Read the schema file as string
with open('schema-wire-transfer.json', 'r') as f:
    schema_content = f.read()

# Prepare the markdown as a file-like object and schema as data
files_extract = {'markdown': io.StringIO(markdown)}
data_extract = {'schema': schema_content}

# Run extraction
response_extract = requests.post(url_extract, files=files_extract, data=data_extract, headers=headers)

# Parse the extraction response
response_extraction = response_extract.json()

# Start the process to connect the extracted fields to their locations in the document
# Load the original PDF
pdf = pymupdf.open('wire-transfer.pdf')
extraction = response_extraction['extraction']
extraction_metadata = response_extraction['extraction_metadata']

# Process each extracted field
for field_name, field_value in extraction.items():
    # Get the chunk IDs that contain this field's data
    refs = extraction_metadata[field_name]['references']
    # Find the chunks that match these IDs
    ref_chunks = [chunk for chunk in chunks if chunk['id'] in refs]
    
    # Process each chunk's grounding information
    for chunk in ref_chunks:
        grounding = chunk['grounding']
        # Get the page image
        page_image = pdf[grounding['page']].get_pixmap(dpi=72)
        
        # Convert normalized coordinates (0-1) to pixel coordinates
        left = int(grounding['box']['left'] * page_image.width)
        right = int(grounding['box']['right'] * page_image.width)
        top = int(grounding['box']['top'] * page_image.height)
        bottom = int(grounding['box']['bottom'] * page_image.height)
        
        # Crop the region and save as image; the chunk ID keeps filenames
        # unique when a field references more than one chunk
        chunk_crop = page_image.pil_image().crop((left, top, right, bottom))
        chunk_crop.save(f"crop_{field_name}_{chunk['id']}.png")

pdf.close()
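The coordinate conversion in the script can be isolated into a small function, which makes it easier to reuse and to check by hand. This is a sketch of the same arithmetic; the function name box_to_pixels is hypothetical. Because the box coordinates are normalized to the 0-1 range, the conversion works for any rendering resolution, so you can raise the dpi value for sharper crops without changing the math.

```python
def box_to_pixels(box, width, height):
    """Convert a normalized bounding box (0-1) to pixel coordinates.

    Returns (left, top, right, bottom), the order expected by
    PIL's Image.crop().
    """
    return (
        int(box["left"] * width),
        int(box["top"] * height),
        int(box["right"] * width),
        int(box["bottom"] * height),
    )


# Hand-checkable example: a box that covers the lower middle of a 600x800 image
box = {"left": 0.25, "top": 0.5, "right": 0.75, "bottom": 1.0}
pixels = box_to_pixels(box, width=600, height=800)
print(pixels)  # (150, 400, 450, 800)
```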