The ADE Parse Jobs API enables you to parse large documents that exceed the size limits of the standard ADE Parse API.

Monitor Parse Jobs

You can monitor parse jobs with these APIs:

Rate Limits for ADE Parse Jobs

The following table shows the file size and page limits for the ADE Parse Jobs API.
| Maximum File Size | Maximum Pages |
| --- | --- |
| 1 GB | 6,000 pages |
To see the rate limits for all APIs, go to Rate Limits.
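As a minimal sketch, you could check a document against the limits in the table before creating a parse job. The helper names below are hypothetical, and the byte value is an assumption: the table says "1 GB" without stating whether that means 10**9 or 2**30 bytes, so this sketch uses the stricter decimal reading. The page count is passed in by the caller because counting pages requires a PDF library.

```python
import os

# Assumption: "1 GB" is read as 10**9 bytes (the stricter interpretation).
MAX_FILE_BYTES = 10**9
MAX_PAGES = 6000


def file_size_bytes(path: str) -> int:
    """Return the size of a local file in bytes."""
    return os.path.getsize(path)


def within_parse_job_limits(size_bytes: int, page_count: int) -> bool:
    """Return True if a document fits both limits from the table above."""
    return size_bytes <= MAX_FILE_BYTES and page_count <= MAX_PAGES
```

For example, `within_parse_job_limits(file_size_bytes('mri-report.pdf'), 2)` checks the 2-page PDF used later on this page.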

API Reference

To learn more, go to the reference pages for the Parse Jobs APIs:

Save Parsed Output to a URL

When calling the ADE Parse Jobs endpoint, you can use the output_save_url parameter to save the parsed Markdown to a specified URL instead of returning it in the API response. This is useful for managing large documents, integrating with your existing storage workflow, or complying with data retention policies.

When Parsed Output Is Saved to a URL

The parsed Markdown is saved to a URL in these scenarios:
  • You specify the output_save_url parameter: The Markdown is saved to your specified URL.
  • The parsed Markdown exceeds 1 MB: The Markdown is automatically saved to a presigned S3 URL generated by the service. The URL expires after 1 hour, but each time you call the ADE Get Parse Jobs API, a new presigned URL is generated.
Behavior when the output is saved to a URL:
  • The output_url field in the API response contains the URL where the Markdown is stored.
  • The data field in the API response is None.
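Because the Markdown can arrive in either field, client code usually needs to handle both cases. The sketch below is one possible way to do that, not an official client. `resolve_markdown` and `_download_text` are hypothetical names, and the sketch assumes that the URL in output_url can be read with a plain GET and that the stored file is UTF-8. If you supplied your own PUT-only URL, you may need a separate read URL.

```python
from typing import Callable, Optional
from urllib.request import urlopen


def _download_text(url: str) -> str:
    """Fetch a URL with a plain GET and decode the body as UTF-8 (assumption)."""
    with urlopen(url, timeout=30) as resp:
        return resp.read().decode('utf-8')


def resolve_markdown(
    job_response: dict,
    fetch: Callable[[str], str] = _download_text,
) -> Optional[str]:
    """Return the parsed Markdown from data.markdown or from output_url."""
    # data is None when the output was saved to a URL, so fall back to {}.
    data = job_response.get('data') or {}
    if data.get('markdown'):
        return data['markdown']

    output_url = job_response.get('output_url')
    if output_url:
        return fetch(output_url)

    # Neither field is populated (for example, the job is not finished).
    return None
```

The `fetch` parameter exists only so the download step can be replaced, for example in tests or when the stored file needs authenticated access.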

URL Requirements

If you specify the output_save_url parameter, your URL must meet these requirements:
  • The URL must be a public or presigned URL that explicitly allows PUT or CREATE operations (depending on the provider).
  • Tested storage providers: Amazon S3, Azure Blob Storage, and Google Cloud Storage. Other storage providers may also work.
  • The API cannot access private URLs, such as folders in Google Drive.

Example: Use Amazon S3 Presigned URLs

If you use Amazon S3, you can generate a presigned URL and provide it as the output_save_url value. Presigned URLs grant temporary access to your S3 bucket without requiring authentication in the API request. For more information about presigned URLs with Amazon S3, go to the Amazon documentation. Here is a sample script that creates a presigned URL and uses it for a parsing job:
import requests

headers = {
    'Authorization': 'Bearer YOUR_API_KEY'
}

url = 'https://api.va.landing.ai/v1/ade/parse/jobs'

# Prepare the request payload.
# generate_presigned_url(...) is a placeholder: replace it with your own code
# that returns a presigned URL allowing PUT operations.
output_save_url = generate_presigned_url(...)
files = {'document_url': 'https://...', 'output_save_url': output_save_url}

response = requests.post(url, files=files, headers=headers)
print(response.json())
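The sample above leaves the creation of the presigned URL open. One way to fill it in, assuming you use the boto3 library, is the S3 client's `generate_presigned_url` method with the `put_object` operation. The helper name `make_upload_url` is hypothetical, as are the bucket and key names in the usage note. The client is passed in as an argument so the helper does not depend on how your credentials are configured.

```python
def make_upload_url(s3_client, bucket: str, key: str, expires_in: int = 3600) -> str:
    """Return a presigned URL that allows a PUT of `key` into `bucket`.

    `s3_client` is expected to be a boto3 S3 client, e.g. boto3.client('s3').
    `expires_in` is the URL lifetime in seconds.
    """
    return s3_client.generate_presigned_url(
        'put_object',
        Params={'Bucket': bucket, 'Key': key},
        ExpiresIn=expires_in,
    )
```

With boto3 installed and AWS credentials configured, a call might look like `make_upload_url(boto3.client('s3'), 'my-bucket', 'parsed/mri-report.md')`. Choose an expiry long enough for the parse job to finish, because the upload happens when parsing completes, not when the job is created.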

ZDR Requirements

When zero data retention (ZDR) is enabled, you must configure the following parameters to ensure that the document content is not stored:
  • Pass your document in the document_url parameter. You cannot use the document parameter with ZDR enabled.
  • Include the output_save_url parameter. This ensures that the parsed content is saved to your specified URL instead of being returned in the API response. To learn how to configure this parameter, go to Save Parsed Output to a URL.
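The two ZDR rules above can be checked on the client before a request is sent. The following is a minimal sketch under that reading of the rules. `zdr_problems` is a hypothetical helper, not part of any library, and the API itself may enforce these rules differently.

```python
def zdr_problems(fields: dict) -> list:
    """List the ways a Parse Jobs request breaks the ZDR rules described above."""
    problems = []
    if 'document' in fields:
        problems.append('document cannot be used with ZDR; use document_url')
    if not fields.get('document_url'):
        problems.append('document_url is required')
    if not fields.get('output_save_url'):
        problems.append('output_save_url is required')
    return problems
```

An empty list means the request fields satisfy both requirements. For example, `zdr_problems({'document_url': 'https://...', 'output_save_url': output_save_url})` returns `[]` when `output_save_url` is a non-empty string.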

Workflow Overview

  1. Parse a document with the ADE Parse Jobs API.
  2. Copy the job_id in the API response.
  3. To get results from the parsing job, call the ADE Get Parse Jobs API with the job_id.
  4. The parsed content is returned as Markdown in data.markdown, or as a URL in output_url (in which case data is None). For more information, go to Save Parsed Output to a URL.
  5. If you need to extract fields:
    1. Create an extraction schema.
    2. Send the Markdown to the ADE Extract API.

End-to-End Workflow: Parse and Extract the Output

This tutorial walks you through parsing a document with the ADE Parse Jobs API and then extracting a subset of fields from it using the ADE Extract API. For simplicity, this example uses a 2-page PDF; in your own use case, you will typically use larger documents. There is a separate script for each endpoint, so you can skip the extraction steps if you don’t need them. In this tutorial, we will:
  • Parse this PDF: MRI Report
  • Extract these fields: Exam Date and Procedure

1. Download the Document to Process

Download the MRI Report and save it to a local directory.

2. Create Parse Job & Get Job ID

Create the Script

Copy the script below and save it as create-parse-job.py in the same directory as the PDF.
import requests

headers = {
    'Authorization': 'Bearer YOUR_API_KEY'
}

url = 'https://api.va.landing.ai/v1/ade/parse/jobs'

# Upload a document; the with block closes the file after the request
data = {'model': 'dpt-2-latest'}
with open('mri-report.pdf', 'rb') as document:
    files = {'document': document}
    response = requests.post(url, files=files, data=data, headers=headers)

print(response.json())

Run the Script

Run the script from the same directory:
python create-parse-job.py
This returns the job_id:
{'job_id': 'cmfx34ewm0000hyoqkh9dzd8n'}

3. Use job_id to Get Parsing Results

Create the Script

Copy the script below and save it as get-parse-results.py in the same directory as the PDF. Replace YOUR_JOB_ID with the job_id from the previous step.
import requests

headers = {
    'Authorization': 'Bearer YOUR_API_KEY'
}

# Job ID returned by create-parse-job.py
job_id = 'YOUR_JOB_ID'
url = f'https://api.va.landing.ai/v1/ade/parse/jobs/{job_id}'

response = requests.get(url, headers=headers)
response_data = response.json()

# Print the full response
print(response_data)

# Check if job is completed
if response_data.get('status') == 'completed':
    # Check if markdown content is available in data
    # (data is None when the output was saved to a URL)
    data = response_data.get('data') or {}
    if data.get('markdown'):
        markdown_content = data['markdown']
        
        # Save markdown content to file
        with open('markdown-mri-report.md', 'w', encoding='utf-8') as f:
            f.write(markdown_content)
        
        print("\nMarkdown content saved to a Markdown file.")
    
    # Check if output_url is available instead
    elif response_data.get('output_url'):
        print("Use the Markdown file specified in `output_url`.")
    
    else:
        print("No Markdown content or `output_url` found in the completed job response.")
else:
    print(f"\nJob status: {response_data.get('status', 'unknown')}.")

Run the Script

Run the script from the same directory:
python get-parse-results.py
When parsing is complete, the script saves the output to markdown-mri-report.md. You will pass this file to the Extract API in the next step.
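For larger documents, the job may not be finished the first time you check. The sketch below polls until the status is completed. `wait_for_job` is a hypothetical helper, and the interval and attempt limits are arbitrary defaults. Only the completed status appears on this page, so the sketch does not try to detect failed jobs and simply gives up after the last attempt.

```python
import time


def wait_for_job(fetch_status, interval_s=5.0, max_attempts=60, sleep=time.sleep):
    """Call fetch_status() until it reports 'completed' or attempts run out.

    fetch_status is any zero-argument callable that returns the parsed JSON
    response of the ADE Get Parse Jobs API as a dict.
    """
    for attempt in range(max_attempts):
        response_data = fetch_status()
        if response_data.get('status') == 'completed':
            return response_data
        # Wait before the next check, but not after the final attempt.
        if attempt < max_attempts - 1:
            sleep(interval_s)
    raise TimeoutError(f'Job not completed after {max_attempts} attempts.')
```

With the variables from get-parse-results.py, a call might look like `wait_for_job(lambda: requests.get(url, headers=headers).json())`.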

4. Extract Fields from Markdown

Now that we have the parsed output in a Markdown file, we’re ready to extract these fields: Exam Date and Procedure.

Create the Script

Copy the script below and save it as extract-mri-report.py in the same directory as the Markdown file.
import requests
import json

headers = {
    'Authorization': 'Bearer YOUR_API_KEY'
}

url = 'https://api.va.landing.ai/v1/ade/extract'

# Define the extraction schema
schema = json.dumps({
    "type": "object",
    "properties": {
        "exam_date": {
            "description": "The date on which the medical examination or procedure was performed.",
            "format": "YYYY-MM-DD",
            "x-alternativeNames": [
                "Exam Date",
                "Date of Exam",
                "Examination Date"
            ],
            "type": "string"
        },
        "procedure": {
            "description": "The specific medical procedure or examination that was conducted, such as an MRI or X-ray.",
            "x-alternativeNames": [
                "Procedure",
                "Medical Procedure",
                "Performed Procedure"
            ],
            "type": "string"
        }
    }
})

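# Prepare the schema and model, then send the Markdown file.
# The with block closes the file after the request completes.
data = {'schema': schema, 'model': 'extract-latest'}
with open('markdown-mri-report.md', 'rb') as markdown_file:
    files = {'markdown': markdown_file}
    response = requests.post(url, files=files, data=data, headers=headers)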

# Save the results to a JSON file
with open('mri-report_extract_output.json', 'w') as f:
    json.dump(response.json(), f, indent=2)

Run the Script

Run the script from the same directory:
python extract-mri-report.py

View the Output

The results are saved to mri-report_extract_output.json. The file includes the extracted fields and metadata:
{
  "extraction": {
    "exam_date": "2010-05-20",
    "procedure": "MRI OF THE LUMBAR SPINE WITH AND WITHOUT CONTRAST"
  },
  "extraction_metadata": {
    "exam_date": {
      "references": [
        "b5183837-035c-4a54-b324-a4c9e8a68027",
        "1-3"
      ],
      "value": "2010-05-20"
    },
    "procedure": {
      "references": [
        "56340fbe-8a51-46a0-a309-d91abd9b8b00"
      ],
      "value": "MRI OF THE LUMBAR SPINE WITH AND WITHOUT CONTRAST"
    }
  },
  "metadata": {
    "filename": "markdown-mri-report.md",
    "org_id": null,
    "duration_ms": 15080,
    "credit_usage": 1.3256,
    "job_id": "dfef14d3b66045b8abaf39788b9d17e8",
    "version": "extract-20260314",
    "schema_violation_error": null,
    "fallback_model_version": null,
    "warnings": []
  }
}
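To work with this output in code, you can pair each extracted value with its references. The following is a minimal sketch based only on the structure of the sample output above. `summarize_extraction` is a hypothetical helper, and other responses may contain fields that this sample does not show.

```python
def summarize_extraction(result: dict) -> dict:
    """Pair each extracted value with the references reported for it."""
    extraction = result.get('extraction', {})
    metadata = result.get('extraction_metadata', {})
    return {
        field: {
            'value': value,
            # Fields without metadata get an empty reference list.
            'references': metadata.get(field, {}).get('references', []),
        }
        for field, value in extraction.items()
    }
```

To use it, load mri-report_extract_output.json with `json.load` and pass the resulting dictionary to `summarize_extraction`.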

Run Parse Jobs with Our Libraries

Click one of the tiles below to learn how to run Parse Jobs with our libraries.

Python Library

Run Parse Jobs with our Python library.

TypeScript Library

Run Parse Jobs with our TypeScript library.