The ADE Extract Jobs API enables you to extract structured data from large Markdown documents asynchronously. Instead of waiting for a single request to finish, you create a job, receive a job_id immediately, and retrieve the results when the job is complete. Use Extract Jobs for long-running extractions, such as when extracting from long documents or when using large, complex schemas.

Run Parse First

Extract Jobs runs on the Markdown output created by Parse, which is required as the first step in all ADE workflows. For large documents, use Parse Jobs to generate the Markdown, then pass that Markdown to Extract Jobs.

Monitor Extract Jobs

You can monitor extract jobs with these APIs:

Rate Limits for ADE Extract Jobs

Extract Jobs have their own per-hour rate limit, separate from the other ADE APIs. Each extract job counts as a single submission (one page equivalent) toward this limit, regardless of the size of the Markdown document. To see the rate limits for all APIs, go to Rate Limits.
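Because every extract job counts as one submission regardless of Markdown size, a client that submits many jobs can track its own submission count before calling the API. The sketch below is a sliding-window counter under stated assumptions: HourlySubmissionLimiter and hourly_limit are hypothetical names, the actual limit value must come from the Rate Limits page, and the service may define its hour differently (for example, as a fixed window). Treat it as a conservative client-side guard, not a mirror of the server's logic.

```python
import time
from collections import deque


class HourlySubmissionLimiter:
    """Hypothetical client-side guard for a per-hour job submission limit.

    hourly_limit is a placeholder value; look up the real Extract Jobs limit
    on the Rate Limits page. Every job counts as one submission,
    no matter how large its Markdown is.
    """

    def __init__(self, hourly_limit, clock=time.monotonic):
        self.hourly_limit = hourly_limit
        self.clock = clock
        self.window = deque()  # timestamps of submissions made in the last hour

    def _drop_expired(self, now):
        # Forget submissions that are at least one hour (3600 seconds) old
        while self.window and now - self.window[0] >= 3600:
            self.window.popleft()

    def try_acquire(self):
        """Record one submission and return True if it fits under the limit."""
        now = self.clock()
        self._drop_expired(now)
        if len(self.window) >= self.hourly_limit:
            return False
        self.window.append(now)
        return True
```

Call try_acquire() before each request that creates an extract job, and queue or delay the job when it returns False. The API's own responses remain the authoritative signal for whether a submission was accepted.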

API Reference

To learn more, go to the reference pages for the Extract Jobs APIs:
For information about pricing and credits, go to Pricing & Billing.

Workflow Overview

  1. Create an extraction schema. For schema requirements, see JSON Schema for Extraction.
  2. Submit the Markdown and schema to the ADE Extract Jobs API.
  3. Get the job_id from the API response.
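  4. Poll the ADE Get Extract Jobs API with the job_id until the status is completed. If the status becomes failed or cancelled instead, stop polling and check the failure_reason field.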
  5. Read the extracted fields from the completed job response. For the field structure, see JSON Response for Extraction.
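Step 4 can be sketched as a polling function with a timeout and capped exponential backoff, which avoids polling forever. This is a minimal sketch, assuming you supply fetch_job, a hypothetical callable that calls the ADE Get Extract Jobs API for a job_id and returns the parsed JSON response as a dict. The names wait_for_job and the default delay and timeout values are illustrative choices, not values recommended by the API.

```python
import time

# Statuses after which polling stops (see Job Statuses)
FINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_for_job(fetch_job, job_id, timeout_s=600.0, initial_delay_s=2.0,
                 max_delay_s=30.0, sleep=time.sleep, clock=time.monotonic):
    """Poll until the job reaches a final status, then return the job response.

    fetch_job(job_id) is a hypothetical callable you provide; it should call the
    ADE Get Extract Jobs API and return the response body as a dict.
    """
    deadline = clock() + timeout_s
    delay = initial_delay_s
    while True:
        job = fetch_job(job_id)
        if job.get("status") in FINAL_STATUSES:
            return job
        # Give up instead of sleeping past the deadline
        if clock() + delay > deadline:
            raise TimeoutError(f"Job {job_id} did not finish within {timeout_s} seconds")
        sleep(delay)
        delay = min(delay * 2, max_delay_s)  # exponential backoff, capped at max_delay_s
```

Injecting sleep and clock keeps the function testable without real waiting; in a script you can rely on the defaults.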

Job Statuses

The ADE Get Extract Jobs API returns the current status of a job:
| Status | Description |
| --- | --- |
| pending | The job is queued and has not started. |
| processing | The job is running. The progress field stays 0.0 until the job is completed. |
| completed | The job finished. The extracted fields are in the data field or, for results larger than 1 MB and for jobs with an output_save_url, at the URL in the output_url field. |
| failed | The job did not finish. See the failure_reason field for details. |
| cancelled | The job was cancelled. |
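If your client branches on job status in several places, modeling the statuses as an enum catches typos and unexpected values early. This is a client-side sketch; ExtractJobStatus and is_final are hypothetical names, not part of any ADE library. It treats completed, failed, and cancelled as the statuses at which polling stops, matching the tutorial script on this page.

```python
from enum import Enum


class ExtractJobStatus(str, Enum):
    """Statuses from the Job Statuses table (hypothetical client-side model)."""

    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"

    @property
    def is_final(self):
        # Polling can stop once the job reaches one of these statuses
        return self in (ExtractJobStatus.COMPLETED,
                        ExtractJobStatus.FAILED,
                        ExtractJobStatus.CANCELLED)
```

Because the enum mixes in str, members compare equal to the raw strings in the API response, and ExtractJobStatus(value) raises ValueError for a status the client does not recognize.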

Extract Job Response

When the ADE Get Extract Jobs API returns a job, the response wraps the extraction results with job-level fields:
| Field | Description |
| --- | --- |
| job_id | The unique identifier for the job. |
| status | The current state of the job. See Job Statuses. |
| received_at | Unix timestamp (in seconds) for when the job was received. |
| created_at | Unix timestamp (in seconds) for when the job was created. |
| progress | Either 0.0 (not yet complete) or 1.0 (complete). |
| org_id | The organization ID associated with the job. |
| version | The model snapshot used for the extraction. |
| data | The extraction results, returned when the job is complete and you did not set an output_save_url. This object follows the same structure as the standard API response, including its metadata object (with schema_violation_error, warnings, and fallback_model_version). See JSON Response for Extraction. |
| output_url | A URL to download the extraction results, returned when the result is larger than 1 MB or you set an output_save_url. When output_url is present, data is null. |
| metadata | Job-level metadata summarizing the job, such as filename, duration_ms, credit_usage, and version. |
| failure_reason | If the job failed, a message describing what went wrong. Otherwise, null. |
This example shows the structure of a completed job response, with the extraction abbreviated:
```json
{
  "job_id": "cmf8x2k9p0001abcd1234efgh",
  "status": "completed",
  "received_at": 1781819747,
  "created_at": 1781819747,
  "progress": 1.0,
  "org_id": "a1b2c3d4e5f6",
  "version": "extract-20260314",
  "data": {
    "extraction": { "exam_date": "2010-05-20", "procedure": "MRI OF THE LUMBAR SPINE WITH AND WITHOUT CONTRAST" },
    "extraction_metadata": { "exam_date": { "references": ["93996806-781d-4404-bfa4-f6e49323a227"], "value": "2010-05-20" } },
    "metadata": { "filename": "markdown-mri-report.md", "duration_ms": 6807, "credit_usage": 1.4, "version": "extract-20260314", "schema_violation_error": null, "warnings": [] }
  },
  "output_url": null,
  "metadata": { "filename": "markdown-mri-report.md", "duration_ms": 8100, "credit_usage": 1.4, "version": "extract-20260314" },
  "failure_reason": null
}
```
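Note that a completed response carries two metadata objects: the job-level metadata at the top level and the extraction's own metadata inside data. Because data and output_url are mutually exclusive, client code has to check which one carries the results. The helper below is a hypothetical sketch (read_job_result is not part of any ADE library) that applies the field rules from the table and raises for jobs that failed, were cancelled, or have not finished.

```python
def read_job_result(job):
    """Return ("data", dict) or ("output_url", str) for a completed job response.

    Hypothetical helper based on the Extract Job Response fields:
    when output_url is present, data is null.
    """
    status = job.get("status")
    if status in ("failed", "cancelled"):
        raise RuntimeError(f"Job {job.get('job_id')} {status}: {job.get('failure_reason')}")
    if status != "completed":
        raise ValueError(f"Job {job.get('job_id')} has not finished (status: {status})")
    if job.get("output_url"):
        # Results larger than 1 MB, or jobs with an output_save_url
        return "output_url", job["output_url"]
    # Results returned inline
    return "data", job["data"]
```

For the example response, read_job_result returns "data" together with the inline object, so the extracted exam date is at payload["extraction"]["exam_date"].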

ZDR Requirements

When zero data retention (ZDR) is enabled, you must configure the following parameters so that ADE does not store your content:
  • Pass your Markdown in the markdown_url parameter. You cannot upload a local file with the markdown parameter when ZDR is enabled.
  • Include the output_save_url parameter. This saves the extracted content to your specified URL instead of returning it in the API response.
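Both ZDR requirements can be checked before a request is sent. This sketch assembles the form fields for a ZDR-compliant extract job and refuses to build a request that is missing either URL. build_zdr_extract_form is a hypothetical helper, the URLs in the test are placeholders, and the accepted formats for markdown_url and output_save_url are not covered on this page. The form deliberately has no markdown field, because local uploads are not allowed under ZDR.

```python
import json


def build_zdr_extract_form(markdown_url, output_save_url, schema, model="extract-latest"):
    """Assemble form fields for an Extract Jobs request that meets the ZDR requirements.

    Hypothetical helper: it only builds the fields named on this page
    and does not send anything.
    """
    if not markdown_url:
        raise ValueError("ZDR requires markdown_url; uploading a local file with markdown is not allowed")
    if not output_save_url:
        raise ValueError("ZDR requires output_save_url so results are not returned in the response")
    return {
        "markdown_url": markdown_url,        # Markdown is fetched from this URL
        "output_save_url": output_save_url,  # results are saved here instead of returned inline
        "schema": json.dumps(schema),        # the schema is sent as a JSON string
        "model": model,
    }
```

To send these fields as multipart form data with requests, as the tutorial script does, you can pass each value as a part without a filename, for example requests.post(extract_url, files={k: (None, v) for k, v in form.items()}, headers=headers), where extract_url and headers are the names used in the tutorial script.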

End-to-End Workflow: Parse a Document and Extract Fields

This tutorial walks you through how to parse a document into Markdown, create an extract job from that Markdown, and retrieve the extracted fields. The script runs all three steps in sequence and polls for the results, so you never copy the job_id by hand. For simplicity, this example uses a short, 2-page PDF and the synchronous Parse API. For large documents, use Parse Jobs to generate the Markdown.

1. Download the Document

Download the sample MRI Report and save it to a local directory.

2. Create the Script

Copy the script below and save it as extract-job.py in the same directory as the PDF.
```python
import requests
import json
import time

headers = {
    # Replace YOUR_API_KEY with your API key
    'Authorization': 'Bearer YOUR_API_KEY'
}

# 1. Parse the document into Markdown
# Replace mri-report.pdf with the path to your document
parse_url = 'https://api.va.landing.ai/v1/ade/parse'
parse_data = {'model': 'dpt-2-latest'}

# Open the PDF in a with block so the file handle is closed after the upload
with open('mri-report.pdf', 'rb') as document:
    parse_response = requests.post(parse_url, files={'document': document}, data=parse_data, headers=headers)
parse_response.raise_for_status()
markdown = parse_response.json()['markdown']

# Save the Markdown so it can be uploaded to the extract job
with open('markdown-mri-report.md', 'w', encoding='utf-8') as f:
    f.write(markdown)

# Define the extraction schema
schema = json.dumps({
    "type": "object",
    "properties": {
        "exam_date": {
            "description": "The date on which the medical examination or procedure was performed.",
            "format": "YYYY-MM-DD",
            "type": "string"
        },
        "procedure": {
            "description": "The specific medical procedure or examination that was conducted, such as an MRI or X-ray.",
            "type": "string"
        }
    }
})

# 2. Create the extract job
extract_url = 'https://api.va.landing.ai/v1/ade/extract/jobs'
extract_data = {'schema': schema, 'model': 'extract-latest'}

# Upload the saved Markdown; the with block closes the file afterward
with open('markdown-mri-report.md', 'rb') as markdown_file:
    create_response = requests.post(extract_url, files={'markdown': markdown_file}, data=extract_data, headers=headers)
create_response.raise_for_status()
job_id = create_response.json()['job_id']
print(f"Created job: {job_id}")

# 3. Poll the job until it finishes, then save the results
# This loop polls until the job reaches a final status. For production use,
# consider adding a timeout and exponential backoff instead of polling forever.
while True:
    job = requests.get(f'{extract_url}/{job_id}', headers=headers)
    job.raise_for_status()
    result = job.json()
    status = result.get('status')
    print(f"Job status: {status}")

    if status == 'completed':
        if result.get('data'):
            # The extracted fields are returned inline in the data field
            with open('extract_output.json', 'w', encoding='utf-8') as f:
                json.dump(result['data'], f, indent=2)
            print("Results saved to extract_output.json.")
        elif result.get('output_url'):
            # Results larger than 1 MB, or jobs with an output_save_url, return a download URL instead of inline data
            print(f"Download the results from: {result['output_url']}")
        else:
            print("The job completed, but the response has neither data nor output_url.")
        break

    if status in ('failed', 'cancelled'):
        print(f"Job {status}: {result.get('failure_reason')}")
        break

    time.sleep(5)  # wait before checking again
```

3. Run the Script

Run the script from the same directory:
```shell
python extract-job.py
```

4. View the Results

When the job status is completed, the script saves the extracted fields to extract_output.json if they are returned inline, or prints the download URL if they are returned through output_url. The completed job returns the results in one of two ways:
| Field | When It’s Returned | Contents |
| --- | --- | --- |
| data | The result is 1 MB or smaller and you did not set an output_save_url. | The extracted fields, returned inline. This field follows the same structure as the standard API response. See JSON Response for Extraction. |
| output_url | The result is larger than 1 MB, or you set an output_save_url. | A temporary URL to download the results. The data field is null, and the URL expires one hour after you request the job. |
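The tutorial script only prints output_url. Because that URL expires one hour after you request the job, you may want to download the results right away. This sketch writes the results to a file from whichever field carries them. save_job_results and fetch_url_bytes are hypothetical helpers, and the sketch assumes that a plain HTTP GET on output_url returns JSON with the same structure as data; if the URL points at storage you configured through output_save_url, it may require credentials this sketch does not handle.

```python
import json
import urllib.request


def fetch_url_bytes(url, timeout=60):
    # Download the raw bytes behind a temporary output_url with a plain GET
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def save_job_results(job, path, download=fetch_url_bytes):
    """Write a completed job's results to path, whichever field carries them."""
    if job.get("data") is not None:
        payload = job["data"]
    elif job.get("output_url"):
        # Assumption: the downloaded file is JSON shaped like the data field
        payload = json.loads(download(job["output_url"]))
    else:
        raise ValueError("Completed job has neither data nor output_url")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f, indent=2)
    return payload
```

Passing the download function as a parameter lets you swap in an authenticated client, or a stub in tests, without changing the logic that chooses between data and output_url.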

Library Support

Extract Jobs is not available in the Python or TypeScript libraries. Call the Extract Jobs APIs directly.