
# Parse Large Files (Parse Jobs)

> Parse large documents asynchronously with Parse Jobs, and monitor job status and results.

export const adePythonLibrary = 'ade-python';

export const dpt2 = 'DPT-2';

export const dpt1 = 'DPT-1';

export const dpt = 'Document Pre-Trained Transformer';

export const companyName = 'LandingAI';

export const extract = 'ADE Extract';

export const parse = 'ADE Parse';

export const ade = 'Agentic Document Extraction';

The [ADE Parse Jobs](https://docs.landing.ai/api-reference/tools/ade-parse-jobs) API enables you to parse large documents that exceed the size limits of the standard ADE Parse API.

## Monitor Parse Jobs

You can monitor parse jobs with these APIs:

* [ADE Get Parse Jobs](https://docs.landing.ai/api-reference/tools/ade-get-parse-jobs): Get the status and results of a specific parse job.
* [ADE List Parse Jobs](https://docs.landing.ai/api-reference/tools/ade-list-parse-jobs): List all parse jobs associated with your API key.
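To call these endpoints directly, here is a minimal sketch that uses only the Python standard library. The job URL matches the one used in the tutorial on this page. The list URL (a `GET` on `/v1/ade/parse/jobs` with no job ID) and the helper names are assumptions, so confirm the paths and any query parameters on the API reference pages.

```python
import json
import urllib.request
from typing import Optional

# The job URL format is taken from this page.
# ASSUMPTION: listing jobs is a GET on the same path without a job ID.
BASE_URL = "https://api.va.landing.ai/v1/ade/parse/jobs"


def build_request(api_key: str, job_id: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request for one job (job_id given) or for the job list (job_id omitted)."""
    url = f"{BASE_URL}/{job_id}" if job_id else BASE_URL
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Usage (requires a valid API key and network access):
# job = send(build_request("YOUR_API_KEY", "YOUR_JOB_ID"))
# jobs = send(build_request("YOUR_API_KEY"))
```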

## Rate Limits for ADE Parse Jobs

The following table shows the file size and page limits for the ADE Parse Jobs API.

| Maximum File Size | Maximum Pages |
| ----------------- | ------------- |
| 1 GB              | 6,000 pages   |

To see the rate limits for all APIs, go to [Rate Limits](./ade-rate-limits).
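Before submitting a large file, you can check it against these limits locally. This is a minimal sketch, assuming that "1 GB" means 10^9 bytes (the API may count it as 2^30 bytes) and that you obtain the page count with your own tooling, such as a PDF library. The function name is illustrative.

```python
from typing import List

# Limits from the table above.
# ASSUMPTION: "1 GB" is treated as 10**9 bytes; if the API counts 2**30 bytes,
# this check is slightly stricter than necessary.
MAX_FILE_BYTES = 10**9
MAX_PAGES = 6_000


def check_parse_job_limits(size_bytes: int, page_count: int) -> List[str]:
    """Return a list of limit violations; an empty list means the file is within limits."""
    problems = []
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file is {size_bytes:,} bytes; the limit is {MAX_FILE_BYTES:,} bytes")
    if page_count > MAX_PAGES:
        problems.append(f"document has {page_count:,} pages; the limit is {MAX_PAGES:,} pages")
    return problems
```

For example, `check_parse_job_limits(os.path.getsize('mri-report.pdf'), page_count=2)` returns an empty list for the 2-page PDF used in the tutorial on this page.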

## API Reference

To learn more, go to the reference pages for the Parse Jobs APIs:

* [ADE Parse Jobs](https://docs.landing.ai/api-reference/tools/ade-parse-jobs)
* [ADE Get Parse Jobs](https://docs.landing.ai/api-reference/tools/ade-get-parse-jobs)
* [ADE List Parse Jobs](https://docs.landing.ai/api-reference/tools/ade-list-parse-jobs)

## Save Parsed Output to a URL

When calling the [ADE Parse Jobs](https://docs.landing.ai/api-reference/tools/ade-parse-jobs) endpoint, you can use the `output_save_url` parameter to save the parsed Markdown to a specified URL instead of returning it in the API response. This is useful for managing large documents, integrating with your existing storage workflow, or complying with data retention policies.

### When Parsed Output Is Saved to a URL

The parsed Markdown is saved to a URL in these scenarios:

* **You specify the `output_save_url` parameter**: The Markdown is saved to your specified URL.
* **The parsed Markdown exceeds 1 MB**: The Markdown is automatically saved to a presigned S3 URL generated by {ade}. The URL expires after 1 hour, but each time you call the [ADE Get Parse Jobs](https://docs.landing.ai/api-reference/tools/ade-get-parse-jobs) API, a new presigned URL is generated.

Behavior when the output is saved to a URL:

* The `output_url` field in the API response contains the URL where the Markdown is stored.
* The `data` field in the API response is `None`.
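Because `data` can be empty, code that reads parse results should check both fields. The sketch below assumes the Get Parse Jobs response has already been decoded into a Python dictionary; the function name and the `"none"` fallback are illustrative, not part of the API.

```python
from typing import Optional, Tuple


def get_parsed_output(job: dict) -> Tuple[str, Optional[str]]:
    """Return ("markdown", text) for inline output, ("url", url) for output saved to a URL.

    Field names follow this page: data.markdown for inline output, output_url otherwise.
    The ("none", None) result is an illustrative fallback, not an API value.
    """
    data = job.get("data")  # None when the output was saved to a URL
    if data and data.get("markdown"):
        return "markdown", data["markdown"]
    if job.get("output_url"):
        return "url", job["output_url"]
    return "none", None
```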

### URL Requirements

If you specify the `output_save_url` parameter, your URL must meet these requirements:

* The URL must be a presigned URL that grants write (PUT) access to a single object.
* These storage provider methods have been tested:
  * [Amazon S3 presigned URL](https://docs.aws.amazon.com/AmazonS3/latest/userguide/PresignedUrlUploadObject.html)
  * [Azure Blob Storage shared access signature (SAS)](https://learn.microsoft.com/en-us/azure/storage/common/storage-sas-overview)
  * [Google Cloud Storage signed URL](https://docs.cloud.google.com/storage/docs/access-control/signed-urls)
* Other providers that support presigned URLs may also work, but have not been tested.
* To maintain security, do not use a publicly accessible URL or expose a storage bucket.
* The API cannot access private URLs, such as folders in Google Drive.
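Before sending a URL as `output_save_url`, you may want a quick local sanity check. The sketch below is a heuristic of our own, not the validation the API performs: it accepts only HTTPS URLs (a conservative choice) whose query string contains a signature parameter used by Amazon S3, Azure SAS, or Google Cloud Storage. It cannot confirm that the URL actually grants write access.

```python
from urllib.parse import parse_qs, urlparse

# Query parameters that carry the signature in common presigned-URL schemes:
# S3 SigV4 (X-Amz-Signature), GCS V4 (X-Goog-Signature),
# Azure SAS (sig), and legacy S3 / GCS V2 signing (Signature).
SIGNATURE_PARAMS = {"X-Amz-Signature", "X-Goog-Signature", "sig", "Signature"}


def looks_presigned(url: str) -> bool:
    """Heuristic: HTTPS URL whose query string includes a known signature parameter."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        return False
    params = parse_qs(parsed.query)
    return any(name in params for name in SIGNATURE_PARAMS)
```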

### Example: Use Amazon S3 Presigned URLs

If you use Amazon S3, you can generate a presigned URL and provide it as the `output_save_url` value. Presigned URLs grant temporary access to a single object in your S3 bucket without requiring authentication in the API request.

For more information about presigned URLs with Amazon S3, go to the [Amazon documentation](https://docs.aws.amazon.com/AmazonS3/latest/userguide/PresignedUrlUploadObject.html).

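Here is a sample script that uses `boto3` to create a presigned URL and then uses it for a parse job. It requires AWS credentials configured for `boto3`. Replace `YOUR_BUCKET`, the object key, and the `document_url` value with your own.

```python theme={null}
import boto3
import requests

headers = {
    'Authorization': 'Bearer YOUR_API_KEY'
}

url = 'https://api.va.landing.ai/v1/ade/parse/jobs'

# Create a presigned URL that allows a single PUT to your bucket.
s3 = boto3.client('s3')
output_save_url = s3.generate_presigned_url(
    'put_object',
    Params={'Bucket': 'YOUR_BUCKET', 'Key': 'parsed/output.md'},
    ExpiresIn=3600,  # seconds; allow enough time for the job to finish and upload
)

# Send the fields as multipart/form-data.
# A (None, value) tuple sends a plain form field instead of a file upload.
files = {
    'document_url': (None, 'https://...'),
    'output_save_url': (None, output_save_url),
}

response = requests.post(url, files=files, headers=headers)
print(response.json())
```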

## ZDR Requirements

When [zero data retention](./zdr) (ZDR) is enabled, you must configure the following parameters to ensure that {ade} does not retain the document content:

* **Pass your document as a presigned read URL in the `document_url` parameter**. You cannot use the `document` parameter with ZDR enabled.
* **Include the `output_save_url` parameter**. This ensures that the parsed content is saved to your specified URL instead of being returned in the API response. To learn how to configure this parameter, go to [Save Parsed Output to a URL](#save-parsed-output-to-a-url).
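One way to enforce these two rules in client code is to build the request fields through a small helper. This is a hypothetical sketch; only the `document_url` and `output_save_url` field names come from this page.

```python
def build_zdr_form(document_url: str, output_save_url: str) -> dict:
    """Hypothetical helper: form fields for a Parse Jobs request under ZDR."""
    if not document_url:
        raise ValueError("ZDR requires a presigned read URL in document_url")
    if not output_save_url:
        raise ValueError("ZDR requires output_save_url")
    # The document field is never set: ZDR requires document_url instead.
    return {"document_url": document_url, "output_save_url": output_save_url}
```

To send the fields as multipart form data with `requests`, you could pass `files={k: (None, v) for k, v in form.items()}` to `requests.post` along with your `Authorization` header.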

## Workflow Overview

1. Parse a document with the [ADE Parse Jobs](https://docs.landing.ai/api-reference/tools/ade-parse-jobs) API.
2. Copy the `job_id` from the API response.
3. To get results from the parsing job, call the [ADE Get Parse Jobs](https://docs.landing.ai/api-reference/tools/ade-get-parse-jobs) API with the `job_id`.
4. The parsed content is returned as Markdown in `data.markdown`, or as a URL in `output_url` (in which case `data` is `None`). For more information, go to [Save Parsed Output to a URL](#save-parsed-output-to-a-url).
5. If you need to extract fields:
   1. Create an extraction schema.
   2. Send the Markdown to the [{extract}](https://docs.landing.ai/api-reference/tools/ade-extract) API.

## End-to-End Workflow: Parse and Extract the Output

This tutorial shows how to parse a document with the ADE Parse Jobs API and then extract a subset of fields from it with the {extract} API.

For simplicity, this example uses a 2-page PDF. In your own use case, you would typically use Parse Jobs for much larger documents.

We provide a separate script for each endpoint, so you can choose to skip the extraction steps if you don't need them.

In this tutorial, we will:

* Parse this PDF: <a href="/examples/mri-report.pdf" download="mri-report.pdf">MRI Report</a>
* Extract these fields: **Exam Date** and **Procedure**

### 1. Download the Document to Process

Download the <a href="/examples/mri-report.pdf" download="mri-report.pdf">MRI Report</a> and save it to a local directory.

### 2. Create Parse Job & Get Job ID

#### Create the Script

Copy the script below and save it as `create-parse-job.py` in the same directory as the PDF.

```python [expandable] theme={null}
import requests

headers = {
    'Authorization': 'Bearer YOUR_API_KEY'
}

url = 'https://api.va.landing.ai/v1/ade/parse/jobs'

# Upload the local PDF; the with block closes the file after the request
with open('mri-report.pdf', 'rb') as document:
    files = {'document': document}
    data = {'model': 'dpt-2-latest'}
    response = requests.post(url, files=files, data=data, headers=headers)

print(response.json())
```

#### Run the Script

Run the script from the same directory:

```bash theme={null}
python create-parse-job.py
```

The API response contains the `job_id`:

```json theme={null}
{"job_id": "cmfx34ewm0000hyoqkh9dzd8n"}
```

### 3. Use job\_id to Get Parsing Results

#### Create the Script

Copy the script below and save it as `get-parse-results.py` in the same directory as the PDF. Replace `YOUR_JOB_ID` with the `job_id` from the previous step.

```python [expandable] theme={null}
import requests

headers = {
    'Authorization': 'Bearer YOUR_API_KEY'
}
job_id = 'YOUR_JOB_ID'  # the job_id returned in the previous step
url = f'https://api.va.landing.ai/v1/ade/parse/jobs/{job_id}'

response = requests.get(url, headers=headers)
response_data = response.json()

# Print the full response
print(response_data)

# Check if job is completed
if response_data.get('status') == 'completed':
    # data is None when the output was saved to a URL, so fall back to {}
    data = response_data.get('data') or {}

    if data.get('markdown'):
        # Save markdown content to file
        with open('markdown-mri-report.md', 'w', encoding='utf-8') as f:
            f.write(data['markdown'])

        print("\nMarkdown content saved to markdown-mri-report.md.")

    # Otherwise, the output was saved to a URL
    elif response_data.get('output_url'):
        print(f"\nMarkdown saved to: {response_data['output_url']}")

    else:
        print("No Markdown content or `output_url` found in the completed job response.")
else:
    print(f"\nJob status: {response_data.get('status', 'unknown')}. Try again later.")
```

#### Run the Script

Run the script from the same directory:

```bash theme={null}
python get-parse-results.py
```

When parsing is complete, the script saves the output to `markdown-mri-report.md`. You will pass this file to the Extract API in the next step.
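Parse jobs run asynchronously, so a large document may still be processing when you run `get-parse-results.py`. Instead of rerunning the script by hand, you could poll the job. In this minimal sketch, `fetch_job` is any function that returns the decoded Get Parse Jobs response. `completed` is the only status value this page confirms, so add any failure status listed in the API reference to `done_statuses` to stop the loop early. The interval and timeout defaults are arbitrary.

```python
import time
from typing import Callable, Collection


def wait_for_job(
    fetch_job: Callable[[], dict],
    done_statuses: Collection[str] = ("completed",),  # add failure statuses from the API reference
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
) -> dict:
    """Call fetch_job() until its "status" is in done_statuses or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_job()
        if job.get("status") in done_statuses:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {job.get('status')!r} after {timeout_s} s")
        time.sleep(interval_s)
```

With `requests`, for example, you could call `wait_for_job(lambda: requests.get(url, headers=headers).json())` using the `url` and `headers` from `get-parse-results.py`.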

### 4. Extract Fields from Markdown

Now that we have the parsed output in a Markdown file, we're ready to extract these fields: **Exam Date** and **Procedure**.

#### Create the Script

Copy the script below and save it as `extract-mri-report.py` in the same directory as the Markdown file.

```python [expandable] theme={null}
import requests
import json

headers = {
    'Authorization': 'Bearer YOUR_API_KEY'
}

url = 'https://api.va.landing.ai/v1/ade/extract'

# Define the extraction schema
schema = json.dumps({
    "type": "object",
    "properties": {
        "exam_date": {
            "description": "The date on which the medical examination or procedure was performed.",
            "format": "YYYY-MM-DD",
            "x-alternativeNames": [
                "Exam Date",
                "Date of Exam",
                "Examination Date"
            ],
            "type": "string"
        },
        "procedure": {
            "description": "The specific medical procedure or examination that was conducted, such as an MRI or X-ray.",
            "x-alternativeNames": [
                "Procedure",
                "Medical Procedure",
                "Performed Procedure"
            ],
            "type": "string"
        }
    }
})

# Send the Markdown file and schema; the with block closes the file afterward
with open('markdown-mri-report.md', 'rb') as markdown_file:
    files = {'markdown': markdown_file}
    data = {'schema': schema, 'model': 'extract-latest'}

    # Run extraction
    response = requests.post(url, files=files, data=data, headers=headers)

# Save the results to a JSON file
with open('mri-report_extract_output.json', 'w') as f:
    json.dump(response.json(), f, indent=2)
```

#### Run the Script

Run the script from the same directory:

```bash theme={null}
python extract-mri-report.py
```

#### View the Output

The results are saved to `mri-report_extract_output.json`. The file includes the extracted fields and metadata:

```json [expandable] theme={null}
{
  "extraction": {
    "exam_date": "2010-05-20",
    "procedure": "MRI OF THE LUMBAR SPINE WITH AND WITHOUT CONTRAST"
  },
  "extraction_metadata": {
    "exam_date": {
      "references": [
        "b5183837-035c-4a54-b324-a4c9e8a68027",
        "1-3"
      ],
      "value": "2010-05-20"
    },
    "procedure": {
      "references": [
        "56340fbe-8a51-46a0-a309-d91abd9b8b00"
      ],
      "value": "MRI OF THE LUMBAR SPINE WITH AND WITHOUT CONTRAST"
    }
  },
  "metadata": {
    "filename": "markdown-mri-report.md",
    "org_id": null,
    "duration_ms": 15080,
    "credit_usage": 1.3256,
    "job_id": "dfef14d3b66045b8abaf39788b9d17e8",
    "version": "extract-20260314",
    "schema_violation_error": null,
    "fallback_model_version": null,
    "warnings": []
  }
}
```
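To review the results programmatically, you could pair each extracted value with the `references` recorded in `extraction_metadata` and check `metadata.schema_violation_error`. This is a minimal sketch, assuming the JSON file has the structure shown above. The function names are illustrative, and the code reports the references without interpreting them.

```python
from typing import Any, Dict, List, Optional, Tuple


def summarize_extraction(result: Dict[str, Any]) -> List[Tuple[str, Any, List[str]]]:
    """Return (field, value, references) for each field in result["extraction"]."""
    metadata = result.get("extraction_metadata") or {}
    rows = []
    for field, value in (result.get("extraction") or {}).items():
        references = (metadata.get(field) or {}).get("references", [])
        rows.append((field, value, references))
    return rows


def schema_violation(result: Dict[str, Any]) -> Optional[Any]:
    """Return metadata.schema_violation_error, which is null in the sample output above."""
    return (result.get("metadata") or {}).get("schema_violation_error")
```

For example, load the file with `json.load` and loop over `summarize_extraction(result)` to print each field, its value, and its references.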

## Run Parse Jobs with Our Libraries

Click one of the tiles below to learn how to run Parse Jobs with our libraries.

<CardGroup cols={2}>
  <Card title="Python Library" icon="python" href="./ade-python#parse-jobs">
    Run Parse Jobs with our Python library.
  </Card>

  <Card
    title="TypeScript Library"
    icon={
  <>
    <img className="block dark:hidden" src="https://mintcdn.com/landingaitest/zBBru77Y9z-WA7GH/images/ts-logo-512-03221Dxcf.png?fit=max&auto=format&n=zBBru77Y9z-WA7GH&q=85&s=e41836c55303fe6a82d458a5bdaa2991" alt="TypeScript" style={{ width: "1.5rem", height: "1.5rem", margin: 0 }} />
    <img className="hidden dark:block" src="https://mintcdn.com/landingaitest/zBBru77Y9z-WA7GH/images/ts-logo-512-DBFF9B.png?fit=max&auto=format&n=zBBru77Y9z-WA7GH&q=85&s=27ddad7064b681cea3f5da0aea4c6be6" alt="TypeScript" style={{ width: "1.5rem", height: "1.5rem", margin: 0 }} />
  </>
}
    href="https://docs.landing.ai/ade/ade-typescript#parse-jobs"
  >
    Run Parse Jobs with our TypeScript library.
  </Card>
</CardGroup>