job_id immediately, and retrieve the results when the job is complete.
Use Extract Jobs for long-running extractions, such as when extracting from long documents or when using large, complex schemas.
Run Parse First
Extraction runs on the Markdown output created by Parse, which is required as the first step in all ADE workflows. For large documents, use Parse Jobs to generate the Markdown, then pass that Markdown to Extract Jobs.
Monitor Extract Jobs
You can monitor extract jobs with these APIs:
- ADE Get Extract Jobs: Get the status for a specific extract job.
- ADE List Extract Jobs: List all extract jobs associated with your API key.
Rate Limits for ADE Extract Jobs
Extract Jobs have their own per-hour rate limit, separate from the other ADE APIs. Each extract job counts as a single submission (one page equivalent) toward this limit, regardless of the size of the Markdown document. To see the rate limits for all APIs, go to Rate Limits.
API Reference
To learn more, go to the reference pages for the Extract Jobs APIs.
For information about pricing and credits, go to Pricing & Billing.
Workflow Overview
- Create an extraction schema. For schema requirements, see JSON Schema for Extraction.
- Submit the Markdown and schema to the ADE Extract Jobs API.
- Get the job_id from the API response.
- Poll the ADE Get Extract Jobs API with the job_id until status is completed.
- Read the extracted fields from the completed job response. For the field structure, see JSON Response for Extraction.
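The polling step above can be sketched as a small helper. This is a minimal illustration, not the official client: `wait_for_job` and `fetch_job` are hypothetical names, and `fetch_job` stands in for whatever function you write to call the ADE Get Extract Jobs API and return the job as a dictionary. The interval and timeout defaults are arbitrary assumptions.

```python
import time
from typing import Any, Callable, Dict

# Statuses after which the job will not change again (from the Job Statuses table).
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_for_job(
    fetch_job: Callable[[str], Dict[str, Any]],  # hypothetical: wraps the Get Extract Jobs call
    job_id: str,
    interval_s: float = 5.0,    # assumed polling interval
    timeout_s: float = 600.0,   # assumed overall timeout
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll fetch_job(job_id) until the job reaches a terminal status."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_job(job_id)
        if job["status"] in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"job {job_id} is still {job['status']} after {timeout_s} seconds"
            )
        sleep(interval_s)
```

Passing `fetch_job` and `sleep` as arguments keeps the loop independent of any particular HTTP library and makes it easy to exercise with a fake job source.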
Job Statuses
The ADE Get Extract Jobs API returns the current status of a job:
| Status | Description |
|---|---|
| pending | The job is queued and has not started. |
| processing | The job is running. The progress field stays 0.0 until the job is completed. |
| completed | The job finished. The extracted fields are in the data field, or in the output_url field for large results. |
| failed | The job did not finish. See the failure_reason field for details. |
| cancelled | The job was cancelled. |
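One way to act on these statuses in client code is sketched below. The function name `next_action` and its return values are hypothetical. Only the status strings and the failure_reason field come from the table above. Because progress stays at 0.0 while a job is processing, the sketch branches on status alone.

```python
from typing import Any, Dict


def next_action(job: Dict[str, Any]) -> str:
    """Map a job's status to what the caller should do next (illustrative only)."""
    status = job["status"]
    if status in ("pending", "processing"):
        return "wait"            # not finished yet: poll again later
    if status == "completed":
        return "read_results"    # results are in data or behind output_url
    if status == "failed":
        raise RuntimeError(f"extract job failed: {job.get('failure_reason')}")
    if status == "cancelled":
        raise RuntimeError("extract job was cancelled")
    raise ValueError(f"unexpected status: {status!r}")
```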
Extract Job Response
When the ADE Get Extract Jobs API returns a job, the response wraps the extraction results with job-level fields:
| Field | Description |
|---|---|
| job_id | The unique identifier for the job. |
| status | The current state of the job. See Job Statuses. |
| received_at | Unix timestamp (in seconds) for when the job was received. |
| created_at | Unix timestamp (in seconds) for when the job was created. |
| progress | Either 0.0 (not yet complete) or 1.0 (complete). |
| org_id | The organization ID associated with the job. |
| version | The model snapshot used for the extraction. |
| data | The extraction results, returned when the job is complete, the result is 1 MB or smaller, and you did not set an output_save_url. This object follows the same structure as the standard API response, including the extraction metadata (with schema_violation_error, warnings, and fallback_model_version). See JSON Response for Extraction. |
| output_url | A URL to download the extraction results, returned when the result is larger than 1 MB or you set an output_save_url. When output_url is present, data is null. |
| metadata | Job-level metadata summarizing the job, such as filename, duration_ms, credit_usage, and version. |
| failure_reason | If the job failed, a message describing what went wrong. Otherwise, null. |
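As an illustration of reading these job-level fields, the sketch below condenses a job response (already decoded from JSON into a dictionary) into a short summary. The helper `summarize_job` and the keys of its return value are hypothetical. The input field names are taken from the table above, and the timestamps are converted on the assumption that they are Unix seconds in UTC.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def _to_utc(ts: Optional[float]) -> Optional[datetime]:
    """Convert a Unix timestamp in seconds to a timezone-aware UTC datetime."""
    return None if ts is None else datetime.fromtimestamp(ts, tz=timezone.utc)


def summarize_job(job: Dict[str, Any]) -> Dict[str, Any]:
    """Condense a job response into the fields most callers check first."""
    if job.get("output_url"):
        location = "output_url"   # results must be downloaded
    elif job.get("data") is not None:
        location = "data"         # results are inline
    else:
        location = None           # no results (not complete, failed, or cancelled)
    return {
        "job_id": job["job_id"],
        "status": job["status"],
        "received_at": _to_utc(job.get("received_at")),
        "created_at": _to_utc(job.get("created_at")),
        "is_complete": job.get("progress") == 1.0,
        "results_location": location,
        "failure_reason": job.get("failure_reason"),
    }
```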
ZDR Requirements
When zero data retention (ZDR) is enabled, you must configure the following parameters so that ADE does not store your content:
- Pass your Markdown in the markdown_url parameter. You cannot upload a local file with the markdown parameter when ZDR is enabled.
- Include the output_save_url parameter. This saves the extracted content to your specified URL instead of returning it in the API response.
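A client-side check for these two requirements might look like the sketch below. The function `zdr_problems` is a hypothetical helper, not part of the API. It only inspects the parameter names mentioned above (markdown, markdown_url, output_save_url) in a dictionary of request parameters, before anything is sent.

```python
from typing import Any, Dict, List


def zdr_problems(params: Dict[str, Any]) -> List[str]:
    """Return a list of reasons why params would not satisfy the ZDR requirements."""
    problems = []
    if "markdown" in params:
        problems.append("remove 'markdown': local uploads are not allowed with ZDR")
    if not params.get("markdown_url"):
        problems.append("set 'markdown_url' to the location of your Markdown")
    if not params.get("output_save_url"):
        problems.append("set 'output_save_url' so results are saved to your URL")
    return problems
```

An empty list means the parameters pass both checks. For example, `zdr_problems({"markdown_url": "https://example.com/doc.md"})` reports only the missing output_save_url.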
End-to-End Workflow: Parse a Document and Extract Fields
This tutorial walks you through how to parse a document into Markdown, create an extract job from that Markdown, and retrieve the extracted fields. The script runs all three steps in sequence and polls for the results, so you never copy the job_id by hand.
For simplicity, this example uses a short, 2-page PDF and the synchronous Parse API. For large documents, use Parse Jobs to generate the Markdown.
1. Download the Document
Download the sample MRI Report and save it to a local directory.
2. Create the Script
Copy the script below and save it as extract-job.py in the same directory as the PDF.
3. Run the Script
Run the script from the same directory:
4. View the Results
When the job status is completed, the script saves the extracted fields to extract_output.json. The completed job returns the results in one of two ways:
| Field | When It’s Returned | Contents |
|---|---|---|
| data | The result is 1 MB or smaller and you did not set an output_save_url. | The extracted fields, returned inline. This field follows the same structure as the standard API response. See JSON Response for Extraction. |
| output_url | The result is larger than 1 MB, or you set an output_save_url. | A temporary URL to download the results. The data field is null, and the URL expires one hour after you request the job. |
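The two delivery paths in the table can be handled with one helper, sketched here under assumptions: `read_results` and `_download_json` are hypothetical names, and the sketch assumes that output_url can be fetched with a plain HTTP GET and returns a JSON body. Because the URL is temporary, download the results soon after the job completes.

```python
import json
from typing import Any, Callable, Dict, Optional
from urllib.request import urlopen


def _download_json(url: str) -> Dict[str, Any]:
    """Fetch url and decode the body as JSON (assumes a plain GET is sufficient)."""
    with urlopen(url) as resp:
        return json.load(resp)


def read_results(
    job: Dict[str, Any],
    download: Optional[Callable[[str], Dict[str, Any]]] = None,
) -> Dict[str, Any]:
    """Return the extraction results from a completed job, inline or downloaded."""
    if job.get("status") != "completed":
        raise ValueError(f"job is not completed (status: {job.get('status')!r})")
    if job.get("output_url"):
        # Large result or output_save_url was set: data is null, so download instead.
        fetch = download or _download_json
        return fetch(job["output_url"])
    if job.get("data") is not None:
        return job["data"]       # small result returned inline
    raise ValueError("completed job has neither data nor output_url")
```

The optional `download` argument lets you substitute your own HTTP client, or a stub when testing.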