Overview

Use the ADE Extract API to pull specific data fields from parsed documents. You define a schema that specifies which fields to extract, and Extract returns their values in a structured, predictable format. Extract is designed for high-volume, repeatable workflows: use it when you need to retrieve the same set of fields from many documents, such as invoice totals, contract dates, or form field values. Because extraction is schema-driven, results are consistent across documents with varying layouts.

Extraction Capabilities

The following capabilities are available with model extract-20260314 or later.
  • Unlimited schema size: No limits on the number of fields, nested levels, or characters in a schema.
  • Semantic field matching: Use the x-alternativeNames keyword to define alternative labels for a field. The model maps fields by meaning, so fields with different names across documents resolve to the same schema field.
  • Consistent formatting: Use the format keyword to specify how extracted values should be formatted.
  • Improved handling of large content: Better extraction from large tables, large arrays, and long documents.
  • Cross-page table reconstruction: Tables that span page breaks are returned as a single array, with no post-processing needed.
For schema-building capabilities including master schemas and schema drift detection, see Build Extraction Schemas with the API.
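
As a rough illustration of the x-alternativeNames and format keywords, the sketch below builds a small schema as a Python dict and serializes it to JSON. The field names, the alternative labels, the "date" format value, and the choice to express x-alternativeNames as a list on each property are all assumptions made for illustration; see JSON Schema for Extraction for the shapes the API actually accepts.

```python
import json

# Hypothetical invoice schema: field names and keyword values are illustrative only.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_total": {
            "type": "number",
            "description": "Total amount due on the invoice.",
            # Assumed shape: a list of labels that may appear in source documents.
            "x-alternativeNames": ["Amount Due", "Balance Due", "Total"],
        },
        "invoice_date": {
            "type": "string",
            "description": "Date the invoice was issued.",
            # Assumed format value; check the schema docs for supported formats.
            "format": "date",
            "x-alternativeNames": ["Date of Issue", "Billing Date"],
        },
    },
}

# Serialize the schema so it can be passed as a JSON string.
schema_json = json.dumps(invoice_schema, indent=2)
print(schema_json)
```

With a schema like this, a document that labels the total as "Balance Due" and one that labels it "Amount Due" would both be expected to populate invoice_total.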

Run Parse First

Extract runs after Parse, which is required as the first step in all ADE workflows. It can also follow Split if you're working with multi-document files.

Get Started: Extraction Workflow

You can use the schema extraction wizard directly in our Playground to build and validate an extraction schema. The Playground generates scripts that you can then copy and use in your own code:
  1. Use the schema extraction wizard in our Playground to build a schema tailored to your documents.
  2. Copy the script for the method you plan on using: the library or the API.
  3. Paste the script into your code.

Use the ADE Extract API to Extract Fields from Markdown

Use the ADE Extract API to extract data from the Markdown output created by the ADE Parse API. See the API reference for full details.

Specify Documents to Run Extraction On

The API offers two parameters for specifying the document you want to extract from:
  • markdown: Specify the actual Markdown file you want to run extraction on.
  • markdown_url: Include the URL to the Markdown file you want to run extraction on.
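
The choice between the two parameters can be sketched as a small helper that returns the request field for one document. This is a minimal sketch, not the library's API: the helper name build_document_fields is hypothetical, the example URL is a placeholder, and it assumes that a request should carry exactly one of markdown or markdown_url.

```python
from typing import Optional


def build_document_fields(
    markdown: Optional[str] = None,
    markdown_url: Optional[str] = None,
) -> dict:
    """Return the request field that identifies the Markdown to extract from."""
    # Assumption: a request carries exactly one of the two parameters.
    if (markdown is None) == (markdown_url is None):
        raise ValueError("Provide exactly one of markdown or markdown_url.")
    if markdown is not None:
        return {"markdown": markdown}
    return {"markdown_url": markdown_url}


# Inline Markdown content versus a URL that points to the Markdown file.
inline_fields = build_document_fields(markdown="# Invoice\n\nTotal: 120.00")
remote_fields = build_document_fields(
    markdown_url="https://example.com/parsed/invoice.md"
)
```

Rejecting the ambiguous cases up front keeps a caller from sending a request whose document source is unclear.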

Set the Extraction Schema

Set the extraction schema in the schema parameter. The schema must meet specific format and property requirements. For detailed guidance, see JSON Schema for Extraction.

Set the strict Parameter

Use the optional strict parameter to control how the API handles schemas containing keywords that cause errors.
  • If strict is false: the API continues processing and returns a 206 (Partial Content).
  • If strict is true: the API stops processing and returns a 422 (Unprocessable Entity).
In both cases, the API returns 422 if the schema fails validation, and 206 if the extracted output does not conform to the schema after extraction completes.
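
The status codes above can be handled on the client side roughly as follows. This is a sketch only: the function name describe_extract_status is hypothetical, and treating 200 as the fully successful status is an assumption, since only 206 and 422 are described here.

```python
from http import HTTPStatus


def describe_extract_status(status_code: int) -> str:
    """Map an Extract response status code to a short description."""
    if status_code == HTTPStatus.OK:
        # Assumption: 200 signals a fully successful extraction.
        return "success"
    if status_code == HTTPStatus.PARTIAL_CONTENT:
        # 206: strict is false and the schema has problem keywords,
        # or the extracted output does not conform to the schema.
        return "partial: review the output before using it"
    if status_code == HTTPStatus.UNPROCESSABLE_ENTITY:
        # 422: the schema failed validation, or strict is true and
        # the schema has problem keywords.
        return "rejected: fix the schema and retry"
    return "unexpected status"


for code in (200, 206, 422):
    print(code, describe_extract_status(code))
```

Treating 206 as a distinct outcome rather than a plain success makes it harder for partially conforming output to slip into downstream systems unnoticed.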

Extracted Output

For details about the extraction response structure and fields, see JSON Response for Extraction.

Run Extract with Our Libraries

See the pages below to learn how to run Extract with our libraries.

Python Library

Run Extract with our Python library.

TypeScript Library

Run Extract with our TypeScript library.

Use Parse Markdown for Best Results

The Extract API is optimized for Markdown generated by the Parse API. The parsed output includes element IDs, anchor tags, chunk tags, and other metadata that Extract uses during the extraction process. Extract can also process generic Markdown files or edited Parse Markdown, but results may be less accurate. For best results:
  • Use only Markdown output from the Parse API, not generic Markdown files.
  • Do not edit the Markdown from Parse before passing it to Extract.