An extraction model powers the field extraction capabilities of the API. It analyzes your Markdown content and extracts structured data according to your JSON schema.

You can specify a model when calling the API directly or when using the library. If you don’t specify a model, extract-20250930 is used by default. The newer model, extract-20251024, is available for testing and will become the default soon.

Different model versions have different capabilities and JSON schema requirements. For information about creating JSON schemas for extraction, go to Extraction Schema (JSON).

extract-20251024

Model extract-20251024 offers improved extraction capabilities:
  • Better support for field extraction for large arrays. For example, the API can better extract data from multi-page tables.
  • More deterministic outputs.
  • Consistent handling of missing fields (returns null for all missing values).
  • Improved accuracy for complex fields.
  • Enhanced support for large documents. The model can reliably process 20+ pages of Markdown content.
Extraction model extract-20251024 has different JSON schema requirements than the previous model. Learn about all schema requirements in Extraction Schema (JSON).
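Because extract-20251024 returns null for every missing value, downstream code can locate missing fields with one generic walk over the extracted data. The following is a minimal sketch, assuming the extracted data has already been parsed from JSON into nested Python dicts and lists. The function name find_null_paths and the sample field names are illustrative and not part of the API.

```python
def find_null_paths(value, path=""):
    """Yield a slash-separated path for every None in nested dicts and lists."""
    if value is None:
        yield path or "/"
    elif isinstance(value, dict):
        for key, child in value.items():
            yield from find_null_paths(child, f"{path}/{key}")
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from find_null_paths(child, f"{path}/{index}")


# Hypothetical extraction result, used only to illustrate the walk.
extraction = {
    "invoice_number": "INV-1",
    "total": None,
    "items": [{"sku": "A", "price": None}],
}
missing = list(find_null_paths(extraction))
print(missing)  # ['/total', '/items/0/price']
```

A caller could log these paths, or treat a null in a field that the schema marks as required as a signal to review the document manually.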

Migrate to extract-20251024

If you are migrating from extract-20250930 to extract-20251024, follow this checklist to prepare your JSON schema:
  1. Review keyword usage: Model extract-20251024 only supports specific JSON Schema keywords. Review your schema and remove or replace any unsupported keywords. For details, go to Keyword Support.
  2. Update nullable fields: Model extract-20251024 uses the nullable keyword instead of type arrays with null. Update your schema accordingly. For details, go to Nullable Fields.
  3. Update enum data types: Model extract-20251024 only supports string enums. If your schema uses enums with other data types, the extraction request will fail. For details, go to Restrict Values with Enum.
  4. Simplify complex schemas: If the API determines that your JSON schema is too complex, it will fall back to extract-20250930. For guidance on reducing complexity, go to Reduce JSON Schema Complexity.
  5. Update code to handle null values: Model extract-20251024 returns null for missing fields, even if they’re marked as required. Ensure your downstream code handles null values appropriately. For details, go to Missing Fields.
  6. Understand partial results: If extracted data doesn’t match your schema, the API returns a 206 status with partial results. For details, go to Schema Validation.
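Steps 2 and 3 of the checklist can be partly automated. The following is a minimal sketch, not an official migration tool. It assumes the nullable keyword takes a boolean value ("nullable": true, as in OpenAPI 3.0), which this page does not spell out, so check Nullable Fields before relying on it. The function name migrate_schema is hypothetical.

```python
import copy


def migrate_schema(schema):
    """Return (migrated_copy, warnings) for a JSON schema.

    - Rewrites {"type": ["T", "null"]} to {"type": "T", "nullable": True}
      (assumed form of the nullable keyword).
    - Reports enums that contain non-string values; it does not rewrite them.
    """
    warnings = []

    def walk(node, path):
        if isinstance(node, dict):
            types = node.get("type")
            if isinstance(types, list) and "null" in types and len(types) > 1:
                rest = [t for t in types if t != "null"]
                # Collapse to a single type name when only one type remains.
                node["type"] = rest[0] if len(rest) == 1 else rest
                node["nullable"] = True
            enum = node.get("enum")
            if isinstance(enum, list) and any(not isinstance(v, str) for v in enum):
                warnings.append(f"{path or '/'}: enum contains non-string values")
            for key, child in node.items():
                walk(child, f"{path}/{key}")
        elif isinstance(node, list):
            for index, child in enumerate(node):
                walk(child, f"{path}/{index}")

    migrated = copy.deepcopy(schema)  # leave the caller's schema untouched
    walk(migrated, "")
    return migrated, warnings


old_schema = {
    "type": "object",
    "properties": {
        "total": {"type": ["number", "null"]},
        "status": {"type": "string", "enum": ["paid", 1]},
    },
}
new_schema, problems = migrate_schema(old_schema)
print(new_schema["properties"]["total"])  # {'type': 'number', 'nullable': True}
print(problems)  # ['/properties/status: enum contains non-string values']
```

Non-string enums are only reported, because the right replacement (for example, converting numbers to strings) depends on how downstream code consumes the values.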

Set the Model in the API

When calling the endpoint, you can set the model using the model parameter. If you omit the model parameter, the API uses the default model (currently extract-20250930). This example shows how to specify a model:
curl -X POST 'https://api.va.landing.ai/v1/ade/extract' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -F 'schema={"type": "object", "properties": {"field1": {"type": "string"}, "field2": {"type": "string"}}, "required": ["field1", "field2"]}' \
  -F 'markdown=@markdown.md' \
  -F 'model=extract-20251024'
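
The same request can be sketched in Python. This is a minimal sketch under stated assumptions: the third-party requests package is installed, and the endpoint accepts the schema as a plain form field alongside the Markdown file. The helper names build_form_fields and extract are illustrative and do not come from any official library.

```python
import json

API_URL = "https://api.va.landing.ai/v1/ade/extract"


def build_form_fields(schema, model=None):
    """Build the non-file form fields for an extract request."""
    fields = {"schema": json.dumps(schema)}
    if model is not None:
        # Omit the field entirely to let the API apply its default model.
        fields["model"] = model
    return fields


def extract(api_key, markdown_path, schema, model=None):
    """Send the Markdown file and schema to the endpoint; return the response."""
    import requests  # third-party dependency, imported only when a call is made

    with open(markdown_path, "rb") as markdown_file:
        return requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {api_key}"},
            data=build_form_fields(schema, model),
            files={"markdown": markdown_file},
            timeout=60,
        )
```

For example, extract("YOUR_API_KEY", "markdown.md", schema, model="extract-20251024") mirrors the curl command above. Per the migration checklist, a 206 status on the returned response indicates partial results, so callers may want to check response.status_code before using the data.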

Model Versions

The following table lists the available model values for the API:
| Model Value | Description |
| --- | --- |
| extract-20250930 | Use the model released on September 30, 2025. |
| extract-20251024 | Use the model released on October 24, 2025. |
| extract-latest | Use the latest extraction model. |

Why Model Versioning Matters

When integrating the API, you have two options for specifying the model:
  1. Use extract-latest to always get the newest version. This automatically gives you improvements and updates, but extraction results may change when new model versions are released.
  2. Use a specific version (like extract-20251024) to pin to an exact model version. This ensures consistent extraction results over time, but you won’t receive improvements.