Response Structure
The response contains the following top-level fields:
- extraction: The extracted key-value pairs, as defined by your schema.
- extraction_metadata: Metadata showing which chunks were referenced for each extracted field.
- metadata: Processing information, including credit usage, duration, filename, job ID, version, and schema validation errors.
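As a minimal sketch, assuming the response has already been decoded from JSON into a Python dict, the three top-level fields can be unpacked and checked for presence. The function name split_response is hypothetical and not part of any SDK.

```python
from typing import Any


def split_response(response: dict[str, Any]) -> tuple[dict, dict, dict]:
    """Return (extraction, extraction_metadata, metadata) from a decoded response.

    Raises KeyError listing any top-level field that is missing.
    """
    required = ("extraction", "extraction_metadata", "metadata")
    missing = [key for key in required if key not in response]
    if missing:
        raise KeyError(f"response is missing fields: {missing}")
    return tuple(response[key] for key in required)
```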
Extracted Data (extraction)
The extraction field contains the structured data extracted from your document, formatted according to your JSON schema. The structure matches your input schema exactly.
For a simple schema, the extraction field returns an object with one key per schema property, each holding the extracted value.
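To illustrate the one-to-one mapping, here is a sketch using a hypothetical invoice schema and a hypothetical extraction result (neither comes from a real response). The helper keys_match_schema is also hypothetical; it only compares top-level property names.

```python
# Hypothetical JSON schema (illustration only).
schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total_amount": {"type": "number"},
    },
}

# Hypothetical extraction for that schema: one key per property.
extraction = {"invoice_number": "INV-001", "total_amount": 125.5}


def keys_match_schema(extraction: dict, schema: dict) -> bool:
    """True if the extraction has exactly the schema's top-level property names."""
    return set(extraction) == set(schema.get("properties", {}))


print(keys_match_schema(extraction, schema))  # True
```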
Extraction Metadata (extraction_metadata)
The extraction_metadata field has the same structure as your extraction schema, but each field contains a dictionary whose references key lists the HTML element IDs where the data was found.
The references field can contain:
- Chunk IDs: UUID-format IDs (e.g., 72ba3cca-01e5-407b-9fc4-81f54f9f0c51) that reference entire chunks, such as text blocks or figures.
- Table cell IDs: IDs in the format {page_number}-{base62_sequential_number} (e.g., 0-u) when extracted data comes from table cells.
- Other HTML element IDs: Any id attribute from HTML elements within the markdown fields of the parsed output.
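A reference ID can be sorted into one of these three kinds by its shape. The sketch below assumes chunk IDs use the canonical 8-4-4-4-12 hexadecimal UUID form and that the base62 part of a table cell ID uses only the characters 0-9, a-z, and A-Z; the function name classify_reference is hypothetical. Because other HTML element IDs can be arbitrary, one that happens to look like "12-abc" would be misclassified as a table cell.

```python
import re

# Canonical UUID form, e.g. "72ba3cca-01e5-407b-9fc4-81f54f9f0c51".
CHUNK_ID = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
    re.IGNORECASE,
)

# "{page_number}-{base62_sequential_number}", e.g. "0-u".
# Assumption: the base62 digits are drawn from 0-9, a-z, and A-Z.
TABLE_CELL_ID = re.compile(r"^\d+-[0-9A-Za-z]+$")


def classify_reference(ref: str) -> str:
    """Label a reference ID as 'chunk', 'table_cell', or 'other' by its shape."""
    if CHUNK_ID.match(ref):
        return "chunk"
    if TABLE_CELL_ID.match(ref):
        return "table_cell"
    return "other"
```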
These references are useful for:
- Tracing which parts of the document contributed to each extracted field
- Debugging extraction issues
- Building confidence scores or validation logic
- Creating audit trails for extracted data
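For tracing, debugging, and audit trails, it helps to flatten extraction_metadata into (field path, references) pairs. This is a minimal sketch under two assumptions: a leaf entry is a dict containing a "references" key (so a schema property literally named "references" would confuse it), and array fields appear as lists that mirror the extraction. The function iter_references and the example data are hypothetical.

```python
from typing import Any, Iterator


def iter_references(meta: Any, path: str = "") -> Iterator[tuple[str, list[str]]]:
    """Yield (field_path, references) for every leaf field in extraction_metadata.

    Assumption: a leaf is a dict with a "references" key; any other dict or
    list is treated as nested structure that mirrors the extraction schema.
    """
    if isinstance(meta, dict) and "references" in meta:
        yield path, list(meta["references"])
    elif isinstance(meta, dict):
        for key, value in meta.items():
            child = f"{path}.{key}" if path else key
            yield from iter_references(value, child)
    elif isinstance(meta, list):
        for index, item in enumerate(meta):
            yield from iter_references(item, f"{path}[{index}]")


# Hypothetical extraction_metadata for a schema with one nested object.
example_meta = {
    "invoice_number": {"value": "INV-001", "references": ["72ba3cca-01e5-407b-9fc4-81f54f9f0c51"]},
    "vendor": {"name": {"value": "Acme", "references": ["0-u"]}},
    "due_date": {"value": None, "references": []},
}

# Audit trail: field path -> IDs that contributed to it.
audit_trail = dict(iter_references(example_meta))

# Debugging aid: fields with no supporting references.
unreferenced = [path for path, refs in iter_references(example_meta) if not refs]
```

The dotted paths (for example, vendor.name) keep nested fields distinguishable, so the same walker works for both simple and nested schemas.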
Simple Schema Metadata
For a simple extraction schema, the metadata includes the value and references for each field. When data is extracted from text chunks, references contain chunk IDs (UUIDs). When data is extracted from table cells, references contain table cell IDs. For example, in the table cell ID "0-u", 0 indicates page 0 and u is the base62-encoded sequential number for that cell.
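Because the page number is the part before the hyphen, a field's table cell references can be grouped by page. This sketch only parses the page number and leaves the base62 cell number undecoded, since the base62 alphabet is not specified here; the function name table_cells_by_page is hypothetical. Chunk UUIDs contain four hyphens, so they never match the pattern and are skipped.

```python
import re
from collections import defaultdict

# "{page_number}-{base62_sequential_number}", e.g. "0-u" (page 0).
TABLE_CELL_ID = re.compile(r"^(?P<page>\d+)-(?P<cell>[0-9A-Za-z]+)$")


def table_cells_by_page(references: list[str]) -> dict[int, list[str]]:
    """Group table cell IDs by page number; non-table IDs are ignored."""
    pages: dict[int, list[str]] = defaultdict(list)
    for ref in references:
        match = TABLE_CELL_ID.match(ref)
        if match:
            pages[int(match["page"])].append(ref)
    return dict(pages)
```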
Nested Schema Metadata
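For nested extraction schemas, the metadata preserves the same nested structure: each nested object in extraction has a corresponding nested object in extraction_metadata, and each leaf field carries its own value and references.
Processing Metadata (metadata)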
The metadata field provides information about the extraction process:
- filename: The name of the input file.
- org_id: Organization identifier.
- duration_ms: Processing time in milliseconds.
- credit_usage: Number of credits consumed.
- job_id: Unique job identifier.
- version: Model version used for extraction. For more information, go to Extraction Model Versions.
- schema_violation_error: Error message if the extracted data doesn't conform to the input schema (null if the extracted data conforms). For more information, go to Troubleshoot Extraction.
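A common pattern is to stop on a non-null schema_violation_error before using the extraction, and otherwise log a short summary. This is a sketch that assumes the metadata object has been decoded into a dict (JSON null becomes None); the function name summarize_metadata and all example values are hypothetical.

```python
from typing import Any


def summarize_metadata(metadata: dict[str, Any]) -> str:
    """Return a one-line summary, or raise ValueError on a schema violation."""
    error = metadata.get("schema_violation_error")
    if error is not None:
        raise ValueError(f"extraction does not conform to the schema: {error}")
    seconds = metadata["duration_ms"] / 1000  # duration_ms is in milliseconds
    return (
        f"{metadata['filename']} (job {metadata['job_id']}): "
        f"{metadata['credit_usage']} credits, {seconds:.2f} s, version {metadata['version']}"
    )
```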

