When you extract structured data with the API, the extracted data and its metadata are returned as JSON.

Response Structure

The response contains the following top-level fields:
  • extraction: The extracted key-value pairs as defined by your schema.
  • extraction_metadata: Metadata showing which parts of the parsed document (chunks, table cells, or other elements) were referenced for each extracted field.
  • metadata: Processing information, including filename, organization ID, processing duration, credit usage, job ID, model version, and any schema violation error.
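As a starting point, the three top-level fields can be separated with Python's standard json module. This is a minimal sketch that assumes you already have the response body as a JSON string (how you call the API is not shown); split_response is a hypothetical helper, not part of any SDK.

```python
import json
from typing import Any


def split_response(body: str) -> tuple[dict[str, Any], dict[str, Any], dict[str, Any]]:
    """Parse a response body into (extraction, extraction_metadata, metadata).

    Hypothetical helper for illustration; a KeyError means the body is not
    shaped like an extraction response.
    """
    response = json.loads(body)
    return (
        response["extraction"],
        response["extraction_metadata"],
        response["metadata"],
    )


# Abbreviated body based on the example response in this page.
body = """{
  "extraction": {"employee_name": "MICHAEL D BRYAN"},
  "extraction_metadata": {
    "employee_name": {
      "value": "MICHAEL D BRYAN",
      "references": ["72ba3cca-01e5-407b-9fc4-81f54f9f0c51"]
    }
  },
  "metadata": {"job_id": "extract_xyz789", "schema_violation_error": null}
}"""

extraction, extraction_metadata, metadata = split_response(body)
print(extraction["employee_name"])         # MICHAEL D BRYAN
print(metadata["schema_violation_error"])  # None (JSON null)
```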

Extracted Data (extraction)

The extraction field contains the structured data extracted from your document, formatted according to your JSON schema; its structure mirrors your input schema exactly. For example, given this simple schema:
{
  "type": "object",
  "properties": {
    "employee_name": {
      "type": "string",
      "description": "The employee's full name"
    },
    "employee_ssn": {
      "type": "string",
      "description": "The employee's Social Security Number"
    },
    "gross_pay": {
      "type": "number",
      "description": "The gross pay amount"
    }
  },
  "required": ["employee_name", "employee_ssn", "gross_pay"]
}
The extraction field returns:
{
  "employee_name": "MICHAEL D BRYAN",
  "employee_ssn": "555-50-1234",
  "gross_pay": 6000.00
}
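Because the extraction mirrors the schema, a lightweight client-side sanity check is possible for flat schemas like this one. The sketch below is an illustration under stated assumptions, not a full JSON Schema validator: it only checks top-level required fields and the string, number, integer, and boolean types. check_flat_extraction is a hypothetical helper; the API itself reports conformance through metadata.schema_violation_error.

```python
from typing import Any

# Subset of JSON Schema type names mapped to the Python types json.loads produces.
JSON_TYPES: dict[str, tuple[type, ...]] = {
    "string": (str,),
    "number": (int, float),
    "integer": (int,),
    "boolean": (bool,),
}


def check_flat_extraction(schema: dict[str, Any], extraction: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means none were found.

    Only top-level properties with a type listed in JSON_TYPES are checked.
    """
    problems = []
    for name in schema.get("required", []):
        if name not in extraction:
            problems.append(f"missing required field: {name}")
    for name, spec in schema.get("properties", {}).items():
        expected_name = spec.get("type")
        if name not in extraction or expected_name not in JSON_TYPES:
            continue  # missing fields are handled by the required check; other types are skipped
        value = extraction[name]
        # bool is a subclass of int in Python, so True would otherwise pass as a number.
        is_bool_mismatch = isinstance(value, bool) and expected_name != "boolean"
        if is_bool_mismatch or not isinstance(value, JSON_TYPES[expected_name]):
            problems.append(f"{name}: expected {expected_name}, got {type(value).__name__}")
    return problems


schema = {
    "type": "object",
    "properties": {
        "employee_name": {"type": "string"},
        "employee_ssn": {"type": "string"},
        "gross_pay": {"type": "number"},
    },
    "required": ["employee_name", "employee_ssn", "gross_pay"],
}
good = {"employee_name": "MICHAEL D BRYAN", "employee_ssn": "555-50-1234", "gross_pay": 6000.00}
print(check_flat_extraction(schema, good))  # []
print(check_flat_extraction(schema, {"employee_name": "MICHAEL D BRYAN", "gross_pay": "6000"}))
# ['missing required field: employee_ssn', 'gross_pay: expected number, got str']
```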

Extraction Metadata (extraction_metadata)

The extraction_metadata field has the same structure as your extraction schema, but each field contains an object with the extracted value and a references list of the HTML element IDs where the data was found. The references list can contain:
  • Chunk IDs: UUID-format IDs (e.g., 72ba3cca-01e5-407b-9fc4-81f54f9f0c51) that reference entire chunks like text blocks or figures
  • Table cell IDs: Format {page_number}-{base62_sequential_number} (e.g., 0-u) when extracted data comes from table cells
  • Other HTML element IDs: Any id attribute of an HTML element within the markdown field of the parsed output
This metadata is useful for:
  • Tracing which parts of the document contributed to each extracted field
  • Debugging extraction issues
  • Building confidence scores or validation logic
  • Creating audit trails for extracted data
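When building tracing or audit logic, it can help to tell the reference types apart. A minimal sketch, assuming chunk IDs are always canonical hyphenated UUIDs and table cell IDs always match {page_number}-{base62_sequential_number}; an element ID that happens to match either pattern would be misclassified. classify_reference is a hypothetical helper, and "footer-note" below is an invented ID used only for illustration.

```python
import re
import uuid

# Table cell IDs: "{page_number}-{base62_sequential_number}", e.g. "0-u".
TABLE_CELL_ID = re.compile(r"^\d+-[0-9A-Za-z]+$")


def classify_reference(ref: str) -> str:
    """Return "chunk", "table_cell", or "other" for a reference ID."""
    try:
        # uuid.UUID also accepts braces or missing hyphens, so compare against
        # the canonical form to accept only hyphenated UUIDs.
        if str(uuid.UUID(ref)) == ref.lower():
            return "chunk"
    except ValueError:
        pass  # not a UUID at all
    if TABLE_CELL_ID.match(ref):
        return "table_cell"
    return "other"


print(classify_reference("72ba3cca-01e5-407b-9fc4-81f54f9f0c51"))  # chunk
print(classify_reference("0-u"))                                   # table_cell
print(classify_reference("footer-note"))                           # other (hypothetical ID)
```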

Simple Schema Metadata

For a simple extraction schema, the metadata includes the value and references for each field. When data is extracted from text chunks, references contain chunk IDs (UUIDs):
{
  "employee_name": {
    "value": "MICHAEL D BRYAN",
    "references": [
      "72ba3cca-01e5-407b-9fc4-81f54f9f0c51"
    ]
  },
  "employee_ssn": {
    "value": "555-50-1234",
    "references": [
      "a3f5d8c9-2b4e-4a1c-8f7e-9d6c5b4a3e2f"
    ]
  }
}
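When data is extracted from table cells, references contain table cell IDs. In this example, employee_name comes from a text chunk (a UUID reference), while gross_pay comes from a table cell: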
{
  "employee_name": {
    "value": "JANE HARPER",
    "references": [
      "75a62de4-5120-44bf-a6dd-b2aa63db18c6"
    ]
  },
  "gross_pay": {
    "value": 452.43,
    "references": [
      "0-u"
    ]
  }
}
In this example, "0-u" is a table cell ID where 0 indicates page 0 and u is the base62-encoded sequential number for that cell.
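A table cell ID can be split back into its page number and sequential number. The sketch below is hedged: this page does not specify the base62 alphabet, so it assumes the common ordering of digits, then lowercase, then uppercase letters, which decodes "u" as 30. If the service uses a different ordering, only the decoded sequence number changes; the page number is unaffected. parse_table_cell_id is a hypothetical helper.

```python
import string

# ASSUMPTION: base62 alphabet order is digits, then lowercase, then uppercase.
BASE62_ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def parse_table_cell_id(cell_id: str, alphabet: str = BASE62_ALPHABET) -> tuple[int, int]:
    """Split "{page_number}-{base62_sequential_number}" into (page, sequence)."""
    page_part, sep, seq_part = cell_id.partition("-")
    if not sep or not (page_part.isascii() and page_part.isdecimal()) or not seq_part:
        raise ValueError(f"not a table cell ID: {cell_id!r}")
    sequence = 0
    for char in seq_part:
        digit = alphabet.find(char)
        if digit < 0:
            raise ValueError(f"invalid base62 character {char!r} in {cell_id!r}")
        sequence = sequence * 62 + digit  # standard positional base62 decoding
    return int(page_part), sequence


print(parse_table_cell_id("0-u"))   # (0, 30) under the assumed alphabet
print(parse_table_cell_id("1-10"))  # (1, 62)
```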

Nested Schema Metadata

For nested extraction schemas, the metadata preserves the same nested structure:
{
  "patient_details": {
    "patient_name": {
      "value": "John Smith",
      "references": [
        "72ba3cca-01e5-407b-9fc4-81f54f9f0c51"
      ]
    },
    "date": {
      "value": "2024-01-15",
      "references": [
        "72ba3cca-01e5-407b-9fc4-81f54f9f0c51"
      ]
    }
  },
  "emergency_contact_information": {
    "emergency_contact_name": {
      "value": "Jane Smith",
      "references": [
        "5b8865b9-1a81-46df-bcf7-0bdbed9130dc"
      ]
    },
    "relationship_to_patient": {
      "value": "Spouse",
      "references": [
        "5b8865b9-1a81-46df-bcf7-0bdbed9130dc"
      ]
    }
  }
}
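Because nesting is preserved, code that consumes the metadata usually needs to walk it recursively. A minimal sketch, assuming a leaf is any object whose references entry is a list (so a schema property literally named "references" would confuse it) and that the schema contains only nested objects, not arrays; array handling is not described on this page. iter_leaf_fields is a hypothetical helper that yields a dotted path for each leaf.

```python
from typing import Any, Iterator


def iter_leaf_fields(
    node: dict[str, Any], path: tuple[str, ...] = ()
) -> Iterator[tuple[str, Any, list[str]]]:
    """Yield (dotted_path, value, references) for every leaf field."""
    for key, child in node.items():
        child_path = path + (key,)
        if isinstance(child, dict) and isinstance(child.get("references"), list):
            # Leaf: an object carrying a value and its references.
            yield ".".join(child_path), child.get("value"), child["references"]
        elif isinstance(child, dict):
            # Nested group of fields: recurse with the extended path.
            yield from iter_leaf_fields(child, child_path)


# Abbreviated version of the nested example above.
nested = {
    "patient_details": {
        "patient_name": {"value": "John Smith", "references": ["72ba3cca-01e5-407b-9fc4-81f54f9f0c51"]},
    },
    "emergency_contact_information": {
        "relationship_to_patient": {"value": "Spouse", "references": ["5b8865b9-1a81-46df-bcf7-0bdbed9130dc"]},
    },
}

for field_path, value, refs in iter_leaf_fields(nested):
    print(field_path, value, refs)
# patient_details.patient_name John Smith ['72ba3cca-01e5-407b-9fc4-81f54f9f0c51']
# emergency_contact_information.relationship_to_patient Spouse ['5b8865b9-1a81-46df-bcf7-0bdbed9130dc']
```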

Processing Metadata (metadata)

The metadata field provides information about the extraction process:
  • filename: The name of the input file
  • org_id: Organization identifier
  • duration_ms: Processing time in milliseconds
  • credit_usage: Number of credits consumed
  • job_id: Unique job identifier
  • version: Model version used for extraction. For more information, go to Extraction Model Versions.
  • schema_violation_error: Error message if the extracted data doesn’t conform to the input schema (null if the extracted data conforms to the schema). For more information, go to Troubleshoot Extraction.
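A typical first step after receiving a response is to check schema_violation_error before trusting the extraction. The sketch below raises on a non-null value and otherwise returns a one-line summary for logging; raising rather than logging a warning is a design choice for illustration, and summarize_metadata is a hypothetical helper.

```python
from typing import Any


def summarize_metadata(metadata: dict[str, Any]) -> str:
    """Return a one-line summary, raising if the extraction violated the schema."""
    error = metadata.get("schema_violation_error")
    if error is not None:
        # A non-null value means the extracted data did not conform to the input schema.
        raise ValueError(f"job {metadata.get('job_id')}: schema violation: {error}")
    seconds = metadata["duration_ms"] / 1000  # duration_ms is in milliseconds
    return (
        f"{metadata['filename']}: job {metadata['job_id']} "
        f"({metadata['version']}) took {seconds:.2f} s "
        f"and used {metadata['credit_usage']} credits"
    )


example_metadata = {
    "filename": "pay-stub.md",
    "org_id": "org_abc123",
    "duration_ms": 1523,
    "credit_usage": 1.0,
    "job_id": "extract_xyz789",
    "version": "extract-20250115",
    "schema_violation_error": None,
}
print(summarize_metadata(example_metadata))
# pay-stub.md: job extract_xyz789 (extract-20250115) took 1.52 s and used 1.0 credits
```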

Example Response

Here is a complete example showing the extraction response structure:
{
  "extraction": {
    "employee_name": "MICHAEL D BRYAN",
    "employee_ssn": "555-50-1234",
    "gross_pay": 6000.00
  },
  "extraction_metadata": {
    "employee_name": {
      "value": "MICHAEL D BRYAN",
      "references": [
        "72ba3cca-01e5-407b-9fc4-81f54f9f0c51"
      ]
    },
    "employee_ssn": {
      "value": "555-50-1234",
      "references": [
        "a3f5d8c9-2b4e-4a1c-8f7e-9d6c5b4a3e2f"
      ]
    },
    "gross_pay": {
      "value": 6000.00,
      "references": [
        "5b8865b9-1a81-46df-bcf7-0bdbed9130dc"
      ]
    }
  },
  "metadata": {
    "filename": "pay-stub.md",
    "org_id": "org_abc123",
    "duration_ms": 1523,
    "credit_usage": 1.0,
    "job_id": "extract_xyz789",
    "version": "extract-20250115",
    "schema_violation_error": null
  }
}
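Going the other way, from a referenced element ID back to the fields that cite it, can be useful when highlighting source regions in a document viewer or assembling an audit trail. A minimal sketch for a flat schema like the one in this example; nested schemas would need a recursive walk first. fields_by_reference is a hypothetical helper.

```python
from collections import defaultdict
from typing import Any


def fields_by_reference(extraction_metadata: dict[str, Any]) -> dict[str, list[str]]:
    """Map each referenced element ID to the flat field names that cite it."""
    index: dict[str, list[str]] = defaultdict(list)
    for field, entry in extraction_metadata.items():
        for ref in entry.get("references", []):
            index[ref].append(field)
    return dict(index)


# extraction_metadata from the complete example response above.
example = {
    "employee_name": {"value": "MICHAEL D BRYAN", "references": ["72ba3cca-01e5-407b-9fc4-81f54f9f0c51"]},
    "employee_ssn": {"value": "555-50-1234", "references": ["a3f5d8c9-2b4e-4a1c-8f7e-9d6c5b4a3e2f"]},
    "gross_pay": {"value": 6000.00, "references": ["5b8865b9-1a81-46df-bcf7-0bdbed9130dc"]},
}

for ref, fields in fields_by_reference(example).items():
    print(ref, fields)
# 72ba3cca-01e5-407b-9fc4-81f54f9f0c51 ['employee_name']
# a3f5d8c9-2b4e-4a1c-8f7e-9d6c5b4a3e2f ['employee_ssn']
# 5b8865b9-1a81-46df-bcf7-0bdbed9130dc ['gross_pay']
```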