When you parse a document with the API or complete an ADE Parse Job, the parsed data is returned in a structured JSON format.

Response Structure

The response contains the following top-level fields:
  • markdown: Complete Markdown representation of the document.
  • chunks: Array of chunk objects, one for each parsed region.
  • splits: Array of split objects organizing chunks by page or section.
  • grounding: Object mapping chunk IDs, table IDs, and table cell IDs to detailed grounding information, which includes the page number and bounding box coordinates.
  • metadata: Processing information (credit usage, duration, filename, job ID, page count, version). For partial content responses, includes a failed_pages array listing page numbers that failed to process.
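As a rough illustration of these top-level fields, the sketch below summarizes a response that has already been decoded into a Python dict. The payload is hypothetical and trimmed to a few fields. The helper name `summarize_response` is invented for this example; the only behavior it relies on is the documented one that `failed_pages` appears in `metadata` for partial content responses.

```python
import json

# Hypothetical payload, trimmed to the top-level fields described above.
raw = """
{
  "markdown": "...",
  "chunks": [],
  "splits": [],
  "grounding": {},
  "metadata": {"page_count": 3, "failed_pages": [2]}
}
"""

def summarize_response(response: dict) -> dict:
    """Return a small summary of a parse response, flagging partial content."""
    metadata = response.get("metadata", {})
    # failed_pages is only present for partial content responses, so default to [].
    failed = metadata.get("failed_pages", [])
    return {
        "chunk_count": len(response.get("chunks", [])),
        "split_count": len(response.get("splits", [])),
        "page_count": metadata.get("page_count"),
        "failed_pages": failed,
        "is_partial": bool(failed),
    }

summary = summarize_response(json.loads(raw))
print(summary)
```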

Parsed Chunks (chunks)

Each chunk object in the chunks array contains:
  • id: Unique identifier (UUID format)
  • markdown: Markdown content for the chunk
  • type: Chunk type (for example, text, table, or card)
  • grounding: Location information with:
    • box: Bounding box coordinates (normalized 0-1 values)
      • left: Left edge position
      • top: Top edge position
      • right: Right edge position
      • bottom: Bottom edge position
    • page: Zero-indexed page number
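One way to work with chunk objects in Python is to load them into typed containers. The sketch below is an illustration, not part of any official SDK. It assumes that `grounding` on a chunk is a single object with `page` and `box`, as in the example responses later on this page. The class and function names (`Box`, `Chunk`, `parse_chunk`) are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Box:
    # Normalized coordinates in the range 0-1.
    left: float
    top: float
    right: float
    bottom: float

@dataclass(frozen=True)
class Chunk:
    id: str
    markdown: str
    type: str
    page: int  # zero-indexed
    box: Box

def parse_chunk(obj: dict) -> Chunk:
    """Convert one entry of the chunks array into a Chunk."""
    grounding = obj["grounding"]
    return Chunk(
        id=obj["id"],
        markdown=obj["markdown"],
        type=obj["type"],
        page=grounding["page"],
        box=Box(**grounding["box"]),  # keys must be exactly left/top/right/bottom
    )

chunk = parse_chunk({
    "id": "7d58c5cf-e4f5-4a7e-ba34-0cd7bc6a6506",
    "type": "text",
    "markdown": "SKU\nWH-2847-BLK",
    "grounding": {
        "page": 0,
        "box": {"left": 0.0663, "top": 0.0951, "right": 0.4665, "bottom": 0.2678},
    },
})
print(chunk.type, chunk.page, chunk.box.left)
```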

Grounding Information (grounding)

The grounding object provides additional metadata about HTML elements in the document, keyed by element ID. This includes grounding for chunks, table cells, and other HTML objects. Each entry contains:
  • box: Same bounding box structure as in chunk grounding
  • page: Zero-indexed page number
  • type: Detailed classification (e.g., chunkText, chunkTable, chunkScanCode, tableCell, table)
The keys in the grounding object can be:
  • Chunk IDs: UUID-format IDs (e.g., 7d58c5cf-e4f5-4a7e-ba34-0cd7bc6a6506) for chunks
  • Table cell IDs: Format {page_number}-{base62_sequential_number} (e.g., 0-2, 0-3) for table cells
  • Table IDs: Format {page_number}-{base62_sequential_number} (e.g., 0-1) for entire tables
The type field in the grounding object is more specific than the chunk’s type field. For example, a table chunk has type table in the chunk object, while its grounding entry may have type chunkTable, and the individual cells within it appear as separate grounding entries of type tableCell.
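Because chunk keys are UUIDs and table and cell keys share the {page_number}-{base62_sequential_number} format, you can tell a chunk key from a table or cell key by its shape, but you need the type field to tell a table from a cell. The sketch below is a hypothetical helper built only on the key formats described above. The grounding entries are made-up sample data with the boxes omitted.

```python
import uuid

def classify_grounding_key(key: str) -> str:
    """Return 'chunk' for UUID keys, 'table_element' for {page}-{base62} keys."""
    try:
        uuid.UUID(key)
        return "chunk"
    except ValueError:
        pass
    page, sep, seq = key.partition("-")
    # Base62 digits are ASCII letters and digits, so isascii() + isalnum() covers them.
    if sep and page.isdigit() and seq.isascii() and seq.isalnum():
        return "table_element"
    return "unknown"

def page_of_table_key(key: str) -> int:
    """Extract the page number prefix from a table or table cell key."""
    return int(key.partition("-")[0])

# Hypothetical grounding entries (boxes omitted for brevity).
grounding = {
    "7d58c5cf-e4f5-4a7e-ba34-0cd7bc6a6506": {"page": 0, "type": "chunkTable"},
    "0-1": {"page": 0, "type": "table"},
    "0-2": {"page": 0, "type": "tableCell"},
    "0-3": {"page": 0, "type": "tableCell"},
}

kinds = {key: classify_grounding_key(key) for key in grounding}
# Table and cell keys share one format, so the type field tells them apart.
cell_keys = [key for key, entry in grounding.items() if entry["type"] == "tableCell"]
print(kinds)
print(cell_keys)
```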

Working with Bounding Box Coordinates

All bounding box coordinates use normalized values between 0 and 1, where:
  • (0, 0) represents the top-left corner of the page
  • (1, 1) represents the bottom-right corner of the page
To convert to pixel coordinates, multiply the normalized values by the image dimensions:
x1 = left * image_width
y1 = top * image_height
x2 = right * image_width
y2 = bottom * image_height
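The same conversion can be written as a small Python function. This sketch rounds to the nearest whole pixel, which is a choice made for this example rather than something the API specifies. The 1700 × 2200 image size is a hypothetical page rendering. Use the actual dimensions of the page image you are drawing on.

```python
def to_pixel_box(box: dict, image_width: int, image_height: int) -> tuple[int, int, int, int]:
    """Convert a normalized box (0-1) to integer pixel coordinates (x1, y1, x2, y2)."""
    x1 = round(box["left"] * image_width)
    y1 = round(box["top"] * image_height)
    x2 = round(box["right"] * image_width)
    y2 = round(box["bottom"] * image_height)
    return x1, y1, x2, y2

box = {"left": 0.0663, "top": 0.0951, "right": 0.4665, "bottom": 0.2678}
print(to_pixel_box(box, 1700, 2200))  # (113, 209, 793, 589)
```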
For practical examples of working with bounding box coordinates using the Python library, see:

Example Responses

The response structure varies based on whether you use the split parameter. Below are examples showing both scenarios.

Without Split Parameter

When no split parameter is specified, the splits array is empty:
{
  "markdown": "<complete markdown content>",
  "chunks": [
    {
      "id": "7d58c5cf-e4f5-4a7e-ba34-0cd7bc6a6506",
      "type": "text",
      "markdown": "<a id='7d58c5cf-e4f5-4a7e-ba34-0cd7bc6a6506'></a>\n\nSKU\nWH-2847-BLK",
      "grounding": {
        "page": 0,
        "box": {
          "left": 0.0663,
          "top": 0.0951,
          "right": 0.4665,
          "bottom": 0.2678
        }
      }
    }
  ],
  "splits": [],
  "grounding": {
    "7d58c5cf-e4f5-4a7e-ba34-0cd7bc6a6506": {
      "page": 0,
      "type": "chunkText",
      "box": {
        "left": 0.0663,
        "top": 0.0951,
        "right": 0.4665,
        "bottom": 0.2678
      }
    }
  },
  "metadata": {
    "credit_usage": 3.0,
    "duration_ms": 3806,
    "filename": "document.pdf",
    "job_id": "nqwf8swa1rxo5ad56ykmvvr7m",
    "page_count": 1,
    "version": "dpt-2-20250919"
  }
}
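When splits is empty, you can still group chunks by page yourself, because every chunk carries a zero-indexed page in its grounding. The sketch below is one way to do this. It keeps chunks in the order they appear in the chunks array. The chunk IDs "a", "b", and "c" are placeholders, and the function name is hypothetical.

```python
from collections import defaultdict

def chunks_by_page(response: dict) -> dict[int, list[str]]:
    """Group chunk IDs by zero-indexed page using each chunk's grounding.page."""
    pages: dict[int, list[str]] = defaultdict(list)
    for chunk in response["chunks"]:
        pages[chunk["grounding"]["page"]].append(chunk["id"])
    return dict(pages)

# Placeholder response with only the fields this helper reads.
response = {
    "chunks": [
        {"id": "a", "grounding": {"page": 0}},
        {"id": "b", "grounding": {"page": 1}},
        {"id": "c", "grounding": {"page": 0}},
    ],
    "splits": [],
}

if not response["splits"]:
    print(chunks_by_page(response))  # {0: ['a', 'c'], 1: ['b']}
```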

With Split Parameter Set to Page

When the split parameter is set to page, the response includes a populated splits array that organizes chunks by page:
{
  "markdown": "<a id='fb9a5162-54da-4671-ba2c-db8f02007042'></a>\n\nSKU\n**WH-2847-BLK**\n\nQUANTITY\n**48 Units**\n\n[BARCODE]\n2847 000 4812\n\n<!-- PAGE BREAK -->\n\n<a id='7fd7a324-b87a-4dc3-918c-e56ec7265410'></a>\n\nSKU\n**EL-5039-RED**\n\nQUANTITY\n**72 Units**\n\n[BARCODE]\n5039 0000 7216",
  "chunks": [
    {
      "id": "fb9a5162-54da-4671-ba2c-db8f02007042",
      "type": "card",
      "markdown": "<a id='fb9a5162-54da-4671-ba2c-db8f02007042'></a>\n\nSKU\n**WH-2847-BLK**\n\nQUANTITY\n**48 Units**\n\n[BARCODE]\n2847 000 4812",
      "grounding": {
        "page": 0,
        "box": {
          "left": 0.2696,
          "top": 0.3639,
          "right": 0.7316,
          "bottom": 0.6365
        }
      }
    },
    {
      "id": "7fd7a324-b87a-4dc3-918c-e56ec7265410",
      "type": "card",
      "markdown": "<a id='7fd7a324-b87a-4dc3-918c-e56ec7265410'></a>\n\nSKU\n**EL-5039-RED**\n\nQUANTITY\n**72 Units**\n\n[BARCODE]\n5039 0000 7216",
      "grounding": {
        "page": 1,
        "box": {
          "left": 0.2698,
          "top": 0.3639,
          "right": 0.7313,
          "bottom": 0.6361
        }
      }
    }
  ],
  "splits": [
    {
      "class": "page",
      "identifier": "page_0",
      "pages": [0],
      "markdown": "<a id='fb9a5162-54da-4671-ba2c-db8f02007042'></a>\n\nSKU\n**WH-2847-BLK**\n\nQUANTITY\n**48 Units**\n\n[BARCODE]\n2847 000 4812",
      "chunks": ["fb9a5162-54da-4671-ba2c-db8f02007042"]
    },
    {
      "class": "page",
      "identifier": "page_1",
      "pages": [1],
      "markdown": "<a id='7fd7a324-b87a-4dc3-918c-e56ec7265410'></a>\n\nSKU\n**EL-5039-RED**\n\nQUANTITY\n**72 Units**\n\n[BARCODE]\n5039 0000 7216",
      "chunks": ["7fd7a324-b87a-4dc3-918c-e56ec7265410"]
    }
  ],
  "grounding": {
    "fb9a5162-54da-4671-ba2c-db8f02007042": {
      "page": 0,
      "type": "chunkCard",
      "box": {
        "left": 0.2696,
        "top": 0.3639,
        "right": 0.7316,
        "bottom": 0.6365
      }
    },
    "7fd7a324-b87a-4dc3-918c-e56ec7265410": {
      "page": 1,
      "type": "chunkCard",
      "box": {
        "left": 0.2698,
        "top": 0.3639,
        "right": 0.7313,
        "bottom": 0.6361
      }
    }
  },
  "metadata": {
    "filename": "pallet-label-2-pages.pdf",
    "page_count": 2,
    "duration_ms": 24382,
    "credit_usage": 6.0,
    "job_id": "td8wu72tq2g9l9tfgkwn3q3kp",
    "version": "dpt-2-20251103"
  }
}
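Each split lists chunk IDs rather than full chunk objects, so a common step is to look them up in the chunks array. The sketch below does that and also pulls the IDs out of the `<a id='...'></a>` anchors that appear in the Markdown. The anchor regex is based only on the format in this example and is an assumption about the general case. The helper names are hypothetical, and the response is a trimmed copy of the one above.

```python
import re

# Anchor format as it appears in the example Markdown: <a id='...'></a>
ANCHOR_RE = re.compile(r"<a id='([^']+)'></a>")

def resolve_splits(response: dict) -> list[tuple[str, list[dict]]]:
    """Pair each split's identifier with the full chunk objects it references."""
    by_id = {chunk["id"]: chunk for chunk in response["chunks"]}
    return [
        (split["identifier"], [by_id[cid] for cid in split["chunks"]])
        for split in response["splits"]
    ]

def anchor_ids(markdown: str) -> list[str]:
    """List the chunk IDs embedded as anchors in a Markdown string."""
    return ANCHOR_RE.findall(markdown)

response = {
    "chunks": [
        {"id": "fb9a5162-54da-4671-ba2c-db8f02007042", "type": "card"},
        {"id": "7fd7a324-b87a-4dc3-918c-e56ec7265410", "type": "card"},
    ],
    "splits": [
        {"class": "page", "identifier": "page_0", "pages": [0],
         "markdown": "<a id='fb9a5162-54da-4671-ba2c-db8f02007042'></a>\n\nSKU\n**WH-2847-BLK**",
         "chunks": ["fb9a5162-54da-4671-ba2c-db8f02007042"]},
        {"class": "page", "identifier": "page_1", "pages": [1],
         "markdown": "<a id='7fd7a324-b87a-4dc3-918c-e56ec7265410'></a>\n\nSKU\n**EL-5039-RED**",
         "chunks": ["7fd7a324-b87a-4dc3-918c-e56ec7265410"]},
    ],
}

for identifier, chunks in resolve_splits(response):
    print(identifier, [c["type"] for c in chunks])
```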

Legacy Response Format

The legacy API uses a different response format, described by the following JSON Schema:
{
  "$defs": {
    "Chunk": {
      "description": "An extracted chunk from the document",
      "properties": {
        "text": {
          "description": "A Markdown representation of the chunk (except for tables, which are represented in HTML).",
          "title": "Text",
          "type": "string"
        },
        "grounding": {
          "description": "The specific spatial location(s) of this chunk within the original document. A chunk can have multiple groundings, for example, if it is a single paragraph split across two columns.",
          "items": {
            "$ref": "#/$defs/ChunkGrounding"
          },
          "title": "Grounding",
          "type": "array"
        },
        "chunk_type": {
          "$ref": "#/$defs/ChunkType",
          "description": "The detected type of the chunk, matching its role within the document."
        },
        "chunk_id": {
          "description": "A UUID for the chunk. This matches UUIDs in the HTML comments in the Markdown output.",
          "title": "Chunk Id",
          "type": "string"
        }
      },
      "required": [
        "text",
        "grounding",
        "chunk_type",
        "chunk_id"
      ],
      "title": "Chunk",
      "type": "object"
    },
    "ChunkGrounding": {
      "description": "Grounding for a chunk, specifying the location within the original document",
      "properties": {
        "box": {
          "$ref": "#/$defs/ChunkGroundingBox",
          "description": "A bounding box (in relative coordinates) establishing the chunk's spatial location within the page."
        },
        "page": {
          "description": "The chunk's 0-indexed page within the original document.",
          "title": "Page",
          "type": "integer"
        }
      },
      "required": [
        "box",
        "page"
      ],
      "title": "ChunkGrounding",
      "type": "object"
    },
    "ChunkGroundingBox": {
      "description": "Bounding box, expressed in relative coordinates (float from 0 to 1)",
      "properties": {
        "l": {
          "title": "L",
          "type": "number"
        },
        "t": {
          "title": "T",
          "type": "number"
        },
        "r": {
          "title": "R",
          "type": "number"
        },
        "b": {
          "title": "B",
          "type": "number"
        }
      },
      "required": [
        "l",
        "t",
        "r",
        "b"
      ],
      "title": "ChunkGroundingBox",
      "type": "object"
    },
    "ChunkType": {
      "description": "Type of the chunk, signifying its role within the document",
      "enum": [
        "title",
        "page_header",
        "page_footer",
        "page_number",
        "key_value",
        "form",
        "table",
        "figure",
        "text"
      ],
      "title": "ChunkType",
      "type": "string"
    }
  },
  "properties": {
    "markdown": {
      "description": "A Markdown representation of the document, potentially with HTML comments at the end of each chunk. You can use this as context for an LLM.",
      "title": "Markdown",
      "type": "string"
    },
    "chunks": {
      "description": "List of chunks extracted from the document in reading order.",
      "items": {
        "$ref": "#/$defs/Chunk"
      },
      "title": "Chunks",
      "type": "array"
    }
  },
  "required": [
    "markdown",
    "chunks"
  ],
  "title": "APIResponse",
  "type": "object"
}
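If you are moving code from the legacy format to the current one, the main structural differences visible on this page are the box keys (l/t/r/b versus left/top/right/bottom), the chunk field names, and the fact that legacy grounding is a list. The sketch below is a hypothetical mapping based only on those differences. It is not an official migration path. It keeps every legacy grounding region as a list instead of guessing which one to use, and it passes legacy chunk types through unchanged even though the two formats may not use the same type values.

```python
def convert_legacy_box(box: dict) -> dict:
    """Rename legacy l/t/r/b keys to left/top/right/bottom."""
    return {"left": box["l"], "top": box["t"], "right": box["r"], "bottom": box["b"]}

def convert_legacy_chunk(chunk: dict) -> dict:
    """Hypothetical mapping of a legacy chunk onto current-format field names."""
    return {
        "id": chunk["chunk_id"],
        "type": chunk["chunk_type"],  # legacy type values are passed through as-is
        "markdown": chunk["text"],
        # Legacy grounding is a list; keep every region rather than picking one.
        "grounding": [
            {"page": g["page"], "box": convert_legacy_box(g["box"])}
            for g in chunk["grounding"]
        ],
    }

legacy = {
    "text": "SKU WH-2847-BLK",
    "chunk_type": "text",
    "chunk_id": "7d58c5cf-e4f5-4a7e-ba34-0cd7bc6a6506",
    "grounding": [{"page": 0, "box": {"l": 0.1, "t": 0.2, "r": 0.3, "b": 0.4}}],
}
print(convert_legacy_chunk(legacy))
```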

ParsedDocument Object (Legacy API)

The legacy API returns a ParsedDocument object with the following attributes:
  • markdown: str - Markdown representation of the document
  • chunks: list[Chunk] - List of parsed content chunks, sorted by page index, then the layout of the content on the page
  • start_page_idx: Optional[int] - Starting page index for PDFs
  • end_page_idx: Optional[int] - Ending page index for PDFs
  • doc_type: Literal["pdf", "image"] - Type of document

Chunk Object (Legacy API)

In the legacy API response, each extracted element from a document is represented as a chunk object with the following attributes:
  • text: str - Extracted text content
  • grounding: list[Grounding] - List of content locations in document
  • chunk_type: Literal["text", "error"] - Type of chunk
  • chunk_id: Optional[str] - ID of the chunk