Parsing
JSON Response
When you parse a document with the API, the extracted data is returned in a hierarchical JSON format that follows the schema below.
ParsedDocument
Represents a parsed document with the following attributes:
markdown: str - Markdown representation of the document
chunks: list[Chunk] - List of parsed content chunks, sorted by page index and then by the position of the content in the page layout
start_page_idx: Optional[int] - Starting page index for PDFs
end_page_idx: Optional[int] - Ending page index for PDFs
doc_type: Literal["pdf", "image"] - Type of document
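The following minimal sketch shows how a client might load this top-level object from a raw JSON string. It assumes the JSON keys are identical to the attribute names above. The ParsedDocument class here is a hypothetical local mirror of the schema, not the API's own client type, and it keeps chunks as raw dictionaries.

```python
import json
from dataclasses import dataclass
from typing import Any, Literal, Optional


# Hypothetical local mirror of the ParsedDocument schema.
# Assumption: JSON keys match the documented attribute names.
@dataclass
class ParsedDocument:
    markdown: str
    chunks: list[dict[str, Any]]  # raw chunk objects, not modeled in this sketch
    doc_type: Literal["pdf", "image"]
    start_page_idx: Optional[int] = None  # documented as applying to PDFs
    end_page_idx: Optional[int] = None  # documented as applying to PDFs

    @classmethod
    def from_json(cls, payload: str) -> "ParsedDocument":
        data = json.loads(payload)
        # Reject values outside the documented Literal["pdf", "image"]
        if data["doc_type"] not in ("pdf", "image"):
            raise ValueError(f"unexpected doc_type: {data['doc_type']!r}")
        return cls(
            markdown=data["markdown"],
            chunks=data.get("chunks", []),
            doc_type=data["doc_type"],
            # Optional fields may be absent or null; both become None
            start_page_idx=data.get("start_page_idx"),
            end_page_idx=data.get("end_page_idx"),
        )


# Usage with a made-up payload for an image document
sample = '{"markdown": "# Title", "chunks": [], "doc_type": "image"}'
doc = ParsedDocument.from_json(sample)
print(doc.doc_type, doc.start_page_idx)  # image None
```

Treating the two page indices as optional mirrors the schema: an image document has no page range, so code that reads them should handle None.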
Chunk
Each element extracted from a document is represented as a chunk in the JSON response. Each chunk has the following attributes:
text: str - Extracted text content
grounding: list[Grounding] - List of locations of the content in the document
chunk_type: Literal["text", "error"] - Type of chunk
chunk_id: Optional[str] - ID of the chunk
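Because a chunk can have the type "error", a consumer will often want to separate extracted content from failed regions. The following is a minimal sketch under two assumptions: the JSON keys match the attribute names above, and Grounding objects can be left as raw dictionaries because their fields are not described in this section. The Chunk class and the split_chunks helper are hypothetical, not part of the API.

```python
from dataclasses import dataclass, field
from typing import Any, Literal, Optional


# Hypothetical local mirror of the Chunk schema.
# Assumption: JSON keys match the documented attribute names.
@dataclass
class Chunk:
    text: str
    chunk_type: Literal["text", "error"]
    grounding: list[dict[str, Any]] = field(default_factory=list)  # Grounding kept as raw dicts
    chunk_id: Optional[str] = None

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "Chunk":
        return cls(
            text=data["text"],
            chunk_type=data["chunk_type"],
            grounding=data.get("grounding", []),
            chunk_id=data.get("chunk_id"),
        )


def split_chunks(raw_chunks: list[dict[str, Any]]) -> tuple[list[Chunk], list[Chunk]]:
    """Separate text chunks from error chunks, keeping the original order."""
    ok: list[Chunk] = []
    errors: list[Chunk] = []
    for raw in raw_chunks:
        chunk = Chunk.from_dict(raw)
        if chunk.chunk_type == "error":
            errors.append(chunk)
        else:
            ok.append(chunk)
    return ok, errors


# Usage with made-up sample chunks
raw = [
    {"text": "Hello", "chunk_type": "text", "chunk_id": "c1"},
    {"text": "", "chunk_type": "error"},
]
ok, errors = split_chunks(raw)
print([c.text for c in ok])  # ['Hello']
print(len(errors))  # 1
```

Iterating the list once and appending in order preserves the documented sort (page index, then layout position) within each group.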