- Parse: Converts documents into structured Markdown with hierarchical JSON. Identifies elements like text, tables, and form fields with exact page and coordinate references. Parse understands relationships between elements and works without templates or training.
- Split (Preview): Classifies and separates parsed documents into multiple sub-documents based on document types or sections you define. Useful when processing batched documents containing multiple document types.
- Extract: Pulls specific data fields from parsed documents using schema-based extraction. Supports document classification to extract different data based on document type.
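To make schema-based extraction with classification more concrete, here is a minimal sketch in Python. It does not call the Agentic Document Extraction API. The schema layout, the document type names (`invoice`, `pay_stub`), and the helpers `select_schema` and `missing_fields` are all hypothetical, chosen only to show how a classified document type might map to a different set of fields.

```python
# Hypothetical illustration only: these schemas and helper names are
# assumptions, not the actual Agentic Document Extraction request format.
from typing import Any, Dict, List

# One JSON-Schema-style definition per document type (made-up examples).
SCHEMAS: Dict[str, Dict[str, Any]] = {
    "invoice": {
        "type": "object",
        "properties": {
            "invoice_number": {"type": "string"},
            "total_due": {"type": "number"},
        },
        "required": ["invoice_number", "total_due"],
    },
    "pay_stub": {
        "type": "object",
        "properties": {
            "employee_name": {"type": "string"},
            "net_pay": {"type": "number"},
        },
        "required": ["employee_name", "net_pay"],
    },
}


def select_schema(document_type: str) -> Dict[str, Any]:
    """Return the extraction schema for a classified document type."""
    if document_type not in SCHEMAS:
        raise ValueError(f"No schema defined for document type: {document_type!r}")
    return SCHEMAS[document_type]


def missing_fields(schema: Dict[str, Any], extracted: Dict[str, Any]) -> List[str]:
    """List required fields that an extraction result did not return."""
    return [name for name in schema["required"] if name not in extracted]
```

Keeping one schema per document type means a mixed batch can be split first and each sub-document extracted with only the fields that apply to it.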
Try Out Agentic Document Extraction
- Playground: Just getting started? Test out your documents in our demo app.
- Python Library: Use our Python library to build custom scripts.
Features
- Layout-agnostic parsing: Extracts data from complex layouts. No training or templates needed.
- Element detection: Identifies specific elements including text, tables, form fields, checkboxes, and more.
- Hierarchical relationships: Detects how elements relate in structure and meaning. For example, it can recognize that a line of text is the caption for an image.
- Precision extraction: Extracts data accurately, even from complex documents.
- Flexible output: Returns results in Markdown and JSON, ready for use in downstream applications like retrieval-augmented generation (RAG).
- Visual grounding: The JSON output includes document-, page-, and coordinate-level references for each element to support traceability, validation, and compliance workflows.
- Multiple file types: Extracts data from PDFs and common image formats.
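
Visual grounding makes it possible to trace an element back to its region on the page. The sketch below works on a hand-written sample, not real API output. The key names (`chunks`, `type`, `page`, `box`) and the assumption that coordinates are normalized to the 0 to 1 range are hypothetical, so check the actual JSON response before relying on them.

```python
# Hypothetical sample: key names and normalized coordinates are assumptions,
# not the documented Agentic Document Extraction response format.
from typing import Any, Dict, List, Tuple

sample_output: Dict[str, Any] = {
    "markdown": "# Invoice\n\nSee the line items below.",
    "chunks": [
        {"type": "text", "page": 0,
         "box": {"left": 0.10, "top": 0.05, "right": 0.90, "bottom": 0.10}},
        {"type": "table", "page": 0,
         "box": {"left": 0.10, "top": 0.20, "right": 0.90, "bottom": 0.50}},
        {"type": "table", "page": 1,
         "box": {"left": 0.15, "top": 0.30, "right": 0.85, "bottom": 0.60}},
    ],
}


def chunks_of_type(output: Dict[str, Any], chunk_type: str) -> List[Dict[str, Any]]:
    """Return every chunk whose element type matches chunk_type."""
    return [chunk for chunk in output["chunks"] if chunk["type"] == chunk_type]


def to_pixels(box: Dict[str, float], page_width: int, page_height: int) -> Tuple[int, int, int, int]:
    """Convert a normalized box to (left, top, right, bottom) in pixels."""
    return (
        round(box["left"] * page_width),
        round(box["top"] * page_height),
        round(box["right"] * page_width),
        round(box["bottom"] * page_height),
    )


# Collect the tables and locate the first one on a 1000 x 2000 pixel page image.
tables = chunks_of_type(sample_output, "table")
first_table_pixels = to_pixels(tables[0]["box"], page_width=1000, page_height=2000)
```

A pixel box like this could be used to crop or highlight the source region when a reviewer needs to validate an extracted value.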

