Changelog
New updates and improvements to Agentic Document Extraction
Load Bytes
In addition to supporting PDFs and images, the parse function now supports raw bytes from PDF and image files.
For more information, go to Sample Script: Parse Files from Bytes.
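When documents arrive as raw bytes (for example, from an upload or a message queue), it can help to confirm that the bytes look like a PDF or an image before parsing them. The following is a minimal sketch, not part of the library: the helper names `sniff_kind` and `load_document_bytes` are hypothetical, and only the PDF, PNG, and JPEG signatures are checked here as an illustration.

```python
from pathlib import Path

# Leading "magic" bytes for a few common formats (standard file signatures).
# Other image formats would need their own entries.
_SIGNATURES = {
    b"%PDF-": "pdf",
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
}


def sniff_kind(data: bytes) -> str:
    """Return 'pdf', 'png', 'jpeg', or 'unknown' based on the leading bytes."""
    for magic, kind in _SIGNATURES.items():
        if data.startswith(magic):
            return kind
    return "unknown"


def load_document_bytes(path: str) -> bytes:
    """Read a file into memory and reject content with an unrecognized signature."""
    data = Path(path).read_bytes()
    if sniff_kind(data) == "unknown":
        raise ValueError(f"{path} does not look like a PDF, PNG, or JPEG")
    return data
```

The bytes returned by `load_document_bytes` are what you would then hand to the parse function; see the sample script referenced above for the exact call.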
Consolidated Parsing Function
We released library v0.2.3, which includes a new parsing function: parse.
The parse function allows you to parse multiple documents, and supports loading documents from Amazon S3 buckets, Google Drive, and other locations by using the connectors module.
To use the new parse function and the connectors module, upgrade the library to v0.2.3.
The original parsing functions will continue to work, but we recommend using parse for new projects.
Consolidated Chunk Types
We released library v0.2.1, which includes consolidated chunk types.
The library now has the following chunk types: table, figure, marginalia, and text.
These chunk types were consolidated into marginalia:
page_header
page_footer
page_number
These chunk types were consolidated into text:
title
form
key_value
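If existing code branches on the deprecated names, one way to migrate is a small lookup that translates old chunk types to the consolidated ones. This is a sketch built only from the lists above; `CONSOLIDATED_TYPE` and `normalize_chunk_type` are hypothetical names, not library APIs.

```python
# Deprecated chunk type -> consolidated chunk type, per the v0.2.1 lists above.
CONSOLIDATED_TYPE = {
    "page_header": "marginalia",
    "page_footer": "marginalia",
    "page_number": "marginalia",
    "title": "text",
    "form": "text",
    "key_value": "text",
}


def normalize_chunk_type(chunk_type: str) -> str:
    """Map a deprecated chunk type to its replacement; pass current types through."""
    return CONSOLIDATED_TYPE.get(chunk_type, chunk_type)
```

Types that were not consolidated, such as table and figure, are returned unchanged.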
Action Required When Using Library
If you use the library and your scripts or workflows use any of the deprecated chunk types, update your code to use the new types.
How the library handles the deprecated chunk types depends on the version you’re using:
- Upgrade to v0.2.1 to use the new chunk types.
- If using v0.0.13 to v0.1.3, the marginalia type doesn’t exist and will fall back to page_header.
- If using v0.0.12 or earlier, the code will NOT work after May 22.
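The version rules above can be expressed as a small check, for example in a startup script or CI step. This is a sketch under assumptions: the function names and return labels are hypothetical, version strings are assumed to be plain dotted numbers, and versions not mentioned above (between v0.1.3 and v0.2.1) are reported as unspecified rather than guessed.

```python
def parse_version(version: str) -> tuple:
    """Turn 'v0.2.1' or '0.2.1' into (0, 2, 1). Assumes numeric parts only."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def chunk_type_behavior(version: str) -> str:
    """Classify a library version by how it handles the consolidated chunk types."""
    v = parse_version(version)
    if v >= (0, 2, 1):
        return "consolidated"  # new chunk types are available
    if (0, 0, 13) <= v <= (0, 1, 3):
        return "fallback_to_page_header"  # marginalia falls back to page_header
    if v <= (0, 0, 12):
        return "breaks_after_may_22"  # code stops working after May 22
    return "unspecified"  # not covered by the changelog
```

You would pass in the installed library version, however your environment reports it.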
Action Required When Calling the API Directly
If you call the API directly and your scripts or workflows use any of the deprecated chunk types, update your code to use the new types.
We are making these same changes (consolidating the chunk types) to the API on Thursday, May 22.
Starting May 22, the API will stop using the deprecated types in the response. If your code uses the deprecated chunk types, the code will no longer work.
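Before the cutover, it may be useful to audit saved API responses for deprecated chunk types so you can find the code paths that depend on them. The sketch below assumes each chunk is a dictionary with a `chunk_type` key; that shape and the function name `count_deprecated` are assumptions for illustration, so adjust them to match the responses you actually receive.

```python
from collections import Counter

# Chunk types that the API stops returning on May 22.
DEPRECATED_TYPES = {
    "page_header", "page_footer", "page_number",
    "title", "form", "key_value",
}


def count_deprecated(chunks):
    """Count deprecated chunk types in a list of chunk dicts.

    Assumes each chunk has a 'chunk_type' key; chunks without one are skipped.
    """
    return Counter(
        chunk["chunk_type"]
        for chunk in chunks
        if chunk.get("chunk_type") in DEPRECATED_TYPES
    )
```

A non-empty result means the response still contains types that your code should stop relying on.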
Improved Accuracy
Agentic Document Extraction now delivers higher accuracy when extracting data from complex tables and multi-column layouts.
Increased Processing Speed
Agentic Document Extraction is now significantly faster than before, so you can process thousands of pages per minute.
Process Longer Pages
We’ve increased our page limits, so that you can process longer documents.
For more information, go to Rate Limits.
Zero Data Retention
Users on the Custom plan can enable a zero data retention policy, which ensures that all data is deleted immediately after processing. This supports strict privacy and compliance requirements.
For more information, contact us.
Consolidated Chunk Types
We consolidated these chunk types into page_header:
page_header
page_footer
page_number
We consolidated these chunk types into form:
form
key_value
For more information, go to Chunk Types.