Updates and improvements to Agentic Document Extraction.
Added the `ParseConfig` class for the `parse` function. This allows you to pass multiple settings (like `api_key`, `include_marginalia`, and `extraction_model`) in a single `ParseConfig` object. For detailed information, go to Pass Settings with ParseConfig.

You can now pass settings, like the API key, to the `parse` function using the new `ParseConfig` class. `agentic_doc.config.settings` will be deprecated in a future release. Configure settings with `ParseConfig` instead.

The `parse` function now supports raw bytes from PDF and image files. For more information, go to Sample Script: Parse Files from Bytes.

New `parse` function and chunk types.

The `parse` function allows you to parse multiple documents, and supports loading documents from Amazon S3 buckets, Google Drive, and other locations by using the `connectors` module.

To use the new `parse` function and the `connectors` module, upgrade the library to v0.2.3. The original parsing functions will continue to work, but we recommend using `parse` for new projects.

The chunk types are now `table`, `figure`, `marginalia`, and `text`.

These chunk types were consolidated into `marginalia`: `page_header`, `page_footer`, and `page_number`.
These chunk types were consolidated into `text`: `title`, `form`, and `key_value`.
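
As an illustration of the consolidation described above, the mapping can be written as a small lookup table. This is only a sketch and not part of the library: `CONSOLIDATED_TYPES` and `consolidate_chunk_type` are hypothetical names introduced here, and only the chunk type names themselves come from these notes.

```python
# Hypothetical helper (not part of agentic_doc): maps the earlier chunk types
# to the consolidated types listed in these release notes.
CONSOLIDATED_TYPES = {
    "page_header": "marginalia",
    "page_footer": "marginalia",
    "page_number": "marginalia",
    "title": "text",
    "form": "text",
    "key_value": "text",
}


def consolidate_chunk_type(chunk_type: str) -> str:
    """Return the consolidated type, or the input unchanged if it is not remapped."""
    return CONSOLIDATED_TYPES.get(chunk_type, chunk_type)


print(consolidate_chunk_type("page_footer"))  # marginalia
print(consolidate_chunk_type("table"))        # table
```

A lookup like this may help when migrating code that filtered on the earlier type names; types that were not consolidated, such as `table` and `figure`, pass through unchanged.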
The `marginalia` type doesn’t exist and will fall back to `page_header`.
`page_header`: `page_header`, `page_footer`, `page_number`
`form`: `form`, `key_value`
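
To make the `ParseConfig` change noted earlier in these notes more concrete, here is a minimal sketch of passing settings through a single object. Several details are assumptions that should be checked against your installed version: the import paths `agentic_doc.config` and `agentic_doc.parse`, the `config` keyword on `parse`, and the `VISION_AGENT_API_KEY` environment variable name. The helper names `build_settings` and `parse_with_config` are hypothetical.

```python
import os


def build_settings(api_key: str) -> dict:
    """Collect the settings to pass to ParseConfig (hypothetical helper)."""
    # api_key and include_marginalia are setting names mentioned in these notes.
    return {"api_key": api_key, "include_marginalia": False}


def parse_with_config(document_path: str):
    """Parse one document, passing all settings in a single ParseConfig object."""
    # Imported lazily so this module loads even where agentic_doc is not installed.
    # Assumption: these import paths match your installed version.
    from agentic_doc.config import ParseConfig
    from agentic_doc.parse import parse

    # Assumption: the API key is stored in this environment variable.
    settings = build_settings(os.environ["VISION_AGENT_API_KEY"])
    config = ParseConfig(**settings)

    # Assumption: parse accepts the settings object through a `config` keyword.
    return parse(document_path, config=config)
```

Calling `parse_with_config("path/to/document.pdf")` would then send the document with those settings. Keeping the settings in one place, instead of setting them through `agentic_doc.config.settings`, follows the direction these notes recommend.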