This article is about the legacy agentic-doc library. Use the landingai-ade library for all new projects.
The parse function in the agentic-doc library accepts a ParseConfig object to simplify and centralize configuration. The ParseConfig class is available in agentic-doc v0.3.0 and later.
ParseConfig is a configuration class you can use to group optional settings for the parse function in the agentic-doc library. Instead of passing multiple parameters individually, you can pass a single ParseConfig object that holds multiple settings.
We recommend using ParseConfig to configure settings for the parse function, instead of passing the individual parameters to the function.
Use Cases
Use ParseConfig when you want to:
- group multiple parsing options into a single object.
- reuse the same settings across multiple parse calls.
- control parsing behavior based on user input or environment.
Basic Usage
To use ParseConfig, define your settings inside a ParseConfig object and pass that object to the parse function.
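As a minimal sketch of this pattern, the example below keeps the settings in one dictionary, builds a ParseConfig object from it, and passes that object to parse. The import paths agentic_doc.config and agentic_doc.parse, the config= keyword, the SETTINGS name, and the parse_document helper are assumptions made for illustration and are not taken from this article; check them against your installed version.

```python
from typing import Any, Dict

# Settings shared by every call; the values are illustrative only.
SETTINGS: Dict[str, Any] = {
    "include_marginalia": False,
    "include_metadata_in_markdown": True,
    "split_size": 5,
}


def parse_document(path: str):
    """Parse one document with the shared settings (hypothetical helper)."""
    # Imported inside the function so this module still loads when
    # agentic-doc is not installed. Both import paths are assumptions.
    from agentic_doc.config import ParseConfig
    from agentic_doc.parse import parse

    config = ParseConfig(**SETTINGS)
    # Passing the object through a `config` keyword is an assumption.
    return parse(path, config=config)
```

Calling parse_document("document.pdf") needs agentic-doc installed and an API key available to the library.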
ParseConfig Parameters
The ParseConfig class accepts several optional parameters that control the behavior of the parse function. You can include only the settings you need.
ParseConfig accepts the following parameters:
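As a compact summary of the parameters described in this section, the sketch below models them as a dataclass with the defaults this article documents. ParseConfigSketch is a hypothetical stand-in and not the library's source code; the type hints and the None defaults for api_key, extraction_model, and extraction_schema are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ParseConfigSketch:
    """Stand-in that mirrors the documented parameters; not the library's class."""

    api_key: Optional[str] = None
    include_marginalia: bool = True
    include_metadata_in_markdown: bool = True
    extraction_model: Optional[type] = None  # a Pydantic model class
    extraction_schema: Optional[Dict[str, Any]] = None  # a JSON schema
    split_size: int = 10  # pages per chunk, 1 to 100
    extraction_split_size: int = 50  # maximum pages, 1 to 50
    enable_rotation_detection: bool = False


# Only the settings you pass differ from the defaults.
config = ParseConfigSketch(split_size=5, enable_rotation_detection=True)
print(config)
```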
api_key
The api_key parameter sets the API key for your account. For more information, go to API Key.
include_marginalia
If True, includes marginalia chunks (text in the header, footer, and margins) in the output. For more information, go to Chunk Types.
The default value is True.
include_metadata_in_markdown
If True, includes metadata in the Markdown output.
The default value is True.
extraction_model
Enter the Pydantic model schema for field extraction. For more information about extraction, go to Extract Data with the Library.
extraction_schema
Enter the JSON schema for field extraction. For more information about extraction, go to Extract Data with the Library.
split_size
When you run the parse function without field extraction, it splits each document into smaller “chunks” to speed up processing. The split_size parameter controls how many pages are included in each chunk.
- Range: 1 to 100 pages (inclusive)
- Default: 10 pages
Field extraction processes the document as a whole, so the split_size parameter is not used when running field extraction.
extraction_split_size
When you run the parse function with field extraction, it processes the document as a whole. In other words, the document is not split into smaller chunks. The extraction_split_size parameter controls the maximum number of pages the document can have when running parsing with field extraction.
- Range: 1 to 50 pages (inclusive)
- Default: 50 pages
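To make the two page limits concrete, the sketch below checks each value against its documented range, counts how many pieces a document is split into for a given split_size, and tests whether a document fits under extraction_split_size. The helper names are hypothetical and this is not the library's own validation code; it only restates the ranges and defaults given above.

```python
import math
from typing import Tuple

SPLIT_SIZE_RANGE = (1, 100)  # documented range for split_size
EXTRACTION_SPLIT_SIZE_RANGE = (1, 50)  # documented range for extraction_split_size


def check_range(name: str, value: int, bounds: Tuple[int, int]) -> int:
    """Return value unchanged, or raise if it is outside the inclusive bounds."""
    low, high = bounds
    if not low <= value <= high:
        raise ValueError(f"{name} must be between {low} and {high}, got {value}")
    return value


def count_parts(total_pages: int, split_size: int = 10) -> int:
    """Number of pieces when parsing without field extraction."""
    check_range("split_size", split_size, SPLIT_SIZE_RANGE)
    return math.ceil(total_pages / split_size)


def fits_extraction_limit(total_pages: int, extraction_split_size: int = 50) -> bool:
    """Whether a document is within the page limit for field extraction."""
    check_range("extraction_split_size", extraction_split_size, EXTRACTION_SPLIT_SIZE_RANGE)
    return total_pages <= extraction_split_size


print(count_parts(25))                # 3 (10 + 10 + 5 pages)
print(count_parts(25, split_size=5))  # 5
print(fits_extraction_limit(60))      # False
```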
enable_rotation_detection
The enable_rotation_detection parameter controls whether the parse function detects and “corrects” rotated documents.
When the parameter is set to True, the parse function detects whether pages are rotated and automatically corrects text and table chunks for better extraction accuracy.
The default value is False.
Rotation detection is available in the agentic-doc library v0.3.3 and later.
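Because ParseConfig and rotation detection arrived in different releases, it can help to check the installed version before relying on either. The sketch below is one way to do that with the standard library; the helper names are hypothetical, and the version parsing is deliberately simple.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

PARSE_CONFIG_MIN = (0, 3, 0)  # ParseConfig: v0.3.0 and later
ROTATION_DETECTION_MIN = (0, 3, 3)  # rotation detection: v0.3.3 and later


def version_tuple(text: str) -> Tuple[int, ...]:
    """Turn '0.3.3' into (0, 3, 3), padded with zeros to three components.

    A component that is not purely numeric, and everything after it, is dropped.
    """
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def supports(minimum: Tuple[int, ...], installed: Optional[str] = None) -> bool:
    """Whether the given (or installed) agentic-doc version meets the minimum."""
    if installed is None:
        try:
            installed = version("agentic-doc")
        except PackageNotFoundError:
            return False  # the package is not installed
    return version_tuple(installed) >= minimum


print(supports(ROTATION_DETECTION_MIN, "0.3.2"))  # False
print(supports(ROTATION_DETECTION_MIN, "0.3.3"))  # True
```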
Settings Precedence
We recommend using ParseConfig to configure settings for the parse function, instead of passing the individual parameters to the function. However, if you configure one setting using both methods, values passed directly to the parse function take precedence over values passed in the ParseConfig object.
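For example, if a ParseConfig object sets include_marginalia to False but include_marginalia=True is also passed directly to the parse function, the result will include marginalia, because the direct argument takes precedence over the value passed in the ParseConfig object.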
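The marginalia scenario can be sketched in two parts. resolve_setting is a hypothetical model of the precedence rule, not the library's implementation. parse_with_override shows the same conflict as a library call; its import paths, the config= keyword, and the helper name are assumptions for illustration.

```python
from typing import Optional


def resolve_setting(direct: Optional[bool], from_config: Optional[bool], default: bool) -> bool:
    """Model of the rule: direct argument first, then ParseConfig, then the default."""
    if direct is not None:
        return direct
    if from_config is not None:
        return from_config
    return default


# ParseConfig says False, the direct argument says True: marginalia are included.
include_marginalia = resolve_setting(direct=True, from_config=False, default=True)
print(include_marginalia)  # True


def parse_with_override(path: str):
    """The same conflict as a library call (needs agentic-doc and an API key)."""
    # Imported inside the function so this module still loads when
    # agentic-doc is not installed. Both import paths are assumptions.
    from agentic_doc.config import ParseConfig
    from agentic_doc.parse import parse

    config = ParseConfig(include_marginalia=False)
    # The direct argument is expected to win over the value in config.
    return parse(path, config=config, include_marginalia=True)
```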