This article is about the legacy agentic-doc library. Use the landingai-ade library for all new projects.
The parse function in the agentic-doc library accepts a ParseConfig object that groups optional settings in one place. Instead of passing multiple parameters to parse individually, you can pass a single ParseConfig object that holds them. ParseConfig is available in agentic-doc v0.3.0 and later. We recommend using ParseConfig to configure the parse function instead of passing individual parameters.

Use Cases

Use ParseConfig when you want to:
  • group multiple parsing options into a single object.
  • reuse the same settings across multiple parse calls.
  • control parsing behavior based on user input or environment.
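The use cases above often go together: build the settings once, for example from environment variables, and then reuse the resulting ParseConfig for every file you parse. The sketch below is a minimal illustration under stated assumptions. The variable names PARSE_INCLUDE_MARGINALIA and PARSE_SPLIT_SIZE are hypothetical and are not read by the library itself. The helper only builds keyword arguments for ParseConfig, so it runs without agentic-doc installed.

```python
import os
from typing import Any, Mapping, Optional


def _parse_bool(value: Optional[str]) -> Optional[bool]:
    """Map "1"/"true"/"yes"/"on" to True, any other string to False, and None to None."""
    if value is None:
        return None
    return value.strip().lower() in {"1", "true", "yes", "on"}


def config_kwargs_from_env(env: Mapping[str, str] = os.environ) -> dict[str, Any]:
    """Collect ParseConfig keyword arguments from hypothetical environment variables.

    Only settings that are present are returned, so anything left unset
    falls back to the library's own defaults.
    """
    kwargs: dict[str, Any] = {}
    marginalia = _parse_bool(env.get("PARSE_INCLUDE_MARGINALIA"))  # hypothetical name
    if marginalia is not None:
        kwargs["include_marginalia"] = marginalia
    split_size = env.get("PARSE_SPLIT_SIZE")  # hypothetical name
    if split_size is not None:
        kwargs["split_size"] = int(split_size)
    return kwargs


# Usage with the library (not executed here):
#   from agentic_doc.parse import parse
#   from agentic_doc.config import ParseConfig
#   config = ParseConfig(**config_kwargs_from_env())
#   for path in ["path/to/a.pdf", "path/to/b.pdf"]:
#       results = parse(path, config=config)  # same settings for every call
```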

Basic Usage

To use ParseConfig, define your settings inside a ParseConfig object and pass that object to the parse function.
from agentic_doc.parse import parse
from agentic_doc.config import ParseConfig

# Set up a configuration object
config = ParseConfig(
    api_key="your-api-key",  
    include_marginalia=False,
    include_metadata_in_markdown=True,
    split_size=5,
)

# Pass the configuration object and parse the file using those settings
results = parse("path/to/file.pdf", config=config)

ParseConfig Parameters

The ParseConfig class accepts several optional parameters that control the behavior of the parse function. Every parameter defaults to None, so you can include only the settings you need. Here is the ParseConfig definition:
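from typing import Any, Optional, TypeVar

from pydantic import BaseModel

# T stands for the Pydantic model class passed as extraction_model
T = TypeVar("T", bound=BaseModel)


class ParseConfig: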
    def __init__(
        self,
        api_key: Optional[str] = None,
        include_marginalia: Optional[bool] = None,
        include_metadata_in_markdown: Optional[bool] = None,
        extraction_model: Optional[type[T]] = None,
        extraction_schema: Optional[dict[str, Any]] = None,
        split_size: Optional[int] = None,
        extraction_split_size: Optional[int] = None,
        enable_rotation_detection: Optional[bool] = None,
    ) -> None:
        self.api_key = api_key
        self.include_marginalia = include_marginalia
        self.include_metadata_in_markdown = include_metadata_in_markdown
        self.extraction_model = extraction_model
        self.extraction_schema = extraction_schema
        self.split_size = split_size
        self.extraction_split_size = extraction_split_size
        self.enable_rotation_detection = enable_rotation_detection

api_key

The api_key parameter sets the API key for your account. For more information, go to API Key.

include_marginalia

If True, includes marginalia chunks (text in the header, footer, and margins) in the output. For more information, go to Chunk Types. The default value is True.

include_metadata_in_markdown

If True, includes metadata in the Markdown output. The default value is True.

extraction_model

A Pydantic model class that defines the fields to extract. For more information about extraction, go to Extract Data with the Library.

extraction_schema

A JSON schema, passed as a Python dictionary, that defines the fields to extract. For more information about extraction, go to Extract Data with the Library.
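To show the shape of a value for extraction_schema, here is a minimal sketch of a JSON schema written as a plain Python dictionary. The invoice fields are hypothetical and chosen only for illustration; check Extract Data with the Library for any constraints the library places on schemas.

```python
import json
from typing import Any

# Hypothetical schema describing two fields to extract from an invoice.
invoice_schema: dict[str, Any] = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string", "description": "Invoice identifier"},
        "total_amount": {"type": "number", "description": "Total amount due"},
    },
    "required": ["invoice_number", "total_amount"],
}

# The schema is an ordinary dict, so it can be stored or inspected as JSON.
print(json.dumps(invoice_schema, indent=2))

# With the library (not executed here):
#   config = ParseConfig(extraction_schema=invoice_schema)
```

If you prefer to define the fields as a Python class, a Pydantic model with the same fields can be passed as extraction_model instead.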

split_size

When you run the parse function without field extraction, each document is split into smaller “chunks” of pages to speed up processing. The split_size parameter controls how many pages are included in each chunk.
  • Range: 1 to 100 pages (inclusive)
  • Default: 10 pages
Field extraction processes the document as a whole, so the split_size parameter is not used when running field extraction.
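The split arithmetic is easy to check by hand. The sketch below computes the page range covered by each split for a given split_size and enforces the documented 1 to 100 range. The function name page_ranges is hypothetical; this illustrates the arithmetic only and is not the library's implementation.

```python
def page_ranges(total_pages: int, split_size: int = 10) -> list[tuple[int, int]]:
    """Return 1-based, inclusive (first_page, last_page) pairs, one per split."""
    if not 1 <= split_size <= 100:
        raise ValueError("split_size must be between 1 and 100 (inclusive)")
    if total_pages < 1:
        raise ValueError("total_pages must be at least 1")
    # Each split starts split_size pages after the previous one;
    # the last split stops at the final page of the document.
    return [
        (start, min(start + split_size - 1, total_pages))
        for start in range(1, total_pages + 1, split_size)
    ]


print(page_ranges(23))  # [(1, 10), (11, 20), (21, 23)] with the default of 10
print(len(page_ranges(23, split_size=5)))  # 5 splits
```

A smaller split_size produces more, shorter splits; the final split simply holds whatever pages remain.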

extraction_split_size

When you run the parse function with field extraction, the document is processed as a whole; it is not split into smaller chunks. The extraction_split_size parameter sets the maximum number of pages a document can have when you run parsing with field extraction.
  • Range: 1 to 50 pages (inclusive)
  • Default: 50 pages
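Because extraction_split_size is a page limit rather than a split size, a pre-check can compare a document's page count against it before you start field extraction. The sketch below is a hypothetical helper; it assumes you already know the page count, and it does not model what the library does with documents over the limit.

```python
def fits_extraction_limit(total_pages: int, extraction_split_size: int = 50) -> bool:
    """Return True if a document's page count is within the extraction limit."""
    if not 1 <= extraction_split_size <= 50:
        raise ValueError("extraction_split_size must be between 1 and 50 (inclusive)")
    # Field extraction handles the whole document at once, so only the
    # total page count matters.
    return total_pages <= extraction_split_size


print(fits_extraction_limit(40))  # True with the default limit of 50
print(fits_extraction_limit(40, extraction_split_size=30))  # False
```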

enable_rotation_detection

The enable_rotation_detection parameter controls whether rotated pages are detected and “corrected”. When it is set to True, the parse function detects whether pages are rotated and automatically corrects text and table chunks for better extraction accuracy. The default value is False.
Rotation detection is available in the agentic-doc library v0.3.3 and later.
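Because ParseConfig requires v0.3.0 and rotation detection requires v0.3.3, code that must run against several agentic-doc versions can check the installed version before setting enable_rotation_detection. The sketch below assumes plain numeric version strings such as "0.3.3"; the helper names are hypothetical. For pre-release or other PEP 440 version strings, use packaging.version.Version instead of this simple tuple comparison.

```python
def version_tuple(version: str) -> tuple[int, ...]:
    """Convert a plain release string such as "0.3.3" into (0, 3, 3)."""
    return tuple(int(part) for part in version.split("."))


def supports_parse_config(version: str) -> bool:
    # ParseConfig is available in v0.3.0 and later.
    return version_tuple(version) >= (0, 3, 0)


def supports_rotation_detection(version: str) -> bool:
    # enable_rotation_detection is available in v0.3.3 and later.
    return version_tuple(version) >= (0, 3, 3)


# Usage with the library installed (not executed here):
#   from importlib.metadata import version
#   installed = version("agentic-doc")
#   kwargs = {"enable_rotation_detection": True} if supports_rotation_detection(installed) else {}
#   config = ParseConfig(**kwargs)
```

Comparing integer tuples avoids the string-comparison trap where "0.10.0" would sort before "0.3.3".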

Settings Precedence

If you configure the same setting both in a ParseConfig object and as a direct argument to the parse function, the direct argument takes precedence. For example, the result of the script below includes marginalia, because include_marginalia=True is passed directly to parse and overrides include_marginalia=False in the ParseConfig object.
from agentic_doc.parse import parse
from agentic_doc.config import ParseConfig

# Set up a configuration object
config = ParseConfig(
    include_marginalia=False,  # This will be overridden
    include_metadata_in_markdown=False,
)

# Call parse with a conflicting direct argument
results = parse(
    "path/to/file.pdf",
    include_marginalia=True,  # This overrides the config setting
    config=config,
)

# The result will include marginalia, because the direct argument takes precedence
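The precedence rule can be summarized as a three-step lookup: a direct argument wins, then the ParseConfig value, then the library default. The sketch below models that order for the two settings whose defaults this article states. It assumes that None means "not set", which matches the None defaults in ParseConfig; the function resolve_setting and the DEFAULTS table are hypothetical and are not the library's code.

```python
from typing import Any, Optional

# Defaults stated in this article for two boolean settings.
DEFAULTS: dict[str, Any] = {
    "include_marginalia": True,
    "include_metadata_in_markdown": True,
}


def resolve_setting(name: str, direct_value: Optional[Any], config_value: Optional[Any]) -> Any:
    """Resolve one setting: direct argument, then ParseConfig value, then default."""
    if direct_value is not None:
        return direct_value
    if config_value is not None:
        return config_value
    return DEFAULTS[name]


# Mirrors the script above: the config says False, the direct argument says True.
print(resolve_setting("include_marginalia", True, False))  # True
# Only the config sets this value, so it is used.
print(resolve_setting("include_metadata_in_markdown", None, False))  # False
# Neither source sets the value, so the default applies.
print(resolve_setting("include_marginalia", None, None))  # True
```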