The parse function in the agentic-doc library accepts a ParseConfig object to simplify and centralize configuration. The ParseConfig class is available in agentic-doc v0.3.0 and later.

ParseConfig is a configuration class that groups the optional settings for the parse function. Instead of passing each parameter individually, you pass a single ParseConfig object that holds them.

We recommend using ParseConfig to configure settings for the parse function instead of passing individual parameters to the function.

Use Cases

Use ParseConfig when you want to:

  • group multiple parsing options into a single object.
  • reuse the same settings across multiple parse calls.
  • control parsing behavior based on user input or environment.

Basic Usage

To use ParseConfig, define your settings inside a ParseConfig object and pass that object to the parse function.

from agentic_doc.parse import parse
from agentic_doc.config import ParseConfig

# Set up a configuration object
config = ParseConfig(
    api_key="your-api-key",
    include_marginalia=False,
    include_metadata_in_markdown=True,
    split_size=5,
)

# Pass the configuration object and parse the file using those settings
results = parse("path/to/file.pdf", config=config)
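The use cases above mention reusing settings across calls and controlling parsing behavior from the environment. The following is a minimal sketch of one way to do that, not part of the agentic-doc library. The helper names (settings_from_env, _as_bool) and the environment variable names (DOC_INCLUDE_MARGINALIA, DOC_SPLIT_SIZE) are hypothetical and chosen only for illustration. The helper returns a plain dictionary of keyword arguments, so it runs without the library installed.

```python
import os
from typing import Any, Mapping


def _as_bool(value: str) -> bool:
    """Interpret common truthy strings; anything else counts as False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def settings_from_env(env: Mapping[str, str] = os.environ) -> dict[str, Any]:
    """Collect ParseConfig keyword arguments from environment variables.

    The variable names below are hypothetical. Only variables that are
    actually set are included, so unset options are left out entirely.
    """
    settings: dict[str, Any] = {}
    if "DOC_INCLUDE_MARGINALIA" in env:
        settings["include_marginalia"] = _as_bool(env["DOC_INCLUDE_MARGINALIA"])
    if "DOC_SPLIT_SIZE" in env:
        settings["split_size"] = int(env["DOC_SPLIT_SIZE"])
    return settings


# Usage with agentic-doc installed (left commented so the sketch runs anywhere):
# from agentic_doc.parse import parse
# from agentic_doc.config import ParseConfig
#
# config = ParseConfig(**settings_from_env())
# for path in ["path/to/first.pdf", "path/to/second.pdf"]:
#     results = parse(path, config=config)  # same settings reused for each call
```

Because the config object is built once and passed to every call, changing an environment variable changes the behavior of all parse calls without touching the call sites.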

ParseConfig Parameters

The ParseConfig class accepts several optional parameters that control the behavior of the parse function. You can include only the settings you need.

Here is the full list of accepted parameters:

ParseConfig(
    api_key: Optional[str],
    include_marginalia: Optional[bool],
    include_metadata_in_markdown: Optional[bool],
    extraction_model: Optional[type[T]],
    extraction_schema: Optional[dict[str, Any]],
    split_size: Optional[int],
    extraction_split_size: Optional[int],
)

Settings Precedence

If you configure the same setting both ways, the value passed directly to the parse function takes precedence over the value in the ParseConfig object.

For example, the result of the script below will include marginalia, because the direct argument takes precedence over the value passed in the ParseConfig object.

from agentic_doc.parse import parse
from agentic_doc.config import ParseConfig

# Set up a configuration object
config = ParseConfig(
    include_marginalia=False,  # This will be overridden
    include_metadata_in_markdown=False,
)

# Call parse with a conflicting direct argument
results = parse(
    "path/to/file.pdf",
    include_marginalia=True,  # This overrides the config setting
    config=config,
)

# The result will include marginalia, because the direct argument takes precedence
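The precedence rule can be modeled in a few lines of plain Python. This is an illustrative sketch only, not the library's implementation: the function name resolve_setting is hypothetical, and it assumes that None means "not set" and that a fallback default exists. The default value used in the usage lines is also an assumption.

```python
from typing import Any, Optional


def resolve_setting(direct: Optional[Any], from_config: Optional[Any], default: Any) -> Any:
    """Pick the effective value for one setting.

    Order of precedence (assuming None means "not set"):
    1. the argument passed directly to the function
    2. the value stored in the config object
    3. a fallback default
    """
    if direct is not None:
        return direct
    if from_config is not None:
        return from_config
    return default


# Mirrors the example above: the direct argument (True) overrides the config value (False).
effective = resolve_setting(direct=True, from_config=False, default=True)

# With no direct argument, the config value is used instead.
config_only = resolve_setting(direct=None, from_config=False, default=True)
```

Note the explicit comparison with None rather than a truthiness check: a direct value of False is a deliberate setting and must still override the config value.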