This article is about the legacy agentic-doc library. Use the landingai-ade library for all new projects.
Deprecated Parsing Functions
The parsing functions below were deprecated in v0.2.3 of the library. These functions will continue to work in later versions, but we recommend implementing the parse function instead.
Deprecated functions:
- parse_documents: Use if you want to parse one or more documents and return the output as objects.
- parse_and_save_documents: Use if you want to parse one or more documents and save the output as JSON files.
- parse_and_save_document: Use if you want to parse only one file. You can have the output returned as objects or saved as a JSON file.
Parse Documents and Return Results as Objects
Use the parse_documents function to parse one or more documents and return the output as objects. You have the option to save the visual groundings to a directory.
When to Use: Immediate Processing
Because the parse_documents function returns extracted data as objects, this function is best for immediate downstream processing, integrations with other systems, or interactive environments (like Jupyter Notebook).
Use this when:
- You’re running the script in a notebook or web service that will immediately process, transform, or display the data.
- You need to pass the data to another function or microservice as part of a larger pipeline.
- You’re working in an interactive environment (like a Jupyter Notebook or a web-based UI).
- You want to avoid writing to disk due to permission issues or cloud function constraints.
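As a rough illustration of the in-memory flow described above, the sketch below hands parsed results straight to a downstream step without writing anything to disk. The objects are hypothetical stand-ins built with SimpleNamespace. The markdown attribute and the word_counts helper are assumptions made for this example, not part of the API documented here.

```python
from types import SimpleNamespace
from typing import Iterable, List


def word_counts(parsed_docs: Iterable) -> List[int]:
    """Downstream step: count the words in each document's Markdown."""
    return [len(doc.markdown.split()) for doc in parsed_docs]


# Hypothetical stand-ins for the objects a parse call would return.
# Only a `markdown` attribute is assumed here.
parsed_docs = [
    SimpleNamespace(markdown="# Score Report\nTotal score 1200"),
    SimpleNamespace(markdown="# Statement\nBalance due"),
]

counts = word_counts(parsed_docs)
print(counts)
```

In a real pipeline, the list returned by the parse call would take the place of the stand-in list.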
Sample Script
This script parses two PDFs and returns the results as both Markdown and JSON objects. This example uses documents hosted at URLs, but local files are also supported.
Function Signature
Parameters
Here are the parameters for the parse_documents function:
- documents: List of paths to documents or URLs pointing to documents.
- include_marginalia: If True, includes marginalia chunks (text in the header, footer, and margins) in the output. For more information, go to Chunk Types. Defaults to True. (Optional)
- include_metadata_in_markdown: If True, includes metadata in the Markdown output. Defaults to True. (Optional)
- grounding_save_dir: The directory where grounding images will be saved. For more information, go to Save Groundings as Images. (Optional)
- extraction_model: Pydantic model schema for field extraction. For more information about extraction, go to Extract Data with the Library. (Optional)
- extraction_schema: JSON schema for field extraction. For more information about extraction, go to Extract Data with the Library. (Optional)
- config: Pass configuration settings with the ParseConfig object. For more information about using this parameter, go to Pass Settings with ParseConfig. (Optional)
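One way to keep the optional settings in one place is to collect them in a dictionary and check the keys against the documented parameter names before making the call. The sketch below does only that and does not call the library. The schema fields (invoice_number, total_due) and the ./groundings directory are invented for illustration.

```python
import json

# Optional parameter names taken from the list above.
DOCUMENTED_OPTIONS = {
    "include_marginalia",
    "include_metadata_in_markdown",
    "grounding_save_dir",
    "extraction_model",
    "extraction_schema",
    "config",
}

# Hypothetical JSON schema; the field names are made up for this example.
extraction_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total_due": {"type": "number"},
    },
    "required": ["invoice_number"],
}

options = {
    "include_marginalia": False,
    "include_metadata_in_markdown": True,
    "grounding_save_dir": "./groundings",  # hypothetical directory
    "extraction_schema": extraction_schema,
}

# Catch misspelled option names before they reach the parser.
unknown = set(options) - DOCUMENTED_OPTIONS
if unknown:
    raise ValueError(f"Unknown option(s): {sorted(unknown)}")

print(json.dumps(options, indent=2))
```

Assuming the function accepts these settings as keyword arguments, the dictionary could then be unpacked into the call, for example parse_documents(documents, **options).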
Returns
The parse_documents function returns a list of ParsedDocument objects. For more information, go to ParsedDocument.
Raises
The parse_documents function can raise these errors:
- FileNotFoundError: This error is raised if the provided file path does not exist.
- ValueError: This error is raised if the file type is not supported or a URL is invalid.
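The two documented errors can be handled separately so that callers see a clear message for each case. The following is a minimal sketch, not the library's own error handling. parse_safely is a hypothetical wrapper that accepts any parser callable, and fake_parse is a stand-in that only mimics the documented exceptions so the example runs without the library.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple


def parse_safely(
    parse_fn: Callable[[Sequence[str]], List[Any]],
    documents: Sequence[str],
) -> Tuple[Optional[List[Any]], Optional[str]]:
    """Call parse_fn and turn the two documented errors into a message."""
    try:
        return parse_fn(documents), None
    except FileNotFoundError as exc:
        return None, f"missing file: {exc}"
    except ValueError as exc:
        return None, f"unsupported file type or invalid URL: {exc}"


def fake_parse(documents: Sequence[str]) -> List[str]:
    # Hypothetical stand-in for the real parser; it only imitates
    # the exceptions listed above.
    for doc in documents:
        if doc.endswith(".xyz"):
            raise ValueError(doc)
        if doc.startswith("missing"):
            raise FileNotFoundError(doc)
    return [f"parsed:{doc}" for doc in documents]


results, error = parse_safely(fake_parse, ["report.pdf"])
print(results, error)
```

Passing the parser in as an argument keeps the wrapper independent of any particular import path.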
Parse Documents and Save Results as JSON Files
Use the parse_and_save_documents function if you want to parse one or more documents and save the output as JSON files in a specified directory.
You have the option to save the visual groundings to a directory.
When to Use: Persistence and Auditing
Because the parse_and_save_documents function saves the output as JSON files, this function is best for use cases that require persistent storage or auditing.
Use this when:
- You want to store the extracted output for future reference, manual review, or archiving.
- You have a pipeline that uses batch processing or file watchers.
- You need to debug the output separately or share it with others.
- Your process includes a manual review step, such as human-in-the-loop verification.
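For review or auditing workflows like those above, saved results can be read back with the standard library alone. The sketch below loads every JSON file in a results directory. The load_saved_results helper and the placeholder payload are assumptions for illustration, because the layout of the saved JSON is not described in this article.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict


def load_saved_results(result_dir: Path) -> Dict[str, dict]:
    """Map each JSON file name in result_dir to its decoded content."""
    return {
        path.name: json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(result_dir.glob("*.json"))
    }


# Demo with a temporary directory and a made-up payload, since the real
# JSON layout is not described here.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp)
    (demo_dir / "document_20250313_070305.json").write_text(
        json.dumps({"placeholder": "structured data"}), encoding="utf-8"
    )
    loaded = load_saved_results(demo_dir)

print(sorted(loaded))
```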
Sample Script
This script parses two PDFs and saves the output as JSON files in this directory: ./parsed_results. This example uses documents hosted at URLs, but local files are also supported.
Function Signature
Parameters
Here are the parameters for the parse_and_save_documents function:
- documents: List of paths to documents or URLs pointing to documents.
- result_save_dir: The directory where the JSON files will be saved.
- include_marginalia: If True, includes marginalia chunks (text in the header, footer, and margins) in the output. For more information, go to Chunk Types. Defaults to True. (Optional)
- include_metadata_in_markdown: If True, includes metadata in the Markdown output. Defaults to True. (Optional)
- grounding_save_dir: The directory where grounding images will be saved. For more information, go to Save Groundings as Images. (Optional)
- extraction_model: Pydantic model schema for field extraction. For more information about extraction, go to Extract Data with the Library. (Optional)
- extraction_schema: JSON schema for field extraction. For more information about extraction, go to Extract Data with the Library. (Optional)
- config: Pass configuration settings with the ParseConfig object. For more information about using this parameter, go to Pass Settings with ParseConfig. (Optional)
Returns
The parse_and_save_documents function returns a list of file paths to the JSON files that the function created. The JSON files contain the structured data for the extracted elements.
The file paths are returned in the same order as the input file paths. The JSON file name is the original file name with a timestamp appended. For example, if the input file is “document.pdf”, the output file could be “document_20250313_070305.json”.
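The naming pattern and the ordering guarantee can be illustrated with a short sketch. The timestamp format (YYYYMMDD_HHMMSS) is inferred from the example file name above and may not match the library exactly. Both helper names are hypothetical.

```python
from datetime import datetime
from pathlib import Path
from typing import Dict, Sequence


def timestamped_json_name(input_name: str, when: datetime) -> str:
    """Build '<stem>_<YYYYMMDD_HHMMSS>.json' (format inferred, not confirmed)."""
    stamp = when.strftime("%Y%m%d_%H%M%S")
    return f"{Path(input_name).stem}_{stamp}.json"


def pair_inputs_with_outputs(
    inputs: Sequence[str], outputs: Sequence[str]
) -> Dict[str, str]:
    """Match each input to its output by position, relying on the stated ordering."""
    if len(inputs) != len(outputs):
        raise ValueError("inputs and outputs must have the same length")
    return dict(zip(inputs, outputs))


name = timestamped_json_name("document.pdf", datetime(2025, 3, 13, 7, 3, 5))
print(name)
```

Because the output paths follow the input order, pairing by position is enough to trace each JSON file back to its source document.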
Example return:
Raises
- FileNotFoundError: This error is raised if the provided file path does not exist.
- ValueError: This error is raised if the file type is not supported or a URL is invalid.
Parse One Document
Use the parse_and_save_document function if you want to parse one document. You have the option to either return the output as objects or save the output as a JSON file in a specified directory.
You have the option to save the visual groundings to a directory.
Sample Script
This script parses a PDF and saves the output as a JSON file in this directory: ./parsed_results. This example uses a document hosted at a URL, but local files are also supported.
Function Signature
Parameters
- document: The path to a document or URL pointing to a document.
- result_save_dir: The directory where the JSON files will be saved.
- include_marginalia: If True, includes marginalia chunks (text in the header, footer, and margins) in the output. For more information, go to Chunk Types. Defaults to True. (Optional)
- include_metadata_in_markdown: If True, includes metadata in the Markdown output. Defaults to True. (Optional)
- grounding_save_dir: The directory where grounding images will be saved. For more information, go to Save Groundings as Images. (Optional)
- connector_path: Path for connector to search (when using connectors).
- connector_pattern: Pattern to filter files (when using connectors).
- extraction_model: Pydantic model schema for field extraction. For more information about extraction, go to Extract Data with the Library. (Optional)
- extraction_schema: JSON schema for field extraction. For more information about extraction, go to Extract Data with the Library. (Optional)
- config: Pass configuration settings with the ParseConfig object. For more information about using this parameter, go to Pass Settings with ParseConfig. (Optional)
Returns
If the result_save_dir parameter is included, the function returns the file path to the JSON file that the function created.
The JSON file contains the structured data for the extracted elements. The JSON file name is the original file name with a timestamp appended. For example, if the input file is “document.pdf”, the output file could be “document_20250313_070305.json”.
If the result_save_dir parameter is not included, the function returns a list of ParsedDocument objects. For more information, go to ParsedDocument Object.
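Since the return value depends on whether result_save_dir is set, calling code may want to branch on the result type. The sketch below is one way to do that. The describe_result helper is hypothetical, and it assumes a saved result arrives as a string or path-like object while anything else is in-memory output.

```python
import os
from pathlib import Path
from typing import Any


def describe_result(result: Any) -> str:
    """Report whether a result is a saved file path or in-memory output."""
    if isinstance(result, (str, os.PathLike)):
        # Assumed shape when result_save_dir was provided.
        return f"saved to {Path(result).name}"
    # Assumed shape when result_save_dir was omitted.
    return "returned in memory"


saved = describe_result(Path("parsed_results") / "document_20250313_070305.json")
in_memory = describe_result([object()])
print(saved)
print(in_memory)
```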
Raises
- FileNotFoundError: This error is raised if the provided file path does not exist.
- ValueError: This error is raised if the file type is not supported or a URL is invalid.

