parse_documents: Use if you want to parse one or more documents and return the output as objects.
parse_and_save_documents: Use if you want to parse one or more documents and save the output as JSON files.
parse_and_save_document: Use if you want to parse only one file. You can have the output returned as objects or saved as a JSON file.

Use the parse_documents function to parse one or more documents and return the output as objects. You have the option to save the visual groundings to a directory.
The parse_documents function returns extracted data as objects, so it is best for immediate downstream processing, integrations with other systems, or interactive environments (like Jupyter Notebook).
Use this when:
Parameters for the parse_documents function:
documents: List of paths to documents or URLs pointing to documents.
include_marginalia: If True, includes marginalia chunks (text in the header, footer, and margins) in the output. For more information, go to Chunk Types. Defaults to True. (Optional)
include_metadata_in_markdown: If True, includes metadata in the Markdown output. Defaults to True. (Optional)
grounding_save_dir: The directory where grounding images will be saved. For more information, go to Save Groundings as Images. (Optional)
extraction_model: Pydantic model schema for field extraction. For more information about extraction, go to Extract Data with the Library. (Optional)
extraction_schema: JSON schema for field extraction. For more information about extraction, go to Extract Data with the Library. (Optional)
config: Pass configuration settings with the ParseConfig object. For more information about using this parameter, go to Pass Settings with ParseConfig. (Optional)

The parse_documents function returns a list of ParsedDocument objects. For more information, go to ParsedDocument Object.
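
As a rough illustration of working with the returned objects, the sketch below parses a list of documents and collects the Markdown text of each result. The import path agentic_doc.parse, the markdown attribute on ParsedDocument, and the helper name collect_markdown are assumptions made for this sketch and are not confirmed by this page; check them against your installed version.

```python
from typing import Any, Callable, Iterable, List, Optional


def collect_markdown(
    documents: List[str],
    parse_fn: Optional[Callable[..., Iterable[Any]]] = None,
    **options: Any,
) -> List[str]:
    """Parse documents and return the Markdown text of each parsed result.

    parse_fn is injectable so the helper can be exercised without network access.
    """
    if parse_fn is None:
        # Assumption: parse_documents is importable from agentic_doc.parse.
        from agentic_doc.parse import parse_documents

        parse_fn = parse_documents

    # Options such as include_marginalia or grounding_save_dir pass through unchanged.
    parsed = parse_fn(documents, **options)

    # Assumption: each ParsedDocument exposes its Markdown as a `markdown` attribute.
    return [doc.markdown for doc in parsed]
```

For example, calling collect_markdown(["path/to/document.pdf"], include_marginalia=False) would parse one local file and leave out header, footer, and margin text. The path is a placeholder.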
The parse_documents function can raise these errors:
FileNotFoundError: This error is raised if the provided file path does not exist.
ValueError: This error is raised if the file type is not supported or a URL is invalid.

Use the parse_and_save_documents function if you want to parse one or more documents and save the output as JSON files in a specified directory.
You have the option to save the visual groundings to a directory.
The parse_and_save_documents function saves the output as JSON files, so it is best for use cases that require persistent storage or auditing.
Use this when:
The example saves the output to ./parsed_results. This example uses documents hosted at URLs, but local files are also supported.
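
A minimal sketch of such a call is shown below, assuming parse_and_save_documents can be imported from agentic_doc.parse. The helper name save_parsed_results and the example.com URLs are hypothetical placeholders, and creating the directory in advance is a precaution of this sketch rather than a documented requirement.

```python
from pathlib import Path
from typing import Any, Callable, List, Optional, Sequence


def save_parsed_results(
    documents: Sequence[str],
    result_save_dir: str = "./parsed_results",
    save_fn: Optional[Callable[..., List[Any]]] = None,
) -> List[Path]:
    """Parse documents and return the paths of the JSON files that were written."""
    if save_fn is None:
        # Assumption: parse_and_save_documents is importable from agentic_doc.parse.
        from agentic_doc.parse import parse_and_save_documents

        save_fn = parse_and_save_documents

    # Creating the directory up front is a precaution taken by this sketch;
    # the library may already create it on its own.
    Path(result_save_dir).mkdir(parents=True, exist_ok=True)

    saved = save_fn(list(documents), result_save_dir=result_save_dir)
    return [Path(p) for p in saved]


# Placeholder URLs for illustration only; replace them with real document URLs.
EXAMPLE_DOCUMENTS = [
    "https://example.com/document-1.pdf",
    "https://example.com/document-2.pdf",
]
```

Calling save_parsed_results(EXAMPLE_DOCUMENTS) with real URLs would write one JSON file per document under ./parsed_results. Local file paths can be passed in the same list.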
Parameters for the parse_and_save_documents function:
documents: List of paths to documents or URLs pointing to documents.
result_save_dir: The directory where the JSON files will be saved.
include_marginalia: If True, includes marginalia chunks (text in the header, footer, and margins) in the output. For more information, go to Chunk Types. Defaults to True. (Optional)
include_metadata_in_markdown: If True, includes metadata in the Markdown output. Defaults to True. (Optional)
grounding_save_dir: The directory where grounding images will be saved. For more information, go to Save Groundings as Images. (Optional)
extraction_model: Pydantic model schema for field extraction. For more information about extraction, go to Extract Data with the Library. (Optional)
extraction_schema: JSON schema for field extraction. For more information about extraction, go to Extract Data with the Library. (Optional)
config: Pass configuration settings with the ParseConfig object. For more information about using this parameter, go to Pass Settings with ParseConfig. (Optional)

The parse_and_save_documents function returns a list of file paths to the JSON files that the function created. The JSON files contain the structured data for the extracted elements.
The file paths are sorted in the same order as the input file paths. The JSON file name is the original file name with a timestamp appended. For example, if the input file is “document.pdf”, the output file could be “document_20250313_070305.json”.
Example return:
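
As a stand-in illustration of the naming pattern described above, the sketch below builds a file name of the same shape from an input name and a timestamp. The YYYYMMDD_HHMMSS layout is inferred from the single documented example, and the helper expected_json_name is hypothetical; the library's own naming code may differ, for instance in which clock or time zone it uses.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def expected_json_name(input_path: str, when: Optional[datetime] = None) -> str:
    """Build a name shaped like the documented example: <stem>_<YYYYMMDD>_<HHMMSS>.json."""
    when = when or datetime.now()
    # Path(...).stem drops the final extension: "document.pdf" -> "document".
    stem = Path(input_path).stem
    return f"{stem}_{when.strftime('%Y%m%d_%H%M%S')}.json"


print(expected_json_name("document.pdf", datetime(2025, 3, 13, 7, 3, 5)))
# document_20250313_070305.json
```

Because the timestamp is not known ahead of time, it is usually safer to rely on the returned file paths than to predict the names.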
The parse_and_save_documents function can raise these errors:
FileNotFoundError: This error is raised if the provided file path does not exist.
ValueError: This error is raised if the file type is not supported or a URL is invalid.

Use the parse_and_save_document function if you want to parse one document. You have the option to either return the output as objects or save the output as a JSON file in a specified directory.
You have the option to save the visual groundings to a directory.
The example saves the output to ./parsed_results. This example uses a document hosted at a URL, but local files are also supported.
Parameters for the parse_and_save_document function:
document: The path to a document or URL pointing to a document.
result_save_dir: The directory where the JSON files will be saved.
include_marginalia: If True, includes marginalia chunks (text in the header, footer, and margins) in the output. For more information, go to Chunk Types. Defaults to True. (Optional)
include_metadata_in_markdown: If True, includes metadata in the Markdown output. Defaults to True. (Optional)
grounding_save_dir: The directory where grounding images will be saved. For more information, go to Save Groundings as Images. (Optional)
connector_path: Path for the connector to search (when using connectors).
connector_pattern: Pattern to filter files (when using connectors).
extraction_model: Pydantic model schema for field extraction. For more information about extraction, go to Extract Data with the Library. (Optional)
extraction_schema: JSON schema for field extraction. For more information about extraction, go to Extract Data with the Library. (Optional)
config: Pass configuration settings with the ParseConfig object. For more information about using this parameter, go to Pass Settings with ParseConfig. (Optional)

If the result_save_dir parameter is included, the function returns the file path to the JSON file that the function created.
The JSON file contains the structured data for the extracted elements. The JSON file name is the original file name with a timestamp appended. For example, if the input file is “document.pdf”, the output file could be “document_20250313_070305.json”.
If the result_save_dir parameter is not included, the function returns a list of ParsedDocument objects. For more information, go to ParsedDocument Object.
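
Since the return type depends on whether result_save_dir is passed, calling code may need to handle both cases. The sketch below shows one way to branch on the result. It assumes the file path comes back as a str or pathlib.Path, which this page does not specify, and describe_result is a hypothetical helper name.

```python
from pathlib import Path
from typing import Any, List, Union


def describe_result(result: Union[str, Path, List[Any]]) -> str:
    """Summarize what parse_and_save_document returned."""
    if isinstance(result, (str, Path)):
        # result_save_dir was provided: the result is the path to one JSON file.
        return f"saved to {Path(result).name}"
    # result_save_dir was omitted: the result is a list of ParsedDocument objects.
    return f"returned {len(result)} parsed object(s)"
```

Branching on the type keeps the caller correct even if the result_save_dir argument is set conditionally elsewhere in the program.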
The parse_and_save_document function can raise these errors:
FileNotFoundError: This error is raised if the provided file path does not exist.
ValueError: This error is raised if the file type is not supported or a URL is invalid.
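
The two documented errors can be handled with an ordinary try/except. The sketch below wraps any of the three parsing functions; parse_safely is a hypothetical helper name, and printing a message then returning None is just one possible policy.

```python
from typing import Any, Callable, Optional


def parse_safely(parse_fn: Callable[..., Any], source: Any, **options: Any) -> Optional[Any]:
    """Call a parsing function and report the documented errors instead of crashing."""
    try:
        return parse_fn(source, **options)
    except FileNotFoundError as err:
        # Documented case: the provided file path does not exist.
        print(f"Missing file: {err}")
    except ValueError as err:
        # Documented case: unsupported file type or invalid URL.
        print(f"Unsupported or invalid input: {err}")
    return None
```

A call such as parse_safely(parse_and_save_document, "path/to/document.pdf", result_save_dir="./parsed_results") would return the function's normal result on success and None if either documented error is raised. The path is a placeholder.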