Use the parse function to parse one or more documents. You can either return the output as objects or save it as JSON files in a directory you specify.
You can also save the visual groundings to a directory.
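As a rough sketch of both output modes, the helper below forwards optional save directories to parse. The import path agentic_doc.parse, the helper name parse_documents, and the parse_fn hook (which lets the sketch run without the library or network access) are assumptions for illustration, not part of the documented API.

```python
from pathlib import Path


def parse_documents(paths, result_dir=None, grounding_dir=None, parse_fn=None):
    """Parse documents; optionally save JSON results and grounding images."""
    if parse_fn is None:
        # Assumed import path; loaded lazily so this sketch imports cleanly
        # even where the library is not installed.
        from agentic_doc.parse import parse as parse_fn

    kwargs = {}
    if result_dir is not None:
        # Creating the directory up front is a precaution of this sketch.
        Path(result_dir).mkdir(parents=True, exist_ok=True)
        kwargs["result_save_dir"] = str(result_dir)
    if grounding_dir is not None:
        Path(grounding_dir).mkdir(parents=True, exist_ok=True)
        kwargs["grounding_save_dir"] = str(grounding_dir)

    # Whatever parse returns is handed back unchanged.
    return parse_fn(documents=[str(p) for p in paths], **kwargs)
```

Under these assumptions, parse_documents(["report.pdf"], result_dir="results") would return the parsed objects and also write the JSON output into the results directory.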
The parse function is available in the agentic-doc library v0.2.3 and later.
Before using the parse function, get your API Key and set it.
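A small pre-flight check can catch a missing key before any document is sent. The environment variable name VISION_AGENT_API_KEY is an assumption here (the text above does not name it), and require_api_key is a hypothetical helper, not a library function.

```python
import os


def require_api_key(env_var="VISION_AGENT_API_KEY"):
    """Return the API key from the environment or fail with a clear message.

    The default variable name is an assumption; adjust it to match your setup.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before calling parse.")
    return key
```

Calling require_api_key() at program start makes a missing or empty key fail early, before any parsing work begins.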
The parse function supports raw bytes from PDF and image files. This means you can parse documents that are already loaded into memory, without needing to save them to disk first.
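A minimal sketch of the in-memory case follows. It assumes the raw bytes are passed directly as the documents argument and that parse is importable from agentic_doc.parse; parse_in_memory and the parse_fn hook are hypothetical names used only for illustration.

```python
def parse_in_memory(data, parse_fn=None):
    """Send bytes that are already in memory to parse, with no temp file."""
    if not isinstance(data, (bytes, bytearray)):
        raise TypeError("expected raw bytes from a PDF or image file")
    if parse_fn is None:
        # Assumed import path; imported lazily so the sketch loads without the library.
        from agentic_doc.parse import parse as parse_fn
    # Assumption: the bytes object is accepted in place of a list of paths.
    return parse_fn(bytes(data))
```

For example, the bytes could come from open("scan.pdf", "rb").read() or from the body of an HTTP response; either way nothing is written to disk by this helper.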
Here are two common situations where this is useful:
... parse function without storing it as a file.
... parse function.

The parse function accepts the following parameters:
documents: List of paths to documents or URLs pointing to documents.
result_save_dir: The directory where the JSON files will be saved.
include_marginalia: If True, includes marginalia chunks (text in the header, footer, and margins) in the output. For more information, go to Chunk Types. Defaults to True. (Optional)
include_metadata_in_markdown: If True, includes metadata in the Markdown output. Defaults to True. (Optional)
grounding_save_dir: The directory where grounding images will be saved. For more information, go to Save Groundings as Images. (Optional)
connector_path: Path for the connector to search (when using connectors).
connector_pattern: Pattern to filter files (when using connectors).
extraction_model: Pydantic model schema for field extraction. For more information about extraction, go to Extract Data with the Library. (Optional)
extraction_schema: JSON schema for field extraction. For more information about extraction, go to Extract Data with the Library. (Optional)
config: Pass configuration settings with the ParseConfig object. For more information about using this parameter, go to Pass Settings with ParseConfig. (Optional)

The function returns a list of ParsedDocument objects. For more information, go to ParsedDocument Object.
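To make the parameter list more concrete, here is a hedged sketch that assembles keyword arguments for a parse call, including a JSON schema for field extraction. The schema contents (invoice_number, total) and the helper build_parse_kwargs are invented for illustration; only the parameter names come from the list above.

```python
# Hypothetical JSON schema describing the fields to extract.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string", "description": "Invoice identifier"},
        "total": {"type": "number", "description": "Total amount due"},
    },
    "required": ["invoice_number"],
}


def build_parse_kwargs(documents, schema=None, keep_marginalia=True,
                       metadata_in_markdown=True):
    """Collect parse() keyword arguments using the documented parameter names."""
    kwargs = {
        "documents": list(documents),
        "include_marginalia": keep_marginalia,
        "include_metadata_in_markdown": metadata_in_markdown,
    }
    if schema is not None:
        # Only sent when field extraction is wanted.
        kwargs["extraction_schema"] = schema
    return kwargs
```

The resulting dictionary could then be unpacked into the call, for example parse(**build_parse_kwargs(["invoice.pdf"], schema=INVOICE_SCHEMA, keep_marginalia=False)).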
If the result_save_dir parameter is included, you can find the file path to each generated JSON file in the result_path field of each ParsedDocument object.
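The sketch below shows one way to read the saved files back using result_path. It assumes result_path is unset (None or absent) when result_save_dir was not given, and it makes no assumption about the structure of the JSON itself; load_saved_results is a hypothetical helper.

```python
import json
from pathlib import Path


def load_saved_results(parsed_docs):
    """Map each result_path to the JSON content stored at that path."""
    loaded = {}
    for doc in parsed_docs:
        path = getattr(doc, "result_path", None)
        if path is None:
            # Assumed to mean that no JSON file was written for this document.
            continue
        loaded[str(path)] = json.loads(Path(path).read_text(encoding="utf-8"))
    return loaded
```

Passing the list returned by parse to load_saved_results yields a dictionary keyed by file path, which can be convenient for checking what was written to disk.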
No documents to parse: This error is raised if the provided file path does not exist.
ValueError: This error is raised if the file type is not supported or a URL is invalid.

The ParsedDocument object contains the data extracted from a document.
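The error cases listed above could be handled along the following lines. Only ValueError is named in the text; the exception type behind "No documents to parse" is not stated, so this sketch checks local paths itself and raises its own FileNotFoundError. The helpers missing_local_paths and safe_parse, the parse_fn hook, and the agentic_doc.parse import path are assumptions.

```python
from pathlib import Path


def missing_local_paths(documents):
    """Return entries that are not URLs and do not exist on disk."""
    missing = []
    for doc in documents:
        text = str(doc)
        if text.startswith(("http://", "https://")):
            continue  # URLs are left for parse to validate
        if not Path(text).exists():
            missing.append(text)
    return missing


def safe_parse(documents, parse_fn=None):
    """Check inputs first, then call parse and handle the documented ValueError."""
    missing = missing_local_paths(documents)
    if missing:
        # Raised by this sketch, not by the library.
        raise FileNotFoundError(f"Missing input files: {missing}")
    if parse_fn is None:
        # Assumed import path; imported lazily so the sketch loads without the library.
        from agentic_doc.parse import parse as parse_fn
    try:
        return parse_fn(documents)
    except ValueError as exc:
        # Documented cause: unsupported file type or invalid URL.
        print(f"parse rejected the input: {exc}")
        return []
```

Returning an empty list on ValueError is a design choice of this sketch; re-raising may be preferable when a bad input should stop a batch job.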