In this example, we’ll use the library to extract data from two PDFs.

1. Prerequisites

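You will need an API key, which you set as an environment variable in the next step, and a Python environment with pip to install the library.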

2. Set the API Key as an Environment Variable

After you get the API key, set the key as an environment variable (or put it in a .env file):

export VISION_AGENT_API_KEY=<your-api-key>
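Before running the scripts below, it can help to confirm that the key is visible to Python. The following is a minimal sketch using only the standard library; the helper name require_api_key is hypothetical and is not part of agentic-doc. It checks only the environment of the current process, so a key stored solely in a .env file will not be detected by this check.

```python
import os

# Variable name taken from the export command above.
API_KEY_VAR = "VISION_AGENT_API_KEY"


def require_api_key(env=None):
    """Return the API key from the given mapping (default: os.environ).

    Raises RuntimeError if the variable is missing or blank.
    Hypothetical helper for illustration; not part of agentic-doc.
    """
    env = os.environ if env is None else env
    key = env.get(API_KEY_VAR, "").strip()
    if not key:
        raise RuntimeError(
            f"{API_KEY_VAR} is not set in this process's environment"
        )
    return key


if __name__ == "__main__":
    try:
        require_api_key()
        print(f"{API_KEY_VAR} is set")
    except RuntimeError as err:
        print(err)
```

If the check reports that the variable is not set, re-run the export command in the same shell session you use to start Python.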

3. Install the Library

Install the library with pip:

pip install agentic-doc

4. Extract Data from Two Documents and Return Results as Objects

Run this script to parse two PDFs and return the results as both Markdown and JSON objects. This example uses documents hosted at URLs, but local files are also supported.

from agentic_doc.parse import parse_documents

# Parse documents from URLs
results = parse_documents([
    "https://satsuite.collegeboard.org/media/pdf/sample-sat-score-report.pdf",
    "https://www.rbcroyalbank.com/banking-services/_assets-custom/pdf/eStatement.pdf",
])
parsed_doc = results[0]

# Get the extracted data as markdown
print(parsed_doc.markdown)  

# Get the extracted data as structured chunks of content in a JSON schema
print(parsed_doc.chunks)  

The API parses both documents, and the script prints the Markdown and the structured chunks for the first document (results[0]) in the console. To print the output for the second document, read results[1] in the same way. Because the extracted data is returned as objects, you can write scripts that take that output and immediately process it. For example, you could create a web app that extracts structured data from a PDF and immediately renders it in the UI.
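The script above reads only results[0]. As a rough sketch of processing every returned document, the loop below walks any sequence of objects that expose the markdown and chunks attributes used above. The function name summarize_results is hypothetical, the objects in fake_results are stand-ins rather than real library output, and the sketch assumes that chunks behaves like a list (supports len()).

```python
from types import SimpleNamespace


def summarize_results(results):
    """Return (index, markdown_length, chunk_count) for each parsed document.

    Assumes each item has a `markdown` string and a list-like `chunks`,
    matching the attributes used in the example above.
    """
    summary = []
    for index, doc in enumerate(results):
        summary.append((index, len(doc.markdown), len(doc.chunks)))
    return summary


# Stand-in objects for illustration only; real results would come from
# parse_documents() as shown in the script above.
fake_results = [
    SimpleNamespace(markdown="# Score report", chunks=["title"]),
    SimpleNamespace(markdown="# Statement\nBalance", chunks=["title", "text"]),
]

for index, md_len, n_chunks in summarize_results(fake_results):
    print(f"document {index}: {md_len} markdown characters, {n_chunks} chunks")
```

With real output, you would pass the list returned by parse_documents to summarize_results instead of fake_results.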

5. Extract Data from Two Documents and Save Results

In the previous example, you parsed two documents and immediately output the results in the console. Now, run a different script to parse those same PDFs and save the results as JSON files in a local directory. Be sure to change ./parsed_results to the directory you want the results saved to.

from agentic_doc.parse import parse_and_save_documents

# URLs of the documents to parse
documents = [
    "https://satsuite.collegeboard.org/media/pdf/sample-sat-score-report.pdf",
    "https://www.rbcroyalbank.com/banking-services/_assets-custom/pdf/eStatement.pdf",
]

# Directory where the parsed results will be saved
result_save_dir = "./parsed_results"

# Parse the documents and save the results
result_paths = parse_and_save_documents(documents=documents, result_save_dir=result_save_dir)

print(f"Results saved to: {result_paths}")

The API parses the documents and saves the results in the directory you specified. Because the extracted data is saved, you can later audit it or build an app that references it. For example, you could build a document processing system that parses documents nightly and saves the output as JSON files for auditors to inspect the next day.
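To illustrate the audit use case, here is a minimal sketch that reads saved results back from disk with the standard library. The helper names list_saved_results and load_saved_results are hypothetical, and the sketch assumes the saved files carry a .json extension and sit directly in the results directory. It makes no assumption about the structure inside each file.

```python
import json
from pathlib import Path


def list_saved_results(result_save_dir):
    """Return the .json files found directly in the results directory, sorted."""
    return sorted(Path(result_save_dir).glob("*.json"))


def load_saved_results(result_paths):
    """Load each saved JSON file and return {file name: parsed content}."""
    loaded = {}
    for path in result_paths:
        path = Path(path)
        with path.open(encoding="utf-8") as handle:
            loaded[path.name] = json.load(handle)
    return loaded


if __name__ == "__main__":
    # "./parsed_results" matches result_save_dir in the script above.
    saved = load_saved_results(list_saved_results("./parsed_results"))
    for name, content in saved.items():
        print(f"{name}: top-level type {type(content).__name__}")
```

You could also pass the paths returned by parse_and_save_documents to load_saved_results instead of scanning the directory.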

6. Next Steps

Now that you know the two main functions for parsing documents, learn about the additional parameters in Parsing Basics so that you can build out custom scripts for your use case.