Quickstart
In this example, we’ll use the library to extract data from two PDFs.
1. Prerequisites
You will need:
2. Set the API Key as an Environment Variable
After you get the API key, set it as an environment variable (or put it in a .env file).
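If you want a script to pick up the key from either place, here is a minimal sketch that uses only the Python standard library. The variable name LIBRARY_API_KEY is a hypothetical placeholder, so check the library's documentation for the name it actually reads. The small .env reader handles only simple KEY=VALUE lines; a package such as python-dotenv covers more cases.

```python
import os
from pathlib import Path

# Hypothetical variable name; replace it with the one the library actually reads.
API_KEY_VAR = "LIBRARY_API_KEY"


def load_dotenv_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines from a .env file (no variable expansion)."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_api_key(dotenv_path: str = ".env") -> str:
    """Prefer the real environment variable; fall back to the .env file."""
    key = os.environ.get(API_KEY_VAR) or load_dotenv_file(dotenv_path).get(API_KEY_VAR)
    if not key:
        raise RuntimeError(f"Set {API_KEY_VAR} in the environment or in {dotenv_path}")
    return key
```

A variable that is already set in the environment wins over the .env file, which matches how most .env loaders behave by default.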
3. Install the Library
Install the library.
4. Extract Data from Two Documents and Return Results as Objects
Run this script to parse two PDFs and return the results as both Markdown and JSON objects. This example uses documents hosted at URLs, but local files are also supported.
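Because the script accepts both URLs and local files, you may want to check each input before sending it. The helper below is hypothetical and not part of the library: it labels http(s) URLs as "url" and existing local paths as "file", and rejects anything else.

```python
from pathlib import Path
from urllib.parse import urlparse


def classify_source(source: str) -> str:
    """Return "url" for http(s) URLs and "file" for existing local paths."""
    scheme = urlparse(source).scheme.lower()
    if scheme in ("http", "https"):
        return "url"
    if Path(source).is_file():
        return "file"
    raise ValueError(f"Not an http(s) URL or an existing file: {source!r}")
```

For example, classify_source("https://example.com/report.pdf") returns "url", while a typo in a local path raises ValueError before any request is made.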
The API parses the documents and prints the JSON and Markdown outputs for each document to the console. Because the extracted data is returned as objects, you can write scripts that take that output and immediately process it. For example, you could create a web app that extracts structured data from a PDF and immediately renders it in the UI.
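The sketch below illustrates that kind of immediate post-processing. ParsedDocument is a hypothetical stand-in for whatever result object the library returns (the real class and attribute names may differ), and the two results are placeholder data, not real API output.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ParsedDocument:
    """Hypothetical stand-in for one parsed result: Markdown plus JSON data."""
    source: str
    markdown: str
    data: dict[str, Any] = field(default_factory=dict)


def summarize(doc: ParsedDocument) -> dict[str, Any]:
    """Post-process a result in memory: collect Markdown headings and JSON keys."""
    headings = [
        line.lstrip("#").strip()
        for line in doc.markdown.splitlines()
        if line.startswith("#")
    ]
    return {"source": doc.source, "headings": headings, "fields": sorted(doc.data)}


# Placeholder objects standing in for what the parse script returns.
results = [
    ParsedDocument("doc1.pdf", "# Invoice\nTotal: 42", {"total": 42, "currency": "USD"}),
    ParsedDocument("doc2.pdf", "# Receipt\n## Items", {"items": []}),
]
for doc in results:
    print(summarize(doc))
```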
5. Extract Data from Two Documents and Save Results
In the previous example, you parsed two documents and immediately printed the results to the console. Now, run a different script to parse those same PDFs and save the results as JSON files in a local directory. Be sure to change ./parsed_results to the directory where you want the results saved.
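For comparison, this is roughly what saving results involves if you write the files yourself with the standard library. It is a sketch, not the library's save function: save_results is a hypothetical name, and each result is assumed to be a JSON-serializable dict keyed by its source path or URL.

```python
import json
from pathlib import Path
from typing import Any
from urllib.parse import urlparse


def save_results(results: dict[str, dict[str, Any]], out_dir: str = "./parsed_results") -> list[Path]:
    """Write each parsed result to <out_dir>/<document stem>.json and return the paths."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)  # create the directory if needed
    written: list[Path] = []
    for source, data in results.items():
        # Name the file after the source, e.g. "invoice.pdf" -> "invoice.json".
        # For URLs, the last segment of the URL path is used.
        stem = Path(urlparse(source).path).stem or "document"
        path = target / f"{stem}.json"
        path.write_text(json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8")
        written.append(path)
    return written
```

Files are named after each document's stem, so two sources with the same file name would overwrite each other; add a suffix if that can happen in your data.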
The API parses the documents and saves the results in the directory you specified. Because the extracted data is saved, you can later audit it or build an app that references it. For example, you could build a document processing system that parses documents nightly and saves the output as JSON files for auditors to inspect the next day.
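An audit step might then read the saved files back and flag incomplete results. Both functions below are hypothetical, and the required field names depend entirely on your documents.

```python
import json
from pathlib import Path
from typing import Any


def load_results(out_dir: str = "./parsed_results") -> dict[str, Any]:
    """Read every saved *.json result back into memory, keyed by file stem."""
    return {
        path.stem: json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(Path(out_dir).glob("*.json"))
    }


def find_missing_fields(results: dict[str, Any], required: list[str]) -> dict[str, list[str]]:
    """Report, per document, which required top-level fields are absent."""
    report: dict[str, list[str]] = {}
    for name, data in results.items():
        missing = [f for f in required if not isinstance(data, dict) or f not in data]
        if missing:
            report[name] = missing
    return report
```

For example, find_missing_fields(load_results(), ["total"]) returns an empty dict when every saved document has a total field.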
6. Next Steps
Now that you know the two main functions for parsing documents, learn about the additional parameters in Parsing Basics so that you can build out custom scripts for your use case.