Quickstart
The easiest way to parse documents is to use our Python library. In this quickstart, we’ll use the library to extract data from a PDF in a local directory.
1. Set the API Key as an Environment Variable
Get your API key and set it as an environment variable (or put it in a .env file):
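Before you run the later scripts, it can help to confirm that Python can actually see the key. The following is a minimal, standard-library-only sketch, not part of the library itself. The variable name `DOC_PARSER_API_KEY` and the helpers `load_env_file` and `require_api_key` are hypothetical placeholders; substitute the variable name given in your account or library documentation.

```python
import os
from pathlib import Path

# Hypothetical name: replace with the variable name your library expects.
API_KEY_VAR = "DOC_PARSER_API_KEY"


def load_env_file(path=".env"):
    """Copy KEY=VALUE pairs from a .env file into os.environ.

    Values that are already set in the environment are left untouched.
    """
    env_path = Path(path)
    if not env_path.is_file():
        return
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip().strip("'\""))


def require_api_key(name=API_KEY_VAR):
    """Return the key, or raise a clear error if it is missing."""
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set; export it or add it to a .env file")
    return key
```

Calling `load_env_file()` followed by `require_api_key()` at the top of a script fails fast with a readable message instead of a later authentication error.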
2. Install the Library
3. Extract Data from a Local File and Return Results as Objects
Run this script to parse a file in a local directory and return the results as Markdown and JSON objects.
The API parses the document and prints its JSON and Markdown output to the console. Because the extracted data is returned as objects, you can write scripts that take that output and immediately process it. For example, you could create a web app that extracts structured data from a PDF and immediately renders it in the UI.
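To illustrate what processing the returned objects immediately can look like, here is a small sketch that works on stand-in data rather than a real API response. The sample values and the field names `chunk_type` and `text` are assumptions made for illustration, not the library's documented schema; adapt them to the objects your parse call actually returns.

```python
import json
from collections import Counter

# Stand-in values shaped like what a parse call might return.
# The field names are illustrative assumptions, not the library's schema.
markdown = "# Invoice 1001\n\nTotal due: 42.00 USD\n"
chunks = [
    {"chunk_type": "title", "text": "Invoice 1001"},
    {"chunk_type": "text", "text": "Total due: 42.00 USD"},
]


def markdown_headings(md):
    """Return the text of every Markdown heading (lines starting with '#')."""
    return [line.lstrip("#").strip() for line in md.splitlines() if line.startswith("#")]


def count_by_type(items):
    """Count items by their (assumed) 'chunk_type' field."""
    return Counter(item.get("chunk_type", "unknown") for item in items)


print(markdown_headings(markdown))
print(json.dumps(count_by_type(chunks), indent=2))
```

The same pattern applies in a web app: pass the Markdown to a renderer and the structured data to whatever view or template needs it, without writing anything to disk.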
4. Extract Data from a Local File and Save Results
In the previous example, you parsed a file and printed the results to the console. Now, run a different script that parses the same file and saves the results as a JSON file in the directory you specify.
The API parses the document and saves the results in the directory you specified. Because the extracted data is saved, you can later audit it or build an app that references it. For example, you could build out a document processing system that parses documents nightly and saves the output as JSON files for auditors to inspect the next day.
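Once results are saved, a later audit step only needs the standard library. The sketch below loads every JSON file from a results directory. The directory name `results` and the helper `load_saved_results` are hypothetical placeholders, and the sketch makes no assumption about the internal structure of the saved files.

```python
import json
from pathlib import Path


def load_saved_results(result_dir):
    """Load every .json file in result_dir, keyed by file name."""
    results = {}
    for path in sorted(Path(result_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            results[path.name] = json.load(handle)
    return results


if __name__ == "__main__":
    # "results" is a placeholder; use the directory you passed to the save script.
    for name, data in load_saved_results("results").items():
        size = len(data) if isinstance(data, (dict, list)) else 1
        print(f"{name}: {size} top-level entries")
```

A nightly job could write into a dated subdirectory, and the audit script could then be pointed at that directory the next day.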
5. Next Steps
Now that you know how to parse documents, learn about the additional parameters in Parsing Basics so that you can build out custom scripts for your use case.