The easiest way to parse documents is to use our Python library. In this quickstart, we’ll use the library to extract data from a PDF in a local directory.
3. Extract Data from a Local File and Return Results as Objects
Run this script to parse a file in a local directory and return the results as Markdown and JSON objects.
```python
from agentic_doc.parse import parse

# Parse a local file
result = parse("path/to/file.pdf")

# Get the extracted data as markdown
print("Extracted Markdown:")
print(result[0].markdown)

# Get the extracted data as structured chunks of content in a JSON schema
print("Extracted Chunks:")
print(result[0].chunks)
```
The script parses the document and prints its Markdown and JSON outputs in the console. Because the extracted data is returned as objects, you can write scripts that take that output and immediately process it. For example, you could create a web app that extracts structured data from a PDF and immediately renders it in the UI.
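As a minimal sketch of that kind of downstream processing, the helper below summarizes a list of parsed documents. The `summarize` function and the stand-in objects are hypothetical and not part of the library; only the `markdown` and `chunks` attributes come from the script above, and the sketch assumes `chunks` behaves like a list.

```python
from types import SimpleNamespace


def summarize(results):
    """Return one summary dict per parsed document.

    Assumes each item exposes `markdown` (a string) and `chunks`
    (a sequence), as used in the script above.
    """
    return [
        {
            "markdown_chars": len(doc.markdown),
            "chunk_count": len(doc.chunks),
        }
        for doc in results
    ]


# Hypothetical stand-in for the list returned by parse(), so the sketch
# runs without the library installed.
fake_results = [SimpleNamespace(markdown="# Title\n\nBody", chunks=["Title", "Body"])]
print(summarize(fake_results))
```

In a real script, you would pass the list returned by `parse(...)` to `summarize` instead of the stand-in objects.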
4. Extract Data from a Local File and Save Results
In the previous example, you parsed a file and immediately output the results in the console. Now, run a different script to parse the same file and save the results as a JSON file in a directory you specify.
```python
from agentic_doc.parse import parse

# Parse a local PDF and save results to directory
result = parse("path/to/file.pdf", result_save_dir="path/to/save/results")

# Print the file path to the JSON file
print(f"Final result: {result[0].result_path}")
```
The API parses the document and saves the results in the directory you specified. Because the extracted data is saved, you can later audit it or build an app that references it. For example, you could build out a document processing system that parses documents nightly and saves the output as JSON files for auditors to inspect the next day.
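A later audit step could be sketched as follows. This assumes only that the saved file is valid JSON with an object at the top level; the `load_saved_result` helper is hypothetical and not part of the library, and the temporary file stands in for a real `result_path`.

```python
import json
import tempfile
from pathlib import Path


def load_saved_result(result_path):
    """Load a saved result file and return the parsed JSON."""
    with Path(result_path).open(encoding="utf-8") as f:
        return json.load(f)


# Hypothetical stand-in file so the sketch runs on its own; in practice,
# pass the path printed by the script above (result[0].result_path).
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "sample.json"
    sample_path.write_text(
        json.dumps({"example_key": "example value"}), encoding="utf-8"
    )
    data = load_saved_result(sample_path)

# List the top-level keys of the loaded JSON object.
print(sorted(data))
```

Listing the top-level keys first is a cautious way to inspect a saved result before relying on any particular field.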
Now that you know how to parse documents, learn about the additional parameters in Parsing Basics so that you can build out custom scripts for your use case.