Parse Documents from Amazon S3, Google Drive, and More
If you need to parse documents stored in places like Google Drive, Amazon S3, URLs, or local folders, you can use the connectors module to authenticate to and access those locations.
A connector is a Python class, along with configuration settings, that enables the parse function to access and retrieve documents from a specific source, such as a cloud storage bucket or local directory.
You can pass a connector to the parse function to fetch and parse all documents from that source, without manually listing each file.
For example, a connector can access all documents in an Amazon S3 bucket or a Google Drive folder. Likewise, instead of specifying every file path in a local folder, you can use a connector to parse the entire directory in one call.
The connectors module is available in the agentic-doc library v0.2.3 and later.
Parse Documents from Google Drive
Before parsing documents from Google Drive, we recommend completing the Google Drive API Python Quickstart tutorial to set up your Google credentials.
The tutorial guides you through:
- Creating a Google Cloud project
- Enabling the Google Drive API
- Setting up OAuth 2.0 credentials
Sample Script: Google Drive
After completing the tutorial, run the following script to parse documents from Google Drive.
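A minimal sketch of such a script is shown here. It assumes the agentic-doc library is installed, that the OAuth client secret file from the tutorial is saved locally, and that the connector class is named GoogleDriveConnectorConfig with client_secret_file and folder_id parameters. The helper name parse_drive_folder is hypothetical; check the class and parameter names against your installed version.

```python
from pathlib import Path


def parse_drive_folder(client_secret_file, folder_id=None):
    """Parse the documents in a Google Drive folder through a connector.

    parse_drive_folder is a hypothetical helper name used for illustration.
    """
    # Fail early if the OAuth client secret file from the tutorial is missing.
    if not Path(client_secret_file).is_file():
        raise FileNotFoundError(
            f"Credentials file not found: {client_secret_file}"
        )

    # Imported lazily so the check above runs even without the library.
    from agentic_doc.connectors import GoogleDriveConnectorConfig
    from agentic_doc.parse import parse

    # Assumed parameter names; verify against your installed version.
    config = GoogleDriveConnectorConfig(
        client_secret_file=client_secret_file,
        folder_id=folder_id,
    )

    # Passing the connector config to parse fetches and parses the documents.
    return parse(config)
```

Calling parse_drive_folder("credentials.json", folder_id="your-folder-id") with your own values should return the parse results for the documents in that folder.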
Parse Documents from Amazon S3
Run the following script to parse documents from an Amazon S3 bucket.
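A minimal sketch of such a script is shown here. It assumes the connector class is named S3ConnectorConfig and accepts bucket_name, region_name, aws_access_key_id, and aws_secret_access_key, and that parse accepts a connector_path argument to limit parsing to a key prefix. The helper names and the choice to read credentials from environment variables are illustrative assumptions, not part of the library.

```python
import os


def s3_config_kwargs(bucket_name, region_name="us-east-1", env=None):
    """Build keyword arguments for the S3 connector config.

    Explicit credentials are included only when both standard AWS
    environment variables are set, so keys never appear in source code.
    """
    env = os.environ if env is None else env
    kwargs = {"bucket_name": bucket_name, "region_name": region_name}
    key_id = env.get("AWS_ACCESS_KEY_ID")
    secret = env.get("AWS_SECRET_ACCESS_KEY")
    if key_id and secret:
        kwargs["aws_access_key_id"] = key_id
        kwargs["aws_secret_access_key"] = secret
    return kwargs


def parse_s3_bucket(bucket_name, prefix=None, region_name="us-east-1"):
    """Parse documents in an S3 bucket, optionally under one key prefix."""
    # Imported lazily so the helper above can be used without the library.
    from agentic_doc.connectors import S3ConnectorConfig
    from agentic_doc.parse import parse

    # Assumed parameter names; verify against your installed version.
    config = S3ConnectorConfig(**s3_config_kwargs(bucket_name, region_name))
    if prefix:
        return parse(config, connector_path=prefix)
    return parse(config)
```

Calling parse_s3_bucket("your-bucket-name", prefix="documents/") with your own values should parse only the objects under that prefix; omit prefix to parse the whole bucket.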
Parse Documents from a Local Directory with a Connector
Run the following script to parse documents in a local directory. The function only parses documents directly in the local directory; it does not parse documents in nested directories.
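A minimal sketch of such a script is shown here. It assumes the connector class is named LocalConnectorConfig and that parse accepts connector_path and connector_pattern arguments. The top_level_files helper is hypothetical: it only previews which files sit directly in the directory, and the connector itself may filter further by file type.

```python
from pathlib import Path


def top_level_files(directory):
    """Return the files directly inside `directory`, ignoring subfolders.

    Hypothetical preview helper that mirrors the non-recursive scope
    described above; it does not call the library.
    """
    return sorted(p for p in Path(directory).iterdir() if p.is_file())


def parse_local_directory(directory, pattern=None):
    """Parse the documents in a local directory through a connector."""
    # Imported lazily so the preview helper works without the library.
    from agentic_doc.connectors import LocalConnectorConfig
    from agentic_doc.parse import parse

    # Assumed argument names; verify against your installed version.
    config = LocalConnectorConfig()
    if pattern:
        return parse(
            config,
            connector_path=str(directory),
            connector_pattern=pattern,
        )
    return parse(config, connector_path=str(directory))
```

Calling parse_local_directory("path/to/documents", pattern="*.pdf") with your own path should parse only the matching files at the top level of that directory.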
Parse Documents from a URL with a Connector
Run the following script to parse documents at a specified URL.
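A minimal sketch of such a script is shown here. It assumes the connector class is named URLConnectorConfig with optional headers and timeout parameters, and that the URL is passed to parse through connector_path. The helper names, the URL check, and the 60-second default are illustrative assumptions.

```python
from urllib.parse import urlparse


def is_http_url(url):
    """Return True when `url` is an absolute http or https URL."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def parse_url(url, headers=None, timeout=60):
    """Parse the document at `url` through a connector."""
    # Reject anything that is not an absolute http(s) URL before fetching.
    if not is_http_url(url):
        raise ValueError(f"Expected an http(s) URL, got: {url!r}")

    # Imported lazily so the check above works without the library.
    from agentic_doc.connectors import URLConnectorConfig
    from agentic_doc.parse import parse

    # Assumed parameter names; verify against your installed version.
    kwargs = {"timeout": timeout}
    if headers:
        kwargs["headers"] = headers
    config = URLConnectorConfig(**kwargs)
    return parse(config, connector_path=url)
```

Calling parse_url("https://example.com/document.pdf") with your own URL should return the parse results for that document; pass headers when the URL requires authentication.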