Index Documents

Use the Index Documents feature in the app to parse documents in Snowflake and save the extracted data to a table. You can then use Cortex to search and query the extracted data in that table. This supports workflows such as Retrieval-Augmented Generation (RAG), document retrieval, “chatting” with documents, and more. To learn how to access the extracted data, go to Access Indexed Data.

Quickstart: Index Documents

Grant permissions to the stage with the files you want to index.
If you plan on using Local Processing to parse the documents, grant permissions to access Cortex.
Follow the instructions in Index Documents.

Index Documents

After you have granted the app permission to access the stage with the files you want to index (and if using Local Processing, granted the app permission to use Cortex), you can index documents.

Open Snowsight.
Go to Catalog > Apps > LandingAI Agentic Document Extraction.
Click Index Documents.
Enter the location of the files you want to index in the Stage Name field. Use this format: <YOUR_DB>.<YOUR_SCHEMA>.<YOUR_STAGE>.
If you want to filter the documents included in indexing, enter a Regex Filter. This field accepts Python-compatible regular expressions to match specific filenames. For example, ^ABC.* matches files starting with “ABC”, and .*\.csv matches files ending with “.csv”. For supported patterns, go to the Snowflake documentation.
Enter the number of Parallel Jobs.
Select the Processing Method:
- Cloud: Uses Cloud Processing.
- Snowflake Cortex: Uses Local Processing.
Click Index Documents.
An indexing job is created and displayed. You might need to refresh the page to see the updated status.
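
Before you start an indexing job, it can help to preview which files a Regex Filter would select. The following is a minimal sketch, not the app's own implementation. It assumes the pattern must match the entire filename (Python's re.fullmatch); the app's actual matching behavior may differ. The function name preview_regex_filter and the file names are hypothetical and used only for illustration.

```python
import re


def preview_regex_filter(pattern: str, filenames: list[str]) -> list[str]:
    """Return the filenames that the pattern matches in full.

    Assumption: the filter is applied to the whole filename. The app may
    apply the pattern differently, so treat this as a local preview only.
    """
    compiled = re.compile(pattern)
    return [name for name in filenames if compiled.fullmatch(name)]


# Hypothetical file names, for illustration only.
files = ["ABC_invoice.pdf", "report.csv", "ABC_data.csv", "notes.txt"]

# Files starting with "ABC".
print(preview_regex_filter(r"^ABC.*", files))

# Files ending with ".csv" (the dot is escaped so it matches a literal ".").
print(preview_regex_filter(r".*\.csv", files))
```

Escaping the dot matters: an unescaped . matches any character, so a pattern such as .*.csv would also match a name like “datacsv”.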

Indexed Data

The indexing process creates a table that contains key file information and the extracted data. This is the table schema, where:

APP_NAME is the name of your app
app_data is the schema name
indexed_documents is the table name

TABLE "APP_NAME".app_data.indexed_documents (
    stage_name VARCHAR,
    file_path VARCHAR,
    chunk_index INT,
    chunk_info VARIANT
);

Access Indexed Data

Use the following script to see all indexed data. Replace this placeholder with your information: APP_NAME.

SELECT * FROM "APP_NAME".app_data.indexed_documents;

All Indexed Documents Are Stored in One Table

The results from all document indexing jobs are stored in this table: APP_NAME.app_data.indexed_documents. To view results from specific indexing jobs, filter your queries using the stage_name and file_path columns. For example, use the following script to see indexed data from files on a specific stage. Replace these placeholders with your information: APP_NAME and your_stage.

SELECT * FROM "APP_NAME".app_data.indexed_documents 
WHERE stage_name = 'your_stage';
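
If you run these queries from a script, the stage_name and file_path filters can be combined. The following is a minimal sketch that only builds the query text and its parameters; it does not connect to Snowflake. The helper names, the example file path, and the %s placeholder style are assumptions for illustration. Adjust the placeholder style to whatever your Snowflake client expects.

```python
def quote_identifier(name: str) -> str:
    """Wrap an identifier in double quotes, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_indexed_documents_query(app_name, stage_name=None, file_path=None):
    """Build a SELECT on indexed_documents with optional column filters.

    Returns (sql, params). Values are passed as bind parameters instead of
    being pasted into the SQL text. The %s placeholder style is an assumption.
    """
    sql = f"SELECT * FROM {quote_identifier(app_name)}.app_data.indexed_documents"
    conditions = []
    params = []
    if stage_name is not None:
        conditions.append("stage_name = %s")
        params.append(stage_name)
    if file_path is not None:
        conditions.append("file_path = %s")
        params.append(file_path)
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, tuple(params)


# Hypothetical values, for illustration only.
sql, params = build_indexed_documents_query(
    "APP_NAME", stage_name="your_stage", file_path="reports/q1.pdf"
)
print(sql)
print(params)
```

Keeping values as bind parameters avoids quoting mistakes when a stage name or file path contains special characters.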
