Use the Index Documents feature in the app to parse documents in Snowflake and save the extracted data to a table. You can then use Cortex to search and query the extracted data in the table. This supports workflows like Retrieval-Augmented Generation (RAG), document retrieval, “chatting” with documents, and more. To learn how to access the extracted data, go to Access Indexed Data.

Quickstart: Index Documents

  1. Grant permissions to the stage with the files you want to index.
  2. If you plan on using Local Processing to parse the documents, grant permissions to access Cortex.
  3. Follow the instructions in Index Documents.
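
The grants in steps 1 and 2 might look like the following sketch. The exact privileges the app requires are an assumption here (USAGE on the database and schema, READ on the stage, and the SNOWFLAKE.CORTEX_USER database role for Local Processing), so confirm them against the app's permission instructions. Replace these placeholders with your information: APP_NAME, YOUR_DB, YOUR_SCHEMA, and YOUR_STAGE.

```sql
-- Assumption: the app needs USAGE on the parent database and schema
-- and READ on the stage that holds the files to index.
GRANT USAGE ON DATABASE YOUR_DB TO APPLICATION "APP_NAME";
GRANT USAGE ON SCHEMA YOUR_DB.YOUR_SCHEMA TO APPLICATION "APP_NAME";
GRANT READ ON STAGE YOUR_DB.YOUR_SCHEMA.YOUR_STAGE TO APPLICATION "APP_NAME";

-- Only needed if you plan to use Local Processing.
-- Assumption: Cortex access is granted through the CORTEX_USER database role.
GRANT DATABASE ROLE SNOWFLAKE.CORTEX_USER TO APPLICATION "APP_NAME";
```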

Index Documents

After you have granted the app permission to access the stage with the files you want to index (and if using Local Processing, granted the app permission to use Cortex), you can index documents.
  1. Open Snowsight.
  2. Go to Catalog > Apps > LandingAI Agentic Document Extraction.
  3. Click Index Documents.
  4. Enter the location of the files you want to index in the Stage Name field. Use this format: <YOUR_DB>.<YOUR_SCHEMA>.<YOUR_STAGE>.
  5. If you want to filter the documents included in indexing, enter a Regex Filter. This field accepts Python-compatible regular expressions to match specific filenames. For example, ^ABC.* matches files starting with “ABC”, and .*\.csv matches files ending with “.csv”. For supported patterns, go to the Snowflake documentation.
  6. Enter the number of Parallel Jobs.
  7. Select the Processing Method.
  8. Click Index Documents.
  9. The app creates an indexing job and displays it. You might need to refresh the page to see the updated status.
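
Before entering a Regex Filter in step 5, it can help to try the pattern against a few sample filenames in a worksheet. The sketch below uses Snowflake's REGEXP_LIKE function, which matches a pattern against the whole string. The filenames are made up for illustration, and the app may not evaluate the filter in exactly the same way, so treat this as a rough check only.

```sql
-- Hypothetical filenames, used only to try out the patterns.
SELECT
    filename,
    REGEXP_LIKE(filename, '^ABC.*')   AS starts_with_abc,
    -- [.] matches a literal dot without needing backslash escaping
    -- inside a single-quoted SQL string.
    REGEXP_LIKE(filename, '.*[.]csv') AS ends_with_csv
FROM (VALUES ('ABC_report.pdf'), ('sales.csv'), ('notes.txt')) AS t(filename);
```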

Indexed Data

The indexing process creates a table that contains key file information and the extracted data. This is the table schema, where:
  • APP_NAME is the name of your app
  • app_data is the schema name
  • indexed_documents is the table name
TABLE "APP_NAME".app_data.indexed_documents (
    stage_name VARCHAR,
    file_path VARCHAR,
    chunk_index INT,
    chunk_info VARIANT
);
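
Because chunk_info is a VARIANT column, its internal structure is not visible in the schema. The following sketch lists the top-level keys found in chunk_info and the type of each value. It assumes that chunk_info holds a JSON object. If it holds an array instead, the key column is NULL and the index column of FLATTEN is populated. Replace this placeholder with your information: APP_NAME.

```sql
-- List the distinct top-level keys in chunk_info and the type of each value.
-- Assumption: chunk_info stores a JSON object.
SELECT DISTINCT
    f.key           AS chunk_info_key,
    TYPEOF(f.value) AS value_type
FROM "APP_NAME".app_data.indexed_documents AS d,
     LATERAL FLATTEN(input => d.chunk_info) AS f
ORDER BY chunk_info_key;
```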

Access Indexed Data

Use the following script to see all indexed data. Replace this placeholder with your information: APP_NAME.
SELECT * FROM "APP_NAME".app_data.indexed_documents;
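
For a quick overview of what has been indexed, you can count rows per file instead of returning every row. This sketch assumes the table holds one row per chunk, which the chunk_index column suggests but the schema does not guarantee.

```sql
-- Count the rows (assumed to be one per chunk) stored for each file.
SELECT
    stage_name,
    file_path,
    COUNT(*) AS chunk_count
FROM "APP_NAME".app_data.indexed_documents
GROUP BY stage_name, file_path
ORDER BY stage_name, file_path;
```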

All Indexed Documents Are Stored in One Table

The results from all document indexing jobs are stored in this table: "APP_NAME".app_data.indexed_documents. To view results from specific indexing jobs, filter your queries using the stage_name and file_path columns. For example, use the following script to see indexed data from files on a specific stage. Replace these placeholders with your information: APP_NAME and your_stage.
SELECT * FROM "APP_NAME".app_data.indexed_documents 
WHERE stage_name = 'your_stage';
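
To narrow the results to a single file, you can also filter on file_path. In this sketch, your_file.pdf is a hypothetical placeholder. The exact format of the stored stage_name and file_path values is not documented here, so check them first (for example, with SELECT DISTINCT stage_name, file_path) before filtering.

```sql
-- Return the chunks for one file, in chunk order.
-- 'your_stage' and 'your_file.pdf' are placeholders; match them to the
-- values actually stored in the stage_name and file_path columns.
SELECT
    chunk_index,
    chunk_info
FROM "APP_NAME".app_data.indexed_documents
WHERE stage_name = 'your_stage'
  AND file_path = 'your_file.pdf'
ORDER BY chunk_index;
```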