You can send files in Snowflake to the LandingAI-hosted version of Agentic Document Extraction and see the results directly in Snowsight. This method is referred to as Cloud Processing. When you parse a batch of files, the app controls parallelism and API rate limits. You can also include both PDFs and images in the same batch.
If you want to process files entirely within Snowflake, use the Local Processing method instead.

Cloud Processing and Zero Data Retention

Cloud Processing uses your LandingAI account. If you want Zero Data Retention (ZDR) with Cloud Processing, you must first enable ZDR in that account. For instructions, refer to the Zero Data Retention documentation.

Cloud Processing and Billing

When you use Cloud Processing, documents are processed using your LandingAI-hosted account. Billing for processing documents (such as running parsing and field extraction) is managed by LandingAI. All other Snowflake usage, including the app’s CPU and GPU usage, continues to be billed through Snowflake.

Extracted Output

The output from Cloud Processing matches the response you get when calling the API directly. For complete details on the response format, go to API Reference.

Quickstart: Cloud Processing

  1. Create an account at va.landing.ai.
  2. Get your API key. For detailed instructions on how to get the API key, go to API Key.
  3. Enter your API key in Snowsight.
  4. Grant permissions to the stage with the files you want to parse.
  5. Run the parse commands.
  6. If you want to run field extraction, include the JSON schema in the parse command.
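Step 4 can be sketched as the grants below. This is a minimal sketch, not the app's documented requirement: it assumes an internal stage, that the app is installed under the name APP_NAME, and that read access on the stage plus usage on its parent database and schema is what the app needs. Check the app's setup instructions for the exact privileges.

```sql
-- Assumption: your_db, your_schema, your_stage, and APP_NAME are placeholders.
GRANT USAGE ON DATABASE your_db TO APPLICATION "APP_NAME";
GRANT USAGE ON SCHEMA your_db.your_schema TO APPLICATION "APP_NAME";
-- READ applies to internal stages; external stages use USAGE instead.
GRANT READ ON STAGE your_db.your_schema.your_stage TO APPLICATION "APP_NAME";
```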

Enter Your API Key in Snowsight

To parse documents with the Cloud Processing method, you must enter your API key in the app in Snowsight. Only one API key can be entered in the app at a time. Because multiple users might access the app, the API key is masked in the interface. To enter your API key:
  1. Open Snowsight.
  2. Go to Catalog > Apps > LandingAI Agentic Document Extraction.
  3. Scroll down to Processing Methods and ensure that Cloud Processing (API) is selected.
  4. Enter your API key in the Cloud Processing section.
  5. Click Add API Key.

Parse with Cloud Processing

The app provides this function to run Cloud Processing:
doc_extraction.cloud_extract_doc_structure
The function takes a string as input and returns a VARIANT with the document structure. This section provides examples of how to run the doc_extraction.cloud_extract_doc_structure function in different scenarios.

Parse a Single File

Run the command below to parse a single file in a Snowflake stage. Replace these placeholders with your information: APP_NAME, your_db, your_schema, your_stage, and /path/to/file.pdf.
USE "APP_NAME";
SELECT
    doc_extraction.cloud_extract_doc_structure(
        build_scoped_file_url('@your_db.your_schema.your_stage', '/path/to/file.pdf')
    );
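Because the function returns a VARIANT, it can help to store the result and inspect its shape before querying specific fields. The following is a minimal sketch under stated assumptions: parse_results is a hypothetical table name, your role can create tables in your_db.your_schema, and the returned VARIANT holds a JSON object (OBJECT_KEYS raises an error otherwise).

```sql
USE "APP_NAME";

-- Hypothetical results table; the name parse_results is an assumption.
CREATE OR REPLACE TABLE your_db.your_schema.parse_results AS
SELECT
    '/path/to/file.pdf' AS file_path,
    doc_extraction.cloud_extract_doc_structure(
        build_scoped_file_url('@your_db.your_schema.your_stage', '/path/to/file.pdf')
    ) AS result;

-- List the top-level keys instead of guessing field names.
SELECT
    file_path,
    TYPEOF(result) AS result_type,
    OBJECT_KEYS(result) AS top_level_keys
FROM your_db.your_schema.parse_results;
```

Storing the result also avoids calling the function again, and being billed for it again, each time you explore the output.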

Parse a Batch of Files in a Table

One way to process multiple documents is to create a table that lists the filenames of documents stored in a Snowflake stage. Then, write a SQL script that processes each file by using the file paths from the table. This approach uses the build_scoped_file_url function to connect the file paths in your table to the actual documents in your stage. Use the following script to process multiple files from a table. Replace these placeholders with your information: APP_NAME, your_db, your_schema, your_stage, and documents. The script assumes you have a table called documents that has a column called file_path, which lists the filenames of the documents you want to process.
USE "APP_NAME";
SELECT
    doc_extraction.cloud_extract_doc_structure(
        build_scoped_file_url('@your_db.your_schema.your_stage', documents.file_path)
    )
FROM your_db.your_schema.documents
WHERE
    documents.file_path IS NOT NULL;
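The script above assumes that a documents table with a file_path column already exists. One possible way to build it is from the stage's directory table, sketched below. This assumes you are allowed to alter the stage and that a directory table suits your setup; the '%.pdf' filter is only an illustration.

```sql
-- Assumption: enabling the directory table on this stage is acceptable.
ALTER STAGE your_db.your_schema.your_stage SET DIRECTORY = (ENABLE = TRUE);
ALTER STAGE your_db.your_schema.your_stage REFRESH;

-- Build the table of file paths from the files currently in the stage.
CREATE OR REPLACE TABLE your_db.your_schema.documents AS
SELECT relative_path AS file_path
FROM DIRECTORY(@your_db.your_schema.your_stage)
WHERE relative_path ILIKE '%.pdf';
```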

Sample Script: Parse a Batch of Files in a Table

Let’s say you have the following setup:
  • APP_NAME: LANDINGAI_AGENTIC_DOCUMENT_EXTRACTION
  • Database: DEMO_DB
  • Schema: DEMO_SCHEMA
  • Stage: DEMO_STAGE (contains PDFs)
  • Table: STATEMENTS
The STATEMENTS table has a column called file_path. The file_path column contains the following filenames:
  • document-1.pdf
  • document-2.pdf
  • document-3.pdf
You would run the following script to process the three documents in the STATEMENTS table:
USE "LANDINGAI_AGENTIC_DOCUMENT_EXTRACTION";
SELECT
    doc_extraction.cloud_extract_doc_structure(
        build_scoped_file_url('@DEMO_DB.DEMO_SCHEMA.DEMO_STAGE', STATEMENTS.file_path)
    )
FROM DEMO_DB.DEMO_SCHEMA.STATEMENTS
WHERE
    STATEMENTS.file_path IS NOT NULL;
The screenshot below shows a Snowsight worksheet that ran Cloud Processing on files listed in a table.
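The table-based script returns only the parsed output, so it is not obvious which row belongs to which file. A small variation, sketched here, selects file_path next to the result in the same way the fixed-list scripts on this page select file_name. The column alias parsed_document is an assumption chosen for readability.

```sql
USE "LANDINGAI_AGENTIC_DOCUMENT_EXTRACTION";
SELECT
    STATEMENTS.file_path,
    doc_extraction.cloud_extract_doc_structure(
        build_scoped_file_url('@DEMO_DB.DEMO_SCHEMA.DEMO_STAGE', STATEMENTS.file_path)
    ) AS parsed_document
FROM DEMO_DB.DEMO_SCHEMA.STATEMENTS
WHERE
    STATEMENTS.file_path IS NOT NULL;
```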

Parse a Batch of Files in a Fixed List

Another way to process multiple documents is to specify the exact filenames directly in your SQL script. This approach is useful when you know exactly which files you want to process and don’t need to create a separate table. It creates a temporary list of filenames within the SQL query itself and processes each file from your Snowflake stage. Use the following script to process multiple files from a fixed list. Replace these placeholders with your information: APP_NAME, your_db, your_schema, your_stage, and the file paths in the VALUES section.
USE "APP_NAME";
SELECT
    file_name,
    doc_extraction.cloud_extract_doc_structure(
         build_scoped_file_url(@your_db.your_schema.your_stage, file_name)
    )
FROM (VALUES
        -- List of filenames in @your_db.your_schema.your_stage goes here:
        ('document-1.pdf'),
        ('document-2.pdf'),
        ('document-3.pdf')
    )
    AS files(file_name);

Sample Script: Parse a Batch of Files in a Fixed List

This example processes three specific PDF files stored in a stage called DEMO_STAGE. Let’s say you have the following setup:
  • APP_NAME: LANDINGAI_AGENTIC_DOCUMENT_EXTRACTION
  • Database: DEMO_DB
  • Schema: DEMO_SCHEMA
  • Stage: DEMO_STAGE
The DEMO_STAGE stage contains these files:
  • invoice-001.pdf
  • invoice-002.pdf
  • invoice-003.pdf
You would run the following script to process the three documents in the DEMO_STAGE stage:
USE "LANDINGAI_AGENTIC_DOCUMENT_EXTRACTION";
SELECT
    file_name,
    doc_extraction.cloud_extract_doc_structure(
         build_scoped_file_url(@DEMO_DB.DEMO_SCHEMA.DEMO_STAGE, file_name)
    )
FROM (VALUES
        ('invoice-001.pdf'),
        ('invoice-002.pdf'),
        ('invoice-003.pdf')
    )
    AS files(file_name);
The screenshot below shows a Snowsight worksheet that ran Cloud Processing on a list of files.

Field Extraction

The app on Snowflake supports field extraction, which allows you to extract specific key:value pairs from the documents you are parsing. The key:value pairs are defined using an extraction schema. Learn more about field extraction in Overview: Extract Data. To run field extraction, add the field extraction schema as a parameter in the doc_extraction.cloud_extract_doc_structure function.
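Because the schema is passed as a string, a stray comma or quote can be hard to spot. Before you call the function, you can check the string with Snowflake's CHECK_JSON function, which returns NULL for valid JSON and an error description otherwise. The sketch below uses a shortened, illustrative schema. It checks JSON syntax only, not whether the app accepts the schema.

```sql
-- Returns NULL when the string is well-formed JSON.
SELECT CHECK_JSON('{
    "type": "object",
    "properties": {
        "employee_name": { "type": "string" }
    }
}') AS schema_error;
```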

Sample Script: Run Field Extraction on Single File

Let’s say you have the following setup:
  • APP_NAME: LANDINGAI_AGENTIC_DOCUMENT_EXTRACTION
  • Database: DEMO_DB
  • Schema: DEMO_SCHEMA
  • Stage: DEMO_STAGE
The DEMO_STAGE stage contains this file:
  • statement-jane-harper.pdf
You want to extract these fields from the file:
  • Employee Name
  • Employee Social Security Number
You would run the following script to run parsing and field extraction on the document in the DEMO_STAGE stage:
USE "LANDINGAI_AGENTIC_DOCUMENT_EXTRACTION";
SELECT
    doc_extraction.cloud_extract_doc_structure(
        build_scoped_file_url('@DEMO_DB.DEMO_SCHEMA.DEMO_STAGE', '/statement-jane-harper.pdf'),
        '{
            "$schema": "http://json-schema.org/draft-07/schema#",
            "title": "Employee Payroll Field Extraction Schema",
            "description": "Schema for extracting key employee payroll fields from a markdown document, as specified by the user.",
            "type": "object",
            "properties": {
                "employee_name": {
                    "title": "Employee Name",
                    "description": "The full name of the employee as it appears on the payroll document.",
                    "type": "string"
                },
                "employee_ssn": {
                    "title": "Employee Social Security Number",
                    "description": "The Social Security Number of the employee, formatted as XXX-XX-XXXX.",
                    "type": "string"
                }
            }
        }'
    );
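To apply the same schema to several files, you could combine the fixed-list pattern with the schema parameter, as in the sketch below. It rests on several assumptions: the function accepts the schema from a column expression just as it does from a literal, statement-john-smith.pdf is a hypothetical second file, and the schema is a shortened version of the one above. The schema is held in a CTE rather than a session variable because Snowflake session variables are limited in size.

```sql
USE "LANDINGAI_AGENTIC_DOCUMENT_EXTRACTION";
WITH extraction_schema AS (
    -- Shortened schema for illustration; use your full schema here.
    SELECT '{
        "type": "object",
        "properties": {
            "employee_name": { "type": "string" },
            "employee_ssn": { "type": "string" }
        }
    }' AS schema_json
)
SELECT
    files.file_name,
    doc_extraction.cloud_extract_doc_structure(
        build_scoped_file_url('@DEMO_DB.DEMO_SCHEMA.DEMO_STAGE', files.file_name),
        extraction_schema.schema_json
    ) AS parsed_document
FROM (VALUES
        ('statement-jane-harper.pdf'),
        ('statement-john-smith.pdf')  -- hypothetical file
    )
    AS files(file_name)
CROSS JOIN extraction_schema;
```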