You can parse files in Snowflake with the app that is hosted in Snowflake. This method is referred to as Local Processing and Snowflake Local. Local Processing uses Snowflake Cortex to process documents, so your files never leave your Snowflake environment. Local Processing may have fewer features than Cloud Processing.

Local Processing and Zero Data Retention

When you run Local Processing on Snowflake, your files are not sent outside of your Snowflake environment. Therefore, Local Processing maintains zero data retention (ZDR). For detailed information, go to Zero Data Retention (ZDR) Option Overview.

Local Processing and Billing

When you use Local Processing, all billing is managed by Snowflake. You do not need to create a separate account with LandingAI.

Rate Limits for Local Processing

Local Processing can parse up to 20 pages per minute.
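As a rough planning aid, the rate limit implies a lower bound on how long a batch takes. The sketch below assumes the 20 pages per minute figure applies to the total page count across all files in the batch. The function name and the example page counts are hypothetical and are not part of the app.

```python
import math

PAGES_PER_MINUTE = 20  # rate limit stated for Local Processing


def min_minutes_to_parse(total_pages: int, pages_per_minute: int = PAGES_PER_MINUTE) -> int:
    """Return the minimum number of whole minutes needed to parse total_pages."""
    if total_pages < 0:
        raise ValueError("total_pages must be non-negative")
    return math.ceil(total_pages / pages_per_minute)


# Hypothetical batch: three documents with 12, 45, and 8 pages.
page_counts = [12, 45, 8]
estimate = min_minutes_to_parse(sum(page_counts))
print(f"{sum(page_counts)} pages need at least {estimate} minutes")
```

Actual run time can be longer than this estimate, since it only accounts for the page limit.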

Extracted Output

The output from Local Processing matches the response you get when calling the API directly. For complete details on the response format, go to API Reference.

Quickstart: Local Processing

  1. Grant permissions to the stage with the files you want to parse.
  2. Grant permissions to access Cortex.
  3. Run the parse commands.
  4. If you want to run field extraction, include the JSON schema in the parse command.

Parse with Local Processing

The app provides this function to run Local Processing:
doc_extraction.snowflake_extract_doc_structure
This section provides examples of how to run the doc_extraction.snowflake_extract_doc_structure function in different scenarios.

Parse a Single File

Run the command below to parse a single file in a Snowflake stage. Replace these placeholders with your information: APP_NAME, your_db, your_schema, your_stage, and /path/to/file.pdf.
USE "APP_NAME";
SELECT
    doc_extraction.snowflake_extract_doc_structure(
        '@your_db.your_schema.your_stage/path/to/file.pdf'
    );

Sample Script: Parse a Single File

Let’s say you have the following setup:
  • APP_NAME: LANDINGAI_AGENTIC_DOCUMENT_EXTRACTION
  • Database: DEMO_DB
  • Schema: DEMO_SCHEMA
  • Stage: DEMO_STAGE
The DEMO_STAGE stage contains this file:
  • statement-jane-harper.pdf
You would run the following script to process the document in the DEMO_STAGE stage:
USE "LANDINGAI_AGENTIC_DOCUMENT_EXTRACTION";
SELECT
    doc_extraction.snowflake_extract_doc_structure(
        '@DEMO_DB.DEMO_SCHEMA.DEMO_STAGE/statement-jane-harper.pdf'
    );
The screenshot below shows a Snowsight worksheet that ran Local Processing on one file.

Parse a Batch of Files in a Table

One way to process multiple documents is to create a table that lists the filenames of documents stored in a Snowflake stage. Then, write a SQL script that processes each file by using the file paths from the table. This approach uses the concatenation operator (||) to join the stage path to each file path in your table.
Use the following script to process multiple files from a table. Replace these placeholders with your information: APP_NAME, your_db, your_schema, your_stage, and documents. The script assumes you have a table called documents that has a column called file_path, which lists the filenames of the documents you want to process.
USE "APP_NAME";
SELECT
    doc_extraction.snowflake_extract_doc_structure(
        '@your_db.your_schema.your_stage/' || documents.file_path
    )
FROM your_db.your_schema.documents
WHERE
    documents.file_path IS NOT NULL;

Sample Script: Parse a Batch of Files in a Table

Let’s say you have the following setup:
  • APP_NAME: LANDINGAI_AGENTIC_DOCUMENT_EXTRACTION
  • Database: DEMO_DB
  • Schema: DEMO_SCHEMA
  • Stage: DEMO_STAGE (contains PDFs)
  • Table: STATEMENTS
The STATEMENTS table has a column called file_path. The file_path column contains the following filenames:
  • document-1.pdf
  • document-2.pdf
  • document-3.pdf
You would run the following script to process the three documents in the STATEMENTS table:
USE "LANDINGAI_AGENTIC_DOCUMENT_EXTRACTION";
SELECT
    doc_extraction.snowflake_extract_doc_structure(
        '@DEMO_DB.DEMO_SCHEMA.DEMO_STAGE/' || STATEMENTS.file_path
    )
FROM DEMO_DB.DEMO_SCHEMA.STATEMENTS
WHERE
    STATEMENTS.file_path IS NOT NULL;
The screenshot below shows a Snowsight worksheet that ran Local Processing on files listed in a table.

Parse a Batch of Files in a Fixed List

Another way to process multiple documents is to specify the exact filenames directly in your SQL script. This approach is useful when you know exactly which files you want to process and don’t need to create a separate table. The query defines an inline list of filenames with a VALUES clause and processes each file from your Snowflake stage.
Use the following script to process multiple files from a fixed list. Replace these placeholders with your information: APP_NAME, your_db, your_schema, your_stage, and the file paths in the VALUES section. Make sure to include the forward slash (/) before each filename, because the script joins the stage name and the file path without adding a separator.
USE "APP_NAME";
SELECT
    file_name,
    doc_extraction.snowflake_extract_doc_structure(
        '@your_db.your_schema.your_stage' || file_name
    )
FROM (VALUES
        -- List of file paths in @your_db.your_schema.your_stage goes here:
        ('/path/to/document-1.pdf'),
        ('/path/to/document-2.pdf'),
        ('/path/to/document-3.pdf')
    )
    AS files(file_name);
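If you generate this query from a script instead of typing the list by hand, the leading-slash rule is easy to enforce in code. The following is a minimal sketch, not part of the app: the build_fixed_list_query helper is hypothetical, and it only builds the SQL text shown above. It rejects file names that contain a single quote or a backslash instead of trying to escape them.

```python
def build_fixed_list_query(app_name: str, stage: str, file_names: list[str]) -> str:
    """Build the fixed-list query text for a stage and a list of file names."""
    if not file_names:
        raise ValueError("file_names must not be empty")
    rows = []
    for name in file_names:
        if "'" in name or "\\" in name:
            raise ValueError(f"unsupported character in file name: {name!r}")
        # The query concatenates '@stage' || file_name, so each entry
        # needs exactly one leading forward slash.
        path = "/" + name.lstrip("/")
        rows.append(f"        ('{path}')")
    values = ",\n".join(rows)
    return (
        f'USE "{app_name}";\n'
        "SELECT\n"
        "    file_name,\n"
        "    doc_extraction.snowflake_extract_doc_structure(\n"
        f"        '@{stage}' || file_name\n"
        "    )\n"
        "FROM (VALUES\n"
        f"{values}\n"
        "    )\n"
        "    AS files(file_name);"
    )


# Example values taken from the demo setup used on this page.
query = build_fixed_list_query(
    "LANDINGAI_AGENTIC_DOCUMENT_EXTRACTION",
    "DEMO_DB.DEMO_SCHEMA.DEMO_STAGE",
    ["statement-george-mathew.png", "/statement-jane-harper.pdf"],
)
print(query)
```

The helper accepts names with or without a leading slash and normalizes both to the form the query expects.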

Sample Script: Parse a Batch of Files in a Fixed List

This example processes three specific files stored in a stage called DEMO_STAGE. Let’s say you have the following setup:
  • APP_NAME: LANDINGAI_AGENTIC_DOCUMENT_EXTRACTION
  • Database: DEMO_DB
  • Schema: DEMO_SCHEMA
  • Stage: DEMO_STAGE
The DEMO_STAGE stage contains these files:
  • statement-george-mathew.png
  • statement-jane-harper.pdf
  • statement-john-doe.png
You would run the following script to process the three documents in the DEMO_STAGE stage:
USE "LANDINGAI_AGENTIC_DOCUMENT_EXTRACTION";
SELECT
    file_name,
    doc_extraction.snowflake_extract_doc_structure(
        '@DEMO_DB.DEMO_SCHEMA.DEMO_STAGE' || file_name
    )
FROM (VALUES
        -- List of file paths in @DEMO_DB.DEMO_SCHEMA.DEMO_STAGE goes here:
        ('/statement-george-mathew.png'),
        ('/statement-jane-harper.pdf'),
        ('/statement-john-doe.png')
    )
    AS files(file_name);
The screenshot below shows a Snowsight worksheet that ran Local Processing on a list of files.

Field Extraction

The app on Snowflake supports field extraction, which allows you to extract specific key:value pairs from the documents you are parsing. The key:value pairs are defined using an extraction schema. Learn more about field extraction in Overview: Extract Data. To run field extraction, add the field extraction schema as a parameter in the doc_extraction.snowflake_extract_doc_structure function.

Schema Structure for Local Processing

Snowflake requires specific JSON schema formatting that may differ from schemas you’ve used elsewhere, such as with Cloud Processing. When performing field extraction with Local Processing, your JSON schema must include these required elements:
  • "additionalProperties": false: Prevents the extraction of fields not defined in the schema.
  • "required": ["field1", "field2"]: Specifies which fields are mandatory.
Example:
{
    "type": "object",
    "properties": {
        "title": {
            "type": "string",
            "title": "Document Title",
            "description": "The document title, typically at the top of the page"
        },
        "balance": {
            "type": "number",
            "title": "Account Balance",
            "description": "The amount of money in the account after all transactions are applied."
        }
    },
    "additionalProperties": false,
    "required": ["title", "balance"]
}
For complete JSON schema specifications and additional formatting requirements, see the Snowflake documentation.
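Before passing a schema to the function, you may want to check it for the two required elements listed above. The sketch below is a hypothetical pre-flight check written with the Python standard library; it is not part of the app. The check that every required field is defined under "properties" is an extra sanity check and an assumption, not a documented requirement.

```python
import json


def check_local_schema(schema: dict) -> list[str]:
    """Return a list of problems found in a field extraction schema."""
    problems = []
    # Required element 1: "additionalProperties" must be exactly false.
    if schema.get("additionalProperties") is not False:
        problems.append('"additionalProperties" must be set to false')
    # Required element 2: "required" must list the mandatory field names.
    required = schema.get("required")
    if not isinstance(required, list) or not all(isinstance(f, str) for f in required):
        problems.append('"required" must be a list of field names')
    else:
        properties = schema.get("properties", {})
        for field in required:
            # Assumption: a required field should also be defined in "properties".
            if field not in properties:
                problems.append(f'required field "{field}" is not defined in "properties"')
    return problems


schema = json.loads("""
{
    "type": "object",
    "properties": {
        "balance": {"type": "number", "title": "Account Balance"}
    },
    "additionalProperties": false,
    "required": ["balance"]
}
""")
print(check_local_schema(schema))
```

An empty list means the schema contains both required elements. The check does not cover any additional formatting rules described in the Snowflake documentation.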

Sample Script: Run Field Extraction on Single File

Let’s say you have the following setup:
  • APP_NAME: LANDINGAI_AGENTIC_DOCUMENT_EXTRACTION
  • Database: DEMO_DB
  • Schema: DEMO_SCHEMA
  • Stage: DEMO_STAGE
The DEMO_STAGE stage contains this file:
  • statement-jane-harper.pdf
You want to extract these fields from the file:
  • Employee Name
  • Employee Social Security Number
You would run the following script to run parsing and field extraction on the document in the DEMO_STAGE stage:
USE "LANDINGAI_AGENTIC_DOCUMENT_EXTRACTION";
SELECT
    doc_extraction.snowflake_extract_doc_structure(
        '@DEMO_DB.DEMO_SCHEMA.DEMO_STAGE/statement-jane-harper.pdf',
        '{
            "type": "object",
            "properties": {
                "employee_name": {
                    "type": "string",
                    "title": "Employee Name",
                    "description": "The full name of the employee as it appears on the payroll document."
                },
                "employee_ssn": {
                    "type": "string",
                    "title": "Employee Social Security Number",
                    "description": "The Social Security Number of the employee, formatted as XXX-XX-XXXX."
                }
            },
            "additionalProperties": false,
            "required": ["employee_name", "employee_ssn"]
        }'
    );
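If you keep the extraction schema in a script, you can generate the SQL call instead of pasting the JSON by hand. The following is a minimal sketch under stated assumptions: both helper names are hypothetical, and the escaping assumes Snowflake single-quoted string constants, where a single quote is written as two single quotes and a backslash starts an escape sequence. If your client supports bind variables, passing the schema as a bound parameter is likely the safer choice.

```python
import json


def sql_string_literal(text: str) -> str:
    """Wrap text in single quotes, escaping backslashes and single quotes."""
    escaped = text.replace("\\", "\\\\").replace("'", "''")
    return f"'{escaped}'"


def build_extraction_query(app_name: str, stage_path: str, schema: dict) -> str:
    """Build the parse-and-extract query text for one staged file."""
    schema_json = json.dumps(schema, indent=4)
    return (
        f'USE "{app_name}";\n'
        "SELECT\n"
        "    doc_extraction.snowflake_extract_doc_structure(\n"
        f"        {sql_string_literal(stage_path)},\n"
        f"        {sql_string_literal(schema_json)}\n"
        "    );"
    )


# Schema fields taken from the sample script above.
schema = {
    "type": "object",
    "properties": {
        "employee_name": {
            "type": "string",
            "title": "Employee Name",
            "description": "The employee's full name as it appears on the payroll document.",
        },
    },
    "additionalProperties": False,
    "required": ["employee_name"],
}
print(build_extraction_query(
    "LANDINGAI_AGENTIC_DOCUMENT_EXTRACTION",
    "@DEMO_DB.DEMO_SCHEMA.DEMO_STAGE/statement-jane-harper.pdf",
    schema,
))
```

The apostrophe in the description shows why escaping matters: left unescaped, it would end the SQL string early.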