Extract Fields from Parsed Output

To extract fields from documents, first parse a document with the api.parse procedure. Then use the api.extract procedure to extract key-value pairs from the Markdown returned by the api.parse procedure.

Prerequisites

Before you can extract fields, you must parse a document. For more information, go to Parse Documents.

Set Up the Session

Before running a parse or extract procedure, run the command below to set your session to use the Agentic Document Extraction application and procedures. Replace this placeholder with the name of your instance of Agentic Document Extraction: APP_NAME.

USE "APP_NAME";

Extract

To extract fields from parsed documents, use the api.extract procedure. The api.extract procedure sends the Markdown and a JSON schema to the -hosted service, and saves the extracted data to an output table (defaults to db.extract_output). The api.extract procedure runs the ADE Extract API.

Required Inputs

The api.extract procedure requires:

Markdown content from api.parse
JSON schema that defines which fields to extract and their expected format. For more information, go to Create a JSON Schema for Field Extraction.

The procedure accepts these inputs using different methods. For details, go to Methods for Passing the Markdown and Methods for Passing the JSON Schema.

Optional Parameters

The api.extract procedure supports these optional parameters:

doc_id: Document ID from parse output; provide this to link the extraction results to the original parsed document
output_table: Specify a custom output table name instead of the default extract_output
model: Specify the model version to use for extraction. For full details on extraction models, go to Extraction Model Versions.

Extract Return Object

The api.extract procedure returns an OBJECT with the following fields:

message: Success or error message
output_table: Name of the table where results were saved (such as “db.extract_output”). For the table schema, go to Extract Output Table Schema.
doc_id: Document ID from the parse output (for linking results)
extraction_id: Unique extraction job identifier
status_code: HTTP status code for the request

This return object is useful when chaining parse and extract operations together, and for tracking extraction jobs and debugging.

Extract Output Table Schema

The extraction results are stored in the table specified by output_table in the return object. By default, this is db.extract_output. The table has the following schema:

DOC_ID: Document ID from parse output; you can use this to link extraction results to the original parsed document in parse_output
EXTRACTION_JOB_ID: Unique extraction job identifier
SOURCE_MARKDOWN: First 10,000 characters of input Markdown (for reference)
MODEL_VERSION: Model version used for extraction
EXTRACTED_AT: Timestamp when extraction completed
STATUS_CODE: HTTP status code (200 for success)
EXTRACTION: VARIANT containing the extracted data matching your schema
EXTRACTION_METADATA: VARIANT with extraction metadata
METADATA: VARIANT with job metadata
ERROR: VARIANT containing error information (if extraction failed)

Example query to access nested extraction data:

SELECT
    DOC_ID,
    EXTRACTION:field_name::STRING AS field_value,
    EXTRACTION:nested:field::NUMBER AS nested_value
FROM db.extract_output
WHERE STATUS_CODE = 200;

Methods for Passing the Markdown

You can pass the Markdown content to api.extract using two methods:

Pass the Parse Result Object Directly
Pass Markdown Explicitly

Pass the Parse Result Object Directly

You can pass the result object from api.parse directly to api.extract. This is the most streamlined approach for chaining parse and extract operations. You can combine this with any method for passing the JSON schema. Use this method when you:

Want to chain parse and extract in a single script block
Need to avoid querying the parse output table
Want to automatically link parse and extract results

Procedure Signature

PROCEDURE api.extract(
    parse_result OBJECT,
    schema STRING,
    output_table STRING DEFAULT NULL,
    model STRING DEFAULT NULL
)

Example

The procedure automatically:

Extracts the doc_id from the parse result
Retrieves the Markdown from the parse output table
Links the extraction result with the parse result via doc_id

DECLARE
    parse_result OBJECT;
    extract_result OBJECT;
BEGIN
    -- Parse the document
    CALL api.parse(
        'https://va.landing.ai/pdfs/invoice_1.pdf'
    ) INTO :parse_result;

    -- Extract using the parse result directly
    CALL api.extract(
        :parse_result,
        'https://va.landing.ai/pdfs/InvoiceExtractionSchema.json'
    ) INTO :extract_result;

    RETURN extract_result;
END;

Pass Markdown Explicitly

You can query the Markdown directly from the parse_output table and pass it as a parameter. Use this method when you’ve already parsed documents and want to extract from them separately. You can combine this with any method for passing the JSON schema. Procedure Signature

PROCEDURE api.extract(
    markdown STRING,
    schema STRING,
    doc_id STRING DEFAULT NULL,
    output_table STRING DEFAULT NULL,
    model STRING DEFAULT NULL
)

Example

CALL api.extract(
    markdown => (SELECT MARKDOWN FROM db.parse_output WHERE doc_id = 'your_doc_id'),
    schema => 'https://va.landing.ai/pdfs/InvoiceExtractionSchema.json',
    doc_id => 'your_doc_id'
);

SELECT * FROM db.extract_output;

Methods for Passing the JSON Schema

You can pass the JSON schema to api.extract using multiple methods:

Include the JSON Schema Inline
Use a Staged Schema File
Pass a URL to an Externally Hosted JSON Schema (Demo Files Only)

Include the JSON Schema Inline

Provide the schema as an inline JSON string. You can combine this with any method for passing the Markdown. Use this method when you:

Have a simple schema specific to one query
Want to keep all logic contained in a single script
Prototype or test schema definitions

Example

CALL api.extract(
    markdown => (SELECT MARKDOWN FROM db.parse_output WHERE doc_id = 'your_doc_id'),
    schema => '{
        "type": "object",
        "properties": {
            "title": {
                "type": "string",
                "title": "Document Title",
                "description": "The main title of the document."
            },
            "content": {
                "type": "string",
                "title": "Main Content",
                "description": "The primary content or text from the document."
            }
        },
        "required": ["title", "content"]
    }',
    doc_id => 'your_doc_id'
);

SELECT * FROM db.extract_output;

Use a Staged Schema File

Use build_scoped_file_url() to reference a schema file in a Snowflake stage. You can combine this with any method for passing the Markdown. Use this method when you:

Store your schema files in Snowflake stages
Want to version control schemas alongside your data
Need to reference schemas from internal stage locations

Example

CALL api.extract(
    markdown => (SELECT MARKDOWN FROM db.parse_output WHERE doc_id = 'your_doc_id'),
    schema => build_scoped_file_url('@your_db.your_schema.your_stage', '/simple_schema.json'),
    doc_id => 'your_doc_id'
);

SELECT * FROM db.extract_output;

Pass a URL to an Externally Hosted JSON Schema (Demo Files Only)

This method only works with schema files hosted at https://va.landing.ai, which were granted access during app installation. To use schemas from other URLs or locations, use another method for passing the JSON schema. Provide the schema as a URL parameter. You can combine this with any method for passing the Markdown.

Example

CALL api.extract(
    markdown => (SELECT MARKDOWN FROM db.parse_output WHERE doc_id = 'your_doc_id'),
    schema => 'https://va.landing.ai/pdfs/InvoiceExtractionSchema.json',
    doc_id => 'your_doc_id'
);

SELECT * FROM db.extract_output;

Batch Processing with EXECUTE IMMEDIATE

For processing multiple documents at once, you can use Snowflake’s scripting capabilities to loop through parsed documents and extract data from each. This approach is useful when you have many documents already parsed and want to extract structured data from all of them in one operation.

EXECUTE IMMEDIATE $$
DECLARE
    res RESULTSET DEFAULT (
        SELECT MARKDOWN, DOC_ID FROM db.parse_output WHERE STATUS_CODE = 200
    );
    cur CURSOR FOR res;
BEGIN
    FOR record IN cur DO
        LET markdown STRING := record.MARKDOWN;
        LET doc_id STRING := record.DOC_ID;

        CALL api.extract(
            markdown => :markdown,
            schema => 'https://va.landing.ai/pdfs/InvoiceExtractionSchema.json',
            doc_id => :doc_id,
            output_table => 'batch_extract_results'
        );
    END FOR;
    RETURN 'Processing complete';
END;
$$;

-- View all extracted results
SELECT * FROM db.batch_extract_results;

Sample Scenarios

This section provides examples of how to run the api.extract procedure in different scenarios.

Parse and Extract Data from Files at Publicly Accessible URLs
Parse and Extract Data from a Staged File
Parse and Extract Data from All Files in a Stage

Parse and Extract Data from Files at Publicly Accessible URLs

Run the command below to parse multiple files at publicly accessible URLs, and then extract data from the parsed output. We’ve provided the sample files to help you get started. This example uses an externally hosted JSON schema at https://va.landing.ai, which is only available for demo purposes. For production use, use an inline schema or a staged schema file. Replace this placeholder with your information: APP_NAME.

USE "APP_NAME";

-- Step 1: Parse invoice files
CALL api.parse('https://va.landing.ai/pdfs/invoice_1.pdf');
CALL api.parse('https://va.landing.ai/pdfs/invoice_2.pdf');
CALL api.parse('https://va.landing.ai/pdfs/invoice_3.pdf');
CALL api.parse('https://va.landing.ai/pdfs/invoice_4.pdf');

-- Step 2: Extract structured data from each parsed invoice
CALL api.extract(
    markdown => (SELECT MARKDOWN FROM db.parse_output WHERE SOURCE_URL = 'https://va.landing.ai/pdfs/invoice_1.pdf'),
    schema => 'https://va.landing.ai/pdfs/InvoiceExtractionSchema.json',
    doc_id => (SELECT DOC_ID FROM db.parse_output WHERE SOURCE_URL = 'https://va.landing.ai/pdfs/invoice_1.pdf')
);

CALL api.extract(
    markdown => (SELECT MARKDOWN FROM db.parse_output WHERE SOURCE_URL = 'https://va.landing.ai/pdfs/invoice_2.pdf'),
    schema => 'https://va.landing.ai/pdfs/InvoiceExtractionSchema.json',
    doc_id => (SELECT DOC_ID FROM db.parse_output WHERE SOURCE_URL = 'https://va.landing.ai/pdfs/invoice_2.pdf')
);

CALL api.extract(
    markdown => (SELECT MARKDOWN FROM db.parse_output WHERE SOURCE_URL = 'https://va.landing.ai/pdfs/invoice_3.pdf'),
    schema => 'https://va.landing.ai/pdfs/InvoiceExtractionSchema.json',
    doc_id => (SELECT DOC_ID FROM db.parse_output WHERE SOURCE_URL = 'https://va.landing.ai/pdfs/invoice_3.pdf')
);

CALL api.extract(
    markdown => (SELECT MARKDOWN FROM db.parse_output WHERE SOURCE_URL = 'https://va.landing.ai/pdfs/invoice_4.pdf'),
    schema => 'https://va.landing.ai/pdfs/InvoiceExtractionSchema.json',
    doc_id => (SELECT DOC_ID FROM db.parse_output WHERE SOURCE_URL = 'https://va.landing.ai/pdfs/invoice_4.pdf')
);

-- Step 3: View extracted data
SELECT
    SPLIT_PART(p.SOURCE_URL, '/', -1) AS filename,
    e.EXTRACTION:invoice_info:invoice_date::STRING AS invoice_date,
    e.EXTRACTION:company_info:supplier_name::STRING AS supplier_name,
    e.EXTRACTION:line_items[0]:description::STRING AS first_line_item_description,
    e.EXTRACTION:totals_summary:total_due::FLOAT AS total_due
FROM db.extract_output e
JOIN db.parse_output p ON e.DOC_ID = p.DOC_ID
ORDER BY filename;

Parse and Extract Data from a Staged File

Before parsing staged files, you must grant the application access to your stage. For more information, go to Grant Access to Stages.

Run the command below to parse a single file in a Snowflake stage, and then extract data from the parsed output. Replace these placeholders with your information: APP_NAME, your_db, your_schema, your_stage, path/to/file.pdf, and the JSON schema fields.

USE "APP_NAME";

-- Step 1: Parse the staged file
CALL api.parse(
    '@your_db.your_schema.your_stage/path/to/file.pdf'
);

-- Step 2: Extract structured data using inline schema
CALL api.extract(
    markdown => (SELECT MARKDOWN FROM db.parse_output WHERE FILENAME = 'file.pdf'),
    schema => '{
        "type": "object",
        "properties": {
            "field1": {"type": "string"},
            "field2": {"type": "string"}
        },
        "required": ["field1", "field2"]
    }',
    doc_id => (SELECT DOC_ID FROM db.parse_output WHERE FILENAME = 'file.pdf')
);

-- Step 3: View extracted fields
SELECT
    EXTRACTION:field1::STRING AS field1,
    EXTRACTION:field2::STRING AS field2
FROM db.extract_output;

Sample Script: Parse and Extract a Staged File

Let’s say you have the following setup:

APP_NAME: AGENTIC_DOCUMENT_EXTRACTION__APP
Database: DEMO_DB
Schema: DEMO_SCHEMA
Stage: DEMO_STAGE
PDF: statement-jane-harper.pdf

You want to extract these fields from the file:

Employee Name
Employee Social Security Number

First, grant the application access to the stage:

GRANT USAGE ON DATABASE DEMO_DB TO APPLICATION "AGENTIC_DOCUMENT_EXTRACTION__APP";
GRANT USAGE ON SCHEMA DEMO_DB.DEMO_SCHEMA TO APPLICATION "AGENTIC_DOCUMENT_EXTRACTION__APP";
GRANT READ, WRITE ON STAGE DEMO_DB.DEMO_SCHEMA.DEMO_STAGE TO APPLICATION "AGENTIC_DOCUMENT_EXTRACTION__APP";

Then, parse the file and extract fields from statement-jane-harper.pdf:

USE "AGENTIC_DOCUMENT_EXTRACTION__APP";

-- Step 1: Parse the staged file
CALL api.parse(
    '@DEMO_DB.DEMO_SCHEMA.DEMO_STAGE/statement-jane-harper.pdf'
);

-- Step 2: Extract structured data using inline schema
CALL api.extract(
    markdown => (SELECT MARKDOWN FROM db.parse_output WHERE FILENAME = 'statement-jane-harper.pdf'),
    schema => '{
        "title": "Employee Payroll Field Extraction Schema",
        "description": "Schema for extracting key employee payroll fields.",
        "type": "object",
        "properties": {
            "employee_name": {
                "title": "Employee Name",
                "description": "The full name of the employee as it appears on the payroll document.",
                "type": "string"
            },
            "employee_ssn": {
                "title": "Employee Social Security Number",
                "description": "The Social Security Number of the employee, formatted as XXX-XX-XXXX.",
                "type": "string"
            }
        }
    }',
    doc_id => (SELECT DOC_ID FROM db.parse_output WHERE FILENAME = 'statement-jane-harper.pdf')
);

-- Step 3: View extracted fields
SELECT
    EXTRACTION:employee_name::STRING AS employee_name,
    EXTRACTION:employee_ssn::STRING AS employee_ssn
FROM db.extract_output;

Parse and Extract Data from All Files in a Stage

Before parsing staged files, you must grant the application access to your stage. For more information, go to Grant Access to Stages.

Run the script below to iterate through all files in a Snowflake stage, parse each file, and extract data from the parsed output. This example uses Snowflake’s CURSOR functionality to filter and process multiple files from a stage. Replace these placeholders with your information: @your_db.your_schema.your_stage, filtering criteria, and the JSON schema.

-- Example: Parse and extract all PDFs on a stage using some filtering criteria
DECLARE
    file_cursor CURSOR FOR
        SELECT RELATIVE_PATH
        FROM DIRECTORY(@your_db.your_schema.your_stage)
        WHERE SIZE < 1000000  -- Files smaller than 1MB (1,000,000 bytes)
            AND LOWER(RELATIVE_PATH) LIKE '%your_file_pattern%.pdf'  -- Only process PDF files matching your pattern
        LIMIT 10;
    current_file_path STRING;
    full_stage_path STRING;
    parse_ret OBJECT;
    extract_ret OBJECT;

BEGIN
    FOR file_record IN file_cursor DO
        current_file_path := file_record.RELATIVE_PATH;
        full_stage_path := '@"your_db"."your_schema"."your_stage"/' || :current_file_path;

        -- Parse document using direct stage path
        CALL api.parse(file_path => :full_stage_path) INTO :parse_ret;

        -- Extract using the parse return object
        CALL api.extract(
            parse_result => :parse_ret,
            schema => '{"type": "object", "properties": {...}}'  -- Replace with your JSON schema
        ) INTO :extract_ret;

    END FOR;

END;

-- View parsed results
SELECT * FROM db.parse_output;

-- View extracted results
SELECT * FROM db.extract_output;

Get Started

Client Libraries

Parsing

Split

Extraction

General

Security

Administration

Agentic Document Extraction on Snowflake

Legacy ADE

Legacy Python Library

Extract Fields from Parsed Output

Prerequisites

Set Up the Session

Extract

Required Inputs

Optional Parameters

Extract Return Object

Extract Output Table Schema

Methods for Passing the Markdown

Pass the Parse Result Object Directly

Example

Pass Markdown Explicitly

Example

Methods for Passing the JSON Schema

Include the JSON Schema Inline

Example

Use a Staged Schema File

Example

Pass a URL to an Externally Hosted JSON Schema (Demo Files Only)

Example

Batch Processing with EXECUTE IMMEDIATE

Sample Scenarios

Parse and Extract Data from Files at Publicly Accessible URLs

Parse and Extract Data from a Staged File

Sample Script: Parse and Extract a Staged File

Parse and Extract Data from All Files in a Stage

Get Started

Client Libraries

Parsing

Split

Extraction

General

Security

Administration

Agentic Document Extraction on Snowflake

Legacy ADE

Legacy Python Library

​Prerequisites

​Set Up the Session

​Extract

​Required Inputs

​Optional Parameters

​Extract Return Object

​Extract Output Table Schema

​Methods for Passing the Markdown

​Pass the Parse Result Object Directly

​Example

​Pass Markdown Explicitly

​Example

​Methods for Passing the JSON Schema

​Include the JSON Schema Inline

​Example

​Use a Staged Schema File

​Example

​Pass a URL to an Externally Hosted JSON Schema (Demo Files Only)

​Example

​Batch Processing with EXECUTE IMMEDIATE

​Sample Scenarios

​Parse and Extract Data from Files at Publicly Accessible URLs

​Parse and Extract Data from a Staged File

​Sample Script: Parse and Extract a Staged File

​Parse and Extract Data from All Files in a Stage

Prerequisites

Set Up the Session

Extract

Required Inputs

Optional Parameters

Extract Return Object

Extract Output Table Schema

Methods for Passing the Markdown

Pass the Parse Result Object Directly

Example

Pass Markdown Explicitly

Example

Methods for Passing the JSON Schema

Include the JSON Schema Inline

Example

Use a Staged Schema File

Example

Pass a URL to an Externally Hosted JSON Schema (Demo Files Only)

Example

Batch Processing with EXECUTE IMMEDIATE

Sample Scenarios

Parse and Extract Data from Files at Publicly Accessible URLs

Parse and Extract Data from a Staged File

Sample Script: Parse and Extract a Staged File

Parse and Extract Data from All Files in a Stage