Skip to main content
To extract fields from documents, first parse a document with the api.parse procedure. Then use the api.extract procedure to extract key-value pairs from the Markdown returned by the api.parse procedure.

Prerequisites

Before you can extract fields, you must parse a document. For more information, go to Parse Documents.

Set Up the Session

Before running a parse or extract procedure, run the command below to set your session to use the Agentic Document Extraction application and procedures. Replace this placeholder with the name of your instance of Agentic Document Extraction: APP_NAME.
USE "APP_NAME";

Extract

To extract fields from parsed documents, use the api.extract procedure. The api.extract procedure sends the Markdown and a JSON schema to the -hosted service, and saves the extracted data to an output table (defaults to db.extract_output). The api.extract procedure runs the ADE Extract API.

Required Inputs

The api.extract procedure requires: The procedure accepts these inputs using different methods. For details, go to Methods for Passing the Markdown and Methods for Passing the JSON Schema.

Optional Parameters

The api.extract procedure supports these optional parameters:
  • doc_id: Document ID from parse output; provide this to link the extraction results to the original parsed document
  • output_table: Specify a custom output table name instead of the default extract_output
  • model: Specify the model version to use for extraction. For full details on extraction models, go to Extraction Model Versions.

Extract Return Object

The api.extract procedure returns an OBJECT with the following fields:
  • message: Success or error message
  • output_table: Name of the table where results were saved (such as “db.extract_output”). For the table schema, go to Extract Output Table Schema.
  • doc_id: Document ID from the parse output (for linking results)
  • extraction_id: Unique extraction job identifier
  • status_code: HTTP status code for the request
This return object is useful when chaining parse and extract operations together, and for tracking extraction jobs and debugging.

Extract Output Table Schema

The extraction results are stored in the table specified by output_table in the return object. By default, this is db.extract_output. The table has the following schema:
  • DOC_ID: Document ID from parse output; you can use this to link extraction results to the original parsed document in parse_output
  • EXTRACTION_JOB_ID: Unique extraction job identifier
  • SOURCE_MARKDOWN: First 10,000 characters of input Markdown (for reference)
  • MODEL_VERSION: Model version used for extraction
  • EXTRACTED_AT: Timestamp when extraction completed
  • STATUS_CODE: HTTP status code (200 for success)
  • EXTRACTION: VARIANT containing the extracted data matching your schema
  • EXTRACTION_METADATA: VARIANT with extraction metadata
  • METADATA: VARIANT with job metadata
  • ERROR: VARIANT containing error information (if extraction failed)
Example query to access nested extraction data:
SELECT
    DOC_ID,
    EXTRACTION:field_name::STRING AS field_value,
    EXTRACTION:nested:field::NUMBER AS nested_value
FROM db.extract_output
WHERE STATUS_CODE = 200;

Methods for Passing the Markdown

You can pass the Markdown content to api.extract using two methods:

Pass the Parse Result Object Directly

You can pass the result object from api.parse directly to api.extract. This is the most streamlined approach for chaining parse and extract operations. You can combine this with any method for passing the JSON schema. Use this method when you:
  • Want to chain parse and extract in a single script block
  • Need to avoid querying the parse output table
  • Want to automatically link parse and extract results
Procedure Signature
PROCEDURE api.extract(
    parse_result OBJECT,
    schema STRING,
    output_table STRING DEFAULT NULL,
    model STRING DEFAULT NULL
)

Example

The procedure automatically:
  • Extracts the doc_id from the parse result
  • Retrieves the Markdown from the parse output table
  • Links the extraction result with the parse result via doc_id
DECLARE
    parse_result OBJECT;
    extract_result OBJECT;
BEGIN
    -- Parse the document
    CALL api.parse(
        'https://va.landing.ai/pdfs/invoice_1.pdf'
    ) INTO :parse_result;

    -- Extract using the parse result directly
    CALL api.extract(
        :parse_result,
        'https://va.landing.ai/pdfs/InvoiceExtractionSchema.json'
    ) INTO :extract_result;

    RETURN extract_result;
END;

Pass Markdown Explicitly

You can query the Markdown directly from the parse_output table and pass it as a parameter. Use this method when you’ve already parsed documents and want to extract from them separately. You can combine this with any method for passing the JSON schema. Procedure Signature
PROCEDURE api.extract(
    markdown STRING,
    schema STRING,
    doc_id STRING DEFAULT NULL,
    output_table STRING DEFAULT NULL,
    model STRING DEFAULT NULL
)

Example

CALL api.extract(
    markdown => (SELECT MARKDOWN FROM db.parse_output WHERE doc_id = 'your_doc_id'),
    schema => 'https://va.landing.ai/pdfs/InvoiceExtractionSchema.json',
    doc_id => 'your_doc_id'
);

SELECT * FROM db.extract_output;

Methods for Passing the JSON Schema

You can pass the JSON schema to api.extract using multiple methods:

Include the JSON Schema Inline

Provide the schema as an inline JSON string. You can combine this with any method for passing the Markdown. Use this method when you:
  • Have a simple schema specific to one query
  • Want to keep all logic contained in a single script
  • Prototype or test schema definitions

Example

CALL api.extract(
    markdown => (SELECT MARKDOWN FROM db.parse_output WHERE doc_id = 'your_doc_id'),
    schema => '{
        "type": "object",
        "properties": {
            "title": {
                "type": "string",
                "title": "Document Title",
                "description": "The main title of the document."
            },
            "content": {
                "type": "string",
                "title": "Main Content",
                "description": "The primary content or text from the document."
            }
        },
        "required": ["title", "content"]
    }',
    doc_id => 'your_doc_id'
);

SELECT * FROM db.extract_output;

Use a Staged Schema File

Use build_scoped_file_url() to reference a schema file in a Snowflake stage. You can combine this with any method for passing the Markdown. Use this method when you:
  • Store your schema files in Snowflake stages
  • Want to version control schemas alongside your data
  • Need to reference schemas from internal stage locations

Example

CALL api.extract(
    markdown => (SELECT MARKDOWN FROM db.parse_output WHERE doc_id = 'your_doc_id'),
    schema => build_scoped_file_url('@your_db.your_schema.your_stage', '/simple_schema.json'),
    doc_id => 'your_doc_id'
);

SELECT * FROM db.extract_output;

Pass a URL to an Externally Hosted JSON Schema (Demo Files Only)

This method only works with schema files hosted at https://va.landing.ai, which were granted access during app installation. To use schemas from other URLs or locations, use another method for passing the JSON schema. Provide the schema as a URL parameter. You can combine this with any method for passing the Markdown.

Example

CALL api.extract(
    markdown => (SELECT MARKDOWN FROM db.parse_output WHERE doc_id = 'your_doc_id'),
    schema => 'https://va.landing.ai/pdfs/InvoiceExtractionSchema.json',
    doc_id => 'your_doc_id'
);

SELECT * FROM db.extract_output;

Batch Processing with EXECUTE IMMEDIATE

For processing multiple documents at once, you can use Snowflake’s scripting capabilities to loop through parsed documents and extract data from each. This approach is useful when you have many documents already parsed and want to extract structured data from all of them in one operation.
EXECUTE IMMEDIATE $
DECLARE
    res RESULTSET DEFAULT (
        SELECT MARKDOWN, DOC_ID FROM db.parse_output WHERE STATUS_CODE = 200
    );
    cur CURSOR FOR res;
BEGIN
    FOR record IN cur DO
        LET markdown STRING := record.MARKDOWN;
        LET doc_id STRING := record.DOC_ID;

        CALL api.extract(
            markdown => :markdown,
            schema => 'https://va.landing.ai/pdfs/InvoiceExtractionSchema.json',
            doc_id => :doc_id,
            output_table => 'batch_extract_results'
        );
    END FOR;
    RETURN 'Processing complete';
END;
$;

-- View all extracted results
SELECT * FROM db.batch_extract_results;

Sample Scenarios

This section provides examples of how to run the api.extract procedure in different scenarios.

Parse and Extract Data from Files at Publicly Accessible URLs

Run the command below to parse multiple files at publicly accessible URLs, and then extract data from the parsed output. We’ve provided the sample files to help you get started. This example uses an externally hosted JSON schema at https://va.landing.ai, which is only available for demo purposes. For production use, use an inline schema or a staged schema file. Replace this placeholder with your information: APP_NAME.
USE "APP_NAME";

-- Step 1: Parse invoice files
CALL api.parse('https://va.landing.ai/pdfs/invoice_1.pdf');
CALL api.parse('https://va.landing.ai/pdfs/invoice_2.pdf');
CALL api.parse('https://va.landing.ai/pdfs/invoice_3.pdf');
CALL api.parse('https://va.landing.ai/pdfs/invoice_4.pdf');

-- Step 2: Extract structured data from each parsed invoice
CALL api.extract(
    markdown => (SELECT MARKDOWN FROM db.parse_output WHERE SOURCE_URL = 'https://va.landing.ai/pdfs/invoice_1.pdf'),
    schema => 'https://va.landing.ai/pdfs/InvoiceExtractionSchema.json',
    doc_id => (SELECT DOC_ID FROM db.parse_output WHERE SOURCE_URL = 'https://va.landing.ai/pdfs/invoice_1.pdf')
);

CALL api.extract(
    markdown => (SELECT MARKDOWN FROM db.parse_output WHERE SOURCE_URL = 'https://va.landing.ai/pdfs/invoice_2.pdf'),
    schema => 'https://va.landing.ai/pdfs/InvoiceExtractionSchema.json',
    doc_id => (SELECT DOC_ID FROM db.parse_output WHERE SOURCE_URL = 'https://va.landing.ai/pdfs/invoice_2.pdf')
);

CALL api.extract(
    markdown => (SELECT MARKDOWN FROM db.parse_output WHERE SOURCE_URL = 'https://va.landing.ai/pdfs/invoice_3.pdf'),
    schema => 'https://va.landing.ai/pdfs/InvoiceExtractionSchema.json',
    doc_id => (SELECT DOC_ID FROM db.parse_output WHERE SOURCE_URL = 'https://va.landing.ai/pdfs/invoice_3.pdf')
);

CALL api.extract(
    markdown => (SELECT MARKDOWN FROM db.parse_output WHERE SOURCE_URL = 'https://va.landing.ai/pdfs/invoice_4.pdf'),
    schema => 'https://va.landing.ai/pdfs/InvoiceExtractionSchema.json',
    doc_id => (SELECT DOC_ID FROM db.parse_output WHERE SOURCE_URL = 'https://va.landing.ai/pdfs/invoice_4.pdf')
);

-- Step 3: View extracted data
SELECT
    SPLIT_PART(p.SOURCE_URL, '/', -1) AS filename,
    e.EXTRACTION:invoice_info:invoice_date::STRING AS invoice_date,
    e.EXTRACTION:company_info:supplier_name::STRING AS supplier_name,
    e.EXTRACTION:line_items[0]:description::STRING AS first_line_item_description,
    e.EXTRACTION:totals_summary:total_due::FLOAT AS total_due
FROM db.extract_output e
JOIN db.parse_output p ON e.DOC_ID = p.DOC_ID
ORDER BY filename;

Parse and Extract Data from a Staged File

Before parsing staged files, you must grant the application access to your stage. For more information, go to Grant Access to Stages.
Run the command below to parse a single file in a Snowflake stage, and then extract data from the parsed output. Replace these placeholders with your information: APP_NAME, your_db, your_schema, your_stage, path/to/file.pdf, and the JSON schema fields.
USE "APP_NAME";

-- Step 1: Parse the staged file
CALL api.parse(
    '@your_db.your_schema.your_stage/path/to/file.pdf'
);

-- Step 2: Extract structured data using inline schema
CALL api.extract(
    markdown => (SELECT MARKDOWN FROM db.parse_output WHERE FILENAME = 'file.pdf'),
    schema => '{
        "type": "object",
        "properties": {
            "field1": {"type": "string"},
            "field2": {"type": "string"}
        },
        "required": ["field1", "field2"]
    }',
    doc_id => (SELECT DOC_ID FROM db.parse_output WHERE FILENAME = 'file.pdf')
);

-- Step 3: View extracted fields
SELECT
    EXTRACTION:field1::STRING AS field1,
    EXTRACTION:field2::STRING AS field2
FROM db.extract_output;

Sample Script: Parse and Extract a Staged File

Let’s say you have the following setup:
  • APP_NAME: AGENTIC_DOCUMENT_EXTRACTION__APP
  • Database: DEMO_DB
  • Schema: DEMO_SCHEMA
  • Stage: DEMO_STAGE
  • PDF: statement-jane-harper.pdf
You want to extract these fields from the file:
  • Employee Name
  • Employee Social Security Number
First, grant the application access to the stage:
GRANT USAGE ON DATABASE DEMO_DB TO APPLICATION "AGENTIC_DOCUMENT_EXTRACTION__APP";
GRANT USAGE ON SCHEMA DEMO_DB.DEMO_SCHEMA TO APPLICATION "AGENTIC_DOCUMENT_EXTRACTION__APP";
GRANT READ, WRITE ON STAGE DEMO_DB.DEMO_SCHEMA.DEMO_STAGE TO APPLICATION "AGENTIC_DOCUMENT_EXTRACTION__APP";
Then, parse the file and extract fields from statement-jane-harper.pdf:
USE "AGENTIC_DOCUMENT_EXTRACTION__APP";

-- Step 1: Parse the staged file
CALL api.parse(
    '@DEMO_DB.DEMO_SCHEMA.DEMO_STAGE/statement-jane-harper.pdf'
);

-- Step 2: Extract structured data using inline schema
CALL api.extract(
    markdown => (SELECT MARKDOWN FROM db.parse_output WHERE FILENAME = 'statement-jane-harper.pdf'),
    schema => '{
        "title": "Employee Payroll Field Extraction Schema",
        "description": "Schema for extracting key employee payroll fields.",
        "type": "object",
        "properties": {
            "employee_name": {
                "title": "Employee Name",
                "description": "The full name of the employee as it appears on the payroll document.",
                "type": "string"
            },
            "employee_ssn": {
                "title": "Employee Social Security Number",
                "description": "The Social Security Number of the employee, formatted as XXX-XX-XXXX.",
                "type": "string"
            }
        }
    }',
    doc_id => (SELECT DOC_ID FROM db.parse_output WHERE FILENAME = 'statement-jane-harper.pdf')
);

-- Step 3: View extracted fields
SELECT
    EXTRACTION:employee_name::STRING AS employee_name,
    EXTRACTION:employee_ssn::STRING AS employee_ssn
FROM db.extract_output;