Skip to main content
To extract fields from documents, first parse a document with the api.parse function. Then use the api.extract user-defined function (UDF) to extract key-value pairs from the Markdown returned by the api.parse function. The Markdown you specify will be sent to the -hosted service, and the results will display directly in Snowsight. Using the api.extract function is the same as calling the endpoint. For full details about this endpoint and its response, go to .

Extract Inputs

The api.extract function requires two inputs:
  • the Markdown content from api.parse
  • a JSON Schema that defines which fields to extract and their expected format

Methods for Passing the JSON Schema

The api.extract function accepts the JSON schema in two formats:

Pass a URL to an Externally Hosted JSON Schema

Use this method when you:
  • Reuse the same schema across multiple queries
  • Work with complex schemas that would clutter your SQL script
  • Share schemas across teams or projects
Pass the URL as a string parameter to api.extract:
SELECT
    api.extract(
        parse_result:content:markdown,
        'https://va.landing.ai/pdfs/InvoiceExtractionSchema.json'
    ) AS extraction
FROM parsed_invoices;

Include the JSON Schema Inline with PARSE_JSON

Use this method when you:
  • Have a simple schema specific to one query
  • Want to keep all logic contained in a single script
  • Prototype or test schema definitions
Wrap your JSON schema with the PARSE_JSON function:
SELECT
    api.extract(
        parse_result:content:markdown,
        PARSE_JSON('{
            "type": "object",
            "properties": {
                "field1": {"type": "string"},
                "field2": {"type": "string"}
            },
            "required": ["field1", "field2"]
        }')
    ) AS extraction
FROM parsed_file;

Sample Scenarios

This section provides examples of how to run the api.extract function in different scenarios.

Parse and Extract Data from Files at Publicly Accessible URLs

Run the command below to parse multiple files at publicly accessible URLs, and then extract data from the parsed output. We’ve provided the sample files to help you get started. Replace this placeholder with your information: APP_NAME.
USE "APP_NAME";

WITH invoice_files AS (
    -- Step 1: Create a list of invoice file URLs
    SELECT 'https://va.landing.ai/pdfs/invoice_1.pdf' AS file_url
    UNION ALL
    SELECT 'https://va.landing.ai/pdfs/invoice_2.pdf'
    UNION ALL
    SELECT 'https://va.landing.ai/pdfs/invoice_3.pdf'
    UNION ALL
    SELECT 'https://va.landing.ai/pdfs/invoice_4.pdf'
),
parsed_invoices AS (
    -- Step 2: Parse each invoice file
    SELECT
        file_url,
        api.parse(file_url) AS parse_result
    FROM invoice_files
),
extracted_data AS (
    -- Step 3: Extract structured data from parsed Markdown using the invoice schema
    SELECT
        file_url,
        api.extract(
            parse_result:content:markdown,
            'https://va.landing.ai/pdfs/InvoiceExtractionSchema.json'
        ) AS extraction
    FROM parsed_invoices
)
SELECT
    -- Step 4: Select and display specific extracted fields
    SPLIT_PART(file_url, '/', -1) AS filename,
    extraction:content:extraction:invoice_info:invoice_date::STRING AS invoice_date,
    extraction:content:extraction:company_info:supplier_name::STRING AS supplier_name,
    extraction:content:extraction:line_items[0]:description::STRING AS first_line_item_description,
    extraction:content:extraction:totals_summary:total_due::FLOAT AS total_due
FROM extracted_data
ORDER BY filename;

Parse and Extract Data from a Staged File

Run the command below to parse a single file in a Snowflake stage, and then extract data from the parsed output. Replace these placeholders with your information: APP_NAME, your_db, your_schema, your_stage, path/to/file.pdf, and the JSON schema in the PARSE_JSON function.
USE "APP_NAME";

WITH parsed_file AS (
    -- Step 1: Parse the staged file
    SELECT
        api.parse(
            '@your_db.your_schema.your_stage/path/to/file.pdf'
        ) AS parse_result
),
extracted_data AS (
    -- Step 2: Extract structured data from parsed Markdown using inline schema
    SELECT
        api.extract(
            parse_result:content:markdown,
            PARSE_JSON('{
                "type": "object",
                "properties": {
                    "field1": {"type": "string"},
                    "field2": {"type": "string"}
                },
                "required": ["field1", "field2"]
            }')
        ) AS extraction
    FROM parsed_file
)
SELECT
    -- Step 3: Select and display extracted fields
    extraction:content:extraction:field1::STRING AS field1,
    extraction:content:extraction:field2::STRING AS field2
FROM extracted_data;

Sample Script: Parse and Extract a Staged File

Let’s say you have the following setup:
  • APP_NAME: AGENTIC_DOCUMENT_EXTRACTION__APP
  • Database: DEMO_DB
  • Schema: DEMO_SCHEMA
  • Stage: DEMO_STAGE
  • PDF: statement-jane-harper.pdf
You want to extract these fields from the file:
  • Employee Name
  • Employee Social Security Number
You would run the following script to parse the file and extract fields from statement-jane-harper.pdf:
USE "AGENTIC_DOCUMENT_EXTRACTION__APP";

WITH parsed_file AS (
    -- Step 1: Parse the staged file
    SELECT
        api.parse(
            '@DEMO_DB.DEMO_SCHEMA.DEMO_STAGE/statement-jane-harper.pdf'
        ) AS parse_result
),
extracted_data AS (
    -- Step 2: Extract structured data from parsed Markdown using inline schema
    SELECT
        api.extract(
            parse_result:content:markdown,
            PARSE_JSON('{
                "title": "Employee Payroll Field Extraction Schema",
                "description": "Schema for extracting key employee payroll fields.",
                "type": "object",
                "properties": {
                    "employee_name": {
                        "title": "Employee Name",
                        "description": "The full name of the employee as it appears on the payroll document.",
                        "type": "string"
                    },
                    "employee_ssn": {
                        "title": "Employee Social Security Number", 
                        "description": "The Social Security Number of the employee, formatted as XXX-XX-XXXX.",
                        "type": "string"
                    }
                }
            }')
        ) AS extraction
    FROM parsed_file
)
SELECT
    -- Step 3: Select and display extracted fields
    extraction:content:extraction:employee_name::STRING AS employee_name,
    extraction:content:extraction:employee_ssn::STRING AS employee_ssn
FROM extracted_data;
The screenshot below shows a Snowsight worksheet that parsed a staged file and then extracted the employee_name and employee_ssn fields. Extracted Output