Use the api.extract procedure to extract key-value pairs from the Markdown returned by the api.parse procedure.
Prerequisites
Before you can extract fields, you must parse a document. For more information, go to Parse Documents.
Set Up the Session
Before running a parse or extract procedure, run the command below to set your session to use the Agentic Document Extraction application and procedures. Replace this placeholder with the name of your instance of Agentic Document Extraction: APP_NAME.
Extract
To extract fields from parsed documents, use the api.extract procedure.
The api.extract procedure sends the Markdown and a JSON schema to the hosted service, and saves the extracted data to an output table (db.extract_output by default).
The api.extract procedure runs the ADE Extract API.
Required Inputs
The api.extract procedure requires:
- Markdown content from api.parse
- JSON schema that defines which fields to extract and their expected format. For more information, go to Create a JSON Schema for Field Extraction.
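As a rough illustration of the second input, the sketch below builds a small JSON schema in Python and serializes it to a string. The field names employee_name and employee_ssn are hypothetical, chosen only for illustration, and the sketch uses only the standard JSON Schema keywords type, properties, and description. Which keywords the extraction service accepts is covered in Create a JSON Schema for Field Extraction.

```python
import json

# Hypothetical field names, used for illustration only.
schema = {
    "type": "object",
    "properties": {
        "employee_name": {
            "type": "string",
            "description": "Full name of the employee",
        },
        "employee_ssn": {
            "type": "string",
            "description": "Social Security Number of the employee",
        },
    },
}

# Serialize to a compact JSON string, the form in which a schema is passed as text.
schema_json = json.dumps(schema, separators=(",", ":"))

# Round-trip check: the string parses back to the same structure.
assert json.loads(schema_json) == schema
print(schema_json)
```

Building the schema as a dictionary and serializing it avoids hand-written quoting mistakes in the JSON text.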
Optional Parameters
The api.extract procedure supports these optional parameters:
- doc_id: Document ID from parse output; provide this to link the extraction results to the original parsed document
- output_table: Specify a custom output table name instead of the default extract_output
- model: Specify the model version to use for extraction. For full details on extraction models, go to Extraction Model Versions.
Extract Return Object
The api.extract procedure returns an OBJECT with the following fields:
- message: Success or error message
- output_table: Name of the table where results were saved (such as “db.extract_output”). For the table schema, go to Extract Output Table Schema.
- doc_id: Document ID from the parse output (for linking results)
- extraction_id: Unique extraction job identifier
- status_code: HTTP status code for the request
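The sketch below shows one way a client might inspect these fields, assuming the return object has been fetched into Python as JSON text. The sample values and the summarize helper are hypothetical; only the field names come from the list above, and treating 200 as success follows the STATUS_CODE description in the output table schema.

```python
import json

# Hypothetical return object written as JSON text; all values are invented.
raw_result = """
{
  "message": "Extraction completed",
  "output_table": "db.extract_output",
  "doc_id": "doc-123",
  "extraction_id": "ext-456",
  "status_code": 200
}
"""


def summarize(raw: str) -> str:
    """Turn a return object (as JSON text) into a one-line summary."""
    result = json.loads(raw)
    if result.get("status_code") == 200:
        return f"OK: results for {result['doc_id']} saved to {result['output_table']}"
    # Any other status code is reported together with the message field.
    return f"Failed ({result.get('status_code')}): {result.get('message')}"


print(summarize(raw_result))
```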
Extract Output Table Schema
The extraction results are stored in the table specified by output_table in the return object. By default, this is db.extract_output.
The table has the following schema:
- DOC_ID: Document ID from parse output; you can use this to link extraction results to the original parsed document in parse_output
- EXTRACTION_JOB_ID: Unique extraction job identifier
- SOURCE_MARKDOWN: First 10,000 characters of input Markdown (for reference)
- MODEL_VERSION: Model version used for extraction
- EXTRACTED_AT: Timestamp when extraction completed
- STATUS_CODE: HTTP status code (200 for success)
- EXTRACTION: VARIANT containing the extracted data matching your schema
- EXTRACTION_METADATA: VARIANT with extraction metadata
- METADATA: VARIANT with job metadata
- ERROR: VARIANT containing error information (if extraction failed)
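To show how these columns fit together, the sketch below separates successful rows from failed ones after the table has been read into Python as a list of dictionaries. The rows, their values, and the split_by_status helper are hypothetical; only the column names and the meaning of status code 200 come from the schema above.

```python
from typing import Any

# Hypothetical rows shaped like the schema above; every value is invented.
rows: list[dict[str, Any]] = [
    {
        "DOC_ID": "doc-1",
        "STATUS_CODE": 200,
        "EXTRACTION": {"employee_name": "Example Name"},
        "ERROR": None,
    },
    {
        "DOC_ID": "doc-2",
        "STATUS_CODE": 500,
        "EXTRACTION": None,
        "ERROR": {"message": "example failure"},
    },
]


def split_by_status(table_rows: list[dict[str, Any]]) -> tuple[dict, dict]:
    """Return (extractions, errors), each keyed by DOC_ID."""
    extractions = {r["DOC_ID"]: r["EXTRACTION"] for r in table_rows if r["STATUS_CODE"] == 200}
    errors = {r["DOC_ID"]: r["ERROR"] for r in table_rows if r["STATUS_CODE"] != 200}
    return extractions, errors


extractions, errors = split_by_status(rows)
print(extractions)
print(errors)
```

Keying both results by DOC_ID keeps each outcome linkable to the original parsed document.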
Methods for Passing the Markdown
You can pass the Markdown content to api.extract using two methods:
Pass the Parse Result Object Directly
You can pass the result object from api.parse directly to api.extract. This is the most streamlined approach for chaining parse and extract operations.
You can combine this with any method for passing the JSON schema.
Use this method when you:
- Want to chain parse and extract in a single script block
- Need to avoid querying the parse output table
- Want to automatically link parse and extract results
Example
The procedure automatically:
- Extracts the doc_id from the parse result
- Retrieves the Markdown from the parse output table
- Links the extraction result with the parse result via doc_id
Pass Markdown Explicitly
You can query the Markdown directly from the parse_output table and pass it as a parameter. Use this method when you’ve already parsed documents and want to extract from them separately.
You can combine this with any method for passing the JSON schema.
Procedure Signature
Example
Methods for Passing the JSON Schema
You can pass the JSON schema to api.extract using multiple methods:
- Include the JSON Schema Inline
- Use a Staged Schema File
- Pass a URL to an Externally Hosted JSON Schema (Demo Files Only)
Include the JSON Schema Inline
Provide the schema as an inline JSON string.
You can combine this with any method for passing the Markdown.
Use this method when you:
- Have a simple schema specific to one query
- Want to keep all logic contained in a single script
- Prototype or test schema definitions
Example
Use a Staged Schema File
Use build_scoped_file_url() to reference a schema file in a Snowflake stage.
You can combine this with any method for passing the Markdown.
Use this method when you:
- Store your schema files in Snowflake stages
- Want to version control schemas alongside your data
- Need to reference schemas from internal stage locations
Example
Pass a URL to an Externally Hosted JSON Schema (Demo Files Only)
This method only works with schema files hosted at https://va.landing.ai, which were granted access during app installation. To use schemas from other URLs or locations, use another method for passing the JSON schema.
Provide the schema as a URL parameter.
You can combine this with any method for passing the Markdown.
Example
Batch Processing with EXECUTE IMMEDIATE
To process multiple documents at once, you can use Snowflake’s scripting capabilities to loop through parsed documents and extract data from each. This approach is useful when you have many documents already parsed and want to extract structured data from all of them in one operation.
Sample Scenarios
This section provides examples of how to run the api.extract procedure in different scenarios.
- Parse and Extract Data from Files at Publicly Accessible URLs
- Parse and Extract Data from a Staged File
Parse and Extract Data from Files at Publicly Accessible URLs
Run the command below to parse multiple files at publicly accessible URLs, and then extract data from the parsed output. We’ve provided the sample files to help you get started. This example uses an externally hosted JSON schema at https://va.landing.ai, which is only available for demo purposes. For production use, use an inline schema or a staged schema file.
Replace this placeholder with your information: APP_NAME.
Parse and Extract Data from a Staged File
Before parsing staged files, you must grant the application access to your stage. For more information, go to Grant Access to Stages.
Replace these placeholders with your information: APP_NAME, your_db, your_schema, your_stage, path/to/file.pdf, and the JSON schema fields.
Sample Script: Parse and Extract a Staged File
Let’s say you have the following setup:
- APP_NAME: AGENTIC_DOCUMENT_EXTRACTION__APP
- Database: DEMO_DB
- Schema: DEMO_SCHEMA
- Stage: DEMO_STAGE
- PDF: statement-jane-harper.pdf
You want to extract these fields:
- Employee Name
- Employee Social Security Number
statement-jane-harper.pdf:

