# Changelog

Source: https://docs.landing.ai/ade/ade-changelog

Updates and improvements to Agentic Document Extraction.

## Updated Playground Design

The [Playground](https://va.landing.ai/) has a new look. The updated design makes it easier to load files and get started, and surfaces more resources directly in the left side-navigation panel, including:

* **APIs**: Quick access to the ADE API reference
* **Plan & Billing**: View and manage your plan and billing information
* **Events**: View your processing activity

Two new APIs are now available in public preview:

## ADE Classify

ADE Classify classifies each page of a document by category. You provide a document and a list of classes, and the API assigns a class to each page concurrently. Use the results to route pages to downstream systems such as [Parse](./ade-separate-apis), [Split](./ade-split), or [Extract](./ade-extract).

[Get started with ADE Classify](./ade-classify).

## ADE Section

ADE Section generates a hierarchical table of contents from a previously parsed document. It accepts the Markdown output from ADE Parse (which contains reference anchors) and returns a flat, reading-order list of sections with their hierarchy levels and chunk references.

[Get started with ADE Section](./ade-section).

## A Note About Public Preview

These APIs are in Preview. These features are still in development and may not return accurate results. Do not use these features in production environments.

## New Parsing Model Snapshot: dpt-2-20260410

A new snapshot of DPT-2 is now available: `dpt-2-20260410`.

`dpt-2-20260410` builds on previous versions with these improvements:

* **Improved cell parsing in forms and tables**: Text positioned at different locations within a cell is now captured more completely.
* **Improved column alignment in complex tables**: Cell data now more accurately aligns with its corresponding column headers.

### How This Affects Your Results

If your code uses `dpt-2-latest` or `dpt-2`, it now points to this new snapshot and your parsing results may change.
To receive the latest improvements automatically, continue using a `-latest` alias. To maintain consistent results over time, pin your code to a specific snapshot (for example, `dpt-2-20260302`). For more information, go to [Why Model Versioning Matters](./ade-parse-models#why-model-versioning-matters).

This release includes a new extraction model version, an updated API with expanded schema support, the [Build Extract Schema API](./ade-extract-schema-api) for programmatic schema creation, and a new projects experience in the Playground. It also introduces credit usage for AI-powered schema building tools.

* [New Extraction Model: extract-20260314](#new-extraction-model-extract-20260314)
* [Expanded Extraction Functionality](#expanded-extraction-functionality)
* [Build Extract Schema API](#build-extract-schema-api)
* [Projects in the Playground](#projects-in-the-playground)
* [Schema Building Tools Consume Credits](#schema-building-tools-consume-credits)

## New Extraction Model: extract-20260314

All extraction improvements in this release require model `extract-20260314`. For a full list of what this model enables, see [extract-20260314](./ade-extract-models#extract-20260314).

* **If you use `extract-latest`**: No changes needed. You already have access to the new model.
* **If you specify a model version in your code**: Update it to `extract-20260314` to access these improvements.

## Expanded Extraction Functionality

The Extract API no longer has limits on schema length, number of nested levels, or other schema constraints.

## Cross-Page Table Reconstruction

Tables that span page breaks are now returned as a single array, with no post-processing needed.

## Long Document Support

The Extract API now supports documents of 1,000 or more pages, without splitting or merging files first.

## Build Extract Schema API

The new Build Extract Schema API lets you programmatically create and update extraction schemas, providing the same schema-building capabilities as the Playground.
* [Build Extract Schema API reference](https://docs.landing.ai/api-reference/tools/ade-build-extract-schema)
* [How to use the Build Extract Schema API](./ade-extract-schema-api)

## Projects in the Playground

Files in the Playground are now organized into projects. Projects let you group similar documents together and create and apply extraction schemas across multiple documents at once.

## Schema Building Tools Consume Credits

Running AI-powered schema building tools consumes credits. This applies to:

* The **Refine Schema**, **View Suggested Fields**, and **Write a Schema Prompt** tools in the Playground
* The Build Extract Schema API

This release changes how subscription plans handle usage beyond the credit allocation for your billing cycle. These changes ensure that exceeding your credits does not disrupt your service.

* **No more overage charges.** Purchase prepaid credits in advance to cover usage beyond the credit allocation for your billing cycle.
* **Auto Recharge as a buffer.** Automatically purchase credits when your balance falls below a set threshold.
* **Explore credits count too.** If you upgraded from an Explore plan on or after April 2, 2026, your unused Explore credits are added to your buffer.

Read on for details about each of these changes and other updates in this release. Get all pricing information in [Pricing & Billing](./ade-pricing).

## Prepaid Credits Replace Overage Charges

Overage charges for subscription plans have been eliminated. Previously, if you exceeded your allocated credits for a billing cycle, you were billed after the fact. You can now purchase "Pay-As-You-Go" credits in advance. If you exceed your allocated credits for a billing cycle, the prepaid credits are used automatically. This change goes into effect for your upcoming billing cycle.

To learn more, go to [Overages](./ade-pricing#overages).

## Auto Recharge Now Available on Subscription Plans

Auto Recharge was previously available only on the Explore plan.
It is now also available on subscription plans, where it is enabled by default. When your credit balance falls below a set threshold, credits are purchased automatically.

To learn more, go to [Auto Recharge](./ade-pricing#auto-recharge).

## Explore Credits Transfer When You Upgrade

If you upgrade from a Personal account to a subscription plan on or after April 2, 2026, your unused credits transfer to your new organization and are added to your credit pool. This pool is drawn from if you exceed your credit allocation for a billing cycle, with credits that expire soonest consumed first.

To learn more, go to [Upgrade Team Plans](./ade-pricing#upgrade-team-plans).

## Updated Credit Expiration Timeline

When you create an account, you receive a set of free credits. These free credits expire 90 days after you create your account.

## Personal Accounts Are Now Organizations

When you create an account, it is now automatically an organization. If you upgrade to a subscription plan, you keep the same organization, which means you get to keep your API keys, files, and settings.

To learn more, go to [Organizations & Members](./ade-members).

**Already on a subscription?** Your Personal account and subscription organization remain separate. This ensures that files and data in your Personal organization are not shared with members of your subscription organization.

## Custom Prompts for Figure Descriptions

The ADE Parse and ADE Parse Jobs APIs include a new optional `custom_prompts` parameter that lets you tell the parsing model how to describe figures during parsing. This is useful when the default figure descriptions do not fit your use case, such as for domain-specific charts or images unique to your organization.
Pass your prompt in the `custom_prompts` parameter when calling the [ADE Parse](https://docs.landing.ai/api-reference/tools/ade-parse) or [ADE Parse Jobs](https://docs.landing.ai/api-reference/tools/ade-parse-jobs) APIs, or when using the [Python](https://github.com/landing-ai/ade-python) and [TypeScript](https://github.com/landing-ai/ade-typescript) libraries.

The `custom_prompts` parameter is supported only by certain parsing models. For more information, see [Custom Prompts for Figure Descriptions](./ade-parse-custom-prompts).

## SSO Support for SAML 2.0 and OIDC

Enterprise plans now support single sign-on (SSO) via SAML 2.0 and OpenID Connect (OIDC). SSO allows your organization to manage access through your existing identity provider (IdP).

To get started, see [Single Sign-On (SSO)](./ade-sso).

## DPT-2 mini Updates

DPT-2 mini includes the following updates:

* **Improved table accuracy**: Table parsing accuracy has been improved for simple tables.
* **Visual element captions**: DPT-2 mini now generates concise captions for image-based chunk types, including `figure`, `logo`, `card`, `attestation`, and `scan_code`.

These changes apply to all snapshots. To learn more, go to [DPT-2 mini](./ade-parse-models#dpt-2-mini).

## New Parsing Model Snapshots

New snapshots of DPT-2 and DPT-2 mini are now available:

* DPT-2: `dpt-2-20260302`
* DPT-2 mini: `dpt-2-mini-20260302`

`dpt-2-20260302` builds on previous versions with several improvements, including:

* **Table boundary detection**: Tables that were previously split into multiple chunks are now correctly identified as a single table.
* **Improved large table accuracy**: Large tables are now parsed more accurately.
* **Special characters returned as Unicode**: Characters such as asterisks are now returned as their Unicode characters (for example, `*`) rather than as spelled-out strings like `asterisk`.

The table boundary detection and large table accuracy improvements are also included in `dpt-2-mini-20260302`.
To learn more about parsing models and snapshots, go to [Document Pre-Trained Transformers (Parsing Models)](./ade-parse-models).

### How This Affects Your Results

If your code uses `dpt-2-latest`, `dpt-2`, `dpt-2-mini-latest`, or `dpt-2-mini`, it now points to these new snapshots and your parsing results may change.

To receive the latest improvements automatically, continue using a `-latest` alias. To maintain consistent results over time, pin your code to a specific snapshot (for example, `dpt-2-20251103`). For more information, go to [Why Model Versioning Matters](./ade-parse-models#why-model-versioning-matters).

## Parse Password-Protected Files

Accounts with [Zero Data Retention (ZDR)](./zdr) enabled can now parse password-protected files.

Pass the document's password in the `password` parameter when calling the [ADE Parse](https://docs.landing.ai/api-reference/tools/ade-parse) or [ADE Parse Jobs](https://docs.landing.ai/api-reference/tools/ade-parse-jobs) APIs, or when using the [Python](https://github.com/landing-ai/ade-python) and [TypeScript](https://github.com/landing-ai/ade-typescript) libraries.

For more information, go to [Parse Password-Protected Files](./ade-parse-password).

## LandingLens and LandingEdge Documentation Has Moved

Documentation for LandingLens and LandingEdge has moved to [landinglens.docs.landing.ai](https://landinglens.docs.landing.ai/). Existing links to docs.landing.ai/landinglens and docs.landing.ai/landingedge will automatically redirect to the new site.

## Extraction Schema Validation Fix

The Extract API now correctly validates `anyOf` sub-schemas before processing begins. If a sub-schema within `anyOf` is missing both the `type` and `anyOf` keywords, the API returns a 400 error identifying the invalid path in the schema. Previously, this caused an unexpected extraction failure.

For more information about extraction response statuses and errors, go to [Troubleshoot Extraction](./ade-extract-troubleshoot).
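The `anyOf` rule described above can also be checked client-side before a request is sent, so schema mistakes surface before any credits are spent. Below is a minimal sketch; the helper is hypothetical (not part of the official libraries) and assumes the rule exactly as stated: every sub-schema inside an `anyOf` must carry a `type` or a nested `anyOf`.

```python
def find_invalid_anyof_paths(schema: dict, path: str = "$") -> list[str]:
    """Return JSON paths of anyOf sub-schemas missing both 'type' and 'anyOf'."""
    invalid = []
    for i, sub in enumerate(schema.get("anyOf", [])):
        sub_path = f"{path}.anyOf[{i}]"
        if "type" not in sub and "anyOf" not in sub:
            invalid.append(sub_path)
        else:
            invalid.extend(find_invalid_anyof_paths(sub, sub_path))
    # Also walk object properties so nested anyOf blocks are covered
    for name, prop in schema.get("properties", {}).items():
        invalid.extend(find_invalid_anyof_paths(prop, f"{path}.properties.{name}"))
    return invalid
```

Running the check on a schema whose second `anyOf` branch lacks both keywords would flag the path `$.properties.amount.anyOf[1]`, mirroring the kind of path the 400 error now reports.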
## Confidence Scores

The parsed results now include **confidence scores** for text, marginalia, card, and table chunks, as well as table cells. Confidence scores measure how confident the model is that the parsed Markdown matches the actual text or visual data in the document. Each score ranges from 0.0 (low confidence) to 1.0 (high confidence). Lower values indicate regions where the model was less certain about the output.

The confidence score feature is in Preview. The numeric range and distribution may change as we continue to develop and improve the model.

### Use Confidence Scores in the Playground

After parsing a document in the Playground, turn on the **Confidence** toggle (the toggle is available only with supported parsing models). In the **Markdown** tab, the confidence score displays next to the chunk type.

### Review Low-Confidence Sections

Low-confidence sections (with scores of 0.95 or lower) are highlighted in yellow in the Playground. Highlighting low-confidence sections helps you quickly identify content that may need review.

* **Text, card, and marginalia chunks**: Specific text spans within a chunk are highlighted. A single chunk can contain multiple highlighted spans.
* **Tables**: Each cell has its own confidence score. If a cell has a low score, the entire cell is highlighted.

#### Why is 0.95 the threshold for low confidence scores?

The Playground highlights content with confidence scores of 0.95 or lower because this is the threshold used internally to evaluate parsing quality. This threshold works well for general review workflows, but your use case may require a different threshold. When [using the API](./ade-json-response#confidence-score), you can implement custom logic to identify and route content based on confidence scores that match your specific requirements.

#### Why don't I see the confidence score for files I uploaded in the past?
The confidence score displays for documents parsed on February 12, 2026 or later. If you parsed a document before that date, the Playground does not re-process it. If you want to see the newest results for a file that you've already parsed, re-upload it to the Playground.

### Use Confidence Scores in the API Response

The `grounding` object in the API response includes all the confidence information displayed in the Playground:

* Confidence scores for each chunk and table cell
* For text chunks with low-confidence spans: the text, confidence score, and character positions (`start`, `end`) for each span

This allows you to programmatically identify and route low-confidence content for review in your applications. Learn more in [Confidence Scores](./ade-json-response#confidence-score).

DPT-1 will experience a technology change on February 17, 2026 that may affect parsing results, and will be fully deprecated on March 31, 2026. If you are using DPT-1, update your code to use DPT-2. To learn more about DPT models, go to [Document Pre-Trained Transformers (Parsing Models)](./ade-parse-models).

## DPT-1 Underlying Technology Change: February 17, 2026

One of our underlying solution providers for DPT-1 is upgrading their technology on February 17, 2026. While DPT-1 will continue to function, table parsing behavior and results may change for some documents.

## DPT-1 Deprecation: March 31, 2026

Longer term, we plan to deprecate DPT-1. After March 31, 2026, DPT-1 will no longer be supported and will be scheduled for shutdown.

## Impacted Users and Next Steps

Check the scenarios below to determine whether you are impacted and what action to take. Complete any necessary migration before February 17, 2026.
* [Scenario 1: You Specify DPT-1 as the Model Parameter](#scenario-1-you-specify-dpt-1-as-the-model-parameter)
* [Scenario 2: You Use the Legacy API Endpoint](#scenario-2-you-use-the-legacy-api-endpoint)
* [Scenario 3: You Use the Legacy Python Library](#scenario-3-you-use-the-legacy-python-library)

We recommend that you test any changes in staging first, especially if you process table-heavy documents. After testing in staging, deploy the changes to production.

### Scenario 1: You Specify DPT-1 as the Model Parameter

You are in this scenario if:

* You call the `/v1/ade/parse` or `/v1/ade/parse-jobs` endpoint AND
* You set the `model` parameter to `dpt-1`, `dpt-1-latest`, or `dpt-1-20250615`

**What to do:** Update your code to use `model=dpt-2-latest` or a specific snapshot. For example, run the command below to use the latest snapshot of DPT-2.

```shell theme={null}
curl -X POST 'https://api.va.landing.ai/v1/ade/parse' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -F 'document=@document.pdf' \
  -F 'model=dpt-2-latest'
```

### Scenario 2: You Use the Legacy API Endpoint

You are in this scenario if you call the legacy endpoint: `v1/tools/agentic-document-analysis`.

**What to do:**

1. Migrate to the `/v1/ade/parse` endpoint. To learn how to use this endpoint, go to [ADE Parse](./ade-separate-apis).
2. When calling the endpoint, use `model=dpt-2-latest` or a specific snapshot. For example, run the command below to use the latest snapshot of DPT-2.

```shell theme={null}
curl -X POST 'https://api.va.landing.ai/v1/ade/parse' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -F 'document=@document.pdf' \
  -F 'model=dpt-2-latest'
```

### Scenario 3: You Use the Legacy Python Library

You are in this scenario if you use the legacy `agentic-doc` Python library.

**What to do:**

1. Migrate to the `landingai-ade` library. To learn how to use this library, go to [Python Library](./ade-python).
2. Update your code to use `model="dpt-2-latest"` or a specific snapshot when calling the `parse()` function.
For example, use this code to use the latest snapshot of DPT-2.

```python theme={null}
response = client.parse(
    document=Path("/path/to/file/document"),
    model="dpt-2-latest",
)
```

When using the [Playground](https://va.landing.ai/), you can now load up to 10 files at once. After opening the Upload dialog box, navigate to and select multiple files, or simply drag and drop the files.

## Parsing Status

After you load multiple files into the Playground, a floating dialog box appears in the bottom-right corner, showing the parsing progress for each file. If a file fails to process, an error message displays in this area.

## Multiple File Upload Not Available with ZDR

When [Zero Data Retention](./zdr) (ZDR) is enabled, only one file can be loaded at a time.

## Python and TypeScript Libraries Support Saving Responses to Local Directories

The Python and TypeScript libraries now support saving API responses directly to local directories as JSON files.

The save parameter:

* Creates the specified directory if it doesn't exist
* Saves the response with the filename format `{input_file}_{method}_output.json`
* Works across all three core functions: parse, split, and extract

### Availability

* Python library: v1.4.0 (parameter: `save_to`)
* TypeScript library: v2.0.0 (parameter: `saveTo`)

### Example

These examples show how to save the API response when parsing.
```python Python theme={null}
from pathlib import Path

from landingai_ade import LandingAIADE

client = LandingAIADE()

response = client.parse(
    document=Path("/path/to/document.pdf"),
    model="dpt-2-latest",
    save_to="output_folder",  # optional: saves as {input_file}_parse_output.json
)
```

```typescript TypeScript theme={null}
import LandingAIADE from "landingai-ade";
import fs from "fs";

const client = new LandingAIADE();

const response = await client.parse({
  document: fs.createReadStream("/path/to/document.pdf"),
  model: "dpt-2-latest",
  saveTo: "output_folder", // optional: saves as {input_file}_parse_output.json
});
```

For more information, go to [Python Library](./ade-python) or [TypeScript Library](./ade-typescript).

## Table Cell Position Information

When you parse documents with tables, the API response now includes row and column position information for each table cell in the `grounding` object. This allows you to map parsed data back to specific cell locations in the original table.

Each table cell in the `grounding` object now includes a `position` field with:

* `row`: Row position (zero-indexed)
* `col`: Column position (zero-indexed)
* `rowspan`: Number of rows the cell spans
* `colspan`: Number of columns the cell spans
* `chunk_id`: Associated chunk identifier

**Why this matters:**

* **Precise cell mapping**: You can identify the exact row and column for each piece of data in your table
* **Merged cell detection**: The `rowspan` and `colspan` values indicate when cells are merged
* **Easier data validation**: You can verify that extracted data came from the expected cell location

For more information about the grounding object structure, go to [Table Cell Position Information](./ade-json-response#table-cell-position-information).
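The `position` fields described above are enough to rebuild a table as a 2-D grid, with merged cells expanded across the rows and columns they span. A rough sketch follows; the cell dicts are a simplified, hypothetical stand-in for the real `grounding` entries, so consult the JSON response reference for the exact structure.

```python
def build_grid(cells):
    """Place each cell's text into a row x col grid, expanding merged cells.

    Each cell is a dict with 'text' and a 'position' dict holding the
    zero-indexed 'row'/'col' plus 'rowspan'/'colspan' fields described above.
    """
    n_rows = max(c["position"]["row"] + c["position"]["rowspan"] for c in cells)
    n_cols = max(c["position"]["col"] + c["position"]["colspan"] for c in cells)
    grid = [[None] * n_cols for _ in range(n_rows)]
    for c in cells:
        p = c["position"]
        for r in range(p["row"], p["row"] + p["rowspan"]):
            for col in range(p["col"], p["col"] + p["colspan"]):
                grid[r][col] = c["text"]  # merged cells repeat their text
    return grid
```

A header cell with `colspan: 2` would fill both columns of its row, which is how merged cells can be detected and flattened before loading the table into a dataframe or database.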
This release includes a new metadata field for tracking extraction model fallbacks, improved HTTP status codes for parse jobs with partial results, improved spreadsheet parsing, and documentation updates to the Parse Jobs endpoint names.

## Fallback Model Version Field

The Extract API response now includes a `metadata.fallback_model_version` field. This field shows which extraction model was actually used if the API falls back from your requested model.

For more information about the response structure, go to [JSON Response for Extraction](./ade-extract-response).

## Parse Jobs: Partial Content Now Returns a 206 Status

When you retrieve parse job results with the ADE Get Parse Jobs API, the API now returns a 206 (Partial Content) HTTP status code if some pages failed during processing. The `failed_pages` array and `failure_reason` field that were already in the API response continue to provide details about which pages failed and why.

For more information, go to [Troubleshoot Parsing](./ade-parse-troubleshoot#status-206-partial-content).

## Improved Title Detection in Spreadsheets

The Parse API now better identifies titles in spreadsheets. When a spreadsheet has a title or text at the top, the API now returns:

* The title as a separate text chunk
* The table as its own chunk

## You Can Now Resubscribe and Cancel in the Interface

If you want to resubscribe to or cancel a subscription plan, you can now do so directly in the interface. For full details, go to [Pricing & Billing](./ade-pricing#can-i-resubscribe).

## Updated API Endpoint Names

We've updated the names of two Parse Jobs API endpoints:

* "List Async Jobs" is now "**ADE List Parse Jobs**"
* "Get Async Job Status" is now "**ADE Get Parse Jobs**"

This is a documentation-only change. The endpoint URLs have not changed, and your existing API calls will continue to work without any modifications.
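Clients can branch on the 206 status introduced earlier in this release to surface partial failures instead of treating every non-200 response as an error. A minimal sketch: the `failed_pages` and `failure_reason` names come from the release note above, while the surrounding response shape and the helper itself are assumptions for illustration.

```python
def summarize_parse_job(status_code: int, body: dict) -> str:
    """Summarize a parse-job result, flagging partially failed pages."""
    if status_code == 206:  # Partial Content: some pages failed to parse
        metadata = body.get("metadata", {})
        failed = metadata.get("failed_pages", [])
        reason = metadata.get("failure_reason", "unknown")
        return f"partial: {len(failed)} page(s) failed ({reason})"
    if status_code == 200:
        return "complete"
    return f"error: HTTP {status_code}"
```

A review pipeline could log the summary and queue only the failed page numbers for reprocessing, rather than resubmitting the whole document.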
## Split Documents with the New Split API (Preview)

We're releasing a Preview of the Split API, which classifies and separates a parsed document into multiple sub-documents based on Split Rules you define. This is useful when you receive batched documents containing multiple document types or multiple instances of the same document type.

For example, a financial institution processing KYC documentation might receive a single PDF containing bank statements, utility bills, and identification documents for a customer. The Split API can automatically classify and separate each document type, enabling downstream processing systems to route each document appropriately.

Get the full details in [Split](./ade-split).

The Split API is in Preview. This feature is still in development and may not return accurate results. Do not use this feature in production environments.

### How It Works

1. Parse your document using the [ADE Parse API](https://docs.landing.ai/api-reference/tools/ade-parse) to generate Markdown output
2. Define Split Rules that describe the document types or sections you want to identify
3. Call the Split API with the parsed Markdown and your Split Rules
4. The API returns each classified sub-document with its full Markdown content

For the complete workflow, go to [Process Overview](./ade-split#process-overview).

### When to Use Split

Use the Split API when you need to:

* Separate batched documents containing multiple document types (invoices, receipts, contracts)
* Split documents with repeated sections by unique identifiers (multiple pay stubs by date)
* Organize multi-section documents into logical parts (academic articles with body, references, and supplemental materials)
* Route different document types to appropriate downstream systems

For more use cases, go to [Example Use Cases](./ade-split#example-use-cases).
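The routing step from the KYC example above can be sketched as a simple dispatch on the class label each sub-document comes back with. The sub-document shape (`classification` and `markdown` keys) and the handler names are hypothetical; the real response fields are documented in the Split reference.

```python
def route_subdocuments(subdocs, handlers, fallback=None):
    """Dispatch each classified sub-document to a handler keyed by class label.

    subdocs: list of dicts with 'classification' and 'markdown' keys (assumed shape).
    handlers: mapping from class label to a callable that takes the Markdown.
    fallback: optional callable used when no handler matches the label.
    """
    routed = []
    for doc in subdocs:
        handler = handlers.get(doc["classification"], fallback)
        if handler is None:
            raise ValueError(f"no handler for class {doc['classification']!r}")
        routed.append(handler(doc["markdown"]))
    return routed
```

In practice each handler would push the sub-document's Markdown into the appropriate downstream system (a statements pipeline, a billing queue, and so on).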
### How to Use Split

The Split API is available through multiple interfaces:

* [Playground](./ade-split#split-in-the-playground): Interactively create and test Split Rules
* [API](./ade-split#split-with-the-api): Integrate directly into your applications
* [Python Library](./ade-python#split-getting-started): Integrate into Python-based applications with our Python library
* [TypeScript Library](./ade-typescript#split-getting-started): Integrate into TypeScript-based applications with our TypeScript library

## Revamped Playground

We've launched a complete redesign of our [Playground](https://va.landing.ai/my/home)! The updated Playground now guides you through each step of the document processing workflow: **Parsing**, **Splitting**, and **Extraction**. Simply click a tile to get started!

You can now see all the files you've processed on your Playground homepage, including which tools you've run on each file (parse, extract, split). We've also made it easier to get help by adding **Product Update** and **Resources** panels.

## The Parse Jobs API Supports up to 6,000-Page Documents

The [ADE Parse Jobs](https://docs.landing.ai/api-reference/tools/ade-parse-jobs) API now supports documents up to 6,000 pages long. Previously, the limit was 1,000 pages.

For more information, go to [Rate Limits for ADE Parse Jobs](./ade-parse-async#rate-limits-for-ade-parse-jobs).

## Improved Support for Partial Content with the Parse Jobs API

We've improved how the [ADE Parse Jobs](https://docs.landing.ai/api-reference/tools/ade-parse-jobs) API handles partially parsed documents. Previously, if any pages failed to process, the job would fail with status `failed`.

Now, the API processes all pages in the document. If some pages fail, the job completes with status `completed`, and the successfully processed pages are returned in the results. The `failed_pages` array in the metadata lists which pages failed, and the `failure_reason` field provides details about the failures.
For more information, go to [Troubleshoot Parsing](./ade-parse-troubleshoot#check-for-partial-content).

## The Parse Jobs API Supports Additional Storage Providers for ZDR

When calling the [ADE Parse Jobs](https://docs.landing.ai/api-reference/tools/ade-parse-jobs) API with [zero data retention](./zdr) (ZDR) enabled, you must include the `output_save_url` parameter. This parameter specifies the URL where parsed results are saved, ensuring that LandingAI does not store the document content.

We have now tested and confirmed support for Amazon S3, Azure Blob Storage, and Google Cloud Storage. Other storage providers that support PUT or CREATE operations via public or presigned URLs may also work.

For detailed information, go to [Requirements for ZDR](./ade-parse-async#requirements-for-zdr).

## Credit Rounding Updated

Credit usage is now rounded up to the nearest tenth of a credit instead of the nearest whole credit. For example, if a calculation results in 1.67 credits, the cost is now rounded up to 1.7 credits (previously it would have been rounded up to 2 credits).

For more information, go to [Pricing & Billing](./ade-pricing).

## DPT-2 mini Preview

We've released a preview of DPT-2 mini, a lightweight parsing model optimized for simple, digitally native documents. DPT-2 mini consumes fewer credits than other parsing models, making it a cost-effective option for straightforward document processing.

DPT-2 mini is in Preview. This model is still in development and may not return accurate results. Do not use this model in production environments.

### Credit Consumption

DPT-2 mini consumes fewer credits than other parsing models. If ZDR is enabled, credit consumption increases. For pricing details, go to [Pricing & Billing](./ade-pricing#dpt-2-mini).

## DPT-2 Is Now Generally Available

DPT-2, the latest series of [parsing models](./ade-parse-models) for Agentic Document Extraction, is now generally available (GA). As part of going GA, we're releasing this new snapshot: `dpt-2-20251103`.
This updated version offers improvements to table parsing, figure captioning, and chunk detection. For more information about parsing models, go to [Document Pre-Trained Transformers (Parsing Models)](./ade-parse-models).

### How This Affects Your API Calls

The new snapshot `dpt-2-20251103` is now the default model. If you call the [ADE Parse](https://docs.landing.ai/api-reference/tools/ade-parse) or [ADE Parse Jobs](https://docs.landing.ai/api-reference/tools/ade-parse-jobs) API without specifying a `model` parameter (or if you use `dpt-2-latest`), your API calls will automatically use this latest snapshot.

Your parsing results may change with this update due to improvements in table parsing, figure captioning, and chunk detection.

### Choose Your Approach

You can choose between two approaches when setting the `model` parameter:

**Get automatic improvements**:

* Omit the `model` parameter, or set it to `dpt-2-latest`
* Your API calls will automatically use the latest snapshot
* You'll receive parsing improvements as new snapshots are released
* Parsing results may change when new versions are released

```shell theme={null}
curl -X POST 'https://api.va.landing.ai/v1/ade/parse' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -F 'document=@document.pdf' \
  -F 'model=dpt-2-latest'
```

**Maintain consistent results**:

* Set the `model` parameter to a specific snapshot (like `dpt-2-20251103`)
* Your parsing results will remain consistent over time
* You won't automatically receive improvements from new snapshots

```shell theme={null}
curl -X POST 'https://api.va.landing.ai/v1/ade/parse' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -F 'document=@document.pdf' \
  -F 'model=dpt-2-20251103'
```

For more information about model versioning and when to use each approach, go to [Model Versions and Snapshots](./ade-parse-models#model-versions-and-snapshots).
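The credit rounding described above under "Credit Rounding Updated" (round up to the nearest tenth of a credit) is a one-line ceiling operation. A sketch of the arithmetic, using `Decimal` to avoid floating-point artifacts when the value is already an exact tenth; the helper name is illustrative, not part of any official library:

```python
from decimal import Decimal, ROUND_CEILING


def round_up_tenth(credits):
    """Round a credit cost up to the nearest tenth (e.g., 1.67 -> 1.7)."""
    # str() first so Decimal sees "1.67", not the binary-float approximation
    return Decimal(str(credits)).quantize(Decimal("0.1"), rounding=ROUND_CEILING)
```

For example, 1.67 credits rounds up to 1.7, while an exact 1.6 stays 1.6, matching the behavior described in the release note.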
## Spreadsheet Support

ADE can now parse the following file types:

* CSV (comma-separated values)
* XLSX (Microsoft Excel)

For more information, go to [Supported File Types](./ade-file-types).

## Credits Are Now Rounded Up

Credit usage for the Extract API is now rounded up to the nearest whole credit. For more information, go to [Pricing & Billing](./ade-pricing#credit-costs-for-the-extract-api).

## Support for Text Documents & Presentations

ADE can now parse the following file types:

* DOC (Word)
* DOCX (Word)
* PPT (PowerPoint)
* PPTX (PowerPoint)
* ODT (OpenDocument Text)

For more information, go to [Supported File Types](./ade-file-types).

## Document Length Limit Increased to 100 Pages

ADE now supports documents with up to 100 pages, both in the [Playground](https://va.landing.ai/) and via the API.

Need to parse longer documents? Use the [ADE Parse Jobs](https://docs.landing.ai/api-reference/tools/ade-parse-jobs) API to parse documents that are up to 1,000 pages or 1 GB.

For more information, go to [Rate Limits](./ade-rate-limits).

## Subscriptions Now Available for EU Users

The EU-hosted version of Agentic Document Extraction now offers monthly subscription plans. To see the plans and upgrade, go to the [EU Plans](https://va.eu-west-1.landing.ai/plan) page.

The credit-based monthly subscription plans are designed to deliver more value and features to your team. All EU users start on our pay-as-you-go plan, which comes with free credits to help you get started! Once you’re ready for production, upgrade to a monthly subscription plan to get access to these features:

* More credits per dollar
* One-click Zero Data Retention (ZDR)
* Organization management
* Role-based access control (RBAC)
* API key management

## New APIs for Parsing Large Documents

We have released new APIs that allow you to create parsing jobs. These APIs let you process large documents without blocking other operations, improving performance and user experience.
To learn more about this workflow, go to [Parse Large Files (Parse Jobs)](./ade-parse-async).

### API Reference

To learn more, go to the reference pages for the new APIs:

* [ADE Parse Jobs](https://docs.landing.ai/api-reference/tools/ade-parse-jobs)
* [ADE Get Parse Jobs](https://docs.landing.ai/api-reference/tools/ade-get-parse-jobs)
* [ADE List Parse Jobs](https://docs.landing.ai/api-reference/tools/ade-list-parse-jobs)

## Document Pre-Trained Transformers: You Can Now Pick a Parsing Model

In this release, we're previewing a concept called Document Pre-Trained Transformers (DPT). A DPT is the model that powers the parsing capabilities of the ADE Parsing APIs. The DPT identifies document layouts and chunks, then generates descriptive explanations (captions) for those chunks.

The API initially launched with a single DPT model called DPT-1. Because there was only one DPT, it was not surfaced to users. We are now introducing DPT-2, which offers:

* Improved performance for complex tables
* Support for new chunk types (including barcodes and ID cards)
* More precise captioning for figures

With multiple DPT models now available, you can select a DPT in both the Playground and when calling the API directly. For more information about models and how to use them, go to [Document Pre-Trained Transformers (Parsing Models)](./ade-parse-models).

## ADE Parse and ADE Extract Are Now Generally Available

The ADE Parse and ADE Extract APIs are now Generally Available (GA). We recommend using these endpoints moving forward.

## New Python Library

We've launched a new Python library for the ADE APIs: the `landingai-ade` library.

Key benefits:

* Support for the ADE Parse and ADE Extract APIs.
* Support for setting the parsing model (DPT).
* The library is automatically generated from our API specification, ensuring you have access to the latest endpoints and parameters.
* The library is lightweight, which makes it suitable for resource-constrained environments like AWS Lambda functions.
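The parse-jobs workflow introduced above (create a job, then retrieve its status) is a submit-then-poll pattern. A generic sketch with the status check injected as a callable so it works with any job client; the status values `completed` and `failed` mirror those used elsewhere in this changelog, while the overall job-response shape is an assumption:

```python
import time


def wait_for_job(get_status, poll_interval=2.0, timeout=600.0, sleep=time.sleep):
    """Poll get_status() until the job finishes or the timeout elapses.

    get_status: callable returning a dict with at least a 'status' key.
    Returns the final status dict; raises TimeoutError if time runs out.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = get_status()
        if job["status"] in ("completed", "failed"):
            return job
        sleep(poll_interval)  # injectable, so tests can skip real waiting
    raise TimeoutError("parse job did not finish in time")
```

In a real integration, `get_status` would wrap a call to the ADE Get Parse Jobs endpoint and the caller would then fetch the parsed results once the job reports `completed`.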
## agentic-doc Library Transitioned to Legacy Status

The [agentic-doc](https://github.com/landing-ai/agentic-doc) Python library has been transitioned to legacy status. Migrate to the new [landingai-ade](./ade-python) library, which is now the recommended Python library for Agentic Document Extraction.

## The tools/agentic-document-analysis Endpoint Is Now Legacy

This endpoint has been transitioned to legacy status: `https://api.va.landing.ai/v1/tools/agentic-document-analysis`. Migrate to the new ADE Parse and ADE Extract APIs.

## Separate APIs for Parsing & Extraction

In the original launch of Agentic Document Extraction, the field extraction function was part of the parsing function: every time you wanted to run extraction, you had to run parsing, even if you had already parsed the document. We are now introducing a Preview of two new endpoints that separate these functions: ADE Parse and ADE Extract. These APIs allow you to decouple parsing and extraction workflows for greater flexibility. You can now parse a document once with the ADE Parse API, and then use the ADE Extract API to run field extraction on that output multiple times. This is helpful if you want to experiment with different extraction schemas or have multiple extraction tasks. For detailed information about how to use these new APIs, go to [Separate APIs: Parse & Extract](./ade-separate-apis).

## Monthly Subscriptions

We're excited to announce a major update to [Agentic Document Extraction](https://va.landing.ai/demo/doc-extraction)! We've just launched credit-based monthly subscription plans designed to deliver more value and features to your team. Learn more about available plans in [Pricing](./ade-pricing). All users start on our pay-as-you-go plan, which comes with free credits to help you get started!
Once you're ready for production, upgrade to a monthly subscription plan to get access to these new features:

* More credits per dollar
* One-click Zero Data Retention (ZDR)
* Organization management
* Role-based access control (RBAC)
* API key management

## One-Click Zero Data Retention (ZDR)

Users on subscription and Enterprise plans can turn on zero data retention (ZDR) directly in the user interface! This ensures that your documents are processed in-memory and are never stored at rest on LandingAI systems or by our sub-processors. To learn more, go to [Zero Data Retention (ZDR) Option Overview](./zdr).

## Organization Management

Upgrading to a subscription or Enterprise plan automatically creates an organization. An organization contains all of the credits, members, API keys, and settings for the plan. To learn more, go to [Organizations & Members](./ade-members).

## Member Management: Role-Based Access Control (RBAC)

Users on subscription and Enterprise plans can invite multiple users to their organization. These plans offer granular member controls, including the ability to:

* invite members
* assign roles to members that determine what functions they can perform
* change member roles
* revoke invitations
* remove members

To learn more, go to [Organizations & Members](./ade-members).

## API Key Management

Users on subscription and Enterprise plans can create multiple API keys for their organization. These plans offer granular API key controls, including the ability to:

* create API keys
* revoke API keys

To learn more, go to [API Key](./agentic-api-key).

## Agentic Document Extraction Now Available in Europe

Agentic Document Extraction is now available in Europe. To learn more, go to [European Union (EU)](./ade-eu).
Agentic Document Extraction in the EU provides:

* **Data residency**: All data is stored and processed within the EU
* **GDPR compliance**: Coming soon; learn more on our [Security and Data](https://landing.ai/security-at-landingai) page
* **Regional performance**: Reduced latency for European users

## Improved Accuracy

Agentic Document Extraction now delivers higher accuracy when extracting data from complex tables and multi-column layouts.

## Increased Processing Speed

Agentic Document Extraction is now significantly faster, so you can process thousands of pages per minute.

## Process Longer Pages

We've increased our page limits so that you can process longer documents. For more information, go to [Rate Limits](./ade-rate-limits).

## Zero Data Retention

Users on the Custom plan can enable a zero data retention policy, ensuring all data is deleted immediately after processing, supporting strict privacy and compliance requirements. For more information, [contact us](http://landing.ai/contact-va).

## Consolidated Chunk Types

We consolidated these chunk types into `page_header`:

* `page_header`
* `page_footer`
* `page_number`

We consolidated these chunk types into `form`:

* `form`
* `key_value`

For more information, go to [Chunk Types](./ade-chunk-types).

# Chunk Types

Source: https://docs.landing.ai/ade/ade-chunk-types

## Chunk Definition

A **chunk** is a discrete element extracted from a document, such as a block of text, a table, or a figure.

## Chunk Overview

When you send a document to the API, it analyzes the content on each page, breaks it down into meaningful elements, and returns each one as a chunk. Each chunk includes structured data that describes the content of the chunk and its location in the document. This structure makes it easier to understand the extracted data and use it for downstream tasks. Extracted chunks are included in the API response.
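Downstream code typically iterates over the returned chunks and handles each kind of content separately. Here is a minimal sketch of grouping chunks by type; it assumes each chunk is a dict with `type` and `markdown` fields, which are illustrative assumptions rather than the exact response shape:

```python
from collections import defaultdict

def group_chunks_by_type(chunks):
    """Group chunk content by chunk type so each kind can be routed separately.

    Assumes each chunk is a dict with `type` and `markdown` keys
    (field names are illustrative, not the guaranteed API shape).
    """
    grouped = defaultdict(list)
    for chunk in chunks:
        grouped[chunk["type"]].append(chunk["markdown"])
    return dict(grouped)

# A hypothetical parsed response, reduced to the two assumed fields:
chunks = [
    {"type": "text", "markdown": "Quarterly report"},
    {"type": "table", "markdown": "| Item | Cost |\n| --- | --- |\n| Widget | $5 |"},
    {"type": "text", "markdown": "Notes"},
]
grouped = group_chunks_by_type(chunks)
```

The same pattern works for routing tables to a tabular pipeline while sending text chunks to an embedding index.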
## Semantic Chunking

The API uses semantic chunking, which means it intelligently groups content based on meaning rather than just layout or formatting. Instead of splitting documents at arbitrary points like fixed lengths or paragraph breaks, the API identifies coherent units of information (like complete ideas, logical sections, or related data) and extracts them as individual chunks. Semantic chunking improves the relevance and usability of the extracted content, especially in downstream tasks like search, retrieval, and analysis.

## Why Do We Create Chunks?

Chunking makes downstream tasks faster, more accurate, and easier to scale. It serves several key purposes:

* **Enables downstream apps to process large documents efficiently**: Chunking allows applications like RAG systems and LLMs to index and retrieve smaller, meaningful segments instead of full documents. This helps avoid input size constraints, such as token limits.
* **Improves retrieval granularity**: Smaller, semantically meaningful units allow for more accurate and relevant results in downstream tasks like question answering and summarization.
* **Supports downstream semantic search and embeddings**: Well-structured chunks provide better inputs for embedding and make it easier to index and retrieve information during search.
* **Maintains human readability**: Chunking reflects how a human would naturally read the document, maintaining the visual and logical relationships between elements on the page.

## Chunk Types

Each chunk is labeled with a chunk type (`chunk_type` or `type`, depending on the API used), which identifies what kind of content it represents.
The chunk types returned by ADE Parse are:

* [`text`](#text)
* [`table`](#table)
* [`marginalia`](#marginalia)
* [`figure`](#figure)
* [`logo`](#logo): This is only available when using dpt-2
* [`card`](#card): This is only available when using dpt-2
* [`attestation`](#attestation): This is only available when using dpt-2
* [`scan_code`](#scan_code): This is only available when using dpt-2

## Text

A `text` chunk type is an element that consists entirely of characters (letters and numbers), such as:

* paragraphs
* titles and headings
* lists
* form fields
* checkboxes
* radio buttons
* equations
* code blocks
* handwritten text

### Output for Key-Value Pairs

If the `text` content has key-value pairs, like form fields, the extracted data is returned as key-value pairs separated by line breaks (`\n`).

### Example: Paragraph

Here is an example of the API marking a paragraph as a `text` chunk:

Chunk Type: Text

Here is the rendered Markdown for that chunk:

Chunk Type: Text

### Example: Lists

Here is an example of the API marking a list as a `text` chunk:

Chunk Type: Text

Here is the rendered Markdown for that chunk:

Chunk Type: Text

## Table

A `table` chunk type is a grid of rows and columns containing data. ADE Parse doesn't require gridlines to be present, and typically interprets well-aligned sets of data as part of a table. For example, part of a receipt can be extracted as a table if the purchased items align with the costs. When you parse spreadsheets, sets of data are also interpreted as `table` chunks.
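Because table content comes back as Markdown, downstream code often needs the individual cells. Here is a minimal sketch of converting a pipe-delimited Markdown table into rows of cells; the helper is a convenience for illustration, not part of the ADE API, and it assumes a well-formed table:

```python
def markdown_table_to_rows(md):
    """Convert a pipe-delimited Markdown table into a list of cell rows,
    skipping the header separator row (e.g. `| --- | --- |`)."""
    rows = []
    for line in md.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # A separator row contains only dashes, colons, and spaces.
        if all(set(c) <= set("-: ") for c in cells):
            continue
        rows.append(cells)
    return rows

# A hypothetical receipt table chunk:
table_md = (
    "| Item | Qty | Price |\n"
    "| --- | --- | --- |\n"
    "| Coffee | 2 | $8.00 |\n"
    "| Bagel | 1 | $3.50 |"
)
rows = markdown_table_to_rows(table_md)
```

The first returned row holds the column headers, so the remaining rows can be zipped against it to build dicts per line item.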
### Example: Receipt

Here is an example of the API marking receipt line items as a `table` chunk:

Chunk Type: Table

Here is the rendered Markdown for that chunk:

Chunk Type: Table

### Example: Earnings Statement

Here is an example of the API marking part of an earnings statement as a `table` chunk:

Chunk Type: Table

Here is the rendered Markdown for that chunk:

Chunk Type: Table

### Example: Spreadsheet

Here is an example of the API marking data in a spreadsheet as a `table` chunk:

Chunk Type: Table

Here is the rendered Markdown for that chunk:

Chunk Type: Table

## Marginalia

A `marginalia` chunk type is a set of text in the top, bottom, or side margins of a document, including:

* page headers
* page footers
* page numbers
* handwritten notes in margins
* line numbers on one side of a page

### Example: Header and Page Number

Here is an example of the API marking a header and page number as a `marginalia` chunk:

Chunk Type: Marginalia

Here is the rendered Markdown for that chunk:

Chunk Type: Marginalia

## Figure

A `figure` chunk type is an element that contains visual or graphical non-text content, including:

* pictures
* graphs (bar graphs, line graphs, etc.)
* flowcharts
* diagrams

### Example: Medical Imaging

Here is an example of the API marking a pathology image as a `figure` chunk:

Chunk Type: Figure

Here is the rendered Markdown for that chunk:

Chunk Type: Figure

### Example: Bar Chart

Here is an example of the API marking a bar chart as a `figure` chunk:

Chunk Type: Figure

Here is the rendered Markdown for that chunk:

Chunk Type: Figure

## Logo

A `logo` chunk type identifies logos. The `logo` chunk type is only available when using dpt-2.

### Example: Logo in Header

Here is an example of the API marking a logo in a document header as a `logo` chunk:

Chunk Type: Logo

Here is the rendered Markdown for that chunk:

Chunk Type: Logo

## Card

A `card` chunk type identifies:

* ID cards
* driver licenses

The `card` chunk type is only available when using dpt-2.
### Example: Driver's License

Here is an example of the API marking a driver's license as a `card` chunk:

Chunk Type: Card

Here is the rendered Markdown for that chunk:

Chunk Type: Card

## Attestation

An `attestation` chunk type includes:

* signatures
* stamps
* seals

The `attestation` chunk type is only available when using dpt-2.

### Example: Signature

Here is an example of the API marking a signature as an `attestation` chunk:

Chunk Type: Attestation

Here is the rendered Markdown for that chunk:

Chunk Type: Attestation

## Scan\_code

A `scan_code` chunk type identifies:

* QR codes
* bar codes

The `scan_code` chunk type is only available when using dpt-2.

### Example: Bar Codes

Here is an example of the API marking two barcodes as `scan_code` chunks:

Chunk Type: Scan Code

Here is the rendered Markdown for these chunks:

Chunk Type: Scan Code

# Classify

Source: https://docs.landing.ai/ade/ade-classify

Use the [ADE Classify API](https://docs.landing.ai/api-reference/tools/ade-classify) to classify each page in a document by type. You provide a document and a list of classes, and the API assigns a class to each page. Use those classes to decide how to handle each page downstream: for example, which pages to parse, how to split a batch, or which extraction schema to apply.

ADE Classify is in Preview. This feature is still in development and may not return accurate results. Do not use this feature in production environments.

## Example Use Cases

* **Financial Services**: Financial institutions receiving batches of mixed documents can classify pages to identify bank statements, utility bills, and identification documents before routing them to the appropriate processing pipeline.
* **Healthcare**: Healthcare systems ingesting patient records can classify pages to identify intake forms, pathology reports, and medication lists before parsing or extraction.
* **Legal**: Legal teams processing incoming documents can classify pages by section type (for example, cover page, terms, signature page) before routing each page to the appropriate review workflow.
* **Insurance**: Insurance companies receiving claim submissions can classify pages to identify claim forms, invoices, and medical records before extraction.

For information about pricing and credits, go to [Pricing & Billing](./ade-pricing).

## Process Overview

Follow these steps to classify pages in a document:

1. **Define your class list** by creating the classes that describe the page types in your document. Learn more about [Classes](#classes).
2. **Classify your document** using the [ADE Classify API](https://docs.landing.ai/api-reference/tools/ade-classify). Pass your document and class list. All pages are classified concurrently.
3. **Use the classification results** in your downstream workflows. Route pages to [Parse](./ade-separate-apis), [Split](./ade-split), [Extract](./ade-extract), or other systems based on the class assigned to each page. Learn more about the [response structure](./ade-classify-response).

## Classes

A class is an object with a required `class` name and an optional `description`. Include a description when the class name alone may be ambiguous (for example, `"spec"` vs. `"manual"`). You can omit the description when the name is self-explanatory, and you can mix classes with and without descriptions in the same request.

## Classify in the Playground

Use the [Playground](https://va.landing.ai) to build and test classification classes before incorporating them into your code. Upload a document, define your classes, and validate the results. When the pages are classified as expected, use the downloaded classes with the [API](#classify-with-the-api), [Python library](./ade-python#classify-getting-started), or [TypeScript library](./ade-typescript#classify-getting-started) to classify documents at scale.

1. Go to the [Playground](https://va.landing.ai).
2. Select the file you want to classify.
3. Click **Classify**.
4. Click **Define Classes**.
5. List or describe the classes you want to apply to the pages, then click **Suggest Classes**. (To create the class schema yourself, click **Start from Scratch**.)
6. The app generates classes and class descriptions based on your input. Review the generated classes and descriptions before continuing.
7. Click **Classify** to validate the classes on the selected file. (Use the drop-down menu to choose whether to classify the **Current File**, **All Files**, or a **Custom** selection.)
8. Review the results. If needed, continue editing the classes and their descriptions.
9. When the pages are classified as expected, [get a ready-to-use script for classification](#get-a-ready-to-use-classify-script) or [download the class schema](#export-the-class-schema-to-json) to use in your own code.

### Get a Ready-to-Use Classify Script

After you create a class schema, the Playground generates a script that classifies files based on that schema. The Playground provides two versions: one for calling the API directly, and one for the [landingai-ade library](./ade-python).

**To get the script:**

1. Go to the [Playground](https://va.landing.ai/).
2. Open a project.
3. Click the **Classify** tab.
4. Click **Get Code**.

   Get the Code

5. The **View Code** pop-up opens. Click the **Library** or **API** tab to see the code for each method.
6. Click the **Download** or **Copy** buttons to get the code.

   View the Code

### Export the Class Schema to JSON

After you create a class schema, you can export it as a JSON file to pass to the `classes` parameter in the [API](#classify-with-the-api), [Python library](./ade-python#classify-getting-started), or [TypeScript library](./ade-typescript#classify-getting-started).

**To export the schema:**

1. Go to the [Playground](https://va.landing.ai/).
2. Open a project.
3. Click the **Classify** tab.
4. In the **Schema** panel, click **...** and select **Download Classes**.
The classes and descriptions are downloaded as a JSON file.

## Classify with the API

Classify the pages in a document by calling the `/v1/ade/classify` endpoint. This example classifies a document that may contain invoices, bank statements, and earnings statements. The `invoice` and `bank_statement` classes include descriptions; `earnings statement` does not.

```shell theme={null}
curl -X POST 'https://api.va.landing.ai/v1/ade/classify' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -F 'classes=[{"class":"invoice","description":"A commercial bill with line items, totals, and payment terms"},{"class":"bank_statement","description":"A monthly summary of account transactions"},{"class":"earnings statement"}]' \
  -F 'document=@document.pdf'
```

### Parameters

Get the full parameters from the [API reference](https://docs.landing.ai/api-reference/tools/ade-classify).

| Parameter | Required | Description |
| --- | --- | --- |
| `document` | Required (choose one) | The document file to classify. |
| `document_url` | Required (choose one) | A URL pointing to the document to classify. |
| `classes` | Required | A JSON array of class objects. Each object requires a `class` name and accepts an optional `description`. Pass it as a JSON string in form data. |
| `model` | Optional | The classification model version to use (for example, `classify-20260420`). Defaults to the latest version. |

## Use Classify with Our Libraries

Click one of the tiles below to learn how to use ADE Classify with our libraries.

Use ADE Classify with our Python library.

Use ADE Classify with our TypeScript library.

## Supported File Types

The ADE Classify API supports all file types that Parse supports, except spreadsheets (CSV, XLSX), up to 200 MB. For the full list, see [Supported File Types](./ade-file-types).
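The `classes` form value shown in the curl example above can also be built programmatically before sending the request from any HTTP client. Here is a minimal Python sketch; the `build_classes` helper is hypothetical (not part of any ADE library), and the request itself is shown commented out because it requires a real API key and document:

```python
import json

def build_classes(*specs):
    """Build the `classes` form value: a JSON array of objects, each with a
    required `class` name and an optional `description`.

    `build_classes` is a hypothetical convenience helper for illustration.
    Pass a (name, description) tuple for a described class, or a bare
    string when the name is self-explanatory.
    """
    classes = []
    for spec in specs:
        if isinstance(spec, tuple):
            name, description = spec
            classes.append({"class": name, "description": description})
        else:
            classes.append({"class": spec})
    return json.dumps(classes)

payload = build_classes(
    ("invoice", "A commercial bill with line items, totals, and payment terms"),
    ("bank_statement", "A monthly summary of account transactions"),
    "earnings statement",  # no description: the name is self-explanatory
)

# Sending the request (not run here; needs a real key and file):
# import requests
# resp = requests.post(
#     "https://api.va.landing.ai/v1/ade/classify",
#     headers={"Authorization": "Bearer YOUR_API_KEY"},
#     data={"classes": payload},
#     files={"document": open("document.pdf", "rb")},
# )
```

Building the value with `json.dumps` avoids hand-escaping quotes inside the form field, which is the most common mistake when porting the curl example.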
## Share Your Feedback

ADE Classify is in public preview, and we are actively looking for feedback to improve it. To share your experience, [schedule a feedback session with us](https://landing-ai.zoom.us/zbook/landingai-fatimasalehbhai/landingai-ade-feedback). Come prepared to discuss:

* What is working well
* Any challenges you've encountered and how the feature could improve
* The email address for your account, found in your [profile page](https://va.landing.ai/settings/personal/profile)
* The documents you used
* The code you used

# JSON Response for Classification

Source: https://docs.landing.ai/ade/ade-classify-response

When you classify a document with the [ADE Classify API](https://docs.landing.ai/api-reference/tools/ade-classify), the results are returned in a structured JSON format.

## Example Response

The following example shows a classification response for a nine-page document containing bank statements and pay stubs:

```json [expandable] theme={null}
{
  "classification": [
    { "class": "bank statement", "reason": "Checking account statement for Sarah J. Mitchell from Builder's Bank for the period of September 1 to September 30, 2025.", "suggested_class": null, "page": 0 },
    { "class": "bank statement", "reason": "Continuation of the transaction history and account summary for the September 2025 bank statement.", "suggested_class": null, "page": 1 },
    { "class": "bank statement", "reason": "Checking account statement for Sarah J. Mitchell from Builder's Bank for the period of October 1 to October 31, 2025.", "suggested_class": null, "page": 2 },
    { "class": "bank statement", "reason": "Continuation of the transaction history and account summary for the October 2025 bank statement.", "suggested_class": null, "page": 3 },
    { "class": "bank statement", "reason": "Informational page containing bank policies, fraud protection details, and contact information.", "suggested_class": null, "page": 4 },
    { "class": "pay stub", "reason": "Earnings statement from Acme Corporation for Sarah J. Mitchell for the pay period ending August 31, 2025.", "suggested_class": null, "page": 5 },
    { "class": "pay stub", "reason": "Summary of gross pay, deductions, and net pay for the Acme Corporation earnings statement.", "suggested_class": null, "page": 6 },
    { "class": "pay stub", "reason": "Earnings statement from Acme Corporation for Sarah J. Mitchell for the pay period ending September 15, 2025.", "suggested_class": null, "page": 7 },
    { "class": "pay stub", "reason": "Summary of gross pay, deductions, and net pay for the second Acme Corporation earnings statement.", "suggested_class": null, "page": 8 }
  ],
  "metadata": {
    "filename": "SarahMitchell-loan-docs.pdf",
    "org_id": "f6t71a157bx9",
    "page_count": 9,
    "duration_ms": 10865,
    "credit_usage": 4.5,
    "job_id": "c19ab8c311db41b391c50a56a4cdea5c",
    "version": "classify-20260420"
  }
}
```

## Response Structure

The response contains the following top-level fields:

| Field | Type | Description |
| --- | --- | --- |
| [`classification`](#per-page-results-classification) | array | Per-page classification results. |
| [`metadata`](#processing-metadata-metadata) | object | Processing information, including credit usage, duration, and page count. |

## Per-Page Results (`classification`)

The `classification` field contains an array of results, one per page in the document. Each object includes:

| Field | Type | Description |
| --- | --- | --- |
| `page` | number | The page number (0-indexed). |
| `class` | string | The class label assigned to this page (for example, `"Invoice"`). When a page cannot be confidently classified, this value is `unknown`. |
| `reason` | string | An explanation of why this class was assigned. Use this to understand the classification decision and refine your class list if needed. |
| `suggested_class` | string \| null | The closest matching class when `class` is `unknown`; otherwise `null`. |

### Unknown Pages

When a page cannot be confidently classified into any of the classes you defined, the API returns `unknown` in the `class` field and provides a value in `suggested_class`. Use this to review borderline pages or adjust your class list.

## Processing Metadata (`metadata`)

The `metadata` field provides information about the classification process:

| Field | Type | Description |
| --- | --- | --- |
| `filename` | string | The name of the input document. |
| `org_id` | string | Organization identifier. |
| `page_count` | number | Total number of pages in the document. |
| `duration_ms` | number | Processing time in milliseconds. |
| `credit_usage` | number | Number of credits consumed. |
| `job_id` | string | Unique identifier for the classification job. |
| `version` | string | The model version used for classification. |

# Troubleshoot Classification

Source: https://docs.landing.ai/ade/ade-classify-troubleshoot

Use this section to troubleshoot issues encountered when calling the ADE Classify API.

## Common Status Codes

| Status Code | Name | Description | What to Do |
| --- | --- | --- | --- |
| 200 | Success | Classification completed successfully. | Continue with normal operations. |
| 400 | Bad Request | Invalid request parameters. | Review the error message. See [Status 400](#status-400-bad-request). |
| 401 | Unauthorized | Missing or invalid API key. | Check that your API key header is present and contains a valid [API key](./agentic-api-key). |
| 402 | Payment Required | Your account does not have enough credits to complete processing. | If you have multiple accounts, make sure you're using the correct [API key](./agentic-api-key). Add more credits to your account. |
| 413 | Content Too Large | The document exceeds the 200 MB size limit. | Reduce the file size and retry. See [Status 413](#status-413-content-too-large). |
| 415 | Unsupported Media Type | The document could not be read or converted. | Convert the document to PDF and retry. See [Status 415](#status-415-unsupported-media-type). |
| 422 | Unprocessable Entity | Input validation failed. | Review your request parameters. See [Status 422](#status-422-unprocessable-entity). |
| 429 | Too Many Requests | Rate limit exceeded. | Wait before retrying. Reduce request frequency and implement exponential backoff. |
| 500 | Internal Server Error | Server error during classification. | Retry. If the issue persists, contact [support@landing.ai](mailto:support@landing.ai). See [Status 500](#status-500-internal-server-error). |
| 502 | Bad Gateway | The classification service returned an error. | Retry. If the issue persists, contact [support@landing.ai](mailto:support@landing.ai). See [Status 502](#status-502-bad-gateway). |
| 504 | Gateway Timeout | Request processing exceeded the timeout limit. | Reduce document size or number of pages. See [Status 504](#status-504-gateway-timeout). |

## Status 400: Bad Request

This status code indicates invalid request parameters or client-side errors. Review the specific error message to identify the issue.

### Error: Invalid document URL

This error occurs when the URL provided in `document_url` cannot be fetched. The URL may be invalid, unreachable, or return a non-200 response.

**What to do:**

* Verify the URL is correct and publicly accessible.
* Confirm the server hosting the document does not require authentication or block external requests.
* If the document is behind a firewall or requires credentials, download it locally and upload it using the `document` parameter instead.

## Status 413: Content Too Large

This error occurs when the document you submit exceeds the 200 MB file size limit.

**What to do:**

* Reduce the file size before uploading.
* If the document is a PDF, consider compressing it or splitting it into smaller files before classifying.

## Status 415: Unsupported Media Type

This error occurs when a document in a supported office format (such as DOCX or PPTX) could not be converted to PDF for processing. This is a server-side conversion failure, not a format restriction.

**What to do:**

* Convert the document to PDF before uploading and retry.
* If the error persists, contact [support@landing.ai](mailto:support@landing.ai).

## Status 422: Unprocessable Entity

This status code indicates input validation failures. Review the error message and adjust your request parameters.

### Error: Unsupported file format

This error occurs when you submit a file type that the API does not support.

**What to do:**

* Convert your document to a supported format before calling the API.
* For supported file formats, see [Supported File Types](./ade-classify#supported-file-types).

### Error: Both document fields provided

This error occurs when you include both `document` and `document_url` in the same request.

**What to do:**

* Include only one document input per request: either upload a file using `document` or provide a URL using `document_url`.

## Status 500: Internal Server Error

This error indicates an unexpected server error occurred during classification.

**What to do:**

* Retry the request.
* If the error persists, contact [support@landing.ai](mailto:support@landing.ai).

## Status 502: Bad Gateway

This error occurs when the internal classification service fails to process the document, typically due to a transient error in the LLM service.

**What to do:**

* Retry the request.
* If the error persists, contact [support@landing.ai](mailto:support@landing.ai).

## Status 504: Gateway Timeout

This error occurs when the classification process exceeds the timeout limit.

**What to do:**

* Reduce the size of your document or the number of pages being classified.
* If the error persists, contact [support@landing.ai](mailto:support@landing.ai).

## When Are Credits Consumed?

Credits are consumed only when the API returns a 200 status code. All other responses, including errors, do not consume credits.

# Code Examples and Resources

Source: https://docs.landing.ai/ade/ade-code-examples-resources

Use the resources below to access practical, ready-to-use code samples for common document processing use cases. Whether you need end-to-end examples or specific code snippets, these resources help you implement quickly.

Explore end-to-end examples for common document processing workflows.

Browse code samples with examples for the Python and TypeScript libraries.

Some of these examples require the [Python](./ade-python) or [TypeScript](./ade-typescript) client libraries. Before running a script, set your API key and install the library and any required dependencies.

# AI Coding Tools

Source: https://docs.landing.ai/ade/ade-coding-resources

This page provides resources for building Agentic Document Extraction (ADE) applications with AI coding tools like Claude Code, Cursor, and VS Code:

* [ADE skills for AI coding agents](#skills)
* [A documentation MCP server](#mcp-server)
* [Shortcuts for connecting AI tools to docs pages](#shortcuts-for-connecting-ai-tools)
* [Documentation pages in llms.txt format](#access-documentation-pages-as-llms-txt)
* [Documentation pages as Markdown](#access-documentation-as-markdown)

## Skills

Skills are instruction files that teach AI coding agents how to use ADE effectively. LandingAI offers ADE skills in the [ADE Document Processing Skills](https://github.com/landing-ai/ade-document-processing-skills) repository.
### Install Skills via the Claude Code Plugin

Claude Code users can install the ADE skills as a plugin, which is an extension that bundles skills, agents, and other capabilities. Run the following commands in Claude Code to install the ADE skills plugin:

```bash theme={null}
/plugin marketplace add landing-ai/ade-document-processing-skills
/plugin install ade-document-processing@ade-document-processing-skills
```

After installation, run the following command to activate the plugin:

```bash theme={null}
/reload-plugins
```

### Install Skills with Other Methods

Different AI coding tools have different methods for installing skills. Check the documentation for your AI coding tool to learn how to install skill files. Depending on the installation method, you may need to download or clone the [ADE Document Processing Skills](https://github.com/landing-ai/ade-document-processing-skills) repository and copy the skill files to the relevant directories.

## MCP Server

The ADE MCP server connects AI tools to the documentation and [ADE skills](#skills). When connected, your agent can search the documentation and access all ADE skills without additional setup.

**MCP server URL:** `https://docs.landing.ai/mcp`

* **Claude Code**: Run the following command:

  ```bash theme={null}
  claude mcp add --transport http ade-docs https://docs.landing.ai/mcp
  ```

* **Claude**: Go to **Settings** > **Connectors** and add the MCP server URL as a custom connector.

* **Cursor**: Use the **Connect to Cursor** option in the [contextual menu](#shortcuts-for-connecting-ai-tools) on any docs page.

* **VS Code**: Use the **Connect to VS Code** option in the [contextual menu](#shortcuts-for-connecting-ai-tools) on any docs page.

## Shortcuts for Connecting AI Tools

Use the contextual menu on any docs page to connect to AI tools, copy content as Markdown, or install the ADE docs MCP server. Click the drop-down icon at the top of any page to open it.
Contextual Menu

| Option | What it does |
| --- | --- |
| Copy page | Copies the page as Markdown |
| View as Markdown | Opens the page in Markdown format |
| Open in ChatGPT | Creates a ChatGPT conversation with the page as context |
| Open in Claude | Creates a Claude conversation with the page as context |
| Connect to Cursor | Installs the ADE docs MCP server in Cursor |
| Connect to VS Code | Installs the ADE docs MCP server in VS Code |
| Copy MCP server URL | Copies `https://docs.landing.ai/mcp` to your clipboard |
| Copy MCP install command | Copies the MCP server install command to your clipboard |

## Access Documentation Pages as llms.txt

Access an index of all documentation pages at [llms.txt](https://docs.landing.ai/llms.txt). For a combined file of all documentation content, use [llms-full.txt](https://docs.landing.ai/llms-full.txt). These files follow the [llms.txt standard](https://llmstxt.org) for AI tool consumption.

## Access Documentation as Markdown

View any page as plain Markdown by adding `.md` to the page URL.

# European Union (EU)

Source: https://docs.landing.ai/ade/ade-eu

Agentic Document Extraction is available in the European Union (EU) at [https://va.eu-west-1.landing.ai/](https://va.eu-west-1.landing.ai/).

Agentic Document Extraction in the EU provides:

* **Data residency**: All data is stored and processed within the EU
* **GDPR compliance**: The EU-hosted version of Agentic Document Extraction is compliant with the General Data Protection Regulation (GDPR)
* **Regional performance**: Reduced latency for European users

## GDPR and Compliance

Refer to the resources below to learn more about GDPR and compliance.

Enterprise plans support SSO via SAML 2.0 and OpenID Connect (OIDC), allowing your organization to manage access through your existing identity provider (IdP).

The Trust Center is your central resource for accessing our security documentation, compliance reports, and real-time system status.
This page outlines our security posture, compliance with industry standards, and the measures we take to safeguard your data across our products and infrastructure. ## EU Pricing For EU subscription plans and pricing, go to [Pricing & Billing](./ade-pricing#eu-pricing). ## Differences When Using the EU Using ADE in the EU works the same as the default US deployment, with only a few key differences outlined in this article: * [Create an account](#create-an-account-in-the-eu): Use the EU URL * [Get your API key](#get-your-api-key-for-the-eu): Use the EU URL * [Call the API directly](#call-the-api-directly-in-the-eu): Use the EU endpoint * [Use the library](#use-the-library-with-the-eu): Set the `environment` parameter to `eu` when initializing the client ### Create an Account in the EU Create an account and access the Playground in the EU here: [https://va.eu-west-1.landing.ai/home](https://va.eu-west-1.landing.ai/home). ### Get Your API Key for the EU To get your API key for the EU, go to [https://va.eu-west-1.landing.ai/settings/api-key](https://va.eu-west-1.landing.ai/settings/api-key). Use this API key when using the library or calling the API directly in the EU. API keys are deployment-specific. An API key created in the US will not work for the EU, and vice versa. ### Call the API Directly in the EU To ensure your API calls are processed in the EU, replace the default base URL with the EU base URL: **US:** `https://api.va.landing.ai`\ **EU:** `https://api.va.eu-west-1.landing.ai` For example: * US: `https://api.va.landing.ai/v1/ade/parse` * EU: `https://api.va.eu-west-1.landing.ai/v1/ade/parse` ### Use the Library with the EU To connect the library to the EU deployment, set the `environment` parameter to `eu` when initializing the client.
```python Python theme={null} from landingai_ade import LandingAIADE client = LandingAIADE( environment="eu", ) ``` ```typescript TypeScript theme={null} import LandingAIADE from "landingai-ade"; const client = new LandingAIADE({ environment: "eu" }); ``` # Extract Data Source: https://docs.landing.ai/ade/ade-extract ## Overview Use the [Extract API](https://docs.landing.ai/api-reference/tools/ade-extract) to pull specific data fields from parsed documents. You define a schema that specifies which fields to extract, and Extract returns their values in a structured, predictable format. Extract is designed for high-volume, repeatable workflows: use it when you need to retrieve the same set of fields from many documents, such as pulling invoice totals, contract dates, or form field values. Results are consistent across documents with varying layouts because extraction is schema-driven. ## Extraction Capabilities The following capabilities are available with model [`extract-20260314`](./ade-extract-models#extract-20260314) or later. * **Unlimited schema size**: No limits on the number of fields, nested levels, or characters in a schema. * **Semantic field matching**: Use the [`x-alternativeNames`](./ade-extract-schema-json#alternative-names) keyword to define alternative labels for a field. The model maps fields by meaning, so fields with different names across documents resolve to the same schema field. * **Consistent formatting**: Use the [`format`](./ade-extract-schema-json#format) keyword to specify how extracted values should be formatted. * **Improved handling of large content**: Better extraction from large tables, large arrays, and long documents. * **Cross-page table reconstruction**: Tables that span page breaks are returned as a single array, with no post-processing needed. For schema-building capabilities including master schemas and schema drift detection, see [Build Extraction Schemas with the API](./ade-extract-schema-api).
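To illustrate how the `x-alternativeNames` and `format` keywords fit into a schema, here is a minimal sketch of a hypothetical invoice schema. The field names and the `"YYYY-MM-DD"` format string are invented for this example; see [Extraction Schema (JSON)](./ade-extract-schema-json) for the exact keyword semantics and supported format values.

```python
import json

# Hypothetical invoice schema; field names and the format string are
# illustrative examples, not values prescribed by the Extract API.
schema = {
    "type": "object",
    "properties": {
        "invoice_total": {
            "type": "number",
            "description": "The final amount due on the invoice",
            # Labels that should resolve to this field across document layouts
            "x-alternativeNames": ["amount_due", "total_due", "balance_due"],
        },
        "invoice_date": {
            "type": "string",
            "description": "The invoice issue date",
            # Assumed format string; check the schema reference for valid values
            "format": "YYYY-MM-DD",
        },
    },
}

# The schema is passed to the API as a JSON string
schema_json = json.dumps(schema)
print(schema_json[:40])
```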
## Run Parse First Extract runs after [Parse](./ade-separate-apis), which is required as the first step in all ADE workflows. It can also follow [Split](./ade-split) if you're working with multi-document files. ## Get Started: Extraction Workflow You can use the schema extraction wizard directly in our [Playground](https://va.landing.ai/) to build and validate an extraction schema. The Playground generates scripts that you can then copy and use in your own code: 1. Use the schema extraction wizard in our [Playground](./ade-extract-playground) to build a schema tailored to your documents. Build a Schema with the Wizard 2. Copy the script for the method you plan on using: the [Python library](./ade-python#extract-getting-started) or the [API](#use-ade-extract-to-extract-fields-from-markdown). Export the Relevant Format 3. Paste the script into your code. ## Use the ADE Extract API to Extract Fields from Markdown Use the API to extract data from the Markdown output created by the [Parse API](./ade-separate-apis). See the full API reference [here](https://docs.landing.ai/api-reference/tools/ade-extract). ### Specify Documents to Run Extraction On The API offers two parameters for specifying the document you want to extract from: * `markdown`: Specify the actual Markdown file you want to run extraction on. * `markdown_url`: Include the URL to the Markdown file you want to run extraction on. ### Set the Extraction Schema Set the extraction schema in the `schema` parameter. The schema must meet specific format and property requirements. For detailed guidance, see [JSON Schema for Extraction](./ade-extract-schema-json). ### Set the `strict` Parameter Use the optional `strict` parameter to control how the API handles schemas that include [keywords that cause errors](./ade-extract-schema-json#keywords-that-cause-errors). * If `strict` is `false`: the API continues processing and returns a `206` (Partial Content).
* If `strict` is `true`: the API stops processing and returns a `422` (Unprocessable Entity). In both cases, the API returns `422` if the schema fails validation, and `206` if the extracted output does not conform to the schema after extraction completes. ### Extracted Output For details about the extraction response structure and fields, see [JSON Response for Extraction](./ade-extract-response). ## Run Extract with Our Libraries Click one of the tiles below to learn how to run the [Extract API](https://docs.landing.ai/api-reference/tools/ade-extract) with our libraries. Run Extract with our Python library. Run Extract with our TypeScript library. ## Use Parse Markdown for Best Results The Extract API is optimized for Markdown generated by the Parse API. The parsed output includes element IDs, anchor tags, chunk tags, and other metadata that Extract uses during the extraction process. Extract can also process generic Markdown files or edited Parse Markdown, but results may be less accurate. For best results: * Use only Markdown output from the Parse API, not generic Markdown files. * Do not edit the Markdown from Parse before passing it to Extract. # Link Extracted Data to Document Locations Source: https://docs.landing.ai/ade/ade-extract-grounding-sample ## Overview When you run the Extract API, the response includes an `extraction_metadata` field with reference IDs that connect each extracted value back to its location in the original document. This tutorial shows you how to use those references. This tutorial uses the [Python library](./ade-python) and the [TypeScript library](./ade-typescript). In this tutorial, we will: * Parse this PDF: Pay Stub * Extract these fields: **Employee Name** and **Gross Pay** * Save a PNG of each extracted field's location * Save bounding box coordinates for each extracted field to a JSON file These examples require the [Python](./ade-python) or [TypeScript](./ade-typescript) client library. Before running a script, set your API key and install the library and any required dependencies.
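As a reference, setup might look like the following. The `landingai-ade` and `mupdf` package names are assumptions inferred from the import statements in the scripts below, so confirm them against the library documentation before installing.

```shell
# Python: client library plus PyMuPDF for the PNG export step (assumed package names)
pip install landingai-ade pymupdf

# TypeScript: client library plus MuPDF bindings (assumed package names)
npm install landingai-ade mupdf

# Both scripts read the API key from this environment variable
export VISION_AGENT_API_KEY="your-api-key"
```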
The Python script has been tested with PDF and PNG files and may work with other file types supported by ADE. The TypeScript script is written specifically for PDF files. If you need to export crops from other file types, use it as a reference and adapt the image export logic. ## 1. Download the Document to Process Download the Pay Stub and save it to a local directory. ## 2. Create the Script Copy the script for your language and save it as `grounding.py` or `grounding.ts` in the same directory as the PDF. ```python Python [expandable] theme={null} import json import pymupdf from pathlib import Path from landingai_ade import LandingAIADE # Initialize client (uses VISION_AGENT_API_KEY environment variable) client = LandingAIADE() # Define the extraction schema schema = json.dumps({ "type": "object", "properties": { "employee_name": { "description": "The employee's full name", "type": "string" }, "gross_pay": { "description": "The gross pay amount", "type": "number" } } }) # Parse the document # save_to is optional, but saves the full parse response, which is useful for # keeping a record and for other downstream processing tasks parse_response = client.parse( document=Path("pay-stub.pdf"), model="dpt-2-latest", save_to="output" ) # Extract data extract_response = client.extract( schema=schema, markdown=parse_response.markdown, model="extract-latest" ) # Save the extraction results with open("output/pay-stub_extract_output.json", "w") as f: json.dump(extract_response.to_dict(), f, indent=2) # Open the PDF for PNG export pdf = pymupdf.open("pay-stub.pdf") # Link each extracted field to its location in the document grounding_results = {} for field_name, field_data in extract_response.extraction_metadata.items(): for chunk_id in field_data["references"]: # Skip table cell IDs not present in grounding if chunk_id not in parse_response.grounding: continue grounding = parse_response.grounding[chunk_id] # Collect extracted value and bounding box coordinates
grounding_results[field_name] = { "value": extract_response.extraction[field_name], "page": grounding.page, "location": { "left": round(grounding.box.left, 3), "top": round(grounding.box.top, 3), "right": round(grounding.box.right, 3), "bottom": round(grounding.box.bottom, 3) } } # Crop the chunk and save as a PNG page_image = pdf[grounding.page].get_pixmap(dpi=150) left = int(grounding.box.left * page_image.width) right = int(grounding.box.right * page_image.width) top = int(grounding.box.top * page_image.height) bottom = int(grounding.box.bottom * page_image.height) crop = page_image.pil_image().crop((left, top, right, bottom)) crop.save(f"output/{field_name}.png") pdf.close() # Save grounding results to a JSON file with open("output/pay-stub_grounding_output.json", "w") as f: json.dump(grounding_results, f, indent=2) ``` ```typescript TypeScript [expandable] theme={null} import * as mupdf from "mupdf"; import LandingAIADE, { toFile } from "landingai-ade"; import fs from "fs"; // Initialize client (uses VISION_AGENT_API_KEY environment variable) const client = new LandingAIADE(); // Define the extraction schema const schema = JSON.stringify({ type: "object", properties: { employee_name: { description: "The employee's full name", type: "string" }, gross_pay: { description: "The gross pay amount", type: "number" } } }); // Parse the document // saveTo is optional, but saves the full parse response, which is useful for // keeping a record and for other downstream processing tasks const parseResponse = await client.parse({ document: fs.createReadStream("pay-stub.pdf"), model: "dpt-2-latest", saveTo: "output" }); // Extract data const extractResponse = await client.extract({ schema: schema, markdown: await toFile(Buffer.from(parseResponse.markdown), "document.md"), model: "extract-latest" }); // Save the extraction results fs.mkdirSync("output", { recursive: true }); fs.writeFileSync( "output/pay-stub_extract_output.json", JSON.stringify(extractResponse, null, 2) ); 
// Open the PDF for PNG export const pdfBuffer = fs.readFileSync("pay-stub.pdf"); const mupdfDoc = mupdf.Document.openDocument(pdfBuffer, "application/pdf").asPDF()!; // Link each extracted field to its location in the document const groundingResults: Record<string, unknown> = {}; for (const [fieldName, fieldData] of Object.entries(extractResponse.extraction_metadata)) { for (const chunkId of fieldData.references) { if (!parseResponse.grounding?.[chunkId]) continue; const grounding = parseResponse.grounding[chunkId]; groundingResults[fieldName] = { value: (extractResponse.extraction as Record<string, unknown>)[fieldName], page: grounding.page, location: { left: parseFloat(grounding.box.left.toFixed(3)), top: parseFloat(grounding.box.top.toFixed(3)), right: parseFloat(grounding.box.right.toFixed(3)), bottom: parseFloat(grounding.box.bottom.toFixed(3)) } }; // Crop the chunk and save as a PNG const scaleFactor = 150 / 72; const page = mupdfDoc.loadPage(grounding.page); const bounds = page.getBounds(); const fullWidth = Math.round((bounds[2] - bounds[0]) * scaleFactor); const fullHeight = Math.round((bounds[3] - bounds[1]) * scaleFactor); const left = Math.round(grounding.box.left * fullWidth); const top = Math.round(grounding.box.top * fullHeight); const right = Math.round(grounding.box.right * fullWidth); const bottom = Math.round(grounding.box.bottom * fullHeight); const cropPixmap = new mupdf.Pixmap(mupdf.ColorSpace.DeviceRGB, [left, top, right, bottom], false); cropPixmap.clear(255); const device = new mupdf.DrawDevice(mupdf.Matrix.scale(scaleFactor, scaleFactor), cropPixmap); page.run(device, mupdf.Matrix.identity); device.close(); fs.writeFileSync(`output/${fieldName}.png`, cropPixmap.asPNG()); } } // Save grounding results to a JSON file fs.writeFileSync( "output/pay-stub_grounding_output.json", JSON.stringify(groundingResults, null, 2) ); ``` ## 3.
Run the Script Run the script from the same directory: ```bash Run Python theme={null} python grounding.py ``` ```bash Run TypeScript theme={null} npx tsx grounding.ts ``` ## 4. View Output The script saves the following files to the `output` folder: | File | Description | | -------------------------------- | ----------------------------------------------------------------- | | `pay-stub_parse_output.json` | Full parse response, including all chunks and grounding data. | | `pay-stub_extract_output.json` | Extraction results, including extracted values and reference IDs. | | `pay-stub_grounding_output.json` | Extracted values and bounding box coordinates for each field. | | `employee_name.png` | Cropped image of the chunk where the employee name was found. | | `gross_pay.png` | Cropped image of the chunk where the gross pay was found. | ### Chunk Coordinates Each entry in `pay-stub_grounding_output.json` includes the page number and bounding box coordinates. Coordinates are normalized values between 0 and 1, relative to the page dimensions: ```json theme={null} { "employee_name": { "value": "JANE HARPER", "page": 0, "location": { "left": 0.08, "top": 0.785, "right": 0.933, "bottom": 0.837 } }, "gross_pay": { "value": 452.43, "page": 0, "location": { "left": 0.306, "top": 0.331, "right": 0.438, "bottom": 0.345 } } } ``` ## Next Steps Now that you have a working script, you can: * Replace `pay-stub.pdf` with any document you want to parse and extract from. * Modify the `schema` dictionary to extract different fields. For guidance, see [Extraction Schema (JSON)](./ade-extract-schema-json). * Use the Playground to build and test a schema before adding it to your code. See [Schema Wizard](./ade-extract-playground). # Extraction Model Versions Source: https://docs.landing.ai/ade/ade-extract-models An extraction model powers the field extraction capabilities of the API. It analyzes your Markdown content and extracts structured data according to your JSON schema. 
You can specify a model when calling the API directly or when using the [client libraries](#set-the-model-with-the-client-libraries). If you don't specify a model, the API uses the latest extraction model (currently `extract-20260314`). ```shell {5} theme={null} curl -X POST 'https://api.va.landing.ai/v1/ade/extract' \ -H 'Authorization: Bearer YOUR_API_KEY' \ -F 'schema=...' \ -F 'markdown=@markdown.md' \ -F 'model=extract-20260314' ``` ## Model Versions The following table lists the available `model` values for the API: | Model Values | Description | | ------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------- | | `extract-20260314` | Use the extraction model snapshot released on March 14, 2026. For more information, go to [extract-20260314](#extract-20260314). This is the default model. | | `extract-latest` | Use the latest extraction model snapshot. | These models have been deprecated and will result in errors: `extract-20250930` and `extract-20251024`. ### Why Model Versioning Matters When integrating the API, you have two options for specifying the model: 1. **Use `extract-latest`** to always get the newest version. This automatically gives you improvements and updates, but extraction results may change when new model versions are released. 2. **Use a specific version** (like `extract-20260314`) to pin to an exact model version. This ensures consistent extraction results over time, but you won't receive improvements. ## extract-20260314 This model version introduces the following capabilities: * **Unlimited schema size**: No limits on the number of fields, nesting levels, or characters in a schema. * **Semantic field matching**: Use the [`x-alternativeNames`](./ade-extract-schema-json#alternative-names) keyword to define alternative labels for a field. 
The model maps fields by meaning, so fields with different names across documents resolve to the same schema field. * **Cross-page table reconstruction**: Tables that span page breaks are returned as a single array, with no post-processing needed. * **Master schemas**: Generate a single schema from multiple documents to handle field and layout variation across document types. Available in the [Playground](./ade-extract-playground) and via the [Build Extract Schema API](./ade-extract-schema-api). * **Schema drift detection**: Update an existing schema when new or changed fields appear in your documents. Available in the [Playground](./ade-extract-playground) and via the [Build Extract Schema API](./ade-extract-schema-api). This version also introduces the `metadata.warnings` field in the API response. For more information, go to [Warnings](./ade-extract-troubleshoot#warnings). Extraction model `extract-20260314` has different JSON schema requirements than the previous model. Learn about all schema requirements in [Extraction Schema (JSON)](./ade-extract-schema-json). ## Set the Model in the API When calling the endpoint, you can set the model using the `model` parameter. If you omit the `model` parameter, the API uses the latest model. This example shows how to specify a model: ```shell {5} theme={null} curl -X POST 'https://api.va.landing.ai/v1/ade/extract' \ -H 'Authorization: Bearer YOUR_API_KEY' \ -F 'schema={"type": "object", "properties": {"field1": {"type": "string"}, "field2": {"type": "string"}}}' \ -F 'markdown=@markdown.md' \ -F 'model=extract-latest' ``` ## Set the Model with the Client Libraries When using the Python or TypeScript library, you can set the model using the `model` parameter in the `extract()` method. If you omit the `model` parameter, the library will use the latest extraction model.
```python {20} Python theme={null} import json from pathlib import Path from landingai_ade import LandingAIADE # Define your extraction schema schema_dict = { "type": "object", "properties": { "field1": {"type": "string"}, "field2": {"type": "string"} } } client = LandingAIADE() schema_json = json.dumps(schema_dict) response = client.extract( schema=schema_json, markdown=Path("/path/to/output.md"), model="extract-latest" ) ``` ```typescript {26} TypeScript theme={null} import LandingAIADE, { toFile } from "landingai-ade"; import fs from "fs"; // Define your extraction schema const schemaDict = { type: "object", properties: { field1: { type: "string" }, field2: { type: "string" } } }; const client = new LandingAIADE(); const schemaJson = JSON.stringify(schemaDict); // Parse the document first const parseResponse = await client.parse({ document: fs.createReadStream("/path/to/document.pdf"), model: "dpt-2-latest" }); // Extract with the specified model const extractResponse = await client.extract({ schema: schemaJson, markdown: await toFile(Buffer.from(parseResponse.markdown), "document.md"), model: "extract-latest" }); ``` ## The Playground Uses the Most Recent Model The [Playground](https://va.landing.ai/) always uses the most recent extraction model. If you pin a specific model version in your code, results may differ slightly from what you see in the Playground. ## The Model Impacts the Schema Requirements Different model versions have different [JSON schema requirements](./ade-extract-schema-json). For details on supported keywords, field types, and structure, see [Extraction Schema (JSON)](./ade-extract-schema-json). # Schema Wizard: Build Extraction Schemas in the Playground Source: https://docs.landing.ai/ade/ade-extract-playground Use the schema wizard in the [Playground](https://va.landing.ai/demo/doc-extraction) to build and test an extraction schema. The schema defines which fields to extract from your documents and is used with the [Extract API](./ade-extract).
You can also create a schema [manually](./ade-extract-schema-json) or with the [Build Extract Schema API](./ade-extract-schema-api). The wizard is a good option if: * You want the Playground to enforce schema rules and formatting requirements automatically, without needing to know them yourself * You want immediate visual feedback on how the schema performs on your files * You prefer a visual, guided experience over writing JSON directly ## Workflow Use our AI-powered tools to generate a schema. Update and edit the schema, and see how it works with your document. Export your schema to use with our library or API. All files and their corresponding schemas are saved to a project. You can access these in **Projects**. ## Create a Schema * [Create a Schema with AI](#create-a-schema-with-ai) * [Upload a Schema](#upload-a-schema) ### Create a Schema with AI **Suggest Schema** calls the [Build Extract Schema API](./ade-extract-schema-api) and consumes credits. See [Build Extract Schema API pricing](./ade-pricing#credit-costs-for-the-build-extract-schema-api). The Extract tool in the Playground generates a schema based on the files in your project. It factors in the data in the files and the file structures. Fields and layouts from each file are incorporated, so the schema covers variation across all the documents in the project. To create an extraction schema: 1. Go to the [Playground](https://va.landing.ai/). 2. Click the **Extract** tile. Start Extract 3. Upload files and click **Create Project**. (Continue to the next step while the files are parsed in the background.) 4. (Optional) Enter natural-language instructions for the extraction in the **Guidelines** text box. This can include: * Data to include * Data to exclude * Context about the files 5. Click **Suggest Schema**. Smart Suggestions 6. The app reviews the files in the project and creates a master schema. Review the generated fields before continuing. Smart Suggestions 7.
Click [**Run This File**](#run-the-schema-on-your-files) to validate the schema on the selected file. Smart Suggestions 8. When the schema is ready, [use the schema](#use-the-schema). ### Upload a Schema If you have an existing extraction schema you want to edit, you can upload it to the Playground to validate it. You can then edit it just like any schema created in the Playground. **Start a new project with your own schema** To start with your own extraction schema: 1. Go to the [Playground](https://va.landing.ai/). 2. Click the **Extract** tile. Start Extract 3. Upload files and click **Create Project**. (Continue to the next step while the files are parsed in the background.) 4. Click **Upload JSON Schema**. Upload JSON Schema 5. Select the JSON file you want to load. 6. The app loads the JSON file into the **Schema** panel. **Upload a schema to an existing project** You can also upload a schema after you've started building one. The uploaded schema is saved as a new version. To upload a schema to an existing project: 1. Go to the [Playground](https://va.landing.ai/). 2. Open a project. 3. Click the **Extract** tab. 4. In the **Schema** panel, click **...** and select **Import Schema**. Import Schema 5. Select the JSON file you want to load. 6. The app loads the JSON file into the **Schema** panel. ## Edit and Validate the Schema After creating a schema in the [Playground](https://va.landing.ai/), you can edit and validate it. You can add fields, update descriptions, remove fields, and validate the full schema. * [Edit, Add, or Remove Fields Manually](#edit-add-or-remove-fields-manually) * [Refine Schema with AI](#refine-schema-with-ai) * [Run the Schema on Your Files](#run-the-schema-on-your-files) * [Start Over](#start-over) ### Edit, Add, or Remove Fields Manually Click a field to see and edit additional information, like the format, description, and alternative names. 
For detailed information about the field options, go to [Extraction Schema (JSON)](./ade-extract-schema-json). To add, edit, or remove fields: 1. Go to the [Playground](https://va.landing.ai/). 2. Open a project. 3. Click the **Extract** tab. 4. If you have multiple versions of the schema, make sure that the schema you want to use is selected. 5. Make your changes: * **Add a field**: Scroll to the bottom of the **Schema** panel and click **New Field**. * **Edit a field**: Click the field and update it. * **Remove a field**: Hover over the field and click the **Delete** icon. 6. After making any changes to the schema, click **Update Schema**. ### Refine Schema with AI **Refine Schema** calls the [Build Extract Schema API](./ade-extract-schema-api) and consumes credits. See [Build Extract Schema API pricing](./ade-pricing#credit-costs-for-the-build-extract-schema-api). After creating a schema, you can edit it using the AI-powered "Refine Schema" tool. To refine a schema using AI: 1. Go to the [Playground](https://va.landing.ai/). 2. Open a project. 3. Click the **Extract** tab. 4. If you have multiple versions of the schema, make sure that the schema you want to use is selected. 5. You can do one or both of the following: * Describe the edits you want to make in the **Refine Schema** text box. * If you want to add more files that should be factored into the schema, click the **+** icon and select those files. 6. Click **Update**. The schema updates based on the input you provided. Refine Schema ### Run the Schema on Your Files **Run This File** calls the [Extract API](./ade-extract) and consumes credits. See [Extract API pricing](./ade-pricing#credit-costs-for-the-extract-api). Run the schema on one or more files to validate that it extracts the expected data. To run the schema: 1. Go to the [Playground](https://va.landing.ai/). 2. Open a project. 3. Click the **Extract** tab. 4. In the **Files** panel, select the file you want to run the schema on. 5.
If you have multiple versions of the schema, make sure that the schema you want to use is selected. 6. Click **Run This File**. Run the extraction schema on the file The **Extracted Results** panel refreshes and displays two sets of content: * **Extraction**: The list of extracted key-value pairs. * **Extract Metadata**: The key-value pairs and their unique IDs (`references`). For more information about extraction output, go to [JSON Response for Extraction](./ade-extract-response). Extracted Results ### Start Over After you create a schema, you can start over. This creates a new blank schema, saved as the next version, while retaining all existing schema versions. You can then manually add fields, refine the schema with AI, or import a schema. To start over: 1. Go to the [Playground](https://va.landing.ai/). 2. Open a project. 3. Click the **Extract** tab. 4. In the **Schema** panel, click **...** and select **Start Over**. Start Over 5. When prompted, click **Start Fresh**. ## Use the Schema After you create and validate a schema, you are ready to use it. * [Get a Ready-to-Use Parse and Extract Script](#get-a-ready-to-use-parse-and-extract-script): Get a ready-to-use script for Parse and Extract that uses the schema you created. * [Export the Schema](#export-the-schema): Download the schema as a JSON file to use in your own code. ### Get a Ready-to-Use Parse and Extract Script After you create a schema, the Playground generates a script that covers both the Parse and Extract steps using the schema you built. The Playground provides two versions: one for calling the APIs directly, and one for the [Python library](./ade-python). **To get the script:** 1. Go to the [Playground](https://va.landing.ai/). 2. Open a project. 3. Click the **Extract** tab. 4. If you have multiple versions of the schema, make sure that the schema you want to use is selected. 5. Click **Code**. View the Schema Code 6. The **View Code** pop-up opens.
Click the **Library** or **API** tab to see the code for each extraction method. 7. Click the **Download** or **Copy** buttons to get the code. Get the Schema Code ### Export the Schema After you create a schema, you can export it as a downloadable JSON file to use in your own code. **To export the schema:** 1. Go to the [Playground](https://va.landing.ai/). 2. Open a project. 3. Click the **Extract** tab. 4. If you have multiple versions of the schema, make sure that the schema you want to use is selected. 5. In the **Schema** panel, click **...** and select **Export Schema**. Export Schema 6. The extraction schema is downloaded as a JSON file. ## Schema Versions The Playground saves a new version of the schema each time you run **Suggest Schema**, **Refine Schema**, or import a schema. Previous versions are retained even after you use **Start Over**. Use schema versions to compare results or roll back to an earlier version if a schema update doesn't perform as expected. To view a previous version, click the drop-down menu in the **Schema** panel and select the version you want to see. Schema Versions ## Download or Copy Extracted Data You can download or copy extracted data directly from the Playground. This is useful for spot-checking results during schema development. For production use, retrieve extracted data through the [Extract API](./ade-extract) or the [Python library](./ade-python). To download or copy the extracted data: 1. Go to the [Playground](https://va.landing.ai/). 2. Open a project. 3. Click the **Extract** tab. 4. If the **Extracted Results** panel doesn't display, create or edit a schema and click **Run This File**. 5. Click the **Download** or **Copy** buttons to get the extracted data.
Copy the Extracted Values # JSON Response for Extraction Source: https://docs.landing.ai/ade/ade-extract-response When you extract structured data with the [Extract API](https://docs.landing.ai/api-reference/tools/ade-extract), the extracted data and metadata are returned in a structured JSON format. ## Response Structure The response contains the following top-level fields: * [`extraction`](#extracted-data-extraction): The extracted key-value pairs as defined by your schema. * [`extraction_metadata`](#extraction-metadata-extraction_metadata): Metadata showing which chunks were referenced for each extracted field. * [`metadata`](#processing-metadata-metadata): Processing information including credit usage, duration, filename, job ID, version, schema validation errors, and warnings. ## Extracted Data (`extraction`) The `extraction` field contains the structured data extracted from your document, formatted according to your JSON schema. The structure matches your input schema exactly. For a simple schema: ```json theme={null} { "type": "object", "properties": { "employee_name": { "type": "string", "description": "The employee's full name" }, "employee_ssn": { "type": "string", "description": "The employee's Social Security Number" }, "gross_pay": { "type": "number", "description": "The gross pay amount" } } } ``` The `extraction` field returns: ```json theme={null} { "employee_name": "MICHAEL D BRYAN", "employee_ssn": "555-50-1234", "gross_pay": 6000.00 } ``` ## Extraction Metadata (`extraction_metadata`) The `extraction_metadata` field has the same structure as your extraction schema, but each field contains a dictionary with `references` that lists the HTML element IDs where the data was found.
The `references` field can contain: * **Chunk IDs**: UUID-format IDs (e.g., `72ba3cca-01e5-407b-9fc4-81f54f9f0c51`) that reference entire chunks like text blocks or figures * **Table cell IDs**: Format `{page_number}-{base62_sequential_number}` (e.g., `0-u`) when extracted data comes from table cells * **Other HTML element IDs**: Any ID attribute from HTML elements within the [`markdown` fields](./ade-markdown-response) from the parsed output This metadata is useful for: * Tracing which parts of the document contributed to each extracted field * Debugging extraction issues * Building confidence scores or validation logic * Creating audit trails for extracted data ### Simple Schema Metadata For a simple extraction schema, the metadata includes the value and references for each field. When data is extracted from text chunks, references contain chunk IDs (UUIDs): ```json theme={null} { "employee_name": { "value": "MICHAEL D BRYAN", "references": [ "72ba3cca-01e5-407b-9fc4-81f54f9f0c51" ] }, "employee_ssn": { "value": "555-50-1234", "references": [ "a3f5d8c9-2b4e-4a1c-8f7e-9d6c5b4a3e2f" ] } } ``` When data is extracted from table cells, references contain table cell IDs: ```json theme={null} { "employee_name": { "value": "JANE HARPER", "references": [ "75a62de4-5120-44bf-a6dd-b2aa63db18c6" ] }, "gross_pay": { "value": 452.43, "references": [ "0-u" ] } } ``` In this example, `"0-u"` is a table cell ID where `0` indicates page 0 and `u` is the base62-encoded sequential number for that cell. 
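The reference formats above are regular enough to drive tracing and audit logic in client code. The following stdlib-only sketch (hypothetical helper names) flattens an `extraction_metadata` payload and labels each reference by format, using the UUID and `{page_number}-{base62}` patterns described above:

```python theme={null}
import re

UUID_RE = re.compile(r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$")
CELL_RE = re.compile(r"^(\d+)-([0-9A-Za-z]+)$")  # {page_number}-{base62_sequential_number}

def classify_reference(ref: str) -> str:
    """Label a reference as a chunk UUID, a table cell ID, or another HTML element ID."""
    if UUID_RE.match(ref):
        return "chunk"
    if CELL_RE.match(ref):
        return "table_cell"
    return "element"

def collect_references(metadata: dict, path: str = "") -> dict:
    """Flatten (possibly nested) extraction_metadata into {field_path: [(ref, kind), ...]}."""
    out = {}
    for key, value in metadata.items():
        field = f"{path}.{key}" if path else key
        if isinstance(value, dict) and "references" in value:
            out[field] = [(r, classify_reference(r)) for r in value["references"]]
        elif isinstance(value, dict):  # nested object: recurse with a dotted path
            out.update(collect_references(value, field))
    return out

meta = {
    "employee_name": {"value": "JANE HARPER",
                      "references": ["75a62de4-5120-44bf-a6dd-b2aa63db18c6"]},
    "gross_pay": {"value": 452.43, "references": ["0-u"]},
}
print(collect_references(meta))
# employee_name maps to a "chunk" reference; gross_pay maps to a "table_cell" reference
```

A flattened map like this is a convenient starting point for audit trails or for highlighting source chunks in a document viewer.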
### Nested Schema Metadata For nested extraction schemas, the metadata preserves the same nested structure: ```json theme={null} { "patient_details": { "patient_name": { "value": "John Smith", "references": [ "72ba3cca-01e5-407b-9fc4-81f54f9f0c51" ] }, "date": { "value": "2024-01-15", "references": [ "72ba3cca-01e5-407b-9fc4-81f54f9f0c51" ] } }, "emergency_contact_information": { "emergency_contact_name": { "value": "Jane Smith", "references": [ "5b8865b9-1a81-46df-bcf7-0bdbed9130dc" ] }, "relationship_to_patient": { "value": "Spouse", "references": [ "5b8865b9-1a81-46df-bcf7-0bdbed9130dc" ] } } } ``` ## Processing Metadata (`metadata`) The `metadata` field provides information about the extraction process: | Field | Type | Description | | ------------------------ | -------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `filename` | string | The name of the input file. | | `org_id` | string | Organization identifier. | | `duration_ms` | number | Processing time in milliseconds. | | `credit_usage` | number | Number of credits consumed. | | `job_id` | string | Unique job identifier. | | `version` | string | Model version used for extraction. For more information, go to [Extraction Model Versions](./ade-extract-models). | | `fallback_model_version` | string \| null | The extract model that was actually used when the initial extraction attempt failed with the requested version. `null` if no fallback occurred. | | `schema_violation_error` | string \| null | Error message if extracted data doesn't conform to the input schema. `null` if the schema is valid. For more information, go to [Troubleshoot Extraction](./ade-extract-troubleshoot). | | `warnings` | array | Structured warning objects generated during extraction. 
Each warning has a `code` that identifies the warning type and a `msg` (human-readable description). For more information, go to [Warnings](./ade-extract-troubleshoot#warnings). | ## Example Response Here is a complete example showing the extraction response structure: ```json theme={null} { "extraction": { "account_holder_name": "SUSAN SAMPLE", "account_number": "02782-5094431", "closing_balance": 3664.79 }, "extraction_metadata": { "account_holder_name": { "references": [ "55d1b275-b539-4af7-ac85-3396f62c216e" ], "value": "SUSAN SAMPLE" }, "account_number": { "references": [ "dd784487-910c-4cb7-8bb2-d4a7ef3d952b", "0-3" ], "value": "02782-5094431" }, "closing_balance": { "references": [ "0-e" ], "value": 3664.79 } }, "metadata": { "credit_usage": 0.8583999999999999, "duration_ms": 9893, "filename": "upload.md", "job_id": "1c08692d839c4c33bd85aebfafeab5e9", "org_id": "lgy5xucm9xnq", "version": "extract-20260314", "fallback_model_version": null, "schema_violation_error": null, "warnings": [] } } ``` # Build Extraction Schemas with the API Source: https://docs.landing.ai/ade/ade-extract-schema-api Use the [ API](https://docs.landing.ai/api-reference/tools/ade-build-schema) to programmatically generate a JSON extraction schema from the Markdown output of the [ API](./ade-separate-apis). The API analyzes the Markdown content and returns a schema you can pass directly to the [ API](./ade-extract). Each call to the API consumes credits. See [Build Extract Schema API pricing](./ade-pricing#credit-costs-for-the-build-extract-schema-api). ## When to Use the Schema Builder API The [ API](https://docs.landing.ai/api-reference/tools/ade-build-schema) is useful when you want to automate schema creation or refinement as part of a larger pipeline, without using the [Playground schema wizard](./ade-extract-playground). 
Use the API to: * [Build a master schema](#generate-a-master-schema-from-markdown-files) from multiple documents to handle field and layout variation across document types. * [Detect schema drift](#detect-schema-drift-and-refine-an-existing-schema) by passing updated documents alongside an existing schema to surface new or changed fields before they reach your pipeline. For schema format requirements and supported field types, see [Extraction Schema (JSON)](./ade-extract-schema-json). ## API Reference See the full API reference [here](https://docs.landing.ai/api-reference/tools/ade-build-schema). Endpoint: `https://api.va.landing.ai/v1/ade/extract/build-schema` ## Request Parameters At least one of `markdowns`, `markdown_urls`, or `prompt` must be provided. | Parameter | Type | Required | Description | | --------------- | ---------------- | -------- | ----------------------------------------------------------------------------------------------------------------------------- | | `model` | string | No | The extraction model to use. Use `extract-latest` for the latest version. | | `markdowns` | file or string | No | One or more Markdown files or inline Markdown strings to analyze. Provide multiple Markdown files for better schema coverage. | | `markdown_urls` | array of strings | No | URLs to Markdown files to analyze. | | `prompt` | string | No | Instructions for how to generate or modify the schema. | | `schema` | string | No | An existing JSON schema to refine or iterate on. | ## Response The response contains: * `extraction_schema` (string): The generated JSON schema, returned as a string. * `metadata`: Includes `job_id`, `duration_ms`, `credit_usage`, and `version`. ## Workflows ### Generate a Master Schema from Markdown Files Pass one or more Markdown files to generate a schema based on the content. The API identifies the fields present in the Markdown and returns an extraction schema. 
```bash theme={null} curl -X POST 'https://api.va.landing.ai/v1/ade/extract/build-schema' \ -H 'Authorization: Bearer YOUR_API_KEY' \ -F 'markdowns=@markdown.md' \ -F 'model=extract-latest' ``` To build a master schema that covers multiple document types, pass multiple Markdown files that represent the range of layouts you expect to process. The API generates a single schema that handles field and layout variation across all of them: ```bash theme={null} curl -X POST 'https://api.va.landing.ai/v1/ade/extract/build-schema' \ -H 'Authorization: Bearer YOUR_API_KEY' \ -F 'markdowns=@markdown_1.md' \ -F 'markdowns=@markdown_2.md' \ -F 'model=extract-latest' ``` ### Generate a Schema from a Prompt Use the `prompt` parameter to specify which fields to extract. This is useful when you only need a subset of the fields in the Markdown file, or when you want to shape the field names and structure. ```bash theme={null} curl -X POST 'https://api.va.landing.ai/v1/ade/extract/build-schema' \ -H 'Authorization: Bearer YOUR_API_KEY' \ -F 'markdowns=@markdown.md' \ -F 'model=extract-latest' \ -F 'prompt=Extract the vendor name, invoice date, and total amount due' ``` You can also use `prompt` without any Markdown input to generate a schema based on instructions alone: ```bash theme={null} curl -X POST 'https://api.va.landing.ai/v1/ade/extract/build-schema' \ -H 'Authorization: Bearer YOUR_API_KEY' \ -F 'model=extract-latest' \ -F 'prompt=Create a schema for extracting patient name, date of birth, and insurance provider from medical intake forms' ``` ### Detect Schema Drift and Refine an Existing Schema Pass an existing schema in the `schema` parameter to refine it. This is useful for schema drift detection: if a new document type enters your pipeline (for example, invoices from a new vendor that uses a different layout and field names), you can pass the new Markdown alongside your current schema. 
The API surfaces new or changed fields so you can update the schema before it affects your pipeline. To refine a schema based on a Markdown file: ```bash theme={null} curl -X POST 'https://api.va.landing.ai/v1/ade/extract/build-schema' \ -H 'Authorization: Bearer YOUR_API_KEY' \ -F 'markdowns=@markdown.md' \ -F 'model=extract-latest' \ -F 'schema={"type":"object","properties":{"vendor":{"type":"string"},"total":{"type":"number"}}}' ``` To update a schema based on a prompt: ```bash theme={null} curl -X POST 'https://api.va.landing.ai/v1/ade/extract/build-schema' \ -H 'Authorization: Bearer YOUR_API_KEY' \ -F 'model=extract-latest' \ -F 'schema={"type":"object","properties":{"vendor":{"type":"string"},"total":{"type":"number"}}}' \ -F 'prompt=Add a field for the invoice number and make the total field return a string with the currency symbol' ``` # Extraction Schema (JSON) Source: https://docs.landing.ai/ade/ade-extract-schema-json ## Overview An extraction schema is a JSON object that defines which fields to extract from a document and how to structure the output. You pass the schema to the API along with Markdown content generated by the API. You can build extraction schemas using the [Playground](https://va.landing.ai/) or the [ API](./ade-extract-schema-api), both of which generate valid schemas automatically. Use this article as a reference for understanding how schemas are structured and how the API handles unsupported keywords. The schema requirements in this article apply to [`extract-20260314`](./ade-extract-models#extract-20260314) and later. For information about earlier versions, see [Earlier Versions of Extract](#earlier-versions-of-extract). When using the library, you can also pass a Pydantic model class instead of a JSON schema. For more information, go to [Python Library](./ade-python). 
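As noted above, the library can accept a Pydantic model class in place of a JSON schema. As a rough illustration of the equivalence, the sketch below uses Pydantic v2's `model_json_schema()` to show the JSON schema a model produces (the `Invoice` model and its fields are invented for this example). The generated root-level `title` and `required` keywords are among those the API ignores, as described later in this article.

```python theme={null}
from pydantic import BaseModel, Field

class Invoice(BaseModel):
    """Field descriptions become `description` keywords in the generated schema."""
    vendor_name: str = Field(description="The official name of the vendor issuing the invoice.")
    total_amount: float = Field(description="Total amount in USD, excluding tax.")

schema = Invoice.model_json_schema()
print(schema["type"])                # object (the required top-level type)
print(sorted(schema["properties"]))  # ['total_amount', 'vendor_name']
```

Inspecting the generated schema this way is a quick check that your model maps to the structure described in the rest of this article.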
## Basic Structure The extraction schema must follow this structure: ```json theme={null} { "type": "object", "properties": { "your_field_name": { "type": "string", "description": "Describe what to extract." } } } ``` **Example** This schema extracts two fields from an invoice. It also includes the optional `x-alternativeNames` keyword to account for field name variations across documents: ```json theme={null} { "type": "object", "properties": { "total_invoice_amount": { "description": "The total monetary value of the invoice, including all charges and taxes.", "x-alternativeNames": [ "Total Invoice Amount", "Grand Total", "Amount Due" ], "type": "number" }, "bank_name": { "description": "The official name of the bank where the payment will be deposited.", "x-alternativeNames": [ "Bank Name", "Beneficiary Bank", "Financial Institution" ], "type": "string" } } } ``` Schemas generated by the [Playground](./ade-extract-playground) or the [ API](./ade-extract-schema-api) automatically include root-level `description` and `required` keywords. These are currently ignored by the API. ### Top-Level Type Requirement The top-level `type` keyword must be `"object"`. Schemas with a different top-level type will return an error. In the following example, the highlighted line shows the required top-level `type` keyword: ```json highlight={2} theme={null} { "type": "object", "properties": {} } ``` ## Define Each Field Each field you want to extract is defined as a property inside the `properties` keyword. Each property can include the following: | | Required | Description | | ------------------------------------------ | --------------------------- | ---------------------------------------------------------------------------- | | [Field name](#field-names) | Required | The key used to identify the extracted value in the output. | | [`type`](#supported-field-types) | Required | The data type of the extracted value. 
| | [`description`](#field-descriptions) | Optional | Natural-language description of what to extract. | | [`enum`](#restrict-values-with-enum) | Optional | Restricts the extracted value to a set of allowed values. | | [`format`](#format) | Optional | Instructions for how to format the extracted value. | | [`x-alternativeNames`](#alternative-names) | Optional | Alternative labels for the field that may appear across different documents. | | [`properties`](#properties-for-objects) | Required for `object` types | Defines the fields within a nested object. | | [`items`](#format-arrays-with-items) | Required for `array` types | Defines the structure of items in an array. | ### Field Names **Required.** The field name is the key that identifies each property in the `properties` object and determines how the extracted value is labeled in the output. Field names can contain letters, numbers, underscores, and hyphens. Use descriptive, specific names that clearly indicate what data to extract: * `invoice_number` instead of `number` * `patient_name` instead of `name` In the following example, the highlighted lines are field names: ```json highlight={4, 13} theme={null} { "type": "object", "properties": { "account_holder": { "description": "The name of the individual or entity who owns the bank account.", "x-alternativeNames": [ "Account Holder", "Account Owner", "Primary Account Holder" ], "type": "string" }, "ending_balance": { "description": "The total amount of money in the account at the end of a specific period.", "x-alternativeNames": [ "Ending Balance", "Final Balance", "Closing Balance" ], "type": "number" } } } ``` ### Supported Field Types **Required.** Use the `type` keyword to define the data type of the extracted value. Supported types are: | Type | Description | | --------- | -------------------------------------------------------------------------------------------------------------------------------------- | | `array` | A list of items. 
To see an example, go to [Arrays](#format-arrays-with-items). | | `boolean` | True or false values. | | `integer` | Whole numbers. | | `number` | Numeric values, including decimals. Use this type for monetary values or when you need to perform calculations on the extracted value. | | `object` | A nested structure. To see an example, go to [Nested Objects](#nested-objects). | | `string` | Text values. | In the following example, the highlighted line shows the `type` keyword: ```json highlight={12} theme={null} { "type": "object", "properties": { "statement_date": { "description": "The date on which the financial statement was issued.", "format": "Month DD, YYYY", "x-alternativeNames": [ "Statement Date", "Date of Statement", "Billing Date" ], "type": "string" } } } ``` To restrict a field to a specific set of allowed values, use the [`enum`](#restrict-values-with-enum) keyword. ### Restrict Values with Enum **Optional.** Use the `enum` keyword to restrict the extracted value to a specific set of allowed values. Only string values are supported. Include `"type": "string"` in the field definition. Any non-string values are converted to strings. For example, the following schema restricts the extracted account type to one of three allowed values. The highlighted lines show the `type` and `enum` keywords: ```json highlight={5, 12-16} theme={null} { "type": "object", "properties": { "account_type": { "type": "string", "description": "The classification of the bank account, such as checking or savings, with specific features.", "x-alternativeNames": [ "Account Type", "Type of Account", "Account Category" ], "enum": [ "Premium Checking", "Basic Checking", "Savings" ] } } } ``` In the Playground, "enum" appears as a field type option in the **Type** drop-down menu. When you export the schema, the Playground outputs this correctly using the `enum` keyword. 
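Because `enum` supports only string values, it can help to normalize enum fields client-side rather than relying on the server-side conversion. This small sketch (the helper name is hypothetical) coerces enum values to strings and sets `"type": "string"` so the field meets the documented requirement:

```python theme={null}
def normalize_enum(field: dict) -> dict:
    """Return a copy of a field definition with enum values coerced to strings
    and the type set to "string", matching the documented enum requirements."""
    out = dict(field)
    if "enum" in out:
        out["enum"] = [v if isinstance(v, str) else str(v) for v in out["enum"]]
        out["type"] = "string"
    return out

field = {"type": "integer", "description": "Number of pages", "enum": [1, 2, 3]}
print(normalize_enum(field))
# {'type': 'string', 'description': 'Number of pages', 'enum': ['1', '2', '3']}
```

Normalizing before submission keeps the schema you store identical to the schema the API actually evaluates.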
### Field Descriptions **Optional.** Use the `description` keyword to help the API identify and extract the correct data. The more specific your descriptions, the more accurate the extraction. Include the following in your descriptions: * Exactly what data to extract * What to include or exclude (for example, "excluding tax" or "including area code") In the following example, the highlighted line shows the `description` keyword: ```json highlight={6} theme={null} { "type": "object", "properties": { "total_amount": { "type": "number", "description": "Total amount in USD, excluding tax" } } } ``` ### Format **Optional.** Use the `format` keyword to specify how the extracted value should be formatted. This is most commonly applied to `string` fields. The `format` keyword accepts natural-language instructions and standard [JSON Schema format values](https://www.learnjsonschema.com/2020-12/format-annotation/format/). Natural-language instructions offer more flexibility, since you can describe formatting requirements that don't have a standard equivalent. We recommend experimenting with different values to find what works best for your use case. The following examples illustrate the range of options: | `format` value | Output example | | ------------------------------------------------------- | ------------------ | | `YYYY-MM-DD` | `2026-01-17` | | `Month DD, YYYY` | `January 17, 2026` | | `Currency amount with the $ symbol, for example $12.50` | `$170.23` | | `Two-letter US state code` | `CA` | In the following example, the highlighted line shows the `format` keyword: ```json highlight={6} theme={null} { "type": "object", "properties": { "statement_date": { "type": "string", "format": "YYYY-MM-DD", "description": "The date on which the financial statement was issued." } } } ``` ### Alternative Names **Optional.** Use the `x-alternativeNames` keyword to list alternative labels for a field. 
This helps the API locate the correct data when documents use different labels for the same field, such as "Invoice Number" versus "Reference Number." In the following example, the highlighted lines show the `x-alternativeNames` keyword: ```json highlight={6-10} theme={null} { "type": "object", "properties": { "total_amount": { "type": "number", "x-alternativeNames": [ "Total Amount", "Grand Total", "Amount Due" ], "description": "The total monetary value of the invoice." } } } ``` ### Properties (For Objects) **Required for `object` types.** Use the `properties` keyword to define the fields within a nested object. For a full example, see [Nested Objects](#nested-objects). ### Format Arrays with `items` **Required for `array` types.** Use the `items` keyword to define the structure of items in an array. For a full example, see [Arrays](#arrays). ## Nested Objects Use nested objects when the data you want to extract has a natural hierarchical structure. For example, an invoice might include a billing address with multiple sub-fields (street, city, state, and ZIP code), or a patient form might have separate sections for personal details and insurance information. In the following example, the highlighted lines show a nested `properties` keyword inside the `invoice` object: ```json highlight={6-19} theme={null} { "type": "object", "properties": { "invoice": { "type": "object", "properties": { "number": { "type": "string", "description": "Invoice number" }, "date": { "type": "string", "description": "Invoice date" }, "total": { "type": "number", "description": "Total amount" } } } } } ``` ## Arrays Use arrays to extract repeating structures from a document, such as all rows in a table. Each item in the array follows the same schema, making arrays well-suited for data like transaction lists, invoice line items, or lists of charges. To define an array field: 1. Set `"type": "array"` on the field. 2. Include the `items` keyword to define the structure of each item. 3. 
Inside `items`, define the fields each item should contain. The following example extracts financial transactions from a bank statement, where each transaction is an object with three fields: ```json highlight={11, 12-44} theme={null} { "type": "object", "properties": { "transactions": { "description": "A list of all financial transactions recorded for the account.", "x-alternativeNames": [ "Transaction History", "Activity Log", "Movements" ], "type": "array", "items": { "type": "object", "properties": { "date": { "description": "The date when the transaction occurred.", "format": "YYYY-MM-DD", "x-alternativeNames": [ "Transaction Date", "Date of Transaction", "Activity Date" ], "type": "string" }, "description": { "description": "A brief description of the transaction.", "x-alternativeNames": [ "Transaction Description", "Details", "Item" ], "type": "string" }, "type": { "description": "The type of transaction, for example debit, credit, deposit, or withdrawal.", "x-alternativeNames": [ "Transaction Type", "Category", "Kind" ], "type": "string" } } } } } } ``` ## Keyword Support The API only supports a specific set of JSON Schema keywords. Unsupported keywords either cause errors or are silently ignored, depending on the keyword. ### Supported Keywords The API supports the following keywords. For details on each, see [Define Each Field](#define-each-field). * [`description`](#field-descriptions) * [`enum`](#restrict-values-with-enum) (string values only) * [`format`](#format) * [`items`](#format-arrays-with-items) * [`properties`](#properties-for-objects) * [`type`](#supported-field-types) * [`x-alternativeNames`](#alternative-names) ### Ignored Keywords The following keywords are not supported but will not cause errors. The API removes or resolves them before running extraction. 
| Keyword | How the API handles it | | ------- | ---------------------- | | Reference keywords: `$anchor`, `$defs`, `$id`, `$ref`, `$schema`, `definitions` | Used to define and reference reusable schema components. All references are resolved during schema conversion. The keywords are then removed. | | Recursive and dynamic keywords: `$dynamicAnchor`, `$dynamicRef`, `$recursiveAnchor`, `$recursiveRef` | Used for self-referential schemas. All references are resolved during schema conversion. The keywords are then removed. | | `anyOf` | To ensure consistent output, each field is limited to a specific type. If one of the `anyOf` types is `null`, the API removes `null` and sets the `type` to the other specified type. If none of the types are `null`, the API falls back to `string`, since that is least likely to cause issues. | | `default` | If `default` is `null`, the API removes the keyword. If `default` is any other value, the API returns a 206. | | `nullable` | The API removes the keyword. All fields are nullable by default. For more information, see [How the API Handles Missing Fields](#how-the-api-handles-missing-fields). | | `required` | The API removes the keyword. The API considers all fields to be required. | | `title` | The API removes the keyword. Typically, the title does not give additional information that is not already present in the field name or description. | ### Keywords That Cause Errors Whether the API returns a `206` (Partial Content) or `422` (Unprocessable Entity) depends on the `strict` parameter. If `strict` is `false`, the API returns a `206`. If `strict` is `true`, the API returns a `422`. For more information, see [Set the strict Parameter](./ade-extract#set-the-strict-parameter). **Any keyword not listed in [Supported Keywords](#supported-keywords) or [Ignored Keywords](#ignored-keywords) will cause the API to return an error.** The following list provides common examples and is not exhaustive: * `allOf` * `const` * `maxItems` * `maxLength` * `maximum` * `minItems` * `minLength` * `minimum` * `oneOf` * `pattern` * `propertyOrdering` * `uniqueItems` ## How the API Handles Required Fields The API treats all fields as required and always attempts to extract every property defined in your schema. The `required` keyword is not supported and is ignored if included. For more information, see [Ignored Keywords](#ignored-keywords). If the API cannot find a field in the document, it returns `null` rather than an error. For more information, see [How the API Handles Missing Fields](#how-the-api-handles-missing-fields). 
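The keyword rules above lend themselves to a client-side pre-flight check: scanning a schema for keywords that would trigger a `206` or `422` before submitting it. The sketch below mirrors the documented keyword tables (it is not an official validator, and the helper name is invented); the supported and ignored sets are transcribed from this article:

```python theme={null}
SUPPORTED = {"description", "enum", "format", "items", "properties", "type",
             "x-alternativeNames"}
IGNORED = {"required", "title", "nullable", "default",
           "$anchor", "$defs", "$id", "$ref", "$schema", "definitions"}

def preflight(schema: dict, path: str = "$") -> list:
    """Recursively collect keywords the API would reject (206 with strict=false,
    422 with strict=true). anyOf is skipped because the API resolves it."""
    problems = []
    for key in schema:
        if key in SUPPORTED or key in IGNORED or key == "anyOf":
            continue
        problems.append(f"{path}.{key}")
    for name, sub in schema.get("properties", {}).items():
        problems += preflight(sub, f"{path}.{name}")
    if isinstance(schema.get("items"), dict):
        problems += preflight(schema["items"], f"{path}.items")
    return problems

schema = {
    "type": "object",
    "properties": {
        "vendor": {"type": "string", "pattern": "^[A-Z]+$"},  # pattern causes an error
        "total": {"type": "number", "minimum": 0},            # minimum causes an error
    },
}
print(preflight(schema))
# ['$.vendor.pattern', '$.total.minimum']
```

Running a check like this in CI catches schema regressions before they consume credits or surface as `schema_violation_error` responses.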
## How the API Handles Missing Fields All fields are nullable, meaning the API returns `null` when it cannot find a field in the document rather than returning an error. Because of this, the API ignores `null` and `nullable` if included in your schema. For more information, see [Ignored Keywords](#ignored-keywords). For example, if your schema includes a `first_name` field but the document does not contain a first name, the API returns `null` for that field. The exact behavior depends on the field type: | Field type | Behavior when not found | | ----------------------------------------------------------- | ----------------------------------------------------------------------- | | Primitive fields (`boolean`, `integer`, `number`, `string`) | Returns `null`. | | `array` | Returns an empty array: `[]`. | | `object` | Never returns `null`, but all primitive fields within it return `null`. | ## Schema Validation The API processes your schema and output in three stages: * [Before Extraction: Validate Schema Structure](#validate-schema-structure-before-extraction) * [During Processing: Convert Schema](#convert-schema-during-processing) * [After Extraction: Validate Extracted Output Against Schema](#validate-extracted-output-against-schema-after-extraction) ### Validate Schema Structure (Before Extraction) The API checks that your schema is valid JSON and follows the required structure. If validation fails, the API returns a `422` error before processing begins. ```json theme={null} { "error": "Invalid JSON schema provided for fields_schema." } ``` ### Convert Schema (During Processing) The API converts your schema before running extraction. If the schema includes [keywords that cause errors](#keywords-that-cause-errors), the behavior depends on the `strict` parameter: * If `strict` is `false`: the API continues and returns a `206` (Partial Content). * If `strict` is `true`: the API stops and returns a `422` (Unprocessable Entity). 
For more information, see [Set the strict Parameter](./ade-extract#set-the-strict-parameter). ### Validate Extracted Output Against Schema (After Extraction) After extraction completes, the API validates that the extracted output matches your schema. If it does not, the API returns a `206` (Partial Content) with any successfully extracted data. Because the API returns at least partial results, the API call consumes credits. ## FAQs for Extraction Schemas ### Does the order of properties in the schema need to match the order of the fields in the document? No. The order of properties in the schema has no impact on extraction. For example, a property defined last in the schema can still extract data from a field that appears at the top of the document. ### Is there a maximum number of fields that can be extracted? No. There is no maximum number of properties in an extraction schema. ### Can I put formatting and alternative names in the description field? You can include formatting instructions and alternative field names in `description`, but using the dedicated `format` and `x-alternativeNames` keywords is more effective. When you use dedicated keywords, the API knows exactly what each piece of information is: `format` contains only formatting instructions, and `x-alternativeNames` contains only alternative field labels. This allows the API to apply each more precisely than if the same information were embedded in a general description. For more information, see [Format](#format) and [Alternative Names](#alternative-names). ### What is the best practice for writing field descriptions? Be as specific as needed so there is no ambiguity about what to extract. Think of it as natural-language instructions (like prompting an LLM). The clearer and more specific it is, the better the results. You may want to iterate on your descriptions to find what works best for your use case. For more information, see [Field Descriptions](#field-descriptions). 
## Earlier Versions of Extract

The supported schema structure for the API has changed over time. Extraction schemas generated before April 2, 2026 may not be compatible with the current API. Update your schema to meet the requirements described in this article.

# Troubleshoot Extraction

Source: https://docs.landing.ai/ade/ade-extract-troubleshoot

Use this section to troubleshoot issues encountered when calling the extraction APIs:

* ADE Extract: /v1/ade/extract
* ADE Build Extract Schema: /v1/ade/extract/build-schema

## Common Status Codes

These status codes apply to all extraction endpoints.

| Status Code | Name | Description | What to Do |
| ----------- | ----------------- | ----------- | ---------- |
| 401 | Unauthorized | Missing or invalid API key. | Check that your `apikey` header is present and contains a valid [API key](./agentic-api-key). |
| 402 | Payment Required | Your account does not have enough credits to complete processing. | If you have multiple accounts, make sure you're using the correct [API key](./agentic-api-key). Add more credits to your account. |
| 429 | Too Many Requests | Rate limit exceeded. | Wait before retrying. Reduce request frequency and implement exponential backoff. |

## ADE Extract

This section covers errors for the ADE Extract API.

### Status Codes

| Status Code | Name | Description | What to Do |
| ----------- | --------------------- | ----------- | ---------- |
| 200 | Success | Extraction completed successfully and extracted data conforms to the schema. | Continue with normal operations. |
| 206 | Partial Content | Extraction completed but extracted data does not fully conform to the schema. | Review the `schema_violation_error` field in the response and adjust your schema or document. See [Status 206: Partial Content](#status-206-partial-content). |
| 400 | Bad Request | Invalid request due to malformed input, unsupported version, or client-side extraction errors. | Review error message for specific issue. See [Status 400: Bad Request](#status-400-bad-request). |
| 422 | Unprocessable Entity | Input validation failed. | Review your request parameters. See [Status 422: Unprocessable Entity](#status-422-unprocessable-entity). |
| 500 | Internal Server Error | Server error during processing. | Retry. If the issue persists, contact [support@landing.ai](mailto:support@landing.ai). See [Status 500: Internal Server Error](#status-500-internal-server-error). |
| 504 | Gateway Timeout | Request processing exceeded the timeout limit (475 seconds). | Reduce document size or simplify extraction schema. See [Status 504: Gateway Timeout](#status-504-gateway-timeout). |

### Status 206: Partial Content

This response occurs when extraction completes successfully but the extracted data does not fully conform to the provided JSON schema. The API returns a 206 status code with the extracted data and a `schema_violation_error` field that contains details about the validation failure. For `extract-20260314` and above, the `metadata.warnings` field also contains structured warning objects. See [Warnings](#warnings).

The errors in this section may appear in the `schema_violation_error` field when you receive a 206 status code. Because the API returns at least partial results, the API call consumes credits.

**What to do:**

* Review the specific validation error in the `schema_violation_error` field.
* Verify the JSON schema follows the guidelines described in [Extraction Schema (JSON)](./ade-extract-schema-json). Update your JSON schema if needed.
* Check if the document contains the expected data in the expected format.

#### Extracted data does not completely conform to the requested schema

This occurs when the extracted data violates the JSON schema validation rules. The message includes details from the validation process.

**Message:**

```
Extracted data does not completely conform to the requested schema. See below error for details.
{validation_error_details}
Please read our documentation for more information: https://docs.landing.ai/ade/ade-extract-troubleshoot
```

**What to do:**

* Review the validation details to identify the specific schema violation.
* Verify the JSON schema follows the guidelines described in [Extraction Schema (JSON)](./ade-extract-schema-json). Update your JSON schema if needed.
* Check if the document contains the expected data in the expected format.

#### Null value returned

The API returns null when it cannot locate a field's value in the document. This can occur because the data is not in the document, or because the API could not locate it. For more information, go to [How the API Handles Missing Fields](./ade-extract-schema-json#how-the-api-handles-missing-fields).

**Error message:**

```
The value at 'field_name' was not found in the document, so null was returned. Since the schema requires a non-null value, verify that the document indeed doesn't contain this data, and if so, update the schema to allow null by adding 'null' to the field's type (e.g., type: ['string', 'null']) or by setting nullable: true.
```

**What to do:**

* Check whether the document actually contains the expected data. If it does not, the null result is correct.
* If the document does contain the data but the API could not find it, revise your schema to help the API locate the field:
  * Add or refine the `description` to be more specific about what to extract.
  * Add entries to `x-alternativeNames` that match how the field label appears in the document.

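The schema change suggested by this error message can be applied programmatically. Below is a minimal sketch (the `po_number` field and the `allow_null` helper are illustrative, not part of any ADE SDK) that adds `'null'` to a field's `type` so the schema tolerates a missing value:

```python
import json

def allow_null(schema: dict, field: str) -> dict:
    """Add 'null' to a property's type so validation accepts missing values."""
    prop = schema["properties"][field]
    declared = prop.get("type", "string")
    # Normalize to a list of types, then append "null" if it is not present.
    types = declared if isinstance(declared, list) else [declared]
    if "null" not in types:
        types.append("null")
    prop["type"] = types
    return schema

schema = {
    "type": "object",
    "properties": {
        "po_number": {"type": "string", "description": "Purchase order number"}
    },
}
allow_null(schema, "po_number")
print(json.dumps(schema["properties"]["po_number"]["type"]))  # ["string", "null"]
```

The same idea applies to `nullable: true`; mutating `type` in place keeps the rest of the schema untouched.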
For more information, go to [Field Descriptions](./ade-extract-schema-json#field-descriptions) and [Alternative Names](./ade-extract-schema-json#alternative-names).

### Status 400: Bad Request

This status code indicates invalid request parameters or client-side errors. Review the specific error message to identify the issue.

#### Error: Invalid JSON schema

This error occurs when the `extraction_schema` parameter contains invalid JSON.

**Error message:**

```
Invalid JSON schema: Expecting value: line 1 column 1 (char 0)
```

**What to do:**

* Verify your extraction schema is valid JSON.
* Check for syntax errors (missing commas, quotes, brackets).
* Verify the JSON schema follows the guidelines described in [Extraction Schema (JSON)](./ade-extract-schema-json). Update your JSON schema if needed.

#### Error: Failed to download document from URL

This error occurs when the API cannot download the Markdown file from the provided `markdown_url`.

**Error message:**

```
Failed to download document from URL: {error_details}
```

**What to do:**

* Verify the URL is accessible and returns valid content.
* Check network connectivity and URL permissions.
* Ensure the URL points to a Markdown file (.md extension).

#### Error: Field extraction invalid

This error occurs when the extraction process fails due to issues with the extraction schema or the extracted data.

**Error message:**

```
Field extraction invalid: {error_details}
```

**What to do:**

* Review the error details in the response.
* Verify the JSON schema follows the guidelines described in [Extraction Schema (JSON)](./ade-extract-schema-json). Update your JSON schema if needed.
* Ensure all required fields are properly defined in the schema.
* Check if the document contains data that matches the schema structure.

#### Error: Invalid extract version

This error occurs when an unsupported model version is specified.

**Error message:**

```
Invalid extract version '{version}' provided. Valid versions are: {list_of_versions} or use 'extract-latest' to use the latest version.
```

**What to do:**

* Use one of the supported versions listed in the error message.
* Use `extract-latest` to automatically use the latest version.
* If you don't specify a version, the API uses the latest version by default.
* For more information, go to [Extraction Model Versions](./ade-extract-models#model-versions).

### Status 422: Unprocessable Entity

This status code indicates input validation failures. Review the error message and adjust your request parameters.

#### Error: The provided schema must have "type": "object" for the root

This error occurs when the `extraction_schema` is a valid JSON object, but its `type` keyword is not set to `"object"`. The extraction engine requires the root schema to explicitly declare `"type": "object"`.

**Error message:**

```
Field extraction validation error: The provided schema must have "type": "object" for the root.
```

**What to do:** Set `"type": "object"` at the root of your schema.

**Correct:**

```json {2} theme={null}
{
  "type": "object",
  "properties": {
    "invoice_number": {
      "type": "string",
      "description": "Invoice number"
    }
  }
}
```

**Incorrect:**

```json {2} theme={null}
{
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "invoice_number": {
        "type": "string",
        "description": "Invoice number"
      }
    }
  }
}
```

#### Error: The provided JSON must parse to an object at the root

This error occurs when the `extraction_schema` parameter is valid JSON, but the root value is not a JSON object. For example, the root is a JSON array (`[...]`), a string, or a number rather than `{...}`.

**Error message:**

```
Field extraction validation error: The provided JSON must parse to an object at the root.
```

**What to do:** Wrap your schema in a JSON object (`{...}`).

**Correct:**

```json {1,8} theme={null}
{
  "type": "object",
  "properties": {
    "invoice_number": {"type": "string"},
    "vendor_name": {"type": "string"},
    "total": {"type": "number"}
  }
}
```

**Incorrect:**

```json theme={null}
["invoice_number", "vendor_name", "total"]
```

#### Error: The provided JSON object was not a valid JSON schema

This error occurs when the `extraction_schema` parameter contains a JSON object that does not conform to the JSON Schema specification.

**Error message:**

```
Field extraction validation error: The provided JSON object was not a valid JSON schema. Error: {error_details}
```

**What to do:**

* Review the error details to identify the specific schema issue.
* Verify your schema follows the guidelines described in [Extraction Schema (JSON)](./ade-extract-schema-json).

#### Error: The provided schema contains recursive local \$ref cycles

This error occurs when the extraction schema contains circular `$ref` references (for example, a schema that references itself).

**Error message:**

```
Field extraction validation error: The provided schema contains recursive local `$ref` cycles, which are not supported.
```

**What to do:** Remove circular `$ref` references from your schema. Restructure nested types to avoid self-referencing definitions.

#### Error: The following schema fields were not supported

This error occurs when the extraction schema contains [keywords that cause errors](./ade-extract-schema-json#keywords-that-cause-errors) and the `strict` parameter is set to `true`. When `strict` is `false`, the API removes the unsupported keywords and continues processing. The API returns `200` if extraction succeeds with valid output, or `206` if the extracted output does not conform to the original schema. For more information, go to [Set the strict Parameter](./ade-extract#set-the-strict-parameter).

**Error message:**

```
Field extraction validation error: The following schema fields were not supported: {keywords}
```

**What to do:**

* Remove the unsupported keywords listed in the error message from your schema.
* For a list of keywords that cause errors, go to [Keywords That Cause Errors](./ade-extract-schema-json#keywords-that-cause-errors).

#### Error: Cannot provide both 'markdown' and 'markdown\_url'

This error occurs when both a Markdown file and a URL to a Markdown file are provided in the same request.

**Error message:**

```
Cannot provide both 'markdown' and 'markdown_url'. Please provide only one.
```

**What to do:** Choose one input method:

* Provide a Markdown file using the `markdown` parameter, OR
* Provide a URL to a Markdown file using the `markdown_url` parameter.

#### Error: Must provide either 'markdown' or 'markdown\_url'

This error occurs when your request does not include either the `markdown` or `markdown_url` parameter.

**Error message:**

```
Must provide either 'markdown' or 'markdown_url'.
```

**What to do:** Add one of these parameters to your request:

* Use the `markdown` parameter to upload a Markdown file, OR
* Use the `markdown_url` parameter to provide a URL to a Markdown file.

#### Error: No markdown file or URL provided

This error occurs when you include a `markdown` or `markdown_url` parameter in your request, but the value is empty or blank.

**Error message:**

```
No markdown file or URL provided.
```

**What to do:**

* If using `markdown`: Ensure you are uploading a valid Markdown file (not an empty file or blank value).
* If using `markdown_url`: Ensure the parameter contains a valid URL (not an empty string or blank value).
* Verify that your request properly includes the file or URL value.

#### Error: Invalid URL format

This error occurs when the `markdown_url` parameter contains a value that is not a valid URL.

**Error message:**

```
Invalid URL format: {url}
```

**What to do:**

* Verify the URL is properly formatted with a valid protocol (http\:// or https\://).
* Check for typos or missing characters in the URL.
* Ensure the URL is properly encoded if it contains special characters.

#### Error: Multiple Markdown files detected

This error occurs when multiple Markdown files are included in the request.

**Error message:**

```
Multiple markdown files detected (X). Please provide only one document file.
```

**What to do:** Send only one Markdown file per request.

#### Error: Unsupported format

This error occurs when you provide a file other than Markdown (.md) to the extract endpoint, such as PDF, DOCX, XLSX, or image files.

**Error message:**

```
Unsupported format: {mime_type} ((unknown)). Supported formats: MD
```

**What to do:**

* The extract endpoint only accepts Markdown files with a .md extension.
* If you have a PDF, DOCX, or other document format, use the [Parse API](https://docs.landing.ai/api-reference/tools/ade-parse) endpoint to convert your document to Markdown first.
* Ensure your file has a .md extension and contains valid UTF-8 encoded Markdown content.

### Status 500: Internal Server Error

This error indicates an unexpected server error occurred during processing.

**What to do:**

* Retry the request.
* If the error persists, contact [support@landing.ai](mailto:support@landing.ai).

### Status 504: Gateway Timeout

This error occurs when the extraction process exceeds the timeout limit (475 seconds).

**Error message:**

```
Request timed out after 475 seconds
```

**What to do:**

* Reduce the size of your Markdown document.
* Simplify your extraction schema and verify it follows the guidelines described in [Extraction Schema (JSON)](./ade-extract-schema-json). Update your JSON schema if needed.
* If the error persists, contact [support@landing.ai](mailto:support@landing.ai).

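The retry guidance for the 429, 500, and 504 status codes can be wrapped in a small helper. This is an illustrative sketch, not part of any ADE SDK; `call` stands in for whatever function issues the HTTP request and returns a status code and body:

```python
import time

# Status codes worth retrying, per the guidance in this article.
RETRYABLE = {429, 500, 504}

def call_with_backoff(call, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Retry a callable returning (status, body), backing off exponentially."""
    for attempt in range(max_attempts):
        status, body = call()
        if status not in RETRYABLE:
            return status, body
        # Wait 1s, 2s, 4s, 8s... between attempts (exponential backoff).
        sleep(base_delay * (2 ** attempt))
    return status, body

# Example with a fake endpoint that succeeds on the third attempt.
responses = iter([(429, None), (500, None), (200, {"extraction": {}})])
status, body = call_with_backoff(lambda: next(responses), sleep=lambda s: None)
print(status)  # 200
```

Injecting `sleep` as a parameter keeps the helper testable; in production code you would also want a jittered delay and a cap on the total wait time.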
### Model Fallback Behavior

The `metadata.fallback_model_version` field is included in the API response to support legacy behavior. When using `extract-20251024`, the API may automatically fall back to `extract-20250930` if your JSON schema is too complex. If a fallback occurred, `metadata.fallback_model_version` contains the model version that was used instead. For more information about the response structure, go to [JSON Response for Extraction](./ade-extract-response).

When using `extract-20260314`, the API does not fall back to an earlier model.

**What to do:** To avoid unexpected fallback behavior, use the most recent extraction model. For more information, go to [Extraction Model Versions](./ade-extract-models).

### Warnings

For `extract-20260314`, the `metadata.warnings` field contains an array of structured warning objects. Each warning has a `code` that identifies the warning type and a `msg` with a human-readable description. If you use an earlier extraction model version, this field is absent from the response.

Any warning in the `warnings` array causes the API to return a 206 status code. The following warning codes can appear:

| Warning Code | Description |
| ---------------------- | ----------- |
| `nonconformant_schema` | The schema contains issues that affected extraction. |
| `nonconformant_output` | The extracted output does not fully conform to the schema. This warning also populates the `schema_violation_error` field for backward compatibility. |

For more information about the response structure, go to [JSON Response for Extraction](./ade-extract-response#processing-metadata-metadata).

## ADE Build Extract Schema

This section covers errors for the ADE Build Extract Schema API.

### Status Codes

| Status Code | Name | Description | What to Do |
| ----------- | --------------------- | ----------- | ---------- |
| 200 | Success | Schema generated successfully. | Continue with normal operations. |
| 400 | Bad Request | Invalid request due to a download failure, unsupported version, or invalid schema. | Review the error message for the specific issue. See errors below. |
| 422 | Unprocessable Entity | Input validation failed. | Review your request parameters. See errors below. |
| 500 | Internal Server Error | Server error during schema generation. | Retry. If the issue persists, contact [support@landing.ai](mailto:support@landing.ai). |
| 504 | Gateway Timeout | Request processing exceeded the timeout limit (475 seconds). | Reduce document size. See errors below. |

### Status 400: Bad Request

This status code indicates invalid request parameters or client-side errors. Review the specific error message to identify the issue.

#### Error: Invalid extract version

This error occurs when an unsupported model version is specified.

**Error message:**

```
Invalid extract version '{version}' provided. Valid versions are: {list_of_versions} or use 'extract-latest' to use the latest version.
```

**What to do:**

* Use one of the supported versions listed in the error message.
* Use `extract-latest` to automatically use the latest version.
* If you don't specify a version, the API uses the latest version by default.
* For more information, go to [Extraction Model Versions](./ade-extract-models#model-versions).

#### Error: Failed to download document from URL

This error occurs when the API cannot download a Markdown file from one of the provided `markdown_urls`.

**Error message:**

```
Failed to download document from URL {url}: {error_details}
```

**What to do:**

* Verify each URL is accessible and returns valid content.
* Check network connectivity and URL permissions.
* Ensure each URL points to a Markdown file (.md extension).

#### Error: Schema generation invalid

This error occurs when the schema generation request is rejected by the processing service.

**Error message:**

```
Schema generation invalid: {error_details}
```

**What to do:**

* Review the error details in the response.
* Check that any provided `schema` parameter is valid JSON.
* Verify that your Markdown content is readable and well-formed.

### Status 422: Unprocessable Entity

This status code indicates input validation failures. Review the error message and adjust your request parameters.

#### Error: Must provide at least one of 'markdowns', 'markdown\_urls', 'schema', or 'prompt'

This error occurs when your request does not include any input.

**Error message:**

```
Must provide at least one of 'markdowns', 'markdown_urls', 'schema', or 'prompt'.
```

**What to do:** Add at least one of the following to your request:

* `markdowns`: One or more Markdown files or inline Markdown strings.
* `markdown_urls`: One or more URLs to Markdown files.
* `schema`: An existing JSON schema to refine.
* `prompt`: A description of the schema you want to generate.

#### Error: Invalid URL format

This error occurs when a URL in the `markdown_urls` parameter is not a valid URL.

**Error message:**

```
Invalid URL format: {url}
```

**What to do:**

* Verify the URL is properly formatted with a valid protocol (http\:// or https\://).
* Check for typos or missing characters in the URL.
* Ensure the URL is properly encoded if it contains special characters.

#### Error: Invalid existing schema JSON

This error occurs when the `schema` parameter contains invalid JSON.

**Error message:**

```
Invalid existing schema JSON: {error_details}
```

**What to do:**

* Verify your schema is valid JSON.
* Check for syntax errors (missing commas, quotes, or brackets).

#### Error: Unsupported format

This error occurs when a file provided via `markdowns` or `markdown_urls` is not a Markdown (.md) file.

**Error message:**

```
Unsupported format: {mime_type} ((unknown)). Supported formats: MD
```

**What to do:**

* The build-schema endpoint only accepts Markdown files with a .md extension.
* If you have a PDF, DOCX, or other document format, use the [Parse API](https://docs.landing.ai/api-reference/tools/ade-parse) to convert it to Markdown first.
* Ensure your file has a .md extension and contains valid UTF-8 encoded Markdown content.

#### Error: Schema generation validation error

This error occurs when the schema generation request fails validation in the processing service.

**Error message:**

```
Schema generation validation error: {error_details}
```

**What to do:**

* Review the error details in the response.
* Check that your inputs are valid: well-formed Markdown content, valid JSON for the `schema` parameter, and clear instructions in the `prompt` parameter.

### Status 500: Internal Server Error

This error indicates an unexpected server error occurred during schema generation.

**What to do:**

* Retry the request.
* If the error persists, contact [support@landing.ai](mailto:support@landing.ai).

### Status 504: Gateway Timeout

This error occurs when schema generation exceeds the timeout limit (475 seconds).

**Error message:**

```
Request timed out after {seconds} seconds
```

**What to do:**

* Reduce the size and number of Markdown files in the request.
* If the error persists, contact [support@landing.ai](mailto:support@landing.ai).

## When Are Credits Consumed?

* **ADE Extract**: Credits are consumed only when the API returns a 200 or 206 status code. All other responses, including errors, do not consume credits.
* **ADE Build Extract Schema**: Credits are consumed only when the API returns a 200 status code. All other responses, including errors, do not consume credits.

# Supported File Types

Source: https://docs.landing.ai/ade/ade-file-types

ADE can parse the file types listed in the table below. Support depends on the method you use: the [Playground](https://va.landing.ai/my/playground/ade), [API](https://docs.landing.ai/api-reference/tools/ade-parse), or our [Python](https://github.com/landing-ai/ade-python) and [TypeScript](https://github.com/landing-ai/ade-typescript) libraries.


| File Type | Playground | API/Library |
| --------- | ---------- | ----------- |
| **PDF** | | |
| PDF | Up to 100 pages | See [Rate Limits](./ade-rate-limits) |
| **Images** | | |
| JPEG | ✓ | ✓ |
| JPG | ✓ | ✓ |
| PNG | ✓ | ✓ |
| APNG | × | ✓ |
| BMP | × | ✓ |
| DCX | × | ✓ |
| DDS | × | ✓ |
| DIB | × | ✓ |
| GD | × | ✓ |
| GIF | × | ✓ |
| ICNS | × | ✓ |
| JP2 (JP2000) | × | ✓ |
| PCX | × | ✓ |
| PPM | × | ✓ |
| PSD | × | ✓ |
| TGA | × | ✓ |
| TIF | × | ✓ |
| TIFF | × | ✓ |
| WEBP | × | ✓ |
| **Text Documents** (see notes [here](#file-conversion-for-text-documents-and-presentations)) | | |
| DOC (Word) | ✓ | ✓ |
| DOCX (Word) | ✓ | ✓ |
| ODT (OpenDocument Text) | ✓ | ✓ |
| **Presentations** (see notes [here](#file-conversion-for-text-documents-and-presentations)) | | |
| PPT (PowerPoint) | ✓ | ✓ |
| PPTX (PowerPoint) | ✓ | ✓ |
| **Spreadsheets** (see notes [here](#spreadsheet-considerations)) | | |
| CSV (comma-separated values) | ✓ (up to 10 MB) | ✓ (up to 50 MB) |
| XLSX (Microsoft Excel) | ✓ (up to 10 MB) | ✓ (up to 50 MB) |

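The spreadsheet size limits above (10 MB in the Playground, 50 MB via the API) can be checked client-side before uploading. A minimal sketch; the helper name is illustrative, not part of any ADE SDK:

```python
# Pre-flight check against the spreadsheet limits listed in the table above.
API_SPREADSHEET_LIMIT = 50 * 1024 * 1024   # 50 MB, API/library limit
SPREADSHEET_EXTENSIONS = {".csv", ".xlsx"}

def spreadsheet_ok_for_api(filename: str, size_bytes: int) -> bool:
    """Return True if the file looks like a spreadsheet within the API limit."""
    ext = "." + filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    return ext in SPREADSHEET_EXTENSIONS and size_bytes <= API_SPREADSHEET_LIMIT

print(spreadsheet_ok_for_api("rates.xlsx", 20 * 1024 * 1024))  # True
print(spreadsheet_ok_for_api("rates.xlsx", 60 * 1024 * 1024))  # False
```

Failing fast on the client avoids a round trip that would be rejected server-side anyway.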
To parse password-protected files, go to [Parse Password-Protected Files](./ade-parse-password).

## File Conversion for Text Documents and Presentations

ADE converts text documents and presentations to PDFs before parsing them. This conversion may change the document layout and increase or decrease the number of pages. For example, unsupported fonts may be replaced with larger alternatives, causing text to wrap differently or overflow onto additional pages. While the conversion process may impact the layout, ADE still parses content correctly.

## Spreadsheet Considerations

ADE supports XLSX files with up to 65,536 rows and 65,536 columns per file. If an XLSX file exceeds these limits, ADE parses the file and returns data within the supported limits. Content in rows and columns beyond the limits is not parsed.

When you load a spreadsheet in the Playground, a render limit applies and only a truncated version of the spreadsheet is displayed. This does not affect the parsing results.

# JSON Response for Parsing

Source: https://docs.landing.ai/ade/ade-json-response

When you parse a document with the [Parse API](https://docs.landing.ai/api-reference/tools/ade-parse) or complete an [ADE Parse Job](https://docs.landing.ai/api-reference/tools/ade-parse-jobs), the parsed data is returned in a structured JSON format. The fields and structure of API responses are detailed in the [API Reference](https://docs.landing.ai/api-reference/tools/ade-parse). This article gives additional context about the returned fields and how to use them.

## Response Structure

The response contains the following top-level fields:

| Field | Description |
| ----- | ----------- |
| [`markdown`](./ade-markdown-response) | Complete Markdown representation of the document. |
| [`chunks`](#parsed-chunks-chunks) | Array of `chunk` objects, one for each parsed region. |
| [`splits`](#splits) | Array of `split` objects organizing chunks by page or section. |
| [`grounding`](#grounding-information-grounding) | Object mapping chunk IDs to detailed grounding information, which includes the page number and bounding box coordinates. |
| `metadata` | Processing information (credit usage, duration, filename, job ID, page count, version). For partial content responses, includes a `failed_pages` array listing page numbers that failed to process. |

## Chunk, Table, and Cell Identifiers

Each chunk, table, and table cell has a unique identifier (ID) that appears in multiple parts of the API response. Use these IDs to link different sets of information across the response. The ID format depends on the element type:

| ID Type | Format | Example |
| ------- | ------ | ------- |
| Chunk IDs | UUID format | `7d58c5cf-e4f5-4a7e-ba34-0cd7bc6a6506` |
| Table IDs | `{page_number}-{base62_sequential_number}` | `0-1` |
| Table cell IDs | `{page_number}-{base62_sequential_number}` | `0-2`, `0-3` |

### Sequential Numbering for Tables and Cells

Table and cell IDs use sequential numbering within each page. The first table element on the first page has ID `0-1`, and its first cell has ID `0-2`. All subsequent table elements and cells on that page continue this sequential numbering (`0-3`, `0-4`, etc.).

The numbering restarts for each new page. For example, the first table on the second page will have ID `1-1`, and its first cell will have ID `1-2`.

## Parsed Chunks (`chunks`)

A [chunk](./ade-chunk-types) is a distinct region or element in the parsed document. The `chunks` array contains all parsed chunks from the document in reading order.

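Because `chunks` arrives in reading order, filtering or reassembling content is a plain iteration. A minimal sketch over a trimmed response (the values are shortened from the full example later in this article):

```python
import json

# A trimmed parse response: two chunks in reading order.
response_json = """
{
  "chunks": [
    {"id": "2831e56d-94f5-4ec4-b001-6e16e188119b", "type": "text",
     "markdown": "## Bank Account Rates"},
    {"id": "54905c88-b4b8-45a7-84b5-1dcf5f1d1e60", "type": "table",
     "markdown": "<table>...</table>"}
  ]
}
"""
response = json.loads(response_json)

# Collect only the table chunks, preserving reading order.
tables = [c for c in response["chunks"] if c["type"] == "table"]
print(len(tables))  # 1
```

The same pattern works for routing chunk types (`text`, `table`, `figure`, and so on) to different downstream handlers.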
Each `chunk` object in the `chunks` array contains:

| Field | Description |
| ----- | ----------- |
| `id` | Chunk [unique identifier](#chunk-table-and-cell-identifiers) (UUID format). |
| `markdown` | [Markdown](./ade-markdown-response) content for the chunk. |
| `type` | [Chunk type](./ade-chunk-types). |
| `grounding` | The location of the chunk in the document. Contains page number and [bounding box coordinates](#working-with-bounding-box-coordinates). |

## Splits

The `split` parameter is different from the [Split API](./ade-split). If your goal is to separate a document into sub-documents after parsing, use the Split API, not the `split` parameter.

The `splits` array groups chunks into logical sections based on the [`split`](./ade-separate-apis#set-up-splits-for-parsing) parameter. Each split object includes the chunks, markdown content, and page numbers for that section.

* **If the `split` parameter was omitted**: The API returns the entire document as a single split.
* **If page-level splits were used (`split=page`)**: The API organizes chunks by page. For multi-page documents, this creates one split per page.

### Split Object Structure

Each `split` object contains:

| Field | Description |
| ----- | ----------- |
| `class` | The split type. |
| `identifier` | The unique identifier for each entry. |
| `pages` | Array of page numbers (zero-indexed) included in the split. |
| `markdown` | Complete markdown content for all chunks in the split. |
| `chunks` | Array of [chunk IDs](#chunk-table-and-cell-identifiers) included in the split. |

### Full Document Split vs. Page-Level Split

When you omit the `split` parameter, the API returns a single split containing the entire document. When you set `split=page`, the API creates one split per page for multi-page documents.

```json Full Document (Split Omitted) theme={null}
"splits": [
  {
    "class": "full",
    "identifier": "full",
    "pages": [0],
    "markdown": "...",
    "chunks": ["chunk-id-1", "chunk-id-2"]
  }
]
```

```json Page-Level Split theme={null}
"splits": [
  {
    "class": "page",
    "identifier": "page_0",
    "pages": [0],
    "markdown": "...",
    "chunks": ["chunk-id-1"]
  },
  {
    "class": "page",
    "identifier": "page_1",
    "pages": [1],
    "markdown": "...",
    "chunks": ["chunk-id-2", "chunk-id-3"]
  }
]
```

## Grounding Information (`grounding`)

**Grounding** is location information that maps each chunk or table cell back to its precise position in the original document. Each grounding entry includes the page number and bounding box coordinates for an element. The `grounding` object contains location data for all chunks and table cells in the parsed document.

### How the Grounding Object Works

The `grounding` object is a dictionary where [chunk, table, and cell IDs](#chunk-table-and-cell-identifiers) are keys. To look up grounding information for any element, use its ID.

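Because the grounding object is keyed by ID, locating any element is a single dictionary access. A minimal sketch; the `locate` helper is illustrative, and the `0-2` cell entry is abbreviated from the example response in this article:

```python
# Grounding is a dict keyed by chunk, table, and cell IDs.
grounding = {
    "0-2": {
        "box": {"left": 0.0257, "top": 0.3471, "right": 0.5044, "bottom": 0.5452},
        "page": 0,
        "type": "tableCell",
        "position": {"row": 0, "col": 0, "rowspan": 1, "colspan": 1,
                     "chunk_id": "54905c88-b4b8-45a7-84b5-1dcf5f1d1e60"},
    }
}

def locate(grounding: dict, element_id: str) -> tuple:
    """Return (page, box) for any chunk, table, or cell ID."""
    entry = grounding[element_id]
    return entry["page"], entry["box"]

page, box = locate(grounding, "0-2")
print(page, box["left"])  # 0 0.0257
```

The same lookup works for chunk UUIDs and table IDs; only the `type` and optional `position` fields differ per entry.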
**What's included in the grounding object:** * All chunk IDs from the `chunks` array * Table IDs (for entire tables) * Table cell IDs (individual cells within tables) **Example:** Consider parsing this document containing a table of bank account interest rates: Example document with bank account interest rates The parsed response is shown below, with key elements highlighted in the table: | Line | Description | ID | | ------- | --------------------------------------------------------- | ------------------------------------ | | Line 47 | `grounding` object begins | N/A | | Line 48 | Grounding for the text "Bank Account Rates" | 2831e56d-94f5-4ec4-b001-6e16e188119b | | Line 58 | Grounding for the table chunk | 54905c88-b4b8-45a7-84b5-1dcf5f1d1e60 | | Line 68 | Grounding for the table ID | 0-1 | | Line 78 | Grounding for the first cell in the table, "Account Type" | 0-2 | ```json [expandable] lines highlight={47, 48, 58, 68, 78} theme={null} { "markdown": "\n\n## Bank Account Rates\n\n\n\n\n\n\n\n
<table><tr><th>Account Type</th><th>APY</th></tr><tr><td>Checking</td><td>0.25%</td></tr><tr><td>Savings</td><td>3.30%</td></tr></table>
", "chunks": [ { "markdown": "\n\n## Bank Account Rates", "type": "text", "id": "2831e56d-94f5-4ec4-b001-6e16e188119b", "grounding": { "box": { "left": 0.01733572781085968, "top": 0.03853631019592285, "right": 0.46379542350769043, "bottom": 0.212029367685318 }, "page": 0 } }, { "markdown": "\n\n\n\n\n\n
<table><tr><th>Account Type</th><th>APY</th></tr><tr><td>Checking</td><td>0.25%</td></tr><tr><td>Savings</td><td>3.30%</td></tr></table>
", "type": "table", "id": "54905c88-b4b8-45a7-84b5-1dcf5f1d1e60", "grounding": { "box": { "left": 0.018494874238967896, "top": 0.332082062959671, "right": 0.9904391765594482, "bottom": 0.9499841928482056 }, "page": 0 } } ], "splits": [ { "class": "full", "identifier": "full", "pages": [ 0 ], "markdown": "\n\n## Bank Account Rates\n\n\n\n\n\n\n\n
<table><tr><th>Account Type</th><th>APY</th></tr><tr><td>Checking</td><td>0.25%</td></tr><tr><td>Savings</td><td>3.30%</td></tr></table>
", "chunks": [ "2831e56d-94f5-4ec4-b001-6e16e188119b", "54905c88-b4b8-45a7-84b5-1dcf5f1d1e60" ] } ], "grounding": { "2831e56d-94f5-4ec4-b001-6e16e188119b": { "box": { "left": 0.01733572781085968, "top": 0.03853631019592285, "right": 0.46379542350769043, "bottom": 0.212029367685318 }, "page": 0, "type": "chunkText" }, "54905c88-b4b8-45a7-84b5-1dcf5f1d1e60": { "box": { "left": 0.018494874238967896, "top": 0.332082062959671, "right": 0.9904391765594482, "bottom": 0.9499841928482056 }, "page": 0, "type": "chunkTable" }, "0-1": { "box": { "left": 0.02559941303263648, "top": 0.34709447306434493, "right": 0.9833663842655778, "bottom": 0.9408472995763262 }, "page": 0, "type": "table" }, "0-2": { "box": { "left": 0.025725066969341424, "top": 0.34709447306434493, "right": 0.5043980858495114, "bottom": 0.5451939275066816 }, "page": 0, "type": "tableCell", "position": { "row": 0, "col": 0, "rowspan": 1, "colspan": 1, "chunk_id": "54905c88-b4b8-45a7-84b5-1dcf5f1d1e60" } }, "0-3": { "box": { "left": 0.5043442644343732, "top": 0.3470990955792963, "right": 0.9832354042503647, "bottom": 0.5451945630018467 }, "page": 0, "type": "tableCell", "position": { "row": 0, "col": 1, "rowspan": 1, "colspan": 1, "chunk_id": "54905c88-b4b8-45a7-84b5-1dcf5f1d1e60" } }, "0-4": { "box": { "left": 0.025662229745978628, "top": 0.5451932923723879, "right": 0.5043442644343732, "bottom": 0.7430525865339257 }, "page": 0, "type": "tableCell", "position": { "row": 1, "col": 0, "rowspan": 1, "colspan": 1, "chunk_id": "54905c88-b4b8-45a7-84b5-1dcf5f1d1e60" } }, "0-5": { "box": { "left": 0.5042905263610684, "top": 0.5451939275066816, "right": 0.9833008650173567, "bottom": 0.7429820118984354 }, "page": 0, "type": "tableCell", "position": { "row": 1, "col": 1, "rowspan": 1, "colspan": 1, "chunk_id": "54905c88-b4b8-45a7-84b5-1dcf5f1d1e60" } }, "0-6": { "box": { "left": 0.02559941303263648, "top": 0.7429820118984354, "right": 0.5042905263610684, "bottom": 0.9408472995763262 }, "page": 0, "type": "tableCell", 
"position": { "row": 2, "col": 0, "rowspan": 1, "colspan": 1, "chunk_id": "54905c88-b4b8-45a7-84b5-1dcf5f1d1e60" } }, "0-7": { "box": { "left": 0.5042367730777779, "top": 0.7429113809301333, "right": 0.9833663842655779, "bottom": 0.9408260780878634 }, "page": 0, "type": "tableCell", "position": { "row": 2, "col": 1, "rowspan": 1, "colspan": 1, "chunk_id": "54905c88-b4b8-45a7-84b5-1dcf5f1d1e60" } } }, "metadata": { "filename": "bank-account-rates.png", "org_id": null, "page_count": 1, "duration_ms": 2836, "credit_usage": 3.0, "job_id": "def6af5f0485497c81ea46bc4c848f83", "version": "dpt-2-20251103", "failed_pages": [] } } ``` ### Fields in the Grounding Object Each key-value pair in the `grounding` object uses an ID as the key (chunk ID, table ID, or cell ID) and location data as the value. Each value is an object with the following fields: | Field | Description | | ---------------------- | --------------------------------------------------------------------------------------------------------------------------------- | | `box` | [Bounding box coordinates](#working-with-bounding-box-coordinates) | | `page` | Zero-indexed page number | | `type` | Detailed classification. See [grounding types](#grounding-types). | | `confidence` | Confidence score for text-based chunks. See [Confidence Score](#confidence-score). | | `low_confidence_spans` | Sections of text with low confidence scores, if any. | | `position` | For table cells only. Cell location in table grid (row, col, spans). See [Table Cell Position](#table-cell-position-information). 
| **Example:** Example of one key-value pair from the `grounding` object (logo chunk): ```json theme={null} { "grounding": { "49c7b7d0-2d8e-4485-8306-3dd820eb13ed": { "box": { "left": 0.06869561225175858, "top": 0.02685479447245598, "right": 0.1445186734199524, "bottom": 0.08676017820835114 }, "page": 0, "type": "chunkLogo" } } } ``` ### Grounding Types Each entry in the `grounding` object has a `type` field that identifies the element type. #### Why are grounding types different from chunk types? Most grounding types correspond directly to [chunk types](./ade-chunk-types) with a "chunk" prefix. However, the `grounding` object provides more granular location data than the chunks array. For tables, the grounding object includes separate entries for the HTML `<table>` element and individual cells, even though these are not separate chunks. This allows you to look up the precise location of any cell within a table. The following table lists the grounding types and the corresponding chunk types, when applicable. | Grounding Type | Chunk Type | | ------------------ | ------------------------------------------------------------------------------------ | | `chunkAttestation` | `attestation` | | `chunkCard` | `card` | | `chunkFigure` | `figure` | | `chunkLogo` | `logo` | | `chunkMarginalia` | `marginalia` | | `chunkScanCode` | `scan_code` | | `chunkTable` | `table` | | `chunkText` | `text` | | `table` | Special grounding type representing the HTML `<table>` element within a table chunk. | | `tableCell` | Special grounding type representing an individual cell within a table. | ### Table Grounding Structure For each table in a document, the grounding object includes three types of entries: **table chunk**, **table**, and **cell**. These three types form a hierarchy: the **chunk** entry provides the overall table location, while the **table** and **cell** entries provide precise locations for the HTML elements within that chunk. | Entry | Description | Type | | ----- | --------------------------------------------------------------------------- | ------------ | | Chunk | The table as a whole. Use when you need the overall table location. | `chunkTable` | | Table | The HTML `<table>
` element. Use when you need the precise table boundaries. | `table` | | Cells | Each individual cell. Use when you need to locate a specific cell. | `tableCell` | **Example:** Consider parsing this simple 2-cell table: Example table with 2 cells The `grounding` field from the response is shown below, with key elements highlighted in the table: | Line | Description | ID | | ------- | ---------------------------------------------------- | ------------------------------------ | | Line 1 | `grounding` object | N/A | | Line 2 | Grounding for the table chunk | ef24b1ea-8dac-4a24-bf60-704b7c6a6bca | | Line 12 | Grounding for the table ID | 0-1 | | Line 22 | Grounding for the first cell in the table, "Cell 1" | 0-2 | | Line 39 | Grounding for the second cell in the table, "Cell 2" | 0-3 | ```json [expandable] lines highlight={1, 2, 12, 22, 39 } theme={null} "grounding": { "ef24b1ea-8dac-4a24-bf60-704b7c6a6bca": { "box": { "left": 0.15924638509750366, "top": 0.19857195019721985, "right": 0.6330658197402954, "bottom": 0.313611775636673 }, "page": 0, "type": "chunkTable" }, "0-1": { "box": { "left": 0.16758577831817933, "top": 0.20629539815303133, "right": 0.6280869812016796, "bottom": 0.3090690124206503 }, "page": 0, "type": "table" }, "0-2": { "box": { "left": 0.16758577831817933, "top": 0.2062988834957917, "right": 0.39743034595752635, "bottom": 0.3090588206865543 }, "page": 0, "type": "tableCell", "position": { "row": 0, "col": 0, "rowspan": 1, "colspan": 1, "chunk_id": "ef24b1ea-8dac-4a24-bf60-704b7c6a6bca" } }, "0-3": { "box": { "left": 0.3972880070815534, "top": 0.20629539815303133, "right": 0.6280869812016795, "bottom": 0.3090690124206503 }, "page": 0, "type": "tableCell", "position": { "row": 0, "col": 1, "rowspan": 1, "colspan": 1, "chunk_id": "ef24b1ea-8dac-4a24-bf60-704b7c6a6bca" } } }, ``` ### Table Cell Position Information Each table cell (`tableCell`) includes a `position` field that provides the cell's location in the table grid. 
The `row` and `col` values are zero-indexed, where `row: 0, col: 0` represents the first row and first column of the table. The `rowspan` and `colspan` values indicate merged cells, with a value of 1 meaning the cell is not merged in that direction. | Field | Description | | ---------- | ------------------------------------------- | | `row` | Row position (integer, zero-indexed). | | `col` | Column position (integer, zero-indexed). | | `rowspan` | Number of rows the cell spans (integer). | | `colspan` | Number of columns the cell spans (integer). | | `chunk_id` | Associated chunk identifier (string). | #### How to Use the Table Cell Positions Use the table cell position information in the `grounding` to: * Map parsed data back to specific cells in the original table (for example, verify that a dollar amount came from row 2, column 3) * Identify merged cells through the `rowspan` and `colspan` values * Programmatically reference cells by their row and column indices ### Confidence Score The ADE Parse API returns confidence scores for text, marginalia, card, and table chunks, as well as table cells, when parsing with a supported model version. Confidence scores indicate how closely the parsed Markdown is expected to match the actual text or visual data present in the document. Each score ranges from 0.0 (low confidence) to 1.0 (high confidence). Lower values indicate regions where the model was less certain about the output.
For example, here are the confidence score fields for a chunk: ```json Confidence score fields highlight={11-12} theme={null} "grounding": { "42ca60d3-a606-4c9a-a61e-493966b63fd9": { "box": { "left": 0.8607578277587891, "top": 0.9346558451652527, "right": 0.9316827058792114, "bottom": 0.951895534992218 }, "page": 0, "type": "chunkMarginalia", "confidence": 0.991, "low_confidence_spans": [] } } ``` #### Confidence Score Fields The grounding object includes the following confidence-related fields: | Field | Description | | ---------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `confidence` | Overall confidence score for the chunk or element (0.0 to 1.0). When `low_confidence_spans` contains entries, this is the lowest of the span-level or cell-level confidence scores. | | `low_confidence_spans` | Array of objects identifying specific text substrings with confidence scores of 0.95 or lower. Empty array if no low-confidence substrings exist. Learn more in [Low Confidence Spans](#low-confidence-spans). | #### Low Confidence Spans The `low_confidence_spans` array contains substrings that have confidence scores of 0.95 or lower. Each object in the `low_confidence_spans` array contains: | Field | Description | | ------------ | ---------------------------------------------------------------------------------------------------------------------------------------------- | | `text` | The substring that has low confidence. | | `span` | Two-element array `[start, end]` indicating the character position range of the substring within the chunk's text. Positions are zero-indexed. | | `confidence` | Confidence score for this specific substring (0.0 to 1.0). | For example, in the parse response below, the API has low confidence for two spans of text within the chunk. 
```json Low confidence spans highlight={11-27} theme={null} "grounding": { "6be7d3cf-6664-42c7-906c-4957882e40ff": { "box": { "left": 0.24479518830776215, "top": 0.9269143342971802, "right": 0.6804859042167664, "bottom": 0.9461647272109985 }, "page": 0, "type": "chunkText", "confidence": 0.25, "low_confidence_spans": [ { "text": " 1220", "span": [ 8, 13 ], "confidence": 0.25 }, { "text": "15?\"", "span": [ 26, 30 ], "confidence": 0.62 } ] } } ``` #### Chunks and Elements with Confidence Scores Confidence scores are available for text, marginalia, card, and table chunks, as well as table cells. The `tableCell` elements have confidence scores but do not include `low_confidence_spans`
| Grounding Type | Chunk Type | Has Confidence Score | Has Low Confidence Spans | | ------------------ | ------------- | -------------------- | ------------------------ | | `chunkText` | `text` | Yes | Yes | | `chunkMarginalia` | `marginalia` | Yes | Yes | | `chunkCard` | `card` | Yes | Yes | | `chunkTable` | `table` | Yes | Yes | | `table` | None | No | No | | `tableCell` | None | Yes | No | | `chunkFigure` | `figure` | No | No | | `chunkLogo` | `logo` | No | No | | `chunkAttestation` | `attestation` | No | No | | `chunkScanCode` | `scan_code` | No | No |
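The availability rules above lend themselves to a simple triage pass. The sketch below is a hypothetical helper (not part of the ADE SDK), assuming a parse response already loaded as a Python dict shaped like the examples in this guide; the 0.8 threshold is illustrative:

```python
def chunks_for_review(response: dict, threshold: float = 0.8) -> list[dict]:
    """Return grounding entries whose confidence falls below `threshold`."""
    flagged = []
    for entry_id, entry in response.get("grounding", {}).items():
        confidence = entry.get("confidence")
        # Skip entries with a null score: the element type simply does not
        # support confidence scoring (for example, logos).
        if confidence is not None and confidence < threshold:
            flagged.append({
                "id": entry_id,
                "type": entry.get("type"),
                "page": entry.get("page"),
                "confidence": confidence,
                "low_confidence_spans": entry.get("low_confidence_spans", []),
            })
    # Lowest confidence first, so reviewers see the riskiest content at the top.
    return sorted(flagged, key=lambda e: e["confidence"])
```

Entries with a `null` confidence are skipped rather than flagged, since a missing score means the element type does not support scoring, not that the output is suspect.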
#### Confidence Score Availability Confidence scores are only available when using the ADE Parse or ADE Parse Jobs API with a model version that supports confidence scores. #### Null Confidence Scores The confidence score will be `null` in the following situations: * The file was parsed with a model version that does not return confidence scores. * The chunk or element does not support confidence scores. See a list of supported elements in [Chunks and Elements with Confidence Scores](#chunks-and-elements-with-confidence-scores). For example, here are the confidence score fields for a logo: ```json Response when confidence score not applicable highlight={9-11} theme={null} "7215c238-8363-4de3-9f34-dd12c42cf5ec": { "box": { "left": 0.6948671936988831, "top": 0.9627023339271545, "right": 0.9369598031044006, "bottom": 1 }, "page": 0, "type": "chunkLogo", "confidence": null, "low_confidence_spans": [] } ``` #### How to Use Confidence Scores Confidence scores are not a calibrated probability of correctness. Use them as a ranking or triage signal to flag and prioritize content for human review, rather than as an exact measure of error likelihood. #### Factors That Can Lower Confidence Scores Certain document characteristics may result in lower confidence scores: * Low-resolution or degraded scans * Handwritten text that is difficult to read * Superscripts and subscripts * Characters or punctuation that look alike, such as commas and periods * Words split across lines with a hyphen * Line endings where it's unclear whether the text continues in the same paragraph or starts a new one #### False Positives The system is designed to over-flag rather than risk missing potential issues. As a result, some regions may be marked as low confidence even when the parsed output is correct. These false positives mean you may need to review some content that is actually correct, but they help reduce the likelihood of missing content that contains errors. ## When to Use Chunk-Level vs. Global Grounding Grounding information is available at the chunk level (`chunks[].grounding`) and at the global level (`grounding`). Each has a different level of detail:
| Field | Description | Global Grounding Object | Chunk-Level Grounding | | ---------------------- | --------------------------------------------------- | ----------------------- | --------------------- | | `box` | Bounding box coordinates | Yes | Yes | | `page` | Zero-indexed page number | Yes | Yes | | `confidence` | Confidence score | Yes | No | | `low_confidence_spans` | Sections of text with low confidence scores, if any | Yes | No | | `type` | Detailed element type | Yes | No | | `position` | Table cell position | Yes | No |
Choose the appropriate source based on your workflow: **Use the global `grounding` object when:** * You need table cell positions or other HTML element grounding * You need confidence scores * You need to understand relationships between elements * Building comprehensive document analysis workflows * You need the complete, authoritative grounding data for all elements in the document **Use chunk-level grounding (`chunks[].grounding`) when:** * Iterating through chunks sequentially and need each chunk's location immediately * Working with a single chunk in isolation * Building simple workflows where each chunk is processed independently ## Working with Bounding Box Coordinates The `grounding` objects include the bounding box coordinates for each chunk in the `box` object: | Field | Description | | -------- | -------------------- | | `left` | Left edge position | | `top` | Top edge position | | `right` | Right edge position | | `bottom` | Bottom edge position | ### Bounding Box Values All bounding box coordinates use normalized values between 0 and 1, where: * `(0, 0)` represents the top-left corner of the page * `(1, 1)` represents the bottom-right corner of the page ### Convert Bounding Box Values To convert bounding box values to pixel coordinates, multiply the normalized values by the image dimensions: ``` x1 = left * image_width y1 = top * image_height x2 = right * image_width y2 = bottom * image_height ``` For practical examples of working with bounding box coordinates, go to [Link Extracted Data to Document Locations](./ade-extract-grounding-sample). ## Example Response The following is the API response when parsing this single-page Product Specifications Sheet. The document contains text, a logo, a table, a barcode, and more. The `split` parameter was omitted in the API call. 
```json [expandable] theme={null} { "markdown": "\n\n# Product Specification Sheet\n\nPremium Wireless Headphones - Model WH-3000\n\n\n\n<::logo: Acme\nAcme\nBlue rectangular background with rounded corners, featuring three overlapping white circles and the text \"Acme\" below them::>\n\n\n\n**Product Description:** Premium over-ear wireless headphones featuring advanced noise cancellation technology, superior sound quality, and all-day comfort. Designed for professionals and audiophiles who demand exceptional audio performance.\n\n\n\nTechnical Specifications\n\n\n\n\n\n\n\n
FeatureSpecification
Battery LifeUp to 30 hours with ANC on
ConnectivityBluetooth 5.3, USB-C
Driver Size40mm dynamic drivers
Weight250 grams
Charging Time2.5 hours (full charge)
\n\n\n\nProduct Identification\n\n\n\n<::scan_code: Barcode (UPC-A)\n012345678905\nUPC: 012345678905 | SKU: WH-3000-BLK\nThis is a standard black and white barcode with clear, readable numbers and text below it. The quality is excellent with no discernible issues.::>\n\n\n\nAcme Technologies • Revision 1.2 • February 2026 • For internal use only", "chunks": [ { "markdown": "\n\n# Product Specification Sheet\n\nPremium Wireless Headphones - Model WH-3000", "type": "text", "id": "d52a402e-ecbe-45bb-b645-6f879533062d", "grounding": { "box": { "left": 0.16788743436336517, "top": 0.14442312717437744, "right": 0.6138156652450562, "bottom": 0.20980927348136902 }, "page": 0 } }, { "markdown": "\n\n<::logo: Acme\nAcme\nBlue rectangular background with rounded corners, featuring three overlapping white circles and the text \"Acme\" below them::>", "type": "logo", "id": "e82c9292-b032-4115-88b1-8e63a453fa62", "grounding": { "box": { "left": 0.6695178747177124, "top": 0.13189411163330078, "right": 0.8306078910827637, "bottom": 0.21691173315048218 }, "page": 0 } }, { "markdown": "\n\n**Product Description:** Premium over-ear wireless headphones featuring advanced noise cancellation technology, superior sound quality, and all-day comfort. Designed for professionals and audiophiles who demand exceptional audio performance.", "type": "text", "id": "e2d9311c-6034-4a19-845c-8b3bbb607220", "grounding": { "box": { "left": 0.16763517260551453, "top": 0.24475212395191193, "right": 0.8179030418395996, "bottom": 0.3363903760910034 }, "page": 0 } }, { "markdown": "\n\nTechnical Specifications\n\n\n\n\n\n\n\n
FeatureSpecification
Battery LifeUp to 30 hours with ANC on
ConnectivityBluetooth 5.3, USB-C
Driver Size40mm dynamic drivers
Weight250 grams
Charging Time2.5 hours (full charge)
", "type": "table", "id": "6ad71883-8f01-4ca8-854e-67fdf98705c0", "grounding": { "box": { "left": 0.16853490471839905, "top": 0.3461304306983948, "right": 0.8309106826782227, "bottom": 0.6140876412391663 }, "page": 0 } }, { "markdown": "\n\nProduct Identification", "type": "text", "id": "a111a086-c0ce-4177-b9d6-32813583c09c", "grounding": { "box": { "left": 0.17146995663642883, "top": 0.6203439235687256, "right": 0.40161988139152527, "bottom": 0.6445382833480835 }, "page": 0 } }, { "markdown": "\n\n<::scan_code: Barcode (UPC-A)\n012345678905\nUPC: 012345678905 | SKU: WH-3000-BLK\nThis is a standard black and white barcode with clear, readable numbers and text below it. The quality is excellent with no discernible issues.::>", "type": "scan_code", "id": "36cfe5fe-1d1f-4025-9fed-8dd30c61e512", "grounding": { "box": { "left": 0.35067588090896606, "top": 0.6593142151832581, "right": 0.6467784643173218, "bottom": 0.7614949345588684 }, "page": 0 } }, { "markdown": "\n\nAcme Technologies • Revision 1.2 • February 2026 • For internal use only", "type": "text", "id": "769360a9-738e-4d46-9815-1c6fadc44fef", "grounding": { "box": { "left": 0.17040444910526276, "top": 0.7973303198814392, "right": 0.6371863484382629, "bottom": 0.8179382681846619 }, "page": 0 } } ], "splits": [ { "class": "full", "identifier": "full", "pages": [ 0 ], "markdown": "\n\n# Product Specification Sheet\n\nPremium Wireless Headphones - Model WH-3000\n\n\n\n<::logo: Acme\nAcme\nBlue rectangular background with rounded corners, featuring three overlapping white circles and the text \"Acme\" below them::>\n\n\n\n**Product Description:** Premium over-ear wireless headphones featuring advanced noise cancellation technology, superior sound quality, and all-day comfort. Designed for professionals and audiophiles who demand exceptional audio performance.\n\n\n\nTechnical Specifications\n\n\n\n\n\n\n\n
FeatureSpecification
Battery LifeUp to 30 hours with ANC on
ConnectivityBluetooth 5.3, USB-C
Driver Size40mm dynamic drivers
Weight250 grams
Charging Time2.5 hours (full charge)
\n\n\n\nProduct Identification\n\n\n\n<::scan_code: Barcode (UPC-A)\n012345678905\nUPC: 012345678905 | SKU: WH-3000-BLK\nThis is a standard black and white barcode with clear, readable numbers and text below it. The quality is excellent with no discernible issues.::>\n\n\n\nAcme Technologies • Revision 1.2 • February 2026 • For internal use only", "chunks": [ "d52a402e-ecbe-45bb-b645-6f879533062d", "e82c9292-b032-4115-88b1-8e63a453fa62", "e2d9311c-6034-4a19-845c-8b3bbb607220", "6ad71883-8f01-4ca8-854e-67fdf98705c0", "a111a086-c0ce-4177-b9d6-32813583c09c", "36cfe5fe-1d1f-4025-9fed-8dd30c61e512", "769360a9-738e-4d46-9815-1c6fadc44fef" ] } ], "grounding": { "d52a402e-ecbe-45bb-b645-6f879533062d": { "box": { "left": 0.16788743436336517, "top": 0.14442312717437744, "right": 0.6138156652450562, "bottom": 0.20980927348136902 }, "page": 0, "type": "chunkText" }, "e82c9292-b032-4115-88b1-8e63a453fa62": { "box": { "left": 0.6695178747177124, "top": 0.13189411163330078, "right": 0.8306078910827637, "bottom": 0.21691173315048218 }, "page": 0, "type": "chunkLogo" }, "e2d9311c-6034-4a19-845c-8b3bbb607220": { "box": { "left": 0.16763517260551453, "top": 0.24475212395191193, "right": 0.8179030418395996, "bottom": 0.3363903760910034 }, "page": 0, "type": "chunkText" }, "6ad71883-8f01-4ca8-854e-67fdf98705c0": { "box": { "left": 0.16853490471839905, "top": 0.3461304306983948, "right": 0.8309106826782227, "bottom": 0.6140876412391663 }, "page": 0, "type": "chunkTable" }, "0-1": { "box": { "left": 0.17681816766455002, "top": 0.3838889918624204, "right": 0.8229430485888809, "bottom": 0.6081938563658866 }, "page": 0, "type": "table" }, "0-2": { "box": { "left": 0.17681816766455002, "top": 0.38390272737304887, "right": 0.4088692768884933, "bottom": 0.4210832942672924 }, "page": 0, "type": "tableCell", "position": { "row": 0, "col": 0, "rowspan": 1, "colspan": 1, "chunk_id": "6ad71883-8f01-4ca8-854e-67fdf98705c0" } }, "0-3": { "box": { "left": 0.40885843457119975, "top": 
0.3838889918624204, "right": 0.8229109083847056, "bottom": 0.42110282924189346 }, "page": 0, "type": "tableCell", "position": { "row": 0, "col": 1, "rowspan": 1, "colspan": 1, "chunk_id": "6ad71883-8f01-4ca8-854e-67fdf98705c0" } }, "0-4": { "box": { "left": 0.17683986982706765, "top": 0.4210723476432389, "right": 0.40885843457119975, "bottom": 0.45900337698896576 }, "page": 0, "type": "tableCell", "position": { "row": 1, "col": 0, "rowspan": 1, "colspan": 1, "chunk_id": "6ad71883-8f01-4ca8-854e-67fdf98705c0" } }, "0-5": { "box": { "left": 0.4088473778820912, "top": 0.4210832942672924, "right": 0.8229174171863827, "bottom": 0.45899898681702755 }, "page": 0, "type": "tableCell", "position": { "row": 1, "col": 1, "rowspan": 1, "colspan": 1, "chunk_id": "6ad71883-8f01-4ca8-854e-67fdf98705c0" } }, "0-6": { "box": { "left": 0.17686202113997704, "top": 0.45899898681702755, "right": 0.4088473778820912, "bottom": 0.4960768409028701 }, "page": 0, "type": "tableCell", "position": { "row": 2, "col": 0, "rowspan": 1, "colspan": 1, "chunk_id": "6ad71883-8f01-4ca8-854e-67fdf98705c0" } }, "0-7": { "box": { "left": 0.40883656585659384, "top": 0.45899115081094266, "right": 0.822923787547722, "bottom": 0.4960756768254202 }, "page": 0, "type": "tableCell", "position": { "row": 2, "col": 1, "rowspan": 1, "colspan": 1, "chunk_id": "6ad71883-8f01-4ca8-854e-67fdf98705c0" } }, "0-8": { "box": { "left": 0.17688367164390373, "top": 0.4960756768254202, "right": 0.40883656585659384, "bottom": 0.5339128041911175 }, "page": 0, "type": "tableCell", "position": { "row": 3, "col": 0, "rowspan": 1, "colspan": 1, "chunk_id": "6ad71883-8f01-4ca8-854e-67fdf98705c0" } }, "0-9": { "box": { "left": 0.4088255320780733, "top": 0.4960735986896906, "right": 0.8229302909456246, "bottom": 0.533930464401036 }, "page": 0, "type": "tableCell", "position": { "row": 3, "col": 1, "rowspan": 1, "colspan": 1, "chunk_id": "6ad71883-8f01-4ca8-854e-67fdf98705c0" } }, "0-a": { "box": { "left": 0.17690576166324445, "top": 
0.533902913573485, "right": 0.4088255320780733, "bottom": 0.5709324331603126 }, "page": 0, "type": "tableCell", "position": { "row": 4, "col": 0, "rowspan": 1, "colspan": 1, "chunk_id": "6ad71883-8f01-4ca8-854e-67fdf98705c0" } }, "0-b": { "box": { "left": 0.4088147366922871, "top": 0.5339128041911175, "right": 0.8229366506791294, "bottom": 0.5709510466955219 }, "page": 0, "type": "tableCell", "position": { "row": 4, "col": 1, "rowspan": 1, "colspan": 1, "chunk_id": "6ad71883-8f01-4ca8-854e-67fdf98705c0" } }, "0-c": { "box": { "left": 0.17692738041744768, "top": 0.5709220105206869, "right": 0.4088147366922871, "bottom": 0.6081720470002152 }, "page": 0, "type": "tableCell", "position": { "row": 5, "col": 0, "rowspan": 1, "colspan": 1, "chunk_id": "6ad71883-8f01-4ca8-854e-67fdf98705c0" } }, "0-d": { "box": { "left": 0.40880387715616723, "top": 0.5709324331603126, "right": 0.8229430485888809, "bottom": 0.6081938563658866 }, "page": 0, "type": "tableCell", "position": { "row": 5, "col": 1, "rowspan": 1, "colspan": 1, "chunk_id": "6ad71883-8f01-4ca8-854e-67fdf98705c0" } }, "a111a086-c0ce-4177-b9d6-32813583c09c": { "box": { "left": 0.17146995663642883, "top": 0.6203439235687256, "right": 0.40161988139152527, "bottom": 0.6445382833480835 }, "page": 0, "type": "chunkText" }, "36cfe5fe-1d1f-4025-9fed-8dd30c61e512": { "box": { "left": 0.35067588090896606, "top": 0.6593142151832581, "right": 0.6467784643173218, "bottom": 0.7614949345588684 }, "page": 0, "type": "chunkScanCode" }, "769360a9-738e-4d46-9815-1c6fadc44fef": { "box": { "left": 0.17040444910526276, "top": 0.7973303198814392, "right": 0.6371863484382629, "bottom": 0.8179382681846619 }, "page": 0, "type": "chunkText" } }, "metadata": { "filename": "product-specs.pdf", "org_id": null, "page_count": 1, "duration_ms": 14358, "credit_usage": 3.0, "job_id": "dbad0c1be3874904bf9cec3e5e536471", "version": "dpt-2-20251103", "failed_pages": [] } } ``` # Supported Languages Source: https://docs.landing.ai/ade/ade-languages 
Agentic Document Extraction (ADE) supports a wide range of languages, with extraction accuracy influenced by script type, image clarity, and formatting. ## Strong Support The following languages typically yield consistent and accurate results: * English * Chinese (Simplified and Traditional) * Dutch * French * German * Italian * Japanese (All scripts) * Korean * Portuguese * Russian * Spanish ## Moderate Support Support for the following languages varies based on font and clarity: * Arabic * Czech * Danish * Finnish * Greek * Hebrew * Hindi * Indonesian * Norwegian * Polish * Swedish * Thai * Turkish * Vietnamese ## Basic Support ADE may struggle with the complex scripts of the following languages: * Amharic * Bengali * Gujarati * Malayalam * Tamil * Telugu * Urdu * Other non-Latin or stylized scripts # Markdown Response Source: https://docs.landing.ai/ade/ade-markdown-response When you call the [ADE Parse API](https://docs.landing.ai/api-reference/tools/ade-parse) endpoint ([https://api.va.landing.ai/v1/ade/parse](https://api.va.landing.ai/v1/ade/parse)), the API response includes `markdown` fields at three levels: * **Top-level `markdown` field**: Contains the complete parsed document content as Markdown * **Chunk-level `markdown` fields**: Each object in the `chunks` array includes its own `markdown` field containing only the content for that specific chunk * **Split-level `markdown` fields**: Each object in the `splits` array includes a `markdown` field containing the content for that specific section of the document All `markdown` fields use the same formatting and include embedded [HTML anchor tags](#anchor-tags) that link the content to specific chunks in the `chunks` array. These anchors enable you to trace content back to its location in the original document.
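The three levels can be read side by side. Here is a minimal sketch (a hypothetical helper, assuming the parse response is already loaded as a Python dict shaped like the examples in this guide) that indexes the `markdown` content at each level:

```python
def index_markdown(response: dict) -> dict:
    """Collect the markdown content available at each of the three levels."""
    return {
        # Top level: the complete document as one Markdown string.
        "document": response["markdown"],
        # Chunk level: one Markdown string per chunk, keyed by chunk ID.
        "chunks": {c["id"]: c["markdown"] for c in response.get("chunks", [])},
        # Split level: one Markdown string per split, keyed by identifier.
        "splits": {s["identifier"]: s["markdown"] for s in response.get("splits", [])},
    }
```

Because every chunk ID appears in all three levels' anchors, a lookup in the `chunks` map can be cross-referenced against the document-level or split-level strings.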
### Markdown Fields in Context To better understand how these `markdown` fields work together, let's look at the parsing response for this pallet label: Pallet Label Below is the full parsing response for this document with the `markdown` fields highlighted. Notice that each chunk's HTML anchor tag (the `` element) appears consistently across all `markdown` fields. This consistency means you can reference any chunk by its ID, whether you're working with the complete document Markdown, a specific split, or an individual chunk. ```json Markdown Fields {2,5,19,33,47,68} [expandable] theme={null} { "markdown": "\n\nSKU\nWH-2847-BLK\n\n\n\nQUANTITY\n\n48 Units\n\n\n\n<::scan_code: Barcode\n\nThis is a clear, well-defined linear barcode with distinct black bars on a white background, showing no visible quality issues.::>\n\n\n\n2847 0000 4812\n---\n", "chunks": [ { "markdown": "\n\nSKU\nWH-2847-BLK", "type": "text", "id": "69a645fe-8617-4be8-b66f-68d7788755c0", "grounding": { "box": { "left": 0.06628379225730896, "top": 0.09509064257144928, "right": 0.4664609134197235, "bottom": 0.2678269147872925 }, "page": 0 } }, { "markdown": "\n\nQUANTITY\n\n48 Units", "type": "text", "id": "bdaf2198-0fdc-4c94-be54-a49ca796ea0f", "grounding": { "box": { "left": 0.06750398874282837, "top": 0.3068091869354248, "right": 0.3204823434352875, "bottom": 0.47341251373291016 }, "page": 0 } }, { "markdown": "\n\n<::scan_code: Barcode\n\nThis is a clear, well-defined linear barcode with distinct black bars on a white background, showing no visible quality issues.::>", "type": "scan_code", "id": "65fe085e-b8c8-46e0-848c-f0c32ccf16e8", "grounding": { "box": { "left": 0.11207544803619385, "top": 0.5374823212623596, "right": 0.8830909729003906, "bottom": 0.8660760521888733 }, "page": 0 } }, { "markdown": "\n\n2847 0000 4812\n---\n", "type": "text", "id": "9e551c40-3eaa-4d77-a219-50b7f4a57e05", "grounding": { "box": { "left": 0.28996485471725464, "top": 0.9088556170463562, "right": 
0.7057502269744873, "bottom": 0.9888754487037659 }, "page": 0 } } ], "splits": [ { "class": "page", "identifier": "page_0", "pages": [ 0 ], "markdown": "\n\nSKU\nWH-2847-BLK\n\n\n\nQUANTITY\n\n48 Units\n\n\n\n<::scan_code: Barcode\n\nThis is a clear, well-defined linear barcode with distinct black bars on a white background, showing no visible quality issues.::>\n\n\n\n2847 0000 4812\n---\n", "chunks": [ "69a645fe-8617-4be8-b66f-68d7788755c0", "bdaf2198-0fdc-4c94-be54-a49ca796ea0f", "65fe085e-b8c8-46e0-848c-f0c32ccf16e8", "9e551c40-3eaa-4d77-a219-50b7f4a57e05" ] } ], "grounding": { "69a645fe-8617-4be8-b66f-68d7788755c0": { "box": { "left": 0.06628379225730896, "top": 0.09509064257144928, "right": 0.4664609134197235, "bottom": 0.2678269147872925 }, "page": 0, "type": "chunkText" }, "bdaf2198-0fdc-4c94-be54-a49ca796ea0f": { "box": { "left": 0.06750398874282837, "top": 0.3068091869354248, "right": 0.3204823434352875, "bottom": 0.47341251373291016 }, "page": 0, "type": "chunkText" }, "65fe085e-b8c8-46e0-848c-f0c32ccf16e8": { "box": { "left": 0.11207544803619385, "top": 0.5374823212623596, "right": 0.8830909729003906, "bottom": 0.8660760521888733 }, "page": 0, "type": "chunkScanCode" }, "9e551c40-3eaa-4d77-a219-50b7f4a57e05": { "box": { "left": 0.28996485471725464, "top": 0.9088556170463562, "right": 0.7057502269744873, "bottom": 0.9888754487037659 }, "page": 0, "type": "chunkText" } }, "metadata": { "filename": "pallet-label.png", "org_id": null, "page_count": 1, "duration_ms": 4226, "credit_usage": 3, "job_id": "y2xvbwlqc1p9ynwfkx4tx7q7q", "version": "dpt-2-20250919" } } ``` ## Markdown Structure The `markdown` field includes a parsed chunk or a sequence of chunks. Each chunk begins with an [HTML anchor tag](#anchor-tags) containing a unique identifier, followed by the chunk content. 
For example, the following `markdown` field contains two chunks: * A figure chunk (ID: `4c29090b-b75e-4d5f-95b6-24a7d5668486`) with a description of the image * A text chunk (ID: `ae2e4e41-9443-4fb5-bced-199915f97dec`) containing formatted address information ```json theme={null} "markdown": "<a id=\"4c29090b-b75e-4d5f-95b6-24a7d5668486\"></a>\n\n<::An illustration of a sun with eight rays extending outwards.: figure::>\n\n<a id=\"ae2e4e41-9443-4fb5-bced-199915f97dec\"></a>\n\n**Eliza Smith**\n123 Main St.\nMountain View, CA 94041" ``` ### Anchor Tags Each chunk begins with an HTML anchor tag containing the chunk's unique identifier. The `id` attribute contains the UUID that matches the corresponding entry in the `chunks` array, enabling you to trace content back to its location in the original document. ```html theme={null} <a id="ae2e4e41-9443-4fb5-bced-199915f97dec"></a> ``` ### Content Format by Chunk Type The Markdown content format varies based on the [chunk type](./ade-chunk-types): * [Text-Based Chunks](#text-based-chunks) * [Image-Based Chunks](#image-based-chunks) * [Table Chunks](#table-chunks) * [Spreadsheets](#spreadsheets) #### Text-Based Chunks For text-based chunks (`text`, `marginalia`), content appears as standard Markdown text: ```markdown theme={null} ## Heading Text Paragraph content with **bold** and *italic* formatting. ``` #### Image-Based Chunks Image-based chunks (`figure`, `logo`, `card`, `attestation`, `scan_code`) use a special delimiter format that wraps the caption or description: ```markdown theme={null} <::Caption or description of the visual element: figure::> ``` #### Table Chunks Table chunks (`table`) appear as HTML table markup. Most table elements include unique `id` attributes. These IDs use the format `{page_number}-{base62_sequential_number}`, where the page number starts at 0 and the sequential number increments for each element within the page. If a page contains multiple tables, the ID numbering continues sequentially across all tables on that page. Table cells that span multiple rows or columns include `rowspan` or `colspan` attributes in the HTML markup.
This ID system allows you to trace individual cells, rows, and tables back to their locations in the original document. The [JSON response](./ade-json-response#table-cell-position-information) also includes position information (row, column, rowspan, colspan) for each table cell in the grounding object. ```markdown theme={null}
<table id="0-1">
<tr>
<th id="0-2" colspan="2">Product Summary</th>
</tr>
<tr>
<td id="0-3">Product</td>
<td id="0-4">Revenue</td>
</tr>
<tr>
<td id="0-5">Hardware</td>
<td id="0-6">15,230</td>
</tr>
<tr>
<td id="0-7">Software</td>
<td id="0-8">8,540</td>
</tr>
</table>
``` #### Spreadsheets When you parse spreadsheets, data is identified as `table` chunks, and embedded images or charts are identified as `figure` chunks. Table chunks appear as HTML table markup. Most table elements include unique `id` attributes. These IDs use the format `{tab_name}-{cell_reference}`, where the tab name is the name of the spreadsheet tab and the cell reference uses standard spreadsheet notation (column letter followed by row number, such as A1, B2, or C3). The table itself uses a range-based ID format: `{tab_name}-{start_cell}:{end_cell}` (for example, `Sheet 1-A1:B4`). This ID system allows you to trace individual cells back to their locations in the original spreadsheet. For example, here is a screenshot of a spreadsheet, followed by the Markdown output. ```json theme={null} { "markdown": "<table id=\"Sheet 1-A1:B4\">\n<tr>\n<td id=\"Sheet 1-A1\">Program</td>\n<td id=\"Sheet 1-B1\">Interest Rate</td>\n</tr>\n<tr>\n<td id=\"Sheet 1-A2\">15 Year Fixed-Rate Mortgage</td>\n<td id=\"Sheet 1-B2\">0.05125</td>\n</tr>\n<tr>\n<td id=\"Sheet 1-A3\">30 Year Fixed-Rate Mortgage</td>\n<td id=\"Sheet 1-B3\">0.05875</td>\n</tr>\n<tr>\n<td id=\"Sheet 1-A4\">10/1 ARM</td>\n<td id=\"Sheet 1-B4\">0.05625</td>\n</tr>\n</table>
", "type": "table", "id": "Sheet 1-A1:B4", "grounding": null } ``` For a list of supported spreadsheet types, go to [Supported File Types](./ade-file-types). ### Chunk Separators Chunks are separated by double newlines (`\n\n`), except for the final chunk in the document. ## How do I find the Markdown response for the ADE Parse Jobs API? If you call the [ADE Parse Jobs](https://docs.landing.ai/api-reference/tools/ade-parse-jobs) API, the API responds with the `job_id`. The parsing results, including the `markdown` field, are returned when you check the parsing job status with the [ADE Get Parse Jobs](https://docs.landing.ai/api-reference/tools/ade-get-parse-jobs) API. # Organizations & Members Source: https://docs.landing.ai/ade/ade-members Enterprise plans support single sign-on (SSO) via SAML 2.0 and OpenID Connect (OIDC). For more information, see [Single Sign-On (SSO)](./ade-sso). ## Organizations An **organization** is the container for your account. It holds your credits, members, API keys, files, and settings. When you create an account, an organization is automatically created on the Explore plan. If you later upgrade to a Team plan, you keep the same organization. You can be invited to join multiple organizations. **If you upgraded to a subscription plan before April 2, 2026:** Before this date, a separate organization was created when you upgraded. You still have a Personal account in addition to your subscription organization. This ensures that files and data in your Personal organization are not shared with members of your subscription organization. ### Switch to a Different Organization To switch to a different organization: 1. Log in to [https://va.landing.ai/](https://va.landing.ai/). 2. Click your profile icon at the bottom left corner of the page. 3. Click the name of the organization you're currently in. 4. Select the organization you want to open. ### Change the Organization Name Members with the Admin role can change the organization name. 
To change the organization name:

1. Log in to [https://va.landing.ai/](https://va.landing.ai/).
2. Go to the [Organization Settings](https://va.landing.ai/settings/organization/general) page (to navigate there manually, click your profile icon at the bottom left corner of the page and click **Organization Settings**).
3. Enter the new name in the **Organization name** field.
4. Click **Save**.

### Organization ID

Each organization has an Organization ID, which is a unique identifier. You do not need to include the Organization ID when sending API calls. When contacting support, you might be asked to share your Organization ID.

To locate the Organization ID:

1. Log in to [https://va.landing.ai/](https://va.landing.ai/).
2. Go to the [Organization Settings](https://va.landing.ai/settings/organization/general) page (to navigate there manually, click your profile icon at the bottom left corner of the page and click **Organization Settings**).
3. The **Organization ID** is displayed.
4. Click the **Copy** icon to copy the ID.

## Members and Roles

Users in an organization are called **members**. The Explore plan supports one user; to add members to your organization, upgrade to a Team or Enterprise plan. Team and Enterprise plans support unlimited members. For more information about plan features, go to [Pricing & Billing](./ade-pricing).

Members are managed on the [Members](https://va.landing.ai/settings/organization/members) page.

### Member Roles

An organization includes the following member roles: **Developer** and **Admin**. The following table defines what actions each role can perform:
| Action | Developer | Admin |
| ------------------------------------- | --------- | ----- |
| Process documents | ✓ | ✓ |
| Create API keys | ✓ | ✓ |
| Revoke your own API keys | ✓ | ✓ |
| Revoke API keys other members created | | ✓ |
| Invite members | | ✓ |
| Remove members | | ✓ |
| Change member roles | | ✓ |
| Manage billing | | ✓ |
| Update organization name | | ✓ |
### Invite Members

Members with the Admin role can invite other members to the organization.

Invitations expire after 2 days. If an invitation has expired, you can invite the user again.

If your organization has SSO enabled, inviting users may not be required. For more information, go to [Single Sign-On (SSO)](./ade-sso#user-access-and-jit-provisioning).

To invite a member:

1. Log in to [https://va.landing.ai/](https://va.landing.ai/).
2. Go to the [Members](https://va.landing.ai/settings/organization/members) page (to navigate there manually, click your profile icon at the bottom left corner of the page and click **Members**).
3. If you're not currently in the correct organization, select your organization from the drop-down menu.
4. Click **Invite**.
5. Enter the email addresses of the users you want to invite. Separate email addresses with a comma.
6. Select the [role](#member-roles) you want to assign to the users.
7. Click **Invite**.

An invitation to join the organization is emailed to the users. Invited users can then join your organization by following the [Complete Your Registration](#complete-your-registration) process.

### Complete Your Registration

After you've been invited to an organization, you will receive an automated email prompting you to complete your registration. Your account will be linked to the email address the email was sent to. If you want your account to be linked to a different email address, ask the user who invited you to send a new invitation to that address.

To complete your registration:

1. Open the automated email.
2. Click the button that prompts you to accept the invitation. A new window or tab opens.
3. Follow the on-screen prompts to either create a new account or log in with an existing account.
4. After logging in, your registration is complete and you can start using ADE.

### Remove Members

Members with the Admin role can remove other members from the organization.
If you remove a member, any API keys that user created will remain active and can be managed by Admins.

To remove members:

1. Log in to [https://va.landing.ai/](https://va.landing.ai/).
2. Go to the [Members](https://va.landing.ai/settings/organization/members) page (to navigate there manually, click your profile icon at the bottom left corner of the page and click **Members**).
3. Locate the row for the member. (You may need to search for the member to narrow down the list of members.)
4. Click the **Settings** button (ellipsis) and select **Remove from \[organization]**. Follow the on-screen prompts to complete the process.

### Revoke Invitations

Members with the Admin role can revoke invitations that are Pending.

To revoke an invitation:

1. Log in to [https://va.landing.ai/](https://va.landing.ai/).
2. Go to the [Members](https://va.landing.ai/settings/organization/members) page (to navigate there manually, click your profile icon at the bottom left corner of the page and click **Members**).
3. Locate the row for the member. (You may need to search for the member to narrow down the list of members.)
4. Click the **Settings** button (ellipsis) and select **Revoke invitation**.

The user will no longer be able to create an account for your organization. If you revoked the invitation by mistake, you can invite the user again.

### Change Member Roles

Members with the Admin role can change the role of other members.

To change a member's role:

1. Log in to [https://va.landing.ai/](https://va.landing.ai/).
2. Go to the [Members](https://va.landing.ai/settings/organization/members) page (to navigate there manually, click your profile icon at the bottom left corner of the page and click **Members**).
3. Locate the row for the member. (You may need to search for the member to narrow down the list of members.)
4. Select the role you want to assign to the member. Follow the on-screen prompts to complete the process.
# Overview

Source: https://docs.landing.ai/ade/ade-overview

Agentic Document Extraction (ADE) is a document intelligence platform that converts documents into reliable, structured data. Use the structured data from ADE to build retrieval-augmented generation (RAG) applications, power intelligent search systems, extract key information, and automate document processing at scale.

## How ADE Works

ADE provides multiple APIs for document processing: [Parse](./ade-separate-apis), [Extract](./ade-extract), [Classify](./ade-classify), [Section](./ade-section), and [Split](./ade-split). Most workflows start with **Parse**, which converts your documents into structured data. After parsing, you can optionally use **Extract** to pull specific data fields, **Section** to generate a hierarchical table of contents, or **Split** to separate multi-document files. Use **Classify** independently to label pages by document type before or without parsing.

| API | Description | When to Use |
| ---------------------------- | -------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------- |
| [Parse](./ade-separate-apis) | Converts documents into structured Markdown with hierarchical JSON.<br /><br />Identifies elements like text, tables, and form fields with exact page and coordinate references. | Use to convert documents into structured data for downstream applications, such as RAG, search, and training LLMs.<br /><br />This is the required first step for all ADE workflows. |
| [Extract](./ade-extract) | Pulls specific data fields from parsed documents using schema-based extraction. | Use when you need the values for specific fields.<br /><br />Can be used after Parse or Split. |
| [Classify](./ade-classify) | Assigns a category to each page of a document based on classes you define. | Use when you need to sort, route, or label pages by document type. Does not require Parse. |
| [Section](./ade-section) | Analyzes a parsed document and generates a hierarchical table of contents with section titles, levels, and chunk references. | Use when you need to navigate, scope, or retrieve content by section (for example, for section-aware RAG chunking or document review tools). |
| [Split](./ade-split) | Classifies and separates parsed documents into multiple sub-documents based on document types you define. | Use when one file contains multiple documents that need to be separated, such as batched Know Your Customer (KYC) documents. |

## Get Started

Create an account and process files in our Playground. Comfortable with using the Playground? Now you can make your first API call in minutes.

## Key Features

* Delivers high accuracy, even on complex documents. Achieved **99.16% accuracy** on the DocVQA dataset.
* Identifies specific elements (called "chunks") including text, tables, images, form fields, and bar codes.
* Includes page numbers and coordinates for each chunk to support traceability, validation, and compliance workflows.
* Handles any document layout without templates or training—works out of the box.
* Understands relationships between elements to generate accurate descriptions and maintain proper reading order.
* Returns results in Markdown for human readability and JSON for programmatic access.
* Parses PDFs, images, text documents, presentations, and spreadsheets.
* Parses documents in multiple languages.

# Legacy ADE Features

Source: https://docs.landing.ai/ade/ade-overview-legacy

The `v1/tools/agentic-document-analysis` endpoint, the `agentic-doc` Python library, and the `dpt-1` parsing model are legacy and will be deprecated on March 31, 2026.
## Legacy ADE Endpoint

The legacy endpoint (`v1/tools/agentic-document-analysis`) was the original endpoint for document processing. It combined document parsing and field extraction into a single API call. This endpoint has been replaced with separate, function-specific APIs:

* **[Parse API](./ade-separate-apis)**: Converts documents into structured Markdown with hierarchical JSON
* **[Split API](./ade-split)**: Classifies and separates documents into sub-documents
* **[Extract API](./ade-extract)**: Extracts specific data fields from parsed documents

## Legacy agentic-doc Library

The [agentic-doc](https://github.com/landing-ai/agentic-doc) Python library has been legacy since September 30, 2025, and will be deprecated on March 31, 2026. It is no longer actively maintained. Use the `landingai-ade` library for all new projects.

## Legacy DPT-1 Parsing Model

`dpt-1` was the original Document Pre-Trained Transformer (DPT) model for ADE. It powered basic document parsing and was the only model available when ADE launched. `dpt-1` has been replaced by `dpt-2`, which builds on `dpt-1` and adds support for complex tables, additional chunk types (such as logos, barcodes, and signatures), and improved layout detection. Use `dpt-2` for all new projects. To migrate, update the `model` parameter in your API calls or library code from `dpt-1` to `dpt-2`.

For more information, go to [Document Pre-Trained Transformers (Parsing Models)](./ade-parse-models).

# Parse Large Files (Parse Jobs)

Source: https://docs.landing.ai/ade/ade-parse-async

The [ADE Parse Jobs](https://docs.landing.ai/api-reference/tools/ade-parse-jobs) API enables you to parse large documents that exceed the size limits of the standard ADE Parse API.

## Monitor Parse Jobs

You can monitor parse jobs with these APIs:

* [ADE Get Parse Jobs](https://docs.landing.ai/api-reference/tools/ade-get-parse-jobs): Get the status for a specific parse job.
* [ADE List Parse Jobs](https://docs.landing.ai/api-reference/tools/ade-list-parse-jobs): List all parse jobs associated with your API key.
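The monitoring flow above can be sketched as a simple polling loop. This is a minimal sketch using only the Python standard library; the endpoint URL and the `completed` status value come from the examples later on this page, while the treatment of every other status as "still running" is an assumption you should verify against the API reference before relying on it.

```python
import json
import time
import urllib.request

API_BASE = "https://api.va.landing.ai/v1/ade/parse/jobs"


def job_status_url(job_id: str) -> str:
    # The Get Parse Jobs endpoint is the jobs endpoint plus the job ID
    return f"{API_BASE}/{job_id}"


def wait_for_parse_job(job_id: str, api_key: str, interval: float = 5.0) -> dict:
    """Poll the Get Parse Jobs API until the job reports "completed"."""
    request_headers = {"Authorization": f"Bearer {api_key}"}
    while True:
        request = urllib.request.Request(job_status_url(job_id), headers=request_headers)
        with urllib.request.urlopen(request) as response:
            payload = json.load(response)
        if payload.get("status") == "completed":
            return payload
        # Assumption: any other status means the job is still running.
        # Check the API reference for failure statuses before using this in production.
        time.sleep(interval)
```

In production code you would also stop on failure statuses and cap the total wait time instead of polling indefinitely.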
## Rate Limits for ADE Parse Jobs

The following table shows the limits for the ADE Parse Jobs API.

| Maximum File Size | Maximum Pages |
| ----------------- | ------------- |
| 1 GB | 6,000 pages |

To see the rate limits for all APIs, go to [Rate Limits](./ade-rate-limits).

## API Reference

To learn more, go to the reference pages for the Parse Jobs APIs:

* [ADE Parse Jobs](https://docs.landing.ai/api-reference/tools/ade-parse-jobs)
* [ADE Get Parse Jobs](https://docs.landing.ai/api-reference/tools/ade-get-parse-jobs)
* [ADE List Parse Jobs](https://docs.landing.ai/api-reference/tools/ade-list-parse-jobs)

## Save Parsed Output to a URL

When calling the [ADE Parse Jobs](https://docs.landing.ai/api-reference/tools/ade-parse-jobs) endpoint, you can use the `output_save_url` parameter to save the parsed Markdown to a specified URL instead of returning it in the API response. This is useful for managing large documents, integrating with your existing storage workflow, or complying with data retention policies.

### When Parsed Output Is Saved to a URL

The parsed Markdown is saved to a URL in these scenarios:

* **You specify the `output_save_url` parameter**: The Markdown is saved to your specified URL.
* **The parsed Markdown exceeds 1 MB**: The Markdown is automatically saved to a presigned S3 URL generated by ADE. The URL expires after 1 hour, but each time you call the [ADE Get Parse Jobs](https://docs.landing.ai/api-reference/tools/ade-get-parse-jobs) API, a new presigned URL is generated.

Behavior when the output is saved to a URL:

* The `output_url` field in the API response contains the URL where the Markdown is stored.
* The `data` field in the API response is `None`.

### URL Requirements

If you specify the `output_save_url` parameter, your URL must meet these requirements:

* The URL must be a public or presigned URL that explicitly allows PUT or CREATE operations (depending on the provider).
* Tested storage providers: Amazon S3, Azure Blob Storage, and Google Cloud Storage. Other storage providers may also work.
* The API cannot access private URLs, such as folders in Google Drive.

### Example: Use Amazon S3 Presigned URLs

If you use Amazon S3, you can generate a presigned URL and provide it as the `output_save_url` value. Presigned URLs grant temporary access to your S3 bucket without requiring authentication in the API request. For more information about presigned URLs with Amazon S3, go to the [Amazon documentation](https://docs.aws.amazon.com/AmazonS3/latest/userguide/PresignedUrlUploadObject.html).

Here is a sample script that creates a presigned URL and uses it for a parsing job:

```python theme={null}
import requests

headers = {
    'Authorization': 'Bearer YOUR_API_KEY'
}

url = 'https://api.va.landing.ai/v1/ade/parse/jobs'

# Prepare the request payload.
# generate_presigned_url() represents your own logic for creating a presigned URL.
output_save_url = generate_presigned_url(...)
data = {'document_url': 'https://...', 'output_save_url': output_save_url}

response = requests.post(url, data=data, headers=headers)
print(response.json())
```

## ZDR Requirements

When [zero data retention](./zdr) (ZDR) is enabled, you must configure the following parameters to ensure that ADE does not store the document content:

* **Pass your document in the `document_url` parameter**. You cannot use the `document` parameter with ZDR enabled.
* **Include the `output_save_url` parameter**. This ensures that the parsed content is saved to your specified URL instead of being returned in the API response. To learn how to configure this parameter, go to [Save Parsed Output to a URL](#save-parsed-output-to-a-url).

## Workflow Overview

1. Parse a document with the [ADE Parse Jobs](https://docs.landing.ai/api-reference/tools/ade-parse-jobs) API.
2. Copy the `job_id` in the API response.
3. To get results from the parsing job, call the [ADE Get Parse Jobs](https://docs.landing.ai/api-reference/tools/ade-get-parse-jobs) API with the `job_id`.
4.
The parsed content is returned as Markdown in `data.markdown`, or as a URL in `output_url` (in which case `data` is `None`). For more information, go to [Save Parsed Output to a URL](#save-parsed-output-to-a-url).
5. If you need to extract fields:
   1. Create an extraction schema.
   2. Send the Markdown to the Extract API.

## End-to-End Workflow: Parse and Extract the Output

This tutorial walks you through how to parse a document with the ADE Parse Jobs API and then extract a subset of fields from it using the Extract API. For simplicity, this tutorial uses a 2-page PDF; in your own use case, you would typically process much larger documents. We provide a separate script for each endpoint, so you can skip the extraction steps if you don't need them.

In this tutorial, we will:

* Parse this PDF: MRI Report
* Extract these fields: **Exam Date** and **Procedure**

### 1. Download the Document to Process

Download the MRI Report and save it to a local directory.

### 2. Create Parse Job & Get Job ID

#### Create the Script

Copy the script below and save it as `create-parse-job.py` in the same directory as the PDF.

```python [expandable] theme={null}
import requests

headers = {
    'Authorization': 'Bearer YOUR_API_KEY'
}

url = 'https://api.va.landing.ai/v1/ade/parse/jobs'

# Upload a document
document = open('mri-report.pdf', 'rb')
files = {'document': document}
data = {'model': 'dpt-2-latest'}

response = requests.post(url, files=files, data=data, headers=headers)
print(response.json())
```

#### Run the Script

Run the script from the same directory:

```bash theme={null}
python create-parse-job.py
```

This returns the `job_id`:

```json theme={null}
{'job_id': 'cmfx34ewm0000hyoqkh9dzd8n'}
```

### 3. Use job\_id to Get Parsing Results

#### Create the Script

Copy the script below and save it as `get-parse-results.py` in the same directory as the PDF. Replace `{jobId}` with the `job_id` from the previous step.
```python [expandable] theme={null} import requests headers = { 'Authorization': 'Bearer YOUR_API_KEY' } url = f'https://api.va.landing.ai/v1/ade/parse/jobs/{jobId}' response = requests.get(url, headers=headers) response_data = response.json() # Print the full response print(response_data) # Check if job is completed if response_data.get('status') == 'completed': # Check if markdown content is available in data if 'data' in response_data and response_data['data'].get('markdown'): markdown_content = response_data['data']['markdown'] # Save markdown content to file with open('markdown-mri-report.md', 'w', encoding='utf-8') as f: f.write(markdown_content) print("\nMarkdown content saved to a Markdown file.") # Check if output_url is available instead elif response_data.get('output_url'): print("Use the Markdown file specified in `output_url`.") else: print("No Markdown content or `output_url` found in the completed job response.") else: print(f"\nJob status: {response_data.get('status', 'unknown')}.") ``` #### Run the Script Run the script from the same directory: ```bash theme={null} python get-parse-results.py ``` When parsing is complete, the script saves the output to `markdown-mri-report.md`. You will pass this file to the Extract API in the next step. ### 4. Extract Fields from Markdown Now that we have the parsed output in a Markdown file, we're ready to extract these fields: **Exam Date** and **Procedure**. #### Create the Script Copy the script below and save it as `extract-mri-report.py` in the same directory as the Markdown file. 
```python [expandable] theme={null} import requests import json headers = { 'Authorization': 'Bearer YOUR_API_KEY' } url = 'https://api.va.landing.ai/v1/ade/extract' # Define the extraction schema schema = json.dumps({ "type": "object", "properties": { "exam_date": { "description": "The date on which the medical examination or procedure was performed.", "format": "YYYY-MM-DD", "x-alternativeNames": [ "Exam Date", "Date of Exam", "Examination Date" ], "type": "string" }, "procedure": { "description": "The specific medical procedure or examination that was conducted, such as an MRI or X-ray.", "x-alternativeNames": [ "Procedure", "Medical Procedure", "Performed Procedure" ], "type": "string" } } }) # Prepare files and data files = {'markdown': open('markdown-mri-report.md', 'rb')} data = {'schema': schema, 'model': 'extract-latest'} # Run extraction response = requests.post(url, files=files, data=data, headers=headers) # Save the results to a JSON file with open('mri-report_extract_output.json', 'w') as f: json.dump(response.json(), f, indent=2) ``` #### Run the Script Run the script from the same directory: ```bash theme={null} python extract-mri-report.py ``` #### View the Output The results are saved to `mri-report_extract_output.json`. 
The file includes the extracted fields and metadata: ```json [expandable] theme={null} { "extraction": { "exam_date": "2010-05-20", "procedure": "MRI OF THE LUMBAR SPINE WITH AND WITHOUT CONTRAST" }, "extraction_metadata": { "exam_date": { "references": [ "b5183837-035c-4a54-b324-a4c9e8a68027", "1-3" ], "value": "2010-05-20" }, "procedure": { "references": [ "56340fbe-8a51-46a0-a309-d91abd9b8b00" ], "value": "MRI OF THE LUMBAR SPINE WITH AND WITHOUT CONTRAST" } }, "metadata": { "filename": "markdown-mri-report.md", "org_id": null, "duration_ms": 15080, "credit_usage": 1.3256, "job_id": "dfef14d3b66045b8abaf39788b9d17e8", "version": "extract-20260314", "schema_violation_error": null, "fallback_model_version": null, "warnings": [] } } ``` ## Run Parse Jobs with Our Libraries Click one of the tiles below to learn how to run Parse Jobs with our libraries. Run Parse Jobs with our Python library. Run Parse Jobs with our TypeScript library. # Async Parse: Processing Multiple Documents Concurrently Source: https://docs.landing.ai/ade/ade-parse-async-sample ## Overview Process multiple documents concurrently to significantly reduce total processing time compared to sequential requests. These examples require the [Python](./ade-python) or [TypeScript](./ade-typescript) client library. Before running a script, set your API key and install the library and any required dependencies. Use AsyncLandingAIADE for async document processing Use concurrent parsing with Promise.all() or p-limit ## Python Use `AsyncLandingAIADE` when you need to process many lightweight documents (such as invoices, receipts, or forms) efficiently. This async client allows you to send multiple parse requests concurrently using Python's `asyncio`, which significantly reduces total processing time compared to sequential requests. The async approach lets you send multiple requests in parallel. While one document is being processed, another request can be sent. 
The API server handles the actual document processing in the background. To avoid exceeding the pages per hour limits and receiving `429` errors, use a client-side rate limiter like `aiolimiter` to control concurrency.

```python [expandable] theme={null}
import asyncio
from pathlib import Path

from aiolimiter import AsyncLimiter
from landingai_ade import AsyncLandingAIADE

# Initialize client (uses the API key from the VISION_AGENT_API_KEY environment variable)
client = AsyncLandingAIADE()

# Allow at most 10 parse requests per 60 seconds; adjust to stay within your plan's limits
limiter = AsyncLimiter(10, 60)

async def parse_one(filepath: Path) -> None:
    async with limiter:
        response = await client.parse(
            document=filepath,
            model="dpt-2-latest"
        )
    # Save markdown output (useful if you plan to run extract on the markdown)
    filepath.with_suffix(".md").write_text(response.markdown, encoding="utf-8")
    print(f"Parsed {filepath.name}")

async def main() -> None:
    # Replace "data/" with your folder path; adjust file types as needed
    files = [
        p for p in Path("data/").glob("*")
        if p.suffix.lower() in {".pdf", ".png", ".jpg", ".jpeg"}
    ]
    # Send all parse requests concurrently; the limiter caps the request rate
    await asyncio.gather(*(parse_one(p) for p in files))

asyncio.run(main())
```

## TypeScript

Use concurrent parsing when you need to process many lightweight documents (such as invoices, receipts, or forms) efficiently. The TypeScript library's methods are already asynchronous, allowing you to send multiple parse requests concurrently using JavaScript's `Promise.all()` or `Promise.allSettled()`. This significantly reduces total processing time compared to sequential requests.

The concurrent approach lets you send multiple requests in parallel. While one document is being processed, another request can be sent. The API server handles the actual document processing in the background. To avoid exceeding the pages per hour limits and receiving `429` errors, use a concurrency control library like `p-limit` to limit the number of simultaneous requests.
### Basic Concurrent Parsing

```typescript [expandable] theme={null}
import LandingAIADE from "landingai-ade";
import fs from "fs";
import path from "path";

// Initialize client (uses the API key from the VISION_AGENT_API_KEY environment variable)
const client = new LandingAIADE();

async function parseMultipleDocuments() {
  // Replace with your folder path containing documents to parse
  const dataFolder = "data/";
  const files = fs.readdirSync(dataFolder);

  // Filter for file types (adjust as needed for your use case)
  const documentFiles = files.filter(filename => {
    const fileExt = path.extname(filename).toLowerCase();
    return ['.pdf', '.png', '.jpg', '.jpeg'].includes(fileExt);
  });

  // Parse all documents concurrently
  const parsePromises = documentFiles.map(filename => {
    const filepath = path.join(dataFolder, filename);
    return client.parse({
      document: fs.createReadStream(filepath),
      model: "dpt-2-latest"
    }).then(response => ({ filename, response }))
      .catch(error => ({ filename, error: error.message }));
  });

  // Wait for all parsing to complete
  const results = await Promise.all(parsePromises);

  // Process results
  for (const result of results) {
    if ('error' in result) {
      console.error(`Failed to parse ${result.filename}: ${result.error}`);
    } else {
      console.log(`Successfully parsed ${result.filename}`);
      // Save Markdown output
      const outputMd = path.basename(result.filename, path.extname(result.filename)) + ".md";
      fs.writeFileSync(outputMd, result.response.markdown, "utf-8");
    }
  }
}

parseMultipleDocuments();
```

### Concurrent Parsing with Rate Limiting

To control concurrency and avoid rate limits, use the `p-limit` library:

```bash theme={null}
npm install p-limit
```

```typescript [expandable] theme={null}
import LandingAIADE from "landingai-ade";
import fs from "fs";
import path from "path";
import pLimit from "p-limit";

// Initialize client (uses the API key from the VISION_AGENT_API_KEY environment variable)
const client = new LandingAIADE();

// Adjust concurrency limit as needed to avoid rate limits
const limit = pLimit(5);

async function parseMultipleDocumentsWithRateLimit() {
  // Replace with your folder path containing documents to parse
  const dataFolder = "data/";
  const files = fs.readdirSync(dataFolder);

  // Filter for file types (adjust as needed for your use case)
  const documentFiles = files.filter(filename => {
    const fileExt = path.extname(filename).toLowerCase();
    return ['.pdf', '.png', '.jpg', '.jpeg'].includes(fileExt);
  });

  // Parse all documents with concurrency control
  const parsePromises = documentFiles.map(filename => {
    // Wrap each parse call with the rate limiter
    return limit(async () => {
      const filepath = path.join(dataFolder, filename);
      console.log(`Parsing ${filename}...`);
      try {
        const response = await client.parse({
          document: fs.createReadStream(filepath),
          model: "dpt-2-latest"
        });
        // Save Markdown output
        const outputMd = path.basename(filename, path.extname(filename)) + ".md";
        fs.writeFileSync(outputMd, response.markdown, "utf-8");
        console.log(`Successfully parsed ${filename}`);
        return { filename, success: true };
      } catch (error) {
        console.error(`Failed to parse ${filename}: ${error.message}`);
        return { filename, success: false, error: error.message };
      }
    });
  });

  // Wait for all parsing to complete
  const results = await Promise.all(parsePromises);

  // Summary
  const successful = results.filter(r => r.success).length;
  const failed = results.filter(r => !r.success).length;
  console.log(`\nCompleted: ${successful} successful, ${failed} failed`);
}

parseMultipleDocumentsWithRateLimit();
```

# Custom Prompts for Figure Descriptions

Source: https://docs.landing.ai/ade/ade-parse-custom-prompts

When you parse a file, the parsing results include descriptions of images called **captions**. Captions appear in the `markdown` field of `figure`-type chunks. For more information, see [Image-Based Chunks](./ade-markdown-response#image-based-chunks).
Use the optional `custom_prompts` parameter to tell ADE how to describe figures during parsing.

```bash highlight={4} theme={null}
curl -X POST 'https://api.va.landing.ai/v1/ade/parse' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -F 'document=@document.pdf' \
  -F 'custom_prompts={"figure":"YOUR_CUSTOM_PROMPT"}'
```

## When to Add Custom Prompts

Adding custom prompts can be helpful if:

* The default image descriptions do not fit your use case.
* The downstream processing tasks require the image descriptions to use a specific format or language.
* The images or charts are unique to your organization or use case and you need to provide additional context about what they represent.

## Sample Prompts

**Make the description more concise**

This is helpful if you want to minimize the number of input tokens when running the Split or Extract APIs, or if the images are straightforward and do not need much description.

```bash theme={null}
custom_prompts={"figure":"Limit the description to 50 characters. Be concise but communicate the core message of the image. Do not truncate sentences."}
```

**Omit the description**

This is helpful if you don't need the image descriptions in your post-parsing tasks.

```bash theme={null}
custom_prompts={"figure":"Do not describe the image. Return an empty string."}
```

**Set the language**

This is useful if your downstream tasks require descriptions in a specific language.

```bash theme={null}
custom_prompts={"figure":"Return the description in Spanish."}
```

**Return a specific format**

This is useful if you want to standardize or change how the image is described.

```bash theme={null}
custom_prompts={"figure":"If the image is a chart, format the data as an HTML table"}
```

**Use a standard description**

This is useful if you want to return the same string for every figure.
```bash theme={null}
custom_prompts={"figure":"Only return this string as the description: This is an image"}
```

## Supported Models

The `custom_prompts` parameter is only supported with `dpt-2`. `dpt-1` does not generate figure captions, so the parameter does not apply when using that model. Passing it with `dpt-1` will return a 422 error. For more information about parsing models, see [Document Pre-Trained Transformers (Parsing Models)](./ade-parse-models).

## Requirements

The `custom_prompts` parameter accepts a JSON string. Only the `figure` key is supported. Any other key will be rejected. The custom prompt can include up to 512 characters.

## Use Custom Prompts with the API

```bash Parse theme={null}
curl -X POST 'https://api.va.landing.ai/v1/ade/parse' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -F 'document=@document.pdf' \
  -F 'custom_prompts={"figure":"Describe axis labels in detail."}'
```

```bash Parse Jobs theme={null}
curl -X POST 'https://api.va.landing.ai/v1/ade/parse/jobs' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -F 'document=@document.pdf' \
  -F 'custom_prompts={"figure":"Describe axis labels in detail."}'
```

## Use Custom Prompts with Our Libraries

When using the Python library, pass `custom_prompts` as a JSON string using `json.dumps`. The Parse endpoints use multipart form data, so the parameter does not accept a dictionary directly. You can also pass a string literal (for example, `'{"figure": "YOUR_CUSTOM_PROMPT"}'`), but `json.dumps` is preferred because it handles escaping and formatting automatically, reducing the risk of malformed JSON that causes a 422 error.
### Parse ```python Python theme={null} import json from pathlib import Path from landingai_ade import LandingAIADE client = LandingAIADE() # Replace with your file path response = client.parse( document=Path("/path/to/file/document"), custom_prompts=json.dumps({"figure": "YOUR_CUSTOM_PROMPT"}), ) print(response.chunks) # Save Markdown output (useful if you plan to run extract on the Markdown) with open("output.md", "w", encoding="utf-8") as f: f.write(response.markdown) ``` ```typescript TypeScript theme={null} import LandingAIADE from "landingai-ade"; import fs from "fs"; const client = new LandingAIADE(); // Replace with your file path const response = await client.parse({ document: fs.createReadStream("/path/to/file/document"), custom_prompts: { figure: "YOUR_CUSTOM_PROMPT" }, }); console.log(response.chunks); // Save Markdown to a file if (response.markdown) { fs.writeFileSync("output.md", response.markdown, "utf-8"); } else { console.log("No 'markdown' field found in the response"); } ``` ### Parse Jobs ```python Python [expandable] theme={null} import json import time from pathlib import Path from landingai_ade import LandingAIADE client = LandingAIADE() # Step 1: Create a parse job job = client.parse_jobs.create( document=Path("/path/to/file/document"), custom_prompts=json.dumps({"figure": "YOUR_CUSTOM_PROMPT"}), ) job_id = job.job_id print(f"Job {job_id} created.") # Step 2: Get the parsing results while True: response = client.parse_jobs.get(job_id) if response.status == "completed": print(f"Job {job_id} completed.") break print(f"Job {job_id}: {response.status} ({response.progress * 100:.0f}% complete)") time.sleep(5) ``` ```typescript TypeScript [expandable] theme={null} import LandingAIADE from "landingai-ade"; import fs from "fs"; const client = new LandingAIADE(); // Step 1: Create a parse job const job = await client.parseJobs.create({ document: fs.createReadStream("/path/to/file/document"), custom_prompts: { figure: "YOUR_CUSTOM_PROMPT" }, }); 
const jobId = job.job_id; console.log(`Job ${jobId} created.`); // Step 2: Get the parsing results while (true) { const response = await client.parseJobs.get(jobId); if (response.status === "completed") { console.log(`Job ${jobId} completed.`); break; } console.log(`Job ${jobId}: ${response.status} (${(response.progress * 100).toFixed(0)}% complete)`); await new Promise(resolve => setTimeout(resolve, 5000)); } ``` # Parse a Directory of Documents Source: https://docs.landing.ai/ade/ade-parse-directory-sample ## Overview Parse all documents in a folder by iterating through files and calling the parse API for each supported file type. These examples require the [Python](./ade-python) or [TypeScript](./ade-typescript) client library. Before running a script, set your API key and install the library and any required dependencies. ## Scripts ```python Python [expandable] theme={null} from pathlib import Path from landingai_ade import LandingAIADE # Initialize client (uses the API key from the VISION_AGENT_API_KEY environment variable) client = LandingAIADE() # Replace "data/" with your folder path data_folder = Path("data/") for filepath in data_folder.glob("*"): # Adjust file types as needed for your use case if filepath.suffix.lower() in ['.pdf', '.png', '.jpg', '.jpeg']: print(f"Processing: {filepath.name}") response = client.parse( document=filepath, model="dpt-2-latest" ) print(response.chunks) # Save markdown output (useful if you plan to run extract on the markdown) output_md = filepath.stem + ".md" with open(output_md, "w", encoding="utf-8") as f: f.write(response.markdown) ``` ```typescript TypeScript [expandable] theme={null} import LandingAIADE from "landingai-ade"; import fs from "fs"; import path from "path"; // Initialize client (uses the API key from the VISION_AGENT_API_KEY environment variable) const client = new LandingAIADE(); // Replace with your folder path containing documents to parse const dataFolder = "data/"; const files = 
fs.readdirSync(dataFolder); for (const filename of files) { const filepath = path.join(dataFolder, filename); const fileExt = path.extname(filepath).toLowerCase(); // Adjust file types as needed for your use case if (['.pdf', '.png', '.jpg', '.jpeg'].includes(fileExt)) { console.log(`Processing: ${filename}`); const response = await client.parse({ document: fs.createReadStream(filepath), model: "dpt-2-latest" }); console.log(response.chunks); // Save Markdown output (useful if you plan to run extract on the Markdown) const outputMd = path.basename(filepath, fileExt) + ".md"; fs.writeFileSync(outputMd, response.markdown, "utf-8"); } } ``` # Parse & Extract Source: https://docs.landing.ai/ade/ade-parse-extract-sample ## Overview This tutorial walks you through how to parse a document with the Parse API and extract specific fields from it with the Extract API. This tutorial uses the [Python library](./ade-python) and [TypeScript library](./ade-typescript). In this tutorial, we will: * Parse this PDF: Wire Transfer Form * Extract these fields: **Bank Name** and **Total Invoice Amount** These examples require the [Python](./ade-python) or [TypeScript](./ade-typescript) client library. Before running a script, set your API key and install the library and any required dependencies. The scripts have been tested with PDF and PNG files and may work with other file types supported by ADE. ## 1. Download the Document to Process Download the Wire Transfer Form and save it to a local directory. ## 2. Create the Script Copy the script for your language and save it as `parse-extract.py` or `parse-extract.ts` in the same directory as the PDF. 
```python Python [expandable] theme={null} import json from pathlib import Path from landingai_ade import LandingAIADE # Initialize client (uses VISION_AGENT_API_KEY environment variable) client = LandingAIADE() # Define the extraction schema schema = json.dumps({ "type": "object", "properties": { "bank_name": { "description": "The official name of the bank where the account is held.", "x-alternativeNames": ["Name of Bank", "Financial Institution", "Bank"], "type": "string" }, "total_invoice_amount": { "description": "The total monetary amount of the invoice, including all charges and taxes.", "x-alternativeNames": ["Grand Total", "Amount Due", "Invoice Total"], "type": "number" } } }) # Parse the document # save_to is optional, but saves the full parse response, which is useful for # keeping a record and for other downstream processing tasks parse_response = client.parse( document=Path('wire-transfer.pdf'), model='dpt-2-latest', save_to='output' ) # Extract fields from the parsed output extract_response = client.extract( schema=schema, markdown=parse_response.markdown, model='extract-latest' ) # Save the extract results to a JSON file with open('output/wire-transfer_extract_output.json', 'w') as f: json.dump(extract_response.to_dict(), f, indent=2) ``` ```typescript TypeScript [expandable] theme={null} import LandingAIADE, { toFile } from "landingai-ade"; import fs from "fs"; // Initialize client (uses VISION_AGENT_API_KEY environment variable) const client = new LandingAIADE(); // Define the extraction schema const schema = JSON.stringify({ type: "object", properties: { bank_name: { description: "The official name of the bank where the account is held.", "x-alternativeNames": ["Name of Bank", "Financial Institution", "Bank"], type: "string" }, total_invoice_amount: { description: "The total monetary amount of the invoice, including all charges and taxes.", "x-alternativeNames": ["Grand Total", "Amount Due", "Invoice Total"], type: "number" } } }); // Parse the 
document // saveTo is optional, but saves the full parse response, which is useful for // keeping a record and for other downstream processing tasks const parseResponse = await client.parse({ document: fs.createReadStream("wire-transfer.pdf"), model: "dpt-2-latest", saveTo: "output" }); // Extract fields from the parsed output const extractResponse = await client.extract({ schema: schema, markdown: await toFile(Buffer.from(parseResponse.markdown), "document.md"), model: "extract-latest" }); // Save the extract results to a JSON file fs.mkdirSync("output", { recursive: true }); fs.writeFileSync( "output/wire-transfer_extract_output.json", JSON.stringify(extractResponse, null, 2) ); ``` ## 3. Run the Script Run the script from the same directory: ```bash Run Python theme={null} python parse-extract.py ``` ```bash Run TypeScript theme={null} npx tsx parse-extract.ts ``` ## 4. View Extraction Output The results are saved to an `output` folder in the same directory. View the extracted fields and metadata in `wire-transfer_extract_output.json`. ```json [expandable] theme={null} { "extraction": { "bank_name": "JPMorgan Chase Bank, N.A.", "total_invoice_amount": 15750.0 }, "extraction_metadata": { "bank_name": { "references": [ "4f64f8d9-ff3a-4c47-aeb5-2ab6eaa9ce7a" ], "value": "JPMorgan Chase Bank, N.A." }, "total_invoice_amount": { "references": [ "deeb001e-6b3e-4c4e-96b1-6f321521ad4f", "0-h" ], "value": 15750.0 } }, "metadata": { "credit_usage": 0.5396, "duration_ms": 11536, "filename": "upload.md", "job_id": "bec005b58d144096b0525af3aa6ed12d", "org_id": null, "version": "extract-20260314", "fallback_model_version": null, "schema_violation_error": null, "warnings": [] } } ``` ## Next Steps Now that you have a working script, you can: * Replace `wire-transfer.pdf` with any document you want to parse and extract from. * Modify the `schema` dictionary to extract different fields. For guidance, see [Extraction Schema (JSON)](./ade-extract-schema-json). 
* Use the Playground to build and test a schema before adding it to your code. See [Schema Wizard](./ade-extract-playground). * Link extracted fields back to their locations in the original document. See [Link Extracted Data to Document Locations](./ade-extract-grounding-sample). # Document Pre-Trained Transformers (Parsing Models) Source: https://docs.landing.ai/ade/ade-parse-models ## Parsing Models Overview A Document Pre-Trained Transformer (DPT) is the model that powers the parsing capabilities of the ADE Parsing APIs. The DPT identifies document layouts and chunks, then generates descriptive explanations (captions) for those chunks. ## Availability The ability to select a parsing model is available: * in the [Playground](#set-the-model-in-the-playground) * when calling the Parse or [Parse Jobs](https://docs.landing.ai/api-reference/tools/ade-parse-jobs) APIs * when using the [Python](https://github.com/landing-ai/ade-python) and [TypeScript](https://github.com/landing-ai/ade-typescript) libraries ## Model Versions and Snapshots The following table lists the available `model` values for the ADE Parse and ADE Async Parse APIs: | Model Values | Description | | ------------------- | --------------------------------------------------------------------------------- | | dpt-2 | The latest snapshot of DPT-2. | | dpt-2-latest | The latest snapshot of DPT-2. | | dpt-2-20250919 | The snapshot of DPT-2 released on September 19, 2025. | | dpt-2-20251103 | The snapshot of DPT-2 released on November 3, 2025. | | dpt-2-20260302 | The snapshot of DPT-2 released on March 2, 2026. [See improvements.](#dpt-2-20260302) | | dpt-2-20260410 | The snapshot of DPT-2 released on April 10, 2026. [See improvements.](#dpt-2-20260410) | | dpt-2-mini | The latest snapshot of DPT-2 mini. | | dpt-2-mini-20251003 | The snapshot of DPT-2 mini released on October 3, 2025. | | dpt-2-mini-20260302 | The snapshot of DPT-2 mini released on March 2, 2026. [See improvements.](#dpt-2-20260302) | | dpt-2-mini-latest | The latest snapshot of DPT-2 mini. 
| ### Why Model Versioning Matters When integrating the API, you have two options for specifying the model: 1. **Use a general model name** (like `dpt-2` or `dpt-2-latest`) to always get the newest version. This automatically gives you improvements and updates, but parsing results may change when new model versions are released. 2. **Use a specific snapshot** (like `dpt-2-20250919`) to pin to an exact model version. This ensures consistent parsing results over time, but you won't automatically receive improvements. If you use only a general model name like `dpt-2` in production, your application may produce different results when we release model updates. Consider whether you need consistent results or prefer to receive the latest improvements. ### Understanding Snapshots and -latest **Snapshots** are frozen versions of a model released on specific dates. Each snapshot maintains the same parsing behavior indefinitely, making your results predictable. The **-latest** suffix always points to the most recent snapshot of that model. ## DPT-2 DPT-2 was introduced in September 2025. It builds upon an earlier model and offers these advanced features: * **Agentic Table Captioning**: DPT-2 can parse large, complex, no-gridline, and merged-cell tables with unprecedented fidelity. Every cell is preserved, aligned, and made accessible, enabling cell-level grounding so you know exactly where values came from. * **Refined Figure Captioning**: Logos, seals, and small figures are now identified precisely and concisely, eliminating the noise of verbose descriptions. * **Smarter Layout Detection**: Fewer chunks are missed, even in messy scans. DPT-2 can even detect stamps inside tables and process them separately, which is critical for compliance workflows. * **Expanded Chunk Ontology**: Beyond text, tables, and figures, DPT-2 now recognizes attestation (signatures, stamps, seals), ID cards, logos, barcodes, and QR codes, ensuring all document elements are classified consistently. 
To learn more, go to [Chunk Types](./ade-chunk-types). ### dpt-2-20260410 The `dpt-2-20260410` snapshot builds on previous snapshots with these improvements: * **Improved cell parsing in forms and tables**: Text positioned at different locations within a cell is now captured more completely. * **Improved column alignment in complex tables**: Cell data now more accurately aligns with its corresponding column headers. ### dpt-2-20260302 The `dpt-2-20260302` snapshot builds on previous snapshots with several improvements, including: * **Table boundary detection**: Tables that were previously split into multiple chunks are now correctly identified as a single table. * **Improved large table accuracy**: Large tables are now parsed more accurately. * **Special characters returned as Unicode**: Characters such as asterisks are now returned as their Unicode characters (for example, `*`) rather than as spelled-out strings like `asterisk`. The table boundary detection and table parsing improvements in this snapshot are also included in `dpt-2-mini-20260302`. ### DPT-2 Availability The DPT-2 model can be used in these API endpoints: * [ADE Parse Jobs](https://docs.landing.ai/api-reference/tools/ade-parse-jobs) ## DPT-2 mini DPT-2 mini is a lightweight model optimized for simple, digitally native documents. It provides cost-effective parsing for straightforward document structures. DPT-2 mini is in Preview. This model is still in development and may not return accurate results. Do not use this model in production environments. ### Supported Features DPT-2 mini supports: * Digitally native documents, such as PDFs created from digital files. * English text. * Layout detection and document structure identification. * Simple tables. * All [chunk types](./ade-chunk-types), including paragraphs, figures, and more. The model transcribes any text present in image-based chunk types and generates concise descriptions for visual elements. 
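The support matrix above can be expressed as a simple client-side routing rule. This is an illustrative sketch, not an official API: the function name and input flags are assumptions you would derive from your own document metadata, and it simplifies the full list of criteria.

```python
def choose_parse_model(digitally_native: bool, language: str, complex_tables: bool) -> str:
    """Hypothetical helper: route simple documents to DPT-2 mini.

    Per the documented support matrix, DPT-2 mini handles digitally
    native English documents with simple tables; anything else is
    safer with DPT-2.
    """
    if digitally_native and language == "en" and not complex_tables:
        return "dpt-2-mini-latest"
    return "dpt-2-latest"


# A scanned or non-English document falls back to DPT-2.
print(choose_parse_model(digitally_native=False, language="en", complex_tables=False))
```

The returned string can be passed directly as the `model` parameter of a parse request.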
### Ideal Document Types DPT-2 mini is ideal for digitally native English documents with straightforward layouts, such as: * Business correspondence (letters, memos, emails) * Simple reports and documentation * Basic forms with key-value pairs * Invoices with simple tables * Digital contracts ### Limitations and When to Use DPT-2 Instead DPT-2 mini does not support: * Scanned documents or handwritten content. * Non-English languages. * Complex tables with multi-level headers, merged cells, or nested structures. * Very small fonts. * Full visual element analysis. Image-based [chunk types](./ade-chunk-types) (`figure`, `logo`, `card`, `attestation`, and `scan_code`) are identified and receive concise descriptions, but not the in-depth analysis that DPT-2 provides. If your use case requires any of the features that DPT-2 mini does not support, use DPT-2 instead. ### DPT-2 mini Availability The DPT-2 mini model can be used in these API endpoints: * [ADE Parse Jobs](https://docs.landing.ai/api-reference/tools/ade-parse-jobs) ## Set the Model in the API When calling the ADE Parse or [ADE Parse Jobs](https://docs.landing.ai/api-reference/tools/ade-parse-jobs) endpoint, you can set the model using the `model` parameter. If you omit the `model` parameter, the API uses the latest snapshot of the `dpt-2` model. For example, run the command below to use the latest snapshot of DPT-2. ```shell theme={null} curl -X POST 'https://api.va.landing.ai/v1/ade/parse' \ -H 'Authorization: Bearer YOUR_API_KEY' \ -F 'document=@document.pdf' \ -F 'model=dpt-2-latest' ``` ## Set the Model with the Library When using the library, you can set the model using the `model` parameter in the `parse()` function. If you omit the `model` parameter, the library uses the latest snapshot of the `dpt-2` model. 
For example, use this code to parse a document with the latest snapshot of DPT-2: ```python theme={null} from pathlib import Path from landingai_ade import LandingAIADE client = LandingAIADE() response = client.parse( document=Path("/path/to/document.pdf"), model="dpt-2-latest" ) ``` ## Set the Model in the Playground To switch between models in the Playground: 1. Load a document into the [Playground](https://va.landing.ai/my/playground/ade). 2. Ensure the **Parse** tab is open. 3. Select the model you want to use from the top right corner. Select a model # Parse Password-Protected Files Source: https://docs.landing.ai/ade/ade-parse-password Organizations that have [Zero Data Retention (ZDR)](./zdr) enabled can parse password-protected files. To parse a password-protected file, pass the document's password in the `password` parameter when calling the API. ## Supported File Types for Password-Protected Parsing The following file types support password-protected parsing: | Category | Extensions | | -------------- | -------------- | | PDF | PDF | | Text Documents | DOC, DOCX, ODT | | Presentations | PPT, PPTX | | Spreadsheets | XLSX | ## Parse Password-Protected Files with the API Add the `password` parameter to your request when parsing a password-protected file. The parameter is optional. If the file is not password-protected, the value is ignored. ```bash Parse theme={null} curl -X POST "https://api.va.landing.ai/v1/ade/parse" \ -H "Authorization: Bearer YOUR_API_KEY" \ -F "document=@path/to/file.pdf" \ -F "password=YOUR_DOCUMENT_PASSWORD" ``` ```bash Parse Jobs theme={null} curl -X POST "https://api.va.landing.ai/v1/ade/parse/jobs" \ -H "Authorization: Bearer YOUR_API_KEY" \ -F "document_url=https://example.com/path/to/file.pdf" \ -F "password=YOUR_DOCUMENT_PASSWORD" \ -F "output_save_url=https://example.com/path/to/output" ``` If you submit a password-protected file without the `password` parameter, the request returns a 422 error. 
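Because a protected file submitted without `password` fails with a 422, it can help to detect encryption client-side before uploading. Below is a standard-library heuristic sketch (the function name is an assumption, not part of the ADE client): it scans for the `/Encrypt` entry that encrypted PDFs carry in their trailer, and can misfire on unusual files, so treat it as a hint rather than a guarantee.

```python
def pdf_looks_encrypted(path: str) -> bool:
    # Encrypted PDFs reference an encryption dictionary via the
    # /Encrypt key in the file trailer. This is a byte-level heuristic,
    # not a full PDF parse.
    with open(path, "rb") as f:
        return b"/Encrypt" in f.read()
```

If this returns `True`, include the `password` parameter in your parse request.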
For more information, go to [Troubleshoot Parsing](./ade-parse-troubleshoot). ## Parse Password-Protected Files with Our Libraries ### Parse ```python Python theme={null} from pathlib import Path from landingai_ade import LandingAIADE client = LandingAIADE() # Replace with your file path response = client.parse( document=Path("/path/to/file/document"), password="YOUR_DOCUMENT_PASSWORD", ) print(response.chunks) # Save Markdown output (useful if you plan to run extract on the Markdown) with open("output.md", "w", encoding="utf-8") as f: f.write(response.markdown) ``` ```typescript TypeScript theme={null} import LandingAIADE from "landingai-ade"; import fs from "fs"; const client = new LandingAIADE(); // Replace with your file path const response = await client.parse({ document: fs.createReadStream("/path/to/file/document"), password: "YOUR_DOCUMENT_PASSWORD", }); console.log(response.chunks); // Save Markdown to a file if (response.markdown) { fs.writeFileSync("output.md", response.markdown, "utf-8"); } else { console.log("No 'markdown' field found in the response"); } ``` ### Parse Jobs ```python Python [expandable] theme={null} import time from landingai_ade import LandingAIADE client = LandingAIADE() # Step 1: Create a parse job job = client.parse_jobs.create( document_url="https://example.com/path/to/file.pdf", password="YOUR_DOCUMENT_PASSWORD", output_save_url="https://example.com/path/to/output", ) job_id = job.job_id print(f"Job {job_id} created.") # Step 2: Get the parsing results while True: response = client.parse_jobs.get(job_id) if response.status == "completed": print(f"Job {job_id} completed.") break print(f"Job {job_id}: {response.status} ({response.progress * 100:.0f}% complete)") time.sleep(5) ``` ```typescript TypeScript [expandable] theme={null} import LandingAIADE from "landingai-ade"; const client = new LandingAIADE(); // Step 1: Create a parse job const job = await client.parseJobs.create({ document_url: "https://example.com/path/to/file.pdf", 
password: "YOUR_DOCUMENT_PASSWORD", output_save_url: "https://example.com/path/to/output", }); const jobId = job.job_id; console.log(`Job ${jobId} created.`); // Step 2: Get the parsing results while (true) { const response = await client.parseJobs.get(jobId); if (response.status === "completed") { console.log(`Job ${jobId} completed.`); break; } console.log(`Job ${jobId}: ${response.status} (${(response.progress * 100).toFixed(0)}% complete)`); await new Promise(resolve => setTimeout(resolve, 5000)); } ``` # Save Parsed Chunks as Images Source: https://docs.landing.ai/ade/ade-parse-save-chunks-images-sample ## Overview Use this script to extract and save each parsed chunk as a separate PNG. This is useful for building datasets, analyzing chunk quality, or processing individual document regions. These examples require the [Python](./ade-python) or [TypeScript](./ade-typescript) client library. Before running a script, set your API key and install the library and any required dependencies. 
## Scripts ```python Python [expandable] theme={null} from pathlib import Path from datetime import datetime from landingai_ade import LandingAIADE from PIL import Image import pymupdf def save_chunks_as_images(parse_response, document_path, output_base_dir="groundings"): """Save each parsed chunk as a separate image file.""" # Create timestamped output directory timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") document_name = Path(document_path).stem output_dir = Path(output_base_dir) / f"{document_name}_{timestamp}" def save_page_chunks(image, chunks, page_num): """Save all chunks for a specific page.""" img_width, img_height = image.size # Create page-specific directory page_dir = output_dir / f"page_{page_num}" page_dir.mkdir(parents=True, exist_ok=True) for chunk in chunks: # Check if chunk belongs to this page if chunk.grounding.page != page_num: continue box = chunk.grounding.box # Convert normalized coordinates to pixel coordinates x1 = int(box.left * img_width) y1 = int(box.top * img_height) x2 = int(box.right * img_width) y2 = int(box.bottom * img_height) # Crop the chunk region chunk_img = image.crop((x1, y1, x2, y2)) # Save with descriptive filename filename = f"{chunk.type}.{chunk.id}.png" output_path = page_dir / filename chunk_img.save(output_path) print(f"Saved chunk: {output_path}") if document_path.suffix.lower() == '.pdf': pdf = pymupdf.open(document_path) total_pages = len(pdf) for page_num in range(total_pages): page = pdf[page_num] pix = page.get_pixmap(matrix=pymupdf.Matrix(2, 2)) # 2x scaling for clarity img = Image.frombytes("RGB", [pix.width, pix.height], pix.samples) # Save chunks for this page save_page_chunks(img, parse_response.chunks, page_num) pdf.close() else: # Load image file directly img = Image.open(document_path) if img.mode != "RGB": img = img.convert("RGB") # Save chunks for single page save_page_chunks(img, parse_response.chunks, 0) print(f"\nAll chunks saved to: {output_dir}") return output_dir # Initialize client (uses 
the API key from the VISION_AGENT_API_KEY environment variable) client = LandingAIADE() # Replace with your file path document_path = Path("/path/to/file/document") # Parse the document print("Parsing document...") parse_response = client.parse( document=document_path, model="dpt-2-latest" ) print("Parsing complete!") # Save chunks as images save_chunks_as_images(parse_response, document_path) ``` ```typescript TypeScript [expandable] theme={null} import LandingAIADE from "landingai-ade"; import fs from "fs"; import path from "path"; import { createCanvas, loadImage } from "canvas"; import { pdf } from "pdf-to-img"; async function saveChunksAsImages( parseResponse: any, documentPath: string, outputBaseDir: string = "groundings" ) { // Create timestamped output directory const timestamp = new Date().toISOString().replace(/[:.]/g, "-").slice(0, -5); const documentName = path.basename(documentPath, path.extname(documentPath)); const outputDir = path.join(outputBaseDir, `${documentName}_${timestamp}`); async function savePageChunks( imageBuffer: Buffer, chunks: any[], pageNum: number ) { // Load the page image const image = await loadImage(imageBuffer); const imgWidth = image.width; const imgHeight = image.height; // Create page-specific directory const pageDir = path.join(outputDir, `page_${pageNum}`); if (!fs.existsSync(pageDir)) { fs.mkdirSync(pageDir, { recursive: true }); } // Process each chunk for (const chunk of chunks) { // Check if chunk belongs to this page if (chunk.grounding.page !== pageNum) { continue; } const box = chunk.grounding.box; // Convert normalized coordinates to pixel coordinates const x1 = Math.floor(box.left * imgWidth); const y1 = Math.floor(box.top * imgHeight); const x2 = Math.floor(box.right * imgWidth); const y2 = Math.floor(box.bottom * imgHeight); // Calculate crop dimensions const width = x2 - x1; const height = y2 - y1; // Create canvas for cropped chunk const canvas = createCanvas(width, height); const ctx = 
canvas.getContext("2d"); // Draw the cropped region ctx.drawImage(image, x1, y1, width, height, 0, 0, width, height); // Save with descriptive filename const filename = `${chunk.type}.${chunk.id}.png`; const outputPath = path.join(pageDir, filename); const buffer = canvas.toBuffer("image/png"); fs.writeFileSync(outputPath, buffer); console.log(`Saved chunk: ${outputPath}`); } } const fileExtension = path.extname(documentPath).toLowerCase(); if (fileExtension === ".pdf") { // Convert PDF to images const document = await pdf(documentPath, { scale: 2.0 }); let pageNum = 0; for await (const page of document) { console.log(`Processing page ${pageNum}...`); await savePageChunks(page, parseResponse.chunks, pageNum); pageNum++; } } else { // Load image file directly const imageBuffer = fs.readFileSync(documentPath); await savePageChunks(imageBuffer, parseResponse.chunks, 0); } console.log(`\nAll chunks saved to: ${outputDir}`); return outputDir; } // Initialize client (uses the API key from the VISION_AGENT_API_KEY environment variable) const client = new LandingAIADE(); async function extractChunks() { // Replace with your file path const documentPath = "/path/to/file/document"; // Parse the document console.log("Parsing document..."); const parseResponse = await client.parse({ document: fs.createReadStream(documentPath), model: "dpt-2-latest" }); console.log("Parsing complete!"); // Save each chunk as a separate image await saveChunksAsImages(parseResponse, documentPath); } extractChunks(); ``` ## Directory Structure for Saved Images Images are saved with this structure: ``` groundings/ └── document_TIMESTAMP/ └── page_0/ └── ChunkType.CHUNK_ID.png ``` Where: * `TIMESTAMP` is the time and date the document was parsed (format: `YYYYMMDD_HHMMSS` for Python, ISO format for TypeScript) * `page_0` is the zero-indexed page number * `ChunkType` is the [chunk type](./ade-chunk-types) * `CHUNK_ID` is the unique chunk identifier (UUID format) Example output: ``` groundings/ └── 
document_20250117_143022/ ├── page_0/ │ ├── text.c5f81e1b-37d2-46bf-89e1-4983c1a36444.png │ ├── table.a2b91c3d-48e5-4f67-9123-5678abcdef12.png │ └── figure.e9f12345-6789-4abc-def0-123456789abc.png └── page_1/ ├── text.f1a23456-7890-4bcd-ef12-3456789abcde.png └── marginalia.b3c45678-9012-4def-5678-90abcdef1234.png ``` # Save Parsed Output Source: https://docs.landing.ai/ade/ade-parse-save-sample ## Overview Use this script to save the parsed output to JSON and Markdown files for downstream processing. These examples require the [Python](./ade-python) or [TypeScript](./ade-typescript) client library. Before running a script, set your API key and install the library and any required dependencies. ## Scripts ```python Python [expandable] theme={null} import json from pathlib import Path from landingai_ade import LandingAIADE # Initialize client (uses the API key from the VISION_AGENT_API_KEY environment variable) client = LandingAIADE() # Parse the document response = client.parse( document=Path("/path/to/file/document"), model="dpt-2-latest" ) # Create output directory if it doesn't exist output_dir = Path("ade_results") output_dir.mkdir(parents=True, exist_ok=True) # Save the response to JSON output_file = output_dir / "parse_results.json" with open(output_file, "w", encoding="utf-8") as f: json.dump(response.model_dump(), f, indent=2, default=str) # Save markdown output (useful if you plan to run extract on the markdown) markdown_file = output_dir / "output.md" with open(markdown_file, "w", encoding="utf-8") as f: f.write(response.markdown) print(f"Results saved to: {output_file}") print(f"Markdown saved to: {markdown_file}") ``` ```typescript TypeScript [expandable] theme={null} import LandingAIADE from "landingai-ade"; import fs from "fs"; import path from "path"; // Initialize client (uses the API key from the VISION_AGENT_API_KEY environment variable) const client = new LandingAIADE(); async function saveParsedOutput() { // Replace with your file path const 
response = await client.parse({
    document: fs.createReadStream("/path/to/file/document"),
    model: "dpt-2-latest"
  });

  // Create output directory if it doesn't exist
  const outputDir = "ade_results";
  if (!fs.existsSync(outputDir)) {
    fs.mkdirSync(outputDir, { recursive: true });
  }

  // Save the response to JSON
  const outputFile = path.join(outputDir, "parse_results.json");
  fs.writeFileSync(outputFile, JSON.stringify(response, null, 2), "utf-8");

  // Save Markdown output (useful if you plan to run extract on the Markdown)
  const markdownFile = path.join(outputDir, "output.md");
  fs.writeFileSync(markdownFile, response.markdown, "utf-8");

  console.log(`Results saved to: ${outputFile}`);
  console.log(`Markdown saved to: ${markdownFile}`);
}

saveParsedOutput();
```

# Troubleshoot Parsing

Source: https://docs.landing.ai/ade/ade-parse-troubleshoot

Use this section to troubleshoot issues encountered when calling the parse APIs:

* **[ADE Parse](https://docs.landing.ai/api-reference/tools/ade-parse)**: /v1/ade/parse
* **[ADE Parse Jobs](https://docs.landing.ai/api-reference/tools/ade-parse-jobs)**: /v1/ade/parse/jobs
* **[ADE Get Parse Jobs](https://docs.landing.ai/api-reference/tools/ade-get-parse-job)**: /v1/ade/parse/jobs/
* **[ADE List Parse Jobs](https://docs.landing.ai/api-reference/tools/ade-list-parse-jobs)**: /v1/ade/parse/jobs

## Common Status Codes

These status codes apply to all parse endpoints.

| Status Code | Name | Description | What to Do |
| ----------- | ----------------- | ----------------------------------------------------------------- | ---------- |
| 401 | Unauthorized | Missing or invalid API key. | Check that your `apikey` header is present and contains a valid [API key](./agentic-api-key). |
| 402 | Payment Required | Your account does not have enough credits to complete processing. | If you have multiple accounts, make sure you're using the correct [API key](./agentic-api-key). Add more credits to your account. |
| 429 | Too Many Requests | Rate limit exceeded. | Wait before retrying. Reduce request frequency and implement exponential backoff. |

## ADE Parse

This section covers errors for the ADE Parse API.

### Status Codes

| Status Code | Name | Description | What to Do |
| ----------- | --------------------- | ----------- | ---------- |
| 200 | Success | Document parsed successfully and returned immediately. | Continue with normal operations. |
| 206 | Partial Content | Document parsed but some pages failed during processing. | Review the `failed_pages` field in the metadata. See [Status 206](#status-206-partial-content). |
| 400 | Bad Request | Invalid request due to document download failure or unsupported model version. | Review error message for specific issue. See [Status 400](#status-400-bad-request). |
| 422 | Unprocessable Entity | Input validation failed. | Review your request parameters. See [Status 422](#status-422-unprocessable-entity). |
| 500 | Internal Server Error | All pages failed to process. | Retry. If the issue persists, contact [support@landing.ai](mailto:support@landing.ai). See [Status 500](#status-500-internal-server-error). |
| 504 | Gateway Timeout | Request processing exceeded the timeout limit (475 seconds). | Reduce document size or number of pages. See [Status 504](#status-504-gateway-timeout). |

### Status 206: Partial Content

**Applies to:** ADE Parse and ADE Get Parse Jobs

This response occurs when the document was parsed successfully but some pages failed during processing.
The response includes:

* A 206 status code
* Parsed content for successful pages
* A `failed_pages` array in the metadata listing which pages failed (zero-indexed)
* **For ADE Get Parse Jobs only:** A `failure_reason` field with details about the failures

Because the API returns at least partial results, the API call consumes credits.

**What to do:**

* Review the `failed_pages` field in the metadata to identify which pages failed.
* For ADE Get Parse Jobs, also review the `failure_reason` field for details about the failures.
* Check if the failed pages are corrupted or have unusual formatting.
* If the issue persists with specific pages, contact [support@landing.ai](mailto:support@landing.ai).

### Status 400: Bad Request

This status code indicates invalid request parameters or client-side errors. Review the specific error message to identify the issue.

#### Error: Failed to download document from URL

This error occurs when the API cannot download the document from the provided `document_url`.

**Error message:**

```
Failed to download document from URL: {error_details}
```

**What to do:**

* Verify the URL is accessible and returns valid content.
* Check network connectivity and URL permissions.
* Ensure the URL points to a supported document type.

#### Error: Invalid Document Format

The API converts text documents and presentations to PDFs before parsing them. This error occurs when a text document or presentation cannot be converted to PDF. The file may be password-protected or corrupted.

**Error message:**

```
Invalid document format. The file may be corrupted or password-protected.
```

**What to do:**

* Open the document in the appropriate application (such as Microsoft Word or Microsoft PowerPoint) to verify it is not corrupted, then resave it.

#### Error: Unsupported model

This error occurs when an invalid or unsupported model version is specified.
**Error message:**

```
Unsupported model: {version}
```

**What to do:**

* Check the API documentation for supported model versions.
* If you don't specify a version, the API uses the latest version by default.
* Verify the model version string is formatted correctly.

### Status 422: Unprocessable Entity

This status code indicates input validation failures. Review the error message and adjust your request parameters.

#### Error: Cannot provide both 'document' and 'document\_url'

This error occurs when both a document file and a URL to a document are provided in the same request.

**Error message:**

```
Cannot provide both 'document' and 'document_url'. Please provide only one.
```

**What to do:**

Choose one input method and remove the other from your request:

* Provide a document file using the `document` parameter.
* Provide a URL to a document using the `document_url` parameter.

#### Error: Must provide either 'document' or 'document\_url'

This error occurs when your request does not include either the `document` or `document_url` parameter.

**Error message:**

```
Must provide either 'document' or 'document_url'.
```

**What to do:**

Add one of these parameters to your request:

* Use the `document` parameter to upload a document file.
* Use the `document_url` parameter to provide a URL to a document.

#### Error: Invalid URL format

This error occurs when the `document_url` parameter contains an invalid URL.

**Error message:**

```
Invalid URL format: {url}
```

**What to do:**

* Verify the URL is properly formatted with a valid protocol (http\:// or https\://).
* Check for typos or missing characters in the URL.
* Ensure the URL is properly encoded if it contains special characters.

#### Error: PDF must not exceed X pages

This error occurs when the PDF page count exceeds your account's page limit.

**Error message:**

```
PDF must not exceed {limit} pages.
```

**What to do:**

* Reduce the PDF page count.
  To see the maximum number of pages allowed, go to [Rate Limits](./ade-rate-limits).

* Consider using the [ADE Parse Jobs](./ade-parse-async) API, which allows you to process longer documents.

#### Error: PDF contains zero pages

This error occurs when the PDF file has no pages.

**Error message:**

```
PDF contains zero pages. Please provide a PDF with at least one page.
```

**What to do:**

Use a valid PDF file that contains at least one page of content.

#### Error: Failed to open or read PDF

This error occurs when the PDF file is corrupted or cannot be opened.

**Error message:**

```
Failed to read PDF document: {error_details}
```

or

```
Failed to open PDF. Ensure it is a valid PDF file.
```

**What to do:**

* Use a valid, non-corrupted PDF file.
* Open the PDF in a PDF reader to verify it's not corrupted.
* Re-save or re-export the PDF.

#### Error: Document Is Password-Protected

This error occurs when you submit a password-protected file without providing the `password` parameter.

**Error message:**

```
Document is password-protected. Please provide the password parameter.
```

**What to do:**

* **If you have ZDR enabled**: Add the `password` parameter to your request. For more information, go to [Parse Password-Protected Files](./ade-parse-password).
* **If you don't have ZDR enabled**: Parsing password-protected files is not supported for your account. Remove the password and try again.

#### Error: Failed to Decrypt Document

This error occurs if you have [Zero Data Retention (ZDR)](./zdr) enabled, included the `password` parameter (see [Parse Password-Protected Files](./ade-parse-password)), and the password is incorrect or the file is corrupted.

**Error message:**

```
Failed to decrypt {document_type} document. The password is incorrect or the file is corrupted.
```

**What to do:**

* Verify the password is correct.
* Open the file in the appropriate application to confirm it is not corrupted.
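Several of the input-validation failures described in this section can be caught locally before a request is ever sent. The helper below is a hypothetical sketch (not part of the client library); its rules simply mirror the error messages above — the `document`/`document_url` exclusivity checks and the requirement that the `password` parameter is only usable on ZDR-enabled accounts:

```python
def validate_parse_request(document=None, document_url=None,
                           password=None, zdr_enabled=False):
    """Pre-flight checks mirroring the 422 and password errors above."""
    errors = []
    if document and document_url:
        errors.append("Cannot provide both 'document' and 'document_url'.")
    if not document and not document_url:
        errors.append("Must provide either 'document' or 'document_url'.")
    if password and not zdr_enabled:
        # Password-protected files are only supported with ZDR enabled.
        errors.append("Password-protected documents require ZDR to be enabled.")
    return errors

# Example: supplying both inputs at once trips the first check.
print(validate_parse_request(document="a.pdf",
                             document_url="https://example.com/a.pdf"))
```

Running checks like these client-side saves a round trip (and avoids burning rate-limit headroom) on requests the API would reject anyway.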
#### Error: Password-Protected Documents Not Supported for Your Account

This error occurs when the `password` parameter is included in the request but [Zero Data Retention (ZDR)](./zdr) is not enabled on your account.

**Error message:**

```
Password-protected documents are not currently supported for your account. Please remove the password from your document before uploading.
```

**What to do:**

* Remove the password from the document before uploading and try again.
* To parse password-protected files, enable ZDR on your account. For more information, go to [Zero Data Retention](./zdr).

#### Error: Multiple document files detected

This error occurs when multiple document files are included in the request.

**Error message:**

```
Multiple document files detected (X). Please provide only one document file.
```

**What to do:**

Send only one document file per request.

#### Error: File is empty

This error occurs when the uploaded file contains no data.

**Error message:**

```
File is empty.
```

**What to do:**

Ensure you are uploading a valid file with content (not an empty file).

#### Error: Failed to Convert Document to Supported Format

The API converts text documents and presentations to PDFs before parsing them. This error occurs when a document is converted to PDF successfully but the resulting PDF is empty or contains no extractable content.

**Error message:**

```
Failed to convert document to supported format
```

**What to do:**

* Verify the document contains actual content, not just blank pages or empty slides.
* Check that the document doesn't consist only of unsupported elements (such as embedded objects that cannot be converted).
* Open the document in the appropriate application (such as Microsoft Word or Microsoft PowerPoint) and resave it.

#### Error: Unsupported Format

This error occurs when the uploaded file format is not supported.

**Error message:**

```
Unsupported format: {mime_type} ((unknown)). Supported formats: {supported_formats}
```

**What to do:**

* Check the list of [supported file types](./ade-file-types).
* Convert your document to a supported format before uploading.
* Verify the file extension matches the actual file content.

#### Error: Unsupported Spreadsheet Format

This error occurs when the uploaded spreadsheet file format is not supported.

**Error message:**

```
Unsupported format: (unknown). Supported formats: .xlsx, .csv
```

**What to do:**

* Convert your spreadsheet to .xlsx or .csv format.
* Verify the file extension matches the actual file content.

#### Error: Spreadsheet File Too Large

This error occurs when the uploaded spreadsheet exceeds the 50 MB size limit.

**Error message:**

```
Spreadsheet file too large (max 50 MB). Please split into smaller files.
```

**What to do:**

* Split your spreadsheet into multiple smaller files.
* Remove unnecessary data or sheets to reduce file size.

#### Error: Invalid `custom_prompts` Value

**Applies to:** ADE Parse and ADE Parse Jobs

This error occurs when the `custom_prompts` parameter fails validation. Common causes include:

* The figure prompt exceeds 512 characters
* An unsupported key is used (only `figure` is supported)
* The value for the `figure` key is not a string
* The value is not a JSON object
* The value is not valid JSON

**Error messages:**

```
custom_prompts['figure'] must be 512 characters or fewer.
```

```
Input should be 'figure'
```

**What to do:**

* Ensure `custom_prompts` is a valid JSON string in object format: `{"figure": "your prompt"}`.
* Use only the `figure` key. Any other key will be rejected.
* Keep the figure prompt to 512 characters or fewer.
* For more information, see [Custom Prompts for Figure Descriptions](./ade-parse-custom-prompts).

#### Error: `custom_prompts` Not Supported for Model

**Applies to:** ADE Parse and ADE Parse Jobs

This error occurs when the `custom_prompts` parameter is used with a model that does not support it.
**Error message:**

```
custom_prompts is not supported for the '{model_name}' model. Please use a different model or remove the custom_prompts parameter.
```

**What to do:**

* Use a model that supports `custom_prompts`. For more information, see [Custom Prompts for Figure Descriptions](./ade-parse-custom-prompts).

### Status 500: Internal Server Error

This error indicates all pages in the document failed to process.

**Error message:**

```
Failed to process the document
```

**What to do:**

* Retry the request.
* Check if the document has unusual formatting or corrupted content.
* If the document is very large, process individual pages.
* If the error persists, contact [support@landing.ai](mailto:support@landing.ai).

### Status 504: Gateway Timeout

This error occurs when the parsing process exceeds the timeout limit (475 seconds).

**Error message:**

```
Request timed out after {seconds} seconds
```

**What to do:**

* Reduce the document size or number of pages.
* Split large documents into smaller files.
* If the error persists, contact [support@landing.ai](mailto:support@landing.ai).

## ADE Parse Jobs

This section covers errors for the ADE Parse Jobs API, which creates asynchronous parse jobs.

### Status Codes

| Status Code | Name | Description | What to Do |
| ----------- | -------------------- | ----------- | ---------- |
| 202 | Accepted | Job created successfully and queued for processing. | Use the returned `job_id` to check job status with the [ADE Get Parse Jobs](#ade-get-parse-jobs) API. |
| 400 | Bad Request | Invalid request parameters. | Review the error message for details. See [Status 400](#status-400-bad-request-2). |
| 422 | Unprocessable Entity | Input validation failed. | Review your request parameters. Common errors include missing required fields, ZDR configuration issues, or invalid input. See errors below. |
| 429 | Too Many Requests | Rate limit exceeded. | Wait before retrying. Reduce request frequency and implement exponential backoff. |

### Status 400: Bad Request

#### Error: output\_save\_url must be present if zeroDataRetention is enabled

This error occurs when Zero Data Retention (ZDR) is enabled but no output save URL is provided.

**Error message:**

```
output_save_url must be present if zeroDataRetention is enabled
```

**What to do:**

When using Zero Data Retention (ZDR), you must provide an `output_save_url` where the parsed results will be saved. The results are not returned in the API response when ZDR is enabled.

#### Error: Only document\_url is accepted if zeroDataRetention is enabled

This error occurs when Zero Data Retention (ZDR) is enabled but a document file is uploaded instead of providing a URL.

**Error message:**

```
Only document_url is accepted if zeroDataRetention is enabled
```

**What to do:**

When using Zero Data Retention (ZDR), you must use the `document_url` parameter to provide a URL to your document. Direct file uploads via the `document` parameter are not supported with ZDR.

### Status 500: Internal Server Error

This error indicates a server-side failure during job creation or enqueuing.

**Error message:**

```
Failed to enqueue job for processing
```

or

```
Internal server error during document parsing
```

or

```
Failed to create async job
```

or

```
Failed to upload document to S3
```

**What to do:**

* Retry the request.
* If the error persists, contact [support@landing.ai](mailto:support@landing.ai).

## ADE Get Parse Jobs

Use this section to troubleshoot issues encountered when calling the **[ADE Get Parse Jobs](https://docs.landing.ai/api-reference/tools/ade-get-parse-job)** API.
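In practice this endpoint is consumed in a polling loop that waits for the job to leave its queued and running states. A minimal sketch with a pluggable `get_job` callable — a hypothetical wrapper, so you can substitute a real client or HTTP call:

```python
import time

def poll_parse_job(get_job, interval=2.0, max_attempts=150):
    """Poll a parse job until it reaches a terminal status.

    `get_job` is any zero-argument callable returning the job as a dict
    with at least a `status` field (for example, a wrapper around the
    ADE Get Parse Jobs endpoint).
    """
    terminal = {"completed", "failed", "cancelled"}
    for _ in range(max_attempts):
        job = get_job()
        if job["status"] in terminal:
            return job
        # Still `pending` or `processing`; wait before asking again.
        time.sleep(interval)
    raise TimeoutError("Job did not reach a terminal status in time.")
```

Once the returned job is `completed`, read the results from the `data` field (small results) or `output_url` (large results or ZDR), as described in the sections that follow; for `failed` jobs, inspect `failure_reason`.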
### Status Codes

| Status Code | Name | Description | What to Do |
| ----------- | --------------------- | ----------- | ---------- |
| 200 | Success | Job status retrieved successfully. | Check the `status` field. If `completed`, results are in the `data` field (for small results) or `output_url` field (for large results or ZDR). |
| 206 | Partial Content | Job completed but some pages failed during processing. | Review the `failed_pages` field in the metadata and the `failure_reason` field. See [Status 206](#status-206-partial-content). |
| 404 | Not Found | Job with the specified ID not found. | Verify the job ID is correct. The job may belong to a different API key. |
| 422 | Unprocessable Entity | Input validation failed. | Verify the job ID format is correct. |
| 500 | Internal Server Error | Server error during job status retrieval. | Retry. If the issue persists, contact [support@landing.ai](mailto:support@landing.ai). |

### Job Status Values

The `status` field in the response indicates the current state of the job:

| Status | Description |
| ------------ | ----------- |
| `pending` | Job is queued and waiting to be processed. |
| `processing` | Job is currently being processed. |
| `completed` | Job completed successfully. Results are available in the response. |
| `failed` | Job failed during processing. Check the `failure_reason` field for details. |
| `cancelled` | Job was cancelled. |

### Understanding the Response

**For completed jobs:**

* If results are less than 1 MB, they appear in the `data` field as a `ParseResponse` object.
* If results are 1 MB or larger, the `output_url` field contains a presigned S3 URL (expires after 1 hour).
* If Zero Data Retention is enabled, results are always saved to your `output_save_url` and not included in the response.

**For failed jobs:**

* Check the `failure_reason` field for error details.
* Common failure reasons include document processing errors or timeout.

**Progress tracking:**

* The `progress` field shows completion as a decimal from 0 to 1 (e.g., 0.5 = 50% complete).
* Progress is based on the number of pages processed.

### Partial Content in Completed Jobs

A job can have status `completed` even if some pages failed to process. When this occurs, the API returns a 206 status code. For details on handling partial content, see [Status 206: Partial Content](#status-206-partial-content).

## ADE List Parse Jobs

Use this section to troubleshoot issues encountered when calling the **[ADE List Parse Jobs](https://docs.landing.ai/api-reference/tools/ade-list-parse-jobs)** API.

### Status Codes

| Status Code | Name | Description | What to Do |
| ----------- | --------------------- | ----------- | ---------- |
| 200 | Success | Job list retrieved successfully. | Review the list of jobs and their statuses. |
| 500 | Internal Server Error | Server error during job listing. | Retry. If the issue persists, contact [support@landing.ai](mailto:support@landing.ai). |

### Query Parameters

You can filter and paginate the job list using these parameters:

* `page`: Page number for pagination (default: 1)
* `page_size`: Number of jobs per page (default: 10)
* `status`: Filter by job status (`pending`, `processing`, `completed`, `failed`, `cancelled`)

### Understanding the Response

The response contains:

* `jobs`: Array of job summaries with `job_id`, `status`, `received_at` timestamp, and `progress`.
* `has_more`: Boolean indicating if more pages are available.
* For failed jobs, the `failure_reason` field provides error details.

## When Are Credits Consumed?
Credits are consumed only when the ADE Parse or ADE Get Parse Jobs API returns a 200 or 206 status code. All other responses, including errors, do not consume credits.

# Visualize Parsed Chunks: Draw Bounding Boxes

Source: https://docs.landing.ai/ade/ade-parse-visualize-sample

## Overview

Use this script to visualize parsed chunks by drawing color-coded bounding boxes on your document. Each chunk type uses a distinct color, making it easy to see how the document was parsed. The script identifies [chunk types](./ade-chunk-types) and table cells.

For PDFs, the script creates a separate annotated PNG for each page (`page_1_annotated.png`, `page_2_annotated.png`). For image files, the script creates a single `page_annotated.png`.

The image below shows an example output with bounding boxes drawn on the first page of a PDF (Python library):

Annotated PDF Page

These examples require the [Python](./ade-python) or [TypeScript](./ade-typescript) client library. Before running a script, set your API key and install the library and any required dependencies.
## Scripts

```python Python [expandable] theme={null}
from pathlib import Path

from landingai_ade import LandingAIADE
from PIL import Image, ImageDraw
import pymupdf

# Define colors for each chunk type
CHUNK_TYPE_COLORS = {
    "chunkText": (40, 167, 69),         # Green
    "chunkTable": (0, 123, 255),        # Blue
    "chunkMarginalia": (111, 66, 193),  # Purple
    "chunkFigure": (255, 0, 255),       # Magenta
    "chunkLogo": (144, 238, 144),       # Light green
    "chunkCard": (255, 165, 0),         # Orange
    "chunkAttestation": (0, 255, 255),  # Cyan
    "chunkScanCode": (255, 193, 7),     # Yellow
    "chunkForm": (220, 20, 60),         # Red
    "tableCell": (173, 216, 230),       # Light blue
    "table": (70, 130, 180),            # Steel blue
}

def draw_bounding_boxes(parse_response, document_path):
    """Draw bounding boxes around each chunk."""

    def create_annotated_image(image, groundings, page_num=0):
        """Create an annotated image with grounding boxes and labels."""
        annotated_img = image.copy()
        draw = ImageDraw.Draw(annotated_img)
        img_width, img_height = image.size

        for gid, grounding in groundings.items():
            # Check if grounding belongs to this page (for PDFs)
            if grounding.page != page_num:
                continue

            box = grounding.box

            # Extract coordinates from box
            left, top, right, bottom = box.left, box.top, box.right, box.bottom

            # Convert to pixel coordinates
            x1 = int(left * img_width)
            y1 = int(top * img_height)
            x2 = int(right * img_width)
            y2 = int(bottom * img_height)

            # Draw bounding box
            color = CHUNK_TYPE_COLORS.get(grounding.type, (128, 128, 128))  # Default to gray
            draw.rectangle([x1, y1, x2, y2], outline=color, width=3)

            # Draw label background and text
            label = f"{grounding.type}:{gid}"
            label_y = max(0, y1 - 20)
            draw.rectangle([x1, label_y, x1 + len(label) * 8, y1], fill=color)
            draw.text((x1 + 2, label_y + 2), label, fill=(255, 255, 255))

        return annotated_img

    if document_path.suffix.lower() == '.pdf':
        pdf = pymupdf.open(document_path)
        total_pages = len(pdf)
        base_name = document_path.stem

        for page_num in range(total_pages):
            page = pdf[page_num]
            pix = page.get_pixmap(matrix=pymupdf.Matrix(2, 2))  # 2x scaling
            img = Image.frombytes("RGB", [pix.width, pix.height], pix.samples)

            # Create and save annotated image
            annotated_img = create_annotated_image(img, parse_response.grounding, page_num)
            annotated_path = f"page_{page_num + 1}_annotated.png"
            annotated_img.save(annotated_path)
            print(f"Annotated image saved to: {annotated_path}")

        pdf.close()
    else:
        # Load image file directly
        img = Image.open(document_path)
        if img.mode != "RGB":
            img = img.convert("RGB")

        # Create and save annotated image
        annotated_img = create_annotated_image(img, parse_response.grounding)
        annotated_path = "page_annotated.png"
        annotated_img.save(annotated_path)
        print(f"Annotated image saved to: {annotated_path}")

    return None

# Initialize client (uses the API key from the VISION_AGENT_API_KEY environment variable)
client = LandingAIADE()

# Replace with your file path
document_path = Path("/path/to/file/document")

# Parse the document
print("Parsing document...")
parse_response = client.parse(
    document=document_path,
    model="dpt-2-latest"
)
print("Parsing complete!")

# Draw bounding boxes and create annotated images
draw_bounding_boxes(parse_response, document_path)
```

```typescript TypeScript [expandable] theme={null}
import LandingAIADE from "landingai-ade";
import fs from "fs";
import path from "path";
import { createCanvas, loadImage } from "canvas";
import { pdf } from "pdf-to-img";

// Define colors for each chunk type
const CHUNK_TYPE_COLORS: Record<string, [number, number, number]> = {
  chunkText: [40, 167, 69], // Green
  chunkTable: [0, 123, 255], // Blue
  chunkMarginalia: [111, 66, 193], // Purple
  chunkFigure: [255, 0, 255], // Magenta
  chunkLogo: [144, 238, 144], // Light green
  chunkCard: [255, 165, 0], // Orange
  chunkAttestation: [0, 255, 255], // Cyan
  chunkScanCode: [255, 193, 7], // Yellow
  chunkForm: [220, 20, 60], // Red
  tableCell: [173, 216, 230], // Light blue
  table: [70, 130, 180], // Steel blue
};

function rgbToString(rgb: [number, number, number]): string {
  return `rgb(${rgb[0]}, ${rgb[1]}, ${rgb[2]})`;
}

async function drawAnnotations(
  imageBuffer: Buffer,
  groundings: any,
  pageNum: number
): Promise<Buffer> {
  // Load the rendered PDF page image
  const image = await loadImage(imageBuffer);
  const canvas = createCanvas(image.width, image.height);
  const ctx = canvas.getContext("2d");

  // Draw the original PDF page
  ctx.drawImage(image, 0, 0);

  const imgWidth = canvas.width;
  const imgHeight = canvas.height;

  // Draw bounding boxes for each grounding
  for (const [gid, grounding] of Object.entries(groundings)) {
    const g = grounding as any;

    // Check if grounding belongs to this page
    if (g.page !== pageNum) {
      continue;
    }

    const box = g.box;

    // Convert normalized coordinates to pixel coordinates
    const x1 = Math.floor(box.left * imgWidth);
    const y1 = Math.floor(box.top * imgHeight);
    const x2 = Math.floor(box.right * imgWidth);
    const y2 = Math.floor(box.bottom * imgHeight);

    // Get color for this chunk type (default to gray)
    const color = CHUNK_TYPE_COLORS[g.type] || [128, 128, 128];
    const colorString = rgbToString(color);

    // Draw bounding box
    ctx.strokeStyle = colorString;
    ctx.lineWidth = 3;
    ctx.strokeRect(x1, y1, x2 - x1, y2 - y1);

    // Draw label background and text
    const label = `${g.type}:${gid.substring(0, 8)}`;
    const labelY = Math.max(0, y1 - 20);
    ctx.fillStyle = colorString;
    ctx.fillRect(x1, labelY, label.length * 8, 20);
    ctx.fillStyle = "white";
    ctx.font = "12px sans-serif";
    ctx.fillText(label, x1 + 2, labelY + 14);
  }

  return canvas.toBuffer("image/png");
}

async function drawBoundingBoxes(parseResponse: any, documentPath: string) {
  const fileExtension = path.extname(documentPath).toLowerCase();

  if (fileExtension === ".pdf") {
    // Convert PDF to images using pdf-to-img
    const document = await pdf(documentPath, { scale: 2.0 });
    let pageNum = 0;

    for await (const page of document) {
      console.log(`Processing page ${pageNum + 1}...`);

      // Draw annotations on the rendered PDF page
      const annotatedBuffer = await drawAnnotations(
        page,
        parseResponse.grounding,
        pageNum
      );

      // Save annotated image
      const outputPath = `page_${pageNum + 1}_annotated.png`;
      fs.writeFileSync(outputPath, annotatedBuffer);
      console.log(`Annotated image saved to: ${outputPath}`);

      pageNum++;
    }
  } else {
    // Load image file directly
    const image = await loadImage(documentPath);
    const canvas = createCanvas(image.width, image.height);
    const ctx = canvas.getContext("2d");

    // Draw the image
    ctx.drawImage(image, 0, 0);

    const imgWidth = canvas.width;
    const imgHeight = canvas.height;

    // Draw bounding boxes for page 0
    for (const [gid, grounding] of Object.entries(parseResponse.grounding)) {
      const g = grounding as any;
      if (g.page !== 0) continue;

      const box = g.box;
      const x1 = Math.floor(box.left * imgWidth);
      const y1 = Math.floor(box.top * imgHeight);
      const x2 = Math.floor(box.right * imgWidth);
      const y2 = Math.floor(box.bottom * imgHeight);

      const color = CHUNK_TYPE_COLORS[g.type] || [128, 128, 128];
      const colorString = rgbToString(color);

      ctx.strokeStyle = colorString;
      ctx.lineWidth = 3;
      ctx.strokeRect(x1, y1, x2 - x1, y2 - y1);

      const label = `${g.type}:${gid.substring(0, 8)}`;
      const labelY = Math.max(0, y1 - 20);
      ctx.fillStyle = colorString;
      ctx.fillRect(x1, labelY, label.length * 8, 20);
      ctx.fillStyle = "white";
      ctx.font = "12px sans-serif";
      ctx.fillText(label, x1 + 2, labelY + 14);
    }

    // Save annotated image
    const buffer = canvas.toBuffer("image/png");
    fs.writeFileSync("page_annotated.png", buffer);
    console.log("Annotated image saved to: page_annotated.png");
  }
}

// Initialize client (uses the API key from the VISION_AGENT_API_KEY environment variable)
const client = new LandingAIADE();

async function visualizeChunks() {
  // Replace with your file path
  const documentPath = "/path/to/file/document";

  // Parse the document
  console.log("Parsing document...");
  const parseResponse = await client.parse({
    document: fs.createReadStream(documentPath),
    model: "dpt-2-latest"
  });
  console.log("Parsing complete!");

  // Draw bounding boxes and create annotated images
  await drawBoundingBoxes(parseResponse, documentPath);
}

visualizeChunks();
```

# Playground

Source: https://docs.landing.ai/ade/ade-playground

## Overview

The [Playground](https://va.landing.ai/) is a web-based application that lets you quickly try out our APIs without writing any code. You can run the [Parse](./ade-separate-apis), [Split](./ade-split), and [Extract](./ade-extract) tools on your documents and see the output directly in the interface.

ADE Playground

## From Playground to Production

If you're new to Agentic Document Extraction, start with the [Playground](https://va.landing.ai/) rather than jumping straight into the APIs. The Playground gives you an immediate, visual way to explore what [Parse](./ade-separate-apis), [Split](./ade-split), and [Extract](./ade-extract) can do with your specific documents.

Use the Playground to:

* **Test your use case before writing any code.** Upload your actual documents and see how ADE handles them. Can it parse faxed medical records? Extract supplier IDs consistently across invoices from different vendors? Handle long financial tables in earnings reports? You'll get immediate answers.
* **Build extraction and split schemas visually.** The schema wizard guides you through creating, editing, and validating schemas using a point-and-click interface.
* **Export production-ready code.** Once you're satisfied with the results, the Playground generates the API or library code you need to replicate the same processing programmatically.

When you're ready to scale, use the [API](https://docs.landing.ai/api-reference/tools/ade-parse) or the [Python](./ade-python) or [TypeScript](./ade-typescript) libraries. The Playground is not intended for production use and has limitations such as lower rate limits.

## Files Are Organized into Projects

Files in the Playground are organized into **projects**. Projects let you group similar documents together.
You can then [create and apply extraction schemas](./ade-extract-playground) across multiple documents at once.

Recent Projects

To view all files you've uploaded to the Playground, click **Projects** in the left navigation bar.

View All Projects

## Chat with Document

After the Playground parses a file, you can use the **Chat with Document** tool to interact with the document. The Chat with Document tool is an LLM layered on top of the output of the Parse API. The chat tool showcases how the API accurately parses and understands document data, including element locations. Use the chat tool to get ideas for building custom solutions on top of the API.

The Chat with Document tool suggests a few prompts based on your document. You can also enter your own prompts.

Like the Playground itself, the Chat with Document tool is an example of what you can build, and is not part of the API.

## Manage Your Account in the Playground

The Playground is also used to create and manage your account. For more information, go to [Organizations & Members](./ade-members).

# Pricing & Billing

Source: https://docs.landing.ai/ade/ade-pricing

LandingAI offers various plans for Agentic Document Extraction. Pricing varies by region:

* [US](#us-pricing)
* [EU](#eu-pricing)

[Credit costs](#credit-costs) are identical for both the US and EU regions.

## US Pricing

See the full details for the pricing plans and upgrade in the [app](https://va.landing.ai/plan).

All users start on our pay-as-you-go Explore plan with free credits. After that, you can stay on the Explore plan and buy credits as needed, or upgrade to a Team or Enterprise account.
| | Explore | Team | Enterprise |
| ------------ | ------- | ---- | ---------- |
| Cost | Pay-as-you-go. Start with 1000 free credits. | Credit packs start at \$250/mo | Contact us |
| Credit value | \$1 buys 100 credits | \$1 buys 110 credits | Contact us |
## EU Pricing

Agentic Document Extraction is available in the EU at [https://va.eu-west-1.landing.ai/](https://va.eu-west-1.landing.ai/). See the full details for the pricing plans and upgrade in the [app](https://va.eu-west-1.landing.ai/plan).

We price in US dollars across all regions. Your payment will be processed in USD using current exchange rates.

All users start on our pay-as-you-go Explore plan with free credits. After that, you can stay on the Explore plan and buy credits as needed, or upgrade to a Team or Enterprise account.
| | Explore | Team | Enterprise |
| :-- | :-- | :-- | :-- |
| Cost | Pay-as-you-go.<br>Start with 1,000 free credits. | Credit packs start at \$325/mo | Contact us |
| Credit value | \$1 buys 76 credits | \$1 buys 84 credits | Contact us |
## Plan Features

| Feature | Explore | Team | Enterprise |
| :-- | :-- | :-- | :-- |
| **Core Capabilities** | | | |
| Parsing ([view supported file formats](./ade-file-types)) | | | |
| Intelligent Chunking | | | |
| Field Extraction | | | |
| List Extraction | | | |
| Table Extraction | | | |
| Figure Summaries | | | |
| Visual Grounding | | | |
| Split Classification | | | |
| Multilingual documents | | | |
| Confidence Scoring | | | |
| Custom Processing Pipeline | | | |
| **Organization Management & Support** | | | |
| Users | 1 | Unlimited users | Unlimited users |
| Support Level | Community | Enhanced | Designated |
| **Security & Deployment** | | | |
| Rate Limits ([see more details](./ade-rate-limits)) | Standard | Higher | Priority |
| Zero Data Retention (ZDR) | | ✓ | ✓ |
| HIPAA (BAA) (inc. ZDR) | | ✓ | ✓ |
| VPC and On-Prem Deployments | | | |
| SLA & Uptime Guarantees | | | |
| Single Sign-On (SSO) ([see more details](./ade-sso)) | | | |
## Credit Costs

Credit consumption is determined by the specific API endpoint called.

* [Parse APIs](#credit-costs-for-the-parse-apis)
* [Extract API](#credit-costs-for-the-extract-api)
* [Build Extract Schema API](#credit-costs-for-the-build-extract-schema-api)
* [Classify API](#credit-costs-for-the-classify-api)
* [Section API](#credit-costs-for-the-section-api)
* [Split API](#credit-costs-for-the-split-api)

### Credit Costs for the Parse APIs

This section explains credit use for these APIs:

* [ADE Parse API](https://docs.landing.ai/api-reference/tools/ade-parse)
* [ADE Parse Jobs](https://docs.landing.ai/api-reference/tools/ade-parse-jobs)

#### Documents

This pricing applies when using the standard parsing model. For DPT-2 mini pricing, see [DPT-2 mini](#dpt-2-mini). The number of credits used to parse a document is based on the number of pages and the features used in parsing, as shown in the following table.

| Feature | Credit Cost | Notes |
| :-- | :-- | :-- |
| Parsing | 3 credits/page | Each page processed includes parsing. |
| ZDR (HIPAA/BAA) | +1 credit/page | Additional charge when enabled.<br>Available on Team and Enterprise plans. |

#### Spreadsheets

This pricing applies when using the standard parsing model. For DPT-2 mini pricing, see [DPT-2 mini](#dpt-2-mini). The number of credits used to parse a spreadsheet is based on the number of sheets and embedded images. For supported spreadsheet formats, see [Supported File Types](./ade-file-types).

| Feature | Credit Cost | Notes |
| :-- | :-- | :-- |
| Parsing | 1 credit/sheet | Each sheet processed includes parsing. |
| Embedded images | 3 credits/image | Cost per embedded image, chart, or logo. |
| ZDR (HIPAA/BAA) | +1 credit/sheet<br>+1 credit/image | Additional charge when enabled.<br>Available on Team and Enterprise plans. |

**Example:** A spreadsheet with 1 sheet and 2 embedded images:

* Base cost: 1 credit (sheet) + 6 credits (2 images) = **7 credits**
* With ZDR enabled: 7 credits (base) + 1 credit (sheet) + 2 credits (2 images) = **10 credits**

#### DPT-2 mini

DPT-2 mini is a lightweight model that consumes fewer credits than other parsing models. The number of credits used to parse a document is based on the number of pages and the features used in parsing, and is rounded up to the nearest tenth decimal place.

| Feature | Credit Cost | Notes |
| :-- | :-- | :-- |
| Parsing | 1.5 credits/page | Each page processed includes parsing. |
| ZDR (HIPAA/BAA) | +1 credit/page | Additional charge when enabled.<br>Available on Team and Enterprise plans. |

### Credit Costs for the Extract API

This section explains credit use for this API:

* [ADE Extract API](https://docs.landing.ai/api-reference/tools/ade-extract)

The number of credits used to extract data is based on both the number of input characters and output characters, and is rounded up to the nearest tenth decimal place.

| Factor | Credit Cost | Notes |
| :-- | :-- | :-- |
| 5,000 input characters | 1 credit | Input characters are the number of characters in the Markdown file that is passed to the API. |
| 1,000 output characters | 1 credit | Output characters are the number of characters in the `extraction` object returned by the API, excluding whitespace used for indentation. (Other spaces, like the spaces after colons in key-value pairs, are not removed.) |

**Formula**

Here is the formula used to calculate the credit cost:

```text theme={null}
credits = (input characters ÷ 5,000) + (output characters ÷ 1,000)

Result is rounded up to the nearest tenth decimal place.
```

#### Sample Cost for the Extract API

**Input Characters**

Let's say you run the API on a Markdown file with the following content. The file has 2,270 characters. Therefore, **the number of input characters is 2,270.**

```
\n\nAROMA CAFE\n1211 Green Street\nNew York, NY 10005\n\n\n\n12/27/2019\n\n\n\n08:26 PM\n\n\n\nTAB 8 HOST MAGGIE\n\n\n\nAMEX ##########19883\n\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
QTYDESCAMT
1Americano$3.19
1Almond Scone$1.99
116oz Bottle Water$2.99
\n\n\n\nAMT: $8.70\n\nSUBTOTAL: $8.17\nTAX: $0.53\n\nBALANCE: $8.70\n\n\n\nSummary : This image displays a barcode consisting of a series of vertical black and white lines, used for encoding information in a machine-readable format.\n\nbarcode:\n Barcode Details :\n • The barcode is composed of alternating black and white vertical bars of varying thickness.\n • No human-readable numeric or alphanumeric string is visible below or above the barcode.\n • Barcode type is not explicitly indicated; the pattern resembles a standard linear barcode (e.g., Code 128 or EAN-13), but cannot be definitively identified without further context.\n • The barcode contains approximately 60–70 modules (individual bars and spaces).\n • Quiet zones (blank margins) are present on both the left and right sides of the barcode.\n • The barcode is horizontally oriented and centered on a white background.\n • No additional text, logo, or annotation is present.\n\nAnalysis :\n • The barcode is designed for machine scanning and does not convey any human-readable information in this image.\n • The absence of a numeric string or label means the encoded data cannot be determined visually.\n • The presence of quiet zones and consistent bar height suggests it is formatted for standard retail or inventory use.\n\n\n\nSAMPLE RECEIPT
```

**Output Characters**

Let's say you extract multiple fields from the Markdown file, and the following is the content in the `extraction` object returned by the API. The response has 475 characters.
Therefore, **the number of output characters is 475.**

```
{'merchant': {'name': 'AROMA CAFE', 'address': '1211 Green Street', 'city_state_zip': 'New York, NY 10005'}, 'transaction': {'date': '2019-12-27', 'time': '20:26', 'payment_method': 'AMEX ##########19883', 'tab_host': 'TAB 8 HOST MAGGIE'}, 'items': [{'qty': '1', 'desc': 'Americano', 'amt': '3.19'}, {'qty': '1', 'desc': 'Almond Scone', 'amt': '1.99'}, {'qty': '1', 'desc': '16oz Bottle Water', 'amt': '2.99'}], 'totals': {'subtotal': '8.17', 'tax': '0.53', 'total': '8.70'}}
```

**Calculate the Total**

Now that you have your input and output character counts, you can calculate the credit cost:

| Step | Calculation | Result |
| :-- | :-- | :-- |
| Input credits | 2,270 ÷ 5,000 | 0.454 |
| Output credits | 475 ÷ 1,000 | 0.475 |
| Total (before rounding) | 0.454 + 0.475 | 0.929 |
| Final cost | Round up to nearest tenth decimal place | 1.0 credit |

### Credit Costs for the Build Extract Schema API

This section explains credit use for this API:

* [ADE Build Extract Schema API](https://docs.landing.ai/api-reference/tools/ade-build-schema)

The number of credits used to build an extraction schema is based on both the number of input characters and output characters, and is rounded up to the nearest tenth decimal place.

| Factor | Credit Cost | Notes |
| :-- | :-- | :-- |
| 5,000 input characters | 1 credit | Input characters are the combined total number of characters in the `markdowns`, `markdown_urls`, `prompt`, and `schema` parameters. |
| 1,000 output characters | 1 credit | Output characters are the number of characters in the schema returned by the API. |
**Formula**

Here is the formula used to calculate the credit cost:

```text theme={null}
credits = (input characters ÷ 5,000) + (output characters ÷ 1,000)

Result is rounded up to the nearest tenth decimal place.
```

#### Sample Cost for the Build Extract Schema API

Let's say you have an existing schema built from two bank statements. You want to add and remove some fields in the schema, so you call the API to refine the schema.

**Input Characters**

The API includes the following parameters.

| Parameter | Content | Characters |
| :-- | :-- | --: |
| `markdowns` | The parsed output of the first bank statement | 9,116 |
| `markdowns` | The parsed output of the second bank statement | 7,314 |
| `schema` | The existing extraction schema that you want to refine | 9,089 |
| `prompt` | Instructions to add and remove specific fields | 214 |
| **Total** | | **25,733** |

**Output Characters**

Running the API call generates the following updated schema.

| Content | Characters |
| :-- | --: |
| Updated schema | 8,585 |

**Calculate the Total**

Now that you have your input and output character counts, you can calculate the credit cost.

| Step | Calculation | Result |
| :-- | :-- | :-- |
| Input credits | 25,733 ÷ 5,000 | 5.1466 |
| Output credits | 8,585 ÷ 1,000 | 8.585 |
| Total (before rounding) | 5.1466 + 8.585 | 13.7316 |
| Final cost | Round up to nearest tenth decimal place | 13.8 credits |

### Credit Costs for the Classify API

This section explains credit use for this API:

* [ADE Classify API](https://docs.landing.ai/api-reference/tools/ade-classify)

The number of credits used to classify a document is based on the number of pages, and is rounded up to the nearest tenth decimal place.
| Factor | Credit Cost | Notes |
| :-- | :-- | :-- |
| Per page | 0.5 credits/page | Each page processed by the Classify API consumes 0.5 credits. |

**Formula**

Here is the formula used to calculate the credit cost:

```text theme={null}
credits = number of pages × 0.5

Result is rounded up to the nearest tenth decimal place.
```

**Example:** A 10-page document costs 10 × 0.5 = **5.0 credits**.

### Credit Costs for the Section API

This section explains credit use for this API:

* [ADE Section API](https://docs.landing.ai/api-reference/tools/ade-section)

The number of credits used to section a document is based on both the number of input characters and output characters, and is rounded up to the nearest tenth decimal place.

| Factor | Credit Cost | Notes |
| :-- | :-- | :-- |
| 5,000 input characters | 1 credit | Input characters are the number of characters in the Markdown file passed to the API. |
| 1,000 output characters | 1 credit | Output characters are the number of characters in the `table_of_contents` field returned by the API. |

**Formula**

Here is the formula used to calculate the credit cost:

```text theme={null}
credits = (input characters ÷ 5,000) + (output characters ÷ 1,000)

Result is rounded up to the nearest tenth decimal place.
```

### Credit Costs for the Split API

This section explains credit use for this API:

* [ADE Split API](https://docs.landing.ai/api-reference/tools/ade-split)

The number of credits used to split documents is based on the number of input characters, and is rounded up to the nearest tenth decimal place.
| Factor | Credit Cost | Notes |
| :-- | :-- | :-- |
| 5,000 input characters | 1 credit | Input characters are the number of characters in the Markdown file that is passed to the API. |

**Formula**

Here is the formula used to calculate the credit cost:

```text theme={null}
credits = (input characters ÷ 5,000)

Result is rounded up to the nearest tenth decimal place.
```

## Overages

**Explore Plan**

If you're on the **Explore** plan, you can only use the credits you have purchased. If you run out of credits, you can purchase additional credits to continue using ADE.

**Team Plan (Subscription)**

If you're on a **Team** plan (a subscription plan), you have a pool of extra credits to draw from if you exceed your allocation for a billing cycle. This pool can include:

* **Transferred credits**: Credits from a previous plan that carried over when you upgraded (for example, unused Explore credits).
* **Prepaid credits**: "Pay-As-You-Go" credits purchased in advance. Prepaid credits cost \$0.01 per credit (US) or \$0.013 per credit (EU) and expire 1 year from the purchase date.

Credits that expire soonest are consumed first. For example, if you have 500 transferred Explore credits expiring in 2 months and 1,000 prepaid credits expiring in 10 months, the transferred credits are consumed first.

## Auto Recharge

Use Auto Recharge to automatically purchase credits when your balance falls below a set threshold. Auto Recharge is available on the **Explore** and **Team** plans. It is enabled by default on **Team** plans and disabled by default on the **Explore** plan.

### Auto Recharge Best Practices

Set your threshold high enough to cover your largest expected API call. If your threshold is too low, Auto Recharge may not trigger before your balance is exhausted, and the API call will return a `402` status (Payment Required).
For example, if a typical job consumes 1,500 credits, set your threshold above 1,500 credits.

### How to Manage Auto Recharge

1. Log in to [https://va.landing.ai](https://va.landing.ai).
2. Go to the **Plan & Billing** page (to navigate there manually, click your profile icon at the bottom left corner of the page and click **Plan & Billing**).
3. If you're not currently in the correct organization, select your organization from the drop-down menu.
4. Click the **Plan** tab.
5. Click **Add Credits**. (If you're on the **Explore** plan, click **Buy Credits**.)
6. If Auto Recharge is not already enabled, turn on **Enable Auto Recharge**.
7. In the **When credit balance goes below** field, enter the threshold that will trigger Auto Recharge. For example, entering 5,000 means credits are purchased automatically when your balance reaches 5,000 or fewer. By default, this is 10% of the credits allocated for your billing cycle.
8. In the **Add to** field, enter the number of credits you want to purchase. By default, this is 100% of the credits allocated for your billing cycle.
9. Click **Next** and follow the on-screen prompts.

## Upgrade Team Plans

You can upgrade your Team plan from one tier to another. For example, you can upgrade from the tier that includes 55K credits per month to the tier that includes 110K credits per month. If you upgrade your Team plan, the changes take effect immediately.

### What to Expect When You Upgrade

**Your current billing cycle ends:**

* Your current billing cycle ends on the upgrade date.

**Your new billing cycle starts:**

* A new billing cycle starts on the upgrade date.
* You are charged for the new plan.
* Your unused credits from the previous plan are transferred to the allocation for the new billing cycle and expire at the end of that cycle. These transferred credits can be transferred again if you upgrade before they expire.

### How to Upgrade Your Plan

1. Log in to [https://va.landing.ai/home](https://va.landing.ai/home).
2.
Go to the **Plan & Billing** page (to navigate there manually, click your profile icon at the bottom left corner of the page and click **Plan & Billing**).
3. If you're not currently in the correct organization, select your organization from the drop-down menu.
4. Click the **Plan** tab.
5. Click **Manage** > **Change Plan**.
6. Select the plan you want and follow the on-screen prompts.

## Downgrade Team Plans

You can downgrade your Team plan from one tier to another. If you downgrade your Team plan, the changes take effect at the end of the current billing cycle. If you have unused credits when you downgrade, those credits do not transfer to your new plan. You have until the end of your current billing cycle to use the remaining credits.

### How to Downgrade Your Plan

1. Log in to [https://va.landing.ai/home](https://va.landing.ai/home).
2. Go to the **Plan & Billing** page (to navigate there manually, click your profile icon at the bottom left corner of the page and click **Plan & Billing**).
3. If you're not currently in the correct organization, select your organization from the drop-down menu.
4. Click the **Plan** tab.
5. Click **Manage** > **Change Plan**.
6. Select the plan you want and follow the on-screen prompts.

## Credit Expiration
| Credit Type | Expiration |
| :-- | :-- |
| Free credits included in the Explore plan | Expire 90 days after you create your account |
| "Pay-As-You-Go" credits purchased (includes credits purchased via Auto Recharge) | Expire 1 year from the purchase date |
| Credits allocated as part of a subscription or Enterprise plan | Credits are allocated at the start of each billing cycle (for subscription plans) or subscription term (for Enterprise plans). Unused credits expire at the end of that period. |
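The first two expiration rules translate directly into date arithmetic (a sketch with hypothetical dates, for illustration only; this is not an SDK feature):

```python
from datetime import date, timedelta

account_created = date(2026, 1, 15)   # hypothetical account-creation date
credits_purchased = date(2026, 3, 1)  # hypothetical purchase date

# Free Explore credits expire 90 days after account creation
free_expiry = account_created + timedelta(days=90)

# "Pay-As-You-Go" credits expire 1 year from the purchase date
prepaid_expiry = credits_purchased.replace(year=credits_purchased.year + 1)

print(free_expiry)     # 2026-04-15
print(prepaid_expiry)  # 2027-03-01
```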
## FAQs

### Plans, Credits, and Billing

**What are credits?**

Credits are usage units that get consumed each time you process a document. Every document processing task, like parsing or field extraction, uses credits.

**How much does a credit cost?**

The cost of a credit depends on your plan. Each plan progressively includes more credits per dollar.

| Plan | Credit Value |
| :-- | :-- |
| **Explore** | 1 credit per 1¢ |
| **Team** | 1.1 credits per 1¢ |
| **Enterprise** | Contact us |

**Do I get free credits when I sign up?**

Yes. When you sign up, you're automatically on the **Explore** plan, which includes 1000 free credits. These credits are perfect for exploring the ADE document processing capabilities, building prototypes, and testing how ADE integrates with your workflow.

**Which plan is right for me?**

Each ADE plan is designed to meet the volume and feature needs of a different use case. If you're a solo developer, the **Explore** plan might be the best fit for you. But if you're processing a high number of documents each month and need zero data retention, then the **Team** plan might be better suited for you. If you start on the **Explore** plan and upgrade to another plan, unused **Explore** credits are transferred to your account and are consumed if you go over the allocated credits for your billing cycle.

**Do I need a credit card to get started?**

No, you do not need to put down a credit card to get started with ADE! When you sign up, you're automatically on the **Explore** plan, which includes free credits and doesn't require upfront payment.

**How am I billed for usage?**

You are billed based on the number of credits consumed by your API requests, regardless of how you interact with ADE. Whether you call the API directly, use one of our libraries, or test in the Playground, all requests count toward your credit usage.

**Are partial credits rounded?**

Yes. Whenever an API operation results in partial credits, the cost is rounded up to the nearest tenth decimal place. For example, if a calculation results in 1.67 credits, the cost is rounded up to 1.7 credits.
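The round-up rule can be sketched in a few lines of Python (an illustrative helper, not an official SDK utility; the second sample value comes from the Extract worked example earlier on this page):

```python
import math

def round_up_tenth(raw_credits: float) -> float:
    """Round a raw credit amount up to the nearest tenth."""
    return math.ceil(raw_credits * 10) / 10

print(round_up_tenth(1.67))                         # 1.7
print(round_up_tenth(2_270 / 5_000 + 475 / 1_000))  # 1.0 (0.454 + 0.475 = 0.929, rounded up)
```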
**I was on the original pay-as-you-go plan. What happened to my funds?**

If you were on the original "pay-as-you-go plan" and added funds before the credit-based system was implemented, your funds have been converted to credits. The conversion is \$0.01 to 1 credit, which ensures that you retain your original pricing of parsing costing \$0.03 per page. If you were on an enterprise plan before the credit-based system was implemented, there are no changes to your plan.

**What counts as a page?**

Credits are consumed based on the number of pages processed. When processing PDFs, a page (regardless of dimensions) is simply a page in the PDF. When processing images, each image counts as a page.

**How is billing processed?**

All billing is securely processed through Stripe. For **Team** plans, you'll be charged at the beginning of each billing cycle and immediately receive your credits for that period.

**Can I resubscribe to a Team plan?**

Yes, you can resubscribe to a Team plan when your organization is in "read-only" mode. To resubscribe, click the **Re-subscribe** button that displays at the top of the page. Or, resubscribe on the **Plan & Billing** page:

1. Log in to [https://va.landing.ai/](https://va.landing.ai/).
2. Go to the **Plan & Billing** page (to navigate there manually, click your profile icon at the bottom left corner of the page and click **Plan & Billing**).
3. If you're not currently in the correct organization, select your organization from the drop-down menu.
4. Click the **Re-subscribe** button and follow the on-screen prompts.

**Can I cancel my Team plan subscription?**

Yes, you can cancel your subscription to a Team plan. If you cancel your subscription, the cancellation takes effect at the end of the current billing cycle. Once the cancellation is in effect, the organization is in "read-only" mode. To cancel a subscription:

1. Log in to [https://va.landing.ai/](https://va.landing.ai/).
2. Go to the **Plan & Billing** page (to navigate there manually, click your profile icon at the bottom left corner of the page and click **Plan & Billing**).
3. If you're not currently in the correct organization, select your organization from the drop-down menu.
4.
Go to **Manage** > **Cancel Subscription** and follow the on-screen prompts.

**How do I update my billing information?**

If you are on a Team plan, you can update your billing information, including your credit card and billing address, directly in the user interface. To do this:

1. Log in to [https://va.landing.ai/](https://va.landing.ai/).
2. Go to the **Plan & Billing** page (to navigate there manually, click your profile icon at the bottom left corner of the page and click **Plan & Billing**).
3. If you're not currently in the correct organization, select your organization from the drop-down menu.
4. Click the **Billing** tab.
5. To update your credit card information, click **Update Card**. Update your information in the pop-up window and click **Save card**.
6. To update your billing address, click **Manage**. This opens your billing information in Stripe.
7. Click **Update Information**. Update your billing information and click **Save**.

If you are on an Enterprise plan and need to update your payment information or billing address, contact [support@landing.ai](mailto:support@landing.ai).

**Do invoices include sales tax, VAT, or GST?**

No. LandingAI does not collect or remit sales tax, VAT, or GST on invoices for ADE. Our invoices are issued by a U.S. entity. If your local laws require tax on digital services, you are responsible for assessing, reporting, and paying any such taxes to your local authority (including reverse-charge where applicable). If you have questions about your obligations, please consult your tax advisor or local tax authority.

**Can my tax ID appear on my invoice?**

Not at this time. We do not display customer tax IDs on invoices. LandingAI is not currently registered to collect VAT/GST. Prices are quoted exclusive of taxes, and customers remain responsible for any taxes due locally (e.g., under reverse-charge rules).

**Can an issued invoice be modified or reissued?**

No. For compliance and audit integrity, issued invoices cannot be modified or reissued. Please make sure your billing profile is correct before your next invoice is created.

**What if my plan is no longer offered?**

Pricing plans change over time. This page documents only currently available plans. If you are on a plan that is no longer offered, you remain on your current plan. For questions, contact [support@landing.ai](mailto:support@landing.ai).

### General Questions about Agentic Document Extraction

**Can ADE parse documents in multiple languages?**

Yes, ADE can parse text from multiple languages. To see the full list of supported languages, go to [Supported Languages](./ade-languages).

**Can ADE parse handwritten text?**

Yes, ADE can parse handwritten text, including handwritten text in different languages.

**Is ADE HIPAA compliant?**

ADE supports HIPAA compliance on its Team and Enterprise plans. To ensure HIPAA compliance, you must:

* Enable the Zero Data Retention (ZDR) option.
* Have a signed Business Associate Agreement (BAA) in place with LandingAI. To initiate the BAA process, submit your request through the form on the [Organization Settings](https://va.landing.ai/settings/organization/general) page (available after ZDR is enabled).

To learn more about our HIPAA compliance, check out our [Trust Center](https://trust.landing.ai/) and [Security & Compliance](https://landing.ai/security-at-landingai) page.

**Does ADE support zero data retention?**

ADE supports zero data retention (ZDR) on its **Team** and **Enterprise** plans. To learn more about the ZDR option, check out [Zero Data Retention](./zdr).

**Is LandingAI SOC 2 compliant?**

LandingAI is SOC 2 Type II compliant. To learn more, check out our [Trust Center](https://trust.landing.ai/) and [Security & Compliance](https://landing.ai/security-at-landingai) page.

# Python Library

Source: https://docs.landing.ai/ade/ade-python

The `landingai-ade` library is a lightweight Python library you can use for parsing documents, classifying pages, extracting data, generating tables of contents, and splitting documents into sub-documents. The library is automatically generated from our API specification, ensuring you have access to the latest endpoints and parameters.
## Install the Library

```bash theme={null}
pip install landingai-ade
```

## Set the API Key as an Environment Variable

To use the library, first [generate an API key](https://va.landing.ai/my/settings/api-key). Save the key to a `.zshrc` file or another secure location on your computer. Then export the key as an environment variable.

```bash theme={null}
export VISION_AGENT_API_KEY=
```

For more information about API keys and alternate methods for setting the API key, go to [API Key](./agentic-api-key).

## Use with EU Endpoints

By default, the library uses the US endpoints. If your API key is from the EU endpoint, set the `environment` parameter to `eu` when initializing the client.

```python theme={null}
from pathlib import Path

from landingai_ade import LandingAIADE

client = LandingAIADE(
    environment="eu",
)

# ... rest of your code
```

For more information about using ADE in the EU, go to [European Union (EU)](./ade-eu).

## Parse: Getting Started

The `parse` function converts documents into structured markdown with chunk and grounding metadata. Use these examples as guides to get started with parsing with the library.

### Parse Local Files

Use the `document` parameter to parse files from your filesystem. Pass the file path as a `Path` object.

```python theme={null}
from pathlib import Path

from landingai_ade import LandingAIADE

client = LandingAIADE()

# Replace with your file path
response = client.parse(
    document=Path("/path/to/file/document"),
    model="dpt-2-latest",
    save_to="output_folder"  # optional: saves as {input_file}_parse_output.json
)

print(response.chunks)

# Save Markdown output (useful if you plan to run extract on the Markdown)
with open("output.md", "w", encoding="utf-8") as f:
    f.write(response.markdown)
```

### Parse Remote URLs

Use the `document_url` parameter to parse files from remote URLs (http, https, ftp, ftps).
```python theme={null}
from landingai_ade import LandingAIADE

client = LandingAIADE()

# Parse a remote file
response = client.parse(
    document_url="https://example.com/document.pdf",
    model="dpt-2-latest"
)

print(response.chunks)

# Save Markdown output (useful if you plan to run extract on the Markdown)
with open("output.md", "w", encoding="utf-8") as f:
    f.write(response.markdown)
```

### Set Parameters

The `parse` function accepts optional parameters to customize parsing behavior. To see all available parameters, go to [ADE Parse API](https://docs.landing.ai/api-reference/tools/ade-parse). Pass these parameters directly to the `parse()` function.

```python theme={null}
from pathlib import Path

from landingai_ade import LandingAIADE

client = LandingAIADE()

response = client.parse(
    document=Path("/path/to/document.pdf"),
    model="dpt-2-latest",
    split="page"
)
```

### Parse Jobs

The `parse_jobs` function enables you to asynchronously parse documents that are up to 1,000 pages or 1 GB. For more information about parse jobs, go to [Parse Large Files (Parse Jobs)](./ade-parse-async).

Here is the basic workflow for working with parse jobs:

1. Start a parse job.
2. Copy the `job_id` in the response.
3. Get the results from the parsing job with the `job_id`.
This script contains the full workflow:

```python [expandable] theme={null}
import time
from pathlib import Path

from landingai_ade import LandingAIADE

client = LandingAIADE()

# Step 1: Create a parse job
job = client.parse_jobs.create(
    document=Path("/path/to/file/document"),
    model="dpt-2-latest"
)
job_id = job.job_id
print(f"Job {job_id} created.")

# Step 2: Get the parsing results
while True:
    response = client.parse_jobs.get(job_id)
    if response.status == "completed":
        print(f"Job {job_id} completed.")
        break
    print(f"Job {job_id}: {response.status} ({response.progress * 100:.0f}% complete)")
    time.sleep(5)

# Step 3: Access the parsed data
print("Global markdown:", response.data.markdown[:200] + "...")
print(f"Number of chunks: {len(response.data.chunks)}")

# Save Markdown output (useful if you plan to run extract on the Markdown)
with open("output.md", "w", encoding="utf-8") as f:
    f.write(response.data.markdown)
```

#### List Parse Jobs

To list all async parse jobs associated with your API key, run this code:

```python theme={null}
from landingai_ade import LandingAIADE

client = LandingAIADE()

# List all jobs
response = client.parse_jobs.list()

for job in response.jobs:
    print(f"Job {job.job_id}: {job.status}")
```

### Parse Output

The `parse` function returns a `ParseResponse` object with the following fields:

* **`chunks`**: List of `Chunk` objects, one for each parsed region
* **`markdown`**: Complete Markdown representation of the document
* **`metadata`**: Processing information (credit usage, duration, filename, job ID, page count, version)
* **`splits`**: List of `Split` objects organizing chunks by page or section
* **`grounding`**: Dictionary mapping chunk IDs to detailed grounding information

For detailed information about the response structure, chunks, grounding, and bounding box coordinates, go to [JSON Response](./ade-json-response).
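As a further illustration of these fields, chunk IDs can be grouped by page in a single pass. The `Chunk` and `Grounding` dataclasses below are stand-ins so the sketch runs on its own; in practice the chunks come from `client.parse(...).chunks`:

```python
from collections import defaultdict
from dataclasses import dataclass

# Stand-in types mirroring the ParseResponse fields described above
@dataclass
class Grounding:
    page: int

@dataclass
class Chunk:
    id: str
    type: str
    markdown: str
    grounding: Grounding

chunks = [
    Chunk("c1", "text", "# Title", Grounding(0)),
    Chunk("c2", "table", "| a | b |", Grounding(0)),
    Chunk("c3", "text", "Body text", Grounding(1)),
]

# Group chunk IDs by page, preserving reading order
by_page: dict[int, list[str]] = defaultdict(list)
for chunk in chunks:
    by_page[chunk.grounding.page].append(chunk.id)

print(dict(by_page))  # {0: ['c1', 'c2'], 1: ['c3']}
```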
#### Common Use Cases for ParseResponse Fields

**Access all text chunks:**

```python theme={null}
for chunk in response.chunks:
    if chunk.type == 'text':
        print(f"Chunk {chunk.id}: {chunk.markdown}")
```

**Filter chunks by page:**

```python theme={null}
page_0_chunks = [chunk for chunk in response.chunks if chunk.grounding.page == 0]
```

**Get chunk locations:**

```python theme={null}
for chunk in response.chunks:
    box = chunk.grounding.box
    print(f"Chunk at page {chunk.grounding.page}: ({box.left}, {box.top}, {box.right}, {box.bottom})")
```

**Access detailed chunk types from grounding dictionary:**

```python theme={null}
for chunk_id, grounding in response.grounding.items():
    print(f"Chunk {chunk_id} has type: {grounding.type}")
```

## Extract: Getting Started

The `extract` function extracts structured data from Markdown content using extraction schemas. Use these examples as guides to get started with extracting with the library.

**Pass Markdown Content**

The library supports a few methods for passing the Markdown content for extraction:

* Extract data directly from the [parse response](#extract-from-parse-response)
* Extract data from a local [Markdown file](#extract-from-markdown-files)
* Extract data from a Markdown file at a remote URL: `markdown_url="https://example.com/file.md"`

**Pass the Extraction Schema**

The library supports a few methods for passing the extraction schema:

* [Pydantic models](#extraction-with-pydantic)
* [JSON schema (inline)](#extraction-with-json-schema-inline)
* [JSON schema file](#extraction-with-json-schema-file)

### Extract from Parse Response

After parsing a document, you can pass the markdown string directly from the `ParseResponse` to the extract function without saving it to a file.
```python theme={null} import json from pathlib import Path from landingai_ade import LandingAIADE # Define your extraction schema schema_dict = { "type": "object", "properties": { "employee_name": { "type": "string", "description": "The employee's full name" } } } client = LandingAIADE() schema_json = json.dumps(schema_dict) # Parse the document parse_response = client.parse( document=Path("/path/to/document.pdf"), model="dpt-2-latest" ) # Extract data using the markdown string from parse response extract_response = client.extract( schema=schema_json, markdown=parse_response.markdown, # Pass markdown string directly model="extract-latest", save_to="output_folder" # optional: saves as {input_file}_extract_output.json ) # Access the extracted data print(extract_response.extraction) ``` ### Extract from Markdown Files If you already have a Markdown file (from a previous parsing operation), you can extract data directly from it. Use the `markdown` parameter for local markdown files or `markdown_url` for remote markdown files. 
```python [expandable] theme={null} import json from pathlib import Path from landingai_ade import LandingAIADE # Define your extraction schema schema_dict = { "type": "object", "properties": { "employee_name": { "type": "string", "description": "The employee's full name" }, "employee_ssn": { "type": "string", "description": "The employee's Social Security Number" }, "gross_pay": { "type": "number", "description": "The gross pay amount" } } } client = LandingAIADE() schema_json = json.dumps(schema_dict) # Extract from a local markdown file extract_response = client.extract( schema=schema_json, markdown=Path("/path/to/output.md"), model="extract-latest" ) # Or extract from a remote markdown file extract_response = client.extract( schema=schema_json, markdown_url="https://example.com/document.md", model="extract-latest" ) # Access the extracted data print(extract_response.extraction) ``` ### Extraction with Pydantic Use Pydantic models to define your extraction schema in a type-safe way. The library provides a helper function to convert Pydantic models to JSON schemas. 
```python [expandable] theme={null} from pathlib import Path from landingai_ade import LandingAIADE from landingai_ade.lib import pydantic_to_json_schema from pydantic import BaseModel, Field # Define your extraction schema as a Pydantic model class PayStubData(BaseModel): employee_name: str = Field(description="The employee's full name") employee_ssn: str = Field(description="The employee's Social Security Number") gross_pay: float = Field(description="The gross pay amount") # Initialize the client client = LandingAIADE() # First, parse the document to get markdown parse_response = client.parse( document=Path("/path/to/pay-stub.pdf"), model="dpt-2-latest" ) # Convert Pydantic model to JSON schema schema = pydantic_to_json_schema(PayStubData) # Extract structured data using the schema extract_response = client.extract( schema=schema, markdown=parse_response.markdown, model="extract-latest" ) # Access the extracted data print(extract_response.extraction) # Access extraction metadata to see which chunks were referenced print(extract_response.extraction_metadata) ``` ### Extraction with JSON Schema (Inline) Define your extraction schema directly as a JSON string in your script. 
```python [expandable] theme={null} import json from pathlib import Path from landingai_ade import LandingAIADE # Define your extraction schema as a dictionary schema_dict = { "type": "object", "properties": { "employee_name": { "type": "string", "description": "The employee's full name" }, "employee_ssn": { "type": "string", "description": "The employee's Social Security Number" }, "gross_pay": { "type": "number", "description": "The gross pay amount" } } } # Initialize the client client = LandingAIADE() # First, parse the document to get markdown parse_response = client.parse( document=Path("/path/to/pay-stub.pdf"), model="dpt-2-latest" ) # Convert schema dictionary to JSON string schema_json = json.dumps(schema_dict) # Extract structured data using the schema extract_response = client.extract( schema=schema_json, markdown=parse_response.markdown, model="extract-latest" ) # Access the extracted data print(extract_response.extraction) # Access extraction metadata to see which chunks were referenced print(extract_response.extraction_metadata) ``` ### Extraction with JSON Schema File Load your extraction schema from a separate JSON file for better organization and reusability. 
For example, here is the `pay_stub_schema.json` file: ```json theme={null} { "type": "object", "properties": { "employee_name": { "type": "string", "description": "The employee's full name" }, "employee_ssn": { "type": "string", "description": "The employee's Social Security Number" }, "gross_pay": { "type": "number", "description": "The gross pay amount" } } } ``` You can pass the JSON file defined above in the following script: ```python [expandable] theme={null} import json from pathlib import Path from landingai_ade import LandingAIADE # Initialize the client client = LandingAIADE() # First, parse the document to get markdown parse_response = client.parse( document=Path("/path/to/pay-stub.pdf"), model="dpt-2-latest" ) # Load schema from JSON file with open("pay_stub_schema.json", "r") as f: schema_json = f.read() # Extract structured data using the schema extract_response = client.extract( schema=schema_json, markdown=parse_response.markdown, model="extract-latest" ) # Access the extracted data print(extract_response.extraction) # Access extraction metadata to see which chunks were referenced print(extract_response.extraction_metadata) ``` ### Extract Nested Subfields Define nested Pydantic models to extract hierarchical data from documents. This approach organizes related information under meaningful section names. Define nested models before the main extraction schema. Otherwise, the nested model classes will not be defined when referenced. For example, to extract data from the **Patient Details** and **Emergency Contact Information** sections in this Medical Form, define separate models for each section, then combine them in a main model. 
```python [expandable] theme={null}
from pathlib import Path

from pydantic import BaseModel, Field

from landingai_ade import LandingAIADE
from landingai_ade.lib import pydantic_to_json_schema

# Define a nested model for patient-specific information
class PatientDetails(BaseModel):
    patient_name: str = Field(
        ..., description='Full name of the patient.', title='Patient Name'
    )
    date: str = Field(
        ...,
        description='Date the patient information form was filled out.',
        title='Date',
    )

# Define a nested model for emergency contact details
class EmergencyContactInformation(BaseModel):
    emergency_contact_name: str = Field(
        ...,
        description='Full name of the emergency contact person.',
        title='Emergency Contact Name',
    )
    relationship_to_patient: str = Field(
        ...,
        description='Relationship of the emergency contact to the patient.',
        title='Relationship to Patient',
    )
    primary_phone_number: str = Field(
        ...,
        description='Primary phone number of the emergency contact.',
        title='Primary Phone Number',
    )
    secondary_phone_number: str = Field(
        ...,
        description='Secondary phone number of the emergency contact.',
        title='Secondary Phone Number',
    )
    address: str = Field(
        ..., description='Full address of the emergency contact.', title='Address'
    )

# Define the main extraction schema that combines all the nested models
class PatientAndEmergencyContactInformationExtractionSchema(BaseModel):
    # Nested field containing patient details
    patient_details: PatientDetails = Field(
        ...,
        description='Information about the patient as provided in the form.',
        title='Patient Details',
    )
    # Nested field containing emergency contact information
    emergency_contact_information: EmergencyContactInformation = Field(
        ...,
        description='Details of the emergency contact person for the patient.',
        title='Emergency Contact Information',
    )

# Initialize the client
client = LandingAIADE()

# Parse the document to get markdown
parse_response = client.parse(
    document=Path("/path/to/medical-form.pdf"),
    model="dpt-2-latest"
)

# Convert Pydantic model to JSON schema
schema = pydantic_to_json_schema(PatientAndEmergencyContactInformationExtractionSchema)

# Extract structured data using the schema
extract_response = client.extract(
    schema=schema,
    markdown=parse_response.markdown,
    model="extract-latest"
)

# Display the extracted structured data
print(extract_response.extraction)
```

### Extract Variable-Length Data with List Objects

Use the Python `List` type inside a Pydantic `BaseModel` to extract repeatable data structures when you don't know how many items will appear. Common examples include line items in invoices, transaction records, or contact information for multiple people.

For example, to extract variable-length wire instructions and line items from this Wire Transfer Form, use `List[DescriptionItem]` for line items and `List[WireInstruction]` for wire transfer details.

```python [expandable] theme={null}
from pathlib import Path
from typing import List

from pydantic import BaseModel, Field

from landingai_ade import LandingAIADE
from landingai_ade.lib import pydantic_to_json_schema

# Nested models for list fields
class DescriptionItem(BaseModel):
    description: str = Field(description="Invoice or Bill Description")
    amount: float = Field(description="Invoice or Bill Amount")

class WireInstruction(BaseModel):
    bank_name: str = Field(description="Bank name")
    bank_address: str = Field(description="Bank address")
    bank_account_no: str = Field(description="Bank account number")
    swift_code: str = Field(description="SWIFT code")
    aba_routing: str = Field(description="ABA routing number")
    ach_routing: str = Field(description="ACH routing number")

# Invoice model containing list object fields
class Invoice(BaseModel):
    description_or_particular: List[DescriptionItem] = Field(
        description="List of invoice line items (description and amount)"
    )
    wire_instructions: List[WireInstruction] = Field(
        description="Wire transfer instructions"
    )

# Main extraction model
class ExtractedInvoiceFields(BaseModel):
    invoice: Invoice = Field(description="Invoice list-type fields")

# Initialize the client
client = LandingAIADE()

# Parse the document to get markdown
parse_response = client.parse(
    document=Path("/path/to/wire-transfer.pdf"),
    model="dpt-2-latest"
)

# Convert Pydantic model to JSON schema
schema = pydantic_to_json_schema(ExtractedInvoiceFields)

# Extract structured data using the schema
extract_response = client.extract(
    schema=schema,
    markdown=parse_response.markdown,
    model="extract-latest"
)

# Display the extracted data
print(extract_response.extraction)
```

### Extraction Output

The `extract` function returns an `ExtractResponse` object with the following fields:

* **`extraction`**: The extracted key-value pairs as defined by your schema
* **`extraction_metadata`**: Metadata showing which chunks were referenced for each extracted field
* **`metadata`**: Processing information including credit usage, duration, filename, job ID, version, and schema validation errors

For detailed information about the response structure, extraction metadata, and chunk references, go to [Extract JSON Response](./ade-extract-response).

## Classify: Getting Started

The `classify` function classifies each page in a document by type. Provide your document and a list of classes, and the API assigns a class to each page. Use these examples as guides to get started with classifying with the library.

### Classify Local Files

Use the `document` parameter to classify files from your filesystem. Pass the file path as a `Path` object.
```python theme={null} import json from pathlib import Path from landingai_ade import LandingAIADE client = LandingAIADE() classes = [ {"class": "invoice", "description": "A commercial bill with line items, totals, and payment terms"}, {"class": "bank_statement", "description": "A monthly summary of account transactions"}, {"class": "pay_stub"} ] response = client.classify( classes=json.dumps(classes), document=Path("/path/to/document.pdf"), model="classify-latest" ) for result in response.classification: print(f"Page {result.page}: {result.class_}") ``` ### Classify Remote URLs Use the `document_url` parameter to classify files from remote URLs (http, https, ftp, ftps). ```python theme={null} import json from landingai_ade import LandingAIADE client = LandingAIADE() classes = [ {"class": "invoice", "description": "A commercial bill with line items, totals, and payment terms"}, {"class": "bank_statement", "description": "A monthly summary of account transactions"} ] response = client.classify( classes=json.dumps(classes), document_url="https://example.com/document.pdf", model="classify-latest" ) for result in response.classification: print(f"Page {result.page}: {result.class_}") ``` ### Set Parameters The `classify` function accepts optional parameters to customize classification behavior. To see all available parameters, go to [ API](https://docs.landing.ai/api-reference/tools/ade-classify). 
```python theme={null} import json from pathlib import Path from landingai_ade import LandingAIADE client = LandingAIADE() classes = [ {"class": "invoice"}, {"class": "bank_statement"} ] response = client.classify( classes=json.dumps(classes), document=Path("/path/to/document.pdf"), model="classify-latest" ) ``` ### Classify Output The `classify` function returns a `ClassifyResponse` object with the following fields: * **`classification`**: List of `Classification` objects, one per page, each containing: * **`class_`**: The predicted class label, or `'unknown'` if the page could not be classified. Note: `class_` is used instead of `class` because `class` is a reserved keyword in Python. * **`page`**: The zero-indexed page number * **`reason`**: A brief explanation of the classification (for debugging) * **`suggested_class`**: A proposed class when the prediction is `'unknown'` * **`metadata`**: Processing information (credit usage, duration, filename, job ID, page count, version) For detailed information about the response structure, see [JSON Response for Classification](./ade-classify-response). #### Common Use Cases for ClassifyResponse Fields **Get classification for each page:** ```python theme={null} for result in response.classification: print(f"Page {result.page}: {result.class_}") ``` **Filter pages by class:** ```python theme={null} invoices = [r for r in response.classification if r.class_ == "invoice"] print(f"Found {len(invoices)} invoice pages") ``` **Handle pages that could not be classified:** ```python theme={null} unknown = [r for r in response.classification if r.class_ == "unknown"] for r in unknown: print(f"Page {r.page}: suggested class is {r.suggested_class}") ``` ## Section: Getting Started The `section` function analyzes a parsed document and generates a hierarchical table of contents. Use these examples as guides to get started with sectioning with the library. 
**Pass Markdown Content**

The library supports a few methods for passing the Markdown content for sectioning:

* Section data directly from the [parse response](#section-from-parse-response)
* Section data from a local [Markdown file](#section-from-markdown-files)
* Section data from a Markdown file at a remote URL: `markdown_url="https://example.com/file.md"`

### Section from Parse Response

After parsing a document, you can pass the Markdown string directly from the `ParseResponse` to the section function without saving it to a file.

```python theme={null}
from pathlib import Path

from landingai_ade import LandingAIADE

client = LandingAIADE()

# Parse the document
parse_response = client.parse(
    document=Path("/path/to/document.pdf"),
    model="dpt-2-latest"
)

# Section using the Markdown string from parse response
section_response = client.section(
    markdown=parse_response.markdown,  # Pass Markdown string directly
    model="section-latest"
)

# Access the table of contents
for entry in section_response.table_of_contents:
    indent = " " * (entry.level - 1)
    print(f"{indent}{entry.section_number}. {entry.title}")
```

### Section from Markdown Files

If you already have a Markdown file (from a previous parsing operation), you can section it directly. Use the `markdown` parameter for local Markdown files or `markdown_url` for remote Markdown files.

```python theme={null}
from pathlib import Path

from landingai_ade import LandingAIADE

client = LandingAIADE()

# Section from a local Markdown file
section_response = client.section(
    markdown=Path("/path/to/parsed_output.md"),
    model="section-latest"
)

# Or section from a remote Markdown file
section_response = client.section(
    markdown_url="https://example.com/document.md",
    model="section-latest"
)

# Access the table of contents
for entry in section_response.table_of_contents:
    indent = " " * (entry.level - 1)
    print(f"{indent}{entry.section_number}. {entry.title}")
```

### Set Parameters

The `section` function accepts optional parameters to customize sectioning behavior. To see all available parameters, go to [ADE Section API](https://docs.landing.ai/api-reference/tools/ade-section).

```python theme={null}
from pathlib import Path

from landingai_ade import LandingAIADE

client = LandingAIADE()

section_response = client.section(
    markdown=Path("/path/to/parsed_output.md"),
    guidelines="Treat each numbered article as a top-level section",
    model="section-latest"
)
```

### Section Output

The `section` function returns a `SectionResponse` object with the following fields:

* **`table_of_contents`**: List of `SectionTOCEntry` objects, each containing:
  * **`title`**: The generated section heading text
  * **`level`**: The hierarchy depth (1 = top-level, 2 = subsection, 3 = sub-subsection, and so on)
  * **`section_number`**: The hierarchical number (for example, `"1"`, `"1.2"`, `"1.2.3"`)
  * **`start_reference`**: The chunk ID where this section begins, corresponding to a `chunks[].id` value from the parse response
* **`table_of_contents_md`**: Markdown-formatted TOC string with anchor links
* **`metadata`**: Processing information (credit usage, duration, filename, job ID, version)

For detailed information about the response structure, see [JSON Response for Sectioning](./ade-section-response).

## Split: Getting Started

The `split` function classifies and separates a parsed document into multiple sub-documents based on Split Rules you define. Use these examples as guides to get started with splitting with the library.
**Pass Markdown Content** The library supports a few methods for passing the Markdown content for splitting: * Split data directly from the [parse response](#split-from-parse-response) * Split data from a local [Markdown file](#split-from-markdown-files) * Split data from a Markdown file at a remote URL: `markdown_url="https://example.com/file.md"` **Define Split Rules** Split Rules define how the API classifies and separates your document. Each Split Rule consists of: * `name`: The Split Type name (required) * `description`: Additional context about what this Split Type represents (optional) * `identifier`: A field that makes each instance unique, used to create separate splits (optional) For more information about Split Rules, see [Split Rules](./ade-split#split-rules). ### Split from Parse Response After parsing a document, you can pass the Markdown string directly from the `ParseResponse` to the split function without saving it to a file. ```python theme={null} import json from pathlib import Path from landingai_ade import LandingAIADE client = LandingAIADE() # Parse the document parse_response = client.parse( document=Path("/path/to/document.pdf"), model="dpt-2-latest" ) # Define Split Rules split_class = [ { "name": "Bank Statement", "description": "Document from a bank that summarizes all account activity over a period of time." 
}, { "name": "Pay Stub", "description": "Document that details an employee's earnings, deductions, and net pay for a specific pay period.", "identifier": "Pay Stub Date" } ] # Split using the Markdown string from parse response split_response = client.split( split_class=json.dumps(split_class), markdown=parse_response.markdown, # Pass Markdown string directly model="split-latest", save_to="output_folder" # optional: saves as {input_file}_split_output.json ) # Access the splits for split in split_response.splits: print(f"Classification: {split.classification}") print(f"Identifier: {split.identifier}") print(f"Pages: {split.pages}") ``` ### Split from Markdown Files If you already have a Markdown file (from a previous parsing operation), you can split it directly. Use the `markdown` parameter for local Markdown files or `markdown_url` for remote Markdown files. ```python theme={null} import json from pathlib import Path from landingai_ade import LandingAIADE client = LandingAIADE() # Define Split Rules split_class = [ { "name": "Invoice", "description": "A document requesting payment for goods or services.", "identifier": "Invoice Number" }, { "name": "Receipt", "description": "A document acknowledging that payment has been received." } ] # Split from a local Markdown file split_response = client.split( split_class=json.dumps(split_class), markdown=Path("/path/to/parsed_output.md"), model="split-latest" ) # Or split from a remote Markdown file split_response = client.split( split_class=json.dumps(split_class), markdown_url="https://example.com/document.md", model="split-latest" ) # Access the splits for split in split_response.splits: print(f"Classification: {split.classification}") if split.identifier: print(f"Identifier: {split.identifier}") print(f"Number of pages: {len(split.pages)}") print(f"Markdown content: {split.markdowns[0][:100]}...") ``` ### Set Parameters The `split` function accepts optional parameters to customize split behavior. 
To see all available parameters, go to [ADE Split API](https://docs.landing.ai/api-reference/tools/ade-split). ```python theme={null} import json from pathlib import Path from landingai_ade import LandingAIADE client = LandingAIADE() split_response = client.split( split_class=json.dumps([ {"name": "Section A", "description": "Introduction section"}, {"name": "Section B", "description": "Main content section"} ]), markdown=Path("/path/to/parsed_output.md"), model="split-latest" ) ``` ### Split Output The `split` function returns a `SplitResponse` object with the following fields: * **`splits`**: List of `Split` objects, each containing: * **`classification`**: The Split Type name assigned to this sub-document * **`identifier`**: The unique identifier value (or `None` if no identifier was specified) * **`pages`**: List of zero-indexed page numbers that belong to this split * **`markdowns`**: List of Markdown content strings, one for each page * **`metadata`**: Processing information (credit usage, duration, filename, job ID, page count, version) For detailed information about the response structure, see [JSON Response for Splitting](./ade-split-response). 
#### Common Use Cases for SplitResponse Fields **Access all splits by classification:** ```python theme={null} for split in split_response.splits: print(f"Split Type: {split.classification}") print(f"Pages included: {split.pages}") ``` **Filter splits by classification:** ```python theme={null} invoices = [split for split in split_response.splits if split.classification == "Invoice"] print(f"Found {len(invoices)} invoices") ``` **Access Markdown content for each split:** ```python theme={null} for split in split_response.splits: print(f"Classification: {split.classification}") for i, markdown in enumerate(split.markdowns): print(f" Page {split.pages[i]} Markdown: {markdown[:100]}...") ``` **Group splits by identifier:** ```python theme={null} from collections import defaultdict splits_by_id = defaultdict(list) for split in split_response.splits: if split.identifier: splits_by_id[split.identifier].append(split) for identifier, splits in splits_by_id.items(): print(f"Identifier '{identifier}': {len(splits)} split(s)") ``` # Quickstart Source: https://docs.landing.ai/ade/ade-quickstart The code below guides you through parsing a sample document and extracting specific fields from it. Call the APIs directly with the cURL commands, or use our Python and TypeScript libraries. ## Prerequisites * An [account](https://va.landing.ai/) * An [API key](./agentic-api-key) ## Call the API Run the code below to parse a sample bank statement with the [ API](https://docs.landing.ai/api-reference/tools/ade-parse). Replace `YOUR_API_KEY` with your API key. ```bash theme={null} curl -X POST 'https://api.va.landing.ai/v1/ade/parse' \ -H 'Authorization: Bearer YOUR_API_KEY' \ -F 'document_url=https://docs.landing.ai/examples/bank-statement.pdf' \ -F 'model=dpt-2-latest' ``` After parsing the document, save the `markdown` response to a file named `markdown-bank-statement.md`. 
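One way to save that field, assuming you have `jq` installed, is to capture the full JSON response and write the `markdown` field to disk (the endpoint, parameters, and filenames are the same ones used in this quickstart):

```shell
# Parse the document and keep the full JSON response
curl -X POST 'https://api.va.landing.ai/v1/ade/parse' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -F 'document_url=https://docs.landing.ai/examples/bank-statement.pdf' \
  -F 'model=dpt-2-latest' > parse-response.json

# Write just the markdown field to a file for the extract step
jq -r '.markdown' parse-response.json > markdown-bank-statement.md
```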
Now that the parsed output is in a Markdown file, run the code below to extract the **Account Holder Name** and **Number of Deposits** fields using the [ API](https://docs.landing.ai/api-reference/tools/ade-extract). Replace `YOUR_API_KEY` with your API key, and replace `markdown-bank-statement.md` with the path to your Markdown file. ```bash theme={null} curl -X POST 'https://api.va.landing.ai/v1/ade/extract' \ -H 'Authorization: Bearer YOUR_API_KEY' \ -F 'schema={"type": "object", "properties": {"name": {"type": "string", "description": "Account holder name"}, "number_deposits": {"type": "integer", "description": "The number of deposits"}}}' \ -F 'markdown=@markdown-bank-statement.md' \ -F 'model=extract-latest' ``` The API response includes the extracted fields. ```json expandable theme={null} { "extraction": { "name": "Sarah J. Mitchell", "number_deposits": 5 }, "extraction_metadata": { "name": { "value": "Sarah J. Mitchell", "references": [ "d10a90d8-4bac-4dec-b9b2-6ead7345b798" ] }, "number_deposits": { "value": 5, "references": [ "a0294b27-61f7-40d7-ba26-1778f3b16248" ] } }, "metadata": { "filename": "markdown-bank-statement.md", "org_id": null, "duration_ms": 3440, "credit_usage": 25.147199999999998, "job_id": "68c00faf73cc46a6a5fcefdef565aa56", "version": "extract-20260314", "schema_violation_error": null, "fallback_model_version": null } } ``` Get your [API key](https://va.landing.ai/settings/api-key) and set it as an environment variable. For more details, go to [API Key](./agentic-api-key). ```bash theme={null} export VISION_AGENT_API_KEY= ``` Install the library. ```bash theme={null} pip install landingai-ade ``` This example shows the complete workflow of parsing a sample bank statement and extracting the **Account Holder Name** and **Number of Deposits** fields. If you only need to parse documents, use just the parsing section. Save the code block below as `quickstart.py`. 
```python expandable theme={null} import json from landingai_ade import LandingAIADE # Initialize the client client = LandingAIADE() ##### Parse the document ##### # Parse the document parse_response = client.parse( document_url="https://docs.landing.ai/examples/bank-statement.pdf", model="dpt-2-latest" ) # Print the parsing results print("Parse Markdown:") print(parse_response.markdown) print("Parse Chunks:") print(parse_response.chunks) print("Grounding Information:") print(parse_response.grounding) print("Parse API Metadata:") print(parse_response.metadata) # Save Markdown to a file (optional, useful for later reference) if parse_response.markdown: with open('markdown-bank-statement.md', 'w', encoding='utf-8') as f: f.write(parse_response.markdown) print("\nMarkdown content saved to a Markdown file.") else: print("No 'markdown' field found in the response") ##### Extract fields from the parsed document ##### # Define your extraction schema schema_dict = { "type": "object", "properties": { "name": { "type": "string", "description": "Account holder name" }, "number_deposits": { "type": "integer", "description": "The number of deposits" } } } # Convert schema dictionary to JSON string schema_json = json.dumps(schema_dict) # Extract fields using the parsed markdown extract_response = client.extract( schema=schema_json, markdown=parse_response.markdown, # Use the markdown from the parse response model="extract-latest" ) # Print the extracted fields print("\nExtracted fields:") print(extract_response.extraction) ``` Run the Python file you created. ```bash theme={null} python quickstart.py ``` The script outputs the parsing results (markdown, chunks, grounding information, and metadata), then displays the extracted fields: ```json theme={null} Extracted fields: {'name': 'Sarah J. Mitchell', 'number_deposits': 5} ``` Get your [API key](https://va.landing.ai/settings/api-key) and set it as an environment variable. For more details, go to [API Key](./agentic-api-key). 
```bash theme={null}
export VISION_AGENT_API_KEY=
```

Install the library.

```bash theme={null}
npm install landingai-ade
```

This example shows the complete workflow of parsing a sample bank statement and extracting the **Account Holder Name** and **Number of Deposits** fields. If you only need to parse documents, use just the parsing section.

Save the code block below as `quickstart.ts`.

```typescript expandable theme={null}
import LandingAIADE, { toFile } from "landingai-ade";
import fs from "fs";

const client = new LandingAIADE();

///// Parse the document /////

// Parse the document
const parseResponse = await client.parse({
  document: await fetch("https://docs.landing.ai/examples/bank-statement.pdf"),
  model: "dpt-2-latest"
});

// Print the parsing results
console.log("Parse Markdown:");
console.log(parseResponse.markdown);
console.log("Parse Chunks:");
console.log(parseResponse.chunks);
console.log("Grounding Information:");
console.log(parseResponse.grounding);
console.log("Parse API Metadata:");
console.log(parseResponse.metadata);

// Save Markdown to a file (optional, useful for later reference)
if (parseResponse.markdown) {
  fs.writeFileSync("markdown-bank-statement.md", parseResponse.markdown, "utf-8");
  console.log("\nMarkdown content saved to a Markdown file.");
} else {
  console.log("No 'markdown' field found in the response");
}

///// Extract fields from the parsed document /////

// Define your extraction schema
const schemaDict = {
  type: "object",
  properties: {
    name: {
      type: "string",
      description: "Account holder name"
    },
    number_deposits: {
      type: "integer",
      description: "The number of deposits"
    }
  }
};

const schemaJson = JSON.stringify(schemaDict);

// Extract fields using the parsed markdown
const extractResponse = await client.extract({
  schema: schemaJson,
  markdown: await toFile(Buffer.from(parseResponse.markdown), "document.md"), // Use the markdown from the parse response
  model: "extract-latest"
});

// Print the extracted fields
console.log("\nExtracted fields:"); console.log(extractResponse.extraction); ``` Run the TypeScript file you created. ```bash theme={null} npx tsx quickstart.ts ``` The script outputs the parsing results (markdown, chunks, grounding information, and metadata), then displays the extracted fields: ```json theme={null} Extracted fields: { name: 'Sarah J. Mitchell', number_deposits: 5 } ``` ## Next Steps: Develop and Scale Choose how you want to integrate into your workflow. Call the API directly for maximum flexibility, or use our Python or TypeScript libraries for faster development. Call the API directly for language flexibility and advanced customization. Use our Python library to build custom scripts. Use our TypeScript library to build custom scripts. ## Sample Scripts & Projects Use these sample scripts and projects for real-world use cases as templates to develop your own custom document processing solutions. Explore end-to-end examples for common document processing workflows. Browse code samples with examples for the Python and TypeScript libraries. # Rate Limits Source: https://docs.landing.ai/ade/ade-rate-limits Rate limits help ensure that the API stays fast and available for everyone. Rate limits prevent any single user or accidental spike in traffic from slowing things down for others. ## Maximum Pages per Hour Rate limits apply per pricing plan and are the same for both ADE Parse and ADE Parse Jobs endpoints. Higher pricing plans have increased rate limits to support larger processing volumes. Hourly rate limits are distributed per minute to ensure consistent API performance for all users. This distribution means your requests are processed evenly throughout the hour rather than all at once. Rate limits apply at the organization level. All requests from any user or API key within your organization count toward the same limit. 
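Because hourly limits are metered per minute, a client that briefly exceeds its quota can usually succeed by waiting and retrying. The sketch below shows one generic way to do that with exponential backoff and jitter; the `RateLimited` exception and the wrapper are illustrative helpers, not part of the landingai-ade library, so raise the exception from your own request function when the API reports a rate-limit error.

```python
import random
import time

class RateLimited(Exception):
    """Raised by the caller's request function when the API reports a rate limit."""

def call_with_backoff(request_fn, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call request_fn, retrying on RateLimited with exponential backoff.

    Jitter spreads retries out so they do not all land in the same
    per-minute window.
    """
    for attempt in range(max_retries):
        try:
            return request_fn()
        except RateLimited:
            if attempt == max_retries - 1:
                raise  # retries exhausted; surface the error to the caller
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            sleep(delay)
```

For example, you might wrap a parse call as `call_with_backoff(lambda: client.parse(...))`, with the lambda translating the API's rate-limit response into `RateLimited`.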
## Maximum Pages per Document

The maximum number of pages you can parse in a single PDF depends on the API endpoint and parsing method. The table below shows the page limits for each combination.

| Endpoint | [Playground](https://va.landing.ai/demo/doc-extraction) | API Request / [ library](https://github.com/landing-ai/ade-python) |
| ---------------------------------------------------------------------------- | ------------------------------------------------------- | ------------------------------------------------------------------ |
| [ADE Parse](https://docs.landing.ai/api-reference/tools/ade-parse) | 100 pages | 100 pages |
| [ADE Parse Jobs](https://docs.landing.ai/api-reference/tools/ade-parse-jobs) | N/A¹ | 1 GB or 6,000 pages |

1. The Playground does not use ADE Parse Jobs.

# Section

Source: https://docs.landing.ai/ade/ade-section

Use the [ API](https://docs.landing.ai/api-reference/tools/ade-section) to generate a hierarchical table of contents (TOC) from a parsed document. The API analyzes the semantic structure of your document and returns a flat, reading-order list of sections and subsections, each with a title, hierarchy level, section number, and a reference to the starting chunk in the parsed document.

This API is in Preview. This feature is still in development and may not return accurate results. Do not use this feature in production environments.

## Example Use Cases

Use the API in workflows that need to navigate, scope, or retrieve content by section:

* **Extraction pipelines**: Scope extraction queries to specific sections of a document (for example, "extract all tables in the Claims section").
* **RAG and search**: Use section-aware chunking instead of sliding-window chunking for more relevant retrieval results.
* **Document review**: Generate a navigable TOC for long documents such as insurance policies, contracts, reports, and technical manuals.
* **Cross-document analysis**: Process large document batches and compare structure across a corpus.
* **LLM applications**: Provide context windows organized by section rather than arbitrary token splits.

## How It Works

The API accepts the Markdown output from parsing (which contains [reference anchors](./ade-markdown-response#anchor-tags)) and returns a structured table of contents. The API automatically infers the document's hierarchy. You can also pass an optional `guidelines` parameter to control how sections are grouped.

The response includes two formats for the same TOC:

* **`table_of_contents`**: A structured JSON array for programmatic use
* **`table_of_contents_md`**: A Markdown-formatted TOC you can prepend to the document for navigation

The generated TOCs are in English, regardless of the source document language.
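To make the relationship between the two formats concrete, here is a minimal Python sketch that renders entries shaped like `table_of_contents` into the indented list style of `table_of_contents_md`. The sample entries and the two-spaces-per-level indentation are assumptions for illustration, not the API's exact output:

```python
def toc_to_markdown(entries):
    """Render flat TOC entries as an indented Markdown list of anchor links."""
    lines = []
    for entry in entries:
        # Each hierarchy level beyond 1 adds one step of indentation.
        indent = "  " * (entry["level"] - 1)
        lines.append(f'{indent}- [{entry["title"]}](#{entry["start_reference"]})')
    return "\n".join(lines) + "\n"

# Hypothetical entries in reading order, as table_of_contents would list them.
entries = [
    {"title": "Introduction", "level": 1, "start_reference": "uuid_a1"},
    {"title": "Methods", "level": 1, "start_reference": "uuid_b2"},
    {"title": "Data Collection", "level": 2, "start_reference": "uuid_c3"},
]
print(toc_to_markdown(entries))
```

Each anchor link targets a chunk's reference anchor in the parsed Markdown, which is what makes the TOC clickable when prepended to the document.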
## Run Parse First

The API requires the Markdown output from the [ API](https://docs.landing.ai/api-reference/tools/ade-parse) as input. The Markdown must contain [reference anchors](./ade-markdown-response#anchor-tags) (``) that the API uses to map sections back to specific chunks in the original document.

## Section with the API

Section a document by calling the endpoint.

```shell theme={null}
curl -X POST 'https://api.va.landing.ai/v1/ade/section' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -F 'markdown=@parsed_output.md' \
  -F 'model=section-latest'
```

### Parameters

Get the full parameters from the [API reference](https://docs.landing.ai/api-reference/tools/ade-section).

| Parameter | Required | Description |
| :------------- | :--------------------------------- | :---------- |
| `markdown` | Required if `markdown_url` omitted | The Markdown output from the [ API](https://docs.landing.ai/api-reference/tools/ade-parse). Accepts a file upload or a raw string. |
| `markdown_url` | Required if `markdown` omitted | A URL pointing to a Markdown file to fetch and section. |
| `guidelines` | Optional | Natural-language instructions that control how the hierarchy is built. When omitted, the API infers structure automatically. Examples: `"Treat each numbered article as a top-level section"`, `"Create a flat structure, no more than 2 levels deep"`, `"Separate the introduction and appendices from the main body"`. |
| `model` | Optional | The section model version to use. If omitted, the API uses the latest model. For more information, see [Section Model Versions](./ade-section-models). |

For detailed information about the API response, go to [JSON Response for Sectioning](./ade-section-response). For information about pricing and credits, go to [Pricing & Billing](./ade-pricing).

## Use Section with Our Libraries

Click one of the tiles below to learn how to use the API with our libraries.

Use with our Python library.

Use with our TypeScript library.

## Share Your Feedback

The API is in public preview and we are actively looking for feedback to improve it. To share your experience, [schedule a feedback session with us](https://landing-ai.zoom.us/zbook/landingai-fatimasalehbhai/landingai-ade-feedback). Come prepared to discuss:

* What is working well
* Any challenges you've encountered and how the feature could improve
* The email address for your account, found in your [profile page](https://va.landing.ai/settings/personal/profile)
* The documents you used
* The code you used

# Section Model Versions

Source: https://docs.landing.ai/ade/ade-section-models

A section model powers the document structure analysis capabilities of the API. The model analyzes parsed Markdown content and generates a hierarchical table of contents.

You can specify a model when calling the API directly or when using the [client libraries](#set-the-model-with-the-client-libraries). If you don't specify a model, the API uses the latest section model (currently `section-20260406`).

## Model Versions

The following table lists the available `model` values for the API:

| Model Values | Description |
| ------------------ | --------------------------------------------------------- |
| `section-20260406` | Use the section model snapshot released on April 6, 2026. |
| `section-latest` | Use the latest section model snapshot. |

### Why Model Versioning Matters

When integrating the API, you have two options for specifying the model:

1. **Use `section-latest`** to always get the newest version. This automatically gives you improvements and updates, but results may change when new model versions are released.
2. **Use a specific version** (like `section-20260406`) to pin to an exact model version. This ensures consistent results over time, but you won't receive improvements.

## Set the Model in the API

When calling the endpoint, you can set the model using the `model` parameter. If you omit the `model` parameter, the API uses the latest model. This example shows how to specify a model:

```shell theme={null}
curl -X POST 'https://api.va.landing.ai/v1/ade/section' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -F 'markdown=@parsed_output.md' \
  -F 'model=section-latest'
```

## Set the Model with the Client Libraries

When using the Python or TypeScript library, you can set the model using the `model` parameter in the `section()` method. If you omit the `model` parameter, the library uses the latest section model.

```python {8} Python theme={null}
from pathlib import Path
from landingai_ade import LandingAIADE

client = LandingAIADE()

response = client.section(
    markdown=Path("/path/to/parsed_output.md"),
    model="section-latest"
)
```

```typescript {8} TypeScript theme={null}
import LandingAIADE from "landingai-ade";
import fs from "fs";

const client = new LandingAIADE();

const response = await client.section({
  markdown: fs.createReadStream("/path/to/parsed_output.md"),
  model: "section-latest"
});
```

# JSON Response for Sectioning

Source: https://docs.landing.ai/ade/ade-section-response

When you section a document with the [ API](https://docs.landing.ai/api-reference/tools/ade-section), the results are returned in a structured JSON format.

## Example Response

This example shows the information that the API returns for a sample homeowner's insurance policy.
You can download the source document to follow along: Insurance Policy PDF

The screenshot below shows the first page of the document.

*First page of a homeowner's insurance policy*

The API analyzed the full document and returned a hierarchical table of contents. The **Declarations And Policy Overview** section on the first page appears as section `2` (level 1) in the response. **Insured Property** and **Policy Period And Renewal** are its level 2 subsections, each with their own level 3 sub-subsections. Each entry's `start_reference` points to the chunk where that section begins in the parsed document.

```json [expandable] theme={null}
{
  "table_of_contents": [
    { "title": "Policy Header", "level": 1, "section_number": "1", "start_reference": "1c4b7184-346d-490f-a7bf-bf2442bcbd7c" },
    { "title": "Declarations And Policy Overview", "level": 1, "section_number": "2", "start_reference": "494d766e-5cf1-48c4-a580-73061b722b90" },
    { "title": "Insured Property", "level": 2, "section_number": "2.1", "start_reference": "1e6c9100-f461-4bb8-8dcd-8ae0311a97c0" },
    { "title": "Mortgage And Lienholder", "level": 3, "section_number": "2.1.1", "start_reference": "95e79ad7-72b5-4c9f-9717-8488996c6795" },
    { "title": "Detached Structures", "level": 3, "section_number": "2.1.2", "start_reference": "4c1e087b-aee9-4494-9789-735388a7579c" },
    { "title": "Policy Period And Renewal", "level": 2, "section_number": "2.2", "start_reference": "de55b7d5-df5c-480e-8b91-0f9f10f5b059" },
    { "title": "Cancellation", "level": 3, "section_number": "2.2.1", "start_reference": "7cc48544-b2ec-4e34-b2a1-ae1b3b7d17a6" },
    { "title": "Annual Premium", "level": 3, "section_number": "2.2.2", "start_reference": "b8c0ba71-d786-4f89-a07f-3fcfc0edd734" },
    { "title": "Schedule Of Coverages And Limits", "level": 1, "section_number": "3", "start_reference": "3700c26c-dd67-4dda-a712-081becb9ab63" },
    { "title": "Coverage A Dwelling", "level": 2, "section_number": "3.1", "start_reference": "de789af4-1585-4b3f-9193-4253a9e58f6b" },
    {
"title": "Ordinance Or Law", "level": 3, "section_number": "3.1.1", "start_reference": "c7eaad71-19ca-4ab4-90cb-e8e9581ededc" }, { "title": "Named Storm Deductible", "level": 3, "section_number": "3.1.2", "start_reference": "d1300436-0131-49d1-a98a-1f46ece31c23" }, { "title": "Coverage C Personal Property", "level": 2, "section_number": "3.2", "start_reference": "80ed234f-1824-4def-b7fb-6883c3fa2b52" }, { "title": "Sub Limits", "level": 3, "section_number": "3.2.1", "start_reference": "0c3e0c5a-80ee-49f6-9ca5-52941547d4bd" }, { "title": "Business Property", "level": 3, "section_number": "3.2.2", "start_reference": "60aba72b-ca56-4806-97b0-bdf580669943" }, { "title": "Exclusions", "level": 1, "section_number": "4", "start_reference": "29529f14-e489-4f2a-bea8-0eef68f431b0" }, { "title": "Earth Movement And Flood", "level": 2, "section_number": "4.1", "start_reference": "51c3a6b7-7b78-404e-a1f3-b39be8498294" }, { "title": "Ensuing Loss", "level": 3, "section_number": "4.1.1", "start_reference": "908cf899-f5af-40c8-9829-4e09da9083c9" }, { "title": "Concurrent Causation", "level": 3, "section_number": "4.1.2", "start_reference": "f48d2861-3d9a-4db6-b2d1-78d22013c3fb" }, { "title": "Other Exclusions", "level": 2, "section_number": "4.2", "start_reference": "3336ee6d-9723-4fe1-914f-75109187cc0a" }, { "title": "Wear And Tear", "level": 3, "section_number": "4.2.1", "start_reference": "f66ab1c5-d5fe-42ac-bf71-56a3c769a0a6" }, { "title": "Vacancy", "level": 3, "section_number": "4.2.2", "start_reference": "75eff73f-3166-45dd-9140-a9182f858e47" }, { "title": "Claims And Conditions", "level": 1, "section_number": "5", "start_reference": "35ccfa20-00fb-462e-b375-aaa1b84a1d8f" }, { "title": "Duties After A Loss", "level": 2, "section_number": "5.1", "start_reference": "a08d6602-32b0-41e0-ad85-5d951dfc93c0" }, { "title": "Notice Deadline", "level": 3, "section_number": "5.1.1", "start_reference": "a68996f1-acd7-403e-bb68-58a7fbe336f0" }, { "title": "Emergency Repairs", "level": 
3, "section_number": "5.1.2", "start_reference": "edd78fe8-08d6-4f55-a19b-d15a2a05464b" }, { "title": "Dispute Resolution", "level": 2, "section_number": "5.2", "start_reference": "bdf63da4-4f34-48f0-82a2-cece1ef666d0" }, { "title": "Suit Limitation", "level": 3, "section_number": "5.2.1", "start_reference": "db860fc2-e02e-4efa-8205-553b392b63ca" }, { "title": "Loss Payment", "level": 3, "section_number": "5.2.2", "start_reference": "0f7081fd-75c2-4e0f-8355-b4d66a13c20f" }, { "title": "Policyholder Acknowledgment", "level": 1, "section_number": "6", "start_reference": "065d0342-0c5b-4fbc-9888-b78ac5e0fefc" }, { "title": "Signature Lines", "level": 1, "section_number": "7", "start_reference": "f47e4117-eda0-477d-9a76-482bff511444" } ], "table_of_contents_md": "- [Policy Header](#1c4b7184-346d-490f-a7bf-bf2442bcbd7c)\n- [Declarations And Policy Overview](#494d766e-5cf1-48c4-a580-73061b722b90)\n - [Insured Property](#1e6c9100-f461-4bb8-8dcd-8ae0311a97c0)\n - [Mortgage And Lienholder](#95e79ad7-72b5-4c9f-9717-8488996c6795)\n - [Detached Structures](#4c1e087b-aee9-4494-9789-735388a7579c)\n - [Policy Period And Renewal](#de55b7d5-df5c-480e-8b91-0f9f10f5b059)\n - [Cancellation](#7cc48544-b2ec-4e34-b2a1-ae1b3b7d17a6)\n - [Annual Premium](#b8c0ba71-d786-4f89-a07f-3fcfc0edd734)\n- [Schedule Of Coverages And Limits](#3700c26c-dd67-4dda-a712-081becb9ab63)\n - [Coverage A Dwelling](#de789af4-1585-4b3f-9193-4253a9e58f6b)\n - [Ordinance Or Law](#c7eaad71-19ca-4ab4-90cb-e8e9581ededc)\n - [Named Storm Deductible](#d1300436-0131-49d1-a98a-1f46ece31c23)\n - [Coverage C Personal Property](#80ed234f-1824-4def-b7fb-6883c3fa2b52)\n - [Sub Limits](#0c3e0c5a-80ee-49f6-9ca5-52941547d4bd)\n - [Business Property](#60aba72b-ca56-4806-97b0-bdf580669943)\n- [Exclusions](#29529f14-e489-4f2a-bea8-0eef68f431b0)\n - [Earth Movement And Flood](#51c3a6b7-7b78-404e-a1f3-b39be8498294)\n - [Ensuing Loss](#908cf899-f5af-40c8-9829-4e09da9083c9)\n - [Concurrent 
Causation](#f48d2861-3d9a-4db6-b2d1-78d22013c3fb)\n - [Other Exclusions](#3336ee6d-9723-4fe1-914f-75109187cc0a)\n - [Wear And Tear](#f66ab1c5-d5fe-42ac-bf71-56a3c769a0a6)\n - [Vacancy](#75eff73f-3166-45dd-9140-a9182f858e47)\n- [Claims And Conditions](#35ccfa20-00fb-462e-b375-aaa1b84a1d8f)\n - [Duties After A Loss](#a08d6602-32b0-41e0-ad85-5d951dfc93c0)\n - [Notice Deadline](#a68996f1-acd7-403e-bb68-58a7fbe336f0)\n - [Emergency Repairs](#edd78fe8-08d6-4f55-a19b-d15a2a05464b)\n - [Dispute Resolution](#bdf63da4-4f34-48f0-82a2-cece1ef666d0)\n - [Suit Limitation](#db860fc2-e02e-4efa-8205-553b392b63ca)\n - [Loss Payment](#0f7081fd-75c2-4e0f-8355-b4d66a13c20f)\n- [Policyholder Acknowledgment](#065d0342-0c5b-4fbc-9888-b78ac5e0fefc)\n- [Signature Lines](#f47e4117-eda0-477d-9a76-482bff511444)\n", "metadata": { "filename": "insurance_policy-md.md", "org_id": "org_123", "duration_ms": 15435, "credit_usage": 6.0588, "job_id": "17aa1381fe97495f8b9be76baba9932c", "version": "section-20260406" } } ``` ## Response Structure The response contains the following top-level fields: | Field | Description | | -------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------- | | [`table_of_contents`](#table-of-contents-table_of_contents) | A flat, reading-order array of section entries with hierarchy levels and chunk references. | | [`table_of_contents_md`](#markdown-table-of-contents-table_of_contents_md) | A Markdown-formatted version of the same TOC with anchor links. | | [`metadata`](#processing-metadata-metadata) | Processing information including credit usage, duration, filename, job ID, and model version. | ## Table of Contents (`table_of_contents`) The `table_of_contents` field contains a flat array of all sections and subsections in the order they appear in the document. The flat list structure is optimized for retrieval use cases. 
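When a nested view is needed instead, parent-child relationships can be recovered from `section_number` alone, since an entry's parent number is its own number with the last dotted component removed. A minimal sketch, using a few entries trimmed from the example response above (field values shortened for brevity):

```python
def build_tree(entries):
    """Nest flat TOC entries using section_number ("2.1" is a child of "2")."""
    roots, by_number = [], {}
    for entry in entries:
        node = {**entry, "children": []}
        by_number[entry["section_number"]] = node
        # "2.1.1" -> parent "2.1"; a top-level number like "2" has no parent.
        parent_number = entry["section_number"].rpartition(".")[0]
        if parent_number in by_number:
            by_number[parent_number]["children"].append(node)
        else:
            roots.append(node)
    return roots

toc = [
    {"title": "Declarations And Policy Overview", "level": 1, "section_number": "2"},
    {"title": "Insured Property", "level": 2, "section_number": "2.1"},
    {"title": "Mortgage And Lienholder", "level": 3, "section_number": "2.1.1"},
]
tree = build_tree(toc)
```

Because `table_of_contents` is in reading order, a parent always appears before its children, so a single pass is enough.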
You can iterate the list directly without recursion. Each entry includes:

| Field | Description |
| ----------------- | ----------- |
| `title` | The generated text for the section heading. |
| `level` | The hierarchy depth. Top-level sections are `1`, subsections are `2`, sub-subsections are `3`, and so on. |
| `section_number` | The hierarchical number for this section (for example, `"1"`, `"1.2"`, `"1.2.3"`). Use this field to reconstruct parent-child relationships: entry `"2.1"` is a child of entry `"2"`. |
| `start_reference` | The ID of the chunk where this section begins. This corresponds to a `chunks[].id` value from the [ API](https://docs.landing.ai/api-reference/tools/ade-parse) response. |

## Markdown Table of Contents (`table_of_contents_md`)

The `table_of_contents_md` field provides a ready-to-use Markdown-formatted TOC. Each entry is an anchor link pointing to the corresponding [reference anchor](./ade-markdown-response#anchor-tags) in the document Markdown. Indentation represents the hierarchy level. You can prepend this string to the document's full Markdown to enable in-document navigation:

```text theme={null}
- [Introduction](#uuid_a1)
- [Methods](#uuid_b2)
  - [Data Collection](#uuid_c3)
    - [Survey Design](#uuid_d4)
```

## Processing Metadata (`metadata`)

The `metadata` field provides information about the sectioning process:

| Field | Description |
| -------------- | ----------- |
| `filename` | The name of the input Markdown file. |
| `org_id` | Organization identifier. |
| `duration_ms` | Processing time in milliseconds. |
| `credit_usage` | Number of credits consumed. |
| `job_id` | Unique job identifier. |
| `version` | Model version used for sectioning. For more information, go to [Section Model Versions](./ade-section-models). |

# Troubleshoot Sectioning

Source: https://docs.landing.ai/ade/ade-section-troubleshoot

Use this section to troubleshoot issues encountered when calling the API ([https://api.va.landing.ai/v1/ade/section](https://api.va.landing.ai/v1/ade/section)).

## Status Codes

| Status Code | Name | Description | What to Do |
| ----------- | --------------------- | ----------- | ---------- |
| 200 | Success | Sectioning completed successfully. | Continue with normal operations. |
| 400 | Bad Request | Invalid request, such as a Markdown download failure from a URL. | Review the error message for the specific issue. See [Status 400](#status-400-bad-request). |
| 401 | Unauthorized | Missing or invalid API key. | Check that your `Authorization` header is present and contains a valid [API key](./agentic-api-key). |
| 402 | Payment Required | Your account does not have enough credits to complete processing. | If you have multiple accounts, make sure you're using the correct [API key](./agentic-api-key). Add more credits to your account. |
| 422 | Unprocessable Entity | Input validation failed. | Review your request parameters. See [Status 422](#status-422-unprocessable-entity). |
| 429 | Too Many Requests | Rate limit exceeded. | Wait before retrying. Reduce request frequency and implement exponential backoff. |
| 500 | Internal Server Error | Server error during sectioning. | Retry. If the issue persists, contact [support@landing.ai](mailto:support@landing.ai). See [Status 500](#status-500-internal-server-error). |
| 504 | Gateway Timeout | Request processing exceeded the timeout limit. | Reduce Markdown content size. See [Status 504](#status-504-gateway-timeout). |

## Status 400: Bad Request

This status code indicates a client-side error.
Review the specific error message to identify the issue.

### Error: Failed to download document from URL

This error occurs when the API cannot download the Markdown file from the provided `markdown_url`.

**Error message:**

```
Failed to download document from URL: {error_details}
```

**What to do:**

* Verify the URL is accessible and returns valid content.
* Check network connectivity and URL permissions.
* Ensure the URL points to a Markdown file (.md extension).

## Status 422: Unprocessable Entity

This status code indicates input validation failures. Review the error message and adjust your request parameters.

### Error: No markdown provided

This error occurs when your request does not include the `markdown` parameter or the `markdown_url` parameter, or when the values are empty.

**Error message:**

```
Must provide either 'markdown' or 'markdown_url'.
```

**What to do:** Add one of these parameters to your request:

* Use the `markdown` parameter to upload a Markdown file or provide inline Markdown content, OR
* Use the `markdown_url` parameter to provide a URL to a Markdown file.

### Error: Both 'markdown' and 'markdown\_url' provided

This error occurs when your request includes both the `markdown` and `markdown_url` parameters.

**Error message:**

```
Cannot provide both 'markdown' and 'markdown_url'.
```

**What to do:** Remove one of the parameters. Use either `markdown` (to upload a file or provide inline content) or `markdown_url` (to specify a URL), but not both in the same request.

### Error: Invalid 'markdown\_url' format

This error occurs when the value provided for `markdown_url` is not a valid URL.

**Error message:**

```
Invalid URL format: {url}
```

**What to do:**

* Verify that the URL is correctly formatted (for example, `https://example.com/document.md`).
* Ensure the URL includes a scheme (`https://` or `http://`).

### Error: Multiple markdown files detected

This error occurs when multiple Markdown files are included in the request.
**Error message:**

```
Multiple markdown files detected (X). Please provide only one document file.
```

**What to do:** Send only one Markdown file per request.

### Error: Unsupported format

This error occurs when you provide a file other than Markdown (.md) to the section endpoint.

**Error message:**

```
Unsupported format: {mime_type} ((unknown)). Supported formats: MD
```

**What to do:**

* The section endpoint only accepts Markdown files with a .md extension.
* If you have a PDF, DOCX, or other document format, use the [ API](https://docs.landing.ai/api-reference/tools/ade-parse) to convert your document to Markdown first, then pass the parsed Markdown output to the section endpoint.
* Ensure your file has a .md extension and contains valid UTF-8 encoded Markdown content.

### Error: Missing reference anchors

This error occurs when the Markdown content does not contain reference anchors. The section endpoint requires Markdown output from the [ API](https://docs.landing.ai/api-reference/tools/ade-parse), which includes `` anchors that the API uses to map sections to chunks.

**Error message:**

```
Invalid markdown: missing reference anchors. Markdown must contain reference delimiters. Use the parse tool to generate properly formatted markdown.
```

**What to do:**

* Do not pass plain Markdown or documents you have formatted manually. Use the Markdown output from the [ API](https://docs.landing.ai/api-reference/tools/ade-parse) as input.
* Confirm the Markdown content contains `` anchor tags before sending it to the section endpoint.

## Status 500: Internal Server Error

This error indicates an unexpected server error occurred during sectioning.

**What to do:**

* Retry the request.
* If the error persists, contact [support@landing.ai](mailto:support@landing.ai).

## Status 504: Gateway Timeout

This error occurs when the sectioning process exceeds the timeout limit.

**What to do:**

* Reduce the size of your Markdown document.
* If the error persists, contact [support@landing.ai](mailto:support@landing.ai).

## Best Practices for Avoiding Errors

* **Use parsed Markdown**: Always pass Markdown content generated by the [ API](./ade-separate-apis). The section endpoint requires structured Markdown with [reference anchors](./ade-markdown-response#anchor-tags) (``).
* **Use guidelines sparingly**: The `guidelines` parameter is optional. Start without guidelines to see the default behavior before customizing the hierarchy.

## When Are Credits Consumed?

Credits are consumed only when the API returns a 200 status code. All other responses, including errors, do not consume credits.

# Security and Privacy

Source: https://docs.landing.ai/ade/ade-security

We are committed to protecting your data and maintaining the highest security standards. Refer to the resources below to learn about our security practices, zero data retention (ZDR) option, compliance certifications (GDPR, SOC 2 Type II, HIPAA), and more.

Enterprise plans support SSO via SAML 2.0 and OpenID Connect (OIDC), allowing your organization to manage access through your existing identity provider (IdP).

The Trust Center is your central resource for accessing our security documentation, compliance reports, and real-time system status. This page outlines our security posture, compliance with industry standards, and the measures we take to safeguard your data across our products and infrastructure.

# Parse

Source: https://docs.landing.ai/ade/ade-separate-apis

Use the [ API](https://docs.landing.ai/api-reference/tools/ade-parse) to convert documents into structured Markdown, chunks, and metadata. The API identifies elements like text, tables, and form fields with exact page and coordinate references.
After parsing, you can:

* Use the Markdown output directly in your applications
* Run [field extraction](./ade-extract) on the parsed content
* [Generate a hierarchical table of contents](./ade-section) for navigation and retrieval
* [Classify and split](./ade-split) multi-document files into separate documents

## Use ADE Parse to Parse Documents

Use the API to parse data from documents. See the full API reference [here](https://docs.landing.ai/api-reference/tools/ade-parse). To get detailed information about the API response, go to [JSON Response for Parsing](./ade-json-response).

### Specify Documents to Parse

The API offers two parameters for specifying the document you want to parse:

* `document`: Specify the actual file you want to parse.
* `document_url`: Include the URL to the file that you want to parse.

### Set Up Splits for Parsing

The `split` parameter is different from the [ API](./ade-split). If your goal is to separate a document into sub-documents after parsing, use that API, not the `split` parameter.

The API includes an optional `split` parameter that controls how the parsed content is organized in the response. This is how to include the `split` parameter when calling the API:

```bash theme={null}
curl -X POST 'https://api.va.landing.ai/v1/ade/parse' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -F 'document=@document.pdf' \
  -F 'model=dpt-2-latest' \
  -F 'split=page' # Optional split parameter
```

#### Omit the Split Parameter

When you omit the `split` parameter, the API returns the entire document as a single split. All chunks from all pages are grouped together in one split object with `class: "full"` and `identifier: "full"`.

Omit the `split` parameter when you need to:

* Process the entire document as a single unit
* Maintain document context across pages
* Extract data that spans multiple pages

#### Use Page-Level Splits

When you set `split=page`, the API organizes chunks by page.
For multi-page documents, this creates one split per page. Each split has `class: "page"` and an identifier like `page_0`, `page_1`, etc.

Use page-level splits (set the `split` parameter to `page`) when you need to:

* Process pages independently
* Extract data from specific pages
* Build page-by-page workflows
* Reduce content size for downstream processing

## Add Custom Prompts for Figure Descriptions

Use the optional `custom_prompts` parameter to customize how figures are described during parsing. For more information, see [Custom Prompts for Figure Descriptions](./ade-parse-custom-prompts).

## Example: Parse a Document with the API

This example shows how to parse a document with the API and save the Markdown output to a file.

Materials:

* Sample PDF: Wire Transfer Form

```python [expandable] theme={null}
import requests

headers = {
    'Authorization': 'Bearer YOUR_API_KEY'
}
url = 'https://api.va.landing.ai/v1/ade/parse'

# Upload a document
document = open('wire-transfer.pdf', 'rb')
files = {'document': document}
data = {'model': 'dpt-2-latest'}

response = requests.post(url, files=files, data=data, headers=headers)
response_data = response.json()

# Print the full response
print(response_data)

# Extract and save the markdown content
if 'markdown' in response_data:
    markdown_content = response_data['markdown']

    # Save markdown content to file
    with open('markdown-wire-transfer.md', 'w', encoding='utf-8') as f:
        f.write(markdown_content)
    print("\nMarkdown content saved to a Markdown file.")
else:
    print("No 'markdown' field found in the response")

# Close the file
document.close()
```

The full response will be similar to the JSON below. Notice that each chunk has an `id`. For example, the first chunk is the text "# WIRE TRANSFER FORM". The `id` for that chunk is `33335548-e7c3-40bd-898e-4f23d6c99d34`.
```json [expandable] theme={null} { 'markdown':"\n\n# WIRE TRANSFER FORM\n\n\n\nInvoice Information\n\nInvoice Description: Professional consulting services - Q3 2025\n\nTotal Invoice Amount: $15,750.00 USD\n\n\n\nBeneficiary Bank Information\n\nBank Name: JPMorgan Chase Bank, N.A.\n\nBank Address: 270 Park Avenue, New York, NY 10017, USA\n\nBank Account Number: 4578923456789012\n\nSWIFT Code: CHASUS33\n\nABA Routing Number: 021000021\n\nACH Routing Number: 021000021\n\n\n\nInvoice Line Items\n
<table><tr><th>Description</th><th>Amount</th></tr><tr><td>Strategic planning consultation (40 hours @ $150/hr)</td><td>$6,000.00</td></tr><tr><td>Market analysis report preparation</td><td>$3,500.00</td></tr><tr><td>Implementation roadmap development</td><td>$2,250.00</td></tr><tr><td>Executive presentation materials</td><td>$1,500.00</td></tr><tr><td>Follow-up consultation sessions (15 hours @ $150/hr)</td><td>$2,250.00</td></tr><tr><td>Travel expenses (reimbursable)</td><td>$250.00</td></tr><tr><td>TOTAL</td><td>$15,750.00</td></tr></table>
\n\n\n\nWire Transfer Instructions\n\nPayment Method: International Wire Transfer\nCurrency: USD (United States Dollars)\nBeneficiary Name: ABC Consulting Services LLC\n\n\n\nBeneficiary Address: 1234 Business Park Drive, Suite 500, Los Angeles, CA 90210, USA\n\nPurpose of Payment: Payment for professional consulting services as per Invoice #INV-2025-0847\n\n\n\n- Special Instructions:\n - Please include invoice number INV-2025-0847 in the payment reference\n - All bank charges to be borne by the sender\n - Payment should be received within 3-5 business days\n - Please send wire confirmation receipt to accounting@abcconsulting.com\n - For any questions regarding this transfer, contact: +1 (555) 123-4567\n\n\n\n**Urgency:** Standard processing (3-5 business days acceptable)\n\n\n\nForm completed on: September 3, 2025\n\nReference Number: WT-2025-0847", 'chunks':[ { 'markdown':'# WIRE TRANSFER FORM', 'type':'text', 'id':'33335548-e7c3-40bd-898e-4f23d6c99d34', 'grounding':{ 'box':{ 'left':0.2622728943824768, 'top':0.07604080438613892, 'right':0.7369285821914673, 'bottom':0.10924206674098969 }, 'page':0 } }, { 'markdown':'Invoice Information\n\nInvoice Description: Professional consulting services - Q3 2025\n\nTotal Invoice Amount: $15,750.00 USD', 'type':'text', 'id':'0777dc07-855b-4b83-b422-5e8063405249', 'grounding':{ 'box':{ 'left':0.10331332683563232, 'top':0.13015401363372803, 'right':0.8966385126113892, 'bottom':0.2544138431549072 }, 'page':0 } }, { 'markdown':'Beneficiary Bank Information\n\nBank Name: JPMorgan Chase Bank, N.A.\n\nBank Address: 270 Park Avenue, New York, NY 10017, USA\n\nBank Account Number: 4578923456789012\n\nSWIFT Code: CHASUS33\n\nABA Routing Number: 021000021\n\nACH Routing Number: 021000021', 'type':'text', 'id':'7c56b114-cc66-4fe4-99cb-9425a5210747', 'grounding':{ 'box':{ 'left':0.10399597883224487, 'top':0.2693082094192505, 'right':0.895996630191803, 'bottom':0.5048781633377075 }, 'page':0 } }, { 'markdown':'Invoice Line Items\n
<table><thead><tr><th>Description</th><th>Amount</th></tr></thead><tbody>
<tr><td>Strategic planning consultation (40 hours @ $150/hr)</td><td>$6,000.00</td></tr>
<tr><td>Market analysis report preparation</td><td>$3,500.00</td></tr>
<tr><td>Implementation roadmap development</td><td>$2,250.00</td></tr>
<tr><td>Executive presentation materials</td><td>$1,500.00</td></tr>
<tr><td>Follow-up consultation sessions (15 hours @ $150/hr)</td><td>$2,250.00</td></tr>
<tr><td>Travel expenses (reimbursable)</td><td>$250.00</td></tr>
<tr><td>TOTAL</td><td>$15,750.00</td></tr></tbody></table>
', 'type':'table', 'id':'b95955a2-3f1d-4b96-be12-d5af677efd60', 'grounding':{ 'box':{ 'left':0.10457819700241089, 'top':0.5198298096656799, 'right':0.8970209956169128, 'bottom':0.8072096705436707 }, 'page':0 } }, { 'markdown':'Wire Transfer Instructions\n\nPayment Method: International Wire Transfer\nCurrency: USD (United States Dollars)\nBeneficiary Name: ABC Consulting Services LLC', 'type':'text', 'id':'d9296cc1-f804-43e2-9f0f-99e7c62eec48', 'grounding':{ 'box':{ 'left':0.10443270206451416, 'top':0.8223555088043213, 'right':0.8968669176101685, 'bottom':0.974624514579773 }, 'page':0 } }, { 'markdown':'Beneficiary Address: 1234 Business Park Drive, Suite 500, Los Angeles, CA 90210, USA\n\nPurpose of Payment: Payment for professional consulting services as per Invoice #INV-2025-0847', 'type':'text', 'id':'f2b8a1d4-4436-4e05-9467-bdaf5ca4bd3b', 'grounding':{ 'box':{ 'left':0.11186572909355164, 'top':0.022329870611429214, 'right':0.8772550821304321, 'bottom':0.09824278950691223 }, 'page':1 } }, { 'markdown':'- Special Instructions:\n - Please include invoice number INV-2025-0847 in the payment reference\n - All bank charges to be borne by the sender\n - Payment should be received within 3-5 business days\n - Please send wire confirmation receipt to accounting@abcconsulting.com\n - For any questions regarding this transfer, contact: +1 (555) 123-4567', 'type':'text', 'id':'1d536fff-e204-48d4-a53a-8e524665aec5', 'grounding':{ 'box':{ 'left':0.11558690667152405, 'top':0.10176733136177063, 'right':0.8238765001296997, 'bottom':0.20318034291267395 }, 'page':1 } }, { 'markdown':'**Urgency:** Standard processing (3-5 business days acceptable)', 'type':'text', 'id':'fb34e8c2-0aa6-4866-895d-060c07b717ea', 'grounding':{ 'box':{ 'left':0.11588779091835022, 'top':0.204525887966156, 'right':0.6877880096435547, 'bottom':0.23076602816581726 }, 'page':1 } }, { 'markdown':'Form completed on: September 3, 2025\n\nReference Number: WT-2025-0847', 'type':'text', 
'id':'7c686aab-8142-4da2-a7e7-dae4495aade5', 'grounding':{ 'box':{ 'left':0.35991770029067993, 'top':0.26450976729393005, 'right':0.641823947429657, 'bottom':0.3033582866191864 }, 'page':1 } } ], 'splits':[ { 'class':'full', 'identifier':'full', 'pages':[ 0, 1 ], 'markdown':"\n\n# WIRE TRANSFER FORM\n\n\n\nInvoice Information\n\nInvoice Description: Professional consulting services - Q3 2025\n\nTotal Invoice Amount: $15,750.00 USD\n\n\n\nBeneficiary Bank Information\n\nBank Name: JPMorgan Chase Bank, N.A.\n\nBank Address: 270 Park Avenue, New York, NY 10017, USA\n\nBank Account Number: 4578923456789012\n\nSWIFT Code: CHASUS33\n\nABA Routing Number: 021000021\n\nACH Routing Number: 021000021\n\n\n\nInvoice Line Items\n
<table><thead><tr><th>Description</th><th>Amount</th></tr></thead><tbody>
<tr><td>Strategic planning consultation (40 hours @ $150/hr)</td><td>$6,000.00</td></tr>
<tr><td>Market analysis report preparation</td><td>$3,500.00</td></tr>
<tr><td>Implementation roadmap development</td><td>$2,250.00</td></tr>
<tr><td>Executive presentation materials</td><td>$1,500.00</td></tr>
<tr><td>Follow-up consultation sessions (15 hours @ $150/hr)</td><td>$2,250.00</td></tr>
<tr><td>Travel expenses (reimbursable)</td><td>$250.00</td></tr>
<tr><td>TOTAL</td><td>$15,750.00</td></tr></tbody></table>
\n\n\n\nWire Transfer Instructions\n\nPayment Method: International Wire Transfer\nCurrency: USD (United States Dollars)\nBeneficiary Name: ABC Consulting Services LLC\n\n\n\nBeneficiary Address: 1234 Business Park Drive, Suite 500, Los Angeles, CA 90210, USA\n\nPurpose of Payment: Payment for professional consulting services as per Invoice #INV-2025-0847\n\n\n\n- Special Instructions:\n - Please include invoice number INV-2025-0847 in the payment reference\n - All bank charges to be borne by the sender\n - Payment should be received within 3-5 business days\n - Please send wire confirmation receipt to accounting@abcconsulting.com\n - For any questions regarding this transfer, contact: +1 (555) 123-4567\n\n\n\n**Urgency:** Standard processing (3-5 business days acceptable)\n\n\n\nForm completed on: September 3, 2025\n\nReference Number: WT-2025-0847", 'chunks':[ '33335548-e7c3-40bd-898e-4f23d6c99d34', '0777dc07-855b-4b83-b422-5e8063405249', '7c56b114-cc66-4fe4-99cb-9425a5210747', 'b95955a2-3f1d-4b96-be12-d5af677efd60', 'd9296cc1-f804-43e2-9f0f-99e7c62eec48', 'f2b8a1d4-4436-4e05-9467-bdaf5ca4bd3b', '1d536fff-e204-48d4-a53a-8e524665aec5', 'fb34e8c2-0aa6-4866-895d-060c07b717ea', '7c686aab-8142-4da2-a7e7-dae4495aade5' ] } ], 'metadata':{ 'filename':'wire-transfer.pdf', 'org_id':None, 'page_count':2, 'duration_ms':7861, 'credit_usage':6.0, 'version':'latest' } } ``` To extract specific fields from the parsed Markdown, see [Extract Data](./ade-extract). ## Run Parse with Our Libraries Click one of the tiles below to learn how to run the [Parse API](https://docs.landing.ai/api-reference/tools/ade-parse) with our libraries. Run Parse with our Python library. Run Parse with our TypeScript library. # App Name Source: https://docs.landing.ai/ade/ade-sf-app-name Some SQL commands for managing Agentic Document Extraction require the name of your instance of the app. In the SQL commands throughout this documentation, `APP_NAME` is used as a placeholder for your app name. 
The default app name is `AGENTIC_DOCUMENT_EXTRACTION__APP`. You need your app name to: * [Parse documents](./ade-sf-parse-cloud) * [Extract fields from parsed output](./ade-sf-extract-cloud) * [Grant app access to stages](./ade-sf-grant-access-to-stages) ## App Name in Sample Scripts Sample scripts in the app interface automatically use your app name. When you copy these scripts, you do not need to change the app name. ## Locate the Name of Your App You can get the name of your instance of Agentic Document Extraction in Snowsight: 1. Open [Snowsight](https://app.snowflake.com). 2. Go to **Catalog** > **Apps** > **Agentic Document Extraction - App**. 3. The app opens to the **Home** page. 4. Identify the app name in the first line of the scripts. For example, in the screenshot below, the name of the app is `AGENTIC_DOCUMENT_EXTRACTION__APP`. App Name # Extract Fields from Parsed Output Source: https://docs.landing.ai/ade/ade-sf-extract-cloud To extract fields from documents, first parse a document with the [api.parse](./ade-sf-parse-cloud) procedure. Then use the `api.extract` procedure to extract key-value pairs from the Markdown returned by the `api.parse` procedure. ## Prerequisites Before you can extract fields, you must parse a document. For more information, go to [Parse Documents](./ade-sf-parse-cloud). ## Set Up the Session Before running a parse or extract procedure, run the command below to set your session to use the Agentic Document Extraction application and procedures. Replace this placeholder with the [name of your instance](./ade-sf-app-name) of Agentic Document Extraction: `APP_NAME`. ```sql theme={null} USE "APP_NAME"; ``` ## Extract To extract fields from parsed documents, use the `api.extract` procedure. The `api.extract` procedure sends the Markdown and a JSON schema to the LandingAI-hosted service, and saves the extracted data to an output table (defaults to `db.extract_output`). 
The `api.extract` procedure runs the [ADE Extract API](https://docs.landing.ai/api-reference/tools/ade-extract). ## Required Inputs The `api.extract` procedure requires: * **Markdown content** from `api.parse` * **JSON schema** that defines which fields to extract and their expected format. For more information, go to [Create a JSON Schema for Field Extraction](https://docs.landing.ai/ade/ade-extract-schema-json). The procedure accepts these inputs using different methods. For details, go to [Methods for Passing the Markdown](#methods-for-passing-the-markdown) and [Methods for Passing the JSON Schema](#methods-for-passing-the-json-schema). ## Optional Parameters The `api.extract` procedure supports these optional parameters: * **`doc_id`**: Document ID from parse output; provide this to link the extraction results to the original parsed document * **`output_table`**: Specify a custom output table name instead of the default `extract_output` * **`model`**: Specify the model version to use for extraction. For full details on extraction models, go to [Extraction Model Versions](./ade-extract-models). ## Extract Return Object The `api.extract` procedure returns an OBJECT with the following fields: * **`message`**: Success or error message * **`output_table`**: Name of the table where results were saved (such as "db.extract\_output"). For the table schema, go to [Extract Output Table Schema](#extract-output-table-schema). * **`doc_id`**: Document ID from the parse output (for linking results) * **`extraction_id`**: Unique extraction job identifier * **`status_code`**: HTTP status code for the request This return object is useful when chaining parse and extract operations together, and for tracking extraction jobs and debugging. ## Extract Output Table Schema The extraction results are stored in the table specified by `output_table` in the [return object](#extract-return-object). By default, this is `db.extract_output`. 
The table has the following schema: * **DOC\_ID**: Document ID from parse output; you can use this to link extraction results to the original parsed document in `parse_output` * **EXTRACTION\_JOB\_ID**: Unique extraction job identifier * **SOURCE\_MARKDOWN**: First 10,000 characters of input Markdown (for reference) * **MODEL\_VERSION**: Model version used for extraction * **EXTRACTED\_AT**: Timestamp when extraction completed * **STATUS\_CODE**: HTTP status code (200 for success) * **EXTRACTION**: VARIANT containing the extracted data matching your schema * **EXTRACTION\_METADATA**: VARIANT with extraction metadata * **METADATA**: VARIANT with job metadata * **ERROR**: VARIANT containing error information (if extraction failed) Example query to access nested extraction data: ```sql theme={null} SELECT DOC_ID, EXTRACTION:field_name::STRING AS field_value, EXTRACTION:nested:field::NUMBER AS nested_value FROM db.extract_output WHERE STATUS_CODE = 200; ``` ## Methods for Passing the Markdown You can pass the Markdown content to `api.extract` using two methods: * [Pass the Parse Result Object Directly](#pass-the-parse-result-object-directly) * [Pass Markdown Explicitly](#pass-markdown-explicitly) ### Pass the Parse Result Object Directly You can pass the result object from `api.parse` directly to `api.extract`. This is the most streamlined approach for chaining parse and extract operations. You can combine this with any method for passing the JSON schema. 
Use this method when you: * Want to chain parse and extract in a single script block * Need to avoid querying the parse output table * Want to automatically link parse and extract results **Procedure Signature** ```sql theme={null} PROCEDURE api.extract( parse_result OBJECT, schema STRING, output_table STRING DEFAULT NULL, model STRING DEFAULT NULL ) ``` #### Example The procedure automatically: * Extracts the `doc_id` from the parse result * Retrieves the Markdown from the parse output table * Links the extraction result with the parse result via `doc_id` ```sql theme={null} DECLARE parse_result OBJECT; extract_result OBJECT; BEGIN -- Parse the document CALL api.parse( 'https://va.landing.ai/pdfs/invoice_1.pdf' ) INTO :parse_result; -- Extract using the parse result directly CALL api.extract( :parse_result, 'https://va.landing.ai/pdfs/InvoiceExtractionSchema.json' ) INTO :extract_result; RETURN extract_result; END; ``` ### Pass Markdown Explicitly You can query the Markdown directly from the `parse_output` table and pass it as a parameter. Use this method when you've already parsed documents and want to extract from them separately. You can combine this with any method for passing the JSON schema. 
**Procedure Signature** ```sql theme={null} PROCEDURE api.extract( markdown STRING, schema STRING, doc_id STRING DEFAULT NULL, output_table STRING DEFAULT NULL, model STRING DEFAULT NULL ) ``` #### Example ```sql theme={null} CALL api.extract( markdown => (SELECT MARKDOWN FROM db.parse_output WHERE doc_id = 'your_doc_id'), schema => 'https://va.landing.ai/pdfs/InvoiceExtractionSchema.json', doc_id => 'your_doc_id' ); SELECT * FROM db.extract_output; ``` ## Methods for Passing the JSON Schema You can pass the JSON schema to `api.extract` using multiple methods: * [Include the JSON Schema Inline](#include-the-json-schema-inline) * [Use a Staged Schema File](#use-a-staged-schema-file) * [Pass a URL to an Externally Hosted JSON Schema (Demo Files Only)](#pass-a-url-to-an-externally-hosted-json-schema-demo-files-only) ### Include the JSON Schema Inline Provide the schema as an inline JSON string. You can combine this with any method for passing the Markdown. Use this method when you: * Have a simple schema specific to one query * Want to keep all logic contained in a single script * Prototype or test schema definitions #### Example ```sql theme={null} CALL api.extract( markdown => (SELECT MARKDOWN FROM db.parse_output WHERE doc_id = 'your_doc_id'), schema => '{ "type": "object", "properties": { "title": { "type": "string", "title": "Document Title", "description": "The main title of the document." }, "content": { "type": "string", "title": "Main Content", "description": "The primary content or text from the document." } }, "required": ["title", "content"] }', doc_id => 'your_doc_id' ); SELECT * FROM db.extract_output; ``` ### Use a Staged Schema File Use `build_scoped_file_url()` to reference a schema file in a Snowflake stage. You can combine this with any method for passing the Markdown. 
Use this method when you: * Store your schema files in Snowflake stages * Want to version control schemas alongside your data * Need to reference schemas from internal stage locations #### Example ```sql theme={null} CALL api.extract( markdown => (SELECT MARKDOWN FROM db.parse_output WHERE doc_id = 'your_doc_id'), schema => build_scoped_file_url('@your_db.your_schema.your_stage', '/simple_schema.json'), doc_id => 'your_doc_id' ); SELECT * FROM db.extract_output; ``` ### Pass a URL to an Externally Hosted JSON Schema (Demo Files Only) This method only works with schema files hosted at `https://va.landing.ai`; the app was granted access to this endpoint during installation. To use schemas from other URLs or locations, use another method for passing the JSON schema. Provide the schema as a URL parameter. You can combine this with any method for passing the Markdown. #### Example ```sql theme={null} CALL api.extract( markdown => (SELECT MARKDOWN FROM db.parse_output WHERE doc_id = 'your_doc_id'), schema => 'https://va.landing.ai/pdfs/InvoiceExtractionSchema.json', doc_id => 'your_doc_id' ); SELECT * FROM db.extract_output; ``` ## Batch Processing with EXECUTE IMMEDIATE For processing multiple documents at once, you can use Snowflake's scripting capabilities to loop through parsed documents and extract data from each. This approach is useful when you have many documents already parsed and want to extract structured data from all of them in one operation. 
```sql theme={null} EXECUTE IMMEDIATE $$ DECLARE res RESULTSET DEFAULT ( SELECT MARKDOWN, DOC_ID FROM db.parse_output WHERE STATUS_CODE = 200 ); cur CURSOR FOR res; BEGIN FOR record IN cur DO LET markdown STRING := record.MARKDOWN; LET doc_id STRING := record.DOC_ID; CALL api.extract( markdown => :markdown, schema => 'https://va.landing.ai/pdfs/InvoiceExtractionSchema.json', doc_id => :doc_id, output_table => 'batch_extract_results' ); END FOR; RETURN 'Processing complete'; END; $$; -- View all extracted results SELECT * FROM db.batch_extract_results; ``` ## Sample Scenarios This section provides examples of how to run the `api.extract` procedure in different scenarios. * [Parse and Extract Data from Files at Publicly Accessible URLs](#parse-and-extract-data-from-files-at-publicly-accessible-urls) * [Parse and Extract Data from a Staged File](#parse-and-extract-data-from-a-staged-file) * [Parse and Extract Data from All Files in a Stage](#parse-and-extract-data-from-all-files-in-a-stage) ### Parse and Extract Data from Files at Publicly Accessible URLs Run the command below to parse multiple files at publicly accessible URLs, and then extract data from the parsed output. We've provided the sample files to help you get started. This example uses an externally hosted JSON schema at `https://va.landing.ai`, which is only available for demo purposes. For production use, use an [inline schema](#include-the-json-schema-inline) or a [staged schema file](#use-a-staged-schema-file). Replace this placeholder with your information: `APP_NAME`. 
```sql theme={null} USE "APP_NAME"; -- Step 1: Parse invoice files CALL api.parse('https://va.landing.ai/pdfs/invoice_1.pdf'); CALL api.parse('https://va.landing.ai/pdfs/invoice_2.pdf'); CALL api.parse('https://va.landing.ai/pdfs/invoice_3.pdf'); CALL api.parse('https://va.landing.ai/pdfs/invoice_4.pdf'); -- Step 2: Extract structured data from each parsed invoice CALL api.extract( markdown => (SELECT MARKDOWN FROM db.parse_output WHERE SOURCE_URL = 'https://va.landing.ai/pdfs/invoice_1.pdf'), schema => 'https://va.landing.ai/pdfs/InvoiceExtractionSchema.json', doc_id => (SELECT DOC_ID FROM db.parse_output WHERE SOURCE_URL = 'https://va.landing.ai/pdfs/invoice_1.pdf') ); CALL api.extract( markdown => (SELECT MARKDOWN FROM db.parse_output WHERE SOURCE_URL = 'https://va.landing.ai/pdfs/invoice_2.pdf'), schema => 'https://va.landing.ai/pdfs/InvoiceExtractionSchema.json', doc_id => (SELECT DOC_ID FROM db.parse_output WHERE SOURCE_URL = 'https://va.landing.ai/pdfs/invoice_2.pdf') ); CALL api.extract( markdown => (SELECT MARKDOWN FROM db.parse_output WHERE SOURCE_URL = 'https://va.landing.ai/pdfs/invoice_3.pdf'), schema => 'https://va.landing.ai/pdfs/InvoiceExtractionSchema.json', doc_id => (SELECT DOC_ID FROM db.parse_output WHERE SOURCE_URL = 'https://va.landing.ai/pdfs/invoice_3.pdf') ); CALL api.extract( markdown => (SELECT MARKDOWN FROM db.parse_output WHERE SOURCE_URL = 'https://va.landing.ai/pdfs/invoice_4.pdf'), schema => 'https://va.landing.ai/pdfs/InvoiceExtractionSchema.json', doc_id => (SELECT DOC_ID FROM db.parse_output WHERE SOURCE_URL = 'https://va.landing.ai/pdfs/invoice_4.pdf') ); -- Step 3: View extracted data SELECT SPLIT_PART(p.SOURCE_URL, '/', -1) AS filename, e.EXTRACTION:invoice_info:invoice_date::STRING AS invoice_date, e.EXTRACTION:company_info:supplier_name::STRING AS supplier_name, e.EXTRACTION:line_items[0]:description::STRING AS first_line_item_description, e.EXTRACTION:totals_summary:total_due::FLOAT AS total_due FROM db.extract_output e 
JOIN db.parse_output p ON e.DOC_ID = p.DOC_ID ORDER BY filename; ``` ### Parse and Extract Data from a Staged File Before parsing staged files, you must grant the application access to your stage. For more information, go to [Grant Access to Stages](./ade-sf-grant-access-to-stages). Run the command below to parse a single file in a Snowflake stage, and then extract data from the parsed output. Replace these placeholders with your information: `APP_NAME`, `your_db`, `your_schema`, `your_stage`, `path/to/file.pdf`, and the JSON schema fields. ```sql theme={null} USE "APP_NAME"; -- Step 1: Parse the staged file CALL api.parse( '@your_db.your_schema.your_stage/path/to/file.pdf' ); -- Step 2: Extract structured data using inline schema CALL api.extract( markdown => (SELECT MARKDOWN FROM db.parse_output WHERE FILENAME = 'file.pdf'), schema => '{ "type": "object", "properties": { "field1": {"type": "string"}, "field2": {"type": "string"} }, "required": ["field1", "field2"] }', doc_id => (SELECT DOC_ID FROM db.parse_output WHERE FILENAME = 'file.pdf') ); -- Step 3: View extracted fields SELECT EXTRACTION:field1::STRING AS field1, EXTRACTION:field2::STRING AS field2 FROM db.extract_output; ``` #### Sample Script: Parse and Extract a Staged File Let's say you have the following setup: * **APP\_NAME**: AGENTIC\_DOCUMENT\_EXTRACTION\_\_APP * **Database**: DEMO\_DB * **Schema**: DEMO\_SCHEMA * **Stage**: DEMO\_STAGE * **PDF**: statement-jane-harper.pdf You want to extract these fields from the file: * Employee Name * Employee Social Security Number First, grant the application access to the stage: ```sql theme={null} GRANT USAGE ON DATABASE DEMO_DB TO APPLICATION "AGENTIC_DOCUMENT_EXTRACTION__APP"; GRANT USAGE ON SCHEMA DEMO_DB.DEMO_SCHEMA TO APPLICATION "AGENTIC_DOCUMENT_EXTRACTION__APP"; GRANT READ, WRITE ON STAGE DEMO_DB.DEMO_SCHEMA.DEMO_STAGE TO APPLICATION "AGENTIC_DOCUMENT_EXTRACTION__APP"; ``` Then, parse the file and extract fields from `statement-jane-harper.pdf`: 
```sql theme={null} USE "AGENTIC_DOCUMENT_EXTRACTION__APP"; -- Step 1: Parse the staged file CALL api.parse( '@DEMO_DB.DEMO_SCHEMA.DEMO_STAGE/statement-jane-harper.pdf' ); -- Step 2: Extract structured data using inline schema CALL api.extract( markdown => (SELECT MARKDOWN FROM db.parse_output WHERE FILENAME = 'statement-jane-harper.pdf'), schema => '{ "title": "Employee Payroll Field Extraction Schema", "description": "Schema for extracting key employee payroll fields.", "type": "object", "properties": { "employee_name": { "title": "Employee Name", "description": "The full name of the employee as it appears on the payroll document.", "type": "string" }, "employee_ssn": { "title": "Employee Social Security Number", "description": "The Social Security Number of the employee, formatted as XXX-XX-XXXX.", "type": "string" } } }', doc_id => (SELECT DOC_ID FROM db.parse_output WHERE FILENAME = 'statement-jane-harper.pdf') ); -- Step 3: View extracted fields SELECT EXTRACTION:employee_name::STRING AS employee_name, EXTRACTION:employee_ssn::STRING AS employee_ssn FROM db.extract_output; ``` ### Parse and Extract Data from All Files in a Stage Before parsing staged files, you must grant the application access to your stage. For more information, go to [Grant Access to Stages](./ade-sf-grant-access-to-stages). Run the script below to iterate through all files in a Snowflake stage, parse each file, and extract data from the parsed output. This example uses Snowflake's CURSOR functionality to filter and process multiple files from a stage. Replace these placeholders with your information: `@your_db.your_schema.your_stage`, filtering criteria, and the JSON schema. 
```sql theme={null} -- Example: Parse and extract all PDFs on a stage using some filtering criteria DECLARE file_cursor CURSOR FOR SELECT RELATIVE_PATH FROM DIRECTORY(@your_db.your_schema.your_stage) WHERE SIZE < 1000000 -- Files smaller than 1MB (1,000,000 bytes) AND LOWER(RELATIVE_PATH) LIKE '%your_file_pattern%.pdf' -- Only process PDF files matching your pattern LIMIT 10; current_file_path STRING; full_stage_path STRING; parse_ret OBJECT; extract_ret OBJECT; BEGIN FOR file_record IN file_cursor DO current_file_path := file_record.RELATIVE_PATH; full_stage_path := '@"your_db"."your_schema"."your_stage"/' || :current_file_path; -- Parse document using direct stage path CALL api.parse(file_path => :full_stage_path) INTO :parse_ret; -- Extract using the parse return object CALL api.extract( parse_result => :parse_ret, schema => '{"type": "object", "properties": {...}}' -- Replace with your JSON schema ) INTO :extract_ret; END FOR; END; -- View parsed results SELECT * FROM db.parse_output; -- View extracted results SELECT * FROM db.extract_output; ``` # Grant Agentic Document Extraction Access to Stages Source: https://docs.landing.ai/ade/ade-sf-grant-access-to-stages To [parse](./ade-sf-parse-cloud) or [extract data](./ade-sf-extract-cloud) from files in Snowflake stages, grant access to those stages. To grant access to a stage: 1. Make a note of the **database**, **schema**, and **stage** that you want to grant access to. 2. Open [Snowsight](https://app.snowflake.com). 3. Create a new worksheet and run the following SQL commands. Replace these placeholders with your information: `YOUR_DB`, `YOUR_SCHEMA`, `YOUR_STAGE`, and `APP_NAME`. (To locate your app name, go to [App Name](./ade-sf-app-name)). 
```sql theme={null} GRANT USAGE ON DATABASE YOUR_DB TO APPLICATION "APP_NAME"; GRANT USAGE ON SCHEMA YOUR_DB.YOUR_SCHEMA TO APPLICATION "APP_NAME"; GRANT READ, WRITE ON STAGE YOUR_DB.YOUR_SCHEMA.YOUR_STAGE TO APPLICATION "APP_NAME"; ``` # Install and Update Source: https://docs.landing.ai/ade/ade-sf-install ## Installation Requirements * Before installing any apps from the Snowflake Marketplace, a user with the ORGADMIN role must accept the Consumer Terms of Service for the Snowflake Marketplace. For more details, go to the [Snowflake Documentation](https://other-docs.snowflake.com/en/collaboration/consumer-becoming#accept-the-snowflake-provider-and-consumer-terms-of-service). * To install an app, you must use the ACCOUNTADMIN role or another role with the IMPORT SHARE and CREATE privileges. For more details, go to the [Snowflake Documentation](https://other-docs.snowflake.com/en/native-apps/consumer-installing). * Agentic Document Extraction is not available on Snowflake trial accounts. ## Request the App Access to the app is available by request. To request the app, follow the instructions below: 1. Open the [LandingAI](https://app.snowflake.com/marketplace/providers/GZTYZ12K65BX/LandingAI) provider page in the Snowflake Marketplace. 2. Locate and click the **Agentic Document Extraction** listing. 3. Click **Request App**. Request app 4. Fill out and submit the request form. 5. The LandingAI team will review the request and contact you with more information. ## Install the App After you have [requested the app](#request-the-app) and have been granted access to it, follow the instructions below to install it in Snowflake: 1. Go to **Snowsight** > **Catalog** > **Apps** > **Recently Shared with You**. 2. Locate and click the **Agentic Document Extraction - App** listing. 3. Click **Get**. Get app 4. If you want to change the name of the application, click **Options** and enter a name in **Application Name**. 5. Click **Get**. Get app 6. 
A pop-up indicates that the installation process has begun. Installation in progress 7. After this process is complete, click **Configure** on the pop-up. (You will also get an email notifying you that you can access the app.) If you've navigated away from the page, go to **Snowsight** > **Catalog** > **Apps**. Click the app listing. Although the app appears in Installed Apps, configuration is not complete. Configure 8. The app opens. 9. In the pop-up that displays, grant egress access by clicking **Connect**. **The app requires this access to function.** Egress access allows the app to connect to the service over the internet. Egress access pop-up The pop-up displays the following message: > AGENTIC\_DOCUMENT\_EXTRACTION\_\_APP would like to connect to the following external endpoints > > This app requires Internet access to connect to the LandingAI Agentic Document Extraction service. The app will not function without this access. After clicking **Connect**, the following objects are created in your Snowflake environment. You can change the default values if needed. (`APP_NAME` is replaced with the name of your [app](./ade-sf-app-name).) * **Network rule**: APP\_NAME\_EGRESS\_NETWORK\_RULE * **Location**: APP\_NAME\_APP\_DATA.CONFIGURATION * **External access integration**: APP\_NAME\_EGRESS\_EXTERNAL\_ACCESS For more information about external network access, see the [Snowflake Documentation](https://docs.snowflake.com/en/developer-guide/external-network-access/external-network-access-overview). 10. After enabling egress access, [enter your API key](#enter-api-key). ## Enter API Key To parse documents or run extraction, you must enter your API key in the app in Snowsight. Only one API key can be entered in the app at a time. Because multiple users can access the app, the API key is masked in the interface. **In LandingAI:** 1. If you haven't already, [create an account](https://va.landing.ai/). 2. [Get your API key](https://va.landing.ai/my/settings/api-key). 
**In Snowsight:** 1. Go to **Catalog** > **Apps** > **Agentic Document Extraction - App**. 2. Click **Settings** (if the app does not automatically open to this tab). 3. Enter your API key. 4. Click **Add API Key**. Enter API key 5. The app saves the API key, generates procedures for using ADE, and unlocks usage instructions on the Home page. This might take a few minutes. ## Next Steps After you install the app and enter your API key, you are ready to [parse documents](./ade-sf-parse-cloud) and [extract data](./ade-sf-extract-cloud). ## App Updates The app is automatically updated when a new version is released. Users cannot manually update the app. ## Troubleshooting Tips ### Error: Connect to allow the app to access the LandingAI Agentic Document Extraction service During the installation process, you might get this error: > Connect to allow the app to access the LandingAI Agentic Document Extraction service > > This app requires Internet access to connect to the LandingAI Agentic Document Extraction service. The app will not function without this access. > > There was an error initializing the configuration flow. Close the dialog and try again later. If you get this error message, try selecting a different warehouse from the drop-down menu in the top right corner. # Overview Source: https://docs.landing.ai/ade/ade-sf-overview Agentic Document Extraction (ADE) parses and extracts structured data from unstructured documents like PDFs and images. It identifies elements such as text, tables, and form fields, and returns them as Markdown and hierarchical JSON with exact page and coordinate references. ADE is available as an application in the Snowflake Marketplace. Snowflake customers can use the app to parse and extract data from: * documents staged in Snowflake. * documents available at publicly accessible URLs. 
## How ADE on Snowflake Works When you run the [parsing](./ade-sf-parse-cloud) or [extraction](./ade-sf-extract-cloud) procedure, the relevant files are sent to the ADE service, which is hosted by LandingAI. The results display directly in Snowsight. ## Billing When you use the app, documents are processed using your LandingAI-hosted account. Billing for processing documents (like running parsing and field extraction) is managed by LandingAI. ## Zero Data Retention The ADE on Snowflake application cannot be used when Zero Data Retention (ZDR) is enabled in your organization. To use this application, turn off ZDR or use an organization that does not have ZDR enabled. ## ADE on Snowflake Checklist To install and start using the app on Snowflake, complete the following steps: **In LandingAI:** 1. If you haven't already, [create an account](https://va.landing.ai/). 2. [Get your API key](https://va.landing.ai/my/settings/api-key). **In Snowsight:** 1. [Install the app](./ade-sf-install). Ensure that you grant the app access to external endpoints during the setup process. 2. [Enter your API key in the app](./ade-sf-install#enter-api-key). 3. If you want to parse documents staged in Snowflake, [grant the app access to the stages that have the files you want to parse](./ade-sf-grant-access-to-stages). 4. [Parse your files](./ade-sf-parse-cloud). 5. (Optional) [Run field extraction on the parsed output](./ade-sf-extract-cloud). # Parse Documents Source: https://docs.landing.ai/ade/ade-sf-parse-cloud ## Prerequisites **In LandingAI:** 1. If you haven't already, [create an account](https://va.landing.ai/). 2. [Get your API key](https://va.landing.ai/my/settings/api-key). **In Snowsight:** 1. [Install the app](./ade-sf-install). 2. [Enter your API key in the app](./ade-sf-install#enter-api-key). 3. If you want to parse documents staged in Snowflake, [grant the app access to the stages that have the files you want to parse](./ade-sf-grant-access-to-stages). 
## Set Up the Session Before running a parse or extract procedure, run the command below to set your session to use the Agentic Document Extraction application and procedures. Replace this placeholder with the [name of your instance](./ade-sf-app-name) of Agentic Document Extraction: `APP_NAME`. ```sql theme={null} USE "APP_NAME"; ``` ## Parse To parse documents, use the `api.parse` procedure. The `api.parse` procedure sends a document from a Snowflake stage or publicly accessible URL to the LandingAI-hosted service, and saves the parsed content to an output table (defaults to `db.parse_output`). The `api.parse` procedure runs the [ADE Parse Jobs API](./ade-parse-async). ## Optional Parameters The `api.parse` procedure supports these optional parameters: * **`model`**: Specify the model version to use for parsing. For full details on parsing models, go to [Document Pre-Trained Transformers (Parsing Models)](./ade-parse-models). * **`split`**: Specify how to [split](./ade-separate-apis#set-up-splits-for-parsing) the document. * **`output_table`**: Specify a [custom output table name](#specify-a-custom-output-table) instead of the default `parse_output`. 
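The `split` parameter is not shown in the examples on this page; as a sketch, a call that requests per-page splits might look like the following (this assumes `'page'` is an accepted value for `split` — check the split documentation linked above for the values your workflow needs):

```sql theme={null}
-- Parse and request per-page splits
-- (assumes 'page' is a valid value for the split parameter)
CALL api.parse(
  file_path => 'https://va.landing.ai/pdfs/invoice_1.pdf',
  split => 'page'
);

-- Split results are stored in the SPLITS column of the output table
SELECT DOC_ID, SPLITS FROM db.parse_output;
```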
## Specify a Custom Output Table You can specify a custom table name for storing parse results instead of using the default `parse_output`: ```sql theme={null} CALL api.parse( file_path => 'https://va.landing.ai/pdfs/invoice_1.pdf', output_table => 'my_custom_results' ); SELECT * FROM db.my_custom_results; ``` The procedure automatically creates the table (if it doesn't exist) with the following schema: * DOC\_ID * SOURCE\_URL * FILENAME * PAGE\_COUNT * MODEL\_VERSION * PARSED\_AT * STATUS\_CODE * MARKDOWN * CHUNKS * SPLITS * GROUNDING * METADATA * ERROR ### Example with Optional Parameters ```sql theme={null} CALL api.parse( file_path => 'https://va.landing.ai/pdfs/invoice_1.pdf', model => 'dpt-2-mini-latest', output_table => 'invoice_results' ); SELECT * FROM db.invoice_results; ``` ## Parse Return Object The `api.parse` procedure returns an OBJECT with the following fields: * **`message`**: Success or error message * **`output_table`**: Name of the table where results were saved (such as "db.parse\_output") * **`doc_id`**: Unique document identifier for the parsed document * **`status_code`**: HTTP status code for the request This return object is useful when chaining parse and extract operations together. You can capture the result into a variable and pass it to `api.extract`. Example of capturing the return object: ```sql theme={null} DECLARE result OBJECT; BEGIN CALL api.parse( 'https://va.landing.ai/pdfs/invoice_1.pdf' ) INTO :result; RETURN result; END; ``` ## Use build\_scoped\_file\_url for Staged Files You can use the `build_scoped_file_url()` function to reference files in Snowflake stages. ```sql theme={null} CALL api.parse( file_path => build_scoped_file_url('@your_db.your_schema.your_stage', '/sample_image.png') ); SELECT * FROM db.parse_output; ``` ## Sample Scenarios This section provides examples of how to run the `api.parse` procedure in different scenarios. 
* [Parse a File at a Publicly Accessible URL](#parse-a-file-at-a-publicly-accessible-url) * [Parse a Staged File](#parse-a-staged-file) * [Parse Multiple Staged Files](#parse-multiple-staged-files) ### Parse a File at a Publicly Accessible URL Run the command below to parse a single file at a publicly accessible URL. We've provided a sample file to help you get started. Replace this placeholder with your information: `APP_NAME`. ```sql theme={null} USE "APP_NAME"; CALL api.parse( 'https://va.landing.ai/pdfs/invoice_1.pdf' ); SELECT * FROM db.parse_output; ``` ### Parse a Staged File Before parsing staged files, you must grant the application access to your stage. For more information, go to [Grant Access to Stages](./ade-sf-grant-access-to-stages). Run the command below to parse a single file in a Snowflake stage. Replace these placeholders with your information: `APP_NAME`, `your_db`, `your_schema`, `your_stage`, and `path/to/file.pdf`. ```sql theme={null} USE "APP_NAME"; CALL api.parse( '@your_db.your_schema.your_stage/path/to/file.pdf' ); SELECT * FROM db.parse_output; ``` #### Sample Script: Parse a Staged File Let's say you have the following setup: * **APP\_NAME**: AGENTIC\_DOCUMENT\_EXTRACTION\_\_APP * **Database**: DEMO\_DB * **Schema**: DEMO\_SCHEMA * **Stage**: DEMO\_STAGE * **PDF**: statement-jane-harper.pdf First, grant the application access to the stage: ```sql theme={null} GRANT USAGE ON DATABASE DEMO_DB TO APPLICATION "AGENTIC_DOCUMENT_EXTRACTION__APP"; GRANT USAGE ON SCHEMA DEMO_DB.DEMO_SCHEMA TO APPLICATION "AGENTIC_DOCUMENT_EXTRACTION__APP"; GRANT READ, WRITE ON STAGE DEMO_DB.DEMO_SCHEMA.DEMO_STAGE TO APPLICATION "AGENTIC_DOCUMENT_EXTRACTION__APP"; ``` Then, parse the PDF: ```sql theme={null} USE "AGENTIC_DOCUMENT_EXTRACTION__APP"; CALL api.parse( '@DEMO_DB.DEMO_SCHEMA.DEMO_STAGE/statement-jane-harper.pdf' ); SELECT * FROM db.parse_output; ``` ### Parse Multiple Staged Files Before parsing staged files, you must grant the application access to 
your stage. For more information, go to [Grant Access to Stages](./ade-sf-grant-access-to-stages). One way to process multiple documents is to call the `api.parse` procedure for each file. The procedure saves the results to the `db.parse_output` table, where you can query all parsed documents. Here's an example of processing multiple files and then viewing the results. Replace these placeholders with your information: `APP_NAME`, `your_db`, `your_schema`, and `your_stage`. ```sql theme={null} USE "APP_NAME"; -- Parse multiple files CALL api.parse('@your_db.your_schema.your_stage/file1.pdf'); CALL api.parse('@your_db.your_schema.your_stage/file2.pdf'); CALL api.parse('@your_db.your_schema.your_stage/file3.pdf'); -- View all parsed results SELECT * FROM db.parse_output; ``` #### Sample Script: Parse Multiple Staged Files Let's say you have the following setup: * **APP\_NAME**: AGENTIC\_DOCUMENT\_EXTRACTION\_\_APP * **Database**: DEMO\_DB * **Schema**: DEMO\_SCHEMA * **Stage**: DEMO\_STAGE (contains PDFs and images) The DEMO\_STAGE contains the following files: * statement-george-mathew\.png * statement-jane-harper.pdf * statement-john-doe.png * statement-john-smith.png First, grant the application access to the stage: ```sql theme={null} GRANT USAGE ON DATABASE DEMO_DB TO APPLICATION "AGENTIC_DOCUMENT_EXTRACTION__APP"; GRANT USAGE ON SCHEMA DEMO_DB.DEMO_SCHEMA TO APPLICATION "AGENTIC_DOCUMENT_EXTRACTION__APP"; GRANT READ, WRITE ON STAGE DEMO_DB.DEMO_SCHEMA.DEMO_STAGE TO APPLICATION "AGENTIC_DOCUMENT_EXTRACTION__APP"; ``` Then, parse the documents: ```sql theme={null} USE "AGENTIC_DOCUMENT_EXTRACTION__APP"; -- Parse each document CALL api.parse('@DEMO_DB.DEMO_SCHEMA.DEMO_STAGE/statement-george-mathew.png'); CALL api.parse('@DEMO_DB.DEMO_SCHEMA.DEMO_STAGE/statement-jane-harper.pdf'); CALL api.parse('@DEMO_DB.DEMO_SCHEMA.DEMO_STAGE/statement-john-doe.png'); CALL api.parse('@DEMO_DB.DEMO_SCHEMA.DEMO_STAGE/statement-john-smith.png'); -- View all parsed results SELECT * 
FROM db.parse_output; ``` ## Remove Parse Output Tables Each time you run the `api.parse` procedure, the results are saved to an output table (defaults to `db.parse_output`). Over time, you may want to remove these tables to start fresh or clean up old results. This is especially useful if you have specified [custom output table names](#specify-a-custom-output-table) and accumulated multiple tables. To remove parse output tables: 1. Go to **Catalog** > **Apps** > **Agentic Document Extraction - App**. 2. Click **Settings**. 3. Navigate to **SQL Execution**. 4. Enter the following SQL command in the SQL field: ```sql theme={null} DROP TABLE IF EXISTS APP_NAME.DB.TABLE_NAME; ``` 5. Replace these placeholders with your information: * `APP_NAME`: The [name of your app instance](./ade-sf-app-name) * `DB`: The app's database name (use `db`) * `TABLE_NAME`: The name of the table you want to remove (such as `parse_output` or your custom table name) 6. Click **Run Query**. # Split Source: https://docs.landing.ai/ade/ade-split Use the Split API to split a parsed document into multiple classified sub-documents. Organizations typically use Split when they receive batched documents that contain multiple document types or multiple instances of the same document type. The Split API classifies each sub-document and returns the full Markdown content for downstream processing. Splitting occurs after [parsing](https://docs.landing.ai/api-reference/tools/ade-parse) but before [extraction](https://docs.landing.ai/api-reference/tools/ade-extract). You can use Split without performing extraction. The Split API is in Preview. This feature is still in development and may not return accurate results. Do not use this feature in production environments. ## Example Use Cases * **Financial Services**: Financial institutions processing Know Your Customer (KYC) documentation often receive PDFs containing multiple document types for each customer, such as bank statements, utility bills, and identification documents. 
After parsing the batch, the API separates and classifies each document type. * **Healthcare**: Healthcare systems ingesting patient records may receive batched PDFs containing intake forms, pathology reports, and medication lists. The API separates these documents by type for routing to appropriate systems. * **Accounting**: Accounting departments processing expense documentation receive PDFs with multiple invoices and receipts. The API separates each document and can use identifiers like invoice numbers or dates to create individual splits for each transaction. * **Academic Research**: Research institutions and libraries processing academic articles receive PDFs containing article bodies, references, and supplemental materials. The API separates these sections for indexing, citation extraction, or archival purposes. * **Product Documentation**: Organizations managing product catalogs receive PDFs containing specifications for multiple products. The API separates each product's specifications, enabling automated data entry into product databases or comparison tools. * **Technical Documentation**: Companies distributing multilingual instruction manuals receive PDFs with the same content repeated in different languages. The API separates the manual by language, allowing each version to be routed to the appropriate regional system or translation workflow. For information about pricing and credits, go to [Pricing & Billing](./ade-pricing). ## Process Overview Follow these steps to split a document into classified sub-documents: 1. **Parse your document** using the [Parse API](https://docs.landing.ai/api-reference/tools/ade-parse). The Split API requires Markdown content from the Parse API as input. Save the Markdown output for the next step. 2. **Define your Split Rules** by creating a set of Split Types that describe the different document types or sections in your file. Learn more about [Split Rules](#split-rules). 
The easiest way to create and test Split Rules is in the [Playground](#split-in-the-playground). 3. **Run the Split API** by passing the parsed Markdown content and your Split Rules. Choose your method: * [Split in the Playground](#split-in-the-playground): Test and refine your Split Rules interactively * [Split with the API](#split-with-the-api): Integrate splitting into your application * [Split with the Python & TypeScript Libraries](#use-split-with-our-libraries): Use our libraries 4. **Use the split results** in your downstream workflows. The API returns each classified sub-document with its full Markdown content. Learn more about the [response structure](./ade-split-response). ## Split Rules Split Rules define how the API classifies and separates a document into sub-documents. The Split Rules are a collection of all Split Types you define for a single API call. Each Split Rule consists of: * [Split Type](#split-types) * [Description](#descriptions-optional) (optional) * [Identifier](#identifiers-optional) (optional) Split Rules are defined differently depending on whether you use the Playground, API, or one of our libraries. See the interface-specific sections below for more information on how to create and pass the Split Rules for your method. ### Split Types Split Types define how your document is classified into sections, such as pay stubs, bank statements, and W-2s. You can define up to 19 Split Types in one API call. If the API cannot determine which Split Type a page belongs to, it classifies the page as **Uncategorized**. ### Descriptions (Optional) The Description provides additional context about what a Split Type represents. Detailed descriptions help the API identify what information to include in each split and improve classification accuracy. Descriptions can also impact how the API interprets Identifiers. 
For example, these two descriptions for clinical notes produce different identifier behavior: * **Less specific description**: "A clinical note documenting a patient's office visit, including history, exam, assessment, and plan, authored by a provider." The API might consider multiple dates as potential identifiers. * **More specific description**: "A clinical note documenting a patient's office visit, including history, exam, assessment, and plan, authored by a provider. Each note is separated by office visit date. Do not look at any other dates. Only include date before the words 'office visit'." The API only considers dates directly before "office visit" as identifiers. ### Identifiers (Optional) When your document contains multiple instances of the same Split Type, use an Identifier to specify what makes each instance unique, such as invoice number, order ID, or date. The API creates a separate split for each unique value of the Identifier. For example, if your document contains 6 pay stubs and you specify "Pay Stub Date" as the identifier, the API creates 6 separate splits—one for each unique date value. ### Example A document contains 1 bank statement and 6 pay stubs with different dates. You define two Split Types: * **Bank Statement** (no identifier needed) * **Pay Stub** with "Pay Stub Date" as the identifier The API returns 7 splits: 1 bank statement and 6 pay stubs, each separated by payment date. ## Split in the Playground To make it as easy as possible to split documents, we've created a wizard in our [Playground](https://va.landing.ai/my/playground/ade) that guides you through the process. The Playground is designed as a proof-of-concept to help you understand what the API can do and how it might fit into your workflows. After you've split a document in the Playground, copy the code to use it in API calls or our Python and TypeScript libraries so that you can scale. 1. Go to the [Playground](https://va.landing.ai/my/playground/ade). 2. Click **Split**. 
3. Select the file you want to split. 4. The app loads and parses the file in the background. 5. You can now create the [Split Rules](#split-rules), which determine how the document is split into sub-documents. There are a few ways to do this: * **View Suggested Split Rules**: The app automatically recommends rules based on the parsed content. We recommend trying this approach first, and then editing the rules if needed. * **Write a Split Rules Prompt**: Write a prompt that tells the app what specific Split Types and Identifiers should be used. The app then generates rules based on this prompt. * **Start from Scratch**: Manually define the Split Rules. 6. After creating your first round of Split Rules, edit them if needed. 7. Click **Split Document** to see the results, which open in a new panel. You can toggle between a visual representation of the results and the actual API JSON response. 8. You can continue to edit the Split Rules if needed. 9. When the document splits as expected, copy the code so that you can scale the API call. ## Split with the API Split a document by calling the split endpoint. This example splits a document containing bank statements and pay stubs: ```shell theme={null} curl -X POST 'https://api.va.landing.ai/v1/ade/split' \ -H 'Authorization: Bearer YOUR_API_KEY' \ -F 'markdown=@markdown.md' \ -F 'split_class=[{"name": "Bank Statement", "description": "Document from a bank that summarizes all account activity over a period of time."}, {"name": "Pay Stub", "description": "Document that details an employee'\''s earnings, deductions, and net pay for a specific pay period.", "identifier": "Pay Stub Date"}]' \ -F 'model=split-latest' ``` ### Parameters Get the full parameters from the [API reference](https://docs.landing.ai/api-reference/tools/ade-split). * **`markdown`** (required): The Markdown output from the [Parse API](https://docs.landing.ai/api-reference/tools/ade-parse). You can pass the Markdown content directly or reference a file. 
* **`split_class`** (required): A JSON array defining the [Split Rules](#split-rules). Each Split Type is a JSON object with: * [`name`](#split-types): The Split Type name (required) * [`description`](#descriptions-optional): Additional context about the Split Type (optional) * [`identifier`](#identifiers-optional): The field that makes each instance unique (optional) * **`model`** (optional): The model version to use for splitting. If omitted, the API uses the latest model. For more information, see [Split Model Versions](./ade-split-models). ## Use Split with Our Libraries You can use the Split API with our Python and TypeScript libraries. ## Additional Considerations on Splitting * Each page in a document can only be assigned to one Split Type. If one page has content that could belong to more than one Split Type, the API chooses the Split Type that the page matches more closely. * The Split API is different from the `split` parameter in the Parse API. The Split API separates a document into sub-documents after parsing, while the `split` parameter can be used during parsing to organize the parsed output by page. # Split Model Versions Source: https://docs.landing.ai/ade/ade-split-models A split model powers the classification and splitting capabilities of the Split API. The model analyzes parsed Markdown content and classifies pages into the Split Types you define, then separates the document into sub-documents. You can specify a model when calling the API directly or when using the [client libraries](#set-the-model-with-the-client-libraries). If you don't specify a model, the API uses the latest split model (currently `split-20251105`). 
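The choice between the `-latest` alias and a pinned snapshot can be captured in a small helper. This is a minimal sketch, not part of the ADE libraries: the `choose_split_model` function is hypothetical, and only the two model values documented on this page are assumed.

```python
# Hypothetical helper illustrating the pin-vs-latest choice described on this page.
PINNED_SNAPSHOT = "split-20251105"  # the snapshot named in this doc
LATEST_ALIAS = "split-latest"       # always resolves to the newest snapshot


def choose_split_model(pin_for_reproducibility: bool) -> str:
    """Return the value to pass in the `model` parameter of a split call."""
    return PINNED_SNAPSHOT if pin_for_reproducibility else LATEST_ALIAS


# Production pipelines usually pin for stable results; exploratory work
# usually tracks the latest snapshot to pick up improvements automatically.
production_model = choose_split_model(pin_for_reproducibility=True)
```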
## Model Versions The following table lists the available `model` values for the Split API: | Model Values | Description | | ---------------- | ---------------------------------------------------------- | | `split-20251105` | Use the split model snapshot released on November 5, 2025. | | `split-latest` | Use the latest split model snapshot. | ### Why Model Versioning Matters When integrating the Split API, you have two options for specifying the model: 1. **Use `split-latest`** to always get the newest version. This automatically gives you improvements and updates, but results may change when new model versions are released. 2. **Use a specific version** (like `split-20251105`) to pin to an exact model version. This ensures consistent results over time, but you won't automatically receive improvements. ## Set the Model in the API When calling the split endpoint, you can set the model using the `model` parameter. If you omit the `model` parameter, the API uses the latest model. This example shows how to specify a model: ```shell theme={null} curl -X POST 'https://api.va.landing.ai/v1/ade/split' \ -H 'Authorization: Bearer YOUR_API_KEY' \ -F 'markdown=@markdown.md' \ -F 'split_class=[{"name": "section_type", "description": "Type of document section (e.g., header, paragraph, table)"}]' \ -F 'model=split-latest' ``` ## Set the Model with the Client Libraries When using the Python or TypeScript library, you can set the model using the `model` parameter in the `split()` method. If you omit the `model` parameter, the library will use the latest split model. 
```python {13} Python theme={null} import json from pathlib import Path from landingai_ade import LandingAIADE client = LandingAIADE() response = client.split( markdown=Path("/path/to/parsed_output.md"), split_class=json.dumps([ {"name": "Invoice", "description": "Payment request document"}, {"name": "Receipt", "description": "Payment confirmation document"} ]), model="split-latest" ) ``` ```typescript {12} TypeScript theme={null} import LandingAIADE from "landingai-ade"; import fs from "fs"; const client = new LandingAIADE(); const response = await client.split({ markdown: fs.createReadStream("/path/to/parsed_output.md"), split_class: [ { name: "Invoice", description: "Payment request document" }, { name: "Receipt", description: "Payment confirmation document" } ], model: "split-latest" }); ``` ## The Playground Uses the Latest Model Version The Playground uses the latest model version. # JSON Response for Splitting Source: https://docs.landing.ai/ade/ade-split-response When you split a document with the [Split API](https://docs.landing.ai/api-reference/tools/ade-split), the classified splits and metadata are returned in a structured JSON format. ## Response Structure The response contains the following top-level fields: * [`splits`](#splits-array-splits): Array of split objects containing the classified sub-documents. * [`metadata`](#processing-metadata-metadata): Processing information including credit usage, duration, filename, job ID, page count, and model version. ## Splits Array (`splits`) The `splits` field contains an array of classified sub-documents. Each split object includes: * **`classification`**: The Split Type name assigned to this sub-document (e.g., "Bank Statement", "Pay Stub"). * **`identifier`**: The unique identifier value for this split (e.g., "2024-01-15", "Invoice #12345"). This field is `null` if no identifier was specified for this Split Type. * **`pages`**: Array of zero-indexed page numbers that belong to this split. 
* **`markdowns`**: Array of [Markdown](./ade-markdown-response) content strings, one for each page in this split. The order matches the `pages` array. ### Classification and Identifiers The `classification` field corresponds to the Split Type names you defined in your [Split Rules](./ade-split#split-rules). If the API cannot classify a page, it assigns the classification "Uncategorized". When you specify an identifier in your Split Rules (such as "Date" or "Invoice Number"), the API creates separate splits for each unique identifier value it finds. The `identifier` field contains the extracted value (e.g., "2024-01-15" or "INV-001"). ### Pages and Markdown Content The `pages` array lists which pages belong to each split. Pages are zero-indexed, so the first page is `0`. The `markdowns` array contains the Markdown content for each page. Each element corresponds to the page at the same index in the `pages` array. For example, if `pages` is `[0, 1, 2]`, then `markdowns[0]` contains the Markdown for page 0, `markdowns[1]` contains the Markdown for page 1, and `markdowns[2]` contains the Markdown for page 2. ## Processing Metadata (`metadata`) The `metadata` field provides information about the split process: * **`filename`**: The name of the input Markdown file. * **`org_id`**: Organization identifier. * **`page_count`**: Total number of pages in the document. * **`duration_ms`**: Processing time in milliseconds. * **`credit_usage`**: Number of credits consumed. * **`job_id`**: Unique job identifier. * **`version`**: Model version used for splitting. For more information, go to [Split Model Versions](./ade-split-models). 
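The `pages`/`markdowns` alignment described above can be exercised with a short script. This is a minimal sketch that reassembles each split's full Markdown from a response already decoded into a Python dict; the `reassemble_splits` helper is hypothetical, the field names follow the response structure documented here, and the sample data is illustrative only.

```python
def reassemble_splits(response: dict) -> dict:
    """Map each split to its full Markdown, joining per-page content in order.

    Keys are (classification, identifier) tuples so that multiple instances of
    the same Split Type (separated by identifier) stay distinct.
    """
    documents = {}
    for split in response["splits"]:
        # markdowns[i] corresponds to pages[i], so joining preserves page order.
        full_markdown = "\n\n".join(split["markdowns"])
        key = (split["classification"], split["identifier"])
        documents[key] = {"pages": split["pages"], "markdown": full_markdown}
    return documents


# Illustrative response shaped like the structure documented above.
response = {
    "splits": [
        {"classification": "Bank Statement", "identifier": None,
         "pages": [0], "markdowns": ["## Bank Statement"]},
        {"classification": "Pay Stub", "identifier": "2025-01-15",
         "pages": [1], "markdowns": ["## Pay Stub (Jan 15)"]},
        {"classification": "Pay Stub", "identifier": "2025-01-30",
         "pages": [2], "markdowns": ["## Pay Stub (Jan 30)"]},
    ]
}

docs = reassemble_splits(response)
```

Because the key includes the identifier, the two pay stubs remain separate documents even though they share a classification.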
## Example Response Here is a complete example showing a split response for a document containing bank statements and pay stubs: ```json theme={null} { "splits": [ { "classification": "Bank Statement", "identifier": null, "pages": [0], "markdowns": [ "\n\n## Bank Statement\n\nAccount Number: 1234567890\n\nStatement Period: January 1 - January 31, 2025\n\n| Date | Description | Amount |\n|------|-------------|--------|\n| 01/05 | Deposit | $2,500.00 |\n| 01/12 | Withdrawal | -$500.00 |\n\nEnding Balance: $2,000.00" ] }, { "classification": "Pay Stub", "identifier": "2025-01-15", "pages": [1], "markdowns": [ "\n\n## Pay Stub\n\nEmployee: John Smith\n\nPay Date: January 15, 2025\n\nGross Pay: $6,000.00\n\nNet Pay: $4,500.00" ] }, { "classification": "Pay Stub", "identifier": "2025-01-30", "pages": [2], "markdowns": [ "\n\n## Pay Stub\n\nEmployee: John Smith\n\nPay Date: January 30, 2025\n\nGross Pay: $6,000.00\n\nNet Pay: $4,500.00" ] } ], "metadata": { "filename": "mixed-documents.md", "org_id": "org_abc123", "page_count": 3, "duration_ms": 2145, "credit_usage": 3.0, "job_id": "split_xyz789", "version": "split-20251105" } } ``` In this example: * The document was split into 3 sub-documents: 1 bank statement and 2 pay stubs. * The bank statement has no identifier (set to `null`). * Each pay stub is identified by its pay date ("2025-01-15" and "2025-01-30"), creating separate splits even though they have the same classification. * Each split contains the page numbers and Markdown content for that sub-document. # Troubleshoot Splitting Source: https://docs.landing.ai/ade/ade-split-troubleshoot Use this section to troubleshoot issues encountered when calling the API ([https://api.va.landing.ai/v1/ade/split](https://api.va.landing.ai/v1/ade/split)). 
## Common Status Codes | Status Code | Name | Description | What to Do | | ----------- | --------------------- | ----------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------- | | 200 | Success | Split classification completed successfully. | Continue with normal operations. | | 400 | Bad Request | Invalid request due to markdown download failure from URL. | Review error message for specific issue. See [Status 400](#status-400-bad-request). | | 401 | Unauthorized | Missing or invalid API key. | Check that your `apikey` header is present and contains a valid [API key](./agentic-api-key). | | 402 | Payment Required | Your account does not have enough credits to complete processing. | If you have multiple accounts, make sure you're using the correct [API key](./agentic-api-key). Add more credits to your account. | | 422 | Unprocessable Entity | Input validation failed. | Review your request parameters. See [Status 422](#status-422-unprocessable-entity). | | 429 | Too Many Requests | Rate limit exceeded. | Wait before retrying. Reduce request frequency and implement exponential backoff. | | 500 | Internal Server Error | Server error during split classification. | Retry. If the issue persists, contact [support@landing.ai](mailto:support@landing.ai). See [Status 500](#status-500-internal-server-error). | | 504 | Gateway Timeout | Request processing exceeded the timeout limit. | Reduce markdown content size. See [Status 504](#status-504-gateway-timeout). | ## Status 400: Bad Request This status code indicates invalid request parameters or client-side errors. Review the specific error message to identify the issue. ### Error: Failed to download document from URL This error occurs when the API cannot download the Markdown file from the provided `markdown_url`. 
**Error message:** ``` Failed to download document from URL: {error_details} ``` **What to do:** * Verify the URL is accessible and returns valid content. * Check network connectivity and URL permissions. * Ensure the URL points to a Markdown file (.md extension). ## Status 422: Unprocessable Entity This status code indicates input validation failures. Review the error message and adjust your request parameters. ### Error: No markdown file, content, or URL provided This error occurs when your request does not include the `markdown` parameter or `markdown_url` parameter, or when the values are empty. **Error message:** ``` No markdown file, content, or URL provided. ``` **What to do:** Add one of these parameters to your request: * Use the `markdown` parameter to upload a Markdown file or provide inline Markdown content, OR * Use the `markdown_url` parameter to provide a URL to a Markdown file. ### Error: Multiple markdown files detected This error occurs when multiple Markdown files are included in the request. **Error message:** ``` Multiple markdown files detected (X). Please provide only one markdown file. ``` **What to do:** Send only one Markdown file per request. ### Error: Unsupported format This error occurs when you provide a file other than Markdown (.md) to the split endpoint, such as PDF, DOCX, XLSX, or image files. **Error message:** ``` Unsupported format: {mime_type} ((unknown)). Supported formats: MD ``` **What to do:** * The split endpoint only accepts Markdown files with a .md extension. * If you have a PDF, DOCX, or other document format, use the [Parse API](https://docs.landing.ai/api-reference/tools/ade-parse) endpoint to convert your document to Markdown first, then pass the parsed Markdown output to the split endpoint. * Ensure your file has a .md extension and contains valid UTF-8 encoded Markdown content. 
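The 422 validation cases above can be caught client-side before a request is ever sent. The following is a minimal pre-flight sketch; the `validate_markdown_file` helper is hypothetical and only mirrors the constraints stated in this section (one file per request, a .md extension, valid UTF-8 content).

```python
from pathlib import Path


def validate_markdown_file(paths: list[str]) -> Path:
    """Validate inputs against the split endpoint's stated constraints."""
    if not paths:
        raise ValueError("No markdown file provided.")
    if len(paths) > 1:
        raise ValueError(
            f"Multiple markdown files detected ({len(paths)}). "
            "Provide only one markdown file."
        )
    path = Path(paths[0])
    if path.suffix.lower() != ".md":
        raise ValueError(f"Unsupported format: {path.suffix}. Supported formats: MD")
    # Confirm the content decodes as UTF-8 before uploading.
    path.read_bytes().decode("utf-8")
    return path
```

Running this check locally turns a round-trip 422 response into an immediate, descriptive exception.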
### Error: split\_class must contain at most 19 split classification names This error occurs when you provide more than 19 Split Types in the `split_class` parameter. **Error message:** ``` split_class must contain at most 19 split classification names ``` **What to do:** * Reduce the number of Split Types to 19 or fewer. * Consider combining similar Split Types or removing less important ones. * For more information, see [Split Types](./ade-split#split-types). ## Status 500: Internal Server Error This error indicates an unexpected server error occurred during split classification. **Error message:** ``` Internal server error during split classification ``` **What to do:** * Retry the request. * Verify your [Split Rules](./ade-split#split-rules) are properly formatted with valid Split Type names, descriptions, and identifiers. * Check that your Markdown content is valid and properly formatted. * If the error persists, contact [support@landing.ai](mailto:support@landing.ai). ## Status 504: Gateway Timeout This error occurs when the split classification process exceeds the timeout limit. **Error message:** ``` Request timed out after {seconds} seconds ``` **What to do:** * Reduce the size of your Markdown document. * If processing a very large document, consider splitting it into smaller sections before using the split endpoint. * If the error persists, contact [support@landing.ai](mailto:support@landing.ai). ## Best Practices for Avoiding Errors * **Use parsed Markdown**: Always pass Markdown content that has been generated by the [Parse API](./ade-separate-apis). The split endpoint expects structured Markdown with the specific format produced by the parse API. * **Define clear Split Rules**: Provide detailed descriptions for each Split Type to help the model accurately classify your document sections. For guidance, see [Split Rules](./ade-split#split-rules). 
* **Use appropriate identifiers**: Only specify identifiers when your document contains multiple instances of the same Split Type that need to be separated. For guidance, see [Identifiers](./ade-split#identifiers-optional). ## When Are Credits Consumed? Credits are consumed only when the API returns a 200 status code. All other responses, including errors, do not consume credits. # Single Sign-On (SSO) Source: https://docs.landing.ai/ade/ade-sso ## About SSO LandingAI supports SSO via **SAML 2.0** and **OpenID Connect (OIDC)**, letting your organization manage access through your existing identity provider (IdP). Common IdPs include Okta, Microsoft Entra ID (formerly Azure AD), OneLogin, Ping Identity, and Google Workspace. Users can also sign in with Google or GitHub without any additional setup. This page covers SSO setup via SAML 2.0 and OIDC only. ## Availability SSO via SAML 2.0 and OIDC is available on the **Enterprise** plan. ## What to Know Before Enabling SSO ### SSO Replaces Existing Login Methods Once SSO is enabled for your organization, it becomes the only way users can log in to LandingAI. Users will no longer be able to sign in with a password, Google, or GitHub. This is by design; it ensures that all user access is managed through your IdP, so your organization's security policies are consistently enforced. To avoid disruption, communicate this change to all users in your organization before enabling SSO. ### User Access and JIT Provisioning Your IT department controls who can access LandingAI through your IdP by managing user groups. Having a company email address does not automatically grant access. Users must be included in the group configured for LandingAI in your IdP. When setting up SSO, let the LandingAI team know which option you prefer for adding users to your organization: * **By invitation**: Only users who have been invited to your organization can log in, even after SSO is enabled. For information about manually inviting users, see [Organizations & Members](./ade-members). 
* **Just-in-Time (JIT) provisioning**: Any user in your IdP's user group who successfully authenticates via SSO is automatically added to your organization the first time they log in, with no invitation required.

## Enable SSO

SSO setup involves multiple steps and ongoing coordination between your team and the support team. Here is an overview of the process:

1. [Request SSO](#request-sso) through your Organization Settings.
2. The support team reaches out to begin the process.
3. Share the required information with the support team: [SAML 2.0](#required-information-for-saml-20) or [OIDC](#required-information-for-oidc).
4. The support team enters your information in the backend.
5. The support team shares configuration details with you. [Add these to your IdP](#complete-setup-in-your-idp).
6. [Test that SSO is working correctly](#test-that-sso-is-working-correctly).

### Request SSO

To request SSO:

1. Log in to [https://va.landing.ai/](https://va.landing.ai/).
2. Go to the [Organization Settings](https://va.landing.ai/settings/organization/general) page (to navigate there manually, click your profile icon in the bottom left corner of the page and click **Organization Settings**).
3. In the **Single Sign-On (SSO)** box, click **Contact Support**. This sends an automated message to the support team.

The support team will contact you about next steps for setting up SSO.

### Required Information for SAML 2.0

Share the following information with the support team. Most of it can be found in your IdP's SAML configuration page. The examples below are for Microsoft Entra ID. Formats and field names vary by IdP.
| Item | Description |
| --- | --- |
| IdP (Identity Provider) | The service provider your organization uses to manage email and SSO.<br /><br />**Example**: Okta, Microsoft Entra ID, etc. |
| JIT provisioning preference | Whether you want to enable Just-in-Time (JIT) provisioning for your organization. See [User Access and JIT Provisioning](#user-access-and-jit-provisioning). |
| Metadata URL | The URL that provides your IdP's SAML metadata, including the Entity ID, SSO login URL, and signing certificate. This URL allows the SAML connection to be configured automatically without requiring each value separately.<br /><br />In Microsoft Entra ID, this is called **App Federation Metadata URL**. In Okta, this is called **Identity Provider Metadata**.<br /><br />**Example**: `https://login.microsoftonline.com/123/federationmetadata/2007-06/federationmetadata.xml?appid=456` |
| Enterprise email domains | Each email domain that will need access.<br /><br />**Example**: acme.com, acme.ai |
| Email claim | The Uniform Resource Identifier (URI) for the email claim type. This communicates the email address of the user.<br /><br />**Example**: `http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress` |
| Name claim | Optional. The Uniform Resource Identifier (URI) for the name claim type. This communicates the name of the user.<br /><br />**Example**: `http://schemas.xmlsoap.org/ws/2005/05/identity/claims/name` |

### Required Information for OIDC

Share the following information with the support team. Most of it can be found in your IdP's OIDC configuration page. The examples below are for Microsoft Entra ID. Formats and field names vary by IdP.

| Item | Description |
| --- | --- |
| IdP (Identity Provider) | The service provider your organization uses to manage email and SSO.<br /><br />**Example**: Okta, Microsoft Entra ID, etc. |
| JIT provisioning preference | Whether you want to enable Just-in-Time (JIT) provisioning for your organization. See [User Access and JIT Provisioning](#user-access-and-jit-provisioning). |
| Client ID | A unique identifier for the application registered in your IdP. Your IdP generates this when you create the application. |
| Client Secret | A secret key used to authenticate the application with your IdP. Share this with the support team through a secure channel. |
| Issuer URL | The base URL of your IdP's OIDC configuration, used to locate the OIDC metadata endpoint.<br /><br />**Example**: `https://login.microsoftonline.com/{tenant-id}/v2.0` |
| Scope | The permissions requested from your IdP. At minimum: `openid`, `profile`, `email`. |
| Enterprise email domains | Each email domain that will need access.<br /><br />**Example**: acme.com, acme.ai |

### Complete Setup in Your IdP

After the support team enters your information in the backend, they will continue coordinating with you to complete the setup. The support team will give you the following information to enter in your IdP configuration page:

| Protocol | Item | Description |
| --- | --- | --- |
| SAML 2.0 | Assertion Consumer Service URL | In Microsoft Entra ID, this is called a "Reply URL".<br /><br />**Example**: `https://login.landing.ai/api/authn/...` |
| SAML 2.0 | Audience URI | Also called an "SP Entity ID".<br /><br />**Example**: `https://login.landing.ai/api/enterprise-sso/...` |
| OIDC | Redirect URI (Callback URL) | **Example**: `https://login.landing.ai/callback/1234` |

### Test That SSO Is Working Correctly

After adding the information from the support team, test that SSO is working correctly:

1. Go to [https://login.landing.ai/sign-in](https://login.landing.ai/sign-in).
2. If you are currently logged in, log out.
3. Click **Continue with Enterprise SSO** and follow the on-screen prompts to log in.

If you're unable to log in, send an email to [support@landing.ai](mailto:support@landing.ai).

## View Your SSO Settings

After SSO has been successfully configured, you can view your SSO settings in read-only mode:

1. Log in to [https://va.landing.ai/](https://va.landing.ai/).
2. Go to the [Organization Settings](https://va.landing.ai/settings/organization/general) page (to navigate there manually, click your profile icon in the bottom left corner of the page and click **Organization Settings**).
3. In the **Single Sign-On (SSO)** box, click **View Details**. The SSO settings display.

# Support

Source: https://docs.landing.ai/ade/ade-support

## Support Resources

In addition to the documentation in this site, use the following resources to learn more:

* Share your scripts, ask questions, and get feedback from other programmers in our developer-focused Discord server.
* Explore end-to-end examples for common document processing workflows.
* Watch demos and video tutorials from our Machine Learning Experts. Plus, get the latest feature announcements from Dr. Andrew Ng!
* Join the LandingAI community to connect with other users and get the latest updates.

## Submit a Support Ticket

If you experience an issue, send an email to [support@landing.ai](mailto:support@landing.ai).
To help us troubleshoot your issue, include this information in your email:

* The email address for your account, found in your [profile page](https://va.landing.ai/settings/personal/profile)
* The document you were processing when the issue occurred
* The processing method you were using when the issue occurred:
  * **API call**: Include your script or command.
  * **Library**: Include your script and the library version.
  * **Playground**: Include the link to the results. To get this, click the **Share Results** icon in the top right corner of the page and enable sharing.

## Delete Account

To delete your account, send an email with your request to [support@landing.ai](mailto:support@landing.ai). Use the email address you used to sign up.

# TypeScript Library

Source: https://docs.landing.ai/ade/ade-typescript

The library is a lightweight TypeScript library you can use for parsing documents, classifying pages, extracting data, generating tables of contents, and splitting documents into sub-documents. The library is automatically generated from our API specification, ensuring you have access to the latest endpoints and parameters.

## Install the Library

```bash theme={null}
npm install landingai-ade
```

## Set the API Key as an Environment Variable

To use the library, first [generate an API key](https://va.landing.ai/my/settings/api-key). Save the key to a `.zshrc` file or another secure location on your computer. Then export the key as an environment variable.
```bash theme={null}
export VISION_AGENT_API_KEY=<your-api-key>
```

When initializing the client, the library automatically reads the API key from the `VISION_AGENT_API_KEY` environment variable:

```typescript theme={null}
import LandingAIADE from "landingai-ade";

const client = new LandingAIADE();
```

Alternatively, you can explicitly pass the API key when initializing the client:

```typescript theme={null}
import LandingAIADE from "landingai-ade";

const client = new LandingAIADE({
  apiKey: process.env.VISION_AGENT_API_KEY
});
```

For more information about API keys and alternate methods for setting the API key, go to [API Key](./agentic-api-key).

## Use with EU Endpoints

By default, the library uses the US endpoints. If your API key is from the EU endpoint, set the `environment` parameter to `eu` when initializing the client.

```typescript theme={null}
import LandingAIADE from "landingai-ade";

const client = new LandingAIADE({
  environment: "eu",
});

// ... rest of your code
```

For more information about using the EU endpoints, go to [European Union (EU)](./ade-eu).

## Parse: Getting Started

The `parse` method converts documents into structured Markdown with chunk and grounding metadata. Use these examples as guides to get started with parsing with the library.

### Parse Local Files

Use the `document` parameter to parse files from your filesystem. Pass the file as a read stream using `fs.createReadStream()`.
```typescript theme={null} import LandingAIADE from "landingai-ade"; import fs from "fs"; const client = new LandingAIADE(); // Replace with your file path const response = await client.parse({ document: fs.createReadStream("/path/to/file/document"), model: "dpt-2-latest", saveTo: "output_folder" // optional: saves as {input_file}_parse_output.json }); console.log(response.chunks); // Save Markdown to a file if (response.markdown) { fs.writeFileSync("output.md", response.markdown, "utf-8"); console.log("\nMarkdown content saved to a Markdown file."); } else { console.log("No 'markdown' field found in the response"); } ``` ### Parse Remote URLs Use the `document` parameter with `fetch()` to parse files from remote URLs (http, https). ```typescript theme={null} import LandingAIADE from "landingai-ade"; import fs from "fs"; const client = new LandingAIADE(); // Parse a remote file const response = await client.parse({ document: await fetch("https://example.com/document.pdf"), model: "dpt-2-latest" }); console.log(response.chunks); // Save Markdown to a file if (response.markdown) { fs.writeFileSync("output.md", response.markdown, "utf-8"); console.log("\nMarkdown content saved to a Markdown file."); } else { console.log("No 'markdown' field found in the response"); } ``` ### Set Parameters The `parse` method accepts optional parameters to customize parsing behavior. To see all available parameters, go to [ADE Parse API](https://docs.landing.ai/api-reference/tools/ade-parse). Pass these parameters directly to the `parse()` method. ```typescript theme={null} import LandingAIADE from "landingai-ade"; import fs from "fs"; const client = new LandingAIADE(); const response = await client.parse({ document: fs.createReadStream("/path/to/document.pdf"), model: "dpt-2-latest", split: "page" }); ``` ### Parse Jobs The `parseJobs` resource enables you to asynchronously parse documents that are up to 1,000 pages or 1 GB. 
For more information about parse jobs, go to [Parse Large Files (Parse Jobs)](./ade-parse-async). Here is the basic workflow for working with parse jobs: 1. Start a parse job. 2. Copy the `job_id` in the response. 3. Get the results from the parsing job with the `job_id`. This script contains the full workflow: ```typescript [expandable] theme={null} import LandingAIADE from "landingai-ade"; import fs from "fs"; const client = new LandingAIADE(); // Step 1: Create a parse job const job = await client.parseJobs.create({ document: fs.createReadStream("/path/to/file/document"), model: "dpt-2-latest" }); const jobId = job.job_id; console.log(`Job ${jobId} created.`); // Step 2: Get the parsing results while (true) { const response = await client.parseJobs.get(jobId); if (response.status === "completed") { console.log(`Job ${jobId} completed.`); break; } console.log(`Job ${jobId}: ${response.status} (${(response.progress * 100).toFixed(0)}% complete)`); await new Promise(resolve => setTimeout(resolve, 5000)); } // Step 3: Access the parsed data const response = await client.parseJobs.get(jobId); console.log("Global Markdown:", response.data.markdown.substring(0, 200) + "..."); console.log(`Number of chunks: ${response.data.chunks.length}`); // Save Markdown output (useful if you plan to run extract on the Markdown) fs.writeFileSync("output.md", response.data.markdown, "utf-8"); ``` #### List Parse Jobs To list all async parse jobs associated with your API key, run this code: ```typescript theme={null} import LandingAIADE from "landingai-ade"; const client = new LandingAIADE(); // List all jobs const response = await client.parseJobs.list(); for (const job of response.jobs) { console.log(`Job ${job.job_id}: ${job.status}`); } ``` ### Work with Parse Response Data **Access all text chunks:** ```typescript theme={null} for (const chunk of response.chunks) { if (chunk.type === 'text') { console.log(`Chunk ${chunk.id}: ${chunk.markdown}`); } } ``` **Filter chunks by page:** 
```typescript theme={null}
const page0Chunks = response.chunks.filter(chunk => chunk.grounding.page === 0);
console.log(page0Chunks);
```

**Get chunk locations:**

```typescript theme={null}
for (const chunk of response.chunks) {
  const box = chunk.grounding.box;
  console.log(`Chunk at page ${chunk.grounding.page}: (${box.left}, ${box.top}, ${box.right}, ${box.bottom})`);
}
```

**Identify the chunk type for each chunk:**

```typescript theme={null}
for (const chunk of response.chunks) {
  console.log(`Chunk ${chunk.id} has type: ${chunk.type}`);
}
```

## Extract: Getting Started

The `extract` method extracts structured data from Markdown content using extraction schemas. Use these examples as guides to get started with extracting with the library.

**Pass Markdown Content**

The library supports a few methods for passing the Markdown content for extraction:

* Extract data directly from the [parse response](#extract-from-parse-response)
* Extract data from a local [Markdown file](#extract-from-markdown-files)
* Extract data from a Markdown file at a remote URL: `markdown: await fetch("https://example.com/file.md")`

**Pass the Extraction Schema**

The library supports a few methods for passing the extraction schema:

* [Zod schemas](#extraction-with-zod)
* [JSON schema (inline)](#extraction-with-json-schema-inline)
* [JSON schema file](#extraction-with-json-schema-file)

### Extract from Parse Response

After parsing a document, you can pass the Markdown string directly from the `ParseResponse` to the extract method without saving it to a file.
```typescript theme={null} import LandingAIADE, { toFile } from "landingai-ade"; import fs from "fs"; // Define your extraction schema const schemaDict = { type: "object", properties: { employee_name: { type: "string", description: "The employee's full name" } } }; const client = new LandingAIADE(); const schemaJson = JSON.stringify(schemaDict); // Parse the document const parseResponse = await client.parse({ document: fs.createReadStream("/path/to/document.pdf"), model: "dpt-2-latest" }); // Extract data using the markdown string from parse response const extractResponse = await client.extract({ schema: schemaJson, markdown: await toFile(Buffer.from(parseResponse.markdown), "document.md"), model: "extract-latest", saveTo: "output_folder" // optional: saves as {input_file}_extract_output.json }); // Access the extracted data console.log(extractResponse.extraction); ``` ### Extract from Markdown Files If you already have a Markdown file (from a previous parsing operation), you can extract data directly from it. Use the `markdown` parameter with `fs.createReadStream()` for local Markdown files or with `fetch()` for remote Markdown files. 
```typescript [expandable] theme={null}
import LandingAIADE from "landingai-ade";
import fs from "fs";

// Define your extraction schema
const schemaDict = {
  type: "object",
  properties: {
    employee_name: {
      type: "string",
      description: "The employee's full name"
    },
    employee_ssn: {
      type: "string",
      description: "The employee's Social Security Number"
    },
    gross_pay: {
      type: "number",
      description: "The gross pay amount"
    }
  }
};

const client = new LandingAIADE();
const schemaJson = JSON.stringify(schemaDict);

// Extract from a local markdown file
const extractResponse = await client.extract({
  schema: schemaJson,
  markdown: fs.createReadStream("/path/to/output.md"),
  // Or extract from a remote markdown file instead:
  // markdown: await fetch("https://example.com/document.md"),
  model: "extract-latest"
});

// Access the extracted data
console.log(extractResponse.extraction);
```

### Extraction with Zod

Use Zod schemas to define your extraction schema in a type-safe way. Zod provides TypeScript type inference and runtime validation for your extracted data.
To use Zod with the library, install `zod`:

```bash theme={null}
npm install zod
```

After installing `zod`, run extraction with the library:

```typescript [expandable] theme={null}
import LandingAIADE, { toFile } from "landingai-ade";
import fs from "fs";
import { z } from "zod";

// Define your extraction schema as a Zod schema
const PayStubSchema = z.object({
  employee_name: z.string().describe("The employee's full name"),
  employee_ssn: z.string().describe("The employee's Social Security Number"),
  gross_pay: z.number().describe("The gross pay amount")
});

// Extract the TypeScript type from the schema
type PayStubData = z.infer<typeof PayStubSchema>;

// Initialize the client
const client = new LandingAIADE();

// First, parse the document to get markdown
const parseResponse = await client.parse({
  document: fs.createReadStream("/path/to/pay-stub.pdf"),
  model: "dpt-2-latest"
});

// Convert the Zod schema to a JSON schema
const schema = JSON.stringify(z.toJSONSchema(PayStubSchema));

// Extract structured data using the schema
const extractResponse = await client.extract({
  schema: schema,
  markdown: await toFile(Buffer.from(parseResponse.markdown), "document.md"),
  model: "extract-latest"
});

// Access the extracted data with type safety
const data = extractResponse.extraction as PayStubData;
console.log(data);

// Access extraction metadata to see which chunks were referenced
console.log(extractResponse.extraction_metadata);
```

### Extraction with JSON Schema (Inline)

Define your extraction schema directly as a JSON string in your script.
```typescript [expandable] theme={null} import LandingAIADE, { toFile } from "landingai-ade"; import fs from "fs"; // Define your extraction schema as an object const schemaDict = { type: "object", properties: { employee_name: { type: "string", description: "The employee's full name" }, employee_ssn: { type: "string", description: "The employee's Social Security Number" }, gross_pay: { type: "number", description: "The gross pay amount" } } }; // Initialize the client const client = new LandingAIADE(); // First, parse the document to get markdown const parseResponse = await client.parse({ document: fs.createReadStream("/path/to/pay-stub.pdf"), model: "dpt-2-latest" }); // Convert schema object to JSON string const schemaJson = JSON.stringify(schemaDict); // Extract structured data using the schema const extractResponse = await client.extract({ schema: schemaJson, markdown: await toFile(Buffer.from(parseResponse.markdown), "document.md"), model: "extract-latest" }); // Access the extracted data console.log(extractResponse.extraction); // Access extraction metadata to see which chunks were referenced console.log(extractResponse.extraction_metadata); ``` ### Extraction with JSON Schema File Load your extraction schema from a separate JSON file for better organization and reusability. 
For example, here is the `pay_stub_schema.json` file: ```json theme={null} { "type": "object", "properties": { "employee_name": { "type": "string", "description": "The employee's full name" }, "employee_ssn": { "type": "string", "description": "The employee's Social Security Number" }, "gross_pay": { "type": "number", "description": "The gross pay amount" } } } ``` You can pass the JSON file defined above in the following script: ```typescript [expandable] theme={null} import LandingAIADE, { toFile } from "landingai-ade"; import fs from "fs"; // Initialize the client const client = new LandingAIADE(); // First, parse the document to get markdown const parseResponse = await client.parse({ document: fs.createReadStream("/path/to/pay-stub.pdf"), model: "dpt-2-latest" }); // Load schema from JSON file const schemaJson = fs.readFileSync("pay_stub_schema.json", "utf-8"); // Extract structured data using the schema const extractResponse = await client.extract({ schema: schemaJson, markdown: await toFile(Buffer.from(parseResponse.markdown), "document.md"), model: "extract-latest" }); // Access the extracted data console.log(extractResponse.extraction); // Access extraction metadata to see which chunks were referenced console.log(extractResponse.extraction_metadata); ``` ### Extract Nested Subfields Define nested Zod schemas to extract hierarchical data from documents. This approach organizes related information under meaningful section names. Define nested schemas before the main extraction schema. Otherwise, the nested schemas will not be defined when referenced. For example, to extract data from the **Patient Details** and **Emergency Contact Information** sections in this Medical Form, define separate schemas for each section, then combine them in a main schema. 
```typescript [expandable] theme={null}
import LandingAIADE, { toFile } from "landingai-ade";
import fs from "fs";
import { z } from "zod";

// Define a nested schema for patient-specific information
const PatientDetailsSchema = z.object({
  patient_name: z.string().describe("Full name of the patient."),
  date: z.string().describe("Date the patient information form was filled out.")
});

// Define a nested schema for emergency contact details
const EmergencyContactInformationSchema = z.object({
  emergency_contact_name: z.string().describe("Full name of the emergency contact person."),
  relationship_to_patient: z.string().describe("Relationship of the emergency contact to the patient."),
  primary_phone_number: z.string().describe("Primary phone number of the emergency contact."),
  secondary_phone_number: z.string().describe("Secondary phone number of the emergency contact."),
  address: z.string().describe("Full address of the emergency contact.")
});

// Define the main extraction schema that combines all the nested schemas
const PatientAndEmergencyContactInformationSchema = z.object({
  patient_details: PatientDetailsSchema.describe("Information about the patient as provided in the form."),
  emergency_contact_information: EmergencyContactInformationSchema.describe("Details of the emergency contact person for the patient.")
});

// Extract the TypeScript type from the schema
type PatientAndEmergencyContactInformation = z.infer<typeof PatientAndEmergencyContactInformationSchema>;

// Initialize the client
const client = new LandingAIADE();

// Parse the document to get markdown
const parseResponse = await client.parse({
  document: fs.createReadStream("/path/to/medical-form.pdf"),
  model: "dpt-2-latest"
});

// Convert the Zod schema to a JSON schema
const schema = JSON.stringify(z.toJSONSchema(PatientAndEmergencyContactInformationSchema));

// Extract structured data using the schema
const extractResponse = await client.extract({
  schema: schema,
  markdown: await toFile(Buffer.from(parseResponse.markdown), "document.md"),
  model: "extract-latest"
});

// Display the extracted structured data
console.log(extractResponse.extraction);
```

### Extract Variable-Length Data with List Objects

Use Zod's `z.array()` to extract repeatable data structures when you don't know how many items will appear. Common examples include line items in invoices, transaction records, or contact information for multiple people.

For example, to extract variable-length wire instructions and line items from this Wire Transfer Form, use `z.array(DescriptionItemSchema)` for line items and `z.array(WireInstructionSchema)` for wire transfer details.

```typescript [expandable] theme={null}
import LandingAIADE, { toFile } from "landingai-ade";
import fs from "fs";
import { z } from "zod";

// Nested schemas for array fields
const DescriptionItemSchema = z.object({
  description: z.string().describe("Invoice or Bill Description"),
  amount: z.number().describe("Invoice or Bill Amount")
});

const WireInstructionSchema = z.object({
  bank_name: z.string().describe("Bank name"),
  bank_address: z.string().describe("Bank address"),
  bank_account_no: z.string().describe("Bank account number"),
  swift_code: z.string().describe("SWIFT code"),
  aba_routing: z.string().describe("ABA routing number"),
  ach_routing: z.string().describe("ACH routing number")
});

// Invoice schema containing array fields
const InvoiceSchema = z.object({
  description_or_particular: z.array(DescriptionItemSchema).describe("List of invoice line items (description and amount)"),
  wire_instructions: z.array(WireInstructionSchema).describe("Wire transfer instructions")
});

// Main extraction schema
const ExtractedInvoiceFieldsSchema = z.object({
  invoice: InvoiceSchema.describe("Invoice list-type fields")
});

// Extract the TypeScript type from the schema
type ExtractedInvoiceFields = z.infer<typeof ExtractedInvoiceFieldsSchema>;

// Initialize the client
const client = new LandingAIADE();

// Parse the document to get markdown
const parseResponse = await client.parse({
  document: fs.createReadStream("/path/to/wire-transfer.pdf"),
  model: "dpt-2-latest"
});

// Convert the Zod schema to a JSON schema
const schema = JSON.stringify(z.toJSONSchema(ExtractedInvoiceFieldsSchema));

// Extract structured data using the schema
const extractResponse = await client.extract({
  schema: schema,
  markdown: await toFile(Buffer.from(parseResponse.markdown), "document.md"),
  model: "extract-latest"
});

// Display the extracted data
console.log(extractResponse.extraction);
```

## Classify: Getting Started

The `classify` method classifies each page in a document by type. Provide your document and a list of classes, and the API assigns a class to each page. Use these examples as guides to get started with classifying with the library.

### Classify Local Files

Use the `document` parameter to classify files from your filesystem. Pass the file as a read stream using `fs.createReadStream()`.

```typescript theme={null}
import LandingAIADE from "landingai-ade";
import fs from "fs";

const client = new LandingAIADE();

const classes = [
  {
    class: "invoice",
    description: "A commercial bill with line items, totals, and payment terms"
  },
  {
    class: "bank_statement",
    description: "A monthly summary of account transactions"
  },
  {
    class: "pay_stub"
  }
];
const classesJson = JSON.stringify(classes);

const response = await client.classify({
  classes: classesJson as any,
  document: fs.createReadStream("/path/to/document.pdf"),
  model: "classify-latest"
});

for (const result of response.classification) {
  console.log(`Page ${result.page}: ${result.class}`);
}
```

### Classify Remote URLs

Use the `document_url` parameter to classify files from remote URLs (http, https).
```typescript theme={null}
import LandingAIADE from "landingai-ade";

const client = new LandingAIADE();

const classes = [
  {
    class: "invoice",
    description: "A commercial bill with line items, totals, and payment terms"
  },
  {
    class: "bank_statement",
    description: "A monthly summary of account transactions"
  }
];
const classesJson = JSON.stringify(classes);

const response = await client.classify({
  classes: classesJson as any,
  document_url: "https://example.com/document.pdf",
  model: "classify-latest"
});

for (const result of response.classification) {
  console.log(`Page ${result.page}: ${result.class}`);
}
```

### Set Parameters

The `classify` method accepts optional parameters to customize classification behavior. To see all available parameters, go to the [ADE Classify API](https://docs.landing.ai/api-reference/tools/ade-classify).

```typescript theme={null}
import LandingAIADE from "landingai-ade";
import fs from "fs";

const client = new LandingAIADE();

const classes = [
  { class: "invoice" },
  { class: "bank_statement" }
];
const classesJson = JSON.stringify(classes);

const response = await client.classify({
  classes: classesJson as any,
  document: fs.createReadStream("/path/to/document.pdf"),
  model: "classify-latest"
});
```

### Classify Output

The `classify` method returns a `ClassifyResponse` object with the following fields:

* **`classification`**: Array of `Classification` objects, one per page, each containing:
  * **`class`**: The predicted class label, or `'unknown'` if the page could not be classified
  * **`page`**: The zero-indexed page number
  * **`reason`**: A brief explanation of the classification (for debugging)
  * **`suggested_class`**: A proposed class when the prediction is `'unknown'`
* **`metadata`**: Processing information (credit usage, duration, filename, job ID, page count, version)

For detailed information about the response structure, see [JSON Response for Classification](./ade-classify-response).
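To put these response fields to work, here is a small sketch that groups zero-indexed page numbers by their predicted class, which is useful when routing pages to downstream workflows. The `Classification` type and the mock data are illustrative; only the `class` and `page` fields described above are used:

```typescript theme={null}
// Group zero-indexed page numbers by predicted class.
// Uses only the `class` and `page` fields of each classification result.
type Classification = { class: string; page: number };

function groupPagesByClass(classification: Classification[]): Map<string, number[]> {
  const groups = new Map<string, number[]>();
  for (const result of classification) {
    const pages = groups.get(result.class) ?? [];
    pages.push(result.page);
    groups.set(result.class, pages);
  }
  return groups;
}

// Example with mock classification results:
const mockClassification: Classification[] = [
  { class: "invoice", page: 0 },
  { class: "invoice", page: 1 },
  { class: "bank_statement", page: 2 }
];
const groups = groupPagesByClass(mockClassification);
console.log(groups.get("invoice"));        // [0, 1]
console.log(groups.get("bank_statement")); // [2]
```

In a real workflow, pass `response.classification` from `client.classify()` to `groupPagesByClass()` and send each group of pages to the appropriate downstream system.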
#### Work with Classify Response Data **Get classification for each page:** ```typescript theme={null} for (const result of response.classification) { console.log(`Page ${result.page}: ${result.class}`); } ``` **Filter pages by class:** ```typescript theme={null} const invoices = response.classification.filter(r => r.class === "invoice"); console.log(`Found ${invoices.length} invoice pages`); ``` **Handle pages that could not be classified:** ```typescript theme={null} const unknown = response.classification.filter(r => r.class === "unknown"); for (const r of unknown) { console.log(`Page ${r.page}: suggested class is ${r.suggested_class}`); } ``` ## Section: Getting Started The `section` method analyzes a parsed document and generates a hierarchical table of contents. Use these examples as guides to get started with sectioning with the library. **Pass Markdown Content** The library supports a few methods for passing the Markdown content for sectioning: * Section data directly from the [parse response](#section-from-parse-response) * Section data from a local [Markdown file](#section-from-markdown-files) * Section data from a Markdown file at a remote URL: `markdown: await fetch("https://example.com/file.md")` ### Section from Parse Response After parsing a document, you can pass the Markdown string directly from the `ParseResponse` to the section method without saving it to a file. 
```typescript theme={null}
import LandingAIADE from "landingai-ade";
import fs from "fs";

const client = new LandingAIADE();

// Parse the document
const parseResponse = await client.parse({
  document: fs.createReadStream("/path/to/document.pdf"),
  model: "dpt-2-latest"
});

// Section using the Markdown string from parse response
const sectionResponse = await client.section({
  markdown: parseResponse.markdown, // Pass Markdown string directly
  model: "section-latest"
});

// Access the table of contents
for (const entry of sectionResponse.table_of_contents) {
  const indent = " ".repeat(entry.level - 1);
  console.log(`${indent}${entry.section_number}. ${entry.title}`);
}
```

### Section from Markdown Files

If you already have a Markdown file (from a previous parsing operation), you can section it directly. Use the `markdown` parameter with `fs.createReadStream()` for local Markdown files or with `fetch()` for remote Markdown files.

```typescript theme={null}
import LandingAIADE from "landingai-ade";
import fs from "fs";

const client = new LandingAIADE();

// Section from a local Markdown file
const sectionResponse = await client.section({
  markdown: fs.createReadStream("/path/to/parsed_output.md"),
  // Or section a remote Markdown file instead:
  // markdown: await fetch("https://example.com/document.md"),
  model: "section-latest"
});

// Access the table of contents
for (const entry of sectionResponse.table_of_contents) {
  const indent = " ".repeat(entry.level - 1);
  console.log(`${indent}${entry.section_number}. ${entry.title}`);
}
```

### Set Parameters

The `section` method accepts optional parameters to customize sectioning behavior. To see all available parameters, go to the [ADE Section API](https://docs.landing.ai/api-reference/tools/ade-section).
```typescript theme={null} import LandingAIADE from "landingai-ade"; import fs from "fs"; const client = new LandingAIADE(); const sectionResponse = await client.section({ markdown: fs.createReadStream("/path/to/parsed_output.md"), guidelines: "Treat each numbered article as a top-level section", model: "section-latest" }); ``` ### Section Output The `section` method returns a `SectionResponse` object with the following fields: * **`table_of_contents`**: Array of `SectionTOCEntry` objects, each containing: * **`title`**: The generated section heading text * **`level`**: The hierarchy depth (1 = top-level, 2 = subsection, 3 = sub-subsection, and so on) * **`section_number`**: The hierarchical number (for example, `"1"`, `"1.2"`, `"1.2.3"`) * **`start_reference`**: The chunk ID where this section begins, corresponding to a `chunks[].id` value from the parse response * **`table_of_contents_md`**: Markdown-formatted TOC string with anchor links * **`metadata`**: Processing information (credit usage, duration, filename, job ID, version) For detailed information about the response structure, see [JSON Response for Sectioning](./ade-section-response). ## Split: Getting Started The `split` method classifies and separates a parsed document into multiple sub-documents based on Split Rules you define. Use these examples as guides to get started with splitting with the library. **Pass Markdown Content** The library supports a few methods for passing the Markdown content for splitting: * Split data directly from the [parse response](#split-from-parse-response) * Split data from a local [Markdown file](#split-from-markdown-files) * Split data from a Markdown file at a remote URL: `markdown: await fetch("https://example.com/file.md")` **Define Split Rules** Split Rules define how the API classifies and separates your document. 
Each Split Rule consists of: * `name`: The Split Type name (required) * `description`: Additional context about what this Split Type represents (optional) * `identifier`: A field that makes each instance unique, used to create separate splits (optional) For more information about Split Rules, see [Split Rules](./ade-split#split-rules). ### Split from Parse Response After parsing a document, you can pass the Markdown string directly from the `ParseResponse` to the split method without saving it to a file. ```typescript theme={null} import LandingAIADE from "landingai-ade"; import fs from "fs"; const client = new LandingAIADE(); // Parse the document const parseResponse = await client.parse({ document: fs.createReadStream("/path/to/document.pdf"), model: "dpt-2-latest" }); // Define Split Rules const splitClass = [ { name: "Bank Statement", description: "Document from a bank that summarizes all account activity over a period of time." }, { name: "Pay Stub", description: "Document that details an employee's earnings, deductions, and net pay for a specific pay period.", identifier: "Pay Stub Date" } ]; const splitClassJson = JSON.stringify(splitClass); // Split using the Markdown string from parse response const splitResponse = await client.split({ split_class: splitClassJson as any, markdown: parseResponse.markdown, // Pass Markdown string directly model: "split-latest", saveTo: "output_folder" // optional: saves as {input_file}_split_output.json }); // Access the splits for (const split of splitResponse.splits) { console.log(`Classification: ${split.classification}`); console.log(`Identifier: ${split.identifier}`); console.log(`Pages: ${split.pages}`); } ``` ### Split from Markdown Files If you already have a Markdown file (from a previous parsing operation), you can split it directly. Use the `markdown` parameter for local Markdown files or the `markdown` parameter with `fetch()` for remote Markdown files. 
```typescript theme={null}
import LandingAIADE from "landingai-ade";
import fs from "fs";

const client = new LandingAIADE();

// Define Split Rules
const splitClass = [
  {
    name: "Invoice",
    description: "A document requesting payment for goods or services.",
    identifier: "Invoice Number"
  },
  {
    name: "Receipt",
    description: "A document acknowledging that payment has been received."
  }
];
const splitClassJson = JSON.stringify(splitClass);

// Split from a local Markdown file
const localSplitResponse = await client.split({
  split_class: splitClassJson as any,
  markdown: fs.createReadStream("/path/to/parsed_output.md"),
  model: "split-latest"
});

// Or split from a remote Markdown file
const remoteSplitResponse = await client.split({
  split_class: splitClassJson as any,
  markdown: await fetch("https://example.com/document.md"),
  model: "split-latest"
});

// Access the splits
for (const split of localSplitResponse.splits) {
  console.log(`Classification: ${split.classification}`);
  if (split.identifier) {
    console.log(`Identifier: ${split.identifier}`);
  }
  console.log(`Number of pages: ${split.pages.length}`);
  console.log(`Markdown content: ${split.markdowns[0].substring(0, 100)}...`);
}
```

### Set Parameters

The `split` method accepts optional parameters to customize split behavior. To see all available parameters, go to [ADE Split API](https://docs.landing.ai/api-reference/tools/ade-split).
```typescript theme={null} import LandingAIADE from "landingai-ade"; import fs from "fs"; const client = new LandingAIADE(); const splitClass = [ { name: "Section A", description: "Introduction section" }, { name: "Section B", description: "Main content section" } ]; const splitClassJson = JSON.stringify(splitClass); const splitResponse = await client.split({ split_class: splitClassJson as any, markdown: fs.createReadStream("/path/to/parsed_output.md"), model: "split-latest" }); ``` ### Split Output The `split` method returns a `SplitResponse` object with the following fields: * **`splits`**: Array of `Split` objects, each containing: * **`classification`**: The Split Type name assigned to this sub-document * **`identifier`**: The unique identifier value (or `null` if no identifier was specified) * **`pages`**: Array of zero-indexed page numbers that belong to this split * **`markdowns`**: Array of Markdown content strings, one for each page * **`metadata`**: Processing information (credit usage, duration, filename, job ID, page count, version) For detailed information about the response structure, see [JSON Response for Splitting](./ade-split-response). 
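If you want each sub-document's Markdown on disk (rather than the combined JSON that the `saveTo` option produces), one approach is a small helper that writes one file per split. The sketch below assumes only the `SplitResponse` fields documented above; the `saveSplits` helper and its file-naming scheme are our own, not part of the library.

```typescript
import * as fs from "fs";
import * as path from "path";

// Shape of one entry in `splitResponse.splits`, per the fields above.
interface Split {
  classification: string;
  identifier: string | null;
  pages: number[];
  markdowns: string[];
}

// Write each split's Markdown to its own file, e.g. "Invoice_INV-001.md".
// Returns the paths written so callers can log or post-process them.
function saveSplits(splits: Split[], outDir: string): string[] {
  fs.mkdirSync(outDir, { recursive: true });
  return splits.map((split, i) => {
    // Fall back to the split's index when no identifier was assigned.
    const label = split.identifier ?? String(i);
    // Replace characters that are unsafe in file names.
    const name = `${split.classification}_${label}`.replace(/[^\w.-]+/g, "_");
    const file = path.join(outDir, `${name}.md`);
    fs.writeFileSync(file, split.markdowns.join("\n\n"));
    return file;
  });
}
```

You could then call `saveSplits(splitResponse.splits, "output_folder")` after a split request completes.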
### Work with Split Response Data

**Access all splits by classification:**

```typescript theme={null}
for (const split of splitResponse.splits) {
  console.log(`Split Type: ${split.classification}`);
  console.log(`Pages included: ${split.pages}`);
}
```

**Filter splits by classification:**

```typescript theme={null}
const invoices = splitResponse.splits.filter(split => split.classification === "Invoice");
console.log(`Found ${invoices.length} invoices`);
```

**Access Markdown content for each split:**

```typescript theme={null}
for (const split of splitResponse.splits) {
  console.log(`Classification: ${split.classification}`);
  for (let i = 0; i < split.markdowns.length; i++) {
    console.log(`  Page ${split.pages[i]} Markdown: ${split.markdowns[i].substring(0, 100)}...`);
  }
}
```

**Group splits by identifier:**

```typescript theme={null}
const splitsByIdentifier = new Map<string, typeof splitResponse.splits>();
for (const split of splitResponse.splits) {
  if (split.identifier) {
    const existing = splitsByIdentifier.get(split.identifier) || [];
    existing.push(split);
    splitsByIdentifier.set(split.identifier, existing);
  }
}
for (const [identifier, splits] of splitsByIdentifier.entries()) {
  console.log(`Identifier '${identifier}': ${splits.length} split(s)`);
}
```

# API Key

Source: https://docs.landing.ai/ade/agentic-api-key

Running the API, whether through a library or by calling the API directly, requires an API key.

* [Get your API key](#get-your-api-key)
* [Set your API key when using the library](#set-your-api-key-when-using-the-library)
* [Set your API key when calling the API directly](#set-your-api-key-when-calling-the-api-directly)

## Get Your API Key

If using in the EU, get your API key [here](https://va.eu-west-1.landing.ai/settings/api-key).

Get your API key on the [API Key](https://va.landing.ai/settings/api-key) page:

1. Log in to [https://va.landing.ai/](https://va.landing.ai/).
2.
Go to the [API Key](https://va.landing.ai/settings/api-key) page (to navigate there manually, click your profile icon at the bottom left corner of the page and click **API Key**).
3. If you're on a subscription plan, you can [create an API key](#create-api-keys-subscription-plans-only) if needed.
4. Click the **Copy** icon to copy your API key.

## Personal Plans (Explore Plans) Have One API Key

The personal plan (the "Explore" plan) only has one API key. This API key cannot be deleted or revoked. The Explore plan type is designed for testing, prototyping, and hobby use. If you'd like to deploy to production and need more granular API key management, like the ability to create and revoke API keys, [upgrade](./ade-pricing) to a subscription plan.

If your API key is exposed or leaked and you'd like to request a new API key, contact [support@landing.ai](mailto:support@landing.ai).

## Create API Keys (Subscription Plans Only)

If you are on a Team or Enterprise plan, you can create multiple API keys for your organization to use.

To create an API key:

1. Log in to [https://va.landing.ai/](https://va.landing.ai/).
2. Go to the [API Key](https://va.landing.ai/settings/api-key) page (to navigate there manually, click your profile icon at the bottom left corner of the page and click **API Key**).
3. If you're not currently in the correct organization, select your organization from the drop-down menu.
4. Click **Create Key**.
5. Enter a brief, descriptive name for the API key in the key name field.
6. Click **Create API Key**.
7. Click the **Copy** icon to copy the API key.
8. Click **Done** to close the pop-up.

## Revoke API Keys

If you are on the Team or Enterprise plan, you can revoke API keys. A user with the Admin role can revoke any API key. A user with the Developer role can revoke only API keys they have created.

To revoke an API key:

1. Log in to [https://va.landing.ai/](https://va.landing.ai/).
2.
Go to the [API Key](https://va.landing.ai/settings/api-key) page (to navigate there manually, click your profile icon at the bottom left corner of the page and click **API Key**).
3. If you're not currently in the correct organization, select your organization from the drop-down menu.
4. Locate the row for the API key. (You may need to search to narrow down the list of API keys.)
5. Click the **Settings** button (ellipsis) and select **Revoke API Key**. Follow the on-screen prompts to complete the process.

## Set Your API Key When Using the Library

There are a few methods you can use to set the API key when using the library, including:

* [Method 1: Set the API Key as an Environment Variable](#method-1-set-the-api-key-as-an-environment-variable)
* [Method 2: Store API key in a .env file](#method-2-store-api-key-in-a-env-file)
* [Method 3: Set API Key in Notebooks](#method-3-set-api-key-in-notebooks)

### Method 1: Set the API Key as an Environment Variable

Store your API key as an environment variable in your system.

1. Set the API key as an environment variable:

   **Linux/macOS:** Add the API key to your shell configuration file (`.zshrc` or `.bashrc`). After updating the file, run `source ~/.zshrc` to apply the changes.

   ```bash theme={null}
   export VISION_AGENT_API_KEY=your_api_key_here
   ```

   **Windows (Command Prompt):**

   ```cmd theme={null}
   set VISION_AGENT_API_KEY=your_api_key_here
   ```

   **Windows (PowerShell):**

   ```powershell theme={null}
   $env:VISION_AGENT_API_KEY="your_api_key_here"
   ```

2. Initialize the client in your Python code. The library will automatically detect the `VISION_AGENT_API_KEY` environment variable, so no additional code is required.
However, you can explicitly pass the API key to make your code clearer and easier to debug: ```python theme={null} import os from landingai_ade import LandingAIADE client = LandingAIADE( apikey=os.environ.get("VISION_AGENT_API_KEY"), ) ``` ### Method 2: Store API Key in a .env File Use a `.env` file to store your API key. 1. Install the `python-dotenv` package: ```bash theme={null} pip install python-dotenv ``` 2. Create a `.env` file in your project root and store your API key in it: ``` VISION_AGENT_API_KEY=your_api_key_here ``` 3. Add code to load your API key from the `.env` file: ```python theme={null} from dotenv import load_dotenv import os from landingai_ade import LandingAIADE # Load environment variables from .env file load_dotenv() # Initialize the client (it will automatically use VISION_AGENT_API_KEY from environment) client = LandingAIADE() ``` ### Method 3: Set API Key in Notebooks If using IPython-based notebooks (such as Jupyter Notebook, JupyterLab, Google Colab, or Kaggle), you can set the API key for the current session using the `%env` magic command. This method sets the API key only for the current session. The key is cleared when you restart the notebook kernel. Additionally, the API key will be visible in the notebook file if you share or commit it to version control. 1. In a notebook cell, set the environment variable using the `%env` magic command: ```python theme={null} %env VISION_AGENT_API_KEY=your_api_key_here ``` 2. Initialize the client in a subsequent cell: ```python theme={null} from landingai_ade import LandingAIADE # Initialize the client (it will automatically use VISION_AGENT_API_KEY from environment) client = LandingAIADE() ``` ### API Key Precedence If the API key is set using multiple methods, the library uses the following order of precedence (highest to lowest): * Environment variable * .env file The first available key found in this order will be used. All other values will be ignored. 
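The precedence rule amounts to a simple lookup: the environment is consulted first, and the `.env` value is used only as a fallback. The `resolveApiKey` helper below is purely illustrative (not part of the library) and only makes that ordering explicit.

```typescript
// Illustrative helper (not part of the library): a key already set in the
// environment wins over a key read from a .env file.
function resolveApiKey(
  env: Record<string, string | undefined>,
  dotenvValues: Record<string, string>
): string | undefined {
  return env["VISION_AGENT_API_KEY"] ?? dotenvValues["VISION_AGENT_API_KEY"];
}
```

In practice the library performs this resolution for you; the helper only shows why an environment variable can mask a `.env` entry.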
## Set Your API Key When Calling the API Directly

Include the API key in request headers. To see code samples, go to [API](https://docs.landing.ai/api-reference/tools/ade-parse).

## Security Best Practices

* If using a `.env` file: Add `.env` to `.gitignore` to prevent accidentally committing the API key to version control.
* Do not hardcode the API key in your code, because anyone who can read the code could see and use your API key.

## Troubleshoot API Key Issues

Use the troubleshooting tips in this section to resolve issues related to API keys.

### .env File Issues

If you set the API key in the `.env` file but the environment variable is not loading, try these solutions:

* Check the file location: The `.env` file should be in the same directory where you run your script.
* Verify the file format: No spaces around the equals sign:

  ```
  # Correct
  VISION_AGENT_API_KEY=your_api_key_here

  # Incorrect
  VISION_AGENT_API_KEY = your_api_key_here
  ```

### Environment Variable Issues

If you set the API key as an environment variable and are encountering issues, try these solutions:

* Restart your application after setting environment variables.
* Check the variable name: It must be exactly `VISION_AGENT_API_KEY`.
* Check that the environment variable is set correctly:

  ```bash theme={null}
  echo $VISION_AGENT_API_KEY    # Linux/macOS
  echo %VISION_AGENT_API_KEY%   # Windows CMD
  $Env:VISION_AGENT_API_KEY     # Windows PowerShell
  ```

* Ensure that you apply the changes after setting, changing, or removing the environment variable:
  * **Linux/macOS**: Restart your terminal. If you added the variable to your `.zshrc` file, run `source ~/.zshrc`.
  * **Windows**: Close and reopen Command Prompt/PowerShell, or restart your IDE.

### Wrong API Key Used

If you have multiple accounts and the wrong API key is being used, the API key might be set through multiple methods, with one taking precedence over the other. To resolve this issue, check which API keys are set through each method.
**Example Scenario:**

1. You create an account with your *personal* email address and set the API key as an environment variable.
2. A few weeks later you create an account with your *work* email address and store the API key for that account in a .env file.
3. When you run your code, it uses the first API key, because the environment variable takes precedence over the .env file.

**Solution**: Check each method that can set the API key, and remove the unwanted keys to ensure that the correct API key is used.

### Error: Missing Authorization Header

This error occurs if the authorization header is not included when calling the API directly. To fix this, include the authorization header:

```python theme={null}
headers = {"Authorization": f"Bearer {api_key}"}
```

### Error: User Not Found, Please Check Your API Key

This error occurs when the API key is entered incorrectly. To resolve this issue:

* Verify the API key: Copy your key directly from the [dashboard](https://va.landing.ai/settings/api-key).
* Restart your environment: Close and reopen your terminal, IDE, or Jupyter notebook.

# Zero Data Retention (ZDR) Option Overview

Source: https://docs.landing.ai/ade/zdr

At LandingAI, we provide robust tools to process your documents efficiently. For customers with strict data privacy and compliance requirements, we offer the **Zero Data Retention (ZDR) option for our Agentic Document Extraction (ADE) product.** This document provides an overview of what the ZDR option means, its availability, and how to enable it for your organization.

## 1.0 What is the Zero Data Retention (ZDR) Option?

When the ZDR option is enabled for your account, your data is processed in real time **without being saved to our systems or any third-party systems**. This means:

* Your documents are processed in-memory and are **never stored at rest** on LandingAI systems or by our sub-processors.
* Your data is used exclusively to perform the extraction process you initiate and is immediately and irrevocably discarded after processing is complete.
* LandingAI does not use your data for training or improving our models when ZDR is active.

This provides the highest level of data privacy and control for your most sensitive documents.

## 2.0 Versions of Agentic Document Extraction That Support ZDR

ZDR is available for the following versions of Agentic Document Extraction:

* [LandingAI-Hosted ADE](#2-1-landingai-hosted-ade-us-%26-eu)
* [ADE Containerized App on Customer’s VPC](#2-2-ade-containerized-app-on-customer’s-vpc)

Support varies for each version. Read the full descriptions below to understand the extent of support.

### 2.1 LandingAI-Hosted ADE (US & EU)

The LandingAI-hosted version of ADE is available in both the [US](https://va.landing.ai/) and [EU](https://va.eu-west-1.landing.ai/). Users can enable ZDR based on their plan, as described in the table below.

| Location | Region                  | ZDR Availability                                                                      |
| -------- | ----------------------- | ------------------------------------------------------------------------------------- |
| US       | AWS Ohio (us-east-2)    | Users on the Team and Enterprise plans can enable ZDR directly in the user interface. |
| EU       | AWS Ireland (eu-west-1) | Users with custom pricing plans can enable ZDR directly in the user interface.        |

#### 2.1.1 Scope

When ZDR is enabled, our fully managed SaaS offering ensures zero data retention across the entire platform, including all subprocessors. ZDR applies both when you parse documents with our Python libraries and when you call the ADE APIs directly. A separate setting controls whether ZDR is applied to the [Playground](https://va.landing.ai/).

#### 2.1.2 Cost

When ZDR is enabled, parsing documents costs 1 extra credit per page. For detailed information on pricing, go to [Pricing](https://docs.landing.ai/ade/ade-pricing).

#### 2.1.3 Enable ZDR

US users (on the Team and Enterprise plans) and EU users (on custom pricing plans) can enable ZDR in the user interface by following the instructions below:

1. Log in to [https://va.landing.ai/](https://va.landing.ai/).
2. Go to the [Organization Settings](https://va.landing.ai/settings/organization/general) page (to navigate there manually, click your profile icon at the bottom left corner of the page and click **Organization Settings**).
3. In the Zero Data Retention box, click **Turn It On**.
4. A pop-up opens. Carefully read the information so that you understand how ZDR works.
5. If you want ZDR to also be applied when uploading documents to the [Playground](https://va.landing.ai/), select the **Also apply to Playground UI** checkbox.
6. Click **Enable Zero Data Retention**.
7. The Zero Data Retention box confirms that ZDR is enabled.
8. If you need to ensure HIPAA compliance, you must have a Business Associate Agreement (BAA) in place with LandingAI. To initiate the BAA process, submit your request through the form on the [Organization Settings](https://va.landing.ai/settings/organization/general) page (available after ZDR is enabled).

US users can also enable ZDR when upgrading to a Team plan.
#### 2.1.4 Turn Off ZDR If you have enabled ZDR and later want to turn it off, contact [support@landing.ai](mailto:support@landing.ai). ### 2.2 ADE Containerized App on Customer’s VPC ADE can be deployed within your own Virtual Private Cloud (e.g., AWS, Azure, GCP). In this deployment, ADE maintains zero data retention because it is on your VPC. In a VPC deployment, LandingAI is not responsible for zero data retention related to your infrastructure or any subprocessors you integrate (e.g., your own LLM API keys). Your organization is responsible for managing these. ## 3.0 Using ZDR for Protected Health Information (PHI), Personally Identifiable Information (PII) & HIPAA Compliance The ZDR option provides the necessary technical safeguards to support customers processing Protected Health Information (PHI) and Personally Identifiable Information (PII) in compliance with HIPAA. If you are a covered entity and intend to process PHI or PII with our ADE service, you must: 1. Enable the Zero Data Retention (ZDR) option. 2. Have a signed Business Associate Agreement (BAA) in place with LandingAI. To initiate the BAA process, submit your request through the form on the [Organization Settings](https://va.landing.ai/settings/organization/general) page (available after ZDR is enabled). To learn more about HIPAA compliance, see these resources: * [Security and Data Privacy](https://landing.ai/security-at-landingai) * [Trust Center](https://trust.landing.ai/) # ADE Build Extract Schema Source: https://docs.landing.ai/api-reference/tools/ade-build-extract-schema /ade/va_openapi_ade2.json post /v1/ade/extract/build-schema Generate a JSON schema from Markdown using AI. This endpoint analyzes Markdown content and generates a JSON schema suitable for use with the extract endpoint. It can also refine an existing schema based on new documents or iterate on a schema based on prompt instructions. 
For EU users, use this endpoint: `https://api.va.eu-west-1.landing.ai/v1/ade/extract/build-schema`. # ADE Classify Source: https://docs.landing.ai/api-reference/tools/ade-classify /ade/va_openapi_ade2.json post /v1/ade/classify Classify the pages of a document into classes you define. This endpoint accepts PDFs, images, and other supported file types (either as a `document` upload or `document_url`) together with a list of `classes`, and returns a classification result for each page. For EU users, use this endpoint: `https://api.va.eu-west-1.landing.ai/v1/ade/classify`. # ADE Extract Source: https://docs.landing.ai/api-reference/tools/ade-extract /ade/va_openapi_ade2.json post /v1/ade/extract Extract structured data from Markdown using a JSON schema. This endpoint processes Markdown content and extracts structured data according to the provided JSON schema. For EU users, use this endpoint: `https://api.va.eu-west-1.landing.ai/v1/ade/extract`. # ADE Get Parse Jobs Source: https://docs.landing.ai/api-reference/tools/ade-get-parse-jobs /ade/va_openapi_ade2.json get /v1/ade/parse/jobs/{job_id} Get the status for an async parse job. Returns the job status or an error response. For EU users, use this endpoint: `https://api.va.eu-west-1.landing.ai/v1/ade/parse/jobs/{job_id}`. # ADE List Parse Jobs Source: https://docs.landing.ai/api-reference/tools/ade-list-parse-jobs /ade/va_openapi_ade2.json get /v1/ade/parse/jobs List all async parse jobs associated with your API key. Returns the list of jobs or an error response. For EU users, use this endpoint: `https://api.va.eu-west-1.landing.ai/v1/ade/parse/jobs`. # ADE Parse Source: https://docs.landing.ai/api-reference/tools/ade-parse /ade/va_openapi_ade2.json post /v1/ade/parse Parse a document or spreadsheet. This endpoint parses documents (PDF, images) and spreadsheets (XLSX, CSV) into structured Markdown, chunks, and metadata. For EU users, use this endpoint: `https://api.va.eu-west-1.landing.ai/v1/ade/parse`. 
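Calling the Parse endpoint directly amounts to a POST request with a Bearer token, as described in the API Key section. The hypothetical `buildParseRequest` helper below assembles (but does not send) such a request against the EU endpoint documented above; the body field names (`document_url`, `model`) mirror the library examples and should be checked against the API reference before use.

```typescript
// Hypothetical helper: assemble a direct request to the ADE Parse endpoint.
// The base URL is the EU endpoint documented above; field names are
// assumptions drawn from the library examples, not the authoritative schema.
interface ParseRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: Record<string, string>;
}

function buildParseRequest(apiKey: string, documentUrl: string): ParseRequest {
  return {
    url: "https://api.va.eu-west-1.landing.ai/v1/ade/parse",
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: { document_url: documentUrl, model: "dpt-2-latest" },
  };
}
```

You would then send the request with `fetch` or any HTTP client; consult the API reference for whether the endpoint expects multipart form data or JSON for your use case.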
# ADE Parse Jobs Source: https://docs.landing.ai/api-reference/tools/ade-parse-jobs /ade/va_openapi_ade2.json post /v1/ade/parse/jobs Parse documents asynchronously. This endpoint creates a job that handles the processing for both large documents and large batches of documents. For EU users, use this endpoint: `https://api.va.eu-west-1.landing.ai/v1/ade/parse/jobs`. # ADE Section Source: https://docs.landing.ai/api-reference/tools/ade-section /ade/va_openapi_ade2.json post /v1/ade/section Section parsed markdown into a hierarchical table of contents. This endpoint accepts the markdown output from /ade/parse (with reference anchors) and returns a flat, reading-order list of sections with hierarchy levels and reference ranges. For EU users, use this endpoint: `https://api.va.eu-west-1.landing.ai/v1/ade/section`. # ADE Split Source: https://docs.landing.ai/api-reference/tools/ade-split /ade/va_openapi_ade2.json post /v1/ade/split Split classification for documents. This endpoint classifies document sections based on markdown content and split options. For EU users, use this endpoint: `https://api.va.eu-west-1.landing.ai/v1/ade/split`. # Home Source: https://docs.landing.ai/index

Parse Documents with

Learn how to get started with .
* Just getting started? Load your documents in our Playground to see immediate results.
* Make your first API call in minutes.
* Integrate using our API.
* Learn about changes and new features in .

Scale with Our Libraries

Use our Python and TypeScript libraries to build custom scripts and scale document processing.
* **Python**: Integrate document parsing into Python applications, data pipelines, and automation scripts.
* **TypeScript**: Build type-safe Node.js applications and web services with full IDE support.
# LandingLens and LandingEdge Documentation Source: https://docs.landing.ai/landinglens-landingedge LandingLens and LandingEdge documentation has moved to this dedicated site: [https://landinglens.docs.landing.ai/](https://landinglens.docs.landing.ai/). Build and deploy computer vision models for object detection, classification, segmentation, and anomaly detection. Use the LandingEdge app or Docker to run LandingLens models on-premises for real-time visual inspection.