Split

Use the Split API to split a parsed document into multiple classified sub-documents. Organizations typically use Split when they receive batched documents that contain multiple document types or multiple instances of the same document type. The API classifies each sub-document and returns its full Markdown content for downstream processing. Splitting occurs after parsing but before extraction, and you can use Split without performing extraction.

Split is in Preview. This feature is still in development and may not return accurate results. Do not use this feature in production environments.

Example Use Cases

Financial Services: Financial institutions processing Know Your Customer (KYC) documentation often receive PDFs containing multiple document types for each customer, such as bank statements, utility bills, and identification documents. After parsing the batch, the API separates and classifies each document type.
Healthcare: Healthcare systems ingesting patient records may receive batched PDFs containing intake forms, pathology reports, and medication lists. The API separates these documents by type for routing to appropriate systems.
Accounting: Accounting departments processing expense documentation receive PDFs with multiple invoices and receipts. The API separates each document and can use identifiers like invoice numbers or dates to create individual splits for each transaction.
Academic Research: Research institutions and libraries processing academic articles receive PDFs containing article bodies, references, and supplemental materials. The API separates these sections for indexing, citation extraction, or archival purposes.
Product Documentation: Organizations managing product catalogs receive PDFs containing specifications for multiple products. The API separates each product’s specifications, enabling automated data entry into product databases or comparison tools.
Technical Documentation: Companies distributing multilingual instruction manuals receive PDFs with the same content repeated in different languages. The API separates the manual by language, allowing each version to be routed to the appropriate regional system or translation workflow.

For information about pricing and credits, go to Pricing & Billing.

Process Overview

Follow these steps to split a document into classified sub-documents:

Parse your document using the parse API. Split requires Markdown content from the parse API as input. Save the Markdown output for the next step.
Define your Split Rules by creating a set of Split Types that describe the different document types or sections in your file. Learn more about Split Rules. The easiest way to create and test Split Rules is in the Playground.
Run Split by passing the parsed Markdown content and your Split Rules. Choose your method:
- Split in the Playground: Test and refine your Split Rules interactively
- Split with the API: Integrate splitting into your application
- Split with the Python & TypeScript Libraries: Call Split from our Python or TypeScript library
Use the split results in your downstream workflows. The API returns each classified sub-document with its full Markdown content. Learn more about the response structure.
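To make the last step concrete, here is a minimal Python sketch of routing sub-documents by Split Type. It assumes you have already converted the Split response into a list of (Split Type name, Markdown) pairs; that pair shape, the route_splits function, and the queue names are hypothetical and are not the documented response structure. Any sub-document without a registered handler, including pages the API labels Uncategorized, goes to a fallback.

```python
from typing import Callable

# A handler receives one sub-document's Markdown and returns a result (here, a destination name).
Handler = Callable[[str], str]


def route_splits(splits: list[tuple[str, str]], handlers: dict[str, Handler],
                 fallback: Handler) -> list[str]:
    """Send each sub-document's Markdown to the handler registered for its Split Type."""
    results = []
    for split_type, markdown in splits:
        # Unknown Split Types, including "Uncategorized", fall through to the fallback handler.
        handler = handlers.get(split_type, fallback)
        results.append(handler(markdown))
    return results


def manual_review(markdown: str) -> str:
    return "manual-review"


# Hypothetical destinations for the KYC example above.
handlers = {
    "Bank Statement": lambda markdown: "kyc",
    "Pay Stub": lambda markdown: "payroll",
}

print(route_splits([("Pay Stub", "# Pay Stub"), ("Uncategorized", "...")], handlers, manual_review))
# ['payroll', 'manual-review']
```

Keeping the handler table separate from the loop means adding a new Split Type to your rules only requires registering one more handler.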

Split Rules

Split Rules define how the API classifies and separates a document into sub-documents. The Split Rules are the collection of all Split Types you define for a single API call. Each entry in the Split Rules consists of:

Split Type (the name)
Description (optional)
Identifier (optional)

Split Rules are defined differently depending on whether you use the Playground, API, or one of our libraries. See the interface-specific sections below for more information on how to create and pass the Split Rules for your method.
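When you call the API, the Split Rules are sent as a JSON array in the split_class field (see Parameters under Split with the API). The Python sketch below shows one way to assemble that array from typed entries. The SplitType dataclass and build_split_rules helper are hypothetical conveniences for illustration, not part of the LandingAI libraries.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class SplitType:
    """One entry in the Split Rules (hypothetical helper, not a LandingAI library class)."""
    name: str                          # Split Type name (required)
    description: Optional[str] = None  # extra context for classification (optional)
    identifier: Optional[str] = None   # field that makes each instance unique (optional)

    def to_dict(self) -> dict:
        # Include the optional keys only when they were actually set.
        entry = {"name": self.name}
        if self.description is not None:
            entry["description"] = self.description
        if self.identifier is not None:
            entry["identifier"] = self.identifier
        return entry


def build_split_rules(split_types: list[SplitType]) -> str:
    """Serialize Split Types into the JSON array passed as split_class."""
    return json.dumps([split_type.to_dict() for split_type in split_types])


rules = build_split_rules([
    SplitType("Bank Statement",
              "Document from a bank that summarizes all account activity over a period of time."),
    SplitType("Pay Stub",
              "Document that details an employee's earnings, deductions, and net pay for a specific pay period.",
              identifier="Pay Stub Date"),
])
print(rules)
```

Building the JSON with json.dumps, rather than by hand, avoids quoting mistakes such as the escaped apostrophe the curl example needs for "employee's".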

Split Types

Split Types define how your document is classified into sections, such as pay stubs, bank statements, and W-2s. You can define up to 19 Split Types in one API call. If the API cannot determine which Split Type a page belongs to, it classifies the page as Uncategorized.
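Because the per-call limit and the required name are easy to check before sending a request, a small client-side guard can catch mistakes early. This is a hypothetical Python helper that only mirrors the constraints stated on this page; the API performs its own validation, and the Uncategorized fallback happens on the server, so nothing here reproduces classification.

```python
MAX_SPLIT_TYPES = 19  # per-call limit stated on this page


def check_split_types(split_types: list[dict]) -> None:
    """Raise ValueError if the Split Rules break the documented constraints (hypothetical helper)."""
    if len(split_types) > MAX_SPLIT_TYPES:
        raise ValueError(
            f"{len(split_types)} Split Types defined; at most {MAX_SPLIT_TYPES} are allowed per call"
        )
    for index, entry in enumerate(split_types):
        name = entry.get("name")
        # name is the only required field of a Split Type.
        if not isinstance(name, str) or not name.strip():
            raise ValueError(f"Split Type at index {index} has no name")


check_split_types([
    {"name": "Bank Statement"},
    {"name": "Pay Stub", "identifier": "Pay Stub Date"},
])  # passes silently
```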

Descriptions (Optional)

The Description provides additional context about what a Split Type represents. Detailed descriptions help the API identify what information to include in each split and improve classification accuracy. Descriptions can also impact how the API interprets Identifiers. For example, these two descriptions for clinical notes produce different identifier behavior:

Less specific description: “A clinical note documenting a patient’s office visit, including history, exam, assessment, and plan, authored by a provider.” The API might consider multiple dates as potential identifiers.
More specific description: “A clinical note documenting a patient’s office visit, including history, exam, assessment, and plan, authored by a provider. Each note is separated by office visit date. Do not look at any other dates. Only include date before the words ‘office visit’.” The API only considers dates directly before “office visit” as identifiers.

Identifiers (Optional)

When your document contains multiple instances of the same Split Type, use an Identifier to specify what makes each instance unique, such as invoice number, order ID, or date. The API creates a separate split for each unique value of the Identifier. For example, if your document contains 6 pay stubs and you specify “Pay Stub Date” as the identifier, the API creates 6 separate splits, one for each unique date value.

Example

A document contains 1 bank statement and 6 pay stubs with different dates. You define two Split Types:

Bank Statement (no identifier needed)
Pay Stub with “Pay Stub Date” as the identifier

The API returns 7 splits: 1 bank statement and 6 pay stubs, one for each pay stub date.
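The counting in this example can be illustrated with a toy Python model. It assumes every page has already been labeled with a Split Type and, where an Identifier is set, an identifier value, and then groups pages by that pair. The API performs the classification itself; this sketch, with a hypothetical group_pages function and made-up pay dates, only shows why 1 bank statement plus 6 differently dated pay stubs yields 7 splits. It also assumes that all pages of a Split Type without an Identifier land in a single split, which this page does not state.

```python
from typing import Optional

# (page number, Split Type, identifier value or None when the Split Type has no Identifier)
PageLabel = tuple[int, str, Optional[str]]


def group_pages(pages: list[PageLabel]) -> dict[tuple[str, Optional[str]], list[int]]:
    """Toy model: one split per unique (Split Type, identifier value) pair."""
    splits: dict[tuple[str, Optional[str]], list[int]] = {}
    for page_number, split_type, identifier_value in pages:
        # Same type and same identifier value -> same split. Types without an
        # Identifier always use None, so all of their pages share one split.
        splits.setdefault((split_type, identifier_value), []).append(page_number)
    return splits


# A 2-page bank statement followed by 6 one-page pay stubs with different (made-up) dates.
pages = [(1, "Bank Statement", None), (2, "Bank Statement", None)]
pay_dates = ["2024-01-15", "2024-01-31", "2024-02-15", "2024-02-29", "2024-03-15", "2024-03-31"]
for offset, pay_date in enumerate(pay_dates):
    pages.append((3 + offset, "Pay Stub", pay_date))

splits = group_pages(pages)
print(len(splits))  # 7
```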

Split in the Playground

To make it as easy as possible to split documents, we’ve created a wizard in our Playground that guides you through the process. The Playground is designed as a proof of concept to help you understand what the API can do and how it might fit into your workflows. After you’ve split a document in the Playground, copy the code to use in API calls or with our Python and TypeScript libraries so that you can scale.

Go to the Playground.
Click Split.
Select the file you want to split.
The Playground loads and parses the file in the background.
You can now create the Split Rules, which determine how the document is split into sub-documents. There are a few ways to do this:
- View Suggested Split Rules: The app automatically recommends rules based on the parsed content. We recommend trying this approach first, and then editing the rules if needed.
- Write a Split Rules Prompt: Write a prompt that tells the app what specific Split Types and Identifiers should be used. The app then generates rules based on this prompt.
- Start from Scratch: Manually define the Split Rules.
After creating your first round of Split Rules, edit them if needed.
Click Split Document to see the results, which open in a new panel. You can toggle between a visual representation of the results and the actual API JSON response.
You can continue to edit the Split Rules if needed.
Once you’re happy with the results, copy the code so that you can scale the API call.

Split with the API

Split a document by sending a POST request to the /v1/ade/split endpoint. This example splits a document containing a bank statement and pay stubs:

curl -X POST 'https://api.va.landing.ai/v1/ade/split' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -F 'markdown=@markdown.md' \
  -F 'split_class=[{"name": "Bank Statement", "description": "Document from a bank that summarizes all account activity over a period of time."}, {"name": "Pay Stub", "description": "Document that details an employee'\''s earnings, deductions, and net pay for a specific pay period.", "identifier": "Pay Stub Date"}]' \
  -F 'model=split-latest'

Parameters

For the full list of parameters, see the API reference.

markdown (required): The Markdown output from the parse API. You can pass the Markdown content directly or reference a file.
split_class (required): A JSON array defining the Split Rules. Each Split Type is a JSON object with:
- name: The Split Type name (required)
- description: Additional context about the Split Type (optional)
- identifier: The field that makes each instance unique (optional)
model (optional): The model version to use for splitting. If omitted, the API uses the latest model. For more information, see Split Model Versions.
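For integrations that don't use our libraries, the curl request above can be reproduced with only the Python standard library. This sketch builds, but does not send, a multipart/form-data POST with the markdown, split_class, and optional model fields from the curl example. It passes the Markdown as text rather than as a file upload (the parameter accepts either), and build_split_request is a hypothetical name. To send a file's contents, read the file first, for example with open("markdown.md", encoding="utf-8").read().

```python
import json
import urllib.request
import uuid
from typing import Optional

SPLIT_URL = "https://api.va.landing.ai/v1/ade/split"  # endpoint from the curl example


def build_split_request(api_key: str, markdown: str, split_class: list[dict],
                        model: Optional[str] = None) -> urllib.request.Request:
    """Build a multipart/form-data POST that mirrors the curl example (hypothetical helper)."""
    boundary = uuid.uuid4().hex
    fields = {"markdown": markdown, "split_class": json.dumps(split_class)}
    if model is not None:
        # Omit model to let the API use the latest model version.
        fields["model"] = model

    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
    parts.append(f"--{boundary}--\r\n")  # closing boundary

    return urllib.request.Request(
        SPLIT_URL,
        data="".join(parts).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


request = build_split_request("YOUR_API_KEY", "# Parsed document\n", [{"name": "Bank Statement"}],
                              model="split-latest")
# Sending requires a valid API key and network access:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```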

Use Split with Our Libraries

See the guides below to learn how to use Split with our libraries.

Python Library

Use Split with our Python library.

TypeScript Library

Use Split with our TypeScript library.

The legacy agentic-doc library does not support Split.

Additional Considerations on Splitting

Each page in a document can be assigned to only one Split Type. If a page has content that could belong to more than one Split Type, the API chooses the Split Type that the page matches most closely.
The Split API is different from the split parameter in the parse API. The Split API separates a document into sub-documents after parsing, while the split parameter can be used during parsing to organize the parsed output by page.
