# Split

> Separate a parsed document into classified sub-documents by type (Preview).

export const splitJSON = 'split rules';

export const split = 'ADE Split';

export const adePythonLibrary = 'ade-python';

export const dpt2mini = 'DPT-2 mini';

export const dpt2 = 'DPT-2';

export const dpt1 = 'DPT-1';

export const dpt = 'Document Pre-Trained Transformer';

export const companyName = 'LandingAI';

export const extract = 'ADE Extract';

export const parse = 'ADE Parse';

export const ade = 'Agentic Document Extraction';

Use the [{split}](https://docs.landing.ai/api-reference/tools/ade-split) API to split a parsed document into multiple classified sub-documents.

Organizations typically use Split when they receive batched documents that contain multiple document types or multiple instances of the same document type. The API classifies each sub-document and returns the full Markdown content for downstream processing.

Splitting occurs after [parsing](https://docs.landing.ai/api-reference/tools/ade-parse) but before [extraction](https://docs.landing.ai/api-reference/tools/ade-extract). You can use Split without performing extraction.

<Info>{split} is in Preview. This feature is still in development and may not return accurate results. Do not use this feature in production environments.</Info>

## Example Use Cases

* **Financial Services**: Financial institutions processing Know Your Customer (KYC) documentation often receive PDFs containing multiple document types for each customer, such as bank statements, utility bills, and identification documents. After parsing the batch, the {split} API separates and classifies each document type.

* **Healthcare**: Healthcare systems ingesting patient records may receive batched PDFs containing intake forms, pathology reports, and medication lists. The {split} API separates these documents by type for routing to appropriate systems.

* **Accounting**: Accounting departments processing expense documentation receive PDFs with multiple invoices and receipts. The {split} API separates each document and can use identifiers like invoice numbers or dates to create individual splits for each transaction.

* **Academic Research**: Research institutions and libraries processing academic articles receive PDFs containing article bodies, references, and supplemental materials. The {split} API separates these sections for indexing, citation extraction, or archival purposes.

* **Product Documentation**: Organizations managing product catalogs receive PDFs containing specifications for multiple products. The {split} API separates each product's specifications, enabling automated data entry into product databases or comparison tools.

* **Technical Documentation**: Companies distributing multilingual instruction manuals receive PDFs with the same content repeated in different languages. The {split} API separates the manual by language, allowing each version to be routed to the appropriate regional system or translation workflow.

<Info>For credit consumption rates, see [Credit Consumption](./ade-credit-consumption#credit-costs-for-the-split-api).</Info>

## Process Overview

Follow these steps to split a document into classified sub-documents:

1. **Parse your document** using the [{parse} API](https://docs.landing.ai/api-reference/tools/ade-parse). The {split} API takes the Markdown output of the {parse} API as input, so save that Markdown. You pass it to the {split} API in step 3.

2. **Define your Split Rules** by creating a set of Split Types that describe the different document types or sections in your file. Learn more about [Split Rules](#split-rules). The easiest way to create and test Split Rules is in the [Playground](#split-in-the-playground).

3. **Run the {split} API** by passing the parsed Markdown content and your Split Rules. Choose your method:
   * [Split in the Playground](#split-in-the-playground): Test and refine your Split Rules interactively
   * [Split with the API](#split-with-the-api): Integrate splitting into your application
   * [Split with the Python & TypeScript Libraries](#use-split-with-our-libraries): Call the {split} API from our Python or TypeScript library

4. **Use the split results** in your downstream workflows. The API returns each classified sub-document with its full Markdown content. Learn more about the [response structure](./ade-split-response).
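
As a rough sketch, steps 1 and 3 can be chained from a shell script. This is not an official client. It assumes the {parse} endpoint is `https://api.va.landing.ai/v1/ade/parse`, that it takes the file in a form field named `document`, and that its JSON response has a top-level `markdown` field. Check these against the [{parse} API reference](https://docs.landing.ai/api-reference/tools/ade-parse) before relying on them. It also assumes `jq` is installed and that your key is in `LANDINGAI_API_KEY`, a hypothetical variable name. The function names are illustrative.

```shell
#!/usr/bin/env bash
# Sketch of the parse -> split pipeline; not an official client.
# ASSUMPTIONS (verify against the API reference before use):
#   * Parse endpoint is ${ADE_BASE_URL}/parse and takes the file in a "document" field
#   * The Parse response JSON has a top-level "markdown" field
#   * jq is installed; LANDINGAI_API_KEY is a hypothetical variable holding your key
set -o pipefail

ADE_BASE_URL='https://api.va.landing.ai/v1/ade'

# Step 1: parse a file and keep only its Markdown for the Split call.
parse_to_markdown() {
  local input_file="$1"
  local markdown_file="${2:-markdown.md}"
  curl -sS --fail -X POST "${ADE_BASE_URL}/parse" \
    -H "Authorization: Bearer ${LANDINGAI_API_KEY}" \
    -F "document=@${input_file}" \
    | jq -r '.markdown' > "${markdown_file}"
}

# Step 3: send the saved Markdown and a Split Rules file (a JSON array).
# "split_class=<file" makes curl send the file's contents as a text field.
split_markdown() {
  local markdown_file="$1"
  local rules_file="$2"
  curl -sS --fail -X POST "${ADE_BASE_URL}/split" \
    -H "Authorization: Bearer ${LANDINGAI_API_KEY}" \
    -F "markdown=@${markdown_file}" \
    -F "split_class=<${rules_file}" \
    -F 'model=split-latest'
}
```

For example, run `parse_to_markdown batch.pdf markdown.md`, then `split_markdown markdown.md split_rules.json > split_response.json`, where `split_rules.json` holds your Split Rules as a JSON array.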

## Split Rules

Split Rules define how the {split} API classifies and separates a document into sub-documents. The Split Rules are the complete set of Split Types you define for a single API call.

Each entry in the Split Rules consists of:

* [Split Type](#split-types)
* [Description](#descriptions-optional) (optional)
* [Identifier](#identifiers-optional) (optional)

<Info>Split Rules are defined differently depending on whether you use the Playground, API, or one of our libraries. See the interface-specific sections below for more information on how to create and pass the Split Rules for your method.</Info>

### Split Types

Split Types are the categories that the API classifies each part of your document into, such as pay stubs, bank statements, and W-2s.

You can define up to 19 Split Types in one API call. If the API cannot determine which Split Type a page belongs to, it classifies the page as **Uncategorized**.
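
Because one call accepts at most 19 Split Types, you may want to check a Split Rules file before sending it. The following is a hypothetical helper, not a LandingAI tool. It assumes `jq` is installed and that the file holds the JSON array you pass as the `split_class` parameter.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for a Split Rules file (not a LandingAI tool).
# Assumes jq is installed and the file holds the JSON array sent as split_class.
MAX_SPLIT_TYPES=19

check_split_rules() {
  local rules_file="$1"
  local count

  # The rules must be a JSON array whose entries are objects with a "name".
  if ! jq -e 'type == "array" and all(.[]; type == "object" and has("name"))' \
      "$rules_file" >/dev/null 2>&1; then
    echo "error: ${rules_file} is not an array of named Split Types" >&2
    return 2
  fi

  # Reject files that define more Split Types than one API call allows.
  count=$(jq 'length' "$rules_file")
  if (( count > MAX_SPLIT_TYPES )); then
    echo "error: ${count} Split Types defined; the limit is ${MAX_SPLIT_TYPES}" >&2
    return 1
  fi
  echo "ok: ${count} Split Types"
}
```

The helper returns a non-zero status on failure, so it can guard a call such as `check_split_rules split_rules.json && curl ...`.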

### Descriptions (Optional)

The Description provides additional context about what a Split Type represents. Detailed descriptions help the API identify what information to include in each split and improve classification accuracy.

Descriptions can also impact how the API interprets Identifiers. For example, these two descriptions for clinical notes produce different identifier behavior:

* **Less specific description**: "A clinical note documenting a patient's office visit, including history, exam, assessment, and plan, authored by a provider." The API might consider multiple dates as potential identifiers.
* **More specific description**: "A clinical note documenting a patient's office visit, including history, exam, assessment, and plan, authored by a provider. Each note is separated by office visit date. Do not look at any other dates. Only include date before the words 'office visit'." The API only considers dates directly before "office visit" as identifiers.
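
In an API call, the more specific description is simply the `description` value of the Split Type. The minimal sketch below writes such a rule to a file. The file name `clinical_note_rules.json` and the identifier label `Office Visit Date` are illustrative assumptions, since the example above does not name an identifier. The quoted heredoc delimiter keeps the apostrophe and the single quotes in the description from needing shell escapes.

```shell
# Write one Split Type whose description narrows which dates count as identifiers.
# The identifier label "Office Visit Date" is an illustrative assumption.
# The quoted 'EOF' delimiter stops the shell from expanding anything in the JSON.
cat > clinical_note_rules.json <<'EOF'
[
  {
    "name": "Clinical Note",
    "description": "A clinical note documenting a patient's office visit, including history, exam, assessment, and plan, authored by a provider. Each note is separated by office visit date. Do not look at any other dates. Only include date before the words 'office visit'.",
    "identifier": "Office Visit Date"
  }
]
EOF
```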

### Identifiers (Optional)

When your document contains multiple instances of the same Split Type, use an Identifier to specify what makes each instance unique, such as invoice number, order ID, or date. The API creates a separate split for each unique value of the Identifier.

For example, if your document contains 6 pay stubs and you specify "Pay Stub Date" as the identifier, the API creates 6 separate splits—one for each unique date value.

### Example

A document contains 1 bank statement and 6 pay stubs with different dates. You define two Split Types:

* **Bank Statement** (no identifier needed)
* **Pay Stub** with "Pay Stub Date" as the identifier

The API returns 7 splits: 1 bank statement and 6 pay stubs, one for each unique Pay Stub Date.
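
In the JSON array format used by the API's `split_class` parameter, the two Split Types in this example could look like the following. The descriptions are the ones used in the curl example under Split with the API. Writing them to a file, here the illustrative name `split_rules.json`, lets you reuse the same Split Rules across calls.

```shell
# Split Rules for the example: one Split Type without an identifier, one with.
# The file name split_rules.json is illustrative.
cat > split_rules.json <<'EOF'
[
  {
    "name": "Bank Statement",
    "description": "Document from a bank that summarizes all account activity over a period of time."
  },
  {
    "name": "Pay Stub",
    "description": "Document that details an employee's earnings, deductions, and net pay for a specific pay period.",
    "identifier": "Pay Stub Date"
  }
]
EOF
```

With the file in place, `-F 'split_class=<split_rules.json'` can replace the inline JSON in the curl call. curl then reads the field value from the file, which avoids the `'\''` escaping needed to embed the apostrophe in "employee's" inside a single-quoted string.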

## Split in the Playground

To make it as easy as possible to split documents, we've created a wizard in our [Playground](https://ade.landing.ai/my/playground/ade) that guides you through the process.

The Playground is designed as a proof of concept to help you understand what the API can do and how it might fit into your workflows. After you split a document in the Playground, copy the code so that you can run the same request through the API or our Python and TypeScript libraries at scale.

1. Go to the [Playground](https://ade.landing.ai/my/playground/ade).
2. Click **Split**.
3. Select the file you want to split.
4. {ade} loads and parses the file in the background.
5. You can now create the [Split Rules](#split-rules), which determine how the document is split into sub-documents. There are a few ways to do this:
   * **View Suggested Split Rules**: The app automatically recommends rules based on the parsed content. We recommend trying this approach first, and then editing the rules if needed.
   * **Write a Split Rules Prompt**: Write a prompt that tells the app what specific Split Types and Identifiers should be used. The app then generates rules based on this prompt.
   * **Start from Scratch**: Manually define the Split Rules.
6. After creating your first round of Split Rules, edit them if needed.
7. Click **Split Document** to see the results, which open in a new panel. You can toggle between a visual representation of the results and the actual API JSON response.
8. You can continue to edit the Split Rules if needed.
9. When the document splits as expected, copy the code so that you can scale the API call.

## Split with the API

Split a document by calling the [{split}](https://docs.landing.ai/api-reference/tools/ade-split) endpoint.

This example splits a document containing bank statements and pay stubs:

```shell theme={null}
curl -X POST 'https://api.va.landing.ai/v1/ade/split' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -F 'markdown=@markdown.md' \
  -F 'split_class=[{"name": "Bank Statement", "description": "Document from a bank that summarizes all account activity over a period of time."}, {"name": "Pay Stub", "description": "Document that details an employee'\''s earnings, deductions, and net pay for a specific pay period.", "identifier": "Pay Stub Date"}]' \
  -F 'model=split-latest'
```

### Parameters

For the full list of parameters, see the [API reference](https://docs.landing.ai/api-reference/tools/ade-split).

* **`markdown`** (required): The Markdown output from the [{parse} API](https://docs.landing.ai/api-reference/tools/ade-parse). You can pass the Markdown content directly or reference a file.
* **`split_class`** (required): A JSON array defining the [Split Rules](#split-rules). Each Split Type is a JSON object with:
  * [`name`](#split-types): The Split Type name (required)
  * [`description`](#descriptions-optional): Additional context about the Split Type (optional)
  * [`identifier`](#identifiers-optional): The field that makes each instance unique (optional)
* **`model`** (optional): The model version to use for splitting. If omitted, the API uses the latest model. For more information, see [Split Model Versions](./ade-split-models).
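
When the Markdown is already in memory rather than in a file, it can be sent as a text field. A minimal sketch, assuming your key is in the hypothetical `LANDINGAI_API_KEY` variable: curl's `--form-string` sends each value verbatim, whereas `-F` would read a value that starts with `@` or `<` as a file reference. The `model` field is left out, so the API uses the latest split model. `split_inline` is an illustrative name.

```shell
#!/usr/bin/env bash
# Illustrative helper: send Markdown held in a shell variable as a text field.
# --form-string passes values literally (no @file or <file handling), so Markdown
# that begins with "@" or "<" is not mistaken for a file reference.
# The model field is omitted, so the API uses the latest split model.
# LANDINGAI_API_KEY is a hypothetical variable holding your API key.
split_inline() {
  local markdown_content="$1"
  local split_class_json="$2"
  curl -sS --fail -X POST 'https://api.va.landing.ai/v1/ade/split' \
    -H "Authorization: Bearer ${LANDINGAI_API_KEY}" \
    --form-string "markdown=${markdown_content}" \
    --form-string "split_class=${split_class_json}"
}
```

For example: `split_inline "$(cat markdown.md)" '[{"name": "Bank Statement"}]'`.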

## Use Split with Our Libraries

Click one of the tiles below to learn how to use the [{split}](https://docs.landing.ai/api-reference/tools/ade-split) API with our libraries.

<CardGroup cols={2}>
  <Card title="Python Library" icon="python" href="./ade-python#split-getting-started">
    Use Split with our Python library.
  </Card>

  <Card
    title="TypeScript Library"
    icon={
  <>
    <img className="block dark:hidden" src="https://mintcdn.com/landingaitest/zBBru77Y9z-WA7GH/images/ts-logo-512-03221Dxcf.png?fit=max&auto=format&n=zBBru77Y9z-WA7GH&q=85&s=e41836c55303fe6a82d458a5bdaa2991" alt="TypeScript" style={{ width: "1.5rem", height: "1.5rem", margin: 0 }} />
    <img className="hidden dark:block" src="https://mintcdn.com/landingaitest/zBBru77Y9z-WA7GH/images/ts-logo-512-DBFF9B.png?fit=max&auto=format&n=zBBru77Y9z-WA7GH&q=85&s=27ddad7064b681cea3f5da0aea4c6be6" alt="TypeScript" style={{ width: "1.5rem", height: "1.5rem", margin: 0 }} />
  </>
}
    href="./ade-typescript#split-getting-started"
  >
    Use Split with our TypeScript library.
  </Card>
</CardGroup>

## Additional Considerations on Splitting

* Each page in a document can only be assigned to one Split Type. If a page has content that could belong to more than one Split Type, the API chooses the Split Type that the page matches most closely.

* The {split} API is different from the `split` parameter in the {parse} API. The {split} API separates a document into sub-documents after parsing, while the `split` parameter can be used during parsing to organize the parsed output by page.