This feature is in Preview. It is still in development and may not return accurate results. Do not use this feature in production environments.
Example Use Cases
- Financial Services: Financial institutions processing Know Your Customer (KYC) documentation often receive PDFs containing multiple document types for each customer, such as bank statements, utility bills, and identification documents. After parsing the batch, the API separates and classifies each document type.
- Healthcare: Healthcare systems ingesting patient records may receive batched PDFs containing intake forms, pathology reports, and medication lists. The API separates these documents by type for routing to appropriate systems.
- Accounting: Accounting departments processing expense documentation receive PDFs with multiple invoices and receipts. The API separates each document and can use identifiers like invoice numbers or dates to create individual splits for each transaction.
- Academic Research: Research institutions and libraries processing academic articles receive PDFs containing article bodies, references, and supplemental materials. The API separates these sections for indexing, citation extraction, or archival purposes.
- Product Documentation: Organizations managing product catalogs receive PDFs containing specifications for multiple products. The API separates each product’s specifications, enabling automated data entry into product databases or comparison tools.
- Technical Documentation: Companies distributing multilingual instruction manuals receive PDFs with the same content repeated in different languages. The API separates the manual by language, allowing each version to be routed to the appropriate regional system or translation workflow.
For information about pricing and credits, go to Pricing & Billing.
Availability
The API is available:
- in the Playground
- by calling the endpoint directly:
- when using the Python and TypeScript libraries
Process Overview
Follow these steps to split a document into classified sub-documents:
- Parse your document using the API. The API requires Markdown content from the parse API as input. Save the Markdown output for the next step.
- Define your Split Rules by creating a set of Split Types that describe the different document types or sections in your file. Learn more about Split Rules. The easiest way to create and test Split Rules is in the Playground.
- Run the API by passing the parsed Markdown content and your Split Rules. Choose your method:
  - Split in the Playground: Test and refine your Split Rules interactively
  - Split with the API: Integrate splitting into your application
  - Split with the Python & TypeScript Libraries: Use our libraries
- Use the split results in your downstream workflows. The API returns each classified sub-document with its full Markdown content. Learn more about the response structure.
Split Rules
Split Rules define how the API classifies and separates a document into sub-documents. The Split Rules are a collection of all Split Types you define for a single API call. Each Split Rule consists of a:
- Split Type
- Description (optional)
- Identifier (optional)
Split Rules are defined differently depending on whether you use the Playground, API, or one of our libraries. See the interface-specific sections below for more information on how to create and pass the Split Rules for your method.
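As a rough illustration, a Split Rule can be modeled as a small record with a required Split Type name and an optional description and identifier. The sketch below is not taken from the libraries: the SplitRule class and its to_dict method are hypothetical names, and the keys name, description, and identifier mirror the ones the API's split_class parameter uses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SplitRule:
    """Hypothetical container for one Split Rule (not a library class)."""

    name: str                          # Split Type name (required)
    description: Optional[str] = None  # extra context for classification (optional)
    identifier: Optional[str] = None   # field that makes each instance unique (optional)

    def __post_init__(self) -> None:
        # Reject rules without a usable Split Type name.
        if not self.name.strip():
            raise ValueError("A Split Rule needs a non-empty Split Type name")

    def to_dict(self) -> dict:
        """Return a plain dict containing only the fields that were set."""
        data = {"name": self.name}
        if self.description is not None:
            data["description"] = self.description
        if self.identifier is not None:
            data["identifier"] = self.identifier
        return data


rules = [
    SplitRule(name="Bank Statement"),
    SplitRule(name="Pay Stub", identifier="Pay Stub Date"),
]
rule_dicts = [rule.to_dict() for rule in rules]
```

Keeping optional fields out of the dict when they are unset makes it easy to see which rules rely on an identifier.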
Split Types
Split Types define how your document is classified into sections, such as pay stubs, bank statements, and W-2s. You can define up to 19 Split Types in one API call. If the API cannot determine which Split Type a page belongs to, it classifies the page as Uncategorized.
Descriptions (Optional)
The Description provides additional context about what a Split Type represents. Detailed descriptions help the API identify what information to include in each split and improve classification accuracy. Descriptions can also impact how the API interprets Identifiers. For example, these two descriptions for clinical notes produce different identifier behavior:
- Less specific description: “A clinical note documenting a patient’s office visit, including history, exam, assessment, and plan, authored by a provider.” The API might consider multiple dates as potential identifiers.
- More specific description: “A clinical note documenting a patient’s office visit, including history, exam, assessment, and plan, authored by a provider. Each note is separated by office visit date. Do not look at any other dates. Only include date before the words ‘office visit’.” The API only considers dates directly before “office visit” as identifiers.
Identifiers (Optional)
When your document contains multiple instances of the same Split Type, use an Identifier to specify what makes each instance unique, such as invoice number, order ID, or date. The API creates a separate split for each unique value of the Identifier. For example, if your document contains 6 pay stubs and you specify “Pay Stub Date” as the identifier, the API creates 6 separate splits, one for each unique date value.
Example
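A document contains 1 bank statement and 6 pay stubs with different dates. You define two Split Types:
- Bank Statement (no identifier needed)
- Pay Stub with “Pay Stub Date” as the identifier
With these rules, the API is expected to return 7 splits: 1 for the bank statement and 6 for the pay stubs, one per unique date.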
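The effect of an Identifier can be pictured with a toy grouping model. This only illustrates the behavior described in this section and says nothing about how the API works internally. The page tuples, the dates, and the group_pages function are all invented for the example.

```python
from typing import Dict, List, Optional, Tuple

# Each tuple is (page number, Split Type, identifier value or None).
# Page numbers and dates are made up to mirror the example above.
pages: List[Tuple[int, str, Optional[str]]] = [
    (1, "Bank Statement", None),
    (2, "Bank Statement", None),
    (3, "Pay Stub", "2024-01-15"),
    (4, "Pay Stub", "2024-01-31"),
    (5, "Pay Stub", "2024-02-15"),
    (6, "Pay Stub", "2024-02-29"),
    (7, "Pay Stub", "2024-03-15"),
    (8, "Pay Stub", "2024-03-29"),
]


def group_pages(
    tagged_pages: List[Tuple[int, str, Optional[str]]],
) -> Dict[Tuple[str, Optional[str]], List[int]]:
    """Group page numbers into splits keyed by (Split Type, identifier value)."""
    splits: Dict[Tuple[str, Optional[str]], List[int]] = {}
    for page_number, split_type, identifier_value in tagged_pages:
        splits.setdefault((split_type, identifier_value), []).append(page_number)
    return splits


splits = group_pages(pages)
for (split_type, identifier_value), page_numbers in splits.items():
    print(split_type, identifier_value, page_numbers)
```

In this model the bank statement pages fall into a single split because no identifier distinguishes them, while each distinct pay stub date gets its own split.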
Split in the Playground
To make it as easy as possible to split documents, we’ve created a wizard in our Playground that guides you through the process. The Playground is designed as a proof-of-concept to help you understand what the API can do and how it might fit into your workflows. After you’ve split a document in the Playground, copy the code to use it in API calls or our Python and TypeScript libraries so that you can scale.
- Go to the Playground.
- Click Split.
- Select the file you want to split.
- The app loads and parses the file in the background.
- You can now create the Split Rules, which determine how the document is split into sub-documents. There are a few ways to do this:
  - View Suggested Split Rules: The app automatically recommends rules based on the parsed content. We recommend trying this approach first, and then editing the rules if needed.
  - Write a Split Rules Prompt: Write a prompt that tells the app what specific Split Types and Identifiers should be used. The app then generates rules based on this prompt.
  - Start from Scratch: Manually define the Split Rules.
- After creating your first round of Split Rules, edit them if needed.
- Click Split Document to see the results, which open in a new panel. You can toggle between a visual representation of the results and the actual API JSON response.
- You can continue to edit the Split Rules if needed.
- Once you’re happy with the results, copy the code so that you can scale the API call.
Split with the API
Split a document by calling the endpoint. This example splits a document containing bank statements and pay stubs:
Parameters
Get the full parameters from the API reference.
- markdown (required): The Markdown output from the API. You can pass the Markdown content directly or reference a file.
- split_class (required): A JSON array defining the Split Rules. Each Split Type is a JSON object with:
  - name: The Split Type name (required)
  - description: Additional context about the Split Type (optional)
  - identifier: The field that makes each instance unique (optional)
- model (optional): The model version to use for splitting. If omitted, the API uses the latest model. For more information, see Split Model Versions.
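One way to assemble these parameters in Python is sketched below. It only builds the request fields and does not contact any service, because the endpoint URL and authentication details come from the API reference. The build_split_payload helper and the MAX_SPLIT_TYPES constant are hypothetical names, the limit of 19 comes from the Split Types section, the Markdown string is a placeholder, and sending split_class as a JSON-encoded string is an assumption.

```python
import json
from typing import Dict, List, Optional

MAX_SPLIT_TYPES = 19  # limit stated in the Split Types section


def build_split_payload(
    markdown: str,
    split_class: List[Dict[str, str]],
    model: Optional[str] = None,
) -> Dict[str, str]:
    """Build request fields for a split call (hypothetical helper)."""
    if not markdown:
        raise ValueError("markdown is required")
    if not split_class:
        raise ValueError("split_class needs at least one Split Type")
    if len(split_class) > MAX_SPLIT_TYPES:
        raise ValueError(f"At most {MAX_SPLIT_TYPES} Split Types are allowed per call")
    for rule in split_class:
        if not rule.get("name"):
            raise ValueError("Each Split Type needs a name")

    payload = {
        "markdown": markdown,
        # Assumption: the array is sent as a JSON-encoded string.
        "split_class": json.dumps(split_class),
    }
    if model is not None:
        payload["model"] = model  # left out so the API uses the latest model
    return payload


payload = build_split_payload(
    markdown="# Bank Statement\n\n(placeholder for parsed Markdown)",
    split_class=[
        {"name": "Bank Statement"},
        {"name": "Pay Stub", "identifier": "Pay Stub Date"},
    ],
)
```

Validating the rules before sending the request surfaces mistakes such as a missing name or too many Split Types without spending an API call.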
Split with the Python & TypeScript Libraries
Click one of the tiles below to learn how to split documents with our libraries.
Python Library
Split documents with our Python library.
TypeScript Library
Split documents with our TypeScript library.
Additional Considerations on Splitting
- Each page in a document can only be assigned to one Split Type. If one page has content that could belong to more than one Split Type, the API chooses the Split Type that the page matches most closely.
- This API is different from the split parameter in the parse API. This API separates a document into sub-documents after parsing, while the split parameter can be used during parsing to organize the parsed output by page.

