When you split a document with the API, the classified splits and metadata are returned in a structured JSON format.

Response Structure

The response contains the following top-level fields:
  • splits: Array of split objects containing the classified sub-documents.
  • metadata: Processing information, including the filename, organization ID, page count, duration, credit usage, job ID, and model version.
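Because every response has the same two top-level fields, a client can check for both before doing anything else. The following is a minimal sketch that assumes you already have the raw JSON response body as a string. The function name `load_split_response` is hypothetical and not part of any official SDK.

```python
import json
from typing import Any


def load_split_response(raw: str) -> tuple[list[dict[str, Any]], dict[str, Any]]:
    """Parse a split response body and return (splits, metadata).

    Hypothetical helper: assumes `raw` is the JSON text returned by the API.
    """
    body = json.loads(raw)
    # Both top-level fields are documented; fail loudly if either is missing.
    missing = [key for key in ("splits", "metadata") if key not in body]
    if missing:
        raise ValueError(f"response is missing top-level field(s): {missing}")
    return body["splits"], body["metadata"]


raw = '{"splits": [], "metadata": {"page_count": 0}}'
splits, metadata = load_split_response(raw)
print(len(splits), metadata["page_count"])  # prints "0 0"
```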

Splits Array (splits)

The splits field contains an array of classified sub-documents. Each split object includes:
  • classification: The Split Type name assigned to this sub-document (e.g., “Bank Statement”, “Pay Stub”).
  • identifier: The unique identifier value for this split (e.g., “2024-01-15”, “Invoice #12345”). This field is null if no identifier was specified for this Split Type.
  • pages: Array of zero-indexed page numbers that belong to this split.
  • markdowns: Array of Markdown content strings, one for each page in this split. The order matches the pages array.
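If you prefer typed objects over raw dictionaries, you can map each split object onto a small class. The following is a sketch only: the `Split` class and its `from_dict` method are hypothetical client-side names (Python 3.9+). The length check enforces the documented rule that `markdowns` has one entry per page.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Split:
    """One classified sub-document from the `splits` array (hypothetical client type)."""

    classification: str
    identifier: Optional[str]  # None when the Split Type has no identifier (JSON null)
    pages: list[int]           # zero-indexed page numbers
    markdowns: list[str]       # one Markdown string per entry in `pages`

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "Split":
        split = cls(
            classification=data["classification"],
            identifier=data.get("identifier"),
            pages=list(data["pages"]),
            markdowns=list(data["markdowns"]),
        )
        # markdowns must line up one-to-one with pages.
        if len(split.pages) != len(split.markdowns):
            raise ValueError("pages and markdowns must have the same length")
        return split


s = Split.from_dict({"classification": "Pay Stub", "identifier": "2025-01-15",
                     "pages": [1], "markdowns": ["## Pay Stub"]})
print(s.classification, s.identifier, s.pages)  # prints "Pay Stub 2025-01-15 [1]"
```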

Classification and Identifiers

The classification field corresponds to the Split Type names you defined in your Split Rules. If the API cannot classify a page, it assigns the classification “Uncategorized”. When you specify an identifier in your Split Rules (such as “Date” or “Invoice Number”), the API creates separate splits for each unique identifier value it finds. The identifier field contains the extracted value (e.g., “2024-01-15” or “INV-001”).
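Because one Split Type can produce several splits (one per identifier value), and because unclassified pages appear under "Uncategorized", it is often useful to group splits by classification. The following is a minimal sketch. The helper names and the sample data are hypothetical; only the field names and the "Uncategorized" label come from this page.

```python
from collections import defaultdict
from typing import Any


def group_by_classification(splits: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Map each classification to the identifiers of its splits, in response order."""
    groups: dict[str, list[Any]] = defaultdict(list)
    for split in splits:
        # identifier is None when no identifier was defined for the Split Type
        groups[split["classification"]].append(split.get("identifier"))
    return dict(groups)


def uncategorized_pages(splits: list[dict[str, Any]]) -> list[int]:
    """Return, sorted, the pages the API could not assign to any Split Type."""
    return sorted(
        page
        for split in splits
        if split["classification"] == "Uncategorized"
        for page in split["pages"]
    )


# Hypothetical sample data, not taken from a real response.
splits = [
    {"classification": "Pay Stub", "identifier": "2025-01-15", "pages": [1]},
    {"classification": "Pay Stub", "identifier": "2025-01-30", "pages": [2]},
    {"classification": "Uncategorized", "identifier": None, "pages": [3]},
]
print(group_by_classification(splits))
# {'Pay Stub': ['2025-01-15', '2025-01-30'], 'Uncategorized': [None]}
print(uncategorized_pages(splits))  # [3]
```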

Pages and Markdown Content

The pages array lists which pages belong to each split. Pages are zero-indexed, so the first page of the document is page 0. The markdowns array contains the Markdown content for each page: each element corresponds to the page at the same position in the pages array. For example, if pages is [3, 4], then markdowns[0] contains the Markdown for page 3 and markdowns[1] contains the Markdown for page 4.
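Since markdowns[i] always belongs to pages[i], you can pair the two arrays to look up any page's Markdown by its document page number. The following is a minimal sketch. The helper name `markdown_by_page` is hypothetical, and it assumes each page appears in only one split.

```python
from typing import Any


def markdown_by_page(splits: list[dict[str, Any]]) -> dict[int, str]:
    """Build a {page number: Markdown} map across all splits.

    Assumes each page appears in only one split.
    """
    result: dict[int, str] = {}
    for split in splits:
        pages, markdowns = split["pages"], split["markdowns"]
        if len(pages) != len(markdowns):
            raise ValueError("pages and markdowns must have the same length")
        # markdowns[i] holds the content of page pages[i]
        for page, markdown in zip(pages, markdowns):
            result[page] = markdown
    return result


lookup = markdown_by_page([{"pages": [3, 4], "markdowns": ["page three", "page four"]}])
print(lookup[4])  # page four
```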

Processing Metadata (metadata)

The metadata field provides information about the split process:
  • filename: The name of the input Markdown file.
  • org_id: Organization identifier.
  • page_count: Total number of pages in the document.
  • duration_ms: Processing time in milliseconds.
  • credit_usage: Number of credits consumed.
  • job_id: Unique job identifier.
  • version: Model version used for splitting. For more information, go to Split Model Versions.
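For logging or cost tracking, you might load the metadata into a typed object. The following sketch uses a hypothetical `SplitMetadata` class. It reads only the documented fields, so any extra keys are ignored, and it converts `duration_ms` to seconds. The sample values are copied from the example response on this page.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class SplitMetadata:
    """Typed view of the `metadata` object (hypothetical client type)."""

    filename: str
    org_id: str
    page_count: int
    duration_ms: int
    credit_usage: float
    job_id: str
    version: str

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "SplitMetadata":
        # Pick out only the documented fields so extra keys do not break parsing.
        names = ("filename", "org_id", "page_count", "duration_ms",
                 "credit_usage", "job_id", "version")
        return cls(**{name: data[name] for name in names})

    @property
    def duration_s(self) -> float:
        return self.duration_ms / 1000


meta = SplitMetadata.from_dict({
    "filename": "mixed-documents.md", "org_id": "org_abc123", "page_count": 3,
    "duration_ms": 2145, "credit_usage": 3.0, "job_id": "split_xyz789",
    "version": "split-20251105",
})
print(f"{meta.job_id}: {meta.page_count} pages in {meta.duration_s:.1f} s, "
      f"{meta.credit_usage} credits ({meta.version})")
# split_xyz789: 3 pages in 2.1 s, 3.0 credits (split-20251105)
```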

Example Response

Here is a complete example showing a split response for a document containing bank statements and pay stubs:
{
  "splits": [
    {
      "classification": "Bank Statement",
      "identifier": null,
      "pages": [0],
      "markdowns": [
        "<a id='72ba3cca-01e5-407b-9fc4-81f54f9f0c51'></a>\n\n## Bank Statement\n\nAccount Number: 1234567890\n\nStatement Period: January 1 - January 31, 2025\n\n| Date | Description | Amount |\n|------|-------------|--------|\n| 01/05 | Deposit | $2,500.00 |\n| 01/12 | Withdrawal | -$500.00 |\n\nEnding Balance: $2,000.00"
      ]
    },
    {
      "classification": "Pay Stub",
      "identifier": "2025-01-15",
      "pages": [1],
      "markdowns": [
        "<a id='a3f5d8c9-2b4e-4a1c-8f7e-9d6c5b4a3e2f'></a>\n\n## Pay Stub\n\nEmployee: John Smith\n\nPay Date: January 15, 2025\n\nGross Pay: $6,000.00\n\nNet Pay: $4,500.00"
      ]
    },
    {
      "classification": "Pay Stub",
      "identifier": "2025-01-30",
      "pages": [2],
      "markdowns": [
        "<a id='5b8865b9-1a81-46df-bcf7-0bdbed9130dc'></a>\n\n## Pay Stub\n\nEmployee: John Smith\n\nPay Date: January 30, 2025\n\nGross Pay: $6,000.00\n\nNet Pay: $4,500.00"
      ]
    }
  ],
  "metadata": {
    "filename": "mixed-documents.md",
    "org_id": "org_abc123",
    "page_count": 3,
    "duration_ms": 2145,
    "credit_usage": 3.0,
    "job_id": "split_xyz789",
    "version": "split-20251105"
  }
}
In this example:
  • The document was split into 3 sub-documents: 1 bank statement and 2 pay stubs.
  • The bank statement's identifier is null because no identifier was specified for the Bank Statement Split Type.
  • Each pay stub is identified by its pay date (“2025-01-15” and “2025-01-30”), creating separate splits even though they have the same classification.
  • Each split contains the page numbers and Markdown content for that sub-document.
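You can confirm these observations programmatically. The following sketch parses a trimmed copy of the example response, with shortened Markdown strings and only `page_count` kept in `metadata`. It counts splits per classification and checks that, in this example, every page belongs to exactly one split. It uses only the Python standard library.

```python
import json
from collections import Counter

# Trimmed copy of the example response above (Markdown shortened for brevity).
raw = """
{
  "splits": [
    {"classification": "Bank Statement", "identifier": null, "pages": [0],
     "markdowns": ["## Bank Statement"]},
    {"classification": "Pay Stub", "identifier": "2025-01-15", "pages": [1],
     "markdowns": ["## Pay Stub"]},
    {"classification": "Pay Stub", "identifier": "2025-01-30", "pages": [2],
     "markdowns": ["## Pay Stub"]}
  ],
  "metadata": {"page_count": 3}
}
"""

response = json.loads(raw)
splits = response["splits"]

counts = Counter(split["classification"] for split in splits)
print(counts)  # Counter({'Pay Stub': 2, 'Bank Statement': 1})

for split in splits:
    label = split["identifier"] or "(no identifier)"
    print(split["classification"], label, split["pages"])
# Bank Statement (no identifier) [0]
# Pay Stub 2025-01-15 [1]
# Pay Stub 2025-01-30 [2]

# In this example, every page is covered by exactly one split.
covered = sorted(page for split in splits for page in split["pages"])
assert covered == list(range(response["metadata"]["page_count"]))
```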