
# Chunk Types

export const dpt2 = 'DPT-2';

export const dpt1 = 'DPT-1';

export const dpt = 'Document Pre-Trained Transformer';

export const companyName = 'LandingAI';

export const extract = 'ADE Extract';

export const parse = 'ADE Parse';

export const ade = 'Agentic Document Extraction';

## Chunk Definition

A **chunk** is a discrete element extracted from a document, such as a block of text, a table, or a figure.

## Chunk Overview

When you send a document to the {ade} API, it analyzes the content on each page, breaks it down into meaningful elements, and returns each one as a chunk.

Each chunk includes structured data that describes its content and its location in the document. This structure makes it easier to understand the extracted data and use it for downstream tasks.

Extracted chunks are included in the API response.

## Semantic Chunking

The {ade} API uses semantic chunking, which means it groups content based on meaning rather than just layout or formatting.

Instead of splitting documents at arbitrary points like fixed lengths or paragraph breaks, the API identifies coherent units of information (like complete ideas, logical sections, or related data) and extracts them as individual chunks.

Semantic chunking improves the relevance and usability of the extracted content, especially in downstream tasks like search, retrieval, and analysis.

## Why Do We Create Chunks?

Chunking makes downstream tasks faster, more accurate, and easier to scale. It serves several key purposes:

* **Enables downstream apps to process large documents efficiently**: Chunking allows applications like RAG systems and LLMs to index and retrieve smaller, meaningful segments instead of full documents. This helps avoid input size constraints, such as token limits.
* **Improves retrieval granularity**: Smaller, semantically meaningful units allow for more accurate and relevant results in downstream tasks like question answering and summarization.
* **Supports downstream semantic search and embeddings**: Well-structured chunks provide better inputs for embedding and make it easier to index and retrieve information during search.
* **Maintains human readability**: Chunking reflects how a human would naturally read the document, maintaining the visual and logical relationships between elements on the page.

## Chunk Types

Each chunk is labeled with a chunk type (`chunk_type` or `type`, depending on the API used), which identifies what kind of content it represents.

The chunk types returned by {ade} are:

* [`text`](#text)
* [`table`](#table)
* [`marginalia`](#marginalia)
* [`figure`](#figure)
* [`logo`](#logo): This is only available when using [{dpt2}](./ade-parse-models#dpt-2)
* [`card`](#card): This is only available when using [{dpt2}](./ade-parse-models#dpt-2)
* [`attestation`](#attestation): This is only available when using [{dpt2}](./ade-parse-models#dpt-2)
* [`scan_code`](#scan_code): This is only available when using [{dpt2}](./ade-parse-models#dpt-2)
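
Because the label appears under `chunk_type` or `type` depending on the API used, client code may want to read either key. The Python sketch below groups chunks by type. It assumes each chunk is a plain dictionary taken from the parsed JSON response; the names `get_chunk_type`, `group_by_type`, and `DPT2_ONLY_TYPES`, the `markdown` field, and the sample chunks are illustrative and are not part of the {ade} API.

```python
from collections import defaultdict

# Chunk types that this page lists as available only with DPT-2.
DPT2_ONLY_TYPES = {"logo", "card", "attestation", "scan_code"}


def get_chunk_type(chunk):
    """Return the chunk's type label, whichever key the API used."""
    # Assumption: a chunk is a dict with either "chunk_type" or "type".
    label = chunk.get("chunk_type", chunk.get("type"))
    if label is None:
        raise KeyError("chunk has neither 'chunk_type' nor 'type'")
    return label


def group_by_type(chunks):
    """Group chunks into a dict keyed by chunk type, keeping their order."""
    groups = defaultdict(list)
    for chunk in chunks:
        groups[get_chunk_type(chunk)].append(chunk)
    return dict(groups)


# Hypothetical sample data; the "markdown" field is an assumption and
# real responses carry more fields.
sample_chunks = [
    {"chunk_type": "text", "markdown": "Quarterly summary"},
    {"type": "table", "markdown": "| Item | Cost |"},
    {"type": "logo", "markdown": "Company logo"},
]

grouped = group_by_type(sample_chunks)
print(sorted(grouped))

# Types in the response that this page marks as DPT-2 only.
print(sorted(set(grouped) & DPT2_ONLY_TYPES))
```

Reading both keys in one helper keeps the rest of the pipeline independent of which API produced the response.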

## Text

A `text` chunk type is an element that consists entirely of characters (letters and numbers), such as:

* paragraphs
* titles and headings
* lists
* form fields
* checkboxes
* radio buttons
* equations
* code blocks
* handwritten text

### Output for Key-Value Pairs

If the `text` content contains key-value pairs, such as form fields, the extracted data is returned as key-value pairs separated by line breaks (`\n`).
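
As a rough illustration, the Python sketch below splits such output into a dictionary. This page only states that pairs are separated by line breaks; the colon between key and value, the function name `parse_key_value_text`, and the sample form text are assumptions, so adjust them to match what your documents return.

```python
def parse_key_value_text(text, delimiter=":"):
    """Split newline-separated key-value text into a dict.

    Assumption: each pair sits on its own line and the key is separated
    from the value by `delimiter`. Lines without the delimiter are skipped.
    """
    pairs = {}
    for line in text.split("\n"):
        if delimiter not in line:
            continue
        # Split on the first delimiter only, so values may contain it.
        key, value = line.split(delimiter, 1)
        pairs[key.strip()] = value.strip()
    return pairs


# Hypothetical text from a form-field chunk.
sample = "Name: Jane Doe\nDate of Birth: 01/02/1990\nMember ID: A-1234"
fields = parse_key_value_text(sample)
print(fields["Name"])
```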

### Example: Paragraph

Here is an example of the API marking a paragraph as a `text` chunk:

<img src="https://mintcdn.com/landingaitest/F5faBIUtwTly1YVw/images/ade-text-3.png?fit=max&auto=format&n=F5faBIUtwTly1YVw&q=85&s=ac9c51165a273c7567907024f97718f5" alt="Chunk Type: Text" width="976" height="328" data-path="images/ade-text-3.png" />

Here is the rendered Markdown for that chunk:

<img src="https://mintcdn.com/landingaitest/F5faBIUtwTly1YVw/images/ade-text-3-md-light.png?fit=max&auto=format&n=F5faBIUtwTly1YVw&q=85&s=735f59c6953e3eb4781d77652dc9c7df" alt="Chunk Type: Text" width="631" height="108" data-path="images/ade-text-3-md-light.png" />

### Example: Lists

Here is an example of the API marking a list as a `text` chunk:

<img src="https://mintcdn.com/landingaitest/F5faBIUtwTly1YVw/images/ade-text-2.png?fit=max&auto=format&n=F5faBIUtwTly1YVw&q=85&s=453ac4619a73bc0f544d6076bc32cd1c" alt="Chunk Type: Text" width="560" height="179" data-path="images/ade-text-2.png" />

Here is the rendered Markdown for that chunk:

<img src="https://mintcdn.com/landingaitest/F5faBIUtwTly1YVw/images/ade-text-2-md.png?fit=max&auto=format&n=F5faBIUtwTly1YVw&q=85&s=e60fb1780a31662cc4bb4b1b960b4907" alt="Chunk Type: Text" width="616" height="303" data-path="images/ade-text-2-md.png" />

## Table

A `table` chunk type is a grid of rows and columns containing data.

{ade} doesn't require gridlines to be present, and typically interprets well-aligned sets of data as part of a table. For example, part of a receipt can be extracted as a table if the purchased items align with the costs.

When you parse spreadsheets, sets of data are also interpreted as `table` chunks.

### Example: Receipt

Here is an example of the API marking receipt line items as a `table` chunk:

<img src="https://mintcdn.com/landingaitest/F5faBIUtwTly1YVw/images/ade-table-receipt.png?fit=max&auto=format&n=F5faBIUtwTly1YVw&q=85&s=c19bb0754127297ef93840bee318ece3" alt="Chunk Type: Table" width="446" height="163" data-path="images/ade-table-receipt.png" />

Here is the rendered Markdown for that chunk:

<img src="https://mintcdn.com/landingaitest/F5faBIUtwTly1YVw/images/ade-table-receipt-md.png?fit=max&auto=format&n=F5faBIUtwTly1YVw&q=85&s=14df25726bc1bca676f471d18dca005e" alt="Chunk Type: Table" width="746" height="145" data-path="images/ade-table-receipt-md.png" />

### Example: Earnings Statement

Here is an example of the API marking part of an earnings statement as a `table` chunk:

<img src="https://mintcdn.com/landingaitest/F5faBIUtwTly1YVw/images/ade-table-2.png?fit=max&auto=format&n=F5faBIUtwTly1YVw&q=85&s=fb7e3e7ffdb385749f8a14ccfeaf487c" alt="Chunk Type: Table" width="441" height="520" data-path="images/ade-table-2.png" />

Here is the rendered Markdown for that chunk:

<img src="https://mintcdn.com/landingaitest/F5faBIUtwTly1YVw/images/ade-table-2-md.png?fit=max&auto=format&n=F5faBIUtwTly1YVw&q=85&s=7074928479d8fac0cc8959e7693b0bc8" alt="Chunk Type: Table" width="760" height="753" data-path="images/ade-table-2-md.png" />

### Example: Spreadsheet

Here is an example of the API marking data in a spreadsheet as a `table` chunk:

<img src="https://mintcdn.com/landingaitest/GBZRBncMPEmJrjaW/images/ade-table-spreadsheet.png?fit=max&auto=format&n=GBZRBncMPEmJrjaW&q=85&s=013c317e607a477e68cfc4527a47c07f" alt="Chunk Type: Table" width="400" data-path="images/ade-table-spreadsheet.png" />

Here is the rendered Markdown for that chunk:

<img src="https://mintcdn.com/landingaitest/GBZRBncMPEmJrjaW/images/ade-table-spreadsheet-md.png?fit=max&auto=format&n=GBZRBncMPEmJrjaW&q=85&s=56ff001f266f844e0ac6180e198b03d2" alt="Chunk Type: Table" width="500" data-path="images/ade-table-spreadsheet-md.png" />

## Marginalia

A `marginalia` chunk type is a set of text in the top, bottom, or side margins of a document, including:

* page headers
* page footers
* page numbers
* handwritten notes in margins
* line numbers on one side of a page
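
Page headers and footers often repeat from page to page, so some downstream pipelines set marginalia aside before indexing body content. The Python sketch below shows one way to do that split. It assumes chunks are dictionaries labeled under `chunk_type` or `type`; the function name `split_marginalia`, the `markdown` field, and the sample data are hypothetical.

```python
def split_marginalia(chunks):
    """Return (body_chunks, marginalia_chunks), keeping document order."""
    body, marginalia = [], []
    for chunk in chunks:
        # Assumption: the label is stored under "chunk_type" or "type".
        label = chunk.get("chunk_type", chunk.get("type"))
        if label == "marginalia":
            marginalia.append(chunk)
        else:
            body.append(chunk)
    return body, marginalia


# Hypothetical sample data; the "markdown" field is an assumption.
sample_chunks = [
    {"type": "marginalia", "markdown": "Annual Report 2023"},
    {"type": "text", "markdown": "Revenue grew during the year."},
    {"type": "marginalia", "markdown": "Page 4"},
]

body, margins = split_marginalia(sample_chunks)
print(len(body), len(margins))
```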

### Example: Header and Page Number

Here is an example of the API marking a header and page number as a `marginalia` chunk:

<img src="https://mintcdn.com/landingaitest/F5faBIUtwTly1YVw/images/ade-marginalia-1.png?fit=max&auto=format&n=F5faBIUtwTly1YVw&q=85&s=f62bf1e5abcc61157a263cf9465bb800" alt="Chunk Type: Marginalia" width="638" height="234" data-path="images/ade-marginalia-1.png" />

Here is the rendered Markdown for that chunk:

<img src="https://mintcdn.com/landingaitest/F5faBIUtwTly1YVw/images/ade-marginalia-1-md.png?fit=max&auto=format&n=F5faBIUtwTly1YVw&q=85&s=4e21d7704f127b158abecee17ab3e9b4" alt="Chunk Type: Marginalia" width="552" height="132" data-path="images/ade-marginalia-1-md.png" />

## Figure

A `figure` chunk type is an element that contains visual or graphical non-text content, including:

* pictures
* graphs (bar graphs, line graphs, etc.)
* flowcharts
* diagrams

### Example: Medical Imaging

Here is an example of the API marking a pathology image as a `figure` chunk:

<img src="https://mintcdn.com/landingaitest/F5faBIUtwTly1YVw/images/ade-figure-3.png?fit=max&auto=format&n=F5faBIUtwTly1YVw&q=85&s=fb10cd7894e422628ddd8f10cdc3a98e" alt="Chunk Type: Figure" width="707" height="531" data-path="images/ade-figure-3.png" />

Here is the rendered Markdown for that chunk:

<img src="https://mintcdn.com/landingaitest/F5faBIUtwTly1YVw/images/ade-figure-3-md.png?fit=max&auto=format&n=F5faBIUtwTly1YVw&q=85&s=784ffd8250580e802ae9c361b4602511" alt="Chunk Type: Figure" width="750" height="436" data-path="images/ade-figure-3-md.png" />

### Example: Bar Chart

Here is an example of the API marking a bar chart as a `figure` chunk:

<img src="https://mintcdn.com/landingaitest/F5faBIUtwTly1YVw/images/ade-figure-4.png?fit=max&auto=format&n=F5faBIUtwTly1YVw&q=85&s=16473ae4d3c020b30dedb3d395bb940e" alt="Chunk Type: Figure" width="624" height="429" data-path="images/ade-figure-4.png" />

Here is the rendered Markdown for that chunk:

<img src="https://mintcdn.com/landingaitest/F5faBIUtwTly1YVw/images/ade-figure-4-md.png?fit=max&auto=format&n=F5faBIUtwTly1YVw&q=85&s=14489121fc0cc9feb2236939e2091488" alt="Chunk Type: Figure" width="970" height="1123" data-path="images/ade-figure-4-md.png" />

## Logo

A `logo` chunk type identifies logos.

<Info>The `logo` chunk type is only available when using [{dpt2}](./ade-parse-models#dpt-2).</Info>

### Example: Logo in Header

Here is an example of the API marking a logo in a document header as a `logo` chunk:

<img src="https://mintcdn.com/landingaitest/_KsvkXhGOfpH-Yar/images/ade-logo-header.png?fit=max&auto=format&n=_KsvkXhGOfpH-Yar&q=85&s=24be3e55229ebe1b578a43b1ed6b2d10" alt="Chunk Type: Logo" width="1678" height="598" data-path="images/ade-logo-header.png" />

Here is the rendered Markdown for that chunk:

<img src="https://mintcdn.com/landingaitest/_KsvkXhGOfpH-Yar/images/ade-logo-header-md.png?fit=max&auto=format&n=_KsvkXhGOfpH-Yar&q=85&s=d90ea324a174388a74ec38d4ea8ae8ac" alt="Chunk Type: Logo" width="1276" height="276" data-path="images/ade-logo-header-md.png" />

## Card

A `card` chunk type identifies:

* ID cards
* driver's licenses

<Info>The `card` chunk type is only available when using [{dpt2}](./ade-parse-models#dpt-2).</Info>

### Example: Driver's License

Here is an example of the API marking a driver's license as a `card` chunk:

<img src="https://mintcdn.com/landingaitest/_KsvkXhGOfpH-Yar/images/ade-card-license.png?fit=max&auto=format&n=_KsvkXhGOfpH-Yar&q=85&s=8021381f130c3c50f28e8f9384686562" alt="Chunk Type: Card" width="551" height="358" data-path="images/ade-card-license.png" />

Here is the rendered Markdown for that chunk:

<img src="https://mintcdn.com/landingaitest/_KsvkXhGOfpH-Yar/images/ade-card-license-md.png?fit=max&auto=format&n=_KsvkXhGOfpH-Yar&q=85&s=e49551ddb00722107854b731e7340a28" alt="Chunk Type: Card" width="818" height="348" data-path="images/ade-card-license-md.png" />

## Attestation

An `attestation` chunk type includes:

* signatures
* stamps
* seals

<Info>The `attestation` chunk type is only available when using [{dpt2}](./ade-parse-models#dpt-2).</Info>

### Example: Signature

Here is an example of the API marking a signature as an `attestation` chunk:

<img src="https://mintcdn.com/landingaitest/_KsvkXhGOfpH-Yar/images/ade-attestation-signature.png?fit=max&auto=format&n=_KsvkXhGOfpH-Yar&q=85&s=3159ea2878768b39ba28a2eada27df8b" alt="Chunk Type: Attestation" width="412" height="292" data-path="images/ade-attestation-signature.png" />

Here is the rendered Markdown for that chunk:

<img src="https://mintcdn.com/landingaitest/_KsvkXhGOfpH-Yar/images/ade-attestation-signature-md.png?fit=max&auto=format&n=_KsvkXhGOfpH-Yar&q=85&s=341cb99428491b5d07819581f261e6a3" alt="Chunk Type: Attestation" width="724" height="216" data-path="images/ade-attestation-signature-md.png" />

## Scan\_code

A `scan_code` chunk type identifies:

* QR codes
* barcodes

<Info>The `scan_code` chunk type is only available when using [{dpt2}](./ade-parse-models#dpt-2).</Info>

### Example: Barcodes

Here is an example of the API marking two barcodes as `scan_code` chunks:

<img src="https://mintcdn.com/landingaitest/_KsvkXhGOfpH-Yar/images/ade-scan-code-barcode.png?fit=max&auto=format&n=_KsvkXhGOfpH-Yar&q=85&s=35a10a0b7daa0c971f4e2cccb67556ba" alt="Chunk Type: Scan Code" width="523" height="350" data-path="images/ade-scan-code-barcode.png" />

Here is the rendered Markdown for these chunks:

<img src="https://mintcdn.com/landingaitest/_KsvkXhGOfpH-Yar/images/ade-scan-code-barcode-md.png?fit=max&auto=format&n=_KsvkXhGOfpH-Yar&q=85&s=39d7a498fb2e57967e7c0519edf6868b" alt="Chunk Type: Scan Code" width="735" height="416" data-path="images/ade-scan-code-barcode-md.png" />
