chunk_type), which identifies what kind of content it represents.
The chunk types returned are:
A text chunk type is an element that consists entirely of characters (letters and numbers).
If text content has key-value pairs, like form fields, the extracted data is returned as key-value pairs separated by line breaks (\n).
Here is an example JSON output for a text chunk that has form fields:
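As a rough illustration of the key-value format described above, the sketch below splits a text chunk's content on line breaks and collects the pairs into a dictionary. The chunk shown is invented for illustration, and the use of a colon to separate each key from its value is an assumption; the field names chunk_type and text follow the ones mentioned in this section.

```python
def parse_key_value_text(text, separator=":"):
    """Split a text chunk's content into key-value pairs, one pair per line.

    Assumption: each pair uses `separator` (a colon here) between key and value.
    """
    pairs = {}
    for line in text.split("\n"):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, found, value = line.partition(separator)
        if not found:
            continue  # no separator on this line, so it is not a key-value pair
        pairs[key.strip()] = value.strip()
    return pairs


# Hypothetical chunk, invented for illustration only.
chunk = {
    "chunk_type": "text",
    "text": "Name: Jane Doe\nDate of Birth: 01/02/1990\nPolicy Number: A-12345",
}

fields = parse_key_value_text(chunk["text"])
print(fields["Policy Number"])
```

Lines without a separator are skipped rather than raising an error, so ordinary prose mixed in with form fields does not break the parse.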
A table chunk type is a grid of rows and columns containing data.
Gridlines don't need to be present; well-aligned sets of data are typically interpreted as part of a table. For example, part of a receipt can be extracted as a table if the purchased items align with the costs.
For table chunk types, the content of the text object is returned as HTML.
Here is an example JSON output for a table chunk:
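Because a table chunk is returned as HTML, downstream code usually needs to turn that markup back into rows and cells. The sketch below does this with Python's standard html.parser module. The chunk is invented for illustration (a receipt-style table), and it assumes the HTML is stored under a text key.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the cell text of every <tr> in an HTML table."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # discard any cell left open by malformed markup


def table_chunk_to_rows(html):
    """Return the table as a list of rows, each row a list of cell strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical chunk, invented for illustration only.
chunk = {
    "chunk_type": "table",
    "text": (
        "<table>"
        "<tr><th>Item</th><th>Cost</th></tr>"
        "<tr><td>Coffee</td><td>3.50</td></tr>"
        "<tr><td>Bagel</td><td>2.25</td></tr>"
        "</table>"
    ),
}

rows = table_chunk_to_rows(chunk["text"])
for row in rows:
    print(row)
```

This sketch ignores merged cells (rowspan and colspan) and nested tables; a table that uses them would need a fuller HTML library.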
A marginalia chunk type is a set of text in the top, bottom, or side margins of a document.
A figure chunk type is an element that contains visual or graphical non-text content.
marginalia: page_header, page_footer, page_number
text: title, form, key_value
The marginalia type doesn't exist and will fall back to page_header.