Extract Data Based on Document Type (Classification)
As part of the extraction process, you can classify documents and extract data based on each document's type. For example, you can extract different sets of data from Invoices, Receipts, and Waybills.
To classify documents, use the enum keyword to define the document types in your script. Here is an example:
# Define document types
class_schema = {
    "type": "object",
    "properties": {
        "document_type": {
            "type": "string",
            "enum": ["Document Type 1", "Document Type 2", "Document Type 3"],
        }
    },
    "required": ["document_type"],
}
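To make the role of the enum concrete, here is a minimal sketch in plain Python of how a classification result could be checked against the schema. It is an illustration only, not the library's own validation: the helper name `is_valid_classification` and the sample result dictionaries are assumptions introduced for this example.

```python
# Schema from the example above: document_type must be one of the enum values.
class_schema = {
    "type": "object",
    "properties": {
        "document_type": {
            "type": "string",
            "enum": ["Document Type 1", "Document Type 2", "Document Type 3"],
        }
    },
    "required": ["document_type"],
}


def is_valid_classification(result: dict, schema: dict) -> bool:
    """Hypothetical helper: check required fields and the document_type enum.

    This covers only the parts of the schema used in this example; it is not
    a general JSON Schema validator.
    """
    # Every field listed under "required" must be present in the result.
    for field in schema.get("required", []):
        if field not in result:
            return False
    # The classified type must be one of the values allowed by the enum.
    allowed = schema["properties"]["document_type"]["enum"]
    return result["document_type"] in allowed


# Sample results (made up for illustration).
print(is_valid_classification({"document_type": "Document Type 2"}, class_schema))
print(is_valid_classification({"document_type": "Memo"}, class_schema))
```

The first call prints `True` because the value is listed in the enum; the second prints `False` because "Memo" is not one of the declared document types. In practice, the enum serves the same purpose: it restricts the classification to the types you have defined.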
Let’s say you need to parse three types of documents: Passports, Invoices, and Other. The data you need to extract depends on the document type.
The following script shows how to define the document types and the fields to extract for each type, and how to parse the documents, classify them, and extract the data.