The field extraction results include a confidence score for each extracted field. This score indicates how certain the API is about the accuracy of the extracted data. Having a confidence score allows you to create logic that routes low-confidence fields to human reviewers before the data is sent to downstream systems. For example, you can write a script that sends an extracted field to a human reviewer if its confidence score is lower than a set threshold. The higher the confidence score, the more confident the API is that the prediction is accurate.
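The routing logic described above can be sketched as a single threshold check. This is a minimal illustration, not part of the library: the function name `needs_human_review`, the `0.8` threshold, and the sample scores are hypothetical. Fields with no score (`None`) are sent to review here because a missing score gives no assurance either way.

```python
from typing import Optional

# Hypothetical threshold; tune it against your own documents.
REVIEW_THRESHOLD = 0.8


def needs_human_review(confidence: Optional[float], threshold: float = REVIEW_THRESHOLD) -> bool:
    """Return True if a field should go to a human reviewer before downstream use."""
    # A null score carries no signal, so treat it conservatively.
    if confidence is None:
        return True
    return confidence < threshold


# Hypothetical per-field scores, keyed by field name.
scores = {"accountHolder": 0.97, "accountNumber": 0.62}
for field, score in scores.items():
    destination = "human review" if needs_human_review(score) else "downstream system"
    print(f"{field}: {destination}")
```

With these sample values, `accountHolder` goes to the downstream system and `accountNumber` goes to human review.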
The confidence score feature is experimental and still in development, and may not return accurate results.
The API returns a confidence property for each extracted field within the data.extraction_metadata object. The sample extraction_metadata output below shows how the confidence property displays after extraction.
Copy the Python script below and save it to a local directory:
Sample Python Script for Field Extraction
```python
from __future__ import annotations

from pydantic import BaseModel, Field

from agentic_doc.parse import parse


class SampleExtractionSchema(BaseModel):
    accountHolder: str = Field(
        ...,
        description='The full name of the person who holds the bank account.',
        title='Account Holder Name',
    )
    accountNumber: str = Field(
        ...,
        description='The bank account number associated with the account holder.',
        title='Bank Account Number',
    )


# Parse a file and extract the fields
results = parse("estatement.pdf", extraction_model=SampleExtractionSchema)
fields = results[0].extraction

# Print the values of the extracted fields
print("Extracted Fields:")
print(fields)

# Print the extracted field metadata, which includes the confidence scores
print("\nExtraction Metadata:")
print(results[0].extraction_metadata)
```
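If you call the API directly and receive JSON, you can pull the scores out of `data.extraction_metadata` instead of printing the whole object. The sketch below assumes each field in `extraction_metadata` is an object with a `confidence` key (other keys may also be present); that per-field shape, the helper name `confidence_by_field`, and the sample response are hypothetical, so check them against a real response before relying on this.

```python
import json
from typing import Any, Dict, Optional

# Hypothetical response body. The data.extraction_metadata nesting follows the text
# above; the per-field {"confidence": ...} shape is an assumption.
raw_response = """
{
  "data": {
    "extraction_metadata": {
      "accountHolder": {"confidence": 0.97},
      "accountNumber": {"confidence": 0.62}
    }
  }
}
"""


def confidence_by_field(response: Dict[str, Any]) -> Dict[str, Optional[float]]:
    """Map each extracted field name to its confidence score (None when null or absent)."""
    metadata = response["data"]["extraction_metadata"]
    return {name: entry.get("confidence") for name, entry in metadata.items()}


scores = confidence_by_field(json.loads(raw_response))
print(scores)  # {'accountHolder': 0.97, 'accountNumber': 0.62}
```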
Because the confidence score feature is experimental and still in development, there are certain situations where scores are not available. The confidence score value will be null in the following situations:
Tables: Data extracted from tables will have a null confidence score.
Changes to formatting: Fields with custom formatting applied during extraction will have a null confidence score. For example, reformatting a date from “DD-MM-YYYY” to “MM-DD-YYYY” results in a null score.
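Because a null score means "not scored" rather than "low confidence", it can help to keep those fields in a separate bucket so reviewers know why a field was flagged. The sketch below is one hypothetical way to do that: the function name `triage`, the `0.8` threshold, the bucket names, and the `statementDate` field are illustrative assumptions, not part of the API.

```python
from typing import Dict, List, Optional


def triage(scores: Dict[str, Optional[float]], threshold: float = 0.8) -> Dict[str, List[str]]:
    """Sort field names into accepted, low-confidence, and unscored buckets."""
    buckets: Dict[str, List[str]] = {"accepted": [], "low_confidence": [], "unscored": []}
    for name, score in scores.items():
        if score is None:
            # Null score, e.g. a table value or a reformatted field: no signal either way.
            buckets["unscored"].append(name)
        elif score < threshold:
            buckets["low_confidence"].append(name)
        else:
            buckets["accepted"].append(name)
    return buckets


# Hypothetical scores: statementDate was reformatted during extraction, so its score is null.
example = {"accountHolder": 0.97, "accountNumber": 0.62, "statementDate": None}
print(triage(example))
# {'accepted': ['accountHolder'], 'low_confidence': ['accountNumber'], 'unscored': ['statementDate']}
```

Whether unscored fields go to human review or are accepted as-is is a policy choice; sending them to review is the more conservative option.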