Challenge 1: Document Processing and Vectorized Search

Expected Duration: 60 minutes

Introduction

Welcome to Challenge 1! In this challenge, you'll build a comprehensive document processing and search system using Azure AI services. This foundational challenge demonstrates how to process insurance documents (both text and images), create vectorized search capabilities, and prepare the knowledge base that will power all subsequent AI agent implementations.

What are we building?

In this challenge, we will create a complete document processing and vectorized search system that forms the backbone of our insurance AI agent ecosystem:

Document Upload System: Azure Blob Storage integration for secure document management
Multimodal Processing Pipeline: GPT-4-1-mini powered text and image processing capabilities
OCR Extraction System: Advanced text extraction from insurance claim images
Vectorized Search Index: Azure AI Search with integrated vectorization for semantic search
Hybrid Search Capabilities: Combined keyword, vector, and semantic search functionality

This system will serve as the knowledge foundation for all agents in subsequent challenges, enabling them to access and query insurance policies, claims, and statements intelligently.

Data Structure Overview

Your challenge includes comprehensive insurance data across three categories:

Data Category	Files	Purpose
Claims Data (`data/images/`)	`crash1.jpg`, `crash2.jpg`, `crash3.jpg`, `crash4.jpg`, `crash5.jpg`	Vehicle accident documentation for OCR processing
Policy Data (`data/policies/`)	`commercial_auto_policy.md`, `comprehensive_auto_policy.md`, `high_value_vehicle_policy.md`, `liability_only_policy.md`, `motorcycle_policy.md`	Insurance policy documents for text processing and policy validation
Claim Statements (`data/statements/`)	`crash1_front.jpeg`, `crash1_back.jpeg`, `crash2_front.jpeg`, `crash2_back.jpeg`, `crash3_front.jpeg`, `crash3_back.jpeg`, `crash4_front.jpeg`, `crash4_back.jpeg`, `crash5_front.jpeg`, `crash5_back.jpeg`	Written statements (front and back) corresponding to each claim for comprehensive analysis

Document Processing after the Generative AI Wave

Generative AI has transformed document processing from rigid template-based systems to intelligent, understanding-based approaches. Modern models process text, images, and complex layouts simultaneously, eliminating separate pipelines for different content types.

Key Advantages:

Context-aware extraction without predefined templates
Unified processing of multimodal content (text, images, tables)
Natural language queries instead of fixed extraction fields
Minimal training requirements using pre-trained foundation models

For Insurance: Complex documents—policies, claim photos, handwritten statements, invoices—can now be processed in a single pipeline, understanding both textual terms and visual damage assessments together.

When building document processing systems today, selecting the right AI models is crucial for achieving accurate and efficient results. Here are the primary options available for processing insurance documents:

Mistral Document AI

Mistral's specialized document AI models provide efficient document understanding capabilities:

Optimized for document structure recognition
Strong performance on text extraction and classification
Cost-effective alternative for text-heavy documents
Support for multiple document formats and layouts
Fast inference times for batch processing scenarios

Why Choose Mistral:

Up to 3x faster inference times compared to general-purpose LLMs for document tasks
40-60% cost savings on high-volume document processing workloads
Native multilingual support with 100+ languages out-of-the-box
99%+ accuracy on structured document extraction tasks
Scalable throughput handling 1000+ documents per minute

Mistral Document AI is particularly well-suited for scenarios where you need to process large volumes of textual documents efficiently while maintaining high accuracy and controlling costs.

Azure Document Intelligence

Azure Document Intelligence (formerly Form Recognizer) is an AI service that provides advanced document processing capabilities using multimodal foundation models. Unlike traditional OCR tools, Document Intelligence uses machine learning to understand documents in a more comprehensive way:

Prebuilt Models: Ready-to-use models for invoices, receipts, ID cards, insurance documents, and other common forms
Custom Models: Train models on your specific document types for tailored extraction accuracy
Layout Analysis: Understand document structure including tables, paragraphs, selection marks, and document hierarchy
Key-Value Extraction: Automatically identify and extract field-label pairs from forms
Multi-Page Support: Process complex multi-page documents with sophisticated layout understanding
High Accuracy OCR: Extract printed and handwritten text with high precision across multiple languages
Azure Integration: Native integration with Azure AI services, security, compliance, and Azure AI Search

Azure Document Intelligence is particularly powerful for insurance scenarios where structured forms (claims, policies) and unstructured documents (handwritten statements, accident reports) need to be processed together. Its prebuilt insurance document models can extract policy numbers, claim amounts, dates, and other critical fields automatically.

Multimodal Models

GPT-4.1-mini (GPT-4-1-mini) A powerful multimodal model that can process both text and images with high accuracy. This model excels at:

Understanding complex document layouts and formatting
Extracting structured information from unstructured documents
Processing insurance claim images and photos
Performing optical character recognition (OCR) on documents
Analyzing visual content alongside textual information

GPT-4.1-mini offers an excellent balance between cost, speed, and performance for document processing tasks, making it ideal for processing both policy documents and visual claim evidence.

Task 1 - Statement Processing with Multiple AI Approaches

The statements_processing folder contains advanced examples showcasing different AI approaches for processing insurance claim statements. This section demonstrates how to choose and implement the right model for your specific use case. It will also generate the markdown files needed for vectorization in the next part of the challenge:

GPT Statement Processing (gpt_statement_processing.py)

Uses GPT-4-1-mini for intelligent statement analysis
Excels at understanding context and extracting nuanced information
Ideal for complex, unstructured claim narratives
Provides high-quality extraction with natural language understanding

Mistral Document Intelligence (mistral_doc_intelligence.py)

Leverages Mistral's specialized document AI models
Optimized for structured document processing at scale
Cost-effective for high-volume statement processing
Fast inference times for batch operations

Azure Document Intelligence Integration

Demonstrates prebuilt models for form and document extraction
Shows custom model training for insurance-specific documents
Provides layout analysis and key-value pair extraction
Ideal for standardized forms and structured statements

This comparison helps you understand when to use each approach based on document type, volume, complexity, and cost considerations. Review the implementations to see practical examples of model selection and integration strategies.

Task 2 - Image and Claims Processing

Note: You must run mistral_doc_intelligence.py before starting this part or copy the contents of the examples/mistral folder to ./output/mistral.

Time to extract information from claim images! Please navigate to scripts/imageprocessing.ipynb for a detailed implementation of:

Processing insurance claim photos and accident documentation
Extracting text from images using GPT-4-1-mini vision capabilities
Performing OCR on handwritten statements and invoices
Structuring extracted data for vectorization
Integrating visual claim evidence into Azure AI Search

This notebook showcases multimodal AI processing techniques for analyzing damage photos and extracting critical claim information from visual content.

Task 3 - Policy Document Processing

Time to process your insurance policy documents! Please navigate to scripts/policiesprocessing.ipynb for a comprehensive walkthrough on:

Setting up Azure Blob Storage for document management
Processing text-based policy documents using GPT-4-1-mini
Extracting structured information from policy markdown files
Creating vectorized embeddings for semantic search
Uploading processed documents to Azure AI Search

This notebook demonstrates how to transform unstructured policy text into a searchable knowledge base that agents can query intelligently.

Great! If you are finished and ready for extra challenges, there's much more to explore!

Task 4 (optional) - Explore the Full Potential of Mistral Models

Mistral AI offers powerful document understanding capabilities through Data Annotations - a feature that enables structured data extraction with precise location information (bounding boxes) for each extracted field.

What are Data Annotations?

Data Annotations in Mistral Document AI allow you to:

Feature	Description
Structured Extraction	Define JSON schemas to extract specific fields from documents
Bounding Boxes	Get precise coordinates showing where each field was found on the page
Visual Verification	Highlight extracted regions for human review and validation
Confidence Scores	Receive confidence levels for each extracted field
Multi-page Support	Process complex documents with page-level annotations

Why Use Data Annotations for Insurance Claims?

In insurance claim processing, knowing where information came from is just as important as what was extracted:

Audit Trail: Bounding boxes provide visual proof of where data originated
Fraud Detection: Verify that signatures, dates, and amounts match their expected locations
Human-in-the-Loop: Enable reviewers to quickly verify AI extractions
Regulatory Compliance: Maintain traceable data lineage for compliance requirements

Implementation Example

Navigate to statements_processing/mistral_doc_intel_annotations.py for a comprehensive implementation that demonstrates:

# Define a schema for structured extraction
CLAIM_STATEMENT_SCHEMA = {
    "type": "object",
    "properties": {
        "claimant_name": {"type": "string", "description": "Full name of claimant"},
        "policy_number": {"type": "string", "description": "Insurance policy number"},
        "incident_description": {"type": "string", "description": "What happened"},
        "damage_description": {"type": "string", "description": "Damage details"},
        "signature_present": {"type": "boolean", "description": "Is signature present"}
    }
}

# Extract with bounding box annotations
result = extract_with_annotations(
    file_path="claim_form.pdf",
    json_schema=CLAIM_STATEMENT_SCHEMA,
    include_bboxes=True
)

# Each field includes its location on the document
for annotation in result['annotations']:
    print(f"Field: {annotation.field_name}")
    print(f"Value: {annotation.value}")
    print(f"Location: Page {annotation.bbox.page}, ({annotation.bbox.x_min}, {annotation.bbox.y_min})")

Key Functions in the Implementation

Function	Purpose
`extract_with_annotations()`	Core extraction with JSON schema and bounding boxes
`extract_claim_statement()`	Pre-configured for insurance claim forms
`extract_damage_assessment()`	Specialized for vehicle damage analysis
`batch_extract_with_annotations()`	Process multiple documents concurrently
`visualize_annotations()`	Generate visual overlay of extracted fields
`export_annotations_to_json()`	Save results with full annotation data

Running the Demo

cd challenge-1/statements_processing
python mistral_doc_intel_annotations.py ../../challenge-0/data/statements/crash1_front.jpeg

This will process the claim statement, extract structured data with annotations, and export the results to a JSON file showing the exact location of each extracted field.

Learn More

🎯 Conclusion

Congratulations! You've successfully built a comprehensive document processing and vectorized search system that serves as the foundation for intelligent AI agents.

Key Achievements:

Processed insurance policy documents and created searchable embeddings with Azure AI Search
Extracted information from claim images using GPT-4-1-mini's multimodal capabilities
Implemented hybrid search combining keyword, vector, and semantic ranking
Explored multiple AI approaches (GPT, Mistral, Azure Document Intelligence) for different use cases
Established a knowledge base that AI agents can query using natural language

What You Built: Your system now intelligently processes text policies, claim photos, and statements—transforming unstructured insurance documents into a queryable knowledge corpus. This infrastructure enables the AI agents you'll build in Challenge 2 to access policy terms, analyze claim evidence, and make informed decisions.

Next Challenge: In Challenge 2, you'll build an AI agent that leverages this document processing foundation to autonomously orchestrate claims assessment workflows. Ready to continue? Head to Challenge 2!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Challenge 1: Document Processing and Vectorized Search

Introduction

What are we building?

Data Structure Overview