2025 · Python · OCR · ML · Computer Vision
Entity Extraction Model
Document Intelligence at Hackathon Speed
Private repo
Accuracy: 20% → 70%
Context: Amazon ML Challenge
Stack: Python · OCR · CV
Problem
Manually extracting structured data from document images is slow, error-prone, and fundamentally does not scale.
Approach
Built an ML pipeline combining OCR with rule-based extraction techniques. Iteratively refined accuracy through preprocessing improvements and feature engineering during the Amazon ML Challenge 2024.
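As a rough illustration of the rule-based extraction step, here is a minimal sketch that pulls value and unit pairs out of text an OCR engine has already produced. The repo is private, so this is not the challenge code: the function name `extract_entities`, the `UNIT_ALIASES` table, and the assumption that the entities are "number + unit" pairs are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical alias table (an assumption, not taken from the project):
# maps unit spellings that OCR output might contain to one canonical name.
UNIT_ALIASES = {
    "g": "gram", "gram": "gram", "grams": "gram",
    "kg": "kilogram", "kilogram": "kilogram", "kilograms": "kilogram",
    "ml": "millilitre",
    "cm": "centimetre",
    "v": "volt", "volt": "volt", "volts": "volt",
}

# A number (with "." or "," as the decimal separator), optional whitespace,
# then a run of letters that may or may not be a known unit.
_VALUE_UNIT = re.compile(r"(\d+(?:[.,]\d+)?)\s*([a-zA-Z]+)")


def extract_entities(ocr_text):
    """Return (value, canonical_unit) pairs found in OCR text.

    Matches whose unit is not in UNIT_ALIASES are dropped, which is the
    'rule' part: only known units count as entities.
    """
    results = []
    for match in _VALUE_UNIT.finditer(ocr_text):
        raw_value, raw_unit = match.groups()
        unit = UNIT_ALIASES.get(raw_unit.lower())
        if unit is None:
            continue  # letters after the number are not a recognised unit
        value = float(raw_value.replace(",", "."))
        results.append((value, unit))
    return results


if __name__ == "__main__":
    sample = "Net Wt. 500 g  Input: 12V  Pack of 3"
    print(extract_entities(sample))
```

In a pipeline like the one described, the input string would come from the OCR stage, and refinements such as better preprocessing or a wider alias table would change what this step gets to see.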
Outcome
Accuracy improved from 20% to 70% (a 3.5× gain) through systematic experimentation within tight hackathon constraints.
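The 3.5× figure is a relative gain. The same change can also be read as an absolute gain of 50 percentage points. A small sketch of both calculations, using hypothetical helper names and whole-number percentages to keep the arithmetic exact:

```python
def relative_gain(before_pct, after_pct):
    """Ratio of the new accuracy to the old one (e.g. 3.5 means 3.5x)."""
    return after_pct / before_pct


def point_gain(before_pct, after_pct):
    """Absolute change in percentage points."""
    return after_pct - before_pct


if __name__ == "__main__":
    print(relative_gain(20, 70))  # ratio of 70% to 20%
    print(point_gain(20, 70))     # difference in percentage points
```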