Vendor accuracy claims for OCR and document AI are notoriously hard to verify. Numbers like "99% accuracy" appear in every data sheet, but they rarely specify which documents were tested, at what quality level, or what "accuracy" actually means (character-level? field-level? document-level?). Between Q2 and Q4 2025, we ran a structured benchmark across 50,000 real enterprise documents — sourced with permission from Synairo clients — through five OCR and document AI systems. Here are the results.
Methodology
We measured field-level extraction accuracy: for each document type, we defined the key fields that must be extracted correctly, and scored each document as correct only if all key fields matched the ground-truth human annotation. Documents were split into three quality tiers: High (native digital PDF), Medium (scanned at 200+ DPI, clean background), and Low (scanned below 150 DPI, shadows, rotation, or handwritten annotations). The five systems tested were: Scanforce, Azure Document Intelligence (prebuilt invoice model), Google Document AI (Form Parser), AWS Textract (Analyze Document), and Tesseract 5.x with post-processing.
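The all-or-nothing scoring rule can be sketched in a few lines. This is a minimal illustration of the methodology, not the benchmark harness itself; the field names and values are hypothetical, and we assume a simple normalization (trim whitespace, lowercase) before comparison.

```python
def document_correct(extracted: dict, ground_truth: dict, key_fields: list) -> bool:
    """A document scores as correct only if EVERY key field matches the
    human-annotated ground truth (after light normalization)."""
    def normalize(value):
        return str(value).strip().lower()
    return all(
        field in extracted
        and normalize(extracted[field]) == normalize(ground_truth[field])
        for field in key_fields
    )

# Hypothetical invoice: one wrong field fails the whole document.
truth  = {"invoice_number": "INV-0042", "total_gross": "1230.00", "currency": "EUR"}
good   = {"invoice_number": "INV-0042", "total_gross": "1230.00", "currency": "eur"}
bad    = {"invoice_number": "INV-0042", "total_gross": "1230.00", "currency": "PLN"}
fields = ["invoice_number", "total_gross", "currency"]
```

Note that this rule is strictly harsher than character-level accuracy: a system can read 99% of the characters correctly and still fail the document if the 1% it misses lands in a key field.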
Results by document type (High quality, digital PDF)
- Supplier invoices: Scanforce 99.5%, Azure DI 98.8%, Google DAI 98.3%, AWS Textract 97.6%, Tesseract 91.2%
- Purchase orders: Scanforce 99.1%, Azure DI 98.5%, Google DAI 97.9%, AWS Textract 97.1%, Tesseract 88.7%
- Delivery notes / GRN: Scanforce 98.7%, Azure DI 97.4%, Google DAI 97.1%, AWS Textract 96.3%, Tesseract 85.4%
- Credit notes: Scanforce 99.3%, Azure DI 98.6%, Google DAI 98.0%, AWS Textract 97.2%, Tesseract 90.1%
- Bank statements: Scanforce 98.4%, Azure DI 97.8%, Google DAI 97.5%, AWS Textract 96.8%, Tesseract 83.2%
- Contracts (structured): Scanforce 94.1%, Azure DI 92.3%, Google DAI 91.8%, AWS Textract 90.4%, Tesseract 76.5%
- Contracts (unstructured): Scanforce 87.3%, Azure DI 84.1%, Google DAI 83.7%, AWS Textract 81.2%, Tesseract 62.4%
- Expense reports: Scanforce 97.8%, Azure DI 96.5%, Google DAI 96.1%, AWS Textract 95.3%, Tesseract 82.1%
- ID documents (passports, IDs): Scanforce 99.2%, Azure DI 99.0%, Google DAI 98.7%, AWS Textract 98.2%, Tesseract 94.5%
- Customs declarations: Scanforce 96.3%, Azure DI 94.8%, Google DAI 94.2%, AWS Textract 93.1%, Tesseract 79.8%
- Insurance certificates: Scanforce 95.7%, Azure DI 93.9%, Google DAI 93.4%, AWS Textract 92.0%, Tesseract 77.3%
- Payslips: Scanforce 98.1%, Azure DI 96.9%, Google DAI 96.5%, AWS Textract 95.4%, Tesseract 81.6%
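For a single headline number per system, the figures above can be averaged directly. This is an unweighted mean across the twelve document types — per-type document counts were not published, so equal weighting is an assumption:

```python
# High-quality tier results, copied from the list above:
# {system: [per-document-type field-level accuracy, %]}
results = {
    "Scanforce":    [99.5, 99.1, 98.7, 99.3, 98.4, 94.1, 87.3, 97.8, 99.2, 96.3, 95.7, 98.1],
    "Azure DI":     [98.8, 98.5, 97.4, 98.6, 97.8, 92.3, 84.1, 96.5, 99.0, 94.8, 93.9, 96.9],
    "Google DAI":   [98.3, 97.9, 97.1, 98.0, 97.5, 91.8, 83.7, 96.1, 98.7, 94.2, 93.4, 96.5],
    "AWS Textract": [97.6, 97.1, 96.3, 97.2, 96.8, 90.4, 81.2, 95.3, 98.2, 93.1, 92.0, 95.4],
    "Tesseract":    [91.2, 88.7, 85.4, 90.1, 83.2, 76.5, 62.4, 82.1, 94.5, 79.8, 77.3, 81.6],
}

# Unweighted mean per system (assumes equal document counts per type).
averages = {name: round(sum(v) / len(v), 2) for name, v in results.items()}
```

The spread between the commercial systems is modest on high-quality input; the unstructured-contract and bank-statement rows are what drag the averages apart.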
The quality cliff
The most striking finding was how sharply accuracy drops in the Low-quality tier. Scanforce on Low-quality scans averaged 93.1% across all document types — still the best performer, but a 6-point drop from its high-quality baseline. Azure Document Intelligence dropped 9 points, Google 10 points. Tesseract dropped 24 points, averaging 59.8% on low-quality scans — essentially unusable for automation. If your document intake includes scanned invoices from legacy suppliers, pre-processing quality is the most important variable to optimize before comparing extraction engines.
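One practical consequence is to triage documents by quality tier before they reach the extraction engine. The sketch below classifies incoming scans using the tier definitions from the Methodology section; the routing policy (send Low-tier scans for re-scan or human review) is our suggestion, not part of the benchmark, and the metadata keys are hypothetical:

```python
def quality_tier(meta: dict) -> str:
    """Classify a document into the benchmark's quality tiers
    (High / Medium / Low) from scan metadata."""
    if meta.get("native_digital_pdf"):
        return "High"
    dpi = meta.get("dpi", 72)  # scans without DPI metadata are treated as low-res
    degraded = meta.get("shadows") or meta.get("rotation") or meta.get("handwritten")
    if dpi >= 200 and not degraded:
        return "Medium"
    return "Low"

def route(meta: dict) -> str:
    """Low-tier scans averaged at best 93.1% in our benchmark, so we
    queue them for re-scan or review instead of automated extraction."""
    return "auto_extract" if quality_tier(meta) in ("High", "Medium") else "review_or_rescan"
```

Requesting a re-scan from the sender is often cheaper than reviewing the downstream extraction errors, especially for recurring suppliers.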
Unstructured contracts: the unsolved problem
Contracts remain the hardest document type across all systems. Unlike invoices, which follow predictable layouts, contracts vary enormously in structure and use legal language that requires semantic understanding, not just pattern matching. The best-performing system (Scanforce at 87.3%) was fine-tuned on a large corpus of Polish and EU contract templates; the gap between fine-tuned and off-the-shelf models was 12+ points. If contract extraction is a core use case for your organization, budget for a fine-tuning project on your specific contract library — generic models will disappoint.
Cost comparison
At 50,000 documents per month, the cost per document ranged from €0.006 (Tesseract, self-hosted) to €0.031 (Azure Document Intelligence at standard pricing). Scanforce sits in the mid-range at €0.018 per document for the full extraction pipeline, including validation and confidence scoring. For most organizations, the accuracy premium of a specialized system pays for itself in reduced manual review costs within the first 2–3 months of production use.
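The break-even logic can be made concrete with a simple cost model: total monthly cost is extraction fees plus manual review of the documents that fail field-level extraction. The €0.50 per-review figure below is an assumed illustration, not a benchmark result; the accuracy inputs are approximate high-quality-tier averages from the results above.

```python
def monthly_cost(docs: int, per_doc_fee: float, accuracy: float,
                 review_cost_per_doc: float) -> float:
    """Extraction fees plus manual review of failed documents.
    review_cost_per_doc is an assumed figure, not from the benchmark."""
    failed = docs * (1 - accuracy)
    return docs * per_doc_fee + failed * review_cost_per_doc

docs = 50_000
review = 0.50  # assumed € per manually reviewed document (illustrative)

tesseract = monthly_cost(docs, 0.006, 0.827, review)  # ~82.7% high-quality average
scanforce = monthly_cost(docs, 0.018, 0.970, review)  # ~97.0% high-quality average
```

Under these assumptions the cheaper engine is the more expensive pipeline: the €600/month saved on extraction fees is dwarfed by the cost of reviewing thousands of additional failed documents. The conclusion is sensitive to your actual review cost, so substitute your own figure.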
Raw OCR character accuracy (what vendors usually quote) is not the right metric for business decisions. Always benchmark on field-level extraction accuracy for your specific document types at your actual scan quality.