What is OCR?
Optical Character Recognition (OCR) is a technology that converts the text in images, PDFs and scanned documents into machine-readable text and, in modern systems, structured data.
How OCR works
At a high level, OCR pipelines follow three steps:
1. Image preprocessing
The document is deskewed, denoised and binarised. Contrast is normalised so the next step can reliably distinguish characters from the background.
2. Character detection
The engine segments the page into regions, lines and individual character candidates. Modern systems use neural networks to locate text in complex layouts, including rotated and handwritten content.
3. Text extraction
Detected character shapes are classified into Unicode characters, then grouped back into words, lines and paragraphs. The output is plain text and, in modern stacks, structured fields (dates, totals, line items).
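The first two steps can be illustrated with a toy sketch. It assumes the page is a small grid of grayscale values (0 = black, 255 = white), binarises it with Otsu's method (one common choice for a global threshold) and then finds text lines by looking for rows that contain ink. The function names and the sample grid are made up for illustration. Real engines work on full images and use far more robust methods, and the classification in step 3 is not shown.

```python
from typing import List, Tuple

Gray = List[List[int]]  # rows of grayscale intensities, 0..255


def otsu_threshold(image: Gray) -> int:
    """Pick the threshold that maximises between-class variance (Otsu's method)."""
    hist = [0] * 256
    for row in image:
        for px in row:
            hist[px] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0  # pixel count and intensity sum for values <= t
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue  # no pixels at or below t yet
        w_fg = total - w_bg
        if w_fg == 0:
            break  # every pixel is already at or below t
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarise(image: Gray, threshold: int) -> List[List[int]]:
    """Mark dark pixels (<= threshold) as ink (1) and the rest as background (0)."""
    return [[1 if px <= threshold else 0 for px in row] for row in image]


def segment_lines(binary: List[List[int]]) -> List[Tuple[int, int]]:
    """Return (start_row, end_row_exclusive) for each run of rows containing ink."""
    lines, start = [], None
    for y, row in enumerate(binary):
        has_ink = any(row)
        if has_ink and start is None:
            start = y
        elif not has_ink and start is not None:
            lines.append((start, y))
            start = None
    if start is not None:
        lines.append((start, len(binary)))
    return lines


# Hypothetical 6x6 "page": dark strokes (20, 40) on light paper (200, 230).
page = [
    [230, 230, 230, 230, 230, 230],
    [230, 20, 40, 20, 230, 200],
    [200, 40, 20, 40, 230, 230],
    [230, 230, 200, 230, 230, 230],
    [230, 20, 20, 200, 40, 230],
    [230, 230, 230, 230, 230, 230],
]
threshold = otsu_threshold(page)
text_lines = segment_lines(binarise(page, threshold))
print(threshold, text_lines)
```

On this grid the threshold falls between the stroke and paper intensities, and the row scan finds two text lines separated by a blank row.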
Traditional OCR vs AI/LLM OCR
Traditional OCR engines (Tesseract, ABBYY) focus on character recognition and hand the output to downstream rule-based parsers. AI/LLM OCR combines recognition with language understanding: the same model reads the document and returns structured fields directly, without per-vendor templates.
| Aspect | Traditional OCR | AI / LLM OCR |
|---|---|---|
| Setup | Templates per vendor | Zero-shot, no templates |
| Output | Raw text + bounding boxes | Structured fields (JSON) |
| Handles new layouts | Poorly without retraining | Natively |
| Reasoning | None | Contextual (infers missing fields) |
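To make the "rule-based parser" side of the comparison concrete, here is a minimal sketch of the kind of downstream rule a traditional pipeline might apply to raw OCR text. The regular expressions, the function name `parse_invoice_text` and both sample texts are assumptions made up for illustration, not taken from any particular product. The second sample shows why such rules tend to break when a new layout appears.

```python
import re
from typing import Dict, Optional

# Hypothetical rules: an ISO-style date anywhere, and a line starting with "Total".
DATE_RE = re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")
TOTAL_RE = re.compile(r"(?im)^\s*total\b[^\d\n]*(\d+(?:[.,]\d{2})?)\s*$")


def parse_invoice_text(raw_text: str) -> Dict[str, Optional[str]]:
    """Extract a date and a total from raw OCR text using fixed patterns."""
    date = DATE_RE.search(raw_text)
    total = TOTAL_RE.search(raw_text)
    return {
        "date": date.group(1) if date else None,
        # Normalise a decimal comma to a decimal point.
        "total": total.group(1).replace(",", ".") if total else None,
    }


# Layout the rules were written for.
known_layout = "ACME Ltd\nInvoice date: 2024-03-15\nSubtotal 100.00\nTax 20.00\nTotal: 120.00\n"
# A different vendor's layout that the same rules do not cover.
new_layout = "ACME Ltd\nAmount due 120,00 EUR on 15/03/2024\n"

print(parse_invoice_text(known_layout))
print(parse_invoice_text(new_layout))
```

The first call returns both fields. The second returns `None` for both, because neither pattern anticipates that wording or date format. This is the gap that per-vendor templates fill in a traditional setup, and that the AI/LLM approach described above aims to close without them.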
Common use cases
- Invoices and receipts: extract vendor, dates, totals, tax and line items for automated bookkeeping.
- Identity documents: read passports, driver’s licences and national IDs for KYC onboarding.
- Forms and contracts: digitise signed forms, purchase orders and contracts for workflow automation.
- Archive digitisation: convert historical paper archives into searchable, full-text databases.
Accuracy factors
OCR accuracy depends on several factors, including:
- Image quality: resolution, lighting, skew and compression.
- Font and language: engines are typically more accurate on standard printed fonts and Latin scripts than on handwriting or non-Latin scripts.
- Layout complexity: multi-column pages, tables and stamps challenge traditional engines.
- Model type: modern AI OCR reasons about context, so missing or partially occluded fields can often still be recovered.
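One common way to put a number on OCR accuracy is the character error rate (CER): the edit distance between the OCR output and a hand-checked reference, divided by the length of the reference. The sketch below is a plain-Python version under that definition. The function names and sample strings are made up for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by reference length; 0.0 means a perfect match."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Hypothetical example: the engine confuses "l" with "1" and "0" with "O".
reference = "Total: 120.00"
ocr_output = "Tota1: 12O.00"
print(round(character_error_rate(reference, ocr_output), 3))
```

Here two of the 13 reference characters are wrong, so the CER is 2/13, roughly 0.154. Tracking this figure across scans of differing quality shows how much each factor above affects a given engine.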
Related terms
- Invoice data entry: the downstream process that consumes OCR output.
- Accounts payable automation: the broader workflow OCR sits within.
- 3-way matching: a downstream control that relies on accurate OCR data.
See AI OCR in action
Zerentry uses AI/LLM-powered OCR to extract structured fields from any invoice or receipt, with no templates required.
Further reading
- AI OCR software: see how Zerentry’s OCR pipeline extracts structured fields from any invoice, receipt or bank statement.
- Invoice data entry (glossary): the downstream process that turns OCR output into bookkeeping entries.
- Zerentry vs Mindee (comparison): full product with native accounting sync vs Mindee's developer-first OCR API.