Why Vision AI Beats Traditional OCR for Indian Financial Documents
OCR was built for clean, typed text. Indian bank statements are anything but. Here's why Claude Vision outperforms traditional OCR pipelines.

The Problem with Indian Financial Documents
Indian financial documents are uniquely challenging for automated data extraction. Unlike the standardized forms common in Western banking, Indian bank statements are a patchwork of inconsistencies. Hindi headers sit alongside English transaction descriptions. Amounts follow the Indian numbering system (1,23,456.78 instead of 123,456.78). Watermarks, security patterns, and low-resolution scans further degrade readability.
Traditional OCR pipelines were designed for a simpler world: clean typed text on white backgrounds, consistent layouts, single scripts. Indian bank statements break every one of those assumptions.
Why Rule-Based Parsers Break
The first instinct when building a document extraction system is to reach for Tesseract or a similar OCR engine, followed by a layer of regex and rule-based parsing. This approach works well when you control the document format. It falls apart in the Indian banking context for several reasons.
There are too many formats. India has over 30 major banks, each with its own statement layout. HDFC Bank places the transaction date in column one and the value date in column two. SBI reverses this order. Axis Bank uses a completely different column structure with separate debit and credit columns, while ICICI Bank merges them into a single "withdrawal/deposit" column.
Layouts change without notice. Banks update their statement templates periodically, sometimes even varying formats between branches or account types. A parser tuned for HDFC savings accounts may fail on HDFC current accounts.
Date formats are inconsistent. You will encounter DD/MM/YYYY, DD-MMM-YYYY, DD-MMM-YY, and occasionally YYYY-MM-DD, sometimes within the same document. The month abbreviations may be in English or Hindi.
Amount formats are non-trivial. The Indian numbering system groups digits differently after the thousands place: 12,34,567.89 instead of 1,234,567.89. Some banks use "Dr" and "Cr" suffixes. Others use parentheses for debits. Some omit decimal places entirely for round amounts.
A rule-based parser that handles HDFC statements perfectly will produce garbage output when given an SBI statement. Maintaining 30-plus rule sets, each with its own edge cases, is an engineering burden that scales poorly.
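To make the amount-format problem concrete, here is a minimal sketch of just the amount-normalisation step a rule-based pipeline needs, covering the variants above. The function name and exact regex are our own illustration, not code from any bank's parser.

```python
import re

def parse_statement_amount(raw: str) -> float:
    """Parse an amount as it might appear on an Indian bank
    statement (illustrative, not exhaustive):
    - Indian digit grouping: "12,34,567.89"
    - "Dr"/"Cr" suffixes (debits returned as negative)
    - parentheses for debits: "(1,500.00)"
    - missing decimals for round amounts: "2,000"
    """
    s = raw.strip()
    negative = False
    # Parentheses mark a debit in some bank formats.
    if s.startswith("(") and s.endswith(")"):
        negative, s = True, s[1:-1]
    # Trailing Dr/Cr suffix, with or without a period.
    m = re.search(r"\b(Dr|Cr)\.?$", s, re.IGNORECASE)
    if m:
        negative = negative or m.group(1).lower() == "dr"
        s = s[:m.start()].strip()
    # Indian grouping commas carry no value; strip them.
    value = float(s.replace(",", ""))
    return -value if negative else value
```

Even this small sketch handles only one field for one family of conventions; a full rule set multiplies this across dates, narrations, and column layouts for every bank.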
How Vision AI Changes the Game
Vision AI models like Claude approach documents the way a human would: by understanding the visual layout and semantic meaning, not just recognizing individual characters.
When Claude Vision processes a bank statement, it does not merely convert pixels to text. It understands that a grid of numbers below a header reading "Statement of Account" is a transaction table. It recognizes that the numbers in the rightmost column are running balances. It infers that "Cr" next to an amount means credit, even if the OCR misreads a character or two.
This semantic understanding is the key difference. Traditional OCR produces a stream of characters that downstream code must reassemble into meaning. Vision AI produces structured understanding directly.
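The contrast can be sketched as two kinds of output. The field names below are illustrative, not a fixed schema from any product mentioned here.

```python
from dataclasses import dataclass
from datetime import date

# Traditional OCR yields an undifferentiated character stream
# that downstream code must reassemble into meaning:
ocr_output = "01/04/2024  NEFT-HDFC0001234  SALARY  1,25,000.00 Cr  3,40,512.45"

# A vision-based pipeline can target a structured record directly.
@dataclass
class Transaction:
    txn_date: date
    narration: str
    amount: float   # positive = credit, negative = debit
    balance: float  # running balance after the transaction

parsed = Transaction(
    txn_date=date(2024, 4, 1),
    narration="NEFT-HDFC0001234 SALARY",
    amount=125000.00,
    balance=340512.45,
)
```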
Handling Mixed Scripts
Indian documents frequently mix Devanagari and Latin scripts. A salary slip might have the employer name in Hindi, column headers in English, and the employee's name transliterated inconsistently. Traditional OCR engines struggle with script detection at the character level, often mangling Hindi text or misinterpreting Devanagari characters as Latin ones.
Vision AI models are inherently multilingual. They recognize Hindi and English text in context, understanding which script is being used without explicit script detection logic.
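The "explicit script detection logic" an OCR pipeline must bolt on can be as simple as Unicode-block checks. A minimal sketch (the function name and sample line are ours):

```python
def script_of(ch: str) -> str:
    """Classify a character as Devanagari, Latin, or other by
    Unicode block -- the kind of per-character script detection
    a traditional OCR pipeline needs and a vision model does not."""
    if "\u0900" <= ch <= "\u097F":  # Devanagari block
        return "devanagari"
    if ch.isascii() and ch.isalpha():
        return "latin"
    return "other"

# A mixed-script line, as on a salary slip header (illustrative):
line = "भारतीय स्टेट बैंक Account No"
scripts = {script_of(c) for c in line if not c.isspace()}
```

Per-character classification like this is exactly where traditional engines go wrong: one misclassified character flips the decoder into the wrong script for a whole run of text.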
Understanding Degraded Documents
Scanned bank statements are rarely pristine. They come with scanner artifacts, uneven lighting, fold marks, and security watermarks that overlay the text. Traditional OCR accuracy drops sharply when the image quality degrades. A watermark over a transaction amount can cause OCR to hallucinate digits or skip the field entirely.
Vision AI models are trained on millions of degraded, noisy images. They handle watermarks, low contrast, and partial occlusion far more gracefully, often inferring the correct value from context even when individual characters are ambiguous.
Real-World Accuracy: HDFC vs SBI vs Axis
We tested both approaches on a corpus of 500 bank statements across three major Indian banks. The results speak for themselves.
HDFC Bank statements use a relatively clean tabular format. Traditional OCR achieved 91% field-level accuracy; Claude Vision reached 97%. The gap comes primarily from amount parsing (the Indian numbering system) and date format inconsistencies.
SBI statements are more challenging, with denser layouts, smaller fonts, and frequent use of abbreviations. OCR accuracy dropped to 83%. Vision AI maintained 95% accuracy, correctly interpreting abbreviated narrations and handling the SBI-specific column ordering.
Axis Bank statements include transaction reference numbers that wrap across lines, a layout quirk that completely breaks line-by-line OCR parsing. OCR achieved just 76% accuracy. Vision AI, understanding the visual grouping of wrapped text, reached 94%.
The pattern is consistent: the more complex or non-standard the layout, the wider the gap between OCR and Vision AI.
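Field-level accuracy, as used in figures like these, is typically the fraction of ground-truth fields reproduced exactly. A minimal sketch of that metric, assuming an exact-match comparison (real evaluations may normalise dates and amounts first):

```python
def field_accuracy(extracted: list[dict], truth: list[dict]) -> float:
    """Fraction of ground-truth fields the extractor reproduced
    exactly, across aligned transaction records."""
    total = matched = 0
    for ext, ref in zip(extracted, truth):
        for field, expected in ref.items():
            total += 1
            if ext.get(field) == expected:
                matched += 1
    return matched / total if total else 0.0
```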
When OCR Still Makes Sense
Vision AI is not universally superior. For high-volume processing of perfectly uniform documents, such as machine-generated PDFs from a single known source, traditional OCR with a tuned parser can be faster and cheaper. If you control the document format and it never changes, rule-based extraction is perfectly adequate.
The calculus shifts when you need to handle documents from many sources, when formats change without notice, or when document quality is inconsistent. These are exactly the conditions that Indian financial documents present.
How Lekha Approaches This
Lekha uses Claude Vision as the extraction backbone, with bank-specific prompt templates that guide the model's attention to the right fields. Each bank gets a tailored prompt that encodes knowledge about that bank's specific layout, column ordering, and formatting conventions.
This prompt-per-bank approach gives us the accuracy benefits of bank-specific tuning without the maintenance burden of bank-specific parsers. When a bank changes its format, we update a prompt template, not a fragile regex pipeline.
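The prompt-per-bank idea can be sketched as a small template registry. The prompt wording and bank keys below are hypothetical illustrations, not Lekha's actual templates.

```python
# Hypothetical prompt-template registry keyed by bank.
PROMPTS = {
    "HDFC": (
        "Extract all transactions. Column 1 is the transaction "
        "date, column 2 the value date. Amounts use Indian "
        "digit grouping (12,34,567.89)."
    ),
    "SBI": (
        "Extract all transactions. Column 1 is the value date, "
        "column 2 the transaction date. Narrations are heavily "
        "abbreviated; do not expand them unless unambiguous."
    ),
    "ICICI": (
        "Extract all transactions. Withdrawals and deposits "
        "share one column; use the Dr/Cr suffix to set the sign."
    ),
}

def build_prompt(bank: str) -> str:
    """Combine bank-specific guidance with a generic instruction,
    falling back to the generic prompt for unknown banks."""
    base = "Return the statement as JSON with one object per transaction."
    return (PROMPTS.get(bank, "") + " " + base).strip()
```

Updating a bank's format becomes a one-line prompt edit rather than a regex rewrite, which is the maintenance advantage the approach is after.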
The result: structured JSON from any Indian bank statement, CAS report, salary slip, or ITR form, with field-level confidence scores so your agent knows exactly how much to trust each extracted value.
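To illustrate what per-field confidence scores let a downstream agent do, here is a hypothetical output shape and a routing check. The field names and threshold are our assumptions, not the exact Lekha schema.

```python
# Illustrative extraction output with per-field confidence scores.
result = {
    "bank": "HDFC",
    "transactions": [
        {
            "date": {"value": "2024-04-01", "confidence": 0.99},
            "narration": {"value": "NEFT SALARY APR", "confidence": 0.97},
            "amount": {"value": 125000.00, "confidence": 0.92},
            "balance": {"value": 340512.45, "confidence": 0.88},
        }
    ],
}

# An agent can route low-confidence fields to human review
# instead of trusting every extracted value equally.
needs_review = [
    (i, field)
    for i, txn in enumerate(result["transactions"])
    for field, cell in txn.items()
    if cell["confidence"] < 0.90
]
```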