CIBIL Credit Report Parser: Build a Credit Assessment Agent
Parse CIBIL credit reports into structured JSON with AI. Build a credit assessment agent for lending, NBFC, and insurance workflows. Full TypeScript code.
CIBIL credit reports are the single most important document in Indian lending — yet extracting structured data from them remains one of the hardest parsing problems in fintech. A typical CIBIL PDF spans 8–15 pages, mixes tabular and narrative sections, uses inconsistent date formats across accounts, and contains abbreviations like SMA-1, DPD, STD, and SUB that have precise regulatory meanings.
Manual review takes a loan officer 15–20 minutes per applicant. An AI agent with the right extraction layer can do it in seconds.
This guide shows you how to parse CIBIL credit reports into structured JSON using Lekha and build a credit assessment agent on top of the extracted data.
What's Inside a CIBIL Credit Report
Before writing code, understand the data you're working with. A CIBIL report contains five major sections:
| Section | Key Fields |
| --- | --- |
| Personal Info | Name, DOB, PAN, Aadhaar (masked), address history |
| Credit Score | Score (300–900), score date, version (V1/V2) |
| Account Summary | Total accounts, active/closed, overdue accounts, total balance |
| Account Details | Per-account: lender name, type, open date, limit/sanctioned, current balance, overdue, DPD history (24 months), status |
| Enquiry History | Date, lender, purpose, amount (last 24 months) |
The DPD (Days Past Due) history is the most complex part. It shows a 24-month grid of payment behaviour using codes: 000 (on time), 030/060/090/120/150/180 (days overdue), STD (standard), SMA (special mention account), SUB (substandard), DBT (doubtful), LSS (loss), XXX (no information).
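To make the code list above concrete, here is a minimal sketch of how a single grid cell could be classified. The names `DpdCell`, `classifyDpdCode`, and `isDelinquent` are hypothetical and not part of any Lekha API. Treating every class other than STD as delinquent, and collapsing suffixed codes such as SMA-1 to their base class, are simplifying assumptions.

```typescript
// Hypothetical helper types: one cell of the 24-month DPD grid.
type AssetClass = "STD" | "SMA" | "SUB" | "DBT" | "LSS";

type DpdCell =
  | { kind: "days"; days: number } // "000", "030", ... numeric days past due
  | { kind: "asset_class"; code: AssetClass } // STD, SMA, SUB, DBT, LSS
  | { kind: "unknown" }; // "XXX" or anything unrecognised

const ASSET_CLASSES: readonly AssetClass[] = ["STD", "SMA", "SUB", "DBT", "LSS"];

function classifyDpdCode(raw: string): DpdCell {
  const code = raw.trim().toUpperCase();
  // Three digits means a day count: "030" -> 30.
  if (/^\d{3}$/.test(code)) {
    return { kind: "days", days: Number(code) };
  }
  // Assumption: "SMA-1" style suffixes collapse to their base class.
  const base = code.split("-")[0] ?? code;
  const match = ASSET_CLASSES.find((c) => c === base);
  if (match) {
    return { kind: "asset_class", code: match };
  }
  return { kind: "unknown" };
}

// Assumption: a cell is delinquent if it reports days overdue
// or any asset class other than STD.
function isDelinquent(cell: DpdCell): boolean {
  if (cell.kind === "days") return cell.days > 0;
  if (cell.kind === "asset_class") return cell.code !== "STD";
  return false;
}
```

Keeping XXX as a separate `unknown` kind, rather than mapping it to zero, lets downstream logic tell "paid on time" apart from "no information".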
Extracting this grid reliably from a scanned PDF — especially multi-bureau reports that mix CIBIL, Experian, and Equifax data — is where traditional OCR falls apart completely.
Parse a CIBIL Report with Lekha
Lekha's API accepts a CIBIL PDF and returns a fully typed JSON object. Here's the minimal call:
import fs from "fs";

const pdf = fs.readFileSync("./applicant-cibil.pdf");
const base64 = pdf.toString("base64");

const response = await fetch("https://lekhadev.com/api/extract", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.LEKHA_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    document: base64,
    type: "cibil",
  }),
});

// Fail loudly on HTTP errors instead of destructuring an error payload.
if (!response.ok) {
  throw new Error(`Lekha extraction failed with HTTP ${response.status}`);
}

const { data } = await response.json();
console.log(data.credit_score); // 742
console.log(data.accounts.length); // 6
console.log(data.overdue_accounts); // 1
The returned data object is structured JSON — no regex, no brittle parsing, no post-processing required. You get numbers as numbers, dates as ISO 8601 strings, and DPD history as a typed array.
The Extracted Schema
interface CibilReport {
  // Credit score
  credit_score: number; // 742
  score_date: string; // "2026-04-01"
  score_version: "V1" | "V2" | null; // null for new-to-credit applicants

  // Personal
  pan: string; // "ABCDE1234F"
  name: string;
  date_of_birth: string; // "1988-03-15"

  // Summary
  total_accounts: number;
  active_accounts: number;
  closed_accounts: number;
  overdue_accounts: number;
  total_balance: number; // in INR, always a number
  total_overdue: number;

  // Per-account detail
  accounts: CibilAccount[];

  // Enquiry history
  enquiries: CibilEnquiry[];
}

interface CibilAccount {
  lender_name: string; // "HDFC Bank"
  account_type: string; // "Home Loan", "Credit Card", "Personal Loan"
  ownership_type: "INDIVIDUAL" | "JOINT" | "GUARANTOR";
  open_date: string; // "2021-06-01"
  close_date: string | null;
  sanctioned_amount: number;
  current_balance: number;
  overdue_amount: number;
  emi: number | null;
  status: "STD" | "SMA" | "SUB" | "DBT" | "LSS" | "WO" | "CLOSED" | "DISPUTED";
  dpd_history: DpdEntry[]; // 24 months, newest first
  credit_limit: number | null; // for credit cards
}

interface DpdEntry {
  month: string; // "2026-03"
  dpd: number; // 0, 30, 60, 90, 120, 150, 180
  code: string; // "000", "STD", "SMA-1", etc.
}

interface CibilEnquiry {
  date: string;
  lender_name: string;
  purpose: string; // "Home Loan", "Personal Loan"
  amount: number;
}
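TypeScript interfaces disappear at runtime, so it can help to check the response shape before trusting it. The sketch below assumes the field names from the schema above. `validateCibilReport` is a hypothetical helper, not a Lekha SDK function, and it checks only a handful of fields.

```typescript
// Hypothetical runtime check for a parsed report. Returns a list of
// problems; an empty list means the sampled fields look as documented.
const ISO_DATE = /^\d{4}-\d{2}-\d{2}$/;

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

function validateCibilReport(value: unknown): string[] {
  if (!isRecord(value)) return ["response is not an object"];
  const problems: string[] = [];

  if (typeof value.credit_score !== "number") {
    problems.push("credit_score is not a number");
  }
  if (typeof value.score_date !== "string" || !ISO_DATE.test(value.score_date)) {
    problems.push("score_date is not an ISO 8601 date");
  }
  if (!Array.isArray(value.accounts)) {
    problems.push("accounts is not an array");
    return problems;
  }

  value.accounts.forEach((acct: unknown, i: number) => {
    const history = isRecord(acct) ? acct.dpd_history : undefined;
    if (!Array.isArray(history)) {
      problems.push(`accounts[${i}].dpd_history is not an array`);
      return;
    }
    // "Newest first" means months never increase along the array.
    // "YYYY-MM" strings compare correctly as plain strings.
    for (let j = 1; j < history.length; j++) {
      const prev: unknown = history[j - 1];
      const cur: unknown = history[j];
      if (
        isRecord(prev) && isRecord(cur) &&
        typeof prev.month === "string" && typeof cur.month === "string" &&
        cur.month > prev.month
      ) {
        problems.push(`accounts[${i}].dpd_history is not newest-first`);
        break;
      }
    }
  });

  return problems;
}
```

The ordering check matters because any metric that slices the first 12 entries of `dpd_history` silently depends on it.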
Try it live on the Lekha Playground with a sample CIBIL PDF before writing any code.
Build a Credit Assessment Agent
A raw CIBIL parse gives you data. An agent gives you a decision. Here's a complete credit assessment agent that accepts an applicant's CIBIL PDF and requested loan amount, then outputs an approval recommendation with reasoning.
Step 1: Define the Assessment Tool
// src/agents/credit-assessment.ts
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const tools: Anthropic.Tool[] = [
  {
    name: "parse_cibil_report",
    description:
      "Parse a CIBIL credit report PDF and return structured data including credit score, account history, DPD grid, and enquiries.",
    input_schema: {
      type: "object" as const,
      properties: {
        document_base64: {
          type: "string",
          description: "Base64-encoded CIBIL PDF",
        },
      },
      required: ["document_base64"],
    },
  },
  {
    name: "compute_credit_metrics",
    description: "Compute derived credit risk metrics from parsed CIBIL data.",
    input_schema: {
      type: "object" as const,
      properties: {
        cibil_data: {
          type: "object",
          description: "Parsed CIBIL JSON from parse_cibil_report",
        },
        loan_amount_requested: {
          type: "number",
          description: "Loan amount the applicant has requested in INR",
        },
      },
      required: ["cibil_data", "loan_amount_requested"],
    },
  },
];
Step 2: Implement Tool Handlers
async function parseCibilReport(documentBase64: string): Promise<CibilReport> {
  const response = await fetch("https://lekhadev.com/api/extract", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.LEKHA_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      document: documentBase64,
      type: "cibil",
    }),
  });
  if (!response.ok) {
    throw new Error(`Lekha extraction failed with HTTP ${response.status}`);
  }
  const { data } = await response.json();
  return data;
}

function computeCreditMetrics(
  cibilData: CibilReport,
  loanAmountRequested: number,
) {
  const activeAccounts = cibilData.accounts.filter((a) => !a.close_date);

  // Max DPD in the last 12 months across active accounts.
  // The leading 0 stops Math.max returning -Infinity when there is no history.
  const maxDpd12m = Math.max(
    0,
    ...activeAccounts.flatMap((a) =>
      a.dpd_history.slice(0, 12).map((d) => d.dpd),
    ),
  );

  // Count of active accounts with any DPD > 0 in the last 6 months
  const accounts_with_dpd_6m = activeAccounts.filter((a) =>
    a.dpd_history.slice(0, 6).some((d) => d.dpd > 0),
  ).length;

  // Credit utilization for revolving credit (credit cards)
  const creditCards = cibilData.accounts.filter(
    (a) => a.account_type === "Credit Card" && a.credit_limit,
  );
  const utilization =
    creditCards.length > 0
      ? creditCards.reduce((sum, c) => sum + c.current_balance, 0) /
        creditCards.reduce((sum, c) => sum + (c.credit_limit ?? 0), 0)
      : null;

  // Enquiry velocity: enquiries in the last 90 days
  const cutoff = new Date();
  cutoff.setDate(cutoff.getDate() - 90);
  const recent_enquiries = cibilData.enquiries.filter(
    (e) => new Date(e.date) >= cutoff,
  ).length;

  // Debt-to-income proxy: total EMIs (true DTI needs salary data)
  const total_monthly_emi = activeAccounts.reduce(
    (sum, a) => sum + (a.emi ?? 0),
    0,
  );

  return {
    credit_score: cibilData.credit_score,
    overdue_accounts: cibilData.overdue_accounts,
    total_overdue_inr: cibilData.total_overdue,
    max_dpd_last_12_months: maxDpd12m,
    accounts_with_any_dpd_last_6m: accounts_with_dpd_6m,
    // Compare against null explicitly so 0% utilization is reported as 0.
    credit_card_utilization_pct:
      utilization !== null ? Math.round(utilization * 100) : null,
    recent_enquiries_90_days: recent_enquiries,
    total_monthly_emi_inr: total_monthly_emi,
    loan_amount_requested_inr: loanAmountRequested,
    has_writeoff: cibilData.accounts.some((a) => a.status === "WO"),
    // SUB/DBT/LSS are adverse asset classifications, not settlements;
    // the schema has no separate "settled" status.
    has_adverse_classification: cibilData.accounts.some((a) =>
      ["SUB", "DBT", "LSS"].includes(a.status),
    ),
  };
}
Step 3: Run the Agentic Loop
export async function assessCreditApplication(
  cibilPdfBase64: string,
  loanAmountRequested: number,
  loanPurpose: string,
): Promise<string> {
  const messages: Anthropic.MessageParam[] = [
    {
      role: "user",
      content: `You are a credit underwriter at an Indian NBFC. Assess this loan application:
Loan purpose: ${loanPurpose}
Amount requested: ₹${loanAmountRequested.toLocaleString("en-IN")}
The applicant's CIBIL PDF is held by the application. Call parse_cibil_report with document_base64 set to "attached" and the application will supply the file.
Use the tools to parse the CIBIL report and compute credit metrics, then provide:
- APPROVE / DECLINE / REFER recommendation
- Key risk factors
- Suggested loan amount (if declining full amount but partial approval is possible)
- Conditions if any (e.g. require co-applicant, lower amount)
Be specific and cite the actual numbers from the CIBIL report.
`,
    },
  ];

  while (true) {
    const response = await client.messages.create({
      model: "claude-opus-4-7",
      max_tokens: 2048,
      tools,
      messages,
    });

    // Anything other than a tool request (end_turn, max_tokens, ...) ends the loop.
    if (response.stop_reason !== "tool_use") {
      const textBlock = response.content.find(
        (b): b is Anthropic.TextBlock => b.type === "text",
      );
      return textBlock?.text ?? "No assessment generated.";
    }

    // Process tool calls
    const toolUseBlocks = response.content.filter((b) => b.type === "tool_use");
    const toolResults: Anthropic.ToolResultBlockParam[] = [];

    for (const toolUse of toolUseBlocks) {
      if (toolUse.type !== "tool_use") continue;
      let result: unknown;

      if (toolUse.name === "parse_cibil_report") {
        // The model never sees the PDF bytes, so use the caller's document
        // rather than whatever the model put in document_base64.
        result = await parseCibilReport(cibilPdfBase64);
      } else if (toolUse.name === "compute_credit_metrics") {
        const input = toolUse.input as {
          cibil_data: CibilReport;
          loan_amount_requested: number;
        };
        result = computeCreditMetrics(
          input.cibil_data,
          input.loan_amount_requested,
        );
      } else {
        // Report unknown tools back to the model instead of sending undefined.
        result = { error: `Unknown tool: ${toolUse.name}` };
      }

      toolResults.push({
        type: "tool_result",
        tool_use_id: toolUse.id,
        content: JSON.stringify(result),
      });
    }

    messages.push({ role: "assistant", content: response.content });
    messages.push({ role: "user", content: toolResults });
  }
}

// Usage
import fs from "fs";

const pdf = fs.readFileSync("./applicant-cibil.pdf");
const assessment = await assessCreditApplication(
  pdf.toString("base64"),
  500000, // ₹5 lakh personal loan
  "Medical emergency",
);
console.log(assessment);
A sample output from this agent:
RECOMMENDATION: APPROVE (with conditions)
Credit Score: 718 — acceptable for personal loans up to ₹7L.
Key positives:
- Zero DPD in the last 12 months across all active accounts
- Credit card utilization at 34% (healthy threshold is <40%)
- No write-offs or settlements in history
Risk factors:
- 3 enquiries in the last 90 days — suggests rate-shopping or multiple simultaneous applications. Verify purpose.
- HDFC Personal Loan (₹2.2L outstanding) with DPD of 30 in March 2025, recovered since.
Conditions:
- Approve ₹5,00,000 at standard rate (no risk loading required)
- Request explanation for recent enquiry spike before disbursement
- Co-applicant not required
Estimated EMI on the new loan: ~₹13,660 (14% / 48 months)
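Because the agent returns free text, downstream systems usually need the verdict as a value they can branch on. The sketch below assumes the reply contains a `RECOMMENDATION:` line like the sample above. The prompt does not guarantee that format, so the hypothetical `extractRecommendation` helper returns `null` when the line is missing, and the caller can route the case to manual review.

```typescript
type Recommendation = "APPROVE" | "DECLINE" | "REFER";

// Hypothetical helper: pull the verdict out of the agent's free-text reply.
// Returns null when no "RECOMMENDATION: <verdict>" line is found.
function extractRecommendation(assessment: string): Recommendation | null {
  const match = assessment.match(
    /RECOMMENDATION\s*:\s*(APPROVE|DECLINE|REFER)\b/i,
  );
  const word = match?.[1];
  if (!word) return null;
  // The regex is case-insensitive, so normalise before returning.
  return word.toUpperCase() as Recommendation;
}

// Example: unparseable replies fall back to human review.
function routeDecision(assessment: string): Recommendation {
  return extractRecommendation(assessment) ?? "REFER";
}
```

Defaulting to `REFER` is a deliberately conservative choice for this sketch: a reply that cannot be parsed never turns into an automatic approval or decline.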
Handling Edge Cases
Real CIBIL reports surface several failure modes that a production agent must handle:
New-to-Credit (NTC) applicants: There is no CIBIL score; the report shows the score as -1 or NH. Lekha flags this by returning score_version as null. Your agent should switch to alternative data (bank statement cash flow, rental history).
Multiple bureau reports: Some lenders pull Experian or Equifax alongside CIBIL. Lekha extracts the primary report; if you receive a combined PDF, pass it as-is and Lekha's classifier identifies the dominant bureau format.
Disputed accounts: CIBIL marks disputed tradelines with a DISPUTED status code. Exclude these from DPD and utilization calculations to avoid penalising applicants unfairly.
Joint accounts: The applicant may appear as primary or secondary holder. Check the ownership_type field (INDIVIDUAL, JOINT, GUARANTOR) on each account before including balances in the total exposure calculation.
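The first, third, and fourth cases above can be sketched as small filters applied before any metrics are computed. The helper names are hypothetical. Counting JOINT balances in full and excluding GUARANTOR balances from exposure is an assumed policy for illustration only; your own credit policy may weight them differently.

```typescript
type Ownership = "INDIVIDUAL" | "JOINT" | "GUARANTOR";

// Only the fields this sketch needs; names follow the text above.
interface ExposureAccount {
  status: string;
  ownership_type: Ownership;
  current_balance: number;
}

// New-to-credit: per the text above, score_version comes back as null.
function isNewToCredit(report: { score_version: string | null }): boolean {
  return report.score_version === null;
}

// Disputed tradelines are dropped before DPD and utilization calculations.
function accountsForRiskMetrics<T extends ExposureAccount>(accounts: T[]): T[] {
  return accounts.filter((a) => a.status !== "DISPUTED");
}

// Assumed policy: INDIVIDUAL and JOINT balances count in full,
// GUARANTOR balances are excluded as contingent liabilities.
function totalExposure(accounts: ExposureAccount[]): number {
  return accountsForRiskMetrics(accounts)
    .filter((a) => a.ownership_type !== "GUARANTOR")
    .reduce((sum, a) => sum + a.current_balance, 0);
}
```

Running `accountsForRiskMetrics` once and passing its result to every later calculation keeps the disputed-account rule in one place.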
Integrating with a Full Lending Workflow
For production loan processing, pair CIBIL parsing with salary slip extraction to compute true DTI (Debt-to-Income ratio):
// CibilAccount is the interface from the schema section above.

// Policy cap on FOIR (Fixed Obligation to Income Ratio). 55% is the figure
// used in this guide; confirm it against your own credit policy.
const MAX_FOIR = 0.55;

async function fullLoanAssessment(
  cibilBase64: string,
  salarySlipBase64: string,
  requestedAmount: number,
) {
  // Parse both documents in parallel
  const [cibilResponse, salaryResponse] = await Promise.all([
    fetch("https://lekhadev.com/api/extract", {
      method: "POST",
      headers: {
        Authorization: `Bearer ${process.env.LEKHA_API_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ document: cibilBase64, type: "cibil" }),
    }),
    fetch("https://lekhadev.com/api/extract", {
      method: "POST",
      headers: {
        Authorization: `Bearer ${process.env.LEKHA_API_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ document: salarySlipBase64, type: "salary_slip" }),
    }),
  ]);
  if (!cibilResponse.ok || !salaryResponse.ok) {
    throw new Error("Lekha extraction failed for one or both documents");
  }

  const { data: cibil } = await cibilResponse.json();
  const { data: salary } = await salaryResponse.json();

  const monthlyIncome = salary.net_pay;
  const existingEmi = cibil.accounts
    .filter((a: CibilAccount) => !a.close_date)
    .reduce((sum: number, a: CibilAccount) => sum + (a.emi ?? 0), 0);

  // Rough EMI factor for 14% p.a. over 60 months (about 2.33% of principal per month)
  const estimatedNewEmi = requestedAmount * 0.0233;
  const foir = (existingEmi + estimatedNewEmi) / monthlyIncome;

  return {
    credit_score: cibil.credit_score,
    net_monthly_income: monthlyIncome,
    existing_emi_obligations: existingEmi,
    foir_after_new_loan: Math.round(foir * 100),
    foir_within_policy_cap: foir < MAX_FOIR,
    recommended_max_emi: Math.round(monthlyIncome * MAX_FOIR - existingEmi),
  };
}
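A fixed multiplier for the new EMI only holds for one rate and tenure. The standard reducing-balance formula is EMI = P·r·(1+r)^n / ((1+r)^n − 1), where r is the monthly rate and n the number of monthly instalments. A minimal sketch follows; `monthlyEmi` and `foirAfterNewLoan` are hypothetical helper names, not part of the Lekha API.

```typescript
// Reducing-balance EMI: P * r * (1 + r)^n / ((1 + r)^n - 1)
// principal: loan amount in INR
// annualRate: nominal yearly rate as a fraction (0.14 for 14%)
// months: tenure in monthly instalments
function monthlyEmi(principal: number, annualRate: number, months: number): number {
  if (months <= 0) throw new RangeError("months must be positive");
  const r = annualRate / 12;
  // Zero-interest edge case: the formula would divide by zero.
  if (r === 0) return principal / months;
  const growth = Math.pow(1 + r, months);
  return (principal * r * growth) / (growth - 1);
}

// FOIR after adding the new loan's EMI to existing obligations.
function foirAfterNewLoan(
  existingEmi: number,
  newEmi: number,
  netMonthlyIncome: number,
): number {
  if (netMonthlyIncome <= 0) throw new RangeError("income must be positive");
  return (existingEmi + newEmi) / netMonthlyIncome;
}
```

For ₹5,00,000 at 14% this gives roughly ₹11,634 a month over 60 months and roughly ₹13,663 over 48 months, so the tenure assumption noticeably moves the FOIR result.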
See the Lekha docs for the complete salary slip schema and how to combine multiple document types in a single assessment pipeline.
FAQ
What CIBIL report formats does Lekha support?
Lekha handles all standard CIBIL consumer PDF formats, including reports downloaded directly from the CIBIL website, bureau copies from lenders, and combined multi-bureau reports (Experian, Equifax, CRIF alongside CIBIL). Password-protected PDFs are supported: pass the password in the request body.

How accurate is AI extraction compared to structured APIs?
Lekha achieves >97% field-level accuracy on CIBIL PDFs. The main failure modes are heavily scanned or low-resolution documents (< 150 DPI). Even for scans, Lekha's vision model significantly outperforms OCR because it understands the semantic structure rather than reading character by character.

Can I use this for RBI-compliant loan underwriting?
Yes, with caveats. Lekha is the extraction layer; your lending policy, audit trail, and credit model remain your responsibility. Ensure your decisioning system logs the raw extracted JSON alongside every decision for RBI audit requirements under the Digital Lending Guidelines 2022.

What is the latency for CIBIL extraction?
Average extraction time is 2.1 seconds for a standard 10-page CIBIL report. For batch pre-screening workflows, use the async endpoint to process hundreds of reports concurrently; see the batch processing guide.

Stop reviewing CIBIL reports by hand. Give your underwriting agent the ability to read, parse, and reason over CIBIL data in seconds, and spend your team's time on decisions, not data entry.
Get your free API key at lekhadev.com →