7 min read

Indian Salary Slip Parser: Build a Verification Agent with AI

Extract CTC, deductions, and net pay from any Indian salary slip as structured JSON. Build a salary verification agent for lending, KYC, and rental workflows.

salary slip parser, indian payroll, ai agent, document extraction, income verification, fintech api, structured json, lending automation

Salary slip verification sits at the heart of lending, rental approvals, and KYC workflows across India. Yet most teams still handle it manually — someone downloads a PDF, eyeballs the numbers, and punches data into a spreadsheet. This guide shows you how to automate that entirely.

You'll extract structured JSON from any Indian salary slip format and then wire it into an agent that makes a verification decision — all in under 50 lines of TypeScript.

What Makes Indian Salary Slips Hard to Parse

Indian salary slips are not standardised. A Tata Group CHRO's payslip looks nothing like a startup's Razorpay Payroll export or a government employee's HRMS printout. Common variations include:

  • Component labeling: "Basic" vs "Basic Pay" vs "Basic Salary" vs "बेसिक वेतन"
  • Deduction layout: EPF, ESIC, PT, TDS in different orders and groupings
  • CTC vs net pay: some show only in-hand, others show full CTC breakdown
  • Multi-employer slips: two jobs, different formats in the same PDF
  • Scanned images: older SME payroll printed and scanned, no selectable text
Traditional OCR (Tesseract, AWS Textract) collapses on mixed layouts. Regex breaks the moment a payroll team reorders columns. Vision AI reads the document the same way a human analyst does, understanding context rather than just character positions.
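To see why hand-written rules age badly, here is a minimal sketch of the label normaliser a rule-based parser ends up maintaining. The synonym table and the `normaliseLabel` helper are hypothetical, written only for illustration; every new payroll template means another round of edits to a table like this.

```typescript
// Hypothetical synonym table: canonical component key -> labels seen on payslips.
// A rule-based parser has to keep extending this by hand for every new template.
const COMPONENT_SYNONYMS: Record<string, string[]> = {
  basic: ["basic", "basic pay", "basic salary", "बेसिक वेतन"],
  hra: ["hra", "house rent allowance"],
  epf_employee: ["epf", "pf", "employee pf", "provident fund"],
  professional_tax: ["pt", "prof tax", "professional tax"],
  tds: ["tds", "income tax", "tax deducted at source"],
};

// Lower-case, collapse runs of whitespace, and strip trailing ":" or "." characters.
function cleanLabel(raw: string): string {
  return raw.toLowerCase().replace(/\s+/g, " ").replace(/[:.\s]+$/, "").trim();
}

// Returns the canonical key for a printed label, or null when the label is unknown.
function normaliseLabel(raw: string): string | null {
  const label = cleanLabel(raw);
  for (const [key, synonyms] of Object.entries(COMPONENT_SYNONYMS)) {
    if (synonyms.includes(label)) return key;
  }
  return null;
}
```

Any label outside the table, say a "Conveyance" line on a template you haven't seen, falls through to `null`. That is the maintenance burden described in the paragraph above.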

    What Lekha Extracts from a Salary Slip

    Lekha returns a normalised JSON object regardless of the source format:

    {
      "document_type": "salary_slip",
      "employee": {
        "name": "Priya Sharma",
        "employee_id": "EMP-4821",
        "designation": "Senior Engineer",
        "department": "Technology",
        "pan": "ABCPS1234F"
      },
      "employer": {
        "name": "Acme Technologies Pvt Ltd",
        "location": "Bengaluru"
      },
      "pay_period": {
        "month": "April",
        "year": 2026,
        "working_days": 22,
        "days_present": 22
      },
      "earnings": {
        "basic": 85000,
        "hra": 34000,
        "special_allowance": 18000,
        "lta": 5000,
        "gross": 142000
      },
      "deductions": {
        "epf_employee": 10200,
        "professional_tax": 200,
        "tds": 12500,
        "total": 22900
      },
      "net_pay": 119100,
      "ctc_annual": 1932000
    }
    

    Every amount is a plain number (never a string like "₹1,19,100"), and date fields use ISO 8601. This is a format your downstream agent can reason over without any additional parsing.
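Because amounts arrive as numbers, you can sanity-check a slip before any agent sees it. The sketch below types only the fields shown in the example; the `SalarySlip` interface is our own assumption, not the SDK's published type, and it allows `null` because Lekha returns null for fields missing from the document. It checks that the earnings add up to gross, that the deductions add up to their total, and that gross minus total deductions equals net pay.

```typescript
// Assumed shape, mirroring only the example JSON; not the SDK's official type.
interface SalarySlip {
  earnings: Record<string, number | null> & { gross: number | null };
  deductions: Record<string, number | null> & { total: number | null };
  net_pay: number | null;
}

// Sums every numeric component in a section except the named total field.
function sumComponents(section: Record<string, number | null>, totalKey: string): number {
  return Object.entries(section)
    .filter(([key, value]) => key !== totalKey && typeof value === "number")
    .reduce((acc, [, value]) => acc + (value as number), 0);
}

// Returns a list of arithmetic problems; an empty list means the slip is internally consistent.
function checkSlipArithmetic(slip: SalarySlip, toleranceRupees = 1): string[] {
  const problems: string[] = [];
  const { gross } = slip.earnings;
  const { total } = slip.deductions;
  if (gross === null || total === null || slip.net_pay === null) {
    return ["gross, total deductions, or net pay is missing"];
  }
  const earningsSum = sumComponents(slip.earnings, "gross");
  if (Math.abs(earningsSum - gross) > toleranceRupees) {
    problems.push(`earnings sum to ${earningsSum} but gross is ${gross}`);
  }
  const deductionsSum = sumComponents(slip.deductions, "total");
  if (Math.abs(deductionsSum - total) > toleranceRupees) {
    problems.push(`deductions sum to ${deductionsSum} but total is ${total}`);
  }
  if (Math.abs(gross - total - slip.net_pay) > toleranceRupees) {
    problems.push(`gross minus deductions is ${gross - total}, but net pay is ${slip.net_pay}`);
  }
  return problems;
}
```

On the example slip all three checks pass: 85,000 + 34,000 + 18,000 + 5,000 = 1,42,000 gross; 10,200 + 200 + 12,500 = 22,900 in deductions; and 1,42,000 - 22,900 = 1,19,100 net.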

    Quick Start: Extract a Salary Slip

    Install the Lekha SDK with npm install @lekhadev/sdk, then extract your first payslip in a few lines:

    import { LekhaClient } from "@lekhadev/sdk";
    import { readFileSync } from "fs";
    

    const client = new LekhaClient({ apiKey: process.env.LEKHA_API_KEY });

    const result = await client.extract({
      document: readFileSync("april-payslip.pdf"),
      documentType: "salary_slip", // optional: Lekha auto-detects
    });

    console.log(result.data.net_pay); // 119100
    console.log(result.data.employer.name); // "Acme Technologies Pvt Ltd"

    You can also let Lekha auto-classify the document type — useful when your users upload a mix of payslips, bank statements, and offer letters:

    const result = await client.extract({
      document: buffer,
      // no documentType — classifier picks it up
    });
    

    if (result.data.document_type === "salary_slip") {
      processPayslip(result.data); // your own handler for payslip data
    }

    Try it live at lekhadev.com/playground — paste a payslip URL or upload a PDF to see the JSON output instantly.

    Build a Salary Verification Agent

    A verification agent needs to answer one question: does this applicant's income meet the requirement? Here's a minimal agent that handles the full flow: extract the payslip, evaluate it, and return a decision.

    import { readFileSync } from "fs";
    import { LekhaClient } from "@lekhadev/sdk";
    import Anthropic from "@anthropic-ai/sdk";

    const lekha = new LekhaClient({ apiKey: process.env.LEKHA_API_KEY });
    const claude = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

    interface VerificationRequest {
      payslipBuffer: Buffer;
      requiredMonthlyIncome: number; // minimum net pay in ₹
      employmentMonths?: number; // how many months of payslips provided
    }

    interface VerificationResult {
      approved: boolean;
      netPay: number;
      employerName: string;
      reasoning: string;
      flags: string[];
    }

    async function verifySalary(
      req: VerificationRequest,
    ): Promise<VerificationResult> {
      // Step 1: Extract salary slip data
      const extraction = await lekha.extract({
        document: req.payslipBuffer,
        documentType: "salary_slip",
      });

      if (!extraction.success) {
        throw new Error(`Extraction failed: ${extraction.error.message}`);
      }

      const slip = extraction.data;

      // Step 2: Ask Claude to evaluate the extracted data
      const response = await claude.messages.create({
        model: "claude-sonnet-4-6",
        max_tokens: 512,
        system: `You are a loan underwriting assistant. Evaluate salary slip data and return a JSON decision.
    Always respond with: { "approved": boolean, "reasoning": string, "flags": string[] }`,
        messages: [
          {
            role: "user",
            content: `Salary slip data:
    ${JSON.stringify(slip, null, 2)}

    Requirement: Monthly net pay ≥ ₹${req.requiredMonthlyIncome.toLocaleString("en-IN")}

    Check for:
    1. Net pay meets the minimum threshold
    2. Employment appears genuine (reasonable employer name, proper deductions like EPF/TDS)
    3. Any red flags (round numbers only, missing deductions, mismatch between gross and net)

    Return your JSON decision.`,
          },
        ],
      });

      // Step 3: Parse the model's reply; the first content block should be text
      const block = response.content[0];
      if (!block || block.type !== "text") {
        throw new Error("Expected a text response from the model");
      }
      const decision = JSON.parse(block.text);

      return {
        approved: decision.approved,
        netPay: slip.net_pay,
        employerName: slip.employer.name,
        reasoning: decision.reasoning,
        flags: decision.flags,
      };
    }

    // Usage
    const result = await verifySalary({
      payslipBuffer: readFileSync("applicant-payslip.pdf"),
      requiredMonthlyIncome: 75000,
    });

    console.log(result.approved); // true or false
    console.log(result.flags); // ["Missing EPF deduction: verify employment type"]

    This agent runs in under 3 seconds end-to-end. The extraction handles the PDF complexity; Claude handles the reasoning. Neither step bleeds into the other's concern.
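The prompt asks Claude to enforce the income threshold, but you may prefer to keep hard rules in code so that a misread response can never approve an applicant below the minimum. Here is a sketch of that split. `ruleChecks`, `finalDecision`, and the flag wording are our own illustration, and the checks mirror the prompt's list using only fields from the example JSON.

```typescript
// Minimal slip shape for the rule checks; an assumption based on the example JSON.
interface SlipForRules {
  earnings: { gross: number | null };
  deductions: { epf_employee?: number | null; tds?: number | null; total: number | null };
  net_pay: number | null;
}

interface RuleResult {
  meetsThreshold: boolean;
  flags: string[];
}

// Deterministic checks that run alongside the LLM review.
function ruleChecks(slip: SlipForRules, requiredMonthlyIncome: number): RuleResult {
  const flags: string[] = [];
  const net = slip.net_pay;
  const meetsThreshold = net !== null && net >= requiredMonthlyIncome;
  if (net === null) flags.push("Net pay missing from slip");

  // A missing or zero deduction is treated as absent.
  if (!slip.deductions.epf_employee) flags.push("No EPF deduction: verify employment type");
  if (!slip.deductions.tds) flags.push("No TDS deduction: check tax regime or income level");

  const { gross } = slip.earnings;
  const { total } = slip.deductions;
  if (gross !== null && total !== null && net !== null && gross - total !== net) {
    flags.push(`Gross minus deductions (${gross - total}) does not equal net pay (${net})`);
  }

  // Mirrors the prompt's "round numbers only" red flag.
  const amounts = [gross, total, net].filter((v): v is number => v !== null);
  if (amounts.length > 0 && amounts.every((v) => v % 1000 === 0)) {
    flags.push("All totals are round thousands");
  }
  return { meetsThreshold, flags };
}

// The model can reject or add flags, but it can never approve below the threshold.
function finalDecision(llmApproved: boolean, rules: RuleResult): boolean {
  return llmApproved && rules.meetsThreshold;
}
```

In `verifySalary`, you would call `ruleChecks(slip, req.requiredMonthlyIncome)` after extraction, merge its flags with the model's, and return `finalDecision(decision.approved, rules)` as `approved`.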

    Handling Multiple Payslips

    Lenders typically ask for 3 months of payslips to smooth out variable pay. Process them concurrently and aggregate:

    async function verifyMultiplePayslips(
      buffers: Buffer[],
      requiredMonthlyIncome: number,
    ): Promise<{
      averageNetPay: number;
      consistent: boolean;
      meetsRequirement: boolean;
      slips: unknown[];
    }> {
      const extractions = await Promise.all(
        buffers.map((buf) =>
          lekha.extract({ document: buf, documentType: "salary_slip" }),
        ),
      );

      const slips = extractions.filter((e) => e.success).map((e) => e.data);
      if (slips.length === 0) {
        throw new Error("No payslips could be extracted");
      }

      const netPays = slips.map((s) => s.net_pay);
      const average = netPays.reduce((a, b) => a + b, 0) / netPays.length;

      // Consistent only if every month is within 20% of the average
      const consistent = netPays.every(
        (p) => Math.abs(p - average) / average < 0.2,
      );

      return {
        averageNetPay: average,
        consistent,
        meetsRequirement: average >= requiredMonthlyIncome,
        slips,
      };
    }
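Averaging only makes sense if the slips come from the same employer and cover consecutive months. The sketch below is a hypothetical helper, not part of the SDK, that checks both using the `employer.name` and `pay_period` fields from the example JSON; it assumes month names arrive in English, as they do in that example.

```typescript
// Assumed subset of the extracted slip, matching the example JSON's field names.
interface SlipPeriod {
  employer: { name: string };
  pay_period: { month: string; year: number };
}

const MONTHS = [
  "january", "february", "march", "april", "may", "june",
  "july", "august", "september", "october", "november", "december",
];

// Converts a pay period into one month index (year * 12 + month) so periods compare as numbers.
function monthIndex(p: SlipPeriod["pay_period"]): number {
  const m = MONTHS.indexOf(p.month.toLowerCase());
  if (m === -1) throw new Error(`Unrecognised month: ${p.month}`);
  return p.year * 12 + m;
}

// Returns problems found; an empty array means one employer and consecutive months.
function checkPayslipSeries(slips: SlipPeriod[]): string[] {
  const problems: string[] = [];
  const employers = new Set(slips.map((s) => s.employer.name.trim().toLowerCase()));
  if (employers.size > 1) {
    problems.push(`Multiple employers: ${Array.from(employers).join(", ")}`);
  }

  const indices = slips.map((s) => monthIndex(s.pay_period)).sort((a, b) => a - b);
  for (let i = 1; i < indices.length; i++) {
    const gap = indices[i] - indices[i - 1];
    if (gap === 0) problems.push("Duplicate pay period submitted");
    else if (gap > 1) problems.push(`Gap of ${gap - 1} month(s) between slips`);
  }
  return problems;
}
```

A second employer is not automatically suspicious (the multi-employer case listed earlier is legitimate), so treat that result as a prompt for review rather than a rejection.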

    Common Edge Cases

    Password-protected PDFs — Lekha handles decryption automatically when you pass the password:
    const result = await lekha.extract({
      document: buffer,
      documentType: "salary_slip",
      password: "DOB_DDMMYYYY", // many payroll systems use DOB as password
    });
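The `DOB_DDMMYYYY` value above is a placeholder. If the applicant's payroll system does use the date-of-birth convention the comment describes, a small helper can build the password from a stored date. This is a sketch under that assumption, and `dobPassword` is a hypothetical name; conventions vary by payroll system, so confirm the format first. Rearranging the digits of a `YYYY-MM-DD` string, rather than parsing it with `new Date(...)`, avoids time-zone surprises, because date-only ISO strings are parsed as UTC midnight.

```typescript
// Turns an ISO date of birth (YYYY-MM-DD) into the DDMMYYYY form,
// e.g. "1990-05-07" -> "07051990". Assumes the payroll system uses this convention.
function dobPassword(isoDate: string): string {
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(isoDate);
  if (!match) throw new Error(`Expected YYYY-MM-DD, got ${isoDate}`);
  const [, yyyy, mm, dd] = match;
  return `${dd}${mm}${yyyy}`;
}
```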
    
    Scanned image payslips — pass a PNG or JPG directly. Lekha's vision model reads it the same as a native PDF:
    const result = await lekha.extract({
      document: readFileSync("scanned-payslip.jpg"),
      mimeType: "image/jpeg",
    });
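When users upload a mix of files, you may not know the type up front. One option, sketched here, is to sniff the file's leading "magic bytes" and pass the result as `mimeType`. The `sniffMimeType` helper is hypothetical, and it only recognises the PDF, PNG, and JPEG signatures.

```typescript
// Identifies common payslip formats from their file signatures ("magic bytes").
// Returns null for anything else so the caller can reject or fall back.
function sniffMimeType(bytes: Uint8Array): "application/pdf" | "image/png" | "image/jpeg" | null {
  const startsWith = (sig: number[]) =>
    bytes.length >= sig.length && sig.every((b, i) => bytes[i] === b);

  if (startsWith([0x25, 0x50, 0x44, 0x46, 0x2d])) return "application/pdf"; // "%PDF-"
  if (startsWith([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return "image/png";
  if (startsWith([0xff, 0xd8, 0xff])) return "image/jpeg";
  return null;
}
```

Node's `Buffer` is a `Uint8Array` subclass, so the result of `readFileSync` can be passed straight in. The article only shows `mimeType: "image/jpeg"`; whether the SDK also expects `mimeType` for PDFs or PNGs is an assumption to check against the API reference.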
    
    Government payslips (HRMS/SPARROW) — these have Hindi labels and a very different layout. Lekha normalises them into the same output schema.

    Integrating with Your Existing Stack

    If you're already using a database or workflow engine, slot Lekha in as a pre-processing step. Full API reference at lekhadev.com/docs.

    // Example: save structured data to your database after extraction
    const extraction = await lekha.extract({
      document: uploadedFile.buffer,
      documentType: "salary_slip",
    });
    

    if (extraction.success) {
      // db, applicationId, and uploadedFile come from your own app (e.g. an ORM client and request context)
      await db.applications.update({
        where: { id: applicationId },
        data: {
          verifiedNetPay: extraction.data.net_pay,
          employerName: extraction.data.employer.name,
          salaryVerifiedAt: new Date(),
        },
      });
    }

    Lekha processes everything in memory and writes no document to disk, which simplifies compliance with India's Digital Personal Data Protection (DPDP) Act, 2023. See our DPDP compliance guide for details.

    FAQ

    Which salary slip formats does Lekha support?

    Lekha supports all major Indian payroll software outputs: Keka, Razorpay Payroll, GreytHR, Zoho Payroll, Darwinbox, SAP HR, Oracle HRMS, and government HRMS portals (SPARROW, IFMS). It also handles manually created PDFs and scanned images from any source.

    Can Lekha extract CTC breakdown, not just net pay?

    Yes. When the payslip includes an annual CTC section, Lekha extracts ctc_annual, ctc_monthly, and all components (employer EPF, gratuity provision, insurance, etc.) separately from the monthly take-home.
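To make this concrete with the sample slip from earlier: a `ctc_annual` of ₹19,32,000 works out to ₹1,61,000 a month (19,32,000 / 12), which is ₹19,000 more than the ₹1,42,000 monthly gross. That difference is the employer-side cost that never appears under earnings, presumably items such as employer EPF and gratuity provision. The `ctcBreakdown` helper below is a hypothetical illustration, not an SDK function.

```typescript
// Compares annual CTC with monthly gross pay. Inputs are plain rupee amounts.
function ctcBreakdown(ctcAnnual: number, monthlyGross: number) {
  const ctcMonthly = ctcAnnual / 12;
  const employerSideMonthly = ctcMonthly - monthlyGross; // cost not paid out as gross
  return {
    ctcMonthly,
    employerSideMonthly,
    // A negative gap suggests a misread CTC or a slip from a different period.
    plausible: employerSideMonthly >= 0,
  };
}
```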

    What accuracy does Lekha achieve on salary slips?

    On native PDFs (digitally generated), accuracy exceeds 99% for structured fields like net pay, gross, and standard deductions. On scanned images, accuracy is 95–97% depending on scan quality.

    How do I handle a payslip where some fields are missing?

    Lekha returns null for fields that aren't present in the document rather than guessing. Your downstream code can check for null and prompt the user to upload a more complete document if required fields are missing.
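A minimal sketch of that check, assuming missing fields come back as `null` as described. The `REQUIRED_FIELDS` list and the `missingFields` helper are our own illustration, using dotted paths into the example JSON; pick the fields your own policy needs.

```typescript
// Dotted paths into the extracted JSON that this example treats as mandatory.
const REQUIRED_FIELDS = ["employee.name", "employer.name", "pay_period.month", "pay_period.year", "net_pay"];

// Walks a dotted path like "employer.name"; returns undefined if any step is absent.
function getPath(obj: unknown, path: string): unknown {
  return path.split(".").reduce<unknown>((node, key) => {
    if (node === null || typeof node !== "object") return undefined;
    return (node as Record<string, unknown>)[key];
  }, obj);
}

// Returns the required paths whose value is null or absent.
function missingFields(data: unknown, required: string[] = REQUIRED_FIELDS): string[] {
  return required.filter((path) => {
    const value = getPath(data, path);
    return value === null || value === undefined;
  });
}
```

If the returned list is non-empty, ask the applicant for a clearer or more complete slip before running verification.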


    Ready to automate salary verification in your lending or KYC workflow? Sign up at lekhadev.com and get 100 free extractions — no credit card required. The API is live in under five minutes.