Introduction

India's credit market has undergone a structural transformation over the last decade. The rise of digital lending, account aggregator-enabled data flows, co-lending partnerships, and embedded finance has expanded credit access at a pace that would have seemed improbable in 2015. But with scale comes consequence. Unsecured retail loans now constitute a meaningful share of bank and NBFC books, BNPL penetration is deepening across tier-2 and tier-3 markets, and the RBI has signalled clearly that growth without rigorous risk frameworks is not a strategy it will tolerate.

Against this backdrop, credit risk modelling has moved from a back-office compliance exercise to a front-line strategic capability. Institutions that can accurately estimate default probability, quantify potential losses, and price risk at the portfolio level are not just better positioned for regulatory examinations. They lend more profitably, collect more efficiently, and allocate capital more precisely than their peers.

This article provides a definitive, operationally grounded guide to credit risk modelling in India: how PD, LGD, and EAD frameworks work in practice, what Basel III and RBI norms demand of Indian institutions, how machine learning is reshaping underwriting, and where implementation gaps persist across banks, NBFCs, and fintech lenders.

What Is Credit Risk Modelling?

Credit risk modelling is the systematic quantification of the likelihood that a borrower will fail to meet their contractual debt obligations, along with the financial impact of that failure on the lending institution.

At its most fundamental, a credit risk model takes observable information about a borrower (demographics, bureau history, cashflow signals, repayment behaviour) and converts it into structured risk estimates. These estimates inform four interconnected decisions: underwriting eligibility, loan pricing, provisioning under expected credit loss frameworks, and regulatory capital allocation.

The tools have evolved considerably. Early models relied on judgmental scorecards and rule-based cutoffs. Over time, logistic regression became the statistical workhorse of credit risk analytics, offering interpretability alongside predictive power. Today, gradient boosted trees, ensemble models, and neural networks are entering production environments, though their adoption in regulated lending still requires careful governance.

Regardless of methodology, all credit risk models serve the same purpose: to separate borrowers who will repay from those who will not, and to estimate the financial cost of being wrong.

Why Credit Risk Modelling Matters for Indian Banks and NBFCs

The stakes of poor credit risk models in India are not abstract. The RBI's supervisory focus on unsecured retail lending, its 2023 risk weight increases on consumer credit, and its ongoing scrutiny of digital lending practices have all underscored one message: institutions need to know what risk they are carrying, expressed in quantitative terms.

Several structural forces are raising the bar simultaneously.

Unsecured lending growth. Personal loans and credit cards now account for a disproportionate share of incremental credit growth. Without collateral backstops, PD estimation accuracy directly determines provisioning adequacy and profitability.

BNPL and embedded finance. New credit products embedded in merchant flows or consumer apps generate thin credit histories but real default risk. Models calibrated on traditional bureau populations often perform poorly on these segments.

Financial inclusion pressures. New-to-credit (NTC) and thin-file borrowers represent a significant portion of India's addressable credit market. They are systematically underserved by bureau-centric models. Alternative data modelling is not just an innovation play; it is a business necessity.

Co-lending and portfolio risk transfer. Under co-lending arrangements between banks and NBFCs, risk is shared across balance sheets. Each partner needs credible, comparable risk estimates, which raises the minimum bar for model quality across the ecosystem.

Collections and portfolio monitoring. Risk models do not stop at origination. Behavioural scoring at the portfolio level drives collections prioritisation, limit management, and early warning systems, all of which materially affect recoveries and credit costs.

Strategic Insight: Institutions that treat credit risk modelling as a regulatory checkbox, rather than a portfolio intelligence capability, consistently underperform on risk-adjusted returns across credit cycles.

The Three Core Credit Risk Models: PD, LGD, and EAD

The Basel framework organises credit risk into three components, each corresponding to a distinct estimation problem. Multiplied together, they give expected loss: EL = PD × LGD × EAD. The same product, computed over the appropriate horizon, underpins Expected Credit Loss (ECL) provisioning under Ind AS 109 and IFRS 9.
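As a quick worked example, the product can be computed directly. The inputs are hypothetical: a 2% PD, a 40% LGD and an EAD of Rs 10 lakh give an expected loss of Rs 8,000.

```python
def expected_loss(prob_default: float, lgd: float, ead: float) -> float:
    """Expected loss = PD x LGD x EAD.

    prob_default and lgd are fractions in [0, 1]; ead is in rupees.
    """
    if not (0.0 <= prob_default <= 1.0 and 0.0 <= lgd <= 1.0):
        raise ValueError("PD and LGD must be fractions between 0 and 1")
    if ead < 0:
        raise ValueError("EAD cannot be negative")
    return prob_default * lgd * ead


# Hypothetical loan: 2% PD, 40% LGD, Rs 10 lakh exposure
loss = expected_loss(prob_default=0.02, lgd=0.40, ead=1_000_000)
print(f"Expected loss: Rs {loss:,.0f}")
```

In practice each input is itself a model output, and the horizon over which PD is measured depends on the ECL stage of the loan.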

Figure: Key components of credit risk modelling, showing the PD, LGD, and EAD framework used in modern credit risk modelling and banking analytics.

Probability of Default (PD)

A PD model estimates the probability that a borrower will default within a defined time window, typically 12 months for Ind AS / IFRS 9 Stage 1 assessments, or over the loan lifetime for Stage 2 and Stage 3.

How PD models are built

The standard approach begins with logistic regression. A binary outcome variable (default / no default) is modelled as a function of predictor variables: bureau score, repayment history, credit utilisation, age of credit, loan-to-income ratio, and product-specific features. The logistic regression outputs a probability, which is then calibrated against observed default rates.
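The fitting step can be sketched in plain Python. This is a minimal illustration, not a production scorecard: the two predictors (credit utilisation and count of recent delinquencies) and the 16-account sample are hypothetical, and the model is fitted by Newton-Raphson (iteratively reweighted least squares), the classic way of computing the maximum-likelihood logistic regression estimate.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting (small systems only)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic_pd(features, defaults, iterations=25):
    """Maximum-likelihood logistic regression fitted by Newton-Raphson (IRLS).

    Returns [intercept, coef_1, coef_2, ...] on the log-odds scale.
    """
    rows = [[1.0] + list(x) for x in features]  # prepend an intercept column
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(rows, defaults):
            p = sigmoid(sum(b * v for b, v in zip(beta, xi)))
            w = p * (1.0 - p)
            for j in range(k):
                grad[j] += (yi - p) * xi[j]
                for c in range(k):
                    hess[j][c] += w * xi[j] * xi[c]
        step = solve_linear(hess, grad)  # Newton step: (X'WX)^-1 X'(y - p)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


def predict_pd(beta, x):
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))


# Hypothetical development sample: (credit utilisation, delinquencies in last 12 months)
X = [(0.10, 0), (0.20, 0), (0.25, 1), (0.30, 0), (0.40, 0), (0.45, 2), (0.50, 0), (0.55, 1),
     (0.60, 0), (0.70, 2), (0.75, 0), (0.80, 1), (0.85, 3), (0.90, 1), (0.95, 0), (0.35, 3)]
y = [0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1]

beta = fit_logistic_pd(X, y)
pds = [predict_pd(beta, x) for x in X]
print(f"mean predicted PD {sum(pds) / len(pds):.3f} vs observed default rate {sum(y) / len(y):.3f}")
```

Because the model includes an intercept, the maximum-likelihood fit forces the mean predicted PD to equal the observed default rate on the development sample. Calibration in practice goes further, comparing predicted and observed rates band by band, usually on more recent data.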

Before variable selection, practitioners typically apply Weight of Evidence (WOE) transformation and Information Value (IV) analysis. WOE converts raw predictor variables into a monotonic scale aligned with default risk, while IV ranks variables by their discriminatory power. Variables with IV below 0.02 are generally considered weak predictors; those above 0.30 are strong candidates for inclusion. This step is fundamental to scorecard development and is often underemphasised in model documentation.
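A minimal sketch of the WOE and IV calculation for one binned variable follows, using the common convention WOE = ln(share of goods / share of bads). Some teams invert the ratio, which flips the sign of WOE but leaves IV unchanged. The utilisation bands and counts are hypothetical, flooring empty cells at 0.5 is one of several conventions for avoiding log(0), and labelling the range between the two thresholds "moderate" is an assumption.

```python
import math


def woe_iv(bins):
    """bins: list of (label, good_count, bad_count) for one binned predictor.

    Returns ([(label, woe), ...], information_value).
    WOE = ln(share of goods / share of bads): positive WOE means lower risk than average.
    """
    total_good = sum(good for _, good, _ in bins)
    total_bad = sum(bad for _, _, bad in bins)
    woe_table, iv = [], 0.0
    for label, good, bad in bins:
        # Floor empty cells at 0.5 so the logarithm stays finite
        share_good = max(good, 0.5) / total_good
        share_bad = max(bad, 0.5) / total_bad
        woe = math.log(share_good / share_bad)
        iv += (share_good - share_bad) * woe
        woe_table.append((label, woe))
    return woe_table, iv


def iv_strength(iv):
    """Rule-of-thumb bands from the text; 'moderate' for the middle range is an assumed label."""
    if iv < 0.02:
        return "weak"
    if iv > 0.30:
        return "strong"
    return "moderate"


# Hypothetical credit-utilisation bands: (band, non-defaulters, defaulters)
utilisation_bins = [("0-30%", 4000, 80), ("30-60%", 3000, 150), ("60-90%", 1500, 180), ("90%+", 500, 90)]
table, iv = woe_iv(utilisation_bins)
for label, woe in table:
    print(f"{label:>7}  WOE {woe:+.3f}")
print(f"IV {iv:.3f} ({iv_strength(iv)})")
```

The steady fall in WOE from low to high utilisation is the monotonic pattern a scorecard developer looks for before using the WOE values as model inputs.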

Modern implementations frequently replace or augment logistic regression with gradient boosted trees (XGBoost, LightGBM), which handle non-linear feature interactions more naturally. However, the interpretability requirements under RBI's model risk management expectations, and under consumer-facing fairness considerations, make pure black-box approaches difficult to deploy at scale in regulated Indian lending.

Bureau-based vs behavioural scoring

Origination PD models draw primarily on CIBIL, Experian, or CRIF data: existing trade lines, enquiry patterns, delinquency history, and credit age. The CIBIL score is a widely used input variable, though sophisticated lenders build proprietary PD models using raw bureau attributes rather than treating the bureau score as a single composite variable.

Behavioural PD models operate at the portfolio level, updating default probability estimates on the basis of observed repayment behaviour, utilisation changes, and product interactions. These models drive limit management, collections tiering, and early warning triggers for accounts showing stress signals before formal delinquency.

Through-the-cycle vs point-in-time PD

A through-the-cycle (TTC) PD captures average default rates across an economic cycle. It is more stable and better suited to capital planning. A point-in-time (PIT) PD incorporates current macroeconomic conditions, making it more volatile but more accurate for provisioning under ECL frameworks. Institutions reporting under Ind AS are expected to use PIT PDs for their ECL calculations, introducing a procyclical element that requires careful governance and a sound macroeconomic overlay methodology.
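One common way to move between the two views is the one-factor Vasicek model, which also underlies the Basel IRB formula. It is swapped in here for illustration rather than taken from any specific institution's methodology. The sketch conditions a TTC PD on a standardised macro stress index. The asset correlation of 0.15 and the sign convention (positive index means worse conditions) are assumptions.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def pit_pd(ttc_pd: float, stress_index: float, rho: float = 0.15) -> float:
    """Point-in-time PD conditional on a systematic factor (one-factor Vasicek model).

    stress_index: standardised macro factor; positive values mean worse-than-average
    conditions (assumed sign convention). rho: asset correlation (hypothetical value).
    """
    if not 0.0 < ttc_pd < 1.0:
        raise ValueError("ttc_pd must be strictly between 0 and 1")
    threshold = STD_NORMAL.inv_cdf(ttc_pd)
    return STD_NORMAL.cdf((threshold + rho ** 0.5 * stress_index) / (1.0 - rho) ** 0.5)


def cycle_average(ttc_pd: float, rho: float = 0.15, step: float = 0.01) -> float:
    """Average the PIT PD over a standard-normal stress index (simple Riemann sum)."""
    total, z = 0.0, -8.0
    while z <= 8.0:
        total += pit_pd(ttc_pd, z, rho) * STD_NORMAL.pdf(z) * step
        z += step
    return total


for label, z in [("benign", -1.5), ("average", 0.0), ("stressed", 1.5)]:
    print(f"{label:>8}: PIT PD {pit_pd(0.03, z):.2%}")
print(f"cycle average: {cycle_average(0.03):.2%}")
```

At an average stress index the conditional PD sits below the 3% TTC PD, because the TTC figure is the mean of a skewed distribution of PIT outcomes. Averaging the PIT PD over the whole cycle brings it back to 3%.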

Validation metrics: Gini coefficient and KS statistic

The Gini coefficient and KS (Kolmogorov-Smirnov) statistic are the primary discrimination metrics in Indian scorecard practice. A Gini above 0.45 is typically considered acceptable for retail models; values above 0.55 indicate strong model performance. The KS statistic measures the maximum separation between the cumulative default and non-default distributions across score bands. Both metrics should be computed on out-of-time validation samples, not just the development dataset, to produce a credible assessment of live model performance.
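Both metrics are straightforward to compute from a scored validation sample. The sketch assumes higher scores mean higher risk (for example, a predicted PD) and uses Gini = 2 × AUC - 1. The pairwise AUC loop is quadratic, so production code would use a rank-based or library implementation. The six scored accounts are hypothetical.

```python
def auc(scores, defaults):
    """Share of (defaulter, non-defaulter) pairs where the defaulter scores riskier; ties count half."""
    bad = [s for s, d in zip(scores, defaults) if d == 1]
    good = [s for s, d in zip(scores, defaults) if d == 0]
    wins = 0.0
    for b in bad:
        for g in good:
            if b > g:
                wins += 1.0
            elif b == g:
                wins += 0.5
    return wins / (len(bad) * len(good))


def gini(scores, defaults):
    return 2.0 * auc(scores, defaults) - 1.0


def ks_statistic(scores, defaults):
    """Maximum gap between the cumulative shares of defaulters and non-defaulters
    captured as the score cut-off moves from riskiest to safest."""
    n_bad = sum(defaults)
    n_good = len(defaults) - n_bad
    ranked = sorted(zip(scores, defaults), key=lambda pair: pair[0], reverse=True)
    cum_bad = cum_good = 0
    ks, i = 0.0, 0
    while i < len(ranked):
        score = ranked[i][0]
        # Move the cut-off past all accounts tied at this score together
        while i < len(ranked) and ranked[i][0] == score:
            if ranked[i][1] == 1:
                cum_bad += 1
            else:
                cum_good += 1
            i += 1
        ks = max(ks, abs(cum_bad / n_bad - cum_good / n_good))
    return ks


# Hypothetical out-of-time sample: predicted PDs and observed default flags
scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]
defaults = [1, 0, 1, 0, 0, 0]
print(f"Gini {gini(scores, defaults):.2f}, KS {ks_statistic(scores, defaults):.2f}")
```

Running the same functions on the development sample and the out-of-time sample gives the comparison the text calls for.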

Loss Given Default (LGD)

LGD represents the fraction of the exposure that the lender ultimately loses after accounting for recoveries, collateral liquidation, and recovery costs. If EAD is Rs 10 lakh and the lender eventually recovers Rs 6 lakh, LGD is 40%.

LGD is product-specific and depends heavily on collateral type, legal enforcement speed, and recovery infrastructure.

Figure: Loss Given Default comparison across loan types, showing typical LGD ranges and key risk drivers across retail and MSME loan products.

LGD calculation approaches in India

Most Indian institutions estimate LGD using workout LGD methodology: tracking the actual recovery experience on defaulted loans over time, discounting cash flows back to the default date, and computing the average loss rate across cohorts. This requires multi-year default and recovery data, which presents a genuine data availability challenge for newer institutions and fintech lenders without legacy portfolios.
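A minimal sketch of workout LGD follows. It assumes recoveries and recovery costs are recorded as (years after default, amount) pairs and discounted at a single annual rate. The rate, the cash flows, and the exposure-weighted cohort average are illustrative choices rather than a prescribed method. The first loan reuses the Rs 10 lakh / Rs 6 lakh example above: received immediately, the recovery gives the 40% LGD quoted, but received three years later at a 10% discount rate, it gives an LGD of roughly 55%.

```python
def workout_lgd(ead, recoveries, costs, annual_rate):
    """LGD for one defaulted loan from its realised workout cash flows.

    recoveries / costs: lists of (years_after_default, amount_in_rupees).
    Cash flows are discounted back to the default date; the result is clamped to [0, 1].
    """
    def pv(flows):
        return sum(amount / (1 + annual_rate) ** years for years, amount in flows)

    net_recovery = pv(recoveries) - pv(costs)
    return min(max(1.0 - net_recovery / ead, 0.0), 1.0)


def cohort_lgd(loans, annual_rate):
    """Exposure-weighted average LGD across a cohort of (ead, recoveries, costs) records."""
    total_ead = sum(ead for ead, _, _ in loans)
    total_loss = sum(workout_lgd(ead, rec, cost, annual_rate) * ead for ead, rec, cost in loans)
    return total_loss / total_ead


# Hypothetical defaulted loans: (EAD, recoveries, recovery costs)
cohort = [
    (1_000_000, [(3.0, 600_000)], []),
    (500_000, [(1.0, 150_000), (2.0, 100_000)], [(0.5, 20_000)]),
    (250_000, [], [(1.0, 5_000)]),  # nothing recovered; costs push loss above EAD, clamped to 100%
]
print(f"Immediate recovery: {workout_lgd(1_000_000, [(0.0, 600_000)], [], 0.10):.1%}")
print(f"Same recovery after 3 years: {workout_lgd(*cohort[0], 0.10):.1%}")
print(f"Cohort LGD: {cohort_lgd(cohort, 0.10):.1%}")
```

Weighting by exposure makes the cohort figure a loss rate per rupee defaulted; averaging per default instead is another convention, and the two can differ materially when large and small exposures recover differently.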

Where historical recovery data is limited, market LGD or implied market LGD approaches provide proxies using secondary market prices for distressed debt, though India's secondary distressed debt market remains thin relative to more developed credit markets.

Haircuts applied to collateral valuations must account for forced-sale discounts, legal costs, and time-to-realisation. For Indian mortgage portfolios, the gap between market value and realised recovery value is often wider than international benchmarks due to SARFAESI timelines and state-level variance in enforcement effectiveness.
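The haircut logic can be sketched for a single secured exposure. Every input is a hypothetical assumption: a 25% forced-sale discount, Rs 2 lakh of legal costs and a 10% discount rate. The point is how much time-to-realisation alone moves LGD. With these inputs, LGD rises from about 11% when the property is sold after two years to about 33% when enforcement takes five.

```python
def collateral_lgd(ead, market_value, forced_sale_discount, legal_costs,
                   years_to_realise, discount_rate):
    """LGD for a secured exposure recovered through collateral sale.

    Net realisation = market value after the forced-sale haircut, minus legal and
    enforcement costs, discounted back to the default date. Recovery is capped at the exposure.
    """
    net_realisation = market_value * (1.0 - forced_sale_discount) - legal_costs
    pv_recovery = max(net_realisation, 0.0) / (1.0 + discount_rate) ** years_to_realise
    return 1.0 - min(pv_recovery, ead) / ead


# Hypothetical home loan: Rs 40 lakh outstanding against a property valued at Rs 60 lakh
for years in (2, 5):
    lgd = collateral_lgd(ead=4_000_000, market_value=6_000_000, forced_sale_discount=0.25,
                         legal_costs=200_000, years_to_realise=years, discount_rate=0.10)
    print(f"{years} years to realise: LGD {lgd:.1%}")
```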

Recovery timeline modelling

A frequently overlooked element of LGD modelling is the time value of money impact on recovery cash flows. Recoveries that arrive three years post-default are worth significantly less than face value when discounted back to the default date. Indian lenders with large NPA portfolios should reflect realistic recovery timelines in their LGD estimates, particularly for segments where legal enforcement is slow.

Exposure at Default (EAD)

EAD represents the total outstanding amount owed by a borrower at the moment of default. For term loans, EAD is relatively straightforward: scheduled amortisation reduces balance predictably, and EAD can be estimated from the repayment schedule. The complexity arises with revolving and contingent exposures.

For credit cards, overdraft facilities, working capital lines, and BNPL accounts, utilisation at default is not fixed. Borrowers under financial stress tend to draw down more of their available limits before defaulting, a well-documented behaviour that means EAD is typically higher than the current drawn balance at any point in time.

Credit conversion factors (CCF)

Regulatory frameworks assign CCFs by commitment type. Undrawn commitments that the lender can cancel unconditionally attract a low CCF (0% under the Basel II standardised approach and 10% under the revised Basel III standardised approach), while committed undrawn facilities attract a much higher one (for example, 75% under the Basel II foundation IRB approach). Internal models for EAD estimate CCFs empirically based on observed drawdown behaviour in the 12 months before default. For Indian credit card and working capital portfolios, this behavioural analysis consistently shows higher pre-default utilisation spikes than static CCF assumptions would imply.
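The two calculations, applying a CCF and estimating one from default history, can be sketched as follows. The 12-month reference window follows the text. The card limits and balances are hypothetical, and clipping realised CCFs to [0, 1] before averaging is a common but not universal modelling choice.

```python
def ead_with_ccf(drawn, limit, ccf):
    """EAD = current drawn balance + CCF x undrawn limit."""
    return drawn + ccf * max(limit - drawn, 0.0)


def realised_ccf(drawn_at_ref, limit_at_ref, balance_at_default):
    """Share of the undrawn limit at the reference date (e.g. 12 months before default)
    that had been drawn by the default date. Undefined if the line was fully drawn."""
    undrawn = limit_at_ref - drawn_at_ref
    if undrawn <= 0:
        return None
    return (balance_at_default - drawn_at_ref) / undrawn


def estimate_ccf(history):
    """Average realised CCF over defaulted accounts, each clipped to [0, 1]."""
    ccfs = [realised_ccf(*record) for record in history]
    ccfs = [min(max(c, 0.0), 1.0) for c in ccfs if c is not None]
    return sum(ccfs) / len(ccfs)


# Hypothetical defaulted card accounts: (drawn 12m before default, limit, balance at default)
history = [(40_000, 100_000, 85_000), (10_000, 50_000, 48_000), (75_000, 75_000, 76_000),
           (20_000, 80_000, 35_000)]
ccf = estimate_ccf(history)
print(f"Estimated CCF: {ccf:.0%}")
print(f"EAD for a live card (Rs 30,000 drawn, Rs 1 lakh limit): Rs {ead_with_ccf(30_000, 100_000, ccf):,.0f}")
```

The third account was fully drawn at the reference date, so it carries no CCF information and is excluded. Accounts like it are usually modelled separately, since their EAD risk comes from overlimit balances and accrued interest rather than undrawn limits.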

For Indian retail and MSME lenders, EAD modelling tends to be underinvested relative to PD modelling. Yet EAD estimation errors directly affect provisioning accuracy and capital requirements, particularly for institutions with large revolving credit or working capital portfolios.

Basel III and RBI's Regulatory Framework for India

India has adopted Basel III capital requirements through a phased RBI implementation, benchmarked to BIS timelines. The regulatory framework creates direct and material linkages between credit risk model quality and capital adequacy requirements.

Standardised Approach vs Internal Ratings-Based Approach

Most Indian banks currently operate under the Standardised Approach (SA), where risk weights are assigned by asset class and credit quality using external ratings and regulatory tables rather than institution-specific internal models. This limits the capital efficiency benefit of superior internal models but significantly reduces the validation and governance burden.

The Internal Ratings-Based Approach (IRBA) allows qualifying institutions to use their own PD, LGD, and EAD estimates for regulatory capital calculation. Under the Foundation IRBA (F-IRBA), banks supply their own PD estimates while using regulatory-prescribed LGD and EAD values. The Advanced IRBA (A-IRBA) permits internal estimation of all three components.

No Indian bank has yet received RBI approval for full A-IRBA adoption. The data depth, model governance infrastructure, and validation frameworks required are substantial. RBI has been measured in its guidance on the approval pathway, reflecting both the complexity involved and a preference for ensuring institutional readiness before permitting capital relief based on internal models.

ECL under Ind AS 109

India's Ind AS 109, aligned with IFRS 9, introduced a forward-looking expected credit loss provisioning framework for large listed entities. It replaced the legacy incurred-loss model, which recognised losses only after evidence of impairment, with a point-in-time approach that provisions for losses expected over a 12-month or lifetime horizon. Under ECL, loans are classified into three stages:

Stage 1: No significant increase in credit risk since origination. Provision equals 12-month ECL.
Stage 2: Significant increase in credit risk since origination. Provision equals lifetime ECL.
Stage 3: Credit-impaired. Provision equals lifetime ECL with individually assessed LGD.
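A simplified sketch of how the stage changes the provision for the same loan is shown below. It assumes a constant annual PD, default at the end of each year, an amortising exposure schedule, discounting at the effective interest rate (EIR), and, for Stage 3, a loss equal to LGD times current exposure. The same PD is used in every stage to isolate the horizon effect. In practice a Stage 2 loan's PD will have risen, which is what triggered the transfer. All figures are hypothetical.

```python
def ecl_by_stage(stage, annual_pd, lgd, ead_schedule, eir):
    """Simplified ECL for one loan.

    ead_schedule: expected exposure in each remaining year of the loan (rupees).
    Stage 1 -> 12-month ECL; Stage 2 -> lifetime ECL; Stage 3 -> already in default.
    """
    if stage == 3:
        return lgd * ead_schedule[0]  # credit-impaired: PD is effectively 100%
    horizon = 1 if stage == 1 else len(ead_schedule)
    survival, total = 1.0, 0.0
    for year in range(horizon):
        marginal_pd = survival * annual_pd  # default in this year, given survival so far
        total += marginal_pd * lgd * ead_schedule[year] / (1.0 + eir) ** (year + 1)
        survival *= 1.0 - annual_pd
    return total


# Hypothetical 5-year personal loan: 3% annual PD, 45% LGD, 12% EIR
schedule = [1_000_000, 800_000, 600_000, 400_000, 200_000]
for stage in (1, 2, 3):
    print(f"Stage {stage}: ECL Rs {ecl_by_stage(stage, 0.03, 0.45, schedule, 0.12):,.0f}")
```

Moving from Stage 1 to Stage 2 extends the horizon from one year to the remaining life, which is why staging decisions have such a visible effect on provisions.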

This staging framework requires institutions to maintain live, calibrated PD, LGD, and EAD models as operational inputs to quarterly provisioning, not merely for periodic regulatory reporting. The quality of these models directly affects reported profitability and capital ratios.

For NBFCs transitioning to Ind AS, the shift from incurred-loss to ECL provisioning often increases provisioning buffers materially in the first implementation year, requiring upfront capital planning and board-level expectation management.

Capital adequacy implications

Under Basel III, credit risk-weighted assets (RWA) directly determine the minimum capital that institutions must hold. Better model calibration under IRBA leads to more risk-sensitive RWA, which can reduce capital requirements for high-quality portfolios and increase them for risky ones. For Indian banks aspiring to IRBA qualification, the business case is not purely regulatory. It is also about precision in capital deployment.
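To see how internal estimates feed capital under IRBA, the sketch below implements the Basel IRB risk-weight function for "other retail" exposures: asset correlation between 3% and 16% depending on PD, a 99.9% confidence level, and RWA = 12.5 × K × EAD. It is illustrative only, since Indian banks largely operate under the Standardised Approach. The inputs are hypothetical, and the 0.03% PD floor is the Basel II value.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def other_retail_correlation(pd):
    """Basel IRB asset correlation for 'other retail': 16% at low PD, falling towards 3%."""
    weight = (1.0 - math.exp(-35.0 * pd)) / (1.0 - math.exp(-35.0))
    return 0.03 * weight + 0.16 * (1.0 - weight)


def irb_retail_rwa(pd, lgd, ead, pd_floor=0.0003):
    """Risk-weighted assets for one retail exposure under the IRB formula."""
    pd = max(pd, pd_floor)
    r = other_retail_correlation(pd)
    stressed_pd = STD_NORMAL.cdf(
        (STD_NORMAL.inv_cdf(pd) + math.sqrt(r) * STD_NORMAL.inv_cdf(0.999)) / math.sqrt(1.0 - r)
    )
    capital_k = lgd * (stressed_pd - pd)  # unexpected loss per rupee of EAD
    return 12.5 * capital_k * ead


# Hypothetical Rs 10 lakh exposures at 45% LGD, differing only in PD
for pd in (0.01, 0.03, 0.10):
    rwa = irb_retail_rwa(pd, 0.45, 1_000_000)
    print(f"PD {pd:.0%}: RWA Rs {rwa:,.0f} (risk weight {rwa / 1_000_000:.0%})")
```

The capital charge covers unexpected loss only (the stressed PD minus the expected PD), which is why accurate PD and LGD calibration translates directly into capital precision.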

Machine Learning in Credit Risk Modelling

Machine learning is not replacing traditional credit risk models in Indian banking uniformly. In most bank environments, it is augmenting them. In progressive NBFC and fintech contexts, it is supplanting them. The distinction matters for how institutions approach governance and validation.

Where ML adds measurable value

Traditional scorecards perform well on populations with rich bureau histories and relatively stable feature distributions. They struggle with feature interactions, non-linearity, and populations that fall outside their training distribution. ML models, particularly gradient boosted trees, handle these challenges more naturally.

Research by S&P Global and practitioners across the lending industry consistently shows that ensemble methods deliver Gini improvements of 5 to 15 percentage points over logistic regression on comparable datasets, with the gain concentrated in thin-file and NTC segments where bureau signal is weak and alternative data carries disproportionate predictive weight.

For thin-file and NTC borrowers, where bureau data is sparse or absent, ML models can extract signal from alternative data sources: UPI transaction patterns, GST filing consistency, bank statement cashflows, mobile usage metadata, and account aggregator-enabled financial data. This is where AI in credit risk management delivers its most differentiated value relative to traditional approaches.

Random forests, gradient boosting, and neural networks

Among ML approaches applied to credit risk, gradient boosted trees (XGBoost, LightGBM, CatBoost) are the most widely deployed in production lending environments due to their predictive performance, relative interpretability compared to deep learning, and tolerance for mixed data types. Random forests offer similar benefits with slightly lower predictive lift but greater stability across data perturbations.

Neural networks and deep learning architectures have seen adoption in unstructured data settings: parsing bank statements, extracting features from transaction text, and processing document images during onboarding. Their application to structured credit tabular data has been more limited, partly due to marginal gains over gradient boosting on tabular datasets and partly due to the explainability burden.

The explainability challenge

RBI's published guidance on model risk management and its broader regulatory stance on AI-driven credit decisions create a genuine tension. Regulators and internal audit teams expect to understand why a specific loan was declined. Black-box models create this friction at scale.

SHAP (SHapley Additive exPlanations) values and LIME (Local Interpretable Model-Agnostic Explanations) have become standard tools for generating post-hoc explanations of individual model predictions. Some institutions have adopted monotonic gradient boosting constraints that preserve interpretability while retaining most of the predictive lift. The practical standard in Indian regulated lending is: if you cannot explain an adverse credit decision to a borrower or a regulator in plain terms, the model is not production-ready regardless of its AUC.
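To make the SHAP idea concrete, the sketch below computes exact Shapley values by brute force for one applicant. Features outside each coalition are replaced with a baseline (portfolio-average) value. The three-feature scoring function, its coefficients, and the baseline are hypothetical. The SHAP library uses faster algorithms (such as TreeSHAP for tree ensembles), but the attribution being computed is the same in spirit.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, applicant, baseline):
    """Exact Shapley values for one prediction (cost grows as 2^n, so only for a few features).

    Features outside a coalition are set to their baseline values.
    """
    n = len(applicant)

    def value(coalition):
        return model([applicant[i] if i in coalition else baseline[i] for i in range(n)])

    contributions = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                contributions[i] += weight * (value(coalition | {i}) - value(coalition))
    return contributions


def risk_log_odds(features):
    """Hypothetical scoring model on the log-odds scale, with one interaction term."""
    utilisation, enquiries_6m, dpd_count = features
    return -3.0 + 2.0 * utilisation + 0.3 * enquiries_6m + 0.8 * dpd_count + 0.5 * utilisation * dpd_count


names = ["utilisation", "enquiries_6m", "dpd_count"]
applicant = [0.9, 4, 2]
baseline = [0.4, 1, 0]  # hypothetical portfolio averages
phi = shapley_values(risk_log_odds, applicant, baseline)
# Largest positive contributions become candidate reasons for an adverse decision
for name, contribution in sorted(zip(names, phi), key=lambda pair: pair[1], reverse=True):
    print(f"{name:>13}: {contribution:+.3f}")
```

The contributions sum exactly to the gap between the applicant's score and the baseline score, which is what allows a decline to be decomposed into ranked, plain-language reasons.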

Traditional Scorecards vs ML Models: A Practical Comparison

Figure: Comparison table of traditional credit scorecards and machine learning models in credit risk modelling.

Real-time underwriting workflows

Fintechs and digital NBFCs have moved furthest on ML adoption. Real-time underwriting APIs, capable of decisioning a loan application in under three seconds, are becoming standard in consumer lending. These systems integrate bureau calls, bank statement parsing, GST validation, and internal model scoring in a single orchestrated pipeline.

Banks are more cautious, partly due to legacy model governance infrastructure and partly due to higher regulatory scrutiny on automated credit decisions. Co-lending arrangements increasingly create a bifurcated underwriting reality: the NBFC partner makes the initial credit decision using a proprietary ML model, while the bank co-lender applies a secondary filter using its own risk framework. Aligning the risk standards of both partners is an underappreciated operational challenge in co-lending at scale.

Data Requirements for Credit Risk Models in India

Model quality is bounded by data quality. India's credit data landscape has improved dramatically over the last five years, but gaps remain, particularly for MSME and NTC segments.

Bureau Data: CIBIL, Experian, CRIF, and Equifax

The four credit information companies operating in India (TransUnion CIBIL, Experian, CRIF High Mark, and Equifax) collectively provide the most standardised and reliable inputs for PD models in retail lending.

Key variables used in scorecard development include:

Number of active trade lines and product mix across secured and unsecured categories
Delinquency flags at 30, 60, and 90-plus days past due across all trade lines
Total outstanding balance and credit utilisation ratios
Enquiry count and recency, which function as a stress indicator when enquiries spike
Vintage of credit history, with longer histories generally indicating lower risk
Written-off account flags and settlement history

Vintage analysis is a fundamental technique in both scorecard development and ongoing portfolio monitoring. Examining cohort default rates as a function of loan age reveals whether a portfolio's risk profile is shifting over time, independent of book growth. Lenders that skip vintage analysis during periods of rapid growth typically discover deterioration later than they should.
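A minimal vintage-analysis sketch follows. For each origination cohort it computes the cumulative share of loans that have defaulted by each month on book (MOB), reported only for the months the cohort has actually been observed. The cohort labels, counts, and default definition (for example, first 90+ DPD) are hypothetical.

```python
from collections import defaultdict


def vintage_curves(loans, months_observed):
    """loans: iterable of (cohort, default_mob), where default_mob is the month on book of
    first default or None. months_observed: cohort -> number of months of performance seen.
    Returns cohort -> [cumulative default rate at MOB 1, 2, ...]."""
    accounts = defaultdict(int)
    defaults_by_mob = defaultdict(lambda: defaultdict(int))
    for cohort, default_mob in loans:
        accounts[cohort] += 1
        if default_mob is not None:
            defaults_by_mob[cohort][default_mob] += 1
    curves = {}
    for cohort, n in accounts.items():
        cumulative, curve = 0, []
        for mob in range(1, months_observed[cohort] + 1):
            cumulative += defaults_by_mob[cohort][mob]
            curve.append(cumulative / n)
        curves[cohort] = curve
    return curves


# Hypothetical cohorts of 100 loans each; the newer cohort has only 6 months of history
loans = ([("2023-Q1", None)] * 95 + [("2023-Q1", 4)] * 2 + [("2023-Q1", 9)] * 3
         + [("2023-Q3", None)] * 92 + [("2023-Q3", 3)] * 5 + [("2023-Q3", 6)] * 3)
curves = vintage_curves(loans, {"2023-Q1": 12, "2023-Q3": 6})
for cohort, curve in curves.items():
    print(cohort, f"MOB 6: {curve[5]:.1%}", f"latest: {curve[-1]:.1%}")
```

Comparing cohorts at the same MOB (here 8% against 2% at MOB 6) exposes deterioration in newer originations that a portfolio-level default rate, diluted by fresh unseasoned loans, can hide.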

Data from SMERA (now Acuité Ratings & Research) and CGTMSE provides partial coverage of MSME credit information, though SME credit data remains far less standardised than retail bureau data, limiting model sophistication for formal MSME underwriting.

GST, Bank Statement, and Alternative Data

The expansion of alternative data sources is the most consequential development in Indian credit underwriting over the last three years.

GST data provides a verifiable proxy for business revenue, filing regularity, and sector exposure. For MSME lending, GST returns validated against bank account inflows create a cross-checked cashflow picture that was previously impossible to construct at scale and low cost.

Bank statement analysis, now largely automated through AI-driven parsers, extracts income stability, expenditure patterns, EMI obligations not visible in bureau data, and liquidity buffers. The signal quality is high; the data cleanliness challenge is real, particularly for lenders processing PDF statements at scale across multiple bank formats and varying transaction description conventions.
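Once a statement has been parsed into transactions, some of these features reduce to simple aggregation. The sketch is illustrative, and three elements are hypothetical heuristics rather than a standard: the transaction format (month, signed amount), the inflow-stability measure (coefficient of variation of monthly credits), and the rule that flags a debit amount recurring in at least three months as a possible EMI.

```python
from collections import defaultdict
from statistics import mean, pstdev


def statement_features(transactions, min_months=3):
    """transactions: list of (month 'YYYY-MM', amount) with credits positive and debits negative."""
    monthly_credits = defaultdict(float)
    debit_months = defaultdict(set)  # debit amount rounded to nearest Rs 100 -> months seen
    for month, amount in transactions:
        if amount > 0:
            monthly_credits[month] += amount
        elif amount < 0:
            debit_months[round(-amount, -2)].add(month)
    months = sorted({month for month, _ in transactions})
    inflows = [monthly_credits[m] for m in months]
    avg_inflow = mean(inflows)
    inflow_cv = pstdev(inflows) / avg_inflow if avg_inflow else float("inf")
    # Heuristic: the same (rounded) debit amount in several months suggests a fixed instalment
    likely_emis = sorted(amt for amt, seen in debit_months.items() if len(seen) >= min_months)
    return {"avg_monthly_inflow": avg_inflow, "inflow_cv": inflow_cv, "likely_emis": likely_emis}


# Hypothetical three months of parsed transactions
txns = [
    ("2024-01", 52_000), ("2024-01", -12_499), ("2024-01", -3_200),
    ("2024-02", 50_000), ("2024-02", -12_500), ("2024-02", -8_750),
    ("2024-03", 48_000), ("2024-03", -12_501), ("2024-03", -1_100),
]
print(statement_features(txns))
```

A recurring debit that does not match any bureau trade line is the kind of off-bureau obligation that would then feed into FOIR calculations.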

Account Aggregator (AA) infrastructure is arguably the most transformative regulatory development for credit data in India. By enabling consent-based financial data sharing across registered Financial Information Providers (FIPs) and Financial Information Users (FIUs), the AA framework creates a structured, auditable pipeline for bank statement, insurance, and tax data. Adoption is accelerating among fintechs and progressive NBFCs. For lenders, the AA ecosystem reduces friction in data collection while meeting consent and privacy requirements under data protection norms.

UPI and device signals remain experimental for formal credit decisioning, though several fintech lenders use these signals for risk tiering in pre-approved credit products. Their inclusion in regulatory capital models would require careful validation and explicit regulatory endorsement.

The operational challenges across alternative data categories are real: consent fatigue among borrowers, inconsistent data quality across FIPs, the need for robust data normalisation before modelling, and compliance with evolving data localisation and purpose limitation requirements.

Model Validation and Back-testing

A credit risk model without a validation framework is itself a risk. Model risk, defined as the risk that a model performs worse than expected or fails in ways that were not anticipated during development, is explicitly recognised by global regulators and increasingly by RBI in its supervisory engagement with larger institutions.

Scorecard validation: a structured workflow

Out-of-time (OOT) testing: Apply the model to a holdout sample drawn from a different time period than the development dataset. Population drift and economic regime changes show up here before they manifest in portfolio losses.
Discrimination testing: Compute Gini and KS statistics on both development and validation samples. A significant drop in OOT performance signals overfitting or distributional shift and should trigger model review.
Calibration testing: Assess whether predicted default rates match observed default rates at each score band. Poor calibration leads to systematic provisioning errors that compound over time.
Population Stability Index (PSI): Monitor the distribution of model input variables and output scores over time. A PSI above 0.10 warrants investigation; above 0.25 typically triggers model review or redevelopment.
Champion-challenger frameworks: Run new model variants alongside the production model on live originations, comparing performance on subsequent repayment outcomes. This is the most robust method for evaluating proposed model improvements before full deployment.
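Two of these checks reduce to short formulas. The sketch computes PSI as the sum over score bands of (actual% - expected%) × ln(actual% / expected%), applying the thresholds above. It also runs a per-band calibration check using a normal approximation to the binomial, z = (observed - n × PD) / sqrt(n × PD × (1 - PD)). Band counts and PDs are hypothetical, and the small floor on empty bands and the |z| > 2 rule of thumb are assumptions.

```python
import math


def psi(expected_counts, actual_counts, floor=1e-4):
    """Population Stability Index between a development (expected) and recent (actual) distribution."""
    e_total, a_total = sum(expected_counts), sum(actual_counts)
    value = 0.0
    for e, a in zip(expected_counts, actual_counts):
        e_pct = max(e / e_total, floor)  # floor keeps log() finite for empty bands
        a_pct = max(a / a_total, floor)
        value += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return value


def psi_action(value):
    if value > 0.25:
        return "model review or redevelopment"
    if value > 0.10:
        return "investigate"
    return "stable"


def calibration_z_scores(bands):
    """bands: list of (label, accounts, mean_predicted_pd, observed_defaults).
    Returns (label, observed default rate, z-score); |z| above about 2 suggests miscalibration."""
    results = []
    for label, n, pred_pd, observed in bands:
        z = (observed - n * pred_pd) / math.sqrt(n * pred_pd * (1.0 - pred_pd))
        results.append((label, observed / n, z))
    return results


dev_bands = [100, 200, 400, 200, 100]     # score-band counts at development
recent_bands = [50, 150, 350, 250, 200]   # same bands on recent applications
value = psi(dev_bands, recent_bands)
print(f"PSI {value:.3f}: {psi_action(value)}")
for label, rate, z in calibration_z_scores([("A", 1000, 0.02, 20), ("B", 800, 0.05, 58), ("C", 400, 0.12, 49)]):
    print(f"band {label}: observed {rate:.1%}, z = {z:+.2f}")
```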

Back-testing and regulatory governance

Model back-testing compares predicted ECL against actual loss experience over a defined horizon. It is both a technical exercise and a governance obligation. RBI and the auditing community increasingly expect institutions to document back-testing results, explain variance between predicted and actual losses, and maintain clear escalation paths when a model exceeds defined performance tolerance thresholds.

Independent model validation, performed by a team or function separate from model development, is a best practice for banks and is becoming an expectation for larger NBFCs. Internal audit coverage of model risk is expanding. The practical requirement is that someone other than the model developer must be able to evaluate and challenge the model's assumptions, construction, and ongoing performance.

Institutions that invest in model governance infrastructure before regulatory pressure arrives are consistently better positioned during supervisory reviews. Model documentation, validation reports, and change management logs are auditable assets, not administrative overhead.

Common Pitfalls in Credit Risk Modelling

Even well-resourced institutions make systematic errors in credit model design and deployment. Understanding these failure modes is as important as understanding the modelling techniques themselves.

Poor data quality. Models built on uncleaned, inconsistently defined, or incomplete training data produce biased predictions. In credit risk, that means miscalibrated provisioning and mispriced loans, both of which affect profitability and regulatory standing.

Overfitting ML models. High AUC on the training sample that collapses on out-of-sample data is the most common failure mode for complex ML models built on small or homogeneous datasets. Regularisation, cross-validation, and mandatory OOT testing are non-negotiable.

Ignoring macroeconomic cycles. Models developed during benign credit environments do not automatically generalise to stress periods. Incorporating macroeconomic overlays (GDP growth, unemployment, real estate price indices, sector-specific stress indicators) is essential for ECL provisioning and forward-looking stress testing.
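For ECL, the macroeconomic overlay typically enters as a probability-weighted average over several scenarios. The sketch below uses a simple PD multiplier as a stand-in for each scenario's macro model. The scenario names, weights, and multipliers are hypothetical. Because the downside is more severe than the upside is benign, the weighted ECL exceeds the base-case figure.

```python
def scenario_weighted_ecl(base_pd, lgd, ead, scenarios):
    """scenarios: list of (name, probability_weight, pd_multiplier).

    Returns (probability-weighted ECL, {scenario name: scenario ECL}).
    """
    if abs(sum(weight for _, weight, _ in scenarios) - 1.0) > 1e-9:
        raise ValueError("scenario weights must sum to 1")
    detail = {}
    for name, _, multiplier in scenarios:
        scenario_pd = min(base_pd * multiplier, 1.0)  # stressed PD cannot exceed 100%
        detail[name] = scenario_pd * lgd * ead
    weighted = sum(weight * detail[name] for name, weight, _ in scenarios)
    return weighted, detail


# Hypothetical scenarios: (name, weight, PD multiplier relative to base)
scenarios = [("upside", 0.2, 0.8), ("base", 0.6, 1.0), ("downside", 0.2, 1.6)]
weighted, detail = scenario_weighted_ecl(0.03, 0.45, 1_000_000, scenarios)
for name, value in detail.items():
    print(f"{name:>8}: Rs {value:,.0f}")
print(f"probability-weighted ECL: Rs {weighted:,.0f}")
```

A model calibrated only on benign-period data effectively runs the base case alone, which understates the weighted figure whenever downside risks are asymmetric.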

Bureau dependency risk. Institutions that rely almost entirely on bureau scores for underwriting implicitly assume that bureau coverage, data quality, and predictive validity remain stable across all borrower segments. None of these assumptions holds uniformly across India's credit market.

Lack of explainability. Models that cannot be explained to regulators, audit committees, or borrowers create governance and legal risk, independent of their predictive accuracy. This is not a soft concern; it is a model approval blocker.

Weak monitoring frameworks. Models degrade as population composition shifts and economic conditions change. Without systematic PSI tracking, Gini monitoring, and periodic back-testing, model decay goes undetected until it shows up in portfolio loss rates.

Inadequate governance documentation. Model development decisions that are made but not documented become institutional liabilities during audits and supervisory reviews.

Practical mitigation priorities:

Establish data quality standards and lineage tracking before model development begins
Build out-of-time validation into every model development process as a mandatory step
Document model assumptions, limitations, and governance decisions at the time of development
Implement automated monitoring dashboards for all production models
Plan model refresh cycles aligned with portfolio growth and economic regime changes
Define explicit model performance thresholds that trigger escalation, review, or redevelopment

Conclusion

Credit risk modelling in India has crossed an inflection point. What was once a compliance-driven exercise in provisioning adequacy has become a central driver of lending strategy, capital efficiency, and competitive differentiation.

Three converging forces are reshaping the landscape. First, regulatory expectations are rising. RBI's increasing attention to model governance, ECL provisioning quality, and AI-driven credit decisions will continue to raise the minimum viable standard for all lending institutions. Second, data infrastructure is maturing. The Account Aggregator ecosystem, GST integration, and bank statement analytics are giving lenders access to a richer picture of borrower financial health than bureau scores alone could ever provide. Third, machine learning has proven its operational value. Lenders with robust ML-enabled underwriting are demonstrating measurably better risk selection and portfolio performance in high-growth and underserved segments.

The institutions that will outperform over the next credit cycle are those that treat credit risk modelling not as a function but as an organisational capability, one that integrates data engineering, model science, credit judgment, and regulatory intelligence into a coherent operating model.

Modern credit risk infrastructure increasingly depends on high-quality identity, fraud, consent, and financial data orchestration capabilities. The lenders building those foundations today are not just preparing for the next RBI examination. They are building the systems that will define how credit works in India at scale.

FAQs

What is a PD credit risk model?

A PD (Probability of Default) model estimates the likelihood that a borrower will default on their debt obligations within a defined time window, typically 12 months. It is built using statistical techniques, most commonly logistic regression or gradient boosted trees, trained on historical borrower data including bureau scores, repayment behaviour, income, and credit utilisation. PD models are used for underwriting decisions, loan pricing, and expected credit loss provisioning under Ind AS 109.

What are the 5 Cs of credit risk analysis?

The 5 Cs are: Character (the borrower's repayment history and creditworthiness), Capacity (the ability to repay based on income and existing obligations), Capital (net worth and financial reserves), Collateral (assets pledged to secure the loan), and Conditions (the broader economic environment and purpose of the loan). In modern credit risk analytics, each C maps to specific model inputs: character maps to bureau history, capacity to income verification and FOIR (fixed obligation to income ratio), capital to balance sheet strength, collateral to LGD drivers, and conditions to macroeconomic overlays in PIT PD models.

What is LGD in banking?

LGD (Loss Given Default) measures the proportion of credit exposure that a lender loses when a borrower defaults, after accounting for recoveries from collateral liquidation, collections, and legal proceedings. It is expressed as a percentage: an LGD of 70% means the lender recovers 30 paise on every rupee of exposure at default. LGD varies significantly by product type. It is typically low for secured loans such as gold loans and mortgages, and high for unsecured personal loans and BNPL facilities, where there is no collateral backstop.

What is EAD in Basel III?

EAD (Exposure at Default) is the estimated outstanding balance owed by a borrower at the moment of default. For term loans, EAD approximates the scheduled outstanding principal. For revolving products such as credit cards, working capital lines, and BNPL, EAD is harder to estimate because distressed borrowers tend to draw down available limits before defaulting. Basel III uses credit conversion factors (CCFs) to estimate EAD for undrawn commitments. Multiplied by PD and LGD, EAD produces the expected loss used in provisioning, and it is also a direct input to the risk-weighted asset calculations behind capital adequacy.

How is AI used in credit risk management?

AI and machine learning are applied across the credit lifecycle in India. At origination, ML models (particularly gradient boosted trees) improve default prediction on thin-file and NTC borrowers by extracting signal from alternative data: bank statement cashflows, UPI transaction patterns, GST filing history, and account aggregator-enabled financial data. In portfolio management, behavioural ML models power early warning systems and collections prioritisation. AI-driven document processing automates income verification and financial statement analysis. The primary governance challenge is explainability: regulatory and audit requirements demand that model decisions can be articulated at an individual case level, not just at an aggregate performance level.

What data is used in Indian credit risk models?

Indian credit risk models draw on several data categories. Bureau data from CIBIL, Experian, CRIF High Mark, and Equifax provides trade line history, delinquency signals, utilisation, and enquiry patterns. Bank statement data, increasingly accessed through the Account Aggregator framework, provides cashflow-based income verification and expenditure analysis. GST return data verifies business revenue for MSME lending. Internal behavioural data from existing customer relationships feeds portfolio-level and collections models. Alternative signals, including device metadata and UPI patterns, are used in fintech underwriting for NTC segments, though their adoption in formal regulated lending remains at the experimental stage pending clearer regulatory guidance.

Explore how IDfy's integrated financial data orchestration, identity verification, and consent infrastructure can strengthen your credit risk models and underwriting workflows.

Credit Risk Modelling in India: PD, LGD, EAD, and Basel Norms Explained