Evaluating Regression Model Performance in a Credit Line Pricing Scenario
2022-06-23 14:45:00 【Tomato risk control】
In credit risk decisioning, models are deployed at several nodes of the process, for example the A card (application), B card (behavior), C card (collection), and F card (anti-fraud). A model built offline is deployed to production only after repeated in-sample and out-of-sample testing shows that its performance metrics meet the standard. Knowing how to evaluate model performance and how to interpret each evaluation metric is therefore an essential skill for anyone working on risk control models. From the perspective of algorithm principle, models fall into 4 categories: classification, regression, clustering, and dimensionality reduction. In financial risk control and marketing scenarios, classification and regression models are the most widely used. Because each category is fitted and trained by a different mechanism, the relevant evaluation metrics also differ considerably. This article takes a commonly used regression algorithm as an example to introduce the evaluation dimensions and the analysis approach, illustrated with a real business scenario: credit line pricing.
1、 Business scenario
A commercial bank plans to develop a credit line model based on its existing customer data. Every user who submits an application is first assigned the same preset initial line, amount. The user's qualifications and risk profile are then analyzed to obtain a user-specific line coefficient n, so the user's final credit line is amount*n. The case data contains 10000 samples and 20 features. An example is shown in Figure 1, where ID is the sample primary key, Y is the target variable (line coefficient), and X01~X20 are the feature variables. Because the target variable is continuous, this is a regression problem and can be solved with a regression algorithm such as linear regression, random forest, XGBoost, or KNN.

Figure 1 Modeling sample data
A simple descriptive statistical analysis of the modeling sample is shown in Figure 2. Every field in the sample is numeric and has no missing values. The target variable Y is of float type with a value range of [0.0303931, 0.62798), which meets the requirement for the target variable of a regression model.

Figure 2 Distribution of feature values
2、 Model training and fitting
We choose linear regression (LinearRegression), the most commonly used regression algorithm, to train the model. To keep differences in feature scale from influencing the fitted coefficients, the modeling data is first standardized with z-score. The implementation code and its output are shown in Figure 3 and Figure 4.
Figure 3 Feature standardization code

Figure 4 Results of feature standardization
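The code in Figure 3 is not reproduced here, so the following is only a minimal sketch of z-score standardization with scikit-learn's StandardScaler. The DataFrame `data` is a randomly generated stand-in for the modeling sample (an assumption, not the bank's data); only the column names X01~X20 and the sample size come from the text.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the modeling sample: 10000 rows, features X01~X20.
rng = np.random.default_rng(0)
feature_cols = [f"X{i:02d}" for i in range(1, 21)]
data = pd.DataFrame(
    rng.normal(loc=5.0, scale=2.0, size=(10000, 20)), columns=feature_cols
)

# z-score: subtract each column's mean, then divide by its standard deviation.
scaler = StandardScaler()
X_std = pd.DataFrame(scaler.fit_transform(data[feature_cols]), columns=feature_cols)

# After standardization every column has mean ~0 and standard deviation ~1.
print(X_std.describe().loc[["mean", "std"]].round(3))
```

Keeping the fitted `scaler` object matters in practice: the same means and standard deviations learned on the training sample have to be applied to any new application that is scored later.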
Using the standardized data, the model is fitted with the LinearRegression algorithm. The code is shown in Figure 5.
Figure 5 Model fitting training
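As a rough sketch of the fitting step (Figure 5 itself is not reproduced), the snippet below fits scikit-learn's LinearRegression on synthetic standardized features. The arrays `X_std`, `true_coef`, and `y` are made up for illustration and are not the case data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: 1000 samples, 20 already-standardized features.
rng = np.random.default_rng(0)
X_std = rng.normal(size=(1000, 20))
true_coef = rng.normal(scale=0.05, size=20)   # assumed "true" weights
y = 0.3 + X_std @ true_coef + rng.normal(scale=0.01, size=1000)

# Ordinary least squares fit of y on the standardized features.
model = LinearRegression()
model.fit(X_std, y)

# coef_ holds one weight per feature; intercept_ is the constant term.
print("coefficients:", np.round(model.coef_, 4))
print("intercept:", round(float(model.intercept_), 4))
```

Because the features are on the same scale, the sign and magnitude of each entry in `coef_` can be compared directly across features.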
The coefficients of the model's final linear function are shown in Figure 6, listing in order the coefficients of features X01~X10 and the constant term. From these we can see directly whether each variable affects the target variable positively or negatively, and with what weight.

Figure 6 Model variable coefficients
3、 Regression model evaluation
Once the model has been trained, the next step is to evaluate its performance using the relevant metrics, including R_Square, MAE, MAPE, MSE, and RMSE.
(1)MAE
MAE (Mean Absolute Error) is the average of the absolute errors between the actual and predicted values, and gives a direct picture of how large the prediction errors are. The smaller the MAE, the more accurate the model. The formula is MAE = (1/n) Σ |yi − ŷi|, where yi is the actual value, ŷi is the predicted value, and n is the number of samples. The Python implementation is shown in Figure 7.

Figure 7 Mean absolute error MAE
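Since the code in Figure 7 is not available, here is one possible NumPy implementation of MAE. The function name and the four sample values are illustrative assumptions.

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    """Average of |actual - predicted| over all samples."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

# Made-up line coefficients, for illustration only.
y_true = [0.20, 0.30, 0.50, 0.40]
y_pred = [0.25, 0.28, 0.45, 0.42]

mae = mean_absolute_error(y_true, y_pred)
print(f"MAE = {mae:.4f}")
```

MAE is expressed in the same unit as Y, so here it reads directly as the average miss on the line coefficient.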
(2)MAPE
MAPE (Mean Absolute Percentage Error) differs from MAE in that each absolute error is additionally divided by the actual value yi. The smaller the MAPE, the more accurate the model. The formula is MAPE = (1/n) Σ |(yi − ŷi) / yi|, sometimes multiplied by 100% to express it as a percentage. The Python implementation is shown in Figure 8.

Figure 8 Mean absolute percentage error MAPE
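A hedged sketch of MAPE in NumPy follows. It is not the code from Figure 8, and the function name and sample values are assumptions. Because the metric divides by the actual value, it is undefined when any yi is 0, so the sketch rejects that case explicitly.

```python
import numpy as np

def mean_absolute_percentage_error(y_true, y_pred):
    """Average of |(actual - predicted) / actual|, returned as a fraction."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if np.any(y_true == 0):
        raise ValueError("MAPE is undefined when an actual value is 0")
    return float(np.mean(np.abs((y_true - y_pred) / y_true)))

# Made-up line coefficients, for illustration only.
y_true = [0.20, 0.30, 0.50, 0.40]
y_pred = [0.25, 0.28, 0.45, 0.42]

mape = mean_absolute_percentage_error(y_true, y_pred)
print(f"MAPE = {mape:.4f} ({mape * 100:.2f}%)")
```

In this case the target Y is at least 0.0303931, so the zero-denominator problem does not arise, but small actual values still weigh heavily in the average.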
(3)MSE
MSE (Mean Square Error) is the average of the squared differences between the actual and predicted values, and is generally used to measure how far the model's predictions deviate from the actual values. The smaller the MSE, the more accurate the model. The formula is MSE = (1/n) Σ (yi − ŷi)². The Python implementation is shown in Figure 9.

Figure 9 Mean square error MSE
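In place of the missing Figure 9, MSE can be sketched in NumPy as below. The function name and data are illustrative assumptions.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of (actual - predicted)^2 over all samples."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Made-up line coefficients, for illustration only.
y_true = [0.20, 0.30, 0.50, 0.40]
y_pred = [0.25, 0.28, 0.45, 0.42]

mse = mean_squared_error(y_true, y_pred)
print(f"MSE = {mse:.5f}")
```

Squaring penalizes large errors more than small ones, so a few badly mispriced users raise MSE much more than they raise MAE.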
(4)RMSE
RMSE (Root Mean Square Error), sometimes also called the standard error, is the arithmetic square root of the mean square error. The smaller the RMSE, the more accurate the model. The formula is RMSE = sqrt((1/n) Σ (yi − ŷi)²). The Python implementation is shown in Figure 10.

Figure 10 Root mean square error RMSE
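A minimal sketch of RMSE (not the code from Figure 10; the names and data are assumptions) simply takes the square root of the mean squared error:

```python
import numpy as np

def root_mean_squared_error(y_true, y_pred):
    """Square root of the average squared error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Made-up line coefficients, for illustration only.
y_true = [0.20, 0.30, 0.50, 0.40]
y_pred = [0.25, 0.28, 0.45, 0.42]

rmse = root_mean_squared_error(y_true, y_pred)
print(f"RMSE = {rmse:.4f}")
```

Taking the root brings the metric back to the unit of Y, which makes it easier to read next to MAE than MSE is.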
(5)R_Square
R_Square (coefficient of determination), or R squared, reflects how well the model fits the data. R_Square generally falls between 0 and 1 (it can be negative when the model fits worse than always predicting the mean of Y). The closer the value is to 1, the better the X variables explain the target Y and the better the fit. As a rule of thumb in business practice, R_Square>0.4 is taken to indicate good model performance. The formula is R_Square = 1 − Σ (yi − ŷi)² / Σ (yi − ȳ)², where ȳ is the mean of the actual values. The Python implementation is shown in Figure 11.

Figure 11 Coefficient of determination R_Square
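The coefficient of determination can be sketched as one minus the ratio of the residual sum of squares to the total sum of squares. This is an illustrative NumPy version, not the code from Figure 11, and the function name and data are assumptions.

```python
import numpy as np

def r_square(y_true, y_pred):
    """1 - SS_res / SS_tot, the share of variance in Y explained by the model."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation around the mean
    return float(1.0 - ss_res / ss_tot)

# Made-up line coefficients, for illustration only.
y_true = [0.20, 0.30, 0.50, 0.40]
y_pred = [0.25, 0.28, 0.45, 0.42]

r2 = r_square(y_true, y_pred)
print(f"R_Square = {r2:.3f}")
```

A model that always predicts the mean of Y scores exactly 0, which is why a negative value signals a fit worse than that baseline.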
Applying the formulas and code above to this case produces the regression metrics shown in Figure 12. The coefficient of determination R_Square is greater than 0.4, which indicates a good fit, and the other error metrics are small, which indicates high accuracy.

Figure 12 Regression model indicators
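To show how such a report might be produced in one pass, the sketch below fits a model on synthetic data and collects all five metrics, using scikit-learn's built-in metric functions where they exist. The data, the 70/30 train/test split, and the resulting numbers are assumptions for illustration and will not match Figure 12.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Hypothetical standardized features and a strictly positive target.
rng = np.random.default_rng(42)
X = rng.normal(size=(2000, 20))
coef = rng.normal(scale=0.01, size=20)
y = 0.3 + X @ coef + rng.normal(scale=0.03, size=2000)

# Hold out 30% of the samples so the metrics are measured out of sample.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
report = {
    "R_Square": r2_score(y_test, y_pred),
    "MAE": mean_absolute_error(y_test, y_pred),
    "MAPE": float(np.mean(np.abs((y_test - y_pred) / y_test))),
    "MSE": mse,
    "RMSE": float(np.sqrt(mse)),
}
for name, value in report.items():
    print(f"{name}: {value:.4f}")
```

Reading the metrics together is more informative than any single one: R_Square describes explanatory power, while MAE, MAPE, MSE, and RMSE describe the size of the errors from different angles.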
The above is a simple modeling workflow for a regression algorithm in the credit line pricing scenario, together with the main metrics for evaluating the fitted model. Further details will be covered in follow-up articles. To help readers become familiar with applying regression models in real business scenarios and master the evaluation methods, we have prepared the sample data and Python code that accompany this article for reference. For details, see the related content on Knowledge Planet.

…