
Evaluation of classification models


Accuracy

  • The most commonly used metric is accuracy, i.e., the percentage of predictions that are correct. For scikit-learn classifiers this is what estimator.score() returns, as shown below.
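A minimal sketch of what this looks like in scikit-learn (the iris data and KNeighborsClassifier are just placeholders; any classifier behaves the same way):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Toy data: the iris set, split into train and test portions
X_train, X_test, y_train, y_test = train_test_split(
    *load_iris(return_X_y=True), test_size=0.25, random_state=0)

estimator = KNeighborsClassifier().fit(X_train, y_train)

# For classifiers, estimator.score() returns the mean accuracy on the given data
print(estimator.score(X_test, y_test))

# Equivalent: compute accuracy from explicit predictions
print(accuracy_score(y_test, estimator.predict(X_test)))
```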

Precision and recall

Confusion matrix

  • In a classification task, the predicted condition and the true condition combine in four different ways; these four counts make up the confusion matrix (table and sketch below)

|                   | Predicted positive  | Predicted negative  |
| ----------------- | ------------------- | ------------------- |
| Actually positive | TP (true positive)  | FN (false negative) |
| Actually negative | FP (false positive) | TN (true negative)  |
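A quick sketch with scikit-learn's confusion_matrix (these labels are made up for illustration):

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # true conditions (1 = positive)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # predicted conditions

# scikit-learn orders rows/columns by label, so for labels {0, 1}:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))
```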

Precision

  • Precision: of the samples predicted as positive, the proportion that are truly positive

$P = \frac{TP}{TP + FP}$

Recall

  • Recall: of the samples that are truly positive, the proportion predicted as positive

$R = \frac{TP}{TP + FN}$
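Both quantities can be read off directly in scikit-learn; a small sketch reusing the made-up labels from the confusion-matrix example:

```python
from sklearn.metrics import precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # here TP = 3, FP = 1, FN = 1
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(precision_score(y_true, y_pred))  # TP / (TP + FP) = 3/4
print(recall_score(y_true, y_pred))     # TP / (TP + FN) = 3/4
```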

The relationship between the two

  • Precision and recall are in tension. Generally speaking, when precision is high, recall tends to be low; when recall is high, precision tends to be low.

P-R curve

  • The P-R plot gives an intuitive view of a learner's recall and precision over the sample population
  • If one learner's P-R curve is completely enclosed by another learner's curve, the latter can be asserted to perform better than the former
  • If the P-R curves of two learners intersect:
    • You can compare the area under each P-R curve, which to some extent reflects the degree to which a learner achieves both high precision and high recall.
    • But this area is not easy to estimate, so the break-even point (BEP) can be used instead: the value at which precision = recall, with the higher BEP being better (see the sketch below).
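A sketch of drawing a P-R curve from prediction scores with scikit-learn and matplotlib (the scores here are invented for illustration, standing in for something like predict_proba):

```python
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve, auc

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.4, 0.35, 0.8, 0.2, 0.6, 0.7, 0.1]  # e.g. predict_proba[:, 1]

precision, recall, _ = precision_recall_curve(y_true, y_score)

# Area under the P-R curve, one way to compare intersecting curves
print(auc(recall, precision))

plt.plot(recall, precision)
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.show()
```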

F1 Measure

  • The BEP may be too simplistic; the F1 measure is used more often:

$F_1 = \frac{2 \times P \times R}{P + R}$
Note: the F1 measure is the harmonic mean of precision and recall:
$\frac{1}{F_1} = \frac{1}{2}\left(\frac{1}{P} + \frac{1}{R}\right)$
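scikit-learn's f1_score computes this directly; a tiny sketch checking it against the formula, with the same illustrative labels as before:

```python
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

p = precision_score(y_true, y_pred)
r = recall_score(y_true, y_pred)

print(f1_score(y_true, y_pred))  # library value
print(2 * p * r / (p + r))       # same value from the formula above
```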

  • In some applications, precision and recall have different importance. For example, in a product recommendation system, to disturb users as little as possible, we want the recommended content to really match users' interests, so precision matters more; in a fugitive information retrieval system, we want to miss as few fugitives as possible, so recall matters more. The general form of the F1 measure, $F_\beta$, lets us express different preferences for precision versus recall. It is defined as

$F_\beta = \frac{(1 + \beta^2) \times P \times R}{(\beta^2 \times P) + R}$
$\beta > 1$ gives recall more influence; $\beta < 1$ gives precision more influence
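scikit-learn exposes this as fbeta_score; a sketch with made-up labels chosen so that precision and recall differ and the effect of beta is visible:

```python
from sklearn.metrics import fbeta_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 1, 1, 0]  # precision 0.6, recall 0.75 here

# beta > 1 pulls the score toward recall, beta < 1 toward precision
print(fbeta_score(y_true, y_pred, beta=2))    # recall-leaning
print(fbeta_score(y_true, y_pred, beta=0.5))  # precision-leaning
print(fbeta_score(y_true, y_pred, beta=1))    # identical to f1_score
```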

ROC and AUC

  • The vertical axis of the ROC curve is the true positive rate (TPR): of all truly positive samples, the fraction predicted positive. The horizontal axis is the false positive rate (FPR): of all truly negative samples, the fraction wrongly predicted positive. The two are defined as

$TPR = \frac{TP}{TP + FN} \qquad FPR = \frac{FP}{FP + TN}$
[Figure: example ROC curves]

  • If one learner's ROC curve is completely enclosed by another learner's curve, the latter can be asserted to perform better than the former
  • If the ROC curves of two learners intersect, compare the area under the ROC curve, i.e., the AUC (sketch below).
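A sketch of computing the ROC curve and AUC from the same invented scores as in the P-R example:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.4, 0.35, 0.8, 0.2, 0.6, 0.7, 0.1]

fpr, tpr, _ = roc_curve(y_true, y_score)

print(roc_auc_score(y_true, y_score))  # AUC: area under the ROC curve

plt.plot(fpr, tpr)
plt.plot([0, 1], [0, 1], linestyle="--")  # the chance diagonal
plt.xlabel("FPR")
plt.ylabel("TPR")
plt.show()
```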