CAP, AR, ROC, AUC, KS: Monitoring a Model's Discriminative Power

The AR (accuracy ratio) and KS (Kolmogorov-Smirnov) statistics are mainly used to monitor a model's discriminative power.

CAP Introduction

The cumulative accuracy profile (CAP) is used in data science to visualize the discriminative power of a model. The CAP of a model plots the cumulative share of positive outcomes on the y-axis against the corresponding cumulative share of a classifying parameter on the x-axis. For example, a point (0.1, 0.3) on the CAP means that the 10% of borrowers the model ranks as riskiest include 30% of the defaults. The CAP is distinct from the receiver operating characteristic (ROC), which plots the true-positive rate against the false-positive rate.

An example is a model that predicts whether each individual (classifying parameter) in a group of borrowers will default (positive outcome), based on factors such as working status, annual income, and loan purpose.

If borrowers were examined in random order, the cumulative number of defaults found would rise linearly toward a maximum equal to the total number of defaulters in the group. This distribution is called the "random" CAP.

A perfect prediction, on the other hand, determines exactly which group members will default, so that all defaults are found after examining the minimum number of borrowers. This produces a steep line on the CAP curve that stays flat once the maximum is reached (examining the remaining borrowers finds no further defaults), which is the "perfect" CAP.

Figure: CAP curves for a perfect, a good, and a random model predicting which borrowers default in a pool of 100 individuals.

A successful model predicts each individual's likelihood of default and ranks borrowers by these probabilities, so that the riskiest borrowers come first. The resulting cumulative number of defaults found increases rapidly and eventually flattens out at the maximum as more group members are examined. This results in a curve that lies between the random and the perfect CAP curves.

Modified by Xuan, By Victoriaweller - Own work, CC BY-SA 4.0,
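
The three curves described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names, the ten-borrower toy data, and the convention that a higher score means a riskier borrower are all assumptions made for the example.

```python
def cap_curve(scores, defaults):
    """Return (xs, ys) for the model CAP.

    scores   : predicted default risk, higher = riskier (assumed convention)
    defaults : 1 if the borrower defaulted, 0 otherwise
    """
    total = len(scores)
    total_defaults = sum(defaults)
    # Rank borrowers from riskiest to safest.
    order = sorted(range(total), key=lambda i: scores[i], reverse=True)
    xs, ys = [0.0], [0.0]
    found = 0
    for rank, i in enumerate(order, start=1):
        found += defaults[i]
        xs.append(rank / total)            # share of borrowers examined
        ys.append(found / total_defaults)  # share of defaults captured
    return xs, ys


def random_cap(total):
    """Random ordering: defaults are captured in proportion to borrowers examined."""
    xs = [k / total for k in range(total + 1)]
    return xs, list(xs)


def perfect_cap(total, total_defaults):
    """Perfect ordering: every defaulter is ranked ahead of every non-defaulter."""
    xs = [k / total for k in range(total + 1)]
    ys = [min(k, total_defaults) / total_defaults for k in range(total + 1)]
    return xs, ys


# Hypothetical toy data: 10 borrowers, 3 of whom default.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
defaults = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

xs, ys = cap_curve(scores, defaults)
for x, y in zip(xs, ys):
    print(f"{x:.1f}  {y:.2f}")
```

On this toy data the model curve reaches 100% of defaults after 50% of the borrowers, while the perfect curve gets there after 30% and the random curve only at 100%.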

Analyse CAP

The CAP can be used to evaluate a model by comparing its curve to the perfect CAP, in which the maximum number of positive outcomes is reached directly, and to the random CAP, in which the positive outcomes are distributed evenly. A good model has a CAP between the perfect CAP and the random CAP, and the better the model, the closer its curve lies to the perfect CAP.

The accuracy ratio (AR) is defined as the ratio of the area between the model CAP and the random CAP to the area between the perfect CAP and the random CAP. For a successful model the AR lies between zero and one, with a higher value for a stronger model.

Another indication of the model strength is given by the cumulative number of positive outcomes at 50% of the classifying parameter. For a successful model this value should lie between 50% and 100% of the maximum, with a higher percentage for stronger models.
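
Both indicators above can be computed directly from the ranked borrowers. The sketch below assumes the areas are measured with the trapezoidal rule on the empirical CAP and that a higher score means a riskier borrower; the helper names and the toy data are hypothetical.

```python
def area_under(xs, ys):
    """Trapezoidal area under a piecewise-linear curve."""
    return sum((xs[k] - xs[k - 1]) * (ys[k] + ys[k - 1]) / 2
               for k in range(1, len(xs)))


def ranked_defaults(scores, defaults):
    """Default flags ordered from the riskiest to the safest borrower."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [defaults[i] for i in order]


def accuracy_ratio(scores, defaults):
    n, d = len(scores), sum(defaults)
    xs, ys, found = [0.0], [0.0], 0
    for rank, flag in enumerate(ranked_defaults(scores, defaults), start=1):
        found += flag
        xs.append(rank / n)
        ys.append(found / d)
    # The random CAP is the diagonal, whose area is 0.5.
    model_vs_random = area_under(xs, ys) - 0.5
    # The perfect CAP rises linearly to (d/n, 1) and then stays flat,
    # so the area under it is 1 - d / (2n).
    perfect_vs_random = (1 - d / (2 * n)) - 0.5
    return model_vs_random / perfect_vs_random


def capture_at(scores, defaults, fraction=0.5):
    """Share of all defaults found in the top `fraction` of ranked borrowers."""
    top = int(round(fraction * len(scores)))
    return sum(ranked_defaults(scores, defaults)[:top]) / sum(defaults)


# Hypothetical toy data: 10 borrowers, 3 of whom default.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
defaults = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

ar = accuracy_ratio(scores, defaults)
half = capture_at(scores, defaults, 0.5)
print(f"AR = {ar:.4f}, defaults captured at 50% = {half:.0%}")
```

A model that ranks every defaulter first gives an AR of one under this definition, and the toy model above lands somewhat below that.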

ROC Introduction

The ROC curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings. The true-positive rate is also known as sensitivity, recall or probability of detection in machine learning. The false-positive rate is also known as probability of false alarm and can be calculated as (1 − specificity).

To draw an ROC curve, only the true positive rate (TPR) and false positive rate (FPR) are needed (as functions of some classifier parameter). The TPR defines how many correct positive results occur among all positive samples available during the test. FPR, on the other hand, defines how many incorrect positive results occur among all negative samples available during the test.
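
As a small illustration of these two definitions, the rates can be computed from the four cells of a confusion matrix. The counts below are made up for the example and do not come from any table in this text.

```python
def tpr_fpr(tp, fn, fp, tn):
    """Return (TPR, FPR) from confusion-matrix counts."""
    tpr = tp / (tp + fn)  # correct positives among all actual positives
    fpr = fp / (fp + tn)  # incorrect positives among all actual negatives
    return tpr, fpr


# Hypothetical counts: 100 actual positives and 100 actual negatives.
tp, fn, fp, tn = 70, 30, 20, 80
tpr, fpr = tpr_fpr(tp, fn, fp, tn)
specificity = tn / (fp + tn)

print(f"TPR = {tpr:.2f}, FPR = {fpr:.2f}, 1 - specificity = {1 - specificity:.2f}")
```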

An ROC space is defined by FPR and TPR as x and y axes, respectively, which depicts relative trade-offs between true positive (benefits) and false positive (costs). Since TPR is equivalent to sensitivity and FPR is equal to 1 − specificity, the ROC graph is sometimes called the sensitivity vs (1 − specificity) plot. Each prediction result or instance of a confusion matrix represents one point in the ROC space.
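
Sweeping the decision threshold over the scores yields one confusion matrix, and hence one ROC point, per threshold. The following sketch assumes that a sample is predicted positive when its score is at or above the threshold; the function name and the six-sample data set are hypothetical.

```python
def roc_points(scores, labels):
    """Return ROC points (FPR, TPR), one per distinct threshold.

    labels : 1 for actual positives, 0 for actual negatives
    A sample is predicted positive when score >= threshold (assumed rule).
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / neg, tp / pos))
    return points


# Hypothetical toy data: 3 positives and 3 negatives.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 1, 0, 1, 0, 0]

points = roc_points(scores, labels)
for fpr, tpr in points:
    print(f"FPR = {fpr:.2f}, TPR = {tpr:.2f}")
```

The loop recounts the confusion matrix for every threshold, which is quadratic in the number of samples. That is fine for a sketch, though a single pass over the sorted scores would be preferable for large data sets.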

The best possible prediction method would yield a point in the upper left corner or coordinate (0,1) of the ROC space, representing 100% sensitivity (no false negatives) and 100% specificity (no false positives). The (0,1) point is also called a perfect classification. A random guess would give a point along a diagonal line (the so-called line of no-discrimination) from the left bottom to the top right corners (regardless of the positive and negative base rates). An intuitive example of random guessing is a decision by flipping coins. As the size of the sample increases, a random classifier's ROC point tends towards the diagonal line. In the case of a balanced coin, it will tend to the point (0.5, 0.5).

The diagonal divides the ROC space. Points above the diagonal represent good classification results (better than random); points below the line represent bad results (worse than random). Note that the output of a consistently bad predictor could simply be inverted to obtain a good predictor.

Let us look into four prediction results from 100 positive and 100 negative instances:

Plots of the four results above in the ROC space are given in the figure. The result of method A clearly shows the best predictive power among A, B, and C. The result of B lies on the random guess line (the diagonal line), and it can be seen in the table that the accuracy of B is 50%. However, when C is mirrored across the center point (0.5,0.5), the resulting method C′ is even better than A. This mirrored method simply reverses the predictions of whatever method or test produced the C contingency table. Although the original C method has negative predictive power, simply reversing its decisions leads to a new predictive method C′ which has positive predictive power. When the C method predicts p or n, the C′ method would predict n or p, respectively. In this manner, the C′ test would perform the best. The closer a result from a contingency table is to the upper left corner, the better it predicts, but the distance from the random guess line in either direction is the best indicator of how much predictive power a method has. If the result is below the line (i.e. the method is worse than a random guess), all of the method's predictions must be reversed in order to utilize its power, thereby moving the result above the random guess line.
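
The mirroring argument can be checked numerically. The counts below are hypothetical (they are not the values of method C from the table referred to above) and only illustrate that reversing every prediction reflects the ROC point through (0.5, 0.5).

```python
def roc_point(tp, fn, fp, tn):
    """Return the ROC point (FPR, TPR) for a confusion matrix."""
    return fp / (fp + tn), tp / (tp + fn)


def invert_predictions(tp, fn, fp, tn):
    """Confusion matrix after flipping every prediction.

    Former false negatives become true positives, former true positives
    become false negatives, and likewise for the negative class.
    Returns the new (tp, fn, fp, tn).
    """
    return fn, tp, tn, fp


# Hypothetical worse-than-random method on 100 positives and 100 negatives.
bad = (20, 80, 90, 10)  # tp, fn, fp, tn
bad_fpr, bad_tpr = roc_point(*bad)
good_fpr, good_tpr = roc_point(*invert_predictions(*bad))

print(f"original: FPR = {bad_fpr:.2f}, TPR = {bad_tpr:.2f}")
print(f"inverted: FPR = {good_fpr:.2f}, TPR = {good_tpr:.2f}")
```

The inverted point is (1 − FPR, 1 − TPR), which lies as far above the diagonal as the original point lay below it.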

Area Under the ROC Curve (AUC)

When using normalized units, the area under the curve (often referred to as simply the AUC) is equal to the probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one (assuming 'positive' ranks higher than 'negative').

  • Meaning: Because the ROC curve is drawn inside the 1×1 unit square, the AUC must lie between 0 and 1. Samples scoring above the threshold are classified as positive and those below as negative. If one positive sample and one negative sample are chosen at random, the AUC is the probability that the classifier scores the positive sample higher than the negative one. In short, a classifier with a higher AUC ranks positives above negatives more reliably.

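  • Criterion: AUC = 1: a perfect classifier; with this model there is at least one threshold that yields perfect predictions. In most prediction settings no perfect classifier exists. 0.5 < AUC < 1: better than random guessing; with a well-chosen threshold, the classifier (model) has predictive value. AUC = 0.5: the same as random guessing (e.g. flipping a coin); the model has no predictive value. AUC < 0.5: worse than random guessing, but always predicting the opposite of what the model says is better than random guessing.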
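
The ranking interpretation of the AUC can be checked by brute force over all positive/negative pairs. This is a sketch under the assumption that tied scores count as half a win, which is a common convention but is not stated in the text; the function name and data are hypothetical.

```python
def auc_pairwise(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking (assumed convention).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 3 positives and 3 negatives.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 1, 0, 1, 0, 0]

auc = auc_pairwise(scores, labels)
# Negating the scores reverses the ranking, which turns the AUC into 1 - AUC.
auc_reversed = auc_pairwise([-s for s in scores], labels)

print(f"AUC = {auc:.4f}, reversed = {auc_reversed:.4f}")
```

The reversed-score result illustrates the last criterion: a model with AUC below 0.5 becomes better than random once its ranking is flipped.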

However, when judging a model by AUC alone, keep in mind that AUC can hide performance issues.


KS = max(TPR-FPR)
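
Read as a maximum over all classification thresholds, this formula can be evaluated by scanning every distinct score. The sketch below assumes that a sample is predicted positive when its score is at or above the threshold; the function name and the toy data are hypothetical.

```python
def ks_statistic(scores, labels):
    """KS = max over thresholds of (TPR - FPR).

    labels : 1 for actual positives, 0 for actual negatives
    A sample is predicted positive when score >= threshold (assumed rule).
    """
    pos = sum(labels)
    neg = len(labels) - pos
    best = 0.0
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        best = max(best, tp / pos - fp / neg)
    return best


# Hypothetical toy data: 3 positives and 3 negatives.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 1, 0, 1, 0, 0]

ks = ks_statistic(scores, labels)
print(f"KS = {ks:.4f}")
```

Geometrically, this is the largest vertical gap between the ROC curve and the diagonal, so a model that separates the classes perfectly has KS = 1 and a random one has KS close to 0.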


Receiver operating characteristic

Cumulative accuracy profile