# PR曲线和ROC曲线间的关系

2021年09月15日 阅读数：4

### An introduction to ROC analysis

https://ccrma.stanford.edu/workshops/mir2009/references/ROCintro.pdfpython

Receiver Operator Characteristic (ROC) curves are commonly used to present results for binary decision problems in machine learning. However, when dealing with highly skewed datasets, Precision-Recall
(PR) curves give a more informative picture of an algorithm’s performance. We show that a deep connection exists between ROC space and PR space, such that a curve dominates in ROC space if and only if it dominates in PR space. A corollary is the notion of an achievable PR curve, which has properties much like the convex hull in ROC space; we show an efficient algorithm for computing this curve. Finally, we also note differences in the two types of curves are significant for algorithm design. For example, in PR space it is incorrect to linearly interpolate between points. Furthermore, algorithms that optimize the area under the ROC curve are not guaranteed to optimize the area under the PR curve.算法

## 使用场景

1. ROC曲线因为兼顾正例与负例，因此适用于评估分类器的总体性能，相比而言PR曲线彻底聚焦于正例。机器学习

2. 若是有多份数据且存在不一样的类别分布，好比信用卡欺诈问题中每月正例和负例的比例可能都不相同，这时候若是只想单纯地比较分类器的性能且剔除类别分布改变的影响，则ROC曲线比较适合，由于类别分布改变可能使得PR曲线发生变化时好时坏，这种时候难以进行模型比较；反之，若是想测试不一样类别分布下对分类器的性能的影响，则PR曲线比较适合。性能

3. 若是想要评估在相同的类别分布下正例的预测状况，则宜选PR曲线。学习

4. 类别不平衡问题中，ROC曲线一般会给出一个乐观的效果估计，因此大部分时候仍是PR曲线更好。测试

5. 最后能够根据具体的应用，在曲线上找到最优的点，获得相对应的precision，recall，f1 score等指标，去调整模型的阈值，从而获得一个符合具体应用的模型。优化

### 机器学习之类别不平衡问题 (3) —— 采样方法

https://machinelearningmastery.com/roc-curves-and-precision-recall-curves-for-classification-in-python/this

## Reference:

1. Tom Fawcett. An introduction to ROC analysis
2. Jesse Davis, Mark Goadrich0 The Relationship Between Precision-Recall and ROC Curves
3. Haibo He, Edwardo A. Garcia. Learning from Imbalanced Data
4. 周志华. 《机器学习》
5. Pang-Ning Tan, etc. Introduction to Data Mining
6. https://stats.stackexchange.com/questions/7207/roc-vs-precision-and-recall-curves