A receiver operating characteristic curve, or ROC curve, is a graphical plot that illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied. The method was originally developed for operators of military radar receivers
Videos
Topic | Access |
---|---|
Train, Test, & Validation Data Sets (7 min) | https://www.youtube.com/watch?v=Zi-0rlM4RDs |
Confusion Matrix (7 min) | https://www.youtube.com/watch?v=Kdsp6soqA7o |
ROC and AUC, Clearly Explained (16 min) |
https://www.youtube.com/watch?v=4jRBRDbJemM |
Evaluating Classifiers: Gains and Lift Charts ( 14 min) | https://www.youtube.com/watch?v=1dYOcDaDJLY |
Articles
Medium | Topic | Access |
---|---|---|
Excercise | Understanding Confusion Matrix in R | https://www.datacamp.com/community/tutorials/confusion-matrix-calculation-r |
Exercise with Answer | Model Evaluation | https://www.r-exercises.com/2016/12/02/model-evaluation-exercise-1/ |
Exercise with Answer | Model Evaluation | https://www.r-exercises.com/2016/12/22/model-evaluation-2/ |
Tutorial | Lift Charts Updated by IBM |
https://www.ibm.com/docs/en/spss-statistics/28.0.0?topic=customers-cumulative-gains-lift-charts |
Tutorial | Generate ROC Curve Charts for Print and Interactive Use | https://cran.r-project.org/web/packages/plotROC/vignettes/examples.html |
From a confusion matrix
- condition positive (P)
- the number of real positive cases
- condition negative (N)
- the number of real negative cases
- true positive (TP)
- True positive Prediction=A test result that correctly indicates the presence of a condition or characteristic
- true negative (TN)
- True negative Prediction=A test result that correctly indicates the absence of a condition or characteristic
- false positive (FP)
- False positive Prediction=A test result which wrongly indicates that a particular condition or attribute is present
- false negative (FN)
- False negative Prediction=A test result which wrongly indicates that a particular condition or attribute is absent
-
sensitivity
- sensitivity, recall, hit rate, or true positive rate (TPR)
-
specificity
- specificity, selectivity or true negative rate (TNR)
-
precision
- precision or positive predictive value (PPV)
- negative predictive value (NPV)
- miss rate or false negative rate (FNR)
- fall-out or false positive rate (FPR)
- false discovery rate (FDR)
- false omission rate (FOR)
- Positive likelihood ratio (LR+)
- Negative likelihood ratio (LR-)
- prevalence threshold (PT)
- threat score (TS) or critical success index (CSI)
- Prevalence
-
rpp (Rate of Positive Predictions)
rpp = (tp+fp)/(tp+fn+fp+tn)
When rpp is 0.1, the number of times we have had some positive prediction are 10% of the total(regardless of our goodness)
-
Accuracy
- accuracy (ACC)
- balanced accuracy (BA)
- F1 score
- is the harmonic mean of precision and sensitivity:
- phi coefficient (φ or rφ) or Matthews correlation coefficient (MCC)
-
- Fowlkes–Mallows index (FM)
-
- informedness or bookmaker informedness (BM)
-
- markedness (MK) or deltaP (Δp)
-
- Diagnostic odds ratio (DOR)
-
Sources: Fawcett (2006),[2] Piryonesi and El-Diraby (2020),[3] Powers (2011),[4] Ting (2011),[5] CAWCR,[6] D. Chicco & G. Jurman (2020, 2021),[7][8] Tharwat (2018).[9]
==================================================================================
#To create a confusion matrix with complete analysis of measures
library(caret)
KNNconfusionMatrix<-caret::confusionMatrix(Predictionsvector,actualvector)
KNNconfusionMatrix
#Easiest and best visualization gain and lift in R:
#install.packages(‘CustomerScoringMetrics’)
library(CustomerScoringMetrics)
CustomerScoringMetrics::cumGainsChart(Predictionsvector,actualvector)
CustomerScoringMetrics::liftChart(Predictionsvector,actualvector)
# ROC gain lift With more elaboration in R
##################################################
# Create a prediction object from predictions and actuals of any test data
# Then by Various performance analysis you will have ROC , cumulative gain and lift charts
# rpp = (tp+fp)/(tp+fn+fp+tn) (Rate of Positive Predictions)
#################################################
ROCRpredictionObjfromAnyModel<-ROCR::prediction(as.numeric(PredictionsfromAnyTestset),as.numeric(ActualsfromAnyTestset))
plotableROC<- ROCR::performance(ROCRpredictionObjfromAnyModel,measure=”tpr”,x.measure=”fpr”)
plot(plotableROC, col=”orange”, lwd=2, main=”ROC curve for blah blah”)
plotableGain<- ROCR::performance(ROCRpredictionObjfromAnyModel,measure=”tpr”,x.measure=”rpp”)
plot(plotableGain, col=”orange”, lwd=2, main=”Gaincurve for blah blah”)
#################################################
# For example we will create a prediction object from KNN model prediction vector
#####################################################
ROCRpredictionObjfromKNN<-ROCR::prediction(as.numeric(amirKNNextraxtedPredictionsvector),as.numeric(actualvectoradmitancefromtestdata))
# Now we will use the prediction object we created above as a parameter that we pass to performance function
# this performance object will contain tpr and fpr
# ROC Chart:
ROCKNN <- ROCR::performance(ROCRpredictionObjfromKNN,measure=”tpr”,x.measure=”fpr”)
# if we plot tpr vs fpr it is ROC by definition 🙂
plot(ROCKNN, col=”orange”, lwd=2, main=”ROC curve ROCRpredictionObjKNN”)
# Now we will use the prediction object we created above as a parameter that we pass to performance function
# this performance object will contain tpr and rpp (Rate of Positive Predictions)
# if we plot tpr vs rpp it is Gains chart by definition 🙂
# Gains Chart:
gainKNN <- ROCR::performance(ROCRpredictionObjfromKNN,measure=”tpr”,x.measure=”rpp”)
plot(gainKNN, col=”orange”, lwd=2, main=”gain curve KNN”)
# Lift Chart:
liftchartKNN<-ROCR::performance(ROCRpredictionObjfromKNN,”lift”,”rpp”)
plot(liftchartKNN, main=”Lift curve KNN”, colorize=T)
#AUC
library(ROCit)
roc_empirical <- ROCit::rocit(score = as.numeric(amirKNNextraxtedPredictionsvector), class = as.numeric(actualvectoradmitancefromtestdata), negref = 1)
# to get the area under the curve = AUC
summary(roc_empirical)
plot(roc_empirical)
=============================================
http://mlwiki.org/index.php/Cumulative_Gain_Chart
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4608333/
https://idswater.com/2020/09/19/what-is-a-gain-chart/
http://www2.cs.uregina.ca/~dbd/cs831/notes/lift_chart/lift_chart.html
AUC: https://www.r-bloggers.com/2016/11/calculating-auc-the-area-under-a-roc-curve/
https://www.rdocumentation.org/packages/caret/versions/5.07-001/topics/predict.train
https://www.datanovia.com/en/lessons/determining-the-optimal-number-of-clusters-3-must-know-methods/