Confusion Matrix – In detail


Confusion Matrix :-

A confusion matrix, also known as an error matrix, is a table that compares a model's predictions with the actual values, and so helps clear up our confusion about how well the model is doing while we work toward an accurate one. It lets us see the performance of our algorithm, and the rates derived from it are what we use to plot the related ROC (Receiver Operating Characteristic) curve.

Example of Confusion Matrix:-

Smokers in the Hostel:-

Consider a hostel where 100 boys are staying, with a strict rule against smoking in the building. One day the owner of the hostel smells cigarette smoke and tries to catch the boy responsible, but fails. So he hires a warden to find out which boys are smokers. The warden is not very good at his job: sometimes he actually catches a smoker, and sometimes he assumes that a smoker's friend is also a smoker. After 10 days he submits his report, which sets his predictions against the actual situation.

Here is the report:-

                         Predicted: Yes     Predicted: No
Actual: Smoker                 20                 5
Actual: Not a smoker           10                65

Let me explain, to clear up your confusion:-

Matrix values and their meaning:-

20  ->  Actually a smoker and Prediction is also Yes      ->  True Positive

5   ->  Actually a smoker and Prediction is No            ->  False Negative

10  ->  Actually not a smoker but Prediction is Yes       ->  False Positive

65  ->  Actually not a smoker and Prediction is also No   ->  True Negative

As a result, 20 smokers along with 10 non-smokers get punished, while 5 smokers go undetected.
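The four cells of the report can also be tallied in code. The following is a minimal sketch in plain Python, assuming the warden's report is available as two parallel lists of labels; the label strings and the variable names (`actual`, `predicted`, `cells`) are illustrative and not part of the original report, only the counts are.

```python
from collections import Counter

# Rebuild the report as paired (actual, predicted) labels for the 100 boys.
# The counts 20, 5, 10 and 65 come from the report; the label strings
# "smoker" / "non-smoker" are illustrative assumptions.
actual = ["smoker"] * 25 + ["non-smoker"] * 75
predicted = (
    ["smoker"] * 20 + ["non-smoker"] * 5      # the 25 actual smokers
    + ["smoker"] * 10 + ["non-smoker"] * 65   # the 75 actual non-smokers
)

# Count how often each (actual, predicted) combination occurs.
cells = Counter(zip(actual, predicted))

tp = cells[("smoker", "smoker")]          # True Positive
fn = cells[("smoker", "non-smoker")]      # False Negative (Type II Error)
fp = cells[("non-smoker", "smoker")]      # False Positive (Type I Error)
tn = cells[("non-smoker", "non-smoker")]  # True Negative

print(f"TP={tp} FN={fn} FP={fp} TN={tn}")
```

With real data, the two lists would come from the ground truth and from the model's output rather than being built from known counts.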

So from the above matrix, we can see that two of the four cases give us wrong information about the boys, i.e.

False Positive   ->  Type I Error

False Negative   ->  Type II Error

Similarly, a machine learning model sometimes makes the same kinds of mistakes as the warden, so we need to check the accuracy of our model before making any decision based on it.

Terminology:-

  • Sensitivity, Recall, Hit Rate or True Positive Rate (TPR)

                  TPR = True Positive / (True Positive + False Negative)

                          = 20 / (20 + 5) = 0.80

  • Specificity, Selectivity or True Negative Rate (TNR)

                 TNR = True Negative / (True Negative + False Positive)

                          = 65 / (65 + 10) ≈ 0.867

  • Precision or Positive Predictive Value (PPV)

                 PPV = True Positive / (True Positive + False Positive)

                          = 20 / (20 + 10) ≈ 0.667

  • Negative Predictive Value (NPV)

                 NPV = True Negative / (True Negative + False Negative)

                          = 65 / (65 + 5) ≈ 0.929

  • Miss Rate or False Negative Rate (FNR)

                  FNR = False Negative / (True Positive + False Negative)

                          = 5 / (20 + 5) = 0.20

  • Fall-out or False Positive Rate (FPR)

                  FPR = False Positive / (True Negative + False Positive)

                          = 10 / (65 + 10) ≈ 0.133

  • Accuracy (ACC)

                ACC = (True Negative + True Positive) /

                      (True Negative + True Positive + False Positive + False Negative)

                          = (65 + 20) / (65 + 20 + 10 + 5) = 0.85

  • F1 Score: the harmonic mean of precision and sensitivity

                F1 = 2 * (PPV * TPR) / (PPV + TPR)

                     = 2 * True Positive / (2 * True Positive + False Positive + False Negative)

                     = 2 * 20 / (2 * 20 + 10 + 5) ≈ 0.727
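The formulas above can be checked with a few lines of code. This is a minimal sketch that starts from the four counts in the warden's report; the variable names and the `metrics` dictionary are illustrative choices, not a standard API.

```python
# The four cells of the warden's confusion matrix.
tp, fn, fp, tn = 20, 5, 10, 65

tpr = tp / (tp + fn)                   # Sensitivity / Recall
tnr = tn / (tn + fp)                   # Specificity
ppv = tp / (tp + fp)                   # Precision
npv = tn / (tn + fn)                   # Negative Predictive Value
fnr = fn / (tp + fn)                   # Miss Rate
fpr = fp / (tn + fp)                   # Fall-out
acc = (tp + tn) / (tp + tn + fp + fn)  # Accuracy
f1 = 2 * tp / (2 * tp + fp + fn)       # F1 Score

metrics = {
    "TPR": tpr, "TNR": tnr, "PPV": ppv, "NPV": npv,
    "FNR": fnr, "FPR": fpr, "ACC": acc, "F1": f1,
}

# Print each metric rounded to three decimal places.
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that TPR + FNR = 1 and TNR + FPR = 1, since each pair splits the same group of boys (actual smokers and actual non-smokers, respectively). The code assumes none of the denominators is zero, which holds for this report but would need a guard for arbitrary data.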

All of the above calculations help us evaluate the model; in particular, TPR and FPR are the two quantities plotted on the ROC curve. We will discuss the ROC curve in our next blog.
