False Positives and False Negatives
With the growing significance of data science, the chances are high that you have come across the words False Positive and False Negative.
But have you ever wondered what exactly these terms mean and why they matter in data science and machine learning? This article explains what False Positives and False Negatives are and why they are important.
What Are False Positives and False Negatives?
As you explore ML, data science, and statistics more deeply, you will find that False Positives and False Negatives are two phenomena of prime importance. Simply put, they are the two types of errors that can occur in hypothesis testing. They are technically known as Type I error and Type II error, respectively.
A False Positive (Type I error) occurs when we reject a null hypothesis that is actually true.
A False Negative (Type II error) occurs when we fail to reject a null hypothesis that is actually false.
So, how does this all relate to machine learning predictions?
Machine learning models are built to perform specific tasks, such as predicting whether an input belongs to a particular class. In such cases, a False Positive (FP) is a case that the model predicted as positive but that is actually negative, while a False Negative (FN) is a case that the model predicted as negative but that is actually positive.
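As a minimal sketch of these definitions, the Python snippet below counts each outcome from a list of actual labels and a list of predicted labels. The function name `count_errors`, the encoding of 1 for positive and 0 for negative, and the sample labels are assumptions made for illustration only.

```python
def count_errors(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 = positive, 0 = negative."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            tp += 1  # predicted positive, actually positive
        elif predicted == 1 and actual == 0:
            fp += 1  # False Positive: predicted positive, actually negative
        elif predicted == 0 and actual == 0:
            tn += 1  # predicted negative, actually negative
        else:
            fn += 1  # False Negative: predicted negative, actually positive
    return tp, fp, tn, fn


# Made-up labels, for illustration only
y_true = [1, 0, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1]

tp, fp, tn, fn = count_errors(y_true, y_pred)
print(tp, fp, tn, fn)  # prints: 2 1 2 1
```

Here the second prediction is a False Positive and the third is a False Negative.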
When You Should Care About One Over the Other
1. A Cancer Detection Test
Suppose you have built a machine learning model to determine whether a patient has cancer or not.
The condition we are testing for here (the positive outcome) is: The patient has cancer.
In this case, a False Positive would be when your model predicts that a patient has cancer when, in reality, they do not. Similarly, a False Negative would be when your ML model predicts that a patient does not have cancer when, in fact, they do.
In such cases, False Negatives pose the more serious risk, and they are what you should worry about more than False Positives. A False Positive is more tolerable here: the patient may get an initial shock, but doctors will later find out that they don’t have cancer. This is far better than an undetected cancer, where the patient misses crucial treatment and medicine, which can cause serious damage to their body and may ultimately cost them their life.
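One way to make this priority concrete is to track the false negative rate, that is, the share of patients who really have cancer that the model misses. The sketch below compares two hypothetical screening models; the counts are invented for illustration and do not come from any real test.

```python
def false_negative_rate(fn, tp):
    """Share of actual positive cases that the model missed."""
    actual_positives = fn + tp
    if actual_positives == 0:
        return 0.0  # no positive cases, so nothing could be missed
    return fn / actual_positives


# Hypothetical counts for two models on the same 1,000 patients,
# 50 of whom actually have cancer (numbers are made up)
model_a = {"tp": 35, "fn": 15, "fp": 10, "tn": 940}
model_b = {"tp": 48, "fn": 2, "fp": 60, "tn": 890}

fnr_a = false_negative_rate(model_a["fn"], model_a["tp"])
fnr_b = false_negative_rate(model_b["fn"], model_b["tp"])
print(fnr_a, fnr_b)  # prints: 0.3 0.04
```

In this made-up comparison, model A makes fewer errors overall (25 versus 62) but misses 15 of the 50 patients with cancer, while model B misses only 2. For a screening test, model B is arguably the better choice despite its extra False Positives.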
2. Email Spam-Filtering
Let’s take another use case: email spam filtering. Suppose your ML model is built to filter and automatically remove spam for your users.
The condition we are testing for here is: This email is spam.
A False Positive would be your ML model predicting that an email is spam when, in fact, it is not.
And a False Negative would be your ML model predicting that an email is not spam when, in reality, it is.
In this case, we can say that False Negatives are tolerable. The more costly error here is the Type I error, that is, the False Positive: if the spam filter automatically removes messages and produces too many False Positives, users will miss important emails. It is definitely better to get all important emails along with a few spam ones, which can easily be ignored.
From the use cases above, it is clear that there is no fixed rule for which error you should care about more. Both False Positives and False Negatives matter, and which one matters more depends on the problem and the situation.
A data scientist would always prefer to have no errors at all. However, that is not achievable in practice. It's normal for errors to occur, and each type comes with its own set of complications and difficulties. So it is up to the data scientist, the designer, or the person performing the hypothesis test to decide which of the two errors is more problematic and needs to be curtailed more than the other.
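One common way to act on that decision, sketched below under stated assumptions, is to assign a cost to each error type and pick the decision threshold that minimizes the total cost. The scores, labels, candidate thresholds, and cost values are all hypothetical, and the helper names `errors_at_threshold` and `pick_threshold` are introduced here only for illustration.

```python
def errors_at_threshold(scores, labels, threshold):
    """Count (fp, fn) when scores >= threshold are predicted positive."""
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return fp, fn


def pick_threshold(scores, labels, thresholds, cost_fp, cost_fn):
    """Return the candidate threshold with the lowest weighted error cost."""
    def cost(threshold):
        fp, fn = errors_at_threshold(scores, labels, threshold)
        return cost_fp * fp + cost_fn * fn
    return min(thresholds, key=cost)


# Made-up model scores and true labels (1 = positive, 0 = negative)
scores = [0.1, 0.2, 0.35, 0.4, 0.55, 0.6, 0.8, 0.9]
labels = [0, 0, 1, 0, 0, 1, 1, 1]
candidates = [0.3, 0.5, 0.7]

# Cancer-like setting: a False Negative is assumed to cost 10x a False Positive
print(pick_threshold(scores, labels, candidates, cost_fp=1, cost_fn=10))  # prints: 0.3

# Spam-like setting: a False Positive is assumed to cost 10x a False Negative
print(pick_threshold(scores, labels, candidates, cost_fp=10, cost_fn=1))  # prints: 0.7
```

With these invented numbers, the cancer-like costs favor a low threshold that flags more cases and misses none, while the spam-like costs favor a high threshold that never flags a legitimate item. The 10-to-1 ratios are placeholders; in a real project the costs would have to come from the problem itself.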