Confusion Matrix as a technique to understand Cyber Attacks and gain insights

Harshal_Atmaramani
5 min read · Jun 10, 2021

Disclaimer: This is a compilation of articles I’ve read during my research. I have made use of the references mentioned below in the Story to compile and form this article.

We were given the task of writing an article about cybercrime cases in which the confusion matrix, or its two types of error, played a role. I hope you find it useful!

When we hear a new term, the first question that comes to mind is what it means. As digitalization increases rapidly, so does the crime related to it. Countless hackers are waiting to exploit even the smallest flaw in our architecture.

Internet usage has grown rapidly, particularly in the last decade. However, as the Internet becomes part of day-to-day activities, cybercrime is also on the rise.

What is Cybercrime?

Cybercrime is criminal activity that either targets or uses a computer, a computer network, or a networked device. Hackers often demand a ransom in exchange for a company’s most sensitive data, which they obtained by breaking into its systems. Some cybercriminals are organized, use advanced techniques, and are highly technically skilled, so it can be very hard to tell whether somebody is snooping on your computer.

Cyber-attacks have become one of the world’s biggest problems. They cause serious financial damage to countries and people every day, and as the number of attacks grows, so does cybercrime. The key factors in the fight against crime and criminals are identifying the perpetrators of cybercrime and understanding their methods of attack.

According to a 2020 Cybersecurity Ventures report, cybercrime was projected to cost nearly $6 trillion per year by 2021.

Cybercriminals use networked computing devices as their primary means of reaching a victim’s devices, and they profit financially, gain publicity, or obtain other benefits by exploiting vulnerabilities in the system. Cybercrime is increasing steadily.

Examples of Cybercrime

  • Email and internet fraud.
  • Identity fraud.
  • Theft of financial or card payment data.
  • Theft and sale of corporate data.
  • Cyberextortion (demanding money to prevent a threatened attack).
  • Ransomware attacks (a type of cyber extortion).
  • Cryptojacking (where hackers mine cryptocurrency using resources they do not own).
  • Cyberespionage (where hackers access government or company data).

Use of Machine Learning in Cyber Security

Machine learning has become a vital technology for cybersecurity. Machine learning preemptively stamps out cyber threats and bolsters security infrastructure through pattern detection, real-time cyber crime mapping and thorough penetration testing.

Confusion Matrix

The confusion matrix has its roots in the contingency table, which Karl Pearson introduced in 1904. A confusion matrix is a technique for measuring the performance of machine learning classification models.

A confusion matrix is a table that sets a model’s predictions against the real-world (actual) values. Confusion matrices are used in statistics, data mining, machine learning, and other artificial intelligence (AI) applications, and they are also called error matrices. Laying the results out in a table makes in-depth analysis of statistical data faster and the results easier to read. The tables can help analyze faults in statistics, data mining, forensics, and medical tests. A thorough analysis shows users what kinds of errors are being made, rather than merely how well the model performs overall.

In the field of machine learning, a confusion matrix is often used to visualize the performance of a classification algorithm.

Let’s represent the confusion matrix

Note: Consider a model that predicts whether or not a person is suffering from cancer.

Let’s unwrap it:

  • TP: TP stands for true positive, which means the actual data was positive and our model also predicted positive.
    E.g., if a person was suffering from cancer and the model also predicted that the person is suffering from cancer, this is called a TP.
  • FP: FP stands for false positive, which means the actual data was negative but our model predicted positive.
    E.g., if a person was not suffering from cancer but the model predicted that the person is suffering from cancer, this is called an FP.
  • FN: FN stands for false negative, which means the actual data was positive but our model predicted negative.
    E.g., if a person was suffering from cancer but the model predicted that the person is not suffering from cancer, this is called an FN.
  • TN: TN stands for true negative, which means the actual data was negative and our model also predicted negative.
    E.g., if a person was not suffering from cancer and the model predicted that the person is not suffering from cancer, this is called a TN.
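To make the four definitions concrete, here is a minimal Python sketch that tallies them from a list of actual labels and a list of predicted labels. The function name `confusion_counts` and the sample labels are made up for illustration (1 means "has cancer", 0 means "does not"); they are not taken from any real dataset or library.

```python
from collections import Counter

def confusion_counts(actual, predicted, positive=1):
    """Tally TP, FP, FN and TN for a binary classifier."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            # Model said "positive": correct if the actual label is positive too
            counts["TP" if a == positive else "FP"] += 1
        else:
            # Model said "negative": a miss if the actual label was positive
            counts["FN" if a == positive else "TN"] += 1
    return {key: counts[key] for key in ("TP", "FP", "FN", "TN")}

# Hypothetical labels: 1 = has cancer, 0 = does not have cancer
actual    = [1, 1, 0, 0, 1, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

counts = confusion_counts(actual, predicted)
print(counts)  # {'TP': 3, 'FP': 1, 'FN': 1, 'TN': 3}
```

Every prediction falls into exactly one of the four cells, so the four counts always add up to the number of samples.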

Final Touch

Mathematics and Calculation

Problem Statement: Suppose we have a total of 165 patients who are tested for a disease, with each result being either positive or negative.

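Confusion Matrix for the above problem statement (total = 165)

                 Predicted: no   Predicted: yes   Total
  Actual: no     TN = 50         FP = 10          60
  Actual: yes    FN = 5          TP = 100         105
  Total          55              110              165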

This is a list of rates that are often computed from a confusion matrix for a binary classifier:

Accuracy: Overall, how often is the classifier correct?

  • (TP+TN)/total = (100+50)/165 = 0.91

Misclassification Rate: Overall, how often is it wrong?

  • (FP+FN)/total = (10+5)/165 = 0.09
  • equivalent to 1 minus Accuracy
  • also known as “Error Rate”
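The "1 minus Accuracy" relationship follows because the four cells together account for every sample. A short worked derivation, using the same counts as the figures above:

```latex
\[
\text{Accuracy} + \text{Misclassification Rate}
  = \frac{TP + TN}{\text{total}} + \frac{FP + FN}{\text{total}}
  = \frac{TP + TN + FP + FN}{\text{total}} = 1
\]
\[
\frac{100 + 50}{165} + \frac{10 + 5}{165} = \frac{165}{165} = 1
\]
```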

True Positive Rate: When it’s actually yes, how often does it predict yes?

  • TP/actual yes = 100/105 = 0.95
  • also known as “Sensitivity” or “Recall”

False Positive Rate: When it’s actually no, how often does it predict yes?

  • FP/actual no = 10/60 = 0.17

True Negative Rate: When it’s actually no, how often does it predict no?

  • TN/actual no = 50/60 = 0.83
  • equivalent to 1 minus False Positive Rate
  • also known as “Specificity”

Precision: When it predicts yes, how often is it correct?

  • TP/predicted yes = 100/110 = 0.91

Prevalence: How often does the yes condition actually occur in our sample?

  • actual yes/total = 105/165 = 0.64
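The rates above can be reproduced with a few lines of Python. This is only a sketch: the function name `binary_rates` and the dictionary keys are chosen for illustration, and the inputs are the four counts implied by the calculations in this section (TP = 100, FP = 10, FN = 5, TN = 50).

```python
def binary_rates(tp, fp, fn, tn):
    """Compute common rates from the four cells of a binary confusion matrix."""
    total = tp + fp + fn + tn
    actual_yes = tp + fn       # everyone who really has the condition
    actual_no = fp + tn        # everyone who really does not
    predicted_yes = tp + fp    # everyone the model flagged as positive
    return {
        "accuracy": (tp + tn) / total,
        "misclassification_rate": (fp + fn) / total,
        "true_positive_rate": tp / actual_yes,    # sensitivity / recall
        "false_positive_rate": fp / actual_no,
        "true_negative_rate": tn / actual_no,     # specificity
        "precision": tp / predicted_yes,
        "prevalence": actual_yes / total,
    }

# Counts from the 165-patient example in this section
rates = binary_rates(tp=100, fp=10, fn=5, tn=50)
for name, value in rates.items():
    print(f"{name}: {value:.2f}")
```

Note that the sketch does not guard against division by zero, so it assumes the data contains at least one actual yes, one actual no, and one predicted yes.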

Conclusion

I hope you found this article helpful and that it showed you how the confusion matrix can be used as a technique to understand cybersecurity threats and gain insights from them.

Thanks for reading!

You’re awesome😄😄

— Harshal Atmaramani
