In this section, we take a closer look at how to evaluate and interpret the outcomes of classification tasks in computer vision.
Reading the classification table outcome is an important step in evaluating the performance of a classification model. The classification table, also known as a confusion matrix, provides a summarized view of the predictions made by the model compared to the ground truth labels. It consists of rows and columns representing the actual and predicted classes, respectively. Here's how to read the classification table outcome:
| Outcome | Description |
| --- | --- |
| True negatives (TN) | Cases where the model correctly predicted the negative class. |
| False positives (FP) | Cases where the model incorrectly predicted the positive class when the actual class was negative. |
| False negatives (FN) | Cases where the model incorrectly predicted the negative class when the actual class was positive. |
| True positives (TP) | Cases where the model correctly predicted the positive class. |
The classification table is typically organized in a matrix format, where each cell represents the count or frequency of predictions falling into a specific category. It allows you to assess the model's performance across different classes and understand the types of errors it makes.
Additionally, based on the classification table, several performance metrics can be derived, such as accuracy, precision, recall (sensitivity), specificity, and F1 score, which provide more detailed insights into the model's effectiveness in classifying instances.
By carefully analysing the classification table outcome, you can identify patterns, assess the strengths and weaknesses of the model, and make informed decisions about its performance and potential improvements.
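As an illustration, here is a minimal sketch of reading the four cells out of a binary classification table, assuming the common layout in which rows hold the actual labels, columns hold the predicted labels, and the negative class is listed first (the counts are made up for the example):

```python
# Minimal sketch: reading a 2x2 confusion matrix for a binary problem.
# Rows are actual labels, columns are predicted labels, with the
# negative class listed first (an assumed, but common, ordering).
confusion = [
    [50, 10],   # actual negative: [TN, FP]
    [5, 35],    # actual positive: [FN, TP]
]

tn, fp = confusion[0]
fn, tp = confusion[1]

print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")
```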
Accuracy is the proportion of correct predictions (both true positives and true negatives) to the total number of instances. It provides an overall measure of the model's correctness.
Accuracy = (TP + TN) / (TP + FP + TN + FN)
Precision is the proportion of true positive predictions to the total number of positive predictions. Precision measures the model's ability to correctly identify positive instances.
Precision = TP / (TP + FP)
Recall (sensitivity) is the proportion of true positive predictions to the total number of actual positive instances. Recall measures the model's ability to correctly identify all positive instances.
Recall = TP / (TP + FN)
F1 Score
The F1 score is a measure of a model's accuracy that combines both precision and recall into a single metric. It is the harmonic mean of precision and recall and provides a balanced assessment of a model's performance. The F1 score ranges from 0 to 1, where a value of 1 indicates perfect precision and recall.
F1 Score = 2 * (Precision * Recall) / (Precision + Recall)
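To make the formulas above concrete, here is a short sketch that computes accuracy, precision, recall, and the F1 score directly from hypothetical TP, TN, FP, and FN counts (the values are invented for the example):

```python
# Sketch of the metrics defined above, computed from the four counts
# of a binary confusion matrix (example values, not real data).
tp, tn, fp, fn = 35, 50, 10, 5

accuracy = (tp + tn) / (tp + fp + tn + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)                      # also called sensitivity
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f}, precision={precision:.3f}, "
      f"recall={recall:.3f}, f1={f1:.3f}")
```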
Confusion matrix
A confusion matrix is a table that summarizes the performance of a classification model by comparing the predicted labels to the actual labels of a dataset. It provides a comprehensive view of the model's true positive, true negative, false positive, and false negative predictions. The matrix is typically organized in a square format, where the rows represent the actual labels, and the columns represent the predicted labels.
The confusion matrix helps in understanding the types and quantities of errors made by the model. It can be used to calculate various performance metrics such as accuracy, precision, recall, and F1 score. By analysing the confusion matrix, one can gain insights into the strengths and weaknesses of a classification model and make improvements accordingly.
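As a practical sketch, the matrix can be built from true and predicted labels with a library such as scikit-learn (one common choice, assumed here purely for illustration); its confusion_matrix function follows the same convention of rows as actual labels and columns as predicted labels:

```python
# Sketch: building a confusion matrix from toy labels with scikit-learn.
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 1, 0, 1, 0]   # actual labels (toy data)
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]   # model predictions (toy data)

cm = confusion_matrix(y_true, y_pred)   # rows: actual, columns: predicted
tn, fp, fn, tp = cm.ravel()             # valid for the binary case
print(cm)
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")
```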
In the context of evaluating classification models, there are different types of averages that can be used to calculate performance metrics. These averages are used to summarize the values obtained for different classes in a multi-class classification problem. The three common types of averages are:
- Micro average: The micro average calculates the performance metric by considering the total number of true positives, false positives, and false negatives across all classes. It treats the classification problem as a single aggregate result. This average is suitable when the dataset is imbalanced, and there is a significant difference in the number of samples in each class.
- Macro average: The macro average calculates the performance metric separately for each class and then takes the average across all classes. It treats each class equally, regardless of the number of samples in each class. This average is suitable when all classes are considered equally important, and the dataset is balanced.
- Weighted average: The weighted average is similar to the macro average but takes into account the number of samples in each class. It calculates the performance metric for each class and then weighs the average based on the number of samples in each class. This average is suitable when there is a class imbalance, and some classes have more significance than others.
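The following sketch contrasts the three averaging schemes on a small, made-up multi-class example, assuming scikit-learn's f1_score for the computation:

```python
# Sketch of the three averaging schemes on a toy three-class problem,
# using scikit-learn's f1_score (an assumed tooling choice).
from sklearn.metrics import f1_score

y_true = [0, 0, 0, 1, 1, 2, 2, 2, 2, 2]   # toy labels for three classes
y_pred = [0, 0, 1, 1, 2, 2, 2, 2, 0, 2]   # toy predictions

print("micro   :", f1_score(y_true, y_pred, average="micro"))
print("macro   :", f1_score(y_true, y_pred, average="macro"))
print("weighted:", f1_score(y_true, y_pred, average="weighted"))
```

On imbalanced data like this toy example, the macro average weights the small classes as heavily as the large one, so it typically diverges from the micro and weighted averages.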