## Confusion Matrix

The confusion matrix is a table used to evaluate the performance of a classification algorithm. It compares the actual target values with those predicted by the model.

### Structure

A confusion matrix for a binary classifier is a 2x2 matrix:

|                 | Predicted Positive | Predicted Negative |
| --------------- | ------------------ | ------------------ |
| Actual Positive | TP                 | FN                 |
| Actual Negative | FP                 | TN                 |

- **Variables**:
    - $TP$: True Positives - Correctly predicted positive cases.
    - $FN$: False Negatives - Incorrectly predicted negative cases.
    - $FP$: False Positives - Incorrectly predicted positive cases.
    - $TN$: True Negatives - Correctly predicted negative cases.

### Python Implementation

```python
from sklearn.metrics import confusion_matrix

# Sample true labels and predicted labels
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

# Confusion matrix
cm = confusion_matrix(y_true, y_pred)
print("Confusion Matrix:\n", cm)
```

---

## Accuracy

Accuracy is the ratio of correctly predicted observations to the total observations. It is a simple and widely used metric for classification.

### Formula

$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$

### Python Implementation

```python
from sklearn.metrics import accuracy_score

# Calculate accuracy
accuracy = accuracy_score(y_true, y_pred)
print("Accuracy:", accuracy)
```

---

## Recall

Recall (or Sensitivity) is the ratio of correctly predicted positive observations to the actual positives. It is useful for situations where detecting positive instances is more important.

### Formula

$Recall = \frac{TP}{TP + FN}$

### Python Implementation

```python
from sklearn.metrics import recall_score

# Calculate recall
recall = recall_score(y_true, y_pred)
print("Recall:", recall)
```

---

## Precision

Precision is the ratio of correctly predicted positive observations to the total predicted positives. It is useful for situations where the cost of false positives is high.

### Formula

$Precision = \frac{TP}{TP + FP}$

### Python Implementation

```python
from sklearn.metrics import precision_score

# Calculate precision
precision = precision_score(y_true, y_pred)
print("Precision:", precision)
```

---

## F1-Score

The F1-score is the harmonic mean of precision and recall. It provides a single metric that balances both concerns, especially useful in cases of imbalanced datasets.

### Formula

$F1\text{-}Score = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$

### Python Implementation

```python
from sklearn.metrics import f1_score

# Calculate F1-score
f1 = f1_score(y_true, y_pred)
print("F1-Score:", f1)
```

---

## Conclusion

### Summary of Metrics

- **Confusion Matrix:** Provides a comprehensive breakdown of true positives, false negatives, false positives, and true negatives.
- **Accuracy:** Measures the overall correctness of the model.
- **Recall:** Focuses on the model's ability to identify positive instances.
- **Precision:** Emphasizes the accuracy of positive predictions.
- **F1-Score:** Balances precision and recall, particularly useful for imbalanced datasets.
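As a sanity check, these metrics can also be recomputed by hand from the four confusion-matrix counts. The sketch below is a minimal illustration, assuming binary 0/1 labels and the same sample arrays used in the earlier examples; note that scikit-learn lays the binary matrix out as `[[TN, FP], [FN, TP]]` (rows ordered 0 then 1), which is mirrored relative to the table in the Structure section.

```python
from sklearn.metrics import (
    confusion_matrix, accuracy_score, recall_score, precision_score, f1_score
)

# Same sample labels as in the earlier examples
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

# scikit-learn returns [[TN, FP], [FN, TP]] for binary 0/1 labels,
# so ravel() unpacks the counts in that order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Apply the formulas from the sections above
accuracy = (tp + tn) / (tp + tn + fp + fn)
recall = tp / (tp + fn)
precision = tp / (tp + fp)
f1 = 2 * precision * recall / (precision + recall)

print("Manual: ", accuracy, recall, precision, f1)
print("sklearn:", accuracy_score(y_true, y_pred), recall_score(y_true, y_pred),
      precision_score(y_true, y_pred), f1_score(y_true, y_pred))
```

Both lines should print the same values (every metric comes out to 0.8 on this small sample), confirming that the sklearn helpers match the formulas above.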
### Python Implementation for All Metrics

```python
from sklearn.metrics import (
    confusion_matrix, accuracy_score, recall_score, precision_score, f1_score
)

# Sample true labels and predicted labels
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

# Confusion matrix
cm = confusion_matrix(y_true, y_pred)
print("Confusion Matrix:\n", cm)

# Accuracy
accuracy = accuracy_score(y_true, y_pred)
print("Accuracy:", accuracy)

# Recall
recall = recall_score(y_true, y_pred)
print("Recall:", recall)

# Precision
precision = precision_score(y_true, y_pred)
print("Precision:", precision)

# F1-Score
f1 = f1_score(y_true, y_pred)
print("F1-Score:", f1)
```

Continue: [[06-Supervised Learning Advanced Regressors and Classifiers]]