Benchmark Metrics
Understanding available metrics and when to use them is crucial for effective model evaluation.
Classification Metrics
Accuracy
Definition: Percentage of correct predictions.
When to Use:
- Balanced datasets
- Equal importance of all classes
- General performance indicator
Interpretation:
- Higher is better (0-1 or 0-100%)
- 1.0 = perfect predictions
- 0.5 ≈ random guessing (binary classification with balanced classes)
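For concreteness, accuracy can be computed in a few lines of plain Python (a sketch, not tied to any particular library):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# 4 of 5 predictions match the labels -> accuracy 0.8
print(accuracy([1, 0, 1, 1, 0], [1, 0, 0, 1, 0]))
```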
Precision
Definition: Percentage of positive predictions that are correct.
When to Use:
- When false positives are costly
- Fraud detection
- Spam filtering (flagging a legitimate email is costly)
Interpretation:
- Higher is better
- Measures prediction reliability
- Trade-off with recall
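A minimal sketch of the computation: precision is true positives divided by all positive predictions (true plus false positives).

```python
def precision(y_true, y_pred):
    """Of everything predicted positive, the fraction that actually is."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return tp / (tp + fp)

# 3 positive predictions, 2 of them correct -> precision 2/3
print(precision([1, 0, 1, 0, 1], [1, 1, 1, 0, 0]))
```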
Recall (Sensitivity)
Definition: Percentage of actual positives correctly identified.
When to Use:
- When false negatives are costly
- Disease detection
- Security screening
Interpretation:
- Higher is better
- Measures coverage of positive class
- Trade-off with precision
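The mirror image of precision: recall is true positives divided by all actual positives (true positives plus false negatives). A quick sketch:

```python
def recall(y_true, y_pred):
    """Of all actual positives, the fraction that was found."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn)

# 3 actual positives, 2 of them identified -> recall 2/3
print(recall([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))
```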
F1 Score
Definition: Harmonic mean of precision and recall.
When to Use:
- Balanced precision and recall
- Single metric for comparison
- Imbalanced datasets
Interpretation:
- Higher is better (0-1)
- Balances precision and recall
- Good for imbalanced classes
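Because F1 is a harmonic rather than arithmetic mean, it is dragged toward the weaker of the two inputs, so a model cannot score well by maximizing only one of precision or recall:

```python
def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

# Arithmetic mean of 0.5 and 1.0 would be 0.75,
# but the harmonic mean pulls toward the weaker value:
print(f1_score(0.5, 1.0))  # ~0.667
```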
ROC AUC
Definition: Area under the ROC curve.
When to Use:
- Binary classification
- Ranking problems
- Threshold-independent evaluation
Interpretation:
- Higher is better (0-1)
- 1.0 = perfect classifier
- 0.5 = random classifier
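ROC AUC has a useful probabilistic reading: it is the chance that a randomly chosen positive example receives a higher score than a randomly chosen negative one. A brute-force sketch of that definition (fine for small samples):

```python
def roc_auc(y_true, scores):
    """P(random positive scores above random negative); ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# 3 of the 4 positive/negative pairs are ranked correctly -> 0.75
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```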
Log Loss (Cross-Entropy)
Definition: Logarithmic loss measuring prediction confidence.
When to Use:
- Probabilistic predictions
- Confidence matters
- Multi-class classification
Interpretation:
- Lower is better
- Penalizes confident wrong predictions
- Sensitive to prediction probabilities
KS Statistic (Kolmogorov-Smirnov)
Definition: Maximum difference between the cumulative score distributions of the positive and negative classes.
When to Use:
- Credit scoring
- Risk assessment
- Distribution comparison
Interpretation:
- Higher is better (0-1)
- Measures separation between classes
- Common in financial services
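A direct sketch of the definition: scan candidate thresholds and take the largest gap between the two classes' empirical CDFs.

```python
def ks_statistic(y_true, scores):
    """Max gap between the score CDFs of the two classes."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    best = 0.0
    for thr in sorted(set(scores)):
        cdf_pos = sum(s <= thr for s in pos) / len(pos)
        cdf_neg = sum(s <= thr for s in neg) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best

# Perfectly separated scores -> KS of 1.0
print(ks_statistic([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]))
```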
Gini Coefficient
Definition: Measure of the model's rank-ordering (discriminatory) power.
When to Use:
- Credit scoring
- Risk modeling
- Financial services
Interpretation:
- Higher is better (0-1)
- Equal to 2 × ROC AUC − 1
- Common in credit risk
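Since Gini is a linear rescaling of AUC, it can be sketched on top of the pairwise AUC definition (an AUC of 0.5, a random classifier, maps to a Gini of 0):

```python
def gini(y_true, scores):
    """Gini = 2 * AUC - 1, using the pairwise definition of AUC."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))
    return 2 * auc - 1

# AUC of 0.75 corresponds to a Gini of 0.5
print(gini([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```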
Regression Metrics
MSE (Mean Squared Error)
Definition: Average of squared differences between predictions and actuals.
When to Use:
- General regression evaluation
- When large errors are costly
- Standard regression metric
Interpretation:
- Lower is better
- Penalizes large errors more
- Sensitive to outliers
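Squaring is what makes MSE punish large misses: an error of 2 contributes four times as much as an error of 1. A minimal sketch:

```python
def mse(y_true, y_pred):
    """Mean of squared errors; large misses dominate the average."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Errors of 1 and -2 -> (1 + 4) / 2 = 2.5
print(mse([3.0, 5.0], [2.0, 7.0]))
```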
MAE (Mean Absolute Error)
Definition: Average of absolute differences between predictions and actuals.
When to Use:
- When all errors are equally important
- Robust to outliers
- Interpretable error magnitude
Interpretation:
- Lower is better
- Same units as target variable
- Less sensitive to outliers than MSE
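The outlier comparison is easy to demonstrate: with errors of 1, 1, and 10, MAE is 4 while MSE would be 34, because the single large error is squared.

```python
def mae(y_true, y_pred):
    """Mean of absolute errors, in the same units as the target."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Errors of 1, 1, and 10 -> (1 + 1 + 10) / 3 = 4.0
# (MSE on the same errors would be (1 + 1 + 100) / 3 = 34)
print(mae([0.0, 0.0, 0.0], [1.0, 1.0, 10.0]))
```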
RMSE (Root Mean Squared Error)
Definition: Square root of MSE.
When to Use:
- When error should be reported in the target's units
- When large errors matter
- Standard regression metric
Interpretation:
- Lower is better
- Same units as target
- More interpretable than MSE
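Taking the square root undoes the unit distortion of squaring. A quick sketch:

```python
import math

def rmse(y_true, y_pred):
    """Square root of MSE, restoring the target's original units."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)

# MSE of 12.5 -> RMSE of ~3.54, in the target's units
print(rmse([0.0, 0.0], [3.0, 4.0]))
```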
R-squared (R²)
Definition: Proportion of variance explained by the model.
When to Use:
- Model fit evaluation
- Variance explanation
- Standard regression metric
Interpretation:
- Higher is better (can be negative)
- 1.0 = perfect fit
- 0.0 = no better than mean
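The "no better than the mean" baseline follows directly from the formula, R² = 1 − SS_res / SS_tot: predicting the mean for every point makes the two sums equal.

```python
def r_squared(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares)."""
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return 1 - ss_res / ss_tot

# Predicting the mean for every point gives exactly 0.0:
print(r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))
```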
When to Use Each Metric
For Classification:
- General: Accuracy, F1 Score
- Imbalanced Data: F1, ROC AUC
- Cost-Sensitive: Precision, Recall
- Probabilistic: Log Loss, ROC AUC
- Financial: KS, Gini
For Regression:
- General: RMSE, R²
- Robust to Outliers: MAE
- Large Errors Important: MSE, RMSE
- Variance Explanation: R²
Interpreting Results
Good Performance Indicators:
- High accuracy/F1 for classification
- Low MSE/MAE for regression
- High R² for regression
- Balanced precision/recall
Red Flags:
- Metrics near random-baseline levels (e.g., AUC ≈ 0.5)
- Large gaps between training and validation
- Inconsistent results
- Metrics not improving
Next Steps
- Learn about Creation and Management to create benchmarks
- Check Operations to execute benchmarks