
Benchmark Metrics

Understanding the available metrics and when to use them is crucial for effective model evaluation.

Classification Metrics

Accuracy

Definition: Percentage of correct predictions.

When to Use:

  • Balanced datasets
  • Equal importance of all classes
  • General performance indicator

Interpretation:

  • Higher is better (0-1 or 0-100%)
  • 1.0 = perfect predictions
  • 0.5 ≈ random guessing (for balanced binary classification)
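
A minimal sketch of computing accuracy, assuming scikit-learn (an illustrative choice, not part of the benchmark itself):

```python
from sklearn.metrics import accuracy_score

y_true = [0, 1, 1, 0, 1]   # actual labels
y_pred = [0, 1, 0, 0, 1]   # model predictions

# fraction of predictions that match the labels
print(accuracy_score(y_true, y_pred))  # 0.8 (4 of 5 correct)
```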

Precision

Definition: Percentage of positive predictions that are correct.

When to Use:

  • When false positives are costly
  • Fraud detection (false alarms trigger costly manual reviews)
  • Confirming a diagnosis (avoiding unnecessary treatment)

Interpretation:

  • Higher is better
  • Measures prediction reliability
  • Trade-off with recall
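
A quick illustration on toy data, again assuming scikit-learn:

```python
from sklearn.metrics import precision_score

y_true = [0, 0, 1, 1, 1]
y_pred = [1, 0, 1, 1, 0]   # 2 true positives, 1 false positive, 1 false negative

# precision = TP / (TP + FP) = 2 / 3
print(precision_score(y_true, y_pred))  # ≈ 0.667
```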

Recall (Sensitivity)

Definition: Percentage of actual positives correctly identified.

When to Use:

  • When false negatives are costly
  • Disease screening (missing a case is worse than a false alarm)
  • Security screening

Interpretation:

  • Higher is better
  • Measures coverage of the positive class
  • Trade-off with precision
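
On the same toy data, a sketch of recall, which looks at how many actual positives were found:

```python
from sklearn.metrics import recall_score

y_true = [0, 0, 1, 1, 1]   # 3 actual positives
y_pred = [1, 0, 1, 1, 0]   # 2 of the 3 positives were found

# recall = TP / (TP + FN) = 2 / 3
print(recall_score(y_true, y_pred))  # ≈ 0.667
```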

F1 Score

Definition: Harmonic mean of precision and recall.

When to Use:

  • When precision and recall are equally important
  • Single metric for comparison
  • Imbalanced datasets

Interpretation:

  • Higher is better (0-1)
  • Balances precision and recall
  • Good for imbalanced classes
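
A small sketch of how F1 combines the two (scikit-learn assumed):

```python
from sklearn.metrics import f1_score

y_true = [0, 0, 1, 1, 1]
y_pred = [1, 0, 1, 1, 0]

# F1 = 2 * precision * recall / (precision + recall)
print(f1_score(y_true, y_pred))  # ≈ 0.667 (precision and recall are both 2/3 here)
```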

ROC AUC

Definition: Area under the receiver operating characteristic (ROC) curve, which plots true positive rate against false positive rate across classification thresholds.

When to Use:

  • Binary classification
  • Ranking problems
  • Threshold-independent evaluation

Interpretation:

  • Higher is better (0-1)
  • 1.0 = perfect classifier
  • 0.5 = random classifier
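
ROC AUC is computed from predicted probabilities or scores rather than hard labels; a brief sketch, assuming scikit-learn:

```python
from sklearn.metrics import roc_auc_score

y_true  = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]   # predicted probability of the positive class

# threshold-independent: ranks all positives against all negatives
print(roc_auc_score(y_true, y_score))  # 0.75
```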

Log Loss (Cross-Entropy)

Definition: Logarithmic loss measuring prediction confidence.

When to Use:

  • Probabilistic predictions
  • Confidence matters
  • Multi-class classification

Interpretation:

  • Lower is better
  • Penalizes confident wrong predictions
  • Sensitive to prediction probabilities
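
Log loss likewise takes predicted probabilities; for example (scikit-learn assumed):

```python
from sklearn.metrics import log_loss

y_true = [0, 1, 1, 0]
y_prob = [0.1, 0.9, 0.8, 0.3]   # predicted probability of the positive class

# confident mistakes (e.g., p = 0.99 for a true 0) would inflate this sharply
print(log_loss(y_true, y_prob))  # ≈ 0.20
```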

KS Statistic (Kolmogorov-Smirnov)

Definition: Maximum difference between the cumulative score distributions of the positive and negative classes.

When to Use:

  • Credit scoring
  • Risk assessment
  • Distribution comparison

Interpretation:

  • Higher is better (0-1)
  • Measures separation between classes
  • Common in financial services
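
One common way to compute KS is to compare the score distributions of the two classes directly; a sketch using SciPy as an illustrative choice:

```python
import numpy as np
from scipy.stats import ks_2samp

y_true  = np.array([0, 0, 0, 1, 1, 1])
y_score = np.array([0.2, 0.3, 0.6, 0.4, 0.7, 0.9])

# KS = largest gap between the cumulative score distributions
# of the positive and negative classes
result = ks_2samp(y_score[y_true == 1], y_score[y_true == 0])
print(result.statistic)  # ≈ 0.667
```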

Gini Coefficient

Definition: Measure of how strongly the model separates positives from negatives (rank-ordering power).

When to Use:

  • Credit scoring
  • Risk modeling
  • Financial services

Interpretation:

  • Higher is better (0-1)
  • Gini = 2 × ROC AUC − 1
  • Common in credit risk
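
Because Gini is a simple transform of AUC, it can be derived from an AUC value; a sketch assuming scikit-learn:

```python
from sklearn.metrics import roc_auc_score

y_true  = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

auc = roc_auc_score(y_true, y_score)
gini = 2 * auc - 1        # Gini = 2 * AUC - 1
print(gini)               # 0.5 for an AUC of 0.75
```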

Regression Metrics

MSE (Mean Squared Error)

Definition: Average of squared differences between predictions and actuals.

When to Use:

  • General regression evaluation
  • When large errors are costly
  • Standard regression metric

Interpretation:

  • Lower is better
  • Penalizes large errors more
  • Sensitive to outliers
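
A minimal example, with scikit-learn assumed purely for illustration:

```python
from sklearn.metrics import mean_squared_error

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5,  0.0, 2.0, 8.0]

# squaring means the single 1.0 error dominates the smaller ones
print(mean_squared_error(y_true, y_pred))  # 0.375
```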

MAE (Mean Absolute Error)

Definition: Average of absolute differences between predictions and actuals.

When to Use:

  • When all errors are equally important
  • Robust to outliers
  • Interpretable error magnitude

Interpretation:

  • Lower is better
  • Same units as target variable
  • Less sensitive to outliers than MSE
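
On the same toy data, a sketch of MAE, which reports the average error magnitude in the target's units:

```python
from sklearn.metrics import mean_absolute_error

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5,  0.0, 2.0, 8.0]

# average absolute error, in the same units as the target
print(mean_absolute_error(y_true, y_pred))  # 0.5
```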

RMSE (Root Mean Squared Error)

Definition: Square root of MSE.

When to Use:

  • When you want error in the same units as the target variable
  • When large errors matter
  • Standard regression metric

Interpretation:

  • Lower is better
  • Same units as target
  • More interpretable than MSE
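
RMSE can be obtained by taking the square root of MSE; a brief sketch assuming NumPy and scikit-learn:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5,  0.0, 2.0, 8.0]

rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # square root of MSE
print(rmse)  # ≈ 0.612, back in the target's units
```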

R-squared (R²)

Definition: Proportion of variance explained by the model.

When to Use:

  • Model fit evaluation
  • Variance explanation
  • Standard regression metric

Interpretation:

  • Higher is better (can be negative)
  • 1.0 = perfect fit
  • 0.0 = no better than predicting the mean
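
A short example, again with scikit-learn as an illustrative choice:

```python
from sklearn.metrics import r2_score

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5,  0.0, 2.0, 8.0]

# 1.0 = perfect fit; 0.0 = no better than predicting the mean
print(r2_score(y_true, y_pred))  # ≈ 0.949
```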

When to Use Each Metric

For Classification:

  • General: Accuracy, F1 Score
  • Imbalanced Data: F1, ROC AUC
  • Cost-Sensitive: Precision, Recall
  • Probabilistic: Log Loss, ROC AUC
  • Financial: KS, Gini

For Regression:

  • General: RMSE, R²
  • Data with Outliers: MAE (more robust than MSE/RMSE)
  • Large Errors Important: MSE, RMSE
  • Variance Explanation: R²

Interpreting Results

Good Performance Indicators:

  • High accuracy/F1 for classification
  • Low MSE/MAE for regression
  • High R² for regression
  • Balanced precision/recall

Red Flags:

  • Metrics near chance level or a naive baseline
  • Large gaps between training and validation
  • Inconsistent results
  • Metrics not improving

Next Steps