
Benchmark Usage Guide

A step-by-step guide to creating and using benchmarks, with best practices and troubleshooting tips.

Step-by-Step Guide

Follow these steps to create and run a benchmark:

Step 1: Prepare Datasets

  • Ensure datasets are in the READY state
  • Verify datasets are appropriate for evaluation
  • Check dataset consistency (a consistency-check sketch follows this list)
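
How you verify readiness depends on your platform, but one quick sanity check you can run yourself is confirming that all evaluation datasets share the same schema. A minimal sketch with pandas (the file names are placeholders):

```python
import pandas as pd

# Placeholder file names; substitute your evaluation datasets.
paths = ["eval_set_a.csv", "eval_set_b.csv"]
frames = {p: pd.read_csv(p) for p in paths}

# All evaluation datasets should expose the same columns and dtypes.
reference = frames[paths[0]].dtypes
for path, df in frames.items():
    mismatched = [c for c in reference.index
                  if c not in df.columns or df[c].dtype != reference[c]]
    if mismatched:
        print(f"{path}: schema mismatch in columns {mismatched}")
    else:
        print(f"{path}: schema consistent")
```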

Step 2: Create Benchmark

  • Navigate to Benchmark section
  • Click "New Benchmark"
  • Enter benchmark name and description

Step 3: Configure Benchmark

  • Select task type (Classification, Regression, etc.)
  • Select datasets for evaluation
  • Configure the metrics to compute (an example configuration follows this list)
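
The UI collects these choices for you; conceptually, a benchmark configuration captures the fields below. This dict is an illustration only — the field names are assumptions, not the platform's actual schema:

```python
# Illustrative only: field names are assumptions, not the platform's schema.
benchmark_config = {
    "name": "churn-classification-v1",
    "description": "Monthly churn model comparison on held-out data",
    "task_type": "classification",             # or "regression", etc.
    "datasets": ["eval_set_a", "eval_set_b"],  # must be in the READY state
    "metrics": ["accuracy", "f1", "roc_auc"],
}
```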

Step 4: Save Benchmark

  • Review all configurations
  • Save benchmark
  • Verify benchmark is created

Step 5: Execute Benchmark

  • Select benchmark to execute
  • Choose model/checkpoint to evaluate
  • Start evaluation
  • Monitor progress (see the polling sketch below)
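
If your deployment exposes a status API, progress can also be monitored programmatically. The endpoint URL, response fields, and status values below are assumptions for illustration; consult your platform's API documentation for the real ones:

```python
import time

import requests

# Hypothetical endpoint and response fields -- check your platform's API docs.
STATUS_URL = "https://your-platform.example.com/api/benchmarks/{run_id}/status"

def wait_for_completion(run_id: str, poll_seconds: int = 30) -> str:
    """Poll the (assumed) status endpoint until the run finishes."""
    while True:
        status = requests.get(STATUS_URL.format(run_id=run_id)).json()["status"]
        if status in ("COMPLETED", "FAILED"):
            return status
        time.sleep(poll_seconds)
```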

Step 6: Analyze Results

  • View evaluation results
  • Compare with other models (see the comparison sketch below)
  • Identify best performing model
  • Track performance trends
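
A simple way to compare models is to put their metrics side by side in a table. A minimal sketch with pandas; the numbers are placeholders, not real results:

```python
import pandas as pd

# Placeholder metric values purely for illustration.
results = pd.DataFrame(
    {"accuracy": [0.91, 0.88, 0.93], "f1": [0.87, 0.85, 0.90]},
    index=["model_a", "model_b", "model_c"],
)

# Identify the best-performing model per metric, and by a primary metric.
print(results.idxmax())        # best model for each metric
print(results["f1"].idxmax())  # e.g. rank on F1 as the primary metric
```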

Choosing Appropriate Metrics

Guidelines for selecting metrics (worked examples follow each list):

For Classification:

  • General Purpose: Accuracy, F1 Score
  • Imbalanced Classes: F1, ROC AUC
  • Cost-Sensitive: Precision or Recall, depending on which error type is more costly
  • Probabilistic: Log Loss, ROC AUC
  • Financial Services: KS, Gini Coefficient
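
As a concrete reference, these metrics are commonly computed with scikit-learn. The Gini coefficient and KS statistic are derived from the ROC curve (Gini = 2·AUC − 1; KS = max |TPR − FPR|). The label and score arrays are toy placeholders:

```python
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score, log_loss,
                             roc_auc_score, roc_curve)

# Toy placeholder labels and predicted probabilities.
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_prob = np.array([0.1, 0.4, 0.8, 0.7, 0.9, 0.3, 0.6, 0.2])
y_pred = (y_prob >= 0.5).astype(int)

auc = roc_auc_score(y_true, y_prob)
fpr, tpr, _ = roc_curve(y_true, y_prob)

print("accuracy:", accuracy_score(y_true, y_pred))
print("f1:      ", f1_score(y_true, y_pred))
print("log loss:", log_loss(y_true, y_prob))
print("roc auc: ", auc)
print("gini:    ", 2 * auc - 1)        # Gini = 2*AUC - 1
print("ks:      ", np.max(tpr - fpr))  # KS = max |TPR - FPR|
```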

For Regression:

  • General Purpose: RMSE, R²
  • Outlier Robust: MAE
  • Large Errors Important: MSE, RMSE
  • Variance Explanation: R²
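
The regression metrics have equally standard scikit-learn counterparts; RMSE is just the square root of MSE. The target and prediction arrays are toy placeholders:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Toy placeholder targets and predictions.
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5,  0.0, 2.0, 8.0])

mse = mean_squared_error(y_true, y_pred)
print("MSE: ", mse)
print("RMSE:", np.sqrt(mse))                         # penalizes large errors
print("MAE: ", mean_absolute_error(y_true, y_pred))  # robust to outliers
print("R2:  ", r2_score(y_true, y_pred))             # variance explained
```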

Best Practices:

  • Use multiple metrics for comprehensive evaluation
  • Choose metrics relevant to business goals
  • Consider metric trade-offs
  • Understand metric interpretations

Best Practices

Benchmark Creation:

  • Use descriptive names
  • Document benchmark purpose
  • Ensure dataset consistency
  • Select appropriate metrics

Benchmark Execution:

  • Evaluate on consistent datasets
  • Compare models fairly
  • Track evaluation history
  • Document evaluation results (see the logging sketch below)
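
One lightweight way to document results is to append each run, with enough metadata to trace it later, to a log file. The record fields here are suggestions, not a required format:

```python
import json
from datetime import datetime, timezone

def log_result(path: str, model: str, dataset: str, metrics: dict) -> None:
    """Append one evaluation record (suggested fields only) as a JSON line."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model": model,
        "dataset": dataset,
        "metrics": metrics,
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

log_result("evaluations.jsonl", "model_a", "eval_set_a", {"f1": 0.87})
```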

Result Analysis:

  • Compare multiple models
  • Consider metric trade-offs
  • Look for consistent patterns
  • Validate results

Consistency:

  • Maintain dataset consistency
  • Track dataset versions (a checksum sketch follows this list)
  • Document changes
  • Validate consistency regularly
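
A simple way to track dataset versions is to record a content checksum whenever a benchmark is created or run; if the hash changes between evaluations, the dataset changed. A minimal sketch (the file name is a placeholder):

```python
import hashlib

def dataset_checksum(path: str) -> str:
    """SHA-256 of the file contents; a changed hash means a changed dataset."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

print(dataset_checksum("eval_set_a.csv"))  # placeholder file name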

Troubleshooting Common Issues

Issue: Benchmark Creation Fails

  • Symptom: Cannot create benchmark
  • Possible Causes:
    • Invalid configuration
    • Missing required fields
    • Dataset issues
  • Solutions:
    • Review configuration
    • Verify all required fields
    • Check dataset status

Issue: Evaluation Fails

  • Symptom: Benchmark evaluation fails
  • Possible Causes:
    • Model compatibility issues
    • Dataset issues
    • Resource constraints
  • Solutions:
    • Check model compatibility
    • Verify dataset status
    • Check resource availability

Issue: Inconsistent Results

  • Symptom: Results vary between evaluations
  • Possible Causes:
    • Dataset changes
    • Model changes
    • Evaluation issues
  • Solutions:
    • Check dataset consistency
    • Verify model versions
    • Review the evaluation process (a drift-check sketch follows)
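
When results differ between runs, it helps to quantify the drift before digging into causes. This sketch flags metrics whose values differ beyond a tolerance; the result dicts and tolerance are placeholders:

```python
import math

# Placeholder results from two evaluations of the same benchmark.
run_1 = {"accuracy": 0.912, "f1": 0.874, "roc_auc": 0.951}
run_2 = {"accuracy": 0.911, "f1": 0.861, "roc_auc": 0.950}

TOLERANCE = 0.005  # placeholder threshold; tune to your use case
for metric in run_1:
    if not math.isclose(run_1[metric], run_2[metric], abs_tol=TOLERANCE):
        print(f"{metric} drifted: {run_1[metric]} -> {run_2[metric]}")
```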

Issue: Metrics Not Available

  • Symptom: Some metrics not computed
  • Possible Causes:
    • Task type mismatch
    • Metric not supported
    • Configuration issues
  • Solutions:
    • Verify task type
    • Check metric support
    • Review configuration

Next Steps