Benchmark Usage Guide
A step-by-step guide to creating and using benchmarks, with best practices and troubleshooting tips.
Complete Step-by-Step Guide
Follow these steps to create, run, and analyze a benchmark:
Step 1: Prepare Datasets
- Ensure all datasets are in the READY state
- Verify that each dataset is appropriate for the evaluation task
- Check dataset consistency (see the sketch after this list)
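These checks are easy to script before a run. A minimal sketch using pandas; the file names and the expected schema are hypothetical:

```python
import pandas as pd

# Hypothetical schema and file names -- replace with your own datasets.
EXPECTED_COLUMNS = ["feature_a", "feature_b", "label"]

def check_dataset(path: str) -> None:
    df = pd.read_csv(path)
    # Schema check: every expected column must be present.
    missing = set(EXPECTED_COLUMNS) - set(df.columns)
    if missing:
        raise ValueError(f"{path}: missing columns {sorted(missing)}")
    # Basic quality check: labels must not contain nulls.
    if df["label"].isna().any():
        raise ValueError(f"{path}: null values in 'label'")
    print(f"{path}: OK ({len(df)} rows)")

for path in ["train.csv", "validation.csv", "test.csv"]:
    check_dataset(path)
```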
Step 2: Create Benchmark
- Navigate to Benchmark section
- Click "New Benchmark"
- Enter benchmark name and description
Step 3: Configure Benchmark
- Select task type (Classification, Regression, etc.)
- Select datasets for evaluation
- Configure the metrics to compute (a configuration sketch follows this list)
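What a benchmark configuration contains varies by platform, but it generally boils down to a task type, the datasets to evaluate on, and the metrics to compute. The dictionary below is purely illustrative; every field name and value is hypothetical:

```python
# Hypothetical benchmark definition -- field names are illustrative only.
benchmark_config = {
    "name": "churn-model-benchmark",
    "description": "Compares churn classifiers on the Q3 holdout sets.",
    "task_type": "classification",
    "datasets": ["holdout_q3_v1", "holdout_q3_v2"],
    "metrics": ["accuracy", "f1", "roc_auc"],
}
```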
Step 4: Save Benchmark
- Review all configurations
- Save benchmark
- Verify benchmark is created
Step 5: Execute Benchmark
- Select benchmark to execute
- Choose model/checkpoint to evaluate
- Start evaluation
- Monitor progress
Step 6: Analyze Results
- View evaluation results
- Compare with other models (see the comparison sketch after this list)
- Identify best performing model
- Track performance trends
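Side-by-side comparison is where a benchmark pays off. The sketch below ranks three checkpoints by F1 using pandas; the model names and scores are made up for illustration:

```python
import pandas as pd

# Hypothetical results for three checkpoints on the same benchmark.
results = pd.DataFrame([
    {"model": "baseline-v1", "accuracy": 0.87, "f1": 0.81, "roc_auc": 0.90},
    {"model": "tuned-v2",    "accuracy": 0.89, "f1": 0.84, "roc_auc": 0.92},
    {"model": "tuned-v3",    "accuracy": 0.88, "f1": 0.85, "roc_auc": 0.91},
])

# Rank by the metric that matches the business goal (here: F1).
print(results.sort_values("f1", ascending=False))
best = results.loc[results["f1"].idxmax(), "model"]
print(f"Best model by F1: {best}")
```

Note that in this made-up table the best model by F1 is not the best by accuracy or ROC AUC, which is exactly why comparing on several metrics matters.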
Choosing Appropriate Metrics
Guidelines for selecting metrics:
For Classification:
- General Purpose: Accuracy, F1 Score
- Imbalanced Classes: F1, ROC AUC
- Cost-Sensitive: Precision or Recall, weighted by misclassification costs
- Probabilistic: Log Loss, ROC AUC
- Financial Services: KS Statistic, Gini Coefficient (see the sketch after this list)
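All of these classification metrics are available in scikit-learn; KS and Gini, common in credit scoring, follow directly from the ROC curve and the AUC. A minimal sketch on toy data (the labels and probabilities are made up for illustration):

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score, f1_score, log_loss, roc_auc_score, roc_curve
)

# Toy ground truth and predicted probabilities -- replace with real outputs.
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_prob = np.array([0.1, 0.4, 0.8, 0.7, 0.9, 0.3, 0.6, 0.2])
y_pred = (y_prob >= 0.5).astype(int)

print("Accuracy:", accuracy_score(y_true, y_pred))
print("F1:      ", f1_score(y_true, y_pred))
auc = roc_auc_score(y_true, y_prob)
print("ROC AUC: ", auc)
print("Log loss:", log_loss(y_true, y_prob))

# KS statistic: the maximum gap between the TPR and FPR curves.
fpr, tpr, _ = roc_curve(y_true, y_prob)
print("KS:      ", np.max(tpr - fpr))

# Gini coefficient is a linear rescaling of ROC AUC.
print("Gini:    ", 2 * auc - 1)
```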
For Regression:
- General Purpose: RMSE, R²
- Robust to Outliers: MAE
- Penalizing Large Errors: MSE, RMSE
- Variance Explained: R² (see the sketch after this list)
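The regression metrics have equally direct scikit-learn counterparts. A minimal sketch on toy values (made up for illustration); RMSE is the square root of MSE, so it stays in the target's units:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Toy targets and predictions -- replace with real benchmark outputs.
y_true = np.array([3.0, 5.0, 2.5, 7.0, 4.2])
y_pred = np.array([2.8, 5.4, 2.9, 6.1, 4.0])

mse = mean_squared_error(y_true, y_pred)
print("MSE: ", mse)                                  # penalizes large errors
print("RMSE:", np.sqrt(mse))                         # same units as the target
print("MAE: ", mean_absolute_error(y_true, y_pred))  # robust to outliers
print("R2:  ", r2_score(y_true, y_pred))             # variance explained
```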
Best Practices:
- Use multiple metrics for comprehensive evaluation
- Choose metrics relevant to business goals
- Consider metric trade-offs
- Understand metric interpretations
Best Practices
Benchmark Creation:
- Use descriptive names
- Document benchmark purpose
- Ensure dataset consistency
- Select appropriate metrics
Benchmark Execution:
- Evaluate all models on the same datasets
- Compare models under identical conditions
- Track evaluation history (a logging sketch follows this list)
- Document evaluation results
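Tracking evaluation history need not be elaborate: appending each run to a JSON-lines file is often enough to reconstruct trends later. A minimal sketch; the file name, benchmark name, and scores are hypothetical:

```python
import json
from datetime import datetime, timezone
from pathlib import Path

HISTORY_FILE = Path("benchmark_history.jsonl")  # hypothetical log location

def log_evaluation(benchmark: str, model: str, metrics: dict) -> None:
    """Append one evaluation record as a JSON line."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "benchmark": benchmark,
        "model": model,
        "metrics": metrics,
    }
    with HISTORY_FILE.open("a") as f:
        f.write(json.dumps(record) + "\n")

log_evaluation("churn-model-benchmark", "tuned-v2", {"f1": 0.84, "roc_auc": 0.92})
```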
Result Analysis:
- Compare multiple models
- Consider metric trade-offs
- Look for consistent patterns
- Validate results
Consistency:
- Maintain dataset consistency
- Track dataset versions (a fingerprinting sketch follows this list)
- Document changes
- Validate consistency regularly
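A lightweight way to track dataset versions is to fingerprint the raw files and store the hash alongside each evaluation: if the fingerprint changes, the data changed. A minimal sketch (the file name is hypothetical):

```python
import hashlib
from pathlib import Path

def dataset_fingerprint(path: str) -> str:
    """SHA-256 of the raw file bytes; changes whenever the data changes."""
    h = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
            h.update(chunk)
    return h.hexdigest()

# Record this with the benchmark results and compare across runs.
print(dataset_fingerprint("test.csv"))
```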
Common Troubleshooting
Issue: Benchmark Creation Fails
- Symptom: The benchmark cannot be created
- Possible Causes:
  - Invalid configuration
  - Missing required fields
  - Dataset issues
- Solutions:
  - Review the configuration
  - Verify that all required fields are filled in
  - Check the dataset status
Issue: Evaluation Fails
- Symptom: The benchmark evaluation fails to complete
- Possible Causes:
  - Model compatibility issues
  - Dataset issues
  - Resource constraints
- Solutions:
  - Check model compatibility (see the sketch below)
  - Verify the dataset status
  - Check resource availability
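A quick compatibility check before launching an evaluation can catch schema drift early. The sketch below assumes a scikit-learn estimator fitted on a DataFrame, which records its training columns in `feature_names_in_`; the file paths are hypothetical:

```python
import joblib
import pandas as pd

# Hypothetical artifacts -- replace with your model and dataset paths.
model = joblib.load("model.joblib")
data = pd.read_csv("test.csv")

expected = list(getattr(model, "feature_names_in_", []))
missing = [col for col in expected if col not in data.columns]
if missing:
    raise ValueError(f"Dataset is missing model features: {missing}")
print("Model and dataset are compatible.")
```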
Issue: Inconsistent Results
- Symptom: Results vary between evaluations of the same model
- Possible Causes:
  - Dataset changes
  - Model changes
  - Nondeterminism in the evaluation process
- Solutions:
  - Check dataset consistency
  - Verify model versions
  - Review the evaluation process and pin random seeds (see the sketch below)
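One common source of run-to-run variation is unseeded randomness in data splits or model initialization. A minimal sketch that pins the standard Python and NumPy seeds (framework-specific seeds, e.g. for PyTorch or TensorFlow, would be set analogously):

```python
import random

import numpy as np

SEED = 42  # any fixed value; record it with the evaluation results

random.seed(SEED)     # Python's built-in RNG
np.random.seed(SEED)  # NumPy RNG, used by many ML libraries

# Pass the same seed to anything that accepts one, e.g.
# sklearn's train_test_split(..., random_state=SEED).
```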
Issue: Metrics Not Available
- Symptom: Some metrics are not computed
- Possible Causes:
  - Task type mismatch (e.g., a regression metric requested for a classification task)
  - Metric not supported
  - Configuration issues
- Solutions:
  - Verify the task type
  - Check which metrics the selected task type supports
  - Review the configuration
Next Steps
- Learn about the Leaderboard to compare models
- Explore Training to train models
- Check Datasets to prepare your data