Leaderboard Usage Guide
Complete step-by-step guide to using the leaderboard, interpreting rankings, and selecting the best models.
Complete Step-by-Step Guide
Follow these steps to evaluate checkpoints, compare results, and select a model for deployment:
Step 1: Access Leaderboard
- Navigate to Leaderboard section
- View overview statistics
- Review current evaluations
Step 2: Choose View
- Select Table View for a comprehensive side-by-side overview
- Select List View for detailed exploration of individual evaluations
- Toggle between views as needed
Step 3: Apply Filters
- Filter by benchmarks if needed
- Filter by trainings if needed
- Filter by date range if needed
- Search by name if needed
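If you export leaderboard data (for example as JSON) and want to reproduce Step 3 offline, the minimal sketch below applies the same filters in Python. The record fields and the `filter_evaluations` helper are hypothetical placeholders, not part of the product:

```python
from datetime import date

# Hypothetical evaluation records; field names are illustrative only.
evaluations = [
    {"name": "ckpt-1200", "benchmark": "qa-v1", "date": date(2024, 5, 2), "accuracy": 0.81},
    {"name": "ckpt-1800", "benchmark": "qa-v1", "date": date(2024, 5, 9), "accuracy": 0.84},
    {"name": "ckpt-1800", "benchmark": "summarization", "date": date(2024, 5, 9), "accuracy": 0.77},
]

def filter_evaluations(records, benchmark=None, start=None, end=None, query=None):
    """Keep records matching the selected benchmark, date range, and name search."""
    kept = []
    for rec in records:
        if benchmark and rec["benchmark"] != benchmark:
            continue
        if start and rec["date"] < start:
            continue
        if end and rec["date"] > end:
            continue
        if query and query.lower() not in rec["name"].lower():
            continue
        kept.append(rec)
    return kept

qa_only = filter_evaluations(evaluations, benchmark="qa-v1", start=date(2024, 5, 1))
print(len(qa_only))  # 2
```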
Step 4: Sort Results
- Select metric to sort by
- Choose sort direction
- Review sorted results
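Step 4 boils down to an ordinary sort by the chosen metric, with the direction flipped for metrics where lower is better. A minimal sketch, assuming records like those in the previous example:

```python
# Illustrative records; field names are placeholders.
records = [
    {"name": "ckpt-1200", "accuracy": 0.81},
    {"name": "ckpt-1800", "accuracy": 0.84},
    {"name": "ckpt-0600", "accuracy": 0.73},
]

def sort_by_metric(records, metric, higher_is_better=True):
    """Sort direction matters: accuracy-style metrics descend, loss-style metrics ascend."""
    return sorted(records, key=lambda rec: rec[metric], reverse=higher_is_better)

for position, rec in enumerate(sort_by_metric(records, "accuracy"), start=1):
    print(position, rec["name"], rec["accuracy"])
```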
Step 5: Create Evaluations
- Click "New Evaluation"
- Select checkpoint to evaluate
- Select benchmark(s)
- Run evaluation
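If your deployment exposes an HTTP API, Step 5 can also be scripted. This is only a sketch: the base URL, the `/evaluations` endpoint, and the payload fields are assumptions for illustration, not a documented API:

```python
import requests

# Hypothetical endpoint and payload; adapt to your deployment's actual API.
BASE_URL = "http://localhost:8000/api"

payload = {
    "checkpoint_id": "ckpt-1800",                  # checkpoint to evaluate
    "benchmark_ids": ["qa-v1", "summarization"],   # one or more benchmarks
}

response = requests.post(f"{BASE_URL}/evaluations", json=payload, timeout=30)
response.raise_for_status()
print(response.json())  # e.g. the ID and status of the new evaluation run
```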
Step 6: Compare Models
- Select the models to compare
- Review their metrics side by side
- Analyze the differences between them
- Shortlist the strongest candidates
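A quick way to review comparison metrics outside the UI is to line up two models' metrics and their deltas, as in this sketch (metric names and values are illustrative placeholders):

```python
# Illustrative metrics for two candidate models.
model_a = {"accuracy": 0.84, "f1": 0.81, "latency_ms": 120.0}
model_b = {"accuracy": 0.82, "f1": 0.83, "latency_ms": 95.0}

# Print each shared metric with its difference (A minus B).
for metric in sorted(set(model_a) & set(model_b)):
    delta = model_a[metric] - model_b[metric]
    print(f"{metric:12s}  A={model_a[metric]:7.2f}  B={model_b[metric]:7.2f}  A-B={delta:+.2f}")
```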
Step 7: Select Best Model
- Review rankings
- Compare metrics
- Consider business requirements
- Select model for deployment
Interpreting Rankings
Understanding leaderboard rankings:
Ranking Factors:
- Metric Value: Based on selected metric
- Sort Direction: Ascending or descending
- Benchmark: Rankings per benchmark
- Aggregate: Overall rankings across benchmarks
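As a rough illustration of how per-benchmark and aggregate rankings relate, the sketch below ranks models within each benchmark and then averages those ranks. The mean-rank aggregation is an assumption for illustration; your leaderboard may combine benchmarks differently:

```python
from collections import defaultdict

# Illustrative results: (model, benchmark, metric value); higher is better here.
results = [
    ("ckpt-1200", "qa-v1", 0.81), ("ckpt-1800", "qa-v1", 0.84),
    ("ckpt-1200", "summarization", 0.79), ("ckpt-1800", "summarization", 0.77),
]

# Group results by benchmark.
per_benchmark = defaultdict(list)
for model, benchmark, value in results:
    per_benchmark[benchmark].append((model, value))

# Rank models within each benchmark (rank 1 = best).
ranks = defaultdict(list)
for benchmark, entries in per_benchmark.items():
    ordered = sorted(entries, key=lambda e: e[1], reverse=True)
    for rank, (model, _) in enumerate(ordered, start=1):
        ranks[model].append(rank)

# One possible aggregate: mean rank across benchmarks (lower is better).
for model, model_ranks in ranks.items():
    print(model, "mean rank:", sum(model_ranks) / len(model_ranks))
```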
Ranking Interpretation:
- Higher Rank: A position nearer the top means better performance for most metrics; for loss- or error-style metrics, lower values are better, so check the sort direction
- Metric Context: Interpret the rank in light of what the metric actually measures
- Benchmark Context: Consider the characteristics of the benchmark behind the ranking
- Trends: Look at performance trends over time, not just a single snapshot
Ranking Considerations:
- Single Metric: A ranking reflects only the currently selected metric
- Multiple Metrics: Consider all relevant metrics, not just the ranking position
- Business Goals: Align the chosen metric with business objectives
- Consistency: Check that a model ranks well consistently across benchmarks
Choosing the Best Model
Guidelines for selecting the best model:
Selection Criteria:
- Performance Metrics: High performance on key metrics
- Consistency: Consistent performance across benchmarks
- Business Alignment: Aligns with business goals
- Stability: Stable and reliable performance
Selection Process:
- Review Rankings: Start from the current leaderboard rankings
- Check Metrics: Inspect all relevant metrics, not just the sort metric
- Compare Models: Compare the top candidates side by side
- Consider Context: Factor in business context and constraints
- Validate: Confirm performance on held-out test data
- Select: Choose the model that best meets your requirements (see the sketch below)
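One way to make the trade-offs in this process explicit is a simple weighted score over metrics combined with hard requirements, as in the sketch below. The metrics, weights, and latency limit are placeholders, not recommendations:

```python
# Minimal sketch of multi-metric model selection; all values are illustrative assumptions.
candidates = {
    "ckpt-1200": {"accuracy": 0.81, "f1": 0.80, "latency_ms": 90.0},
    "ckpt-1800": {"accuracy": 0.84, "f1": 0.83, "latency_ms": 140.0},
}
weights = {"accuracy": 0.6, "f1": 0.4}       # primary vs. secondary metric
requirements = {"latency_ms": 120.0}         # hard business requirement (upper bound)

def score(metrics):
    """Weighted sum over the metrics we care about."""
    return sum(weights[m] * metrics[m] for m in weights)

# Drop candidates that violate a hard requirement, then pick the best weighted score.
eligible = {name: m for name, m in candidates.items()
            if all(m[key] <= limit for key, limit in requirements.items())}
best = max(eligible, key=lambda name: score(eligible[name]))
print("selected:", best)  # ckpt-1200: the top-ranked model by accuracy fails the latency limit
```

Note how the model with the best accuracy is excluded by the latency requirement, which is exactly why rankings alone should not drive the decision.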
Selection Factors:
- Primary Metric: Primary metric for your use case
- Secondary Metrics: Consider secondary metrics
- Trade-offs: Understand metric trade-offs
- Requirements: Meet business requirements
Best Practices:
- Don't rely solely on rankings
- Consider multiple metrics
- Validate on test data
- Consider deployment requirements
Troubleshooting Common Issues
Issue: No Evaluations Shown
- Symptom: Leaderboard is empty
- Possible Causes:
- No evaluations run yet
- Filters too restrictive
- No checkpoints evaluated
- Solutions:
- Run evaluations
- Check filters
- Verify checkpoints exist
Issue: Missing Metrics
- Symptom: Some metrics not displayed
- Possible Causes:
- Metrics not configured in benchmark
- Evaluation not completed
- Metric calculation error
- Solutions:
- Check benchmark configuration
- Verify evaluation completed
- Review evaluation logs
Issue: Rankings Don't Make Sense
- Symptom: Rankings seem incorrect
- Possible Causes:
- Wrong sort direction
- Metric interpretation issue
- Data inconsistency
- Solutions:
- Check sort direction
- Review metric definitions
- Verify data consistency
Issue: Can't Create Evaluation
- Symptom: Cannot create new evaluation
- Possible Causes:
- No checkpoints available
- No benchmarks available
- Resource constraints
- Solutions:
- Verify checkpoints exist
- Verify benchmarks exist
- Check resource availability
Best Practices
Leaderboard Best Practices:
- Regular Evaluation: Regularly evaluate checkpoints
- Consistent Benchmarks: Use consistent benchmarks
- Multiple Metrics: Consider multiple metrics
- Document Decisions: Document model selection decisions
Evaluation Best Practices:
- Evaluate All Checkpoints: Evaluate all relevant checkpoints
- Use Multiple Benchmarks: Evaluate against multiple benchmarks
- Track Trends: Track performance trends over time
- Validate Results: Validate evaluation results
Comparison Best Practices:
- Compare Fairly: Compare models evaluated on the same benchmarks and data
- Consider Context: Consider business context
- Multiple Views: Use multiple views for comparison
- Document Findings: Document comparison findings
Next Steps
- Learn about Training to train models
- Explore Benchmark to create benchmarks
- Check Inference Server to deploy models