
Leaderboard Usage Guide

Complete step-by-step guide to using the leaderboard, interpreting rankings, and selecting the best models.

Complete Step-by-Step Guide

Follow these steps to work through the leaderboard, from first view to final model selection:

Step 1: Access Leaderboard

  • Navigate to Leaderboard section
  • View overview statistics
  • Review current evaluations

Step 2: Choose View

  • Select Table View for a comprehensive overview
  • Select List View for detailed exploration of individual evaluations
  • Toggle between views as needed

Step 3: Apply Filters

  • Filter by benchmarks if needed
  • Filter by trainings if needed
  • Filter by date range if needed
  • Search by name if needed
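
These filters are applied in the UI, but if you export evaluation results as records you can reproduce the same filtering in a short script. Below is a minimal sketch; the record fields (name, benchmark, training, date) are assumptions for illustration, not the platform's actual export schema.

```python
from datetime import date

# Hypothetical exported evaluation records; field names are assumptions.
evaluations = [
    {"name": "ckpt-120", "benchmark": "summarization", "training": "run-a",
     "date": date(2024, 5, 2), "accuracy": 0.81},
    {"name": "ckpt-240", "benchmark": "qa", "training": "run-b",
     "date": date(2024, 6, 10), "accuracy": 0.77},
]

def apply_filters(records, benchmark=None, training=None,
                  start=None, end=None, name_query=None):
    """Keep only records that match every provided filter."""
    result = []
    for r in records:
        if benchmark and r["benchmark"] != benchmark:
            continue
        if training and r["training"] != training:
            continue
        if start and r["date"] < start:
            continue
        if end and r["date"] > end:
            continue
        if name_query and name_query.lower() not in r["name"].lower():
            continue
        result.append(r)
    return result

print(apply_filters(evaluations, benchmark="qa"))
```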

Step 4: Sort Results

  • Select metric to sort by
  • Choose sort direction
  • Review sorted results
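
Sorting works the same way on exported data. The sketch below assumes one higher-is-better metric and one lower-is-better metric (accuracy and latency_ms, both hypothetical names) to show why the sort direction matters.

```python
# Hypothetical records; "accuracy" is higher-is-better, "latency_ms" is lower-is-better.
evaluations = [
    {"name": "ckpt-120", "accuracy": 0.81, "latency_ms": 95},
    {"name": "ckpt-240", "accuracy": 0.77, "latency_ms": 60},
]

def sort_results(records, metric, descending=True):
    """Sort records by one metric; pick the direction that matches the metric."""
    return sorted(records, key=lambda r: r[metric], reverse=descending)

# Descending for accuracy (higher is better), ascending for latency (lower is better).
print(sort_results(evaluations, "accuracy", descending=True))
print(sort_results(evaluations, "latency_ms", descending=False))
```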

Step 5: Create Evaluations

  • Click "New Evaluation"
  • Select checkpoint to evaluate
  • Select benchmark(s)
  • Run evaluation

Step 6: Compare Models

  • Select models to compare
  • Review comparison metrics
  • Analyze differences
  • Shortlist the strongest candidates
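
If you prefer to analyze differences outside the UI, a per-metric diff makes the comparison explicit. The model names and metrics below are assumptions for illustration only.

```python
# Hypothetical metric tables for two selected models.
model_a = {"accuracy": 0.81, "f1": 0.78, "latency_ms": 95}
model_b = {"accuracy": 0.79, "f1": 0.80, "latency_ms": 60}

def compare(a, b):
    """Return per-metric differences (a minus b) for metrics present in both models."""
    return {m: round(a[m] - b[m], 4) for m in a if m in b}

print(compare(model_a, model_b))
# Positive values favor model_a on higher-is-better metrics; interpret
# lower-is-better metrics (e.g. latency_ms) the other way around.
```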

Step 7: Select Best Model

  • Review rankings
  • Compare metrics
  • Consider business requirements
  • Select model for deployment

Interpreting Rankings

Understanding leaderboard rankings:

Ranking Factors:

  • Metric Value: Based on selected metric
  • Sort Direction: Ascending or descending
  • Benchmark: Rankings per benchmark
  • Aggregate: Overall rankings across benchmarks
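
A common way to aggregate is mean rank across benchmarks; whether the leaderboard uses mean rank or a different scheme depends on its configuration, so treat the sketch below as an illustration with hypothetical scores.

```python
# Hypothetical per-benchmark scores (higher is better); names are assumptions.
scores = {
    "summarization": {"ckpt-120": 0.81, "ckpt-240": 0.77, "ckpt-360": 0.79},
    "qa":            {"ckpt-120": 0.70, "ckpt-240": 0.74, "ckpt-360": 0.72},
}

def per_benchmark_ranks(scores):
    """Rank 1 = best score within each benchmark."""
    ranks = {}
    for bench, by_model in scores.items():
        ordered = sorted(by_model, key=by_model.get, reverse=True)
        ranks[bench] = {model: i + 1 for i, model in enumerate(ordered)}
    return ranks

def aggregate_rank(ranks):
    """Average a model's rank over all benchmarks."""
    models = {m for bench in ranks.values() for m in bench}
    return {m: sum(bench[m] for bench in ranks.values()) / len(ranks)
            for m in models}

ranks = per_benchmark_ranks(scores)
print(ranks)
print(sorted(aggregate_rank(ranks).items(), key=lambda kv: kv[1]))  # best first
```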

Ranking Interpretation:

  • Higher Rank: A position closer to the top (rank 1) indicates better performance for most metrics
  • Metric Context: Consider metric meaning
  • Benchmark Context: Consider benchmark characteristics
  • Trends: Consider performance trends

Ranking Considerations:

  • Single Metric: Rankings based on one metric
  • Multiple Metrics: Consider all metrics, not just ranking
  • Business Goals: Align with business objectives
  • Consistency: Check consistency across benchmarks
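
A quick way to check consistency is to look at how much a model's rank varies between benchmarks. The sketch below uses the standard deviation of ranks; the benchmark names and rank values are assumptions.

```python
from statistics import pstdev

# Hypothetical per-benchmark ranks for one model (1 = best).
ranks_by_benchmark = {"summarization": 1, "qa": 4, "classification": 2}

# A small spread suggests consistent behaviour across benchmarks; a large
# spread means the model is strong on some tasks and weak on others.
spread = pstdev(ranks_by_benchmark.values())
print(f"rank spread: {spread:.2f}")
```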

Choosing the Best Model

Guidelines for selecting the best model:

Selection Criteria:

  • Performance Metrics: High performance on key metrics
  • Consistency: Consistent performance across benchmarks
  • Business Alignment: Aligns with business goals
  • Stability: Stable and reliable performance

Selection Process:

  1. Review Rankings: Start from the current leaderboard rankings
  2. Check Metrics: Check all relevant metrics, not just the ranking metric
  3. Compare Models: Compare the top candidates side by side
  4. Consider Context: Weigh business context and constraints
  5. Validate: Validate the shortlisted model's performance
  6. Select: Select the best model for deployment (see the sketch after this list)
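
When several metrics matter, a simple weighted score can make the trade-offs explicit before you commit to a model. The metric names and weights below are assumptions and should be replaced to reflect your own requirements.

```python
# Hypothetical candidates and weights; a negative weight marks a lower-is-better metric.
candidates = {
    "ckpt-120": {"accuracy": 0.81, "f1": 0.78, "latency_ms": 95},
    "ckpt-240": {"accuracy": 0.79, "f1": 0.80, "latency_ms": 60},
}
weights = {"accuracy": 0.5, "f1": 0.4, "latency_ms": -0.001}

def weighted_score(metrics, weights):
    """Combine several metrics into one score; the sign of each weight encodes direction."""
    return sum(weights[m] * metrics[m] for m in weights)

ranked = sorted(candidates, key=lambda c: weighted_score(candidates[c], weights),
                reverse=True)
print(ranked)  # best candidate first under these weights
```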

Selection Factors:

  • Primary Metric: Identify the metric that matters most for your use case
  • Secondary Metrics: Use secondary metrics as tie-breakers
  • Trade-offs: Understand trade-offs between metrics (e.g. quality vs. latency)
  • Requirements: Confirm the model meets business requirements

Best Practices:

  • Don't rely solely on rankings
  • Consider multiple metrics
  • Validate on test data
  • Consider deployment requirements

Common Troubleshooting

Issue: No Evaluations Shown

  • Symptom: Leaderboard is empty
  • Possible Causes:
    • No evaluations run yet
    • Filters too restrictive
    • No checkpoints evaluated
  • Solutions:
    • Run evaluations
    • Check filters
    • Verify checkpoints exist

Issue: Missing Metrics

  • Symptom: Some metrics not displayed
  • Possible Causes:
    • Metrics not configured in benchmark
    • Evaluation not completed
    • Metric calculation error
  • Solutions:
    • Check benchmark configuration
    • Verify evaluation completed
    • Review evaluation logs

Issue: Rankings Don't Make Sense

  • Symptom: Rankings seem incorrect
  • Possible Causes:
    • Wrong sort direction
    • Metric interpretation issue
    • Data inconsistency
  • Solutions:
    • Check sort direction
    • Review metric definitions
    • Verify data consistency

Issue: Can't Create Evaluation

  • Symptom: Cannot create new evaluation
  • Possible Causes:
    • No checkpoints available
    • No benchmarks available
    • Resource constraints
  • Solutions:
    • Verify checkpoints exist
    • Verify benchmarks exist
    • Check resource availability

Best Practices

Leaderboard Best Practices:

  • Regular Evaluation: Evaluate new checkpoints as they are produced
  • Consistent Benchmarks: Use the same benchmarks across evaluations so results stay comparable
  • Multiple Metrics: Consider multiple metrics
  • Document Decisions: Document model selection decisions

Evaluation Best Practices:

  • Evaluate All Checkpoints: Evaluate all relevant checkpoints
  • Use Multiple Benchmarks: Evaluate against multiple benchmarks
  • Track Trends: Track performance trends over time
  • Validate Results: Validate evaluation results
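
To track trends over time, compare each evaluation of a training run with the previous one and watch for plateaus or regressions before choosing a checkpoint. The history records below are hypothetical.

```python
# Hypothetical evaluation history for one training run; fields are assumptions.
history = [
    {"checkpoint": "step-1000", "accuracy": 0.71},
    {"checkpoint": "step-2000", "accuracy": 0.76},
    {"checkpoint": "step-3000", "accuracy": 0.75},
]

# Report the change between consecutive evaluations to spot plateaus or regressions.
for prev, cur in zip(history, history[1:]):
    delta = cur["accuracy"] - prev["accuracy"]
    print(f'{prev["checkpoint"]} -> {cur["checkpoint"]}: {delta:+.3f}')
```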

Comparison Best Practices:

  • Compare Fairly: Compare models on the same benchmarks under the same settings
  • Consider Context: Consider business context
  • Multiple Views: Use multiple views for comparison
  • Document Findings: Document comparison findings

Next Steps