Leaderboard Core Concepts
Understanding how the leaderboard works, how checkpoints are evaluated, and how rankings are determined.
How the Leaderboard Works
The leaderboard aggregates evaluation results from benchmarks and turns them into model rankings:
Leaderboard Process:
- Training: Models are trained and generate checkpoints
- Evaluation: Checkpoints are evaluated against benchmarks
- Aggregation: Evaluation results are aggregated in the leaderboard
- Ranking: Models are ranked by performance metrics
- Comparison: Compare models across different benchmarks
Key Components (modeled in the sketch after this list):
- Checkpoints: Model checkpoints from training runs
- Evaluations: Benchmark evaluation results
- Metrics: Performance metrics from evaluations
- Rankings: Model rankings based on metrics
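To make the relationships between these components concrete, here is a minimal sketch that models them as simple Python data structures. The class and field names (Checkpoint, Evaluation, Leaderboard, training_run) are illustrative assumptions, not the actual leaderboard schema.

```python
from dataclasses import dataclass, field

# Hypothetical data model; names and fields are illustrative assumptions.
@dataclass(frozen=True)
class Checkpoint:
    training_run: str   # training run that produced this checkpoint
    step: int           # training step at which the model state was saved

@dataclass
class Evaluation:
    checkpoint: Checkpoint
    benchmark: str      # benchmark the checkpoint was evaluated on
    metrics: dict[str, float] = field(default_factory=dict)  # metric name -> value

@dataclass
class Leaderboard:
    evaluations: list[Evaluation] = field(default_factory=list)

    def rank(self, metric: str, descending: bool = True) -> list[Evaluation]:
        """Rank all evaluations that report the given metric."""
        scored = [e for e in self.evaluations if metric in e.metrics]
        return sorted(scored, key=lambda e: e.metrics[metric], reverse=descending)
```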
Checkpoints and Evaluations
Checkpoints and evaluations are the two building blocks the leaderboard tracks:
Checkpoints:
- Definition: Saved model states during training
- Creation: Automatically created during training
- Evaluation: Checkpoints are evaluated against benchmarks
- Selection: Best checkpoints can be selected for deployment
Evaluations:
- Definition: Benchmark evaluation results for checkpoints
- Execution: Evaluations run checkpoints against benchmarks
- Results: Evaluation results include metrics for each benchmark
- Comparison: Results enable model comparison (see the example after this list)
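Continuing the hypothetical data model above, a single checkpoint can carry one evaluation per benchmark; all values here are made up purely for illustration.

```python
# One checkpoint, evaluated against two benchmarks (values are made up).
ckpt = Checkpoint(training_run="run-42", step=10_000)

board = Leaderboard(evaluations=[
    Evaluation(ckpt, benchmark="benchmark-a", metrics={"accuracy": 0.81}),
    Evaluation(ckpt, benchmark="benchmark-b", metrics={"accuracy": 0.76}),
])
```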
Evaluation Process (sketched in code below):
1. Select the checkpoint to evaluate
2. Select the benchmark(s) to evaluate against
3. Run the evaluation
4. Results appear in the leaderboard
5. Compare with other evaluations
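A rough sketch of that flow, still using the hypothetical data model; the score function is a stand-in for whatever actually executes a benchmark, not a real API.

```python
import random

def score(checkpoint: Checkpoint, benchmark: str) -> dict[str, float]:
    """Stand-in for real benchmark execution; returns placeholder metrics."""
    return {"accuracy": random.random()}

def run_evaluation(checkpoint: Checkpoint, benchmarks: list[str],
                   board: Leaderboard) -> list[Evaluation]:
    """Evaluate one checkpoint against each benchmark and record the results."""
    results = []
    for benchmark in benchmarks:
        evaluation = Evaluation(checkpoint, benchmark, score(checkpoint, benchmark))
        board.evaluations.append(evaluation)   # results appear in the leaderboard
        results.append(evaluation)
    return results

# Evaluate the checkpoint from the earlier example against another benchmark.
run_evaluation(ckpt, ["benchmark-c"], board)
```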
Metrics and Rankings
How metrics and rankings work:
Metrics:
- Per Benchmark: Each benchmark provides specific metrics
- Aggregated: Metrics can be aggregated across benchmarks
- Comparable: Metrics enable fair model comparison
- Tracked: Metrics are tracked over time
Rankings:
- Metric-Based: Rankings based on selected metrics
- Direction: Ascending when lower values are better, descending when higher values are better
- Best Checkpoint: Automatically identify the best checkpoint per training run
- Comparison: Compare rankings across different metrics (see the snippet after this list)
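For example, with the hypothetical rank() helper from the earlier sketch, the ranking direction simply follows the metric's meaning:

```python
# Higher accuracy is better, so rank descending; for a loss-like metric
# such as perplexity, lower is better, so rank ascending.
top_by_accuracy = board.rank("accuracy", descending=True)
top_by_perplexity = board.rank("perplexity", descending=False)
```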
Ranking Methods:
- Single Metric: Rank by a single metric
- Average: Rank by the average across multiple metrics (see the aggregation sketch after this list)
- Best Checkpoint Only: Show only best checkpoint per training
- Custom: Custom ranking configurations
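As one illustration of aggregation under the same hypothetical data model, the helper below ranks checkpoints by the average of a single metric across all benchmarks they were evaluated on.

```python
from collections import defaultdict
from statistics import mean

def rank_by_average(board: Leaderboard, metric: str,
                    descending: bool = True) -> list[tuple[Checkpoint, float]]:
    """Rank checkpoints by the average of one metric across their benchmarks."""
    per_checkpoint: dict[Checkpoint, list[float]] = defaultdict(list)
    for e in board.evaluations:
        if metric in e.metrics:
            per_checkpoint[e.checkpoint].append(e.metrics[metric])
    averaged = {ckpt: mean(vals) for ckpt, vals in per_checkpoint.items()}
    return sorted(averaged.items(), key=lambda item: item[1], reverse=descending)
```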
Best Checkpoint Selection
The leaderboard can automatically identify the best-performing checkpoint from each training run:
Selection Criteria:
- Metric Value: Based on selected ranking metric
- Direction: Ascending (lower is better) or descending (higher is better)
- Per Training: Best checkpoint selected per training run
- Automatic: Selection happens automatically based on the ranking configuration
Selection Process (sketched in code below):
1. Select the ranking metric
2. Determine the direction (ascending or descending)
3. Group checkpoints by training run
4. Select the best checkpoint per training run
5. Display it in the leaderboard
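A minimal sketch of that selection logic under the same hypothetical data model: group evaluations by training run and keep the checkpoint with the best value of the ranking metric.

```python
def best_checkpoint_per_training(board: Leaderboard, metric: str,
                                 descending: bool = True) -> dict[str, Checkpoint]:
    """Keep the best checkpoint for each training run by a single metric."""
    best: dict[str, tuple[Checkpoint, float]] = {}
    for e in board.evaluations:
        if metric not in e.metrics:
            continue
        run, value = e.checkpoint.training_run, e.metrics[metric]
        current = best.get(run)
        is_better = current is None or (
            value > current[1] if descending else value < current[1]
        )
        if is_better:
            best[run] = (e.checkpoint, value)
    return {run: ckpt for run, (ckpt, _) in best.items()}
```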
Use Cases:
- Model Comparison: Compare best models from each training
- Deployment Selection: Select best checkpoint for deployment
- Performance Tracking: Track best performance over time
- Experiment Analysis: Analyze best results from experiments
Next Steps
- Learn about Visualizations to see different views
- Explore Operations to use the leaderboard