
Leaderboard Usage Guide

Complete step-by-step guide to using the leaderboard, interpreting rankings, and selecting the best models.

Complete Step-by-Step Guide

Follow these steps to work through the leaderboard, from first view to final model selection:

Step 1: Access Leaderboard

  • Navigate to Leaderboard section
  • View overview statistics
  • Review current evaluations

Step 2: Choose View

  • Select Table View for a comprehensive overview
  • Select List View for detailed exploration of individual evaluations
  • Toggle between views as needed

Step 3: Apply Filters

  • Filter by benchmarks if needed
  • Filter by trainings if needed
  • Filter by date range if needed
  • Search by name if needed
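
These filters are applied in the UI, but if you export evaluation results as records you can reproduce the same filtering in a short script. Below is a minimal sketch; the record fields (name, benchmark, training, date) are assumptions for illustration, not the platform's actual export schema.

```python
from datetime import date

# Hypothetical exported evaluation records; field names are assumptions.
evaluations = [
    {"name": "ckpt-120", "benchmark": "summarization", "training": "run-a",
     "date": date(2024, 5, 2), "accuracy": 0.81},
    {"name": "ckpt-240", "benchmark": "qa", "training": "run-b",
     "date": date(2024, 6, 10), "accuracy": 0.77},
]

def apply_filters(records, benchmark=None, training=None,
                  start=None, end=None, name_query=None):
    """Keep only records that match every provided filter."""
    result = []
    for r in records:
        if benchmark and r["benchmark"] != benchmark:
            continue
        if training and r["training"] != training:
            continue
        if start and r["date"] < start:
            continue
        if end and r["date"] > end:
            continue
        if name_query and name_query.lower() not in r["name"].lower():
            continue
        result.append(r)
    return result

print(apply_filters(evaluations, benchmark="qa"))
```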

Step 4: Sort Results

  • Select metric to sort by
  • Choose sort direction
  • Review sorted results
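
Sorting works the same way on exported data. The sketch below assumes one higher-is-better metric and one lower-is-better metric (accuracy and latency_ms, both hypothetical names) to show why the sort direction matters.

```python
# Hypothetical records; "accuracy" is higher-is-better, "latency_ms" is lower-is-better.
evaluations = [
    {"name": "ckpt-120", "accuracy": 0.81, "latency_ms": 95},
    {"name": "ckpt-240", "accuracy": 0.77, "latency_ms": 60},
]

def sort_results(records, metric, descending=True):
    """Sort records by one metric; pick the direction that matches the metric."""
    return sorted(records, key=lambda r: r[metric], reverse=descending)

# Descending for accuracy (higher is better), ascending for latency (lower is better).
print(sort_results(evaluations, "accuracy", descending=True))
print(sort_results(evaluations, "latency_ms", descending=False))
```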

Step 5: Create Evaluations

  • Click "New Evaluation"
  • Select checkpoint to evaluate
  • Select benchmark(s)
  • Run evaluation

Step 6: Compare Models

  • Select models to compare
  • Review comparison metrics
  • Analyze differences
  • Shortlist the strongest candidates
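
If you prefer to analyze differences outside the UI, a per-metric diff makes the comparison explicit. The model names and metrics below are assumptions for illustration only.

```python
# Hypothetical metric tables for two selected models.
model_a = {"accuracy": 0.81, "f1": 0.78, "latency_ms": 95}
model_b = {"accuracy": 0.79, "f1": 0.80, "latency_ms": 60}

def compare(a, b):
    """Return per-metric differences (a minus b) for metrics present in both models."""
    return {m: round(a[m] - b[m], 4) for m in a if m in b}

print(compare(model_a, model_b))
# Positive values favor model_a on higher-is-better metrics; interpret
# lower-is-better metrics (e.g. latency_ms) the other way around.
```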

Step 7: Select Best Model

  • Review rankings
  • Compare metrics
  • Consider business requirements
  • Select model for deployment

Interpreting Rankings

Understanding leaderboard rankings:

Ranking Factors:

  • Metric Value: Based on selected metric
  • Sort Direction: Ascending or descending
  • Benchmark: Rankings per benchmark
  • Aggregate: Overall rankings across benchmarks
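
A common way to aggregate is mean rank across benchmarks; whether the leaderboard uses mean rank or a different scheme depends on its configuration, so treat the sketch below as an illustration with hypothetical scores.

```python
# Hypothetical per-benchmark scores (higher is better); names are assumptions.
scores = {
    "summarization": {"ckpt-120": 0.81, "ckpt-240": 0.77, "ckpt-360": 0.79},
    "qa":            {"ckpt-120": 0.70, "ckpt-240": 0.74, "ckpt-360": 0.72},
}

def per_benchmark_ranks(scores):
    """Rank 1 = best score within each benchmark."""
    ranks = {}
    for bench, by_model in scores.items():
        ordered = sorted(by_model, key=by_model.get, reverse=True)
        ranks[bench] = {model: i + 1 for i, model in enumerate(ordered)}
    return ranks

def aggregate_rank(ranks):
    """Average a model's rank over all benchmarks."""
    models = {m for bench in ranks.values() for m in bench}
    return {m: sum(bench[m] for bench in ranks.values()) / len(ranks)
            for m in models}

ranks = per_benchmark_ranks(scores)
print(ranks)
print(sorted(aggregate_rank(ranks).items(), key=lambda kv: kv[1]))  # best first
```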

Ranking Interpretation:

  • Higher Rank: A position closer to the top (rank 1) indicates better performance for most metrics
  • Metric Context: Consider metric meaning
  • Benchmark Context: Consider benchmark characteristics
  • Trends: Consider performance trends

Ranking Considerations:

  • Single Metric: Rankings based on one metric
  • Multiple Metrics: Consider all metrics, not just ranking
  • Business Goals: Align with business objectives
  • Consistency: Check consistency across benchmarks
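
A quick way to check consistency is to look at how much a model's rank varies between benchmarks. The sketch below uses the standard deviation of ranks; the benchmark names and rank values are assumptions.

```python
from statistics import pstdev

# Hypothetical per-benchmark ranks for one model (1 = best).
ranks_by_benchmark = {"summarization": 1, "qa": 4, "classification": 2}

# A small spread suggests consistent behaviour across benchmarks; a large
# spread means the model is strong on some tasks and weak on others.
spread = pstdev(ranks_by_benchmark.values())
print(f"rank spread: {spread:.2f}")
```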

Choosing the Best Model

Guidelines for selecting the best model:

Selection Criteria:

  • Performance Metrics: High performance on key metrics
  • Consistency: Consistent performance across benchmarks
  • Business Alignment: Aligns with business goals
  • Stability: Stable and reliable performance

Selection Process:

  1. Review Rankings: Start from the current leaderboard rankings
  2. Check Metrics: Check all relevant metrics, not just the ranking metric
  3. Compare Models: Compare the top candidates side by side
  4. Consider Context: Weigh business context and constraints
  5. Validate: Validate the shortlisted model's performance
  6. Select: Select the best model for deployment (see the sketch after this list)
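
When several metrics matter, a simple weighted score can make the trade-offs explicit before you commit to a model. The metric names and weights below are assumptions and should be replaced to reflect your own requirements.

```python
# Hypothetical candidates and weights; a negative weight marks a lower-is-better metric.
candidates = {
    "ckpt-120": {"accuracy": 0.81, "f1": 0.78, "latency_ms": 95},
    "ckpt-240": {"accuracy": 0.79, "f1": 0.80, "latency_ms": 60},
}
weights = {"accuracy": 0.5, "f1": 0.4, "latency_ms": -0.001}

def weighted_score(metrics, weights):
    """Combine several metrics into one score; the sign of each weight encodes direction."""
    return sum(weights[m] * metrics[m] for m in weights)

ranked = sorted(candidates, key=lambda c: weighted_score(candidates[c], weights),
                reverse=True)
print(ranked)  # best candidate first under these weights
```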

Selection Factors:

  • Primary Metric: Identify the metric that matters most for your use case
  • Secondary Metrics: Use secondary metrics as tie-breakers
  • Trade-offs: Understand trade-offs between metrics (e.g. quality vs. latency)
  • Requirements: Confirm the model meets business requirements

Best Practices:

  • Don't rely solely on rankings
  • Consider multiple metrics
  • Validate on test data
  • Consider deployment requirements

Common Troubleshooting

Issue: No Evaluations Shown

  • Symptom: Leaderboard is empty
  • Possible Causes:
    • No evaluations run yet
    • Filters too restrictive
    • No checkpoints evaluated
  • Solutions:
    • Run evaluations
    • Check filters
    • Verify checkpoints exist

Issue: Missing Metrics

  • Symptom: Some metrics not displayed
  • Possible Causes:
    • Metrics not configured in benchmark
    • Evaluation not completed
    • Metric calculation error
  • Solutions:
    • Check benchmark configuration
    • Verify evaluation completed
    • Review evaluation logs

Issue: Rankings Don't Make Sense

  • Symptom: Rankings seem incorrect
  • Possible Causes:
    • Wrong sort direction
    • Metric interpretation issue
    • Data inconsistency
  • Solutions:
    • Check sort direction
    • Review metric definitions
    • Verify data consistency

Issue: Can't Create Evaluation

  • Symptom: Cannot create new evaluation
  • Possible Causes:
    • No checkpoints available
    • No benchmarks available
    • Resource constraints
  • Solutions:
    • Verify checkpoints exist
    • Verify benchmarks exist
    • Check resource availability

Best Practices

Leaderboard Best Practices:

  • Regular Evaluation: Evaluate new checkpoints as they are produced
  • Consistent Benchmarks: Use the same benchmarks across evaluations so results stay comparable
  • Multiple Metrics: Consider multiple metrics
  • Document Decisions: Document model selection decisions

Evaluation Best Practices:

  • Evaluate All Checkpoints: Evaluate all relevant checkpoints
  • Use Multiple Benchmarks: Evaluate against multiple benchmarks
  • Track Trends: Track performance trends over time
  • Validate Results: Validate evaluation results
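
To track trends over time, compare each evaluation of a training run with the previous one and watch for plateaus or regressions before choosing a checkpoint. The history records below are hypothetical.

```python
# Hypothetical evaluation history for one training run; fields are assumptions.
history = [
    {"checkpoint": "step-1000", "accuracy": 0.71},
    {"checkpoint": "step-2000", "accuracy": 0.76},
    {"checkpoint": "step-3000", "accuracy": 0.75},
]

# Report the change between consecutive evaluations to spot plateaus or regressions.
for prev, cur in zip(history, history[1:]):
    delta = cur["accuracy"] - prev["accuracy"]
    print(f'{prev["checkpoint"]} -> {cur["checkpoint"]}: {delta:+.3f}')
```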

Comparison Best Practices:

  • Compare Fairly: Compare models on the same benchmarks under the same settings
  • Consider Context: Consider business context
  • Multiple Views: Use multiple views for comparison
  • Document Findings: Document comparison findings

Next Steps