Creating and Managing Benchmarks

Learn how to create benchmarks, select datasets, configure metrics, and manage benchmark configurations.

Creating a Benchmark

To create a new benchmark:

  1. Navigate to Benchmark: Open the Benchmark section
  2. Click "New Benchmark": Start the benchmark creation process
  3. Basic Information: Enter the benchmark name and description
  4. Select Task Type: Choose a task type (Classification, Regression, or Text Generation)
  5. Select Datasets: Choose one or more datasets for evaluation
  6. Configure Metrics: Select one or more metrics to compute
  7. Save Benchmark: Save the benchmark configuration

Benchmark Requirements:

  • Name: Must be unique among benchmarks
  • Description: Text describing the benchmark
  • Task Type: A task type must be selected
  • Datasets: At least one dataset is required
  • Metrics: At least one metric is required
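
The requirements above can be expressed as a small validation routine. The following is a minimal sketch only: `BenchmarkConfig` and `validate_benchmark` are hypothetical names invented for illustration and are not part of the product's API. The guide does not say whether the description may be left empty, so the sketch does not check it.

```python
from dataclasses import dataclass, field

# Task types named in this guide.
TASK_TYPES = {"Classification", "Regression", "Text Generation"}


@dataclass
class BenchmarkConfig:
    """Hypothetical in-memory stand-in for a benchmark configuration."""
    name: str
    description: str
    task_type: str
    datasets: list = field(default_factory=list)
    metrics: list = field(default_factory=list)


def validate_benchmark(config, existing_names):
    """Return a list of requirement violations; an empty list means valid."""
    errors = []
    if not config.name.strip():
        errors.append("name is required")
    elif config.name in existing_names:
        errors.append("name must be unique")
    if config.task_type not in TASK_TYPES:
        errors.append("a task type must be selected")
    if not config.datasets:
        errors.append("at least one dataset is required")
    if not config.metrics:
        errors.append("at least one metric is required")
    return errors
```

Returning a list of messages rather than raising on the first problem lets a form show every missing field at once.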

Dataset Selection

Select datasets for benchmark evaluation:

Dataset Selection:

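  • Available Datasets: Only datasets in the READY state are listed
  • Multiple Datasets: A benchmark can include more than one dataset
  • Consistency: Keep the selected datasets consistent so that results remain comparable
  • Validation: Validate that each dataset is compatible with the benchmark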

Dataset Requirements:

  • Datasets must be in the READY state
  • Datasets should be consistent
  • Datasets must be appropriate for the task type
  • Datasets must contain sufficient data for evaluation
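
As an illustration of these requirements, the sketch below filters a list of datasets down to those that could be selected. It rests on several assumptions: the `Dataset` record, its field names, the `PROCESSING` status used in the usage example, and the `min_rows` threshold are all hypothetical. Only the READY state comes from this guide.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical dataset record; field names are illustrative."""
    name: str
    status: str      # "READY" is the only state this guide names
    task_type: str   # assumed: each dataset is tagged with a task type
    num_rows: int


def selectable_datasets(datasets, task_type, min_rows=1):
    """Keep datasets that are READY, fit the task type, and have enough rows."""
    return [
        d for d in datasets
        if d.status == "READY"
        and d.task_type == task_type
        and d.num_rows >= min_rows
    ]
```

For example:

```python
catalog = [
    Dataset("reviews", "READY", "Classification", 5000),
    Dataset("tickets", "PROCESSING", "Classification", 800),
    Dataset("prices", "READY", "Regression", 1200),
]
names = [d.name for d in selectable_datasets(catalog, "Classification", min_rows=100)]
```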

Metric Configuration

Configure metrics for the benchmark:

Metric Selection:

  • Task-Specific: Select metrics appropriate for the task type
  • Multiple Metrics: A benchmark can include more than one metric
  • Business Relevance: Choose metrics relevant to the business objective
  • Comprehensive: Use several metrics together for a fuller evaluation

Metric Configuration:

  • Classification Metrics: Accuracy, Precision, Recall, F1, ROC AUC, etc.
  • Regression Metrics: MSE, MAE, RMSE, R²
  • Custom Metrics: Not yet supported; planned for a future release
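
For reference, several of the metrics listed above can be computed from their standard textbook definitions. The sketch below is plain Python and is not the platform's implementation. The function names are illustrative, the classification metrics are for the binary case only, and ROC AUC is omitted because it needs predicted scores rather than labels.

```python
import math


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    # Fall back to 0.0 when a denominator would be zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def regression_metrics(y_true, y_pred):
    """MSE, MAE, RMSE, and R² for numeric predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R² is undefined when the targets are constant; report 0.0 here.
    r2 = 1 - ss_res / ss_tot if ss_tot else 0.0
    return {"mse": mse, "mae": mae, "rmse": math.sqrt(mse), "r2": r2}
```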

Task Type Selection

Select the appropriate task type:

Task Type Options:

  • Classification: For discrete category prediction
  • Regression: For continuous value prediction
  • Text Generation: For text sequence generation

Task Type Considerations:

  • Data Type: Match the task type to the data
  • Business Goal: Align the task type with business objectives
  • Model Output: Match the task type to the model's output type
  • Evaluation Needs: Consider what the evaluation requires
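
Because metrics depend on the task type, it can help to check a metric selection against the chosen task type. The following sketch is an assumption-laden illustration: `TaskType`, `LISTED_METRICS`, and `mismatched_metrics` are hypothetical names, and the metric sets contain only the metrics this guide names. The guide's classification list ends in "etc." and no metrics are listed for Text Generation, so the sets are incomplete.

```python
from enum import Enum


class TaskType(Enum):
    CLASSIFICATION = "Classification"
    REGRESSION = "Regression"
    TEXT_GENERATION = "Text Generation"


# Only the metrics named in this guide; Text Generation has none listed.
LISTED_METRICS = {
    TaskType.CLASSIFICATION: {"Accuracy", "Precision", "Recall", "F1", "ROC AUC"},
    TaskType.REGRESSION: {"MSE", "MAE", "RMSE", "R²"},
}


def mismatched_metrics(task_type, metrics):
    """Return the metrics not listed for the task type.

    Returns None when no metric list is known for the task type,
    so the caller can tell "nothing to check" apart from "all fine".
    """
    allowed = LISTED_METRICS.get(task_type)
    if allowed is None:
        return None
    return [m for m in metrics if m not in allowed]
```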

Editing Benchmarks

Edit existing benchmark configurations:

Edit Process:

  1. Select Benchmark: Choose benchmark to edit
  2. Click Edit: Start editing process
  3. Modify Configuration: Update benchmark settings
  4. Validate: Validate new configuration
  5. Save: Save changes

Editable Settings:

  • Name: Benchmark name
  • Description: Benchmark description
  • Datasets: Add or remove datasets
  • Metrics: Add or remove metrics
  • Task Type: Not editable; cannot be changed after creation

Edit Considerations:

  • Consistency: Maintain dataset consistency
  • Impact: Understand the impact of each change before saving
  • Validation: Always validate the configuration after making changes
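
The editing rules can be sketched as a function that accepts changes to the editable settings, rejects any attempt to change the task type, and re-validates the result. This is a hypothetical illustration, not the product's API: `Benchmark`, `EDITABLE_FIELDS`, and `edit_benchmark` are invented names.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Benchmark:
    """Hypothetical immutable benchmark record."""
    name: str
    description: str
    task_type: str
    datasets: tuple
    metrics: tuple


# Task type is deliberately absent: it cannot change after creation.
EDITABLE_FIELDS = {"name", "description", "datasets", "metrics"}


def edit_benchmark(benchmark, **changes):
    """Return an updated copy, or raise ValueError for an invalid edit."""
    locked = set(changes) - EDITABLE_FIELDS
    if locked:
        raise ValueError(f"cannot edit: {sorted(locked)}")
    updated = replace(benchmark, **changes)
    # Re-validate: the creation requirements still apply after an edit.
    if not updated.datasets or not updated.metrics:
        raise ValueError("at least one dataset and one metric are required")
    return updated
```

Returning a new record instead of mutating the original keeps the previous configuration available for comparison until the edit is saved.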

Deleting Benchmarks

Delete benchmarks that are no longer needed:

Deletion Process:

  1. Select Benchmark: Choose benchmark to delete
  2. Click Delete: Start deletion process
  3. Confirm Deletion: Confirm you want to delete
  4. Wait for Completion: Wait for deletion to complete

Deletion Considerations:

  • Active Use: Check whether the benchmark is in use
  • Dependencies: Check for evaluations that depend on the benchmark
  • Permanent: Deletion is permanent and cannot be undone
  • History: Evaluation history may be retained
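
The checks above can be sketched as a guarded delete. Everything in this example is an assumption made for illustration: the `delete_benchmark` function, the dictionary layout of an evaluation record, and the `RUNNING` and `COMPLETED` status values are hypothetical. Whether the platform actually blocks deletion of a benchmark that is in use is not stated in this guide.

```python
def delete_benchmark(benchmarks, name, evaluations, confirmed=False):
    """Delete a benchmark after confirmation and an in-use check.

    benchmarks:  dict mapping benchmark name to its configuration
    evaluations: list of dicts with "id", "benchmark", and "status" keys
    """
    if name not in benchmarks:
        raise KeyError(name)
    if not confirmed:
        raise ValueError("deletion must be confirmed")
    # Treat evaluations that are still running as blocking dependencies.
    blockers = [
        e["id"] for e in evaluations
        if e["benchmark"] == name and e["status"] == "RUNNING"
    ]
    if blockers:
        raise RuntimeError(f"benchmark is in use by evaluations: {blockers}")
    # Evaluation records are left untouched, so history is retained.
    del benchmarks[name]
```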

Next Steps