Creating and Managing Benchmarks
Learn how to create benchmarks, select datasets, configure metrics, and manage benchmark configurations.
Creating a Benchmark
To create a new benchmark, complete these steps in order:
- Navigate to Benchmark: Open the Benchmark section
- Click "New Benchmark": Start the benchmark creation process
- Basic Information: Enter a benchmark name and description
- Select Task Type: Choose the task type (Classification, Regression, or Text Generation)
- Select Datasets: Choose the datasets to evaluate against
- Configure Metrics: Select the metrics to compute
- Save Benchmark: Save the benchmark configuration
Benchmark Requirements:
- Name: Must be unique among benchmarks
- Description: Summary of what the benchmark evaluates
- Task Type: A task type must be selected
- Datasets: At least one dataset is required
- Metrics: At least one metric is required
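The requirements above can be expressed as a small pre-save check. This is a minimal sketch, not the product's implementation: `BenchmarkConfig`, `validate_benchmark`, and the sample names are hypothetical and exist only to illustrate the rules.

```python
from dataclasses import dataclass, field

# Task types named in this guide.
TASK_TYPES = {"Classification", "Regression", "Text Generation"}


@dataclass
class BenchmarkConfig:
    """Hypothetical container for the fields the creation form collects."""
    name: str
    description: str = ""
    task_type: str = ""
    datasets: list = field(default_factory=list)
    metrics: list = field(default_factory=list)


def validate_benchmark(config, existing_names):
    """Return a list of problems; an empty list means the config passes."""
    errors = []
    if not config.name.strip():
        errors.append("name is required")
    elif config.name in existing_names:
        errors.append("name must be unique")
    if config.task_type not in TASK_TYPES:
        errors.append("a task type must be selected")
    if not config.datasets:
        errors.append("at least one dataset is required")
    if not config.metrics:
        errors.append("at least one metric is required")
    return errors


# Example: a draft that is missing its metrics.
draft = BenchmarkConfig(
    name="churn-v1",
    task_type="Classification",
    datasets=["customers-2024"],
    metrics=[],
)
problems = validate_benchmark(draft, existing_names={"fraud-v1"})
print(problems)
```

Returning every problem at once, rather than stopping at the first, mirrors how a form would typically highlight all invalid fields together.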
Dataset Selection
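Select the datasets the benchmark evaluates against:
Selection Guidelines:
- Available Datasets: Only datasets in the READY state are listed
- Multiple Datasets: A benchmark can include more than one dataset
- Consistency: Keep the selected datasets unchanged between runs so results stay comparable
- Validation: Confirm that each dataset is compatible with the benchmark's task type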
Dataset Requirements:
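- Status: Each dataset must be READY
- Consistency: Datasets should stay consistent across evaluations
- Task Fit: Each dataset must suit the selected task type
- Size: Each dataset needs enough data for a meaningful evaluation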
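As an illustration of these requirements, the sketch below filters a dataset catalog down to the entries a benchmark could use. The `Dataset` record, its field names, the `PROCESSING` status, and the `min_rows` threshold are all assumptions made for the example; only the READY state comes from this guide.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical dataset record; field names are illustrative."""
    name: str
    status: str      # "READY" is from this guide; other values are assumed
    task_type: str
    num_rows: int


def selectable_datasets(datasets, task_type, min_rows=1):
    """Keep datasets that are READY, match the task type, and have enough rows."""
    return [
        d for d in datasets
        if d.status == "READY"
        and d.task_type == task_type
        and d.num_rows >= min_rows
    ]


catalog = [
    Dataset("reviews-train", "READY", "Classification", 5000),
    Dataset("reviews-raw", "PROCESSING", "Classification", 9000),
    Dataset("prices", "READY", "Regression", 1200),
    Dataset("reviews-tiny", "READY", "Classification", 20),
]
chosen = selectable_datasets(catalog, "Classification", min_rows=100)
print([d.name for d in chosen])
```

What counts as "sufficient data" depends on the task and the metrics, so `min_rows` is left as a parameter rather than a fixed value.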
Metric Configuration
Configure metrics for the benchmark:
Metric Selection:
- Task-Specific: Select metrics that apply to the benchmark's task type
- Multiple Metrics: A benchmark can include more than one metric
- Business Relevance: Prefer metrics that reflect the business outcome the model supports
- Comprehensive: Combine several metrics, since a single metric rarely gives a full picture
Available Metrics:
- Classification Metrics: Accuracy, Precision, Recall, F1, ROC AUC, and others
- Regression Metrics: MSE, MAE, RMSE, R²
- Custom Metrics: Not yet supported; planned for the future
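For reference, the four regression metrics listed above follow standard definitions. The sketch below computes them in plain Python; the function name `regression_metrics` and the sample values are illustrative, and this does not describe how the product computes them internally.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MSE, MAE, RMSE, and R² from paired true and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    ss_res = sum(e * e for e in errors)           # residual sum of squares
    mse = ss_res / n                              # mean squared error
    mae = sum(abs(e) for e in errors) / n         # mean absolute error
    rmse = math.sqrt(mse)                         # root mean squared error

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R² is undefined when every true value is identical.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"MSE": mse, "MAE": mae, "RMSE": rmse, "R2": r2}


scores = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
for metric_name, value in scores.items():
    print(f"{metric_name}: {value:.3f}")
```

In this example a single large error dominates MSE and RMSE more than MAE, which is one reason to report several metrics side by side.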
Task Type Selection
Select the appropriate task type:
Task Type Options:
- Classification: For discrete category prediction
- Regression: For continuous value prediction
- Text Generation: For text sequence generation
Task Type Considerations:
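- Data Type: Match the task type to the kind of data in your datasets
- Business Goal: Choose the task type that fits the business objective
- Model Output: Match the task type to what the model produces (a category, a number, or text)
- Evaluation Needs: Check that the task type supports the metrics you need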
Editing Benchmarks
Edit existing benchmark configurations:
Edit Process:
- Select Benchmark: Choose benchmark to edit
- Click Edit: Start editing process
- Modify Configuration: Update benchmark settings
- Validate: Validate new configuration
- Save: Save changes
Editable Settings:
- Name: Benchmark name
- Description: Benchmark description
- Datasets: Add or remove datasets
- Metrics: Add or remove metrics
Non-Editable Settings:
- Task Type: Fixed at creation; cannot be changed afterward
Edit Considerations:
- Consistency: Keep datasets consistent so earlier and later results stay comparable
- Impact: Understand how a change affects existing evaluations before saving it
- Validation: Validate the configuration after every change
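The editing rules in this section can be sketched as a guard that accepts changes to the editable settings and rejects any attempt to change the task type. The `Benchmark` class, `EDITABLE_FIELDS`, and `edit_benchmark` are hypothetical names chosen for this example, not part of the product.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Benchmark:
    """Hypothetical immutable snapshot of a benchmark configuration."""
    name: str
    description: str
    task_type: str
    datasets: tuple
    metrics: tuple


# Settings this guide lists as editable; task_type is deliberately absent.
EDITABLE_FIELDS = {"name", "description", "datasets", "metrics"}


def edit_benchmark(benchmark, **changes):
    """Return an updated copy, rejecting non-editable fields and invalid results."""
    blocked = set(changes) - EDITABLE_FIELDS
    if blocked:
        raise ValueError(f"not editable: {sorted(blocked)}")
    updated = replace(benchmark, **changes)
    # Re-validate after the change: the creation requirements still apply.
    if not updated.datasets or not updated.metrics:
        raise ValueError("a benchmark needs at least one dataset and one metric")
    return updated


original = Benchmark(
    name="churn-v1",
    description="Churn baseline",
    task_type="Classification",
    datasets=("customers-2024",),
    metrics=("Accuracy",),
)
edited = edit_benchmark(original, metrics=("Accuracy", "F1"))
print(edited.metrics)
```

Returning a new copy instead of mutating in place keeps the previous configuration available, which makes it easier to compare settings before and after an edit.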
Deleting Benchmarks
Delete benchmarks that are no longer needed:
Deletion Process:
- Select Benchmark: Choose benchmark to delete
- Click Delete: Start deletion process
- Confirm Deletion: Confirm you want to delete
- Wait for Completion: Wait for deletion to complete
Deletion Considerations:
- Active Use: Check whether the benchmark is currently in use
- Dependencies: Check whether any evaluations depend on the benchmark
- Permanent: Deletion cannot be undone
- History: Evaluation history may be retained after deletion
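Because deletion is permanent, it can help to review dependencies first. The sketch below summarizes which evaluations reference a benchmark before it is deleted. The evaluation records, their keys (`id`, `benchmark`, `status`), and the status values are assumptions for illustration only.

```python
def deletion_report(benchmark_name, evaluations):
    """Summarize evaluations that reference a benchmark before deleting it."""
    related = [e for e in evaluations if e["benchmark"] == benchmark_name]
    running = [e["id"] for e in related if e["status"] == "RUNNING"]
    finished = [e["id"] for e in related if e["status"] != "RUNNING"]
    return {
        "safe_to_delete": not running,   # no evaluation is actively using it
        "running": running,              # would be disrupted by deletion
        "history": finished,             # past results that may be retained
    }


# Hypothetical evaluation records.
evaluations = [
    {"id": "eval-1", "benchmark": "churn-v1", "status": "COMPLETED"},
    {"id": "eval-2", "benchmark": "churn-v1", "status": "RUNNING"},
    {"id": "eval-3", "benchmark": "fraud-v1", "status": "COMPLETED"},
]
report = deletion_report("churn-v1", evaluations)
print(report)
```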
Next Steps
- Learn about Operations to execute benchmarks
- Check Usage Guide for best practices