
Training Operations

Learn how to view training jobs, analyze results, use checkpoints, and manage multiple training jobs.

Viewing Training Jobs

View and manage training jobs:

Training List:

  • View all training jobs
  • Filter by status, date, or model
  • Search by name
  • Sort by various criteria
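
The list operations above (filter, search, sort) can be sketched in a few lines of Python. This is only an illustration of the idea, not the product's API: the `TrainingJob` record, its field names, the `find_jobs` helper, and the sample jobs are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class TrainingJob:
    # Hypothetical record; field names are illustrative, not the product's schema.
    name: str
    status: str
    model: str
    started: date


def find_jobs(jobs, status: Optional[str] = None, model: Optional[str] = None,
              name_contains: Optional[str] = None, sort_key: str = "started",
              descending: bool = True):
    """Filter by status and model, search by name substring, then sort."""
    result = [
        j for j in jobs
        if (status is None or j.status == status)
        and (model is None or j.model == model)
        and (name_contains is None or name_contains.lower() in j.name.lower())
    ]
    return sorted(result, key=lambda j: getattr(j, sort_key), reverse=descending)


# Made-up sample data for the sketch.
jobs = [
    TrainingJob("resnet-baseline", "Completed", "resnet50", date(2024, 1, 10)),
    TrainingJob("resnet-augmented", "Running", "resnet50", date(2024, 1, 12)),
    TrainingJob("bert-finetune", "Failed", "bert-base", date(2024, 1, 11)),
]

# Newest first, restricted to one model and a name search term.
resnet_jobs = find_jobs(jobs, model="resnet50", name_contains="resnet")
print([j.name for j in resnet_jobs])
```

Combining the filters in one pass and sorting last keeps the sort cheap, since it only touches the jobs that survived filtering.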

Training Information:

  • Training Name: Name assigned to the training job
  • Status: Current state of the job (see Training Status)
  • Progress: Percentage of the training completed so far
  • Duration: How long the training has run
  • Metrics: Most recent metric values reported by the training

Training Status:

  • Running: Training is in progress
  • Completed: Training finished successfully
  • Failed: Training ended with an error
  • Stopped: Training was stopped before completion
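
When scripting against these statuses, it can help to model them as an enumeration rather than comparing raw strings. The sketch below is a minimal example under stated assumptions: the `TrainingStatus` class is hypothetical, and treating every status except Running as final is an assumption, since the page does not say whether a stopped training can be resumed.

```python
from enum import Enum


class TrainingStatus(Enum):
    # Values mirror the status labels listed above.
    RUNNING = "Running"
    COMPLETED = "Completed"
    FAILED = "Failed"
    STOPPED = "Stopped"

    @property
    def is_terminal(self) -> bool:
        # Assumption: only Running jobs are still making progress.
        return self is not TrainingStatus.RUNNING


# Parse a label as it appears in the list view.
status = TrainingStatus("Failed")
print(status.name, status.is_terminal)
```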

Analyzing Results

Analyze training results:

Result Analysis:

  • Metrics Comparison: Compare training vs validation metrics
  • Trend Analysis: Analyze metric trends over time
  • Performance Evaluation: Evaluate model performance
  • Error Analysis: Analyze prediction errors
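
As a rough illustration of metrics comparison and trend analysis, the sketch below computes the per-epoch gap between validation and training loss, smooths a curve with a trailing moving average, and finds the epoch with the lowest validation loss. The function names and the loss values are made up for the example; real values would come from exported training results.

```python
def generalization_gap(train_loss, val_loss):
    """Per-epoch difference between validation and training loss."""
    return [v - t for t, v in zip(train_loss, val_loss)]


def moving_average(values, window=3):
    """Trailing moving average to smooth a noisy metric curve."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def best_epoch(val_loss):
    """Index of the epoch with the lowest validation loss."""
    return min(range(len(val_loss)), key=val_loss.__getitem__)


# Illustrative curves: training loss keeps falling while validation loss turns up.
train_loss = [1.0, 0.75, 0.5, 0.25, 0.125]
val_loss = [1.0, 0.875, 0.75, 0.875, 1.0]

gaps = generalization_gap(train_loss, val_loss)
print(gaps)                      # a widening gap suggests overfitting
print(best_epoch(val_loss))      # epoch index with the lowest validation loss
print(moving_average(val_loss))  # smoothed validation curve
```

A gap that keeps growing after the best validation epoch is a common sign that later checkpoints are overfitting the training data.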

Analysis Tools:

  • Metric Charts: Visualize metrics over time
  • Comparison Views: Compare different trainings
  • Performance Reports: Generate performance reports
  • Export Results: Export results for further analysis

Using Checkpoints

Use checkpoints for evaluation and deployment:

Checkpoint Usage:

  • Model Evaluation: Evaluate checkpoints on test data
  • Model Comparison: Compare the metrics of different checkpoints
  • Best Checkpoint Selection: Select the best-performing checkpoint
  • Model Deployment: Deploy checkpoints to inference servers
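
Selecting the best checkpoint usually comes down to ranking checkpoints by one evaluation metric. The following is a minimal sketch, assuming each checkpoint carries a validation loss and accuracy; the `Checkpoint` record, the `select_best` helper, and the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Checkpoint:
    # Hypothetical record; a real checkpoint would also reference its files.
    step: int
    val_loss: float
    val_accuracy: float


def select_best(checkpoints, metric="val_loss", higher_is_better=False):
    """Return the checkpoint with the best value for the chosen metric."""
    if not checkpoints:
        raise ValueError("no checkpoints to choose from")
    pick = max if higher_is_better else min
    return pick(checkpoints, key=lambda c: getattr(c, metric))


# Made-up evaluation results for three checkpoints.
checkpoints = [
    Checkpoint(step=1000, val_loss=0.62, val_accuracy=0.78),
    Checkpoint(step=2000, val_loss=0.48, val_accuracy=0.84),
    Checkpoint(step=3000, val_loss=0.51, val_accuracy=0.86),
]

best_by_loss = select_best(checkpoints)
best_by_acc = select_best(checkpoints, metric="val_accuracy", higher_is_better=True)
print(best_by_loss.step, best_by_acc.step)
```

The two criteria can disagree, as they do here, so the metric should match what matters for the deployed model.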

Checkpoint Operations:

  • View Checkpoints: View all checkpoints saved for a training
  • Compare Metrics: Compare metrics across checkpoints
  • Download: Download checkpoint files
  • Deploy: Deploy a checkpoint to an inference server

Managing Multiple Training Jobs

Manage multiple training jobs efficiently:

Management Features:

  • Bulk Operations: Select and manage multiple trainings
  • Status Monitoring: Monitor status of all trainings
  • Resource Management: Manage GPU allocation across trainings
  • Priority Management: Set training priorities
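
To make the interaction between priorities and GPU allocation concrete, here is a small greedy sketch: trainings are considered from highest to lowest priority and scheduled if their GPU request still fits. This is an assumed policy for illustration, not a description of how the platform allocates GPUs; `QueuedTraining`, `allocate_gpus`, and the convention that a larger number means higher priority are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QueuedTraining:
    name: str
    priority: int        # assumption: larger number = more important
    gpus_requested: int


def allocate_gpus(queue, total_gpus):
    """Greedy allocation: highest priority first, skip jobs that do not fit."""
    free = total_gpus
    scheduled, waiting = [], []
    for job in sorted(queue, key=lambda j: j.priority, reverse=True):
        if job.gpus_requested <= free:
            free -= job.gpus_requested
            scheduled.append(job.name)
        else:
            waiting.append(job.name)
    return scheduled, waiting, free


# Made-up queue competing for 8 GPUs.
queue = [
    QueuedTraining("ablation-a", priority=1, gpus_requested=2),
    QueuedTraining("production-retrain", priority=5, gpus_requested=4),
    QueuedTraining("sweep-large", priority=3, gpus_requested=8),
]

scheduled, waiting, free = allocate_gpus(queue, total_gpus=8)
print(scheduled, waiting, free)
```

Note that this policy lets a small low-priority job run ahead of a large job that does not fit, which improves utilization but can delay the large job indefinitely.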

Best Practices:

  • Organize by Project: Group related trainings under the same project
  • Use Descriptive Names: Follow a clear, consistent naming convention
  • Monitor Resources: Track resource usage across trainings
  • Clean Up: Remove completed or failed trainings that are no longer needed
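
A clean-up pass can be sketched as selecting finished trainings older than some retention window. The example below only builds the list of candidates and deletes nothing; the dictionary keys, the 30-day default, and the sample records are assumptions made for illustration.

```python
from datetime import datetime, timedelta

REMOVABLE_STATUSES = {"Completed", "Failed"}


def cleanup_candidates(trainings, now, max_age_days=30):
    """Names of completed or failed trainings that finished before the cutoff."""
    cutoff = now - timedelta(days=max_age_days)
    return [
        t["name"] for t in trainings
        # The status check comes first, so running jobs with no finish time are skipped.
        if t["status"] in REMOVABLE_STATUSES and t["finished_at"] < cutoff
    ]


# Made-up records for the sketch.
trainings = [
    {"name": "old-completed", "status": "Completed", "finished_at": datetime(2024, 1, 1)},
    {"name": "recent-failed", "status": "Failed", "finished_at": datetime(2024, 2, 25)},
    {"name": "still-running", "status": "Running", "finished_at": None},
]

candidates = cleanup_candidates(trainings, now=datetime(2024, 3, 1))
print(candidates)
```

This sketch does not check whether a training's checkpoints are still deployed, so a real clean-up should review the candidates before removing anything.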

Next Steps