Creating a Training

Learn how to create training jobs, select datasets, configure features and targets, and set up training parameters.

Creating a New Training

To create a new training:

  1. Navigate to Training: Go to the Training section
  2. Click "New Training": Start the training creation process
  3. Basic Information: Enter training name and description
  4. Select Datasets: Choose datasets for training
  5. Configure Features/Targets: Select features and targets
  6. Configure Data Split: Set up training/validation split
  7. Configure Architecture: Set model architecture and hyperparameters
  8. Start Training: Launch the training job

Training Requirements:

  • Name: Unique name for the training
  • Datasets: At least one READY dataset
  • Features: At least one feature selected
  • Targets: At least one target selected (for supervised learning)
  • Cluster: Active cluster available

Dataset Selection

Select datasets for training:

Selection Options:

  • Available Datasets: List of READY datasets
  • Multiple Datasets: Can select multiple datasets
  • Feature Datasets: Datasets with features only
  • Target Datasets: Datasets with targets

Dataset Configuration:

  • Feature Selection: Select features from each dataset
  • Target Selection: Select targets from datasets
  • Exclusion: Exclude unwanted features
  • Validation: Validate dataset compatibility

Feature and Target Configuration

Configure features and targets:

Feature Configuration:

  • Feature Selection: Choose features from datasets
  • Feature Types: Different feature types supported
  • Feature Engineering: Automatic feature engineering
  • Feature Validation: Validate feature selection

Target Configuration:

  • Target Selection: Choose target variables
  • Target Type: Classification, regression, etc.
  • Multiple Targets: Support for multiple targets
  • Target Validation: Validate target selection

Configuration Guidelines:

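  • Select features that are relevant to the target
  • Avoid data leakage: exclude features derived from the target or unavailable at prediction time
  • Ensure the target type matches the task (e.g., classification or regression)
  • Balance feature count: too few features can underfit, too many can add noise and training cost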

Data Splitting

Configure training/validation data splits:

Split Methods:

Percentage Split:

  • Specify training percentage (e.g., 80%)
  • The validation percentage is calculated automatically as the remainder (e.g., 20%)
  • Simple and commonly used
  • Good for most use cases

Date Range Split:

  • Specify date ranges for training and validation
  • Maintains temporal ordering
  • Important for time-series data
  • Prevents leakage of future data into the training set

Split Configuration:

  • Training Split: Percentage or date range
  • Validation Split: Percentage or date range
  • Timestamp Column: For date-based splits
  • Random Seed: For reproducible splits
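To make the two split methods concrete, here is a minimal sketch in plain Python. It illustrates the idea only and is not how the platform implements splitting; the function names, the `rows` list-of-dicts layout, and the `(start, end)` range tuples are assumptions.

```python
import random
from datetime import date


def percentage_split(rows, train_pct=80, seed=42):
    """Shuffle with a fixed seed, then cut at train_pct percent.

    The same seed always yields the same split, which is what makes
    the split reproducible.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * train_pct // 100
    return shuffled[:cut], shuffled[cut:]


def date_range_split(rows, timestamp_key, train_range, valid_range):
    """Assign rows to training or validation by inclusive date ranges.

    Requiring the training range to end before the validation range
    starts preserves temporal ordering, so no future rows reach training.
    """
    if train_range[1] >= valid_range[0]:
        raise ValueError("training range must end before validation range starts")

    def in_range(ts, rng):
        start, end = rng
        return start <= ts <= end

    train = [r for r in rows if in_range(r[timestamp_key], train_range)]
    valid = [r for r in rows if in_range(r[timestamp_key], valid_range)]
    return train, valid
```

For example, `percentage_split(rows, 80, seed=7)` returns the same two lists on every call, while `date_range_split(rows, "ts", (date(2024, 1, 1), date(2024, 1, 7)), (date(2024, 1, 8), date(2024, 1, 10)))` keeps every validation row strictly later than every training row.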

Architecture Configuration (YAML)

Configure model architecture via YAML:

YAML Configuration:

  • Model Architecture: Define model structure
  • Hyperparameters: Set training hyperparameters
  • Optimization: Configure optimizer settings
  • Regularization: Set regularization parameters

Configuration Options:

  • Model Type: NeoLDM or Transformer
  • Model Size: Small, Medium, Large
  • Layer Configuration: Number of layers, dimensions
  • Activation Functions: Activation function choices
  • Dropout: Dropout rates

Example Configuration:

model:
  type: neoldm
  dim_per_feat: 2
  encoder_n_layers: 2
  backbone_n_layers: 2
run:
  epochs: 20
  batch_size: 4096
  learning_rate: 1e-5
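Before submitting a configuration like the one above, it can help to sanity-check it locally. The sketch below works on the configuration after it has been loaded into a Python dict. The `validate_config` helper and the lowercase `transformer` spelling are assumptions, not part of the platform. The learning rate is passed through `float()` because some YAML 1.1 loaders (PyYAML, for example) read `1e-5` as a string when the value has no decimal point.

```python
# Hypothetical in-memory form of the example configuration.
# learning_rate is kept as a string to mimic loaders that do not
# recognise "1e-5" as a float.
config = {
    "model": {
        "type": "neoldm",
        "dim_per_feat": 2,
        "encoder_n_layers": 2,
        "backbone_n_layers": 2,
    },
    "run": {"epochs": 20, "batch_size": 4096, "learning_rate": "1e-5"},
}

# "neoldm" appears in the example; the "transformer" spelling is assumed.
ALLOWED_MODEL_TYPES = {"neoldm", "transformer"}


def validate_config(cfg):
    """Check basic types and ranges; return a copy with a float learning rate."""
    model, run = cfg["model"], cfg["run"]
    if model["type"] not in ALLOWED_MODEL_TYPES:
        raise ValueError(f"unknown model type: {model['type']!r}")
    for key in ("dim_per_feat", "encoder_n_layers", "backbone_n_layers"):
        if not isinstance(model[key], int) or model[key] < 1:
            raise ValueError(f"model.{key} must be a positive integer")
    for key in ("epochs", "batch_size"):
        if not isinstance(run[key], int) or run[key] < 1:
            raise ValueError(f"run.{key} must be a positive integer")
    learning_rate = float(run["learning_rate"])  # accepts 1e-5 or "1e-5"
    if learning_rate <= 0:
        raise ValueError("run.learning_rate must be positive")
    return {"model": dict(model), "run": {**run, "learning_rate": learning_rate}}
```

Calling `validate_config(config)` returns a normalised copy and leaves the input dict unchanged.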

GPU Count Configuration

Configure GPU allocation:

GPU Configuration:

  • GPU Count: Number of GPUs to use
  • GPU Allocation: Automatic or manual allocation
  • Resource Limits: GPU resource limits
  • Scaling: Scale based on workload

Configuration Guidelines:

  • Start with 1-2 GPUs for testing
  • Scale up for larger models
  • Consider cost vs. performance
  • Monitor GPU utilization

Next Steps