
Training Core Concepts

Understanding these fundamental concepts is essential for training LDM models effectively.

Training Architecture (NeoLDM)

The NeoLDM architecture is optimized for training on enterprise data:

Architecture Components (a code sketch follows this list):

  • Feature Encoder: Encodes features into unified representations
  • Cross-Network: Captures feature interactions
  • Deep Network: Deep neural network for pattern recognition
  • Output Layer: Task-specific output layers
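
NeoLDM's internals aren't published in this guide, so as a mental model only, here is a minimal PyTorch sketch that assumes a DCN-style composition of the four components; every class name, layer size, and parameter below is illustrative:

    import torch
    import torch.nn as nn

    class CrossLayer(nn.Module):
        # One explicit feature-interaction layer: x_next = x0 * (W @ x + b) + x
        def __init__(self, dim):
            super().__init__()
            self.linear = nn.Linear(dim, dim)

        def forward(self, x0, x):
            return x0 * self.linear(x) + x

    class NeoLDMSketch(nn.Module):
        def __init__(self, in_dim, hidden=128, n_cross=2, n_deep=2, out_dim=1):
            super().__init__()
            self.encoder = nn.Linear(in_dim, hidden)          # Feature Encoder
            self.cross = nn.ModuleList(
                CrossLayer(hidden) for _ in range(n_cross))   # Cross-Network
            deep = []
            for _ in range(n_deep):
                deep += [nn.Linear(hidden, hidden), nn.ReLU()]
            self.deep = nn.Sequential(*deep)                  # Deep Network
            self.head = nn.Linear(hidden * 2, out_dim)        # Output Layer

        def forward(self, features):
            x0 = self.encoder(features)       # unified representation
            x = x0
            for layer in self.cross:          # capture feature interactions
                x = layer(x0, x)
            d = self.deep(x0)                 # deep pattern recognition
            return self.head(torch.cat([x, d], dim=-1))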

Training Features (see the training-loop sketch after this list):

  • Distributed Training: Train across multiple GPUs and nodes
  • Gradient Accumulation: Handle large batch sizes efficiently
  • Mixed Precision: Use mixed-precision arithmetic for faster steps and lower memory use
  • Checkpointing: Automatic checkpointing for fault tolerance
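
A minimal sketch of how gradient accumulation and mixed precision combine in a single training loop; the model, data, and step counts are stand-ins, not the platform's API (requires a CUDA device):

    import torch
    import torch.nn as nn

    model = nn.Linear(16, 1).cuda()                  # illustrative model
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    loader = [(torch.randn(8, 16), torch.randn(8, 1)) for _ in range(8)]

    scaler = torch.cuda.amp.GradScaler()             # mixed-precision loss scaling
    accum_steps = 4                                  # effective batch = 4 x 8 = 32

    for step, (x, y) in enumerate(loader):
        with torch.autocast(device_type="cuda", dtype=torch.float16):
            loss = loss_fn(model(x.cuda()), y.cuda()) / accum_steps
        scaler.scale(loss).backward()                # gradients accumulate
        if (step + 1) % accum_steps == 0:
            scaler.step(optimizer)                   # unscale grads, then step
            scaler.update()
            optimizer.zero_grad()

Distributed training typically wraps the same loop with torch.nn.parallel.DistributedDataParallel; checkpointing is covered in its own section below.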

Data Models

Data models describe the structure and configuration of the model you train:

Data Model Components (illustrated after this list):

  • Model Type: Type of model (NeoLDM, Transformer)
  • Model Size: Size configuration (Small, Medium, Large)
  • Architecture Config: YAML configuration for architecture
  • Hyperparameters: Training hyperparameters
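
As a sketch, a data model record could be represented like this; the field names and defaults are assumptions, not the platform's schema:

    from dataclasses import dataclass, field

    @dataclass
    class DataModel:
        model_type: str = "NeoLDM"      # Model Type: NeoLDM or Transformer
        model_size: str = "Medium"      # Model Size: Small, Medium, or Large
        architecture_config: str = ""   # Architecture Config: YAML string
        hyperparameters: dict = field(default_factory=dict)

    dm = DataModel(hyperparameters={"learning_rate": 1e-3, "epochs": 10})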

Model Configuration (see the example below):

  • Architecture Selection: Choose between NeoLDM and Transformer
  • Size Selection: Select model size based on requirements
  • Custom Configuration: Customize architecture via YAML
  • GPU Configuration: Configure GPU allocation
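
For illustration, selecting and customizing a model could look like the following; the YAML keys are hypothetical rather than a documented schema (requires PyYAML):

    import yaml  # PyYAML

    config = yaml.safe_load("""
    architecture: NeoLDM    # Architecture Selection: NeoLDM or Transformer
    size: Large             # Size Selection: Small | Medium | Large
    custom:                 # Custom Configuration via YAML
      cross_layers: 3
      deep_layers: 4
    gpus: 4                 # GPU Configuration
    """)
    assert config["architecture"] in {"NeoLDM", "Transformer"}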

Training Configuration

Configure training parameters:

Configuration Options (assembled into one config in the sketch below):

  • Dataset Selection: Choose datasets for training
  • Feature/Target Selection: Select features and targets
  • Data Splitting: Configure training/validation splits
  • Architecture: Configure model architecture
  • Hyperparameters: Set training hyperparameters
  • GPU Count: Allocate GPU resources
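
Putting those options together, a complete training configuration might be assembled like this; every dataset, feature, and value below is illustrative:

    training_config = {
        "datasets": ["sales_2023"],                    # Dataset Selection
        "features": ["region", "price", "promo"],      # Feature Selection
        "target": "units_sold",                        # Target Selection
        "split": {"train": 0.8, "validation": 0.2},    # Data Splitting
        "architecture": {"type": "NeoLDM", "size": "Medium"},
        "hyperparameters": {"epochs": 10, "batch_size": 512,
                            "learning_rate": 1e-3},
        "gpu_count": 2,                                # GPU Count
    }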

Configuration Files (example after this list):

  • YAML Configuration: Architecture configuration in YAML format
  • Training Parameters: Epochs, batch size, learning rate
  • Optimization: Optimizer and learning rate schedule
  • Regularization: Dropout, weight decay, etc.
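
A hypothetical parameter file in that YAML format, loaded in Python; the schema shown is an assumption for illustration (requires PyYAML):

    import yaml  # PyYAML

    params = yaml.safe_load("""
    training:
      epochs: 20
      batch_size: 256
      learning_rate: 0.001
    optimization:
      optimizer: adamw
      schedule: cosine      # learning-rate schedule
    regularization:
      dropout: 0.1
      weight_decay: 0.01
    """)
    print(params["optimization"]["schedule"])   # -> cosine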

Checkpoints

Checkpoints save model state during training:

Checkpoint Features (sketched in code below):

  • Automatic Saving: Automatic checkpointing at intervals
  • Model State: Complete model state and weights
  • Training State: Optimizer state and training progress
  • Versioning: Track checkpoint versions
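
Conceptually, a saved checkpoint bundles all of the above; a minimal PyTorch sketch with illustrative names (the platform's automatic saving does the equivalent at each interval):

    import torch
    import torch.nn as nn

    model = nn.Linear(16, 1)                        # illustrative model
    optimizer = torch.optim.AdamW(model.parameters())

    checkpoint = {
        "version": 3,                               # Versioning
        "epoch": 5,                                 # Training State: progress
        "model_state": model.state_dict(),          # Model State: weights
        "optimizer_state": optimizer.state_dict(),  # Training State: optimizer
    }
    torch.save(checkpoint, "checkpoint_v3.pt")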

Checkpoint Usage (see the resume example after this list):

  • Resume Training: Resume from checkpoints
  • Model Evaluation: Evaluate checkpoints
  • Model Deployment: Deploy best checkpoints
  • Experiment Tracking: Track training experiments
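
Resuming restores both model and optimizer state before training continues; again a sketch built on the checkpoint file above, not the platform's API:

    import torch
    import torch.nn as nn

    model = nn.Linear(16, 1)
    optimizer = torch.optim.AdamW(model.parameters())

    ckpt = torch.load("checkpoint_v3.pt")
    model.load_state_dict(ckpt["model_state"])          # restore weights
    optimizer.load_state_dict(ckpt["optimizer_state"])  # restore optimizer state
    start_epoch = ckpt["epoch"] + 1                     # Resume Training from here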

Training vs Validation Split

Split data for training and validation:

Split Strategies (illustrated after this list):

  • Percentage Split: Split by percentage (e.g., 80/20)
  • Date Range Split: Split by date ranges
  • Stratified Split: Maintain class distribution
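
A sketch of all three strategies on a toy frame with pandas and scikit-learn; the column names and dates are illustrative:

    import pandas as pd
    from sklearn.model_selection import train_test_split

    df = pd.DataFrame({"ts": pd.date_range("2023-01-01", periods=100, freq="D"),
                       "label": [0, 1] * 50})

    # Percentage + Stratified: 80/20 while preserving the class distribution
    train, val = train_test_split(df, test_size=0.2,
                                  stratify=df["label"], random_state=42)

    # Date Range: rows before the cutoff train, the rest validate
    cutoff = "2023-03-01"
    train_dr, val_dr = df[df["ts"] < cutoff], df[df["ts"] >= cutoff]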

Split Considerations (see the time-series sketch below):

  • Temporal Order: Maintain temporal order for time-series
  • Data Leakage: Avoid data leakage between splits
  • Size Balance: Ensure sufficient data in each split
  • Reproducibility: Use seeds for reproducible splits
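
For time-series data these considerations combine into a leakage-safe, reproducible split; a small pandas sketch:

    import pandas as pd

    df = pd.DataFrame({"ts": pd.date_range("2023-01-01", periods=100, freq="D"),
                       "y": range(100)})
    df = df.sample(frac=1, random_state=42)     # Reproducibility: seeded shuffle

    # Temporal Order + Data Leakage: sort, then cut, so every validation
    # row is strictly later than every training row.
    df = df.sort_values("ts")
    cut = int(len(df) * 0.8)                    # Size Balance: 80/20
    train, val = df.iloc[:cut], df.iloc[cut:]
    assert train["ts"].max() < val["ts"].min()  # no leakage across the cut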

Next Steps