Complete Machine Learning Workflow

This guide walks you through the end-to-end NeoSpace workflow, from connecting your data sources to deploying models in production.

Workflow Overview

The complete ML workflow in NeoSpace consists of six main stages:

  1. Connect Data Sources → 2. Create Datasets → 3. Train Models → 4. Evaluate Models → 5. Compare Performance → 6. Deploy Models

Each stage builds on the previous one, creating a seamless pipeline from raw data to production models.

Stage 1: Connect Data Sources

Objective: Connect your data sources to the NeoSpace platform.

Steps:

  1. Navigate to Integration (Connectors) section
  2. Click "New Integration"
  3. Select connector type (S3, Oracle, etc.)
  4. Configure connection details
  5. Enter credentials securely
  6. Validate connection
  7. Save connector
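
If your NeoSpace deployment exposes an HTTP API, the connector setup above could also be scripted. The sketch below is a hypothetical example: the endpoint path, payload fields, and authentication header are assumptions made for illustration, not a documented NeoSpace API.

    import os
    import requests

    # Hypothetical NeoSpace API base URL and token (assumed, not documented).
    API = os.environ.get("NEOSPACE_API", "https://neospace.example.com/api")
    HEADERS = {"Authorization": f"Bearer {os.environ['NEOSPACE_TOKEN']}"}

    # Register an S3 connector; field names are illustrative assumptions.
    connector = {
        "name": "sales-data-s3",
        "type": "s3",
        "config": {"bucket": "my-company-sales", "region": "us-east-1"},
        # Keep credentials in environment variables or a secret store.
        "credentials": {
            "access_key_id": os.environ["AWS_ACCESS_KEY_ID"],
            "secret_access_key": os.environ["AWS_SECRET_ACCESS_KEY"],
        },
    }

    resp = requests.post(f"{API}/connectors", json=connector, headers=HEADERS)
    resp.raise_for_status()
    connector_id = resp.json()["id"]  # assumed response shape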

Outcome: Your data sources are connected and accessible.

Next: Proceed to create datasets from connected data sources.

Stage 2: Create Datasets

Objective: Create and prepare datasets for training.

Steps:

  1. Navigate to Datasets section
  2. Click "New Dataset"
  3. Enter dataset information (name, structure, domain)
  4. Select data files from connectors
  5. Run File Analysis to understand data structure
  6. Configure Modeling:
    • Select feature columns
    • Select target columns
    • Exclude unwanted features
  7. Configure Data Partitioning:
    • Choose split method (percentage or date range)
    • Set training/validation splits
  8. Process the dataset so it reaches the READY state
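
If an API is available, the dataset definition above could be captured as a single configuration payload. The sketch below is hypothetical: the modeling and partitioning fields are assumptions that mirror the steps above, and the ids and file patterns are placeholders.

    import os
    import requests

    # Same hypothetical API base URL and headers as in the Stage 1 sketch.
    API = os.environ.get("NEOSPACE_API", "https://neospace.example.com/api")
    HEADERS = {"Authorization": f"Bearer {os.environ['NEOSPACE_TOKEN']}"}

    dataset = {
        "name": "sales-history",
        "domain": "retail",
        "structure": "tabular",
        # Connector id and file pattern are placeholders.
        "source": {"connector_id": "conn-123", "files": ["sales/2023/*.parquet"]},
        "modeling": {
            "features": ["store_id", "date", "promotion_flag"],
            "targets": ["units_sold"],
            "excluded": ["internal_row_id"],
        },
        "partitioning": {
            "method": "percentage",   # or "date_range"
            "train": 0.8,
            "validation": 0.2,
        },
    }

    resp = requests.post(f"{API}/datasets", json=dataset, headers=HEADERS)
    resp.raise_for_status()
    dataset_id = resp.json()["id"]

    # Trigger processing so the dataset reaches the READY state (assumed endpoint).
    requests.post(f"{API}/datasets/{dataset_id}/process", headers=HEADERS).raise_for_status()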

Outcome: Dataset is ready for training.

Next: Use dataset in training jobs.

Stage 3: Train Models

Objective: Train LDM models on your datasets.

Steps:

  1. Navigate to Training section
  2. Click "New Training"
  3. Enter training name and description
  4. Select Datasets:
    • Choose datasets for training
    • Configure features and targets per dataset
  5. Configure Data Split:
    • Set training/validation split
    • Choose split method
  6. Configure Architecture:
    • Select model architecture (NeoLDM or Transformer)
    • Configure model size
    • Customize YAML configuration if needed
    • Set GPU count
  7. Start Training
  8. Monitor Training:
    • View training logs
    • Track training metrics
    • Monitor checkpoints
    • Review training summary
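
The training configuration described in steps 4-6 could likewise be expressed as one request body. The sketch below is hypothetical: the architecture, split, and GPU fields are assumptions chosen to mirror the options above, and the YAML override is passed as an inline string.

    import os
    import requests

    # Same hypothetical API base URL and headers as in the earlier sketches.
    API = os.environ.get("NEOSPACE_API", "https://neospace.example.com/api")
    HEADERS = {"Authorization": f"Bearer {os.environ['NEOSPACE_TOKEN']}"}

    training = {
        "name": "sales-ldm-v1",
        "description": "First NeoLDM run on the sales-history dataset",
        "datasets": [
            {
                "dataset_id": "ds-456",   # a READY dataset (placeholder id)
                "features": ["store_id", "date", "promotion_flag"],
                "targets": ["units_sold"],
            }
        ],
        "split": {"method": "percentage", "train": 0.8, "validation": 0.2},
        "architecture": {
            "family": "NeoLDM",      # or "Transformer"
            "size": "small",         # start small for experimentation
            "gpus": 2,
            # Optional YAML override, passed through as text (assumed field).
            "yaml_overrides": "optimizer:\n  lr: 0.0003\n",
        },
    }

    resp = requests.post(f"{API}/trainings", json=training, headers=HEADERS)
    resp.raise_for_status()
    training_id = resp.json()["id"]

    # Poll the job to monitor state, metrics, and checkpoints (assumed endpoint).
    status = requests.get(f"{API}/trainings/{training_id}", headers=HEADERS).json()
    print(status.get("state"), status.get("latest_checkpoint"))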

Outcome: Trained model with checkpoints available.

Next: Evaluate models using benchmarks.

Stage 4: Evaluate Models

Objective: Evaluate trained models using benchmarks.

Steps:

  1. Navigate to Benchmark section
  2. Create Benchmark (if none exists):
    • Enter benchmark name and description
    • Select task type (Classification, Regression, etc.)
    • Select datasets for evaluation
    • Configure metrics
  3. Navigate to Leaderboard section
  4. Click "New Evaluation"
  5. Select Checkpoint: Choose checkpoint to evaluate
  6. Select Benchmark: Choose benchmark(s) to evaluate against
  7. Run Evaluation
  8. Monitor Evaluation: Track evaluation progress
  9. View Results: Results appear in the leaderboard
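
A scripted version of this evaluation flow might look like the hypothetical sketch below; the benchmark and evaluation endpoints, task type values, and metric names are all assumptions.

    import os
    import requests

    API = os.environ.get("NEOSPACE_API", "https://neospace.example.com/api")
    HEADERS = {"Authorization": f"Bearer {os.environ['NEOSPACE_TOKEN']}"}

    # Create a benchmark once; later evaluations can reuse it.
    benchmark = {
        "name": "sales-regression-benchmark",
        "task_type": "regression",
        "dataset_ids": ["ds-456"],          # placeholder dataset id
        "metrics": ["rmse", "mae", "r2"],
    }
    resp = requests.post(f"{API}/benchmarks", json=benchmark, headers=HEADERS)
    resp.raise_for_status()
    benchmark_id = resp.json()["id"]

    # Evaluate one checkpoint against the benchmark (placeholder checkpoint id).
    evaluation = {"checkpoint_id": "ckpt-789", "benchmark_ids": [benchmark_id]}
    resp = requests.post(f"{API}/evaluations", json=evaluation, headers=HEADERS)
    resp.raise_for_status()
    print("Evaluation started:", resp.json()["id"])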

Outcome: Model performance evaluated against benchmarks.

Next: Compare models and select best performers.

Stage 5: Compare Performance

Objective: Compare models and identify best performers.

Steps:

  1. Navigate to Leaderboard section
  2. Review Stats: Check overview statistics
  3. Apply Filters (if needed):
    • Filter by benchmarks
    • Filter by trainings
    • Filter by date range
    • Search by name
  4. Sort Results: Sort by relevant metrics
  5. Compare Models:
    • Select models to compare
    • Review metrics side-by-side
    • Analyze performance differences
  6. Select Best Model:
    • Review rankings
    • Consider all metrics
    • Align with business goals
    • Select best checkpoint
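
Once results are on the leaderboard, comparison is essentially filtering and sorting a table of metrics. The sketch below assumes a hypothetical endpoint that returns leaderboard entries as JSON; the field names and ids are placeholders.

    import os
    import requests

    API = os.environ.get("NEOSPACE_API", "https://neospace.example.com/api")
    HEADERS = {"Authorization": f"Bearer {os.environ['NEOSPACE_TOKEN']}"}

    # Fetch leaderboard entries for one benchmark (assumed endpoint and filters).
    entries = requests.get(
        f"{API}/leaderboard",
        params={"benchmark_id": "bm-001", "from": "2024-01-01"},
        headers=HEADERS,
    ).json()["entries"]

    # Rank by the metric that matters for the business goal (lower RMSE is better).
    ranked = sorted(entries, key=lambda e: e["metrics"]["rmse"])
    for entry in ranked[:5]:
        print(entry["checkpoint_id"], entry["training_name"], entry["metrics"])

    best_checkpoint = ranked[0]["checkpoint_id"]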

Outcome: Best performing model identified.

Next: Deploy model to inference server.

Stage 6: Deploy Models

Objective: Deploy best model to production.

Steps:

  1. Navigate to Inference Server section
  2. Click "Deploy Model"
  3. Select Model: Choose the best checkpoint
  4. Configure Deployment:
    • Set instance count
    • Configure GPU allocation
    • Set resource limits
  5. Deploy Model: Start deployment
  6. Monitor Deployment: Track deployment progress
  7. Verify Serving: Test that the model is serving correctly
  8. Monitor Performance:
    • Track inference latency
    • Monitor throughput
    • Check system health
  9. Scale if Needed: Adjust instances based on demand
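
Deployment then amounts to one more configuration payload plus monitoring. The sketch below is hypothetical; the deployment endpoint, resource fields, status values, and prediction route are assumptions.

    import os
    import time
    import requests

    API = os.environ.get("NEOSPACE_API", "https://neospace.example.com/api")
    HEADERS = {"Authorization": f"Bearer {os.environ['NEOSPACE_TOKEN']}"}

    deployment = {
        "checkpoint_id": "ckpt-789",     # the checkpoint selected in Stage 5
        "instances": 1,                  # start small, scale on demand
        "resources": {"gpus_per_instance": 1, "memory_gb": 16},
    }
    resp = requests.post(f"{API}/deployments", json=deployment, headers=HEADERS)
    resp.raise_for_status()
    deployment_id = resp.json()["id"]

    # Wait until the deployment reports that it is serving (assumed status values).
    while True:
        state = requests.get(f"{API}/deployments/{deployment_id}", headers=HEADERS).json()["state"]
        if state in ("serving", "failed"):
            break
        time.sleep(10)

    # Smoke-test with a single request before routing real traffic to the model.
    pred = requests.post(
        f"{API}/deployments/{deployment_id}/predict",
        json={"inputs": [{"store_id": 42, "date": "2024-06-01", "promotion_flag": 1}]},
        headers=HEADERS,
    )
    print(pred.status_code, pred.json())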

Outcome: Model deployed and serving predictions in production.

Next: Monitor and optimize production model.

Component Integration

How components work together:

Data Flow:

  • Connectors → Provide data sources
  • Datasets → Organize and prepare data
  • Training → Train models on datasets
  • Benchmark → Evaluate models
  • Leaderboard → Compare and select models
  • Inference Server → Deploy and serve models

Integration Points:

  • Datasets use data from Connectors
  • Training uses READY Datasets
  • Benchmarks evaluate Training checkpoints
  • Leaderboard aggregates Benchmark results
  • Inference Server deploys selected checkpoints
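
These integration points mean the whole workflow reduces to a chain of ids, each produced by one stage and consumed by the next. The sketch below is purely illustrative: the stub functions stand in for the hypothetical API calls shown in the stage sketches above and return placeholder ids.

    # Purely illustrative stubs; each stands in for one hypothetical API call
    # from Stages 1-6 and returns the id of the object that stage creates.

    def create_connector() -> str:
        return "conn-123"        # Stage 1: Connectors provide data sources

    def create_dataset(connector_id: str) -> str:
        return "ds-456"          # Stage 2: Datasets organize connector data

    def train_model(dataset_id: str) -> str:
        return "ckpt-789"        # Stage 3: Training produces checkpoints

    def evaluate_checkpoint(checkpoint_id: str) -> dict:
        return {"rmse": 3.2}     # Stage 4: Benchmarks score checkpoints

    def deploy_checkpoint(checkpoint_id: str) -> str:
        return "dep-001"         # Stage 6: Inference Server serves the checkpoint

    # The dependency chain mirrors the integration points listed above.
    connector_id = create_connector()
    dataset_id = create_dataset(connector_id)
    checkpoint_id = train_model(dataset_id)
    metrics = evaluate_checkpoint(checkpoint_id)   # results appear on the Leaderboard (Stage 5)
    deployment_id = deploy_checkpoint(checkpoint_id)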

Best Practices Workflow

Recommended practices for the complete workflow:

Data Preparation:

  • Ensure data quality before creating datasets
  • Use appropriate dataset structures
  • Configure proper data splits
  • Validate dataset health

Model Training:

  • Start with small models for experimentation
  • Monitor training closely
  • Save checkpoints regularly
  • Track training experiments

Model Evaluation:

  • Use consistent benchmarks
  • Evaluate all relevant checkpoints
  • Consider multiple metrics
  • Track evaluation history

Model Selection:

  • Don't rely solely on rankings
  • Consider business requirements
  • Validate on test data
  • Document selection decisions

Model Deployment:

  • Start with minimal instances
  • Monitor closely after deployment
  • Test thoroughly before production
  • Have a rollback plan ready

Workflow Optimization

Tips for optimizing your workflow:

Efficiency:

  • Reuse datasets across trainings
  • Use consistent benchmarks
  • Automate evaluation where possible
  • Track experiments systematically

Quality:

  • Ensure data quality at each stage
  • Validate configurations before proceeding
  • Monitor performance at each stage
  • Review and optimize continuously

Collaboration:

  • Document decisions and configurations
  • Share datasets and benchmarks
  • Collaborate on model development
  • Review and learn from experiments

Next Steps