Dataset Usage Guide
A step-by-step guide to creating datasets, with best practices and troubleshooting for common issues.
Complete Step-by-Step Guide
Follow these six steps to create a dataset and use it in training:
Step 1: Create Dataset
- Navigate to Datasets section
- Click "New Dataset"
- Enter basic information
- Select the dataset structure (for example, EVENT_BASED) and one or more data domains (required)
Step 2: File Analysis
- Connect data source
- Upload or select files
- Run file analysis
- Review analysis results
Step 3: Modeling Configuration
- Select feature columns
- Select target columns
- Exclude unwanted features
- Configure the timestamp (if the dataset structure is EVENT_BASED)
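The choices in this step can be sanity-checked before moving on. The following is a minimal sketch, not a platform API: the `ModelingConfig` class and its field names are hypothetical, and it assumes that an EVENT_BASED dataset needs a timestamp and that a column should not be both a feature and a target.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ModelingConfig:
    """Hypothetical container for the Step 3 choices (not a platform type)."""
    structure: str                      # e.g. "EVENT_BASED", as named in this guide
    features: list
    targets: list
    excluded: list = field(default_factory=list)
    timestamp: Optional[str] = None

    def problems(self):
        """Return a list of problems; an empty list means the config looks consistent."""
        found = []
        if not self.features:
            found.append("no feature columns selected")
        if not self.targets:
            found.append("no target columns selected")
        overlap = set(self.features) & set(self.targets)
        if overlap:
            found.append(f"columns used as both feature and target: {sorted(overlap)}")
        still_used = set(self.excluded) & (set(self.features) | set(self.targets))
        if still_used:
            found.append(f"excluded columns still selected: {sorted(still_used)}")
        # Assumption: EVENT_BASED datasets need a timestamp to be configured.
        if self.structure == "EVENT_BASED" and not self.timestamp:
            found.append("EVENT_BASED structure requires a timestamp")
        return found
```

Returning a list of problems rather than raising on the first one lets all configuration issues be reported at once.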
Step 4: Data Partitioning
- Choose partition method
- Configure training/validation split
- Set date ranges (if applicable)
- Review partition configuration
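To illustrate what the two common partition methods do, here is a small sketch of a date-based split and a random ratio split. The function names, the row format (a list of dicts), and the default 80/20 ratio are assumptions for illustration; the platform's own partitioning options may differ.

```python
import random
from datetime import date


def split_by_date(rows, timestamp_key, validation_start):
    """Rows dated before validation_start go to training, the rest to validation."""
    train = [r for r in rows if r[timestamp_key] < validation_start]
    validation = [r for r in rows if r[timestamp_key] >= validation_start]
    return train, validation


def split_by_ratio(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of the rows with a fixed seed, then cut at train_fraction."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


events = [
    {"day": date(2024, 1, 5), "value": 1},
    {"day": date(2024, 2, 9), "value": 2},
    {"day": date(2024, 3, 1), "value": 3},
    {"day": date(2024, 3, 20), "value": 4},
]
train, validation = split_by_date(events, "day", date(2024, 3, 1))
```

The date-based split keeps every validation row at or after the cutoff, which is the property that matters for time-ordered data; the ratio split is only appropriate when row order carries no meaning.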
Step 5: Process Dataset
- Review all configurations
- Click "Process Dataset"
- Monitor processing progress
- Wait for READY status
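Waiting for the READY status can be scripted as a polling loop. This is a minimal sketch under stated assumptions: `get_status` is a hypothetical caller-supplied function that returns the dataset's current status string (this guide does not describe a status API), and the failure status names `FAILED` and `ERROR` are guesses.

```python
import time


def wait_until_ready(get_status, poll_seconds=5.0, timeout_seconds=600.0,
                     sleep=time.sleep):
    """Poll get_status() until it returns "READY".

    Raises RuntimeError on an assumed failure status and TimeoutError
    when the deadline passes first.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status == "READY":
            return status
        if status in ("FAILED", "ERROR"):  # assumed failure status names
            raise RuntimeError(f"dataset processing ended with status {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"dataset still {status} after {timeout_seconds}s")
        sleep(poll_seconds)
```

The `sleep` parameter is injectable so the loop can be exercised in tests without real delays.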
Step 6: Use in Training
- Once the status is READY, the dataset is available for training
- Select it when configuring training jobs
- Monitor dataset usage
- Track dataset performance
Best Practices
Data Quality:
- Ensure data quality before creating datasets
- Clean and preprocess data as needed
- Validate data formats and types
- Check for missing values and outliers
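The checks for missing values and outliers can be run locally before uploading. A rough sketch using only the standard library; the row format (a list of dicts), the function names, and the 1.5 × IQR outlier rule are illustrative choices, not something the platform prescribes.

```python
import statistics


def missing_counts(rows, columns):
    """Count values that are absent, None, or an empty string, per column."""
    return {c: sum(1 for r in rows if r.get(c) in (None, "")) for c in columns}


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


rows = [
    {"age": 31, "city": "Oslo"},
    {"age": None, "city": ""},
    {"age": 40},
]
print(missing_counts(rows, ["age", "city"]))
print(iqr_outliers([10, 12, 11, 13, 12, 11, 95]))
```

A flagged value is only a candidate for review; whether it is an error or a legitimate extreme depends on the data.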
Feature Selection:
- Select relevant features
- Avoid data leakage
- Consider feature interactions
- Balance feature count and quality
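One rough way to screen for data leakage is to look for features that track the target almost perfectly. The sketch below is a heuristic only, and the function names and the 0.98 threshold are arbitrary assumptions: a near-perfect correlation is a reason to inspect a column, not proof of leakage, and leakage can also exist without it.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def suspicious_features(columns, target, threshold=0.98):
    """Names of features whose absolute correlation with the target reaches threshold."""
    return [name for name, values in columns.items()
            if abs(pearson(values, target)) >= threshold]


columns = {
    "refund_issued": [0, 1, 0, 1, 1],   # mirrors the target: worth inspecting
    "age": [30, 25, 41, 38, 29],
}
target = [0, 1, 0, 1, 1]
print(suspicious_features(columns, target))
```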
Partitioning:
- Use a partition strategy appropriate to the data
- Maintain temporal order for time-series data
- Ensure sufficient data in each partition
- Avoid data leakage between partitions
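Two of these points can be verified mechanically once the partitions exist: that no record appears in both partitions, and that training data precedes validation data in time. A minimal sketch, assuming rows are dicts and that the hypothetical `key` and `timestamp_key` fields identify a record and its time.

```python
def partition_overlap(train_rows, validation_rows, key):
    """Return the key values that appear in both partitions."""
    return {r[key] for r in train_rows} & {r[key] for r in validation_rows}


def is_temporally_ordered(train_rows, validation_rows, timestamp_key):
    """True when every training row is strictly earlier than every validation row."""
    if not train_rows or not validation_rows:
        return True
    latest_train = max(r[timestamp_key] for r in train_rows)
    earliest_validation = min(r[timestamp_key] for r in validation_rows)
    return latest_train < earliest_validation


# ISO-8601 date strings compare chronologically, so plain strings work here.
train_rows = [{"id": 1, "ts": "2024-01-05"}, {"id": 2, "ts": "2024-02-09"}]
validation_rows = [{"id": 2, "ts": "2024-03-01"}, {"id": 3, "ts": "2024-03-20"}]
print(partition_overlap(train_rows, validation_rows, "id"))
print(is_temporally_ordered(train_rows, validation_rows, "ts"))
```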
Organization:
- Use descriptive names
- Add clear descriptions
- Organize by domain (a dataset can be assigned multiple domains)
- Track dataset versions
- Use consistent domain naming across datasets
Performance:
- Optimize dataset size
- Use efficient data formats
- Consider data sampling for large datasets
- Monitor processing times
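For sampling a large dataset without loading it all into memory, reservoir sampling is one option. The sketch below implements the classic single-pass variant (Algorithm R); the function name and the fixed default seed are illustrative assumptions, and the platform may offer its own sampling controls.

```python
import random


def reservoir_sample(rows, k, seed=0):
    """Return k rows chosen uniformly at random from an iterable of unknown length."""
    rng = random.Random(seed)
    sample = []
    for index, row in enumerate(rows):
        if index < k:
            sample.append(row)          # fill the reservoir first
        else:
            j = rng.randint(0, index)   # inclusive on both ends
            if j < k:
                sample[j] = row         # replace with probability k / (index + 1)
    return sample


subset = reservoir_sample(range(1_000_000), k=5)
```

Because `rows` is consumed once as an iterator, the same function works on a file handle or a database cursor. If the input has fewer than `k` rows, all of them are returned.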
Common Troubleshooting
Issue: Dataset Creation Fails
- Symptom: Cannot create dataset
- Possible Causes:
  - Invalid dataset name
  - Missing required fields
  - Invalid file format
- Solutions:
  - Check dataset name requirements
  - Verify all required fields
  - Check file format compatibility
Issue: File Analysis Fails
- Symptom: File analysis does not complete
- Possible Causes:
  - Invalid file format
  - Corrupted files
  - Insufficient permissions
- Solutions:
  - Verify file format
  - Check file integrity
  - Verify file permissions
  - Check file size limits
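Most of these checks can be run locally before retrying the analysis. A minimal sketch: the allowed extensions and the 100 MiB size limit are placeholder assumptions (the actual supported formats and limits are not stated in this guide), and the checksum is only useful when there is a known-good value to compare against.

```python
import hashlib
import os

ALLOWED_EXTENSIONS = {".csv", ".json", ".parquet"}  # assumption, not from the guide
MAX_BYTES = 100 * 1024 * 1024                        # assumption, not from the guide


def preflight(path, allowed=ALLOWED_EXTENSIONS, max_bytes=MAX_BYTES):
    """Return a list of problems found with the file; empty means none found."""
    if not os.path.isfile(path):
        return ["file not found"]
    problems = []
    extension = os.path.splitext(path)[1].lower()
    if extension not in allowed:
        problems.append(f"unexpected file extension: {extension or '(none)'}")
    if not os.access(path, os.R_OK):
        problems.append("file is not readable by the current user")
    size = os.path.getsize(path)
    if size == 0:
        problems.append("file is empty")
    elif size > max_bytes:
        problems.append(f"file is {size} bytes, over the {max_bytes} byte limit")
    return problems


def sha256_of(path, chunk_size=1 << 20):
    """Hash the file in chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing `sha256_of(path)` against the checksum recorded at the source is a simple way to detect a file corrupted in transfer.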
Issue: Processing Fails
- Symptom: Dataset processing fails
- Possible Causes:
  - Invalid configuration
  - Data quality issues
  - Insufficient resources
- Solutions:
  - Review configuration
  - Check data quality
  - Verify resource availability
  - Check processing logs
Issue: Dataset Not Ready
- Symptom: Dataset status is not READY
- Possible Causes:
  - Incomplete configuration
  - Processing not completed
  - Processing errors
- Solutions:
  - Complete all configurations
  - Wait for processing to finish
  - Check for processing errors
  - Review dataset status
Issue: Cannot Use in Training
- Symptom: Dataset not available for training
- Possible Causes:
  - Dataset not READY
  - Missing features or targets
  - Dataset in use elsewhere
- Solutions:
  - Ensure dataset is READY
  - Verify feature/target selection
  - Check dataset dependencies
  - Review training requirements
Next Steps
- Learn about Training to use datasets for model training
- Explore Connectors to connect data sources
- Check Clusters to understand compute infrastructure