Metrics, Evaluation & Reproducibility
- Prefer cross-validation over a single train/test split; report the variance or a confidence interval across folds alongside the mean.
- Record the dataset version and file checksums for every run.
- Log hyperparameters, random seeds, and environment versions for every run.
- Use semantic versioning for application code and container images.
- Consider experiment tracking (e.g., store metrics table + params JSON per run).
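
The checklist above can be sketched as a small per-run record written to JSON. This is a minimal illustration using only the Python standard library, not a prescribed implementation. The function names (`sha256_of_file`, `summarize_folds`, `build_run_record`, `write_run_record`), the record layout, and the choice of SHA-256 are all assumptions made for the example. The interval is a normal approximation over fold scores, which is rough when there are few folds.

```python
import hashlib
import json
import platform
import statistics
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def summarize_folds(scores, z=1.96):
    """Mean, sample std, and an approximate 95% CI across fold scores.

    Assumption: a normal approximation (z = 1.96); needs at least two folds.
    """
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)
    half_width = z * std / len(scores) ** 0.5
    return {
        "n_folds": len(scores),
        "mean": mean,
        "std": std,
        "ci95": [mean - half_width, mean + half_width],
    }


def build_run_record(dataset_path, dataset_version, params, seed, fold_scores):
    """Collect everything needed to reproduce and compare one run."""
    return {
        "dataset": {
            "version": dataset_version,
            "sha256": sha256_of_file(dataset_path),
        },
        "params": params,
        "seed": seed,
        "env": {
            "python": platform.python_version(),
            "platform": platform.platform(),
        },
        "metrics": summarize_folds(fold_scores),
    }


def write_run_record(record, out_dir, run_id):
    """Write the record as <out_dir>/<run_id>.json and return the path."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{run_id}.json"
    out_path.write_text(json.dumps(record, indent=2, sort_keys=True))
    return out_path
```

A run would call `build_run_record(...)` once training and evaluation finish, then `write_run_record(record, "runs", run_id)`. Library versions (for example from `importlib.metadata.version`) could be added under `env` in the same way.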