Set up disciplined ML experiment tracking for reproducibility and comparison. Use when starting a new model project, onboarding a team, or when results cannot be reproduced.
---
name: Experiment Tracking
description: Establishes a disciplined experiment tracking setup covering logging, artifact versioning, and reproducibility standards. Apply when bootstrapping a new ML project, onboarding a team to a tracking tool, or investigating why a result cannot be reproduced.
---
# Experiment Tracking
An experiment no one can reproduce is a result no one can trust. Disciplined tracking is not overhead — it is the minimum viable scientific practice for ML.
## 1. What to Log on Every Run
Inconsistent logging is as bad as no logging.
- Hyperparameters: every value, including defaults — do not rely on code to reconstruct them
- Dataset: name, version or hash, split sizes, and any sampling applied
- Code: git commit SHA; fail the run if the working tree is dirty unless explicitly allowed
- Environment: Python version and key library versions (at minimum the framework, numpy, and pandas)
- Metrics: train, validation, and test values; log per-epoch curves for iterative models
- Artifacts: model checkpoint path, preprocessor path, and evaluation report path
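The checklist above can be sketched as a small helper that assembles one record per run. This is a minimal illustration using only the standard library, not tied to any particular tracking tool: the function names (`git_state`, `file_sha256`, `library_versions`, `build_run_record`), the record layout, and the `allow_dirty` flag are hypothetical, and hashing a single dataset file is an assumption about how the dataset is stored.

```python
import hashlib
import platform
import subprocess
from importlib import metadata


def git_state(repo_dir="."):
    """Return (commit_sha, is_dirty) for the git repository at repo_dir."""
    sha = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout.strip()
    status = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    # Any porcelain output means uncommitted or untracked changes.
    return sha, bool(status.strip())


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large datasets do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def library_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # record the absence explicitly
    return versions


def build_run_record(hyperparams, dataset_path, commit_sha, is_dirty,
                     allow_dirty=False, packages=("numpy", "pandas")):
    """Assemble the per-run record; refuse a dirty tree unless allowed."""
    if is_dirty and not allow_dirty:
        raise RuntimeError(
            "working tree is dirty; commit changes or pass allow_dirty=True"
        )
    return {
        "hyperparameters": dict(hyperparams),
        "dataset": {
            "path": str(dataset_path),
            "sha256": file_sha256(dataset_path),
        },
        "code": {"commit": commit_sha, "dirty": is_dirty},
        "environment": {
            "python": platform.python_version(),
            "libraries": library_versions(packages),
        },
    }
```

In a training script, one would call `git_state()` first and pass its result into `build_run_record(...)`, then hand the returned dictionary to whatever tracker the project uses. Keeping the git call separate from the record builder makes the dirty-tree check easy to exercise without a repository. Metrics and artifact paths are left out here because their shape depends on the model and the tracker.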
## 2. Experiment Naming and Organization
Naming is the index — garbage names make the tracker useless.
- Format: <project>/<hypothesis>/<variant> (e.g., churn/feature-selection/drop-low-variance)…
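
One way to keep names consistent is to build them through a single function instead of typing them by hand. The sketch below assumes each segment is lowercase alphanumeric words joined by hyphens, which is inferred from the example above and is not a rule the source states; `experiment_name` is a hypothetical helper.

```python
import re

# Assumed segment style: lowercase letters/digits, words joined by hyphens.
_SEGMENT = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def experiment_name(project, hypothesis, variant):
    """Join three validated segments into <project>/<hypothesis>/<variant>."""
    parts = (project, hypothesis, variant)
    for part in parts:
        if not _SEGMENT.fullmatch(part):
            raise ValueError(f"invalid experiment name segment: {part!r}")
    return "/".join(parts)
```

Rejecting malformed segments at creation time is cheaper than cleaning up a tracker full of inconsistent names later.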