MLFlow Initiative: Establishing Data Science Process Standards
Strategic Problem
As the first and only data scientist at Act-On Software, I identified a fundamental process gap: the organization lacked any systematic approach to machine learning experimentation and model management. Engineering and management had no understanding of data science workflows, and existing processes were designed entirely around software development rather than ML development.
The specific challenges I was facing:
- Running 20-30 experiments daily to test different hyperparameter and feature combinations, with no systematic way to track results
- Models and training code stored in Bitbucket alongside application code, even though a source-control system is a poor fit for machine learning artifacts
- No centralized repository for trained models
- Manual, time-consuming model handoffs to engineering
- Difficulty identifying best-performing models and feature sets across hundreds of experiments
- No reproducible process for model development
Without intervention, this lack of process would undermine the quality and reliability of all ML work at the company.
Analysis and Strategic Approach
I evaluated the organization's needs against industry-standard ML development practices. The core insight: data science requires fundamentally different processes than software engineering. Version control systems like Bitbucket are designed for code versioning, not for managing machine learning experiments, models, or the complex relationships between hyperparameters, features, and performance metrics.
Selected MLFlow as the solution based on the following criteria:
- Industry-standard open-source platform specifically designed for ML lifecycle management
- Experiment tracking capabilities that log parameters, metrics, and artifacts systematically
- Model registry providing centralized storage with versioning and lineage tracking
- Model serving functionality that could standardize deployment interfaces
- A clear improvement over the status quo, which was no defined process at all
The alternative was continuing with ad-hoc workflows that provided no confidence in model selection decisions and no systematic way to prove models had been properly tested.
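For concreteness, a minimal sketch of the registry workflow referenced above, assuming the tracking URI points at a database-backed MLFlow server (the registry requires one); the model name and throwaway classifier are placeholders, not the project code:

```python
import mlflow
import mlflow.sklearn
from mlflow.tracking import MlflowClient
from sklearn.dummy import DummyClassifier

# Illustrative only: "spam-filter" and the dummy classifier are placeholders;
# MLFLOW_TRACKING_URI is assumed to point at the shared MLFlow server.
with mlflow.start_run():
    clf = DummyClassifier(strategy="most_frequent").fit([[0], [1]], [0, 1])
    mlflow.sklearn.log_model(
        clf, artifact_path="model", registered_model_name="spam-filter"
    )

# Each registered version records the run that produced it (lineage),
# and new versions are created automatically on subsequent log_model calls.
client = MlflowClient()
for mv in client.search_model_versions("name='spam-filter'"):
    print(mv.version, mv.run_id, mv.current_stage)
```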
Stakeholder Engagement and Advocacy
Management and engineering had no frame of reference for data science processes. I wrote a white paper documenting:
- The distinction between experiment tracking and version control
- Why Bitbucket is inappropriate for storing machine learning models
- The scale of the tracking problem: 20-30 experiments per day generating hundreds of model variants that needed systematic comparison
- How MLFlow would provide the tools to simplify experiment management and save time
- Process improvement benefits: reproducibility, confidence in model selection, systematic testing documentation
Presented the white paper to management to get approval for the initiative.
Implementation required ops to configure and maintain an MLFlow server. Ops was heavily committed to other initiatives and had no familiarity with data science requirements, so I kept the initiative visible through informal conversations in the break room and regular reminders to my manager about the importance of this infrastructure. This advocacy continued for approximately 5 months before ops had the capacity to complete the server configuration.
Phase 1: Experiment Tracking Implementation
Deployed MLFlow experiment tracking approximately 6 months after joining Act-On. This delay was primarily due to ops availability constraints rather than technical complexity.
The experiment tracking system logged:
- Parameters: Hyperparameter configurations and feature selections being tested
- Metrics: Performance measurements including F1 score, precision, and recall for the spam filter detection project; average reputation metrics for the fuzzy logic email delivery optimization
- Artifacts: Trained models stored in the centralized MLFlow repository
- Feature analysis: Systematic tracking of which input features (timing, spoofing indicators, browser version, and other email metadata) contributed to model performance
The MLFlow tracking UI provided dashboard capabilities to compare experiments, identify best-performing configurations, and trace the development history of each model. This replaced the previous approach of manually reviewing tens or hundreds of Jupyter notebooks to find optimal models.
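As an illustration of the logging pattern (not the production training code), a run along these lines captures parameters, metrics, and the model artifact in one place; the experiment name, synthetic data, and hyperparameters are placeholders, and MLFLOW_TRACKING_URI is assumed to point at the shared server:

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

mlflow.set_experiment("spam-filter-detection")  # illustrative experiment name

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

params = {"n_estimators": 300, "max_depth": 12}  # hyperparameters under test
with mlflow.start_run():
    mlflow.log_params(params)
    clf = RandomForestClassifier(**params, random_state=0).fit(X_train, y_train)
    preds = clf.predict(X_val)
    mlflow.log_metrics({
        "f1": f1_score(y_val, preds),
        "precision": precision_score(y_val, preds),
        "recall": recall_score(y_val, preds),
    })
    # The trained model lands in the centralized artifact store
    mlflow.sklearn.log_model(clf, artifact_path="model")

# Programmatic counterpart to the tracking UI comparison
best = mlflow.search_runs(order_by=["metrics.f1 DESC"], max_results=1)
print(best[["run_id", "metrics.f1", "params.n_estimators"]])
```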
Phase 2: Model Serving Implementation
The model handoff process to engineering was manual and inefficient. I provided Python files containing model code and serialized model artifacts stored in Bitbucket. Engineering had difficulty integrating Python code into their infrastructure and required detailed explanations of each model's integration requirements.
Initiated MLFlow model serving the following year to standardize and automate this process. The approach:
- Packaged models in Docker containers with FastAPI interfaces, providing engineering with standardized REST API endpoints
- Utilized MLFlow's model serving capabilities to enable engineering to load models and feature names automatically via MLFlow function calls
- Eliminated manual handoff documentation and integration explanations
- Enabled seamless model updates: when models or features changed, the same MLFlow calls picked up the changes automatically, with no engineering intervention required
This transition required educating engineering on the new deployment approach and coordinating the shift from manual code handoffs to containerized serving infrastructure.
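A minimal sketch of that serving pattern, assuming a model registered under an illustrative name and promoted to the Production stage, and that a signature was logged with the model so feature names can be recovered; this is the kind of app that gets packaged into the Docker container behind the FastAPI endpoint, not the exact production code:

```python
import mlflow.pyfunc
import pandas as pd
from fastapi import FastAPI

# Assumptions: "spam-filter" is a placeholder registered-model name, a
# Production stage exists, and MLFLOW_TRACKING_URI points at the MLFlow server.
MODEL_URI = "models:/spam-filter/Production"

app = FastAPI()
model = mlflow.pyfunc.load_model(MODEL_URI)
# Feature names come from the logged model signature rather than a handoff doc
feature_names = model.metadata.get_input_schema().input_names()

@app.post("/predict")
def predict(payload: dict):
    # Engineering posts raw feature values; column order is taken from MLFlow
    row = pd.DataFrame([payload])[feature_names]
    return {"prediction": model.predict(row).tolist()}
```

Served behind uvicorn inside the container, this gives engineering a language-agnostic REST endpoint, so a model update becomes a registry promotion rather than another code handoff.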
Outcome and Impact
Established reproducible, industry-standard ML development processes at Act-On Software where none existed previously. The specific improvements:
- Systematic tracking of all experiments with complete parameter, metric, and artifact logging
- Centralized model repository replacing inappropriate Bitbucket storage
- Dashboard-based model comparison replacing manual notebook review
- Confidence in model selection decisions based on documented evidence of systematic testing
- Automated model serving that eliminated manual handoff processes and integration challenges
- Significant time savings through efficient experiment management and automated deployment
These processes supported the spam filter detection and fuzzy logic email delivery optimization projects, providing the infrastructure necessary to prove models were properly tested and to track the development of production ML systems.
Development Environment
- MLFlow
- Jupyter Lab
- Docker
- FastAPI