MLFlow Initiative: Establishing Data Science Process Standards
Strategic Problem
As the first and only data scientist at Act-On Software, I identified a fundamental process gap: the organization lacked any systematic approach to machine learning experimentation and model management. Engineering and management had no understanding of data science workflows, and existing processes were designed entirely around software development rather than ML development.
The specific challenges I was facing:
- Running 20-30 experiments daily to test different hyperparameter and feature combinations, with no systematic way to track results
- Models and training code stored in Bitbucket alongside application code, even though a source-control system is a poor fit for machine learning artifacts
- No centralized repository for trained models
- Manual, time-consuming model handoffs to engineering
- Difficulty identifying best-performing models and feature sets across hundreds of experiments
- No reproducible process for model development
Without intervention, this lack of process would undermine the quality and reliability of all ML work at the company.
Analysis and Strategic Approach
I evaluated the organization's needs against industry-standard ML development practices. The core insight: data science requires fundamentally different processes than software engineering. Version control systems like Bitbucket are designed for code versioning, not for managing machine learning experiments, models, or the complex relationships between hyperparameters, features, and performance metrics.
Selected MLFlow as the solution based on the following criteria:
- Industry-standard open-source platform specifically designed for ML lifecycle management
- Experiment tracking capabilities that log parameters, metrics, and artifacts systematically
- Model registry providing centralized storage with versioning and lineage tracking
- Model serving functionality that could standardize deployment interfaces
- A clear improvement over the status quo, which was no defined process at all
The alternative was continuing with ad-hoc workflows that provided no confidence in model selection decisions and no systematic way to prove models had been properly tested.
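For concreteness, a minimal sketch of the registry workflow referenced above, assuming the tracking URI points at a database-backed MLFlow server (the registry requires one); the model name and throwaway classifier are placeholders, not the project code:

```python
import mlflow
import mlflow.sklearn
from mlflow.tracking import MlflowClient
from sklearn.dummy import DummyClassifier

# Illustrative only: "spam-filter" and the dummy classifier are placeholders;
# MLFLOW_TRACKING_URI is assumed to point at the shared MLFlow server.
with mlflow.start_run():
    clf = DummyClassifier(strategy="most_frequent").fit([[0], [1]], [0, 1])
    mlflow.sklearn.log_model(
        clf, artifact_path="model", registered_model_name="spam-filter"
    )

# Each registered version records the run that produced it (lineage),
# and new versions are created automatically on subsequent log_model calls.
client = MlflowClient()
for mv in client.search_model_versions("name='spam-filter'"):
    print(mv.version, mv.run_id, mv.current_stage)
```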
Stakeholder Engagement and Advocacy
Management and engineering had no frame of reference for data science processes. I wrote a white paper documenting:
- The distinction between experiment tracking and version control
- Why Bitbucket is inappropriate for storing machine learning models
- The scale of the tracking problem: 20-30 experiments per day generating hundreds of model variants that needed systematic comparison
- How MLFlow would provide the tools to simplify experiment management and save time
- Process improvement benefits: reproducibility, confidence in model selection, systematic testing documentation
Presented the white paper to management to get approval for the initiative.
Implementation required ops to configure and maintain an MLFlow server. Ops was heavily committed to other initiatives and had no familiarity with data science requirements, so I kept the initiative visible through informal conversations in the break room and regular reminders to my manager about the importance of this infrastructure. This advocacy continued for approximately 5 months before ops had the capacity to complete the server configuration.
Phase 1: Experiment Tracking Implementation
Deployed MLFlow experiment tracking approximately 6 months after joining Act-On. This delay was primarily due to ops availability constraints rather than technical complexity.
The experiment tracking system logged:
- Parameters: Hyperparameter configurations and feature selections being tested
- Metrics: Performance measurements including F1 score, precision, and recall for the spam filter detection project; average reputation metrics for the fuzzy logic email delivery optimization
- Artifacts: Trained models stored in the centralized MLFlow repository
- Feature analysis: Systematic tracking of which input features (timing, spoofing indicators, browser version, and other email metadata) contributed to model performance
The MLFlow tracking UI provided dashboard capabilities to compare experiments, identify best-performing configurations, and trace the development history of each model. This replaced the previous approach of manually reviewing tens or hundreds of Jupyter notebooks to find optimal models.
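As an illustration of the logging pattern (not the production training code), a run along these lines captures parameters, metrics, and the model artifact in one place; the experiment name, synthetic data, and hyperparameters are placeholders, and MLFLOW_TRACKING_URI is assumed to point at the shared server:

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

mlflow.set_experiment("spam-filter-detection")  # illustrative experiment name

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

params = {"n_estimators": 300, "max_depth": 12}  # hyperparameters under test
with mlflow.start_run():
    mlflow.log_params(params)
    clf = RandomForestClassifier(**params, random_state=0).fit(X_train, y_train)
    preds = clf.predict(X_val)
    mlflow.log_metrics({
        "f1": f1_score(y_val, preds),
        "precision": precision_score(y_val, preds),
        "recall": recall_score(y_val, preds),
    })
    # The trained model lands in the centralized artifact store
    mlflow.sklearn.log_model(clf, artifact_path="model")

# Programmatic counterpart to the tracking UI comparison
best = mlflow.search_runs(order_by=["metrics.f1 DESC"], max_results=1)
print(best[["run_id", "metrics.f1", "params.n_estimators"]])
```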
Phase 2: Model Serving Implementation
The model handoff process to engineering was manual and inefficient. I provided Python files containing model code and serialized model artifacts stored in Bitbucket. Engineering had difficulty integrating Python code into their infrastructure and required detailed explanations of each model's integration requirements.
Initiated MLFlow model serving the following year to standardize and automate this process. The approach:
- Packaged models in Docker containers with FastAPI interfaces, providing engineering with standardized REST API endpoints
- Utilized MLFlow's model serving capabilities to enable engineering to load models and feature names automatically via MLFlow function calls
- Eliminated manual handoff documentation and integration explanations
- Enabled seamless model updates: when models or features changed, the same MLFlow calls picked up the changes automatically, with no engineering intervention required
This transition required educating engineering on the new deployment approach and coordinating the shift from manual code handoffs to containerized serving infrastructure.
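A minimal sketch of that serving pattern, assuming a model registered under an illustrative name and promoted to the Production stage, and that a signature was logged with the model so feature names can be recovered; this is the kind of app that gets packaged into the Docker container behind the FastAPI endpoint, not the exact production code:

```python
import mlflow.pyfunc
import pandas as pd
from fastapi import FastAPI

# Assumptions: "spam-filter" is a placeholder registered-model name, a
# Production stage exists, and MLFLOW_TRACKING_URI points at the MLFlow server.
MODEL_URI = "models:/spam-filter/Production"

app = FastAPI()
model = mlflow.pyfunc.load_model(MODEL_URI)
# Feature names come from the logged model signature rather than a handoff doc
feature_names = model.metadata.get_input_schema().input_names()

@app.post("/predict")
def predict(payload: dict):
    # Engineering posts raw feature values; column order is taken from MLFlow
    row = pd.DataFrame([payload])[feature_names]
    return {"prediction": model.predict(row).tolist()}
```

Served behind uvicorn inside the container, this gives engineering a language-agnostic REST endpoint, so a model update becomes a registry promotion rather than another code handoff.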
Outcome and Impact
Established reproducible, industry-standard ML development processes at Act-On Software where none existed previously. The specific improvements:
- Systematic tracking of all experiments with complete parameter, metric, and artifact logging
- Centralized model repository replacing inappropriate Bitbucket storage
- Dashboard-based model comparison replacing manual notebook review
- Confidence in model selection decisions based on documented evidence of systematic testing
- Automated model serving that eliminated manual handoff processes and integration challenges
- Significant time savings through efficient experiment management and automated deployment
These processes supported the spam filter detection and fuzzy logic email delivery optimization projects, providing the infrastructure necessary to prove models were properly tested and to track the development of production ML systems.
Development Environment
- MLFlow
- Jupyter Lab
- Docker
- FastAPI