3D Body Scan Size Classification Research

The Challenge

Online clothing retailers face significant costs from product returns driven by size mismatches. An early-stage startup aimed to solve this problem using 3D body scans to provide accurate size recommendations. The technical challenge: develop machine learning models that could classify clothing sizes from scan data collected via portable devices.

The project required building classification systems from scratch across multiple technical approaches while working within fundamental constraints on data collection, scanner hardware capabilities, and timeline. Key technical challenges included:

Limited training data - 500 examples to classify 36 different women's sizes
Severe class imbalance - most data came from petite sizes, while the target market was plus sizes
Scanner hardware limitations - portable scanner resolution insufficient for accurate body measurements
Data quality issues - mislabeled examples and measurement inconsistencies
Multiple size classification systems required - separate models for men's and women's sizing

Multiple Technical Approaches

I explored three distinct machine learning architectures to identify the most effective approach given the available data and hardware constraints:

Point Cloud Classification with PointNet

The initial approach used fine-tuned PointNet models to classify sizes directly from 3D point cloud data generated by body scans. Point clouds offered the advantage of working with raw spatial data without requiring measurement extraction. However, the portable scanner's limited resolution and the highly correlated points produced by partial body scans (neck to upper abdomen only) prevented the model from achieving acceptable accuracy.
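As an illustration of the kind of preprocessing a point cloud classifier typically needs, the sketch below centers a scan, scales it into the unit sphere, and resamples it to a fixed number of points. This is not the project's actual pipeline: the function names, the toy scan, and the point count are assumptions, and plain Python lists are used only to keep the example dependency-free (NumPy or Open3D would be the usual choice).

```python
import math
import random


def normalize_point_cloud(points):
    """Center points on their centroid and scale them into the unit sphere."""
    n = len(points)
    centroid = [sum(p[i] for p in points) / n for i in range(3)]
    centered = [[p[i] - centroid[i] for i in range(3)] for p in points]
    max_radius = max(math.sqrt(sum(c * c for c in p)) for p in centered)
    if max_radius == 0:
        return centered  # degenerate scan: all points identical
    return [[c / max_radius for c in p] for p in centered]


def sample_points(points, num_points, seed=0):
    """Return exactly num_points points.

    Samples without replacement when the scan is large enough,
    and with replacement when it has too few points.
    """
    rng = random.Random(seed)
    if len(points) >= num_points:
        return rng.sample(points, num_points)
    return [rng.choice(points) for _ in range(num_points)]


# Toy scan with four points (hypothetical values, arbitrary units).
scan = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
model_input = sample_points(normalize_point_cloud(scan), num_points=8)
```

Normalization removes differences in position and overall scale between scans, which is convenient for training but also discards absolute size unless it is supplied to the model separately.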

Multimodal Neural Network

Developed a multimodal architecture combining:

Fine-tuned EfficientNet for processing silhouette images generated from scan data
Multi-layer perceptron trained on tabular measurement features
Combined MLP layer integrating outputs from both networks

This approach aimed to leverage both visual body shape information and precise measurements. While architecturally sound, data limitations prevented the model from learning robust patterns across the full range of size categories.

Gradient Boosted Decision Trees

Focused on GBDT models (XGBoost, CatBoost, LightGBM) trained on engineered features derived from body measurements. This approach proved most effective when sufficient training data existed for specific size ranges.

Data Analysis & Critical Limitations

Scanner Hardware Analysis

The project used two different scanning systems with significantly different capabilities:

Fixed Scanner: Research-grade system designed for body measurement, capable of high-resolution scans and automated calculation of multiple body measurements. Used for initial data collection.
Portable Scanner: iPhone-based system using ARKit for the production application. Limited to partial body scans (neck to upper abdomen), lower resolution, and subject to distortion from device movement during scanning.

The fundamental limitation: measurements derived from the portable scanner lacked the precision and completeness required for accurate size classification, particularly for distinguishing between adjacent size categories.

Dataset Characteristics

Women's Classifier: 500 total examples across 36 size categories, with severe class imbalance. Data heavily skewed toward petite sizes, while the target customer base was plus-size women. Multiple size categories had zero training examples. Limiting classification to the 8 size categories with sufficient data achieved 80%+ accuracy, but this excluded the commercially critical plus sizes.
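Restricting the problem to size categories with enough examples can be sketched as a simple count-and-filter step. The threshold, the size labels, and the function name below are hypothetical; the source does not state what cutoff was used to arrive at the 8 retained categories.

```python
from collections import Counter


def keep_well_represented(examples, min_count):
    """Keep only examples whose label occurs at least min_count times.

    examples is a list of (features, label) pairs.
    Returns the filtered examples and the sorted list of retained labels.
    """
    counts = Counter(label for _, label in examples)
    kept = {label for label, count in counts.items() if count >= min_count}
    filtered = [(feats, label) for feats, label in examples if label in kept]
    return filtered, sorted(kept)


# Toy data: two well-represented sizes and one size with a single example.
examples = (
    [({"waist_cm": 62.0}, "2P")] * 5
    + [({"waist_cm": 66.0}, "4P")] * 4
    + [({"waist_cm": 110.0}, "22W")] * 1
)
filtered, kept_labels = keep_well_represented(examples, min_count=3)
```

In this toy case the single plus-size example is dropped, which mirrors the trade-off described above: accuracy on the retained categories improves, but the dropped categories cannot be predicted at all.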

Men's Classifier: A substantially larger dataset covering 4 size categories (S/M/L/XL). The initial model showed 85% accuracy, but investigation revealed it had been trained and tested on the same small data subset, so that figure was not a reliable estimate of performance on unseen scans. Expanding to the full available dataset and implementing proper train-test splits improved accuracy to 99%.
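A proper split keeps training and test examples disjoint while preserving the class proportions in both parts. The sketch below shows one way to do a stratified split with the standard library; the function name, the 20% test fraction, and the toy label counts are assumptions rather than the project's settings (scikit-learn's train_test_split with the stratify argument would be the usual tool).

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Split example indices into disjoint train and test sets per class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # At least one test example per class; a class with a single
        # example therefore ends up with no training data.
        n_test = max(1, round(len(indices) * test_fraction))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy label distribution for four sizes (hypothetical counts).
labels = ["S"] * 10 + ["M"] * 20 + ["L"] * 20 + ["XL"] * 10
train_idx, test_idx = stratified_split(labels)
```

Evaluating only on test_idx ensures that reported accuracy reflects examples the model has never seen during training.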

Data Quality Issues

Exploratory analysis revealed multiple data quality problems:

Labeling errors where scan measurements clearly indicated a different size than the assigned label
Outliers including physically impossible measurements (e.g., heights of 2.5 feet)
Systematic bias in data collection - petite and fit individuals were more willing to participate in scanning than plus-size individuals

Collaborated with the sizing team to correct mislabeled data points. For the men's classifier, removing clear outliers improved accuracy by several percentage points. For the women's classifier, data corrections alone couldn't overcome the fundamental issue of insufficient examples across size categories.
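Flagging physically implausible measurements can be sketched as a range check per field. The bounds, field names, and records below are illustrative assumptions, not the thresholds used in the project; flagged records are returned separately so they can be reviewed rather than silently discarded.

```python
# Hypothetical plausibility bounds for adult measurements, in centimeters.
PLAUSIBLE_BOUNDS_CM = {
    "height_cm": (120.0, 230.0),
    "waist_cm": (40.0, 200.0),
}


def is_plausible(record, bounds):
    """Return True if every bounded field lies within its allowed range."""
    return all(low <= record[name] <= high for name, (low, high) in bounds.items())


def split_by_plausibility(records, bounds):
    """Separate records into (kept, flagged) lists based on the bounds."""
    kept = [r for r in records if is_plausible(r, bounds)]
    flagged = [r for r in records if not is_plausible(r, bounds)]
    return kept, flagged


records = [
    {"scan_id": "a", "height_cm": 178.0, "waist_cm": 86.0},
    {"scan_id": "b", "height_cm": 76.2, "waist_cm": 90.0},  # 2.5 feet tall
    {"scan_id": "c", "height_cm": 165.0, "waist_cm": 71.0},
]
kept, flagged = split_by_plausibility(records, PLAUSIBLE_BOUNDS_CM)
```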

Research & Feature Engineering

Conducted a literature review of academic research on 3D body scanning for clothing size classification. The review revealed two critical findings:

Successful implementations required substantially larger datasets than available, with balanced representation across size categories
Height emerged as an important feature across multiple studies, suggesting body proportions were as important as absolute measurements

Based on these insights, I engineered a waist-to-height ratio feature. Research from health departments indicated that this ratio captures body fat distribution more accurately than BMI and requires only two measurements. This feature improved model performance for size categories with adequate training data, particularly when working with measurements from the fixed scanner.
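The ratio itself is a single division of two measurements taken in the same unit. The sketch below shows it as a feature added to a measurement record; the field names and example values are hypothetical.

```python
def waist_to_height_ratio(waist, height):
    """Return waist divided by height; both must use the same unit."""
    if height <= 0:
        raise ValueError("height must be positive")
    return waist / height


def add_ratio_feature(record):
    """Return a copy of the record with a waist_to_height field added."""
    enriched = dict(record)
    enriched["waist_to_height"] = waist_to_height_ratio(
        record["waist_cm"], record["height_cm"]
    )
    return enriched


records = [
    {"waist_cm": 80.0, "height_cm": 160.0},
    {"waist_cm": 95.0, "height_cm": 190.0},
]
records_with_ratio = [add_ratio_feature(r) for r in records]
```

Both toy records have the same ratio despite different absolute measurements, which illustrates why a proportion feature adds information that raw waist or height values do not carry on their own.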

The feature engineering work highlighted a fundamental constraint: the portable scanner's limited measurement capabilities prevented extraction of the body proportion features that proved most predictive in the research literature and in models trained on fixed scanner data.

Key Findings & Communication

Analysis and experimentation across multiple model architectures led to several technical conclusions:

GBDT models trained on engineered measurement features outperformed deep learning approaches given available data volumes
Accurate classification required balanced training data across all size categories - a model cannot reliably predict sizes for which it has zero or minimal examples
Portable scanner hardware limitations prevented collection of the measurement types and precision required for production deployment
Data collection showed systematic bias against the target demographic, creating a fundamental mismatch between available training data and business requirements

Documented these findings in a technical report for stakeholders, clearly outlining the data requirements for a successful implementation:

Substantially larger dataset needed, with particular focus on plus sizes where current data was insufficient
Continued reliance on fixed scanner for data collection due to portable scanner's measurement limitations
Realistic timeline and resource requirements for collecting adequate training data across all size categories

The analysis gave stakeholders a clear, evidence-based understanding of why the current approaches couldn't meet product requirements and what a viable solution would require.

Development Environment

Python
Jupyter Notebook
PyTorch
TensorFlow
Keras
Scikit-Learn
XGBoost
CatBoost
LightGBM
Optuna
Pandas
NumPy
Matplotlib
Seaborn
PointNet
EfficientNet
Open3D
Trimesh
OpenCV
Pandas Profiling