Overview
Built a Random Forest model that predicts data center power demand with 0.79 accuracy, unifying data from SP Global, LBNL, and Aterio across 200+ data centers. Initiated an external data sourcing call with SP Global that GridCARE converted into a data acquisition worth tens of thousands of dollars. The work is now used internally to identify viable AI data center sites months to years faster than industry averages.
How it worked
Contributed across the full pipeline: coding, data wrangling, feature engineering, model training, exploratory analysis, and presentation. Spearheaded external data sourcing, including initiating and leading a sales call with SP Global that resulted in GridCARE buying enterprise-grade data to strengthen its internal modeling.
- Unified schema across SP Global, LBNL, and Aterio. Cleaned and deduplicated records, mapped features manually, and validated unit consistency across sources.
- Multi-method imputation pipeline for sparse data: KNN, Random Forest, MICE, Neural Network. Each audited against ground-truth holdouts.
- Model selection. Evaluated Decision Tree, XGBoost, Neural Network, Random Forest. Chose Random Forest for best F1 and resistance to overfitting on a small dataset.
- Predicted 3 MW power buckets instead of single-value point estimates. Buckets reflect realistic ranges and make downstream decisions less fragile.
- Stack: Python, scikit-learn, pandas, NumPy, Jupyter, GitHub. Statistical tests (ANOVA, t-tests).
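A minimal sketch of three of the steps above, using the listed stack: bucketing power into 3 MW classes, auditing an imputer against held-out known values, and comparing classifiers by F1. This is an illustration under assumptions, not the project's actual code. The data is synthetic, and the column names, the helpers `to_bucket` and `imputation_mae`, the macro-averaged F1, the 5-fold cross-validation, and all hyperparameters are hypothetical choices made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import KNNImputer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

BUCKET_MW = 3  # bucket width stated in the write-up


def to_bucket(power_mw, width=BUCKET_MW):
    """Map power in MW to a bucket index: [0, 3) -> 0, [3, 6) -> 1, and so on."""
    return (np.asarray(power_mw) // width).astype(int)


def imputation_mae(features, column, holdout_frac=0.2, seed=0):
    """Hide a share of known values in one column, impute them with KNN,
    and return the mean absolute error against the hidden ground truth."""
    rng = np.random.default_rng(seed)
    known = features.index[features[column].notna()]
    hidden = rng.choice(known, size=int(len(known) * holdout_frac), replace=False)
    masked = features.copy()
    masked.loc[hidden, column] = np.nan
    imputed = pd.DataFrame(
        KNNImputer(n_neighbors=5).fit_transform(masked),
        columns=features.columns,
        index=features.index,
    )
    error = imputed.loc[hidden, column] - features.loc[hidden, column]
    return float(error.abs().mean())


# Synthetic stand-in for the unified dataset (hypothetical columns and values).
rng = np.random.default_rng(42)
n = 200
power_mw = rng.uniform(0, 12, n)
sites = pd.DataFrame({
    "floor_area_sqft": power_mw * 15_000 + rng.normal(0, 10_000, n),
    "cooling_capacity_tons": power_mw * 280 + rng.normal(0, 200, n),
})

# Audit the imputer on values whose true answer is known.
mae = imputation_mae(sites, "cooling_capacity_tons")

# Classification target: the 3 MW bucket rather than a point estimate.
labels = to_bucket(power_mw)

# Compare two of the candidate model families by cross-validated macro F1.
models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
scores = {
    name: cross_val_score(model, sites, labels, cv=5, scoring="f1_macro").mean()
    for name, model in models.items()
}
print(f"imputation MAE: {mae:.1f}")
print(scores)
```

The same audit function could be pointed at any of the other imputers (Random Forest, MICE, Neural Network) by swapping out `KNNImputer`, which keeps the comparison between methods on identical held-out cells.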