Overview

Built a Random Forest model that predicts data center power demand with 0.79 accuracy, trained on a unified dataset of 200+ data centers drawn from SP Global, LBNL, and Aterio. Initiated an external data sourcing call with SP Global that led GridCARE to make a data acquisition worth tens of thousands of dollars. GridCARE now uses the work internally to identify viable AI data center sites months to years faster than the industry average.

0.79 accuracy across 5-fold cross-validation

200+ data centers unified into one schema

Months to years faster site identification vs. industry average

Photo Gallery

meeting our liaison at GridCARE HQ

meeting up with GridCARE at the Monterey GridFWD conference

How it worked

Contributed across the full pipeline: coding, data wrangling, feature engineering, model training, exploratory analysis, and presentation. Spearheaded external data sourcing, including initiating and leading a sales call with SP Global that resulted in GridCARE buying enterprise-grade data to strengthen its internal modeling.

Unified schema across SP Global, LBNL, Aterio. Cleaned, deduplicated, mapped features manually, validated unit consistency across sources.
Multi-method imputation pipeline for sparse data: KNN, Random Forest, MICE, Neural Network. Each audited against ground-truth holdouts.
Model selection. Evaluated Decision Tree, XGBoost, Neural Network, Random Forest. Chose Random Forest for best F1 and resistance to overfitting on a small dataset.
3 MW power buckets instead of single-value point estimates. Buckets reflect realistic ranges and make downstream decisions less fragile.
Stack: Python, scikit-learn, pandas, NumPy, Jupyter, GitHub. Statistical tests (ANOVA, t-tests).
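
To make the 3 MW bucket idea concrete, here is a minimal sketch of how a demand value might be mapped to a bucket and how predictions could be scored at the bucket level. The function names, the choice to start buckets at 0 MW, and the sample values are all assumptions for illustration; the report does not specify how bucket boundaries were handled.

```python
import math

BUCKET_WIDTH_MW = 3.0  # bucket width from the write-up; boundaries starting at 0 are an assumption


def power_bucket(demand_mw, width=BUCKET_WIDTH_MW):
    """Return the (low, high) MW range of the bucket that contains demand_mw."""
    if demand_mw < 0:
        raise ValueError("power demand cannot be negative")
    index = math.floor(demand_mw / width)
    low = index * width
    return (low, low + width)


def bucket_accuracy(true_mw, predicted_mw, width=BUCKET_WIDTH_MW):
    """Share of predictions that fall in the same bucket as the true value."""
    hits = sum(
        power_bucket(t, width) == power_bucket(p, width)
        for t, p in zip(true_mw, predicted_mw)
    )
    return hits / len(true_mw)


# Hypothetical demand values in MW, for illustration only.
true_mw = [1.2, 4.8, 7.5, 10.1]
predicted_mw = [2.9, 5.9, 9.2, 9.5]

print(power_bucket(4.8))
print(bucket_accuracy(true_mw, predicted_mw))
```

Scoring at the bucket level treats a prediction of 5.9 MW against a true 4.8 MW as correct, which is the sense in which a range is less fragile than a point estimate.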
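
The evaluation setup described above (Random Forest, 5-fold cross-validation, accuracy and F1) could look roughly like the following scikit-learn sketch. The data here is synthetic, generated with `make_classification`, and the hyperparameters, the four classes standing in for power buckets, and the use of macro-averaged F1 are assumptions rather than the project's actual configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic stand-in for the unified dataset: 200 rows, 4 classes
# playing the role of power buckets. Not the project's real data.
X, y = make_classification(
    n_samples=200,
    n_features=8,
    n_informative=5,
    n_classes=4,
    n_clusters_per_class=1,
    random_state=0,
)

# Assumed hyperparameters; the report does not list the ones used.
model = RandomForestClassifier(n_estimators=200, random_state=0)

# Stratified folds keep the class mix similar in each split,
# which matters on a small dataset.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

scores = cross_validate(model, X, y, cv=cv, scoring=["accuracy", "f1_macro"])

mean_accuracy = scores["test_accuracy"].mean()
mean_f1 = scores["test_f1_macro"].mean()
print(f"accuracy: {mean_accuracy:.2f}, macro F1: {mean_f1:.2f}")
```

Swapping `RandomForestClassifier` for another estimator while keeping the same `cv` object is one way to compare candidate models on identical splits.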

Project Report

Data Center Power Prediction Final Report