
GridCARE

Data Center Power Prediction Model

Timeline: Jan 2025 to May 2025
Role: Machine Learning Consultant
Location: Redwood City, CA

Overview

Built a Random Forest model that predicts data center power demand with 0.79 accuracy, unifying S&P Global, LBNL, and Aterio data across 200+ data centers. Initiated an external data sourcing call with S&P Global that GridCARE converted into a data acquisition worth tens of thousands of dollars. The work is now used internally to identify viable AI data center sites months to years faster than the industry average.

0.79 accuracy across 5-fold cross-validation
200+ data centers unified into one schema
Months to years faster site identification vs. industry average
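The headline accuracy pairs 3 MW power buckets with 5-fold cross-validation. A minimal sketch of that kind of evaluation, assuming scikit-learn's `RandomForestClassifier` with stratified folds: the synthetic features, the open-ended top bucket, and the helper names `to_bucket` and `evaluate_random_forest` are hypothetical, and the sketch makes no attempt to reproduce the 0.79 result.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

BUCKET_MW = 3.0  # bucket width from the write-up
MAX_BUCKET = 3   # hypothetical cap: everything >= 9 MW shares one open-ended top bucket


def to_bucket(power_mw, width=BUCKET_MW, max_bucket=MAX_BUCKET):
    """Map continuous power demand (MW) to an integer bucket index (hypothetical helper)."""
    idx = np.floor(np.asarray(power_mw, dtype=float) / width).astype(int)
    return np.clip(idx, 0, max_bucket)


def evaluate_random_forest(X, y, n_splits=5, seed=0):
    """Stratified k-fold CV returning per-fold accuracy and macro F1 (hypothetical helper)."""
    model = RandomForestClassifier(n_estimators=200, random_state=seed)
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return cross_validate(model, X, y, cv=cv, scoring=["accuracy", "f1_macro"])


if __name__ == "__main__":
    # Synthetic stand-in for real site features; not GridCARE data.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    power_mw = 6 + 3 * X[:, 0] + rng.normal(size=200)
    y = to_bucket(power_mw)
    scores = evaluate_random_forest(X, y)
    print("mean accuracy:", scores["test_accuracy"].mean())
    print("mean macro F1:", scores["test_f1_macro"].mean())
```

Stratified folds keep each bucket's share roughly equal across folds, which matters when some power ranges are rare. Macro F1 weights every bucket equally, so it penalizes a model that only gets the common buckets right.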


How it worked

Contributed across the full pipeline: coding, data wrangling, feature engineering, model training, exploratory analysis, and presentation. Spearheaded external data sourcing, including initiating and leading a sales call with S&P Global that resulted in GridCARE purchasing enterprise-grade data to strengthen its internal modeling.

  • Unified schema: combined S&P Global, LBNL, and Aterio data into one schema. Cleaned and deduplicated records, mapped features manually, and validated unit consistency across sources.
  • Multi-method imputation pipeline for sparse data: KNN, Random Forest, MICE, and Neural Network imputers, each audited against ground-truth holdouts.
  • Model selection: evaluated Decision Tree, XGBoost, Neural Network, and Random Forest. Chose Random Forest for the best F1 score and its resistance to overfitting on a small dataset.
  • Power buckets: predicted demand in 3 MW buckets instead of single-value point estimates, which reflects realistic ranges and makes downstream decisions less fragile.
  • Stack: Python, scikit-learn, pandas, NumPy, Jupyter, GitHub. Statistical tests (ANOVA, t-tests).
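Auditing imputers against ground-truth holdouts can be sketched by hiding a random fraction of known cells, imputing them, and scoring only the hidden cells. This sketch assumes scikit-learn: `KNNImputer`, `IterativeImputer` (scikit-learn's MICE-inspired imputer, which returns a single imputation by default), and `IterativeImputer` wrapped around a `RandomForestRegressor` as a stand-in for Random Forest imputation. The neural network variant is omitted, and the helper `audit_imputers`, the 10% mask rate, and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (must precede IterativeImputer import)
from sklearn.impute import IterativeImputer, KNNImputer


def audit_imputers(X_complete, mask_frac=0.1, seed=0):
    """Hide known cells, impute them, and return RMSE on the hidden cells per imputer.

    Hypothetical helper: mask_frac and the imputer settings are illustrative choices.
    """
    rng = np.random.default_rng(seed)
    X_complete = np.asarray(X_complete, dtype=float)
    hidden = rng.random(X_complete.shape) < mask_frac  # ground-truth holdout cells
    X_masked = X_complete.copy()
    X_masked[hidden] = np.nan

    imputers = {
        "knn": KNNImputer(n_neighbors=5),
        # MICE-style chained equations (default BayesianRidge estimator).
        "mice_style": IterativeImputer(max_iter=10, random_state=seed),
        # Random-forest-based iterative imputation, in the spirit of missForest.
        "rf_iterative": IterativeImputer(
            estimator=RandomForestRegressor(n_estimators=50, random_state=seed),
            max_iter=5,
            random_state=seed,
        ),
    }
    results = {}
    for name, imputer in imputers.items():
        X_filled = imputer.fit_transform(X_masked)
        err = X_filled[hidden] - X_complete[hidden]  # score only the cells we hid
        results[name] = float(np.sqrt(np.mean(err ** 2)))
    return results


if __name__ == "__main__":
    # Synthetic, strongly correlated columns so imputation has signal to exploit.
    rng = np.random.default_rng(42)
    base = rng.normal(size=(300, 1))
    X = np.hstack([base + 0.1 * rng.normal(size=(300, 1)) for _ in range(4)])
    print(audit_imputers(X))
```

Scoring only the masked cells keeps the comparison honest: every imputer is judged on values it never saw, using the same holdout, so the lowest RMSE is a fair pick for that dataset.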

Project Report

Data Center Power Prediction Final Report