Magnefy | Felix Peng

Project deliverables: 0.9959 macro F1, 121,584 pulses, 8 models, 5-fold cross-validation

Overview

Magnefy builds magnetic sensors that watch power transformers for partial discharge, the small electrical breakdowns that precede catastrophic failure. I led the machine learning pipeline that classifies a detected discharge into one of five physical types. Each type points to a different failure mechanism and a different maintenance response, so getting the type wrong means sending the wrong crew. The deployed model hits 0.9959 file-level macro F1 across 121,584 labeled pulses.

0.9959 file-level macro F1, deployed model

121,584 pulse events extracted and validated

8 models trained, tuned, and compared

Photo Gallery

Talking transformer health with Joseph, Magnefy's CEO, at Southern California Edison's (SCE) Grid Digitalization Summit

Dinner with Joseph at the GridFWD conference in Monterey

How it worked

I was team lead on a 4-person Graphite Digital team and owned the pipeline end to end: signal processing, feature extraction, model selection, evaluation methodology, and the deployment-ready artifact handed to Magnefy.

Signal processing. A 1 to 60 MHz bandpass removes 60 Hz mains noise. An adaptive 3-tier threshold (Kneedle, then Otsu, then k-sigma) gives every recording its own event-isolation threshold with no human in the loop. The extraction window is asymmetric, 0.5 µs before the pulse and 1.5 µs after, matched to the way PD pulses rise fast and ring out slowly.
Features. 21 per pulse: 16 time-domain, 4 FFT, and the phase angle in the AC cycle. The strongest discriminators were phase angle, spectral centroid, and rise time.
Models. Trained, tuned, and compared eight: Logistic Regression, Random Forest, XGBoost, LightGBM, MLP, SVM, ROCKET, and a PRPD CNN. Deployed ROCKET for the best balance of accuracy and per-pulse diagnostics.
Measurement rigor. Acquisition-level StratifiedGroupKFold, so pulses from the same recording never appear in both train and test. Random pulse-level splits quietly inflate scores by 5 to 10 points. Scored on macro F1 because the classes are imbalanced, and compared against a dummy baseline of about 0.12.

Dataset

IEEE Dataport HFCT recordings, the same sensor type Magnefy uses in the field, at a 125 MHz sample rate. 347 labeled recordings across 5 classes. We rejected low-sample-rate, acoustic, UHF, and synthetic datasets because their features do not transfer to HFCT.
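
At 125 MHz the Nyquist limit is 62.5 MHz, so the 60 MHz upper edge of the bandpass sits just inside the usable band. The same sample rate fixes the size of the asymmetric pulse window in samples. The sketch below shows that arithmetic; the function name `window_bounds`, the rounding choice (floor before the pulse, ceil after), and the clipping at record edges are my assumptions for illustration, not details from the project.

```python
import math

FS_HZ = 125e6    # sample rate of the recordings
PRE_S = 0.5e-6   # window length before the pulse
POST_S = 1.5e-6  # window length after the pulse


def window_bounds(peak_idx, n_samples, fs_hz=FS_HZ, pre_s=PRE_S, post_s=POST_S):
    """Return (start, stop) sample indices of the asymmetric window.

    Hypothetical helper: rounds the pre-pulse length down and the
    post-pulse length up, then clips the window to the record.
    """
    pre = math.floor(pre_s * fs_hz)    # 62 samples at 125 MHz
    post = math.ceil(post_s * fs_hz)   # 188 samples at 125 MHz
    start = max(0, peak_idx - pre)
    stop = min(n_samples, peak_idx + post)
    return start, stop


# A pulse in the middle of a record gets the full 250-sample window.
start, stop = window_bounds(peak_idx=1000, n_samples=10_000)
print(start, stop, stop - start)
```

Neither window length is a whole number of samples at this rate (62.5 and 187.5), so whichever rounding rule is used should be fixed and applied the same way in training and in deployment.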

Results

Eight models under leakage-free 5-fold cross-validation. ROCKET went to deployment; Logistic Regression matched it at the file level for a fraction of the size.
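
To make the per-pulse versus file-level distinction concrete, here is a small sketch of how pulse predictions could roll up into one label per recording. Majority voting is an assumption on my part (the aggregation rule is not stated here), and the file IDs and class names are placeholders.

```python
from collections import Counter, defaultdict


def file_level_predictions(file_ids, pulse_preds):
    """Majority vote over the pulse predictions of each file (assumed rule)."""
    votes = defaultdict(Counter)
    for fid, pred in zip(file_ids, pulse_preds):
        votes[fid][pred] += 1
    return {fid: counts.most_common(1)[0][0] for fid, counts in votes.items()}


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for lab in labels:
        tp = sum(t == lab and p == lab for t, p in zip(y_true, y_pred))
        fp = sum(t != lab and p == lab for t, p in zip(y_true, y_pred))
        fn = sum(t == lab and p != lab for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Placeholder data: two files, three pulses each, one pulse misclassified.
file_ids = ["a", "a", "a", "b", "b", "b"]
pulse_true = ["type_A", "type_A", "type_A", "type_B", "type_B", "type_B"]
pulse_pred = ["type_A", "type_A", "type_B", "type_B", "type_B", "type_B"]
file_true = {"a": "type_A", "b": "type_B"}

file_pred = file_level_predictions(file_ids, pulse_pred)
files = sorted(file_true)
pulse_score = macro_f1(pulse_true, pulse_pred)
file_score = macro_f1([file_true[f] for f in files], [file_pred[f] for f in files])
print(round(pulse_score, 4), file_score)
```

In this toy case a single wrong pulse lowers the per-pulse score but is outvoted within its file, which is why a file-level score can sit above the per-pulse score for the same model.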

Per-pulse and file-level macro F1 across all eight models, plus key metrics

Limitations

The 0.9959 is on IEEE lab data. Magnefy hardware has a different sample rate, file format, and noise floor, so field deployment needs ingest changes and retraining.
There is no PD-vs-not-PD upstream gate yet. The classifier assumes a real discharge is present, so a noise-only recording still flows through.

Both gaps are documented in the handoff so nothing surprises the team in production.