TwinMe

BioTwin publishes medical research, clinical development, and wellness education. Clinical claims apply only where authorized. TwinMe wellness content is not medical advice, diagnosis, screening, treatment, or disease monitoring.

Multi-Year Longitudinal Inter-Device Agreement Across Four Consumer Wearables: Concordance, Temporal Diagnostics, and Signal Harmonization

Preprint by Hauguel, Noel, and Anctil quantifying how poorly four consumer wearables agree with each other across an N-of-1 dataset spanning more than 2,400 days, and how much a harmonization layer can recover.

Why it matters

Wearable signals are a core input to the virtual twin, so knowing exactly where two devices disagree, and by how much, is a prerequisite for combining them. This work shows that identical metric labels do not guarantee comparable measurements and that device-specific recalibration is often needed.

Summary

Consumer wearable data from different devices cannot be treated as interchangeable, yet the specific failure modes are poorly characterized for long-term, multi-device use. Using an N-of-1 longitudinal dataset spanning 2,443 days (Garmin Fenix), 819 days (Oura Ring Gen 3), 240 days (Whoop 4.0), and 284 days (TwinMe Watch), this preprint quantifies inter-device agreement, not clinical accuracy, across six metric categories: resting heart rate, heart rate variability, sleep, steps, respiratory rate, and SpO2.

Agreement, measured by the concordance correlation coefficient (CCC), varies systematically by metric: sleep duration (CCC = 0.643) agrees better across devices than HRV (0.362), respiratory rate (0.315), and SpO2 (-0.016). Garmin-Oura resting heart rate agreement is poor (CCC = 0.106 on raw values), driven by scale mismatch rather than location bias, and day-to-day directional agreement is near zero, meaning two devices can disagree about whether your RHR went up or down. A ridge regression harmonization layer raises Whoop-to-Garmin RHR agreement from CCC = 0.190 to 0.768 on aggregated out-of-sample predictions, though per-fold performance is highly variable.
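The distinction between scale mismatch and location bias can be made concrete with a short sketch. Lin's concordance correlation coefficient factors into Pearson's r (precision) and a bias-correction factor C_b (accuracy), and C_b in turn depends on a scale shift v (ratio of standard deviations) and a location shift u (mean difference in standard-deviation units). This decomposition is one standard way to separate the two sources of disagreement; the preprint's exact procedure may differ. The `directional_agreement` helper is an assumption: it scores agreement as the Pearson correlation of day-to-day changes, which is only one possible definition. All function names and the toy RHR series are hypothetical.

```python
import numpy as np


def ccc(x, y):
    """Lin's concordance correlation coefficient, using population (ddof=0) moments."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sxy = ((x - x.mean()) * (y - y.mean())).mean()
    return 2 * sxy / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)


def ccc_decomposition(x, y):
    """Split CCC into precision (Pearson r) and accuracy (bias-correction C_b).

    v: scale shift (ratio of standard deviations).
    u: location shift (mean difference in standard-deviation units).
    CCC = r * C_b, with C_b = 2 / (v + 1/v + u**2).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sx, sy = x.std(), y.std()
    r = np.corrcoef(x, y)[0, 1]
    v = sx / sy
    u = (x.mean() - y.mean()) / np.sqrt(sx * sy)
    c_b = 2 / (v + 1 / v + u ** 2)
    return {"r": r, "C_b": c_b, "scale_shift_v": v, "location_shift_u": u}


def directional_agreement(x, y):
    """Hypothetical definition: Pearson correlation of day-to-day changes."""
    dx = np.diff(np.asarray(x, dtype=float))
    dy = np.diff(np.asarray(y, dtype=float))
    return np.corrcoef(dx, dy)[0, 1]


# Hypothetical paired daily RHR (bpm) for days both devices recorded.
garmin = np.array([52, 54, 51, 55, 53, 56, 52, 54], dtype=float)
# Second device: same mean, but its range is compressed to 30% (pure scale mismatch).
oura = garmin.mean() + 0.3 * (garmin - garmin.mean())

parts = ccc_decomposition(garmin, oura)
# r ≈ 1 and u ≈ 0, yet C_b = 60/109 ≈ 0.55, so CCC ≈ 0.55.
print(parts, ccc(garmin, oura), directional_agreement(garmin, oura))
```

In this toy case the two series move in lockstep with identical means, yet CCC drops to about 0.55 purely because one device compresses the range. The same decomposition applied to real paired data would show which factor dominates a low CCC.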

Why it matters

Identical metric labels across wearables do not guarantee comparable constructs. For any system that fuses signals from multiple devices over time, including a virtual twin, this provides a reproducible methodological foundation and a clear warning: cross-ecosystem integration generally requires device-specific recalibration.
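Device-specific recalibration of the kind the summary describes can be sketched as a ridge regression evaluated on out-of-fold predictions. This is a minimal sketch under stated assumptions: a closed-form ridge with an unpenalized intercept, contiguous time-ordered folds, a single Whoop RHR feature, and an arbitrary `alpha`. The preprint's actual features, fold scheme, and regularization strength are not described here. The sketch reports both the CCC of the pooled out-of-fold predictions and the CCC within each fold, mirroring the distinction between aggregated and per-fold performance. All names and data are hypothetical.

```python
import numpy as np


def ccc(x, y):
    """Lin's concordance correlation coefficient (population moments)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sxy = ((x - x.mean()) * (y - y.mean())).mean()
    return 2 * sxy / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)


def fit_ridge(X, y, alpha):
    """Closed-form ridge; centering leaves the intercept unpenalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    beta = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return beta, y_mean - x_mean @ beta


def harmonize_cv(X, y, alpha=1.0, n_folds=5):
    """Out-of-fold ridge predictions over contiguous (time-ordered) folds.

    Returns the CCC of the pooled predictions and a list of per-fold CCCs.
    """
    n = len(y)
    preds = np.empty(n)
    per_fold = []
    for test_idx in np.array_split(np.arange(n), n_folds):
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        beta, intercept = fit_ridge(X[train_idx], y[train_idx], alpha)
        preds[test_idx] = X[test_idx] @ beta + intercept
        per_fold.append(ccc(preds[test_idx], y[test_idx]))
    return ccc(preds, y), per_fold


# Hypothetical data: a drifting reference RHR and a rescaled, offset, noisy second device.
rng = np.random.default_rng(0)
garmin_rhr = 52 + np.cumsum(rng.normal(0, 0.5, 240))
whoop_rhr = 10 + 0.7 * garmin_rhr + rng.normal(0, 1.0, 240)

raw = ccc(whoop_rhr, garmin_rhr)
pooled, per_fold = harmonize_cv(whoop_rhr.reshape(-1, 1), garmin_rhr, alpha=1.0)
print(f"raw CCC {raw:.3f}, harmonized pooled CCC {pooled:.3f}")
print("per-fold CCC:", np.round(per_fold, 3))
```

Each contiguous fold covers only a narrow slice of the reference signal's range. As a result, a fold's own CCC can fall well below the pooled figure even when the calibration is reasonable, which is one way aggregated and per-fold results can diverge.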

Authors

Pierrick Hauguel, Louis-Philippe Noel, Nicolas Anctil. All authors are affiliated with BioTwin Inc.