A Fundamental Misunderstanding of Physics

Mar 13

This is the fourth post in a series examining average power calculation errors in fitness devices. The first post examined Apple's fundamental calculation errors across various data types. The second post quantified these errors using real-world cycling power data from an Apple Watch over eight months. The third post revealed that Garmin makes identical mistakes in their average power calculations. This post performs a detailed comparison of simultaneously recorded Garmin and Apple Watch data from three rides to understand why their reported values differ and what the underlying measurements reveal.

Neither Company Understands Energy and Power

The evidence from the first three posts shows that neither Apple's health team nor Garmin's engineers grasp the basic physics of energy and power. This is a fundamental conceptual error in how these companies calculate average values.

The physics is straightforward:

Energy always equals Power × Time
Power is always Energy ÷ Time
Average Power is simply Energy ÷ Total Time

Here, Energy is the sum of each individual power sample multiplied by the time interval that sample covers. This isn't advanced physics; it's undergraduate-level mechanics. Yet both companies calculate “average power” by simply adding up all power readings and dividing by the count. This arithmetic mean of readings equals average power only when every sample represents exactly the same time duration.
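The same definitions in symbols, writing P_i for the i-th power reading, Δt_i for the time interval it covers, and N for the number of readings. This notation is introduced here for illustration, with total time taken as the sum of the intervals:

```latex
\begin{align*}
E &= \sum_{i=1}^{N} P_i\,\Delta t_i,
\qquad T = \sum_{i=1}^{N} \Delta t_i \\
\bar{P} &= \frac{E}{T} = \frac{\sum_{i=1}^{N} P_i\,\Delta t_i}{\sum_{i=1}^{N} \Delta t_i} \\
\bar{P}_{\text{arith}} &= \frac{1}{N}\sum_{i=1}^{N} P_i \\
\text{if every } \Delta t_i = \Delta t: \quad
\bar{P} &= \frac{\Delta t \sum_{i=1}^{N} P_i}{N\,\Delta t} = \bar{P}_{\text{arith}}
\end{align*}
```

Only in the last case, with perfectly equal intervals, do the two quantities coincide; any variation in the intervals lets them diverge.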

From the December 6, 2025 ride analyzed in the third post:

Garmin reported: 230W (true average: 218.4W, error: 5.3%)
Apple reported: 256W (true average: 221.0W, error: 15.8%)

Both devices' raw data show nearly identical true average power (within 1.2%), but their displayed values differ by 26W. This pattern holds across all rides analyzed.

Three Rides, Over 26,000 Data Points

I recorded three rides with simultaneous power measurements from both devices, both connected to the same 4iiii Precision Pro dual-sided power meter:

Ride 1 (December 6, 2025): 2h 5m, 7,186 Garmin points, 6,644 Apple Watch points
Ride 2 (December 7, 2025): 1h 7m, 3,843 Garmin points, 3,709 Apple Watch points
Ride 3 (December 10, 2025): 1h 17m, 4,356 Garmin points, 4,256 Apple Watch points

For each ride, I exported the raw data containing every timestamp and power reading, allowing detailed analysis of both the arithmetic mean and true average power.
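A minimal sketch of the two calculations applied to such an export, assuming the data has already been parsed into `(seconds_since_start, watts)` pairs sorted by time. The function names and sample values are hypothetical, and the sketch assumes each reading holds until the next timestamp; the actual export formats and the code used for this analysis may differ.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (seconds since start, watts)


def arithmetic_mean(samples: List[Sample]) -> float:
    """What the devices display: the plain average of the readings."""
    return sum(watts for _, watts in samples) / len(samples)


def true_average_power(samples: List[Sample]) -> float:
    """Total energy divided by total time.

    Assumes each reading holds until the next timestamp (sample-and-hold).
    The final reading has no known duration, so it adds no energy.
    """
    if len(samples) < 2:
        raise ValueError("need at least two timestamps to measure elapsed time")
    energy_j = 0.0
    for (t0, watts), (t1, _) in zip(samples, samples[1:]):
        energy_j += watts * (t1 - t0)  # watts x seconds = joules
    total_s = samples[-1][0] - samples[0][0]
    return energy_j / total_s


# Hypothetical readings with one long 10-second interval at low power
ride = [(0.0, 400.0), (1.0, 150.0), (11.0, 300.0), (12.0, 300.0)]
print(round(arithmetic_mean(ride), 1))     # 287.5
print(round(true_average_power(ride), 1))  # 183.3
```

The 150W reading covers 10 of the 12 seconds, so it dominates the energy total, while the arithmetic mean counts it as just one reading out of four.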

The Numbers Reveal the Pattern

Here are all three rides compared side by side:

  
Table 1: The Numbers Reveal the Pattern

Ride	Metric	Garmin	Apple	Difference (% of higher value)
Ride 1 (Dec 6, 2025)	Reported Value	230W	256W	26W (10.2%)
Ride 1 (Dec 6, 2025)	Arithmetic Mean	228.5W	249.4W	20.9W (8.4%)
Ride 1 (Dec 6, 2025)	True Average Power	218.4W	221.0W	2.7W (1.2%)
Ride 2 (Dec 7, 2025)	Reported Value	259W	269W	10W (3.7%)
Ride 2 (Dec 7, 2025)	Arithmetic Mean	257.2W	266.6W	9.4W (3.5%)
Ride 2 (Dec 7, 2025)	True Average Power	248.2W	244.4W	3.7W (1.5%)
Ride 3 (Dec 10, 2025)	Reported Value	269W	277W	8W (2.9%)
Ride 3 (Dec 10, 2025)	Arithmetic Mean	268.2W	275.2W	7.0W (2.5%)
Ride 3 (Dec 10, 2025)	True Average Power	254.0W	252.8W	1.1W (0.4%)

The pattern is unmistakable:

Reported values differ by 3-10% between devices
Arithmetic means differ by 2.5-8.4% between devices
True average power differs by only 0.4-1.5% between devices
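The percentages in the table's Difference column appear to be taken relative to the higher of the two devices' values (for example, 26W is 10.2% of Apple's 256W). A short sketch that reproduces the reported-value rows under that reading; `device_difference` is a hypothetical helper, and recomputing the true-average rows from the rounded table values can differ from the published figures in the last digit.

```python
from typing import Tuple


def device_difference(garmin_w: float, apple_w: float) -> Tuple[float, float]:
    """Gap in watts, and that gap as a percentage of the higher reading."""
    gap = abs(garmin_w - apple_w)
    return gap, 100.0 * gap / max(garmin_w, apple_w)


# Reported values from Table 1
reported = [("Ride 1", 230, 256), ("Ride 2", 259, 269), ("Ride 3", 269, 277)]
for ride, garmin, apple in reported:
    gap, pct = device_difference(garmin, apple)
    print(f"{ride}: {gap:.0f}W ({pct:.1f}%)")
# Ride 1: 26W (10.2%)
# Ride 2: 10W (3.7%)
# Ride 3: 8W (2.9%)
```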

When calculated correctly, as total energy divided by total time, both devices agree to within 0.4-1.5% on every ride. The underlying measurements are accurate; the calculation methodology is wrong.

Why Apple's Error Exceeds Garmin's

As established in the second post, Apple Watch samples with highly variable time intervals (mean: 0.8s, standard deviation: 1.35s). Some samples represent 5+ seconds, while others are a second or less apart.

Garmin samples very consistently at 1Hz—almost all intervals are exactly 1 second.
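Interval statistics like these can be computed directly from the recorded timestamps. A minimal sketch with hypothetical timestamps; whether the second post used the population or the sample standard deviation isn't stated, so `statistics.pstdev` here is an assumption.

```python
import statistics
from typing import List, Tuple


def interval_stats(timestamps: List[float]) -> Tuple[float, float]:
    """Mean and population standard deviation of the gaps between readings."""
    intervals = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    return statistics.mean(intervals), statistics.pstdev(intervals)


garmin_like = [0.0, 1.0, 2.0, 3.0, 4.0]      # steady 1 Hz
apple_like = [0.0, 0.5, 1.0, 6.0, 6.5, 7.0]  # short bursts plus one 5-second gap

for name, ts in [("uniform", garmin_like), ("variable", apple_like)]:
    mean_s, sd_s = interval_stats(ts)
    print(name, round(mean_s, 2), round(sd_s, 2))
# uniform 1.0 0.0
# variable 1.4 1.8
```

A single long gap is enough to push the standard deviation above the mean interval, the same qualitative pattern as Apple's 0.8s mean with a 1.35s standard deviation.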

The arithmetic mean treats every reading equally, regardless of duration. A 400W reading gets identical weight whether it represents 1 second or 5 seconds of effort.

True average power weights each reading by its actual duration. A 400W reading lasting 1 second contributes 400 watt-seconds (joules) of energy. A 200W reading lasting 10 seconds contributes 2,000 watt-seconds (joules)—correctly reflecting 10x longer duration at that power.
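The same example in code: a minimal sketch in which each reading is a `(watts, seconds)` pair, so the weighting by duration is explicit. The helper name is hypothetical.

```python
from typing import List, Tuple


def time_weighted_average(readings: List[Tuple[float, float]]) -> float:
    """Energy (sum of watts x seconds) divided by total seconds."""
    energy_j = sum(watts * seconds for watts, seconds in readings)
    total_s = sum(seconds for _, seconds in readings)
    return energy_j / total_s


readings = [(400.0, 1.0), (200.0, 10.0)]  # 400 J + 2,000 J over 11 s
arith = sum(watts for watts, _ in readings) / len(readings)
print(arith)                                      # 300.0
print(round(time_weighted_average(readings), 1))  # 218.2
```

The arithmetic mean reports 300W, but the rider averaged about 218W, because the 200W effort lasted ten times as long as the 400W one.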

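Garmin's uniform sampling keeps most of its readings correctly weighted. An 800W sprint second and a 200W cruising second each count once in the arithmetic mean, and in true average power each contributes exactly its 1 second of energy (800 J and 200 J), which is correct. Garmin's error comes from the minority of readings whose intervals are not exactly 1 second: each still counts as a single reading, however long it actually lasted.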

Apple's variable sampling creates additional bias. A 312W reading lasting 5 seconds represents 1,560 watt-seconds (joules) of energy, yet in the arithmetic mean it counts exactly the same as a 312W reading lasting just 1 second (312 joules). Because the arithmetic mean underweights these longer-duration readings, which tend to fall at moderate power, Apple's arithmetic mean is pulled further above true average power than Garmin's.
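A small sketch of both effects, using hypothetical readings: with uniform 1-second intervals the arithmetic mean and the time-weighted average coincide, but once the 312W reading lasts 5 seconds, the arithmetic mean overstates average power.

```python
from typing import List, Tuple


def both_averages(readings: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Return (arithmetic mean, time-weighted average) for (watts, seconds) pairs."""
    arith = sum(watts for watts, _ in readings) / len(readings)
    energy_j = sum(watts * seconds for watts, seconds in readings)
    total_s = sum(seconds for _, seconds in readings)
    return arith, energy_j / total_s


uniform = [(800.0, 1.0), (200.0, 1.0), (200.0, 1.0), (312.0, 1.0)]
variable = [(800.0, 1.0), (200.0, 1.0), (200.0, 1.0), (312.0, 5.0)]

print(both_averages(uniform))   # (378.0, 378.0): equal weights, identical results
print(both_averages(variable))  # (378.0, 345.0): the 5-second reading is underweighted
```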

I analyzed every data point to determine its impact—how much it pushes the arithmetic mean away from true average power:

Garmin: 83-91% of points push arithmetic mean upward, with tightly clustered impacts due to uniform sampling
Apple: 88-92% of points push arithmetic mean upward, but a small share of points (0.4-1.1%) with very long time intervals partially offsets that upward push
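One way to define each point's impact, offered here as an assumption since the exact definition isn't given: the gap between the two averages splits exactly into one term per reading, P_i * (1/N - dt_i/T), where N is the number of readings and T the total time. A reading with positive power pushes the arithmetic mean upward whenever its interval is shorter than the average interval T/N. A minimal sketch with hypothetical readings:

```python
import math
from typing import List, Tuple


def point_impacts(readings: List[Tuple[float, float]]) -> List[float]:
    """Per-reading share of (arithmetic mean - time-weighted average).

    Reading i contributes watts_i * (1/N - seconds_i/T); the terms sum to the gap.
    """
    n = len(readings)
    total_s = sum(seconds for _, seconds in readings)
    return [watts * (1.0 / n - seconds / total_s) for watts, seconds in readings]


# Four 1-second readings and one 6-second reading (hypothetical values)
readings = [(250.0, 1.0), (300.0, 1.0), (180.0, 1.0), (0.0, 1.0), (120.0, 6.0)]
impacts = point_impacts(readings)

arith = sum(w for w, _ in readings) / len(readings)                        # 170.0
true_avg = sum(w * s for w, s in readings) / sum(s for _, s in readings)  # 145.0
assert math.isclose(sum(impacts), arith - true_avg)

upward = sum(1 for x in impacts if x > 0) / len(impacts)
print(f"{upward:.0%} of readings push the arithmetic mean upward")  # 60% here
```

The long 6-second reading gets a negative term and partially offsets the upward push of the short readings, the same kind of partial offset described above.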

The Magnitude of the Error

Here's how wrong the reported values are:

Table 2: The Magnitude of the Error

Ride	Device	Reported	Should Report (True Average)	Error	Error % (of true average)
Ride 1 (Dec 6)	Garmin	230W	218.4W	+11.6W	5.3%
Ride 1 (Dec 6)	Apple	256W	221.0W	+35.0W	15.8%
Ride 2 (Dec 7)	Garmin	259W	248.2W	+10.8W	4.3%
Ride 2 (Dec 7)	Apple	269W	244.4W	+24.6W	10.1%
Ride 3 (Dec 10)	Garmin	269W	254.0W	+15.0W	5.9%
Ride 3 (Dec 10)	Apple	277W	252.8W	+24.2W	9.6%
Average Across All Rides	Garmin	—	—	+12.5W	5.2%
Average Across All Rides	Apple	—	—	+27.9W	11.8%
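The Error and Error % columns follow from reported minus true average, with the percentage taken relative to the true average. A short sketch reproducing the Ride 1 figures and the averages from the table's own values. Because those values are rounded, recomputing a few per-ride percentages (Ride 2 Garmin, for instance) can land one digit off the published figure.

```python
from statistics import mean

# (reported W, true average W) per ride, from Table 2
table = {
    "Garmin": [(230, 218.4), (259, 248.2), (269, 254.0)],
    "Apple": [(256, 221.0), (269, 244.4), (277, 252.8)],
}

for device, rides in table.items():
    errors_w = [reported - true_w for reported, true_w in rides]
    errors_pct = [100.0 * (reported - true_w) / true_w for reported, true_w in rides]
    print(f"{device}: Ride 1 +{errors_w[0]:.1f}W ({errors_pct[0]:.1f}%), "
          f"average +{mean(errors_w):.1f}W ({mean(errors_pct):.1f}%)")
# Garmin: Ride 1 +11.6W (5.3%), average +12.5W (5.2%)
# Apple: Ride 1 +35.0W (15.8%), average +27.9W (11.8%)
```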

Apple's error is larger than Garmin's on every ride: about three times larger on Ride 1, 2.3 times on Ride 2, and 1.6 times on Ride 3, and more than double on average. Both violate basic physics, but Apple's variable sampling strategy produces larger errors.

Data Quality: Connectivity Issues

Beyond calculation errors, both devices suffer connectivity dropouts, periodically recording 0W even while the power meter is transmitting data:

  
Table 3: Data Quality - Connectivity Issues

Metric	Ride 1 (Dec 6)	Ride 2 (Dec 7)	Ride 3 (Dec 10)
Both Working	79.0%	88.2%	89.0%
Garmin Dropouts	416 (5.8%)	68 (1.8%)	103 (2.4%)
Apple Dropouts	307 (4.3%)	42 (1.1%)	73 (1.7%)
Major Discrepancy Periods	115	1	12

Garmin records roughly 1.4-1.6x as many zero readings as Apple, averaging 9-13 consecutive zeros per dropout; its longest dropout lasted 66 seconds.

Apple Watch records fewer zero readings and averages 6-10 consecutive zeros per dropout, yet its dropouts can run much longer: the longest lasted 145 seconds.
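When sampling intervals vary, counting consecutive zero readings and measuring how long those runs last in seconds are two different things; this is one way a device can log fewer zeros per dropout yet have a longer longest dropout. A minimal sketch, assuming readings as `(watts, seconds)` pairs and treating every 0W reading as a dropout. That rule is a simplification, since coasting also produces 0W; separating the two would require cross-checking the other device.

```python
from typing import List, Tuple


def zero_runs(readings: List[Tuple[float, float]]) -> List[Tuple[int, float]]:
    """Return (zero readings, seconds) for each maximal run of consecutive 0W readings."""
    runs: List[Tuple[int, float]] = []
    count, seconds = 0, 0.0
    for watts, duration in readings:
        if watts == 0:
            count += 1
            seconds += duration
        elif count:
            runs.append((count, seconds))
            count, seconds = 0, 0.0
    if count:  # the ride ended in the middle of a dropout
        runs.append((count, seconds))
    return runs


# Hypothetical streams: steady 1 Hz sampling vs. irregular sampling
steady = [(250, 1.0), (0, 1.0), (0, 1.0), (0, 1.0), (240, 1.0), (0, 1.0), (230, 1.0)]
irregular = [(250, 1.0), (0, 5.0), (0, 6.0), (240, 1.0)]

print(zero_runs(steady))     # [(3, 3.0), (1, 1.0)]
print(zero_runs(irregular))  # [(2, 11.0)]
```

The irregular stream has fewer zero readings in total, but its single dropout lasts far longer in seconds.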

When both devices successfully record data simultaneously, they agree remarkably well: instantaneous power differs by 3-4W on average, and their timestamps are synchronized to within 0.16-0.28 seconds. The measurement hardware works correctly; the problem is purely in the calculation.
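Comparing the two devices requires pairing readings recorded at nearly the same moment. A minimal sketch, assuming nearest-timestamp matching within a tolerance (the one-second default is an arbitrary, hypothetical choice) and ignoring pairs where either device reads 0W. The actual matching rule used for this comparison isn't described, and the data below is made up.

```python
import bisect
from typing import List, Tuple

Sample = Tuple[float, float]  # (seconds since start, watts)


def match_nearest(a: List[Sample], b: List[Sample], tolerance_s: float = 1.0):
    """Pair each reading in `a` with the closest-in-time reading in `b` (b sorted by time)."""
    if not b:
        return []
    b_times = [t for t, _ in b]
    pairs = []
    for t, watts in a:
        i = bisect.bisect_left(b_times, t)
        # The nearest neighbour is either just before or just after t
        j = min((k for k in (i - 1, i) if 0 <= k < len(b)),
                key=lambda k: abs(b_times[k] - t))
        if abs(b_times[j] - t) <= tolerance_s:
            pairs.append((t, watts, b_times[j], b[j][1]))
    return pairs


garmin = [(0.0, 250.0), (1.0, 260.0), (2.0, 0.0), (3.0, 270.0)]
apple = [(0.2, 254.0), (1.25, 257.0), (2.9, 266.0)]

both_working = [p for p in match_nearest(garmin, apple) if p[1] > 0 and p[3] > 0]
power_gap = sum(abs(wa - wb) for _, wa, _, wb in both_working) / len(both_working)
time_gap = sum(abs(ta - tb) for ta, _, tb, _ in both_working) / len(both_working)
print(f"{power_gap:.1f} W mean difference, {time_gap:.2f} s mean offset")
# 3.7 W mean difference, 0.18 s mean offset
```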

Why This Matters

Average power isn't cosmetic—it's foundational for:

Energy expenditure: Total work determines caloric burn
Performance trends: Comparing fitness over time

As shown in the second post, Apple's incorrect calculation produces inconsistent efficiency values (mean 20.0%, SD 1.7%) when comparing reported average power to reported active energy. Correct calculation would yield constant efficiency.
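Efficiency here is mechanical work divided by the reported active energy. A minimal sketch with hypothetical numbers, assuming work is average power times duration and converting kilocalories with 1 kcal = 4,184 J; the second post's exact formula isn't restated here. Because work scales directly with average power, any overstatement of average power inflates the apparent efficiency by the same factor.

```python
JOULES_PER_KCAL = 4184.0


def efficiency(avg_power_w: float, duration_s: float, active_kcal: float) -> float:
    """Mechanical work (J) divided by active energy (J)."""
    work_j = avg_power_w * duration_s
    return work_j / (active_kcal * JOULES_PER_KCAL)


# Hypothetical one-hour ride: 200 W true average, 860 kcal active energy
true_eff = efficiency(200.0, 3600.0, 860.0)
inflated_eff = efficiency(200.0 * 1.10, 3600.0, 860.0)  # average power overstated by 10%
print(f"{true_eff:.1%} vs {inflated_eff:.1%}")  # 20.0% vs 22.0%
```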

These average errors of 5-12% compound throughout training analysis, making load calculations unreliable and progress tracking misleading.

What You Should Do

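Never trust the displayed “average power.” Both companies display the arithmetic mean of the readings, which violates the physics definition of average power. If you need an accurate figure, export the raw data and compute total energy divided by total time.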

The Uncomfortable Question

How do sophisticated engineering teams at Apple and Garmin get undergraduate physics wrong? The correct calculation, numerical integration to compute energy followed by division by time, is centuries old. This isn't about corner cases or edge conditions. It's about the fundamental definition of average power. Average power is energy divided by total time. Always. Yet here we are, with millions of cyclists trusting “average power” values that aren't average power at all, as physics defines it.

The Silver Lining

When you calculate average power correctly from raw data, both devices agree within 0.4-1.5% across all rides. The sensors work. The connectivity protocols work (mostly). The devices faithfully record accurate power measurements.

The only failure is in the final calculation shown to users. This is simultaneously frustrating and encouraging—frustrating because it's such a basic error, encouraging because the underlying data is sound.

Conclusion

Both Apple and Garmin display values that violate the physics definition of average power. The errors are substantial (5-12% overestimation on average, up to 15.8% on a single ride) and consistent across devices and rides.

But the root cause appears to be conceptual rather than malicious. Neither company seems to understand that average power requires weighting each reading by its time duration, not just averaging the numbers collected. This is basic physics, yet major fitness technology companies are implementing it incorrectly.

For now, the solution is clear: don't trust what the devices display.

All analysis and visualizations were performed using Python with raw data exported from both devices. Complete methodology and code are available for verification.

James Mattis

World Champ Tech