Data Tampering by Shewchuk and Heller

If you follow climate discussions on X, you're bound to see John Shewchuk and/or Tony Heller post graphs that purportedly show NOAA tampering with temperature data to fabricate global warming with spurious warming trends. I've gone over many of the reasons why this is nonsense before in posts about bias correction and so-called ghost stations. I think it's good to show what's actually going on with the graphs they present as "proof" of data manipulation, though. I think it can be easily demonstrated here that it's actually Shewchuk and Heller who are tampering with data.

Shewchuk (Top) and Correct (Below)

Above are two graphs. The top graph is the one John Shewchuk claims shows NOAA manipulating data. It shows USHCN "raw" and "altered" Tmax data for 1900 to 2023. The bottom graph is the correct plot of NOAA's published data from the current and correct dataset (nClimDiv), with a 5-year running mean to match Shewchuk's plot.

Heller (Top) and Correct (Below)

Again, above I show Heller's version of the same trick. The top graph is his version of USHCN data; it uses 10-year running means and starts in 1920. The bottom is the correct plot of NOAA's data from nClimDiv, matching his start date and his choice of running mean. Other than the dimensions of my graphs vs theirs and the fact that my graphs are current through 2024 instead of 2023, you'd think my graphs would be identical to what they're calling "altered" or "adjusted" data, but they're not. In fact, since Heller calculated the average temperature of his red line to be 64.3°F for 1920-2023, I calculated the correct average temperature for 1920-2023 and got 64.43°F; close, but not the same. The differences between the correct plots in my graphs and the "altered"/"adjusted" plots in theirs show the amount of data tampering that they engage in.
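
For readers who want to reproduce the smoothing step, here is a minimal sketch of a running mean. It assumes a trailing window (each value is the mean of the current year and the preceding years in the window); whether Shewchuk and Heller use a trailing or a centered window is not something I can confirm from their graphs, and the function name `running_mean` and the sample values are purely illustrative, not NOAA data.

```python
import math

def running_mean(values, window):
    """Trailing running mean over `window` points.

    The first window-1 entries are NaN because a full window is not
    yet available. (Assumption: trailing rather than centered window.)
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    out = [math.nan] * len(values)
    total = 0.0
    for i, value in enumerate(values):
        total += value
        if i >= window:
            # Drop the value that just left the window.
            total -= values[i - window]
        if i >= window - 1:
            out[i] = total / window
    return out

# Illustrative (made-up) annual values, smoothed with a 3-point window.
smoothed = running_mean([1.0, 2.0, 3.0, 4.0, 5.0], 3)
print(smoothed)
```

The same function with `window=5` or `window=10` would correspond to the 5-year and 10-year smoothing mentioned above, applied to a list of annual values.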

Heller's "Raw" Data Misrepresents USHCN Raw Data

The differences between their "raw"/"measured" temperatures and the correct raw data are even larger. This was shown some time ago by Zeke Hausfather, back in the day when Tony Heller was working under the fake name "Steve Goddard." The difference between the Goddard/Heller raw and the correct raw above shows the amount of data tampering that goes into creating their graphs.

To be clear, I don't think these people are just changing station data, but the fact remains that neither the "raw"/"measured" data nor the "altered"/"adjusted" data match anything published by NOAA, and there are several tricks that they use to create false impressions of the actual US temperature data. The following are a sampling of their tricks:
  1. They use the now-defunct USHCN network instead of the current nClimDiv network. USHCN hasn't been used by NOAA since 2014; the newer nClimDiv network has over 10,000 stations (based on GHCN-daily stations in and near the US), which is over 8x the number of stations that were in USHCN. There are public-facing data tables for the USHCN that are still populated by stations in the old network, but some stations in that network have closed and are no longer operating. The current nClimDiv network uses far more stations and a finer grid, and it continues to be maintained. Some of the differences between Shewchuk/Heller graphs and correct graphs come from their insistence on using a defunct dataset that is no longer maintained.
  2. They use simple averages of station data, making their graphs and calculated trends significantly affected by station moves, station closures, gaps in station records, etc. In fact, others have noticed that the average latitude of US stations has increased with time due to station moves, openings, and closures. This adds a cooling bias when simple station averages are used. None of these would be issues if they used a proper area-weighted mean with a grid, like NOAA uses for its published values.
  3. Even aside from using defunct datasets and incorrect math, the "raw" data they plot has been affected by known biases, most notably due to changes in the time of observation for reading max-min thermometers. Biases like these can be and have been quantified well in the literature, and they can even be synthesized using the USCRN network. Because this bias can be quantified, it can be accurately corrected. So even if Shewchuk and Heller knew how to calculate mean temperatures accurately, they would essentially be comparing biased data to bias-corrected data, and the bias-corrected data is known to be more reliable.
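
The second point can be illustrated with a toy example. The sketch below is not NOAA's procedure, which is considerably more involved; it is a minimal illustration, with entirely synthetic stations, of why a simple station average responds to a northward shift in the station mix while an area-weighted average over grid cells does not. The function names, the two latitude bands, and the temperatures are all hypothetical, and the climate in each band is held fixed so that any apparent change comes only from the averaging method.

```python
import math

def simple_mean(temps):
    """Unweighted average over all reporting stations."""
    return sum(temps) / len(temps)

def gridded_mean(lats, temps, band_edges):
    """Average stations within each latitude band first, then combine
    the band means weighted by band area (proportional to the
    difference in sin(latitude) between the band edges)."""
    weighted_sum = 0.0
    weight_total = 0.0
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        in_band = [t for lat, t in zip(lats, temps) if lo <= lat < hi]
        if not in_band:
            continue  # an empty cell contributes nothing in this toy version
        weight = math.sin(math.radians(hi)) - math.sin(math.radians(lo))
        weighted_sum += weight * (sum(in_band) / len(in_band))
        weight_total += weight
    return weighted_sum / weight_total

band_edges = [25.0, 37.0, 49.0]  # two hypothetical latitude bands

# Synthetic climate that never changes: 75°F in the south, 55°F in the north.
# Early period: 8 southern stations and 2 northern stations.
early_lats = [30.0] * 8 + [43.0] * 2
early_temps = [75.0] * 8 + [55.0] * 2
# Late period: station mix has drifted north (2 southern, 8 northern).
late_lats = [30.0] * 2 + [43.0] * 8
late_temps = [75.0] * 2 + [55.0] * 8

simple_change = simple_mean(late_temps) - simple_mean(early_temps)
gridded_change = (gridded_mean(late_lats, late_temps, band_edges)
                  - gridded_mean(early_lats, early_temps, band_edges))
print("simple-average change:", simple_change)
print("gridded-average change:", gridded_change)
```

In this toy setup the simple average drops by 12°F even though no location cooled at all, while the gridded average does not change. Real station networks are messier, but the direction of the artifact is the same one described in point 2.
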
Shewchuk and Heller feel like the differences between the blue and red lines on their graphs indicate data tampering on the part of NOAA; it's simply assumed that NOAA is manipulating/tampering with their data to create spurious warming trends in the US. However, there's a simple way to test if this is actually the case. NOAA also maintains a smaller network called USCRN, which includes 114 ideally sited, rural-only stations that require no bias correction. USCRN allows apples-to-apples comparisons between a true "raw" dataset and nClimDiv, which uses bias correction. Since USCRN began in 2005, we now have 20 years of data to compare with nClimDiv.
Here it's very clear that you wouldn't be able to tell the difference between the homogenized 10,000-station nClimDiv and the raw, rural-only, 114 station USCRN if I didn't label them.  A few conclusions should be obvious here:
  1. USCRN and nClimDiv trends are nearly identical, and if anything, USCRN shows marginally more warming than nClimDiv. So we can be certain that bias correction in nClimDiv is not adding spurious warming to US temperatures. 
  2. The number of stations reporting essentially doesn't matter, provided you calculate area-weighted means properly. USCRN with 114 stations and nClimDiv with 10,000 stations have essentially the same trends, so complaints about station dropouts in USHCN are clearly unwarranted. The USHCN had 1200 stations, and while many dropped out of the network, these station dropouts would have a negligible effect on US temperature trends with competent averaging, but Shewchuk and Heller refuse to do this.
  3. We can conclude very little from Shewchuk's and Heller's red and blue comparison graphs because neither were plotted correctly. Both were tampered with, and the correct plots agree with USCRN for the years where they overlap.
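
The trend comparison in the first conclusion amounts to fitting a straight line to each series and comparing slopes. Here is a minimal sketch using ordinary least squares. The two series are synthetic stand-ins that share the same year-to-year variability plus small independent noise; they are not USCRN or nClimDiv values, and the function name `trend_per_decade` and the noise levels are assumptions made only for illustration.

```python
import random

def trend_per_decade(years, temps):
    """Ordinary least-squares slope of temps vs. years, in degrees per decade."""
    n = len(years)
    x_mean = sum(years) / n
    y_mean = sum(temps) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(years, temps))
    sxx = sum((x - x_mean) ** 2 for x in years)
    return 10.0 * sxy / sxx

# Synthetic example: two networks sampling the same underlying signal.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
years = list(range(2005, 2025))
shared = [0.05 * (y - 2005) + rng.gauss(0.0, 0.8) for y in years]  # common weather noise
network_a = [s + rng.gauss(0.0, 0.05) for s in shared]  # hypothetical small network
network_b = [s + rng.gauss(0.0, 0.05) for s in shared]  # hypothetical large network

print("network A trend per decade:", trend_per_decade(years, network_a))
print("network B trend per decade:", trend_per_decade(years, network_b))
```

With real data, the same function would be applied to the annual USCRN and nClimDiv values over their common period; if bias correction were adding spurious warming, the two slopes would diverge.
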
I think it's obvious that there is no evidence here of data manipulation/tampering by NOAA. However, the graphs produced by Shewchuk and Heller do actually constitute compelling evidence that they are willing to tamper with NOAA's data - that is, they misrepresent what NOAA's temperature data actually says, whether through incompetence, deceit, or both.


