Type II Errors in Conclusions from Short-Term Trends

[Figure: 95% CI envelope for HadCRUT4 in McKitrick's paper]

In a recent post, I explained why drawing conclusions from short-term trends is misleading. It's the kind of mistake people make on social media, but I didn't expect to find it in the scientific literature. Yet in 2014, Ross McKitrick published a paper attempting to develop a statistical method for determining the length of the so-called "pause" in global warming. To do this, McKitrick calculated trends with 95% CIs through 2014 in three datasets: HadCRUT4, RSS, and UAH. The lower bound of the CI overlapped 0 °C/decade beginning in 1995 for HadCRUT4, 1998 for UAH, and 1988 for RSS. McKitrick then argues that these dates give us the length of the "pause" in global warming. His words:
I propose a robust definition for the length of the pause in the warming trend over the closing subsample of surface and lower tropospheric data sets. The length term JMAX is defined as the maximum duration J for which a valid (HAC-robust) trend confidence interval contains zero for every subsample beginning at J and ending at T − m, where m is the shortest duration of interest. This definition was applied to surface and lower tropospheric temperature series, adding in the requirement that the southern and northern hemispheric data must yield an identical or larger value of JMAX. In the surface data we compute a hiatus length of 19 years, and in the lower tropospheric data we compute a hiatus length of 16 years in the UAH series and 26 years in the RSS series.
Of course, there's a major problem with this. For each later start year, the sample size decreases, since the beginning of the calculation moves closer to the fixed end date. So regardless of the actual warming trend, the CIs widen rapidly as the start date approaches the present (here 2014), and all of these estimates of the length of the "pause" were based on sample sizes of fewer than 30 years. McKitrick also misstated what we can infer from a lack of statistical significance. A statistically insignificant trend does not mean there has been a pause in global warming; it means the data cannot rule out either a warming trend or a cooling trend at 95% confidence. The uncertainty envelope for each start year also includes uninterrupted warming at rates similar to those throughout all the years of the so-called "pause." And it's now 2022: if we calculate the trend since 1995 in HadCRUT4, we get a statistically significant trend of 0.156 ± 0.063 °C/decade (2σ). McKitrick's entire argument rested on a misapplication of what statistical significance in climate trends means.
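To make the trend arithmetic concrete, here is a minimal sketch of how a trend and its 2σ confidence interval are computed with ordinary least squares. The anomalies below are synthetic and illustrative, not real HadCRUT4 data, and McKitrick's paper uses HAC-robust standard errors, which would widen the interval further when residuals are autocorrelated:

```python
import numpy as np

def trend_with_ci(years, temps):
    """OLS trend in °C/decade with a 2-sigma (~95%) half-width.

    Plain OLS standard errors; HAC-robust errors (as in McKitrick's
    paper) would give a wider interval for autocorrelated residuals.
    """
    x = np.asarray(years, dtype=float)
    y = np.asarray(temps, dtype=float)
    n = len(x)
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    # Standard error of the slope from the residual variance
    se = np.sqrt(np.sum(resid**2) / (n - 2) / np.sum((x - x.mean())**2))
    return slope * 10, 2 * se * 10  # per decade, 2-sigma half-width

# Synthetic annual anomalies: a steady 0.16 °C/decade trend plus
# weather noise (assumed magnitudes, not observed values)
rng = np.random.default_rng(0)
years = np.arange(1995, 2023)
temps = 0.016 * (years - 1995) + rng.normal(0, 0.09, len(years))

trend, ci = trend_with_ci(years, temps)
print(f"Trend 1995-2022: {trend:.3f} ± {ci:.3f} °C/decade (2σ)")
```

With 28 years of data the interval is narrow enough that a trend of this size is clearly significant; shorten the window and the same arithmetic makes the half-width balloon.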

In effect, McKitrick's "robust definition" will inevitably commit Type II errors. A Type I error is a "false positive," while a Type II error is a false negative. Most of us have taken a COVID-19 test in the past year or two. Each test has a false positive rate (a chance it will say you're positive when you're actually negative) and a false negative rate (saying you're negative when you're actually positive). These are Type I and Type II errors, respectively. McKitrick's method guarantees Type II errors for recent years, because all trends ending in the present will be statistically insignificant until the sample size is large enough to achieve significance. So if we applied this test consistently, we would always be in at least a short-term "pause," regardless of the actual warming trend. The graph from McKitrick's paper above shows why this is assured: the CIs for extremely short-term trends are astronomical. Below I plotted the trend with its 95% CI from 2013 to 2016. Nobody would suggest there was a pause during those years, and yet the trend was statistically insignificant at 1.113 ± 1.136 °C/decade (2σ). McKitrick certainly knows this, which is why his analysis includes an "m" term, defined as the "minimum-length duration of interest." Presumably this excludes extremely short time frames where CIs are enormous. But that leaves open a question: how large should "m" be? Should we be interested in trends of 5 years? 10 years? 15 years? The answer depends on the statistical power of the test. If McKitrick's definition were applied to 2013-2016, it would have little if any meaning. We can't automatically assume that a statistically insignificant 19-year trend has enough statistical power to establish that there has been a pause. GMST datasets are wiggly; they contain a significant amount of natural variability, so short-term trends may simply be an artifact of persistent weather patterns (ENSO, etc.).


We could just as easily construct a method that commits Type I errors: declare that warming is continuing with no pause as long as the upper bound of the CI remains above 0. I'm sure McKitrick would cry foul at such a method, and rightly so. The false positive and false negative rates of both methods are too high to be useful.

Perhaps another way to articulate my criticism is that McKitrick's paper was written backwards. Normally, if you want to show that a warming trend is occurring, you reject the null hypothesis that it's not warming. If you want to show that the warming trend is caused by human activity, you reject the null hypothesis that the warming can be explained by natural variability. So if you want to show that there has been a pause in warming, you should be rejecting the null hypothesis that warming continued throughout the 19 years of the so-called pause. But that hypothesis cannot be excluded by the data for any time frame he discusses: McKitrick can't rule out uninterrupted warming at 95% confidence. That doesn't mean there was nothing in the data that could qualify as a pause. It just means that McKitrick's method didn't find anything that qualifies as a pause at 95% confidence.

It seems a better method would be to examine successive, overlapping 30-year trends, since 30 years is the minimum time frame for climate. A 30-year trend small enough that its 95% CI overlaps 0 would be sufficient to say that there has been a pause. Now, this method would not do a good job of identifying short-term pauses of, say, 15 years. But that's not a bad thing. The warming signal calculated from 30-year trends is currently over 0.2 °C/decade in all major datasets, and internal variability is about that large year to year; El Niño years average about 0.2 °C warmer than La Niña years, for instance. What can we conclude about the climate signal from the fact that, after a strong El Niño, calculated short-term trends are not statistically significant? Practically nothing. Choosing shorter time frames means the calculated trend will be significantly impacted by weather events and other sources of internal variability (like ENSO) that can obscure the actual climate trend. That's why we have the 30 years = climate rule to begin with.
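Here's a sketch of that overlapping 30-year window approach on synthetic data. The trend and noise magnitudes are assumptions chosen to resemble a modern warming record, not a real dataset, and plain OLS errors are used for simplicity:

```python
import numpy as np

def decadal_trend_ci(y):
    """OLS trend (°C/decade) and 2-sigma half-width for annual anomalies y."""
    x = np.arange(len(y), dtype=float)
    slope, b = np.polyfit(x, y, 1)
    resid = y - (slope * x + b)
    se = np.sqrt(np.sum(resid**2) / (len(y) - 2) / np.sum((x - x.mean())**2))
    return slope * 10, 2 * se * 10

# Synthetic 1960-2022 record: steady 0.18 °C/decade warming plus
# ENSO-like interannual noise (assumed magnitudes, not observations)
rng = np.random.default_rng(1)
years = np.arange(1960, 2023)
anoms = 0.018 * (years - 1960) + rng.normal(0, 0.12, len(years))

# Successive overlapping 30-year windows; flag any whose 95% CI touches zero
for start in range(1960, 2023 - 30 + 1, 10):
    i = start - 1960
    trend, ci = decadal_trend_ci(anoms[i:i + 30])
    verdict = "possible pause" if abs(trend) <= ci else "warming"
    print(f"{start}-{start + 29}: {trend:+.2f} ± {ci:.2f} °C/decade ({verdict})")
```

Over a 30-year window, the CI half-width shrinks well below the trend magnitude, so interannual wiggles no longer mask the climate signal; a window flagged here would reflect a genuine multi-decade slowdown rather than weather noise.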

References:

[1] McKitrick, R. (2014) HAC-Robust Measurement of the Duration of a Trendless Subsample in a Global Climate Time Series. Open Journal of Statistics, 4, 527-535. doi: 10.4236/ojs.2014.47050.
