Has the "Hockey Stick" Been Disproven? Part 2 - Further M&M Challenges
This is part 2 of a series on the Hockey Stick. Here's part 1.
When Mann and his colleagues were constructing their proxy reconstruction of NH temperatures for the MBH98 paper, two significant challenges had to be addressed. The first was calibrating the proxy model to the instrumental record. They couldn't expect the proxy data to match the instrumental record with absolute precision, and if they over-calibrated the model, the proxies would fit the instrumental record but would not reliably reconstruct earlier temperatures, because the model would be fitting random noise in the proxy and instrumental data rather than climate. So the challenge was to fit the overall climate trends in the proxy data to the instrumental record while allowing for variability between the two datasets. The second challenge was to address the clustering of climate proxies so that regions with a denser concentration of proxy evidence aren't overweighted in the reconstruction.
To address these challenges, MBH98 was built on what is called "principal component analysis" (or PCA). The paper needed to analyze a great deal of temperature proxy data in order to discover global temperature variability. PCA is a way of discovering the most common patterns in the data. These patterns are "principal components" (PCs), each of which explains a percentage of the variation in the original data. Many of these PCs will not have any statistical significance. Those that do can be used instead of the full set of data. If done properly, the full set of data and the statistically significant PCs will reveal the same things.
The PCs produced through PCA may represent the overall climate trends or other patterns like seasonal cycles, and they can be ranked in terms of the variance they explain. Since PCs are just statistical constructs, they do not necessarily reveal anything that is physically meaningful on their own. And the methodology used in producing these PCs can have a significant impact on the results. If the methodology changes, the ranking of the PCs may change, and the number of PCs needed may change as well. It's therefore important to know if the PCs are a product of what is physically happening in nature or if they are a product of the methodology. The more PCs you use, the better the chance that your PCs represent the whole. If the results from the number of PCs you use agree with the results from using all the data without PCA, then you have likely used enough PCs.
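To make the idea of PCs and "variance explained" concrete, here is a minimal sketch of PCA done with a singular value decomposition in NumPy. The data are synthetic and purely illustrative: a hypothetical shared trend plus independent noise in 20 made-up series. Nothing here reproduces the MBH98 data or code.

```python
import numpy as np

def pca(data):
    """PCA of a (n_years, n_series) array via SVD.

    Returns the PC time series (one column per PC, ranked by variance
    explained) and the fraction of total variance each PC explains.
    """
    centered = data - data.mean(axis=0)          # remove each series' mean
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    pcs = u * s                                  # PC time series
    explained = s**2 / np.sum(s**2)              # fractions, sum to 1
    return pcs, explained

# Synthetic example (assumption): one shared trend plus independent noise.
rng = np.random.default_rng(0)
n_years, n_series = 200, 20
common = np.linspace(-1.0, 1.0, n_years)         # hypothetical shared signal
weights = rng.uniform(0.5, 1.5, n_series)        # how strongly each series sees it
data = np.outer(common, weights) + rng.normal(0.0, 0.5, (n_years, n_series))

pcs, explained = pca(data)
corr_with_signal = abs(np.corrcoef(pcs[:, 0], common)[0, 1])
```

In a setup like this, the leading PC tracks the shared trend and the remaining PCs mostly describe noise, which is the sense in which a few PCs can stand in for the full dataset.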
In the above graph, I plotted MBH98 and HadCRUT5's NH temperatures, with the 2-sigma CIs from MBH98 and a 10-year running mean. The plot used an 1850-1900 baseline, so predictably the two agree over that time frame. But you can also see the continued agreement across the calibration period of 1902-1980. The two are not identical, but it would be very hard to argue that MBH98 fails to represent climate trends across the calibration period.
For North American tree ring proxies, MBH98 used about 70 chronologies. Since the goal was to reconstruct annual variability in NH temperatures, MBH98 was not as interested in seasonal variability in tree ring proxies or regional variability in nearby forests. The interest was in the information shared among them all, so that large-scale variability could be seen. In MBH98, the NA tree ring series were analyzed in this way. The authors used a limited number of PCs to train the model in the calibration period to avoid overfitting the proxy data to the instrumental record. It's the contention of MM05a that the "hockey stick" was a product of the statistical methodology used with NA tree rings and not necessarily the result of anything physically meaningful in terms of NH temperature variability.
Let me state at the outset that I think it's perfectly right and fair for MM05a to investigate whether the hockey stick is a product of the chosen statistical method rather than what was actually happening to NH temperatures. The question here is not whether the analysis contained in MM05a was worth investigating. The question is whether they made their case. Did MM05a show that the Hockey Stick was merely a product of MBH98's statistical method?
The Argument of MM05a
MM05a was largely concerned with the impact of the statistical analysis of the 70 North American tree ring proxies. They argue that MBH used an "unusual data transformation which strongly affects the resulting PCs." Essentially, MBH98 centered their data on the calibration period, 1902-1980, so that the mean of each series over this period was 0, rather than centering on the full length of the series. M&M tested this method on red noise, and they found that the first PC (PC1) it produced nearly always had a hockey stick shape. They then say,
In the controversial 15th century period, the MBH98 method effectively selects only one species (bristlecone pine) into the critical North American PC1, making it implausible to describe it as the "dominant pattern of variance."
They argue that the PC1 in MBH98 could simply be due to the effect of the statistical analysis they chose and not due to the changes in temperature recorded in these tree ring proxies. And sure enough, M&M show that when the method is applied to random red noise, the resulting PC1 has a hockey stick shape that resembles the pattern in MBH98.
MM05 then used a different convention, centering the data on 1400-1980 instead of 1902-1980. The different centering method changed which PC rose to the top, and the PC1s from the MBH method and the M&M method had different shapes.
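The difference between the two centering conventions can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not a replication of either paper: the red noise is plain AR(1) with a made-up persistence of 0.9 (MM05a used a more elaborate noise model), only the centering step is varied, and the additional scaling that MBH98 applied is left out. The function names are hypothetical.

```python
import numpy as np

def red_noise(n_years, n_series, phi, rng):
    """AR(1) red noise: x[t] = phi * x[t-1] + white noise."""
    x = np.zeros((n_years, n_series))
    eps = rng.normal(size=(n_years, n_series))
    for t in range(1, n_years):
        x[t] = phi * x[t - 1] + eps[t]
    return x

def first_pc(data, center_rows):
    """PC1 after subtracting each series' mean over the rows in center_rows."""
    centered = data - data[center_rows].mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, 0] * s[0]

years = np.arange(1400, 1981)
calibration = years >= 1902                       # 1902-1980
full_period = np.ones(len(years), dtype=bool)     # 1400-1980

rng = np.random.default_rng(42)
noise = red_noise(len(years), 70, phi=0.9, rng=rng)   # 70 pseudo-proxies

pc1_short = first_pc(noise, calibration)    # centered on the calibration period
pc1_full = first_pc(noise, full_period)     # centered on the full period
```

With short centering, any series whose 1902-1980 mean happens to differ from its long-term mean carries extra variance into the decomposition, so it can pull PC1 toward a shape with a 20th century offset. Plotting `pc1_short` against `pc1_full` for a range of seeds is one way to see how much the choice matters for pure noise.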
Huybers Comment to MM05a
In a comment on the MM05a paper, Huybers observes that the M&M alternative method is also questionable: "It is in this same step that MM05 use a questionable normalization procedure, making it useful to describe the various normalization conventions in detail." Huybers notes that MBH used their 1902-1980 convention because nearly all the proxy records span this interval, but he agrees that the method leads to some bias in the results. He also observes that the MM05 method causes bias in the opposite direction.
MM05 list fifteen records as dominating the MBH98 PC1 (see MM05, Table 1). The MBH98 normalization leads to these fifteen records having roughly twice the variance of the other records, whereas the MM05 normalization effectively down-weights these same records by a factor of two.
The MBH PC1 has a hockey stick index of 1.6, while the MM05 PC1 has an index of 0.3 and the fully normalized PC1 has an index of 0.8.
In the image above, Huybers shows the PC1 results from MBH98 (top), MM05 (middle) and full normalization (bottom). Huybers concludes that "the MM05 results are biased in the opposite direction to those of the MBH98 results. The fully normalized PC1 and average closely resemble one another (r2 = 0.95), indicating that the fully normalized PC1 describes variability common to much of the NOAMER data-set." M&M's analysis exaggerated the bias in the MBH98 method, and while bias remains an important issue for MBH98, MM05 contained biases in the opposite direction.
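For readers wondering what a "hockey stick index" measures, here is a small sketch. It assumes the definition as I understand it from MM05a: the difference between a series' 1902-1980 mean and its 1400-1980 mean, divided by its 1400-1980 standard deviation. The function name and the choice of the population standard deviation are my own assumptions.

```python
from statistics import fmean, pstdev

def hockey_stick_index(series, years, calib_start=1902, calib_end=1980):
    """(calibration-period mean - full-period mean) / full-period std.

    Assumes the population standard deviation; a sample standard
    deviation would change the value only slightly for long series.
    """
    calib = [v for y, v in zip(years, series) if calib_start <= y <= calib_end]
    return (fmean(calib) - fmean(series)) / pstdev(series)

years = list(range(1400, 1981))

# Idealized "blade": flat at 0 until 1901, then flat at 1 through 1980.
step = [1.0 if y >= 1902 else 0.0 for y in years]
hsi_step = hockey_stick_index(step, years)

# A series with no offset in the calibration period scores 0.
alternating = [1.0 if y % 2 == 0 else -1.0 for y in years[:-1]]
hsi_flat = hockey_stick_index(alternating, years[:-1], 1901, 1978)
```

The index is just a standardized offset of the calibration period from the long-term mean, so a larger absolute value means a more pronounced 20th century departure in that PC.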
We re-emphasize that our comparison between the MBH98 method and a covariance PC1 was not presented as an attempt to "remove the bias in MBH98’s method," and that we take no position on the relative merits of using a mean, a covariance PC1, or even using PC analysis at all, in paleoclimate work.
Von Storch and Zorita Comment to MM05a
MBH Response to MM04/05b Criticism of their PCA
As noted above, the ordering of PCs and the number of PCs needed are significantly affected by the statistical method. In the MM04/05b alternate reconstruction of NH temperatures, M&M used the top 2 PCs because that was the number of PCs used in the MBH98 reconstruction. The MBH response to MM04/05b pointed out that this was a major flaw in the M&M analysis. In the MBH98 analysis of NA tree ring data, the top 2 PCs explained about 50% of the cumulative variance. However, in the M&M methodology the top 2 PCs would only explain about 29% of the cumulative variance, and this was not enough. To be consistent with the selection criteria used in MBH98, they would have needed to use the top 5 PCs to explain 50% of the cumulative variance. It's important to note that PC4 for M&M is the same as PC1 in MBH98. It turns out that many of the problems found by M&M stem from the fact that they didn't follow the standard selection rules used in MBH98.
In the above graph showing the eigenvalue spectrum for MBH (blue) and M&M (red), you can see that PC1 and 2 in blue are significantly above the Monte Carlo simulations, while PC1 through 5 in red are significantly above Monte Carlo simulations, indicating a greater number of PCs were required by MM05 to maintain the same selection criteria. To their credit M&M admit that using the proper selection criteria produces results with a hockey stick shape.
If a centered PC calculation on the North American network is carried out (as we advocate), then MM-type results occur if the first 2 NOAMER PCs are used in the AD1400 network (the number as used in MBH98), while MBH-type results occur if the NOAMER network is expanded to 5 PCs in the AD1400 segment (as proposed in Mann et al., 2004b, 2004d).
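The point about retaining enough PCs can be illustrated with a simple cumulative-variance rule. This sketch only captures the "explain about 50% of the variance" reading described above, not the Monte Carlo comparison, and the explained-variance fractions are made-up numbers chosen to mimic the percentages quoted in this post. They are not the actual eigenvalue spectra from either analysis.

```python
def pcs_needed(explained, target=0.50):
    """Smallest number of leading PCs whose cumulative explained
    variance reaches the target fraction."""
    total = 0.0
    for count, fraction in enumerate(explained, start=1):
        total += fraction
        if total >= target:
            return count
    return len(explained)   # target never reached: keep everything given

# Illustrative fractions only (assumption), ranked from PC1 downward.
short_centered = [0.38, 0.13, 0.07, 0.05, 0.04]   # top 2 reach about 50%
full_centered = [0.18, 0.11, 0.09, 0.07, 0.06]    # top 2 reach only 29%

n_short = pcs_needed(short_centered)
n_full = pcs_needed(full_centered)
```

Holding the number of PCs fixed at 2 while changing the centering convention therefore discards variance that the original selection rule would have kept, which is the substance of the MBH objection.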
In the above graph, MBH98 (blue), a direct use of 95 proxy series back to 1404 (yellow), 94 proxies back to 1400 (green) and the instrumental record (red) are compared. The results are similar regardless of method and are robust to removing certain proxies from the reconstruction, contrary to M&M's claims.
Wahl and Ammann Evaluation of M&M
In 2007, Eugene Wahl and Caspar Ammann (WA07) conducted an independent investigation into the reported findings of the papers and comments published by M&M. WA07 provided an indirect analysis, testing the MBH98 data by systematically excluding proxies and processing steps that were challenged by M&M. It also provided a direct analysis of the PC methods used by MBH98 and MM05. They developed a new reconstruction of NH temperatures for 1400–1980. Their conclusion was that the MBH98 "reconstruction is robust against the proxy-based criticisms addressed," contrary to the findings of MM05. They found that a "hockey stick" similar to MBH98 was produced whether PCA was used or not used - the "hockey stick" was not an artifact of the particular form of PC analysis employed by MBH98. They also found that the proxy reconstruction was unaffected by the use of the tree ring proxies deemed "controversial" by M&M. When using PCA, the hockey stick was also robust to the centering method used, whether that of MBH98 or MM05. What mattered was which proxies were included in the reconstruction. When "the full extent of the climate information actually in the proxy data is represented by the PC time series" a hockey stick emerges from the data.
WA07 also addressed M&M's proposed "corrections" to MBH98. MM04/05b had asserted that when MBH98 was corrected, NH temperatures in the 15th century may have been as high as the late 20th century. WA determined that these claims were "without statistical and climatological merit." The WA07 reconstruction essentially replicated MBH98, though with a small modification to the 1400-1450 portion of the reconstruction, where the WA07 reconstruction produced temperatures about 0.05–0.10°C warmer than MBH98. However, that correction "leaves entirely unaltered the primary conclusion of Mann et al. (as well as many other reconstructions) that both the 20th century upward trend and high late-20th century hemispheric surface temperatures are anomalous over the last 600 years."
Conclusion
I consider the WA07 paper to be a somewhat definitive, independent response to the various criticisms of MBH by M&M. The hockey stick shape is in the proxy data, and so it can be discovered with a wide variety of statistical methods, provided you use sufficient data. Using the M&M method with the 5 necessary PCs produces a hockey stick. Using the MBH98 method with 2 PCs produces virtually the same hockey stick. In my last post, we saw that Rutherford et al. discovered the same hockey stick using a RegEM method with the MBH98 proxies. Moberg 2005 discovered a similar hockey stick (with a cooler LIA) using another methodology. The WA07 paper also replicated MBH98, and the 15th century was only slightly warmer in WA07 than in MBH98. And this is not all. WA07 also note that MBH98 has been replicated by studies other than the ones we've already considered:
MBH result of anomalous warmth in the later 20th century remains consistent with other paleoclimate reconstructions developed for the last 1–2 millennia (Mann et al., 2007; Osborn and Briffa, 2006; Moberg et al., 2005; Oerlemans, 2005; Cook et al., 2004; Huang, 2004; Jones and Mann, 2004; Mann and Jones, 2003; Esper et al., 2002; Briffa et al., 2001; Huang et al., 2000; Crowley and Lowery, 2000; Jones et al., 1998; Bradley and Jones, 1993), especially in light of the recent reconciliation of the Esper et al. (2002; cf. Cook et al., 2004) and MBH reconstructions reported by Rutherford et al. (2005).
What is clear is that, provided you use sufficient proxy data, the hockey stick is robust with respect to statistical method. It's discovered because the hockey stick is in the proxy evidence, and it's not an artifact of a statistical method.
A lot of the rhetoric about M&M's criticisms has to do with proposed shortcomings of MBH's statistical method. However, the biases in their method are small with respect to the magnitude of the hockey stick, and the hockey stick shows up regardless of methodology. In reality, statistical method was never a meaningful point of contention. The real issue was whether certain NA tree rings should be included among the proxies, particularly certain bristlecone pines. These concerns were adequately addressed in MBH99, but regardless, in this area, M&M are in effect no longer auditing the MBH statistical method. They're making judgments about the scientific evidence about tree ring proxies, and they are not dendrochronologists.
Dendrochronologists agree that these tree rings are useful proxies and that the divergence problem does not undermine the validity of proxy reconstructions. In a future post I plan to cover this in more detail. But suffice it to say here that while M&M may be competent statisticians, they have no expertise in dendrochronology, and rejecting tree ring proxies arbitrarily when doing so ruins your validation statistics is not a compelling argument against the statistical methods used by MBH98.
References:
McIntyre, S., and McKitrick, R. (2005a), Hockey sticks, principal components, and spurious significance, Geophys. Res. Lett., 32, L03710, doi:10.1029/2004GL021750.
https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2004GL021750
Huybers, P. (2005), Comment on “Hockey sticks, principal components, and spurious significance” by S. McIntyre and R. McKitrick, Geophys. Res. Lett., 32, L20705, doi:10.1029/2005GL023395.
https://agupubs.onlinelibrary.wiley.com/doi/10.1029/2005GL023395
von Storch, H., and Zorita, E. (2005), Comment on “Hockey sticks, principal components, and spurious significance” by S. McIntyre and R. McKitrick, Geophys. Res. Lett., 32, L20701, doi:10.1029/2005GL022753.
https://agupubs.onlinelibrary.wiley.com/action/showCitFormats?doi=10.1029%2F2005GL022753
https://agupubs.onlinelibrary.wiley.com/action/showCitFormats?doi=10.1029%2F2005GL023089
McIntyre, S., & McKitrick, R. (2005b). The M&M Critique of the MBH98 Northern Hemisphere Climate Index: Update and Implications. Energy & Environment, 16(1), 69–100. https://doi.org/10.1260/0958305053516226
Thanks for this useful summary of the various statistical issues raised by the M&M criticisms of Mann et al’s “hockey stick”.
M&M did respond to Huybers' comment, arguing that Huybers' proposed “full normalization” method was unjustified, mainly because (they claimed) it exaggerated the effect of the bristlecone proxies, which they regarded as problematic. But they don’t explicitly refute Huybers’ conclusion that their own method erroneously suppresses the hockey stick. In fact, after all the arguments about the effects of various normalizations, centering and measures of significance, M&M state: “...we take no position on the relative merits of using a mean, a covariance PC1, or even using PC analysis at all…”. Thus they don’t really even defend their own method.
In any case, neither MM05 or Huybers address the issue of the appropriate number of PCs, or the specific quantitative effects of including or excluding various proxy series. So I think you are correct to regard WA07 as definitive.
Note however that there is a scathing response by M&M to what must have been an earlier version of WA07 (submitted to GRL) posted on McIntyre’s Climate Audit website (unavailable now, but available at https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=caca9d551363670ae34daa1d39b85aa6128da0bd). I found no published response to WA07.
Good point. I had seen that M&M response to what may have been an AW comment to MM05, but I wasn't able to find the AW comment, so I wasn't sure how to evaluate it. I decided just to go with the published WA07.