Has the "Hockey Stick" Been Disproven? Part 2 - Further M&M Challenges

 

This is part 2 of a series on the Hockey Stick. Here's part 1.

When Mann and his colleagues were constructing their proxy reconstruction of NH temperatures for their MBH98 paper, there were two significant challenges that had to be addressed. The first was calibrating the proxy model to the instrumental record. They couldn't expect the proxy data to match the instrumental record with absolute precision, so if they over-calibrated the model, the proxies would fit the instrumental record but would not reliably reconstruct earlier temperatures. Random noise in the proxy and instrumental data makes the proxy data less reliable in reconstructing the past. So the challenge was to fit the overall climate trends in the proxy data to the instrumental record while allowing for variability between the two datasets. The second challenge was to address the clustering of climate proxies so that regions with a denser concentration of proxy evidence are not overweighted.

To address these challenges, MBH98 was built on what is called "principal component analysis" (or PCA). The paper needed to analyze a great deal of data from the temperature proxies used to discover global temperature variability. PCA is a way of discovering the most common patterns in the data. These patterns are "principal components" (PCs), each of which explains a percentage of the variation in the original data. Many of these PCs will not have any statistical significance. Those that do can be used instead of the full set of data. If done properly, the full set of data and the statistically significant PCs will reveal the same things.
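To make the idea concrete, here is a minimal sketch of PCA in Python using NumPy's singular value decomposition. The data are synthetic (one shared pattern plus noise), and the names pca, proxies and shared are my own for illustration. This is not the MBH98 code, just the generic technique.

```python
import numpy as np

def pca(data):
    """Return (pcs, explained) for an array shaped (n_years, n_series).

    pcs[:, k] is the k-th principal component time series and
    explained[k] is the fraction of total variance it accounts for.
    """
    centered = data - data.mean(axis=0)            # remove each series' mean
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    pcs = u * s                                    # PC time series, ranked by variance
    explained = s**2 / np.sum(s**2)
    return pcs, explained

# Synthetic stand-in for a proxy network: one shared signal plus noise.
rng = np.random.default_rng(0)
n_years = 200
shared = np.cumsum(rng.normal(size=n_years))       # slowly varying common pattern
proxies = np.outer(shared, rng.uniform(0.5, 1.5, size=20))
proxies += rng.normal(scale=2.0, size=proxies.shape)

pcs, explained = pca(proxies)
print(np.round(explained[:5], 3))                  # variance fraction of the top 5 PCs
```

The explained array is what lets you rank the PCs and decide how many are worth keeping.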

The PCs produced through PCA may represent the overall climate trends or other patterns like seasonal cycles, and they can be ranked in terms of the variance they explain. Since PCs are just statistical constructs, they do not necessarily reveal anything that is physically meaningful on their own. And the methodology used in producing these PCs can have a significant impact on the results. If the methodology changes, the ranking of the PCs may change, and the number of PCs needed may change as well. It's therefore important to know if the PCs are a product of what is physically happening in nature or if they are a product of the methodology. The more PCs you retain, the better the chance that they represent the whole dataset. If the results from the PCs you retain agree with the results from using all the data without PCA, then you have likely used enough PCs.

In the above graph, I plotted MBH98 and HadCRUT5's NH temperatures with the 2-sigma CIs from MBH98, with a 10-year running mean. The plot used an 1850-1900 baseline, so predictably the two agree over that time frame. But you can also see the continued agreement across the calibration period from 1902-1980. The two are not identical, but it would be very hard to argue that MBH98 fails to represent climate trends across the calibration period.
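For anyone wanting to reproduce this kind of comparison, the two processing steps (re-baselining and smoothing) can be sketched as follows. The series here is made up purely for illustration, and the function names are my own. You would substitute the actual MBH98 and HadCRUT5 annual values.

```python
import numpy as np

def rebaseline(years, temps, start, end):
    """Express temps as anomalies relative to their mean over [start, end]."""
    mask = (years >= start) & (years <= end)
    return temps - temps[mask].mean()

def running_mean(values, window=10):
    """Simple boxcar mean; returns len(values) - window + 1 points."""
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")

# Made-up annual series for illustration only (not MBH98 or HadCRUT5 data).
years = np.arange(1850, 1981)
rng = np.random.default_rng(1)
temps = 0.004 * (years - 1850) + rng.normal(scale=0.1, size=years.size)

anoms = rebaseline(years, temps, 1850, 1900)   # anomalies vs. 1850-1900
smooth = running_mean(anoms, window=10)        # 10-year running mean
```

Because both series are shifted to have zero mean over 1850-1900, agreement inside that window is guaranteed by construction. Agreement outside it is the meaningful part.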

For North American tree ring proxies, MBH98 used about 70 chronologies. Since the goal was to reconstruct annual variability in NH temperatures, MBH98 was not as interested in seasonal variability in tree ring proxies or regional variability in nearby forests. The interest was in the information shared among them all so that large-scale variability can be seen. In MBH98, the NA tree ring series were analyzed in this way. The authors used a select number of PCs to train the model in the calibration period to avoid overfitting the proxy data to the instrumental record. It's the contention of MM05a that the "hockey stick" was a product of the statistical methodology used with NA tree rings and not necessarily the result of anything physically meaningful in terms of NH temperature variability.

Let me state at the outset that I think it's perfectly right and fair for MM05a to investigate whether the hockey stick is a product of the chosen statistical method rather than what was actually happening to NH temperatures. The question here is not whether the analysis contained in MM05a was worth investigating. The question is whether they made their case. Did MM05a show that the Hockey Stick was merely a product of MBH98's statistical method?

The Argument of MM05a

MM05a was largely concerned with the impact of the statistical analysis of 70 North American tree ring proxies. They argue that MBH used an "unusual data transformation which strongly affects the resulting PCs." Essentially, MBH98 normalized their data over their calibration period, which was 1902-1980, so that the mean of the data over this calibration period was 0. M&M tested this method on red noise, and they found that the first PC (PC1) from this analysis nearly always had a hockey stick shape. They then say,

In the controversial 15th century period, the MBH98 method effectively selects only one species (bristlecone pine) into the critical North American PC1, making it implausible to describe it as the "dominant pattern of variance."

They argue that the PC1 in MBH98 could simply be due to the effect of the statistical analysis they chose and not due to the changes in temperature recorded in these tree ring proxies. And sure enough, M&M show that when the method is applied to random red noise, the PC1 has a hockey stick shape that resembles the pattern of MBH98.


In the above image, the top graph is a sample PC1 resulting from a Monte Carlo simulation of red noise using the MBH98 methodology. The bottom graph is the NH temperature reconstruction in MBH98.  So it would seem that M&M found something worth considering, but look at the scale. Notice that the scale of the top M&M simulation covers only 0.1 C while MBH98 shows about 0.6 C. In other words, the MBH98 hockey stick is about 6x larger than the M&M simulation. So even if M&M discovered a real bias due to MBH98's statistical method, it would not erase the hockey stick. It would only change the shape of it slightly. A comment to this paper by Huybers makes a similar point, so I'll come back to this in the next section.
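The red-noise experiment described above can be sketched in a few lines. This is a simplified stand-in, not M&M's actual procedure: I assume plain AR(1) noise with a persistence of 0.9 (my choice), I only center the data rather than also rescaling it, and I use the "hockey stick index" as I understand MM05a to define it (calibration-period mean minus full-period mean, in units of the full-period standard deviation).

```python
import numpy as np

def ar1_noise(n_years, n_series, phi, rng):
    """Red noise: each column is an independent AR(1) series."""
    shocks = rng.normal(size=(n_years, n_series))
    x = np.zeros((n_years, n_series))
    x[0] = shocks[0] / np.sqrt(1 - phi**2)     # start from the stationary spread
    for t in range(1, n_years):
        x[t] = phi * x[t - 1] + shocks[t]
    return x

def pc1(data, center_rows):
    """First PC after subtracting each column's mean over center_rows."""
    centered = data - data[center_rows].mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, 0] * s[0]

def hockey_stick_index(series, calib_rows):
    """Calibration-period mean minus full mean, in units of the full std."""
    return (series[calib_rows].mean() - series.mean()) / series.std()

rng = np.random.default_rng(42)
n_years, n_series = 581, 70                    # 1400-1980, about 70 series
calib = slice(n_years - 79, n_years)           # last 79 rows stand in for 1902-1980
full = slice(0, n_years)

short_hsi, full_hsi = [], []
for _ in range(20):
    noise = ar1_noise(n_years, n_series, phi=0.9, rng=rng)
    # The sign of a PC is arbitrary, so compare magnitudes only.
    short_hsi.append(abs(hockey_stick_index(pc1(noise, calib), calib)))
    full_hsi.append(abs(hockey_stick_index(pc1(noise, full), calib)))

print("mean |HSI|, 1902-1980 centering:", np.mean(short_hsi))
print("mean |HSI|, 1400-1980 centering:", np.mean(full_hsi))
```

Centering on the short calibration window should give PC1s with a larger index on average, which is the effect M&M describe. Note that the index measures shape only. It says nothing about the amplitude of the artifact relative to the reconstruction, which is the point about scale made above.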

MM05 then used a different convention, normalizing over 1400-1980 instead of 1902-1980. The different centering method changed which PC rose to the top, and the PC1s from the MBH method and the M&M method had different shapes.


In the above image, the PC1 in the NA Tree Ring Network is shown from MBH98 (top) normalized to 1902-1980 and MM05 (bottom) normalized to 1400-1980. Because the PC1 from the MM05 method has no hockey stick shape, and since red noise using MBH98's method produces PC1s with hockey stick shapes, MM05 conclude that the hockey stick is an artifact of MBH's method and not the temperature changes indicated by NA tree rings. MM05 also evaluated the reduction of error (RE) statistic for MBH's 1400 step and found it lacked statistical significance.
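For readers unfamiliar with the RE statistic, here is a minimal sketch using the standard definition: one minus the ratio of the reconstruction's squared error to the squared error you would get by just guessing the calibration-period mean. The numbers are made up, and the function name is my own.

```python
import numpy as np

def reduction_of_error(observed, reconstructed, calibration_mean):
    """RE = 1 - SSE(reconstruction) / SSE(calibration-mean baseline)."""
    observed = np.asarray(observed, dtype=float)
    reconstructed = np.asarray(reconstructed, dtype=float)
    sse_model = np.sum((observed - reconstructed) ** 2)
    sse_baseline = np.sum((observed - calibration_mean) ** 2)
    return 1.0 - sse_model / sse_baseline

# Made-up verification-period temperatures, for illustration only.
obs = np.array([0.1, -0.2, 0.0, 0.3, -0.1])

perfect = reduction_of_error(obs, obs, calibration_mean=0.25)
no_skill = reduction_of_error(obs, np.full(5, 0.25), calibration_mean=0.25)
print(perfect, no_skill)
```

A perfect reconstruction scores 1, a reconstruction no better than the calibration mean scores 0, and anything worse goes negative. Whether a given positive RE is "significant" depends on the benchmark it is compared against, which is where MM05 and MBH disagreed.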

Huybers Comment to MM05a

In a comment to the MM05a paper, Huybers observes that the M&M alternative method is also questionable. "It is in this same step that MM05 use a questionable normalization procedure, making it useful to describe the various normalization conventions in detail." Huybers notes that MBH used their 1902-1980 convention because nearly all the proxy records span this interval, but he agrees that the method leads to some bias in the results. He also observes that the MM05 method causes bias in the opposite direction.

MM05 list fifteen records as dominating the MBH98 PC1 (see MM05, Table 1). The MBH98 normalization leads to these fifteen records having roughly twice the variance of the other records, whereas the MM05 normalization effectively down-weights these same records by a factor of two.

The MBH PC1 has a hockey stick index of 1.6, while the MM05 has an index of 0.3 and the full normalization convention has an index of 0.8.


In the image above, Huybers shows the PC1 results from MBH98 (top), MM05 (middle) and full normalization (bottom). Huybers concludes that "the MM05 results are biased in the opposite direction to those of the MBH98 results. The fully normalized PC1 and average closely resemble one another (r² = 0.95), indicating that the fully normalized PC1 describes variability common to much of the NOAMER data-set." M&M's analysis exaggerated the bias in the MBH98 method, and while bias remains an important issue for MBH98, MM05 contained biases in the opposite direction.
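The three conventions compared by Huybers differ only in which years are used to center and scale each record before PCA. Here is a rough sketch of how I understand them, using placeholder random-walk data rather than the real NOAMER tree ring network. The details (for example, exactly how MBH98 computed the scaling) are simplified assumptions on my part.

```python
import numpy as np

def normalize(data, center_rows, scale_rows=None):
    """Subtract each column's mean over center_rows; optionally divide by
    each column's standard deviation over scale_rows."""
    out = data - data[center_rows].mean(axis=0)
    if scale_rows is not None:
        out = out / data[scale_rows].std(axis=0)
    return out

def leading_pc(matrix):
    """PC1 of an already-normalized (n_years, n_series) matrix."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, 0] * s[0]

n_years = 581                                   # 1400-1980
full = slice(0, n_years)
calib = slice(n_years - 79, n_years)            # stands in for 1902-1980

rng = np.random.default_rng(7)
# Placeholder data: random walks with different amplitudes, not real tree rings.
data = np.cumsum(rng.normal(size=(n_years, 30)), axis=0) * rng.uniform(0.5, 3, 30)

conventions = {
    "MBH98-style": normalize(data, calib, calib),       # center and scale on 1902-1980
    "MM05-style": normalize(data, full),                # center on 1400-1980, no scaling
    "full normalization": normalize(data, full, full),  # zero mean, unit variance
}
pc1s = {name: leading_pc(m) for name, m in conventions.items()}
```

Seen this way, the disagreement is about how much weight each record gets going into the PCA: scaling on the short window inflates records that drift away from their calibration-period level, while skipping the scaling step lets records with large raw variance dominate.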

The M&M response took issue with Huybers' full normalization method, but they did not specifically defend their own.
We re-emphasize that our comparison between the MBH98 method and a covariance PC1 was not presented as an attempt to "remove the bias in MBH98’s method," and that we take no position on the relative merits of using a mean, a covariance PC1, or even using PC analysis at all, in paleoclimate work.
But as we will see, the "hockey stick" emerges with the MM05 method as well, when properly done, and with a wide variety of other statistical methods.

Von Storch and Zorita Comment to MM05a

In a second comment to MM05a, von Storch and Zorita (VZ05) also acknowledged the "artificial hockey stick" (AHS) effect, but showed that the effect was small. They noted that their 2004 study had used climate simulations based on two coupled models (ECHO-G and HadCM3) that used the complete proxy network from MBH with no PCA. They redid this test and used the PCs instead of the full proxy network. They used "pseudoproxies" following the 1902-1980 centering method (MBH) and a 1000-1980 centering method. The number of PCs retained followed the selection rules from MBH98, decided from the eigenvalues for each PC. They concluded that the MBH method does produce differences "but does not have a significant impact but leads only to very minor deviations."

In response to VZ05, M&M stated, "We did not claim that the AHS effect applied to all situations. We did claim that it affected MBH98... where the AHS effect interacted with flawed bristlecone proxies." M&M's response asserts that the MBH method is "extraordinarily sensitive" to the proxies included in the reconstruction. They maintained that while VZ05 successfully provided an example of the AHS effect not applying in all situations, this was irrelevant to evaluating MBH98. However, even in MM05, the amount of bias introduced by the MBH98 centering method was small with respect to the full hockey stick. As best I can tell, those seeking to replicate the problem reported by M&M can find it, but they find it to be much smaller than M&M claim, with a minimal effect on the full reconstruction of NH temperatures.

To close off this section of this multipart series on the hockey stick, I want to return to a couple of analyses that respond to issues common to MM04 and both MM05 papers, first from MBH and second from Wahl and Ammann.

MBH Response to MM04/05b Criticism of their PCA

As noted above, the order of PCs and the number of PCs needed are significantly affected by statistical method. In the MM04/05b alternate reconstruction of NH temperatures, M&M used the top 2 PCs because that was the number of PCs used in the MBH98 reconstruction. The MBH response to MM04/05b pointed out that this was a major flaw in the M&M analysis. In the MBH98 analysis of NA tree ring data, the top 2 PCs explained about 50% of the cumulative variance.


However, in the M&M methodology the top 2 PCs would only explain about 29% of the cumulative variance, and this was not enough. To be consistent with the selection criteria used in MBH98, they would have needed to use the top 5 PCs to explain 50% of the cumulative variance.
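The cumulative-variance argument is easy to check once you have the eigenvalue spectrum. Below is a small sketch. The two spectra are hypothetical: I chose the leading values so that the first sums to just over 50% in two PCs and the second reaches only 29% in two PCs, echoing the figures quoted above, but the remaining values are invented and these are not the actual MBH98 or MM05 eigenvalues. Also keep in mind that the selection rule itself compared eigenvalues against Monte Carlo noise. The 50% threshold is just the simplified framing used in this post.

```python
import numpy as np

def pcs_needed(eigenvalues, target=0.5):
    """Smallest number of leading PCs whose cumulative explained-variance
    fraction reaches target."""
    eigenvalues = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    cumulative = np.cumsum(eigenvalues) / eigenvalues.sum()
    return int(np.searchsorted(cumulative, target) + 1)

# Hypothetical variance fractions; the tail is 47 minor PCs at 1% each.
steep_spectrum = [0.40, 0.13] + [0.01] * 47
flat_spectrum = [0.17, 0.12, 0.09, 0.08, 0.07] + [0.01] * 47

print(pcs_needed(steep_spectrum))   # 2
print(pcs_needed(flat_spectrum))    # 5
```

The flatter the spectrum, the more PCs you have to keep to retain the same share of the information, which is why fixing the count at 2 across both centering conventions is not an apples-to-apples comparison.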

It's important to note that PC4 for M&M is the same as PC1 in MBH98. It turns out that many of the problems found by M&M stem from the fact that they didn't follow standard selection rules used in MBH98.


In the above graph showing the eigenvalue spectrum for MBH (blue) and M&M (red), you can see that PC1 and PC2 in blue are significantly above the Monte Carlo simulations, while PC1 through PC5 in red are significantly above the Monte Carlo simulations, indicating that a greater number of PCs was required by the MM05 method to maintain the same selection criteria. To their credit, M&M admit that using the proper selection criteria produces results with a hockey stick shape.
If a centered PC calculation on the North American network is carried out (as we advocate), then MM-type results occur if the first 2 NOAMER PCs are used in the AD1400 network (the number as used in MBH98), while MBH-type results occur if the NOAMER network is expanded to 5 PCs in the AD1400 segment (as proposed in Mann et al., 2004b, 2004d).
And MBH note that if you plot the MBH98 PC1 and the MM05b PC4, they're nearly identical.

In the above graph, the red is MBH98 PC1 and the blue is M&M PC4. This is why MBH criticism of M&M frequently referred to "censored" data. M&M effectively censored three necessary PCs and it's precisely the reduction of necessary PCs (eliminating PC4) that leads to the differences in the MM04/05b method. 


In the above graph, MBH98 (blue), a direct use of 95 proxy series back to 1404 (yellow), 94 proxies back to 1400 (green) and the instrumental record (red) are compared. The results are similar regardless of method and are robust to removing certain proxies from the reconstruction, contrary to M&M's claims.

Wahl and Ammann Evaluation of M&M

In 2007, Eugene Wahl and Caspar Ammann (WA07) conducted an independent investigation into the reported findings of the papers and comments published by M&M. WA07 provided an indirect analysis, testing MBH98 data by systematically excluding proxies and processing steps that were challenged by M&M. It also provided a direct analysis of the PC methods used by MBH98 and MM05. They developed a new reconstruction of NH temperatures for 1400–1980. Their conclusion was that the MBH98 "reconstruction is robust against the proxy-based criticisms addressed," contrary to the findings of MM05. They found that a "hockey stick" similar to MBH98 was produced whether PCA was used or not used - the "hockey stick" was not an artifact of the particular form of PC analysis employed by MBH98. They also found that the proxy reconstruction was unaffected by the use of the tree ring proxies deemed "controversial" by M&M. When using PCA, the hockey stick was also robust to the centering method used, whether that of MBH98 or MM05. What was significant is which proxies were included in the reconstruction. When "the full extent of the climate information actually in the proxy data is represented by the PC time series" a hockey stick emerges from the data.


WA07 also addressed M&M's proposed "corrections" to MBH98. MM04/05b had asserted that when MBH98 was corrected, NH temperatures in the 15th century may have been as high as the late 20th century. WA determined that these claims were "without statistical and climatological merit." The WA07 reconstruction essentially replicated MBH98, though with a small modification to the 1400-1450 portion of the reconstruction, where the WA07 reconstruction produced temperatures about +0.05–0.10°C warmer than MBH98. However, that correction "leaves entirely unaltered the primary conclusion of Mann et al. (as well as many other reconstructions) that both the 20th century upward trend and high late-20th century hemispheric surface temperatures are anomalous over the last 600 years."

Conclusion

I consider the WA07 paper to be a somewhat definitive, independent response to the various criticisms of MBH by M&M. The hockey stick shape is in the proxy data, and so it can be discovered with a wide variety of statistical methods, provided you use sufficient data. Using the M&M method with the 5 necessary PCs produces a hockey stick. Using the MBH98 method with 2 PCs produces virtually the same hockey stick. In my last post, we saw that Rutherford et al. discovered the same hockey stick using a RegEM method with the MBH98 proxies. Moberg 2005 discovered a similar hockey stick (with a cooler LIA) using another methodology. The WA07 paper also replicated MBH98, and the 15th century was only slightly warmer in WA07 than in MBH98. And this is not all. WA07 also note that MBH98 has been replicated by studies other than the ones we've already considered:

MBH result of anomalous warmth in the later 20th century remains consistent with other paleoclimate reconstructions developed for the last 1–2 millennia (Mann et al., 2007; Osborn and Briffa, 2006; Moberg et al., 2005; Oerlemans, 2005; Cook et al., 2004; Huang, 2004; Jones and Mann, 2004; Mann and Jones, 2003; Esper et al., 2002; Briffa et al., 2001; Huang et al., 2000; Crowley and Lowery, 2000; Jones et al., 1998; Bradley and Jones, 1993), especially in light of the recent reconciliation of the Esper et al. (2002; cf. Cook et al., 2004) and MBH reconstructions reported by Rutherford et al. (2005).

What is clear is that, provided you use sufficient proxy data, the hockey stick is robust with respect to statistical method. It's discovered because the hockey stick is in the proxy evidence, and it's not an artifact of a statistical method.

A lot of the rhetoric about M&M's criticisms has to do with proposed shortcomings of MBH's statistical method. However, the biases in their method are small with respect to the magnitude of the hockey stick, and the hockey stick shows up regardless of methodology. In reality, statistical method was never a meaningful point of contention. The real issue was whether certain NA tree rings should be included in the proxy network, particularly certain Bristlecone Pines. These concerns were adequately addressed in MBH99, but regardless, in this area, M&M are in effect no longer auditing the MBH statistical method. They're making judgments about the scientific evidence about tree ring proxies, and they are not dendrochronologists.

Dendrochronologists agree that these tree rings are useful proxies and that the divergence problem does not undermine the validity of proxy reconstructions. In a future post I plan to cover this in more detail. But suffice it to say here that while M&M may be competent statisticians, they have no expertise in dendrochronology, and rejecting tree ring proxies arbitrarily when doing so ruins your validation statistics is not a compelling argument against the statistical methods used by MBH98.



References:

McIntyre, S., and McKitrick, R. (2005a), Hockey sticks, principal components, and spurious significance, Geophys. Res. Lett., 32, L03710, doi:10.1029/2004GL021750.
https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2004GL021750

Huybers, P. (2005), Comment on “Hockey sticks, principal components, and spurious significance” by S. McIntyre and R. McKitrick, Geophys. Res. Lett., 32, L20705, doi:10.1029/2005GL023395.
https://agupubs.onlinelibrary.wiley.com/doi/10.1029/2005GL023395

McIntyre, S., and McKitrick, R. (2005), Reply to comment by Huybers on “Hockey sticks, principal components, and spurious significance”, Geophys. Res. Lett., 32, L20713, doi:10.1029/2005GL023586.
https://agupubs.onlinelibrary.wiley.com/doi/pdf/10.1029/2005GL023586

von Storch, H., and Zorita, E. (2005), Comment on “Hockey sticks, principal components, and spurious significance” by S. McIntyre and R. McKitrick, Geophys. Res. Lett., 32, L20701, doi:10.1029/2005GL022753.
https://agupubs.onlinelibrary.wiley.com/action/showCitFormats?doi=10.1029%2F2005GL022753

McIntyre, S., and McKitrick, R. (2005), Reply to comment by von Storch and Zorita on “Hockey sticks, principal components, and spurious significance”, Geophys. Res. Lett., 32, L20714, doi:10.1029/2005GL023089.
https://agupubs.onlinelibrary.wiley.com/action/showCitFormats?doi=10.1029%2F2005GL023089

McIntyre, S., & McKitrick, R. (2005b). The M&M Critique of the MBH98 Northern Hemisphere Climate Index: Update and Implications. Energy & Environment, 16(1), 69–100. https://doi.org/10.1260/0958305053516226

Wahl, E.R., Ammann, C.M. Robustness of the Mann, Bradley, Hughes reconstruction of Northern Hemisphere surface temperatures: Examination of criticisms based on the nature and processing of proxy climate evidence. Climatic Change 85, 33–69 (2007). https://doi.org/10.1007/s10584-006-9105-7

Comments

  1. Thanks for this useful summary of the various statistical issues raised by the M&M criticisms of Mann et al’s “hockey stick”.

    M&M did respond to Huybers' comment, arguing that Huybers' proposed “full normalization” method was unjustified, mainly because (they claimed) it exaggerated the effect of the bristlecone proxies, which they regarded as problematic. But they don’t explicitly refute Huybers’ conclusion that their own method erroneously suppresses the hockey stick. In fact, after all the arguments about the effects of various normalizations, centering and measures of significance, M&M state: “...we take no position on the relative merits of using a mean, a covariance PC1, or even using PC analysis at all…”. Thus they don’t really even defend their own method.

    In any case, neither MM05 or Huybers address the issue of the appropriate number of PCs, or the specific quantitative effects of including or excluding various proxy series. So I think you are correct to regard WA07 as definitive.

    Note however that there is a scathing response by M&M to what must have been an earlier version of WA07 (submitted to GRL) posted on McIntyre’s Climate Audit website (unavailable now, but available at https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=caca9d551363670ae34daa1d39b85aa6128da0bd). I found no published response to WA07.

    1. Good point. I had seen that M&M response to what may have been an AW comment to MM05, but I wasn't able to find the AW comment, so I wasn't sure how to evaluate it. I decided just to go with the published WA07.
