Nicola Scafetta on the Performance of CMIP6 Models

In 2020, Zeke Hausfather and colleagues published a study in Geophysical Research Letters[1] examining the performance of climate models against observational data. They concluded, "We find that climate models published over the past five decades were skillful in predicting subsequent GMST changes, with most models examined showing warming consistent with observations, particularly when mismatches between model-projected and observationally estimated forcings were taken into account." Under the model selection criteria described in Hausfather's paper, most of the models produced results that were statistically indistinguishable from observations.

Reliability of Models in Hausfather et al 2020.

However, just last month, Nicola Scafetta examined the performance of the individual CMIP6 models that contributed to projections in the AR6 report.[2] This paper was published in the same journal as Hausfather's paper two years earlier. The CMIP6 models have been heavily scrutinized because a significant number of them calculated high sensitivities, largely due to attempts to simulate cloud feedbacks. So Scafetta looked to see whether these models successfully reproduce the temperature change that occurred over the time frame of the satellite record. To do this, Scafetta calculated the difference between the 2011-2021 mean and the 1980-1990 mean in ERA5 to be 0.56°C. Then he plotted how well these models "predicted" that rise in temperature.
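The warming metric described here is simple enough to sketch in a few lines. This is only an illustration of the arithmetic, not Scafetta's code: the function names are mine, and the input is a hypothetical linear-trend series, not ERA5.

```python
from statistics import mean

def period_mean(series, start, end):
    """Mean of the annual values for years start..end (inclusive)."""
    return mean(v for year, v in series.items() if start <= year <= end)

def warming_between(series, early=(1980, 1990), late=(2011, 2021)):
    """Difference between the late-period mean and the early-period mean."""
    return period_mean(series, *late) - period_mean(series, *early)

# Hypothetical annual anomalies: a straight 0.018 °C/yr trend, NOT real data.
toy = {year: 0.018 * (year - 1980) for year in range(1980, 2022)}

delta = warming_between(toy)
print(round(delta, 3))
```

Note that the two windows are centered on 1985 and 2016, so for a purely linear trend the metric is just the trend multiplied by 31 years.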

He analyzed these models in three groups based on their calculated values for ECS:
Low, 1.80–3.00°C;
Medium, 3.01–4.50°C;
High, 4.51–6.00°C
Then he showed that of these three categories, only the Low sensitivity group accurately predicted the warming from the 1980-1990 mean to the 2011-2021 mean. He concluded that "all models with ECS > 3.0°C overestimate the observed global surface warming" and as such "the high and medium-ECS GCMs are unfit for prediction purposes."
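To make the grouping concrete, here is a minimal sketch of the three ECS bins as listed above. The function name is mine, and the example values are ECS figures taken from the table later in this post.

```python
def ecs_group(ecs):
    """Assign an ECS value (°C) to the Low/Medium/High bins listed above."""
    if 1.80 <= ecs <= 3.00:
        return "Low"
    if 3.00 < ecs <= 4.50:
        return "Medium"
    if 4.50 < ecs <= 6.00:
        return "High"
    return None  # outside the ranges used in the paper

# ECS values for three models from the table later in this post.
for model, ecs in [("MPI-ESM1-2-HR", 2.98),
                   ("FGOALS-f3-L", 3.00),
                   ("BCC-CSM2-MR", 3.04)]:
    print(model, ecs, ecs_group(ecs))
```

Models with nearly identical sensitivities (2.98°C and 3.04°C) land in different groups, which is worth keeping in mind when reading conclusions drawn about "ECS > 3.0°C" models.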

Figure 1 from Scafetta 2022

This paper was severely criticized by Gavin Schmidt at RealClimate,[3] and his criticisms make a lot of sense to me. First, Schmidt shows that, because Scafetta neglected to include the uncertainty estimates for ERA5, the claim that "all models with ECS > 3.0°C overestimate the observed global surface warming" is wrong. Three of the models with ECS > 3.0°C produced results within the CI of ERA5.
Gavin Schmidt's Plot of Ensemble Members and Means with ERA5 (95% CI)

But secondly, if you look at all the ensemble members, and not just the ensemble mean, things look a bit different. Schmidt observes, "49 ensemble members from 18 models are compatible with the ERA5 result. Of those 18 models, half of them have ECS above 3ºC." So contrary to Scafetta's conclusion, 9 of the 18 models that are compatible with ERA5 have an ECS > 3ºC. This seems to be a pretty significant error in Scafetta's conclusions.

The third problem that Schmidt identified is a statistical one that arises when comparing models to observations. The ensemble mean is a "forced pattern," dealing with the impact of climate forcings on temperature without attempting to account for internal variability. But observational data is a combination of this forced signal and internal variability. Scafetta tested the difference between the models and observations against the uncertainty of the forced pattern (without internal variability). Schmidt notes, "This has the bizarre property that you would be almost guaranteed to eventually reject all of the specific model realizations as the number of ensemble members increases." All three critiques make sense to me, and the first two in particular seem extremely obvious.
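A toy simulation illustrates the property Schmidt describes. This is a sketch under stated assumptions, not a reproduction of either analysis: I assume the "uncertainty of the forced pattern" behaves like the standard error of the ensemble mean, and the forced signal (0.6), noise level (0.1), and trial count are all hypothetical. The "observation" is drawn from the same distribution as the ensemble members, so the model is perfect by construction.

```python
import random
from statistics import mean, stdev

def rejection_rates(n_members, trials=2000, forced=0.6, noise_sd=0.1, seed=0):
    """Fraction of trials in which a perfect model is 'rejected' at ~95%.

    Returns (rate using standard error of the mean, rate using ensemble spread).
    All numeric defaults are hypothetical illustration values.
    """
    rng = random.Random(seed)
    rejected_sem = rejected_spread = 0
    for _ in range(trials):
        members = [rng.gauss(forced, noise_sd) for _ in range(n_members)]
        obs = rng.gauss(forced, noise_sd)  # one realization: forced + variability
        m, s = mean(members), stdev(members)
        diff = abs(obs - m)
        # Test 1: compare against the uncertainty of the ensemble mean only.
        if diff > 1.96 * s / n_members ** 0.5:
            rejected_sem += 1
        # Test 2: compare against the spread of individual realizations.
        if diff > 1.96 * s * (1 + 1 / n_members) ** 0.5:
            rejected_spread += 1
    return rejected_sem / trials, rejected_spread / trials

for n in (5, 20, 80):
    print(n, rejection_rates(n))
```

Under these assumptions the first rate climbs as members are added, because the standard error shrinks while the internal variability in the single observed realization does not. The second rate stays low.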

But I don't want to just rehash Schmidt's critique. I think there are some interesting observations we can make from his graph that he didn't explicitly state.
  1. Of the 49 ensemble members compatible with ERA5, most of them were from 9 models calculating ECS > 3ºC.
  2. However, there are fewer ensemble members overall from models calculating ECS < 3ºC. I wonder if more members were available among the higher sensitivity models because of the greater interest in studying them. Either way, model runs from lower sensitivity models are underrepresented here.
  3. The only ensemble members underpredicting warming came from models with ECS < 3ºC, though these models also sometimes overpredicted warming.
  4. It looks like all but one ensemble member from models where ECS ≈ 3ºC (2.98ºC ≤ ECS ≤ 3.04ºC) were consistent with ERA5.
As I looked at the paper, one of the things that struck me was the question of why he grouped the models as he did. I think he did it this way because it produced a somewhat even distribution of models in each group, and the low sensitivity group agreed with ERA5. But what I'm interested in here is how valid the typical estimate of ECS = 3ºC actually is. So I thought, why not regroup Scafetta's work and select the models that show 2.5ºC < ECS < 3.5ºC, then see how they compare to ERA5 and the instrumental record? So that's what I did. Now, I don't have access to the values for the ensemble members that Schmidt plotted above, so below I'm only using data presented in Scafetta's paper. My graph below will thus be subject to the second problem Schmidt found with the paper.
I probably could have organized this graph a little better, but with a little explanation, I think it's clear. Along the y-axis I plotted the mean warming from several datasets (ERA5, JMA, BEST, NOAA, NASA, and HadCRUT5). The shaded region shows the ERA5 uncertainty that I gathered from Gavin Schmidt's post. The ensemble means for three scenarios (SSP2-4.5, SSP3-7.0, and SSP5-8.5) are shown as non-circle shapes, positioned according to their calculated ECS on the x-axis. The mean for each scenario is also shown along the y-axis. What seems clear to me is that the majority of these results appear to be consistent with ERA5, as are the means of all three scenarios. Of the individual results, 11 showed too much warming, 2 showed too little, and 21 were consistent with ERA5.

CMIP6 Models with Calculated Sensitivities
2.5ºC < ECS <  3.5ºC

Model            ECS (°C)  SSP3-7.0  SSP2-4.5  SSP5-8.5
AWI-CM-1-1-MR    3.16      0.76      0.79      0.78
MRI-ESM2-0       3.15      0.63      0.71      0.8
BCC-CSM2-MR      3.04      0.65      0.64      0.66
FGOALS-f3-L      3         0.7       0.68      0.69
MPI-ESM1-2-LR    3         0.57      0.57      0.55
MPI-ESM1-2-HR    2.98      0.59      0.57      0.57
FGOALS-g3        2.88      0.59      0.61      0.6
GISS-E2-1-G p1   2.72      0.7       n/a       n/a
GISS-E2-1-G p3   2.72      0.4       0.58      0.45
MIROC-ES2L f2    2.68      0.56      0.59      0.56
MIROC6           2.61      0.47      0.48      0.51
NorESM2-LM       2.54      0.76      0.62      0.71
Model Mean                 0.615     0.622     0.625

But the surprising thing I noted is that the results are sometimes flipped from what would be expected. For some models SSP2-4.5 produced the most warming, for others SSP3-7.0, and for others SSP5-8.5. The averages of the models under each scenario are also very similar, with SSP3-7.0 averaging the least warming and SSP5-8.5 averaging the most. This suggests to me that there isn't enough time between the beginning and end for the scenarios to separate themselves: 1980-1990 is just too close to 2011-2021. CMIP6 historical simulations run through 2014 and the SSP scenarios only begin in 2015, so the scenarios differ in just the last seven years of the 2011-2021 window. With two decadal means separated by only a 20-year gap, the choice of scenario has almost no impact on the calculated increase in temperature. I take this to mean that Scafetta would have gotten better results if he had chosen to see how well these models reproduced a longer time frame, like the 20th century or warming since 1850.
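The scenario means and the flipped orderings can be checked directly from the table above. The sketch below is just bookkeeping on those numbers, with names of my own choosing. One assumption to flag: the single value listed for GISS-E2-1-G p1 is placed under SSP3-7.0, which is the placement consistent with the reported mean of 0.615.

```python
from statistics import mean

SCENARIOS = ("SSP3-7.0", "SSP2-4.5", "SSP5-8.5")

# model: (ECS, SSP3-7.0, SSP2-4.5, SSP5-8.5); None means no value listed.
TABLE = {
    "AWI-CM-1-1-MR":  (3.16, 0.76, 0.79, 0.78),
    "MRI-ESM2-0":     (3.15, 0.63, 0.71, 0.80),
    "BCC-CSM2-MR":    (3.04, 0.65, 0.64, 0.66),
    "FGOALS-f3-L":    (3.00, 0.70, 0.68, 0.69),
    "MPI-ESM1-2-LR":  (3.00, 0.57, 0.57, 0.55),
    "MPI-ESM1-2-HR":  (2.98, 0.59, 0.57, 0.57),
    "FGOALS-g3":      (2.88, 0.59, 0.61, 0.60),
    "GISS-E2-1-G p1": (2.72, 0.70, None, None),  # column placement assumed
    "GISS-E2-1-G p3": (2.72, 0.40, 0.58, 0.45),
    "MIROC-ES2L f2":  (2.68, 0.56, 0.59, 0.56),
    "MIROC6":         (2.61, 0.47, 0.48, 0.51),
    "NorESM2-LM":     (2.54, 0.76, 0.62, 0.71),
}

def scenario_means(table):
    """Mean warming per scenario, skipping missing values."""
    return {
        name: mean(row[i] for row in table.values() if row[i] is not None)
        for i, name in enumerate(SCENARIOS, start=1)
    }

def warmest_scenarios(table):
    """For each model with all three values, the scenario(s) with most warming."""
    result = {}
    for model, row in table.items():
        values = row[1:]
        if None in values:
            continue
        top = max(values)
        result[model] = [s for s, v in zip(SCENARIOS, values) if v == top]
    return result

means = scenario_means(TABLE)
warmest = warmest_scenarios(TABLE)
print({name: round(value, 3) for name, value in means.items()})
for model, scenarios in warmest.items():
    print(model, scenarios)
```

Ties are reported as lists, since MPI-ESM1-2-LR shows the same warming under two scenarios.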

A paper published in 2021 by Nathan Gillett et al.[4] did precisely that. This of course is a different mix of CMIP6 models, compared to HadCRUT4 instead of ERA5. But Gillett's paper shows the CMIP6 models reproducing warming between 1850 and 2019 pretty accurately:
"Much attention has recently focused on the high climate sensitivity of some CMIP6 models, and while we find that some of the models considered here do overestimate the response to greenhouse gases, on average the greenhouse gas response of these models matches the observations closely (the best estimate of the multimodel greenhouse gas regression coefficient in Fig. 2b is close to one)."
I don't think there's any question at this point that the "high sensitivity" CMIP6 models produce too much warming, but this is because they calculate sensitivities significantly higher than 3ºC. Empirical data still yields results close to 3ºC (slightly higher) using energy balance equations. Schmidt and others have already put together a complaint about this paper, and he seems to strongly suspect that it will be retracted. But whether it is or isn't, I don't think this paper presents a compelling argument that ECS is < 3ºC.

References: 

[1] Hausfather, Z., Drake, H. F., Abbott, T., & Schmidt, G. A. (2020). Evaluating the performance of past climate model projections. Geophysical Research Letters, 47, e2019GL085378. https://doi.org/10.1029/2019GL085378

[2] Scafetta, N. (2022). Advanced testing of low, medium, and high ECS CMIP6 GCM simulations versus ERA5-T2m. Geophysical Research Letters, 49, e2022GL097716. https://doi.org/10.1029/2022GL097716

[3] Gavin Schmidt. "Issues and Errors in a New Scafetta Paper." RealClimate

[4] Gillett, N.P., Kirchmeier-Young, M., Ribes, A. et al. Constraining human contributions to observed warming since the pre-industrial period. Nat. Clim. Chang. 11, 207–212 (2021). https://doi.org/10.1038/s41558-020-00965-9
