Patrick Frank Publishes on Errors Again

Recently I came across yet another paper by Patrick Frank[1] claiming that climate scientists have been underestimating the uncertainties in climate-related data. In this paper, he takes aim at the global mean surface temperature (GMST) record, and he argues that

LiG resolution limits, non-linearity, and sensor field calibrations yield GSATA mean ±2σ RMS uncertainties of, 1900–1945, ±1.7 °C; 1946–1980, ±2.1 °C; 1981–2004, ±2.0 °C; and 2005–2010, ±1.6 °C. Finally, the 20th century (1900–1999) GSATA, 0.74 ± 1.94 °C, does not convey any information about rate or magnitude of temperature change.

The resulting GMST graph from his calculations is below.

Essentially, he's saying that the errors associated with liquid-in-glass (LiG) thermometers are so large that we can have no confidence in the global warming trend in the major GMST datasets. Of course, the organizations producing these GMST datasets all evaluate the uncertainties associated with their anomaly values, and their estimates are invariably much smaller - about ±0.05°C in recent decades, gradually increasing to ±0.15°C in the late 19th century.[2] These uncertainty estimates are also assessed and published in the peer-reviewed literature. Are all these estimates wrong? Has Patrick Frank discovered something that none of those working with the data have found before? Let's see.

Defining Terms

First, a brief overview of the statistics involved, in very general terms. Frank is attempting to calculate the root-mean-square (RMS) uncertainty at 95% confidence for GMST anomalies. In another post, I talk more about what this means, but essentially the idea is that, given the calculated mean value, we can be 95% confident that the true value falls within the calculated range. So if Frank is right that during the 20th century the globe warmed 0.74 ± 1.94°C, then we can have very little confidence that the globe has warmed at all, since the range of ±1.94°C is larger than the calculated mean of 0.74°C. At 95% confidence, the true value could even be negative.

Statisticians assess this by calculating a standard deviation (σ). Assuming a normal distribution, 1σ from the mean in both directions will contain 68.2% of the values in the distribution, while a 2σ range includes 95.4% of the values in the distribution. To calculate the 95% uncertainty, therefore, you can take your 1σ value and multiply it by 1.96; the 95% uncertainty range is slightly less than 2σ. This is often described as the 68-95-99.7 rule, after the approximate percentages for 1σ, 2σ, and 3σ. In many sciences, the 95% or ~2σ confidence level is what is required to establish statistical significance.

Some of the issues in this paper come from Frank's attempts to calculate the uncertainty of mean values, given the σ values for the components of that mean. For instance, to calculate the average temperature for a day, scientists simply average the maximum and minimum temperatures for that day.

Tavg = (Tmax + Tmin)/2 or
Tavg = (1/2)*Tmax + (1/2)*Tmin

So the question is, how do you calculate the uncertainty for Tavg given the σ values for Tmax and Tmin? To assess the uncertainty of this average, we can use the following formula:

σf^2 = a^2*σ^2a + b^2σ^2b, where

σf = the 1σ uncertainty of the average
a = the weight (fraction) of term a in the average
σa = the standard deviation of term a
b = the weight (fraction) of term b in the average
σb = the standard deviation of term b

Since σf is squared (technically this is the variance), we take the square root of the result of the right-hand side of the equation to get the 1σ uncertainty. We can then multiply that value by 1.96 to get the 95% uncertainty. If there are more than two terms in the average, the formula extends as follows:

σf^2 = a^2*σ^2a + b^2σ^2b + ... + n^2σ^2n 

Now what happens if, say, you want to calculate the uncertainty of an average in which the uncertainty of each component is the same (σa = σb = ... = σn) and each term carries an equal weight (each weight equal to 1/n, where n is the number of terms)? In this case you can simplify the formula substantially to

σf^2 = σ^2/n

Where n is the number of terms. This works because you can factor out the common σ^2, and the remaining a^2, b^2, etc. values add up to n/n^2 = 1/n. And this is the key point: the uncertainty we really want for an average is not simply the ~2σ uncertainty of an individual value; it is the standard error of the mean (SEM), σ/sqrt(N), where N is the number of samples - which is exactly what the formula above reduces to. As we'll see, this paper never gets that far and ignores the 1/sqrt(N) factor altogether.
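
If you want to check this numerically, here is a minimal Python sketch (the function name and the 0.2°C placeholder σ are mine, purely for illustration). It propagates uncertainty through a weighted average and confirms that equal weights with equal σ reduce to σ/sqrt(n), the SEM:

import math

def propagated_sigma(weights, sigmas):
    # 1-sigma uncertainty of a weighted average: sqrt(sum of w_i^2 * sigma_i^2)
    return math.sqrt(sum(w**2 * s**2 for w, s in zip(weights, sigmas)))

# Daily average Tavg = (Tmax + Tmin)/2: both weights are 1/2
print(propagated_sigma([0.5, 0.5], [0.2, 0.2]))     # 0.1414... = 0.2/sqrt(2)

# Twelve equal terms with equal sigma reduce to sigma/sqrt(12), i.e. the SEM
n, sigma = 12, 0.2
print(propagated_sigma([1/n] * n, [sigma] * n))     # 0.0577...
print(sigma / math.sqrt(n))                         # identical: 0.0577...
print(1.96 * sigma / math.sqrt(n))                  # the ~2-sigma (95%) version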

The "Misprint" in Frank's Paper

When I was looking through Frank's paper, equation (6) struck me as odd. Look carefully at how it's written.

Frank is calculating the ~2σf (technically 1.96*σf) uncertainty for the land-surface air temperature mean by propagating the monthly uncertainties, but look carefully at what's inside the radical. It can easily be rewritten as (12/12)*(0.198)^2. The 12/12 cancels out, and you're left with sqrt(0.198^2), which reduces simply to 0.198. In other words, all Frank is doing is taking the monthly σ = 0.198 and converting it to a 95% uncertainty. And the result he reports isn't even consistent with that, because 1.96*0.198 = 0.388°C. Something is wrong here, and the reason is that he got the formula wrong. This is what he should have calculated:

~2σf  = 1.96*sqrt((0.198^2)/12) = ±0.112°C
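
You can verify both numbers in a few lines of Python (0.198 is the monthly σ quoted above from Frank's equation (6)):

import math

sigma_monthly = 0.198                                        # monthly 1-sigma value in Frank's equation (6)
as_written = 1.96 * math.sqrt((12/12) * sigma_monthly**2)    # what the printed equation computes
corrected = 1.96 * math.sqrt(sigma_monthly**2 / 12)          # with 12^2, not 12, in the denominator
print(round(as_written, 3), round(corrected, 3))             # 0.388 0.112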

It turns out that others at Skeptical Science have noticed this too, and there's much more to be said about it. From the Skeptical Science summary, I think we can see where Frank's error began: in equation (4), where Frank wrote the formula incorrectly (making the same mistake he made in equation 6) but still got the right answer.

The solution to this equation is ±0.541°C, so why did he say it's ±0.382°C? Notice he's calculating the ~2σ uncertainty for Tavg based on the uncertainties for the Tmax and Tmin values, but he didn't square the a and b terms, both of which are 1/2 (since Tmax and Tmin each contribute half of the daily average). The variance equation should be σ^2 = (1/2)^2*(0.366^2) + (1/2)^2*(0.135^2), which simplifies to (0.366^2 + 0.135^2)/4; take the square root of that and multiply by 1.96, and you get ±0.382°C. Frank published a comment to his paper pointing out that the 2 should have been outside the radical, calling it a misprint. Sure enough, if he took the 2 outside the radical, he'd get the right answer, but technically, given the logic of what he's calculating, it would be better written as a 4 inside the radical.
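
Again, the arithmetic is easy to check (0.366 and 0.135 are the two σ values appearing in Frank's equation (4)):

import math

sigma_max, sigma_min = 0.366, 0.135                                              # 1-sigma values for Tmax and Tmin
as_written = 1.96 * math.sqrt((sigma_max**2 + sigma_min**2) / 2)                 # the "misprinted" form, 2 inside the radical
corrected = 1.96 * math.sqrt((0.5**2) * sigma_max**2 + (0.5**2) * sigma_min**2)  # weights squared, i.e. 4 inside the radical
print(round(as_written, 3), round(corrected, 3))                                 # 0.541 0.382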

And it's this misprint (error?) that explains Frank's mistake in equation 6. The 12 in the denominator should have been 12^2. That is, inside the radical the equation should be (12/12^2)*(0.198^2), which simplifies to (0.198^2)/12. In equation 4, the printed equation is wrong, but he reported the correct result because he didn't actually calculate from the equation he wrote. In equation 6, he calculated his result from the "misprinted" equation and got the wrong answer. And he did the same thing in equation 5.

This equation reduces simply to 1.96*0.195. All he did was calculate the ~2σ uncertainty for the daily 1σ value. The correct value would be

~2σf = 1.96*sqrt((0.195^2)/30.417) = ±0.069°C
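
The same quick check for equation (5), where 0.195 is the daily σ and 30.417 is the average number of days in a month:

import math

sigma_daily = 0.195                                      # daily 1-sigma value in Frank's equation (5)
as_written = 1.96 * sigma_daily                          # what the printed equation reduces to
corrected = 1.96 * math.sqrt(sigma_daily**2 / 30.417)    # with the 1/sqrt(N) reduction over 30.417 days
print(round(as_written, 3), round(corrected, 3))         # 0.382 0.069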

And again, he made the same mistake when calculating the uncertainty for combining global land with SSTs in equation (7).

The 0.7 and 0.3 terms should both be squared in this equation. The correct calculation would be

~2σf =1.96*sqrt((0.7^2)*(0.136^2)+(0.3^2)*(0.195^2)) = ±0.219°C
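
I can't be certain exactly what number the printed equation (7) yields, but here is the corrected arithmetic alongside what you get if the 0.7 and 0.3 weights are left unsquared:

import math

w1, s1 = 0.7, 0.136    # one area weight and its 1-sigma value from equation (7)
w2, s2 = 0.3, 0.195    # the other area weight and its 1-sigma value
unsquared = 1.96 * math.sqrt(w1 * s1**2 + w2 * s2**2)          # weights not squared: about 0.306
corrected = 1.96 * math.sqrt(w1**2 * s1**2 + w2**2 * s2**2)    # weights squared
print(round(unsquared, 3), round(corrected, 3))                # 0.306 0.219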

All of these equations were written incorrectly, and in all but equation (4), these "misprints" caused him to make incorrect calculations that inflate uncertainties by a significant margin. 

SEM and 1/sqrt(N)

And these misprints are not the end of the errors and oversights in this paper. For instance, increasing the number of values averaged (N) should decrease the uncertainty: the more values you average, the smaller the uncertainty in the mean. Yet Frank never calculates an SEM with a 1/sqrt(N) term. His reasoning for leaving it out appears to be
Correlated and non-normal systematic errors violate the assumptions of the central limit theorem, and disallow the statistical reduction of systematic measurement error as 1/√𝑁.
But this is simply false. As has been demonstrated many times, averaging multiple values increases the precision of the average. This is something you can test for yourself. Take 100 values accurate to 2 decimal places. Then duplicate those values, but round the copies to 1 decimal place. Then average both sets of 100 values. You'll find that the two averages are very close to each other, and if you do the same with 1000 values, they will be closer still. This is because while rounding introduces error, that error is effectively random - about half of the errors add to the mean and about half subtract from it - so the more values you average, the more closely the two columns agree. Since averaging cancels out much of the random error, the uncertainty of the mean for N = 100 scales as σ/sqrt(100).
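
Here is a minimal simulation of that experiment (the uniform random values between 10 and 20°C are arbitrary; they're just something to round):

import random

random.seed(0)
for n in (100, 1000, 10000):
    precise = [round(random.uniform(10.0, 20.0), 2) for _ in range(n)]   # values to 2 decimal places
    rounded = [round(x, 1) for x in precise]                             # same values rounded to 1 decimal place
    gap = abs(sum(precise) / n - sum(rounded) / n)
    print(n, round(gap, 5))   # the gap between the two means typically shrinks roughly like 1/sqrt(n)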

Systematic errors are also accounted for and quantified, and bias corrections reduce their impact in all modern GMST datasets. Biases also work in both directions and can cancel each other out: if one bias adds 0.1°C to a value and two other biases each subtract 0.05°C, their net effect is essentially zero. Bias correction makes the net effect of all biases small, and there are numerous papers quantifying biases and correcting for them to remove their impact on temperature trends. There is no justification for removing the 1/sqrt(N) term from the SEM. The uncertainties of GMST anomalies are dominated by sampling issues, not by instrumental error. Because Frank never calculates an SEM, his uncertainty estimates are far too high even after making the above corrections to his equations.

Skeptical Science also observes that Frank has not properly accounted for covariance in these calculations, which would introduce a 2ab*σab term into his equations. As the Skeptical Science article notes, one reason scientists use anomalies in the first place is to reduce this uncertainty:
But anomalies involve subtracting one number from another, not adding the two together. Remember: when subtracting, you subtract the covariance term. It’s σf^2 = a^2*σ^2a + b^2*σ^2b - 2ab*σab. If it increases uncertainty when adding, then the same covariance will decrease uncertainty when subtracting. That is one of the reasons that anomalies are used to begin with!
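
To see the covariance effect in action, here is a small simulation with two positively correlated series (all of the numbers are made up purely for illustration):

import random
import statistics

random.seed(1)
x = [random.gauss(0, 1) for _ in range(100_000)]
y = [xi + random.gauss(0, 0.5) for xi in x]    # y shares most of x's variation, so cov(x, y) > 0

var_sum = statistics.variance([a + b for a, b in zip(x, y)])    # var(x) + var(y) + 2*cov(x, y), roughly 4.25
var_diff = statistics.variance([a - b for a, b in zip(x, y)])   # var(x) + var(y) - 2*cov(x, y), roughly 0.25
print(round(var_sum, 2), round(var_diff, 2))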

Conclusion

I don't see the need to detail these errors more completely. I think the case can be sufficiently made from the above that Frank has not done his due diligence in this paper to provide a competent analysis of GMST uncertainties. The uncertainties calculated in the peer-reviewed literature for the major GMST datasets are competently done, and they do not make these kinds of mistakes. There's no reason to discard the competent analysis in the peer-reviewed literature for the error-prone analysis in Frank's paper.


References:

[1] Frank P. LiG Metrology, Correlated Error, and the Integrity of the Global Surface Air-Temperature Record. Sensors. 2023; 23(13):5976. https://doi.org/10.3390/s23135976

[2] Lenssen, N. J. L., Schmidt, G. A., Hansen, J. E., Menne, M. J., Persin, A., Ruedy, R., & Zyss, D. (2019). Improvements in the GISTEMP uncertainty model. Journal of Geophysical Research: Atmospheres, 124, 6307–6326. https://doi.org/10.1029/2018JD029522

Comments

  1. Typo in RHS of eqn below (7): 219 --> .219
    Better fix it before Frank catches it!! :-)

  2. "σf^2 = a^2*σ^2a + b^2σ^2b" is the wrong equation. You're treating variances as though they are measurements.
    Eqn. 4 expresses the mean of two variances, V_u = (V_u1+V_u2)/2. It is not the sum of two measurements.

    The rest of your analysis also merely repeats the SkS train wreck. I've already worked through it. It's misconceived and wrong throughout. But SkS disallows debate, making them intellectual cowards. You may be different. But if you encourage an actual debate, I can tell you right now that the posted analysis will go down in flames.

    It's also clear you got your stuff from SkS, Scott. So, it's not that they noticed it too. It's that you got your analysis from them, and didn't acknowledge the source.

    Replies
    1. σf^2 = a^2*σ^2a + b^2σ^2b is the correct equation for 4-7, and it agrees with your result for equation 4, since you accidentally got the right answer with that equation, but you got the wrong results for 5-7. All you did in equations 5-6 is convert the 1σ uncertainty to a 95% uncertainty. Your paper is wrong. After finding your mistake, I did more research and found the SkS article that had also found it; it turns out lots of people have pointed this out to you. I agree there's nothing special about my post; the error is completely obvious, and lots of people have noticed it. You're also wrong to remove 1/sqrt(N) from your calculations.

    2. Also note that this is NOT the equation you used in equation (4): "V_u = (V_u1+V_u2)/2." Notice that if you take the square root of this, you'd have a 2 inside the radical, but you admitted that was a "misprint." The equation you actually solved was V_u = (V_u1+V_u2)/4, which is identical to the equation I gave you, and that's the equation that gave you the right answer, not the one you wrote in the text.

    3. Eqn. 4 is a mean of variances. Eqns. 5&6 are RMS. They are not the same.

      The correct answer from eqn. 4 was gotten because I did the calculation correctly, not by accident. The division by 2 *must* be outside the square root in order to retain the relative magnitude of the uncertainty to the resulting mean. You apparently don't know that.

      The equation I "actually solved" was, sqrt(V_u₁²+V_u₂²)/2. The sqrt is butt up against the parenthesis and operates first, division by 2, second. First year algebra.

      About Eqn. 7: You are invited to consult p. 3475 of Rohde & Hausfather The Berkeley Earth Land/Ocean Temperature Record. Earth Syst. Sci. Data, 2020. 12(4), 3469-3479; doi: 10.5194/essd-12-3469-2020, where you will find:

      "Uncertainties for the combined record are calculated by assuming the uncertainties in LAT and SST time series are independent and can be combined in proportion to the relative area of land and ocean."

      Likewise, p. 2623-4 in Folland, et al., Global Temperature Change and its Uncertainties Since 1861. Geophys. Res. Lett., 2001. 28(13): p. 2621-2624; doi:10.1029/2001GL012877, has:

      "We calculated separate RSOA [reduced space optimal averaging] uncertainties for the hemispheres, but used time series of the remaining global uncertainties weighted according to the land- and ocean-based data area fraction in each hemisphere."

      You may come to realize that the two quoted descriptions of method are eqn. 7 in words.

      Your arguments are ill-conceived, Scott.

    4. Also, none of the uncertainties stem from random error. The assumptions of Gaussian statistics are violated. Applying 1/sqrt(N) is wrong.

    5. This is objectively false: "none of the uncertainties stem from random error." All measurements include random error, and biases can be quantified and therefore corrected. You would have to imagine that the instrumental record not only contains biases, but also that they happen to be INCREASING through time and are predominantly aligned toward creating a warming trend. But globally, bias corrections actually reduce the warming trend.

    6. "It was actually "V_u = (1/2)^2*(V_u1) + (1/2)^2*(V_u2)," Dyslexia. Solving eqn. 4 the way it is written yields 0.541. One can't "accidentally" get the right answer from the wrong equation.

      Eqns. 5&6 are RMS. Agreed they're trivial, but are necessarily there for completeness.

      Bevington eqn. 4.23 is the reduction of uncertainty when averaging measurements with random iid error. Virtually all of Bevington's analyses and equations are strictly valid only for random error.
      .
      However, none of the errors considered in LiG Met. are random. They're systematic with non-normal distributions. The statistical assumptions of 4.23 are violated, and it cannot be used.

      Bevington 4.22 (which will likely be your fallback) is the equation for the uncertainty in a mean. But I used eqns. 5&6 because the analysis required the mean of the uncertainty. They are not the same.

      "All measurements include random error, ..." Equation 4 is dealing with the inherent instrumental uncertainty of a 1C/division thermometer. Not with measurements. Those particular uncertainties have no random component. If you'd read the paper, you'd know that.

      You're criticizing the paper without having studied it, aren't you, Scott.

      "...biases can be quantified and therefore corrected." Only when one knows them. Which they do not. If you'd read LiG Met. with any sort of care, you'd know that, too.

      Systematic field measurement error from environmental variables is not random (demonstrated) and the error biases are not known because the measurement errors have not been determined. They've all just been hand-waved away.

    7. You got eqn 4 wrong in the paper and then again in your first comment here. If you solve the wrong equation incorrectly, you could accidentally get the right answer. Eqns 5-6 are not RMS; you simply multiplied 1σ by 1.96. You literally just multiplied the σ value by n/n, then squared σ and then took the sqrt of σ^2. All completely unnecessary, and did nothing whatsoever. You don't have to be trivial to be complete. The equation you should have used was 4.23 in Bevington: σf^2 = (σ^2)/N; that's how you calculate the variance of the mean, and if you take the sqrt, you get the standard error of the mean, which is what you should have calculated. I honestly don't think you know the difference between random and systematic errors, and if these are non-normal distributions, you shouldn't be using the 68-95-99.7 rule anyway.

      "If you'd read LiG Met. with any sort of care..." With all do respect, if you WROTE your paper with any sort of care, you wouldn't make these blunders. Eqn 4 you wrote wrong, then blamed it on a misprint, but then repeated the same wrong equation above. Eqn 6 you wrote wrong in the exact same way but solved it wrong too. The actual solution to the eqn 6 you wrote is 0.388, not 0.382, and fixing the eqn 6 to what it should be reduces it to 0.112. It seems it's really hard for you to get any equation right. Above you wrote, "The equation I 'actually solved' was, sqrt(V_u₁²+V_u₂²)/2." Hardly. Do you really want me to believe the squared variances, added them, then took the sqrt, then divided by 2? And you're telling me to read with care? Perhaps the problem is there are way too many people reading your paper with care, and that's why so many people are pointing out the same exact errors. So please don't lecture me about care; you've been pretty careless in your paper.

    8. "The equation you should have used was 4.23 in Bevington: σf^2 = (σ^2)/N; that's how you calculate the variance of the mean"

      For measurements with random errors *only*. In LiG Met., none of the errors I assess have a normal distribution. That's a demonstrated fact. Which you've ignored repeatedly.

      "0.112." Wrong yet again. Eqns. 5&6 are root-mean-square. The division is by N, not N².

      You're insistently wrong, Scott. You've implicitly admitted to have not read the paper. From your arguments you don't understand the logic of the analysis.

      For anyone interested and reading here, the full refutation of Scott's nonsense, and Bob Loblaw, jg's at SkS, can be found here:
      https://notrickszone.com/2023/08/24/dr-patrick-franks-refutation-of-a-sks-critique-attempt-loblaws-24-mistakes-exposed/
      Compare for yourself.

    9. All measurements have random errors, and if you're right that none of the errors you assess have a normal distribution, then you shouldn't be using the 68-95-99.7 rule, which would mean your paper has even more problems. Here's a very elegant derivation of the equation for the variance of the mean that you should have used. Since you apparently can't be bothered to read Bevington carefully, maybe you'll watch it on YouTube. She begins the derivation a little earlier than I did above, and she explains it in more detail.
      https://www.youtube.com/watch?v=XymFs3eLDpQ

      I'd also encourage you to go back and read the sections you quoted from in the two papers you cited above. If you read those with care, you'll see that they did these calculations correctly.
