
Estimating Climate Sensitivity from Observations using Bayesian Methods

by Duo CHAN.

October 2022, Woods Hole

Climate sensitivity refers to the increase in Earth's surface temperature due to a doubling of the atmospheric CO₂ concentration. It is a crucial quantity for predicting how fast the warming will be in the future.

In this notebook, we will estimate climate sensitivity from historical observations. Specifically, we will learn:

  • How to build a simple energy balance model for Earth's climate.
  • How to estimate key parameters in that model using Bayesian approaches.
  • Differences between equilibrium and transient climate sensitivity.
  • How to perform a proper model-data comparison.

1. A simple energy balance model for Earth's Climate

The simplest way to model Earth's climate is probably to treat the Earth as a single point. Accounting for energy imbalance, climate feedbacks, and the attendant temperature change, the following equation is a good first-order description of Earth's temperature evolution around some equilibrium state, say the pre-industrial period,

$c_p \frac{dT'}{dt} = -\lambda T' + F'$,    (1)

where $T'$ and $F'$ denote, respectively, temperature and forcing anomalies.

There are two parameters in this model.

  • $\lambda$ is the feedback parameter (positive here, so that it damps temperature anomalies). It combines many processes, including the Planck feedback, ice-albedo feedback, water-vapor feedback, lapse-rate feedback, etc.
  • $c_p$ is the heat-capacity parameter, often linked to the depth of the ocean mixed layer; it controls how fast temperature responds to forcing.

At equilibrium, $\frac{dT'}{dt} = 0$, and the associated temperature response is $T' = F'/\lambda$, which we call the climate sensitivity.
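For a quick sense of scale (an illustrative calculation with an arbitrary feedback value, not the estimate we will derive below): taking the effective radiative forcing of doubled CO₂ to be roughly $F'_{2\times} \approx 3.7$ W/m² and assuming $\lambda = 1.85$ W/m²/°C gives

$T' = F'_{2\times}/\lambda \approx 3.7/1.85 = 2.0$ °C.

The rest of this notebook is about estimating $\lambda$, and hence this ratio, from observations.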


From 1750 to 2020, the world cumulatively emitted more than 1.7 trillion tons of CO₂, which has pushed the atmospheric CO₂ concentration above 415 ppm and raised the global mean temperature by about 1.2°C relative to the 1880s.

The figure below shows our current best estimates of historical climate forcing, $F'$, and temperature evolution, $T'$. Note that, for simplicity, we do not distinguish between different types of forcing, such as greenhouse gases, anthropogenic aerosols, volcanoes, and the sun.

In [1]:
# First Load and Visualize Data
import numpy  as np
import pandas as pd
from scipy import stats
from matplotlib import pyplot as plt

T_df = pd.read_csv('Tempreature.txt', sep=',', header=0)
F_df = pd.read_csv('Forcing.txt', header=0, delim_whitespace=True)
Nino_df = pd.read_csv('Nino34.txt', header=0, delim_whitespace=True)
T    = T_df["Anomaly (deg C)"].to_numpy()
Terr = (T_df["Upper confidence limit (97.5%)"].to_numpy() - \
        T_df["Lower confidence limit (2.5%)"].to_numpy())/3.88    # Get 1 s.d. error of Temperature
F    = F_df["All"].to_numpy()
Yr   = T_df["Time"].to_numpy()
Nino = Nino_df["Nino34"].to_numpy()

# Subset data from 1900-2021
yr_start = 1900
T    = T[Yr >= yr_start]
Terr = Terr[Yr >= yr_start]
F    = F[Yr >= yr_start]
Nino = Nino[Yr >= yr_start]*1000
Yr   = Yr[Yr >= yr_start]

idx  = np.arange(1960,1990)-Yr[0]
T    = T - np.mean(T[idx])         # Temperature anomalies to 1960--1990 

plt.plot(Yr,T, label = "Temperature [°C]");
plt.plot(Yr,F, label = "Forcing [W/m2]");
plt.grid();
plt.legend();

So, can we use these time series and Eq. 1 to estimate $\lambda$ and $c_p$ in order to understand climate sensitivity?

2. Solving the problem using Bayesian methods

After defining the problem, we need to solve it. Specifically, we need to find combinations of $\lambda$ and $c_p$ such that, when driven by the observed $F'$, Eq. 1 produces a $T'$ that reasonably agrees with the observed $T'$.

Philosophically, fitting a differential equation is no different from fitting a linear model, e.g., $y = kx+b$: in both cases, we are trying to find combinations of parameters that decrease some loss function or increase the likelihood of the data. However, unlike linear regression, the problem we are dealing with has no closed-form analytical solution, so we need new methods.
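For contrast, here is a minimal, self-contained illustration of the closed-form least-squares fit that makes the linear case easy (synthetic data, unrelated to the climate series; the variable names are only for illustration):

import numpy as np

x = np.arange(10.0)
y = 2.0 * x + 1.0 + np.random.normal(0, 0.1, x.shape)   # synthetic data from y = kx + b
X = np.vstack([x, np.ones_like(x)]).T                   # design matrix with columns [x, 1]
k_hat, b_hat = np.linalg.lstsq(X, y, rcond=None)[0]     # closed-form least-squares solution
print(k_hat, b_hat)                                     # should be close to 2 and 1

No such closed form exists for the parameters of Eq. 1, which is why we turn to the methods below.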

The Bayesian method is a great candidate, with several benefits,

  • It allows for estimating parameters no matter how complicated the underlying function is.
  • It estimates uncertainties comprehensively and captures complicated covariance between parameters.
  • We can bring our prior knowledge about the parameters into the inference.

2.1 How do Bayesian methods work?

The essence of Bayesian methods is Bayes' rule,

$P(\theta|D) = \frac{P(D|\theta) P(\theta)}{\int P(D|\theta)P(\theta) d\theta} \propto P(D|\theta) P(\theta)$.    (2)

On the left-hand side, we have $P(\theta|D)$, the probability of the parameters, $\theta$, conditioned on the data, $D$. This quantity is what we need to calculate, and it is called the Posterior in Bayesian terminology.

On the right-hand side, we have $P(D|\theta)$ and $P(\theta)$:

  • $P(D|\theta)$ is the probability of seeing the data given a combination of parameters, which we call the Likelihood,
  • $P(\theta)$ is the knowledge we have about $\theta$ before seeing any data, which we call the Prior.

As a result, the process of performing Bayesian analysis is to use Likelihood to update the Prior and obtain the Posterior.


If you have no statistical background, you may find the following example helpful for understanding Bayes' rule.

Suppose $\theta$ can take values from two options, $\theta_1$ and $\theta_2$, and data can also take values from two options, $D_1$ and $D_2$.

We know that, given $\theta_1$, there is one pathway to generate $D_1$, and given $\theta_2$ there are two pathways to generate $D_1$. If we make an observation and it turns out to be $D_1$, we can reason that $\theta_2$ is more likely because it accounts for two of the three pathways that generate $D_1$. On the other hand, $\theta_1$ is less likely but not impossible.

So a reasonable estimate of how likely each value of $\theta$ is, $P(\theta|D_1)$, weighs each value by the number of pathways it contributes. That is, $P(\theta_1|D_1) = 1 / (1 + 2) = 1/3$, and $P(\theta_2|D_1) = 2 / (1 + 2) = 2/3$. Essentially, the numerator, 1 or 2, is $P(D|\theta)$, and the denominator, (1 + 2), is $\int P(D|\theta) d\theta$ in Bayes' rule.
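To make this concrete, here is a minimal numerical version of the pathway example (a sketch only; the array names are introduced here for illustration): multiplying a flat prior by the pathway counts and normalizing reproduces the 1/3 and 2/3 above.

import numpy as np                             # already loaded in the first code cell

prior      = np.array([0.5, 0.5])              # flat prior over [theta_1, theta_2]
likelihood = np.array([1.0, 2.0])              # pathways to D_1 given theta_1 and theta_2
posterior  = prior * likelihood                # numerator of Bayes' rule
posterior  = posterior / posterior.sum()       # normalize, as the denominator in Eq. 2 does
print(posterior)                               # -> [0.333..., 0.667...], i.e., 1/3 and 2/3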


Coming back to our climate example, how should we calculate $P(D|\theta)$ and $P(\theta)$?

[Data Likelihood] Calculating $P(D|\theta)$ often involves two components, a process-level model and an observation model.

  • The process-level model generates the true values of the quantities to be observed. In our climate example, it is Eq. 1, which generates $T'_t$ given $\lambda$, $c_p$, and $F'_t$. Here, the subscript $_t$ denotes the underlying true values to be observed. This model has an analytical solution,
    $T_t'(t) = \frac{1}{c_p} \int_{0}^{t} e^{-\frac{\lambda}{c_p}(t-\tau)} F'(\tau) d\tau$,
    where $t$ in the parentheses denotes time. You may notice that we assume the forcing estimates, $F'$, are perfect, i.e., $F'_t = F'$. This assumption is bold because the uncertainty in aerosol-forcing estimates remains high and is an active field of research.
  • The observation model relates $T'_t$ to the observed $T'$. Here we assume observed temperatures are the true values plus Gaussian noise with standard deviation $\sigma$, which gives the Gaussian likelihood used in the code below.
In [2]:
# Process-level model

# F:    forcing            [unit: W]
# lmbd: feedback parameter [unit: W/K]
# cp:   mixed layer depth  [unit: hundred m]

def climate_model(F,lmbd,cp):
    # Unit conversion: seconds per year divided by (density ~1000 kg/m3 x specific heat
    # ~4000 J/kg/K x 100 m per unit of cp x ~0.7, presumably the ocean area fraction)
    scl    = 86400 * 365 / 1000 / 4000 / 100 / 0.7
    kernel = np.exp(- lmbd / cp * scl * (Yr-Yr[0]+1))
    T_hat  = np.convolve(F / cp * scl, kernel)           # Calculate T_t
    T_hat  = T_hat[0:Yr.shape[0]]
    return T_hat - np.mean(T_hat[idx])                   # Mean over 1960--1990 is zero

# Test the process-level model
plt.plot(Yr,T, label = "Temperature [°C]");
plt.plot(Yr,climate_model(F,lmbd = 1.5,cp = 1), label = "Simulated T [°C]");  # Let's try a random set of parameters
plt.grid();
plt.legend();

[Prior] Calculating $P(\theta)$ depends on the prior we choose, which we will discuss later. For now, we assume that we know nothing about $\lambda$ and $c_p$, so that their probability density function (pdf) is flat, i.e., $P(\lambda, c_p) = 1$. We also assume that the standard deviation of the observational error of temperature, $\sigma$, which is the parameter in the observation model, is 0.2°C.

Plugging these back into Bayes' rule, the posterior distribution is,

$P(\lambda, c_p \mid T', F', \sigma = 0.2) \propto P(T' \mid \lambda, c_p, \sigma = 0.2, F') = \mathbf{normpdf}(T';\ T'_t,\ 0.2)$.

2.2 Perform the analysis

Now we know how to compute the posterior (up to a normalizing constant). What we ultimately need, however, are summaries of it, such as the mean and uncertainty of $\lambda$. There are generally two types of methods to analyze the posterior distribution,

  • Evaluating on a grid of parameters
  • Sampling from the posterior distribution

2.2.1 Evaluating the posterior on a grid of parameters

Let's start with the grid-search method, which requires looping over every possible combination of parameters.

In [3]:
# Posterior of the climate model problem
# x: parameters  [lambda, cp]
def climate_model_post_flat_prior(x0):
    # To avoid diminishing likelihood when multiplying small numbers many times, 
    # we use log-likelihood, and multiplication becomes the sum in the log space
    l_pr  = 0                                           # Flat prior: log(1) = 0 in log space
    T_hat = climate_model(F,x0[0],x0[1])                # predict using the process-level model              
    l_lik = np.sum(stats.norm(T_hat,0.2).logpdf(T))     # pass through the observational model 
    return l_pr + l_lik, l_pr, l_lik


# Now let's loop over all likely combinations of lambda and cp
lambda_step = 0.04                                      # grid spacing for lambda
cp_step     = 0.07                                      # grid spacing for cp
lambda_grid = np.arange(lambda_step,2,lambda_step)    # define the grid
cp_grid     = np.arange(cp_step,7,cp_step)
l = np.zeros([lambda_grid.shape[0],cp_grid.shape[0]])   # log posterior
ct_1 = -1
for lmbd in lambda_grid:
    ct_1 += 1
    ct_2 = -1
    for cp in cp_grid:
        ct_2 += 1
        l[ct_1,ct_2], _, _ = climate_model_post_flat_prior(np.array([lmbd, cp]))
In [4]:
# Plot the grid search
def log2lin_normalize(l_in,a_step,b_step):
    L_out = np.exp(l_in-np.max(l_in))                        # Convert log-likelihood back to likelihood
    return L_out / np.sum(L_out) / a_step / b_step           # Normalize likelihood

# A function for plotting 2D pdf (heat map) and marginal distributions (black curves)
def show_P_2D(ax,x_list,y_list,P,tit):

    pcm = ax.pcolor(x_list,y_list,P.transpose(),cmap = "YlOrRd");       # Heatmap
    ax.plot(x_list,np.sum(P,axis = 1)*.04 + np.min(y_list),'k-')         # Marginal distributions
    ax.plot(np.sum(P,axis = 0)*0.03 + np.min(x_list), y_list,'k-')
    cbar = fig.colorbar(pcm, ax = ax);                                  # Add colorbar
    cbar.ax.set_ylabel(tit, rotation=90);
    ax.set(xlabel = "$\lambda$ [W/°C]", ylabel = "Mixed-layer depth [100 m]")
    # ax.set(xlabel = " $\gamma$ [°C/W]", ylabel = "decay time scale [years]")
    ax.set_xlim([np.min(x_list), np.max(x_list)])
    ax.set_ylim([np.min(y_list), np.max(y_list)])

L   = log2lin_normalize(l,lambda_step,cp_step)
fig = plt.figure()
ax  = plt.subplot(111)
show_P_2D(ax,lambda_grid,cp_grid,L,'Posterior')
In [5]:
def find_ci_from_grid(x,pdf,alpha=0.05):
    cdf = np.cumsum(pdf / pdf.sum())
    return np.concatenate((x[np.argwhere(cdf<alpha/2)[-1]], x[np.argwhere(cdf<0.5)[-1]],  x[np.argwhere(cdf>1-alpha/2)[0]]))

# Summarize estimates and report the median and 95% c.i.
lambda_ci = find_ci_from_grid(lambda_grid,L.sum(axis = 1))
print('lambda            : ' + str(lambda_ci[1])+ ' ' + str(lambda_ci[np.array([0,2])]) + 'W/°C (95% c.i.)')
cp_ci = find_ci_from_grid(cp_grid,L.sum(axis = 0))
print('mixed layer depth : ' + str(cp_ci[1])+ '  ' + str(cp_ci[np.array([0,2])]) + 'hm      (95% c.i.)')
lambda            : 0.88 [0.32 1.24]W/°C (95% c.i.)
mixed layer depth : 3.5  [2.38 6.02]hm      (95% c.i.)

However, the ocean mixed layer is not as deep as 700 m. Suppose we know that the mixed-layer depth has a prior distribution, $P(c_p) \sim N(1, 0.3)$ hm. Let's bring in this piece of information and redo the grid search.

In [6]:
# Hyper parameters for priors
cp_pr_mu = 1.0   # Prior : mean of cp
cp_pr_sd = 0.3   # Prior : standard deviation of cp

# Posterior of the climate model problem
# x: parameters  [lambda, cp]
def climate_model_post_informative_prior(x0):
    # To avoid diminishing of likelihood when multiplying small numbers many times, 
    # we use log-likelihood, and multiplication becomes the sum in the log space
    l_pr  = stats.norm(cp_pr_mu,cp_pr_sd).logpdf(x0[1]) # Informative prior
    T_hat = climate_model(F,x0[0],x0[1])                # predict using the process-level model              
    l_lik = np.sum(stats.norm(T_hat,0.2).logpdf(T))     # pass through the observational model 
    return l_pr + l_lik, l_pr, l_lik

# Now let's loop over all likely combinations of lambda and cp
l_pst = np.zeros([lambda_grid.shape[0],cp_grid.shape[0]])   # log posterior
l_pr  = np.zeros([lambda_grid.shape[0],cp_grid.shape[0]])   # log prior
l_lik = np.zeros([lambda_grid.shape[0],cp_grid.shape[0]])   # log likelihood
ct_1 = -1
for lmbd in lambda_grid:
    ct_1 += 1
    ct_2 = -1
    for cp in cp_grid:
        ct_2 += 1
        l_pst[ct_1,ct_2], l_pr[ct_1,ct_2], l_lik[ct_1,ct_2] = \
        climate_model_post_informative_prior(np.array([lmbd, cp]))
In [7]:
L_pr   = log2lin_normalize(l_pr,lambda_step,cp_step)       # Prior distribution
L_lik  = log2lin_normalize(l_lik,lambda_step,cp_step)      # Data likelihood :: the same as before
L_pst  = log2lin_normalize(l_pst,lambda_step,cp_step)      # Posterior distribution

fig =  plt.figure(num=1, figsize=(10, 6))
ax  = plt.subplot(221)
show_P_2D(ax,lambda_grid,cp_grid,L_pr,'Prior')
ax  = plt.subplot(222)
show_P_2D(ax,lambda_grid,cp_grid,L_lik,'Likelihood')
ax  = plt.subplot(223)
show_P_2D(ax,lambda_grid,cp_grid,L_pst,'Posterior')

The prior information influences our posterior estimate. For example, the new estimate no longer has the long tail toward mixed layers deeper than 300 m, thereby excluding values of $\lambda$ smaller than about 1.0 W/°C.

The Bayesian estimate is, therefore, a combination of prior knowledge and data. In other words, Bayesian methods can be used to pool different pieces of information. Several classic data-assimilation techniques that we will cover in class, such as the Kalman filter, are posterior distributions that combine model simulations (the prior) with observations (the data likelihood).


2.2.2 Sampling from the posterior distribution

One disadvantage of grid search is that the number of required computations increases exponentially with the number of parameters, which quickly becomes a nightmare often called the curse of dimensionality. (With 50 grid points per parameter, for example, a five-parameter model already requires 50^5, roughly 3×10^8, evaluations.) Even in the 2-D case above, the lower-left and upper-right corners of the parameter space have low probability, and spending time evaluating them is inefficient. Moreover, it is not easy to compute quantities such as the mean and standard deviation from grid-search results. These quantities can be evaluated more efficiently if we can draw samples from the posterior distribution.

Markov chain Monte Carlo (MCMC) methods comprise a class of algorithms for sampling from a probability distribution. In this notebook, we are going to introduce the Metropolis–Hastings algorithm, which obtains a sequence of random samples from a probability distribution from which direct sampling is difficult.

In [8]:
# The Metropolis-Hastings algorithm is simple and takes less than ten lines of code
def Metropolis_Hastings(x0,fun,stp = 0.05):
    x1 = x0 + np.random.normal(0,stp,x0.shape)  # 1. Propose a jump to a new location
    r  = np.exp(fun(x1) - fun(x0))              # 2. Posterior ratio between the two locations (fun returns the log posterior)
    if r >= 1:                                  # 3. If the new location has a higher posterior probability,
        x0 = x1                                 #    accept it as the new sample
    elif np.random.uniform(0,1,1) < r:          # 4. Otherwise accept it with probability r: draw a uniform random number
        x0 = x1                                 #    on [0,1] and accept the new location if that number is smaller than r
                                                # 5. If rejected, stay at the original location and record it as the new sample
    return x0,x1,r                              # The second and third returns are for visualization

The mathematical details and proof of the algorithm can be found on this Wikipedia page.

Here, we focus on how to implement it in the climate model example.

In [9]:
# Posterior of the climate model problem
# x: parameters  [lambda, cp]
def climate_model_post_mcmc_2D(x0):
    # To avoid diminishing likelihood when multiplying small numbers many times, 
    # we use log-likelihood, and multiplication becomes the sum in the log space
    l_pr  = stats.norm(cp_pr_mu,cp_pr_sd).logpdf(x0[1]) # Informative prior
    T_hat = climate_model(F,x0[0],x0[1])                # predict using the process-level model              
    l_lik = np.sum(stats.norm(T_hat,0.2).logpdf(T))     # pass through the observational model 
    return l_pr + l_lik

np.random.seed(999)                      # Make the results reproducible

N_samp  = 2000                           # Take how many samples
X       = np.zeros((N_samp,2))           # Allocate space for MCMC samples
X_P     = np.zeros((N_samp,2))           # Allocate space for MCMC proposal
R       = np.zeros((N_samp,1))           # Allocate space for MCMC ratio
x0      = np.array([0.55, 1.1])          # Initiate the sampler
x_start = x0
for ct in np.arange(0,N_samp,1):
    x0,X_P[ct,:],R[ct,:]  = Metropolis_Hastings(x0, \
            climate_model_post_mcmc_2D, stp = np.array([0.08, 0.15]))  # Sampling using the MH-MCMC method
    X[ct,:]   = x0

if_accept = np.all(X == X_P,axis = 1)

The animation below gives a step-by-step visualization of what happens when we run the block above.

The black dot denotes the proposal at each step, and the gray line shows the sampled chain. Also shown are the probability ratio between the proposal and the current location and whether the proposal is accepted.

[Animation: the Metropolis–Hastings chain exploring the posterior, step by step]

Several points to notice:

  • Usually, the first half of the chain is considered warm-up and is discarded.

  • For the retained samples, we need to check that they are stationary, i.e., that the beginning and the end of a chain explore the same region of the parameter space.

  • We also need to check for convergence: if we start independent chains from different locations, do they explore the same region? To perform this check, we run several chains starting from different locations.

In [10]:
# Now we wrap a chain of the MCMC sampler into a function, and run multiple chains to check convergence
def MCMC_one_chain(x0,N_samp = 1000):
    X       = np.zeros((N_samp,2))                               # Allocate space for MCMC samples
    for ct in np.arange(0,N_samp,1):
        x0,_,_  = Metropolis_Hastings(x0, \
              climate_model_post_mcmc_2D, stp = np.array([0.08, 0.15]))    # Sampling using the MH-MCMC method
        X[ct,:] = x0
    return X

np.random.seed(999)                     # Make the results reproducible
N_samp  = 3000
X_1 = MCMC_one_chain(np.array([.55,1.1]),N_samp)
X_2 = MCMC_one_chain(np.array([.55,3.0]),N_samp)
X_3 = MCMC_one_chain(np.array([1.3,1.1]),N_samp)
X_4 = MCMC_one_chain(np.array([1.3,3.0]),N_samp)
In [11]:
# What do the parameters in the four chains look like?
# These plots are also called trace plots, and they can be plotted for each parameter
id = 0     # 0 for lambda, 1 for cp
plt.plot(X_1[:,id])
plt.plot(X_2[:,id])
plt.plot(X_3[:,id])
plt.plot(X_4[:,id])
plt.xlabel("a")
fig.show()

After warm-up, the four chains visually converge. We can also quantify convergence using the ratio of the total variance across all chains to the mean variance within individual chains,

$r = \frac{V_{total}}{V_{intra}}$,

and $r > 1.02$ normally suggests poor convergence for a long chain.

In [12]:
# Calculate r for parameters
id = 0             # 0 for lambda, 1 for cp
var_intra = (np.var(X_1[int(N_samp/2):,id])+np.var(X_2[int(N_samp/2):,id])+np.var(X_3[int(N_samp/2):,id])+np.var(X_4[int(N_samp/2):,id]))/4
var_total = np.var(np.concatenate((X_1[int(N_samp/2):,id],X_2[int(N_samp/2):,id],X_3[int(N_samp/2):,id],X_4[int(N_samp/2):,id])))
print("R = " + str((var_total / var_intra).round(3)))
R = 1.012

3. Putting things together

So far, we have learned:

  • How to build a simple energy-balance model to simulate Earth's surface temperature during the 20th century.

  • That Bayesian methods use the data likelihood to update prior information, thereby pooling information from different sources.

  • That a model often consists of a deterministic process-level model, which generates the true underlying quantities to be observed, and an observation model, which is used to evaluate the data likelihood.

  • How to use grid search and MCMC methods to estimate model parameters in a Bayesian framework.

Can we further improve the above estimates?

3.1 Improve the error model

One possibility is to treat the misfit standard deviation, $\sigma$, as an unknown parameter rather than fixing it at 0.2°C. The observational error generally decreases with time, meaning that earlier data provide weaker constraints on the parameters. In addition, misfits between model and data do not arise only from observational errors, $\sigma_o$. Internal climate variability, $\sigma_i$, such as ENSO, also leads to interannual variations that cannot be captured by the simple climate model.

As a result, a better error model would be,

$\sigma^2 = \sigma^2_i + \sigma^2_o$,

where we use estimates of $\sigma^2_o$ from HadCRUT5 and let the MCMC determine $\sigma^2_i$. We also use a parameter to remove the temperature change that is linearly associated with the Nino3.4 index.
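The ENSO regression does not appear explicitly in the cell below. As a minimal sketch of what such a correction could look like (illustrative only; beta and T_noenso are names introduced here, and the regression coefficient could instead be added as one more parameter sampled by the MCMC), one can regress temperature on the Nino3.4 index loaded in the first code cell and subtract the linearly congruent component:

# Illustrative sketch: remove the component of T linearly congruent with Nino3.4
beta, intercept = np.polyfit(Nino, T, 1)       # least-squares slope and intercept of T against Nino3.4
T_noenso = T - beta * (Nino - Nino.mean())     # subtract the ENSO-linear component (mean preserved)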

In [13]:
# x: parameters  [lambda, cp, sigma]
def climate_model_post_three_para(x0):
    # To avoid diminishing likelihood when multiplying small numbers many times, 
    # we use log-likelihood, and multiplication becomes the sum in the log space
    l_pr  = stats.norm(cp_pr_mu,cp_pr_sd).logpdf(x0[1])                   # Informative prior on cp
    T_hat = climate_model(F,x0[0],x0[1])               # predict using the process-level model              
    l_lik = np.sum(stats.norm(T_hat,np.sqrt(Terr**2+x0[2]**2)).logpdf(T))  # pass through the observational model 
    return l_pr + l_lik

np.random.seed(1000)                     # Make the results reproducible
N_samp  = 10000                          # Take how many samples
X       = np.zeros((N_samp,3))           # Allocate space for MCMC samples
X_P     = np.zeros((N_samp,3))           # Allocate space for MCMC proposal
R       = np.zeros((N_samp,1))           # Allocate space for MCMC ratio
x0      = np.array([1, 2, 0.1])          # Initiate the sampler
x_start = x0
for ct in np.arange(0,N_samp,1):
    x0,X_P[ct,:],R[ct,:]  = Metropolis_Hastings(x0, \
            climate_model_post_three_para, stp = np.array([0.06, 0.12, 0.006]))  # Sampling using the MH-MCMC method
    X[ct,:]   = x0

if_accept = np.all(X == X_P,axis = 1)
In [14]:
# Report the confidence interval of each parameter
Stats = np.quantile(X[int(N_samp/2):,:], np.array([0.5,0.025,0.975]), axis = 0).round(2)
print("Lambda            : " + str(Stats[0,0]) + "   [" + str(Stats[1,0]) + ', ' + str(Stats[2,0]) + '] W/°C  (95% c.i.)')
print("Mixed-layer depth : " + str(Stats[0,1]) + "  [" + str(Stats[1,1]) + ', ' + str(Stats[2,1]) + ']  hm    (95% c.i.)')
print("Internal error    : " + str(Stats[0,2]) + "  [" + str(Stats[1,2]) + ', ' + str(Stats[2,2]) + '] °C     (95% c.i.)')
Lambda            : 1.28   [1.1, 1.5] W/°C  (95% c.i.)
Mixed-layer depth : 2.11  [1.55, 2.57]  hm    (95% c.i.)
Internal error    : 0.12  [0.1, 0.14] °C     (95% c.i.)
In [15]:
# Plot the histogram of lambda and cp
ax  = plt.subplot(111)
ax.hist2d(X[int(N_samp/2):,0],X[int(N_samp/2):,1],bins = 40, \
                        range = [[.7, 1.8],[1., 4.]], cmap = "YlOrRd");
ax.set(xlabel = "$\lambda$ [W/°C]", ylabel = "Mixed-layer depth [100 m]");
In [16]:
# Plot the histogram of internal variability
ax  = plt.subplot(111)
ax.hist(X[int(N_samp/2):,2],bins = 40, range = [0.08, 0.15]);
ax.set(xlabel = "$\sigma$ [°C]", ylabel = "Count");
In [17]:
# Plot the structure of total variance with time
ax  = plt.subplot(111)
err = np.sqrt(Terr**2+np.mean(X[int(N_samp/2):,2])**2)
ax.bar(Yr, err, label = 'Estimated Error')
ax.plot([1900, 2021],[0.2, 0.2],'r--', label = 'Assumed Error in Sec. 2.2.1')
ax.set(xlabel = "Year", ylabel = "1 s.d. error [°C]");
ax.legend();
In [18]:
# Plot Fitted model and ensemble
Simulated = np.zeros((int(N_samp/2),T.shape[0]))
for ct in np.arange(int(N_samp/2),N_samp):
    Simulated[ct-int(N_samp/2),:] = climate_model(F,lmbd = X[ct,0],cp = X[ct,1])

Simulated_q = np.quantile(Simulated,[.025, .5, .975],axis = 0)

ax  = plt.subplot(111)
ax.plot(Yr,Simulated_q[[0,2],:].transpose(),'--',color = [.5, .5, .5], label = '95% c.i. (Fitting)')
ax.plot(Yr,Simulated_q[2,:].transpose()+1.96*err,'--',color = [.7, .7, 1],  label = '95% c.i. (Predicting)')
ax.plot(Yr,Simulated_q[0,:].transpose()-1.96*err,'--',color = [.7, .7, 1],  label = '95% c.i. (Predicting)')
ax.plot(Yr,Simulated_q[1,:].transpose(),'-',color = [.0, .0, .0], label = 'Central estimate')
ax.plot(Yr,T, label = 'Observed T')
ax.set(xlabel = "Year", ylabel = "Temperature anomalies [°C]");
ax.legend();
ax.grid();

3.2 Adding Another Mode to Represent the Effect of the Deeper Ocean

From this analysis, it is clear that a simple model like Eq. 1 is capable of reproducing the overall temperature evolution at the earth's surface given historical estimates of external forcing.

Given the estimated $\lambda$ parameter, we can further estimate climate sensitivity as $F'/\lambda$. The effective radiative forcing of doubled CO₂ is roughly 3.7 W/m². Plugging in posterior samples of $\lambda$ gives an estimate of climate sensitivity of 2.8 ([2.5, 3.4])°C. Such an estimate is tighter and, on average, lower than the 3 ([2, 5])°C estimate by IPCC AR6.

In [19]:
# Plot the histogram of climate sensitivity
ax  = plt.subplot(111)
ax.hist(3.7 / X[int(N_samp/2):,0],bins = 40, range = [2, 10]);
ax.set(xlabel = "Climate sensitivity [°C]", ylabel = "Count");
ax.grid()

Stats2 = np.quantile(3.7/X[int(N_samp/2):,0], np.array([0.5,0.025,0.975]), axis = 0).round(2)
print("Climate sensitivity : " + str(Stats2[0]) + "   [" + str(Stats2[1]) + ', ' + str(Stats2[2]) + '] °C  (95% c.i.)')
Climate sensitivity : 2.89   [2.46, 3.35] °C  (95% c.i.)

Why is that the case?

In our simple model, we assumed that only the top ~100 m of the ocean responds to external forcing. But if the deeper ocean also responds, it would lengthen the response time scale of the system, thereby increasing climate-sensitivity estimates (recall the long tail in the fit in Sec. 2.2.1).

To represent the potential effect of the deeper ocean, we extend the simple model to two modes: a fast mode representing the seasonal mixed layer and a slower mode representing the responding ocean beneath it. Mathematically, we simply add up the solutions of the two modes,

$T_t'(t) = \frac{1}{c_{p1}} {\int_{0}^{t}} e^{-\frac{\lambda_1}{c_{p1}}(t-\tau)} F'(\tau) d\tau + \frac{1}{c_{p2}} {\int_{0}^{t}} e^{-\frac{\lambda_2}{c_{p2}}(t-\tau)} F'(\tau) d\tau$,

where subscripts 1 and 2 denote the fast and the slow mode, respectively. The fast mode should have a smaller $c_p$ and a larger $\lambda$, and vice versa for the slow mode.

In [20]:
# F:       forcing            [unit: W]
# lmbd1/2: feedback parameter [unit: W/K]
# cp1/2:   mixed layer depth  [unit: hundred m]

def climate_model_2_modes(F,lmbd1,lmbd2,cp1,cp2):
    scl    = 86400 * 365 / 1000 / 4000 / 100 / 0.7       # Match up with standard physical unit [m, s, K, W...]
    kernel1 = np.exp(- lmbd1 / cp1 * scl * (Yr-Yr[0]+1))
    kernel2 = np.exp(- lmbd2 / cp2 * scl * (Yr-Yr[0]+1))
    T_hat1  = np.convolve(F / cp1 * scl, kernel1)        # Calculate T_t for mode 1
    T_hat1  = T_hat1[0:Yr.shape[0]]
    T_hat2  = np.convolve(F / cp2 * scl, kernel2)        # Calculate T_t for mode 2
    T_hat2  = T_hat2[0:Yr.shape[0]]
    T_hat   = T_hat1 + T_hat2
    return T_hat - np.mean(T_hat[idx]), T_hat1 - np.mean(T_hat1[idx]), T_hat2 - np.mean(T_hat2[idx])

plt.plot(Yr,T, label = "Temperature [°C]");

# The model has a fast mode (green) that has large response to fast-varying forcing agents, 
# such as volcanic eruptions, and a slow mode (red) that has small response to volcanic 
# forcing but large response to slow varying forcing components, such as GHG and anthropogenic aerosols.
a = np.array([6,.8,1,6,0.1])
aa,bb,cc = climate_model_2_modes(F,a[0],a[1],a[2],a[3])
plt.plot(Yr,aa, label = "Simulated T [°C]");
plt.plot(Yr,bb, label = "Simulated T mode fast [°C]");
plt.plot(Yr,cc, label = "Simulated T mode slow [°C]");
plt.grid();
plt.legend();

Now, we can repeat our Bayesian analysis,

In [21]:
# x: parameters  [lambda1, lambda2, cp1, cp2, sigma]
def climate_model_2_modes_post_five_para(x0):

    # Here, the prior information on x0[3] (depth of the slower mode) is N(4,1) x100m
    # 
    # Priors on the lambdas are exponential distributions with a large scale factor,
    # which guarantees that the lambdas are positive
    # but are essentially flat for lambda > 0
    l_pr  = stats.norm(1,0.3).logpdf(x0[2]) + stats.norm(4,1).logpdf(x0[3]) \
                                            + np.sum(stats.expon(scale = 10000).logpdf(x0[0:2]))
    T_hat,T_hat1,T_hat2 = climate_model_2_modes(F,x0[0],x0[1],x0[2],x0[3]) # predict using the process-level model              
    l_lik = np.sum(stats.norm(T_hat,np.sqrt(Terr**2+x0[4]**2)).logpdf(T))  # pass through the observational model 
    return l_pr + l_lik

np.random.seed(10000)                       # Make the results reproducible
N_samp  = 10000                             # Take how many samples
X       = np.zeros((N_samp,5))              # Allocate space for MCMC samples
X_P     = np.zeros((N_samp,5))              # Allocate space for MCMC proposal
R       = np.zeros((N_samp,5))              # Allocate space for MCMC ratio
x0      = np.array([10, 1, 1, 6, 0.1])      # Initiate the sampler
x_start = x0
for ct in np.arange(0,N_samp,1):
    x0,X_P[ct,:],R[ct,:]  = Metropolis_Hastings(x0, \
            climate_model_2_modes_post_five_para, stp = np.array([0.5, 0.1, 0.1, 0.6, 0.006]))  # Sampling using the MH-MCMC method
    X[ct,:]   = x0
In [22]:
# Plot fitted parameters
plt.plot(X[int(N_samp/2):,0],label = "$\lambda_1$")
plt.plot(X[int(N_samp/2):,1],label = "$\lambda_2$")
plt.plot(X[int(N_samp/2):,2],label = "$c_{p1}$")
plt.plot(X[int(N_samp/2):,3],label = "$c_{p2}$")
plt.plot(X[int(N_samp/2):,4],label = "$\sigma$")
plt.grid();
plt.legend()
plt.yscale('log')
In [23]:
# Plot Fitted model and ensemble
Simulated = np.zeros((int(N_samp/2),T.shape[0]))
for ct in np.arange(int(N_samp/2),N_samp):
    Simulated[ct-int(N_samp/2),:] = climate_model_2_modes(F,X[ct,0],X[ct,1],X[ct,2],X[ct,3])[0]

Simulated_q = np.quantile(Simulated,[.025, .5, .975],axis = 0)

ax  = plt.subplot(111)
err = np.sqrt(Terr**2+np.mean(X[int(N_samp/2):,4])**2)
ax.plot(Yr,Simulated_q[[0,2],:].transpose(),'--',color = [.5, .5, .5], label = '95% c.i. (Fitting)')
ax.plot(Yr,Simulated_q[2,:].transpose()+1.96*err,'--',color = [.7, .7, 1],  label = '95% c.i. (Predicting)')
ax.plot(Yr,Simulated_q[0,:].transpose()-1.96*err,'--',color = [.7, .7, 1],  label = '95% c.i. (Predicting)')
ax.plot(Yr,Simulated_q[1,:].transpose(),'-',color = [.0, .0, .0], label = 'Central estimate')
ax.plot(Yr,T, label = 'Observed T')
ax.set(xlabel = "Year", ylabel = "Temperature anomalies [°C]");
ax.legend();
ax.grid()
In [24]:
# Plot the histogram of climate sensitivity
ax  = plt.subplot(111)
ECS = 3.7 / X[int(N_samp/2):,0] + 3.7 / X[int(N_samp/2):,1]
ax.hist(ECS,bins = 40, range = [2, 10]);
# ax.hist(3.7 / X[int(N_samp/2):,1],bins = 40, range = [2, 15]);
ax.set(xlabel = "Climate sensitivity [°C]", ylabel = "Count");
ax.grid()

Stats2 = np.quantile(ECS, np.array([0.5,0.025,0.975]), axis = 0).round(2)
print("Climate sensitivity : " + str(Stats2[0]) + "   [" + str(Stats2[1]) + ', ' + str(Stats2[2]) + '] °C  (95% c.i.)')
Climate sensitivity : 4.43   [3.59, 6.34] °C  (95% c.i.)

This does not necessarily mean that the updated result from the two-box model is correct, but including the slow mode leads to a higher ECS estimate and a longer tail. We often call estimates using only fast modes transient climate sensitivity and those including slower modes equilibrium climate sensitivity.


3.3 A Proper Model-Data Comparison

Look again at the figure of fitted temperature in Sec. 3.2: although the model captures the overall warming trend, there are also misfits. As discussed before, some of them arise from internal variability. For example, the peaks in 1941 and 1998 can be explained by ENSO events, and decadal variability, such as the Pacific Decadal Oscillation, may also have contributed. But some periods still cannot be explained by internal variations. For example, during World War II (WWII), observations show large positive anomalies, especially in 1945.

Of course, the model may be too simple, but when model and data disagree, in addition to model imperfections and the contribution of internal variability, it is also important to keep in mind that the so-called observations could contain biases and errors. Below is a story about problems in sea-surface temperature measurements during WWII.

[Embedded content: the story of biases in WWII sea-surface temperature measurements]

3.4 Further notes about Bayesian Analysis and MCMC

Two other topics that may be of interest:

(1) There are packages that implement MCMC-type Bayesian analysis. A famous one is Stan, a C++ package that also has command-line, R, Python, Matlab, and Julia interfaces. Julia also has its own Bayesian package, called Turing. I find Turing powerful, especially when the process-level models are complicated and need to be solved using other packages, such as a differential-equation solver.

(2) In practice, the Metropolis-Hastings algorithm often rejects too many proposals and performs poorly. A more involved method, called Hamiltonian Monte Carlo (HMC), solves this issue by using momentum and gradients in the parameter space. I find this video a good starting point for building intuition about HMC. Knowing about automatic differentiation and dual numbers would also be helpful, but these involve more advanced content from an optimization course. Nevertheless, if your goal is simply to use HMC, both Stan and Turing offer it.
(2) In practice, the Metropolis-Hastings algorithm often rejects too many proposals and performs poorly . A more involved method, called Hamiltonian Monte Carlo ( HMC ), solves this issues by considering momentum and gradient in the parameter space. I find this video a good starting point to grab some intuitions about HMC. Knowing Automatic differentiation and Dual number would also be helpful, but these would involve more advanced content in an Optimization course. Nevertheless, if your goal is to simply use HMC, both Stan and Turing has that option.