Filtering or Detrending using Regression#

We will be discussing filtering in-depth in subsequent sections, but one filtering or detrending technique is to use linear regression.

Consider the decomposition of a variable y(t) into two parts: the fraction that is linearly related to x(t) and the fraction that is uncorrelated with x(t):

y(t)=y(t)fitted+y(t)residual

where y(t)fitted is the best-fit linear relationship between y(t) and x(t).

Thus,

  • the fraction of variance of y(t) explained by x(t) is r2: the ratio of variance of y(t)fitted to y(t) and,

  • the fraction of variance of y(t) that is not explained by x(t) is 1 - r2: the ratio of variance of y(t)residual to y(t).

Detrending#

The above decomposition allows us to isolate y(t)residual which we sometimes want to do. If y(t)fitted represents a linear trend, y(t)residual is then the detrended component of y(t).

Filtering#

Let’s take a look at a filtering example using our ENSO and California precipitation data.

# load packages
import numpy as np
import pickle as pickle
import scipy.stats as st
import matplotlib.pyplot as plt
import matplotlib as mpl
mpl.rc('font',size=16,weight='bold') #set default font size and weight for plots
# ENSO Data:
# unpickle NINO34 (units of degC)
with open('NINO34_NDJF_2021.pickle','rb') as fp: #.pickle files are a python file type
    nino34 = pickle.load(fp,encoding='latin1')

#  Precipitation Data:
#  unpickle CA_PRECIP_DJFM (units of mm/day)
with open('CA_PRECIP_DJFM.pickle','rb') as fp: #.pickle files are a python file type
    precip_djfm = pickle.load(fp,encoding='latin1')

The y(t)fitted component is simply the best-fit line between y(t) and x(t) - in this case, CA precipitation and ENSO.

# calculate best-fit line

# np.polyfit(predictor, predictand, degree)
a = np.polyfit(nino34,precip_djfm,1) #polynomial fit, degree = 1 means a linear fit

# np.polyval(coeffs,predictor)
y_hat = np.polyval(a,nino34)

Now, we can filter out the influence of ENSO on California precipitation. Why would one want to do this? We might want to do this if we are looking for a relationship between California precipitation and something other than ENSO, but the ENSO influence is too strong to allow us to identify this secondary relationship.

This filtered out part is y(t)residual.

# Filter out the ENSO signal from CA precip
precip_djfm_noENSO = precip_djfm - y_hat

Let’s see how the original, the fitted and filtered precipitation time series compare.

# plot the precip and filterd precip time series
plt.figure(figsize=(20,6))
plt.plot(precip_djfm,label="original")
plt.plot(y_hat,label="best-fit")
plt.plot(precip_djfm_noENSO,label="residual")
plt.title("CA DJFM Precipitation")
plt.ylabel("mm day$^{-1}$")
plt.xlabel("time")
plt.legend(loc="upper right")
plt.ylim(-3,8)
(-3.0, 8.0)
../../_images/2e53fcc7fa6338b259f1d1b8104e46b8512fb224b9dac22d345b412c2227755f.png