Filtering or Detrending using Regression#
We will be discussing filtering in-depth in subsequent sections, but one filtering or detrending technique is to use linear regression.
Consider the decomposition of a variable
where
Thus,
the fraction of variance of
explained by is : the ratio of variance of to and,the fraction of variance of
that is not explained by is 1 - : the ratio of variance of to .
Detrending#
The above decomposition allows us to isolate
Filtering#
Let’s take a look at a filtering example using our ENSO and California precipitation data.
# load packages
import numpy as np
import pickle as pickle
import scipy.stats as st
import matplotlib.pyplot as plt
import matplotlib as mpl
mpl.rc('font',size=16,weight='bold') #set default font size and weight for plots
# ENSO Data:
# unpickle NINO34 (units of degC)
with open('NINO34_NDJF_2021.pickle','rb') as fp: #.pickle files are a python file type
nino34 = pickle.load(fp,encoding='latin1')
# Precipitation Data:
# unpickle CA_PRECIP_DJFM (units of mm/day)
with open('CA_PRECIP_DJFM.pickle','rb') as fp: #.pickle files are a python file type
precip_djfm = pickle.load(fp,encoding='latin1')
The
# calculate best-fit line
# np.polyfit(predictor, predictand, degree)
a = np.polyfit(nino34,precip_djfm,1) #polynomial fit, degree = 1 means a linear fit
# np.polyval(coeffs,predictor)
y_hat = np.polyval(a,nino34)
Now, we can filter out the influence of ENSO on California precipitation. Why would one want to do this? We might want to do this if we are looking for a relationship between California precipitation and something other than ENSO, but the ENSO influence is too strong to allow us to identify this secondary relationship.
This filtered out part is
# Filter out the ENSO signal from CA precip
precip_djfm_noENSO = precip_djfm - y_hat
Let’s see how the original, the fitted and filtered precipitation time series compare.
# plot the precip and filterd precip time series
plt.figure(figsize=(20,6))
plt.plot(precip_djfm,label="original")
plt.plot(y_hat,label="best-fit")
plt.plot(precip_djfm_noENSO,label="residual")
plt.title("CA DJFM Precipitation")
plt.ylabel("mm day$^{-1}$")
plt.xlabel("time")
plt.legend(loc="upper right")
plt.ylim(-3,8)
(-3.0, 8.0)
