This entry is about the last piece of R-code in MDFA_Legacy_MSE.r. I'm showing how to implement a bivariate real-time (concurrent) filter in the MDFA-package and I'm analyzing various leading indicator designs (various combinations of lead and noise parameters). The whole material is copy-pasted from MDFA, chapter 4, which highlights plain vanilla mean-square error (MSE-) performances.
In the following I'm assuming that you installed MDFA_Legacy_MSE.r (see the MDFA tutorial) and that the code runs faultlessly up to chunk number 33. The example below starts from there.
Leading Indicator
A leading indicator $l_t$ is a time series that is related to a particular target series $x_t$ and which 'anticipates' the latter by a (specified or unspecified) amount of time (in the mean). I have no interest in diving into a fruitless discussion about causality here, so I won't comment on the pertinence of a leading indicator design (I have my own ideas on the topic, and they suggest that providers of leading indicators are to some extent illusionists: they distract from the main problem). Instead, I'll adopt an empirical perspective founded on a simple statistic.
The strength of the link as well as the lead-time can be conveniently measured by the so-called peak-correlation concept
\[\delta_0=\arg\max_{\delta}\,\mathrm{cor}(x_t,l_{t-\delta})\]
The integer $\delta_0$ which maximizes the correlation can be interpreted as the lead ($\delta_0>0$) or lag ($\delta_0<0$) of $l_t$, with respect to $x_t$, and the value of the correlation at $\delta_0$, the peak-correlation, can be interpreted as the strength of the link between both series.
- The higher the lead the better the indicator (ceteris paribus).
- The higher the peak-correlation, the better the indicator (ceteris paribus).
- Ideally, both the lead as well as the peak-correlation should be large.
- In economic applications one often observes a tradeoff: the higher the lead the weaker the link (peak correlation) and conversely.
- By adopting a correlation we skip causality.
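As a quick illustration (a minimal sketch with made-up series names, not part of the tutorial chunks), the peak-correlation and the lead can be estimated with base R's `ccf`, which returns the correlation between $x_{t+k}$ and $l_t$ at each lag $k$:

```r
# Minimal sketch (illustrative names): estimate lead and peak-correlation.
set.seed(1)
n <- 500
x_full <- as.numeric(arima.sim(list(ar = 0.9), n = n + 2))
l <- x_full[3:(n + 2)] + 0.5 * rnorm(n)  # indicator leads x by 2 steps
x <- x_full[1:n]
# ccf(x, l) estimates cor(x[t+k], l[t]) at lag k
cc <- ccf(x, l, lag.max = 10, plot = FALSE)
delta_0  <- cc$lag[which.max(cc$acf)]    # estimated lead: 2
peak_cor <- max(cc$acf)                  # strength of the link
```

The maximizing lag recovers the built-in lead of 2, and the peak-correlation is close to its theoretical value $\sigma_x/\sqrt{\sigma_x^2+s^2}$.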
Examples of leading indicators include the CLI of the OECD, the LEI of the Conference Board, or the WLI of the ECRI. In my experience, indicators get 'noisier' when reaching further into the alleged future; as a consequence, the peak-correlation shrinks.
Simulation Framework
A simple model reflecting (part of) my experience is
\[l_t=x_{t+\delta}+s\epsilon_t\]
where $\epsilon_t$ is white noise. The parameter $\delta$ reflects the lead-time and $s$ accounts for the magnitude of the noise (it relates to the signal-to-noise ratio, SNR). Larger values of $s$ decrease the peak-correlation. Typically, in applications, $\delta$ and $s$ appear to be positively 'correlated'. The model is not causal if $\delta>0$, i.e. all sorts of 'back to the future' anomalies could be imagined (fortunately, this is just a means to evaluate filter performances).
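For this model the implied peak-correlation has a closed form: with $\sigma_x$ the standard deviation of $x_t$, $\mathrm{cor}(x_{t+\delta},l_t)=\sigma_x/\sqrt{\sigma_x^2+s^2}$, which shrinks monotonically in $s$. A minimal sketch (illustrative names, a white-noise target for simplicity) checks this empirically:

```r
# Sketch: peak-correlation sigma_x/sqrt(sigma_x^2 + s^2) shrinks in s.
set.seed(2)
n <- 20000
x <- rnorm(n)                          # white-noise target, sigma_x = 1
s_grid <- c(0.1, 0.5, 2)
emp  <- sapply(s_grid, function(s) cor(x, x + s * rnorm(n)))
theo <- 1 / sqrt(1 + s_grid^2)         # implied peak-correlation
round(rbind(emp, theo), 3)
```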
1. Data (Time Domain)
Here's the corresponding R-code, chunk number 34 (recall that you have to run MDFA_Legacy_MSE.r through all previous chunks). I'm assuming
- $\delta=1$ (shift by one time-unit),
- $s=0.1$,
- $\epsilon_t$ is standardized Gaussian noise and
- $x_t$ is a realization of an AR(1) with coefficient 0.9 and standardized Gaussian innovations, mutually uncorrelated with $\epsilon_t$ (the series is generated in earlier chunks; the random seeds of the two series differ)
###################################################
### code chunk number 34: exercise_dfa_ms_4
###################################################
set.seed(12)
# Select the AR(1)-process with coefficient 0.9
i_process<-1
# Scaling of the idiosyncratic noise
scale_idiosyncratic<-0.1
eps<-rnorm(nrow(xh))
indicator<-xh[,i_process]+scale_idiosyncratic*eps
# Data: first column=target, second column=x,
# third column=shifted (leading) indicator
data_matrix<-cbind(xh[,i_process],xh[,i_process],c(indicator[2:nrow(xh)],NA))
dimnames(data_matrix)[[2]]<-c("target","x","leading indicator")
# Extract 120 observations from the long sample
data_matrix_120<-data_matrix[lenh/2+(-len/2):((len/2)-1),]
head(round(data_matrix_120,4))
The data matrix displayed in the R-console should look like
target x leading indicator
[1,] 1.0687 1.0687 0.8137
[2,] 0.9497 0.9497 0.6150
[3,] 0.6628 0.6628 1.7381
[4,] 1.6465 1.6465 1.6748
[5,] 1.7680 1.7680 1.7033
[6,] 1.8317 1.8317 2.4977
The explanatory variables are in columns 2 and 3 (the target series is always in the first column): they comprise the target series (replicated in the second column) as well as the indicator (third column), i.e. it is a bivariate design. Note that if the target series is part of the explanatory series (as is the case here), then it must be placed in the second column (this convention could be detected automatically in the code, but it isn't yet...).
2. Data (Frequency-Domain)
We feed the data to MDFA after transforming (mapping) it into the frequency-domain (see my previous blog-posts):
###################################################
### code chunk number 35: exercise_dfa_ms_4
###################################################
# Fully in sample
insample<-nrow(data_matrix_120)
# d=0 for stationary series: see default settings
weight_func<-spec_comp(insample, data_matrix_120, d)$weight_func
The spectrum (DFT) should read
> head(weight_func)
target x leading indicator
[1,] -2.854807+0.000000i -2.854807+0.000000i -2.929527+0.000000i
[2,] -0.523832+5.512559i -0.523832+5.512559i -0.446467+5.517594i
[3,] 3.698074+4.520299i 3.698074+4.520299i 3.952730+4.094188i
[4,] 4.110167+1.665234i 4.110167+1.665234i 4.196619+1.005825i
[5,] -0.890288-3.018920i -0.890288-3.018920i -1.650795-2.787001i
[6,] 0.426674-1.510247i 0.426674-1.510247i -0.107234-1.588954i
in the R-console. The shift (lead), by one time-unit, is reflected by a rotation (relative phase) in the DFT in the third column (hard to sense by eyeballing...).
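The rotation claim can be verified directly with base R's `fft` (a sketch with made-up data; this uses R's DFT sign convention, which need not coincide with the one in `spec_comp`). A circular shift avoids boundary effects:

```r
# Sketch: a time shift is a rotation (phase factor) of the DFT.
set.seed(3)
n <- 64
x <- rnorm(n)
lead1 <- c(x[2:n], x[1])              # circular lead by one time-unit
omega <- 2 * pi * (0:(n - 1)) / n     # discrete frequency-grid
rotated <- exp(1i * omega) * fft(x)   # rotate the DFT of x
max(Mod(fft(lead1) - rotated))        # numerically zero
```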
3. Estimation
The parameters controlling the MSE-wrapper
MDFA_mse were set in previous chunks. Specifically,
- Gamma: we assume an ideal lowpass with cutoff=$\pi/6$
- Filter length $L=12$
- Lag=0 (nowcast)
Thus we can proceed to estimation:
###################################################
### code chunk number 36: exercise_dfa_ms_4
###################################################
# Source the default (MSE-) parameter settings
source(file=paste(path.pgm,"control_default.r",sep=""))
# Estimate filter coefficients
mdfa_obj<-MDFA_mse(L,weight_func,Lag,Gamma)$mdfa_obj
# Filter coefficients
b_mat<-mdfa_obj$b
dimnames(b_mat)[[2]]<-c("x","leading indicator")
dimnames(b_mat)[[1]]<-paste("Lag ",0:(L-1),sep="")
head(b_mat)
Sourcing the hyper-parameters (source(file=paste(path.pgm,"control_default.r",sep="")) in the above snippet) is not necessary (the line could be discarded), because the MSE-wrapper automatically sets all 'sleeping' hyper-parameters. In any case, the coefficients should look like
> head(b_mat)
x leading indicator
Lag 0 0.20556332 0.39969599
Lag 1 0.35970890 -0.08021796
Lag 2 0.21659593 -0.18695421
Lag 3 0.14359475 -0.06108555
Lag 4 0.13724690 -0.02475913
Lag 5 0.06399915 -0.09717151
We can see that
- Most weight is attributed by the filter to the last observation of the leading indicator (0.39969599): this is future information (news) not contained in $x_t$.
- The weights assigned to higher lags (1, 2, ...) of the leading indicator are small (the leading indicator is noisy).
- The weights assigned to $x_t$ (first column) correspond to a typical lowpass smoothing.
This makes sense so far. Next we display the criterion value, which is just the (uniformly superconsistent) estimate of the (unobservable) sample mean-square filter error:
###################################################
### code chunk number 37: exercise_dfa_ms_4
###################################################
# Criterion value
round(mdfa_obj$MS_error,3)
Which should display in the R-console as
> # Criterion value
> round(mdfa_obj$MS_error,3)
[1] 0.145
So we expect that the output of the concurrent bivariate filter should track the target (the unobservable output of the ideal lowpass) up to a mean-square error of 0.145 (the assertion is checked below).
4. Filter the Data
We can now apply the bivariate filter to the explanatory data (columns 2 and 3 in data_matrix: the first column is reserved for the target)
###################################################
### code chunk number 38: exercise_dfa_ms_4
###################################################
yhat_multivariate_leading_indicator<-rep(NA,len)
for (j in 1:len)
yhat_multivariate_leading_indicator[j]<-sum(apply(b_mat*
data_matrix[lenh/2+(-len/2)-1+j:(j-L+1),2:3],1,sum))
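To see what the nested sums above compute, here's a toy single-series version (a sketch with illustrative names, not the MDFA code): the concurrent filter output is $\hat{y}_t=\sum_{k=0}^{L-1}b_k x_{t-k}$, which base R's `stats::filter` with `sides=1` reproduces:

```r
# Sketch: concurrent (one-sided) filtering written two equivalent ways.
set.seed(4)
L_toy <- 3
b <- c(0.5, 0.3, 0.2)                  # toy concurrent filter coefficients
x <- rnorm(10)
y_loop <- rep(NA, 10)
for (t in L_toy:10)
  y_loop[t] <- sum(b * x[t:(t - L_toy + 1)])   # sum over lags 0..L-1
y_filt <- as.numeric(stats::filter(x, b, sides = 1))
max(abs(y_loop[L_toy:10] - y_filt[L_toy:10]))  # identical up to rounding
```

The loop in chunk 38 does the same thing, except that the sum additionally runs over the two explanatory columns.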
Evaluation
We here
- benchmark the bivariate design against the univariate one (assessing the added value of the leading indicator), and
- check the uniform superconsistency argument in the bivariate case; see DFA-MSE-criterion for background.
For the latter point we can rely on $y_t$ (the output of the symmetric filter), which is observable in our particular simulation context (in typical applications $y_t$ cannot be observed). The target $y_t$ was computed previously, in chunk number 19. Thus we can compute the sample-MSEs $\mathrm{mean}((y_t-\hat{y}_t)^2)$ for the univariate and bivariate filters:
###################################################
### code chunk number 39: exercise_dfa_ms_4
###################################################
y_target_leading_indicator<-y[,i_process]
perf_mse<-matrix(c(mean(na.exclude((yhat_multivariate_leading_indicator-
y_target_leading_indicator))^2),
mean(na.exclude((yhat[,i_process]-
y_target_leading_indicator))^2)),nrow=1)
dimnames(perf_mse)[[2]]<-c("bivariate MDFA","DFA")
dimnames(perf_mse)[[1]]<-"Sample MSE"
round(perf_mse,3)
This should display as
> round(perf_mse,3)
bivariate MDFA DFA
Sample MSE 0.139 0.321
in the R-console. We infer that:
- By using the leading indicator, the sample-MSE could be reduced from 0.321 (DFA) to 0.139 (MDFA).
- The criterion value (0.145, see above) is close to the generally unobservable sample-MSE of the MDFA (0.139). This tight fit is a consequence of 'uniform superconsistency', see DFA-MSE-criterion. The statistical efficiency of the MDFA rests (at least to some extent) on this property.
Here's a graphical comparison of estimates and of target (the latter is generally not observable)
###################################################
### code chunk number 40: z_mdfadfa_ar1_output.pdf
###################################################
i<-1
ymin<-min(min(y[,i]),min(na.exclude(yhat)[,i]))
ymax<-max(max(y[,i]),max(na.exclude(yhat)[,i]))
ts.plot(yhat[,i],ylab="",main=paste("Sample MSE MDFA: ",
round(perf_mse[1],3),", DFA: ",round(perf_mse[2],3),sep=""),col="blue",
ylim=c(ymin,ymax))
lines(y[,i],col="red")
lines(yhat_multivariate_leading_indicator,col="green")
mtext("DFA", side = 3, line = -2,at=len/2,col="blue")
mtext("target", side = 3, line = -1,at=len/2,col="red")
mtext("MDFA", side = 3, line = -3,at=len/2,col="green")
- The output of the MDFA (green line) is shifted to the left (faster) when compared to the DFA (blue line), as expected
- The output of MDFA is also slightly smoother (less noisy). We will provide corresponding formal measures in later blog-posts.
- The red line (the target) is generally unobservable, in particular towards the sample end $t=T$ (it is observable here because of our particular experimental design).
Lead vs. Noise in Leading Indicators
We are free to vary $s$ (related to the inverse signal-to-noise ratio) and $\delta$ at our discretion in our simulation design. In order to illustrate the effects of these parameters on the MSE-performances of the bivariate design, we define a grid of parameter values:
###################################################
### code chunk number 42: exercise_dfa_ms_4
###################################################
# Inverse SNR: the variance of the standardized noise is one:
# we thus normalize by the standard deviation of the data x
# (second column of the data matrix)
scale_idiosyncratic_vec<-c(0,0.1,0.5,1,2)/sqrt(var(data_matrix_120[,2]))
# We select fractional leads: multiples of 0.25
# A fractional lead of 0.25 corresponds roughly to a week
# on a monthly time scale
delta_vec<-0.25*0:4
Next, we compute the MSE (the criterion value) at each grid-point (nested loop):
###################################################
### code chunk number 43: exercise_dfa_ms_4
###################################################
# Initialize the performance matrix
lead_snr_mat<-matrix(ncol=length(scale_idiosyncratic_vec),
nrow=length(delta_vec))
dimnames(lead_snr_mat)[[2]]<-paste("1/SNR=",
sqrt(var(data_matrix_120[,2]))*scale_idiosyncratic_vec)
dimnames(lead_snr_mat)[[2]][1]<-paste("Univ. design: ",
dimnames(lead_snr_mat)[[2]][1],sep="")
dimnames(lead_snr_mat)[[1]]<-paste("Lead",delta_vec)
# Generate the idiosyncratic noise
set.seed(20)
eps<-rnorm(nrow(data_matrix_120))
# Loop over all combinations of leads and SNR-ratios
for (i in 1:length(scale_idiosyncratic_vec))
{
for (j in 1:length(delta_vec))
{
# Add the (suitably scaled) noise: no lead yet.
indicator<-data_matrix_120[,2]+scale_idiosyncratic_vec[i]*eps
# Overwrite the indicator column with the new time series
data_matrix_120[,3]<-indicator
# Compute the DFTs (full in-sample, for stationary series d=0)
insample<-nrow(data_matrix_120)
weight_func<-spec_comp(insample, data_matrix_120, d)$weight_func
# Compute the discrete frequency-grid omega_k: from zero to pi
omega_k<-(0:(nrow(weight_func)-1))*pi/(nrow(weight_func)-1)
# Introduce the fractional time-shift by rotation of the DFT
# of the indicator (last column)
weight_func[,ncol(weight_func)]<-exp(-1.i*delta_vec[j]*omega_k)*
weight_func[,ncol(weight_func)]
# If the idiosyncratic noise is zero, then we use a univariate design
if (i==1)
weight_func<-weight_func[,-2]
# Compute optimal filters and derive the (frequency-domain) MSE
mdfa_obj<-MDFA_mse(L,weight_func,Lag,Gamma)$mdfa_obj
# Store the MSE
lead_snr_mat[j,i]<-mdfa_obj\$MS_error
}
}
and collect the results in a table:
###################################################
### code chunk number 44: z_dfa_ar1_output.pdf
###################################################
require(xtable)
xtable(lead_snr_mat,digits=rep(3,dim(lead_snr_mat)[2]+1),
caption="Effect of lead and of (inverse) signal-to-noise ratio on filter MSE",
label="lead_snr_mat")
Let's have a look at the table:
> lead_snr_mat
Univ. design: 1/SNR= 0 1/SNR= 0.1 1/SNR= 0.5 1/SNR= 1 1/SNR= 2
Lead 0 0.3137716 0.2634751 0.2634751 0.2634751 0.2634751
Lead 0.25 0.2606970 0.1728124 0.2294936 0.2536344 0.2644370
Lead 0.5 0.2154068 0.1389831 0.1882282 0.2151649 0.2484882
Lead 0.75 0.1773698 0.1236558 0.1574768 0.1787083 0.2200445
Lead 1 0.1459904 0.1213045 0.1288397 0.1459820 0.1886407
A full explanation is provided at the end of chapter 4 in MDFA.
- The MSE generally (but not always) decreases when the lead increases (reading the table from top to bottom).
- The one exception appears in the last column, where the noise is strong: in such a case the effect of a small lead (going from 0 to 0.25) can be hidden by the noise (in the above simulation context and for the particular random seed used).
- The MSE increases when the signal-to-noise ratio (SNR) decreases or, equivalently, when 1/SNR increases (from left to right; ignore the first column).
- The first column is 'special'
- It corresponds to a univariate design whereby the single explanatory series is the leading indicator (this design cannot be replicated by the DFA because target series and explanatory series differ). The reason is simple: for vanishing noise and vanishing lead the bivariate design is singular (both explanatory series are identical); and if the lead is positive, then $x_t$ is uninformative, i.e. all relevant information is already conveyed by the (noise-free) leading series.
- All other columns in the table correspond to bivariate designs with $x_t$ and $l_t$ as explanatory variables.
- The results in the first column cannot be compared directly with the remaining columns.
- The first row is 'special' too, because the lead vanishes.
- In that case all bivariate designs are informationally equivalent: subtracting the first explanatory series from the second gives $x_t+s\epsilon_t-x_t=s\epsilon_t$, i.e. noise that is unrelated to the target. The MSE-performance (0.2634751) is therefore fixed across the corresponding cells. The spurious decrease from 0.3137716 (univariate design) to 0.2634751 is due to overfitting, since $\epsilon_t$ is not related to $x_t$.
- The fractional (non-integer) lead times 0.25,0.5 and 0.75 were obtained in the frequency-domain (by rotation).
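The fractional shift by rotation can be checked on a pure sinusoid, where the shifted series is known in closed form (a self-contained sketch, not the tutorial code; frequencies above $\pi$ are mapped to their negative aliases before rotating so that the real series stays real):

```r
# Sketch: fractional time-shift (delta = 0.25) via rotation of the DFT.
n <- 128
t_grid <- 0:(n - 1)
freq <- 2 * pi * 5 / n                  # an exact Fourier frequency
x <- cos(freq * t_grid)
delta <- 0.25                           # fractional lead
omega <- 2 * pi * (0:(n - 1)) / n
omega_signed <- ifelse(omega <= pi, omega, omega - 2 * pi)
x_shift <- Re(fft(exp(1i * omega_signed * delta) * fft(x),
                  inverse = TRUE)) / n  # R's inverse fft is unnormalized
max(abs(x_shift - cos(freq * (t_grid + delta))))  # numerically zero
```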
We observe that different grid-points (combinations of $\delta$, $s$) lead to similar MSE-performances. For example, Lead 1 with 1/SNR=2 (MSE of 0.1886) is close to Lead 0.5 with 1/SNR=0.5 (MSE of 0.1882). Leads can thus be exchanged for noise without affecting the MSE much. This standoff in terms of MSE performance does not mean that one particular constellation (for example Lead 1, 1/SNR=2) cannot be preferable to the alternative: I'd assume that (everything else being equal) typical users would prefer a higher lead at the cost of larger noise. Customization (the ATS-trilemma) will aim at a higher lead and smaller noise simultaneously: a double score that is hard to obtain in a classic MSE-perspective.
Where Are We? What's Next?
We just went through the whole sample file MDFA_Legacy_MSE.r proposed in the MDFA tutorial. The R-code is copy-pasted from chapter 4 in MDFA. We also replicated a bunch of slides from Advances in Signal Extraction and Forecasting, posted in my first ever blog-post on this site. More tangibly, we know
- how to set-up the data matrix
- how to map the data into the frequency-domain by means of a simple yet effective transformation
- how to parametrize MDFA for the simplest possible setting:
- MSE cost function
- stationary series
- no regularization
- no customization
- equally-sampled data.
- how to forecast, nowcast and backcast signals (parameter Lag in the function calls).
- Chapter 5 in MDFA is about backcasting, but I'll probably skip the topic for lack of interest.
Next... maybe an introduction to the ATS-trilemma and then some customization.