In the previous blog post (1) I specified the generic target in signal extraction and forecasting. I now present and discuss the corresponding (MSE) optimization criterion; see Optimal Real-Time Filters for Linear Prediction Problems for background. The exposition addresses technical issues, which are illustrated by sample R code (MDFA_Legacy_MSE.r, posted earlier; see the MSE-tutorial). An intuitive interpretation of the criterion is provided in the follow-up post.
DFA-MSE Criterion
Here's the estimation problem as described in
DFA:
Note that the target $y_T$ (based on $\gamma_k$) is approximated by $\hat{y}_T$, which relies on a finite causal filter $b_0,...,b_{L-1}$ of length $L<T$ applied to the data $x_T,x_{T-1},...,x_{T-L+1}$. We here assume $t=T$ (sample end) and $h=0$ (causal filter), which will invariably be the case in our applications. For $h=0$ the filter is also called concurrent. The selection of a target filter $\gamma_k$ has been extensively discussed in (1) (with sample R code).
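The filter approximation just described can be sketched in a few lines of R. The helper below is my own illustration (the function name and toy data are not part of MDFA_Legacy_MSE.r): it evaluates $\hat{y}_T=\sum_{k=0}^{L-1}b_k x_{T-k}$ at the sample end.

```r
# Sketch of the concurrent filter output at the sample end (illustrative
# helper, not from the original script): hat{y}_T = sum_{k=0}^{L-1} b_k x_{T-k}
yhat_concurrent <- function(b, x) {
  L <- length(b)
  T <- length(x)
  # most recent observation first: x_T, x_{T-1}, ..., x_{T-L+1}
  sum(b * x[T:(T - L + 1)])
}
# Example: an equal-weight filter of length 3 applied to x_1,...,x_5
yhat_concurrent(rep(1/3, 3), c(1, 2, 3, 4, 5))  # mean of the last 3 values: 4
```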
If we observed the expected squared error $E[(y_T-\hat{y}_T)^2]$ in formula 46 above, then we could minimize that expression as a function of the filter coefficients $b_k$ and we would be done. Unfortunately, we don't observe it... so we have to look for something else:
OK: we're still stuck. Move forward one more step:

At least we have an expression (formula 49) which is actually observable:
- for any $b_k,k=0,...,L-1$ we obtain a positive real number (depending on the available data $x_1,...,x_T$);
- we can therefore minimize that 'number' (function) as a function of $b_k,k=0,...,L-1$ for given data $x_1,...,x_T$.
But why should we minimize that particular number (the function in formula 49) and not any other number/function?
Efficiency
Consider the following technical arguments:
- The approximation error in 49 is unusually small. In technical terms
- the right hand-side is a uniformly superconsistent estimate of the left hand side
- uniformity means that superconsistency is maintained irrespective of optimization
- superconsistency means: the error is smaller, in mean-square, than classic estimates (asymptotically, the order is smaller than $1/T$ in mean-square)
- On the other hand, the expression on the left hand side of 48, the mean-square error, is an asymptotically efficient estimate of the unobserved expected squared error $E[(y_T-\hat{y}_T)^2]$.
In summary: we can minimize 49 and the resulting filter $\hat{b}_k$ will also minimize $E[(y_T-\hat{y}_T)^2]$ up to a (smallest possible) approximation error.
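The criterion in 49 can be written down directly. The following is a minimal sketch (function and variable names are mine, not from the MDFA package), assuming a target $\Gamma$ and a periodogram evaluated on the discrete grid $\omega_k=k\pi/(T/2)$, $k=0,...,T/2$, with the same normalization $2\pi/(T/2+1)$ as in the code chunk further below.

```r
# Minimal sketch of the DFA-MSE criterion (formula 49); names are
# illustrative. Gamma and pgram are vectors of length T/2+1 on the
# frequency grid omega_k = k*pi/(T/2), k = 0,...,T/2.
dfa_mse_criterion <- function(b, Gamma, pgram) {
  K <- length(Gamma) - 1
  omega <- (0:K) * pi / K
  L <- length(b)
  # transfer function of the causal filter b_0,...,b_{L-1}
  trf <- sapply(omega, function(w) sum(b * exp(-1i * w * (0:(L - 1)))))
  # periodogram-weighted squared distance between target and filter
  (2 * pi / (K + 1)) * sum(abs(Gamma - trf)^2 * pgram)
}
```

For the identity filter $b=(1)$ and an all-pass target $\Gamma\equiv 1$ the criterion vanishes, as it should.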
An Illustration of Uniform Superconsistency
In order to illustrate the above ideas we rely, once again, on the sample code MDFA_Legacy_MSE.r posted earlier; see the MSE-tutorial. Here I briefly discuss the relevant code chunks (you have to run all previous lines in the file, i.e. all chunks up to number 20).
In the following chunk, number 20, we specify the problem structure to be solved by the criterion:
- the filter length (L=12),
- a concurrent (real-time) filter (Lag=0), and
- the transfer function Gamma of the (symmetric) target filter with cutoff $\pi/6$ (see my previous post (1) for an introduction to 'ideal' filters).
###################################################
### code chunk number 20: exercise_dfa_ms_3
###################################################
plot_T<-F
periodogram<-matrix(ncol=3,nrow=len/2+1)
trffkt<-periodogram
perf_mat<-matrix(nrow=3,ncol=2)
dimnames(perf_mat)[[2]]<-c("Criterion Value",
"Mean-Square Sample Filter Error")
dimnames(perf_mat)[[1]]<-c("a1=0.9","a1=0.1","a1=-0.9")
# Filter length
L<-12
# Real-time design
Lag<-0
# Target ideal trend
Gamma<-c(1,(1:(len/2))<len/12)
Here's a plot of the transfer function (this plot is not included in the sample R-code):
plot(Gamma,type="l",main="Target",
axes=F,xlab="Frequency",ylab="Amplitude",col="black",ylim=c(0,1))
axis(1,at=c(0,1:6*len/12+1),labels=c("0","pi/6","2pi/6","3pi/6",
"4pi/6","5pi/6","pi"))
box()
The target is not specified in terms of $\gamma_k$ but in terms of $\Gamma(\omega_k)$. In fact, in typical applications we never filter the data with $\gamma_k$ (we do it in this simulated example only because we want a benchmark), i.e. we don't need to know $y_t$ in order to derive its estimate $\hat{y}_t$ (the DFA is implemented in the frequency domain). The transfer function in the above plot does not appear discontinuous (ideal filters are discontinuous...) because it is defined on a discrete grid of frequencies: the seemingly continuous passage from passband to stopband is a graphical artifact (instead of lines we should use dots to represent the target). Finally, note that negative Lag settings (Lag=-1,-2,...) would correspond to signal forecasts, whereas strictly positive Lags would correspond to backcasts (Lag corresponds to $-h$ in the above DFA capture). Here we are interested in a nowcast (Lag=0).
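For readers puzzled by the one-liner `Gamma<-c(1,(1:(len/2))<len/12)` in chunk 20: it is just the indicator of the passband on the discrete frequency grid. The following equivalent construction (my own reformulation, not from the script) makes the cutoff explicit.

```r
# Equivalent, more explicit construction of the target on the discrete
# frequency grid omega_k = k*pi/(len/2) (my reformulation, not from the
# original script): 1 in the passband [0, pi/6), 0 in the stopband.
len <- 120
omega <- (0:(len / 2)) * pi / (len / 2)
Gamma_explicit <- as.numeric(omega < pi / 6)
# Same vector as the one-liner in chunk 20:
Gamma_oneliner <- c(1, (1:(len / 2)) < len / 12)
all(Gamma_explicit == Gamma_oneliner)  # TRUE
```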
Next we compute DFA-based concurrent filters for each of the three time series encountered in the previous blog post (1) (AR(1) processes with $a_1=-0.9$, $a_1=0.1$ and $a_1=0.9$), relying on the MSE wrapper of the DFA function in the MDFA package:
b<-matrix(nrow=L,ncol=3)
# Compute real-time filters
for (i in 1:3)
{
# Compute the periodogram based on the data (length 120)
periodogram[,i]<-per(x[,i],plot_T)$per
# Optimize filters
filt<-dfa_ms(L,periodogram[,i],Lag,Gamma)
trffkt[,i]<-filt$trffkt
b[,i]<-filt$b
# Compute real-time outputs (we can use the longer series in order
# to obtain estimates for time points t=1,...,11)
for (j in 1:len)
yhat[j,i]<-filt$b%*%xh[lenh/2+(-len/2)-1+j:(j-L+1),i]
}
We loop over the three time series, compute the corresponding periodograms (the weighting function in the criterion), apply dfa_ms, and extract transfer functions as well as filter coefficients for each series. Finally, we filter each time series in the time domain; recall (3) (next-to-last section in that entry).
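As an aside, the time-domain filtering step in the inner loop is just a one-sided (causal) convolution. For a single series this can also be sketched with base R's stats::filter; the toy series and equal-weight coefficients below are illustrative only (the script's loop additionally handles the index bookkeeping for the longer series xh).

```r
# Sketch of causal (concurrent) time-domain filtering with base R
# (illustrative; not a replacement for the script's index expression).
# With sides = 1, output_t = sum_{k=0}^{L-1} b_k * x_{t-k}.
x_demo <- as.numeric(1:20)       # toy series (not the AR(1) data)
b_demo <- rep(1 / 12, 12)        # toy equal-weight filter of length 12
yhat_demo <- stats::filter(x_demo, b_demo, method = "convolution", sides = 1)
yhat_demo[12]  # mean of x_1,...,x_12 = 6.5 (first 11 outputs are NA)
```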
We now proceed to an empirical verification of the uniform superconsistency argument. For that purpose we have to compute 49, the criterion value at the optimum, for each of the three series (the following code replicates formula 49):
###################################################
### code chunk number 21: exercise_dfa_ms_3
###################################################
for (i in 1:3)
{
# Compute criterion values
perf_mat[i,1]<-(2*pi/length(Gamma))*
abs(Gamma-trffkt[,i])^2%*%periodogram[,i]
}
perf_mat[,1]
According to our technical argument above, these values should be close to the left-hand side of 48. Let's compute the latter too:
###################################################
### code chunk number 22: exercise_dfa_ms_4
###################################################
# Compute time-domain MSE
mse<-apply(na.exclude((yhat-y))^2,2,mean)
perf_mat[,2]<-mse
round(perf_mat[,2],3)
Just to be clear:
- In practice one does not observe $y_t$; therefore the last expression, the mean-square filter error, cannot be computed. In contrast, the former expression, the DFA criterion, can always be computed.
- We were able to compute the filter mean-square error here because we generated very long time series and because we looked at a short sample in the middle of the (much longer) series.
So let's compare both statistics for all three processes
###################################################
### code chunk number 23: z_dfa_ar1_output.pdf
###################################################
library(Hmisc)
require(xtable)
#latex(cor_vec, dec = 1, , caption = "Example of using latex to create table",
#center = "centering", file = "", floating = FALSE)
xtable(perf_mat, digits = rep(3, dim(perf_mat)[2] + 1),
caption = "Criterion values vs. sample (mean-square) filter errors",
label = "perf_mat")
If your code ran faultlessly then you should have obtained:
Criterion Value Mean-Square Sample Filter Error
a1=0.9 0.31377156 0.32137867
a1=0.1 0.05867743 0.06007367
a1=-0.9 0.02756990 0.02793541
The first (data) column corresponds to formula 49 and the second column to formula 48. Uniform superconsistency means that both columns should be 'alike'. Are they? Well, we know that they are as close as possible 'in the mean' (asymptotically). Minimizing the criterion (first column) is all we can do in practice, because the sample mean-square error (second column) is not available, at least in typical applications. The efficiency of the DFA is closely (but not exclusively) related to the 'best possible' approximation of the second column by the first column.
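As a quick sanity check (my own addition, using the numbers printed above), one can look at the relative gap between the two columns:

```r
# Relative gap between criterion values (formula 49) and sample filter
# MSEs (formula 48), using the numbers printed above.
crit <- c(0.31377156, 0.05867743, 0.02756990)
mse  <- c(0.32137867, 0.06007367, 0.02793541)
rel_gap <- abs(crit - mse) / mse
round(100 * rel_gap, 1)  # percent: 2.4, 2.3, 1.3
```

The gaps stay below a few percent for all three processes, which is what uniform superconsistency leads us to expect.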
Let's conclude by comparing outputs of symmetric (red lines) and concurrent (blue lines) filters:
###################################################
### code chunk number 24: z_dfa_ar1_output.pdf
###################################################
par(mfrow=c(3,1))
for (i in 1:3) #i<-1
{
ymin<-min(min(y[,i]),min(na.exclude(yhat)[,i]))
ymax<-max(max(y[,i]),max(na.exclude(yhat)[,i]))
ts.plot(yhat[,i],main=paste("Time-domain MSE = ",
round(mse[i],3)," , Frequency-domain MSE = ",
round(perf_mat[i,1],3),", a1 = ",a_vec[i],sep=""),col="blue",
ylim=c(ymin,ymax),
gpars=list(xlab="", ylab=""))
lines(y[,i],col="red")
mtext("Real-time", side = 3, line = -1,at=len/2,col="blue")
mtext("target", side = 3, line = -2,at=len/2,col="red")
}
Running this chunk should produce three panels, one per process:
The red lines are unavailable to the analyst, at least towards the sample end $t=T$. We could obtain them here because of our particular ('cheating') simulation framework. We can see that:
- Concurrent filters are subject to delay (right-shift of blue lines as compared to red lines)
- Concurrent filters are subject to leakage (rippled/noisy blue lines as compared to smooth red lines)
Whatever you might think about this outcome, it's what MSE is supposed to do! In practice, however, one might wish for 'faster' filters (less delay) and stronger noise suppression (smoother outputs). That's the topic of customization.
Next?
We'll look at the DFA-MSE criterion from an intuitive perspective. This will prepare for the ATS-trilemma and customization.