DFA-MSE Criterion: an Intuitive Perspective

In the previous blog-post (1) I introduced some formal arguments justifying the DFA-MSE criterion. Here I'd like to provide some intuitive background. I'll introduce transfer functions, amplitude functions and time-shift functions. Moreover, I'll introduce concepts and ideas preparing for the ATS-trilemma (customization). Sample (R-) code will be based on MDFA_Legacy_MSE.r posted earlier, see MSE-tutorial.

Criterion

The DFA-MSE criterion analyzed in the previous blog-post (1) is

\[\frac{2\pi}{T}\sum_{k}\left|\Gamma(\omega_k)-\hat{\Gamma}(\omega_k)\right|^2S(\omega_k)\to\min_{b_0,...,b_{L-1}}\]

See slide 27 in Advances in Signal Extraction and Forecasting. The principle is a simple and intuitively appealing weighted approximation of the target transfer function $\Gamma(\omega_k)$ by the concurrent transfer function $\hat{\Gamma}(\omega_k)$, whereby the weighting function $S(\omega_k)$ plays the role of a spectrum (the data as transformed in the frequency-domain). Note that
  • the above sum runs across a grid of (typically equidistant) $\omega_k\in [-\pi,\pi]$ which can be restricted to $\omega_k\in[0,\pi]$ if the filter coefficients are real-valued (which is invariably the case in our applications).
  • the approximation of the unobserved expected squared error $E[(y_T-\hat{y}_T)^2]$ (we assume $h=0$: a nowcast) by its estimate on the right-hand side (the DFA-MSE criterion) is in some sense 'best possible', see the discussion in the previous blog-post (1).
    • Interpretation: the solution of the DFA-MSE criterion minimizes $E[(y_T-\hat{y}_T)^2]$ up to a smallest possible error-term.
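To make the criterion concrete, here is a minimal numerical sketch (in Python/NumPy for the sake of a self-contained example; the function name dfa_mse_criterion and all parameter choices are mine, not from the MDFA-package) which evaluates the weighted approximation for a candidate coefficient vector on the grid $\omega_k=k\pi/K$:

```python
import numpy as np

# Hypothetical sketch: the function name, the grid size K and the candidate
# filter below are mine (not from the MDFA-package).
def dfa_mse_criterion(b, Gamma_target, S, K):
    """Weighted squared mismatch between target and concurrent transfer function."""
    omega = np.arange(K + 1) * np.pi / K           # frequency grid on [0, pi]
    # concurrent transfer function Gamma_hat(w_k) = sum_j b_j exp(-i*j*w_k)
    Gamma_hat = np.exp(-1j * np.outer(omega, np.arange(len(b)))) @ b
    return np.sum(np.abs(Gamma_target - Gamma_hat) ** 2 * S)

K = 60
omega = np.arange(K + 1) * np.pi / K
Gamma_target = (omega <= np.pi / 6).astype(float)  # ideal lowpass, cutoff pi/6
S = np.ones(K + 1)                                 # flat (white-noise) weighting
b = np.full(12, 1.0 / 12)                          # equally-weighted candidate
print(dfa_mse_criterion(b, Gamma_target, S, K))
```

With a flat weighting $S\equiv 1$ the criterion simply sums the unweighted squared mismatch across the grid; replacing S by a periodogram reproduces the data-driven weighting of the DFA.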
What's a transfer function, by the way?

Transfer, Amplitude, Phase and Time-Shift Functions


A transfer function is a mapping of a filter in the frequency-domain:
\[\hat{\Gamma}(\omega)=\sum_{k=0}^{L-1}b_k\exp(-ik\omega)\]
where
  • $\omega\in [-\pi,\pi]$ (here $\omega\in [0,\pi]$)
  • $b_k,k=0,...,L-1$ are the filter coefficients. 
The filter coefficients $b_k$ can be recovered from the transfer function by inverse Fourier transform. For illustration, the following code-lines, copied from the DFA-function dfa_ms in the MDFA-package, compute the transfer function on the frequency grid:

trffkt[1] <- sum(b)  # frequency zero: exp(-ik*0)=1, hence Gamma_hat(0)=sum(b)
for (k in 1:K) {
  trffkt[k + 1] <- b %*% exp((0+1i) * k * (0:(length(b) - 1)) * pi / K)
}
In frequency $\omega=0$ we have $\exp(-ik\omega)=1$ and therefore $\hat{\Gamma}(0)=\sum_{k=0}^{L-1}b_k$, which corresponds to the first line of the above snippet (the first component trffkt[1] corresponds to frequency $\omega=0$). Also, $\omega_k=k\pi/K$ where $K=T/2$ (half the sample length).
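The recovery of the coefficients by inverse Fourier transform can be sketched as follows (a Python/NumPy illustration under the conventions above, not code from the MDFA-package); note that the inversion uses the full grid on $[-\pi,\pi)$:

```python
import numpy as np

# Illustrative sketch (coefficients below are arbitrary): compute the transfer
# function of b on the full grid and recover b by inverse Fourier transform.
L, K = 4, 64
b = np.array([0.4, 0.3, 0.2, 0.1])

omega = np.pi * np.arange(-K, K) / K               # grid w_k = k*pi/K on [-pi, pi)
Gamma_hat = np.exp(-1j * np.outer(omega, np.arange(L))) @ b

# inverse transform: b_j = (1/(2K)) * sum_k Gamma_hat(w_k) * exp(+i*j*w_k)
b_rec = (np.exp(1j * np.outer(np.arange(L), omega)) @ Gamma_hat) / (2 * K)
print(np.max(np.abs(b_rec.real - b)))  # ~ 0 (machine precision)
```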


The transfer function can be decomposed into amplitude and phase functions, see DFA:

\[\hat{\Gamma}(\omega)=A(\omega)\exp\left(-i\Phi(\omega)\right)\]

where $A(\omega)=|\hat{\Gamma}(\omega)|$ is the amplitude function and $\Phi(\omega)$ is the phase function. The time-shift function is obtained by dividing the phase by the frequency: $\hat{\phi}(\omega)=\Phi(\omega)/\omega$.
Why do we consider amplitude and phase functions at all? Consider the effect of the filter on a pure cosine $x_t=\cos(\omega t)$: the output is

\[\hat{y}_t=A(\omega)\cos\left(\omega\left(t-\hat{\phi}(\omega)\right)\right)\]

i.e. the amplitude function scales the wave and the time-shift function shifts (delays) it along the time axis.
Since any time series (any collection of ordered observations) can be decomposed into trigonometric waves, the above effects (scaling/time-shift) apply to any time series, by virtue of linearity. Therefore, amplitude and time-shift functions describe practically relevant properties of the (concurrent) filter:
  • the delay (as measured by the time-shift in the passband)
  • the noise leakage (as measured by the amplitude in the stopband)
In a trading framework, typically, an 'idealized' filter should be free of noise (reliable signals) and 'fast' (no delay). In practice, one could strive for 'slightly faster' and 'slightly more reliable' (than what other market participants are using).
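Both effects can be checked numerically. The following sketch (Python/NumPy, an illustration rather than tutorial code) applies an equally-weighted causal filter to a pure cosine and compares the output with the scaled and delayed cosine predicted by the amplitude and time-shift functions:

```python
import numpy as np

# Illustrative check: output of a causal filter applied to cos(omega*t)
# equals A(omega) * cos(omega * (t - shift)), up to the startup transient.
L = 5
b = np.full(L, 1.0 / L)                # equally-weighted filter of length 5
omega = np.pi / 16                     # an arbitrary passband frequency

# transfer function Gamma_hat(omega) = sum_k b_k exp(-i*k*omega)
Gamma = np.sum(b * np.exp(-1j * np.arange(L) * omega))
A = np.abs(Gamma)                      # amplitude function at omega
shift = -np.angle(Gamma) / omega       # time-shift (in time units)

t = np.arange(200)
x = np.cos(omega * t)
y = np.convolve(x, b)[: len(t)]        # causal filter output
pred = A * np.cos(omega * (t - shift)) # predicted scaled-and-delayed cosine
print(np.max(np.abs(y[L:] - pred[L:])))  # ~ 0 beyond the startup transient
```

For the equally-weighted filter of length $L$ the time-shift equals $(L-1)/2$ (here 2 time units): such moving averages delay the signal by half the filter length.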

Amplitude and Time-Shift 'Fits'

In order to illustrate the above ideas we rely, once again, on the sample code MDFA_Legacy_MSE.r posted earlier, see MSE-tutorial. I briefly discuss only the relevant code chunks here (you have to run all previous code-lines in the file, i.e. all chunks up to number 26, in order to obtain the figure below).

Here we plot (and analyze) the amplitude and time-shift functions of the three concurrent filters computed in the previous post (1):


###################################################
### code chunk number 26: z_dfa_ar1_output.pdf
###################################################

omega_k<-pi*0:(len/2)/(len/2)
par(mfrow=c(2,2))
amp<-abs(trffkt)
shift<-Arg(trffkt)/omega_k
plot(amp[,1],type="l",main="Amplitude functions",
     axes=F,xlab="Frequency",ylab="Amplitude",col="black",ylim=c(0,1))
lines(amp[,2],col="orange")
lines(amp[,3],col="green")
lines(Gamma,col="violet")
mtext("Amplitude a1=0.9", side = 3, line = -1,at=len/4,col="black")
mtext("Amplitude a1=0.1", side = 3, line = -2,at=len/4,col="orange")
mtext("Amplitude a1=-0.9", side = 3, line = -3,at=len/4,col="green")
mtext("Target", side = 3, line = -4,at=len/4,col="violet")
axis(1,at=c(0,1:6*len/12+1),labels=c("0","pi/6","2pi/6","3pi/6",
                                     "4pi/6","5pi/6","pi"))
axis(2)
box()
plot(shift[,1],type="l",main="Time-shifts",
     axes=F,xlab="Frequency",ylab="Shift",col="black",
     ylim=c(0,max(na.exclude(shift[,3]))))
lines(shift[,2],col="orange")
lines(shift[,3],col="green")
lines(rep(0,len/2+1),col="violet")
mtext("Shift a1=0.9", side = 3, line = -1,at=len/4,col="black")
mtext("Shift a1=0.1", side = 3, line = -2,at=len/4,col="orange")
mtext("Shift a1=-0.9", side = 3, line = -3,at=len/4,col="green")
mtext("Target", side = 3, line = -4,at=len/4,col="violet")
axis(1,at=c(0,1:6*len/12+1),labels=c("0","pi/6","2pi/6","3pi/6",
                                     "4pi/6","5pi/6","pi"))
axis(2)
box()
plot(periodogram[,1],type="l",main="Periodograms",
     axes=F,xlab="Frequency",ylab="Periodogram",col="black",
     ylim=c(0,max(periodogram[,3])/6))
lines(periodogram[,2],col="orange")
lines(periodogram[,3],col="green")
mtext("Periodogram a1=0.9", side = 3, line = -1,at=len/4,col="black")
mtext("Periodogram a1=0.1", side = 3, line = -2,at=len/4,col="orange")
mtext("Periodogram a1=-0.9", side = 3, line = -3,at=len/4,col="green")
axis(1,at=c(0,1:6*len/12+1),labels=c("0","pi/6","2pi/6","3pi/6",
                                     "4pi/6","5pi/6","pi"))
axis(2)
box()


The resulting figure shows the amplitude functions (top-left), the time-shifts (top-right) and the periodograms (bottom-left) of the three series. The link of this figure to the DFA-MSE criterion (shown in the above slide) is as follows:
  • For the weighting function $S(\omega_k)$ in the criterion I selected the periodogram, which can be seen in the bottom-left panel (for all three series).
  • The target $\Gamma(\omega_k)$ in the criterion corresponds to the violet line in the upper two panels of the figure: it's an ideal lowpass with cutoff $\pi/6$ (left panel) and its time-shift vanishes (right panel) by symmetry (and positiveness) of the filter.
  • The estimates $\hat{\Gamma}(\omega_k)$ are shown in the upper panels of the figure (black, orange and green): they are decomposed into amplitude (left panel) and time-shift, i.e. phase divided by frequency (right panel).

Let's first have a look at the bottom-left panel: the periodograms of the three AR(1)-processes.
  • The process corresponding to $a_1=0.9$ (black line) has a spectral bulk towards frequency zero, as expected for a process with strongly positive autocorrelation.
  • The process corresponding to $a_1=0.1$ (orange line) has no marked bulk (it's close to white noise).
  • The process corresponding to $a_1=-0.9$ (green line) has a spectral bulk towards frequency $\pi$, as expected for a process with a strongly negative autocorrelation. This latter process is the least practically relevant in the context of economic time series (I know of no relevant phenomenon). It has been included in order to cover a broad range of time series dynamics.
Let's now have a look at the amplitude functions in the top-left panel:
  • The amplitude of the process corresponding to $a_1=0.9$ (black line) is closer to the target (violet line) towards the lower frequencies, where the spectral bulk of the series is located (weighted approximation principle). Towards the higher frequencies, the tightness of the fit is markedly relaxed (marked noise leakage).
  • In contrast, the amplitude of the process corresponding to $a_1=0.1$ (orange line) tries to match the target more 'uniformly' in $[0,\pi]$.
  • Finally, the amplitude of the process corresponding to $a_1=-0.9$ (green line) is closer to the target (violet line) towards the higher frequencies, where the spectral bulk of the series is located (weighted approximation principle): its amplitude is vanishingly small at frequency $\pi$. Towards the lower frequencies, the tightness of the fit is markedly relaxed.
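The weighted approximation principle at work in these observations can be reproduced in a few lines: the DFA-MSE solution is a weighted least-squares fit, so the transfer function matches the target more tightly where the weighting function is large. The following sketch (Python/NumPy; the helper name dfa_mse_fit is mine, and the exponentially decaying weighting is a stand-in for a low-frequency periodogram) compares a low-frequency weighting with a flat one:

```python
import numpy as np

# Sketch (names and parameters are mine, not from the MDFA-package): solve
# the DFA-MSE criterion by weighted least squares for two weightings and
# compare the tightness of the passband fit.
def dfa_mse_fit(S, L, K, cutoff=np.pi / 6):
    omega = np.arange(K + 1) * np.pi / K
    target = (omega <= cutoff).astype(float)       # ideal lowpass target
    E = np.exp(-1j * np.outer(omega, np.arange(L)))
    w = np.sqrt(S)
    # stack real and imaginary parts into one real-valued least-squares problem
    X = np.vstack([w[:, None] * E.real, w[:, None] * E.imag])
    y = np.concatenate([w * target, np.zeros(K + 1)])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return omega, target, E @ b

K, L = 120, 24
S_low = np.exp(-5.0 * np.arange(K + 1) / K)        # low-frequency spectral bulk
omega, target, fit_low = dfa_mse_fit(S_low, L, K)
_, _, fit_flat = dfa_mse_fit(np.ones(K + 1), L, K) # flat (white-noise) weighting

passband = omega <= np.pi / 6
err_low = np.mean(np.abs(target - fit_low)[passband] ** 2)
err_flat = np.mean(np.abs(target - fit_flat)[passband] ** 2)
# the low-frequency weighting typically buys a tighter passband fit, at the
# cost of a looser fit (more leakage) in the stopband
print(err_low, err_flat)
```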
What about the time-shifts in the top-right panel? First note that the time-shift of the target vanishes by symmetry of the filter (its transfer function is positive real-numbered).
  • The shift of the process corresponding to $a_1=0.9$ (black line) tends to be small in the passband, where the spectral bulk of the series is located. Guessing a bit, I'd say that turning-points (of a trend) would be shifted by roughly 1 time unit, in the mean (sometimes more and sometimes less).
  • The other two processes tend to show larger shifts in the passband.
    • They are not uniformly larger than the first filter, but larger on 'average' (weighted average). 
    • Guessing a bit, once again, I'd expect that turning-points of a trend would be delayed by roughly 2 time units, in the mean (sometimes more and sometimes less).

Splitting MSE into Amplitude and Time-Shift Contributions

Amplitude Contribution 

You might be wondering how the amplitude function can 'contribute' to (explain) mean-square performances of the filter. I briefly review the main effects:
  • Leakage: if the amplitude is off-target in the stopband ($\omega>cutoff$), as is the case for the black amplitude in the top-left panel, then undesirable high-frequency components (noise) can 'pass' the filter and contaminate the output: this noise contributes to the MSE.
    • If the high-frequency content of the series is weak (small periodogram in the stopband relative to the passband: black series) then the remaining noise passing the 'leaking' filter will be weak (in relative terms) and its contribution to MSE might become 'acceptable' (in a tradeoff perspective)
    • If the high-frequency content of the series is strong (large periodogram in the stopband relative to the passband: green series) then the remaining noise passing the 'leaking' filter will be strong (in relative terms) and its contribution to MSE might become 'unacceptable' (in a tradeoff perspective).
  • Scaling: if the amplitude is off-target in the passband ($\omega<cutoff$), as is the case of the orange and green filters in the top-left panel, then the output is either shrunken (amplitude smaller than target) or magnified (amplitude larger than target). In either case, the misspecified scaling contributes to the MSE.
    • Shrinkage (orange and green filters) can be desirable (in a tradeoff perspective) if the high-frequency content of the series is strong (in relative terms) because the remaining noise (passing the 'shrinking' filter) will be shrunken too, i.e. its contribution to MSE 'shrinks'.
In summary: the amplitude function contributes to MSE in the passband (scaling) and in the stopband (leakage). There is a tradeoff involved between passband and stopband 'fits' which depends on the weighting function $S(\omega_k)$ (the periodogram in the above example). This tradeoff includes the time-shift too (amplitude and phase functions are functionally interlinked: for minimum phase filters, any one of the two functions could be derived from the other one).

Time-Shift Contribution

How does the time-shift contribute to MSE? Let's illustrate the topic by relying on a simple example, namely a linear trend $a+bt$ with intercept $a$ and slope $b$. If the filter shifts the series by one time unit to the right (without otherwise scaling the output) then the output will be $a+b(t-1)$ (a right shift is a delay) and the mean-square error will be $(a+bt-a-b(t-1))^2=b^2$. For arbitrary shift $\delta$ we obtain $\delta^2b^2$. Thus the time-shift contribution to MSE depends on the magnitude of the shift $\delta$ and on the slope $b$ of the trend but not on the level constant $a$. An immediate outcome is that the time-shift contribution to MSE is nil if $b=0$: if the series is flat, then the shift does not matter whatever the level $a$ of the series.
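The arithmetic of this worked example is easy to check (a small Python sketch for illustration; the numbers are arbitrary):

```python
import numpy as np

# Check of the worked example above: delaying a linear trend a + b*t by
# delta time units yields a constant squared error delta^2 * b^2,
# independent of the level a.
a, b, delta = 5.0, 0.3, 2.0
t = np.arange(100)
err2 = ((a + b * t) - (a + b * (t - delta))) ** 2  # target minus delayed output
print(err2[:3])  # constant: delta^2 * b^2 = 0.36 at every t
```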
Let's have a look at cycles. Here's a short piece of code which applies an equally-weighted filter (lowpass) to a cosine: it computes the cosine, the filtered cosine as well as the squared filter error:


# Input series
omega<-pi/61
len<-120
x<-cos(1:len*omega)

# Filter weights: equally-weighted filter of length 12

L<-12
bkh<-rep(1/L,L)
# Normalization (so that scale of output is equal to scale of input)
norm_c<-abs(sum(bkh*exp(1.i*(0:(L-1))*omega)))
bk<-bkh/norm_c

# Filter series
yhat<-rep(NA,len)
for (i in L:len)
  yhat[i]<-sum(bk*x[i:(i-L+1)])

# Plot series, filtered series and squared error
par(mfrow=c(2,1))
ts.plot(x,col="blue",main="Series(blue), filtered series (red) and squared error (black)")
lines(yhat,col="red")
abline(h=0)
ts.plot((x-yhat)^2)


The resulting plot shows the series (blue) and the filtered series (red) in the top panel, and the squared error in the bottom panel.

We can see
  • a shift of the filter output (red) 
  • the (squared) error introduced by the time-shift: it is maximal at the zero-crossings (of the series) and minimal at the extrema
    • notice that the error is entirely due to the time-shift because we normalized the filter (the scale of the output is the same as the scale of the input)

Armed with these simple examples we now distinguish passband and stopband contributions (to MSE) by the time-shift.
  • Passband:
    • As noted, the contribution (by the shift to MSE) along a cycle is strongest at the zero-line crossings (where the slope of the cycle is strongest) and will be (close to) vanishing at the peaks/troughs (where the slope is vanishing). This is obtained 'by construction': a by-product.
    • Controlling the time-shift implicitly means: smaller MSE-contributions (by the shift) specifically at the zero-crossings. 
    • If we are working with differenced data, then zero-crossings (crossings of the data at the zero-line) correspond to local extrema of the original series (in levels). Thus the above findings signify that, as a by-product, performances (MSE-contributions by the time-shift) are implicitly emphasized at the practically relevant turning-points (the extrema) of the original series. In many applications, these are the time-points towards which a decision-maker is supposed to make (well...) a decision (call a buy/sell signal, cut/tighten the interest rate).
  • Stopband: in the stopband, the target filter $\Gamma$ vanishes and therefore its output is a flat (zero-) line. Therefore, whatever the time-shift of the concurrent filter $\hat{\Gamma}$ in the stopband might be, its contribution to MSE is nil.

This last result is to be contrasted with the amplitude function, which contributes both in the passband and in the stopband.
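The stopband statement can also be verified numerically: the following sketch (Python/NumPy, illustration only) compares outputs with identical (leaked) amplitude but different time-shifts at a stopband frequency; the MSE against the (zero) target output is the same in every case:

```python
import numpy as np

# Numerical sketch: at a stopband frequency the target output is the zero
# line, so the MSE depends only on the leaked amplitude of the concurrent
# filter, not on its time-shift (values below are arbitrary).
omega = 5 * np.pi / 6            # a stopband frequency (cutoff pi/6)
A = 0.2                          # leaked amplitude of the concurrent filter
t = np.arange(996)               # a whole number of cycles of cos(2*omega*t)
mses = [np.mean((A * np.cos(omega * (t - s))) ** 2) for s in (0.0, 1.0, 3.0)]
print([round(m, 6) for m in mses])  # each equals A^2/2 = 0.02: the shift is irrelevant
```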

Disentangling both Effects 

The MSE-contribution of the amplitude emphasizes levels/scales whereas the MSE-contribution by the shift emphasizes differences/slopes. This functional mismatch impedes the analysis, in particular when amplitude and time-shift effects are both present and entangled, as is invariably the case in applications. Fortunately, both effects can be neatly disentangled in the frequency-domain. The interested reader might look at The Trilemma Between Accuracy, Timeliness and Smoothness in Real-Time Signal Extraction. Alternatively, I will dedicate a series of future blog-posts to the topic.

Summary

The last figure above provides a feeling for the mechanics of the DFA-MSE criterion:
  • $\Gamma(\omega)$ is close to $\hat{\Gamma}(\omega)$ at frequencies more heavily 'loaded' by the spectrum. 
  • This finding applies to amplitude and time-shifts too, with respect to their specific targets.
  • The MSE-perspective emphasizes amplitude functions and time-shift functions (fits thereof) 'equally': there is no favorite here.
In order to prepare for customization (ATS-trilemma) we note that:
  • Many interesting applications (for example algorithmic trading or realtime business-cycle analysis) can be expressed in terms of 'idealized' amplitude and time-shift characteristics in pass- and stopband of the filter:
    • Fast filters (small delays) are characterized by small time-shifts in the passband (ceteris paribus)
    • Filters with a strong noise rejection are characterized by small amplitude functions in the stopband (ceteris paribus)
  • Therefore, we need a generic optimization concept which frames the somewhat 'ad hoc' requirements into a fundamental (universal) tradeoff, generalizing the classic MSE-paradigm.
    • In order to proceed, we need to disentangle amplitude and shift contributions to MSE
    • The outcome of disentanglement will be the sought-after (fundamental) tradeoff
