MDFA-Package: Consistency Check

Here is a short and simple example showing that the (univariate) DFA-MSE criterion is replicated both by the multivariate MSE-wrapper and by the generic multivariate MDFA-function of the MDFA-package (consistency check). The example also briefly goes over the parametrization of the MDFA-functions (discussed in the step-by-step introduction). The sample code relies on MDFA_Legacy_MSE.r (see the MDFA tutorial for installation).

I assume that you installed MDFA_Legacy_MSE.r on your machine and that you ran the code (hopefully without errors) up to code chunk number 28. The example below starts from there. In the following I briefly illustrate the four steps proposed at the end of the step-by-step introduction:
  • specify data in time-domain
  • transform into frequency-domain
  • compute optimal real-time (concurrent) filter
  • filter the data

1. Define the Data-Matrix

I select one of the three time series generated in chunk number 18: i_process<-1 picks the strongly positively autocorrelated AR(1)-process used in the target signal:

###################################################
### code chunk number 29: exercise_dfa_ms_4
###################################################
# Select the first process
i_process<-1
# Define the data-matrix:
# The first column must be the target series.
# Columns 2,3,... are the explanatory series. In a univariate setting
# target and explanatory variable are identical
data_matrix<-cbind(x[,i_process],x[,i_process])
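For readers who do not have the earlier chunks in memory, the construction above can be mimicked with a simulated series. This is only a sketch: the AR(1) coefficient 0.9, the sample length 120, the seed and the names x_sim and data_matrix_sim are my assumptions, not the values used in chunk number 18.

```r
# Hypothetical stand-in for the series generated in chunk number 18
set.seed(1)
len <- 120   # assumed sample length
a1  <- 0.9   # assumed AR(1) coefficient (strong positive autocorrelation)
x_sim <- as.numeric(arima.sim(model = list(ar = a1), n = len))

# Univariate setting: target (column 1) and explanatory series (column 2) coincide
data_matrix_sim <- cbind(x_sim, x_sim)
dim(data_matrix_sim)
identical(data_matrix_sim[, 1], data_matrix_sim[, 2])
```

The two identical columns are what makes the multivariate function solve a univariate problem.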

2. Transform into Frequency-Domain

I use the DFT (Discrete Fourier Transform) in this example, as implemented in the function spec_comp of the MDFA-package. Note that the parameter d should be zero (it is initialized earlier in the code chunks), which means that the data is assumed to be stationary: this is my default setting when working with differenced economic data. The integer insample restricts the estimation sample in data_matrix above (rows 1:insample are used for estimation), so that filter outputs for time points $t>$insample are effectively out-of-sample. The selection below means that I use all the data for estimating the filter.


# Determine the in-sample period (fully in sample)
insample<-nrow(data_matrix)
# Compute the DFT by relying on the multivariate DFT-function:
#   d=0 for stationary data (default settings)
weight_func<-spec_comp(insample, data_matrix, d)$weight_func


Let's have a look at weight_func (this is not part of the provided code chunk):

head(weight_func,10)

You should see the following content in the R-console:

                       [,1]                  [,2]
 [1,] -2.8548067+0.0000000i -2.8548067+0.0000000i
 [2,] -0.5238317+5.5125590i -0.5238317+5.5125590i
 [3,]  3.6980745+4.5202991i  3.6980745+4.5202991i
 [4,]  4.1101674+1.6652342i  4.1101674+1.6652342i
 [5,] -0.8902880-3.0189199i -0.8902880-3.0189199i
 [6,]  0.4266737-1.5102474i  0.4266737-1.5102474i
 [7,] -1.5153188-0.5127674i -1.5153188-0.5127674i
 [8,]  0.8298792-0.9027621i  0.8298792-0.9027621i
 [9,] -1.0273537-0.3603582i -1.0273537-0.3603582i
[10,] -1.6578681+0.7185627i -1.6578681+0.7185627i

This is the data transformed into the frequency-domain (DFT). The matrix is complex-valued except at frequencies $\omega=0$ (the first row) and $\omega=\pi$ (the last row, not shown here). Both columns are identical because the target series is also the (unique) explanatory series in a univariate framework.
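The real-valued rows at $\omega=0$ and $\omega=\pi$ can be checked with base R alone. The sketch below uses fft() on a simulated series of even length (so that $\omega=\pi$ lies on the frequency grid); the normalization by sqrt(2*pi*n) is an assumption of mine and need not match the one used inside spec_comp.

```r
# Base-R illustration (not the package's spec_comp)
set.seed(2)
n <- 120            # even length: frequency pi is on the grid
z <- rnorm(n)       # simulated stand-in series
K <- n / 2

# fft() returns frequencies 2*pi*(0:(n-1))/n; keep 0, ..., pi
dft   <- fft(z)[1:(K + 1)] / sqrt(2 * pi * n)   # assumed normalization
omega <- 2 * pi * (0:K) / n

# Imaginary parts vanish (up to rounding) at omega = 0 and omega = pi
Im(dft[1])
Im(dft[K + 1])

# All other rows are genuinely complex
head(dft, 3)
```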

3. Compute Concurrent Filter

We first rely on the generic MDFA-function mdfa_analytic. The coefficients in b[,i_process] (in the chunk below) were generated by the DFA function dfa_ms in an earlier chunk (chunk number 20, which is assumed to have been run), so we can put the estimates of dfa_ms (univariate) and of mdfa_analytic side by side for comparison. Both estimates should be identical, which is not trivial because mdfa_analytic is a much (much!) more complex estimation procedure. We source the file control_default.r in order to initialize the (very) long list of parameters of the generic mdfa_analytic function, see understanding the MDFA package.

###################################################
### code chunk number 30: exercise_dfa_ms_4
###################################################
# Source the default (MSE-) parameter settings
source(file=paste(path.pgm,"control_default.r",sep=""))
# Estimate filter coefficients:
mdfa_obj<-mdfa_analytic(L, lambda, weight_func, Lag, Gamma, eta, cutoff, i1, i2,
                        weight_constraint, lambda_cross, lambda_decay, lambda_smooth,
                        lin_eta, shift_constraint, grand_mean, b0_H0, c_eta,
                        weight_structure, white_noise, synchronicity, lag_mat, troikaner)
# Filter coefficients: compare MDFA and previous DFA
b_mat<-cbind(mdfa_obj$b,b[,i_process])
dimnames(b_mat)[[2]]<-c("MDFA","DFA")
dimnames(b_mat)[[1]]<-paste("lag ",0:(L-1),sep="")
as.matrix(round(b_mat,5))
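To make the kind of criterion behind these estimates more tangible, here is a generic frequency-domain least-squares sketch in base R: it minimizes the periodogram-weighted squared distance between a target transfer function and the transfer function of a one-sided filter of length L. It is not the code of dfa_ms or mdfa_analytic; the function name dfa_mse_sketch, the equidistant frequency grid and the cutoff pi/6 are assumptions made for illustration only.

```r
# Illustrative sketch only: NOT the package's dfa_ms or mdfa_analytic
dfa_mse_sketch <- function(per, Gamma, L) {
  K <- length(per) - 1
  omega <- pi * (0:K) / K                    # assumed grid 0, ..., pi
  # Transfer function of a one-sided filter: sum_j b_j exp(-i*j*omega)
  E <- exp(-1i * outer(omega, 0:(L - 1)))
  w <- sqrt(per)                             # weights from the periodogram
  # Stack real and imaginary parts into one real least-squares problem
  X <- rbind(Re(E) * w, Im(E) * w)
  y <- c(Re(Gamma) * w, Im(Gamma) * w)
  as.numeric(solve(crossprod(X), crossprod(X, y)))
}

K_grid <- 60
L_len  <- 12
omega_grid <- pi * (0:K_grid) / K_grid
per_flat   <- rep(1, K_grid + 1)             # flat (white-noise-like) weights

# Sanity check: an allpass target is matched exactly by b = (1, 0, ..., 0)
b_allpass <- dfa_mse_sketch(per_flat, rep(1, K_grid + 1), L_len)
round(b_allpass, 5)

# Ideal lowpass target with assumed cutoff pi/6
Gamma_lp  <- as.numeric(omega_grid <= pi / 6)
b_lowpass <- dfa_mse_sketch(per_flat, Gamma_lp, L_len)
round(b_lowpass, 5)
```

The allpass case serves as a plausibility check: the target is itself a one-sided filter, so the least-squares fit recovers it without error.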


We next rely on the MSE-wrapper MDFA_mse, which pre-configures the generic criterion for MSE purposes (no customization or regularization at this stage); see understanding the MDFA package for context.

###################################################
### code chunk number 31: exercise_dfa_ms_4
###################################################
mdfa_obj_mse<-MDFA_mse(L,weight_func,Lag,Gamma)$mdfa_obj


We put together all three estimates and compare them:

###################################################
### code chunk number 32: exercise_dfa_ms_4
###################################################
b_mat<-cbind(b_mat,mdfa_obj_mse$b)
dimnames(b_mat)[[2]][3]<-"MDFA_mse"
dimnames(b_mat)[[1]]<-paste("lag ",0:(L-1),sep="")
head(as.matrix(round(b_mat,5)))


You should see

         MDFA     DFA MDFA_mse
lag 0 0.53821 0.53821  0.53821
lag 1 0.10039 0.10039  0.10039
lag 2 0.17419 0.17419  0.17419
lag 3 0.11221 0.11221  0.11221
lag 4 0.08075 0.08075  0.08075
lag 5 0.01972 0.01972  0.01972

in the R console. These are the first 6 coefficients of the univariate filter as computed by three different functions of the MDFA-package; the univariate DFA code is much simpler and shorter. DFA is indeed replicated by MDFA, so we could dispense with the corresponding DFA-functions altogether (I mainly use them for teaching).
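Rather than comparing the columns by eye, the agreement can also be checked numerically. The sketch below rebuilds the first six rows from the printed table (the name b_tab is mine); with the full b_mat from chunk number 32 in memory, the same comparison would apply to all L coefficients.

```r
# Values copied from the printed table (first six lags only)
coef_print <- c(0.53821, 0.10039, 0.17419, 0.11221, 0.08075, 0.01972)
b_tab <- cbind(MDFA = coef_print, DFA = coef_print, MDFA_mse = coef_print)
rownames(b_tab) <- paste("lag ", 0:5, sep = "")

# Largest absolute deviation of each column from the DFA column
max_dev <- apply(abs(b_tab - b_tab[, "DFA"]), 2, max)
max_dev

# TRUE if all three estimates agree at the printed precision
all(max_dev < 1e-5)
```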

Note a slight difference when feeding the frequency-domain data to DFA vs feeding it to MDFA:
  • The DFA (dfa_ms) uses the periodogram, which is the squared absolute value of the DFT (i.e. positive real-valued numbers)
  • The MDFA requires two (identical) columns with the DFT instead (i.e. complex-valued numbers)
Why the difference?
  • In a univariate setting the explanatory series is (automatically) also the target series and vice versa. Since both series are identical there cannot be any phase-shift between them. Therefore we can feed the periodogram.
  • In a multivariate setting the target series and the explanatory series do not have to coincide. Since the series generally differ, the relative phase shifts (lead/lag structure) are important. This information is contained in the DFT (but not in the periodogram). In a univariate context this 'additional information' is not important, of course, since the series are the same.
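The point about phase information can be illustrated with base R. In the sketch below a simulated series is shifted circularly by three time points (the shift and the series are assumptions for illustration): both series have the same periodogram, but their DFTs differ by a phase of -shift*omega, which is the lead/lag information a multivariate design needs.

```r
set.seed(3)
n <- 120
u <- rnorm(n)                                # simulated stand-in series
shift <- 3                                   # hypothetical lag
# Circular shift: v[t] = u[t - shift]
v <- c(tail(u, shift), head(u, n - shift))

dft_u <- fft(u)
dft_v <- fft(v)

# The periodograms (squared moduli) coincide up to rounding
per_diff <- max(abs(Mod(dft_u)^2 - Mod(dft_v)^2))
per_diff

# The DFTs differ by a phase factor exp(-1i * shift * omega)
k <- 2:10
omega_k    <- 2 * pi * (k - 1) / n
phase_diff <- Arg(dft_v[k] / dft_u[k])
cbind(omega = omega_k, phase_diff = phase_diff, expected = -shift * omega_k)
```

A circular shift is used so that the equality of the periodograms holds up to rounding; with an ordinary (non-circular) lag it holds only approximately.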

4. Filter the Data


This is done in an earlier code chunk (number 20) that is not replicated here, see dfa-mse-criterion.
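For completeness, here is a minimal sketch of what applying a concurrent (one-sided) filter amounts to, using stats::filter from base R. The coefficients b_hyp and the series x_sim are hypothetical; chunk number 20 may implement the filtering differently.

```r
set.seed(4)
x_sim <- rnorm(50)            # simulated stand-in series
b_hyp <- c(0.5, 0.3, 0.2)     # hypothetical filter coefficients (lags 0, 1, 2)

# One-sided (real-time) filter: y[t] = sum_j b[j+1] * x[t-j]
y <- stats::filter(x_sim, b_hyp, method = "convolution", sides = 1)

# The first length(b_hyp)-1 outputs are NA (not enough past observations)
head(y, 4)

# Manual check at one time point
t0 <- 10
manual <- sum(b_hyp * x_sim[t0 - 0:(length(b_hyp) - 1)])
c(filter = y[t0], manual = manual)
```

With sides = 1 only current and past observations enter the output, which is the defining property of a concurrent filter.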


