The intuitive simplicity of the classic model-based approach discussed in
(1) distracts from a deeply-rooted problem: an implicit assumption which does not hold in practice, and whose undesirable effects are magnified, in particular, in signal extraction. I here introduce and briefly discuss this 'broken link'.
Estimation Problem and Scope
The discussion herein emphasizes MSE performance (mean-square error). Controlling the time-shift and/or the amplitude functions, see
(2) for definitions, is beyond the scope of this entry (ATS-trilemma).
The classic model-based approach tracks a generic (unobserved) target $y_t$
(see
(1) for a discussion of target signals) by plugging forecasts into expression (1):

$$y_t=\sum_{k=-\infty}^{\infty}\gamma_k x_{t-k}, \qquad (1)$$

where $x_t$ denotes the observed series and the $\gamma_k$ are the weights of the (generally bi-infinite) target filter.
In general, one is interested in estimating $y_T$ towards the sample end $t=T$ (backcasts can therefore be neglected). If the weights $\gamma_k$ decay slowly, which is often the case in practice, then multi-step ahead forecasts of possibly large horizon are required. A natural question arises: is that solution (plugging forecasts in) 'smart'? It depends...
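Concretely, towards the sample end the plug-in estimate replaces unavailable future observations $x_{T-k}$, $k<0$, by forecasts $\hat{x}_{T-k|T}$; this is the expression (2) referred to below:

$$\hat{y}_T=\sum_{k\geq 0}\gamma_k x_{T-k}+\sum_{k<0}\gamma_k\,\hat{x}_{T-k|T}. \qquad (2)$$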
Maximum Likelihood
Let's assume that the data has been generated by a Gaussian AR(1)-process and that $\hat{\mu}$, $\hat{a}_1$ are the maximum likelihood (minimum mean-square one-step ahead forecast error) estimates of the unknown parameters $\mu, a_1$ (the unconditional term is ignored here, i.e. we identify
conditional and unconditional maximum likelihood concepts). Then $\hat{\mu}$, $\hat{a}_1$ inherit an optimality property: asymptotically, they are closest possible (in MSE) to $\mu, a_1$.
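For the AR(1)-process $x_t=\mu+a_1(x_{t-1}-\mu)+\epsilon_t$, conditional maximum likelihood amounts to minimizing the sum of squared one-step ahead forecast errors:

$$(\hat{\mu},\hat{a}_1)=\arg\min_{\mu,a_1}\sum_{t=2}^{T}\big(x_t-\mu-a_1(x_{t-1}-\mu)\big)^2.$$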
Invariance Theorem: a Justification for the Model-Based Solution
Interestingly, this optimality property extends to functions $g(\mu,a_1)$ of the parameters $\mu,a_1$, i.e. $\hat{g}:=g(\hat{\mu},\hat{a}_1)$ is closest possible to $g(\mu,a_1)$. This is called the invariance property, see for example
(4).
Now... multi-step ahead forecasts are functions of $\mu, a_1$ and, more generally, the model-based target estimate in the second slide (plugging forecasts in) is a function of $\mu, a_1$. We deduce that the model-based signal extraction estimate in slide 2, relying on $\hat{\mu}, \hat{a}_1$, must inherit optimality from the maximum likelihood estimates in this context. So far so good.
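To make the chain concrete, here is a minimal sketch: conditional ML fit of the AR(1), invariance-based multi-step plug-in forecasts, and the target estimate (2). The truncated weights `gamma` are hypothetical, chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a Gaussian AR(1): x_t = mu + a1*(x_{t-1} - mu) + eps_t
mu, a1, T = 1.0, 0.9, 500
x = np.empty(T)
x[0] = mu
for t in range(1, T):
    x[t] = mu + a1 * (x[t-1] - mu) + rng.normal()

# Conditional ML = least squares on the AR(1) regression
X = np.column_stack([np.ones(T - 1), x[:-1]])
c_hat, a1_hat = np.linalg.lstsq(X, x[1:], rcond=None)[0]
mu_hat = c_hat / (1.0 - a1_hat)

# Invariance: the h-step ahead forecast is a function g(mu_hat, a1_hat)
def forecast(h):
    return mu_hat + a1_hat**h * (x[-1] - mu_hat)

# Plug the forecasts into the target estimate (2); the truncated,
# symmetric weights gamma_k are hypothetical, for illustration only
K = 50
gamma = 0.9 ** np.abs(np.arange(-K, K + 1))
gamma /= gamma.sum()
y_hat_T = (
    sum(gamma[K + k] * x[T - 1 - k] for k in range(0, K + 1))   # observed, k >= 0
    + sum(gamma[K + k] * forecast(-k) for k in range(-K, 0))    # forecasts, k < 0
)
```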
Broken Link
Misspecification
In general, models are misspecified (I skip listing possible causes of misspecification, but non-Gaussianity is typical in economic applications) and therefore $\hat{\mu}$, $\hat{a}_1$, as obtained by minimizing the one-step ahead mean-square forecast error, no longer maximize the likelihood.
- Of course, one-step ahead forecasts obtained from the fitted AR(1)-model might still be of 'good' quality, in particular if the model residuals are close to white noise (they may even be optimal in some sense).
- But the powerful invariance principle does not hold anymore, i.e. $\hat{g}:=g(\hat{\mu},\hat{a}_1)$ is generally not 'closest possible' to $g(\mu,a_1)$.
- Therefore, the model-based solution proposed in the second slide no longer has a theoretical foundation.
- Even worse, multi-step ahead forecasts (required in the target estimate) tend to magnify misspecification issues, due to leverage effects, so that the classic model-based solution is hit twice; a small simulation sketch follows below.
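A minimal simulation sketch of that double hit, assuming (hypothetically, for illustration only) that the true DGP is an AR(2) while the fitted model is an AR(1): the MSE of the misspecified forecasts, relative to the true-model forecasts, typically deteriorates over short-to-medium horizons.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setting: true DGP is an AR(2), fitted model is an AR(1)
phi1, phi2, T, H, n_rep = 0.6, 0.3, 400, 12, 500
mse_fit, mse_true = np.zeros(H), np.zeros(H)

for _ in range(n_rep):
    x = np.zeros(T + H + 2)
    for t in range(2, T + H + 2):
        x[t] = phi1 * x[t-1] + phi2 * x[t-2] + rng.normal()
    past, future = x[2:T+2], x[T+2:]

    # Fit the (misspecified) AR(1) by least squares / conditional ML
    a1 = past[1:] @ past[:-1] / (past[:-1] @ past[:-1])

    for h in range(1, H + 1):
        f_fit = a1**h * past[-1]          # misspecified h-step forecast
        p2, p1 = past[-2], past[-1]       # iterate the true AR(2) forward
        for _ in range(h):
            p2, p1 = p1, phi1 * p1 + phi2 * p2
        mse_fit[h-1] += (future[h-1] - f_fit)**2
        mse_true[h-1] += (future[h-1] - p1)**2

print("relative MSE (misspecified / true) by horizon 1..12:")
print(np.round(mse_fit / mse_true, 2))
```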
Alternatives
I briefly discuss
three possible alternatives, in increasing order of preference:
- First alternative: one could be tempted to fit a specific forecast model for each forecast horizon required in expression (2) above and then plug in the corresponding forecasts (a sketch follows after this list):
- fit a one-step ahead model and plug the forecast in,
- fit a two-step ahead 'model' (strictly speaking this is no longer a model but an empirical forecast rule) and plug it in,
- fit a $k$-step ahead forecast rule and plug it in.
- Besides being cumbersome (numerically as well as statistically, in all imaginable respects), this solution would still be inefficient in general, because a linear combination of 'optimal' estimates is not necessarily an 'optimal' estimate of the linear combination, due to correlation effects (multi-step ahead forecasts are heavily correlated).
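A minimal sketch of such per-horizon forecast rules, assuming (hypothetically) a linear predictor on the last $p$ observations for each horizon $h$:

```python
import numpy as np

def direct_forecasts(x, H, p=4):
    """Fit a separate linear 'forecast rule' per horizon h = 1..H:
    regress x_{t+h} on (1, x_t, ..., x_{t-p+1}) and predict from the
    last p observations. No single model underlies these rules."""
    T = len(x)
    # Row t holds (x_t, x_{t-1}, ..., x_{t-p+1}), for t = p-1 .. T-1
    lags = np.column_stack([x[p-1-j:T-j] for j in range(p)])
    preds = []
    for h in range(1, H + 1):
        Z = np.column_stack([np.ones(T - p + 1 - h), lags[:-h]])
        beta = np.linalg.lstsq(Z, x[p-1+h:], rcond=None)[0]
        preds.append(beta[0] + lags[-1] @ beta[1:])
    return np.array(preds)
```

Each horizon gets its own regression; nothing guarantees that the $H$ rules are mutually consistent with a single DGP.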
- Second alternative: estimate one (single) model of the DGP (data generating process), but with respect to all forecasts simultaneously or, more precisely, with respect to all weighted forecasts in (2) (the weights correspond to the coefficients $\gamma_k$ of the target filter); a sketch follows after this list.
- See section 3 in Optimal Real-Time Filters for Linear Prediction Problems: this kind of model-fitting is called 'Model Fitting via LPP MSE Minimization' (Linear Prediction Problem)
- It's not my favorite approach because the model puts unnecessarily severe 'structural constraints' on the resulting concurrent filter
- But it's a better approach than the classic approach and the first alternative above (at least under misspecification, which will invariably be the case).
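A minimal sketch of this LPP-style model fitting, under strong simplifying assumptions: a zero-mean AR(1), and a truncated symmetric target output used as an in-sample proxy for the (unobservable) target in the sample interior. The cited paper defines the LPP criterion more generally; everything below is for illustration only.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def lpp_fit_ar1(x, gamma, K):
    """Pick the AR(1) coefficient minimizing the in-sample MSE of the
    concurrent plug-in estimate against a truncated symmetric target
    proxy, instead of the one-step ahead forecast MSE."""
    T = len(x)

    def target(t):  # truncated symmetric filter output (interior t only)
        return sum(gamma[K + k] * x[t - k] for k in range(-K, K + 1))

    def concurrent(t, a1):  # plug-in estimate using data up to time t
        past = sum(gamma[K + k] * x[t - k] for k in range(0, K + 1))
        future = sum(gamma[K + k] * a1**(-k) * x[t] for k in range(-K, 0))
        return past + future

    interior = range(K, T - K)
    def lpp_mse(a1):
        return np.mean([(target(t) - concurrent(t, a1))**2 for t in interior])

    return minimize_scalar(lpp_mse, bounds=(-0.99, 0.99), method="bounded").x
```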
- Third alternative: estimate $y_T$ at once, in a single (MSE-) stroke, without a latent 'model' of the DGP.
- Now we are not talking of models anymore but of filters
- Unfortunately, the mean-square filter error $\overline{(y_t-\hat{y}_t)^2}$ is not observable in general (in contrast to the mean-square one-step ahead forecast error).
- Therefore we need a good (optimal) estimate of $\overline{(y_t-\hat{y}_t)^2}$, see (3).
- Minimization of this estimate results in a real-time (concurrent) filter whose output is closest possible (in MSE) to the target (up to a smallest possible approximation error); a frequency-domain sketch follows below.
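A minimal sketch of this direct approach, with a frequency-domain estimate of the filter MSE in the spirit of the Direct Filter Approach: the (unobservable) filter error is estimated by weighting the squared transfer-function mismatch with the periodogram, and the criterion is minimized over the coefficients of a one-sided filter. The filter length and the ideal lowpass target (cutoff $\pi/6$) are hypothetical choices.

```python
import numpy as np

def dfa_concurrent_filter(x, Gamma, L=20):
    """Estimate coefficients b_0..b_{L-1} of a one-sided (concurrent)
    filter by minimizing a frequency-domain estimate of the filter MSE:
    sum_k |Gamma(w_k) - B(w_k)|^2 I(w_k), with I the periodogram."""
    T = len(x)
    w = 2 * np.pi * np.arange(T // 2 + 1) / T          # Fourier frequencies
    I = np.abs(np.fft.rfft(x))**2 / (2 * np.pi * T)    # periodogram
    # Weighted complex least squares: B(w) = sum_j b_j exp(-i j w)
    E = np.exp(-1j * np.outer(w, np.arange(L))) * np.sqrt(I)[:, None]
    rhs = Gamma(w) * np.sqrt(I)
    A = np.vstack([E.real, E.imag])                    # stack real/imag parts
    b, *_ = np.linalg.lstsq(A, np.concatenate([rhs.real, rhs.imag]), rcond=None)
    return b

# Hypothetical ideal lowpass target with cutoff pi/6:
# b = dfa_concurrent_filter(x, lambda w: (w <= np.pi / 6).astype(float))
# y_hat_T = b @ x[::-1][:len(b)]  # concurrent output at the sample end
```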
Conclusion
One can rely on any plausible approach for solving the estimation problem in slide 1 above. The classic (time-domain, model-based) approach suffers from the 'broken-link' problem addressed above: the invariance property of maximum likelihood estimates does not carry over to the derived concurrent filter. As a result, users of the approach can no longer invoke (MSE-) optimality. As an alternative, one might consider the (MSE-) optimality concepts discussed in
(3) (see also
Optimal Real-Time Filters for Linear Prediction Problems). Going beyond MSE will be the topic of follow-up blog entries.