Nowcasting macro-financial indicators requires combining low-frequency and high-frequency time series. Mixed data sampling (MIDAS) regressions explain a low-frequency variable based on high-frequency variables and their lags. For instance, the dependent variable could be quarterly GDP and the explanatory variables could be monthly activity or daily market data. The most common MIDAS specifications rely on distributed lags of higher-frequency regressors to avoid parameter proliferation. Analogously, reverse MIDAS models predict a high-frequency dependent variable based on low-frequency explanatory variables. Compared to state-space models, MIDAS simplifies model specification and the imposition of theory-based restrictions for nowcasting. The R package ‘midasr’ estimates models for multiple frequencies and weighting schemes. In practice, MIDAS has been used for nowcasting financial market volatility, GDP growth, inflation trends and fiscal trends.

The sources of the post are summarized at the bottom. The text below consists of condensed annotated quotes. Italic text and text in brackets have been added for clarity.

The post ties in with this site’s summary on quantitative methods for macro information efficiency.

What are MIDAS regressions?

“Mixed data sampling (MIDAS) regressions are now commonly used to deal with time series data sampled at different frequencies.” [Handbook of Statistics]

“Data are not all sampled at the same frequency. Most macroeconomic data are sampled monthly (e.g., employment) or quarterly (e.g., GDP). Most financial variables (e.g., interest rates and asset prices), on the other hand, are sampled daily or even more frequently. The challenge is how to best use available data.” [Armesto, Engemann and Owyang]

“Mixed-data sampling (MIDAS) regressions allow estimating dynamic equations that explain a low-frequency variable by high-frequency variables and their lags. When the difference in sampling frequencies between the regressand and the regressors is large, distributed lag functions are typically employed to model dynamics avoiding parameter proliferation.” [Foroni, Marcellino and Schumacher]

“A MIDAS regression model allows us to ‘explain’ a time-series variable that’s measured at some frequency, as a function of current and lagged values of a variable that is measured at a higher frequency. So, for instance, we can have a dependent variable that is quarterly, and a regressor that is measured at a monthly, or daily, frequency…[Hence] a MIDAS regression model is a very general type of autoregressive-distributed lag model, in which high-frequency data are used to help in the prediction of a low-frequency variable…Typically, some ‘extra’ values [of] the high-frequency variable(s) will be available after the most recent sample value of the low-frequency dependent variable has been observed…These ‘extra’ observations can be used for…nowcasting.” [Giles]

“Nowcasting…refers to the prediction of the present, the very near future, and the very recent past based on information provided by available data that are sampled at higher frequencies.” [Rufino]

“On the one hand, variables that are available at high frequency contain potentially valuable information. On the other hand, the researcher cannot use this high-frequency information directly if some of the variables are available at a lower frequency, because most time series regressions involve data sampled at the same interval…MIxed Data Sampling – or MIDAS – regressions represent a simple, parsimonious, and flexible class of time series models that allow the left-hand and right-hand side variables of time series regressions to be sampled at different frequencies.” [Ghysels, Sinko and Valkanov]

“MIDAS regressions are essentially tightly parameterized, reduced form regressions that involve processes sampled at different frequencies…Technically speaking MIDAS models specify conditional expectations as a distributed lag of regressors recorded at some higher sampling frequencies…MIDAS involve regressors with different sampling frequencies and are therefore not autoregressive models, since the notion of autoregression implicitly assumes that data are sampled at the same frequency in the past.” [Ghysels, Santa-Clara and Valkanov]
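The "distributed lag of regressors" can be written out for a single regressor as follows. This is a sketch in commonly used notation, not a formula taken from the quoted papers: the symbols are assumptions chosen for illustration, with t indexing low-frequency periods, m the number of high-frequency observations per low-frequency period, K the number of high-frequency lags, and w_k(θ) a weighting function governed by a small parameter vector θ.

```latex
y_t = \beta_0 + \beta_1 \sum_{k=0}^{K-1} w_k(\theta)\, x^{(m)}_{t - k/m} + \varepsilon_t,
\qquad
\sum_{k=0}^{K-1} w_k(\theta) = 1, \quad w_k(\theta) \ge 0 .
```

Here x^{(m)}_{t-k/m} denotes the high-frequency regressor observed k high-frequency periods before the end of low-frequency period t. However many lags K are included, only β_0, β_1 and θ are estimated, which is what keeps the regression tightly parameterized.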

“The focus in the literature has mostly been on improving the forecast of low-frequency variables by means of high-frequency information. In particular, different models have been introduced for dealing with the different sampling frequencies at which macroeconomic and financial indicators are available…Recently, new models have been proposed for forecasting high-frequency variables by means of low-frequency variables…Reverse Unrestricted MIDAS (RU-MIDAS) and Reverse MIDAS (R-MIDAS) model [link] high-frequency dependent variable with low-frequency explanatory variables in univariate context.” [Foroni, Ravazzolo and Rossini]

Basic technical intuition

“When data of different sampling frequencies are mixed, one invariably deals with temporal aggregation…In empirical work, a direct treatment of mixed data samples is typically circumvented by first aggregating the highest frequency data in order to reduce all data to the same frequency. Then, in a second step, a standard regression model is estimated with pre-filtered data…The mixed data sampling regression exploits a much larger information set and is more flexible. The cost is parameter proliferation, as a suitable polynomial might involve many lags…We want to preserve most of the information in the MIDAS regression, while decreasing the number of parameters to estimate…Our approach has its roots in an old literature on distributed lag models…MIDAS regressions are more efficient than the common practice of first aggregating the highest frequency data in order to reduce all data to the same frequency.” [Ghysels, Santa-Clara and Valkanov]

“The MIDAS approach allows for non-equal weights (multipliers) for the components that are parsimoniously reparametrized through a weighing scheme anchored on the innovative use of lag polynomials. The way lag polynomials are employed in defining the weighing scheme for the multiplier represents a specific MIDAS regression model.” [Rufino]

“The time-averaging model is parsimonious but discards any information about the timing of innovations to higher-frequency data…[A] survey [of] some common methods for dealing with mixed-frequency data…shows that, in some cases, simply averaging the higher-frequency data produces no discernible disadvantage. In other cases, however, explicitly modeling the flow of data (e.g., using mixed data sampling) may be more beneficial…especially if the forecaster is interested in constructing intra-period forecasts…In principle, one could use any (normalized) weighting function…While this may be tractable when mixing quarterly and monthly observations, other sampling frequencies may be problematic…Mixed data sampling (MIDAS)…employs (exogenously chosen) distributed lag polynomials as weighting functions.” [Armesto, Engemann and Owyang]
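A small numerical illustration of the point about timing: two quarters with the same monthly average are indistinguishable after time-averaging but not after unequal weighting. The weights and data below are made up for illustration and do not come from any of the quoted papers.

```python
def weighted_aggregate(values_latest_first, weights):
    """Weighted sum of high-frequency values, most recent observation first."""
    return sum(w * v for w, v in zip(weights, values_latest_first))


# Hypothetical weighting schemes for three monthly observations in a quarter.
equal_weights = [1 / 3, 1 / 3, 1 / 3]   # plain time-averaging
declining_weights = [0.6, 0.3, 0.1]     # more weight on the latest month

# Two hypothetical quarters with the same total but different timing.
# Values are ordered latest month first.
early_shock = [0.0, 0.0, 3.0]   # shock in the first month of the quarter
late_shock = [3.0, 0.0, 0.0]    # shock in the last month of the quarter

avg_early = weighted_aggregate(early_shock, equal_weights)
avg_late = weighted_aggregate(late_shock, equal_weights)
midas_early = weighted_aggregate(early_shock, declining_weights)
midas_late = weighted_aggregate(late_shock, declining_weights)

# Time-averaging gives the same regressor value for both quarters,
# while the declining weights separate them (about 0.3 versus about 1.8).
print(avg_early, avg_late)
print(midas_early, midas_late)
```

Whether the extra timing information helps in practice is an empirical matter, as the survey quoted above notes.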

“The advantages of MIDAS, in addition to overcoming the problem of data with mixed frequency, is to minimize the number of estimated parameters and make the regression model simpler. A weighting function is used to reduce the number of parameters in the MIDAS regression. The weighting function can have a number of functional forms [such as] the exponential Almon function and the Beta function.” [Tri Utari and Ilma]
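The two weighting functions named above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation of any particular package: the function names, the two-parameter forms, the lag grid and the small offset `eps` are choices made here, and parameterizations differ across papers and software.

```python
import math


def exp_almon_weights(theta1, theta2, n_lags):
    """Normalized exponential Almon weights, proportional to exp(theta1*k + theta2*k^2)."""
    raw = [math.exp(theta1 * k + theta2 * k * k) for k in range(1, n_lags + 1)]
    total = sum(raw)
    return [r / total for r in raw]


def beta_weights(a, b, n_lags, eps=1e-6):
    """Normalized Beta weights, proportional to x^(a-1) * (1-x)^(b-1) on a grid in (0, 1).

    Requires n_lags >= 2. The offset eps keeps the grid strictly inside (0, 1)
    so that the expression stays finite when a < 1 or b < 1.
    """
    grid = [eps + (1 - 2 * eps) * k / (n_lags - 1) for k in range(n_lags)]
    raw = [x ** (a - 1) * (1 - x) ** (b - 1) for x in grid]
    total = sum(raw)
    return [r / total for r in raw]


# Two parameters describe the whole lag profile, however many lags are used.
almon = exp_almon_weights(0.0, -0.05, 6)   # declining weights
beta = beta_weights(1.0, 5.0, 6)           # declining weights
flat = exp_almon_weights(0.0, 0.0, 4)      # equal weights, i.e. simple averaging

print([round(w, 3) for w in almon])
print([round(w, 3) for w in beta])
print(flat)
```

In both functions the first weight applies to the most recent high-frequency observation. Setting both Almon parameters to zero (or both Beta parameters to one) reproduces equal weights, so simple time-averaging is a special case.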

“In macroeconomic applications…differences in sampling frequencies are often small. In such a case, it might not be necessary to employ distributed lag functions [leading to] unrestricted lag polynomials in MIDAS regressions.” [Foroni, Marcellino and Schumacher]
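With a quarterly target and a monthly regressor, the unrestricted case amounts to giving each month of the quarter its own coefficient and estimating by ordinary least squares. The sketch below uses noise-free synthetic data and made-up coefficients, so it only demonstrates the mechanics. The helpers `solve_linear` and `ols` are illustrative names written with the standard library, not functions of any MIDAS package.

```python
import random


def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve_linear(xtx, xty)


# Synthetic example (assumption): 40 quarters, 3 months per quarter.
rng = random.Random(0)
m, n_quarters = 3, 40
x_monthly = [rng.gauss(0.0, 1.0) for _ in range(m * n_quarters)]

# Made-up coefficients: intercept, then latest, middle and earliest month.
true_beta = [0.5, 0.6, 0.3, 0.1]

X, y = [], []
for q in range(n_quarters):
    months = x_monthly[q * m:(q + 1) * m]
    row = [1.0] + months[::-1]          # constant, then latest month first
    X.append(row)
    y.append(sum(b * v for b, v in zip(true_beta, row)))  # noise-free target

beta_hat = ols(X, y)
print([round(b, 6) for b in beta_hat])
```

With three monthly lags the unrestricted model has only four parameters. With daily data per quarter the same approach would need dozens, which is where weighting functions become useful.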

MIDAS regression versus Kalman filter

“The theory of the Kalman filter applies, strictly speaking, to linear homoskedastic Gaussian systems and yields an optimal [linear projection] in population…However, there are two important limitations to this result. First, it applies only in population, ignoring parameter estimation error. Second, it of course assumes that the state space model is correctly specified – state space model predictions can be suboptimal if the regression dynamics are mis-specified…State space models can be quite involved, as one must explicitly specify a linear dynamic model for all the series involved: low-frequency data series, latent low-frequency series treated as missing and the high-frequency observed processes. The system of equations therefore typically requires a lot of parameters, for the measurement equation, the state dynamics and their error processes.” [Bai, Ghysels and Wright]

“MIDAS regression can also be viewed as a reduced-form representation of the linear projection which emerges from a state space model approach – by reduced form we mean that the MIDAS regression does not require the specification of a full state space system of equations…The Kalman filter…has several disadvantages: (1) it is more prone to specification errors as a full system of measurement and state equations is required and as a consequence (2) requires a lot more parameters, which in turn results in (3) computational complexities which often limit the scope of applications.” [Ghysels, Kvedaras and Zemlys]

“Kalman filter state space models…involve a system of equations, whereas in contrast MIDAS regressions involve a (reduced form) single equation. As a consequence, MIDAS regressions might be less efficient, but also less prone to specification errors…Forecasts from MIDAS regressions are generally quite similar to those from the Kalman filter. Kalman filter forecasts are typically a little better, but MIDAS regressions can be more accurate if the state-space model is mis-specified or over-parameterized.” [Bai, Ghysels and Wright]

R package midasr

“The R package midasr…enables estimating regression models with variables sampled at different frequencies within a MIDAS regression framework…We define a general autoregressive MIDAS regression model with multiple variables of different frequencies and show how it can be specified using the familiar R formula interface…[The key feature of] the package is its flexibility in terms of the model formulation and estimation, which allows for:

  • estimation of regression models with their parameters defined (restricted) by certain functional constraints…
  • estimation of MIDAS models with many variables and (numerous) different frequencies;
  • various mixtures of restrictions/weighting schemes and also lag orders…
  • statistical testing for the adequacy of the model specification…
  • information criteria and testing-based selection of models;
  • forecasting and nowcasting functionality, including various forecast combinations.” [Ghysels, Kvedaras and Zemlys]

“From a data handling point of view, the key specificity of the MIDAS regression model is that the length of observations of variables observed at various frequencies differs and needs to be aligned…[A special package function] performs exactly the transformation…converting an observation vector of a given (potentially) higher-frequency series into the corresponding stacked matrix of observations of low-frequency series.” [Ghysels, Kvedaras and Zemlys]
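The alignment step can be sketched in a few lines. The function below is a hypothetical Python helper written for illustration, not the package's own function: it assumes the high-frequency series covers complete low-frequency periods, and it marks periods without enough history as `None`.

```python
def stack_high_frequency(x_high, m, n_lags):
    """Stack a high-frequency series into one row per low-frequency period.

    x_high : list of high-frequency observations, oldest first
    m      : number of high-frequency observations per low-frequency period
    n_lags : number of high-frequency lags per row (may exceed m)

    Each row holds the n_lags most recent high-frequency values available at
    the end of the low-frequency period, latest first. Periods that lack
    enough history are returned as None.
    """
    if len(x_high) % m != 0:
        raise ValueError("series length must be a multiple of m")
    n_low = len(x_high) // m
    rows = []
    for t in range(n_low):
        end = (t + 1) * m               # one past the last observation of period t
        if end < n_lags:
            rows.append(None)           # not enough history for this period
        else:
            rows.append([x_high[end - 1 - k] for k in range(n_lags)])
    return rows


# Twelve monthly observations, three per quarter, four lags per row.
monthly = list(range(1, 13))
stacked = stack_high_frequency(monthly, m=3, n_lags=4)
for row in stacked:
    print(row)
```

Each non-empty row can then serve as the regressor vector for the matching low-frequency observation, either directly (unrestricted lags) or after applying a weighting function.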

Applications of MIDAS regressions

“The interest in MIDAS regressions addresses a situation often encountered in practice where the relevant information is high frequency data, whereas the variable of interest is sampled at a lower frequency…For example, some macroeconomic data are sampled monthly, like price series and monetary aggregates, whereas other series are sampled quarterly or annually, typically real activity series like GDP and its components…[Another] example pertains to models of stock market volatility. The low frequency variable is for instance the quadratic variation or other volatility process over some long future horizon corresponding to the time to maturity of an option, whereas the high frequency data set is past market information potentially at the tick-by-tick level.” [Ghysels, Santa-Clara and Valkanov]

“As a key indicator of real economic activity, GDP is published at quarterly frequency and with a considerable delay. Due to this limited data availability, typically more timely high-frequency business cycle indicators such as industrial production or surveys about business expectations might help monitoring the current state of the economy as well as for forecasting…We derive unrestricted MIDAS regressions from linear high-frequency models…and show that their parameters can be estimated by OLS…In an empirical application on out-of-sample nowcasting GDP in the US and the Euro area using monthly predictors, we find a good performance of unrestricted MIDAS for a number of indicators.” [Foroni, Marcellino and Schumacher]

“MIDAS (Mixed Data Sampling) regression [can] solve the mixed frequency problem in implementing the nowcasting of the country’s economic growth…using quarterly Real GDP data and monthly data on inflation, industrial production…Results indicate the relative superiority of the MIDAS framework in accurately predicting the growth trajectory of the economy using information from high-frequency economic indicators.” [Rufino]

“We apply the MIDAS regression model to forecast the growth of the Indonesian GDP using the value of Indonesian agricultural exports…Exports will directly increase a country’s income. It is expected that an increase in…income will also result in an increase of its GDP. We use the Mixed Data Sampling (MIDAS) regression model to deal with a period or frequency difference issues of GDP and export variables.” [Tri Utari and Ilma]

“Online price index is tested as a predictor of the monthly core inflation in Argentina…there is a slight improvement compared to the low-frequency benchmark autoregression and the unconditional mean.” [Libonatti]

“We analyse the importance of low frequency hard and soft macroeconomic information, respectively the industrial production index and the manufacturing Purchasing Managers’ Index surveys, for forecasting high-frequency daily electricity prices…We do that by means of mixed-frequency models, introducing a Bayesian approach to reverse unrestricted MIDAS models (RU-MIDAS)…Results indicate that the macroeconomic low frequency variables are more important for short horizons than for longer horizons.” [Foroni, Ravazzolo and Rossini]

“We employ a Mixed Data Sampling (MiDaS) approach to analyze mixed frequency fiscal data…We use quarterly fiscal data to forecast a very disaggregated set of fiscal series at annual frequency…Once data for the third quarter is incorporated, the annual forecast becomes very accurate (very close to actual data). We also benchmark against the European Commission’s forecast and find the results fare favorably, particularly when considering that they stem from a simple univariate framework.” [Asimakopoulos, Paredes and Warmedinger]

Linked sources

Armesto, Michelle, Kristie Engemann, and Michael Owyang (2010), “Forecasting with Mixed Frequencies.”

Asimakopoulos, Stylianos, Joan Paredes and Thomas Warmedinger (2013), “Forecasting Fiscal Time Series using Mixed Frequency Data.”

Bai, Jennie, Eric Ghysels and Jonathan Wright (2010), “State Space Models and MIDAS Regressions.”

Giles, Dave (2016), “MIDAS Regression is Now in EViews.”

Foroni, Claudia, Massimiliano Marcellino and Christian Schumacher (2012), “U-MIDAS: MIDAS regressions with unrestricted lag polynomials.”

Foroni, Claudia, Francesco Ravazzolo and Luca Rossini (2020), “Are low frequency macroeconomic variables important for high frequency electricity prices?”

Ghysels, Eric, Virmantas Kvedaras and Vaidotas Zemlys (2016), “Mixed Frequency Data Sampling Regression Models: The R Package midasr.”

Ghysels, Eric, Pedro Santa-Clara and Rossen Valkanov (2004), “The MIDAS Touch: Mixed Data Sampling Regression Models.”

Ghysels, Eric, Arthur Sinko and Rossen Valkanov (2006), “MIDAS Regressions: Further Results and New Directions.”

Handbook of Statistics, Chapter 4 – Mixed data sampling (MIDAS) regression models.

Libonatti, Luis (2018), “MIDAS Modeling for Core Inflation Forecasting.”

Rufino, Cesar (2019), “Nowcasting Philippine Economic Growth Using MIDAS Regression.”

Tri Utari, Dina and Hafizah Ilma (2018), “Comparison of methods for mixed data sampling (MIDAS) regression models to forecast Indonesian GDP using agricultural exports.”

Ralph Sueppel is founder and director of SRSV, a project dedicated to socially responsible macro trading strategies. He has worked in economics and finance for over 25 years for investment banks, the European Central Bank and leading hedge funds. At present, he is head of research and quantitative strategies at Macrosynergy Partners.