Unlike market price trends, macroeconomic trends are hard to track in real-time. Conventional econometric models are immutable and not backtestable for algorithmic trading. That is because they are built with hindsight and do not aim to replicate perceived economic trends of the past (even if their parameters are sequentially updated). Fortunately, the rise of machine learning breathes new life into econometrics for trading. A practical approach is “two-stage supervised learning”. The first stage is scouting features, by applying an elastic net algorithm to available data sets during the regular release cycle, which identifies competitive features based on timelines and predictive power. Sequential scouting gives feature vintages. The second stage evaluates various candidate models based on the concurrent feature vintages and selects at any point in time one with the best historic predictive power. Sequential evaluation gives data vintages. Trends calculated based on these data vintages are valid backtestable contributors to trading signals.
The below is a (largely) non-technical summary based on the experience of SRSV Ltd and its advisors. It also owes much inspiration to the comments of Dr. Lasse de la Porte-Simonsen of Macrosynergy and Dr. Simon Ellersgaard-Nielsen of J.P. Morgan.
This post ties in with this site’s summary on Quantitative methods for macro information efficiency.
The idea in a nutshell
Macroeconomic trends often become discernible only with long time lags. That is because comprehensive official statistics are released with delays relative to the observed period. Also, many are subject to substantial revisions over time. The economic profession has proposed many econometric forecasting and nowcast models but their deployment in trading is often obstructed by lack of compatibility:
- First, standard econometric models enjoy many benefits of hindsight. The choice of explanatory variables and hyperparameters is based on historic experience. In some cases, even standard model parameters are revised retroactively to fit the historic sample.
- Second, the output of standard econometric models is not generally what trading models need. Econometric models may predict forthcoming production or inflation numbers, but markets may be more sensitive to underlying economic states, such as trends and – most importantly – perceptions thereof. Unfortunately, the former does not allow one to infer the latter. A predicted change is not the same as a change in predictions. Time series of changes in predictions require vintages (multiple historic time series) of predicted and actual data.
The good news is that – after some delay – official data allow reconstructing the state of the economy. Based on these data, relevant states of the economy can typically be condensed into single indicators, be it an official off-the-shelf indicator (such as GDP) or a bespoke composite (such as credit-to-GDP ratios). This single indicator can then be used as a target variable for machine learning to train prediction or nowcasting models within the standard framework of supervised learning. Thus, supervised learning is not a competitor of standard econometrics but an ally that gives it greater relevance for trading.
A two-stage supervised learning process can make this idea practical:
- Scouting features: The learning process chooses at each point in time features (explanatory variables) from a wide list of candidates, based on a standard model and estimated explanatory power at the time of release. This produces selection vintages that can then be fed into the learning pipelines. Pre-selecting features as a separate step is not just practical for containing computational complexity and time but also for costs: data feeds and services can be very expensive.
- Selecting the best model: Using the selected features, machine learning pipelines can be used to train and validate candidate models (hyperparameters). The best model can be chosen based on the union of historic validation samples, i.e. based on how well it had performed on unseen future samples up to that point compared to other candidates. This means that the choice of model is confirmed or revised over time always based on information available up to that time.
The models chosen over time, applied to the concurrently available data, then produce vintages of the target variable, which – in turn – can be used to extract macro trends and changes thereof. Macro trends or changes based on the machine learning vintages are valid backtestable parts of trading signals because they rely on minimal hindsight. Put simply, two-stage supervised learning vintages are more realistic simulations of the historic state of information and – therefore – help to assess how the principles and processes behind econometric prediction would have contributed to trading profits.
What are the economic states we can learn about?
The two-stage supervised learning methodology works for any economic state that is – eventually – observable in the data, but not available in real-time or even with short publication lags. Relevant examples include corporate earnings growth, production trends, external trade balances, or credit conditions.
The most popular example in financial market economics is GDP growth, the standard comprehensive metric for economic activity. It influences asset prices in many ways. For example, economic activity shapes the outlook for corporate earnings, prospects for monetary policy, the funding of the government budget, and a country’s attractiveness for capital flows. Alas, national accounts (which report the GDP and its main components) are typically released only at a quarterly frequency and between one and three months after the end of the observed quarter. Moreover, GDP growth rates convey information only with long lags. The average lag between the median day of the observed period of a quarterly GDP growth rate and its day of first release is about 140 days. To make matters worse, in many countries first prints of quarterly national accounts are rough estimates and can be revised significantly. And even after all revisions, individual quarterly growth rates deviate from trends as a consequence of calendar, weather, and residual seasonal effects. Put simply, it can take more than half a year before an underlying GDP trend becomes clear.
Two-stage supervised learning would first define an appropriate benchmark trend (target variable), such as a 4-quarter average of GDP growth, then scout features that might predict that trend before it is released, and then use these features and target variables to construct models sequentially and periodically. The models could include generalized least squares regressions, MIDAS regressions, or dynamic factor models based on principal components analysis or Kalman filters.
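As a minimal sketch with invented numbers, the benchmark trend for the GDP example could be computed in pandas as a 4-quarter moving average of quarterly growth rates:

```python
# Hypothetical quarterly GDP levels; the figures are invented for illustration.
import pandas as pd

gdp = pd.Series(
    [100.0, 101.2, 101.8, 102.9, 103.1, 104.0, 104.6, 105.9],
    index=pd.period_range("2020Q1", periods=8, freq="Q"),
)
growth = gdp.pct_change() * 100   # quarter-on-quarter growth in percent
target = growth.rolling(4).mean()  # 4-quarter average = benchmark trend
```

The rolling average smooths out calendar, weather, and residual seasonal effects in individual quarterly growth rates, at the cost of requiring four quarters of history before the first target value is available.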
Stage 1: Scouting features
In order to track economic trends, markets follow economic reports of higher frequency and shorter publication lags than the main benchmark series. Thus, in the case of GDP, markets look at a wide array of activity reports, such as industrial production, business surveys, labor market data, and transportation statistics. In practice, there are too many to follow and evaluate, particularly with the rise of alternative data. Even if one uses a formal model to condense the information, such as a MIDAS regression or dynamic factor model, it is not practical to feed all possible economic indicators into a model. While many nowcasting and forecasting models are suitable for dimension reduction (condensing the information of many data series into a few), this is not the same as feature selection.
Historically feature selection in financial economics has been mostly based on two principles, either of which has drawbacks for trading models:
- Models use features that worked well in the past: This is based on general knowledge of historic business cycles or specific empirical analyses. The approach surely implies a feature selection bias of estimates and can easily escalate into outright data mining.
- Models use features that are in the market economic release calendars of Bloomberg or Reuters: This is based on market convention. The approach avoids data mining but also suffers from selection bias, as the calendars add what markets found useful in past years. The approach also seriously limits the number of series that can be considered.
This is where supervised learning comes in. One can use the observable benchmark trend to select useful high-frequency indicators. This selection needs to consider the release sequence of data, however. Put simply, the learning process should only accept a feature that has been a competitive predictor at the time of its release, considering its own predictive power and that of data that had already been released.
The GDP example illustrates the point. For most countries, the first prints of national account reports (which include the official estimate of GDP) are published at a quarterly frequency in the first to third month after the end of a quarter. However, prior to the first release of quarterly GDP many other activity data for that quarter are being released. From a supervised learning perspective, these are feature candidates, i.e. potential predictors of quarterly GDP before it has been released.
Most feature candidates are monthly economic reports. They are released on different dates and with different publication lags to the observed period. For example, in the middle of the third month of a quarter, an industrial production report for the first month and a business survey for the second month may be published. This is what is called a “jagged edge”. This jagged edge must not only be considered for prediction models but also for feature selection. Whether a feature candidate should be used in a prediction model depends on both its individual predictive power and the timeliness of the release. Put simply, we should only consider features that add predictive value at the time of their release, i.e. competitive predictors.
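The jagged edge can be made concrete with a small sketch; the release schedule below is invented for illustration. At any as-of date, each monthly report is available only up to a different observation month:

```python
# Hypothetical release calendar: (feature, observed month, release date).
import datetime as dt

releases = [
    ("industrial_production", "2024-01", dt.date(2024, 3, 15)),
    ("industrial_production", "2024-02", dt.date(2024, 4, 15)),
    ("business_survey", "2024-02", dt.date(2024, 3, 1)),
    ("business_survey", "2024-03", dt.date(2024, 4, 1)),
]

def available(releases, asof):
    """Latest observed month of each feature released by the as-of date."""
    out = {}
    for name, month, rel in releases:
        if rel <= asof:
            out[name] = max(out.get(name, month), month)
    return out

# In mid-March, industrial production covers January while the
# business survey already covers February -- the "jagged edge".
snapshot = available(releases, dt.date(2024, 3, 20))
```

Any sequential prediction or selection exercise has to respect such a calendar: a feature only competes with the information that had actually been released before it.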
Supervised learning can select features in the following systematic way:
- First, one should map the release schedule of the feature candidates. This is a typical time sequence of published features between two target variable releases. The below graph exemplifies such a schedule for the example of U.S. economic reports between two first-print GDP releases. Technically, this means that one first retains only features with observation periods for which GDP has not yet been published. Then one must rank these features by release date.
- Second, one applies a feature selection algorithm to each point in the data cycle, including all “rival” features available up to that point. The selection is made by a constrained regression known as the Elastic Net with non-negative least squares estimation. This forces the coefficients of features with limited explanatory power or implausible coefficient signs to zero. Any feature candidate that has a non-zero coefficient at at least one point in the data cycle is selected. Thus, the competitiveness of features depends on a balance of timeliness and predictive power.
The chosen feature selection changes over time in accordance with the set re-estimation interval. One could also call the resulting feature sets feature vintages. The below is an illustration of the feature set that would have been selected in mid-2021 for U.S. GDP predictions.
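The selection rule can be sketched on synthetic data, using scikit-learn's ElasticNet with the `positive=True` constraint; the data-generating process and the penalty `alpha` below are arbitrary choices for illustration. Weak candidates and candidates with implausible (here: negative) coefficient signs are forced to zero:

```python
# Sketch of sign-constrained elastic net selection on synthetic data.
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
# true model: features 0 and 2 matter, feature 4 has the "wrong" sign
y = 1.0 * X[:, 0] + 0.6 * X[:, 2] - 0.8 * X[:, 4] + rng.normal(scale=0.2, size=200)

en = ElasticNet(alpha=0.05, l1_ratio=0.5, positive=True).fit(X, y)
selected = np.flatnonzero(en.coef_)  # indices of surviving feature candidates
```

Feature 4, despite genuine explanatory power, is dropped because its sign is implausible under the constraint; the irrelevant candidates are shrunk toward zero by the L1 penalty.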
Stage 2: Selecting the best model
In this stage, one feeds selected features and model versions into a learning pipeline. The purpose of this stage of learning is to simulate the selection of the preferred model specification based only on the information available at the respective point in time and according to an objective numerical criterion (such as the root mean squared error of forecasts). This is also called “hyperparameter tuning”. It can be based on a manual pre-selection of candidate models and/or a “grid” of hyperparameters. Beyond optimizing model specifications, a sequential application of this approach over the available time series prevents data leakage, reduces model selection bias, and thus produces a more realistic time series of estimated macro trends.
For model selection, the data series are split into training and validation sets in order to create strict out-of-sample forecasts for the various models. Splitting is done chronologically and sequentially. Chronologically means that the training set is always the early part, and the validation set is always the late part of the sample. Sequentially means that a sequence of such splits is created by expanding the training set and shifting the validation set into the future, simulating the evolving information status available to the learning process over time.
This way of splitting is standard for learning based on time series. For example, it is easily implemented by the scikit-learn class model_selection.TimeSeriesSplit(). In the context of macro trends, there are some uncommon characteristics of data splits:
- The purpose of machine learning on splits is not merely cross-validation, but a simulation of a history of model selection. This improves the suitability of the final estimates for backtesting.
- Wherever possible (and affordable), the training-validation splits should be based on data vintages, not a single time series. A vintage is a snapshot of a data series at a particular time. This is not the same as a split, due to the possibility of historic data revisions. Vintages may offer no advantage in choosing the best model for today. However, where revisions are significant, vintages are more suitable for replicating historic choices.
- The selection process does not require test sets, which can be created separately for research purposes.
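The chronological, sequential splits described above can be generated with scikit-learn's TimeSeriesSplit; the 20-period sample below is a stand-in for an actual data history:

```python
# Expanding training sets, validation sets shifting into the future.
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

periods = np.arange(20)  # stand-in for 20 time periods
splits = list(TimeSeriesSplit(n_splits=4).split(periods))

# (last training index, first validation index, last validation index)
# per split: training always strictly precedes validation
bounds = [(t.max(), v.min(), v.max()) for t, v in splits]
```

Each later split contains all earlier training data plus the previous validation window, which is exactly the "evolving information status" the learning process is meant to simulate.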
The data splits are the basis of evaluating candidate model specifications or hyperparameters. A hyperparameter is a model parameter that is not learned during the training process. To select hyperparameters for a specific point in time one first estimates candidate models for various hyperparameters based on the training set. This gives the regular parameters of the candidates, such as regression coefficients. Then the fully specified models are used to predict the target variable in the validation set. Performances across the union of the present and all past validation sets become the basis for choosing the best model of the time. For example, one may choose the model with the lowest root-mean-squared error across all validations.
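The evaluation step can be sketched on synthetic data, assuming a ridge regression with an invented hyperparameter grid: each candidate is scored by its error over the union of all validation sets, and the candidate with the lowest root-mean-squared error is chosen.

```python
# Hyperparameter choice by cumulative validation error (illustrative grid).
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
y = X @ np.array([0.7, -0.4, 0.2]) + rng.normal(scale=0.3, size=100)

candidates = {alpha: [] for alpha in (0.1, 1.0, 100.0)}  # hypothetical grid
for train, valid in TimeSeriesSplit(n_splits=5).split(X):
    for alpha, errors in candidates.items():
        model = Ridge(alpha=alpha).fit(X[train], y[train])
        # collect squared errors on the (unseen) validation window
        errors.extend((model.predict(X[valid]) - y[valid]) ** 2)

# RMSE across the union of validation sets, one number per candidate
rmse = {a: float(np.sqrt(np.mean(e))) for a, e in candidates.items()}
best_alpha = min(rmse, key=rmse.get)  # best model as of this point in time
```

Because the pool of validation errors grows with each re-estimation date, the chosen hyperparameter is confirmed or revised over time using only information available up to that time.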
After the best model has been selected for the point in time, its regular parameters are re-estimated based on the full (training and validation) sample. The specified model is then used to predict forthcoming unpublished value(s) of the target variable. In conjunction with already available data at the time, this results in data vintages. Macro trends can be calculated based on these vintages. Since the choice of hyperparameters is data-driven, chosen models change over time. This means that a time series of macro trends that is created in this way can be quite different from a time series of macro trends created in a conventional fashion. Rather than showing past macro trends from today’s perspective, the learning-based series produce perceived macro trends from concurrent perspectives.
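The final step can be sketched with invented numbers: the chosen model is re-estimated on the full available sample, and its prediction for the not-yet-published period is appended to the published history, yielding one data vintage of the target variable.

```python
# Refit the selected model and append a nowcast to form a data vintage.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 3))              # features, incl. the latest period
y = X[:-1] @ np.array([0.5, 0.3, -0.2])   # target published only up to t-1

model = Ridge(alpha=1.0).fit(X[:-1], y)   # refit on the full available sample
nowcast = model.predict(X[-1:])           # predict the unpublished value
vintage = np.append(y, nowcast)           # published history + nowcast
```

Repeating this at each point in time produces the sequence of vintages from which concurrent macro trends and their changes can be computed.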