Statistical learning refers to a set of tools for modelling and understanding complex datasets. Understanding statistical learning is critical in modern financial markets, even for non-quants (view post here). This is because statistical learning provides guidance on how the experiences of investors in markets shape their future behaviour. Statistical learning uses such datasets to forecast returns or to estimate the impact of specific events. Methods range from simple regression to complex machine learning. Simple methods can deliver superior returns if they avoid “overfitting”, i.e. gearing models too closely to recent experiences. Success must be measured by “out-of-sample” predictive power after a model has been selected and estimated.
Machine learning is based on statistical learning methods but partly automates the construction of forecast models through the study of data patterns, the selection of the best functional form for a given level of complexity, and the selection of the best level of complexity for out-of-sample forecasting. Machine learning can add efficiency to classical macro trading rules, mainly because it is flexible, adaptable, and generalizes knowledge well (view post here). Beyond speed and convenience, machine learning methods are highly useful for macro trading research because they enable backtests that are based on methods rather than on specific factors. Backtests of specific factors are mostly invalid because the factor choice is typically shaped by historical experiences.
A standard statistical learning workflow for macro trading involves model training, model validation and method testing. In particular, it (1) selects the form and parameters of trading models, (2) chooses the best of a set of models, based on past out-of-sample performance, and (3) assesses the value of the deployed learning method based on further out-of-sample results. A convenient technology is the ‘list-column workflow’ based on the tidyverse packages in R (view post here).
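The three steps of this workflow can be illustrated with a minimal Python sketch. It is not the list-column workflow in R referred to above. It assumes simulated data, ridge regression without an intercept as the candidate model family, and a single chronological split into training, validation and test samples. All function names and parameter values are hypothetical.

```python
import numpy as np

def fit_ridge(X, y, penalty):
    """Closed-form ridge regression without intercept; penalty=0 reduces to OLS."""
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + penalty * np.eye(k), X.T @ y)

def mse(X, y, beta):
    """Mean squared prediction error."""
    return float(np.mean((y - X @ beta) ** 2))

def learning_workflow(X, y, penalties, train_end, valid_end):
    """Train each candidate, pick the best on validation data, score it on test data."""
    X_tr, y_tr = X[:train_end], y[:train_end]
    X_va, y_va = X[train_end:valid_end], y[train_end:valid_end]
    X_te, y_te = X[valid_end:], y[valid_end:]

    # Step 1 (training): estimate the parameters of every candidate model.
    betas = {p: fit_ridge(X_tr, y_tr, p) for p in penalties}
    # Step 2 (validation): choose the candidate with the lowest out-of-sample error.
    valid_err = {p: mse(X_va, y_va, b) for p, b in betas.items()}
    best = min(valid_err, key=valid_err.get)
    # Step 3 (testing): judge the whole method on data not used in steps 1 and 2.
    test_err = mse(X_te, y_te, betas[best])
    return best, valid_err, test_err

# Simulated example (assumption): ten candidate features, only the first one matters.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
y = 0.5 * X[:, 0] + rng.normal(size=300)
best, valid_err, test_err = learning_workflow(X, y, [0.0, 1.0, 10.0, 100.0], 150, 225)
print("chosen penalty:", best, "test MSE:", round(test_err, 3))
```

The test error is the only figure in this sketch that was not used for any choice, which is why it serves as the measure of the method rather than of a specific model.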
Machine learning and expert domain knowledge are not rivals but complements. Domain expertise is critical for the quality of featurization, the choice of hyperparameters, the selection of training and test samples, and the choice of regularization strategy.
Machine learning is conventionally divided into three main fields: supervised learning, unsupervised learning, and reinforcement learning.
- In supervised learning the researcher posits input variables and output variables and uses an algorithm to learn which function maps the former to the latter. This principle underlies the majority of statistical learning applications in financial markets. A classic example is the assessment of what the change in interest rate differential between two countries means for the dynamics of their exchange rate.
Supervised learning can be divided into regression, where the output variable is a real number, and classification, where the output variable is a category, such as “policy easing” or “policy tightening” for central bank decisions. An important subsection of supervised machine learning is ensemble methods, i.e. machine learning techniques that combine several base models in order to produce one optimal prediction. Ensemble methods include bagging, random forest and gradient boosting. They have been shown to produce superior predictive power for credit spread forecasts (view post here) and, in the case of random forest, for equity reward-risk timing (view post here).
Also, artificial neural networks have become increasingly practical for (supervised and unsupervised) macro trading research. This is a popular machine learning method that consists of layers of data-processing units, connections between them and the application of weights and biases that are estimated based on training data. For example, neural networks can in principle be used to estimate the state of the market at a daily or higher frequency based on an appropriate feature space, i.e. data series that characterize the key features of a market (view post here). Beyond that, neural networks can be used to detect lagged correlation between different asset prices (view post here) or market price distortions (view post here).
- Unsupervised learning only knows input data. Its goal is to model the underlying structure or distribution of the data in order to learn previously unknown patterns. Applications of unsupervised machine learning techniques include clustering (partitioning the data set according to similarity), anomaly detection, association mining and dimension reduction (see below).
- Reinforcement learning is a specialized application of (deep) machine learning that interacts with the environment and seeks to improve on the way it performs a task so as to maximize its reward (view post here). The computer employs trial and error. The model designer defines the reward but gives no clues as to how to solve the problem. Reinforcement learning holds potential for trading systems because markets are highly complex and quickly changing dynamic systems. Conventional forecasting models have been notoriously inadequate. A self-adaptive approach that can learn quickly from the outcome of actions may be more suitable. Reinforcement learning can benefit trading strategies directly, by supporting trading rules, and indirectly by supporting the estimation of trading-related indicators, such as real-time growth (view post here).
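The trial-and-error principle of reinforcement learning can be illustrated with a deliberately simple sketch: an epsilon-greedy agent in a multi-armed bandit setting. This is a stateless special case, not a deep reinforcement learning system or a trading model. The three "actions", their mean rewards and all names are hypothetical. The designer supplies only the reward function, and the agent learns which action pays best.

```python
import random

def epsilon_greedy(reward_fn, n_actions, n_steps, epsilon=0.1, seed=0):
    """Learn action values by trial and error; returns value estimates and pull counts."""
    rng = random.Random(seed)
    counts = [0] * n_actions
    values = [0.0] * n_actions          # running average reward per action
    for _ in range(n_steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)                        # explore
        else:
            action = max(range(n_actions), key=lambda a: values[a])  # exploit
        reward = reward_fn(action, rng)
        counts[action] += 1
        # Incremental update of the average reward of the chosen action.
        values[action] += (reward - values[action]) / counts[action]
    return values, counts

def toy_reward(action, rng):
    """Hypothetical noisy payoffs; the agent never sees these means directly."""
    means = [-0.1, 0.0, 0.3]
    return rng.gauss(means[action], 1.0)

values, counts = epsilon_greedy(toy_reward, n_actions=3, n_steps=5000)
print("estimated values:", [round(v, 2) for v in values])
print("times each action was chosen:", counts)
```

The parameter `epsilon` governs the trade-off between trying new actions and repeating the one that has worked best so far.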
Linear regression remains the most popular tool for supervised learning in financial markets (apart from informal chart and correlation analysis). It can be the appropriate model if it relates market returns to previously available information in a theoretically plausible functional form. For example, mixed data sampling (MIDAS) regressions are a particularly useful method for nowcasting economic trends and financial market variables, such as volatility (view post here). These regressions allow combining time series of different frequencies and limit the number of parameters that need to be estimated.
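A minimal sketch of how a MIDAS-type regression keeps the parameter count low is given below. It assumes exponential Almon lag weights, a common choice in this literature, which describe the whole lag profile with two parameters regardless of the number of lags. For simplicity the weight parameters are fixed rather than estimated by nonlinear least squares, as a full MIDAS estimation would do. Data, frequencies and names are hypothetical.

```python
import numpy as np

def exp_almon_weights(n_lags, theta1, theta2):
    """Exponential Almon lag weights: positive and summing to one."""
    k = np.arange(1, n_lags + 1)
    raw = np.exp(theta1 * k + theta2 * k ** 2)
    return raw / raw.sum()

def midas_regressor(x_high, n_lags, period, theta1, theta2):
    """Collapse a high-frequency series into one weighted value per low-frequency period.

    x_high : 1-D array with `period` high-frequency observations per low-frequency period
    n_lags : number of most recent high-frequency observations used for each period
    """
    w = exp_almon_weights(n_lags, theta1, theta2)
    n_periods = len(x_high) // period
    out = []
    for t in range(1, n_periods + 1):
        end = t * period
        if end < n_lags:
            continue                               # not enough history yet
        window = x_high[end - n_lags:end][::-1]    # most recent observation first
        out.append(float(w @ window))
    return np.array(out)

# Simulated example (assumption): 60 "months" with 22 "daily" observations each.
rng = np.random.default_rng(1)
x_daily = rng.normal(size=22 * 60)
z = midas_regressor(x_daily, n_lags=22, period=22, theta1=0.0, theta2=-0.01)
y = 0.8 * z + 0.1 * rng.normal(size=len(z))        # hypothetical monthly target

# With the weights fixed, the low-frequency regression is ordinary least squares.
A = np.column_stack([np.ones(len(z)), z])
intercept, slope = np.linalg.lstsq(A, y, rcond=None)[0]
print("estimated slope:", round(float(slope), 2))
```

An unrestricted regression on 22 daily lags would need 22 slope coefficients. Here the lag profile is governed by `theta1` and `theta2` alone.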
Structural vector autoregression (SVAR) is a practical model class for empirical macroeconomics. It studies the evolution of a set of linearly related observable time series variables, such as economic data or asset prices, assuming that all variables depend in fixed proportion on past values of the set and new structural shocks. The method can also be employed for macro trading strategies (view post here) because it helps to identify specific market and macro shocks (view post here). For example, SVAR can identify short-term policy, growth or inflation expectation shocks. Once a shock is identified it can be used for trading in two ways. First, one can compare the type of shock implied by markets with the actual news flow and detect fundamental inconsistencies. Second, different types of shocks may entail different types of subsequent asset price dynamics and may form a basis for systematic strategies.
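The mechanics of shock identification can be sketched as follows. The example assumes a VAR with one lag estimated by least squares and a recursive (Cholesky) identification scheme, which is only one of several possible identification strategies and depends on the ordering of the variables. The two simulated series and all names are hypothetical.

```python
import numpy as np

def fit_var1(Y):
    """OLS estimate of Y_t = c + A Y_{t-1} + u_t; returns c, A and residuals u."""
    X = np.column_stack([np.ones(len(Y) - 1), Y[:-1]])
    coef, *_ = np.linalg.lstsq(X, Y[1:], rcond=None)
    c, A = coef[0], coef[1:].T
    resid = Y[1:] - X @ coef
    return c, A, resid

def recursive_shocks(resid):
    """Cholesky identification: u_t = B e_t with B lower triangular and Cov(e) = I."""
    sigma = np.cov(resid, rowvar=False)
    B = np.linalg.cholesky(sigma)
    shocks = np.linalg.solve(B, resid.T).T
    return B, shocks

# Simulated example (assumption): the first variable's shock hits both series
# on impact, the second variable's shock hits only the second series.
rng = np.random.default_rng(2)
A_true = np.array([[0.5, 0.1], [0.0, 0.3]])
B_true = np.array([[1.0, 0.0], [0.5, 1.0]])
Y = np.zeros((500, 2))
for t in range(1, 500):
    Y[t] = A_true @ Y[t - 1] + B_true @ rng.normal(size=2)

c_hat, A_hat, resid = fit_var1(Y)
B_hat, shocks = recursive_shocks(resid)
print("estimated impact matrix:\n", np.round(B_hat, 2))
```

The columns of `shocks` are uncorrelated by construction, so each can be read as a separate structural shock under the assumed ordering and compared with the news flow or used as a signal.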
One important area of statistical learning for investment research is dimension reduction. This refers to methods that condense the bulk of the information of a large set of macroeconomic time series into a smaller set that distills the relevant trends for investors. In macroeconomics there are many related data series that have only limited incremental relevant information value. There are three types of statistical dimension reduction methods.
- The first type selects a subset of “best” explanatory variables (Elastic Net or Lasso, view post here).
- The second type selects a small set of latent background factors of all explanatory variables and then uses these background factors for prediction. This is the key idea behind static and dynamic factor models. Factor models are one key technology behind nowcasting in financial markets, a modern approach to monitoring current economic conditions in real time (view post here). While nowcasting has mostly been used to predict forthcoming data reports, particularly GDP, the underlying factor models can produce a lot more useful information for the investment process, including latent trends, indications of significant changes in such trends, and estimates of the changing importance of various predictor data series (view post here).
- The third type generates a small set of functions of the original explanatory variables that historically would have retained their explanatory power and then deploys these for forecasting (Sufficient Dimension Reduction).
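The second type can be illustrated with a minimal static factor sketch. It assumes principal components of standardized series as estimates of the latent factors, which is a simple stand-in for the static and dynamic factor models mentioned above, not a full nowcasting model. The simulated panel and all names are hypothetical.

```python
import numpy as np

def principal_factors(X, n_factors):
    """Return factor series, loadings and explained-variance shares of the top factors."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)        # standardize each series
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    factors = U[:, :n_factors] * s[:n_factors]      # one column per latent factor
    loadings = Vt[:n_factors].T                     # one row per original series
    explained = (s ** 2) / np.sum(s ** 2)
    return factors, loadings, explained[:n_factors]

# Simulated example (assumption): 20 data series driven by 2 latent trends plus noise.
rng = np.random.default_rng(3)
latent = rng.normal(size=(200, 2))
true_loadings = rng.normal(size=(2, 20))
panel = latent @ true_loadings + 0.5 * rng.normal(size=(200, 20))

factors, loadings, explained = principal_factors(panel, n_factors=2)
print("share of variance explained by two factors:", round(float(explained.sum()), 2))
```

The two columns of `factors` can then replace the 20 original series as predictors in a regression.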
Dimension reduction methods not only help to condense information of predictors of trading strategies, but also support portfolio construction. In particular, they are suited for detecting latent factors of a broad set of asset prices. These factors can be used to improve estimates of the covariance structure of these prices and – by extension – to improve the construction of a well-diversified minimum variance portfolio (view post here).
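The link from latent factors to portfolio construction can be sketched as follows. The example assumes that the covariance matrix is approximated by the top principal components of the sample covariance plus a diagonal matrix of idiosyncratic variances, and that the minimum variance portfolio is fully invested with short positions allowed. Both are simplifying assumptions, and the simulated returns and all names are hypothetical.

```python
import numpy as np

def factor_covariance(returns, n_factors):
    """Covariance estimate = low-rank common factor part + diagonal idiosyncratic part."""
    sample = np.cov(returns, rowvar=False)
    eigval, eigvec = np.linalg.eigh(sample)
    top = np.argsort(eigval)[::-1][:n_factors]      # indices of the largest eigenvalues
    common = (eigvec[:, top] * eigval[top]) @ eigvec[:, top].T
    idio = np.diag(np.clip(np.diag(sample - common), 1e-12, None))
    return common + idio

def min_variance_weights(cov):
    """Fully invested minimum variance weights: w = inv(C) 1 / (1' inv(C) 1)."""
    ones = np.ones(cov.shape[0])
    raw = np.linalg.solve(cov, ones)
    return raw / raw.sum()

# Simulated example (assumption): 15 asset return series driven by 3 common factors.
rng = np.random.default_rng(4)
common_factors = rng.normal(size=(250, 3))
betas = rng.normal(size=(3, 15))
returns = 0.01 * (common_factors @ betas + rng.normal(size=(250, 15)))

cov = factor_covariance(returns, n_factors=3)
weights = min_variance_weights(cov)
equal = np.full(15, 1 / 15)
print("min-variance portfolio variance:", float(weights @ cov @ weights))
print("equal-weight portfolio variance:", float(equal @ cov @ equal))
```

Keeping only a few factors removes the noisy small eigenvalues of the sample covariance matrix, which otherwise tend to dominate the inverse that the weight formula relies on.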