### Statistical learning

Statistical learning refers to a set of tools for modelling and understanding complex datasets. __Understanding statistical learning is critical in modern financial markets, even for non-quants__ (view post here). That is because statistical learning provides guidance on how the experiences of investors in markets shape their future behaviour. Statistical learning works with complex datasets in order to forecast returns or to estimate the impact of specific events. Methods range from simple regression to complex machine learning. __Simplicity can deliver superior returns if it avoids “overfitting”__, i.e. gearing models to recent experiences. Success must be measured in “out-of-sample” predictive power after a model has been selected and estimated.

__Machine learning is based on statistical learning methods but partly automates the construction of forecast models__ through the study of data patterns, the selection of the best functional form for a given level of complexity and the selection of the best level of complexity for out-of-sample forecasting. Machine learning can add efficiency to classical macro trading rules, mainly because it is flexible, adaptable, and generalizes knowledge well (view post here). Beyond speed and convenience, machine learning methods are highly useful for macro trading research because they enable backtests that are based on methods rather than on specific factors. Backtests of specific factors are mostly invalid because the factor choice is typically shaped by historical experiences.

Machine learning and expert domain knowledge are not rivals but complements. Domain expertise is critical for the quality of featurization, the choice of hyperparameters, the selection of training and test samples, and the choice of regularization strategy.

Machine learning is conventionally divided into three main fields: supervised learning, unsupervised learning, and reinforcement learning.

- In **supervised learning** __the researcher posits input variables and output variables and uses an algorithm to learn which function maps the former to the latter__. This principle underlies the majority of statistical learning applications in financial markets. A classic example is the assessment of what the change in the interest rate differential between two countries means for the dynamics of their exchange rate.

Supervised learning can be divided into regression, where the output variable is a real number, and classification, where the output variable is a category, such as “policy easing” or “policy tightening” for central bank decisions. An important subset of supervised machine learning is ensemble methods, i.e. techniques that combine several base models in order to produce one optimal prediction. Ensemble methods include bagging, random forests and gradient boosting and have been shown to produce superior predictive power for credit spread forecasts, for example (view post here).
- **Unsupervised learning** only knows input data. Its goal is to __model the underlying structure or distribution of the data in order to learn previously unknown patterns__. Applications of unsupervised machine learning techniques include clustering (partitioning the data set according to similarity), anomaly detection, association mining and dimension reduction (see below).
- **Reinforcement learning** is a specialized application of (deep) machine learning that __interacts with the environment and seeks to improve the way it performs a task so as to maximize its reward__ (view post here). The computer employs trial and error. The model designer defines the reward but gives no clues as to how to solve the problem. Reinforcement learning holds potential for trading systems because markets are highly complex and quickly changing dynamic systems. Conventional forecasting models have been notoriously inadequate. A self-adaptive approach that can learn quickly from the outcome of actions may be more suitable.
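The ensemble idea mentioned above can be made concrete with bagging, the simplest of the listed methods. The sketch below, using only numpy and synthetic data (all numbers and names are illustrative, not from the post), fits simple OLS base models on bootstrap resamples and averages their predictions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: three predictive features and a noisy target "return".
n, k = 200, 3
X = rng.normal(size=(n, k))
beta = np.array([0.5, -0.3, 0.1])                 # true (unknown) mapping
y = X @ beta + rng.normal(scale=0.5, size=n)

def fit_ols(X, y):
    """OLS coefficients via least squares (no intercept, for brevity)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Bagging: fit each base model on a bootstrap resample of the rows.
n_models = 50
coefs = []
for _ in range(n_models):
    idx = rng.integers(0, n, size=n)              # sample rows with replacement
    coefs.append(fit_ols(X[idx], y[idx]))

# The ensemble prediction is the average of the base-model predictions.
x_new = np.array([1.0, 0.0, -1.0])
ensemble_pred = float(np.mean([x_new @ c for c in coefs]))
print(round(ensemble_pred, 3))
```

Random forests extend this recipe by using decision trees as base models and resampling features as well as rows; gradient boosting instead fits base models sequentially to the residuals of the current ensemble.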

**Linear regression** remains the most popular tool for supervised learning in financial markets (apart from informal chart and correlation analysis). It can be the appropriate model if it __relates market returns to previously available information in a theoretically plausible functional form__. However, regression is also often applied to concurrent data, i.e. observations of data series at the same point in time. Such regressions of contemporaneous data are very popular in the research of financial institutions but are rarely backed up by solid underlying theory for the presumed one-dimensional relation between dependent and explanatory variables.
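The distinction above matters in practice: a predictive regression aligns today's return with yesterday's information. A minimal sketch with numpy and synthetic data (the signal and coefficient values are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic signal (e.g. a rate-differential change) that predicts
# the *next* period's return with coefficient 0.2, plus noise.
T = 300
signal = rng.normal(size=T)
noise = rng.normal(scale=1.0, size=T)
ret = np.empty(T)
ret[0] = noise[0]
ret[1:] = 0.2 * signal[:-1] + noise[1:]

# Predictive alignment: today's return regressed on yesterday's signal,
# i.e. only information available before the return is realized.
X = np.column_stack([np.ones(T - 1), signal[:-1]])
y = ret[1:]
(intercept, slope), *_ = np.linalg.lstsq(X, y, rcond=None)
print(round(slope, 2))
```

Regressing `ret` on the contemporaneous `signal` instead would estimate a relation that is useless for forecasting, which is the pitfall the paragraph warns about.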

**Structural vector autoregression** (SVAR) is a practical model class for empirical macroeconomics. It can also be __employed for macro trading strategies, because it helps to identify specific market and macro shocks__ (view post here). For example, SVAR can identify short-term policy, growth or inflation expectation shocks. Once a shock is identified, it can be used for trading in two ways. First, one can compare the type of shock implied by markets with the actual news flow and detect fundamental inconsistencies. Second, different types of shocks may entail different types of subsequent asset price dynamics and may form a basis for systematic strategies.
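One common identification scheme, recursive (Cholesky) identification, can be sketched in a few lines. The example below simulates a bivariate VAR(1) with numpy (the series, ordering and parameter values are illustrative assumptions, not from the post), estimates the reduced form by OLS and recovers structural shocks from the residual covariance:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulate y_t = A y_{t-1} + B0 e_t, with lower-triangular contemporaneous
# impact B0 (the recursive ordering) and unit-variance structural shocks e_t.
T = 500
B0 = np.array([[1.0, 0.0],
               [0.5, 1.0]])
A = np.array([[0.5, 0.0],
               [0.2, 0.3]])
eps = rng.normal(size=(T, 2))
y = np.zeros((T, 2))
for t in range(1, T):
    y[t] = y[t - 1] @ A.T + eps[t] @ B0.T

# Step 1: estimate the reduced-form VAR(1) equation by equation with OLS.
X, Y = y[:-1], y[1:]
A_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)     # A_hat.T approximates A
resid = Y - X @ A_hat                             # reduced-form residuals u_t

# Step 2: identify shocks. With u_t = P e_t and P lower-triangular,
# P is the Cholesky factor of Cov(u), and e_t = P^{-1} u_t.
P = np.linalg.cholesky(np.cov(resid.T))
shocks = resid @ np.linalg.inv(P).T               # recovered structural shocks
print(P.round(2))
```

The recovered `shocks` series is what a trading application would inspect, e.g. comparing the sign and size of an identified policy shock with the actual news flow, as described above.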

One important area of statistical learning for investment research is **dimension reduction**. This refers to __methods that condense the bulk of the information of a vast multitude of macroeconomic time series into a smaller set that distills the relevant trends for investors__. In macroeconomics there are many related data series that have only limited and highly correlated information content for markets. There are three types of statistical dimension reduction methods. The first type selects a subset of “best” explanatory variables (Elastic Net or Lasso, view post here). The second type selects a small set of latent background factors of all explanatory variables and then uses these background factors for prediction (Dynamic Factor Models). The third type generates a small set of functions of the original explanatory variables that historically would have retained their explanatory power and then deploys these for forecasting (Sufficient Dimension Reduction).
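The second type, extracting latent background factors from a correlated panel, is commonly initialized with principal components. A minimal sketch with numpy and a synthetic macro panel (the panel dimensions and factor structure are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic panel: 20 correlated macro series driven by 2 latent factors.
T, n_series, n_factors = 200, 20, 2
factors = rng.normal(size=(T, n_factors))          # latent background trends
loadings = rng.normal(size=(n_factors, n_series))
panel = factors @ loadings + 0.1 * rng.normal(size=(T, n_series))

# Standardize each series, then take leading principal components via SVD.
Z = (panel - panel.mean(0)) / panel.std(0)
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
pcs = U[:, :n_factors] * s[:n_factors]             # condensed factor estimates

# Share of total panel variance captured by the retained components.
explained = float((s[:n_factors] ** 2).sum() / (s ** 2).sum())
print(round(explained, 2))
```

The two retained components stand in for the full 20-series panel in downstream forecasting; full dynamic factor models add time-series dynamics for the factors on top of this cross-sectional step.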