Dimension reduction methods of machine learning are suited for detecting latent factors of a broad set of asset prices. These factors can then be used to improve estimates of the covariance structure of price changes and – by extension – to improve the construction of a well-diversified minimum variance portfolio. Methods for dimension reduction include sparse principal components analysis, sparse partial least squares, and autoencoders. Both static and dynamic factor models can be built. Hyperparameter tuning can proceed in rolling training and validation samples. Empirical analysis suggests that machine learning adds value to factor-based asset allocation in the equity market. Investors with moderate or conservative risk preferences would realize significant utility gains.

Conlon, Thomas, John Cotter, and Iason Kynigakis (2021), “Machine Learning and Factor-Based Portfolio Optimization”.

Below are quotes from the paper and from some other sources, which are linked next to the respective quote. Emphasis, headings, and text in brackets have been added for clarity.

This post ties in with this site’s summary on “Quantitative methods for macro information efficiency”, particularly the section on dimension reduction.

Using factors for reduction of portfolio variance

“The presence of factor structure in asset returns has been widely accepted in the economic literature…We examine the characteristics and benefits of latent factors generated from machine learning dimensionality reduction techniques for asset allocation. The analysis is conducted under the framework of factor-based covariance matrices used to construct minimum-variance portfolios.”

“We focus…on minimum-variance portfolios…It requires only estimates of the covariance matrix, which are often considered to be more accurate than the estimates of the means [that are used in] the mean-variance criterion of Markowitz [and] that have been found to be the principal source of estimation risk.”

“Although the minimum-variance framework avoids the problem of estimation error associated with expected returns, its performance remains crucially dependent on the quality of the estimated covariance matrix. To lessen the impact of covariance misspecification on the optimal weights, we impose a factor structure on the covariance matrix, which reduces the number of parameters to be estimated…It has been shown [in previous academic work] that introducing factor structure to the covariance matrix can improve portfolio performance.”
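
To illustrate why the minimum-variance criterion needs only a covariance matrix, here is a minimal sketch of the unconstrained closed-form solution, in which the weights are proportional to the inverse covariance matrix times a vector of ones. The function name and the three-asset covariance matrix are hypothetical, and the paper may impose additional constraints (for example on short sales) that this sketch ignores.

```python
import numpy as np

def min_variance_weights(cov):
    """Unconstrained global minimum-variance weights that sum to one."""
    cov = np.asarray(cov, dtype=float)
    ones = np.ones(cov.shape[0])
    # Solve cov @ x = 1 rather than forming the inverse explicitly.
    x = np.linalg.solve(cov, ones)
    return x / x.sum()

# Hypothetical 3-asset covariance matrix (illustrative numbers only).
cov = np.array([[0.04, 0.01, 0.00],
                [0.01, 0.09, 0.02],
                [0.00, 0.02, 0.16]])
w = min_variance_weights(cov)
port_var = w @ cov @ w                       # variance of the optimized portfolio
equal_w = np.ones(3) / 3
equal_var = equal_w @ cov @ equal_w          # variance of the equal-weighted portfolio
```

No expected returns enter the calculation, so the quality of the result rests entirely on the covariance estimate passed in.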

“In addition to having observed or latent factors, factor models can be static, such as in the arbitrage pricing theory…or dynamic.”

Using machine learning to construct factors

“We examine the economic value of latent factors generated using a variety of supervised and unsupervised dimensionality reduction methods…In addition to classical approaches, such as principal component analysis (PCA) and partial least squares (PLS), their respective regularized versions that induce sparsity through a penalty in the objective function are also considered. We also investigate the performance of factors generated by autoencoders; a type of unsupervised neural network used for dimensionality reduction.”

“We describe classical dimensionality reduction techniques used to generate the latent factors, along with their extensions from the machine learning literature, which rely on regularization and neural networks. The alternative methods we consider are similar in that the dimensionality of the data is reduced by mapping the set of predictors to a smaller set of combinations of the original variables:

  • Principal component analysis (PCA) derives the latent factors in an unsupervised way, based only on information from the predictors. PCA produces the weight matrix [based on] the covariance structure between predictors…The first principal component of the predictor set…has the largest sample variance amongst all linear combinations of the columns of the predictors.
  • Sparse principal component analysis (SPCA)…is based on the regression/reconstruction property of PCA and produces modified principal components with sparse weights, such that each principal component is a linear combination of only a few of the original predictors…PCA can be viewed in terms of a ridge regression problem and by adding the L1 penalty (a penalty that increases linearly with coefficient size) they convert it to an elastic net regression, which allows for the estimation of sparse principal components.
  • In partial least squares (PLS) the factors are constructed in a supervised way, by using information from both the predictors and the response…constructing linear combinations based on both sets…PLS computes weights that account for the covariation between the predictors and the response.
  • Sparse partial least squares (SPLS) is an extension of PLS that imposes the L1 penalty to promote sparsity onto a surrogate weight vector instead of the original weight vector while keeping [the two vectors] close to each other.
  • Autoencoders…are a type of unsupervised neural network that can be used for dimensionality reduction. Autoencoders have a similar structure to feed-forward neural networks, which have been shown to be universal approximators for any continuous function. However, an autoencoder differs in that the number of inputs is the same as the number of outputs and that it is used in an unsupervised context. Autoencoders have also been shown to be nonlinear generalizations of principal component analysis. The goal of…autoencoders is to learn a parsimonious representation of the original input data through a bottleneck structure…Autoencoders use non-linear activation functions to discover non-linear representations of the data.
    The encoder creates a compressed representation of the set of predictor data when the input variables pass through the units in the hidden layers, which are then decompressed to the output layer through the decoder. By placing constraints on the network, such as limiting the number of hidden units, it is forced to learn a compressed representation of the input, potentially uncovering an interesting structure of the data. Most often the encoding and decoding parts of an autoencoder are symmetrical, in that they both feature the same number of hidden layers with the same number of hidden units per layer. The output of the decoder is most commonly used to validate information loss, while the smallest hidden layer of the encoder (or code, at the bottleneck of the network) corresponds to the dimension-reduced data representation.”
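
As a point of reference for the methods listed above, the following is a minimal sketch of the simplest case only, plain PCA, computed through a singular value decomposition of the demeaned return matrix. It does not implement the sparse, supervised, or autoencoder variants. The function name `pca_factors` and the simulated data are assumptions made for illustration.

```python
import numpy as np

def pca_factors(returns, n_factors):
    """Return (weights, factors) from PCA of a T x N matrix of returns."""
    X = returns - returns.mean(axis=0)          # demean each asset
    # Rows of Vt are principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    weights = Vt[:n_factors].T                  # N x K weight matrix
    factors = X @ weights                       # T x K latent factors
    return weights, factors

rng = np.random.default_rng(0)
# Simulated data: 250 periods, 20 assets driven by 2 hypothetical common factors.
true_factors = rng.normal(size=(250, 2))
true_loadings = rng.normal(size=(2, 20))
returns = true_factors @ true_loadings + 0.1 * rng.normal(size=(250, 20))

weights, factors = pca_factors(returns, n_factors=2)
```

Each column of `weights` is a linear combination of all 20 assets. A sparse method would instead set most of these entries to zero, which is the property the SPCA and SPLS descriptions above refer to.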

“After the factor model is estimated…the covariance matrix of returns is obtained by its decomposition into two components: the first is based on the factor loadings and the factor covariance matrix, while the second is the covariance matrix of the errors…We focus on exact factor models where the covariance matrix of the residuals is diagonal by assuming cross-sectional independence.”
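
The decomposition described in this quote can be sketched as follows, assuming the loadings are estimated by ordinary least squares of each asset's returns on the factors and the residual covariance matrix is restricted to its diagonal. The function name and the simulated inputs are hypothetical; the paper's own estimation details may differ.

```python
import numpy as np

def factor_covariance(returns, factors):
    """Exact factor model covariance: B @ cov(F) @ B.T + diag(residual variances)."""
    T = returns.shape[0]
    F = np.column_stack([np.ones(T), factors])       # regressors with intercept
    coef, *_ = np.linalg.lstsq(F, returns, rcond=None)
    B = coef[1:].T                                    # N x K factor loadings
    resid = returns - F @ coef
    cov_f = np.atleast_2d(np.cov(factors, rowvar=False))
    dof = T - F.shape[1]
    D = np.diag((resid ** 2).sum(axis=0) / dof)       # diagonal residual covariance
    return B @ cov_f @ B.T + D

rng = np.random.default_rng(1)
# Simulated data: 300 periods, 8 assets, 2 hypothetical factors.
factors = rng.normal(size=(300, 2))
loadings = rng.normal(size=(2, 8))
returns = factors @ loadings + 0.05 * rng.normal(size=(300, 8))

sigma = factor_covariance(returns, factors)
```

With N assets and K factors, this structure needs N times K loadings, K(K+1)/2 factor covariance terms, and N residual variances, rather than the N(N+1)/2 free parameters of an unrestricted covariance matrix.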

“[We] introduce dynamic factor models as an extension…A dynamic factor model is one in which at least one of the following three generalizations holds true: (i) the intercept and factor loadings are time-varying, (ii) the covariance matrix of the factors is time-varying or (iii) the covariance matrix of the errors is time-varying…There are various definitions of dynamic factor models, the one we follow in this study is a model that allows the factor loadings to be time-varying.”

“The machine learning models used to derive the latent factors rely on hyperparameter tuning. The choice of hyperparameters controls the amount of model complexity and is critical for the performance of the model. Specifically, we adopt the validation sample approach, in which the optimal set of values for the tuning parameters is selected in the validation sample…we maintain the temporal ordering of the data…Specifically, in each iteration of the rolling window, the in-sample is split into two disjointed periods, the training subsample, consisting of 80% of the observations [and] the validation subsample. In the training subsample the model is estimated for several sets of values of the tuning parameters. The [validation] subsample is used to select the optimal set of tuning parameters, by using the latent factor weight and loading estimates for each set of hyperparameters from the training sample. Forecasts are constructed for the observations in the validation sample.”
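
The rolling training and validation scheme can be sketched roughly as below. The window length, the toy data, and the helper names `rolling_splits` and `select_hyperparameter` are assumptions for illustration; the toy "model" merely scales the training mean and stands in for the factor models of the paper.

```python
def rolling_splits(n_obs, window, step=1, train_frac=0.8):
    """Yield (train, validation, test_index) per rolling window, keeping time order."""
    n_train = round(window * train_frac)
    for start in range(0, n_obs - window, step):
        train = range(start, start + n_train)
        val = range(start + n_train, start + window)
        yield train, val, start + window      # test_index: first out-of-sample point

def select_hyperparameter(candidates, fit, score, train_data, val_data):
    """Pick the candidate whose model, fitted on training data, has the lowest validation loss."""
    return min(candidates, key=lambda c: score(fit(train_data, c), val_data))

# Toy example: the hyperparameter scales the training mean (hypothetical model).
data = [0.01 + 0.001 * (t % 3) for t in range(120)]
fit = lambda xs, c: c * sum(xs) / len(xs)
score = lambda pred, xs: sum((x - pred) ** 2 for x in xs) / len(xs)

splits = list(rolling_splits(len(data), window=60))
train, val, test_index = splits[0]
best = select_hyperparameter([0.0, 0.5, 1.0], fit, score,
                             [data[i] for i in train], [data[i] for i in val])
```

The validation observations always follow the training observations in time, so no future information leaks into the choice of tuning parameters.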

Key empirical findings

“We explore the impact that the proposed latent factors have on the structure of factor-based covariance matrices and to the composition and performance of minimum-variance portfolios.”

“We evaluate the different factor and covariance specifications by constructing minimum-variance portfolios based on individual stock return data for a sample period spanning 60 years. Overall, our findings suggest that machine learning adds value to factor-based asset allocation. In the baseline case, machine learning leads to portfolios that significantly outperform the equal-weighted benchmark… Investors with moderate or conservative risk preferences would realize statistically significant utility gains…The best-performing methods to generate the covariance matrix are autoencoders and sparse principal component analysis.”

“In addition, machine learning can improve factor-based portfolio optimization when performance is measured using alternative risk metrics. Covariance matrices based on autoencoders and sparse PCA outperform the equal-weighted portfolio by up to 2.9%, 1.26% and 1.57% per annum, in terms of mean absolute deviation, Value-at-Risk and Conditional Value-at-Risk, respectively.”
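
For readers unfamiliar with the three alternative risk metrics, the sketch below computes simple historical estimates from a series of portfolio returns. The 5% tail probability, the function name, and the simulated returns are assumptions; the paper's exact definitions may differ.

```python
import numpy as np

def risk_metrics(returns, alpha=0.05):
    """Historical mean absolute deviation, Value-at-Risk and Conditional Value-at-Risk."""
    r = np.asarray(returns, dtype=float)
    mad = np.mean(np.abs(r - r.mean()))       # mean absolute deviation from the mean
    var = -np.quantile(r, alpha)              # loss exceeded with probability alpha
    cvar = -r[r <= -var].mean()               # average loss in the tail beyond VaR
    return mad, var, cvar

rng = np.random.default_rng(2)
# Simulated daily portfolio returns (illustrative only).
portfolio_returns = rng.normal(loc=0.0005, scale=0.01, size=1000)
mad, var, cvar = risk_metrics(portfolio_returns)
```

By construction the Conditional Value-at-Risk is at least as large as the Value-at-Risk at the same tail probability, since it averages only the losses at or beyond that threshold.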

“The improved performance can be attributed to two aspects.

  • First, factor-based covariance matrices tend to significantly reduce the risk of a portfolio consisting of individual stocks. This finding remains robust in an out-of-sample setting, using different risk measures, across covariance and factor specifications, for a varying number of assets, alternative portfolio objective formulations and when transaction costs are taken into account.
  • Second, we demonstrate that using machine learning can lead to significant economic gains. For example, using a factor-implied covariance based on machine learning, can lead to a decrease in out-of-sample portfolio standard deviation of up to 29% and an increase in the Sharpe ratio of over 25%.”

“The results show that machine learning yields factors that cause the covariances and portfolio weights to diverge from those based on commonly used estimators. Latent factors produced by PCA and PLS-type methods exhibit a stronger connection with well-known factors (such as those from the Fama and French five-factor model) throughout the out-of-sample period, compared to factors based on autoencoders. Furthermore, the covariance matrices whose structure deviates most from the sample estimator are based on unsupervised methods or allow the residual covariance matrix to be time-varying. Portfolios based on machine learning also have weights that are smaller, vary less over time and are more diversified, than models based on observed factors. Covariance matrices based on unsupervised methods also lead to portfolios with lower turnover and thus reduced sensitivity to transaction costs.”

“Shallow learning outperforms deeper learning, which can be attributed to the small size of the data set and the low signal-to-noise ratio…Additionally, unsupervised methods tend to perform better than supervised methods.”

Ralph Sueppel is founder and director of SRSV, a project dedicated to socially responsible macro trading strategies. He has worked in economics and finance for over 25 years for investment banks, the European Central Bank and leading hedge funds. At present, he is head of research and quantitative strategies at Macrosynergy Partners.