A powerful statistical method for selecting macro factors for trading strategies is the “Elastic Net”. The method simultaneously selects factors in accordance with their past predictive power and estimates their influence conservatively in order to contain the influence of accidental correlation. Unlike other statistical selection methods, such as “LASSO”, the “Elastic Net” can make use of a large number of correlated factors, a typical feature of economic time series.
Garcia, Juan Angel and Sebastian Werner (2016), “Bond risk premia, macroeconomic factors and financial crisis in the euro area”, ECB Working Paper, No 1938 / July 2016.
Zou, Hui and Trevor Hastie (2005), “Regularization and variable selection via the elastic net”, Journal of the Royal Statistical Society: Series B.
Fischetti, Tony (2015), “Kickin’ it with elastic net regression”, R bloggers, August 2015.
To apply the Elastic Net in R, one can use the package ‘elasticnet’ (see its reference manual).
The post ties in with the subject of macro information efficiency (summary page here).
The text below consists of excerpts from the papers and the post. The initial section, headings and some other cursive text have been added for context.
Conventional statistical methods for selecting macro factors
Stepwise regression is an automated regression-based procedure to select predictors in a forecast model. For example, forward stepwise regression starts with the coefficients of all potential factors set equal to zero. Then it finds the factor that predicts future returns ‘best’ and adds it to the forecast model. Thereupon it computes the residual of forecasts based on that factor alone. Then the procedure adds the factor that best predicts the residuals. This continues until there are no more factors left that improve the model.
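The procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function name forward_stepwise, the stopping rule (a minimum share of total variance that a new factor must explain, min_gain) and the simulated data are all hypothetical choices made for this example.

```python
import numpy as np

def forward_stepwise(X, y, min_gain=0.01):
    """Greedy forward selection as described in the text.

    At each step the unused factor most correlated with the current
    residual is added, the model is refitted by OLS on all selected
    factors, and the residual is recomputed. min_gain is an assumed
    stopping rule, not part of any standard definition.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)          # centre factors
    yc = y - y.mean()                # centre returns
    selected = []
    residual = yc.copy()
    tss = float(yc @ yc)             # total sum of squares
    while len(selected) < p:
        # Absolute correlation (up to scale) of each factor with the residual.
        scores = np.abs(Xc.T @ residual) / np.linalg.norm(Xc, axis=0)
        if selected:
            scores[selected] = -np.inf   # never pick a factor twice
        j = int(np.argmax(scores))
        trial = selected + [j]
        beta, *_ = np.linalg.lstsq(Xc[:, trial], yc, rcond=None)
        new_residual = yc - Xc[:, trial] @ beta
        gain = (residual @ residual - new_residual @ new_residual) / tss
        if gain < min_gain:          # no meaningful improvement: stop
            break
        selected, residual = trial, new_residual
    return selected

# Simulated data in which only factors 1 and 4 drive returns.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 6))
y = 2.0 * X[:, 1] - 1.5 * X[:, 4] + 0.1 * rng.standard_normal(200)
selected = forward_stepwise(X, y)
print(selected)
```

Note that the outcome depends on the threshold min_gain: a looser threshold admits factors whose in-sample correlation with the residual is accidental.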
Stepwise regression is effectively data mining and not considered good practice by statisticians. In practice, it can create models that include many accidentally correlated factors and give heavy weights to factors that were accidentally highly correlated over the sample period.
Ridge regression is a multivariate regression technique particularly suitable for estimating coefficients of predictors that are highly correlated among themselves (multicollinearity), a common feature of macro factors that are linked to the global business cycle. In ordinary least-squares regression such multicollinearity means great uncertainty around coefficient estimates.
Ridge regression ‘shrinks’ the size of estimated coefficients: instead of just minimizing the residual sum of squares, it imposes a penalty proportional to the sum of squared coefficients and hence reduces their size. This means that estimators are usually biased toward zero but less prone to large deviations from the ‘true’ values. Such penalization is also called ‘regularization’ and mitigates distortions that arise from sample idiosyncrasies. As a consequence, ridge regression is less likely to heavily over- or understate the role of individual macro factors in trading strategies, and its estimates tend to be stable in the sense that they are usually little affected by small changes in the data on which the fitted regression is based.
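To make the shrinkage concrete, here is a minimal sketch of the closed-form ridge estimate on simulated data. The function name ridge_coefficients, the penalty values and the data-generating process (three factors driven by one common component) are assumptions made purely for illustration.

```python
import numpy as np

def ridge_coefficients(X, y, penalty):
    """Closed-form ridge estimate (X'X + penalty * I)^(-1) X'y on centred data.

    penalty = 0 reproduces ordinary least squares.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    p = Xc.shape[1]
    return np.linalg.solve(Xc.T @ Xc + penalty * np.eye(p), Xc.T @ yc)

# Three highly correlated factors that share one common driver.
rng = np.random.default_rng(1)
common = rng.standard_normal(120)
X = np.column_stack([common + 0.05 * rng.standard_normal(120) for _ in range(3)])
y = X.sum(axis=1) + rng.standard_normal(120)

ols = ridge_coefficients(X, y, penalty=0.0)
ridge = ridge_coefficients(X, y, penalty=10.0)
print("OLS:  ", ols)
print("ridge:", ridge)
```

For any positive penalty the ridge coefficient vector is shorter than the OLS vector. With collinear factors like these, the OLS estimates typically scatter widely around the true values, while the ridge estimates sit closer together.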
Ridge regression on its own is not a formal process for selecting factors, but just for giving coefficients to those that were selected. Ridge regression cannot produce a parsimonious model, for it always keeps all the predictors in the model. Moreover, the estimation depends strongly on the parameter that determines the penalty for coefficient sizes and, hence, on judgment.
LASSO is a statistical method for selecting predictors and ‘shrinking’ the coefficients of factors in the context of linear regression. The acronym stands for ‘least absolute shrinkage and selection operator’. Like regular OLS regression, the LASSO minimizes the sum of squared errors but, similar to ridge regression, also penalizes the size of coefficients. Unlike in ridge regression, the penalty is imposed on the sum of absolute (not squared) coefficient estimates. Hence, when the penalty parameter is sufficiently large, many coefficients are driven to zero, giving a parsimonious set of factors for efficient prediction. The “regularization” in LASSO thus not only prevents individual factors from being weighted too heavily but also contains the number of factors used for forecasting and hence reduces the tendency of “overfitting”.
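Why the absolute-value penalty produces exact zeros is easiest to see in the special case of orthonormal predictors, where both estimators are simple transformations of the OLS coefficients. The sketch below assumes the objectives are written as half the residual sum of squares plus penalty times the sum of absolute coefficients (LASSO), or plus penalty/2 times the sum of squared coefficients (ridge). The function names and numbers are hypothetical.

```python
import numpy as np

def ridge_shrink(ols_coef, penalty):
    # Ridge with orthonormal predictors: every coefficient is scaled
    # down proportionally, so none becomes exactly zero.
    return ols_coef / (1.0 + penalty)

def lasso_shrink(ols_coef, penalty):
    # LASSO with orthonormal predictors: soft-thresholding. Coefficients
    # smaller than the penalty in absolute value are set exactly to zero.
    return np.sign(ols_coef) * np.maximum(np.abs(ols_coef) - penalty, 0.0)

ols_coef = np.array([1.2, -0.8, 0.15, -0.05, 0.4])   # illustrative OLS estimates
penalty = 0.3

ridge_coef = ridge_shrink(ols_coef, penalty)
lasso_coef = lasso_shrink(ols_coef, penalty)
print("ridge:", ridge_coef)
print("lasso:", lasso_coef)
```

In this example the two smallest OLS coefficients fall below the penalty and are dropped by the LASSO, while ridge keeps all five factors with reduced weights.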
A drawback of LASSO is that it can only select a limited number of factors: at most as many as there are observations. In addition, it will often pick only one of a set of correlated factors, even if the selection is based on flimsy empirical evidence and if the choice of several factors would add to model stability.
LARS means ‘least angle regression’ and is a model selection method that is similar to forward stepwise regression. However, it does not add factors to the forecast model ‘fully’. Instead, the coefficient of a factor is increased in the direction of its correlation until that predictor is no longer the one most correlated with the residual future return. Then the next most correlated factor is added to the model. The LARS algorithm can be viewed as a particular method for running the LASSO.
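The idea of raising a coefficient only gradually can be illustrated with incremental forward stagewise regression. This is a simpler relative of LARS, not the LARS algorithm itself: it moves the coefficient of the currently most correlated factor by a small fixed step and recomputes the residual each time. The function name, step size and simulated data are assumptions for this sketch.

```python
import numpy as np

def forward_stagewise(X, y, step=0.01, n_steps=2000):
    """Incremental forward stagewise regression on standardised factors.

    Returns the coefficients (in standardised units) and the order in
    which factors first entered the model.
    """
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    residual = y - y.mean()
    coef = np.zeros(X.shape[1])
    path = []
    for _ in range(n_steps):
        corr = Xs.T @ residual / len(y)      # covariance with the residual
        j = int(np.argmax(np.abs(corr)))     # most correlated factor
        if abs(corr[j]) < step:              # nothing left worth chasing
            break
        delta = step * np.sign(corr[j])      # small move in the sign of the correlation
        coef[j] += delta
        residual = residual - delta * Xs[:, j]
        if j not in path:
            path.append(j)
    return coef, path

# Simulated returns driven by factor 0 (strongly) and factor 3 (less so).
rng = np.random.default_rng(3)
X = rng.standard_normal((200, 5))
y = 2.0 * X[:, 0] - 1.0 * X[:, 3] + 0.3 * rng.standard_normal(200)
coef, path = forward_stagewise(X, y)
print(path, coef)
```

The strongest factor enters first and keeps being updated alone until the second factor is about as correlated with the residual, at which point both are moved in turn.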
Features and advantages of the “Elastic Net”
“We propose a new regularization technique which we call the elastic net. Similar to the lasso, the elastic net simultaneously does automatic variable selection and continuous shrinkage, and it can select groups of correlated variables. It is like a stretchable fishing net that retains ‘all the big fish’…Real world data and a simulation study show that the elastic net often outperforms the lasso, while enjoying a similar sparsity of representation. In addition, the elastic net encourages a grouping effect, where strongly correlated predictors tend to be in or out of the model together. The elastic net is particularly useful when the number of predictors is much bigger than the number of observations.” [Zou and Hastie]
“Ridge regression is a really effective technique for thwarting overfitting…by penalizing the L2 norm (Euclidean distance) of the coefficient vector which results in “shrinking” the beta coefficients… Lasso regression is a related regularization method. Instead of using the L2 norm, though, it penalizes the L1 norm (Manhattan distance) of the coefficient vector. Because it uses the L1 norm, some of the coefficients will shrink to zero… Elastic net regression is a hybrid approach that blends both penalization of the L2 and L1 norms.” [Fischetti]
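In symbols, the blend of the two penalties can be written as follows. The notation is an assumption of this note rather than taken from the quoted post: y is the vector of returns, X the matrix of factors, β the coefficient vector, and λ1 and λ2 the non-negative weights on the L1 and L2 penalties.

```latex
\hat{\beta} \;=\; \arg\min_{\beta}\;
  \lVert y - X\beta \rVert_2^2
  \;+\; \lambda_1 \lVert \beta \rVert_1
  \;+\; \lambda_2 \lVert \beta \rVert_2^2 ,
\qquad
\lVert \beta \rVert_1 = \sum_j \lvert \beta_j \rvert ,
\quad
\lVert \beta \rVert_2^2 = \sum_j \beta_j^2 .
```

Setting λ2 = 0 gives the LASSO, setting λ1 = 0 gives ridge regression, and setting both to zero gives ordinary least squares. Zou and Hastie call this criterion the naive elastic net and additionally rescale its solution by (1 + λ2).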
“The advantages of the Elastic Net selection criterion…can be better understood by considering two other criteria it nests as special cases, namely the standard ridge regression and [LASSO]…The penalty coefficient [of the LASSO type] contributes to both shrinkage and variable selection. The penalty coefficient [of the ridge regression type] helps to overcome two problems of the LASSO selection criterion…[i]the LASSO criterion tends to select only one factor from a group and the within-group selection is often not robust…[ii] when confronted with a data set in which the number of potential factors is much higher than the number of observations the LASSO criterion can select at most [as many factors as there are observations].” [Garcia and Werner]
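The grouping behaviour can be illustrated with a small coordinate-descent sketch. It is a toy implementation under stated assumptions, not the estimator used by Garcia and Werner: the function name elastic_net, the penalty scaling (half the mean squared error plus l1 times the sum of absolute coefficients plus l2/2 times the sum of squared coefficients) and the simulated data are hypothetical. The data contain two identical copies of one factor, the limiting case of highly correlated factors.

```python
import numpy as np

def elastic_net(X, y, l1, l2, n_iter=500):
    """Coordinate descent for
    (1/2n)||y - Xb||^2 + l1 * sum|b_j| + (l2/2) * sum b_j^2
    on standardised factors. l2 = 0 gives the LASSO, l1 = 0 gives ridge.
    """
    n, p = X.shape
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()
    b = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # Residual with factor j's own contribution added back.
            partial = yc - Xs @ b + Xs[:, j] * b[j]
            rho = Xs[:, j] @ partial / n
            # Soft-threshold (L1 part), then shrink proportionally (L2 part).
            b[j] = np.sign(rho) * max(abs(rho) - l1, 0.0) / (1.0 + l2)
    return b

rng = np.random.default_rng(2)
n = 150
z = rng.standard_normal(n)
# Columns 0 and 1 are identical; columns 2 and 3 are pure noise.
X = np.column_stack([z, z, rng.standard_normal(n), rng.standard_normal(n)])
y = z + 0.5 * rng.standard_normal(n)

lasso = elastic_net(X, y, l1=0.1, l2=0.0)
enet = elastic_net(X, y, l1=0.1, l2=0.5)
print("LASSO:      ", lasso)
print("Elastic Net:", enet)
```

With the L2 weight at zero, the algorithm puts all weight on whichever copy it visits first and leaves the other at zero, an arbitrary within-group choice. With a positive L2 weight, the two copies receive the same coefficient, which is the grouping effect described in the quotes above.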
“The Elastic Net is implemented using Least Angle Regression (LARS). Initially, all factor coefficients are set to zero …and we search for the vector of predictors most correlated with our vector of excess returns…At each iteration, the algorithm computes the residuals of a regression of the response vector on the by-then selected factors, and expands the set of selected factors by moving [their coefficients] in the direction of the sign of their correlation until some other factor is as strongly correlated with the current residual as the already-selected factors are…At each iteration we find the factor with the highest correlation with the current residual, then update…and move that factor into the selected set.” [Garcia and Werner]
An application of “Elastic Net” to macro factors
“Our analysis focuses on the predictive power of macroeconomic factors for [euro area sovereign] bond premia, and on the impact of the financial and economic crisis on that predictive power.” [Garcia and Werner]
“We employ the Elastic Net estimator…a variable selection…We can evaluate a large number of potential determinants: 132 monthly macroeconomic indicators including both euro area wide and country-specific information that reflect the data-rich environment for euro area markets…We can select observable factors based on their explanatory power for bond premia…Elastic Net is particularly suitable for small sample analysis, which fits well with the short history of the euro area and our goal of investigating the financial crisis impact.” [Garcia and Werner]
“We show that macroeconomic factors display a strong predictive power throughout our sample. First, we report that individual economic activity and economic sentiment indicators (around 15%) and prices (around 10%) explained a significant proportion of the variation in bond risk premia prior to the start of the crisis. Moreover, during the financial crisis their relevance, as that of other macroeconomic indicators (e.g. labour market) rose significantly…On average, macro factor models can explain 38% of the variability of risk premia in euro bond markets before the crisis, and around 55% during the financial crisis. Moreover, their performance is fairly consistent both for core bond markets (41% to 62%) and periphery countries (from around 35% to 44%).” [Garcia and Werner]