Any asset can use a portfolio of similar assets to hedge against its factor exposure. The factor residual risk of the hedged position is called statistical arbitrage risk. Accordingly, the statistical arbitrage risk premium is the expected return of such a hedged position. A recent paper shows, both theoretically and empirically, that this premium rises with a stock's statistical arbitrage risk: 'unique' stocks earn higher excess returns than 'ubiquitous' stocks. The estimated premium is therefore a valid basis for investment strategies. Statistical arbitrage risk can be estimated with 'elastic net' estimation and related machine learning methods, which select a relatively small hedge portfolio from a large array of candidate stocks.

The below are mostly quotes from:
Leung, Raymond and Yu-Man Tam (2021), “Statistical Arbitrage Risk Premium by Machine Learning”.
with some additional quotes whose sources are referenced below.

The post ties up with this site’s summary on implicit subsidies.

Understanding the statistical arbitrage risk premium

“How to hedge factor risks without knowing the identities of the factors? We…prove a general theoretical result: even if the exact set of factors cannot be identified, any risky asset can use some portfolio of similar peer assets to hedge against its own factor exposures. A long position of a risky asset and a short position of a ‘replicate portfolio’ of its peers represent that asset’s factor residual risk…[Analogously] the statistical arbitrage risk premium is the expected return of the residual factor risks of a given stock.”
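The hedging logic can be illustrated with a small simulation (entirely synthetic numbers, not from the paper): if a replicate portfolio carries the same factor loading as stock i, the long-short position cancels the factor exposure and retains only stock i's residual risk.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000
factor = rng.normal(size=n)                   # unobserved common factor

# Stock i and a replicate portfolio share the same factor loading (1.2),
# but each carries its own residual risk.
r_i = 1.2 * factor + rng.normal(scale=0.5, size=n)
r_replicate = 1.2 * factor + rng.normal(scale=0.1, size=n)

hedged = r_i - r_replicate                    # long stock i, short replicate

# The factor exposure cancels: the hedged position is (almost) uncorrelated
# with the factor, leaving only residual risk.
print(abs(np.corrcoef(factor, r_i)[0, 1]))    # high
print(abs(np.corrcoef(factor, hedged)[0, 1])) # near zero
```

Crucially, constructing the hedge only requires finding a portfolio with matching loadings, not observing the factor itself.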

“Under weak economic and technical conditions, the [statistical arbitrage risk premium] is non-zero. Moreover, one does not need to know a priori what are the underlying factors that drive the economy… The challenge in empirically estimating [the premium] is finding the peers for each asset and constructing the replicate portfolios.”

“We call the [inverse of a statistical] projection goodness of fit [or R2] of each stock i [based on other stocks] the statistical arbitrage risk of stock i; we say a stock has high statistical arbitrage risk if it has a low goodness of fit.”
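In other words, statistical arbitrage risk is simply one minus the R-squared of the peer projection. A trivial sketch with hypothetical fits:

```python
def statistical_arbitrage_risk(r_squared: float) -> float:
    """SAR = 1 - R^2 of the peer-replication projection."""
    return 1.0 - r_squared

# Hypothetical fits: a well-replicated ('ubiquitous') stock vs a poorly
# replicated ('unique') stock.
sar_ubiquitous = statistical_arbitrage_risk(0.90)  # low SAR
sar_unique = statistical_arbitrage_risk(0.25)      # high SAR
print(sar_unique > sar_ubiquitous)  # True
```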

“The core message of this paper is: statistical arbitrage risk premium is increasing in statistical arbitrage risk.”

Measuring the statistical arbitrage risk premium of a stock

“Given any stock, what portfolio of all other stocks is most similar to it? Suppose all stocks are exposed to the same set of linear factors but with [different] factor loadings. If one can identify a group of peers that is the most ‘similar’ to a given stock i, then this portfolio is also exposed to similar factor loadings of this stock i. We view this portfolio of peer stocks as the replicate of stock i. A long position on stock i and a short position on its replicate will expose the holder to any remaining factor risks of stock i that cannot be completely hedged out by its peer stocks. We show [that] this long-short position exactly equates to the residual factor risks of stock i. This long-short position does not require [knowing] the true underlying factor structure of the economy.”

“We use the elastic-net, a machine learning method, to project each stock’s past returns onto that of every other stock. The resulting high-dimensional but sparse projection vector serves as investment weights in constructing the stocks’ replicate portfolios. We say a stock has high (low) statistical arbitrage risk if it has low (high) R-squared with its peers.”

“For each month-end, we use the elastic-net estimator…to project each stock i’s past twelve months’ daily returns onto the returns of every other stock in the market. The resulting elastic-net projection vector is high-dimensional but very sparse [i.e. it selects only a small portfolio, in position size and number of stocks, from a large range of candidate stocks]. After a suitable normalization, the projection vector is then used as investment weights into all stocks other than i. The resulting portfolio is hence a machine learning constructed replicate of stock i… The time-series average return from a long position of stock i and a short position of its replicate is the statistical arbitrage risk premium of stock i.”
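A stylized version of this pipeline can be sketched with scikit-learn's `ElasticNet`. The returns below are simulated from a three-factor model, and the absolute-value normalization of the weights and the penalty settings are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
n_days, n_stocks = 252, 50                       # one year of daily returns

# Simulated returns driven by 3 latent factors plus idiosyncratic noise.
factors = rng.normal(0.0, 0.01, size=(n_days, 3))
loadings = rng.normal(0.0, 1.0, size=(3, n_stocks))
returns = factors @ loadings + rng.normal(0.0, 0.005, size=(n_days, n_stocks))

y = returns[:, 0]                                # stock i
X = returns[:, 1:]                               # every other stock

# Elastic net with both L1 and L2 penalties strictly positive.
model = ElasticNet(alpha=1e-5, l1_ratio=0.5, max_iter=10000)
model.fit(X, y)

r2 = model.score(X, y)                           # goodness of fit
sar = 1.0 - r2                                   # statistical arbitrage risk

# Normalize the sparse projection vector into portfolio weights
# (the normalization scheme here is an assumption for illustration).
w = model.coef_.copy()
if np.abs(w).sum() > 0:
    w /= np.abs(w).sum()

hedged = y - X @ w                               # long stock i, short replicate
sarp_estimate = hedged.mean()                    # time-series average return
```

The sparsity of `model.coef_` is what keeps the replicate a small, tradable portfolio rather than a position in the entire market.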

Understanding the elastic net method

“Elastic Net is an extension of linear regression that adds regularization penalties to the loss function during training…[It] is a popular type of regularized linear regression that combines two popular penalties, specifically the L1 and L2 penalty functions [penalizing absolute and squared coefficient values respectively].” [Jason Brownlee]

“In statistics, there are two critical characteristics of estimators to be considered: the bias and the variance. The bias is the difference between the true population parameter and the expected estimator. It measures the accuracy of the estimates. Variance, on the other hand, measures the spread, or uncertainty, in these estimates…Both the bias and the variance are desired to be low, as large values result in poor predictions from the model.

  • The OLS estimator has the desired property of being unbiased. However, it can have a huge variance…The general solution to this is: reduce variance at the cost of introducing some bias. This approach is called regularization and is almost always beneficial for the predictive performance of the model…As the model complexity, which in the case of linear regression can be thought of as the number of predictors, increases, estimates’ variance also increases, but the bias decreases…We regularize: to lower the variance at the cost of some bias, thus moving left on the plot, towards the optimum…
  • In Ridge Regression, the OLS loss function is augmented in such a way that we not only minimize the sum of squared residuals but also penalize the size of parameter estimates, in order to shrink them towards zero.
  • Lasso, or Least Absolute Shrinkage and Selection Operator, is quite similar conceptually to ridge regression. It also adds a penalty for non-zero coefficients, but unlike ridge regression which penalizes sum of squared coefficients (the so-called L2 penalty), lasso penalizes the sum of their absolute values (L1 penalty). As a result, for high values of the penalty coefficient, many coefficients are exactly zeroed under lasso, which is never the case in ridge regression…
  • Elastic Net [is a] a convex combination of Ridge and Lasso…Elastic Net first emerged as a result of critique on lasso, whose variable selection can be too dependent on data and thus unstable. The solution is to combine the penalties of ridge regression and lasso to get the best of both worlds.” [Michael Oleszak on Datacamp Community]
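The difference between the three penalties can be seen directly in the coefficient vectors they produce. A minimal comparison on simulated data (penalty strengths chosen arbitrarily):

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso, ElasticNet

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 30))
beta = np.zeros(30)
beta[:5] = 1.0                                # only 5 predictors matter
y = X @ beta + rng.normal(scale=0.5, size=200)

ridge = Ridge(alpha=1.0).fit(X, y)            # L2: shrinks, never exactly zero
lasso = Lasso(alpha=0.2).fit(X, y)            # L1: zeroes many coefficients
enet = ElasticNet(alpha=0.2, l1_ratio=0.5).fit(X, y)  # convex combination

print(np.count_nonzero(ridge.coef_))          # 30: ridge keeps all predictors
print(np.count_nonzero(lasso.coef_))          # sparse: many exact zeros
print(np.count_nonzero(enet.coef_))           # also sparse, but more stable
```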

“The elastic-net estimator…encompasses the special cases of the ordinary least squares (OLS) estimator, least absolute shrinkage and selection operator (LASSO) estimator, and the ridge estimator. The hyperparameters control the strength of the L1- and L2-norm penalties, respectively. In this paper when we refer to the elastic-net estimator, we always refer to the case when [both norm penalties] are both strictly positive. In our actual implementation, we use a 3-fold cross-validation procedure to empirically select the hyperparameters.”
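In scikit-learn, for instance, this selection step corresponds to `ElasticNetCV` with `cv=3`; the grid of `l1_ratio` values below is an illustrative choice that keeps both norm penalties strictly positive:

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 40))
y = X[:, :4].sum(axis=1) + rng.normal(scale=0.5, size=300)

# 3-fold cross-validation jointly selects the overall penalty strength
# (alpha) and the L1/L2 mix (l1_ratio); l1_ratio values strictly between
# 0 and 1 keep both norm penalties positive.
model = ElasticNetCV(l1_ratio=[0.2, 0.5, 0.8], cv=3, max_iter=10000)
model.fit(X, y)

print(model.alpha_, model.l1_ratio_)          # hyperparameters chosen by CV
```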

“Machine learning methods can substantially shrink down the number of factors that can explain the cross-section of returns…There are only two purposes of using a machine learning method in this paper: to identify the statistical arbitrage risk of each stock, and to construct the replicate portfolio of each stock… By using a machine learning method…the selection of a stock’s risky peers is completely data driven…The estimation and inference of statistical arbitrage risk premium for each stock use conventional empirical asset pricing procedures.”

Key findings and lessons for trading

“We use both the CRSP daily and monthly data from December 31, 1974 to December 31, 2020…We do include effectively all [U.S.] stocks, except for the most extremely illiquid or dead.”

“The key finding is that ‘unique’ stocks have both a higher statistical arbitrage risk premium and higher excess returns than ‘ubiquitous’ stocks: in the cross-section, high statistical arbitrage risk stocks have a monthly statistical arbitrage risk premium that is 1.1% greater than low statistical arbitrage risk stocks…Our main result [that] statistical arbitrage risk premium is increasing in statistical arbitrage risk is robust after controlling for risk factors and other characteristics.”

“Low statistical arbitrage risk stocks tend to be smaller stocks while high statistical arbitrage risk stocks tend to be bigger stocks.”

“The core empirical message of this paper can be summarized as: the statistical arbitrage risk premium is increasing in statistical arbitrage risk. That is, in the cross-section, ‘unique’ stocks (having low R2, and hence high statistical arbitrage risk) have a higher statistical arbitrage risk premium than ‘ubiquitous’ stocks. Over the sample period of January 31, 1976 to December 31, 2020, high statistical arbitrage risk stocks have a monthly statistical arbitrage risk premium of 1.368% and low statistical arbitrage risk stocks have a monthly premium of 0.267%, and the difference is highly statistically significant… [Also] we have the important corollary that high statistical arbitrage risk stocks have a monthly return of 1.481% and low statistical arbitrage risk stocks have a monthly return of 0.771%, and the difference is also highly statistically significant.”
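This cross-sectional pattern corresponds to a standard decile sort on statistical arbitrage risk. A sketch with synthetic data (the premium slope is planted by construction, so the numbers only illustrate the mechanics of the sort, not the paper's estimates):

```python
import numpy as np

rng = np.random.default_rng(3)
n_stocks = 1000

# Synthetic cross-section: plant a premium that rises in SAR.
sar = rng.uniform(0.0, 1.0, size=n_stocks)
premium = 0.003 + 0.011 * sar + rng.normal(scale=0.01, size=n_stocks)

# Sort stocks into deciles by SAR; compare top and bottom deciles.
order = np.argsort(sar)
deciles = np.array_split(order, 10)
low = premium[deciles[0]].mean()              # 'ubiquitous' stocks (low SAR)
high = premium[deciles[-1]].mean()            # 'unique' stocks (high SAR)
print(high - low)                             # positive high-minus-low spread
```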

“The average statistical arbitrage risk across all stocks is countercyclical.”

“The statistical arbitrage risk factor is clearly a tradable portfolio. [The figure below] shows the cumulative returns from December 31, 1975 to December 31, 2020 of an initial $100 investment on our statistical arbitrage risk [SAR] factor and other factors, and plots the log cumulative returns.”