The basic idea behind factor models is that the returns of a large range of assets can be explained by exposure to a small set of factors. Returns reflect factor risk premia and price responses to unexpected changes in the factors. The theoretical basis is arbitrage pricing theory, which suggests that securities are susceptible to multiple systematic risks. The statistical toolkit for estimating factor models has grown in recent years. Factors and exposures can be estimated through various types of regressions, principal components analysis, and deep learning, particularly in the form of autoencoders. Factor risk premia can be estimated through two-pass regressions and factor mimicking portfolios. Stochastic discount factors and loadings can be estimated with the generalized method of moments, principal components analysis, double machine learning, and deep learning. Discount factor loadings are particularly useful for checking whether a newly proposed factor adds any investment value.
The summary below is based on quotes from the paper and a few additional sources (linked next to the quote). Headings, italicized text, and text in brackets have been added. Also, mathematical symbols used in sentences in the original paper have been replaced by text for easier readability. This post ties in with this site's summary on statistical methods.
Basics of factor models
“According to a factor model, the return-generating process for a security is driven by the presence of the various common factors and the security’s unique sensitivities to each factor (factor loadings). The common factors may be readily identifiable fundamental factors such as price-earning ratio, size, yield, and growth. Factor models can be used to decompose portfolio risk according to common factor exposure and to evaluate how much of a portfolio’s return was attributable to each common factor exposure.” [CFA]
“Arbitrage pricing theory provides a rigorous economic motivation for factor models. The [theory] describes how statistical factor representations are directly tied to foundational economic concepts, such as risk exposures and risk premia, which govern the risk-return trade-off.”
“Arbitrage pricing theory takes the view that systematic risk need not be measured in only one way…Academic and commercial research suggests that several primary sources of risk consistently impact stock returns. These risks arise from unanticipated changes in investor confidence, interest rates, inflation, real business activity, and a market index. Every stock and portfolio has exposures (or betas) with respect to each of these systematic risks. The pattern of economic betas for a stock or portfolio is called its risk exposure profile. Risk exposures are rewarded in the market with additional expected return, and thus the risk exposure profile determines the volatility and performance of a well-diversified portfolio. The profile also indicates how a stock or portfolio will perform under different economic conditions.” [CFA]
“Factor models are natural workhorses for modelling equity returns because they offer a parsimonious statistical description of returns’ cross-sectional dependence structure.”
“Factor models will continue to be central to empirical asset pricing in coming years…We survey the next generation of factor models with an emphasis on high-dimensional settings and…tools of machine learning. Our recapitulation highlights a recent revival of (highly sophisticated) methodological research into factor modelling in asset markets.”
“The most promising direction for…empirical asset pricing research is…a genuine fusion of economic theory and machine learning…as asset pricing theory revolves around price formation through aggregation of investor beliefs, which undoubtedly enter prices in subtle, complex, and sometimes surprising ways. At the same time, machine learning constitutes a sophisticated quiver of statistical models that flexibly adapt to settings with rich and complex information sets.”
Static and conditional factor models
In a static factor model, asset returns are the sum of (i) expected returns, (ii) factor exposures times unexpected changes in factors, and (iii) asset-specific unexplainable return volatility. Meanwhile, the expected return is equal to the sum of (i) the pricing error (alpha) and (ii) factor loadings times factor risk premia.
“In its simplest form, a static factor model can be written as
return = expected return + betas x factor innovations + idiosyncratic error.
expected return = alpha + betas x factor risk premia”
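As a concrete illustration, the static model above can be simulated and the betas recovered with asset-by-asset time-series regressions. This is a minimal numpy sketch; all dimensions and parameter values are made-up assumptions, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N, K = 500, 10, 2                    # periods, assets, factors (toy sizes)

beta = rng.normal(1.0, 0.5, (N, K))     # true factor exposures
lam = np.array([0.05, 0.02])            # assumed factor risk premia (per period)
f = lam + rng.normal(0.0, 0.2, (T, K))  # factors = premia + innovations
eps = rng.normal(0.0, 0.1, (T, N))      # idiosyncratic errors

r = f @ beta.T + eps                    # returns under the static model (alpha = 0)

# Estimate each asset's betas by a time-series regression on the factors.
X = np.column_stack([np.ones(T), f])
beta_hat = np.linalg.lstsq(X, r, rcond=None)[0][1:].T   # drop the intercept row

print(np.max(np.abs(beta_hat - beta)))  # small estimation error
```

With 500 periods the regression recovers the true exposures closely; shrinking the sample or raising the idiosyncratic volatility widens the estimation error.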
“The most common framework in academic finance literature assumes that factors are known and observable. An example would be industrial production growth…A second framework, which has regained popularity recently…assumes that all factors and their exposures are latent…A third framework assumes factor exposures are observable, but the factors are latent. This is arguably the most prevalent framework for practitioners…The popularity of this model stems from the fact that it conveniently accommodates time-varying exposures of individual equity returns.”
“One might argue that the static model is suitable for certain portfolios…but it is clearly inadequate for most individual assets…Risk exposures of individual stocks very likely change over time…More pointedly, assets with fixed maturities and nonlinear payoff structures (e.g., options and bonds) experience mechanical variation in their risk exposures as their maturity rolls down or the value of the underlying asset changes. In this case, a factor model should accommodate conditional risk exposures.”
“The conditional factor model can be specified [like a static model, but with the factor exposures, i.e. betas, and the factor risk premia changing over time] …Obviously [this model] contains too many degrees of freedom and… cannot be identified without additional restrictions… [A common convention] imposes that the factor exposures are linear functions of a constant vector of exposures…Consequently, the model becomes:
return = time-variant alpha + time-variant latent factor exposures x time-variant latent factors + idiosyncratic error”
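The convention above can be sketched on simulated data: exposures are linear in time-varying firm characteristics, and the factor at each date is then recovered by a cross-sectional regression. For illustration the sketch assumes the mapping Gamma from characteristics to exposures is known; in practice it must itself be estimated (e.g., via instrumental principal components analysis, discussed below). All names and dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
T, N, L, K = 300, 40, 3, 1              # periods, stocks, characteristics, factors
Gamma = rng.normal(0.0, 1.0, (L, K))    # maps characteristics to exposures

f = rng.normal(0.05, 0.2, (T, K))       # latent factor realizations
Z = rng.normal(0.0, 1.0, (T, N, L))     # time-varying firm characteristics
r = np.empty((T, N))
for t in range(T):
    beta_t = Z[t] @ Gamma               # exposures are linear in characteristics
    r[t] = beta_t @ f[t] + rng.normal(0.0, 0.1, N)

# With exposures observable (through characteristics), the factor at each date
# is recovered by a cross-sectional regression of returns on betas.
f_hat = np.empty((T, K))
for t in range(T):
    beta_t = Z[t] @ Gamma
    f_hat[t] = np.linalg.lstsq(beta_t, r[t], rcond=None)[0]

print(np.max(np.abs(f_hat - f)))        # small: factors recovered date by date
```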
Estimating factors and exposure
“In a factor model, the total variance of an asset can be decomposed into a systematic risk component driven by covariances with the factors and a component that is idiosyncratic to the asset. There are many factor modeling strategies available that differ in their assumptions about whether or not factors and their exposures are assumed known, and whether the model uses a conditional or unconditional risk decomposition.
- For a static factor model, if factors are known, we can estimate factor exposures via asset-by-asset time-series regressions…
- For a conditional factor model, if factors are latent, but exposures are observable…we can estimate factors by cross-sectional regressions at each time point…This approach is most commonly used for individual stocks, for which their loadings can be proxied by firm characteristics. It is convenient for the cross-sectional regression to accommodate time-varying characteristics…
- If neither factors nor loadings are known, we can resort to principal components analysis to extract latent factors and their loadings. Principal components analysis can identify factors and their loadings up to some unknown linear transformation…[and] extracts information about latent factors solely from realized return covariances…This decomposition yields a pair of estimates of factor innovations and exposures…Said differently, a rotation of factors and an inverse rotation of betas leaves model fits exactly unchanged. While allowing for latent factors and exposures can add great flexibility to a research project, this rotation indeterminacy makes it difficult to interpret the factors in a latent factor model…The principal components approach is also applicable if some but not all factors are observable.
- A limitation of principal components analysis is that it only applies to static factor models. It also lacks the flexibility to incorporate other data beyond returns. To address both issues [one can] estimate the conditional factor model [in the form of] instrumental principal components analysis…Given conditional betas, factors are estimated from cross-section regressions of returns on betas [which] accommodates a potentially large number of characteristics…Conditional betas can be recovered from panel regressions of returns onto characteristics interacted with factors.
- Deep learning [can be applied] to return factor models [through] a conditional autoencoder to explicitly account for the risk-return trade-off. The machine learning literature has long recognized the close connection between autoencoders and principal components analysis. However, [one can] introduce additional conditioning information into the autoencoder specification. The autoencoder allows betas to depend on stock characteristics in a more realistic, nonlinear way…[The figure below] illustrates the model’s basic structure…On the left side of the network, factor loadings are a nonlinear function of covariates (e.g., firm characteristics), while the right side of the network models factors as portfolios of individual stock returns.”
Note: “Autoencoders [are] a type of algorithm with the primary purpose of learning an ‘informative’ representation of the data that can be used for different applications by learning to reconstruct a set of input observations well enough”. [Michelucci]
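The principal components route in the list above can be sketched in a few lines of numpy on simulated returns (all dimensions and parameters are illustrative). Note that only the span of the loadings is identified, reflecting the rotation indeterminacy mentioned earlier: the estimated columns need not match any single true factor, but together they recover the exposure space.

```python
import numpy as np

rng = np.random.default_rng(1)
T, N, K = 600, 20, 3
beta = rng.normal(0.0, 1.0, (N, K))     # true latent loadings
f = rng.normal(0.0, 1.0, (T, K))        # true latent factors
r = f @ beta.T + rng.normal(0.0, 0.5, (T, N))

# PCA on the return covariance: the top-K eigenvectors estimate the loadings.
r_dm = r - r.mean(axis=0)
cov = r_dm.T @ r_dm / T
eigval, eigvec = np.linalg.eigh(cov)    # eigenvalues in ascending order
loadings_hat = eigvec[:, -K:]           # estimated loadings (up to rotation)
factors_hat = r_dm @ loadings_hat       # estimated factor innovations

# Rotation check: the true beta should lie (nearly) in the span of the
# estimated loadings, even though individual columns differ.
proj = loadings_hat @ np.linalg.lstsq(loadings_hat, beta, rcond=None)[0]
print(np.max(np.abs(proj - beta)))      # small residual outside the span
```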
A quick reminder on how to address the overfitting problem
“The high capacity of a neural network model enhances its flexibility to construct the most informative features from data. With enhanced flexibility, however, comes a higher propensity to overfit.”
“To curb overfitting, the entire sample is typically divided into three disjoint subsamples that maintain the temporal ordering of the data. The first, or ‘training’, subsample is used to estimate the model subject to a specific set of tuning hyperparameter values. The second, or ‘validation’, subsample is used for tuning the hyperparameters. Fitted values are constructed for data points in the validation sample based on the estimated model from the training sample. Next, the objective function is calculated based on errors from the validation sample, and hyperparameters are then selected to optimize the validation objective. The validation sample fits are of course not truly out-of-sample because they are used for tuning, which is, in turn, an input to the estimation. Thus, the third, or ‘testing’ subsample is used for neither estimation nor tuning. It is thus used to evaluate a method’s out-of-sample performance.”
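The three-way chronological split can be written as a small helper. The 60/20/20 proportions here are an illustrative assumption, not a prescription from the paper; the essential point is that the data are never shuffled, so temporal ordering is preserved.

```python
import numpy as np

def temporal_split(X, y, train_frac=0.6, val_frac=0.2):
    """Split (X, y) into chronological train/validation/test subsamples."""
    T = len(X)
    i, j = int(T * train_frac), int(T * (train_frac + val_frac))
    return (X[:i], y[:i]), (X[i:j], y[i:j]), (X[j:], y[j:])

X = np.arange(100).reshape(-1, 1)
y = np.arange(100.0)
(train, val, test) = temporal_split(X, y)
print(len(train[0]), len(val[0]), len(test[0]))  # 60 20 20
```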
“The most common machine learning device for guarding against overfitting is to append a penalty to the objective function in order to favor more parsimonious specifications. This regularization approach mechanically deteriorates a model’s in-sample performance in the hope of improving its stability out-of-sample. This will be the case when penalization manages to reduce the model’s fit of noise while preserving its fit of the signal.”
“In addition to [linear] l1-penalization [one can] employ a second machine learning regularization tool known as ‘early stopping’. By ending the parameter search early, as soon as the validation sample error begins to increase, parameters are shrunken toward the initial guess, for which parsimonious parameterization is often imposed. It is a popular substitute to [quadratic] l2-penalization…because of its convenience in implementation and effectiveness in combatting overfit.”
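A toy sketch of early stopping for a linear model fit by gradient descent: training continues while validation error falls and halts once it deteriorates, which leaves the parameters shrunken toward the zero initial guess. The data, learning rate, and stopping tolerance are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 50
w_true = np.zeros(p)
w_true[:5] = 1.0                                  # sparse true signal
X = rng.normal(size=(n, p))
y = X @ w_true + rng.normal(scale=2.0, size=n)

X_tr, y_tr = X[:120], y[:120]                     # chronological split
X_val, y_val = X[120:], y[120:]

w = np.zeros(p)                                   # parsimonious initial guess
lr, best_val, best_w = 1e-3, np.inf, w.copy()
for step in range(5000):
    grad = X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)
    w -= lr * grad
    val_mse = np.mean((X_val @ w - y_val) ** 2)
    if val_mse < best_val:
        best_val, best_w = val_mse, w.copy()
    elif step > 100 and val_mse > 1.05 * best_val:
        break                                     # validation error rising: stop early
w = best_w
```

Stopping before full convergence acts as implicit regularization, much like an explicit l2 penalty would, because the weights never travel all the way to the overfit least-squares solution.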
Estimating risk premia
“The risk premium of a factor is informative about the equilibrium compensation investors demand to hold risk associated with that factor. One of the central predictions of asset pricing models is that some risk factors, for example, intermediary capital or aggregate liquidity, should command a risk premium: investors should be compensated for their exposure to those factors, holding constant their exposure to all other sources of risk.”
“For tradable factors – such as the market portfolio in the CAPM – estimating risk premia reduces to calculating the sample average return of the factor. This estimate is simple, robust, and requires minimal modeling assumptions. However, many theoretical models are formulated with regard to non-tradable factors – factors that are not themselves portfolios – such as consumption, inflation, liquidity, and so on. To estimate risk premia of such factors it is necessary to construct their tradable incarnations. Such a tradable factor is a portfolio that isolates the non-tradable factor while holding all other risks constant. There are two standard approaches to constructing tradable counterparts of non-tradable factors: two-pass regressions and factor mimicking portfolios…
- The classical two-pass, or Fama-MacBeth, regression requires a model like [a static factor model] with all factors observable. The first time-series pass yields estimates of betas [or factor exposures] using regressions. Then the second cross-sectional pass estimates risk premia via an ordinary least squares (OLS) regression of average returns on the estimated betas…
- Factor mimicking portfolios [are] an inference procedure that regresses realized returns at each time t onto estimated betas…then estimating the risk premium as the time-series average of fitted portfolio returns.”
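The two-pass procedure above is short to sketch in numpy on simulated data. For simplicity the second pass omits an intercept; all dimensions and parameter values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
T, N, K = 1000, 25, 2
beta = rng.uniform(0.5, 1.5, (N, K))              # true exposures
lam = np.array([0.05, 0.03])                      # true factor risk premia
f = lam + rng.normal(0.0, 0.2, (T, K))            # observable factors
r = f @ beta.T + rng.normal(0.0, 0.1, (T, N))

# Pass 1: time-series regression of each asset's returns on the factors.
X = np.column_stack([np.ones(T), f])
beta_hat = np.linalg.lstsq(X, r, rcond=None)[0][1:].T   # (N, K)

# Pass 2: cross-sectional OLS of average returns on the estimated betas.
r_bar = r.mean(axis=0)
lam_hat = np.linalg.lstsq(beta_hat, r_bar, rcond=None)[0]
print(lam_hat)   # close to the true premia
```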
Estimating the stochastic discount factors and loadings
“A factor’s risk premium is equal to its (negative) covariance with the stochastic discount factor (SDF). The SDF is central to the field of asset pricing because, in the absence of arbitrage, covariances with the SDF unilaterally explain cross-sectional differences in expected returns.”
“The vector of SDF loadings is related to mean-variance optimal portfolio weights. SDF loadings and risk premia are directly related through the covariance matrix of the factors, but they differ substantially in their interpretation. The SDF loading of a factor tells us whether that factor is useful in pricing the cross section of returns. For example, a factor could command a nonzero risk premium without appearing in the SDF simply because it is correlated with the true factors driving the SDF. It is thereby not surprising to see many factors with significant risk premia. For this reason, it makes more sense to tame the factor zoo by testing if a new factor has a non-zero SDF loading (or has a non-zero weight in the mean-variance efficient portfolio), rather than testing if it has a significant risk premium…
- Generalized method of moments: The classical approach to estimating SDF loadings is the generalized method of moments…We can formulate a set of moment conditions: The expected values of stochastic discount factors times returns are zero for all periods. The expected factor innovations are zero for all periods…
- Principal components analysis: The absence of near-arbitrage opportunities forces expected returns to (approximately) align with common factor covariances, even in a world where belief distortions can affect asset prices. The strong covariation among asset returns suggests that the SDF can be represented as a function of a few dominant sources of return variation. Principal components analysis of asset returns recovers the common components that dominate return variation…
- Double Machine Learning: A fundamental task facing the asset pricing field today is to bring more discipline to the proliferation of factors…[One can] address this question by systematically evaluating the contribution of individual factors relative to existing factors as well as by conducting appropriate statistical inference in this high-dimensional setting. [In general, with] machine learning methods…both regularization and overfitting cause a bias that distorts inference…A general double machine learning framework [can] mitigate bias and restore valid inference on a low-dimensional parameter of interest in the presence of high-dimensional nuisance parameters…[One can] use this framework to test the SDF loading of a newly proposed factor [for example] via two respective lasso regressions…
- Deep Learning: Since the SDF (when projected onto tradable assets) is spanned by optimal portfolio returns, estimating the SDF is effectively a problem of optimal portfolio formation. A fundamental obstacle to the conventional mean-variance analysis is the low signal-to-noise ratio: expected returns and covariances of a large cross-section of investable assets cannot be learned with high precision. [Recent academic articles] propose an innovative solution to the portfolio optimization problem by directly parametrizing portfolio weights as functions of asset characteristics, then estimating the parameters… [One approach] extends this framework to a more flexible neural network model and optimizes the Sharpe ratio of the portfolio (SDF) via reinforcement learning, with more than 50 features plus their lagged values…[An alternative approach] parametrizes the SDF loadings and weights of test asset portfolios as two separate neural networks, and adopts an adversarial minimax approach to estimate the SDF. Both adopt Long Short-Term Memory (LSTM) models to incorporate lagged time series information from macro variables, firm characteristics, or past returns.
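The distinction between risk premia and SDF loadings can be made concrete with a small population example (all numbers are assumptions). A factor g that is merely correlated with the priced factor f inherits a nonzero risk premium, yet its SDF loading, computed as the mean-variance weights b = Sigma^-1 mu, is exactly zero, so g adds no pricing or investment value.

```python
import numpy as np

# Assumed parameters: f is the priced factor; g = rho*f + noise, with no
# premium of its own beyond what it inherits through correlation with f.
lam_f, sig_f, rho, sig_e = 0.05, 0.20, 0.6, 0.15

mu = np.array([lam_f, rho * lam_f])               # risk premia of (f, g)
Sigma = np.array([[sig_f**2, rho * sig_f**2],
                  [rho * sig_f**2, rho**2 * sig_f**2 + sig_e**2]])

# SDF loadings = mean-variance optimal weights: b = Sigma^{-1} mu.
b = np.linalg.solve(Sigma, mu)
print(b)   # g's loading is zero although its risk premium (0.03) is not
```

This is why testing for a non-zero SDF loading is a stricter hurdle for a newly proposed factor than testing for a significant risk premium.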