Reward-risk timing refers to methods for allocating between a risky market index and a risk-free asset. It combines reward timing, based on expected future returns of the risky asset, and volatility timing, based on recent price volatility. A new paper proposes using random forests, a machine learning method, to estimate both risk premia (return expectations) and optimal lookback windows for volatility estimates. This method allows for non-linear predictor interactions and averages forecasts across a range of simple, valid prediction functions. In an empirical analysis with data going back to 1952, the random forest method for reward-risk timing outperformed other methods and earned significantly higher risk-adjusted returns than a buy-and-hold strategy.

Pinelis, Michael and David Ruppert (2020), “Machine Learning Portfolio Allocation”.

Below are quotes from the paper. Headings, italic text and text in brackets have been added.
The post ties in with this site’s summary on quantitative methods for macro efficiency.

The basics of reward-risk timing

“Reward-risk market timing…models the market price of risk to determine the optimal weights [of] a market index and the risk-free asset…in the portfolio.”

“Expected-return or reward-timing involves adjusting portfolio allocation according to beliefs about future asset returns…This is akin to benchmark timing, the active management decision to vary the managed portfolio’s beta with respect to the benchmark.”

“Volatility- or risk-timing is a newer idea. While there is a wide array of volatility-based portfolio allocation strategies, this paper derives directly from the utility maximization principle a strategy that naturally depends on both the return and volatility. With this methodology, the portfolio weight in the risky asset is inversely proportional to the recent volatility… Changes in volatility over time are not offset by proportional changes in returns.”

“Reward-risk timing is the combination of both return- and volatility-timing. Return timing can be profitable with superior forecasting ability, yet ignoring the risk associated with a high return, for instance, would lead to poor risk-adjusted performance. The incorrect forecasts are not mitigated by their risk. On the other hand, volatility-timing is advantageous if the risk is not compensated fully by the reward, yet there may be cases when in fact the reward overcompensates the risk. Timing the market with the price of risk accounts for the drawbacks of these individual approaches.”
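The combination described above can be illustrated with the textbook mean-variance rule, in which the weight of the risky asset equals the expected excess return divided by risk aversion times variance. This is a minimal sketch and not necessarily the paper's exact specification: the function name, the risk-aversion value and the leverage bounds are assumptions chosen for illustration.

```python
def reward_risk_weight(expected_excess_return, volatility, risk_aversion=4.0,
                       min_weight=0.0, max_weight=1.5):
    """Textbook mean-variance weight of the risky asset: mu / (gamma * sigma^2).

    All defaults are illustrative assumptions, not values taken from the paper.
    """
    if volatility <= 0:
        raise ValueError("volatility must be positive")
    raw = expected_excess_return / (risk_aversion * volatility ** 2)
    # Clip to a leverage constraint (hypothetical bounds).
    return max(min_weight, min(max_weight, raw))


# Same expected reward, different risk: the weight falls as volatility rises.
calm = reward_risk_weight(0.06, 0.15)       # annualized inputs
stressed = reward_risk_weight(0.06, 0.30)
```

With these inputs the weight drops from roughly two thirds to roughly one sixth when volatility doubles, which is the volatility-timing effect. Changing the expected excess return instead moves the weight in the reward-timing direction.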

A methodology

“This paper provides a unifying framework for machine learning applied to both return and volatility-timing.”

“Machine learning methods have been shown to be suitable and advantageous for the difficult task of identifying the regimes in the markets…Taking advantage of the allowance for nonlinear predictor interactions in machine learning models gives better return forecasts and parameter values in a volatility estimator based on market conditions.”

“This paper studies how the machine learning method of Random Forest can forecast the sign of the risk premia with past dividend yields. Then a separate Random Forest model is employed to predict the optimal parameters of a volatility estimator. Specifically, we apply the model to estimate the volatility reference window as a function of lagged volatilities…We propose a dynamic volatility estimator that changes the look-back window length…based on the optimal portfolio weight. To best respond to market conditions, one needs a volatility estimator that itself responds to market conditions as well.”
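The idea of a volatility estimator with a changing look-back window can be sketched as follows. In the paper the window is predicted by a Random Forest from lagged volatilities; here a hand-written rule stands in for that model, so `toy_window_rule`, its thresholds and the window lengths are purely hypothetical.

```python
import math


def realized_volatility(returns, window):
    """Sample standard deviation of the last `window` returns."""
    if window < 2 or window > len(returns):
        raise ValueError("window must be between 2 and len(returns)")
    recent = returns[-window:]
    mean = sum(recent) / window
    variance = sum((r - mean) ** 2 for r in recent) / (window - 1)
    return math.sqrt(variance)


def toy_window_rule(returns, short=3, long=12):
    """Hypothetical rule: use a short window after an unusually large move."""
    long_vol = realized_volatility(returns, long)
    return short if abs(returns[-1]) > 2 * long_vol else long


def dynamic_volatility(returns, choose_window=toy_window_rule):
    """Volatility estimate whose look-back length is chosen by `choose_window`.

    `choose_window` is a placeholder for the paper's second Random Forest; any
    callable that maps the return history to an integer window can be used.
    """
    window = choose_window(returns)
    return realized_volatility(returns, window), window


calm_history = [0.01, -0.01] * 6             # twelve quiet months
shocked_history = calm_history + [-0.08]     # followed by one large loss

calm_vol, calm_window = dynamic_volatility(calm_history)
shock_vol, shock_window = dynamic_volatility(shocked_history)
```

After the large loss the toy rule switches to the short window, so the estimate reacts to the new regime instead of being diluted by eleven quiet months.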

“A Random Forest is an ensemble machine learning algorithm …The prediction by the Random Forest is the majority vote across all the individual decision tree learners… Averaging over predictions reduces the variance and stabilizes the trees’ forecast performance… Random forests give an improvement over bagging [standard forms of bootstrap aggregation] with a variation designed to reduce the correlation among trees grown from different bootstrap samples. If most of the bootstrap samples are similar, the trees trained on these sample sets will be highly correlated…Trees are de-correlated with a method known as ‘random subspace’ or ‘attribute bagging,’ which considers only a random subset of m predictors out of p for splitting at each potential branch…Since each tree is grown with different sets of predictors, the average correlation among trees further decreases and the variance reduction relative to standard bagging is larger.”
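The three ingredients named in the quote (bootstrap samples, a random subset of m out of p predictors, and a majority vote) can be shown in a toy forest of one-split trees. This is a teaching sketch in plain Python under simplifying assumptions: real Random Forests grow deeper trees and redraw the predictor subset at every split, and the data and all function names below are made up for illustration.

```python
import random
from collections import Counter


def fit_stump(rows, labels, features):
    """Depth-1 tree: best threshold split among the candidate `features`."""
    best = None
    for feature in features:
        for threshold in sorted({row[feature] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right or left).most_common(1)[0][0]
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, feature, threshold, left_label, right_label)
    return best[1:]


def fit_forest(rows, labels, n_trees=25, m=1, seed=0):
    """Grow `n_trees` stumps, each on a bootstrap sample and m random predictors."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]    # bootstrap: draw with replacement
        features = rng.sample(range(p), m)            # random subspace: m of p predictors
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], features))
    return forest


def tree_votes(forest, row):
    """Prediction of every individual tree for one observation."""
    return [left if row[f] <= t else right for f, t, left, right in forest]


def predict(forest, row):
    """Majority vote across all trees."""
    return Counter(tree_votes(forest, row)).most_common(1)[0][0]


# Made-up data: two predictors, label 1 when both are positive.
rows = ([(-1.0 - 0.1 * i, -2.0 - 0.1 * i) for i in range(10)]
        + [(1.0 + 0.1 * i, 2.0 + 0.1 * i) for i in range(10)])
labels = [0] * 10 + [1] * 10
forest = fit_forest(rows, labels)
```

Calling `predict(forest, (5.0, 5.0))` returns the majority label, while the share of trees voting for a label in `tree_votes` can be read as a rough class probability.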

“We perform two tasks with machine learning that give the weight of the market index in our portfolio. First, we predict if the market excess return next month will be positive with lagged net payout yields and risk-free rates as the predictor variables. Second, we estimate the prevailing volatility with lagged values for a volatility proxy. The weight of the equity index is proportional to the probability that the next month’s return exceeds that of the risk-free asset and inversely proportional to the volatility estimate. This gives us a series of out-of-sample portfolio returns and corresponding performance metrics. Finally, the same procedure is performed on a holdout set, data that provides a final estimate of the models’ performance after they have been trained and validated, to test against backtest-overfitting.”
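The out-of-sample procedure described above can be sketched as a walk-forward loop in which each month's weight uses only earlier data. The two models passed in are simple placeholders (the historical share of positive months and a trailing standard deviation), not the paper's Random Forests, and `scale`, `min_history` and `max_weight` are assumptions for illustration.

```python
import statistics


def share_positive(history):
    """Placeholder probability model: past frequency of positive excess returns."""
    return sum(r > 0 for r in history) / len(history)


def trailing_vol(history, window=12):
    """Placeholder volatility model: standard deviation of the last `window` months."""
    return statistics.stdev(history[-window:])


def walk_forward_returns(excess_returns, risk_free, prob_model, vol_model,
                         min_history=12, scale=0.05, max_weight=1.5):
    """Monthly portfolio returns where the weight for month t uses data before t.

    Weight is proportional to P(next excess return > 0) and inversely
    proportional to the volatility estimate, then clipped to [0, max_weight].
    """
    portfolio = []
    for t in range(min_history, len(excess_returns)):
        history = excess_returns[:t]                 # information available before month t
        weight = scale * prob_model(history) / vol_model(history)
        weight = max(0.0, min(max_weight, weight))   # leverage constraint
        # w in the index and (1 - w) in the risk-free asset: rf + w * excess return
        portfolio.append(risk_free[t] + weight * excess_returns[t])
    return portfolio


excess = [0.02, -0.01, 0.015, -0.005] * 6            # made-up monthly excess returns
rf = [0.002] * len(excess)                           # made-up monthly risk-free rate
out = walk_forward_returns(excess, rf, share_positive, trailing_vol)
```

Because the weight for month t is computed from `excess_returns[:t]` only, changing a later return cannot alter any earlier portfolio return, which is the property that guards against look-ahead bias.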

Empirical findings

“The strategies begin on January, 1952…It is important that the data that trains a machine learning model is large enough.”

“Reward-risk timing with machine learning provides substantial improvements in investor utility, alphas, Sharpe ratios, and maximum drawdowns, after accounting for transaction costs, leverage constraints, and on a new out-of-sample test set.”

“We find economically and statistically significant gains from using machine learning to dynamically allocate between the market index and the risk-free asset…Our results document that a portfolio allocation strategy that employs machine learning to reward-risk time the market gives a 95% improvement in investor utility and earns a large alpha of 4%…Comparing the performance of linear regression for reward-risk timing, we show that machine learning outperforms by a significant margin.”

“The investments that reward-risk time realize relatively steady gains. The final wealth accumulates to around $1,500 and $500 at the end of the sample for the machine learning and base (expanding sample mean reward estimate and previous month realized volatility risk estimate) strategies, respectively, versus about $400 for the buy-and-hold.”

“An investor who starts with $1 in 2011 and reward-risk times with machine learning achieves outperformance relative to the market and other strategies again. Therefore, the results cannot be easily explained by the particular choice of machine learning model parameters.”

“The risk-adjusted returns from machine learning portfolio allocation are substantially higher than reward-risk timing with no model and the buy-and-hold… All the active strategies outperform the buy-and-hold on a risk-adjusted basis for each out-of-sample period. Reward-risk timing with Random Forest gives the highest Sharpe ratio of 0.60 from 1952-2010, which is a 40% increase from the buy-and-hold. An investor who reward-risk times with machine learning gains more than 2 percentage points on return per year relative to passively investing, without increasing the risk.”
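Two of the performance metrics quoted in this section, the Sharpe ratio and the maximum drawdown, can be computed from a monthly return series as sketched below. The annualization by the square root of 12 is a common convention and is assumed here; the paper's exact calculation may differ, and the function names are illustrative.

```python
import math
import statistics


def annualized_sharpe(monthly_excess_returns):
    """Annualized Sharpe ratio from monthly excess returns (sqrt(12) scaling)."""
    mean = statistics.mean(monthly_excess_returns)
    stdev = statistics.stdev(monthly_excess_returns)
    return math.sqrt(12) * mean / stdev


def max_drawdown(monthly_returns, start_wealth=1.0):
    """Largest peak-to-trough loss of cumulative wealth, as a positive fraction."""
    wealth = peak = start_wealth
    worst = 0.0
    for r in monthly_returns:
        wealth *= 1.0 + r                 # compound the monthly return
        peak = max(peak, wealth)          # running high-water mark
        worst = max(worst, 1.0 - wealth / peak)
    return worst


example_sharpe = annualized_sharpe([0.02, 0.0, 0.02, 0.0])
example_drawdown = max_drawdown([0.10, -0.50, 0.20])
```

As a rough cross-check of the quoted figures, a Sharpe ratio of 0.60 that is 40% above the buy-and-hold would imply a buy-and-hold Sharpe ratio of about 0.43 (0.60 divided by 1.40).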