A new empirical paper provides evidence that the direction of daily returns on Dow Jones Industrial Average stocks has been predictable over the past 15 years, based on conventional short-term factors and out-of-sample selection and forecasting methods. Hit ratios of the best models have been 51-52%. The predictability has been statistically significant and consistent over time. Trading strategies based on the forecasts have produced economically meaningful returns, even after bid-ask spreads. Simple forecasting methods have outperformed more complex machine learning.

*The post ties in with SRSV’s summary on macro information (in)efficiency.*

*The following are excerpts from the paper. Emphasis and italic text have been added.*

### Variables and methods to predict daily equity returns

“We take a comprehensive look at the directional predictability of daily [*equity*] returns. For this purpose, we use a data set consisting of all stocks that were part of the Dow Jones Industrial Average (DJIA) in 1996 and various statistical classification methods [*that select variables and model for forecasting*]… We consider 5-minute data for 30 stocks included in the DJIA on January 1, 1996 and several explanatory variables. The data set is obtained from the Thomson Reuters Tick History data base and ends on January 31, 2017.”

“We…focus on [*predictive*] variables that [i] exhibit meaningful variation on a daily frequency, [ii] are easily available, and [iii] for which a plausible economic argument can be made….This results in the following list of 24 explanatory variables…

- **Realized measures of moments**: log-realized variance, high-low variance, and realized skewness.
- **Financial market indicators**: S&P 500 return, realized betas calculated from S&P 500 5-minute returns, log-realized variance of the S&P 500, VIX level, VIX return, and oil return.
- **Risk aversion indicators**: variance risk premium.
- **Yield curve measures**: level and change of the first principal component (level of the yield curve), second principal component (slope of the yield curve), and third principal component (curvature of the yield curve).
- **Technical indicators**: stock return, 5-day moving average stock return, on-balance volume, 12-day moving average of binary stock returns, momentum indicator, A/O oscillator (difference of 34 and 5 period moving averages), and rate-of-change indicator.”
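To make a few of these variables concrete, here is a minimal sketch of how they could be computed from 5-minute prices and daily data. The excerpts do not give the paper's exact definitions, so the formulas below (log of summed squared intraday returns, a standardized realized-skewness measure, the Parkinson high-low estimator, on-balance volume) are common textbook versions and should be read as assumptions; all function names are hypothetical.

```python
from __future__ import annotations

import math
from typing import Sequence


def log_returns(prices: Sequence[float]) -> list[float]:
    """Log returns between consecutive intraday (e.g. 5-minute) prices."""
    return [math.log(curr / prev) for prev, curr in zip(prices[:-1], prices[1:])]


def log_realized_variance(intraday_returns: Sequence[float]) -> float:
    """Log of the sum of squared intraday returns over one day."""
    return math.log(sum(r * r for r in intraday_returns))


def realized_skewness(intraday_returns: Sequence[float]) -> float:
    """Sum of cubed intraday returns, standardized by realized variance (assumed form)."""
    n = len(intraday_returns)
    rv = sum(r * r for r in intraday_returns)
    return math.sqrt(n) * sum(r ** 3 for r in intraday_returns) / rv ** 1.5


def high_low_variance(high: float, low: float) -> float:
    """Parkinson range estimator of daily variance (assumed as the high-low measure)."""
    return math.log(high / low) ** 2 / (4.0 * math.log(2.0))


def moving_average(values: Sequence[float], window: int) -> float:
    """Mean of the last `window` values, e.g. the 5-day average stock return."""
    return sum(values[-window:]) / window


def binary_return_average(daily_returns: Sequence[float], window: int = 12) -> float:
    """Share of up days among the last `window` days (moving average of binary returns)."""
    return sum(1.0 for r in daily_returns[-window:] if r > 0) / window


def on_balance_volume(closes: Sequence[float], volumes: Sequence[float]) -> float:
    """Cumulative volume, added on up days and subtracted on down days."""
    obv = 0.0
    for prev, curr, vol in zip(closes[:-1], closes[1:], volumes[1:]):
        if curr > prev:
            obv += vol
        elif curr < prev:
            obv -= vol
    return obv
```

Each function returns one day's value of a variable; in a forecasting model these values would enter with a one-day lag.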

“With 24 possible explanatory variables and a low signal-to-noise ratio, __we require a model selection procedure to obtain a more parsimonious model__. In general, a model selection procedure consists of two components, a goodness-of-fit criterion to evaluate the performance of each candidate model, and a rule that defines which models are considered as candidate models.”
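The two components map naturally onto code: the rule generates candidate models by adding one variable at a time, and the criterion scores each candidate. A minimal sketch of greedy forward selection with a pluggable criterion follows. In the paper the criterion is cross-validated on the 1996-2003 subsample; in the usage below a made-up additive scoring function stands in for it, and the variable names are hypothetical.

```python
from __future__ import annotations

from typing import Callable, Iterable, Optional


def forward_selection(
    candidates: Iterable[str],
    criterion: Callable[[frozenset], float],
    max_vars: Optional[int] = None,
) -> tuple[list[str], float]:
    """Greedy forward selection: repeatedly add the variable that most improves the criterion."""
    selected: list[str] = []
    remaining = list(candidates)
    best_score = criterion(frozenset())  # score of the empty (intercept-only) model
    while remaining and (max_vars is None or len(selected) < max_vars):
        # Candidate models: the current selection plus one additional variable.
        scores = {v: criterion(frozenset(selected + [v])) for v in remaining}
        best_var = max(scores, key=scores.get)
        if scores[best_var] <= best_score:
            break  # no candidate improves the goodness-of-fit criterion
        selected.append(best_var)
        remaining.remove(best_var)
        best_score = scores[best_var]
    return selected, best_score


# Hypothetical stand-in for a cross-validated hit rate.
gains = {"vix_level": -0.001, "sp500_ret_lag": 0.010, "oil_ret": -0.002, "own_ret_lag": 0.006}


def toy_criterion(subset: frozenset) -> float:
    return 0.5 + sum(gains[v] for v in subset)


chosen, score = forward_selection(gains.keys(), toy_criterion)
```

With the toy scores, the procedure picks the lagged S&P 500 return first, then the stock's own lagged return, and stops because neither remaining variable improves the score.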

“The __classification methods include logistic regression, generalized additive models, neural networks, support vector machines, random forests, and boosted classification trees__. For each method, the relevant explanatory variables are selected in the subsample from 1996 to 2003 based on a forward selection procedure that utilizes cross-validation techniques. Subsequently, the predictive performance of the selected models is evaluated in an out-of-sample environment for the period from 2004 to 2017, where each model is re-estimated in a rolling window to generate one-step-ahead forecasts. Since the model selection and the forecasting period are strictly separated, the procedure mimics the situation a forecaster would face in real time.”
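For the logistic regression, which turned out to be the best classifier, the out-of-sample loop can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the fit is a hand-rolled Newton-Raphson with a small ridge penalty for numerical stability, and the 250-day window and the synthetic data are assumptions (the excerpts do not state the window length).

```python
import numpy as np


def fit_logit(X: np.ndarray, y: np.ndarray, ridge: float = 1e-2, n_iter: int = 25) -> np.ndarray:
    """Logistic regression fitted by Newton-Raphson with a small ridge penalty."""
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))
        w = p * (1.0 - p)
        grad = X1.T @ (y - p) - ridge * beta
        hess = X1.T @ (X1 * w[:, None]) + ridge * np.eye(X1.shape[1])
        beta = beta + np.linalg.solve(hess, grad)  # Newton step on the penalized log-likelihood
    return beta


def rolling_one_step_forecasts(X: np.ndarray, y: np.ndarray, window: int) -> np.ndarray:
    """Re-estimate on the last `window` days, then forecast P(up) for the next day.

    Row t of X must hold only information available before day t's return
    (i.e. lagged explanatory variables); y[t] is 1 if day t's return is positive.
    """
    probs = []
    for t in range(window, len(y)):
        beta = fit_logit(X[t - window:t], y[t - window:t])
        x_new = np.concatenate([[1.0], X[t]])
        probs.append(1.0 / (1.0 + np.exp(-x_new @ beta)))
    return np.array(probs)


# Synthetic, deliberately easy data: one predictor that is known before each day's return.
rng = np.random.default_rng(0)
n = 600
X = rng.standard_normal((n, 1))
y = (X[:, 0] + 0.5 * rng.standard_normal(n) > 0).astype(float)
probs = rolling_one_step_forecasts(X, y, window=250)
hit = np.mean((probs > 0.5) == (y[250:] == 1))
```

Because every estimation window ends the day before the forecast, no information from the forecast day leaks into the fit. The synthetic data are far more predictable than real daily returns and only check that the loop works.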

“The size of the selected models varies between 7 variables for the logistic regression and 13 variables in the random forest. __All of the selected models contain the lagged S&P 500 return as well as the lagged return of the respective stock itself__. Furthermore, the technical indicators 5-day moving average return and the A/O oscillator are included by all models, except for the boosted classification tree… The variables not included in [*the table below*] were not selected by any of the models.”

### Evidence of predictability

“Directional predictability on a daily frequency exists, it is of a magnitude that is statistically significant, and it is consistent over time.”

“The hit rate is…the proportion of returns that are correctly classified…__Among the classifiers, the logistic regression achieves the highest hit rate with 51.99 percent, followed by the generalized additive model with 51.35 percent__. The forecast with the lowest out-of-sample hit rate is the random forest with a hit rate of 50.51 percent…Benchmarks obtain hit rates between 50.03…and 50.86 percent.”
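A hit rate only a percentage point or two above 50% can be significant because the sample is large. A minimal sketch of the calculation, assuming a simple one-sided binomial test against 50% with a normal approximation (the excerpts do not say which test the authors used):

```python
from __future__ import annotations

import math
from typing import Sequence


def hit_rate(predicted_up: Sequence[bool], actual_up: Sequence[bool]) -> float:
    """Share of days on which the predicted direction matched the realized one."""
    hits = sum(p == a for p, a in zip(predicted_up, actual_up))
    return hits / len(actual_up)


def hit_rate_z_test(hits: int, n: int, p0: float = 0.5) -> tuple[float, float, float]:
    """One-sided test of H0: hit rate = p0 against H1: hit rate > p0.

    Normal approximation to the binomial; treats all forecasts as independent.
    """
    h = hits / n
    z = (h - p0) / math.sqrt(p0 * (1.0 - p0) / n)
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail standard normal probability
    return h, z, p_value


h, z, p = hit_rate_z_test(5200, 10000)  # hypothetical: 52% hits over 10,000 forecasts
```

Two caveats apply. Forecasts for 30 stocks on the same day are correlated, so the effective sample is smaller than the raw count and this test overstates significance. And since the benchmarks themselves exceed 50%, a comparison against the best benchmark is more demanding than a comparison against a coin flip.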

“The directional __predictability of daily stock returns is consistent over time and not restricted to recessions, as seems to be the case for level predictability at lower frequencies__…The parametric logistic regression model generates the best predictions, whereas non-parametric machine learning techniques are too flexible and too prone to overfitting.”

### Evidence of profitability

“In addition to the statistical significance of the results, we also consider their economic significance. For this purpose, we propose trading strategies that are suitable to exploit directional predictability… __predictability is shown to be…of a magnitude that is economically meaningful so that it can be exploited by suitable trading strategies__.”

“Directional forecasting is an attempt to time the market. It is therefore obvious to buy stocks which are expected to have positive returns and to sell stocks which are expected to have negative returns. This is the __basis for a long-short equity strategy that trades a value neutral portfolio__ with zero net-investment and where the market risk is hedged.”

“The strategy is then implemented as follows:

- For each day, sort all stocks in ascending order according to the predicted probabilities [*of a positive return*].
- Form pairs of stocks so that the stock with the lowest probability of a positive return and the stock with the highest probability are matched together, the stocks with the second-lowest and second-highest probabilities are matched together, and so on.
- For all pairs where the difference between the probabilities of a positive return is at least a specified threshold, buy the stock that is more likely to go up and sell the stock that is less likely to go up.”
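The three steps translate into a few lines of code. In this minimal sketch, the threshold value, the ticker labels, and the return convention (each pair is one unit long and one unit short, and the day's return is the equal-weighted average over active pairs) are assumptions for illustration.

```python
from __future__ import annotations


def select_pairs(probs: dict[str, float], min_diff: float) -> list[tuple[str, str]]:
    """Return (buy, sell) pairs from the ranking-and-matching rule."""
    ranked = sorted(probs, key=probs.get)  # ascending P(positive return)
    pairs = []
    for i in range(len(ranked) // 2):  # with an odd count, the median stock is left out
        low, high = ranked[i], ranked[-1 - i]
        if probs[high] - probs[low] >= min_diff:
            pairs.append((high, low))  # buy the likelier riser, sell the other
    return pairs


def long_short_return(pairs: list[tuple[str, str]], returns: dict[str, float]) -> float:
    """Equal-weighted return of the active value-neutral pairs (0 if nothing is traded)."""
    if not pairs:
        return 0.0
    return sum(returns[buy] - returns[sell] for buy, sell in pairs) / len(pairs)


# Hypothetical predicted probabilities for five stocks on one day.
probs = {"A": 0.48, "B": 0.53, "C": 0.50, "D": 0.505, "E": 0.49}
pairs = select_pairs(probs, min_diff=0.02)
day_return = long_short_return(pairs, {"A": -0.01, "B": 0.02, "C": 0.0, "D": 0.0, "E": 0.0})
```

Here only the extreme pair (buy B, sell A) clears the 2-point threshold; the second pair (D versus E) differs by 1.5 points and is not traded.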

“To account for trading costs… we consider actual [*time-varying*] bid-ask spreads that account for the largest proportion of transaction costs.”
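One simple way to charge spreads against a pair trade is sketched below. It assumes that each leg is opened and closed within the holding period at quoted prices, paying half the spread on entry and half on exit, so each leg costs its full relative spread. The excerpts do not show the paper's exact cost treatment, and the function names are hypothetical.

```python
def relative_spread(bid: float, ask: float) -> float:
    """Quoted bid-ask spread as a fraction of the mid-quote."""
    mid = 0.5 * (bid + ask)
    return (ask - bid) / mid


def net_pair_return(
    ret_long: float, ret_short: float, spread_long: float, spread_short: float
) -> float:
    """Pair return after paying the full relative spread on each leg (round trip)."""
    return (ret_long - ret_short) - (spread_long + spread_short)


s = relative_spread(99.95, 100.05)  # a 10-cent spread on a $100 stock
net = net_pair_return(0.02, -0.01, s, s)
```

With a 10 basis point spread on each leg, a gross pair return of 3% shrinks to 2.8%. For the small daily edges implied by 51-52% hit rates, this is why the probability threshold matters: pairs with weak signals may not cover their costs.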

“As a benchmark, we report performance measures for buying and holding the S&P 500. This benchmark doubled between 2004 and 2017 and is closely related to the optimist forecast, since an investor that predicts a positive return for every stock and every trading day could simply buy and hold the index portfolio.”
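For scale, a doubling over roughly 13 years (2004 to early 2017; the exact dates, and whether dividends are included, are not stated in the excerpt) corresponds to a compound annual growth rate of about 5.5%:

$$
(1+g)^{T} = 2 \quad\Longrightarrow\quad g = 2^{1/T} - 1 \approx 2^{1/13} - 1 \approx 0.055
$$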

“Even after accounting for bid-ask spreads…__all performance measures indicate [*positive returns and*] superior performance of the trading strategy compared to the benchmark__… the cross section of the DJIA data set considered is relatively small. It is therefore likely that the trading performance can be further improved by considering a larger asset universe.”

### Rejecting the efficient market hypothesis

“There is a consensus that daily stock returns are unpredictable…The __efficient market hypothesis requires that asset prices fully reflect all publicly available information at all times__. Price changes [*are assumed to*] reflect the arrival of new information, which is unpredictable by definition. This gives rise to the random walk hypothesis for the level of prices…Theoretical arguments are supported by the empirical findings in the [*academic*] literature. Research along these lines culminated in the Nobel prize being awarded to Eugene Fama in 2013 for ‘showing that asset prices are extremely hard to predict in the short term’.”

“__Our findings are in clear contradiction to the random walk hypothesis__. However, in its weakest form, the efficient market hypothesis allows for deviations from the random walk, as long as these cannot be exploited due to transaction costs. Statistical significance is therefore not equivalent to an economically meaningful violation of the efficient market hypothesis. Nevertheless, even after accounting for transaction costs, we observe significant alphas.”
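An alpha is the intercept of a regression of strategy returns on risk-factor returns. The excerpts do not say which factor model the authors used, so this minimal sketch assumes a single-factor (CAPM-style) regression of daily strategy returns on S&P 500 returns, with classical OLS standard errors rather than heteroskedasticity- or autocorrelation-robust ones. The return series are made up.

```python
from __future__ import annotations

import math
from typing import Sequence


def capm_alpha(strategy: Sequence[float], market: Sequence[float]) -> tuple[float, float, float]:
    """OLS of strategy returns on market returns; returns (alpha, beta, t-stat of alpha)."""
    n = len(strategy)
    mx = sum(market) / n
    my = sum(strategy) / n
    sxx = sum((x - mx) ** 2 for x in market)
    sxy = sum((x - mx) * (y - my) for x, y in zip(market, strategy))
    beta = sxy / sxx
    alpha = my - beta * mx
    resid = [y - alpha - beta * x for x, y in zip(market, strategy)]
    s2 = sum(e * e for e in resid) / (n - 2)  # residual variance
    se_alpha = math.sqrt(s2 * (1.0 / n + mx * mx / sxx))  # classical OLS standard error
    return alpha, beta, alpha / se_alpha


# Hypothetical daily returns: 0.1% alpha, 0.5 market beta, small noise.
market = [0.01, -0.02, 0.015, -0.005, 0.0, 0.02]
noise = [0.0001, -0.0001, 0.0002, -0.0002, 0.0001, -0.0001]
strategy = [0.001 + 0.5 * m + e for m, e in zip(market, noise)]
alpha, beta, t_alpha = capm_alpha(strategy, market)
```

A strategy that is value-neutral by construction should show a beta near zero, so a significant alpha then reflects returns that market exposure does not explain.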