Dynamically Retraining Models for Stock Forecasting

  • By Paul Wilcox
  • December 18

Erez Katz, CEO and Co-founder of Lucena Research 

Yogi Berra, the famous baseball-playing philosopher, once said: "It's tough to make predictions, especially about the future."

Indeed, making accurate predictions is exponentially more difficult when attempting to forecast the financial markets. Here are just some of the specific challenges with financial forecasting:

  • Data is naturally noisy.
  • Data can at times be intentionally deceiving in order to obfuscate other investors' true intentions.
  • Markets can be parabolic, especially when the outcome is subject to unpredictable swings in human psychology such as sentiment, emotion, fear, and greed.
  • Most importantly, most non-financial models are trained to solve a stationary problem. The financial markets, by contrast, are dynamic in nature, varying significantly across market regimes.

Adjusting AI Models

To cope with this non-stationarity, AI models must learn to adjust in order to accommodate new market conditions. Intuitively, changes in market regimes should yield new models to drive new investment objectives. For example, let's consider a risk-on/risk-off scenario.

A risk-on scenario would naturally benefit from models set to maximize return at the expense of higher volatility. Conversely, a risk-off scenario would target low volatility and capital preservation. Factors and models that have worked well in the past may no longer apply, and a new set of multi-factor models is needed for the strategy to take effect.

Three common methods for combating the non-stationary nature of the market:

  • Expert voting ensembles use a collection of uncorrelated models, each trained for a different market regime. All models cast their votes toward a common securities basket, selecting constituents they believe present the highest return at the highest confidence score. The constituents with the highest vote count are selected to enter the portfolio. It's important to note that not all votes are equal: each vote is assigned a weight based on the model's historical accuracy. This mechanism creates the dynamics by which models come in and out of relevance based on the applicable market conditions (a minimal voting sketch follows this list).
  • Roll forward retraining logic allows the model to retrain from scratch on a predetermined time interval, or on demand when the model falls below a certain performance threshold.
  • A combination of the above two approaches pairs expert voting ensembles with roll forward retraining to ensure the most robust and reactive set of logic.
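
For intuition, here is a minimal Python sketch of the weighted-voting idea. The RegimeModel interface implied here (a predict_scores method and a historical_accuracy attribute) is an illustrative assumption, not QuantDesk's actual implementation.

```python
from collections import defaultdict

def vote(models, date, basket_size=10):
    """Aggregate weighted votes from regime-specific expert models."""
    tally = defaultdict(float)
    for model in models:
        weight = model.historical_accuracy      # e.g. trailing hit rate in [0, 1]
        for ticker, confidence in model.predict_scores(date).items():
            # Each vote is scaled by the model's past accuracy, so models suited
            # to the current regime naturally dominate the tally.
            tally[ticker] += weight * confidence
    # Constituents with the highest weighted vote count enter the portfolio.
    return sorted(tally, key=tally.get, reverse=True)[:basket_size]
```
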
Creating Your Model Retraining Logic

In order to make the retraining logic work well both historically and perpetually, one must establish the mechanism that guides the roll forward logic in advance. This process relies on hyperparameter optimization, in essence conducting a series of mini-tests to validate various configuration parameters (a rough sketch follows the list below), including:

  • Look-back window
  • Retraining frequency
  • Forecasting period
  • and much more
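
For a rough sense of how such mini-tests might be wired together, here is a hedged Python sketch of a roll-forward search over those parameters. The train_model and score_model callables, the price DataFrame, and the candidate grids are placeholder assumptions, not QuantDesk's actual pipeline.

```python
import itertools
import pandas as pd

LOOKBACK_DAYS      = [252, 504]   # look-back window candidates (trading days)
RETRAIN_EVERY_DAYS = [63, 126]    # retraining frequency candidates
FORECAST_DAYS      = [21]         # forecasting period candidates

def walk_forward(prices: pd.DataFrame, train_model, score_model):
    """Score each hyperparameter combination on a roll-forward basis."""
    results = []
    for lookback, retrain_every, horizon in itertools.product(
            LOOKBACK_DAYS, RETRAIN_EVERY_DAYS, FORECAST_DAYS):
        scores = []
        # Step through history, retraining on each look-back window and
        # scoring the model only on the unseen period that follows it.
        for start in range(lookback, len(prices) - horizon, retrain_every):
            train = prices.iloc[start - lookback:start]
            test = prices.iloc[start:start + horizon]
            model = train_model(train)
            scores.append(score_model(model, test))
        if scores:
            results.append({
                "lookback": lookback,
                "retrain_every": retrain_every,
                "horizon": horizon,
                "avg_score": sum(scores) / len(scores),
            })
    # Best configuration first; it seeds the production roll-forward schedule.
    return sorted(results, key=lambda r: r["avg_score"], reverse=True)
```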

This process requires a lot of computing power and carefully crafted code to sift through and score the outcome of each configuration. Consequently, the challenge remains how to empower the end user (the investment professional) with all these capabilities without overwhelming them. QuantDesk, our cloud-based quant research platform, is set to address exactly that.

 


Simulation of a rolling window in a backtest and on a roll forward basis in production.

 

Enabling Retraining Logic on QuantDesk

While the hyperparameter tuning and retraining concepts above have been used by our framework for quite some time, we have recently exposed this powerful retraining functionality to front-end, non-quant users on QuantDesk.

Our goal is to strike a happy medium between two conflicting aims: give the user the flexibility to control the retraining logic while keeping the UI accessible and easy to understand.

On QuantDesk, users have the ability to control:

  • Which feature pool can be used for feature selection during retraining
  • Which features the model must use (human discretion driven)
  • How often retraining should occur
  • Some of the AI / retraining parameters, such as:
     • Max number of features per model
     • Min retention threshold, used to dictate how specific the model should be

QuantDesk Event Study with Retraining

QuantDesk deploys a special classifier set to construct a multi-factor event scan that is designed to identify stocks most likely to move higher (or lower if we were to go short) relative to their peers or a benchmark.

 


An example of an event study using three factors to identify stocks from the Russell 1000 that are likely to outperform the S&P 500 (the benchmark).

The three factors used in the scan are the product of a machine learning classifier that determines which factors with certain min/max thresholds will produce a list of securities most likely to outperform the S&P 500.

It's important to note that the training period for this model is 1/1/2010 to 12/31/2012 (three years). The caveat is that we've produced a static model based on this specific timeframe. But what if the market conditions in a new, unknown period (2013, for example) are drastically different from the market in the training period?

It would be nice to recreate relevant models (or re-identify the factors and their corresponding min/max thresholds) every three months (for example) in order to accommodate the most recent market regime.
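
To make the factor-scan idea concrete, here is an illustrative Python sketch of screening securities against per-factor min/max thresholds. The factor names and cutoffs are invented for the example; the actual factors and thresholds are selected by the machine learning classifier.

```python
import pandas as pd

# Hypothetical factor bounds: (min, max), with None meaning "no bound".
thresholds = {
    "momentum_3m":    (0.05, None),   # at least 5% three-month momentum
    "debt_to_equity": (None, 1.5),    # at most 1.5x leverage
    "earnings_yield": (0.03, 0.15),   # between 3% and 15%
}

def scan(factors: pd.DataFrame) -> pd.Index:
    """Return the tickers (rows) that satisfy every factor's min/max bounds."""
    mask = pd.Series(True, index=factors.index)
    for column, (lo, hi) in thresholds.items():
        if lo is not None:
            mask &= factors[column] >= lo
        if hi is not None:
            mask &= factors[column] <= hi
    return factors.index[mask]
```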

 


An example of an event study with retraining enabled, set to identify stocks from the Russell 1000 that are likely to outperform the S&P 500 (the benchmark).

As can be seen in the image above, the retraining checkbox is selected, opening a new panel with the following retraining configuration options (identified numerically in the image; a hypothetical configuration sketch follows the list):

  1. Retrain checkbox – enables retraining and displays the additional retraining configuration
  2. Frequency – sets how often retraining should be conducted
  3. Max indicators – sets the maximum number of factors per model
  4. Retention – sets the minimum number of signals that the model should support during retraining; this dictates how stringent or relaxed the model is
  5. Feature group selection – selects which group of features the model should consider
  6. Freeze panel – forces the model to use certain features and their corresponding min/max thresholds. This option allows the user to force the engine to include a feature with or without its min/max threshold, allowing human discretion to override the machine's logic.
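
As an illustration only, the options above could be captured in a configuration object along these lines; the field names and defaults are assumptions for the sketch, not QuantDesk's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class RetrainingConfig:
    enabled: bool = True               # 1. Retrain checkbox
    frequency_months: int = 3          # 2. How often to retrain
    max_indicators: int = 3            # 3. Max factors per model
    min_retention: int = 50            # 4. Min signals the model must support
    feature_groups: List[str] = field(default_factory=lambda: ["sentiment"])  # 5.
    # 6. Freeze panel: feature -> (min, max); None means no bound is forced.
    frozen_features: Dict[str, Tuple[Optional[float], Optional[float]]] = field(
        default_factory=dict)

# Example: retrain quarterly, cap models at 3 factors, and force one feature.
config = RetrainingConfig(frozen_features={"momentum_3m": (0.05, None)})
```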

Once the above configuration has been set, QuantDesk applies the retraining logic to construct models dynamically per the configuration guidelines. This process occurs during backtests and continues into the future through our Model Portfolios.

Below is a backtest performance chart of a strategy predicated on signals from one of our consumer purchasing intent providers, Cognovi Labs, compared against the XRT (S&P Retail SPDR ETF) as a benchmark.

 


A dynamic backtest predicated on consumer intent and emotion data signals (provided by Cognovi Labs) compared to XRT. Past performance is not indicative of future returns.

As you can see, the results are very compelling. It’s also important to note that it’s much harder to overfit with dynamic roll forward models. This is because overfitting will result in clear degradation in performance as soon as the model attempts to forecast out of sample.

It is statistically improbable to stumble upon a model that performs well out of sample time and time again, sequentially. In other words, the retrained models' results are substantially more reliable and production-ready.

Lastly, as you can glean from the image below, QuantDesk provides full transparency on each model, when it was used, and which specific factors and thresholds it used.

 


Model information delineates which factors were used at any given time. As you can see, the models vary in feature composition and min/max thresholds for different time frames.

 

Utilizing Dynamic Retraining for Quantitative Investing

Dynamic retraining is an important aspect of AI & Big Data for quantitative investing. Dynamic retraining provides more reliable models that can be tested historically and carried forward on QuantDesk without writing a single line of code. QuantDesk is a user-friendly platform that empowers investment professionals to form and test investment strategies. You can get your free trial of QuantDesk here.

 

Questions about dynamically retraining models, machine learning, or QuantDesk? Drop them below or contact us.
