Why You Should Be Using A Genetic Algorithm for Feature Selection

  • By Paul Wilcox
  • January, 18
  • Erez Katz, CEO and Co-founder of Lucena Research

    How a Genetic Algorithm (GA) Can Benefit Feature Selection

       

    Our goal at Lucena is to democratize some of the best kept secrets in the Financial industry and refute the “black-box” image associated with Machine Learning. In that spirit, I wanted to share with you an important process by which features are designated as most relevant to a particular asset universe and an investment strategy.

    Feature Selection for Investment Strategies

    As we’ve incorporated multiple big data sources and datasets into our investment research platform QuantDesk, our master feature database has grown to almost 1,000 indicators. Indicators are data elements that describe a security at a point in time. Examples of indicators can be found in the chart below.

     [Image: Feature selection for investment strategies]

     

    A factor, also called a feature, is a quantitative attribute that describes a security at a given point in time.

    With such a rich array of data points, we often struggle with deciding which indicators / models are most relevant at any particular time. Not all indicators are created equal, nor are they designed to be predictive at all times.

    An effective Machine Learning algorithm “knows” how to adjust dynamically to environmental or idiosyncratic changes during the feature selection process. A Genetic Algorithm (GA) is one technique that brings a systematic, repeatable process to feature selection and helps distinguish predictive signals from noise.

    What is a Genetic Algorithm in the Context of Big Data and Machine Learning?

    A genetic algorithm (GA) is a problem-solving method that mimics the process of natural selection. When utilizing Machine Learning for investment decisions, the factors most relevant to your needs can be filtered from a wide list of indicators by replicating the process of natural evolution. The only difference is that rather than dealing with DNA and chromosomes, we are dealing with indicators and multi-factor models.

    Survival of the Fittest: How to Form the Best Feature Selection Method

    The goal is to identify “nuggets.”  A nugget is a multi-factor model composed of multiple indicators and their respective min/max values that together form a filter geared to identify the securities most prone to move predictably in the future. Here is an example of a multi-factor model. 

    [Image: Multi-Factor model for machine learning]

    We can easily run a fitness function (an event study, for example) to assess how predictive these conditions were historically. For example, let’s travel back in time (say, 1/1/2011 to 12/31/2011) and assess the average price move 20 days after certain stocks met the following conditions:

    • Gross margins are between 45% and 85%
    • PE ratio is between 15 and 25
    • Beta is between 0.75 and 1.5
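The three conditions above can be read as a simple range filter. The following is a minimal sketch, assuming a nugget is stored as a mapping from indicator name to a (min, max) pair; the indicator names, tickers, and values are made up for illustration and are not taken from QuantDesk.

```python
# Hypothetical nugget: indicator name -> (min, max) range. Names are illustrative.
nugget = {
    "gross_margin": (0.45, 0.85),
    "pe_ratio": (15.0, 25.0),
    "beta": (0.75, 1.5),
}

# Made-up snapshot of indicator values for three fictional tickers on one date.
snapshot = {
    "AAA": {"gross_margin": 0.60, "pe_ratio": 18.0, "beta": 1.10},
    "BBB": {"gross_margin": 0.30, "pe_ratio": 22.0, "beta": 0.90},
    "CCC": {"gross_margin": 0.70, "pe_ratio": 40.0, "beta": 1.20},
}


def matches(nugget, indicators):
    """Return True when every indicator in the nugget falls inside its range."""
    return all(
        name in indicators and low <= indicators[name] <= high
        for name, (low, high) in nugget.items()
    )


def screen(nugget, snapshot):
    """Return the tickers whose indicator values satisfy the whole nugget."""
    return [ticker for ticker, values in snapshot.items() if matches(nugget, values)]


matched = screen(nugget, snapshot)
print(matched)  # only AAA passes all three ranges
```

Each date on which a security passes the filter would count as one “event” for the event study.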

     

    [Image: Event driven investment strategy]

     

    Using the Event Analyzer, the event date represents the date on which certain securities satisfied the multi-factor (nugget) criteria. The cone represents the standard deviation of the price action of the universe of matching stocks after the “event” took place.

    The bold line is the price prediction based on the mean. A fitness function would normally reward a more defined (biased) mean line combined with a narrower cone (smaller variance, as measured by the standard deviation).
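One way to turn “a defined mean with a narrow cone” into a single number is to divide the size of the mean move by its standard deviation. This is only a sketch under that assumption; the ratio, the function name, and the sample returns are illustrative and not the scoring the Event Analyzer actually uses.

```python
from statistics import mean, stdev


def fitness(forward_returns):
    """Score a nugget from the returns observed a fixed horizon after each event.

    Rewards a mean far from zero (a defined direction) and penalizes a wide
    cone (large standard deviation). The ratio itself is an assumption.
    """
    if len(forward_returns) < 2:
        return 0.0  # too few events to estimate a spread
    spread = stdev(forward_returns)
    if spread == 0:
        return 0.0  # degenerate sample; avoid dividing by zero
    return abs(mean(forward_returns)) / spread


# Made-up 20-day returns: both samples share the same mean, but one cone is wider.
tight = [0.04, 0.05, 0.06, 0.05]
noisy = [0.30, -0.20, 0.25, -0.15]
print(fitness(tight) > fitness(noisy))
```

With the same average move, the tighter sample scores higher, which matches the preference for a narrower cone.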

    Now let’s dive into the GA process. 

    What does the Genetic Algorithm process do? Two things: 

    – Identifies which indicators to combine into a nugget.

    – Measures the fitness score of the nugget.

Here is the Genetic Algorithm process step-by-step:

Step 1: Generate random population. (Indicators are represented by letters.)

[Image: build a genetic algorithm]
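Step 1 might be sketched as follows, assuming each nugget is a fixed-size set of distinct indicators drawn uniformly at random. The population size, nugget size, and letter names are illustrative choices, and the min/max ranges are left out to keep the sketch short.

```python
import random
import string


def random_population(indicators, population_size, nugget_size, rng):
    """Build nuggets by sampling distinct indicators uniformly at random."""
    return [
        tuple(sorted(rng.sample(indicators, nugget_size)))
        for _ in range(population_size)
    ]


indicators = list(string.ascii_uppercase[:8])  # "A" through "H"
rng = random.Random(42)  # fixed seed only so the sketch is repeatable
population = random_population(indicators, population_size=6, nugget_size=3, rng=rng)
print(population)
```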

 

Step 2: Evaluate each nugget based on a fitness function.

[Image: How to form feature selection for investment strategies]

 

Step 3: Sort the nuggets based on their fitness score.

[Image: feature selection for machine learning]

 

Step 4: The best two nuggets survive to participate in the next evolution.

[Image: How to build a genetic algorithm for feature selection]
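Steps 3 and 4 amount to a sort followed by keeping the top two. A minimal sketch, assuming nuggets are tuples of indicator letters and using made-up scores in place of a real event-study fitness function:

```python
def rank(population, fitness):
    """Return nuggets sorted from highest to lowest fitness score."""
    return sorted(population, key=fitness, reverse=True)


def survivors(population, fitness, keep=2):
    """Keep the top `keep` nuggets unchanged for the next generation."""
    return rank(population, fitness)[:keep]


# Made-up scores standing in for a real fitness evaluation.
scores = {
    ("A", "B", "C"): 0.8,
    ("B", "D", "E"): 1.4,
    ("C", "F", "G"): 0.3,
    ("A", "D", "H"): 1.1,
}
elite = survivors(list(scores), scores.get)
print(elite)  # [('B', 'D', 'E'), ('A', 'D', 'H')]
```

Carrying the best nuggets forward unchanged is commonly called elitism; it guarantees the best score never gets worse from one generation to the next.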

 

Step 5: Form the next generation of nuggets by selecting nuggets randomly. This time, however, we favor the indicators that scored higher in the previous evolution’s fitness evaluation. 

[Image: building a genetic algorithm for investment strategies]
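Step 5 could be sketched as weighted random sampling. The assumption here is that each indicator is credited with the fitness of every nugget it appeared in, plus a small floor so unused indicators still have a chance; the crediting rule, the floor value, and the function names are all illustrative.

```python
import random


def indicator_weights(scored_nuggets, indicators, floor=0.05):
    """Credit each indicator with the fitness of every nugget it appeared in."""
    weights = {name: floor for name in indicators}  # floor keeps unused indicators in play
    for nugget, score in scored_nuggets.items():
        for name in nugget:
            weights[name] += max(score, 0.0)
    return weights


def weighted_nugget(weights, nugget_size, rng):
    """Draw distinct indicators, favoring those with larger weights."""
    pool = dict(weights)
    chosen = []
    for _ in range(nugget_size):
        names = list(pool)
        pick = rng.choices(names, weights=[pool[n] for n in names], k=1)[0]
        chosen.append(pick)
        del pool[pick]  # sample without replacement
    return tuple(sorted(chosen))


def next_generation(elite, weights, population_size, nugget_size, rng):
    """Carry the elite forward, then fill the rest with weighted random nuggets."""
    children = [
        weighted_nugget(weights, nugget_size, rng)
        for _ in range(population_size - len(elite))
    ]
    return list(elite) + children


rng = random.Random(0)  # fixed seed only so the sketch is repeatable
indicators = list("ABCDEFGH")
scored = {("B", "D", "E"): 1.4, ("A", "D", "H"): 1.1, ("C", "F", "G"): 0.3}
weights = indicator_weights(scored, indicators)
elite = [("B", "D", "E"), ("A", "D", "H")]
generation = next_generation(elite, weights, population_size=6, nugget_size=3, rng=rng)
print(generation)
```

Indicator D appears in both top nuggets, so it carries the largest weight and is the most likely to be drawn again.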

 

Step 6: Sort the next generation, together with the two nuggets that survived, based on the fitness function.

[Image: feature selection using genetic algorithm]

Repeat the cycle of evaluating, sorting, and regenerating (steps 2 through 6) until a single nugget consistently remains in first place. Generating a fresh random population (step 1) is only needed once, at the start. The “lone survivor” is then ready for further analysis and refinement before moving into production.
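The stopping rule can be sketched as a loop that ends once the same nugget has led for several generations in a row. Everything in the toy setup is an assumption made for illustration: the `patience` threshold, the fitness that counts “hidden” good indicators, and a breeding step that uses uniform rather than weighted sampling to stay short.

```python
import random


def evolve(population, fitness, breed, patience=5, max_generations=200):
    """Rank and rebreed until one nugget leads for `patience` generations in a row."""
    leader, streak = None, 0
    for generation in range(1, max_generations + 1):
        ranked = sorted(population, key=fitness, reverse=True)
        if ranked[0] == leader:
            streak += 1
        else:
            leader, streak = ranked[0], 1
        if streak >= patience:
            break
        population = breed(ranked)
    return leader, generation


# Toy setup: fitness counts how many of three "hidden" good indicators a nugget holds.
rng = random.Random(7)  # fixed seed only so the sketch is repeatable
indicators = list("ABCDEFGH")
hidden = {"B", "D", "G"}


def toy_fitness(nugget):
    return len(hidden & set(nugget))


def toy_breed(ranked):
    elite = ranked[:2]  # the best two survive unchanged
    children = [
        tuple(sorted(rng.sample(indicators, 3))) for _ in range(len(ranked) - 2)
    ]
    return elite + children


start = [tuple(sorted(rng.sample(indicators, 3))) for _ in range(6)]
best, generations = evolve(start, toy_fitness, toy_breed)
print(best, generations)
```

Because the two survivors are always carried forward, the leading score can only stay the same or improve, so the loop either settles on a stable leader or stops at `max_generations`.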

Why test AI for forming investment strategies?

The above process was greatly simplified for illustration, but you can see how vast the opportunities are to apply GAs throughout your quantitative investment research.

The GA process covers an important step in machine learning research: feature selection. Selecting the features most suitable for a strategy is a dynamic process that must adjust to changing market conditions, which is highly relevant in the current market regime.

 
