Skepticism in the Alternative Data revolution

  • By Paul Wilcox
  • March 13

Erez Katz, CEO and Co-founder of Lucena Research

It’s exciting to see investment professionals starting to integrate machine learning and alternative data in a substantial way. The market is finally warming up to, and accepting, what we have been advocating all along. In the past year, Lucena was hired by some of the world’s largest data companies, and we can’t wait to bring to market new products built on these newly formed relationships.

The challenges of a technology disruptor

Let’s face it: leading an emerging technology startup is tough! If you survey the CEOs of successful financial solutions companies, you will find universal consensus on the biggest challenge they had to overcome: skepticism.

The good news is that once your addressable market starts to trust your motivations and realize your value proposition, things get a lot easier and your business experiences accelerated growth.

A quote often attributed to the 19th-century German philosopher Arthur Schopenhauer sums it up beautifully:

“All truth passes through three stages: First, it is ridiculed. Second, it is violently opposed. Third, it is accepted as self-evident.”

Machine learning and data science often capture headlines, but surprisingly, most investment professionals have yet to adopt them in their investment approaches. In addition, an abundance of misinformation makes it harder to distinguish the firms that actually deliver value from the noise.

According to industry estimates, total buy-side spend on alternative data for investment worldwide was only $656M in 2018. However, this figure is expected to triple by 2020. The financial market is finally ready to embrace AI in a profound way.


Total buy-side spend on alternative data


Lucena’s role in empowering investment professionals with AI and alt data

I want to share some of the most common questions we face when presenting our solutions to data providers, buy-side analysts, and portfolio managers. Many of these questions stem from skepticism, and I’d like to use this medium to answer them as best I can.

To set the stage, let me explain succinctly what we do:

Lucena Research connects data providers with investment professionals looking to deploy predictive analytics. We partner with data providers to empirically validate and enhance their data. In addition, we develop algorithms based on alt data that form, validate, and enhance investment ideas through advanced data science and machine learning.



Why not start your own hedge fund? Why give your secret sauce away?


This is undoubtedly the most common question we face and the answer consists of multiple parts.

  • There is a vast difference between running a technology business and managing a hedge fund. It’s really a function of where your talent and passion lie. We are technologists who love to empower our clients with the most advanced predictive analytics technology. We prefer to leave the responsibilities of raising capital, managing portfolios, and dealing with compliance to others who are more experienced and properly accredited.


  • We are excited to partner with experienced asset managers to help them drive live portfolios. In these partnerships, we are happy to assume the upfront risk in order to enjoy the upside potential of a successful portfolio.


  • In a way, we are troubled by the motivation behind the question. Is it due to a lack of trust? I often wonder whether we are held to a higher level of scrutiny than other professionals who provide investment advice or an opinion about a publicly traded company. After all, Zacks Investment Research or any other analyst could easily change their business model so that, rather than providing buy/sell ratings on various companies, they use that knowledge exclusively to drive their own investments. The reason they stick with their traditional business model is very much like ours: as valuable as predictive analytics are, it takes more than a predictive signal to run a profitable hedge fund.


  • In the context of being held to a higher standard, it’s important to set the right expectations for machine learning technology. The reality is that applying AI to predictive analytics is not a silver bullet. Investment decisions powered by machine learning are based on statistical significance and, just like traditional investment decisions, not every projection will turn out to be correct. That said, we have shown empirically that with a scientific approach, the odds are very much in our favor over time.

For comparison, consider deploying deep learning in other domains, such as healthcare. Diagnosing a medical condition with a machine learning classifier is normally done with homogeneous datasets and static models that strive for close to 100% accuracy. After all, in healthcare we want to avoid even a single false negative, since lives are at stake.

In contrast, financial models only need to clear a much lower accuracy threshold. Some argue that 53% accuracy is enough to overcome market impact and transaction costs. The lower bar reflects the inherent noise in financial data and the use of dynamic models that are continuously retrained to adjust to shifts in market sentiment.
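To make the 53% figure concrete, here is a minimal sketch of the break-even arithmetic under simplifying assumptions that are ours, not Lucena’s: every winning trade gains `avg_win`, every losing trade loses `avg_loss`, and each trade pays a fixed round-trip `cost`. Setting the expected net return p·W − (1 − p)·L − c to zero gives a break-even hit rate of p* = (L + c) / (W + L). The function names and the ±1% / 5 bp inputs are hypothetical.

```python
def break_even_accuracy(avg_win, avg_loss, cost):
    """Hit rate p at which p * avg_win - (1 - p) * avg_loss - cost == 0."""
    return (avg_loss + cost) / (avg_win + avg_loss)


def expected_net_return(p, avg_win, avg_loss, cost):
    """Expected net return of a single trade given hit rate p."""
    return p * avg_win - (1 - p) * avg_loss - cost


# Hypothetical inputs: symmetric +/-1% moves and a 5 bp round-trip cost per trade.
win, loss, cost = 0.01, 0.01, 0.0005
print(f"break-even accuracy: {break_even_accuracy(win, loss, cost):.1%}")
print(f"expected edge at 53%: {expected_net_return(0.53, win, loss, cost):.4%}")
```

Under these assumptions the break-even hit rate is 52.5%, so a model that is right 53% of the time keeps a small positive edge of about one basis point per trade; higher costs or a less favorable win/loss ratio would raise the bar.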

The bottom line is that those who manage large portfolios for a living clearly know that it takes quite a bit more than a predictive signal to run a successful fund.

Risk management, transaction cost analysis, consolidation of top-down and bottom-up research, and human intellect are just a few of the factors that drive sustainably profitable investing. Most of our clients use our output as one arrow in their quiver, while a smaller percentage use it in a more holistic way.

Regardless, in most cases we strive to extend rather than replace a manager’s investment decision process. For all the reasons outlined above, turning Lucena into a hedge fund is not an attractive option for us at present.



How do you measure your value add if your model is used as an overlay to other factors?


We measure the efficacy of a model by its outperformance relative to its benchmark.

Our platform can simulate how a predictive model translates into performance, both historically (backtests) and on an ongoing basis into the future (model portfolios). Both backtest and model portfolio simulations generate comprehensive performance reports that measure the models’ output in terms of total return, Sharpe ratio, beta, volatility, information ratio, R-squared, risk factors, and many other attribution analyses.
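Several of the metrics named above have standard textbook definitions. The sketch below computes a handful of them from aligned daily return series using only the Python standard library. It is not Lucena’s platform code: the 252-day annualization factor, the constant daily risk-free rate, and the function name `performance_summary` are assumptions for illustration.

```python
import math
from statistics import mean, stdev

TRADING_DAYS = 252  # assumed annualization factor for daily returns


def _cov(x, y):
    """Sample covariance of two equal-length return series."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def performance_summary(portfolio, benchmark, daily_risk_free=0.0):
    """Textbook performance metrics from aligned daily simple returns.

    daily_risk_free is an assumed constant daily risk-free rate.
    """
    if len(portfolio) != len(benchmark) or len(portfolio) < 2:
        raise ValueError("need two aligned series with at least two observations")
    excess = [r - daily_risk_free for r in portfolio]
    active = [p - b for p, b in zip(portfolio, benchmark)]  # return relative to benchmark
    ann = math.sqrt(TRADING_DAYS)
    beta = _cov(portfolio, benchmark) / _cov(benchmark, benchmark)
    corr = _cov(portfolio, benchmark) / (stdev(portfolio) * stdev(benchmark))
    return {
        "total_return": math.prod(1 + r for r in portfolio) - 1,
        "volatility": stdev(portfolio) * ann,
        "sharpe": mean(excess) / stdev(excess) * ann,
        "beta": beta,
        "information_ratio": mean(active) / stdev(active) * ann,
        "r_squared": corr ** 2,  # R^2 of a one-factor regression on the benchmark
    }
```

Here the information ratio is the annualized mean of active (portfolio minus benchmark) returns divided by their annualized standard deviation, i.e. active return per unit of tracking error, which is one direct way to express benchmark-relative outperformance.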

Some of our customers run side-by-side simulations (their original portfolios against their respective enhanced portfolios) in order to fairly assess whether our algorithms indeed add value.
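A side-by-side comparison like this can be summarized by looking at the per-period return difference between the enhanced and original portfolios. The following is a hedged sketch with a hypothetical helper, `compare_side_by_side`, that reports the annualized mean difference and a naive t-statistic. It assumes aligned periodic returns and treats the differences as independent, which overstates confidence when returns are autocorrelated.

```python
import math
from statistics import mean, stdev


def compare_side_by_side(original, enhanced, periods_per_year=252):
    """Summarize how an enhanced portfolio's returns differ from the original's.

    Both inputs are aligned periodic simple returns. The t-statistic treats
    the per-period differences as i.i.d., a simplifying assumption that real
    return series rarely satisfy.
    """
    if len(original) != len(enhanced) or len(original) < 2:
        raise ValueError("need two aligned series with at least two observations")
    diff = [e - o for o, e in zip(original, enhanced)]
    m, s = mean(diff), stdev(diff)
    return {
        "annualized_mean_diff": m * periods_per_year,
        # Undefined when every difference is identical (zero dispersion).
        "t_stat": m / (s / math.sqrt(len(diff))) if s > 0 else float("nan"),
    }
```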


Long/short trading strategy


An excerpt from the performance report of Tiebreaker, a Lucena model portfolio. Tiebreaker has been traded on a forward basis (this is not a backtest) since 2014. Trades are generated and published before the market opens. The results account for slippage, transaction costs, and short borrowing costs, as well as dividends, splits, and reverse splits.
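As a rough illustration of how trading and borrowing charges reduce a daily return, here is a minimal sketch. The linear cost model, the function name `net_daily_return`, and the default 5 bp per unit of turnover and 1% annual borrow rate are hypothetical assumptions rather than the figures used in the Tiebreaker report, and corporate actions such as dividends and splits are not modeled.

```python
def net_daily_return(gross_return, turnover, short_exposure,
                     cost_per_unit_turnover=0.0005,
                     annual_borrow_rate=0.01,
                     days_per_year=252):
    """Deduct trading and short-borrow costs from one day's gross return.

    turnover: value traded that day (buys + sells) as a fraction of capital.
    short_exposure: short notional held as a fraction of capital.
    The default cost levels are hypothetical placeholders, not measured values.
    """
    trading_cost = turnover * cost_per_unit_turnover  # commissions + slippage
    borrow_cost = short_exposure * annual_borrow_rate / days_per_year  # daily accrual
    return gross_return - trading_cost - borrow_cost
```

Because both charges scale with activity, a long/short strategy with high turnover or a large short book can see a thin gross edge disappear once these deductions are applied day after day.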


Models are often short-lived – how do you prolong the lifespan of a model?


A successful model is eventually bound to be recognized and exploited by the masses into obsolescence. We strongly believe in deploying multiple models for a given strategy. A multi-strategy portfolio allows for the smart allocation of funds across uncorrelated models, each designed for a different market regime.
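Why uncorrelated models help can be shown with the standard variance formula for an equal-weighted portfolio. Assuming, for simplicity, that each of n models has the same volatility σ and every pair has the same correlation ρ, the combined volatility is σ·√(1/n + (1 − 1/n)ρ). The helper name `equal_weight_vol` and the 20% volatility used below are illustrative assumptions.

```python
import math


def equal_weight_vol(sigma, n, rho):
    """Volatility of an equal-weighted mix of n models.

    Assumes every model has the same volatility sigma and every pair has the
    same correlation rho, so the portfolio variance is
    sigma**2 * (1/n + (1 - 1/n) * rho).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    variance = sigma ** 2 * (1.0 / n + (1.0 - 1.0 / n) * rho)
    if variance < 0:
        raise ValueError("rho too negative for n models to be consistent")
    return math.sqrt(variance)


for rho in (0.0, 0.5, 1.0):
    print(f"rho={rho}: {equal_weight_vol(0.20, 4, rho):.3f}")
```

With four models at 20% volatility each, the blend runs at about 10% volatility when they are uncorrelated, about 15.8% at a correlation of 0.5, and the full 20% when they are perfectly correlated, which is why low correlation between models matters more than their number.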

Another novel approach to prolonging the lifespan of a strategy is multi-expert ensemble voting, which combines multiple uncorrelated models that each vote on a predetermined universe of constituents.


A multi-expert (multi-model) voting ensemble

The figure above shows an example of a multi-expert (multi-model) voting ensemble. Votes are cast by multiple models, and the assets with the highest vote count are selected for entry.
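The voting step itself is simple to express. Below is a minimal sketch, assuming each model outputs a set of tickers it favors from a shared, predetermined universe. The function `ensemble_vote`, the model names, the placeholder tickers, and the alphabetical tie-break are hypothetical choices, not a description of Lucena’s ensemble.

```python
from collections import Counter


def ensemble_vote(votes_by_model, top_k):
    """Pick the top_k assets by vote count across several models.

    votes_by_model maps a model name to the tickers it votes for within a
    predetermined universe; each model casts at most one vote per asset.
    Ties are broken alphabetically so the selection is deterministic.
    """
    counts = Counter()
    for tickers in votes_by_model.values():
        counts.update(set(tickers))
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [ticker for ticker, _ in ranked[:top_k]]


votes = {
    "model_a": {"AAA", "BBB", "CCC"},
    "model_b": {"BBB", "CCC", "DDD"},
    "model_c": {"AAA", "BBB", "EEE"},
}
print(ensemble_vote(votes, top_k=2))
```

Counting each model at most once per asset keeps any single model from dominating the vote; a weighted variant could instead scale each model’s vote by a confidence or track-record score.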

In general, dynamically recalibrating models by retraining them on recent data, and introducing multiple models that rotate in and out of relevance, are common techniques for prolonging the lifespan of a successful algorithm.
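The idea of rotating models in and out of relevance can be illustrated with a simple rule: rank models by a trailing risk-adjusted score and keep only the best few active. This is one possible rotation rule, assumed here for illustration. The trailing mean-over-standard-deviation score, the `window` and `max_active` parameters, and the function name `rotate_models` are our assumptions, and retraining of the underlying models is not shown.

```python
from statistics import mean, stdev


def rotate_models(returns_by_model, window, max_active):
    """Keep the max_active models with the best trailing risk-adjusted score.

    returns_by_model maps a model name to its chronological periodic returns.
    The score is mean / standard deviation over the last `window` returns.
    Models with fewer than `window` observations or zero volatility are
    left out of the rotation.
    """
    if window < 2:
        raise ValueError("window must be at least 2")
    scores = {}
    for name, rets in returns_by_model.items():
        tail = rets[-window:]
        if len(tail) < window:
            continue  # not enough history to judge this model yet
        sd = stdev(tail)
        if sd > 0:
            scores[name] = mean(tail) / sd
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return ranked[:max_active]
```

Re-running the rule at each rebalance lets a model drop out when its recent behavior deteriorates and return once it recovers, without permanently discarding it.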

You can read more about how we fight commoditization of alt data here.  



I have never seen a bad backtest – how can you determine that a backtest is authentic? 


Backtests are notoriously prone to bias, since they are run over historical periods whose outcomes are already known. Needless to say, backtests on their own are not sufficient to determine the predictive value of a model.

There are, however, best practices designed to increase the reliability of a backtest. Here are just a few:

  • Minimize overfitting by backtesting new (unseen) time periods that don’t overlap the model’s training periods.

  • Include delisted (“dead”) stocks and the historical index memberships that existed during the backtest period to minimize survivorship bias.

  • Account for market impact, transaction costs, slippage, dividends, and splits in your backtest.

  • Reduce selection bias. Running many backtests over the same timeframe while tweaking execution parameters will eventually yield a good-looking backtest. Unfortunately, such a backtest suffers from selection bias, and its results are most likely unsustainable into the future.

  • Allocate a holdout period in advance. A backtest that looks successful should be validated one more time against this untouched period.

  • Most importantly, you want to validate your model on a roll-forward basis into the future. A model portfolio that carries the backtest execution guidelines forward is the closest simulation of a real live portfolio. Below is a visual representation of how at Lucena we break the analysis time frame into four distinct (non-overlapping) periods. A model is retrained, then validated in a holdout period, and ultimately tracked on an ongoing basis into the future.

How to train a machine learning model
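The four-period scheme described above can be sketched in code. This is a minimal illustration, not Lucena’s implementation: the segment names (`training`, `backtest`, `holdout`, `forward`), the 60/25/15 split of the historical range, and the helper names `split_periods` and `check_contiguous` are all assumptions. What it demonstrates is structural: the historical segments are contiguous and never overlap, and the forward segment starts the day after the data ends and stays open-ended.

```python
from datetime import date, timedelta


def split_periods(start, end, fractions=(0.6, 0.25, 0.15)):
    """Partition [start, end] into training, backtest and holdout segments,
    then open a forward segment the day after `end`.

    Each segment is an inclusive (first_day, last_day) pair. The forward
    segment's last day is None because it keeps running into the future.
    Segment names and default fractions are illustrative assumptions.
    """
    names = ("training", "backtest", "holdout")
    if len(fractions) != len(names) or abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("expected three fractions summing to 1")
    total_days = (end - start).days + 1
    periods, cursor, used = {}, start, 0
    for i, (name, frac) in enumerate(zip(names, fractions)):
        # The last historical segment absorbs any rounding remainder.
        days = total_days - used if i == len(names) - 1 else round(total_days * frac)
        if days < 1:
            raise ValueError("date range too short for these fractions")
        last = cursor + timedelta(days=days - 1)
        periods[name] = (cursor, last)
        cursor, used = last + timedelta(days=1), used + days
    periods["forward"] = (end + timedelta(days=1), None)
    return periods


def check_contiguous(periods):
    """True if each segment starts the day after the previous one ends."""
    order = ("training", "backtest", "holdout", "forward")
    return all(
        periods[prev][1] + timedelta(days=1) == periods[nxt][0]
        for prev, nxt in zip(order, order[1:])
    )


for name, (first, last) in split_periods(date(2010, 1, 1), date(2019, 12, 31)).items():
    print(f"{name:>8}: {first} -> {last or 'ongoing'}")
```

Splitting by calendar date rather than by shuffled samples matters here: shuffling would leak future information into training, which is exactly the overlap the non-overlapping periods are meant to prevent.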



With all the imperfections of applying AI technology to investing, the question remains: am I better off with advanced data science and predictive analytics?

The answer is absolutely and unequivocally: Yes!

In a world that thrives on secrecy and lack of information sharing, Lucena is excited to be in a position that allows us to share our experience and best practices, while still holding on to our intellectual property, methodologies, and techniques.

I expect that many of these answers will lead to additional questions, which I am happy to answer either in private or in the comments below. Feel free to reach out to us.




