Thanks for the post, great ideas! There are obviously many papers out there; some are good, some are not. I just wonder: how did you distinguish the insightful ones from those not worth the time to validate?
Thanks! I think after reading the first 500-1k papers, everything gets easier :)
Great write-up. I'm interested in more details about the engineering challenges you faced and how you overcame them. I'm currently implementing this myself, and while the results are positive, I'm not yet seeing the performance reported in the paper.
Thanks! Back in grad school, I had the privilege to learn Machine Learning, Reinforcement Learning & Decision Making from the great Charles Isbell and Michael Littman. Here's Lex Fridman interviewing them:
https://www.youtube.com/watch?v=yzMVEbs8Zz0
Prof Isbell nails it by minute 9: "In Machine Learning problems, data is more important than the algorithm. Don't get distracted by the algorithm. Focus on the data."
Hope it helps ;)
*Statistics keeps you from lying to yourself. 🥂
Just to share my experience.
I tried testing deep momentum, using both the neural network from the first paper and the second (XGBoost).
I repeated the test on the S&P 500 and Russell 3000.
The long-short version always performs horribly. The shorts destroy the strategy, as was to be expected in a market that has risen like the US one.
The long-only version adds value, but the max drawdown is enormous. The algorithm is unable to overcome the bimodality problem.
I only tested the equal-weight version, using Norgate data.
If I then use a database that also includes companies that went bankrupt and were dropped from the index, the performance is disastrous in any case. I charged only 5 bps per trade in commissions.
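For reference, a minimal sketch of how a flat 5 bps per-trade commission can be applied in a backtest. All numbers below are illustrative, not from my actual test:

```python
import numpy as np

# Toy monthly gross portfolio returns and per-month turnover
# (fraction of the book traded). Purely illustrative numbers.
gross_returns = np.array([0.020, -0.010, 0.015, 0.005])
turnover = np.array([0.60, 0.55, 0.70, 0.50])  # e.g. 60% of the book replaced

COST_PER_TRADE = 0.0005  # 5 bps

# Each unit of turnover is charged once here; double it if you count
# the sell and the subsequent buy as two separate trades.
net_returns = gross_returns - turnover * COST_PER_TRADE
```

Even at 5 bps, a high-turnover monthly rebalance compounds into a meaningful drag, which is why including delisted names changes the picture so much.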
Yeah, this one is hard. Making this work can easily take months of work and many, many, many iterations :)
Why not use machine learning regression to predict the expected return directly? Instead, this paper does it in two steps: (1) predict the bucket as a classification problem, (2) use the bucket probabilities to estimate the expected return. Why is this approach better?
Hi Dyutiman!
AFAIK, a standard regression would generate a point estimate. A Bayesian regression would be better precisely because it would generate an estimate with confidence intervals. And this is the big idea.
In other words... Imagine you have a Machine Learning regressor, and that model outputs the following estimates for 3 different stock returns:
- Stock A: +1% predicted return, with undefined/unknown confidence interval
- Stock B: +1% predicted return, with +/-3% confidence interval
- Stock C: +1% predicted return, with +/-0.1% confidence interval
Which stock would you prefer? Stock C, right?
IMHO, this is the big idea behind the article. (The implementation details don't matter much: the article is not using Bayesian regression, it's framing the problem as a classification... but that's just a detail... I think you got my point :))
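To make the two-step idea concrete, here's a minimal sketch: a classifier's bucket probabilities are turned into an expected return, and the spread of the distribution gives each stock its own uncertainty. The bucket midpoints and probabilities below are made up for illustration, not taken from the paper:

```python
import numpy as np

# Return buckets represented by their midpoints (illustrative values).
bucket_midpoints = np.array([-0.05, -0.01, 0.0, 0.01, 0.05])

# Hypothetical predicted bucket probabilities for three stocks.
probs = np.array([
    [0.30, 0.10, 0.10, 0.10, 0.40],  # wide, bimodal -> uncertain
    [0.05, 0.20, 0.30, 0.30, 0.15],  # moderate spread
    [0.00, 0.05, 0.15, 0.75, 0.05],  # concentrated -> confident
])

# Step 2: expected return = probability-weighted bucket midpoints.
expected_return = probs @ bucket_midpoints

# The spread of each distribution doubles as a per-stock uncertainty measure.
spread = np.sqrt(
    (probs * (bucket_midpoints - expected_return[:, None]) ** 2).sum(axis=1)
)
```

Two stocks can share a similar expected return while having very different spreads, which is exactly the Stock A / B / C situation above.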
Also, do you think this could be applied to futures? I guess the cross-section would be much smaller.
I believe yes, I could make it work for futures… the obvious challenges would be #1 how to train the model with far less data, and #2 the shallow cross-section.
I have an idea on how to solve both
The 3-stock example is clear. But wouldn't a regression also generate a confidence interval? I guess it'd be the same for all stocks?
In Bayesian regression (and probabilistic machine learning in general), every prediction gets its own personalized uncertainty… this is one of its most powerful advantages over classical regression :)
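A minimal sketch of why the intervals differ per prediction, using closed-form Bayesian linear regression on toy data (the prior precision and noise level are assumptions I picked for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D data clustered near x=0, so the model is well informed there
# and poorly informed far away.
X = rng.normal(0.0, 1.0, size=(50, 1))
y = 2.0 * X[:, 0] + rng.normal(0.0, 0.5, size=50)

Phi = np.hstack([np.ones((50, 1)), X])  # design matrix with intercept

alpha, beta = 1.0, 4.0  # assumed prior precision and noise precision

# Posterior over weights: Sigma = (alpha*I + beta*Phi'Phi)^-1, m = beta*Sigma*Phi'y
Sigma = np.linalg.inv(alpha * np.eye(2) + beta * Phi.T @ Phi)
m = beta * Sigma @ Phi.T @ y

def predict(x):
    phi = np.array([1.0, x])
    mean = phi @ m
    # Predictive variance depends on x: noise term + weight-uncertainty term.
    var = 1.0 / beta + phi @ Sigma @ phi
    return mean, np.sqrt(var)

_, sd_near = predict(0.0)  # inside the data cloud
_, sd_far = predict(5.0)   # far from the data
```

`sd_far` comes out larger than `sd_near`: the interval widens away from the data, so each input really does get its own uncertainty rather than one global band.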
You asked a great question... it reminded me of a similar (slightly harder) problem. Back in another life, I contributed to Pyro framework (probabilistic Machine Learning framework created by ex Uber engineers)... I wrote the Conditional Variational Auto-encoder tutorial, which solves the problem of lack of probabilistic inference of standard ML models. I think it's a good example related to the question:
https://pyro.ai/examples/cvae.html
Thanks. Unfortunately my knowledge in this area is not good, but I will take a look.
Very interesting. Now if only there was an ETF that could replicate its methodology…
Sharp piece - really enjoyed it!
I’ve been working through the same paper and implemented my interpretation of the strategy. My results come out a bit differently:
XGB: ~18.2% annual return, Sharpe 1.75
RET: ~29.2% annual return, Sharpe 1.91
(Data: Norgate Russell 3000 Current & Past)
I'll shortly create a Substack post on my implementation too.
The main hesitation for me is the execution. Running top/bottom deciles means holding 300 longs and 300 shorts, rebalanced monthly. That’s a lot of names to manage in practice and not quite in my comfort zone.
I did try the smaller universes but didn't have much luck.
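To make the execution concern concrete, a toy sketch of the decile selection (the scores are random stand-ins for model predictions):

```python
import numpy as np

rng = rng = np.random.default_rng(42)
n_stocks = 3000
scores = rng.normal(size=n_stocks)  # stand-in for model output

decile = n_stocks // 10  # 300 names per side
order = np.argsort(scores)
shorts = order[:decile]    # bottom decile
longs = order[-decile:]    # top decile

# Equal-weight both legs; in the strategy under discussion this book
# is rebuilt every month.
weights = np.zeros(n_stocks)
weights[longs] = 1.0 / decile
weights[shorts] = -1.0 / decile
```

That's 600 positions to open, monitor, and turn over monthly, which is the operational load that gives me pause.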
Did you also use the Russell 3000 Current & Past?
And one more infrastructure question, are you using AWS for live deployment? or a VPS?
AWS all the way
Hello! May I ask what service you are using for live data for equities? I believe your broker is Interactive Brokers, correct, but you are probably using a different data source for the live signal.
Yes, IB to trade. Data: Norgate, Massive (formerly Polygon), Databento, sec-api.io, and Sharadar are the main ones.
Thank you for the reply! Also, for each of your trading strategies, are you running them on separate subaccounts in Interactive Brokers, or all in one main account? I can see some advantages and disadvantages for both. From a bookkeeping perspective, it might be easier to manage each strategy with different subaccounts, but you're losing out on leverage. I guess it's not a big problem if you don't max out the leverage anyway.
I run many accounts and many strategies within each account. Throughout the years, I developed my own bookkeeping system. I use that as my primary system to control strategies at scale. I can write about that. In fact, your question was great, thanks for asking!
Thanks a lot for the reply! And yeah, it would be awesome to read an article from you about the logistics and infrastructure side, because the more I think about this, the more it feels like you can go down a rabbit hole just figuring this stuff out, without even touching the strategy side!
Did you use deep momentum on S&P 500 stocks in your backtest, or on all American stocks? Where do you get your data from? How many stocks in total?
I use several data sources. Norgate is a great data provider
What universe do you use? What stock filters do you use? Could you be more specific about this?
Try different things. See what works
It was just to check your results. But if you prefer to keep it private, no problem.