17 Comments
Soleil Investing:

By removing the entire column of stocks with prices under $5, aren’t you reintroducing survivorship bias? I agree with your premise of not trading stocks under $5. I have traditionally solved this by adding this condition when the buy signal has happened.

Quantitativo:

Hi! This is an excellent point! If you notice, we are removing penny stocks with:

raw = raw[raw['unadj_close'] > 5]

We are using the UNADJUSTED close. Because of that, we are ok.

If we were using the adjusted close to remove penny stocks, we would have introduced a subtle bias.

But that's not the case :)
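
For readers following along, here is a minimal sketch of what that filter does (the DataFrame below is hypothetical; only the filter line matches the article): it drops individual rows where the unadjusted price is at or below $5, so no ticker is excluded wholesale based on where its price ends up later.

```python
import pandas as pd

# Hypothetical daily bars: one row per (symbol, date), with both the
# unadjusted close and the split/dividend-adjusted close available.
raw = pd.DataFrame({
    'symbol':      ['AAA', 'AAA', 'BBB', 'BBB'],
    'date':        pd.to_datetime(['2020-01-02', '2020-01-03'] * 2),
    'unadj_close': [12.0, 4.5, 80.0, 81.0],
    'adj_close':   [3.0, 1.1, 79.0, 80.0],
})

# Filter on the UNADJUSTED close, exactly as in the snippet above.
# Only individual rows at or below the $5 threshold are dropped; the
# symbol stays in the dataset for the dates where it traded above $5.
raw = raw[raw['unadj_close'] > 5]

# Filtering on 'adj_close' instead would use prices that already embed
# future splits and dividends, quietly leaking information about what
# happens to the stock later -- the subtle bias discussed above.
print(raw)
```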

Soleil Investing:

Large companies that eventually go out of business are likely to be penny stocks at some point in that process, and even companies that don't go out of business can shrink to penny-stock levels. The filter would then remove them.

Mendel Friedman:

He's not removing the symbol from the dataframe, only the rows where price < 5. I don't think this would be an issue...

Soleil Investing:

Sorry, yes, I see that now. I confused it with the line after it, which drops the columns.

Bogdan Calin:

Nowadays, autoencoder systems with pretrained RBMs are not used anymore? To me, as a beginner in machine learning, it sounded like a good idea to train a model to reproduce the original input, forcing the model to compress the original features from 33 down to 4 and then back to 33. An FNN like the one you used does not do that, right? It just goes from 33 to 4 and then you train it to predict 2 labels. Isn't the old technique more robust?

Quantitativo:

Nowadays, we don't need to pretrain RBMs. Current technology allows us to do everything in one pass, which is what I meant :)
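
To make the distinction concrete, here is a rough Keras sketch (the 33/4/2 layer sizes come from the comment above; activations, optimizer, and everything else are assumptions, not the article's exact architecture). The supervised FNN goes from the 33 features through a 4-unit bottleneck straight to the 2 labels in a single training pass, while the older approach would first pretrain a 33 -> 4 -> 33 autoencoder to reconstruct its input and only afterwards fine-tune a classifier on the 4-unit encoding.

```python
import tensorflow as tf

# One-pass supervised network: 33 features -> 4-unit bottleneck -> 2 classes.
# Trained end-to-end on the labels; no reconstruction or pretraining step.
fnn = tf.keras.Sequential([
    tf.keras.Input(shape=(33,)),
    tf.keras.layers.Dense(4, activation='relu'),      # compression
    tf.keras.layers.Dense(2, activation='softmax'),   # label prediction
])
fnn.compile(optimizer='adam', loss='categorical_crossentropy')

# Old-style alternative (sketch): an autoencoder that must reproduce its own
# input, 33 -> 4 -> 33, pretrained without labels; a classifier would then be
# stacked on top of the 4-unit encoding and fine-tuned separately.
autoencoder = tf.keras.Sequential([
    tf.keras.Input(shape=(33,)),
    tf.keras.layers.Dense(4, activation='relu'),       # encoder
    tf.keras.layers.Dense(33, activation='linear'),    # decoder
])
autoencoder.compile(optimizer='adam', loss='mse')
```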

derickson:

Thanks for the great write-up and research. This idea of replicating papers and toy projects is really interesting, and seeing your process explained step by step has motivated me to do more.

I just recently finished replicating a trading paper on deep learning for the first time, and the authors implemented early stopping in the training process. It terminates training based on some patience, say 10, meaning that if the val loss has not improved after 10 epochs, it restores the best weights and moves on. You probably considered this yourself, so I would be interested why you didn't use it here, as it can save compute time. Looking forward to the next write-up on the new trading system.

Quantitativo:

Thanks! If you look at the code, yes, the process restores the weights from the epoch with the best validation loss.

derickson:

What I mean is that, for example, in the several years where the first epoch is the best, early stopping would have saved you from running through all 100 epochs: since there is no improvement in the 10 epochs following the first, it terminates and does not run the other 89. I don't have much experience, so I can't say how common that is; it was just mentioned in the paper by Fischer and Krauss that I replicated. As I understand it, your code runs through all 100 epochs no matter what. But maybe I misunderstood something.

Quantitativo:

Yes, it runs through all 100 epochs, but it restores the weights from the epoch with the best validation loss... I could stop after 5-10 epochs if the val loss doesn't improve, but I didn't implement that (laziness :))

But the final result is exactly the same :)
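
For anyone who wants that shortcut, a minimal sketch using the Keras EarlyStopping callback (the patience value and the commented-out fit call are illustrative, not the article's exact settings):

```python
import tensorflow as tf

# Stop training once val_loss has not improved for 10 consecutive epochs,
# and roll the model back to the weights of the best epoch seen so far.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',
    patience=10,
    restore_best_weights=True,
)

# model.fit(X_train, y_train,
#           validation_data=(X_val, y_val),
#           epochs=100,
#           callbacks=[early_stop])
```

With restore_best_weights=True, the final weights match the run-all-100-epochs-and-restore approach; the callback only saves the wasted epochs.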

Paper to Profit:

Amazing stuff. I guess throwing a neural network at the stock market works after all...

Great work.

Quantitativo:

Thanks!

RG:

Are the correlation values in the tables flipped? Should the correlation value for the S&P be 1.0 and the Strategy -0.14?

Great write-up, as always. I wonder if there are similar papers or methods for futures markets.

Quantitativo:

Depends on how you read it... the important thing is that -0.14 means the strategy is uncorrelated with the S&P :)
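
A quick illustration of why the orientation doesn't matter (the return series and column names here are hypothetical; the point is only that a correlation matrix is symmetric):

```python
import numpy as np
import pandas as pd

# Hypothetical daily returns for the strategy and the benchmark.
rng = np.random.default_rng(0)
returns = pd.DataFrame({
    'strategy': rng.normal(0, 0.01, 250),
    'sp500':    rng.normal(0, 0.01, 250),
})

# corr() is symmetric: the diagonal is always 1.0 (each series with itself),
# and the single off-diagonal value is the strategy-vs-benchmark correlation,
# whichever way you read the table.
print(returns.corr())
```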

The paper:

Slow Momentum with Fast Reversion: A Trading Strategy Using Deep Learning and Changepoint Detection, by Kieran Wood, Stephen Roberts, and Stefan Zohren

is for futures. I will implement it and share it whenever I find time (it's a somewhat longer project, though :))

Thanks!

Mendel Friedman:

Accidentally deleted my comment, so reposting:

"This article and detailed code is very helpful. It is nice to see how others go about writing an ML test framework. It can be daunting to put all the code out there and have everyone nitpick your implementation. That being said, it seems there is a very slight look-ahead bias where you standardize features into z-scores using the entire dataset's mean and std. Wouldn't it be better to use a cumulative mean and std to ensure we standardize only on data from the past?"

Quantitativo:

Thanks! Yes, you can do that. I've done this exercise in the past, many years ago (standardizing only with previous returns data), and my conclusion is that it doesn't influence the end result, provided the training dataset is sufficiently large (which is the case here). In my past exercises and experience, doing that adds a ton of code/complexity, but the end result is the same. So I don't do it anymore.

But it's a good catch, you have a good eye :)
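
For anyone who wants to try the point-in-time variant anyway, here is a rough sketch of the idea in pandas (the feature series is hypothetical): each observation is standardized using only the mean and standard deviation of data available up to that date, instead of the full sample's.

```python
import numpy as np
import pandas as pd

# Hypothetical single feature indexed by business day.
rng = np.random.default_rng(1)
feat = pd.Series(rng.normal(0, 1, 500),
                 index=pd.date_range('2020-01-01', periods=500, freq='B'))

# Full-sample z-score: uses the whole history's mean/std (slight look-ahead).
z_full = (feat - feat.mean()) / feat.std()

# Point-in-time z-score: expanding mean/std use only data up to each date;
# shift(1) keeps the current observation out of its own statistics.
mean_exp = feat.expanding().mean().shift(1)
std_exp = feat.expanding().std().shift(1)
z_pit = (feat - mean_exp) / std_exp

# With a sufficiently long training window the two series end up nearly
# identical, which is the author's point above.
```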
