By removing the entire column of stocks with prices under $5, aren’t you reintroducing survivorship bias? I agree with your premise of not trading stocks under $5. I have traditionally solved this by adding this condition when the buy signal has happened.
Hi! This is an excellent point! If you notice, we are removing penny stocks with:
raw = raw[raw['unadj_close'] > 5]
We are using the UNADJUSTED close. Because of that, we are ok.
If we were using the adjusted close to remove penny stocks, we would have introduced a subtle bias.
But that's not the case :)
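To make the distinction concrete, here is a minimal pandas sketch of the point being made above. Adjusted prices are rescaled backwards whenever a later split or dividend occurs, so filtering on them would use information that was not available on the trading day; the unadjusted close is point-in-time. The 'unadj_close' column name comes from the article's snippet, while 'adj_close' and the numbers are purely hypothetical, for illustration:

import pandas as pd

# Toy frame: a stock that actually traded around $8, but whose *adjusted*
# historical price sits below $5 after later splits/dividends.
raw = pd.DataFrame({
    'date':        pd.to_datetime(['2010-01-04', '2010-01-05', '2010-01-06']),
    'symbol':      ['XYZ', 'XYZ', 'XYZ'],
    'unadj_close': [8.10, 7.95, 8.20],   # what was observable on that day
    'adj_close':   [4.05, 3.97, 4.10],   # rescaled later using future corporate actions
})

# Correct: filter on the price observable at the time -> the rows are kept.
kept = raw[raw['unadj_close'] > 5]

# Subtly biased: filter on the adjusted price -> the same rows are dropped,
# because the decision leaks future split/dividend information into the past.
leaky = raw[raw['adj_close'] > 5]

print(len(kept), len(leaky))   # 3 0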
Large companies that eventually go out of business are likely to be penny stocks at some point in that process, and even companies that don't go out of business can shrink to penny-stock prices. The filter would then remove them.
He's not removing the symbol from the dataframe, only the rows where the price is < 5. I don't think this would be an issue...
Sorry, yes, I see that now. I confused it with the line after it that drops the columns.
Nowadays, are autoencoder systems with pretrained RBMs not used anymore? To me, a beginner in machine learning, it sounded like a good idea to train a model to reproduce the original input, forcing the model to compress the original features from 33 to 4 and then back to 33. An FNN like the one you used does not do that, right? It just goes from 33 to 4 and then you train it to predict 2 labels. Isn't the old technique more robust?
Nowadays, we don't need to pretrain RBMs. Current technology allows us to do everything in one pass, which is what I meant :)
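For readers new to this, a minimal sketch of the "one pass" idea: a feed-forward network with a 4-unit bottleneck trained end-to-end on the labels, with no RBM pretraining and no 33 -> 4 -> 33 reconstruction step. The 33 -> 4 -> 2 sizes come from the comment above; the framework, activations, and optimizer here are assumptions, not the article's exact setup:

import tensorflow as tf

# Hypothetical end-to-end FNN: the bottleneck learns a compressed representation
# as a side effect of the supervised objective, instead of via pretraining.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(33,)),               # 33 input features
    tf.keras.layers.Dense(4, activation='relu'),      # compression ("encoder") layer
    tf.keras.layers.Dense(2, activation='softmax'),   # predict the 2 labels directly
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=100)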
Thanks for the great write up and research. This idea of replicating papers and toy projects is really interesting and you showing your process and explaining step by step has motivated me to do more.
I just recently finished replicating my first trading paper on deep learning, and they implemented early stopping in the training process. It terminates training based on some patience value, say 10: if the validation loss has not improved after 10 epochs, it restores the best weights and moves on. You probably considered this yourself, so I would be interested in why you didn't use it here, as it can save compute time. Looking forward to the next write up on the new trading system.
Thanks! If you look at the code, yes, the process restores the weights from the epoch with the best validation loss.
What I mean is that, for example, in the several years where the first epoch is the best, early stopping would have saved you from running through all 100 epochs: since there is no improvement in the 10 epochs following the first, it terminates and does not run the other 89 epochs. I don't have much experience, so I can't say how common that is; it was just mentioned in the paper by Fischer and Krauss that I replicated. As I understand it, your code runs through all 100 epochs no matter what. But maybe I misunderstood something.
Yes, it runs through all 100 epochs, but it restores the weights from the epoch with the best validation loss... I could cut it off after 5-10 epochs if the val loss doesn't improve, but I didn't implement that (laziness :))
But the final result is exactly the same :)
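For anyone wanting to wire this in: assuming a Keras-style training loop (not necessarily what the article uses), the behavior the commenter describes is roughly the stock early-stopping callback. With restore_best_weights=True the final model is the same as running all 100 epochs and restoring the best one; early stopping only skips the wasted epochs.

import tensorflow as tf

# Stop once val loss hasn't improved for `patience` epochs and roll back to the
# best weights seen so far. The patience of 10 matches the comment above.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',
    patience=10,
    restore_best_weights=True,
)

# model.fit(X_train, y_train,
#           validation_data=(X_val, y_val),
#           epochs=100,
#           callbacks=[early_stop])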
Amazing stuff. I guess throwing a neural network at the stock market works after all...
Great work.
Thanks!
Are the correlation values in the tables flipped? Should the correlation value for the S&P be 1.0 and the Strategy -0.14?
Great write up, as always. I wonder if there are similar papers or methods for futures markets.
Depends on how you read it... the important thing is that -0.14 means it is uncorrelated :)
The paper:
Slow Momentum with Fast Reversion: A Trading Strategy Using Deep Learning and Changepoint Detection, from Kieran Wood, Stephen Roberts, and Stefan Zohren
is for futures. I will implement it and share whenever I find time (this is a bit longer project, though :))
Thanks!
Accidentally deleted my comment, so reposting:
"This article and detailed code is very helpful. It is nice to see how others go about writing an ML test framework. It can be daunting to put all the code out there and have everyone nitpick your implementation. That being said, it seems there is a very slight look ahead bias where you standardize features into z-scores using the entire datasets mean and std. Wouldn't it be better to use cumulative mean and std to ensure we standardize only on data in the past?"
Thanks! Yes, you can do that. I've done this exercise in the past, many years ago (only standardizing with previous returns data), and my conclusion is that it doesn't influence the end result, provided that the train dataset is sufficiently large (which is the case here). In my past exercises and experience, doing that adds a ton of code/complexity, but the end result is the same. So, I don't add that anymore.
But it's a good catch, you have a good eye :)
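For reference, one way to do the "cumulative mean and std" standardization discussed above is an expanding window that, for each row, uses only earlier data. This is a minimal pandas sketch; the 'symbol' and 'feature' column names are placeholders, not the article's actual schema:

import pandas as pd

def expanding_zscore(s: pd.Series) -> pd.Series:
    """Z-score each value using only the data that came before it.

    The shift(1) keeps the current observation out of its own mean/std,
    so no future information leaks into the standardization.
    """
    past_mean = s.expanding().mean().shift(1)
    past_std = s.expanding().std().shift(1)
    return (s - past_mean) / past_std

# Hypothetical usage on a per-symbol feature column:
# df['feature_z'] = df.groupby('symbol')['feature'].transform(expanding_zscore)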