19 Comments

Fascinating study. It gave me some ideas I'd like to apply to commodities as well. Your study involved lots of calculations. Can you describe some of the tech stack elements you use to process the data effectively? Do you do any database-side processing, for example in DuckDB?

Sorry for the late reply, working like crazy these days... I tend to not overcomplicate my tech stack, keeping things as simple as possible.

I mostly code in Python, and have developed programs to:

- Backtest my ideas (an event-based system I've been developing myself since my master's degree many many years ago)

- Forward test (I use IBKR and their native Python API)

- Live trade (same as above)

I mostly use Norgate data, which has a great integration with Python.

You mentioned DB... I prefer using Postgres (never used DuckDB).

In Python, the most important libraries to me are NumPy, pandas, SciPy, TA-Lib, Celery, and Matplotlib, on top of the standard library (I use threads/concurrency a lot).

It's a great article introducing how we can reconcile financial domain knowledge with statistical thinking. Thank you for your great work!

Such a good read. Can you suggest any framework for running backtests like yours?

I'd suggest a combination of a commercial backtester (like RealTest) and either something you develop yourself or an open-source option (like Zipline).

Using two helps me verify that my code is working as intended.
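
For anyone wanting to try this kind of cross-check, here is a minimal sketch of one way to do it: export the daily equity curve from each backtester and flag the dates where they disagree. The function name, the `{date: equity}` layout, the tolerance, and the numbers are all assumptions for illustration, not the setup described above.

```python
import math


def compare_equity_curves(curve_a, curve_b, rel_tol=1e-3):
    """Compare two {date: equity} dicts and list the dates where they disagree.

    Hypothetical helper: the dict layout and the 0.1% tolerance are
    illustrative choices, not taken from any particular backtester.
    """
    mismatches = []
    # ISO date strings sort chronologically, so sorted() keeps time order.
    for day in sorted(set(curve_a) | set(curve_b)):
        a = curve_a.get(day)
        b = curve_b.get(day)
        if a is None or b is None:
            mismatches.append((day, a, b, "missing in one backtester"))
        elif not math.isclose(a, b, rel_tol=rel_tol):
            mismatches.append((day, a, b, "equity differs"))
    return mismatches


# Made-up numbers standing in for the two exports.
commercial = {"2024-01-02": 100000.0, "2024-01-03": 100450.0, "2024-01-04": 100120.0}
homegrown = {"2024-01-02": 100000.0, "2024-01-03": 100451.0, "2024-01-04": 99500.0}

report = compare_equity_curves(commercial, homegrown)
for row in report:
    print(row)
```

Small differences (rounding, commission models) fall inside the tolerance; the first date that falls outside it is usually a good place to start looking for a bug in one of the two implementations.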

Thanks a lot man.

Can you suggest any good resources for those interested in ML but with no experience?

Assuming you have a solid base in math (probability, statistics, linear algebra, and a bit of calculus), you could read any of the books I mentioned. You could also practice on Kaggle (if you have a bit more free time).

If you are serious about it, I'd strongly recommend a Master's in Computer Science (which is what I did many years ago :))

Cheers!

Very interesting article. In the last version you say that you use a time limit of 6 days, but the summary says that the max trade duration is 14 days. Am I missing something? Thanks for sharing your work.

Thanks! Several small details might impact the execution and prevent us from getting out of the trade when the time limit triggers the sell order, so a few orders sometimes end up taking a bit longer to get executed.
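
To make the gap between the time limit and the realized trade duration concrete, here is a toy sketch. It assumes the exit order is first sent on the day the limit is reached and then keeps retrying until it can fill. The `fillable_days` flags are a hypothetical stand-in for whatever execution details delay a fill; nothing here reproduces the actual backtester.

```python
def simulate_time_stop(fillable_days, time_limit=6):
    """Return (day the exit order is first sent, day it actually fills).

    fillable_days[i] says whether an exit order could fill on holding
    day i + 1. This flag is a hypothetical simplification of real
    execution frictions.
    """
    order_day = None
    for day, fillable in enumerate(fillable_days, start=1):
        if order_day is None and day >= time_limit:
            order_day = day  # time limit reached: start trying to sell
        if order_day is not None and fillable:
            return order_day, day
    return order_day, None  # position still open when the data ends


# The order is sent on day 6 but cannot fill until day 8.
result = simulate_time_stop([True] * 5 + [False, False, True])
print(result)
```

In this toy run the rule fires on day 6 while the trade lasts 8 days, which is the kind of difference a summary statistic such as "max trade duration" would pick up.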

Great article. I too use ML to trade, and your article gave me a few ideas that I can try out in my own strategies. Much appreciated.

Thanks Martyn! That was exactly my intention! I'm glad it helped! Cheers!

I’m really enjoying your material. This was above my head for the most part but interesting to read nonetheless.

Thanks Cory!

Nice! This looks incredible. I have built something similar, but it's based only on fundamental data, and I got a similar result (around 2570% in the last 10 years). I am really curious how that QPI tool is built. I have thought about something like (some moving avg - std)/volatility, but it wasn't looking that great.

Very innovative indeed. I have a technical question: how do you estimate the probability of bouncing back? The ML model only outputs 0 and 1.

Thanks! And no, that’s wrong: the ML model outputs a float between 0 and 1. Cheers!

Yeah, I see. I should use predict_proba() on the xgb_clf, which outputs the class probabilities for the binary classification.
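
For readers following this exchange: in the scikit-learn-style interface that XGBoost's `XGBClassifier` implements, `predict()` returns hard class labels, while `predict_proba()` returns one row per sample with a probability for each class (column 1 is the positive class in a binary problem). The sketch below uses a hypothetical `StubClassifier` with made-up weights so it runs without xgboost installed; the helper `positive_class_probability` only assumes the model has a `predict_proba` method, so it should work the same way with a fitted classifier.

```python
import math


class StubClassifier:
    """Hypothetical stand-in mimicking the predict / predict_proba interface.

    It is a one-feature logistic model with fixed, made-up weights, used
    only so the example runs without xgboost.
    """

    def __init__(self, weight, bias):
        self.weight = weight
        self.bias = bias

    def predict_proba(self, X):
        # One row per sample: [P(class 0), P(class 1)].
        rows = []
        for (x,) in X:
            p1 = 1.0 / (1.0 + math.exp(-(self.weight * x + self.bias)))
            rows.append([1.0 - p1, p1])
        return rows

    def predict(self, X):
        # Hard labels are the positive-class probabilities cut at 0.5.
        return [int(row[1] >= 0.5) for row in self.predict_proba(X)]


def positive_class_probability(model, X):
    """Return P(class 1) for each sample from a predict_proba-style model."""
    return [float(row[1]) for row in model.predict_proba(X)]


def signals_above(probabilities, threshold):
    """Flag the samples whose probability clears a custom threshold."""
    return [p >= threshold for p in probabilities]


model = StubClassifier(weight=2.0, bias=-1.0)
X = [[-1.0], [0.75], [2.0]]  # made-up single-feature samples

labels = model.predict(X)                     # hard 0/1 labels
probs = positive_class_probability(model, X)  # floats between 0 and 1
strict = signals_above(probs, threshold=0.7)  # stricter cut than 0.5
print(labels, probs, strict)
```

Working with the probabilities rather than the labels lets a strategy demand more than a 50% estimated chance before acting, as the `threshold=0.7` line illustrates; the 0.7 value is arbitrary.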

Awesome read! What were the evaluation scores of the XGBoost model? Also, was it one model or one model per ticker?
