> A year ago, I naively wrapped the API and certainly felt this pain.
Most people, before being confronted with it, have no idea how big market data feeds really are: I certainly didn't know what I was getting into. There's a reason all these subscriptions are so pricey.
Here's an example of the pricing for the OPRA feed from Databento you mentioned:
https://databento.com/pricing#opra
We're talking about feeds that sustain 25+ Gb/s and can spike to two or even three times that. And that's only for options market data.
I mean, even people with 25 Gb/s fiber at home (which, to put it mildly, isn't the most common) still can't dream of ingesting the entire feed.
Having enough bandwidth, storing, and analyzing that much data: everything becomes problematic at such scales.
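To make the storage problem concrete, here's a back-of-the-envelope sketch (the 6.5-hour session length is my assumption for a US trading day; real capture would also need headroom for the spikes mentioned above):

```python
# Rough storage estimate for capturing a sustained 25 Gb/s feed
# over one assumed 6.5-hour trading session.
GBIT_PER_SEC = 25        # sustained rate from the feed
SESSION_HOURS = 6.5      # assumed US session length

bytes_per_sec = GBIT_PER_SEC / 8 * 1e9                    # 25 Gb/s -> ~3.1 GB/s
session_tb = bytes_per_sec * SESSION_HOURS * 3600 / 1e12  # bytes -> TB
print(f"~{session_tb:.0f} TB per session")                # ~73 TB, before spikes
```

So a single day of the raw feed is on the order of 73 TB, and that's before bursts or any other asset classes.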
As for me, I'm reusing my brokers' feeds (since I already pay for them): it's no panacea, but I get the info I need (plus balances/orders/etc. tied to my accounts).
I used the word firehose, which is typically reserved for streaming data, and that's a whole other very interesting problem space: Big Data vs. Fast Data.
I'm just noting, for interest, that firms are applying transformers and other networks at this streaming microstructure level, specifically trained for feature extraction. HRT and Nvidia have some nice videos about it.
I will also note that it's insane how much better all the LLMs have gotten at calling MCP tools in just a year, especially the local ones.
One of the reasons I like DuckDB is its scale flexibility: I started by grabbing data and playing with it on my laptop, then jumped to a high-core server with a NAS.