Hacker News

giancarlostoro, yesterday at 5:53 PM (3 replies)

Reminds me of when the ELK stack was called just ELK (I don't even know what it is now). We had a server we put it on, and after making the additional dashboards my manager wanted, we learned the limits of ES / ELK. It needs a ridiculous amount of memory, because it will shove everything in memory. Same thing when I learned that MongoDB indexing puts every item in memory as well, which is a yikes; why would you not want to index?

I bet there's money to be made building a drop-in replacement for either of those two that requires less memory. It would save companies a bundle, and make other companies a bundle as well.


Replies

hilariously, yesterday at 7:09 PM

There's no high-performance database that won't take all of your memory (at least up to the size of your data) if you let it.

That's because it's much, MUCH faster to do it that way, though if you can accept certain latency trade-offs in exchange for throughput, something like turbopuffer can do wonders for your costs.

Yokohiii, yesterday at 9:03 PM

Production-grade multi-tenant databases want to run *solely* on RAM.

> why would you not want to index?

Because if you don't need an index it wastes RAM, as you've learned. Maintaining indices also has a cost. Index only what you need.
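To make "index only what you need" concrete, here is a minimal sketch. It uses SQLite from the Python standard library rather than MongoDB, purely so it runs without a server; the table, column, and index names are made up for illustration. It indexes only the column the query actually filters on and compares the query plan before and after.

```python
import sqlite3

# In-memory SQLite database standing in for a real datastore (assumption:
# the thread is about MongoDB; SQLite is used only so this runs anywhere).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, payload TEXT)"
)
conn.executemany(
    "INSERT INTO events (user_id, payload) VALUES (?, ?)",
    [(i % 100, f"event-{i}") for i in range(1000)],
)


def plan(sql):
    """Return the human-readable detail text of SQLite's query plan for sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each plan row holds the description.
    return " ".join(row[-1] for row in rows)


query = "SELECT payload FROM events WHERE user_id = 42"

# Without an index the engine has to scan the whole table.
before = plan(query)

# Index only the column the query filters on. Every extra index costs
# memory/disk and slows down writes, so payload is left unindexed.
conn.execute("CREATE INDEX idx_events_user_id ON events (user_id)")
after = plan(query)

print("before:", before)
print("after: ", after)
```

The same habit carries over to other databases: look at the queries you actually run, index the fields they filter or sort on, and leave the rest alone.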

In the sense of the blog post: A senior with decent DB experience would have told you. ;)

Izkata, yesterday at 10:12 PM

For anything Lucene-based (Elasticsearch, Solr) this was a problem where some of the indexed data had to be transformed for another type of query to be efficient, and it put the transformed data into the Java heap and then never released it. I think the indexed data used for searching was read straight from disk and was fine, but analysis queries needed the transformed version?

At some point they added the docValues configuration option per-field to do the transformation during indexing and store it to disk instead, so none of it had to be stored in the heap. Instead, you're supposed to rely on the OS disk cache, which handles eviction automatically, so you can run with significantly less memory but still get performance improvements by adding memory, without having to change any configuration further.
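For reference, a rough sketch of what the per-field setting looks like. In Elasticsearch the mapping parameter is spelled `doc_values` (Solr's schema uses `docValues`). The field names below are hypothetical, and the dict is only the request body you would send when creating an index, not a full client call.

```python
import json


def build_index_body():
    """Return a hypothetical Elasticsearch index-creation body.

    Field names are invented for illustration; only the `doc_values`
    mapping parameter itself comes from Elasticsearch.
    """
    return {
        "mappings": {
            "properties": {
                # Aggregated and sorted on, so it needs the on-disk columnar
                # copy. doc_values is already the default for keyword fields;
                # it is spelled out here for clarity.
                "status": {"type": "keyword", "doc_values": True},
                # Only ever filtered on, never sorted or aggregated, so the
                # columnar copy can be skipped to save disk space.
                "session_id": {"type": "keyword", "doc_values": False},
                # Analyzed full-text field; text fields do not support
                # doc_values, so the parameter is omitted.
                "message": {"type": "text"},
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_index_body(), indent=2))
```

With settings like these the columnar data lives in files on disk, and how much of it stays hot is up to the OS page cache rather than the JVM heap.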