One has to wonder if this is due to the global memory shortage. ("Oh - changing our memory allocator to be more efficient will yield $XXM in savings over the next year.")
On top of cost, they probably cannot get as much memory as they order in a timely fashion, so offsetting that with greater efficiency matters right now.
Yeah, identifying single-digit millions of savings out of profiles is relatively common practice at Meta. It's ~easy to come up with a big number when the impact is scaled across a very large number of servers. There is a culture of measuring and documenting these quantified wins.
With the reputation of that company, one can imagine a lot of backstories that are even more depressing than a memory shortage.
Not just the shortage: any improvement to the memory footprint of LLMs, electricity, or servers is becoming much more valuable as time goes on. If you can get 10% faster, you can easily get a lead in the LLM race. The incentives to transparently improve performance are tremendous.
Oooh maybe finally time for lovingly hand-optimized assembly to come back in fashion! (It probably has in AI workloads or so I daydream)
> changing our memory allocator
they've been using jemalloc (and employing "je" - Jason Evans, its author) since 2009.
Facebook gave talks about this years ago (10+). Nobody was allowed to share real numbers, but several Facebook employees were allowed to say that the company had measured savings from optimizations. Reading between the lines, a 0.1% efficiency improvement to some parts of Facebook would save them on the order of $100,000 a month (again, real numbers were never publicly shared, so there is a range - it can't be less than $20,000), and so they had teams of people whose job it was to find those improvements.
Most of the savings seemed to come from HVAC costs, followed by buying fewer computers and, in turn, fewer data centers. I'm sure these days saving memory is also a big deal, but it doesn't seem to have been then.
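For a sense of the implied scale, here's a back-of-envelope sketch using the rough figure above (the $100,000/month number is the speculative one from the talks, not a real disclosed figure):

```python
# Back-of-envelope: if a 0.1% efficiency gain saves ~$100,000/month,
# what monthly spend on that part of the fleet does that imply?
# (Illustrative numbers only - Facebook never shared the real ones.)
improvement = 0.001          # 0.1% efficiency gain
monthly_saving = 100_000     # speculated savings, USD/month

implied_monthly_spend = monthly_saving / improvement
print(f"${implied_monthly_spend:,.0f}/month")  # $100,000,000/month
```

So even the conservative $20,000/month end of the range implies tens of millions of dollars a month of spend on whatever was being optimized, which is why dedicated efficiency teams pay for themselves.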
The above was already the case 10 years ago, so LLMs are at most another factor added on.