Hacker News

alyxya · yesterday at 11:18 PM

I expect large machine learning models to trend toward operating on bits rather than floats. Floats are inefficient here: trained weights are typically roughly normally distributed, clustered in a narrow range, so most of a float's dynamic range goes unused in both storage and computation. Neural networks are founded on real-valued functions, which we simulate with floats, but float operations are just bitwise operations underneath. The main obstacles are that GPUs are built to operate on floats and that standard ML theory works over real numbers.
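To make the clustering point concrete, here is a small sketch (my own illustration, not from the thread): weights drawn from a typical initialization-scale normal distribution sit in a tiny slice of float32's dynamic range, and an 8-bit integer code with a single scale factor covers them in a quarter of the storage.

```python
import numpy as np

# Toy weights from N(0, 0.02), roughly the scale common after
# standard initializations. (Illustrative numbers, not from the thread.)
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=100_000)

# float32 spans magnitudes from ~1e-38 to ~3e38, yet nearly every
# weight here falls inside |w| < 0.1.
inside = np.mean(np.abs(w) < 0.1)
print(f"fraction of weights with |w| < 0.1: {inside:.4f}")

# Symmetric 8-bit quantization: one float scale plus int8 codes.
scale = np.abs(w).max() / 127
w_q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
w_deq = w_q.astype(np.float32) * scale
print(f"max abs quantization error: {np.abs(w - w_deq).max():.6f}")
```

The worst-case error is bounded by half the scale step, which is tiny precisely because the weights are so tightly clustered.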


Replies

hrmtst93837 · today at 9:17 AM

Inference at low bit-widths is easy. Training is where the wheels come off, because you spend the saved math budget on gradient tricks and rescaling just to stop the model from drifting.
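One of the standard "gradient tricks" being referred to is the straight-through estimator (STE): sign() has zero gradient almost everywhere, so training keeps float "shadow" weights, binarizes them on the forward pass, and pretends binarization was the (clipped) identity on the backward pass. A minimal numpy sketch of that idea, on a made-up one-neuron regression:

```python
import numpy as np

# Hedged sketch of the straight-through estimator, not any particular
# paper's recipe. Float shadow weights are kept alongside the binary
# weights actually used in the forward pass.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.5, size=4)      # float shadow weights
x = np.array([1.0, -2.0, 0.5, 3.0])   # one toy input
target = 1.0
lr = 0.1

for _ in range(100):
    w_b = np.sign(w)                  # forward: binarized weights
    y = w_b @ x
    grad_y = 2 * (y - target)         # d/dy of (y - target)^2
    grad_wb = grad_y * x              # gradient w.r.t. binarized weights
    # STE: pass the gradient "straight through" sign(), but zero it
    # where |w| > 1 so shadow weights can't drift without effect.
    grad_w = grad_wb * (np.abs(w) <= 1.0)
    w = np.clip(w - lr * grad_w, -1.0, 1.0)

print("final loss:", (np.sign(w) @ x - target) ** 2)
```

The extra bookkeeping (shadow weights, masking, clipping) is exactly the overhead spent to keep binary training stable.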

That trade loses outside tight edge deployments. Float formats have stuck around for boring reasons: they handle ugly value ranges, and they fit the GPU stack people already own.

cubefox · today at 3:31 AM

> and standard ML theory works over real numbers.

This paper uses binary numbers only, even for training, with a solid theoretical foundation: https://proceedings.neurips.cc/paper_files/paper/2024/file/7...

TL;DR: They introduce a concept called "Boolean variation," a binary analog of the Newton/Leibniz derivative, which lets them do backpropagation directly in binary.
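I haven't reproduced the paper's formalism, but the classical "Boolean difference" from switching theory gives a flavor of what a derivative over bits can mean: for f: {0,1}^n → {0,1}, the difference with respect to input i is f(x with x_i=0) XOR f(x with x_i=1), which is 1 exactly when flipping that bit flips the output. A toy sketch (my own illustration, not the paper's construction):

```python
# Classical Boolean difference: a binary analog of a partial derivative.
# df/dx_i = f(x | x_i=0) XOR f(x | x_i=1), i.e. 1 iff the output is
# sensitive to input bit i at the point x.

def boolean_difference(f, x, i):
    """df/dx_i at point x (x is a tuple of 0/1 values)."""
    x0 = x[:i] + (0,) + x[i + 1:]
    x1 = x[:i] + (1,) + x[i + 1:]
    return f(x0) ^ f(x1)

# AND is sensitive to x0 only when x1 = 1:
f_and = lambda x: x[0] & x[1]
print(boolean_difference(f_and, (0, 1), 0))  # 1: flipping x0 flips the output
print(boolean_difference(f_and, (0, 0), 0))  # 0: output stuck at 0

# XOR is sensitive to every input at every point:
f_xor = lambda x: x[0] ^ x[1]
print(boolean_difference(f_xor, (0, 0), 0))  # 1
```

Chaining sensitivities like this through layers is, loosely, what a binary backpropagation has to do in place of multiplying real-valued gradients.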