Hacker News

exabrial · yesterday at 3:10 PM

A *2.4GB* ONNX? That is wild. This format continues to impress me. ONNX uses 32-bit single-precision floats I believe, so that's something like ~644m float params/constants. I recently dove deep into the 'traditional ML' side of the ONNX serialization format for the purposes of writing a JVM ML compiler for trees and regressions. ONNX is actually quite clever in the way it serializes trees into parallel arrays (which are then serialized using protobuf). My trees have capped out at < 32mb. I haven't dived into the neural net side of things yet, mainly because I don't have any models to run in prod. (https://github.com/exabrial/petrify if anyone is interested.)
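
For anyone who hasn't looked at it, here's a minimal sketch (Python, not petrify's actual JVM code) of what that parallel-array layout looks like. The array names mirror the real ONNX-ML TreeEnsembleRegressor attributes, but the tiny example tree and the leaf_weights dict are simplifications of mine, not the exact schema (the real op keeps leaf outputs in separate target_nodeids/target_weights arrays):

    # One tree: "if x[0] <= 5.0 then 1.0 else 2.0", stored as
    # parallel arrays indexed by node id (ONNX-ML style).
    nodes_featureids   = [0, 0, 0]                       # feature tested at each node
    nodes_modes        = ["BRANCH_LEQ", "LEAF", "LEAF"]  # node type
    nodes_values       = [5.0, 0.0, 0.0]                 # split threshold (unused at leaves)
    nodes_truenodeids  = [1, 0, 0]                       # child when the test passes
    nodes_falsenodeids = [2, 0, 0]                       # child when the test fails
    leaf_weights       = {1: 1.0, 2: 2.0}                # node id -> output (simplified here)

    def predict(x):
        node = 0
        while nodes_modes[node] != "LEAF":
            if x[nodes_featureids[node]] <= nodes_values[node]:
                node = nodes_truenodeids[node]
            else:
                node = nodes_falsenodeids[node]
        return leaf_weights[node]

    print(predict([3.0]))  # -> 1.0
    print(predict([7.0]))  # -> 2.0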


Replies

vunderba · yesterday at 4:06 PM

Same, I really like the ONNX format. I only wish ONNX models weren't so frustratingly difficult to use on Apple iOS. Apple's browser engine, WebKit, has become annoyingly restrictive over the years in terms of its working-memory cap.

I ran into quite a few out-of-memory issues in iOS Safari when I was building continuous voice recognition for my blind chess game, so people could play while on the go.

ollin · yesterday at 4:28 PM

Most ONNX files are fp32, but the ONNX format actually allows fp16, int8, etc. as well (see onnx.proto for the full list of dtypes [1] - they even have fp8/fp4 these days!). I ended up switching over to fp16 ONNX models for my own web-based inference project since the quality is ~identical and page loads get 2x faster.

[1] https://github.com/onnx/onnx/blob/main/onnx/onnx.proto#L605
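
If it helps anyone, the conversion looks roughly like this with the onnxconverter-common package; a sketch only, and the filenames are placeholders:

    # Convert an fp32 ONNX model to fp16 using onnxconverter-common
    # (pip install onnx onnxconverter-common).
    # "model.onnx" / "model_fp16.onnx" are placeholder filenames.
    import onnx
    from onnxconverter_common import float16

    model = onnx.load("model.onnx")
    model_fp16 = float16.convert_float_to_float16(model)
    onnx.save(model_fp16, "model_fp16.onnx")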

bring-shrubbery · yesterday at 5:04 PM

Yeah, it's pretty cool what a 2GB NN can do from a single image.