Hacker News

ollin · yesterday at 4:28 PM

Most ONNX files are fp32, but the ONNX format actually allows fp16, int8, etc. as well (see onnx.proto for the full list of dtypes [1] - they even have fp8/fp4 these days!). I ended up switching over to fp16 ONNX models for my own web-based inference project: the quality is ~identical, and since the files are half the size, page loads get ~2x faster.

[1] https://github.com/onnx/onnx/blob/main/onnx/onnx.proto#L605
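For the conversion itself, the usual tool is onnxconverter-common's convert_float_to_float16. The size/precision tradeoff is easy to see even without ONNX, though, since Python's struct module supports IEEE half precision (format character 'e'). A minimal sketch, with made-up weight values for illustration:

```python
import struct

# Pack the same weights as fp32 ('f') and fp16 ('e', IEEE half precision)
# to compare payload size and round-trip error. Values are made up for
# illustration; real model weights in a typical range behave similarly.
weights = [0.123456789, -1.5, 3.14159265, 0.001953125]

fp32_bytes = struct.pack(f'<{len(weights)}f', *weights)
fp16_bytes = struct.pack(f'<{len(weights)}e', *weights)
print(len(fp32_bytes), len(fp16_bytes))  # 16 8 -> half the payload

# Round-trip through fp16 and measure the worst-case absolute error.
roundtrip = struct.unpack(f'<{len(weights)}e', fp16_bytes)
max_err = max(abs(a - b) for a, b in zip(weights, roundtrip))
print(max_err)  # ~1e-3 at worst here, which is why quality stays ~identical
```

fp16 keeps 10 mantissa bits (~3 decimal digits), which is usually plenty for inference weights; int8/fp8 push the size down further but need calibration to keep quality.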


Replies

exabrial · today at 1:40 AM

Thanks for the pointer, actually. I'll need to take a look at this version of the spec.