I feel like it's a little disingenuous to compare against full-precision models. Anyone concerned about model size and memory usage is surely already using at least an 8 bit quantization.
Their main contribution seems to be hyperparameter tuning, yet they don't compare against any other quantization techniques.