Remaining dependent on proprietary frontier models that you can only access via an API makes no sense whatsoever. My hope is that the future is open-weight models running on local hardware.
Eventually, yes. Hopefully ParoQuant is the future here, with 4-bit weights and no real degradation from FP16:
https://github.com/z-lab/paroquant