I wonder if it really needs to be worse. I am playing with the idea of fine-tuning a model on my exact stack and coding patterns. I suspect I could get better performance by training “taste” into a model rather than breadth.
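Something like this minimal LoRA sketch (Hugging Face transformers + peft) is what I have in mind. The base model name, the corpus file, and the target module names (which assume a Llama-style architecture) are all placeholders on my part, not a tested recipe:

    from datasets import load_dataset
    from peft import LoraConfig, get_peft_model
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        DataCollatorForLanguageModeling,
        Trainer,
        TrainingArguments,
    )

    base = "some-open-code-model"  # placeholder: any open code LLM
    tokenizer = AutoTokenizer.from_pretrained(base)
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token  # needed for padded batches
    model = AutoModelForCausalLM.from_pretrained(base)

    # LoRA adapters train only a small fraction of the weights, which is
    # how "taste" could be layered onto a base model on a single GPU.
    lora = LoraConfig(
        r=16,
        lora_alpha=32,
        target_modules=["q_proj", "v_proj"],  # assumes Llama-style layer names
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora)

    # My own repos dumped to plain text, one sample per line (hypothetical file).
    data = load_dataset("text", data_files={"train": "my_stack_corpus.txt"})

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, max_length=1024)

    data = data.map(tokenize, batched=True, remove_columns=["text"])

    trainer = Trainer(
        model=model,
        args=TrainingArguments(
            output_dir="stack-tuned",
            num_train_epochs=1,
            per_device_train_batch_size=1,
        ),
        train_dataset=data["train"],
        # mlm=False gives plain causal-LM labels (predict the next token)
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()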
That approach has its advantages, but sometimes I want to generate code, following the accepted best practices, for a language or kind of project I’m not experienced with.
Fine-tuning these models (at least with PPO or an equivalent RL method) requires even more VRAM than inference does: gradients and optimizer state alone can multiply the footprint several times over, and PPO additionally keeps a frozen reference model (and usually reward/value models) resident.
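Back-of-envelope math for a 7B model makes the multiplier concrete. The byte counts assume fp16 weights with fp32 Adam state; treat them as rough assumptions, not measurements:

    params = 7e9                  # a 7B-parameter model

    inference = params * 2        # fp16 weights only, ignoring the KV cache

    weights = params * 2          # fp16 weights
    grads   = params * 2          # fp16 gradients
    adam    = params * 8          # two fp32 moment tensors (Adam)
    master  = params * 4          # fp32 master weights for mixed precision
    full_ft = weights + grads + adam + master

    print(f"inference:      ~{inference / 1e9:.0f} GB")  # ~14 GB
    print(f"full fine-tune: ~{full_ft / 1e9:.0f} GB")    # ~112 GB, before activations
    # PPO adds more on top: a frozen reference copy of the policy and
    # usually reward/value models all have to fit in memory too.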
I also wonder about JS-only, Python-only, etc. models.
Maybe the future is a selection of local models, each trained on a specific stack?