These competent open models you want to use were trained on data from people like you and me.
I wonder if there are competent models trained purely on permissive open-source code like MIT or Apache 2.0.
MIT and Apache 2.0 both require attribution, so it's not like limiting to those would help in license compliance.
MIT and Apache 2.0 both require attribution, so it's not like limiting to those would help in license compliance.