Mistral have small variants (3B, 8B, 14B, etc.), as do others like IBM Granite and Qwen. Then there are finetunes based on these models, depending on your workflow/requirements.
True, but anything remotely useful is 300B and above
True, but anything remotely useful is 300B and above