I find the mistral "middle" between small LMs /1T LMs compelling. Models that are sufficiently big to be performant but specialised for domains and tasks- this is what I assumed we'd always head towards.