Not sure what you mean by efficiency; this was part of the article and I read it differently, so can you clarify? Running at about 20 W for an hour on a laptop's M4 Pro, this model produces roughly 200k tokens (a book or two) at a typical electricity cost of less than a third of a US cent. Although the intelligence of this particular model is clearly not comparable to human intelligence, I always thought there is no contest between LLMs and humans in terms of energy efficiency: these models are far less energy expensive than humans. And if you use data-center-scale optimizations, serving LLMs is many additional orders of magnitude more efficient than running them at home. (The energy cost of inference on the M4 Pro and iPhone is listed in the article.)
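To make the back-of-envelope numbers concrete, here's a quick sketch of the arithmetic. The electricity price is an assumption on my part (roughly a typical US residential rate, not taken from the article):

```python
# Back-of-envelope check of the energy/cost claim above.
power_w = 20            # laptop power draw while generating (from the article)
hours = 1.0
tokens = 200_000        # tokens produced in that hour (from the article)
price_per_kwh = 0.15    # USD per kWh; assumed typical US rate, NOT from the article

energy_kwh = power_w * hours / 1000                    # 0.02 kWh
cost_usd = energy_kwh * price_per_kwh                  # dollars for the hour
energy_per_token_j = power_w * hours * 3600 / tokens   # joules per token

print(f"cost: {cost_usd * 100:.2f} cents, {energy_per_token_j:.2f} J/token")
# → cost: 0.30 cents, 0.36 J/token
```

So at that assumed rate the hour of generation costs about 0.3 cents, which is where the "less than a third of a US cent" figure comes from.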