What's up with -log(error / model size)? I'm not an LLM person, but doesn't a score of ~1 mean an error of roughly 37% relative to the model's size? I don't follow.
(math: -log(error / model size) = 1 <=> error / model size = 1/e ≈ 0.37)
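Here's a quick sketch of that inversion. It assumes the metric is the natural log of error divided by model size, which is how I'm reading it. If it's log10, a score of 1 would mean a ratio of 0.1 instead. The function names are just mine for illustration:

```python
import math

def score_from_ratio(error: float, model_size: float) -> float:
    """Assumed metric: negative natural log of (error / model_size)."""
    return -math.log(error / model_size)

def ratio_from_score(score: float) -> float:
    """Invert the assumed metric: error / model_size = e^(-score)."""
    return math.exp(-score)

# A score of 1 corresponds to error / model_size = 1/e
print(round(ratio_from_score(1.0), 3))  # 0.368
```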