I haven't really had a problem with thermal throttling, but my heaviest compute activity is inferencing. The main performance fall-off I've observed is that the context-size-to-token-output-rate curve is much steeper than I expected given the memory bandwidth, compared to GPU-based inferencing I've done on PC. Other than spinning up the fans during prompt processing, I'm able to stay at peak CPU usage without clock speeds dropping. Generally though, this only maintains peak compute utilization for around 2-3 minutes.
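For what it's worth, that steep curve is roughly what a memory-bandwidth roofline estimate predicts: each decoded token has to stream the full weights plus the KV cache accumulated so far, so tokens/sec degrades with context length even when bandwidth stays constant. A back-of-envelope sketch (all numbers are illustrative assumptions, not measurements of any specific machine):

```python
def tokens_per_sec(bandwidth_gbs, weight_gb, context_len, kv_bytes_per_token):
    # Memory-bound decoding: every token read streams all weights
    # plus the KV cache for the context so far.
    bytes_per_token = weight_gb * 1e9 + context_len * kv_bytes_per_token
    return bandwidth_gbs * 1e9 / bytes_per_token

# Assumed figures: ~400 GB/s bandwidth (Max-class Apple Silicon),
# a 7B model at 4-bit (~4 GB of weights), and ~0.5 MB of KV cache
# per token (2 * 32 layers * 32 heads * 128 dims * 2 bytes at fp16).
for ctx in (512, 4096, 32768):
    print(ctx, round(tokens_per_sec(400, 4, ctx, 0.5e6), 1))
```

The point is just that the falloff is linear in context size on the bandwidth axis, so a long context can halve or worse your token rate without any thermal throttling being involved.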
I'm wondering if there was something wrong with your particular unit?
CPU performance was acceptable; the GPU was the one falling off a cliff.
Re: particular unit, I’m not sure - it was perfectly fine during anything “normal,” and admittedly, asking a laptop to run at 100% for any extended period of time is already a big ask. But it’s possible, I suppose.
I’m waiting for the Studios to get the Max and/or Ultra, and will then reconsider whether I want one, or whether I don’t really need to play with local LLMs at this time.