At 40 GB per chip and a rumoured 5 to 7 TB for the proprietary flagships, you are looking at several megawatts to run a single model instance. Cerebras is insanely power hungry. It is funny how they are essentially a parallel accident to GPUs: chips made for other compute work that also happen to work for LLMs, just like gaming processors accidentally turned out to be good for LLMs.
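For anyone who wants to check the napkin math, here is a rough Python sketch. The 40 GB per chip and the 5 to 7 TB model size are the figures from above. The ~23 kW per system is my own assumption (roughly what a CS-2 class box is usually quoted at), and the function names are just made up for illustration. It counts weights only, uses 1 TB = 1000 GB, and ignores KV cache, interconnect and cooling overhead, so treat the result as a floor, not a measurement.

```python
import math

MEMORY_PER_CHIP_GB = 40      # from the comment above
POWER_PER_SYSTEM_KW = 23     # ASSUMPTION: rough per-system draw, not a measured figure


def chips_needed(model_size_tb: float) -> int:
    """Chips required just to hold the weights (1 TB = 1000 GB)."""
    return math.ceil(model_size_tb * 1000 / MEMORY_PER_CHIP_GB)


def power_megawatts(model_size_tb: float) -> float:
    """Total draw in MW, assuming one system per chip."""
    return chips_needed(model_size_tb) * POWER_PER_SYSTEM_KW / 1000


if __name__ == "__main__":
    for size_tb in (5, 7):  # rumoured range, not confirmed
        print(f"{size_tb} TB: {chips_needed(size_tb)} chips, "
              f"~{power_megawatts(size_tb):.1f} MW")
```

Under those assumptions it lands at roughly 3 to 4 MW for one instance, which is where "several megawatts" comes from. Swap in a different per-system power figure if you have a better one.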
The world will be much more interesting when real bespoke hardware built for actual LLM usage comes to market. This means silicon of the SIMD flavour or other variants, but using DRAM rather than SRAM so you can pack the memory more densely.