It could be using the same early trick Grok used (at least in its earlier versions): boot 10 agents that work on the problem in parallel, then take a consensus on the answer. That would explain both the price and the latency.
Essentially a newbie trick that works really well but isn't efficient, while still looking like an amazing breakthrough.
(if someone knows the actual implementation I'm curious)
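For the curious, a minimal sketch of what that parallel-sampling-plus-consensus (a.k.a. self-consistency / majority voting) pattern looks like. `ask_model` is a hypothetical stand-in for a real LLM API call, not any actual implementation:

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def ask_model(prompt: str, seed: int) -> str:
    # Hypothetical stand-in for an LLM API call; simulates agents
    # that mostly, but not always, agree on one answer.
    return "42" if seed % 4 != 0 else "41"

def consensus_answer(prompt: str, n_agents: int = 10) -> str:
    # Run n_agents independent samples in parallel, then majority-vote.
    with ThreadPoolExecutor(max_workers=n_agents) as pool:
        answers = list(pool.map(lambda s: ask_model(prompt, s), range(n_agents)))
    answer, _count = Counter(answers).most_common(1)[0]
    return answer

print(consensus_answer("What is 6 * 7?"))  # prints "42": 7 of 10 simulated agents agree
```

You pay roughly n_agents times the compute per query, which would line up with the observed price and latency.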
The magic number appears to be 12 in the case of GPT 5.2 Pro.