What would I do to demonstrate that they are bad at math? If by "maths" we mean things like working out a double integral for a joint probability problem, or anything simpler than that, GPT5 has been flawless.
Search the topic. It is historically documented. It might no longer be true though.
A way to test might be running an open model locally, directly (without a harness) where you could be sure it's not going through a translation layer. I think these days it might have this tool call behavior built in, but I think back in the day it was treated more like a magic trick. Without it, it behaved similar to "how many r's are in strawberry" for simple math.
Search the topic. It is historically documented. It might no longer be true though.
A way to test might be running an open model locally, directly (without a harness) where you could be sure it's not going through a translation layer. I think these days it might have this tool call behavior built in, but I think back in the day it was treated more like a magic trick. Without it, it behaved similar to "how many r's are in strawberry" for simple math.