I wonder why it's so bad. Do they just paste a CSV into the raw model? Because in my experience, even small local models can handle it reasonably well if the harness forces them to write & run a Python script that parses the table and performs the calculations, instead of relying solely on next-token prediction.