I saw this paper the other day - I feel its result may be because the "polite" prompts they have chosen aren't very good at putting the AI in the roleplay-space of a valued colleague, more like a sommelier or a high-end shopkeeper.
It disagrees with most other literature on the same topic, which is worth keeping in mind. This one studies GPT-4o, an old model now, but a lot of other studies are on even earlier models.
"Can you kindly consider the following problem" is not how anyone would actually speak to a valued colleague one considers smart. I've always been a fan of "I came across this and I know you're just the guy for the job" or "since you're an expert in this, reckon you could help me with xyz?" or "I know you tend to be a deep thinker on issues like this, and it clearly needs some brainpower behind it".
The "rude" ones are also funny, and clearly not written by people who speak English as a first language. This fact alone makes me wonder about the mere 250-prompt sample size.
> "Can you kindly consider the following problem" is not how anyone would actually speak to a valued colleague one considers smart.
Man idk, it's not how I talk, but there's like 100 million Nigerian English speakers, twice that many Indian, and they have some speech mannerisms that surprised me the first few times. I'm pretty sure I've heard exactly this from a colleague before.
Intuitions about what a native speaker would do with English are scrambled right now. I'm not even sure most English is spoken by native speakers anymore, and the boundary between a native speaker and someone who has "merely" been using it as their educational and professional language for their entire life is disorienting.
"Can you kindly consider the following problem" seems like the most respectful of all your examples, TBH. The others sound like ass-kissing, or even sarcastic/patronizing.