A recent research paper from Pennsylvania State University, "Mind Your Tone," reveals a counterintuitive phenomenon: addressing large language models in a direct or even rude tone may yield more accurate answers than polite phrasing. The study is the first to systematically measure how the tone of a question affects AI model performance.

The research team constructed a test set of 50 medium-difficulty multiple-choice questions spanning fields such as mathematics, science, and history. For each question, the researchers wrote five tone variants, ranging from very polite ("Could you kindly help me solve this problem?") through neutral ("Please answer this question") and terse ("Just give the answer") to outright rude ("If you're not stupid, answer this" and "You're useless, can you solve this?").

The test subject was OpenAI's GPT-4o model. To keep each trial independent, the researchers instructed the model to disregard prior conversation context and to output only the letter of its chosen option. The results showed that rude prompts pushed GPT-4o's accuracy to 84.8%, while very polite phrasing lowered it to 80.8%, a gap of four percentage points.
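
For illustration only, a minimal sketch of this protocol might look like the following Python loop. The tone phrasings, the sample question, and the answer-extraction rule are assumptions reconstructed from the article's description, not the authors' released code; the API call is the standard OpenAI chat-completions interface.

```python
# Hypothetical reconstruction of the tone-vs-accuracy experiment.
# Tone prefixes, questions, and scoring are assumptions based on
# the article, not the study's actual materials.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Five tone variants, paraphrased from the article's examples.
TONES = {
    "very_polite": "Could you kindly help me solve this problem?\n",
    "neutral":     "Please answer this question.\n",
    "terse":       "Just give the answer.\n",
    "rude":        "If you're not stupid, answer this:\n",
    "very_rude":   "You're useless, can you solve this?\n",
}

# Each item: (question text with lettered options, correct letter).
QUESTIONS = [
    ("What is 7 * 8?\nA) 54  B) 56  C) 58  D) 64", "B"),
    # ... the real study used 50 medium-difficulty MCQs
]

def ask(prompt: str) -> str:
    """Send one prompt; request only the option letter back."""
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            # Mirrors the paper's instruction to ignore prior context
            # and answer with the option letter alone.
            {"role": "system", "content": "Ignore any previous conversation. "
             "Reply with only the letter of the correct option."},
            {"role": "user", "content": prompt},
        ],
        temperature=0,
    )
    return resp.choices[0].message.content.strip()[:1].upper()

for tone, prefix in TONES.items():
    correct = sum(ask(prefix + q) == answer for q, answer in QUESTIONS)
    print(f"{tone}: {correct / len(QUESTIONS):.1%}")
```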

The research team's explanation is that overly polite prompts tend to carry filler and descriptive language that is irrelevant to the core question and interferes with the model's extraction of key information. Direct, command-style prompts, though less courteous, let the model focus on the question itself, reducing noise during processing.
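
To make the information-density point concrete, one could compare the token counts of a polite and a terse framing of the same question. This is an illustrative sketch, not part of the study; the example prompts are invented.

```python
# Illustration of the "noise" argument: the polite framing spends
# most of its tokens on content unrelated to the question. Example
# prompts are invented; o200k_base is the tokenizer GPT-4o uses.
import tiktoken

enc = tiktoken.get_encoding("o200k_base")

polite = ("Hello! I hope you're having a wonderful day. If it's not too "
          "much trouble, could you kindly help me with this question? "
          "What is the capital of France?")
terse = "What is the capital of France?"

print(len(enc.encode(polite)))  # many tokens, few about the question
print(len(enc.encode(terse)))   # almost every token is task-relevant
```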

Notably, this pattern does not hold for all AI models. Comparative tests on earlier models such as GPT-3.5 and Llama2-70B showed the opposite: those models responded better to polite questions, and a rude tone degraded response quality. The researchers speculate that newer models, having been exposed to more diverse tonal data during training, are better at filtering out irrelevant cues, allowing them to maintain or even improve performance in impolite contexts.

While the experimental results offer an interesting technical insight, in everyday practice users should still tailor their interaction style to the specific model and the needs of the scenario. The broader significance of the study is its reminder to developers and users alike that prompt design is not about politeness alone; information density and instruction clarity matter just as much.