ChatGPT Contradicts on Repeated Science Questions

💡 WSU study: ChatGPT contradicts itself on repeated science questions, a key warning for LLM reliability in applications.
⚡ 30-Second TL;DR
What Changed
A Washington State University study evaluated ChatGPT (GPT-3.5 and GPT-5 mini) for accuracy and consistency on scientific claims.
Why It Matters
Exposes reliability limits of LLMs on science tasks, pushing AI builders toward hybrid verification systems and better prompt engineering.
What To Do Next
Run the same science prompt 10 times against ChatGPT to benchmark answer consistency in your RAG pipeline.
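The repeat-prompt check above can be sketched as a small harness. This is a minimal sketch: `ask_model` is a placeholder stub so the code runs offline, and you would swap in a real call to your chat-completion client of choice.

```python
from collections import Counter


def consistency_rate(answers):
    """Fraction of repeated answers that agree with the majority answer.

    Ten runs that split 7 "true" / 3 "false" score 0.7, mirroring the
    study's ~73% consistency figure.
    """
    if not answers:
        return 0.0
    majority_count = Counter(answers).most_common(1)[0][1]
    return majority_count / len(answers)


def ask_model(prompt):
    # Placeholder stub (an assumption, not a real API): replace with a
    # chat-completion call that returns "true" or "false" for the claim.
    return "true"


def benchmark(prompt, n_runs=10):
    """Send the identical prompt n_runs times and score agreement."""
    answers = [ask_model(prompt) for _ in range(n_runs)]
    return consistency_rate(answers)
```

For example, `consistency_rate(["true"] * 7 + ["false"] * 3)` returns `0.7`; a reliable model should score close to 1.0 across repeated identical prompts.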
🧠 Deep Insight
Web-grounded analysis with 8 cited sources.
🔑 Enhanced Key Takeaways
- The study tested ChatGPT across two versions (GPT-3.5 in 2024 and GPT-5 mini in 2025) using 719 hypotheses from business journal papers published since 2021, revealing that accuracy improvements from 76.5% to 80% remain marginal when accounting for the random-guessing baseline (50% chance on true/false questions).
- ChatGPT's inconsistency rate is severe: when identical prompts were repeated 10 times, the model achieved only 73% consistency in statement evaluation, meaning roughly 1 in 4 repeated queries produced different answers on the same scientific claim.
- The model exhibits asymmetric failure modes, correctly identifying false hypotheses only 16.4% of the time—substantially worse than true hypothesis identification—suggesting ChatGPT has a systematic bias toward confirming statements rather than detecting falsehoods.
- Researcher Mesut Cicek characterized current AI tools as memorization systems without genuine comprehension, stating they 'don't understand the world the way we do' and 'don't have a brain,' framing the accuracy ceiling as a fundamental architectural limitation rather than a training data problem.
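To make the "marginal over random guessing" point in the first takeaway concrete, a chance-corrected score (a kappa-style rescaling; this framing is mine, not the study's metric) maps 50% accuracy on true/false questions to 0 and perfect accuracy to 1:

```python
def chance_adjusted_accuracy(accuracy: float, baseline: float = 0.5) -> float:
    """Rescale raw accuracy so the guessing baseline scores 0 and perfect scores 1."""
    return (accuracy - baseline) / (1.0 - baseline)


# Applying the rescaling to the study's reported accuracies:
gpt35_score = chance_adjusted_accuracy(0.765)    # GPT-3.5: ~0.53
gpt5_mini_score = chance_adjusted_accuracy(0.80)  # GPT-5 mini: ~0.60
```

On this scale the version-over-version jump is roughly 0.53 to 0.60, which is why the takeaway calls the improvement marginal.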
🔮 Future Implications
AI analysis grounded in cited sources.
⏳ Timeline
📎 Sources (8)
Factual claims are grounded in the sources below. Forward-looking analysis is AI-generated interpretation.
- sciencesprings.wordpress.com — From Washington State University: AI Gets a D, Study Shows Inaccuracies, Inconsistency in ChatGPT Answers
- katv.com — Study Finds ChatGPT Answers Inaccurate and Inconsistent, Washington State University Says
- wcti12.com — Study Finds ChatGPT Answers Inaccurate and Inconsistent, Washington State University Says
- kutv.com — Study Finds ChatGPT Answers Inaccurate and Inconsistent, Washington State University Says
- ktul.com — Study Finds ChatGPT Answers Inaccurate and Inconsistent, Washington State University Says
- komonews.com — Study Finds ChatGPT Answers Inaccurate and Inconsistent, Washington State University Says
- katu.com — Study Finds ChatGPT Answers Inaccurate and Inconsistent, Washington State University Says
- news.wsu.edu — AI Gets a D: Study Shows Inaccuracies, Inconsistency in ChatGPT Answers
Original source: cnBeta (Full RSS) ↗


