Or: How OpenAI Optimized for Benchmarks and Broke My Workflow
I’ve used ChatGPT since GPT-3. Not casually, but as a core part of my research and writing workflow. Image generation became available in version 4o, and I integrated it: “See this article? Generate an image that represents it.” Simple, conversational, reliable. It worked. Until three…
On February 19th I came across a tweet (or whatever we call the messages on X-formerly-known-as-Twitter these days) with over 8,500 views, claiming that new research from Microsoft and Salesforce should “scare every AI builder.” The thread declares that LLMs are fundamentally broken in multi-turn conversations, that “real conversations break every model on the market,”…
Dr. Randal Olson recently published research showing that modern AI models routinely flip their answers when challenged with a simple follow-up: “Are you sure?” Ask an AI a question. Get an answer. Then ask if it’s sure. Watch it suddenly reverse itself, retract its conclusion, or hedge into oblivion as though you’ve triggered existential dread.…