Here’s why people are saying GPT-4 is getting ‘lazy’

OpenAI and its technologies have been in the midst of scandal for most of November. Between the swift firing and rehiring of CEO Sam Altman and the curious case of the halted ChatGPT Plus paid subscriptions, OpenAI has kept the artificial intelligence industry in the news for weeks.

Now, AI enthusiasts have rehashed an issue that has many wondering whether GPT-4 is getting “lazier” as the language model continues to be trained. Many who use it to speed up more intensive tasks have taken to X (formerly Twitter) to air their grievances about the perceived changes.

OpenAI has safety-ed GPT-4 sufficiently that its become lazy and incompetent.

Convert this file? Too long. Write a table? Here's the first three lines. Read this link? Sorry can't. Read this py file? Oops not allowed.

So frustrating.

— rohit (@krishnanrohit) November 28, 2023

Rohit Krishnan on X detailed several of the mishaps he experienced while using GPT-4, which is the language model behind ChatGPT Plus, the paid version of ChatGPT. He explained that the chatbot has refused several of his queries or returned truncated versions of what he asked for, where he was previously able to get detailed responses. He also noted that the language model will use tools other than the one it has been instructed to use, such as DALL-E when a prompt asks for the code interpreter. Krishnan also sarcastically added that “error analyzing” is the language model’s way of saying “AFK [away from keyboard], be back in a couple of hours.”

Matt Wensing on X detailed his experiment, where he asked ChatGPT Plus to make a list of dates between now and May 5, 2024, and the chatbot required additional information, such as the number of weeks between those dates, before it was able to complete the initial task.
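For a sense of how little information the task itself needs, here is a minimal Python sketch of generating such a list. The start date and the daily and weekly spacings are assumptions made for illustration, since Wensing’s exact prompt is not reproduced here, and `dates_between` is a hypothetical helper name.

```python
from datetime import date, timedelta


def dates_between(start: date, end: date, step_days: int = 1) -> list[date]:
    """Return every date from start through end (inclusive), step_days apart."""
    if step_days < 1:
        raise ValueError("step_days must be at least 1")
    result = []
    current = start
    while current <= end:
        result.append(current)
        current += timedelta(days=step_days)
    return result


# Assumed start date, chosen only for illustration.
start = date(2023, 11, 28)
end = date(2024, 5, 5)

daily = dates_between(start, end)
weekly = dates_between(start, end, step_days=7)

print(len(daily), daily[0], daily[-1])
print(len(weekly), weekly[0], weekly[-1])
```

Only the two endpoints and a spacing are required. The number of weeks between the dates falls out of the calculation and does not need to be supplied up front.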

Wharton professor Ethan Mollick also shared his observations of GPT-4 after comparing a sequence of analyses he ran with the code interpreter in July to more recent queries from Tuesday. He concluded that GPT-4 is still knowledgeable, but noted that it explained to him how to fix his code rather than actually fixing it. In essence, he would have to do the work he was asking GPT-4 to do. Though Mollick did not intend to critique the language model, his observations fall in step with what others have described as “back talk” from GPT-4.

ChatGPT is known to hallucinate answers for information that it does not know, but these errors appear to go far beyond the common missteps of the AI chatbot. GPT-4 was introduced in March, but as early as July, reports of the language model getting “dumber” began to surface. A study by researchers at Stanford University and the University of California, Berkeley observed that GPT-4’s accuracy at identifying prime numbers dropped from 97.6% to 2.4% between March and June alone. It detailed that the paid version of ChatGPT was unable to provide the correct answer to a math problem with a detailed explanation, while the unpaid version, which still runs the older GPT-3.5 model, gave the correct answer and a detailed explanation of the mathematical process.

At the time, Peter Welinder, OpenAI’s vice president of product, suggested that heavy users might experience a psychological phenomenon where the quality of answers appears to degrade over time when the language model is actually becoming more efficient.

There has been discussion if GPT-4 has become "lazy" recently. My anecdotal testing suggests it may be true.

I repeated a sequence of old analyses I did with Code Interpreter. GPT-4 still knows what to do, but keeps telling me to do the work. One step is now many & some are odd. pic.twitter.com/OhGAMtd3Zq

— Ethan Mollick (@emollick) November 28, 2023

According to Mollick, the current issues might similarly be temporary and due to a system overload or a change in prompt style that hasn’t been made apparent to users. Notably, OpenAI cited a system overload as a reason for the ChatGPT Plus sign-up shutdown following the spike in interest in the service after its inaugural DevDay developers’ conference introduced a host of new functions for the paid version of the AI chatbot. There is still a waitlist in place for ChatGPT Plus. The professor also added that ChatGPT on mobile uses a different prompt style, which results in “shorter and more to-the-point answers.”

Yacine on X said that the unreliability of the latest GPT-4 model, caused by a drop in instruction adherence, has pushed them back to traditional coding, adding that they plan on creating a local code LLM to regain control of the model’s parameters. Other users have mentioned opting for open-source options in the midst of the language model’s decline.

Similarly, Reddit user Mindless-Ad8595 explained that more recent updates to GPT-4 have made it too smart for its own good. “It doesn’t come with a predefined ‘path’ that guides its behavior, making it incredibly versatile, but also somewhat directionless by default,” he said.

The programmer recommends users create custom GPTs that are specialized by task or application to increase the efficiency of the model output. He doesn’t provide any practical solutions for users remaining within OpenAI’s ecosystem.

App developer Nick Dobos shared his experience with GPT-4 mishaps, noting that when he prompted ChatGPT to write Pong in SwiftUI, he discovered various placeholders and to-dos within the code. He added that the chatbot would ignore commands and continue inserting these placeholders and to-dos into the code even when instructed to do otherwise. Several X users confirmed similar experiences with their own examples of code featuring placeholders and to-dos. Dobos’ post got the attention of an OpenAI employee who said they would forward examples to the company’s development team for a fix, with a promise to share any updates in the interim.

Overall, there is no clear explanation as to why GPT-4 is currently experiencing complications. Users discussing their experiences online have suggested many ideas. These range from OpenAI merging models to a continued server overload from running both GPT-4 and GPT-4 Turbo to the company attempting to save money by limiting results, among others.

It is well-known that OpenAI runs an extremely expensive operation. In April 2023, researchers indicated it took $700,000 per day, or 36 cents per query, to keep ChatGPT running. Industry analysts detailed at that time that OpenAI would have to expand its GPU fleet by 30,000 units to maintain its commercial performance for the remainder of the year. This would entail support of ChatGPT processes, in addition to the computing for all of its partners.
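Taken at face value, the two reported figures imply a rough daily query volume. The short Python sketch below works through that arithmetic. It assumes both numbers describe the same period and the same workload, which the original estimates may not guarantee, and the variable names are illustrative.

```python
# Figures as reported in April 2023.
daily_cost_usd = 700_000
cost_per_query_usd = 0.36

# Implied volume, assuming both figures cover the same workload.
implied_queries_per_day = daily_cost_usd / cost_per_query_usd

# Simple annualization of the daily figure (365 days, no growth assumed).
implied_annual_cost_usd = daily_cost_usd * 365

print(f"Implied queries per day: {implied_queries_per_day:,.0f}")
print(f"Implied annual cost: ${implied_annual_cost_usd:,}")
```

This works out to a little under two million queries per day and roughly $255 million per year, before any growth in usage.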

While waiting for GPT-4 performance to stabilize, users exchanged several quips, making light of the situation on X.

“The next thing you know it will be calling in sick,” Southrye said.

“So many responses with ‘and you do the rest.’ No YOU do the rest,” MrGarnett said.

The number of replies and posts about the problem is definitely hard to ignore. We’ll have to wait and see if OpenAI can tackle the problem head-on in a future update.

Fionna Agomuoh
Fionna Agomuoh is a technology journalist with over a decade of experience writing about various consumer electronics topics…