"Elon Musk promised Grok 3 would be the smartest AI ever. Spoiler alert: it wasn't." — AI critic says Sam Altman can breathe easy as xAI's model launch was a "carbon copy" of previous demos

The Grok X AI app displays on a mobile phone with the Grok X AI logo, as seen in this photo illustration.
(Image credit: Getty Images | NurPhoto)

After much anticipation and hype, xAI's next-gen Grok 3 model has finally shipped. xAI CEO and billionaire Elon Musk touted it as the "smartest AI on earth," claiming it outperformed rival models from top AI firms, including OpenAI, Anthropic, DeepSeek, and Google, across a wide range of benchmarks spanning math, science, and coding.

The performance boost may stem from additional compute: Musk said Grok 3 is "complete with 10X more compute" than its predecessor. Speaking during the product's launch on X (formerly Twitter), Musk said:

“Grok 3 is an order of magnitude more capable than Grok 2...[It’s a] maximally truth-seeking AI, even if that truth is sometimes at odds with what is politically correct.”

"We're continually improving the models every day, and literally within 24 hours, you'll see improvements," added Musk. According to xAI, Grok 3 surpasses OpenAI's GPT-4o on several benchmarks, including AIME, which evaluates a model's math capabilities, and GPQA, which evaluates its science capabilities.

Meanwhile, Andrej Karpathy, an OpenAI co-founder and Tesla's former AI lead, shared some early impressions of Grok 3's performance:

"As far as a quick vibe check over ~2 hours this morning, Grok 3 + Thinking feels somewhere around the state of the art territory of OpenAI's strongest models (o1-pro, $200/month), and slightly better than DeepSeek-R1 and Gemini 2.0 Flash Thinking. Which is quite incredible considering that the team started from scratch ~1 year ago, this timescale to state of the art territory is unprecedented. Do also keep in mind the caveats - the models are stochastic and may give slightly different answers each time, and it is very early, so we'll have to wait for a lot more evaluations over a period of the next few days/weeks. The early LM arena results look quite encouraging indeed. For now, big congrats to the xAI team, they clearly have huge velocity and momentum and I am excited to add Grok 3 to my 'LLM council' and hear what it thinks going forward."

Everything you need to know about Grok 3


Grok 3, trained at xAI's Memphis data center with 200,000 GPUs, earned higher ratings than its competitors on Chatbot Arena, a crowdsourced platform where users compare responses from different AI models.

Grok 3 ships with two modes: Think and Big Brain. The former handles general queries, while the latter tackles harder ones, drawing on more compute for deeper reasoning.

According to xAI, Grok 3 Reasoning and Grok 3 mini Reasoning can think and reason through problems, much like OpenAI's o3-mini and DeepSeek's R1. The tool also ships with a new DeepSearch feature for research, brainstorming, and data analysis, taking on the Deep Research tools from OpenAI and Perplexity.

Grok 3 has already rolled out to X users subscribed to the Premium+ tier. xAI also plans to launch a new subscription plan dubbed SuperGrok, which includes exclusive access to DeepSearch, better reasoning capabilities, and unlimited image generation.

Separately, Musk plans to open-source Grok 2 within the next few months:

“Our general approach is that we will open-source the last version [of Grok] when the next version is fully out. When Grok 3 is mature and stable, which is probably within a few months, then we’ll open-source Grok 2.”

Ethan Mollick, an associate professor at the University of Pennsylvania's Wharton School, suggested that Grok 3 isn't clearly leading the AI space despite Musk's claims:

  • X has caught up with the frontier of released models VERY quickly, if they continue to scale this fast, they are a major player. That said, while their base model is currently leading the Chatbot Arena, their benchmarks are not clearly beating OpenAI's o3
  • Grok 3 is closely following the OpenAI playbook, including using the same product mix
  • Not sure whether firms will use the Grok API at this point, given the enterprise partnerships (Azure, AWS, etc.), support and extensive sales & training efforts for the other big labs, I don't know if Grok has a big opening.

While Grok 3's performance against OpenAI's o3 remains debatable, Gary Marcus, founder of Geometric Intelligence, was blunter (via Business Insider):

"Elon Musk promised that Grok 3 would be the smartest AI ever. Spoiler alert: it wasn't."

Marcus branded Grok 3's launch a "carbon copy" of previous demos. He said that while the model shows great promise, its performance has yet to reach the heights of OpenAI's models. "Sam Altman can breathe easy for now," he added. "No major leap forward here."

Kevin Okemwa
Contributor

Kevin Okemwa is a seasoned tech journalist based in Nairobi, Kenya, with extensive experience covering the latest trends and developments in the industry at Windows Central. With a passion for innovation and a keen eye for detail, he has written for leading publications such as OnMSFT, MakeUseOf, and Windows Report, providing insightful analysis and breaking news on everything revolving around the Microsoft ecosystem. You'll also catch him occasionally contributing at iMore about Apple and AI. While AFK and not busy following the ever-emerging trends in tech, you can find him exploring the world or listening to music.

  • GraniteStateColin
    Going just on the information in the article, it seems that, other than Gary Marcus, others, including OpenAI co-founder Karpathy, praised Grok 3 as already being as good as the most expensive subscription version of ChatGPT, and as improving faster.

    If that's correct, then the headline appears to be a bit misleading. A valid quote for sure, but apparently an outlier in being more negative on xAI than others. Given that Musk has recently become a political target, objective analysis will be tougher to come by, with those on the right likely to praise and those on the left likely to criticize, regardless of merit in both cases. This makes it more challenging for journalists looking to report as objectively as possible.

    Note that the one negative comment in the article is from Gary Marcus, someone the source Business Insider article calls "a longtime critic of AI hype."