Anthropic's upgraded Claude AI model outperforms OpenAI-o1 in coding and can use a Windows 11 PC like humans — potentially backing NVIDIA CEO's claim about software development being dead in the water

Anthropic's upgraded Claude 3.5 Sonnet AI model (Image credit: Anthropic)

What you need to know

  • Anthropic recently shipped an upgraded version of Claude 3.5 Sonnet alongside a new model dubbed Claude 3.5 Haiku, with enhanced coding capabilities and more.
  • The AI firm also unveiled computer use, a new capability that allows users to prompt Claude to use computers as people do.
  • The company admits that shipping the capability to the public poses great risks, but it plans to use the release to observe how people are leveraging the tool. It has measures in place to prevent misuse, such as restricting the model's access to the web during training.

The generative AI landscape is seemingly transitioning to the next phase beyond AI-generated images and text. Anthropic recently unveiled an upgraded version of Claude 3.5 Sonnet and a new model dubbed Claude 3.5 Haiku. According to the company, the upgraded Sonnet ships with enhanced coding capabilities, while Claude 3.5 Haiku matches the performance of Anthropic's Claude 3 Opus LLM on many evaluations.

More interestingly, Anthropic introduced a new capability dubbed “Computer Use,” which is available in public beta. Through the API, developers "can direct Claude to use computers the way people do," namely "by looking at a screen, moving a cursor, clicking buttons, and typing text." According to Anthropic, this makes Claude 3.5 Sonnet the first frontier AI model to offer computer use in public beta.
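To make the idea concrete, here is a minimal sketch of the loop a developer's own code might run: the model requests an action, and the application carries it out on a screen and reports back. Everything in it is an assumption made for illustration. The `FakeDesktop` class, the `run_actions` helper, and the action names are hypothetical stand-ins, not Anthropic's API, and no real screen or model is involved.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FakeDesktop:
    """Hypothetical stand-in for a real screen (illustration only)."""

    cursor: tuple[int, int] = (0, 0)
    buttons: dict[str, tuple[int, int]] = field(default_factory=dict)
    focused: str | None = None
    typed: list[str] = field(default_factory=list)

    def screenshot(self) -> dict:
        # A real agent would receive pixels; this sketch returns a summary.
        return {"cursor": self.cursor, "focused": self.focused}

    def mouse_move(self, x: int, y: int) -> None:
        self.cursor = (x, y)

    def left_click(self) -> None:
        # Focus the button located exactly under the cursor, if there is one.
        self.focused = next(
            (name for name, pos in self.buttons.items() if pos == self.cursor),
            None,
        )

    def type_text(self, text: str) -> None:
        self.typed.append(text)


def run_actions(desktop: FakeDesktop, actions: list[dict]) -> list[dict]:
    """Apply requested actions in order; collect what each screenshot shows."""
    observations = []
    for action in actions:
        kind = action["action"]
        if kind == "screenshot":
            observations.append(desktop.screenshot())
        elif kind == "mouse_move":
            desktop.mouse_move(*action["coordinate"])
        elif kind == "left_click":
            desktop.left_click()
        elif kind == "type":
            desktop.type_text(action["text"])
        else:
            raise ValueError(f"unsupported action: {kind}")
    return observations


# Look at the screen, move the cursor, click a button, type, then look again.
desktop = FakeDesktop(buttons={"Submit": (120, 40)})
observations = run_actions(
    desktop,
    [
        {"action": "screenshot"},
        {"action": "mouse_move", "coordinate": [120, 40]},
        {"action": "left_click"},
        {"action": "type", "text": "hello"},
        {"action": "screenshot"},
    ],
)
```

In a real integration, the list of actions would come from the model one step at a time, with each screenshot sent back so it can decide what to do next.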

Anthropic admits that users could encounter several setbacks while interacting with the model, including errors and a not-so-seamless user experience. The company hopes to use feedback to enhance and improve the model's performance and efficiency.

Companies like Asana, Canva, Cognition, DoorDash, Replit, and The Browser Company have joined the fold to simplify processes that often require dozens of steps. For instance, "Replit is using Claude 3.5 Sonnet's capabilities with computer use and UI navigation to develop a key feature that evaluates apps as they’re being built for their Replit Agent product."

The upgraded version of Claude 3.5 Sonnet is available on the Anthropic API, Amazon Bedrock, and Google Cloud’s Vertex AI. Anthropic is expected to ship Claude 3.5 Haiku later this month.

Per benchmarks shared by the company, Anthropic's updated Claude 3.5 Sonnet shows a significant performance boost, especially in coding. For instance, its score on SWE-bench Verified has improved from 33.4% to 49.0%, which is higher than publicly available models, including OpenAI's o1 reasoning models (code-named Strawberry), while maintaining the same price and speed as its predecessor.
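For a sense of scale, the SWE-bench Verified scores quoted above can be read two ways: as a gain in percentage points or as a relative improvement over the earlier score. The short sketch below works out both; the `improvement` helper is a hypothetical name used only for this illustration.

```python
from __future__ import annotations


def improvement(before_pct: float, after_pct: float) -> tuple[float, float]:
    """Return (gain in percentage points, relative gain in percent)."""
    absolute = after_pct - before_pct
    relative = absolute / before_pct * 100
    return round(absolute, 1), round(relative, 1)


points, relative = improvement(33.4, 49.0)
print(f"+{points} percentage points, about {relative}% relative improvement")
# prints: +15.6 percentage points, about 46.7% relative improvement
```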

Related: NVIDIA CEO claims coding could be dead in the water with the prevalence of AI

The model corrects its mistakes by making another attempt at a task when it "realizes" it has hit an issue that is steering it away from the desired output. As you may know, OpenAI o1 and o1-mini are exceptionally good at coding and have passed OpenAI's research engineer hiring interview for coding at a 90-100% rate.

AI agents are here but proceed with caution

While the highlighted improvements are impressive, the updated Claude 3.5 Sonnet AI model completed fewer than half of the tasks assigned in an evaluation designed to establish its proficiency in modifying flight reservations. The model failed approximately a third of the time while attempting to initiate a return.

Anthropic notes that the model struggles with zooming and scrolling, and that the way it processes screenshots makes it easy for it to miss pop-up notifications. “Claude’s Computer Use remains slow and often error-prone,” the company added.

The company admits that releasing the model to the public poses significant risks but argues that the benefits of observing how the model is used outweigh the dangers.

According to Anthropic:

"We think it’s far better to give access to computers to today’s more limited, relatively safer models. This means we can begin to observe and learn from any potential issues that arise at this lower level, building up computer use and safety mitigations gradually and simultaneously."

To keep bad actors from using the tool's sophisticated capabilities to cause harm, the new Claude 3.5 Sonnet isn't trained on users’ screenshots and prompts. It was also restricted from accessing the web during training. Anthropic additionally developed classifiers that steer the model away from high-risk actions like creating accounts and posting on social media.

Kevin Okemwa
Contributor

Kevin Okemwa is a seasoned tech journalist based in Nairobi, Kenya with lots of experience covering the latest trends and developments in the industry at Windows Central. With a passion for innovation and a keen eye for detail, he has written for leading publications such as OnMSFT, MakeUseOf, and Windows Report, providing insightful analysis and breaking news on everything revolving around the Microsoft ecosystem. You'll also catch him occasionally contributing at iMore about Apple and AI. While AFK and not busy following the ever-emerging trends in tech, you can find him exploring the world or listening to music.