Microsoft's neural language AI model surpasses human performance in SuperGLUE test

What you need to know

Microsoft's DeBERTa AI model outperformed humans in a test of natural language understanding.
The AI earned higher marks than the human baseline in the SuperGLUE test.
Google also has an AI that beats the human baseline, though Microsoft's AI model scores higher on the same test.

As explained by Microsoft, SuperGLUE is one of the most challenging benchmarks for natural language understanding. Microsoft shares an example in its recent blog post:

Given the premise "the child became immune to the disease" and the question "what's the cause for this?," the model is asked to choose an answer from two plausible candidates: 1) "he avoided exposure to the disease" and 2) "he received the vaccine for the disease."

This is a simple question for humans, who have background knowledge and are used to placing things in context, but it's a challenging one for AI. To answer it correctly, an AI model needs to understand cause and effect as well as both of the options presented to it. As Microsoft explains, the SuperGLUE test covers tasks including natural language inference, co-reference resolution, and word sense disambiguation.
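For readers who want to see the shape of such a question, here is a minimal Python sketch of the example Microsoft quotes, treated as a two-choice item. The class name `CausalChoiceItem`, the `choose` helper, and the keyword-based `toy_scorer` are all hypothetical names made up for illustration. The scorer is only a stand-in for a trained model and is not how DeBERTa works.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class CausalChoiceItem:
    """One two-choice question: a premise, a question, and two candidate answers."""
    premise: str
    question: str
    choices: Tuple[str, str]


def choose(item: CausalChoiceItem,
           scorer: Callable[[str, str, str], float]) -> int:
    """Return the index (0 or 1) of the candidate the scorer rates highest."""
    scores = [scorer(item.premise, item.question, c) for c in item.choices]
    return scores.index(max(scores))


# The example quoted from Microsoft's blog post.
item = CausalChoiceItem(
    premise="the child became immune to the disease",
    question="what's the cause for this?",
    choices=(
        "he avoided exposure to the disease",
        "he received the vaccine for the disease",
    ),
)


def toy_scorer(premise: str, question: str, choice: str) -> float:
    # Placeholder assumption: a real model would return a learned
    # plausibility score; this keyword check only keeps the sketch runnable.
    return 1.0 if "vaccine" in choice else 0.0


picked = choose(item, toy_scorer)
print(item.choices[picked])
```

The point of the sketch is the interface: the model never writes an answer of its own. It only has to rate two given candidates, which is why understanding both options matters.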

The DeBERTa model was recently updated to include 48 Transformer layers and 1.5 billion parameters. As a result, the DeBERTa model earned a macro-average score of 90.3 in the SuperGLUE test. The human baseline for the same test is 89.8.
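A macro-average is an unweighted mean of per-task scores, so each task counts equally no matter how many examples it contains. The short Python sketch below shows the arithmetic. The per-task numbers are invented for illustration and are not DeBERTa's actual results; only the 89.8 human baseline comes from the figures above.

```python
from statistics import mean

HUMAN_BASELINE = 89.8  # human baseline reported for SuperGLUE


def macro_average(task_scores):
    """Unweighted mean of per-task scores; every task counts equally."""
    if not task_scores:
        raise ValueError("need at least one task score")
    return mean(task_scores.values())


# Hypothetical per-task scores, for illustration only.
example_scores = {"task_a": 92.0, "task_b": 88.0, "task_c": 90.0}

overall = macro_average(example_scores)
print(f"macro-average: {overall:.1f}")
print("above human baseline:", overall > HUMAN_BASELINE)
```

Because the mean is unweighted, a weak result on a single task pulls the overall score down as much as a weak result on any other task.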

Microsoft states that it will release the DeBERTa model and its source code to the public.

Microsoft explains that the DeBERTa AI model beating humans in the SuperGLUE test doesn't mean that it's as intelligent as humans.

Despite its promising results on SuperGLUE, the model is by no means reaching the human-level intelligence of NLU. Humans are extremely good at leveraging the knowledge learned from different tasks to solve a new task with no or little task-specific demonstration. This is referred to as compositional generalization, the ability to generalize to novel compositions (new tasks) of familiar constituents (subtasks or basic problem-solving skills). Moving forward, it is worth exploring how to make DeBERTa incorporate compositional structures in a more explicit manner, which could allow combining neural and symbolic computation of natural language similar to what humans do.

Microsoft's DeBERTa model isn't the first to beat the human baseline on the SuperGLUE test. Google's "T5 + Meena" model hit a score of 90.2 on January 5, 2021. Microsoft's DeBERTa model beat Google's with a score of 90.3 just a day later.

Sean Endicott is a news writer and apps editor for Windows Central with 11+ years of experience. A Nottingham Trent journalism graduate, Sean has covered the industry’s arc from the Lumia era to the launch of Windows 11 and generative AI. Having started at Thrifter, he uses his expertise in price tracking to help readers find genuine hardware value.
