Microsoft's neural language AI model surpasses human performance in SuperGLUE test

Microsoft Logo at Ignite (Image credit: Windows Central)

What you need to know

  • Microsoft's DeBERTa AI model outperformed humans in a test of natural language understanding.
  • The AI earned higher marks than the human baseline in the SuperGLUE test.
  • Google also has an AI that beats the human baseline, though Microsoft's AI model scores higher on the same test.

Microsoft invests heavily in artificial intelligence across a wide range of sectors. One of them is natural language understanding, which aims to have AI models understand everyday language. This is a particularly tricky challenge for machines, but Microsoft's DeBERTa AI model recently scored higher than the human baseline in the SuperGLUE test.

As explained by Microsoft, SuperGLUE is one of the most challenging benchmarks for natural language understanding. Microsoft shares an example in its recent blog post:

Given the premise "the child became immune to the disease" and the question "what's the cause for this?," the model is asked to choose an answer from two plausible candidates: 1) "he avoided exposure to the disease" and 2) "he received the vaccine for the disease."

This is a simple question for humans. We have background information and are used to placing things within context, but it's a challenging question for AI. To answer this question correctly, an AI model needs to understand cause and effect, as well as both options presented to it. The SuperGLUE test includes natural language inference, co-reference resolution, and word sense disambiguation, as explained by Microsoft.

The DeBERTa model was recently updated to include 48 Transformer layers and 1.5 billion parameters. As a result, the DeBERTa model earned a macro-average score of 90.3 in the SuperGLUE test. The human baseline for the same test is 89.8.
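For readers unfamiliar with the term, a macro-average is an unweighted mean: each task's score counts equally, no matter how many examples the task contains. Here is a minimal Python sketch of that arithmetic. The function name, task names, and scores are placeholders invented for illustration; they are not actual SuperGLUE tasks or results.

```python
def macro_average(task_scores: dict[str, float]) -> float:
    """Return the unweighted mean of per-task scores.

    Every task contributes equally, regardless of its size.
    """
    if not task_scores:
        raise ValueError("need at least one task score")
    return sum(task_scores.values()) / len(task_scores)


# Placeholder values for illustration only; NOT real benchmark results.
example_scores = {"task_a": 80.0, "task_b": 90.0, "task_c": 100.0}

print(macro_average(example_scores))  # prints 90.0
```

Because every task carries the same weight, a weak result on one small task pulls the overall score down just as much as a weak result on a large one.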

Microsoft states that it will release the DeBERTa model and its source code to the public.

Microsoft explains that the DeBERTa AI model beating out humans in the SuperGLUE test doesn't mean that it's as intelligent as humans.

Despite its promising results on SuperGLUE, the model is by no means reaching the human-level intelligence of NLU. Humans are extremely good at leveraging the knowledge learned from different tasks to solve a new task with no or little task-specific demonstration. This is referred to as compositional generalization, the ability to generalize to novel compositions (new tasks) of familiar constituents (subtasks or basic problem-solving skills). Moving forward, it is worth exploring how to make DeBERTa incorporate compositional structures in a more explicit manner, which could allow combining neural and symbolic computation of natural language similar to what humans do.

Microsoft's DeBERTa model isn't the first to beat the human baseline on the SuperGLUE test. Google's "T5 + Meena" model hit a score of 90.2 on January 5, 2021. Microsoft's DeBERTa model beat Google's with a score of 90.3 just a day later.

Sean Endicott
News Writer and apps editor

Sean Endicott is a tech journalist at Windows Central, specializing in Windows, Microsoft software, AI, and PCs. He's covered major launches, from Windows 10 and 11 to the rise of AI tools like ChatGPT. Sean's journey began with the Lumia 740, leading to strong ties with app developers. Outside writing, he coaches American football, utilizing Microsoft services to manage his team. He studied broadcast journalism at Nottingham Trent University and is active on X @SeanEndicott_ and Threads @sean_endicott_.