Microsoft takes on OpenAI's Sora with a cutting-edge AI tool capable of turning a static image into a 'Talking Tom'

Robot standing in front of city with Microsoft logo
(Image credit: Windows Central)

What you need to know

  • Microsoft has unveiled VASA, a new AI framework that turns a single static image and a speech audio clip into a short talking-face video.
  • The framework supports 512x512 videos at up to 40 FPS with negligible latency.
  • Microsoft is exploring different avenues to ensure the tool is used responsibly before releasing it to the general public.

Microsoft recently unveiled VASA — a new framework that generates "lifelike talking faces of virtual characters with appealing visual affective skills (VAS), given a single static image and a speech audio clip."

VASA-1 can transform a static image into a short clip by producing lip movements that closely synchronize with a speech audio clip. According to Microsoft, the technology makes the AI-generated result lifelike by "capturing a large spectrum of facial nuances and natural head motions that contribute to the perception of authenticity and liveliness."

Will Microsoft's VASA fuel widespread deepfakes?

As AI tools have spread, deepfakes have become more common on social media platforms, along with AI-generated misinformation about elections. Now that a tool like VASA-1 can deliver high video quality with lifelike facial and head dynamics from a static image, a major concern is how it will affect the credibility of news and information on the internet.

The tool supports 512x512 videos at up to 40 FPS with negligible latency. As it happens, I recently came across a video on LinkedIn that resembled Microsoft's VASA-generated clips. It was noticeably off in several respects, including the tone and the lip and head movements.

As more people continue to embrace AI, tools like VASA and Image Creator from Designer will improve at generating images and clips. Such tools are already raising concerns among professionals in the built environment industry, since they are good at generating structural designs and could render those professionals obsolete.

We recently reported on a bizarre incident where a popular Canadian rapper used AI to generate a verse in a deceased rapper's voice, without the estate's approval, and featured it in a track. As with the LinkedIn video, the flow on the diss track was off, but the deceased rapper's voice was uncanny.

Microsoft says it has no plans to release "an online demo, API, product, additional implementation details, or any related offerings" until it has measures in place to ensure the technology is used responsibly.

Kevin Okemwa
Contributor

Kevin Okemwa is a seasoned tech journalist based in Nairobi, Kenya with lots of experience covering the latest trends and developments in the industry at Windows Central. With a passion for innovation and a keen eye for detail, he has written for leading publications such as OnMSFT, MakeUseOf, and Windows Report, providing insightful analysis and breaking news on everything revolving around the Microsoft ecosystem. You'll also catch him occasionally contributing at iMore about Apple and AI. While AFK and not busy following the ever-emerging trends in tech, you can find him exploring the world or listening to music.