Microsoft and Google release new AI models on the same day: voice, image, and local open-source capabilities all debuting together


Microsoft and Google both announced new AI models on Thursday, but their approaches differ sharply. Microsoft is rolling out a new family of foundation models, MAI, available only through Azure Foundry and the US-only MAI Playground platform. Google, meanwhile, is introducing Gemma 4, an entirely new set of open-source models that can be run locally. Google has also changed the licensing terms for these new open-source models to Apache 2.0.

Three “world-class” in-house MAI models

Microsoft’s “world-class” in-house MAI models include three offerings:

First is MAI-Transcribe-1, an “advanced” speech-to-text model that can understand 25 of the most widely used languages worldwide. Its batch transcription speed is 2.5x faster than Microsoft’s existing Azure Fast solution.

Next is MAI-Voice-1, a new speech generation model that can generate 60 seconds of audio in just 1 second. It also supports creating custom voices in Microsoft Foundry from short audio samples.

Finally, there is MAI-Image-2, a faster text-to-image model. It has already started rolling out in Copilot and will gradually come to Bing and PowerPoint next.

Microsoft says:

“We’re rapidly deploying these top-tier models to support our own consumer and business products. Soon you’ll see more models across Foundry and in Microsoft’s various products and experiences.”

Google’s Gemma 4 open-source model

Google’s Gemma 4 open-source models use the Apache 2.0 license, rather than the custom Gemma license agreement used previously. Google says the models feature advanced reasoning, agentic workflows, code generation, and visual and audio generation capabilities. They come in four versions optimized for local deployment, and are capable of running on “tens of billions of Android devices,” too.

Google says:

“Gemma 4 is built on the world-class research and technology behind Gemini 3. It’s the strongest set of models you can run locally on hardware today. They complement our Gemini models, and give developers one of the most powerful combinations of open-source and proprietary tools in the industry.”

The larger 26B and 31B versions of the Gemma 4 models are intended to run on consumer-grade GPUs and can be used to power IDEs, programming assistants, and agentic workflows. The lighter E2B and E4B versions focus on multimodal capabilities and low-latency processing, making them suitable for mobile and IoT devices (including Raspberry Pi). All of these models support fully offline execution.

Google’s Gemma 4 open-source models can be downloaded on multiple platforms, including Hugging Face, Kaggle, and Ollama. Google emphasizes:

“When it comes to infrastructure security, these models follow the same strict security protocols as our proprietary models.”
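Since the article notes that Gemma 4 can be pulled through Ollama, here is a minimal sketch of querying a locally served model through Ollama’s REST API. The `gemma4` model tag is an assumption for illustration; check the actual tag published on ollama.com before pulling.

```python
import json
import urllib.request

# Ollama's default local endpoint for one-shot text generation.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, prompt: str) -> dict:
    """Build the JSON payload for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """Send a prompt to a locally running Ollama server and return the response text."""
    payload = json.dumps(build_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Usage (requires a running Ollama server and a pulled model, e.g. `ollama pull gemma4`;
# the tag name is an assumption):
#     print(generate("gemma4", "Summarize the Apache 2.0 license in one sentence."))
```

Because the request is plain JSON over HTTP, the same payload shape works from any language; `"stream": False` asks Ollama to return the full completion in a single response instead of a token stream.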
