AI2's MolmoAct 7B Challenges Nvidia and Google in 3D Robot Reasoning
Viqus Verdict: 9
What is the Viqus Verdict?
We evaluate each news story based on its real impact versus its media hype to offer a clear and objective perspective.
AI Analysis:
While the immediate impact of this research is relatively contained, the underlying technology represents a fundamental shift in robotics with significant long-term potential. The combination of LLMs and robotics is a game-changer, and MolmoAct’s open-source release will likely accelerate development and adoption.
Article Summary
The Allen Institute for AI (Ai2) has unveiled MolmoAct 7B, a model poised to shift the landscape of physical AI. Built on Ai2’s open-source Molmo, MolmoAct lets robots ‘reason in space’ by pairing large language models (LLMs) with robotics, giving them the ability to understand and act within the physical world. Unlike traditional vision-language-action (VLA) models, it ‘thinks’ in three dimensions, converting inputs such as video into spatially grounded tokens. From these tokens, the model estimates distances between objects and then predicts a sequence of ‘image-space’ waypoints, which translate into actions like adjusting an arm or stretching out. In Ai2’s benchmarks, MolmoAct achieved a task success rate of 72.1%, surpassing models from industry leaders including Google, Microsoft, and Nvidia. This represents a significant step towards true physical intelligence, a long-held goal for robotics developers. The model’s open-source nature and ease of adaptation are particularly noteworthy: it functions across different robot embodiments with minimal fine-tuning. The release adds to existing interest in physical AI, driven by advancements from Google Research (SayCan) and Meta/NYU (OK-Robot), and the recent launch of Hugging Face’s desktop robot. The open-source element of the project has been widely praised by the robotics community.

Key Points
- MolmoAct 7B is an open-source model developed by Ai2 that allows robots to ‘reason in space’ by understanding and interacting with the physical world.
- The model uses spatially grounded tokens to represent data inputs, enabling robots to gain a 3D understanding of their surroundings and plan actions accordingly.
- MolmoAct outperformed models from Nvidia, Google, and Microsoft in a benchmark task, highlighting its potential to advance the field of physical AI.
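To make the perceive-plan-act pipeline described above concrete, here is a minimal sketch of the idea in Python. Everything in it is hypothetical: the real MolmoAct emits learned spatial tokens and waypoints from a 7B-parameter model, not the hand-written geometry below. The sketch only illustrates the two stages the article names: estimating distances between spatially grounded objects, then planning a sequence of image-space waypoints toward a target.

```python
# Illustrative sketch (hypothetical names; not the MolmoAct API).
from dataclasses import dataclass
import math


@dataclass
class SpatialToken:
    """Toy stand-in for a spatially grounded token: an object label
    plus its estimated (x, y) position in image space."""
    label: str
    x: float
    y: float


def estimate_distance(a: SpatialToken, b: SpatialToken) -> float:
    """Stage 1: estimate the distance between two perceived objects."""
    return math.hypot(a.x - b.x, a.y - b.y)


def plan_waypoints(start: SpatialToken, goal: SpatialToken,
                   steps: int = 4) -> list[tuple[float, float]]:
    """Stage 2: predict a sequence of image-space waypoints that move
    from the current position toward the target (linear interpolation
    here; the real model predicts these waypoints directly)."""
    return [
        (start.x + (goal.x - start.x) * t / steps,
         start.y + (goal.y - start.y) * t / steps)
        for t in range(1, steps + 1)
    ]


gripper = SpatialToken("gripper", 10.0, 10.0)
mug = SpatialToken("mug", 50.0, 40.0)

print(f"distance: {estimate_distance(gripper, mug):.1f}")  # -> distance: 50.0
for wx, wy in plan_waypoints(gripper, mug):
    print(f"move arm toward ({wx:.1f}, {wy:.1f})")
```

The final waypoint coincides with the target object, and a downstream controller would translate each waypoint into low-level motor commands such as adjusting the arm.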

