
AI2's MolmoAct 7B Challenges Nvidia and Google in 3D Robot Reasoning

Robotics · Artificial Intelligence · Large Language Models (LLMs) · Physical AI · Open Source · Vision-Language-Action
August 13, 2025
Viqus Verdict: 9
Strategic Shift
Media Hype 7/10
Real Impact 9/10

Article Summary

Allen Institute for AI (Ai2) has unveiled MolmoAct 7B, an open-source model poised to shift the landscape of physical AI. MolmoAct lets robots ‘reason in space’ by integrating large language models (LLMs) with robotics, giving them the ability to understand and interact with the physical world. Built on Ai2’s open-source Molmo, the model ‘thinks’ in three dimensions: unlike traditional vision-language-action (VLA) models, it processes inputs such as video into spatially grounded tokens. From those tokens it estimates distances between objects and then predicts a sequence of ‘image-space’ waypoints, enabling actions like adjusting or extending an arm. In Ai2’s evaluations, MolmoAct achieved a task success rate of 72.1%, surpassing models from industry leaders including Google, Microsoft, and Nvidia. This represents a significant step towards true physical intelligence, a long-held goal for robotics developers. The model’s open-source release and ease of adaptation, demonstrated by its ability to work across different robot embodiments with minimal fine-tuning, are particularly noteworthy and have been widely praised by the robotics community. The release also builds on growing interest in physical AI, driven by advances such as Google Research’s SayCan, Meta and NYU’s OK-Robot, and Hugging Face’s recently released desktop robot.

Key Points

  • MolmoAct 7B is an open-source model developed by Ai2 that allows robots to ‘reason in space’ by understanding and interacting with the physical world.
  • The model uses spatially grounded tokens to represent data inputs, enabling robots to gain a 3D understanding of their surroundings and plan actions accordingly.
  • MolmoAct outperformed models from Nvidia, Google, and Microsoft in a benchmark task, highlighting its potential to advance the field of physical AI.
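The three-stage flow described above (spatially grounded tokens, image-space waypoints, then embodiment-specific commands) can be sketched in simplified form. The code below is an illustrative toy, not Ai2’s actual MolmoAct API; all names, signatures, and the trivial tokenizer are assumptions made for clarity.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Waypoint:
    x: float  # image-space horizontal coordinate (pixels)
    y: float  # image-space vertical coordinate (pixels)

def encode_spatial_tokens(frame: List[List[float]]) -> List[int]:
    """Stage 1 (toy): compress a frame into spatially grounded tokens.
    A real model would emit learned tokens carrying depth estimates;
    here we just quantize the frame's mean intensity."""
    flat = [v for row in frame for v in row]
    mean = sum(flat) / len(flat)
    return [int(mean * 255)]

def plan_waypoints(tokens: List[int], goal: Tuple[float, float]) -> List[Waypoint]:
    """Stage 2 (toy): predict a sequence of image-space waypoints
    leading toward a goal position."""
    steps = 4
    return [Waypoint(goal[0] * i / steps, goal[1] * i / steps)
            for i in range(1, steps + 1)]

def waypoints_to_actions(path: List[Waypoint]) -> List[str]:
    """Stage 3 (toy): decode waypoints into low-level commands for a
    specific robot embodiment (e.g. arm joint adjustments)."""
    return [f"move_to({wp.x:.1f}, {wp.y:.1f})" for wp in path]

frame = [[0.2, 0.4], [0.6, 0.8]]          # stand-in for a camera frame
tokens = encode_spatial_tokens(frame)      # -> [127]
path = plan_waypoints(tokens, goal=(80.0, 40.0))
actions = waypoints_to_actions(path)
print(actions[-1])                         # -> move_to(80.0, 40.0)
```

The point of separating the stages is the adaptability the article highlights: only the final waypoint-to-action decoding is tied to a particular robot embodiment, so swapping robots would, in principle, require retraining or fine-tuning only that last stage.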

Why It Matters

This research is significant because it addresses a critical bottleneck in robotics: the ability of robots to truly understand and navigate complex, dynamic physical environments. The emergence of models like MolmoAct, built on advances in LLMs, represents a crucial step toward creating more adaptable and intelligent robots. For enterprise AI leaders, this shift is particularly important as robotics becomes increasingly integrated into industries ranging from manufacturing and logistics to healthcare. The potential for greater automation, improved efficiency, and new product development is substantial. Furthermore, this development underscores the strategic importance of open-source AI, fostering collaboration and accelerating innovation within the sector.
