
Ai2's MolmoAct 7B Challenges Nvidia & Google in 3D Physical Reasoning

Tags: Robotics, Artificial Intelligence, Large Language Models (LLMs), Physical AI, Vision-Language-Action, Open Source, Nvidia, Google, Meta, AI2
August 13, 2025
Viqus Verdict: 8 (Strategic Leap)
Media Hype: 7/10
Real Impact: 8/10

Article Summary

Ai2’s newly released MolmoAct 7B marks a significant step toward robots that can understand and act in the physical world. Built on large language models (LLMs) and foundation models, MolmoAct lets robots ‘think’ in three dimensions, interpreting spatial relationships and planning actions accordingly. Its key innovation is outputting "spatially grounded perception tokens," an approach distinct from traditional vision-language-action (VLA) models that gives the system a deeper understanding of its surroundings. In initial benchmarks, MolmoAct exceeded the success rates of comparable models from Google, Microsoft, and Nvidia. Crucially, the model is open source and its training data is publicly available, which is expected to accelerate research and development in the fast-growing physical AI space. The release arrives as interest in spatially aware robots, a long-held goal of the field, grows on the back of rapid advances in LLMs.

Key Points

  • MolmoAct 7B, developed by Ai2, is an open-source model that allows robots to ‘reason in space’ through 3D spatial understanding.
  • The model’s innovation is its use of ‘spatially grounded perception tokens,’ distinct from traditional VLA models, offering a deeper understanding of the physical world.
  • MolmoAct outperformed models from Google, Microsoft, and Nvidia in initial benchmarking tests, highlighting the potential of this approach.
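The article doesn’t detail MolmoAct’s internals, but the core idea of ‘spatially grounded perception tokens’ can be illustrated as a toy three-stage pipeline: first emit tokens that encode an estimate of scene geometry, then plan in that estimated space, then decode the plan into actions. Everything in this sketch, including the token format, the depth binning, and all function names, is invented for illustration and is not Ai2’s implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class PerceptionToken:
    """A discretized depth estimate for one image patch (hypothetical format)."""
    patch: Tuple[int, int]   # (row, col) of the patch in the image grid
    depth_bin: int           # quantized depth estimate (0 = nearest)

def perceive(depth_map: List[List[float]], num_bins: int = 8,
             max_depth: float = 4.0) -> List[PerceptionToken]:
    """Stage 1: turn a toy per-patch depth map into 'perception tokens'.

    A real model would predict these tokens from raw pixels; here we just
    quantize a given depth map to keep the example self-contained.
    """
    tokens = []
    for r, row in enumerate(depth_map):
        for c, depth in enumerate(row):
            b = min(num_bins - 1, int(depth / max_depth * num_bins))
            tokens.append(PerceptionToken((r, c), b))
    return tokens

def plan(tokens: List[PerceptionToken]) -> List[Tuple[int, int]]:
    """Stage 2: plan a waypoint path toward the nearest patch, a stand-in
    for 'reasoning in the estimated 3D space before acting'."""
    target = min(tokens, key=lambda t: t.depth_bin).patch
    path, (r, c) = [], (0, 0)
    while (r, c) != target:
        r += (target[0] > r) - (target[0] < r)  # step one row toward target
        c += (target[1] > c) - (target[1] < c)  # step one col toward target
        path.append((r, c))
    return path

def act(path: List[Tuple[int, int]]) -> List[str]:
    """Stage 3: decode the waypoint plan into low-level action tokens."""
    return [f"move_to({r},{c})" for r, c in path]

# A 2x2 toy depth map: the nearest patch (0.5 m) is at grid cell (1, 1).
depth_map = [[2.0, 1.0],
             [3.5, 0.5]]
actions = act(plan(perceive(depth_map)))
print(actions)  # → ['move_to(1,1)']
```

The point of the intermediate stages is that the plan is computed over the estimated geometry rather than mapped directly from pixels to actions, which is the distinction the article draws between MolmoAct and traditional VLA models.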

Why It Matters

The emergence of models like MolmoAct marks a pivotal moment in robotics. For years, building truly intelligent robots, ones that can navigate and interact with complex, dynamic environments, has been held back by the limits of traditional pre-programmed approaches. Moving beyond fixed action sequences and using LLMs to interpret and reason about physical surroundings fundamentally changes what robots can do. This matters to anyone working in robotics, AI, or automation, because it points toward more adaptable and versatile machines. Moreover, the model's open-source release will likely spark a wave of innovation and experimentation, potentially driving rapid progress across the field.
