
AI Still Struggles with White-Collar Realities, New Benchmark Reveals

Artificial Intelligence AI Knowledge Work Investment Banking Tech Benchmarks Mercor Apex Agents TechCrunch
January 22, 2026
Viqus Verdict: 7
Reality Check
Media Hype 6/10
Real Impact 7/10

Article Summary

New research from Mercor reveals a persistent gap between the capabilities of leading AI models and the demands of professional white-collar work. The Apex-Agents benchmark, designed to mirror the intricate processes of consulting, investment banking, and the legal profession, shows models struggling with tasks that require multi-domain reasoning, tracking information across multiple tools, and a nuanced understanding of professional workflows. Models such as Gemini 3 Flash and GPT-5.2 reached around 24% accuracy, meaning the large majority of queries still produced an incorrect answer or no response at all. The shortfall is largely attributed to the models' difficulty replicating how humans work across diverse tools and information sources such as Slack and Google Drive, a key limitation on AI’s ability to replace complex knowledge work. Because the benchmark focuses on sustained, high-value tasks rather than broad general knowledge, it exposes performance that remains significantly below that of human professionals.

Key Points

  • Even leading models such as Gemini 3 Flash and GPT-5.2 reach only around 24% accuracy on complex tasks that mimic professional white-collar work.
  • The Apex-Agents benchmark, designed to reflect real-world professional workflows, reveals a significant challenge for current AI in replicating multi-domain reasoning and information tracking.
  • Despite recent advancements, AI’s inability to seamlessly operate across diverse tools and information sources – like Slack and Google Drive – remains a substantial hurdle to automating sophisticated knowledge work.

Why It Matters

This matters because it directly challenges the optimistic narrative that AI is about to take over knowledge work. Previous benchmarks have often shown impressive but fleeting gains. Mercor’s Apex-Agents offers a more realistic assessment, showing that the core challenges of replicating human cognitive abilities, particularly those needed to navigate complex, dynamic professional environments, remain substantial. That has implications for businesses weighing AI adoption, for investment strategies, and for the broader understanding of AI’s potential. The fact that even the best models struggle with tasks requiring sustained, nuanced understanding is a critical warning sign.
