AI Still Struggles with White-Collar Realities, New Benchmark Reveals


January 22, 2026

Source: TechCrunch AI

Reality Check

Media Hype 6/10

Real Impact 7/10

What is the Viqus Verdict?

We evaluate each news story based on its real impact versus its media hype to offer a clear and objective perspective.

AI Analysis:

AI is advancing rapidly, but the Apex-Agents benchmark’s findings are a reality check that tempers inflated expectations. With leading models reaching only around 24% accuracy, significant development is still needed before AI can dominate white-collar professions.

Article Summary

New research from Mercor reveals a persistent gap between the capabilities of leading AI models and the demands of professional white-collar work. The Apex-Agents benchmark, designed to mirror the intricate processes of consulting, investment banking, and legal work, consistently shows models struggling with tasks that require multi-domain reasoning, information tracking across multiple tools, and a nuanced understanding of professional workflows. Models like Gemini 3 Flash and GPT-5.2 achieved some success, at around 24% accuracy, but the vast majority of queries still resulted in incorrect answers or no response. This is largely attributed to the models' difficulty in replicating the way humans work across diverse tools and information sources, such as Slack and Google Drive, which points to a key limitation in current AI’s ability to replace complex knowledge work. Because the benchmark focuses on sustained, high-value tasks rather than broad general knowledge, it exposes performance that remains significantly below that of human professionals.

Key Points

Even leading AI models such as Gemini 3 Flash and GPT-5.2 reach only around 24% accuracy on complex tasks that mimic professional white-collar work.
The Apex-Agents benchmark, designed to reflect real-world professional workflows, reveals a significant challenge for current AI in replicating multi-domain reasoning and information tracking.
Despite recent advancements, AI’s inability to seamlessly operate across diverse tools and information sources – like Slack and Google Drive – remains a substantial hurdle to automating sophisticated knowledge work.

Why It Matters

This news is significant because it directly challenges the optimistic narrative surrounding AI’s imminent takeover of knowledge work. Previous benchmarks have often shown impressive, but fleeting, gains. Mercor’s Apex-Agents offers a more realistic assessment, demonstrating that the core challenges of replicating human cognitive abilities, particularly those involved in navigating complex, dynamic professional environments, remain substantial. This has important implications for businesses considering AI adoption, for investment strategies, and for the broader understanding of AI’s potential. The fact that even the best models struggle with tasks requiring sustained, nuanced understanding is a critical warning sign.
