Viqus

Benchmark Battle: Qwen 3.6 vs. Claude Opus 4.7 for Creative Generative Tasks

Tags: Qwen3.6-35B-A3B · Claude Opus 4.7 · pelican benchmark · generative AI · large language models · SVG generation
April 16, 2026
Source: Simon Willison
Viqus Verdict: 5
Efficient Scale Wins Over Raw Power
Media Hype 3/10
Real Impact 5/10

Article Summary

Using the idiosyncratic 'pelican on a bicycle' benchmark, Simon Willison compared the output of Qwen3.6-35B-A3B and Anthropic's Claude Opus 4.7. While noting that the benchmark is intentionally absurd, the author finds a loose but real correlation between a model's performance on this task and its general usefulness. The comparison shows that while Opus 4.7 is state-of-the-art, the smaller, quantized Qwen model currently wins on this specific SVG generation task, suggesting that resource-constrained local models can outperform massive, proprietary cloud APIs for certain types of creative output. The piece concludes with a reflection on the difficulty of benchmarking LLMs, noting that even an absurd comparison task can still serve as a rough proxy for overall model utility.
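The benchmark's raw artifacts are SVG markup, so before any visual judging one might run a trivial automated sanity check on a model's response. This is a hypothetical sketch (not Willison's actual methodology, which judges the rendered images): it only verifies that the output parses as XML with an `<svg>` root.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

def is_wellformed_svg(markup: str) -> bool:
    """Return True if the string parses as XML and its root element is <svg>."""
    try:
        root = ET.fromstring(markup)
    except ET.ParseError:
        return False
    # Accept both a namespaced root ({...}svg) and a bare <svg> root.
    return root.tag in ("svg", f"{{{SVG_NS}}}svg")

# Example inputs: a minimal valid SVG and a truncated (unclosed) one.
good = '<svg xmlns="http://www.w3.org/2000/svg"><circle r="5"/></svg>'
bad = '<svg xmlns="http://www.w3.org/2000/svg"><circle r="5"'

print(is_wellformed_svg(good))  # True
print(is_wellformed_svg(bad))   # False
```

A check like this catches only structural failures (truncation, stray prose around the markup); it says nothing about whether the pelican actually resembles a pelican, which is the part of the benchmark that still requires a human eye.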

Key Points

  • Qwen3.6-35B-A3B, run locally on consumer hardware, produced better SVG illustrations than Anthropic's far larger Claude Opus 4.7.
  • The article underscores the difficulty in establishing a reliable 'utility' metric for modern LLMs, as the connection between benchmark quality and real-world application is weakening.
  • While proprietary cloud models (like Opus 4.7) set the high bar overall, smaller, quantized models are proving surprisingly effective for specific creative tasks, even on constrained hardware.

Why It Matters

This post does not signal a structural shift in the field, but it is highly insightful for developers and technical professionals. It challenges the prevailing assumption that the largest, most expensive proprietary models (like Claude Opus) are inherently superior for all tasks. Instead, it validates the growing trend toward powerful, optimized, locally run small language models (SLMs). Companies building applications should weigh the performance-per-resource-unit of models like Qwen rather than merely chasing the highest benchmark score from the largest available API. This shifts the focus from pure scale to efficiency and deployability.
