
Agentic AI Workflows Speed Up 40% with Persistent WebSocket Connection for LLMs

WebSockets Agentic workflows Responses API LLM inference Latency GPT-5.3-Codex-Spark
April 22, 2026
Source: OpenAI News
Viqus Verdict: 8
Infrastructure Breakthrough for Autonomous Agents
Media Hype 6/10
Real Impact 8/10

Article Summary

This deep-dive technical article details how the Codex team tackled latency bottlenecks in complex, multi-step AI agentic workflows. Previously, every step required a full synchronous API call, forcing the system to re-process the entire conversation history, which significantly slowed down tasks involving dozens of back-and-forth tool calls. The solution was to implement persistent WebSocket connections and leverage in-memory state caching for the Responses API. Passing cached state (such as previous response objects and rendered tokens) instead of repeatedly resending the full conversation context drastically reduced this overhead. The optimization preserved API stability while enabling extremely fast models like GPT-5.3-Codex-Spark to sustain throughput above 1,000 tokens per second, a major leap in real-world agent capability.
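The core trade-off described above can be sketched in a few lines: a stateless client must retransmit (and the server must re-process) the entire history on every step, while a persistent session only carries the delta. This is an illustrative model, not the actual Responses API or its WebSocket protocol; the class names and token accounting are assumptions for demonstration.

```python
# Illustrative sketch of the bottleneck: stateless re-send vs. a
# persistent, stateful session. Class names are hypothetical and do not
# reflect the real Responses API surface.

class StatelessClient:
    """Each call carries the full conversation history, so the server
    re-processes everything on every step."""
    def __init__(self):
        self.tokens_processed = 0

    def step(self, history, new_message):
        history = history + [new_message]
        # The whole transcript crosses the wire and is re-read each time.
        self.tokens_processed += sum(len(m.split()) for m in history)
        return history


class StatefulSession:
    """A persistent connection: the server caches the transcript in
    memory, so each step only transmits the new message (the delta)."""
    def __init__(self):
        self.cached_history = []   # held server-side for the session's lifetime
        self.tokens_processed = 0

    def step(self, new_message):
        self.cached_history.append(new_message)
        # Only the new content is transmitted and processed.
        self.tokens_processed += len(new_message.split())
        return self.cached_history
```

Over a 20-step tool-calling loop with five-token messages, the stateless client re-processes 1,050 tokens while the session processes only 100, which is why the cumulative API overhead, not model inference, dominated agent latency.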

Key Points

  • The primary bottleneck for AI agents was not model inference speed, but the cumulative API overhead generated by numerous synchronous calls and re-processing full conversation history.
  • The team transitioned the Responses API to support persistent WebSocket connections, allowing them to cache state in memory and only process new or changed information.
  • This structural change achieved up to a 40% speed-up in end-to-end agentic workflows, enabling models to sustain 1,000+ tokens per second in production environments.
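The "only process new or changed information" point can be modeled as chaining each request to the previous response rather than resending the transcript. The sketch below uses a hypothetical in-memory cache keyed by response id; the function name, the `previous_response_id` parameter, and the server mechanics are illustrative assumptions, not the documented API.

```python
import uuid

# Hypothetical server-side cache keyed by response id, mimicking how a
# stateful API can resume from a prior response instead of re-reading the
# full transcript. All names here are illustrative.

_RESPONSE_CACHE = {}  # response_id -> accumulated context


def create_response(new_input, previous_response_id=None):
    """Return (response_id, context_len).

    When previous_response_id is given, the cached context is restored
    server-side, so the client only ships `new_input` (the delta).
    """
    context = list(_RESPONSE_CACHE.get(previous_response_id, []))
    context.append(new_input)
    response_id = str(uuid.uuid4())
    _RESPONSE_CACHE[response_id] = context
    return response_id, len(context)
```

Each call in a multi-step loop then passes the id returned by the previous call, so the server-side context grows step by step while the client's per-request payload stays constant.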

Why It Matters

This is critical infrastructure news. The shift from stateless, synchronous API calls to persistent, stateful connections fundamentally changes the engineering feasibility and user experience of advanced AI agents. Previously, complexity was limited by API throughput; now, the limit is closer to the model's raw capacity. For developers building multi-step AI pipelines, this signals a major maturity point, making complex applications reliable and fast enough for true enterprise use cases. It accelerates the entire adoption curve for autonomous AI systems.
