Viqus AI Blog

THE CORE

Deep dives into production AI, LLM infrastructure, and the engineering decisions that separate demos from real products.

Illustration comparing traditional product management workflows with AI-first product management practices
Strategy · Product Management · AI Strategy
The Rise of AI-First Product Roles: What PMs Need to Unlearn
The product management playbook that worked for the last decade doesn't translate cleanly to AI products. Here's what's changing — and what experienced PMs need to let go of to stay effective.
Pablo Serrano
Apr 9, 2026
5 min
Framework diagram showing cost inputs, value outputs, and attribution flows for measuring AI project ROI
Strategy · ROI · AI Strategy
Measuring ROI on AI Projects: A Framework That Survives Executive Scrutiny
Every CFO wants to know what their AI spend is buying. Most engineering teams don't have a good answer. Here's a framework that turns fuzzy AI outcomes into numbers the business can actually defend.
Pablo Serrano
Apr 6, 2026
5 min
Comparison diagram showing a RAG pipeline versus a long-context single-shot approach with tradeoffs highlighted
Deep Dive · RAG · Long Context
Long Context or RAG? Rethinking Retrieval in the Million-Token Era
With million-token context windows now standard, the question 'do I still need RAG?' is getting louder. The answer is yes — but the reasons have changed, and so should your architecture.
Pablo Serrano
Apr 2, 2026
5 min
Dashboard showing an evaluation suite with pass rates across multiple tasks and models
Deep Dive · Evaluation · LLM Evals
Your Eval Suite Is Your Moat: Treating Evaluation as Product IP
Models are becoming commodities. Prompts are shared freely. What stays is the curated set of evaluations that defines what 'good' means for your specific product — and it's the most undervalued asset most teams own.
Pablo Serrano
Mar 30, 2026
5 min
Illustration of a prompt injection attack flowing from a retrieved document through an LLM to an unauthorized tool call, with defensive layers in place
Engineering · Security · Prompt Injection
Prompt Injection Defense: A Practitioner's Playbook
Prompt injection is the security vulnerability that doesn't go away. It can't be fully solved — but it can be meaningfully contained, and the gap between teams that take it seriously and those that don't is widening.
Pablo Serrano
Mar 26, 2026
5 min
Flow diagram showing a router directing different request types to different LLMs based on complexity and cost
Strategy · Model Routing · Cost Optimization
Model Routing in 2026: Why Shipping a Single Model Is Already Obsolete
The fastest, cheapest, most reliable LLM applications don't run on one model. They run on a carefully tuned cascade — and the routing layer is where the magic happens.
Pablo Serrano
Mar 23, 2026
5 min
Layered architecture diagram showing input guardrails, model inference, and output guardrails around an LLM
Engineering · Guardrails · AI Safety
Guardrails That Hold: Input and Output Safety for Production LLMs
Every production LLM system needs guardrails. The question isn't whether — it's which ones, where to put them, and how to know they're working.
Pablo Serrano
Mar 19, 2026
5 min
Diagram showing an LLM gateway sitting between application services and multiple model providers with routing, logging, and rate limiting
Engineering · LLM Gateway · Infrastructure
LLM Gateways: The Abstraction Layer Every AI Team Eventually Builds
Routing, rate limiting, fallbacks, cost tracking, audit logging. Sooner or later every team building on LLMs ends up with a gateway — and the ones that build it intentionally save themselves months of pain.
Pablo Serrano
Mar 16, 2026
5 min
Architecture diagram showing a Mixture of Experts model with a router selecting a subset of experts for each token
AI Research · Mixture of Experts · MoE
Mixture of Experts in Production: Why Active Parameters Matter More Than Total
The 120-billion-parameter model that actually runs like a 12-billion-parameter one. MoE architectures have quietly taken over the frontier — here's what that means for anyone deploying them.
Pablo Serrano
Mar 12, 2026
4 min
Visualization of a KV cache growing in GPU memory as the context window increases, with compression layers applied
Deep Dive · KV Cache · Inference
The KV Cache: The Hidden Bottleneck of Long-Context LLMs
Context windows keep getting bigger, but the real constraint on long-context performance isn't the model — it's the memory footprint of the KV cache. Here's what that means for anyone running LLMs at scale.
Pablo Serrano
Mar 9, 2026
4 min
Diagram of an agent dynamically retrieving a subset of tool definitions from a larger tool registry based on the current task
Engineering · AI Agents · Tool Use
Dynamic Tool Retrieval: How Agents Scale Past 50 Tools
Stuffing every tool definition into the prompt stops working around 15–20 tools. Here's how modern agents discover and load only what they need — and why this pattern is becoming standard.
Pablo Serrano
Mar 5, 2026
4 min
Diagram showing layered context construction with system instructions, retrieved documents, tool outputs, and conversation history
Deep Dive · Context Engineering · Prompt Engineering
Context Engineering: The New Discipline Replacing Prompt Engineering
The craft of writing clever prompts is giving way to something bigger: designing the entire information environment a model sees. Here's what that shift looks like in practice.
Pablo Serrano
Mar 2, 2026
4 min
Flowchart showing different LLM caching strategies and their decision points
Engineering · Caching · Cost Optimization
LLM Caching Strategies: Cut Your Inference Bill Without Cutting Corners
Caching for LLMs is more nuanced than traditional caching. Here's how to implement semantic, prefix, and response caching to reduce costs by 30–50% while maintaining quality.
Pablo Serrano
Feb 26, 2026
4 min
UI mockups showing different AI integration patterns beyond traditional chat interfaces
Strategy · Product Design · UX
Designing AI-Native Applications: Beyond the Chat Interface
Chat was the first AI interface. It won't be the last. The next generation of AI products weaves intelligence into the workflow itself — and that requires a different design philosophy.
Pablo Serrano
Feb 24, 2026
4 min
Visualization of high-dimensional embedding space with clustered document vectors
Deep Dive · Embeddings · RAG
Embedding Models: The Unsung Heroes of Every AI Application
Everyone talks about language models. Almost nobody talks about embedding models — even though they quietly determine the quality of search, RAG, and classification systems.
Pablo Serrano
Feb 21, 2026
4 min
Diagram of an AI pipeline with error handling, monitoring, and fallback paths highlighted
Engineering · Reliability · AI Pipelines
Building Reliable AI Pipelines: Lessons from 100 Deployments
AI pipelines fail in ways traditional software doesn't. Here are the patterns, guardrails, and testing strategies that keep production AI systems running smoothly.
Pablo Serrano
Feb 20, 2026
4 min
Simplified flowchart of EU AI Act risk classification for AI products
Policy · EU AI Act · Regulation
The EU AI Act in Practice: What Builders Actually Need to Do in 2026
The EU AI Act is no longer theoretical. Here's a practical guide to what it means for teams shipping AI products in Europe — without the legal jargon.
Pablo Serrano
Feb 17, 2026
4 min
Evolution timeline showing prompt engineering from simple instructions to systematic engineering
Deep Dive · Prompt Engineering · LLMs
Prompt Engineering Is Dead. Long Live Prompt Engineering.
Reports of prompt engineering's death are greatly exaggerated. What's changing is what it means — from artisanal craft to systematic engineering discipline.
Pablo Serrano
Feb 15, 2026
4 min
Comparison matrix of vector databases showing performance, cost, and feature dimensions
Engineering · Vector Databases · RAG
Vector Databases in 2026: A Practical Comparison for Production Teams
The vector database market has matured. Here's a grounded comparison based on real-world performance, cost, and operational complexity — not vendor marketing.
Pablo Serrano
Feb 13, 2026
4 min
Visual comparison of different chunking strategies applied to the same document
Engineering · RAG · Chunking
RAG Chunking Strategies That Actually Work in 2026
Chunking is the most unglamorous and most impactful part of any RAG system. Here's what we've learned about doing it right — and the mistakes that silently destroy retrieval quality.
Pablo Serrano
Feb 11, 2026
4 min
Side-by-side comparison infographic of open-source and closed-source LLM trade-offs
Strategy · Open Source · LLMs
Open-Source vs. Closed LLMs: An Honest Comparison for Production Teams
The open-source vs. proprietary debate generates more heat than light. Here's a clear-eyed comparison based on what actually matters in production.
Pablo Serrano
Feb 10, 2026
4 min
Comparison chart showing the gap between benchmark scores and real-world task performance for different models
Deep Dive · Evaluation · LLMs
Evaluating LLMs Beyond Benchmarks: What Leaderboards Won't Tell You
Leaderboard scores tell you which model is best at benchmarks. They tell you almost nothing about which model is best for your application.
Pablo Serrano
Feb 7, 2026
4 min
Code review interface showing AI-generated suggestions alongside developer comments
Engineering · Code Review · Developer Tools
AI-Powered Code Review: Building a Pipeline That Developers Actually Trust
LLMs can review code, but most AI code review tools generate more noise than signal. Here's how to build a review pipeline developers won't ignore.
Pablo Serrano
Feb 5, 2026
4 min
Dashboard showing LLM monitoring metrics including latency, quality scores, and cost tracking
Engineering · Observability · Monitoring
The LLM Observability Stack You Actually Need
You can't improve what you can't measure. Here's how to build an LLM monitoring stack that catches problems before your users do.
Pablo Serrano
Feb 3, 2026
4 min
Decision tree diagram comparing build vs buy paths for AI infrastructure
Strategy · Build vs Buy · AI Strategy
Build vs. Buy in AI: A Decision Framework for Engineering Leaders
The AI tooling landscape is exploding. Knowing when to use a vendor and when to build in-house is the highest-leverage decision an engineering leader makes today.
Pablo Serrano
Jan 30, 2026
4 min
Stacked bar chart showing the breakdown of production AI costs across different categories
Deep Dive · Cost Optimization · Production AI
The Real Cost of Running AI in Production: A Complete Breakdown
API costs are just the beginning. We break down every line item in a production AI budget — from inference to evaluation to the ops costs nobody talks about.
Pablo Serrano
Jan 27, 2026
4 min
Flowchart showing a lightweight AI governance process for startup teams
Strategy · AI Governance · Startups
AI Governance for Startups: A Practical Guide That Won't Slow You Down
Governance doesn't have to mean bureaucracy. Here's a lightweight framework for responsible AI that actually works for teams moving fast.
Pablo Serrano
Jan 25, 2026
4 min
Size comparison visualization of different language models with performance and cost metrics
AI Research · Small Models · Cost Optimization
Small Language Models: When Fewer Parameters Mean More Value
The race to build bigger models dominated 2024. In 2026, the smartest teams are asking a different question: what's the smallest model that gets the job done?
Pablo Serrano
Jan 22, 2026
4 min
Architecture diagram showing MCP protocol connecting an LLM to multiple external services
Deep Dive · MCP · Tool Use
MCP and Tool Use: The Protocol Layer That Makes Agents Useful
The Model Context Protocol is quietly becoming the USB-C of AI integrations. Here's what it actually does, why it matters, and the patterns emerging around it.
Pablo Serrano
Jan 19, 2026
4 min
Pipeline diagram showing synthetic data generation, filtering, and training stages
Engineering · Synthetic Data · Training
Synthetic Data for Model Training: A Practitioner's Playbook
Generating training data with LLMs is now mainstream. But the difference between useful synthetic data and expensive noise comes down to a handful of design decisions.
Pablo Serrano
Jan 15, 2026
4 min
Split diagram comparing fine-tuning pipeline and prompt engineering workflow
Deep Dive · Fine-Tuning · Prompt Engineering
Fine-Tuning vs. Prompting in 2026: When Each Actually Makes Sense
The decision between fine-tuning and prompt engineering has changed dramatically. Here's a clear framework based on cost, performance, and maintenance burden.
Pablo Serrano
Jan 12, 2026
4 min
Diagram of an AI agent loop with tool calls, memory, and error handling layers
Engineering · AI Agents · Production AI
What We Learned Deploying AI Agents in Production for 12 Months
AI agents are no longer a demo-day novelty. After a year of real-world deployments, here are the patterns that work — and the ones that quietly fail at scale.
Pablo Serrano
Jan 7, 2026
4 min
Venn diagram showing the overlap and differences between semantic and keyword search
Deep Dive · Search · RAG
AI Search vs. Traditional Search: When Vectors Beat Keywords (and Vice Versa)
Semantic search isn't always better than keyword search. Understanding when to use each — and how to combine them — is the key to building search that actually works.
Pablo Serrano
Jan 5, 2026
4 min
Diagram showing different input modalities (text, image, audio, document) converging into a unified AI system
AI Research · Multimodal AI · Computer Vision
Multimodal AI in Production: What's Ready, What's Not, and What's Next
Vision, audio, and document understanding have joined text in production AI. Here's a grounded assessment of multimodal capabilities and where they deliver real value today.
Pablo Serrano
Jan 3, 2026
4 min