Ettin Reranker Family Released: State-of-the-Art Components for RAG Systems
6
What is the Viqus Verdict?
We evaluate each news story based on its real impact versus its media hype to offer a clear and objective perspective.
AI Analysis:
The release provides genuinely useful technical components that improve the reliability of a major AI use case (RAG), which makes it moderate industry news rather than a paradigm shift.
Article Summary
The latest release introduces six new Ettin Reranker models, ranging in size from 17M to 1B, built on the Ettin ModernBERT encoders. These cross-encoders are designed to address an inherent limitation of vector embedding models: an embedding model encodes the query and the document separately, whereas a cross-encoder encodes the query and document pair jointly, which yields more accurate relevance scores. Paired with the embedding model `google/embeddinggemma-300m`, the rerankers perform strongly on the MTEB (English v2) Retrieval benchmark. The guide covers implementation in detail: it walks through a full retrieve-then-rerank pipeline and shows how to raise throughput with bfloat16 and Flash Attention 2.
Key Points
- The new rerankers come in multiple sizes (17M to 1B), letting developers trade computational cost against ranking accuracy.
- The models fit a 'retrieve-then-rerank' pipeline: a fast embedding model retrieves candidates, and the reranker then reorders them more accurately to produce the final results.
- Throughput can be improved substantially with bfloat16 and Flash Attention 2, with reported speedups of up to 8.3x.
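
To make the retrieve-then-rerank idea concrete, here is a minimal sketch of the two-stage control flow in plain Python. It is not the code from the guide: `toy_embed` and `toy_pair_score` are hypothetical stand-ins (bag-of-words cosine similarity and query-term coverage) for a real embedding model and a real cross-encoder reranker, and the documents are invented for illustration. Only the structure of the pipeline is meant to carry over.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def toy_pair_score(query: str, doc: str) -> float:
    """Hypothetical stand-in for a cross-encoder: it sees query and document
    together and returns the fraction of distinct query terms the doc covers."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0


def retrieve_then_rerank(
    query: str,
    docs: Sequence[str],
    embed: Callable[[str], Counter],
    pair_score: Callable[[str, str], float],
    retrieve_k: int = 100,
    final_k: int = 10,
) -> List[Tuple[int, float]]:
    """Return (doc_index, rerank_score) pairs, best first."""
    # Stage 1: cheap retrieval. Query and documents are embedded independently,
    # so document vectors could be precomputed and indexed in a real system.
    q_vec = embed(query)
    by_similarity = sorted(
        range(len(docs)),
        key=lambda i: cosine(q_vec, embed(docs[i])),
        reverse=True,
    )
    candidates = by_similarity[:retrieve_k]

    # Stage 2: costlier reranking. Each (query, document) pair is scored
    # jointly, but only for the small candidate set from stage 1.
    scored = [(i, pair_score(query, docs[i])) for i in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:final_k]


QUERY = "python sort list"
DOCS = [
    "python python python snake",
    "how to sort a list in python with the sorted function today",
    "list of grocery items",
    "weather forecast for tomorrow",
]

results = retrieve_then_rerank(
    QUERY, DOCS, toy_embed, toy_pair_score, retrieve_k=3, final_k=2
)
for index, score in results:
    print(f"{score:.2f}  {DOCS[index]}")
```

In this toy data the first stage ranks the repetitive "python python python snake" document highest, because term frequency inflates its cosine similarity, while the pair scorer promotes the document that covers every query term. With real models, `embed` would wrap the embedding model and `pair_score` would wrap one of the rerankers; the `retrieve_k` and `final_k` values shown are arbitrary.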

