Tuesday Jul 29, 2025

RAG Is Not Solved – Your Evaluation Just Sucks

Your RAG pipeline is passing benchmarks, but failing reality. In this episode, Yuval sits down with Niv from AI21 to expose why most RAG evaluation is fundamentally flawed. From overhyped retrieval scores to chunking strategies that collapse under real-world complexity, they break down why your system isn’t as good as you think — and how structured RAG solves problems that traditional pipelines simply can't. 

Bonus: what do Seinfeld trivia, World Cup stats, and your enterprise SharePoint have in common? (hint: your RAG pipeline chokes on all of them).

Key Topics:

  1. Why most RAG benchmarks reward the wrong thing (and hide real failures)
  2. The chunking trap: how bad segmentation sabotages good retrieval
  3. When LLMs ace the answer—but your pipeline still fails
  4. Structured RAG: a pipeline that solves RAG over aggregative data (such as financial reports)
  5. Evaluation tips, tricks, and traps for AI builders


