YAAP (Yet Another AI Podcast)

YAAP brings you practical conversations with the people actually building generative AI solutions. No hype, no sales pitches, just honest discussions about challenges, solutions, and lessons learned.
Listen to developers and engineers share what works, what doesn't, and what they wish they'd known sooner. Simple, useful insights for anyone working with AI — hosted by AI21's Yuval Belfer.

Listen on:

  • Apple Podcasts
  • Podbean App
  • Spotify

Episodes

Tuesday Oct 28, 2025

Everyone (and we do mean EVERYONE) needs data, and the web is the largest database humanity has ever built. But tapping into it at scale requires more than technical skills. If your product touches web data, scraping isn't just a backend task; it can be risky and carry real consequences.
In this episode, Yuval sits down with Rony Shalit, Chief Compliance and Ethics Officer at Bright Data, to talk about what can go wrong when you treat data collection as “just an implementation detail”. From lawsuits with Meta and X to wild edge cases and vendor breakdowns, they dive into what it takes to collect data responsibly and stay out of trouble.

Tuesday Aug 26, 2025

Your LLM gave a great answer. But who decides what “great” means?
 
In this episode, Yuval talks with Noam Gat about judge language models — reward models, critic models, and how LLMs can be trained to rate, rank, and critique each other. They dive into the difference between scoring and feedback, how to use judge models during inference, and why most evaluation benchmarks don’t tell the full story.
 
Turns out, getting a good answer is easy. Knowing it’s good? That’s the hard part.

Tuesday Aug 12, 2025

Think you know fine-tuning? If your answer is RLHF, you don’t. In this episode, Itay, who leads the Alignment group at AI21, gives a no-fluff crash course on RLVR (Reinforcement Learning with Verifiable Rewards), the method powering today’s smartest coding and reasoning models. He explains why RLVR beats RLHF at its own game, how “hard to solve, easy to verify” tasks unlock exploration without chaos, and the emergent behaviors you only get when models are allowed to screw up. If you want to actually understand RLVR (and use it), start here.
Key topics:

  • How RLVR outsmarts RLHF in real-world training
  • The "verified rewards" trick that kills reward hacking
  • Emergent skills you don't get with hand-holding: self-verification, backtracking, multi-path reasoning
  • Why coding models took a giant leap forward
  • Practical steps to train (and actually benefit from) RLVR models

Tuesday Jul 29, 2025

RAG Is Not Solved – Your Evaluation Just Sucks

Your RAG pipeline is passing benchmarks but failing reality. In this episode, Yuval sits down with Niv from AI21 to expose why most RAG evaluation is fundamentally flawed. From overhyped retrieval scores to chunking strategies that collapse under real-world complexity, they break down why your system isn't as good as you think — and how structured RAG solves problems that traditional pipelines simply can't. Bonus: what do Seinfeld trivia, World Cup stats, and your enterprise SharePoint have in common? (Hint: your RAG pipeline chokes on all of them.)

Key Topics:

  • Why most RAG benchmarks reward the wrong thing (and hide real failures)
  • The chunking trap: how bad segmentation sabotages good retrieval
  • When LLMs ace the answer — but your pipeline still fails
  • Structured RAG: a pipeline that solves the RAG problem over aggregative data (such as financial reports)
  • Evaluation tips, tricks, and traps for AI builders

Tuesday Jul 15, 2025

The Call Is Coming From Inside the Agent (And It Has Your Credentials)

You've shipped your first agent. It works. It's useful. It might also be a security liability you don't even know about. In this episode, Yuval talks to Zenity CTO Michael Bargury about how easy it is to hijack popular agent systems like Copilot and Cursor, what "zero-click" attacks look like in the agent era, and how to monitor, constrain, and secure your AI agent in production. From sneaky prompt injections to memory-based persistence and infected multi-agent workflows, this is the "oh no" moment every builder needs.

Key Topics:

  • Why "ignore previous instructions" still works better than it should
  • How one agent goes rogue… and infects the others
  • Real-world attacks: social media triggers, CRM leaks, and logic bombs
  • Observability 101 for AI: logs, reasoning traces, and root cause sanity
  • The new rule: build like it will go rogue, because one day it will

Tuesday Jul 01, 2025

Building production AI systems is hard — especially when you're pioneering entirely new categories. In this episode, Yuval speaks with Guy Becker, Group Product Manager at AI21, to trace the evolution from task-specific models to agent planning and orchestration systems. Guy shares hard-won lessons from building some of the first RAG-as-a-service offerings when there were literally zero handbooks to follow.

Key Topics:

  • Task-specific models vs. general LLMs: why focused, smaller models with pre- and post-processing beat general-purpose LLMs for business use cases
  • Building RAG before it was cool: creating one of the first RAG-as-a-service platforms in early 2023 without any established patterns
  • The one-size-fits-all problem: why chunking strategies, embedding models, and retrieval parameters need customization per use case
  • From SaaS to on-prem: scaling deployment models for enterprise customers with sensitive data
  • When RAG breaks down: multi-hop queries, metadata filtering, and why semantic search isn't always enough
  • Multi-agent orchestration: how AI21 Maestro uses automated planning to break complex queries into parallelizable subtasks
  • Production lessons: evaluation strategies, quality guarantees, and building explainable AI systems for enterprise

Trailer

Thursday Jun 19, 2025

Tuesday Jun 17, 2025

Everyone's talking about AI agents, but most of what we call "agents" are just workflows in disguise. Real autonomous agents require planning, and that changes everything. In this episode, Yuval speaks with AI21's Algo Tech Lead, Nitzan Cohen, about why the popular ReAct framework isn't enough and how planning architecture unlocks true agent capabilities.

Key Topics:

  1. The difference between workflows/chains and real autonomous agents
  2. Why ReAct agents fail at complex tasks, parallel execution, and user transparency
  3. Free-text vs. code-based planning approaches and their trade-offs
  4. How planning enables multi-agent systems and model delegation
  5. Training planners with reinforcement learning and replanning mechanisms
  6. Evaluation challenges: the GAIA benchmark, AgentBench, and building custom datasets
  7. Practical advice: when to upgrade from ReAct and which frameworks to use

From competitive analysis that runs in parallel to breaking down complex coding tasks, discover how planning transforms AI agents from simple tool-calling loops into sophisticated problem-solving systems.

Tuesday Jun 10, 2025

Building AI agents that actually work is harder than the hype suggests — and most people are doing it wrong. In this special "YAAP: Unplugged" episode (a live panel from the AI Tinkerers meetup at the Hugging Face offices in Paris), Yuval sits down with Aymeric Roucher (Project Lead for Agents at Hugging Face) and Niv Granot (Algorithms Group Lead at AI21 Labs) for an unfiltered discussion about the uncomfortable realities of agent development.

Key Topics:

  • Why current benchmarks are broken: from MMLU's limitations to RAG leaderboards that don't reflect real-world performance
  • The tool use illusion: why 95% accuracy on tool-calling benchmarks doesn't mean your agent can actually plan
  • LLM-as-a-judge problems: how evaluation bottlenecks are capping progress compared to verifiable domains like coding
  • Frameworks: friend or foe? When to ditch LangChain and LlamaIndex, and why minimal implementations often work better
  • The real agent stack: MCP, sandbox environments, and the four essential components you actually need
  • Beyond the hype cycle: from embeddings that can't distinguish positive from negative numbers to what comes after agents

From FIFA World Cup benchmarks that expose retrieval failures to the circular dependency problem with LLM judges, this conversation cuts through the marketing noise to reveal what it really takes to build agents that solve real problems — not just impressive demos.

Warning: contains unpopular opinions about popular frameworks and uncomfortable truths about the current state of AI agent development.

Thursday May 29, 2025

MCP (Model Context Protocol) is changing how developers connect AI applications to external tools – but what exactly is it, and why should you care? In this episode, Yuval speaks with Etan Grundstein, Technical Product Manager (and formerly Director of Engineering) at AI21, to break down the protocol that's standardizing AI integrations, moving beyond basic weather APIs and calculators to real-world productivity workflows.

Key Topics:

  1. What MCP actually is and how it differs from traditional tool calling
  2. Real-world examples: connecting AI to Jira, Notion, Git, and even Blender
  3. The evolution from local MCP servers to cloud integrations
  4. Authentication challenges and how they're being addressed
  5. Why developers are building MCP servers to build other MCP servers
  6. Looking ahead: Agent-to-Agent protocols and what comes next

© 2025 AI21