Tuesday Aug 26, 2025

The Judge Model Diaries: Judging the Judges

Your LLM gave a great answer. But who decides what “great” means?

In this episode, Yuval talks with Noam Gat about judge language models — reward models, critic models, and how LLMs can be trained to rate, rank, and critique each other. They dive into the difference between scoring and feedback, how to use judge models during inference, and why most evaluation benchmarks don’t tell the full story.

Turns out, getting a good answer is easy. Knowing it’s good? That’s the hard part.
