From the course: Complete Guide to Evaluating Large Language Models (LLMs)
AIs supervising AIs: LLM as a judge
- When evaluating free text, like looking at a model's output, at a very high level, broadly speaking, we have two main options, because as we've talked about already, it's quite difficult, and at the very least time-consuming, to read through hundreds, thousands, hundreds of thousands, or even millions of generated AI outputs, depending on how big your company is. It's really easy for an AI to generate an output. It's a lot harder to make sure that the output was decent. So again, broadly speaking, we have two options. We can either ask a human to do it, right? Seems obvious. Ask a human to read it, and ideally give them some rubric or criteria to say, "This is what we're looking for. These are bad. These are good. You tell us if this AI output is good or bad." It's expensive. It's not a relatively new industry; we've had things like Mechanical Turk and Scale AI for a while. There are a lot of issues with wrangling humans to do tasks. In general, we are a wily bunch, if you will. But the main issue conditioned…
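To make the second option, the LLM-as-a-judge side, concrete, here is a minimal sketch of asking one model to grade another model's output against a simple rubric. This is an illustration under assumptions, not the course's exact setup: the rubric wording, the 1-to-5 scale, and the model name "gpt-4o-mini" are all placeholders, and the example uses the standard OpenAI Python SDK.

```python
# Minimal LLM-as-a-judge sketch.
# Assumptions: the OpenAI Python SDK is installed, OPENAI_API_KEY is set,
# and the model name and rubric below are chosen purely for illustration.
from openai import OpenAI

client = OpenAI()

RUBRIC = """Score the answer from 1 (bad) to 5 (good) on:
- Faithfulness: does it answer the question without making things up?
- Clarity: is it easy to follow?
Reply with a single line: SCORE=<1-5>, then a one-sentence justification."""

def judge(question: str, answer: str, model: str = "gpt-4o-mini") -> str:
    """Ask a judge model to grade a generated answer against the rubric."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a strict evaluator of AI outputs."},
            {"role": "user", "content": f"{RUBRIC}\n\nQuestion: {question}\nAnswer: {answer}"},
        ],
        temperature=0,  # keep the judge as deterministic as possible
    )
    return response.choices[0].message.content

# Example: grade a single generated answer.
print(judge("What is the capital of France?", "The capital of France is Paris."))
```

In practice you would run a loop like this over a sample of generated outputs and aggregate the scores, which is exactly the kind of reading-at-scale that is too slow and too expensive to do with human raters alone.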