From the course: Complete Guide to Evaluating Large Language Models (LLMs)

Evaluating AI agents: Task automation and tool integration

- For the first case study in this lesson's marathon of case studies, we're going to turn to a pretty hot topic in AI: AI agents. These are semi-autonomous systems, or fully autonomous ones depending on how good you make them, that can interact with an environment, make decisions, and perform tasks on our behalf. Now, frankly, I have a whole several-hour video series on agents by themselves, but I wanted to highlight one of the main case studies from that series, one that relates not only to classification metrics but also to positional bias and generative AI. So the way agents tend to work, and again this is a broad-strokes picture, is that when you ask an agentic system a question, the agent will usually have a toolbox of tools it can use to answer it. So if I ask an agent something like, "Check my stock portfolio," the agent will say, "Okay, well, what tools do I have? I have the ability to Google something. I…
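
To make the link between tool choice, classification metrics, and positional bias concrete, here is a minimal sketch in Python. It is not the course's code. The toolbox, the labeled questions, and `pick_tool` are all hypothetical. In particular, `pick_tool` is a keyword-matching stand-in for a real LLM call, and it deliberately falls back to the first tool listed so that a positional bias shows up in the numbers.

```python
import random
from collections import Counter

# Hypothetical toolbox: tool name -> keywords that hint at that tool.
TOOLBOX = {
    "web_search": ["news", "search", "look up"],
    "stock_lookup": ["stock", "portfolio"],
    "calculator": ["sum", "multiply", "percent"],
}

def pick_tool(question, tool_names):
    """Stand-in for an LLM's tool choice (an assumption, not a real agent).

    Returns the first listed tool with a matching keyword. If nothing
    matches, it returns whichever tool is listed first, which is a
    deliberate positional bias.
    """
    q = question.lower()
    for name in tool_names:
        if any(keyword in q for keyword in TOOLBOX[name]):
            return name
    return tool_names[0]

# Hypothetical labeled set: (question, tool we expect the agent to pick).
LABELED = [
    ("Check my stock portfolio", "stock_lookup"),
    ("What is 15 percent of 80?", "calculator"),
    ("Look up today's AI news", "web_search"),
    ("How did my investments do?", "stock_lookup"),  # matches no keyword
]

def evaluate(trials=200, seed=0):
    """Score tool choice as classification while shuffling tool order.

    Returns (accuracy, Counter mapping list position -> times chosen).
    """
    rng = random.Random(seed)
    correct = 0
    total = 0
    positions = Counter()
    for _ in range(trials):
        for question, expected in LABELED:
            order = list(TOOLBOX)
            rng.shuffle(order)  # vary where each tool appears in the prompt
            chosen = pick_tool(question, order)
            correct += chosen == expected
            positions[order.index(chosen)] += 1
            total += 1
    return correct / total, positions

if __name__ == "__main__":
    accuracy, positions = evaluate()
    print(f"tool-selection accuracy: {accuracy:.2f}")
    print("times each list position was chosen:", dict(positions))
```

Treating the chosen tool as a predicted label lets the usual classification metrics apply. Shuffling the tool order on every trial separates "picked the right tool" from "picked whatever came first": if one position is chosen far more often than the others, the selector is leaning on order rather than on the question.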
