Galileo

Software Development

Burlingame, California 18,902 followers

Galileo is the leading AI Reliability + Evaluation platform that helps teams of all sizes build AI apps they can trust.

About us

Galileo is the leading platform for enterprise GenAI evaluation and observability. Our comprehensive suite of products supports builders across the new AI development workflow, from fine-tuning LLMs to developing, testing, monitoring, and securing their AI applications. Each product is powered by our research-backed evaluation metrics. Today, Galileo is used by hundreds of AI teams, from startups to Fortune 50 enterprises, including Twilio, Comcast, and HP.

Website
https://www.galileo.ai
Industry
Software Development
Company size
51-200 employees
Headquarters
Burlingame, California
Type
Privately Held
Founded
2021

Updates

  • Galileo

    📅 Don't miss our workshop with CrewAI tomorrow on Evaluating Multi-Agent Systems 👇 Last-minute registrations here: https://lnkd.in/dW8tVJTN

    CrewAI

    Learning how to instrument, monitor, and evaluate multi-agent systems is just one click away. TOMORROW: Our Director of Product Marketing, Shane Johnson, and Galileo’s Product Manager, Xian Ke, will discuss must-know tips for building production-ready agents:
    ➡️ An #AI eval playbook purpose-built for multi-agent challenges.
    ➡️ How to trace root causes across agent handoffs with session-, step-, and system-level metrics.
    ➡️ How to use our orchestration framework with Galileo's observability platform to create reliable multi-agent systems.
    🕒 Last chance to save your seat: https://lnkd.in/dW8tVJTN

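As a rough illustration of the session- and step-level tracing the workshop covers, here is a minimal sketch using the generic OpenTelemetry Python SDK. It is not CrewAI's or Galileo's actual integration API; the span names, attributes, and placeholder agent calls are assumptions made for the example.

```python
# Minimal sketch: session- and step-level spans around a multi-agent handoff.
# Assumes the opentelemetry-sdk package; exporters print to the console here.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("multi_agent_demo")

def run_step(agent_name: str, task: str) -> str:
    # Step-level span: one unit of work by a single agent.
    with tracer.start_as_current_span(f"step:{agent_name}") as span:
        span.set_attribute("agent.name", agent_name)
        span.set_attribute("agent.task", task)
        result = f"{agent_name} handled '{task}'"  # placeholder for a real agent call
        span.set_attribute("agent.output", result)
        return result

# Session-level span: the whole multi-agent run, so a failure in a downstream
# step can be traced back through the handoff chain.
with tracer.start_as_current_span("session:trip_planner") as session:
    plan = run_step("researcher", "find flight options")
    booking = run_step("booker", plan)  # handoff: researcher -> booker
    session.set_attribute("session.final_output", booking)
```

In a real setup the spans would be exported to an observability backend rather than the console, which is what makes root-cause analysis across handoffs possible.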
  • Galileo reposted this

    📣 I'M HIRING 📣 I joined Galileo a few weeks ago and I'm already doing one of my favorite things: building a team. 🚀
    Looking for a Developer Advocate who loves building AI agents, creating technical content, and actually enjoys explaining complex workflows until they click.
    📍 Bay Area based
    🔗 Apply here: https://lnkd.in/gt8tdNd2
    You'd be joining as one of the first 100 Galileans and helping shape how developers build reliable AI apps.
    Questions? DM me, happy to chat about the role or what it's like here so far.
    #DevRel #AIJobs #Startups #GenerativeAI #LLM #AI #Evals

  • Galileo reposted this

    Evals aren't unit tests. The non-deterministic nature of AI models creates new failure patterns (and improvement opportunities!), which means eval infrastructure needs intentionality about how it evolves over time.
    Tomorrow on the Chain of Thought Podcast, I sit down with Vikram Chatterji (Galileo CEO) to dig into how evaluation engineering has become a discipline, including:
    → Why generic evals plateau at 70% accuracy (and why that's unshippable)
    → How one Fortune 50 company went from ~$25M to $800K in guardrail costs while scaling from 1 agent to 50
    → What kills internal eval infrastructure and how to avoid it
    → The engineering decisions behind turning evals into production guardrails
    "Today's evals become tomorrow's guardrails" sounds simple, but executing it means solving drift detection, ensuring reliability, fine-tuning economics, and encoding domain expertise at scale.
    While I've recently joined Modular to lead technical ecosystem efforts, I'm delighted that Galileo is continuing as the presenting sponsor of Chain of Thought's Season 3 (of which this is episode two), and we've got a lot more exciting conversations ahead. Coming soon:
    - Angie Jones (VP Engineering at Block) on deploying AI agents to 12,000 employees in 8 weeks
    - Fergal Reid (PhD) (Chief AI Officer of Intercom) on why and how they reversed their fine-tuning strategy - we just recorded a great deep dive today!
    🎙 But first, tune in Wednesday for the full conversation with Vikram! Who else should we feature on Season 3?
    #AIEngineering #ProductionAI #EvalEngineering #ChainOfThought

  • ⏰ Our course on eval engineering is live! Join our Principal Dev Advocate, Jim Bennett, to learn what evals are, use simple evals to detect issues in an AI application, and get hands-on adding an eval to an app. Watch it live on YouTube here: https://lnkd.in/gRyBWcVr

    Galileo

    90% of agents fail to reach production. The biggest reason? The AI engineers building these apps don't have a clear way to evaluate whether their agents are doing what they should do, or to use the results of that evaluation to fix them. That's why our Principal Dev Advocate, Jim Bennett, is hosting a free 5-part course to help AI engineers become experts in eval engineering, with the first lesson starting on December 9th at 9:00am PT. This first lesson covers:
    ✅ What evals are
    ✅ How you can use simple evals to detect issues in an AI application
    ✅ Hands-on training on how to add evals to an app
    Register with the link below, and we hope to see you there 👇

  • Learn Eval Engineering in this free, 5-part, hands-on course. 90% of AI agents don't make it to production. The biggest reason is that the AI engineers building these apps don't have a clear way of evaluating whether these agents are doing what they should do, or of using the results of that evaluation to fix them.
    In this course, you will learn all about evals for AI applications. You'll start with some out-of-the-box metrics and learn about evals, then move on to understanding observability for AI apps, analyzing failure states, defining custom metrics, and finally using these across your whole SDLC. This will be hands-on, so be prepared to write some code, create some metrics, and do some homework!
    In this first lesson, you will:
    - Learn what evals are
    - Learn how you can use simple evals to detect issues in an AI application
    - Get hands-on adding an eval to an app
    Prerequisites:
    - A basic knowledge of Python
    - Access to an OpenAI API key
    - A free Galileo account (we will be using Galileo as the evals platform)
    Future lessons: Stay tuned for links to future lessons!

    Eval Engineering for AI Developers - Lesson 1

    www.linkedin.com
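To make the first lesson's idea concrete, here is a minimal sketch of a "simple eval": run a handful of prompts through a model and apply a deterministic pass/fail check to each output. The test cases, the substring check, and the model name are illustrative assumptions; this is not the course's actual code and does not use Galileo's SDK.

```python
# Minimal sketch of a simple eval loop over an LLM, assuming the openai
# Python SDK and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical eval cases: prompt plus a string the answer must contain.
eval_cases = [
    {"prompt": "What is the capital of France?", "must_contain": "Paris"},
    {"prompt": "What is 2 + 2?", "must_contain": "4"},
]

def run_eval(cases):
    passed = 0
    for case in cases:
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": case["prompt"]}],
        )
        answer = response.choices[0].message.content or ""
        # The "eval" itself: a deterministic pass/fail assertion on the output.
        if case["must_contain"].lower() in answer.lower():
            passed += 1
        else:
            print(f"FAIL: {case['prompt']!r} -> {answer!r}")
    print(f"{passed}/{len(cases)} eval cases passed")

if __name__ == "__main__":
    run_eval(eval_cases)
```

Custom metrics, observability, and wiring results into the rest of the SDLC (the later lessons) build on this same loop: define cases, score outputs, act on failures.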

  • Galileo reposted this

    #AI peeps, come join me as I teach eval engineering. Join either here, or sign up on our Luma event page to join on YouTube. https://luma.com/6q19vpzb

    Galileo

    Eval Engineering for AI Developers - Lesson 1

    www.linkedin.com

  • Galileo reposted this

    For enterprise AI deployments, latency consistency matters as much as model accuracy. A client-side load-balancing approach backed by Redis and atomic scoring eliminates the GPU scheduling blind spots that typically cause tail latency spikes. The result: significantly higher utilization, stable P99s, and no additional servers. For any org scaling agent-based systems, this is the level of optimization required to keep performance and economics aligned. Galileo #genai #agents #llm

    Galileo

    For runtime agent observability, we had to solve the GenAI inference problem of poor GPU utilization and unpredictable latency. Here's how our engineer Lev N. did it 👇
    Lev's solution: client-side load balancing backed by Redis & Lua scripting:
    – Clients compute a load score for each inference request (based on payload size)
    – Redis maintains a real-time view of GPU fleet load using sorted sets
    – Lua scripts ensure atomic operations, picking the least busy GPU and incrementing its score in a single step
    – A background reconciler handles failures and keeps scores accurate
    The results:
    📈 ~40% increase in average GPU utilization
    📉 70% reduction in tail latency
    ☑️ Same infrastructure, zero additional servers
    For our customers who run millions of agent logs per month and depend on Galileo for runtime intervention at scale, this has been critical. Read more about it in Lev’s blog below 👇

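For readers who want to see the shape of this pattern, here is a minimal sketch, not Galileo's actual implementation: a Redis sorted set holds per-GPU load scores, and a Lua script atomically picks the least-loaded GPU and bumps its score in one round trip, so two clients cannot both claim the same idle GPU. The key name, the payload-based scoring heuristic, and the simple release step standing in for the background reconciler are all assumptions.

```python
# Minimal sketch of client-side load balancing with a Redis sorted set and a
# Lua script (assumes the redis-py package and a local Redis instance).
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

FLEET_KEY = "gpu:load"  # sorted set: member = GPU id, score = current load

# Atomically take the lowest-scored member and increment it by the request's
# load score, so two clients can't both "win" the same idle GPU.
PICK_AND_INCR = r.register_script("""
local gpu = redis.call('ZRANGE', KEYS[1], 0, 0)[1]
if not gpu then return nil end
redis.call('ZINCRBY', KEYS[1], ARGV[1], gpu)
return gpu
""")

def acquire_gpu(payload_bytes: int):
    load_score = payload_bytes / 1024.0  # assumption: load roughly tracks payload size
    gpu = PICK_AND_INCR(keys=[FLEET_KEY], args=[load_score])
    return gpu, load_score

def release_gpu(gpu: str, load_score: float) -> None:
    # In the post this bookkeeping is owned by a background reconciler;
    # here we simply decrement when the request finishes.
    r.zincrby(FLEET_KEY, -load_score, gpu)

# Register a small fleet, then route one request.
r.zadd(FLEET_KEY, {"gpu-0": 0, "gpu-1": 0, "gpu-2": 0})
gpu, score = acquire_gpu(payload_bytes=48_000)
print(f"routing request to {gpu}")
release_gpu(gpu, score)
```

The atomicity is the important part: doing the ZRANGE and ZINCRBY inside one script closes the read-then-update race that would otherwise herd requests onto whichever GPU looked idle a moment ago.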
  • Galileo reposted this

    This year Expedia Group rolled out new APIs + GenAI “Trip Matching” that turns social travel inspiration into bookable itineraries, a perfect example of how AI is moving from chat → actions → revenue.
    This is where the industry is heading: Not another chatbot. Not more demos. Real workflows, plugged into real systems, driving real transactions.
    At Galileo we’re building the infra that makes that possible: fast iteration, evals, data integration, safety, and reliability at scale.
    If you’re building agentic products and need to go from prototype → production without breaking things, DM me. Let’s ship.
    #genai #infra #aiagents #product

  • Galileo reposted this

    > best production agents are deliberately constrained
    68% of agents run ≤10 steps before human intervention
    70% use off-the-shelf models with careful prompting
    85% are custom-built, avoiding framework abstractions
    74% rely on human evaluation, not automated benchmarks
    80% use static workflows, not open-ended autonomy
    This landmark empirical study, "Measuring Agents in Production," surveys 306 practitioners and conducts 20 in-depth interviews across 26 application domains to understand how AI agents actually work in production.
    The core insight is that successful production teams deliberately trade capability for controllability. They accept limited autonomy in exchange for reliability, using constrained architectures with predefined workflows rather than open-ended autonomous agents. Despite the common claim that "95% of agent deployments fail," this research demonstrates that practitioners can successfully deploy reliable systems that serve real users despite environmental and operational constraints.
    Rather than solving reliability through better models, successful teams constrain the problem:
    1. Limited autonomy: fewer steps = fewer failure points
    2. Predefined workflows: static paths are predictable and debuggable
    3. Human checkpoints: regular intervention catches errors before they cascade
    4. Constrained tools: limited tool access reduces the attack surface
    5. Domain specialization: narrow scope enables thorough testing
    Does this match your experience? #agents #llm #gpt #ml
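As a toy illustration of those constraints, here is a minimal sketch of a deliberately constrained agent: a predefined workflow, a hard step cap, a whitelisted tool set, and a human checkpoint before anything irreversible. The tools, workflow, and approval prompt are invented for the example and are not taken from the study.

```python
# Minimal sketch of a constrained agent: static workflow, bounded steps,
# whitelisted tools, and a human checkpoint at the end.
MAX_STEPS = 10  # limited autonomy: a hard step budget before a human steps in

def search_flights(query: str) -> str:
    return f"3 flight options for {query}"   # placeholder tool

def draft_booking(option: str) -> str:
    return f"draft booking for {option}"     # placeholder tool

ALLOWED_TOOLS = {"search_flights": search_flights, "draft_booking": draft_booking}

# Predefined workflow: a static list of (tool, input) steps, not open-ended planning.
WORKFLOW = [
    ("search_flights", "SFO -> JFK next Friday"),
    ("draft_booking", "cheapest option"),
]

def run_constrained_agent() -> None:
    for step, (tool_name, tool_input) in enumerate(WORKFLOW, start=1):
        if step > MAX_STEPS:
            raise RuntimeError("step budget exceeded; stopping for human review")
        tool = ALLOWED_TOOLS[tool_name]  # constrained tools: only the whitelist
        result = tool(tool_input)
        print(f"step {step}: {tool_name} -> {result}")

    # Human checkpoint: nothing irreversible happens without explicit approval.
    if input("Confirm booking? [y/N] ").strip().lower() == "y":
        print("booking confirmed")
    else:
        print("handed off to a human operator")

if __name__ == "__main__":
    run_constrained_agent()
```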

  • Patrick Chanezon, Microsoft's VP of Developer Relations, shared a spicy take at #MSIgnite: AI's architecture is about to flip from centralized to client-server. The last four years have been the mainframe era of AI, characterized by massive, centralized models that applications primarily call into. The next phase? Local models running on your machine, calling server models only when specific tasks demand it. Watch Patrick's full take 👇
