Galileo

Software Development

Burlingame, California 18,902 followers

Galileo is the leading AI Reliability + Evaluation platform that helps teams of all sizes build AI apps they can trust.

About us

Galileo is the leading platform for enterprise GenAI evaluation and observability. Our comprehensive suite of products supports builders across the new AI development workflow, from fine-tuning LLMs to developing, testing, monitoring, and securing their AI applications. Each product is powered by our research-backed evaluation metrics. Today, Galileo is used by hundreds of AI teams, from startups to Fortune 50 enterprises, including Twilio, Comcast, and HP.

Website
https://www.galileo.ai
Industry
Software Development
Company size
51-200 employees
Headquarters
Burlingame, California
Type
Privately Held
Founded
2021

Updates

  • Galileo

    📅 Don't miss our workshop with CrewAI tomorrow on Evaluating Multi-Agent Systems 👇 Last-minute registrations here: https://lnkd.in/dW8tVJTN

    CrewAI

    Learning how to instrument, monitor, and evaluate multi-agent systems is just one click away. TOMORROW: Our Director of Product Marketing, Shane Johnson, and Galileo’s Product Manager, Xian Ke, will discuss must-know tips for building production-ready agents:
    ➡️ An #AI eval playbook purpose-built for multi-agent challenges.
    ➡️ How to trace root causes across agent handoffs with session-, step-, and system-level metrics.
    ➡️ How to use our orchestration framework with Galileo's observability platform to create reliable multi-agent systems.
    🕒 Last chance to save your seat: https://lnkd.in/dW8tVJTN

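As a rough illustration of the session- and step-level tracing the workshop covers, here is a minimal sketch using the generic OpenTelemetry Python SDK. It is not CrewAI's or Galileo's actual integration API; the span names, attributes, and placeholder agent calls are assumptions made for the example.

```python
# Minimal sketch: session- and step-level spans around a multi-agent handoff.
# Assumes the opentelemetry-sdk package; exporters print to the console here.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("multi_agent_demo")

def run_step(agent_name: str, task: str) -> str:
    # Step-level span: one unit of work by a single agent.
    with tracer.start_as_current_span(f"step:{agent_name}") as span:
        span.set_attribute("agent.name", agent_name)
        span.set_attribute("agent.task", task)
        result = f"{agent_name} handled '{task}'"  # placeholder for a real agent call
        span.set_attribute("agent.output", result)
        return result

# Session-level span: the whole multi-agent run, so a failure in a downstream
# step can be traced back through the handoff chain.
with tracer.start_as_current_span("session:trip_planner") as session:
    plan = run_step("researcher", "find flight options")
    booking = run_step("booker", plan)  # handoff: researcher -> booker
    session.set_attribute("session.final_output", booking)
```

In a real setup the spans would be exported to an observability backend rather than the console, which is what makes root-cause analysis across handoffs possible.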
  • Galileo reposted this

    📣 I'M HIRING 📣 I joined Galileo a few weeks ago and I'm already doing one of my favorite things: building a team. 🚀
    Looking for a Developer Advocate who loves building AI agents, creating technical content, and actually enjoys explaining complex workflows until they click.
    📍 Bay Area based
    🔗 Apply here: https://lnkd.in/gt8tdNd2
    You'd be joining as one of the first 100 Galileans and helping shape how developers build reliable AI apps.
    Questions? DM me, happy to chat about the role or what it's like here so far.
    #DevRel #AIJobs #Startups #GenerativeAI #LLM #AI #Evals

  • Galileo reposted this

    Evals aren't unit tests. The non-deterministic nature of AI models creates new failure patterns (and improvement opportunities!), which means eval infrastructure needs intentionality about how it evolves over time.
    Tomorrow on the Chain of Thought Podcast, I sit down with Vikram Chatterji (Galileo CEO) to dig into how evaluation engineering has become a discipline, including:
    → Why generic evals plateau at 70% accuracy (and why that's unshippable)
    → How one Fortune 50 company went from ~$25M to $800K in guardrail costs while scaling from 1 agent to 50
    → What kills internal eval infrastructure and how to avoid it
    → The engineering decisions behind turning evals into production guardrails
    "Today's evals become tomorrow's guardrails" sounds simple, but executing it means solving drift detection, ensuring reliability, fine-tuning economics, and encoding domain expertise at scale.
    While I've recently joined Modular to lead technical ecosystem efforts, I'm delighted that Galileo is continuing as the presenting sponsor of Chain of Thought's Season 3 (of which this is episode two), and we've got a lot more exciting conversations ahead. Coming soon:
    - Angie Jones (VP Engineering at Block) on deploying AI agents to 12,000 employees in 8 weeks
    - Fergal Reid (PhD) (Chief AI Officer of Intercom) on why and how they reversed their fine-tuning strategy - we just recorded a great deep dive today!
    🎙 But first, tune in Wednesday for the full conversation with Vikram! Who else should we feature on Season 3?
    #AIEngineering #ProductionAI #EvalEngineering #ChainOfThought

  • ⏰ Our course on eval engineering is live! Join our Principal Dev Advocate, Jim Bennett, to learn what evals are, use simple evals to detect issues in an AI application, and get hands-on adding an eval to an app. Watch it live on YouTube here: https://lnkd.in/gRyBWcVr

    Galileo

    90% of agents fail to reach production. The biggest reason? The AI engineers building these apps don't have a clear way to evaluate whether their agents are doing what they should do, or to use the results of that evaluation to fix them. That's why our Principal Dev Advocate, Jim Bennett, is hosting a free 5-part course to help AI engineers become experts in eval engineering, with the first lesson starting on December 9th at 9:00am PT. This first lesson covers:
    ✅ What evals are
    ✅ How you can use simple evals to detect issues in an AI application
    ✅ Hands-on training on how to add evals to an app
    Register with the link below, and we hope to see you there 👇

  • Learn Eval Engineering in this free, 5-part, hands-on course. 90% of AI agents don't make it to production. The biggest reason is that the AI engineers building these apps don't have a clear way of evaluating whether these agents are doing what they should do, or of using the results of that evaluation to fix them.
    In this course, you will learn all about evals for AI applications. You'll start with some out-of-the-box metrics and learn about evals, then move on to understanding observability for AI apps, analyzing failure states, defining custom metrics, and finally using these across your whole SDLC. This will be hands-on, so be prepared to write some code, create some metrics, and do some homework!
    In this first lesson, you will:
    - Learn what evals are
    - Learn how you can use simple evals to detect issues in an AI application
    - Get hands-on adding an eval to an app
    Prerequisites:
    - A basic knowledge of Python
    - Access to an OpenAI API key
    - A free Galileo account (we will be using Galileo as the evals platform)
    Future lessons: Stay tuned for links to future lessons!

    Eval Engineering for AI Developers - Lesson 1

    www.linkedin.com
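To make the first lesson's idea concrete, here is a minimal sketch of a "simple eval": run a handful of prompts through a model and apply a deterministic pass/fail check to each output. The test cases, the substring check, and the model name are illustrative assumptions; this is not the course's actual code and does not use Galileo's SDK.

```python
# Minimal sketch of a simple eval loop over an LLM, assuming the openai
# Python SDK and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical eval cases: prompt plus a string the answer must contain.
eval_cases = [
    {"prompt": "What is the capital of France?", "must_contain": "Paris"},
    {"prompt": "What is 2 + 2?", "must_contain": "4"},
]

def run_eval(cases):
    passed = 0
    for case in cases:
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": case["prompt"]}],
        )
        answer = response.choices[0].message.content or ""
        # The "eval" itself: a deterministic pass/fail assertion on the output.
        if case["must_contain"].lower() in answer.lower():
            passed += 1
        else:
            print(f"FAIL: {case['prompt']!r} -> {answer!r}")
    print(f"{passed}/{len(cases)} eval cases passed")

if __name__ == "__main__":
    run_eval(eval_cases)
```

Custom metrics, observability, and wiring results into the rest of the SDLC (the later lessons) build on this same loop: define cases, score outputs, act on failures.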

  • Galileo reposted this

    #AI peeps, come join me as I teach eval engineering. Join either here, or sign up on our Luma event page to join on YouTube. https://luma.com/6q19vpzb

    Galileo

    Eval Engineering for AI Developers - Lesson 1

    www.linkedin.com

  • Galileo reposted this

    For enterprise AI deployments, latency consistency matters as much as model accuracy. A client-side load-balancing approach backed by Redis and atomic scoring eliminates the GPU scheduling blind spots that typically cause tail latency spikes. The result: significantly higher utilization, stable P99s, and no additional servers. For any org scaling agent-based systems, this is the level of optimization required to keep performance and economics aligned. Galileo #genai #agents #llm

    Galileo

    For runtime agent observability, we had to solve the GenAI inference problem of poor GPU utilization and unpredictable latency. Here's how our engineer Lev N. did it 👇
    Lev's solution: client-side load balancing backed by Redis & Lua scripting:
    – Clients compute a load score for each inference request (based on payload size)
    – Redis maintains a real-time view of GPU fleet load using sorted sets
    – Lua scripts ensure atomic operations, picking the least busy GPU and incrementing its score in a single step
    – A background reconciler handles failures and keeps scores accurate
    The results:
    📈 ~40% increase in average GPU utilization
    📉 70% reduction in tail latency
    ☑️ Same infrastructure, zero additional servers
    For our customers who run millions of agent logs per month and depend on Galileo for runtime intervention at scale, this has been critical. Read more about it in Lev’s blog below 👇

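For readers who want to see the shape of this pattern, here is a minimal sketch, not Galileo's actual implementation: a Redis sorted set holds per-GPU load scores, and a Lua script atomically picks the least-loaded GPU and bumps its score in one round trip, so two clients cannot both claim the same idle GPU. The key name, the payload-based scoring heuristic, and the simple release step standing in for the background reconciler are all assumptions.

```python
# Minimal sketch of client-side load balancing with a Redis sorted set and a
# Lua script (assumes the redis-py package and a local Redis instance).
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

FLEET_KEY = "gpu:load"  # sorted set: member = GPU id, score = current load

# Atomically take the lowest-scored member and increment it by the request's
# load score, so two clients can't both "win" the same idle GPU.
PICK_AND_INCR = r.register_script("""
local gpu = redis.call('ZRANGE', KEYS[1], 0, 0)[1]
if not gpu then return nil end
redis.call('ZINCRBY', KEYS[1], ARGV[1], gpu)
return gpu
""")

def acquire_gpu(payload_bytes: int):
    load_score = payload_bytes / 1024.0  # assumption: load roughly tracks payload size
    gpu = PICK_AND_INCR(keys=[FLEET_KEY], args=[load_score])
    return gpu, load_score

def release_gpu(gpu: str, load_score: float) -> None:
    # In the post this bookkeeping is owned by a background reconciler;
    # here we simply decrement when the request finishes.
    r.zincrby(FLEET_KEY, -load_score, gpu)

# Register a small fleet, then route one request.
r.zadd(FLEET_KEY, {"gpu-0": 0, "gpu-1": 0, "gpu-2": 0})
gpu, score = acquire_gpu(payload_bytes=48_000)
print(f"routing request to {gpu}")
release_gpu(gpu, score)
```

The atomicity is the important part: doing the ZRANGE and ZINCRBY inside one script closes the read-then-update race that would otherwise herd requests onto whichever GPU looked idle a moment ago.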
  • Galileo reposted this

    This year Expedia Group rolled out new APIs + GenAI “Trip Matching” that turns social travel inspiration into bookable itineraries, a perfect example of how AI is moving from chat → actions → revenue.
    This is where the industry is heading: Not another chatbot. Not more demos. Real workflows, plugged into real systems, driving real transactions.
    At Galileo we’re building the infra that makes that possible: fast iteration, evals, data integration, safety, and reliability at scale.
    If you’re building agentic products and need to go from prototype → production without breaking things, DM me. Let’s ship.
    #genai #infra #aiagents #product

  • Galileo reposted this

    > best production agents are deliberately constrained
    68% of agents run ≤10 steps before human intervention
    70% use off-the-shelf models with careful prompting
    85% are custom-built, avoiding framework abstractions
    74% rely on human evaluation, not automated benchmarks
    80% use static workflows, not open-ended autonomy
    This landmark empirical study, "Measuring Agents in Production," surveys 306 practitioners and conducts 20 in-depth interviews across 26 application domains to understand how AI agents actually work in production.
    The core insight is that successful production teams deliberately trade capability for controllability. They accept limited autonomy in exchange for reliability, using constrained architectures with predefined workflows rather than open-ended autonomous agents. Despite the common claim that "95% of agent deployments fail," this research demonstrates that practitioners can successfully deploy reliable systems that serve real users despite environmental and operational constraints.
    Rather than solving reliability through better models, successful teams constrain the problem:
    1. Limited autonomy: fewer steps = fewer failure points
    2. Predefined workflows: static paths are predictable and debuggable
    3. Human checkpoints: regular intervention catches errors before they cascade
    4. Constrained tools: limited tool access reduces the attack surface
    5. Domain specialization: narrow scope enables thorough testing
    Does this match your experience? #agents #llm #gpt #ml
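As a toy illustration of those constraints, here is a minimal sketch of a deliberately constrained agent: a predefined workflow, a hard step cap, a whitelisted tool set, and a human checkpoint before anything irreversible. The tools, workflow, and approval prompt are invented for the example and are not taken from the study.

```python
# Minimal sketch of a constrained agent: static workflow, bounded steps,
# whitelisted tools, and a human checkpoint at the end.
MAX_STEPS = 10  # limited autonomy: a hard step budget before a human steps in

def search_flights(query: str) -> str:
    return f"3 flight options for {query}"   # placeholder tool

def draft_booking(option: str) -> str:
    return f"draft booking for {option}"     # placeholder tool

ALLOWED_TOOLS = {"search_flights": search_flights, "draft_booking": draft_booking}

# Predefined workflow: a static list of (tool, input) steps, not open-ended planning.
WORKFLOW = [
    ("search_flights", "SFO -> JFK next Friday"),
    ("draft_booking", "cheapest option"),
]

def run_constrained_agent() -> None:
    for step, (tool_name, tool_input) in enumerate(WORKFLOW, start=1):
        if step > MAX_STEPS:
            raise RuntimeError("step budget exceeded; stopping for human review")
        tool = ALLOWED_TOOLS[tool_name]  # constrained tools: only the whitelist
        result = tool(tool_input)
        print(f"step {step}: {tool_name} -> {result}")

    # Human checkpoint: nothing irreversible happens without explicit approval.
    if input("Confirm booking? [y/N] ").strip().lower() == "y":
        print("booking confirmed")
    else:
        print("handed off to a human operator")

if __name__ == "__main__":
    run_constrained_agent()
```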

  • Patrick Chanezon, Microsoft's VP of Developer Relations, shared a spicy take at #MSIgnite: AI's architecture is about to flip from centralized to client-server. The last four years have been the mainframe era of AI, characterized by massive, centralized models that applications primarily call into. The next phase? Local models running on your machine, calling server models only when specific tasks demand it. Watch Patrick's full take 👇
