Site Reliability Engineering: How Google Runs Production Systems 1st Edition



The overwhelming majority of a software system's lifespan is spent in use, not in design or implementation. So, why does conventional wisdom insist that software engineers focus primarily on the design and development of large-scale computing systems?

In this collection of essays and articles, key members of Google's Site Reliability Team explain how and why their commitment to the entire lifecycle has enabled the company to successfully build, deploy, monitor, and maintain some of the largest software systems in the world. You'll learn the principles and practices that enable Google engineers to make systems more scalable, reliable, and efficient―lessons directly applicable to your organization.

This book is divided into four sections:

  • Introduction―Learn what site reliability engineering is and why it differs from conventional IT industry practices
  • Principles―Examine the patterns, behaviors, and areas of concern that influence the work of a site reliability engineer (SRE)
  • Practices―Understand the theory and practice of an SRE's day-to-day work: building and operating large distributed computing systems
  • Management―Explore Google's best practices for training, communication, and meetings that your organization can use

From the Publisher



How to Read This Book

This book is a series of essays written by members and alumni of Google’s Site Reliability Engineering organization. It’s much more like conference proceedings than it is like a standard book by an author or a small number of authors. Each chapter is intended to be read as a part of a coherent whole, but a good deal can be gained by reading on whatever subject particularly interests you. (If there are other articles that support or inform the text, we reference them so you can follow up accordingly.)

You don’t need to read in any particular order, though we’d suggest at least starting with Chapters 2 and 3, which describe Google’s production environment and outline how SRE approaches risk, respectively. (Risk is, in many ways, the key quality of our profession.) Reading cover-to-cover is, of course, also useful and possible; our chapters are grouped thematically, into Principles (Part II), Practices (Part III), and Management (Part IV). Each has a small introduction that highlights what the individual pieces are about, and references other articles published by Google SREs, covering specific topics in more detail. Additionally, there’s a companion website mentioned in the book that has a number of helpful resources.

We hope this will be at least as useful and interesting to you as putting it together was for us.

— The Editors.

                                        Site Reliability Engineering          The Site Reliability Workbook
Customer Reviews                        4.7 out of 5 stars (1,201)            4.6 out of 5 stars (382)
Price                                   $32.99                                $32.00
Explore the book & companion workbook   How Google Runs Production Systems    Practical Ways to Implement SRE

Editorial Reviews

About the Author

Niall Murphy leads the Ads Site Reliability Engineering team at Google Ireland. He has been involved in the Internet industry for about 20 years, and is currently chairperson of INEX, Ireland’s peering hub. He is the author or coauthor of a number of technical papers and/or books, including "IPv6 Network Administration" for O’Reilly, and a number of RFCs. He is currently cowriting a history of the Internet in Ireland, and is the holder of degrees in Computer Science, Mathematics, and Poetry Studies, which is surely some kind of mistake. He lives in Dublin with his wife and two sons.


Betsy Beyer is a Technical Writer for Google Site Reliability Engineering in NYC. She has previously written documentation for Google Datacenters and Hardware Operations teams. Before moving to New York, Betsy was a lecturer on technical writing at Stanford University.


Chris Jones is a Site Reliability Engineer for Google App Engine, a cloud platform-as-a-service product serving over 28 billion requests per day. Based in San Francisco, he has previously been responsible for the care and feeding of Google’s advertising statistics, data warehousing, and customer support systems. In other lives, Chris has worked in academic IT, analyzed data for political campaigns, and engaged in some light BSD kernel hacking, picking up degrees in Computer Engineering, Economics, and Technology Policy along the way. He’s also a licensed professional engineer.


Jennifer Petoff is a Program Manager for Google’s Site Reliability Engineering team and based in Dublin, Ireland. She has managed large global projects across wide-ranging domains including scientific research, engineering, human resources, and advertising operations. Jennifer joined Google after spending eight years in the chemical industry. She holds a PhD in Chemistry from Stanford University and a BS in Chemistry and a BA in Psychology from the University of Rochester.

Customer reviews

4.7 out of 5 stars
1,201 global ratings

Customers say

Customers find this book to be a must-read for DevOps engineers, praising its content as a trove of knowledge and its unique approach. The book receives positive feedback for its pacing, with one customer describing it as a great deep dive into a style of delivery, and another noting its applicability to many environments. However, the Kindle edition's formatting receives criticism, with multiple customers describing it as horribly formatted. Additionally, the writing style receives mixed reactions, with one customer noting it's written by a committee of authors.

37 customers mention "Readability": 36 positive, 1 negative

Customers find the book highly readable, describing it as fascinating and a must-read for both DevOps engineers and modern-day software developers.

"This is a great book! It touches on a lot of great ways to think about developer operations and similar engineering...."

"...book on DevOps, SRE, and current trends in the industry. It's a great read for anyone who wants to apply some "best practices" to their role...."

"Awesome read, eye opening and counter intuitive."

"Fantastic book. It's a bit long-winded at times but it's filled with nuggets of gold...."

24 customers mention "Information quality": 22 positive, 2 negative

Customers find the book's content to be a trove of knowledge, with one customer particularly impressed with the detailed coverage of monitoring chapters.

"Thanks to the Google SRE team for sharing such valuable information and perspectives! The best thing I have read in this space for awhile...."

"...While it covers a range of topics, don't look for any low level details on implementation or you'll be sorely disappointed...."

"...Part 3 - Some useful info and a lot of stuff that's not really unique to Google in my experience...."

"Wish there were more books written like this; great subject and good detail..."

5 customers mention "Ideas": 5 positive, 0 negative

Customers appreciate the ideas in the book, with one noting its unique approach.

"...Good ideas and a very simple read. Worth keeping on the shelf as a reference and keeping current on how to identify and reduce toil."

"...of the book because the material is so interesting and their approach is so unique...."

"...Brilliant concept for a book and very well done."

"...service while managing costs, and this book gives you some really innovative, forward-thinking ideas on how to achieve both."

5 customers mention "Pacing": 4 positive, 1 negative

Customers appreciate the pacing of the book, with one describing it as a great deep dive into a style of delivery, while another notes how it captures the whole amount of experience.

"This was a great deep dive into a style of delivery that everyone developing modern software must consider...."

"Fascinating book with a whole amount of experience captured and it can save you a great deal of pain...."

"Slow Burn and Overly Google..."

"Fantastic book. It's a bit long-winded at times but it's filled with nuggets of gold...."

5 customers mention "Value for money": 4 positive, 1 negative

Customers find the book provides good value for money, with one customer noting it's worth keeping as a reference.

"...This book is a great launching point for discussion, and it’s worth having a copy if you deploy systems at scale...."

"...Good ideas and a very simple read. Worth keeping on the shelf as a reference and keeping current on how to identify and reduce toil."

"Great book! Even if you only read selected chapters, very worth while for any IT professional"

"...Unacceptable for such an expensive publication...."

4 customers mention "Use": 4 positive, 0 negative

Customers find the book useful, with one mentioning its applicability across industry and another highlighting its relevance to modern cloud computing.

"Excellent information about SRE practice within Google. Applicable to many environments."

"Tons of nuggets about best practices, how they can be useful across industry, Google's tooling, how they got there, challenges faced, communication..."

"...Part 2 - Interesting and useful concepts for modern cloud computing...."

"Useful But Needlessly Verbose..."

4 customers mention "Writing": 1 positive, 3 negative

Customers have mixed opinions about the writing style of the book, with one customer noting it was written by a committee of authors.

"...Because the book is actually a collection of chapters, the writing is a bit uneven...."

"First, this book is a collection of writings by different authors, none of which is listed on the cover...."

"...It's all written by different people too, which on the one hand makes it not quite as repetitive, but on the other hand makes it hard to just skim..."

"...This book was written by a committee of authors and it showed. So much replication and repetition!..."

3 customers mention "Book formatting": 0 positive, 3 negative

Customers find the Kindle edition of the book to be poorly formatted.

"The Kindle edition is horribly formatted. Headings, subheadings and call-outs are not visible as such, but appear as normal body text...."

"...Unfortunately the Kindle version is formatted terribly and I wish I'd bought the print version instead...."

"Useful Information, Unevenly Presented..."

Kindle edition is horribly formatted
1 out of 5 stars
The Kindle edition is horribly formatted. Headings, subheadings and call-outs are not visible as such, but appear as normal body text. Unacceptable for such an expensive publication. Additional information: this is on Windows 10, with the Kindle reader app from the Windows Store. Attached screenshot shows how the author and editor credits, as well as a quote, appear as normal body text at the start of a chapter: this problem persists through the entire publication, and is especially annoying for subheadings. All my other Kindle books look fine using the same reader app.

Top reviews from the United States

  • Reviewed in the United States on October 4, 2025
    Format: Paperback (Verified Purchase)
    Amazing book with high quality content
  • Reviewed in the United States on January 8, 2021
    Format: Paperback (Verified Purchase)
    I was amazed by the depth of this book, and the way it covers several aspects of what it takes to operate a complex and distributed software system. I was particularly impressed with the details of some chapters related to monitoring, load balancing (at the front end and back end), designing applications to manage overload conditions, and being on call.
    I think the book has a lot to teach and inspire. Some of the approaches described are very counterintuitive like the error budget, and the blameless postmortem culture. One of the shortcomings I noticed was that some chapters are hard to read because they treat rather advanced topics. The fact that the book has very few illustrations makes it hard to understand some of the concepts at times. Overall, an invaluable resource.
    One person found this helpful
  • Reviewed in the United States on April 20, 2019
    Format: Paperback (Verified Purchase)
    It's worth noting that there is a great Coursera course about SRE from Google. It will not cover as much as the book, but it is a distilled version to learn the basics.

    This book has a lot of great information, which I found invaluable over the years. One of the harder things for growing organizations is to keep teams focused, and I've seen that DevOps and SRE practices help to zero in on what is essential.

    A lot of Automation related work feels like 'yak shaving,' which is a term to refer to entirely unrelated things that don't add value to our product. For development teams, this feels very frustrating. Why would I want to make a script to automate this? We only use it once a year!

    SRE helps to solve these frustrations, to some extent, with practices that help organizations understand why they should communicate, why they should talk about issues, and why we measure some things on some level and not others.
    9 people found this helpful
  • Reviewed in the United States on December 22, 2018
    Format: Kindle (Verified Purchase)
    First off - it's worth noting that Google lets you read this entire book for free on their website.

    I bought the Kindle version anyways because I spend enough time in front of a backlit screen that it seemed worth it to read something this large using a device that's better on your eyes. Unfortunately the Kindle version is formatted terribly and I wish I'd bought the print version instead. The book is broken up into Parts which are broken up into Chapters which are further broken up into headlined sections. The Kindle version identifies those headlined sections as chapters which is somewhat useless.

    Anyways, the first few chapters aren't especially useful unless you work at Google. They mostly discuss what's unique about Google's computing infrastructure. Despite this, they were EASILY my favorite part of the book because the material is so interesting and their approach is so unique. After that, each chapter is written in a way that it can stand on its own if you aren't reading the entire book, or are reading it out of order. This is convenient for people who want to pick and choose what parts they want to read, but means that people who are reading the entire thing wind up getting a lot of the same information multiple times. It's all written by different people too, which on the one hand makes it not quite as repetitive, but on the other hand makes it hard to just skim over the sections with info you already have because you don't recognize it as information you already know until you've processed it.

    Overall this is a fantastic book on DevOps, SRE, and current trends in the industry. It's a great read for anyone who wants to apply some "best practices" to their role. I would however say that reading the entire thing is overkill for most people and not necessarily the best use of your time if you have other things you'd like to be learning as well.

    Part 1 - Fascinating read. I imagine this would be a good overview if you're about to start at Google and want a sneak peek at how things are done, but I'm only speculating this as an outsider.

    Part 2 - Interesting and useful concepts for modern cloud computing.

    Part 3 - Some useful info and a lot of stuff that's not really unique to Google in my experience. Read the parts that you think you could use some improvement on, skip the rest.

    Part 4 - A condensed view from a managerial perspective of things you already read in Part 3.

    Part 5 - Some case studies, comparisons from other businesses, a useless recap, and examples that could be useful to share using the website version of the book if you're trying to explain to your team what new concepts are being implemented.
    46 people found this helpful
  • Reviewed in the United States on April 1, 2022
    Format: Paperback (Verified Purchase)
    Tons of nuggets about best practices, how they can be useful across industry, Google's tooling, how they got there, challenges faced, communication between engineers and SRE, how to look at problems, and so much more.
    Parts of the book go too deep or are not well explained, and end up boring. I just skipped pages to move on to the next learning.
    Overall a good addition to my library.
  • Reviewed in the United States on June 26, 2016
    Format: Paperback (Verified Purchase)
    I really liked this book. Cool to see how Google actually runs things at their scale. Got me thinking about things I never thought about when it comes to my work in tech. This could sound like the book makes you paranoid, but I think that's too negative. I felt more like I now have a little license and education on how things can (and will) fail and how I can better prepare for and mitigate them. It's like you got to do a ride-along with a busy ambulance service; it gets you thinking "hmm, maybe I should take that CPR course and brush up on the Heimlich maneuver...".

    Even though several of the topics covered weren't things I deal with day to day, I think the mindset you develop after seeing how they solve various issues applies to most any IT / tech endeavor (i.e. whether you're in ops, a SWE, etc.). I think if this book's subject interests you at all, you'll really appreciate having read it.
    16 people found this helpful
  • Reviewed in the United States on February 11, 2024
    Format: Paperback (Verified Purchase)
    I think Google's practices are now standard across the industry. A lot of things mentioned in the book are already in practice at my employer. Good read.
  • Reviewed in the United States on September 27, 2019
    Format: Paperback (Verified Purchase)
    I'm grateful for the approach Niall took, because this isn't a textbook or reference book.

    It's a collection of stories and tools. Some of the stories are irrelevant to me. But others made me nod, smile, and, ultimately, learn important definitions and techniques I can apply to our own SRE practice at DigitalOnUs.

Top reviews from other countries

  • J. Andrews
    5.0 out of 5 stars The book every infrastructure engineer and DevOps person should read
    Reviewed in the United Kingdom on January 21, 2018
    Format: Kindle (Verified Purchase)
    If you are new to infrastructure engineering this book will inform you as to an approach and model to use as you start down this road. If you are an experienced engineer then you will see a lot of truth in what is written here. It may change your viewpoint or solidify an existing one; whatever the case, this book is an essential reference and an honest account with a huge amount of wisdom.
  • Niels Albers
    5.0 out of 5 stars Must read for the serious DevOps engineer
    Reviewed in the Netherlands on May 4, 2016
    Format: Kindle (Verified Purchase)
    Just the first chapter alone lists a number of concrete issues that anyone who has any experience with operations at all will recognise, and the recommendations this book makes just make sense. Actually, not only people with DevOps experience should be reading this; there is a lot in here that their managers could certainly profit from, in every sense of the word.
    Key words:
    - Error budget
    - Toil / development balance (and the 50% time rule)
    - The impossibility of never having a failure.

    I'm still working my way through the book, but every new chapter has new insights that really help to put our complex job into perspective, and offer concrete ways of making our work better.
  • Grisi
    5.0 out of 5 stars Deep info
    Reviewed in Australia on November 1, 2023
    Format: Paperback (Verified Purchase)
    Want to know about an Engineer - read this book - it's Deep
  • Chf
    5.0 out of 5 stars Interesting and useful
    Reviewed in Italy on May 16, 2016
    Format: Kindle (Verified Purchase)
    Of course, I do not have the same infrastructure as Google, but many problems are the same.
    This book is very interesting because it shows different tips & tricks to resolve and manage communication problems between departments and, of course, reliability problems.
    I suggest it to every IT professional, ITIL expert, DevOps wannabe and, of course, CTO.
  • Óscar Casal Sánchez
    5.0 out of 5 stars Excellent book
    Reviewed in Spain on April 18, 2017
    Format: Kindle (Verified Purchase)
    An excellent book that offers many perspectives on how to build a team and how to tackle problems. It also covers all of a company's processes: budgets, monitoring, SLAs, service launch, service maintenance...

    This book shows that Google's culture is "blameless" and that there is no line between devs and ops; there is the concept of SRE, which could be said to be similar to today's DevOps, although with more responsibilities.

    A book that everyone who works in IT should read, and also all the