Brian Love talks with us about site reliability engineering: whether SRE is the same as health monitoring, how to make sure your monitoring doesn't cause more issues, and where AI has a role in SRE.
const podcast = {
episode: 228,
title: 'Web Apps and Site Reliability Engineering',
topics: [
'reliability', 'web apps', 'user focused'
],
guest: 'Brian Love',
hosts: [
'John Papa', 'Ward Bell'
]
};
Recording date: March 23, 2023
John Papa @John_Papa
Ward Bell @WardBell
Dan Wahlin @DanWahlin
Craig Shoemaker @craigshoemaker
Brian Love @Brian_love
Brought to you by
Resources:
- Google Books on SRE
- What is SRE
- Introduction to Site Reliability Engineering (SRE)
- Reliable systems in DevOps
- Ping test
- Voting with your feet
- What is an SLA
- Service Level Objectives and Indicators
- SLA vs SLO vs SLI
- SLIs, SLOs, and SLAs, oh my
- Interview with Dave Rensin, SRE Engineering Director, on the SRE Workbook
- The Origins of SRE
- What it means to be a SRE
- Get Polaris (SRE tool)
- Send Beacon API
- GitHub Copilot X
- Prompt Engineering
- Learn with Introduction to Prompt Engineering
Timejumps
- 00:29 Welcome
- 01:37 Guest introduction
- 02:55 What is SRE?
- 05:38 What is it like if you don't have an SRE?
- 09:29 Sponsor: AG Grid
- 10:36 Available vs reliable
- 13:35 Is SRE the same as health monitoring?
- 21:29 Sponsor: IdeaBlade
- 22:30 How do I make sure I don't cause more reliability issues?
- 27:36 Who's providing the infrastructure?
- 31:04 Where's the AI in all of this?
- 33:59 Final thoughts
Podcast editing on this episode done by Chris Enns of Lemon Productions.