Overview
This course presents the fundamentals and practices of Observability, DevOps, and Site Reliability Engineering (SRE), showing how these pillars work together to keep systems reliable, scalable, and resilient. The focus is on solid concepts, real-world operational practices, and data-driven decision-making.
Course Content
Module 1 – Foundations of Modern Operations
- Evolution of IT operations
- Problems of traditional operations
- DevOps culture and principles
- Reliability as a business requirement
Module 2 – DevOps Core Concepts
- Continuous Integration and Continuous Delivery
- Infrastructure as Code fundamentals
- Automation and configuration management
- Collaboration and shared ownership
Module 3 – Introduction to Observability
- Monitoring vs observability
- Metrics, logs and traces
- Telemetry data types
- Observability-driven operations
Module 4 – Site Reliability Engineering (SRE) Fundamentals
- What is SRE
- SRE vs traditional operations
- Reliability engineering principles
- The error budget concept
Module 5 – Metrics and Service Level Management
- SLIs, SLOs and SLAs
- Choosing meaningful metrics
- Golden signals
- Service health evaluation
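
To give a flavor of how the SLI, SLO, and error budget topics from Modules 4 and 5 fit together, here is a minimal Python sketch. It assumes a request-based availability SLI (good requests divided by total requests); the function names and the sample numbers are hypothetical and chosen only for illustration, not taken from the course material.

```python
def availability_sli(good_requests: int, total_requests: int) -> float:
    """Request-based availability SLI: fraction of requests that succeeded."""
    if total_requests == 0:
        # No traffic in the window: treat the service as fully available.
        return 1.0
    return good_requests / total_requests


def error_budget_remaining(sli: float, slo_target: float) -> float:
    """Fraction of the error budget still unspent.

    1.0 means untouched, 0.0 means fully spent, negative means overspent.
    """
    allowed_failure = 1.0 - slo_target  # the error budget implied by the SLO
    if allowed_failure <= 0.0:
        raise ValueError("slo_target must be below 1.0 to leave any error budget")
    actual_failure = 1.0 - sli
    return 1.0 - actual_failure / allowed_failure


if __name__ == "__main__":
    # Hypothetical window: 1,000,000 requests, 500 of them failed.
    sli = availability_sli(good_requests=999_500, total_requests=1_000_000)
    remaining = error_budget_remaining(sli, slo_target=0.999)
    print(f"SLI: {sli:.4%}, error budget remaining: {remaining:.1%}")
```

With a 99.9% SLO, the service may fail 0.1% of requests in the window; failing 0.05% of them spends roughly half of that budget. A team could use a figure like this to decide whether to keep shipping changes or to prioritize reliability work.
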
Module 6 – Incident Management and Reliability
- Incident response lifecycle
- On-call practices
- Postmortems and blameless culture
- Continuous improvement
Module 7 – Observability Tooling and Architecture
- Observability stack overview
- Data pipelines for telemetry
- Tool integration concepts
- Scalability considerations
Module 8 – Real-World DevOps and SRE Scenarios
- High availability architectures
- Failure scenarios and resilience
- Production best practices
- Common anti-patterns