Course: Observability for Kafka and OpenSearch on Kubernetes

24 hours
Overview

This course teaches how to implement, configure, and operate complete observability solutions for distributed environments based on Apache Kafka and OpenSearch running on Kubernetes.
Students learn to monitor metrics, logs, and events using tools such as Prometheus, Grafana, OpenSearch Dashboards, Filebeat, Fluent Bit, and specialized exporters, along with instrumentation and troubleshooting practices.

By the end, participants will be able to build a robust observability ecosystem for performance analysis, bottleneck identification, failure diagnosis, and real-time monitoring of data pipelines.

Objectives

After completing this Observability for Kafka and OpenSearch on Kubernetes course, you will be able to:

  1. Design a complete observability architecture for Kafka and OpenSearch on Kubernetes
  2. Configure the collection of metrics, logs, and events in distributed environments
  3. Monitor brokers, topics, partitions, consumer groups, and ingest pipelines
  4. Identify performance problems and bottlenecks in Kafka clusters
  5. Track the health, state, shards, and indices of OpenSearch
  6. Integrate Prometheus, Grafana, and specialized exporters
  7. Build advanced dashboards for anomaly detection and time-series analysis
  8. Carry out observability-driven troubleshooting
  9. Apply monitoring best practices in production
  10. Configure alerts and automated rules
Target Audience
  • DevOps engineers
  • Data engineers
  • Kubernetes administrators
  • SREs (Site Reliability Engineers)
  • Software architects
  • Developers who need to monitor streaming services
  • Professionals responsible for operating OpenSearch and Kafka in production
Prerequisites
  • Intermediate knowledge of Kubernetes
  • Basic familiarity with Apache Kafka and OpenSearch
  • Basic familiarity with Helm and YAML
  • Basic understanding of metrics, logs, and observability
  • Prior experience with the command line
Materials
English/Portuguese + Exercises + Hands-On Lab
Syllabus

Module 1 — Introduction to Observability for Data Platforms

  1. Observability pillars: logs, metrics, traces
  2. Why observability is critical for Kafka and OpenSearch
  3. Challenges of distributed data systems on Kubernetes
  4. Tools and architecture overview

Module 2 — Monitoring Kafka on Kubernetes

  1. Understanding Kafka performance metrics
  2. Broker metrics: CPU, memory, I/O, network
  3. Topic and partition metrics
  4. Consumer lag monitoring
  5. Exporters for Kafka (JMX Exporter, Kafka Exporter)
  6. Collecting Kafka metrics with Prometheus
  7. Visualizing Kafka health in Grafana
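
As an illustration of the consumer lag topic above, here is a minimal sketch of how lag is usually derived: the log end offset of each partition minus the group's last committed offset. The function name `consumer_lag` and the sample offsets are hypothetical, and treating a partition with no committed offset as fully unread is a simplifying assumption (a real consumer's starting position depends on its reset policy).

```python
from typing import Dict


def consumer_lag(end_offsets: Dict[int, int], committed: Dict[int, int]) -> Dict[int, int]:
    """Return per-partition lag: log end offset minus last committed offset."""
    lag = {}
    for partition, end in end_offsets.items():
        # Assumption: no committed offset means nothing has been consumed yet.
        done = committed.get(partition, 0)
        # Clamp at zero so a stale end offset never yields negative lag.
        lag[partition] = max(end - done, 0)
    return lag


# Hypothetical offsets for a three-partition topic.
end_offsets = {0: 1500, 1: 1480, 2: 1510}
committed = {0: 1500, 1: 1200}

lag = consumer_lag(end_offsets, committed)
total_lag = sum(lag.values())
```

Exporters such as Kafka Exporter publish comparable per-partition values as metrics, so a dashboard can sum them per consumer group in the same way.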

Module 3 — Monitoring OpenSearch on Kubernetes

  1. Key OpenSearch metrics: shards, indexing rate, search latency
  2. Node health and cluster status (green/yellow/red)
  3. Monitoring heap usage, GC activity and thread pools
  4. OpenSearch Exporter installation
  5. Prometheus scraping configuration
  6. Grafana dashboards for OpenSearch performance

Module 4 — Logging Architecture

  1. Log ingestion pipelines for Kubernetes
  2. Fluent Bit vs Filebeat: use cases and best practices
  3. Shipping Kafka logs to OpenSearch
  4. Shipping OpenSearch logs to OpenSearch (self-monitoring)
  5. Parsing logs for structured observability
  6. Building dashboards in OpenSearch Dashboards
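
To make the log parsing topic above concrete, the sketch below turns one broker log line into a structured record. It assumes the bracketed-timestamp layout `[timestamp] LEVEL message (logger)`; the sample line, the `LOG_LINE` pattern, and the `parse_line` helper are illustrative, and a real deployment would normally do this step inside the log shipper's parser configuration.

```python
import re
from typing import Dict, Optional

# Assumed layout: [timestamp] LEVEL message (logger.name)
LOG_LINE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<message>.*?)\s+"
    r"\((?P<logger>[\w.$]+)\)$"
)


def parse_line(line: str) -> Optional[Dict[str, str]]:
    """Return the named fields of a log line, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


# Hypothetical sample line in the assumed layout.
sample = "[2024-05-01 10:15:30,123] INFO [KafkaServer id=1] started (kafka.server.KafkaServer)"
record = parse_line(sample)
```

Once each line is a record with `timestamp`, `level`, `message`, and `logger` fields, those fields can be indexed separately and filtered or aggregated in a dashboard.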

Module 5 — Observability for Streaming Pipelines

  1. Tracking message flow across producers, brokers, consumers
  2. Identifying bottlenecks in ingestion pipelines
  3. Slow consumers and partition skew analysis
  4. Distributed logging for multi-service pipelines
  5. Using Kafdrop for visual inspection and debugging

Module 6 — Prometheus & Grafana Deep Dive

  1. Prometheus fundamentals
  2. Recording rules and alerting rules
  3. Setting up Alertmanager
  4. Creating custom dashboards for Kafka and OpenSearch
  5. Using Grafana Loki (overview)
  6. Using Grafana Tempo (overview)

Module 7 — OpenSearch Dashboards for Analytics

  1. Dashboards and visualizations
  2. Index patterns and search queries
  3. Identifying ingestion anomalies
  4. Detecting cluster degradation
  5. Combining logs and metrics for root-cause analysis

Module 8 — Tracing and Advanced Diagnostics

  1. Overview of tracing in event-driven systems
  2. Jaeger or OpenTelemetry (optional overview)
  3. Tracing event latency from producer to consumer
  4. Correlation IDs and distributed tracing patterns
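
The correlation ID pattern above can be sketched without any Kafka client: the first producer stamps an ID and a timestamp into the message headers, and every later stage copies the headers forward unchanged. The `Message` class, the header names `correlation-id` and `produced-at-ms`, and the helper functions are all hypothetical stand-ins for whatever a real client and tracing library provide.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Message:
    """Stand-in for a Kafka record: a payload plus string headers."""
    value: bytes
    headers: Dict[str, str] = field(default_factory=dict)


def stamp(msg: Message, now_ms: int, correlation_id: Optional[str] = None) -> Message:
    """Attach tracing headers, keeping any that an upstream stage already set."""
    msg.headers.setdefault("correlation-id", correlation_id or str(uuid.uuid4()))
    msg.headers.setdefault("produced-at-ms", str(now_ms))
    return msg


def end_to_end_latency_ms(msg: Message, now_ms: int) -> int:
    """Time elapsed since the original producer stamped the message."""
    return now_ms - int(msg.headers["produced-at-ms"])


# The first producer stamps the event at t=1000 ms.
msg = stamp(Message(b"order-created"), now_ms=1_000)
# A downstream service forwards the headers; its own stamp call changes nothing.
forwarded = stamp(Message(b"order-enriched", dict(msg.headers)), now_ms=1_250)
# The final consumer measures latency against the original timestamp.
latency = end_to_end_latency_ms(forwarded, now_ms=1_400)
```

Because the same ID travels with every derived message, logs from each service can be joined on it to follow one event from producer to consumer.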

Module 9 — Alerting and Automation

  1. Kafka alerting: broker failure, lag thresholds, partition issues
  2. OpenSearch alerting: shard failures, node imbalance
  3. Configuring Prometheus alert rules
  4. Integrating alerting into Slack, Teams or email
  5. Automated recovery strategies

Module 10 — Hands-On Labs

  • Lab 1: Installing Prometheus and Grafana on Kubernetes
  • Lab 2: Configuring Kafka Exporter + JMX Exporter
  • Lab 3: Configuring OpenSearch Exporter
  • Lab 4: Building Kafka performance dashboards
  • Lab 5: Building OpenSearch cluster health dashboards
  • Lab 6: Shipping logs with Fluent Bit
  • Lab 7: Troubleshooting real-world Kafka scenarios
  • Lab 8: Troubleshooting OpenSearch performance issues
  • Lab 9: Lag investigation using Kafdrop + Grafana
  • Lab 10: Alerting configuration with Alertmanager