Overview
The Kafka for Data Engineers course is designed to prepare data engineers to design, build, and manage modern data pipelines using Apache Kafka.
Participants will learn to integrate Kafka with a variety of data sources and sinks, understand real-time event flows, apply best practices for distributed processing, and implement scalable, resilient streaming pipelines.
The course combines theoretical foundations with hands-on labs to ensure technical mastery of Kafka in data engineering environments.
Course Content
Module 1: Introduction to Kafka for Data Engineering
- Kafka’s role in modern data pipelines
- Batch vs. streaming data processing
- Core concepts: brokers, topics, partitions, producers, consumers (see the producer sketch after this list)
- Real-time data use cases
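To make the core concepts concrete, here is a minimal producer sketch in Java, assuming a local broker at localhost:9092 and a hypothetical topic named events:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class MinimalProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical local broker
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The key determines the partition: records with the same key
            // land on the same partition, preserving their relative order.
            producer.send(new ProducerRecord<>("events", "user-42", "page_view"));
            producer.flush();
        }
    }
}
```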
Module 2: Kafka Architecture and Data Flow
- Understanding producers, consumers, and consumer groups (see the consumer sketch after this list)
- Message serialization and deserialization
- Offsets, partitions, and replication strategies
- High availability and fault tolerance
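A consumer-group sketch illustrating offsets, again assuming a local broker and the hypothetical events topic; auto-commit is disabled so the group's offsets advance only after records are processed:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class GroupConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "pipeline-readers");  // consumers sharing this id split the partitions
        props.put("enable.auto.commit", "false");   // commit manually for at-least-once processing
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("events"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> r : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            r.partition(), r.offset(), r.value());
                }
                consumer.commitSync(); // advance the group's offsets only after processing
            }
        }
    }
}
```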
Module 3: Data Ingestion with Kafka
- Designing scalable data ingestion pipelines
- Connecting data sources with Kafka Connect
- Integrating relational and NoSQL databases
- Ingesting data from REST APIs, logs, and IoT devices
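One pattern from this module, ingesting from a REST API, can be sketched as a simple polling producer; the endpoint, topic name, and polling interval below are illustrative assumptions:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class RestPollingIngest {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        HttpClient http = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://api.example.com/readings")) // hypothetical endpoint
                .build();

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            while (true) {
                // Poll the API and forward each JSON payload as one Kafka record.
                HttpResponse<String> resp = http.send(request, HttpResponse.BodyHandlers.ofString());
                producer.send(new ProducerRecord<>("iot.readings", resp.body()));
                Thread.sleep(5_000); // simple fixed polling interval
            }
        }
    }
}
```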
Module 4: Kafka Connect Deep Dive
- Source and Sink connectors explained
- Configuring connectors and transformations (see the registration sketch after this list)
- Managing distributed Connect clusters
- Hands-on: deploying connectors with Docker Compose
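Connectors are managed through the Connect REST API (port 8083 by default). A sketch of registering a Confluent JDBC source connector, with hypothetical connection details:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RegisterConnector {
    public static void main(String[] args) throws Exception {
        // Connector config as JSON; the class and options follow the Confluent
        // JDBC source connector, with hypothetical connection details.
        String config = """
            {
              "name": "orders-source",
              "config": {
                "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
                "connection.url": "jdbc:postgresql://db:5432/shop",
                "connection.user": "kafka",
                "connection.password": "secret",
                "mode": "incrementing",
                "incrementing.column.name": "id",
                "table.whitelist": "orders",
                "topic.prefix": "db."
              }
            }""";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8083/connectors")) // Connect worker's REST port
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(config))
                .build();

        HttpResponse<String> resp = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(resp.statusCode() + " " + resp.body());
    }
}
```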
Module 5: Data Serialization and Schema Management
- Introduction to Schema Registry
- Working with Avro, JSON, and Protobuf schemas (see the Avro sketch after this list)
- Enforcing schema compatibility and evolution
- Best practices for schema versioning
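A sketch of producing Avro records through Confluent's KafkaAvroSerializer, assuming a Schema Registry at localhost:8081 (its default port); the Click schema and topic name are illustrative:

```java
import java.util.Properties;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AvroProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://localhost:8081"); // default Schema Registry port

        // The serializer registers this schema under the subject "clicks-value"
        // on first use and embeds its id in each message instead of the full schema.
        Schema schema = new Schema.Parser().parse("""
            {"type":"record","name":"Click","fields":[
              {"name":"user","type":"string"},
              {"name":"url","type":"string"}]}""");

        GenericRecord click = new GenericData.Record(schema);
        click.put("user", "user-42");
        click.put("url", "/checkout");

        try (KafkaProducer<String, GenericRecord> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("clicks", click));
        }
    }
}
```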
Module 6: Stream Processing for Data Engineers
- Kafka Streams fundamentals
- Stateless vs. stateful transformations
- Windowing, joins, and aggregations
- Hands-on with Kafka Streams API
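A minimal Kafka Streams topology combining a stateless filter with a stateful per-key count; the topic names and the /internal filter rule are illustrative assumptions:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

public class ClickCounts {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "click-counts"); // also the consumer group id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> clicks = builder.stream("clicks");

        // Stateless transformation: drop internal traffic.
        // Stateful transformation: count the remaining clicks per user key.
        KTable<String, Long> perUser = clicks
                .filter((user, url) -> !url.startsWith("/internal"))
                .groupByKey()
                .count();

        perUser.toStream().to("clicks.per-user",
                Produced.with(Serdes.String(), Serdes.Long()));

        new KafkaStreams(builder.build(), props).start();
    }
}
```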
Module 7: Integrating Kafka with Big Data Ecosystems
- Integration with Apache Spark Structured Streaming (see the sketch after this list)
- Integration with Apache Flink
- Data lake and data warehouse ingestion (S3, Snowflake, BigQuery)
- Building hybrid streaming + batch architectures
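A sketch of consuming a Kafka topic from Spark Structured Streaming (requires the spark-sql-kafka connector package on the classpath); the topic name is an assumption, and the console sink stands in for a real lake or warehouse sink:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;

public class KafkaToConsole {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("kafka-to-console")
                .getOrCreate();

        // Each Kafka record arrives as a row with binary key/value columns
        // plus topic, partition, offset, and timestamp metadata.
        Dataset<Row> stream = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "localhost:9092")
                .option("subscribe", "clicks")
                .load()
                .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)");

        // Write to the console for inspection; a production pipeline would
        // target a data lake or warehouse sink instead.
        StreamingQuery query = stream.writeStream()
                .format("console")
                .outputMode("append")
                .start();

        query.awaitTermination();
    }
}
```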
Module 8: Monitoring, Security, and Optimization
- Kafka metrics and monitoring with Prometheus/Grafana
- Securing Kafka: SSL, SASL, and ACLs (see the client configuration sketch after this list)
- Performance tuning for throughput and latency
- Troubleshooting data ingestion issues
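A sketch of the client-side settings for an encrypted, authenticated connection, assuming a broker with a SASL_SSL listener and the SCRAM-SHA-512 mechanism enabled; hostnames, credentials, and file paths are placeholders:

```java
import java.util.Properties;

public class SecureClientConfig {
    // Illustrative security settings shared by producers and consumers.
    public static Properties secureProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker.example.com:9093"); // hypothetical TLS listener
        props.put("security.protocol", "SASL_SSL");                // encrypt and authenticate
        props.put("sasl.mechanism", "SCRAM-SHA-512");
        props.put("sasl.jaas.config",
                "org.apache.kafka.common.security.scram.ScramLoginModule required "
                + "username=\"pipeline\" password=\"change-me\";");
        props.put("ssl.truststore.location", "/etc/kafka/client.truststore.jks");
        props.put("ssl.truststore.password", "change-me");
        return props;
    }
}
```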
Module 9: Hands-On Project
Project: Build a real-time data pipeline using Kafka, Connect, Schema Registry, and Spark Structured Streaming, integrating data from multiple sources and visualizing insights in a dashboard.