Overview
The Kafka Data Streaming with Spark & Flink course teaches how to integrate Apache Kafka with stream-processing frameworks such as Apache Spark Structured Streaming and Apache Flink.
During the training, participants will learn to build real-time data pipelines, process large volumes of events, implement transformations, aggregations, and joins, and monitor and optimize streaming applications.
The course combines theory, hands-on labs, and projects to equip students to design scalable, resilient, high-performance streaming architectures.
Course Outline
Module 1: Introduction to Kafka and Streaming
- Overview of Kafka and event-driven architectures
- Introduction to streaming concepts and frameworks
- Comparing batch vs real-time processing
- Setting up Kafka, Spark, and Flink environments
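For the environment-setup lab, a single-broker Kafka suffices for local exercises. The fragment below is one possible minimal setup; the image tag is an illustrative assumption (any recent KRaft-capable Kafka image works) and it is not a production configuration.

```yaml
# Illustrative local lab setup -- single KRaft broker, no replication.
# Image tag is an assumption; pin whatever recent version the course uses.
services:
  kafka:
    image: apache/kafka:3.7.0
    ports:
      - "9092:9092"   # client connections from Spark/Flink jobs on the host
```

Spark and Flink jobs developed in later modules can then point their bootstrap servers at `localhost:9092`.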
Module 2: Kafka Producers and Consumers
- Producing and consuming messages in Kafka
- Serialization and deserialization (JSON, Avro, Protobuf)
- Managing partitions, offsets, and consumer groups
- Error handling and retries
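Two of the topics above, serialization and retries, can be sketched framework-free. The snippet below shows a JSON value serializer/deserializer of the kind a Kafka producer and consumer would be configured with, plus a retry loop with exponential backoff; `send_with_retries` and its `send` callback are hypothetical names for illustration, not a Kafka client API.

```python
import json
import time

def serialize(record: dict) -> bytes:
    """Encode a record as UTF-8 JSON, as a producer value serializer would."""
    return json.dumps(record).encode("utf-8")

def deserialize(payload: bytes) -> dict:
    """Decode bytes back into a record, as a consumer deserializer would."""
    return json.loads(payload.decode("utf-8"))

def send_with_retries(send, record: dict, retries: int = 3, backoff_s: float = 0.0):
    """Serialize `record` and call `send`, retrying transient failures
    with exponential backoff -- the error-handling pattern covered above."""
    payload = serialize(record)
    for attempt in range(retries + 1):
        try:
            return send(payload)
        except ConnectionError:
            if attempt == retries:
                raise
            time.sleep(backoff_s * (2 ** attempt))

# Round trip: what goes into the topic comes back out unchanged.
event = {"order_id": 42, "amount": 19.99}
assert deserialize(serialize(event)) == event
```

Real Kafka producers also retry internally (`retries`, `delivery.timeout.ms`); an application-level loop like this is only needed around the client call itself.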
Module 3: Spark Structured Streaming Integration
- Overview of Spark Structured Streaming
- Reading from and writing to Kafka topics
- Transformations, aggregations, and windowing in Spark
- Stateful and stateless processing
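The windowed-aggregation idea above can be previewed without a Spark cluster. This is a pure-Python sketch of a tumbling-window count, the same shape of result Spark Structured Streaming produces with `window()` + `groupBy()` + `count()`; the function name and event format are illustrative assumptions.

```python
from collections import defaultdict

def tumbling_window_counts(events, window_ms):
    """Assign each (timestamp_ms, key) event to a fixed-size tumbling
    window and count occurrences per (window_start, key)."""
    counts = defaultdict(int)
    for ts, key in events:
        window_start = (ts // window_ms) * window_ms  # floor to window boundary
        counts[(window_start, key)] += 1
    return dict(counts)

events = [(0, "a"), (500, "a"), (1200, "b"), (1900, "a")]
result = tumbling_window_counts(events, window_ms=1000)
# -> {(0, "a"): 2, (1000, "b"): 1, (1000, "a"): 1}
```

In Spark the same grouping is expressed declaratively, and the engine maintains the per-window state incrementally as micro-batches arrive.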
Module 4: Flink Streaming Integration
- Introduction to Apache Flink and its architecture
- KafkaSource and KafkaSink integration
- Keyed streams, windows, and event time processing
- Stateful processing and checkpointing in Flink
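Keyed state and checkpointing, the last two bullets above, fit together: Flink periodically snapshots per-key state so a failed job can restart from the last consistent snapshot. A minimal pure-Python sketch of that idea (function and variable names are assumptions, not Flink APIs):

```python
def run_with_checkpoints(events, checkpoint_every):
    """Maintain per-key running counts (keyed state) and snapshot the
    whole state every N events, mimicking periodic checkpoints."""
    state = {}
    checkpoints = []
    for i, key in enumerate(events, start=1):
        state[key] = state.get(key, 0) + 1          # update keyed state
        if i % checkpoint_every == 0:
            checkpoints.append(dict(state))         # immutable snapshot
    return state, checkpoints

state, checkpoints = run_with_checkpoints(["a", "b", "a", "c"], checkpoint_every=2)
# On failure, recovery would resume from checkpoints[-1] and replay
# only the events after it -- the essence of Flink's recovery model.
```

In real Flink, state lives in a state backend (e.g. RocksDB) and checkpoints are coordinated across operators with barriers, but the replay-from-snapshot reasoning is the same.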
Module 5: Advanced Stream Processing
- Joining streams and enriching data in real time
- Handling late events and out-of-order data
- Performance tuning and resource optimization
- Fault tolerance and exactly-once semantics
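Handling late and out-of-order data hinges on watermarks: the engine tracks the maximum event time seen, subtracts an allowed lateness, and drops (or side-outputs) events older than that bound. A pure-Python sketch of the mechanism, with illustrative names:

```python
def process_with_watermark(events, allowed_lateness_ms):
    """Advance a watermark as (max event time seen - allowed lateness)
    and separate on-time events from those arriving too late."""
    max_ts = 0
    accepted, dropped = [], []
    for ts, value in events:
        max_ts = max(max_ts, ts)
        watermark = max_ts - allowed_lateness_ms
        if ts >= watermark:
            accepted.append((ts, value))
        else:
            dropped.append((ts, value))   # too late: beyond allowed lateness
    return accepted, dropped

events = [(1000, "a"), (5000, "b"), (1500, "late"), (4500, "ok")]
accepted, dropped = process_with_watermark(events, allowed_lateness_ms=1000)
# (1500, "late") arrives after the watermark has advanced to 4000, so it is dropped.
```

Spark exposes this as `withWatermark()`, Flink as watermark strategies plus `allowedLateness()`; both also use watermarks to know when a window's state can be finalized and purged.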
Module 6: Monitoring and Observability
- Metrics and dashboards with Prometheus and Grafana
- Logging and debugging Kafka + Spark/Flink pipelines
- Alerting for production pipelines
- Troubleshooting common streaming issues
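The metrics pipeline in this module typically works by having Prometheus scrape exporters exposed by each component. The fragment below is an illustrative scrape configuration; the job names, hostnames, and ports are assumptions (a JMX exporter sidecar for the broker, Flink's built-in PrometheusReporter for the job).

```yaml
# Illustrative Prometheus scrape config; targets and ports are assumptions.
scrape_configs:
  - job_name: kafka
    static_configs:
      - targets: ["kafka-jmx-exporter:9404"]   # JMX exporter exposing broker metrics
  - job_name: flink
    static_configs:
      - targets: ["flink-jobmanager:9249"]     # Flink PrometheusReporter endpoint
```

Grafana dashboards are then built over these series, with alerts on signals such as consumer lag and checkpoint duration.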
Module 7: Deployment and Scaling
- Running Spark and Flink applications in Docker and Kubernetes
- Scaling streaming jobs for high throughput
- Best practices for production-grade deployments
- Resource management and cluster configuration
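A containerized streaming consumer can be sketched as an ordinary Kubernetes Deployment; the manifest below is a minimal illustration, with a hypothetical image name. Note that Spark and Flink bring their own Kubernetes tooling (`spark-submit --master k8s://...`, the Flink Kubernetes Operator), which the course would use for the engine jobs themselves.

```yaml
# Illustrative Deployment for a plain Kafka consumer service; the image
# and names are assumptions, not part of any real deployment.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: stream-consumer
spec:
  replicas: 3                  # scale out up to the topic's partition count
  selector:
    matchLabels: {app: stream-consumer}
  template:
    metadata:
      labels: {app: stream-consumer}
    spec:
      containers:
        - name: consumer
          image: registry.example.com/stream-consumer:1.0   # hypothetical image
          env:
            - name: KAFKA_BOOTSTRAP_SERVERS
              value: kafka:9092
```

Because Kafka assigns partitions across a consumer group, replicas beyond the partition count sit idle, which is why partitioning and replica counts are tuned together.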
Module 8: Hands-On Project
Project: Build a complete Kafka data streaming pipeline using Spark Structured Streaming and Flink, including producers, consumers, stream transformations, windowing, joins, and deployment in Docker/Kubernetes.