Overview
This course is designed for professionals who want to master building real-time data pipelines with Apache Flink and Apache Kafka. Students will learn fundamental and advanced stream-processing concepts, Flink–Kafka integration, professional architectural patterns, performance optimization, delivery guarantees, and the implementation of complete end-to-end pipelines.
By the end, students will be able to build scalable, fault-tolerant, low-latency, and highly resilient systems.
Course Outline
Module 1 – Kafka Architecture Essentials (4h)
- Core Kafka concepts
- Topics, partitions, replication
- Producers and consumers
- Consumer groups and rebalancing
- Offset management
- Kafka delivery semantics: at-most-once, at-least-once, exactly-once
- Schema Registry basics (Avro / JSON / Protobuf)
- Hands-on: Creating topics, producing and consuming events
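A good warm-up for the hands-on is the routing invariant behind topics and partitions: records with the same key always land in the same partition, which is what gives per-key ordering. A minimal sketch (Kafka proper hashes keys with murmur2; the stand-in hash below only illustrates the invariant, not the real algorithm):

```python
# Sketch of Kafka's key-to-partition routing. Real Kafka producers use
# murmur2(key) % num_partitions; sum(key) here is a stand-in for illustration.

def partition_for(key: bytes, num_partitions: int) -> int:
    # Same key -> same partition, which is what gives per-key ordering.
    return sum(key) % num_partitions

orders = [(b"user-1", "order-a"), (b"user-2", "order-b"), (b"user-1", "order-c")]
partitions = {}
for key, value in orders:
    partitions.setdefault(partition_for(key, 3), []).append(value)

# Both of user-1's orders end up in the same partition, preserving their order.
```

Keyless records, by contrast, are spread across partitions (sticky/round-robin), trading ordering for balance.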
Module 2 – Flink Architecture & Runtime Review (3h)
- Flink cluster architecture overview
- JobManager, TaskManager, task slots
- Checkpoints, savepoints, barriers
- Flink event-time fundamentals
- DataStream vs Table API
- Hands-on: Running Flink locally and submitting jobs
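The checkpoint-barrier mechanism listed above can be modeled in a few lines. This is an illustrative toy, not Flink's runtime (which also supports unaligned checkpoints): an operator with two input channels buffers records from a channel whose barrier has already arrived until the other channel's barrier shows up, then snapshots at a consistent cut.

```python
# Toy model of aligned checkpoint barriers across two input channels "a" and "b".
# Illustrative only -- Flink's actual implementation lives in the runtime.

def align(events):
    """events: list of (channel, item); item may be the string 'BARRIER'.
    Records from a channel whose barrier already arrived are buffered until
    the other channel's barrier arrives; then state is snapshotted."""
    processed, buffered = [], []
    barriers = set()
    snapshot_at = None
    for channel, item in events:
        if item == "BARRIER":
            barriers.add(channel)
            if barriers == {"a", "b"}:
                snapshot_at = len(processed)   # consistent cut: snapshot here
                processed.extend(buffered)     # release buffered records
                buffered = []
                barriers = set()               # alignment done, unblock channels
        elif channel in barriers:
            buffered.append(item)              # blocked until alignment completes
        else:
            processed.append(item)
    return processed, snapshot_at

events = [("a", 1), ("a", "BARRIER"), ("a", 2), ("b", 3), ("b", "BARRIER"), ("b", 4)]
processed, snapshot_at = align(events)
# Record 2 (after a's barrier) is held back until b's barrier arrives, so the
# snapshot reflects exactly the records before both barriers.
```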
Module 3 – Flink + Kafka Integration (6h)
3.1 Modern Kafka Source & Sink
- KafkaSource (new unified API)
- Offset strategies (earliest, latest, committed)
- KafkaSink with transactions
- Delivery semantics with Kafka + Flink
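The reason KafkaSink needs transactions for exactly-once can be seen in a toy model: a plain retry-on-failure sink is at-least-once (replays produce duplicates), while a transactional sink makes output visible atomically with the checkpoint. A sketch of the idea, not the real Flink/Kafka two-phase-commit protocol:

```python
# Toy transactional sink: pending writes become visible only on commit,
# mirroring how read_committed Kafka consumers see transactional output.

class TransactionalSink:
    def __init__(self):
        self.committed = []      # visible to downstream read_committed consumers
        self.pending = []        # open transaction, invisible until commit

    def write(self, record):
        self.pending.append(record)

    def commit(self):            # in Flink: driven by checkpoint completion
        self.committed.extend(self.pending)
        self.pending = []

    def abort(self):             # on failure/restart: pending writes vanish
        self.pending = []

sink = TransactionalSink()
sink.write("a"); sink.write("b")
sink.abort()                     # simulated failure before the checkpoint
sink.write("a"); sink.write("b") # replay from the last checkpoint
sink.commit()
# committed == ["a", "b"]: the aborted attempt left no duplicates behind.
```

With a non-transactional sink, the same replay would have produced ["a", "b", "a", "b"] downstream.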
3.2 Working with Schema Registry
- Avro schema evolution rules
- Enforcing compatibility
- Integrating Flink with Confluent Schema Registry
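Under Schema Registry's default BACKWARD compatibility mode, the safe changes are deleting fields and adding fields that carry a default. A hypothetical Avro schema illustrating the rule (record and field names are made up for the example):

```json
{
  "type": "record",
  "name": "OrderEvent",
  "fields": [
    {"name": "order_id", "type": "string"},
    {"name": "amount",   "type": "double"},
    {"name": "currency", "type": "string", "default": "USD"}
  ]
}
```

Adding "currency" is backward compatible only because of the "default": consumers on the new schema can still read old records, filling in "USD" where the field is absent. Without the default, the registry would reject the new version.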
3.3 Hands-on
- Build a streaming pipeline Flink → Kafka → Flink
- Testing exactly-once
- Handling consumer lag
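Consumer lag, the last hands-on topic, reduces to simple arithmetic per partition: lag = log-end offset minus the group's committed offset. A sketch with made-up numbers (in practice these come from Kafka's admin API or `kafka-consumer-groups.sh --describe`):

```python
# Consumer lag per partition = (log-end offset) - (committed offset).
# Offsets below are invented for the example.

def lag(end_offsets: dict, committed: dict) -> dict:
    return {p: end_offsets[p] - committed.get(p, 0) for p in end_offsets}

end_offsets = {0: 1500, 1: 900}    # latest offset written per partition
committed   = {0: 1200, 1: 900}    # consumer group's committed position

print(lag(end_offsets, committed))  # {0: 300, 1: 0}: partition 0 is 300 records behind
```

A lag that grows steadily means the Flink job consumes slower than producers write, which is exactly the symptom the backpressure module later diagnoses.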
Module 4 – Stream Transformations & Event-Time Processing (5h)
- Stateless transformations (map, filter, flatMap)
- Keyed stream patterns
- Event time vs processing time
- Watermarks and late events
- Window types:
- Tumbling
- Sliding
- Session
- Global windows
- Custom window functions
- Hands-on: Real-time aggregations with event-time
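The interplay of tumbling windows, watermarks, and out-of-order events can be sketched outside Flink. The watermark below follows the same bounded-out-of-orderness idea as Flink's `WatermarkStrategy.forBoundedOutOfOrderness`, but this is an illustrative model, not the Flink API:

```python
# Toy event-time tumbling windows with a bounded-out-of-orderness watermark.

WINDOW_SIZE = 10      # window length in seconds
OUT_OF_ORDERNESS = 2  # how far the watermark trails the max seen timestamp

def window_start(ts: int) -> int:
    return ts - (ts % WINDOW_SIZE)   # tumbling windows are aligned to size

events = [(1, "a"), (4, "b"), (12, "c"), (9, "d"), (15, "e")]  # (event_time, value)

windows, max_ts = {}, 0
for ts, value in events:
    max_ts = max(max_ts, ts)
    windows.setdefault(window_start(ts), []).append(value)

watermark = max_ts - OUT_OF_ORDERNESS          # 15 - 2 = 13
fired = {w: vs for w, vs in windows.items() if w + WINDOW_SIZE <= watermark}
# Window [0, 10) fires with ["a", "b", "d"]: event (9, "d") arrived out of
# order, but before the watermark passed 10, so it still made the window.
```

Shrinking OUT_OF_ORDERNESS makes results arrive sooner but turns more stragglers into late events, the trade-off at the heart of this module.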
Module 5 – Stateful Stream Processing (4h)
- Keyed state and operator state
- ValueState, ListState, MapState
- Timers and state expiration
- RocksDB state backend in depth
- Checkpoints and fault tolerance
- Hands-on: Building a stateful fraud-detection pipeline
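The keyed-state idea behind the fraud-detection hands-on fits in a short sketch: per-card state holding recent transaction timestamps, with an alert when a card exceeds a threshold inside a time window. In Flink this state would live in `ValueState`/`ListState` per key with timers for expiration; here a plain dict stands in, and the threshold, window, and card IDs are invented:

```python
# Toy keyed-state fraud rule: more than THRESHOLD transactions from one card
# within WINDOW seconds raises an alert. A dict stands in for Flink keyed state.

from collections import defaultdict

THRESHOLD, WINDOW = 3, 60

state = defaultdict(list)                 # keyed state: card_id -> timestamps
alerts = []

def process(card_id: str, ts: int):
    recent = [t for t in state[card_id] if ts - t < WINDOW]  # expire old entries
    recent.append(ts)
    state[card_id] = recent
    if len(recent) > THRESHOLD:
        alerts.append((card_id, ts))

for card, ts in [("c1", 0), ("c1", 10), ("c1", 20), ("c1", 30), ("c2", 15)]:
    process(card, ts)
# alerts == [("c1", 30)]: four c1 transactions landed inside 60 seconds.
```

The list-pruning on each event is what Flink timers and state TTL do for you at scale, which is why the real pipeline leans on RocksDB rather than in-memory dicts.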
Module 6 – Streaming Patterns with Kafka + Flink (4h)
- Stream enrichment with broadcast state
- Processing multiple Kafka topics
- Side outputs / DLQ
- Reprocessing and backfill strategies
- CDC (Change Data Capture) with Flink + Debezium + Kafka
- End-to-end patterns:
- Lambda architecture
- Kappa architecture
- Stateful event routing
- Hands-on: Pipeline with side outputs and DLQ
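The side-output/DLQ pattern from the hands-on is simple to model: records that fail parsing are routed to a dead-letter queue instead of crashing the job. In Flink this maps to an `OutputTag` side output feeding a second KafkaSink; in the sketch below plain lists stand in for the two sinks:

```python
# Toy side-output routing: valid JSON goes to the main sink, malformed
# payloads go to a DLQ with enough context to inspect and replay them.

import json

main_sink, dlq_sink = [], []

def route(raw: str):
    try:
        main_sink.append(json.loads(raw))
    except json.JSONDecodeError:
        # Keep the raw payload so the event can be diagnosed and replayed later.
        dlq_sink.append({"raw": raw, "error": "malformed JSON"})

for raw in ['{"id": 1}', "not-json", '{"id": 2}']:
    route(raw)
# main_sink holds the two valid events; dlq_sink holds the malformed record.
```

Keeping the original payload in the DLQ record is what makes the reprocessing and backfill strategies above possible.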
Module 7 – Integration with External Systems (3h)
- JDBC Sink (PostgreSQL / MySQL)
- Connecting to ElasticSearch
- Exporting to S3 / MinIO
- Schema on read vs schema on write
- Hands-on: Kafka → Flink → PostgreSQL pipeline
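Two behaviors of a JDBC sink matter in the hands-on: batching (one round-trip per batch, not per record) and idempotent upserts (so replays after a failure do not duplicate rows). A sketch using the stdlib `sqlite3` module as a stand-in for PostgreSQL; the table and columns are invented, and the `ON CONFLICT` upsert needs SQLite >= 3.24:

```python
# Toy JDBC-sink behavior: batched upserts keep the sink idempotent under
# at-least-once replay. sqlite3 stands in for PostgreSQL here.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY, amount REAL)")

batch = [("o-1", 10.0), ("o-2", 25.5), ("o-1", 12.0)]  # o-1 appears twice (replay/update)

conn.executemany(
    "INSERT INTO orders (order_id, amount) VALUES (?, ?) "
    "ON CONFLICT(order_id) DO UPDATE SET amount = excluded.amount",
    batch,
)
conn.commit()

rows = dict(conn.execute("SELECT order_id, amount FROM orders").fetchall())
# rows == {"o-1": 12.0, "o-2": 25.5}: the replayed o-1 updated in place, no duplicate.
```

The same upsert statement works against PostgreSQL, which is why keying the table on a natural event ID is the usual way to make a Kafka → Flink → PostgreSQL pipeline safe under at-least-once delivery.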
Module 8 – Performance Optimization & Backpressure (2h)
- Understanding backpressure
- Monitoring throughput and latency
- Operator chaining
- Parallelism tuning
- Slot allocation
- RocksDB tuning for large state
- Hands-on: Fixing a pipeline with backpressure
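Backpressure itself is just a bounded buffer between a fast producer and a slower consumer: when the buffer fills, the producer blocks, and that pressure propagates upstream through the job graph (with a KafkaSource, ultimately slowing consumption from Kafka). A toy model with stdlib threads, not Flink's credit-based network stack:

```python
# Toy backpressure: a bounded queue between two "operators". A full buffer
# blocks the producer instead of dropping records.

from queue import Queue
from threading import Thread

buffer = Queue(maxsize=5)      # small channel between the two operators
consumed = []

def producer():
    for i in range(20):
        buffer.put(i)          # blocks whenever the consumer falls behind
    buffer.put(None)           # end-of-stream marker

def consumer():
    while (item := buffer.get()) is not None:
        consumed.append(item)  # stand-in for the slow downstream operator

t1, t2 = Thread(target=producer), Thread(target=consumer)
t1.start(); t2.start(); t1.join(); t2.join()
# All 20 records arrive, just paced by the consumer: nothing is dropped.
```

The fix for persistent backpressure is therefore never "drop the buffer" but the levers listed above: more parallelism for the slow operator, chaining, or state-backend tuning.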
Module 9 – Deployment & Observability (3h)
- Flink deployments:
- Standalone
- Docker
- Kubernetes
- Flink Native Kubernetes mode
- Kafka in distributed environments
- Observability:
- Flink Web UI
- Metrics
- Logs
- Prometheus + Grafana
- Hands-on: Deploying a Kafka + Flink pipeline in Docker Compose
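A hypothetical starting point for the Docker Compose hands-on, with image tags and settings as assumptions to adjust (the Kafka service in particular will need listener configuration for access from the host):

```yaml
# Sketch of a Kafka + Flink compose file; images/versions are assumptions.
services:
  kafka:
    image: apache/kafka:latest        # single-node KRaft broker
  jobmanager:
    image: flink:1.18
    command: jobmanager
    ports: ["8081:8081"]              # Flink Web UI
    environment:
      - "FLINK_PROPERTIES=jobmanager.rpc.address: jobmanager"
  taskmanager:
    image: flink:1.18
    command: taskmanager
    depends_on: [jobmanager]
    environment:
      - "FLINK_PROPERTIES=jobmanager.rpc.address: jobmanager"
```

The official Flink image reads `FLINK_PROPERTIES` into its configuration at startup, which is how the TaskManager finds the JobManager here.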
Module 10 – Final Project (Training Capstone) (2h)
- Build an enterprise-grade streaming pipeline using:
- KafkaSource + Schema Registry
- Stateful Flink transformations
- Event-time windowing
- DLQ + retries
- Exactly-once semantics
- Sink to PostgreSQL / S3
- Monitoring through Flink UI
- Final deliverable: a working real-time pipeline.