Overview
The Apache Flink for Data Engineers course is designed to train data engineers to work with Apache Flink in real-world, high-volume, high-complexity scenarios. It covers everything from Flink fundamentals to building scalable, resilient pipelines integrated with modern Big Data ecosystems. You will learn the essential concepts for designing, optimizing and operating continuous data streams that serve enterprise applications.
Course Content
Module 1 – Foundations for Data Engineers
- Role of data engineers in streaming architectures
- Challenges of real-time processing
- Big Data ecosystem overview
- Where Flink fits in the modern data stack
Module 2 – Flink Architecture for Engineering
- Distributed runtime in depth
- Job graph and execution graph
- Task slots, parallelism and resource utilization
- Checkpoints and state internals
Module 3 – Building Robust Data Pipelines
- Data ingestion patterns
- End-to-end streaming designs
- Stateless vs stateful pipeline design
- Designing fault-tolerant workflows
Module 4 – Working with DataStream API
- Transformations and operators
- Keyed streams and partitioning strategies
- Custom functions and user-defined operators
- Serialization formats and schema design
Module 5 – Event-Time Processing & Windowing
- Time domains and semantics
- Watermark strategies
- Advanced windowing techniques
- Handling late and out-of-order events
Module 6 – Advanced State Management
- Keyed vs operator state
- RocksDB state backend internals
- State growth limitations
- Designing large-scale stateful applications
Module 7 – Integrating Flink with Data Systems
- Kafka source & sink
- File systems (S3, HDFS, local)
- JDBC, NoSQL databases and object stores
- CDC pipelines with Flink + Debezium
Module 8 – Streaming Joins & Enrichment
- Stream-stream joins
- Stream-batch joins
- Side inputs and enrichment patterns
- Temporal tables and versioned data
Module 9 – Observability & Monitoring
- Essential metrics for data engineers
- Backpressure diagnosis
- Flink Dashboard deep dive
- Logging, tracing and alerting patterns
Module 10 – Performance Engineering
- Memory tuning
- Parallelism optimization
- Checkpoint interval tuning
- Low-latency and high-throughput strategies
Module 11 – Deploying Flink in Production
- Standalone, YARN and Kubernetes deployment modes
- Flink Kubernetes Operator
- CI/CD automation for Flink jobs
- Multi-environment release strategies
Module 12 – Capstone Data Engineering Project
- Designing a full streaming pipeline
- Ingesting raw data from Kafka
- Applying transformations and windowing
- Persisting results into analytical storage
- Deploying and validating the production pipeline