Overview
This course teaches, in a practical and direct way, how to use Kafka Connect and Debezium to implement real-time Change Data Capture (CDC) pipelines. You will learn how to capture database changes, stream the resulting events to Apache Kafka, and integrate them with downstream systems. The course prepares you to build the modern, robust, highly scalable pipelines used in data engineering and microservices architectures.
Course Syllabus
Module 1 – Introduction to CDC and Event-Driven Data
- What is CDC and why it matters
- Traditional ETL vs CDC
- Event-driven architectures and data streaming
- Role of Kafka Connect and Debezium
Module 2 – Kafka Connect Fundamentals
- Connectors, tasks, workers
- Source vs sink connectors
- Distributed vs standalone mode
- Connector configuration structure
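A connector is declared as a JSON document: a unique name plus a flat config map whose connector.class, tasks.max, and plugin-specific keys form the structure listed above. A minimal sketch in Python, assuming a Connect worker on localhost:8083 and using the FileStreamSource connector that ships with Kafka; the connector name, file path, and topic are illustrative.

```python
import requests  # pip install requests

CONNECT_URL = "http://localhost:8083"  # assumed Connect worker REST endpoint

connector = {
    "name": "demo-file-source",  # unique name in the Connect cluster (illustrative)
    "config": {
        # Which plugin to run and how many tasks it may spawn.
        "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
        "tasks.max": "1",
        # Plugin-specific keys: read this file, publish its lines to this topic.
        "file": "/tmp/demo.txt",
        "topic": "demo-lines",
    },
}

resp = requests.post(f"{CONNECT_URL}/connectors", json=connector)
resp.raise_for_status()
print(resp.json())  # Connect echoes back the name, config, and assigned tasks
```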
Module 3 – Debezium Fundamentals
- Debezium architecture
- Database transaction logs
- Debezium connectors overview
- Change events and their structure
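To make the event structure concrete before Module 6 examines it in depth, here is a minimal sketch of the standard Debezium envelope for a single-row update; the field values are illustrative, but the before/after/source/op/ts_ms keys are what Debezium emits.

```python
# Shape of a Debezium change event (values are illustrative).
# "op" is "c" (create), "u" (update), "d" (delete), or "r" (snapshot read).
change_event = {
    "before": {"id": 42, "email": "old@example.com"},  # row state before the change
    "after": {"id": 42, "email": "new@example.com"},   # row state after the change
    "source": {                                        # provenance metadata
        "connector": "mysql",
        "db": "inventory",
        "table": "customers",
    },
    "op": "u",               # this event is an update
    "ts_ms": 1700000000000,  # when the connector processed the change
}
```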
Module 4 – Setting Up the Environment
- Running Kafka + Connect + Debezium with Docker
- Installing connectors
- Configuring offsets, tasks, and workers
- Exploring logs and monitoring startup
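Once the containers are up, the Connect REST API is the quickest way to confirm that the worker started and the Debezium plugins were installed. A small sketch, assuming the worker's port 8083 is mapped to localhost:

```python
import requests  # pip install requests

CONNECT_URL = "http://localhost:8083"  # assumed port mapping from the Docker setup

# The root endpoint answers once the worker has finished starting up.
info = requests.get(f"{CONNECT_URL}/").json()
print("Connect version:", info["version"])

# /connector-plugins lists every connector class the worker can load,
# so a freshly installed Debezium plugin should show up here.
for plugin in requests.get(f"{CONNECT_URL}/connector-plugins").json():
    print(plugin["class"], plugin.get("version"))
```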
Module 5 – Debezium for Relational Databases
- MySQL connector (registration sketched after this list)
- PostgreSQL connector
- SQL Server connector
- Handling schema changes and metadata
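Registering the Debezium MySQL connector uses the same REST call as any other connector; only the config keys change. A sketch assuming Debezium 2.x key names, with hostnames, credentials, and database names that are purely illustrative:

```python
import requests  # pip install requests

CONNECT_URL = "http://localhost:8083"  # assumed Connect worker endpoint

mysql_source = {
    "name": "inventory-mysql-source",  # illustrative name
    "config": {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "tasks.max": "1",
        "database.hostname": "mysql",
        "database.port": "3306",
        "database.user": "debezium",
        "database.password": "dbz",
        "database.server.id": "184054",        # must be unique in the MySQL cluster
        "database.include.list": "inventory",  # databases to capture
        "topic.prefix": "dbserver1",           # prefix for all change-event topics
        # Debezium persists DDL history in Kafka so schema changes can be
        # replayed consistently after a restart.
        "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
        "schema.history.internal.kafka.topic": "schema-changes.inventory",
    },
}

requests.post(f"{CONNECT_URL}/connectors", json=mysql_source).raise_for_status()
```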
Module 6 – Understanding CDC Events
- Create, update, delete events
- Before/after states
- Envelopes and payload structure
- Debezium event types and topics
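These event types can be handled by dispatching on the op field. A minimal consumer sketch using the kafka-python client; the topic name assumes the topic.prefix from the Module 5 sketch:

```python
import json

from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "dbserver1.inventory.customers",  # {topic.prefix}.{database}.{table}
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")) if m else None,
)

for message in consumer:
    event = message.value
    if event is None:  # tombstone record that follows a delete
        continue
    payload = event.get("payload", event)  # "payload" wraps the envelope when schemas are enabled
    op = payload["op"]
    if op == "c":
        print("insert:", payload["after"])
    elif op == "u":
        print("update:", payload["before"], "->", payload["after"])
    elif op == "d":
        print("delete:", payload["before"])
    elif op == "r":
        print("snapshot read:", payload["after"])
```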
Module 7 – Sink Connectors and Downstream Integration
- File system sinks
- JDBC sinks (see the sketch after this list)
- Elasticsearch and NoSQL sinks
- Multi-system fan-out strategies
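A JDBC sink sketch (Confluent's JDBC sink connector; the connection details are illustrative). Debezium's ExtractNewRecordState SMT flattens the change-event envelope so the sink receives plain row data instead of before/after pairs:

```python
import requests  # pip install requests

CONNECT_URL = "http://localhost:8083"

jdbc_sink = {
    "name": "customers-jdbc-sink",  # illustrative name
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
        "tasks.max": "1",
        "topics": "dbserver1.inventory.customers",
        "connection.url": "jdbc:postgresql://postgres:5432/warehouse",
        "connection.user": "sink",
        "connection.password": "secret",
        "insert.mode": "upsert",  # update existing rows by primary key
        "pk.mode": "record_key",  # take the key from the Kafka record key
        "auto.create": "true",    # create the target table if missing
        # Flatten the Debezium envelope before handing records to the sink.
        "transforms": "unwrap",
        "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
    },
}

requests.post(f"{CONNECT_URL}/connectors", json=jdbc_sink).raise_for_status()
```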
Module 8 – Schema Management & Serialization
- JSON vs Avro vs Protobuf
- Schema Registry integration (see the sketch after this list)
- Schema evolution scenarios
- Compatibility and versioning
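Switching a pipeline to Avro is mostly a converter change, and compatibility rules live in Schema Registry. A sketch using Confluent's AvroConverter and the registry's REST API; URLs and subject names are illustrative:

```python
import requests  # pip install requests

REGISTRY_URL = "http://localhost:8081"  # assumed Schema Registry endpoint

# Converter settings to merge into a connector's "config" map (or set
# worker-wide) so keys and values are serialized as Avro.
avro_converters = {
    "key.converter": "io.confluent.connect.avro.AvroConverter",
    "key.converter.schema.registry.url": "http://schema-registry:8081",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "http://schema-registry:8081",
}

# The registry enforces a compatibility level per subject; BACKWARD means
# consumers of the previous schema can still read data written with the new one.
print(requests.get(f"{REGISTRY_URL}/config").json())  # global default level
requests.put(
    f"{REGISTRY_URL}/config/dbserver1.inventory.customers-value",  # {topic}-value subject
    json={"compatibility": "BACKWARD"},
).raise_for_status()
```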
Module 9 – Fault Tolerance and Reliability
- Rebalancing and task distribution
- Offset management
- Handling connector failures
- Restarting, pausing and resuming connectors
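The same REST API drives the recovery operations listed above. A sketch, assuming the connector name from the Module 5 example:

```python
import requests  # pip install requests

CONNECT_URL = "http://localhost:8083"
NAME = "inventory-mysql-source"  # connector name from the Module 5 sketch

# Inspect connector and task state (RUNNING, PAUSED, FAILED, ...).
status = requests.get(f"{CONNECT_URL}/connectors/{NAME}/status").json()
print("connector:", status["connector"]["state"])
for task in status["tasks"]:
    print("task", task["id"], task["state"], task.get("trace", "")[:200])

# Pause stops polling but keeps committed offsets; resume continues
# exactly where the connector left off.
requests.put(f"{CONNECT_URL}/connectors/{NAME}/pause").raise_for_status()
requests.put(f"{CONNECT_URL}/connectors/{NAME}/resume").raise_for_status()

# Restart the connector after a failure (newer Connect versions can also
# restart failed tasks via the includeTasks/onlyFailed query parameters).
requests.post(f"{CONNECT_URL}/connectors/{NAME}/restart").raise_for_status()
```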
Module 10 – Performance & Scalability
- Worker scaling strategies
- Tuning connector tasks (see the sketch after this list)
- Efficient topic partitioning
- Minimizing latency
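As referenced above, task counts are tuned by rewriting the connector config; PUT /connectors/{name}/config replaces the whole map, so fetch, adjust, and write back. Note that sink connectors parallelize up to the partition count of their input topics, while most Debezium relational sources are single-task by design:

```python
import requests  # pip install requests

CONNECT_URL = "http://localhost:8083"
NAME = "customers-jdbc-sink"  # sink from the Module 7 sketch

# Fetch the current config, raise the task ceiling, and write it back.
config = requests.get(f"{CONNECT_URL}/connectors/{NAME}/config").json()
config["tasks.max"] = "4"  # illustrative; bounded by the topic's partition count
requests.put(f"{CONNECT_URL}/connectors/{NAME}/config", json=config).raise_for_status()
```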
Module 11 – Security & Governance
- Securing connectors
- Data masking (see the sketch after this list)
- Change history retention
- GDPR / LGPD considerations
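For the data-masking topic referenced above, Kafka Connect ships a built-in MaskField SMT that blanks out named fields before records leave the pipeline. A sketch that appends it to the sink's transform chain from Module 7; the field name is illustrative:

```python
import requests  # pip install requests

CONNECT_URL = "http://localhost:8083"
NAME = "customers-jdbc-sink"  # sink from the Module 7 sketch

config = requests.get(f"{CONNECT_URL}/connectors/{NAME}/config").json()
# Chain the built-in MaskField SMT after the existing unwrap step.
config["transforms"] = "unwrap,mask"
config["transforms.mask.type"] = "org.apache.kafka.connect.transforms.MaskField$Value"
config["transforms.mask.fields"] = "email"  # illustrative PII column
requests.put(f"{CONNECT_URL}/connectors/{NAME}/config", json=config).raise_for_status()
```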
Module 12 – Full End-to-End CDC Pipeline Project
- Capturing data from a relational database
- Streaming changes to Kafka
- Transforming data with SMTs
- Writing output into a data lake or analytics system
- Validating consistency end-to-end
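One way to validate consistency is to replay the change topic from the beginning, fold the events into a current-state map per key, and compare the surviving row count against SELECT COUNT(*) on the source table. A rough sketch using kafka-python, assuming a single-partition topic:

```python
import json

from kafka import KafkaConsumer, TopicPartition  # pip install kafka-python

consumer = KafkaConsumer(
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,  # stop iterating once the topic is drained
    key_deserializer=lambda m: m.decode("utf-8") if m else None,
    value_deserializer=lambda m: json.loads(m.decode("utf-8")) if m else None,
)
# Single-partition topic assumed; assign all partitions in real pipelines.
consumer.assign([TopicPartition("dbserver1.inventory.customers", 0)])

state = {}
for message in consumer:
    if message.value is None:  # tombstone: the row is gone
        state.pop(message.key, None)
        continue
    payload = message.value.get("payload", message.value)
    if payload["op"] == "d":
        state.pop(message.key, None)
    else:  # "c", "u", and "r" all carry the row's new state in "after"
        state[message.key] = payload["after"]

# Compare this count against SELECT COUNT(*) on the source table.
print(f"{len(state)} live rows reconstructed from the change stream")
```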