Overview
Course: Airflow plus Spark and Lakehouse. This course covers building modern data pipelines by integrating Apache Airflow, Apache Spark, and the Lakehouse architecture. It focuses on orchestrating data workflows at scale, distributed processing, and unified analytical storage, so that students can design and operate data platforms that are robust, scalable, and production-ready.
Course Content
Module 1: Modern Data Platforms Overview
- Evolution of data architectures
- Data warehouse vs data lake
- Lakehouse architecture principles
- Role of orchestration in data platforms
Module 2: Airflow in Large-Scale Data Pipelines
- Airflow architecture for data platforms
- DAG design for batch processing
- Scheduling and dependencies
- Production considerations
Module 3: Apache Spark Fundamentals for Data Engineering
- Spark architecture and execution model
- Spark jobs and applications
- DataFrames and transformations
- Batch processing patterns
Module 4: Orchestrating Spark with Airflow
- SparkSubmitOperator
- Managing Spark jobs from Airflow
- Parameterized Spark pipelines
- Monitoring Spark executions
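To illustrate the parameterized Spark pipelines topic in Module 4, here is a minimal sketch of assembling a `spark-submit` command line from parameters. The helper name `build_spark_submit_command`, the script path, and the configuration values are hypothetical and are not taken from the course material. In Airflow, the SparkSubmitOperator would normally receive equivalent inputs instead of a hand-built argument list.

```python
from typing import Dict, List, Optional


def build_spark_submit_command(
    application: str,
    master: str = "local[*]",
    conf: Optional[Dict[str, str]] = None,
    application_args: Optional[List[str]] = None,
) -> List[str]:
    """Return a spark-submit argument list (hypothetical helper, illustration only)."""
    settings = conf or {}
    command = ["spark-submit", "--master", master]
    # Sort keys so that the same parameters always produce the same command.
    for key in sorted(settings):
        command += ["--conf", f"{key}={settings[key]}"]
    # The application comes after the options, followed by its own arguments.
    command.append(application)
    command += list(application_args or [])
    return command


# Example: one run of a daily job, parameterized by its logical date.
example_command = build_spark_submit_command(
    application="jobs/daily_sales.py",  # hypothetical script path
    conf={"spark.sql.shuffle.partitions": "64"},
    application_args=["--run-date", "2024-01-01"],
)
```

Keeping the run date as an explicit argument means the same job definition can serve scheduled runs and reruns of past dates.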
Module 5: Lakehouse Storage Layers
- Bronze, Silver and Gold layers
- Table formats and metadata
- Schema evolution
- Partitioning strategies
Module 6: Building ETL and ELT Pipelines
- Ingestion pipelines
- Transformations at scale
- Data enrichment workflows
- Incremental processing
Module 7: Reliability and Data Quality
- Idempotent Spark jobs
- Error handling strategies
- Data validation checks
- Recovery and backfill
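To illustrate the idempotency and backfill topics in Module 7, here is a minimal sketch in plain Python. It assumes output is organized into one directory per run date, and that a rerun replaces the whole directory instead of appending to it. The helper names `write_partition` and `backfill`, the `run_date=` directory layout, and the sample rows are all hypothetical. A Spark job would typically get a comparable effect by overwriting only the partitions it recomputes.

```python
import json
import shutil
import tempfile
from pathlib import Path
from typing import Dict, List


def write_partition(base_dir: Path, run_date: str, rows: List[Dict]) -> Path:
    """Replace the partition for run_date with rows (hypothetical helper)."""
    partition_dir = base_dir / f"run_date={run_date}"
    # Remove output from any previous attempt so a rerun cannot duplicate rows.
    if partition_dir.exists():
        shutil.rmtree(partition_dir)
    partition_dir.mkdir(parents=True)
    out_file = partition_dir / "part-00000.json"
    out_file.write_text("\n".join(json.dumps(r, sort_keys=True) for r in rows))
    return out_file


def backfill(base_dir: Path, rows_by_date: Dict[str, List[Dict]]) -> None:
    """Rebuild each date's partition independently of the others."""
    for run_date, rows in sorted(rows_by_date.items()):
        write_partition(base_dir, run_date, rows)


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    data = {"2024-01-01": [{"id": 1}], "2024-01-02": [{"id": 2}, {"id": 3}]}
    backfill(base, data)
    backfill(base, data)  # second run simulates a retry or a backfill
    # Count the rows stored per partition after both runs.
    line_counts = {
        p.parent.name: len(p.read_text().splitlines())
        for p in sorted(base.glob("run_date=*/part-00000.json"))
    }
```

Because each run replaces its own partition, running the job twice for the same dates leaves the same row counts as running it once.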
Module 8: Performance and Scalability
- Spark optimization techniques
- Parallelism and resource allocation
- Airflow concurrency tuning
- Cost and performance trade-offs
Module 9: Production-Ready Lakehouse Pipelines
- CI/CD for data pipelines
- Versioning and deployments
- Security and access control
- Observability and monitoring
Module 10: Real-World Scenarios and Best Practices
- End-to-end pipeline design
- Common architectural patterns
- Anti-patterns and pitfalls
- Preparing for advanced platforms