Advanced Data Engineering with Databricks Course

24 hours
Overview

The Advanced Data Engineering with Databricks course is designed for professionals who already have data engineering experience and want to deepen their knowledge of advanced practices for developing, optimizing, and automating data pipelines on the Databricks Unified Data Analytics Platform.

During the training, students learn to optimize complex jobs, implement production pipelines, manage data at scale, work with advanced Delta Lake features, orchestrate data flows with Databricks Workflows, and integrate machine learning tools and real-time streaming.

The course combines theory and practice, with labs that simulate real corporate scenarios.

Objectives

After completing the Advanced Data Engineering with Databricks course, you will be able to:

  • Design and optimize highly scalable data pipelines
  • Apply performance tuning techniques to Spark and Delta Lake jobs
  • Implement modern data architectures (medallion architecture)
  • Manage continuous, orchestrated data pipelines with Databricks Workflows
  • Integrate real-time data and machine learning pipelines
  • Automate and monitor the full lifecycle of a data pipeline
Target Audience
  • Experienced Data Engineers and Data Architects
  • Data Scientists who want to improve their data pipelines on Databricks
  • Professionals responsible for large-scale data migration and integration
  • Platform and DevOps Engineers who work with data pipelines and cloud automation
Prerequisites
  • Prior knowledge of Databricks Fundamentals
  • Experience with SQL, Python, and Spark
  • Familiarity with Delta Lake, ETL, and cloud computing (Azure, AWS, or GCP)
General Information

Methodology

  • Live course via Microsoft Teams
  • Taught by an instructor/consultant who is active in the industry and an experienced classroom teacher
  • Hands-on course with individual labs and real-world scenarios
  • One student per computer, with a preconfigured Databricks environment
  • Course handouts and practical exercises included
  • Methodology that combines theory, practice, and advanced troubleshooting
Materials
English/Portuguese, hands-on lab
Course Outline

Module 1: Databricks Advanced Overview

  1. Review of Databricks Lakehouse Architecture
  2. Advanced cluster configuration and optimization
  3. Databricks Runtime internals and job execution lifecycle
  4. Workspace organization, governance, and multi-environment management

Module 2: Advanced Delta Lake Concepts

  1. Delta Lake internals: transaction logs and data versioning
  2. Schema evolution, time travel, and optimization techniques
  3. Delta Live Tables and change data capture (CDC)
  4. Implementing SCD Type 1, 2, and 3 in Delta Lake
  5. Managing large tables and compaction strategies
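
To make item 4 concrete, here is a minimal plain-Python sketch of the SCD Type 2 idea: instead of overwriting a changed value, the active row is closed and a new version is appended. It is only an illustration and does not use Delta Lake; the function name `apply_scd2` and the fields `key`, `value`, `valid_from`, `valid_to`, and `is_current` are hypothetical names chosen for this example.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[object]]  # hypothetical row shape for this sketch


def apply_scd2(history: List[Row], key: str, value: str, as_of: str) -> List[Row]:
    """Return a new history list with an SCD Type 2 change applied."""
    result = [dict(row) for row in history]  # copy so the input is not mutated
    current = next(
        (r for r in result if r["key"] == key and r["is_current"]), None
    )
    if current is not None:
        if current["value"] == value:
            return result  # no change, so no new version is created
        # Close the active version instead of overwriting it.
        current["valid_to"] = as_of
        current["is_current"] = False
    # Append the new active version.
    result.append(
        {"key": key, "value": value, "valid_from": as_of,
         "valid_to": None, "is_current": True}
    )
    return result


history = apply_scd2([], "c1", "Lisbon", "2024-01-01")
history = apply_scd2(history, "c1", "Porto", "2024-06-01")
```

After the second call, `history` holds two rows for `c1`: the closed Lisbon version and the active Porto version. On Delta Lake the same pattern would typically be expressed as a merge against the target table rather than an in-memory list.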

Module 3: Data Pipeline Architecture

  1. Building advanced ETL/ELT pipelines
  2. Medallion Architecture (Bronze, Silver, Gold layers)
  3. Incremental data loading and upsert operations
  4. Handling late-arriving and duplicate data
  5. Designing resilient and idempotent pipelines
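
Items 3 to 5 are closely related: an upsert that keeps only the newest event per key ignores duplicates and older late-arriving events, and replaying the same batch leaves the result unchanged. The sketch below shows that idea in plain Python under simplifying assumptions; the record shape `(key, event_time, payload)` and the function `upsert` are hypothetical, and a dictionary stands in for the target table.

```python
from typing import Dict, Iterable, Tuple

Record = Tuple[str, int, str]  # (key, event_time, payload), hypothetical shape


def upsert(target: Dict[str, Record], batch: Iterable[Record]) -> Dict[str, Record]:
    """Merge a batch into target, keeping the latest event per key."""
    merged = dict(target)  # copy so the input is not mutated
    for record in batch:
        key, event_time, _ = record
        existing = merged.get(key)
        # Insert new keys; update only when the incoming event is newer,
        # so duplicates and older late-arriving events are ignored.
        if existing is None or event_time > existing[1]:
            merged[key] = record
    return merged


batch = [("a", 1, "v1"), ("a", 3, "v3"), ("a", 2, "late"), ("b", 1, "x"), ("b", 1, "x")]
once = upsert({}, batch)
twice = upsert(once, batch)  # replaying the same batch changes nothing
```

Because the update condition depends only on the data, `twice` equals `once`, which is the property that makes a rerun after a failure safe.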

Module 4: Performance Tuning and Optimization

  1. Spark job optimization: partitioning, caching, and broadcast joins
  2. Adaptive query execution (AQE)
  3. Delta Lake optimization commands and Z-Ordering
  4. Profiling, debugging, and performance monitoring
  5. Managing cluster costs and job scheduling

Module 5: Advanced Data Orchestration

  1. Databricks Workflows: building and managing complex pipelines
  2. Task dependencies and retries
  3. Integrating with Airflow, Azure Data Factory, and Prefect
  4. Parameterized pipelines and reusable templates
  5. Monitoring and alerting on workflow failures

Module 6: Streaming Data and Real-Time Processing

  1. Structured Streaming concepts and architecture
  2. Reading and writing streaming data with Delta Lake
  3. Stateful stream processing and watermarks
  4. Integrating with Kafka, Event Hubs, and Kinesis
  5. Handling data consistency and fault tolerance
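
The watermark concept in item 3 can be sketched without Spark: track the largest event time seen so far, subtract an allowed lateness, and drop events that fall behind that threshold. This is a conceptual simplification only. Structured Streaming updates its watermark per micro-batch and uses it to bound state, whereas this hypothetical `filter_with_watermark` function works event by event on an in-memory list.

```python
from typing import Iterable, List, Optional, Tuple

Event = Tuple[int, str]  # (event_time_seconds, payload), hypothetical shape


def filter_with_watermark(events: Iterable[Event], allowed_lateness: int) -> List[Event]:
    """Keep events that are not older than the running watermark."""
    max_event_time: Optional[int] = None
    accepted: List[Event] = []
    for event_time, payload in events:
        # Advance the maximum event time seen so far.
        if max_event_time is None or event_time > max_event_time:
            max_event_time = event_time
        watermark = max_event_time - allowed_lateness
        if event_time >= watermark:
            accepted.append((event_time, payload))
        # Otherwise the event arrived too late and is dropped.
    return accepted


events = [(100, "a"), (105, "b"), (98, "c"), (120, "d"), (104, "e")]
kept = filter_with_watermark(events, allowed_lateness=10)
```

Here the event at time 98 is kept because it is within 10 seconds of the maximum seen at that point (105), while the event at time 104 is dropped because the maximum has already advanced to 120.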

Module 7: Data Quality and Observability

  1. Implementing data validation and testing frameworks
  2. Integrating Databricks SQL for data quality dashboards
  3. Automating anomaly detection
  4. Logging and monitoring best practices
  5. Governance with Unity Catalog and fine-grained access control

Module 8: Advanced Machine Learning Integration

  1. Integrating feature pipelines with MLflow
  2. Managing data versioning for ML training
  3. Automating feature engineering pipelines
  4. Real-time inference using streaming data and Delta Live Tables

Module 9: Automation and CI/CD for Data Engineering

  1. Managing code versioning with Git integration
  2. Implementing Databricks Repos and notebook versioning
  3. CI/CD using GitHub Actions and Azure DevOps
  4. Automated deployment of jobs and clusters
  5. Infrastructure as code with Terraform and the Databricks provider

Module 10: Hands-on Labs

  1. Creating and optimizing an end-to-end ETL pipeline
  2. Implementing Medallion architecture with Delta Live Tables
  3. Configuring Databricks Workflows with dependencies
  4. Monitoring and tuning real-time data ingestion
  5. Automating job deployment with CI/CD

Module 11: Case Studies and Best Practices

  1. Enterprise use cases and lessons learned
  2. Cost optimization in Databricks environments
  3. Data governance and compliance considerations
  4. Building a scalable and maintainable data engineering platform
  5. Future trends in data engineering with Databricks
