Big Data Integration & ETL Course



24 hours
Overview

In the era of big data and data-driven AI, many companies are realizing the importance of establishing data engineering best practices. As a result, demand for data engineers has been growing rapidly, and there is currently a large gap between supply and demand in the talent market. One reason for this imbalance is that modern data engineering requires new tools and skills, and traditional learning environments such as universities, colleges, and bootcamps have not kept pace with the trends. Another reason is that data engineering is hard to teach: curricula need to be extremely hands-on and require very experienced instructors who work in the field to teach in the most practical way.

Portuguese/English + Exercises + Hands-on Lab
Course Content

Become a better problem solver

Most online courses teach specific tools, but knowing 20+ tools doesn't mean you can solve real data problems. We strongly believe that problem-solving skills are the most essential for a successful career in data engineering, so we've designed a unique approach that teaches learners how to solve tough problems and become independent thinkers.

  1. Build a strong foundation in DE concepts and know the right questions to ask
  2. Become a Google/Stackoverflow pro. Know the most relevant keywords to use for problem solving
  3. Develop new skills in how to approach new projects and problems
  4. Learn modern tools and platforms and know how to put them to work together
  5. Become a better team player through group assignments, projects, and client projects

Learn Docker and Containerization

While Docker and Kubernetes are containerization and orchestration tools that DevOps and infrastructure engineers work on, data engineers also need a good understanding of them. Whether it's setting up Airflow servers with Docker Compose or running Spark jobs on Kubernetes, Docker is an essential tool that data engineers need to be comfortable with.

  1. Learn how Docker containers work
  2. Know how to deploy Flask apps using Docker and Docker Compose
  3. Understand the basics of Kubernetes and container orchestration
  4. Build hands-on experience deploying Airflow and submitting Airflow jobs using Docker on Kubernetes
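The Flask-and-Compose item above can be sketched as a minimal docker-compose file. This is an illustrative sketch, not a production setup; the service name, port, and entry point are assumptions, and it presumes a Dockerfile in the same directory that installs Flask:

```yaml
# Minimal sketch: one Flask web service under Docker Compose.
# All names and ports here are illustrative assumptions.
services:
  web:
    build: .                 # assumes a Dockerfile in this directory
    ports:
      - "5000:5000"          # expose Flask's default port
    environment:
      - FLASK_APP=app.py     # hypothetical entry point
```

Running `docker compose up` builds the image and starts the container; the same pattern extends to multi-service stacks such as Airflow's official Compose setup.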

Become experienced in AWS and GCP

Cloud computing plays an increasingly important role today. Applications and services typically run on servers, which comprise CPU (processing), RAM (memory), and storage (HDD/SSD). Instead of owning and provisioning servers in an on-prem data center, you can rent compute power in the cloud. Cloud computing is therefore the on-demand delivery of computing power, database storage, applications, and other IT resources over the internet through cloud services platforms. Cloud providers (Amazon Web Services (AWS), Google Cloud Platform (GCP), Microsoft Azure) provide rapid access to flexible, low-cost IT resources.

  1. Work with several data engineering services on AWS and GCP
  2. Hands-on experience with setting up an end-to-end pipeline across multiple services on AWS and GCP
  3. Learn how to deploy pipelines in the cloud

Build ETL/ELT data pipelines using Apache Airflow

Apache Airflow is an open-source workflow management platform for programmatically authoring, scheduling, and monitoring workflows. It is one of the most robust platforms Data Engineers use for orchestrating workflows or pipelines. You can easily visualize your data pipelines' dependencies, progress, logs, code, and success status, and trigger tasks from the UI.

  1. Become an advanced Apache Airflow user
  2. Hands-on experience with automating your data pipeline using Apache Airflow
  3. Develop your ability to maintain and troubleshoot an Airflow pipeline
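Airflow models a pipeline as a DAG of tasks and runs a task only after all of its upstream tasks have finished. As a conceptual sketch (plain Python with the standard library only, not the Airflow API), Kahn's algorithm captures that execution order; the task names below are hypothetical:

```python
from collections import deque

def run_order(tasks, deps):
    """Return a valid execution order for a DAG.
    tasks: iterable of task names; deps: list of (upstream, downstream) edges."""
    downstream = {t: [] for t in tasks}
    indegree = {t: 0 for t in tasks}
    for up, down in deps:
        downstream[up].append(down)
        indegree[down] += 1
    # Tasks with no upstream dependencies are ready to run immediately
    ready = deque(t for t in tasks if indegree[t] == 0)
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for d in downstream[t]:
            indegree[d] -= 1
            if indegree[d] == 0:   # all upstream tasks of d have finished
                ready.append(d)
    if len(order) != len(indegree):
        raise ValueError("cycle detected: not a valid DAG")
    return order

# extract >> transform >> load, with a quality check that also follows extract
print(run_order(
    ["extract", "transform", "quality_check", "load"],
    [("extract", "transform"), ("extract", "quality_check"),
     ("transform", "load")],
))  # ['extract', 'transform', 'quality_check', 'load']
```

In real Airflow the same dependencies would be declared with the `>>` operator between task objects; the scheduler then performs this ordering (plus retries, backfills, and parallelism) for you.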

Data Ingestion: NiFi, Kafka, Kinesis

Data ingestion is the first step of a big data processing system and is essential to the data ETL pipeline. The data ingestion layer is the backbone of any analytics architecture: downstream reporting and analytics systems rely on consistent and accessible data. For data engineers, Apache NiFi, Apache Kafka, and AWS Kinesis are important and commonly used ingestion tools.

  1. Learn different Data Ingestion tools
  2. Ability to process data from different sources using the Data Ingestion tools
  3. Hands-on experience connecting Data Ingestion tools with downstream tasks
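The decoupling these tools provide can be illustrated with a standard-library sketch (this is the producer/consumer pattern, not the Kafka or Kinesis client APIs): a producer pushes records into a bounded buffer while a consumer drains it independently, which is what Kafka and Kinesis implement durably and at scale. Record values and the transformation are illustrative:

```python
import queue
import threading

def producer(buf, records):
    """Source side: push raw records into the buffer."""
    for rec in records:
        buf.put(rec)              # Kafka analogue: producer.send(topic, rec)
    buf.put(None)                 # sentinel marking end of stream

def consumer(buf, sink):
    """Downstream side: read records and apply a transformation."""
    while True:
        rec = buf.get()
        if rec is None:           # stream finished
            break
        sink.append(rec.upper())  # stand-in for a downstream task

buf = queue.Queue(maxsize=100)    # bounded buffer applies backpressure
sink = []
t = threading.Thread(target=consumer, args=(buf, sink))
t.start()
producer(buf, ["click", "view", "purchase"])
t.join()
print(sink)  # ['CLICK', 'VIEW', 'PURCHASE']
```

The bounded queue is the key design choice: when the consumer falls behind, `put` blocks and the source slows down instead of data being dropped, which is the same backpressure idea ingestion systems rely on.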

NoSQL and Big Data

What makes data engineering both exciting and challenging in recent years is the shift from the traditional data warehouse to the data lake and even the newer data lakehouse approach. In the data warehouse part of the program, we cover the fundamental concepts of data modelling, the star schema, and the snowflake schema. Students learn how to create ER diagrams and implement their own data warehouse. Modern tools and technologies such as BigQuery, Redshift, and Snowflake are covered in hands-on labs so students experience how they work in practice. We also cover the scenarios where each approach is a better fit for the problem.

  1. Learn the basics of NoSQL databases
  2. Understand the key features of NoSQL databases
  3. Learn how to work with NoSQL databases
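The star schema mentioned above can be sketched with Python's built-in sqlite3 module (table and column names are hypothetical): a fact table of sales referencing a product dimension, plus the kind of aggregate query warehouses are built for:

```python
import sqlite3

# Minimal star-schema sketch: one fact table pointing at one dimension table.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales (
    sale_id    INTEGER PRIMARY KEY,
    product_id INTEGER REFERENCES dim_product(product_id),
    amount     REAL
);
INSERT INTO dim_product VALUES (1, 'widget'), (2, 'gadget');
INSERT INTO fact_sales VALUES (10, 1, 9.5), (11, 1, 3.0), (12, 2, 7.25);
""")
# Typical warehouse query: aggregate the facts, label them via the dimension
rows = con.execute("""
    SELECT p.name, SUM(f.amount)
    FROM fact_sales f JOIN dim_product p USING (product_id)
    GROUP BY p.name ORDER BY p.name
""").fetchall()
print(rows)  # [('gadget', 7.25), ('widget', 12.5)]
```

A snowflake schema would further normalize `dim_product` into sub-dimensions (e.g. a separate category table); the same query shape then needs extra joins, which is the trade-off the two schemas represent.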

Related Courses

Curso Data Lake Inteligente Fundamentos para Analistas

16 hours

Curso Apache Spark and Scala

24 hours of hands-on training

Curso BigQuery Google Foudation

16 hours

Curso Bamboo Integração contínua

24 hours

Curso Python 6 Projetos Python com Programacao Foundation to Advanced

60 hours

Curso Big Data Business Intelligence for Criminal Intelligence Analysis


Curso Cloudera for Apache Kafka Overview

32 hours

Curso Cloudera Data Engineering Developing Applications with Apache Spark

32 hours