Course Syllabus
Become a better problem solver
Most online courses teach specific tools, but knowing 20+ tools doesn't mean you can solve real data problems. We believe that problem-solving skills are the most essential asset for a successful career in data engineering. Therefore, we've designed a unique approach to teaching learners how to solve tough problems and become independent thinkers.
- Build a strong foundation in DE concepts and know the right questions to ask
- Become a Google/Stack Overflow pro. Know the most relevant keywords to use for problem solving
- Develop new skills in how to approach new projects and problems
- Learn modern tools and platforms and know how to put them to work together
- Become a better team player through group assignments, projects, and client projects
Learn Docker and Containerization
While Docker and Kubernetes are containerization and orchestration tools that DevOps and infrastructure engineers typically own, data engineers also need a good understanding of them. Whether it's setting up Airflow servers with Docker Compose or running Spark jobs on Kubernetes, Docker is an essential tool that data engineers need to be comfortable with.
- Learn how Docker containers work
- Know how to deploy Flask apps using Docker and Docker Compose
- Understand the basics of Kubernetes and container orchestration
- Build hands-on experience deploying Airflow and submitting Airflow jobs using Docker on Kubernetes
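The Airflow-on-Compose setup mentioned above can be sketched as a minimal, hypothetical `docker-compose.yml`. This is an illustration only: the official Compose file published by the Airflow project is considerably more elaborate (Redis, Celery workers, an init job), and the image tags and credentials here are placeholders.

```yaml
# Hypothetical minimal sketch -- not the official Airflow Compose file.
services:
  postgres:                      # metadata database for Airflow
    image: postgres:15
    environment:
      POSTGRES_USER: airflow
      POSTGRES_PASSWORD: airflow # placeholder credential
      POSTGRES_DB: airflow
  airflow:
    image: apache/airflow:2.9.0  # placeholder tag
    depends_on: [postgres]
    environment:
      AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
    command: standalone          # webserver + scheduler in one container
    ports: ["8080:8080"]
    volumes:
      - ./dags:/opt/airflow/dags # mount your DAG files into the container
```

With this sketch, `docker compose up` would bring up a local Airflow UI on port 8080 reading DAGs from `./dags`.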
Become experienced in AWS and GCP
Cloud computing plays an increasingly important role today. Applications and services typically run on servers, which combine compute (CPU), memory (RAM), and storage (HDD or SSD). Instead of owning and provisioning servers in an on-prem data center, you can rent compute power in the cloud: cloud computing is the on-demand delivery of compute power, database storage, applications, and other IT resources over the internet. Cloud providers (Amazon Web Services (AWS), Google Cloud Platform (GCP), Microsoft Azure) offer rapid access to flexible, low-cost IT resources.
- Work with several data engineering services on AWS and GCP
- Gain hands-on experience setting up an end-to-end pipeline across multiple services on AWS and GCP
- Learn how to deploy pipelines in the cloud
Build ETL/ELT data pipelines using Apache Airflow
Apache Airflow is an open-source workflow management platform for programmatically authoring, scheduling, and monitoring workflows. It is one of the most robust platforms used by data engineers for orchestrating workflows or pipelines. You can easily visualize your data pipelines' dependencies, progress, logs, and code, trigger tasks, and check their success status.
- Become an advanced Apache Airflow user
- Gain hands-on experience automating your data pipeline using Apache Airflow
- Develop your ability to maintain and troubleshoot an Airflow pipeline
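The core orchestration idea behind a DAG run, executing each task only after all of its upstream dependencies have finished, can be sketched in plain Python. This is a toy illustration, not the Airflow API: in real Airflow you declare a `DAG` object and chain operators with `>>`.

```python
# Toy illustration of dependency-ordered task execution, the idea behind
# an Airflow DAG run. Not the Airflow API.
from graphlib import TopologicalSorter

def extract():   return "raw rows"
def transform(): return "clean rows"
def load():      return "loaded"

# task -> set of upstream tasks it depends on (extract >> transform >> load)
dag = {transform: {extract}, load: {transform}}

def run(dag):
    """Execute tasks in an order that respects every dependency."""
    results = {}
    for task in TopologicalSorter(dag).static_order():
        results[task.__name__] = task()
    return results

results = run(dag)
print(results)  # extract runs first, load runs last
```

A real scheduler adds retries, parallelism, and persistence of task state, but the dependency-resolution step is the same.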
Data Ingestion: NiFi, Kafka, Kinesis
Data ingestion is the first step of a big data processing system and is essential to the ETL pipeline. The ingestion layer is the backbone of any analytics architecture, since downstream reporting and analytics systems rely on consistent and accessible data. For a data engineer, Apache NiFi, Apache Kafka, and AWS Kinesis are important and commonly used ingestion tools.
- Learn different Data Ingestion tools
- Gain the ability to process data from different sources using these ingestion tools
- Hands-on experience connecting data ingestion tools to downstream tasks
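The producer/consumer pattern underlying tools like Kafka and Kinesis can be sketched with the standard library alone: producers append serialized records to a buffered queue, and a consumer reads them independently and hands them to a downstream sink. This is a simplified sketch; real systems add partitioning, offsets, and durability.

```python
# Minimal sketch of the ingestion pattern behind Kafka/Kinesis-style
# systems, using only the Python standard library.
import json
import queue
import threading

def produce(q, records):
    for r in records:
        q.put(json.dumps(r))   # serialize, as a Kafka producer would
    q.put(None)                # sentinel: end of stream

def consume(q, sink):
    while True:
        msg = q.get()
        if msg is None:
            break
        sink.append(json.loads(msg))  # deliver to the downstream task

events = [{"id": i, "temp": 20 + i} for i in range(3)]
buffer, sink = queue.Queue(maxsize=100), []
consumer = threading.Thread(target=consume, args=(buffer, sink))
consumer.start()
produce(buffer, events)
consumer.join()
print(sink)  # every produced record reaches the sink, in order
```

Because producer and consumer run on separate threads sharing only the queue, either side can be scaled or replaced without touching the other, which is the decoupling these ingestion tools provide at cluster scale.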
NoSQL and Big Data
What makes data engineering both exciting and challenging in recent years is the shift from the traditional data warehouse to the data lake and even the newer data lakehouse approach. In the data warehouse part of the program, we cover the fundamental concepts of data modelling, the star schema, and the snowflake schema. Students will learn how to create ER diagrams and implement their own data warehouse. Modern tools and technologies such as BigQuery, Redshift, and Snowflake are covered in hands-on labs so students experience how they work in practice. We will also cover the scenarios where each approach is a better fit for the problem.
- Learn the basics of NoSQL databases
- Understand the key features of NoSQL databases
- Learn how to work with NoSQL databases
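The star schema described above can be sketched with the standard library's `sqlite3` module: a fact table holding measures and foreign keys, surrounded by dimension tables. The table and column names here are illustrative; warehouses like BigQuery, Redshift, and Snowflake apply the same modelling idea at much larger scale.

```python
# Tiny star schema: one fact table joined to dimension tables.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, day TEXT);
    CREATE TABLE fact_sales (            -- fact table: measures + FKs
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id    INTEGER REFERENCES dim_date(date_id),
        amount     REAL
    );
    INSERT INTO dim_product VALUES (1, 'widget'), (2, 'gadget');
    INSERT INTO dim_date    VALUES (1, '2024-01-01');
    INSERT INTO fact_sales  VALUES (1, 1, 9.5), (2, 1, 20.0), (1, 1, 3.0);
""")

# Typical star-schema query: join the fact to a dimension and aggregate.
rows = con.execute("""
    SELECT p.name, SUM(f.amount)
    FROM fact_sales f JOIN dim_product p USING (product_id)
    GROUP BY p.name ORDER BY p.name
""").fetchall()
print(rows)  # [('gadget', 20.0), ('widget', 12.5)]
```

The snowflake schema differs only in that dimensions are further normalized into sub-dimensions, trading simpler joins for less redundancy.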