Data Analytics With Hadoop And Spark Course

24 hours
Overview

The Data Analytics With Hadoop And Spark course teaches students how Spark fits into the Big Data ecosystem and how to use Spark for data analysis.
The course is taught in Python in a Jupyter environment.
 

Objectives

During the Data Analytics With Hadoop And Spark course, students will learn:

  • The Spark ecosystem
  • Spark shell
  • Spark data structures (RDD/DataFrame/Dataset)
  • Spark SQL
  • Modern data formats and Spark
  • Spark, Hadoop, and Hive
Target Audience
  • Data analysts, business analysts
Prerequisites
  • Analyst experience (familiarity with SQL, scripting, etc.)

 

Materials
English/Portuguese/Hands-on Lab
Course Content

Spark Introduction

  1. Big Data, Hadoop, Spark
  2. Spark concepts and architecture
  3. Spark components overview
  4. Labs: Installing and running Spark

First Look at Spark

  1. Spark shell
  2. Spark web UIs
  3. Analyzing dataset – part 1
  4. Labs: Spark shell exploration

Spark Data Structures

  1. Partitions
  2. Distributed execution
  3. Operations: transformations and actions
  4. Labs: Unstructured data analytics using RDDs

Caching

  1. Caching overview
  2. Various caching mechanisms available in Spark
  3. In-memory file systems
  4. Caching use cases and best practices
  5. Labs: Benchmark of caching performance

DataFrames / Datasets

  1. DataFrames intro
  2. Loading structured data (JSON, CSV) using DataFrames
  3. Using schema
  4. Specifying schema for DataFrames
  5. Labs: DataFrames, Datasets, schema

Spark SQL

  1. Spark SQL concepts and overview
  2. Defining tables and importing datasets
  3. Querying data using SQL
  4. Handling various storage formats: JSON / Parquet / ORC
  5. Labs: querying structured data using SQL; evaluating data formats

Spark and Hadoop

  1. Hadoop Primer: HDFS / YARN
  2. Hadoop + Spark architecture
  3. Running Spark on Hadoop YARN
  4. Processing HDFS files using Spark
  5. Spark & Hive

Workshops

  1. These are group workshops
  2. Attendees will work on solving real-world data analysis problems using Spark