Data Analytics With Hadoop And Spark Course


24 hours

Overview

In the Data Analytics With Hadoop And Spark course, students learn how Spark fits into the Big Data ecosystem and how to use Spark for data analysis.
The course is taught in Python, in a Jupyter environment.

Objectives

During the Data Analytics With Hadoop And Spark course, students will learn:

  • The Spark ecosystem
  • The Spark shell
  • Spark data structures (RDD/DataFrame/Dataset)
  • Spark SQL
  • Modern data formats and Spark
  • Spark, Hadoop, and Hive

Target Audience

  • Data analysts, business analysts

Prerequisites

  • Analyst experience (familiarity with SQL, scripting, etc.)

Materials

English/Portuguese/Hands-on lab

Course Content

Spark Introduction

  1. Big Data, Hadoop, Spark
  2. Spark concepts and architecture
  3. Spark components overview
  4. Labs: Installing and running Spark

First Look at Spark

  1. Spark shell
  2. Spark web UIs
  3. Analyzing dataset – part 1
  4. Labs: Spark shell exploration

Spark Data structures

  1. Partitions
  2. Distributed execution
  3. Operations: transformations and actions
  4. Labs: Unstructured data analytics using RDDs
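
As a rough illustration of items 1 to 3 above (not taken from the course material), the difference between transformations and actions can be sketched in plain Python, without a Spark installation. The `ToyRDD` class, the `parallelize` helper, and the `square` function below are hypothetical stand-ins, not Spark's API: transformations such as `map` and `filter` only record a step, and actions such as `collect` and `count` actually run the recorded steps, one partition at a time.

```python
class ToyRDD:
    """Hypothetical toy model of a partitioned, lazily evaluated dataset.

    This is NOT Spark's API; it only mimics the idea behind it.
    """

    def __init__(self, partitions, steps=None):
        self.partitions = partitions      # the data, split into chunks
        self.steps = list(steps or [])    # recorded transformations, not yet run

    # Transformations: return a new ToyRDD and only record the step.
    def map(self, fn):
        return ToyRDD(self.partitions, self.steps + [("map", fn)])

    def filter(self, pred):
        return ToyRDD(self.partitions, self.steps + [("filter", pred)])

    def _run_partition(self, part):
        # Apply every recorded step to a single partition.
        items = iter(part)
        for kind, fn in self.steps:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)

    # Actions: trigger execution and return a plain Python value.
    def collect(self):
        return [x for part in self.partitions for x in self._run_partition(part)]

    def count(self):
        return sum(len(self._run_partition(part)) for part in self.partitions)


def parallelize(data, num_partitions):
    """Split data into contiguous chunks (at most num_partitions of them)."""
    data = list(data)
    size = max(1, -(-len(data) // num_partitions))  # ceiling division
    return ToyRDD([data[i:i + size] for i in range(0, len(data), size)])


calls = []  # records every invocation of square, to make laziness visible


def square(x):
    calls.append(x)
    return x * x


rdd = parallelize(range(1, 11), num_partitions=3)
evens = rdd.map(square).filter(lambda v: v % 2 == 0)  # nothing runs yet
calls_before_action = len(calls)
result = evens.collect()                              # the action runs the steps
print(calls_before_action, result)  # prints: 0 [4, 16, 36, 64, 100]
```

In this toy model every action recomputes the recorded steps from scratch, which is the kind of repeated work that caching is meant to avoid.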

Caching

  1. Caching overview
  2. Various caching mechanisms available in Spark
  3. In-memory file systems
  4. Caching use cases and best practices
  5. Labs: Benchmark of caching performance

DataFrames / Datasets

  1. DataFrames intro
  2. Loading structured data (JSON, CSV) using DataFrames
  3. Using schema
  4. Specifying schema for DataFrames
  5. Labs: DataFrames, Datasets, schema

Spark SQL

  1. Spark SQL concepts and overview
  2. Defining tables and importing datasets
  3. Querying data using SQL
  4. Handling various storage formats: JSON / Parquet / ORC
  5. Labs: querying structured data using SQL; evaluating data formats

Spark and Hadoop

  1. Hadoop Primer: HDFS / YARN
  2. Hadoop + Spark architecture
  3. Running Spark on Hadoop YARN
  4. Processing HDFS files using Spark
  5. Spark & Hive

Workshops

  1. These are group workshops
  2. Attendees will work on solving real-world data analysis problems using Spark
