Overview
This is the Data Analytics With Hadoop And Spark course. Students will learn how Spark fits into the Big Data ecosystem and how to use Spark for data analysis.
This course is taught in Python, in a Jupyter environment.
Objectives
During the Data Analytics With Hadoop And Spark course, students will learn:
- The Spark ecosystem
- The Spark shell
- Spark data structures (RDD/DataFrame/Dataset)
- Spark SQL
- Modern data formats and Spark
- Spark, Hadoop and Hive
Target Audience
- Data analysts, business analysts
Prerequisites
- Analyst experience (familiarity with SQL, scripting, etc.)
Materials
English / Portuguese / Hands-on labs
Course Outline
Spark Introduction
- Big Data, Hadoop, Spark
- Spark concepts and architecture
- Spark components overview
- Labs: Installing and running Spark
First Look at Spark
- Spark shell
- Spark web UIs
- Analyzing a dataset (part 1)
- Labs: Spark shell exploration
Spark Data Structures
- Partitions
- Distributed execution
- Operations: transformations and actions
- Labs: Unstructured data analytics using RDDs
Caching
- Caching overview
- Various caching mechanisms available in Spark
- In-memory file systems
- Caching use cases and best practices
- Labs: Benchmarking caching performance
DataFrames / Datasets
- Introduction to DataFrames
- Loading structured data (JSON, CSV) using DataFrames
- Using schemas
- Specifying a schema for DataFrames
- Labs: DataFrames, Datasets, schemas
Spark SQL
- Spark SQL concepts and overview
- Defining tables and importing datasets
- Querying data using SQL
- Handling various storage formats: JSON / Parquet / ORC
- Labs: querying structured data using SQL; evaluating data formats
Spark and Hadoop
- Hadoop Primer: HDFS / YARN
- Hadoop + Spark architecture
- Running Spark on Hadoop YARN
- Processing HDFS files using Spark
- Spark & Hive
Workshops
- These are group workshops
- Attendees will work on solving real-world data analysis problems using Spark