Course: Analyzing Big Data Using Hadoop Hive Spark and HBase

32 hours

Overview

This course begins with an overview of Big Data and its role in the enterprise. It then introduces the Hadoop Distributed File System (HDFS), which is the foundation for many of the other Big Data technologies covered in the course. Hadoop MapReduce is presented next, and simple MapReduce applications are demonstrated using the Streaming and Java APIs.
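To illustrate the streaming style of MapReduce, here is a minimal word-count sketch in Python. Hadoop Streaming exchanges records with scripts as tab-separated key/value lines on standard input and output. In this sketch the two stages are plain functions (mapper, reducer, and run_local are hypothetical names), and the shuffle-and-sort phase is simulated with sorted() so that it runs without a cluster.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per word, as a streaming mapper would."""
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"


def reducer(sorted_records):
    """Sum the counts for each word; the input must already be sorted by key."""
    parsed = (record.split("\t", 1) for record in sorted_records)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_local(lines):
    """Simulate map -> shuffle/sort -> reduce on a single machine."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    sample = ["big data big cluster", "data node"]
    for record in run_local(sample):
        print(record)
```

In an actual job, the mapper and reducer would typically be separate scripts that read sys.stdin, passed to the Hadoop Streaming jar through its -mapper and -reducer options.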

At this point, the stage is set to introduce Apache Spark on YARN as a flexible, high-performance platform for cluster computing. The Spark architecture and APIs are presented with an emphasis on mining HDFS data with MapReduce.

The focus of the course then shifts to using Hadoop as a data warehouse platform. The first technology examined from this perspective is Apache Hive. Hive lets clients access HDFS files as if they were relational tables, using a SQL-like query language called Hive Query Language (HQL). The course gives an overview of HQL and shows how table metadata can be accessed by other applications, such as Spark.
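The idea of querying delimited files as if they were tables can be sketched without a Hive installation. The example below is only a stand-in: it uses Python's built-in csv and sqlite3 modules, not Hive or HDFS, and the table and column names (page_views, user_id, url) are invented for illustration. It loads tab-separated text into a table and runs a GROUP BY aggregate of the kind HQL also supports.

```python
import csv
import io
import sqlite3

# Hypothetical tab-separated data, standing in for a text file stored in HDFS.
RAW_TEXT = "u1\t/home\nu2\t/home\nu1\t/cart\n"


def load_as_table(conn, text):
    """Load tab-delimited rows into a table (a stand-in for a Hive table over files)."""
    conn.execute("CREATE TABLE page_views (user_id TEXT, url TEXT)")
    rows = csv.reader(io.StringIO(text), delimiter="\t")
    conn.executemany("INSERT INTO page_views VALUES (?, ?)", rows)


def views_per_url(conn):
    """Aggregate with plain SQL; this is SQLite syntax, used here in place of HQL."""
    cursor = conn.execute(
        "SELECT url, COUNT(*) FROM page_views GROUP BY url ORDER BY url"
    )
    return cursor.fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    load_as_table(connection, RAW_TEXT)
    print(views_per_url(connection))
```

One difference worth keeping in mind: this sketch copies the rows into the database, whereas Hive applies a table schema to files that stay where they are in HDFS.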

Objectives

After successfully completing this course, students will be able to:

Describe the architecture of Hadoop
Manage files and directories in HDFS
Explain the components of a MapReduce application on Hadoop
Implement and run Apache Spark applications
Use the Hive Query Language (HQL) to analyze HDFS data
Create mutable tables on HDFS with HBase
Process near real-time streaming data with Apache Storm

Materials

English + Exercises + Hands-on Lab

Course Outline

Overview of Big Data

What Is Big Data?
Big Data Use Cases
The Rise of the Data Center and Cloud Computing
MapReduce and Batch Data Processing
MapReduce and Near Real-Time (Stream) Processing
NoSQL Solutions for Persisting Big Data
The Big Data Ecosystem

The Hadoop Distributed File System (HDFS)

Overview of HDFS
Launching HDFS in Pseudo-Distributed Mode
Core HDFS Services
Installing and Configuring HDFS
HDFS Commands
HDFS Safe Mode
Checkpointing HDFS
Federated and High Availability HDFS
Running a Fully-Distributed HDFS Cluster with Docker
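
As a small worked example for the HDFS overview topics, the sketch below estimates how many blocks a file occupies and how much raw disk space its replicas consume. It assumes the common defaults of a 128 MB block size (dfs.blocksize) and a replication factor of 3 (dfs.replication); a given cluster may be configured differently, and the function names are hypothetical.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # assumed dfs.blocksize default, in bytes
REPLICATION = 3                 # assumed dfs.replication default


def block_count(file_size, block_size=BLOCK_SIZE):
    """Number of blocks needed for a file (ceiling division, 0 for an empty file)."""
    return -(-file_size // block_size)


def raw_storage(file_size, replication=REPLICATION):
    """Bytes consumed across the cluster; a partial last block uses only its real size."""
    return file_size * replication


if __name__ == "__main__":
    size = 300 * 1024 * 1024  # a 300 MB file
    print(block_count(size), "blocks,", raw_storage(size), "bytes of raw storage")
```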

MapReduce with Hadoop

MapReduce from the Linux Command Line
Scaling MapReduce on a Cluster
Introducing Apache Hadoop
Overview of YARN
Launching YARN in Pseudo-Distributed Mode
Demonstration of the Hadoop Streaming API
Demonstration of MapReduce with Java

Introduction to Apache Spark

Why Spark?
Spark Architecture
Spark Drivers and Executors
Spark on YARN
Spark and the Hive Metastore
Structured APIs, DataFrames, and Datasets
The Core API and Resilient Distributed Datasets (RDDs)
Overview of Functional Programming
MapReduce with Python
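
The functional style behind both MapReduce and Spark's core API can be sketched in plain Python. The example below uses only the standard library (no Spark is involved); the 'station,reading' record format and the function names are assumptions made for illustration.

```python
from functools import reduce


def parse(line):
    """Map step: turn 'station,reading' text into a (key, value) pair."""
    station, reading = line.split(",")
    return station.strip(), float(reading)


def merge_max(acc, pair):
    """Reduce step: keep the largest reading seen so far for each station."""
    station, reading = pair
    current = acc.get(station)
    updated = reading if current is None else max(current, reading)
    # Return a new dict instead of mutating, in keeping with the functional style.
    return {**acc, station: updated}


def max_reading_per_station(lines):
    """Compose filter -> map -> reduce over an iterable of text lines."""
    non_empty = filter(lambda line: line.strip(), lines)
    pairs = map(parse, non_empty)
    return reduce(merge_max, pairs, {})


if __name__ == "__main__":
    data = ["a, 21.5", "b, 19.0", "", "a, 23.1", "b, 18.2"]
    print(max_reading_per_station(data))
```

Because each step is a pure function over its input, the same pipeline shape can be split across many machines, which is the property distributed frameworks rely on.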

Apache Hive

Hive as a Data Warehouse
Hive Architecture
Understanding the Hive Metastore and HCatalog
Interacting with Hive using the Beeline Interface
Creating Hive Tables
Loading Text Data Files into Hive
Exploring the Hive Query Language
Partitions and Buckets
Built-in and Aggregation Functions
Invoking MapReduce Scripts from Hive
Common File Formats for Big Data Processing
Creating Avro and Parquet Files with Hive
Creating Hive Tables from Pig
Accessing Hive Tables with the Spark SQL Shell

Persisting Data with Apache HBase

Features and Use Cases
HBase Architecture
The Data Model
Command Line Shell
Schema Creation
Considerations for Row Key Design
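
Two commonly discussed row key considerations can be sketched in Python. HBase keeps rows sorted by the bytes of the row key, so keys that only ever increase (such as raw timestamps) tend to concentrate writes in one region. The sketch below prepends a salt bucket to spread writes and stores a reversed timestamp so that the newest row for an entity sorts first. The key layout, bucket count, and function names are assumptions for illustration, not a recommended schema.

```python
import zlib

NUM_BUCKETS = 8        # assumed number of salt buckets
MAX_TS = 10**13 - 1    # assumed upper bound for millisecond timestamps


def salt_prefix(entity_id, buckets=NUM_BUCKETS):
    """Derive a stable bucket number from the entity id to spread writes."""
    return zlib.crc32(entity_id.encode("utf-8")) % buckets


def make_row_key(entity_id, timestamp_ms):
    """Build 'salt|entity|reversed-timestamp' as bytes."""
    reversed_ts = MAX_TS - timestamp_ms
    key = f"{salt_prefix(entity_id):02d}|{entity_id}|{reversed_ts:013d}"
    return key.encode("utf-8")


if __name__ == "__main__":
    older = make_row_key("sensor-1", 1_700_000_000_000)
    newer = make_row_key("sensor-1", 1_700_000_060_000)
    # Byte-wise sorting places the newer reading first for the same entity.
    print(sorted([older, newer])[0] == newer)
```

The trade-off is that a scan across all entities has to visit every salt bucket, so salting suits write-heavy tables more than tables dominated by wide range scans.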

Apache Storm

Processing Real-Time Streaming Data
Storm Architecture: Nimbus, Supervisors, and ZooKeeper
Application Design: Topologies, Spouts, and Bolts
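
The spout-and-bolt design can be pictured with plain Python generators. This is not the Storm API: the names sentence_spout, split_bolt, count_bolt, and run_topology are hypothetical, and everything runs in one process, whereas Storm runs these components continuously and in parallel across worker nodes.

```python
from collections import Counter


def sentence_spout(sentences):
    """Spout: the source of the stream, emitting one tuple at a time."""
    for sentence in sentences:
        yield (sentence,)


def split_bolt(stream):
    """Bolt: consume sentence tuples and emit one tuple per word."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word,)


def count_bolt(stream):
    """Bolt: keep running counts and emit the updated (word, count) tuple."""
    counts = Counter()
    for (word,) in stream:
        counts[word] += 1
        yield (word, counts[word])


def run_topology(sentences):
    """Wire spout -> split bolt -> count bolt, like the edges of a topology."""
    return list(count_bolt(split_bolt(sentence_spout(sentences))))


if __name__ == "__main__":
    for update in run_topology(["storm is fast", "storm scales"]):
        print(update)
```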

Apache Pig

Declarative vs. Procedural
Role of Pig
Setting Up Pig
Loading and Working with Data
Writing a Pig Script
Executing Pig in Local and Hadoop Mode
Filtering Results
Storing, Loading, Dumping

Getting the Most Out of Pig

Relations, Tuples, Fields
Pig Data Types
Tuples, Bags, and Maps
Flatten on Bags and Tuples
Join and Union
Regular Expressions
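
The effect of flattening a bag can be sketched in Python. In Pig, applying FLATTEN to a bag produces one output tuple for each tuple inside the bag, with the other fields repeated. The sketch below mimics that behavior on ordinary Python tuples and lists; it is not Pig Latin, and the function name and sample data are invented for illustration.

```python
def flatten_bag(record):
    """Expand (key, bag) into one flat tuple per tuple in the bag."""
    key, bag = record
    return [(key,) + inner for inner in bag]


def flatten_all(records):
    """Apply the flattening to every record of a relation."""
    result = []
    for record in records:
        result.extend(flatten_bag(record))
    return result


if __name__ == "__main__":
    # Each record has a user name and a bag (list) of (page, visits) tuples.
    relation = [
        ("alice", [("home", 3), ("cart", 1)]),
        ("bob", [("home", 2)]),
    ]
    for row in flatten_all(relation):
        print(row)
```

A record whose bag is empty produces no rows in this sketch.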


