Cloudera Data Analyst Course

24 hours
Overview

In this Cloudera Data Analyst course, you will gain the skills needed to apply traditional data analysis and business intelligence skills to big data.

Your expert instructor will introduce the tools and techniques needed to access, manipulate, transform, and analyse complex datasets using SQL and familiar scripting languages.

Objectives

After successfully completing this Cloudera Data Analyst course, you will have learned:

  • The features that Pig, Hive, and Impala offer for data acquisition, storage, and analysis
  • The fundamentals of Apache Hadoop and of data ETL (extract, transform, load), ingestion, and processing with Hadoop
  • How Pig, Hive, and Impala improve productivity for typical analysis tasks
  • How to join diverse datasets to gain valuable business insight
  • How to run complex queries on datasets in real time
Prerequisites
  • SQL
  • Linux command line
  • At least one scripting language (for example, Bash, Perl, Python, or Ruby)
Materials
English/Portuguese/Hands-on Lab
Course Content

Introduction

Apache Hadoop Fundamentals

  1. The Motivation for Hadoop
  2. Hadoop Overview
  3. Data Storage: HDFS
  4. Distributed Data Processing: YARN, MapReduce, and Spark
  5. Data Processing and Analysis: Pig, Hive, and Impala
  6. Database Integration: Sqoop
  7. Other Hadoop Data Tools
  8. Exercise Scenarios

Introduction to Apache Pig

  1. What is Pig?
  2. Pig's Features
  3. Pig Use Cases
  4. Interacting with Pig

Basic Data Analysis with Apache Pig

  1. Pig Latin Syntax
  2. Loading Data
  3. Simple Data Types
  4. Field Definitions
  5. Data Output
  6. Viewing the Schema
  7. Filtering and Sorting Data
  8. Commonly Used Functions

Processing Complex Data with Apache Pig

  1. Storage Formats
  2. Complex/Nested Data Types
  3. Grouping
  4. Built-In Functions for Complex Data
  5. Iterating Grouped Data

Multi-Dataset Operations with Apache Pig

  1. Techniques for Combining Datasets
  2. Joining Datasets in Pig
  3. Set Operations
  4. Splitting Datasets

Apache Pig Troubleshooting and Optimisation

  1. Troubleshooting Pig
  2. Logging
  3. Using Hadoop's Web UI
  4. Data Sampling and Debugging
  5. Performance Overview
  6. Understanding the Execution Plan
  7. Tips for Improving the Performance of Pig Jobs

Introduction to Apache Hive and Impala

  1. What is Hive?
  2. What is Impala?
  3. Why Use Hive and Impala?
  4. Schema and Data Storage
  5. Comparing Hive and Impala to Traditional Databases
  6. Use Cases

Querying with Apache Hive and Impala

  1. Databases and Tables
  2. Basic Hive and Impala Query Language Syntax
  3. Data Types
  4. Using Hue to Execute Queries
  5. Using Beeline (Hive's Shell)
  6. Using the Impala Shell

Apache Hive and Impala Data Management

  1. Data Storage
  2. Creating Databases and Tables
  3. Loading Data
  4. Altering Databases and Tables
  5. Simplifying Queries with Views
  6. Storing Query Results

Data Storage and Performance

  1. Partitioning Tables
  2. Loading Data into Partitioned Tables
  3. When to Use Partitioning
  4. Choosing a File Format
  5. Using Avro and Parquet File Formats

Relational Data Analysis with Apache Hive and Impala

  1. Joining Datasets
  2. Common Built-In Functions
  3. Aggregation and Windowing

Complex Data with Apache Hive and Impala

  1. Complex Data with Hive
  2. Complex Data with Impala

Analysing Text with Apache Hive and Impala

  1. Using Regular Expressions with Hive and Impala
  2. Processing Text Data with SerDes in Hive
  3. Sentiment Analysis and n-grams in Hive

Apache Hive Optimisation

  1. Understanding Query Performance
  2. Bucketing
  3. Indexing Data
  4. Hive on Spark

Apache Impala Optimisation

  1. How Impala Executes Queries
  2. Improving Impala Performance

Extending Apache Hive and Impala

  1. Custom SerDes and File Formats in Hive
  2. Data Transformation with Custom Scripts in Hive
  3. User-Defined Functions
  4. Parameterised Queries

Choosing the Best Tool for the Job

  1. Comparing Pig, Hive, Impala, and Relational Databases
