Pentaho Data Integration Fundamentals Course

40 hours
Overview

Pentaho Data Integration Fundamentals course. A 24-hour introduction to data transformation using Pentaho Data Integration (PDI), from getting started to developing an ETL framework that ingests files of varying structure.

Objectives

Transformations

  • Input and output steps
  • Field transformations
  • Joins and lookups
  • Set transformations
  • JSON and XML inputs
  • Variables and portability
  • Logging and performance
  • Metadata injection

Jobs

  • Basic orchestration
  • File and database management
  • Iteration and looping in jobs
Target Audience

New users tasked with using Pentaho and/or existing users who want to formalize their knowledge: business analysts, data analysts, and ETL developers.

Materials
English/Portuguese/Hands-on Lab
Course Content

Introduction

  1. Installing and starting PDI. The user interface

Part I – Transformations

Input and output steps

  1. Exploration of the various ways to read data into, and write data out of, PDI: CSV files, Excel files, SQL queries, etc. Installing JDBC drivers
  2. Lab 1: CSV Input, MySQL output

Field transformations

  1. Overview of various transformation steps: Calculator, string manipulation, adding counters, value mapping, handling nulls, JavaScript and regular expressions.

Joins and lookups

  1. Merging two or more data streams and combining the data: managing slowly changing dimensions, in-memory and database lookups, querying HTTP services/APIs, merge joins, row diff, etc.
  2. Lab 2: Joins and lookups (enriching data stream)
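
To illustrate the in-memory lookup idea outside of PDI, here is a minimal Python sketch. It is not taken from the course materials: the sample rows, the field names and the `enrich` helper are all hypothetical. It builds an index from a small reference stream and uses it to add a field to every row of the main stream, falling back to a default when no match exists.

```python
# Hypothetical sample data, invented for illustration only.
orders = [
    {"order_id": 1, "customer_id": "C1", "amount": 120.0},
    {"order_id": 2, "customer_id": "C2", "amount": 75.5},
    {"order_id": 3, "customer_id": "C9", "amount": 10.0},
]
customers = [
    {"customer_id": "C1", "country": "BR"},
    {"customer_id": "C2", "country": "PT"},
]


def enrich(rows, lookup_rows, key, field, default=None):
    """Add `field` from `lookup_rows` to each row, matching on `key`."""
    # Load the (small) reference stream into memory once.
    index = {r[key]: r[field] for r in lookup_rows}
    # Every main-stream row is kept; unmatched rows receive the default.
    return [{**row, field: index.get(row[key], default)} for row in rows]


enriched = enrich(orders, customers, "customer_id", "country", default="unknown")
for row in enriched:
    print(row)
```

Unlike an inner join, this keeps every row of the main stream, which is usually what is wanted when enriching data.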

Set transformations

  1. Operations on groups of rows: sorting, grouping, splitting fields into rows, normalising/denormalising data, cloning, appending.
  2. Lab 3: Grouping data
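
As a rough illustration of grouping rows, the following Python sketch sorts a small set of rows and sums one field per group. The data, the field names and the `group_sum` helper are assumptions made for this example, not part of the course.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows, invented for illustration only.
rows = [
    {"region": "south", "sales": 10},
    {"region": "north", "sales": 5},
    {"region": "south", "sales": 7},
]


def group_sum(rows, key, value):
    """Sum `value` for each distinct `key`."""
    # groupby only merges adjacent rows, so sort on the group key first.
    ordered = sorted(rows, key=itemgetter(key))
    return {
        k: sum(r[value] for r in group)
        for k, group in groupby(ordered, key=itemgetter(key))
    }


totals = group_sum(rows, "region", "sales")
print(totals)
```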

JSON and XML inputs

  1. Reading XML data via XPath and using the fast StAX parser. JSON parsing via JSONPath
  2. Lab 4: JSON and XML inputs (XPath, StAX parser, JSONPath)
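
The difference between path-based and streaming XML reading can be sketched in Python with the standard library. This is only an analogy under stated assumptions: `ElementTree` supports a limited XPath subset, and `iterparse` stands in for a StAX-style streaming parser. The sample document and the `stream_totals` helper are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for illustration only.
XML = """<orders>
  <order id="1"><customer>Ana</customer><total>120.0</total></order>
  <order id="2"><customer>Rui</customer><total>75.5</total></order>
</orders>"""

# Path-based reading: load the whole document, then select nodes by path.
root = ET.fromstring(XML)
customers = [e.text for e in root.findall("./order/customer")]


def stream_totals(source):
    """Streaming reading: handle each <order> as soon as it is complete."""
    totals = []
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "order":
            totals.append(float(elem.findtext("total")))
            elem.clear()  # drop the element's children to keep memory flat
    return totals


totals = stream_totals(io.BytesIO(XML.encode("utf-8")))
print(customers, totals)
```

The path-based version is simpler to write, while the streaming version avoids holding a large file in memory.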

Variables and portability

  1. Setting and getting variables; global variables, runtime variables, parameters; portable connections, file paths, and other best practices
  2. Lab 5: Portable transformations
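
PDI references variables with the `${NAME}` syntax. The idea behind portable paths and connections can be sketched in Python with `string.Template`, which happens to use the same placeholder form. The variable names (`INPUT_DIR`, `DB_HOST`), the values and the `resolve` helper are hypothetical.

```python
from string import Template


def resolve(text, variables):
    """Replace ${NAME} placeholders; raises KeyError if a variable is missing."""
    return Template(text).substitute(variables)


# Hypothetical per-environment settings, invented for illustration only.
dev = {"INPUT_DIR": "/data/dev/in", "DB_HOST": "localhost"}
prod = {"INPUT_DIR": "/data/prod/in", "DB_HOST": "db.internal"}

path_template = "${INPUT_DIR}/customers.csv"

resolved_dev = resolve(path_template, dev)
resolved_prod = resolve(path_template, prod)
print(resolved_dev)
print(resolved_prod)
```

The same definition then runs unchanged in each environment; only the variable values differ.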

Logging and Performance

  1. Reading PDI logs; analysing performance and runtime metrics; examples of fast and slow steps; identifying bottlenecks; running step copies in parallel

Metadata injection

  1. Use cases for metadata injection. Modifying metadata at runtime. Advanced metadata injection options.
  2. Lab 6: Flexible CSV loading
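
The general idea of flexible loading, where the file layout is supplied at runtime instead of being hard-coded, can be sketched in Python as follows. This is an analogy only, not how PDI implements metadata injection; the layout dictionaries, the `CONVERTERS` table and the `load_csv` helper are hypothetical.

```python
import csv
import io

# Maps a type name from the layout metadata to a converter function.
CONVERTERS = {"int": int, "float": float, "str": str}


def load_csv(text, layout):
    """Parse CSV text using a layout (delimiter, field names, types) given at runtime."""
    reader = csv.reader(io.StringIO(text), delimiter=layout["delimiter"])
    next(reader)  # skip the header row; names come from the layout instead
    return [
        {name: CONVERTERS[kind](value)
         for (name, kind), value in zip(layout["fields"], record)}
        for record in reader
    ]


# Two hypothetical files with different structures, handled by the same code.
layout_a = {"delimiter": ",", "fields": [("id", "int"), ("name", "str")]}
layout_b = {"delimiter": ";", "fields": [("sku", "str"), ("price", "float")]}

rows_a = load_csv("id,name\n1,Ana\n2,Rui\n", layout_a)
rows_b = load_csv("sku;price\nX1;9.5\n", layout_b)
print(rows_a, rows_b)
```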

Part II – Jobs

Basic orchestration

  1. Usage of PDI jobs to orchestrate tasks; overview of job entries: sub-jobs, sub-transformations, SQL, shell scripts, conditions, error handling, getting/putting files, etc. Wrapper jobs.

File and DB management

  1. Using lock files; downloading and archiving files; checking database connections; conditionally creating/dropping/modifying database structures; error handling; recording execution results
  2. Lab 7: Building a simple job

Iteration and looping in jobs

  1. Running a job/transformation for each file in a folder; handling different file types in one go; iterating over API results; looping until a condition is met; running .sh or .bat scripts depending on the OS
  2. Lab 8: Developing a powerful ETL framework
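
The pattern of running one routine per file in a folder, and handling different file types in one pass, can be sketched in Python. This is a minimal illustration, not the framework built in the lab; the handler functions, the `HANDLERS` table and the file names are hypothetical.

```python
import tempfile
from pathlib import Path


def handle_csv(path):
    return f"csv:{path.name}"


def handle_json(path):
    return f"json:{path.name}"


# Dispatch table: file extension -> routine to run for that file type.
HANDLERS = {".csv": handle_csv, ".json": handle_json}


def process_folder(folder):
    """Run the matching handler for each file; record unsupported entries as skipped."""
    results = []
    for path in sorted(Path(folder).iterdir()):
        handler = HANDLERS.get(path.suffix.lower())
        if handler is None or not path.is_file():
            results.append(f"skipped:{path.name}")
            continue
        results.append(handler(path))
    return results


# Demonstration on a temporary folder holding three empty files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.csv", "b.json", "c.txt"):
        (Path(tmp) / name).write_text("")
    results = process_folder(tmp)

print(results)
```

Keeping the extension-to-handler mapping in one table means a new file type can be supported by adding a single entry.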

Related Courses

Data Analysis with Power BI Course - 20778B

24 hours

Excel Data Analysis with Power BI Course - 20779B

16 hours

Talend Data Integration Foundation Course

16 hours

Talend Data Integration Advanced Course

16 hours

Advanced Data Analysis and Dashboard Reporting Course

28 hours