Pentaho Data Integration Course
40 hours
Overview
Pentaho Data Integration (PDI) is part of the Pentaho Open Source BI Suite, a collection of software that supports business decision making.
Objectives
- Introduction to PDI
- Introduction to transformations
- Manipulating real-world data
- Filtering, searching, and performing other useful operations with data
- Transforming the row set
- Performing advanced operations with databases
- Creating basic task flows (jobs)
- Real-time data integration
- Performance tuning
- Parallelization, clustering, and partitioning
Prerequisites
Knowledge of ETL concepts
Materials
English
Course Content
MODULE I
- Getting Started with Pentaho Data Integration
- Pentaho Data Integration and Pentaho BI Suite
- Using PDI in real-world scenarios
- Loading data warehouses or data marts
- Integrating data
- Data Cleansing
- Migrating information
- Exporting data
- Integrating PDI along with other Pentaho tools
- Demo – Installing PDI
- Demo – starting and customizing Spoon
- Setting preferences in the Options window
- Storing transformations and jobs in a repository
- Demo – creating a hello world transformation
- Directing the Kettle engine with transformations
- Exploring the Spoon interface
- Designing a transformation
- Running and previewing the transformation
- Demo – installing MySQL on Windows
- Getting Started with Transformations
- Designing and previewing transformations
- Demo – creating a simple transformation and getting familiar with the design process
- Understanding the Kettle rowset
- Seeing the results in the Execution Results pane
- Demo – generating a range of dates and inspecting the data as it is being created
- Adding/modifying fields using various PDI steps
- Demo – avoiding errors while converting the estimated time from string to integer (sketched after this module's topic list)
- The error handling functionality
- Demo – configuring the error handling to see the description of the errors
- Personalizing the error handling
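The error-handling demos above work with an estimated time field that does not always convert cleanly from string to integer. As a rough standalone illustration of the same idea (the field name, sample values, and class name below are invented for this sketch and are not part of the course material), a plain Java program can divert rows that fail the conversion instead of aborting the run, which is what PDI's step error handling lets you configure:

```java
import java.util.ArrayList;
import java.util.List;

public class EstimatedTimeConversion {
    public static void main(String[] args) {
        // Hypothetical incoming values for an "estimated_time" field (strings).
        List<String> estimatedTimes = List.of("90", "120", "n/a", "45", "two hours");

        List<Integer> converted = new ArrayList<>();   // rows that convert cleanly
        List<String> errorRows = new ArrayList<>();    // rows diverted to the error stream

        for (String value : estimatedTimes) {
            try {
                converted.add(Integer.parseInt(value.trim()));
            } catch (NumberFormatException e) {
                // Keep the bad row plus a description instead of failing the whole run,
                // mirroring what the step error-handling configuration does in Spoon.
                errorRows.add(value + " -> " + e.getMessage());
            }
        }

        System.out.println("Converted rows: " + converted);
        System.out.println("Rows sent to error handling: " + errorRows);
    }
}
```

In Spoon the equivalent is defining error handling on the conversion step so that failing rows flow to a separate step, with the error description available as an extra field.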
MODULE II
- Manipulating Real-world Data
- Reading data from files
- Demo – reading results of football matches from files
- Reading several files at once
- Demo – reading all your files at once using a single Text file input step
- Regular expressions
- Troubleshooting reading files
- Sending data to files
- Demo – sending the results of matches to a plain file
- Getting system information
- Demo – reading and writing matches files with flexibility
- Running transformations from a terminal window
- Demo – running the matches transformation from a terminal window
- Demo – getting data from an XML file with information about countries
- PDI transformation files
- Getting data from XML files
- Kettle variables
- Filtering, Searching, and Performing Other Useful Operations with Data
- Sorting data
- Demo – sorting information about matches with the Sort rows step
- Calculations on groups of rows
- Demo – calculating football match statistics by grouping data
- The Group by step
- Numeric fields
- Filtering
- Demo – counting frequent words by filtering
- Demo – refining the counting task by filtering even more
- Filtering rows using the Filter rows step
- Looking up data
- Demo – finding out which language people speak
- The Stream lookup step
- Data cleaning
- Demo – fixing words before counting them
- Cleansing data with PDI
- Controlling the Flow of Data
- Splitting streams
- Demo – browsing new features of PDI by copying a dataset
- Copying rows
- Distributing rows
- Demo – assigning tasks by distributing
- Splitting the stream based on conditions
- Demo – assigning tasks by filtering priorities with the Filter rows step
- PDI steps for splitting the stream based on conditions
- Demo – assigning tasks by filtering priorities with the Switch/Case step
- Merging streams
- Demo – gathering progress and merging it all together
- PDI options for merging streams
- Demo – giving priority to Bouchard by using the Append Stream
- Treating invalid data by splitting and merging streams
- Demo – treating errors in the estimated time to avoid discarding rows
- Treating rows with invalid data
- Transforming Your Data by Coding
- Doing simple tasks with the JavaScript
- Demo – counting frequent words by coding in JavaScript
- Using the JavaScript language in PDI
- Using transformation predefined constants
- Testing the script using the Test script button
- Reading and parsing unstructured files with JavaScript
- Demo – changing a list of house descriptions with JavaScript
- Looping over the dataset rows
- Doing simple tasks with the Java Class
- Demo – counting frequent words by coding in Java (sketched after this module's topic list)
- Using the Java language in PDI
- Sending rows to the next step
- Data types equivalence
- Transforming the dataset with Java
- Demo – splitting the field to rows using Java
- Avoiding coding by using purpose-built steps
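The word-counting demos above are built inside PDI, first with the JavaScript step and then with the User Defined Java Class step. The standalone Java sketch below is not the code pasted into either step; it only illustrates the logic those demos implement (split text into words, drop very short words, count and rank occurrences), and the sample text is invented:

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

public class FrequentWords {
    public static void main(String[] args) {
        // Invented sample text standing in for the lines read from a file.
        String text = "the quick brown fox jumps over the lazy dog the fox";

        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.toLowerCase().split("\\W+")) {
            if (word.length() <= 3) {
                continue; // drop short words, as the filtering demos do
            }
            counts.merge(word, 1, Integer::sum);
        }

        // Print words sorted by descending frequency.
        counts.entrySet().stream()
              .sorted(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder()))
              .forEach(e -> System.out.println(e.getKey() + ": " + e.getValue()));
    }
}
```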
MODULE III
- Transforming the Row Set
- Converting rows to columns
- Demo – enhancing the file by converting rows to columns
- Converting row data to column data by using the Row Denormaliser
- Aggregating data with a Row Denormaliser
- Demo – aggregating football matches data with the Row Denormaliser
- Using the Row Denormaliser for aggregating data
- Normalizing data
- Demo – enhancing the matches file by normalizing the dataset
- Modifying the dataset with a Row Normalizer
- Summarizing the PDI steps that operate on sets of rows
- Generating a custom time dimension dataset using Kettle variables
- Demo – creating the time dimension dataset
- Getting variables
- Demo – parameterizing the start and end date of the time dimension dataset
- Working with Databases
- Introducing the Steel Wheels sample database
- Connecting to the Steel Wheels database
- Demo – creating a connection to the Steel Wheels database
- Connecting with Relational Database Management Systems
- Querying a database
- Demo – getting data about shipped orders
- Getting data from the database with the Table input step
- Generating a new dataset using the SELECT statement
- Making flexible queries using parameters
- Demo – getting orders in a range of dates using parameters (sketched after this module's topic list)
- Adding parameters to your queries
- Making flexible queries by using Kettle variables
- Demo – getting orders in a range of dates by using Kettle variables
- Using Kettle variables in your queries
- Sending data to a database
- Demo – loading a table with a list of manufacturers
- Inserting or updating data by using other PDI steps
- Demo – inserting new products or updating existing ones
- Demo – testing the update of existing products
- Eliminating data from a database
- Demo – deleting data about discontinued items
- Performing Advanced Operations with Databases
- Demo – populating the Jigsaw database
- Exploring the Jigsaw database model
- Doing simple lookups
- Demo – using a Database lookup step to create a list of products to buy
- Performing complex lookups
- Demo – using a Database join step to create a list of suggested products to buy
- Joining data from the database to the stream by using a Database join step
- Introducing dimensional modeling
- Loading dimensions with data
- Demo – loading a region dimension with a Combination lookup/update step
- Describing data with dimensions
- Loading Type I SCD with a Combination lookup/update
- Storing history of changes
- Demo – keeping a history of changes in products by using the Dimension lookup/update
- Keeping an entire history of data with a Type II SCD
- Loading Type II SCDs with the Dimension lookup/update step
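The flexible-query demos above pass parameters and Kettle variables into a Table input step. The plain JDBC sketch below shows an equivalent parameterized query outside PDI; the connection URL, credentials, and the ORDERS/ORDERNUMBER/ORDERDATE/STATUS names are assumptions about a local copy of the Steel Wheels sample data and may differ in your environment:

```java
import java.sql.Connection;
import java.sql.Date;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class OrdersInRange {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details; adjust to your own Steel Wheels copy.
        String url = "jdbc:mysql://localhost:3306/steelwheels";
        try (Connection con = DriverManager.getConnection(url, "user", "password")) {

            // Equivalent of a Table input query with two positional parameters,
            // as in "Demo – getting orders in a range of dates using parameters".
            String sql = "SELECT ORDERNUMBER, ORDERDATE, STATUS "
                       + "FROM ORDERS WHERE ORDERDATE BETWEEN ? AND ?";

            try (PreparedStatement ps = con.prepareStatement(sql)) {
                ps.setDate(1, Date.valueOf("2004-01-01")); // start of the date range
                ps.setDate(2, Date.valueOf("2004-03-31")); // end of the date range

                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getInt("ORDERNUMBER") + " | "
                                + rs.getDate("ORDERDATE") + " | "
                                + rs.getString("STATUS"));
                    }
                }
            }
        }
    }
}
```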
MODULE IV
- Creating Basic Task Flows (Jobs)
- Introducing PDI jobs
- Demo – creating a folder with a Kettle job
- Executing processes with PDI jobs
- Using Spoon to design and run jobs
- Demo – creating a simple job and getting familiar with the design process
- Changing the flow of execution on the basis of conditions
- Looking at the results in the Execution results window
- Running transformations from jobs
- Demo – generating a range of dates and inspecting how things are running
- Using the Transformation job entry
- Receiving arguments and parameters in a job
- Demo – generating a hello world file by using arguments and parameters
- Using named parameters in jobs
- Running jobs from a terminal window
- Demo – executing the hello world job from a terminal window
- Demo – calling the hello world transformation with fixed arguments and parameters
- Deciding between command-line arguments and named parameters
- Creating Advanced Transformations and Jobs
- Re-using part of your transformations
- Demo – calculating statistics with the use of a sub-transformation
- Creating a job as a process flow
- Demo – generating top average scores by copying and getting rows
- Using the copy/get rows mechanism to transfer data between transformations
- Demo – generating custom files by executing a transformation for every input row
- Executing for each row
- Enhancing your processes with the use of variables
- Demo – generating custom messages by setting a variable with the name of the examination file
- Case Study: Developing and Implementing a Sample Data Mart
- Exploring the sales data mart
- Deciding the level of granularity
- Loading the dimensions
- Demo – loading the dimensions for the sales data mart
- Extending the sales data mart model
- Loading a fact table with aggregated data
- Demo – loading the sales fact table by looking up dimensions
- Getting the data from the source with SQL queries
- Translating the business keys into surrogate keys (sketched after this module's topic list)
- Obtaining the surrogate key for Type I SCD
- Obtaining the surrogate key for Type II SCD
- Obtaining the surrogate key for the Junk dimension
- Obtaining the surrogate key for the Time dimension
- Getting facts and dimensions together
- Demo – loading the fact table using a range of dates obtained from the command line
- Demo – loading the SALES star schema model
- Automating the administrative tasks
- Demo – automating the loading of the sales data mart
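Several items above deal with translating business keys into the surrogate keys stored in the dimension tables before the fact table is loaded. In the course this is done with steps such as Database lookup and Dimension lookup/update; the standalone Java sketch below only illustrates the lookup idea, and the region keys and default value are invented:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SurrogateKeyLookup {
    public static void main(String[] args) {
        // Invented dimension content: business (natural) key -> surrogate key.
        Map<String, Integer> regionDimension = new HashMap<>();
        regionDimension.put("US-WEST", 1);
        regionDimension.put("US-EAST", 2);
        regionDimension.put("EMEA", 3);

        // Incoming fact rows carry only the business key.
        List<String> factBusinessKeys = List.of("EMEA", "US-EAST", "APAC");

        for (String businessKey : factBusinessKeys) {
            // Unknown keys fall back to a default surrogate key (here 0 for "unknown"),
            // mirroring the behaviour typically configured in a lookup step.
            int surrogateKey = regionDimension.getOrDefault(businessKey, 0);
            System.out.println(businessKey + " -> surrogate key " + surrogateKey);
        }
    }
}
```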
MODULE V
- Real-Time Data Integration
- Introduction to Real-Time ETL
- Real-Time Challenges
- Requirements
- Transformation Streaming
- A Practical Example of Transformation Streaming
- Debugging
- Third-Party Software and Real-Time Integration
- Java Message Service
- Creating a JMS Connection and Session (see the Java sketch at the end of this outline)
- Performance Tuning
- Transformation Performance: Finding the Weakest Link
- Finding Bottlenecks by Simplifying
- Finding Bottlenecks by Measuring
- Copying Rows of Data
- Improving Transformation Performance
- Using Lazy Conversion for Reading Text Files
- Single-File Parallel Reading, Multi-File Parallel Reading
- Configuring the NIO Block Size
- Changing Disks and Reading Text Files
- Improving Performance in Writing Text Files
- Using Lazy Conversion for Writing Text Files
- Parallel Files Writing
- Improving Database Performance
- Avoiding Dynamic SQL, Handling Roundtrips
- Handling Relational Databases
- Sorting Data, Sorting on the Database, Sorting in Parallel
- Reducing CPU Usage
- Optimizing the Use of JavaScript
- Launching Multiple Copies of a Step
- Selecting and Removing Values
- Managing Thread Priorities
- Adding Static Data to Rows of Data
- Limiting the Number of Step Copies
- Avoiding Excessive Logging
- Improving Job Performance
- Loops in Jobs
- Database Connection Pools
- Parallelization, Clustering, and Partitioning
- Multi-Threading
- Row Distribution, Row Merging, and Row Redistribution
- Data Pipelining
- Consequences of Multi-Threading
- Database Connections
- Order of Execution, Parallel Execution in a Job
- Using Carte as a Slave Server
- The Configuration File
- Defining Slave Servers
- Remote Execution, Monitoring Slave Servers
- Carte Security, Services
- Clustering Transformations
- Defining a Cluster Schema, Designing Clustered Transformations
- Execution and Monitoring
- Metadata Transformations
- Rules
- Data Pipelining, Partitioning
- Defining a Partitioning Schema
- Objectives of Partitioning, Implementing Partitioning
- Internal Variables
- Database Partitions
- Partitioning in a Clustered Transformation
- Bonus Topics
- Working with Repositories
- Pan/Kitchen – Launching Transformations and Jobs from the Command-Line
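
The real-time module includes creating a JMS connection and session. Below is a minimal consumer sketch using the standard javax.jms API with ActiveMQ as an example broker; the broker URL, queue name, and the choice of ActiveMQ itself are assumptions made only for illustration:

```java
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageConsumer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;

import org.apache.activemq.ActiveMQConnectionFactory;

public class JmsConsumerSketch {
    public static void main(String[] args) throws Exception {
        // Assumed broker URL and queue name; adjust to your environment.
        ConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");

        Connection connection = factory.createConnection();
        connection.start();

        // A non-transacted, auto-acknowledged session, as covered in
        // "Creating a JMS Connection and Session".
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Queue queue = session.createQueue("MATCH_RESULTS");
        MessageConsumer consumer = session.createConsumer(queue);

        // Block for up to five seconds waiting for one message.
        TextMessage message = (TextMessage) consumer.receive(5000);
        if (message != null) {
            System.out.println("Received: " + message.getText());
        }

        session.close();
        connection.close();
    }
}
```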