Scalable Machine Learning Course

24 hours
Overview

The Scalable Machine Learning course is designed to give students exposure to Scalable Machine Learning (SML). It focuses on using the Hadoop and Spark frameworks to implement SML algorithms in the Scala and Python programming languages.

The course begins with an introduction to SML and explains why developers use Spark for it. It then covers data acquisition, data pre-processing for modeling, and working with iterative algorithms, and concludes with model evaluation, optimization and deployment.

Objectives

After completing this course, you will be able to:

  • Describe the role of Spark in machine learning
  • Apply machine learning to large datasets
  • Demonstrate experience in acquiring, processing, analyzing and modeling data using Hadoop and Spark
  • Evaluate several common types of data, for example social media data, CSV, XML and JSON, for pre-processing and/or building machine learning models with Spark
  • Train, test and deploy machine learning models
Materials
Portuguese/English + Exercises + Hands-on Lab
Course Outline

Introduction to SML 

  1. What is SML?
  2. Why is it required?
  3. Key platforms for performing SML
  4. SML Project End-to-End Pipeline
  5. Spark Introduction
  6. Why Spark for SML?
  7. Databricks Platform Demo
  8. Approaches for scaling scikit-learn code
  9. Hands-on Exercise(s): Experiencing the first notebook

Why Spark for SML? 

  1. Problems with Traditional Machine Learning Frameworks
  2. Machine Learning at Scale – Various options
  3. Iterative Algorithms
  4. Why Does Spark Perform Well for Iterative Machine Learning Algorithms?
  5. Hands-on Exercise(s)

SML on Enterprise Platform 

  1. Quick Recap/Introduction to Hadoop
  2. Logical View of Cloudera Distribution
  3. Big Data Analytics Pipelines
  4. Components in Cloudera Distribution for performing SML
  5. Hands-on Exercise(s)

Data Acquisition at Scale 

  1. Acquiring Structured content from Relational Databases
  2. Acquiring Semi-structured content from Log Files
  3. Acquiring Unstructured content from other key sources like Web
  4. Tools for Performing Data acquisition at Scale
  5. Sqoop, Flume and Kafka: Introduction, Use Cases and Architectures
  6. Hands-on Exercise(s)
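As a concrete case of acquiring semi-structured content from log files, the sketch below turns raw log lines into structured records. It is a minimal, single-machine illustration that assumes the Apache Common Log Format; the regular expression, field names and sample lines are assumptions made for the example, not course material. In Spark the same `parse_line` function could be applied to each line with `map`, followed by a `filter` that drops malformed lines.

```python
import re
from typing import Optional

# Assumed input format: Apache Common Log Format, e.g.
# 127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_line(line: str) -> Optional[dict]:
    """Turn one semi-structured log line into a structured record, or None if malformed."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # "-" in the size field means no response body was sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record

lines = [
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.5 - - [10/Oct/2000:13:56:01 -0700] "POST /login HTTP/1.1" 302 -',
    "this line is not a log entry",
]
# Parse every line, then keep only the ones that matched the pattern.
records = [r for r in map(parse_line, lines) if r is not None]
print(len(records))  # 2
```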

Data Pre-Processing for Modeling 

  1. Using the Spark Shell
  2. Resilient Distributed Datasets (RDDs)
  3. Functional Programming with Spark
  4. RDD Operations
  5. Key-Value Pair RDDs
  6. MapReduce and Pair RDD Operations
  7. Building and Running a Spark Application
  8. Performing Data Validation
  9. Data De-Duplication
  10. Detecting Outliers
  11. Hands-on Exercise(s)
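The validation, de-duplication and outlier-detection steps listed above can be sketched in plain Python on a small in-memory list. This is not Spark code: the sample records are hypothetical, and the interquartile-range (IQR) rule with a 1.5 multiplier is one common choice among several. On Spark the analogous operations would be a `filter`, `distinct()` on an RDD (or `dropDuplicates()` on a DataFrame), and a filter against quantiles such as those returned by `DataFrame.approxQuantile`.

```python
import statistics

# Hypothetical sensor readings as (sensor_id, value) pairs.
rows = [
    ("s1", 10.0), ("s1", 10.0), ("s2", 11.5), ("s3", None),
    ("s4", 9.8), ("s5", 10.4), ("s6", 10.1), ("s7", 95.0),
]

# 1. Validation: keep only rows whose value is present and numeric.
valid = [(k, v) for k, v in rows if isinstance(v, (int, float))]

# 2. De-duplication: drop exact duplicate rows, keeping first-seen order.
deduped = list(dict.fromkeys(valid))

# 3. Outlier detection with the IQR rule: flag values outside
#    [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR].
values = [v for _, v in deduped]
q1, _, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [(k, v) for k, v in deduped if not low <= v <= high]
print(outliers)  # [('s7', 95.0)]
```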

Working with Iterative Algorithms 

  1. Dealing with RDD Infinite Lineages
  2. Caching Overview
  3. Distributed Persistence
  4. Checkpointing of an Iterative Machine Learning Algorithm
  5. Hands-on Exercise(s)

Spark SQL 

  1. Introduction
  2. Dataframe API
  3. Performing ad-hoc query analysis using Spark SQL
  4. Hands-on Exercise(s)
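To illustrate ad-hoc query analysis without a cluster, the sketch below uses Python's built-in `sqlite3` module as a stand-in for Spark SQL; the `sales` table and its rows are hypothetical. In Spark one would typically register a DataFrame with `createOrReplaceTempView("sales")` and run the query through `spark.sql(query)`; a simple aggregate like this one should need little or no change, although SQL dialects differ in details.

```python
import sqlite3

# In-memory SQLite database standing in for a Spark SQL temporary view.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("north", "a", 120.0), ("north", "b", 80.0), ("south", "a", 50.0),
     ("south", "a", 70.0), ("west", "b", 30.0)],
)

# An ad-hoc aggregate query: total and row count per region,
# keeping only regions whose total exceeds 50.
query = """
    SELECT region, SUM(amount) AS total, COUNT(*) AS n
    FROM sales
    GROUP BY region
    HAVING SUM(amount) > 50
    ORDER BY total DESC
"""
result = conn.execute(query).fetchall()
print(result)  # [('north', 200.0, 2), ('south', 120.0, 2)]
```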

Spark Machine Learning Using MLlib

  1. Spark ML vs. Spark MLlib
  2. Data types and key terms
  3. Feature Extraction
  4. Linear Regression using Spark MLlib
  5. Hands-on Exercise(s)
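Linear regression also shows why iterative algorithms benefit from Spark: gradient-based training scans the full dataset on every step. The sketch below fits a one-feature model by batch gradient descent on mean squared error in plain Python; the learning rate, epoch count and toy data are assumptions, and Spark's own linear regression relies on its own solvers rather than on this code.

```python
def fit_linear_regression(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y ≈ w*x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Each epoch is a full pass over the data: the access pattern that
        # makes caching the training set worthwhile in Spark.
        grad_w = sum((w * x + b - y) * x for x, y in zip(xs, ys)) * 2 / n
        grad_b = sum((w * x + b - y) for x, y in zip(xs, ys)) * 2 / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1
w, b = fit_linear_regression(xs, ys)
print(round(w, 3), round(b, 3))
```

With this toy data the loop should recover a slope close to 2 and an intercept close to 1.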

Spark Machine Learning Using ML 

  1. Spark ML Overview
  2. Transformers and Estimators
  3. Pipelines
  4. Implementing Decision Trees
  5. K-Means Clustering using Spark ML
  6. Hands-on Exercise(s)
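K-means is itself an iterative algorithm. The following is a minimal plain-Python sketch of Lloyd's algorithm (alternating assignment and centroid-update steps) on hypothetical 2-D points. It samples random input points as initial centroids, whereas Spark ML's `KMeans` estimator defaults to the parallel k-means|| initialization, so treat it as an illustration of the idea rather than of Spark's implementation.

```python
import math
import random

def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k distinct input points
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its points,
        # keeping the old centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids

points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centroids = sorted(kmeans(points, k=2))
print(centroids)  # [(0.5, 0.5), (10.5, 10.5)]
```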

Decision Trees and Random Forest 

  1. Types – Classification and Regression trees
  2. Gini Index, Entropy and Information Gain
  3. Building Decision Trees
  4. Pruning the trees
  5. Prediction using Trees
  6. Ensemble Models
  7. Bagging and Boosting
  8. Advantages of using Random Forest
  9. Working with Random Forest
  10. Ensemble Learning
  11. How ensemble learning works
  12. Building models using Bagging
  13. Random Forest algorithm
  14. Random Forest model building
  15. Fine tuning hyper-parameters
  16. Hands-on Exercise(s)
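The split criteria named in this module (Gini index, entropy and information gain) fit in a few lines. This sketch computes them for a hypothetical binary split; the labels are made up for illustration. Spark's tree learners expose a similar choice through their `impurity` parameter (`"gini"` or `"entropy"` for classification).

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits: -sum of p * log2(p) over classes."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, left, right, impurity=entropy):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = len(left) / n * impurity(left) + len(right) / n * impurity(right)
    return impurity(parent) - weighted

parent = ["yes"] * 5 + ["no"] * 5
left, right = ["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4  # a candidate split
print(round(entropy(parent), 3))                   # 1.0
print(round(information_gain(parent, left, right), 3))  # 0.278
```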

Model Evaluation, Optimization and Deployment 

  1. Model Evaluation
  2. Optimizing a Model
  3. Deploying Model
  4. Best Practices
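For the model-evaluation topic, here is a minimal sketch of common metrics: precision, recall and F1 for a classifier, and root mean squared error (RMSE) for a regressor. The labels and predictions are hypothetical. Spark ML offers evaluator classes for the same purpose (for example `MulticlassClassificationEvaluator` and `RegressionEvaluator`), and their exact metric definitions should be checked in the Spark documentation.

```python
import math

def classification_metrics(y_true, y_pred, positive=1):
    """Precision, recall and F1 for one positive class, from confusion-matrix counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def rmse(y_true, y_pred):
    """Root mean squared error for regression predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(classification_metrics(y_true, y_pred))  # (0.75, 0.75, 0.75)
print(rmse([3.0, 5.0], [2.0, 6.0]))            # 1.0
```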