Apache Spark with Scala for Big Data Course

24 hours
Overview

In this Apache Spark with Scala for Big Data course, you will learn to apply Spark best practices, develop solutions that run on the Apache Spark platform, and take advantage of Spark's efficient memory usage and powerful programming model. Learn to supercharge your data with Apache Spark, a big data platform well suited to the iterative algorithms required by graph analytics and machine learning.

Objectives

After completing this Apache Spark with Scala for Big Data course, you will be able to:

  • Develop applications with Spark
  • Work with the libraries for SQL, Streaming, and Machine Learning
  • Map real-world problems to parallel algorithms
  • Build business applications that integrate with Spark
Prerequisites
  • At least 6 months of professional programming experience in Java or C#
Materials
Portuguese/English + hands-on lab
Course Outline

Introduction to Spark

  1. Defining Big Data and Big Computation
  2. What is Spark?
  3. What are the benefits of Spark?

Scaling-out applications

  1. Identifying the performance limitations of a modern CPU
  2. Scaling traditional parallel processing models

Designing parallel algorithms

  1. Fostering parallelism through functional programming
  2. Mapping real-world problems to effective parallel algorithms
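As a small illustration of the style practiced in this module, the classic word count maps a real-world problem onto purely functional transformations that Spark can run in parallel. A minimal sketch, assuming a spark-shell session (where `spark` already exists) and a hypothetical input file:

```scala
// Assumes the spark-shell, where a SparkSession named `spark` is already available
val counts = spark.sparkContext
  .textFile("data/input.txt")          // hypothetical input path
  .flatMap(_.split("\\s+"))            // pure functions: safe to evaluate on any partition
  .map(word => (word, 1))
  .reduceByKey(_ + _)                  // combined per partition, then shuffled and merged

counts.take(10).foreach(println)
```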

Parallelizing data structures

  1. Partitioning data across the cluster using Resilient Distributed Datasets (RDD) and DataFrames
  2. Apportioning task execution across multiple nodes
  3. Running applications with the Spark execution model
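The sketch below shows the same data held as an RDD with an explicit partition count and as a DataFrame, the two structures this module partitions across the cluster (the partition count of 8 is only an example value):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("Partitioning").getOrCreate()
import spark.implicits._

// RDD: request 8 partitions so tasks can be apportioned across 8 cores or executors
val numbers = spark.sparkContext.parallelize(1 to 1000000, numSlices = 8)
println(s"RDD partitions: ${numbers.getNumPartitions}")

// DataFrame: the same data with a named column, planned by the Spark execution model
val df = numbers.toDF("n")
println(s"DataFrame partitions: ${df.rdd.getNumPartitions}")

// A simple aggregation executed in parallel across those partitions
df.agg(Map("n" -> "sum")).show()
```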

The anatomy of a Spark cluster

  1. Creating resilient and fault-tolerant clusters
  2. Achieving scalable distributed storage

Managing the cluster

  1. Monitoring and administering Spark applications
  2. Visualizing execution plans and results
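One habit taught here is inspecting the plan Spark will execute before turning to the Spark UI (served on port 4040 by default). A minimal sketch with made-up in-memory data:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("PlanInspection").getOrCreate()
import spark.implicits._

val sales = Seq(("north", 10.0), ("south", 25.0), ("north", 5.0)).toDF("region", "amount")

// Prints the parsed, analyzed, optimized, and physical plans; the same stages
// and tasks can then be monitored in the Spark UI while the job runs
sales.groupBy("region").sum("amount").explain(extended = true)
```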

Selecting the development environment

  1. Performing exploratory programming via the Spark shell
  2. Building stand-alone Spark applications
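In the spark-shell a session is created for you; a stand-alone application builds its own and is packaged and submitted to the cluster. A minimal skeleton (the sbt dependency, version, and jar path are assumptions; match them to your cluster):

```scala
// build.sbt (illustrative): libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.5.1" % "provided"

import org.apache.spark.sql.SparkSession

object StandaloneApp {
  def main(args: Array[String]): Unit = {
    // Created automatically in the shell; built explicitly in a stand-alone application
    val spark = SparkSession.builder().appName("StandaloneApp").getOrCreate()
    spark.range(100).show(5)
    spark.stop()
  }
}

// Submitted with, e.g.: spark-submit --class StandaloneApp target/scala-2.12/standalone-app.jar
```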

Working with the Spark APIs

  1. Programming with Scala and other supported languages
  2. Building applications with the core APIs
  3. Enriching applications with the bundled libraries
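A short sketch of the core typed API used from Scala, built on a hypothetical Order record; DataFrames, Datasets, and the bundled libraries all run on the same engine:

```scala
import org.apache.spark.sql.SparkSession

case class Order(id: Long, customer: String, total: Double)   // hypothetical record type

val spark = SparkSession.builder().appName("CoreApis").getOrCreate()
import spark.implicits._

// The typed Dataset API keeps compile-time checks while sharing the DataFrame engine
val orders = Seq(Order(1, "acme", 120.0), Order(2, "globex", 75.5)).toDS()
orders.filter(_.total > 100.0).show()
```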

Querying structured data

  1. Processing queries with DataFrames and embedded SQL
  2. Extending SQL with User-Defined Functions (UDFs)
  3. Exploiting Parquet and JSON formatted data sets
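A minimal sketch of this module's building blocks, assuming a hypothetical JSON file with name and age fields: the data is registered as a SQL view, queried with an extra User-Defined Function, and written back as Parquet.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("StructuredQueries").getOrCreate()

// Read a JSON data set (path and schema are assumptions) and expose it to SQL
val people = spark.read.json("data/people.json")
people.createOrReplaceTempView("people")

// A User-Defined Function extends the SQL dialect with custom logic
spark.udf.register("initials", (name: String) => name.split("\\s+").map(_.head).mkString)

val adults = spark.sql("SELECT name, initials(name) AS initials, age FROM people WHERE age > 21")
adults.show()

// Columnar Parquet output keeps later queries efficient
adults.write.mode("overwrite").parquet("data/adults.parquet")
```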

Integrating with external systems

  1. Connecting to databases with JDBC
  2. Executing Hive queries in external applications
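A sketch of reading a relational table over JDBC; the URL, table name, and credentials are assumptions, and the matching JDBC driver must be on the classpath:

```scala
import java.util.Properties
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("JdbcIntegration").getOrCreate()

val props = new Properties()
props.setProperty("user", "report_user")                            // assumed account
props.setProperty("password", sys.env.getOrElse("DB_PASSWORD", "")) // avoid hard-coding secrets

// Loads the table as a DataFrame that can be joined with any other Spark data
val invoices = spark.read.jdbc("jdbc:postgresql://db-host:5432/sales", "public.invoices", props)
invoices.groupBy("status").count().show()
```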

What is streaming?

  1. Implementing sliding window operations
  2. Determining state from continuous data
  3. Processing simultaneous streams
  4. Improving performance and reliability
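A minimal sliding-window sketch using Structured Streaming; the socket source, host, port, and window sizes are assumptions for a classroom exercise (the source can be fed with `nc -lk 9999`):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, current_timestamp, window}

val spark = SparkSession.builder().appName("SlidingWindows").getOrCreate()

val lines = spark.readStream
  .format("socket")
  .option("host", "localhost")
  .option("port", "9999")
  .load()

// Count events per 10-minute window, sliding every 5 minutes, keyed on processing time
val windowed = lines
  .withColumn("ts", current_timestamp())
  .groupBy(window(col("ts"), "10 minutes", "5 minutes"))
  .count()

windowed.writeStream
  .outputMode("complete")
  .format("console")
  .start()
  .awaitTermination()
```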

Streaming data sources

  1. Streaming from built-in sources (e.g., log files, Twitter, sockets, Kinesis, Kafka)
  2. Developing custom receivers
  3. Processing with the streaming API and Spark SQL
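A sketch of consuming one of the built-in sources, Kafka, and handing the result to Spark SQL; the broker address and topic are assumptions, and the spark-sql-kafka connector package must be added when submitting the job:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("KafkaSource").getOrCreate()

val events = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")   // assumed broker address
  .option("subscribe", "clickstream")                 // assumed topic name
  .load()

// Kafka delivers binary key/value columns; cast them before applying SQL logic
val messages = events.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)", "timestamp")

messages.writeStream
  .format("console")
  .outputMode("append")
  .start()
  .awaitTermination()
```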

Classifying observations

  1. Predicting outcomes with supervised learning
  2. Building a decision tree classifier
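A supervised-learning sketch with spark.ml's decision tree; the CSV path, feature columns, and label column are assumptions about the lab data set:

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.DecisionTreeClassifier
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("DecisionTree").getOrCreate()

// Hypothetical labelled observations: numeric features plus a `label` column
val data = spark.read.option("header", "true").option("inferSchema", "true")
  .csv("data/observations.csv")

val assembler = new VectorAssembler()
  .setInputCols(Array("age", "income", "visits"))   // assumed feature columns
  .setOutputCol("features")

val tree = new DecisionTreeClassifier()
  .setLabelCol("label")
  .setFeaturesCol("features")
  .setMaxDepth(5)

val Array(train, test) = data.randomSplit(Array(0.8, 0.2), seed = 42L)
val model = new Pipeline().setStages(Array(assembler, tree)).fit(train)

val accuracy = new MulticlassClassificationEvaluator()
  .setLabelCol("label")
  .setMetricName("accuracy")
  .evaluate(model.transform(test))

println(s"Test accuracy: $accuracy")
```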

Identifying patterns

  1. Grouping data using unsupervised learning
  2. Clustering with the k-means method
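An unsupervised k-means sketch; the CSV path, feature columns, and the choice of k = 4 are assumptions for illustration:

```scala
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("KMeansClusters").getOrCreate()

// Hypothetical unlabelled data: the goal is to discover natural groupings
val customers = spark.read.option("header", "true").option("inferSchema", "true")
  .csv("data/customers.csv")

val features = new VectorAssembler()
  .setInputCols(Array("recency", "frequency", "monetary"))   // assumed feature columns
  .setOutputCol("features")
  .transform(customers)

// k is chosen by the analyst; 4 is only an example value
val model = new KMeans().setK(4).setSeed(1L).setFeaturesCol("features").fit(features)

model.clusterCenters.foreach(println)
model.transform(features).select("features", "prediction").show(10)
```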
