Overview
In this Apache Spark with Scala for Big Data course, you will learn to apply Spark best practices, develop solutions that run on the Apache Spark platform, and take advantage of Spark's efficient use of memory and powerful programming model. Learn to supercharge your data with Apache Spark, a big data platform well suited to the iterative algorithms required by graph analytics and machine learning.
Objectives
After completing this Apache Spark with Scala for Big Data course, you will be able to:
- Develop applications with Spark
- Work with the libraries for SQL, Streaming, and Machine Learning
- Map real-world problems to parallel algorithms
- Build business applications that integrate with Spark
Prerequisites
- At least 6 months of professional programming experience in Java or C#
Materials
Portuguese/English + Hands-on Lab
Course Outline
Introduction to Spark
- Defining Big Data and Big Computation
- What is Spark?
- What are the benefits of Spark?
Scaling-out applications
- Identifying the performance limitations of a modern CPU
- Scaling traditional parallel processing models
Designing parallel algorithms
- Fostering parallelism through functional programming
- Mapping real-world problems to effective parallel algorithms
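To make the mapping from problem to parallel algorithm concrete, here is a minimal pure-Scala sketch (the data and names are illustrative) of the map/reduce pattern that Spark later distributes across a cluster: `map` is independent per element, and `reduce` uses an associative operator, so both parallelize naturally.

```scala
object WordLengthStats {
  def main(args: Array[String]): Unit = {
    val words = Seq("spark", "scala", "cluster", "rdd")

    // map: side-effect-free and independent per element,
    // so each element can be processed on a different core/node
    val lengths = words.map(_.length)

    // reduce: an associative, commutative operator can be applied
    // partition-by-partition and the partial results combined
    val total = lengths.reduce(_ + _)

    println(total) // 20
  }
}
```

The same two calls exist on Spark's RDDs and Datasets, which is why side-effect-free functional code ports so directly to the cluster.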
Parallelizing data structures
- Partitioning data across the cluster using Resilient Distributed Datasets (RDD) and DataFrames
- Apportioning task execution across multiple nodes
- Running applications with the Spark execution model
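The ideas above can be sketched with a local SparkSession (a minimal sketch; the app name and partition count are illustrative, and in production the master is supplied by the cluster manager rather than hard-coded):

```scala
import org.apache.spark.sql.SparkSession

object PartitionDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("PartitionDemo")
      .master("local[*]")
      .getOrCreate()

    // Explicitly split the data into 4 partitions; each partition
    // becomes a task that can run on a different executor core.
    val rdd = spark.sparkContext.parallelize(1 to 1000, numSlices = 4)
    println(rdd.getNumPartitions)          // 4
    println(rdd.map(_ * 2L).reduce(_ + _)) // 1001000

    // The same data as a DataFrame, where Spark manages the layout.
    import spark.implicits._
    val df = (1 to 1000).toDF("n")
    println(df.rdd.getNumPartitions)

    spark.stop()
  }
}
```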
The anatomy of a Spark cluster
- Creating resilient and fault-tolerant clusters
- Achieving scalable distributed storage
Managing the cluster
- Monitoring and administering Spark applications
- Visualizing execution plans and results
Selecting the development environment
- Performing exploratory programming via the Spark shell
- Building stand-alone Spark applications
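A minimal stand-alone application might look like the sketch below (the object name and sample lines are illustrative); the same statements can be typed one at a time into the spark-shell, which creates the `spark` session for you:

```scala
import org.apache.spark.sql.SparkSession

object WordCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("WordCount")
      .master("local[*]") // omit when submitting to a real cluster
      .getOrCreate()

    val lines = spark.sparkContext.parallelize(Seq(
      "spark makes big data simple",
      "big data needs big compute"))

    val counts = lines
      .flatMap(_.split("\\s+"))   // split lines into words
      .map(word => (word, 1))     // pair each word with a count
      .reduceByKey(_ + _)         // sum counts per word
      .collect()
      .toMap

    println(counts("big")) // 3
    spark.stop()
  }
}
```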
Working with the Spark APIs
- Programming with Scala and other supported languages
- Building applications with the core APIs
- Enriching applications with the bundled libraries
Querying structured data
- Processing queries with DataFrames and embedded SQL
- Extending SQL with User-Defined Functions (UDFs)
- Exploiting Parquet and JSON formatted data sets
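The three topics above can be sketched together (the view name, UDF name, and sample data are illustrative, and the Parquet path in the comment is a placeholder):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object SqlAndUdf {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SqlAndUdf").master("local[*]").getOrCreate()
    import spark.implicits._

    val people = Seq(("Ana", 34), ("Bruno", 19)).toDF("name", "age")

    // Embedded SQL over a temporary view
    people.createOrReplaceTempView("people")
    val adults = spark.sql("SELECT name FROM people WHERE age >= 21")
    println(adults.count()) // 1

    // Extending SQL with a user-defined function
    val shout = udf((s: String) => s.toUpperCase)
    spark.udf.register("shout", shout)
    spark.sql("SELECT shout(name) AS n FROM people").show()

    // The same DataFrame API reads and writes Parquet and JSON, e.g.:
    // people.write.parquet("/tmp/people.parquet")
    // spark.read.parquet("/tmp/people.parquet")

    spark.stop()
  }
}
```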
Integrating with external systems
- Connecting to databases with JDBC
- Executing Hive queries in external applications
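Connecting over JDBC can be sketched as follows; the URL, table name, and credentials are placeholders, and the matching JDBC driver jar must be on the application's classpath:

```scala
import org.apache.spark.sql.SparkSession

object JdbcRead {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("JdbcRead").master("local[*]").getOrCreate()

    // Spark reads the table into a DataFrame; with a numeric
    // partition column it can also parallelize the read.
    val orders = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://dbhost:5432/shop")
      .option("dbtable", "orders")
      .option("user", "reader")
      .option("password", "secret")
      .load()

    orders.printSchema()
    spark.stop()
  }
}
```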
What is streaming?
- Implementing sliding window operations
- Determining state from continuous data
- Processing simultaneous streams
- Improving performance and reliability
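A sliding-window count can be sketched with Structured Streaming (host, port, and window sizes are illustrative; the socket source is for experiments only, e.g. fed by `nc -lk 9999`):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, window}

object SlidingWindow {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SlidingWindow").master("local[*]").getOrCreate()

    // Each line arriving on the socket gets an ingest timestamp.
    val lines = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", "9999")
      .option("includeTimestamp", true)
      .load()

    // Count events in 10-minute windows that slide every 5 minutes,
    // so each event falls into two overlapping windows.
    val counts = lines
      .groupBy(window(col("timestamp"), "10 minutes", "5 minutes"))
      .count()

    val query = counts.writeStream
      .outputMode("complete")
      .format("console")
      .start()
    query.awaitTermination()
  }
}
```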
Streaming data sources
- Streaming from built-in sources (e.g., log files, sockets, Twitter, Kinesis, Kafka)
- Developing custom receivers
- Processing with the streaming API and Spark SQL
Classifying observations
- Predicting outcomes with supervised learning
- Building a decision tree classifier
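A decision-tree classifier with MLlib's DataFrame-based API might look like this sketch (the toy dataset, where the label is 1.0 whenever x > 0.5, is illustrative):

```scala
import org.apache.spark.ml.classification.DecisionTreeClassifier
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

object TreeDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("TreeDemo").master("local[*]").getOrCreate()
    import spark.implicits._

    // Tiny labeled dataset: label 1.0 when x > 0.5
    val data = Seq(
      (0.1, 0.0), (0.2, 0.0), (0.3, 0.0),
      (0.7, 1.0), (0.8, 1.0), (0.9, 1.0)
    ).toDF("x", "label")

    // MLlib expects features packed into a single vector column
    val assembler = new VectorAssembler()
      .setInputCols(Array("x")).setOutputCol("features")
    val train = assembler.transform(data)

    val model = new DecisionTreeClassifier()
      .setLabelCol("label").setFeaturesCol("features")
      .fit(train)

    model.transform(train).select("x", "prediction").show()
    spark.stop()
  }
}
```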
Identifying patterns
- Grouping data using unsupervised learning
- Clustering with the k-means method
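Clustering with MLlib's k-means can be sketched as follows (the toy data, with two obvious groups near (0, 0) and (10, 10), is illustrative):

```scala
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

object KMeansDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("KMeansDemo").master("local[*]").getOrCreate()
    import spark.implicits._

    // Unlabeled points forming two well-separated groups
    val points = Seq(
      (0.0, 0.1), (0.1, 0.0), (0.2, 0.2),
      (9.9, 10.0), (10.0, 9.8), (10.1, 10.1)
    ).toDF("x", "y")

    val assembled = new VectorAssembler()
      .setInputCols(Array("x", "y")).setOutputCol("features")
      .transform(points)

    // k = 2 clusters; the fixed seed makes runs reproducible
    val model = new KMeans().setK(2).setSeed(1L).fit(assembled)
    model.clusterCenters.foreach(println)

    spark.stop()
  }
}
```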