Overview
The Spark Streaming course teaches how to design, develop, and manage real-time data processing applications with Apache Spark. Participants learn to build streaming pipelines that capture, transform, and analyze continuous data from sources such as Kafka, sockets, and files, and to apply techniques for fault tolerance, scalability, and performance optimization.
Course Content
Module 1: Introduction to Spark Streaming
- Overview of real-time data processing
- Batch vs. streaming processing
- Apache Spark Streaming architecture
- Use cases and real-world applications
Module 2: Setting Up Spark Streaming Environment
- Installing and configuring Apache Spark
- Spark cluster overview (Standalone, YARN, Kubernetes)
- Understanding SparkContext and StreamingContext
- Running your first Spark Streaming job
Module 3: DStreams (Discretized Streams)
- Core concepts of DStreams
- Transformations and actions on DStreams
- Window operations and sliding intervals
- Stateful operations with updateStateByKey
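As a rough illustration of the Module 3 topics, the sketch below simulates micro-batches in plain Python rather than calling Spark. The `update_state` function mirrors the shape of the update function that `updateStateByKey` expects (new values for a key plus the previous state, which may be `None`); `run_batches`, the sample batches, and the window length are hypothetical names and values chosen only for this example.

```python
from collections import Counter, deque


def update_state(new_values, running_count):
    """Same shape as an updateStateByKey update function:
    new values for one key in this batch, plus the previous state (or None)."""
    return sum(new_values) + (running_count or 0)


def run_batches(batches, window_length=3):
    """Process micro-batches of words; yield (running totals, windowed totals).

    Assumption for this sketch: the window slides by one batch at a time.
    """
    state = {}                            # key -> count across all batches so far
    window = deque(maxlen=window_length)  # only the last N batches are kept
    for batch in batches:
        counts = Counter(batch)
        for key, n in counts.items():
            state[key] = update_state([n], state.get(key))
        window.append(counts)
        windowed = sum(window, Counter())  # merge the per-batch counts in the window
        yield dict(state), dict(windowed)


batches = [["a", "b"], ["a"], ["b", "b"], ["c"]]
results = list(run_batches(batches, window_length=2))
for running, windowed in results:
    print(running, windowed)
```

The running totals keep growing across every batch, while the windowed totals forget anything older than the last two batches. That contrast is the practical difference between stateful and window operations.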
Module 4: Structured Streaming Fundamentals
- Introduction to Structured Streaming
- Differences between DStreams and Structured Streaming
- Defining sources, transformations, and sinks
- Event-time processing and watermarking
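The watermarking idea from Module 4 can be sketched without Spark as follows. This is a simplified model, not Structured Streaming's implementation: it assumes tumbling windows, advances the watermark on every event (Spark advances it between micro-batches), and uses hypothetical names (`windowed_counts`, `allowed_lateness`) and made-up event times.

```python
from collections import defaultdict


def windowed_counts(events, window_size=10, allowed_lateness=5):
    """Count events per (tumbling window, key), dropping data that is too late.

    events: iterable of (event_time_seconds, key) in arrival order.
    """
    counts = defaultdict(int)  # (window_start, key) -> count
    dropped = []               # events that arrived after their window expired
    max_event_time = float("-inf")
    for event_time, key in events:
        max_event_time = max(max_event_time, event_time)
        watermark = max_event_time - allowed_lateness
        window_start = (event_time // window_size) * window_size
        window_end = window_start + window_size
        if window_end <= watermark:
            # The watermark has passed this window, so the event is discarded.
            dropped.append((event_time, key))
            continue
        counts[(window_start, key)] += 1
    return dict(counts), dropped


# Arrival order differs from event-time order: 8 and 9 arrive late.
events = [(1, "x"), (12, "x"), (8, "x"), (27, "x"), (9, "x")]
counts, dropped = windowed_counts(events)
print(counts)
print(dropped)
```

The event at time 8 is late but still within the allowed lateness, so it is counted in its original window. The event at time 9 arrives after the watermark has moved past that window and is dropped.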
Module 5: Integrating with External Systems
- Reading from and writing to Apache Kafka
- Integration with Redis, Cassandra, and HDFS
- Consuming data from sockets and file streams
- Writing output to dashboards and APIs
Module 6: Fault Tolerance and Checkpointing
- Understanding fault tolerance in Spark Streaming
- Configuring checkpointing for state recovery
- Managing driver and executor failures
- Data consistency and exactly-once semantics
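One common way to reason about the Module 6 topics is that replayable input, a checkpointed offset, and an idempotent sink together give effectively exactly-once output. The plain-Python sketch below simulates that combination under stated assumptions: `IdempotentSink`, `process`, the in-memory `checkpoint` dictionary, and the simulated failure are all hypothetical stand-ins, not Spark APIs.

```python
class IdempotentSink:
    """Sink that ignores a batch id it has already committed."""

    def __init__(self):
        self.committed = {}  # batch_id -> rows

    def write(self, batch_id, rows):
        if batch_id in self.committed:
            return False     # replayed batch: skip instead of duplicating
        self.committed[batch_id] = list(rows)
        return True


def process(source, sink, checkpoint, batch_size=2, fail_before_checkpoint_at=None):
    """Read fixed-size batches from `source`, resuming at the checkpointed offset."""
    offset = checkpoint.get("offset", 0)
    while offset < len(source):
        batch_id = offset // batch_size
        rows = source[offset:offset + batch_size]
        sink.write(batch_id, rows)
        if fail_before_checkpoint_at == batch_id:
            # Crash after the write but before the offset is saved.
            raise RuntimeError("simulated driver failure")
        offset += len(rows)
        checkpoint["offset"] = offset


source = [1, 2, 3, 4, 5]
sink = IdempotentSink()
checkpoint = {}

try:
    process(source, sink, checkpoint, fail_before_checkpoint_at=1)
except RuntimeError:
    pass  # restart below, recovering from the checkpoint

process(source, sink, checkpoint)
delivered = [row for batch_id in sorted(sink.committed) for row in sink.committed[batch_id]]
print(delivered)
```

After the restart, batch 1 is replayed because its offset was never checkpointed, but the sink recognizes the batch id and skips it, so each row appears once in the output.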
Module 7: Performance Tuning and Optimization
- Batch interval tuning and backpressure handling
- Memory and resource management
- Parallelism, partitioning, and task scheduling
- Best practices for low-latency streaming
Module 8: Monitoring and Observability
- Using Spark UI and metrics system
- Integrating Prometheus and Grafana for observability
- Log analysis and job debugging
- Alerting and production monitoring
Module 9: Hands-On Project
- Building a real-time log analytics application
- Consuming streaming data from Kafka
- Performing real-time aggregation and windowing
- Writing results to a database or dashboard