Advanced Data Analytics with PySpark Course

24 hours
Overview

This Advanced Data Analytics with PySpark course teaches participants how to develop and run more sophisticated analytical Spark tasks using PySpark (the Python API for Apache Spark). Participants learn to manipulate and analyze data in the Spark Shell, structure data with Spark SQL, and work with pandas and seaborn. By the end of the course, participants will be ready to tackle their own large-scale dataset projects.

Objectives

After completing this Advanced Data Analytics with PySpark course, you will be able to:

  • Work with the PySpark Shell environment
  • Understand Spark DataFrames
  • Process data with the PySpark DataFrame API
  • Work with pivot tables in PySpark
  • Perform data visualization and exploratory data analysis (EDA) in PySpark
Prerequisites
Materials
English/Portuguese/Hands-on Lab
Course Outline

Introduction to Apache Spark

  • What is Apache Spark
  • The Spark Platform
  • Spark vs. Hadoop's MapReduce (MR)
  • Common Spark Use Cases
  • Languages Supported by Spark
  • Running Spark on a Cluster
  • The Spark Application Architecture
  • The Driver Process
  • The Executor and Worker Processes
  • Spark Shell
  • Jupyter Notebook Shell Environment
  • Spark Applications
  • The spark-submit Tool
  • The spark-submit Tool Configuration
  • Interfaces with Data Storage Systems
  • Project Tungsten
  • The Resilient Distributed Dataset (RDD)
  • Datasets and DataFrames
  • Spark SQL, DataFrames, and Catalyst Optimizer
  • Spark Machine Learning Library
  • GraphX
  • Extending Spark Environment with Custom Modules and Files

The Spark Shell

  • The Spark Shell
  • The Spark v2+ Command-Line Shells
  • The Spark Shell UI
  • Spark Shell Options
  • Getting Help
  • Jupyter Notebook Shell Environment
  • Example of a Jupyter Notebook Web UI (Databricks Cloud)
  • The Spark Context (sc) and Spark Session (spark)
  • Creating a Spark Session Object in Spark Applications
  • The Shell Spark Context Object (sc)
  • The Shell Spark Session Object (spark)
  • Loading Files
  • Saving Files

Introduction to Spark SQL

  • What is Spark SQL?
  • Uniform Data Access with Spark SQL
  • Hive Integration
  • Hive Interface
  • Integration with BI Tools
  • What is a DataFrame?
  • Creating a DataFrame in PySpark
  • Commonly Used DataFrame Methods and Properties in PySpark
  • Grouping and Aggregation in PySpark
  • The "DataFrame to RDD" Bridge in PySpark
  • The SQLContext Object
  • Examples of Spark SQL/DataFrame (PySpark Example)
  • Converting an RDD to a DataFrame Example
  • Example of Reading/Writing a JSON File
  • Using JDBC Sources
  • JDBC Connection Example
  • Performance, Scalability, and Fault-tolerance of Spark SQL

Practical Introduction to Pandas

  1. What is pandas?
  2. The Series Object
  3. Accessing Values and Indexes in Series
  4. Setting Up Your Own Index
  5. Using the Series Index as a Lookup Key
  6. Can I Pack a Python Dictionary into a Series?
  7. The DataFrame Object
  8. The DataFrame's Value Proposition
  9. Creating a pandas DataFrame
  10. Getting DataFrame Metrics
  11. Accessing DataFrame Columns
  12. Accessing DataFrame Rows
  13. Accessing DataFrame Cells
  14. Using iloc
  15. Using loc
  16. Examples of Using loc
  17. DataFrames are Mutable via Object Reference!
  18. Deleting Rows and Columns
  19. Adding a New Column to a DataFrame
  20. Appending/Concatenating DataFrame and Series Objects
  21. Example of Appending/Concatenating DataFrames
  22. Re-indexing Series and DataFrames
  23. Getting Descriptive Statistics of DataFrame Columns
  24. Getting Descriptive Statistics of DataFrames
  25. Applying a Function
  26. Sorting DataFrames
  27. Reading From CSV Files
  28. Writing to the System Clipboard
  29. Writing to a CSV File
  30. Fine-Tuning the Column Data Types
  31. Changing the Type of a Column
  32. What May Go Wrong with Type Conversion
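As an illustration of the last few pandas topics (reading from CSV, changing a column's type, what may go wrong with type conversion, and writing to CSV), here is a minimal sketch. It is not taken from the course materials: the CSV text is made-up sample data, and reading from an in-memory buffer stands in for reading a real file.

```python
import io

import pandas as pd

# Hypothetical sample data: the "qty" column contains one malformed entry ("seven").
CSV_TEXT = """item,qty,price
apple,3,0.50
pear,seven,0.75
plum,7,0.20
"""

# Reading from a CSV source (an in-memory buffer here instead of a file path).
df = pd.read_csv(io.StringIO(CSV_TEXT))

# Because of "seven", pandas cannot infer a numeric dtype for "qty".
print(df.dtypes)

# What may go wrong: a strict cast fails on the malformed value.
try:
    df["qty"].astype("int64")
except (ValueError, TypeError) as exc:
    print("astype failed:", exc)

# Changing the type safely: unparseable values become NaN, and the column is then
# stored with pandas' nullable integer dtype so the missing value is kept as <NA>.
df["qty"] = pd.to_numeric(df["qty"], errors="coerce").astype("Int64")

# Writing to CSV; with no path argument, to_csv returns the text as a string.
csv_out = df.to_csv(index=False)
print(csv_out)
```

Coercing to NaN and then using the nullable "Int64" dtype keeps the column integer-valued without silently inventing a number for the bad entry; whether to coerce, drop, or reject such rows is a per-dataset decision.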

Data Visualization with seaborn in Python

  1. Data Visualization
  2. Data Visualization in Python
  3. Matplotlib
  4. Getting Started with matplotlib
  5. Figures
  6. Saving Figures to a File
  7. Seaborn
  8. Getting Started with seaborn
  9. Histograms and KDE
  10. Plotting Bivariate Distributions
  11. Scatter plots in seaborn
  12. Pair plots in seaborn
  13. Heatmaps

Quick Introduction to Python for Data Engineers (Optional) 

  1. What is Python?
  2. Additional Documentation
  3. Which version of Python am I running?
  4. Python Dev Tools and REPLs
  5. IPython
  6. Jupyter
  7. Jupyter Operation Modes
  8. Jupyter Common Commands
  9. Anaconda
  10. Python Variables and Basic Syntax
  11. Variable Scopes
  12. PEP8
  13. The Python Programs
  14. Getting Help
  15. Variable Types
  16. Assigning Multiple Values to Multiple Variables
  17. Null (None)
  18. Strings
  19. Finding the Index of a Substring
  20. String Splitting
  21. Triple-Delimited String Literals
  22. Raw String Literals
  23. String Formatting and Interpolation
  24. Boolean
  25. Boolean Operators
  26. Numbers
  27. Looking Up the Runtime Type of a Variable
  28. Divisions
  29. Assignment-with-Operation
  30. Relational Operators
  31. The if-elif-else Triad
  32. An if-elif-else Example
  33. Conditional Expressions (a.k.a. Ternary Operator)
  34. The While-Break-Continue Triad
  35. The for Loop
  36. try-except-finally
  37. Lists
  38. Main List Methods
  39. Dictionaries
  40. Working with Dictionaries
  41. Sets
  42. Common Set Operations
  43. Set Operations Examples
  44. Finding Unique Elements in a List
  45. Enumerate
  46. Tuples
  47. Unpacking Tuples
  48. Functions
  49. Dealing with an Arbitrary Number of Parameters
  50. Keyword Function Parameters
  51. The range Object
  52. Random Numbers
  53. Python Modules
  54. Importing Modules
  55. Installing Modules
  56. Listing Methods in a Module
  57. Creating Your Own Modules
  58. Creating a Runnable Application
  59. List Comprehension
  60. Zipping Lists
  61. Working with Files
  62. Reading and Writing Files
  63. Reading Command-Line Parameters
  64. Accessing Environment Variables
  65. What is Functional Programming (FP)?
  66. Higher-Order Functions
  67. Lambda Functions in Python
  68. Lambdas in the Sorted Function
  69. Other Examples of Using Lambdas
  70. Regular Expressions
  71. Using Regular Expressions Examples
  72. Python Data Science-Centric Libraries
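To give a flavor of several items in the optional Python module (finding unique elements in a list, dictionaries, lambdas in the sorted function, enumerate, zip, list comprehensions, and string formatting), here is a small standalone sketch. The word list, titles, and hours are made-up sample values, not course material.

```python
# Hypothetical sample data.
words = ["spark", "pandas", "spark", "seaborn", "pandas", "spark"]

# Finding unique elements in a list: a set drops duplicates; sorted() fixes the order.
unique_words = sorted(set(words))

# A dictionary comprehension that counts occurrences of each unique word.
counts = {w: words.count(w) for w in unique_words}

# A lambda as the sort key: highest count first, ties broken alphabetically.
ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

# enumerate() supplies a running index; an f-string formats each line.
for rank, (word, n) in enumerate(ranked, start=1):
    print(f"{rank}. {word}: {n}")

# zip() pairs two lists element by element; dict() turns the pairs into a mapping.
titles = ["PySpark", "pandas", "seaborn"]
hours = [24, 16, 8]
schedule = dict(zip(titles, hours))

# A list comprehension with a condition keeps only the longer modules.
long_modules = [title for title, h in schedule.items() if h >= 16]
print(long_modules)
```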