Comprehensive Data Science with Python Course

32 hours
Overview

The Comprehensive Data Science with Python course teaches engineers, data scientists, data analysts, statisticians, and other quantitative professionals the Python programming skills needed to map and visualize data and to apply inferential statistics. Participants learn the fundamentals of Python, including data structures, variables, and libraries, as well as how Python is used in data science. Students also learn how to clean and explore their data, build predictive models, and develop data-driven web applications. Our experienced instructors guide you through the full range of topics, starting with the basics, and prepare you for advanced data science work.

Objectives

After completing the Comprehensive Data Science with Python course, you will be able to:

  • Understand the differences among Python's basic data types
  • Know when to use different Python collections
  • Implement Python functions
  • Understand control flow constructs in Python
  • Handle errors via exception handling constructs
  • Quantitatively define an answerable, actionable question
  • Import both structured and unstructured data into Python
  • Parse unstructured data into structured formats
  • Understand the differences between NumPy arrays and pandas dataframes
  • Simulate data through random number generation
  • Understand mechanisms for missing data and analytic implications
  • Explore and clean data
  • Create compelling graphics to reveal analytic results
  • Reshape and merge data to prepare for advanced analytics
  • Test for group differences using inferential statistics
  • Implement linear regression from a frequentist perspective
  • Understand non-linear terms, confounding, and interaction in linear regression
  • Extend to logistic regression to model binary outcomes
Materials
English/Portuguese/Hands-on Lab
Course Outline

An Accelerated Introduction and Overview to Python for Data Science Foundations

  1. Introduction to course and computing environment
  2. Up and running with Jupyter notebooks
  3. Fundamental Python types: strings, numerics, Booleans, and dates
  4. Understanding Python ‘variables’ (reference assignment)
  5. Slicing syntax
  6. Fundamental collections: tuples, lists, dictionaries, and sets
  7. Control flow and iteration in Python (if/else, for, while, list comprehensions)
  8. Writing your own functions
  9. Handling exceptions
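
Several of the topics above (reference assignment, slicing, comprehensions, and exception handling) can be illustrated in a few lines. This is only a minimal sketch, not course material; the helper `safe_ratio` and all variable names are hypothetical.

```python
# Reference assignment: both names point at the same list object.
a = [1, 2, 3]
b = a            # no copy is made
b.append(4)      # the change is visible through `a` as well
c = a[:]         # a full slice creates a new (shallow) copy
c.append(5)      # `a` is unaffected

# Slicing syntax: sequence[start:stop:step], where stop is exclusive.
letters = "abcdef"
first_three = letters[:3]
every_other = letters[::2]
reversed_letters = letters[::-1]

# List comprehension with a condition.
even_squares = [n * n for n in range(6) if n % 2 == 0]


def safe_ratio(numerator, denominator):
    """Hypothetical helper: return the ratio, or None when dividing by zero."""
    try:
        return numerator / denominator
    except ZeroDivisionError:
        return None
```

Calling `safe_ratio(1, 0)` returns `None` instead of raising, which shows the try/except pattern in its simplest form.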

Matrix Computing with NumPy

  1. Introduction to the ndarray
  2. Dtypes in NumPy
  3. NumPy operations, ufuncs
  4. Broadcasting
  5. Missing data in NumPy (masked array)
  6. Random number generation
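
As a rough illustration of the NumPy topics listed above, the sketch below touches broadcasting, a ufunc, a masked array, and seeded random number generation. The sentinel value `-999` and the array contents are assumptions made up for the example.

```python
import numpy as np
import numpy.ma as ma

# Broadcasting: a (3, 1) column combined with a (4,) row gives a (3, 4) grid.
col = np.arange(3).reshape(3, 1)
row = np.arange(4)
grid = col * 10 + row

# A ufunc applies element-wise without an explicit Python loop.
roots = np.sqrt(np.array([1.0, 4.0, 9.0]))

# Masked array: -999 is a hypothetical sentinel meaning "missing".
raw = np.array([2.0, -999.0, 4.0, 6.0])
masked = ma.masked_equal(raw, -999.0)
masked_mean = masked.mean()   # the masked entry is ignored

# Reproducible random numbers through a seeded Generator.
rng = np.random.default_rng(seed=42)
sample = rng.normal(loc=0.0, scale=1.0, size=1000)
```

Masking keeps the original array shape intact, which is one reason it may be preferred over simply dropping the missing entries.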

Managing, Exploring, and Cleaning Data with Pandas

  1. Fundamental Pandas: Series and DataFrames
  2. Exploring objects with attributes/methods
  3. Importing data from different structured sources
  4. Basic DataFrame summaries
  5. Creating new variables (columns)
  6. Scaling and standardizing data elements
  7. Discretizing continuous data
  8. Mapping categorical data to new values
  9. Establishing dummy codes (one hot encoding)
  10. Filtering rows and selecting columns
  11. Managing the indices
  12. Identifying duplicate rows
  13. Quantifying and managing missing data
  14. Combining datasets
  15. Merging datasets
  16. Transposing datasets
  17. Changing data from long to wide formats and back
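
A few of the reshaping and recoding steps above could look like the following in pandas. This is a sketch under assumed data: the `subject`, `visit`, and `score` columns and the bin edges are invented for illustration.

```python
import pandas as pd

# Hypothetical long-format data: one row per (subject, visit) pair.
long_df = pd.DataFrame({
    "subject": ["a", "a", "b", "b"],
    "visit": ["v1", "v2", "v1", "v2"],
    "score": [10.0, 12.0, 7.0, 9.0],
})

# Long to wide: one row per subject, one column per visit.
wide_df = long_df.pivot(index="subject", columns="visit", values="score")
wide_df = wide_df.reset_index()
wide_df.columns.name = None

# Wide back to long.
back_to_long = wide_df.melt(id_vars="subject", var_name="visit", value_name="score")

# Dummy codes (one-hot encoding) for a categorical column.
dummies = pd.get_dummies(long_df["visit"], prefix="visit")

# Discretize a continuous column into labeled bins (edges are arbitrary here).
long_df["score_band"] = pd.cut(
    long_df["score"], bins=[0, 8, 11, 20], labels=["low", "mid", "high"]
)
```

`pivot` requires each (index, column) pair to be unique; when duplicates exist, `pivot_table` with an aggregation function is the usual alternative.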

Exploratory Data Analysis with Pandas (including visualization with Seaborn)

  1. Univariate statistical summaries and outlier detection, both graphical and numerical
  2. Multivariate statistical summaries and outlier detection, both graphical and numerical
  3. Groupwise calculations
  4. Pivot Table type operations to aggregate by group
  5. Pandas DataFrame plotting methods

Data Pseudo-Coding Process, Extension to Data-Centric Problems

  1. Identifying data verbs
  2. Answering a question using a well-formatted analytic dataframe
  3. Understanding the unit of analysis
  4. Identifying the unit of analysis for a given question – is my dataframe organized this way?
  5. Leveraging normalized data to create the analytic dataframe through combinations of data verbs
  6. Identify the question and unit of analysis
  7. Define the desired analytic dataframe
  8. Examine the normalized source data
  9. Create data pseudo-code to map source data to the final analytic dataframe
  10. Implement with Python
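
One way to picture the process above is a small worked case: pick a question, fix the unit of analysis, write the data verbs as pseudo-code, then implement them. The tables `customers` and `orders` and their columns are hypothetical, and the sketch assumes pandas is the implementation tool.

```python
import pandas as pd

# Hypothetical normalized source tables.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "region": ["north", "south", "north"],
})
orders = pd.DataFrame({
    "order_id": [10, 11, 12, 13],
    "customer_id": [1, 1, 2, 2],
    "amount": [20.0, 30.0, 5.0, 15.0],
})

# Question: how much has each customer spent? Unit of analysis: the customer.
# Pseudo-code: GROUP orders BY customer -> SUMMARIZE -> JOIN onto customers.
spend = orders.groupby("customer_id", as_index=False).agg(
    n_orders=("order_id", "count"),
    total_amount=("amount", "sum"),
)
analytic_df = customers.merge(spend, on="customer_id", how="left")

# Customers with no orders get explicit zeros instead of missing values.
summary_cols = ["n_orders", "total_amount"]
analytic_df[summary_cols] = analytic_df[summary_cols].fillna(0)
```

The resulting dataframe has exactly one row per customer, which matches the chosen unit of analysis.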

Focus on Graphics with Python: Seaborn, Matplotlib, and Plotly

  1. Using seaborn for 1 and 2 variable summaries
  2. Advanced statistical plots with Seaborn
  3. Controlling plot details through Seaborn
  4. Making graphs interactive with Plotly
  5. Introduction to Matplotlib for full control of parameters

Overview of Descriptive versus Inferential Analytics

  1. Identifying the null hypothesis
  2. P-value interpretation
  3. The idea of statistical power and type 1/2 errors

Implementing Inferential Statistics in Python

  1. Analyzing an A/B randomized test
  2. T-tests/ANOVA
  3. Chi-square tests
  4. Correlation methods
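
To make the A/B testing idea concrete, the sketch below runs a permutation test on simulated data using NumPy only. Note that this is a swapped-in technique, not the t-test named in the list; the parametric version is commonly run with `scipy.stats.ttest_ind`. The group sizes, means, and number of permutations are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Simulated A/B outcomes (hypothetical): variant B has a higher true mean.
group_a = rng.normal(loc=10.0, scale=2.0, size=200)
group_b = rng.normal(loc=11.0, scale=2.0, size=200)
observed_diff = group_b.mean() - group_a.mean()

# Under the null hypothesis the group labels are exchangeable, so shuffle
# the pooled values many times and record the difference in means each time.
pooled = np.concatenate([group_a, group_b])
n_a = group_a.size
n_permutations = 5000
null_diffs = np.empty(n_permutations)
for i in range(n_permutations):
    shuffled = rng.permutation(pooled)
    null_diffs[i] = shuffled[n_a:].mean() - shuffled[:n_a].mean()

# Two-sided p-value: share of shuffled differences at least as extreme.
n_extreme = np.sum(np.abs(null_diffs) >= abs(observed_diff))
p_value = (n_extreme + 1) / (n_permutations + 1)
```

Adding 1 to both numerator and denominator keeps the estimated p-value strictly above zero, a common convention for permutation tests.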

Multivariate Models: Linear Regression

  1. Estimating the mean
  2. Identifying p-values of interest
  3. Adding a categorical predictor and the link to t-tests
  4. Nonlinear trends: Polynomial regression and spline modeling
  5. Interaction terms
  6. Confounding
  7. Model building approaches (choosing the best model)
  8. Scoring new data from the model (making predictions)
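
The first and last items above ("estimating the mean" and "scoring new data") can be sketched with ordinary least squares in plain NumPy. The simulated data and true coefficients are assumptions; the course may well use a dedicated modeling library instead.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Simulated data (hypothetical): y depends linearly on x plus noise.
x = rng.uniform(0.0, 10.0, size=100)
y = 3.0 + 2.0 * x + rng.normal(scale=1.0, size=100)

# Intercept-only model: the least-squares estimate is the sample mean of y.
ones = np.ones((x.size, 1))
mean_coefs = np.linalg.lstsq(ones, y, rcond=None)[0]
mean_estimate = mean_coefs[0]

# Simple linear regression: design matrix with an intercept column and x.
design = np.column_stack([np.ones_like(x), x])
coefs = np.linalg.lstsq(design, y, rcond=None)[0]
intercept, slope = coefs

# Scoring new data: apply the fitted coefficients to unseen x values.
x_new = np.array([0.0, 5.0])
predictions = intercept + slope * x_new
```

Seeing the mean as the simplest regression model makes the later steps (adding predictors, interactions, and nonlinear terms) read as extensions of the same design matrix.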

Multivariate Models: Logistic Regression

  1. GLMs and the link function
  2. Understanding the logit function
  3. The binomial distribution and
  4. Recovering the average event probability from the model
  5. Interpreting the coefficient – the odds ratio
  6. Categorical predictors and the connection to the chi-square test
  7. Expansion to more complex models (non-linear trends, multiple predictors)
  8. Confounding
  9. Interaction terms
  10. Making predictions
  11. Comparing models and picking the ‘best’ model
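
The link between the logit function, the odds ratio, and recovered probabilities can be worked by hand for the simplest case: a logistic model with a single binary predictor, where the maximum-likelihood fit reproduces the group proportions. The 2x2 counts and the helper names `logit` and `inverse_logit` are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def inverse_logit(z):
    """Map a log-odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical 2x2 table: events out of totals, by exposure group.
events_unexposed, total_unexposed = 20, 100
events_exposed, total_exposed = 40, 100

p_unexposed = events_unexposed / total_unexposed
p_exposed = events_exposed / total_exposed

# With one binary predictor the fitted coefficients have closed forms.
intercept = logit(p_unexposed)
coefficient = logit(p_exposed) - logit(p_unexposed)

# Exponentiating the coefficient gives the odds ratio.
odds_ratio = math.exp(coefficient)

# The inverse logit recovers each group's event probability.
recovered_unexposed = inverse_logit(intercept)
recovered_exposed = inverse_logit(intercept + coefficient)
```

Here the odds ratio equals (0.4 / 0.6) / (0.2 / 0.8), the same quantity one would compute directly from the table.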

Optional modules depending on student interest and timing

  • Analyzing unstructured data with Python
    1. Overview of structured versus unstructured data
    2. Implementing regular expressions in Python
    3. Converting unstructured data to structured data for analysis
  • Missing Data
    1. Exploring and understanding patterns in missing data
    2. Missing at Random
    3. Missing Not at Random
    4. Missing Completely at Random
    5. Data imputation methods
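
Why the missingness mechanism matters can be seen in a small simulation contrasting Missing Completely at Random with Missing Not at Random (Missing at Random is not shown). The distribution, the 30% missingness rate, and the cutoff rule are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=7)
values = rng.normal(loc=50.0, scale=10.0, size=10_000)
true_mean = values.mean()

# Missing Completely at Random: every value has the same 30% chance of loss.
mcar_mask = rng.random(values.size) < 0.30
mcar_observed = np.where(mcar_mask, np.nan, values)

# Missing Not at Random: loss depends on the unobserved value itself.
# Here (hypothetically) the largest 30% of values are never recorded.
cutoff = np.quantile(values, 0.70)
mnar_observed = np.where(values > cutoff, np.nan, values)

# Complete-case means: nanmean skips the missing entries.
mcar_mean = np.nanmean(mcar_observed)   # stays close to the true mean
mnar_mean = np.nanmean(mnar_observed)   # biased downward
```

Under MCAR the complete-case mean remains a reasonable estimate, while under this MNAR rule it is systematically too low, which is why the mechanism should be considered before choosing an imputation method.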