Overview
This course, Comprehensive Data Science with Python, teaches engineers, data scientists, data analysts, statisticians, and other quantitative professionals the Python programming skills needed to map and visualize data and to apply inferential statistics. Participants learn Python fundamentals, including data structures, variables, and libraries, as well as how Python is used in data science. Students also learn how to clean and explore their data, build predictive models, and develop data-driven web applications. Our experienced instructors guide you through the full range of topics, starting with the basics, and prepare you for advanced data science work.
Course Outline
An Accelerated Introduction to Python Foundations for Data Science
- Introduction to course and computing environment
- Up and running with Jupyter notebooks
- Fundamental Python types: strings, numbers, Booleans, and dates
- Understanding Python ‘variables’ (reference assignment)
- Slicing syntax
- Fundamental collections: tuples, lists, dictionaries, and sets
- Control flow and iteration in Python (if/else, for, while, list comprehensions)
- Writing your own functions
- Handling exceptions
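Several of the fundamentals listed above fit in a few lines. The following is a minimal sketch; the variable names and the `safe_ratio` helper are made up for illustration and are not taken from the course material.

```python
# Reference assignment: both names point at the same list object.
a = [1, 2, 3]
b = a
b.append(4)  # mutating through b is visible through a

# An explicit copy breaks the link between the two names.
c = a.copy()
c.append(5)

# Slicing uses start:stop:step, and stop is exclusive.
letters = "abcdefg"
first_three = letters[:3]
every_other = letters[::2]
reversed_letters = letters[::-1]

# List comprehension with a filter condition.
even_squares = [n * n for n in range(10) if n % 2 == 0]


def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    try:
        return numerator / denominator
    except ZeroDivisionError:
        return None
```

Because `b = a` binds a second name to the same object rather than copying it, `a` ends up with four elements while `c` has five.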
Matrix Computing with NumPy
- Introduction to the ndarray
- Dtypes in NumPy
- NumPy operations and ufuncs
- Broadcasting
- Missing data in NumPy (masked array)
- Random number generation
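As a rough illustration of the NumPy topics above, the sketch below touches broadcasting, a ufunc, a masked array, and a seeded random generator. The sentinel value `-999` standing for "missing" is an assumed convention chosen for the example.

```python
import numpy as np

# Broadcasting: a (3, 1) column and a (4,) row combine into a (3, 4) grid.
col = np.arange(3).reshape(3, 1)
row = np.arange(4)
grid = col * 10 + row

# ufuncs apply element-wise without an explicit Python loop.
roots = np.sqrt(np.array([1.0, 4.0, 9.0]))

# Masked array: treat the sentinel -999 (an assumed convention) as missing.
raw = np.array([2.0, -999.0, 4.0, 6.0])
masked = np.ma.masked_equal(raw, -999.0)
masked_mean = masked.mean()  # averages only the unmasked values 2, 4, 6

# Seeded generator so the random draws are reproducible.
rng = np.random.default_rng(seed=0)
draws = rng.normal(loc=0.0, scale=1.0, size=5)
```

Masking keeps the original array shape while excluding the flagged entries from reductions such as `mean`.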
Managing, Exploring, and Cleaning Data with Pandas
- Fundamental Pandas: Series and DataFrames
- Exploring objects with attributes/methods
- Importing data from different structured sources
- Basic DataFrame summaries
- Creating new variables (columns)
- Scaling and standardizing data elements
- Discretizing continuous data
- Mapping categorical data to new values
- Establishing dummy codes (one hot encoding)
- Filtering rows and selecting columns
- Managing the indices
- Identifying duplicate rows
- Quantifying and managing missing data
- Combining datasets
- Merging datasets
- Transposing datasets
- Changing data from long to wide formats and back
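A handful of the data-management steps above can be sketched on a tiny example. The `visits` and `patients` tables and their columns are hypothetical data invented for this illustration.

```python
import pandas as pd

# Hypothetical data: one row per patient visit, with one duplicated row.
visits = pd.DataFrame({
    "patient_id": [1, 1, 2, 3, 3],
    "visit": ["baseline", "followup", "baseline", "baseline", "baseline"],
    "score": [10.0, 12.0, None, 8.0, 8.0],
})
patients = pd.DataFrame({
    "patient_id": [1, 2, 3],
    "group": ["treated", "control", "treated"],
})

# Identify and drop exact duplicate rows.
n_duplicates = int(visits.duplicated().sum())
visits = visits.drop_duplicates().reset_index(drop=True)

# Quantify missing data per column.
missing_counts = visits.isna().sum()

# Merge patient-level attributes onto the visit-level rows.
merged = visits.merge(patients, on="patient_id", how="left")

# Dummy codes (one-hot encoding) for the categorical column.
dummies = pd.get_dummies(merged["group"], prefix="group")

# Long to wide: one row per patient, one column per visit type.
wide = merged.pivot(index="patient_id", columns="visit", values="score")

# Wide back to long.
long_again = wide.reset_index().melt(
    id_vars="patient_id", var_name="visit", value_name="score"
)
```

Note that `pivot` requires each index/column pair to be unique, which is why the duplicate row is removed first.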
Exploratory Data Analysis with Pandas (including visualization with Seaborn)
- Univariate statistical summaries and outlier detection, both graphical and numerical
- Multivariate statistical summaries and outlier detection, both graphical and numerical
- Groupwise calculations
- Pivot Table type operations to aggregate by group
- Pandas DataFrame plotting methods
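The numeric side of these exploratory steps might look like the sketch below. The `sales` table is invented, and the 1.5 × IQR rule is one common outlier heuristic chosen for illustration; the outline does not say which rule the course uses.

```python
import pandas as pd

# Hypothetical sales records with one suspiciously large amount.
sales = pd.DataFrame({
    "region": ["north", "north", "south", "south", "south", "north"],
    "product": ["a", "b", "a", "b", "a", "a"],
    "amount": [10.0, 12.0, 11.0, 13.0, 95.0, 9.0],
})

# Numeric outlier check with the 1.5 * IQR rule.
q1 = sales["amount"].quantile(0.25)
q3 = sales["amount"].quantile(0.75)
iqr = q3 - q1
is_outlier = (sales["amount"] < q1 - 1.5 * iqr) | (sales["amount"] > q3 + 1.5 * iqr)
outliers = sales[is_outlier]

# Groupwise summary statistics.
by_region = sales.groupby("region")["amount"].agg(["count", "mean", "median"])

# Pivot-table style aggregation: regions as rows, products as columns.
table = sales.pivot_table(
    index="region", columns="product", values="amount", aggfunc="sum"
)
```

Comparing `mean` and `median` per group is a quick numeric hint of skew: the single large value pulls the south mean far above its median.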
Data Pseudo-Coding Process, Extension to Data-Centric Problems
- Identifying data verbs
- Answering a question using a well-formatted analytic dataframe
- Understanding the unit of analysis
- Identifying the unit of analysis for a given question – is my dataframe organized this way?
- Leveraging normalized data to create the analytic dataframe through combinations of data verbs
- Identify the question and unit of analysis
- Define the desired analytic dataframe
- Examine the normalized source data
- Create data pseudo-code to map source data to the final analytic dataframe
- Implement with Python
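One way to picture the process above is to work a small question end to end. In this sketch the question ("what is the average total spend per customer segment?"), the three normalized tables, and the verb names SUMMARIZE and JOIN are all assumptions made for illustration; the outline does not define its own list of data verbs.

```python
import pandas as pd

# Hypothetical normalized source tables.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "segment": ["retail", "retail", "wholesale"],
})
orders = pd.DataFrame({
    "order_id": [10, 11, 12, 13],
    "customer_id": [1, 1, 2, 3],
})
items = pd.DataFrame({
    "order_id": [10, 10, 11, 12, 13, 13],
    "price": [5.0, 7.0, 3.0, 20.0, 50.0, 25.0],
})

# Data pseudo-code (unit of analysis: the customer):
#   1. SUMMARIZE items to one row per order (order total)
#   2. JOIN order totals onto orders
#   3. SUMMARIZE to one row per customer
#   4. JOIN customer attributes
order_totals = (
    items.groupby("order_id", as_index=False)["price"]
    .sum()
    .rename(columns={"price": "order_total"})
)
orders_with_totals = orders.merge(order_totals, on="order_id", how="left")
per_customer = orders_with_totals.groupby("customer_id", as_index=False).agg(
    n_orders=("order_id", "count"),
    total_spend=("order_total", "sum"),
)
analytic = customers.merge(per_customer, on="customer_id", how="left")

# With one row per customer, the question becomes a one-line summary.
mean_spend_by_segment = analytic.groupby("segment")["total_spend"].mean()
```

Checking that `analytic` has exactly one row per customer is a cheap way to confirm the dataframe matches the intended unit of analysis before summarizing.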
Focus on Graphics with Python: Seaborn, Matplotlib, and Plotly
- Using seaborn for 1 and 2 variable summaries
- Advanced statistical plots with Seaborn
- Controlling plot details through Seaborn
- Making graphs interactive with Plotly
- Introduction to Matplotlib for full control of parameters
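As a minimal sketch of explicit control over plot details, the example below builds a one-variable and a two-variable summary with Matplotlib only; Seaborn and Plotly are deliberately left out to keep it dependency-light. The data are simulated, and the non-interactive `Agg` backend is an assumption so the code runs without a display.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Simulated data: y depends linearly on x plus noise.
rng = np.random.default_rng(seed=1)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(scale=0.5, size=200)

fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(8, 3))

# One-variable summary: histogram.
ax_hist.hist(x, bins=20, color="steelblue", edgecolor="white")
ax_hist.set_xlabel("x")
ax_hist.set_ylabel("count")
ax_hist.set_title("Distribution of x")

# Two-variable summary: scatter plot.
ax_scatter.scatter(x, y, s=10, alpha=0.6)
ax_scatter.set_xlabel("x")
ax_scatter.set_ylabel("y")
ax_scatter.set_title("y versus x")

fig.tight_layout()

# Render to an in-memory PNG instead of writing a file.
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=100)
png_bytes = buffer.getvalue()
plt.close(fig)
```

Working with the `Axes` objects directly, rather than the implicit `plt` state, is what gives per-panel control over labels, titles, and styling.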
Overview of Descriptive versus Inferential Analytics
- Identifying the null hypothesis
- P-value interpretation
- The idea of statistical power and Type I and Type II errors
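These ideas can be made concrete with a small simulation. The sketch below assumes a two-sided z-test of a zero mean with known standard deviation; the sample size, the effect size of 0.5, and the number of simulated experiments are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
n_experiments = 4000
n_per_experiment = 25
sigma = 1.0        # assumed known, which is what makes a z-test appropriate
critical_z = 1.96  # two-sided cutoff for alpha = 0.05


def rejection_rate(true_mean):
    """Share of simulated experiments that reject H0: mean == 0."""
    samples = rng.normal(
        loc=true_mean, scale=sigma, size=(n_experiments, n_per_experiment)
    )
    z = samples.mean(axis=1) / (sigma / np.sqrt(n_per_experiment))
    return float(np.mean(np.abs(z) > critical_z))


# H0 is true: every rejection is a false positive (Type I error).
type_1_error_rate = rejection_rate(true_mean=0.0)

# H0 is false: rejections are correct, and their share is the power.
power = rejection_rate(true_mean=0.5)
type_2_error_rate = 1.0 - power
```

Under the null hypothesis the rejection rate should land near the chosen alpha of 0.05, while under the alternative it approximates the test's power for that effect size and sample size.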
Implementing Inferential Statistics in Python
- Analyzing an A/B randomized test:
- T-tests/ANOVA
- Chi-square tests
- Correlation methods
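The tests listed above are available in `scipy.stats`; the outline does not name a library, so using SciPy here is an assumption. The sketch runs each test on simulated data, and the metric values and conversion counts are made up for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=7)

# Simulated A/B test: variant B has a higher true mean than variant A.
group_a = rng.normal(loc=10.0, scale=2.0, size=200)
group_b = rng.normal(loc=11.0, scale=2.0, size=200)

# Two-sample t-test (Welch's version, which does not assume equal variances).
t_result = stats.ttest_ind(group_a, group_b, equal_var=False)

# One-way ANOVA; with two groups it tests the same hypothesis of equal means.
f_result = stats.f_oneway(group_a, group_b)

# Chi-square test of independence on a 2x2 table of made-up conversion counts.
# Columns: converted, not converted.
conversions = np.array([
    [30, 170],  # variant A
    [55, 145],  # variant B
])
chi2, chi2_p, dof, expected = stats.chi2_contingency(conversions)

# Pearson (linear) and Spearman (rank-based) correlation.
x = rng.normal(size=100)
y = 0.8 * x + rng.normal(scale=0.5, size=100)
pearson_r, pearson_p = stats.pearsonr(x, y)
spearman_rho, spearman_p = stats.spearmanr(x, y)
```

For a 2x2 table, `chi2_contingency` applies Yates' continuity correction by default, so its statistic is slightly smaller than the uncorrected chi-square value.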
Multivariate Models: Linear Regression
- Estimating the mean
- Identifying p-values of interest
- Adding a categorical predictor and the link to t-tests
- Nonlinear trends: Polynomial regression and spline modeling
- Interaction terms
- Confounding
- Model building approaches (choosing the best model)
- Scoring new data from the model (making predictions)
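Two of the points above can be checked numerically: an intercept-only regression estimates the mean, and a dummy-coded categorical predictor yields the difference in group means that a t-test examines. This sketch uses plain NumPy least squares on simulated data; a statistics package would be needed for p-values, and the outline does not say which one the course uses.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
n = 100
group = np.repeat([0, 1], n // 2)           # dummy-coded categorical predictor
y = 5.0 + 2.0 * group + rng.normal(size=n)  # simulated outcome

# Intercept-only model: the least-squares estimate is the sample mean.
X0 = np.ones((n, 1))
beta0, *_ = np.linalg.lstsq(X0, y, rcond=None)

# Intercept plus dummy: the slope equals the difference between group means.
X1 = np.column_stack([np.ones(n), group])
beta1, *_ = np.linalg.lstsq(X1, y, rcond=None)

# Scoring new data: multiply a new design matrix by the fitted coefficients.
X_new = np.array([[1.0, 0.0], [1.0, 1.0]])
predictions = X_new @ beta1
```

The two predictions are simply the two group means, which is what the fitted intercept and slope encode.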
Multivariate Models: Logistic Regression
- GLMs and the link function
- Understanding the logit function
- The binomial distribution and
- Recovering the average event probability from the model
- Interpreting the coefficient – the odds ratio
- Categorical predictors and the connection to the chi-square test
- Expansion to more complex models (non-linear trends, multiple predictors)
- Confounding
- Interaction terms
- Making predictions
- Comparing models and picking the ‘best’ model
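The link between the logit, the average event probability, and the odds ratio can be worked by hand for the simplest cases. The sketch below assumes a made-up 2x2 table; with an intercept only, or with a single binary predictor, the maximum-likelihood coefficients have closed forms, so no fitting routine is needed.

```python
import numpy as np


def logit(p):
    """Log-odds of a probability."""
    return np.log(p / (1.0 - p))


def inv_logit(z):
    """Inverse of the logit: maps log-odds back to a probability."""
    return 1.0 / (1.0 + np.exp(-z))


# Made-up counts for two groups: index 0 is unexposed, index 1 is exposed.
events = np.array([20, 40])
non_events = np.array([80, 60])

# Intercept-only model: the intercept is the logit of the overall event rate,
# so the inverse logit recovers the average event probability.
overall_rate = events.sum() / (events.sum() + non_events.sum())
intercept_only = logit(overall_rate)
recovered_rate = inv_logit(intercept_only)

# One binary predictor: the coefficients follow directly from the group rates.
rate_unexposed = events[0] / (events[0] + non_events[0])
rate_exposed = events[1] / (events[1] + non_events[1])
b0 = logit(rate_unexposed)
b1 = logit(rate_exposed) - logit(rate_unexposed)  # log odds ratio
odds_ratio = np.exp(b1)

# Prediction for the exposed group.
predicted_exposed = inv_logit(b0 + b1)
```

Exponentiating the coefficient gives the same odds ratio as the cross-product of the 2x2 table, which is the connection to the chi-square setting.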
Optional modules depending on student interest and timing
- Analyzing unstructured data with Python
- Overview of structured versus unstructured data
- Implementing regular expressions in Python
- Converting unstructured data to structured data for analysis
- Missing Data
- Exploring and understanding patterns in missing data
- Missing at Random
- Missing Not at Random
- Missing Completely at Random
- Data imputation methods
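The two optional modules connect naturally: parsing free text often produces a table with gaps. The sketch below assumes a hypothetical note format, extracts fields with regular expressions, quantifies the missing values, and applies median imputation as the simplest possible method. Median fill is generally only defensible when values are missing completely at random.

```python
import re

import pandas as pd

# Hypothetical free-text notes; the format is an assumption for illustration.
notes = [
    "Patient 101: age 54, weight 82.5 kg",
    "Patient 102: age 61",
    "Patient 103: weight 70 kg",
    "Patient 104: age 47, weight 91.2 kg",
]

id_pattern = re.compile(r"Patient (\d+)")
age_pattern = re.compile(r"age (\d+)")
weight_pattern = re.compile(r"weight (\d+(?:\.\d+)?) kg")


def first_group(pattern, text, cast):
    """Return the first captured group converted with `cast`, or None if absent."""
    match = pattern.search(text)
    return cast(match.group(1)) if match else None


# Unstructured text converted to a structured table.
records = pd.DataFrame({
    "patient_id": [first_group(id_pattern, note, int) for note in notes],
    "age": [first_group(age_pattern, note, float) for note in notes],
    "weight_kg": [first_group(weight_pattern, note, float) for note in notes],
})

# Quantify missingness per column.
missing_share = records.isna().mean()

# Median imputation, keeping indicator columns that record what was filled.
value_columns = ["age", "weight_kg"]
missing_flags = records[value_columns].isna().add_suffix("_was_missing")
imputed = records.fillna(records[value_columns].median())
imputed = pd.concat([imputed, missing_flags], axis=1)
```

Keeping the `_was_missing` indicator columns preserves the missingness pattern for later exploration even after the gaps are filled.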