Overview
This course, Modeling Data for Inference, teaches participants how to use Python to perform causal inference on observational data. Participants learn to work with inferential models, missing data, and experimental design.
Course Content
Introduction
GLMs with Python Using statsmodels
- Applying Statistical Models for Analysis in Python: The A/B test
- Overview of the statsmodels library and its functions
- Inferential and descriptive statistics refresher
- Implementing A/B tests
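As a minimal sketch of the A/B test topics above, assuming hypothetical conversion counts (1,000 visitors per variant, 100 versus 130 conversions) and the `statsmodels`, `pandas`, and `numpy` packages, the same comparison can be run as a classical two-proportion z-test and as a regression with a 0/1 outcome:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical A/B test results: visitors and conversions per variant.
n_a, conv_a = 1000, 100   # control (A)
n_b, conv_b = 1000, 130   # treatment (B)

# Classical two-sample z-test for a difference in proportions (pooled variance).
z_stat, p_value = proportions_ztest(count=[conv_a, conv_b], nobs=[n_a, n_b])

# The same comparison expressed as a model: one row per visitor,
# a 0/1 outcome and a categorical predictor for the variant.
df = pd.DataFrame({
    "variant": ["A"] * n_a + ["B"] * n_b,
    "converted": np.r_[np.ones(conv_a), np.zeros(n_a - conv_a),
                       np.ones(conv_b), np.zeros(n_b - conv_b)],
})
fit = smf.ols("converted ~ C(variant)", data=df).fit()

# With a 0/1 outcome, the slope equals the difference in conversion rates (B - A).
diff = fit.params["C(variant)[T.B]"]
print(f"z = {z_stat:.3f}, p = {p_value:.4f}, difference = {diff:.3f}")
```

The OLS slope reproduces the raw difference in conversion rates, which is one way to see an A/B test as a special case of a linear model; its standard error is not identical to the pooled z-test's.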
Modeling Continuous Data (Linear models)
- Formulation of the simple linear model
- Applying the intercept-only (null) model
- Binary predictor
- Interpreting results
- Categorical predictor
- Continuous predictor
- Polynomial expansions
- Multiple linear regression
- Spline models
- Interaction terms
- Picking the “best” model
- Discussion of confounding, interaction terms, and model building approaches
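A minimal sketch of the model-building sequence above, using simulated data (the coefficients, sample size, and seed are arbitrary assumptions) and `statsmodels` formulas. Each formula adds one idea from the list, and AIC is used here as one possible criterion for picking the “best” model; cross-validation or subject-matter reasoning about confounding may be more appropriate in practice.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 500
df = pd.DataFrame({
    "x": rng.uniform(0, 10, n),
    "group": rng.choice(["control", "treated"], n),
})
# Simulated outcome: a curved effect of x plus a group effect that changes with x.
treated = (df["group"] == "treated").astype(float)
df["y"] = (1.0 + 0.5 * df["x"] - 0.03 * df["x"] ** 2
           + 2.0 * treated + 0.3 * treated * df["x"]
           + rng.normal(0, 1, n))

models = {
    "null":        "y ~ 1",                      # intercept only: estimates the mean of y
    "binary":      "y ~ group",                  # binary predictor
    "continuous":  "y ~ x",                      # continuous predictor
    "polynomial":  "y ~ x + I(x**2)",            # polynomial expansion
    "multiple":    "y ~ x + I(x**2) + group",    # multiple regression
    "interaction": "y ~ x * group + I(x**2)",    # x-by-group interaction
    "spline":      "y ~ bs(x, df=4) + group",    # B-spline basis from patsy
}
fits = {name: smf.ols(f, data=df).fit() for name, f in models.items()}

# Lower AIC = better trade-off between fit and complexity.
for name, fit in sorted(fits.items(), key=lambda kv: kv[1].aic):
    print(f"{name:12s} AIC = {fit.aic:8.1f}")
```

`bs()` is the B-spline basis function that patsy makes available inside formulas; the intercept of the null model is simply the sample mean of `y`.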
Modeling Binary Data (Logistic models)
- Discussion of the generalized linear model
- The logit link function
- Binomial distribution
- Intercept-only model
- Back transformation of coefficients
- Single predictor
- Multiple predictors
- Interpreting odds ratios
- Generating a scoring data set
- Predicting from the model with new data
Modeling Count Outcomes
- How are count outcomes different?
- Poisson models
- Modeling options for overdispersed counts
- The log link function
- Using offsets to model rates and uneven follow-up
Power Analysis and Study Design
- Understanding and estimating statistical power
- Type I and Type II errors
- Using existing power estimators
- Simulating power through the data-generating process
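A sketch of the two approaches to power listed above, assuming a two-sample t-test, a standardized effect size (Cohen's d) of 0.5, 64 participants per group, and α = 0.05. `TTestIndPower` from `statsmodels` gives the analytic answer; the hypothetical helper `simulated_power` re-creates the assumed data-generating process many times and counts how often the test rejects the null hypothesis.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.power import TTestIndPower

effect_size, n_per_group, alpha = 0.5, 64, 0.05   # Cohen's d, per-arm n, Type I error rate

# 1) Existing estimator: analytic power of a two-sample t-test.
#    np.squeeze + float guards against the result coming back as a 1-element array.
analytic = float(np.squeeze(TTestIndPower().power(
    effect_size=effect_size, nobs1=n_per_group, alpha=alpha)))
n_needed = float(np.squeeze(TTestIndPower().solve_power(
    effect_size=effect_size, power=0.8, alpha=alpha)))

# 2) Simulation: generate data from the assumed process repeatedly and count
#    rejections of H0; the rejection rate estimates power (1 - Type II error rate).
def simulated_power(n, d, alpha, n_sims=2000, seed=1):
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        control = rng.normal(0.0, 1.0, n)
        treated = rng.normal(d, 1.0, n)
        if stats.ttest_ind(treated, control).pvalue < alpha:
            rejections += 1
    return rejections / n_sims

sim = simulated_power(n_per_group, effect_size, alpha)
print(f"analytic power = {analytic:.3f}, simulated = {sim:.3f}, n for 80% = {n_needed:.1f}")
```

The two estimates should agree up to Monte Carlo error. The advantage of simulation is that the data-generating process can be made as complex as the planned analysis (clustering, missing data, non-normal outcomes) without needing a closed-form power formula.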
Non-Parametric Analysis Methods
- Using bootstrapping/permutation tests
- Bootstrapping versus relying on asymptotic approximations to estimate confidence intervals
- How different/stable are my results?
- Resampling a data set
- Bias-corrected bootstrap intervals
- Extending the bootstrap function to calculate more statistics
- Permutation tests for p-values
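A minimal NumPy/SciPy sketch of the resampling ideas above, on simulated data. The helper names `bootstrap`, `bc_interval`, and `permutation_test` are hypothetical. The interval shown is the bias-corrected (BC) percentile interval, not the accelerated BCa variant (which `scipy.stats.bootstrap` offers via `method="BCa"`), and passing the statistic as a function is one way to extend the bootstrap to other statistics.

```python
import numpy as np
from scipy.stats import norm

def bootstrap(data, statistic, n_boot=5000, seed=0):
    """Resample `data` with replacement; return the statistic for each resample."""
    rng = np.random.default_rng(seed)
    n = len(data)
    return np.array([statistic(data[rng.integers(0, n, n)]) for _ in range(n_boot)])

def bc_interval(data, statistic, level=0.95, **kwargs):
    """Bias-corrected (BC) percentile interval; the acceleration term of BCa is omitted."""
    theta_hat = statistic(data)
    boots = bootstrap(data, statistic, **kwargs)
    z0 = norm.ppf(np.mean(boots < theta_hat))      # median bias of the bootstrap distribution
    alpha = 1 - level
    lo, hi = norm.cdf(2 * z0 + norm.ppf([alpha / 2, 1 - alpha / 2]))
    return np.quantile(boots, [lo, hi])

def permutation_test(x, y, n_perm=5000, seed=0):
    """Two-sided permutation p-value for a difference in means."""
    rng = np.random.default_rng(seed)
    observed = x.mean() - y.mean()
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)               # shuffle group labels under H0
        diff = perm[:len(x)].mean() - perm[len(x):].mean()
        count += abs(diff) >= abs(observed)
    return (count + 1) / (n_perm + 1)                # add-one correction keeps p > 0

rng = np.random.default_rng(123)
skewed = rng.exponential(scale=2.0, size=80)         # skewed sample where asymptotics are shaky

ci_median = bc_interval(skewed, np.median)
ci_mean = bc_interval(skewed, np.mean)               # same function, different statistic
p_value = permutation_test(rng.normal(0.5, 1, 40), rng.normal(0.0, 1, 40))
print(ci_median, ci_mean, p_value)
```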
Missing data
- Quantifying missing data
- Visualizing missing data
- Missingness mechanisms: MCAR, MAR, MNAR
- Sensitivity analysis
- Imputation
- MICE and tree-based preprocessing
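A small simulation of the ideas above, under stated assumptions: `y` depends linearly on a fully observed `x`, and values of `y` are deleted with a probability that depends only on `x` (a MAR mechanism). The sketch quantifies missingness, shows that the complete-case mean of `y` is biased, and shows that a single regression imputation removes most of that bias. All names, coefficients, and the seed are illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 5000
x = rng.normal(0, 1, n)
y = 2.0 + 1.5 * x + rng.normal(0, 1, n)
df = pd.DataFrame({"x": x, "y": y})

# MAR: y is more likely to be missing when x is large
# (missingness depends only on the observed x, not on y itself).
p_missing = 1 / (1 + np.exp(-2 * (x - 0.5)))
df.loc[rng.uniform(size=n) < p_missing, "y"] = np.nan

# Quantify: share missing per column and number of complete rows.
missing_share = df.isna().mean()
n_complete = df.dropna().shape[0]

true_mean = 2.0                      # E[y] in this simulation
cc_mean = df["y"].mean()             # complete-case mean (pandas skips NaN)

# Single regression imputation: fit y ~ x on complete cases, fill only the gaps.
fit = smf.ols("y ~ x", data=df.dropna()).fit()
imputed = np.where(df["y"].isna(), fit.predict(df), df["y"])
imp_mean = imputed.mean()
print(missing_share.round(3).to_dict(), n_complete, round(cc_mean, 3), round(imp_mean, 3))
```

Single imputation treats imputed values as if they were observed and so understates uncertainty; MICE (available in `statsmodels.imputation.mice`) repeats the imputation with random draws and pools the results. Under MNAR, neither approach is guaranteed to remove bias, which is where sensitivity analysis comes in.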
Time to Event (Survival) Analysis
- Visualizing Hazards Across Time
- Understanding the Log-Rank Test
- Cox Proportional Hazards Modeling
- Understanding and interpreting the Hazard Ratio
- Model diagnostics and assumptions
- Implementing Time-Varying Covariates
- Parametric Survival Models
- Weibull Model
- Exponential Model
- Predicting Failure Times
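To show what some of the estimators above compute, here is a hand-rolled sketch on an invented five-subject data set. The hypothetical `kaplan_meier` implements the product-limit estimator, and `exponential_fit` gives the maximum-likelihood constant hazard of the exponential model (the Weibull model with shape parameter 1). In practice, libraries such as `lifelines` or `statsmodels` (`SurvfuncRight`, `PHReg`) provide these estimators along with log-rank tests and Cox models.

```python
import numpy as np

def kaplan_meier(time, event):
    """Kaplan-Meier survival estimate at each distinct event time."""
    time, event = np.asarray(time, float), np.asarray(event, bool)
    surv, s = {}, 1.0
    for t in np.unique(time[event]):
        at_risk = np.sum(time >= t)              # still under observation just before t
        deaths = np.sum((time == t) & event)     # events (not censorings) at t
        s *= 1.0 - deaths / at_risk
        surv[float(t)] = s
    return surv

def exponential_fit(time, event):
    """MLE of a constant hazard: number of events divided by total time at risk."""
    return np.sum(event) / np.sum(time)

# Toy data: follow-up times and whether the event was observed (0 = censored).
time = [1, 2, 2, 3, 4]
event = [1, 1, 0, 1, 0]

km = kaplan_meier(time, event)          # ≈ {1.0: 0.8, 2.0: 0.6, 3.0: 0.3}
lam = exponential_fit(time, event)      # 3 events / 12 time units = 0.25
median_time = np.log(2) / lam           # predicted median failure time under the exponential model
print(km, lam, round(median_time, 3))
```

The censored subject at time 2 still counts in the risk set at time 2 but not afterwards, which is how censoring enters the product-limit estimate.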
Causal Inference: The Potential Outcomes Framework
- Defining treatment effects (ATT, ATE)
- Identifying populations of interest
- Defining your causal hypothesis
- Understanding the counterfactual
- Establishing the causal diagram for your problem
- Different methods for conditioning on variables:
- Propensity Scores
- Direct regression adjustment
- G-computation formulas
- Instrumental variable analysis