BIOSTAT 620 Introduction to Health Data Science

Course Information

Lectures

Lecture slides, class notes, and problem sets are linked below. New material is added roughly every week.

| Dates | Topic | Slides | Reading |
|---|---|---|---|
| Jan 09 | Productivity Tools | Intro, Unix, RStudio | Installing R and RStudio on Windows or Mac, Getting Started, Unix |
| Jan 14 | Productivity Tools | Quarto, Git and GitHub | RStudio Projects, Quarto, Git |
| Jan 16 | Data Processing in R | R basics, Vectorization | R Basics, Vectorization |
| Jan 16 | Data Processing in R | Tidyverse, ggplot2 | dplyr, ggplot2 |
| Jan 21 | Data Processing in R | Tidying data | Reshaping Data |
| Jan 21, Jan 23 | Wrangling | Intro, Data Importing, Dates and Times, Locales, Data APIs, Web scraping, Joining tables | Importing data, dates and times, Locales, Joining Tables, Extracting data from the web |
| Jan 28, Jan 30 | Data visualization | Dataviz Principles, Distributions, Dataviz in practice | Distributions, Dataviz Principles |
| Feb 04 | Probability | Intro, Foundations for Inference | Monte Carlo, Random Variables & CLT |
| Feb 06 | Inference | Intro, Parameter and estimates, Confidence Intervals | Parameters & Estimates, Confidence Intervals |
| Feb 11, Feb 13 | Statistical Models | Models, Bayes, Hierarchical Models | Data-driven Models, Bayesian Statistics, Hierarchical Models |
| Feb 18 | Midterm 1 | | Covers material from Jan 09-Feb 13 |
| Feb 20, Feb 25 | Linear models | Intro, Regression | Regression, Multivariate Regression |
| Feb 27, Mar 11 | Linear models | Multivariate regression, Treatment effect models | Measurement Error Models, Treatment Effect Models, Association Tests, Association Not Causation |
| Mar 13 | High dimensional data | Intro to Linear Algebra, Matrices in R | Matrices in R, Applied Linear Algebra |
| Mar 18 | High dimensional data | Distance, Dimension reduction | Dimension Reduction |
| Mar 20 | Machine Learning | Intro, Metrics, Conditionals, Smoothing | Notation and terminology, Evaluation Metrics, conditional probabilities, smoothing |
| Mar 25, Mar 27 | Machine Learning | kNN, Resampling methods, caret package, Algorithms, ML in practice | Resampling methods, ML algorithms, ML in practice |

Problem sets

| Problem set | Topic | Due Date (11:59pm) | Difficulty |
|---|---|---|---|
| 01 | Unix, Quarto | Jan 19 | easy |
| 02 | Data analysis with R | Jan 26 | medium |
| 03 | Wrangling | Feb 02 | medium |
| 04 | Dataviz | Feb 09 | medium |
| 05 | Probability | Feb 23 | medium |
| 06 | Predict the election | Mar 02 | hard |
| 07 | Excess mortality after Hurricane María | Mar 16 | medium |
| 08 | Matrices | Mar 25 | easy |

Office hour times

| Meeting | Time | Location |
|---|---|---|
| Dylan Cable | Wed 2:00-3:00PM | SPH I 4635 |
| Yize Hao | Fri 10:00-11:00AM | SPH II M4034 |

Acknowledgments

We thank Maria Tackett and Mine Çetinkaya-Rundel for sharing their web page template, which we used in creating this website. We thank Rafael Irizarry for sharing materials.