Welcome!

BIOSTAT 620: Introduction to Health Data Science

Instructor

Dylan Cable

  • Assistant Professor of Biostatistics
  • Office hours: Friday 11am-12pm, SPH I 4635

Canvas + Website

Canvas for announcements and grading:

https://umich.instructure.com

Official class website, containing syllabus, reading materials, slides, labs, and assignments:

https://dmcable.github.io/BIOSTAT620W26/

What is data science?

  • Data science is an exciting discipline that allows you to turn raw data into understanding, insight, and knowledge.

(Also see here, and here, and here, and here)

What is this course?

  • This course is an introduction to the world of data science with a focus on applications in the health sciences.

  • The course will teach data science skills that are easily transferable, with examples done in R.

  • In this class, we will be using R and RStudio.

What ISN’T this course?

This is not a formal statistics class. You will not be expected to know or use:

  • parametric distributions
  • hypothesis tests
  • p-values

What ISN’T this course?

Data does not exist in a vacuum. In order to gain new insights from data, you must start with a baseline understanding of the subject. “Domain knowledge” or “subject matter expertise” is critical, but it is not the purpose of this class.

This course will focus on applications in Public Health, but the skills you learn will be widely transferable.

What is a computer?

File Structure

Before computers had graphics and mice, there were only text-based interfaces, called command lines, that let you interact with the directories and files on the computer.

The modern “Desktop” is actually just a directory on your computer!

  • macOS: /Users/<username>/Desktop
  • Linux: /home/<username>/Desktop
  • Windows: C:\Users\<username>\Desktop

The route from the root directory to any specific file or directory is called the “path”.
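
As a small illustration of how a path is made up of a route plus a final name, the shell sketch below splits a made-up path into its parent directory and its last component. The username demo and the file notes.txt are hypothetical, and nothing needs to exist on disk for these commands to work.

```shell
# Hypothetical absolute path; the file does not need to exist.
path="/Users/demo/Desktop/notes.txt"

# dirname keeps the route up to (not including) the last component.
parent=$(dirname "$path")

# basename keeps only the last component.
name=$(basename "$path")

echo "$parent"   # prints /Users/demo/Desktop
echo "$name"     # prints notes.txt
```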

File Structure

Whenever you run a program on your computer, you are running it in a specific location (directory). If you want to access another file on your computer, you’ll need to know the path to that file. Paths can be either relative or absolute.

How to get from my Documents directory to my Desktop directory via:

  • absolute path: /Users/dmcable/Desktop
  • relative path: ../Desktop/
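
The difference between the two kinds of path can be tried out safely in a scratch directory. The sketch below is only an illustration: it builds a throwaway tree under a temporary directory (the username demo is hypothetical), starts in Documents, and reaches Desktop once with a relative path and once with an absolute path.

```shell
# Build a throwaway directory tree so no real files are touched.
# "demo" is a hypothetical username.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/Users/demo/Desktop" "$sandbox/Users/demo/Documents"

# Start in Documents.
cd "$sandbox/Users/demo/Documents"

# Relative path: up one level to the parent, then down into Desktop.
# This only works because of where we are starting from.
cd ../Desktop
via_relative=$(pwd -P)

# Absolute path: the full route, valid from any starting directory.
cd "$sandbox/Users/demo/Desktop"
via_absolute=$(pwd -P)

# Both lines show the same directory.
echo "$via_relative"
echo "$via_absolute"
```

The relative path is shorter, but it stops working if the program is started from a different directory. The absolute path works from anywhere, but it is tied to one particular computer and username.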

File Structure

Special symbols:

  • . Current directory
  • .. Parent directory (one step up the hierarchy)
  • ~ Home directory
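
A minimal shell sketch of the three symbols, again run inside a scratch directory so nothing real is modified. The directory name child is made up for the example.

```shell
# Scratch directory with one child directory inside it.
sandbox=$(mktemp -d)
mkdir "$sandbox/child"
cd "$sandbox/child"
here=$(pwd -P)

# "." is the current directory, so changing into it leaves us in place.
cd .
after_dot=$(pwd -P)

# ".." is the parent directory, one step up the hierarchy.
cd ..
after_dotdot=$(pwd -P)

# "~" is expanded by the shell to the home directory.
home_dir=~

echo "$after_dot"      # same directory as $here
echo "$after_dotdot"   # the parent of $here
echo "$home_dir"       # the home directory
```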

We won’t have to use the command line too much in this class, but understanding file paths will be very important!

What is R?

R is a language and environment for statistical computing and graphics: https://r-project.org

Created by statisticians for statisticians.

Over 16,000 packages are available on CRAN, the Comprehensive R Archive Network.

What is RStudio?

RStudio is an integrated development environment (IDE) for R: https://www.rstudio.com/products/rstudio/

R in the terminal

R + RStudio

Poll

https://forms.gle/rbPyera5dDvxwzPW7

Icebreaker

AI Statement

Lab time

We will now run our first lab, Lab 1.

Dataset

First Lab

The lab exercises can be found on the Schedule page of the course website:

https://dmcable.github.io/BIOSTAT620W26/schedule.html