Syllabus

BIOSTAT 620: Introduction to Health Data Science

Term: Winter 2026

Time: Tuesdays and Thursdays 8:30am - 9:50am

Location: SPH I 3755

Units: 3

Course Overview

This course serves as an introduction to data science with a focus on the acquisition and analysis of real-life data. Students will learn the tools needed to:

  1. Create usable and reproducible datasets by accessing, scraping, and cleaning data
  2. Conduct exploratory data analysis and visualization
  3. Identify scientific questions that can be answered with a given dataset
  4. Write functional code in the R programming language, build basic apps, and construct a website

Learning Objectives

Through this course, students will become familiar with the techniques used in data science and apply them to health-related datasets. Students will learn:

  • Programming in R and associated tools including Quarto, Git, and SQL
  • Data visualization – selecting appropriate plots to gain insight from data
  • Data collection – web scraping, data wrangling, and database management
  • Exploratory data analysis – generating hypotheses while building intuition and understanding of a dataset
  • Basic computational algorithms, including simulation strategies
  • Building interactive tools and websites

Prerequisite(s): BIOSTAT 607, BIOSTAT 601, BIOSTAT 650

Recommended Preparation: Familiarity with programming, particularly in the R language

Course Slides

Lecture slides presented in class are available on the course website: https://dmcable.github.io/BIOSTAT620W26/.

Technological Proficiency and Hardware/Software Required

The R language (http://cran.r-project.org) will be used throughout the semester, and we recommend using the RStudio IDE for coding (https://posit.co/download/rstudio-desktop/). Additionally, students who do not already have a GitHub account will be required to create one (https://github.com/).

Readings and Supplementary Materials

There are no required readings for this course.

Supplementary References

  1. R Programming for Data Science, 2019. Roger Peng. https://bookdown.org/rdpeng/rprogdatascience/
  2. R for Data Science, 2017. Hadley Wickham and Garrett Grolemund. http://r4ds.had.co.nz/
  3. Exploratory Data Analysis with R, 2020. Roger Peng. https://bookdown.org/rdpeng/exdata/
  4. Mastering Software Development in R, 2017. Roger Peng, Sean Kross, Brooke Anderson. https://bookdown.org/rdpeng/RProgDA/
  5. R Packages, 2023. Hadley Wickham and Jennifer Bryan. https://r-pkgs.org/
  6. Modern Data Science with R, 2023. Benjamin S. Baumer, Daniel T. Kaplan, and Nicholas J. Horton. https://mdsr-book.github.io/mdsr3e/
  7. Introduction to Data Science, 2024. Rafael Irizarry. https://rafalab.dfci.harvard.edu/dsbook/

Description and Assessment of Assignments

Attendance: This is a hands-on course, and attendance is required. Students are allowed 2 absences for the semester for any reason.

Labs: There will be weekly lab assignments which are graded for completion. Each week, there will be class time devoted to working on that week’s lab. Completing the weekly labs will count as part of the overall grade.

Homework: There will be 4 assignments given throughout the semester, approximately every 2 weeks. Students may discuss the problems with one another; however, individual solutions must be submitted, and copying will not be tolerated.

Midterm Project: The midterm project will be to perform a thorough exploratory analysis and write a report with preliminary findings using a real-world dataset of your choosing. The source code and PDF report will be uploaded to GitHub.

Final Project: The final project will be to write a report on an analysis of a real-world dataset of your choosing and to create a website that includes interactive visualizations to display the data and results. The source code, website files, and PDF report will be uploaded to GitHub. Final presentations will take place during the last week of the semester.

Grading Breakdown

Assignment         % of Grade
Attendance         10%
Labs               10%
Homework (4)       30%
Midterm Project    20%
Final Project      30%
TOTAL              100%
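
As a worked illustration of how the weights above combine into a course grade, here is a minimal Python sketch. The component scores are hypothetical, the 0 to 100 scoring scale is an assumption (the syllabus does not specify one), and the name `course_grade` is for illustration only, not part of any course tooling.

```python
# Weights taken from the grading breakdown (percent of the final grade).
WEIGHTS = {
    "Attendance": 10,
    "Labs": 10,
    "Homework": 30,
    "Midterm Project": 20,
    "Final Project": 30,
}


def course_grade(scores):
    """Return the weighted course grade on a 0-100 scale.

    `scores` maps each component name to a score from 0 to 100
    (an assumed scale; the syllabus does not state one).
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical scores, for illustration only.
example_scores = {
    "Attendance": 100,
    "Labs": 100,
    "Homework": 90,
    "Midterm Project": 85,
    "Final Project": 88,
}
print(course_grade(example_scores))
```

With these hypothetical scores, the weighted sum is (10×100 + 10×100 + 30×90 + 20×85 + 30×88) / 100.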

Assignment Submission Policy

All assignments must be completed in Quarto (or R Markdown) and submitted through the GitHub Issues page for the course. Late assignments will be penalized by 20% for each day past the due date, up to 5 days late, except when verifiable extenuating circumstances can be demonstrated.
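
The late penalty can be read as simple arithmetic, sketched below in Python. This is one interpretation, not an official rule: it assumes the 20% is taken from the score earned, that it accrues linearly per whole day late, and that a submission 5 or more days late therefore receives no credit. The function name `late_adjusted_score` is hypothetical, and exceptions for extenuating circumstances are not modeled.

```python
def late_adjusted_score(raw_score, days_late):
    """Apply an assumed linear penalty of 20% of the earned score per day late.

    Assumption: the penalty reaches 100% at 5 days, so later submissions
    earn no credit. Check with the instructor for the actual rule.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    penalty_percent = min(20 * days_late, 100)
    return raw_score * (100 - penalty_percent) / 100


# Hypothetical example: an assignment that earned 90 points.
for days in (0, 2, 5):
    print(days, late_adjusted_score(90, days))
```

Under these assumptions, a 90-point assignment submitted 2 days late keeps 60% of its score.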

Schedule

Consult the Schedule Page for more information on weekly topics, problem sets, readings, and other materials. Links to readings, assignments, and other materials from class will be posted on that page.

Academic Integrity

The faculty and staff of the School of Public Health believe that the conduct of a student registered or taking courses in the School should be consistent with that of a professional person. Courtesy, honesty, and respect should be shown by students toward faculty members, guest lecturers, administrative support staff, community partners, and fellow students. Similarly, students should expect faculty to treat them fairly, showing respect for their ideas and opinions and striving to help them achieve maximum benefits from their experience in the School. Student academic misconduct refers to behavior that may include plagiarism, cheating, fabrication, falsification of records or official documents, intentional misuse of equipment or materials (including library materials), and aiding and abetting the perpetration of such acts. Please visit the University of Michigan School of Public Health Academic Policies and Processes for the full Academic Integrity policy.

Statement on the Use of Artificial Intelligence

Generative artificial intelligence (AI) may be used under the direction and rules specified by the course instructor, in the specific circumstances outlined in the syllabus. The student is responsible for the quality and content of all written assignments. Unless otherwise indicated by the course instructor, generative AI may be used to create an initial literature review or document outline, to organize material toward a first draft of a class paper, and to proofread or check grammatical accuracy; however, the final content of the written document and the critical thinking behind the ideas presented in it must represent the student’s individual work and ideas learned through course content and/or research conducted from sources outside of the generative AI system. The student must include an annotation on all materials submitted that explicitly documents how AI was used to generate the document, and must properly reference both the sources and the AI tools, such as ChatGPT (OpenAI, 2023). The student must review the information in the document, edit it for accuracy, completeness, and proper grammar, and demonstrate that the wording accurately reflects the student’s understanding and purpose in writing the text. Students should be aware that text generated solely by AI generators may include factual errors and bias, and may contain incomplete or inaccurate reference information, in addition to further appropriating knowledge produced by historically marginalized scholars without proper credit. If you have any questions about whether a specific AI tool is allowed for any aspect of your work in this class, please ask your instructor for guidance. Failure to reach agreement with your instructor on the use of AI before using it may result in a zero score. (NOTE: instructors have sophisticated tools to determine AI plagiarism.)

Students and Disability Accommodations

Students should speak with their instructors before or during the first week of classes regarding any special needs. Students can also visit the Office for Student Affairs for assistance in coordinating communications around accommodations.

Students seeking academic accommodations should register with Services for Students with Disabilities (SSD). SSD arranges reasonable and appropriate academic accommodations for students with disabilities. Please visit the Services for Students with Disabilities website for more information on student accommodations.

Students who expect to miss classes, examinations, or other assignments as a consequence of their religious observance shall be provided with a reasonable alternative opportunity to complete such academic responsibilities. It is the obligation of students to provide faculty with reasonable notice of the dates of religious holidays on which they will be absent. Please visit the Office of the Provost website for the complete University policy.

Students who are feeling ill should not come to class in person. Your grade will not be negatively impacted by missing class due to illness. If you need flexibility because of illness, caregiving or childcare responsibilities, or similar circumstances, please email me as soon as possible so that we can come up with a plan to allow flexibility in assignment due dates.