Midterm Project

Due Friday, March 13th, 2026 by 11:59pm Eastern Time.

Learning Objective: To apply the skills learned in BIOSTAT620 (through Week 6) by analyzing and interpreting a health or biology dataset of your choice.

Narrative: As we have discussed in class, the practice of data science requires both quantitative skills and qualitative knowledge of the domain from which the data was collected. The first step in any data analysis is to have a dataset for which you have formulated an interesting question. If you do not have a dataset to work with, you may choose one from our list of suggestions. With your dataset, formulate a clear and concise question, then conduct data wrangling, exploratory data analysis, and data visualization to explore and answer it.

Deliverable: A written report generated in Quarto (HTML or PDF), with embedded tables and figures, submitted to a project-specific GitHub repository that you create. The report should have the following sections:

In your report, please do not include any unformatted text output (e.g., output from head(), str(), or print()). Instead, summarize these aspects of your data in the text, figures, and/or tables. We expect prose that organizes your project into a paper; you should not just string together figures, tables, and code with little context. Rather, these elements should be integrated into your overall narrative and framework.
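As one illustration of replacing raw console output with a formatted table, here is a minimal sketch of a Quarto code chunk. It assumes the knitr package is installed and uses R's built-in ToothGrowth dataset purely as a placeholder for your own data. The chunk label, caption, and column names are hypothetical choices, not course requirements.

```{r}
#| label: tbl-summary
#| tbl-cap: "Mean tooth length by supplement type and dose (placeholder data)."
#| echo: false

# ToothGrowth ships with base R; it stands in for your own dataset here.
summary_tbl <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)

# Give the columns reader-friendly names before rendering.
names(summary_tbl) <- c("Supplement", "Dose (mg/day)", "Mean length")

# kable() renders a formatted table rather than raw console output.
knitr::kable(summary_tbl, digits = 1)
```

Setting `echo: false` hides the code in the rendered report, so only the captioned table appears; the surrounding text can then describe what the table shows.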

Note that you cannot use the same dataset for both the Midterm and the Final. If you came into this class with a dataset that you want to analyze, you may want to save it for the Final.