Final Project
General Instructions
For the final project you will choose one of these topics:
- COVID-19 pandemic in the US
- Excess mortality in Puerto Rico after Hurricane María.
- Topic of your choice (needs instructor approval).
There is another project choice available on predicting the election. It is not health related, but you can still choose it. If you are interested, please contact the instructor for details.
The final project will have both a written component and an in-class presentation.
You will submit your project using Git. Your project should be completely reproducible, meaning all the code and data needed to render your report from scratch should be in the repository.
You can work alone or in groups of at most three people. Please submit your proposal here by February 27th.
Sections
You will prepare a comprehensive report following the style of an academic paper. This report will be divided into the following five structured sections, with approximate word counts to help you reach a target of 2,500 to 3,000 words, up to four figures and up to two tables.
Abstract (150-200 words)
- Purpose: The abstract provides a concise summary of your project, including its objectives, key findings, and significance. Write this section last, after completing all other sections, to accurately reflect your project’s focus and main results.
- Guidelines: Limit this section to 150-200 words. Briefly outline the purpose of your study, the approach you used, and the primary results and conclusions. The abstract should be clear, succinct, and give readers an immediate understanding of what your project entails.
Introduction (500-600 words)
- Purpose: The introduction sets the stage for your project, presenting the background and rationale for your analysis. Explain why the topic is significant.
- Guidelines: Start with a broad overview of the topic, gradually narrowing down to your specific focus. Conclude with a clear statement of your research questions, hypotheses, or objectives. Use 2-3 paragraphs to establish a solid foundation for the rest of the paper.
Methods (600-700 words)
- Purpose: This section details the data sources, methods, and analytical techniques you used to conduct your analysis. It should be specific enough that someone else could replicate your study using the same resources and approach.
- Guidelines: Describe the dataset(s) you used, including information about data collection (e.g., sources, time frame). Outline your approach for cleaning and analyzing the data, including any statistical or computational methods applied. Clearly explain any assumptions or limitations in your approach.
Results (500-600 words)
- Purpose: The results section presents the main findings of your analysis without interpretation. Organize the data logically to highlight key insights, using tables, figures, and charts to illustrate trends and comparisons.
- Guidelines: For each result, briefly describe it and refer to relevant visuals or tables where appropriate. Do not provide explanations or discuss implications in this section; focus only on presenting the findings clearly and accurately.
Discussion (600-700 words)
- Purpose: In the discussion, interpret the significance of your findings, explore potential implications, and relate the results back to your initial research questions or hypotheses. This section allows you to discuss any patterns, unexpected findings, or limitations and suggest possible future research.
- Guidelines: Analyze your results in the context of your research question, linking them back to the background information from the introduction. Consider what your findings reveal, any limitations they may have, and how they might impact future work or policy. End with a brief conclusion summarizing your main insights.
Your final report should be professionally formatted, with each section clearly labeled and referenced. Aim for clarity, precision, and a well-organized presentation of your analysis.
Total Word Count: Approximately 2,500-3,000 words.
Supplementary Methods (no limit)
You can include a separate document titled Supplementary Methods.
Purpose: Share any mathematical derivations, data visualizations, or tables needed to justify the choices described in the Methods Section. You can also provide further support for the claims made in the Results Section. You can refer to this document in the main report.
Guidelines: There is no limits in the length of this section nor on the number of figures and tables. However, be careful not to drown the grader with too much information.
Examples
Here are two examples or papers related to the two first topics:
GitHut Repository
We recommend your repository include:
- Directories
code
,data
, anddocs
. - At least one script for wrangling in the
code
directory. - There should be one file called
final-project.qmd
that can be rendered to produce the final report. This can be in thecode
or home directories as long as it renders. - You should include a
README
file explaining how to reproduce all the results. - If you need to share raw data include it in the
raw-data
directory. Alternatively, you can include code that downloads the necessary data from the internet.
We expect to see at least five commits by each person.
Presentation
In addition to submitting a written report, you will prepare a brief (7-10 minute) presentation on your project. The sections in your presentation should be the same as the sections of your paper (with the exception of Abstract). The presentations will occur on April 17th and April 22nd in class.
In structuring your presentation, make sure to justify to your audience why you made the decisions you did. If you tried/considered multiple methods of data analysis, show some motivation for why you made the eventual decision. All figures that you show should be sufficiently explained to the audience. You should explain why the figures that you show support your conclusions. Design your slides and speech for clarity.
If you are working in a group, we expect that each group member will spend approximately the same amount of time speaking. Plan to leave 1-2 minutes for questions.
Covid-19
Data
You can use the data we downloaded in problem set 4.
We want you to examine the entire pandemic, until at least April 1, 2025 which means you will need to obtain population estimates through 2025.
We also want you to obtain daily or weekly overall mortality data for each state.
Tasks and questions
Divide the pandemic period, January 2020 to April 2025 into waves. Justify your choice with data visualization.
For each period compute the deaths rates by state. Describe which states did better or worse during the different periods.
Describe if COVID-19 became less or more virulent across the different periods.
Estimate excess mortality for each week for each state. Do COVID-19 deaths explain the excess mortality?
Repeat 2 but for excess mortality instead of COVID-19 deaths.
Excess mortality
Data
You can use puerto_rico_counts
in the excessmort package.
We want you to wrangle the data from the pdf report shared by the New York Times. https://github.com/c2-d2/pr_mort_official/raw/master/data/Mortalidad-RegDem-2015-17-NYT-part1.pdf
Tasks and questions
Examine the population sizes by age group and sex. Describe any interesting patterns.
Use data from before 2017 to estimate expected mortality and a standard deviation for each week. Do this by age group and sex. Describe tendencies you observe. You can combine data into bigger age groups if the data show they have similar death rates.
Explore the data to see if there are periods during or before 2017 that appear to have excess mortality. If so, explain and recompute expected death rates removing these periods.
Estimate excess deaths for each week of 2017-2018. Make sure you define the weeks so that one of the weeks starts the day María made landfall. Comment on excess mortality. Which age groups were affected? Were men and women affected differently?
Extract the data from the PDF shared with NY Times. Comment on how it matches with excessmort package data.