Lab 4 - Data Visualization

Learning Goals

  • Read in and prepare the COVID-19 dataset
  • Create several graphs with different geom_*() functions in ggplot2
  • Create a facet graph
  • Customize your plots
  • Create a detailed map

Lab Description

In this lab, we will work with the COVID-19 data from the CDC. The data has already been preprocessed.

The objective of the lab is to examine the association between weekly average COVID-19 death and vaccination rates in ten regions of the US.

Note the following variable definitions:

  • mmwr_year: year (according to the MMWR Calendar)
  • mmwr_week: week (according to the MMWR Calendar)
  • pop: state population
  • cases: weekly COVID cases
  • hosp: weekly COVID hospitalizations
  • deaths: weekly COVID deaths
  • series: cumulative vaccinated population
  • booster: cumulative boosted population

Steps

1. Read in the data

First download the file, then read it in with read.csv().

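if (!file.exists("covid_processed.csv")) {
  # download.file() has no timeout argument; the limit comes from options(timeout)
  options(timeout = 60)
  download.file(
    url      = "https://raw.githubusercontent.com/dmcable/BIOSTAT620W26/main/data/covid/covid_processed.csv",
    destfile = "covid_processed.csv",
    method   = "libcurl"
  )
}

# Read the downloaded file (the object name `covid` is only a suggestion)
covid <- read.csv("covid_processed.csv")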

2. Prepare the data

  • Since states have different population sizes, work with population-normalized rates (e.g. cases per 100,000 population). Transform the following variables accordingly: cases, hosp, booster, series, deaths.
  • Use the ymd function from the lubridate package to convert the date variable to a Date object, which is easier to work with for arithmetic and plotting.
  • Compute the mean by state of the normalized variables above. We recommend creating a new data.frame with the state-averaged data. Keep any other relevant variables from the original data (e.g. population, region, etc.).
  • Create a logical variable for low vs high population.
  • Make sure that region is a factor.
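
The preparation steps above could look roughly like the following sketch. It runs on a tiny made-up data frame (`toy`) rather than the real file. The column names `state`, `region` and `date`, the new names `cases_rate`, `deaths_rate` and `high_pop`, and the median cutoff for "high population" are all assumptions for illustration, so adapt them to what you find in the data.

```r
library(dplyr)
library(lubridate)

# Toy stand-in for the real data; every value here is made up.
toy <- data.frame(
  state  = c("A", "A", "B", "B"),
  region = c(1, 1, 2, 2),
  date   = c("2021-01-02", "2021-01-09", "2021-01-02", "2021-01-09"),
  pop    = c(1e6, 1e6, 5e6, 5e6),
  cases  = c(100, 300, 1000, 2000),
  deaths = c(1, 3, 10, 20)
)

toy <- toy %>%
  mutate(
    date        = ymd(date),            # character -> Date
    cases_rate  = cases  / pop * 1e5,   # per 100,000 population
    deaths_rate = deaths / pop * 1e5,
    region      = factor(region)
  )

# One row per state: mean of the weekly rates, keeping pop and region
state_avg <- toy %>%
  group_by(state, region, pop) %>%
  summarise(
    cases_rate  = mean(cases_rate),
    deaths_rate = mean(deaths_rate),
    .groups = "drop"
  ) %>%
  mutate(high_pop = pop > median(pop))  # assumed cutoff: the median population
```

The same pattern extends to hosp, booster and series. Any other sensible cutoff for the population category works just as well as the median.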

3. Use geom_violin to examine the case rates and death rates by region

You saw how to use geom_boxplot in class. Try geom_violin instead (take a look at its help page). Hint: you will need to set the x aesthetic to 1.

  • Use facets
  • Fill color by region
  • Describe what you observe in the graph
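
As a starting point, a violin plot with a constant x, one facet per region, and fill by region might be sketched as below. The data frame is simulated, and the column names `region` and `cases_rate` are assumptions. Only the case rate is shown; the death rate would follow the same pattern.

```r
library(ggplot2)

set.seed(1)
# Made-up state-level averages for three regions.
toy <- data.frame(
  region     = factor(rep(1:3, each = 10)),
  cases_rate = c(rnorm(10, 200, 20), rnorm(10, 250, 40), rnorm(10, 180, 10))
)

p_violin <- ggplot(toy, aes(x = 1, y = cases_rate, fill = region)) +
  geom_violin() +
  facet_wrap(~ region, nrow = 1) +            # one panel per region
  labs(x = NULL, y = "Mean weekly cases per 100,000") +
  theme(axis.text.x = element_blank(),        # the constant x carries no information
        axis.ticks.x = element_blank())

p_violin
```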

4. Use geom_point with stat_smooth to examine the association between time and case rates by region

  • Filter the data to exclude the first two weeks in the dataset
  • Color points by region
  • Fit a function with stat_smooth by region. Does method=lm or the default method work better?
  • For the default stat_smooth, play around with the span parameter to get a smooth that fits the data better without being too noisy.
  • Describe what you observe in the graph
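
One possible shape for this plot is sketched below on simulated weekly data; the column names `date`, `region` and `cases_rate` are assumptions. Note that when method is left at its default, stat_smooth uses loess only for groups with fewer than 1,000 observations and switches to a GAM otherwise, and span applies only to loess. Setting method = "loess" explicitly, as done here, makes sure span has an effect.

```r
library(ggplot2)

set.seed(2)
# Made-up weekly series for two regions.
weeks <- seq(as.Date("2021-01-02"), by = "week", length.out = 40)
toy <- data.frame(
  date   = rep(weeks, times = 2),
  region = factor(rep(1:2, each = 40))
)
toy$cases_rate <- 100 + 50 * sin(as.numeric(toy$date) / 40) +
  10 * as.integer(toy$region) + rnorm(nrow(toy), sd = 10)

# Drop the first two weeks that appear in the data
first_weeks <- sort(unique(toy$date))[1:2]
toy_late <- toy[!toy$date %in% first_weeks, ]

p_smooth <- ggplot(toy_late, aes(x = date, y = cases_rate, color = region)) +
  geom_point(alpha = 0.6) +
  # Smaller span = wigglier curve; larger span = smoother curve
  stat_smooth(method = "loess", formula = y ~ x, span = 0.3, se = FALSE)

# For comparison, a straight-line fit per region
p_lm <- ggplot(toy_late, aes(x = date, y = cases_rate, color = region)) +
  geom_point(alpha = 0.6) +
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE)
```

Comparing `p_smooth` and `p_lm` side by side is one way to answer the method question above.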

5. Use geom_bar to create barplots of the states by population category colored by region

  • Bars by population category using position="dodge"
  • Change the colors from the default: fill by region using scale_fill_brewer (see its documentation)
  • Create nice labels on the axes and add a title
  • Describe what you observe in the graph
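
A minimal sketch of such a barplot follows, again on a made-up table. The column `pop_cat` (a text version of the low/high population variable) and the choice of the "Paired" palette are assumptions. "Paired" is used because it offers up to 12 colors, whereas some Brewer palettes such as "Set2" stop at 8, which would not cover ten regions.

```r
library(ggplot2)

# Made-up state-level table: one row per state.
toy <- data.frame(
  state   = paste0("S", 1:8),
  region  = factor(c(1, 1, 2, 2, 2, 3, 3, 3)),
  pop_cat = c("High", "Low", "High", "High", "Low", "Low", "Low", "High")
)

p_bar <- ggplot(toy, aes(x = pop_cat, fill = region)) +
  geom_bar(position = "dodge") +          # counts states per category and region
  scale_fill_brewer(palette = "Paired", name = "Region") +
  labs(
    x     = "Population category",
    y     = "Number of states",
    title = "States by population category and region"
  )

p_bar
```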

6. Use stat_summary to examine mean vaccination rate and death rate by region with standard deviation error bars

  • Use fun.data="mean_sdl" in stat_summary. Look up what mean_sdl does.
  • Add another layer of stat_summary but change the geom to "errorbar" (see the help).
  • Describe the graph and what you observe
  • Vaccination rates are …
  • Death rates are…
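
The two stat_summary layers could be sketched as follows, using simulated death rates with assumed column names `region` and `deaths_rate`. Two things worth knowing: fun.data = "mean_sdl" relies on the Hmisc package being installed, and by default mean_sdl returns the mean plus or minus two standard deviations. Passing mult = 1 through fun.args, as below, gives bars of one standard deviation.

```r
library(ggplot2)   # rendering this plot also needs the Hmisc package installed

set.seed(3)
# Made-up state-level death rates for three regions.
toy <- data.frame(
  region      = factor(rep(1:3, each = 8)),
  deaths_rate = c(rnorm(8, 2, 0.3), rnorm(8, 3, 0.6), rnorm(8, 1.5, 0.2))
)

p_sd <- ggplot(toy, aes(x = region, y = deaths_rate)) +
  # First layer: mean and +/- 1 SD with the default geom
  stat_summary(fun.data = "mean_sdl", fun.args = list(mult = 1)) +
  # Second layer: same summary drawn as error bars
  stat_summary(fun.data = "mean_sdl", fun.args = list(mult = 1),
               geom = "errorbar", width = 0.2) +
  labs(x = "Region", y = "Mean weekly deaths per 100,000")
```

The vaccination-rate version would swap in the normalized series (or booster) column for `deaths_rate`.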

7. Make a map showing the spatial trend in COVID deaths in the US

  • Use geom_polygon and map_data. Modify the given code.
  • Merge the map_data data.frame with your COVID data.frame to add in the COVID data to the map data. Hint: use left_join.
  • Use a color palette with custom colors
  • Make sure that your legend is labelled correctly.
  • Add a * or other label for the 10 states with the highest death rates.
  • Make sure that the aspect ratio is appropriate
  • Describe your observations
  • Describe the trend in COVID death rates across the US
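
The sketch below shows one way the pieces could fit together; it is not the given code referred to above. The death rates are random numbers, the column names `state_name` and `deaths_rate` are assumptions, and the label positions come from R's built-in state.center dataset. Keep in mind that map_data("state") needs the maps package, stores lower-case state names in a column called `region` (which can clash with a COVID `region` column), and does not include Alaska or Hawaii.

```r
library(ggplot2)   # map_data() also needs the maps package installed
library(dplyr)

set.seed(4)
# Made-up death rates for the 50 states; replace with your state averages.
toy <- data.frame(
  state_name  = tolower(state.name),   # match the lower-case names in the map data
  deaths_rate = runif(50, 1, 5)
)

us_map <- map_data("state")            # long, lat, group, order, region, subregion

# left_join keeps the row order of us_map, which geom_polygon relies on
map_covid <- left_join(us_map, toy, by = c("region" = "state_name"))

# Label positions for the 10 highest rates among states drawn on the map
top10 <- data.frame(
  state_name = tolower(state.name),
  long       = state.center$x,
  lat        = state.center$y
) %>%
  inner_join(toy, by = "state_name") %>%
  filter(state_name %in% us_map$region) %>%
  slice_max(deaths_rate, n = 10, with_ties = FALSE)

p_map <- ggplot(map_covid, aes(x = long, y = lat)) +
  geom_polygon(aes(group = group, fill = deaths_rate), color = "white") +
  geom_text(data = top10, label = "*", size = 6) +
  scale_fill_gradient(low = "lightyellow", high = "darkred",
                      name = "Mean weekly deaths\nper 100,000") +
  coord_quickmap() +                   # keeps a sensible aspect ratio
  theme_void()
```

If your COVID data identifies states by abbreviation, a lookup built from state.abb and state.name is one way to get matching keys.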

8. Use a ggplot extension

  • Pick an extension (except cowplot) from the ggplot2 extensions gallery and make a plot of your choice using the COVID data
  • You might want to try examples that come with the extension first (e.g. ggtech, gganimate, ggforce)
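
As one illustration of what an extension adds, the sketch below uses ggrepel (chosen here only as an example, it is not named above) to label points without overlap. The data and the column names `vax_rate` and `deaths_rate` are made up.

```r
library(ggplot2)
library(ggrepel)   # extension package; install it separately

set.seed(5)
# Made-up state-level rates for a dozen states.
toy <- data.frame(
  state       = state.abb[1:12],
  vax_rate    = runif(12, 40, 80),
  deaths_rate = runif(12, 1, 5)
)

p_ext <- ggplot(toy, aes(x = vax_rate, y = deaths_rate, label = state)) +
  geom_point() +
  geom_text_repel() +                  # labels are nudged apart automatically
  labs(x = "Vaccination rate", y = "Death rate")
```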

9. Submission (due Tuesday, Feb 10 at 8:30am)

Submit through the course GitHub issues page.