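# Allow up to 60 seconds for the download. download.file() takes its timeout
# from options(timeout = ...), not from a `timeout` argument.
options(timeout = 60)
if (!file.exists("covid_processed.csv")) {
  download.file(
    url = "https://raw.githubusercontent.com/dmcable/BIOSTAT620W26/main/data/covid/covid_processed.csv",
    destfile = "covid_processed.csv",
    method = "libcurl"
  )
}

Lab 4 - Data Visualization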
Learning Goals
- Read in and prepare the COVID-19 dataset
- Create several graphs with different geoms() in ggplot2
- Create a facet graph
- Customize your plots
- Create a detailed map
Lab Description
In this lab, we will work with the COVID-19 data from the CDC. The data has already been preprocessed.
The objective of the lab is to examine the association between weekly average COVID-19 death and vaccination rates in ten regions of the US.
Note the following variable definitions:
- mmwr_year: year (according to the MMWR Calendar)
- mmwr_week: week (according to the MMWR Calendar)
- pop: state population
- cases: weekly COVID cases
- hosp: weekly COVID hospitalizations
- deaths: weekly COVID deaths
- series: cumulative vaccinated population
- booster: cumulative boosted population
Steps
1. Read in the data
First download the data, then read it in with read.csv().
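A minimal sketch of the read-in step. So that it runs without the real download, it writes a tiny stand-in CSV to a temporary file. The `state` column name and all values are assumptions made up for illustration, and the other column names follow the variable definitions above.

```r
# Tiny stand-in file (made-up values) so the sketch runs on its own.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "state,mmwr_year,mmwr_week,pop,cases",
  "MA,2021,1,7000000,35000",
  "MI,2021,1,10000000,40000"
), tmp)

# For the lab itself, pass "covid_processed.csv" instead of tmp.
covid <- read.csv(tmp)

str(covid)   # check column names and types before doing anything else
head(covid)  # look at the first few rows
```

Checking `str()` right after reading is a cheap way to see whether the date column came in as character, which matters for the next step.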
2. Prepare the data
- Since states have different population sizes, make sure that you are working with population-normalized rates (e.g. cases per 100,000 population). Transform the following variables accordingly: cases, hosp, booster, series, deaths.
- Use the ymd function from the lubridate package to convert the date variable to a Date object. This is easier to work with for arithmetic and plotting.
- Compute the mean by state of the normalized variables above. We recommend creating a new data.frame with the state-averaged data. Keep any other relevant variables from the original data (e.g. population, region etc).
- Create a logical variable for low vs high population.
- Make sure that region is a factor
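The preparation steps above could be sketched as follows on a toy data frame. The column names `state`, `region`, and `date`, the `_rate` suffix, the median cutoff for "high population", and all values are assumptions for illustration. The real data may call for different choices.

```r
library(dplyr)
library(lubridate)

# Toy data with made-up values; two weeks for each of two states.
covid <- data.frame(
  state  = c("MA", "MA", "MI", "MI"),
  region = c(1, 1, 5, 5),
  date   = c("2021-01-09", "2021-01-16", "2021-01-09", "2021-01-16"),
  pop    = c(7e6, 7e6, 1e7, 1e7),
  cases  = c(35000, 28000, 40000, 30000),
  deaths = c(350, 280, 500, 400)
)

covid <- covid %>%
  mutate(
    date        = ymd(date),            # character -> Date
    region      = factor(region),       # region as a factor
    cases_rate  = cases / pop * 1e5,    # per 100,000 population
    deaths_rate = deaths / pop * 1e5
  )

# One row per state: mean of the normalized variables, keeping pop and region.
state_avg <- covid %>%
  group_by(state, region, pop) %>%
  summarise(across(c(cases_rate, deaths_rate), mean), .groups = "drop") %>%
  mutate(high_pop = pop > median(pop))  # assumed cutoff: above the median
```

The same `mutate()` pattern extends to hosp, booster, and series. Grouping by `pop` and `region` as well as `state` is one way to carry those columns through `summarise()`, since they are constant within a state.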
3. Use geom_violin to examine the case rates and death rates by region
You saw how to use geom_boxplot in class. Try using geom_violin instead (take a look at the help). (hint: you will need to set the x aesthetic to 1)
- Use facets
- Fill color by region
- Describe what you observe in the graph
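As a rough illustration of the violin layout, the sketch below uses simulated data rather than the COVID data. The `cases_rate` column name and the three regions are assumptions.

```r
library(ggplot2)

# Simulated stand-in data: 20 made-up observations per region.
set.seed(1)
toy <- data.frame(
  region     = factor(rep(1:3, each = 20)),
  cases_rate = c(rnorm(20, 300, 40), rnorm(20, 400, 60), rnorm(20, 350, 50))
)

# x is fixed at 1 because each facet holds a single violin.
p_violin <- ggplot(toy, aes(x = 1, y = cases_rate, fill = region)) +
  geom_violin() +
  facet_wrap(~ region, nrow = 1) +
  labs(x = NULL, y = "Cases per 100,000", fill = "Region") +
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())

# print(p_violin) draws the plot.
```

Hiding the x-axis text is optional. It is done here only because the constant `x = 1` carries no information.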
4. Use geom_point with stat_smooth to examine the association between time and case rates by region
- Filter the data to exclude the first two weeks in the dataset
- Color points by region
- Fit a function with stat_smooth by region. Does method=lm or the default method work better?
- For the default stat_smooth, play around with the span parameter to get a smooth that fits the data better without being too noisy.
- Describe what you observe in the graph
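A sketch of the point-plus-smooth pattern on simulated weekly data. The column names, the two regions, and the `span = 0.5` starting value are assumptions. With fewer than 1,000 observations the default smoother is loess, which is written out explicitly here so that `span` clearly applies.

```r
library(ggplot2)

# Simulated weekly series for two regions (made-up values).
set.seed(2)
weeks <- seq(as.Date("2021-01-02"), by = "week", length.out = 30)
toy <- data.frame(
  date   = rep(weeks, times = 2),
  region = factor(rep(c(1, 2), each = 30))
)
toy$cases_rate <- 300 + 100 * sin(seq_along(weeks) / 5) + rnorm(60, sd = 20)

# Drop the first two weeks of the series.
later <- subset(toy, date >= min(date) + 14)

p_smooth <- ggplot(later, aes(x = date, y = cases_rate, color = region)) +
  geom_point() +
  # loess smooth; a smaller span follows the data more closely
  stat_smooth(method = "loess", formula = y ~ x, span = 0.5, se = FALSE) +
  # straight-line fit for comparison
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE, linetype = "dashed")

# print(p_smooth) draws the plot.
```

Drawing both fits on one plot makes the comparison between `method = "lm"` and the default easier to judge by eye.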
5. Use geom_bar to create barplots of the states by population category colored by region
- Bars by population category using position="dodge"
- Change colors from the default. Color by region using scale_fill_brewer (see this)
- Create nice labels on the axes and add a title
- Describe what you observe in the graph
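One possible shape for the barplot, using a made-up state-level table. The `high_pop` column, the state labels, and the "Set2" palette are assumptions chosen for illustration.

```r
library(ggplot2)

# Made-up state-level table: six states in three regions.
state_avg <- data.frame(
  state    = c("A", "B", "C", "D", "E", "F"),
  region   = factor(c(1, 1, 2, 2, 3, 3)),
  high_pop = c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE)
)

# geom_bar() counts rows, so each bar is the number of states in a
# population category and region; "dodge" places regions side by side.
p_bar <- ggplot(state_avg, aes(x = high_pop, fill = region)) +
  geom_bar(position = "dodge") +
  scale_fill_brewer(palette = "Set2") +
  labs(
    x     = "High population state",
    y     = "Number of states",
    fill  = "Region",
    title = "States by population category and region"
  )

# print(p_bar) draws the plot.
```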
6. Use stat_summary to examine mean vaccination rate and death rate by region with standard deviation error bars
- Use fun.data="mean_sdl" in stat_summary. Look up what mean_sdl does.
- Add another layer of stat_summary but change the geom to "errorbar" (see the help).
- Describe the graph and what you observe
- Vaccination rates are …
- Death rates are…
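A sketch of the two-layer stat_summary pattern on made-up data. `mean_sdl` wraps `Hmisc::smean.sdl()`, so the Hmisc package must be installed to draw the plot. Its default multiplier is 2 (mean ± 2 SD), so `fun.args = list(mult = 1)` is passed here on the assumption that ± 1 SD bars are wanted. The column names are also assumptions.

```r
library(ggplot2)

# Made-up death rates for two regions.
toy <- data.frame(
  region      = factor(rep(c(1, 2), each = 3)),
  deaths_rate = c(2, 4, 6, 5, 7, 9)
)

# What mean_sdl with mult = 1 computes for one group, in base R.
x <- toy$deaths_rate[toy$region == 1]
by_hand <- c(y = mean(x), ymin = mean(x) - sd(x), ymax = mean(x) + sd(x))

p_summary <- ggplot(toy, aes(x = region, y = deaths_rate)) +
  # default geom: a point at the mean with a range line
  stat_summary(fun.data = "mean_sdl", fun.args = list(mult = 1)) +
  # same summary drawn as capped error bars
  stat_summary(fun.data = "mean_sdl", fun.args = list(mult = 1),
               geom = "errorbar", width = 0.2)

# print(p_summary) draws the plot (requires Hmisc).
```

The same code with `y` set to a vaccination rate column gives the companion plot for vaccination.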
7. Make a map showing the spatial trend in COVID deaths in the US
- Use geom_polygon and map_data. Modify the given code.
- Merge the map_data data.frame with your COVID data.frame to add in the COVID data to the map data. Hint: use left_join.
- Use a color palette with custom colors
- Make sure that your legend is labelled correctly.
- Add a * or other label for the 10 states with the highest death rates.
- Make sure that the aspect ratio is appropriate
- Describe your observations
- Describe the trend in COVID death rates across the US
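A hedged sketch of the map workflow, not the code given in class. It assumes the maps package is installed (`map_data("state")` needs it) and that the COVID table identifies states by postal abbreviation. The three death rates are made up, and only the top 2 states are starred here in place of the top 10.

```r
library(ggplot2)
library(dplyr)

us_states <- map_data("state")  # polygons; `region` holds lowercase state names

# Made-up state-level death rates keyed by abbreviation (assumed format).
state_avg <- data.frame(
  state       = c("MA", "MI", "TX"),
  deaths_rate = c(2.1, 3.4, 2.8)
)
# Convert abbreviations to the lowercase names map_data uses.
state_avg$state_name <- tolower(state.name[match(state_avg$state, state.abb)])

# Keep every polygon row; states without COVID data get NA.
map_df <- left_join(us_states, state_avg, by = c("region" = "state_name"))

# Rough label positions: mean of each polygon's vertices, top 2 states only.
top_labels <- map_df %>%
  filter(!is.na(deaths_rate)) %>%
  group_by(region, deaths_rate) %>%
  summarise(long = mean(long), lat = mean(lat), .groups = "drop") %>%
  slice_max(deaths_rate, n = 2)

p_map <- ggplot(map_df, aes(x = long, y = lat, group = group, fill = deaths_rate)) +
  geom_polygon(color = "white") +
  geom_text(data = top_labels, aes(x = long, y = lat), label = "*",
            size = 8, inherit.aes = FALSE) +
  scale_fill_gradient(low = "lightyellow", high = "darkred",
                      na.value = "grey90", name = "Deaths per 100,000") +
  coord_fixed(ratio = 1.3)  # keeps the map from looking stretched

# print(p_map) draws the map.
```

Note that `map_data` calls its state-name column `region`. If the COVID data frame also has a `region` column, `left_join` will add suffixes to tell the two apart, so renaming one of them before the join avoids confusion.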
8. Use a ggplot extension
- Pick an extension (except cowplot) from here and make a plot of your choice using the COVID data
- You might want to try examples that come with the extension first (e.g. ggtech, gganimate, ggforce)
9. Submission (due Tuesday, Feb 10 at 8:30am)
Submit through the course GitHub issues page.