Problem set 10

Published

April 20, 2025

The data for this problem set is provided by this link: https://github.com/dmcable/BIOSTAT620/raw/refs/heads/main/data/pset-10-mnist.rds

Read this object into R. For example, you can use:

fn <- tempfile()
download.file("https://github.com/dmcable/BIOSTAT620/raw/refs/heads/main/data/pset-10-mnist.rds", fn)
dat <- readRDS(fn)
file.remove(fn)

The object is a list with two components dat$train and dat$test. Use the data in dat$train to develop a machine learning algorithms to predict the labels for the images in the dat$test$images component.

Save the your predicted labels in an object called digit_predictions. This should be a vector of integers with length nrow(dat$test$images). It is important that the digit_predictions is ordered to match the rows of dat$test$images.

Save the object to a file called digit_predictions.rds using:

saveRDS(digit_predictions, file = "digit_predictions.rds")

You will submit:

  1. The file digit_predictions.rds

  2. A quarto file that reproduces your analysis and provides brief explanations for your choices.

If your code reproduces the result, your grade will be your accuracy rounded up the closest integer. So, for example, if your accuracy is .993 your grade will be 100%. Depending on the distribution of accuracy values, the teaching staff may issue an update about the grading system used.

You will have one opportunities to redo your predictions after you see your accuracy from your first submission.