Lab 9 - SQL

Intro

In this lab, you will be working with a hospital database. There will be four tables that we will be working with: doctor, visit, patient, and payment. The main goal of the lab is to gain experience working with SQL databases. First, we will set up our SQL database:

# install.packages(c("RSQLite", "DBI"))

library(RSQLite)
library(DBI)

# Initialize a temporary in memory database
con <- dbConnect(SQLite(), ":memory:")

# Download tables
doctor <- read.csv("https://raw.githubusercontent.com/dmcable/BIOSTAT620W26/main/data/hospital/doctor.csv")
visit <- read.csv("https://raw.githubusercontent.com/dmcable/BIOSTAT620W26/main/data/hospital/visit.csv")
patient <- read.csv("https://raw.githubusercontent.com/dmcable/BIOSTAT620W26/main/data/hospital/patient.csv")
payment <- read.csv("https://raw.githubusercontent.com/dmcable/BIOSTAT620W26/main/data/hospital/payment.csv")

# Copy data.frames to database
dbWriteTable(con, "doctor", doctor)
dbWriteTable(con, "visit", visit)
dbWriteTable(con, "patient", patient)
dbWriteTable(con, "payment", payment)
dbListTables(con)

TIP: You can use the following QUERY to see the structure of a table

PRAGMA table_info(doctor)

SQL references:

https://www.w3schools.com/sql/

Exercise 1

Edit the code below to retrieve the doctor ID, first name and last name for all doctors using the doctor table. Sort by last name and then by first name (note that the code chunk below is set up to run SQL code rather than R code).

SELECT 
FROM 
ORDER BY

Exercise 2

Retrieve the doctor ID, first name, and last name for doctors whose last name equals ‘WILLIAMS’ or ‘DAVIS’.

SELECT 
FROM 
WHERE ___ IN ('WILLIAMS', 'DAVIS')

Exercise 3

Write a query against the visit table that returns the IDs of the patients who visited a hospital starting on July 5, 2005 (use the visit_start_date column, and you can use the date() function to ignore the time component). Include a single row for each distinct patient ID.

SELECT DISTINCT 
FROM 
WHERE date(___) = '2005-07-05'

Exercise 4

Exercise 4.1

Construct a query that retrieves all columns of rows from the payment table where the amount is either 199, 799, or 999.

SELECT *
FROM ___
WHERE ___ IN (199, 799, 999)

Exercise 4.2

Construct a query that retrieves all columns of rows from the payment table where the amount is greater then 500.

SELECT *
FROM 
WHERE 

Exercise 4.3

Construct a query that retrieves all columns of rows from the payment table where the amount is greater then 500 and less then 800.

SELECT *
FROM ___
WHERE ___ AND ___

Exercise 5

Retrieve all the payment IDs and their amounts from the patients whose last name is ‘DAVIS’.

SELECT 
FROM 
  INNER JOIN 
WHERE 
AND 

Exercise 6

Exercise 6.1

Use COUNT(*) to count the number of rows in visit.

Exercise 6.2

Use COUNT(*) and GROUP BY to count the number of visits for each patient_id.

Exercise 6.3

Repeat the previous query and sort by the count in descending order.

Exercise 6.4

Repeat the previous query but use HAVING to only keep the groups with 40 or more.

Exercise 7

Write a query that calculates a number of summary statistics for amount in the payment table using MAX, MIN, AVG and SUM

Exercise 7.1

Modify the above query to do those calculations for each patient_id.

Exercise 7.2

Modify the above query to only keep the patient_ids that have more then 5 payments.

Exercise 8.1

Create a SQL query to join the doctor and visit tables. Then, use SQL to compute the average visit length for each doctor. Hint: use the SQLite function julianday to convert a text date into a number.

Exercise 8.2

What are the names of the doctors with the 5 longest and 5 shortest average visits? Hint: use the LIMIT command.

Cleanup

Run the following chunk to disconnect from the connection.

# clean up
dbDisconnect(con)