Homework-3

How to clone your repo

Clone your homework-3 repo exactly the way we have been cloning AEs! Please see Moodle for more information.

How to turn your HW in

Homework is turned in via Gradescope. You can find the Gradescope HW-3 button on our Moodle page. Please remember to select your pages correctly when turning in your assignment. For more information, please see "Submit Homework on the Gradescope Website" on Moodle.

How to format your Homework

For each question (e.g., Question 1), put a level two (two pound signs) section header with the name of the question.

For questions with multiple parts (e.g., a, b, c), please put these labels in bold as normal text.

For example…

Question 1

Important

This homework is due Sunday, October 5th at 11:59pm.

Important

You will need to have at least 3 (meaningful) commits by the end of your homework assignment. Please practice proper version control techniques by committing and pushing after each answered question.

Packages

Start your document by making a Packages header, then copy the code chunk below into your .qmd file.

Use message: false and warning: false as code chunk arguments for this code chunk so you don’t get all of the extra unnecessary information when you render your document.

library(tidyverse)
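A minimal sketch of how these options might look inside the chunk, assuming the Quarto `#|` comment syntax for chunk options (the chunk's opening and closing fence lines are omitted here):

```r
#| message: false
#| warning: false
library(tidyverse)
```

The `#|` lines need to come first in the chunk, before any R code.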

Exercise 1 (15 points)

For each scenario, write out the appropriate null and alternative hypotheses in proper notation.

A professor wants to know if the average exam score for her class is different from the historical average of 80 on a standardized test. She believes her class will perform differently due to a new teaching method.
A space agency wants to compare the reliability of two different private companies, Provider A and Provider B, for launching satellites into orbit. The agency collects data on their recent launch history. Provider A has completed 200 launches, with 192 of them being successful. Provider B has completed 150 launches, with 138 of them being successful. The space agency wants to investigate whether Provider B has less safe launches, as it tends to use older technologies.
A politician claims that they have an approval rating of at least 50%. A political analyst wants to test this claim and believes the true approval rating is lower.

Exercise 2

Exploratory data analysis: Murderous Nurse (20 points)

For several years in the 1990s, Kristen Gilbert worked as a nurse in the intensive care unit (ICU) of the Veterans Administration Hospital in Northampton, Massachusetts. Over the course of her time there, other nurses came to suspect that she was killing patients by injecting them with the heart stimulant epinephrine. Gilbert was eventually arrested and charged with these murders. Part of the evidence presented against Gilbert at her murder trial was a statistical analysis of 1,641 randomly selected eight-hour shifts during the time Gilbert worked in the ICU. For each of these shifts, researchers recorded two variables: whether Gilbert worked on the shift and whether at least one patient died during the shift.

The data set you will be working with is called gilbert. Run the code below to create the data set.

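# One row per shift: 74 shifts with a death and 1567 without (1641 total)
gilbert <- 
  tibble(
    outcome = c(rep("died", 74), 
                rep("no-death", 1567)),
    working = c(rep("gilbert", 40),       # death, Gilbert working
                rep("no-gilbert", 34),    # death, Gilbert not working
                rep("gilbert", 217),      # no death, Gilbert working
                rep("no-gilbert", 1350))) # no death, Gilbert not working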

What are the observational units? That is, from whom or what exactly are the data being collected?
Classify each variable as categorical or quantitative. Additionally, state whether each is our response variable or our explanatory variable.

Whether Gilbert worked on the shift

Whether at least one patient died during the shift

Create a summary table using summarize() and group_by() to summarize these data. Next, write out your statistic in proper notation.
Now, create a proper data visualization to help explore these data. Comment on any patterns you observe in these data. Hint: In the appropriate geom, use position = "fill" to create a visualization that is easier to read when we have differing sample sizes. Include appropriate labels.
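As a hedged illustration of the general workflow, the sketch below uses a small made-up data set rather than gilbert. The tibble `toy` and its columns `group` and `result` are hypothetical names chosen only for this example. It shows one way to compute a proportion within each group and to draw a segmented bar plot with position = "fill".

```r
library(tidyverse)

# Hypothetical toy data (NOT the gilbert data): 10 trials in each of two groups
toy <- tibble(
  group  = c(rep("A", 10), rep("B", 10)),
  result = c(rep("yes", 3), rep("no", 7),
             rep("yes", 6), rep("no", 4))
)

# Sample size and proportion of "yes" within each group
toy_summary <- toy |>
  group_by(group) |>
  summarize(n = n(), prop_yes = mean(result == "yes"))

# Segmented bar plot; position = "fill" rescales every bar to height 1,
# so groups of different sizes can be compared as proportions
toy_plot <- ggplot(toy, aes(x = group, fill = result)) +
  geom_bar(position = "fill") +
  labs(x = "Group", y = "Proportion", fill = "Result",
       title = "Result by group (toy data)")

toy_summary
toy_plot
```

Because each bar is rescaled to the same height, the plot compares proportions rather than raw counts.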

Inference: Murderous Nurse (Confidence Interval) (40 points)

Before we explore Kristen Gilbert, we want to investigate the proportion of deaths at this hospital, by itself. Specifically we want to estimate the true proportion of deaths at this hospital.

Report the appropriate statistic, using proper notation, for the proportion of deaths at this hospital, regardless of whether Gilbert was working.
Can we justify creating a 90% confidence interval? That is, can we trust the results? Justify your answer below.
Regardless of your answer in part b, we are going to create a 90% confidence interval. First, use qnorm() to find the appropriate \(Z^*\) to create a 90% confidence interval.
Now, calculate the SE(\(\hat{p}\)). Show your work.
Report your margin of error here.
Now report your 90% confidence interval AND interpret it in the context of the problem.
Now, calculate a 70% confidence interval. Use qnorm() to find the appropriate \(Z^*\). You do not have to interpret this confidence interval.
Discuss one benefit and one drawback of having a lower confidence level below.
Interpret your confidence level (90%) below. That is, what is the meaning of 90% confident? Note: This is not an interpretation of your single confidence interval.
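The mechanics of these steps can be sketched with made-up numbers. The example below assumes a hypothetical sample proportion of 0.30 from 200 observations and a 95% confidence level, none of which come from the gilbert data. The variable names (`p_hat`, `n`, `conf`, `z_star`, `se`, `me`, `ci`) are illustrative only.

```r
# Hypothetical values, NOT from the gilbert data
p_hat <- 0.30   # sample proportion
n     <- 200    # sample size
conf  <- 0.95   # confidence level

# Critical value: leaves (1 - conf) / 2 in each tail of the standard normal
z_star <- qnorm(1 - (1 - conf) / 2)   # about 1.96 for 95%

# Standard error of a sample proportion
se <- sqrt(p_hat * (1 - p_hat) / n)

# Margin of error and the resulting interval
me <- z_star * se
ci <- c(lower = p_hat - me, upper = p_hat + me)
ci
```

Changing `conf` changes only `z_star`, which is why a lower confidence level gives a narrower interval from the same data.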

Inference: Murderous Nurse (Hypothesis Test) (20 points)

Now, we are going to investigate if there is a relationship between Gilbert working and deaths at the hospital. Specifically, we want to investigate if more deaths occur while Gilbert is working.

Write out your null and alternative hypotheses below in both words and in proper notation. Use informative subscripts.

Ho:

Ha:

We are now going to conduct a z-test to test your null and alternative hypotheses.

Before conducting this test, justify whether we can trust the results of a z-test below. Why or why not?

Regardless of your answer, calculate your z-statistic below. Show your work.
Use the pnorm() function to calculate a p-value for this hypothesis test.
Now, write an appropriate decision and conclusion in the context of the problem.
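For reference, here is a rough sketch of the arithmetic behind a z-test for a difference in two proportions, using made-up counts rather than the gilbert data. It assumes a pooled standard error, which is a common choice when the null hypothesis says the two proportions are equal; if the formula from class differs, use that one. All names (`x1`, `n1`, `p_pool`, and so on) are hypothetical.

```r
# Hypothetical counts, NOT from the gilbert data
x1 <- 30; n1 <- 100   # successes and sample size, group 1
x2 <- 20; n2 <- 100   # successes and sample size, group 2

p_hat1 <- x1 / n1
p_hat2 <- x2 / n2

# Pooled proportion: under the null, both groups share one true proportion
# (assumption: the course uses the pooled form of the standard error)
p_pool <- (x1 + x2) / (n1 + n2)
se <- sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))

# z-statistic: observed difference divided by its standard error
z <- (p_hat1 - p_hat2) / se

# One-sided p-value for the alternative "group 1 proportion is greater"
p_value <- pnorm(z, lower.tail = FALSE)

z
p_value
```

The lower.tail = FALSE argument returns the area to the right of z, which matches a "greater than" alternative.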

Exercise 3 (10 points)

A software company wants to test if a new user interface (UI) design for its mobile app affects the proportion of users who complete a core task (e.g., sharing a photo). They set up an A/B test with a control group (Old UI) and a test group (New UI). They set up the following hypotheses:

\(H_o: \pi_n - \pi_o = 0\)

\(H_a: \pi_n - \pi_o \neq 0\)

After they collect some data, they calculate a z-statistic of -0.34, with an estimated p-value of 0.734.
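The reported p-value is consistent with a two-sided test on the standard normal distribution. A small sketch of that check, assuming the p-value was computed from the rounded z-statistic:

```r
z <- -0.34

# Two-sided p-value: area in both tails beyond |z|
p_value <- 2 * pnorm(-abs(z))

round(p_value, 3)   # 0.734
```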

The researchers then conclude that there is no difference between the true proportion of users in the test group who complete a core task and the true proportion of users in the control group who complete a core task.

In as much detail as possible, critique and correct their claim.