R Users Guide To Stat 201: Chapter 8 And 9 08 09
User Manual: Pdf
Open the PDF directly: View PDF .
Page Count: 7
Download | ![]() |
Open PDF In Browser | View PDF |
R Users Guide to Stat 201: Chapter 8 and 9 Michael Shyne, 2017 Chapters 8 and 9: Hypothesis Testing Chapters 8 and 9 both concern hypothesis testing, specifically proportion tests and mean tests. Chapter 8 introduces one sample tests and chapter 9 continues with two sample tests. However, the R functions are the same for one sample and two sample tests with only small changes to parameters. Thus, we will tackle both chapters together in this guide. Proportion tests One sample proportion tests are conducted to determine whether a sample is drawn from a population that has a particular proportion of “successes.” In R, they are conducted using the prop.test() function, which takes the parameters x (number of successes) and n (sample size). This is equivalent to the StatCrunch test “with summary”. To conduct a test of a sample with 125 successes out of a sample size of 300, prop.test(125, 300, correct=FALSE ) ## ## ## ## ## ## ## ## ## ## ## 1-sample proportions test without continuity correction data: 125 out of 300, null probability 0.5 X-squared = 8.3333, df = 1, p-value = 0.003892 alternative hypothesis: true p is not equal to 0.5 95 percent confidence interval: 0.3622761 0.4731644 sample estimates: p 0.4166667 When running an hypothesis test, often the first thing we are interest in is the p-value. This can be found on the second line of output (not counting the title and blank lines). For this test, p-value = 0.003892. Next, we would like to know the test statistic. This is also found in the second line. However, we are not given a z value as we might expect. Instead, R reports X-squared = 8.3333. The X here is not really an “X”, but rather represents the Greek letter “chi” (χ). Thus, the test statistic is the χ2 (chi-squared) statistic. The chi2 distribution will be introduced later in the course. For now, it suffices to note that chi2 is related to the z distribution squared. In fact, for our one sample proportion test, the the chi2 statistic is the square of the z statistic. To find the z test statistic, simply take the square root of chi2 value keeping in mind the z statistic will be negative is the sample proportion is less than the null proportion (more on this later). The test output gives us a lot more information. The first line reports the data used to conduct the test, as well as the proportion test against as the null probability. Since we did not specify a proportion to test, the function used the default value of 0.5. The third line tells us the alternative hypothesis used. Again, since we did not specify a value, it used the default alternative hypothesis of “not equal” to the null proportion. Lines four and five give us a confidence interval, defaulting to a 95% interval. Lines six and seven report the sample proportion. You should have noticed the correct=FALSE parameter we added to this first function call. By default, R will apply a Yates’ continuity correct to the test. The details of this correct are beyond the scope of this class. In order to get the same results as StatCrunch, and the results MyStatLab expects, we need to always include the correct=FALSE option to prevent the correction from being used. 1 Of course, we will often want to change the proportion we are testing against and the alternative hypothesis test. To change the null proportion, include the p= parameter. It can take any valid proportion (0 < p < 1). To use a different alternative hypothesis, include the parameter alternative= (this can be shorted to alt=). It can take one of three strings, " “two.sided”, “less” or “greater” (these can also be abbreviated, i.e “two” or “great”). Thus, to test whether a population proportion is greater then 0.45, using the same data as above, prop.test(125, 300, p=0.45, alt='greater', correct=FALSE) ## ## ## ## ## ## ## ## ## ## ## # Don't forget the correction 1-sample proportions test without continuity correction data: 125 out of 300, null probability 0.45 X-squared = 1.3468, df = 1, p-value = 0.8771 alternative hypothesis: true p is greater than 0.45 95 percent confidence interval: 0.3707965 1.0000000 sample estimates: p 0.4166667 A note about confidence intervals and one-sided tests: As stated above the prop.test() function, as well as many other R functions, defaults to a 95% confidence interval. This can be adjusted by including the conf.int= parameter in the function call (for example, confi.int=0.9). However, when testing a one-sided alternative hypothesis (p > 0.035), R will report a “proper” confidence interval. Such intervals are not symmetric, putting the whole rejection region on one side or the other of the distribution. Thus, it is not necessary to employ to trick necessary in StatCrunch of calculating a confidence interval with a 2 × α rejection region. Test results as objects While the output from prop.test() provides a lot of useful information, sometimes we wish to use the results in a different manner. If we want to automatically store the results in some fashion, such as logging the results in an automated process, or perhaps we want to do further calculations with the results. The output of the proportion test function and, as we will see, many other R functions can be stored in a variable. p.test.results <- prop.test(125, 300, p=0.45, alt='greater', correct=FALSE) As you can see, when we do this we don’t get any kind of output. If we want to see the original output, we can simply type the variable name. p.test.results ## ## ## ## ## ## ## ## ## ## ## 1-sample proportions test without continuity correction data: 125 out of 300, null probability 0.45 X-squared = 1.3468, df = 1, p-value = 0.8771 alternative hypothesis: true p is greater than 0.45 95 percent confidence interval: 0.3707965 1.0000000 sample estimates: p 0.4166667 To see what information the variable contains, we can examine its structure. 2 str(p.test.results) ## List of 9 ## $ statistic : Named num 1.35 ## ..- attr(*, "names")= chr "X-squared" ## $ parameter : Named int 1 ## ..- attr(*, "names")= chr "df" ## $ p.value : num 0.877 ## $ estimate : Named num 0.417 ## ..- attr(*, "names")= chr "p" ## $ null.value : Named num 0.45 ## ..- attr(*, "names")= chr "p" ## $ conf.int : atomic [1:2] 0.371 1 ## ..- attr(*, "conf.level")= num 0.95 ## $ alternative: chr "greater" ## $ method : chr "1-sample proportions test without continuity correction" ## $ data.name : chr "125 out of 300, null probability 0.45" ## - attr(*, "class")= chr "htest" The output of the str() function can be intimidating, but if we look carefully, we can see out result object contains all the relevant information provided in the default output. By referencing the named parameters, we display specific values or do further calculations. # What is the p-value? p.test.results$p.value ## [1] 0.877081 # What is the test statistic? p.test.results$statistic ## X-squared ## 1.346801 # The z statistic is the square root of the chi-squared value z <- sqrt(p.test.results$statistic) names(z)<- "z statistic" # Change the "name" of the statistic # This is only for display purposes # If the sample proportion is less than the null proportion. # then z will have a negative value if (p.test.results$estimate < p.test.results$null.value) z <- -1 * z z ## z statistic ## -1.160518 Two sample proportion tests Two sample proportion tests are conducted much in the same way as one sample tests. The function prop.test() is still used, but now vectors of successes and sample sizes are passed to the function. For example, to test our original sample of 125 successes out of 300 against another sample with 142 successes out of a sample size of 290, 3 prop.test(c(125,142), c(300,290), correct=FALSE) ## ## ## ## ## ## ## ## ## ## ## ## 2-sample test for equality of proportions without continuity correction data: c(125, 142) out of c(300, 290) X-squared = 3.1708, df = 1, p-value = 0.07497 alternative hypothesis: two.sided 95 percent confidence interval: -0.153128869 0.007151858 sample estimates: prop 1 prop 2 0.4166667 0.4896552 Much of what we did with one sample proportions can be done with two sample proportions. One important difference is the the null proportion parameter p= works differently and, for our purposes, be ignored in two proportion tests. Proportion tests “with data” Often proportion data will be reported as successes out of sample size as we’ve been working with. Sometimes, however, you have a full data set you wish to use for hypothesis tests. While you could count values that correspond to successes and then conduct test as we have been, there are some sort cuts available to us in R. Consider the mtcars data set. The variable am refers to the presence of automatic transmission (0 = automatic, 1 = manual). To summarize this data, we can use the table() function. table(mtcars$am) ## ## 0 1 ## 19 13 We can see that there are 19 cars with automatic transmissions and 13 with manual transmissions. We can pass this table straight into prop.test(). prop.test(table(mtcars$am), correct=F) ## ## ## ## ## ## ## ## ## ## ## 1-sample proportions test without continuity correction data: table(mtcars$am), null probability 0.5 X-squared = 1.125, df = 1, p-value = 0.2888 alternative hypothesis: true p is not equal to 0.5 95 percent confidence interval: 0.4226002 0.7448037 sample estimates: p 0.59375 If we conduct the same test with the data is summary form (19 successes out of 32 sample size), we get identical results. prop.test(19, 32, correct=F) 4 ## ## ## ## ## ## ## ## ## ## ## 1-sample proportions test without continuity correction data: 19 out of 32, null probability 0.5 X-squared = 1.125, df = 1, p-value = 0.2888 alternative hypothesis: true p is not equal to 0.5 95 percent confidence interval: 0.4226002 0.7448037 sample estimates: p 0.59375 Keep in mind, R considers the first item in the table, alphabetically, the “successes”. In this example, the automatic transmissions with a 0 value were the successes. It can take some effort if you want to consider a different value the successes. If the variable you are testing has named values stored as character strings, it might require renaming the categories so that the category of interest comes first alphabetically. In our case, since the values are stored as 1 and 0, we can test 1 - am to “flip” the values. # Test 1 - am, so manual transmissions (am=1) come first prop.test(table(1-mtcars$am), correct=FALSE) ## ## ## ## ## ## ## ## ## ## ## 1-sample proportions test without continuity correction data: table(1 - mtcars$am), null probability 0.5 X-squared = 1.125, df = 1, p-value = 0.2888 alternative hypothesis: true p is not equal to 0.5 95 percent confidence interval: 0.2551963 0.5773998 sample estimates: p 0.40625 Mean tests (t tests) One sample mean tests are conducted to determine whether a sample is drawn from a population that has a particular mean parameter (µ). In R, we will use the function t.test(). This works similarly to prop.test() with a few notable exception. First, t.test() expects sample data. It does not work with summary data as StatCrunch can. Thus, the first parameter in the function call should be a vector of numeric data. Second, the function parameter for the null value is mu= rather than p=. Using the mtcars data set again, we can test if the mean fuel efficiency of the cars is less than 25 mpg. t.test(mtcars$mpg, mu=25, alt='less') ## ## ## ## ## ## ## ## ## ## One Sample t-test data: mtcars$mpg t = -4.6079, df = 31, p-value = 3.293e-05 alternative hypothesis: true mean is less than 25 95 percent confidence interval: -Inf 21.89707 sample estimates: mean of x 5 ## 20.09062 We can see that the output is very similar to the output for proportion tests. The p-value, test statistic (t-value) and confidence intervals are available. Like proportion tests, results can be stored in a variable for further manipulation. Two independent sample t tests Two sample t tests are conducted by passing two numeric vectors to t.test(). For example, suppose we wish to compare the fuel efficiency of cars with automatic transmissions and manual transmissions. mpg.auto <- mtcars$mpg[mtcars$am==0] mpg.manual <- mtcars$mpg[mtcars$am==1] # mpg of auto trans cars # mpg of manual trans cars # Are the mean mpgs the same? t.test(mpg.auto, mpg.manual) ## ## ## ## ## ## ## ## ## ## ## Welch Two Sample t-test data: mpg.auto and mpg.manual t = -3.7671, df = 18.332, p-value = 0.001374 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: -11.280194 -3.209684 sample estimates: mean of x mean of y 17.14737 24.39231 Paired data t tests Paired data t tests are conducted by including paired=TRUE in the function call. The immer data set contains data from an experiment on barley yields. Various varieties of barley were planted in different locations over two years. We can test whether the yields changed from year 1 to year 2. require(MASS) # The dataset is in the MASS library ## Loading required package: MASS # Is the mean difference in yeilds (Y1 - Y2) zero? t.test(immer$Y1, immer$Y2, paired=TRUE) ## ## ## ## ## ## ## ## ## ## ## Paired t-test data: immer$Y1 and immer$Y2 t = 3.324, df = 29, p-value = 0.002413 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: 6.121954 25.704713 sample estimates: mean of the differences 15.91333 6 t tests with summary data While there are not built-in functions to do t tests with summary data, it is not difficult to calculate test statistics “by hand” and then calculate p-values. Recall, given a sample mean x̄ and sample standard deviation s, the t test statistic for a test with a null hypothesis µ = µ0 is calculated by t= x̄ − µ0 . s Then, the p-value for a t test is the probability of a value equal to or more extreme than t in a t distribution. This can be calculated with the pt() function. For example, to test a sample with mean x̄ = 15.6, standard deviation on s = 2.1 and sample size n = 20 against H0 : µ = 13, x.bar <- 15.6 s <- 2.1 n <- 20 mu.0 <- 13 t <- (x.bar - mu.0)/s t ## [1] 1.238095 # Since t > 0, we want the upper tail p.val <- pt(t, df=n-1, lower.tail=FALSE) p.val ## [1] 0.1153808 Since this is the probability of just one tail, if you are conducting a one-sided test (Ha : µ > 13), this is your p-value. However, if you are conducting a two-sided test (Ha : µ 6= 13), remember to multiply this value by 2 to account for both tails. License This document is distributed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 7
Source Exif Data:
File Type : PDF File Type Extension : pdf MIME Type : application/pdf PDF Version : 1.5 Linearized : No Page Count : 7 Page Mode : UseOutlines Author : Michael Shyne, 2017 Title : R Users Guide to Stat 201: Chapter 8 and 9 Subject : Creator : LaTeX with hyperref package Producer : pdfTeX-1.40.14 Create Date : 2017:08:27 09:16:24-05:00 Modify Date : 2017:08:27 09:16:24-05:00 Trapped : False PTEX Fullbanner : This is pdfTeX, Version 3.1415926-2.5-1.40.14 (TeX Live 2013/Debian) kpathsea version 6.1.1EXIF Metadata provided by EXIF.tools