This is version A of the test. The problems are worked the same way on
all versions.

Test 1
St 301   Fall 1993   D. A. Dickey

1. I put 29 people on a diet for 15 days and record their weight change
   (day 15 weight minus day 1 weight) as follows:

   -3.0,  2.1,  0.3, -1.3, -0.1,  1.2,  0.5, -0.3, -2.1, -0.5,
   -2.6, -1.0, -0.2,  0.3, -2.3,  0.7, -3.2, -1.1,  0.1, -1.7,
   -1.9, -0.8,  0.3,  1.2, -2.9, -3.4, -3.1, -2.8, -1.3

   A. (20 points) Make a stem and leaf plot with two leaf rows per stem
      value. The order of digits within a leaf row does not matter and
      the stem values can be in ascending or descending order.

           .|
        -3 *| 0241
           .| 698
        -2 *| 13
           .| 79
        -1 *| 3013   <-- -1.0 is 15th observation in order
           .| 58
        -0 *| 132
         0 *| 3313   \
           .| 57      \
         1 *| 22       9/29ths of the data
           .|         /
         2 *| 1      /

   B. (18 points)
      Compute the relative frequency of people not losing weight on this
      diet (9/29 = 31%)

      Compute the median weight change (-1.0)

      Compute the third quartile (75th percentile) of these weight
      changes (8th observation from the top, so 0.3. Notice that 0.1 is
      further from the top than 0.3 and the leaf row 3313 is not in any
      particular order, so be careful!)

2. A sample of five lightbulbs gave lifetimes 530, 825, 750, 915, and X.

   (4 points) What would X have to be to make the sample mean 800?
      X= __980__

   (4 points) What would X have to be to make the sample median 800?
      X= _800_      530 < 750 < median < 825 < 915

3. I had a random sample of 10 student SAT scores. I subtracted the
   sample mean from each score to get 10 deviations from the sample
   mean. I then squared all 10 of these deviations and added the squares
   getting 9000.

   A. (4 points) Compute the sample variance.
      The problem is giving us sum of (X(i) - Xbar)^2 = 9000, so divide
      by n-1 = 9 (SAMPLE variance) to get 1000.

   B. (4 points) What is one advantage of reporting the standard
      deviation rather than the variance as a measure of dispersion?
      It is in the same units (dollars, pounds, days) as the data
      instead of squared units.

   C.
      (4 points) What is one advantage of reporting the interquartile
      range rather than the variance or standard deviation as a measure
      of dispersion?
      Insensitivity to outliers. (Koopmans also points out the
      interpretation as enclosing 50% of the data. It doesn't really
      tell you directly about outliers. If the iqr is 20, are there
      outliers? We have no idea!)

4. (30 points) Here is a stem and leaf display computed as in our text
   for 52 grade point averages (GPAs) of students. Note that there are
   five leaf rows per stem value and the numbers within each leaf row
   are in random order.

        2 * | 0
          t | 3                        GPA is stem.leaf
          f | 4
          s |
            | 8 9
        3 * | 0 1 1 0 0
          t | 2 3 2 2 3 3 3 3 3 3
          f | 4 4 5 5 4 4 4 5 5 4 4 5
          s | 6 7 6 7 7 6 6 7 7 7 6 7 6 6 7
            | 8 9 9 8
        4 * | 0

   In order to construct a box plot, we would put the top and bottom of
   the box at what GPA numbers?  __3.65__ and __3.25__

      (52+1)/2 = 26.5 median location => (26+1)/2 = 13.5 quartile
      location. Leaves are not in order, so be careful.
      Count down 13 from the top IN ORDER:
         4.0 3.9 3.9 3.8 3.8 3.7 3.7 3.7 3.7 3.7 3.7 3.7 3.7 * 3.6 3.6
      The quartile falls between the 13th value (3.7) and the 14th
      (3.6), so 3.65 = q3.
      Count up from the bottom in order and you are between 3.2 and 3.3,
      so q1 = 3.25.

   Two lines or "whiskers" extend from the box, one going to __2.8__ and
   another to __4.0__

      Koopmans page 53 top: lines (whiskers) extend to the most extreme
      OBSERVATIONS in the "adjacent region" (within 1.5 iqr), NOT to the
      boundaries of the adjacent regions. Boundaries are
      3.25 - 1.5*(3.65-3.25) = 2.65 and 3.65 + 1.5*(3.65-3.25) = 4.25.

   List any minor outliers in the dataset  __2.3, 2.4__

      3 iqr below 3.25 is 2.05, and anything between 2.05 and 2.65 is a
      minor outlier. This gives 2.3 and 2.4.

   List any major, or extreme, outliers in this dataset  __2.0__

      2.0 is below 2.05. No outliers on the positive side.

5. (12 points) I am working in SAS with a dataset that has variables
   WEIGHT and HEIGHT with the observations in random order. I want to
   print out the data in increasing order of HEIGHT.
   Fill in the program so that, when I submit it, I will get what I
   want in the output window.

      DATA CLASS;
      INPUT HEIGHT WEIGHT;
      CARDS;
      62 105
      71 138
      67 148
      68 135
      73 152
      64 112
      ;
                <--- Fill in remaining code here.
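
The key as given here does not show the filled-in code for problem 5. A minimal sketch of one way to complete the program follows, assuming the DATA step above has created the dataset CLASS as written. It uses the standard PROC SORT and PROC PRINT procedures; the comments and the explicit DATA=CLASS options are additions for clarity, not part of the original handout.

```sas
/* Sort CLASS in place by HEIGHT; ascending order is the default. */
PROC SORT DATA=CLASS;
   BY HEIGHT;
RUN;

/* Print the sorted dataset to the output window. */
PROC PRINT DATA=CLASS;
RUN;
```

With the six observations above, the printed rows should appear with HEIGHT running 62, 64, 67, 68, 71, 73. Omitting DATA=CLASS would also work here, since both procedures default to the most recently created dataset.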