Design of Experiments I

This lesson introduces sampling, design of experiments, and selecting simple random samples.

I. Objectives

  1. Distinguish between a population and a sample.
  2. Define observational study, experiment, treatment, response, nonresponse, and bias.
  3. Introduce the concept of inference.

II. Reading Assignment

    No reading is required, but an optional source for further reading is The Basic Practice of Statistics, David S. Moore, pp. 176-198. (See the reserve desk at D. H. Hill and ask for Chapter 3, Producing Data.)

IV. Explanation and Examples

Conclusions we draw from data analysis are usually expected to hold for larger groups. If our data do not fairly represent these larger groups then our conclusions will most likely be inaccurate.

The entire group we want information about and for which we want to have conclusions is called the population. Since a population is usually too large to study as a whole, we take a sample from the population. The sample is the part of the population we actually examine. The process of studying a part to get information on an entire group is called sampling.

There are different ways to sample individuals from a population. Samples can be taken from either an observational study or an experiment. An observational study observes individuals and measures variables of interest but doesn't attempt to influence the responses. An experiment, on the other hand, deliberately imposes some treatment on individuals in order to observe their responses.

For example, a survey or a poll would be an observational study. When asking a subject questions in a survey, we don't want to influence the subject's response. When taking a survey we can either select the subjects ourselves or let them respond of their own volition. The second type, consisting of people who choose themselves by responding to a general appeal, is called a voluntary response sample. Voluntary response samples are often used by T.V. programs such as Entertainment Tonight or shows on ESPN that want to measure public opinion by having people call the show and respond to simple questions. In general, one should be very cautious in interpreting the results from voluntary response samples, because the people who choose to respond may not fairly represent the population.

Experiments are used when we want to see how individuals respond to some kind of stimulus or treatment. We perform the experiment on the experimental units. If the experimental units are people, we will call them subjects. A specific experimental condition applied to the experimental units is called a treatment. The treatment can be considered an explanatory variable since it is controlled by the researcher and we wish to see if it can explain some kind of reaction in the experimental unit. The variable measured to judge the reaction is called the response.

It is important to keep in mind what your population is throughout designing, conducting, and finally making conclusions for your study. If you want your conclusions to apply to all pet owners, you can't just sample from dog owners. You must include cat owners, fish owners, bird owners, snake owners, etc. When we claim that our conclusions made on a sample hold for the population, we are making an inference from the sample to the population.

Example. We go to a local mall and ask people who walk by whether they are pro-choice or pro-life on the abortion issue. Some people freely respond to our question and others profess to have no opinion.

Is this an experiment or an observational study?

This is an observational study since we only record each person's answer and do not attempt to influence it. Nonresponse occurs when an individual chosen for the sample does not answer our question. In general, nonresponse occurs when an individual can't be contacted or refuses to cooperate.

Example. We work for a company that makes paint. We believe our paint dries faster than another brand. To test this, we take 8 plywood boards and paint them. We paint 4 with our paint and 4 with our competitor's. We measure the time for each board to dry.

Is this an experiment or an observational study?

This is an experiment since we deliberately impose a treatment. We painted each of these boards so that we could observe how long each took to dry.

We need to be able to identify several things in an experiment, some of which are: the treatment, the experimental units, and the response. In our paint experiment, the treatment is the condition imposed by the researcher which is the type of paint applied to the boards. The experimental unit is a single board. The response is what is measured as the outcome of our experiment. In this case it is the time it takes for the paint to dry.

In this course, we will concentrate more on experiments than on observational studies. In engineering, it is more common to design an experiment than to take a poll or survey. Moreover, the conclusions drawn from well-run experiments are much more trustworthy than those from observational studies.

Things can go wrong with an experiment, however. Suppose a few of our boards in the last example were damp before we painted them. Naturally, we want to show our product is better than our competitor's, but it would be cheating to pick the dry boards for our paint. If we design our experiment so that it systematically favors certain outcomes, it is biased.

To keep from introducing bias into our studies, we use random chance. We select our subjects at random from our population. This keeps us from inadvertently choosing experimental units that will respond how we wish. There are many ways to choose a sample randomly. For our example in the mall, we could poll every fourth person who comes in each door. Many techniques have been designed to draw random samples, but here we shall only talk about the simplest method, called simple random sampling.

Choosing a Simple Random Sample

Step 1: Label. Assign a numerical label to every individual in the population.

Step 2: Table. Use a random digits table to select labels at random.

Example. Suppose we want to know the bacteria levels in vats that we use to make ice cream. The measuring process takes time and money so that we can only afford to sample 5 of the 50 vats.

Step 1: We label each vat with a number from 1 to 50.

Step 2: We use the random digits table to select 5 numbers.

When using our random digits table, we first need to select a starting position in the table. This should also be done at random, since starting at the same position in another study would result in the same numbers being selected. As an arbitrary choice, let's start at the fifth line, fourth column of numbers, third digit (following along on a table of 2000 random digits). The digits are as follows:

571 31010 24674 05455 61427 77938 91936 74029 43902

Since our largest possible number is 50, having 2 digits, we will read across the table in 2-digit increments. The first number from the table is 57, followed by 13, 10, 10, 24, 67, 40, 54, 55, and then 61. Since 50 is the largest number we can use, we keep only 13, 10, 24, and 40. Notice also that 10 appears twice in the list of random digits, but we will only use 10 once since we do not want to sample from a vat twice. We still need one more vat to complete our sample of 5, and the next number from the table is 42. Now we can measure vats 10, 13, 24, 40, and 42 to see if the bacteria levels are within acceptable levels. Our population is all 50 vats and the sample is these 5 vats that we have listed above. If all the vats in our sample of size 5 are acceptable, we will infer that the whole population is within acceptable limits. Then we can make our ice cream.
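The table-reading procedure above can also be sketched in code. The following is a minimal illustration in Python (not S-Lab); the function name sample_from_digits and its arguments are hypothetical names chosen for this sketch. It reads the row of digits in 2-digit groups, skips numbers outside 1 to 50, and skips repeats.

```python
def sample_from_digits(digits, population_size, sample_size):
    """Select labels by reading random digits in fixed-width groups."""
    width = len(str(population_size))   # 50 has 2 digits, so read 2 at a time
    stream = "".join(ch for ch in digits if ch.isdigit())  # drop the spaces
    chosen = []
    for i in range(0, len(stream) - width + 1, width):
        label = int(stream[i:i + width])
        # Skip labels outside 1..population_size and labels already chosen.
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
        if len(chosen) == sample_size:
            break
    return chosen


row = "571 31010 24674 05455 61427 77938 91936 74029 43902"
vats = sample_from_digits(row, population_size=50, sample_size=5)
print(vats)          # [13, 10, 24, 40, 42]
print(sorted(vats))  # [10, 13, 24, 40, 42]
```

The result matches the hand calculation: 57, 67, 54, 55, and 61 are too large, the second 10 is a repeat, and 42 completes the sample.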

In S-Lab we can draw random samples very simply using the sample function. Here is an illustration of drawing a sample of size n=5 from a population of size 50:

> set.seed(395)            # sets seed of random generator,
                           # use any number 1-1000
> sample(50,5)             # population size goes first,
                           # then sample size
[1] 47 14  4 17 49

> sample(50,5)             # get another sample
[1] 36 12 28 25 34

> sample(50,5)             # and a 3rd sample
[1] 27  3 48 37 39

> set.seed(395)            # just to prove that we can get
                           # the first sample again
> sample(50,5)
[1] 47 14  4 17 49

You might be interested to know that there are 50!/(5!45!)=2,118,760 different samples of size n=5 that can be selected from a population of size 50.
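For readers without S-Lab, the count of possible samples and the effect of setting a seed can be checked with a short Python sketch. This assumes Python 3.8 or later for math.comb. Python's generator is not the one S-Lab uses, so the sampled vats will not match the S-Lab output above even with the same seed value.

```python
import math
import random

# Number of distinct samples of size 5 from 50 labels: 50! / (5! * 45!)
n_samples = math.comb(50, 5)
by_factorials = math.factorial(50) // (math.factorial(5) * math.factorial(45))
print(n_samples)                   # 2118760
print(n_samples == by_factorials)  # True

# Seeding the generator makes a sample reproducible.
random.seed(395)
first = random.sample(range(1, 51), 5)   # 5 distinct labels from 1..50
random.seed(395)
again = random.sample(range(1, 51), 5)
print(first == again)              # True
```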

Random Allocation of Experimental Units to Treatments

In an experiment it is important to randomly assign the experimental units to the different treatments. This protects us from bias, and we will talk more about this in the next lesson. But here we want to illustrate how to do this with the random digits table or in S-Lab.

Example. Returning to the example of 8 boards and 2 treatments, let's think about how to assign 4 boards to treatment 1 (our paint) and 4 boards to treatment 2 (our competitor's paint). We could do this as above by simply drawing a sample of size n=4 (from a population of size 8) to be given treatment 1, with the remaining 4 to get treatment 2. We suggest an alternate method here that will also work easily for situations with more than 2 treatments.

Step 1: Label. Number the boards 1, 2, ..., 8.

Step 2: Randomize. Use a random digits table or the computer to get a random ordering (called a permutation) of the integers 1 to 8. The first 4 of these are assigned to treatment 1, and the next 4 get treatment 2. To make our life easy, suppose we use the same starting place in the random digits table (Table C.10, p. 454) as above:

571 31010 24674 05455 61427 77938 91936 74029 43902

Going through these one digit at a time, ignoring 0 and 9 and any repeats, our random ordering would be (5,7,1,3,2,4,6,8). Thus boards 5,7,1,3 get treatment 1 and the remaining boards get treatment 2. Of course we didn't really need to get the last 4 digits. However, suppose that we had 12 boards and 3 treatments. If we get a random ordering of the integers 1 to 12 (reading the table two digits at a time, since the labels now have up to two digits), then the first 4 get treatment 1, the second 4 get treatment 2, and the remaining 4 get treatment 3.
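The one-digit-at-a-time reading can be sketched in Python (not S-Lab) as follows. The function name ordering_from_digits is a hypothetical name for this illustration, and the sketch only handles 9 or fewer units, since it reads single digits.

```python
def ordering_from_digits(digits, n):
    """Build a random ordering of 1..n (n <= 9) from single random digits."""
    order = []
    for ch in digits:
        if not ch.isdigit():
            continue                     # skip the spaces between groups
        d = int(ch)
        # Ignore 0, digits larger than n, and digits already used.
        if 1 <= d <= n and d not in order:
            order.append(d)
        if len(order) == n:
            break
    return order


row = "571 31010 24674 05455 61427 77938 91936 74029 43902"
order = ordering_from_digits(row, 8)
print(order)                             # [5, 7, 1, 3, 2, 4, 6, 8]

treatment_1, treatment_2 = order[:4], order[4:]
print(treatment_1)                       # [5, 7, 1, 3]
print(treatment_2)                       # [2, 4, 6, 8]
```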

In S-Lab getting a random ordering is very simple:

> set.seed(478)

> sample(8)
[1] 2 4 8 1 7 5 3 6

So here boards 2,4,8,1 get treatment 1 and boards 7,5,3,6 get treatment 2.

For the 3-treatment extension just mentioned:

> set.seed(438)

> sample(12)
 [1] 11  7 12  2  4  8  5 10  3  1  6  9

Here boards 11,7,12,2 get treatment 1, boards 4,8,5,10 get treatment 2, and boards 3,1,6,9 get treatment 3.
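The same allocation idea can be written as a small general-purpose routine. This is a minimal Python sketch (not S-Lab). The function name allocate and its arguments are hypothetical, and it assumes the number of units divides evenly among the treatments. Because Python's generator differs from S-Lab's, the ordering it produces will not match the output above.

```python
import random


def allocate(n_units, n_treatments, seed=None):
    """Randomly split units 1..n_units into equal-sized treatment groups."""
    if n_units % n_treatments != 0:
        raise ValueError("n_units must be a multiple of n_treatments")
    rng = random.Random(seed)            # private generator, reproducible
    order = list(range(1, n_units + 1))
    rng.shuffle(order)                   # random ordering (permutation)
    size = n_units // n_treatments
    # Treatment 1 gets the first block of the ordering, treatment 2 the next...
    return {t + 1: order[t * size:(t + 1) * size] for t in range(n_treatments)}


groups = allocate(12, 3, seed=438)
for treatment, boards in groups.items():
    print("treatment", treatment, "boards", boards)
```

Every board appears in exactly one group, and rerunning with the same seed gives the same allocation.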


This page was last modified on Wednesday, January 6, 1999.