Unit 7: Chance Variation

Text reading and homework:

Read chapters 16 and 17 of FPP and do the following review exercises:
Chapter 16 (pages 285-286): 1, 2, 3, 4, 6, 7, 8, 10
Chapter 17 (pages 304-307): 1, 2, 4, 7, 8, 9, 12, 13

Reading:

"We're Measuring Bacteria With a Yardstick," by John Allen Paulos, New York Times, 22 Nov 2000.

Document source: "We're Measuring Bacteria With a Yardstick" by John Allen Paulos, New York Times website, published November 22, 2000.

Possible essay questions:

Do you think the article was relevant to the imminent presidential election? Would it be relevant to a local election like, say, for the mayor of Hamilton?
Could the (tongue-in-cheek) remedy suggested by Paulos for the dilemma of the article ever become law? If so, how could it be decided when it must be used?

Computer project:

Description

How close should the count of heads in 40 coin tosses be to 20? That is, what spread should we expect in this count? Notice that the spread of the count is different from the spread of the collection of flip results. The spread in the collection of flips is the standard deviation (SD) of a list of 1's (heads) and 0's (tails), roughly equal in number, so this SD will be close to 0.5. The standard error (SE) of the count of heads is the theoretical approximation to the variability (SD) of many counts of heads: the counts will be close to half the number of flips, and the SE tells how much the counts are likely to vary for a given number of flips. Similarly, the SE of the percentage of heads is the theoretical approximation to the variability (SD) in the percentages of heads on a given number of flips.
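The distinction between the SD of the flips and the SE of the count can be checked numerically. The following is a minimal Python sketch, not part of the assignment; the function names se_of_count and se_of_percentage are made up for illustration. It assumes a fair coin, so the SD of the 0/1 list is taken to be 0.5, and it uses the square root law: SE of the count = sqrt(number of flips) × SD of the list.

```python
import math
from statistics import pstdev

def se_of_count(n_flips, sd_of_flips=0.5):
    """SE of the number of heads: sqrt(number of flips) times the SD of the 0/1 list."""
    return math.sqrt(n_flips) * sd_of_flips

def se_of_percentage(n_flips, sd_of_flips=0.5):
    """SE of the percentage of heads: SE of the count as a percentage of the flips."""
    return se_of_count(n_flips, sd_of_flips) / n_flips * 100

# SD of the collection of flips: a list with equal numbers of 1's and 0's.
flips = [1] * 20 + [0] * 20
print("SD of the 40 flip results:", pstdev(flips))

# SE of the count and of the percentage for 40 flips (assumes a fair coin).
print("SE of the count of heads:", round(se_of_count(40), 2))
print("SE of the percentage of heads:", round(se_of_percentage(40), 2))
```

The first number describes the individual flips and does not depend on how many there are; the other two describe how much a count or a percentage of heads varies from one set of 40 flips to another.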

In this project we explore how the spread of the counts and the average of the counts change as the number of flips increases.

Preliminary Write-up

Using techniques from class, and before going to the computer, write up what theory predicts for the results as follows:

1) Estimate the number of heads you would expect to get, and the spread in this number of heads, for each of three cases: 10 flips, 40 flips and 160 flips.
2) As the number of flips is increased by a factor of 4, from 10 to 40 and from 40 to 160, how does this affect the expected value of the count (by what factor does it increase)? By what factor does the standard error of the count increase?
3) Consider doing the previous experiment 30 times and looking for exactly half the flips being heads in as many cases out of the 30 as possible. Would you get exactly half more often with a 10-flip experiment, a 40-flip experiment or a 160-flip experiment? (It is possible to compute the exact binomial probability of exactly 5 heads in 10 flips, but because the binomial coefficients C(40,20) and C(160,80) are harder to compute, for the 40-flip case you will want to use the normal approximation to find P(between 19.5 and 20.5) in a normal distribution with average 20 and SD 0.5 × sqrt(40), and similarly for the 160-flip case.)
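Once the hand calculation for the last question is done, it can be checked by machine, since the large binomial coefficients are easy for a computer. The sketch below is one way to do that in Python; it is offered as a check, not as the expected method. The function names are hypothetical, the coin is assumed fair, and the normal approximation uses the same interval (half the flips, plus or minus 0.5) and the same SD, 0.5 × sqrt(number of flips), as described above.

```python
import math
from statistics import NormalDist

def exact_prob_half(n_flips):
    """Exact binomial chance of heads on exactly half of n_flips tosses (n_flips even)."""
    k = n_flips // 2
    return math.comb(n_flips, k) / 2 ** n_flips

def normal_approx_prob_half(n_flips):
    """Normal-curve area between (half - 0.5) and (half + 0.5)."""
    average = n_flips / 2
    se = 0.5 * math.sqrt(n_flips)  # SE of the count for a fair coin
    curve = NormalDist(mu=average, sigma=se)
    return curve.cdf(average + 0.5) - curve.cdf(average - 0.5)

for n in (10, 40, 160):
    print(n, "flips: exact", round(exact_prob_half(n), 4),
          "normal approx", round(normal_approx_prob_half(n), 4))
```

Multiplying each probability by 30 gives the number of times out of 30 that exactly half heads would be expected.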

Simulations

Use Excel to perform the following coin tossing simulations. Use the random number (RAND) and IF functions to simulate a coin toss. (See the simulation instructions if needed.) Structure your spreadsheet in an organized fashion with, for example, each simulation being one column. The results of the simulation can then be placed conveniently at the top or bottom of the column. After you check to make sure one simulation is working you can copy that column 30 times and create summary statistics for the entire 30 simulations somewhere convenient. It might be convenient to overlap parts (a), (b) and (c) of the computer simulation: arrange 160 flips of a coin in a column, and find the counts of the number of heads in the first 10 flips, in the first 40 flips, and in the 160 flips.
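For readers who want to see the logic of one spreadsheet column outside Excel, here is a rough Python analogue. It assumes the toss is built from a formula along the lines of =IF(RAND()<0.5,1,0), which is one possible way to combine RAND and IF and may differ from the simulation instructions. The function names are invented for this sketch.

```python
import random

def flip():
    """One toss: 1 for heads, 0 for tails, each with chance 1/2."""
    return 1 if random.random() < 0.5 else 0

def one_column(n_flips=160):
    """One spreadsheet column: a list of n_flips simulated tosses."""
    return [flip() for _ in range(n_flips)]

def counts_from_column(column):
    """Counts of heads in the first 10, the first 40, and all 160 flips."""
    return {10: sum(column[:10]), 40: sum(column[:40]), 160: sum(column[:160])}

column = one_column()
print(counts_from_column(column))
```

Because heads are coded as 1 and tails as 0, the count of heads is just the sum of the cells, which is why a SUM over the first 10, 40 or 160 rows of the column does the counting in the spreadsheet.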

(a) Simulate tossing a fair coin 10 times and counting the number of heads. Do the simulation 30 times and compute the average and SD of the 30 counts. Note that the SD function is used here to approximate the SE of the counts: Theory says the average of these 30 counts should be close to the expected value and the SD of the counts should be close to the SE of the counts.
(b) Do 30 simulations of tossing a fair coin 40 times and counting the number of heads. Compute the average and SD of the 30 counts.
(c) Do 30 simulations of tossing a fair coin 160 times and counting the number of heads. Compute the average and SD of the 30 counts.
(d) The number of tosses was increased by a factor of 4 from (a) to (b) and from (b) to (c). How did this affect the average and SD (in terms of factor of increase -- i.e., unchanged, factor of 4, factor of 0.5, etc.)? Compare these factors with your preliminary predictions from 2) above.
(e) Looking at the counts for each set of simulations, how many times out of 30 did you get exactly half of the flips being heads? In which case, (a), (b) or (c), are you most likely to get heads on exactly half the number of tosses? Compare with your preliminary theory from 3) above.
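The summary statistics asked for above can also be sketched in Python as a cross-check on the spreadsheet. This is an illustration under stated assumptions, not the required method: the function names are hypothetical, each set of flips is generated independently rather than by overlapping columns, and the SD is computed with pstdev (dividing by the number of counts). Excel's STDEV divides by one less than the number of counts, so its result will be slightly larger.

```python
import random
from statistics import mean, pstdev

def simulate_counts(n_flips, n_sims=30):
    """Counts of heads from n_sims independent runs of n_flips fair tosses."""
    return [sum(1 for _ in range(n_flips) if random.random() < 0.5)
            for _ in range(n_sims)]

def summarize(counts, n_flips):
    """Average and SD of the counts, and how many runs hit exactly half heads."""
    return {
        "average": mean(counts),
        "SD": pstdev(counts),
        "exactly_half": sum(1 for c in counts if 2 * c == n_flips),
    }

for n in (10, 40, 160):
    counts = simulate_counts(n)
    print(n, "flips:", summarize(counts, n))
```

The results change on every run, just as the spreadsheet recalculates, so the factors of increase observed between cases will only roughly match the theoretical ones.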

(For each of these simulations, print out the counts, average and SD appropriately labeled. To save paper, do not print out the simulations in their entirety.) A source of help for this project is a video outlining a similar project.