Part IV. Ways to Model Data

Investigations 22-25: The Normal Distribution

Normal Distribution. The binomial distribution and the Poisson distribution are examples of discrete probability distributions in that they predict the probability of a discrete event, such as finding exactly two green M&Ms in the next bag of M&Ms that we open. Not all data we might collect on M&Ms, however, are discrete.

Investigation 22. Explain why we cannot use the binomial distribution or the Poisson distribution to model the net weight of M&Ms in Table 2.

To model the net weight of packages of M&Ms, we use the normal distribution, which gives the probability of obtaining a particular outcome from a population with a known mean, \(\mu\), and a known variance, \({ \sigma }^{ 2 }\). Mathematically, we express the normal distribution as

\[P\left( X \right) =\frac { 1 }{ \sqrt { 2\pi { \sigma }^{ 2 } } } { e }^{ -{ \left( X-\mu \right) }^{ 2 }/2{ \sigma }^{ 2 } }\]

Figure 6 shows the normal distribution curves for \(\mu =0\) and for variances of 25, 100, and 400.
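As a rough numerical companion to these curves, the sketch below evaluates the normal distribution at a few points for \(\mu =0\) and variances of 25, 100, and 400. It assumes the standard form of the function, in which the exponent is divided by \(2{ \sigma }^{ 2 }\); the function name `normal_pdf` and the choice of evaluation points are illustrative and are not part of the original module.

```python
import math


def normal_pdf(x, mu, variance):
    """Evaluate the normal distribution P(X) at x for mean mu and variance sigma^2."""
    return math.exp(-((x - mu) ** 2) / (2 * variance)) / math.sqrt(2 * math.pi * variance)


# Same mean and variances as the curves in Figure 6.
for variance in (25, 100, 400):
    sigma = math.sqrt(variance)
    at_mean = normal_pdf(0.0, 0.0, variance)          # height of the curve at X = mu
    at_one_sigma = normal_pdf(sigma, 0.0, variance)   # height one standard deviation away
    print(f"variance = {variance:3d}: P(mu) = {at_mean:.4f}, P(mu + sigma) = {at_one_sigma:.4f}")
```

Comparing the printed heights for the three variances is one way to start thinking about how variance changes the shape of the curve.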

Investigation 23. Using the curves in Figure 6 as an example, discuss the general features of a normal distribution, giving particular attention to the importance of variance. How do you think the areas under the three curves from \(-\infty \) to \(+\infty\) are related to each other? Why might this be important?

Because the equation for a normal distribution depends solely on the population’s mean, \(\mu\), and variance, \({ \sigma }^{ 2 }\), the probability that a sample drawn from a normally distributed population has a value between any two limits is the same for all such populations, provided we express the limits as multiples of \(\sigma\) above or below \(\mu\). For example, 68.26% of all samples drawn from a normally distributed population will have values within the range \(\mu \pm \sigma \), and only 0.621% will have values greater than \(\mu +2.5\sigma \); see Appendix 2 for further details.
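Probabilities of this kind can also be computed directly instead of being read from a table. The following is a minimal sketch that uses the error function in Python's standard library to obtain the cumulative probability for a deviation \(z=\left( X-\mu \right) /\sigma \); the helper names `standard_normal_cdf` and `prob_between` are assumptions made for illustration.

```python
import math


def standard_normal_cdf(z):
    """Probability that a normally distributed value lies below mu + z*sigma."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_between(z_low, z_high):
    """Probability that a value lies between mu + z_low*sigma and mu + z_high*sigma."""
    return standard_normal_cdf(z_high) - standard_normal_cdf(z_low)


within_one_sigma = prob_between(-1.0, 1.0)        # range mu +/- sigma
above_2_5_sigma = 1.0 - standard_normal_cdf(2.5)  # values greater than mu + 2.5*sigma

print(f"within mu +/- sigma:  {100 * within_one_sigma:.2f}%")
print(f"above mu + 2.5 sigma: {100 * above_2_5_sigma:.3f}%")
```

Small differences in the last digit relative to tabulated values can arise from rounding in the table. The same two helpers apply to any limits once they are converted to multiples of \(\sigma\).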

Investigation 24. Assuming that the mean, \(\overline { x } \), and the standard deviation, \(s\), for the net weight of our samples of M&Ms are good estimates for the population’s mean, \(\mu\), and standard deviation, \(\sigma\), what is the probability that the contents of a 1.69-oz bag of plain M&Ms selected at random will weigh less than the stated net weight of 1.69 oz? Suppose the manufacturer wants to reduce this probability to no more than 5%: How might they accomplish this?

A normal distribution closely approximates a binomial distribution if \(N\times P \ge 5\) and \(N\times \left( 1-P \right) \ge 5\); the same is true for a Poisson distribution when \(\lambda \ge 20\).
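One way to see how close the approximation is for a given case is to compare an exact binomial probability with its normal counterpart. The sketch below does this for hypothetical values, \(N=330\) and \(P=0.2\), chosen only for illustration and not taken from Table 2. It assumes the approximating normal distribution has a mean of \(N\times P\) and a variance of \(N\times P\times \left( 1-P \right) \), and it applies a continuity correction of 0.5, a common refinement that the text above does not discuss.

```python
import math


def binomial_cdf(k, n, p):
    """Exact probability of k or fewer successes in n trials, each with probability p."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def normal_cdf(x, mu, sigma):
    """Probability that a normally distributed value with mean mu and std dev sigma is below x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


# Hypothetical values for illustration only; they are not data from Table 2.
n, p = 330, 0.2
k = 60

# Conditions stated in the text for using the normal approximation.
conditions_met = n * p >= 5 and n * (1 - p) >= 5

mu = n * p                           # mean of the binomial distribution
sigma = math.sqrt(n * p * (1 - p))   # standard deviation of the binomial distribution

exact = binomial_cdf(k, n, p)
approx = normal_cdf(k + 0.5, mu, sigma)  # +0.5 is the continuity correction

print(f"conditions met: {conditions_met}")
print(f"exact binomial P(X <= {k}) = {exact:.4f}")
print(f"normal approximation      = {approx:.4f}")
```

Trying smaller values of `n` or values of `p` close to 0 or 1 shows how the agreement degrades when the two conditions are not met.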

Investigation 25. Suppose we arrange to collect samples of plain M&Ms such that each sample contains 330 M&Ms (an amount roughly equivalent to a 10-oz bag of plain M&Ms) drawn from the same population as the data in Table 2. Can we model this data using a normal distribution in place of the binomial distribution or the Poisson distribution? What advantages are there in being able to use the normal distribution? How might you apply this to more practical analytical problems, such as determining the concentration of Pb\(^{2+}\) in soil?