Investigation 6: Box and Whisker Plots and Dot Plots

Suppose we are interested in characterizing 1.69-oz (47.9-g) packages of plain M&Ms. We obtain 30 bags (ten from each of three stores) and, for each bag, report the number of blue, brown, green, orange, red, and yellow M&Ms, along with their combined net weight. For yellow, the number in parentheses is the number of yellow M&Ms among the first five drawn from the bag. Table 2 summarizes the data for the last six samples; the full data set for all 30 samples is available as a separate spreadsheet or R file.

Table 2. Source, Distribution, and Net Weight of M&Ms

bag id   store    blue   brown   green   orange   red   yellow   net weight (g)
25       CVS      07     13      00      04       15    16 (2)   48.212
26       Target   06     15      01      13       10    14 (1)   51.682
27       CVS      05     17      06      04       08    19 (1)   50.802
28       Kroger   01     21      06      05       19    14 (0)   49.055
29       Target   04     12      06      05       13    14 (2)   46.577
30       Kroger   15     08      09      06       10    08 (1)   48.317

Having collected some data, our next step is to examine it for possible problems, such as missing values or errors introduced when we recorded the data, and to identify important variables and interesting patterns or trends within or between those variables. Although this information is embedded in the data, it is often difficult to see when the data are displayed as a table, particularly if the data set is large. Instead, we use one or more simple visualizations of the data.
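Before plotting anything, a few of the checks described above can be automated. The following is a minimal sketch, assuming the six rows of Table 2 are stored as Python dictionaries; the key names (such as "yellow_first5" and "net_wt_g") and the helper functions are hypothetical choices for this illustration. It flags missing values, negative counts, and an impossible number of yellows among the first five drawn, and it reports the total number of M&Ms in each bag.

```python
# Minimal sketch: the six rows of Table 2 as Python dictionaries.
# The key names (e.g. "yellow_first5", "net_wt_g") are hypothetical.
COLORS = ("blue", "brown", "green", "orange", "red", "yellow")

ROWS = [
    {"bag": 25, "store": "CVS", "blue": 7, "brown": 13, "green": 0, "orange": 4,
     "red": 15, "yellow": 16, "yellow_first5": 2, "net_wt_g": 48.212},
    {"bag": 26, "store": "Target", "blue": 6, "brown": 15, "green": 1, "orange": 13,
     "red": 10, "yellow": 14, "yellow_first5": 1, "net_wt_g": 51.682},
    {"bag": 27, "store": "CVS", "blue": 5, "brown": 17, "green": 6, "orange": 4,
     "red": 8, "yellow": 19, "yellow_first5": 1, "net_wt_g": 50.802},
    {"bag": 28, "store": "Kroger", "blue": 1, "brown": 21, "green": 6, "orange": 5,
     "red": 19, "yellow": 14, "yellow_first5": 0, "net_wt_g": 49.055},
    {"bag": 29, "store": "Target", "blue": 4, "brown": 12, "green": 6, "orange": 5,
     "red": 13, "yellow": 14, "yellow_first5": 2, "net_wt_g": 46.577},
    {"bag": 30, "store": "Kroger", "blue": 15, "brown": 8, "green": 9, "orange": 6,
     "red": 10, "yellow": 8, "yellow_first5": 1, "net_wt_g": 48.317},
]


def find_problems(rows):
    """Return (bag, message) pairs for missing or impossible values."""
    problems = []
    for row in rows:
        missing = [key for key, value in row.items() if value is None]
        for key in missing:
            problems.append((row["bag"], f"missing value for {key}"))
        if missing:
            continue  # skip range checks when a value is absent
        for color in COLORS:
            if row[color] < 0:
                problems.append((row["bag"], f"negative count for {color}"))
        # At most five yellows can appear among the first five drawn,
        # and never more than the bag's total number of yellows.
        if not 0 <= row["yellow_first5"] <= min(5, row["yellow"]):
            problems.append((row["bag"], "impossible yellow_first5 value"))
    return problems


def total_count(row):
    """Total number of M&Ms in one complete row."""
    return sum(row[color] for color in COLORS)


print(find_problems(ROWS))                # []
print([total_count(r) for r in ROWS])     # [55, 59, 59, 66, 54, 56]
```

Checks like these catch recording errors but not patterns; spotting trends within or between variables is where visualizations come in.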

Two simple visualizations are box and whisker plots and dot plots; Figure 1 shows an example of each using the data for yellow M&Ms. Note that neither plot carries meaningful information along the y-axis, as the vertical dimension simply helps us visualize the data. The vertical scatter of points in the dot plot, for example, is the result of jittering, which offsets samples that share a common value so that, we hope, each appears as a distinct point.
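Jittering can be illustrated with a short sketch. The version below assumes the simplest common form, a random vertical offset drawn uniformly from a small interval; the function name `jitter` and the default width of 0.2 are hypothetical choices, not necessarily what was used to draw Figure 1. Each x-value is kept exactly, and only the meaningless y-coordinate changes, so the three bags in Table 2 that contain 14 yellow M&Ms no longer sit on top of one another.

```python
import random


def jitter(values, width=0.2, seed=None):
    """Pair each value with a random vertical offset drawn uniformly from
    [-width, width]. The offset carries no information; it only keeps
    points that share an x-value from landing on top of one another."""
    rng = random.Random(seed)  # seeded generator so the layout is reproducible
    return [(x, rng.uniform(-width, width)) for x in values]


# Yellow counts for bags 25 to 30 from Table 2; three bags share the value 14.
yellow = [16, 14, 19, 14, 14, 8]
points = jitter(yellow, seed=1)
for x, y in points:
    print(f"x = {x:2d}, y = {y:+.3f}")
```

The resulting (x, y) pairs can be passed to any plotting tool as scatter coordinates. Because the offsets are random, two tied points can still land close together by chance, which is why jittering only gives us a good chance that each point appears distinct.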

Investigation 6. Use the dot plot in Figure 1 to deduce the general structure of a box and whisker plot, giving particular attention to the positions along the x-axis of the three vertical lines that make up the yellow box and of the two vertical lines that make up the whiskers on either side of the box. You might begin by tabulating the number of samples that fall to the left of the box, within the box (including its boundaries), and to the right of the box, as well as the number of samples that lie to the left and to the right of the line inside the box.
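The tabulation suggested in Investigation 6 can also be done by computer. The sketch below is one hedged way to do it: the function name `tabulate` is hypothetical, and the box edges and middle line are inputs you read off Figure 1 yourself. The values 13, 17, and 15 used in the example are arbitrary placeholders, not the positions in the figure, and the six yellow counts are only the Table 2 subset rather than all 30 bags.

```python
def tabulate(values, box_left, box_right, middle):
    """Count samples relative to a box whose edges and middle line are
    read off a plot. Points on the box edges count as inside the box;
    points exactly on the middle line are counted separately."""
    return {
        "left_of_box": sum(v < box_left for v in values),
        "in_box": sum(box_left <= v <= box_right for v in values),
        "right_of_box": sum(v > box_right for v in values),
        "left_of_middle": sum(v < middle for v in values),
        "on_middle": sum(v == middle for v in values),
        "right_of_middle": sum(v > middle for v in values),
    }


# Yellow counts for bags 25 to 30 (Table 2) and ARBITRARY placeholder
# positions; substitute the positions you measure from Figure 1.
yellow = [16, 14, 19, 14, 14, 8]
print(tabulate(yellow, box_left=13, box_right=17, middle=15))
```

Running the function with the full set of 30 yellow counts and the line positions from Figure 1 gives the counts the investigation asks for; comparing those counts to the total number of samples is the step left to you.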