Part II. Ways to Visualize Data

Investigation 6: Box and Whisker Plots and Dot Plots

Suppose we are interested in characterizing 1.69-oz (47.9-g) packages of plain M&Ms. We obtain 30 bags (ten from each of three stores) and, for each bag, report the number of blue, brown, green, orange, red, and yellow M&Ms and their combined net weight. For yellow, the number in parentheses is the number of yellow M&Ms in the first five drawn from the bag. Table 2 summarizes the data for the last six samples. The full set of data for all 30 samples is available as a separate spreadsheet or R file.

Table 2. Source, Distribution, and Net Weight of M&Ms
bag id  store   blue  brown  green  orange  red  yellow  net weight (g)
25      CVS     07    13     00     04      15   16 (2)  48.212
26      Target  06    15     01     13      10   14 (1)  51.682
27      CVS     05    17     06     04      08   19 (1)  50.802
28      Kroger  01    21     06     05      19   14 (0)  49.055
29      Target  04    12     06     05      13   14 (2)  46.577
30      Kroger  15    08     09     06      10   08 (1)  48.317

Having collected some data, our next step is to examine it for possible problems, such as missing values or errors introduced when we recorded the data, and to identify important variables and interesting patterns or trends within or between these variables. Although this information is embedded within the data itself, it is often difficult to see when the data is displayed as a table, particularly if the data set is large. Instead, we use one or more simple visualizations of the data.

Two simple visualizations are box and whisker plots and dot plots, examples of which are shown in Figure 1 using the data for yellow M&Ms. Note that neither plot has meaningful information along the y-axis as the vertical dimension simply helps us visualize the data. The vertical distribution of points in the dot plot, for example, is the result of jittering, which offsets samples that share a common value so that, we hope, each appears as a distinct point.
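To make the idea of jittering concrete, the following is a minimal sketch in Python that uses only the six yellow counts listed in Table 2, not the full set of 30 bags. The function name jitter, the offset height of 0.2, and the fixed seed are assumptions made for illustration; they are not the settings used to draw Figure 1.

```python
import random

# Yellow M&M counts for bags 25-30, copied from Table 2
# (only six of the 30 bags, so this is not the data behind Figure 1).
yellow = [16, 14, 19, 14, 14, 8]


def jitter(values, height=0.2, seed=1):
    """Pair each value with a small random vertical offset.

    The x-coordinate (the data) is left unchanged; only the y-coordinate,
    which carries no information, is randomized. The height and seed
    defaults are illustrative assumptions.
    """
    rng = random.Random(seed)  # fixed seed so the layout is reproducible
    return [(x, rng.uniform(-height, height)) for x in values]


points = jitter(yellow)
for x, y in points:
    print(f"x = {x:2d}, y offset = {y:+.3f}")
```

The three bags that each contain 14 yellow M&Ms keep the same x-value but receive different vertical offsets, which is what lets them appear as separate points when the pairs are passed to a plotting routine.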

Investigation 6. Use the dot plot in Figure 1 to deduce the general structure of a box and whisker plot, giving particular attention to the position along the x-axis of the three vertical lines that make up the yellow box and the two vertical lines that make up the whiskers on either side of the yellow box. You might begin by tabulating the number of samples that fall to the left of the box, that fall within the box, including its boundaries, and that fall to the right of the box, and the number of samples that lie to the left and to the right of the line inside the box.
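The tabulation suggested above can be sketched as a small Python helper. The function name tabulate and its parameters are hypothetical, and the three box positions in the example are placeholders that were not read from Figure 1; replace them with the positions you read from the figure, and replace the six values from Table 2 with the full set of 30 yellow counts.

```python
def tabulate(values, box_left, box_line, box_right):
    """Count samples relative to the three vertical lines of the box.

    box_left and box_right are the x-positions of the box's edges and
    box_line is the x-position of the line inside the box.
    """
    return {
        "left of box": sum(v < box_left for v in values),
        "within box (boundaries included)": sum(
            box_left <= v <= box_right for v in values
        ),
        "right of box": sum(v > box_right for v in values),
        "left of inner line": sum(v < box_line for v in values),
        "right of inner line": sum(v > box_line for v in values),
    }


# Yellow counts for bags 25-30 from Table 2 (six of the 30 bags).
yellow = [16, 14, 19, 14, 14, 8]

# Placeholder positions for illustration only; they are NOT taken from Figure 1.
counts = tabulate(yellow, box_left=12, box_line=14, box_right=17)
for label, n in counts.items():
    print(f"{label}: {n}")
```

In this sketch a sample that falls exactly on the line inside the box is counted on neither side of it; that is one possible convention, and you may prefer another when comparing your counts.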