George W. Wallace Library: QM Course Guide: Visual Descriptions of Data

Organizing Data with Tables and Graphs

We gather data to tell us something about a population, but a spreadsheet full of raw data doesn’t tell us much.

To analyze the data we collect, we always follow the same 3-step strategy:
Make a graph	Choose the best graph based on the level of measurement
Identify patterns and deviations	Look at: shape, center, spread
Choose a numerical summary	Use a few numbers to describe: measures of center (mean, median, mode) and measures of spread (standard deviation, five-number summary)

In this section, we look at the first two steps for distributions of single variables.

1. We choose the best table or graph to display the data.

2. We identify patterns and deviations in the data. (This helps us choose the best numerical summaries in Step 3.)

Tables and Graphs

Frequency Tables

A frequency distribution is one way to organize raw data.

It shows two things:

the categories of the variable
how many times each category is recorded as a response (its frequency)
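
As a rough illustration of a frequency table, the sketch below tallies a small list of responses. The data and the helper name frequency_table are made up for this example; they are not part of the course materials.

```python
from collections import Counter

# Hypothetical survey responses (made-up data for illustration only).
responses = ["bus", "car", "car", "walk", "bus", "car", "bike", "car"]

def frequency_table(values):
    """Return (category, frequency) pairs in order of first appearance."""
    counts = Counter(values)
    return list(counts.items())

table = frequency_table(responses)
for category, freq in table:
    # One row per category: the category name, then its frequency.
    print(f"{category:<6}{freq}")
```

The frequencies should always add up to the number of responses, which is a quick way to check a table built by hand.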

This video shows how to construct and interpret a frequency table.

Types of graphs and their uses

The most common graphs for categorical variables are:

• pie charts

• bar graphs

The most common graphs for quantitative variables are:

• histograms

• stemplots

This video gives an excellent overview of these graphs.

Choosing the Best Graph

It’s important to choose a graph that is appropriate for your data set.

Before you create a graph, identify the type of variable:

qualitative (categorical)
quantitative (numeric)

This video can help you choose an appropriate graph to display the distribution of your variable.
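
A minimal sketch of the "identify the variable type first" idea is shown here. The function name variable_type and the lookup table are hypothetical, and the check is deliberately naive: it treats any all-numeric variable as quantitative, which is wrong for numeric codes such as ZIP codes, so the final call still belongs to the analyst.

```python
from numbers import Number

def variable_type(values):
    """Naive guess: 'quantitative' if every value is a number, else 'qualitative'."""
    if all(isinstance(v, Number) and not isinstance(v, bool) for v in values):
        return "quantitative"
    return "qualitative"

# Most common graphs for each variable type, as listed in this guide.
COMMON_GRAPHS = {
    "qualitative": ["pie chart", "bar graph"],
    "quantitative": ["histogram", "stemplot"],
}

heights = [160.5, 172, 181.2]          # made-up numeric data
majors = ["biology", "art", "biology"]  # made-up categorical data
print(variable_type(heights), COMMON_GRAPHS[variable_type(heights)])
print(variable_type(majors), COMMON_GRAPHS[variable_type(majors)])
```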

Graphs by Level of Measurement

Graphs for Categorical variables
Pie charts are good:	Bar charts are good:	Pareto charts are good:	Dot plots are good:
when there are just a few categories	to compare frequencies between variables	to easily see largest and smallest frequencies	when you need to tally data by hand
for nominal variables	for ordinal variables	for nominal variables only	
https://ec.europa.eu/eurostat/web/products-eurostat-news/-/DDN-20180920-1	Moore, Statistics: Concepts and Controversies, 9e, 2017 by W. H. Freeman and Co	Moore, Statistics: Concepts and Controversies, 9e, 2017 by W. H. Freeman and Co	https://www.pinterest.co.uk/pin/406168460117709867/
How to construct a pie chart	How to construct a bar graph	How to construct a Pareto chart	How to construct a dot plot
Create a pie chart in SPSS	Create a Bar Chart in SPSS	Create a Pareto chart in SPSS	Create a dot plot in SPSS
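
To illustrate why a Pareto chart makes the largest and smallest frequencies easy to see, the sketch below sorts categories from most to least frequent and prints a simple text bar for each. The data and the helper name pareto_order are invented for this example.

```python
from collections import Counter

def pareto_order(values):
    """Return (category, frequency) pairs sorted from most to least frequent."""
    return Counter(values).most_common()

# Hypothetical customer complaints (made-up data).
complaints = ["late", "late", "damaged", "late", "wrong item", "damaged", "late"]

for category, freq in pareto_order(complaints):
    # Each '#' stands for one response, so bar length equals frequency.
    print(f"{category:<12}{'#' * freq} {freq}")
```

A regular bar chart of the same data would keep the categories in their original order; only the ordering differs.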

Graphs for Quantitative variables

Histograms are good:	Frequency polygons are good:	Stem-and-leaf plots are good:	Boxplots are good:
for large data sets	to compare distributions	for small data sets	for skewed distributions
*most common graph for quantitative variables		to see details of the distribution	
Moore, Statistics: Concepts and Controversies, 9e, 2017 by W. H. Freeman and Co	https://courses.lumenlearning.com/introstats1/chapter/histograms-frequency-polygons-and-time-series-graphs/	Moore, Statistics: Concepts and Controversies, 9e, 2017 by W. H. Freeman and Co	https://www.onlinemath4all.com/analyzing-box-plots-worksheet.html
How to construct a histogram	How to construct a frequency polygon	How to construct a stem and leaf plot	How to construct a box plot
Create a histogram in SPSS	Create a frequency polygon in SPSS	Create and interpret a stemplot in SPSS	Create a boxplot in SPSS
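
As one possible illustration of a stem-and-leaf plot for a small data set, the sketch below builds a text stemplot. It assumes non-negative whole numbers, with the tens as the stem and the ones digit as the leaf; the function name stemplot and the scores are made up for this example.

```python
from collections import defaultdict

def stemplot(values):
    """Return stemplot lines for non-negative integers (stem = tens, leaf = ones)."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    lines = []
    for stem in range(min(leaves), max(leaves) + 1):
        # Use .get so stems with no data still appear as empty rows.
        row = "".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem:>2} | {row}")
    return lines

scores = [62, 67, 71, 74, 74, 78, 85, 89, 93]  # made-up exam scores
for line in stemplot(scores):
    print(line)
```

Because every individual value stays visible, the plot shows the details of the distribution, which is why it suits small data sets better than large ones.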

Graphs to show change over time

Time-series graphs are good:

to show change over time

Moore, Statistics: Concepts and Controversies, 9e, 2017 by W. H. Freeman and Co

How to construct a time-series graph

Identifying Patterns and Deviations in a Graph

In Step 1, we choose the best graph to display the data.

Now, in Step 2, we identify patterns and deviations in the graph.

Figure: an outline showing that shape, center, and spread constitute the data pattern; outliers are exceptions to the pattern. Source: https://courses.lumenlearning.com/wmopen-concepts-statistics/chapter/dotplots-2-of-2/

To find patterns and deviations, we look at:
shape	whether the data distribution is relatively symmetric
center	where most of the data values cluster in the data distribution
spread (variation)	how far the values spread from the center of the data distribution, and whether there are outliers

Shape of a distribution

To describe the shape of a distribution, look at:

number of modes
whether it is symmetric or skewed

Number of modes:

http://www.lynnschools.org/classrooms/english/faculty/documents/tim_serino/Printable_Assignments/24_notes__describing_quantitative_data.pdf

This video briefly describes how to identify whether a distribution is symmetric or skewed.

Symmetric or skewed distribution?
Symmetric	Skewed left (negatively)	Skewed right (positively)
data values are evenly distributed around the center of a unimodal distribution	data values are more spread out on the left side	data values are more spread out on the right side
left- and right-hand sides of the distribution are mirror images	the tail goes to the left (←)	the tail goes to the right (→)
mode, mean, and median are the same	outliers pull the mean to the left	outliers pull the mean to the right
		all images from Statistical Reasoning for Everyday Life, 5e
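
Since outliers in the tail pull the mean toward the tail, comparing the mean and the median gives a rough numerical hint about skew. The sketch below is only a rule of thumb, not a replacement for looking at the graph; the function name skew_hint, the tolerance of 0.05, and the data are all assumptions made for this example.

```python
from statistics import mean, median

def skew_hint(values, tolerance=0.05):
    """Rule of thumb: compare mean and median, scaled by the range of the data."""
    spread = max(values) - min(values)
    if spread == 0:
        return "roughly symmetric"
    gap = (mean(values) - median(values)) / spread
    if gap > tolerance:
        return "skewed right"   # mean pulled above the median
    if gap < -tolerance:
        return "skewed left"    # mean pulled below the median
    return "roughly symmetric"

incomes = [20, 22, 25, 27, 30, 95]   # made-up data with one high value
grades = [5, 70, 75, 78, 80]         # made-up data with one low value
print(skew_hint(incomes))
print(skew_hint(grades))
print(skew_hint([1, 2, 3, 4, 5]))
```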

Center

The center is the location where most of the data values cluster in a distribution. Think about it as a “typical” value of the data set.
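
The three usual measures of center (mean, median, mode) can be computed with Python's standard statistics module, as in this small sketch. The data values are made up for illustration.

```python
from statistics import mean, median, multimode

values = [2, 3, 3, 4, 5, 7]  # made-up data

center_mean = mean(values)        # sum of the values divided by how many there are
center_median = median(values)    # middle value (average of the two middle values here)
center_modes = multimode(values)  # most frequent value(s), returned as a list

print(center_mean, center_median, center_modes)
```

Here multimode is used instead of mode because a data set can have more than one most frequent value.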

Spread (Variation)

Variation, or spread, describes how far the values are spread out from the center of the data distribution, and whether there are outliers.

In the picture below, you can see increasing variation in each image as you move from left to right. The center of the data stays the same, but the values get more spread out.

Figure: three distributions, left to right: small variation, moderate variation, large variation. Source: https://www.spss-tutorials.com/standard-deviation/
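
The same idea can be sketched with numbers: three made-up data sets share a center of 10 but are spread out by different amounts. The sample standard deviation (statistics.stdev) and the range are used here as the measures of spread; the data sets are invented for this example.

```python
from statistics import mean, stdev

# Three made-up data sets with the same center but increasing spread.
small = [9, 10, 10, 10, 11]
moderate = [6, 8, 10, 12, 14]
large = [0, 5, 10, 15, 20]

for name, data in [("small", small), ("moderate", moderate), ("large", large)]:
    data_range = max(data) - min(data)
    # The mean stays at 10 while the standard deviation and the range grow.
    print(f"{name:<9} mean={mean(data)} stdev={stdev(data):.2f} range={data_range}")
```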

Outliers

An outlier is a value in a data set that is either very high or very low when compared to the other values.

An outlier increases variation in a data set.

To find an outlier, we must first create a graph.

Tip: An outlier strongly affects the mean of a data set, but has little or no effect on the median.

https://statisticsbyjim.com/basics/histograms/ https://online.stat.psu.edu/stat462/node/170/
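
The tip about the mean and the median can be checked with a small made-up example: adding a single very high value moves the mean a lot and the median only slightly. The numbers below are invented for illustration.

```python
from statistics import mean, median

data = [10, 12, 13, 14, 15]    # made-up values with no outlier
with_outlier = data + [100]    # the same values plus one very high value

mean_shift = mean(with_outlier) - mean(data)
median_shift = median(with_outlier) - median(data)

print(f"mean:   {mean(data)} -> {mean(with_outlier):.2f}")
print(f"median: {median(data)} -> {median(with_outlier)}")
```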

How To Spot A Bad Graph

Sometimes, graphs may not present an accurate display of the data. This may be accidental or intentional.