We gather data to tell us something about a population, but a spreadsheet full of raw data doesn’t tell us much.
To analyze the data we collect, we always follow the same three-step strategy:
1. Choose the best graph based on the level of measurement.
2. Look at the shape, center, and variation of the distribution.
3. Use a few numbers to describe the center (mean, median, mode) and the variation (standard deviation, five-number summary).
In this section, we look at the first two steps for distributions of single variables.
1. We choose the best table or graph to display the data.
2. We identify patterns and deviations in the data. (This helps us choose the best numerical summaries in Step 3.)
Frequency Tables
A frequency distribution is one way to organize raw data.
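It shows two things: the categories (or classes) of the data, and the frequency of each one, which is the number of data values that fall in it.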
This video shows how to construct and interpret a frequency table.
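As a rough illustration of the idea, a frequency table can be built by counting how often each value occurs. The sketch below uses Python's standard `collections.Counter`; the `grades` data and the `frequency_table` helper are made up for this example.

```python
from collections import Counter

# Hypothetical raw data: letter grades for 12 students (made up for illustration).
grades = ["B", "A", "C", "B", "B", "A", "D", "C", "B", "A", "C", "B"]

def frequency_table(values):
    """Return (category, frequency) rows, sorted by category."""
    counts = Counter(values)
    return sorted(counts.items())

table = frequency_table(grades)
for category, frequency in table:
    print(f"{category:<10}{frequency:>3}")
```

The frequencies always add up to the number of data values, which is a quick way to check a table built by hand.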
Types of graphs and their uses
The most common graphs for categorical variables are:
• pie charts
• bar graphs
The most common graphs for quantitative variables are:
• histograms
• stemplots
This video gives an excellent overview of these graphs.
It’s important to choose a graph that is appropriate for your data set.
Before you create a graph, identify the type of variable: categorical or quantitative.
This video can help you choose an appropriate graph to display the distribution of your variable.
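The choice can be sketched as a simple rule. The hypothetical helper `suggest_graphs` below assumes that a variable whose values are all numbers is quantitative and that anything else is categorical; it returns the graph types listed in this section.

```python
def suggest_graphs(values):
    """Suggest graph types for one variable.

    Assumption: all-numeric values mean a quantitative variable;
    anything else is treated as categorical.
    """
    is_quantitative = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if is_quantitative:
        return ["histogram", "stemplot"]
    return ["pie chart", "bar graph"]

print(suggest_graphs([62, 71, 68, 75]))        # quantitative example
print(suggest_graphs(["red", "blue", "red"]))  # categorical example
```

This rule is only a starting point. Numbers that act as labels, such as zip codes or jersey numbers, are categorical even though they look numeric, so the final decision still depends on what the variable measures.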
Graphs for Categorical variables 

Pie charts are good: 
Bar charts are good: 
Pareto charts are good: 
Dot plots are good: 




https://ec.europa.eu/eurostat/web/productseurostatnews//DDN201809201 
Moore, Statistics: Concepts and Controversies, 9e, 2017 by W. H. Freeman and Co 
https://www.pinterest.co.uk/pin/406168460117709867/ 

Graphs for Quantitative variables 

Histograms are good: 
Frequency polygons are good:
Stem-and-leaf plots are good:
Boxplots are good: 




https://courses.lumenlearning.com/introstats1/chapter/histogramsfrequencypolygonsandtimeseriesgraphs/ 
https://www.onlinemath4all.com/analyzingboxplotsworksheet.html 
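To make one of these graphs concrete, here is a minimal sketch of a stem-and-leaf plot in plain Python. It assumes non-negative whole numbers, with the last digit as the leaf and the remaining digits as the stem. The `stemplot` function and the `scores` data are made up for illustration.

```python
from collections import defaultdict

def stemplot(values):
    """Return the lines of a stem-and-leaf plot for non-negative whole numbers.

    Stem = all digits except the last; leaf = the last digit.
    """
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    lines = []
    # Include stems with no leaves so that gaps in the data stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        leaf_text = "".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem:>2} | {leaf_text}")
    return lines

# Hypothetical exam scores, made up for illustration.
scores = [67, 72, 75, 78, 81, 83, 83, 88, 94, 52]
print("\n".join(stemplot(scores)))
```

Turned on its side, the plot looks like a histogram with a class width of 10, but it still shows every individual data value.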

Graphs to show change over time 
Time-series graphs are good:

In Step 1, we choose the best graph to display the data.
Now, in Step 2, we identify patterns and deviations in the graph.
https://courses.lumenlearning.com/wmopenconceptsstatistics/chapter/dotplots2of2/
To find patterns and deviations, we look at:
• shape: whether the data distribution is relatively symmetric or not
• center: where most of the data values cluster in the data distribution
• variation: how far the values spread from the center of the data distribution (and whether there are outliers)
Shape of a distribution
To describe the shape of a distribution, look at:
• the number of modes
• whether the distribution is symmetric or skewed
http://www.lynnschools.org/classrooms/english/faculty/documents/tim_serino/Printable_Assignments/24_notes__describing_quantitative_data.pdf
This video briefly describes how to identify whether a distribution is symmetric or skewed.
Symmetric or skewed distribution? 

Symmetric
• data values are evenly distributed around the center of a unimodal distribution
• the left and right sides of the distribution are mirror images
• mode, mean, and median are the same
Skewed left (negatively)
• data values are more spread out on the left side
• the tail goes to the left
• outliers pull the mean to the left
Skewed right (positively)
• data values are more spread out on the right side
• the tail goes to the right
• outliers pull the mean to the right


all images from Statistical Reasoning for Everyday Life, 5e 
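Because the tail pulls the mean away from the median, comparing the two gives a rough numerical hint about the direction of skew. The sketch below is a heuristic only, not a replacement for looking at the graph. The `describe_skew` function and its `tolerance` cutoff are assumptions made for this example.

```python
from statistics import mean, median

def describe_skew(values, tolerance=0.05):
    """Rough guess at skew direction from the gap between mean and median.

    `tolerance` is a hypothetical cutoff: gaps smaller than this fraction
    of the data range are treated as "roughly symmetric".
    """
    gap = mean(values) - median(values)
    spread = max(values) - min(values)
    if spread == 0 or abs(gap) <= tolerance * spread:
        return "roughly symmetric"
    return "skewed right" if gap > 0 else "skewed left"

print(describe_skew([1, 2, 3, 4, 5]))               # mean 3, median 3
print(describe_skew([1, 2, 2, 3, 3, 4, 20]))        # mean 5, median 3
print(describe_skew([1, 17, 18, 18, 19, 19, 20]))   # mean 16, median 18
```

A mean above the median suggests a tail to the right, and a mean below the median suggests a tail to the left. The comparison can mislead for small or multimodal data sets, so it is best used alongside a histogram or stemplot.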
Center
The center is the location where most of the data values cluster in a distribution. Think about it as a “typical” value of the data set.
Spread (Variation)
Variation, or spread, describes how far the values are spread out from the center of the data distribution and whether there are outliers.
In the picture below, you can see increasing variation in each image as you move from left to right. The center of the data stays the same, but the values get more spread out.
Images, left to right: small variation, moderate variation, large variation.



https://www.spsstutorials.com/standarddeviation/
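The same idea can be checked with numbers. The three data sets below are made up so that they share the same center (a mean of 50) but spread out more and more. The sketch uses `statistics.pstdev`, the population standard deviation; `statistics.stdev` would give the sample version instead.

```python
from statistics import mean, pstdev

# Three hypothetical data sets with the same center (mean 50) but growing spread.
small = [48, 49, 50, 51, 52]
moderate = [40, 45, 50, 55, 60]
large = [10, 30, 50, 70, 90]

for name, data in [("small", small), ("moderate", moderate), ("large", large)]:
    print(f"{name:<9} mean={mean(data):.1f}  std dev={pstdev(data):.2f}")
```

The mean is identical in all three cases, while the standard deviation grows as the values move farther from the center.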
Outliers
An outlier is a value in a data set that is either very high or very low when compared to the other values.
An outlier increases variation in a data set.
To find an outlier, we must first create a graph.
Tip: An outlier strongly affects the mean of a data set but has little or no effect on the median.
https://statisticsbyjim.com/basics/histograms/ https://online.stat.psu.edu/stat462/node/170/
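A small numerical check shows how differently the mean and the median react to an outlier. The house prices below are hypothetical values chosen for illustration.

```python
from statistics import mean, median

# Hypothetical house prices in thousands of dollars, made up for illustration.
prices = [210, 220, 225, 230, 240]
with_outlier = prices + [900]  # add one very high value

print("without outlier:", mean(prices), median(prices))
print("with outlier:   ", mean(with_outlier), median(with_outlier))
```

Adding the single high value moves the mean from 225 to 337.5, while the median only moves from 225 to 227.5.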
Sometimes, graphs may not present an accurate display of the data. This may be accidental or intentional.