The goal of collecting data is to find out something about the characteristics of a sample or a population.
To analyze the data we collect, we always follow the same 3-step strategy:

Step 1: Choose the best graph based on the level of measurement.

Step 2: Look at the shape of the distribution.

Step 3: Use a few numbers to describe:
Center (mean, median, mode)
Variation (standard deviation, five-number summary)
Looking at a graph gives us a quick idea of the distribution of values in the set—a graph “summarizes” or “describes” the data.
In this section we will look at Step 3: using a few numbers to describe the data.
Tip: Before choosing the best numerical measures of center and variation, we first graph the data and look at its shape (Steps 1 and 2).
Graphs help us see the patterns in data, but we often prefer to use one number, or a numerical summary, to describe a data set.
There are two primary numerical measures of a data set: center and spread (also called variation).
Some of the most commonly used measures of center and spread are outlined in this video.
The center of a data set, or the “average,” is where most values cluster. A number that describes it is called a measure of center, or a measure of central tendency. You can also think of it as representing the “typical” value of a data set.
There are three common measures of central tendency: Mean, Median, Mode.
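Measures of central tendency

| Measure | Description | How to calculate | When to use it |
| --- | --- | --- | --- |
| Mode | Most frequently occurring value | Count how often each value occurs and take the most frequent one | Nominal data |
| Median | Midpoint of an ordered data set | Order the values and take the middle one (or the average of the two middle values) | Ordinal data; quantitative data that is skewed or has outliers |
| Mean (most common) | Arithmetic average | Add up all the values and divide by the number of values | Quantitative data with a symmetric distribution |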
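As a minimal sketch of the three measures in the table, the following uses Python's standard `statistics` module. The `scores` list is made-up data for illustration only, and the variable names are assumptions, not part of the original material.

```python
from statistics import mean, median, mode

# Hypothetical quiz scores (made up for illustration)
scores = [2, 3, 3, 5, 7, 8, 3, 9]

center_mode = mode(scores)      # most frequently occurring value
center_median = median(scores)  # midpoint of the ordered values
center_mean = mean(scores)      # arithmetic average: sum divided by count

print("mode:", center_mode)
print("median:", center_median)
print("mean:", center_mean)
```

Because this list has an even number of values, the median is the average of the two middle values once the data are ordered.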
Choosing the best measure of center
To choose the best measure of center for a distribution, follow Steps 1 and 2: graph your data and look at its shape.
This brief video shows the differences between the three measures of center.
Variation tells us how far the data values are spread from the center.
There are a few measures of spread for quantitative data: range, the five-number summary (quartiles, minimum and maximum), and standard deviation.
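Measures of spread or variation

| Measure | Description | How to calculate | When to use it |
| --- | --- | --- | --- |
| Range | Difference between highest and lowest value | Highest value minus lowest value | |
| Percentiles | Values that divide the data into 100 equal groups | Order the values; the pth percentile is the value with about p% of the data at or below it | |
| Quartiles | Values that divide the distribution into 4 equal groups. The interquartile range (IQR) describes the middle 50% of the data | Find the 25th, 50th, and 75th percentiles (Q1, median, Q3); IQR = Q3 minus Q1 | When the distribution is skewed or has outliers; use with the median |
| Standard deviation | Roughly the average distance of observations from the mean | Take the square root of the average squared distance from the mean (dividing by n − 1 for a sample) | Only when the distribution is symmetric; use with the mean |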
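The measures of spread in the table can be sketched with Python's standard `statistics` module. The `data` list is invented for illustration. Note that textbooks and software use slightly different conventions for quartiles, so the values from `statistics.quantiles` (which defaults to its "exclusive" method) may not match a hand calculation on every data set.

```python
from statistics import quantiles, stdev

# Hypothetical measurements (made up for illustration)
data = [4, 7, 8, 10, 12, 15, 21]

# Range: highest value minus lowest value
data_range = max(data) - min(data)

# Quartiles: three cut points that divide the ordered data into 4 groups
q1, q2, q3 = quantiles(data, n=4)

# Interquartile range: spread of the middle 50% of the data
iqr = q3 - q1

# Sample standard deviation (divides by n - 1)
s = stdev(data)

print("range:", data_range)
print("Q1, median, Q3:", q1, q2, q3)
print("IQR:", iqr)
print("standard deviation:", round(s, 2))
```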
Choosing the best measure of spread
To choose the best measure of spread for a distribution, follow Steps 1 and 2: graph your data and look at its shape.
This video describes three common measures of spread.
Outliers have a large impact on the mean, by pulling the mean in the direction of the outlier. This results in a number that may be far from where most of the data points cluster.
When there are outliers, always use the median as the measure of central tendency. The median is less influenced by an outlier.
Outliers also impact the measure of spread. When there are outliers, the variation is larger.
When there are outliers, use the five-number summary as the measure of spread.
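To make the effect of an outlier concrete, here is a small sketch that compares the same data set with and without one extreme value. The commute times are invented for illustration, and `iqr` is a hypothetical helper defined only for this example.

```python
from statistics import mean, median, quantiles, stdev

# Hypothetical commute times in minutes (made up for illustration)
typical = [18, 20, 21, 22, 24, 25, 27]
with_outlier = typical + [95]  # one unusually long commute

def iqr(values):
    """Interquartile range: Q3 minus Q1."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1

for label, values in [("without outlier", typical), ("with outlier", with_outlier)]:
    print(label)
    print("  mean:", round(mean(values), 2), " median:", median(values))
    print("  standard deviation:", round(stdev(values), 2), " IQR:", iqr(values))
```

In this sketch the mean and standard deviation change a great deal when the outlier is added, while the median and IQR move only slightly.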
This video discusses the effects of outliers on spread and center.
When a data distribution has outliers or is skewed, the mean (measure of center) and standard deviation (measure of spread) are not accurate summaries of the data set.
The five-number summary is used to numerically summarize a data set when there are outliers or a skewed distribution.
The five-number summary uses five numbers to summarize a data set. The numbers are listed from smallest to largest:
Minimum value
First quartile
Median
Third quartile
Maximum
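The five numbers listed above can be computed with a short function. This is a sketch only: `five_number_summary` is a hypothetical helper name, the data are made up, and the quartiles come from `statistics.quantiles`, whose default convention may differ slightly from the method taught in a given course.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    q1, med, q3 = quantiles(values, n=4)
    return (min(values), q1, med, q3, max(values))

# Hypothetical data (made up for illustration)
summary = five_number_summary([4, 7, 8, 10, 12, 15, 21])
print(summary)
```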
A boxplot is the visual display that is used to show the five-number summary.
How to create and interpret the five-number summary (watch until 3:31)
Here is a brief example of calculating the five-number summary and drawing a boxplot.
| Shape of the distribution | Best numerical descriptor | Numbers used | Best graph |
| --- | --- | --- | --- |
| Skewed or outliers? | The five-number summary | Min, Q1, Med, Q3, Max | Boxplot |
| Relatively symmetric? | Mean and standard deviation | x̄, s | Histogram |
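The decision in this table can be sketched as a small function. It assumes the shape has already been judged by looking at a graph, and takes that judgment as an argument rather than trying to detect it. The function name `summarize` and the labels "symmetric" and "skewed" are hypothetical choices for this example.

```python
from statistics import mean, quantiles, stdev

def summarize(values, shape):
    """Pick numerical descriptors from the shape seen in a graph.

    `shape` is the reader's own judgment: "symmetric" or "skewed".
    Use "skewed" whenever outliers are present.
    """
    if shape == "symmetric":
        # Relatively symmetric: mean and standard deviation
        return {"mean": mean(values), "sd": stdev(values)}
    if shape == "skewed":
        # Skewed or outliers: five-number summary
        q1, med, q3 = quantiles(values, n=4)
        return {"min": min(values), "q1": q1, "median": med,
                "q3": q3, "max": max(values)}
    raise ValueError("shape must be 'symmetric' or 'skewed'")

# Hypothetical data (made up for illustration)
print(summarize([1, 2, 3], "symmetric"))
print(summarize([4, 7, 8, 10, 12, 15, 21], "skewed"))
```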
This video shows the reasoning behind choosing either the mean and standard deviation or the five-number summary.