QM Course Guide

Describing Data with Numbers

The goal of collecting data is to find out something about the characteristics of a sample or a population.

To analyze the data we collect, we always follow the same 3-step strategy:

  1. Make a graph

Choose the best graph based on the level of measurement.

  2. Identify patterns and deviations

Look at:

  • shape
  • center
  • spread

  3. Choose a numerical summary

Use a few numbers to describe:

  • measures of center (mean, median, mode)
  • measures of spread (standard deviation, five-number summary)

Looking at a graph gives us a quick idea of the distribution of values in the set—a graph “summarizes” or “describes” the data. 

In this section we will look at Step 3:

  • how to “summarize” or “describe” a data set using numbers

Tip: Before choosing the best numerical measures of center and variation, we need to:

  • know the variable’s level of measurement
  • examine the graph of the data

Numerical Summaries

Graphs help us see the patterns in data, but we often prefer to use one number, or a numerical summary, to describe a data set.

There are two primary types of numerical measures for a data set:

  • Measures of central tendency (or center): where most of the data points cluster
  • Measures of spread (or variation, or dispersion): how far the data values are spread from the center

Some of the most commonly used measures of center and spread are outlined in this video.

Measures of Central Tendency (or Center)

The center of a data set, or the “average,” is where most values cluster. A number that describes it is called a measure of center. You can also think of it as the “typical” value of a data set.

There are three common measures of central tendency: Mean, Median, Mode. 

Measures of central tendency

| Measure | Description | How to calculate | When to use it |
|---|---|---|---|
| Mode | Most frequently occurring value | How to calculate the mode | Nominal data |
| Median | Midpoint of an ordered data set | $M$ is at position $\frac{n+1}{2}$ in the ordered data. How to calculate the median | Ordinal data; quantitative data that is skewed or has outliers |
| Mean (most common) | Arithmetic average | $\bar{x} = \frac{\sum x_i}{n}$. How to calculate the mean | Quantitative data with a symmetric distribution |
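The three measures above can be checked with a few lines of Python. This is a minimal sketch using the standard `statistics` module; the `books` data set is made up for illustration and is not from the course.

```python
from statistics import median, multimode

# Hypothetical sample (not from the course): books read by nine students.
books = [2, 3, 3, 4, 5, 5, 5, 6, 21]

ordered = sorted(books)
n = len(ordered)

# Mean: add up all the values and divide by how many there are.
x_bar = sum(ordered) / n

# Median: the value at position (n + 1) / 2 in the ordered list.
# Here n = 9, so the median is the 5th value.
position = (n + 1) / 2
m = median(ordered)

# Mode: the most frequently occurring value (there can be more than one).
modes = multimode(ordered)

print(f"mean = {x_bar}, median = {m} (position {position}), mode = {modes}")
```

For this sample the mean is 6.0 while the median and the mode are both 5: the single large value (21) pulls the mean above the median. When n is even, (n + 1) / 2 falls between two positions, and `median` returns the average of the two middle values.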

Choosing the best measure of center

To choose the best measure of center for a distribution, follow Steps 1 and 2: graph your data and look at its shape.

  • Skewed distribution? Use the median.
  • Outliers? Use the median.
  • Symmetric distribution? Use the mean.

This brief video shows the differences between the three measures of center.

 

Measures of Variation (or Spread)

Variation tells us how far the data values are spread from the center.

There are a few measures of spread for quantitative data: the range, the five-number summary (quartiles, minimum and maximum), and the standard deviation.

Measures of spread or variation

| Measure | Description | How to calculate | When to use it |
|---|---|---|---|
| Range | Difference between highest and lowest value | How to calculate the range | |
| Percentiles | Values that divide the data into 100 equal groups | How to calculate a percentile | |
| Quartiles | Values that divide the distribution into 4 equal groups. The interquartile range (IQR) describes the middle 50% of the data | How to calculate quartiles | When the distribution is skewed or has outliers. Use with the median |
| Standard deviation | The average distance that observations are spread from the mean | How to calculate the standard deviation | Only when the distribution is symmetric. Use with the mean |
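As a rough illustration of these measures of spread, the sketch below computes each one with Python's standard `statistics` module. The `scores` list is invented for the example. Note that quartile rules differ slightly between textbooks and software, so the quartiles here may not match a hand calculation exactly.

```python
from statistics import quantiles, stdev

# Hypothetical sample (not from the course): quiz scores for eight students.
scores = [4, 7, 8, 9, 10, 11, 12, 15]

# Range: highest value minus lowest value.
data_range = max(scores) - min(scores)

# Quartiles: the three cut points that split the ordered data into 4 groups.
# quantiles() uses its default "exclusive" rule; other rules give
# slightly different cut points.
q1, q2, q3 = quantiles(scores, n=4)

# Interquartile range: the width of the middle 50% of the data.
iqr = q3 - q1

# Sample standard deviation (the sum of squared deviations is divided by n - 1).
s = stdev(scores)

print(f"range = {data_range}, Q1 = {q1}, Q3 = {q3}, IQR = {iqr}, s = {s:.2f}")
```

The same function gives percentiles: `quantiles(scores, n=100)` returns the 99 cut points that divide the data into 100 groups.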

Choosing the best measure of spread

To choose the best measure of spread for a distribution, follow Steps 1 and 2: graph your data and look at its shape.

  • Skewed distribution? Use the five-number summary.
  • Outliers? Use the five-number summary. 
  • Symmetric distribution? Use the standard deviation.

This video describes three common measures of spread.

 

Outliers Impact Measures of Center and Spread

Outliers have a large impact on the mean, by pulling the mean in the direction of the outlier.  This results in a number that may be far from where most of the data points cluster.

When there are outliers, always use the median as the measure of central tendency. The median is less influenced by an outlier.

Outliers also affect measures of spread. An outlier increases the range and the standard deviation, so the data appear more spread out than most of the values really are.

When there are outliers, use the five-number summary as the measure of spread.
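The effect of a single outlier is easy to see numerically. The sketch below compares the same summaries with and without one extreme value; the commute times are hypothetical and `summarize` is a helper name chosen for this example.

```python
from statistics import mean, median, quantiles, stdev

def summarize(values):
    """Return center and spread measures for a list of numbers."""
    q1, _, q3 = quantiles(values, n=4)
    return {
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),
        "iqr": q3 - q1,
    }

# Hypothetical commute times in minutes (not from the course).
typical = [18, 20, 21, 22, 23, 24, 26]
with_outlier = typical + [90]  # one unusually long commute

before = summarize(typical)
after = summarize(with_outlier)

for name in before:
    print(f"{name:>6}: {before[name]:6.2f} -> {after[name]:6.2f}")
```

Adding the 90-minute commute moves the mean from 22 to 30.5, while the median only moves from 22 to 22.5. The standard deviation grows many times over, while the IQR changes far less, which is why the median and the quartile-based summary are preferred when outliers are present.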

This video discusses the effects of outliers on spread and center.

The Five Number Summary and Boxplots

When a data distribution has outliers or is skewed, the mean (measure of center) and standard deviation (measure of spread) are not accurate summaries of the data set. 

The five-number summary is used to numerically summarize a data set when there are outliers or a skewed distribution.

The five-number summary uses five numbers to summarize a data set.  The numbers are listed from smallest to largest:

  1. Minimum value
  2. First quartile
  3. Median
  4. Third quartile
  5. Maximum value

A boxplot is the visual display that is used to show the five-number summary.
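A minimal sketch of the five-number summary in Python follows. The function name `five_number_summary` and the data are assumptions made for this example, and the quartiles follow the default rule of `statistics.quantiles`, which may differ slightly from the method shown in the videos.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(values)
    q1, med, q3 = quantiles(ordered, n=4)
    return ordered[0], q1, med, q3, ordered[-1]

# Hypothetical data with one large value on the right (not from the course).
waits = [1, 3, 4, 6, 7, 8, 10, 12, 13, 15, 40]

minimum, q1, med, q3, maximum = five_number_summary(waits)
print(minimum, q1, med, q3, maximum)
```

For this data the five numbers are 1, 4, 8, 13 and 40. In a boxplot the box would run from Q1 (4) to Q3 (13) with a line at the median (8).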

How to create and interpret the five-number summary (Watch until 3:31)

Here is a brief example on calculating the five-number summary and drawing a box plot.

How to Choose the Best Numerical Summary for a Quantitative Distribution

| Shape of the distribution | Best numerical descriptor | Numbers used | Best graph |
|---|---|---|---|
| Skewed or outliers? | The five-number summary | Min, Q1, Med, Q3, Max | Boxplot |
| Relatively symmetric? | Mean and standard deviation | $\bar{x}$, $s$ | Histogram |
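The decision rule in this table can be sketched as a small Python function. The shape is still judged by eye from the graph (Steps 1 and 2). The function `numerical_summary` and its `shape` labels are hypothetical names used only for this illustration.

```python
from statistics import mean, quantiles, stdev

def numerical_summary(values, shape):
    """Return the summary suited to the shape judged from the graph.

    `shape` is a label supplied by the reader: "symmetric", or
    "skewed" (also use "skewed" when the data have outliers).
    """
    if shape == "symmetric":
        return {"mean": mean(values), "sd": stdev(values)}
    if shape == "skewed":
        ordered = sorted(values)
        q1, med, q3 = quantiles(ordered, n=4)
        return {"min": ordered[0], "q1": q1, "median": med,
                "q3": q3, "max": ordered[-1]}
    raise ValueError("shape must be 'symmetric' or 'skewed'")

# Hypothetical data sets (not from the course).
heights = [4, 5, 5, 6, 6, 6, 7, 7, 8]           # roughly symmetric
incomes = [1, 2, 2, 3, 3, 4, 5, 9, 20, 35, 60]  # skewed to the right

symmetric_summary = numerical_summary(heights, "symmetric")
skewed_summary = numerical_summary(incomes, "skewed")
print(symmetric_summary)
print(skewed_summary)
```

Keeping the shape as an explicit input mirrors the strategy in this guide: the graph comes first, and the numbers are chosen afterward.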

This video shows the reasoning behind choosing either the mean and standard deviation or the five-number summary.