George W. Wallace Library: QM Course Guide: Normal Distributions

The Normal Distribution

What is the normal distribution?

When graphed, normally distributed data form a curve that is:

symmetrical
bell-shaped, and
single-peaked.

It looks like this.

https://www.allaboutcircuits.com/technical-articles/introduction-to-Gaussian-distribution-electrical-engineering/

Symmetry & Skewness

A perfectly normal distribution is symmetrical: the data to the left of the center mirror the data to the right, and the mean, median, and mode are all equal.

Some distributions are roughly bell-shaped but are not symmetrical, because the data trail off in a longer tail to the left or to the right. These distributions are skewed, and are not considered normal.

Left-skewed distribution

Right-skewed distribution

https://openstax.org/books/introductory-statistics/pages/2-6-skewness-and-the-mean-median-and-mode?query=symmetry&target=%7B%22type%22%3A%22search%22%2C%22index%22%3A0%7D#element-6520

Describing the normal distribution

We use both the mean and standard deviation to describe the normal distribution.

The mean is the average of the data set. In a perfectly normal distribution it will be the same as the mode and median.

But look at the graphs below. They could all have the same mean, yet the spread of the data differs.

https://tinystats.github.io/teacups-giraffes-and-statistics/04_variance.html

So, we also use the standard deviation to describe the curve. The standard deviation measures how spread out the data are around the mean. It is the square root of the variance.

How do we calculate the standard deviation?

https://www.k2analytics.co.in/measures-of-dispersion-standard-deviation/
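
As a minimal sketch of the calculation, assuming the usual definitions: the population standard deviation is the square root of the average squared distance from the mean, and the sample standard deviation divides by n - 1 instead of n. The data values below are made up for illustration, and Python's standard `statistics` module is used only to check the hand calculation.

```python
import math
import statistics

# Hypothetical data set, made up for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Step 1: find the mean.
mean = sum(data) / len(data)

# Step 2: square each value's distance from the mean.
squared_diffs = [(x - mean) ** 2 for x in data]

# Step 3: average the squared distances (variance), then take the square root.
population_variance = sum(squared_diffs) / len(data)
population_sd = math.sqrt(population_variance)

# For a sample rather than a whole population, divide by n - 1 instead of n.
sample_variance = sum(squared_diffs) / (len(data) - 1)
sample_sd = math.sqrt(sample_variance)

print("mean:", mean)
print("population SD:", population_sd)
print("sample SD:", sample_sd)

# The built-in functions should agree with the step-by-step results.
print("pstdev:", statistics.pstdev(data))
print("stdev:", statistics.stdev(data))
```

Which version applies depends on whether the data cover the whole population or only a sample of it.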

The Z-score / Standard score

The z-score tells us how far a value is from the mean, expressed in standard deviations. A positive z-score means the value is above the mean, and a negative z-score means it is below the mean.

The z-score is calculated with this formula:

z = (x - μ) / σ

where x is the data value, μ is the mean, and σ is the standard deviation.

The z-score of the mean is z = 0.

The z-score of a data value 2.2 standard deviations above the mean is z = 2.2.

https://mathbitsnotebook.com/Algebra2/Statistics/STzScores.html

The z-score of a data value 0.5 standard deviations below the mean is z = -0.5.
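
The same three cases can be sketched in Python, assuming the standard definition of the z-score as (value - mean) / standard deviation. The function name `z_score` and the exam scores (mean 70, standard deviation 10) are hypothetical, chosen only to reproduce the z-values above.

```python
def z_score(value, mean, sd):
    """Return how many standard deviations `value` lies from `mean`."""
    return (value - mean) / sd

# Hypothetical exam scores with mean 70 and standard deviation 10.
MEAN = 70
SD = 10

print(z_score(70, MEAN, SD))  # a value equal to the mean
print(z_score(92, MEAN, SD))  # a value 22 points above the mean
print(z_score(65, MEAN, SD))  # a value 5 points below the mean
```

A score equal to the mean gives z = 0, a score of 92 gives z = 2.2, and a score of 65 gives z = -0.5.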

Why do we use the z-score?

Once we know the z-score of any given value, the normal distribution allows us to find its percentile.
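
As a rough illustration of that lookup, the sketch below uses `NormalDist` from Python's standard `statistics` module in place of a printed z-table. The helper name `percentile_from_z` is hypothetical. It assumes the data really are normally distributed.

```python
from statistics import NormalDist

# The standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def percentile_from_z(z):
    """Percentage of a normal distribution that falls at or below z."""
    return standard_normal.cdf(z) * 100

for z in (-0.5, 0, 2.2):
    print(f"z = {z}: percentile {percentile_from_z(z):.1f}")
```

A z-score of 0 sits at the 50th percentile, because half of a normal distribution lies below the mean. A z-score of 2.2 comes out at roughly the 98.6th percentile, and a z-score of -0.5 at roughly the 30.9th.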

The Empirical Rule

The empirical rule (also known as the 68-95-99.7 rule) tells us approximately what percentage of our data points fall within 1, 2 or 3 standard deviations of the mean.

Mean +/- standard deviations	Percentage of data contained
1	68%
2	95%
3	99.7%
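
The percentages in the table can be checked against the normal curve itself. This is a minimal sketch using `NormalDist` from Python's standard `statistics` module. The helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.2%}")
```

The exact values are slightly different from the rounded figures in the rule, which is why the rule is treated as an approximation.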

Examples of variables that are (approximately) normally distributed

The heights of female Champlain College students
Birth weights of babies
Your travel times from home to school
The number of McIntosh apples that fit in a bushel basket

Why is the Normal Distribution important?

Many common hypothesis tests rely on the normal distribution to tell us whether our test results are statistically significant, using p-values of 0.05, 0.01 and 0.001 as common cut-offs for significance.

How do we know if a data set in SPSS is normally distributed? Is there a test in SPSS for that?

To visually assess if a distribution is approximately normal we can overlay a normal curve on a histogram.

Or you can test for normality using the Kolmogorov-Smirnov and Shapiro-Wilk tests in SPSS.

The Central Limit Theorem

Using the central limit theorem we can make inferences about a population from sample data.

Even if the population is not normally distributed, if each sample is large enough (a common rule of thumb is n > 30), the means of repeated samples from that population will be approximately normally distributed.

This allows us to use the normal curve to draw conclusions, even if the individual data values are not normally distributed.

https://slidetodoc.com/section-6-5-the-central-limit-theorem-learning/
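
The idea can be tried out with a small simulation. This is a sketch under stated assumptions, not a proof: the population is assumed to follow an exponential distribution (strongly right-skewed, with a mean of 1.0), and the sample size of 40 and the 2,000 repetitions are arbitrary choices made for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 40    # assumed size of each sample (n > 30)
NUM_SAMPLES = 2000  # assumed number of repeated samples

def sample_mean(n):
    """Draw n values from a right-skewed population and return their mean."""
    values = [random.expovariate(1.0) for _ in range(n)]
    return statistics.mean(values)

means = [sample_mean(SAMPLE_SIZE) for _ in range(NUM_SAMPLES)]

center = statistics.mean(means)
spread = statistics.stdev(means)

# If the sample means are roughly normal, about 68% of them should lie
# within one standard deviation of their own center (the empirical rule).
within_one_sd = sum(abs(m - center) <= spread for m in means) / NUM_SAMPLES

print(f"mean of the sample means: {center:.3f}")
print(f"standard deviation of the sample means: {spread:.3f}")
print(f"share within 1 standard deviation: {within_one_sd:.1%}")
```

Although the individual values are heavily skewed, the sample means cluster symmetrically around the population mean, and the share within one standard deviation comes out close to the 68% expected of a normal curve.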

Why is the Central Limit Theorem important?