The normal distribution is graphed data that forms a symmetrical, bell-shaped curve, with most values clustered around the mean. It looks like this:
https://www.allaboutcircuits.com/technical-articles/introduction-to-Gaussian-distribution-electrical-engineering/
Symmetry & Skewness
Perfectly normal distributions are symmetrical: the data mirror each other to the right and left of the mean, median, and mode (which are all equal).
Some distributions are close to normal but are not symmetrical, with more data points on one side of the mean than the other. These distributions are skewed and are not considered normal.
Left-skewed distribution | Right-skewed distribution
https://openstax.org/books/introductory-statistics/pages/2-6-skewness-and-the-mean-median-and-mode?query=symmetry&target=%7B%22type%22%3A%22search%22%2C%22index%22%3A0%7D#element-6520
We use both the mean and standard deviation to describe the normal distribution.
The mean is the average of the data set. In a perfectly normal distribution it will be the same as the mode and median.
But, see the graphs below? They could all have the same mean, but the spread of the data differs.
https://tinystats.github.io/teacups-giraffes-and-statistics/04_variance.html
So, we also use the standard deviation to describe the curve. The standard deviation measures how spread out the data are; it is the square root of the variance.
https://www.k2analytics.co.in/measures-of-dispersion-standard-deviation/
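As a concrete illustration, the mean and standard deviation of a small (made-up) data set can be computed with Python's standard library:

```python
import statistics

# A hypothetical, roughly symmetric data set.
data = [4, 5, 5, 6, 6, 6, 7, 7, 8]

mean = statistics.mean(data)   # the arithmetic average
sd = statistics.stdev(data)    # sample standard deviation (square root of the variance)

print(mean)  # 6
print(sd)    # about 1.22
```

Two data sets can share the same mean but have very different standard deviations, which is exactly what the graphs above show.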
The z-score tells us how far a value is from the mean, expressed in standard deviations. It can fall either above or below the mean.
The z-score is calculated with this formula:
z = (x − μ) / σ
where x is the data value, μ is the mean, and σ is the standard deviation.
The z-score of the mean is z = 0.
The z-score of a data value 2.2 standard deviations above the mean is z = 2.2.
The z-score of a data value 0.5 standard deviations below the mean is z = −0.5.
https://mathbitsnotebook.com/Algebra2/Statistics/STzScores.html
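These examples can be sketched in Python; the mean of 100 and standard deviation of 10 below are hypothetical values chosen for illustration:

```python
def z_score(x, mean, sd):
    """How many standard deviations the value x lies from the mean."""
    return (x - mean) / sd

# Hypothetical distribution: mean 100, standard deviation 10.
print(z_score(100, 100, 10))  # 0.0  (the mean itself)
print(z_score(122, 100, 10))  # 2.2  (2.2 SDs above the mean)
print(z_score(95, 100, 10))   # -0.5 (0.5 SDs below the mean)
```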
Why do we use the z-score?
Once we know the z-score of any given value, the normal distribution allows us to find its percentile.
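One way to see this step is with the standard normal cumulative distribution function, which can be written with the error function from Python's math module (a sketch, not the only way to compute percentiles):

```python
import math

def percentile_from_z(z):
    """Cumulative probability (percentile) of a z-score under the standard normal curve."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

print(round(percentile_from_z(0) * 100, 1))     # 50.0 -- the mean is the 50th percentile
print(round(percentile_from_z(2.2) * 100, 1))   # 98.6
print(round(percentile_from_z(-0.5) * 100, 1))  # 30.9
```

So a value with z = 2.2 sits at roughly the 99th percentile, while z = −0.5 sits near the 31st.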
The empirical rule (also known as the 68-95-99.7 rule) tells us what percentage of the data falls within 1, 2, or 3 standard deviations of the mean.
Mean ± standard deviations | Percentage of data contained
1 | 68%
2 | 95%
3 | 99.7%
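The 68-95-99.7 figures are rounded; the exact proportions can be recovered from the normal curve itself, as this short check shows:

```python
import math

def within(z):
    """Fraction of a normal distribution lying within ±z standard deviations of the mean."""
    return math.erf(z / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(within(k) * 100, 1))
# 1 68.3
# 2 95.4
# 3 99.7
```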
Examples of variables that are (approximately) normally distributed include adult height, blood pressure, and measurement error.
Most hypothesis testing (link to section on hypothesis testing) relies on the normal distribution to tell us whether our test results are meaningful, with p-values of 0.05, 0.01, and 0.001 as common cut-offs for significance.
How do we know if a data set in SPSS is normally distributed? Is there a test in SPSS for that?
To visually assess if a distribution is approximately normal we can overlay a normal curve on a histogram.
Or, you can test for normality using the Kolmogorov-Smirnov and Shapiro-Wilk tests in SPSS.
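The same two tests are also available outside SPSS, for example via SciPy in Python (this assumes the scipy package is installed; it is not part of the standard library):

```python
import random
from scipy import stats

random.seed(1)
# Simulated data drawn from a normal distribution (mean 50, SD 5).
sample = [random.gauss(50, 5) for _ in range(200)]

sw_stat, sw_p = stats.shapiro(sample)                        # Shapiro-Wilk
ks_stat, ks_p = stats.kstest(sample, "norm", args=(50, 5))   # Kolmogorov-Smirnov

# A p-value above 0.05 means we fail to reject normality.
print(round(sw_p, 3), round(ks_p, 3))
```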
Using the central limit theorem we can make inferences about a population from sample data.
Even if the data in a sample are not normally distributed, if you take enough sufficiently large samples (each with n > 30) from the same population, the means of those samples will be approximately normally distributed.
This allows us to use the normal curve to draw conclusions, even if the individual sample data is not normally distributed.
https://slidetodoc.com/section-6-5-the-central-limit-theorem-learning/
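The idea can be demonstrated with a quick simulation using only the standard library; the exponential population below is a hypothetical choice of a strongly skewed, non-normal distribution:

```python
import random
import statistics

random.seed(0)

# Population: exponential with mean 1.0 -- strongly right-skewed, not normal.
def sample_mean(n):
    return statistics.mean(random.expovariate(1.0) for _ in range(n))

# Take 1000 samples, each of size 40 (comfortably above n = 30),
# and collect their means.
means = [sample_mean(40) for _ in range(1000)]

# The sample means cluster around the population mean of 1.0, and their
# spread is close to 1 / sqrt(40), about 0.16, as the CLT predicts.
print(round(statistics.mean(means), 2))
print(round(statistics.stdev(means), 2))
```

Plotting `means` as a histogram would show a roughly bell-shaped curve, even though the underlying population is heavily skewed.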
Why is the Central Limit Theorem important?