George W. Wallace Library: QM Course Guide: Basic Concepts in Statistics

Basic Concepts in Statistics

This video gives an introduction to some of the basic ideas you need to get started.

Descriptive and Inferential Statistics

There are two major fields in statistics. QM starts with descriptive statistics and then moves on to inferential statistics.

Statistics
Descriptive statistics	Inferential statistics
summarizes (describes) data from a sample using graphs and numbers	draws a conclusion (an inference) about a population based on data from a sample
Examples: bar graphs, histograms, mean and median, standard deviation	Examples: confidence intervals, hypothesis tests
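To show what "summarizing data with numbers" looks like, here is a short sketch. It uses Python's standard `statistics` module on a small, made-up sample of study hours. The data are hypothetical. The block computes three of the descriptive measures named above: the mean, the median, and the sample standard deviation.

```python
import statistics

# Hypothetical sample: hours that 10 students studied last week
hours = [2, 4, 4, 5, 6, 7, 7, 7, 9, 12]

mean = statistics.mean(hours)      # sum of the values divided by how many there are
median = statistics.median(hours)  # middle value of the sorted data (average of the two middle values here)
stdev = statistics.stdev(hours)    # sample standard deviation (divides by n - 1)

print(f"mean={mean}, median={median}, sd={stdev:.2f}")
```

These numbers only describe this sample. Using them to say something about all students would be inference.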

Populations and Samples

By collecting information from a representative sample, we can draw conclusions about a population.

Population:

the entire group of individuals we want information about
the complete set

Sample:

the specific part of the group we get information from
a subset

https://www.omniconvert.com/what-is/sample-size/

Population parameter:

a number that describes a characteristic of a population
the number is fixed, but usually unknown

Sample statistic:

a number that describes a characteristic of a sample
we know what the number is, but it varies from sample to sample
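The simulation below sketches the difference between a parameter and a statistic. It builds a hypothetical population of 10,000 heights, assumed to be roughly normal around 170 cm. The population mean is a single fixed number. The mean of each random sample of 50 is known as soon as the sample is drawn, but it changes from one sample to the next.

```python
import random
import statistics

random.seed(0)

# Hypothetical population: heights (cm) of 10,000 people
population = [random.gauss(170, 10) for _ in range(10_000)]

# Population parameter: one fixed value (in real studies, usually unknown)
mu = statistics.mean(population)

# Sample statistics: the mean of each random sample of 50 people
sample_means = [statistics.mean(random.sample(population, 50)) for _ in range(5)]

print(f"population mean (parameter): {mu:.1f}")
for i, x_bar in enumerate(sample_means, start=1):
    print(f"sample {i} mean (statistic): {x_bar:.1f}")
```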

Variables, Types of Data, and Levels of Measurement

A variable’s level of measurement determines the type of graphs, measures of center and spread, and statistical tests that can be conducted.

An easy way to remember the levels of measurement is the acronym NOIR: Nominal, Ordinal, Interval, Ratio.

Four levels of measurement
Qualitative/Categorical: can only be grouped		Quantitative/Numerical: can be measured
Nominal	Ordinal	Interval	Ratio
Categories with no logical order	Categories with a logical order	Zero is arbitrary	Zero actually means none
marital status, hair color, gender, blood type	level of satisfaction (very low to very high), education level (secondary to PhD)	temperature (°F or °C), SAT scores, dates, grade point average	age, height, income, cost of data plan
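The arithmetic below illustrates the difference between "zero is arbitrary" and "zero actually means none." Temperature is on an interval scale, so a ratio such as "twice as warm" depends on the unit you choose. Height is on a ratio scale, so the ratio stays the same after a unit conversion. The conversion formulas are standard; the values are made up for illustration.

```python
def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32

def cm_to_inches(cm):
    return cm / 2.54

# Interval level: 0 °C is not "no temperature", so ratios change with the unit
temp_ratio_c = 20 / 10
temp_ratio_f = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)  # 68 °F / 50 °F

# Ratio level: 0 cm really means no height, so ratios survive a change of unit
height_ratio_cm = 180 / 90
height_ratio_in = cm_to_inches(180) / cm_to_inches(90)

print(temp_ratio_c, temp_ratio_f)
print(height_ratio_cm, height_ratio_in)
```

The temperature ratio is 2 in Celsius but about 1.36 in Fahrenheit. The height ratio is 2 in both units.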

Sampling Methods and Techniques

Reliable data comes from a sample of individuals that accurately represents the population of interest.

Biased sampling methods (like convenience sampling or voluntary response sampling) may not produce a representative sample because some part of the population may be underrepresented or overrepresented.

To avoid bias, we use a random (probability-based) sampling method, which gives every member of the population a known, nonzero chance of being selected and so tends to produce a representative sample.

In this video we will be looking at the different methods of obtaining a sample.

adapted from: https://www.scribbr.com/methodology/sampling-methods/ and http://web.colby.edu/jawieczo/files/2020/01/AmherstTalk_2020_01_24.pdf
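Here is a minimal sketch of why convenience sampling can be biased while simple random sampling is not. It assumes a hypothetical population of 1,000 people whose list happens to be ordered by age, for example a sign-up list that the youngest people joined first. Taking the first 50 names on the list is convenient but misses older people. `random.sample` gives every person the same chance of selection.

```python
import random
import statistics

random.seed(1)

# Hypothetical population: ages of 1,000 people, listed youngest to oldest
population = sorted(random.randint(18, 80) for _ in range(1000))
true_mean = statistics.mean(population)

# Convenience sample: the first 50 people on the list (easy to reach, but all young)
convenience = population[:50]

# Simple random sample: 50 people chosen at random, each equally likely to be picked
srs = random.sample(population, 50)

print(f"population mean age:         {true_mean:.1f}")
print(f"convenience sample mean age: {statistics.mean(convenience):.1f}")
print(f"random sample mean age:      {statistics.mean(srs):.1f}")
```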

Sampling and non-sampling error

A sample statistic will never be a perfect representation of the population parameter—it is always an estimate.

There are two types of possible errors: sampling and non-sampling.

We can measure sampling error with the margin of error: how many points the sample statistic may differ from the true population parameter.

Errors when using a sample statistic to estimate a population parameter
Sampling errors		Non-sampling errors
cause	how to reduce	possible causes	how to reduce
the fact that we only observe a part of the population	increase the sample size; use a good sampling method	non-coverage	include all parts of the population
		response error	construct clear questions
		non-response	contact respondents multiple times

This video summarizes why we use a probability sample and take a large sample to reduce sampling error.
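To see why a larger sample reduces sampling error, the sketch below uses a common textbook approximation for the margin of error of a sample proportion. This formula is an assumption, not something stated in this guide: MOE ≈ z × √(p̂(1 − p̂) / n), with z = 1.96 for 95% confidence and a simple random sample. Because n sits under a square root, you have to quadruple the sample size to cut the margin of error in half.

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Approximate margin of error for a sample proportion (95% confidence by default)."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# p_hat = 0.5 gives the largest (most conservative) margin of error
for n in (100, 400, 1600):
    print(f"n={n:5d}  margin of error = ±{margin_of_error(0.5, n) * 100:.1f} percentage points")
```

Note that a bigger sample does nothing about non-sampling errors such as non-coverage or non-response.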

Types of Statistical Studies

Experiments and observational studies are the two main types of statistical studies that social science researchers use.

In observational studies, we record the traits of individuals—we do not want to change their beliefs or behaviors. We want to describe a group or explore relationships between variables.

In experiments, we deliberately change a variable to see how it impacts other variables. We want to establish a cause-and-effect relationship.

Types of statistical studies
Observational studies			Experiments
Researcher: records characteristics of individuals with no intention of changing their beliefs or behaviors			Researcher: intentionally manipulates one variable (the treatment) to see how it affects other variables
Good for: describing populations; looking for associations between variables			Good for: establishing cause-and-effect relationships
There are three types of observational studies.
Survey	Census	Case study
Gathers data from: a sample of a population	Gathers data from: every individual in a population	Gathers data from: a few individuals, studied in depth

This video summarizes the main differences between an observational study and an experiment.

Experiments

The objective of an experiment is to determine whether a change in one variable (the explanatory variable) causes a change in another variable (the response variable). The explanatory variable that the researcher manipulates is called the treatment.

All other factors must be controlled so that we know it is only the explanatory variable that is causing the change in the response variable. This means we must eliminate any confounding (lurking) variables that might also affect the response variable.

Image source: https://adata.site.wesleyan.edu/schedule-2/confounding-and-multivariate-models/

The placebo effect is another factor that can distort the results of a study. It occurs when participants experience, or believe they experience, a change simply because they expect one as a result of taking part in the study.

To reduce the impact of confounding variables and the placebo effect, researchers do two things:

Create a control group that does not receive the treatment and a treatment group that does
Randomly assign participants to the groups
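Random assignment can be as simple as shuffling the list of participants and splitting it in half. The sketch below does this with Python's `random` module. The participant IDs and the helper name `randomly_assign` are hypothetical. Because chance alone decides who lands in each group, lurking variables tend to balance out between the treatment and control groups.

```python
import random

def randomly_assign(participants, seed=None):
    """Shuffle participants, then split them into two groups of (nearly) equal size."""
    rng = random.Random(seed)   # a seed makes the assignment reproducible
    shuffled = list(participants)  # copy so the original list is not modified
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

# 20 hypothetical participants
participants = [f"P{i:02d}" for i in range(1, 21)]
groups = randomly_assign(participants, seed=7)

print("Treatment:", groups["treatment"])
print("Control:  ", groups["control"])
```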

In this video we will be talking about the placebo effect, control groups, and double-blind experiments.

Here is a diagram of a randomized, controlled experiment

https://introductorystats.wordpress.com/2011/03/09/design-of-experiments/

An experiment is considered to be the “gold standard” in research, but many social research questions can only be answered using an observational study. How do you decide?

Measurement Errors

Measurement error is the difference between the observed (measured) value and the true value.

Two types of measurement error
Random error	Systematic error
Difference based on chance	Difference that is consistent
The measurement fluctuates: sometimes it’s higher than the true value and sometimes it’s lower.	The measurement is always higher or lower than the true value.
Affects precision: a precise instrument repeatedly produces the same measurement	Affects accuracy: an accurate instrument gives measurements close to the true value
For better precision: take the average of repeated measurements	For better accuracy: improve the measurement instrument
https://www.scribbr.com/methodology/random-vs-systematic-error/

This video gives a clear example of the difference between accuracy and precision.
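The simulation below shows why averaging repeated measurements helps with random error but not with systematic error. It assumes a hypothetical true value of 100. One instrument has only random noise. The other is miscalibrated so that it always reads 5 units high. The noise level and the 5-unit offset are made-up values chosen for illustration.

```python
import random
import statistics

random.seed(42)
TRUE_VALUE = 100.0  # hypothetical true value of the quantity being measured
N = 1000            # number of repeated measurements

# Random error: chance fluctuations, sometimes above and sometimes below the true value
random_readings = [TRUE_VALUE + random.gauss(0, 2) for _ in range(N)]

# Systematic error: a hypothetical miscalibrated instrument that always reads 5 units high
systematic_readings = [TRUE_VALUE + 5 + random.gauss(0, 2) for _ in range(N)]

avg_random = statistics.mean(random_readings)
avg_systematic = statistics.mean(systematic_readings)

print(f"average with random error only: {avg_random:.2f}")      # should land near the true value
print(f"average with systematic error:  {avg_systematic:.2f}")  # averaging cannot remove the 5-unit bias
```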

Other measurement errors in statistics

It’s important to recognize the types of measurement errors.

Measurement errors in statistics
Absolute error	Relative error	Percent error
Difference between the true value and the measured value	Size of the error relative to the true value	Relative error shown as a percentage
|true value - measured value|	absolute error / true value	relative error × 100
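The three formulas in the table can be written as one small function. The example values are hypothetical: a scale reads 48 kg for an object whose true mass is 50 kg. The function assumes the true value is not zero, since relative error divides by it.

```python
def measurement_errors(true_value, measured_value):
    """Return (absolute error, relative error, percent error); assumes true_value != 0."""
    absolute = abs(true_value - measured_value)  # size of the difference, ignoring direction
    relative = absolute / abs(true_value)        # error as a fraction of the true value
    percent = relative * 100                     # the same fraction expressed as a percentage
    return absolute, relative, percent

# Hypothetical example: true mass 50 kg, scale reads 48 kg
absolute, relative, percent = measurement_errors(50, 48)
print(f"absolute error: {absolute} kg, relative error: {relative:.2f}, percent error: {percent:.1f}%")
```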

This video shows how to calculate absolute change and relative change using percentages.

Statistics Canada also has a useful guide to using percentages in statistics.

How to Evaluate the Trustworthiness of Statistical Studies

For tips on deciding whether a study is trustworthy, see the webpage Factors to Consider When Evaluating Statistics.

For example:

Collection Methods & Completeness

How are the data collected? Count, measurement or estimation?
Even a reputable source and collection method can introduce bias. Crime data come from many sources, from victim reports to arrest records.
If a survey, how many people were surveyed, and how does that compare to the size of the population it is supposed to represent?
If a survey, what methods were used to select who was included? How was the population sampled?
If a survey, what was the response rate?
Which populations were included? Which were excluded?