The Nature of Probability and Statistics
The ____ are the values for a variable from measurements or observations.
data
A collection of data values forms a ___ ___.
data set
Each value in the data set is called a data value or ____.
datum
In ___ statistics, the statistician tries to describe a situation.
descriptive
The boundaries of a continuous variable are given in one additional decimal place and always end with the digit __.
5
What are the two areas of statistics?
- Descriptive statistics - Inferential statistics
Measurement or Response bias
- Occurs when the method of observation tends to produce values that differ from the true value in some way. - Improperly calibrated scale - Tendency to not be completely honest - Questions are worded in a way that tends to influence response
Convenience Sample
- Subjects are easily available / convenient to form a sample
Why study statistics?
- To be informed - To make informed judgments - To evaluate decisions that affect your life
Qualitative variable
- are variables that have distinct categories according to some characteristic or attribute - gender, religion, geographic location
What are the three main types of observational studies?
- cross-sectional study - retrospective study - longitudinal study
- A ___ of size n is selected from the population in a way that ensures that every different possible ___ of the desired size has the same chance of being selected. - All subjects of the population have an equal chance of being selected. - Another way to select a simple random ___ is to create a list of all the subjects in the population called a sampling frame, and randomly selecting subjects by use of some random numbers generator or random numbers table.
sample
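The selection described on this card can be sketched in Python. This is a minimal sketch, assuming the sampling frame is available as a list; the function name `simple_random_sample` and the `seed` parameter are illustrative and do not come from the source.

```python
import random


def simple_random_sample(sampling_frame, n, seed=None):
    """Pick n subjects so that every possible sample of size n is equally likely."""
    rng = random.Random(seed)  # a seed is optional and only makes the draw repeatable
    # random.Random.sample draws without replacement from the sampling frame
    return rng.sample(list(sampling_frame), n)


# Hypothetical sampling frame of 50 subjects
frame = [f"subject_{i}" for i in range(50)]
chosen = simple_random_sample(frame, 5, seed=1)
```

Drawing without replacement means no subject can appear in the sample twice.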
For descriptive statistics, statisticians try to describe the situation. They do this by use of:
- tables - averages - graphs
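Two of these tools, a frequency table and an average, can be produced in a few lines of Python. This is a sketch only; the function name `describe` and the sample values are made up for illustration.

```python
from collections import Counter
from statistics import mean


def describe(values):
    """Summarize data with a frequency table and an average."""
    counts = Counter(values)  # how often each data value occurs
    return {
        "frequency_table": dict(sorted(counts.items())),
        "average": mean(values),
    }


# Hypothetical data: number of classes four students are enrolled in
summary = describe([2, 3, 3, 4])
```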
What are the four statistical problem-solving steps?
1. Formulate Questions 2. Collect Data 3. Analyze Data 4. Interpret Results
What are the four basic methods of sampling?
1. Simple Random Sampling 2. Systematic Sampling 3. Stratified Sampling 4. Cluster Sampling
The ___ level of measurement ranks data, and precise differences between units of measure do exist; however, there is no meaningful zero. - Examples: Temperature, shoe size, IQ
interval
Parameter
is a numerical summary of the population characteristic. Parameter is a fixed value/number but usually it is unknown. (μ - average age of all people on Guam)
confounding variable
is one that influences the dependent or outcome variable but was not separated from the independent variable.
Sampling Error
is the difference between the results obtained from a sample and the results obtained from a population from which the sample was selected.
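As an illustration, sampling error for a mean can be computed when the whole population is known. This is a sketch under that assumption (in practice the population value is usually unknown); the function name `sampling_error` and the example population are hypothetical.

```python
import random
from statistics import mean


def sampling_error(population, n, seed=None):
    """Return (sample mean) - (population mean) for one simple random sample."""
    rng = random.Random(seed)
    sample = rng.sample(population, n)  # draw n subjects without replacement
    return mean(sample) - mean(population)


# Hypothetical population of ten ages
ages = [18, 19, 19, 20, 21, 22, 22, 23, 25, 31]
error = sampling_error(ages, 4, seed=0)
```

When the sample is the entire population (a census), the difference is zero.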
population
is the entire collection of individuals or objects about which information is desired.
Theoretical population
is the population you would like to generalize your results to. It is usually not the population that will be accessible to you.
Statistical Inference
is the process by which we acquire information about populations from samples, drawing conclusions based on data.
Bias
is the tendency for samples to differ from the corresponding population in some systematic way.
census
is when data is collected from every subject in the population.
True zero
means that a 0 data value indicates the absence of the object being measured.
The ____ level of measurement classifies data into categories in which no order or ranking can be imposed on the data. - Examples: Political affiliation, Religious affiliation, Marital Status, Zip Code
nominal
boundary
of a number is defined as a class in which a data value would be placed before the data value was rounded. - Boundaries are written for convenience as 72.5-73.5 but are understood to mean all values from 72.5 up to but not including 73.5. - Used for continuous data.
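The rule can be sketched in Python: go half a unit below and above the last recorded decimal place. This is an illustrative sketch, assuming the recorded value is passed as a string so its number of decimal places is preserved; the function name `boundaries` is hypothetical.

```python
from decimal import Decimal


def boundaries(recorded):
    """Return (lower, upper) boundaries of a rounded value given as a string."""
    value = Decimal(recorded)
    # Exponent of the last recorded digit: 0 for "73", -1 for "8.6"
    exponent = value.as_tuple().exponent
    # Half a unit in that place, written with one additional decimal place
    half_unit = Decimal(5).scaleb(exponent - 1)
    return value - half_unit, value + half_unit


lower, upper = boundaries("73")  # 72.5 and 73.5
```

The boundaries always carry one more decimal place than the recorded value and end in the digit 5.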
The ____ level of measurement classifies data into categories that can be ranked; however, precise differences between the ranks do not exist. - Examples: Evaluation on a survey (poor, fair, good, excellent), pizza size (small, medium, large), letter grades (A, B, C, D, F)
ordinal
Inferential statistics uses
probability theory
Variables whose values are determined by chance are called ____ _____.
random variables
- The ___ level of measurement possesses all the characteristics of interval measurement, and there exists a true zero. - A true ___ exists when the same variable is measured on two different members of the population. - Examples: Height, weight, number of classes a student is enrolled in
ratio
- A ___ is a subset of the population. - Researchers use ___ to collect data and information about a particular variable from a large population. - Using ___ saves time and money
sample
- In ___ sampling, simple random samples are selected from each stratum. - Strata are groups whose members are similar (homogeneous) with respect to some characteristic.
stratified
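A rough Python sketch of this method: group the subjects into strata, then take a simple random sample from each stratum. The function name `stratified_sample`, the `stratum_of` callback, and the equal per-stratum sample size are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(subjects, stratum_of, n_per_stratum, seed=None):
    """Take a simple random sample of n_per_stratum subjects from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for subject in subjects:
        strata[stratum_of(subject)].append(subject)  # group by shared characteristic
    sample = []
    for members in strata.values():
        # Raises ValueError if a stratum has fewer than n_per_stratum members
        sample.extend(rng.sample(members, n_per_stratum))
    return sample


# Hypothetical students tagged with their class year
students = [("Ana", "freshman"), ("Ben", "freshman"), ("Cy", "senior"), ("Di", "senior")]
picked = stratified_sample(students, lambda s: s[1], 1, seed=3)
```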
- A ___ sample is a sample obtained by selecting every kth member of the population, where k is a counting number. - One of the first k individuals is selected at random. - Then every kth individual in the sequence is included in the sample. - This method works reasonably well as long as there are no repeating patterns in the population list. - Does not guarantee that the sample is representative of the population
systematic
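The steps on this card map closely onto list slicing in Python. This is a minimal sketch, assuming the population is an ordered list; the function name `systematic_sample` and the `seed` parameter are illustrative.

```python
import random


def systematic_sample(population, k, seed=None):
    """Pick a random start among the first k subjects, then take every kth one."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # index 0..k-1, i.e. one of the first k individuals
    return population[start::k]


# Hypothetical population list of 100 numbered subjects
subjects = list(range(1, 101))
sample = systematic_sample(subjects, 10, seed=7)
```

If the list has a repeating pattern whose period matches k, every selected subject falls at the same point in the pattern, which is why the card warns about such lists.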
Experimental Study
the researcher manipulates one of the variables and tries to determine how the manipulation influences other variables.
Observational Study
the researcher merely observes what is happening or what has happened in the past and tries to draw conclusions based on these observations
Statistics
the science of conducting studies to collect, organize, summarize, analyze and draw conclusions from data
A ______ is any characteristic whose value may change from one individual to another.
variable
Confounding variable
variable that influences the dependent/outcome/response variable
nonsampling error
occurs when the data are obtained erroneously or the sample is biased, i.e., nonrepresentative
Observational Study
A study in which the researcher observes characteristics of a sample
Experimental study
A study in which the researcher observes how a response (dependent / outcome) variable behaves when one or more explanatory (independent) variables are manipulated
As much as possible, ___ the study
CONTROL
Selection bias
Can occur when the way the sample is selected excludes some part of the population of interest. - This is undercoverage.
- Can assume an infinite number of values in an interval between any two specific values. - Usually measurements of something. - Examples: temperature, time it takes you to get to school, height of an incoming freshman
Continuous
What are the other methods of sampling?
Convenience Sample and Volunteer or Self-selected Sample
This consists of the collection, organization, summarization and presentation of data
Descriptive statistics
- Assumes values that can be counted - These are isolated points along a number line - Examples: number of children in a family, number of classes you are enrolled in, number of significant others you have
Discrete
Quantitative / numerical variables can be further classified as
Discrete and Continuous
What are the two types of inferences?
Estimation and Hypothesis Testing
What are the disadvantages with the experimental study?
Hawthorne effect and Confounding variable
This consists of making generalizations from a sample to a population, performing hypothesis tests, determining relationships among variables, and making predictions
Inferential statistics
What type of data? - SAT score - IQ - Temperature
Interval-level data
- Numerical - Variables that can be counted or measured - Makes sense to find the average of these values - Examples: age, number of students enrolled in MA151, weight of books
Quantitative
What are the four common types of measurement scales?
Nominal, Ordinal, Interval, Ratio
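The four scales differ in three properties named on the other cards: whether categories can be ranked, whether precise differences exist, and whether there is a true zero. The lookup below is only a study aid sketched in Python; the names `LEVEL_PROPERTIES` and `level_of_measurement` are hypothetical.

```python
# (can be ranked, precise differences exist, true zero exists)
LEVEL_PROPERTIES = {
    "nominal": (False, False, False),
    "ordinal": (True, False, False),
    "interval": (True, True, False),
    "ratio": (True, True, True),
}


def level_of_measurement(ranked, precise_differences, true_zero):
    """Return the scale matching the three properties, or None if none matches."""
    wanted = (ranked, precise_differences, true_zero)
    for level, properties in LEVEL_PROPERTIES.items():
        if properties == wanted:
            return level
    return None


# Temperature: ranked, precise differences, no meaningful zero
scale = level_of_measurement(True, True, False)  # "interval"
```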
What type of data? - Zip code - Gender (male, female) - Eye color (blue, brown, green, hazel) - Political affiliation - Religious affiliation - Major field (mathematics, computers, etc.) - Nationality
Nominal-level data
What are the two types of studies we will cover?
Observational Study and Experimental Study
What type of data? - Grade (A, B, C, D, F) - Judging (first place, second place, etc.) - Rating scale (poor, good, excellent) - Ranking of tennis players
Ordinal-level data
Nonresponse bias
People are chosen but refuse to participate
- Categorical - Variables that have distinct categories according to some characteristic or attribute - Examples: your major in college, village you reside in, political affiliation
Qualitative
Variables can be classified as
Qualitative and Quantitative
What type of data? - Height - Weight - Time - Salary - Age
Ratio-level data
- A ___ sample is obtained by dividing the population into non-overlapping subgroups called clusters. - One or more clusters are randomly selected, and all subjects in the selected cluster(s) become the members of the sample. - Often based upon location. - Best if clusters are heterogeneous subgroups from the population.
cluster
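A short Python sketch of this method, assuming the clusters are already defined as a mapping from cluster name to its members. The function name `cluster_sample` and the village data are made up for illustration.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters and use every subject in them as the sample."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # select cluster names at random
    members = [subject for name in chosen for subject in clusters[name]]
    return chosen, members


# Hypothetical clusters based on location
villages = {
    "north": ["Ana", "Ben"],
    "south": ["Cy", "Di", "Eli"],
    "east": ["Flo"],
}
chosen, sample = cluster_sample(villages, 2, seed=5)
```

Unlike stratified sampling, the randomness here is in which groups are chosen, not in which subjects are chosen within a group.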
Volunteer or Self-selected Sample
Subjects volunteer or select themselves to be part of the sample
Example Problem 3 - What level of measurement would be used to measure each variable? a. The ages of patients in a local hospital b. The ratings of movies released this month c. Colors of athletic shirts sold by Oak Park Health Club d. Temperatures of hot tubs in local health clubs
a. numerical, ratio b. categorical, ordinal c. categorical, nominal d. numerical, interval
Sampling frame
The listing of the accessible population from which you'll draw your sample
Determine whether descriptive or inferential statistics were used. a. The average jackpot for the top five lottery winners was $367.6 million. b. A study done by the American Academy of Neurology suggests that older people who had a high caloric diet more than doubled their risk of memory loss. c. Based on a survey of 9317 consumers done by the National Retail Federation, the average amount that consumers spent on Valentine's Day in 2011 was $116. d. Scientists at the University of Oxford in England found that a good laugh significantly raises a person's pain level tolerance.
a. Descriptive statistics b. Inferential statistics c. Descriptive statistics d. Inferential statistics
Example Problem 2 - Classify each variable as a discrete or a continuous variable. a. The highest wind speed of a hurricane b. The weight of baggage on an airplane c. The number of pages in a statistics book d. The amount of money a person spends per year for online purchases
a. Continuous b. Continuous c. Discrete d. Discrete
Quantitative variable
can be counted or measured - discrete (0,1,2,3,4) "number of _____" - continuous - infinite in between 2 numbers (fractions & decimals)