Week 1 - Chapter 1 - STATISTICS
Nominal level of measurement
Categories with no inherent order (no value is better than another). Ex: student ID number, zip code
Quantitative Variables can be broken into 2 categories: Discrete and Continuous
- Discrete variable - a quantitative variable that takes on only whole-number values (no decimal values). Ex: # of pets you own, # of classes you are taking
- Continuous variable - a quantitative variable that can take on ANY numerical value. Ex: measurements (height, weight, length) or time
Designs of an Experiment
- Randomized experiment - an experiment in which the participants are randomly assigned to one condition or another. Ex: chance decides whether a participant is placed in the aspirin group or the placebo group
- Matched-pairs design - participants are paired so that they are related in some way (matched on a similar characteristic), or the same participant is measured twice under different circumstances
- Block design - divide units into groups (blocks) and, within each block, randomly assign units to the different treatments
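A minimal sketch of random assignment and of a block design, using only Python's standard library. The participant labels, the age blocks, and the function names `randomly_assign` and `block_design` are hypothetical and only for illustration; the aspirin/placebo conditions come from the card above.

```python
import random

def randomly_assign(participants, conditions=("aspirin", "placebo"), seed=None):
    """Shuffle the participants, then deal them into the conditions in turn."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for index, person in enumerate(shuffled):
        groups[conditions[index % len(conditions)]].append(person)
    return groups

def block_design(blocks, conditions=("aspirin", "placebo"), seed=None):
    """Run a separate random assignment inside each block."""
    rng = random.Random(seed)
    return {
        block_name: randomly_assign(units, conditions, seed=rng.randrange(2**32))
        for block_name, units in blocks.items()
    }

# Hypothetical participants P1..P20, blocked by a made-up age split.
participants = [f"P{i}" for i in range(1, 21)]
groups = randomly_assign(participants, seed=1)
blocks = {"under 40": participants[:10], "40 and over": participants[10:]}
blocked = block_design(blocks, seed=1)
```

Because chance alone decides the groups, every participant is equally likely to receive either treatment; in the block design that holds separately within each block.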
What is the difference between Stratified Random Sample and Cluster Sampling?
- Stratified - divide the population into groups and take a sample of a certain size from each group
- Cluster - divide the population into groups, randomly select some of the groups, and sample everyone in each selected group (cluster)
Blinding in an Experiment
- Single blind - an experiment in which either the participant or the researcher does not know which treatment is being used
- Double blind - an experiment in which both the participant AND the researcher do not know which treatment is being used
Population or Sample?
Ex 1: All adult Americans (population); the 2765 adult Americans who participated in a survey (sample). Ex 2: US Census - is the population, not a sample
Examples of nominal
Nominal level: name, label, category EX: zip code, gender, eye color, political affiliation
Types of Statistics
Statistics can be divided into 2 types: descriptive statistics and inferential statistics
Data is
The information referred to in the definition is data. Data are a "fact or proposition used to draw a conclusion or make a decision." Data describe characteristics of an individual.
Statistics is
The science of collecting, organizing, summarizing, and analyzing information to draw conclusions or answer questions. In addition, statistics is about providing a measure of confidence in any conclusions.
Cluster Random sample
- the population is divided into groups, called clusters; a random sample of clusters is taken, and only the individuals in those clusters are measured - Ex: Survey in the dorms. There are 10 floors with 10 rooms on a floor. Each floor is a cluster. Randomly pick 3 floors and survey all 10 rooms on each of those floors.
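The dorm example can be sketched in a few lines of Python. The room labels and the function name `cluster_sample` are hypothetical; the 10 floors, 10 rooms per floor, and 3 selected floors come from the card above.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters and keep every unit inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

# 10 floors (the clusters), each with 10 rooms; labels are made up.
dorm = {
    floor: [f"floor {floor}, room {room}" for room in range(1, 11)]
    for floor in range(1, 11)
}

surveyed = cluster_sample(dorm, n_clusters=3, seed=0)
total_rooms = sum(len(rooms) for rooms in surveyed.values())
```

The randomness is in which floors are chosen; once a floor is chosen, all of its rooms are surveyed, so `total_rooms` is always 30.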
Population EX: US
consists of all subjects (not necessarily human) that are being studied or are of interest
Inferential statistics - Take that number and predict something else
methods of making decisions or predictions about a population, based on data obtained from a sample of that population
Systematic sampling
sampling every kth subject. Ex: measuring the lifetime of every 100th battery on an assembly line
Sampling error is
the error that results because a sample is being used to estimate information about a population
Nonsampling error is
the error that results from the process of obtaining the data
Experimental Study
the researcher manipulates one of the variables and tries to determine how the manipulation influences other variables.
Random Samples
selected by using chance methods or random numbers. Ex: put names in a hat and draw one out
Random variables are
variables whose values are determined by chance
Hawthorne effect
when participants in a study change their behavior because they know they are being studied, for example when they suspect the treatment is not working or that they are in the control group.
Individual EX: 1 family in Boulder
- a person or subject that is a member of the population being studied
Confounding/lurking variable
- a variable that influences the outcome of an experiment but cannot be separated from the explanatory variable
Cross-sectional studies
- collect information about individuals at a specific point in time or over a very short period of time
Other Sampling Methods
Convenience sample - a sample that is easy to obtain: you use the most convenient group available or decide haphazardly on the spot whom to sample - Unlikely to be representative of the population - Severe biases may result from the time and location of the interview and from the interviewer's judgment about whom to interview
Advantages of stratified random sample
1. Data are separated by strata 2. Lessens variability if the strata are grouped together well
Steps in Conducting an Experiment
Step 1: Identify the problem to be solved. - Should be explicit - Should provide the experimenter direction - Should identify the response variable and the population to be studied - Often referred to as the claim
Simple random sample
- A sample of size n from a population of size N is obtained through simple random sampling if every possible sample of size n has an equally likely chance of occurring. - The sample is then called a simple random sample.
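A minimal sketch of simple random sampling with Python's standard library. `random.sample` draws without replacement so that every subset of size n is equally likely; the population of 5 names and the function name `simple_random_sample` are hypothetical.

```python
import math
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members; every size-n subset is equally likely."""
    rng = random.Random(seed)
    return rng.sample(list(population), n)

# Hypothetical population of size N = 5, sample of size n = 2.
population = ["Ana", "Ben", "Cy", "Dee", "Eli"]
sample = simple_random_sample(population, n=2, seed=7)

# Number of possible samples of size 2 from 5 people: C(5, 2) = 10.
possible_samples = math.comb(len(population), 2)
chance_of_each_sample = 1 / possible_samples
```

With N = 5 and n = 2 there are 10 possible samples, so each one has a 1-in-10 chance of being the sample drawn.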
Interval level of measurement (the difference between numbers is meaningful)
- A variable is at the interval level of measurement if it has the properties of the ordinal level of measurement and the differences in the values of the variable have meaning. A value of zero in the interval level of measurement does not mean the absence of the quantity. Arithmetic operations such as addition and subtraction can be performed on values of the variable.
Ratio level of measurement ( the zero is meaningful)
- A variable is at the ratio level of measurement if it has the properties of the interval level of measurement and the ratios of the values of the variable have meaning. A value of zero in the ratio level of measurement means the absence of the quantity. Arithmetic operations such as multiplication and division can be performed on the values of the variable.
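A short worked check of why ratios are meaningful at the ratio level but not at the interval level. The temperatures and heights below are made-up values; the point is only that a ratio of interval-level values changes when the unit changes, while a ratio of ratio-level values does not.

```python
def celsius_to_fahrenheit(celsius):
    """Convert a Celsius temperature to Fahrenheit."""
    return celsius * 9 / 5 + 32

# Interval level: temperature (zero does not mean "no temperature").
ratio_in_celsius = 20 / 10                      # 2.0
ratio_in_fahrenheit = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)  # 68 / 50

# Differences still carry over: a 10-degree C gap is always an 18-degree F gap.
gap_in_fahrenheit = celsius_to_fahrenheit(20) - celsius_to_fahrenheit(10)

# Ratio level: height (zero means "no height").
ratio_in_cm = 180 / 90
ratio_in_inches = (180 / 2.54) / (90 / 2.54)
```

Here 20 degrees C looks like "twice" 10 degrees C, but the same two temperatures in Fahrenheit give 68 / 50 = 1.36, so the ratio has no fixed meaning. The height ratio is 2 in both centimeters and inches.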
Independent-Dependent variable
- Dependent (Response variable) - measures the outcome of interest - Independent (Explanatory) Variable - the variable that explains the response variable
Parameter or Statistic?
- Ex 3: A sample of 13 preschoolers was taken from the entire school. 11% of those preschoolers are on free and reduced lunch. ( statistic) - Ex 4: 45% of all Olympians who participated in the 2004 Olympics said they liked chocolate. ( parameter)
Examples
- Identify the explanatory and response variables in the following: - Smoking leads to lung cancer + response: lung cancer + explanatory: smoking - Final grade in a class and amount of work completed + response: final grade + explanatory: amount of work
Steps to use random numbers to select a simple random sample:
- Number the subjects in the sampling frame (the list of your population), using numbers of the same length. If you have 100 people in your sampling frame, number them 1 to 100 using labels of length 3, i.e. 001, 002, ..., 020, ..., 099, 100
- Select numbers of that length from a table of random numbers
- Note: You can pick where you start in the table. IT DOESN'T MATTER
- Ex: Using the table on page 25 and starting on line (1) and column (1), the first 3-digit number is 893, the second 922, the 3rd 321, the 4th 274, the 5th 483...
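The table-reading procedure can be sketched as follows. Two details are assumptions rather than something stated on the card: labels outside the sampling frame are skipped, and a label that repeats is skipped. The frame size of 500 is also hypothetical, chosen so that some of the card's five numbers fall inside the frame (with a frame of 100, all five would be skipped).

```python
def sample_from_digits(digits, frame_size, n):
    """Read fixed-width labels from a string of random digits until n are chosen."""
    width = len(str(frame_size))          # e.g. 3 digits for a frame of 500
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        label = int(digits[start:start + width])
        # Assumption: skip labels outside the frame and labels already chosen.
        if 1 <= label <= frame_size and label not in chosen:
            chosen.append(label)
        if len(chosen) == n:
            break
    return chosen

# The five 3-digit numbers from the card: 893, 922, 321, 274, 483.
table_digits = "893922321274483"
selected = sample_from_digits(table_digits, frame_size=500, n=3)
# 893 and 922 are larger than 500 and are skipped, leaving [321, 274, 483].
```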
Statistic
- a characteristic of a sample - Suppose a sample of 250 students is obtained, and from this sample we find that 86.4% have a job.
Stratified Random Sample ( cont)
- divides the population into groups (strata) and then takes a simple random sample from each group - Ex: There are 100 dorm rooms total, ½ undergraduate and ½ graduate, and we want to survey 30 of them. Divide the rooms into 2 groups (undergraduate and graduate) and take a sample of 15 from each
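A minimal sketch of the dorm-room example: 50 undergraduate and 50 graduate rooms, with a simple random sample of 15 taken from each stratum. The room labels and the function name `stratified_sample` are hypothetical.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Take a simple random sample of the same size from every stratum."""
    rng = random.Random(seed)
    return {
        name: rng.sample(list(units), n_per_stratum)
        for name, units in strata.items()
    }

# 100 rooms total: half undergraduate, half graduate (labels are made up).
rooms = {
    "undergraduate": [f"U{i}" for i in range(1, 51)],
    "graduate": [f"G{i}" for i in range(1, 51)],
}

survey = stratified_sample(rooms, n_per_stratum=15, seed=0)
total_surveyed = sum(len(chosen) for chosen in survey.values())
```

Unlike cluster sampling, every stratum is represented in the sample, and the randomness is in which rooms are picked inside each stratum.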
Qualitative ( categorical) variable
- if each observation belongs to one of a set of categories - Sex, Eye Color, Dominant Hand, Religion
Quantitative variable
- if observations take on numerical values that are either measurements or counts - Height, Weight, # of Classes being Taken, Hours of Sleep - We can find numerical summaries of quantitative variables such as the mean
Sample EX: Boulder
- is a group of subjects selected from a population - subset - sample should be representative of the population
Parameter is
- a characteristic of a population (it describes the whole population) - Ex: Suppose the percentage of all students on your campus who have a job is 84.9%.
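A small worked example tying the parameter to the statistic. It assumes the Statistic card (a sample of 250 students, 86.4% with a job) describes the same campus as this card's 84.9%; under that assumption the gap between the two numbers is the sampling error.

```python
# Values taken from the Parameter and Statistic cards.
parameter = 0.849        # proportion of ALL students on campus with a job
statistic = 0.864        # proportion in the sample with a job
sample_size = 250

# How many students in the sample have a job: 0.864 * 250 = 216.
employed_in_sample = round(statistic * sample_size)

# Sampling error (assuming both cards describe the same campus).
sampling_error = statistic - parameter   # about 0.015, i.e. 1.5 percentage points
```

A different sample of 250 students would usually give a slightly different statistic, while the parameter stays fixed at 84.9%.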
Example of ordinal level of measurement
- name, label or category but that can be arranged in a rank order - ex: Grade (A, B, etc), Judging (1st, 2nd, etc), Rating Scale (poor, excellent, etc)
EX of Ratio level of measurement
- numerical and has a meaningful zero - ex: Height, weight, Time, Age, Salary
Example of interval level of measurement
- numerical and has no meaningful zero - EX: SAT Score, IQ, Temperature (°F or °C), Shoe Size
Observations ( anything that we take the data on)
- the data values that we observe for a variable - These observations can either be a number, such as the height, or can be a category, such as yes/no
Observational Study
- the researcher merely observes what is happening or what has happened in the past and tries to draw conclusions based on these observations - ( old research is still observational data)
Sampling bias
- the technique used to obtain the sample's individuals tends to favor one part of the population over another; includes undercoverage of a group - Ex: wanting to sample all men but sampling only men on the East Coast
Non response bias
- when individuals selected to be in the sample do not respond to the survey - Chosen individuals do not want to participate or they can't be reached
Response bias
- when the answers on a survey do not reflect the true feelings of the respondent - Interviewer error, misrepresented answers, wording of questions
Page 10 - Level of Measurement of a Variable
- A variable is at the nominal level of measurement if the values of the variable name, label, or categorize. In addition, the naming scheme does not allow for the values of the variable to be arranged in a ranked, or specific, order.
- A variable is at the ordinal level of measurement if it has the properties of the nominal level of measurement and the naming scheme allows for the values of the variable to be arranged in a ranked, or specific, order.
Stratified random sample
- Strata - natural groups found within a population - Ex: region in a country, political party, age - Note: Strata are bigger than clusters
A key aspect of data is that they vary
Ex: height, hair color. There is variability among individuals. There can also be variability within an individual.
Steps in Systematic Sampling:
1) Estimate the population size (N) 2) Determine the sample size (n) 3) Find N/n and round down to the nearest whole number. This is k. 4) Randomly pick a starting point, p, between 1 and k 5) Pick every kth value after this: p, p + k, p + 2k, ..., p + (n - 1)k
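The steps above can be sketched in Python. The sketch assumes k is N/n rounded down and that the starting point p is chosen at random between 1 and k; the function name `systematic_sample` and the numbers in the example (1000 batteries, sample of 10) are hypothetical.

```python
import random

def systematic_sample(population_size, sample_size, seed=None):
    """Return the positions p, p + k, p + 2k, ... of a systematic sample."""
    rng = random.Random(seed)
    k = population_size // sample_size     # step 3: N / n, rounded down
    p = rng.randint(1, k)                  # step 4: random start between 1 and k
    return [p + i * k for i in range(sample_size)]   # step 5

# Hypothetical assembly line of N = 1000 batteries, sample of n = 10, so k = 100.
positions = systematic_sample(population_size=1000, sample_size=10, seed=3)
gaps = [later - earlier for earlier, later in zip(positions, positions[1:])]
```

Only the starting point is random; after that, every 100th battery is taken, so all the gaps equal k.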
1.6 - The Design of Experiments
A researcher conducts an experiment by assigning subjects to certain experimental conditions and then observing outcomes on the response variable
Case control studies
A retrospective study - requires individuals to look back in time or requires the researcher to look at existing records
Confounding
Confounding in a study occurs when the effects of two or more explanatory variables are not separated. Therefore, any relation that may exist between an explanatory variable and the response variable may be due to some other variable not accounted for in the study.
Variables
Variable - a characteristic or attribute that can assume different values and is recorded for subjects in a study.
Examples of descriptive or inferential statistics
a. In the year 2016, 148 million Americans will be enrolled in an HMO (inferential: a prediction)
b. Nine out of ten on-the-job fatalities are men (descriptive: summarizes recorded data)
c. Expenditures for the cable industry were $5.66 billion in 1996 (descriptive)
d. The median household income for people aged 25-34 is $35,888 (descriptive)
e. Allergy therapy makes bees go away (inferential)
f. Drinking decaffeinated coffee can raise cholesterol levels by 7% (inferential)
More Examples: Researcher Elisabeth Kvaavik and others studied factors that affect the eating habits of adults in their mid-thirties. (Source: Kvaavik E, et al. Psychological explanators of eating habits among adults in their mid-30s (2005) International Journal of Behavioral Nutrition and Physical Activity (2)9.) Classify each of the following variables considered in the study as qualitative or quantitative. a. Nationality b. Number of children c. Household income in the previous year d. Level of education e. Daily intake of whole grains (measured in grams per day)
a. Nationality ( qualitative) b. Number of children ( quantitative ) c. Household income in the previous year ( quantitative) d. Level of education ( qualitative) e. Daily intake of whole grains (measured in grams per day) ( quantitative)
Example of discrete and continuous variables: Researcher Elisabeth Kvaavik and others studied factors that affect the eating habits of adults in their mid-thirties. (Source: Kvaavik E, et al. Psychological explanators of eating habits among adults in their mid-30s (2005) International Journal of Behavioral Nutrition and Physical Activity (2)9.) Classify each of the following quantitative variables considered in the study as discrete or continuous. a. Number of children b. Household income in the previous year c. Daily intake of whole grains (measured in grams per day)
a. Number of children ( Discrete) b. Household income in the previous year ( continuous ) c. Daily intake of whole grains (measured in grams per day) ( continuous)
A lurking variable is
an explanatory variable that was not considered in a study but that affects the value of the response variable in the study. In addition, lurking variables are typically related to the explanatory variables in the study.
One goal of statistics is to
describe and understand sources of variability
Cohort studies
identify a group (the cohort) to participate in a study and then observe this group over a period of time. Characteristics of the individuals are recorded, and some individuals are exposed to certain factors while others are not. This is considered a prospective study because it collects data over time.
Descriptive statistics - Numbers describe the data
methods for summarizing the data, which include graphs, tables, and numbers, such as the mean - Purpose - reduces the data to simple summaries without distorting or losing information
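A minimal sketch of numerical summaries using Python's built-in `statistics` module. The hours-of-sleep values are made up for illustration and do not come from the notes.

```python
import statistics

# Hypothetical data: hours of sleep reported by 7 students.
hours_of_sleep = [6, 7, 7, 8, 5, 9, 7]

mean_sleep = statistics.mean(hours_of_sleep)       # sum is 49, so 49 / 7 = 7
median_sleep = statistics.median(hours_of_sleep)   # middle of 5, 6, 7, 7, 7, 8, 9
spread = max(hours_of_sleep) - min(hours_of_sleep) # range: 9 - 5 = 4
```

These three numbers describe only the 7 students in the data set. Using them to say something about all students would be inferential statistics.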
