STAT 201- Exam 1 TAMU


experimental unit

a single object or individual to be measured

sample

a subset of the population of interest that we will collect data on

representative sample

a subset of the population that has the same characteristics as the population of interest

placebo

a treatment that cannot influence the response variable

population

all individuals, objects, or things whose properties are being studied

ethics

all planned studies that involve human participants must be approved in advance by the IRB (institutional review board)

systematic sample

randomly select a starting point and then take every nth data value after that; frequently chosen because it is a simple method

relative frequency

ratio (fraction or proportion) of the number of times a value of the data occurs in the set of all outcomes to the total number of outcomes.

statistical fraud

researchers have a responsibility to verify that proper methods are being followed

What type of sample? A completely random method is used to select 75 students. Each undergraduate student in the fall semester has the same probability of being chosen at any stage of the sampling process.

simple random sample

What type of sample? A sample of 100 undergraduate San Jose State students is taken by organizing the students' names by classification (freshman, sophomore, junior, senior), and then selecting 25 from each

stratified

What type of sample? A random number generator is used to select from the alphabetical listing of all undergraduate students in the Fall semester. Starting with that student, every 50th student is chosen until 75 students are included in the sample.

systematic

frequency table

table with data value and frequency

descriptive statistics

organizing and summarizing data

steps for constructing a box plot

- Obtain the Five Number Summary (Min, Q1, M, Q3, Max) - Compute IQR = (Q3 − Q1) - Draw the line from the Min to the Max - Draw a box from Q1 to Q3 - Draw a line in the box at Median M - Compute the outlier cutoffs: L = Q1 − 1.5 × IQR and U =Q3 +1.5×IQR - Erase the line from Min to L: any data in this region is a potential outlier, mark with an × or ◦ - Erase the line from U to Max: any data in this region is a potential outlier, mark with an × or ◦

observational study

- observes individuals and measures variables of interest - does not attempt to influence responses

percentile

- percentiles divide ordered data into hundredths; percentiles may or may not be part of the data - the kth percentile is the value such that k% of data is less than it, and (100-k)% of data is greater than it

Identify population, statistic, parameter, sample, variable, & data: We want to know the average (mean) amount of money spent on school uniforms each year by families with kids at Knoll Academy. We randomly survey 100 families with kids in the school. Three families spent $65, $75, and $95, respectively

- Population: all families whose kids attend Knoll Academy - Sample: the 100 families surveyed - Parameter: mean amount of $ spent on uniforms per year by all families at the school - Statistic: mean amount of $ spent on uniforms per year by the 100 families in the sample - Variable: amount of $ spent on uniforms by one family - Data: $65, $75, $95...

example: On a 20 question math test, the 70th percentile for number of correct answers was 16. Interpret the 70th percentile in the context of this situation.

- Seventy percent of students answered 16 or fewer questions correctly. - Thirty percent of students answered 16 or more questions correctly. - A higher percentile is considered good in this context, as answering more questions correctly is desirable

IQR= Q3-Q1

- The IQR can help to determine potential outliers - A value is a potential outlier if it is more than (1.5)(IQR) below the first quartile or more than (1.5)(IQR) above the third quartile, i.e. below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR.
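
A minimal sketch of the 1.5 × IQR rule, assuming Q1 and Q3 have already been found and are passed in by the caller. The function names are illustrative, and the data are the hours-worked responses from the frequency table example in this set plus one hypothetical 12-hour response added only to trigger the rule.

```python
def outlier_cutoffs(q1, q3):
    """Return (lower, upper) cutoffs: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def potential_outliers(data, q1, q3):
    """List every value that falls strictly outside the two cutoffs."""
    lower, upper = outlier_cutoffs(q1, q3)
    return [x for x in data if x < lower or x > upper]

# Hours worked per day; the final 12 is a hypothetical extra response.
hours = [5, 6, 3, 3, 2, 4, 7, 5, 2, 3, 5, 6, 5, 4, 4, 3, 5, 2, 5, 3, 12]

# Q1 = 3 and Q3 = 5 are supplied as inputs here, so IQR = 2.
cutoffs = outlier_cutoffs(3, 5)
flagged = potential_outliers(hours, 3, 5)
print(cutoffs, flagged)
```

A flagged value is only a potential outlier; it still takes background information to decide whether it is a mistake or something unusual.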

example: On a timed math test, the first quartile for time it took to finish the exam was 35 minutes. Interpret the first quartile, in the context of this situation.

- Twenty-five percent of students finished the exam in 35 minutes or less. - Seventy-five percent of students finished the exam in 35 minutes or more. - A lower percentile is considered good in this context, as finishing more quickly on a timed exam is desirable. (If you take too long, you might not be able to finish.)

variable

- a characteristic or measurement that can be determined for each member of a population - 2 types: categorical (qualitative) and numerical (quantitative)

statistic

- a number that represents the property of interest in the sample - an estimate of the parameter value that we get looking only at the sample

problems with samples

- a sample must be representative of the population - a non-representative sample is biased - biased samples that are not representative of the population give results that are inaccurate and not valid

lurking variables

- a variable that has an important effect on the relationship among the variables in a study - is not one of the explanatory variables studied

outlier

- an observation of data that does not fit the rest of the data - some are due to mistakes while others may indicate that something unusual is happening - it may take some background info to explain outliers

stratified sample vs. cluster sample

- both divide population into groups - stratified samples include individuals in EVERY group - cluster samples include every individual in SOME groups

pie chart

- categories of data are represented by wedges in a circle and are proportional in size to the percent of individuals in each category - displaying parts of a whole, cannot add up to be more than 100%

non-response/refusal of subject to participate

- collected responses may no longer be representative of the populations - people with strong positive or negative opinions may answer surveys which can affect results

histogram

- consists of contiguous (adjoining) bars - horizontal axis is labeled with what the data represents - vertical axis is labeled either frequency or relative frequency (or percent frequency or probability) of each bin - graph will have the same shape with either label

sampling bias

- created when a sample is collected from a population and some members of the population are not as likely to be chosen as others - can cause incorrect conclusions drawn about the population that is being studied

variation

- data values will not always be the same for each element of the population - this difference is called variation

experiment

- deliberately imposes some treatment on individuals to measure their responses - purpose is to study whether the treatment causes a change in the response

random experiment- outcomes

- different outcomes measured in the response variable, therefore, must be a direct result of the different treatments - in this way, an experiment can prove a cause-and-effect connection between the explanatory and response variables

stem-and-leaf plot

- divide each observation of data into a stem and leaf - the leaf consists of a final significant digit - good choice for showing the distribution of a numerical variable when the data set is small

simple random sample (SRS)

- every member of the population has an equal chance of being chosen 1. give each member of the pop. a number 2. use a random number generator to select a set of labels 3. these randomly selected labels identify the members of your sample- use them to find out which members were sampled
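
The three steps can be sketched with Python's standard `random` module. This is one possible illustration, not a prescribed procedure: the roster of 200 students and the function name are hypothetical, and the seed is fixed only so the run is repeatable.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Draw k distinct members so that every member is equally likely to be chosen."""
    rng = random.Random(seed)
    # Step 1: label each member with a number (its position in the list).
    labels = range(len(population))
    # Step 2: randomly select k distinct labels (sampling without replacement).
    chosen = rng.sample(labels, k)
    # Step 3: use the selected labels to find out which members were sampled.
    return [population[i] for i in chosen]

# Hypothetical roster of 200 students.
students = [f"student_{i}" for i in range(1, 201)]
sample = simple_random_sample(students, 10, seed=1)
print(sample)
```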

how to find relative frequency

- frequency (number of times each data value occurs) / (total) - the number 2 occurs 3 times in a data set out of 20= 3/20 or 0.15
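
As a rough sketch, the same calculation can be done for every value at once. The data are the hours-worked responses from the frequency table example in this set, where the value 2 also occurs 3 times out of 20; the function name is illustrative.

```python
from collections import Counter

# Hours worked per day by twenty students.
hours = [5, 6, 3, 3, 2, 4, 7, 5, 2, 3, 5, 6, 5, 4, 4, 3, 5, 2, 5, 3]

def relative_frequencies(data):
    """Map each distinct value to (times it occurs) / (total number of values)."""
    counts = Counter(data)
    total = len(data)
    return {value: counts[value] / total for value in sorted(counts)}

rel_freq = relative_frequencies(hours)
print(rel_freq[2])  # 3 occurrences out of 20
```

The relative frequencies of all values add up to 1, which is a quick check on the arithmetic.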

time series graph (line graph)

- given a paired data set, we start with the standard Cartesian coordinate system: horizontal axis is used to plot date/time increments, vertical axis is used to plot the values of the variable that we are measuring - by doing this, we make each point on the graph correspond to a date and a measured quantity - the points on the graph are connected by straight lines in the order in which they occur

misleading use of data

- improperly displayed graphs - incomplete data - lack of context

five number summary

- minimum (min) - first quartile (Q1) - Median (M) - Third Quartile (Q3) - Maximum (Max)

descriptive statistics

- numerical and graphical ways to describe and display your data - a graph is a tool that helps you learn about the shape or distribution of a sample/population - statisticians often graph data first to get a picture of the data then apply more formal tools

Identify the population, sample, experimental units, explanatory variable, treatments, and response variable: Researchers want to investigate whether taking aspirin regularly reduces the risk of heart attack. Four hundred men between the ages of 50 and 84 are recruited as participants. The men are divided randomly into two groups: one group will take aspirin, and the other group will take a placebo. Each man takes one pill each day for three years, but he does not know whether he is taking aspirin or the placebo. At the end of the study, researchers count the number of men in each group who have had heart attacks

- population: men aged 50-84 - sample: 400 men who participated - experimental units: individual men in study - explanatory variable: aspirin/oral medication - treatments: aspirin and placebo - response variable: whether a subject had a heart attack

causality

- relationship between two variables does not mean that one causes the other to occur - they may be related (correlated) b/c of their relationship through a different variable

bin

- represents a range of data - used when displaying large data sets - also classes or intervals - all bins have the same width

key protections mandated by law

- risks to participants must be minimized and reasonable with respect to projected benefits. - participants must give informed consent. this means that the risks of participation must be clearly explained to the subjects of the study. - subjects must consent in writing, and researchers are required to keep documentation of their consent - data collected from individuals must be guarded carefully to protect their privacy

sample size issues

- samples that are too small are often unreliable - the larger the better - small samples are sometimes unavoidable and can still be used to draw conclusions - ex: crash testing cars and medical testing for rare conditions

self-funded/self-interest studies

- study performed by a person or organization in order to support their claim - is the study impartial? Read the study carefully to evaluate the work - do not automatically assume that the study is good, but do not automatically assume the study is bad either - evaluate it on its merits and the work done

categorical/qualitative variable

- take on a label - nominal: there is no natural order to the categories (ethnicity) - ordinal: there is a natural order to the categories (movie reviews)

numerical/quantitative variable

- take on a numerical value, usually with units - discrete: counted in whole units (count data) - continuous: measured on a continuous scale, so decimals are possible (time to finish a race)

sampling errors

- the actual process of sampling causes sampling errors - a sample will never be exactly representative of the population so there will always be some sampling error - as a rule, the larger the sample, the smaller the sampling error

sample mean

- the average in a set of data - used to estimate the population mean - add up all the data values and divide by the total number
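
A small sketch of the calculation, using the eight beverage measurements from the variation example in this set. The function name is illustrative.

```python
def sample_mean(data):
    """Add up all the data values and divide by how many there are."""
    if not data:
        raise ValueError("the sample must contain at least one value")
    return sum(data) / len(data)

# Ounces of beverage measured in eight 16-ounce cans.
ounces = [15.8, 16.1, 15.2, 14.8, 15.8, 15.9, 16.0, 15.5]
x_bar = sample_mean(ounces)
print(round(x_bar, 4))
```

Here the sample mean is a statistic; it estimates the population mean fill amount, which is the parameter.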

quartile/percentile

- the median of the data is the second quartile or 50th percentile - the first and third quartiles are the 25th and 75th percentiles, respectively

interquartile range

- the range of the middle 50% of the data values - the IQR is found by subtracting the first quartile from the third quartile

control group

- the treatment group that receives the placebo - helps researchers balance the effects of being in an experiment with the effects of active treatments

confounding (lurking/explanatory)

- two variables are confounded when their effects on a response variable cannot be distinguished from each other - the confounded variables may be either explanatory variables or lurking variables

confounding

- when the effects of multiple factors on a response cannot be separated - makes it difficult or impossible to draw valid conclusions about the effect of each factor

box plot (box-and-whisker plot)

- graphical representation of the concentration of the data - shows how far the extreme values are from most of the data

self-selected samples

- responses come only from people who choose to respond, so they are often unreliable - ex: call-in surveys

randomized experiment- random treatments

-when subjects are assigned treatments randomly, all of the potential lurking variables are spread equally among the groups - at this point the only difference between groups is the one imposed by the researcher.

role of statistician

1. Design Studies 2. Analyze Data 3. Translate data into knowledge and understanding of the world around us

constructing a histogram

1. decide how many bins 2. calculate the width of each bin: (ending point − starting point) / (number of bins) 3. calculate the bin boundaries (intervals): boundary = starting point + k × bin width (k = 0, ..., number of bins), which gives number of bins + 1 boundaries 4. calculate bin frequencies (or relative frequencies) 5. draw the histogram, where the height of each bar is given by the frequency/relative frequency of its corresponding bin
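
Steps 2 to 4 can be sketched as follows, taking the bin width as (ending point − starting point) / (number of bins). Two things here are assumptions rather than part of the card: each bin includes its left boundary and excludes its right one (except the last bin, which also includes the ending point), and the starting and ending points 1.5 and 7.5 were picked by hand. The data are the hours-worked responses from the frequency table example in this set.

```python
def bin_boundaries(start, end, n_bins):
    """Boundaries start + k * width for k = 0, ..., n_bins."""
    width = (end - start) / n_bins
    return [start + k * width for k in range(n_bins + 1)]

def bin_frequencies(data, boundaries):
    """Count the values in each bin [left, right); the last bin also includes its right end."""
    counts = [0] * (len(boundaries) - 1)
    for x in data:
        for j in range(len(counts)):
            is_last = j == len(counts) - 1
            in_bin = boundaries[j] <= x < boundaries[j + 1]
            if in_bin or (is_last and x == boundaries[-1]):
                counts[j] += 1
                break
    return counts

hours = [5, 6, 3, 3, 2, 4, 7, 5, 2, 3, 5, 6, 5, 4, 4, 3, 5, 2, 5, 3]
boundaries = bin_boundaries(1.5, 7.5, 6)       # six bins of width 1
counts = bin_frequencies(hours, boundaries)    # bar heights (frequency)
relative = [c / len(hours) for c in counts]    # bar heights (relative frequency)
print(boundaries, counts, relative)
```

Plotting either `counts` or `relative` gives a histogram of the same shape; only the vertical scale changes.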

Determine correct data type: 1. number of pairs of shoes you own 2. type of car you drive 3. distance from home to nearest grocery store 4. number of classes you take per school year 5. type of calculator you use 6. weights of sumo wrestlers 7. number of correct answers on a quiz 8. IQ scores

1. quantitative discrete 2. categorical 3. quantitative continuous 4. quantitative discrete 5. categorical 6. quantitative continuous 7. quantitative discrete 8. quantitative discrete

methods of random sampling

1. simple random sample (srs) 2. systematic sample 3. stratified sample 4. cluster sample

variation example

16-ounce cans of beverage may contain more or less than 16 ounces of liquid. - in one study, eight 16 ounce cans were measured and produced the following amount (in ounces) of beverage: 15.8; 16.1; 15.2; 14.8; 15.8; 15.9; 16.0; 15.5 - measurements of the amount of beverage in a 16-ounce can may vary because different people make the measurements or because the exact amount, 16 ounces of liquid, was not put into the cans

convenience sample example

A computer software store conducts a marketing study by interviewing potential customers who happen to be in the store browsing through the available software. The results of convenience sampling may be very good in some cases and highly biased (favor certain outcomes) in others.

frequency/frequency table example: Twenty students were asked how many hours they worked per day. Their responses, in hours, are as follows: 5,6,3,3,2,4,7,5,2,3,5,6,5,4,4,3,5,2,5,3 Make a frequency table.

Data Value: 2  3  4  5  6  7
Frequency:  3  5  3  6  2  1

cluster sample example

If you randomly sample four departments from your college population, the four departments make up the cluster sample. Divide your college faculty by department. The departments are the clusters. Number each department, and then choose four different numbers using simple random sampling. All members of the four departments with those numbers are the cluster sample.

stratified sample example

You could stratify (group) your college population by department and then choose a proportionate simple random sample from each stratum (each department) to get a stratified random sample. Suppose there are 5 departments, and you want to choose a sample of 50. You choose 10 from each department using SRS sampling.
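
The cluster and stratified examples can be compared side by side in a short sketch. The college below is hypothetical (5 departments of 30 people each), and the function names are illustrative.

```python
import random

def stratified_sample(groups, per_group, rng):
    """Take a simple random sample of per_group members from EVERY group (stratum)."""
    return {name: rng.sample(members, per_group) for name, members in groups.items()}

def cluster_sample(groups, n_clusters, rng):
    """Randomly choose n_clusters groups and keep EVERY member of the chosen groups."""
    chosen = rng.sample(sorted(groups), n_clusters)
    return {name: list(groups[name]) for name in chosen}

# Hypothetical college: 5 departments, 30 people in each.
rng = random.Random(0)
departments = {
    f"dept_{d}": [f"dept_{d}_person_{i}" for i in range(30)] for d in "ABCDE"
}

strat = stratified_sample(departments, 10, rng)  # 10 people from each of the 5 departments
clust = cluster_sample(departments, 4, rng)      # everyone in 4 randomly chosen departments
print(sorted(strat), sorted(clust))
```

The stratified sample always touches all five departments, while the cluster sample leaves one department out entirely.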

distribution

a listing or function showing all the possible values (or intervals) of the data and how often they occur

parameter

a number that is used to represent a characteristic of the population and that generally cannot be determined easily

bar graph vs. histogram

bar graph: - used to visualize distribution for categorical variables - number of bars cannot be changed ( it's the # of categories) histogram: - used to visualize distribution for quantitative variables - number of bars/bins is your choice

double-blinding

both the subjects and the researchers involved with the subjects are blinded

cause-and-effect in observational studies

cause-and-effect relationships usually cannot be drawn, since lurking variables confound the effects that one variable may have on another variable

non-sampling errors

caused by factors not related to the sampling process

What type of sample? The freshman, sophomore, junior, and senior years are numbers one, two, three, and four, respectively. A random number generator is used to pick two of those years. All students in those two years are in the sample.

cluster

undue influence

collecting data or asking questions in a way that influences the response

What type of sample? An administrative assistant is asked to stand in front of the library one Wednesday and to ask the first 100 undergraduate students he encounters what they paid for tuition the Fall semester. Those 100 students are the sample.

convenience

frequency

count number of times each data value occurs

quartile

divide ordered data into quarters; quartiles may or may not be part of the data

inferential statistics

formal methods for drawing conclusions from data

explanatory and response variable relationship

in an experiment, the researcher manipulates values of the explanatory variable and measures the resulting changes in the response variable

randomized experiment- proving change

in order to prove that the explanatory variable is causing a change in the response variable, it is necessary to isolate the explanatory variable.

formula for finding a quartile

k = the quartile (1, 2, or 3) or the matching percentile (25, 50, or 75); i = the index (position of a data value); n = total number of data values 1. order the data from smallest to largest 2. calculate the index: i = (k/4) × (n + 1) if using k = 1, 2, 3, or i = (k/100) × (n + 1) if using k = 25, 50, 75 3. find the value that corresponds to index i: - if i is a whole number, find the ith data value in the ordered list - if i is a decimal, take the two whole-number indices closest to i (one below, one above) and average their corresponding data values
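
A sketch of this index rule for k = 1, 2, 3. Other textbooks and software use slightly different quartile conventions, so results may not match a calculator or a library function exactly. The function name is illustrative, the guard for fewer than three values is an added assumption (the index would otherwise fall outside the list), and the data are the hours-worked responses from the frequency table example in this set.

```python
import math

def quartile(data, k):
    """Quartile k (1, 2, or 3) using the index i = (k/4) * (n + 1), counted from 1."""
    if k not in (1, 2, 3):
        raise ValueError("k must be 1, 2, or 3")
    ordered = sorted(data)                  # step 1: order the data
    n = len(ordered)
    if n < 3:
        raise ValueError("need at least 3 data values")
    i = k * (n + 1) / 4                     # step 2: calculate the index
    if i == int(i):                         # step 3a: whole number, take the ith value
        return ordered[int(i) - 1]
    below, above = math.floor(i), math.ceil(i)
    # step 3b: decimal, average the values at the two nearest positions
    return (ordered[below - 1] + ordered[above - 1]) / 2

hours = [5, 6, 3, 3, 2, 4, 7, 5, 2, 3, 5, 6, 5, 4, 4, 3, 5, 2, 5, 3]
q1, q2, q3 = (quartile(hours, k) for k in (1, 2, 3))
print(q1, q2, q3)
```

With n = 20 the indices are 5.25, 10.5, and 15.75, so each quartile is the average of two neighbouring ordered values.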

probability

mathematical study of uncertainty- foundation of statistics

convenience sample

non-random and involves using results that are readily available

sampling with replacement

once a member is picked, that member goes back into the population and thus may be chosen more than once

sampling without replacement

once a member is picked, that member leaves the pool and thus cannot be chosen more than once (this only becomes a mathematical issue when the population is small)

response variable (experimental design)

the affected variable is called the response variable

treatments (experimental design)

the different values of the explanatory variable

bar graph

the length of the bar for each category is proportional to the number or percent of individuals in each category. bars may be vertical or horizontal

blinding

the subjects do not know who is receiving the active treatments and who is receiving the placebo treatment

describing data

to describe the distribution of data, look for an overall pattern and any outliers

cluster sample

we divide the population into clusters (groups) and then randomly select some of the clusters. all members from these clusters are in the cluster sample.

stratified sample

we divide the population into groups called strata and then take a SRS from each stratum

parameter is to population as statistic is to sample

we use the statistic, calculated from data obtained from a sample, in order to estimate the parameter which is unknown 1. Have parameter that you are interested in learning about for some population 2. Obtain a sample from the pop. 3. Gather data on that sample 4. Calculate a statistic 5. Use the statistic as an estimate of the parameter

explanatory variable (experimental design)

when one variable causes change in another, we call the first variable the explanatory variable

placebo effect

when participation in a study prompts a physical response from a participant, the experimenter must take further steps to isolate the effects of the explanatory variable.

interpreting percentiles, quartiles, and median

when writing the interpretation of a percentile in the context of the given data, the following information should be included: - information about the context of the situation being considered - the data value that represents the percentile - the percent of individuals below the percentile - the percent of individuals above the percentile

systematic sample example

you have to do a phone survey. your phone book contains 20,000 residence listings. you must choose 400 names for the sample. number the population 1 to 20,000 and then use a simple random sample to pick a number that represents the first name in the sample. then choose every fiftieth name thereafter (20,000 / 400 = 50) until you have a total of 400 names (you might have to go back to the beginning of your phone list).
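
The phone book procedure can be sketched as below. The function name and the fixed seed are illustrative choices; listings are numbered from 1, and the modulo arithmetic is what "going back to the beginning of the list" looks like in code.

```python
import random

def systematic_sample(n_population, n_sample, seed=None):
    """Return listing numbers: a random start, then every step-th listing, wrapping around."""
    step = n_population // n_sample        # 20,000 // 400 = 50
    rng = random.Random(seed)
    start = rng.randint(1, n_population)   # random first listing (both ends included)
    return [(start - 1 + j * step) % n_population + 1 for j in range(n_sample)]

positions = systematic_sample(20_000, 400, seed=42)
print(positions[:5])
```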

Example: Consider this data set on house prices: 114,950; 158,000; 230,500; 387,000; 389,950; 479,000; 488,800; 529,000; 575,000; 639,000; 659,000; 1,095,000; 5,500,000 Are there any outliers?

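• Q1 = (230,500 + 387,000) / 2 = 308,750 • Q3 = (639,000 + 659,000) / 2 = 649,000 • IQR = 649,000 − 308,750 = 340,250 • Lower Cutoff: L = 308,750 − 1.5 × 340,250 = −201,625 • Upper Cutoff: U = 649,000 + 1.5 × 340,250 = 1,159,375 ∴ The 5,500,000 house is an outlier; every other price lies between the cutoffs.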
