Stats


what is the difference between a permutation and a combination?

order: in a permutation you look at the order; in a combination the order does not matter

on occasion, coding is used for rankings, such as

1 = bachelor's, 2 = master's, 3 = doctorate

Likert scale

a special case that is frequently used in survey research. Usually a statement is made and the respondent is asked to indicate his or her agreement/disagreement, ex. strongly agree, somewhat agree, neither agree nor disagree.... Researchers believe the intervals between the numbers are the same, ex. the distance from 1 to 2 is the same as the distance from 3 to 4

column chart

a vertical display of data (p. 78)

difference between a sample and population

census: an examination of all items in a defined population; sample: looking only at some items selected from the population

mean absolute deviation (MAD)

an additional measure of dispersion that reveals the average distance from the center

probability density function (PDF)

an equation that shows the height of the curve f(x) at each possible value of X. Any continuous PDF must be nonnegative, and the area under the entire PDF must be 1
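Both PDF conditions can be checked numerically. This is a minimal sketch that uses the standard normal PDF from Python's `statistics.NormalDist` as an example density (an assumption; any continuous PDF would do) and approximates the total area with a midpoint rule over [-8, 8], where the tails beyond that range are negligible.

```python
from statistics import NormalDist

pdf = NormalDist(mu=0, sigma=1).pdf    # standard normal PDF, used as an example density

# Midpoint-rule approximation of the total area under the curve on [-8, 8].
a, b, steps = -8.0, 8.0, 16_000
width = (b - a) / steps
area = sum(pdf(a + (i + 0.5) * width) for i in range(steps)) * width

# Check nonnegativity at every grid point.
heights_nonnegative = all(pdf(a + i * width) >= 0 for i in range(steps + 1))
print(round(area, 6), heights_nonnegative)    # 1.0 True
```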

subjective probability

based on informed opinion or judgment. Example: there is a 60% chance that Toronto will bid for the 2024 Winter Olympics

trimmed mean

calculated like any other mean, except that the highest and lowest k percent of the observations in the sorted data array are removed
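A minimal sketch of the trimmed mean in Python. The helper name `trimmed_mean` is hypothetical, and it assumes the number of observations dropped from each end is `int(n * k / 100)` (rounded down); textbooks and software differ on this rounding, so treat it as an assumption.

```python
def trimmed_mean(data, k):
    """Mean after removing the lowest and highest k percent of the sorted values.

    Assumption: the count removed from EACH end is floor(n * k / 100).
    """
    values = sorted(data)
    n = len(values)
    drop = int(n * k / 100)           # observations removed from each end
    kept = values[drop:n - drop]
    return sum(kept) / len(kept)

# One large outlier pulls the ordinary mean up, but barely affects the trimmed mean.
data = [3, 4, 4, 5, 5, 5, 6, 6, 7, 95]
print(sum(data) / len(data))          # 14.0
print(trimmed_mean(data, 10))         # 5.25 (drops one value from each end)
```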

pie charts

can only convey a general idea of the data because it is hard to assess areas precisely; the only correct way to use a pie chart is to portray data that sum to a total

if the outcome is a continuous measurement, the sample space....

cannot be listed but can be described by a rule

events that include more than one outcome from the sample space are known as

compound events

ordinal data

connote a ranking of data values, ex: How often do you use Microsoft Access? 1. frequently 2. sometimes 3. rarely 4. never

data set

consists of all the values of all of the variables for all of the observations we have chosen to observe

target population

contains all of the individuals in which we are interested

empirical data

data collected through observations and experiments

left skewed

mean < median

bivariate data sets

two variables

random number

used to "pick" the names (items) to include in a random sample

what is normal?

- be measured on a continuous scale - possess a clear center - have only one peak (unimodal) - exhibit tapering tails - be symmetric about the mean (equal tails)

which of the following events are mutually exclusive? - being on time & being late for an appointment - passing a stats test & passing an english test - being of German descent & being of Mexican descent - rolling an odd # & even # on the same roll of a die

- being on time & being late for an appointment - rolling an odd number & an even # on the same roll of a die

the sum of the probability of all outcomes in the sample space is

1

place in order, from beginning to end, the steps to calculate the mean absolute deviation

1. calculate the arithmetic mean for the data set 2. find the absolute difference between each data set value and the mean 3. sum the absolute differences 4. divide by the sample (or the population) size
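The four steps above translate directly into a short Python function. This is a sketch; the name `mean_absolute_deviation` is hypothetical and the data are illustrative.

```python
def mean_absolute_deviation(data):
    mean = sum(data) / len(data)                  # step 1: arithmetic mean
    abs_devs = [abs(x - mean) for x in data]      # step 2: absolute differences from the mean
    total = sum(abs_devs)                         # step 3: sum of absolute differences
    return total / len(data)                      # step 4: divide by the sample size

print(mean_absolute_deviation([2, 4, 6, 8, 10]))  # 2.4
```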

the highest possible probability, of the choices below for an event is: 1.1 1.0 2.0 0.99

1.0

pareto chart

a special type of column chart used in business. It displays categorical data with the categories in descending order of frequency, so that the most common categories appear first. The 80/20 rule holds true for many aspects of business: a few categories usually account for most of the observations, so the chart is heavily weighted toward its first few columns
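The ordering behind a Pareto chart can be sketched without any plotting library: sort categories by descending count and track the cumulative percentage. The complaint categories and counts below are made-up illustrative data, and `pareto_table` is a hypothetical helper name.

```python
# Hypothetical complaint counts by category (illustrative data only).
complaints = {"late delivery": 12, "damaged item": 48, "wrong item": 25, "billing": 10, "other": 5}

def pareto_table(counts):
    """Return (category, count, cumulative %) rows sorted by descending count."""
    total = sum(counts.values())
    rows, running = [], 0
    for category, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += count
        rows.append((category, count, 100 * running / total))
    return rows

for category, count, cum_pct in pareto_table(complaints):
    print(f"{category:15s} {count:4d} {cum_pct:6.1f}%")
```

Here the first two categories already account for 73% of all complaints, which is the kind of concentration the 80/20 rule describes.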

mutually exclusive events

Events A and B are mutually exclusive (or disjoint) if their intersection is the null set (∅), which contains no elements

match the excel normal CDF to the explanation

NORM.S.DIST --> area to the left of a z score; NORM.DIST --> area to the left of a given x value

match the excel normal function to the explanation

NORM.S.INV --> z score for a given cumulative area; NORM.INV --> x value for a given cumulative area
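For readers working outside Excel, Python's standard-library `statistics.NormalDist` has rough counterparts to these four functions. This is a sketch: the pairing assumes Excel's cumulative form (last argument TRUE for NORM.S.DIST and NORM.DIST), and the N(100, 15) distribution is only an illustrative example.

```python
from statistics import NormalDist

std = NormalDist()                    # standard normal N(0, 1)
dist = NormalDist(mu=100, sigma=15)   # illustrative normal with mean 100, sd 15

# Area to the left (cumulative probability)
print(std.cdf(1.96))        # comparable to NORM.S.DIST(1.96, TRUE), about 0.975
print(dist.cdf(130))        # comparable to NORM.DIST(130, 100, 15, TRUE)

# Inverse: the value that has a given cumulative area to its left
print(std.inv_cdf(0.975))   # comparable to NORM.S.INV(0.975), about 1.96
print(dist.inv_cdf(0.975))  # comparable to NORM.INV(0.975, 100, 15)
```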

actuarial science

a career that involves estimating empirical probabilities; actuaries help companies calculate payout rates on life insurance, pension plans, and health care plans

variable

a characteristic of the subject or individual, such as the employee's income or an invoice amount

bar chart

a horizontal display of data (p. 78)

systematic sampling

a method of random sampling that chooses every kth item from a sequence or list, starting from a randomly chosen entry among the first k items on the list. Ex. the book shows a grid of x's with every 4th x highlighted. An attraction of systematic sampling is that it can be used with unlistable or infinite populations, such as production processes or political polling.
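A minimal sketch of systematic sampling from a list, assuming the population is a Python list and using a hypothetical helper `systematic_sample`. The random start is drawn from the first k positions, then every kth item is taken.

```python
import random

def systematic_sample(items, k, rng=random):
    """Random start among the first k items, then every kth item after it."""
    start = rng.randrange(k)          # random index in 0..k-1
    return items[start::k]

population = list(range(1, 41))       # hypothetical list of 40 numbered items
sample = systematic_sample(population, 4)
print(sample)                         # 10 items, each 4 apart
```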

judgment sampling

a non-random sampling method that relies on the expertise of the sampler to choose items that are representative of the population. For example, to estimate corporate spending on research and development in the medical equipment industry, we might ask an industry expert to select several "typical" firms

focus group

a panel of individuals chosen to be representative of a wider population

observation

a single member of a collection of items that we want to study, such as a person, firm, or region ex: an employee or an invoice mailed last month

What type of data are these? a. the manufacturer of your car b. your college major c. the number of college credits you are taking

a. categorical b. categorical c. discrete numerical

cluster samples

are taken from strata consisting of geographical regions. We divide a region (say, a city) into subregions (say, blocks, subdivisions, or school districts). In one-stage cluster sampling, our sample consists of all elements in each of k randomly chosen subregions (or clusters). In two-stage cluster sampling, we first randomly select k subregions (clusters) and then choose a random sample of elements within each cluster. Cluster sampling is useful when: - the population frame and stratification characteristics are not readily available - it is too expensive to obtain a simple random or stratified sample - the cost of obtaining data increases sharply with distance - some loss of reliability is acceptable
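One-stage and two-stage cluster sampling differ only in what happens after the clusters are chosen. This sketch assumes a hypothetical city of 6 blocks with 5 households each; the helper names are illustrative, not from any library.

```python
import random

# Hypothetical city: 6 blocks (clusters), each containing 5 households.
city = {f"block{b}": [f"block{b}-house{h}" for h in range(1, 6)] for b in range(1, 7)}

def one_stage_cluster(clusters, k, rng=random):
    chosen = rng.sample(sorted(clusters), k)                  # randomly choose k clusters
    return [item for c in chosen for item in clusters[c]]     # keep ALL their elements

def two_stage_cluster(clusters, k, m, rng=random):
    chosen = rng.sample(sorted(clusters), k)                  # stage 1: choose k clusters
    return [item for c in chosen
            for item in rng.sample(clusters[c], m)]           # stage 2: m elements per cluster

print(one_stage_cluster(city, 2))
print(two_stage_cluster(city, 3, 2))
```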

cumulative distribution function (CDF)

denoted F(x), it shows the cumulative area to the left of a given value of X. It is used for probabilities, while the PDF reveals the shape of the distribution

arithmetic scale

distances on the Y axis are proportional to the magnitude of the variable being displayed

statistics can help you handle

either too much or too little information

logarithmic scale

equal distances represent equal ratios (for this reason, a log scale is sometimes called a ratio scale). When data vary over a wide range, say, by more than an order of magnitude, we might prefer a log scale for the vertical axis, to reveal more detail for small data values. A log graph reveals whether the quantity is growing at an increasing percent (convex function), a constant percent (straight line), or a declining percent (concave function). A log scale is useful for time series data that might grow rapidly, ex. GDP, the national debt, or your future income

empirical probability

estimated from observed outcome frequency example: there is a 2% chance of twins in a randomly chosen birth

simple random sample

every item in the population of N items has the same chance of being chosen in the sample of n items. A physical experiment to accomplish this would be to write each of the N data values on a poker chip and then to draw n chips from a bowl after stirring it thoroughly
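The poker-chip experiment has a direct software analogue: `random.sample` draws n distinct items without replacement, giving every subset of size n the same chance. The "chips" below are illustrative labels.

```python
import random

population = [f"chip{i}" for i in range(1, 101)]   # N = 100 illustrative "poker chips"
n = 10
sample = random.sample(population, n)   # n distinct chips, each subset equally likely
print(sample)
```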

interval data

has meaningful intervals between scale points. Examples are the Celsius or Fahrenheit temperature scales; intervals between numbers represent *distances*

ratio data

have all of the properties of the other three data types, but in addition possess a *meaningful zero* that represents the absence of the quantity being measured. We can recode ratio measurements downward into ordinal or nominal measurements (but not conversely). Zero does not have to be observable in the data

time series data

if each observation in the sample represents a different equally spaced point in time (years, months, days), we have time series data. The periodicity is the time between observations

cross sectional data

if each observation represents a different individual unit (ex. a person, firm, geographic area) at the same point in time, we have cross sectional data. In cross sectional data we are interested in *variation among observations* or in *relationships*

non-random sampling

is less scientific than random sampling but is sometimes used for expediency

sampling error

it is uncontrollable random error that is inherent in any random sample. even when using a random sampling method, it is possible that the sample will contain unusual responses. this cannot be prevented and is generally undetectable. it is not an error on your part

random sampling

items are chosen by randomization or a chance procedure. The idea is to produce a sample that is representative of the population

standard normal distribution

its mean is 0 and its standard deviation is 1, denoted N(0, 1). The maximum height of f(z) occurs at z = 0 (the mean), and its points of inflection are at z = ±1 (the standard deviation). The shape of the distribution is unaffected by the z transformation

right skewed

mean > median

sampling without replacement

means that once an item has been selected to be included in the sample, it cannot be considered for the sample again

4 levels of measurement

nominal, ordinal, interval, and ratio

coverage error

occurs when some important segment of the target population is systematically missed. For example, a survey of Notre Dame University alumni will fail to represent non-college graduates or those who attended public universities

nonresponse bias

occurs when those who respond have characteristics different from those who don't respond. for example, people with caller ID, answering machines, blocked, or unlisted numbers, or cell phones are likely to be missed in telephone surveys. since these are generally more affluent individuals, their socioeconomic class may be underrepresented in the poll

numerical data

or quantitative data, arise from counting, measuring something, or some kind of mathematical operation. They can be broken down into 2 types: 1. discrete: a variable with a countable number of distinct values 2. continuous: a numerical variable that can have any value within an interval (this would include things like physical measurements, ex. distance, weight, speed, time, or financial variables, ex. sales, assets, inventory turns)

data

the word "data" is plural. Each column is a variable (m columns) and each row is an observation (n rows), giving an n x m data matrix

stratified sampling

procedure where, instead of taking a random sample of the whole population, we take a separate random sample within each stratum (subgroup) of the population; the individual strata estimates can then reduce cost per observation and narrow the error bounds
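A minimal sketch of stratified sampling, assuming proportional allocation (each stratum's sample size is proportional to its share of the population). Proportional allocation is only one common choice, and the strata, their sizes, and the helper name are hypothetical.

```python
import random

# Hypothetical population of 100 students split into strata.
strata = {
    "freshman": [f"F{i}" for i in range(40)],
    "sophomore": [f"S{i}" for i in range(30)],
    "junior": [f"J{i}" for i in range(20)],
    "senior": [f"R{i}" for i in range(10)],
}

def proportional_stratified_sample(strata, n, rng=random):
    """Draw a separate random sample from each stratum, sized in proportion to the stratum."""
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        size = round(n * len(members) / total)   # proportional allocation, rounded
        sample[name] = rng.sample(members, size)
    return sample

print(proportional_stratified_sample(strata, 20))
```

Because of rounding, the stratum sizes may not add up exactly to n for other population sizes; here they do (8 + 6 + 4 + 2 = 20).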

if an event is getting a letter grade of A in your stats class, what is the complement of receiving an A?

receiving any grade except an A

inferential statistics

refers to generalizing from a sample to a population, estimating unknown population parameters, drawing conclusions, and making decisions

measurement error

results when the survey questions do not accurately reveal the construct being assessed

selection bias

a self-selected sample. For example, a talk show host who invites viewers to take a web survey about their sex lives will attract plenty of respondents, but those who choose to respond are unlikely to be representative of the population

binary variables

some categorical variables have only two values, which we call binary variables (usually coded 1 or 0), ex: 1 = female, 0 = male, or vice versa

many statisticians feel which two displays are better than a pie chart?

a table or a bar chart (though pie charts do appear daily in companies)

continuity correction

the 0.5 added to or subtracted from x when using the normal approximation to a discrete distribution (such as the binomial) is called this
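The effect of the continuity correction can be seen by comparing the normal approximation with an exact binomial probability. This sketch assumes illustrative parameters n = 50, p = 0.4 and computes P(X ≤ 18); for "less than or equal to" the correction extends the cutoff to x + 0.5.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 50, 0.4          # illustrative binomial parameters
x = 18                  # we want P(X <= 18)

# Exact binomial probability: sum of P(X = k) for k = 0..x
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
with_cc = approx.cdf(x + 0.5)     # with continuity correction
without_cc = approx.cdf(x)        # without correction

print(f"exact={exact:.4f}  with correction={with_cc:.4f}  without={without_cc:.4f}")
```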

stacked column chart

the bar height is the sum of several subtotals (p. 79)

sampling frame

the group from which we take the sample, ex. the names and addresses of all registered voters in Colorado Springs, Colorado. If the frame differs from the target population, then our estimates might not be accurate

symmetric data

the mean and median are about the same

center

the middle or typical values of a distribution

mean

the most familiar statistical measure of center is the mean. It is the balancing point because the signed deviations of the data points from the mean always sum to 0
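A quick numerical check of the balancing-point property, using a small illustrative data set: the signed deviations below and above the mean cancel exactly.

```python
data = [4, 7, 9, 10, 15]
mean = sum(data) / len(data)             # 9.0
deviations = [x - mean for x in data]    # signed deviations: -5, -2, 0, 1, 6
print(mean, sum(deviations))             # 9.0 0.0
```

With non-integer means, floating-point rounding can leave a tiny nonzero remainder (such as 1e-16) instead of an exact 0.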

sample space

the set of all possible outcomes

nominal measurement

the weakest level of measurement and the easiest to recognize; it identifies a category. It is common to use OTHER as the last item on the list, ex.: Which cell phone service provider do you use? 1. AT&T 2. Sprint 3. T-Mobile 4. Verizon 5. Other

sampling with replacement

this means that the same random number could show up more than once
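The difference between sampling without and with replacement maps onto two functions in Python's `random` module: `sample` never repeats an item, while `choices` can. The seed is used only to make the run reproducible.

```python
import random

rng = random.Random(42)           # seeded only so the run is reproducible
digits = list(range(10))

without_repl = rng.sample(digits, 5)     # without replacement: no value can repeat
with_repl = rng.choices(digits, k=15)    # with replacement: values may repeat
print(without_repl, with_repl)
```

Since 15 draws are taken from only 10 digits, `with_repl` is guaranteed to contain at least one repeated value.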

line chart

used to display *time series data*, to spot trends, or to compare time periods. Line charts can display several variables at once. - usually has no vertical gridlines - the numerical variable is shown on the Y axis - to avoid graph clutter, numerical labels are usually omitted on a line chart

parameters and statistics

we use different symbols for each parameter and its corresponding statistic. parameter: a measurement or characteristic of the population (ex. a mean or proportion). Usually unknown, since we can rarely observe the entire population. Usually (but not always) represented by a Greek letter (ex. μ or π). statistic: a numerical value calculated from a sample (ex. a mean or proportion). Usually (but not always) represented by a Roman letter (ex. x̄ or p)

interviewer error

when the interviewer's facial expressions, tone of voice, or appearance influence the responses

one characteristic of a well-defined probability density function of a continuous random variable X is that the area under the curve f(x) over all values of x is

equal to one

using the multiplication rule, the joint probability of event A and event B is computed by multiplying the conditional probability of event A given event B by the probability of

event B

one condition of a well defined probability density function of a continuous random variable X is that f(x) is

greater than or equal to zero for all values of X

what is normal

must be: - continuous - possess a clear center - have only one peak - exhibit tapering tails - be symmetric

dichotomous (or binary) events

two mutually exclusive, collectively exhaustive events. Example: a car repair is either covered by the warranty (A) or not covered by the warranty (A'). There can be more than two mutually exclusive, collectively exhaustive events; for example, a Walmart customer can pay by credit card (A), debit card (B), cash (C), or check (D).

the _____________ of two events, A and B, contains all of the outcomes in either A or B, or both A and B

union

empirical or relative frequency approach

use this to assign probabilities by counting the *frequency* of observed outcomes defined on the experimental sample space. Example: to estimate the default rate on student loans: P(a student defaults) = f/n = number of defaults / number of loans

combination formula

used to determine the number of different ways to choose a group of r objects from a total of n objects when the order of the objects is irrelevant

an example of a random variable that closely follows the normal distribution is

weight of newborn babies

the normal distribution is the most extensively used distribution in statistical studies because

- many physical measurements have a bell-shaped distribution - economic and financial data often display bell-shaped distribution - it has important features used in sampling and estimation

standard deviations can be compared

- only for data sets with the same measurement units and similar magnitude - only for data sets with the same measurement units

which of the following statements about the variance of a continuous variable are true?

- the standard deviation is the square root of the variance - the variance is the weighted average of the squared deviations from the mean

consider rolling two dice. Which of the following describe two events that are collectively exhaustive? - event 1: a value of 9 or more. event 2: a value of 7 or less - event 1: rolling an even number. event 2: rolling an odd number - event 1: a value of 7 or more. event 2: a value of 6 or less - event 1: a value of 6 or more. event 2: a value of 8 or less

all except the first one

categorial data

also called qualitative data, have variables that are described by words rather than numbers. For ex: structural lumber can be classified by lumber type (ex. fir, hemlock, pine), automobile styles can be classified by size (ex. full, midsize, compact, subcompact), and movies can be categorized using common movie classifications (ex. action and adventure, children and family, classics, comedy, documentary)

combination

an arrangement of r items chosen at random from n items where the order of the selected items is not important (ex. XYZ is the same as ZYX). A combination is denoted nCr

permutation

an arrangement *in a particular order* of r randomly sampled items from a group of n items, denoted nPr. In other words, how many ways can the r items be arranged from n items, treating each arrangement as different (ex. XYZ is different from ZYX)?
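Python's `math.comb` and `math.perm` (Python 3.8+) compute nCr and nPr, and `itertools` can list the arrangements so the two counts can be compared directly. The four items are illustrative.

```python
from math import comb, perm, factorial
from itertools import combinations, permutations

items = ["X", "Y", "Z", "W"]   # n = 4 illustrative items
r = 3

# nCr = n! / (r! (n - r)!)    nPr = n! / (n - r)!
print(comb(4, 3), perm(4, 3))                    # 4 24
print(len(list(combinations(items, r))))         # 4: order ignored
print(len(list(permutations(items, r))))         # 24: XYZ and ZYX counted separately
assert comb(4, 3) == factorial(4) // (factorial(3) * factorial(1))
```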

random experiment

an observational process whose results cannot be known in advance

event

any subset of outcomes in the sample space

which of the following statements from the empirical rule is correct?

approximately 95% of values fall within 2 standard deviations of the mean for data with bell shaped histogram

the CDF for a continuous random variable gives the cumulative ________ under the PDF to the left of x.

area

what is an example of a discrete random variable

binomial

union of two events

consists of all outcomes in the sample space S that are contained either in event A or in event B or in both. The union of A and B is sometimes denoted A ∪ B or "A or B", as illustrated in the Venn diagram. The symbol ∪ may be read "or", since it means that either or both events occur.

it is meaningful to compute the probability that a continuous random variable is between 2 numbers, greater than or equal to a #, or less than or equal to a number. only thing that is not meaningful is

exactly equal to a number

true or false: a continuous random experiment can have a finite set of integer values

false

fundamental rule of counting

if event A can occur in n1 ways and event B can occur in n2 ways, then events A and B can occur in n1 x n2 ways. In general, m events can occur in n1 x n2 x ... x nm ways. Example: stock keeping labels: how many unique stock keeping unit (SKU) labels can a hardware store create by using two letters (ranging from AA to ZZ) followed by four numbers (0 through 9)?
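Applying the fundamental rule of counting to the SKU example: each of the two letter positions has 26 choices and each of the four digit positions has 10, and the counts are multiplied.

```python
from string import ascii_uppercase, digits

letters = len(ascii_uppercase)   # 26 choices for each letter position
numbers = len(digits)            # 10 choices for each digit position

# Fundamental rule of counting: multiply the number of ways at each position.
sku_count = letters * letters * numbers ** 4
print(sku_count)                 # 6760000
```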

special law of addition

in the case of mutually exclusive events, the addition law reduces to: P(A ∪ B) = P(A) + P(B)

the ______________ of the two events A and B contains only those outcomes that are in both A and B

intersection

normal distribution

it is continuous (if you start to draw it, you can trace it out without lifting your pencil; there are no skips or missing pieces in the middle), it is symmetric, and it has a single peak in the middle, which is its highest point and is located at the mean

What is not a characteristic of the midrange?

it is robust to outliers

classical probability

known a priori by the nature of the experiment. Example: there is a 50% chance of heads on a coin flip

symmetrical

mean= median

standard deviation

measures the degree of variation in the data

multivariate data sets

more than two variables

Is an imperfect survey analysis better than no survey, even if only 80 out of 1,000 people participate?

no, causation is not shown

coding

on occasion, a categorical variable may be represented using numbers, which is called coding, ex. 1 = cash 2 = check 3 = credit/debit card 4 = gift card

dependent events

when P(A) differs from P(A | B). Dependent events may be causally related, but statistical dependence does not prove cause and effect. It only means that knowing that event B has occurred will affect the probability that event A will occur

independent events

when knowing that event B has occurred does not affect the probability that event A will occur. In other words, event A is independent of event B if the conditional probability P(A | B) is the same as the unconditional probability P(A); that is, if the probability of event A is the same whether or not event B occurs. For example, if text messaging among high school students is independent of gender, this means that knowing whether a student is male or female does not change the probability that the student uses text messaging
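The independence test compares P(A | B) with P(A). This sketch uses made-up counts of students by gender and texting (not real data), chosen so the two probabilities match, and uses `Fraction` so the comparison is exact.

```python
from fractions import Fraction

# Hypothetical counts (illustrative only): (texts, does not text)
counts = {"female": (60, 40),
          "male":   (48, 32)}

total = sum(a + b for a, b in counts.values())
p_texts = Fraction(sum(a for a, _ in counts.values()), total)                 # P(A)
p_texts_given_female = Fraction(counts["female"][0], sum(counts["female"]))  # P(A | B)

print(p_texts, p_texts_given_female)    # 3/5 3/5
print("independent" if p_texts == p_texts_given_female else "dependent")
```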

event A = {1, 2, 3, 4} and event B = {2, 3, 6, 7}. A ∪ B =

{1, 2, 3, 4, 6, 7}
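Python sets mirror this event notation: `|` is the union, `&` is the intersection, and `isdisjoint` tests whether two events are mutually exclusive. The example uses the two events from the question.

```python
A = {1, 2, 3, 4}
B = {2, 3, 6, 7}

print(sorted(A | B))      # union A ∪ B: [1, 2, 3, 4, 6, 7]
print(sorted(A & B))      # intersection A ∩ B: [2, 3]
print(A.isdisjoint(B))    # False: A and B are not mutually exclusive
```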

If the revenue over a four-year period was $2000, $2000, $3000, and $5000, what is the geometric mean revenue? Round the answer to a whole number.

$2783
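The answer can be checked with `statistics.geometric_mean` (Python 3.8+), or directly as the fourth root of the product of the four values.

```python
from statistics import geometric_mean

revenue = [2000, 2000, 3000, 5000]
g = geometric_mean(revenue)                             # fourth root of the product
print(round(g))                                         # 2783
print(round((2000 * 2000 * 3000 * 5000) ** (1 / 4)))    # 2783, computed directly
```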

If Fund A has a coefficient of variation of 1.1 and Fund B has a coefficient of variation of 0.9, which Fund has the greater relative dispersion?

A

probability of an event

a number that measures the relative likelihood that the event will occur. The probability of event A, denoted P(A), must lie within the interval from 0 to 1: 0 ≤ P(A) ≤ 1. P(A) = 0 means the event cannot occur, while P(A) = 1 means the event is certain to occur. In a discrete sample space, the probabilities of all simple events must sum to 1, since it is certain that one of them will occur: P(S) = P(E1) + P(E2) + ... + P(En) = 1

simple or elementary events are....

a single outcome

random experiment

a trial or process that produces several possible outcomes that cannot be known in advance

true or false: under appropriate circumstances, many discrete random variables can be described by the normal distribution

true

univariate data sets

one variable

classical approach- what is a priori?

a priori: the process of assigning probabilities before the event is observed or the experiment is conducted - a priori probabilities are *based on logic*, not experience - when flipping a coin or rolling a pair of dice, we do not actually have to perform an experiment, because the nature of the process allows us to envision the entire sample space - instead of performing the experiment, we can use deduction to determine the probability of an event - this is the classical approach to probability

descriptive statistics

refers to the collection, organization, presentation, and summary of data (either using charts and graphs or using a numerical summary)

subjective approach of probability

reflects someone's informed judgement about the likelihood of an event - used when there is no repeatable random experiment - for example, what is the probability that a new truck product program will show a return on investment of at least 10 percent? - what is the probability that the price of Ford's stock will rise within the next 30 days?

law of large numbers

says that as the number of trials increases, any empirical probability approaches its theoretical limit
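A simulation makes the law of large numbers concrete. This sketch flips a simulated fair coin (theoretical P(heads) = 0.5) an increasing number of times and reports the empirical probability f/n; the seed is fixed only for reproducibility, and the helper name `empirical_heads` is hypothetical.

```python
import random

def empirical_heads(trials, rng):
    """Empirical P(heads) = (number of heads) / (number of trials) for a fair coin."""
    heads = sum(rng.random() < 0.5 for _ in range(trials))
    return heads / trials

rng = random.Random(2024)   # seeded so the run is reproducible
for trials in (10, 100, 10_000, 1_000_000):
    print(trials, empirical_heads(trials, rng))   # drifts toward 0.5 as trials grow
```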

standard normal distribution

shape: symmetric, mesokurtic, and bell-shaped; domain: -∞ < z < +∞; mean: 0; standard deviation: 1

law of large numbers

states that as the # of trials increases, an empirical probability will approach the theoretical probability

general law of multiplication

states that the probability of the intersection of two events A and B is P(A ∩ B) = P(A | B)P(B)

general law of addition

states that the probability of the union of two events A and B is: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
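Both general laws can be verified exactly for one roll of a fair die using classical (equally likely) probabilities. The events A = "even number" and B = "greater than 3" are illustrative choices.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}          # sample space for one fair die
A = {2, 4, 6}                   # even number
B = {4, 5, 6}                   # greater than 3

def P(event):
    return Fraction(len(event), len(S))   # classical probability: equally likely outcomes

# General law of addition: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
assert P(A | B) == P(A) + P(B) - P(A & B)

# General law of multiplication: P(A ∩ B) = P(A | B) P(B)
p_A_given_B = Fraction(len(A & B), len(B))
assert P(A & B) == p_A_given_B * P(B)

print(P(A | B), P(A & B), p_A_given_B)    # 2/3 1/3 2/3
```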

critical thinking

stats is an essential part of critical thinking because it allows us to test an idea against empirical evidence

where should tables go

tables should be embedded in the narrative (not on a separate page) near the paragraph in which they are cited

geometric mean

the appropriate measure to use when evaluating growth rates

the probability that a continuous random variable takes on a particular value is zero because why?

the area under a curve AT a certain point is zero

when comparing two data sets with different units of measurement, what is the relative measure of dispersion?

the coefficient of variation
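The coefficient of variation (CV) divides the standard deviation by the mean, so it has no units and can compare dispersion across data sets measured in different units. This sketch assumes the sample standard deviation (`statistics.stdev`) and expresses CV as a percentage; the height and weight data are illustrative.

```python
from statistics import mean, stdev

def coefficient_of_variation(data):
    """Sample standard deviation divided by the mean, as a percentage (unit-free)."""
    return 100 * stdev(data) / mean(data)

# Illustrative data in different units.
heights_cm = [160, 165, 170, 175, 180]
weights_kg = [55, 62, 70, 78, 90]
print(coefficient_of_variation(heights_cm))   # about 4.65 (%)
print(coefficient_of_variation(weights_kg))   # about 19.26 (%): greater relative dispersion
```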

complement

the complement A' is the region outside the circle for A in a Venn diagram (it consists of everything in the sample space S except event A). The probability of the complement of A is found by subtracting the probability of A from 1: P(A') = 1 - P(A)

Event A is independent of event B if

the conditional probability P(A | B) is the same as the marginal probability P(A)

the function used to find the area under the f(X) of a continuous random variable X up to any value x is called

the cumulative distribution function or CDF

intersection of two events

the event consisting of all outcomes in the sample space S that are contained in both event A and event B. The intersection of A and B is denoted A ∩ B or "A and B", as illustrated in the Venn diagram. The probability of A ∩ B is called the *joint probability* and is denoted P(A ∩ B)

which of the following are examples of conditional probabilities? - if Neil has already purchased groceries, then the probability of Colleen purchasing groceries - the probability of Angel going to the movie, given that Derrick is going to the movie - the probability of Amir purchasing a video game or the probability of Natasha purchasing a video game - the probability of Marilyn going to the football game and Tom going to the football game

the first two

which of the following statements is true? - two data sets could have different means but the same standard deviation - two data sets could have the same mean but different standard deviations - if two data sets have the same mean they must have the same standard deviation - if two data sets have different means they must have different standard deviations

the first two

mode

the measure of center that identifies the most frequently occurring value in the data set

median

the measure of center where half of the data values lie above it and half lie below it
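The three measures of center are available in Python's `statistics` module. In this illustrative data set, one large value (20) pulls the mean above the median, which matches the "mean > median" rule for right-skewed data.

```python
from statistics import mean, median, mode

data = [2, 3, 3, 3, 4, 5, 6, 9, 20]   # illustrative, right-skewed by the 20
print(mean(data))     # about 6.11: pulled toward the long right tail
print(median(data))   # 4: the middle (5th of 9) sorted value
print(mode(data))     # 3: the most frequent value
```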

post hoc fallacy

the mistaken conclusion that if A precedes B, then A is the cause of B

n factorial

the number of ways that n items can be arranged in a particular order. n factorial (n!) is the product of all integers from 1 to n

conditional probability

the probability of an event given that another event has already occured

conditional probability

the probability of event A given that event B has occurred, denoted P(A | B). The vertical line "|" is read as "given"

the normal distribution is asymptotic in the sense that

the tails get closer and closer to the horizontal axis but never touch it

the addition rule is used to calculate

the union of two events

events are collectively exhaustive if

their union is the entire sample space S

factorials

they are useful for counting the possible arrangements of any n items: there are n ways to choose the first, n-1 ways to choose the second, and so on. Example: a home appliance service truck must make 3 stops (A, B, C). In how many ways could the three stops be arranged? 3! = 3 x 2 x 1 = 6
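The service-truck example can be checked by listing every ordering of the three stops with `itertools.permutations` and comparing the count with `math.factorial`.

```python
from math import factorial
from itertools import permutations

stops = ["A", "B", "C"]
routes = list(permutations(stops))       # every ordering of the 3 stops
print(factorial(3), len(routes))         # 6 6
for route in routes:
    print("".join(route))                # ABC, ACB, BAC, BCA, CAB, CBA
```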

