Statistics chapter 1-7


Pareto charts

are bar charts that show the frequency of the categories that cause quality control problems • They show quality problem categories in decreasing order of frequency • The most problematic categories are shown first

Bar Graphs

are more useful when you want to highlight the actual data values and when the classes combined don't form a whole

The Union of Events

of Events A and B represents all outcomes in which Event A occurs, Event B occurs, or both events occur together chp 3 pg 27

Direct Observation or Focus Group • Experiment • Surveys or Questionnaires

primary data collection methods

Contingency tables

provide a format to display observations that have more than one value associated with them •Use rows and columns for separate variables to summarize the data efficiently

Scatter plots

provide a picture of the relationship between two variables whose values are paired together

sample correlation coefficient

r_xy, measures both the strength and direction of the linear relationship between two variables. Example: the relationship between the number of hours that students study and their exam scores CHP 3 PG 77

Standard Variation

real world measurement

class boundaries

represent the minimum and maximum values for each class

Population

represents all possible subjects that are of interest in a particular study

The mean of the binomial distribution

represents the long-term average number of successes to expect based on the number of trials conducted. Example, Proposition A: n = 10, p = 0.4, and q = 0.6, so μ = np = (10)(0.4) = 4.0. Out of ten randomly selected voters, on average 4 of them (40%) will support Proposition A. THIS EXAMPLE IS CHP 5 PG 25-28

The mean of the binomial distribution

represents the long-term average number of successes to expect based on the number of trials conducted. Formula for calculating the mean of a binomial distribution: μ = np, where μ = the mean of the binomial distribution, σ = the standard deviation of the binomial distribution, n = the number of trials, p = the probability of a success, q = the probability of a failure chp 5 pg 24
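The card defines σ but shows only the formula for the mean. A minimal sketch that computes both for the Proposition A numbers; note that σ = √(npq) is the standard binomial result and is supplied here, not taken from the card.

```python
import math


def binomial_mean_sd(n, p):
    """Return (mu, sigma) for a binomial distribution with n trials and success probability p."""
    q = 1 - p                     # probability of a failure
    mu = n * p                    # mean: mu = np
    sigma = math.sqrt(n * p * q)  # standard deviation: sigma = sqrt(npq)
    return mu, sigma


# Proposition A example from the card: n = 10 voters, p = 0.4
mu, sigma = binomial_mean_sd(10, 0.4)
print(f"mu = {mu:.1f}, sigma = {sigma:.3f}")  # mu = 4.0, sigma = 1.549
```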

sample correlation coefficient,

r_xy, indicates both the strength and direction of the linear relationship between the independent and dependent variables • The values of r range from -1.0, a strong negative relationship, to +1.0, a strong positive relationship • When r = 0, there is no linear relationship between variables x and y chp 3 pg 81
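To show how r_xy is built from deviations around the means, here is a small sketch. The hours/score pairs are hypothetical, chosen only to mirror the study-time example on the earlier card.

```python
import math


def sample_correlation(x, y):
    """Sample correlation coefficient r_xy = s_xy / (s_x * s_y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sums of squared deviations
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    # The (n - 1) divisors in s_xy, s_x and s_y cancel, so they are left out
    return sxy / math.sqrt(sxx * syy)


hours = [1, 2, 3, 4, 5]        # hypothetical hours studied
scores = [60, 65, 70, 80, 85]  # hypothetical exam scores
print(round(sample_correlation(hours, scores), 3))  # 0.991
```

A value this close to +1.0 indicates a strong positive linear relationship for these made-up numbers.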

The Coefficient of Variation Formula for the sample coefficient of variation:

Sample: CV = (s / x̄)(100%), where s = the sample standard deviation and x̄ = the sample mean. Population: CV = (σ / μ)(100%), where σ = the population standard deviation and μ = the population mean chp 3 pg 42
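A minimal sketch of both coefficient-of-variation formulas, using Python's `statistics` module for the standard deviations. The three data values are hypothetical.

```python
import statistics


def coefficient_of_variation(data, sample=True):
    """Coefficient of variation in percent.

    sample=True  -> CV = (s / x_bar) * 100, using the sample standard deviation s
    sample=False -> CV = (sigma / mu) * 100, using the population standard deviation sigma
    """
    mean = statistics.fmean(data)
    sd = statistics.stdev(data) if sample else statistics.pstdev(data)
    return sd / mean * 100


values = [10, 12, 14]  # hypothetical data
print(round(coefficient_of_variation(values), 2))         # 16.67
print(round(coefficient_of_variation(values, False), 2))  # 13.61
```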

Quantitative Data

Described by numerical values: 1.Counted: Examples: • Number of Children • Defects per hour (Counted items) 2.Measured: Examples: • Weight • Voltage (Measured characteristics)

How many data values can be found in a specific interval?

Discrete Random Variables • a finite number of values within an interval and Continuous Random Variables • an infinite number of outcomes within an interval

Constructing a Box-and-Whiskers Plot

Draw a horizontal number line that spans the length of the data values • Draw a box above the number line extending from Q1 to Q3, with a center line at the median (Q2) • Whiskers extend from the central box to the highest and lowest values that are not outliers • If outliers exist in the data set, they are plotted with an asterisk above the number line chp 3 pg 71

A discrete probability distribution meets the following conditions:

Each outcome in the distribution needs to be mutually exclusive with other outcomes in the distribution • The probability of each outcome, P(x), must be between 0 and 1 (inclusive): 0 ≤ P(x) ≤ 1 • The sum of the probabilities for all the outcomes in the distribution must be 1: ΣP(xi) = 1, summed over i = 1 to n, where n equals the total number of possible outcomes. chp 5 pg 9

Example: The Mean of Grouped Data

Example An online merchant has collected the following grouped data for the number of web pages viewed by a sample of its customers: The merchant would like to calculate the average number of viewed pages.

Mode example Example with numerical data: • Number of children per family in a sample of 24 families: 0,0,0,0,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,4,5

Example with numerical data: • Number of children per family in a sample of 24 families: 0,0,0,0,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,4,5 The mode is 2, because 2 appears more often than any other value (8 times).
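The same result can be checked with Python's standard `statistics` module, using the 24 family sizes from the card.

```python
import statistics

# Number of children per family in the sample of 24 families from the card
children = [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5]

mode = statistics.mode(children)       # most frequent value
print(mode, children.count(mode))      # 2 8
# multimode() returns every value tied for the highest count
print(statistics.multimode(children))  # [2]
```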

MEDIAN INDEX When the index point is a whole number, the median is halfway between the value in the index position (i) and the next highest data point (the i + 1 position). When there are an even number of data values, the median is halfway between the two middle values

Example with sample of size n = 6: 145 157 170 182 204 209 • The index number is i = 0.5(n) = 0.5(6) = 3 • The index number is a whole number, so the median value is halfway between the third and fourth values in the sorted data: median = (170 + 182)/2 = 176

The Central Limit Theorem

the average value of all possible sample means computed from all possible random samples of a given size from the population is equal to the population mean: μ_x̄ = μ. The standard deviation of the sample means computed from all random samples of size n is equal to the population standard deviation divided by the square root of the sample size: σ_x̄ = σ/√n
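Averaging "all possible" samples is not practical, but a simulation can approximate it. This sketch assumes a hypothetical right-skewed population (exponential, mean 1, standard deviation 1), draws many samples of size 36, and compares the mean and standard deviation of the sample means with μ and σ/√n. The seed, sample count, and population are illustrative choices, not from the text.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

# Hypothetical right-skewed population: exponential with mean 1 and standard deviation 1
MU, SIGMA = 1.0, 1.0
n = 36                # size of each random sample
num_samples = 20_000  # number of samples drawn (a stand-in for "all possible samples")

sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
print(f"mean of sample means = {mean_of_means:.3f}  vs  mu = {MU:.3f}")
print(f"sd of sample means   = {sd_of_means:.3f}  vs  sigma/sqrt(n) = {SIGMA / n ** 0.5:.3f}")
```

Both simulated values should land close to the theoretical ones; the match improves as `num_samples` grows.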

Statistics

the mathematical science that deals with the collection, analysis, and presentation of data, which can then be used as a basis for inference and induction

Mode example with categorical data (cars): Toyota: 7, Acura: 3, Ford: 5, BMW: 1

The mode is Toyota because it has the highest frequency (7).

The Range

Simplest measure of variation Difference between the highest value and the lowest value in a data set Range = Highest value - Lowest Value Example: 1, 2, 4, 4, 6, 8, 8, 8, 8, 9, 11, 11, 12, 13 Range = 13 - 1 = 12

Percentiles To find percentiles manually:

Sort the data from lowest to highest • Calculate the index point, i = (p/100)(n), where p = the percentile of interest and n = the number of data values • If i is not a whole number, round i up to the next whole number; the ith position represents our value of interest • If i is a whole number, the midpoint between the values in the ith and (i + 1)th positions is our value of interest • i is not the value of the percentile; it is the position of the percentile value in the ranked data FORMULA IS ON PG 59 CHP 3
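A sketch of the index-point procedure, with 1-based positions as in the steps above. It assumes 0 < p < 100. Since the median is the 50th percentile, the usage lines reproduce the two median examples in this set.

```python
import math


def percentile_by_index(data, p):
    """Percentile via the index-point method, i = (p / 100) * n, with 1-based positions.

    Assumes 0 < p < 100.
    """
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    values = sorted(data)        # step 1: sort from lowest to highest
    n = len(values)
    i = p * n / 100              # step 2: index point (multiply first to limit rounding error)
    if i != int(i):              # not a whole number: round up and take that position
        return values[math.ceil(i) - 1]
    i = int(i)                   # whole number: midpoint of positions i and i + 1
    return (values[i - 1] + values[i]) / 2


print(percentile_by_index([145, 157, 170, 182, 204, 209], 50))  # 176.0
print(percentile_by_index([21, 27, 27, 28, 34, 45, 50], 50))    # 28
```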

Business Statistics

Statistics applied to the business world in an effort to improve people's decision making in fields such as marketing, operations, finance and human resources to name a few

Sampling and Nonsampling Errors

Statistics will vary from sample to sample • A sample statistic is not likely to be exactly equal to the population parameter, since only a portion of the population is in the sample • Sampling error is defined as the difference between the sample statistic and the population parameter. Because a statistic is based on just a portion of the population, it would be unreasonable to expect the sample mean and population mean to be the same.

Stratified vs. Cluster Sampling

Strata are defined with a common characteristic •Values have something in common, such as each student being a freshman • Strata tend to be homogeneous collections, each with a certain characteristic of interest Clusters are "mini-subsets" of the larger population • Tend to be a combination of various characteristics • Each cluster should be representative of the entire population

Surveys or Questionnaires

Subjects are asked to respond to questions or discuss attitudes Example: E-mail surveys to customers to assess service quality

Working with Grouped Data

Suppose data has already been summarized by a frequency distribution •The individual data values are no longer shown •Only grouped data is available To estimate the average for the frequency distribution: •Find the midpoint for each group (The midpoint is the halfway point in each group) •Use the midpoint as a representative value for that group
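Using each midpoint as the representative value leads to the estimate x̄ ≈ Σ(f·m)/n. This minimal sketch works on a hypothetical frequency distribution; the class limits and frequencies are invented for illustration.

```python
def grouped_mean(classes):
    """Estimate the mean of grouped data as sum(f * m) / sum(f), where m is each class midpoint.

    `classes` is a list of (lower_limit, upper_limit, frequency) tuples.
    """
    total_f = sum(f for _, _, f in classes)
    weighted_sum = sum((low + high) / 2 * f for low, high, f in classes)
    return weighted_sum / total_f


# Hypothetical frequency distribution of pages viewed: (lower, upper, frequency)
pages_viewed = [(1, 5, 10), (6, 10, 20), (11, 15, 10)]
print(grouped_mean(pages_viewed))  # 8.0
```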

Example: Probability between two values

Suppose income is normally distributed for a group of workers, with μ = $45,000 and σ = $5,000 Find the probability that a randomly selected worker from this group has an income between $38,000 and $48,000
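The card leaves the answer open. Here is one way to compute it with `statistics.NormalDist` from the Python standard library, both directly and through z-scores, z = (x - μ)/σ.

```python
from statistics import NormalDist

income = NormalDist(mu=45_000, sigma=5_000)

# P(38,000 < X < 48,000) = F(48,000) - F(38,000)
prob = income.cdf(48_000) - income.cdf(38_000)

# The same answer through z-scores, z = (x - mu) / sigma
z_low = (38_000 - 45_000) / 5_000   # -1.4
z_high = (48_000 - 45_000) / 5_000  # 0.6
prob_z = NormalDist().cdf(z_high) - NormalDist().cdf(z_low)

print(round(prob, 4), round(prob_z, 4))  # 0.645 0.645
```

With a printed normal table (0.7257 - 0.0808) the result is 0.6449, which differs only by table rounding.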

Advertising

Household surveys, TV viewing habits

Bar charts

are a good tool for displaying qualitative data that have been organized in categories

The interquartile range, IQR,

describes the middle 50% of the data. Find the IQR by subtracting the first quartile from the third quartile: IQR = Q3 - Q1

Excel calculates percentiles using the PERCENTILE.EXC function:

=PERCENTILE.EXC(array, k) • array = the data range of interest • k = the percentile of interest, a value between 0 and 1 (exclusive)
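Outside Excel, Python's `statistics.quantiles` with `method="exclusive"` places the pth value at rank p(n + 1), which, as far as we know, is the same rule PERCENTILE.EXC uses; it is worth spot-checking against Excel. This sketch computes quartiles and the IQR for the six values from the median example.

```python
import statistics

data = [145, 157, 170, 182, 204, 209]

# n=4 asks for the three cut points that split the data into quarters
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
print(q1, q2, q3, q3 - q1)  # 154.0 176.0 205.25 51.25
```

For example, Q1 sits at rank 0.25 × 7 = 1.75, that is 145 + 0.75 × (157 - 145) = 154.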

The Variance and Standard Deviation of Grouped Data

Sample variance, grouped data: s² = Σ f(m - x̄)² / (n - 1). Population variance, grouped data: σ² = Σ f(m - μ)² / N, where f = the class frequency and m = the class midpoint. pg 55
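A sketch of the grouped-data variance: each class midpoint stands in for every value in its class and is weighted by the class frequency. The frequency distribution below is hypothetical.

```python
import math


def grouped_variance(classes, sample=True):
    """Variance of grouped data using class midpoints m and frequencies f.

    Sample:     s^2     = sum(f * (m - x_bar)^2) / (n - 1)
    Population: sigma^2 = sum(f * (m - mu)^2) / N
    """
    total_f = sum(f for _, _, f in classes)
    mids = [((low + high) / 2, f) for low, high, f in classes]
    mean = sum(m * f for m, f in mids) / total_f
    squared_dev = sum(f * (m - mean) ** 2 for m, f in mids)
    return squared_dev / (total_f - 1) if sample else squared_dev / total_f


# Hypothetical frequency distribution: (lower, upper, frequency)
classes = [(1, 5, 10), (6, 10, 20), (11, 15, 10)]
s2 = grouped_variance(classes)
print(round(s2, 4), round(math.sqrt(s2), 4))    # 12.8205 3.5806
print(grouped_variance(classes, sample=False))  # 12.5
```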

information

Analyzing the data can provide information for decision making. Table 1.1 | Golf-Score Data (Did a new driver after 7/1 change the average golf score?) After 7/1, the scores appeared to be lower than the scores in June.

Biased Sample

- a sample that does not represent the intended population • can lead to distorted findings • biased sampling can occur intentionally or unintentionally •results can be manipulated by how we ask questions and who is responding to them

Ideally, the number of classes in a frequency distribution should be between___ and _____

5-20

Rules for Classes for Grouped Data

1. Equal-size classes. All classes in the frequency distribution must be of equal width 2. Mutually exclusive classes. Class boundaries cannot overlap 3. Include all data values. Make sure all data values are accounted for in the total row of the frequency distribution 4. Avoid empty classes. It is undesirable for a histogram to display a class so narrow that there are no observations in it 5. Avoid open-ended classes (if possible). These violate the first rule of equal class sizes

A Poisson process has the following characteristics:

1. The experiment consists of counting the number (x) of occurrences of an event over a period of time, area, distance, or other type of measurement 2. the mean of the Poisson distribution (λ) has to be the same for each equal interval of measurement 3. the number of occurrences during one interval has to be independent of the number of occurrences in any other interval 4. the intervals defined in the Poisson process cannot overlap

The Characteristics of a Binomial Experiment

1. The experiment consists of a fixed number of trials, denoted by n 2. Each trial has only two possible outcomes, a success or a failure 3. The probability of a success p and the probability of a failure q are constant throughout the experiment 4. Each trial is independent of the other trials in the experiment • A binomial probability distribution allows us to calculate the probability of a specific number of successes for a certain number of trials • Examples of binomial settings (counting the number of successes in a fixed number of trials): • A survey response to a question is "yes, I will buy" or "no, I will not" • An electronic component is either defective or acceptable • New job applicants either accept an offer or reject it
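The card describes what a binomial distribution calculates but not the formula. This sketch uses the standard binomial probability P(x) = C(n, x) pˣ qⁿ⁻ˣ, evaluated at the Proposition A settings used elsewhere in this set (n = 10, p = 0.4).

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(x) = C(n, x) * p^x * q^(n - x), with q = 1 - p."""
    q = 1 - p
    return comb(n, x) * p ** x * q ** (n - x)


# Probability that exactly 4 of 10 voters support Proposition A (p = 0.4)
print(round(binomial_pmf(4, 10, 0.4), 4))  # 0.2508
# The probabilities for x = 0..10 sum to 1, as any discrete distribution must
print(round(sum(binomial_pmf(x, 10, 0.4) for x in range(11)), 10))  # 1.0
```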

frequency distribution

A _____________________ ____________________ shows the number of data observations that fall into specific intervals • Graphically summarizes information not readily observable by merely looking at data in a table • Example: number of iPods sold (quantitative data); the frequency is the count of how many times each value occurs, e.g., 0 sold: 5, 1 sold: 8, 2 sold: 14, and so on
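Counting how often each value occurs is a one-liner with `collections.Counter`. This sketch tallies the number-of-children data from the mode example in this set, since the iPod data itself is not listed.

```python
from collections import Counter

# Number of children per family (24 families, from the mode example)
children = [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5]

freq = Counter(children)  # value -> number of observations with that value
for value in sorted(freq):
    print(value, freq[value])
```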

Using a Poisson Distribution to Calculate the Probability of Arrivals

A common use of the Poisson distribution is to determine the probability of customer arrivals. Example: On average, 12 customers per hour arrive at the bank drive-through window • Assume these arrivals follow the Poisson distribution. What is the probability that exactly 4 customers will arrive during the next 30 minutes? Answer: First adjust the average number of arrivals per hour to a 30-minute interval • If the bank averages 12 customers arriving per hour, it would average 6 customers every 30 minutes, so λ = 6.0 • To find the probability that exactly 4 customers will arrive during the next 30 minutes, use the Poisson formula with x = 4: P(4) = (6^4 e^(-6))/4! ≈ 0.1339 FORMULA SOLUTION IS CHP 5 PG 40
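A sketch of the same calculation with the Poisson formula P(x) = (λˣ e^(-λ))/x!, including the step that rescales the hourly rate to a 30-minute interval.

```python
import math


def poisson_pmf(x, lam):
    """P(x) = (lam^x * e^(-lam)) / x!"""
    return lam ** x * math.exp(-lam) / math.factorial(x)


lam = 12 * (30 / 60)                # 12 arrivals per hour -> 6 per 30 minutes
print(round(poisson_pmf(4, lam), 4))  # 0.1339
```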

posterior probability

A conditional probability is also known as a _____________ ______________, which is a revision of the prior probability using additional information

Specific Discrete Probability Distributions

Binomial, Poisson, Hypergeometric

Covariance Calculations

A positive value implies a positive linear relationship (as one variable increases, the second variable also tends to increase) • A negative covariance indicates a negative linear relationship (as one variable increases, the second variable tends to decrease) • A covariance close to zero indicates no relationship between the two variables
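To see the sign behavior described above, here is a minimal sample-covariance sketch, s_xy = Σ(x - x̄)(y - ȳ)/(n - 1), applied to two hypothetical data sets: one that rises with x and one that falls. Python 3.10+ also provides `statistics.covariance` for the same quantity.

```python
def sample_covariance(x, y):
    """Sample covariance s_xy = sum((x_i - x_bar)(y_i - y_bar)) / (n - 1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)


hours = [1, 2, 3, 4, 5]        # hypothetical hours studied
scores = [60, 65, 70, 80, 85]  # hypothetical scores: rise with hours
errors = [9, 7, 6, 3, 1]       # hypothetical error counts: fall with hours

print(sample_covariance(hours, scores))            # 16.25 (positive relationship)
print(round(sample_covariance(hours, errors), 4))  # -5.0 (negative relationship)
```

Covariance shows direction, but its size depends on the units of x and y, which is why the correlation coefficient is used to judge strength.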

The Empirical Rule

According to the empirical rule, if a distribution follows a bell-shaped, symmetrical curve centered around the mean, we would expect: Approximately 68% of the values to fall within ± 1 standard deviation from the mean • Approximately 95% of the values to fall within ± 2 standard deviations from the mean • Approximately 99.7% of the values to fall within ± 3 standard deviations from the mean

Systematic Sampling

Advantages of systematic sampling: • Easy to do manually • Can avoid bias by not allowing judgment or convenience to affect the sample EX: bias toward selecting some students rather than others Disadvantages: • One concern about systematic sampling is periodicity, which is a pattern in the population that is consistent with the value of k • Example: Sampling every 8 hours might obtain values only from the beginning or end of a shift, which might not be representative of all values during the day

The Range Advantages: Disadvantages:

Advantages: • Easy to calculate and understand Disadvantages: • Only based on two numbers in the data set (ignores the way in which data are distributed) • Sensitive to outliers, for example: 1, 2, 4, 4, 6, 8, 8, 8, 8, 9, 11, 11, 12, 1000 gives Range = 1000 - 1 = 999

sample space

All the possible outcomes, or results, of an experiment • The sample space for our single-die experiment is {1, 2, 3, 4, 5, 6} chp 4 pg 6

simple event

An event with a single outcome in its most basic form that cannot be simplified • An example of a simple event is rolling a five with a single die

Calculating Probabilities for Normal Distributions Using Normal Probability Tables

Any normal distribution (with any mean and standard deviation combination) can be transformed into the standard normal distribution (z) • We need to transform x units into z units: z = (x - μ)/σ • The resulting z value is called a z-score chp 6 pg 13 and 14

The Effect of the Sample Size on the Sampling Distribution

As the sample size increases •the standard error of the mean becomes smaller •which in turn reduces the sampling error CHP 7 PG 44

The Effect of the Sample Size on the Sampling Distribution

As the sample size n increases • The sampling distribution of x̄ becomes narrower • The sampling error decreases • So x̄ gets closer in value to μ • If n = N, the entire population is known, so the mean of that sample is the population mean; this is known as a census

Using Poisson Distributions to Approximate Binomial Distributions

Binomial probabilities can be calculated using the Poisson distribution when the following conditions are present: • When the number of trials, n, is greater than or equal to 20 and • When the probability of a success, p, is less than or equal to 0.05 • This approximation to the binomial is very close when the number of trials is large and the probability of success per trial is low, and can be easier to calculate by hand than the binomial formula FORMULA CHP 5 PG 47

Formula for the Poisson Probability Distribution

CHP 5 PG 36 P(x) = (λ^x e^(-λ))/x!, where x = the number of occurrences of interest over the interval, λ = the mean number of occurrences over the interval, e = 2.71828, P(x) = the probability of exactly x occurrences over the interval

Calculating Probabilities for a Hypergeometric Distribution Formula for the Mean of the Hypergeometric Distribution Formula for the Standard Deviation of the Hypergeometric Distribution

CHP 5 PG 57-62

Calculating Normal Probabilities Using Excel

CHP 6 PG 34-39

Formula for the exponential probability density function:

CHP 6 PG 46 f(x) = λe^(-λx) for x ≥ 0. A discrete random variable that follows the Poisson distribution with a mean equal to λ has a counterpart continuous random variable that follows the exponential distribution with a mean equal to μ = 1/λ

Formula for the standard deviation of the Exponential Distribution:

CHP 6 PG 49 σ = 1/λ

Calculating Exponential Probabilities Using Excel

CHP 6 PG 52-53

Formula for the Continuous Uniform Probability Density Function:

CHP 6 PG 56 f(x) = 1/(b - a) for a ≤ x ≤ b, where: a = smallest allowable continuous random variable, b = largest allowable continuous random variable

Formula for the Uniform Cumulative Distribution Function:

CHP 6 PG 57 P(x1 ≤ x ≤ x2) = (x2 - x1)/(b - a), where x1 = lower endpoint of the interval of interest and x2 = upper endpoint of the interval of interest

continuous data

Can potentially take on any value, depending only on the ability to measure accurately • Often measured; fractional values are possible • Examples: thickness of an item • time required to complete a task • temperature of a solution • height, in inches. In the Whole Foods (continuous) example in the book, there are an infinite number of wait times within the interval 0-5 minutes: 3, 3.2, 4.5 minutes, etc. Because we are measuring time on a continuous scale, the only limit on the number of values within this 0-5 minute interval is our measuring instrument's level of precision

Methods of Assigning Probability

Classical, Empirical, Subjective

Classical Probability Example

Classical probability assumes that each event in the sample space has the same likelihood of occurring (the chance of rolling a one is the same as rolling a two, and so on). The set of events is collectively exhaustive if the sample space includes every possible simple event that can occur (e.g., gender)

Qualitative Data

Classified by descriptive terms to measure or classify something of interest Examples: • Marital Status • Political Party • Eye Color (Defined categories)

Nonprobability Sampling

Convenience

Contingency Tables with Probabilities

Convert table frequencies into probabilities by dividing each number in the table by the total number of observations

Finance and Economics

Data on income, credit risk, unemployment • Bank lending

Example: Using the Poisson Distribution to Approximate the Binomial Distribution

Example: 3% of the workers in a large factory are absent each day. From a random sample of 60 workers, what is the probability that exactly 1 worker is absent? Binomial distribution answer, with n = 60, p = 0.03, and x = 1: P(1) = (60)(0.03)(0.97)^59 ≈ 0.2984. The Poisson distribution approximation can be used since the conditions for the approximation are met: • The number of trials is n = 60, which is greater than or equal to 20 • The probability of a success is p = 0.03, which is less than or equal to 0.05 Poisson distribution approximation, with λ = np = (60)(0.03) = 1.8: P(1) = (1.8^1 e^(-1.8))/1! ≈ 0.2975 CHP 5 PG 48-50
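A sketch comparing the exact binomial probability with its Poisson approximation for the absentee example, so the size of the approximation error is visible.

```python
from math import comb, exp, factorial

n, p, x = 60, 0.03, 1

# Exact binomial: C(n, x) * p^x * q^(n - x)
binomial = comb(n, x) * p ** x * (1 - p) ** (n - x)

# Poisson approximation with lambda = np
lam = n * p
poisson = lam ** x * exp(-lam) / factorial(x)

print(f"binomial = {binomial:.4f}, Poisson = {poisson:.4f}")  # binomial = 0.2984, Poisson = 0.2975
```

The two answers differ by about 0.001, which is why the approximation is considered acceptable when n ≥ 20 and p ≤ 0.05.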

Empirical Probability Example

Example: A survey of 400 new graduates asked how much they owed in student loans. The results are shown in the following table: a) What is the probability that a randomly selected graduate has between $5,000 and $9,999 in student loans? b) What is the probability that a randomly selected graduate has $20,000 or more in student loans? EXAMPLE IS ON CHP 4 PG 15

Using Normal Probability Tables EXAMPLE

Example: Finding the z or x value. Suppose that μ = 12 and σ = 3 for a normal distribution. Find the x value so that P(X ≤ x) = 0.95. 1. Find the necessary z-score: What z value is needed to include 95% of the area under the curve? Look in the body of the table for 0.9500. The value 0.9500 would be found in the 1.6 row and between the 0.04 and 0.05 columns. This means our point of interest is halfway between these columns at 1.6 + 0.045, or z = 1.645 2. Convert the z-score to x: x = μ + zσ = 12 + (1.645)(3) ≈ 16.94 CHP 6 PG 22-25
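The table lookup can be cross-checked with the inverse CDF in Python's `statistics.NormalDist`, which returns the z (or x) value for a given cumulative area.

```python
from statistics import NormalDist

mu, sigma = 12, 3
z = NormalDist().inv_cdf(0.95)   # step 1: z with 95% of the area to its left
x = mu + z * sigma               # step 2: convert back to x units, x = mu + z * sigma
x_direct = NormalDist(mu, sigma).inv_cdf(0.95)  # same x without the z detour
print(round(z, 3), round(x, 3))  # 1.645 16.935
```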

Uniform Probability Distributions example

Example: Suppose the temperature of a solution varies with a uniform distribution between 55 and 155 degrees. What is the probability that the next measured temperature is between 70 and 90 degrees? The total area under the distribution must be 1.0, so if the width is 100 (155 degrees - 55 degrees), the height must be 0.01: P(70 ≤ x ≤ 90) = (90 - 70)(0.01) = 0.20
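A sketch of the uniform calculation: probability equals the interval width times the constant height 1/(b - a). The range check is an added safeguard, not part of the card.

```python
def uniform_probability(x1, x2, a, b):
    """P(x1 <= x <= x2) for a continuous uniform distribution on [a, b]: (x2 - x1) / (b - a)."""
    if not a <= x1 <= x2 <= b:
        raise ValueError("need a <= x1 <= x2 <= b")
    return (x2 - x1) / (b - a)


height = 1 / (155 - 55)                      # 0.01, so total area = 100 * 0.01 = 1
print(uniform_probability(70, 90, 55, 155))  # 0.2
```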

Calculating Exponential Probabilities Exponential Probability Distributions EXAMPLE

Example: The mean time between arrivals is 2 minutes What is the probability that the next arrival is within the next 3 minutes? Time between arrivals is exponentially distributed with mean time between arrivals of 2 minutes (30 per 60 minutes, on average)
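With a mean time between arrivals of 2 minutes, λ = 1/2 per minute. This sketch uses the exponential cumulative probability P(x ≤ a) = 1 - e^(-λa), the standard result for this distribution.

```python
import math


def exponential_cdf(a, mean):
    """P(x <= a) = 1 - e^(-lambda * a), with lambda = 1 / mean."""
    lam = 1 / mean
    return 1 - math.exp(-lam * a)


# Probability that the next arrival occurs within 3 minutes
print(round(exponential_cdf(3, 2), 4))  # 0.7769
```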

Cluster Sampling example of cluster

Examples of clusters: •Individually boxed packages of bulk parts in a large delivery from a supplier •Individual cities where a new product is introduced • Customer account balances arranged in clusters by first letter of last name

Stratified Sampling

Examples of strata: • For an undergraduate population, strata could be class standing: Freshman, Sophomore, Junior, and Senior • For factory production, strata could be 1st shift, 2nd shift, and 3rd shift • For a population of workers, strata might be different age categories of workers Using stratified sampling helps ensure that all classes, shifts, or ages are represented in the sample

Simple Random Sample EXCEL SHEET

Excel can be used to select a simple random sample: 1.Select Data > Data Analysis 2.In the Data Analysis dialog box, select Sampling and click OK 3.In the Sampling dialog box, click on the text box for Input Range: and select the desired range of cells 4.Select Random under Sampling Method 5.In the Number of Samples: text box, type the number desired for your sample size 6.Click on the text box for Output Range: select a cell from an empty area in your spreadsheet, and click OK Excel's random sampling tool uses sampling with replacement • This means that after a value from the population has been selected for the sample, the value is placed back into the population and can be chosen again for the same sample Sampling without replacement means that once a value from the population is selected for the sample, it is not returned to the population so that value cannot be chosen again
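The with/without replacement distinction maps directly onto two functions in Python's `random` module: `random.choices` draws with replacement and `random.sample` draws without. The population of numbered items below is hypothetical.

```python
import random

random.seed(42)  # fixed seed for a reproducible illustration
population = list(range(1, 101))  # hypothetical population of 100 numbered items

with_replacement = random.choices(population, k=10)    # a value can be picked more than once
without_replacement = random.sample(population, k=10)  # each value is picked at most once

print(with_replacement)
print(without_replacement)
```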

Systematic Sampling EXCEL

Excel can help with systematic sampling 1.Select Data > Data Analysis 2.Select Periodic in the Sampling dialog box 3.In the Period: text box, type in the value for k (which you must have already determined) 4.Click on the text box for Output Range:, and select a cell in an empty column, and click OK

Classical Probability Example

Experiment: Roll a die once Sample space = {1, 2, 3, 4, 5, 6} Define Event A as rolling a five • There are six possible outcomes in the sample space • Event A (rolling a five) can happen one way P(A) = 1/6 = 0.167, or a 16.7% probability • This is a Simple Probability: it represents the likelihood of a single (simple) event occurring by itself

Marketing Research

Focus group data, customer surveys • hotels

The Addition Rule

For mutually exclusive events, the addition rule states that the probability that either of two events occurs is simply the sum of their individual probabilities: P(A or B) = P(A) + P(B) • If Events A and B are not mutually exclusive: P(A or B) = P(A) + P(B) - P(A and B) chp 4 pg 33
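A quick way to check the general addition rule is to enumerate a small sample space. This sketch uses a single die and two hypothetical, overlapping events; exact fractions avoid rounding.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}  # hypothetical event: roll an even number
B = {4, 5, 6}  # hypothetical event: roll a 4 or higher


def prob(event):
    """Classical probability: favorable outcomes / total outcomes."""
    return Fraction(len(event), len(sample_space))


by_rule = prob(A) + prob(B) - prob(A & B)  # P(A) + P(B) - P(A and B)
direct = prob(A | B)                       # count the union directly
print(by_rule, direct)  # 2/3 2/3
```

Subtracting P(A and B) removes the outcomes {4, 6} that would otherwise be counted twice.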

Independent and Dependent Events

Formula for Determining if Events A and B are independent P(A|B) =P(A) If P(A|B) ≠ P(A) then events A and B are not independent chp 4 pg

The Sampling Distribution of the Proportion with a Finite Population

If the ratio of n/N is greater than 5% and sampling is without replacement a finite population correction is needed When the population is small the proportion of the sample size to the population size, n/N, is large Small populations require an adjustment to the standard error of the mean calculation if the proportion n/N is greater than 5% and sampling is without replacement
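The card states when the correction is needed but not the factor itself. This sketch assumes the standard finite population correction, √((N - n)/(N - 1)), applied to the standard error of a sample proportion, √(p(1 - p)/n). The numbers are hypothetical.

```python
import math


def proportion_standard_error(p, n, N=None):
    """Standard error of the sample proportion, sqrt(p(1 - p)/n).

    If a finite population size N is given and n/N > 0.05 (sampling without replacement),
    multiply by the finite population correction sqrt((N - n)/(N - 1)).
    """
    se = math.sqrt(p * (1 - p) / n)
    if N is not None and n / N > 0.05:
        se *= math.sqrt((N - n) / (N - 1))
    return se


print(round(proportion_standard_error(0.4, 50), 4))         # 0.0693
print(round(proportion_standard_error(0.4, 50, N=500), 4))  # 0.0658 (n/N = 10%)
```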

Inferential Statistics

Making statements about a population by examining sample results

Example with mean, median, and mode: Prices for 5 homes have been collected. House Prices: $2,000,000, $500,000, $300,000, $100,000, $100,000 Sum: $3,000,000

Mean: ($3,000,000/5) = $600,000 Median: middle value of ranked data = $300,000 Mode: appears most often = $100,000
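The three measures can be reproduced with Python's `statistics` module. The calls below should return $600,000, $300,000, and $100,000, showing how the single $2,000,000 home pulls the mean well above the median.

```python
import statistics

prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]  # house prices from the card

mean = statistics.mean(prices)      # sum / count
median = statistics.median(prices)  # middle value of the ranked data
mode = statistics.mode(prices)      # most frequent value
print(mean, median, mode)
```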

Measures of Association Between Two Variables

Measures of Association Between Two Variables=Sample Covariance AND Sample Correlation Coefficient

Specific Continuous Probability Distributions

Normal Exponential Uniform

Direct Observation or Focus Group

Observing subjects in their natural environment Example: Watching to see if drivers stop at a stop sign

One method to determine the number of classes in a frequency distribution is the rule

2^k ≥ n, where k = the number of classes and n = the number of data points • Find the lowest value of k that satisfies the rule • Suppose n = 50: 2^5 = 32 < 50 (k = 5 is too small); 2^6 = 64 > 50 (k = 6 is a good choice)
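The 2^k ≥ n rule amounts to finding the smallest k that satisfies the inequality, which a short loop does directly.

```python
def num_classes(n):
    """Smallest k such that 2**k >= n (the 2^k rule for frequency distributions)."""
    k = 1
    while 2 ** k < n:
        k += 1
    return k


print(num_classes(50))  # 6, since 2**5 = 32 < 50 and 2**6 = 64 >= 50
```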

event

One or more outcomes of an experiment • The outcome, or outcomes, is a subset of the sample space • An example of an event is rolling a pair with two dice

the Poisson distribution table

Organized by values of λ, the average number of occurrences • The sum of the probabilities in a column for a particular value of λ is equal to 1 •One limitation of using Poisson tables is that you are restricted to using only the values of λ that are shown in the table CHP 5 PG 41-45

Sampling and Nonsampling Errors

Parameters are values that describe some characteristic of a population, such as its mean or median • Statistics are values calculated from a sample, such as the sample's mean or median

Primary Data advantages and disadvantages

Advantages: • Collected by the person or organization who uses the data Disadvantages: • Can be expensive and time consuming to gather

Basic Properties of a Probability

Probability Rule 1 • If P(A) = 1, then with certainty, Event A must occur • Ex: rolling a single six-sided die and observing a 1, 2, 3, 4, 5, or 6 Probability Rule 2 • If P(A) = 0, then with certainty, Event A will not occur Probability Rule 3 • The probability of any event must range from 0 to 1 • Probabilities can never be negative or greater than 1. The probability that I will buy a pair of shoes next month could be anywhere from 0 (0%) to 1 (100%), but not -1 (-100%) or 2 (200%). Probability Rule 4 • The sum of all the probabilities for the simple events in the sample space must be equal to 1 • Refer to pg. 153, Table 4.5 Probability Rule 5 • The complement of Event A is defined as all of the outcomes in the sample space that are not part of Event A; the complement is denoted as A' • The probability of the complement of an event occurring is 100% minus the probability of the event itself occurring • Page 154 cookie example: P(A) + P(A') = 1, or P(A) = 1 - P(A')

The Two Main Types of Data

Qualitative Data Quantitative Data

Two Main Types of Data and their Corresponding Levels

Qualitative-Nominal and Ordinal Quantitative-Interval and Ratio

Operations

Quality control, reliability, operate better • Cheez-it

Expressing z-Scores in Terms of x chp 3 pg 49

Question: For a symmetric bell-shaped population with a mean of 20 and a standard deviation of 3, what interval will contain about 95% of all the values? Answer: About 95% of the values are within ± 2 standard deviations of the mean: x = μ ± zσ = 20 ± (2)(3), so about 95% of the values will fall between 14 and 26

Data

Raw facts or measurements of interest Table 1.1 | Golf-Score Data (Each individual value is considered a data point)

Sampling from a Population

Sampling from a Population=Probability Sampling AND Nonprobability Sampling

Secondary Data advantages and disadvantages

Advantages: • Readily available • Less expensive to collect Disadvantages: • No control over how the data was collected • Less reliable unless collected and recorded accurately

Systematic Sampling EXAMPLE

Select a systematic sample of size n = 30 from a population of N = 270 • k = N/n = 270/30 = 9, so from a list of all population values, choose every 9th value for the sample
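A sketch of systematic sampling with k = N // n. Starting at a random position within the first k values is a common convention that this set does not spell out, so treat it as an assumption. The population is simply the numbers 1 to 270.

```python
import random


def systematic_sample(population, n, rng=random):
    """Take every kth value (k = N // n), starting at a random position among the first k."""
    k = len(population) // n
    start = rng.randrange(k)  # assumed random start in positions 0 .. k - 1
    return population[start::k][:n]


random.seed(7)
population = list(range(1, 271))  # hypothetical list of N = 270 population values
sample = systematic_sample(population, 30)
print(len(sample), sample[:3])
```

As the periodicity warning notes, a fixed step of k can line up with a pattern in the list, so the list order matters.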

Probability Sampling

Simple Random Systematic Stratified Cluster Resampling

Putting the Central Limit Theorem to Work

Suppose people drive an average of 12,000 miles per year with a standard deviation of 2,580 miles per year • What is the probability that a randomly selected driver will drive more than 12,500 miles? • What is the probability that a randomly selected sample of 36 drivers will drive, on average, more than 12,500 miles? • Since n = 36, we can apply the Central Limit Theorem, so x̄ will be normally distributed with mean μ_x̄ = μ = 12,000 and standard deviation σ_x̄ = σ/√n = 2,580/√36 = 430
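A sketch of both questions. The single-driver answer only holds if the population of mileages is itself normal, which the card does not state; the sample-mean answer relies on the Central Limit Theorem with σ_x̄ = σ/√n.

```python
import math
from statistics import NormalDist

mu, sigma = 12_000, 2_580  # miles per year, from the card
n = 36
threshold = 12_500

# One driver (assumes the population itself is normal)
z_one = (threshold - mu) / sigma
p_one = 1 - NormalDist().cdf(z_one)

# Mean of 36 drivers: by the CLT, x_bar ~ Normal(mu, sigma / sqrt(n))
se = sigma / math.sqrt(n)  # standard error = 2,580 / 6 = 430
z_mean = (threshold - mu) / se
p_mean = 1 - NormalDist().cdf(z_mean)

print(f"one driver: z = {z_one:.2f}, P = {p_one:.4f}")
print(f"mean of 36: z = {z_mean:.2f}, P = {p_mean:.4f}")
```

The sample mean is far less likely to exceed 12,500 miles (about 12%) than a single driver (about 42%), because averaging shrinks the spread.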

Using the Central Limit Theorem to Test Claims

The CLT can be used to check the validity of claims made about a population parameter • Idea: Use a sampling distribution to see how unusual a sample result is, if the claim is true • If the sample result is very unusual, we conclude that the claim is not valid

The Multiplication Rule Formula for the multiplication rule for two independent events:

The Multiplication Rule Formula for the multiplication rule for two independent events: P(A and B) = P(A)P(B) • When multiple events are all independent, the probability of them all occurring is simply the product of their individual probabilities: P(A and B and C) = P(A)P(B)P(C)

The Sources of Data

Primary Data, Secondary Data

The Variance and Standard Deviation of a Discrete Probability Distribution

The variance is a measure of the spread of the individual values around the mean of a data set: σ² = Σ (xi - μ)² P(xi), summed over i = 1 to n, where σ² = the variance of the discrete probability distribution, xi = the value of the random variable for the ith outcome, μ = the mean of the discrete probability distribution, P(xi) = the probability that the ith outcome will occur, n = the number of outcomes in the distribution. The standard deviation is σ = √σ². chp 5 pg 12 and 13
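A sketch that first checks the conditions for a valid discrete probability distribution, then computes μ = Σ xi P(xi), the variance, and the standard deviation. The rings-before-answer distribution is hypothetical, loosely modeled on the call-center card.

```python
import math


def discrete_mean_variance(dist):
    """Mean, variance, and standard deviation of a discrete probability distribution.

    `dist` maps each value x_i to its probability P(x_i).
    mu = sum(x_i * P(x_i)); sigma^2 = sum((x_i - mu)^2 * P(x_i)).
    """
    probs = dist.values()
    # Each P(x) must lie in [0, 1] and the probabilities must sum to 1
    if any(not 0 <= p <= 1 for p in probs) or not math.isclose(sum(probs), 1.0):
        raise ValueError("probabilities must lie in [0, 1] and sum to 1")
    mu = sum(x * p for x, p in dist.items())
    variance = sum((x - mu) ** 2 * p for x, p in dist.items())
    return mu, variance, math.sqrt(variance)


# Hypothetical distribution of the number of rings before a call is answered
rings = {1: 0.2, 2: 0.3, 3: 0.3, 4: 0.2}
mu, var, sd = discrete_mean_variance(rings)
print(round(mu, 4), round(var, 4), round(sd, 4))  # 2.5 1.05 1.0247
```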

dependent variable/independent variable

The ______________ ______________, which is placed on the vertical axis of the scatter plot, is influenced by changes in the ______________ ________________, which is placed on the horizontal axis

Characteristics of the Normal Probability Distribution

The distribution is bell-shaped and symmetrical around the mean • Because the shape of the distribution is symmetrical, the mean and median are the same value • Values near the mean, where the curve is the tallest, have a higher likelihood of occurring than values far from the mean, where the curve is shorter • The total area under the curve is always equal to 1.0 • Because the distribution is symmetrical around the mean, the area to the left of the mean equals 0.5, as does the area to the right of the mean • The left and right ends of the normal probability distribution extend indefinitely • A distribution's mean (μ) and standard deviation (σ) completely describe its shape • Changing μ shifts the distribution left or right • Changing σ increases or decreases the spread CHP 6 PG 10 11

Measures of Association Between Two Variables

The goal of this section is to examine two descriptive statistics that measure the linear relationship between two variables

Example with sample of size n = 7: 21 27 27 28 34 45 50

The index number is i = 0.5(n) = 0.5(7) = 3.5 • The index number is not a whole number, so round up to i = 4 • The median value is therefore in the fourth position of our sorted data, which is the number 28 • Note: the median is not sensitive to outliers; for 21 27 27 28 34 45 5000, the median is still 28

Bias

The manner in which survey questions are asked can affect responses

The mean and standard deviation are useful when comparing two different distributions Example: Number of rings before a call is answered • Atlanta vs. Boston call centers chp 5 pg 14

STRATA VS CLUSTERS

The members of a specific stratum all have something in common, such as being a freshman. As a result, strata tend to be homogeneous collections, each with a certain characteristic of interest. Clusters, on the other hand, are "mini-subsets" of the larger population and therefore tend to be a melting pot of various characteristics. For example, a particular classroom (cluster) could have a mixture of freshmen, sophomores, juniors, and seniors.

Using the Normal Distribution to Approximate the Binomial Distribution

The normal distribution can be used as an approximation to the binomial distribution •Normal probabilities are easy to look up in Appendix A, Tables 3 and 4 •Binomial probabilities are more difficult to calculate The normal distribution approximation can be used when the sample size is large enough so that np ≥ 5 and nq ≥ 5
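
A minimal sketch of the approximation, comparing it with the exact binomial probability. It uses a normal distribution with mean np and standard deviation √(npq). The +0.5 continuity correction is a common convention assumed here, not something stated on this card. The n = 50, p = 0.4 example is hypothetical.

```python
import math

def normal_cdf(z):
    """Cumulative area to the left of z under the standard normal curve."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def normal_approx_at_most(k, n, p):
    """Approximate binomial P(x <= k) with a normal distribution (mean np, sd sqrt(npq)).
    The +0.5 continuity correction is an assumed convention."""
    q = 1 - p
    if n * p < 5 or n * q < 5:
        raise ValueError("normal approximation requires np >= 5 and nq >= 5")
    mu = n * p
    sigma = math.sqrt(n * p * q)
    return normal_cdf((k + 0.5 - mu) / sigma)

def binomial_at_most(k, n, p):
    """Exact binomial P(x <= k), for comparison."""
    return sum(math.comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k + 1))

# Hypothetical example: n = 50, p = 0.4 (np = 20, nq = 30), find P(x <= 18)
print(round(normal_approx_at_most(18, 50, 0.4), 4), round(binomial_at_most(18, 50, 0.4), 4))
```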

joint probability

The probability of the intersection of two events is known as a __________ __________.

experiment

The process of measuring or observing an activity for the purpose of collecting data • An example is rolling a single six-sided die

The shape of the exponential distribution depends on the value λ

The shape of the exponential distribution depends on the value λ. Compared to normal distributions: 1. The exponential distribution is right-skewed, not symmetrical 2. The shape is completely described by only one parameter, λ 3. The values for an exponential random variable cannot be negative. CHP 6 PG 47

The Effect of the Sample Size on the Sampling Distribution

The shape of the population distribution will affect the shape of the sampling distribution, as will the size of the sample. As the sample size gets large enough, the sampling distribution becomes almost normal regardless of the shape of the population. CHP 7 PG 46

Use a standard normal probability table (Table 3 or Table 4 in Appendix A) to calculate normal probabilities

The table provides the cumulative area under a standard normal distribution curve that lies to the left of the z-score CHP 6 PG 21-23

Z-SCORE EXAMPLE FOR NORMAL DISTRIBUTION

The time customers spend on the phone for service follows the normal distribution with a mean of 12 minutes and a standard deviation of 3 minutes. What is the probability that the next customer who calls will spend 14 minutes or less on the phone? Known: μ = 12 and σ = 3. Find the z-score for x = 14: $z = \frac{x - \mu}{\sigma} = \frac{14 - 12}{3} = 0.67$. This says that x = 14 is 0.67 standard deviations (0.67 increments of 3 units) above the mean of 12. The standard normal table gives a cumulative area of 0.7486 to the left of z = 0.67, so P(x ≤ 14) ≈ 0.7486. chp 6 PG 17-19
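
A Python sketch of the same steps: compute the z-score, then find the area to the left of z. Here `math.erf` stands in for the printed standard normal table. This is a swapped-in calculation, not the book's lookup procedure.

```python
import math

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

def standard_normal_cdf(z):
    """Area to the left of z under the standard normal curve (what the table lists)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma = 12, 3                        # call length: mean 12 min, SD 3 min
z = round(z_score(14, mu, sigma), 2)     # tables use z rounded to two decimals
print(z)                                 # 0.67
print(round(standard_normal_cdf(z), 4))  # 0.7486 = P(x <= 14)
```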

The Z-score

The z-score identifies the number of standard deviations a particular value is from the mean of its distribution • A z-score has no units The z-score is - zero for values equal to the mean - positive for values above the mean - negative for values below the mean pg 45 chp 3

Continuous data examples

Time required to read chapter 2 • Thickness of paint applied to a car body • Voltage of batteries produced in August

Experiment

Treatments are applied in controlled conditions Example: Crop growth from different plots using different fertilizers

Pie Charts

are another excellent tool for comparing proportions for categorical data Each segment of the pie represents the relative frequency of one category

Independent and Dependent Events

Two events are considered independent of one another if the occurrence of one event has no impact on the occurrence of the other event If the occurrence of one event affects the occurrence of another event, the events are considered dependent

Formula for the Population Mean

$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$, where μ = the population mean (the Greek letter "mu"), $x_i$ = the ith data value, and N = the number of data values in the population

Outliers

Upper Limit = $Q_3 + 1.5(\text{IQR})$, Lower Limit = $Q_1 - 1.5(\text{IQR})$, where IQR = $Q_3 - Q_1$ (the interquartile range). Values beyond these limits are considered outliers.
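
A minimal sketch of the outlier limits, applied to the median example's data set with 5000. The quartile rule used here is an assumption: it extends the median's index-point method to i = (p/100)·n, rounding up when i is not whole and averaging positions i and i + 1 when it is.

```python
import math

def percentile_value(sorted_values, p):
    """Value at the pth percentile using the index i = (p / 100) * n (assumed rule)."""
    n = len(sorted_values)
    i = (p / 100) * n
    if i.is_integer():
        i = int(i)
        return (sorted_values[i - 1] + sorted_values[i]) / 2
    return sorted_values[math.ceil(i) - 1]

def outlier_limits(data):
    """Return (lower limit, upper limit) = (Q1 - 1.5 IQR, Q3 + 1.5 IQR)."""
    values = sorted(data)
    q1 = percentile_value(values, 25)
    q3 = percentile_value(values, 75)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

data = [21, 27, 27, 28, 34, 45, 5000]
lower, upper = outlier_limits(data)
outliers = [x for x in data if x < lower or x > upper]
print(lower, upper, outliers)   # 0.0 72.0 [5000]
```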

Pie Chart

Use a __________ __________ to compare the relative sizes of all possible categories

Subjective Probability

Used when classical and empirical probabilities are not available •Instead use experience or intuition to estimate the probabilities • Example: The probability that inflation will be greater than 4% next year

The Variance and Standard Deviation for a Population

Used when the data set represents an entire population rather than a sample from a population: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$ and $\sigma = \sqrt{\sigma^2}$, where μ = the population mean, N = the population size, and $x_i - \mu$ = the difference between each data value and the population mean. chp 3 powerpoint pg 34
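
A Python sketch of the population formulas, which divide by N. For illustration, it treats the eight values from the standard deviation card as if they were an entire population. That framing is an assumption.

```python
import math
import statistics

def population_variance(data):
    """sigma^2 = sum((x_i - mu)^2) / N, for data covering the entire population."""
    N = len(data)
    mu = sum(data) / N
    return sum((x - mu) ** 2 for x in data) / N

# Assumption for illustration: these eight values are the whole population
population = [4, 6, 8, 9, 11, 12, 12, 18]
var = population_variance(population)
print(var, round(math.sqrt(var), 3))                      # 16.25 4.031
print(math.isclose(var, statistics.pvariance(population)))  # True
```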

Discrete

Values are whole numbers (integers) • Usually counted, not measured • number of complaints per day • number of TVs in a household • number of rings before the phone is answered. In the Marriott Hotel (discrete) example in the book, there are only five possible outcomes within the interval 1-5 for a customer to choose from when rating his or her satisfaction.

Calculating Probabilities for a Hypergeometric Distribution

When sampling is without replacement, the probability of success changes during the sampling process • This violates the requirements for a binomial probability distribution • Use the hypergeometric distribution instead. Formula for the hypergeometric distribution (pg 54 chp 5): $P(x) = \frac{{}_{R}C_{x} \cdot {}_{N-R}C_{n-x}}{{}_{N}C_{n}}$, where: N = the population size, R = the number of successes in the population, n = the sample size, x = the number of successes in the sample. Example: 5 of 50 accounts are delinquent. If an auditor randomly selects 10 accounts without replacement, what is the probability that at least one is found to be delinquent? • Need to find P(x ≥ 1) = 1 - P(x = 0) • Use: N = 50 (the population size), R = 5 (the number of successes in the population), n = 10 (the sample size), x = 0 (the number of successes in the sample)
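
A minimal Python sketch of the delinquent-accounts example, using `math.comb` for the combination terms.

```python
from math import comb

def hypergeom_pmf(x, N, R, n):
    """P(x successes in a sample of n drawn without replacement
    from a population of N that contains R successes)."""
    return comb(R, x) * comb(N - R, n - x) / comb(N, n)

# 5 of 50 accounts are delinquent; the auditor selects 10 without replacement
p_none = hypergeom_pmf(0, N=50, R=5, n=10)
p_at_least_one = 1 - p_none
print(round(p_none, 4), round(p_at_least_one, 4))   # 0.3106 0.6894
```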

The Standard Normal Distribution

When the original random variable, x, follows the normal distribution, z-scores also follow a normal distribution with μ = 0 and σ = 1 This is known as the standard normal distribution

Empirical Probability

With classical probability: "There are 4 aces in a deck of 52 cards, so the probability of drawing an ace is 4/52." Empirical probability, by contrast, involves conducting an experiment to observe the frequency with which an event occurs. It requires that you count how often an event occurs in an experiment and calculate the probability from the experiment's relative frequency distribution: P(A) = Frequency with which Event A occurs / Total number of observations. chp 4 pg 14 and 15

Uniform Probability Distributions

With the continuous uniform probability distribution, the probability of any interval in the distribution is equal to that of any other interval with the same width. https://www.youtube.com/watch?v=m1vXj-6Asik
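
A minimal sketch of the uniform rule. It assumes the standard formula P(x₁ ≤ x ≤ x₂) = (x₂ − x₁)/(b − a) for a distribution on [a, b]. The 0 to 20 minute wait time is hypothetical.

```python
def uniform_probability(x1, x2, a, b):
    """P(x1 <= x <= x2) for a continuous uniform distribution on [a, b]:
    the interval's width divided by the total width."""
    lo, hi = max(x1, a), min(x2, b)   # clip the interval to [a, b]
    return max(hi - lo, 0) / (b - a)

# Hypothetical: a wait time uniformly distributed between 0 and 20 minutes
print(uniform_probability(5, 10, 0, 20))    # 0.25
print(uniform_probability(12, 17, 0, 20))   # 0.25, same width, same probability
```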

Parameter

a characteristic that describes a population

Statistic

a characteristic that describes a sample

The uniform distribution

describes data where all the values have the same chance of occurring

Advantages and Disadvantages of Using the Mean to Summarize Data

Advantages: • Simple to calculate • Summarizes the data with a single value. Disadvantages: • With only a summary value, you lose information about the original data • Sample 1 with n = 3: 999, 1000, 1001 has a mean of 1000 • Sample 2 with n = 3: 0, 1000, 2000 also has a mean of 1000 • Just knowing the mean does not tell you what the underlying data look like
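
The two samples above can be checked in Python. They have the same mean, but their standard deviations show how different the underlying data are.

```python
import statistics

sample_1 = [999, 1000, 1001]
sample_2 = [0, 1000, 2000]

for name, s in [("Sample 1", sample_1), ("Sample 2", sample_2)]:
    # Same mean (1000), very different spread
    print(name, statistics.mean(s), statistics.stdev(s))
```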

weighted mean

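allows you to assign more weight to certain values and less weight to others • Formula for the weighted mean: $\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i}$, where $w_i$ = the weight assigned to the ith value and $x_i$ = the ith data value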
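
A minimal sketch of a weighted mean. The GPA example is hypothetical: grade points weighted by credit hours.

```python
def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Hypothetical GPA: grade points weighted by credit hours
grade_points = [4.0, 3.0, 2.0]
credit_hours = [3, 4, 1]
print(weighted_mean(grade_points, credit_hours))  # 3.25
```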

Pareto charts

also plot the cumulative relative frequency as a line on the chart known as an ogive
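
A sketch of the table behind a Pareto chart. It sorts categories in decreasing order and computes the cumulative relative frequency plotted as the line. The complaint counts are hypothetical.

```python
def pareto_table(counts):
    """Sort categories by frequency (largest first) and add the cumulative
    relative frequency for each row."""
    total = sum(counts.values())
    rows, cumulative = [], 0.0
    for category, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        cumulative += count / total
        rows.append((category, count, round(cumulative, 2)))
    return rows

# Hypothetical quality-control complaint counts
complaints = {"Late delivery": 12, "Damaged item": 30, "Wrong item": 8, "Billing error": 50}
rows = pareto_table(complaints)
for row in rows:
    print(row)
```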

Continuous random variables

are outcomes that take on any numerical value in an interval, as determined by conducting an experiment •Usually measured rather than counted • Examples of continuous data include time, distance, and weight The purpose of this chapter is to identify the probability that a specified range of values will occur for continuous random variables, using continuous probability distributions

Combinations

are the number of different ways in which objects can be selected without regard to order. Formula for the combinations of n objects selected x at a time: ${}_nC_x = \frac{n!}{x!(n-x)!}$ chp 4 pg 52 and 53
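
The combinations formula in Python, checked against the standard library's `math.comb`. The 5-choose-3 case is just an illustration.

```python
import math

def combinations(n, x):
    """nCx = n! / (x! (n - x)!): ways to select x of n objects, order ignored."""
    return math.factorial(n) // (math.factorial(x) * math.factorial(n - x))

print(combinations(5, 3))                     # 10
print(combinations(5, 3) == math.comb(5, 3))  # True
```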

Permutations

are the number of different ways in which objects can be arranged in order; for example, the objects 1, 2, and 3 can be arranged as 123, 132, 213, 231, 312, 321. The number of permutations of n distinct objects is n!, where n! = n(n - 1)(n - 2)...(2)(1). By definition, 0! = 1. chp 4 pg 51
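
A Python check of the 1, 2, 3 example. It lists every ordering with `itertools.permutations` and compares the count with 3!.

```python
import itertools
import math

# All orderings of the three objects 1, 2, 3
orders = ["".join(p) for p in itertools.permutations("123")]
print(orders)             # ['123', '132', '213', '231', '312', '321']
print(math.factorial(3))  # 6 = 3! permutations
print(math.factorial(0))  # 1, by definition
```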

Contingency Tables with Probabilities: Decision trees

are used to display marginal and joint probabilities from a contingency table

Discrete data

are values based on observations that can be counted and are typically represented by whole numbers • represent something that has been counted • take on whole numbers such as 0, 1, 2, 3 • Because discrete data can be counted, they have a finite number of values within an interval.

Displaying Qualitative data

are values that are categorical • Can be nominal or ordinal measurement level •Describe a characteristic, such as gender or level of education

Continuous data

are values that can take on any real number, including numbers that contain decimal points • usually measured rather than counted • Examples are weight, time, and distance • Continuous data have an infinite number of possible values within an interval

standard deviation vs variance

Both are derived from the mean of a given data set, and both are statistical measures of the dispersion of data. They represent how much variation there is from the average, or to what extent the values typically "deviate" from the mean. Example: a data set includes the heights of six dandelions. The variance is 7.25 and the SD is 2.69 (√7.25 ≈ 2.69). This means that any dandelion within 2.69 inches of the mean (5.5 inches) is normal.

contingency table

can be used to show the number of occurrences of events that are classified according to two categorical variables

Continuous probability distributions

can have a variety of shapes CHP 6 PG 7

Bias

can occur when a question is stated in a way that encourages or leads a respondent to a particular answer

Continuous random variables

can take on any value within a specified interval Because there are an infinite number of possible values, the probability of one specific value occurring is theoretically equal to zero Probabilities are based on intervals, not individual values • Probability is represented by an area under the probability distribution

The formula for the Sample Mean from Grouped Data

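$\bar{x} \approx \frac{\sum f_i M_i}{n}$, where $M_i$ = the midpoint of class i, $f_i$ = the frequency of class i, and n = the total number of observations. chp 3 pg 52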

Find the midpoint for each class

chp 3 pg 53-55
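
A minimal sketch of the grouped-data mean: find each class midpoint, weight it by the class frequency, and divide by n. The frequency table is hypothetical, and the midpoint is taken as the average of the two class limits.

```python
def grouped_mean(classes):
    """Approximate sample mean from grouped data: sum(f * M) / n,
    where M is each class midpoint and f its frequency."""
    n = sum(f for _, _, f in classes)
    total = sum(((lower + upper) / 2) * f for lower, upper, f in classes)
    return total / n

# Hypothetical frequency distribution: (lower limit, upper limit, frequency)
table = [(0, 10, 4), (10, 20, 10), (20, 30, 6)]
print(grouped_mean(table))  # 16.0
```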

Classical Probability

chp 4 pg 8 Used when the number of possible outcomes of the event of interest is known • Requires that you know the number of outcomes that pertain to a particular event. You also need to know the total number of possible outcomes in the sample space • Formula for classical probability P(A) = Number of possible outcomes that constitute Event A/ Total number of possible outcomes in the sample space where: P(A) = The probability that Event A will occur

The Mean of a Discrete Probability Distribution

chp 5 pg 10, 11 The mean, μ, of a discrete probability distribution is the weighted average of the outcomes of the random variable, with each outcome weighted by its probability. It is also known as the expected value, E(x): $\mu = E(x) = \sum_{i=1}^{n} x_i P(x_i)$, where μ = the mean of the discrete probability distribution, $x_i$ = the value of the random variable for the ith outcome, $P(x_i)$ = the probability that the ith outcome will occur, and n = the number of outcomes in the distribution

Formula for the Variance of a Poisson Distribution

chp 5 pg 37 The variance of a Poisson distribution is the same as its mean: $\sigma^2 = \lambda$. EXAMPLE: If a bank receives an average of λ = 4 bad checks per week, what is the probability that it will receive 3 bad checks next week? Solution: with λ = 4 and x = 3, $P(x) = \frac{\lambda^x e^{-\lambda}}{x!} = \frac{4^3 e^{-4}}{3!} \approx 0.1954$. There is about a 19.5% chance that the bank will receive 3 bad checks next week. The formula solution for this example is in CHP 5 PG 38
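
The bad-check example in Python, using the Poisson probability formula.

```python
import math

def poisson_pmf(x, lam):
    """P(x) = lam^x * e^(-lam) / x!"""
    return lam ** x * math.exp(-lam) / math.factorial(x)

lam = 4   # average of 4 bad checks per week
print(round(poisson_pmf(3, lam), 4))   # 0.1954
print("mean =", lam, "variance =", lam, "std dev =", math.sqrt(lam))
```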

The goal is to create a histogram to ___________ and __________ show the pattern in the data

clearly and usefully

Measures of relative position

compare the position of one value in relation to other values in the data set Measures of Relative Position= Percentiles and Quartiles

Secondary data

data collected by someone else

Nominal Data:

data described as a category or labels Examples: gender (female/male) marital status (married, single, divorced, widowed) NO RANKING ALLOWED EX: ZIPCODES

Information

data that are transformed into useful facts that can be used for a specific purpose, such as making a decision

Primary Data

data that you have collected for your own use

sampling distribution of the proportion

describes the pattern that sample proportions tend to follow when randomly drawn from a population. Suppose that CBS, the network that broadcast the 2013 Super Bowl, estimated that 45% of U.S. households would tune in to the game when it established the cost of a 30-second commercial before the big event. Also, suppose that Coca-Cola, one of the game's advertisers, wanted to verify this claim independently. After the game, Coca-Cola randomly selected 200 households and found that 84 of them watched the Super Bowl. Based on this sample, can Coca-Cola validate the claim made by CBS? CHP 7 PG 47-48 Sample proportion: $\bar{p} = \frac{x}{n}$, where x = the number of observations of interest in the sample (successes) and n = the sample size (trials). Standard error of the proportion: $\sigma_{\bar{p}} = \sqrt{\frac{p(1-p)}{n}}$, where p = the population proportion and n = the sample size. CHP 7 48-53
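
A minimal sketch of the Super Bowl calculation. It computes the sample proportion, the standard error under CBS's claimed p = 0.45, the z-score, and the probability of a sample proportion this low or lower if the claim is true. The final judgment about "validating" the claim is left to the book's discussion.

```python
import math

def standard_normal_cdf(z):
    """Area to the left of z under the standard normal curve."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

p = 0.45          # CBS claim: population proportion
n = 200           # households sampled by Coca-Cola
x = 84            # households that watched

p_bar = x / n                              # sample proportion, 0.42
std_error = math.sqrt(p * (1 - p) / n)     # standard error of the proportion
z = (p_bar - p) / std_error
print(p_bar, round(std_error, 4), round(z, 2))
print(round(standard_normal_cdf(z), 4))    # P(p_bar <= 0.42) if the claim is true
```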

Relative frequency distributions

display the proportion of observations of each class relative to the total number of observations • show the fraction of observations in each class • are found by dividing each frequency by the total number of observations • the fractions in a relative frequency distribution add up to 1.00. EXAMPLE: if the data cover the past 50 days, divide each class frequency by 50; a class with a frequency of 5 has a relative frequency of 5/50 = 0.10, a class with a frequency of 6 has 6/50 = 0.12, a class with a frequency of 4 has 4/50 = 0.08, and so on.
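
A Python sketch of the division step. The class frequencies for the 50 days are hypothetical.

```python
def relative_frequencies(freqs):
    """Divide each class frequency by the total number of observations."""
    total = sum(freqs)
    return [f / total for f in freqs]

# Hypothetical class frequencies covering 50 days
daily_counts = [5, 8, 14, 13, 10]
rel = relative_frequencies(daily_counts)
print(rel)        # [0.1, 0.16, 0.28, 0.26, 0.2]
print(sum(rel))   # adds up to 1.0 (up to floating-point rounding)
```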

Stratified sampling

divides the population into mutually exclusive groups, or strata (recall from chapter 4 that two events are considered to be mutually exclusive if they cannot occur at the same time during an experiment) • A random sample from each stratum is selected • Strata are based on important variables that can have an impact on the data collected and the results that are achieved • Example: We used stratified sampling because we felt that the class the students belonged to (whether they were freshmen, sophomores, juniors, or seniors) was an important factor in how they would respond to our survey. This helps ensure that the sample is representative of the overall population.
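
A minimal sketch of stratified sampling. It groups a hypothetical roster by class year, then draws a simple random sample from each stratum. Taking the same number from every stratum is an assumption made for simplicity. A study could instead allocate the sample in proportion to stratum size.

```python
import random

def stratified_sample(population, stratum_of, per_stratum, seed=None):
    """Split the population into mutually exclusive strata, then draw a simple
    random sample (without replacement) of size per_stratum from each one."""
    rng = random.Random(seed)
    strata = {}
    for member in population:
        strata.setdefault(stratum_of(member), []).append(member)
    return {name: rng.sample(members, per_stratum) for name, members in strata.items()}

# Hypothetical roster of (student id, class year)
years = ["freshman", "sophomore", "junior", "senior"]
roster = [(i, years[i % 4]) for i in range(40)]
sample = stratified_sample(roster, stratum_of=lambda s: s[1], per_stratum=3, seed=1)
for year, students in sample.items():
    print(year, [sid for sid, _ in students])
```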

In systematic sampling,

every kth member of the population is chosen for the sample. The value of k is determined by dividing the size of the population (N) by the size of the sample (n), so k = N/n. Depending on k, every second member, every third member, and so on, is chosen
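
A minimal sketch of systematic sampling with k = N // n. Choosing the starting point at random among the first k members is an assumption. The card does not say how the start is chosen. The population of 100 numbered members is hypothetical.

```python
import random

def systematic_sample(population, n, seed=None):
    """Choose every kth member, where k = N // n.
    Assumption: the starting point is picked at random among the first k members."""
    k = len(population) // n
    start = random.Random(seed).randrange(k)
    return population[start::k][:n]

members = list(range(1, 101))                  # hypothetical population, N = 100
chosen = systematic_sample(members, 20, seed=7)
print(chosen)                                  # every 5th member
```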

percentage polygon

graphs the midpoint of each class as a line rather than a column • The height of each midpoint represents the relative frequency of the corresponding class • Used to compare the shape of two or more distributions on one graph

Microsoft Excel

has built-in options for data presentation and statistical analysis

Ratio level of measurement:

have all the features of interval data with the added benefit of having a true zero point. Example: salary (a salary of $0 means no money); $20 is twice as much as $10. MEANINGFUL DIFFERENCES, TRUE ZERO POINT. EX: INCOME ($48,000, $0)

Ordinal Data

have all the properties of nominal data, with the added feature that we can rank-order the values from highest to lowest. Example: educational level. RANKING ALLOWED, BUT THE DIFFERENCES BETWEEN VALUES HAVE NO MEANINGFUL MEASURE. EX: EDUCATION LEVEL (MASTER'S, BACHELOR'S, AA)

Continuous random variables

have outcomes that take on any numerical value as a result of conducting an experiment. Ex: the length of time a customer waits in the checkout line at Whole Foods; the ounces of soda consumed by an adult in 1 month.

Discrete random variables

have outcomes that typically take on whole numbers as a result of conducting an experiment

Revisiting the Empirical Rule

https://www.youtube.com/watch?v=ykmT12Ipigc CHP 6 PG 29 AND 30

the percentile rank

identifies the percentile of a particular value within a set of data Formula to find the approximate percentile rank for a value x: formula is chp 3 pg 62

standard deviation

is the square root of the variance • Has the same units as the original data. Sample standard deviation formula (chp 3 powerpoint pg 32 and 33): $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$. Example: 4 6 8 9 11 12 12 18, n = 8, mean $\bar{x}$ = 10. The sum of the squared deviations is 130, so $s^2 = 130/7 = 18.571$. Because the formula takes the square root, the answer is really $s = \sqrt{18.571} = 4.309$.
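
The worked example above in Python, dividing by n − 1 for a sample and cross-checking against `statistics.stdev`.

```python
import math
import statistics

data = [4, 6, 8, 9, 11, 12, 12, 18]
n = len(data)
mean = sum(data) / n                              # 10.0
sum_sq_dev = sum((x - mean) ** 2 for x in data)   # 130.0
s_squared = sum_sq_dev / (n - 1)                  # divide by n - 1 for a sample
s = math.sqrt(s_squared)
print(round(s_squared, 3), round(s, 3))           # 18.571 4.309
print(math.isclose(s, statistics.stdev(data)))    # True
```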

median formula: i = 0.5(n)

is the value in the data set for which half the observations are higher and half the observations are lower • First arrange the data in ascending order • Use an index point to determine the position of the median in the data set (the middle data value) • Formula for the index point for the median: i = 0.5(n)

Cluster sampling

involves dividing the population into mutually exclusive groups, or clusters, that are each representative of the population • Then randomly select clusters to form the final sample • These clusters are often selected based on geography to help simplify the sampling process

A simple random sample

is a sample in which every member of the population has an equal chance of being chosen

A nonprobability sample

is a sample in which the probability of a population member being selected for the sample is not known

The exponential probability distribution

is another common continuous distribution • Commonly used to measure the time between events of interest • Examples: • the time between customer arrivals • the time between failures in a business process
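
A minimal sketch of the exponential distribution. It assumes the standard cumulative formula P(x ≤ a) = 1 − e^(−λa) and mean 1/λ. The arrival rate of 2 customers per minute is hypothetical.

```python
import math

def exponential_cdf(a, lam):
    """P(x <= a) = 1 - e^(-lam * a) for an exponential distribution with rate lam."""
    if a < 0:
        return 0.0     # exponential values cannot be negative
    return 1 - math.exp(-lam * a)

# Hypothetical: customers arrive at an average rate of lam = 2 per minute
lam = 2
print(round(exponential_cdf(0.5, lam), 4))   # 0.6321 = P(next arrival within 30 seconds)
print(1 / lam)                               # 0.5 = mean time between arrivals, in minutes
```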

A convenience sample

is used when sample values are selected simply because they are easily accessible. Convenience Nonprobability Sampling • Advantages: • Quick and easy to get sample data • Provides general information about the population • Disadvantages: • May not be representative of the population. Ex: choosing the current stats class as the sample to provide feedback on this textbook. This may not represent all of the students in the nation who read stats books.

hypergeometric distribution

is used when samples are taken from a finite population without being replaced. Samples are no longer independent in this case. Under these conditions the probabilities of success change repeatedly because the sample space becomes smaller after each selection.

A discrete probability distribution

is a listing of all the possible outcomes of an experiment for a discrete random variable, along with the relative frequency of each outcome

A probability sample

is a sample in which each member of the population has a known, nonzero, chance of being selected for the sample Advantage: can perform inferential statistical tests to draw reliable conclusions about the population

The standard deviation

is a common measure of consistency in business applications, such as quality control • The standard deviation measures the amount of variability around the mean The standard deviation is affected by the scale of the data •When sample means are different, comparing standard deviations can be misleading

A histogram

is a graph showing the number of observations in each class of a frequency distribution • Excel uses the term "bins" for the classes in the distribution

box-and-whisker plot

is a graphical display showing the relative position of the three quartiles as a box on a number line It also shows the minimum and maximum values in the data set and any outliers

The cumulative percentage polygon, or ogive

is a line graph that plots the cumulative relative frequency distribution

probability

is a numerical value ranging from 0 to 1 Probability indicates the chance, or likelihood, of a specific event occurring •If there is no chance of the event occurring, the probability is 0 •If the event is absolutely going to occur, the probability of it occurring is 1

Line Chart

is a scatter plot in which the data points in the scatter plot are connected with line segments • Often used with time series data When graphing a time series the convention is to place the time data on the horizontal axis

Central tendency

is a single value used to describe the center point of a data set

Resampling

is a statistical technique where many samples are repeatedly drawn from a population One type of resampling methods is the bootstrap method •Involves using computer software to extract many samples with replacement in order to estimate a parameter of the population, such as a mean or proportion

PHStat

is an Excel Add-in developed by Prentice Hall to provide students with additional features for statistical analysis

The expected monetary value (EMV)

is the mean of a discrete probability distribution when the discrete random variable is expressed in terms of dollars • The EMV represents a long-term average, as if outcomes from the distribution occurred many times chp 5 pg 17

The mean, or average,

is the most common measure of central tendency • Calculate the mean by adding all the values in a data set and then dividing the result by the number of observations

Conditional probability

is the probability of Event A occurring, given the condition that Event B has occurred

Mode

is the value that appears most often in a data set • If no data value or category repeats more than once, then we say that the mode does not exist • more than one mode can exist if two or more values tie for most frequent The mode is a particularly useful way to describe categorical data

The binomial probability distribution

is used to calculate the probability of a specific number of successes (x) for a certain number of trials (n), given a specified probability of success (p) and probability of failure (q). Formula for the probability of exactly x successes from n trials (CHP 5 PG 21): $P(x, n) = \frac{n!}{x!(n-x)!} p^x q^{n-x}$, where P(x, n) = the probability of observing x successes in n trials, n = the number of trials, x = the number of successes, p = the probability of a success, and q = the probability of a failure. Example: 40% of all voters support Proposition A. If a random sample of 10 voters is polled, what is the probability that exactly five of them support the proposition? Find P(x = 5) if n = 10, p = 0.4, and q = 0.6. pg 22 chp 5
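
The Proposition A example in Python, using the binomial formula with `math.comb` for the n!/(x!(n − x)!) term.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(x, n) = n! / (x!(n - x)!) * p^x * q^(n - x)"""
    q = 1 - p
    return comb(n, x) * p ** x * q ** (n - x)

n, p = 10, 0.4
print(round(binomial_pmf(5, n, p), 4))   # 0.2007
print(n * p)                             # 4.0 = mean number of supporters, np
```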

The addition rule for probabilities

is used to calculate the probability of the union of events • the probability that Event A, or Event B, or both events will occur Two events are considered to be mutually exclusive if they cannot occur at the same time during the experiment

The exponential distribution

is used to describe data where lower values tend to dominate and higher values don't occur very often

The multiplication rule

is used to determine the probability of the intersection (joint probability) of two events occurring, or P(A and B). Formula for the multiplication rule for dependent events: P(A and B) = P(A) × P(B|A). chp 4 pg 43 and 45

Poisson distribution

is useful for calculating the probability that a certain number of events will occur over a specific interval of time or space (counting the number of successes in a given interval). Examples: • Number of customers per hour • Number of flaws per meter of cloth • Number of accidents per month

The normal probability distribution

is useful when the data tend to fall into the center of the distribution and when very high and very low values are fairly rare

Inferential statistics

making claims or drawing conclusions about a population based on a sample • Population: represents all possible subjects that are of interest to us in a particular study • Sample: refers to a portion of the population that is representative of the population from which it was selected.

Variance

While the mean is simply the average of all data points, the variance measures the average degree to which each point differs from the mean (the greater the variance, the larger the overall data range). - Variance http://www.investopedia.com/terms/v/variance.asp

Percentiles

measure the approximate percentage of values in the data set that are below the value of interest. The pth percentile of a data set (where p is any number between 1 and 100) is the value that at least p percent of the observations will fall below. Examples: • 20% of the data values are below the 20th percentile • 73% of the data values are below the 73rd percentile

The coefficient of variation, CV,

measures the standard deviation in terms of its percentage of the mean: $CV = \frac{s}{\bar{x}} \times 100\%$ • A high CV indicates high variability relative to the size of the mean • A low CV indicates low variability relative to the size of the mean • A smaller coefficient of variation indicates more consistency within a set of data values. Pg 104, Nike vs. Google stock price: Nike 7.4% and Google 6.7%. Even though Google's stock price has a higher standard deviation than Nike's, it is more consistent because its coefficient of variation is lower. In the investing world, the coefficient of variation allows you to determine how much volatility (risk) you are assuming in comparison to the amount of return you can expect from your investment. In simple language, the lower the ratio of standard deviation to mean return, the better your risk-return tradeoff.
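
A minimal sketch of the CV comparison. The two price series are hypothetical, not the book's Nike/Google data. They are chosen so that series B has the larger standard deviation but the smaller CV, which mirrors the Google case.

```python
import statistics

def coefficient_of_variation(data):
    """CV = (sample standard deviation / mean) * 100%."""
    return statistics.stdev(data) / statistics.mean(data) * 100

# Hypothetical price series
stock_a = [50, 52, 48, 51, 49]
stock_b = [500, 510, 490, 505, 495]
for name, prices in [("A", stock_a), ("B", stock_b)]:
    print(name, round(statistics.stdev(prices), 2), f"{coefficient_of_variation(prices):.1f}%")
```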

Nonsampling errors

occur as a result of issues such as • ambiguous survey questions • questions that lead respondents to a certain "correct" answer • data collection errors These are errors not related to sampling variability

The intersection

of Events A and B represents the number of instances in which Events A and B occur at the same time

According to the Central Limit Theorem

sample means from samples of sufficient size, drawn from any population, will be normally distributed •In most cases, sample sizes of 30 or larger will result in sample means being normally distributed, regardless of the shape of the population distribution •If the population follows the normal probability distribution, the sample means will also be normally distributed, regardless of the size of the samples
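
A small simulation sketch of the Central Limit Theorem. It draws 5,000 samples of size 30 from a right-skewed exponential population (mean 1, standard deviation 1). The sample means center near 1, their spread is near σ/√n, and roughly 95% of them fall within 1.96 standard errors of the mean, as a normal distribution would predict. The population and settings are assumptions chosen for illustration.

```python
import random
import statistics

rng = random.Random(42)     # fixed seed so the run is repeatable
lam = 1.0                   # exponential population: mean = 1/lam = 1, std dev = 1/lam = 1
n = 30                      # size of each sample
num_samples = 5000          # number of samples drawn

# Mean of each sample of size n drawn from the skewed population
sample_means = [
    sum(rng.expovariate(lam) for _ in range(n)) / n
    for _ in range(num_samples)
]

se = (1 / lam) / n ** 0.5   # sigma / sqrt(n), about 0.183
print(round(statistics.mean(sample_means), 3))    # close to the population mean of 1
print(round(statistics.stdev(sample_means), 3))   # close to se
# For a normal distribution, about 95% of values lie within 1.96 standard deviations
within = sum(abs(m - 1 / lam) <= 1.96 * se for m in sample_means) / num_samples
print(round(within, 3))
```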

Measures of variability Range Variance-for a sample and for a population Standard Deviation- for a sample and for a population

show how much spread is present in the data.

Quartiles

split the ranked data into 4 equal groups: •The first quartile (Q1) is the value that constitutes the 25th percentile •The second quartile (Q2) is the value that constitutes the 50th percentile •Note that the second quartile (the 50th percentile) is the median •The third quartile (Q3) is the value that constitutes the 75th percentile CHP 3 PG 65

stem and leaf display

splits the data values into stems (the larger place values) and leaves (the smaller place value) By listing all of the leaves to the right of each stem, we can graphically describe how the data are distributed •All the original data points are visible on the display • Easy to construct by hand • Provides a histogram-like view of the distribution

Chebyshev's Theorem

states that for any number z greater than 1, the percentage of the values that fall within z standard deviations above and below the mean will be at least $\left(1 - \frac{1}{z^2}\right) \times 100\%$ (pg 50 chp 3) • Applies regardless of the shape of the distribution • At least 75% of the data values will fall within ±2 standard deviations around the mean • At least 89% of the data values will fall within ±3 standard deviations around the mean • At least 94% of the data values must fall within ±4 standard deviations around the mean
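
Chebyshev's lower bound, 1 − 1/z², in Python. The output reproduces the 75%, 89%, and 94% figures after rounding.

```python
def chebyshev_minimum(z):
    """Minimum fraction of values within z standard deviations of the mean,
    1 - 1/z^2, valid for any distribution shape when z > 1."""
    if z <= 1:
        raise ValueError("Chebyshev's Theorem requires z > 1")
    return 1 - 1 / z ** 2

for z in (2, 3, 4):
    print(z, f"{chebyshev_minimum(z):.1%}")   # 75.0%, 88.9%, 93.8%
```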

The Central Limit Theorem

states that the sample means of large-sized samples will be normally distributed regardless of the shape of their population distributions • A key concept to be used repeatedly throughout the rest of the book

law of large numbers

states that when an experiment is conducted a large number of times, the empirical probabilities of the process will converge to the classical probabilities. Example: Flip a coin a large number of times • The observed proportion of heads would be very close to 50%
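
A small coin-flip simulation of the law of large numbers. It uses a seeded pseudo-random generator as a stand-in for a fair coin. As the number of flips grows, the empirical proportion of heads settles near the classical probability of 0.5.

```python
import random

def proportion_heads(flips, seed=0):
    """Empirical probability of heads after simulating `flips` fair-coin tosses."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(flips))
    return heads / flips

for flips in (10, 1_000, 100_000):
    print(flips, proportion_heads(flips))   # drifts toward 0.5 as flips grows
```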

Interval measurement level:

strictly quantitative, allow us to measure the differences between the categories with actual numbers in a meaningful way Examples: temperature measurements, GPA; do not have a true zero point. The term true zero point means that a zero data value indicates the absence of the object being measured MEANINGFUL DIFFERENCES NO TRUE ZERO POINT EX:CALENDAR YEAR (2014,2015)

Short-Cut Formula Example pg 97

Sum of the data values = 186; sum of the squared data values = 5,952; $(186)^2 = 34{,}596$. So $\frac{5{,}952 - 34{,}596/6}{6} = \frac{5{,}952 - 5{,}766}{6} = \frac{186}{6} = 31$. chp3 powerpoint pg 37
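
A Python sketch of the short-cut formula using the card's totals. It assumes there are six data values, since the card divides 34,596 by 6. Dividing the result by 6 (N) reproduces the card's 31. Treating the six values as a sample would divide by n − 1 = 5 instead. The `divisor` parameter is a hypothetical convenience for showing both.

```python
def shortcut_variance(sum_x, sum_x_squared, n, divisor):
    """Short-cut formula: (sum(x^2) - (sum(x))^2 / n) / divisor."""
    return (sum_x_squared - sum_x ** 2 / n) / divisor

# Card's totals: six values with sum 186 and sum of squares 5,952
print(shortcut_variance(186, 5952, 6, divisor=6))   # 31.0, the card's result (divides by N)
print(shortcut_variance(186, 5952, 6, divisor=5))   # 37.2 if the six values are a sample (n - 1)
```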

Sample Covariance

$s_{xy}$, measures the direction of the linear relationship between two variables • A relationship is linear if the scatter plot of the independent and dependent variables has a straight-line pattern • If the linear relationship between x and y is positive, then as the value of x increases, the value of y tends to increase. chp 3 pg 75
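
A minimal sketch of the sample covariance and the related correlation coefficient. It assumes the standard formulas s_xy = Σ(xᵢ − x̄)(yᵢ − ȳ)/(n − 1) and r = s_xy/(s_x s_y). The hours-studied and exam-score data are hypothetical.

```python
import math

def sample_covariance(x, y):
    """s_xy = sum((x_i - x_bar)(y_i - y_bar)) / (n - 1); the sign gives the direction."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

def sample_correlation(x, y):
    """r_xy = s_xy / (s_x * s_y): strength and direction, always between -1 and 1."""
    s_x = math.sqrt(sample_covariance(x, x))   # covariance of x with itself is its variance
    s_y = math.sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (s_x * s_y)

# Hypothetical: hours studied vs. exam score
hours = [1, 2, 3, 4, 5]
scores = [62, 70, 75, 72, 80]
print(round(sample_covariance(hours, scores), 2), round(sample_correlation(hours, scores), 3))
```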

Distribution Shape: Symmetric, Left-Skewed, Right-Skewed

Symmetric: the peak is in the middle, and the distribution tapers off in the same way to the right and to the left. Left-skewed: the values rise from small to big starting on the left side, so the peak is toward the right and the long tail stretches out to the left. Right-skewed: the values rise from small to big starting on the right side, so the peak is toward the left and the long tail stretches out to the right. If the mean is greater than the median, the shape of the distribution is said to be right-skewed. If the median is greater than the mean, the distribution is left-skewed. If the mean and median are close (or equal), the distribution is said to be symmetric.

cumulative relative frequency distribution

totals the proportion of observations that are less than or equal to the class at which you are looking • Shows the accumulated proportion as values vary from low to high • Example: the manager of the Apple Store wants to determine the percentage of days on which three or fewer iPads were sold. Start with the relative frequencies already found, take the first relative frequency, and add each following one to the running total: 0.10 + 0.16 = 0.26, 0.26 + 0.28 = 0.54, 0.54 + 0.26 = 0.80, and so on.
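
The running total from the iPad example, sketched with `itertools.accumulate`. Only the four relative frequencies shown on the card are used.

```python
from itertools import accumulate

# Relative frequencies from the card's iPad example (first four classes only)
relative = [0.10, 0.16, 0.28, 0.26]
cumulative = [round(c, 2) for c in accumulate(relative)]
print(cumulative)   # [0.1, 0.26, 0.54, 0.8]
```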

Data

values assigned to observations or measurements and are the foundation of statistics

Cross Section Data

values collected from a number of subjects during a single time period. Ex: an unemployment graph with the amount of unemployment on the vertical axis and the country (US, Canada, etc.) on the horizontal axis

Time Series Data

values that correspond to specific measurements taken over a range of time periods. Ex: an unemployment graph with the amount of unemployment on the vertical axis and the years 2008, 2009, 2010, and 2011 on the horizontal axis

Features of z-scores for Normal Distributions Using Normal Probability Tables

z-scores are negative for values of x that are less than the distribution mean • z-scores are positive for values of x that are more than the distribution mean • The z-score at the mean of the distribution equals zero

Descriptive statistics

• Collecting, summarizing, and displaying data. Allows us to get an overview of the information. Can be useful, but has limitations. By summarizing large quantities of data, you lose information

why sample

• Examining the entire population would be expensive and time consuming • Can't examine everything if the test is destructive If a sample is selected properly and the analysis performed correctly, sample information can be used to make an accurate assessment of the entire population

Bias

• Example: "Do you agree that the current overly complex tax code should be simplified and made more fair?"

Discrete data examples

• Number of children per family • Number of cars listed per insurance policy • Vacation days per month

Once k is known, the width of each class can be found

• The width is the range of numbers to put into each class • Class width ≈ (largest value - smallest value)/k, where k is the number of classes • Round this estimate to a useful whole number that makes the frequency distribution more readable • Example (info from table 2.6 pg 31): (17.4 - 0.6)/6 = 2.8, round to 3
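
A Python sketch of the class-width estimate. It assumes the smallest value in table 2.6 is 0.6, since the card's ".06" does not reproduce its own 2.8 while (17.4 − 0.6)/6 does. Rounding up with `math.ceil` is also an assumption. It is one way to get a "useful whole number" that still covers every value.

```python
import math

def class_width(largest, smallest, k):
    """Class width estimate = (largest value - smallest value) / number of classes k,
    rounded up to a whole number (assumption: round up so all values are covered)."""
    raw = (largest - smallest) / k
    return raw, math.ceil(raw)

raw, width = class_width(17.4, 0.6, 6)
print(round(raw, 1), width)   # 2.8 3
```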

Sample

•refers to a portion of the population that is representative of the population from which it was selected

