STA1LS Semester 2, 2017 Revision

A function that assigns a unique numerical value to each outcome in a sample space is known as a: Select one: a. parameter b. unique number generator c. simple event d. random variable

d. random variable

To compute P(A)

Add up the probabilities of the outcomes (simple events) in A

Experiment

An activity for which there are at least 2 possible outcomes and whose result cannot be predicted with absolute certainty - e.g. Coin toss = Heads, Tails - e.g. Diabetes test = Positive, Negative

Complement of an Event

The complement of an event A, denoted A', is the set of all outcomes in the same sample space that do not belong to A.

Name and Describe the 2 Correlation Co-efficients

Are used to numerically summarise the relationship between 2 quantitative variables and include: 1. Linear correlation coefficient (Pearson product-moment correlation coefficient): - Measures the strength of the LINEAR relationship between 2 quantitative variables 2. Spearman's rank correlation coefficient: - Measures the strength of the monotonic relationship between 2 quantitative variables (monotonic: varying in such a way that it either never decreases or never increases)

Limiting Relative Frequency

As an experiment is repeated more and more times, the relative frequencies tend to stabilize and become almost constant - The number they settle on is the limiting relative frequency - The probability of an event A, P(A), is equal to the limiting relative frequency of event A.
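As a rough illustration of relative frequencies stabilizing, the sketch below simulates tosses of a fair coin and prints the relative frequency of heads for increasing numbers of tosses. The function name `relative_frequency`, the seed, and the toss counts are assumptions made for this example only.

```python
import random


def relative_frequency(num_tosses, seed=0):
    """Relative frequency of heads in num_tosses simulated fair coin tosses."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(num_tosses))
    return heads / num_tosses


# The printed values should drift towards 0.5 as the number of tosses grows.
for n in (10, 100, 10_000, 100_000):
    print(n, relative_frequency(n))
```

With few tosses the relative frequency can be far from 0.5; with many tosses it should sit close to it, which is the idea behind the limiting relative frequency.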

Bimodal Distribution

- Has two peaks, not common and may occur with population mixing

Histograms (Describe, sample size, and detects)

- It's a bar chart: horizontal axis = data classes, vertical axis = frequencies, height of bars = frequency values - Detects clusters, shape, location, spread and outliers - Sample size n > 30 - It's drawn from a frequency distribution table

Unimodal distribution - Describe and Define

- It's a distribution with one peak - It's symmetric if a vertical line down the middle splits it into two mirror-image halves - It's skewed if it's not symmetric - Unimodal distributions include the normal (bell-shaped) distribution, also called the normal curve - The normal distribution approximates many populations

Box Plots

- It's a graph that consists of a line extending from the minimum value to the maximum value and a box with lines drawn at the first quartile, the median and the third quartile - WEAKNESS: Doesn't identify clusters

Sample Standard Deviation

- It's the square root of the sample variance: s = sqrt(s^2) - Easier to interpret than the variance as a measure of variability because it is in the same units as the original data - (Repeat: write formulas in lecture notes - WK 1)

Sample Mean

- Measures the central location of the data - Found by adding the values of each observation in the sample and dividing by the total sample size - SEE LECTURE SLIDES WK 1

Sample Variance

- Measures the variability in the sampled data, expressed in squared units - Roughly interpreted as the average squared difference between the data values and the sample mean - The larger the sample variance, the greater the variability in the data (REPEAT: WRITE FORMULA IN LECTURE NOTES - WK 1)
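The sample mean, sample variance, and sample standard deviation described in the cards above can be checked on a small data set with Python's standard `statistics` module. The data values are made up for illustration; `statistics.variance` uses the n - 1 divisor, which is the usual sample variance but may be written differently in the lecture notes.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

mean = statistics.mean(data)          # sum of the values divided by n
variance = statistics.variance(data)  # sum of squared deviations divided by (n - 1)
std_dev = statistics.stdev(data)      # square root of the sample variance

print(mean, variance, std_dev)
```

The standard deviation is in the same units as the data, while the variance is in squared units.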

Multimodal Distribution

- Has more than one peak; usually refers to a distribution with more than 2 distinct peaks - Very rare

State the notations for estimates and parameters (draw symbols)

- The sample mean, \bar{x}, is an estimate of the population mean, μ - The sample variance, s^2, is an estimate of the population variance, σ^2 - The sample standard deviation, s, is an estimate of the population standard deviation, σ - Greek letters = population parameters - English letters = sample statistics

Relative Frequency Histogram

- Used to compare histograms with different sample sizes - Bar chart, horizontal axis = data classes, vertical axis = relative frequencies - Multiply the relative frequency by 100 to obtain the percentage

Numerical summaries depend on ........

- Whether or not the distribution is bell-shaped

Properties of the Linear Correlation Coefficient

Values lie between -1 and +1 - r = 1 when all (x, y) points fall on an increasing line (perfect positive linear relationship) - r = -1 when all (x, y) points fall on a decreasing line (perfect negative linear relationship) - r = 0 corresponds to a scatter plot cloud of points with no linear orientation

What are the 3 measures of spread ?

Variance, standard deviation, and IQR

Experts give a certain poor performing professional sports team an 80% chance of winning at least one game during their upcoming regular season. Based on this prediction, what is the probability that they will not win any games? Select one: a. 0.20 b. 0.15 c. 0.80 d. 0.50

a. 0.20

Suppose we want to explore whether a change in gender (male, female) explains a change in weight (kilograms). Is weight an explanatory or response variable? Is weight a categorical or quantitative variable? Select one: a. response, quantitative b. explanatory, categorical c. response, categorical d. explanatory, quantitative

a. response, quantitative

Define and Describe a Scatter Plot

Is a graphical summary for 2 quantitative variables - It's a plot of paired (x, y) data with a horizontal axis (x) and a vertical axis (y) - Each individual pair (x, y) is plotted as a single point - DOES NOT PROVIDE THE STRENGTH OF THE RELATIONSHIP BETWEEN THE 2 VARIABLES - The explanatory variable = x-axis - The response variable = y-axis

Sample Space

Is a list of all possible outcomes of an experiment, denoted by S, e.g. S = {E1, E2, E3, ...} where E1, E2, E3, ... are the possible outcomes of the experiment

Define Correlation coefficient

Is a numerical summary that measures both the STRENGTH and DIRECTION of the relationship between 2 QUANTITATIVE variables - It's a unit-free numerical summary

Binomial Experiment

Is a statistical experiment that has the following properties: 1. The experiment consists of n trials 2. Each trial can only have one of two mutually exclusive outcomes, Denoted (S) Success or (F) Fail 3. The outcomes of the trials are independent 4. The probability of a success, p is constant from trial to trial

Spearman's correlation coefficient

Is calculated by first ranking the data for each quantitative variable and then applying the linear correlation coefficient formula to the rankings - Strength (in absolute value): 0.8 to 1: Strong - 0.5 to 0.8: Moderate - 0 to 0.5: Weak
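The two-step recipe above (rank each variable, then apply the linear correlation formula to the ranks) can be sketched as follows. The helper names `pearson`, `ranks`, and `spearman` and the data are made up for illustration; tied values receive the average of their ranks, which is a common convention but an assumption here.

```python
import math


def pearson(xs, ys):
    """Linear (Pearson) correlation coefficient of paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks starting at 1 for the smallest value; ties share their average rank."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def spearman(xs, ys):
    """Spearman's coefficient: Pearson's formula applied to the ranks."""
    return pearson(ranks(xs), ranks(ys))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # increasing (monotonic) but not linear in x

print(pearson(x, y))   # less than 1: the points do not fall on a straight line
print(spearman(x, y))  # 1.0: the relationship is perfectly monotonic
```

The example shows why the two coefficients can differ: a curved but always-increasing relationship is perfectly monotonic without being perfectly linear.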

Response Variable

Is the focus of a question in a study or experiment.

How can a PROBABILITY distribution for a DISCRETE random variable be shown ?

It can be : - An itemized listing - A Table - A Graph - Or a function

Relative Frequency

The relative frequency of an event is the number of times the event occurs divided by the total number of times the experiment is conducted

Define Interval Probability and how it is calculated

It's the probability that a random variable X takes on a value between a and b (where a and b are constants) - Calculated by adding the P(X = x) values for every x in the interval

Define Upper Tail Probability and how it is Calculated

It's the probability that a random variable X takes on a value greater than or equal to a (where a is a constant) - Calculated by adding the P(X = x) values for every x in the upper tail, see slide 4-15 for an example

Define Cumulative Probability and how it is calculated

It's the probability that a random variable X takes on a value less than or equal to a (where a is a constant) - Calculated by adding the P(X = x) values for every x up to and including a, see slide 4-13 for an example
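Interval, upper tail, and cumulative probabilities for a discrete random variable all come down to adding P(X = x) over the right set of values. Here is a minimal sketch with a made-up probability distribution; the function names are hypothetical, and the interval is assumed to include both endpoints a and b.

```python
# Hypothetical probability distribution of a discrete random variable X.
pmf = {0: 0.1, 1: 0.2, 2: 0.4, 3: 0.2, 4: 0.1}


def cumulative(pmf, a):
    """P(X <= a): add P(X = x) for every x up to and including a."""
    return sum(p for x, p in pmf.items() if x <= a)


def upper_tail(pmf, a):
    """P(X >= a): add P(X = x) for every x from a upwards."""
    return sum(p for x, p in pmf.items() if x >= a)


def interval(pmf, a, b):
    """P(a <= X <= b): add P(X = x) for every x in the interval (endpoints included)."""
    return sum(p for x, p in pmf.items() if a <= x <= b)


print(cumulative(pmf, 1))   # P(X <= 1)
print(upper_tail(pmf, 3))   # P(X >= 3)
print(interval(pmf, 1, 3))  # P(1 <= X <= 3)
```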

Joint Probability Table

Joint probability is the probability of two events happening at the same time, and can only be applied to situations where more than one observation can occur at the same time - A joint probability table lists these probabilities for every combination of the two events

What are the 3 measures of Central Location ?

Mean, median, mode

Mutually Exclusive Events

No 2 outcomes in the sample space can occur simultaneously on any 1 trial of the experiment - A and B have no elements in common; they are disjoint, or mutually exclusive: A ∩ B = Ø

Quantitative Variable

Numbers that are counts or measurements - Has 2 types: discrete and continuous

Continuous

Numbers that can take on infinitely many values within a range, e.g. Body temperature

Data

Observed values of variables

Multiplication Rule

P(A ∩ B) = P(A) P(B|A) The rule of multiplication applies to the following situation. We have two events from the same sample space, and we want to know the probability that both events occur.
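A small worked example of the multiplication rule, using exact fractions. The card-drawing scenario is an assumption chosen for illustration, not taken from the lecture slides.

```python
from fractions import Fraction

# Hypothetical example: draw 2 cards without replacement from a standard 52-card deck.
p_a = Fraction(4, 52)          # P(A): the first card is an ace
p_b_given_a = Fraction(3, 51)  # P(B|A): the second card is an ace, given the first was

p_a_and_b = p_a * p_b_given_a  # multiplication rule: P(A ∩ B) = P(A) P(B|A)
print(p_a_and_b)  # 1/221
```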

A listing of all the possible outcomes from an experiment using set notation is called the: Select one: a. sample space. b. experimental outcome list. c. sample point. d. event.

a. sample space

\bar{x} denotes the: a. population mean b. sample mean c. sample variance d. population variance

b. sample mean

\tilde{x} denotes the a. sample mean b. population mean c. sample median d. sample standard deviation

c. sample median

If Ø is defined as the empty set, then P(Ø) must by definition be: Select one: a. between 0.5 and 1. b. less than 0. c. more than 0. d. equal to 0.

d. equal to 0

The variance, standard deviation and interquartile range are all measures of a. sample size b. central location c. central tendency d. spread

d. spread

A random variable that can take on an infinite number of values within the limits the variable ranges is said to be: Select one: a. discrete. b. compact. c. predictable. d. continuous

d. continuous

Suppose the random variable X denotes the number of shoppers who use the "self scan" aisle at the local supermarket during the day. X is a ________________ random variable. Select one: a. compact b. predictable c. continuous d. discrete

d. discrete

Law of Total Probability

A fundamental rule relating marginal probabilities to conditional probabilities. It expresses the total probability of an outcome that can be realized via several distinct events, hence the name. See slide 3-36 for the formula.
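The slide formula is not reproduced here, but the rule is commonly written as P(A) = P(A|B1)P(B1) + P(A|B2)P(B2) + ... for events B1, B2, ... that are mutually exclusive and together cover the sample space. A minimal sketch under that assumption, with a made-up two-machine factory:

```python
from fractions import Fraction

# Hypothetical factory: machines M1 and M2 produce all items between them.
p_machine = {"M1": Fraction(3, 5), "M2": Fraction(2, 5)}         # P(M_i), sums to 1
p_defect_given = {"M1": Fraction(1, 50), "M2": Fraction(1, 20)}  # P(D | M_i)

# Total probability of a defect: add P(D | M_i) * P(M_i) over every machine.
p_defect = sum(p_defect_given[m] * p_machine[m] for m in p_machine)
print(p_defect)  # 4/125
```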

Population Standard Deviation

Is equal to the square root of the variance - Expressed in the same units as the variable of interest

Addition Rule

P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
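The addition rule can be verified by counting outcomes directly. The sketch below assumes one roll of a fair six-sided die with equally likely outcomes; the events A and B and the helper `prob` are chosen for illustration.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a fair die
A = {2, 4, 6}                      # roll is even
B = {4, 5, 6}                      # roll is greater than 3


def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))


direct = prob(A | B)                        # count the outcomes in A ∪ B
by_rule = prob(A) + prob(B) - prob(A & B)   # addition rule
print(direct, by_rule)  # 2/3 2/3
```

Subtracting P(A ∩ B) stops the outcomes 4 and 6, which are in both events, from being counted twice.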

Probability Density Function

For a continuous random variable, which can take on infinitely many values, probabilities are computed over an interval of values (as the area under the pdf over that interval) rather than at a single value

Dot Plots (Detect, sample size, help identify)

- For small sample sizes: <10 - Detects clusters - Helps identify the original population group under study

Relative frequency histogram ( Describe)

- A bar chart: horizontal axis = data classes, vertical axis = relative frequencies, bar height = relative frequency values - It's drawn from a frequency distribution table - Relative frequency values must be between 0 and 1

Modified Boxplots (WRITE THE FORMULAS FOR UPPER AND LOWER FENCES)

- Can be done in SPSS and manually - Identifies mild and extreme outliers - Step 1: Find quartiles, median, and IQR - Step 2: Find the 2 inner fences (lower and upper) and the 2 outer fences (lower and upper) - IFL = Q1 - 1.5 x (IQR) - IFH = Q3 + 1.5 x (IQR) - OFL = Q1 - 3 x (IQR) - OFH = Q3 + 3 x (IQR)
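The fence formulas can be sketched as a small helper that takes the quartiles as given. The function names and the quartile values are made up; the sketch assumes the usual convention that a mild outlier lies between an inner and an outer fence and an extreme outlier lies beyond an outer fence, with values exactly on a fence not counted as outliers.

```python
def fences(q1, q3):
    """Inner and outer fences of a modified boxplot from the quartiles."""
    iqr = q3 - q1
    return {
        "IFL": q1 - 1.5 * iqr,  # lower inner fence
        "IFH": q3 + 1.5 * iqr,  # upper inner fence
        "OFL": q1 - 3 * iqr,    # lower outer fence
        "OFH": q3 + 3 * iqr,    # upper outer fence
    }


def classify(value, q1, q3):
    """Label a value as an extreme outlier, a mild outlier, or not an outlier."""
    f = fences(q1, q3)
    if value < f["OFL"] or value > f["OFH"]:
        return "extreme outlier"
    if value < f["IFL"] or value > f["IFH"]:
        return "mild outlier"
    return "not an outlier"


# Hypothetical quartiles: Q1 = 10, Q3 = 20, so IQR = 10.
print(fences(10, 20))
for v in (15, 40, 60):
    print(v, classify(v, 10, 20))
```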

Name 2 Numerical Summaries for Quantitative Variables

1- Central Location (Central Tendency) 2- Spread (Dispersion)

Name 3 types of Graphical Summaries for Quantitative Variables and suitable sample size

1- Dot plots - small samples 2-Histograms and relative frequency histograms - Large samples 3- Box plots - large samples

What 2 measures of data are used for a bell-shaped distribution ?

1- Mean: To indicate central location 2- Sample variance / standard deviation: To indicate the spread of the data

What 2 measures are used with skewed data?

1- Median (Central Location) 2- IQR (Spread)

State the terms involved with the 5-number summary (Write out formulas for Q1 and Q3)

1- Min: The minimum value of the data set 2- Q1: The 25th percentile; separates the bottom 25% from the upper 75% of sorted values 3- Median (Q2): The middle value in the sorted set of values, separating the lower and upper halves; for an odd number of values it is at position (n+1)/2, and for an even number of values it is the average of the two middle values 4- Q3: The 75th percentile; separates the bottom 75% from the upper 25% of sorted values: Q3 = n+(n+1/4)(n2-n1) 5- Max: The maximum value of the data set - IQR = Q3 - Q1

What are the 5 characteristics of distribution? (Quantitative Variables)

1- Shape: Symmetrical, bell-shaped , skewed 2- Centre: A typical value ( mean, median, mode) - Where most of the data lies 3- Spread (Dispersion): Variation in the data (standard deviation, IQR, and range) 4- Clusters: Groups of observations that give rise to a bi-modal or multi-modal appearance 5- Outliers: A data point that is not consistent with the rest of the data set

What 2 graphical summaries show the relationship between the explanatory and response variable?

1. Dot Plots (n<10) 2. Box Plots (n>10)

If the data in all groups is approximately bell-shaped, then what measurements do we use for central location and spread?

1. Mean = Central Location 2. Standard deviation = Spread

If at least one of the groups of data has a skewed distribution then what measurements do we use for central location and spread ?

1. Median = Central Location 2. IQR = Spread

Name the 3 measures of a population

1. Population mean 2.Population variance 3.Population standard deviation - All can be calculated using the PROBABILITY DISTRIBUTION of a DISCRETE RANDOM VARIABLE
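As a sketch of how these three measures follow from a probability distribution, the snippet below uses the standard definitions: the mean is the sum of x times P(X = x), and the variance is the sum of (x - mean)^2 times P(X = x). The distribution itself is made up for illustration.

```python
import math

# Hypothetical probability distribution of a discrete random variable X.
pmf = {0: 0.25, 1: 0.5, 2: 0.25}

mu = sum(x * p for x, p in pmf.items())                     # population mean
sigma_sq = sum((x - mu) ** 2 * p for x, p in pmf.items())   # population variance
sigma = math.sqrt(sigma_sq)                                 # population standard deviation

print(mu, sigma_sq, sigma)
```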

Properties of Probability

1. Some events are more likely to occur than others 2. For an event A, we assign a number that conveys the likelihood of occurrence of A 3. This number is called the probability of the event A, denoted P(A)

What are the 2 requirements for a pdf ?

1. The pdf f(x) can't be negative for any value of x 2. The total area under the curve has to equal 1 - See 5-5 and 5-7 for 2 good examples

Properties of Probability

4. Given the sample space, the probabilities assigned to the outcomes must satisfy 2 basic requirements - The probability of each outcome must lie between 0 and 1: 0 ≤ P(Ei) ≤ 1 for all i - The sum of the probabilities of all of the outcomes in the sample space must equal 1

Pareto Chart

A bar chart for categorical data, where bars are arranged in order according to frequencies or relative frequencies - Bars are arranged, descending from the tallest (at the left) to the smallest (at the right)

Variable

A characteristic of a subject, e.g. the height of students

Random Variable

A function that assigns a unique numerical value to each outcome in a sample space - Values cannot be predicted with certainty - Usually denoted by capital letters, e.g. X and Y

What graph is used to summarize the relationship between 2 categorical variables ?

A stacked bar graph : Is a graph that is used to break down and compare parts of a whole. Each bar in the chart represents a whole, and segments in the bar represent different parts or categories of that whole.

Categorical Variable

A variable whose values are non-numeric categories - Has 2 types: nominal and ordinal

Explanatory Variable

A variable that might cause or explain a change in the response variable

Continuous Probability Distributions / pdf

Completely describes the random variable and is used to compute probabilities associated with the random variable - Probability density function (pdf), or density curve: the probability distribution for a continuous random variable

Union of Events

Denoted A ∪ B, is the event that consists of all outcomes in the sample space that are in A or B or in both

Intersection of Events

Denoted A ∩ B, is the joint event that consists of all outcomes in the sample space that are in both A and B

Binomial Random Variable

Describes the outcome of a binomial experiment - Maps each outcome of a binomial experiment to the number of successes, a value between 0 and n
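For a binomial random variable X with n trials and success probability p, the standard formula is P(X = k) = C(n, k) p^k (1 - p)^(n - k). The sketch below tabulates it for a made-up case; the function name `binomial_pmf` and the values n = 4, p = 0.5 are assumptions for illustration.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k): probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


n, p = 4, 0.5  # hypothetical: 4 tosses of a fair coin, success = heads
pmf = {k: binomial_pmf(k, n, p) for k in range(n + 1)}

for k, prob in pmf.items():
    print(k, prob)
```

The probabilities over k = 0, 1, ..., n add up to 1, as required of any probability distribution for a discrete random variable.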

What are the 3 things that can show the distribution of the data set ?

Dot plots, histograms and boxplots: Detect shape, location, spread and outliers Dot plots and histograms: Detect Clusters Boxplots: Don't detect clusters

Simple Events

E1, E2 and E3 are called simple events - Each is an individual outcome of an experiment - Simple events can't be broken down further

Frequency distribution table

For numerical data, it summarises and displays classes, frequencies, relative frequencies, and cumulative relative frequencies - SEE LECTURE SLIDES EXAMPLE WK-1 - e.g. Frequency interpretation: 4 observations fall between 130 and 134 (excluding 134) - Relative frequency interpretation: 10% of the observations fall between 130 and 134 (excluding 134) - Cumulative relative frequency interpretation: 85% of the observations are less than 134

Describe the numerical and graphical summaries for Categorical Variables (SEE WK 1 LECTURE NOTES)

Frequency and Relative Frequency Table 1- Count the frequency 2- Calculate the relative frequency 3- Store this all in a frequency table
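The three steps above can be sketched with `collections.Counter` from the standard library. The T-shirt size data is made up for illustration.

```python
from collections import Counter

sizes = ["S", "M", "M", "L", "M", "S", "L", "M"]  # hypothetical categorical data

counts = Counter(sizes)  # step 1: count the frequency of each category
n = len(sizes)

# Steps 2 and 3: relative frequency = frequency / n, stored alongside the frequency.
table = {category: (freq, freq / n) for category, freq in counts.items()}

for category, (freq, rel_freq) in table.items():
    print(f"{category:<4}{freq:>4}{rel_freq:>8.2f}")
```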

Nominal

Has no order to it, characterised by data containing names, labels, and categories, e.g. Gender

Ordinal

Has some order to it, characterised by data containing names, labels, and categories, e.g. T-shirt size

Describe IQR and what it measures

IQR, or the Interquartile Range, is the difference between the 3rd and 1st quartiles - IQR = Q3 - Q1 - The IQR provides the range of the middle 50% of the data - Measures variation, like the standard deviation

Binomial Random Variable in Statistical Sampling

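When sampling without replacement from a population, the number of successes in the sample is approximately binomial if the sample size is small relative to the population size (a common rule of thumb is a sample of at most 5% of the population)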

Continuous (Random Variable)

A random variable is continuous if it can take on an infinite number of values within the range of the variable, e.g. height, weight - Associated with measuring

Discrete (Random Variable)

A random variable is discrete if the set of all its possible values is finite or countably infinite, e.g. number of people taking STA1LS - Associated with counting

What is probability mass function (pmf) and how is it found ?

Pmf is denoted P(X = x) and is the probability that a discrete random variable X is equal to some specific value x - To find the pmf, take each value x (NOT X), identify the outcomes that give that value, then add the probabilities associated with those outcomes - See slide 4-10 for properties of a valid probability distribution for a discrete random variable

Population Variance

Population variance (σ^2) tells us how data points in a specific population are spread out. It is the average of the squared distances from each data point in the population to the mean. - Expressed in squared units - Expected value of the squared deviations about the population mean

The Complement Rule (For a Binomial Random Variable)

States that the sum of the probabilities of an event and its complement must equal 1, or for the event A, P(A) + P(A') = 1 see 4-34 for formula

Complement Rule

The Complement Rule states that the sum of the probabilities of an event and its complement must equal 1, or for the event A, P(A) + P(A') = 1

What kind of table is used to summarize the relationship between 2 categorical variables ?

Two-way table: - A table that covers all contingencies for the combination of the 2 variables - Explanatory Variable = Rows - Response Variable = Columns

Define and describe Probability

The extent to which an event is likely to occur, measured by the ratio of the favorable cases to the whole number of cases possible. - It provides a link so that the sample statistics can be used to make inferences about the population parameters: - Population Parameters: Most difficult to measure i.e. the height of everyone at a University - Sample statistic i.e. mean height : Gives an estimation of population parameter

Population Mean

The population mean is the average of a characteristic over the whole population. The population mean symbol is μ. μ = (ΣX) / N - The population mean may not be a possible value of the random variable

Empty set

The set that contains no outcomes, denoted Ø - Its probability is 0: P(Ø) = 0

Probability of an Event

The probability of an event A is a number between 0 and 1 (including those end points) that measures the likelihood that A will occur - If the probability of an event is close to 1 then the event is likely to occur - If the probability of an event is close to 0 then the event is not likely to occur

Cumulative Probability / Cumulative distribution function (cdf)

The probability that a random variable X takes on a value less than or equal to a (where a is a constant) - cdf: see 4-32 for formula - see 4-35 and 4-36 for Cumulative Difference Rule I and II

Conditional Probability Rule

The probability that event A occurs, given that event B has occurred, is called a conditional probability. The conditional probability of A, given B, is denoted by the symbol P(A|B). - See Slide 3-29 for formula

Independence

Two events are independent when the occurrence of one does not affect the probability of the occurrence of the other. P(A|B) = P(A) or P(B|A) = P(B)
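A quick way to check independence is to compute a conditional probability and compare it with the unconditional one. The sketch below assumes the standard formula P(A|B) = P(A ∩ B) / P(B) and one roll of a fair die; the events and helper names are made up for illustration.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a fair die


def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))


def conditional(a, b):
    """P(a | b) = P(a ∩ b) / P(b); assumes P(b) > 0."""
    return prob(a & b) / prob(b)


A = {2, 4, 6}     # roll is even
B = {1, 2, 3, 4}  # roll is at most 4
C = {4, 5, 6}     # roll is greater than 3

print(conditional(A, B), prob(A))  # 1/2 1/2 -> A and B are independent
print(conditional(A, C), prob(A))  # 2/3 1/2 -> A and C are not independent
```

Knowing that B occurred leaves the probability of A unchanged, while knowing that C occurred raises it, so only the first pair is independent.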

Mutually Exclusive Events

Two events are mutually exclusive if they have no sample points in common. P(A∩B) = 0

Discrete

Countable values, usually whole numbers, e.g. Number of leaves on a plant

