Probability Theory Exam 1 9/29/22 UNFINISHED
Pie chart
What type of chart is depicted here?
Scatter plot
What type of chart is shown here?
Left skew
What type of distribution shape is shown here?
Deviation
An observation's what is the distance between its value x and the sample mean?
Right skew
What type of distribution shape is shown here?
Symmetrical
What type of distribution shape is shown here?
Complement
"The event A did not occur" is described as the what of the event A?
Categorical
Are pie charts and bar charts categorical or numerical?
Numerical
Are stem and leaf plots, histograms, boxplots, and scatterplots categorical or numerical charts?
Probability is defined as a proportion, and it always takes values between 0 and 1 (inclusively).
Describe what this formula means.
Probability
For any phenomenon, the __________ of an event is the proportion of times the event is expected to occur.
The median and IQR
For distributions containing extreme observations, which data provide a more accurate sense of center and spread?
Formula.
Formula for the sample mean.
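The formula image for this card is not reproduced here. As a minimal sketch, assuming the usual definition (sum of all observations divided by the number of observations), the sample mean can be computed as below. The function name `sample_mean` and the data values are made up for illustration.

```python
def sample_mean(xs):
    """Sample mean: (x1 + x2 + ... + xn) / n."""
    if not xs:
        raise ValueError("need at least one observation")
    return sum(xs) / len(xs)


data = [2, 4, 4, 5, 10]  # hypothetical sample, n = 5
print(sample_mean(data))  # 25 / 5 = 5.0
```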
1. Symmetry, or lack of it (skew) 2. Minimum and maximum values 3. Regions of high frequency (modes)
Histograms show important features of the shape of a distribution. Name three.
They are shown with dots outside of the whiskers.
How are potential outliers shown on boxplots?
We denote events by upper-case letters at the beginning of the alphabet: A, B, C, ..., as well as the special notation 𝜙 for the impossible (empty) event.
How do we denote events?
(A∩B)
How is the intersection of A and B denoted?
We first find the median (Q2). Q1 is then the median of the observations between the minimum and the median, and Q3 is the median of the observations between the median and the maximum.
How, in this class, are we to calculate the quartiles?
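The median-of-halves method described in this card can be sketched as follows. One detail is an assumption, not stated in the card: when the number of observations is odd, the middle observation is left out of both halves. The function name `quartiles` and the data are hypothetical.

```python
from statistics import median


def quartiles(xs):
    """Return (Q1, Q2, Q3) using the median-of-halves method.

    Assumption: for an odd number of observations, the middle
    observation is excluded from both the lower and upper half.
    """
    s = sorted(xs)
    n = len(s)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = s[:half]
    upper = s[half:] if n % 2 == 0 else s[half + 1:]
    return median(lower), median(s), median(upper)


data = [1, 3, 4, 6, 7, 8, 10, 12]  # hypothetical sample
print(quartiles(data))  # (3.5, 6.5, 9.0)
```

Other textbooks and software packages use slightly different quartile conventions, so results may differ from this sketch on small samples.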
Prob(A∩B) = 0
If (A∩B) is an empty event, what is the Prob(A∩B)?
Their intersection is the empty event 𝜙 and has probability 0.
If A and B are mutually exclusive events, they cannot both occur in the same experiment. What is their intersection?
Independent
If Prob(A∩B) = Prob(A) x Prob(B), events A and B are said to be what?
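As a small illustration of the product rule for independence, the sketch below enumerates all outcomes of rolling two fair dice and checks whether Prob(A∩B) equals Prob(A) x Prob(B). The two events chosen (first die even, second die shows 6) and all names are hypothetical examples, not from the course material.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two dice.
space = list(product(range(1, 7), repeat=2))


def prob(event):
    """Count of outcomes in the event / count of outcomes in the sample space."""
    return Fraction(sum(1 for o in space if event(o)), len(space))


def A(o):
    return o[0] % 2 == 0  # first die is even


def B(o):
    return o[1] == 6  # second die shows 6


def both(o):
    return A(o) and B(o)  # the intersection (A∩B)


independent = prob(both) == prob(A) * prob(B)
print(prob(A), prob(B), prob(both), independent)  # 1/2 1/6 1/12 True
```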
The median is the average of the two middle observations
If the number of observations in a set is even, what is the median?
The median is the middle observation
If the number of observations in a set is odd, what is the median?
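The even and odd cases from the two cards above can be combined into one short function. This is a sketch with a hypothetical function name; Python's standard library also provides `statistics.median`, which behaves the same way.

```python
def median(xs):
    """Middle observation (odd n) or average of the two middle observations (even n)."""
    s = sorted(xs)
    n = len(s)
    if n == 0:
        raise ValueError("need at least one observation")
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


print(median([5, 1, 3]))     # 3
print(median([4, 1, 3, 2]))  # 2.5
```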
This is statistical induction, or inference.
If we flipped a coin 2,000 times and we got 1,973 heads, we would reasonably claim that we have very good evidence that the coin is biased towards heads. This is considered a statistical what?
Histogram
In this type of graph: 1. The bars touch 2. X-axis has intervals 3. Y-axis has counts, percentages, or proportions 4. Each interval includes its left endpoint (but not the right)
It always takes values between 0 and 1 (inclusively). It may also be expressed as a percentage between 0% and 100%.
Probability is defined as a proportion, and it always takes values between what numbers?
The sample mean
The __________ of a variable is the sum of all observations divided by the number of observations, where x1, x2, ..., xn represent the n observed values in a sample.
Distribution
The collection of values for a numerical, continuous variable (e.g., weight) is called the what for that variable?
Stem and leaf plot
The data presented here is in what kind of chart form?
Intersection
The event "both A and B occur in the same experiment" is said to be the what of the events A and B? This event is denoted (A∩B).
Union
The event "either A occurs or B occurs or both events occur in the same experiment" is said to be the what of the events A and B? This involves the General Addition Rule. This event is denoted A∪B.
Mutually exclusive (disjoint)
The events A and B are said to be what if they cannot both occur in the same experiment?
Numerical
What type of a chart is a stem and leaf plot?
Median and IQR
These data are called robust estimates because they are less likely to be affected by extreme values than the mean and standard deviation.
Bar chart
What type of chart is depicted here?
The standard deviation
This (𝑠) is the square root of the variance. It measures (approximately) the distance between a typical observation and the mean.
Event
This consists of one or more outcomes and is a subset of the sample space (e.g., rolling an even number).
Memorize it.
This depicts how to find the probability of the union of three events and how they intersect.
Ordinal variable
This is a categorical variable that has groups that can be ordered (e.g., education)
Nominal variable
This is a categorical variable with no natural ordering of levels (e.g., gender).
Variable
This is a characteristic observed that takes on different values in different persons, places, or things
Parameter
This is a measure from a population.
Statistic
This is a measure from a sample.
Discrete variable
This is a numerical variable that can only take on integer values (e.g., number of family members).
Outlier
This is a striking deviation from the overall pattern or shape of the distribution
Sample
This is a subset of items selected from a population.
Categorical variable
This is a variable that can be separated into groups.
Numerical variable
This is a variable that takes on numerical values, such that numerical operations (sums, differences, etc.) are reasonable.
Continuous variable
This is a numerical variable that can take any of an infinite number of values within a range (e.g., height).
Binary/dichotomous variable
This is a variable with only two levels (e.g., pass or fail).
Study the image.
This is an image describing the differences between populations, samples, data, statistics, and parameters.
Population
This is any set of items or measurements of interest.
This is the union of events A and B
This is denoted A∪B.
The interquartile range (IQR)
This is the distance between the third and first quartiles (Q3 - Q1).
Formula
This is the formula for the deviation of an observation.
Study the formula.
This is the formula for the interquartile range.
Study it.
This is the formula for the probability of an event.
Study.
This is the formula for the sample variance.
Study it.
This is the formula for the standard deviation.
Outcome
This is the result of a single trial in a probability experiment (e.g., rolling a 6).
Statistics
This is the science of analyzing data where chance has played some part. It provides a process for handling data where randomness arises.
Sample space (S)
This is the set of all possible mutually exclusive outcomes (e.g., {1, 2, 3, 4, 5, 6} for one roll of a die).
The sample variance 𝑠^2
This is the sum of the squared deviations divided by the number of observations minus 1.
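A minimal sketch of the sample variance as defined in this card, together with the standard deviation as its square root. The function names and the data are hypothetical.

```python
from math import sqrt


def sample_variance(xs):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def sample_sd(xs):
    """Standard deviation s: the square root of the sample variance."""
    return sqrt(sample_variance(xs))


data = [2, 4, 4, 5, 10]  # hypothetical sample with mean 5
# Deviations: -3, -1, -1, 0, 5; squared: 9, 1, 1, 0, 25; sum: 36
print(sample_variance(data))  # 36 / 4 = 9.0
print(sample_sd(data))        # 3.0
```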
Range
This is the term for the Maximum value - Minimum value.
The 𝑝th percentile
This is the term for the observation such that 𝑝% of the remaining observations fall below this observation.
Memorize.
This is the union of three events. Memorize.
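The image for this card is not reproduced here. The standard identity for the probability of the union of three events, which is presumably what the card shows, extends the General Addition Rule: add the single events, subtract the pairwise intersections, and add back the triple intersection.

```latex
\begin{aligned}
\mathrm{Prob}(A \cup B \cup C) ={}& \mathrm{Prob}(A) + \mathrm{Prob}(B) + \mathrm{Prob}(C) \\
&- \mathrm{Prob}(A \cap B) - \mathrm{Prob}(A \cap C) - \mathrm{Prob}(B \cap C) \\
&+ \mathrm{Prob}(A \cap B \cap C)
\end{aligned}
```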
Median
This is the value of the middle observation in a sample.
Scatter plot
This type of chart shows strength, direction, and structure of the relationship between the variables.
Boxplot
This type of graph indicates the positions of the first, second, and third quartiles of a distribution in addition to potential outliers, observations that are far from the center of a distribution.
25th percentile
To what percentile does the first quartile (Q1) correspond?
The second quartile is the median; it is the 50th percentile.
To what percentile does the second quartile (Q2) correspond?
75th percentile
To what percentile does the third quartile (Q3) correspond?
Boxplot
What type of chart is depicted here?
TRUE
True or False: Given a sample space, the sum of the probabilities of each outcome must equal 1.
TRUE
True or False: The mean is generally pulled in the direction of the skew.
Prob(A∩B) = Prob(A) x Prob(B)
Two events A and B are said to be independent if and only if what is true?
Histograms are not so good for: 1. Displaying the median and quartiles 2. Showing subtle skewing 3. Identifying extreme values
What are histograms not good for displaying?
1. The bars touch 2. X-axis has intervals 3. Y-axis has counts, percentages, or proportions 4. Each interval includes its left endpoint (but not the right)
What are the four main traits of a histogram?
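The left-endpoint rule can be made concrete with a small counting sketch: each bin covers the interval from its left endpoint up to, but not including, its right endpoint. The function name, its parameters (`start`, `width`, `bins`), and the data are hypothetical.

```python
def histogram_counts(xs, start, width, bins):
    """Count observations per bin; bin i covers [start + i*width, start + (i+1)*width)."""
    counts = [0] * bins
    for x in xs:
        i = int((x - start) // width)  # index of the bin that contains x
        if 0 <= i < bins:
            counts[i] += 1
    return counts


data = [0, 5, 10, 10, 19, 20]  # hypothetical observations
# Bins: [0, 10), [10, 20), [20, 30). The value 10 falls in the second bin.
print(histogram_counts(data, start=0, width=10, bins=3))  # [2, 3, 1]
```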
Stem and leaf plots, histograms, boxplots, and scatterplots
What are the four main types of numerical charts?
Symmetry, Skewness, Center, Spread, Peaks, Clusters, Gaps, Outliers
What are the main features of data for graphing?
1. Write data values in ascending order. 2. Decide a stem (typically everything to the left of the last digit) 3. Decide a leaf (typically last digit) 4. Write the stems in ascending (or descending) order 5. Write the leaves in ascending order 6. Include stems even when there are no leaves
What are the steps to create a stem and leaf plot?
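The steps above can be sketched in code. This version assumes non-negative whole numbers, takes the last digit as the leaf and everything to its left as the stem, and prints stems even when they have no leaves. The function name and the data are hypothetical.

```python
def stem_and_leaf(values):
    """Return the lines of a stem and leaf plot for non-negative integers."""
    if not values:
        raise ValueError("need at least one observation")
    stems = {}
    for v in sorted(values):          # step 1: ascending order
        stem, leaf = divmod(v, 10)    # steps 2-3: stem and last-digit leaf
        stems.setdefault(stem, []).append(leaf)
    lines = []
    # Steps 4-6: stems ascending, leaves ascending, empty stems included.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines


for line in stem_and_leaf([27, 12, 43, 15, 21, 27]):
    print(line)
# 1 | 25
# 2 | 177
# 3 |
# 4 | 3
```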
1. They show the shape of distribution, center, and spread 2. Good for comparing different groups 3. Use Five Number Summary
What are the three main characteristics of boxplots?
1. The categories are on the x-axis 2. The counts or percentages are on the y-axis 3. The bars don't touch
What are the three main traits of a bar chart?
Ordinal and nominal
What are the two main categories of categorical variables?
Discrete and continuous
What are the two main categories of numerical variables?
Numerical and categorical
What are the two main categories of variables?
Pie charts and bar charts
What are the two main types of categorical charts?
Categorical and numerical
What are the two main types of graphs?
Center and spread
What are the two most important characteristics of a distribution?
This is the intersection of events A and B, the event that both A and B occur in the same experiment.
What does (A∩B) mean?
A^c is the complement of event A.
What does A^c mean?
Histogram
What type of chart is depicted here?
The occurrence of one of the events does not change the probability that the other event occurs.
What does it mean if two events are independent?
50% of observations lie below/above the median.
What does it mean to say that the median is the 50th percentile?
This is the union of event A and B, the event either A occurs or B occurs or both events occur in the same experiment
What does this venn diagram depict in terms of probability?
This is the intersection of events A and B. (A∩B).
What is denoted by the blue region?
Minimum, Q1, Median, Q3, Maximum
What is the Five Number Summary?
A∪B.
What is the denotation of the union of events A and B?
If the value < Q1 - (1.5 x IQR)
What is the formula for finding a lower outlier of a box plot?
If the value > Q3 + (1.5 x IQR)
What is the formula for finding an upper outlier of a box plot?
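The two outlier rules above can be checked together. This sketch takes Q1 and Q3 as given inputs; the function names and the data are hypothetical.

```python
def outlier_fences(q1, q3):
    """Return (lower fence, upper fence) = (Q1 - 1.5*IQR, Q3 + 1.5*IQR)."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def potential_outliers(xs, q1, q3):
    """Values strictly below the lower fence or strictly above the upper fence."""
    low, high = outlier_fences(q1, q3)
    return [x for x in xs if x < low or x > high]


data = [1, 3, 4, 6, 7, 8, 10, 30]  # hypothetical sample
# With Q1 = 3.5 and Q3 = 9.0, the IQR is 5.5 and 1.5 x IQR is 8.25.
print(outlier_fences(3.5, 9.0))            # (-4.75, 17.25)
print(potential_outliers(data, 3.5, 9.0))  # [30]
```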
Prob(Event) = Count of outcomes in event/count of outcomes in sample space
What is the formula for the probability of an event?
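For equally likely outcomes, this counting formula can be sketched with sets. The example (rolling an even number on one die) and the names are illustrative assumptions.

```python
from fractions import Fraction


def prob(event, sample_space):
    """Count of outcomes in the event / count of outcomes in the sample space."""
    return Fraction(len(event & sample_space), len(sample_space))


sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a die
even = {2, 4, 6}                   # the event "rolling an even number"
print(prob(even, sample_space))    # 1/2
```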
Prob(A^c) = 1-Prob(A)
What is the formula for the probability of the complement of A?
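As a quick check of the complement rule, the sketch below computes the probability of "not rolling a 6" two ways: with the formula and by counting directly. The event and variable names are hypothetical.

```python
from fractions import Fraction

sample_space = set(range(1, 7))  # one roll of a die
A = {6}                          # the event "rolling a 6"

p_A = Fraction(len(A), len(sample_space))

# Complement rule: Prob(A^c) = 1 - Prob(A)
p_not_A_rule = 1 - p_A

# Direct count of the outcomes that are not in A
p_not_A_count = Fraction(len(sample_space - A), len(sample_space))

print(p_not_A_rule, p_not_A_count)  # 5/6 5/6
```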
Prob(A∪B) = Prob(A) + Prob(B) - Prob(A∩B)
What is the formula for the union of events A and B?
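The General Addition Rule can be verified on a small example by comparing the formula with a direct count of the union. The events (an even roll, a roll greater than 4) and all names are hypothetical.

```python
from fractions import Fraction

sample_space = set(range(1, 7))  # one roll of a die
A = {2, 4, 6}                    # roll is even
B = {5, 6}                       # roll is greater than 4


def prob(event):
    return Fraction(len(event), len(sample_space))


# General Addition Rule: subtract the intersection so it is not counted twice.
p_union_rule = prob(A) + prob(B) - prob(A & B)

# Direct count of the outcomes in the union.
p_union_count = prob(A | B)

print(p_union_rule, p_union_count)  # 2/3 2/3
```

If A and B were mutually exclusive, the intersection term would be 0 and the rule would reduce to Prob(A) + Prob(B).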
The guideline is to use about √n bins, where n is the sample size.
What is the guideline for how many bars or "bins" are used in a histogram?
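Because √n is rarely a whole number, some rounding is needed in practice. The sketch below rounds to the nearest integer, which is an assumption; the card does not say how to round. The function name is hypothetical.

```python
from math import sqrt


def suggested_bins(n):
    """Square-root guideline for the number of histogram bins.

    Assumption: round to the nearest whole number, with at least one bin.
    """
    if n < 1:
        raise ValueError("sample size must be at least 1")
    return max(1, round(sqrt(n)))


print(suggested_bins(100))  # 10
print(suggested_bins(50))   # 7
```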
𝜙
What is the notation for an impossible event?
Whiskers capture data between Q1−(1.5×IQR) and Q3+(1.5×IQR). Whiskers must end at data points.
What is the set of data that the whiskers of a box plot capture?
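Since whiskers must end at data points, each whisker stops at the most extreme observation that still lies within the fences. A minimal sketch, taking Q1 and Q3 as given inputs and using hypothetical names and data:

```python
def whisker_ends(xs, q1, q3):
    """Return the (lower, upper) whisker endpoints of a boxplot.

    The whiskers reach the smallest and largest observations that lie
    between Q1 - 1.5*IQR and Q3 + 1.5*IQR (inclusive).
    """
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [x for x in xs if low_fence <= x <= high_fence]
    return min(inside), max(inside)


data = [1, 3, 4, 6, 7, 8, 10, 30]  # hypothetical sample
# Fences are -4.75 and 17.25, so 30 is drawn as a dot beyond the whisker.
print(whisker_ends(data, 3.5, 9.0))  # (1, 10)
```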
It is a deduction or implication.
What is the term for a probability calculation?
Five Number Summary
What is the term for the collection of data that includes the Minimum, Q1, Median, Q3, and Maximum?
The events A and B are mutually exclusive.
What is the venn diagram displaying here in terms of probability?
The sample variance
What is this formula for?
Standard deviation
What is this the formula for?
The interquartile range
What is this the formula for?
The probability of the complement of A.
What is this the formula for?
The union of events A and B: the event "either A occurs or B occurs or both events occur in the same experiment."
What is this the formula for?
50th
What percentile is the median?
Categorical
What type of a chart is a bar chart?
Numerical
What type of a chart is a boxplot?
Numerical
What type of a chart is a histogram?
Categorical
What type of a chart is a pie chart?
Numerical
What type of a chart is a scatterplot?
The rectangle extends from the first quartile to the third quartile, with a line at the second quartile (median).
Where is the rectangle on a boxplot? What does the line in the middle of the rectangle mean?
They are less likely to be affected by extreme values than the mean and standard deviation.
Why are the median and IQR called robust estimates?