TOPIC 2 Probability
Because a sample is randomly chosen from a larger population,
any statistic, such as the sample mean, will vary from sample to sample.
The outcome of a statistical experiment
is a random variable. It is called random because its value cannot be predicted and is known only after the experiment has been performed. For example, we do not know beforehand whether a coin will land heads or tails up; the experimental process can randomly generate either a head or a tail as the result.
The symbol '!'
is called factorial and works as follows: n! = n(n − 1)(n − 2)...(1). Furthermore, 1! = 1 and 0! = 1. For example: 6! = 6 × 5 × 4 × 3 × 2 × 1 = 720.
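A minimal Python sketch of the factorial definition, checked against the standard library's `math.factorial`. The function name `factorial` is only illustrative.

```python
import math

def factorial(n: int) -> int:
    """Return n! = n(n - 1)(n - 2)...(1), with 0! = 1 by definition."""
    if n < 0:
        raise ValueError("n! is only defined for non-negative integers")
    result = 1
    for k in range(2, n + 1):  # the loop body never runs for n = 0 or n = 1, so both give 1
        result *= k
    return result

print(factorial(6))                        # 720
print(factorial(1), factorial(0))          # 1 1
print(factorial(6) == math.factorial(6))   # True: agrees with the standard library
```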
Cumulative probability
is the measure of the chance that two or more events will happen. Usually, this consists of events in a sequence, such as flipping "heads" twice in a row on a coin toss, but the events may also be concurrent.
favourable event
is the specific event in which we are interested, and outcomes are the set of logically possible events relating to the particular problem. In the coin example there are two possible outcomes: the coin could fall either heads (one event) or tails (another event).
12. A researcher is interested in the IQ of students at his college. The researcher believes that the IQ of college students, measured by means of a standardised test, has a mean of 110 and a standard deviation of 15. The researcher takes a random sample of college students and finds that the mean IQ is 120. Which of the following situations provides the strongest evidence that the mean IQ of his students is greater than 110? (Note: n = size of sample.)
n = 100. Larger samples are more likely to describe the population mean accurately.
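To see why the largest sample gives the strongest evidence, the sketch below computes how many standard errors (σ/√n, from the central limit theorem) a sample mean of 120 lies above the hypothesised mean of 110. Only n = 100 comes from the answer above; the sizes 4 and 25 are hypothetical comparison values, since the other answer options are not shown.

```python
import math

MU_0 = 110        # hypothesised population mean IQ
SIGMA = 15        # population standard deviation
SAMPLE_MEAN = 120

def z_for_sample_mean(n: int) -> float:
    """z-score of the sample mean, using the standard error sigma / sqrt(n)."""
    standard_error = SIGMA / math.sqrt(n)
    return (SAMPLE_MEAN - MU_0) / standard_error

for n in (4, 25, 100):   # 4 and 25 are hypothetical sample sizes for comparison
    print(n, round(z_for_sample_mean(n), 2))   # 1.33, 3.33 and 6.67 respectively
```

The same 10-point difference becomes far less plausible as a chance result when n is large, because the standard error shrinks.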
3. A street magician shows you a deck of 52 playing cards and asks you to randomly pick a card from the pack, without showing him what card you take. He then correctly informs you that you picked an ace. What is the probability that he could have guessed correctly simply by chance?
1/13. There are four aces in an ordinary deck of 52 cards, so the probability of picking an ace by chance is P(ace) = 4/52 = 1/13.
Suppose that the population distribution of a dependent variable is assumed to be normal, with a mean of 50 and a standard deviation of 10. Suppose, further, that samples of the same size are drawn randomly from the population with replacement. 2. Can the frequency distribution of all the samples be fully specified? There could be an infinitely large number of samples, each with its own mean and standard deviation.
Yes, provided that the population mean is specified (hypothesised) and the distribution of sample means can be assumed to be normal. The latter assumption may be made if it can be assumed that the population has a normal distribution, or if a very large sample is selected from the population.
14. A national survey of college students indicates that students drink an average of 4.1 alcoholic beverages per week. A researcher randomly selects 30 college students and asks each one how many alcoholic beverages he or she consumes per week. The researcher finds that the students surveyed drank 182 alcoholic beverages during the week. What is the population mean?
4.1. Our best guess about the population mean is given by the national survey, because it presumably involved a very large sample of students. We can, therefore, assume that the population mean is 4.1.
15. A national survey of college students indicates that students drink an average of 4.1 alcoholic beverages per week. A researcher randomly selects 30 college students and asks each one how many alcoholic beverages he or she consumes per week. The researcher finds that the students surveyed drank 182 alcoholic beverages during the week. What is the sample mean?
6.1. The sample mean is the total number of beverages divided by the number of students surveyed: 182/30 ≈ 6.07, or 6.1 to one decimal place.
The following can all be described using a binomial distribution:
• A diagnosis of patients as either diabetic or non-diabetic.
• Student answers classified as either correct or incorrect on a multiple-choice question.
• A participant in a parapsychology experiment guessing a card correctly or incorrectly.
• The gender of a newborn baby born in a hospital (i.e. either male or female).
• Psychology students passing or failing their statistics course.
• People watching a movie who either like it or don't like it.
The formula for the normal distribution produces distributions that are all bell-shaped, but the exact shape of a curve (how high it is and how spread out it is) depends only on the mean and the standard deviation of the distribution concerned. Normal distributions share a number of key properties, such as the following:
• They are bell-shaped. Most observations occur at the midpoint of the curve.
• They are symmetrical. The left side is a mirror image of the right side.
• They are continuous. Theoretically, the variable can assume an infinite number of values, measured on a truly continuous scale, so that the curve is smooth.
• Their curves are asymptotic: the two tails come ever closer to the horizontal axis as they extend towards infinity but never touch it, because there is always some probability that more extreme values will occur.
Binomial distributions can be symmetrical or skewed.
• Whenever p = 0.5, the binomial distribution will be symmetrical, regardless of how large or small the sample size, n, is.
• When p ≠ 0.5, the distribution will be skewed. If p < 0.5, the distribution will be positively (right-) skewed; if p > 0.5, it will be negatively (left-) skewed. The distribution becomes more symmetrical as p gets closer to 0.5 and as the sample size, n, gets larger.
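A minimal Python sketch of these claims. It uses the standard skewness formula for a binomial distribution, (1 − 2p)/√(np(1 − p)), which is not given in these notes; the values n = 10 and n = 100 are arbitrary illustration choices.

```python
from math import comb

def binomial_pmf(n: int, p: float) -> list[float]:
    """P(X = x) for x = 0, 1, ..., n when X is binomial with n trials and success probability p."""
    return [comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]

def binomial_skewness(n: int, p: float) -> float:
    """Skewness of a binomial distribution: (1 - 2p) / sqrt(np(1 - p))."""
    return (1 - 2 * p) / (n * p * (1 - p)) ** 0.5

# p = 0.5: the probabilities read the same forwards and backwards (symmetry)
pmf_half = binomial_pmf(10, 0.5)
print(all(abs(a - b) < 1e-12 for a, b in zip(pmf_half, reversed(pmf_half))))  # True

# Sign of the skew: positive (right) for p < 0.5, zero for p = 0.5, negative (left) for p > 0.5
for p in (0.2, 0.5, 0.8):
    print(p, round(binomial_skewness(10, p), 3))

# With p fixed at 0.2, the skew shrinks as n grows
for n in (10, 100):
    print(n, round(binomial_skewness(n, 0.2), 3))
```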
The probability of an event not happening is
(1 - p(E)).
Suppose that the population distribution of a dependent variable is assumed to be normal, with a mean of 50 and a standard deviation of 10. Suppose, further, that samples of the same size are drawn randomly from the population with replacement. 1. What would the mean of the sample means approach as the number of samples of the same size that are drawn approaches infinity?
It will approach the numerical value of the population mean (here, 50).
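A quick simulation can make this concrete. The minimal Python sketch below draws many samples from a normal population with μ = 50 and σ = 10 and averages their means; the sample size of 25, the number of samples and the random seed are arbitrary assumptions chosen for illustration.

```python
import random
import statistics

random.seed(1)            # fixed seed so the run is repeatable
MU, SIGMA = 50, 10        # population mean and standard deviation from the question
N = 25                    # assumed sample size (the question does not fix one)
NUM_SAMPLES = 10_000      # a large number of samples, standing in for "infinitely many"

sample_means = [
    statistics.fmean(random.gauss(MU, SIGMA) for _ in range(N))
    for _ in range(NUM_SAMPLES)
]
grand_mean = statistics.fmean(sample_means)
print(round(grand_mean, 2))   # close to 50
```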
8. A researcher at Unisa administers an attitude scale to a group of research participants. Their average score is 2, and the standard deviation is 2. Suppose a participant obtains a score of 2. What is her z-score?
0. Since her score is exactly the same as the mean, her z-score is z = (2 − 2)/2 = 0: the score does not deviate from the mean at all.
probabilities fall in a range from
0.0 to 1.0 when expressed as decimals.
9. The first exam in a statistics course yielded a normal distribution of scores with a mean of 35 and a standard deviation of 10. If you were to select the score of one student at random, what is the probability that the score would be 45 or above?
0.16. The z-score for a score of 45 is z = (45 − 35)/10 = 10/10 = 1. The score, therefore, lies one standard deviation above the mean, so that about 0.84 of the scores lie below it. The probability of a score of 45 or higher is, therefore, 1 − 0.84 = 0.16.
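The same calculation can be checked with Python's standard-library `statistics.NormalDist`, which computes the normal cumulative probability directly instead of reading it from a table. A minimal sketch:

```python
from statistics import NormalDist

exam = NormalDist(mu=35, sigma=10)          # distribution of exam scores
z = (45 - exam.mean) / exam.stdev           # (45 - 35) / 10 = 1.0
p_below = exam.cdf(45)                      # proportion of scores below 45
p_45_or_above = 1 - p_below                 # complement: 45 or above
print(z, round(p_below, 4), round(p_45_or_above, 4))   # 1.0 0.8413 0.1587
```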
1. A researcher randomly selects a child from a group of 300 boys and 400 girls to participate in a research experiment. What is the probability that the child selected will be male?
0.43. To calculate the probability that the child will be a boy, we use the formula p(boy) = number of favourable outcomes / number of possible outcomes = 300/700 ≈ 0.43. Therefore, there is a 0.43 probability that the child will be a boy.
2. With reference to the question above, what is the probability that the child selected will not be a male?
0.57. Since we have determined that p(boy) = 0.43, we know that p(not boy) = 1 − 0.43 = 0.57. There are only two possibilities (boy or not boy), so the probability of one is 1 minus the probability of the other.
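A minimal Python sketch of both answers, using exact fractions so that rounding only happens at the end:

```python
from fractions import Fraction

boys, girls = 300, 400
p_boy = Fraction(boys, boys + girls)   # favourable / possible = 300/700 = 3/7
p_not_boy = 1 - p_boy                  # complement rule: 1 - p(E)
print(p_boy, round(float(p_boy), 2))           # 3/7 0.43
print(p_not_boy, round(float(p_not_boy), 2))   # 4/7 0.57
```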
6. The standard normal distribution has a mean of ............... and a standard deviation of ............
0; 1 The definition of the standard normal distribution is that it has a mean of 0 and a standard deviation of 1.
Probability can be studied in three ways.
1. It can be approached in an a priori or classical manner, in which the focus is purely on reasoning and mathematical deduction.
2. It can be studied in an empirical or frequentist manner, where probability is analysed in terms of the relative frequency of an event's occurrence in actual observations and experiments.
3. It can be thought of in a purely subjective manner, as a degree of belief in something happening.
The sum of the probabilities of all simple events in S (sample space) equals
1. This characteristic follows from two facts, namely, that (a) the sample space lists all the possible outcomes associated with a given statistical experiment, (b) when all the outcomes are summed together we have the maximum possible probability, which is 1.
the binomial distribution applies in all cases where a random variable has the following properties:
• The random variable is defined for a sample that consists of a fixed number of experimental trials. Probabilities can be computed for different numbers of trials, but the number of trials must be kept constant for the determination of each probability in the distribution.
• The random variable has only two mutually exclusive and collectively exhaustive events, typically labelled 'success' and 'failure'.
• The terms 'success' and 'failure' apply to any outcome that has a binary character, such as 'yes' or 'no', 'hit' or 'miss', 'pass' or 'fail', '0' or '1', 'heads' or 'tails', 'correct' or 'incorrect'.
• The probability of an event being classified as a success, p, and the probability of an event being classified as a failure, 1 − p, are both constant in all the experimental trials. This simply means that the probabilities cannot change during the trials: you cannot start with one value of p and then later change it to, say, 0.6.
• The event (success or failure) of any single experimental trial is independent of (i.e. not influenced by) the event of any other trial.
Some interesting facts about the central limit theorem should be noted:
• The theorem gives the sampling distribution of the sample means for any population, irrespective of the shape, mean or standard deviation of the original population.
• The distribution of sample means becomes more normal as the sample size (n) increases, so that with larger and larger samples the shape of the distribution of sample means becomes increasingly normal in form.
• In fact, the distribution of sample means approaches a normal distribution very rapidly: by the time the sample size reaches n = 30, the distribution is usually very close to normal.
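A minimal Python sketch of the second point. It assumes a strongly right-skewed exponential population (mean 1) and uses skewness, the mean cubed z-score, as a rough measure of how far a shape is from normal (0 for a symmetrical shape). The population, the number of repetitions and the seed are all illustrative assumptions.

```python
import random
import statistics

def skewness(values: list[float]) -> float:
    """Skewness as the mean cubed z-score: 0 for a symmetrical (e.g. normal) shape."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

def skew_of_sample_means(n: int, reps: int = 20_000, seed: int = 0) -> float:
    """Draw `reps` samples of size n from a right-skewed exponential population and
    return the skewness of the resulting distribution of sample means."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]
    return skewness(means)

for n in (1, 2, 5, 30):
    print(n, round(skew_of_sample_means(n), 2))   # the skew falls steadily towards 0
```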
Continuous variables
• Variables such as age, weight and length are continuous.
• Their values are real numbers, which can take on any value within whatever limits the variable ranges between.
Suppose a population distribution is normal with μ = 50 and σ = 15. 3. Suppose a single case is selected from this population using a random selection process. What is the probability that the score will be greater than 70? What is the difference between 'case' and 'score'?
As we have already calculated in question 2, p(x > 70) = 0.092. A 'case' is the particular entity being observed, whereas a 'score' is a numerical value reflecting some characteristic of the case being considered.
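A minimal sketch with Python's `statistics.NormalDist`, showing both the table-style answer (z rounded to 1.33) and the answer without rounding z. The small difference is due only to rounding.

```python
from statistics import NormalDist

population = NormalDist(mu=50, sigma=15)
z = (70 - population.mean) / population.stdev    # 20 / 15 = 1.33...
p_table = 1 - NormalDist().cdf(round(z, 2))       # z rounded to 1.33, as when reading the tables
p_exact = 1 - population.cdf(70)                  # no rounding of z
print(round(z, 2), round(p_table, 3), round(p_exact, 3))   # 1.33 0.092 0.091
```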
5. A student writes in her research report that p(Hypothesis 1: true) = −0.3. Upon reading this, her supervisor becomes angry. Why?
A probability cannot be negative. Probability values fall in the range 0 to 1, so they can never be less than 0.
17. Why is a probability distribution always of a theoretical nature?
A relative frequency can only be used as a probability distribution if we assume that probability theory applies. Probability theory is a theoretical approach that makes use of theoretical constructs such as probability distributions to describe empirical phenomena.
How would you derive a probability distribution for the mean?
Assume that the theoretical distribution of the means of samples of the same size, selected randomly from the population, is normal, with a mean equal to the population mean and a standard deviation equal to σ/√n (if σ is known). The distribution can then be transformed to the standard normal distribution. (Note: If σ is not known, we shall have to use another distribution, the t-distribution, as you will see in Topic 3.)
18. Suppose we randomly draw a sample of five scores from a population and calculate the mean for this sample. The same procedure is repeated 10 times. 1. Why are the mean values for the samples not the same?
Each sample provides a different estimate of the population mean because of random sampling error.
Does random sampling ensure a sample that is representative of the population?
No, it does not, because sampling error plays a role even when random sampling is used. However, since sampling is random, it does make it possible to derive a probability distribution for a particular sample statistic, such as the sample mean. Given a hypothesised value for the population mean, we can judge the likelihood of our sample mean under the derived probability distribution of means.
Suppose a population distribution is normal with μ = 50 and σ = 15. 4. What is the probability that a single case selected randomly will have a score between 45 and 55?
For x = 45 we find z = (45 − 50)/15 = −0.33, and for x = 55 we find z = (55 − 50)/15 = 0.33. These two values are therefore equally far from the mean (z = 0) of the standard normal distribution. We can, therefore, look up p(x > 45) = p(z > −0.33), which is 0.6293, and subtract from it p(x > 55) = p(z > 0.33), which is 0.3707 (the larger portion and the smaller portion, respectively, for z = 0.33 in the standard normal distribution tables in Appendix D). This gives p(45 ≤ x ≤ 55) = 0.6293 − 0.3707 ≈ 0.26.
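A minimal Python sketch of the same calculation. `p_table` mimics the tables by rounding z to ±0.33; `p_between` uses the exact z = ±1/3. Both round to 0.26.

```python
from statistics import NormalDist

population = NormalDist(mu=50, sigma=15)
z_low = (45 - 50) / 15     # -0.333...
z_high = (55 - 50) / 15    #  0.333...

# Area between the two scores: p(x <= 55) - p(x <= 45)
p_between = population.cdf(55) - population.cdf(45)
# Same area with z rounded to two decimals, as when reading the tables
p_table = NormalDist().cdf(0.33) - NormalDist().cdf(-0.33)

print(round(z_low, 2), round(z_high, 2))          # -0.33 0.33
print(round(p_table, 4), round(p_between, 4))     # both about 0.26
```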
central limit theorem formula
If a simple random sample of size n is selected from a population with mean μ and standard deviation σ, the sampling distribution of means obtained from all possible samples is approximately normal with mean μ and standard deviation σ/ √n
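A minimal simulation comparing σ/√n with the standard deviation of many simulated sample means. It assumes a normal population with μ = 50 and σ = 10 (values borrowed from the earlier questions); the number of repetitions and the seed are arbitrary.

```python
import math
import random
import statistics

def simulated_se(mu: float, sigma: float, n: int, reps: int = 5_000, seed: int = 0) -> float:
    """Standard deviation of `reps` simulated sample means of size n from a normal population."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(reps)]
    return statistics.stdev(means)

MU, SIGMA = 50, 10   # assumed population values
for n in (4, 25, 100):
    theory = SIGMA / math.sqrt(n)
    print(n, round(theory, 2), round(simulated_se(MU, SIGMA, n), 2))   # the two columns agree closely
```

The output also illustrates why the spread of the sample means grows as n decreases.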
4. Select the statement below that provides the most accurate formulation of the law of large numbers:
If a statistical experiment is performed a large number of times, the relative frequency of a specific outcome will tend to converge on its theoretical probability.
The law of large numbers states the following:
If an experiment is done repeatedly, and if the outcomes are independent of one another, the observed proportion of favourable occurrences of an event will eventually approach its theoretical probability.
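A minimal Python sketch of the law of large numbers for a fair coin: the observed proportion of heads is printed at a few checkpoints and drifts towards the theoretical probability of 0.5. The seed and number of flips are arbitrary.

```python
import random

rng = random.Random(42)
heads = 0
checkpoints = {10, 100, 1_000, 10_000, 100_000}
for flips in range(1, 100_001):
    if rng.random() < 0.5:          # a fair coin: heads with probability 0.5
        heads += 1
    if flips in checkpoints:
        print(flips, heads / flips)   # observed proportion of heads so far
```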
Suppose a researcher transforms each score in a non-normal population to a z-score. Will these scores be normally distributed? Are the z-scores in the z-tables normally distributed?
No. The z-transformation does not change the shape of the original distribution. The z-scores in the z-tables are normally distributed because the z-table specifies, by definition, the standard normal distribution.
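A minimal Python sketch showing that the z-transformation only shifts and rescales scores. The scores are hypothetical and deliberately right-skewed; after the transformation their mean is 0 and their SD is 1, but their skewness (the shape) is unchanged.

```python
import statistics

def z_scores(values: list[float]) -> list[float]:
    """Transform each score with z = (x - mean) / SD, treating the scores as the whole population."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]

def skewness(values: list[float]) -> float:
    """Mean cubed z-score: 0 for a symmetrical shape, positive for a right skew."""
    return statistics.fmean(z ** 3 for z in z_scores(values))

scores = [1, 1, 2, 2, 2, 3, 3, 4, 6, 9, 15]   # hypothetical, clearly right-skewed scores
z = z_scores(scores)
print(round(statistics.fmean(z), 6), round(statistics.pstdev(z), 6))   # mean 0 and SD 1
print(round(skewness(scores), 4), round(skewness(z), 4))               # same skew before and after
```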
Why is the central limit theorem important?
It often happens that we doubt the assumption that the population distribution is normal. However, the central limit theorem states that, for a large sample size, the sampling distribution of a mean is close to normal, irrespective of the shape of the population distribution of the original data. This enables us to make inferences about means and develop test statistics for means.
a normal distribution.
Many of the scores that we use cluster around the average and tail off towards the ends of the distribution. Such a distribution is called a normal distribution because it can be used to describe the distribution of many naturally (or 'normally') occurring continuous variables. It is also commonly referred to as the normal curve, because the distribution can be plotted as a bell-shaped curve.
Because the concept of a sampling distribution is rather abstract, it is useful to consider a simple example.
Suppose that an entire population consists of only five medical residents working in the emergency section (ER) of Johannesburg General Hospital, and that we want to determine the average age of the residents. We could proceed by selecting a sample of two residents, determine their individual ages, calculate their mean age and use this to estimate the corresponding population parameter.
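A minimal Python sketch of this example. The notes do not give the residents' ages, so the ages below are hypothetical. The sketch lists every possible sample of two residents (10 in all, drawn without replacement), computes each sample mean, and shows that the mean of these sample means equals the population mean.

```python
from itertools import combinations
from statistics import mean

ages = {"A": 26, "B": 28, "C": 30, "D": 33, "E": 38}   # hypothetical ages of the five residents
population_mean = mean(ages.values())

sample_means = []
for pair in combinations(ages, 2):       # every possible sample of two residents
    m = mean(ages[r] for r in pair)
    sample_means.append(m)
    print(pair, m)                       # one sample and its mean

print(len(sample_means), population_mean, mean(sample_means))   # 10 samples; both means equal 31
```

The list of ten sample means is the sampling distribution of the mean for samples of size 2 from this small population.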
When are we required to consider events as being interdependent?
When the occurrence of one event affects the probability of another. The Lotto is an example of such a situation: once a number has been drawn, it cannot be drawn again, and this changes the probability of the next number to appear in the sequence of numbers.
two important rules for combining probabilities,
The additive rule and the multiplicative rule.
18. Suppose we randomly draw a sample of five scores from a population and calculate the mean for this sample. The same procedure is repeated 10 times. 2. Given the 10 samples and the fact that the mean of the population distribution is unknown, what would the best estimate of the population mean be?
The best estimate of the population mean can be obtained by calculating the mean of the 10 means. NB: The 10 means can now be considered in the same way as any set of 10 scores, and we might be interested in the mean, standard deviation, frequency distribution, et cetera, of this set of scores.
16. A university researcher is interested in the incomes of her psychology graduates. A national survey shows that university graduates (from all departments) earn R127 500 on average per year, with a standard deviation of R30 000. The researcher believes that her college's alumni make more than R127 500 a year (i.e. the researcher believes that the population of psychology students is different from the population of all university students). The researcher says: 'I talked to two graduates [a sample of 2] and they both make over R200 000 a year! Obviously, our students make more than R127 500 per year after university.' Do you agree with this researcher? If not, indicate why not. Cite statistical evidence in support of your answer (but no calculations are necessary).
The researcher is generalising on the basis of the two graduates who were questioned about their salaries. The researcher's sample (only two graduates) is probably too small to make valid inferences about the population of psychology graduates.
10. Which of the following statements about population parameters is the most accurate?
They are usually unknown. Population parameters are usually unknown and have to be inferred from sample data.
One form of the normal distribution is of special importance: the standard normal distribution (z-distribution curve).
This curve has a mean of μ = 0 and a standard deviation of σ = 1
Suppose a population distribution is normal with μ = 50 and σ = 15. 5. Suppose 25 cases are randomly selected from the population and the mean for this sample is 45. Is it possible to obtain a sample mean of 45 when one assumes that the population mean is equal to 50? Is it possible for the researcher to derive a theoretical distribution of the mean without selecting even a single sample?
Yes. Any result could be due to sampling error, although some results (sample means) in this example will be less probable than others. It is possible to derive a theoretical distribution of the mean, provided one knows the population standard deviation and hypothesises a value for the population mean.
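A minimal Python sketch of deriving that theoretical distribution without drawing any sample: under the hypothesis μ = 50, with σ = 15 and n = 25, the sample means are normal with standard error 15/√25 = 3. The probability of a sample mean of 45 or lower can then be read off directly. This extra probability is an illustration, not part of the original answer.

```python
import math
from statistics import NormalDist

MU, SIGMA, N = 50, 15, 25
standard_error = SIGMA / math.sqrt(N)                  # 15 / 5 = 3
sampling_dist = NormalDist(mu=MU, sigma=standard_error)
z = (45 - MU) / standard_error                         # about -1.67
p_45_or_less = sampling_dist.cdf(45)                   # about 0.048
print(standard_error, round(z, 2), round(p_45_or_less, 3))
```

A sample mean of 45 is therefore possible, but fairly improbable if the population mean really is 50.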
The classical approach to probability theory
has its origin in games of chance and is used to help us estimate the likelihood of something happening based on reasoning alone. The approach works by analysing something happening in terms of all the possible outcomes associated with it. The 'something happening' is called an event. In statistics an 'event' could be almost anything.
13. When the sample size (n) decreases, the dispersion of the sample means
becomes greater. The spread or dispersion of the sampling distribution of means is given by the standard deviation of the sampling distribution of means (i.e. the standard error).
When decimal notation is used to describe probabilities, they fall in a range
between 0 and 1, with values closer to 1 indicating a greater likelihood (or chance of success) than values close to zero.
The central limit theorem gives a precise description of
the distribution that you would obtain if you selected every possible sample, calculated every sample mean and constructed the distribution of the sample mean. The importance of the theorem lies in the fact that we can use it to describe a sampling distribution without actually having to sample a population of raw scores 'infinitely', and because of this we can calculate the extent to which any sample mean approximates the mean of the population from which it was drawn.
a probability can never be
higher than 1 or lower than 0. Note that a probability can be 0, but to say that a probability is 0 is actually the same as saying that the event is impossible and can never happen. Likewise, to say that the probability of an event is 1 is to assert that it is an absolute certainty. In actual practice, probabilities fall within these two extremes.
The probability value tells us at a glance
how frequent or infrequent the event is, and what the likelihood is of obtaining a favourable outcome associated with it. In the case of a game of chance such as playing roulette, or throwing a die, a calculation of probabilities will give us information about our chances of success.
the event
is a particular occurrence (e.g. the coin lands heads up) where various other events, called outcomes (i.e. results), are possible. To find its probability, we determine the number of ways in which the event that we are interested in can occur, and divide this number by the number of all possible events (i.e. outcomes).
7. A ............... is completely described by the mean and the standard deviation.
normal distribution Any normal curve can be generated provided that we know its mean and standard deviation. Populations and samples are not necessarily normally distributed, so that further information may be needed to describe them.
Probability=
number of favorable outcomes / total number of outcomes
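A minimal Python sketch of the classical formula, applied to the ace example from question 3. Rather than typing 4/52, it builds a standard 52-card deck and counts the favourable outcomes; the helper name `classical_probability` is only illustrative.

```python
from fractions import Fraction

def classical_probability(favourable: int, possible: int) -> Fraction:
    """p(E) = favourable outcomes / possible outcomes (assumes equally likely outcomes)."""
    return Fraction(favourable, possible)

# Build a standard 52-card deck and count the aces in it
ranks = ["A"] + [str(k) for k in range(2, 11)] + ["J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = [(rank, suit) for rank in ranks for suit in suits]
aces = sum(1 for rank, _ in deck if rank == "A")

p_ace = classical_probability(aces, len(deck))
print(aces, len(deck), p_ace, round(float(p_ace), 4))   # 4 52 1/13 0.0769
```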
Statisticians have derived an equation (or formula) that describes the normal curve, and have shown that it contains
only two variables, the mean (μ) and the standard deviation (σ), with the rest of its terms being constants.
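The notes do not print the equation itself. The sketch below assumes the standard form of the normal density, f(x) = 1/(σ√(2π)) · e^(−(x − μ)²/(2σ²)), in which μ and σ are the only parameters and 2, π and e are constants. It cross-checks the result against Python's `statistics.NormalDist.pdf`.

```python
import math
from statistics import NormalDist

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Height of the normal curve at x; mu and sigma are the only parameters."""
    coefficient = 1 / (sigma * math.sqrt(2 * math.pi))         # constants 2 and pi
    return coefficient * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))   # constant e

# Cross-check against the standard library's implementation
for x in (35, 50, 65):
    print(x, round(normal_pdf(x, 50, 15), 5), round(NormalDist(50, 15).pdf(x), 5))
```

The equal heights at 35 and 65 also reflect the symmetry of the curve.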
Probability formula
p(E) = number of favourable outcomes / total number of possible outcomes, where p denotes probability, E represents the particular event whose probability we want to calculate, and each of the different possible outcomes is assumed to be equiprobable (i.e. equally likely to occur).
The multiplicative rule states that
p(A and B) = p(A) x p(B), where A and B are independent events.
• This rule is used to determine the product of two or more probabilities and is signalled by the word 'and' (i.e. the probability of A and B).
• We assume that the probabilities of the two events, A and B, are independent of one another.
• In some cases, however, a particular probability is conditional on something else happening. For example, the probability of event A occurring may be conditional on the prior occurrence of event B.
• Conditional probabilities are written as p(B|A), where | indicates that a condition applies. p(B|A) is read as 'the probability of B given A'. Likewise, p(A|B) is read as 'the probability of A given B' or, equivalently, as 'the probability of A happening on condition that B has occurred'.
The multiplicative rule that we use when we have conditional probabilities is
p(A and B) = p(A) x p(B|A)
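A minimal Python sketch of both forms of the multiplicative rule. The examples are illustrative assumptions: two heads in a row for the independent case, and two aces drawn without replacement (the same situation as in the Lotto, where a drawn item cannot be drawn again) for the conditional case. The second result is cross-checked by listing every ordered pair of distinct cards.

```python
from fractions import Fraction
from itertools import permutations

# Independent events: two heads in a row with a fair coin
p_two_heads = Fraction(1, 2) * Fraction(1, 2)          # p(A) x p(B) = 1/4

# Dependent events: two aces when drawing two cards without replacement
p_first_ace = Fraction(4, 52)
p_second_ace_given_first = Fraction(3, 51)              # one ace has already been removed
p_two_aces = p_first_ace * p_second_ace_given_first     # p(A) x p(B|A) = 1/221

# Cross-check by listing every ordered pair of distinct cards (rank 0 stands for the ace)
deck = [(rank, suit) for rank in range(13) for suit in range(4)]
pairs = list(permutations(deck, 2))
both_aces = sum(1 for a, b in pairs if a[0] == 0 and b[0] == 0)
print(p_two_heads, p_two_aces, Fraction(both_aces, len(pairs)))   # 1/4 1/221 1/221
```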
The additive rule
p(A or B) = p(A) + p(B).
• This rule is used when two or more events are mutually exclusive.
• This rule is used to determine the sum of two or more probabilities, and is signalled by the word 'or' (i.e. the probability of A or B).
• When dealing with events that are not mutually exclusive, a more general form of the rule is used, namely p(A or B) = p(A) + p(B) − p(A and B).
• The more general rule allows for the possibility that the events overlap, which is why p(A and B) must be subtracted as shown above.
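A minimal Python sketch of both forms of the additive rule, using a fair six-sided die as a hypothetical example. Each event is represented as a set of faces, so 'or' is set union and 'and' is set intersection.

```python
from fractions import Fraction

die = range(1, 7)   # faces of a fair six-sided die (hypothetical example)

def prob(event: set[int]) -> Fraction:
    """Classical probability of an event given as a set of faces."""
    return Fraction(len(event), len(die))

one, two = {1}, {2}
even, above_three = {2, 4, 6}, {4, 5, 6}

# Mutually exclusive events: p(1 or 2) = p(1) + p(2)
print(prob(one) + prob(two), prob(one | two))                        # 1/3 1/3
# Overlapping events: p(even or > 3) = p(even) + p(> 3) - p(even and > 3)
print(prob(even) + prob(above_three) - prob(even & above_three),
      prob(even | above_three))                                      # 2/3 2/3
```

Without the subtraction, the overlapping faces 4 and 6 would be counted twice and the total would be 1.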
Suppose a population distribution is normal with μ = 50 and σ = 15. 2. Suppose the size of the population is 10 000. How many scores greater than 70 are there in the population?
p(x > 70) = p(z > 1.33) = 0.092. So we can estimate that 0.092 × 10 000 = 920 scores in the population should be greater than 70.
we often refer to the probability of an event (or statistic) as its
p-value. So you will find statements such as 'the p-value of x is 0.02'. Stating that a p-value is 0.02 means that there is a 0.02 probability that the particular result (value of x) can occur by chance.
A z-score is the original measurement transformed into a
point on a standard normal distribution. Therefore, all the characteristics of the standard normal distribution apply. For example, the size of the z-score always reflects the number of standard deviations that a particular score lies above or below the mean.
P(E)= f(E) / n
represents the probability of an event occurring, given a particular statistical experiment (which is also called a random experiment), and the probability is, therefore, estimated on the basis of the results of the statistical experiment.
z transformation
The standard normal curve presents a standardised distribution of probability values, which is very useful in hypothesis testing. Any variable (x) that comes from a normal distribution can be transformed to its representation on a standard normal distribution, provided that we know the mean and the standard deviation of the variable scores: z = (x − μ)/σ, where x represents the variable, μ is the population mean, and σ is the standard deviation of the population from which x was obtained.
gambler's fallacy is based on the assumption
that if a certain event has not occurred in a number of trials, its probability of occurring in the next trial increases. Thus someone might notice that a coin has landed heads up seven times in a row and incorrectly think that it is now time for it to land tails up. He or she might then start betting on tails. The mistake lies in not realising that the probability of the event is not altered by what happened before, because each flip of a coin is an independent event and the probability stays the same (i.e. 0.5).
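A minimal Python simulation of the coin example: it flips a fair coin many times and looks only at the flips that immediately follow seven heads in a row. If the gambler's reasoning were right, heads would come up noticeably less than half the time on those flips; in fact the proportion stays close to 0.5. The number of flips and the seed are arbitrary.

```python
import random

def p_heads_after_streak(streak: int = 7, flips: int = 1_000_000, seed: int = 0) -> float:
    """Proportion of heads on flips that immediately follow `streak` heads in a row."""
    rng = random.Random(seed)
    run = 0                      # current number of consecutive heads
    followers = heads_after = 0
    for _ in range(flips):
        heads = rng.random() < 0.5
        if run >= streak:        # the previous `streak` flips were all heads
            followers += 1
            heads_after += heads
        run = run + 1 if heads else 0
    return heads_after / followers

print(round(p_heads_after_streak(), 3))   # close to 0.5, not below it
```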
discrete variables,
that is, variables that take whole numbers (i.e. integers) as values.
The ability to predict the characteristics of a sample is based on the concept of
the distribution of sample statistics.
Probability distributions indicate
the likelihood of an event or outcome. ... p(x) = the likelihood that the random variable takes a specific value x. The sum of the probabilities for all possible values must equal 1. Furthermore, the probability for a particular value or range of values must be between 0 and 1.
Two events are said to be independent if
the occurrence of one has no effect on the probability of the other occurring. In the coin flipping example, the probability of the coin landing on its head each time is independent of the result of the previous flip (i.e. the coin has no memory).
Two events are said to be mutually exclusive if
the occurrence of one precludes the occurrence of the other. For example, if a single coin is flipped, the events heads and tails are mutually exclusive. The coin can fall with either heads or tails up, but not both at the same time.
We can use this distribution to calculate
the probabilities of certain events, using a relative frequency approach
if we know how events are distributed, we can determine
the probability that the event will occur. The standard normal distribution makes it possible for us to apply this knowledge to normal distributions in general. It provides us with a standardised scheme for interpreting a distribution of probabilities, as long as we know it is normal, or approximately normal.
sample space
the range of values of a random variable (which we also call a population).
The sampling distribution of a statistic is
the set of all possible values of the statistic when all possible samples of a fixed size are taken from the population. It refers to the variation of a statistic, for example the sample mean x̄, from sample to sample. Note that here we are not concerned with the variation of individual elements in the sample, or of individual elements in the population, but with the variation of a summary value (such as the mean) for a sample. It refers to variation over a hypothetical set of all possible samples.
the standard deviation of the sample means is estimated by
the standard error of the mean. Like a standard deviation, the standard error of the mean tells us by what average amount the sample means deviate from the mean of the sampling distribution. It is an estimate of the size of the error we shall make if we use a sample mean as an estimate of the true population mean.
The relative frequency approach
the theoretical probability of an event occurring can be approximated by the relative frequency, or proportion of times that the event occurs: p(E) = number of observations of E / number of times the experiment was performed, or P(E) = f(E)/n, where f denotes frequency, n the number of times the experiment is performed, and f(E) the frequency of event E.
The purpose of sampling is to
use a relatively small number of cases to draw conclusions about a much larger group. The group you wish to study is the population and the group you actually involve in your research is the sample; in other words, the sample represents the population. Once you have obtained certain results based on the sample, you can go on to generalise (or apply) your results to the population.
You typically determine binomial probabilities by
using either a formula or a table of binomial probabilities, or statistical software (such as the Microsoft Excel function BINOMDIST).
two important rules for combining probabilities, and these are both influenced by
whether the events are dependent or independent.
binomial probabilities formula
P(x) = [n! / (x!(n − x)!)] p^x (1 − p)^(n − x), where x denotes the number of 'successes' whose probability we want to determine for the specific random variable, n stands for the number of trials, and p stands for the probability of a success on any single, independent trial.
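A minimal Python sketch of the binomial formula, built from factorials as defined earlier in these notes. The coin example (exactly 3 heads in 5 fair flips) is an illustration chosen here: there are 5!/(3!2!) = 10 orderings, each with probability 0.5^5, giving 0.3125.

```python
from math import factorial

def binomial_probability(x: int, n: int, p: float) -> float:
    """P(x successes in n independent trials), each with success probability p."""
    ways = factorial(n) // (factorial(x) * factorial(n - x))   # number of orderings of x successes
    return ways * p ** x * (1 - p) ** (n - x)

print(binomial_probability(3, 5, 0.5))   # 0.3125
# The probabilities for x = 0, 1, ..., n form a probability distribution, so they sum to 1
print(round(sum(binomial_probability(x, 10, 0.3) for x in range(11)), 10))   # 1.0
```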
Suppose a population distribution is normal with μ = 50 and σ = 15. 1. What is the z-score for a raw score of 70 in the population?
z = (70 − 50)/15 = 20/15 = 1.33
A probability value represents
• a proportion (i.e. the proportion of outcomes supporting the event).
• A proportion is a decimal number between 0 and 1 and indicates the fraction of the total.
a model can be
• a table of values, a computer program, a set of equations or a formula.
• The important property of such a model is that it can take particular values as input and then generate an output.
• The model is, therefore, just a method or mechanism for calculating an answer.
Probabilities can be expressed as
• percentages (e.g. a 10% probability),
• fractions (e.g. a 1/10 probability), or
• decimals (e.g. a 0.10 probability).
All these uses are quite commonplace, but in psychological research probabilities are typically written in decimal format. This is mainly because probability values in decimal format can easily be compared. We know that 0.008 is less than 0.009, whereas the difference between two fractions such as 13/27 and 14/31 is more difficult to evaluate.
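A minimal Python sketch of the comparison mentioned above: converting the two fractions to decimals makes it immediately clear which probability is larger.

```python
from fractions import Fraction

a, b = Fraction(13, 27), Fraction(14, 31)
print(round(float(a), 3), round(float(b), 3))   # 0.481 0.452
print(a > b)                                    # True: 13/27 is the larger probability
```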
In a binomial distribution
• The random variable is discrete (a whole number or integer).
• The distribution has been derived mathematically and, therefore, the formula for it is known.
• This makes it possible to work out the probabilities of specific outcomes without repeating complicated experiments a large number of times, as long as we know our variables are of a certain kind.