Practice quiz questions you got wrong
NOMINAL
NOMINAL: These are sometimes called nominal variables because the variable consists of names for categories, like sex/gender, country of origin, model of car, etc. There is no order to these categories; any numbers assigned to them are ultimately arbitrary - that is, they can take any value according to the researcher's personal preferences. Also be aware that the coding system used for a nominal variable is arbitrary. For example, if you want to code a dichotomous variable - that is, a variable with only two possible values - for employment, you can use any of the methods shown in Table 1.1. It doesn't matter which version you use; the variable is always categorical. However, you may find that one coding system is easier than another to enter or interpret in a given situation, so make your choices accordingly.
For a normal distribution, kurtosis is... a) undefined b) 1 c) 0 d) negative
0, because the normal distribution serves as the reference point. A leptokurtic distribution has a positive value for kurtosis and a platykurtic distribution has a negative value for kurtosis.
No matter what the shape of a distribution, if it is converted to z-scores, then its standard deviation will be... a) the same as the original distribution b) cannot be determined c) 0 d) 1
1 is the answer. 0 is the answer for the mean.
A categorical variable
A categorical variable is one that, as its name implies, indicates different categories. Examples include: •Gender •College major •Experimental condition Categorical variables can be subdivided into two other common types of variables: nominal and ordinal.
continuous quantitative
A continuous quantitative variable is one that can theoretically be measured in infinitely small steps or what mathematicians call "an arbitrary level of precision." Examples include: •Physical distance between two people •Time spent working on a puzzle •The mathematical constant π (pi) You can record distance, for example, as •4 feet •4.1 feet •4.121738502767485960 . . . feet There is a common error on this point that I have seen even in respectable statistics books. Many people call a variable "continuous" when they should call it "quantitative." This is an error because a quantitative variable can also be discrete. Similarly, I have seen the word "discrete" used to describe categorical data. This too, is an error. These differences, however, are not major and as long as your audience knows what you are talking about, then you should be fine.
dependent variable
A dependent variable is the variable being tested and measured in a scientific experiment. The dependent variable is 'dependent' on the independent variable. As the experimenter changes the independent variable, the effect on the dependent variable is observed and recorded.
histograms
A histogram is the kind of chart that people use to make bell curves for quantitative variables. They are exceptionally helpful for getting a feel for a set of scores - how spread out they are, where the middle is, whether there are extremely high or low scores, etc. They also make it very easy to describe the shape of a distribution. Finally, histograms can be used for a simple kind of prediction, where the predictor variable is the one listed across the bottom and the thing being predicted is, for example, the probability of falling into a certain score grouping, which is given by the height of the bar for that score on the X variable, as will be seen below.
platykurtic
A platykurtic distribution, as shown in Figure 2.22, is relatively flat; "platykurtic" means "flat bulge." Think of the flat-tailed platypus or a flat-topped plateau. This kind of distribution can happen when you have "censored" values, which means that scores can't go above or below a particular value. As a result, it tends to have very few outliers. The value of kurtosis for a platykurtic distribution is negative (K < 0; for example, K = -1).
population
A population is an entire group of people - or countries or cell cultures or companies - that you are interested in, such as "college students" or "Nongovernmental organizations (NGOs) in developing countries."
How to calculate the IQR (interquartile range) for a data set
A quartile is a fourth of the distribution. If the distribution is divided into four parts, you get five numbers, because the beginning of the first quarter is included. The result looks like this: •Q4 = 100th percentile = highest score = Xmax •Q3 = 75th percentile = three-quarters up •Q2 = 50th percentile = middle score = median •Q1 = 25th percentile = one-quarter up •Q0 = 0th percentile = lowest score = Xmin This is generally called the five-number summary. (By the way, the labels "Q0-4," for "Quartiles 0-4," are something I made up. However, I use them because they provide continuity. Q0, Q2, and Q4 are more commonly labeled "minimum," "median," and "maximum." Use whichever makes sense to you.) I generally find it easiest to report the five quartiles as shown in Table 4.1. However, there are situations when it is helpful to provide a single number that describes the variability of the distribution. In these situations, it is common to report the interquartile range, or IQR. The IQR is a measure of variability that corresponds to the median. Both the median and the IQR are based on percentiles and both are robust measures. There are only two numbers involved in computing the IQR: the third quartile score, also known as Q3 or the 75th percentile score, and the first quartile score, also known as Q1 or the 25th percentile score. The IQR is just the difference between these two: IQR = Q3 - Q1. As mentioned just above, the IQR is a robust measure of variability. That means that it is not easily affected by outlying, open-ended, or undefined scores, as long as these problematic scores constitute less than 25% of the values on either end.
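As a rough illustration of the five-number summary and IQR = Q3 - Q1, here is a minimal Python sketch. The scores are made up, and the quartiles come from the standard library's `statistics.quantiles` with the "inclusive" method; other software may use slightly different quartile conventions and give slightly different numbers.

```python
import statistics

scores = [2, 4, 4, 5, 7, 8, 9, 10, 12]  # made-up example data

# quantiles(n=4) returns the three cut points Q1, Q2 (the median), and Q3.
# method="inclusive" is one of several quartile conventions (an assumption here).
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")

# Q0 and Q4 are simply the lowest and highest scores.
five_number_summary = (min(scores), q1, q2, q3, max(scores))
iqr = q3 - q1  # IQR = Q3 - Q1

print("Five-number summary:", five_number_summary)
print("IQR =", iqr)
```

With these particular scores, Q1 is 4 and Q3 is 9, so the IQR is 5.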
DEFINE zscore
A z-score simply indicates how far away a score is from the distribution's mean in terms of standard deviations. For example, if a person takes a test and gets a z-score of +1, then their score is one standard deviation above the mean for all the people who took that test.
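A minimal sketch of that idea in Python. The helper name `z_score` and the values (mean of 100, standard deviation of 15) are assumptions chosen only for illustration.

```python
def z_score(x, mean, sd):
    """Return how many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


# Assumed test with mean 100 and standard deviation 15.
print(z_score(115, mean=100, sd=15))  # one SD above the mean
print(z_score(85, mean=100, sd=15))   # one SD below the mean
```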
an independent variable
An independent variable is the variable that is changed or controlled in a scientific experiment to test the effects on the dependent variable.
an outlier
An outlier is an observation that lies an abnormal distance from other values in a random sample from a population. In a sense, this definition leaves it up to the analyst (or a consensus process) to decide what will be considered abnormal. Before abnormal observations can be singled out, it is necessary to characterize normal observations.
Bar charts
Bar charts are one good way to display categorical variables where you are showing, for example, the number of people (as either a frequency - the raw number - or as a percentage) in a particular group. They also work well for showing the means of quantitative variables, which we will discuss in a later chapter. Bar charts are easy to interpret and nearly everybody has experience with them. However, some people think that they're boring and try to spice them up with all of the effects that their computer offers, like this example in Figure 2.
DISCRETE:
DISCRETE: There is one further distinction that exists but usually doesn't matter. A discrete quantitative variable is one that can only take specific values. Examples include: •The number of children in a family •The number of times a person has been to Brazil Although this subcategory doesn't usually make a big difference, it can make certain charts like histograms or scatterplots (which we will discuss later) look strange. There are, however, ways to deal with that, which we will discuss later.
degrees of freedom
Degrees of freedom are often broadly defined as the number of "observations" (pieces of information) in the data that are free to vary when estimating statistical parameters.
Deviation values (or deviation scores)
(X - μ) is the difference between one individual's score and the population mean. This is also called their deviation score. If a person's score is above average, then they will have a positive deviation score (e.g., 115 - 100 = +15). If their score is below average, then they will have a negative deviation score (e.g., 85 - 100 = -15).
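The same arithmetic as a short Python sketch, using three made-up scores whose mean is 100. It also shows that deviation scores from the mean always add up to zero.

```python
scores = [85, 100, 115]            # made-up scores
mu = sum(scores) / len(scores)     # mean of these scores: 100.0

# Deviation score for each person: X - mu
deviations = [x - mu for x in scores]

print(deviations)       # negative = below average, positive = above average
print(sum(deviations))  # deviations from the mean sum to zero
```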
Example of independent and dependent variables
For example, a scientist wants to see if the brightness of light has any effect on a moth being attracted to the light. The brightness of the light is controlled by the scientist. This would be the independent variable. How the moth reacts to the different light levels (distance to light source) would be the dependent variable.
INTERVAL
INTERVAL: To be technically correct, the first two examples of quantitative variables above are called interval variables because they indicate the size of the difference between scores but they don't have a true zero starting point, either because they don't have a zero (as with IQ) or because you can go right past zero into the negative numbers (as with temperature). Because of this, it isn't possible to say that one value is twice as much as another (e.g., that 80 degrees is twice as hot as 40 degrees). One situation where it is common to hear the word interval is in races where the time for the leader or winner is given and then intervals, or the time behind the leader, are given for all the competitors who follow.
Which measure of variation is least sensitive to outliers? a) IQR b) standard deviation c) range d) all are the same
IQR: because the IQR ignores the ends of the data, where the outliers usually lie, it is the least sensitive to outliers. It won't even show them; in effect, it leaves the outliers out.
unimodal
In statistics, the mode of a distribution is the most frequently occurring score. In a histogram, this shows up as the highest point or hump of the distribution. If there is only one obvious peak, then the distribution is called unimodal, as in "one mode," like the normal distribution we saw earlier. Figure 2.14 gives an example of a unimodal distribution.
Why samples have different formulas than populations
In the sample, though, the sum of squared deviations is not divided by N but rather by n-1, which is known as the degrees of freedom (df). (Also note that an upper case N is used for the population and a lower case n is used for the sample.) The change is because the variance and standard deviation are biased statistics. That is, if the population formula were used with sample data, the result would consistently be too small. This is why the denominator in the sample formulas is n-1 instead of n (that is, one less than the sample size).
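To make the difference concrete, here is a small Python sketch with made-up scores. It computes the sum of squared deviations by hand and then divides by N (population formula) and by n-1 (sample formula); the standard library functions `statistics.pvariance` and `statistics.variance` are used only as a cross-check.

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up example data
n = len(scores)
mean = sum(scores) / n

# Sum of squared deviations from the mean
ss = sum((x - mean) ** 2 for x in scores)

pop_var = ss / n          # population formula: divide by N
samp_var = ss / (n - 1)   # sample formula: divide by n - 1 (degrees of freedom)

print("Population variance:", pop_var)
print("Sample variance:", samp_var)

# Cross-check against the standard library
print(statistics.pvariance(scores), statistics.variance(scores))
```

The sample value is always a little larger than the population value for the same data, which offsets the tendency of the population formula to come out too small when applied to a sample.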
leptokurtic
Finally, a leptokurtic distribution, as shown in Figure 2.23, is narrow and pointed, with long tails, compared to the normal distribution, as "leptokurtic" means "narrow/thin bulge." Think of the long tails of "leaping" kangaroos or a cliff that a skydiver can leap from. Leptokurtic distributions have very little central variation but lots of outliers, which is probably the most important thing about these distributions. Leptokurtic distributions have positive values for kurtosis (K > 0; for example, K = +1). Values of kurtosis for each shape: platykurtic K < 0, mesokurtic K = 0, leptokurtic K > 0.
When a distribution of raw scores is converted to z-scores, the mean and the standard deviation of the distribution will be... a) the same as they were before b) different but unpredictable c) M=0, SD=1 d) M=1, SD=0
M=0, SD=1, no matter what the distribution looks like. This is what it will always be.
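A quick check of this claim in Python, using a deliberately lopsided set of made-up scores. This is only a sketch: it standardizes with the population standard deviation (`statistics.pstdev`), and tiny rounding differences from exactly 0 and 1 are expected with floating-point numbers.

```python
import statistics

raw = [1, 2, 2, 3, 3, 3, 4, 20]  # made-up, clearly not bell-shaped

m = statistics.mean(raw)
sd = statistics.pstdev(raw)      # population SD (an assumption for this sketch)

# Convert every raw score to a z-score
z = [(x - m) / sd for x in raw]

print(round(statistics.mean(z), 10))    # mean of the z-scores
print(round(statistics.pstdev(z), 10))  # SD of the z-scores
```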
Sensitivity of each measure to outliers
Mean: most affected. Median: affected somewhat. Mode: least affected.
Minimum levels of measurement needed for each measure
Mean: interval, Mode: nominal, Median: ordinal.
ORDINAL
ORDINAL: There are also ordinal variables or rank variables, which indicate first, second, third, and so on. Examples include tallest to shortest, first to last, fastest to slowest, etc. Although there are special statistical procedures for ordinal variables, they are difficult to deal with. As such, ordinal variables are usually treated as categorical variables without doing much damage to the data.
bimodal distributions
On the other hand, a distribution could have two pronounced peaks or humps, in which case it is called bimodal, as in "two modes." Figure 2.15 shows an example of a bimodal distribution. An important thing to note is that when you have a bimodal distribution, what usually happens is that there are actually two unimodal distributions that got combined, as shown in Figure 2.16. In this example, there is a narrower normal distribution on the left (shown by the red curve), and a wider one on the right (shown by the blue curve). (The two curves are different heights because they are spread out differently.) When the two different distributions are combined, the gray histogram results, which has a bimodal shape.
negatively skewed
On the other hand, a distribution may have the extreme scores on the low end, which is called negatively skewed or left skewed, again, because that's where the unusual scores are. This is what the distribution for infant health looks like (as most infants are relatively healthy but a smaller number are very sick), as well as the distribution of college GPAs (as people who do poorly tend to drop out). It looks like Figure 2.20.
positively skewed
One important thing to look for in a histogram is whether the data are symmetrical or whether they are skewed one way or another. For example, a normal distribution like we saw earlier is completely symmetrical. On the other hand, a distribution might have most of the scores at the low end and a few especially high ones. This kind of distribution often happens with things relating to money, such as income, where most people earn a small or moderate amount of money each year but a smaller number of people earn an enormous amount. This is called a positively skewed distribution, or skewed right, because that's where the extreme scores are, as in Figure 2.19.
bell curve
One of the most prominent characteristics of a distribution is its general shape. For example, one of the most important distributions in statistics is the bell curve, which is technically known as a "normal distribution." ("Normal" has a special meaning in statistics and always refers to bell-curve shaped distributions.) A normal distribution is curved, with a single hump or mode in the middle, and symmetrical tails, as shown in Figure 2.10.
values of a normal distribution
One of the most prominent characteristics of a distribution is its general shape. For example, one of the most important distributions in statistics is the bell curve, which is technically known as a "normal distribution." ("Normal" has a special meaning in statistics and always refers to bell-curve shaped distributions.) A normal distribution is curved, with a single hump or mode in the middle, and symmetrical tails, as shown in Figure 2.10. Kurtosis = 0, skewness = 0.
Quantitative variables
Quantitative variables are ones where you can measure the size of the differences between scores and not just that they are different, as with categorical variables, or that they differ in rank, as with ordinal variables. Examples include: •IQ scores •Temperature (in Fahrenheit or Celsius) •Age •GPA •Time to complete a task Quantitative variables can also be subdivided into two other common types of variables: interval and ratio.
LEVELS OF MEASUREMENT
RATIO, INTERVAL, ORDINAL, NOMINAL
Which contains the most/least information
RATIO: category, order, distance, true zero (most information). INTERVAL: category, order, distance. ORDINAL: category, order. NOMINAL: category only (least information).
Coding for nominal variables
(Table 1.1: Possible Coding Schemes for a Dichotomous Variable.) These are sometimes called nominal variables because the variable consists of names for categories, like sex/gender, country of origin, model of car, etc. There is no order to these categories; any numbers assigned to them are ultimately arbitrary - that is, they can take any value according to the researcher's personal preferences. Also be aware that the coding system used for a nominal variable is arbitrary. For example, if you want to code a dichotomous variable - that is, a variable with only two possible values - for employment, you can use any of the methods shown in Table 1.1. It doesn't matter which version you use; the variable is always categorical. However, you may find that one coding system is easier than another to enter or interpret in a given situation, so make your choices accordingly.
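A small Python sketch of why the coding is arbitrary. The three coding schemes below are hypothetical illustrations, not the contents of Table 1.1, and the employment data are made up. Whatever codes are used, the category counts come out the same.

```python
from collections import Counter

# Made-up employment status for five people
employed = [True, False, True, True, False]

# Three HYPOTHETICAL coding schemes for the same dichotomous variable
scheme_a = [1 if e else 0 for e in employed]      # 0 = not employed, 1 = employed
scheme_b = [2 if e else 1 for e in employed]      # 1 = not employed, 2 = employed
scheme_c = ["Y" if e else "N" for e in employed]  # letters instead of numbers

# The group sizes are identical under every scheme: the codes are only labels.
counts = [sorted(Counter(s).values()) for s in (scheme_a, scheme_b, scheme_c)]
print(counts)
```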
calculate the range
The easiest way to describe the variability of a distribution, but not the ideal way, is to simply give the high and the low scores. If you want to give a single number, you can give the difference between the two, which is called the range. The range is the difference between the highest score (also known as the maximum, Xmax, or Q4, for "quartile 4") and the lowest score (the minimum, Xmin, or Q0, for "quartile 0"), as in either of these: Range = Xmax − Xmin or Range = Q4 − Q0. The range isn't a very good measure of variability, though, because it can be easily changed by high or low outliers. On the other hand, if the distribution is at least a little well-behaved, then it's a nice way to get a ballpark for the variable. For example, student apartments near my school have monthly rents between $250 and $700.
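A short Python sketch of Range = Xmax − Xmin. Only the $250 and $700 endpoints come from the example above; the rents in between, the extra $3,000 rent, and the helper name `value_range` are made up to show how a single outlier changes the range.

```python
def value_range(scores):
    """Range = highest score minus lowest score."""
    return max(scores) - min(scores)


rents = [250, 300, 350, 400, 450, 500, 700]  # made-up monthly rents

print(value_range(rents))           # 700 - 250
print(value_range(rents + [3000]))  # one high outlier stretches the range
```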
kurtosis
The final characteristic of distributions that we'll discuss is kurtosis, which comes from the Greek word κυρτός, kyrtos or kurtos, which means "bulging." Kurtosis has to do with how flat or pointed the distribution is compared to a normal (i.e., bell curve) distribution. In practical terms, the thing that most influences kurtosis is the presence of outliers, as outliers will give the distribution unusually long tails. This has the effect of making the middle part look relatively narrow and pointed.
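As a rough sketch of how a kurtosis value can be computed, the Python below uses one common definition: the fourth central moment divided by the squared variance, minus 3, so that a normal distribution scores about 0. Statistical packages often apply small-sample corrections, so their numbers may differ. The function name `excess_kurtosis` and both data sets are made up for illustration.

```python
def excess_kurtosis(scores):
    """Fourth central moment / (second central moment squared), minus 3."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n  # variance (population form)
    m4 = sum((x - mean) ** 4 for x in scores) / n  # fourth central moment
    return m4 / m2 ** 2 - 3


flat = list(range(1, 11))                         # evenly spread, no outliers
heavy_tails = [-10, -1, -1, 0, 0, 0, 0, 1, 1, 10]  # tight middle, two outliers

print(excess_kurtosis(flat))         # negative: flatter than a bell curve
print(excess_kurtosis(heavy_tails))  # positive: outliers give long tails
```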
When each measure works best
The mean and the SD work best with bell curves. If things get skewed or have outliers, the median and the IQR work best. And if you have a nominal variable, then it's the mode that works best.
Effects of open-ended and undefined scores
The mean can't handle them. The range can't handle them. The median CAN handle open-ended scores. The mode CAN handle open-ended scores.
the mean
The mean is the average, the same one that everyone is familiar with. Unlike the mode and the median, the mean has different symbols and formulas for samples and populations. The symbol for the mean of a population is µ, which is a Greek mu - pronounced "myoo" - or a lower case Greek letter m. The mean for a sample, on the other hand, is created either by placing a bar over the name of the variable, so that the mean for x is x̄, or, as the American Psychological Association prefers, just a capital, italicized M. This is the symbol that I will generally use because it is easier to type. The mean has several advantages: •It is easy to understand. •It is the basis of many other statistics that we will cover. •It is efficient. There are, however, some important disadvantages to the mean: •It requires interval or ratio level data, which makes it the most restrictive measure of the three. •It can't handle non-normal distributions, which means that the data must be symmetrical and with one mode (i.e., the high point of the distribution). •It can't handle outliers (or very many) because, compared to other measures of center, the mean is the most easily distorted. •It can't handle open-ended scores. •It can't handle undefined scores.
M and SD of distribution
•The mean of the z-distribution is always 0. •The standard deviation of the z-distribution is always 1.
median
The median is the middle score with 50% of the distribution above and 50% below. Sometimes people use the abbreviation Mdn as a symbol for the median, although it's more common to just write out the word "median." The median has several important qualities as a measure of central tendency. •It is easy to understand. •It can handle non-normal distributions. •It can handle outliers because it relies on just the middle score. •It can work with ordinal data because it is an ordinal statistic. That is, you first put all of the data in order and then count in from the ends to find the median. •It can handle open-ended scores, for the same reason. •It can handle undefined scores, for yet again the same reason. As with the mode, there are a few catches. •It is not very efficient. In fact, it needs one-third more people to reach the same precision as the mean (which we will discuss next). •It doesn't lend itself to inferential statistics. We'll address this issue later, but it has to do with the fact that there is no easy formula to go from the sample to the population and it requires an iterative process. We will not cover that process in this book. The median's ability to deal with statistical aberrations is one of its great advantages. This is one of the reasons that whenever there are skewed data, such as income or house prices in which most of the data are at the low end but a few are at the very high end, the median is typically reported.
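One way to see why the median can cope with an undefined score while the mean cannot is the Python sketch below. It uses made-up race times and, purely as an assumption for illustration, records the runner who never finished as an infinite time (`math.inf`) so that the score still sorts to the slow end.

```python
import math
import statistics

# Made-up race times in minutes; the last runner started but never finished.
# Representing "never finished" as infinity is an assumption of this sketch.
times = [10.0, 12.0, 13.0, 15.0, math.inf]

median_time = statistics.median(times)  # middle score after sorting
mean_time = sum(times) / len(times)     # no usable average

print("Median:", median_time)
print("Mean:", mean_time)
```

The median only needs to know that the unfinished time falls at the slow end, so the middle score is unaffected; the mean needs an actual number for every score and has nothing meaningful to report.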
mesokurtic
The normal distribution, shown in Figure 2.21, is called mesokurtic, which means "middle bulge." If you were to calculate its score on the kurtosis statistics, the normal distribution would have a value of zero. K=0
mode
The simplest measure of center is the mode. In a sense, it's the easiest one of all - it's just the most common score or the most repeated one. There is no symbol for the mode - just write "mode = 3" - and there is no difference between the sample and the population modes. The mode, as commonly construed, has several advantages as a measure of center or representative categories. •It is easy to understand. •It can handle open-ended scores, which means scores like 1, 2, and 3+. The "3+" is the open-ended part. However, because the mode only counts whatever appears most often, a potentially unrepresentative open-ended score could end up being the mode, which would be misleading - and therefore bad - so watch your coding. •It can handle undefined scores, which would happen, for example, if a person in a race never actually finished it and received no time at all (although sometimes this kind of data is still very important so you don't want to throw it out). •It can handle nominal data for the same reason. This makes it unique, because it is the only measure we will cover that can be used for unordered categories, which is another name for nominal data. In this case, though, it's not measuring center but, rather, something like representativeness. The mode also has a few disadvantages. •It is not efficient, which means that it requires more observations or people to have the same level of precision (e.g., ±5 points) as other measures of center. •It doesn't lend itself to inferential statistics. This refers to the process of taking sample results and extrapolating them to their populations of origin. None of the procedures that we will cover in this course allow us to make inferences about the mode.
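A minimal Python sketch of the mode working on nominal data, where a mean or median would make no sense. The college majors are made up.

```python
import statistics
from collections import Counter

# Made-up nominal data: unordered categories
majors = ["psychology", "biology", "psychology", "art", "biology", "psychology"]

mode_major = statistics.mode(majors)  # the most common category
print("mode =", mode_major)

# Frequency of every category, most common first
print(Counter(majors).most_common())
```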
describe ratio
Ratio variables have equal intervals between values, a meaningful zero point, and meaningful numerical relationships between numbers. Examples: weight, pulse rate, respiratory rate. The last three examples of quantitative variables (age, GPA, and time to complete a task) have definite zeroes that indicate the complete absence of something. These are called ratio variables and you can say that one value is twice as much as another (e.g., 10 minutes is twice as long as 5 minutes). However, the distinction between interval and ratio variables is usually irrelevant for most analytic purposes and so it's easier to just call both of them quantitative variables. Whenever possible, you should use quantitative variables instead of categorical variables because quantitative variables make a lot more statistical procedures possible.
define z-score
What a z-score means: A z-score simply indicates how far away a score is from the distribution's mean in terms of standard deviations.
uses of the range
Whenever you are giving the difference between the high and low scores.
What is an undefined score? a) a score for a task that was started but never completed b) a score that was recorded incorrectly c) a score for a task that the researchers did not administer d) a score for an unbounded range of values like 9+
A score for a task that was started but never completed. It's undefined because the person participated and started but didn't finish, at least not in the time that you were looking at. That data can still be very important, so you don't want to throw it out, but it can be problematic because you can't find the mean with it.
In a normal distribution, which measure of central tendency has the highest value? a) mean b) median c) mode d) all are the same
All are the same, because it's a bell curve and it's symmetrical.
box plots
A boxplot is a way of looking at an entire distribution at once. Any time that I am working with a quantitative variable, I make two charts: (a) a boxplot, which we'll talk about here, and (b) a histogram, which we'll talk about in the next section. The good things about boxplots are: •It's easy to judge symmetry. •It's easy to see outliers. Boxplots get their name from the box in the middle of the chart; that box shows the scores that mark off the middle 50% of the distribution by starting at the 25th percentile score (or first quartile) of the distribution and stopping at the 75th percentile score (or third quartile). The line in the middle of the box is the median, or the value that splits the distribution into two equal sized groups of people. The lines on the left and right of the box go out to the lowest and highest non-outlying scores, while the circles are used to show outliers. Outliers, which are unusually high or low scores, play an important role in data analysis because they can dramatically distort many common statistics. While there are many ways to determine if a score is an outlier, one of the most common and effective ways is based on the size of the box in the middle of a boxplot. As the chart below shows, the width of the box gives something called the "interquartile range." (This will be discussed in more detail in a later chapter.) If that interquartile range is multiplied by 1.5 and then tacked on to the top and bottom of the box, then anything further away than that is considered an outlier and should be given special attention or removed in any data analysis. Figure 2.8 shows the anatomy or components of boxplots. The boxplot is shown with no fill and I have overlaid a dotplot of the data that has been "jittered," or slightly scattered so points don't lie on top of each other.
I like to draw boxplots horizontally because that puts the scale in the same orientation as the scale in a histogram, which makes it easier to compare the two. On the other hand, you can also draw boxplots vertically.
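The 1.5 × IQR rule described above can be sketched in Python as follows. The scores are made up, the quartiles use the standard library's "inclusive" method (other quartile conventions may shift the cutoffs slightly), and the names `lower_fence` and `upper_fence` are just labels chosen for this example.

```python
import statistics

scores = [2, 4, 4, 5, 7, 8, 9, 10, 30]  # made-up data with one very high score

q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1  # width of the box

# Tack 1.5 x IQR onto each end of the box
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Anything beyond the fences is flagged as an outlier
outliers = [x for x in scores if x < lower_fence or x > upper_fence]

print("Box runs from", q1, "to", q3)
print("Fences:", lower_fence, upper_fence)
print("Outliers:", outliers)
```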
Minimum level of measurement for a bar chart
bar chart: nominal
Which type of chart would be most appropriate for a nominal variable? a) histogram b) scatter plot c) bar graph d) box plot
Bar graph, because you use a bar to indicate how many people are in each category.
The variable that is measured as the outcome in an experiment is called the... variable. a) dependent b) independent c) quasi-experimental d) pattern
Dependent: the variable's scores depend on what happens in the experiment. It is the outcome variable. The independent variable is the one that is manipulated. It doesn't depend on anything; you just make it what it is.
inferential statistics are typically contrasted with ..... statistics a) representative b) hypothesized c) descriptive d) null
Descriptive: you are describing what is immediately in front of you. An inferential statistic is when you are trying to go past the data in front of you, from the sample to the population.
How each is affected by skewness and outliers
If your distribution is perfectly symmetrical and unimodal - that is, it has one clear peak - then all three measures - the mean, the median, and the mode - will have the same value right at the center. When the distributions are skewed, however, the three measures separate in predictable ways. The mode stays at the peak value but the mean follows the extreme scores. The median is in between the two. For example, Figure 3.2 shows the three values in a positively skewed distribution, where the extreme values or outliers are on the high end of the distribution. In this case, the mode is the furthest to the left and has the lowest value on the outcome variable. (Do not get confused: the mode is defined by the score with the highest frequency, so the vertical measurement helps to locate the mode, but because it is to the left of all the others, it has the lowest value or score.) The median is to the right of the mode. Finally, the mean is slightly to the right of the median, giving the mean the highest value on the outcome variable, even though it has the lowest frequency of the three measures.
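The ordering described above (mode, then median, then mean in a positively skewed distribution) can be checked with a small Python sketch. The scores are made up, with a couple of extreme values on the high end.

```python
import statistics

# Made-up positively skewed scores: most are low, a few are very high
scores = [1, 2, 2, 2, 3, 3, 4, 5, 9, 19]

mode = statistics.mode(scores)      # stays at the peak
median = statistics.median(scores)  # sits in between
mean = statistics.mean(scores)      # pulled toward the extreme scores

print("mode =", mode, "median =", median, "mean =", mean)
```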
In an experiment, the variable that the researcher manipulates is called the... variable. a) dependent b) skewed c) independent d) observational
Independent, because it's the one that you control. The outcome (the dependent variable) depends on it.
Minimum level of measurement for a histogram
interval
If the number of games a team has lost in a season is subtracted from the number of games that they won in that season, the resulting variable would be at the... level of measurement.
Interval, because it can be negative or positive.
What is the minimum level of measurement needed to calculate the mean? a) nominal b) ordinal c) interval d) ratio
Interval, because you have to know how far apart the scores are.
Which measure of central tendency is most influenced by outliers? a) mean b) mode c) median d) all are the same
mean
The 3 most common measures of central tendency are... a) nominal, ordinal, ratio b) inferential, descriptive, explanatory c) mean, median, mode d) variance, standard deviation, IQR
Mean, median, and mode. (Nominal, ordinal, and ratio are 3 of the 4 common levels of measurement, not measures of central tendency.)
In a positively skewed distribution, which measure will generally have the highest value? a) all are the same b) median c) mode d) mean
Mean: it follows the outliers.
Which of the following is not a condition for causality? a) association b) temporal precedence c) representativeness d) the elimination of alternate explanations
Representativeness. It means the data from your sample are a good representation of, or stand-in for, the population they came from. That's good to have, but it's not what affects causal implications, like finding out that x causes y. There has to be temporal precedence: the cause comes before the effect.
Which of these measures is least affected by outliers? a) mean b) median c) standard deviation d) geometric mean
median
Which measure of central tendency is least efficient? a) mean b) mode c) median d) all are the same
Mode, because it can bounce around depending on the quirks of the data.
In a negatively skewed distribution, which measure will generally have the highest value? a) mean b) mode c) median d) all are the same
Mode, because it stays put at the peak while the mean and the median are pulled toward the low scores.
Which measure of central tendency works for nominal variables? a) median b) mode c) mean d) no measures
Mode, because you can still say how commonly each category of a nominal variable occurs.
If you have outliers on the far left it is...
negatively skewed
A frequency bar graph would be most appropriate for which measurement scales? a) nominal or ordinal b) ordinal or ratio c) nominal or interval d) interval or ratio
Nominal or ordinal. Use a bar graph because you can't specify how far apart the categories are, so you keep the bars separate.
If a person's sex is coded as male = 0 and female = 1, then that variable is at the... level of measurement. a) nominal b) ordinal c) interval d) ratio
Nominal. Sex or gender is simply this or that; it is a categorical thing. Just because it has numbers on it doesn't change the fact that it is nominal.
Minimum level of measurement for a boxplot
ordinal
What is the minimum level of measurement needed to calculate the median? a) nominal b) ordinal c) interval d) ratio
Ordinal, because you need to put the scores in order.
a distribution such as income that has most of the people at the bottom or middle but a few people with extremely high scores is referred to as bimodal negatively skewed positively skewed unrepresentative
positively skewed bc it's the outliers that determine the skewness. in this case it's the people on the high end that make the skew (bc they are outliers, there are fewer of them)
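Here is a small sketch of what positive skew does to the three measures of central tendency. The incomes are made up for illustration (in thousands), so treat the exact numbers as an assumption.

```python
import statistics

# Made-up incomes in thousands: most people are low or in the middle, one is very high.
incomes = [20, 20, 25, 30, 35, 40, 200]

mode = statistics.mode(incomes)      # most common score
median = statistics.median(incomes)  # middle score
mean = statistics.mean(incomes)      # pulled toward the high outlier

print("mode:", mode, "median:", median, "mean:", round(mean, 2))
```

With the tail on the right, the mean gets pulled up the most, so the order comes out mode, then median, then mean. In a negatively skewed distribution the order flips.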
if you have outliers on the far right it is....
positively skewed
when open-ended scores are present in the data, which measure of variability is most appropriate? standard deviation quartiles (IQR) range median
quartiles (IQR)
when data are negatively skewed which measure of variability is most appropriate? standard deviation quartiles (IQR) range median
quartiles (IQR): while you have outlying scores, they wouldn't be included in the IQR... that makes it the best option
which measure of variability is most influenced by outliers? variance range standard deviation IQR
range: it can be thrown off completely by a single outlier. the outlier determines how wide the range will be.
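To check the range against the IQR, here is a minimal sketch with made-up scores. It uses `statistics.quantiles` from the Python standard library with its default method; other quartile methods can give slightly different cut points, so the exact IQR value is an assumption of this setup.

```python
import statistics

def value_range(data):
    """Highest score minus lowest score."""
    return max(data) - min(data)

def iqr(data):
    """Third quartile minus first quartile (default 'exclusive' method)."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return q3 - q1

clean = [10, 11, 12, 13, 14, 15, 16, 17]          # made-up scores
with_outlier = [10, 11, 12, 13, 14, 15, 16, 170]  # top score replaced by an outlier

print("range:", value_range(clean), "->", value_range(with_outlier))
print("IQR:  ", iqr(clean), "->", iqr(with_outlier))
```

The range is built from the two most extreme scores, so a single outlier changes it completely, while the IQR only looks at the middle half of the data.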
The number of students enrolled at UVU each year is an example of what level of measurement? interval ratio ordinal nominal
ratio bc the number of students starts at zero (it has an absolute zero), so you can say that 2 times as many people enrolled this year as last year. if it was interval, you could only say how far apart two values are, not that one is 2 times the other.
which measure requires a degrees of freedom calculation? sample variance population standard deviation sample IQR population range
sample variance: the population equations don't require the degrees of freedom
which of the following is not a measure of central tendency? mean median mode standard deviation
standard deviation bc its a measure of variation.
for a normal distribution which measure of variability is most efficient (in the statistical sense)? standard deviation quartiles (IQR) range median
standard deviation: efficiency means getting a certain level of precision with a smaller number of people... the standard deviation does this BEST for a normal distribution
when a distribution of raw scores is converted to zscores the shape of the distribution .... flattens out to the uniform distribution approaches a normal distribution is unpredictable stays the same
stays the same. it is just standardized. it is just noted differently.
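A small sketch of the z-score conversion, using made-up skewed scores (an assumption for illustration). Each score has the mean subtracted and is divided by the standard deviation; the sample standard deviation is used here, but the population version works the same way as long as you are consistent.

```python
import statistics

raw = [1, 2, 2, 3, 12]  # made-up, positively skewed scores

m = statistics.mean(raw)
s = statistics.stdev(raw)           # sample standard deviation
z = [(x - m) / s for x in raw]      # z-score = (score - mean) / SD

# After conversion the mean is 0 and the SD is 1 (up to rounding error).
print("mean of z:", round(statistics.mean(z), 10))
print("SD of z:  ", round(statistics.stdev(z), 10))

# The order of the scores is unchanged, and the outlier is still far out on the right.
print([round(v, 2) for v in z])
```

The scores are only shifted and rescaled, so a skewed distribution is still skewed after the conversion.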
a normal distribution is categorical symmetrical bimodal uniform
symmetrical: it's the same on both sides. the bell curve is a mirror image on each side.
one visual difference between a bar chart and a histogram is that in a histogram... the bars indicate group membership the height of bars indicates the value of X the bars can be placed in any order the adjacent bars touch but in a bar chart they are separate.
the adjacent bars touch but in a bar chart they are separate: a histogram's bars are put together bc it's a quantitative variable and you are making bins. in a bar chart the bars are separate, and their order can be changed bc you can put women before men or men before women, it doesn't matter.
if the formula for the population standard deviation were used with sample data then the result would be.... too big too small identical impossible to calculate
too small. compare the formulas for the sample and the population. populations are theoretically infinite. if you use the population formula with a sample, the result would be too small.. changing the denominator by subtracting 1 (n - 1 instead of n) increases the overall value
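To see the difference between the two formulas, here is a minimal sketch with made-up sample scores (an assumption for illustration). `statistics.pstdev` divides by n (the population formula) and `statistics.stdev` divides by n - 1 (the sample formula, which is where the degrees of freedom come in).

```python
import math
import statistics

sample = [4, 8, 6, 5, 7]  # made-up sample scores
n = len(sample)
mean = sum(sample) / n
sum_sq = sum((x - mean) ** 2 for x in sample)  # sum of squared deviations

pop_sd_by_hand = math.sqrt(sum_sq / n)           # population formula: divide by n
sample_sd_by_hand = math.sqrt(sum_sq / (n - 1))  # sample formula: divide by n - 1

pop_sd = statistics.pstdev(sample)
sample_sd = statistics.stdev(sample)

print("population formula:", round(pop_sd, 3))
print("sample formula:    ", round(sample_sd, 3))
```

Dividing by the smaller number (n - 1) always gives the bigger result, which is why the population formula comes out too small when it is used on a sample.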
when scores in a data set are very close to one another then the variance will be cannot be calculated will be biased for samples will be based on n>50 will be close to zero
will be close to zero
if scores in a data set are very different from each other, then the standard deviation will be..... cannot be calculated will be biased for the sample will be low will be high.
will be high: when they are very different, there will be a lot of spread between them.... when they are far apart, there will be a lot of distance between them and the standard deviation will be high.
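The last two questions can be checked side by side. This is a rough sketch with two made-up data sets (assumed for illustration) that share the same mean but differ in how spread out the scores are.

```python
import statistics

tight = [50, 51, 50, 49, 50]   # made-up scores that are very close to one another
spread = [10, 90, 50, 30, 70]  # made-up scores that are very different from each other

tight_var = statistics.variance(tight)
spread_var = statistics.variance(spread)
tight_sd = statistics.stdev(tight)
spread_sd = statistics.stdev(spread)

print("tight:  variance", tight_var, "SD", round(tight_sd, 2))
print("spread: variance", spread_var, "SD", round(spread_sd, 2))
```

Both sets have a mean of 50, so the difference in variance and standard deviation comes only from how far the scores sit from that mean.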
a sample
•A sample is a group of people that come from the population and that you are actually able to get data from, such as "132 local students enrolled in Introductory Psychology classes who completed a survey" or "40 NGOs in Southeast Asia listed by the United Nations."
What it means when variability statistics are high vs. when they are low
•Ease of prediction. It is easier to predict a person's score on a variable when the scores are tightly distributed (i.e., when the scores have less variability). •More significant hypothesis tests. We won't deal with this for a few more chapters, but distributions with less variability generally make it easier to get statistically significant results when conducting hypothesis tests.