COGS 14B Midterm 1


Standard Normal Curve

Distribution of z-scores with µ = 0 and σ = 1. Any normal distribution can be converted into it by turning each score into a z-score.
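A minimal Python sketch of standardizing. It assumes the 10 IQ scores from the questions below are treated as a complete set, so their own mean and population SD (`statistics.pstdev`) are used. Any set of z-scores ends up with a mean of 0 and an SD of 1. Note that z-scoring does not change the shape of a distribution, so only a normal distribution becomes the standard normal curve.

```python
from statistics import mean, pstdev

def standardize(scores):
    """Convert raw scores to z-scores using the scores' own mean and population SD."""
    mu = mean(scores)
    sigma = pstdev(scores)
    return [(x - mu) / sigma for x in scores]

# IQ scores from the questions below, treated here as a complete set
iq = [110, 108, 98, 98, 105, 115, 130, 102, 104, 110]
z = standardize(iq)
print(round(mean(z), 10), round(pstdev(z), 10))  # mean ≈ 0, SD ≈ 1
```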

Sample Number Symbol

n

Raw Score

xᵢ = zᵢσ + μ
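A minimal Python sketch of converting a z-score back to a raw score. The IQ-scale values (μ = 100, σ = 15) are borrowed from the later questions purely as an example.

```python
def raw_score(z, mu, sigma):
    """Convert a z-score back to a raw score: x = z * sigma + mu."""
    return z * sigma + mu

# Example values: IQ scale with mu = 100, sigma = 15
print(raw_score(1.0, 100, 15))   # 115.0
print(raw_score(-2.0, 100, 15))  # 70.0
```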

Sample Mean

x̅ (also written M)

Population Mean

μ

Population Standard Deviation

σ

Population Standard Deviation Formula

σ = √(SS/N)

Sum of Squares

SS = ∑(xᵢ - x̅)² (for a population, use μ in place of x̅)
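A minimal Python sketch tying the sum of squares to both standard deviation formulas, σ = √(SS/N) and SD = √(SS/(n - 1)). It uses the 10 test-completion times from the questions below. The printed values match the answers given there: 4.80 for the population and 5.06 for the sample.

```python
import math

def sum_of_squares(scores):
    """SS: sum of squared deviations from the mean."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores)

def population_sd(scores):
    """sigma = sqrt(SS / N)"""
    return math.sqrt(sum_of_squares(scores) / len(scores))

def sample_sd(scores):
    """SD = sqrt(SS / (n - 1))"""
    return math.sqrt(sum_of_squares(scores) / (len(scores) - 1))

# Test-completion times (minutes) from the questions below
times = [18, 2, 9, 12, 17, 12, 20, 14, 13, 13]
print(sum_of_squares(times))           # 230.0
print(round(population_sd(times), 2))  # 4.8
print(round(sample_sd(times), 2))      # 5.06
```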

Histogram

Bar graph depicting the frequency of each value (or bin of values) in a dataset

You collect a number of observations of eye color in your COGS 14B section. You find that 10 people have brown eyes, 3 people have blue eyes, 1 person has green eyes, and 2 people have hazel eyes. The central tendency of eye color for this group is ____

Brown (the mode, the only measure of central tendency that works for nominal data)
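A short Python sketch of finding the mode of nominal data with `collections.Counter`, using the eye-color counts from the question above. The list comprehension keeps every value tied for the top count, so it also handles bimodal or multimodal data.

```python
from collections import Counter

# Eye-color counts from the question above (nominal data)
eye_colors = ["brown"] * 10 + ["blue"] * 3 + ["green"] * 1 + ["hazel"] * 2

counts = Counter(eye_colors)
top = max(counts.values())
modes = [color for color, n in counts.items() if n == top]  # keeps ties (bi/multimodal)
print(modes)  # ['brown']
```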

Descriptive Statistics

Statistics that summarize and describe the data actually collected (e.g., a sample), without generalizing beyond it

What is the name of the property or characteristic that changes in response to the manipulation (treatment) of a different property or characteristic in an experiment?

Dependent Variable

Range

Describes data spread. Range = Highest score - Lowest score. Extremely vulnerable to outliers.

"According to calculations of average annual global temperatures performed independently at NASA and NOAA, the last five years have been the warmest on record." As stated, this is best seen as an example of __________ Statistics.

Descriptive

A clinical paper presents a case study of memory loss in a patient following a major stroke. The paper reports that they are able to recall only 5% of items from a list of words presented a day earlier. This specific result is an example of ___ Statistics.

Descriptive

Grouped Frequency Distribution

Distribution showing the frequency of "bins" (categories containing specific value sets)

Ungrouped Frequency Distribution

Distribution showing the frequency of each data point/value

Cumulative Frequency Distribution

Distribution showing the number of values in each category plus all preceding categories. The last category should equal n (or N).

Relative Frequency Distribution

Distribution showing the proportion (decimal number) of each category relative to the full dataset. Proportions should total ≈1.

Grouped Relative Frequency Distribution

Distribution showing the proportion (decimal number) of each bin relative to the full dataset. Proportions should total ≈1.

Relative Cumulative Frequency Distribution

Distribution showing the proportion of values in each category plus all preceding categories, relative to the full dataset. The last category should be ≈1.
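A minimal Python sketch that builds the four ungrouped distributions described above (frequency, cumulative, relative, and relative cumulative) for the test-completion times used in later questions. The column labels f, cf, rf, and crf are shorthand chosen for this sketch.

```python
from collections import Counter
from itertools import accumulate

times = [18, 2, 9, 12, 17, 12, 20, 14, 13, 13]
n = len(times)

counts = Counter(times)
values = sorted(counts)                 # distinct values, low to high
freq = [counts[v] for v in values]      # frequency of each value
cum_freq = list(accumulate(freq))       # running total; last entry should equal n
rel_freq = [f / n for f in freq]        # proportions; should sum to ~1
cum_rel = [c / n for c in cum_freq]     # running proportion; last entry should be ~1

print("x  f  cf  rf   crf")
for row in zip(values, freq, cum_freq, rel_freq, cum_rel):
    print("{:<2} {:<2} {:<3} {:.2f} {:.2f}".format(*row))
```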

True or False: A single outlier will not affect your calculated mean for a sample.

False

True or False: A theoretical construct can be directly observed in an experiment and is a valid dependent variable.

False

True/False: When building a grouped frequency distribution, you can combine bins of data that are nominal, ordinal, or interval/ratio.

False

Sampling Error

Inevitable differences between the sample and population due to random error

IQR

Interquartile Range. IQR = Q3 - Q1. Describes the spread of the middle 50% of the data. Not affected by outliers.
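A minimal Python sketch comparing the range and the IQR for the test-completion times in the questions below. Q1 and Q3 are found here as the medians of the lower and upper halves of the sorted data. That reproduces the 18-minute range and the 5-minute IQR given as answers. Be aware that the (n+1)/4 position rule described under Q1 can give a slightly different value for some datasets.

```python
from statistics import median

def quartiles(scores):
    """Q1 and Q3 as medians of the lower and upper halves (middle value excluded when n is odd)."""
    s = sorted(scores)
    half = len(s) // 2
    lower = s[:half]
    upper = s[half + len(s) % 2:]
    return median(lower), median(upper)

times = [18, 2, 9, 12, 17, 12, 20, 14, 13, 13]
q1, q3 = quartiles(times)
print("range:", max(times) - min(times))  # range: 18
print("Q1:", q1, "Q3:", q3, "IQR:", q3 - q1)
```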

Mode

Most frequently occurring value(s). Used for nominal, ordinal, interval, & ratio data.

Population Number Symbol

N

In a _________-skewed distribution, we would expect the median to have a greater value than the mean.

Negatively

You are conducting an experiment to determine how types of exercise will affect the amount of physiological stress in a group of participants. Amount of physiological stress is determined by measuring the amount of cortisol (a stress hormone) in the blood. You randomly assign your participants to three conditions: aerobic exercise (treadmill), anaerobic exercise (weight lifting), and sedentary (control). You measure levels of cortisol in the blood (in micrograms per deciliter, mcg/dL) before the experiment and after three weeks of participation. What is the level of measurement for the independent variable?

Nominal

Degrees of Freedom

Number of values in a dataset that are free to vary, given one or more mathematical restrictions. For a sample SD, df = n - 1.

Unbiased Statistic

On average, correctly estimates the population parameter. E.g., x̅ as an estimate of µ.

Biased Statistic

On average over- or underestimates the true population parameter. E.g., the sample SD computed with n instead of n - 1 in the denominator tends to underestimate σ.
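A small Python simulation of this bias. It assumes a hypothetical normal population with μ = 100 and σ = 15, so the true variance is 225. It draws many samples of size 5 and averages the variance computed with n and with n - 1. The n version lands noticeably low (its expected value is 225 × 4/5 = 180), while the n - 1 version lands close to 225. Strictly speaking, the correction makes the variance unbiased; its square root, the SD, is still very slightly biased, but far less than without the correction.

```python
import random

def variance(scores, ddof):
    """SS / (n - ddof): ddof=0 divides by n (biased), ddof=1 divides by n - 1."""
    m = sum(scores) / len(scores)
    ss = sum((x - m) ** 2 for x in scores)
    return ss / (len(scores) - ddof)

random.seed(14)
TRUE_SIGMA = 15  # assumed population: normal, mu = 100, sigma = 15 (variance 225)
n, trials = 5, 20_000

biased, unbiased = [], []
for _ in range(trials):
    sample = [random.gauss(100, TRUE_SIGMA) for _ in range(n)]
    biased.append(variance(sample, ddof=0))
    unbiased.append(variance(sample, ddof=1))

avg_biased = sum(biased) / trials
avg_unbiased = sum(unbiased) / trials
print(f"true variance:      {TRUE_SIGMA ** 2}")
print(f"average with n:     {avg_biased:.1f}")    # expected value is 180
print(f"average with n - 1: {avg_unbiased:.1f}")  # expected value is 225
```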

A journal article reports the systolic blood pressure (measured in millimeters of mercury - mmHg) due to levels of stress in a randomly selected sample of people. The subjects are randomly assigned to categories of high, medium, and low levels of experimentally induced stress. What is the level of measurement that best describes the independent variable?

Ordinal

A ______ is a value that reflects a value in the entire population, while a _____ reflects a value from a sample.

Parameter; Statistic

Nominal

Qualitative variables that don't have an order/hierarchy

Ordinal

Qualitative variables with an order/hierarchy

Continuous Variable

Quantitative variable that can take any value, including fractions (though possibly within a range). Measured on an interval or ratio scale. Examples: time, height, weight.

Interval

Quantitative variables with an order and equal intervals between values, but no true zero (e.g., temperature in °C)

Ratio

Quantitative variables with an order, equal intervals, and a true zero (e.g., reaction time)

Scientific Process

Question, Research, Hypothesis, Experiment, Analyse, Report, Repeat

Measures of Variability

Range, interquartile range, standard deviation. They describe how much the data points differ from each other (the spread of the data).

Cumulative Percentile Ranks

Percentage of values at or below a given value or bin (e.g., a score at the 99th percentile is at or above 99% of the scores)
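A minimal Python sketch of an exact percentile rank (percentage of scores at or below a value), using the ungrouped test-completion times from the questions below. Because the data are ungrouped, the rank is exact rather than estimated from a bin.

```python
def percentile_rank(scores, value):
    """Percentage of scores at or below `value`."""
    at_or_below = sum(1 for x in scores if x <= value)
    return 100 * at_or_below / len(scores)

times = [18, 2, 9, 12, 17, 12, 20, 14, 13, 13]
print(percentile_rank(times, 13))  # 60.0
print(percentile_rank(times, 20))  # 100.0
```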

Sample Standard Deviation

S, SD

Sample Standard Deviation Formula

SD = √(SS/(n - 1))

Inaccurate y-Axis Trick

Starting the y-axis at a value other than 0 to make differences look larger (more significant) than they are

Approximate Ranks

The percentile rank of a value that falls within a bin. It can't be exact because the individual value is hidden within its bin, so it is estimated from the bin's rank.

Exact Ranks

The specific percentile rank of an individual value. Can only be computed from an ungrouped distribution.

You give a personality test to 10 friends. You are actually interested in the time it takes for them to each complete the test. Their scores (in minutes) are: 18, 2, 9, 12, 17, 12, 20, 14, 13, 13. You decide to re-calculate the standard deviation, assuming that the above scores are a sample from a larger population of interest. The result, rounded to the nearest hundredth, is:

5.06

As part of a project, you collect a sample of IQ scores from 10 friends. The scores are: 110, 108, 98, 98, 105, 115, 130, 102, 104, 110. What is the standard deviation for this sample? (Round to two decimal points.)

9.44
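The same answer can be cross-checked with Python's standard `statistics` module. `stdev` divides by n - 1 (sample SD) and `pstdev` divides by N (population SD). This sketch simply shows both side by side for the IQ sample in the question above.

```python
from statistics import mean, pstdev, stdev

iq = [110, 108, 98, 98, 105, 115, 130, 102, 104, 110]
print(mean(iq))              # sample mean
print(round(stdev(iq), 2))   # 9.44 (sample SD, divides by n - 1)
print(round(pstdev(iq), 2))  # 8.96 (population SD, divides by N)
```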

Line Chart

Best used when x- and y-axes are continuous

Scatter Plots

Best used when x- and y-axes are continuous. Can show bivariate/multivariate trends by revealing clusters or modes in the data.

Bar Chart

Best used when y-axis is continuous and x-axis is discrete

Z-Score Formula

z = (xᵢ - μ)/σ

As part of a project, you collect a sample of IQ scores from 10 friends. The scores are: 110, 108, 98, 98, 105, 115, 130, 102, 104, 110. You find out that the population mean for IQ scores is 100, and the population standard deviation is 15. What is the z score of 108, relative to the population? (Round to two decimal points.)

0.53
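A minimal Python sketch of the z-score formula, using the population IQ parameters (μ = 100, σ = 15) given in the question above. The first call reproduces the 0.53 answer. The second shows a score below the mean getting a negative z.

```python
def z_score(x, mu, sigma):
    """How many population SDs x lies above (+) or below (-) the population mean."""
    return (x - mu) / sigma

print(round(z_score(108, 100, 15), 2))  # 0.53
print(round(z_score(98, 100, 15), 2))   # -0.13
```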

Volume Trick

Blowing up the proportions of a (bar) chart to make one statistic seem more significant than the other

Which of the following are most appropriately pursued as observational studies (as compared to experimental studies)? (Mark all that apply.) 1. Examining the incidence of brain tumors in cell-phone users. 2. Studying the dosage-dependent effects of a new medication on blood pressure. 3. Studying the variations in strains of HIV in different geographic populations. 4. Examining the effects of different levels of attention on reaction times in a visual task. 5. Measuring the effect of levels of word familiarity on different brain wave components. 6. Exploring the relationship between poverty and academic achievement outcomes.

1. Examining the incidence of brain tumors in cell-phone users. 3. Studying the variations in strains of HIV in different geographic populations. 6. Exploring the relationship between poverty and academic achievement outcomes.

Which definition below best captures the meaning of "sampling error"? 1. The difference between the population and sample that is due to the random selection of individuals in a variable population. 2. The difference between the population and sample that is due to measurement errors during data collection. 3. The difference between the population and sample that is due to biased observations by the investigator. 4. The difference between the population and sample that is due to self-selection by the experiment participant.

1. The difference between the population and sample that is due to the random selection of individuals in a variable population.

You give a personality test to 10 friends. You are actually interested in the time it takes for them to each complete the test. Their scores (in minutes) are: 18, 2, 9, 12, 17, 12, 20, 14, 13, 13. To get a quick idea of the variability in the data, you calculate the range and report that it is ___

18 minutes

Which of the following are discrete variables? (Mark all that apply.) 1. Height 2. Family Size 3. Blood Type 4. Electoral Votes 5. Reaction Time

2. Family Size 3. Blood Type 4. Electoral Votes

The degrees of freedom (df) are defined as the number of values (measurements) that are free to vary, given one or more mathematical restrictions. What is the mathematical restriction that makes you lose one degree of freedom when calculating the standard deviation for a sample? 1. The mean can be affected by outliers, so we lose one extreme value. 2. The deviations from the sample mean must sum to zero. 3. The deviations from the population mean must sum to zero. 4. The sum of squares can't be negative.

2. The deviations from the sample mean must sum to zero.
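A short Python sketch of this restriction, using the test-completion times from the earlier questions. The deviations from the sample mean always sum to zero. So once any n - 1 deviations are known, the last one is forced, which leaves only n - 1 free to vary.

```python
times = [18, 2, 9, 12, 17, 12, 20, 14, 13, 13]
m = sum(times) / len(times)  # sample mean: 13.0

deviations = [x - m for x in times]
print(sum(deviations))  # 0.0

# The first n - 1 deviations determine the last one
last = -sum(deviations[:-1])
print(last == deviations[-1])  # True
```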

Which of the following symbols is associated with a population? [Mark all that apply.] 1. M 2. µ 3. N 4. n 5. x̅

2. µ 3. N

You have gathered data on the reaction times of 50 participants when performing sudden turns in a simulated driving task. You want to put the average times of each participant into a grouped frequency distribution histogram. Which of the following is not a required rule for producing the distribution? 1. All classes should be listed, even those with zero frequencies. 2. Each score should be included in only one class/grouping. 3. All classes should have upper/lower bounds 4. All classes should have equal intervals.

3. All classes should have upper/lower bounds

What is the median of the following distribution of scores? 8, 5, 1, 1, 3, 7, 2, 18

4

An unscrupulous person could exaggerate differences in bar graphs by: 1. Omitting the lower end of the dependent measure scale to make the differences appear larger in magnitude. 2. Increasing the width of the taller bar(s). 3. Adding volume to the bars with 3D visual cues. 4. All of the Above

4. All of the Above

Of the options below, which is a weakness of the IQR? 1. It is susceptible to outliers. 2. It relies on only two values. 3. It can't be used for interval/ratio scale data. 4. It ignores half of the data.

4. It ignores half of the data.

"Do selective serotonin re-uptake inhibitors reduce the effects of post-traumatic stress?" This question is best characterized as a: 1. Statistical Hypothesis 2. Null Hypothesis 3. Alternative Hypothesis 4. Research Question

4. Research Question

You give a personality test to 10 friends. You are actually interested in the time it takes for them to each complete the test. Their scores (in minutes) are: 18, 2, 9, 12, 17, 12, 20, 14, 13, 13. Although the IQR looks reasonable, you want to measure the variability using all of the scores. You calculate the standard deviation for the above population and find that it is ___

4.80 minutes

You give a personality test to 10 friends. You are actually interested in the time it takes for them to each complete the test. Their scores (in minutes) are: 18, 2, 9, 12, 17, 12, 20, 14, 13, 13. After finding the range, you worry that it is being affected by outliers. You calculate the interquartile range and find that it is:

5 minutes

Where COGS14B (statistics) fits into the Scientific Process

Analysis

Mean

Arithmetic average (population or sample). Used for interval/ratio data. Sample: x̅ = (∑xᵢ)/n; population: μ = (∑xᵢ)/N.

You are writing a paper that discusses the carbon emissions of different countries throughout the world. To compare the emissions of 10 different countries for a single year, what kind of chart would be best?

Bar Chart

Bi/Multimodal

Having two (bimodal) or more (multimodal) modes. A set can still be bimodal even if the counts for the 2 values are not exactly equal, as long as both counts are clearly higher than for the other values. Can indicate the influence of multiple variables or other circumstances (e.g., two subgroups mixed in one dataset).

Grouped Frequency Histogram

Histogram showing the frequency of results, with the results grouped into bins (e.g., intervals of 4,999 units). Rules: all bins from the lowest to the highest value must be listed, even those with zero frequency; bins must be of equal size; each score must fall into exactly one bin.
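A minimal Python sketch of a grouped frequency distribution that follows the rules above, using the test-completion times from the questions. It assumes non-negative whole-number scores and a bin width of 5, an arbitrary choice for illustration. Bins start at 0 and run through the maximum value, empty bins are kept, and integer division places each score in exactly one bin.

```python
times = [18, 2, 9, 12, 17, 12, 20, 14, 13, 13]
WIDTH = 5  # assumed bin width; every bin is the same size

# Bins cover 0 through the maximum value, including any empty bins
n_bins = max(times) // WIDTH + 1
counts = [0] * n_bins
for x in times:
    counts[x // WIDTH] += 1  # each score lands in exactly one bin

for i, c in enumerate(counts):
    lo = i * WIDTH
    print(f"{lo}-{lo + WIDTH - 1}: {c}")
```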

Symmetric Distribution

Distribution whose histogram can be cut into two mirror-image halves. Ideally, mean = median = mode.

Skewed Distribution

Distribution whose histogram cannot be cut into mirror-image halves because the data pile up on one side, leaving a tail on the other

Variability

How much scores differ from each other and from the mean

Median

Middle of the sorted data (50% of the data is above it and 50% below). Not affected by outliers. Used for ordinal, interval, & ratio data. Find its position with (n+1)/2. If the position is a decimal, add the values just above and below it, then divide by 2.
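A minimal Python sketch of the (n+1)/2 position rule, with positions counted from 1 in the sorted data. The first call uses the 8 scores from the median question later in this set and reproduces its answer of 4.

```python
def median_by_position(scores):
    """Median via the (n + 1) / 2 position rule on the sorted scores (1-based positions)."""
    s = sorted(scores)
    pos = (len(s) + 1) / 2
    if pos.is_integer():
        return s[int(pos) - 1]
    below, above = s[int(pos) - 1], s[int(pos)]  # values on either side of a .5 position
    return (below + above) / 2

print(median_by_position([8, 5, 1, 1, 3, 7, 2, 18]))  # 4.0
print(median_by_position([3, 1, 2]))                  # 2
```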

You want to know whether the college that students attend at UCSD has a large impact on their salaries in the first year after graduation. For one year, incoming students are randomly assigned to different colleges. When those students graduate, you calculate the median salaries for this group of students from each college. As stated, the level of measurement for the dependent variable is _____

Interval/Ratio

You want to test the effectiveness of two training programs to overcome test anxiety. You divide participants into two groups: Treatment A and Treatment B. You track the effect of the treatments by measuring the heart rate of each participant on a series of simulated tests over the course of a year. What would be the most effective data visualization for comparing the average effects of the two treatments over time?

Line Chart

Q1

Lower quartile. The "median" between the lowest value and the median. Position found through the equation (n+1)/4. If the position is a decimal, average the data points above and below it (same process as the median).

Measures of Central Tendency

Mean, Median, Mode

Z-Score

Measure of how many standard deviations a score is above or below the mean. Unitless.

Parameter

Measurement describing the population

Statistic

Measurement describing the sample

Standard Deviation

Measures how much scores typically differ from the mean. Square root of the variance. Important because it uses all of the scores (unlike the range or IQR) and is expressed in the same units as the data.

A distribution of test scores includes the following values: 35, 51, 63, 68, 74, 81, 83, 90, 91, 91, 91, 92, 92, 92, 92, 93, 93, 95. Which measure of central tendency is most appropriate for describing how students tended to do on the exam?

Median

You are testing the effectiveness of a new blood pressure medication on a group of 30 participants. Assume that drug effectiveness is measured using a continuous, interval/ratio dependent variable. The drug appears to be very effective (yielding consistently high scores) for 26 of your participants, but has little effect on 4 people (who have extremely small scores). Before even graphing the data, you suspect the best measure of central tendency for your entire data set will be the _____. [Note: this isn't necessarily the only value you would report.]

Median

Which section of a scientific paper will contain the most detailed description of the research design used?

Methods

Negatively Skewed Distribution

Tail tends to the left, Mode > Median > Mean

Positively Skewed Distribution

Tail tends to the right, Mode < Median < Mean
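A minimal Python sketch that checks the mean/median/mode ordering for the exam scores in the question above, using the standard `statistics` module. Comparing the mean to the median is only a rough rule of thumb for the direction of skew, not a formal skewness measure. Here mode > median > mean, which fits a negatively skewed distribution and explains why the median was the preferred answer.

```python
from statistics import mean, median, mode

scores = [35, 51, 63, 68, 74, 81, 83, 90, 91, 91, 91,
          92, 92, 92, 92, 93, 93, 95]  # exam scores from the question above

m, md, mo = mean(scores), median(scores), mode(scores)
print(m, md, mo)

# Rough rule of thumb based on mean vs. median
if m < md:
    print("negatively skewed (tail to the left)")
elif m > md:
    print("positively skewed (tail to the right)")
else:
    print("roughly symmetric")
```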

Data Omission Trick

Leaving relevant data out of a chart/graph to make the desired result seem more/less significant. E.g., showing only a short window of a line graph so that a change looks bigger than it is in the context of the full dataset.

Inferential Statistics

Using descriptive statistics from a sample to make estimates or predictions about the population

A doctor wants to understand the possible relationship between eating fast food and getting high blood pressure. The doctor asks his/her patients to fill out a survey of their eating habits and records their blood pressure over many years. The observations that the doctor collects are best characterized as a convenience sample. (True/False)

True

True or False: The Sum of Squares (SS) must always be positive (assuming it isn't zero).

True

True or False: The median is a valid measure of central tendency for ordinal scale data.

True

True/False: A relative frequency distribution can be created for nominal data.

True

Q3

Upper quartile. The "median" between the median and the highest value. Position found through the equation 3(n+1)/4. If the position is a decimal, average the data points above and below it (same process as the median).

Outlier

Value that is much larger or smaller than the rest of the dataset. Has the biggest effect on the mean, standard deviation, and range; has little or no effect on the median.

Discrete Variables

Variables that can only take specific values, e.g., whole-number integers. Types: nominal and ordinal (can also be interval/ratio). Examples: family size, blood type, electoral votes.

