MAT 232 Exam I


How to find midrange and define it.

the measure of center that is the value midway between the maximum and minimum values in the original data set. It is found by adding the maximum and minimum data values and then dividing the sum by 2.
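A minimal Python sketch of the midrange calculation. The function name and the sample values are hypothetical, chosen only for illustration. Because the midrange uses only the two extreme values, it is very sensitive to outliers.

```python
def midrange(values):
    """Value midway between the maximum and minimum data values."""
    if not values:
        raise ValueError("midrange needs at least one value")
    return (max(values) + min(values)) / 2


# Hypothetical data set, made up for illustration
print(midrange([3, 7, 8, 15]))  # (15 + 3) / 2 = 9.0
```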

Define Mode and the different types of mode

the values that occur with the greatest frequency. 1) Bimodal: when two data values occur with the same greatest frequency, each one is a mode and the data set is said to be bimodal. 2) Multimodal: when more than two data values occur with the same greatest frequency, each is a mode and the data set is said to be multimodal. 3) No mode: when no data value is repeated, there is no mode.
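The mode cases above can be sketched in Python with `collections.Counter`. The `modes` function and its labels are hypothetical names used for illustration. Because it only counts repeated values, the same function works for qualitative data such as colors.

```python
from collections import Counter


def modes(values):
    """Return (list of modes, description) using the bimodal/multimodal/no-mode rules."""
    counts = Counter(values)
    if not counts:
        raise ValueError("modes needs at least one value")
    top = max(counts.values())
    if top == 1:
        return [], "no mode"  # no data value is repeated
    result = [value for value, count in counts.items() if count == top]
    if len(result) == 1:
        return result, "one mode"
    if len(result) == 2:
        return result, "bimodal"
    return result, "multimodal"


print(modes([1, 2, 2, 3]))            # ([2], 'one mode')
print(modes([1, 1, 2, 2, 3]))         # ([1, 2], 'bimodal')
print(modes(["red", "blue", "red"]))  # (['red'], 'one mode')
```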

A boxplot is skewed left if

there is a longer left tail showing that relatively few data values are low values, and most of the data values are to the right.

A boxplot is skewed right (positively skewed) if

there is a longer right tail showing that relatively few data values are high values, and most of the data values are to the left.

_________ is defined and calculated as the average squared deviation from the mean.

variance

When a result occurs that is very unlikely to occur by chance.

Statistical Significance

A form of sampling that subdivides the population into at least two subgroups (strata) whose members share the same characteristic, then draws a random sample from within each subgroup.

Stratified Sampling
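A small Python sketch of stratified sampling, using only the standard `random` module. The function name, the `stratum_of` parameter, and the student data are hypothetical. It assumes an equal number of members is drawn from each stratum; drawing in proportion to stratum size is another common choice.

```python
import random


def stratified_sample(population, stratum_of, per_stratum, seed=None):
    """Group members into strata, then draw a simple random sample from each stratum."""
    rng = random.Random(seed)
    strata = {}
    for member in population:
        strata.setdefault(stratum_of(member), []).append(member)
    sample = []
    for members in strata.values():
        # rng.sample raises ValueError if a stratum has fewer than per_stratum members
        sample.extend(rng.sample(members, per_stratum))
    return sample


# Hypothetical population: 10 freshmen and 10 seniors
students = [(f"student{i}", "freshman" if i < 10 else "senior") for i in range(20)]
picked = stratified_sample(students, lambda s: s[1], per_stratum=3, seed=1)
```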

Sample

Subcollection of members selected from a population.

A form of sampling that selects a starting point and then selects every kth ("nth") member of the population.

Systematic Sampling
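A minimal Python sketch of systematic sampling, assuming the population is already in a list. The function name and the numbered population are hypothetical. The starting point is chosen at random among the first k members, and list slicing with step k then takes every kth member.

```python
import random


def systematic_sample(population, k, seed=None):
    """Choose a random starting point among the first k members, then take every kth member."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    start = random.Random(seed).randrange(k)
    return population[start::k]


# Hypothetical population of 100 numbered members, sampling every 10th one
members = list(range(1, 101))
chosen = systematic_sample(members, 10, seed=7)
```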

Statistics

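The science of planning studies and experiments; obtaining data; organizing, summarizing, presenting, analyzing, and interpreting those data; and then drawing conclusions based on them.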

__________ is the number of standard deviations that a given value x is above or below the mean.

Z-Score (standard score)
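The z score is z = (x − mean) / standard deviation. A short Python sketch follows, with hypothetical function names; it reuses the mean 21.2 and standard deviation 4.6 from the college-readiness test card. Rounding z to two decimal places, as z scores are usually reported, also keeps floating-point error from pushing a boundary value such as 30.4 just below 2.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return round((x - mean) / sd, 2)  # round to two decimal places


def classify(z):
    """Apply the z <= -2 / z >= 2 significance criteria."""
    if z <= -2:
        return "significantly low"
    if z >= 2:
        return "significantly high"
    return "not significant"


# Values from the college-readiness test card (mean 21.2, sd 4.6)
print(classify(z_score(30.4, 21.2, 4.6)))  # significantly high
print(classify(z_score(25.0, 21.2, 4.6)))  # not significant
```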

Skewed

a boxplot can be used to identify this. A distribution of data is skewed if it is not symmetric and extends more to one side than to the other.

Boxplot (or box-and-whisker diagram):

a graph of a data set that consists of a line extending from the minimum value to the maximum value, and a box with lines drawn at the first quartile, the median, and the third quartile.

Mean

of a data set is the measure of center found by adding all of the data values and dividing the total by the number of data values. 1) Sample mean: x̄ = Σx / n 2) Population mean: µ = Σx / N

Symbol for population (parameter) Pearson correlation coefficient

ρ ("rho")

Symbol for population (parameter) standard deviation

σ ("sigma")

The Empirical Rule states that for data sets having a distribution that is approximately bell-shaped, the following properties apply:

1) About 68% of all values fall within 1 standard deviation of the mean. 2) About 95% of all values fall within 2 standard deviations of the mean. 3) About 99.7% of all values fall within 3 standard deviations of the mean.

Range Rule of Thumb (list)

1) Significantly low values: (mean − 2 × standard deviation) or lower 2) Significantly high values: (mean + 2 × standard deviation) or higher 3) Values not significant: any values between the significantly low and significantly high limits

Population

the complete collection of all measurements or data that are being considered.

Consider a value to be significantly low if its z score is less than or equal to −2, or consider a value to be significantly high if its z score is greater than or equal to 2. A test is used to assess readiness for college. In a recent year, the mean test score was 21.2 and the standard deviation was 4.6. Identify the test scores that are significantly low or significantly high.

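Add 2 standard deviations to the mean for the high limit; subtract 2 standard deviations from the mean for the low limit. 21.2 + 2(4.6) = 30.4 and 21.2 − 2(4.6) = 12.0, so scores of 12.0 or lower are significantly low and scores of 30.4 or higher are significantly high.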

The blood platelet counts of a group of women have a bell-shaped distribution with a mean of 259.3 and a standard deviation of 67.8. (All units are 1000 cells/μL.) Using the empirical rule, find each approximate percentage below. Approximately ____% of women in this group have platelet counts between 123.7 and 394.9.

1) Determine how many standard deviations 123.7 and 394.9 are away from the mean. 2) Either add the standard deviation (67.8) to the lower value (123.7) or subtract it from the higher value (394.9) UNTIL YOU HAVE GOTTEN TO THE MEAN (259.3). 3) Convert that number of standard deviations to a percentage using the empirical rule. IN THIS CASE: (259.3 − 123.7) / 67.8 = 2 and (394.9 − 259.3) / 67.8 = 2, so 2 standard deviations, or about 95%.
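The steps above can be sketched in Python for the symmetric case, where both bounds sit the same whole number of standard deviations from the mean. The function and dictionary names are hypothetical; intervals that mix different numbers of standard deviations on each side are not handled in this sketch.

```python
# Approximate percentage of values within k standard deviations (empirical rule)
EMPIRICAL_RULE = {1: 68, 2: 95, 3: 99.7}


def empirical_rule_percent(low, high, mean, sd):
    """Percentage between two bounds that sit the same whole number of SDs from the mean."""
    k_low = round((mean - low) / sd, 2)
    k_high = round((high - mean) / sd, 2)
    if k_low != k_high or k_low not in EMPIRICAL_RULE:
        raise ValueError("bounds must be 1, 2, or 3 SDs on each side of the mean")
    return EMPIRICAL_RULE[int(k_low)]


print(empirical_rule_percent(123.7, 394.9, 259.3, 67.8))  # 95
```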

Use the following cell phone airport data speeds (Mbps) from a particular network. Find the percentile corresponding to the data speed 14.2 Mbps.

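1) Sort the data and find the location of 14.2 (44th value). 2) Count the values less than 14.2, which is one below its location (43). 3) Divide that count by the total number of values in the data set (/50). 4) Multiply by 100: 43/50 × 100 = 86, so 14.2 Mbps is at the 86th percentile.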
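The percentile of a value x is (number of values less than x) / (total number of values) × 100, rounded to the nearest whole number. A Python sketch follows; the function name is hypothetical, and because the exercise's table is not reproduced here, the 50 speeds below are made up so that exactly 43 of them are less than 14.2. Python's built-in `round` rounds halves to even, so the sketch rounds half up explicitly.

```python
import math


def percentile_of(x, data):
    """Percentile of x = (number of values less than x) / (total number of values) * 100."""
    below = sum(1 for value in data if value < x)
    return math.floor(below / len(data) * 100 + 0.5)  # round to the nearest whole number


# Hypothetical 50-value data set in which 43 values are less than 14.2
speeds = [v / 10 for v in range(1, 44)] + [14.2] + [20.0] * 6
print(percentile_of(14.2, speeds))  # 86
```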

How to find MAD (mean absolute deviations)

Sample MAD (samples of size 2): 1) For each sample (x, y), take the absolute difference of x and y and divide by 2: |x − y| / 2. 2) Add up all the values in the "mean absolute deviation" column and divide by the total number of samples = MEAN OF THE SAMPLE MEAN ABSOLUTE DEVIATIONS. Population MAD: 1) Add up the 3 values that are given OUTSIDE of the table and divide by 3 to get the mean. 2) Find the absolute difference between each of those 3 values and the mean. 3) Add them and divide by 3 = POPULATION MEAN ABSOLUTE DEVIATION. The sample mean absolute deviation is a biased estimator of the population mean absolute deviation because the values of the sample statistic do not center around the value of the population parameter.
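The bias described above can be checked with a short Python sketch. It uses a hypothetical population of three values and, as an assumption about how the table is built, lists every sample of size 2 drawn with replacement (3 × 3 = 9 samples) using `itertools.product`. The `mad` helper is a hypothetical name.

```python
from itertools import product
from statistics import mean


def mad(values):
    """Mean absolute deviation: the average of |x - mean|."""
    m = mean(values)
    return mean(abs(x - m) for x in values)


population = [1, 2, 6]  # hypothetical population of 3 values
pop_mad = mad(population)

# Every sample of size 2 drawn with replacement (assumption): 9 samples
sample_mads = [mad(s) for s in product(population, repeat=2)]
mean_of_sample_mads = mean(sample_mads)
```

For this made-up population the population MAD is 2, while the nine sample MADs average 10/9 (about 1.11), so the sample statistic does not target the parameter.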

Experimental Designs

1. Completely Randomized Experimental Design 2. Randomized Block Design 3. Matched Pairs Design 4. Rigorously Controlled Design

2 Types of Numerical Data

1. Discrete Data 2. Continuous Data

4 Levels of Measurement

1. Nominal Level 2. Ordinal Level 3. Interval Level 4. Ratio Level

2 Categories of Research Studies

1. Observational Study 2. Experiment

Steps of Scientific Inquiry

1. Prepare 2. Data Collection 3. Analyze Data 4. Conclusions

Sampling Methods

1. Random Sample 2. Simple Random Sample 3. Systematic Sampling 4. Convenience Sampling 5. Stratified Sampling 6. Cluster Sampling

Sampling Errors

1. Sampling Error 2. Nonsampling Error 3. Nonrandom Sampling Error

Locating a percentile given a data set

20/50 x100= 40% --> P40=11.8


Second Quartile is the same value as

P50 and same as the median. It separates the bottom 50% of the sorted values from the top 50%.

Data sets so large and so complex that their analysis is beyond the capabilities of traditional software tools.

Big Data

Listed below are pulse rates (beats per minute) from samples of adult males and females. Does there appear to be a difference? Find the coefficient of variation for each of the two samples; then compare the variation.

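CV = (standard deviation / mean) × 100%. Compute it for each sample and compare: the sample with the larger CV has more variation relative to its mean. (You could also just use StatCrunch.)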
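A minimal Python sketch of the sample coefficient of variation, using `statistics.stdev` (which divides by n − 1) and `statistics.mean`. The function name is hypothetical, and the pulse rates below are made-up values, not the data from the exercise.

```python
from statistics import mean, stdev


def coefficient_of_variation(sample):
    """CV = (s / x-bar) * 100, using the sample standard deviation s."""
    return stdev(sample) / mean(sample) * 100


# Hypothetical pulse rates, not the data from the exercise
males = [60, 70, 80]
females = [70, 80, 90]
print(round(coefficient_of_variation(males), 1))    # 14.3
print(round(coefficient_of_variation(females), 1))  # 12.5
```

Both made-up samples have the same standard deviation (10), yet the CV shows the first has more variation relative to its mean.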

Names or labels

Categorical

__________ states that the proportion of any set of data lying within K standard deviations of the mean is always at least 1 − 1/K², where K is any positive number greater than 1. For K = 2 and K = 3, we get the following statements:

Chebyshev's Theorem 1) At least ¾ (or 75%) of all values lie within 2 standard deviations of the mean. 2) At least 8/9 (or 89%) of all values lie within 3 standard deviations of the mean.

Using the accompanying table of data, blood platelet counts of women have a bell-shaped distribution with a mean of 255.1 and a standard deviation of 65.4. (All units are 1000 cells/μL.) Using Chebyshev's theorem, what is known about the percentage of women with platelet counts that are within 3 standard deviations of the mean? What are the minimum and maximum possible platelet counts that are within 3 standard deviations of the mean?

Chebyshev's theorem: (1 − 1/K²) × 100%, where K = the number of standard deviations AWAY from the mean. For K = 3: (1 − 1/9) × 100% ≈ 89%, so at least 89% of women have platelet counts within 3 standard deviations of the mean. Min and max: add your standard deviation however many times it gives you (3) to the mean for the MAX, and subtract it the same number of times for the MIN: 255.1 ± 3(65.4), so min = 58.9 and max = 451.3.
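The Chebyshev calculation above can be sketched in Python. The function names are hypothetical; the mean 255.1 and standard deviation 65.4 come from the platelet exercise.

```python
def chebyshev_min_percent(k):
    """Minimum percentage of values within k SDs of the mean: (1 - 1/k^2) * 100."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return (1 - 1 / k ** 2) * 100


def within_k_sd(mean, sd, k):
    """Minimum and maximum values that are within k SDs of the mean."""
    return mean - k * sd, mean + k * sd


print(round(chebyshev_min_percent(2)))  # 75
print(round(chebyshev_min_percent(3)))  # 89
low, high = within_k_sd(255.1, 65.4, 3)
print(round(low, 1), round(high, 1))    # 58.9 451.3
```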

A form of sampling that partitions the population into clusters, randomly selects some of the clusters, and then selects all members of the selected clusters.

Cluster Sampling
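A small Python sketch of cluster sampling, assuming the clusters are given as a dictionary from cluster name to members. The function name and the classroom data are hypothetical. Unlike stratified sampling, whole clusters are chosen at random and every member of each chosen cluster is kept.

```python
import random


def cluster_sample(clusters, num_clusters, seed=None):
    """Randomly select whole clusters, then keep every member of each selected cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)
    return [member for name in chosen for member in clusters[name]]


# Hypothetical population split into classrooms (the clusters)
classrooms = {"A": [1, 2, 3], "B": [4, 5], "C": [6, 7, 8, 9], "D": [10, 11]}
sample = cluster_sample(classrooms, 2, seed=3)
```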

___________ for a set of nonnegative sample or population data, expressed as a percent, describes the standard deviation relative to the mean.

Coefficient of variation

Assign subjects to different treatment groups through a process of random selection.

Completely Randomized Experimental Design

When there is a noticeable effect, but the factor that caused it isn't identifiable.

Confounding

"Prepare"

Context of the data- the circumstances that create the setting for the event being researched.

Infinitely many possible quantitative values, where the collection of values is not countable.

Continuous Data

A form of sampling that uses data that is easiest to get (Least accurate).

Convenience Sampling

Data are observed, measured, and collected at one point in time, not over a period of time.

Cross-Sectional Study

Involves applications of statistics, computer science, and software engineering, along with some other relevant fields.

Data Science

Intentionally misleading presentation of data (e.g., false advertising) to persuade a particular audience.

Deliberate Distortions

The data values are quantitative and the number of values is finite (or "countable").

Discrete Data

Data

Facts and statistics collected together for reference or analysis.

Third Quartile is the same value as

P75. It separates the bottom 75% of the sorted values from the top 25%.

Numerical measurement representing a population.

Parameter

Mean of a frequency distribution

When working with data summarized in a frequency distribution, we make calculations possible by pretending that all sample values in each class are equal to the class midpoint. Steps: multiply each frequency by its class midpoint, add the products, and then divide that sum by the sum of the frequencies: x̄ = Σ(f · x) / Σf.
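The steps above can be sketched in Python. The function name and the three classes below are hypothetical, chosen only to show the arithmetic.

```python
def frequency_mean(midpoints, frequencies):
    """Mean of a frequency distribution: sum(f * x) / sum(f), with x = class midpoint."""
    if len(midpoints) != len(frequencies):
        raise ValueError("need one frequency for each class midpoint")
    products = [f * x for f, x in zip(frequencies, midpoints)]  # frequency times midpoint
    return sum(products) / sum(frequencies)


# Hypothetical classes 60-69, 70-79, 80-89 with midpoints 64.5, 74.5, 84.5
print(frequency_mean([64.5, 74.5, 84.5], [2, 5, 3]))  # 75.5
```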

"Analyze Data"

Graph & Explore- graphing the data to be observed for analysis. Apply Statistical Methods- use technology to obtain results.

Population Standard Deviation

If the data is being considered a population on its own, we divide by the number of data points, N.

Sample Standard Deviation Formula

If the data is being considered a sample of a population, we divide by n − 1 (one less than the number of data points n).
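A Python sketch contrasting the two divisors. The function names and data values are hypothetical; Python's standard `statistics.pstdev` and `statistics.stdev` follow the same two conventions (divide by N and by n − 1).

```python
import math


def population_sd(data):
    """sigma: divide the sum of squared deviations by N."""
    mu = sum(data) / len(data)
    return math.sqrt(sum((x - mu) ** 2 for x in data) / len(data))


def sample_sd(data):
    """s: divide the sum of squared deviations by n - 1."""
    x_bar = sum(data) / len(data)
    return math.sqrt(sum((x - x_bar) ** 2 for x in data) / (len(data) - 1))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values
print(population_sd(data))        # 2.0
print(round(sample_sd(data), 3))  # 2.138
```

Dividing by n − 1 makes s slightly larger than σ computed from the same numbers.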

________ is a measure of variability, based on dividing a data set into quartiles.

Interquartile Range IQR (=Q3-Q1)

Differences are meaningful, but there is no natural zero starting point and ratios are meaningless.

Interval Level (order + differences)

"Conclusions"

Is there statistical significance? Is there practical significance?

Standard deviation

It is the measure of variation most commonly used in statistics. The standard deviation of a set of sample values, denoted by s, is a measure of how much data values deviate away from the mean. The formula we use for standard deviation depends on whether the data is being considered a population of its own, or the data is a sample representing a larger population.

For a data set of brain volumes (cm³) and IQ scores of nine males, the linear correlation coefficient is found and the P-value is 0.497. Write a statement that interprets the P-value and includes a conclusion about linear correlation.

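The P-value of 0.497 means there is a 49.7% chance of getting a linear correlation coefficient at least as extreme as the one obtained if there were no linear correlation. Because 0.497 is greater than the 0.05 (5%) significance level, the P-value is high, so there is not sufficient evidence to support a conclusion of a linear correlation between brain volume and IQ score.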

Use the following cell phone airport data speeds (Mbps) from a particular network. Find P40.

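L = (k/100) × number of values = (40/100) × 50 = 20. Because L is a whole number, P40 is midway between the 20th and 21st values: P40 = (20th value + 21st value) / 2.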

Use the following cell phone airport data speeds (Mbps) from a particular network. Find P75.

L = (k/100) × number of values = (75/100) × 50 = 37.5. Because L is not a whole number, ROUND UP to 38, so P75 = 38th value = 10.4.
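The locator method can be sketched in Python for whole-number k from 1 to 99. The function name is hypothetical, and the 50 values below are made up, since the exercise's table is not reproduced here. Integer arithmetic is used to test whether L is a whole number, because floating-point products such as 0.07 × 100 are not always exact.

```python
def percentile_value(data, k):
    """Value of P_k using the locator L = (k / 100) * n (1 <= k <= 99)."""
    if not 1 <= k <= 99:
        raise ValueError("k must be between 1 and 99")
    values = sorted(data)
    n = len(values)
    kn = k * n  # integer arithmetic avoids floating-point trouble
    if kn % 100 == 0:
        L = kn // 100
        # L is a whole number: P_k is midway between the Lth and (L + 1)th values
        return (values[L - 1] + values[L]) / 2
    L = -(-kn // 100)  # L is not a whole number: round it up
    return values[L - 1]  # positions are 1-based, list indexes are 0-based


# Hypothetical sorted data: 0.1, 0.2, ..., 5.0 (50 values)
data = [i / 10 for i in range(1, 51)]
print(percentile_value(data, 75))  # 3.8 (L = 37.5, rounded up to 38)
print(percentile_value(data, 25))  # 1.3 (L = 12.5, rounded up to 13)
```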

Questions that are intentionally worded to elicit a desired response.

Loaded Questions

Compare two treatment groups by using subjects matched in pairs that are somehow related or have similar characteristics.

Matched Pair Design

__________ is the measure of center that is the middle value when the original data values are arranged in order of increasing (or decreasing) magnitude.

Median (resistant measure)

When a statement is made that is not justified by the statistical analysis.

Misleading Conclusions

When values are missing - this can be either random or intentional.

Missing Data

When a percentage is inaccurate and misleads people.

Misunderstanding Percentages

Finding the data value of a given percentile in a data set

n = number of values and k = percentile. In this instance, k = 25 (the 25th percentile) and n = 50. L = (k/100) × n ----> (25/100) × 50 = 12.5. Because L is not a whole number, ROUND UP to the next whole number = 13. Therefore, P25 = 13th value = 7.9 in the specific data set.

Data consist of names, labels, or categories only and cannot be arranged in order.

Nominal Level (name)

The result of using a sampling method that is not random, such as using a convenience sample or a voluntary response sample.

Nonrandom Sampling Error

When someone either refuses to respond to a survey question or is unavailable.

Nonresponse

The result of human error, including such factors as wrong data entries, computing errors, questions with biased wording, false data provided by respondents, forming biased conclusions, or applying statistical methods that are not appropriate for the circumstances.

Nonsampling Error

Observational Study

Observe and measure but do not modify.

The order of questions can be loaded by having certain questions follow one another to elicit a desired response.

Order of Questions

Data can be arranged in order, but differences between data values either cannot be determined or are meaningless.

Ordinal Level (order)

First Quartile is the same value as

P25. It separates the bottom 25% of the sorted values from the top 75%.

µ

Parameter/ Population mean symbol ("Mu")

_________ are measures of locations, denoted by P1,P2, P3..... which divide a set of data into 100 groups with about 1% of the values in each group.

Percentiles

If you have the data for the whole population available, you can compute the __________.

Population variance (σ² = square of the population standard deviation σ)

Using common sense to determine whether a result makes enough of a difference to be meaningful in practical terms.

Practical Significance

Data are collected in the future from groups that share common factors.

Prospective Study

Finding the values of quartiles can be accomplished with the same procedure used for finding percentiles. Use these relationships:

Q1 = P25 Q2 = P50 Q3 = P75

Numbers representing counts or measurements.

Quantitative

A sample in which each individual member of the population has an equal chance of being selected.

Random Sample

Dividing subjects into blocks (groups) with similar characteristics and then randomly assigning treatments to the subjects within each block.

Randomized Block Design

__________ of a set of data values is the difference between the maximum data value and the minimum data value.

Range

There is a natural zero starting point and ratios make sense.

Ratio Level (order + differences + ratios)

When people report results, they oftentimes exaggerate, whereas when you measure the results yourself, you get the most accurate measurements.

Reported Results rather than measured results

Experiment

The researcher applies some treatment (experimental condition) and then observes its effects on the subjects.

A statistic is _______ if the presence of extreme values (outliers) does not cause it to change very much (ex: median).

Resistant measure

Data are collected from a past time period by going back in time.

Retrospective Study

Carefully assign subjects to different treatment groups, so that those given each treatment are similar in the ways that are important to the experiment. (The most difficult one)

Rigorously Controlled Design

Standard deviation uses _______ units. Variance uses _______.

SD= original, variance= units^2

_______________ is the number of data values.

Sample Size

When you don't have all the data available, you estimate the population's variance from the ________.

Sample variance (s² = square of the sample standard deviation s)

When the sample has been selected with a random method, but there is a discrepancy between a sample result and the true population result.

Sampling Error

Favors whomever published the study.

Self-Interest Study

A sample of "n" subjects is selected so that every sample of the same size "n" has the same chance of being selected.

Simple Random Sample

A successful basketball player has a height of 6 feet 10 inches, or 208 cm. Based on statistics from a data set, his height converts to the z score of 4.81. How many standard deviations is his height above the mean?

Simply 4.81, because a z-score measures how many standard deviations something is above the mean.

Conclusions based on a sample that is too small (for example, only a few people) to produce accurate results.

Small Samples

"Data Collection"

Source of the data- where the data is coming from. Sampling Method- the procedure in which a researcher selects members from a population. Voluntary Response Sample- the respondents decided whether to be included or not.

Numerical measurement representing a sample.

Statistic

Outliers

Values more than 1.5 × IQR above Q3 or more than 1.5 × IQR below Q1 (the criterion used with modified boxplots). When analyzing data, it is important to identify and consider outliers because they can strongly affect the values of some important statistics, such as the mean and standard deviation. In general, outliers are sample values that lie very far away from the vast majority of the other values in a set of data; for modified boxplots, they are the data values meeting the quartile-based criterion above.
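The 1.5 × IQR criterion can be sketched in Python. The function names and the numbers are hypothetical. The quartiles are passed in rather than computed, since textbooks and software use slightly different quartile methods.

```python
def outlier_fences(q1, q3):
    """Lower and upper limits: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(data, q1, q3):
    """Values strictly below the lower limit or strictly above the upper limit."""
    low, high = outlier_fences(q1, q3)
    return [x for x in data if x < low or x > high]


# Hypothetical quartiles and data: IQR = 10, so the limits are -5.0 and 35.0
print(outlier_fences(10, 20))                       # (-5.0, 35.0)
print(find_outliers([12, 15, 40, -8, 30], 10, 20))  # [40, -8]
```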

s is a _______ estimator of σ; however, s² is an _________ estimator of σ².

biased, unbiased (biased simply means that the values of the sample standard deviation s do not center around the value of σ.)
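This can be checked exactly with a small Python sketch. It uses a hypothetical population of three values and, as an assumption, lists all 9 samples of size 2 drawn with replacement; `statistics.pvariance` divides by N and `statistics.variance` divides by n − 1.

```python
from itertools import product
from math import sqrt
from statistics import mean, pvariance, variance

population = [1, 2, 6]            # hypothetical population
sigma_sq = pvariance(population)  # population variance (divides by N)
sigma = sqrt(sigma_sq)

# All 9 samples of size 2, drawn with replacement (assumption)
samples = list(product(population, repeat=2))
s_sq_values = [variance(s) for s in samples]  # sample variance (divides by n - 1)
s_values = [sqrt(v) for v in s_sq_values]

mean_s_sq = mean(s_sq_values)  # equals sigma_sq: s^2 targets sigma^2
mean_s = mean(s_values)        # smaller than sigma: s does not target sigma
```

Here the sample variances average exactly σ² = 14/3, while the sample standard deviations average about 1.57, below σ ≈ 2.16.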

Use the F-scale measurements of tornadoes listed in the accompanying table. The range of the data is 4.0. Use the range rule of thumb to estimate the value of the standard deviation. Compare the result to the actual standard deviation of the data, 1.1.

Range rule of thumb: s ≈ range / 4 = 4.0 / 4 = 1.0. This estimate is close to the actual standard deviation of 1.1.

Measures of center

mean, mode, median

The brain volumes (cm³) of 20 brains have a mean of 1058.6 cm³ and a standard deviation of 126.7 cm³. Use the range rule of thumb to identify the limits separating values that are significantly low or significantly high. For such data, would a brain volume of 1262.0 cm³ be significantly high?

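Significantly low: mean − 2(standard deviation) = 1058.6 − 2(126.7) = 805.2 cm³ or lower. Significantly high: mean + 2(standard deviation) = 1058.6 + 2(126.7) = 1312.0 cm³ or higher. A brain volume of 1262.0 cm³ falls between these limits, so it is not significantly high.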
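Both uses of the range rule of thumb (significance limits and estimating s from the range) can be sketched in Python. The function names are hypothetical; the numbers come from the brain-volume and tornado exercises.

```python
def significance_limits(mean, sd):
    """Range rule of thumb: values 2 or more SDs from the mean are significant."""
    return mean - 2 * sd, mean + 2 * sd


def estimate_sd(data_range):
    """Range rule of thumb estimate of the standard deviation: s is about range / 4."""
    return data_range / 4


low, high = significance_limits(1058.6, 126.7)  # brain volumes, cm^3
print(round(low, 1), round(high, 1))            # 805.2 1312.0
print(1262.0 >= high)                           # False: not significantly high
print(estimate_sd(4.0))                         # 1.0 (tornado F-scale range)
```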

Quartiles Q1, Q2 (= median), Q3

measures of location, denoted by Q1, Q2, and Q3, which divide a set of data into four groups with about 25% of the values in each group.

The 5 number summary consists of

minimum, Q1, Q2 (median), Q3, and maximum

Which measure of center can be used with qualitative data and identifies the most popular (most frequent) value?

mode

Symbol for sample (statistic) Pearson correlation coefficient

r

coefficient of variation

sample: CV = (s / x̄) × 100%; population: CV = (σ / µ) × 100%

Measures of variation

standard deviation, variance (standard deviation squared), and range

Census

the collection of data from every member of the population.

