STA2023 Test1

Matched Pairs Design

Compare two treatment groups by using subjects matched in pairs that are somehow related or have similar characteristics.

Converting a Percentile to a Data Value

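Compute the locator L = (k ÷ 100) × n, where k is the percentile being used and n is the total number of values in the sorted data set. If L is a whole number, Pk is midway between the Lth value and the next value in the sorted list. If L is not a whole number, round L up to the next whole number; Pk is then the Lth value, counting from the lowest.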
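
A minimal sketch of one common textbook procedure for converting a percentile to a data value, assuming the locator L = (k ÷ 100) × n with k a whole-number percentile from 1 to 99. The function name percentile_to_value is hypothetical, and other textbooks and software use slightly different interpolation rules, so results may not match every tool.

```python
import math

def percentile_to_value(data, k):
    """Return P_k for a whole-number percentile k (1..99).

    Assumed rule: L = (k / 100) * n. If L is whole, average the Lth and
    (L+1)th sorted values; otherwise round L up and take that value.
    """
    values = sorted(data)
    n = len(values)
    whole, remainder = divmod(k * n, 100)  # integer arithmetic avoids float error
    if remainder == 0:
        # L is a whole number: midway between the Lth value and the next one
        return (values[whole - 1] + values[whole]) / 2
    # L is not whole: round up and use that (1-based) position
    position = math.ceil(k * n / 100)
    return values[position - 1]

scores = [2, 4, 6, 8, 10, 12, 14, 16]
p25 = percentile_to_value(scores, 25)  # L = 2, so average the 2nd and 3rd values
p30 = percentile_to_value(scores, 30)  # L = 2.4, so round up to the 3rd value
```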

P-Value

If there really is no linear correlation between two variables, the P-value is the probability of getting paired sample data with a linear correlation coefficient r that is at least as extreme as the one obtained from the paired sample data.

Replication

the repetition of an experiment on more than one individual. Good use requires sample sizes that are large enough so that we can see effects of treatments.

Nonsampling error

the result of human error, including such factors as wrong data entries, computing errors, questions with biased wording, false data provided by respondents, forming biased conclusions, or applying statistical methods that are not appropriate for the circumstances.

Nonrandom sampling error

the result of using a sampling method that is not random, such as using a convenience sample or a voluntary response sample.

Regression

The straight line that "best" fits the scatterplot of the data. Its equation is ŷ = b0 + b1x, where b0 is the y-intercept and b1 is the slope.
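
A short sketch of how the intercept and slope of a best-fit line can be computed, assuming "best" means the least-squares criterion (the usual choice in introductory courses). The function name least_squares_line and the sample data are illustrative only.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) for the least-squares line y_hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx           # slope
    b0 = y_bar - b1 * x_bar  # intercept: the line passes through (x_bar, y_bar)
    return b0, b1

# Illustrative data that lie exactly on y = 1 + 2x
b0, b1 = least_squares_line([1, 2, 3, 4], [3, 5, 7, 9])
predicted = b0 + b1 * 5  # predicted y for x = 5
```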

Delete Cases

One very common method for dealing with missing data: delete all subjects having any missing values.

Pie Charts

A very common graph that depicts categorical data as slices of a circle, in which the size of each slice is proportional to the frequency count for the category

Pareto Charts

A Pareto chart is a bar graph for categorical data, with the added stipulation that the bars are arranged in descending order according to frequencies, so the bars decrease in height from left to right.

Randomized Block Design

A block is a group of subjects that are similar, but blocks differ in ways that might affect the outcome of the experiment.

Skewness

A distribution of data is skewed if it is not symmetric and extends more to one side than to the other.

Histogram

A graph consisting of bars of equal width drawn adjacent to each other (unless there are gaps in the data)

Bar Graphs

A graph of bars of equal width to show frequencies of categories of categorical (or qualitative) data. The bars may or may not be separated by small gaps.

Dotplots

A graph of quantitative data in which each data value is plotted as a point (or dot) above a horizontal scale of values. Dots representing equal values are stacked

Time-Series Graph

A graph of time-series data, which are quantitative data that have been collected at different points in time, such as monthly or yearly

Frequency Polygon

A graph using line segments connected to points located directly above class midpoint values. A frequency polygon is very similar to a histogram, but a frequency polygon uses line segments instead of bars.

Simple Random Sample

A sample of n subjects is selected in such a way that *every possible sample of the same size n* has the same chance of being chosen. Note that the requirement applies to every possible sample of the same size, not just to every individual. Every simple random sample is a random sample, but not every random sample is a simple random sample.

Scatterplot (or Scatter Diagram)

A scatterplot (or scatter diagram) is a plot of paired (x, y) quantitative data with a horizontal x-axis and a vertical y-axis. The horizontal axis is used for the first variable (x), and the vertical axis is used for the second variable (y).

Completely Randomized Experimental Design

Assign subjects to different treatment groups through a process of random selection.

Voluntary Response Sample

A sample in which the respondents themselves decide whether to be included. By their very nature, all such samples are seriously flawed, because we should not make conclusions about a population on the basis of samples with a strong possibility of bias.

Rigorously Controlled Design

Carefully assign subjects to different treatment groups, so that those given each treatment are similar in ways that are important to the experiment.

Multistage Sampling

Collect data by using some combination of the basic sampling methods

Data

Collections of observations, such as measurements, genders, or survey responses

Cluster Sampling

Divide the population area into sections (or clusters), then randomly select some of those clusters, and choose all the members from those selected clusters.

Pictographs

Drawings of objects, called pictographs, are often misleading. Data that are one-dimensional in nature (such as budget amounts) are often depicted with two-dimensional objects (such as dollar bills) or three-dimensional objects (such as stacks of coins, homes, or barrels).

Double-Blind

Neither the subject nor the experimenter knows who gets the treatment and who gets the placebo. The subject doesn't know whether he or she is receiving the treatment or a placebo, and the experimenter (for example, the physician) doesn't know whether he or she is administering the treatment or a placebo.

CVDOT

Explore the data by analyzing the histogram to see what can be learned about "CVDOT": the Center of the data, the Variation, the shape of the Distribution, whether there are any Outliers, and Time.

Calculating a z score

For a sample: z = (x − x̄) ÷ s. For a population: z = (x − µ) ÷ σ. Note: x̄ (x-bar) = arithmetic mean of the sample, s = standard deviation of the sample, µ = arithmetic mean of the population, σ = standard deviation of the population.
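
A small sketch of the z score calculation, in which the subtraction is done before the division. The function name z_score and the sample values are illustrative; statistics.stdev is the standard-library sample standard deviation (it divides by n − 1).

```python
import statistics

def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Population case with known parameters (illustrative numbers)
z_pop = z_score(130, 100, 15)

# Sample case: estimate the mean and standard deviation from the data
sample = [2, 4, 4, 4, 5, 5, 7, 9]
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)
z_sample = round(z_score(9, x_bar, s), 2)  # rounded to two decimal places
```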

Practical Significance

It is possible that some treatment or finding is effective, but common sense might suggest that the treatment or finding does not make enough of a difference to justify its use or to be practical.

Levels of Measurement

Nominal - categories only Ordinal - categories with some order Interval - differences but no natural zero point Ratio - differences and a natural zero point

10–90 percentile range

P90 − P10

Finding the Percentile of a Data Value

The process of finding the percentile that corresponds to a particular data value x is given by the following (round the result to the nearest whole number): Percentile of value x = (number of values less than x ÷ total number of values) × 100
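
A brief sketch of this calculation. The function name percentile_of_value is hypothetical, and rounding halves upward is an assumption, since Python's built-in round() rounds halves to the nearest even number instead.

```python
import math

def percentile_of_value(data, x):
    """Percentile of x: share of values strictly less than x, as a whole number."""
    below = sum(1 for value in data if value < x)
    percent = below / len(data) * 100
    # Round to the nearest whole number, with halves rounded up (assumed convention)
    return math.floor(percent + 0.5)

ages = [10, 20, 30, 40, 50, 60, 70, 80]
p_of_50 = percentile_of_value(ages, 50)  # 4 of 8 values are below 50
p_of_20 = percentile_of_value(ages, 20)  # 1 of 8 values is below 20 (12.5%)
```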

Percentages

Some studies cite misleading percentages. Note that 100% of some quantity is all of it, but if there are references made to percentages that exceed 100%, such references are often not justified.

Midquartile range

(Q3 + Q1) ÷ 2

Interquartile Range (IQR)

Q3 - Q1

Semi-interquartile range

(Q3 − Q1) ÷ 2
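
The quartile- and percentile-based measures in this set can be sketched as small functions. They take the quartiles or percentiles as inputs, because different textbooks and software compute those with slightly different rules; the function names and numbers are illustrative.

```python
def interquartile_range(q1, q3):
    """IQR = Q3 - Q1"""
    return q3 - q1

def semi_interquartile_range(q1, q3):
    """(Q3 - Q1) / 2: subtract first, then halve"""
    return (q3 - q1) / 2

def midquartile(q1, q3):
    """(Q3 + Q1) / 2: add first, then halve"""
    return (q3 + q1) / 2

def percentile_range_10_90(p10, p90):
    """10-90 percentile range = P90 - P10"""
    return p90 - p10

# Illustrative values: Q1 = 10, Q3 = 18, P10 = 6, P90 = 25
iqr = interquartile_range(10, 18)
semi_iqr = semi_interquartile_range(10, 18)
mid_q = midquartile(10, 18)
range_10_90 = percentile_range_10_90(6, 25)
```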

The Gold Standard

Randomization with placebo/treatment groups is sometimes called the "gold standard" because it is so effective. (A placebo such as a sugar pill has no medicinal effect.)

Stemplots (or stem-and-leaf plot)

Represents quantitative data by separating each value into two parts: the stem (such as the leftmost digit) and the leaf (such as the rightmost digit).

Round-off Rule for z Scores *(and really any value in statistics)*

Round z scores to two decimal places (such as 2.31). If the digit after the second decimal place is 5 or greater, round up; if it is less than 5, round down.

Q2 (Second quartile):

Same as P50 and same as the median. It separates the bottom 50% of the sorted values from the top 50%.

Q3 (Third quartile):

Same as P75. It separates the bottom 75% of the sorted values from the top 25%.

Q1 (First quartile):

Same value as P25. It separates the bottom 25% of the sorted values from the top 75%.

Systematic Sampling

Select some starting point and then select every kth element in the population.

Using z Scores to Identify Significant Values

Significant values are those with z scores ≤ −2.00 or ≥ 2.00.

Stratified Sampling

Subdivide the population into at least two different subgroups (or strata) so that the subjects within the same subgroup share the same characteristics. Then draw a sample from each subgroup (or stratum).

Population

The complete collection of all measurements or data that are being considered. Typically, a population is the complete collection of data that we would like to make inferences about.

Class width

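The difference between two consecutive lower class limits in a frequency distribution. To choose it, pick the number of classes (usually between 5 and 20) and compute class width ≈ (Max − Min) ÷ number of classes, rounding up to a convenient number.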
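
A rough sketch of choosing a class width and the resulting lower class limits. Rounding up to the next whole number is an assumption; in practice any convenient value at or above the raw quotient can be used. The function names and data are hypothetical.

```python
import math

def class_width(data, num_classes):
    """(max - min) / number of classes, rounded up to a whole number (assumed)."""
    raw_width = (max(data) - min(data)) / num_classes
    return math.ceil(raw_width)

def lower_class_limits(start, width, num_classes):
    """Consecutive lower class limits differ by exactly the class width."""
    return [start + i * width for i in range(num_classes)]

times = [10, 23, 37, 41, 58, 66, 72, 95]
width = class_width(times, 6)                     # 85 / 6 is about 14.17, rounded up
limits = lower_class_limits(min(times), width, 6)
```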

Upper class limits

The largest numbers that can belong to each of the different classes

Class boundaries

The numbers used to separate the classes, but without the gaps created by class limits. Each boundary lies midway between the upper class limit of one class and the lower class limit of the next.

Statistics

The science of planning studies and experiments, obtaining data, and organizing, summarizing, presenting, analyzing, and interpreting those data and then drawing conclusions based on them

Lower class limits

The smallest numbers that can belong to each of the different classes

Class midpoints

The values in the middle of the classes. Each class midpoint can be found by adding the lower class limit to the upper class limit and dividing the sum by 2: (lower class limit + upper class limit) ÷ 2. Using the class boundaries in place of the limits gives the same midpoint.

Convenience Sampling

Use data that are very easy to get.

Impute Missing Values

We "impute" missing data values when we substitute values for them.

Boxplot (or box-and-whisker diagram)

a graph of a data set that consists of a line extending from the minimum value to the maximum value, and a box with lines drawn at the first quartile Q1, the median, and the third quartile Q3.

Parameter

a numerical measurement describing some characteristic of a population

Statistic

a numerical measurement describing some characteristic of a sample

Blinding

a technique in which the subject doesn't know whether he or she is receiving a treatment or a placebo. It is a way to get around the placebo effect, which occurs when an untreated subject reports an improvement in symptoms.

Outlier in a modified boxplot

a data value that is above Q3 by an amount greater than 1.5 × IQR, or below Q1 by an amount greater than 1.5 × IQR.
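
A minimal sketch of the 1.5 × IQR rule. The quartiles are passed in as inputs because their exact values depend on the quartile method used; the function name find_outliers and the data are illustrative.

```python
def find_outliers(data, q1, q3):
    """Return values more than 1.5 * IQR below Q1 or above Q3."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [value for value in data if value < lower_fence or value > upper_fence]

# With Q1 = 10 and Q3 = 18, IQR = 8, so the fences are -2 and 30
values = [1, 10, 12, 14, 16, 18, 40]
outliers = find_outliers(values, 10, 18)
```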

Statistical Significance

achieved in a study if the likelihood of an event occurring by chance is 5% or less

Experiment

apply some treatment and then proceed to observe its effects on the individuals. (The individuals in experiments are called experimental units, and they are often called subjects when they are people.)

Nominal Level

characterized by data that consist of names, labels, or categories only, and the data cannot be arranged in some order (such as low to high). Example: Survey responses of yes, no, and undecided

Categorical (or qualitative or attribute) data

consists of names or labels (not numbers that represent counts or measurements). Example: The gender (male/female) of professional athletes

Quantitative (or numerical) data

consists of numbers representing counts or measurements. Example: The weights of supermodels Example: The ages of respondents

5-number summary

consists of these five values: 1.) Minimum 2.) First quartile, Q1 3.) Second quartile, Q2 (same as the median) 4.) Third quartile, Q3 5.) Maximum

Ratio Level

data can be arranged in order, differences can be found and are meaningful, and there is a natural zero starting point (where zero indicates that none of the quantity is present). Differences and ratios are both meaningful. Example: Class times of 50 minutes and 100 minutes

skewed data

data that are not symmetric and extend more to one side than to the other

Linear Correlation Coefficient r

denoted by r, and it measures the strength of the linear association between two variables. Note: r is always between −1 and 1. If r is close to −1 or close to 1, there appears to be a linear correlation. If r is close to 0, there does not appear to be a linear correlation.
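
A short sketch of computing r from paired data, assuming the usual Pearson formula (sum of cross-deviations divided by the square root of the product of the squared-deviation sums). The function name linear_correlation and the data are illustrative.

```python
import math

def linear_correlation(xs, ys):
    """Pearson linear correlation coefficient r for paired (x, y) data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Points exactly on a rising line give r = 1; on a falling line, r = -1
r_up = linear_correlation([1, 2, 3, 4], [3, 5, 7, 9])
r_down = linear_correlation([1, 2, 3, 4], [9, 7, 5, 3])
```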

Correlation

exists between two variables when the values of one variable are somehow associated with the values of the other variable.

Linear Correlation

exists between two variables when there is a correlation and the plotted points of paired data result in a pattern that can be approximated by a straight line.

Data science

involves applications of statistics, computer science, and software engineering, along with some other relevant fields (such as sociology or finance).

Interval Level

involves data that can be arranged in order, and the differences between data values can be found and are meaningful. However, there is no natural zero starting point at which none of the quantity is present. Example: Years 1000, 2000, 1776, and 1492

Ordinal Level

involves data that can be arranged in some order, but differences (obtained by subtraction) between data values either cannot be determined or are meaningless. Example: Course grades A, B, C, D, or F

Pk variable

kth percentile (Example: P25 is the 25th percentile.)

L variable

locator that gives the position of a value (Example: For the 12th value in the sorted list, L = 12.)

Percentiles

measures of location, denoted P1, P2, . . . , P99, which divide a set of data into 100 groups with about 1% of the values in each group.

Quartiles

measures of location, denoted Q1, Q2, and Q3, which divide a set of data into four groups with about 25% of the values in each group

skewed to the left

negatively skewed; the left tail is longer

Observational Study

observing and measuring specific characteristics without attempting to modify the individuals being studied

Nonresponse

occurs when someone either refuses to respond or is unavailable.

Sampling error (or random sampling error)

occurs when the sample has been selected with a random method, but there is a discrepancy between a sample result and the true population result; such an error results from chance sample fluctuations

k variable

percentile being used (Example: For the 25th percentile, k = 25.)

skewed to the right

positively skewed; the right tail is longer

Big Data

refers to data sets so large and so complex that their analysis is beyond the capabilities of traditional software tools. Analysis of big data may require software simultaneously running in parallel on many different computers.

modified boxplot

regular boxplot constructed with these modifications: 1.) A special symbol (such as an asterisk or point) is used to identify outliers as defined above 2.) the solid horizontal line extends only as far as the minimum data value that is not an outlier and the maximum data value that is not an outlier.

Continuous Data

result from infinitely many possible quantitative values, where the collection of values is not countable. Example: The lengths of distances from 0 cm to 12 cm [a length could be 6.99998865545 cm, and there are infinitely many possible values between the two endpoints]

Discrete Data

result when the data values are quantitative and the number of values is finite, or "countable." Example: The number of tosses of a coin before getting tails [whole numbers such as 1, 2, 3, ..., 900; not a value such as 1.4465565, which would be continuous]

A z score (or standard score or standardized value)

the number of standard deviations that a given value x is above or below the mean.

z score

the number of standard deviations that a given value x is above or below the mean. A data value is significantly low if its z score is less than or equal to −2 or the value is significantly high if its z score is greater than or equal to +2.

n variable

total number of values in the data set

Randomization

when subjects are assigned to different groups through a process of random selection. The logic is to use chance as a way to create two groups that are similar.

