Biometry test 1

What is 6% of 1,200 responses?

(6/100)(1,200) = 72

Midquartile

(Q3+Q1)/2

Interquartile Range

(Q3-Q1)

Semi-interquartile range

(Q3-Q1)/(2)
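
As a minimal sketch, the three quartile-based formulas (midquartile, interquartile range, semi-interquartile range) can be computed together once Q1 and Q3 are known. The function name and the example quartiles are hypothetical, chosen only for illustration.

```python
def quartile_measures(q1, q3):
    """Return the midquartile, interquartile range, and semi-interquartile range."""
    iqr = q3 - q1
    return {
        "midquartile": (q3 + q1) / 2,
        "interquartile_range": iqr,
        "semi_interquartile_range": iqr / 2,
    }


# Hypothetical quartiles: Q1 = 10 and Q3 = 18
measures = quartile_measures(10, 18)
print(measures)
```

With these assumed quartiles the midquartile is 14, the interquartile range is 8, and the semi-interquartile range is 4.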

For small data sets (20 values or fewer)...

Use a table instead of a graph.

Bar Graphs

Uses bars of equal width to show frequencies of categories of categorical (qualitative) data.

Cluster Sampling

We first divide the population area into sections (or clusters), then randomly select a few of those sections, and then choose all the members from those selected sections.

Systematic Sampling

We select some starting point and then select every Kth element in the population.
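
A minimal sketch of systematic sampling, assuming the population is held in a list and the starting point is given as a list index. The function name and the numbered population are hypothetical.

```python
def systematic_sample(population, k, start=0):
    """Select the element at index `start`, then every kth element after it."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    return population[start::k]


# Hypothetical population of 20 numbered subjects
people = list(range(1, 21))
sample = systematic_sample(people, k=5, start=2)
print(sample)  # [3, 8, 13, 18]
```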

Convenience Sampling

We simply use data that is very easy to get.

Stratified Sampling

We subdivide the population into at least two different subgroups (or strata) so that subjects within the same subgroup share the same characteristics, then draw a sample from each subgroup.

Important uses of Histograms

-Visually displays the shape of the distribution of the data. -Shows the location of the center of the data. -Shows the spread of the data -Identifies outliers

Double Blind

Neither the person giving the treatment nor the person receiving it knows whether it is the treatment or a placebo.

Sampling Error

occurs when the sample has been selected with a random method, but there is a discrepancy between a sample result and the true population result.

stratified sampling

population is separated into strata (homogeneous groups), then a random sample is selected from each stratum
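
A rough sketch of stratified sampling, assuming each member can be mapped to its stratum by a function and that the same number of members is drawn from every stratum. The function name, the `stratum_of` and `per_stratum` parameters, and the student data are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(population, stratum_of, per_stratum, seed=None):
    """Group members by stratum, then draw a random sample from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for member in population:
        strata[stratum_of(member)].append(member)
    sample = []
    for members in strata.values():
        # Never ask for more members than the stratum contains
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample


# Hypothetical population: 6 freshmen and 4 seniors
students = [("s%d" % i, "freshman" if i < 6 else "senior") for i in range(10)]
chosen = stratified_sample(students, stratum_of=lambda s: s[1], per_stratum=2, seed=1)
print(chosen)
```

Every stratum contributes members to the sample, which is the feature that separates this method from cluster sampling.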

Relative Frequency Equation

relative frequency = frequency / sum of all frequencies
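
The equation can be sketched in a few lines, assuming the raw observations are available as a list of category labels. The blood-type data are made up for illustration.

```python
from collections import Counter


def relative_frequencies(values):
    """Map each category to frequency / sum of all frequencies."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


# Hypothetical categorical data
blood_types = ["O", "A", "O", "B", "A", "O", "AB", "O"]
rel_freq = relative_frequencies(blood_types)
print(rel_freq)
```

The relative frequencies add up to 1 (apart from floating-point rounding in less tidy data sets).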

Stem Plot

represents quantitative data by separating each value into two parts: the stem (such as the leftmost digit) and the leaf (such as the rightmost digit).

Continuous (numerical) data

result from infinitely many possible quantitative values, where the collection of values is not countable.

Discrete data

results when the data values are quantitative and the number of values is finite or "countable"

Practical significance

some treatment or finding may be effective, but common sense might suggest that the treatment or finding does not make enough of a difference to be practical.

standard deviation (SD)

square root of the variance

Census

the collection of data from every member of the population.

Population

the complete collection of all measurements or data that are being considered; typically the population is the complete collection of data we would like to make inferences about.

A data value is missing completely at random if...

the likelihood of its being missing is independent of its value or any of the other values in the data set.

Median

the measure of center that is the middle value when the original data values are arranged in order of increasing value.

Midrange

the measure of center that is the value midway between the maximum and the minimum values in the original data set. Midrange = (max data value + min data value)/2. Not resistant.
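
A small sketch of why the midrange is not resistant, comparing it with the median before and after adding one extreme value. The data values are hypothetical.

```python
from statistics import median


def midrange(values):
    """Value midway between the maximum and the minimum."""
    return (max(values) + min(values)) / 2


data = [2, 3, 5, 7, 8]          # hypothetical data set
with_outlier = data + [100]     # same data plus one extreme value

print(midrange(data), median(data))                  # 5.0 5
print(midrange(with_outlier), median(with_outlier))  # 51.0 6.0
```

One outlier moves the midrange from 5 to 51, while the median only moves from 5 to 6.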

A statistic is resistant if....

the presence of extreme values (outliers) does not cause it to change very much.

Relative frequency

the proportion (or percent) of observations within a category

Statistics

the science of planning studies and experiments; obtaining data; and organizing, summarizing, analyzing, and interpreting those data and then drawing conclusions based on them.

Regression line

the straight line that best fits the scatter plot of the data.

5-number summary and boxplot

the values of the minimum, the maximum, and the three quartiles (Q1, the median, and Q3) are used for the 5-number summary and the construction of boxplot graphs.
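
A minimal sketch of a 5-number summary using Python's `statistics.quantiles`. Textbooks and software use slightly different quartile conventions, so the `method="inclusive"` choice here is an assumption and may not match a given course's hand method. The data are hypothetical.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    data = sorted(values)
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    return (data[0], q1, q2, q3, data[-1])


summary = five_number_summary([7, 1, 9, 3, 5, 2, 8, 4, 6])
print(summary)  # (1, 3.0, 5.0, 7.0, 9)
```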

what can be used to avoid measurement bias?

true placebo control groups

what can be used to avoid selection (sampling) bias?

truly random samples

Frequency Polygon

uses line segments connected to points located directly above class midpoint values. Similar to a histogram; uses segments instead of bars.

Relative Frequency Polygon

variation of the basic frequency polygon that uses relative frequencies (proportions or percentages) for the vertical scale.

Experiment

we apply some treatment and then proceed to observe its effects on the individuals.

Observational Study

we observe and measure specific characteristics but we don't attempt to modify the individuals being studied.

Confounding

when we can see some effect, but we can't identify the specific factor that caused it.

Statistical Significance is achieved in a study when:

we get a result that is very unlikely to occur by chance.

what is one type of measurement bias?

- Hawthorne effect - human subjects change behavior simply because they are being studied

variance

- S^2 - AKA mean square - mean of the squares of all the deviation scores in a distribution - obtained by finding the deviation score (x - mean) for each element, then squaring these and obtaining their mean (the sample variance divides by n - 1 instead of n)

what type of sampling is based on geographical areas?

- cluster samples - randomly selecting geographical locations and then taking # of samples from each

odds ratio

- measure of association between an exposure and an outcome
- (A x D)/(B x C) = (cases exposed x controls not exposed) / (controls exposed x cases not exposed)
- if ratio = 1, then the exposure is NOT related to the disease
- if ratio > 1, then the risk factor is found more frequently among the cases than the controls
- if ratio < 1, then the risk factor may actually be a protective factor against the disease
- must be used instead of relative risk when analyzing CASE-CONTROL data
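
A minimal sketch of the odds ratio as the cross-product (cases exposed x controls not exposed) / (controls exposed x cases not exposed). The function name, parameter names, and counts are hypothetical.

```python
def odds_ratio(cases_exposed, cases_unexposed, controls_exposed, controls_unexposed):
    """Cross-product ratio for a 2x2 case-control table."""
    denominator = controls_exposed * cases_unexposed
    if denominator == 0:
        raise ValueError("odds ratio is undefined when a denominator cell is zero")
    return (cases_exposed * controls_unexposed) / denominator


# Hypothetical counts: 40 exposed cases, 60 unexposed cases,
# 20 exposed controls, 80 unexposed controls
result = odds_ratio(40, 60, 20, 80)
print(round(result, 2))  # 2.67
```

A value above 1, as in this made-up table, means the exposure is found more often among the cases than among the controls.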

cluster sampling

- population already separated into naturally occurring groups (clusters), some of which are then randomly selected - Ex: to choose 100 medical students, randomly select 10 med schools, and then 10 random students from each

relative risk

- compares the probability of developing a disease over a time period between those exposed and those not exposed to a risk factor - (incidence in exposed)/(incidence in NOT exposed) = [A/(A+B)] / [C/(C+D)]

attributable risk

- risk difference - portion of incidence in exposed group that is due to exposure - (incidence in exposed) - (incidence in not exposed)
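
Relative risk and attributable risk both start from the incidence in the exposed and unexposed groups, so they can be sketched together. The layout assumed here is A = exposed with disease, B = exposed without disease, C = unexposed with disease, D = unexposed without disease. The function names and counts are hypothetical.

```python
def incidence(with_disease, without_disease):
    """Proportion of a group that developed the disease."""
    return with_disease / (with_disease + without_disease)


def relative_risk(a, b, c, d):
    """Incidence in the exposed group divided by incidence in the unexposed group."""
    return incidence(a, b) / incidence(c, d)


def attributable_risk(a, b, c, d):
    """Incidence in the exposed group minus incidence in the unexposed group."""
    return incidence(a, b) - incidence(c, d)


# Hypothetical cohort: 20 of 80 exposed and 10 of 80 unexposed develop the disease
rr = relative_risk(20, 60, 10, 70)
ar = attributable_risk(20, 60, 10, 70)
print(rr, ar)  # 2.0 0.125
```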

systematic sampling

- selecting elements in a specific, systematic way (Ex: pick every 3rd person); every kth element. - usually provides the equivalent of a simple random sample w/out using randomization

Features of a dot plot

-Displays the shape of the distribution of data -It is usually possible to recreate the original list of data points.

Features of a Stem Plot

-Shows the shape of the distribution of data -Retains the original data values -The sample data are sorted (arranged in orders)

Graphs that Deceive

-Nonzero axis: the vertical scale starts at a value other than zero -Pictographs (images)

Preparing Data

1. Context: what do the data represent, and what is the goal of the study?
2. Source of data: are the data from a source with a special interest, so that there is pressure to obtain results that are favorable to the source?
3. Sampling method: were the data collected in a way that is unbiased, or in a way that is biased?

Analyzing Data

1. Graph the data.
2. Explore the data: are there any outliers (numbers far away from the majority of data points); what important statistics summarize the data (mean/standard deviation); how are the data distributed; are there any missing data; did many selected subjects refuse to respond?
3. Apply statistical methods: use technology to obtain results.

Normal Frequency Distribution

1. The frequencies start low, then increase to one or two high frequencies, and then decrease to a low frequency. 2. The distribution is approximately symmetric: frequencies preceding the maximum frequency should be roughly a mirror image of those that follow the maximum frequency.

Histogram

A graph consisting of bars of equal width drawn adjacent to each other (unless there are gaps in the data). -The horizontal scale represents classes of quantitative data values, and the vertical scale represents frequencies. -The heights of bars correspond to frequency values.

multiplication rule

AND; independent events
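
As an illustrative sketch, the multiplication rule for independent events, P(A and B) = P(A) x P(B), can be checked by enumerating two rolls of a fair die. The events chosen (first roll even, second roll greater than 4) are hypothetical, and the outcomes are assumed to be equally likely.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling a fair die twice
outcomes = list(product(range(1, 7), repeat=2))


def prob(predicate):
    """Probability of an event as (favorable outcomes) / (all outcomes)."""
    favorable = sum(1 for outcome in outcomes if predicate(outcome))
    return Fraction(favorable, len(outcomes))


p_a = prob(lambda o: o[0] % 2 == 0)                    # first roll is even
p_b = prob(lambda o: o[1] > 4)                         # second roll is 5 or 6
p_a_and_b = prob(lambda o: o[0] % 2 == 0 and o[1] > 4)

print(p_a, p_b, p_a_and_b)  # 1/2 1/3 1/6
```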

σ^2 "sigma squared"

Population variance

Skewed to the Right

Positive

Data

Collections of observations, such as measurements or survey responses.

Dot Plots

Consist of a graph of quantitative data in which each data value is plotted as a point (or dot) above a horizontal scale of values.

Weighted Mean

When different x data values are assigned different weights w, we can compute a weighted mean: (sum of w times x) / (sum of w).
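
A short sketch of the weighted mean, computed as the sum of each weight times its value divided by the sum of the weights. The grade-point example and the names used are hypothetical.

```python
def weighted_mean(values, weights):
    """Sum of (weight * value) divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical example: grade points weighted by credit hours
grade_points = [4, 3, 2]
credit_hours = [3, 4, 1]
gpa = weighted_mean(grade_points, credit_hours)
print(gpa)  # 3.25
```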

Skewed Distribution

Distribution is not symmetric and extends more to one side than the other.

Significance

Do the results have statistical significance; do the results have practical significance?

Boxplot

Graphical representation of the spread of a set of data

Randomization

Individuals assigned to groups by random selection.

Experimental units

Individuals in the experiments; often called subjects when they are people.

r

Linear correlation coefficient

Relative Frequency Distribution

Lists each category of data together with the relative frequency. The sum of all the relative frequencies should add up to 1.

Variance

Measure of variation equal to the square of the standard deviation.

Standard Deviation

Measure of variation equal to the square root of the variance. How much data values deviate away from the mean. Never negative; only zero when all data values are the same.
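
A brief sketch of the relationship between variance and standard deviation using Python's `statistics` module. It shows both the population versions (divide by n) and the sample versions (divide by n - 1). The data set is hypothetical.

```python
from statistics import pstdev, pvariance, stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean = 5

population_variance = pvariance(data)  # sum of squared deviations / n
population_sd = pstdev(data)           # square root of the population variance
sample_variance = variance(data)       # sum of squared deviations / (n - 1)
sample_sd = stdev(data)                # square root of the sample variance

print(population_variance, population_sd)
print(sample_variance, sample_sd)
```

Here the squared deviations sum to 32, so the population variance is 32/8 = 4 with standard deviation 2, and the sample variance is 32/7.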

Skewed to the Left

Negative

addition rule

OR; mutually exclusive (dependent) events
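
As a sketch under the assumption of equally likely outcomes, the addition rule can be checked on a single roll of a fair die. The general form is P(A or B) = P(A) + P(B) - P(A and B); for mutually exclusive events the last term is zero. The events chosen are hypothetical.

```python
from fractions import Fraction

die = {1, 2, 3, 4, 5, 6}


def probability(event):
    """Probability of an event on one roll of a fair die."""
    return Fraction(len(event & die), len(die))


even = {2, 4, 6}
greater_than_four = {5, 6}
one = {1}

# General addition rule: the events overlap (both contain 6)
p_general = (probability(even) + probability(greater_than_four)
             - probability(even & greater_than_four))

# Mutually exclusive events: no overlap, so the probabilities simply add
p_exclusive = probability(even) + probability(one)

print(p_general, p_exclusive)  # 2/3 2/3
```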

Mode

The value(s) in a data set that occur with the greatest frequency. Two values tied for the greatest frequency: bimodal. More than two values tied for the greatest frequency: multimodal. No value repeated: no mode.
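
A minimal sketch of finding the mode(s) and labeling the result, assuming the convention that a data set has no mode when no value is repeated. The function name and the "unimodal" label for a single mode are choices made here for illustration.

```python
from collections import Counter


def classify_modes(values):
    """Return (list of modes, label) for a non-empty data set."""
    if not values:
        raise ValueError("data set must not be empty")
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return [], "no mode"  # no value is repeated
    modes = sorted(value for value, count in counts.items() if count == highest)
    labels = {1: "unimodal", 2: "bimodal"}
    return modes, labels.get(len(modes), "multimodal")


print(classify_modes([1, 2, 2, 3]))           # ([2], 'unimodal')
print(classify_modes([1, 1, 2, 2, 3]))        # ([1, 2], 'bimodal')
print(classify_modes([1, 1, 2, 2, 3, 3, 4]))  # ([1, 2, 3], 'multimodal')
print(classify_modes([1, 2, 3]))              # ([], 'no mode')
```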

10-90 percentile range

P90-P10

Types of data

Parameter: a numerical measurement describing some characteristic of a population.
Statistic: a numerical measurement describing some characteristic of a sample. ("population parameter, sample statistic")
Quantitative (or numerical): data consist of numbers representing counts or measurements.
Categorical (or qualitative or attribute): data consist of names or labels (not numbers or measurements).

Multi-stage sample design

Pollsters select a sample in different stages, and each stage might use different methods of sampling.

μ "mu"

Population mean

σ "sigma"

Population standard deviation

Big data

Refers to data sets so large and so complex that their analysis is beyond the capabilities of traditional software tools. Analysis of big data may require software running simultaneously in parallel on many different computers.

Modified Boxplot

Regular boxplot with these modifications: 1. A special symbol (such as an asterisk or point) is used to identify outliers. 2. The solid horizontal line extends only as far as the minimum and maximum data values that are not outliers.

x̅ "x-bar"

Sample mean

x~ (x tilde)

Sample median

S

Sample standard deviation

S^2

Sample variance

Features of Bar Graphs

Shows the relative distribution of categorical data so that it is easy to compare the different categories.

Percentiles

The 99 values that divide ranked data into 100 groups with approximately 1% of the values in each group.

Quartiles

The three values that divide ranked data into four groups with approximately 25% of the values in each group.

Cumulative Frequency Distribution

The frequency for each class is the sum of the frequency for that class and all previous classes.
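
A short sketch of building cumulative frequencies as running totals of the class frequencies, using `itertools.accumulate`. The class frequencies are hypothetical.

```python
from itertools import accumulate

# Hypothetical frequencies for five consecutive classes
class_frequencies = [3, 7, 12, 5, 2]

# Each entry is the frequency of that class plus all previous classes
cumulative = list(accumulate(class_frequencies))
print(cumulative)  # [3, 10, 22, 27, 29]
```

The last cumulative frequency equals the total number of observations.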

Range

The measure of variation that is the difference between the highest and lowest values. Range = max data value - min data value.

Normal Distribution

The population distribution is normal if the pattern of the points in the normal quantile plot is reasonably close to a straight line and the points do not show some systematic pattern that is not a straight line pattern.

Not a Normal Distribution

The population distribution is not normal if the normal quantile plot has either or both of these conditions: -the points do not lie reasonably close to a straight line pattern -the points show some systematic pattern that is not a straight line.

Replication

The repetition of an experiment on more than one individual.

Non-Sampling Error

The result of human error, biased wording, false data provided, or applying statistical methods that are not appropriate for the circumstances.

Nonrandom Sampling Error

The result of using a sampling method that is not random, such as using a convenience sample or a voluntary response sample.

Mean/Average

The sum of a set of values divided by the number of values. Sensitive to outliers.

Linear Correlation Coefficient (r)

a numerical measure, calculated from paired data, that helps us judge correlation more objectively than by looking at a scatter plot. A value near -1 or 1 means there is a strong linear correlation. A value near 0 means little to no linear correlation.
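
A minimal sketch of computing r from paired data, written out from the standard formula (sum of products of deviations divided by the square root of the product of the sums of squared deviations) rather than relying on a library call. The function name and data are hypothetical.

```python
from math import sqrt


def linear_correlation(xs, ys):
    """Linear correlation coefficient r for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]
r_up = linear_correlation(xs, [2, 4, 6, 8, 10])    # perfect positive linear pattern
r_down = linear_correlation(xs, [10, 8, 6, 4, 2])  # perfect negative linear pattern
print(r_up, r_down)
```

Points that fall exactly on a rising line give r = 1, and points on a falling line give r = -1.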

Scatter Plot

a plot of paired (x, y) quantitative data with a horizontal x-axis and a vertical y-axis

Voluntary response samples/self-selected sample

a sample in which the respondents themselves decide whether to be included.

Sample

a sub-collection of members selected from a population

Measure of Center

a value at the center or middle of a data set.

Lurking variable

a variable that affects the variables in the study but is not itself included in the study.

Ordinal level of measurement

can be arranged in some order, but differences (obtained by subtraction) between data values either cannot be determined or are meaningless. (rank of colleges in the U.S.)

Nominal level of measurement

characterized by data that consist of names, labels, or categories only. Not possible to arrange in any order. (eye color)

Pie Charts

common graph that depicts categorical data as slices of a circle.

Retrospective study

data are collected from a past time period by going back in time to examine records, interviews, etc.

Prospective Study

data are collected in the future from groups that share common factors.

Cross-sectional study

data are observed, measured, and collected at one point in time, not over a period of time.

Interval level of measurement

data can be arranged in order, and differences between data values can be found and are meaningful, but data at this level do not have a natural zero starting point at which none of the quantity is present. (body temp. in degrees)

Ratio level of measurement

data can be arranged in order, differences can be found and are meaningful, and there is a natural zero starting point at which none of the quantity is present. (height, length, distances, volume)

what can be used to avoid confounding bias?

double blind design

what can be used to avoid experimental expectancy/bias?

double blind design

simple random sampling

every element has an equal chance of selection

Correlation

exists between two variables when the values of one variable are somehow associated with the values of the other variable.

Linear Correlation

exists between two variables when there is a correlation and the plotted points of paired data result in a pattern that can be approximated by a straight line.

Block

a group of subjects that are similar to one another; different blocks differ in ways that might affect the outcome of the experiment.

Relative frequency histogram

has the same shape and horizontal scale as a histogram, but the vertical scale uses relative frequencies (as percentages or proportions) instead of actual frequencies.

A data value is missing not at random if...

the missing value is related to the reason that it is missing.

Data science

involves applications of statistics, computer science, and software engineering, along with some other relevant fields (such as biology).

Blinding

is used when the subject doesn't know whether he or she is receiving a treatment or a placebo.

