Ch. 3 and 4 info

Bayes' Theorem is

Bayes' Theorem is a probability rule that can be used to compute a conditional probability based on specific available information.
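As a minimal sketch of the rule, Bayes' Theorem can be written as P(A | B) = P(B | A) x P(A) / P(B). The function name and the input probabilities below are hypothetical and chosen only for illustration; they do not come from the text.

```python
def bayes_posterior(p_b_given_a, p_a, p_b):
    """Return P(A | B) = P(B | A) * P(A) / P(B)."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_b_given_a * p_a / p_b


# Hypothetical inputs, for illustration only.
p_a = 0.01          # P(A), e.g. the probability of having a condition
p_b_given_a = 0.90  # P(B | A), e.g. the probability of a positive screen given the condition
p_b = 0.05          # P(B), e.g. the overall probability of a positive screen

posterior = bayes_posterior(p_b_given_a, p_a, p_b)
print(f"P(A | B) = {posterior:.2f}")
```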

best method to summarize Categorical variables

Categorical variables are variables with two or more distinct responses but the responses are unordered. Some examples of categorical variables measured in the Framingham Heart Study include marital status, handedness, and smoking status. For categorical variables, frequency distribution tables with frequencies and relative frequencies provide appropriate summaries. Cumulative frequencies and cumulative relative frequencies are generally not useful for categorical variables, as it is usually not of interest to combine categories when there is no inherent ordering.

describe Categorical variables

Categorical variables, sometimes called nominal variables, are similar to ordinal variables except that the responses are unordered. Race/ethnicity is an example of a categorical variable. It is often measured using the following response options: white, black, Hispanic, American Indian or Alaskan native, Asian or Pacific Islander, or other. Another example of a categorical variable is blood type, with response options A, B, AB, and O.

describe Continuous variables, sometimes called measurement or quantitative variables

Continuous variables, sometimes called measurement or quantitative variables, take on an unlimited number of distinct responses between a theoretical minimum value and maximum value.

describe Continuous variables

Continuous variables, sometimes called quantitative or measurement variables, in theory take on an unlimited number of responses between defined minimum and maximum values. Systolic blood pressure, diastolic blood pressure, total cholesterol level, CD4 cell count, platelet count, age, height, and weight are all examples of continuous variables. For example, systolic blood pressure is measured in millimeters of mercury (mmHg), and an individual in a study could have a systolic blood pressure of 120, 120.2, or 120.23, depending on the precision of the instrument used to measure systolic blood pressure.

should you use Cumulative frequencies and cumulative relative frequencies for CATEGORICAL VARIABLES?

Cumulative frequencies and cumulative relative frequencies are generally not useful for categorical variables, as it is usually not of interest to combine categories when there is no inherent ordering.

Dichotomous variables are best summarized using ______ charts

Dichotomous variables are best summarized using bar charts. The response options (yes/no, present/absent) are shown on the horizontal axis, and either the frequencies or relative frequencies are plotted on the vertical axis, producing a frequency bar chart or relative frequency bar chart, respectively.

Dichotomous variables are often summarized in

Dichotomous variables are often summarized in frequency distribution tables. The first column of the frequency distribution table indicates the specific response options of the dichotomous variable (in this example, male and female). The second column contains the frequencies (counts or numbers) of individuals in each response category (the numbers of men and women, respectively). The third column contains the relative frequencies, which are computed by dividing the frequency in each response category by the sample size (e.g., 1625 / 3539 = 0.459). The relative frequencies are often expressed as percentages by multiplying by 100 and are most often used to summarize dichotomous variables.
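The three-column table described above can be sketched in a few lines. The function name is hypothetical. The text gives only 1625 and 3539; the second count (1914 = 3539 - 1625) is derived, and attaching 1625 to the "male" category is an assumption made for illustration.

```python
from collections import Counter


def frequency_table(responses):
    """Map each response option to (frequency, relative frequency)."""
    counts = Counter(responses)
    n = len(responses)
    return {level: (count, count / n) for level, count in counts.items()}


# Assumed split of the n = 3539 sample: 1625 in one category, the rest in the other.
responses = ["male"] * 1625 + ["female"] * 1914
table = frequency_table(responses)

for level, (freq, rel_freq) in table.items():
    # Relative frequency is shown as a percentage (multiplied by 100).
    print(f"{level:<8}{freq:>6}{rel_freq * 100:>8.1f}%")
```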

define Dichotomous variables

Dichotomous variables have only two possible responses. The response options are usually coded "yes" or "no." Exposure to a particular risk factor (e.g., smoking) is an example of a dichotomous variable. Sex is another example, with response options of "male" or "female," as are current smoking status and diabetes status, with response options of "yes" or "no."

Histograms are appropriate graphical displays for __________________ variables.

Histograms are appropriate graphical displays for ordinal variables. A histogram is different from a bar chart in one important feature. The horizontal axis of a histogram shows the distinct ordered response options of the ordinal variable. The vertical axis can show either frequencies or relative frequencies, producing a frequency histogram or relative frequency histogram, respectively. The bars are centered over each response option and scaled according to frequencies or relative frequencies as desired. The difference between a histogram and a bar chart is that the bars in a histogram run together; there is no space between adjacent responses. This reinforces the idea that the response categories are ordered and based on an underlying continuum.

If the mean and median are very different, it suggests

If the mean and median are very different, it suggests that there are outliers affecting the mean.

list difference in probability sampling vs nonprobability sampling

In probability sampling, each member of the population has a known probability of being selected. In non-probability sampling, each member of the population is selected without the use of probability.

In simple random sampling, we start with what is called the

In simple random sampling, we start with what is called the sampling frame, a complete list or enumeration of the entire population. Each member of the population is assigned a unique identification number, and then a set of numbers is selected at random to determine the individuals to be included in the sample.
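A minimal sketch of this procedure, assuming the sampling frame is simply the identification numbers 1 through N. The function name, the sizes, and the seed are hypothetical.

```python
import random


def simple_random_sample(population_size, sample_size, seed=None):
    """Draw sample_size distinct ID numbers at random from 1..population_size."""
    rng = random.Random(seed)
    sampling_frame = range(1, population_size + 1)  # unique ID for each member
    return rng.sample(sampling_frame, sample_size)  # sampling without replacement


# Hypothetical population of 1000 people and a sample of 100.
sample_ids = simple_random_sample(1000, 100, seed=42)
print(sorted(sample_ids)[:10])
```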

describe systematic sampling

In systematic sampling, we again start with the complete sampling frame and members of the population are assigned unique identification numbers. However, in systematic sampling every third or every fifth person is selected. The spacing or interval between selections is determined by the ratio of the population size to the sample size (N / n). For example, if the population size is N = 1000 and a sample size of n = 100 is desired, then the sampling interval is 1000 / 100 = 10, so every tenth person is selected into the sample. The first person is selected at random from among the first ten in the list, using a random numbers table or a computer-generated random number.
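The interval-and-random-start idea can be sketched as follows, assuming N is an exact multiple of n and that identification numbers run from 1 to N. The function name and seed are hypothetical.

```python
import random


def systematic_sample(population_size, sample_size, seed=None):
    """Select every k-th ID, where k = N / n, after a random start in 1..k."""
    rng = random.Random(seed)
    interval = population_size // sample_size  # sampling interval N / n
    start = rng.randint(1, interval)           # random start among the first k IDs
    return [start + step * interval for step in range(sample_size)]


# The example from the text: N = 1000, n = 100, so the interval is 10.
selected = systematic_sample(1000, 100, seed=7)
print(selected[:5])
```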

type of ordering is appropriate for ordinal vs categorical variables?

Marital status is a categorical variable, so there is no ordering to the responses and therefore the first column can be organized differently. For example, sometimes responses are presented from the most frequently to least frequently occurring in the sample, and sometimes responses are presented alphabetically. Any ordering is appropriate. In contrast, with ordinal variables there is an ordering to the responses and therefore response options can only be presented either from highest to lowest (healthiest to unhealthiest) or vice versa. The response options within an ordinal scale cannot be rearranged.

describe Non-probability samples

Non-probability samples are often used in practice because in many applications, it is not possible to generate a sampling frame. In non-probability samples, the probability that any individual is selected into the sample is unknown.

describe Ordinal and categorical variables in terms of ordered vs unordered response types

Ordinal and categorical variables have a fixed number of response options that are ordered and unordered, respectively. Ordinal and categorical variables typically have more than two distinct response options, whereas dichotomous variables have exactly two response options. Summary statistics for ordinal and categorical variables again focus primarily on relative frequencies (or percentages) of responses in each response category. Frequency distribution tables, similar to those presented for dichotomous data, are also used to summarize categorical and ordinal variables.

describe Ordinal and categorical variables

Ordinal and categorical variables have more than two possible responses but the response options are ordered and unordered, respectively.

define sens, spec, false neg and false pos

Sensitivity = True Positive Fraction = P(screen positive | disease) = a / (a + c). Specificity = True Negative Fraction = P(screen negative | disease free) = d / (b + d). False Positive Fraction = P(screen positive | disease free) = b / (b + d). False Negative Fraction = P(screen negative | disease) = c / (a + c).
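A short sketch of these four fractions. The cell layout is inferred from the formulas above: a = screen positive with disease, b = screen positive and disease free, c = screen negative with disease, d = screen negative and disease free. The function name and the counts are hypothetical.

```python
def screening_measures(a, b, c, d):
    """Compute screening-test fractions from the four cell counts of a 2x2 table."""
    return {
        "sensitivity": a / (a + c),              # P(screen positive | disease)
        "specificity": d / (b + d),              # P(screen negative | disease free)
        "false_positive_fraction": b / (b + d),  # P(screen positive | disease free)
        "false_negative_fraction": c / (a + c),  # P(screen negative | disease)
    }


# Hypothetical counts, for illustration only.
measures = screening_measures(a=9, b=351, c=1, d=4449)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

Note that sensitivity and the false negative fraction sum to 1, as do specificity and the false positive fraction.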

Sensitivity is

Sensitivity is also called the true positive fraction and is defined as the probability that a diseased person screens positive.

describe Simple random sampling

Simple random sampling is a technique against which many other sampling techniques are compared. It is most useful when the population is relatively small because it requires a complete enumeration of the population as a starting point. In this sampling scheme, each individual in the population has the same chance of being selected. We use N to represent the number of individuals in the population, or the population size. Using simple random sampling, the probability that any individual is selected into the sample is 1 / N.

Specificity is

Specificity is also called the true negative fraction and is defined as the probability that a disease-free person screens negative.

give some examples of ordinal variables

Symptom severity is an example of an ordinal variable, with possible responses of minimal, moderate, and severe. Blood pressure category is another ordinal variable: blood pressure can be classified as normal, pre-hypertension, Stage I hypertension, or Stage II hypertension.

The sample mean is computed by

The sample mean is computed by summing all of the values and dividing by the sample size. The formula for the sample mean is $\bar{X} = \frac{\sum X}{n}$.

The best graphical summary for dichotomous and categorical variables is a _____________, and the best graphical summary for an ordinal variable is a _______________

The best graphical summary for dichotomous and categorical variables is a bar chart, and the best graphical summary for an ordinal variable is a histogram. Both bar charts and histograms can be designed to display frequencies or relative frequencies, with the latter being the more popular display.

The difference between a histogram and a bar chart is that the bars in a histogram run

The difference between a histogram and a bar chart is that the bars in a histogram run together; there is no space between adjacent responses. This reinforces the idea that the response categories are ordered and based on an underlying continuum. Usually, relative frequency histograms are preferred over frequency histograms, as the relative frequencies are most appropriate for summarizing the data. The bars of the histogram run together to reflect the fact that there is an underlying continuum of total cholesterol measurements.

The following formula can be used to compute probabilities of selecting individuals with specific attributes or characteristics.

The following formula can be used to compute probabilities of selecting individuals with specific attributes or characteristics. $P(\text{characteristic}) = \frac{\text{Number of persons with characteristic}}{\text{Total number of persons in the population } (N)}$

The mean and standard deviation summarize ________________ and the median and interquartile range summarize ____________________________ respectively

The mean and standard deviation, or the median and interquartile range, summarize location and dispersion, respectively.

The mode is defined as

The mode is defined as the most frequent value. The mode is a useful summary statistic for a continuous variable. It is presented not instead of either the mean or the median but rather in addition to the mean or median.

The most widely used measure of variability for a continuous variable is called the

The most widely used measure of variability for a continuous variable is called the standard deviation. If all of the observed values in a sample are close to the sample mean, the standard deviation is small (i.e., close to zero), and if the observed values vary widely around the sample mean, the standard deviation is large. If all of the values in the sample are identical, the sample standard deviation is zero.

The sample median is

The sample median is the middle value in the ordered dataset, or the value that separates the top 50% of the values from the bottom 50%.

The two popular types of sampling are

There are two popular types of sampling, probability sampling and non-probability sampling.

A very popular method to determine outliers in a sample is

There are several methods to determine outliers in a sample. A very popular method is based on the following: Outliers are values below Q1 - 1.5 × (Q3 - Q1) or above Q3 + 1.5 × (Q3 - Q1), or equivalently, values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR. These limits are referred to as Tukey fences.
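The fences can be sketched as a small check, assuming the quartiles have already been computed. The function names and data values are hypothetical; Q1 = 65.5 and Q3 = 79 are simply the quartiles of this made-up sample.

```python
def tukey_fences(q1, q3):
    """Return the (lower, upper) limits Q1 - 1.5 x IQR and Q3 + 1.5 x IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(values, q1, q3):
    """Return the values that fall outside the Tukey fences."""
    lower, upper = tukey_fences(q1, q3)
    return [v for v in values if v < lower or v > upper]


# Hypothetical sample with one extreme value.
values = [62, 64, 67, 70, 72, 74, 77, 81, 120]
lower, upper = tukey_fences(65.5, 79)
outliers = find_outliers(values, 65.5, 79)
print(lower, upper, outliers)
```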

When there are no outliers in a sample describe the summary of a typical value vs having outliers

When there are no outliers in a sample, the mean and standard deviation are used to summarize a typical value and the variability in the sample, respectively.

When there are outliers in a sample

When there are outliers in a sample, the median and IQR are used to summarize a typical value and the variability in the sample, respectively.

describe median for even and odd numbers of observations

When there is an odd number of observations in the sample, the median is the value that holds as many values above it as below it in the ordered dataset. When there is an even number of observations in the sample, the median is defined as the mean of the two middle values in the ordered dataset. Note: The median is unaffected by extreme or outlying values.
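The odd and even cases can be sketched as one function. The function name and the example values are hypothetical.

```python
def sample_median(values):
    """Middle value for odd n; mean of the two middle values for even n."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty sample is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_median = sample_median([3, 1, 2])             # odd n: middle value
even_median = sample_median([4, 1, 3, 2])         # even n: mean of the two middle values
robust_median = sample_median([1, 2, 3, 4, 1000]) # the extreme value does not move the median
print(odd_median, even_median, robust_median)
```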

With ordinal variables, two additional columns are often displayed in the frequency distribution table, called the

With ordinal variables, two additional columns are often displayed in the frequency distribution table, called the cumulative frequency and cumulative relative frequency, respectively. The cumulative frequencies reflect the number of patients at the particular blood pressure level (or other variable) or below. The cumulative relative frequencies are very useful for summarizing ordinal variables and indicate the percent of patients at a particular level or below. At the highest level, the cumulative frequency is equal to the sample size (n = 3533) and the cumulative relative frequency is 100%, indicating that all of the patients are at the highest level or below.
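A minimal sketch of the two extra columns. The function name is hypothetical, and the counts per blood pressure level are made up; they were chosen only so that they total the n = 3533 mentioned above.

```python
from itertools import accumulate


def cumulative_table(levels, counts):
    """Rows of (level, frequency, relative frequency, cumulative frequency, cumulative relative frequency)."""
    n = sum(counts)
    rows = []
    for level, freq, cum_freq in zip(levels, counts, accumulate(counts)):
        rows.append((level, freq, freq / n, cum_freq, cum_freq / n))
    return rows


# Levels must be listed in their natural order, lowest to highest.
levels = ["Normal", "Pre-hypertension", "Stage I hypertension", "Stage II hypertension"]
counts = [1200, 1450, 650, 233]  # hypothetical counts summing to 3533

rows = cumulative_table(levels, counts)
for level, freq, rel, cum, cum_rel in rows:
    print(f"{level:<24}{freq:>6}{rel * 100:>7.1f}%{cum:>6}{cum_rel * 100:>7.1f}%")
```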

Bar charts are appropriate graphical displays for

categorical variables: Bar charts are appropriate graphical displays for categorical variables. Bar charts for categorical variables with more than two responses are constructed in the same fashion as bar charts for dichotomous variables. The horizontal axis of the bar chart again displays the distinct responses of the categorical variable. Because the responses are unordered, they can be arranged in any order (e.g., from the most frequently to least frequently occurring in the sample, or alphabetically). The vertical axis can show either frequencies or relative frequencies, producing a frequency bar chart or relative frequency bar chart, respectively.

Characteristics—sometimes called variables, outcomes, or endpoints—are classified as

classified as one of the following types: dichotomous, ordinal, categorical, or continuous

Box-whisker plots provide a very useful and informative summary for

continuous variables. Box-whisker plots are also useful for comparing the distributions of a continuous variable among mutually exclusive (i.e., non-overlapping) comparison groups.

describe convenience sampling

In convenience sampling, we select individuals into our sample by any convenient contact. For example, we might approach patients seeking medical care at a particular hospital in a waiting or reception area. Convenience samples are useful for collecting preliminary data. They should not be used for statistical inference as they are generally not constructed to be representative of any specific population.

A box whisker plot is meant to convey the

distribution of a variable at a quick glance: A box-whisker plot is a graphical display of these percentiles. Figure 4-18 is a box-whisker plot of the diastolic blood pressures measured in the subsample of n = 10 participants described in Example 4.3. The horizontal lines represent (from the top) the maximum, the third quartile, the median (also indicated by the dot), the first quartile, and the minimum. The shaded box represents the middle 50% of the distribution (between the first and third quartiles). A box-whisker plot is meant to convey the distribution of a variable at a quick glance.

When a dataset has outliers, variability is often summarized by a statistic called the

interquartile range (IQR): When a dataset has outliers, or extreme values, we summarize a typical value using the median as opposed to the mean. When a dataset has outliers, variability is often summarized by a statistic called the interquartile range (IQR). The interquartile range is the difference between the first and third quartiles. The first quartile, denoted Q1, is the value in the dataset that holds 25% of the values below it. The third quartile, denoted Q3, is the value in the dataset that holds 25% of the values above it. The IQR is defined as IQR = Q3 - Q1; it is the range of the middle 50% of the data. When the sample size is odd, the median and quartiles are determined in the same way. For example, when the sample size is 9, the median is the middle number, 72. The quartiles are determined by looking at the lower and upper halves, respectively. There are four values in the lower half, so the first quartile is the mean of the two middle values in the lower half, (64 + 67) / 2 = 65.5. The same approach is used in the upper half to determine the third quartile, (77 + 81) / 2 = 79.
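The quartile method described here (medians of the lower and upper halves, leaving out the middle value when n is odd) can be sketched as follows. The function names are hypothetical. The nine data values are also hypothetical: only 64, 67, 72, 77, and 81 appear in the text, and the rest were filled in for illustration. Other software may use different quartile conventions and give slightly different results.

```python
def _median(ordered):
    """Median of an already sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, median, Q3) using medians of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = ordered[:half]
    # For odd n the middle value belongs to neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return _median(lower), _median(ordered), _median(upper)


# Hypothetical sample of n = 9 values, consistent with the numbers quoted above.
data = [62, 64, 67, 70, 72, 74, 77, 81, 85]
q1, med, q3 = quartiles(data)
iqr = q3 - q1
print(q1, med, q3, iqr)
```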

The best numerical summaries for continuous variables

median and interquartile range: The best numerical summaries for continuous variables include the mean and standard deviation or the median and interquartile range, depending on whether or not there are outliers in the sample.

describe quota sampling

In quota sampling, we determine a specific number of individuals to select into our sample in each of several non-overlapping groups. The idea is similar to stratified sampling in that we develop non-overlapping groups and sample a predetermined number of individuals within each group. For example, suppose we wish to ensure that the distribution of participants' ages in the sample is similar to that in the population. Suppose our desired sample size is n = 300 and we know from census data that in the population, approximately 30% are under age 20, 40% are between 20 and 49, and 30% are 50 years of age and older. We then sample n = 90 persons under age 20, n = 120 between the ages of 20 and 49, and n = 90 who are 50 years of age and older. Sampling proceeds until these totals, or quotas, are reached in each group. Quota sampling is different from stratified sampling because in a stratified sample, individuals within each stratum are selected at random. Here we enroll participants until the quota is reached.
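The quota arithmetic in this example can be sketched as a proportional allocation. The function name and group labels are hypothetical; percentages are kept as whole numbers to avoid rounding issues.

```python
def quota_targets(sample_size, percent_by_group):
    """Allocate the sample across groups in proportion to population percentages."""
    if sum(percent_by_group.values()) != 100:
        raise ValueError("group percentages must sum to 100")
    return {group: sample_size * pct // 100 for group, pct in percent_by_group.items()}


# Age distribution from the example: 30%, 40%, and 30% of the population.
quotas = quota_targets(300, {"under 20": 30, "20 to 49": 40, "50 and older": 30})
print(quotas)
```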

The best numerical summaries for dichotomous, ordinal, and categorical variables are

relative frequencies: The best numerical summaries for dichotomous, ordinal, and categorical variables are relative frequencies.

The sample range is computed as follows

sample range. The sample range is computed as follows: Sample range = Maximum value - Minimum value

The more common measure of variability in a sample is the

sample standard deviation: The more common measure of variability in a sample is the sample standard deviation, defined as the square root of the sample variance.

A quantity that is often used to measure variability in a sample is called the

sample variance: A quantity that is often used to measure variability in a sample is called the sample variance, and it is essentially the mean of the squared deviations. The sample variance is denoted $s^2$ and is computed as follows: $s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$. Dividing by (n - 1) produces a better estimate of the population variance. The sample variance is nonetheless usually interpreted as the average squared deviation from the mean.
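A minimal sketch of the sample variance (dividing by n - 1) and the sample standard deviation as its square root. The function names and data values are hypothetical; Python's standard `statistics.variance` uses the same n - 1 divisor and can serve as a cross-check.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_sd(values):
    """Sample standard deviation: the square root of the sample variance."""
    return math.sqrt(sample_variance(values))


# Hypothetical data with mean 5; the squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]
variance = sample_variance(data)
sd = sample_sd(data)
print(f"variance = {variance:.3f}, sd = {sd:.3f}")
```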

describe stratified sampling

In stratified sampling, we split the population into non-overlapping groups or strata (e.g., men and women; people under 30 years of age and people 30 years of age and older) and then sample within each stratum. Sampling within each stratum can be by simple random sampling or systematic sampling. The idea behind stratified sampling is to ensure adequate representation of individuals within each stratum. For example, if a population contains 70% men and 30% women and we want to ensure the same representation in the sample, we can stratify and sample the requisite numbers of men and women to ensure the same representation.
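The 70% / 30% example can be sketched with simple random sampling inside each stratum. The function name, the population of 1000 ID numbers, the sample size of 100, and the seed are all hypothetical.

```python
import random


def stratified_sample(strata, sample_sizes, seed=None):
    """Draw a simple random sample of the requested size within each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, sample_sizes[name]) for name, members in strata.items()}


# Hypothetical population of 1000 IDs: 70% men and 30% women.
strata = {"men": list(range(1, 701)), "women": list(range(701, 1001))}

# A sample of 100 that keeps the same 70 / 30 representation.
sample = stratified_sample(strata, {"men": 70, "women": 30}, seed=1)
print(len(sample["men"]), len(sample["women"]))
```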

two events are said to be independent if

the probability of one is not affected by the occurrence or non-occurrence of the other

