Overview Statistics

A Closer Look at More Graphs

-Pie - Bar - Histogram - Frequency Polygon - Line - Scattergram or correlational diagrams tables

Steps for Hypothesis Testing

5

Interval Level Data example

Engineers' Salary Range

Expressed in terms of probability

Results might indicate... •Data suggest that... •One explanation might be that....

Ordinal Level Data example

Let's say a researcher wants to examine the pain level of patients one day after knee replacement surgery. The researcher uses a 5-point Likert scale to measure pain (1 = no pain, 2 = mild, 3 = moderate, 4 = severe, and 5 = excruciating pain). We can create a table to convey this data:

Formula

p(A) = number of occurrences of A / total number of possible occurrences, where A is an event •Predicting a priori = before the event •For example, the probability of throwing a "head" with a fair coin: p(H) = number of occurrences of H / total number of possible occurrences = H / (H + T) = 1/2 = 0.5 •First-prize lottery ticket with 100,000 tickets sold: p(first prize) = 1/100,000 = 0.00001
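
The a priori formula can be checked with a few lines of Python. This is only a sketch: the function name `a_priori_probability` is made up for illustration, and it assumes every possible occurrence is equally likely (a fair coin, a fair lottery draw).

```python
from fractions import Fraction


def a_priori_probability(occurrences_of_a: int, possible_occurrences: int) -> Fraction:
    """p(A) = occurrences of A / total possible occurrences (equally likely outcomes assumed)."""
    if possible_occurrences <= 0 or not 0 <= occurrences_of_a <= possible_occurrences:
        raise ValueError("need 0 <= occurrences of A <= total possible occurrences, total > 0")
    return Fraction(occurrences_of_a, possible_occurrences)


# Fair coin: one head out of two possible outcomes (head or tail).
p_head = a_priori_probability(1, 2)

# Lottery: one first-prize ticket out of 100,000 tickets sold.
p_first_prize = a_priori_probability(1, 100_000)

print(float(p_head))         # 0.5
print(float(p_first_prize))  # 1e-05
```

Using `Fraction` keeps the ratio exact until it is converted to a decimal for display.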

Sample Variance Formula

s^2 = ∑(x - x̄)^2 / (n - 1), where s^2 is the sample variance, x is the value of each observation in the sample, x̄ is the mean of the sample, and n is the number of observations in the sample
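
A minimal sketch of the sample variance formula in Python. The scores are hypothetical, and `sample_variance` is an illustrative helper rather than anything from the course material; the standard library's `statistics.variance` computes the same quantity.

```python
import statistics


def sample_variance(observations):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(observations)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    mean = sum(observations) / n
    return sum((x - mean) ** 2 for x in observations) / (n - 1)


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample, mean = 5
s2 = sample_variance(scores)

print(s2)                           # 32 / 7, about 4.57
print(statistics.variance(scores))  # library result for comparison
```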

Sample Standard Deviation Formula

Square root of the sample variance: s = √[ ∑(x - x̄)^2 / (n - 1) ], i.e., the square root of the sum of [(sample value minus sample average) squared] divided by the number of samples less one.

Descriptive Statistics

§A branch of statistics that enables the researcher to describe and summarize collected data numerically §No attempt is made to generalize findings beyond the sample §Statistics are limited to describing only the characteristics of the sample group, using measures of central tendency and dispersion, and correlation techniques

Pie Graph or Chart:

§A circle that displays parts of the whole in terms of percentages or proportions at one point in time. The whole pie equals 100%. §Be sure all sectors added together equal 100% §Clearly differentiate sectors and include percentages or proportions §Works best with fewer than 6 sectors; too many categories may appear confusing

Standard Deviation

§A large standard deviation indicates the scores of a group are widely dispersed around the mean §A small standard deviation indicates the scores are clustered closely around the mean
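
To illustrate, here is a small sketch with two hypothetical groups that share the same mean but differ in spread. The data are invented for the example; `statistics.stdev` is the standard library's sample standard deviation (n - 1 divisor).

```python
import statistics

# Hypothetical scores: both groups have a mean of 50.
clustered = [48, 49, 50, 51, 52]
dispersed = [30, 40, 50, 60, 70]

sd_clustered = statistics.stdev(clustered)  # sample SD, n - 1 divisor
sd_dispersed = statistics.stdev(dispersed)

print(statistics.mean(clustered), round(sd_clustered, 2))  # 50 1.58
print(statistics.mean(dispersed), round(sd_dispersed, 2))  # 50 15.81
```

The means are identical, so only the standard deviation reveals how differently the two groups are spread around the mean.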

Mean

§Commonly used index of central tendency §Highly affected by extreme scores- underestimation or overestimation of central tendency §Biased indicator of asymmetrical distributions (skewed) §Most accurately employed to determine average of interval or ratio scaled distribution of scores that are symmetrical §Frequently used in parametric statistics

Pearson Product Moment Correlation (Pearson r)

§Correlation is a simple and useful descriptive statistical measure for determining the strength of the linear relationship between two or more variables §If the two measurements to be evaluated are continuous and at least interval scaled, and the relationship between the variables is linear, Pearson r is the statistic of choice
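
A rough sketch of how Pearson r can be computed by hand in Python. The helper name `pearson_r` and the paired data are hypothetical; the formula used is the usual one, the sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation for two paired lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two paired lists with at least two observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    if sum_xx == 0 or sum_yy == 0:
        raise ValueError("r is undefined when a variable has no variability")
    return sum_xy / math.sqrt(sum_xx * sum_yy)


# Hypothetical paired interval-level measurements.
hours_studied = [1, 2, 3, 4, 5]
quiz_score = [2, 4, 5, 4, 5]

r = pearson_r(hours_studied, quiz_score)
print(round(r, 2))  # 0.77, a fairly strong positive linear relationship
```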

Use of Tables

§Data can be summarized into tables §Tables should be relatively simple and easy to read §Title placed above table; clear, concise, and to the point; indicates what is being tabulated §Units of measure for the data are provided §Each row and column should be clearly labeled §Codes and abbreviations explained in footnotes

Measure of Association Correlation

§Describes the relationship between two or more sets of observations or variables §Relationship (degree and direction of the relationship between variables); a correlation alone does not establish causation §Health sciences- determining validity and reliability of measures, health problems, behavioral or environmental factors §There is a positive relationship between cigarette smoking and lung damage §There is a negative relationship between being overweight and life expectancy §What is the degree or magnitude of the associations between these variables?

Correlation Coefficient

§Descriptive statistic or number that expresses the magnitude and direction of the association between two variables

Types of Graphs

§Determining the type of graph to use, consider: §Does the data represent one point in time (cross-sectional or survey) or several points over time (time-series, cohort, or longitudinal) §Was the data qualitative or quantitative? §What was the scale of measurement?

Bar Graph:

§Displays the frequency at which data fall into different mutually exclusive categories or groups. §Used to present the frequency of a phenomenon that is discrete and nominally or ordinally scaled. §Characteristics §Bars do not touch §Bars denote the frequency of each dependent variable score §Bars can be organized vertically or horizontally §Ranked lowest to highest/highest to lowest

Ordinal Level Data

§Frequency distribution can be used to convey ordinal data in a table.

Line Graph

§Frequently used because lines can be superimposed on the same graph §Independent variables are plotted on the horizontal axis and dependent variable on vertical axis

Use of Graphs:

§Graphs are diagrams that consist of points, lines, curves, or areas that represent the state, status, condition or behavior of a population. §Relationships among numbers of various magnitudes can be seen quickly and easily §Visually represent frequency distributions §Effectively communicate summary of results Guidelines for Graphs: §Simplest graph consistent with its purpose is the most effective §Should be clear and accurate (3/4 high rule) §Self-explanatory §Correctly and unambiguously labeled with title, data source (if needed), legends (if needed) §Vertical scale set so the zero line appears on the graph §The graph generally proceeds from left to right and from bottom to top

Scattergram Diagrams

§Illustrates the degree of linear or curvilinear relationship between two different variables at one point in time §Used to display the degree of functional relationship between two or more variables measured §Variables are linearly related- positive, negative, or not at all §Correlation is expressed with an r value ranging from -1 to +1; 0 means no correlation §The closer the r value is to 0, the weaker the correlation

Descriptive Statistics

§It is in the name- "describing" the collected data §Easily understood by the readership §First step with any data collected prior to performing any statistical tests §The level of measurement determines the type (counts, frequency, percentage, mean, median, mode, interquartile range, standard deviation) of descriptive statistic the researcher will use §Tables and graphs can be used depending on the amount of data the researcher wants to "describe"

Measures of Central Tendency

§Mean- point on the score scale that is equal to the sum of scores divided by the number of scores (N) §Median- point on a numerical scale above which and below which 50% of the cases fall (middle number within a dataset) §Mode- numerical value that occurs the most frequently
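
The three measures can be tried out with Python's standard `statistics` module. The scores below are hypothetical; the second half of the sketch adds one extreme score to show how the mean shifts while the median barely moves.

```python
import statistics

scores = [3, 5, 5, 6, 8, 9, 13]  # hypothetical scores

mean_score = statistics.mean(scores)      # sum of scores / number of scores
median_score = statistics.median(scores)  # middle value of the sorted scores
mode_score = statistics.mode(scores)      # most frequently occurring value

print(mean_score, median_score, mode_score)  # 7 6 5

# Add one extreme score: the mean is pulled upward, the median much less so.
with_extreme = scores + [100]
mean_extreme = statistics.mean(with_extreme)
median_extreme = statistics.median(with_extreme)

print(mean_extreme, median_extreme)  # 18.625 7.0
```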

Interpreting Correlational Statistics

§Measure= correlation coefficient (-1 = perfect negative; +1=perfect positive; 0= no relationship ) §Sign (- or +) indicates the direction of the relationship §Actual number represents the strength of the relationship §- = negative or inverse relationship §+= positive relationship

Negative Skew

§Most scores are on the "higher" end and spread to the lower end of the distribution. The "tail" is on the left = negative skew.

Frequency Polygon has what kind of curves/skew?

§Normal curve §Positive skew §Negative skew

Ratio Level Data

§Researchers often group a continuous variable such as age into categories for table display. We would not report the raw value of each age (1, 2, 4, 5, 20, 45, 60, etc.); with a sample of n=100 people reporting their age, this would make an extremely large table.
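
One way to sketch this grouping in Python is shown below. The 10-year interval width and the helper name `age_group` are assumptions chosen for illustration; the ages are the ones listed on the card.

```python
from collections import Counter


def age_group(age: int, width: int = 10) -> str:
    """Label the interval an age falls into, e.g. 45 -> '40-49' (width is an assumption)."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


ages = [1, 2, 4, 5, 20, 45, 60]  # raw ages from the example
table = Counter(age_group(a) for a in ages)

# Print the grouped frequency table in order of the interval's lower bound.
for label in sorted(table, key=lambda s: int(s.split("-")[0])):
    print(label, table[label])
```

Seven raw values collapse into four rows here; with n=100 the saving in table size is much larger.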

Frequency Polygon

§One way to describe continuous data §Frequency distribution §Constructed from a histogram §Point is placed at the top in the middle of each interval §Each point represents the frequency or percentage of the variable being measured §A line is drawn to connect the points

Measure of Dispersion/Variability

§Range- highest score minus the lowest score in a given distribution §Percentiles- data is divided into 100 equal proportions; scores may be located in any one of the proportions §Variance- mathematically, the square of the standard deviation §Standard Deviation- analyzes spread of scores in a distribution (variability) §S or SD= population, s or sd= sample §Used to describe the "dispersion" of data

Normal Curve

§Represents that most scores fall within the middle and few scores towards either "tail"

Median

§Scores must be arranged in ascending order to locate the midpoint §If the number of scores is odd, the median is the middle score: 6, 17, 19, 20, 21 (median = 19) §If the number of scores is even, add the two middle scores and divide by 2: 6, 17, 19, 20, 21, 27 (median = (19 + 20)/2 = 19.5) §Used for ordinal, interval, or ratio data §Not affected by extreme scores; used to describe skewed distributions §Can be used in statistical inference, but more commonly used in descriptive statistical methods §The interquartile range (IQR) is reported with the median for highly skewed data §The formal definition of the interquartile range is the range of the middle 50% of the participants.
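
The two score lists from this card can be run through Python's `statistics` module as a quick check. The IQR part is only a sketch: quartiles can be computed by several conventions, and `statistics.quantiles` uses its default ("exclusive") method here, so other software may report a slightly different IQR.

```python
import statistics

even_scores = [6, 17, 19, 20, 21, 27]  # even number of scores
odd_scores = [6, 17, 19, 20, 21]       # odd number of scores

median_even = statistics.median(even_scores)  # (19 + 20) / 2
median_odd = statistics.median(odd_scores)    # the middle score

print(median_even)  # 19.5
print(median_odd)   # 19

# Interquartile range: distance between the first and third quartiles.
q1, _, q3 = statistics.quantiles(even_scores, n=4)
iqr = q3 - q1
print(iqr)
```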

Nominal Level Data: EXAMPLE

§Sex (variable of interest). Researchers are interested in the sex of patients who underwent open-heart surgery in the month of October. The researcher is granted access to the raw data in the patient portal system (after IRB approval) and records it below. What is the frequency of males (M) and females (F) undergoing open-heart surgery in October? §F, M, M, F, F, F, M, F, M, M, M, M, M, M, F, F, F, F, M, F, M, M, F, M, F, M, M
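
The frequency question can be answered by tallying the recorded values. The sketch below uses the raw data exactly as listed on the card; the variable names are illustrative.

```python
from collections import Counter

raw = "F, M, M, F, F, F, M, F, M, M, M, M, M, M, F, F, F, F, M, F, M, M, F, M, F, M, M"
records = [value.strip() for value in raw.split(",")]

counts = Counter(records)
total = len(records)

# Frequency and percentage for each category of the nominal variable.
for sex in ("F", "M"):
    print(f"{sex}: n={counts[sex]}, {100 * counts[sex] / total:.1f}%")
# F: n=12, 44.4%
# M: n=15, 55.6%
```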

Histogram:

§Similar to a bar graph except the bars are placed side-by-side and touch each other. Used to represent the frequency of a phenomenon that is continuous and interval or ratio scaled. §For example, income is a ratio scaled variable §Quantitative data is best displayed- age in years, PHP scores, QOL index scores, BP scores, BG levels, etc.

Range

§Simplest measure of dispersion §Ordinal statistic- analyzes the dispersion of ordinally scaled data §Easily determined by inspection; considered somewhat unreliable since it is derived from only two scores in the total distribution §One extreme score in a distribution can increase the range, leading to possible misinterpretation of group variability

Correlation Assumptions

§Specific correlational technique is based on §Number of variables to be correlated §Nature of variable (discrete or continuous) §Scale used to measure the variable §Nominal §Ordinal §Interval §Ratio

Variance and Standard Deviation

§Standard deviation is the positive square root of the variance §Variance and standard deviation are based on the mean and the value of each score within the distribution §Both indices are primarily used in parametric statistical procedures with interval or ratio data §The greater the dispersion of scores from the mean of the distribution, the greater the standard deviation and variance

Variance

§Sum of squared deviations from the mean divided by N (population) or n-1 (sample) §Compute the arithmetic mean of the distribution of scores, subtract the mean from each score in the distribution, square each difference to eliminate negative numbers, sum all of the squared values, and divide the sum by N (population) or n-1 (sample)
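
The steps can be followed one at a time in Python. This is a sketch with hypothetical scores; it applies the conventional divisors, N when the scores are the whole population and n - 1 when they are a sample.

```python
import statistics

scores = [4, 8, 6, 5, 7]  # hypothetical scores

# Step 1: arithmetic mean of the distribution.
mean = sum(scores) / len(scores)

# Step 2: deviation of each score from the mean.
deviations = [x - mean for x in scores]

# Step 3: square each deviation so negatives do not cancel positives.
squared = [d ** 2 for d in deviations]

# Step 4: sum the squared deviations.
sum_of_squares = sum(squared)

# Step 5: divide by N (population) or n - 1 (sample).
population_variance = sum_of_squares / len(scores)
sample_variance = sum_of_squares / (len(scores) - 1)

print(population_variance)  # 2.0
print(sample_variance)      # 2.5
```

The sample value is always the larger of the two, because the same sum of squares is divided by a smaller number.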

Spearman rho

§Two variables being correlated are ordinal §Measurements are discrete and ordinally scaled, Spearman rho is computed §The calculation of choice (Pearson r or Spearman rho) is based on variables meeting assumptions of linearity by plotting obtained scores on a two-dimensional graph, scatter diagram
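
A minimal sketch of Spearman rho for ranked data, assuming there are no tied scores so that the shortcut formula rho = 1 - 6∑d² / (n(n² - 1)) applies. The helper names and the two sets of rankings are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (lowest) upward; this sketch assumes there are no ties."""
    ordered = sorted(values)
    if len(set(ordered)) != len(ordered):
        raise ValueError("tied values need the general formula, not this shortcut")
    return [ordered.index(v) + 1 for v in values]


def spearman_rho(xs, ys):
    """Spearman rho via the no-ties shortcut: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two paired lists with at least two observations")
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical ordinal rankings of the same five patients by two raters.
rater_a = [1, 2, 3, 4, 5]
rater_b = [2, 1, 4, 3, 5]

rho = spearman_rho(rater_a, rater_b)
print(round(rho, 2))  # 0.8
```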

Overview of Data Analysis and Interpretation Process

§Use of tables for descriptive statistics §Types of Graphs §Descriptive Statistics §Measures of central tendency §Measures of dispersion §Measures of association

Mode

§Used to describe nominal level data §Most frequently occurring score in a distribution §Mode can be determined from ordinal, interval, or ratio scaled data §Can occur at any point in a distribution §Unimodal, bimodal, multimodal, or no mode at all §Used to analyze descriptively categorical values collected from questionnaires §Best indicator for skewed distributions

Phi coefficient

§Used when the two or more variables being correlated are nominally scaled §For example- Gender and Political Affiliation

Kendall's Tau

§Used when two or more variables being correlated are at least ordinal scaled §Non-parametric measure, just as Spearman rho §Used as an alternative to Spearman rho, specifically if the sample size is small

¾ High Rule

§The X-axis is the abscissa, reserved for the scale employed to measure the dependent variable or variable of interest. It represents the range of the scale used to measure the variable from 0 to ? §The Y-axis is the ordinate, usually reserved for the frequency or percentage of scores occurring along the scale of measurement. It represents the frequency range from 0 to the highest possible frequency or percentage. §The height of the Y-axis should be about three-quarters the length of the X-axis. §The rule minimizes misrepresentation and misinterpretation of the research findings that can result from expanding or collapsing of scales. §Present research data in an unbiased manner

Statistics

§Methods and procedures for collecting, classifying, summarizing, and analyzing data, and for making scientific inferences from such data.

Selecting the Appropriate Stat Test

•1. The scale of measurement used to obtain the data, which is ___________, ___________, __________, __________ •2. The number of groups used in an investigation (one or more) •3. Whether the measurements were obtained from independent subjects or related samples; i.e., repeated measurements of the same subjects •4. The assumptions involved in using a statistical test, such as the distribution of scores or the minimum required sample size

ANCOVA

•Allows for comparison of one variable in two or more groups, taking into account the variability of other variables (covariates). •Covariates are continuous variables that are not a part of the main experimental manipulation but have an influence on the dependent variable. •Ask yourself this question when doing research: •Are there other variables that could influence the DV?

Wilcoxon Matched Pairs

•Alternative to the t-test for dependent samples (paired t-test) •Used to determine differences between two related samples •Small samples, ordinal scaled data, violates parametric assumptions

Mann Whitney U-Tests

•Alternative test to independent t-test •Used to compare two sample means when the dependent variable is ordinal and not normally distributed. •Used for small sample sizes

Kruskal-Wallis

•Alternative to one-way ANOVA •Used to determine differences between 3 or more groups •Small sample size •Ordinal scale data- violates parametric assumptions

Independent samples t-tests

•Compares means between two independent groups •Interval or ratio scale •Non-parametric alternative: Mann-Whitney U (ordinal scale)

Systematic Review

•Comprehensive review/analysis of a topic (NARROW) that includes only articles of the highest level of evidence •Uses reproducible methods to identify and appraise •Addresses the same research question •Summarizes the 'best evidence' on a specific topic •Good way of keeping current

One-way ANOVA

•Determines whether there are statistically significant differences between the means of 3 or more groups. Tends to be used most often with 3 groups, e.g., Solomon 3 or 4 group design. •Can be used when you manipulate more than one IV •Interval or ratio scale

Importance of effect size

•Effect size (d) •Refers to the magnitude (e.g., size) of a difference when it is expressed on standardized scale. •The statistic d - most popular for describing the effect size of the difference between two means. •[3.00 or -3.00] standard deviation range
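
As an illustration, d for two independent groups can be sketched as the difference between the means divided by a standard deviation. The version below assumes the pooled standard deviation of the two groups is used as the standardizer, which is one common choice; the function name `cohens_d` and the scores are hypothetical.

```python
import math
import statistics


def cohens_d(group1, group2):
    """Mean difference divided by the pooled sample standard deviation (an assumed standardizer)."""
    n1, n2 = len(group1), len(group2)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two scores")
    var1 = statistics.variance(group1)
    var2 = statistics.variance(group2)
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    if pooled_sd == 0:
        raise ValueError("d is undefined when neither group has any variability")
    return (statistics.mean(group1) - statistics.mean(group2)) / pooled_sd


# Hypothetical outcome scores for two groups.
treatment = [6, 7, 8, 9, 10]
control = [4, 5, 6, 7, 8]

d = cohens_d(treatment, control)
print(round(d, 2))  # 1.26: the means differ by about 1.26 standard deviations
```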

Practical Significance

•First step determining statistical significance •Practical significance involves 5 steps •Cost-benefit analysis •Crucial difference (increase or decrease) •Client acceptability •Public or political acceptability •Ethical and legal implications

Topics in Part II:

•Inferential statistics •Parametric tests •Nonparametric tests •Hypothesis testing •Reporting results and discussion

Selecting the Appropriate Test

•Let's say a researcher is investigating the effectiveness of a new treatment compared to a conventional treatment used for pain management. Two groups of participants (n=50) are equally randomized to one of the treatment groups. •The outcome (pain) is measured on a 5-point Likert scale (5 = extremely painful to 1 = not painful at all). •The researcher is interested in the difference in the mean pain scores between the two groups. What statistical test would you advise this researcher to use?

Positive Skew

§Most scores are on the "lower" end and spread to the "upper" end of the distribution. The "tail" is on the right = positive skew.

Descriptive

- Describe/synthesize sample data - Measures of Central tendency - Measures of dispersion - Measures of association

Inferential

- Infer sample data to population - T-test - Analysis of variance (F test) - Analysis of covariance - Measures of association

Inferential

Used to infer research findings from the sample to the general population from which the sample was drawn.

Assumptions

•Normal distribution •Additivity and linearity •Homoscedasticity/homogeneity of variance •Independence Parametric tests are the preferred tests, as long as all assumptions can be met. Assumptions: •Normal distribution is relevant to •Parameters •Confidence intervals around parameters •Null hypothesis testing This assumption is often misinterpreted as "the data need to be normally distributed." •The assumption of normality matters in small samples •In large samples we are concerned with outliers •Additivity and linearity •The outcome variable is linearly related to any predictors •The relationship forms a straight line •Homoscedasticity/homogeneity of variance •When testing more than one group of participants, samples should come from populations with the same variance •This can impact parameters and null hypothesis testing •Independence •Data point values for variables from different groups should be independent of each other.

ANOVA:

•One-way Analysis of Variance (ANOVA) •Two-way Analysis of Variance (ANOVA) •Analysis of Covariance (ANCOVA)

Discussion Section

•Organized around each hypothesis/problem statement •Explanation of the statistical outcomes •Current knowledge in the field/other authors •Limitations •Statistical significance vs. practical significance •Application to practice/thinking

Organizing the Findings: Results Section

•Organized around each hypothesis/problem statement •Narrative description of statistical outcomes •Obtained test statistic •Degrees of freedom •Alpha level •Statement of whether null hypothesis was rejected/retained •Example- There was a significant main effect of time on knowledge, F (1, 31) =12.67, p= 0.001

Regression Basics: Predicting the Outcome

•Outcome = dependent variable •Multiple independent variables are used to predict whether the outcome will occur. •Multiple linear regression: used with continuous data •Predicts the outcome (one dependent variable) from at least one independent variable (generally multiple) •Multiple logistic regression: used with categorical data

Probability

•P-value = Probability of the obtained result occurring by chance •Small p- observed result is less likely to be a chance occurrence •Large p- observed result is more likely a chance occurrence

Inferential Statistics

•Recall •Used to infer research findings from the sample to the general population from which the sample was drawn. •Statistical analysis is an essential stage in quantitative research. •Health sciences research- examines samples drawn from a population •Draw inferences from "samples" •Probabilistic- even with random samples there is always a chance of sampling error

Inferential Statistics

•Recall: descriptive statistics "describe" the data •Inferential statistics generalize findings back to the population •The probability (p-value) associated with the value of an inferential statistic informs the researcher of the likelihood that the results obtained were due to chance, or whether the results are significant at a given probability level (e.g., 0.05) •It provides the researcher with a means of determining how reproducible the obtained results are by attaching a probability to them

Nonparametric Tests

•Require less stringent assumptions about the population and are useful when data are nominally or ordinally scaled, when samples are small, and when parametric assumptions cannot be met. •Parametric tests, by contrast, require the researcher to make stringent assumptions about the population of interest. •Less powerful than parametric tests

Meta-analysis

•Statistical process commonly used with SRs (but does not have to be included) •Combining data from multiple studies into one analysis •Example: Cochrane Collaboration •http://www.cochrane.org •ODU Libraries-http://guides.lib.odu.edu/az.php?a=c

Paired t-test

•T-test for dependent samples or paired samples t-test •Compares means based on related data. •Interval or ratio
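
A rough sketch of how the paired t statistic is formed: take the difference for each participant, then divide the mean difference by its standard error. The pre/post scores are hypothetical, `paired_t` is an illustrative helper, and the sketch stops at the t value and degrees of freedom without looking up a p-value.

```python
import math
import statistics


def paired_t(before, after):
    """Return (t, degrees of freedom) for paired scores from the same participants."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two paired lists with at least two participants")
    differences = [a - b for b, a in zip(before, after)]
    n = len(differences)
    sd = statistics.stdev(differences)
    if sd == 0:
        raise ValueError("t is undefined when every difference is identical")
    standard_error = sd / math.sqrt(n)
    return statistics.mean(differences) / standard_error, n - 1


# Hypothetical scores for five participants measured before and after an activity.
pre = [10, 12, 9, 11, 13]
post = [12, 14, 10, 13, 15]

t, df = paired_t(pre, post)
print(round(t, 2), df)  # 9.0 4
```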

Probability

•The concept of probability is the cornerstone to understanding inferential statistics. •Probability is expressed in a proportion between 0 and 1 •0= an event is certain not to occur •1= an event is certain to occur

T-tests

•Two basic tests for parametric stats •T-tests are used to test the observed difference between two means. •Two basic types •Depends on whether the IV was manipulated using the same or different participants •Paired t (dependent samples); non-parametric alternative: Wilcoxon matched-pairs signed rank •Independent samples t; non-parametric alternative: Mann-Whitney U

Two-way ANOVA

•Understand if an interaction exists between two IVs on the DV Example •Gender (IV) (male/female) and educational level (IV) (undergrad/grad) on test anxiety (DV) among college students. •Is the effect of gender on test anxiety influenced by educational level?

Chi-square

•Used for nominal data •Differences between frequencies •Small samples •2 or more groups •Bivariate analysis
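
To show what "differences between frequencies" means in practice, here is a sketch of the Pearson chi-square statistic for a table of observed counts. The counts are hypothetical, the helper name `chi_square` is illustrative, no continuity correction is applied, and the sketch assumes every row and column total is greater than zero. It stops at the statistic and degrees of freedom.

```python
def chi_square(observed):
    """Pearson chi-square: sum of (observed - expected)^2 / expected over all cells."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count if the row and column variables were unrelated.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (count - expected) ** 2 / expected
    degrees_of_freedom = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, degrees_of_freedom


# Hypothetical 2 x 2 table of frequencies (rows: groups, columns: outcome yes/no).
observed = [
    [20, 30],
    [30, 20],
]

chi2, df = chi_square(observed)
print(chi2, df)  # 4.0 1
```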

Friedman two-way ANOVA

•Used for ordinal data •Small sample •Violates parametric assumptions

In interpreting the data, the investigator answers the following questions

•What could these results mean? •What factors might be contributing to these results? •Are these results what was expected based on the theoretical framework of the study and what is known in the field? •Do these results agree or disagree with the findings of other researchers on the subject area? •Are any known limitations or threats to internal and external validity contributing to the results? •What conclusions do these results lead to? •How do the findings contribute to or advance the current knowledge or practice in the field? •In what populations, conditions, settings would these same results hold true? •What are the implications of the results to current practices?

Non-parametric test

•Wilcoxon matched •Ordinal scale •Example •Week 1- provide ISVS survey •Weeks 3-7 provide IPE activity •Week 9- provide ISVS survey again

Statistical Significance

•p= 0.05 typical standard •p= 0.001 research where there is a life-or-death outcome •p= 0.10 exploratory research into a new area •The level of significance is only a statement of probability; it does not mean the research hypothesis is correct, important, or meaningful, or of value to a real-life situation.
