Basic Statistical Concepts, Data Analysis, and Applied Statistics (HHGB)


= intervals used in a histogram - cutpoints = values at the beginning and the end of the bins - frequency = count of the number of data values in each bin

- bins

= values at the beginning and the end of the bins

- cutpoints

count of the number of data values in each bin

- frequency

- value that describes the population or the theoretical world • Thus, when making statistical inferences, you estimate population parameters based on sample statistics

- parameter

- is the r significantly different from 0 • was there no relationship?

3. statistical significance of the coefficient

• Pearson r: assumes continuous x and y
• Point-biserial r: when one variable is continuous and one is dichotomous
• Phi coefficient: when both variables are dichotomous
• Spearman's rho: when both variables are ordinal (ranked data)
• Kappa (K): 2 or more nominal variables with 2 or more categories
• Intraclass correlation: 2 or more continuous variables
• Cramér's V: two categorical variables

Alternative Correlation Coefficients

In an experiment, you always try to get to the event which is least expected to happen (usually, this is the event you hypothesized). Usually, when you get results, they are prone to chance, just a fluke, or underpowered (not enough proof). To prove something wrong, you need an overwhelming amount of proof.

You say that the mean weight of the frogs is 15 (greater than in the real world). According to your ROL / previous studies, the average weight of the frog is around 10. This is your claim: say you want to argue that, because of climate change, the frogs have grown, so your hypothesis is that their average weight is now 15. So you sample frogs to try to prove it. The result of Experiment 1 (UPLB): 10.9. Not near the hypothesized value, so you still lack evidence; the truth still stands that the average weight has not changed. The result of Experiment 2 the following year (UPB): 9.6. It got smaller, but the probability of getting this value is still higher than that of getting an even smaller or an even higher one. The truth still stands that the average weight (9-11) has not changed. You did more experiments, until finally, during your 6th experiment, involving a representative sample of 1000 frogs from different areas in the Philippines, you found that the average weight was 16.9. What is the probability of getting that event? Smaller than the probability of getting the weights of the previous experiments. But this event is so rare that, when you observe it, it is enough to shake the previous assumption (that the average weight of the frogs has not changed). This is overwhelming proof: enough evidence to say that the weight of the frog has indeed changed.

• X and Y are valid and reliable • linear relationship between X and Y - check scatter plots (does it look like a linear relation? or does it look complex?) • both X and Y have enough variability to demonstrate a relationship - a limited range of values may result in a low, uninterpretable r • homoscedasticity (the spread of the data points around the regression line is about the same across the range of X) - homo ("same") + skedasis ("dispersion") - check scatterplots

Assumptions of Correlation Coefficients

nominal

Bar graph: ideal for _________ data

Sampling error Probability

Basis of Statistical Inference

Ratio of the standard deviation to the absolute value of the mean Used to compare the variation of two or more different variables Expressed as percentage

Coefficient of variation
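A minimal Python sketch of the coefficient of variation as defined on this card (SD divided by the absolute value of the mean, expressed as a percentage). The function name and the two data sets are hypothetical and only illustrate comparing variables measured on different scales; the sample SD (n - 1) is an assumption.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Return the CV as a percentage: SD divided by |mean|, times 100."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(values) / abs(m) * 100

# Hypothetical measurements on two different scales
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 55, 60, 65, 70]

cv_height = coefficient_of_variation(heights_cm)
cv_weight = coefficient_of_variation(weights_kg)
print(f"CV height: {cv_height:.1f}%  CV weight: {cv_weight:.1f}%")
```

Because the CV has no units, the two percentages can be compared directly even though the raw SDs cannot.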

• method for giving a range of possible values for our theoretical parameters given the information we have in our data from the real world (data should be unbiased, randomised, unrelated to each other) • Expressed as a probability percentage - 90% confidence level - 95% confidence level - 99% confidence level • "I am 99% confident that the population mean would fall within this range of values" • If we generate a distribution of sample means • Draw 50 samples from the true population • Compute for the 90% confidence intervals for each sample mean - 5 out of 50 intervals would not contain the "true" mean - 45 out of 50 intervals would contain the "true" mean

Confidence Interval
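The "50 samples, 90% confidence intervals" idea on this card can be tried out with a small simulation. This is only a sketch under stated assumptions: the population is taken to be normal with a hypothetical mean of 10 and SD of 2, each sample has 100 observations, and the interval uses the normal approximation (mean ± z × SE). All function names are made up for illustration.

```python
import random
from statistics import NormalDist, mean, stdev

def ci_for_mean(sample, confidence=0.90):
    """Normal-approximation confidence interval for the mean of one sample."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.645 for 90%
    se = stdev(sample) / len(sample) ** 0.5
    m = mean(sample)
    return m - z * se, m + z * se

def count_covering(true_mean, true_sd, n_samples=50, sample_size=100,
                   confidence=0.90, seed=1):
    """Draw repeated samples and count how many CIs contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(sample_size)]
        low, high = ci_for_mean(sample, confidence)
        if low <= true_mean <= high:
            hits += 1
    return hits

hits = count_covering(true_mean=10, true_sd=2)
print(f"{hits} of 50 intervals contain the true mean")
```

In any single run the count will not be exactly 45 of 50; the 90% figure describes the long-run proportion over many repeated samples.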

- other variable that may contribute to the dependent variable E.g. Medical diagnosis

Confounding Variable (CV)

- continuous predictor variable

Covariate

DESCRIBE Describing your data EXPLORE Visualizing it or summarizing it using graphs, tables, etc. INFER Making causal assumptions *Your aim for analysis depends on the question

DATA ANALYSIS (3 aims of researches)

Sampling Variables Conceptual framework Data collection tools

DATA COLLECTION

Before you proceed to the analysis, you first need to process the data so that it is neat. It's like quality assurance: making sure your data is clean. For example, you have missing data. What do you do? Sometimes you fill in the values, sometimes you remove them altogether. Who will handle the data? Where will you store the data? Who will the data be accessible to? Mechanism for "cleaning" data: Excel

DATA PROCESSING / QUALITY ASSURANCE

Measures of central tendency Mean Median Mode Measures of dispersion Standard deviation Variance Range Measures of location Percentile Quartile Decile Median IQR (interquartile range) Distribution of data Histograms Box plots Scatter plots Visual, narrative/textual, tabular methods Graphs, tables, or text Ideal to have all 3 in results section

DESCRIPTIVE (no inference yet)

Descriptive statistics - Frequency distributions - Measures of central tendency - Measures of dispersion - Shape of the distribution - Skewness

Data Analysis

• Type I error (α) - Incorrectly rejecting a true null hypothesis - Claiming a FALSE POSITIVE - Alpha is kept at a low level when it is important not to make the mistake of rejecting a true H0 • Type II error (β) - Error of not rejecting a false null hypothesis (a FALSE NEGATIVE) - Beta is kept at a low level if it is important not to retain a false H0 [[[Hypothesis testing is used to help you make decisions about things in the world. To make the right decision, you need to get the correct values. Two things can be true in the real world: the null hypothesis is true, or the null hypothesis is false. If your experiment says there is no effect and in reality there is no effect → CORRECT DECISION. In the same way, if you say the hypothesis of no effect is false and in reality something did happen → CORRECT DECISION. There are two errors you can commit: Type I error (alpha) and Type II error (beta). TYPE I ERROR: the result of your experiment says there is an effect, but in reality there is none. TYPE II ERROR: you say there is no effect, but there really is one.]]]

Decision Making Errors

variable that is predicted to be affected by the IV E.g. Mean length of utterance

Dependent Variable (DV) -

procedures used to summarize, organize, and simplify the available data uses numbers and figures E.g. standard deviation, mean, charts, graphs

Descriptive Statistics

Spread of scores or variation in the dataset how far each point is from the central value The ↑ variance = ↑ histogram spread

Dispersion

pattern of the values in the data showing their frequency of occurrence relative to each other usually visualized through a histogram

Distribution

• Point Estimates - A single numerical value used to approximate/estimate the population parameter • Confidence Intervals - A range of values within which the parameter is expected to lie with a certain degree of confidence

Estimating the Mean of the Population

According to literature, 20% will have pneumonia. Now, your hypothesis is that the incidence of pneumonia in patients with dysphagia in PGH is higher (because of poorer care, etc.) → 30%. When you want to estimate the true incidence of dysphagia in the wards, you get a sample, and you get the point estimate. You obtained 25%. When you estimate something, does it have to be exact all the time? Usually no. This is where interval estimation comes in: the point estimate is 25%, but the confidence interval runs from 21% to 29%. Estimation: trying to know the value of the parameter from the sample statistic.

Ex. Incidence of dysphagia in PGH

- categorical predictor variable E.g. Intensive speech therapy

Factor

number of times the variable occurred in the data set Excel Formulas =COUNTIF(Range,Criteria)

Frequency Counts

A table format that shows all possible values of a variable; includes: - class intervals (category/group) - raw frequencies (actual number of cases) - relative frequencies (% of cases) - cumulative relative frequencies (% of cases lying within and below each class interval)

Frequency Distribution
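A rough Python counterpart to the Excel COUNTIF approach, building the columns listed on this card (raw, relative, and cumulative relative frequencies). The function name and the ratings are hypothetical, and each distinct value is treated as its own class interval for simplicity.

```python
from collections import Counter

def frequency_table(values):
    """Return rows of (value, raw count, relative %, cumulative relative %)."""
    counts = Counter(values)            # counts[x] plays the role of COUNTIF
    total = len(values)
    rows, running = [], 0
    for value in sorted(counts):
        running += counts[value]
        rows.append((value, counts[value],
                     100 * counts[value] / total,
                     100 * running / total))
    return rows

scores = [3, 1, 2, 3, 3, 2, 1, 3, 2, 3]   # hypothetical ratings
table = frequency_table(scores)
for value, raw, rel, cum in table:
    print(f"{value}: n={raw}  {rel:.0f}%  cumulative {cum:.0f}%")
```

The last cumulative value is always 100%, which is a quick check that no observation was dropped.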

Better for ratio or interval data Midpoints of each class interval are joined

Frequency polygon

Histograms Bar graph Frequency polygon Ogive Stem-and-leaf plots Boxplot

Graphical Presentations of Frequency Distributions

Get frog samples from all the provinces in the Philippines, and get at least 100 samples from each province Measure the color, and see the average (using descriptive stats) Now, you can estimate that the mean / average color of frogs in the Philippines is brown This leads to more accurate inferences The reality is, there is always a sampling error. Whenever you get a sample, you will have an inherent certain level of error, that you have to minimize as much as possible through sampling techniques.

How do you reduce the sampling error?

• If the consequences are not life-threatening, then it's okay to set the significance level at a larger value • The significance level of a test (alpha, α) gives a cut-off for how small is small for a p-value

How small is small?(for p-value)

1. State the null and alternative hypotheses
2. Determine alpha / significance level
3. Check normality
4. Select appropriate test
5. Calculate test statistic
6. Compare the p-value with alpha (or the test statistic with its critical value)
7. Make a statistical decision
8. Draw conclusion

Hypothesis Testing
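The steps on this card can be sketched with a two-sided one-sample z-test. This is an illustration under stated assumptions, not the test the lecture prescribes: the null mean of 10 and the sample means 10.9 and 16.9 come from the frog example in these notes, but the population SD of 5 and the sample size of 30 for the first experiment are hypothetical, as is the function name.

```python
from statistics import NormalDist

def one_sample_z_test(sample_mean, h0_mean, sd, n, alpha=0.05):
    """Two-sided z-test; returns (z, p_value, reject_h0)."""
    se = sd / n ** 0.5                        # standard error of the mean
    z = (sample_mean - h0_mean) / se          # test statistic, built assuming H0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha        # decision: reject or do not reject

# Experiment 1: mean 10.9 (hypothetical n = 30, hypothetical SD = 5)
z1, p1, reject1 = one_sample_z_test(10.9, h0_mean=10, sd=5, n=30)
# Experiment 6: mean 16.9 from 1000 frogs (hypothetical SD = 5)
z6, p6, reject6 = one_sample_z_test(16.9, h0_mean=10, sd=5, n=1000)

print(f"Experiment 1: z={z1:.2f}, p={p1:.3f}, reject H0: {reject1}")
print(f"Experiment 6: z={z6:.2f}, p={p6:.3g}, reject H0: {reject6}")
```

Note that the decision is phrased as "reject" or "do not reject" H0, never "accept".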

- Set of procedures that ends in either the REJECTION or NON-REJECTION of a hypothesis - A method of making decisions using data, whether from a controlled experiment or an observational study • Could the differences observed be just due to chance? Due to natural variability within the data? - Or is there really something not right about our assumptions about the theoretical world? • In the theoretical world, assume NO difference • Testing is like proof by contradiction

Hypothesis Testing • Statistical Hypothesis Testing

What statistical inference means Sampling error, standard error of the mean Probability Probability histograms Estimation Point Interval Hypothesis testing Power

INFERENTIAL

- variable manipulated Predictor variable

Independent Variable (IV)

involve methods that allow formulating inferences beyond the actual data or making conclusions based on the data generalizing about population parameters based on analysis of sample statistics (inductive reasoning) E.g. chi-square test, t-test, one-way ANOVA

Inferential Statistics

• .00 - .25 = little, if any correlation • .26 - .49 = low • .50 - .69 = moderate • .70 - .89 = high • .90 - 1.00 = very high

Interpreting correlations 1. magnitude/strength of the correlation (r value)

- r² = coefficient of determination - % of variance shared by the two variables

Interpreting correlations 2. variance shared by the two variables (r²)

• the correlation coefficient is a sample statistic - only represents the relationship of X and Y at the level of the sample - it does not apply to individual cases in the sample - not everyone is on the regression line (unless r = 1.0 or -1.0) * Not all correlations can be best explained by a linear model *

Interpreting correlations 4. confidence intervals around the coefficient

IQR = Q3 - Q1 (third quartile minus first quartile) Middle 50% of the data (75th - 25th percentile) Good for ordinal, interval, ratio data

Interquartile range
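A short Python sketch of the IQR using the standard library. Quartiles can be computed by several conventions that give slightly different answers; the "inclusive" method used here is an assumption, and the data are hypothetical.

```python
from statistics import quantiles

def iqr(values):
    """Interquartile range: third quartile minus first quartile."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # hypothetical scores
print(iqr(data))
```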

marginal distribution - distribution of only one of the variables in a contingency table conditional distribution - distribution within a fixed value of a second variable

Joint Distributions (2/3)

frequency or relative frequency of the observations for the two variables considered together as a combination a.k.a. contingency table / cross tabulation /two-way table

Joint distributions (1/3)

average of all observations not resistant to extreme observations in cases where there are outliers, it is better to use the TRIMMED MEAN (e.g. deletes the upper and lower 10% of the data before averaging) good for normal/symmetrical distributions good for interval and ratio data

Mean
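To see why the mean is not resistant to outliers, the sketch below compares the ordinary mean with a 10% trimmed mean. The function name and the scores are hypothetical; the rounding rule for how many values to drop (truncating n × 10%) is an assumption.

```python
from statistics import mean

def trimmed_mean(values, proportion=0.10):
    """Mean after dropping the lowest and highest `proportion` of the values."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)      # number to drop from each end
    kept = ordered[k:len(ordered) - k] if k else ordered
    return mean(kept)

scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]   # hypothetical, with one outlier
print(mean(scores), trimmed_mean(scores))
```

The single extreme score pulls the ordinary mean far above most of the data, while the trimmed mean stays near the middle of the scores.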

Minimum First Quartile Median Third Quartile Maximum Z-scores

Measures of Location

middle score (score below which 50% of the distribution falls) not affected by extreme scores or skew good for ordinal data

Median

score that occurs most often peak of the histogram most "popular" score best for nominal

Mode

• "Guilty beyond reasonable doubt?" • p-value is a number between 0 to 1 that quantifies the strength of the evidence against the null • Smaller p values, stronger evidence

More about P-Values

- any recording of information (numerical or categorical)

Observation/Element (X)

Cumulative frequency polygons

Ogive

- any numerical measure describing some characteristics of the population (Greek letters)

Parameter

- 1. Randomly drawn from parent populations - 2. Samples are normally distributed - 3. Variances of groups being compared are roughly equal / homogeneity of variances - 4. Measured on interval or ratio scales - 5. Independent samples [[[Before you decide what test to use, first look at what data you have. Then do a descriptive analysis. Do a histogram. Check whether it is normally distributed. If it is not normally distributed, use a nonparametric test. Parametric tests are used for data that are normally distributed, independent, and homogeneous (of the same kind). Some parametric tests have a nonparametric counterpart, depending on the data you have. To be safe, do a nonparametric test, because its assumptions are much less strict. You rarely get truly parametric data. If you're doing a before-and-after test, you can no longer use a parametric test for independent samples, because the measurements are no longer independent.]]]

Parametric vs. Non-Parametric • Characteristics of parametric data:

- entire collection of observations to which we want to generalize

Population (N)

- The probability of rejecting a false H0 - The ability to detect a TRULY SIGNIFICANT result • Many studies set α at 0.05 and β at 0.20 (or a power of 0.80), but these are arbitrary values only: - α may be between 0.01 and 0.10 - β may be between 0.05 and 0.20 [[[POWER: The power of a study is one minus beta. It is the probability of getting a significant result when, in reality, there really is a significant effect. When a study is high in power, you are better able to detect significant results when they are actually there. From the start, you try to estimate the sample size that is adequate to detect a result with high power, that is, with a good ability to say that there is a significant result. This is important early on, even as early as determining the sample size. Power is usually set at 0.8 or 80% (beta at 0.20). This means your experiment has an 80% chance of detecting a significant result that truly exists. When the sample size is low, the power shrinks.]]]

Power (1 - β)
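The link between sample size and power can be illustrated with an approximate power calculation for a two-sided one-sample z-test. This is a sketch under assumptions not in the notes: the population SD is treated as known, and the effect size (1.0), SD (3.0), and sample sizes are hypothetical.

```python
from statistics import NormalDist

def z_test_power(effect, sd, n, alpha=0.05):
    """Approximate power of a two-sided one-sample z-test.

    effect: true difference between the population mean and the H0 value
    sd:     population standard deviation (assumed known)
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = abs(effect) / (sd / n ** 0.5)    # effect in standard-error units
    # Probability that the test statistic lands in either rejection region
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)

for n in (10, 30, 100):
    print(n, round(z_test_power(effect=1.0, sd=3.0, n=n), 2))
```

With the effect and SD held fixed, the printed power rises as n grows, which is why the required sample size is estimated before data collection.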

• the likelihood that any one event will occur, given all the possible outcomes • Notation: - P(A) • where A = any event • Event (A): any subset of S (sample space) • Probability: 0% ≤ P(A) ≤ 100%

Probability

• instead of thinking of this as a histogram of individual observations, think about the normal curve as a distribution of sample means • the "SD" of the probability histogram is the SE

Probability Histograms

- if an experiment is repeated over and over again, then the average result will converge to the experiment's expected value

Probability Theories • Law of Large Number (LLN)
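A quick simulation of the Law of Large Numbers, using a fair six-sided die as a hypothetical experiment (its expected value is 3.5). The function name and seed are illustrative only.

```python
import random

def average_of_die_rolls(n_rolls, seed=0):
    """Roll a fair six-sided die n_rolls times and return the average."""
    rng = random.Random(seed)
    total = sum(rng.randint(1, 6) for _ in range(n_rolls))
    return total / n_rolls

for n in (10, 1_000, 100_000):
    print(n, average_of_die_rolls(n))
```

With few rolls the average can be far from 3.5; with many rolls it settles close to it.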

peak of the distribution = MODE number of peaks - 0 peaks = UNIFORM - 1 peak = UNIMODAL - 2 peaks = BIMODAL - >2 peaks = MULTIMODAL extent of spread presence of gaps / outliers skewness (symmetry) kurtosis (peakedness) figure that shows the entire distribution of data divided into bins, which are mutually exclusive (so they don't overlap)

Properties of Histograms

- If univariate, do unadjusted analysis - If multivariable, do adjusted analysis • To minimize effect of confounders • Perform regression analysis

Q1: Univariate or multivariate?

- Do you want to test for a difference between groups or want to test correlation between variables? - Example: • Student t-test (comparing 2 means) • Mann-Whitney U test/ Wilcoxon rank sum test (comparing 2 medians)

Q2: Difference?

- Were the groups paired or unpaired / dependent or independent? - Are you measuring more than once from one sample? - Examples: • Student t-test (to compare control group from intervention group) • Paired t-test (to compare outcome before and after intervention)

Q3: Paired?

- Continuous or categorical/discrete/factor? - Examples: • Chi-square test (nominal data) • Kruskal-Wallis H test (ordinal or non-normal continuous data)

Q4: Level of measurement?

- Is it normally distributed? Check if it looks like a bell-shaped curve - PARAMETRIC tests are used for NORMAL distributions, while NON-PARAMETRIC tests are used for NON-NORMAL ones

Q5: Normality?

Range = Highest value - lowest value Good for ordinal, interval, ratio data

Range

DESCRIPTIVE INFERENTIAL

STATISTICAL ANALYSIS

H0: null hypothesis "nothing is happening"..."no difference" "innocent until proven guilty!" assume that it is true throughout the rest of the testing HA: alternative hypothesis Usually the research hypothesis (i.e. the hypothesis that the researcher believes in) true if we have strong evidence against the null hypothesis alternatives can be one-sided or two-sided [[[First step: know your claims and the status quo. The null and alternative hypotheses, a.k.a. the current truth vs. your assumptions. Alternative hypothesis: the researcher's hypothesis, the assumption that you put forward. It can be one-sided or two-sided. Ex. the frogs in the earlier example: the hypothesis was that the weight of the frogs was higher. UNIDIRECTIONAL / ONE-SIDED: e.g. the size of the frogs is less than 10.2. Used in trials and interventions, because you have to compare. Ex. Intervention 1 has a greater effect than Intervention 2, so you can say that the effect of one is bigger than the other. BI-DIRECTIONAL / TWO-SIDED: if the hypothesis says that the weight of the frogs is no longer equal to 10.2, it can go either way: frogs are smaller, or frogs are bigger. Used in most research cases. Null hypothesis: the status quo. Usually, you claim that there's nothing happening: "The truth (at first, before you begin your experiment) is that things are okay as is. There is no effect or difference. Nothing is happening." Analogy: court trial. A suspect is innocent until the lawyer presents overwhelming evidence to say otherwise. The same goes for science: when you experiment, the truth is that nothing has happened yet, until you find evidence. In most of the research you read, the alternative is "not equal" (e.g. estimation only). In a trial or an intervention, you need to be able to compare, e.g. Intervention 1 is higher than Intervention 2. First step: state the truth and the assumptions. A common mistake in understanding H0 is saying that you "accept" the null hypothesis when the alternative is not supported.
Statistical testing is about rejecting or not rejecting; you never say "accept". As a researcher, you will try every way to show that the result is significant, so that is your bias. You should also be pessimistic: take the side that expects nothing to happen, so that any positive effect that appears is truly unexpected. And if someone else has run an experiment and concluded that the null still holds, your assumption already has a competing result. • H0: assumes that the difference between groups occurred by chance as a result of sampling error • HA: there is a true difference between groups • "Disproving" the null hypothesis: - you do the test to decide if H0 is false - reject or not reject the H0 ]]]

STEP 1: State the null hypothesis and the alternative hypothesis

• The probability of committing a Type I error • probability of an observation given a hypothesis - NOT: the probability of the null hypothesis being true is p - usually, alpha = 0.05 • MEANING: "If the null hypothesis is true, there is only a 5% probability of getting a result this extreme just by chance." - Alpha can also be set at 0.01 or 0.1 [[[Alpha → the probability of obtaining an event, outcome, or result, given that the null hypothesis is true. In the histogram shown earlier, the probability of getting a significant result is very small. What p-value threshold do you usually read about? Usually .05, i.e. alpha was set at 0.05. This means that, if the null hypothesis is true, the probability of getting the proposed result is just 5%. That is still very low compared to the 95% probability of not getting a significant effect. Your alpha or significance level varies depending on the nature of the research. If you're studying germs, which are easy to control, your room for error is usually small. For example, if you want to find a medicine for something, sometimes you allow only .01. That is all the room you allow for getting a significant result by chance. With people, though, there is more room for error, so you should be more lenient → the acceptable alpha is 0.05. This is only a convention, and it is now being challenged, because sometimes an effect has no statistical significance but has clinical significance. These two are different. When you do research, aim to have both. Know the value or hypothesis and look at the chance of getting a significant result.]]]

STEP 2: Determine alpha level or p-value - the probability of obtaining an observation /outcome given that the null hypothesis is true

- It is like "collecting the evidence" - summarize data into a test statistic - a test statistic is constructed assuming that H0 is true

STEP 3: Determine what test to use

- a subset of the population

Sample (n)

For example, you want to measure the color of all the frogs in the Philippines. If you take samples of frogs ONLY from the College of Med and conclude that the true color of frogs in the Philippines (the parameter) is brown, basing this on your sample alone, are you making the right conclusion? Probably not. This is what you call a sampling error: you make the wrong conclusion because the sample that you got isn't good. It is supposed to be representative, big, etc.

Sampling Error

Tendency of the sample values to differ from the population values Sampling error = μ - x̄ The ↑ sampling error, the ↓ accuracy of the estimate of the population mean PROBLEM: we don't usually know how big the population is or what its parameters are [[[Sampling error = population value minus sample value. The higher the sampling error, the worse the estimate of the mean will be. Population parameters: usually represented by Greek letters (e.g. μ, σ). Sample statistics: usually represented by letters of the English alphabet (e.g. x̄, s). We don't know how big the population is or what its parameters are, so you're really going to have a hard time estimating. How do you estimate / quantify the error? Calculate the standard error of the mean, a quantitative value of the sampling error.]]]

Sampling Error (definition)

This means that this study was able to prove that the true state of things has changed SIGNIFICANTLY (assuming that you did it with the least amount of bias possible) Now, your claim / hypothesis that PROMPT's effect is higher (true effect / parameter) is SUPPORTED

Say you did a third experiment, this time enrolling 7,000 kids. You got an effect size of 45%, but since you enrolled so many, your CI became 43-47%. Small CI, small margin of error. "If you still got it wrong after that, I don't know what else to say" → kind of what you want to point out. This CI DOES NOT overlap with those of the previous experiments.

• Factors to be considered: - Objectives of the study (descriptive? Or inferential?) - Type of variables (quantitative or categorical?) - Level of measurement (nominal, ordinal, interval, or ratio?) - Whether the samples are related or independent - Assumption about the test (parametric or nonparametric?) [[[SELECTING WHICH TEST TO USE: what procedure to use to evaluate the evidence you got. The result is usually a test statistic and its p-value, which you then compare with the alpha that you set to check whether the result is significant. Selecting the Right Test Statistic • Related or independent samples - Independent = the probability of selecting samples in one group is not affected by the sample selection in the other - Related = the sample in one group is dependent on / affected by the other group • Parametric vs non-parametric test - Parametric = can be used if the assumptions of NORMALITY, INDEPENDENCE and HOMOGENEITY are met - Non-parametric = when you can't assume N-I-H. Questions to ask yourself when you choose a test: Are you dealing with an experiment with only one variable, or many? Univariate analysis is hardly done anymore; it is almost always multivariate, because in the real world everything is affected by multiple variables. If you are dealing with multivariate data, look at the color (check table above) and just pick a test, because the tests for multivariate data are already specified there. Experimental studies usually dwell more on this (check table), but in observational studies you rarely get a study with only one variable of interest. What is the aim? Do you want to investigate a difference, to correlate, or to predict? Are the samples independent or paired? When they are independent, the way one group is selected/sampled is not influenced by the way you sampled the other. Dependent → matched samples (e.g. matched in terms of characteristics) or before-and-after studies / repeated measures (e.g. you measured them at baseline and measured the same people again after 6 months
→ the sample is the same). What kind of variable is the outcome (is it continuous? ordinal?), how many groups are there, and what is the sample size? THEN, FINALLY, THE TEST. To land on the yellow box, the aim of the research must be to predict something, because in everything called regression you are trying to predict an outcome based on a set of independent variables. Ex. At UP Med, they ran a regression analysis to check the predictors of success in med school (age, GWA, college you're from), with many variables. Variables = x, success/failure = y → binary outcome, so logistic regression. They found that GWA is the most predictive factor. If that is what you want to do, to predict → do a regression study ]]]

Selecting the Right Test Statistic

Values of skewness and kurtosis between -2 and +2 are considered acceptable in order to prove normal univariate distribution (George & Mallery, 2010). George, D., & Mallery, M. (2010). SPSS for Windows Step by Step: A Simple Guide and Reference, 17.0 update (10th ed.). Boston: Pearson.

Sidenote: What is an acceptable value for skew and kurtosis?

• a way to estimate the sampling error (since it's hard to know the population characteristics, e.g. the mean) • an estimate of the standard deviation of the distribution of sample means • Formula: SE_x̄ = s / √n - s = SD of the sample - n = sample size • The larger the n, the smaller the SE [[[Application: go back to the frog example. From the distribution of the frogs, you saw that the colors show a small standard deviation, meaning the colors are not very widely spread out; they're almost all the same. If observations / data are closer together and have a small SD, the sampling error is smaller. If the SD is larger and the colors are more widespread, what will happen to the sampling error? It will increase, because it is directly proportional to the SD (refer to the formula). The larger the sample size, the smaller the sampling error, because it is inversely proportional to the square root of n (refer to the formula). What is the implication when you create your own research? This is why we calculate the sample size (usually a statistician does it for you). You should know that, as much as possible, you should get a large enough sample, so that the data and conclusions will not suffer from sampling error.]]]

Standard Error of the Mean
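A minimal sketch of the standard error of the mean (sample SD divided by the square root of n), showing that a larger sample with the same spread gives a smaller SE. The weights are hypothetical values, and repeating the same list four times is just a convenient way to enlarge n without changing the spread much.

```python
from statistics import stdev

def standard_error(sample):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return stdev(sample) / len(sample) ** 0.5

weights = [2, 4, 4, 4, 5, 5, 7, 9]        # hypothetical frog weights
se_small = standard_error(weights)
se_large = standard_error(weights * 4)    # same values repeated: n is 4x larger
print(round(se_small, 3), round(se_large, 3))
```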

Roughly the typical distance of each point from the mean (the square root of the variance) ↑SD = ↑spread Cannot be used to compare variation of 2 different variables - Coefficient of variation is used instead

Standard deviation

- any numerical value describing some characteristic of the sample (English alphabet)

Statistic

It is a statement about the value of a population parameter (e.g. mean, median, mode, variance, standard deviation, proportion, total) An assertion or proposition about the relationship between 2 or more variables Formulated as a result of years of observation and research

Statistical Hypothesis

characteristic of the sample / values observed from the collected data

Statistical Inference • Recall: - statistic -

- Collecting - Organizing - Summarizing - Analyzing - Interpreting data • Using imperfect information to infer facts, make predictions, make decisions

Statistics

1. "Clean" the data Checking for missing data, unusual values Running frequencies on every variable Checking for adequate variability 2. Organize and display data Using descriptive statistics to describe characteristics of the sample Creating tables, charts, to visualize data Central tendency, dispersion, shape, outliers 3. Use inferential statistics to test hypotheses Using inferential statistics to test hypotheses Depends on research design, sample size, distribution, measurement scale

Steps in Data Analysis

Frequency counts Relative frequency Joint distributions - Marginal distributions - Conditional distributions Pie charts Barplots - Stacked barplot - Juxtaposed barplot

Summarizing Categorical Data

• Statistical inference - making conclusions and decisions based on data Draw conclusions based on LIMITED / AVAILABLE data What's available? What you collected What you obtained from ROL When you infer, you try to validate your assumptions and use them to make assumptions regarding the true state of things. You can say that PROMPT is effective for kids aged 6 and below if your data showed it, but if you want to claim that it's effective, the basis of this conclusion is the collected data + scientific and statistical models When your sample size is bigger, less bias = more reliable research With your research, you're going to obtain values (these are called statistics). Example value: effect of PROMPT If you want to measure this, you'll have statistics from your data, but the TRUE EFFECT is reflected on what is called the parameter

The Process of Statistical Inference

To deduce or conclude something (e.g. a pattern) from evidence and reasoning, rather than from explicit statements When you make inferences, you are claiming or proposing that something is true

To infer

PRIMARY basic clinical experimental SECONDARY

Types of Designs in Medical Research

Descriptive & Exploratory Describing situations and events e.g. survey, qualitative studies Explanatory Do not attempt to determine causality Aims to know how variables are related to each other Predictive Which variables are predictive? Determining causality Usually quasi-experimental/experimental

Types of Research

estimate a parameter or a characteristic of the real world / true population from the statistic that one derived from the analysis.

Usually, the goal of inferential statistics is to:

Deviation score = element - mean Variance = mean of the squares of all deviation scores in the distribution

Variance (and deviation scores)
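The two definitions on this card translate directly into code. This sketch follows the card's wording (mean of the squared deviation scores, i.e. the population form dividing by n); the sample variance commonly reported by software divides by n - 1 instead. The function names and data are hypothetical.

```python
from statistics import mean

def deviation_scores(values):
    """Each element minus the mean."""
    m = mean(values)
    return [x - m for x in values]

def variance(values):
    """Mean of the squared deviation scores (population form, divides by n)."""
    return mean([d ** 2 for d in deviation_scores(values)])

data = [2, 4, 4, 4, 5, 5, 7, 9]    # hypothetical scores
var = variance(data)
sd = var ** 0.5                    # standard deviation = square root of variance
print(deviation_scores(data), var, sd)
```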

statistical procedure used to measure and describe the relationship between two quantitative variables (denoted x and y) correlations are both descriptive and inferential useful in making predictions always range from -1 to +1 +1.0 = perfect positive correlation -1.0 = perfect negative correlation shown as scatterplots regression line = the best-fitting line that can be drawn through all the data points (line of best fit)

What is a correlation?
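A small hand-written Pearson r, to show where the -1 to +1 range and the coefficient of determination (r²) come from. The function name and both variables are hypothetical.

```python
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation: co-variation of x and y over their separate variation."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

hours = [1, 2, 3, 4, 5]        # hypothetical x
scores = [2, 4, 5, 4, 5]       # hypothetical y
r = pearson_r(hours, scores)
print(f"r = {r:.2f}, r squared = {r ** 2:.2f}")
```

Here r² is the proportion of variance the two variables share, so it is always smaller in magnitude than r unless the correlation is perfect or zero.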

This means that your claim that PROMPT is more effective is NOT SUPPORTED, because you said that the effect is 35%, but it could be as low as 28% or as high as 42%. It is still no different, because it still falls within the CI of the earlier study. To demonstrate a significant effect, the confidence intervals MUST NOT OVERLAP

When you compare 2 experiments, for example: Experiment A (another study) said that the articulatory capacity of a child given PROMPT will increase by 30%, with a confidence interval of 25-35%. Now you want to prove that the effect of PROMPT is higher, so you get a larger sample and you get 35% as a result. Your CI is 28-42%, a large margin of error. *Sir graphed the 2 studies*: your interval intersected with that of the other study (Experiment A).
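The comparison in this card can be sketched as a simple interval check. The two intervals are the ones given in the card; the function name is hypothetical, and overlap is treated only as the rough screening rule used in the lecture, not as a formal significance test.

```python
def intervals_overlap(a, b):
    """Return True when two (low, high) confidence intervals share any value."""
    (a_low, a_high), (b_low, b_high) = a, b
    return a_low <= b_high and b_low <= a_high


experiment_a = (25.0, 35.0)  # earlier study, point estimate 30%
your_study = (28.0, 42.0)    # your study, point estimate 35%

print(intervals_overlap(experiment_a, your_study))
```

Because the intervals share the range 28-35%, the check returns True and the claim of a larger effect is not supported under this rule.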

When you come up with a result, especially a numerical one, always report the point estimate and the corresponding interval estimate / confidence interval. Point estimate: the single best-guess value. Interval estimate: the range of other plausible values.

When you read studies, especially quantitative studies, what you need to look at is the point estimate and the interval. Ex. (PLS-4): the language score of a 6 y/o child is 120. 120 is the point estimate, and it comes with a confidence interval (ex. 118-122). You can choose what confidence level you want: 90%? 95%? You are then 90% or 95% sure that the true value lies within this interval. At only 90%, your confidence is lower but the range is narrower. In the same way, at 99% your confidence is higher but the range is wider.
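To see how the confidence level changes the range, here is a minimal sketch of a normal-based interval around the point estimate of 120. The standard error of 1.0 is an assumption made for illustration, as is the use of a normal (z) interval; the card does not say how its 118-122 interval was computed.

```python
from statistics import NormalDist


def confidence_interval(point_estimate, standard_error, level):
    """Normal-based interval: point estimate +/- z * standard error."""
    # For a 95% level this looks up the 97.5th percentile (about 1.96).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z * standard_error
    return point_estimate - margin, point_estimate + margin


# Point estimate from the card; standard error of 1.0 is hypothetical.
for level in (0.90, 0.95, 0.99):
    low, high = confidence_interval(120.0, 1.0, level)
    print(f"{level:.0%}: {low:.2f} to {high:.2f}")
```

With everything else fixed, a higher confidence level gives a wider interval.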

Look at all the data first before looking at the summary! This helps overcome the natural tendency to rely on summary information, such as an average / mean.

Why start with histograms?

Specifies the location of any element in a normal distribution in terms of SD units

Z-scores
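A minimal sketch of the conversion into SD units. The exam mean, SD, and score are made-up numbers, and the function name is hypothetical.

```python
def z_score(value, mean, sd):
    """Distance of a value from the mean, expressed in SD units."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - mean) / sd


# Hypothetical exam: class mean 75, SD 8, your score 95.
print(z_score(95, 75, 8))
```

A positive z-score lies above the mean, a negative one below it, and 0 is exactly at the mean.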

• Specifies the location of any element in a normal distribution in terms of SD units. A z-score of 1.64 means the value you got is 1.64 standard deviations away from the mean; that is how far away it is. 2.58 → outlier. Ex. in exams: your score is very high relative to your classmates, and the probability that someone in the class gets that grade is only 1/25 (because only one person got it). When you do an experiment, what you want to do is disprove your current assumption about the world. Even when you say that something is effective, the standing assumption is that it has no effect. It's like "innocent until proven guilty". The same goes for research: as long as you don't have evidence to prove your assumptions, the status quo still prevails (which is "no effect" or "we do not know the effect"). The probability of getting a significant effect usually lies /here/ or /here/ (the tails of the curve), because the chances that the status quo is true are higher. If your study gives results that have a very low probability of occurring (you really didn't expect them to happen) but you still obtained that data, this provides support for saying that your conclusions are correct.

Z-scores / z-deviates
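To make "very low probability" concrete, the sketch below computes how often a value at least as far from the mean as a given z-score would occur under a standard normal distribution. The cutoffs 1.64 and 2.58 come from the card; reading them as two-tailed cutoffs is an assumption, and the function name is hypothetical.

```python
from statistics import NormalDist


def two_tailed_probability(z):
    """Probability of landing at least |z| SDs from the mean, in either tail."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


for z in (1.64, 2.58):
    print(z, round(two_tailed_probability(z), 4))
```

Under this reading, 1.64 corresponds to roughly a 10% chance and 2.58 to roughly a 1% chance, which is why a result that far out is treated as strong evidence against the status quo.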

1. DATA COLLECTION 2. DATA PROCESSING / QUALITY ASSURANCE 3. DATA ANALYSIS (3 aims of research)

intro (3 steps)

Probability is the inverse of statistics. In probability, you already know what is inside the pail, and you guess what will end up in your hand (what the sample will be). In statistics, on the other hand, you only know the sample, and from that sample you infer what the whole might look like. PAIL EXAMPLE. PROBABILITY: given the information about the pail containing all the marbles, what is in your hand? You know that there are 100 red marbles and 100 blue marbles, so the probability of getting a blue marble is ½. STATISTICS: you have drawn two marbles. What might the distribution of colors inside the pail be? This is what you try to infer. To make it applicable to research: whenever you do an experiment, you always end up with a conclusion that a particular event is going to happen, or that a certain truth exists, and that conclusion comes with a PROBABILITY. What you're always trying to say when you do research is, for example, "We are concluding that attitudes of women regarding disability..." The probability of your claims being true depends on the strength of the statistics that you obtain. Your conclusions are based on the limited evidence that you get from the data you have. Usually, the probability of the occurrence of an event can be estimated, and this will be the basis for hypothesis testing.

probability vs. statistics
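The pail example can be sketched in both directions. The pail contents (100 red, 100 blue) come from the card; the sample size of 20 and the fixed random seed are assumptions made so that the estimate is less noisy than a two-marble draw and the run is repeatable.

```python
import random

# The full pail: in probability, this is known in advance.
pail = ["red"] * 100 + ["blue"] * 100

# PROBABILITY: from the known pail, reason about a single draw.
p_blue = pail.count("blue") / len(pail)

# STATISTICS: pretend the pail is hidden and only a sample is seen.
rng = random.Random(0)          # fixed seed, an assumption for repeatability
sample = rng.sample(pail, 20)   # draw 20 marbles without replacement
estimated_share_blue = sample.count("blue") / len(sample)

print("known probability of blue:", p_blue)
print("share of blue estimated from the sample:", estimated_share_blue)
```

The estimate will usually be near, but not exactly, the true share, which is why conclusions drawn from a sample always carry a probability.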

- If an experiment is repeated over and over, the distribution of the average result will converge to a normal (bell-shaped) distribution. • The random sampling distribution of means will tend to be normal, irrespective of the shape of the population distribution. • The random sampling distribution of means becomes closer to normal as the size of the samples increases.

• Central Limit Theorem (CLT)
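A small simulation sketch of the theorem. The population here is an exponential distribution with mean 1 and SD 1, chosen only because it is clearly skewed; the sample size of 30, the 2000 repetitions, and the seed are all assumptions.

```python
import random
import statistics


def sample_means(n, repetitions, rng):
    """Repeat the 'experiment': draw n values and record their mean."""
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(repetitions)
    ]


rng = random.Random(42)  # fixed seed so the run is repeatable
means = sample_means(n=30, repetitions=2000, rng=rng)

# The means should cluster around the population mean (1.0) with a
# spread close to population SD / sqrt(n), about 0.18 for n = 30.
print("mean of sample means:", statistics.fmean(means))
print("SD of sample means:", statistics.stdev(means))
```

Plotting `means` as a histogram should show a roughly bell-shaped curve even though the individual values come from a skewed population.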

