Stats Exam 2
How do you calculate the F ratio?
(variability due to treatment + error) / error
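In standard one-way ANOVA notation (a textbook formula, not specific to these notes), that ratio is the between-groups mean square over the within-groups mean square:

$$ F = \frac{MS_{\text{between}}}{MS_{\text{within}}} = \frac{\text{treatment variability} + \text{error}}{\text{error}} $$

If the treatment has no effect, both mean squares estimate only error, so F is near 1; treatment effects push F above 1.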
Example of a thematic analysis
Asking the question of what makes a person effective. Use a case study approach: do open-ended interviews with successful people, then find the patterns and themes that emerge to explain effectiveness. Themes: 1. Being proactive 2. Beginning with the end in mind 3. Putting first things first 4. Thinking win/win 5. Seeking first to understand 6. Engaging in creative cooperation 7. Self-renewal
(Morrison et al.) Model Fit
- A broad range of fit indices, encompassing four broad categories (overall model fit, incremental fit, absolute fit, and predictive fit), should be used
- Overall model fit: uses chi-squared tests to assess whether the model fits the observed data
- Incremental fit: compares the model being tested to a baseline model, typically one in which all variables are uncorrelated. Sample indices include the normed fit index (NFI), the comparative fit index (CFI), and the Tucker-Lewis index (TLI)
- Absolute fit indices, such as the root mean square error of approximation (RMSEA), the goodness-of-fit index (GFI), and the standardized root mean square residual (SRMR), determine how well a model specified a priori reproduces the sample data. The greater the absolute magnitude of a given correlation residual, the greater the misfit between the model and the actual data for the two variables in question
- Predictive fit indices examine "how well the structural equation model would fit other samples from the same population"
(Morrison et al.) Best Practice Recommendations for Using Structural Equation Modelling in Psychological Research-- Abstract notes
- Although structural equation modelling (SEM) is a popular analytic technique in the social sciences, it remains subject to misuse
- Purpose: to assist psychologists interested in using SEM by (1) providing a brief overview of this method and (2) describing best practice recommendations for testing models and reporting findings
(Meltzoff & Cooper) ABSTRACT TO ARTICLE
- An abbreviated summary or abstract of the study is presented at the beginning of the journal article or research report
- Abstracts often have a word limit, so the information included is highly condensed
What does the model chi-squared statistic tell us?
- The difference between the current model (e.g., Step 1 with predictors added) and the previous model (e.g., Step 0 with no predictors added)
- Comparing -2LL between steps
- A significant change in -2LL indicates the model is improving
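A minimal sketch of this comparison in Python with statsmodels (synthetic data and variable names are my own; statsmodels also reports the same quantity directly as `llr`):

```python
# Compare -2LL at Step 0 (intercept only) vs. Step 1 (predictor added)
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=200)
p_true = 1 / (1 + np.exp(-(0.5 + 1.2 * x)))
y = (rng.random(200) < p_true).astype(int)

model = sm.Logit(y, sm.add_constant(x)).fit(disp=0)

neg2ll_step0 = -2 * model.llnull   # -2LL with no predictors
neg2ll_step1 = -2 * model.llf      # -2LL with the predictor added
chi_sq = neg2ll_step0 - neg2ll_step1    # model chi-squared statistic
p_value = stats.chi2.sf(chi_sq, df=1)   # df = number of predictors added

print(chi_sq, p_value)  # a significant drop in -2LL means the model improved
```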
(Boedeker & Kearns) Conclusion
- For evaluating overall classification, repeated k-fold cross-validation is recommended, when possible, and Huberty's I index is the recommended effect size for determining whether the classification model performs better than chance. - Posterior probabilities indicate the probability of group membership, and typicality probabilities are useful for identifying outliers or pointing to a potentially unnamed class.
In a study on memory, participants were divided into older (15-18 years old) and younger (11-14 years old) students. Some students were instructed to remember a list of 20 words by relating the words to themselves. A second group was told to remember them by using the first letters of the words to make sentences. The third group was not given any special instructions. What are the IVs and DVs? What is the appropriate statistical test?
- The IVs are the method used to remember the words and the age group. The DV is memory performance.
- The appropriate statistical test is a factorial ANOVA.
(Syed & Nelson) What Constitutes Agreement?
- In general, for nominal coding schemes kappa and/or Delta should be used to establish and report interrater reliability.
- Two important take-home lessons. First, there are numerous factors that can affect the value of a reliability index, including the number of categories (k), the number of items to be rated (n), the difficulty of the coding system, the relative frequency of codes within a category, and the naivety of the raters, among others; this makes it hard to know what good agreement is. Second, establishing reliability is about more than just the final coefficient that makes it into published articles; it is a lengthy process that involves developing a coding system, training observers, and refining the understanding of that which the researchers are coding.
(Syed & Nelson) What Is Reliability to a Qualitative Researcher?
- In the qualitative literature, the closest thing to reliability is the concept of "rigor"
- Rigor is a product of the entire research process and "derives from the researcher's presence, the nature of the interaction between researcher and participants, the triangulation of data, the interpretation of perceptions and rich, thick descriptions"
- For many qualitative researchers, a collaborative research team forms the backbone of the process of developing a coding system and applying it to qualitative data in a manner that is deemed "rigorous"
- It is not about truths but about interpretations: being close to the data and understanding it well enough to interpret and rate it well (rigor matters to them because it is a long process, not just a coefficient)
(Boedeker & Kearns) Overview of LDA for Prediction
- LDA can be used to allocate new observations to previously defined categories through a classification rule
- First, data for which group membership is known (training data) are used to derive linear classification functions (LCFs)
- A case's observations are submitted to each group's LCF to calculate a classification score, and the case is assigned to the group for which it has the highest score
- Posterior probabilities of membership in the groups and typicality probabilities (values representing how "typical" a case's data are for each group) are used to evaluate the assigned classification
- The overall accuracy of prediction can be evaluated for the training data using hit rates and effect size (Huberty's I index)
- Hit rate is the percentage of cases correctly predicted using the LCFs; higher hit rates indicate more accurate prediction of group membership
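A minimal scikit-learn sketch of this workflow (my own toy data, not the article's; note that scikit-learn reports posterior probabilities but not typicality probabilities):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
X_train = np.vstack([rng.normal(0, 1, (50, 2)),    # group 0 training cases
                     rng.normal(2, 1, (50, 2))])   # group 1 training cases
y_train = np.repeat([0, 1], 50)                    # known group membership

lda = LinearDiscriminantAnalysis().fit(X_train, y_train)

X_new = np.array([[0.1, 0.2], [1.9, 2.1]])
print(lda.predict(X_new))        # each case goes to the group with the highest score
print(lda.predict_proba(X_new))  # posterior probabilities of group membership

hit_rate = (lda.predict(X_train) == y_train).mean()  # internal (training) hit rate
print(hit_rate)
```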
(Wiest et al.) Intro to logistic regression
- Logistic regression extends the concepts of linear regression to binary (dichotomous) outcomes, like being dead or alive
- Binary outcomes are summarized using proportions or percentages
- This paper uses a data set from the nasal continuous positive airway pressure (CPAP) or Intubation at Birth (COIN) trial, a study of preterm infants with respiratory distress who were randomly assigned to receive either nasal CPAP or intubation and ventilation shortly after birth
- The outcome/DV was dichotomous: poor (dying or remaining on oxygen) or good (living and not on oxygen); intubation was the reference treatment
- Other factors that can affect outcomes, such as birth weight, were also measured
(Schmidt) Random and Fixed Models in Meta-Analysis
- Most readers are aware that meta-analysis is widely used today and that conclusions about cumulative knowledge presented in textbooks and elsewhere are increasingly based on meta-analysis results
- So isn't the problem solved? Actually, there is still a serious problem, because an inappropriate statistical meta-analysis model is very frequently used in the literature
- Fixed-effects (FE) meta-analyses are problematic because they assume all variation across studies is due solely to sampling error and not other artifacts
- Random-effects (RE) meta-analyses are better because they treat this assumption as a hypothesis and test whether all variance is accounted for by sampling error, and they consider other artifacts of error as well
- 90% of meta-analyses do not correct for measurement error, which is always present, so this is a huge problem; they also do not correct for other artifacts. This results in mean values that are too low and confidence intervals that are too narrow; effect sizes are also biased
- The industrial-organizational psychology literature often uses the RE model instead of the FE model
- The problem of accurate analysis and integration of research literatures in psychology has not yet been fully solved
(Morrison et al.) What Is Structural Equation Modelling?
- SEM is a multivariate statistical technique that can be conceptualized as an extension of regression and, more aptly, a hybrid of factor analysis and path analysis
- The beauty of SEM is that it allows a researcher to analyse the interrelationships among variables (like factor analysis) and test hypothesized relationships among constructs (similar to path analysis)
- SEM allows you to test a number of interrelationships at the same time
- Because SEM often assumes linear relationships, it is similar to common statistical techniques such as analysis of variance (ANOVA), multivariate analysis of variance (MANOVA), and multiple regression
- SEM departs from other statistical methods because it enables researchers to include multiple measures and reduce their measurement error (error inherent in any data utilized in the social sciences or related disciplines)
- SEM takes a confirmatory (hypothesis testing) approach: researchers specify a priori the interrelationships that are theorized to exist (i.e., through specification of a model), with the next step being to test how well the theorized model fits the obtained (sample) data
(Meltzoff and cooper) STAGE 2: SEARCHING THE LITERATURE
- Searching the literature and screening studies for relevance is how research synthesists define their sampling frame
- First, check which reference databases the synthesists consulted (e.g., PsycINFO; MEDLINE for medicine and health)
- Synthesists should also carefully choose the search terms they use in the databases to find content related to the topic
- As a critical reader, you should also assess how the synthesists supplemented their reference database search results by examining references mentioned in the relevant articles they found (called a backward search) and by searching for articles that cited important articles on the topic (called a citation search or forward search)
(Boedeker & Kearns) Assumptions
- Two assumptions of LDA for prediction are multivariate normality of the distribution of variables within classifications and equality of variance-covariance matrices across classifications. - When multivariate normality is violated, the probabilities are not exact and must be interpreted with caution. Multivariate normality is particularly important for the utility of computed posterior probabilities and in calculating the intercept term of each LCF -The assumption of equal variance-covariance matrices is what makes LDA linear instead of quadratic
What are the 5 Ws of epidemiology described in the lecture?
- WHO: who is the disease affecting?
- WHAT: what is the disease of interest?
- WHEN: when will the next spike of cases be, or when does the virus affect people?
- WHERE: which geographic location is affected?
- WHY: why is a certain area affected and not others?
(Fink et al.) Incidence Rate and Incidence Rate Ratio
- When a cohort study has losses to follow-up, the incidence rate, calculated as the number of new cases per observed person-time, is used to estimate the experience of health and disease within the sample
- Observed person-time: a measure of the time at risk that each participant contributed to the study follow-up before either developing the disease or dropping out of the study
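As a formula (standard epidemiologic notation; the example numbers are hypothetical):

$$ \text{incidence rate} = \frac{\text{number of new cases}}{\text{total person-time at risk}} $$

For example, 10 new cases observed over 500 person-years gives an incidence rate of 10/500 = 0.02 cases per person-year.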
(Morrison et al.) Two-Stage Modelling
- When conducting SEM, it is recommended that the measurement models be assessed first, using confirmatory factor analysis (CFA), followed by simultaneous assessment of the measurement and structural models
(Fink et al.) Epidemiologic Experiments
- Include field trials and randomized controlled trials
- Field trials: treatment is randomly assigned to different groups of participants to investigate the initial occurrence of an illness of interest (e.g., putting patients with bipolar I disorder in different treatment groups and following up after 24 months to compare relapse rates)
- Randomized controlled trials: a special kind of cohort study where participants are assigned to an exposure at random (to balance the distributions of confounders)
- Randomized controlled trials differ from cohort studies (which are observational in nature) in that the researcher has control over who is assigned to which exposure group; this is more beneficial because of the control over assignment to conditions through randomization
- Limitations of randomization: while it can allow for cause-effect inferences, some research questions involve risk factors that are not suited for randomization (for example, it is unethical to expose people to traumatic events, harmful toxins, disease, or social conditions)
- Randomized controlled trials have limitations but are powerful for detecting causes
why wouldn't we want to use t-tests for the ANOVA study example above?
- Inefficient, because you would have to run multiple t-tests when you could just run one ANOVA
- This also increases your chances of a Type I error (rejecting the null when it is true), because the error accumulates across the many t-tests
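The accumulation can be made concrete with the standard familywise error rate formula (a textbook result, assuming k independent tests at level alpha):

$$ \alpha_{\text{familywise}} = 1 - (1 - \alpha)^{k} $$

For example, comparing three groups pairwise takes k = 3 t-tests, so at alpha = .05 the familywise rate is 1 - .95^3, about .14 rather than .05.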
What's included in the results section?
- Information on participant flow, like any missing data or sample sizes
- Data analyses: descriptive statistics (like means, SDs, and sample sizes) and inferential statistics (null hypothesis significance testing, effect sizes, and confidence intervals)
What are the purposes of meta-analysis?
- Integration, which involves combining and comparing studies
- Theory development and furthering theory
- Building an understanding of what the central issues are in a particular discipline
(Meltzoff and cooper) STAGE 6: INTERPRETING THE EVIDENCE
- Interpret all the evidence: what are the conclusions of these studies taken together?
(Wiest et al.) Logistic regression for binary response variables
- Regression models enable us to examine the independent association of multiple explanatory variables with the expected value of an outcome in a single model
- We run into problems if we use simple linear regression to model probability: probabilities must lie between 0 and 1, while linear regression functions can take on any real number, so linear regression would give us predicted probabilities greater than 1 or less than 0, which are impossible values. Probabilities also generally have nonlinear relationships with covariates
- Solution to both problems: transform the probabilities to a scale that can take on any real number. This can be done using the log odds transformation
- Logistic regression model: fit a regression model to the transformed probabilities
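Written out (standard notation, not copied from the article), the log odds transformation and the resulting model are:

$$ \ln\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k $$

The left side ranges over all real numbers even though p stays between 0 and 1; inverting gives $p = 1 / (1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)})$, the S-shaped (nonlinear) relationship with covariates.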
What do conference submissions require?
- They require the same elements and sections as a research article
- But they have strict page or word limits, so you need to be concise
(Meltzoff & Cooper) METHOD TO RESULTS TO DISCUSSION
- Researchers need to give attention to all measures used, not just a measure that yields results in the predicted direction
- Data mining: fishing for significance, or p-hacking; it capitalizes on chance findings
- Selective reporting or data censoring: measures are mentioned in the Method section but some of them are never mentioned in the Results and Discussion; also, not reporting non-significant results
- If non-significant results are omitted, it won't be possible for you to tell how seriously to take the results, and other researchers might not be able to replicate the findings
- You can't accept the null; you can only fail to reject it
(Wiest et al.) Risk and relative risk
- Risk is defined as the proportion of individuals exhibiting the outcome of interest in a certain group
- Risk is calculated by dividing the number of cases (typically participants with a bad outcome) by the total number of participants in each group
- Relative risk: the ratio of the risks in the two groups (divide the risk of the main group by the risk of the reference group)
- A relative risk of 1 would indicate the risk is the same for the intubation and CPAP groups; values further from 1 represent a larger difference between the groups
- A disadvantage of the relative risk is that it cannot be estimated from case-control designs
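A tiny Python sketch of these calculations (the counts are hypothetical, not the COIN trial's):

```python
def risk(cases: int, total: int) -> float:
    """Proportion of participants in a group with the outcome of interest."""
    return cases / total

# hypothetical counts: 60 poor outcomes of 300 in the main group,
# 75 poor outcomes of 300 in the reference group
risk_main = risk(60, 300)        # 0.20
risk_reference = risk(75, 300)   # 0.25

relative_risk = risk_main / risk_reference
print(relative_risk)  # 0.8 -> the main group's risk is 0.8 times the reference group's
```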
What are the general study features that need to be coded?
- sample characteristics - source characteristics - methodological characteristics
(Meltzoff & Cooper) USE OF DECEPTION IN RESEARCH
- Some deception might be necessary for some research designs
- When there is deception, researchers need to tell readers when debriefing was done and whether participants were permitted to leave with false beliefs about themselves
(Wiest et al.) Example and interpretation
- Standard errors are used to create 95% confidence ('uncertainty') intervals for the true values of each coefficient
- Treatment was not significant
what are the three main types of multiple regression?
- Standard multiple regression
- Sequential (hierarchical) regression
- Statistical (stepwise) regression
(Meltzoff & Cooper) Inferences, Conclusions, and the Research Report Abstract summary
- Statistical significance does not mean practical or clinical significance
- You should compare effect sizes in a study to other related effects, just as you would compare other treatments of the same disorder or other relevant outcome variables
- The descriptions of methods and results need to be consistent across the different sections of a report
- A report needs to provide a complete description of the research and meet reporting standards
(Meltzoff & Cooper) Statistical Significance, Generality, and Practical or Clinical Significance
- Statistical significance is not the same as generalizability
- Statistical significance just tells you how likely the observed result would be if the null hypothesis were true, or how likely the null hypothesis is to be rejected in another sample of people like those sampled in this study; it says nothing about how robust a finding is to other samples, settings, or measures
- Statistical significance also does not mean the finding is important: more statistically significant is not synonymous with more important, and highly significant is not the same as highly important
- Sometimes a one-point difference on an IQ test is statistically significant in large samples, but this means practically nothing; the importance of a finding is based on effect size
- Sometimes in studies with smaller samples the results can be non-significant but look like they should be significant; things might be different if the sample were bigger, so check whether the authors did a power analysis
- "Trend" should be used to describe the changing course of data over time, not data that is near statistical significance (e.g., p = .09)
- You should refer to p values that fall between .05 and .10 as equivocal and indeterminate
What are the criticisms of meta-analysis?
- Studies that use different methods and styles are being combined and compared (the apples and oranges problem); deal with this by carefully coding for relevant study characteristics and testing moderator effects. Ideas need to be narrowed a little: define things and reduce the focus
- Another issue is the garbage in, garbage out problem: combining poorly designed studies with well-designed studies will give you poor results, so exclude bad studies, code the quality of the designs, and consider what the differences are
- The file drawer problem: published research is biased because significant results get published and non-significant results often do not, so you need to dig through the literature for studies with non-significant findings, otherwise you will have a biased meta-analysis with only significant findings (search for studies, books, dissertations, and conference papers, and ask for data from unpublished work)
- Multiple results from the same study
briefly describe the difference between an inductive approach and a deductive approach as discussed in the qualitative lecture.
- The deductive approach is a top-down approach, also known as the scientific method: you start with a theory and hypothesis, then collect data, analyze the data, and confirm or reject your hypothesis based on your data
- The inductive approach is a bottom-up approach that starts from the data and then analyzes the data to draw general conclusions
(Meltzoff & Cooper) METHOD TO DISCUSSION
- Researchers might sometimes report that during the research the sampling plan was altered or the procedures were modified for the study to be completed. Changes may compromise the internal validity of the study
- Differential attrition: a study starting out with random assignment to conditions but ending up with many subjects choosing their own condition; this needs to be reported as a limitation if it happens
- Don't use causal language, like "influenced" or "produced," when reporting a static-group comparison (there can't even be a cause here)
(Meltzoff and cooper) Research Syntheses and Meta-Analysis
- The synthesis of past research needs to meet the same standards of scientific rigor as new data collection
- Stages of research synthesis: problem formulation, literature search, coding of studies by judges, evaluation of study methods, meta-analysis, and interpreting and presenting the results
- Meta-analysts use individual study characteristics and effect sizes as data; effect sizes are averaged to reach overall conclusions and are also used to search for influences on individual studies' results
- Meta-analysts face the same problems faced by collectors of new data, like coder reliability and missing data
- Research syntheses are also known as research reviews or systematic reviews of already existing research, answering the question of what the literature already says about the thing that interests us
- Meta-analysis: a set of procedures for summarizing the quantitative results from multiple studies
(Fink et al.) EPIDEMIOLOGIC ANALYSES
- the ultimate aim of epidemiological research is to quantify a causal relationship between exposure and disease.
(Boedeker & Kearns) Linear Discriminant Analysis for Prediction of Group Membership: A User-Friendly Primer -- Abstract notes
- There are different models that exist to predict group membership, but the choice depends on the attributes of the data
- In some cases linear discriminant analysis (LDA) has been shown to perform better than other predictive methods like logistic regression, multinomial logistic regression, random forests, support vector machines, and the k-nearest neighbors algorithm
- Purpose of article: give a general overview of LDA and an example of its implementation and interpretation
- Discusses the decisions that need to be made when conducting LDA (e.g., prior specification, choice of cross-validation procedures) and methods of evaluating case classification (posterior probability, typicality probability) and overall classification (hit rate, Huberty's I index)
- LDA for prediction is described from a modern Bayesian perspective, in contrast to how it was originally described
What are some of the issues with the current peer review system?
- There are unethical practices during the review process (like agreeing to do a review without enough expertise; being too familiar with the research area, such as knowing the researcher because their research is similar to yours; conflicts of interest, where you might be less objective if someone's study is similar to yours and letting it through might affect your own work, which undermines objectivity)
- Publication bias
- HARKing
- p-hacking
- Lots of pressure is placed on researchers to produce significant findings
What are the advantages of multiple regression?
- There is flexibility in the level of measurement for IVs
- Multiple IVs are not an issue (as long as you have a sufficient sample size)
- Can be used for non-experimental research designs where IVs are not randomly assigned or manipulated (in contrast, ANOVA is mainly used for experimental research)
(Morrison et al.) Sample Size Requirements
- There is no single sample size requirement, because it depends on model complexity, the amount of missing data, and the size of the factor loadings
- If a researcher wanted to conduct a confirmatory factor analysis with a single latent variable and six indicators (having average loadings of .65), a sample size of 60 was adequate
- A more complex mediation model with three latent factors and three manifest indicators each requires a sample of 180
(Meltzoff & Cooper) ADVERSE REACTIONS
- There needs to be justification for the study before research begins
- If participants have bad reactions in the study, we need to know whether the researchers did something and whether anything was done about these reactions (e.g., did some veterans with posttraumatic stress disorder experience extreme anxiety caused simply by answering questions that had them focus on stressors in their lives?)
- Details about the handling of any problems with participants are reported by ethical researchers, along with ways that the procedures may have been modified to prevent further incidents
- If no problems arose, a negative report is expected stating, in effect, that there were no signs of any adverse reactions as determined either by observations during the experiment or by inquiry in the postexperimental debriefing session (give enough information to show you complied with ethical standards)
- Relatedly, the researchers must respect the individual's freedom to decline to participate in or to withdraw from the research at any time (this should appear in the Results section when researchers discuss attrition rates)
Why are reporting standards important?
- They ensure that researchers communicate clearly how the research was conducted and report their research findings
- They help readers recognize the strengths and limitations of the evidence obtained from the research
- They allow for greater reliance on meta-analysis (if all studies give accurate information about the research methods used and the data collected, then we can rely on meta-analyses that pool the results of these studies; this allows more studies to be included in the meta-analysis)
(Fink et al.) Defining and Measuring Exposures
- This begins by articulating a research question before the start of follow-up
- An exposure is a variable, hypothesized by the researcher and dictated by the research question, that may cause a certain health outcome
- Exposures vary across several dimensions like type, chronicity, severity, and timing
- Type of exposure can range from individual (nutrition, smoking) to macrosocial (political economy, culture)
- Other dimensions: persistence of the exposure, from acute to chronic; severity, from mild to severe; and timing, which when concerned with the life span runs from gestation to death
- The exposed group in a cohort study represents people who have been exposed to a risk factor (e.g., experienced childhood trauma); the nonexposed group has not been exposed (did not experience childhood trauma)
- A good measure needs to be both reliable and valid, acting in a consistent and predictable manner
- Reliability: test-retest, the intraclass correlation coefficient, and Cronbach's alpha for internal consistency
- Validity: a measure must be reliable, and validity can be assessed by comparing the measure to a gold-standard measure with known validity
what are some examples of epidemiological questions for epi research?
- When can we expect the next surge of COVID?
- Why are there high rates of CHD in certain regions of the US?
- How can uterine cancer be prevented?
- How often and at what age should healthy people have a colonoscopy? What about those with a family history or with risk factors?
(Boedeker & Kearns) When to use LDA for prediction
- When the number of predictors was less than half the sample size and the predictors had relatively high correlations (> .6, though not so high as to cause multicollinearity issues), LDA was the method of choice
- LDA was shown to perform well when class membership was highly unbalanced
- LDA can be used for classification into three or more groups (unlike logistic regression) and does not require specification of a reference group (unlike multinomial logistic regression)
- LDA also has the advantage that it can be used to estimate model parameters under conditions of separability, that is, when a single predictor is able to perfectly separate cases into classes
- When data are not multivariate normal and the variance-covariance matrices are not approximately equal, LDA will not be optimal for classification
(Meltzoff and cooper) STAGE 1: FORMULATING THE PROBLEM
- You may ask what the variables or interventions are that the researchers wanted to study, and what operational definitions the researchers claimed are measurable expressions of their variables or constructs of interest
- You will basically make the judgment about whether the research synthesis properly fits the concepts and operations together by examining the research evidence the researchers claim is relevant to the problem or hypothesis of interest
- You must consider whether the conceptual and operational definitions are carefully defined and fit together well
(Morrison et al.) What are Essential Features of Structural Equation Modelling?
- SEM includes, describes, and tests the interrelationships between two types of variables: manifest and latent
- Manifest variables: those that can be directly observed (measured)
- Latent variables: those that cannot be observed (measured) directly due to their abstract nature
- Latent variables can be broken down into exogenous and endogenous variables
- Exogenous: similar to independent variables because they cause fluctuation in the values of other latent variables in the model; they cannot be explained by the model (they can be seen as factors external to the model that create changes in the values of other latent variables, like gender or socioeconomic status)
- Endogenous: influenced by exogenous latent variables, directly or indirectly; any change in the value of endogenous latent variables is thought to be explained by the model (because they are found in the model)
- An SEM model consists of a measurement model: a set of observed variables that represent a small number of latent (unobserved) variables. The measurement model describes the relationship between observed and unobserved variables (it connects the instruments that are used to the constructs they are hypothesized to measure)
- SEM typically also has a structural model: a schematic depicting the interrelationships among latent variables
- A structural model that specifies the direction of cause from one direction only is called a recursive model; one that allows for reciprocal or feedback effects is termed a nonrecursive model
What are the assumptions of DFA?
- Absence of outliers (outliers that are too extreme can cause issues; you can transform or winsorize to take care of them)
- Homogeneity of variance (variability between the groups needs to be similar; this is a problem if we have small sample sizes)
- Linearity (assuming a linear relationship between predictors and outcome)
- No multicollinearity (predictors should not be too highly related; if they are, we may want to get rid of one or create an index)
What are the two main purposes of using DFA?
- Classification (coming up with a decision rule that allows us to correctly classify the maximum number of people into groups)
- Interpretation of functions (how are the discriminant functions distinguishing between the groups?)
(Fink et al.) The Fourfold Table: The Risk Ratio, Risk Difference, and Odds Ratio
- The fourfold table was introduced for calculating and comparing the cumulative incidence (average risk) of disease among the exposed and unexposed groups
- Epidemiological studies often investigate the association between a binary exposure (exposed vs. unexposed) and a binary outcome (diseased vs. nondiseased)
- The fourfold table's four cells (A, B, C, D) display the counts of individuals within the study sample with different combinations of exposure and disease: A is the count of participants who were both exposed and diseased, B exposed and nondiseased, C unexposed and diseased, and D unexposed and nondiseased
- The organization of the data in the fourfold table facilitates the analysis of three measures of association: the risk ratio, the risk difference, and the odds ratio. Data from the table give the cumulative incidence (conditional risk) of disease among the exposed, A/(A+B), and among the unexposed, C/(C+D). The risk ratio is therefore [A/(A+B)] / [C/(C+D)], and the risk difference is [A/(A+B)] - [C/(C+D)]
- The fourfold table is also used to calculate the odds of disease among the exposed (A/B) and the unexposed (C/D), and the odds ratio is (A/B)/(C/D)
- The fourfold table is the simplest way of calculating measures of association, but more complex scenarios require statistical modeling
- It is common for the exposure or outcome variable to take a different form than a two-level binary variable
- It is also common to be interested in the exposure-disease relation conditional on one or more potentially confounding variables (i.e., factors associated with both the exposure and the outcome that can bias measures of association)
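A minimal Python sketch computing the three measures from the four cells (the counts are hypothetical, not from the article):

```python
A, B = 30, 70    # exposed:   diseased, nondiseased
C, D = 10, 90    # unexposed: diseased, nondiseased

risk_exposed = A / (A + B)      # cumulative incidence among the exposed = 0.30
risk_unexposed = C / (C + D)    # cumulative incidence among the unexposed = 0.10

risk_ratio = risk_exposed / risk_unexposed        # [A/(A+B)] / [C/(C+D)] = 3.0
risk_difference = risk_exposed - risk_unexposed   # [A/(A+B)] - [C/(C+D)] = 0.2
odds_ratio = (A / B) / (C / D)                    # (A/B) / (C/D) ~ 3.86

print(risk_ratio, risk_difference, odds_ratio)
```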
when would you use a covariate?
- Helps to increase statistical power in experimental research (can increase the chance of rejecting the null). The variance associated with the covariate is removed from the error variance (smaller error variance = more powerful test)
- Can be used in non-experimental research when we can't randomly assign people to groups; this controls for preexisting differences between treatment groups
A potential effect size needs to meet the following criteria:
- Quantify magnitude and direction
- Be comparable across studies
- Be consistently reported or computable
- Have a computable standard error
what are the practical issues of DFA?
- Sample size and unequal group sizes (SPSS automatically assumes that groups are equal, so you need to tell it if they are not)
- Missing data (if a case is missing the DV we are trying to predict it cannot be in the analysis, and cases missing any one of the predictors have to be removed; this adds up if your sample size is small)
- Assumptions: absence of outliers, homogeneity of variance, linearity, no multicollinearity
What are the different ways in which we can view the causes of disease in epidemiology?
- Tripartite model
- Web of causation
(Hayes) Conditional process models
- Conditional process models subsume models that test for moderated mediation, where the strength or direction of an indirect effect depends on the value of a moderator
What are rates?
A type of proportion used in epidemiology that involves or implies disease at some point in time
What is the web of causation model?
A way of looking at multiple factors and how they interact to lead to a particular outcome. It can identify new relationships and pinpoint issues that can be addressed. Example: heart attack; many factors lead to a heart attack, and these are themselves influenced by other interrelated factors that interact to increase risk (attitudes about smoking and exercise, hypertension, etc.)
(Fink et al.) Planning a cohort study
1. Begin by articulating a specific empirical research question and considering the limitations of cohort studies
- Questions best suited for cohort studies aim to determine a causal relationship between an exposure and a subsequent psychiatric outcome
- Causal questions aim to determine whether one or more factors cause or effect change in one or more outcomes (e.g., does early adversity like childhood maltreatment predict later development of mental disorders?)
- Cohort studies, however, are not well suited for some causal questions involving ubiquitous exposures, rare outcomes, and diseases with long latency periods
- Ubiquitous exposures (e.g., TV viewing in the US) limit the variability necessary to acquire individuals with different levels of exposure
- Rare outcomes (suicide, schizophrenia) make it hard to accumulate a large number of cases
- Long latency periods require lots of resources and time to study
2. The second step is defining the population of interest, or target population
- If you don't have a target population of interest, the data might not be appropriate for answering the research question (what type of people is our research question trying to focus on? that will be the target population)
- A target population can be defined based on gender, location, job, or exposure to a particular event of interest (like post-9/11 rescue workers)
3. Survey and administrative data can be used for epidemiological research
- Epidemiology uses surveys to collect information on the health status and behaviors of people (study samples can be drawn using probability or nonprobability sampling)
- Administrative data are collected by governments or other organizations for record keeping or billing purposes and contain elements collected over time on large proportions of populations. Common sources include hospital discharge records, arrest and conviction records, and records of births, marriages, and deaths
(Syed & Nelson) Recommendations for Researchers
1. The process of establishing reliability reported in most published articles is quite vague
2. Recognize that decisions about reliability are dependent on the particular type of data and the research questions being pursued
3. Developing a coding manual should be viewed as an iterative process, using some combination of top-down and bottom-up approaches
4. Do not uncritically select a reliability index, and eschew notions of "gold standards" in terms of the appropriate index (i.e., kappa is not always the most appropriate index)
5. Consider reporting multiple reliability indexes that provide clearer information on the nature of the agreement (e.g., kappa and PA, kappa and Delta)
6. Remember that reliability is not validity. Although all researchers know this quite well, in the context of coding open-ended data the two often get conflated. Just because the coding process led to an acceptable ICC or Delta value does not mean that the material coded adequately captures the construct of interest
7. Recognize that quantitative and qualitative approaches do not have to be in total opposition to one another. Considering research and data in context, as qualitative researchers do, can offer those who quantify qualitative data a novel and rich perspective on their participants' data. More knowledge of the data is always a good thing and will certainly be helpful for achieving some sense of validity
what is the minimum sample size needed for multiple regression?
110, in order to have sufficient statistical power (if the effect is smaller than moderate, you need a bigger sample)
A researcher is interested in reported incidents of underage drinking on college campuses and whether that consumption is influenced by factors related to the university. Specifically, campuses could be "dry" (where no alcohol is permitted to anyone on campus regardless of age) or "wet" (where alcohol is permitted for those of legal drinking age). Further, three different types of colleges were included: state institutions, private schools, and schools with religious affiliation. The number of incidents of underage drinking in the past year was collected from each college. 1. What is the appropriate statistical test? 2. Why did you pick this statistical technique?
1. A 2 × 3 factorial ANOVA
2. This test is appropriate because the IVs are nominal and the DV is ratio, and because there are two IVs and one DV
A study was done to assess the impact of the MHealthy program run for faculty and staff at the University of Michigan. Data were collected from faculty of each of the three UM campuses: Ann Arbor (n = 50), Dearborn (n = 48), and Flint (n = 51) following six months of participation in the program. Several different measures of health were gathered at the six-month follow-up: number of pounds lost, reduction in self-reported stress levels, and change in the number of days exercised each week. 1. What are the IVs and DVs? 2. What is the appropriate statistical test?
1. IV: campus (Ann Arbor vs. Dearborn vs. Flint); DVs: number of pounds lost, reduction in self-reported stress levels, and change in the number of days exercised
2. MANOVA
After submitting a paper to a journal you will receive a decision letter with four possible types of decisions for your manuscript. What are these possible decisions?
- Accepted
- Accepted with revisions
- Revise and resubmit
- Rejection
The last two are the most likely to happen, and rejections are very common.
(Meltzoff & Cooper) Plagiarism
Acts of plagiarism can range from stealing an entire work, by simply changing the name of the author, to paraphrasing someone's work and not attributing the ideas to the original written document
We often want to compare the incidence in one group to another, like hypertension in whites vs. blacks, and observe which rate is larger. The difference between the two incidence rates is known as:
Attributable risk (the risk of a particular disease that can be attributed to the group difference)
Why is it better to target more remote contributors in the web of disease than more direct contributors to a disease?
Because it allows us to look for many causal factors of a disease in order to prevent it. For example, to prevent West Nile disease we would have to kill all the West Nile mosquitoes in the world, which is not practical. It's better to educate people so that they can avoid contracting the disease by not attracting the mosquitoes (wearing lighter-colored clothing, wearing mosquito repellent, etc.)
In the article by Wiest et al. (2015), the odds of a poor outcome for babies in the CPAP group were 0.51, while the odds of a poor outcome for babies in the intubation group were 0.64. What is the odds ratio for CPAP relative to intubation? How should this odds ratio be interpreted?
The odds ratio for CPAP relative to intubation is 0.51/0.64 ≈ 0.80. This means the odds of a poor outcome for babies on CPAP are about 0.80 times the odds for babies who were intubated; in other words, the odds of a poor outcome are roughly 20% lower with CPAP.
(Boedeker & Kearns) Model Evaluation
Case classification
- A model's case classification is evaluated using posterior and typicality probabilities
- A case's posterior probability for a given group indicates the certainty of that case's classification in that group. For instance, if there are two groups (Group 1 and Group 2), and the posterior probabilities for a case belonging to those groups are .01 and .99, respectively, the evidence is strong that the case is a member of Group 2
- If instead the posterior probabilities are .49 and .51, then the case's membership in Group 2 is questionable
- Fence rider: a case whose classification is in doubt because it has approximately equal posterior probabilities for multiple groups
- Fences are the thresholds that determine which cases get classified into which groups. As the observations closest to these fences, fence riders are the cases whose classification is most likely to be affected by outlier data, the inclusion or exclusion of new predictor variables, or failure to meet assumptions
- The presence of a large number of fence riders may suggest the possibility of another level of the grouping variable (e.g., a third group between two identified groups)
- Typicality probability: indicates how typical an individual's scores are for a given group. Generally, a larger (Mahalanobis) distance indicates that an individual is less typical of the group (and has a lower typicality probability)
- It is important to note that a small typicality probability for a given group does not necessarily mean that an individual should not be assigned to that group; rather, the individual is potentially an outlier in the data set
Overall classification
- The overall accuracy of prediction (hit rate) in the training data is used to evaluate the utility of the prediction model (the LCFs)
- The hit rate for LDA models is inherently biased, in most cases artificially inflated. This bias comes from using LCFs to reclassify the same data set from which they were derived, a process known as internal classification
- Methodologists have suggested that LDA hit rates should be calculated using external classification, that is, by calculating LCFs in one data set and then using those functions to classify cases in another data set
- When external classification is not possible, cross-validation (CV) methods can be used to give a less biased estimate of the hit rate. Three options are leave-one-out (LOO) CV, k-fold CV, and repeated k-fold CV
- LOO CV: the classification functions are estimated with one observation held out, and the held-out observation is then classified; this is repeated for all cases
- k-fold CV: the sample is randomly divided into k subsets. The classification functions are derived using all but one subset, and the cases in the held-out subset are then classified; this is repeated for all subsets
- Repeated k-fold CV: the k-fold procedure is repeated a specified number of times, each time with a different division of the sample into k subsets, and the results are averaged over repetitions
- Hit rates obtained using CV methods are typically lower than the original hit rates, but they are less biased estimates of classification accuracy and, therefore, are the hit rates that should be reported and interpreted
- A hit rate of 80% may appear impressive at face value, but if 90% of the data were observed within one group, a hit rate of 80% would be less accurate than classification based solely on chance
Example: a study of customer service representatives, mechanics, and dispatchers found that the different job classifications did appeal to different personality types
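A minimal sketch of estimating an LDA hit rate with repeated k-fold cross-validation, the recommended approach (my own toy data, not the article's):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import RepeatedKFold, cross_val_score

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (60, 3)), rng.normal(1.5, 1, (60, 3))])
y = np.repeat([0, 1], 60)

lda = LinearDiscriminantAnalysis()

internal_hit_rate = lda.fit(X, y).score(X, y)   # internal classification: inflated

cv = RepeatedKFold(n_splits=5, n_repeats=10, random_state=0)
cv_hit_rate = cross_val_score(lda, X, y, cv=cv, scoring="accuracy").mean()

print(internal_hit_rate, cv_hit_rate)  # the CV hit rate is the one to report
```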
What is the most frequently performed quantitative measurement in epidemiology?
Counts
What is the major thing that differentiates MANOVA from DFA?
DFA is used to classify cases into groups based on a set of variables
what is the first thing that you do in MANOVA?
Do a multivariate test
Logistic regression study example
The DV can be high blood pressure: you must be trying to predict something dichotomous (high or low blood pressure; it can't be continuous)
what is an interaction?
DV differs based on unique combinations of IV levels
what is the assumption of homogeneity of variance in MANOVA?
DVs are expected to have similar variance across conditions
what is multivariate normality?
DVs are normally distributed
What is a mortality ratio?
Death rates compared in two groups
What are the three important goals when writing a manuscript?
Description: researchers should describe what they did (needs to be clear, concise, and thorough so the work can be replicated)
Explanation: explain the decisions that were made and the rationale underlying those choices
Contextualization: putting the research in context and knowing how the study contributes to existing research (not easy to do)
When you want to compare several different rates, what do you do?
Determine the ratio of all the rates using a single or standard reference group (you can do this with logistic regression where the dependent variable is lung cancer (yes/no) and the IV is the number of cigarettes smoked per day)
- You then determine the odds ratio (odds of developing cancer) based on the number of cigarettes smoked per day, compared to those who don't smoke (e.g., 3.5 for men who smoked 1-9 per day compared to nonsmokers)
What are ratios?
Dividing one rate by another. Example: in the lung cancer study, you can divide the death rate of smokers by that of nonsmokers: 188 (smokers) / 19 (nonsmokers) = 9.9. You would interpret this as smokers having a 9.9 times greater risk of dying from lung cancer than nonsmokers
Research question
Does obesity predispose people to degenerative arthritis in the knees?
what is the bigger question at the heart of SEM?
Does the model produce an estimated population covariance matrix that is consistent with the sample (observed) covariance matrix? (Does the model fit the data? If there is good fit, the parameter estimates will reproduce a covariance matrix close to the sample covariance matrix.)
(Meltzoff & Cooper) Other Ways to Manipulate Results
EXPERIMENTAL CONFOUNDS
- One way researchers can manipulate results is to not hold things constant when "keeping other things equal" is central to the design of the study (for example, an experiment where subjects were randomly assigned but one condition was run early in the day and the other late, so the experimental conditions are confounded with time of day)
TAMPERING WITH RANDOM ASSIGNMENT
- Tampering with the random assignment of subjects so that the hypothesis is favored is another way researchers can bias the study and mislead the reader (like placing some participants in the experimental or control group based on their characteristics and citing clinical ethics as the rationale)
POST HOC DETERMINATION OF CUTOFF SCORES
- Decisions about cutoff scores need to be made before data analysis; if made after, they might be chosen so that the groupings suit the results (e.g., changing things around so that instead of using three categories they use a median-split grouping to get a statistically significant difference)
- Gerrymandering: placing some of the "best" participants in the favored group and some of the "worst" into the unfavored group; the results will be misleading
BIAS IN SELECTION OF DEPENDENT VARIABLES
- Sometimes researchers will select a dependent measure that is in their favor (e.g., in a cross-cultural study of IQ or learning ability, researchers who want to prove a point might choose highly verbal tests printed in English that put a group not fluent in English at a disadvantage)
- Dependent measures need to be valid, reliable, and equally appropriate for all participants
ALTERING SAMPLING PROCEDURES
- Some studies call for a fixed number of participants, trials, or sessions, decided on rational grounds or via a power analysis before the study is done
- Some researchers might run statistical analyses, find significance, and stop collecting data because they want significant results, while others might extend data collection until they get results in the predicted direction
CHANGING STATISTICAL ANALYSIS
- Shifts in the mode of analysis in different parts of an article (e.g., researchers not explaining why one test of group differences is made with a parametric test and another with a nonparametric test, with no evident rationale for their decisions)
GRAPHIC DISTORTIONS
- Pictorial illustrations and representations can make an even greater impression than technical text and numerical tables; they bring data to life and make it visual
- They can easily be deceptive when they are isolated, and sometimes only a few cases are included in an illustration
USE OF ANECDOTAL EVIDENCE
- Researchers can use postexperimental inquiries to find out how the participants experienced the research and how the experimental manipulation, if any, appeared to them. Because these are subjective qualitative data, it is easy for researchers to give a highly selective and biased report of this supplementary information
Case control study for the research question
Find a group of people with osteoarthritis of the knees and a control group of the same sex and age admitted to the same hospital, measure BMI, and compare obesity levels in the osteoarthritis and comparison groups
What is case fatality incidence rate ?
Focused on a specific time period - ex: number of deaths due to brain injury each year divided by the number of brain injuries each year
Prospective study for the research question
Go to a defined healthy adult population (make sure they don't have arthritis first), measure BMI, and come back years later to see if there are any new cases of osteoarthritis
there was a content analysis done of reviewer comments in about 100 manuscripts to understand and identify the areas of concern commonly expressed by reviewers in manuscripts. what were the reviewer concerns per section?
INTRODUCTION
- Conceptual and/or theoretical rationale (the context for the research is inadequate)
- Proposed relationships/hypotheses
- Redundancies/lack of conciseness
- Scope and content of literature review
METHOD AND RESULTS
- Measures used
- Sampling strategies
- Missing methodological information
- Appropriateness of analyses
- Reporting of analyses
- Common method bias issues (it can look like lots of things are correlated, but it's just shared method variance, e.g., changes in mood)
DISCUSSION
- Structure of the discussion section
- Missing components
- Overgeneralization of results
- Lack of meaningful interpretation
WRITING
- Lacks clarity
- Poor grammar
- Spelling errors
- Confused verb tenses
- Need to use an active voice
- Data "were," not data "was"
- Use more paragraphs
- Avoid grandiose overstatement
- Avoid use of judgmental or evaluative statements
- Concerns relating to APA style
Retrospective study for the research question
Identify a healthy adult population in the past, examine medical records to obtain BMI, and compare the current prevalence of osteoarthritis in those who were obese and those who were not
What is morbidity ratio?
Illness compared in two groups
What are the disadvantages of the blind peer review process?
- In reality it may not be blind (who you know and who knows you); researchers become familiar with each other's work over time
- Pressure to produce findings / to support hypotheses
how are MANOVA and DFA mathematically the same?
MANOVA looks for differences between groups on a set of DVs, and DFA uses differences among variables to predict group membership
What is p-hacking?
Manipulation of data to affect the determination of statistical significance (trying to make findings significant); also known as fishing or data mining
(Hayes) what is moderation?
Moderation is used to explore and test the conditional nature of effects. X's effect on Y is moderated if the strength or direction of the causal effect of X on Y depends on the value or level of a third variable, with that third variable called a moderator.
Would you want homogeneity of covariance tests like Levene's test to be significant in MANOVA?
No, because you need to meet the assumption of homogeneity of variance (there should be the same variance across our three IV conditions)
(Hayes) an example
Now that the fundamentals of the framework for integrating mediation and moderation analyses have been discussed, we walk through an example loosely based on a study conducted by Barnett and colleagues (2010), who implemented a brief intervention targeting alcohol use in emergency department patients. The participants consisted of patients admitted to a hospital emergency department who had been under the influence of alcohol at the time of admission. Based on various diagnostic assessments, participants were classified as severe (W = 1) or not severe (W = 0) abusers of alcohol. The participants were then randomly assigned to a treatment that consisted of a motivational interview with a counselor followed by personalized feedback (X = 1) or were provided only personalized feedback without any interaction with a counselor (X = 0). This is the independent variable in the analysis below. Two potential mediators of the effect of the intervention, posttreatment perceived risk and benefits of alcohol use (M1) and degree of treatment seeking (M2), were measured. Higher scores on these represent greater perceived risks and more treatment seeking. The outcome variable (Y) is a composite measure of alcohol use (frequency and amount) at 12-month follow-up, with higher scores reflecting more use. In this example, the moderator W is a dichotomous variable. However, the procedure we discuss here applies whether W is dichotomous or a numerical continuum.
What are incidence rates?
The number of people developing a disease divided by the total number of people at risk per unit of time (e.g., the number of new cases of melanoma in a year); captures the continuing occurrence of new cases
What is case fatality rate?
The number of people dying due to a particular disease divided by the total number with the disease (e.g., the number of deaths due to melanoma divided by the number of people with melanoma)
What is a prevalence rate?
Number of people with a disease out of the total number of people in the group (ex: like how many people have diabetes across the US) - snapshot of existing situation
What is age specific mortality rate?
Number of people of a particular age dying divided by the total number in the group per unit of time
what is a one way anova?
One IV with 2 or more levels; one DV
What is the multiple correlation coefficient?
R
What is relative risk ratio?
The risk ratio is the ratio of the probability of an outcome in an exposed group to the probability of that outcome in an unexposed group (same as the ratio above)
what Are the limitations of DFA?
There are theoretical issues - we can't infer causality because correlation does not equal causation (DFA predicts membership in naturally occurring groups, not randomly assigned groups) -- we can infer causality, however, if we randomly assign people to groups or conditions - another issue is how we choose our predictors: the best set of predictors can be chosen based on what we know from theory and research -- but predictor variables should not be highly correlated, because if they are, they are all measuring the same thing - generalizability is always an issue, and you can only generalize to the population from which your sample was drawn
What is a prospective cohort study?
Start with a healthy group of individuals, measure characteristics at baseline, and then follow them across time to see if the disease you want to study develops (ex: start with a middle-aged group with no diabetes, measure factors at baseline you think might be related to the development of the disease, and then follow the cohort and continue assessment)
What is attributable fraction in the exposed?
The difference in rates between the exposed and the unexposed, divided by the rate in the exposed. If you get .90, how do you interpret that? 90% of lung cancers that developed in this group of smokers were attributable to smoking
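A minimal arithmetic sketch of these epidemiological rates in Python (all numbers are hypothetical; the smoking rates are chosen so the attributable fraction comes out near the .90 example above):

# Hypothetical counts for the rate definitions in the surrounding entries
new_cases = 50            # new melanoma cases this year
at_risk = 100_000         # people at risk during the year
incidence_rate = new_cases / at_risk          # new cases per person per year

deaths = 10               # deaths due to melanoma
with_disease = 200        # people who currently have melanoma
case_fatality_rate = deaths / with_disease    # deaths among those with the disease

# Attributable fraction in the exposed: (rate_exposed - rate_unexposed) / rate_exposed
rate_exposed = 188 / 100_000    # hypothetical death rate among smokers
rate_unexposed = 19 / 100_000   # hypothetical death rate among non-smokers
attributable_fraction = (rate_exposed - rate_unexposed) / rate_exposed
print(round(attributable_fraction, 2))   # 0.9 -> ~90% attributable to smoking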
(Hayes) what is a linear moderation model?
The effect of X on Y is moderated by W if the effect of X on Y varies as a function of W
what are main effects?
The effect of an IV on a DV when all other variables are ignored or not taken into account (a main effect for low humidity would use mean values for that variable only and ignore all others)
What are period prevalence rates?
The number of persons with a disease during a period of time divided by the total number in a group - ex: breast cancer among women in Michigan in 2022 (annual incidence plus prevalence, where annual incidence is the number of women diagnosed with breast cancer in Michigan in 2022 divided by the number of women in Michigan, and prevalence is the number of Michigan women being treated for breast cancer divided by the number of women in Michigan)
What is mortality rate?
The number of people dying, either due to one particular cause or due to all causes, divided by the total number in the group per unit of time. Ex: the number of people 35 or older dying due to heart disease from 2018-2020
How do u conduct case control studies?
You need to find people with a certain disease and locate a control/comparison group without the disease, and then examine the relationship of factors or characteristics to the disease by comparing people with and without the disease (the comparison group must be similar)
How can epidemiological data be obtained?
Using surveys, interviews and observations, and they rely heavily on clinical observations, clinical diagnosis, lab data, and archival data
Rates are usually expressed per 1,000, per 100,000, or sometimes per million; we use these standard base numbers to understand how prevalent a disease is
what are the 5 WS of epidemiology?
What? (what is the disease of interest) When? (timing of the disease) Where? (the geographical location of affected individuals) Who? (the person or group affected) Why? (causes, modes of transmission of disease, and risk factors)
What are retrospective cohort studies?
When an investigator is looking back in time; aka a historic cohort study. It involves looking back in time using archival or self-report data to examine whether the risk of disease differed between people exposed to a risk factor and those who were not. Ex: looking at records from before someone had lung cancer to see what they did (smoking or not smoking) that might have raised their risk
When do we use attributable risk?
When one group is exposed to something and the other is not (e.g., exposure to smoking). Ex: death rates of 188 per 100,000 among smokers versus 19 per 100,000 among non-smokers: 188 - 19 = 169 deaths per 100,000 are attributable to smoking
Cross sectional study for the research question
X-ray people's knees in the group, measure BMI, and compare the prevalence of osteoarthritis in obese and non-obese people
what does Y' that is closer to 0 mean?
The DV is very unlikely to have occurred
Are there potential for errors with clinical observations and clinical diagnoses?
Yes, we might have errors in obtaining someone's clinical history, ask the wrong questions, misinterpret responses, make errors during the examination, etc.
which approach to mediation is based on bootstrapping that involves generating thousands of hypothetical samples based on observed sample data to determine the most likely relationships among variables? a. Hayes process model b. the Sobel test c. barron and kennys three step process d. the keppel and keppel model
a
which of the following is an example of moderation? a. drug X reduces anxiety for ppl who have body weights that are one standard deviation below the mean but not for those with body weights that are one standard deviation above the mean. b. in a given sample women report higher levels of anxiety than men c. an intervention influences health-related behavior increasing self-efficacy. d. an anxiety intervention indirectly influences anxiety levels thru changes in client cognition
a
what is a symposium?
a collection of different studies on the same topic presented together to give the audience the most up-to-date research (these studies are usually in their early stages and sometimes the data have not been collected yet)
what is MANOVA? give an example
a generalization of ANOVA to situations where we have multiple dependent variables. - Ex: might be looking at the effects of different types of treatments on different types of anxiety (test anxiety, free-floating anxiety, anxiety due to life stressors)
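A minimal Python sketch of this kind of MANOVA using statsmodels (simulated data; the treatment labels and anxiety measures are hypothetical, not from the course):

import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(0)
n = 90
df = pd.DataFrame({
    "treatment": np.repeat(["cbt", "drug", "control"], n // 3),  # one IV, 3 levels
    "test_anx": rng.normal(size=n),       # DV 1: test anxiety
    "free_anx": rng.normal(size=n),       # DV 2: free-floating anxiety
    "stress_anx": rng.normal(size=n),     # DV 3: anxiety due to life stressors
})

# Multivariate test of the IV's effect on the combined DVs
fit = MANOVA.from_formula("test_anx + free_anx + stress_anx ~ treatment", data=df)
print(fit.mv_test())   # Wilks' lambda, Pillai's trace, etc.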
what does a line with one arrow in SEM represent?
a line with one arrow represents a hypothesized direct relationship between variables **the variable the arrow points to is the DV
what is the tripartite model of epidemiology?
a model that looks at the many factors that can cause disease: the agent, host, and environment. (Ex: rheumatic fever comes from exposure to strep -- the agent. Not all people exposed to strep get rheumatic fever; we need to consider how susceptible a person is when exposed -- based on sex, age, nutritional status, reproductive and disease status, species, stress, immunity, etc. -- the host factors. Environmental factors also contribute to the risk of rheumatic fever, including population density, season, crowding, toxins, etc.)
What is participant observation?
a research method in which investigators systematically observe people while joining them in their routine activities
What is a meta-analysis?
a statistical analysis of a large group of data/results from individuals studies and the purpose is to summarize and integrate findings
what is mediation?
a statistical technique used to understand the relationship between an IV (predictor) and a DV (criterion) when we're interested in how that relationship happens via a third variable known as a mediator. It implies that the IV influences the mediator, which in turn influences the DV
what is the lagrange multiplier test?
a statistical test that assesses model modification: it asks whether the model improves if one or more parameters that are fixed (set to 0) are freed (the path is allowed to be estimated), where significance indicates improvement with the new path added *data drives these improvements
make up an example of a simple mediation model. very briefly describe X,M, AND Y. draw a picture to show the model
aerobic exercise (X) directly affects body weight (Y) by decreasing it. Higher metabolism (M) mediates this relationship because it results from more exercise and it affects body weight by decreasing it. The mediator influences the DV (Y) and is influenced by the IV (X) - for image reference quiz 6
what are the other names for measured variables?
aka observed variables, indicators, or manifest variables (often shown in boxes)
what are vote counting methods?
aka the tally method: early methods of quantifying research findings using a tally (an early approach to meta-analysis). Essentially they would count the number of studies with positive, negative, and neutral effects and draw conclusions based on the category with the highest number of tallies.
what are univariate tests?
allows us to examine which specific DVs are affected by the IV
what are the advantages of using MANOVA compared to ANOVA?
allows you to control for type 1 error when measuring several DVs; more efficient than doing an ANOVA for each DV
what is stepwise logistic regression?
also known as statistical logistic regression. The inclusion or removal of predictors is based on statistics/data alone. It is a highly exploratory method.
what is log likelihood? (LL)
the amount of unexplained variance after the model has been fitted
what is R^2?
amount of variability in DV accounted for by IVs
what is hybrid registered reports?
an approach that eliminates the need for researchers to obtain significant findings. You submit a hybrid registered report: the manuscript should have the intro section, method, measurement info, and data analyses when sent to the journal (no results or discussion are included in the first submission) -- as a result it is peer reviewed on its quality rather than on what was found
what are ordered outcomes?
an example is No hay fever, moderate hay fever, severe hay fever
how do you do logistic regression in SPSS?
analyze > regression > binary logistic - you put your outcome in the dependent box and your predictors in the covariates box (non-assaulters should be coded as 0 and assaulters should be coded as 1)
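The same analysis sketched in Python with statsmodels (simulated data; the outcome and predictor names are hypothetical, mirroring the SPSS coding note above):

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({
    "assault": rng.integers(0, 2, n),    # outcome: 0 = non-assaulter, 1 = assaulter
    "hostility": rng.normal(size=n),     # predictor ("covariate" in SPSS terms)
    "attitudes": rng.normal(size=n),
})

model = smf.logit("assault ~ hostility + attitudes", data=df).fit()
print(model.summary())         # betas and significance tests
print(np.exp(model.params))    # exp(B): the odds ratios
print(-2 * model.llf)          # -2 log-likelihood for the fitted model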
how do you do hierarchical regression in SPSS?
analyze > regression > linear and then define the DV and blocks of IVs by dragging variables to the appropriate boxes
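A rough Python equivalent of hierarchical regression (statsmodels; hypothetical variables): enter blocks in a theory-driven order and test the R-squared change between steps.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 100
df = pd.DataFrame({
    "happiness": rng.normal(size=n),
    "health": rng.normal(size=n),
    "income": rng.normal(size=n),
})

step1 = smf.ols("happiness ~ health", data=df).fit()            # block 1
step2 = smf.ols("happiness ~ health + income", data=df).fit()   # block 2 added
print(step1.rsquared, step2.rsquared)
print(step2.compare_f_test(step1))   # (F, p, df) for the R-squared change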
you are trying to predict the level of happiness among a sample of 342 college students. variables include a measure of happiness (20-item scale), year in school, income, relationship status, and health. what is the most appropriate statistical test? what are the IVs and DVs?
the appropriate test is multiple regression; the IVs are year in school, income, relationship status, and health; the DV is level of happiness.
do psychologists view qualitative research as a method?
yes, as a method (rather than as a paradigm)
what do negative values of beta mean?
as the predictor goes up, the probability of the DV goes down
SEM is best thought of as a combo of a. logistic regression and confirmatory factor analysis b. multiple regression and confirmatory factor analysis c. multiple regression and exploratory factor analysis d. MANOVA and exploratory factor analysis
b
what are panel discussions?
based less on research and more on specific professional experiences (ex: panels on running student internships)
the order of entry of IVs in hierarchical regression is determined by the researcher based on what?
based on logic or theory (anything of greater theoretical importance is entered earlier into the regression equation)
what questions does standard multiple regression answer?
basic questions, like how well a set of IVs predicts the DV; it tells us which IVs are important, but it is kind of atheoretical
why is a substantial sampling needed for SEM?
because SEM is complicated and a large sample will reduce sampling error, parameter estimates will be more accurate, and it gives us more statistical power
it is sometimes suggested that the purpose of epidemiological research is to provide clues about cause so that lab scientists can go research for answers. this is a distorted view. why?
because certain questions can only be answered outside the lab (ex: the transmission of AIDS was discovered by epidemiologists before lab scientists, by comparing people with AIDS to those without)
how is logistic regression similar to multiple regression? different?
because the first step involves trying to see if a set of predictors does a good job of predicting our outcome, and then we want to find the best set of predictors. It differs because what we want to predict must be dichotomous.
why is it important for qualitative researchers to have training and experience?
because the quality of the data relies on how good the researcher is. They need to be able to engage in systematic and rigorous observation, be good at interviewing, and analyze the content of data well and reliably
what is MANOVA??
begin with groups (levels of the IV) and see if there are differences among variables (DVs) based on group membership
what is the alternative to the baron and kenny method of mediation?
bootstrapping
what is an abstract?
brief summary that includes the research problem or question, the sample, the method, and the major findings and implications (very hard to write because there is a word limit and a lot of info must be condensed)
what do regression coefficients "B" do?
bring the predicted Y values as close as possible to the Y values that are actually measured
how do we know how good the regression relationship is?
by looking at r and r^2
to increase the chances of publication professor X commonly engaged in the practice of analyzing the data and then developing hypotheses. This ethically questionable practice is known as: a. fishing b. p-hacking c. Harking d. file drawer stacking
c
which of the following is required in order to conduct a multiple regression? a. continuous independent and dependent variables. b. dichotomous independent and dependent variables c. continuous or dichotomous independent variables and a continuous dependent variable. d. continuous independent variables and a continuous or dichotomous dependent variable.
c
what is clinical epidemiology?
clinical studies of the natural course of disease or effects of treatments (concerned with natural disease patterns in certain populations)
what is structural equation modeling (SEM)?
collection of statistical techniques that allow a set of relationships between one or more IVs and one or more DVs to be examined. it allows us to look at relationships between factors (a combo between factor analysis and multiple regression)
what is an analysis of variance?
comparing two estimates of population variance (between groups and within groups)
what is the decision to publish a practitioner outlet based on?
decision can be made solely by the editor of the journal or by peer review
what do effect sizes tell us?
the degree to which the IV affects the DV -- we need to know how big the effect is
what is a structural model (aka theoretical model)?
depicts the hypothesized relationships among constructs without the detail of the measurement model (only ovals and arrows---only shows constructs) **uses a covariance matrix
baron and kenny (1986)
developed the standard way of testing mediation - several requirements had to be met in order to establish a true mediation relationship *step 1: show that the IV is a significant predictor of the DV (needs a statistically significant regression coefficient // they need to be significantly correlated) *step 2: the IV needs to be a significant predictor of the mediator (needs to be statistically significant) *step 3: the mediator needs to be a significant predictor of the DV when the IV is statistically controlled (a simultaneous multiple regression is done for this; the regression coefficient between IV and DV should be reduced -- compare the regression coefficient from step 1 to the one from step 3) *step 4: conduct a Sobel test to test the significance of the mediation (this test tells you whether the change in the regression coefficient is sufficient to be called mediation) ** if the relation between IV and DV runs entirely through the mediator it is full mediation, but if the relationship between IV and DV is reduced with the mediator yet still significant, it is partial mediation. See the sketch below.
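A minimal sketch of the Baron & Kenny steps plus a Sobel test in Python (statsmodels/scipy; the data are simulated, so x, m, and y are hypothetical):

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

rng = np.random.default_rng(3)
n = 200
x = rng.normal(size=n)
m = 0.5 * x + rng.normal(size=n)              # mediator influenced by the IV
y = 0.4 * m + 0.1 * x + rng.normal(size=n)    # DV influenced by mediator and IV
df = pd.DataFrame({"x": x, "m": m, "y": y})

step1 = smf.ols("y ~ x", data=df).fit()       # step 1: IV predicts DV
step2 = smf.ols("m ~ x", data=df).fit()       # step 2: IV predicts mediator
step3 = smf.ols("y ~ x + m", data=df).fit()   # step 3: mediator predicts DV, IV controlled

# Step 4, Sobel test: z = a*b / sqrt(b^2*se_a^2 + a^2*se_b^2)
a, se_a = step2.params["x"], step2.bse["x"]
b, se_b = step3.params["m"], step3.bse["m"]
z = (a * b) / np.sqrt(b**2 * se_a**2 + a**2 * se_b**2)
print(z, 2 * stats.norm.sf(abs(z)))   # a significant z suggests mediation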
what is non-normed fit index? (NNFI)
developed to get around the problem that small sample sizes produce an NFI less than .9, indicating poor fit. It is an adjusted NFI. The values can range beyond 1 (anything above .9 indicates good fit)
why do we need to select a common metric for effect sizes in meta analysis?
different studies use many research designs and report different statistics, and we need to convert all those statistics into a single common metric (there are online calculators available for these conversions); see the example conversion below
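For example, one standard conversion (a textbook formula, not from these notes) turns a correlation r into Cohen's d: d = 2r / sqrt(1 - r^2). With r = .30, d = .60 / .954, or about .63.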
The way that outcome categories are coded determines the
direction of the odds ratios and sign of β coefficient
with discriminant function analysis we want to determine a pattern of variables that will be the best predictors to
distinguish groups
what is D in SEM?
disturbances - errors in prediction
give an example of an interaction effect between anxiety intervention and gender.
do men and women react differently to different kinds of anxiety interventions? (if women do better with one intervention than another compared to men, that would indicate an interaction)
why is baron and kenny method of mediation problematic?
it doesn't work well with small sample sizes
typically why dont we see full measurement models in articles?
due to space constraints (we also don't typically see error terms -- dependent variables often have error associated with them)
which of the following information does not belong in a discussion section? a. similarities and diffs in results compared to other research studies b. implications for further research c. research limitations d. support/non support for hypotheses e. all of the above belong in the dicussion section
e
what is backward deletion?
the equation starts with all the IVs entered, and they are then deleted one at a time if they don't significantly contribute to the regression equation: the worst comes out first, then the next worst, and so on until what's left significantly contributes to the regression equation (once removed, an IV can't be added back in)
variability that exists within a condition is due to
error (people react differently and some random error occurs)
what is E in SEM?
errors in measurement (contamination -- measuring what we shouldn't; deficiency -- not measuring aspects of our variable well enough)
what is a fail safe N?
estimate of the number of additional studies needed to reverse a conclusion drawn from meta-analysis
what is the output in SEM?
estimated population covariance matrix
if the multivariate test for your IV is significant what do you do?
examine each DV for significance (univariate tests)
what are not ordered outcomes?
examples include democrat, republican, independent
what is exp (b)?
exponential value of B aka odds ratio • If value is greater than 1, then as predictor increases, the odds of outcome increases • If value is less than 1, then as predictor increases, the odds of outcome decreases
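For example (made-up numbers): a B of 0.69 gives exp(B) of about 2.0, so each one-unit increase in that predictor roughly doubles the odds of the outcome; a B of -0.69 gives exp(B) of about 0.5, roughly halving the odds.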
what is an agent?
factors related to the disease
what is the environment?
factors related to the environment that can cause disease
what is the host?
factors related to the person exposed to the disease
what are other names for latent variables?
factors, constructs, or unobserved variables
MANOVA can either be used with either nominal or interval DVs. true or false??
false
hypothesis testing using p < 0.05 is the accepted approach and tends to be above criticism. t or f?
false
if a finding is statistically sig, you can be sure that it is a reliable findings with the probability of replication being about .95. t or f?
false
in order to best maximize reliability when coding qualitative narratives, it is usually best to have a large number of coding categories. t or f?
false
in the study of tumor malignancy discussed in the lecture, malignancy was considered the "response category." if you believe family history (no/yes) is related to malignancy, having a family history of malignant tumors should be coded as 0. t or f?
false
larger LL values (log-likelihood) indicate better model fit. t or f?
false
prospective cohort studies involve identifying a group of cases with a particular disease and a comparable group of participants who do not have the disease. t or f?
false
the best indicator of an overall good model fit is a significant chi square test. t or f?
false
the biggest advantage of meta-analysis is that it eliminates concerns about the methodological quality of the studies. t or f?
false
unlike multiple regression, there is only one type of logistic regression - direct (standard) logistic regression. t or f?
false
where qualitative data often come from?
field work --researcher spends time in particular setting (e.g., hospital rehab ward; methadone treatment program)
what are round tables?
focus on discussion, conversation and debate. (ex: debate on free speech)
what are paradigms?
general world views that influence how u approach research
How should you code your predictor categories in logistic regression?
give higher codes to the categories of predictors most likely to be associated with a response (this makes parameter estimates like beta coefficients more likely to be positive, which is better because these are easier to interpret)
what is the chi square rule of thumb?
good fit when ratio of chi square to df is less than 2
what is the primary unit of concern in epidemiology?
groups of people, not individuals (different from what psychologists do, because we measure things at the individual level)
what is structured abstract?
has different sections. example sections include purpose of research, research design, findings, and implications. typically longer than general abstract
basic MANOVA has how many DV and IV?
has one IV and multiple DV
what is degree of relationship?
how good the regression equation is: is it providing better-than-chance prediction, does the combination of IVs predict a significant amount of variability in the DV, and can the DV be reliably predicted from this set of IVs
what does the eigenvalues table tell us in DFA?
how many discriminant functions we have
what does an eigenvalue in DFA tell you?
how much discriminating ability a discriminant function has (under % of variance in eigen value table)
what does the classification table tell us?
how well the model predicts group membership and how well the combo of predictors correctly predicts outcome.
how might the use of hybrid registered reports help reduce publication bias and unethical behavior?
hybrid registered reports help to reduce publication bias because they remove the pressure to produce significant findings. The results and discussion are not included, and the manuscript is reviewed solely on its quality and not on whether or not there are significant findings.
what is harking?
hypothesizing after results are known
(Meltzoff & Cooper) RESULTS TO DISCUSSION
if the data seriously violate the assumptions of the statistical techniques and the researchers continue without any alternatives, arguing the technique is robust and disregarding the issue later in the discussion section, then one should doubt the interpretation of the results
a post hoc test would be completed when ..
if there are significant differences between groups and you have three or more levels of your IV. Post hoc tests help you find exactly which groups differ.
(Hayes) what is a focal predictor?
in the linear model, the IV's effect on Y is of most interest, so it is called the focal predictor ** W is the moderator of the effect of the IV on the DV
what is sequential logistic regression?
in this type of logistic regression, you specify the order in which predictors are entered into the model; the order is based on theory or the expected importance of the predictors. This is similar to hierarchical multiple regression.
what interaction effect will we look at in our MANOVA study?
interaction between IV 1 and IV 2
how are standardized discriminant function coefficients interpreted?
interpreted like correlation coefficients (size and direction)
DV: level of depression. what is the scale of measure of this DV?
interval or ratio
what is a thematic analysis on qualitative research?
involves several steps: 1. make yourself familiar with the data 2. start to code the data using meaningful chunks 3. create themes (words related to what you're looking at) 4. review themes 5. define themes 6. write a thematic analysis
what does it mean to say that the maximum likelihood method is an iterative process?
it begins with arbitrary values and then determines the direction and size of coefficients that maximize the likelihood of observed frequencies and stops when there is no more improvement to the model (when the model has maximized the likelihood of predicting the outcome)
(Hayes) what is the utility of moderation analyses in clinical research?
it can help researchers understand how the effectiveness of a certain treatment varies across individuals, and where a treatment effective for one group may be ineffective or harmful for another. It can also help you identify treatment inhibitors and enhancers.
whats included in the methods section of a paper?
it describes how the study was conducted and includes things like participant characteristics (gender, age, education, race/ethnicity, how they were recruited), then the research design (experimental or observational, any manipulation), your measures, and how you got information on the IVs and DVs
what does a multivariate test do?
it examines the effect of the IV on combined DVs (looking at differences in DV based on IV in a combined manner)
what happens if you square a part correlation?
it gives you sr^2, which gives an idea of the unique contribution of a particular IV to the prediction of the DV
how is logistic regression more flexible compared to other statistical techniques (despite being kind of similar to them)?
it has no assumptions about the distribution of the predictor variables, variables don't need to be linearly related to each other or to the outcome, and homogeneity of variance is not a concern. - predictors can be a mix of continuous and discrete variables - it allows you to have two or more outcomes (multinomial regression) -- the outcomes may be ordered or not
what is the discrete outcome that is often being predicted by logistic regression in the health sciences?
it is often disease/no disease
quantitative approach is viewed as positivist. what does this mean?
it is the idea that facts or reality are easily measured
What is deductive reasoning?
it is the scientific method, also referred to as the top-down approach. It is more quantitative and positivist. It starts with a theory, then a hypothesis; then you collect and analyze data, and then reject or accept the hypothesis
how is logistic regression similar to multiple regression?
it is trying to answer the same questions as multiple regression but differs because DV is dichotomous or discrete for logistic regression and DV is continuous for multiple regression
what is the purpose of predicting group membership?
it may help us develop interventions or prevent things that may lead to someone becoming a perpetrator (we might do things to change attitudes, etc.)
if means are in different columns/subsets what does that mean?
it means they significantly differ from each other
if there are two categories under the correlation with discriminant function what does that mean?
it means we have two dimensions/functions that can discriminate between groups -- the number of functions is determined by the number of groups you have (n-1, meaning if you have 3 groups you can have 2 discriminant functions)
why would having many IVS in anova be difficult?
it would be difficult to analyze because there are many main effects and interactions
what is DFA?
it's like MANOVA but turned around. You begin with variables (IVs) to see if they can predict group membership (DV). The goal is to find linear combinations of independent variables that discriminate between groups; a sketch follows
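A minimal DFA-style sketch in Python using scikit-learn's LinearDiscriminantAnalysis (simulated data; the predictors and groups are hypothetical):

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(4)
X = rng.normal(size=(150, 3))      # e.g., attitudinal, experiential, situational predictors
groups = rng.integers(0, 2, 150)   # group membership, e.g., perpetrator (1) or not (0)

dfa = LinearDiscriminantAnalysis().fit(X, groups)
print(dfa.predict(X[:5]))          # predicted group membership for 5 cases
print(dfa.score(X, groups))        # proportion correctly classified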
what is the gold standard for reporting research findings?
journal articles
what are factors?
latent variables that cannot directly be measured
how to get the macro needed for mediation analysis on SPSS?
link in ppt and then open and run the syntax process.sps, and you will see the process macro as part of the regression options
how to read a discriminant function table
look at the correlation with discriminant function category (there might be more than one function) -- the ones marked with a cross are significant at predicting group membership; then look at the means (which group stands apart from the other groups based on the means)
what will be the main effects we will look at for the MANOVA with 2 IVs study mentioned above?
look at the effect of IV 1 and IV 2 on the combination of DVs - first, do a multivariate test (combined DVs); then, if it's significant, do univariate tests (looking at each DV separately); and then do post hoc tests (looking at means for different levels -- which training condition had an impact on anxiety) if those are significant
what is the most basic way to assess good fit in SEM?
looking for a non significant chi squared statistic which means there is no significant difference between the observed sample covariance matrix and estimated population covariance matrix (we want this)
what are we doing to the f ratio when we use a covariate?
it makes the denominator of the F ratio smaller (removing error): F ratio = MS between treatments (groups) / MS within treatments (groups, i.e., error); see the sketch below
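A quick one-way ANOVA F ratio sketch in Python (scipy; the group scores are made up):

from scipy import stats

# Three made-up treatment groups
group1 = [4, 5, 6, 5, 7]
group2 = [6, 7, 8, 7, 9]
group3 = [9, 9, 10, 8, 11]

f, p = stats.f_oneway(group1, group2, group3)   # F = MS between / MS within
print(f, p)   # F well above 1 with a small p suggests a treatment effect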
What do logarithmic transformations of the data do?
makes the relationship between variables into one that roughly approximates a normal distribution (makes the relationship linear)
what means do we look at for main effects?
marginal means
what does ANOVA test for?
mean differences among groups on a single DV
what does a significant chi-square value mean in logistic regression?
means that our model has improved its prediction because -2LL has dropped significantly
what do higher values of beta mean?
means the predictor is more important (and it is interpreted similar to correlation coefficients in the positive or negative direction)
if f change is significant in multiple regression model summary what does that mean?
it means you can reject the null and say that there is significant prediction of the DV by the combination of IVs
what does quantitative research measure?
it measures attitudes/reactions of large numbers of people, allows you to compare groups, and makes the data simple
why is it a problem that multiple results from the same study might be used in a meta-analysis? how do we fix this?
it might make your meta-analysis biased and make results appear more reliable than they are. To fix this you need to limit the number of results from each study, compute the average effect size across all measures of the same outcome within the study, and perform a separate meta-analysis for each DV instead of lumping them together
what is the issue with extreme cut offs?
it might result in everyone or no one being classified as diseased, so intermediate cut-offs are recommended. - in SPSS, with two outcome categories, a case is assigned to the disease outcome (Group 1) based on the predicted probability: if Y' > .5, then diseased; if Y' <= .5, then healthy
how do you test for moderation?
moderation is conceptualized as a third variable that influences the zero-order correlation between the IV and the DV. You can also use an ANOVA to look at moderation; with ANOVA you're looking at the interaction. The PROCESS macro can also be used; a sketch follows
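A minimal moderation sketch in Python (statsmodels; simulated data, so x, w, and y are hypothetical): the interaction term x:w tests whether w moderates the effect of x on y.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 200
x = rng.normal(size=n)
w = rng.integers(0, 2, n)                        # dichotomous moderator
y = 0.2 * x + 0.5 * x * w + rng.normal(size=n)   # x's effect on y depends on w
df = pd.DataFrame({"x": x, "w": w, "y": y})

model = smf.ols("y ~ x * w", data=df).fit()      # x * w expands to x + w + x:w
print(model.params["x:w"], model.pvalues["x:w"]) # significant interaction => moderation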
what questions does hierarchical multiple regression answer?
more theoretical; it allows for testing of explicit hypotheses and allows the researcher to control the regression process, with the order of IVs determined by the researcher based on theory
what are nested models?
nested models are identical models where we remove one path to see the effect that it has on model fit - for nested models, the chi-square of the larger model is subtracted from the chi-square of the smaller, nested model; the difference (also a chi-square) is evaluated for significance - significance indicates the larger (more specified) model is explaining more variance than the simpler model
is the vote counting method still used?
no
do qualitative researchers believe that objectivity is possible?
no because bias may enter any phase of research
why is stepwise regression looked down upon by journal reviewers?
it is not theory driven and capitalizes on chance
do qualitative researchers believe that we are able to reduce complex interactions of behaviors into simple measurable components
no they think it is impossible
IV: 3 groups- control, placebo, and treatment. What is the scale of measurement of this IV?
nominal (discrete)
how many journals can u submit ur paper to at once?
only one journal
when do you use post hoc tests?
only when there is a difference between your group means and you want to know exactly which groups differ (so only when you reject the null) and you have three or more levels of the treatment/IV
what does the wald test do?
the opposite of the Lagrange test: it asks which parameters/paths should be deleted from the model. Non-significance indicates that dropping that parameter estimate does not ruin the model fit. This test is important because it yields a simpler, more parsimonious model (easier to interpret).
what is stepwise regression?
the order of entry of IVs is based solely on statistical criteria (a bit controversial). The meaning and interpretation of the variables is not important, only whether they statistically add something to the prediction. The IV with the strongest relationship to the DV goes into the equation first, then the IV that accounts for the next most variability after removing the variability accounted for by the first, and the process continues until no other IVs account for a significant amount of variability in the DV.
what is a primary analysis?
original analysis of data in a study (analyzing your own data that u collected)
what does the confidence interval tell us?
the range of values within which we are confident the population value will fall (usually we use a 95% confidence interval for a specific range of values on a variable)
constructs are always shown in
ovals in the structural equation model
briefly explain the difference between the terms prevalence and incidence.
prevalence is how common a disease is in a group or population out of the entire population, and incidence is the rate of new cases of a disease (people who are developing the disease)
logistic regression emphasizes the probability of particular outcome for each case. what does this mean?
e.g., the probability that person K has hay fever given geographic area, season, temperature, and nasal stuffiness
what do multivariate tests protect against?
protects alpha level -- protects us against rejecting the null incorrectly at too high a level
what are applied outlets for research?
publications designed for researchers to communicate the practical applications of psychological research to non-researchers (for example, writing research papers to be used by health care providers)
what is the dominant approach of measurement in psychology?
quantitative research
what is the difference between quantitative and qualitative research?
quantitative research depends on careful scale construction, is concerned with the reliability and validity of measures, administers scales in a standardized way, and focuses on survey items. In qualitative research, by contrast, the researcher is the instrument, and the credibility of the research relies on the skill of the researcher
what is parcelling?
randomly parcel a ten-item scale into 3 parcels, ending up with two 3-item parcels and one 4-item parcel; randomly assign items to those parcels, and each parcel acts as an indicator of the construct we are measuring (for example, self-esteem)
what is a secondary analysis?
reanalyzing published work (analyzing someone else's data); the purpose is to see how we can answer the research question with better statistical techniques and how to answer new questions with the already published data
quantitative research has this belief that complex behaviors can be reduced to objectively measurable segments. what is this belief known as?
reductionist or simplistic view
what is forward selection?
the regression starts off empty and IVs are added one at a time if they meet the statistical criteria for entry. The strongest IV goes in first, then the next strongest, and the process continues until there are no more IVs that significantly predict the DV (once an IV is in the equation it stays)
what are practitioner outlets?
resources for non-researchers who are trying to use research in their work lives. These differ from typical research manuscripts or articles because there are fewer citations, the introduction is short, and there is less emphasis on methods and more emphasis on findings and practical implications
what problems does stepwise logistic regression have?
same as stepwise multiple regression - capitalizes on chance (since its data-driven) - results are influenced by random variation in the data
as members of research communities, it is important for scientists to share
scientific findings with one another
what do lines with arrow on both ends (or a curved live between two variables) in SEM show?
they show a correlation (an unanalyzed relationship)
what do partial correlations tell you?
the relationship between a particular IV and the DV, controlling for the effects of the other IVs on both that IV and the DV
what is standard entry?
simultaneous entry of all predictors and the unique contribution is allowed to discriminate between the groups
R^2 tends to be inflated in what types of samples?
in small samples, so in that case you use adjusted R^2
(Hayes)when you are working with a complex conditional process model it is often useful to begin by looking at what?
the smaller components of the bigger model, which can be accomplished by looking at and interpreting the regression coefficients from the individual equations before bringing the results together
what is multicollinearity?
very strong relationships among predictor variables (can happen due to mistakes or improper recodes, but if not mistakes it means the variables are really highly correlated and you don't need all of them in the regression equation)
(Hayes) mediation, moderation and conditional process analysis can also be conducted using what software?
structural equation modeling (SEM) software
Eta squared (η²)
the symbol for effect size in MANOVA: .01 = small effect, .06 = medium effect, .15 = large effect
what is the most traditional way of conducting research conferences?
symposia
what is a macro?
syntax files you run that add a program into SPSS
what is stepwise entry?
takes the strongest discriminating variable and adds it first, then finds the next best, and continues until there are no other significant discriminating variables that will improve prediction of the DV
what is winsorization?
taking the highest score and transforming it to the next-highest score, then checking that the distribution is no longer statistically skewed; a sketch follows
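A quick winsorizing sketch in Python (scipy; the scores are made up):

import numpy as np
from scipy.stats.mstats import winsorize

scores = np.array([3, 4, 4, 5, 5, 6, 6, 7, 8, 40])   # 40 is an extreme high score
print(winsorize(scores, limits=(0, 0.1)))            # the top 10% (here, 40) becomes 8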
what are prior probabilities?
they tell SPSS whether to assume the groups are of equal size or unequal size
what do beta coefficients tell you?
they tell you how important the predictor variable is in the prediction of Y (the DV)
why is effect size important?
it tells you the impact of the treatment on the DV. The significance level can't do this: it just tells us that if we reject our null, our results are not likely due to sampling error and there is an actual difference. Effect size tells us how important that difference is.
what is a manuscript?
term to describe a paper that is not published yet but is being prepared to be submitted to a journal
What is cross-validation? (used in stepwise)
test on one sample and then validate the importance of particular variables on a second sample
what is the goal of SEM analysis?
testing specific hypotheses about the model. might look at the importance of a certain path, can modify an existing model
what does Y' that is closer to 1 mean?
that DV is likely to have occurred
what is the null hypothesis for multiple regression?
that the correlation between the IVs and the DV is 0 (no correlation)
what does it mean if we have a significant correlation with a discriminant function?
that the variables significantly contributed to the discriminant function, meaning they significantly predict group membership (e.g., men who are perpetrators of sexual assault and men who are not)
how does DFA differ from MANOVA?
the IV is the DV and the DV is the IV --grouping variable becomes DV and the variables become IV
what are the concerns of epidemiology?
the causes of disease, disease prevention, and allocation of resources (for certain health problems or facilities).
what does log-likelihood multiplied by -2 approximate? (-2LL)
the chi squared distribution
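A quick sketch of computing the model chi-square from -2LL values in Python (scipy; the -2LL numbers are made up):

from scipy import stats

neg2ll_step0 = 154.08   # -2LL with no predictors (hypothetical)
neg2ll_step1 = 140.23   # -2LL with 2 predictors added (hypothetical)

chi_sq = neg2ll_step0 - neg2ll_step1   # the drop in -2LL is the model chi-square
df = 2                                 # degrees of freedom = predictors added
print(chi_sq, stats.chi2.sf(chi_sq, df))   # a small p => the model is improving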
the very last model r square in multiple regression is
the combined variability accounted for by all the IVs in the model on the DV
what is the research hypothesis for multiple regression?
that the correlation between the combination of IVs and the DV significantly exceeds 0
what does a semi-partial (part) correlation tell you?
the correlation between a particular IV and the DV with the effects of all other IVs removed from only that IV
what is canonical correlation?
the correlation between the combined set of variables (IVs) entered into the DFA and the DV
variability that exists between conditions is due to
the independent variable and error
(Hayes) what is conditional process analysis?
the integration of mediation and moderation analyses in a unified statistical model
what is an introduction?
the introduction describes the importance of the research problem and reviews studies relevant to your own. It ends with the specific hypotheses or objectives of the study and places the study in a broader context.
what are the statistical problems associated with the vote counting method?
the issue is sample size differences: a small-sample study would count the same as a large-sample study, but smaller samples have lower statistical power and the effect is underestimated -- meta-analysis combines sample sizes and eliminates the issue of small sample size
the smaller the absolute value of the regression coefficient (if the IVs are not correlated with each other)
the less important the IV is to the regression equation
(Hayes) conditional effect
the linear function which is the conditional effect of IV on DV
how do we know there is an interaction effect from a graph?
the lines in the graph will cross, touch or have different slopes (they don't always have to touch). if the lines are parallel then there is no interaction
What would the F ratio look like if your null is false?
the mean square between treatments is larger than the mean square within treatments; this means the treatment had an effect and the F ratio would be significantly greater than 1
the bigger the absolute value of the regression coefficient (if the IVs are not correlated with each other)
the more contribution that an IV is making to the regression equation (the more important it is)
what is an odds ratio?
the odds of an outcome based on changes in the predictor
what is the review process?
the process used to determine which research is of high enough quality to be presented at a conference or published in a journal
what are the prior probability tables telling us?
the proportions of individuals who would be classified into a certain group
(Meltzoff and cooper) STAGE 5: ANALYZING AND INTEGRATING THE OUTCOMES OF STUDIES
the separate research reports collected by the synthesists are integrated into a unified statement about the results of the research.
what is the variables entered/removed table in DFA ANALYSIS?
the stepwise statistics (predictors in order of importance) - the sig part of the table tells us if the model is significant at predicting group membership with the addition of that predictor into the function
what is effect size?
the strength of the effect of the IV
what is epidemiology?
the study of disease occurrence in the population
what does the method entry box determine in logistic regression?
the type of regression you're running (method: enter)
briefly explain the value in taking the web of causation approach that was described in lecture.
the web of causation approach allows you to examine several factors and how they interact to cause disease. This is better than targeting a single direct cause because it is more practical for preventing disease by educating about risk factors and what to avoid.
if the confidence interval does not contain 0, what does that mean?
there is a significant difference between the groups
what is the difference between bivariate regression and multiple regression?
in bivariate regression there is one independent variable used to predict one DV/outcome; multiple regression differs because several IVs are combined to predict the DV
what are the kinds of research questions we can look at with DFA?
there must be past research and theory on which to base our predictors in order to predict group membership reliably
what are the causes of disease?
there are always many factors that cause a disease (it is multifaceted) - tuberculosis (not everyone exposed becomes ill -- other factors contribute, like poverty, malnutrition, alcoholism)
what are cox and snell r square and nagelkerke r square?
these are modified R-squared statistics and should be interpreted as the approximate percentage of the outcome accounted for by the predictors
why are path diagrams important in SEM?
they allow us to diagram hypothesized relationships and help us determine what we expect the model to look like - the lines drawn allow for the creation of equations in SEM
what do poster templates look like?
they are organized with the same sections of research paper but more condensed and more visual
what is a general abstract?
they have lower word limits and info included in a single paragraph
what do lines with arrow on one end in SEM show?
they indicate prediction
what are measured variables?
things that can be directly observed or measured, like BMI or age **always shown in boxes
what is the omnibus test?
this is our chi square table. Need to look at significance value. if significant then adding those predictors significantly improved our prediction of the outcome
what is the maximum likelihood method?
this method selects coefficients that makes observed values/predictions most likely to have occurred. The goal is to find a combo of predictors that maximize the likelihood of obtaining observed frequencies.
what does direct logistic regression tell us?
it tells us the unique contribution of each IV in predicting the outcome
what is the purpose of science?
to advance knowledge, which happens when knowledge is shared (it is a collaborative, cumulative process of advancing knowledge, done through many studies)
(Hayes) why did clinical researchers think it was important to combine mediation and moderation analyses?
to better understand psychological phenomena and the effectiveness of treatments in various populations (looking together at variables identified as mediators and moderators in treatment research, which can help us develop adaptive interventions over time for an individual)
what are the identifying goals of meta analysis?
to integrate findings across multiple studies without being too broad: narrowing your focus, identifying your goals, and having precise research questions
what are the historical roots of quantitative research?
to make psychology more scientific
what is the goal of DFA?
to predict group membership from a set of predictors. (for example using attitudinal, experiential, and situational predictors to predict if someone is a sexual perpetrator)
why should you use cut-offs in logistic regression?
to reflect the relative cost of each type of error: if a disease is very important to detect, use a lower cut-off (.30 or .40); if it is important not to treat someone until you are sure they are diseased, the cut-off is higher (.60 or .70 instead of .50)
(Hayes) what was process designed to do?
to simplify mediation, moderation, and conditional process analysis, as the estimation of indirect effects, the index of moderated mediation, and the probing of moderated mediation and estimation of conditional (indirect) effects is automated with minimal syntax or an easy-to-use point-and-click interface (for SPSS users).
if Wilk's lambda multivariate test is not significant, what does that mean?
the treatment did not work (no significant multivariate effect of the IV on the combined DVs)
An effect size tells us the proportion of variance in the DV that can be explained by the IV. true or false??
true
a primary purpose of DFA is to find the best set of predictors that will correctly classify people into groups. t or f?
true
according to the lecture, logistic regression is more flexible than multiple regression. t or f?
true
according to the reading, qualitative research reports tend to be longer in length. t or f?
true
according to the rockwood and hayes article, many methodologists recommend using bootstrap confidence intervals to make inferences about the indirect effect (to determine if it is statistically significant). t or f?
true
as discussed in the lecture and the readings, a primary strength of SEM is the ability to assess overall model fit including both direct and indirect effects of variables. t or f?
true
logistic regression allows for an estimate of an odds ratio that can take into account multiple covariates. t or f?
true
logistic regression is particularly useful when the relationship between the predictor (X) and the outcome (Y) is nonlinear. t or f?
true
researchers shud routinely report effect sizes and the effect size confidence intervals. t or f?
true
the article by schmidt (2010) cautions researchers that data can sometimes lie. t or f?
true
the primary advantage of using a case-control study over a cohort study is its efficiency when investigating outcomes that are rare. T or F?
true
to interpret regression coefficients you need to consider whether the IVs are correlated or uncorrelated. t or f?
true
qualitative research allows you to approach a topic without being constrained by the predetermined categories of standardized instruments. true or false?
true, and it allows you to be open to new categories
how do u interpret a coefficient table in multiple regression?
you look at the t and significance level, which tell you whether each IV is significant; to interpret, you look at the standardized beta coefficients (a higher number means a stronger relationship, and the sign tells you the direction)
what is Y'?
Y prime: the predicted value of the DV from the logistic regression equation based on a set of predictors, which tells you the probability of the DV occurring (see the equation below)
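In its standard form (a textbook formula, consistent with but not written out in these notes), Y' = e^(A + B1X1 + ... + BkXk) / (1 + e^(A + B1X1 + ... + BkXk)), which keeps the predicted probability between 0 and 1.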
what does the classification results table tell us?
under percent, it'll tell you what percentage of individuals were correctly classified into a group
what does chi squared value represent?
unexplained variance in the model (smaller values are better)
What is bootstrapping?
using the sample data to create many resampled data sets to get an estimate of the mediation effect and test whether it significantly differs from 0 (requires a macro in SPSS and relies on confidence intervals instead of significance tests); a sketch follows
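A minimal bootstrap sketch for an indirect effect a*b in Python (numpy/statsmodels; simulated data): resample cases with replacement, re-estimate a*b each time, and check whether the 95% CI excludes 0.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
n = 150
x = rng.normal(size=n)
m = 0.5 * x + rng.normal(size=n)
y = 0.4 * m + rng.normal(size=n)
df = pd.DataFrame({"x": x, "m": m, "y": y})

boot_ab = []
for _ in range(1000):                    # 1,000 resamples (often 5,000+ in practice)
    s = df.sample(n=n, replace=True)     # resample cases with replacement
    a = smf.ols("m ~ x", data=s).fit().params["x"]
    b = smf.ols("y ~ x + m", data=s).fit().params["m"]
    boot_ab.append(a * b)                # indirect effect in this resample

lo, hi = np.percentile(boot_ab, [2.5, 97.5])
print(lo, hi)   # a 95% CI excluding 0 => significant indirect effect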
what is between groups variance?
variance between treatments or groups
what is within groups variance?
variance within each treatment or group
What is hierarchical regression?
when IVs are entered into the equation in an order specified by the researcher. Each IV is assessed in terms of what it adds to the equation at its own point of entry - the variability accounted for by the first IV counts first, the second IV takes up anything not accounted for by the first, and the third takes up anything not accounted for by the first two
what is the blind peer review process?
when a paper is sent to two or three reviewers who don't know who the researcher is
what is standard multiple regression?
when all the IVs are entered into the regression equation at once and each IV is assessed in terms of what it adds to the prediction of the DV beyond the other IVs (only the unique contribution of that IV counts -- if an IV overlaps too much with the other IVs it will be seen as unimportant even when there actually is a strong relationship; this is why zero-order correlations need to be considered along with the unique contribution the IV makes to the prediction of the DV)
what is direct logistic regression?
when all the predictors are entered into the regression equation simultaneously. This can be used when we have no specific hypothesis about the order or importance of predictors (similar to standard multiple regression)
for which comparisons do we need to consider effect sizes?
when comparing across groups and the relationship between two variables
when is logistic regression most useful?
when the distribution of responses on an outcome (DV) is expected to be nonlinear with one or more IVs (e.g., blood pressure and the probability of heart disease might be related, but not linearly)
what is full mediation?
when the mediator fully explains the relationship between IV and DV
what is partial mediation?
when the mediator partially explains the relationship between IV and DV
what is moderation analysis?
when a third variable (moderator) influences the relationship between the IV (predictor) and the DV (criterion); it can be better thought of as an interaction
when would we want to use MANOVA? ex
when we have multiple DVs ( ex. testing teaching methods and how they might impact several aspects of student performance such as the following: exam performance, critical thinking, and writing skills).
can MANOVA also have multiple IVs?
yes
can log transformations be used to deal with skewed data?
yes
can quantitative research present broad generalizable findings very well?
yes
can we include covariates in DFA?
yes
do clinicians use info from epidemiological research to diagnose and treat patients?
yes
if we have a large sample size are we more likely to have a significant chi-square?
yes
if your treatment has an effect would you expect a variability among your sample means?
yes
is logistic regression more flexible than other statistical techniques?
yes
is the whole more than the sum of the parts in qualitative research?
yes
do you need a larger sample in stepwise regression?
yes (because it capitalizes on chance)
in logistic regression can you look at the interaction between two predictors?
yes (how they interact with one another to influence the outcome)
does each predictor variable in logistic regression have its own beta coefficient?
yes (just like multiple regression)
can multiple regression be used for non linear relationships?
yes (looking at relationships between two IVs)
do psychologists frown on qualitative research?
yes, and qualitative researchers frown on quantitative researchers
does -2LL go down when predictors are added into the model?
yes because as predictors are added we are explaining more variability in our model.
does qualitative research allow u study things in depth?
yes but you can only have a small number of cases
Does the level of research quality for the review process vary?
yes it varies based on the type of conference and the level of the journal-- where some conferences accept almost all submissions and others are very selective
are some conferences presentations invited?
yes some can be solely invitation based but most conference presentations are based on peer review (similar to publishing article to a journal they need to be peer-reviewed)
do we tend to turn qualitative data to quantitative data in psych research?
yes
when you report a multiple regression what do u report?
you report R, and the t and beta for each IV
what would you do if your univariate tests is significant?
you would do a post hoc test (where are the mean differences occurring)
example of multiple regression
• What factors predict grad school success?
- DV = grad school success (grad GPA)
- IV1 = GRE scores
- IV2 = research experience
- IV3 = motivation
Regression equation: Y' = A + B1X1 + B2X2 + ... + BkXk (the Xs are the IVs, the Bs are the regression coefficients, i.e., the weights on each IV, and Y' is what we predict); a sketch follows
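The grad-school example sketched in Python with statsmodels (the variable names mirror the example, but the data are simulated, so the numbers are made up):

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 120
df = pd.DataFrame({
    "gre": rng.normal(size=n),
    "research": rng.normal(size=n),
    "motivation": rng.normal(size=n),
})
df["grad_gpa"] = (0.3 * df["gre"] + 0.2 * df["research"]
                  + 0.2 * df["motivation"] + rng.normal(size=n))

model = smf.ols("grad_gpa ~ gre + research + motivation", data=df).fit()
print(model.rsquared)   # R^2: variability in the DV accounted for by the IVs
print(model.params)     # A (Intercept) and the B weight for each IV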
there are 3 commonly used effect sizes and all of them index association between two variables. what are they?
• correlation coefficient (r) • standardized mean differences (Cohen's d and Hedges' g) • odds ratio (o or OR)
According to Syed and Nelson, what are the different ways in which we can establish reliability in qualitative research?
• gold standard/master coder (one person codes everything and someone else codes 20%; as long as that matches the master coder, you have reliability) • reconcile differences via consensus (try to come to a consensus) • third-party resolution (if there is disagreement, a neutral third party resolves it) • averaging (e.g., average the number of smiles rated by different coders)
as a method, what are the data collection strategies of qualitative research?
• open-ended interviews • direct observation • written documents
what are the strengths of meta analysis ?
• systematic and disciplined review process (consistent) • combining and comparing large amounts of data (can combine lots of data) • sophisticated reporting of data
Computer Programs - Meta-Analysis
• Stand-alone programs: Comprehensive Meta-Analysis, Mix, Review Manager (Cochrane Collaboration) • Statistical packages: R (open-source statistical software), SPSS (macro available from David Wilson, George Mason University)
what are the different comparative fit indices?
◉ Bentler-Bonett Normed Fit Index (NFI) ○ NFI > .90 (may range from 0-1) -- closer to 1 is good fit; anything less than .90 is poor model fit ○ influenced by n, with small n producing lower NFI ◉ Comparative Fit Index (CFI) -- tries to get around the issue of small sample size ○ CFI > .95 (greater than .95 is good fit) ◉ Root Mean Square Error of Approximation (RMSEA) -- has issues with small sample size ○ estimates lack of fit ○ look for .06 or less (.10 or greater is poor model fit; anything less than .06 is good fit)
what are the many different names for SEM?
◉ causal modeling ◉ causal analysis ◉ simultaneous equation modeling ◉ analysis of covariance structures ◉ path analysis ◉ confirmatory factor analysis
How can we assess the adequacy of a model?
◎ Closeness of model and sample covariance matrices - ◉ chi-square (non-significant is desired) ◎ Testing Theory - each theory (model) generates its own covariance matrix ◉ which theory produces the estimated matrix closest to the observed matrix? ◎ Amount of Variance Accounted for by Factors ◉ amount of variance in DVs accounted for by IVs ◉ use R2-type stats ◎ Parameter Estimates - examine path coefficients ◉ relative importance of different paths
what are the disadvantages of SEM?
◎ Complexity ◉ jargon ◉ many techniques ◉ takes time to learn how to define variables, paths, etc. ◎ Ambiguity ◉ results may be difficult to interpret
What is R^2?
% variance in Y accounted for by X - tells us the amount of variability in the DV that can be accounted for by our set of IVs
give an example of a two factor design u can use MANOVA for
- 2 IVs: humidity (low/high -- 2 levels) and temperature (three different temps -- 3 levels) - DVs: speed, endurance, and quality * between subjects -- 6 different samples 2 BY 3 FACTORIAL DESIGN (two levels of one IV and three levels of the other IV)
(Wiest et al.) Conclusion
- Logistic regression is a powerful tool for assessing the relationship between a covariate or exposure and a binary event outcome, and it allows us to easily adjust for potential confounders when we examine associations of interest
2 by 2 research design
- audience v no audience (IV) - low v high self esteem (IV2)
what are the statistical tests to assess model modification?
- chi-square difference test - Lagrange multiplier test - Wald test
(Hayes) Mediation, Moderation, and Conditional Process Analysis NOTES
- mediation is used to test hypotheses about the process by which one variable causally transmits its effect on another variable through one or more intervening mediator variables. the purpose is to explain how the effect of X on Y operates (ex: self-esteem identified as a mediator of the effect of body dissatisfaction on depression and anxiety in adolescents) - Simple model: Y = iY + cX + eY (Y is the outcome of interest, X is the cause, iY is the intercept, eY is the error term, and c represents the association between X and Y) - the purpose of mediation analysis is to better understand how X affects Y through a sequence of causal events in which X influences one or more mediator variables - the total effect of X can be partitioned into a direct effect and an indirect effect of X - M = iM + aX + eM: the coefficient a quantifies the effect of X on M (first leg of the indirect path) - Y = iY + c'X + bM + eY: the coefficient b quantifies the effect of M on Y statistically controlling for X (second leg of the indirect path) - bootstrapping is a resampling procedure used to generate an empirical approximation of the sampling distribution of a statistic (here, the indirect effect ab) - a bootstrap sample is analyzed by taking a simple random sample of n cases from the existing data, where n is the original sample size and cases are sampled with replacement (repeated at least 1,000 times) - studies of the bootstrap method found that, compared to alternative approaches, the percentile-based bootstrap provides a good balance of power, coverage, and type 1 error rate - historically, mediation analyses were used only after someone had established an association between IV and DV - it is now accepted that a total effect is not a requirement of, or prerequisite to, mediation analysis because a lack of correlation does not disprove causation (increased aerobic activity might not be correlated with body weight because exercise also increases calorie consumption, masking the weight loss that would occur if calorie intake were held fixed) - since the indirect effect is what carries information about the mediation of the effect of X on Y, rejection of the null hypothesis of no total effect is not a sensible prerequisite of mediation analysis - mediation is a causal process because you can't talk about it without using causal language - there is an assumption that no variables confound the observed associations between variables in the causal system, and in mediation this assumption is at best only partially met. An association between two variables is confounded by a third if that third variable causally influences both of them (we can never truly know if all confounds are controlled) - even in experiments we can never know for certain that the mediator comes before the dependent variable - IVs can often transmit their effect on the DV through multiple mediators at the same time - multiple-mediator models are important for clinical research because they allow researchers to test competing theories and compare mechanisms that may be responsible for the effect of the IV on the DV - When there is no hypothesized causal pathway between the mediators, the model is termed a parallel mediator model (ex: some mediators may turn out not to mediate any relationships) - serial mediation model: a mediation model in which one or more mediators exerts its effect on another mediator (ex: brain injury causes anger, which in turn causes depression, which subsequently leads to an elevated risk of suicide)
In models involving multiple mediators, the indirect effect through a given mediator is termed a specific indirect effect. The sum of the specific indirect effects across all k mediators is the total indirect effect of X -There are a few important properties of a parallel mediation model worth acknowledging. First, the indirect effect of X through a given mediator is expected to change with the inclusion of additional mediators if the additional mediators are correlated with the mediator in question. Second, the specific indirect effects are not influenced by the scale of the mediators. That is, each specific indirect effect is scaled only in terms of the scales of X and Y.
Meta-Analysis - Moderation
- random effects model - effect sizes will vary across different populations, and the size of the effect is influenced by other factors. moderation: the relationship between X and Y is weak at one level of the moderator and strong at other levels of the moderator
(Syed & Nelson) Methods of Coding and Establishing Quantitative Reliability: the coding process
- reliability isn't just a single coefficient presented in the methods section -- it is a process that involves multiple time-intensive steps 1- the first step of the coding process involves deciding on a unit of analysis (ex: in a study of a hypothetical data set consisting of three self-defining memories for each of 200 participants, the unit of analysis can be the individual, the memory (where each memory receives its own set of codes), or some defined unit of text (words, phrases)) - the unit of analysis must be determined from the outset to make sure the coding process is consistent with the research questions and analytically feasible 2- the second step is to develop a coding manual. the first thing to do is to examine the research questions and determine which coding scheme will successfully address them - coding schemes are generally developed in two ways: using a deductive, theory-driven, top-down approach or an inductive, data-driven, bottom-up approach -- the theory-driven approach involves deconstructing an existing psychological theory into codes that can be applied to the data, whereas the data-driven approach involves the construction of a coding scheme based on the data collected - this step involves becoming familiar with the data through careful reading, watching, or listening, and rereading, rewatching, or relistening to the data collected - after you review the data, a working coding manual should take form. themes should be noted and compiled in a dynamic document, and themes can be based on the theory the researcher is applying or can be derived purely from the data - developing a coding manual should be an iterative process (where coding categories are applied back to the original data to ensure specificity and accuracy, which allows you to refine categories). iterative means to constantly refine - one of the most important decisions made in coding development is the number of codes to be used (more codes allow greater complexity to be captured in the data, but this comes at the cost of decreased reliability because the coding scheme is more complex) - coding manuals can be hierarchical, in which microcodes are nested within macrocodes 3. the third step, after developing a coding manual, is to train coders - the general method for training coders follows a three-step procedure. First, the researcher should provide the coding manual to all coders involved in the project. This manual should be discussed in detail and all initial questions addressed. Second, the researcher should provide sample data randomly drawn from the data set, with which coders can practice the coding scheme. These initial codes should be discussed thoroughly with the coding team, and the exclusion and inclusion process should be detailed. At this stage, the coding manual is often revised to reflect common questions put to the researcher by coders, in order to document any decisions made in the early training stages. the quantitative researcher must make decisions a priori about the coding process, the percentage to be coded to reach reliability, the reliability coefficient needed, and who will help with the coding process 4- the fourth step (establish reliability -- one method) -- the gold standard/master coder. one member of the research team serves as the gold standard or master coder: they code all the narratives in the data set, while a second member serves as the reliability coder and codes a subset of the total data set. the reliability coder's ratings are used only to establish interrater reliability with the master coder.
- the master coder's codes are used in the final analyses - other configurations are possible, such as one master coder and five reliability coders. 5- the next step (establish reliability -- another method) is to reconcile differences via consensus. Unlike the gold standard/master coder approach, with this approach two (or more) members of the research team code all of the data, with interrater reliability calculated based on the entire set. Any discrepancies in the coding are then discussed by the research team and resolved through consensus, and thus the final set of codes for analysis is based on multiple researchers' input rather than just one (a weakness is that one member of the research team may convince everyone that they are correct) - one way to avoid coercive consensus is to have a third member of the research team -- neither of the original two coders -- resolve the discrepancies. To use this approach, it is critical that the person doing the resolving has sound judgment and is well versed in the coding manual. - another way to resolve discrepancies: averaging, which involves taking the average of the two raters' codes
(Fink et al.) Considerations for Clinical Psychologists Using Epidemiologic Methods
- two considerations: the availability of data, and the fact that many of these studies employ what are known as complex sampling designs - much clinical psychology research has relied on non-representative samples; these can be helpful despite the non-random sampling, but they lack the design-based rigor to support strong generalizations and inferences about larger populations - many epidemiologic studies are designed and conducted with particular sampling methods - be aware that epidemiologic research differs from psychological research: the designs and statistical analyses are different and require special training (an issue for many clinical psychologists who don't realize this)
list ONE advantage and ONE disadvantage of MANOVA over ANOVA.
-Advantage: reduces type 1 error if you have multiple dependent variables. - disadvantage: many assumptions need to be met for MANOVA.
(Schmidt) Detecting and Correcting the Lies That Data Tell
Discriminant Function Analysis (DFA)
EPIDEMIOLOGY AND QUAL
PUBLISHING AND PRESENTING RESEARCH
if some IVs are not good predictors, we can remove them or swap in others
interpret analysis on MANOVA quiz 5 for number 7.
what is grounded theory?
a methodology that involves the construction of hypotheses and theories through the collection and analysis of data • typically involves the following characteristics: • simultaneous involvement in data collection and analysis • construction of analytic codes and categories from data • use of the constant comparative method/analysis (compare new codes to old codes) • developing theory during each step of data collection and analysis
some important IVs that share variance with other IVs in a standard multiple regression may not be significant, even though those IVs in combination are largely responsible for the size of R²
if everyone reacts the same in all your treatment conditions, what should your variability be? what about the F ratio? (true null)
0 -- the F ratio would be 1 (error/error)
what are the three steps of ANOVA?
1. compute within-treatment and between-treatment variance 2. determine if they are significantly different by computing the F ratio 3. if the F ratio is significant, then the null is rejected. Post hoc tests can then be used to determine specific mean differences.
(Meltzoff and cooper) The Stages of Research Synthesis
1. formulating the problem 2. searching the literature 3. gathering info from studies 4. Evaluating the quality of studies 5. Analyzing and integrating the outcomes of studies 6. Interpreting the evidence 7. Presenting the results
what are the two reasons why we might want to modify our model in SEM?
1. improve model fit 2. test hypotheses
name three indices to assess reliability with qualitative research
1. percent agreement 2. delta 3. kappa
how many indicators do factors have?
2 or more
what does an R² of .345 mean?
34.5% of the variability in DV is accounted for by the IVs in the regression model
What are randomized controlled trials?
They can be considered a special form of prospective cohort study in which exposure is assigned randomly
What are counts?
Counting the number of people in a group that are studied with a disease or disease characteristics (ex: 42 students at a certain dorm in the U of m campus had a certain illness)
What is standardized ratio?
Differences between groups take into account another important characteristic (age, ethnicity, etc)
How can we make a count descriptive of a group?
It needs to be seen in proportion to that group: divide the count by the total number in the group, which gives us a relative frequency
DFA is ______________ turned around
MANOVA
Parole Board Example
MANOVA -IV: recidivism (criminal; non-criminal) -DVs: education level, drug use, # of problem behaviors that occurred in prison, level of good behavior in prison DFA -IVs: education level, drug use, # of problem behaviors that occurred in prison, level of good behavior in prison -DV: recidivism (criminal; non-criminal)
Is everyone in a population at risk of developing a disease?
No
What is proportional mortality rate?
Number of deaths due to a particular cause divided by the total number of deaths -ex: leading cause of deaths in males 18-25
what does the solution given by SPSS tell us?
The solution gives the odds of being in the response group given some value on the predictor
what are the different kinds of conferences presentations ?
Symposia Panel Discussions Round Tables Posters
what is the null hypothesis for one way anova?
There is no difference among sample means at any of the levels of the IV
what is a commonly used post hoc test?
Tukey's
When is proportional mortality rate used?
When you can't use the total number at risk as the denominator
what is the most commonly reported test for multivariate testing in MANOVA?
Wilks' lambda
What is inductive reasoning?
a bottom-up approach where you go from the specific to the general (typically used in qualitative research). first you examine the data and then come to conclusions: you make an observation, then look for patterns, and then draw a general conclusion
what does a significant multivariate MANOVA allow you to do?
a significant MANOVA means the univariate ANOVAs can be interpreted
what are the sections of a research paper/manuscript?
abstract, introduction, method, results, discussion
how to run DFA in spss?
analyze > classify > discriminant - the grouping variable is the DV (define groups, where 0 is non-perpetrator and 1 is perpetrator) - the IVs go in "independents," and then you determine the order of entry, which is either stepwise or standard
how do you run a standard multiple regression in SPSS?
analyze > regression > linear -drag DV and IV in appropriate boxes
what does odd ratio of 1.73 mean?
as someone's scale score goes up by one unit, the odds of them being a perpetrator (or whatever the outcome is) are multiplied by 1.73
what do positive values of beta mean?
as the predictor goes up then the probability of DV goes up
what should the sample size be for small to medium sized models in SEM?
at least 200 (but paths are limited -- can't specify many pathways)
what is the alternative hypothesis for one way anova?
at least one of the sample means comes from a population different from that of the other samples means.
why can't you have outliers in MANOVA?
because MANOVA is sensitive to extreme scores
why are the equations produced by logistic regression more complex than those in multiple regression?
because there is no linear relationship between the predictors and a dichotomous outcome in logistic regression
what is the purpose of qualitative research?
can be used for program evaluation, academic research, and thesis or dissertation. Good research drives theory in academic research
what are the values of Y'?
can range between 0 and 1
what means do we look at for interaction effects?
cell means (all means in all conditions)
what is stepwise entry?
a combination of forward and backward entry: the model starts empty and variables are added one at a time based on which is the strongest (the best predictor first, then the next best), but they can be deleted at any step if they are no longer contributing significantly to the regression equation
what is an example of an ANOVA study?
comparing the effectiveness of three different teaching methods, meaning that teaching method is the IV. The DV is the measure of student performance. There are three different teaching methods and each will be compared to the others. ◦ Making 3 comparisons: Method A vs. B, Method B vs. C, Method A vs. C
what are latent variables?
constructs or factors we can't directly measure but can use scales to approximate
what is the input for SEM planning?
covariance matrix of all the variables included in the model
what are examples of group membership?
criminal or non-criminal, high or low blood pressure
which of the following is required in order to conduct a logistic regression? a. continuous independent and dependent variables b. dichotomous independent and dependent variables c. continuous or dichotomous independent variables and a continuous dependent variable. d. continuous or dichotomous independent variables and a dichotomous dependent variable.
d
what does DFA stand for?
discriminant function analysis
what is the only way to improve the accuracy of you logistic regression model?
find better predictors
main effects can appear on a graph
a main effect of the IV on the x-axis appears when the lines slope upward or downward; a main effect of the IV shown as separate lines (humidity, for example) appears when those lines are separated from each other
what is the goal of logistic regression?
to correctly predict the outcome of individual cases based on a set of predictors. it allows us to predict a discrete outcome (ex: predicting group membership -- do they belong to a group or not). we can't use multiple regression here because we're predicting a dichotomous outcome
(Hayes) what is the advantage of using structural equation modeling?
it allows for the estimation of latent variable models which can help reduce the effects of measurement error on the estimation of effects when a structural model is combined with a good measurement model of the latent variables.
what are the reasons for coding study characteristics?
it's descriptive and for the purpose of comparison
what is a revise and resubmit?
a decision in which the journal does not accept the manuscript as is but invites you to revise it and resubmit it to the same journal for another round of review (in contrast, an outright rejection means you can no longer submit the manuscript to that journal, though you can revise it and submit it to another one)
what is R?
multiple correlation coefficient
is it common to have more than 2 or 3 significant discriminant functions?
no
path coefficients
the numbers on the arrows/lines (the paths)
What is naturalistic observation?
observing and recording behavior in naturally occurring situations without trying to manipulate or control the situation (participants won't know you are observing)
in anova how many DVS do you have?
one
what do larger -2 log likelihood (-2LL) values mean?
poorer fit of the model
what are poster presentations?
posters are often the first conference experience that students have. researchers pin up posters describing their research studies and allow people to walk from poster to poster, browse, and ask questions (the research is very condensed)
What is publication bias?
preferring manuscripts that have significant results
how do u enter IVs into SPSS in stepwise regression?
put them in blocks and pick the stepwise method in the dropdown box below the blocks
DFA does classification. what is classification?
putting people into groups based on a set of predictors
what category does SPSS solve for in logistic regression?
it solves for the category coded as 1 (1 should be the response category, like illness, and the reference category should be 0, meaning no illness)
what is a measurement model?
the part of the model that specifically describes how measured variables relate to factors is known as the measurement model
_____________ multiple regression treats each variable as if it were entered ________ in the regression equation
standard; last
what does an R² change of .068 mean?
that the variable explains an additional 6.8% of the variability in DV
what does the iteration history table tell us?
the -2LL value when no predictors are added (the null model)
what is the initial classification table at step 0 of the logistic regression?
the null model before any of the predictors have been added - in this model SPSS predicts that everyone is in the bigger category, since the odds of being in the bigger group are better (e.g., better odds of passing than failing). SPSS aims to maximize the likelihood of correct prediction.
what are t-tests?
they compare two sample means
in what ways are research findings communicated with others?
through conference presentations, journal articles, and applied outlets
what are we trying to do in DFA?
we are trying to see if we can predict group membership (DV) from scores on the attitudinal, experiential, and situational variables (IVs) (can we reliably predict group membership based on a certain set of predictors?) ** basically the variables discriminate between the groups
are practitioner journals easier to publish compared to academic journals?
yes
when the DV or outcome variable is dichotomous, is the assumption of linearity violated?
yes, almost always, so the solution is to transform the data using the logit (logarithmic) transformation
(Hayes) can moderators be either categorical or continuous?
yes and can moderate the effect of a categorical or continuous variable
what are the three different kinds of logistic regression?
• Direct Logistic Regression • Sequential Logistic Regression • Stepwise Logistic Regression
Computer Programs for SEM
◎ LISREL ◎ EQS ◎ AMOS ◎ MPlus ◎ Stata ◎ R
what are the advantages of SEM?
◎ More powerful and flexible than multiple regression ◎ When relationships among variables are examined, they are free from measurement error ○ Measurement error estimated and removed ○ Unreliability of measurement accounted for ◎ Complex relationships may be examined ○ Allows us to test entire model consisting of multiple, complex relationships ◎ Combination of continuous and discrete variables; observed and latent variables
MANCOVA Example (anxiety example from earlier)
◦ Adjusts for pre-existing differences in anxiety ◦ Then, tests for DV differences across conditions
what is multiple regression?
An extension of simple/bivariate regression that uses more than one predictor variable.
An F ratio close to 1 would indicate..
An f ratio close to 1 indicates no significant difference between group means.
(Schmidt) Measurement Error and Construct Redundancy
- One of the major problems in psychology is construct proliferation. Researchers frequently postulate new constructs that are questionably different from existing constructs, a situation contrary to the canon of parsimony. For example, is job involvement really different from job satisfaction? - Proper corrections for measurement error are now making another contribution: they are showing that some constructs are probably completely redundant at the empirical level. For example, our research has shown that measures of job satisfaction and organizational commitment correlate nearly 1.00 when each measure is appropriately corrected for measurement error (conceptually distinct but not empirically distinct) - again, simple but appears complex: researchers appear addicted to significance testing despite most incorrect interpretations being based on significance tests - some false beliefs about significance tests: 1. ''If my finding is significant, I know it is a reliable finding and the probability of replication is about .95 (1 minus the p value).'' -- false because statistical significance has no bearing on the probability of replication (the probability of replication is usually around .5) 2. ''The p value is an index of the importance or size of a relationship.'' -- false because the p value does not tell you the size or importance of a relationship 3. ''If a relationship is not significant, it is probably just due to chance and the real relationship is zero.'' -- false; most nonsignificant findings are due to low statistical power to detect relationships that do exist 4. ''Significance tests are necessary if we are to test hypotheses, and hypothesis testing is central to scientific research.'' -- false because physical sciences like physics and chemistry test hypotheses without significance tests; they use effect sizes and confidence intervals 5. ''Significance tests are essential because they ensure objectivity, which is critical in science.'' -- false because confidence intervals are just as objective as significance tests and provide more information 6. ''The problem is only the misuse of significance tests, not the tests themselves.'' -- false because even when they are not misinterpreted, significance tests slow down the development of cumulative knowledge, whereas effect sizes and confidence intervals promote cumulative knowledge
(Fink et al.) Psychiatric Epidemiology Methods Intro Notes
- Psychiatric epidemiology: the study of how psychiatric conditions are distributed in the population and the risks and causes for them; it includes the application of this knowledge to prevent disease and improve population health and well-being - the two goals of psychiatric epidemiology are: 1. to study the distributions of psychiatric disorders within and between populations and 2. to identify the causes of psychiatric disease and wellbeing - the article presents an overview of study designs and then reviews the statistical methods and analytic approaches commonly applied in psychiatric epidemiology to describe the distribution of disease and identify causes of disease
(Meltzoff & Cooper) Protection of Human Participants
- The Nuremberg Code, the Declaration of Helsinki, the Belmont Report, and the American Psychological Association's Ethical Principles of Psychologists and Code of Conduct (hereinafter, Ethics Code) all provide principles and standards for the ethical treatment of participants. - you don't make ethical decisions alone; research conducted at major research facilities must undergo external review by an IRB (institutional review board) - APA publishing requires researchers to certify that they have followed the Ethics Code when they submit their article, and they are encouraged to mention this certification again when they describe their procedures or in the Author Note. - As a critical reader, you should have reasonable assurance from the researchers that people participated in the research willingly and without coercion of any kind. the researchers might mention that (a) participants took part only after being fully informed about their role in the study and any risks that might be involved, (b) inducements to participate were not so excessive as to be seen as undue pressure, and (c) signed informed consent was obtained in all except minimal-risk studies in which the identity of individual participants is not disclosed.
(Morrison et al.) Best Practices When Testing Models-- Data-Related Assumptions
- The most commonly used estimation method in SEM is maximum-likelihood (ML). -ML has various assumptions including: 1) there will be no missing data; and 2) the endogenous (or dependent) variables will have a multivariate normal distribution. -If a complete datafile is unavailable, then the researcher must test whether the data are missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR). -An alternative approach is using Multiple Imputation (MI) to estimate missing data.
(Hayes) Interpretation of Regression Coefficients
- The regression coefficient b3 quantifies the difference in the effect of X on Y for each one-unit difference in W. when moderator W is dichotomous, b3 corresponds to the difference between the effect of X on Y in the two groups (ex: being assigned to regular therapy or a new experimental therapy) - one misinterpretation of b1 and b2 is that they represent the main effects of the IV and the moderator, which results from generalizing concepts from the factorial ANOVA literature to all linear models - b1 and b2 are closer to what are known as simple effects - it is important to understand the correct interpretation of b1 and b2 because it sheds light on why the lower-order terms X and W should be included as predictors: removing X and W from the model would constrain b1 or b2 to 0
(Meltzoff & Cooper) Conducting and Reporting Research
- There is more to conducting ethical research than treating participants well -- it also includes how you conduct research (e.g., making rules that allow certain participants to be dropped after results are known) and how you report it (e.g., conducting statistical analyses that come out differently than the researchers would like, but which are never mentioned) -- these are ethically questionable behaviors - sometimes people do this after IRB approval is given - gross ethical violations: fabrication or falsification of data - fabrication and falsification of data are usually uncovered by complaints from associates who become aware of them or scholars who become suspicious when reviewing data that have unexplained inconsistencies -- or data that look too good to be true - ethics committees deal with fabrication of data - the ethics committees of professional organizations look into these complaints, and the Office of Research Integrity of the Department of Health and Human Services also investigates them - the outcomes of these investigations are reported on the Office of Research Integrity website - if there are ethical violations in an article, the article will be retracted and a notice will be posted with the paper online - if there is a less severe error, the article will appear with a statement of correction or erratum
how do we establish reliability in qualitative research according to Syed and Nelson (2015)?
- reflects a post-positivist tradition - you can use inter-rater reliability (the rate of agreement between different raters) - first decide on a unit of analysis (a chunk of a tweet or the whole tweet), then develop a coding manual, then train coders. After that you establish reliability
(Morrison et al.) Best Practices before Testing a Model: Model development
- When formulating a model, a critical issue pertains to the number of manifest indicators that one should have for each latent variable. The consensus is that ≥ 2 indicators per latent variable are required. In terms of an upper limit, however, no consistent recommendation emerges. - Content validity may be defined as the relevance and representativeness of the targeted construct, across all features of a measure. it can be assessed in the following ways: 1. an extensive literature review of the construct; 2. consulting with stakeholders from relevant groups who are able to furnish valuable insights about the construct; 3. using experts to gauge the suitability of all items designed to measure the construct - for scale score reliability, the most popular estimate is Cronbach's alpha, which is the "expected correlation between an actual test and a hypothetical alternative form of the same length" (.8 or higher is good) - Cronbach's alpha has been subject to considerable criticism, and other forms of scale score reliability have been recommended, such as omega -- Cronbach's might not be the best because it operates on assumptions that are rarely met with real-world data - two principal forms of construct validity: convergent and discriminant - convergent validity examines whether scores on the measure that is being validated correlate with other variables with which, for theoretical and/or empirical reasons, they should be correlated - discriminant validity: constructs that should not be associated or correlated do not correlate
(Morrison et al.) Reliability and Validity
- When testing each measurement model using confirmatory factor analysis, the output can be used to assess indicator and composite reliabilities as well as convergent and discriminant validities. - Indicator reliability (IR) refers to the proportion of variance in each measured variable that is accounted for by the latent factor it supposedly represents. Calculating IR is straightforward, as it merely involves squaring the standardized factor loading for each measured variable. .7 or higher is acceptable - The average variance extracted (AVE) may be used to test the convergent validity of the measurement model. To compute AVE for a given latent variable, simply square each standardized factor loading, sum them, and divide by the total number of loadings. - Assessing discriminant validity: using latent variables Y1 and Y2 as hypothetical examples, the researcher would first calculate AVE values for the two variables and then contrast these values with the squared correlation between Y1 and Y2. If both AVE numbers are greater than the square of the correlation, discriminant validity has been demonstrated.
(Fink et al.) Cohort studies
- a cohort study is a longitudinal study that follows one or more groups of individuals (cohorts) who are, have been, or in the future might be exposed to a hypothesized risk factor for a disease - cohort studies compare the proportions of individuals within each group who develop a disease over a specified period of time - cohort studies can be prospective or retrospective; in cohort studies, people must be disease free at the start of the risk period - primary aim of cohort studies: to estimate incidence (the frequency of new cases of disease) - they allow researchers to measure covariates multiple times throughout the study to minimize the effect of confounding variables in the design phase or the analytical phase
(Meltzoff and cooper) STAGE 4: EVALUATING THE QUALITY OF STUDIES
- after gathering info from studies, synthesists should make critical judgments about the quality of the studies and how well the study methods allow the inferences the different study topics call for - Each study is examined to determine whether it used valid and reliable measures and appropriate research designs and statistical techniques for the topic or question
(Meltzoff & Cooper) The Research Report
- it used to be that page-number constraints limited the amount of information an article could include -- this issue has largely disappeared because most journals are online only, and printed journals use online supplemental files to present additional information like methods and results - the loosening of length constraints allows more information and lets you critically examine studies and replicate them - in some cases researchers are required to place the raw data they collected into data repositories open to the public so other researchers can see the actual data collected -- this lets you determine whether the report is consistent with the data - there is also a growing interest in the details that should be included in a report - you need to move back and forth in the manuscript to ensure that what the researchers write is consistent between the sections
what is the first step of SEM?
- begin with specifying the model (based on theory -- it is confirmatory and not exploratory because you are specifying the model; we lay out exactly which paths we think will be connected) - the model is then estimated and evaluated - you need to begin with a covariance matrix based on a substantial sample size
what are the four research paradigms?
- positivism: basically says we can control and predict everything and reduce complex things to simple measurable things (everything is scientific -- like the medical model) - postpositivism: says that it is hard to know reality but we can approximate it (we know we won't predict things perfectly, but we can get close with the use of standardized approaches) -- typically seen in psychology, including qualitative psych - critical theory: the idea of critique and transformation and challenging the status quo (often in social psych and qual) - constructivism: understanding and reconstructing -- this basically says that everyone has their own reality and their own way of seeing the world (seen in qualitative research in social psych)
(Fink et al.) Case-Control Studies
- case-control studies differ from cohort studies in that they compare groups of people differing in outcome status (cohort studies compare based on exposure status) - advantage of case-control over cohort: it is very efficient for investigating very rare diseases (saves resources -- time and money) - main challenge to using case-control studies: recruiting controls that represent the distribution of experiences of the population that gave rise to the cases - Case-control studies, much like cohort studies, begin with articulating a specific question and specifying a source population (find a group that is known to have the outcome) - secondary base: individuals who would have been cases if they had developed the disease of interest during the study period (people who, if they did develop psychosis, would be diagnosed with it during the two-year period of the study) - Whereas a secondary base starts with the cases and then attempts to identify a hypothetical cohort that gave rise to them, a primary study base can be defined before cases appear - Primary study base: a well-defined population that gives rise to cases before cases appear, typically defined by a geographic area or existing cohort study (ex: studying the association between physical illness and suicide in the elderly -- the study base is people aged 65 or older living in a certain city) - limitations: 1) case-control studies are poorly suited for investigating the effect of rare exposures; 2) selecting cases and controls from different source populations can introduce bias (a hospital control sample is problematic due to comorbidities, because such controls are more likely to seek treatment or be hospitalized); 3) accurate measurement of past exposures can be challenging (the experience of being diagnosed with a disease can change the perception of past experiences, differences in recall, etc.)
(Fink et al.) WHAT IS A CAUSE?
- a central aim of epidemiology is identifying the causes of disease - John Locke's definition of cause: "that which produces any simple or complex idea, we denote by the general name 'cause', and that which is produced, 'effect'" - a cause has two definitive properties: direction and time order. a cause occurs before the outcome, and it changes the outcome - causal relationships can be supported by study design, like cohort studies (which involve measuring the exposure before the outcome occurs) or an experiment (where the exposure is manipulated through randomization) - A causal effect is the difference in a subject (e.g., a person, a clinic, a school) under two different states (ex: someone would have a certain level of PTSD symptoms if they were exposed to a combat zone in the Vietnam War, and would have a different level of PTSD if they were not in a combat zone) - the fundamental problem of causal inference: a subject can only be observed under one exposure. to overcome this, we use groups of people to estimate causal effects (compare PTSD symptoms of people deployed to Vietnam to those of people in non-combat areas) - exchangeability: the average risk of the outcome in the unexposed group is equal to the average risk the exposed group would have had if it had not received the exposure (enabled through randomization) - exchangeability of exposure groups can be assessed using directed acyclic graphs (DAGs) to identify the minimum set of variables sufficient for confounding adjustment to assume exchangeability between exposure groups. This is done by mapping out the causal relations between the exposure and the outcome under study.
What are the types of research designs in epidemiology?
- cross sectional or prevalence studies - cohort or incidence studies - case control studies
(Meltzoff and cooper) STAGE 3: GATHERING INFORMATION FROM STUDIES
- data collection involves gathering the information about each study that the researchers have decided is relevant to the problem - this includes information that is not just relevant to the theoretical and practical questions (nature of the IV and DV) but also information on how the study was conducted - coders need to be trained at this stage so they gather information from studies in a reliable and consistent way - make sure coders have clear and thorough instructions; coders need to be well trained, and evidence about the reliability of coders needs to be presented
what are the different roles in a symposia?
- presenter/speaker: describes their individual research study - Chair: introduces the speakers and describes how the different studies fit together into a cohesive whole - Discussant: talks about the advantages and disadvantages of the research
what are the 3 different types of data analysis?
- primary analysis - secondary analysis - meta analysis
(Schmidt) Data Distortions Beyond Sampling and Measurement Error
- four further distortions: data errors, range restriction, dichotomization of measures, and imperfect construct validity - data errors (typos, coding errors, transcription errors, etc.) are prevalent in the literature (hard to identify and difficult or impossible to correct) -- they are a nonsystematic source of variability - range restriction, by contrast, is a systematic artifact. Range restriction reduces the mean correlation. Also, variation in range restriction across studies increases the between-study variability of study correlations. Differences across studies in the variability of measures can be produced by direct or indirect range restriction (DRR and IRR). - DRR is produced by direct truncation on the independent variable and on only that variable. For example, range restriction would be direct if college admission were based only on one test score, with every applicant above the cut score admitted and everyone else rejected. - most range restriction is indirect. For example, self-selection into psychology lab studies can result in IRR on study variables. - range restriction can be corrected in meta-analysis - researchers often dichotomize continuous measures into ''high vs. low'' groups. This practice not only loses information but also lowers correlations and creates more variability in findings across studies - dichotomization can be corrected in meta-analysis - imperfect construct validity: even after correction for measurement error, the measure may correlate less than perfectly with the desired construct. The degree of construct validity may vary across studies, causing between-study variability and typically lowering the mean. Correction for this requires special information, is complicated, and is often not possible - when it is not possible to correct for one of these artifacts, the researcher should take this fact into account when interpreting and reporting the meta-analysis results. The meta-analyst should clearly state that the variance left unaccounted for could be due to these uncorrected artifacts -- this prevents incorrect or inaccurate conclusions - don't let the data speak; they can lie
when you are searching the literature to do a meta analysis what do you need to do?
- define a sampling frame (what will your search include?) - identify exclusion and inclusion criteria for studies - search techniques (recall and precision)
Many of the statistical tests used by epidemiologists are similar to those in psych. What are some examples?
- distribution of scores and basic descriptive stats - more emphasis on non-parametric tests (chi-square and tests for ordinal data) - correlation, regression (esp. logistic regression) - proportional hazards model: a modified version of logistic regression that allows for people being in the study for different lengths of time (allows for complex examination of longitudinal data)
(Schmidt) Other Examples of Lying Data
- do different personality traits predict success in different cultures? - a study found that any given personality trait was significantly correlated with expatriate success in some countries but not in others - this appeared to indicate that different personality traits are important in different countries -- something companies would need to consider when deciding which manager to send to which country. However, all the variability in validity across countries was explained by sampling error - the apparent effect of cultural differences was a data illusion - simple, but appeared complex on the surface
(Meltzoff & Cooper) Drawing Inferences From the Data
- drawing inferences from your data should be done in a way that is consistent with the results, without misinterpreting the data - you need to examine the data, which will tell you whether the research question was answered or not - looking at the data will allow you to tell if the conclusions are justified - looking at the data will also tell you whether the findings are generalizable based on the sample - the data will tell you if the findings conflict with recent research - If the study has been well-conceived, well-executed, and correctly interpreted, and if the results come out as predicted, problems with interpretation and conclusions should be minimal. - when you get results opposite to your expectations, the researchers' explanations and interpretations might have problems (it's hard to relinquish beliefs even when the data tell you something against them) - sometimes researchers obtain the results they expected, but the conclusions and generalizations go beyond the data (can't say women are better than men at abstract reasoning as a conclusion) - The researchers, however, would be in error if they interpreted the result to indicate that similar mean differences would be expected 95% of the time if the study were repeated with samples that differed in meaningful ways. At best, the researchers could propose that similar differences might be obtained on similar measures given to other samples drawn from this college and perhaps even from other similar colleges. - A finding for a study is that morning productivity significantly exceeds afternoon productivity (p < .05). The researchers conclude, "People are more productive in the morning than they are in the afternoon." One cannot generalize from this sample of "assembly line workers" to "people"
(Fink et al.) Following the Cohort and Identifying the Outcome
- the final stage of a cohort study involves following participants until they develop the disease, leave the study (e.g., death), or reach the end of the study - many methods are used during the follow-up period to keep contact with participants (for identifying new cases): these range from postal mail or online questionnaires to phone or in-person interviews, lab tests, or registries containing detailed info about each variable of interest - follow-up can range from a few months to years - after obtaining info about development of the disease, the investigators use the info about the exposure of interest and other risk factors measured earlier to examine how the risk factors differentially predicted development of a disorder in cohorts with and without the disease.
multiple regression example for SPSS
- what factors predict health? -DV: measure of health (Trauma Symptom Checklist) IVs: • Attitudes Related to Interpersonal Relationships • Negative Network Orientation • Positive Network Orientation • Self-Concealment • Perceived Availability of Social Support • Highest Level of Sexual Assault Experienced (0-11) • Current Drinking (quantity by frequency)
(Meltzoff and cooper) The Elements of Meta-Analysis
- a meta-analysis starts with finding multiple studies presenting data for a research question; many of them have different results, where some show significant differences and some don't. it is the job of meta-analysts, after looking at all these studies, to decide whether to reject or accept the null based on the results from multiple studies - there are many ways for synthesists to answer this question (they should not discard studies just because they lack significant results or have methodological flaws) Averaging effect sizes and measuring dispersion: - the main results that meta-analyses report are average effect sizes and measures of effect size dispersion or variance - one can calculate the odds ratio for each study, weight each by its sample size, and average the weighted effect sizes across studies - then calculate a confidence interval, which is used to test the null hypothesis (0 should not be in the confidence interval -- if no 0, then significant) TESTS OF INFLUENCES ON EFFECTS - an advantage of performing meta-analysis is that it allows synthesists to formally test hypotheses about why the outcomes of studies differ and why effect sizes vary from one study to another - homogeneity analysis: testing whether there is more variation in odds ratios from one study to the next than would be expected by chance (like analysis of variance) - In sum, a typical meta-analysis should contain (a) estimates of average effect sizes with confidence intervals, (b) homogeneity analyses to assess whether the variance in effect sizes is greater than would be expected by sampling error, and (c) moderator analyses that examine study features that might influence study outcomes. There are other procedures and statistics that you will find reported in meta-analyses (Borenstein, 2009), but these are the basic elements. If you do not find them in the report, something may be amiss.
what are the levels of measurements we can use for multiple regression?
- for the IVs we can use nominal through ratio (both discontinuous and continuous levels of measurement) - for the DV we can use interval or ratio
what are the types of entry for stepwise regression?
- forward selection - backward deletion - stepwise regression
(Boedeker & Kearns) Specification of prior probability
- four methods for specifying prior probabilities: (a) assuming equality across groups, (b) assuming equality to the data distribution, (c) using a known population distribution, and (d) using the cost of misclassification. a. the first method essentially admits no prior information about differences in group membership. b and c. the researcher may assume that the sample reflects the distribution of cases in the population and set the prior probabilities to reflect the percentages of group membership in the data, or set them from a known population distribution. d. prior probabilities may be based on the cost of misclassification. Cancer diagnoses provide an example of when this approach is useful: inaccurately classifying a tumor as benign is more lethal than inaccurately classifying it as malignant. If it is possible to numerically determine the costs of misclassification, then this information may be utilized in the prior probability.
what are the different kinds of abstract?
- general - structured
how do u plot multiple IVs on a graph?
- generally the IV is on the x-axis and the DV is on the y-axis - if you have multiple IVs, the IV with more levels would go on the x-axis; the one with fewer levels would be depicted using different colored lines
what are the advantages of blind peer review process?
- high quality research is published - makes the final research paper stronger
(Meltzoff & Cooper) VULNERABLE POPULATIONS
- include children, mental patients, prisoners, or others who may be involuntarily institutionalized - When children are participants, signed consent is required of parents or guardians, as well as the assent (the child's agreement) of participants who are under their jurisdiction. - When participants want to withdraw from a study, they must not be pressured to continue against their wishes. - authors of a research report have the obligation to describe the methods that were used to comply with the ethical principles, with particular emphasis on any special safeguards that they used. - researchers who work with animals have unique obligations to protect the animals' welfare (how were the animals handled?)
what are some other ways to determine good fit?
- proportion of the variance accounted for - goodness-of-fit index (GFI) and adjusted goodness-of-fit index (AGFI) -- similar to R^2 in multiple regression; they tell you the amount of variance in the sample covariance matrix that is being accounted for by the population estimated covariance matrix - residual-based indices - root mean square residual (RMR) and standardized root mean square residual (SRMR) -- indicators that consider the average differences between sample covariances and population estimated covariances (small values are good; for SRMR, .08 or below indicates good model fit)
what is included in a discussion section?
- start by talking about whether there was support for the hypothesis - then talk about how the findings were similar or different to past research findings - interpret the results - then discuss whether they are generalizable, the implications of the research (what it would mean for the real world and how it can be used), and future directions
(Schmidt) An Example of Lying Data
- interpretations based on significance tests are usually less correct than interpretations that accept correlations at face value without significance tests attached - sampling error: the random departure of statistical estimates computed on samples from the values in the population (the values of interest) - sampling errors vary randomly around 0, and the smaller the sample, the more widely they vary - researchers underestimate the impact of sampling error in their data -- not many realize that sampling error can produce so much variation in their data - correlations can still appear to vary across studies even when 70% of the variance in them is explained by sampling error - measurement error: error in measurement that exists in all data - reliability values can be used with each correlation in a standard formula to correct for the biases created by measurement error (the classical disattenuation formula) -- significance levels are not affected by corrections for measurement error in correlation values - variability increases when you correct for measurement error, since the correction increases sampling error variance - you can correct for both measurement error and sampling error to account for almost all variation (a single value = no variation) - this interpretation is more parsimonious and an example of Occam's razor in data interpretation: it exemplifies simplicity underlying apparent complexity (simplifying complex data) - data can lie - all data are distorted by both sampling and measurement error - when the measures used are not the same in all studies, the reliabilities will vary, and this creates additional variance beyond sampling error. correcting for measurement error produces additional reductions in variance beyond those produced by correcting for sampling error. in this situation you need to use the Hunter-Schmidt method of meta-analysis
what are the disadvantages of using MANOVA compared to ANOVA?
- it is a more complicated analysis - less powerful than ANOVA - need to meet many assumptions when conducting a MANOVA
(Wiest et al.) Assessing goodness of fit
- it is important in any regression model to assess the validity of the assumptions we make when we fit the model - most of the techniques for assessing assumptions in linear regression don't work in logistic regression - R^2 cannot be used to measure the proportion of variance explained in logistic regression because there is no natural analog for this statistic in logistic regression - instead, goodness of fit relies on visual inspection of diagnostic plots - plots that overlay observed and predicted proportions within categories of continuous predictors can help evaluate model fit (see the sketch below)
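A minimal sketch of such a plot in Python, using simulated data (all values illustrative), binning a continuous predictor into deciles and overlaying observed and predicted proportions:

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    x = rng.normal(size=500)                                  # continuous predictor
    y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + 1.2 * x))))   # binary outcome

    fit = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
    df = pd.DataFrame({"x": x, "y": y, "pred": fit.predict(sm.add_constant(x))})

    # observed vs. predicted proportion of the outcome within predictor deciles
    df["bin"] = pd.qcut(df["x"], 10)
    binned = df.groupby("bin", observed=True)[["y", "pred"]].mean()

    plt.plot(range(10), binned["y"], "o", label="observed proportion")
    plt.plot(range(10), binned["pred"], "-", label="predicted proportion")
    plt.xlabel("decile of predictor")
    plt.ylabel("proportion with outcome")
    plt.legend()
    plt.show()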
(Boedeker & Kearns) Two Purposes of LDA
- LDA was initially created as a method for finding linear combinations of variables that best separate observations into groups, or classifications; using these linear combinations, researchers can learn which of the variables contribute most to group separation and the likely classification of a case with unobserved group membership 1. DESCRIPTION - when LDA is used to describe group differences on a set of variables, the method is often referred to as descriptive discriminant analysis (DDA -- discrimination and separation); DDA is used to describe which of a set of variables contribute most to group differentiation 2. PREDICTION (classification and allocation) - when LDA is used to develop classification rules for predicting group membership of new cases with unknown classification, it is often referred to as predictive discriminant analysis (PDA); a separate linear classification function (LCF) is derived for each group - LDA classification provides posterior probabilities: a single case will have a posterior probability of membership for each group (e.g., .10 for Group 1, .24 for Group 2, and .66 for Group 3), and these probabilities will sum to 1.00 across groups; each posterior probability is the probability that the case in question, given the observed data for that case, is a member of the given group, as characterized by the data of the group's members - a Bayesian derivation of LDA for classification utilizes prior probabilities of group membership, which can be based on what is already known about the population distribution; if there are two possible classifications (e.g., successful or unsuccessful treatment), and one classification group has contained only 5% of cases in the past, the analyst would want to classify a case into that low-occurrence group only when the evidence for doing so is very strong (Klecka, 1980) -- in this case, the prior probability for the low-occurrence group could be set to .05 and the prior probability for the other group to .95, to reflect the known population distribution (see the sketch below)
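A minimal sketch of LDA classification with unequal priors, using scikit-learn (the data are fabricated for illustration):

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    rng = np.random.default_rng(1)
    # fabricated two-group data: 5% low-occurrence group, 95% other group
    X = np.vstack([rng.normal(0, 1, (10, 3)), rng.normal(1, 1, (190, 3))])
    y = np.array([0] * 10 + [1] * 190)

    # priors set to the known population distribution (.05 / .95)
    lda = LinearDiscriminantAnalysis(priors=[0.05, 0.95]).fit(X, y)

    # posterior probabilities for a new case; each row sums to 1.00
    print(lda.predict_proba(np.array([[0.2, 0.1, -0.3]])))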
what are the steps of understanding the degree of relationship between the IVs and the DV?
- look at the multiple correlation coefficient R - then run a significance test for R
(Hayes) Plotting and Probing a Moderation Effect
- make a visual depiction of a moderation effect to understand what it is telling you -- to see how the IV's effect on Y varies with the moderator - this might involve choosing combinations of values of the IV and moderator and plugging them into the regression equation, which will generate estimates of Y for those combinations - for continuous X or W, the most commonly used values are the sample mean and one SD above and below the mean - create a plot of the model - it is also useful to use inferential methods to test for the presence of a conditional effect of X on Y at the chosen values of W (a popular method is the pick-a-point approach to probing an interaction, aka an analysis of simple slopes) - several textbooks provide formulas to calculate the standard error of a linear function of regression coefficients, but the regression-centering approach makes inference about a conditional effect in a regression model easy; this method is best understood by recalling that the coefficient b1 represents the conditional effect of X on Y when W = 0 - by centering W around the value of interest w (i.e., subtracting w from every value of W in the data, even if W is just two arbitrary codes for two groups) prior to constructing XW and estimating the model, b1 estimates the conditional effect of X on Y when W = w, and the standard error of b1 is a valid estimate of the standard error of this conditional effect; thus, the pick-a-point approach can be conducted by repeating the analysis multiple times with W centered around each of the values of interest (see the sketch below) - it is important to keep in mind that the conclusions drawn from the inferential tests may change depending on the values of the moderator at which the moderation effect is probed; this is one of the drawbacks of the pick-a-point approach - for this reason, some recommend using the Johnson-Neyman technique, which does not require you to choose values of W but, rather, analytically derives values of W that help you find where in the distribution of W X is significantly related to Y and where it is not
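A minimal sketch of the pick-a-point approach via regression centering, using statsmodels (variable names and data are fabricated):

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(2)
    df = pd.DataFrame({"X": rng.normal(size=300), "W": rng.normal(size=300)})
    df["Y"] = 0.3 * df["X"] + 0.2 * df["W"] + 0.4 * df["X"] * df["W"] + rng.normal(size=300)

    # probe the interaction at the mean of W and one SD below/above it
    for w in (df["W"].mean() - df["W"].std(), df["W"].mean(), df["W"].mean() + df["W"].std()):
        df["Wc"] = df["W"] - w                      # center W at the value of interest
        fit = smf.ols("Y ~ X * Wc", data=df).fit()  # model includes X, Wc, and X:Wc
        # the coefficient for X is now the conditional effect of X on Y at W = w,
        # and its standard error is a valid SE for that conditional effect
        print(round(w, 2), round(fit.params["X"], 3), round(fit.bse["X"], 3))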
(Meltzoff & Cooper) RESULTS TO DISCUSSION
- make sure that every time the same result is reported it is described using similar language (you can't describe a result as a trend in the results and then describe it as probable in the discussion)
(Syed & Nelson) Selecting the Appropriate Reliability Index
- many different statistics can be used to index interrater reliability 1. percentage agreement (PA): the most straightforward and intuitive approach to establishing reliability; PA is simply the ratio of items on which two coders agree to the total number of items rated, calculated as (number of total agreements / (number of total agreements + number of total disagreements)) * 100 - limitation: it is not chance-corrected -- when two raters code an item the same way, it is always possible that they did so accidentally, or by chance, rather than because they actually agreed on the appropriate code 2. kappa: proposed by Cohen; an alternative method for calculating reliability that accounts for chance agreement; k is defined as the proportion of agreement between raters that is not due to chance, with the formula k = (Po - Pc) / (1 - Pc), where Po is calculated using the formula for PA above but without multiplying by 100, and Pc is the index of chance agreement (the gold standard for reliability indexes) - in sum, low values of k could be due to high levels of observed agreement in cases of homogeneous marginals, high levels of observed agreement in cases of nonuniform marginals, or low levels of observed agreement; thus, k is not only a conservative index of reliability but a highly sensitive one as well - a major take-home message from this discussion of k is that researchers must examine their marginal distributions, and not uncritically interpret a particular value of k as high or low 3. delta: the index D was developed in response to some of the criticisms of k described previously, but primarily to address the problems of highly skewed marginal distributions 4. intraclass correlation coefficient (ICC): for continuous/ordinal data, a reliability index that accounts for both similarity and proximity is necessary; to this end, the optimal index is the ICC (should be .75 to .8, which represents good reliability) - the ICC is calculated using an analysis of variance model, with differences between raters modeled as within-subjects variance 5. weighted k: like the ICC, weighted k is a method for calculating reliability with ordered data (the ICC and weighted k would yield identical coefficients) 6. correlation coefficients (Pearson's r, Spearman's rho): correlation coefficients should generally not be used as indexes of interrater reliability; the reason is clear: these are indexes of consistency, not agreement, and when establishing reliability it is desirable to have an index that includes both -- it is entirely possible to achieve a correlation of 1.0 and have raters disagree on every single item they rated; as long as the raters are consistent in their disagreements, the correlation will be very strong (a sketch computing PA and kappa follows below)
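A minimal sketch computing percentage agreement and Cohen's kappa for two raters (the codes are fabricated):

    from sklearn.metrics import cohen_kappa_score

    # fabricated codes assigned by two raters to the same ten narratives
    rater1 = [1, 1, 2, 2, 3, 3, 1, 2, 3, 1]
    rater2 = [1, 1, 2, 3, 3, 3, 1, 2, 3, 2]

    agreements = sum(a == b for a, b in zip(rater1, rater2))
    pa = agreements / len(rater1) * 100           # percentage agreement (here 80.0)
    kappa = cohen_kappa_score(rater1, rater2)     # agreement corrected for chance
    print(pa, kappa)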
(Schmidt) Other Considerations
- meta-analysis can be used to precisely calibrate relationships of theoretical and practical interest - meta-analysis can be used to create a matrix of relationships among several variables or constructs, which can then be used in path analysis to test causal models or theories - the data in the examples on the previous cards are correlational, but you can apply the same principles to experimental data -- the main difference is that the focal statistic is the d value, which is the difference between two groups in SD units (experimental and control group, or any two groups; see the sketch below) - the d value statistic is subject to even more sampling error than the correlation statistic (r) and is biased downward by measurement error -- so corrections for both measurement error and sampling error are needed together -- the Schmidt and Le (2004) program is needed to analyze these data - the Schmidt program corrects each correlation for measurement error and then performs meta-analysis on the corrected correlations (it corrects for data distortions caused by both sampling and measurement error -- it is important to correct for both of these errors) - the appearance of moderators can be illusory, but sometimes there are real moderators or interactions -- for example, it has been shown in several comprehensive and independent meta-analyses of U.S. and European data that the information-processing complexity of jobs moderates the size of the validity of intelligence tests in predicting job performance
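A minimal sketch of the d value (the standardized difference between two group means in pooled-SD units; the data are fabricated):

    import numpy as np

    def d_value(group1, group2):
        # difference between two group means in pooled-SD units
        n1, n2 = len(group1), len(group2)
        s_pooled = np.sqrt(((n1 - 1) * np.var(group1, ddof=1) +
                            (n2 - 1) * np.var(group2, ddof=1)) / (n1 + n2 - 2))
        return (np.mean(group1) - np.mean(group2)) / s_pooled

    print(d_value([5.1, 6.2, 5.8, 6.5], [4.2, 4.9, 5.0, 4.4]))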
what are the MANOVA assumptions?
- multivariate normality - absence of outliers - homogeneity of variance - linearity
(Meltzoff & Cooper) Research Ethics
- need to adhere to the ethical principles of the discipline of your research - if human subjects are used in a study, they need to be treated in compliance with IRB approval - need to keep an eye out for any indicators that the study's methods, data, or analyses might have been manipulated in a way that biases the conclusions, or that the data or writing might not be original
what are the practical issues of multiple regression?
- need to have more cases in your study than IVs (N (sample size) >= 104 + m (number of IVs)) - you can't have outliers, because extreme cases have an extreme effect on the multiple regression equation (can winsorize to fix; see the sketch below) - multicollinearity is a big issue in multiple regression and we can have this - normality and linearity are needed (a linear relationship must be present and distributions need to be normal) -- residuals should also be normally distributed and linear
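A minimal sketch of winsorizing the IVs and checking multicollinearity with VIFs (the data are fabricated; VIFs above roughly 10 are a common red flag):

    import numpy as np
    import statsmodels.api as sm
    from scipy.stats.mstats import winsorize
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    rng = np.random.default_rng(3)
    X = rng.normal(size=(200, 3))
    X[:, 2] = X[:, 0] + rng.normal(scale=0.1, size=200)  # nearly collinear IV

    # winsorize each IV: pull the top/bottom 5% of cases in to the 5th/95th percentile
    Xw = np.column_stack([np.asarray(winsorize(X[:, j], limits=(0.05, 0.05)))
                          for j in range(X.shape[1])])

    # VIF for each IV (the constant column is skipped)
    exog = sm.add_constant(Xw)
    print([variance_inflation_factor(exog, j) for j in range(1, exog.shape[1])])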
(Meltzoff & Cooper) INTRODUCTION TO DISCUSSION
- no hypotheses are explicitly stated - usually hypotheses are self-evident because of the kind of information researchers report from past research in their intro - in the discussion, though, researchers will report that the results support their hypothesis - if you do not see the hypotheses in the intro, it is likely the hypotheses are post hoc, meaning they were formed after the results were obtained; in the discussion these hypotheses need to be clearly labelled as such and should call for further studies
(Wiest et al.) the odds ratio
- odds: the ratio of the probability of the event of interest occurring to the probability of the event not occurring (the odds ratio is the ratio of the odds in two groups) - odds are estimated by dividing the number of people who experience the event by the number of people who do not experience the event - to estimate the odds of a poor outcome in the intubation group, you divide the number with a poor outcome by the number with a good outcome - odds can also be calculated from the estimated risk: if the estimated risk of a poor outcome in the intubation group is 0.39, then the odds are 0.39 / (1 - 0.39) = 0.64 (see the worked example below)
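A minimal worked example of odds and an odds ratio (the counts are fabricated):

    # odds from an estimated risk: odds = risk / (1 - risk)
    risk_intubation = 0.39
    print(risk_intubation / (1 - risk_intubation))   # about 0.64

    # odds ratio from fabricated counts of poor vs. good outcomes in two groups
    odds_intubation = 30 / 47     # 30 poor outcomes, 47 good outcomes
    odds_control = 15 / 62
    print(odds_intubation / odds_control)            # odds ratio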
(Meltzoff & Cooper) Interpreting Effect Sizes
- to know how practically or clinically significant something is, you need to interpret effect sizes: the size of the standardized difference between the means (d index), the strength of the correlation (r index), or the ratio of the odds in the two groups - Cohen: effect sizes can be small, medium, or large - according to Cohen, in correlational terms, small, medium, and large rs would be .10, .30, and .50, respectively - for the difference between means of two independent groups, according to Cohen (1992), a small d index is .20, a medium effect is .50 (i.e., half of a standard deviation), and a large effect size is .80 - interpretation should not stop after looking at effect sizes - one way to interpret the effect of a treatment would be to ask whether it revealed smaller or larger effects than other treatments currently in use (compare to other effect sizes rather than using Cohen's benchmarks, because Cohen said to use his conventions/benchmarks only when there are no other ways to evaluate effect size) - contrasting effect sizes can be done too, and these often come from meta-analyses - Cooper stated that effect sizes also need to be interpreted in relation to the method used in the research (field studies, for example, produce smaller effect sizes, and studies with interventions, especially a greater frequency or intensity of the intervention, might produce a greater effect size), so it is important to consider research design differences when drawing conclusions about the relative size of effects, since certain research designs and measures will produce different effect sizes
how do we run a meta analysis?
- you need a weighted mean of all the effect sizes • how you conceptualize the effects determines the model you should use: • fixed-effect models • expect a fixed average effect size (homogeneity) • random-effect models • expect different average effect sizes across the different populations included in the study (heterogeneity) • moderator analysis (a sketch of the fixed-effect weighted mean follows below)
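A minimal sketch of a fixed-effect weighted mean effect size with inverse-variance weights (the effect sizes and variances are fabricated):

    import numpy as np

    # fabricated effect sizes (d) and their sampling variances from k = 4 studies
    d = np.array([0.30, 0.45, 0.10, 0.52])
    v = np.array([0.02, 0.05, 0.01, 0.04])

    w = 1 / v                              # inverse-variance weights
    d_bar = np.sum(w * d) / np.sum(w)      # fixed-effect weighted mean effect size
    se = np.sqrt(1 / np.sum(w))            # standard error of the weighted mean
    print(d_bar, se)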
what are some ways to deal with some missing data?
- mean substitution: using the mean for the rest of the sample to fill in the missing data for that person (see the sketch below) - imputation from similar cases (sometimes called hot-deck or regression imputation): using similar cases to estimate how the person with the missing data would answer
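A minimal sketch of mean substitution with scikit-learn (the data are fabricated):

    import numpy as np
    from sklearn.impute import SimpleImputer

    X = np.array([[7.0, 2.0], [np.nan, 4.0], [5.0, np.nan], [6.0, 3.0]])

    # replace each missing value with the mean of that variable's observed values
    print(SimpleImputer(strategy="mean").fit_transform(X))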
(Syed & Nelson) abstract
- qualitative, quantitative, and mixed methods approaches have been foundational in research on emerging adulthood, but there are many unresolved methodological issues pertaining to how to handle qualitative data - purpose of article: to review best practices for coding and establishing reliability when working with narrative data - reliability should be seen as an evolving process rather than a focus on the end product - 3 broad sections: the first discusses relatively more quantitatively focused methods of coding and establishing reliability; the second section covers more qualitatively focused methods; the final section offers recommendations for researchers interested in coding narrative and other open-ended data
what are the factors that are affected by your choice of paradigm?
- value system and ethical principles - assumptions about the nature of reality and knowledge - theoretical framework, literature, and research practice
what are the ideal results of logistic regression?
- very low -2LL (the amount of unexplained variance needs to be low) - significant model chi-square, indicating that the model has improved with the predictors added (see the sketch below)
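A minimal sketch of the model chi-square test from the drop in -2LL (the values are fabricated; df equals the number of predictors added):

    from scipy.stats import chi2

    neg2ll_step0 = 420.6   # -2LL with no predictors (Step 0)
    neg2ll_step1 = 401.2   # -2LL with k = 3 predictors added (Step 1)

    model_chisq = neg2ll_step0 - neg2ll_step1       # significant drop = model improving
    print(model_chisq, chi2.sf(model_chisq, df=3))  # chi-square and its p value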
what are some practical concerns of SEM?
- we need a good enough sample size (at least 200 participants and 10 participants per parameter you're estimating; otherwise you have unstable results) - missing data is problematic because it effectively reduces the sample size - you need multivariate normality and no outliers (transform data if needed) - multicollinearity: when two or more variables are highly correlated
what is the key difference between the three types of regression?
- what happens to the overlapping variability between the IVs and the DV when we have correlated IVs - who determines how the IVs are entered into the regression equation
(Hayes) Throughout this chapter, we've provided an overview of mediation, moderation, and conditional process analysis, with an emphasis on mapping particular research questions of interest to their respective statistical models. We've described how statistical mediation analysis can be used to help identify, quantify, and understand a causal sequence of events in which one variable influences another through one or more intermediary variables, while statistical moderation analysis can be used to test the boundary conditions or contingencies of an effect. These methods have proved useful for clinical research, both separately and when integrated within a unifying conditional process model. After detailing the framework of mediation, moderation, and conditional process analysis, with a focus on the substantive meaning of model parameters and how they relate to hypotheses of interest, we presented an example analysis that synthesized many of the concepts discussed throughout this chapter. We believe this exposition should provide clinical researchers with a good understanding of the conceptual and statistical foundations of mediation and moderation analysis and also aid them in the development of their own conditional process models for testing and exploring their own hypotheses and substantive interests.
.
(Hayes) To summarize, and using a data set fabricated for the purpose of this example, we have used a combination of a parallel mediator model and the simple moderation model to better understand the contingencies of the processes by which motivational interviewing affects alcohol use. Perceived risk of alcohol use and treatment seeking were the mediators of interest in this example. The effect of the motivational interview on each of the mediators was dependent on severity of the patient's alcohol abuse, and each of the mediators was negatively correlated with later alcohol use. Furthermore, the indirect effect through each mediator was conditional on alcohol use severity. The indirect effect through perceived risk was larger for less severe abusers than for severe abusers, but the indirect effect through treatment seeking was larger for severe abusers than it was for less severe abusers.
.
LOGISTIC REGRESSION
.
MULTIPLE REGRESSION
.
there are tiers of journals; higher-level journals have a low acceptance rate, and if you don't get accepted you can try to get published in a lower-level journal
.
when you do structural equation modeling, adding more paths into the model adds more complexity, which requires a bigger sample size; the more indicators, the more complex the model
.
• First type of error (classifying healthy as diseased) - if there is risk associated with the treatment, then this is a costly error • Second type of error (classifying diseased as healthy) - if there is an effective treatment and the person is not receiving it, then this is a costly error
.
(Meltzoff & Cooper) Duplicate Publication
If you discover that the same data have been published twice, this is called duplicate publication. It distorts the scientific record, perhaps making the results look like they have been replicated when they have not (the data were just copied).
(Morrison et al.) Reporting Guidelines for SEM
1. As determined by an a priori power analysis, the minimum number of participants needed, given the models that are being tested. 2. At least one alternative model that is plausible in light of extant theory or relevant empirical findings. 3. Graphical displays of all measurement and structural models. 4. Brief details about the psychometric properties of scale scores for all measured variables (e.g., Cronbach's alpha and its 95% confidence intervals or, preferably, omega, as well as 2 to 3 sentences per measure detailing evidence of content and construct validities). 5. The proportion of data that are missing and whether missing data are MCAR, MAR, or MNAR; as well, researchers should explicate how this decision was reached (e.g., why does a researcher assume missing data are MAR?) and the action taken to address missing data. 6. Assessments of univariate and multivariate normality for all measured indicators. 7. The estimation method used to generate all SEMs (default is ML estimation). 8. The software (including version) that was used to analyze the data. 9. In accordance with the advised two-step approach, full CFA details about each measurement model followed by complete SEM details about the structural model. 10. Indicator and composite reliabilities. 11. Average variance extracted (AVE) for each latent factor, which denotes convergent validity. 12. Discriminant validity of latent factors, as per Fornell and Larcker's (1981) test. 13. All standardized loadings from latent variables to manifest variables (reflective models). 14. Fit indices that reflect overall, absolute, and incremental fit; if applicable, predictive fit indicators should be included. 15. A clear and compelling rationale for all post-hoc model modifications. 16. An indicator of effect size for the final model.
what are the errors that can be made in the classification table?
1. Classifying healthy person as diseased 2. Classifying diseased person as healthy
a researcher believes that recall of verbal material differs with the level of processing. He divided his subjects into three groups. In the low-processing group, participants read each word and were instructed to count the number of letters in the word. In the medium-processing group, participants were asked to read each word and think of a word that rhymed. In the high-processing group, participants were asked to read each word and try to memorize it for later recall. Each group was allowed to read the list of 30 words three times, then they were asked to recall as many of the words on the list as possible. If the researcher wants to know whether the three groups have different amounts of recall, what type of statistical test should be used? what are the IVs and DVs?
ANOVA is the appropriate test. The IV is the type of processing group (low, medium, or high) and the DV is the amount of recall.
What are cohort studies?
Aka incidence studies, which can be retrospective or prospective. They look at characteristics or factors related to the development of a disease.
What is a cross-sectional study?
Aka prevalence studies, which examine the relationship between diseases and other characteristics or variables of interest in a population. They can look at a population with a disease or look at disease existence in a population.
what are post hoc tests used for?
Allow you to see which groups differ in ANOVA, since you're looking at 3 or more levels of the IV
What kind of science is epidemiology?
Quantitative. Numbers are used to characterize the likelihood of disease.
Incidence is often misused in the literature to describe what?
Prevalence
What are the two most commonly used rates in epidemiology?
Prevalence and incidence rates.
you are trying to predict the relationship between stress and fatigue. what is the most appropriate statistical test? a. correlation b. multiple regression c. chi square d. ANOVA e. logistic regression f. discriminant function analysis why?
a. correlation is appropriate because it looks at two variables and tells us how strong the relationship is. These variables are likely both interval or ratio variables, where the level of stress may relate to the level of fatigue.