HCA 465 Chapter 5
coefficient alpha
aka "Cronbach's alpha"; a measure of internal consistency among a group of items (e.g., survey, test, or interview questions) -allows researchers to determine how well the items measure different aspects of the same topic -values range from 0 to 1; values closer to one indicate higher internal consistency than values closer to zero
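As a sketch, coefficient alpha can be computed directly from item scores. The survey data below are made up for illustration, and the helper name `cronbach_alpha` is our own:

```python
def cronbach_alpha(items):
    """Coefficient alpha from a list of item-score columns (one list per item)."""
    k = len(items)                       # number of items
    n = len(items[0])                    # number of respondents

    def var(xs):                         # sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    sum_item_vars = sum(var(col) for col in items)
    totals = [sum(col[i] for col in items) for i in range(n)]
    return (k / (k - 1)) * (1 - sum_item_vars / var(totals))

# Three hypothetical survey items answered by five respondents (made-up data)
items = [[4, 3, 5, 2, 4], [5, 3, 4, 2, 4], [4, 2, 5, 3, 5]]
alpha = cronbach_alpha(items)  # closer to 1 means higher internal consistency
```

If the items move together across respondents, alpha approaches 1; unrelated items pull it toward 0.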
inter-rater reliability
aka "inter-observational reliability"; the scoring or observations remain consistent regardless of the person doing them -investigators use a checklist to standardize and increase reliability by ensuring evaluators are collecting the same type of observational data
internal validity
focuses on the rigor of the study; its primary types are face validity, criterion-related validity, construct validity, and content validity
measurement bias
happens during data collection by researchers or subjects because of systematic errors in measurement
relationship between internal and external validity
-INTERNAL validity is more critical than external -without internal validity, research is not testing what it purports to measure (however, as the study inclusion criteria become more selective, the results become less generalizable)
pilot test
-essential for both quantitative and qualitative studies -a complete dress rehearsal before data collection, involving every aspect of a study, including the environment, data collection, content, and outcomes -involves conducting a preliminary test of data collection tools and procedures to identify and eliminate problems -when executed correctly, pilot tests save time and resources during the actual data collection process
ways to reduce measurement errors
-pilot test -data collection training -double data entry -statistical consultation -triangulate data collection
how to conduct a pilot study by categories
-sample of respondents: recruit a small number (less than 10) of individuals with characteristics similar to the actual sample to test the method of recruitment. If it is difficult to recruit for the pilot test, then recruitment methods for the actual study need to be revised. -data collection: helps researchers determine if revisions are needed in the actual instrument or in the data collection procedures. Steps: 1) create the exact environment that will be used for the actual data collection 2) when possible, observe participants while they complete surveys during the pilot test 3) after the surveys or interviews are complete, ask the participants a few questions -data analysis: enter the data and conduct a pilot test of the data analysis procedures -outcome: conduct the actual study
benefits of pilot testing
-conducting a thorough pilot test is always worth the time and money, because once data are collected, it is too late to fix mistakes that could prove to be fatal flaws in the entire study -when reporting results, investigators document the pilot study and the revisions made based on its results, adding authenticity
randomized controlled trial designs
-the gold standard of research design -participants are randomly assigned to either a treatment or a placebo group -participants have similar characteristics (age, gender, or length of diagnosis) -allows researchers to draw conclusions with confidence if one group is significantly different at the end of the study, due to random assignment -controlled study (one treatment group and one placebo group)
what 3 areas should one investigate when looking for systematic errors?
1) environment 2) observation 3) drift *the problem with each type of error is that even when the error is recognized, it is difficult to determine when it began in the data collection process and how much the systematic error influenced actual results
factors that influence generalizability (external validity)
1) population (also known as selection and treatment): when population selection is too specific, treatment is matched to a specific sample and is not applicable to a wider population 2) environment: studies conducted in particular spaces/settings may not produce the same results in other settings 3) temporal/sequential factors: the time of year/season a study was conducted in can affect the results 4) participants: animal-to-human links; human-to-human links; gender bias; racial bias; cultural and ethnocentric bias 5) testing and treatment interaction: if participants learn from the pretest, they may be less likely to learn as much from the treatment 6) reactive arrangements (hawthorne effect): if individuals change their behavior when observed (threat to internal validity), results are not generalizable to real-world conditions (threat to external validity) 7) multiple treatment conditions: the same individuals exposed to multiple treatments; because multiple treatments may create an artificial setting that does not exist in the real world, results may not be generalizable *additionally, researchers and evaluators must be aware of all possible threats that affect INDEPENDENT variables
Validity
the extent to which a test measures what it purports to measure
split-half technique
a technique to measure homogeneity by dividing the entire test or survey into two equal halves (e.g., odd-numbered and even-numbered questions) -the two halves are administered to the same individuals -if the odd-numbered questions yield the same results as the even-numbered questions, the entire test is deemed reliable
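A minimal sketch of the split-half technique: correlate each respondent's odd-item total with their even-item total. The Spearman-Brown correction at the end is a common refinement (not named on this card) that estimates reliability for the full-length test; all data and helper names are made up:

```python
def pearson_r(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def split_half_reliability(scores):
    """scores: one list of item scores per respondent."""
    odd_totals = [sum(row[0::2]) for row in scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in scores]  # items 2, 4, 6, ...
    r = pearson_r(odd_totals, even_totals)
    return 2 * r / (1 + r)  # Spearman-Brown correction for full-length test
```

When the two halves rank respondents the same way, the estimate approaches 1 and the test is deemed reliable.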
Reactive Arrangements (Hawthorne Effect)
applies to external validity as well; if the individuals change their behavior when observed (threat to internal validity), results are not generalizable to real-world conditions (threats to external validity)
what is reliable research based on?
based on several basic assumptions, including randomized controlled trial designs and adequate sample sizes, and is free of known bias
observation
can change due to observer fatigue, time of day, room temperature, training of different observers, attitude of participants, and various other human behavior variables
Test-retest technique
checking the stability of an instrument by administering it to an individual and then administering it again to the same individual after a certain period of time -involves two assumptions: 1) the item or observation does not change over time 2) the time between the first and second administrations is long enough that respondents do not simply recall their earlier answers
reliability
consistency of the instrument or survey being used, not the respondents -related to consistency or the ability to repeat results -important because if a test's results are different each time, the test is faulty and can lead to inaccurate conclusions -can be established one of two ways: stability and internal consistency
external validity
deals with generalizability; if evaluation or research is repeated with different populations, situations, times, or environments, the results are expected to be the same -a threat to external validity explains why generalizations may be incorrect -repeating evaluation methods and research in different populations is the best way to assess generalizability
controlled study
defined as one group receiving treatment while another, similar group receives no treatment -may be randomized from original groups of participants, or may be similar groups at a different location
systematic errors
errors that are consistent in the same direction, so that no matter how many times the experiment is repeated, the same errors occur -validity is influenced by these kinds of errors -they introduce inaccuracy into the measurement, cause bias in the data, and diminish the extent to which the test measures what it purports to measure -problematic to detect and eliminate -it is not possible to reduce the effect of these kinds of errors through statistical methods
double data entry
evaluators and researchers enter a portion of the data twice and then compare the two entries -if there are multiple errors, investigators go back to the original data and determine why the errors are occurring
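A sketch of the comparison step, assuming both passes enter the records in the same order (the values below are hypothetical):

```python
def compare_entries(first_pass, second_pass):
    """Return the record positions where the two data-entry passes disagree."""
    return [i for i, (a, b) in enumerate(zip(first_pass, second_pass)) if a != b]

entry1 = ["120", "98", "105", "87"]  # hypothetical values, first entry pass
entry2 = ["120", "89", "105", "87"]  # same records entered a second time
mismatches = compare_entries(entry1, entry2)  # positions to recheck against originals
```

Any position returned sends investigators back to the original data sheet to see why the two entries differ (typo, transposition, illegible source, etc.).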
relationship with similar measures
ex: researchers are interested in developing a shorter depression scale for adolescents. Adolescents are asked to complete both the older, longer depression scale with established validity and the newly developed, shorter depression scale. If measurements from the established depression scale are similar to measurements from the new scale, researchers conclude there is evidence of construct validity.
face validity
examines how the test appears *NOT based on theory but merely on the appearance of the test (potential ease of completion, comprehension, and readability) -can be compared to "showing up in the correct outfit" or "looking the part"; if the survey does not look appealing, individuals are less likely to complete it -also refers to the logical sense of the survey (do the questions relate to the subject being addressed?)
content validity
how well a test measures the specific content it is intended to measure
testing and treatment interaction
if participants learn from the pretest, they may be less likely to learn as much from the treatment -if a participant is sure that they received a perfect score on the pretest, they are less likely to listen intently during the class, because they already know the information
relationship between reliability and validity
if a scale is VALID, it measures what it is intended to measure, and the results should be the same each time the same person is tested on it; a valid test is therefore also reliable. HOWEVER, the reverse is not always true: a reliable test is not necessarily valid. Reliability is about consistency and repeatability.
types of validity
internal validity (face validity, criterion-related validity, construct validity, content validity) external validity (generalizability)
statistical consultation
investigators consult with statisticians to seek assistance with entering data and determine ways to reduce and/or measure data errors
intervention bias
occurs when intervention groups are treated differently than control groups because the researchers involved know which group is which
Is it more important for a test to be reliable or valid?
it is more important for a test to be valid than reliable
criterion-related validity
measures one topic in two different ways *most common in having a written test and then a test that applies the skill (like a written test for your permit and a behind-the-wheel driving test for your license) *it is important that the learner knows the didactic information as well as how to perform the skill
inter-rater reliability score
number of concurrences / number of opportunities for concurrence x 100 ex: investigators use a predetermined checklist to observe 80 patients in the clinic waiting room; according to their checklist, they agree on 68 out of the 80 ratings. The calculation is 68/80 = 0.85, and 0.85 x 100 = 85% -therefore, the investigators agree 85% of the time, or their observations have a reliability of 85%
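The formula above is simple enough to express directly; this sketch reproduces the waiting-room example from the card:

```python
def percent_agreement(concurrences, opportunities):
    """Inter-rater reliability as percent agreement."""
    return 100 * concurrences / opportunities

# The card's example: raters agree on 68 of 80 checklist ratings
score = percent_agreement(68, 80)  # 85.0, i.e., the raters agree 85% of the time
```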
random errors
occur by chance and are inconsistent across respondents -these errors increase or decrease results in an unpredictable manner; therefore, researchers and evaluators have no control over their occurrence -they influence consistency in several ways: 1) participating INDIVIDUALS may change from day 1 to day 2 2) TASKS may change between day 1 and day 2 3) if there is a small sample of participating individuals, outside forces have a greater effect on the outcome *each alters confidence that the experiment reliably measured what it intended to -these errors may also occur in written tests, in three ways: 1) if the test is TOO SHORT, individual scores are based more on chance and luck than knowledge 2) if the test is NOT GRADED precisely the SAME WAY for each student, these errors cause inconsistent reliability 3) if the test is NOT ADMINISTERED CORRECTLY -ways to reduce these kinds of errors: 1) use a larger sample size; these errors are less influential in either direction when the sample size increases 2) average scores over the larger sample to reduce these errors through statistical methods
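The claim that random errors average out with larger samples can be demonstrated with a small simulation; the true mean, spread, and sample sizes below are all invented for illustration:

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

def mean_abs_error(n, trials=200, mu=100.0, sigma=10.0):
    """Average distance between a size-n sample mean and the true mean mu,
    when each measurement carries random (normally distributed) error."""
    errors = []
    for _ in range(trials):
        sample = [random.gauss(mu, sigma) for _ in range(n)]
        errors.append(abs(sum(sample) / n - mu))
    return sum(errors) / trials

small = mean_abs_error(5)    # small samples: random error swings results widely
large = mean_abs_error(100)  # larger samples: random errors mostly cancel out
```

With only 5 measurements, a single lucky or unlucky draw shifts the average noticeably; with 100, the unpredictable pluses and minuses largely offset each other.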
selection bias
occurs if specific individuals or groups are purposely omitted from the investigation
bias
occurs in research and evaluations from numerous causes including selection, measurement, and intervention.
relationship with experimental measures
ex: patients are asked to complete two anxiety scales: one with established validity and a newly developed scale. Patients placed in an anxiety-provoking situation should score higher on both scales. If both scales show similar levels of anxiety, researchers conclude there is evidence of construct validity
environment
refers to how the setting changes over the course of research -can be a source of systematic errors ex: if the research was conducted outside, the temperature, humidity, wind speed, or heat index may have caused variations in data and results that are difficult to pinpoint
cultural and ethnocentric bias
research conducted on one specific cultural group is not generalizable to other cultures. For example, Spanish-speaking individuals do not all share the same cultural background (i.e., there are differences among the cultures of Puerto Rico, Mexico, Nicaragua, Honduras, Venezuela, and Spain) -religion can also contribute to cultural differences among various populations
racial bias
research conducted only on African Americans, for example, does not yield generalizable results to other racial groups. In today's world of linking genetics to health conditions, issues of racial inclusion are more important than ever.
gender bias
research including only men or only women is not generalizable to the non-represented gender group. The same is true for studies that include only heterosexual individuals and fail to recognize LGBTQ individuals.
comparison scores among defined groups
researchers create a new dexterity aptitude test with the hypothesis that adults have varying degrees of dexterity. *more than 1000 adults complete the new test. If results show that adults in specific fields (e.g. surgeons, dentists, musicians) score higher on the new dexterity aptitude test than adults in non-dexterity applications, there is evidence of construct validity
human-to-human links
researchers use college students for many studies because the students are a convenient sample. However, results are questioned because data gathered from college students may or may not be generalizable to other groups of young adults not attending college.
internal consistency
the extent to which each question in a survey is related to the same topic; also called "homogeneity" -in quantitative research, researchers use a split-half technique to measure homogeneity -Test A and Test B scores should be approximately equal, with all questions measuring concepts related to the material covered in the tested chapter
randomized
the research participants are randomly assigned to either a treatment group or a non-treatment group (placebo group) -allows researchers to draw conclusions with confidence if one group is significantly different at the end of the study
multiple treatment conditions
the same individuals are exposed to multiple treatments *because multiple treatments may create an artificial setting that does not exist in the real world, results may not be generalizable
construct validity
used to measure a concept that is not actually observable *can be established by exploring relationships with similar measures, experimental measures, and comparison scores among defined groups
threats to internal validity
things that confuse or confound test and survey results and overall findings 9 key threats: 1) history: an event happens during research that influences the behavior of participating individuals 2) maturation: the natural changes that occur over time with individuals 3) testing: differences noted from pretest to posttest that can be attributed to students becoming familiar with the test 4) instrumentation: changes in the measuring instrument or procedures that produce changes in respondent performance that cannot be credited to the treatment or intervention 5) regression: some respondents performing well on pretests and poorly on post-tests or vice versa, merely by chance. sometimes manifests as "regression to the mean," in which data from these high-pretest to low-posttest performances cancel each other out, and the overall score is similar to the average 6) ceiling effect and floor effect: **ceiling effect: when all participating individuals perform extremely well on pretest and posttest, making it difficult to determine any changes the intervention may have had **floor effect: when individual performance starts out low and remains low; leads investigators to think that individuals are unresponsive to treatment, when in fact the low performance may be caused by a factor outside the intervention or treatment altogether 7) attrition: individuals lost from the study; if a large number of individuals leave the study for a variety of reasons, results may reflect more about the individuals who stayed in the study than the treatment conditions of the study 8) selection: when participating individuals are different at the onset of the study; it can make it difficult to know whether the type of condition or the treatment used influenced the results 9) hawthorne effect: improving performance when you are aware that you are being watched
triangulate data collection
to increase reliability and validity, researchers and evaluators choose to collect similar data using more than one method
determining adequate sample size
too small: results are inconclusive, and significant differences among groups are statistically harder to detect too large: cost, feasibility, and time become problematic *researchers strive for an ideal sample size that adequately represents the target population. However, a larger sample is preferred over a smaller one because of the increased precision and accuracy of the study
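The card gives no formula, but one common approach for surveys estimating a proportion is n = z² p(1-p) / e²; the defaults below (95% confidence, conservative p = 0.5, ±5% margin of error) are conventional assumptions, not values from this chapter:

```python
import math

def sample_size_for_proportion(z=1.96, p=0.5, e=0.05):
    """Minimum sample size to estimate a population proportion.

    z: z-score for the confidence level (1.96 for 95%)
    p: anticipated proportion (0.5 is the most conservative choice)
    e: desired margin of error
    """
    return math.ceil(z ** 2 * p * (1 - p) / e ** 2)

n = sample_size_for_proportion()  # 385 respondents for +/-5% at 95% confidence
```

Tightening the margin of error or raising the confidence level drives the required sample up quickly, which is where the cost/feasibility trade-off on this card comes in.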
data collection training
training data collectors for consistency (e.g., detailed checklists, machine measurement calibration checks, inter-rater comparisons to avoid creating errors)
measurement errors
two types of errors that influence the results of surveys, tests, and instruments: random errors and systematic errors
dependent variables
variables that are measured as outcomes; they are not manipulated by researchers
independent variables
variables that researchers manipulate and control
advantages and disadvantages of ways to estimate reliability
ways to establish reliability: stability --> test-retest *pros: a single rater is adequate; no need to train teams of raters; less expensive and time-consuming *cons: often difficult to recruit the same respondents to respond twice; individuals may not respond as seriously the second time ways to establish reliability: internal consistency --> quantitative: split-half forms *pros: respondents take both halves at the same time; no need to recruit respondents twice *cons: need to create a large pool of items qualitative: inter-rater or inter-observation reliability *pros: best for observational research, especially when video recording is used; possible option: one rater reviewing video at two different times *cons: expensive, time-consuming, requires a team of raters for best results
drift
when evidence suggests that the data is slowly moving in one direction -ex: may occur if the machine used for lab results is not calibrated each day prior to running complete blood count samples
animal to human links
when researchers use rats or other animals to test specific drugs, it is questionable whether humans will react to the new drug in the same way as rats
stability
when the results of a survey or instrument are consistent over time.