Counseling Research Midterm Review

Standard scores

Also known as deviation IQ (M = 100, SD = 15). Deviation IQs, which in the education and counseling fields are frequently referred to simply as "standard scores," are commonly used to describe scores on intelligence, achievement, ability, and perceptual skills tests. Notice that all linearly transformed scores (e.g., z-scores, T scores) have traditionally been called standard scores, although the textbook refers to these as "standardized scores" throughout; in contemporary practice, deviation IQs are generically referred to as standard scores because it makes little sense to refer to a student's reading achievement score as a deviation IQ score. A deviation IQ score provides context for meaningful interpretation by declaring that the mean of a distribution of scores is 100 and the SD is 15. The SD can be either 15 or 16, but at this point in history nearly all tests use an SD of 15. Thus, standard scores greater than 100 are above the mean, and standard scores less than 100 are below the mean. The formula for converting a raw score into a standard score (SS) is SS = 15(z) + 100, where z = (X - M) / SD. Interpreting standard scores is similar to interpreting z-scores and T scores: for example, a standard score of 85 is exactly one SD below the mean, and a standard score of 107.5 is exactly one-half SD above the mean.
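
A minimal Python sketch of the conversion, assuming the raw scores 30 and 45 and the sample mean of 40 and SD of 10 used in the z-scores card. The function name `to_standard_score` is hypothetical, chosen only for illustration.

```python
def to_standard_score(raw, mean, sd):
    """Convert a raw score to a deviation IQ (M = 100, SD = 15)."""
    z = (raw - mean) / sd      # distance from the mean in SD units
    return 15 * z + 100        # rescale to the standard score metric


# Assumed example values: raw scores 30 and 45, sample M = 40, SD = 10.
print(to_standard_score(30, 40, 10))  # 85.0
print(to_standard_score(45, 40, 10))  # 107.5
```

The two results match the standard scores of 85 and 107.5 cited on the card.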

Grouped frequency distribution

Grouped frequency distributions are similar to the simple frequency distribution, except that instead of reporting every score obtained, the scores are grouped into convenient class intervals (i.e., a lot of scores are collapsed into fewer groupings of scores). Grouped frequency distributions provide a meaningful graph of scores and allow computation of some statistics with greater ease. Of course, with the widespread use of computers, the latter reason is a less pressing concern. It is important to realize that when using a grouped frequency distribution, one loses some precision while gaining some advantages in understanding the visual summary. For example, scores from 94-100 collapse into an A, and so on.
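
Collapsing scores into class intervals can be sketched in Python as follows. The scores, the interval width of 10, and the function name `grouped_frequencies` are made-up assumptions for illustration; the sketch also assumes whole-number scores.

```python
from collections import Counter


def grouped_frequencies(scores, low, width):
    """Count whole-number scores per class interval, highest interval first."""
    counts = Counter((s - low) // width for s in scores)
    return {
        (low + k * width, low + (k + 1) * width - 1): n
        for k, n in sorted(counts.items(), reverse=True)
    }


scores = [98, 95, 91, 88, 88, 84, 79, 77, 72, 65]
for (start, end), n in grouped_frequencies(scores, low=60, width=10).items():
    print(f"{start}-{end}: {n}")
```

Ten individual scores collapse into four intervals, which is easier to read but no longer shows the exact scores.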

Program records

Helpful sources of evaluative data that are usually easy to locate because they are generated by organizational bureaucracies. The most important information professional counselors should keep are outcome study reports and previous program improvement documents. These data sources help programs chart trajectories of program initiatives and document what has been done to address past suggestions for improvement.

Histograms

Histograms are similar to bar graphs in appearance, but you will immediately notice that the categories are actually continuous and, therefore, touch adjacent values. Histograms are used to visually display ordinal, interval, or ratio data and thus are more useful when summarizing test scores and most other variables used in research

Independent variables

IVs affect the behavior or status of another variable (the DV). Treatments, including counseling, medications, interventions, and training programs, often act as independent variables. Most independent variables can be manipulated by the researcher, although some are organismic variables -- characteristics of an individual that cannot be directly manipulated (e.g., age or sex of participant). In correlational studies, the IV is the predictor variable. In behavioral studies, the IV is the stimulus variable.

Nominal scale

Identification (nominal): names categories or groups

Disposal of research documents and records

Identifying participant protocols should already have been destroyed. There should be online databases of the information and an assigned custodian of the information. Paper records are destroyed, and the information is saved securely in electronic form, always protecting the identities of participants.

Inductive reasoning

Inductive reasoning (qualitative) starts with data and works from there (theory induction): try to come up with a theory to explain your data and observations. In inductive reasoning, a researcher begins by observing real and practical data that are evident in the environment to better understand the data or even develop a theory to better explain the observations of the data. For example, a researcher may notice how different teachers interact with and discipline their school-aged students and even conduct interviews with the teachers and the children to better understand and describe what is happening, why, and the intended and unintended results of the interactions. Inductive approaches tend to be more descriptive and correlational to "construct a theory" and often are linked to qualitative approaches for knowledge generation. Thus, the researcher studying teacher-student interactions may eventually be able to "induce" a theory of teacher-student disciplinary interactions.

Types of statistics

Inferential statistics; descriptive statistics.

Inferential statistics

Inferential statistics help determine whether obtained sample values can be generalized to the population from which the sample was drawn (i.e., inferences about a population are made from sample results) or help predict the probability of some coming event with some degree of confidence. Two families: parametric tests and nonparametric tests.

Methods for collecting data

Interviews; observations; written questionnaires, surveys, and rating scales; program records and schedules; standardized and educator-made tests; academic performance indicators; products and portfolios.

Program evaluation research

Involves assessment of a program at all stages of development and implementation, with a primary goal of providing quality services to individuals in need. Refers to diagnosing programs, not individuals, for individuals' benefit -- it does not even evaluate the counselors. Counseling program evaluation involves a systematic collection of information about a program or a program component to determine its effectiveness, efficacy, and benefit for clients served. Program evaluation is a kind of action research. Related terms: outcomes, accountability, evidence, evaluation, formative evaluation, summative evaluation, stakeholders, value-added assessment.

Kurtosis

Kurtosis describes the degree of flatness (or peakedness) in a curve. When a distribution of scores is flatter than normal, it is referred to as platykurtic. Leptokurtic -- more peaked than normal (kurtosis higher than 1). Platykurtic -- flatter than normal (kurtosis less than -1). Mesokurtic -- normal curve.
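
A rough Python sketch of the classification. The card gives only the cutoffs of 1 and -1, so the moment-based formula used here (fourth central moment divided by the squared variance, minus 3) is an assumption. Statistical packages such as SPSS apply a sample-size correction, so their values may differ somewhat. The function names and data are hypothetical.

```python
def excess_kurtosis(scores):
    """Population excess kurtosis: m4 / m2**2 - 3 (0 for a normal curve)."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n   # second central moment
    m4 = sum((x - mean) ** 4 for x in scores) / n   # fourth central moment
    return m4 / m2 ** 2 - 3


def classify(kurtosis):
    """Apply the card's cutoffs: > 1 leptokurtic, < -1 platykurtic."""
    if kurtosis > 1:
        return "leptokurtic"
    if kurtosis < -1:
        return "platykurtic"
    return "mesokurtic"


flat = [1, 2, 3, 4, 5]                       # evenly spread scores
peaked = [0, 0, 0, 0, 10, -10, 0, 0, 0, 0]   # piled at the center, heavy tails
print(classify(excess_kurtosis(flat)))    # platykurtic
print(classify(excess_kurtosis(peaked)))  # leptokurtic
```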

Discrete variables

Leave little to no room for disagreement; individuals must belong to only one category at any given time (e.g., male or female, political affiliation, level of education). Think of a number on a team jersey. Nominal (name), qualitative.

Normal curve (shape)

Like a bell curve -- perfectly symmetrical

Checklist items

List several response choices so participants can mark the response choice(s) that apply. Sometimes participants are asked to check their favorite or the "best" choice; other times they may be asked to check all that apply. The more items the participant is allowed to check, the more complicated data analysis and interpretation become, so it is easiest to analyze if you ask them to check one. Examples: "Check your favorite type of TV show from the list below." "Check your favorite type(s) of potato chips. Check all that apply."

Contributors

Listed in order of contribution. Primary researcher bears responsibility for ethics. First author is the "lead author" -- gave greatest contribution

T scores

M = 50, SD = 10. T scores are frequently used in behavioral, personality, and clinical research and test development. A T score provides context for meaningful interpretation by declaring that the mean of a distribution of scores is 50 and the SD is 10. Thus, T scores greater than 50 are above the mean, and T scores less than 50 are below the mean (in 10-point SD segments). In this way, the interpretation of scores is similar to z-scores, but the index of comparison has shifted (i.e., z-scores: M = 0, SD = 1; T scores: M = 50, SD = 10); a T score is the same as a z-score from a standardized score perspective but uses a different metric. The formula for converting a raw score into a T score is T = 10(z) + 50, where z = (X - M) / SD. For example, a T score of 40 is exactly one SD below the mean, and a T score of 55 is exactly one-half SD above the mean.
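
The card's metric shift (z score * 10 + 50) can be sketched in Python. The function name `t_from_z` is hypothetical, and the z values of -1.00 and +0.50 are assumed example inputs.

```python
def t_from_z(z):
    """Rescale a z-score (M = 0, SD = 1) to the T metric (M = 50, SD = 10)."""
    return 10 * z + 50


print(t_from_z(-1.0))  # 40.0
print(t_from_z(0.5))   # 55.0
```

The score's position relative to the mean is unchanged; only the scale it is reported on differs.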

Variance

Variance (also called "mean square deviation") is mathematically related to standard deviation, and so if you know the variance of a distribution, you also know the standard deviation. The variance is simply the average of the scores' squared deviations.
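
A small Python sketch of the "average of the squared deviations" definition. It takes deviations from the mean and divides by N (the population form). A sample estimate would divide by N - 1 instead, which the card does not discuss. The scores are made up.

```python
import math


def variance(scores):
    """Average of the squared deviations from the mean (divides by N)."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores) / len(scores)


scores = [2, 4, 4, 4, 5, 5, 7, 9]      # mean is 5
var = variance(scores)
sd = math.sqrt(var)                     # SD is the square root of the variance
print(var, sd)  # 4.0 2.0
```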

Measures of central tendency

Various measures of central tendency are meant to give an idea of what the center or middle of a distribution looks like. A measure of central tendency is (usually) the single score that is most representative or typical of the entire set of scores. Three measures of central tendency are important to understand: mode, median, and mean. In a normal curve, the mean, median, and mode are at the exact middle of the distribution. Related measures of variability: range, variance, standard deviation, semi-interquartile range.
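
The three measures can be computed with Python's standard `statistics` module. The scores below are invented for illustration. The second list adds one extreme score to show how the mean moves much more than the median.

```python
from statistics import mean, median, mode

scores = [3, 5, 5, 6, 8, 9, 13]
print(mean(scores), median(scores), mode(scores))

# One extreme score pulls the mean far more than the median.
with_outlier = scores + [100]
print(mean(with_outlier), median(with_outlier))
```

For the first list the mean is 7, the median is 6, and the mode is 5. With the extreme score added, the mean jumps to 18.625 while the median only moves to 7.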

Value-added assessment

What value is added to the lives of the clients I'm seeing as a result of the services I'm providing? Seeks to determine and document what a program adds to what participants already possess. A professional counselor will establish expected attitudinal, knowledge, and skill objectives, collect a baseline (i.e., pretest) measurement prior to program implementation, and then collect an end-of-program (posttest) measurement. If the posttest results are significantly higher than the baseline results, the program is determined to have "added value" to the participants' attitudes, knowledge, and/or skills in resolving conflicts.

Outcomes

What you want to achieve, or what actually happens even if you didn't achieve it. It's just the outcome.

Quantitative variables

Variables that can be represented on a numerical scale and are not exclusively categorical -- for example, distractibility, height, or place finished in a race. E.g., gender rated on a scale of 1 to 10.

Dependent variables

A DV "depends on" the independent variable for its response: the IV is the cause, the DV is the effect. IVs are manipulated; DVs are measured or observed, not manipulated. DV examples: depression, anxiety, achievement. The dependent variable is sometimes also referred to as the outcome variable, the criterion variable (in correlational research, where there is no insinuation of cause and effect), or the response variable (in behavioral research). Given a research study, you will need to learn to identify independent and dependent variables. Tip: "What is the effect of this [IV] on that [DV]?"

Frequency polygons

A frequency polygon is also used to visually display continuous variables. A polygon is a many-sided figure. It is basically the same as a histogram, except instead of using bars rising from the horizontal baseline, a single point above the midpoint of the interval is provided. In classic graphing fashion, this point represents both the score value (i.e., class interval) and the frequency of occurrence for each value. Once these points have been charted, straight lines are drawn from point to point until all the points are connected. Finally, a straight line is drawn from the farthest point at each end of the graph down to the baseline to indicate where the scores end. Figure 12.3 displays the frequency polygon for the depression frequency distribution. Notice that the frequency polygon is interpreted to mean the same as the histogram; it just has a different, more angular look. As the distribution becomes normal (which needs a large sample), the polygon starts to look like a normal curve.

Which client-counselor factors contribute to successful outcome?

A question outcome research seeks to answer

Simple frequency distribution

A simple frequency distribution lists every score and the number of times it occurs, generally from the highest score to lowest score
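
A simple frequency distribution can be tallied in a few lines of Python. The scores and the function name `simple_frequencies` are illustrative assumptions.

```python
from collections import Counter


def simple_frequencies(scores):
    """List (score, frequency) pairs from the highest score to the lowest."""
    counts = Counter(scores)
    return [(score, counts[score]) for score in sorted(counts, reverse=True)]


for score, frequency in simple_frequencies([10, 8, 9, 10, 7, 8, 10]):
    print(score, frequency)
```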

Informed consent

Addresses: purpose; risks, including discomfort; power; benefits; explanation of other procedures. Metaethical -- participants must be in a position to decide whether they should participate in a study (more relevant for medical studies). Researchers must answer any inquiries and give contact information. How will the data be kept confidential? Will it be anonymous? Researchers must get explicit (written) permission before publishing any identifying information. Covers potential audiences and dissemination. Participants can withdraw consent at any time.

Pretest

Administered before the program to measure against the posttest and determine the outcome. Baseline measurement. Used for value-added assessment.

Aggregation

All scores/data are lumped together. Achievement information used to be provided in this format, with all scores lumped together to provide educators with average performances for all students in a given grade on each subject area tested. Although this type of summary information can be helpful for understanding how the average members of a group perform, it does little to help understand the strengths and weaknesses of subpopulations of the entire group.

Guttman-type items

Also called cumulative items. Provide a series of items on a continuum, linked such that a participant's response to one item in the series indicates approval of all choices below the indicated level. To be analyzed, these data must be coded in a sequence (e.g., 1, 2, 3, or 4 according to the item number selected) rather than each item coded separately. Guttman-type items are somewhat complex to code, analyze, and interpret; thus they are seldom used anymore because it is easier to ask multiple-choice or open-ended questions and then collapse the scaling during coding. Example: "The amount of alcohol consumed by the average undergraduate ___ drink-equivalents per week." A. exceeds 13; B. is between 9 and 13; C. is between 5 and 8; D. is less than 5. You endorse more than just your answer: if you say that they drink at least 3, then you're also saying they drink at least 1.

Performance indicators

Any data that can be generated to represent how a participant performs on a given task or perceives how they performed a given task. Examples: GPA; grades; daily work behaviors (attendance, times out of seat without permission, etc.); attitudes (social self-efficacy, attitude toward school, etc.).

Evidence

Any data you submit to report; could be scores, products, etc. Any information or data helpful in a decision-making or judgment process. These data can be derived using quantitative or qualitative methods. Typical sources of evidence include standardized or counselor-made tests, surveys, questionnaires, interviews, focus groups, external judges, portfolios, self-assessments, actual performances, and work samples. Attached to programmatic objectives.

Validity

Validity means usefulness: accuracy at predicting some criterion -- accurate measurement. Uses test-retest coefficients (?). Approaches to validity:
Judgmental -- content validity: how well an instrument represents a given domain of information or behavior (does the test question accurately assess knowledge?). Content-related evidence asks whether the instrument(s) or sample accurately represents the variable under study. In essence, a researcher who is gathering content-related evidence asks whether the questions used in the instrument adequately assess the variable under study. To ensure content-related validity, researchers should look over the content (do the questions or items accurately reflect the definition of the variables and the sample of participants?) and format (e.g., clarity, readability, font type and size, and language) of the instrument to be used. Frequently, content experts determine how well an instrument represents a given domain of information or behavior.
Empirical -- criterion-related validity (most common): criterion-related evidence (i.e., empirical validity) is used to determine validity by comparing the instrument used in the study to another instrument or form of assessment presumed to measure the same variable. Two forms of criterion-related validity are predictive validity and concurrent validity. Predictive: administering the instrument and then allowing a time interval to elapse before comparison with the criterion scores (e.g., aptitude test and end-of-semester grade). Concurrent: administering the instrument and collecting the criterion data at the same point in time (e.g., attitudes of students compared to teachers' observations).
Judgmental-empirical -- construct validity: construct-related evidence includes a variety of different types of evidence supporting the characteristic being measured. Three common ways to measure construct-related validity include the use of (a) a clearly defined variable, (b) hypotheses based on theory explaining the variable, and (c) logical and empirically tested hypotheses.
While content-, criterion-, and construct-related validity have important applications in research, a more thorough discussion of these concepts can be found in an assessment text (e.g., Erford, 2013). To control for threats to internal and external validity, researchers use various methods of experimental control.

Continuous variables

Used to measure variables that, theoretically, can be divided to provide a more precise measurement of some psychological characteristic. Also called quantitative: ordinal, interval, ratio. Note: a 0 on the scale doesn't always mean ratio; it depends on the scale. E.g., the Beck Depression Inventory is interval because, although the scale has a 0, it isn't an absolute zero -- a 0 can't establish the absence of depression. Each individual question is an ordinal scale, but the total score is treated as an interval scale because the difference between scores of 5 and 6 is similar to the difference between scores of 9 and 10. The more items you add together, the more the score behaves like an interval scale. If it were standardized, it would truly be an interval scale. Characteristics measured using continuous scales are referred to as continuous variables, such as hyperactivity, intelligence, math achievement, and anxiety. Examples: Likert-type scales (multiple choice to assess perception, usually with no middle point, such as very dissatisfied to very satisfied); frequency estimates (i.e., almost never to almost always); thermometers; rulers. Measurements of psychological constructs are best thought of as approximations or estimates rather than precise measurements. All continuous variables have some measurement error, which is why standardization is essential: standardization leads to precision and replicability. The presence or absence of certain properties defines the type of measurement scale one is using. These properties include identification (nominal): identifies individuals; order in terms of magnitude (ordinal): puts scores in order (increasing or decreasing); equivalent intervals (interval): equivalency between intervals (i.e., the distance between categories 1 and 2 is equal to the distance between categories 4 and 5); and an absolute zero point (ratio).

Likert items

Assess a participant's attitude or preference using a question or stem accompanied by an ordered set of response choices (e.g., strongly disagree, disagree, agree, strongly agree). The classic Likert scale did not have a neutral middle choice (e.g., undecided, neutral, neither disagree nor agree), but researchers frequently add this middle choice, making it a Likert-type scale. Usually researchers do not provide a numerical rating with the verbal description but add one during coding or data input to allow quantification of response choices (0 = strongly disagree, 1 = disagree, 2 = undecided, etc.). Four or five answer choices tend to provide optimal score reliability. Label each answer choice -- critically important: the labels should NOT be simply "strongly disagree" and "strongly agree" at the ends; there need to be labeled options in between, like "disagree" and "undecided."

Posttest

Administered at termination or at follow-up. Results may be caused by other factors not related to the research. End-of-program measurement. Used for value-added assessment.

Process Evaluation

An audit. Doesn't have a quality indicator -- you're either doing it or you're not: "We're doing what we said we were going to." Make sure you're implementing what you said you would.

Bar graphs

Bar graphs look a lot like histograms, but one will notice that the bars on a bar graph never touch each other, whereas the bars on a histogram do. This is because bar graphs are a method for visually displaying discrete variables (e.g., sex or religious affiliation), whereas histograms are used to display continuous variables. Basically, any nominal variable can be appropriately displayed on a bar graph.

Research

Basic research; applied research; action research.

Semi-interquartile range

Because the range is affected by extreme scores, researchers may prefer to report the interquartile range (IR) or semi-interquartile range (Q). This is particularly appropriate when the median is reported as the measure of a distribution's central tendency. Although the IR is sometimes reported, Q is more relevant to our discussion. In practice, Q is simply one-half of IR (i.e., Q = IR/2). Semi means "half," inter means "between," quartiles are the three points within a distribution that divide the distribution into four equal parts, and range refers to the difference between these demarcations. The two quartiles of interest are the first and the third demarcations. You are already familiar with the second quartile, which is more commonly known as the median. There are several methods for computing Q, and they vary in complexity. Simply put, Q is half the difference between the 1st (Q1, or the 25th percentile) and 3rd (Q3, or the 75th percentile) quartiles, or Q = (Q3 - Q1) / 2. The more challenging part is determining Q1 and Q3! The statistics involved in computing Q1 and Q3 can be quite complex. Simply put, Q1 falls in the middle position of the bottom half of scores, while Q3 falls in the middle of the top half of scores. The good news is that SPSS gives Q1, Q2, and Q3 as part of its default parameters when using the frequencies analysis. In fact, nearly all of the measures of central tendency and variability covered in this section are available through common statistical packages, such as SPSS, Excel, or R. Use statistical programs whenever possible: it cuts down on errors and saves a lot of time. In a normal distribution, Q1 and Q3 will be the same raw or standardized score distance from the median score. The greater the difference between the median-to-Q1 distance and the median-to-Q3 distance, the more skewed the distribution will be.
In general, both Q and IR have the advantages of not being influenced by extreme scores and fitting nicely as an index of variability to accompany the median score, just as the mean and standard deviation are often wedded when describing a relatively normal distribution. Thus, the 21st client who receives a 50 for either the depression scale or substance use scale will have little effect on the median, interquartile range, or semi-interquartile range. Figure 12.9 provides an example of finding the semi-interquartile range for the depression scores.
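
A Python sketch of the "middle of the bottom half, middle of the top half" description. This is only one of the several methods the card mentions, so statistical packages may return slightly different quartiles. The function name and scores are hypothetical, and the sketch needs at least two scores.

```python
from statistics import median


def semi_interquartile_range(scores):
    """Q = (Q3 - Q1) / 2, with quartiles taken as medians of each half."""
    ordered = sorted(scores)
    half = len(ordered) // 2          # with an odd count, the median is left out
    q1 = median(ordered[:half])       # middle of the bottom half
    q3 = median(ordered[-half:])      # middle of the top half
    return (q3 - q1) / 2


scores = [1, 3, 5, 7, 9, 11, 13, 15]
extreme = [1, 3, 5, 7, 9, 11, 13, 150]   # same scores, one extreme value
print(semi_interquartile_range(scores))
print(semi_interquartile_range(extreme))
```

Both calls give Q = 4.0, which illustrates that Q is not influenced by a single extreme score.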

Normal curve

Bell shaped and symmetrical (Gaussian, after the mathematician Gauss). Single peak where mean = mode = median. For every score above the mean there is an equivalent score below the mean. Asymptotic: tapers as it proceeds away from the center of the distribution and, theoretically, approaches the horizontal axis without ever touching it. Has mathematical properties that make it very useful in interpreting norm-referenced instruments. The distributions of a number of human traits approximate the normal curve (e.g., weight, height, intelligence). It allows comparisons to be made between different clients' scores on the same test, or between the same client's scores on different tests. Half (50%) of the scores always fall below the mean, and half always fall above the mean. About 68% (68.26%) of all scores fall between 1 SD above the mean and 1 SD below the mean. About 95% (95.44%) of all scores fall between 2 SD above the mean and 2 SD below the mean. About 99% (99.74%) of all scores fall between 3 SD above the mean and 3 SD below the mean. Has the property of equalizing all sorts of standard score scales (e.g., T scores, deviation IQs, z-scores): a deviation IQ of 130, a T score of 70, and a z-score of +2.00 will always mean the score falls at the 98th percentile. This interchangeability is one of the valuable characteristics of the normal curve.
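
These areas can be checked with `NormalDist` from Python's standard `statistics` module. This is a sketch: the computed proportions may differ from the card's figures in the second decimal place, because printed tables are rounded. The helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Proportion of scores falling within k SD of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")

# The same relative position expressed on three different score scales.
z_rank = standard_normal.cdf(2.0)          # z-score of +2.00
t_rank = NormalDist(50, 10).cdf(70)        # T score of 70
iq_rank = NormalDist(100, 15).cdf(130)     # deviation IQ of 130
print(f"{z_rank:.1%} {t_rank:.1%} {iq_rank:.1%}")
```

All three percentile ranks agree at roughly the 98th percentile, which is the interchangeability the card describes.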

Effects of Mathematical Operations on Central Tendency & Dispersion

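Can normalize the curve. Adding or subtracting a constant affects only the mean (and the other measures of central tendency), not the dispersion. Multiplying or dividing by a constant affects everything (both central tendency and dispersion).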
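
A quick Python check of the add/subtract versus multiply/divide rule, using made-up scores and the standard `statistics` module. `pstdev` divides by N; the same pattern holds for the sample SD. The helper name `summarize` is hypothetical.

```python
from statistics import mean, pstdev

scores = [2, 4, 4, 4, 5, 5, 7, 9]          # mean 5, SD 2
shifted = [x + 10 for x in scores]          # add a constant to every score
scaled = [x * 3 for x in scores]            # multiply every score by a constant


def summarize(values):
    """Return (mean, standard deviation) for a list of scores."""
    return mean(values), pstdev(values)


print(summarize(scores))
print(summarize(shifted))   # mean moves by 10, SD unchanged
print(summarize(scaled))    # mean and SD are both tripled
```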

Duplicate submission

Can only submit to one journal at a time.

Variables

Categorical; quantitative; discrete variable; continuous variable; independent; dependent. Behavioral studies: stimulus variable (IV), response variable (DV). Correlational studies: predictor variable (IV), criterion variable (DV).

Characteristics of a research study

Characteristics of a research study (see quantitative research flowchart on document): (1) develop an idea into a testable hypothesis; (2) choose an appropriate research design; (3) identify the population and select a sample from the population; (4) conduct the study; (5) analyze the obtained data; (6) report the results. Mnemonic: "Hey, Don't Stop, Call A Rat" (Hypothesis, Design, Sample, Conduct, Analyze, Report). Report sections: abstract, methods, results, discussion, conclusion.

Observations

Commonly used in qualitative studies; time intensive and expensive. Two types: (1) Informal observations ordinarily yield anecdotal data as researchers scan the environment and make notes on what is observed. However, different observers may come away with different reports or perspectives of the same observation target, and confirmation bias can be an issue. (2) Formal or structured observations collect specific types of data for specified periods using predetermined procedures. Such a structured approach tends to minimize bias and results in different observers recording similar data.

z-scores

Commonly used in research; M = 0, SD = 1. A common type of standardized score used in research is the z-score. The z-score has a mean of zero and an SD of one and is computed from the raw score distribution using the formula z = (X - M) / SD, where X is the participant's raw score, M is the sample mean, and SD is the sample standard deviation. The formula allows the individual's raw score to be expressed as above the mean (a positive z-score) or below the mean (a negative z-score) in terms of the sample's SD. z-scores are interpreted in standard deviation units above or below the mean: a z-score of -1.00 means 1.00 standard deviation below the mean (the minus sign indicates "below the mean"), a z-score of +2.00 is 2 SD above the mean, and a z-score of -0.5 is one-half SD below the mean. For example, assume that client A's raw score is 30 and client B's raw score is 45. By themselves, raw scores of 30 and 45 convey little meaning. Perhaps one can only conclude that the score of 45 is higher than the raw score of 30 by 15 raw score scale points, but little else can be discerned without further information. To convert these raw scores to z-scores, you need to know the mean and SD of the raw score distribution. In this example, assume the sample of scores has a mean of 40 and an SD of 10.
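
The card's example stops before the arithmetic, so here is a short Python sketch that carries it through with the stated values (raw scores 30 and 45, mean 40, SD 10). The function name `z_score` is hypothetical.

```python
def z_score(raw, mean, sd):
    """Express a raw score as SD units above (+) or below (-) the mean."""
    return (raw - mean) / sd


print(z_score(30, 40, 10))  # client A: -1.0
print(z_score(45, 40, 10))  # client B: 0.5
```

Client A falls one SD below the mean and client B one-half SD above it.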

How do counselors support research?

Counselors support efforts of researchers by participating in studies whenever possible.

Data-driven needs assessment

Deals with real needs that are identified through analysis of program data. Used mostly in educational settings. Refers to information derived through the administration of tests or other standardized and objective sources of information by analyzing either aggregated data or disaggregated data; disaggregated data help determine equitable treatment of subgroups across the curriculum and reveal gaps or divisions within performance needs. Differences can be determined through statistical means or simply by noting gaps/needs in performance. Related terms: aggregation, disaggregation.

Deductive reasoning

Deductive reasoning (quantitative) starts with theory and works from there: use theory to predict how the question would be answered. Used to test the veracity of theories (which are created by inductive reasoning). In deductive reasoning, a researcher begins with an established, or at least a tentative, theory and, through an in-depth study of how that theory purportedly operates, develops and tests hypotheses of how certain facts of the theory will operate or relate to observable variables. The manipulation of these variables creates data that are then examined and determined to either support or not support the hypothesis. Then the cycle is repeated. These results are then used to confirm, disconfirm, or modify the theory. In the teacher-student interaction example, the researcher could test the hypothesis that teachers who demonstrate predominantly democratic/authoritative disciplinary styles yield significantly better student academic outcomes than teachers with authoritarian/autocratic disciplinary styles. Notice how this deductive process starts with a preexisting theory and tests that theory by collecting data to answer a question stemming from that theory.

Accountability

Demonstrating responsibility for one's actions, usually by providing evidence of efficient services or evaluation of program effectiveness. Accountable professional counselors seek to understand the effects of their interventions and take responsibility for their actions. Accountability should always be based on evidence.

Stakeholder

Depends on who you're serving and who's invested or affected. Individuals to whom the professional counselor is responsible: anyone who may benefit from, is involved in, or is interested in the program. Stakeholders may include clients, students, participants, parents, educators, other professional counselors, community leaders, taxpayers (if a program is publicly funded), employers, or the researchers/evaluators.

Descriptive statistics

Descriptive statistics describe and summarize data collected on a variable or the relationship between variables. Topics: frequency distribution; grouped frequency distribution; bar graphs; histograms; frequency polygons; shapes of distributions; skewness (symmetry); normal curve; kurtosis (peakedness): leptokurtic/platykurtic/mesokurtic.

Conceptual definitions

Dictionary-style definitions; different individuals might define the same variable in different ways.

Scales of measurement

Discrete (qualitative) or continuous (quantitative): nominal, ordinal, interval, ratio.

Action research

Does not build theory the same way basic research does, nor does it apply theory in any particular way. The researcher is often a practitioner who has an immediate problem they would like to understand better. Action research is the application of theory to a local setting that does not allow for generalizability but does create local insight toward the improvement of practice. Of interest to school counselors because they're embedded in a context. In short: does not build theory (as basic research does) and doesn't apply theory (as applied research does).

Plagiarism

Don't do it... Don't take anyone's ideas Don't use their phrasing Even if you edit it in minor ways or use their phrasing, it's plagiarizing Always appropriately cite concepts, etc.

Outcomes research

During the outcomes assessment phase, professional counselors collect final data identified during the program planning stage Counselors analyze all data sources and draw conclusions about intervention and program effectiveness To assess the outcomes of programs and to draw conclusions about treatment success, professional counselors need to analyze outcome data collected throughout the counseling program evaluation cycle Three types of methodology Clinical trials Qualitative analysis/reviews Meta-analytic reviews

Quantitative scales

Ordinal scales Interval scales Ratio scales

Summative evaluation

Evaluation at the end of the program End-of-program evaluation, and is very frequently performed, sometimes even mandated, by funding agencies or administrators The primary purpose of summative evaluation is the determination of whether and to what degree program objectives and outcomes have been met (IE Did the program accomplish what it set out to accomplish?) Summative evaluation is pursued to determine whether a program should be continued, but outcome results can often be helpful in making program modifications prior to beginning a new cycle of program implementation

Associational hypothesis

Explore how scores on an independent variable (called the predictor variable) relate to scores on some dependent variable (called the criterion variable), with both variables ordinarily comprising continuous scales Same language as correlational studies EG do SAT scores predict college success

Forced-choice items

Popular in testing. Most forced-choice questions are some permutation of a multiple-choice question format. Multiple-choice items provide a question or stem of a question/statement along with several possible answers. Participants are instructed to choose the correct or best answer from among the given response choices. The categories of forced-choice items are often meant to be exhaustive. Forced-choice items are easy to code and analyze. Examples: What is your sex? 1. Male 2. Female. What is your marital status? 1. Single, never married 2. Married 3. Divorced, separated, or widowed

Frequency distribution

Frequency distributions allow researchers to quickly view the frequency with which each score occurs in a distribution; there are two types: simple and grouped. Simple frequency distribution Grouped frequency distribution
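
The two types named above can be sketched in a few lines of Python. This is only an illustration: the scores, the function names, and the interval width of 10 are made-up assumptions, not values from the course material.

```python
from collections import Counter

def simple_frequency(scores):
    """Count how often each distinct score occurs, highest score first."""
    counts = Counter(scores)
    return sorted(counts.items(), reverse=True)

def grouped_frequency(scores, width):
    """Count integer scores in equal-width intervals, keyed by (low, high) inclusive."""
    counts = Counter((s // width) * width for s in scores)
    return [((low, low + width - 1), n)
            for low, n in sorted(counts.items(), reverse=True)]

# Hypothetical test scores, used only for illustration
scores = [70, 72, 75, 75, 78, 81, 81, 81, 86, 93]
simple = simple_frequency(scores)
grouped = grouped_frequency(scores, 10)
```

The simple version keeps every distinct score, while the grouped version trades detail for a shorter table.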

Open-ended items

Frequently used in qualitative studies Provide a question or incomplete sentence stem that cannot be answered yes or no with a single-word responses (ex. Extended response questions) No specific answers are provided for the participant to choose among Open-ended items are designed to generate extended responses that are not biased Qualitative research and program evaluation studies frequently use open-ended questions because these questions yield thick, rich descriptions and narratives from the participants' own perspectives Cons: coding and analyzing responses is often challenging and these questions are frequently skipped by participants who aren't motivated to do the extra effort required to answer Examples What additional comments or concerns would you like to express? How was the program most helpful to you? What are your suggestions for improving the program?

Group comparison hypothesis

Generally lead to research approaches that use randomized experimental, quasi-experimental, and causal-comparative designs. Categories of the independent variable are used to divide participants into groups, which are then compared to determine if they differ on the dependent variable. Of interest in counseling. EG a new treatment will be more effective than the old treatment (independent: the two treatment modalities, dependent: level of depression)

Persons not capable of informed consent

They give informed "assent," not consent. This applies to minors and to people with mental disorders or intellectual disabilities.

Baseline inputs

Measured by pretest? What you need to accomplish your goals?

Questionnaires, surveys, and rating scales

Most commonly used These are usually paper-and-pencil instruments, although the internet is allowing for more electronic administration (Survey monkey, Qualtrics) Researchers ask open-ended and/or closed-ended questions Surveys ask participants for their perceptions about the topics under study whereas questionnaires and rating scales (observing) typically ask for more factual responses Good psychometrics Nonresponse bias occurs when participants do not complete or do not return the instrument/survey (attrition) -- so the participants who submitted results must be similar to the nonresponses to show validity in the study

Ordinal scale

Name and provide an order for the choices, graded scale Order of magnitude (ordinal): Refers to how much of a characteristic a client may have and the ability of the professional counselor to rank these magnitudes in some ascending or descending manner Magnitude applies to some scaling methods but not others Allows people to be rank ordered

Interval scale

Name, order, and have equal intervals between the different designation on the scale, EG thermometer Equal intervals (interval): Means that the mathematical distance between the numbers must be accurately reflected in the distances between scale descriptors On an intelligence, aptitude, or achievement test, this is sometimes accomplished by scaling each answer as correct (1 point) or incorrect (0 points) where a 0 is the absence of correctness, and a 1 is the presence of correctness When 20 equally weighted items comprise the test, one may conclude that someone with a higher score had more correct answers, thus indicating greater mastery of what was being tested It becomes a bit more complicated when scales use the descriptors 0=almost never, 1=sometimes, 2=frequently, 3=almost always In this case, the burden is on the test developer (and test interpreter) to demonstrate that the distance between responses 0 (almost never) and 1 (sometimes) is the same as between responses 1 (sometimes) and 2 (frequently) As these problematic items are added to get a total score, the problem becomes less pronounced

Ratio scale

Name, order, have equal intervals, and have absolute zero. EG height, weight, Kelvin. Ratio scale: Has all the same properties of an interval scale (names and provides a measure of magnitude where the units are in equal intervals) AND has a meaningful zero. Examples: Speed: number of miles per hour is a measure of magnitude and each added mile is an equal interval. Plus, there is a meaningful zero (when you are driving at zero miles per hour, you are stopped and not moving). Height: number of inches is a measure of magnitude and each added inch is an equal interval. Plus, there is a meaningful zero (meaning no height). Weight. Time on a stopwatch. Degrees Kelvin.

Qualitative variables

Nominal scales

Meta-analyses for career and educational planning activities

Outcome research

Research hypothesis

Nondirectional hypothesis: predicts that a difference will occur without indicating which group will perform higher; it will not predict whether the relationship between variables in an associational hypothesis will be positive or negative. Directional hypothesis: predicts not only the difference (or relationship) but the direction as well. Null hypothesis (H0): predicts that no differences (or relationships) will be observed between or among groups. Alternative hypothesis (H1): predicts that differences (or relationships) will be observed between or among groups. Types: basic descriptive hypothesis, associational hypothesis, group comparison hypothesis. Not all research has hypotheses. Could even be a statistical hypothesis. Usually involves the variables themselves.

Formative evaluation

Ongoing evaluation of the program as it happens Cost-effective A way of gathering info about program implementation and effectiveness during the course of the program. One facet of formative evaluation is a determination of whether the program is being implemented properly Another facet involves whether the program outcomes are progressing as planned and objectives are being met. Program implementation includes milestones or benchmarks en route to full implementation. These milestones not only indicate which facets of a program have been implemented, but also the expected levels of progress the participants should be achieving. If participants are not meeting expectations at these points in program implementation, formative evaluation allows for midcourse adjustments or modifications to be made for the program to meet its objectives Formative evaluation is very cost-effective when implementing large-scale expensive programs where a lot is riding on effective outcomes, can change mid-course

Designing instruments and gathering data

Open-ended items Forced-choice items Ranking items Checklist items Binary option Likert items Guttman-type items Semantic differential items *He will give examples and then you'll have to identify what kind of item it is*

Clinical trial studies

Outcome research Appear frequently in the research literature and are helpful because of their reliance on comparison groups (e.g. control, placebo, alternative treatment), standardized treatment protocol, and the use of outcomes measures Offer only a single result based on quantitative methodology Different studies on the same topic may yield variable or even contradictory results

Meta-analysis

Outcome research. The gold standard: aggregates clinical trials into a generalizable conclusion. A specific quantitative technique that allows empirical studies to be collapsed into a meaningful quantitative index, known as an effect size (ES). An effect size for experimental studies is usually calculated by subtracting the mean of the control group from the mean of the experimental group and then dividing by the pooled standard deviation of the control and treatment groups. Effect sizes from various comparable studies can be averaged and compared to a criterion-referenced effect size range to indicate the strength of the finding (Cohen coefficient): ES of 0 indicates no effect of treatment; ES = 0.20 indicates a small effect of treatment (the alt. treatment performed about 0.20 SD better than control); ES = 0.50 indicates a medium effect of treatment; ES = 0.80+ indicates a large effect of treatment (some use 0.67). Not only statistically significant, but also practically relevant. Expressing effect sizes in correlational research involves the computation of the Pearson r: r = 0.10 (small), r = 0.30 (medium), r = 0.50 (large).
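
The effect size calculation described above (difference in means divided by the pooled standard deviation) can be sketched as follows. The group scores and function names are hypothetical, the pooled SD uses the common sample-variance weighting (an assumption, since the card does not say which pooling formula is meant), and the "negligible" label for values under 0.20 is also an assumption.

```python
import math
import statistics

def cohens_d(treatment, control):
    """Mean difference divided by the pooled standard deviation."""
    n_t, n_c = len(treatment), len(control)
    var_t = statistics.variance(treatment)  # sample variance (n - 1)
    var_c = statistics.variance(control)
    pooled_sd = math.sqrt(((n_t - 1) * var_t + (n_c - 1) * var_c) / (n_t + n_c - 2))
    return (statistics.mean(treatment) - statistics.mean(control)) / pooled_sd

def label_effect(d):
    """Apply the 0.20 / 0.50 / 0.80 benchmarks listed on the card."""
    size = abs(d)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "medium"
    if size >= 0.20:
        return "small"
    return "negligible"  # assumed label for values below 0.20

# Hypothetical outcome scores, for illustration only
treatment = [12, 14, 16, 18, 20]
control = [10, 12, 14, 16, 18]
d = cohens_d(treatment, control)
label = label_effect(d)
```

Here the treatment group averages 2 points higher and the pooled SD is about 3.16, so the effect size is about 0.63 SD, a medium effect by these benchmarks.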

Qualitative analysis

Outcome research Involves researchers examining and summarizing robust trends and findings across studies, clients, and contexts Can present biased conclusions, so criteria and procedures should be put into place to ensure that the information is processed systematically and conclusions are robust and applicable

Implications of outcomes research for professional counselors

Outcome research Practitioners and educators ask practical questions that can be conceptualized and researched, leading to more relevant and practical training and practitioner-based information Clinicians can integrate these new, more relevant findings into clinical practice, and trainers can integrate the findings into state-of-the-art counselor education programs and in-service training modules Collaborative and collegial rather than contentious or adversarial Goal is to have more empirically validated, evidence-based practices occurring in the counseling field Researchers must understand the difference between clinical efficacy (tightly controlled experimental conditions that may not generalize) and effectiveness studies (what works in pragmatic, naturalistic circumstances)

What number is an effective effect size? Cohen's coefficient

Outcome research 0.7 or greater is effective 0.2 -- small 0.5 -- medium 0.67 -- large

Group counseling and individual counseling had effect sizes of...

Outcome research Of 0.7 and greater is considered effective Medium to large effective treatment More effective than not

Subject or participant?

Participant

Percentile ranks

Percentile ranks, or percentiles, indicate the percentage of observations that fall below a given score on a measure plus one-half of the observations falling at the given score. An ordinal scale that allows you to designate performance in a lineup of 100 (93rd percentile means you did better than 93% of people) [Imagine a line of 100 people]. They're ordinal because they rank performance without equal intervals between ranks. Whereas proficiency in understanding and using disaggregated data relies on a comprehensive knowledge of norm-referenced and criterion-referenced score interpretation, the use of percentile ranks helps individuals understand the basics of score meanings and differences between scores of various sub-populations. Usually transform percentile ranks into a different kind of score to do math and then transform back to percentile ranks. Any standardized score can be transformed into percentile ranks.
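
The definition above (percentage of observations below the score plus one-half of those at the score) translates directly into code. A minimal sketch, with a made-up list of scores and a hypothetical function name:

```python
def percentile_rank(scores, score):
    """Percentage below the score plus half the percentage at the score."""
    below = sum(1 for s in scores if s < score)
    at = sum(1 for s in scores if s == score)
    return 100 * (below + 0.5 * at) / len(scores)

# Hypothetical scores, for illustration only
scores = [55, 60, 60, 65, 70, 70, 70, 75, 80, 90]
pr_70 = percentile_rank(scores, 70)  # 4 below, 3 at the score
pr_90 = percentile_rank(scores, 90)  # 9 below, 1 at the score
```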

Skewness (Symmetry)

Positively skewed (or skewed right) means a distribution of scores has a lot of low scores (e.g., low-end piling) so a preponderance of scores fall below the mean. In other words, few scores fall at the positive end of the distribution, so the distribution "tails-off" in the positive (high-score) direction. When graphing a positively skewed distribution of scores, the "hump" or mode shifts to the left, and the tail of the distribution trails off to the right. If one puts an arrow on the tail of the curve, it will point to the right. To correct for a positive skew, the test developer can make the scale more normal by adding easier items to the item pool. Negatively skewed (or skewed left) means that a distribution of scores has a lot of high scores (e.g., high-end piling), and so a preponderance of scores fall above the mean. In other words, few scores fall at the negative end of the distribution, and so the distribution "tails-off" in the negative (low-score) direction. When graphing a negatively skewed distribution of scores, the "hump" or mode shifts to the right, and the tail of the distribution trails off to the left. If one puts an arrow on the tail of the curve, it will point to the left. To correct for a negative skew, the test developer can make the scale more normal by adding more difficult items to the item pool. Figure 12.7 shows a comparison of a normal curve with both positively and negatively skewed distributions. Note that on the normal curve, the mean, median, and mode are all at the same score (i.e., mean = median = mode). This is because the center point of a normal curve is the highest peak, and the distribution is bilaterally symmetric (i.e., a mirror image). When a distribution is positively skewed, the mean and, to a lesser extent, the median are pulled toward the tail (i.e., mean > median > mode). 
Likewise, when a distribution is negatively skewed, the mean and, to a lesser extent, the median are also pulled toward the tail (i.e., mean < median < mode). It is important to note that obtaining a skewed distribution is not always a bad thing. In fact, it may be something you want to plan for. For example, when attempting to identify students at risk for math deficiencies, a test developer is interested in identifying the students who are performing below the average level and is not at all interested in including difficult items that might help to identify the best math students in a class. SPSS: a skewness value between -1 and 1 is within tolerance for a normal distribution (mesokurtic refers to normal kurtosis, not skewness).
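
The ordering of mean, median, and mode described above can be checked on small data sets. This is a rough sketch: the scores are invented, and comparing the mean with the median is only a quick heuristic for the direction of skew, not the skewness statistic that SPSS reports.

```python
import statistics

def central_values(scores):
    """Return (mean, median, mode) for a list of scores."""
    return (statistics.mean(scores),
            statistics.median(scores),
            statistics.mode(scores))

def skew_direction(scores):
    """Heuristic: the mean is pulled toward the tail, so compare it with the median."""
    mean, median, _ = central_values(scores)
    if mean > median:
        return "positive"
    if mean < median:
        return "negative"
    return "symmetric"

# Hypothetical scores: low-end piling with a tail toward high scores
low_piled = [1, 2, 2, 2, 3, 3, 4, 5, 9, 12]
# Mirror image of the same scores: high-end piling with a tail toward low scores
high_piled = [13 - s for s in low_piled]
```

For `low_piled` the mean (4.3) exceeds the median (3), which exceeds the mode (2); the mirrored list reverses that order.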

Operational definitions

Precise physical and concrete steps that will allow anyone to see and measure a variable. Operational definitions enhance replication (a hallmark of research). Specific to research. Define variables to the point that they're replicable. EG "Depression is X score on Beck Depression Inventory 2nd Ed." EG Race. EG Age.

Who bears primary responsibility for the ethics of a research project?

Primary researcher

ACA Code of Ethics

Protect the rights of human participants. Counselors do not engage in misleading or fraudulent research, distort data, misrepresent data, or deliberately bias their results. Keep in mind concepts like beneficence and nonmaleficence.

Binary options

Provide only two mutually exclusive answer choices for each question or prompt (e.g. yes/no, agree/disagree, true/false, satisfied/dissatisfied) This two-choice response format simplifies statistical analysis and interpretation but restricts participant responses to an "all or none" dimension (like/dislike, etc.); thus may lack "sensitivity" Common in personality tests e.g. "I work hard in school" agree/disagree

Using outcomes studies in program evaluation

Qualitative and quantitative. Quantitative: Non-experimental. Pre-experimental. True experimental -- cause and effect, comparing control group and treatment group. Quasi-experimental -- different sampling methods; instead of randomly assigning people to groups, they might assign boys to one group and girls to another -- threatens validity. Causality -- did the intervention cause the outcome? Generalizability -- could the intervention be used in other programs?

Ranking items

Require participants to rank order a number of provided choices. Although easy for participants to respond to, they are difficult to statistically analyze, often because participants do not always rank all of the given choices, thus creating missing data. Example: Rank the following types of television shows in terms of how much you would like to watch them... (1-4, with 1 = most preferred, 4 = least preferred -- please use all four numbers and please use each number only once). Not cumulative lists, just the ones the researcher wants perspective on.

Categorical variables

Variables whose response choices can be categorized -- for example, geographical regions of the United States, sex, candidates for a political office Nominal or qualitative EG Gender as M/F

Products and portfolios

Real-life examples of performance that an evaluator can study and evaluate A product is anything created by a participant representing a program goal (anything created by participant such as artwork, composition, poster, etc.) A portfolio is a collection of products that can be evaluated to determine the quality of an individual's performance

Basic research

Research that seeks to build a theory (induce a theory). A theory can be seen as a lens through which researchers view the phenomenon they want to study. Theories are different from laws -- theories explain an aspect of experience, laws explain all experiences. Theories help researchers: make a prediction, state a hypothesis, write research questions, provide a conceptual framework.

The research question

Researchers begin with a problem they want to solve or a question they want to answer. Legal and ethical implications for conducting the research should be considered before too much time or resources have been invested. Research questions are phrased in broad, general terms and may make specific reference to the variables of interest. Research questions help to focus the researcher's literature review and lead to the development of specific research hypotheses. Often in the qualitative tradition a research question is all you need to get started. In the quantitative tradition, we generally generate hypotheses. Variables of interest, asked in a way that's intriguing. More interested in exploring. help to focus the researcher's literature review and lead to the development of specific research hypotheses

Perceptions-based needs assessment

Return rate -- nonresponse bias Triangulation: Triangulation refers to the use of multiple methods or data sources in qualitative research to develop a comprehensive understanding of phenomena (Patton, 1999). Triangulation also has been viewed as a qualitative research strategy to test validity through the convergence of information from different sources. Asks the consumers of the counseling services for help in prioritizing needs and directing a program's focus More traditional/familiar approach than data-driven model When planning a perceptions-based needs assessment, a professional counselor and advisory committee should consider the questions Whose needs should be assessed? When should needs be assessed? What needs should be assessed? How should needs be assessed?

Standard error of measurement

SEM is based on score reliability (EG internal consistency) but reported in terms of the actual test score units (raw score units [raw score, t-score, standard score]) and helps determine the range of scores within which the true score probably lies. It is sometimes called a confidence interval or error band

Confidence intervals

SEM is based on score reliability (EG internal consistency) but reported in terms of the actual test score units (raw score, T score, standard score) and helps determine the range of scores within which the true score probably lies. Sometimes called a confidence interval or error band. Confidence intervals allow us to report an individual score in the context of the score's reliability. For example, it is not accurate to state that a participant's IQ is a standard score of 110, because all measurements have error. Based upon the normal distribution of score errors and knowing the reliability of a set of test scores, it is possible to report a range of scores within which the true score probably lies, at a certain level of probability. This range is built from the standard error of measurement (SEM). No effect to moderate effect -- higher is more effect
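
A worked sketch of the idea above, using the IQ example of a standard score of 110. The card gives no formula, so this assumes the conventional one, SEM = SD x sqrt(1 - reliability); the reliability of 0.91 and the function names are hypothetical.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """Conventional formula (assumed here): SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

def confidence_band(score, sem, z=1.0):
    """Range around an obtained score; z=1.0 is about 68%, z=1.96 about 95%."""
    return score - z * sem, score + z * sem

# Standard score scale (M=100, SD=15) with a hypothetical reliability of 0.91
sem = standard_error_of_measurement(15, 0.91)
band_68 = confidence_band(110, sem)
band_95 = confidence_band(110, sem, z=1.96)
```

With these assumed values the SEM is about 4.5 points, so the obtained score of 110 would be reported as roughly 105.5 to 114.5 at about 68% confidence.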

Deception

Should be ethical and be disclosed as soon as possible. Keeping beneficence in mind Immediately un-deceive after study

Range

Simply put, the range is the distance between the highest and lowest scores in the distribution. Highest minus lowest, plus 1
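
The "highest minus lowest, plus 1" rule above (the inclusive range) is a one-line calculation. The scores and the function name here are made up for illustration.

```python
def inclusive_range(scores):
    """Highest score minus lowest score, plus 1."""
    return max(scores) - min(scores) + 1

# Hypothetical scores, for illustration only
scores = [70, 72, 75, 78, 81, 86, 93]
score_range = inclusive_range(scores)  # 93 - 70 + 1
```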

Shapes of distributions

Skewness (Symmetry) Normal Curve Kurtosis (Peakedness) - Leptokurtic/Platykurtic/Mesokurtic

Standardized or counselor-made tests

Standardized tests exist that measure achievement, intelligence, anxiety, substance use, hyperactivity, career interests, and many other behaviors Test publishers are a good source of commercially available products, and mental measurements yearbook (MMY) provides independent, critical reviews of many of these products

Basic descriptive hypothesis

Summarize participant scores on a single variable, usually by presenting measures of central tendency and variability, or percentages of each category EG what percentage of females voted for candidate X

Applied research

Takes basic research to another level Of interest to counselors Requires designing and implementing experimental conditions; identifying the target population, randomized selection, and assignment; and examining research findings Applies theory Helps push the envelope How far theories can be extended How generalizable are theories of basic research (to different populations, etc.) Typical question -- what qualities do counselors possess that influence outcomes

Reliability

Test-retest, dependability -- quantitative Consistency, how often data acts the same or measures the same Internal Consistency: when writing a lit review, examine the lit for consistencies among studies, comparable or parallel findings, or common themes -- qualitative Many factors can threaten the validity of a research study, from the consistency of the data-gathering instruments, the nature of the participants in the study, to the conclusions the researcher draws from the data. In qualitative, you use "dependability" not "reliability"

Mean

The mean is the arithmetic average of a set of scores. It is the exact balance point in a distribution of scores. To obtain the mean, simply add all of the scores in a distribution and divide by the number of scores: M = ΣX / N

Evaluation

The measurement of worth based on evidence collected. Evaluation of program effectiveness is usually described as formative or summative

Median

The median is defined as the middlemost score, the value below and above which 50 percent of the cases fall. The median, by definition, also becomes the 50th percentile rank. To determine the median, first line up all scores in order from lowest to highest, then determine the number of scores, and finally locate the middle score in the rank order.

Mode

The mode is the most frequently occurring score in a set of scores. It is also the only measure of central tendency that can be used with nominal data. It is a simple concept, easily recognized during a visual scan of a frequency distribution or graph because it is the highest point on a curve, histogram, or frequency polygon.
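
The three measures of central tendency in this set (mean, median, mode) can be computed step by step as described on their cards. The scores are hypothetical, and averaging the two middle scores when the count is even is a common convention assumed here, not something the cards state.

```python
import statistics

# Hypothetical scores, for illustration only
scores = [2, 3, 3, 5, 7]

# Mean: add all of the scores and divide by the number of scores
mean = sum(scores) / len(scores)

# Median: line up scores from lowest to highest, then locate the middle score
ordered = sorted(scores)
middle = len(ordered) // 2
if len(ordered) % 2 == 1:
    median = ordered[middle]
else:
    # Assumed convention for an even count: average the two middle scores
    median = (ordered[middle - 1] + ordered[middle]) / 2

# Mode: the most frequently occurring score
mode = statistics.mode(scores)
```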

Standard deviation

The standard deviation is the square root of the variance. Standard deviation has three primary advantages. First, it expresses the variation around the mean in score units comparable to the mean. This is why the mean and standard deviation are nearly always wedded in empirical research reports and test manuals. Second, the properties of the standard deviation imply that it should be applied to distributions that are basically "normal." This allows the use of the wedded terms mean and standard deviation to conjure up the wonderful statistical properties of the normal curve and standardized scores, which will be explained in Chapter 13. For now, suffice it to say that the closer the distribution is to normal, the more accurate one can be about knowing the exact percentages of a population that fall between the mean and a given standard deviation. This can be a big help when you have to interpret what an individual's score means. Finally, the standard deviation has marvelous algebraic/mathematical properties that make it a commonly used facet of statistical formulas and applications. Thus, it is important to understand how the standard deviation is calculated and the kinds of information it allows professional counselors to convey when interpreting scores. The importance of the standard deviation cannot be overemphasized, and every researcher and test user must become proficient in its use and applications. SD describes how the scores spread out from the middle of the distribution: 34.13% (about 34%) of all scores fall between the mean (0 SD) and 1 SD above (or below) the mean; 13.59% (about 14%) of all scores fall between 1 SD and 2 SD (whether above or below the mean); 2.14% (about 2%) of all scores fall between 2 SD and 3 SD (whether above or below the mean); only 0.13% of all scores fall beyond 3 SD (whether above or below the mean).
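
A short sketch of both points above: the SD as the square root of the variance, and the normal-curve percentages. The scores are hypothetical, and the population formulas (dividing by N) are an assumption, since the card does not say whether the population or sample SD is meant.

```python
import math
import statistics
from statistics import NormalDist

# Hypothetical scores, for illustration only
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Population variance and SD (dividing by N is an assumption, see above)
variance = statistics.pvariance(scores)
sd = math.sqrt(variance)

# Percentage of a normal distribution between SD markers on one side of the mean
normal = NormalDist()  # standard normal: mean 0, SD 1
pct_0_to_1 = 100 * (normal.cdf(1) - normal.cdf(0))
pct_1_to_2 = 100 * (normal.cdf(2) - normal.cdf(1))
pct_2_to_3 = 100 * (normal.cdf(3) - normal.cdf(2))
pct_beyond_3 = 100 * (1 - normal.cdf(3))
```

For these scores the mean is 5, the variance is 4, and the SD is 2; the four percentages reproduce the 34.13, 13.59, 2.14, and 0.13 figures on the card.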

Interviews

Three types. Structured interviews: present interviewees with a formal sequence of questions, and little variation in administration is allowed; more so for quantitative, less biased. Unstructured interviews: provide a basic set of questions but allow for deeper exploration of participant responses; familiar to counselors, more biased because counselors especially will look for confirming evidence. Semi-structured interviews: provide a midground between structured and unstructured. Researchers must take precautions to minimize bias during an interview and give special attention to the development of interview questions. Interviews may be conducted face-to-face or over the phone. Face-to-face typically yields more detailed data, although it is more costly and inconvenient.

Disaggregation

Total group scores have been broken down into specific subpopulation scores so that differences between and among subgroups can be analyzed. More helpful in the decision-making process because it allows the identification of programmatic strengths and weaknesses and the determination of equity in subgroup performance. Whereas proficiency in understanding and using disaggregated data relies on a comprehensive knowledge of norm-referenced and criterion-referenced score interpretation, the use of percentile ranks helps individuals understand the basics of score meanings and differences between scores of various sub-populations. Usually transform percentile ranks into a different kind of score to do math and then transform back to percentile ranks. The No Child Left Behind Act of 2001 and school reform groups have led educators to study disaggregated data on gender, SES, and race to determine whether students are given equal access to rigorous curricula, funding, specialized services, and so on.

Semantic differential items

Use bipolar adjectives comprising the activity pairs (e.g. active-passive, fast-slow), evaluative pairs (e.g. good-bad, dirty-clean), or potency pairs (e.g. small-large, hot-cold) Only the bipolar end points are labeled, inevitably leading to diminished precision over exactly what a rating of 2, 3, or 4 really means Semantic differential scale items are relatively simple to code, analyze, and interpret Example Graduate school is ... Horrible 1 2 3 4 5 Awesome

Nonparametric Tests

Used to evaluate hypotheses about the shapes of distributions and are applied only to nominal or ordinal data Examples of nonparametric tests include (also, each has their parametric mate) Chi-square test Mann-Whitney U test Kruskal-Wallis test Wilcoxon's signed-ranks test Friedman's rank test

Parametric Tests

Used to evaluate hypotheses when the dependent variable is measured with an interval or ratio scale and when certain other assumptions are met (EG normally distributed scores, homoscedasticity). Examples of parametric tests include: t tests (comparing two means for one variable), ANOVA (analysis of variance), ANCOVA (analysis of covariance), MANOVA (multivariate analysis of variance), MANCOVA (multivariate analysis of covariance). Under normal circumstances, parametric tests are more powerful than nonparametric tests.

Perception data

beliefs of stakeholders about the need for, or impact of, services provided by the school counselor

Process data

refers to tallies of actual services provided and numbers of stakeholders impacted by the professional school counselor's services -- not as useful (how many clients are you seeing, how many hours are you doing X, etc.)

Common statistical and measurement symbols

p. 33

Results data

the outcomes of programs and interventions provided in the comprehensive school counseling program -- most important

Interpreting standardized scores

z-scores T scores Standard scores Percentile ranks Confidence intervals Standard error of measurement Made to fit the normal curve
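
The conversions among the scores listed above can be sketched as follows. The standard score scale (M=100, SD=15) comes from this set; the T score scale (M=50, SD=10) is the conventional definition. The raw score, raw mean, raw SD, and function names are hypothetical, and the percentile conversion assumes normally distributed scores.

```python
from statistics import NormalDist

def z_score(raw, mean, sd):
    """Distance from the mean in SD units."""
    return (raw - mean) / sd

def t_score(z):
    """T scores: mean 50, SD 10."""
    return 50 + 10 * z

def standard_score(z):
    """Standard scores (deviation IQ): mean 100, SD 15."""
    return 100 + 15 * z

def percentile_rank_from_z(z):
    """Percentage of a normal distribution falling below z."""
    return 100 * NormalDist().cdf(z)

# Hypothetical raw score of 44 on a test with raw mean 40 and raw SD 8
z = z_score(44, 40, 8)
t = t_score(z)
ss = standard_score(z)
pr = percentile_rank_from_z(z)
```

A raw score one-half SD above the mean gives z = 0.5, T = 55, a standard score of 107.5, and a percentile rank of about 69.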

