KIN 250 final exam
leptokurtic
- (+ kurtosis) - steep curve with high peak; larger than expected probability of observations in tails
mesokurtic
- (zero excess kurtosis) - normal curve with average peak; expected probability of observations in tails
platykurtic
- (- kurtosis) - flat curve with low peak; smaller than expected probability of observations in tails
certification programs
- ACSM is a leader in the research and promotion of exercise science
physical activity readiness questionnaire
- PAR-Q: a self-guided screening of risk for physical activity and fitness
independent variable
- X - generally nominal in nature
histogram
- a graph consisting of columns used to represent the frequencies of observed scores in the data
domains of human performance
- cognitive domain - affective domain - psychomotor domain
achievement test (i.e., norm-referenced test)
- discriminate among levels of achievement > VO2 max test > SAT or ACT
probability
- expresses chance as a ratio of the number of "occurrences" over total number of observations
negative net D
- indicates that the lower-scoring group performed better on the item than the upper-scoring group
type 1 and type 2 error
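- making an incorrect decision about the null hypothesis based on the results > type 1 error: rejecting a null hypothesis that is true > type 2 error: failing to reject a null hypothesis that is false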
parameter
- measure of interest in the population
muscular fitness benefits
- muscular strength is negatively related to obesity and all-cause mortality, even after controlling for cardiovascular endurance
t-test for two independent groups
- purpose: to examine the difference in one continuous DV between two (and only two) independent groups (e.g., boys and girls; treatment and control) within one nominal IV.
percentile
- represents the percent of observations at or below a given score > a norm-referenced comparison
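A percentile rank follows directly from the definition above: count the observations at or below a score and divide by the total. This is a minimal sketch; the function name and the quiz scores are hypothetical.

```python
def percentile_rank(scores, x):
    """Percent of observations at or below score x (hypothetical helper)."""
    if not scores:
        raise ValueError("need at least one score")
    at_or_below = sum(1 for s in scores if s <= x)
    return 100 * at_or_below / len(scores)


# hypothetical quiz scores for 10 participants
scores = [12, 15, 15, 18, 20, 22, 25, 27, 30, 31]
print(percentile_rank(scores, 20))  # 50.0 (5 of 10 scores are at or below 20)
```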
coefficient of determination
- represents the proportion of shared variance between two measures
constructing and scoring: objective
- requires selection from two or more given responses - types of objective questions (e.g., true-false, matching, multiple-choice) are distinctly different in format, presentation, and response format
variance
- s^2 - measure of the spread of a set of scores based on the average of the squared deviation of each score from the mean - used more frequently than range - most stable measure of variability
qualitative methods: observation
- second most common data source - direct observation with note taking and/or video taping - transcribe and/or code; seek themes
reliability and validity for self-report instruments was higher for older children than younger children
- self-report scales may be suitable for high school age students but caution necessary when using with young children
construct-related validity
- setting cutoff scores can be accomplished using the divergent group method with two groups that are clearly different
health-related fitness batteries
- several organizations have grouped health-related fitness items together into test-batteries > the general purpose of a battery, given sufficient documentation, is to provide the user the ability to: >> administer reliable and valid tests >> interpret scores from the test >> convey information to participants
ratio
- similar to interval - absolute (true) zero point > the value of zero means absence of the trait (e.g., weight, money, etc.)
youth fitness in the US
- some data suggest that US youth are unfit and/or that fitness levels may be decreasing - other data indicate that US youth fitness levels are not necessarily poor and/or may not be decreasing - no clear consensus on the levels of US youth fitness and the trend of those fitness levels
threats to reliability
- some potential sources of error in written tests in the classroom or the lab > inadequate sampling of the domain of interest > an examinee's mental and physical condition > environmental conditions > guessing > changes in the field of study (e.g., participant's true score changes between measurement occasions due to some external condition)
competitive anxiety
- sport competition anxiety test - sport anxiety scale-2 - competitive state anxiety inventory-2
standard deviation
- square root of the variance > take the square root of the formula for variance
hypothesis
- statement of a presumed relation between at least two variables in a population
standard error of estimate (SEE)
- statistic that reflects the average amount of error in the process of predicting Y from X - also called > standard error > standard error of prediction
intrinsic
- taking part in an activity for the enjoyment and satisfaction inherent in engaging in the behavior itself
attributable risk or risk difference
- the difference in the probability of the occurrence of event X in populations due to exposure to a risk factor over a given period of time
prevalence
- the number, proportion, rate, or percentage of existing cases of a given health-related state within a specified period of time
incidence
- the number, proportion, rate, or percentage of new cases of a given health-related state within a specified period of time
alpha level (aka level of significance)
- the probability value that is set by the researcher to define very unlikely sample outcomes if the null hypothesis is true - the accepted risk of rejecting a true null hypothesis (i.e., treating results that occurred only by random chance as a real relation) - allows testing of the probability of the actual occurrence of your result - conventionally set at .05 or .01
absolute risk
- the probability, or the odds of the occurrence of event X in a population over a given period of time
general psychology scales used in sport and exercise
- the trend in sport psychology has been to develop sport-specific versions of psychological inventories - several general psychological inventories have been used regularly in physical activity and competitive sport settings > self-motivation inventory > profile of mood states > test of attentional and interpersonal style
index of reliability
- theoretical correlation between observed scores and true scores - if rxx= .90 then the index of reliability between observed scores and true scores = .95
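The index of reliability is the square root of the reliability coefficient, which is where the .90 to .95 example comes from. A minimal sketch; the function name is hypothetical.

```python
import math


def index_of_reliability(r_xx):
    """Theoretical correlation between observed and true scores: sqrt(r_xx)."""
    if not 0 <= r_xx <= 1:
        raise ValueError("reliability coefficient must be between 0 and 1")
    return math.sqrt(r_xx)


print(round(index_of_reliability(0.90), 2))  # 0.95
```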
formative and/or summative evaluation
- type of evaluation is determined by the use of the data
what to measure
- typically this is determined beforehand based on the objectives of a course of study - a two-way table of specifications can be used to identify relative importance of each area of a test > content objectives >> e.g., specific goals determined by an instructor > educational objectives (e.g., higher-order) related to the test: >> e.g., more general topics (e.g., cognitive abilities) determined by a board of education
cautions in using psychological tests (APA) - basic qualifications in applied settings
- understanding of test principles and the concept of measurement error > test giver needs to understand some basic statistical concepts (e.g., correlation, central tendency, dispersion) > no test is perfectly valid - ability to evaluate a test's validity and its purpose for use > try to minimize sources of error - self-awareness of one's own qualifications and limitations > performance enhancement vs. psychopathology - inappropriate test use (e.g., team selection) > evidence for particular uses of scores - administering feedback to participants about the results and interpretation of the tests > subjects should know why they are taking a test > some sort of feedback should be provided after completion - to help reduce bias, the coach should not be involved in the testing procedure > social desirability > sport psychologist like a strength and conditioning coach
the normative approach
- uses norm-referenced data to set standards with a theoretically accepted criterion > example: cooper institute suggests that being above the bottom 20% on fitness level, controlling for age and gender, is associated with reduced risk of negative health outcomes in the future >> link to criterion-related (predictive) validity
concepts of validity and relevance
- validity is the degree of truthfulness of a test score > dependent on two characteristics: reliability and relevance - relevance is the degree to which a test pertains to its objectives
how to measure: what type of test format
- verbal presentation vs. written test - the most common and efficient method of presenting an achievement test is the written form - open-book test and take-home test - expense, convenience, minimizing cheating, and concern for impaired examinees all affect decisions on test format - some guidelines > give advance notice about the number and nature of the test items > clear directions for completing the test > group together questions of the same type to reduce fluctuation among the types of mental processes > including a relatively simple question at the beginning of a test may benefit students by reducing anxiety
sport anxiety scale - 2
- viewed competitive trait anxiety as multidimensional: > somatic anxiety > worry > concentration disruption - cronbach's alpha (i.e., intraclass reliability) >= .84 for subscales - test-retest with a one-week interval (i.e., interclass reliability) >= .76 for subscales - convergent (e.g., correlated with self-esteem) and divergent (e.g., uncorrelated with perceived self-competence) validity
statistical analysis of CRTs
- with CRTs, data need to be categorized into nominal variables (categorical in nature) - for CRTs, the primary tool for analysis is a statistical technique using a contingency table (a 2 x 2 "chi-square")
questionnaires and surveys
- written tests are primarily designed to assess the amount of knowledge a participant has - questionnaires are typically used to measure affective domain concerns such as opinions, attitudes, and behaviors
measuring body composition
- obesity refers to overfatness, not overweight - body composition can be looked at as two components: lean body mass and fat mass - measuring body composition involves estimating a person's percent body fat
simple linear prediction (regression)
- statistical method used to predict the criterion, outcome, or dependent variable (Y) from a single predictor or independent variable (X) > prediction equation shares the same form as the equation for a straight line in plane geometry
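The straight-line form is Y' = a + bX. The sketch below fits a and b by least squares and also computes the standard error of estimate (SEE) from the residuals. Assumptions: the SEE uses the n - 2 denominator (some texts instead use SD of Y times sqrt(1 - r^2)), and the function names and data are hypothetical.

```python
import math


def fit_line(x, y):
    """Least-squares intercept a and slope b for Y' = a + bX."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def standard_error_of_estimate(x, y, a, b):
    """Average size of prediction error: sqrt(sum of squared residuals / (n - 2))."""
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    return math.sqrt(sum(r ** 2 for r in residuals) / (len(x) - 2))


# hypothetical field-test scores (X) and lab criterion scores (Y)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = fit_line(x, y)
see = standard_error_of_estimate(x, y, a, b)
print(f"Y' = {a:.2f} + {b:.2f}X, SEE = {see:.2f}")  # Y' = 2.20 + 0.60X, SEE = 0.89
```

Multiple regression extends the same idea to several predictors (X1, X2, and so on), usually with a statistics package rather than by hand.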
isotonic
- muscle generates enough force to move a constant load at a variable speed through full range of motion
odds
- expresses chance as a ratio of the number of occurrences to the number of non-occurrences
null hypothesis
- a statement that there is no relation (association or difference) between variables
sample
- a subgroup of the population of interest
false negative
- a test indicates that a condition is absent when it truly is present
false positive
- a test indicates that a condition is present when it is truly absent
test
- a written, oral, physiological, psychological, or mechanical instrument or tool used to make a particular measurement
split-halves reliability
- administering the same test twice, or administering two separate tests, may not always be the best option in practice - this form of reliability uses a single test to determine reliability by splitting the test into parts (e.g., odd questions and even questions) > suppose that I correlated total scores from odd items with total scores from even items on quiz 1
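The odd/even example can be sketched by totaling odd-numbered and even-numbered items per examinee and correlating the two totals. The notes do not name a correction, so the Spearman-Brown step-up (2r / (1 + r)), a commonly applied estimate of full-length reliability, is an added assumption here; the function names and quiz data are hypothetical.

```python
import math


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def split_half_reliability(item_scores):
    """item_scores: one row per examinee, one column per item (in test order)."""
    odd_totals = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = pearson_r(odd_totals, even_totals)
    # Spearman-Brown step-up (assumption): estimated reliability of the full test
    r_full = 2 * r_half / (1 + r_half)
    return r_half, r_full


# hypothetical quiz 1: 5 examinees x 6 items scored 0/1
quiz = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
r_half, r_full = split_half_reliability(quiz)
print(f"r_half = {r_half:.2f}, r_full = {r_full:.2f}")  # r_half = 0.63, r_full = 0.77
```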
advantages and disadvantages of questionnaires
- advantages > efficient in terms of money and time > respondents can be widespread geographically - disadvantages > low or poor responses can diminish value of data > inability to clarify questions for mailed or online questionnaires
imagery
- aka visualization, mental rehearsal, mental practice - shown to improve performance, increase neurological activation, and differentiate high-performing from lower-performing athletes - vividness of movement imagery questionnaire-2 - purports to measure 3 factors with regard to 12 movements: > internal visual imagery: first-person perspective of the movement > external visual imagery: adopting someone else's view of the movement > kinesthetic imagery: how it feels when experiencing the movement
cronbach's alpha coefficient
- alpha is a common intraclass correlation model - separates total variance into two general components
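A common computational form of alpha is (k / (k - 1)) x (1 - sum of item variances / variance of total scores), where k is the number of items or trials. The sketch below assumes that form with sample variances; the function name and the 5 x 3 Likert data are hypothetical.

```python
import statistics


def cronbach_alpha(scores):
    """scores: one row per participant, one column per item (or trial)."""
    k = len(scores[0])
    items = list(zip(*scores))                                  # columns = items
    item_variances = [statistics.variance(col) for col in items]
    totals = [sum(row) for row in scores]
    total_variance = statistics.variance(totals)
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# hypothetical data: 5 participants x 3 Likert items
scores = [
    [4, 5, 4],
    [3, 3, 4],
    [2, 2, 3],
    [5, 4, 5],
    [1, 2, 1],
]
print(f"alpha = {cronbach_alpha(scores):.2f}")  # alpha = 0.94
```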
reliable and valid measurement of physical activity is essential in determining
- amount of pa - amount of sedentary behavior - role of pa in health status - factors that relate to pa behavior - effectiveness of interventions to promote pa
formative evaluation
- an evaluation conducted during (as opposed to the end of) an instructional or training program > initial or intermediate evaluation (i.e., pretest or interim report) - important for tracking changes in the instructional, training, or research process
correlation coefficient
- an index of the linear relationship between two variables measured by the Pearson product-moment correlation coefficient > ranges from -1 to +1 > r of zero indicates no linear relationship - scatterplots - calculating r - coefficient of determination (r^2) - limitations of r
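Pearson r is the sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations; squaring it gives the coefficient of determination. A minimal sketch with hypothetical data and a hypothetical function name.

```python
import math


def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(f"r = {r:.2f}, r^2 = {r ** 2:.2f}")  # r = 0.77, r^2 = 0.60
```

Here r^2 = 0.60 reads as 60% shared variance between the two measures.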
placement
- an initial test and evaluation allow a professional to group students based on their abilities - can help to facilitate learning because groups have the same starting point
quantitative: likert scales
- an interval-based scale that assumes equal distance between responses - used to assess the degree of agreement or disagreement with statements
dual-energy x-ray absorptiometry - lab
- an x-ray technology that passes rays at two energy levels through the body > measures the changes in those rays as they pass through bone, organs, muscle, and fat > provides estimates of bone mineral density as well as % fat > standard error of measurement ~2%
essay - scoring
- analytic scoring - global scoring - relative scoring
t-score
- another commonly used standard score is the t-score - mean of the t-score for all observations in a set of data is always 50, and the standard deviation is always 10
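A z-score is (raw - mean) / SD, and a t-score rescales it as 50 + 10z, which is why every set of t-scores has a mean of 50 and an SD of 10. A minimal sketch, assuming the sample SD (n - 1) used in the variance card; names and data are hypothetical.

```python
import statistics


def z_scores(raw):
    m = statistics.mean(raw)
    s = statistics.stdev(raw)  # sample SD (n - 1)
    return [(x - m) / s for x in raw]


def t_scores(raw):
    return [50 + 10 * z for z in z_scores(raw)]


raw = [10, 12, 14, 16, 18]  # hypothetical raw scores
print([round(t, 1) for t in t_scores(raw)])  # [37.4, 43.7, 50.0, 56.3, 62.6]
```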
selected statistical test
- 1 nominal (IV) + 1 nominal (DV) = chi-square - 1 nominal (2 IV groups) + 1 continuous (DV) = t-test for independent groups - 1 nominal (2 paired IV groups) + 1 continuous (DV) = dependent t-test for paired groups - 1 nominal (>2 IV groups) + 1 continuous (DV) = one-way ANOVA - 2 nominal (2 or more IV groups) + 1 continuous (DV) = two-way ANOVA
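The list of selected statistical tests works as a decision rule. The sketch below encodes only the combinations listed and raises an error for anything else (e.g., more than 2 paired groups); the function name and its arguments are hypothetical.

```python
def select_test(n_nominal_ivs, iv_groups, paired, dv_type):
    """Hypothetical helper mapping variable types to the listed tests.

    n_nominal_ivs: number of nominal independent variables (1 or 2)
    iv_groups: number of groups (levels) in the independent variable
    paired: True if the groups are paired/dependent (e.g., pretest and posttest)
    dv_type: "nominal" or "continuous"
    """
    if dv_type == "nominal" and n_nominal_ivs == 1:
        return "chi-square"
    if dv_type == "continuous":
        if n_nominal_ivs == 2:
            return "two-way ANOVA"
        if n_nominal_ivs == 1 and iv_groups == 2:
            return "dependent t-test for paired groups" if paired else "t-test for independent groups"
        if n_nominal_ivs == 1 and iv_groups > 2 and not paired:
            return "one-way ANOVA"
    raise ValueError("this combination is not covered by the list")


print(select_test(1, 2, False, "continuous"))  # t-test for independent groups
```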
what type of items: semi-objective
- 3 types of semi-objective items: short answer, completion, and mathematical questions - examinees must compose correct answer; answer is so short that little/no organization of it is necessary - some subjectivity may be involved in scoring (e.g., awarding partial credit) - the response is checked to see whether it matches the previously determined correct answer - very similar to objective items
psychological skills
- 90% of US and Canadian athletes use some form of mental training in preparation for their events - psychological skills (the five most common in "performance psychology" or "psychological skills training") > imagery > self-talk > concentration > goal setting > confidence
interpretation of kappa
- <.20 = poor agreement - .21 -.40 = slight agreement - .41 -.60 = moderate agreement - .61 - .80 = substantial agreement
Canadian standardized test of fitness
- Canadian fitness survey conducted in 1981 to develop an understanding of the physical fitness level of the Canadian population - test battery includes the following components: > resting heart rate and blood pressure > body composition (skinfolds) > cardiorespiratory endurance (e.g., distance runs) > flexibility (sit-and-reach) > abdominal endurance (1-minute sit-up test) > upper body strength and endurance (push-up test)
trait and state sport confidence inventories
- TSCI : 13 items purported to measure the confidence an athlete usually has in his/her ability to be successful in sport - SSCI: 13 items purported to measure the confidence an athlete has at a particular moment in his/her ability to be successful in sport
dependent variable
- Y - nominal dependent variable > measured by frequencies or proportions - continuous (interval or ratio) > measured by mean values
aerobic power or aerobic capacity
- a body's ability to supply oxygen to working muscles during physical activity
trait
- a dominant approach early on - fundamental aspects of personality > extroverted versus introverted > confident vs. unsure - general source of variability resides within the person > minimized the role of environmental and situational factors
state
- a dominant approach more recently - behavior may change as a function of a particular situation or environment or time period - when the situation changes, so does the psychological state > self-efficacy
summative evaluation
- a final, comprehensive evaluation near or at the end of an instructional or training program > final evaluation - useful for measuring program achievement (i.e., did your program meet its goals?)
scatterplots
- a graphic representation of the correlation
evaluation based on norm-referenced standards ("comparison")
- a level of achievement relative to a clearly defined subgroup > how performance compares to that of others
evaluation based on criterion-referenced
- a level of achievement relative to a specific, pre-determined level of achievement > compares a person's performance relative to a criterion that you would like to achieve
frequency distribution
- a method of organizing data that involves noting how often each of the various scores occurs
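A simple frequency distribution can be built by counting each score and listing scores from highest to lowest. A minimal sketch using the standard library's `collections.Counter`; the function name and scores are hypothetical.

```python
from collections import Counter


def frequency_distribution(scores):
    """Return (score, frequency) pairs, highest score first."""
    counts = Counter(scores)
    return sorted(counts.items(), key=lambda pair: pair[0], reverse=True)


scores = [3, 5, 3, 4, 5, 5]  # hypothetical scores
for score, freq in frequency_distribution(scores):
    print(score, freq)
```

The same counts are what a histogram displays as column heights.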
multiple regression
- a more complex prediction of Y can be developed with the use of more than one predictor: X1, X2, and so on
physical fitness
- a set of attributes that people have or achieve that relates to the ability to perform physical activity
achievement
- a set of objectives must be established in order to evaluate participants' achievement - assessment of achievement is a summative evaluation task that requires measurement and evaluation
standard scores
- a set of observations that have been standardized around a given M and standard deviation > a transformation of raw scores to a common metric - scores from different variables can be meaningfully compared using a standard score - z-score - t-score
test-retest reliability
- a single test is administered twice to participants within a short amount of time > suppose that each of you completed quiz 1 form 1A or form 1B at the beginning and the end of a given class - the scores of the separated test administrations are correlated using the PPM correlation coefficient - if the time between testing occasions is longer (e.g., days) the test-retest reliability is often called stability reliability
phi coefficient
- a special case of the PPM correlation coefficient between dichotomously scored variables > ranges from -1 to +1, with -1 or +1 indicating perfect association and zero indicating no association > can be computed in SPSS using crosstabs or correlation
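For a 2 x 2 table with cells a, b (first row) and c, d (second row), phi = (ad - bc) / sqrt((a+b)(c+d)(a+c)(b+d)). The same table gives the Pearson chi-square (without a continuity correction), and for a 2 x 2 table chi-square = n x phi^2. A sketch; the function name, row/column labels, and counts are hypothetical.

```python
import math


def phi_and_chi_square(a, b, c, d):
    """Hypothetical 2 x 2 layout:
                      test: pass   test: fail
    criterion: yes        a            b
    criterion: no         c            d
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    phi = (a * d - b * c) / math.sqrt(denom)
    # Pearson chi-square for a 2 x 2 table, no continuity correction
    chi_square = n * (a * d - b * c) ** 2 / denom
    return phi, chi_square


phi, chi2 = phi_and_chi_square(40, 10, 5, 45)
print(f"phi = {phi:.2f}, chi-square = {chi2:.2f}")  # phi = 0.70, chi-square = 49.49
```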
competitive state anxiety inventory -2
- a sport-specific inventory of multidimensional state anxiety: > somatic anxiety > cognitive anxiety > confidence - the CSAI-2 measures precompetitive state anxiety, which is how anxious an athlete feels at a given time - in this case, right before competition - cronbach's alpha (intraclass) reliability coefficients have been ~.80 or higher - convergent (e.g., somatic anxiety correlated with worry) validity - a truncated CSAI-2 has been proposed to improve some psychometric properties of scores
evaluation
- a statement of quality, goodness, merit, value, or worthiness about what has been assessed
alternative hypothesis
- a statement that there is a relation (association or difference) between variables; the converse of null hypothesis
confidence
- arguably the most important psychological factor in sport > confidence is sometimes used interchangeably with self-belief and self-efficacy - predictive of superior performance
activitygram
- assessment of physical activity developed by the cooper institute to accompany the fitnessgram - physical activity assessment with two approaches - fitnessgram physical activity questionnaire > 3 questions based on physical activity guidelines for children and youth: >> aerobic physical activity >> muscle-strengthening physical activity >> bone-strengthening physical activity
youth fitness test batteries
- assessment of physical fitness has shifted from motor fitness to a more health-related emphasis - currently there are a number of test batteries; however, three elements are present in all the test batteries > health-related fitness items > criterion-referenced standards for each test > motivational awards
integrated
- assimilation of identified regulation so that engaging in the behavior is fully congruent with one's sense of self
health-related physical fitness
- attributes of physical fitness (i.e., cardiorespiratory; body composition; and muscular fitness) that are related to health outcomes
some recommendations for constructing matching items
- avoid providing clues - avoid including too many questions in one matching item - make sure all questions and answers appear on same page - keep the parts of matching questions as short as possible - randomly arrange the two lists of questions and answers - place the answer choices in a logical order (e.g., alphabetical, chronological)
some recommendations for constructing true-false items
- avoid using an item whose truth or falsity hinges on one insignificant word - beware of using indefinite words or phrases (e.g., frequently, many, most etc.) - include only one main idea - avoid taking minor statements directly out of textbooks or lecture notes - devoid of context - use negative statements sparingly and avoid double negatives completely - beware of giving clues to the correct choice through specific determiners or statement length
some recommendations for constructing multiple-choice items
- base each question on an important, significant, and useful concept - use novel situations when possible - phrase each question such that one response can be defended as being the best of the alternatives - make sure all questions and answers appear on same page - phrase each question clearly and concisely - avoid negatively stated questions as much as possible - consider layout issues when formatting the test
criterion-related validity
- based on determining the relationship between a criterion (e.g., gold-standard measure) and other measures used to estimate the criterion (e.g., surrogate or alternative measure) - also known as statistical validity and correlational validity > concurrent validity (criterion is measured at about the same time as the alternative measure) > predictive validity (criterion is measured in the future) - surrogate measure (e.g., field test versus lab test) > sometimes the book says "predictor" but this can be confusing
the judgmental approach
- based on the beliefs and/or previous experience of content experts > example: a coach might require a certain proficiency in her athletes, based on the coach's beliefs, before an athlete can serve in a particular role >> link to content-related validity
create and save an SPSS data file
- be certain to have a DSD with you before you start this assignment. in some systems you may have to save your data to an electronic account - place the DSD in the machine and note the drive location - locate the SPSS icon and click on it. (alternatively, you might have to go to the start button on the bottom left of your computer and locate SPSS among the programs listed in the start menu) - first you will name the variables, define the variables, and essentially build a codebook that helps you remember what the variables are - click on the variable view tab on the bottom left - name each of the variables in the first column. note the variable name must start with a letter, have no spaces, contain no special characters, and be no more than 64 characters in length - for the time being, skip over the type, width, and decimals columns - you can expand on the variable names in the label column - click on the right-hand edge of the values for the second variable (i.e., gender). notice that you get a box that helps you define the values associated with numbers for gender. - click on the data view tab to get the data view window
administering the test
- before the test in the classroom or lab - during the test in the classroom or the lab - after the test in the classroom or lab
exercise motivation
- behavioral regulation in exercise questionnaire -2 measures amotivation and 5 subtypes of self-regulation
normal distribution
- bell-shaped, symmetrical probability distribution
kuder-richardson formula 20
- binary items - the average of all possible split-half reliability coefficients - k = number of questions on the test - p = item difficulty - q = 1-p - s^2 total = variance of the test scores
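Using the symbols listed (k, p, q = 1 - p, s^2 total), KR-20 = (k / (k - 1)) x (1 - sum of pq / s^2 total). The sketch assumes the population variance for s^2 total so it is on the same basis as p x q (texts differ on this); the function name and 0/1 quiz data are hypothetical.

```python
import statistics


def kr20(item_scores):
    """item_scores: rows = examinees, columns = items scored 0/1."""
    n_examinees = len(item_scores)
    k = len(item_scores[0])                      # number of questions
    sum_pq = 0.0
    for item in zip(*item_scores):
        p = sum(item) / n_examinees              # item difficulty (proportion correct)
        sum_pq += p * (1 - p)                    # q = 1 - p
    totals = [sum(row) for row in item_scores]
    s2_total = statistics.pvariance(totals)      # assumption: population variance
    return (k / (k - 1)) * (1 - sum_pq / s2_total)


# hypothetical quiz: 5 examinees x 6 binary items
quiz = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
print(f"KR-20 = {kr20(quiz):.2f}")  # KR-20 = 0.83
```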
estimating agreement between measures: bland-altman method
- bland and altman (1986) presented a method of estimating the agreement between two measures of the same attribute - correlation is limited in its ability to detect the magnitude and distribution of differences between two measures - the bland-altman approach is based on the differences between the two measures, plotted against the average of the two measures
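The values behind a Bland-Altman plot are, for each participant, the difference between methods (y-axis) and the mean of the two methods (x-axis), plus the mean difference (bias) and limits of agreement. The sketch assumes the commonly reported 95% limits (bias +/- 1.96 SD of the differences); the function name and body-fat values are hypothetical.

```python
import statistics


def bland_altman(method_a, method_b):
    diffs = [a - b for a, b in zip(method_a, method_b)]          # y-axis values
    means = [(a + b) / 2 for a, b in zip(method_a, method_b)]    # x-axis values
    bias = statistics.mean(diffs)                                # mean difference
    sd = statistics.stdev(diffs)
    return {
        "means": means,
        "diffs": diffs,
        "bias": bias,
        "lower_loa": bias - 1.96 * sd,  # 95% limits of agreement (assumption)
        "upper_loa": bias + 1.96 * sd,
    }


# hypothetical % body fat from a field test and a lab test
field = [10.0, 12.5, 15.0, 20.0, 25.0]
lab = [11.0, 12.0, 16.5, 19.0, 27.0]
result = bland_altman(field, lab)
print(f"bias = {result['bias']:.2f}, limits = {result['lower_loa']:.2f} to {result['upper_loa']:.2f}")
```

A bias near zero with narrow limits suggests the two measures can be used interchangeably, something a high r alone cannot show.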
calculating variance
- calculate the mean - subtract the mean from each score - take the deviation and square it - add the results together and divide by the number of scores minus 1 - variance = average of the squared deviations from the mean (hence, the term mean square)
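The steps above map one-to-one onto a short function, and the standard deviation is the square root of the result. A minimal sketch with hypothetical names and data.

```python
import math


def variance(scores):
    n = len(scores)
    mean = sum(scores) / n                     # step 1: calculate the mean
    deviations = [x - mean for x in scores]    # step 2: subtract the mean from each score
    squared = [d ** 2 for d in deviations]     # step 3: square each deviation
    return sum(squared) / (n - 1)              # step 4: add them up, divide by n - 1


def standard_deviation(scores):
    return math.sqrt(variance(scores))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores
print(round(variance(data), 2), round(standard_deviation(data), 2))  # 4.57 2.14
```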
diagnosis
- can be used to determine weakness or deficiencies
goal setting
- can enhance performance and enjoyment, increase participation and interest, and help maintain direction and persistent effort - no specific scales; however, a variety of strategies have been developed > specific, measurable, attainable, relevant, timely, and self-determined - task and ego orientation in sport questionnaire > task orientation: motivation to improve oneself and strive for subjective success > ego orientation: emphasizes external rewards and a desire for objective success
nominal
- categorical in nature - no notion of order, magnitude, or size - everyone within the group is assumed to possess the same degree of the trait that determines their group (e.g., gender)
ordinal
- categorical in nature - numbers are ranked in order, but the differences between ranked positions are not comparable (e.g., finishing place in a race)
what type of items: objective question
- characteristically, the task of an examinee responding to an objective question is to select the correct answer from a list of two or more possibilities provided > scoring consists of matching an examinee's response to a previously determined correct answer > this type of scoring is relatively free of any subjective or judgmental decision > true-false, matching, multiple-choice, and classification items
how to measure: when to test
- class-scheduling practices, deadlines, and/or school policies dictate when testing is likely to occur - test frequently enough to establish reliable results but not too much that testing limits instruction time - more errors are made by providing too little time for testing than using too much time
objective: true-false
- commonly used because they are easy to construct - least adequate type of objective question - good for factual information - low discrimination index and easily biased because the chance of guessing the item correctly is 50%
odds ratio
- comparing the odds of the occurrence of event X in populations over a given period of time
relative risk or risk ratio
- comparing the probability of the occurrence of event X in populations over a given time period
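The risk measures in these cards (absolute risk, odds, relative risk, odds ratio, risk difference) can all be read from one 2 x 2 table of exposure by outcome. A sketch; the function name, table layout, and counts are hypothetical.

```python
def risk_measures(a, b, c, d):
    """Hypothetical 2 x 2 table over a given period:
                    event   no event
    exposed           a        b
    not exposed       c        d
    """
    p_exposed = a / (a + b)          # absolute risk (probability) if exposed
    p_unexposed = c / (c + d)
    odds_exposed = a / b             # occurrences : non-occurrences
    odds_unexposed = c / d
    return {
        "absolute_risk_exposed": p_exposed,
        "absolute_risk_unexposed": p_unexposed,
        "relative_risk": p_exposed / p_unexposed,
        "odds_ratio": odds_exposed / odds_unexposed,
        "risk_difference": p_exposed - p_unexposed,
    }


# hypothetical: 100 inactive (exposed) and 100 active adults followed for 5 years
m = risk_measures(30, 70, 10, 90)
for name, value in m.items():
    print(f"{name}: {value:.2f}")
```

With these counts the relative risk is 3.0 while the odds ratio is about 3.86; the odds ratio drifts away from the relative risk as the event becomes more common.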
sport competition anxiety test
- competitive trait anxiety - SCAT was developed to provide a measure of how anxious athletes generally feel before competition - initially developed for youth but an adult form also exists - rxx ~ .80 - convergent (correlated with other anxiety scales) and divergent (uncorrelated with other personality constructs) validity
identified
- conscious acceptance of the behavior as being important in order to achieve personally valued outcomes
planning the test
- consider the type of test needed - what to measure - how to measure
criterion-referenced test
- constructed to yield measurements that are directly interpretable in terms of specific performance standards - performance standards are generally specified by defining a class or domain of tasks that should be performed by the participant - used to make categorical decisions (e.g., pass/fail or meeting a standard/not meeting a standard) > CRTs may, however, initially involve continuous data that are then used to create cutoff scores to separate participants into categories
analyzing the test: validity
- content validity is one of the most important types of validity with written tests > determined subjectively by test administrator as to how well individual test items represent the course content - item analysis can also be used to help determine the quality of the overall test
validity - in more depth
- content-related validity - criterion-related validity - construct-related validity
statistical techniques to use with CRTs
- contingency table or chi-square - phi coefficient (a PPM correlation between two dichotomous variables) - observed proportion of agreement - kappa
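The observed proportion of agreement is (a + d) / n, and kappa corrects it for the agreement expected by chance: kappa = (P_o - P_c) / (1 - P_c). A sketch; the function name, trial labels, and counts are hypothetical. With these counts kappa = 0.70, which falls in the ".61 - .80 = substantial agreement" band of the kappa interpretation card.

```python
def agreement_and_kappa(a, b, c, d):
    """Hypothetical 2 x 2 table of CRT classifications on two trials:
                          trial 2: master   trial 2: nonmaster
    trial 1: master             a                  b
    trial 1: nonmaster          c                  d
    """
    n = a + b + c + d
    p_o = (a + d) / n                                          # observed proportion of agreement
    p_c = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2     # agreement expected by chance
    kappa = (p_o - p_c) / (1 - p_c)
    return p_o, kappa


p_o, kappa = agreement_and_kappa(40, 10, 5, 45)
print(f"P = {p_o:.2f}, kappa = {kappa:.2f}")  # P = 0.85, kappa = 0.70
```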
administering the test: after the test in the classroom or the lab
- correct the test and report the scores as quickly as possible - report test scores anonymously - avoid misusing and misinterpreting test scores
factors affecting the questionnaire response
- cover letter - ease of return - neatness and length - inducements - timing and deadlines - follow-up
advantages of CRT
- criterion-referenced standards represent specific, desired performance levels that are explicitly linked to a criterion - they are independent of the proportion of the population that meets the standard - specific diagnostic evaluations can be made to improve performance to the criterion level - achievement is based on reaching the standard, not on bettering someone else's performance level - performance is linked to specific outcomes - participants know exactly what is expected of them
limitations of CRT
- cutoff scores always involve some subjective judgment; where is the "best" cut-point? - misclassifications can have severe consequences (e.g., medical decisions) - participants who attain the cutoff level may not be motivated to continue to improve
physical fitness assessment in older adults
- defined as persons aged 65 years and over - highest rates of chronic diseases - health care costs for older adults are contributing to health financing problems - aging is negatively related to: > sensations of taste, smell, vision, and hearing > mental abilities > organ functioning of the digestive system, urinary tract, liver, and kidneys > bone mineral content > physical fitness - older adults respond to appropriate endurance and strength training programs similar to younger adults
profile of mood states
- designed in the early 1970s to measure a person's emotional state of mind, inclination, or disposition - 6 subscales: > tension-anxiety > depression-dejection > anger-hostility > vigor-activity > fatigue-inertia > confusion-bewilderment - an early tool that has provided evidence for a link between physical activity and mental health - can be used with instructions that ask participants to state how they feel right now or how they have felt for an extended period of time
self-motivation inventory
- designed to measure a person's self-motivation to persist and was originally developed to be used in exercise adherence studies - ~50% of all people who start exercise programs drop out in the first 6 months - SMI consists of 40 likert-scale items > 21 positively keyed > 19 negatively keyed (negative wording presumably included to reduce response bias)
scales used in exercise psychology
- exercise psychology is primarily concerned with studying the: > psychological and emotional effects of exercise > promoting physical activity and healthy behaviors - understanding psychological determinants and consequences has never been more important, because public health concerns have continued to escalate globally, such as those of > obesity > lack of adherence to exercise programs
stages of change for exercise and physical activity
- designed to measure the specific stage of exercise behavior change that participants might be in at a particular time - stages are viewed as somewhere between a trait and a state - five stages > precontemplation: no intention to change behavior > contemplation: intention to change behavior > preparation: preparing for action > action: involved in behavior change > maintenance: sustained behavior change - used to develop specific interventions for people based upon where they are stuck regarding lifestyle changes
hydrostatic weighing - lab
- determine body volume by measuring the amount of water displaced during the procedure > the test may need to be repeated up to 10 times to maximize reliability and validity > subject A weighs 150 lb with 10% body fat > subject B weighs 150 lb with 30% body fat > subject A will have a higher underwater weight than subject B because of a higher body density
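A minimal Python sketch of the underwater-weighing arithmetic, assuming the common two-compartment approach: body volume from Archimedes' principle, corrected for residual lung volume and roughly 0.1 L of gastrointestinal gas, then percent fat from the Siri equation. The water density (0.996 kg/L), residual volume (1.2 L), the underwater weights, and the function names are illustrative assumptions; in practice residual volume is measured or predicted and water density is read at the tank temperature.

```python
def body_volume_l(body_mass_kg, underwater_mass_kg, water_density=0.996,
                  residual_volume_l=1.2, gi_gas_l=0.1):
    """Body volume in litres (hypothetical helper; defaults are assumptions).

    Displaced water = (dry-land mass - underwater mass) / water density,
    minus air trapped in the lungs (residual volume) and the gut.
    """
    displaced_l = (body_mass_kg - underwater_mass_kg) / water_density
    return displaced_l - residual_volume_l - gi_gas_l


def body_density(body_mass_kg, underwater_mass_kg, **kwargs):
    """Body density (kg/L) = mass / volume."""
    return body_mass_kg / body_volume_l(body_mass_kg, underwater_mass_kg, **kwargs)


def siri_percent_fat(density):
    """Siri two-compartment equation: %fat = 495 / Db - 450."""
    return 495.0 / density - 450.0


# Two hypothetical 68 kg (about 150 lb) subjects: the denser subject A
# weighs more under water than subject B.
for label, underwater_kg in [("A", 3.8), ("B", 1.0)]:
    db = body_density(68.0, underwater_kg)
    print(label, round(db, 4), round(siri_percent_fat(db), 1))
```

With these assumed inputs subject A comes out near 10% fat and subject B near 30%: at the same dry-land weight, the higher density goes with the heavier underwater weight.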
steps in conducting survey research
- determine the objectives - delimit the sample - construct the questionnaire - conduct the pilot study - write the cover letter - send the questionnaire - follow up - analyze the results and prepare the report
mastery test (i.e., criterion-referenced test)
- determine whether a student has achieved enough knowledge to meet some minimum standard - "job analysis" > lifeguard certification test > CPR test
overview of hypothesis testing and inferential statistics
- develop a research hypothesis about the relation between variables - state a null hypothesis reflecting no relationship - state an alternative hypothesis; this is the opposite of the null hypothesis and directly reflects the research hypothesis from the first step - gather data and analyze them based on the research question and the types of variables - make a decision about the null hypothesis based on the probability of obtaining results at least as extreme as those observed if the null hypothesis were true (the p value)
item analysis
- difficulty level and discriminating power (the ability of an item to discriminate between low- and high-ability examinees; the more important of the two) are the keys to item improvement > to maximize an item's discriminating power, each item should be written so that about half of the examinees will answer it correctly
objective measurement of physical activity in youth
- direct observation can be expensive in terms of time and labor > new technologies including portable computers have allowed reliable, valid, and feasible measurements using direct observation
observed scores, error scores, and true scores
- each person's observed score is the sum of the true score and error score > true score theoretically exists but is impossible to measure > error score results from anything that causes your observed score to differ from your true score - a test is said to be reliable to the extent that observed score variation is made up of true score variation
general vs. sport-specific measures
- early research - more recently - multidimensional and sport-specific
constructing and scoring: essay
- effective method to measure opinions, attitudes, and higher-order thinking skills - referred to as open-ended questions - 3 major limitations > inability to obtain a wide sample of achievement due to the time-intensive nature (i.e., too few items) > inconsistencies in scoring procedures (i.e., more subjective) > difficulties in analyzing test effectiveness
performance enhancement
- effects of psychological factors on sport performance - measures might include: performance anxiety, concentration, confidence, motivation, imagery
external self-regulation
- engaging in a behavior only in order to satisfy external pressures or to achieve externally imposed rewards
supporting evidence for health-related fitness
- epidemiology > examines the incidence, prevalence, and distribution of disease - relative risk or risk ratio > compares the probability of occurrence of an event (e.g., disease or death) in two populations (e.g., active vs. sedentary) over a given period of time
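A small sketch of the risk-ratio arithmetic: each group's risk is its number of cases over its number of people followed, and the relative risk divides one by the other. The cohort counts and the function name are hypothetical.

```python
def relative_risk(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Risk ratio: risk in the exposed group / risk in the unexposed group."""
    risk_exposed = cases_exposed / n_exposed        # probability of the event, exposed
    risk_unexposed = cases_unexposed / n_unexposed  # probability of the event, unexposed
    return risk_exposed / risk_unexposed


# Hypothetical cohorts followed for the same period:
# 30 CVD deaths among 1,000 sedentary adults vs 15 among 1,000 active adults.
print(relative_risk(30, 1000, 15, 1000))  # 2.0
```

A ratio of 1 means equal risk in both groups; above 1, the exposed (here sedentary) group carries more risk; below 1, less.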
research designs in epidemiology
- epidemiology uses both prospective and retrospective research approaches > prospective: tracking a study group into the future > retrospective: looking back at a database of previously collected data
interval
- equal or common unit of measurement (e.g., temperature [C or F] or clock time in 24-hour format) - zero point is arbitrarily chosen > the value of zero represents a point on a number line, not the absence of the trait
observed proportion of agreement
- established by adding the proportions in cells that are consistently classified > ranges from 0 to 1; the higher the value the more consistent (correct) the classifications > does not account for chance, Pc
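A sketch for a 2x2 pass/fail classification table (rows: trial 1, columns: trial 2). It computes Pa as described, the chance agreement Pc from the marginal proportions, and, as an assumption beyond this card, Cohen's kappa, the usual chance-corrected index built from the two. The table counts and the function name are hypothetical.

```python
def classification_agreement(table):
    """table = [[pass_pass, pass_fail],
                [fail_pass, fail_fail]]  (rows: trial 1, columns: trial 2)."""
    n = sum(sum(row) for row in table)
    p_a = (table[0][0] + table[1][1]) / n             # consistent classifications
    row_props = [sum(row) / n for row in table]
    col_props = [(table[0][j] + table[1][j]) / n for j in range(2)]
    p_c = sum(r * c for r, c in zip(row_props, col_props))  # agreement expected by chance
    kappa = (p_a - p_c) / (1 - p_c)                   # agreement beyond chance
    return p_a, p_c, kappa


print(classification_agreement([[40, 5], [10, 45]]))
```

For this table Pa = 0.85 and Pc = 0.50 (to two decimals), so kappa is 0.70: well below Pa once chance agreement is removed.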
prediction
- estimating one variable (Y) from one X variable (simple regression) or from more than one X variable (multiple regression) > if X and Y are highly related, then X can predict Y to some degree and vice versa > still does not imply (nor does it rule out) a cause-and-effect relationship
nature of measurement and evaluation
- evaluation based on norm-referenced standards ("comparison") - evaluation based on criterion-referenced standards - formative evaluation - summative evaluation - formative and/or summative evaluation
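A minimal sketch of simple regression by least squares: the slope is the sum of cross-products of deviations over the sum of squared X deviations, and the intercept makes the line pass through the two means. The data and function name are hypothetical.

```python
def simple_regression(x, y):
    """Least-squares line Y' = a + bX for one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx             # slope
    a = mean_y - b * mean_x   # intercept
    return a, b


# Hypothetical data: X = weekly training sessions, Y = fitness test score
a, b = simple_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(a, b, a + b * 6)   # intercept, slope, predicted Y for X = 6
```

This gives roughly a = 2.2 and b = 0.6, so a predicted score of about 5.8 for X = 6.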
content-related validity
- evidence of truthfulness based on logical decision making and interpretation - also known as face validity and logical validity - content experts, expert judges, colleagues, and textbook writers can serve as sources to validate instrument content
construct-related validity
- evidence that combines logical (content) and statistical (criterion) validity procedures > often used to validate measures that are unobservable yet exist in theory (e.g., IQ) - if, in theory, the construct is valid, then such-and-such should occur > associated with convergent (e.g., should correlate) or discriminant (e.g., should not correlate) evidence
uses of item analysis
- examine the pattern of responses to determine how an item might be improved - retain, discard, or modify items based on their index of difficulty and index of discrimination
distribution of body fat
- excessive body fat on the trunk (i.e., android obesity) as compared with the lower body (i.e., gynoid obesity) is associated with a higher risk of coronary heart disease
type 2 error
- failing to reject a false null hypothesis > equal to beta > concluding there is no relationship when in fact there is one (failing to reject the null hypothesis)
some factors affecting reliability
- fatigue - practice - participant variability - increase time between testing - constant circumstances surrounding the testing periods - appropriate level of difficulty for testing participants - increased precision of measurement - distracting environmental conditions
field methods
- feasible for mass testing - require less equipment - less time consuming
concentration
- focusing on sensory or mental events coupled with mental effort - general measure of concentration is the test of attentional and interpersonal style - multiple adaptations have been made for specific scenarios with the first used in tennis - baseball test of attentional and interpersonal style
muscular strength
- force that can be generated by contracting muscles
reporting data
- frequency distribution - percentile
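A sketch of a frequency distribution with cumulative frequencies, where each score's percentile is the percent of observations at or below it. The scores and function name are hypothetical.

```python
from collections import Counter


def frequency_table(scores):
    """Rows of (score, frequency, cumulative frequency, percentile).

    Percentile here = percent of observations at or below the score.
    """
    counts = Counter(scores)
    n = len(scores)
    rows, cumulative = [], 0
    for score in sorted(counts):          # lowest to highest
        cumulative += counts[score]
        rows.append((score, counts[score], cumulative, 100 * cumulative / n))
    return rows


for row in frequency_table([10, 12, 12, 15, 15, 15, 18, 20]):
    print(row)
```

Tables are often printed from high to low; the order of listing does not change the percentiles.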
analytical epidemiology and risk
- from measures of incidence or prevalence to estimates of risk often expressed in probability or odds > probability expresses chance as a ratio of the number of "occurrences" over the total number of observations > odds expresses chance as a ratio of the number of occurrences to the number of non-occurrences
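The same count expressed both ways, as a small sketch with hypothetical numbers and function names:

```python
def probability(occurrences, total):
    """Occurrences over all observations."""
    return occurrences / total


def odds(occurrences, total):
    """Occurrences over non-occurrences."""
    return occurrences / (total - occurrences)


# Hypothetical: 20 of 100 participants develop the condition
print(probability(20, 100), odds(20, 100))  # 0.2 0.25
```

Odds of 0.25 mean one occurrence for every four non-occurrences, while the probability is one in five.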
how to measure: how many items
- generally, the reliability of an achievement test increases as its length increases > how much time is available > types of items used > attention span of participants
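The length-reliability relationship is commonly quantified with the Spearman-Brown prophecy formula. It is not named on this card and assumes the added (or removed) items are comparable to the existing ones. A sketch with a hypothetical function name:

```python
def spearman_brown(r_current, length_factor):
    """Predicted reliability when a test is lengthened by length_factor
    (e.g., 2 = twice as many comparable items, 0.5 = half as many)."""
    k = length_factor
    return k * r_current / (1 + (k - 1) * r_current)


print(spearman_brown(0.60, 2))    # about 0.75
print(spearman_brown(0.60, 0.5))  # halving the test lowers reliability
```

Doubling a test with reliability 0.60 is predicted to raise it to about 0.75; gains shrink as reliability approaches 1.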
descriptive statistics
- go to the analyze menu - scroll down to descriptive statistics, over to descriptives, and click - when the descriptives window appears, use the arrow to move "age", "weightkg", "heightcm", and "stepsperday" into the variables box
compute statements in SPSS
- go to the transform menu, scroll down to compute, and click - a new compute variable window will appear - type "weightlb" in the target variable box - put "weightkg" in the numeric expression box by using the arrow to move it, then complete the expression as weightkg * 2.2046 (1 kg ≈ 2.2046 lb) and click OK
summation notation
- the Greek capital letter sigma (Σ) indicates the sum of all values - N: the number of participants in the sample - X: any observed variable that you might measure
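A short worked example of the notation, using three hypothetical scores X = 2, 4, 6 (so N = 3) and the mean as the most common use of the summation sign:

```latex
\sum_{i=1}^{N} X_i = X_1 + X_2 + \cdots + X_N
\qquad
\sum X = 2 + 4 + 6 = 12,
\qquad
\bar{X} = \frac{\sum X}{N} = \frac{12}{3} = 4
```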
range
- high score minus the low score - least stable measure of variability because it depends on only two scores
measures of variation
- how scores spread out within a distribution > when variation is large, values are widely scattered > when variation is small, values are tightly clustered - measures of variation (aka dispersion) > range > variance > standard deviation
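A sketch computing the three measures of variation from their definitions. Whether variance divides by N or N - 1 depends on whether the scores are treated as a population or a sample; the flag below is an assumption, so check which convention your course uses. Scores and function name are hypothetical.

```python
import math


def variation(scores, sample=True):
    """Range, variance, and standard deviation of a list of scores.

    sample=True divides by n - 1 (sample estimate); sample=False divides by n.
    """
    n = len(scores)
    mean = sum(scores) / n
    ss = sum((x - mean) ** 2 for x in scores)   # sum of squared deviations
    variance = ss / (n - 1) if sample else ss / n
    return {"range": max(scores) - min(scores),
            "variance": variance,
            "sd": math.sqrt(variance)}


print(variation([4, 8, 6, 5, 7]))                # range 4, variance 2.5
print(variation([4, 8, 6, 5, 7], sample=False))  # variance 2.0
```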
methodologies
- hydrostatic weighing - dual-energy x-ray absorptiometry - anthropometry (skinfolds and girths)
analytic scoring
- identifying facts, points, or ideas in the answer and awarding credits for each one
evaluating reliability
- if a test is perfectly reliable, each examinee's score would be exactly duplicated on a second administration of the test - several methods of assessment > if test questions are dichotomous, then alpha coefficient can be used
criterion-referenced validity
- if one of the measures in a study is a "criterion," the matter being investigated is likely validity (as opposed to reliability) - criterion-related (concurrent or predictive) validity > comparison of a test to the gold standard - construct-related validity > overlap of two divergent groups measured on a continuum
limitations of r
- if the relationship between the variables is nonlinear (i.e., curvilinear), the PPM correlation coefficient will underestimate the true relationship between X and Y (PPM will only indicate the linear relationships) - correlation does not necessarily indicate cause-and-effect relationship > some third variable may be the cause of the relationship detected by a high r value - greatly influenced by the variance or range of the variables measured
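An illustration of the first limitation: the Pearson product-moment r computed from its definitional formula on hypothetical data where Y is perfectly determined by X, but along a U-shaped curve. The function name and data are illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation from deviation scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [-3, -2, -1, 0, 1, 2, 3]
print(pearson_r(x, [v ** 2 for v in x]))     # 0.0: perfect but curvilinear
print(pearson_r(x, [2 * v + 1 for v in x]))  # 1.0: perfect and linear
```

Both relationships are perfect, yet r detects only the linear one.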
criterion-referenced reliability: equivalence reliability
- if two tests are being compared that are thought to measure the same thing, equivalence is being assessed - with CRTs, the equivalence is whether the two tests result in equivalent classifications for the people being assessed
criterion-referenced reliability: test-retest reliability
- if two trials of the same measure are administered, then stability reliability is being assessed - with CRTs this is the reliability or dependability of classification - test-retest criterion-referenced reliability estimates (i.e., P values) were calculated using the physical best and fitnessgram (dichotomous) cutoff values - did not pass the standard versus did pass the standard
analyzing the test
- important to determine the level of confidence that can be placed in a set of scores - to do this, examine the reliability and validity of the scores produced by responses to the test
purposes of measurement, testing, and evaluation
- important to understand because professionals in kinesiology will be making evaluative decisions daily - general purposes > placement > diagnosis > prediction > motivation > achievement > program evaluation
the president's challenge: adult fitness test
- in 2008 the president's council on fitness, sports, and nutrition launched an online adult fitness self-test - designed to measure: > aerobic fitness > body composition > muscular strength and endurance > flexibility - designed to be self-administered; however, a partner may be needed to complete some tests
institute of medicine recommendations for youth fitness assessment
- in 2012 the institute of medicine (at the request of the robert wood johnson foundation) created the committee on fitness measures and health outcomes in youth - two main goals > to assess relations between youth fitness items and health outcomes > recommend the best fitness test items for (a) use in national youth fitness surveys and (b) fitness testing in schools
major risk factors and classifications for cardiovascular disease
- in adult fitness assessment a critical issue is to establish criteria for when testing is "low risk" > does not require medical clearance or physician supervision - ACSM has guidelines for major risk factors and classifications for fitness testing > subjects with low risk = fewer than 2 CVD risk factors
basic statistics in epidemiology
- incidence: the number, proportion, rate, or percentage of new (or newly diagnosed) cases of a given health-related state within a specified period of time - prevalence: the number, proportion, rate, or percentage of existing cases of a given health-related state within a specified period of time
special populations
- include people with physical or mental disabilities or both > valid and reliable fitness assessment of adults with disabilities is not a well-researched topic > fitness assessment should include these >> anaerobic power >> aerobic capacity >> electrocardiographic response to exercise >> muscular fitness >> body composition
constructing and scoring: multiple-choice
- includes two parts: the stem (the question) and at least two responses - used in almost any situation because of their ability to measure higher-order skills and almost any educational objective - easy to score, reduced chance for guessing, and low ambiguity compared to other question types - few limitations but can be time consuming to create good questions
positive net D
- indicates that the upper-scoring group performed better on the item than the lower-scoring group
calculating a reliability coefficient
- interclass coefficients > based on PPM correlation coefficient - intraclass coefficients > based on ANOVA
introjected
- internalization of external controls, which are then applied through self-imposed pressures in order to avoid guilt or to maintain self-esteem
affective domain
- involves attitudes, perceptions, emotional and psychological characteristics
cognitive domain
- involves knowledge and mental achievement
field methods
- involves lifting external weights or the repetitive movement of the body - muscular strength = 1-repetition max - muscular endurance > max number of repetitions with a submaximal weight load > or max number of reps of body movement exercise such as sit-ups > upper and lower-body strength and endurance
the combination approach
- involves use of multiple available sources > experts > prior experience > empirical data > norms >> link to construct-related validity
the receiver operator characteristic curve
- another method often used to evaluate construct-related validity for criterion-referenced tests - an ROC curve is a plot of > the true positive rate >> a test indicates that a condition is present when it truly is present > against the false positive rate >> a test indicates that a condition is present when it truly is absent
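A sketch of how the points on an ROC curve are obtained: each candidate cutoff yields one (false positive rate, true positive rate) pair, and plotting the pairs traces the curve. It assumes a score at or above the cutoff counts as a positive test; the data and function name are hypothetical.

```python
def roc_points(scores, has_condition, cutoffs):
    """(false positive rate, true positive rate) for each cutoff.

    Assumes a score at or above the cutoff is a positive test result.
    """
    positives = sum(has_condition)
    negatives = len(has_condition) - positives
    points = []
    for cutoff in cutoffs:
        flagged = [d for s, d in zip(scores, has_condition) if s >= cutoff]
        true_pos = sum(flagged)               # flagged and truly present
        false_pos = len(flagged) - true_pos   # flagged but truly absent
        points.append((false_pos / negatives, true_pos / positives))
    return points


scores = [2, 3, 4, 5, 6, 7, 8, 9]
has_condition = [False, False, True, False, True, False, True, True]
for cutoff, point in zip([2, 5, 8], roc_points(scores, has_condition, [2, 5, 8])):
    print(cutoff, point)
```

Lowering the cutoff catches more true cases but also flags more people without the condition, which is the trade-off the curve displays.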
how to measure: what type of items
- items can be classified into three general categories: semi-objective, objective, and essay
basic requirements for constructing effective written tests
- knowledgeable in the proper techniques for constructing written tests - "how to" - thorough knowledge of the subject area to be tested - "what" - skilled at the relevant type of written expression - "speak the language" - awareness of the level and range of understanding in the group to be tested - "who" - willing to spend considerable time and effort on the task - "resources"
lab vs field methods
- lab > generally assessed by measuring force, torque, work, and power generated through concentric, eccentric, isokinetic, and isometric contractions - field > assessed with concentric and isotonic contractions
criterion-referenced standards and epidemiology
- let epidemiology be defined as: the study of the distribution and determinants of health-related states and events in specified populations and the applications of this study to the control of health problems - "descriptive" epidemiology seeks to describe the frequency and distribution of mortality and morbidity according to time, place, and person - epidemiology may help identify risk factors of mortality and morbidity > "analytical" epidemiology pursues the causes and prevention of mortality and morbidity
consider the type of test needed
- mastery test (i.e., criterion-referenced test) - achievement test (i.e., norm-referenced test)
descriptive statistics
- mathematical summaries of performance (e.g., the best score) and performance characteristics (e.g., central tendency, variability) - characteristics of the distribution (e.g., symmetry)
using computers to analyze data
- obtaining information - determining reliability and validity - evaluating test results - conducting program evaluation - conducting research activities - developing presentations - assessing student performance - storing test items - creating written test items - calculating statistics
lab methods
- VO2 max: the maximal amount of oxygen that a person can use during exhaustive exercise > VO2 max is reached when work rate increases but oxygen consumption does not > expired gases are monitored with a gas analysis system during a maximal exercise test > most reliable and valid measure > can be difficult to measure because of expensive equipment, exhaustive exercise, and time restrictions - estimating (without directly measuring) VO2 max > maximal exercise (e.g., treadmill) performance >> exhaustive exercise without a metabolic cart > submaximal exercise testing >> submaximal exercise without a metabolic cart >> based on the linear relationship among heart rate, workload, and VO2 - perceptual effort during exercise testing > perceived exertion: perception of the intensity of physical work > Borg rating of perceived exertion (RPE) >> evidence that RPE scores correlate with heart rate, lactic acid production, and percent VO2 max >> can be used to control exercise intensity > relative intensity scale: a person's level of effort relative to her or his fitness level
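A sketch of the submaximal-testing idea: fit a straight line through heart rate and VO2 from two or more submaximal stages, then extrapolate to an age-predicted maximal heart rate. It assumes the VO2 at each stage is already known or estimated from the workload, and uses 220 - age as the predicted maximum; both are common approximations, not the protocol of any particular test. Names and data are hypothetical.

```python
def estimate_vo2max(heart_rates, vo2_values, age):
    """Extrapolate submaximal HR-VO2 pairs to an age-predicted maximal HR.

    Assumes a linear HR-VO2 relationship and HRmax = 220 - age.
    """
    n = len(heart_rates)
    mean_hr = sum(heart_rates) / n
    mean_vo2 = sum(vo2_values) / n
    slope = (sum((h - mean_hr) * (v - mean_vo2)
                 for h, v in zip(heart_rates, vo2_values))
             / sum((h - mean_hr) ** 2 for h in heart_rates))
    intercept = mean_vo2 - slope * mean_hr
    hr_max = 220 - age                      # age-predicted maximal heart rate
    return intercept + slope * hr_max


# Hypothetical 40-year-old: HR 120 at 20 and HR 150 at 30 mL/kg/min
print(estimate_vo2max([120, 150], [20.0, 30.0], age=40))  # about 40 mL/kg/min
```

Because the estimate hinges on the predicted maximal heart rate, individual errors in 220 - age carry straight into the VO2 max estimate.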
statistic
- measure of interest in the sample
important facts concerning lab testing
- measurement error is present in lab tests > equipment should be calibrated and checked >> minimize error due to the test > test administrators should be trained and qualified >> minimize error due to the lab personnel > practice test >> minimize error due to the participant > standardized testing procedures >> minimize error due to the protocol
motivation
- measurement evaluation process can be necessary for motivating your students and participants - people may need the challenge and stimulation they get from an evaluation of their achievement
absolute muscular endurance
- measurement of repetitive performance at a fixed resistance
relative muscular endurance
- measurement of repetitive performance related to maximum strength
multidimensional and sport-specific
- measures may help researchers better measure states and traits in sport
median
- middle score; 50th percentile; the typical value - a score that divides the distribution of scores exactly in half - to obtain the median, order the scores from high to low and find the middle one (with an even number of scores, average the two middle scores) - works well for skewed data sets
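A sketch of finding the median, compared with the mean on a skewed set of hypothetical scores. With an even count it averages the two middle scores, a common convention.

```python
def median(scores):
    """Middle score; with an even count, average the two middle scores."""
    ordered = sorted(scores)   # low to high; high to low gives the same middle
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


skewed = [3, 9, 4, 100, 5]
print(median(skewed), sum(skewed) / len(skewed))  # 5 24.2
print(median([3, 4, 5, 9]))                       # 4.5
```

The single extreme score of 100 pulls the mean far above most of the data, while the median stays at 5.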
qualitative methods: interviews
- most common source of data in qualitative research - range from a highly structured to semi-structured to open-ended style - tape recorder, note-taking - transcribe; seek themes
an "interactionalist" approach
- most dominant approach currently - traits and states interact to partially codetermine behavior
constructing and scoring: matching items
- most efficient for measuring relatively superficial types of knowledge - easy to construct and score - can be difficult to create matching questions that involve higher-order cognitive processes - present clear and complete directions > the basis for matching the item in the two lists > the method to record the answers > whether a response in the second column may be used more than once
z-score
- most fundamental standard score is the z-score - z-scores have a mean score of 0 and standard deviation of 1
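A sketch of the transformation z = (X - mean) / SD. It uses the population standard deviation so the resulting z-scores have a mean of exactly 0 and an SD of exactly 1; some courses divide by the sample SD instead. Scores are hypothetical.

```python
import statistics


def z_scores(scores):
    """z = (X - mean) / SD, using the population SD."""
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)
    return [(x - mean) / sd for x in scores]


print(z_scores([2, 4, 4, 4, 5, 5, 7, 9]))
# [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

A z of 2.0 means the score sits two standard deviations above the mean.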
skinfold assessment - field method
- most reliable, valid, and popular field method for estimating % body fat (except for people who are very obese) - uses skinfold calipers to measure skinfolds at multiple sites to estimate % fat with prediction equations - rxy > .76 with measurement from hydrostatic weighing - properly trained testers should be able to produce measurements with high reliability (rxx > .90)
need reliable and valid measures of physical activity in children and youth
- most reliably assessed through direct monitoring (e.g., pedometers) > lack of feasibility limits the application of such procedures in large-scale studies
eccentric
- muscle generates force as it lengthens
concentric
- muscle generates force as it shortens
isokinetic
- muscle generates force at a constant speed through full range of motion
isometric
- muscle generates force but remains static in length and causes no movement
criterion-referenced reliability
- norm-referenced and criterion-referenced tests share similar types of reliability and validity - test-retest or stability reliability - equivalence reliability
norm- vs. criterion-referenced
- normative youth data can be useful for > evaluating a program > identifying excellence in achievement > identifying the current status of participants either locally or nationally - most youth fitness evaluations have moved away from norm-referenced standards and toward criterion-referenced fitness standards > specific pre-determined levels of performance
scale or continuous
- numbers are continuous in nature if they can be added, subtracted, multiplied, or divided and the results are meaningful - have a distinct place on a number line - can be interval or ratio in form
quantitative
- numerical in nature - likert scales, semantic differential scales - desire at least some objective measures and control - researcher tries to minimize direct involvement with subjects
concept of objectivity
- objectivity is a special kind of reliability also known as interrater reliability > objective tests (multiple-choice) differ from subjective tests (essay) in that they are less biased or not as easily influenced regardless of the rater
theoretical conclusions
- observed score variance = true score variance + error score variance - reliability coefficients range from 0 to 1 - a desirable level of reliability is 0.80 or higher - generally, longer tests are more reliable than shorter tests
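A sketch of the variance decomposition: reliability is the share of observed-score variance that is true-score variance. Since true scores cannot be observed in practice, the second function simulates them (X = T + E, with errors independent of true scores) purely to show that the ratio matches the formula; the SDs, seed, and function names are hypothetical.

```python
import random
import statistics


def reliability_from_variances(true_var, error_var):
    """Reliability = true-score variance / observed-score variance."""
    observed_var = true_var + error_var   # observed variance = true + error variance
    return true_var / observed_var


def simulated_reliability(true_sd, error_sd, n=10_000, seed=1):
    """Simulate X = T + E with independent errors; return var(T) / var(X)."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(50, true_sd) for _ in range(n)]
    observed = [t + rng.gauss(0, error_sd) for t in true_scores]
    return statistics.pvariance(true_scores) / statistics.pvariance(observed)


print(reliability_from_variances(4.0, 1.0))  # 0.8
print(simulated_reliability(2.0, 1.0))       # close to 0.8, not exact
```

The simulated value wobbles around 0.8 because, in any finite sample, true and error scores are never perfectly uncorrelated.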
presidential active lifestyle award
- offered by the PCFSN to reward physical activity - youth awardees must be physically active: > 60 minutes a day or 12,000 steps > 5 days per week > 6 weeks out of 8 weeks
administering the test: during the test in the classroom or the lab
- organize an efficient method for distributing and collecting the test - help the examinees pace themselves - answer individual questions carefully and privately - control cheating - control the environment
mental health
- participation in sport, exercise, and physical activity has the capacity to increase feelings of psychological well-being - measures might include: depression, self-esteem, mood, etc.
reliability - in more depth
- pearson product-moment correlation coefficient is used to provide evidence of a test's reliability - observed scores, error scores, and true scores are theoretical concepts that will help you understand the concept of reliability
body composition benefits
- people who suffer from obesity have higher rates of cardiovascular disease, cancer, and diabetes
two main areas in sport psychology
- performance enhancement - mental health
some recommendations for constructing essay items
- phrase the question such that the mental processes required to respond to it are clearly evident - use several questions requiring relatively short answers rather than a few questions requiring extended answers - phrase the question so that the task of the examinee is specifically identified > explain how; compare and contrast - set guidelines to indicate the scope of the answer required > illustrate, through words and figures - avoid asking students for their opinions
athletic or motor fitness
- physical fitness related to sport performance
muscular endurance
- the physical ability to perform work (repeated or sustained muscle contractions) over time
cardiorespiratory benefits
- physically active groups have lower relative risk of developing fatal cardiovascular disease than sedentary groups - negative relationship between cardiovascular death rates and cardiovascular endurance
physical activity improves the overall health of children
- physically inactive children tend to become inactive adults with higher risk for chronic diseases - physically active children tend to become active adults with lower risk for chronic diseases
psychomotor
- physiological and physical performance
depicting discriminations with participants
- positive discriminations will increase reliability while negative discriminations will decrease reliability
prediction
- predict future results from present or past data - the most difficult research goal to attain
administering the test: before the test in the classroom or lab
- proofread before administering - prepare the examinees for the test > eliminate the test-wise advantage for some examinees - give any unusual or lengthy instructions before the test - give a practice test or homework of some sort
more recently
- psychologists found that situation-specific measures provide more accurate and reliable results > sport anxiety scale > competitive state anxiety inventory
dependent t-test for paired groups
- purpose: to compare two related (i.e., paired) groups within one IV and one DV
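A sketch of the dependent t statistic: it works on one difference score per participant and divides the mean difference by its standard error. Data and function name are hypothetical. For a p-value, `scipy.stats.ttest_rel` returns the same t together with its p-value.

```python
import math
import statistics


def paired_t(pre, post):
    """t statistic and df for two related measurements on the same participants."""
    diffs = [b - a for a, b in zip(pre, post)]      # one difference score per person
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    se_d = statistics.stdev(diffs) / math.sqrt(n)   # standard error of the mean difference
    return mean_d / se_d, n - 1


# Hypothetical pre/post training scores for five participants
t, df = paired_t([10, 12, 9, 11, 13], [12, 14, 10, 13, 14])
print(round(t, 2), df)  # 6.53 4
```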
chi-square
- purpose: to determine if there is an association between levels (cells) of two nominally scaled variables
contingency table or chi-square
- purpose: to determine if there is an association between levels (cells) of two nominally scaled variables
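A sketch of the Pearson chi-square statistic for a contingency table: each cell's expected count is (row total × column total) / N, and the statistic sums (observed - expected)² / expected across cells. The 2x2 counts (e.g., boys/girls by met/did not meet a fitness standard) and function name are hypothetical. Note that `scipy.stats.chi2_contingency` applies Yates' continuity correction by default when df = 1, so its statistic for a 2x2 table will be smaller than this uncorrected value.

```python
def chi_square(table):
    """Uncorrected Pearson chi-square and degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n   # count expected if no association
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


print(chi_square([[30, 10], [20, 40]]))  # about 16.67 with 1 df
```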
one-way ANOVA
- purpose: to examine group differences between one continuous DV and one nominal-scaled IV (with two or more levels)
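A sketch of the one-way ANOVA F ratio: variability between group means (mean square between) over variability within groups (mean square within). Groups and function name are hypothetical; a p-value would come from an F table or a statistics package.

```python
def one_way_anova_f(groups):
    """F ratio and degrees of freedom for one IV with two or more levels."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


print(one_way_anova_f([[4, 5, 6], [6, 7, 8], [9, 10, 11]]))  # F about 19 with 2 and 6 df
```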
two-way ANOVA
- purpose: to examine group differences between one continuous DV and two nominal-scaled IVs
global scoring
- reading the answer and converting the general impression into a score
kinesmetrics
- refers to measurement and evaluation in kinesiology (a subdomain of human movement and performance)
self-talk
- refers to the practice of self-talk strategies - negative self-talk often results in declines in mood and performance - positive self-talk can enhance confidence, focus, performance, and psychological health - automatic self-talk questionnaire for sports - 8 subscales; 40 items
standard error of measurement
- reflects the degree to which a person's observed score fluctuates as a result of errors in measurement - SEM is the standard deviation of the errors of measurement around an observed score
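A sketch using the standard formula SEM = SD × √(1 - reliability), which is not written on this card. The SD, reliability, observed score, and function name are hypothetical. A band of ±1 SEM around an observed score covers the true score about 68% of the time if errors are normally distributed.

```python
import math


def sem(sd, reliability):
    """Standard error of measurement = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)


error = sem(sd=5.0, reliability=0.84)   # about 2.0
observed = 70
print(observed - error, observed + error)  # roughly 68 to 72
```

A perfectly reliable test (reliability 1) would have an SEM of 0.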
type 1 error
- rejecting a true null hypothesis > equal to alpha level > concluding there is a relationship when in fact there is not (rejecting the null hypothesis)
reliability and validity
- reliability > degree to which repeated measurements of the same trait are reproducible under the same conditions - validity > the degree of truthfulness in a test >> requires reliability and relevance - a test cannot be valid if it is not reliable > reliability + relevance -> validity
concept of reliability
- reliability relates to the consistency or repeatability of an observation - the degree to which repeated measures of the same trait are reproduced under the same conditions - also described as consistency, dependability, stability, and precision
scientific method
- requires the development of a scientific hypothesis and an inferential statistical test of that hypothesis versus another competing hypothesis
relative scoring
- reviewing all students' answers and scoring papers based on adequacy relative to other students
questionnaire validity
- the most important issue of a questionnaire is the validity of responses - ways to improve questionnaire validity: > high-quality items through expert review and pilot testing > ensuring confidentiality of responses - most questionnaires are initially validated with content-related validity procedures
performing an item analysis
- score the tests - arrange the tests in order from high to low - separate the tests into 3 groups: > upper group (highest 27%) > middle group (46%) > lower group (lowest 27%) - for each item, record the frequency of each response selection by the upper group - for each item, record the frequency of each response selection by the lower group - calculate the difficulty index for each item > estimated percentage of examinees who answered the item correctly - calculate the discrimination index for each item > if examinees who performed well (poorly) on the test did well (poorly) on the item, then the item is considered to have good discrimination
enhancing reliable and valid fitness test results with children
- scores derived from the instrumentation within the youth fitness test batteries have sufficient validity evidence if administered correctly - steps to take during fitness test administration > attain adequate knowledge of test descriptions > give proper demonstrations and instructions > develop good student and teacher preparation through adequate practice trials > conduct reliability studies
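A sketch of the two indices for a single item using the 27% upper and lower groups. Here the difficulty index is estimated from the upper and lower groups only (a common convention; some texts use all examinees), and the discrimination index is the net D: (upper correct - lower correct) / group size. The examinee data and function name are hypothetical.

```python
def item_indices(examinees, fraction=0.27):
    """Difficulty and discrimination for one item (hypothetical helper).

    examinees: list of (total_test_score, answered_item_correctly) pairs.
    """
    ranked = sorted(examinees, key=lambda e: e[0], reverse=True)  # high to low
    k = max(1, round(len(ranked) * fraction))                    # examinees per group
    upper, lower = ranked[:k], ranked[-k:]
    upper_correct = sum(1 for _, correct in upper if correct)
    lower_correct = sum(1 for _, correct in lower if correct)
    difficulty = (upper_correct + lower_correct) / (2 * k)  # estimated proportion correct
    discrimination = (upper_correct - lower_correct) / k    # net D; positive favors upper group
    return difficulty, discrimination


examinees = [(95, True), (88, True), (91, True), (70, False), (60, True),
             (55, False), (80, True), (65, False), (75, True), (85, False)]
print(item_indices(examinees))  # about (0.67, 0.67)
```

With 10 examinees each group holds 3 people; all 3 in the upper group but only 1 in the lower group answered correctly, giving a positive net D.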
subjective measurement of physical activity in youth
- self-report measures of physical activity behaviors in children can be misleading > self-report data suggest 46% of boys and 30% of girls meet the activity guidelines > during direct observation percentages drop 4% and 1% for boys and girls, respectively
laboratory methods
- similar to aerobic capacity, measuring muscular strength and endurance in a lab requires expensive and sophisticated equipment and precise testing protocols > computerized dynamometers > back extension strength test > non-computerized dynamometers > manual muscle test
quantitative: semantic differential scales
- similar to likert scales but uses pairs of adjectives with opposite meanings
body mass index
- simple measure expressing the relationship of weight to height - rxy ~.70 with criterion measurement of body density - commonly used in epidemiology research - acceptable for people who are obese but may produce inaccurate results for people who are lean or of normal weight
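A sketch of the index itself: weight in kilograms divided by height in metres squared, or equivalently 703 × pounds / inches squared. Inputs and function names are hypothetical.

```python
def bmi_metric(weight_kg, height_m):
    """BMI = weight (kg) / height (m) squared."""
    return weight_kg / height_m ** 2


def bmi_imperial(weight_lb, height_in):
    """Same index from pounds and inches (conversion factor 703)."""
    return 703 * weight_lb / height_in ** 2


print(round(bmi_metric(70, 1.75), 1))   # 22.9
print(round(bmi_imperial(150, 65), 1))  # 25.0
```

Because the index uses only weight and height, it cannot separate fat mass from lean mass, which is why it misclassifies some lean or muscular people.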
using SPSS
- statistical package for the social sciences > IBM SPSS Statistics is software developed to analyze numbers - data are entered into a database or data matrix of N rows of people with p columns of variables - SPSS permits you to enter and manipulate data and conduct analyses that result in a variety of numbers, charts, and graphs > packages allow the user to input, edit, and analyze data; the user must interpret the data
distribution shapes
- the statistical term for the symmetry (or asymmetry) of a distribution is skewness > typically ranges from -1 to +1 - the "peakedness" (or, more precisely, the tails) of a curve is referred to as kurtosis > mesokurtic (+/- kurtosis) > platykurtic (- kurtosis) > leptokurtic (+ kurtosis)
alternative software packages
- students without access to SPSS can use excel to conduct the statistical procedures (appendix) - enter your data using excel and have SPSS read the excel data file
scales of measurement
- taking a measurement often results in assigning a number to represent the measurement (e.g., weight, height) - not all numbers are the same > some numbers can be added or subtracted, and the results are meaningful > other types of numbers have little or no meaning - nominal - ordinal - scale or continuous > interval > ratio
early research
- tended to use instrumentation from psychology with little reference to sport or exercise > state-trait anxiety inventory > locus of control
interclass reliability
- test-retest reliability - equivalence reliability - split-halves reliability
qualitative
- textual in nature - essence of phenomena - interviews, field observations, less control - researcher often directly interacts with subjects
measurement
- the act of assessing, usually resulting in assigning a number to quantify the amount of the characteristic being assessed
physical activity
- the act of bodily movement that requires the contraction of muscles and the expenditure of energy
health-related fitness
- the attainment or maintenance of physical capacities related to good or improved health
mean
- the average score - sum of the scores divided by the number of scores - most stable and reliable measure of central tendency - may not be representative of skewed data sets - affected by the numerical value of every score in the dataset
torque
- the effectiveness of a force for producing rotation around an axis
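Torque is commonly computed as force times the perpendicular distance from the axis (the moment arm); when the force is not applied at a right angle, the product is multiplied by the sine of the angle between the force and the lever. A minimal sketch, with a hypothetical function name and example values:

```python
import math


def torque(force_n, moment_arm_m, angle_deg=90.0):
    """Torque (N*m) = force x moment arm x sin(angle between force and lever)."""
    return force_n * moment_arm_m * math.sin(math.radians(angle_deg))


# hypothetical: 100 N applied 0.3 m from the joint axis
print(round(torque(100, 0.3), 1))      # 30.0 at a right angle
print(round(torque(100, 0.3, 30), 1))  # 15.0 at 30 degrees
```

The same force produces less rotation when it is applied closer to the axis or at a shallower angle, which is what "effectiveness of a force" refers to.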
population
- the entire group of people or observations in question
intraclass reliability
- the intraclass model allows you to estimate reliability for more than two trials - intraclass models also address the constant difference problem that can be seen in interclass models where a linear association is assessed - cronbach's alpha coefficient
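One common form of cronbach's alpha is alpha = k/(k-1) x (1 - sum of trial variances / variance of total scores), where k is the number of trials or items. The sketch below assumes a complete data matrix with one row per participant and one column per trial; the function name and scores are made up for illustration.

```python
from statistics import pvariance


def cronbach_alpha(rows):
    """Cronbach's alpha for rows = participants, columns = trials/items.

    Population variances are used throughout; the n vs. n-1 choice cancels
    as long as trial and total variances use the same convention.
    """
    k = len(rows[0])                    # number of trials/items
    columns = list(zip(*rows))          # each trial's scores across participants
    trial_var_sum = sum(pvariance(col) for col in columns)
    totals = [sum(r) for r in rows]     # each participant's total score
    return (k / (k - 1)) * (1 - trial_var_sum / pvariance(totals))


# hypothetical scores: 5 participants x 3 trials
scores = [[3, 4, 3], [2, 2, 3], [5, 5, 4], [4, 4, 5], [1, 2, 1]]
print(round(cronbach_alpha(scores), 2))  # 0.94
```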
developing criterion-referenced standards
- the judgmental approach - the normative approach - the empirical approach - the combination approach
sport psychology: an overview
- the mind affects the body > therefore, the way we think and feel has a strong impact on how we physically perform - the body affects the mind > therefore, the way we physically perform has a strong impact on the way we think and feel - positive and negative psychological factors are associated with sport and exercise participation - sport and exercise psychologists attempt to accentuate the positive aspects of participation in physical activity
mode
- the most frequently observed score - the most unstable measure of central tendency - there can be more than one mode if two or more values tie for the highest frequency
fitnessgram
- the national youth fitness test battery > physical fitness battery that includes health-related criterion-referenced standards - identifies three "fitness zones" > healthy fitness zone > needs improvement > needs improvement-health risk
questionnaire reliability
- to assess reliability of a single item, you must ask the specific item on at least two occasions - reliability of domain sub-scales can be estimated using the alpha coefficient - to estimate stability reliability, you must administer the questionnaire to the same people two or more times
special children
- traditional fitness test batteries were not constructed for and may be biased against children with physical or mental disabilities > adapted physical education: PE adjusted to accommodate children with physical or mental limitations - must consider participants' limitations, mental capacities, interfering reflexes, and acquisition of prerequisite functions in relation to the test - brockport physical fitness test > a health-related physical fitness test developed for youth ages 10-17 with various disabilities > funded by the US dept. of education
measures of central tendency
- trying to describe the center of a frequency distribution - a single score that best represents all of the scores > mean > median > mode
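Python's standard `statistics` module can illustrate all three measures on one small, hypothetical set of scores that includes a single extreme value, showing how the mean is affected by every score while the median is not, and how a distribution can have more than one mode.

```python
from statistics import mean, median, multimode

# hypothetical scores with one extreme value (30)
scores = [2, 3, 3, 4, 5, 5, 30]

print(round(mean(scores), 2))  # 7.43, pulled upward by the single score of 30
print(median(scores))          # 4, the middle score regardless of how extreme 30 is
print(multimode(scores))       # [3, 5], two modes, each observed twice
```

`multimode` requires Python 3.8 or later.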
equivalence reliability
- two parallel or equivalent forms of an exam are given to participants > suppose that each of you completed quiz 1 form 1A and form 1B at the beginning of class - the scores on each exam are correlated using the PPM correlation coefficient to determine the degree to which there is reliability, or consistency, between the two test forms - the creation, and completion, of two forms can be burdensome to both the researcher and the participants respectively
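The PPM (Pearson product-moment) correlation between the two forms can be computed directly from the paired scores. The sketch below uses hypothetical quiz scores for five students; the function name is illustrative, not from any particular package.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two lists of paired scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-deviation
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# hypothetical quiz 1 scores: the same five students on each form
form_1a = [8, 6, 9, 5, 7]
form_1b = [7, 6, 9, 4, 8]
print(round(pearson_r(form_1a, form_1b), 2))  # 0.9
```

A value near +1 would suggest the two forms rank students consistently; the same calculation applies to test-retest data.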
errors in prediction
- unless the correlation coefficient is -1 or +1, Y hat will not necessarily equal Y - error or residual represents the inaccuracy of our prediction of Y based on the prediction equation
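A minimal sketch of prediction error, assuming a simple least-squares line Y hat = a + bX fitted to hypothetical data: each residual is the observed Y minus the predicted Y hat.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for Y hat = a + bX."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx
    return a, b


x = [1, 2, 3, 4, 5]  # hypothetical predictor scores
y = [2, 4, 5, 4, 5]  # hypothetical criterion scores
a, b = fit_line(x, y)
y_hat = [a + b * xi for xi in x]
residuals = [yi - yh for yi, yh in zip(y, y_hat)]  # error = Y - Y hat

print(round(a, 2), round(b, 2))          # 2.2 0.6
print([round(e, 2) for e in residuals])  # [-0.8, 0.6, 1.0, -0.6, -0.2]
```

Because the correlation here is not perfect, the residuals are nonzero; with data lying exactly on a line (r = -1 or +1), every residual would be 0.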
YMCA physical fitness test battery
- used by the YMCA to assess physical fitness of its members - easily adaptable to many adult physical fitness testing situations - test battery includes the following components: > height and weight > resting heart rate and blood pressure > body composition > cardiorespiratory endurance > flexibility > muscular strength and endurance
program evaluation
- used to demonstrate the successful achievement of program objectives
constructing and scoring the test: semi-objective
- useful for measuring relatively factual material > advantages include simple construction, reduction of the possibility of guessing, and rapid or easy scoring > minimal subjective influence in scoring > limited because it is easy to be ambiguous with only one question or an incomplete statement
the empirical approach
- uses an external criterion measure - may be regarded as the least arbitrary but is seldom used due to the limited availability of data from an external criterion > example: a firefighter has to scale a 5-foot wall at the time of testing to assess her current readiness to fight fires >> links to criterion-related (concurrent) validity
research hypothesis
- what the researcher actually believes will occur > there will be differences in oxygen uptake based on the type of aerobic training a subject engages in
kuder-richardson formula 21
- KR21 is used when all test questions are assumed to have equal difficulty and equal discrimination > having equal difficulty and discrimination on all items is unlikely, so KR21 is more conservative than KR20 > conservative estimate of test reliability - M = the mean test score - Pbar = the average difficulty
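A common textbook form of KR21 is KR21 = k/(k-1) x (1 - M(k - M) / (k s^2)), where k is the number of items, M is the mean test score, and s^2 is assumed here to be the variance of total test scores. Since M = k x Pbar, the same estimate can be written with the average difficulty Pbar. A minimal sketch with hypothetical function names and values:

```python
def kr21(k, mean_score, total_variance):
    """KR21 = k/(k-1) * (1 - M(k - M) / (k * s^2))."""
    return (k / (k - 1)) * (1 - mean_score * (k - mean_score) / (k * total_variance))


def kr21_from_difficulty(k, p_bar, total_variance):
    """Same estimate using Pbar (average proportion correct), since M = k * Pbar."""
    return (k / (k - 1)) * (1 - k * p_bar * (1 - p_bar) / total_variance)


# hypothetical 10-item test: mean score 6, variance of total scores 4
print(round(kr21(10, 6, 4), 3))                    # 0.444
print(round(kr21_from_difficulty(10, 0.6, 4), 3))  # 0.444
```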
what type of items: essay questions
- when responding to an essay question, an examinee's task is to compose the correct answer > usually the item provides some direction by including such terms as compare or explain > or the item may constrain the answer by including such phrases as limit your discussion to... > scoring usually involves judgmental decisions
how to measure
- when to test - how many items - what type of test format - what type of items
kappa
- widely used technique that allows for a correction for the proportion of chance agreements, Pc > most appropriately used (1) to assess interobserver agreement or (2) to examine the agreement between a predictor and a criterion that is nominally scaled - theoretically ranges from -1 to +1 - negative values of K imply that the proportions of agreement resulting from chance are greater than those from observed agreements - higher value indicates more consistency - sensitive to low values in the marginals and small sample contingency tables
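A minimal sketch of the kappa calculation, K = (Po - Pc) / (1 - Pc), assuming a square agreement table where Po is the proportion of observed agreements (the diagonal) and Pc is the chance agreement computed from the marginals. The table of master/nonmaster decisions below is hypothetical.

```python
def kappa(table):
    """Kappa from a square agreement table (rows: observer A, columns: observer B)."""
    n = sum(sum(row) for row in table)
    p_o = sum(table[i][i] for i in range(len(table))) / n        # observed agreement
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    p_c = sum(r * c for r, c in zip(row_tot, col_tot)) / n ** 2  # chance agreement, Pc
    return (p_o - p_c) / (1 - p_c)


# hypothetical master/nonmaster decisions by two observers for 50 participants
table = [[20, 5],
         [10, 15]]
print(round(kappa(table), 2))  # 0.4
```

Here the observers agree on 70% of cases, but half of that agreement would be expected by chance, so kappa is noticeably lower than the raw proportion of agreement.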