PS 15

histogram and the normal curve

- A histogram is a type of bar graph in which the height and area of the bars are proportional to the frequencies in each category. It is suitable for displaying data that are continuous in nature and grouped into fixed intervals. (show in SPSS) - Is the distribution symmetrical or skewed (to the left or right), unimodal or multimodal, or "normal"? - Displaying data in a histogram lets us see how far the distribution departs from the normal curve, the bell-shaped curve that the mean divides into two symmetric halves.
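As a rough illustration of how continuous values are grouped into fixed intervals before the bars are drawn, here is a minimal Python sketch. The ages, the starting point `low`, the interval `width`, and the helper name `histogram_counts` are all made up for this example; SPSS chooses its own intervals.

```python
from collections import Counter

def histogram_counts(values, low, width):
    """Count how many values fall into each fixed-width interval.

    Returns a dict mapping the lower edge of each non-empty interval
    to its frequency (empty intervals are simply left out).
    """
    bins = Counter(int((v - low) // width) for v in values)
    return {low + i * width: bins[i] for i in sorted(bins)}

# Hypothetical ages of ten respondents.
ages = [18, 22, 25, 31, 34, 35, 38, 41, 47, 52]
counts = histogram_counts(ages, low=10, width=10)

# Draw a crude text histogram: one "#" per case in the interval.
for lower_edge, frequency in counts.items():
    print(f"{lower_edge}-{lower_edge + 9}: {'#' * frequency}")
```

In this made-up sample the bars rise toward the 30-39 interval and fall away on both sides, which is the kind of roughly symmetrical, unimodal shape the card asks you to look for.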

types of variables by level of precision in measurement

- Nominal (Categorical) - all we can tell is the name; unordered categories. Ex. religious denominations, makes of car - Ordinal - categories with an order to them that can be ranked by value. Many survey questions are of this type. Ex. Political ideology. - Interval - we can assign a number and the distance (interval) between any two points is the same. Ex. Intelligence scores - Ratio - we can assign a number, the distance between two points is meaningful, and there is a true 0 value. Ex. Candies in the bag

When picking hypotheses from abstracts

Things to look out for: - What are the dependent and independent variables? - What is a likely causal mechanism? - Could we state this relationship as a single sentence? - Empirical questions: observations, unit of analysis, what data is required?

the "science" in political science

It can only be science if you follow these standards: ◦ Knowledge depends on verification ◦ Knowledge is obtained through objective observation or experimentation and logical reasoning ◦ Knowledge is transmissible and cumulative ◦ Knowledge is generalizable ◦ These principles dictate how the science is done

parsimonious

(characteristic of scientific method/knowledge) the simpler the explanation the better

explanatory

(characteristic of scientific method/knowledge) to explain the state of political affairs or how things work

secondary data

data collected by others and used by the researcher who did not personally collect the data

direct observations

Can be participant or non-participant ◦ Non-participant: researcher observes through a one-way glass or stands on the sidelines and takes notes Structured vs. unstructured: ◦ Structured observations require a checklist, presuppose knowledge of the phenomena ◦ Unstructured observations require copious notes Covert vs. overt ◦ Overt: subjects are aware of the presence of researcher who is clearly identifiable as such

indirect observations

Require a theory linking some physical trace to the observed phenomenon. Physical traces in indirect observations can involve:
- Erosion measures: created by selective wear on some material
- Accretion measures: created by the deposition and accumulation of materials
Erosion and accretion measures may be biased: certain traces are more likely to survive because the materials are more durable.
Examples:
- Studies of garbage (William Rathje)
- Predicting victory in the 1936 presidential campaign, Alf Landon vs. FDR
  ◦ Alf Landon's buttons were used as a trace of Landon's popularity, leading to the prediction that Landon would win the election
  ◦ However, FDR won the election by a landslide (60.8%). Why was the prediction so far off?
  ◦ Campaign buttons were valued as collectible items, not because of the candidates' popularity.
It is important to eliminate alternative/confounding explanations for what exactly causes the change in the measure. Using a multi-item measure can help.

participant observation

Researcher joins the subjects (ethnography)
- Problems of access: will the subjects allow the researcher to be present?
- Problems of measurement:
  ◦ Relying on a few friendly informants
  ◦ Skills at making observations and identifying a behavior pattern
  ◦ Unstructured observation
  ◦ Going native (the researcher over-identifies with the subjects)
- Ethical problems: will subjects be harmed by the study?

bell curve or normal distribution

Special properties:
- The mean is the same as the median and the mode
- The total area under the curve is equal to 1
- The curve never touches the x-axis (there is always a non-zero probability of a value, no matter how small)
- 95% of cases fall within ±1.96 s.d. of the mean
- Observations further away from the mean are less likely to occur by chance.
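The 95% figure can be checked numerically. The sketch below uses `NormalDist` from Python's standard `statistics` module; the helper name `share_within` is just an illustrative choice.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

def share_within(k):
    """Proportion of cases lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

within_196 = share_within(1.96)
print(f"Share of cases within ±1.96 s.d.: {within_196:.4f}")
print(f"Share of cases within ±1 s.d.:    {share_within(1.0):.4f}")
```

The same helper shows why observations far from the mean are unlikely: as `k` grows, the share left in the tails, `1 - share_within(k)`, shrinks toward (but never reaches) zero.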

independent variable

(X) is thought to influence, affect, or cause variation in another variable

empirical/non-normative

(characteristic of scientific method/knowledge) based on observable, objective (value free) data

cumulative

(characteristic of scientific method/knowledge) current and future research based on past research

general

(characteristic of scientific method/knowledge) explains more than one phenomenon

probabilistic

(characteristic of scientific method/knowledge) goal is to assess the strength of the tangible evidence in supporting a hypothesis; always tentative, never 100% certain

falsifiable

(characteristic of scientific method/knowledge) somebody can prove me wrong by reaching a different set of results

replicable/transmissible

(characteristic of scientific method/knowledge) someone can follow exactly how I collected my data and how I tested it, and get the same results

dependent variable

(Y) is thought to depend upon or be caused by variation in an independent variable

When Documenting the research process you must:

- describe the process of measurement. - acknowledge potential bias in measurement. - explicitly describe how the analysis was conducted. - Problem: critics argue that it is impossible to get rid of deeply ingrained biases. (elitists' and pluralists' views on power)

probabilistic explanations

- a theory that does not account for 100% of the cases is still useful. - instead of seeking dichotomous "yes/no" answers, we can shift our focus to probabilistic explanations. - our dependent variable (Y) is the probability of an outcome

Deductive Theorists

- deductive theorists start with some fundamental assumptions and then deduce new implications. - example: proximity theory of voting - assumption: voters act to maximize their utility - modified version of the argument of self-interest (Adam Smith etc.) - conclusion: a voter will choose the party/candidate whose position is closest to his/hers - logic: the closer the match between a candidate and the voter, the higher the voter's utility (payoff)

Specifying an explanation: hypothesis

- in inductive research, we observe a limited number of cases and state probable relationships in hypothetical terms that can be generalized to the larger phenomena - a hypothesis is an explicit (but often tentative or provisional) statement about the relationship between phenomena that can be empirically verified (in principle)

inductive reasoning

- inductive theorists aim to make generalizations by:
  ◦ beginning with observations of a relationship in a small number of cases.
  ◦ generalizing to a larger population.
- generalizable theories are better than narrow theories.
- most research based on hypothesis-testing is based on the inductive approach.
- by contrast, deductive theories are closely related to normative theories, which tell us what the subjects should do.

use of correlation coefficient

-Construct validity: comparing patterns of responses (values) of measure 1 to those in measure 2 or more in the same cases. -Test-retest reliability: assessing similarity of variation of values obtained at time 1 and at time 2. -Inter-coder reliability: comparing values recorded by coder 1 and coder 2 when both use the same instructions on how to code. -Inter-item association: comparing similarity in response patterns of values in various measures of the same phenomenon.

cross level generalizations are difficult

- Data collected for one unit of analysis are sometimes used to make inferences about another unit of analysis. - In general, researchers should not mix units of analysis within a hypothesis. - However, it sometimes becomes necessary to make ecological inferences about individuals from aggregate data. - Ecological fallacy - the error of assuming that a finding observed at an aggregate level must also be true at the individual level

correlation

-If we believe that A -> B, then we should see a relationship between A and B. - Positive correlation: larger values of A are associated with larger values of B. -Negative correlation: larger values of A are associated with smaller values of B. -Correlation does not prove causality; it is only about association

Proving vs Falsifying

- In the natural sciences, researchers can prove a theory by examining all possible cases and showing that the theory works for them. - They can do this when items (cases) are identical (e.g., hydrogen atoms) or when the number of cases is small. - In the social sciences, the number of cases is very large and cases (e.g., persons, countries) vary from one to another, so we cannot test our theory on each possible item. - Thus, in the social sciences we do not prove and can only falsify: find a case that shows our theory to be false (hence, a hypothesis can be rejected).

primary data

- Observation is generally an example of gathering primary data first hand. Data are recorded and used by the researcher making the observations, and can be quantitative or qualitative in nature. Ethnography is an example of first-hand observation that generally goes beyond descriptions of events or actions to reveal the "cultural constructions" in which we live.

a good hypothesis

1. Formalize educated guesses about phenomena that exist in the political world with empirical statement 2. Explain general rather than particular phenomena 3. Present a logical/plausible reason for thinking that the hypothesis might be supported by the data 4. Include the specific direction of the relationship (positive or negative or either way) 5. Terms describing concepts should be testable (not tautological—two concepts are identical in meaning) 6. Relationship specified should be consistent with data/research design

Scientific/Empirical Research Method

1. Identifying a research problem 2. articulating a research question 3. formulating a hypothesis 4. gathering data for hypothesis testing. 5. choosing an analytical/statistical method for testing hypothesis 6. report and interpret statistical findings

creating a good hypothesis

1. pose an interesting research question. 2. propose an explanation-- ◦ By identifying which is a dependent variable and which are the independent variables ◦ By specifying the causal mechanism or explaining how Y is "caused" by X. 3. define your concepts clearly. ◦ Tell us exactly what we should see if your hypothesis is correct. What exactly do you mean by invoking a term such as "web presence," or "knowledge," or "revolution." 4. Is the relationship plausible (credible/logical)?

Dispersion

3 measures of dispersion: range, variance, standard deviation.
Range:
- One of the simplest measures of dispersion.
- Range = Y maximum - Y minimum
- Describes the extremes of the data around the typical case.
Variance:
- The sum of the squared differences between each value of the variable and the mean, divided by the number of cases.
- We square the differences so that positive and negative differences don't cancel out.
- There are two formulas: one for the sample (a subset of observations drawn from a specified population) and one for the population (all the observations or cases in a study). The only difference is that the sample version divides by the sample size minus 1 (n - 1) instead of the number of cases.
Standard deviation:
- The square root of the variance.
- There are also two formulas, one for the sample and one for the population. As with the variance, the only difference is that the sample version divides by n - 1.
- The standard deviation is used more often than the variance to summarize data because it is in the same unit as the mean.
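To make the sample/population distinction concrete, here is a small Python sketch using the standard `statistics` module, where `pvariance`/`pstdev` divide by the number of cases and `variance`/`stdev` divide by n - 1. The scores are hypothetical.

```python
import statistics

# Hypothetical scores for eight cases (mean = 5).
scores = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Range = Y maximum - Y minimum
data_range = max(scores) - min(scores)

# Population versions: sum of squared differences divided by N.
pop_variance = statistics.pvariance(scores)
pop_sd = statistics.pstdev(scores)

# Sample versions: sum of squared differences divided by n - 1.
sample_variance = statistics.variance(scores)
sample_sd = statistics.stdev(scores)

print(f"range: {data_range}")
print(f"population variance: {pop_variance:.3f}, s.d.: {pop_sd:.3f}")
print(f"sample variance:     {sample_variance:.3f}, s.d.: {sample_sd:.3f}")
```

The squared differences here sum to 32, so the population variance is 32/8 = 4 and the sample variance is 32/7, which is slightly larger because of the n - 1 divisor.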

Specifying a causal relationship

A causal relationship has three necessary components: 1. X and Y covary—a change of value in X is followed by a change of value in Y (correlation) 2. The change in X precedes the change in Y (temporal ordering) 3. Covariation between X and Y is not a coincidence or spurious (exhaustion of alternative explanations)

Z-SCORES

A z score (z) measures how many standard deviations a particular observation lies above or below the mean; the z scores themselves have a mean of 0. To compute it, subtract the mean from the observation and divide by the standard deviation: z = (observation - mean) / standard deviation. Values above the mean yield positive z scores; values below the mean yield negative z scores. By converting raw scores to z scores, we can make fair comparisons between or among data/scores collected in different settings (e.g. student performance in different TA sessions).
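The TA-session comparison can be sketched in Python as follows. The two sections and their scores are invented for illustration, and the sketch assumes each section is treated as a full population (so it uses `statistics.pstdev`); with the sample formula the numbers would differ slightly.

```python
import statistics

def z_score(observation, mean, sd):
    """(observation - mean) / standard deviation."""
    return (observation - mean) / sd

# Hypothetical exam scores in two TA sections graded on different scales.
section_a = [70.0, 75.0, 80.0, 85.0, 90.0]
section_b = [50.0, 55.0, 60.0, 65.0, 70.0]

# A student who scored 85 in section A vs. one who scored 65 in section B.
z_a = z_score(85.0, statistics.mean(section_a), statistics.pstdev(section_a))
z_b = z_score(65.0, statistics.mean(section_b), statistics.pstdev(section_b))

print(f"z in section A: {z_a:.3f}")
print(f"z in section B: {z_b:.3f}")
```

Although the raw scores differ by 20 points, both students sit the same distance above their own section's mean, so their z scores match.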

Accuracy of Measurements

Accuracy of a measurement is determined by both its reliability and validity.

hypothesis about the general: a statement about how the world works, not about a specific case

Comparing A to B, which one is better? Why? A. "The United States has more murders than other countries because so many people own guns." B. "Countries with more guns per capita will experience more murders per capita than countries with fewer guns."

Methods of data collection through observation

Data should be collected on the proper unit of analysis. Different data sources can lead to different conclusions about the same thing. Most observations in the social sciences are direct, but sometimes indirect methods are used. Sometimes we hide our intentions by making covert rather than overt observations. Researchers can actively participate in the activities being observed or remain non-participants. We can choose to make structured (systematic) or unstructured observations.

temporal ordering

If we believe that A -> B, then we should see a change of value in A first, and then a corresponding change in the value of B. - The fact that A happens before B does not by itself prove causality. Assuming that it does is called the post hoc fallacy.

three types of common political data

Interview data: verbal or written cues collected in-person or by phone, mail, or the internet-- typically using an instrument (e.g., survey questionnaire) and w/ informed consent. Documents: official government records & records kept by private institutions, interest groups, media organizations, or individuals. Physical observations: both direct and indirect recording of activities, behavior, or events.

Validity

Is the measurement valid? Is it measuring what it is supposed to measure?
- Validity is the extent of correspondence between the measure and the concept.
- Validity is connected to the idea of operationalization: decisions on what dimensions to include in measurements of a concept.
Types of validity:
1. Face validity: on the face of it, is the measure valid? A test has face validity if its content simply looks relevant to the person taking the test.
2. Content validity: are the appropriate dimensions of the concept being tapped by the measure? We often combine different measures in an index in an effort to increase content validity.
Evaluating validity:
- Construct validity: check to see if the measure is related to other measures thought to capture the same concept.
  ◦ Example: scores on the Freedom House democracy measure of the electoral process should correlate with another democracy measure such as independent media.
- Inter-item association: if we are combining measures thought to capture the same concept, all of these measures should be related.

Central Tendency

Measures of central tendency: mode, median, mean.
Mode:
- The category of a variable with the greatest frequency of observations, i.e. the most common or frequent value in a distribution.
- The mode is resistant to outliers.
- There can be more than one modal value for a variable. Variables with more than one mode are referred to as bimodal or multimodal.
- Example: if a party ID variable has 40 Democrats, 60 Republicans, and 20 Independents, the mode is "Republican."
Median:
- The middle value in an ordered set of values.
- Found by sorting the values in increasing order and then locating the value where 50% of the cases are smaller and 50% are larger.
- It is important to rank order the observations first; then we divide the observations on the variable in half.
- The calculation differs depending on whether the number of cases N is odd or even: with an odd N the median is the single middle value; with an even N it is the average of the two middle values.
Mean:
- The arithmetic average: the sum of all values divided by the number of cases.
- Unlike the mode and the median, the mean is not resistant to outliers.
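A short Python sketch of the three measures, using the standard `statistics` module. The party ID counts come from the card's own example; the other numbers (including the outlier of 1000) are made up to show how each measure reacts.

```python
import statistics

# Party ID example from the card: 40 Democrats, 60 Republicans, 20 Independents.
party_id = ["Democrat"] * 40 + ["Republican"] * 60 + ["Independent"] * 20
modal_party = statistics.mode(party_id)

# A variable can have more than one mode (bimodal here).
modes = statistics.multimode([1, 1, 2, 2, 3])

# Hypothetical values, then the same values plus one extreme outlier.
values = [20, 25, 30, 35, 40]          # odd N: median is the middle value
with_outlier = values + [1000]         # even N: average of two middle values

median_odd = statistics.median(values)
median_even = statistics.median(with_outlier)
mean_before = statistics.mean(values)
mean_after = statistics.mean(with_outlier)

print(f"mode: {modal_party}, modes of bimodal data: {modes}")
print(f"median: {median_odd} -> {median_even}")
print(f"mean:   {mean_before} -> {mean_after:.2f}")
```

Adding the outlier moves the median only from 30 to 32.5, while the mean jumps from 30 to about 191.67, which is what "not resistant to outliers" means in practice.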

transmission

Our ability to transmit/share the findings imposes several requirements: - the research method must be well documented: clear and detailed description of the logic, the cases, process of observation/measurement, and analysis. - results must be amenable to replication - regardless of our differences, scientists operate as a community with shared values, language, and ethics.

Reliability v Validity

- Reliability is easy to demonstrate through some form of repeated trials. - Validity is more difficult because we can never be sure about the true value of a concept. - While a valid measure is always reliable, because if truly valid it will measure the concept correctly every time, a reliable measure is not necessarily valid. - We could be measuring the concept incorrectly in a consistent way.

Falsifiability

Scientific findings must be falsifiable: there must be a possibility of observing a case where the theory fails

making empirical observations

The choice of data collection method depends on the following 6 points: ◦ validity of the measurements that a particular method will permit ◦ effect of the data collection itself on the phenomena being measured ◦ population covered by a data collection method ◦ resources and the cost associated with a method ◦ public availability of data ◦ ethical implications

inference

The process of using a limited number of observable data/facts to learn something about a larger phenomenon that we do not have all the information about.
- Descriptive inference: using these facts to describe the larger phenomenon
- Causal inference: using these facts to explain why something happens
Rather than simply describing what happens, our goal is to learn how causal inference works.

Ways to describe scientific data

We can describe quantitative data in a number of ways: 1. We could describe every observation, or every value in a data set. ◦ this would be overwhelming, and mostly unhelpful 2. Or we could display observations in a data matrix, which is an array of rows and columns that stores the observed values of variables. 3. Or we can describe data as distributions, which show how data points are distributed across classes/categories

Hypotheses should be specific: The expected relationship is clearly stated

Which one is better? In what way? A. "A country's geographic location influences the type of political system it has." B. "The more borders a country shares with other countries, the more likely it is to be non-democratic."

Operationalization (from theory and hypotheses to measurement)

a process of deciding how to turn abstract concepts into specific measures by recording empirical observations of the occurrence of an attribute or a behavior using numerals or scores. - a good operational definition of a concept helps identify what are the necessary components in measuring a concept that may be 'multi-dimensional' in nature. - neglecting a measurement may lead to permanent bias in measurement --> creating a problem with validity of the measure.

Summarizing Data with statistics

a. statistical summaries - frequency distributions, descriptive statistics b. graphical summaries - bar graphs, pie graphs, dot plots, histograms

Reliability

All things equal, will the measure give the same results over and over?
- Reliability is the extent to which a measure yields the same results on repeated trials: the property of getting the same measurement over and over for the same item.
- The more consistent the results, the higher the reliability.
Four types of reliability:
◦ Test-retest reliability: applying the same test at different times to the same respondents
◦ Alternative-form method: applying two different measures of the same concept to the same respondents at different times
◦ Split-halves method: applying two halves of a multi-item questionnaire measuring the same attributes at the same time
◦ Inter-coder reliability: comparing the results of two persons coding the same measure(s) of a concept

Statistical Dataset

collection of cases with values that can be numerically expressed and counted

reactivity effect

direct, researcher poses or acts a certain way and people react to it.

basic unit of analysis

each case, in the candies example it was each student

longitudinal/time series design (hypothesis/research design)

ex: "As the literacy rate of a country increases, the political process of the country also becomes more democratic"

cross-sectional design (hypothesis/research design)

ex: "Countries with higher literacy rates tend to be more democratic than those with lower literacy rates"

observer effect

indirect, researcher is just observing and the people act differently

interval/ratio data relationships

use correlation coefficients (r in the sample, "rho" in the population)
- correlation coefficients describe how closely the values of x and y are associated with each other.
- r = 1: perfect positive association
- r = -1: perfect negative association
- r = 0: no association
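For illustration, the sketch below computes Pearson's r by hand in Python, assuming r here means the Pearson product-moment coefficient. The function name `pearson_r` and the three small data sets are hypothetical, chosen so that r comes out at the three benchmark values.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable.
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

x = [1, 2, 3, 4, 5]
r_positive = pearson_r(x, [2, 4, 6, 8, 10])   # y rises with x
r_negative = pearson_r(x, [10, 8, 6, 4, 2])   # y falls as x rises
r_none = pearson_r(x, [2, 1, 3, 1, 2])        # no linear pattern

print(f"{r_positive:.2f} {r_negative:.2f} {r_none:.2f}")
```

Real data almost never hit exactly 1, -1, or 0; values in between indicate weaker or stronger association. The function divides by zero if either variable is a constant, since a constant has no variation to correlate.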

intervening variable

variable that comes between X and Y that helps explain the relationship between the two. - X -> Z -> Y Example: higher levels of education (X) would lead to higher levels of voter turnout (Y) Possible intervening variable(Z) : number of political science classes taken, knowledge of candidate position

antecedent variable

variable that influences the value of the independent variable: Z -> X -> Y
Example:
- Y: vote for the Democratic party
- X: attitudes toward National Health Coverage
- Antecedent variable Z: adequacy of a person's medical insurance

variable

a characteristic that varies from one case to another. If it does not change (no variation across cases), it is a constant.

unit of analysis

what is being measured. In "Millennial students are more likely to be depressed than students born earlier than 1980," the unit of analysis is students (in different cohorts).

field studies

• A field study occurs in a natural setting. • Field studies hold many advantages over other methods. -people behave as they would ordinarily, unlike in a lab -can observe people for lengthy periods of time so that interaction and changes in behavior may be studied -may achieve a degree of validity/accuracy or completeness not possible with documents or survey interviews However, to take accurate and complete notes can be a challenge.

ethical concerns

• Ethical concerns arise primarily when there is a potential for harm to the observed.
- negative repercussions from associating with the researcher because of the researcher's sponsors, nationality, or outsider status.
- invasion of privacy.
- stress during the research interaction.
- disclosure of behavior or information to the researcher resulting in harm to the observed during or after the study.
Addressing Ethical Concerns
• Federal regulations require faculty and students to submit research proposals involving human subjects for review to an institutional review board to protect participants from harm.
• Informed consent means that research subjects are to be given information about the research, including the research procedure, its purposes, risks, and anticipated benefits; alternative procedures (e.g., where therapy is involved); how subjects are selected; and the person responsible for the research.

Descriptive Statistics (describing data with summary statistics)

◦ A method of describing a large amount of data with a single number that summarizes the phenomenon. Two types: 1. Central Tendency: What does the average look like? What is typical? Where do I stand? 2. Dispersion: How well did I do (when compared to the average)? Or how far off was my guess from the average?

Frequency Distributions (describing data with summary statistics)

◦ a table that shows the number of observations associated with each value of a variable ◦ may include other statistics like the relative frequency proportion, percentage, missing values or odds ratios
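A frequency distribution like this can be sketched in a few lines of Python. The helper name `frequency_table` is made up for the example, and the data reuse the 40/60/20 party ID counts from the central tendency card; missing values and odds ratios are left out to keep the sketch short.

```python
from collections import Counter

def frequency_table(values):
    """Return rows of (value, frequency, proportion, percentage),
    ordered from the most to the least frequent value."""
    counts = Counter(values)
    total = len(values)
    return [
        (value, count, count / total, 100 * count / total)
        for value, count in counts.most_common()
    ]

party_id = ["Democrat"] * 40 + ["Republican"] * 60 + ["Independent"] * 20
table = frequency_table(party_id)

for value, count, proportion, percent in table:
    print(f"{value:<12} {count:>4} {proportion:>6.3f} {percent:>6.1f}%")
```

The first row of the table is also the mode, since rows are sorted by frequency.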

