POLS 201 Midterm


direction

a correlation > 0 indicates a positive association; a correlation < 0 indicates a negative association

difference-in-means estimator

Y-bar of the treatment group - Y-bar of the control group, i.e., the average outcome for the treatment group minus the average outcome for the control group
non-binary outcome interpretation: an average, in the same unit of measurement as the outcome variable
binary outcome interpretation: in percentage points, after multiplying the result by 100
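These notes use R; the sketch below only mirrors the arithmetic of the difference-in-means estimator in Python. The outcome lists and the helper names `mean_of` and `diff_in_means` are hypothetical, made up for illustration.

```python
def mean_of(values):
    """Average: sum of the values divided by the number of observations."""
    return sum(values) / len(values)


def diff_in_means(treatment_outcomes, control_outcomes):
    """Y-bar of the treatment group minus Y-bar of the control group."""
    return mean_of(treatment_outcomes) - mean_of(control_outcomes)


# Hypothetical binary outcome (1 = voted, 0 = did not vote).
treated = [1, 1, 0, 1]  # proportion who voted: 0.75
control = [1, 0, 0, 1]  # proportion who voted: 0.50

effect = diff_in_means(treated, control)
# For a binary outcome, multiply by 100 to read the estimate in percentage points.
print(effect * 100, "percentage points")  # 25.0 percentage points
```

Because the outcome here is binary, the estimate reads as "increases the outcome by 25 percentage points, on average"; a non-binary outcome would keep its own unit.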

standard deviation

a measure of the spread of a variable's distribution

X_i

a particular observation of X, where i denotes the position of the observation (i = 1, ..., n) and n is the total number of observations in the variable

correlation coefficient or correlation

a statistic that summarizes the relationship between two variables with a single number; it shows the direction and strength of their linear association

sample

a subset of individuals chosen for a study

randomized experiment

a type of study design in which treatment assignment is randomized

histogram

of a variable is the visual representation of its distribution through bins of different heights

table of proportions

of a variable shows the proportion of observations that take each value in the variable

representative sample

accurately reflects the characteristics of the population from which it is drawn, that is, characteristics appear in the sample in similar proportions as in the population as a whole

if a variable is binary

as a proportion, in % after multiplying the result by 100

if a variable is non-binary

as an average, in the same unit of measurement as the variable
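The two cards above describe how to read a mean. A minimal Python sketch (hypothetical data and a hypothetical helper `mean_of`) shows why the mean of a 0/1 variable is a proportion while the mean of a non-binary variable stays in that variable's unit:

```python
def mean_of(values):
    """Sum of the values divided by the number of observations."""
    return sum(values) / len(values)


# Hypothetical binary variable: 1 = voted, 0 = did not vote.
voted = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
# Hypothetical non-binary variable: year of birth.
birth_year = [1950, 1962, 1956, 1956]

share_voted = mean_of(voted)     # 0.3: the proportion of 1s
avg_birth = mean_of(birth_year)  # 1956.0: in years

print(round(share_voted * 100, 1), "% voted")  # 30.0 % voted
print("average year of birth:", avg_birth)     # average year of birth: 1956.0
```

Summing a 0/1 variable counts the 1s, so dividing by n gives the proportion of observations with the trait.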

β hat is ∆Y hat ...

associated with ∆X = 1, i.e., an increase in X of one unit (slope)

conclusion statement final paragraph

assuming that [the treatment and control groups are comparable] (a reasonable assumption because...), we estimate that [the treatment] (increases/decreases) [the outcome] by [size and unit of measurement of the effect] on average

lm()

fits a linear model using the formula Y ~ X, ex: lm(data$final ~ data$midterm)

cor()

calculates the correlation ex: cor(star$reading, star$math)
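R's cor() computes the Pearson correlation by default. A minimal Python sketch of that formula, with a hypothetical helper `correlation` and made-up scores:

```python
import math


def correlation(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products of deviations from the means.
    cross = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Square roots of the sums of squared deviations; the 1/(n - 1) factors
    # in the covariance and standard deviations cancel, so they are omitted.
    ss_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x))
    ss_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y))
    return cross / (ss_x * ss_y)


reading = [1, 2, 3, 4]
math_scores = [2, 4, 6, 8]
print(correlation(reading, math_scores))  # approximately 1.0: strong, positive
```

The sign gives the direction and the absolute value gives the strength, so the result always falls between -1 and 1.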

median()

calculates the median

sd()

calculates the standard deviation; ex: X-bar - sd(X) and X-bar + sd(X) mark one standard deviation below and above the mean
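A minimal Python sketch of what R's var() and sd() compute; both divide by n - 1. The data and the helper names `sample_variance` and `sample_sd` are hypothetical; Python's `statistics.variance` and `statistics.stdev` use the same n - 1 denominator, so they serve as a cross-check.

```python
import math
import statistics


def sample_variance(values):
    """Squared deviations from the mean, summed and divided by n - 1 (like R's var())."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


def sample_sd(values):
    """Square root of the variance (like R's sd()), back in the variable's unit."""
    return math.sqrt(sample_variance(values))


scores = [1, 2, 3, 4, 5]  # hypothetical data, mean 3
x_bar = sum(scores) / len(scores)
sd = sample_sd(scores)

print(sample_variance(scores))  # 2.5
# One standard deviation below and above the mean.
print(x_bar - sd, x_bar + sd)
```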

datasets

capture the characteristics of a particular set of individuals or entities

least squares method

chooses the line that minimizes the sum of the squared prediction errors (residuals)
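A minimal Python sketch of the least squares formulas for α hat and β hat, the same coefficients lm() reports for a single predictor. The data and the helper names `least_squares` and `ssr` are hypothetical; the usage lines check that nudging either coefficient raises the sum of squared residuals.

```python
def least_squares(x, y):
    """Return (alpha_hat, beta_hat) for the line minimizing the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    beta_hat = (
        sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
        / sum((xi - x_bar) ** 2 for xi in x)
    )
    alpha_hat = y_bar - beta_hat * x_bar  # the line passes through (x_bar, y_bar)
    return alpha_hat, beta_hat


def ssr(x, y, alpha, beta):
    """Sum of squared residuals (prediction errors) for a candidate line."""
    return sum((yi - (alpha + beta * xi)) ** 2 for xi, yi in zip(x, y))


midterm = [0, 1, 2, 3]  # hypothetical data
final = [1, 3, 2, 5]
alpha_hat, beta_hat = least_squares(midterm, final)
print(alpha_hat, beta_hat)  # both approximately 1.1

best = ssr(midterm, final, alpha_hat, beta_hat)
print(best < ssr(midterm, final, alpha_hat + 0.1, beta_hat))  # True
print(best < ssr(midterm, final, alpha_hat, beta_hat - 0.1))  # True
```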

strength

the closer the absolute value of the correlation is to 1, the stronger the association; 0 means no linear association

variables

a column that contains the values of a characteristic that varies across observations

mean()

computes the mean of a variable

sqrt()

computes the square root of the argument specified inside the parentheses

var()

computes the variance ex: var(voting$birth)

hist()

creates a histogram based on one variable

plot()

creates a scatter plot using plot(x=data$x_var, y=data$y_var)

assignment operator (<-)

creates an object by assigning a value to a name, ex: fit <- lm(data$final ~ data$midterm)

ifelse()

creates the contents of a new variable based on the values of an existing one; requires three arguments, separated by commas, in the following order:
1. logical test (using ==)
2. return value if the logical test is true
3. return value if the logical test is false
ex: ifelse(data$variable == "yes", 1, 0)
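A minimal Python sketch of the recoding ifelse() performs. The helper name `if_else` and the survey answers are hypothetical, and unlike R's ifelse(), which accepts any logical test, this sketch hard-codes an equality test.

```python
def if_else(values, test_value, value_if_true, value_if_false):
    """Recode each value: value_if_true where it equals test_value, else value_if_false."""
    return [value_if_true if v == test_value else value_if_false for v in values]


answers = ["yes", "no", "yes", "no"]  # hypothetical survey answers
print(if_else(answers, "yes", 1, 0))  # [1, 0, 1, 0]
```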

unit of observation

defines the individuals or entities that each observation in the data frame represents, ex: if each observation represents a different student, the unit of observation is students

scatter plot

enables us to visualize the relationship between two variables by plotting one variable against the other in the two-dimensional space

substantive interpretation of β hat

ex: an increase in midterm scores of 1 point is associated with a predicted increase in final exam scores of 0.97 points, on average

substantive interpretation of α hat

ex: when a student scores 0 points on the midterm, we predict that in the final exam they will score -6.01 points, on average

[ ]

extracts a selection of observations from a variable: to the left of the square brackets, we specify the variable we want to subset; inside the square brackets, we specify the criteria of selection, ex: data$var1[data$var2==1] extracts the observations of var1 for which var2 equals 1
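A minimal Python sketch of the subsetting in data$var1[data$var2==1], here used to average the outcome of a treatment group. The helper name `subset_where` and the data are hypothetical.

```python
def subset_where(var1, var2, value):
    """Keep the observations of var1 whose matching observation of var2 equals value."""
    return [v1 for v1, v2 in zip(var1, var2) if v2 == value]


final = [70, 85, 90, 60]  # hypothetical outcome
treated = [1, 0, 1, 0]    # hypothetical treatment indicator

treated_finals = subset_where(final, treated, 1)
print(treated_finals)                             # [70, 90]
print(sum(treated_finals) / len(treated_finals))  # 80.0
```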

Y hat = α hat + β hat X

fitted line
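A minimal Python sketch of predicting with a fitted line. It borrows the example values from the interpretation cards (α hat = -6.01, β hat = 0.97); treating them as one fitted model is an assumption made only for illustration, and the helper names `predict` and `predicted_change` are hypothetical.

```python
alpha_hat = -6.01  # example intercept from the α hat interpretation card
beta_hat = 0.97    # example slope from the β hat interpretation card


def predict(x):
    """Fitted line: Y hat = α hat + β hat X."""
    return alpha_hat + beta_hat * x


def predicted_change(delta_x):
    """∆Y hat = β hat ∆X."""
    return beta_hat * delta_x


print(predict(0))            # -6.01, the intercept (X = 0)
print(predict(80))           # about 71.59
print(predicted_change(10))  # about 9.7
```

A 10-point higher midterm score predicts a final exam score about 9.7 points higher, on average.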

how do you write a function call?

function_name(required_argument, optional_argument_name = optional_argument)

object

how R stores data

control group

individuals who did not receive the treatment

treatment group

individuals who received the treatment

α hat

intercept

linear model

a model that predicts Y as a linear function of X; drawn as a straight line on a scatter plot

random sampling

makes the sample and the target population on average identical to each other in all observed and unobserved characteristics

random treatment assignment

makes the treatment and control groups on average identical to each other in all observed and unobserved pre-treatment characteristics

measures of centrality

mean or median

numeric non-binary

more than two numeric values (ex: distance traveled)

proportion of observations

number of observations that meet the criterion / total number of observations (ex: 3/6 = 0.5, i.e., 50%)

median

of a variable is the value at the mid-point of the distribution that divides the data into two equal-size groups
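A minimal Python sketch of the median, following the same rule as R's median(): sort the values, take the middle one, and average the two middle values when n is even. The helper name `median_of` and the data are hypothetical; `statistics.median` serves as a cross-check.

```python
import statistics


def median_of(values):
    """Middle value of the sorted data; with an even count, average the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_of([3, 1, 2]))     # 2
print(median_of([4, 1, 3, 2]))  # 2.5
```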

descriptive statistics

of a variable numerically summarize the main characteristics of its distribution

numeric binary

only two numeric values (0,1) that represent the presence or absence of a trait (ex: voted or not)

View()

opens a new tab in the upper-left window of RStudio with the contents of the dataset (only function with a capital first letter)

dataframes

datasets organized into observations (rows) and variables (columns)

y hat is the predicted outcome

our predictions of Y using X

hat

predicted or estimated

∆Y hat = β hat ∆X

the predicted change in Y hat associated with a change in X

dim()

provides the dimensions of the data frame using the name of the object (in the order rows, columns)

read.csv()

reads the CSV files - the only required argument is the name of the CSV file in quotes ex: read.csv("file.csv")

∆

represents change

observation

a row of the data frame, representing a particular entity or individual in the study

setwd()

sets the working directory, that is, directs R to the folder on your computer where the dataset is saved (Session >> Set Working Directory >> To Source File Location)

table()

creates a frequency table: shows how many observations take each value of a variable, ex: table(voting$voted) shows how many did (1) and did not (0) vote

prop.table(table( ))

creates a table of proportions: shows the proportion of observations that take each value of a variable (multiply by 100 for %), ex: prop.table(table(voting$voted))
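A minimal Python sketch of what table() and prop.table(table()) return. The helper names `frequency_table` and `proportion_table` and the voting data are hypothetical.

```python
from collections import Counter


def frequency_table(values):
    """Like R's table(): count how many observations take each value."""
    return dict(sorted(Counter(values).items()))


def proportion_table(values):
    """Like prop.table(table()): each count divided by the total number of observations."""
    n = len(values)
    return {value: count / n for value, count in frequency_table(values).items()}


voted = [1, 0, 0, 1, 0, 1]  # hypothetical: 1 = voted, 0 = did not vote
print(frequency_table(voted))   # {0: 3, 1: 3}
print(proportion_table(voted))  # {0: 0.5, 1: 0.5}
```

Here 3/6 observations voted, a proportion of 0.5, or 50%.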

head()

shows the first six rows or observations in a dataset using the name of the object

frequency table

shows the values the variable takes and the number of times each value appears in the variable

β hat

slope

population

the entire group we want to learn about, ex: the residents of a country; it is often infeasible to collect data from an entire population

slope

specifies the angle or steepness of the line

intercept

specifies the vertical location of the line

measures of spread

standard deviation or variance

what is a function?

takes input -> performs actions with the inputs -> produces output

character variable

a variable whose values are text; in code, text values must be written in quotes (ex: "yes")

X-bar

the average of X

average causal effect

the average of all the individual causal effects of X on Y within a group

causal effects

the cause-and-effect connection between two variables

causal effect of X on Y

the change in the outcome variable caused by a change in the treatment variable

the larger the standard deviation...

the more spread out (flatter and wider) the distribution

what values do not need to be in quotes?

the names of objects, functions, and arguments, as well as numbers and special values such as TRUE, FALSE, NA, and NULL, never need to be in quotes

i

the position or row number of the observation

unit of measurement

the quantity in which the value is measured (points, miles, kilometers)

variance

the square of the standard deviation of a variable; standard deviations are easier to interpret than variances because they are in the same unit of measurement as the variable

mean

the sum of the values across all observations divided by the total number of observations

n

the total number of observations

percentage point

the unit of measurement for the arithmetic difference between two percentages

outcome (y)

the variable that we want to predict

predictor (X)

the variable we want to use to predict the outcome

what is # used for?

to write comments; R ignores everything after # on a line

aim of predictions

to predict Y as accurately as possible with the smallest errors possible

R script

type of file we use to store the code we write to analyze data

$

used to access an element inside an object such as a variable inside a data frame

fit

a common name for the object that stores a fitted linear model, ex: fit <- lm(data$final ~ data$midterm)

==

used to create logical tests that evaluate whether the observations of a variable equal a particular value (values in quotes if not numbers ex: "yes")

abline()

adds the fitted line stored in an object to the current scatter plot, ex: abline(fit)

interpretation of mean()

use the unit of measurement; non-binary ex: the average year of birth of registered voters is the year 1956; binary ex: 31% of registered voters voted

outcome variable (Y)

variable that may change as a result of a change in the treatment variable (either binary or non-binary)

treatment variable (X)

variable whose change may produce a change in the outcome variable (ALWAYS BINARY in this class)

x are predictors

variables that we use as the basis for our predictions

y is the outcome variable

what we want to predict

counterfactual outcome

what would have happened under the other treatment condition (ex: if a treated individual had not received the treatment); impossible to observe

writing a conclusion statement breakdown

what's the assumption we are making when estimating the average causal effect? why is this a reasonable assumption? what's the treatment? what's the outcome? what's the direction, size and unit of measurement of the average causal effect?

α hat is Y hat when...

X = 0

ε hat is the error of our predictions (the residuals)

ε hat = Y - Y hat
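A minimal Python sketch of computing ε hat = Y - Y hat for each observation. The fitted line (α hat = 1, β hat = 2), the data, and the helper name `residuals` are hypothetical.

```python
def residuals(y, y_hat):
    """ε hat = Y - Y hat for each observation."""
    return [yi - yhi for yi, yhi in zip(y, y_hat)]


alpha_hat, beta_hat = 1, 2  # hypothetical fitted line: Y hat = 1 + 2X
x = [0, 1, 2]
y = [2, 3, 4]

y_hat = [alpha_hat + beta_hat * xi for xi in x]  # [1, 3, 5]
print(residuals(y, y_hat))  # [1, 0, -1]
```

A positive residual means the line under-predicted that observation; a negative one means it over-predicted.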

