R Methods
aov(formula, data=___)
Fit an analysis of variance model by a call to lm for each stratum.
t.test(vectorname,optionalsecondvector,conf.level=___)
Performs one- and two-sample t-tests on vectors of data. Reports an estimate of the mean (one sample) or the two group means along with a confidence interval for their difference (two sample).
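A minimal sketch of both forms, using made-up vectors (heights and other are hypothetical data, not from the course material):

```r
# Hypothetical sample data, invented for illustration
heights <- c(66, 70, 68, 72, 65, 69, 71, 67)
other   <- c(64, 66, 63, 67, 65, 62, 68, 64)

# One-sample test: is the true mean different from 68?
one <- t.test(heights, mu = 68, conf.level = 0.95)
one$estimate   # sample mean
one$conf.int   # confidence interval for the mean

# Two-sample test: supply the optional second vector
two <- t.test(heights, other, conf.level = 0.95)
two$estimate   # the two sample means
two$conf.int   # confidence interval for the difference in means
```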
quantile(vectorname, probvalue)
The generic function quantile produces sample quantiles corresponding to the given probabilities. The smallest observation corresponds to a probability of 0 and the largest to a probability of 1. Commonly used on the stat column from the infer package, passed in as a vector.
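A short illustration on a made-up vector x (hypothetical data), showing the 0 and 1 endpoints and a pair of probabilities of the kind used for a percentile interval:

```r
# Hypothetical data for illustration
x <- c(2, 4, 4, 5, 7, 9, 10, 12)

lowest  <- quantile(x, 0)   # probability 0 gives the smallest observation
highest <- quantile(x, 1)   # probability 1 gives the largest observation

# Several probabilities at once, e.g. the middle 95% of the values
middle95 <- quantile(x, c(0.025, 0.975))
```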
element_blank()
Theme element: blank. This theme element draws nothing, and assigns no space
position_jitter(width=___, height=___)
allows you to control how much to jitter points horizontally and vertically
read.csv("cities.csv")
creates a data.frame from a csv file
get_regression_points(modelname)
creates a tibble with one row per observation, holding the observed values of each model variable, the fitted value (y hat), and the residual
rbinom(numberofresults, numberofflipsineachresult, chanceofeachflip)
creates a vector of how many "heads" you get from a coin flip style probability
cor()
takes two variables and returns their correlation; often called inside summarize()
generate(reps=___, type=___)
generates the given number of replicates using the given type, such as "bootstrap"
which(vectorname == value)[1]
returns the index of the first element of vectorname equal to value. which() on its own returns every index where the condition is TRUE
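A small example of looking up positions with which(), using a made-up vector v:

```r
# Hypothetical vector for illustration
v <- c(3, 8, 5, 8, 1)

hits  <- which(v == 8)      # every index where v equals 8
first <- which(v == 8)[1]   # only the first such index
```

Here hits is 2 4 and first is 2.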
newdata parameter of get_regression_points
specifies a new dataframe to apply your predictions to
first three steps of the infer package
specify() will specify the response and explanatory variables. hypothesize() will declare the null hypothesis. generate() will generate resamples, permutations, or simulations.
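The three steps can be sketched as a pipeline. This assumes the infer and dplyr packages are installed; the data frame df and its column x are hypothetical, and the calculate() and get_p_value() steps are the usual follow-ups rather than part of the three steps themselves:

```r
library(dplyr)
library(infer)

set.seed(1)
# Hypothetical data: 50 draws whose true mean is 0.5
df <- data.frame(x = rnorm(50, mean = 0.5))

null_dist <- df %>%
  specify(response = x) %>%                    # step 1: choose the variable(s)
  hypothesize(null = "point", mu = 0) %>%      # step 2: declare the null
  generate(reps = 200, type = "bootstrap") %>% # step 3: resample
  calculate(stat = "mean")                     # one statistic per replicate

p <- get_p_value(null_dist, obs_stat = mean(df$x), direction = "two-sided")
```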
pivot_longer()
"lengthens" data, increasing the number of rows and decreasing the number of columns. Uses cols, names_to, values_to
aes(aesthetic = value, ...)
Aesthetic mappings describe how variables in the data are mapped to visual properties (aesthetics) of geoms. Examples are x =, y =, color =
pairwise.t.test(responsevector, groupingvector)
Calculate pairwise comparisons between group levels with corrections for multiple testing.
%>%
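R pipe operator (from magrittr, loaded with dplyr). Passes the result on its left as the first argument of the function on its right, so steps such as filter() can be chained on a data.frame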
scale_typeofmethod_manual(values = c(value1, value2, ...))
These functions allow you to specify your own set of mappings from levels in the data to aesthetic values.
subsetting data
[row:row,col:col]
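tibble(attribute1=..., attribute2=..., ...)
a stricter, more lightweight version of a data.frame: it does less automatically (for example, no partial name matching) and prints more compactly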
annotate(geom, x=___, y=___, ...)
adds embellishments such as text, segments, or rectangles to the plot at fixed positions
mutate(col=___,col2=___ ,...)
adds new columns to a data frame
attach(data.framename)
allows you to access the variables of the data.frame without the dollar sign $
element_something()
allows you to alter the various theme components
arrange(colname)
sorts the rows by the given column (ascending by default). Unlike group_by(), which only marks groups, it actually reorders the displayed rows
n()
counts the observations in the current group. Can only be called inside dplyr verbs such as summarize(), mutate(), or filter()
summarize(summarycol1=____,...)
compiles and displays summary data chosen by the programmer. Can be split up using group_by()
margin(top,right,bottom,left,unit)
creates a border of whitespace. Used to set the sizes of various rectangular objects such as the data space and legend, e.g. legend.margin = margin(20,30,40,50,"pt")
data.frame(attr1=___, attr2=___, ...)
creates a data frame whose columns are the given attributes, each of which should be set to a vector or list of the same length
lm(y ~ x, data=data.framename)
fits a linear model of a chosen response variable against an explanatory variable and returns a model object. To draw the fitted line in a ggplot, use geom_smooth(method = "lm"). Add further explanatory variables with + on the right-hand side for multiple regression.
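A brief sketch using the built-in mtcars data set (chosen here only as a convenient example), fitting a simple and then a multiple regression:

```r
# Simple regression: miles per gallon against weight
fit <- lm(mpg ~ wt, data = mtcars)
coef(fit)      # intercept and slope

# Multiple regression: add a second explanatory variable with +
fit2 <- lm(mpg ~ wt + hp, data = mtcars)
coef(fit2)     # intercept plus one slope per explanatory variable
```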
rnorm(n, mean = 0, sd = 1)
creates a vector of n random draws from a normal distribution with the given mean and standard deviation
random poisson, rpois(n, lambda)
creates a vector of n random draws from a Poisson distribution. lambda is the mean (the expected count), a non-negative number rather than a percentage
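A quick check of how the second argument behaves, assuming a mean of 3 events per interval (an arbitrary value picked for illustration):

```r
set.seed(42)

# 10,000 draws from a Poisson distribution whose mean (lambda) is 3
counts <- rpois(10000, lambda = 3)

mean(counts)    # close to 3, the lambda that was supplied
range(counts)   # counts are non-negative whole numbers
```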
geom_smooth(method=___,se=FALSE)
adds a smoothed trend line fitted by the given method; "lm" gives a straight line from a linear model. se=FALSE hides the confidence band
ftable(vectorname)
creates frequency table for categorical values
theme(component1=___,....)
customize the non-data components of your plots: i.e. titles, labels, fonts, background, gridlines, and legends.
visualize()
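plots the simulated distribution from infer; add shade_p_value() to shade the region corresponding to the p value
get_p_value(obs_stat=___, direction=___)
computes the p value of the observed statistic from the simulated distribution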
factor(vectorname)
encode a vector as a factor
filter(col==value)
keeps only the rows that satisfy the given predicate
%in%
tests whether each value on the left is an element of the vector that follows; used in filter() to keep the rows where a variable matches any of those values
get_regression_table(modelname)
takes a fitted model and returns its regression table (estimates, standard errors, p values, confidence intervals)
mean(dataset)
gives mean
sd(dataset)
gives standard deviation for dataset
IQR(vectorname)
gives the IQR
var(listorvectorname)
gives the variance
group_by(col)
groups rows that share the same value in the given column, without changing the order of the displayed data. Should be used for categorical data. Typically followed by summarize().
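A small sketch of group_by() with summarize() and n(), assuming dplyr is installed; the scores data frame is hypothetical:

```r
library(dplyr)

# Hypothetical data for illustration
scores <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  value = c(1, 3, 2, 4, 6)
)

out <- scores %>%
  group_by(group) %>%
  summarize(count = n(), avg = mean(value))
```

The result has one row per group: group a with count 2 and avg 2, group b with count 3 and avg 4.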
how does the area in the tails change as degrees of freedom increases for a t distribution
decreases (the tails get thinner and the distribution approaches the normal)
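A numerical check with pt(): the area beyond 2 in both tails of a t distribution shrinks as the degrees of freedom grow, approaching the normal value. The cutoff 2 and the df values are arbitrary choices for illustration:

```r
dfs <- c(2, 5, 30, 1000)

# Two-tailed area beyond |t| = 2 for each df
tail_area <- 2 * pt(-2, df = dfs)

# The same area under the standard normal curve, for comparison
normal_area <- 2 * pnorm(-2)
```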
ggplot(dataset, mapping = aes())
initialize a ggplot object by declaring the input data frame and plot aesthetics
library(libname)
loads a package into the script so that its methods can be used
hist(dataset)
makes a histogram
qt(p, df)
p is a probability and df is the degrees of freedom. Gives the cutoff value for the t distribution with df degrees of freedom for which the probability under the curve to the left is p
how to print to console in R
print()
read_csv("Web address or file")
puts data into a tibble. Requires the readr package. Allows imports from the web.
pt(q,df)
q is a given cutoff value and df is the degrees of freedom. Gives the probability under the t distribution with df degrees of freedom for values of t less than q.
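pt() and qt() are inverses of each other, which the following sketch shows for 10 degrees of freedom (an arbitrary choice):

```r
# Cutoff with 97.5% of the area to its left, df = 10
cutoff <- qt(0.975, df = 10)    # about 2.228

# Feeding the cutoff back into pt() recovers the probability
prob <- pt(cutoff, df = 10)     # 0.975
```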
pbinom(q,n,p)
q is the number of heads, n is the number of flips, and p is the probability of heads. Gives the probability of getting q or fewer heads
sample_frac(size=___,replace=___)
randomly samples the given fraction of rows; with size=1 and replace=FALSE it shuffles all the rows
desc(vector/listname)
used inside arrange() to sort by a column in descending order
replicate(n,functioncall)
repeats a function call n times and puts results into a vector
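As an illustration, replicate() can build a small sampling distribution by repeating a random draw; the sample size of 25 and the 1000 repetitions are arbitrary choices:

```r
set.seed(1)

# Repeat "take 25 normal draws and compute their mean" 1000 times
means <- replicate(1000, mean(rnorm(25)))

length(means)   # one result per repetition
sd(means)       # spread of the sample means
```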
theme_set(nameoftheme)
sets the default theme used by all following plots, e.g. theme_set(theme_bw())
diff()
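base R function that returns the differences between consecutive elements of a vector; can be used inside dplyr verbs such as summarize()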
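A one-line example on a made-up vector:

```r
# Differences between consecutive elements: 4-1, 9-4, 16-9
steps <- diff(c(1, 4, 9, 16))
```

The result has one fewer element than the input: 3 5 7.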
select(col1,col2,...)
selects only the specified columns of the data.frame. Using - (negation) on a col omits it from the results.
unit(amount, unit)
sets a length with units, for use in margin() and other theme settings. For example, axis.ticks.length=unit(2,"cm") sets the tick marks on the axes to a length of 2 cm
labs(labelname=___,...)
set the label texts of the plot
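hypothesize(null = option, ...)
sets your null hypothesis using one of the options "point" or "independence"; a point null also takes the hypothesized value, such as mu = or p =. The stat is chosen later, in calculate(stat = ___).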
c(e1, e2, e3,...)
short for concatenate. creates a vector
glimpse(data.framename)
similar to str(), but prints one line per column showing its type and first few values
facet_wrap(~ variablename)
splits a plot by a certain categorical variable
str(objectname)
str is short for structure. Gives a compact summary of any given object, such as the type and first few values of a list of numbers
built in themes and ggthemes
built in: theme_gray() theme_bw() theme_classic() theme_void(); from ggthemes: theme_fivethirtyeight() and so on
TRouBLe
top right bottom left unit
tbl_df(data.framename)
turns a data.frame into a tibble (deprecated in recent versions of dplyr in favor of as_tibble())
prop.test(successnumbervector,count,conf.level=___)
used for testing the null that the proportions (probabilities of success) in several groups are the same, or that they equal certain given values.
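A minimal sketch comparing two groups, with made-up counts (45 and 55 successes out of 100 trials each):

```r
# Hypothetical counts for illustration
successes <- c(45, 55)
trials    <- c(100, 100)

res <- prop.test(successes, trials, conf.level = 0.95)

res$estimate   # sample proportions: 0.45 and 0.55
res$conf.int   # confidence interval for the difference in proportions
res$p.value    # p value for the null that the proportions are equal
```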
starts_with(string)
used with select() to pick the columns whose names start with the given string
tidy(x)
x is an object, such as a fitted model, to be converted into a tidy data.frame (from the broom package)
dbinom(x,n,p)
x is the desired number of heads, n is the number of flips, and p is the probability of heads. Returns the chance of exactly that many heads appearing
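The link between dbinom() and pbinom() can be checked directly, here for 10 flips of a fair coin (values chosen for illustration):

```r
# Probability of exactly 3 heads in 10 fair flips
exactly3 <- dbinom(3, 10, 0.5)   # 120/1024 = 0.1171875

# Probability of 3 or fewer heads in 10 fair flips
atmost3 <- pbinom(3, 10, 0.5)    # 176/1024 = 0.171875

# pbinom is the running total of dbinom from 0 up to q
total <- sum(dbinom(0:3, 10, 0.5))
```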