R Programming
NaN
Undefined mathematical operation
&&
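Logical AND for single (length-one) values; it short-circuits, so the right side is evaluated only when needed. Older versions of R silently used just the first element of a longer vector; R 4.3.0 and later raise an error instead.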
||
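Logical OR for single (length-one) values; it short-circuits, so the right side is evaluated only when needed. Older versions of R silently used just the first element of a longer vector; R 4.3.0 and later raise an error instead.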
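A minimal sketch of how the elementwise operators (`&`, `|`) differ from the scalar ones (`&&`, `||`). The vector `v` is a made-up example, and the scalar operators are only given length-one values here.

```r
v <- c(TRUE, FALSE, NA)

# Elementwise operators work on every element and return a vector
and_all <- v & TRUE    # TRUE FALSE NA
or_all  <- v | TRUE    # TRUE TRUE TRUE

# Scalar operators short-circuit: the right-hand side is never
# evaluated once the left-hand side has decided the result
short_and <- FALSE && stop("not evaluated")   # FALSE
short_or  <- TRUE  || stop("not evaluated")   # TRUE
```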
Logical operators
>=, >, <=, <, ==, !=
file.rename()
Change the name of a file
NA
Missing value
Test to see if objects are NA
is.na()
Isolate the positive elements of vector x without including NA
x[!is.na(x) & x > 0]
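To see why the `!is.na(x)` part matters, here is a small sketch with a made-up vector `x`. Without the NA test, the missing value is carried into the result.

```r
x <- c(3, -1, NA, 5, 0)

# Comparing NA with 0 yields NA, and indexing with NA returns NA
with_na <- x[x > 0]                 # 3 NA 5

# Excluding NA first keeps only the real positive values
positive <- x[!is.na(x) & x > 0]    # 3 5
```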
Create an index vector that will show the first ten elements of vector x
x[1:10]
Four different flavors of index vectors
Logical vectors, vectors of positive integers, vectors of negative integers, and vectors of character strings
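The four flavors can be compared side by side. In this sketch the named vector `x` is an invented example, and all four subsets select the same two elements.

```r
x <- c(a = 10, b = 20, c = 30, d = 40)

by_logical  <- x[c(TRUE, FALSE, TRUE, FALSE)]  # keep where TRUE
by_positive <- x[c(1, 3)]                      # keep positions 1 and 3
by_negative <- x[-c(2, 4)]                     # drop positions 2 and 4
by_name     <- x[c("a", "c")]                  # keep elements named a and c
```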
Reading in R code files (inverse of dput)
dget; the analogous function for writing data is dput
|
or
Reading in R code files (inverse of dump)
source; the analogous function for writing data is dump
Create the dimensions of my_vector to have 4 rows and 5 columns, so it's now a matrix instead of a vector.
dim(my_vector) <- c(4,5)
Create the names of the matrix "m" rows to be "a" and "b" and the names of the columns to be "c" and "d"
dimnames(m) <- list(c("a", "b"), c("c", "d"))
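A short sketch of both assignments in action. The starting values of `my_vector` and `m` are assumptions chosen for illustration.

```r
# Giving a vector a dim attribute turns it into a matrix (filled column by column)
my_vector <- 1:20
dim(my_vector) <- c(4, 5)
is_mat <- is.matrix(my_vector)     # TRUE
value_2_3 <- my_vector[2, 3]       # 10

# Row and column names allow indexing by name
m <- matrix(1:4, nrow = 2)
dimnames(m) <- list(c("a", "b"), c("c", "d"))
value_b_d <- m["b", "d"]           # 4
```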
Delete a directory named "testdir2"
unlink("testdir2", recursive = TRUE)
ls()
List all of the objects in the local workspace
header
Logical indicating if the file has a header line
nrows
The number of rows in the dataset
Atomic vector
Atomic vectors contain exactly one data type, whereas lists can contain multiple data types. Atomic vectors can be numeric, logical (e.g., TRUE, FALSE, NA), character, integer, or complex.
file.info()
Get information about the file
setwd()
Move to a different working directory (put a path in the parentheses)
rep()
Replicate
Make the column names for the my_data data frame the cnames vector
colnames(my_data) <- cnames
?
Use this to prompt the help page (e.g., ?list.files will bring up the help page for the list.files function). Note that symbols must be enclosed in backticks after the ? (e.g., ?`:`).
How to determine the classes of each column after reading in a table named initial
initial <- read.table("database.txt", nrows = 100); classes <- sapply(initial, class); tabALL <- read.table("database.txt", colClasses = classes)
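The same idea as a self-contained sketch. Since "database.txt" is not available here, the example writes a small invented table to a temporary file first; the column names and `header = TRUE` are assumptions.

```r
# Write an example table to a temporary file (stand-in for database.txt)
path <- tempfile(fileext = ".txt")
example <- data.frame(id = 1:500, score = runif(500), label = "x")
write.table(example, path, row.names = FALSE)

# Read only the first 100 rows to detect the class of each column
initial <- read.table(path, header = TRUE, nrows = 100)
classes <- sapply(initial, class)

# Read the full file, telling read.table the classes up front
tabAll <- read.table(path, header = TRUE, colClasses = classes)

unlink(path)  # remove the temporary file
```

Supplying `colClasses` spares `read.table` from guessing the type of every column, which can make reading large files noticeably faster.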
Test to see if objects are NaN
is.nan()
Reading in saved workspaces
load; the analogous function for writing data is save
Mean of column "Ozone" in data frame hw1
mean(hw1$Ozone, na.rm = TRUE) or mean(hw1$Ozone[!is.na(hw1$Ozone)])
Change the matrix my_matrix to a data frame named my_data
my_data <- data.frame(my_matrix)
Select 100 elements at random from the combined vectors y and z
my_data <- sample(c(y, z), 100)
Create a matrix called my_matrix2 containing the numbers 1-20 and dimensions of 4 rows and 5 columns
my_matrix2 <- matrix(data = 1:20, nrow = 4, ncol = 5, byrow = FALSE, dimnames = NULL)
Get the named elements of this numeric vector: vect <- c(foo = 11, bar = 2, norf = NA)
names(vect)
Assign names to the numeric vector: vect2 <- c(11, 2, NA)
names(vect2) <- c("foo", "bar", "norf")
Assign the value of the current working directory to a variable called "old.dir"
old.dir <- getwd()
Join the elements of a character vector called my_char together in one continuous character string.
paste(my_char, collapse = " ")
Create a sequence of numbers from pi to 10
pi:10 This will create 3.141593 4.141593 5.141593 6.141593 7.141593 8.141593 9.141593
Reading tabular data
read.table, read.csv; they are the same except that read.csv defaults to a comma separator and header = TRUE; the analogous functions for writing data are write.table and write.csv
Reading lines of a text file
readLines; the analogous function for writing data is writeLines
Create a vector that contains 40 zeros
rep(0, times = 40)
Create a vector to contain 10 zeros, then 10 ones, then 10 twos
rep(c(0, 1, 2), each = 10)
Create a vector that contains 10 repetitions of the vector (0, 1, 2)
rep(c(0, 1, 2), times = 10)
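The difference between `each` and `times` is easier to see with a smaller count. This sketch uses 2 repetitions instead of 10.

```r
base <- c(0, 1, 2)

# each repeats every element before moving on to the next one
by_each  <- rep(base, each = 2)    # 0 0 1 1 2 2

# times repeats the whole vector
by_times <- rep(base, times = 2)   # 0 1 2 0 1 2
```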
A vector containing 1000 draws from a standard normal distribution
rnorm(1000)
Create a vector of numbers ranging from 0 to 10, incremented by 0.5
seq(0, 10, by=0.5) This will create 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 8.5 9.0 9.5 10.0
Create a sequence of 30 numbers between 5 and 10
seq(5, 10, length.out = 30)
Number of missing values in a data frame hw1
sum(is.na(hw1))
Mean of Solar.R column values when Temp column values are > 90 and Ozone column values are > 31
ugh <- complete.cases(hw1$Temp, hw1$Ozone, hw1$Solar.R) & hw1$Temp > 90 & hw1$Ozone > 31; mean(hw1$Solar.R[ugh])
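A sketch of the same filter on a tiny data frame. The real `hw1` data is not shown in these cards, so the four rows below are invented, and the index is called `keep` here rather than `ugh`.

```r
# Invented stand-in for hw1 with the three columns used by the filter
hw1 <- data.frame(
  Ozone   = c(40, NA, 50, 20),
  Solar.R = c(200, 250, NA, 100),
  Temp    = c(95, 92, 93, 91)
)

# TRUE only for rows with no NA in the three columns that also pass both conditions
keep <- complete.cases(hw1$Temp, hw1$Ozone, hw1$Solar.R) &
  hw1$Temp > 90 & hw1$Ozone > 31

solar_mean <- mean(hw1$Solar.R[keep])   # 200 (only the first row qualifies)
```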
Reading single R objects in binary form
unserialize; the analogous function for writing data is serialize
Subset all elements of vector x except the 2nd and 10th elements
x[c(-2, -10)] or x[-c(2, 10)]
Subset the 3rd, 5th, and 7th elements of vector x
x[c(3, 5, 7)]
LETTERS
A predefined variable in R containing a character vector of all 26 letters in the English alphabet
sep
A string indicating how the columns are separated
unlink()
Delete a directory
file.remove()
Delete a file
args()
Displays the argument names and corresponding default values of a function or primitive
break
Exit the loop
print()
Explicit printing of what you put in the parentheses. You can also auto-print something by typing it and pressing Enter. Note that R prints the index of the first element shown on each output line in square brackets (e.g., [1]) before the elements themselves.
getwd()
See the current working directory
stringsAsFactors
Should character variables be coded as factors? TRUE or FALSE
return
Signals the function should exit and return a given value
next
Skip an iteration in a for loop
Inf
Special number meaning infinity
How to specify an integer
Specify L suffix to get integer (i.e., 1L gives integer 1)
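A quick check of what the `L` suffix changes. Without it, a number literal is stored as a double ("numeric").

```r
class_plain  <- class(1)     # "numeric"
class_suffix <- class(1L)    # "integer"

# The values compare equal, but the objects are not identical
# because their storage types differ
equal_values <- 1L == 1              # TRUE
same_object  <- identical(1L, 1)     # FALSE
```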
Data Frames
Store tabular data in rows and columns; each column can hold a different class of data; a data frame is a list in which every element (column) must have the same length
Matrices
Stores tabular data in rows and columns; can only contain a single class of data
isTRUE()
This is a function that will take one argument. If that argument evaluates to TRUE, the function will return TRUE.
file.create()
Create a new file in the working directory
c()
Create a vector
If I have a data frame with 1,500,000 rows and 120 columns (all numeric data), roughly how much memory will be required to store this data frame?
1,500,000 rows x 120 columns x 8 bytes/numeric = 1,440,000,000 bytes; 1,440,000,000 bytes / 2^20 bytes/MB = 1373.29 MB = 1.34 GB. The rule of thumb is that you need about twice as much memory as the data itself occupies, so you need about 1.34 x 2 = 2.68 GB.
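The arithmetic above can be reproduced in R. This is only a back-of-the-envelope sketch; the variable names are made up, and it assumes 8 bytes per numeric value with no overhead.

```r
rows <- 1500000
cols <- 120
bytes_per_numeric <- 8

total_bytes <- rows * cols * bytes_per_numeric   # 1.44e9 bytes
total_mb <- total_bytes / 2^20                   # bytes to megabytes
total_gb <- total_mb / 2^10                      # megabytes to gigabytes

round(total_mb, 2)       # 1373.29
round(total_gb, 2)       # 1.34
round(2 * total_gb, 2)   # 2.68, using the twice-the-size rule of thumb
```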
Create a sequence of numbers from 1 through 20
1:20 or seq(1,20) This will create 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Generate a sequence of integers from 1 to N, where N represents the length of the my_seq vector
1:length(my_seq) or seq(along.with = my_seq) or seq_along(my_seq)
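The three forms agree for a non-empty vector but not for an empty one, as this sketch shows. The contents of `my_seq` are assumed, and `empty` is a made-up name.

```r
my_seq <- seq(5, 10, length.out = 30)
same_result <- identical(1:length(my_seq), seq_along(my_seq))   # TRUE

# With a zero-length vector, 1:length() counts down from 1 to 0
empty <- numeric(0)
colon_form <- 1:length(empty)     # 1 0
along_form <- seq_along(empty)    # integer(0)
```

For that reason `seq_along()` is usually the safer choice in loops such as `for (i in seq_along(x))`.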
comment.char
A character string indicating the comment character; set comment.char = "" if there are no commented lines in your file
colClasses
A character vector indicating the class of each column in the dataset
Ellipses
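Arguments that come after the ellipsis (...) in a function's argument list must be named explicitly when the function is called; they cannot be matched by position or by partial name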
Order of operations
And is evaluated before or
file.exists()
Check to see if a file exists in the working directory
identical()
Check to see if two vectors are identical
cat("\014")
Clear R console
cbind()
Combine columns
dir.create()
Create a directory in the current working directory
file.path()
Create a file path
list()
Create a list
The ... argument
Indicates a variable number of arguments that are usually passed on to other functions. Note that any arguments coming after ... must be explicitly named.
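A small sketch of both points. The function `join_labels` is hypothetical and simply forwards its `...` arguments to `paste()`.

```r
# sep comes after ..., so it can only be set by name
join_labels <- function(..., sep = "-") {
  paste(..., sep = sep)
}

default_sep <- join_labels("a", "b")              # "a-b"
named_sep   <- join_labels("a", "b", sep = "+")   # "a+b"

# An unnamed "+" is swallowed by ... and treated as another label
unnamed_sep <- join_labels("a", "b", "+")         # "a-b-+"
```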
dir()
List all files in the working directory
list.files()
List all files in the working directory
Boolean values
Logical values of TRUE and FALSE in R
What is the maximum value in the "Ozone" column during month 5?
M5 <- complete.cases(hw1$Ozone) & hw1$Month == 5; max(hw1$Ozone[M5])
What is the mean of "Temp" when "Month" is equal to 6?
M6 <- complete.cases(hw1$Temp) & hw1$Month == 6; mean(hw1$Temp[M6])
file.copy()
Make a copy of a file
<-
The assignment operator will assign a value to a symbol
file
The name of a file, or a connection
skip
The number of lines to skip from the beginning
Why are dump and dput useful?
The resulting textual format is editable and, in the case of corruption, potentially recoverable. The metadata (such as the class of each column) is preserved, unlike when writing out a table.
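A minimal round trip with `dput` and `dget`, assuming an invented data frame `df` and a temporary file. It illustrates that the column classes come back unchanged.

```r
df <- data.frame(id = 1:3, name = c("a", "b", "c"), stringsAsFactors = FALSE)

# dput writes R code that can rebuild the object
path <- tempfile(fileext = ".R")
dput(df, file = path)

# dget evaluates that code and returns the rebuilt object
restored <- dget(path)
same <- isTRUE(all.equal(df, restored))        # TRUE
restored_classes <- sapply(restored, class)    # integer, character

unlink(path)  # remove the temporary file
```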
xor()
The xor() function stands for exclusive OR. If one argument evaluates to TRUE and one argument evaluates to FALSE, then this function will return TRUE; otherwise it will return FALSE. Examples: xor(TRUE, TRUE) is FALSE; xor(TRUE, FALSE) is TRUE; xor(FALSE, FALSE) is FALSE.
&
and
Find the mean of columns with a for loop
columnmean <- function(y, removeNA = TRUE) { nc <- ncol(y); means <- numeric(nc); for (i in seq_len(nc)) { means[i] <- mean(y[, i], na.rm = removeNA) }; means }
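Written out over several lines and tried on a small invented data frame, the loop gives the same result as the built-in `colMeans()`. The comparison with `colMeans()` is added here for illustration and is not part of the original card.

```r
columnmean <- function(y, removeNA = TRUE) {
  nc <- ncol(y)                 # number of columns
  means <- numeric(nc)          # pre-allocate the result vector
  for (i in seq_len(nc)) {
    means[i] <- mean(y[, i], na.rm = removeNA)
  }
  means
}

df <- data.frame(a = c(1, 2, NA), b = c(4, 5, 6))
loop_means    <- columnmean(df)                      # 1.5 5.0
builtin_means <- unname(colMeans(df, na.rm = TRUE))  # 1.5 5.0
```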
Create a directory in the current working directory called "testdir2" and a subdirectory for it called "testdir3" all in one command
dir.create(file.path('testdir2', 'testdir3'), recursive = TRUE)
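A sketch that creates and then deletes the nested directories. It works inside `tempdir()` instead of the working directory, which is an assumption made so the example leaves nothing behind.

```r
base   <- file.path(tempdir(), "testdir2")
nested <- file.path(base, "testdir3")

# recursive = TRUE creates testdir2 and testdir3 in one call
created <- dir.create(nested, recursive = TRUE)
exists_after_create <- dir.exists(nested)    # TRUE

# recursive = TRUE is also required to delete a directory with unlink()
unlink(base, recursive = TRUE)
exists_after_unlink <- dir.exists(base)      # FALSE
```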