Introduction to R
How to make a comment
Put a # sign before the comment; R will not run anything after the # on that line as code
create factors
Create a vector whose observations all belong to a limited number of categories, then pass it to factor(): factor_sex_vector <- factor(sex_vector)
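A minimal sketch of the step above. The contents of sex_vector are made up for illustration; only the names sex_vector and factor_sex_vector come from the notes.

```r
# Illustrative data (assumed values, not from the notes)
sex_vector <- c("Male", "Female", "Female", "Male", "Male")

# Convert the character vector into a factor
factor_sex_vector <- factor(sex_vector)

# By default the levels are the unique values in sorted order
print(levels(factor_sex_vector))
```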
construct a data frame
data.frame(name, type, diameter, rotation, rings), where each argument is a vector that becomes one column of the data frame
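As a rough sketch, the call above could be built from five vectors of equal length. The three planets and their values here are illustrative assumptions; the notes only give the vector names. str() is used at the end to inspect the result.

```r
# Illustrative vectors (values are assumptions for the example)
name     <- c("Mercury", "Venus", "Earth")
type     <- c("Terrestrial planet", "Terrestrial planet", "Terrestrial planet")
diameter <- c(0.382, 0.949, 1.000)   # relative to Earth
rotation <- c(58.64, -243.02, 1.00)  # relative to Earth
rings    <- c(FALSE, FALSE, FALSE)

# Each vector becomes one column; each position becomes one row
planets_df <- data.frame(name, type, diameter, rotation, rings)

# Inspect the structure: observations, variables, and column types
str(planets_df)
```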
Vectors
One-dimensional arrays that can hold numeric data, character data (needs quotes), or logical data; all elements of one vector share the same type
sorting
order() returns the positions of the elements in sorted order (which element comes first, which second, and so on). Ex: a <- c(100, 10, 1000); order(a) returns 2 1 3; a[order(a)] then gives the values sorted from smallest to largest
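The example from the card, written out as runnable code. The names idx and sorted_a are added here for illustration.

```r
a <- c(100, 10, 1000)

# Positions of the elements in ascending order of value
idx <- order(a)
print(idx)       # 2 1 3

# Indexing with those positions sorts the vector
sorted_a <- a[idx]
print(sorted_a)  # 10 100 1000

# Largest to smallest
print(a[order(a, decreasing = TRUE)])
```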
str()
Shows you the structure of your data set: total number of observations, total number of variables, a full list of the variable names, the data type of each variable, and the first few observations
sum function
sum() calculates the sum of all the elements in the vector
Creating a vector
Use the combine function c(). Ex: my_vector <- c(12, 13, 14, 15)
Modulo
%% returns the remainder of the division of the number to the left by the number to its right
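A short sketch of %% in use. The variable names are illustrative; the last line shows that in R the result takes the sign of the right-hand number.

```r
# Remainder of 5 divided by 3
r1 <- 5 %% 3
print(r1)    # 2

# A remainder of 0 means the left number is evenly divisible
r2 <- 28 %% 7
print(r2)    # 0

# With a negative left operand the result follows the sign of the divisor
r3 <- -5 %% 3
print(r3)    # 1
```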
rowSums()
calculates the total for each row in the matrix
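A minimal sketch with a made-up 2 x 3 matrix (the name m is an assumption). Note the capital S: R is case-sensitive, so the function is rowSums().

```r
# Rows are 1 2 3 and 4 5 6
m <- matrix(1:6, nrow = 2, byrow = TRUE)

# One total per row
row_totals <- rowSums(m)
print(row_totals)  # 6 15

# colSums() is the column-wise counterpart
col_totals <- colSums(m)
print(col_totals)  # 5 7 9
```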
rbind
Merges matrices and/or vectors together by row; in the example the matrices are listed ahead of the vector. Ex: rbind(matrix1, matrix2, vector1)
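A sketch of the rbind() example with small made-up inputs. The contents of matrix1, matrix2, and vector1 are assumptions; all that matters is that they have the same number of columns (or elements, for the vector).

```r
matrix1 <- matrix(1:6, nrow = 2, byrow = TRUE)  # 2 rows, 3 columns
matrix2 <- matrix(7:9, nrow = 1)                # 1 row, 3 columns
vector1 <- c(10, 11, 12)                        # treated as one row

# Stack everything on top of each other
combined <- rbind(matrix1, matrix2, vector1)
print(dim(combined))  # 4 3
print(combined)
```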
head() vs tail()
head(): shows the first observations of a data set (six by default). tail(): shows the last observations (six by default)
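A small sketch using a hypothetical ten-row data frame, scores_df, to show the default of six rows and the optional n argument.

```r
# Hypothetical data frame with 10 observations
scores_df <- data.frame(id = 1:10, score = seq(50, 95, by = 5))

# First six rows by default
first_rows <- head(scores_df)
print(first_rows)

# Ask for a specific number of rows with n
last_rows <- tail(scores_df, n = 3)
print(last_rows)
```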
selection of data frame
Works like matrix selection: my_df[row, column]. To get one whole column by name, put a $ between the data frame name and the column name: my_df$column_name
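The selection forms above, sketched on a hypothetical data frame (my_df and its columns are made up for illustration).

```r
my_df <- data.frame(name = c("a", "b", "c"), value = c(10, 20, 30))

# Single cell: row 2, column 2
cell <- my_df[2, 2]
print(cell)

# Whole row: leave the column position empty
print(my_df[2, ])

# Whole column, by name with $ or with square brackets
col_dollar  <- my_df$value
col_bracket <- my_df[, "value"]
print(col_dollar)
```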
list function
list(component 1, component 2, etc)
changing the names of the levels
Mainly for clarity. Check the current levels with levels(factor_blank_vector), then assign the new names in that same order: levels(factor_blank_vector) <- c("Female", "Male")
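A sketch of why the order matters, using a hypothetical survey_vector coded as "M" and "F". Checking levels() first shows which old level each new name will replace.

```r
# Hypothetical coded data
survey_vector <- c("M", "F", "F", "M", "M")
factor_survey_vector <- factor(survey_vector)

# Current levels are sorted: "F" comes before "M"
print(levels(factor_survey_vector))

# New names must follow the same order: "F" -> "Female", "M" -> "Male"
levels(factor_survey_vector) <- c("Female", "Male")
print(factor_survey_vector)
```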
matrix function
matrix() arguments as used in the example: the first is the collection of elements that R arranges into the rows and columns of the matrix; byrow = TRUE or FALSE says whether the matrix is filled row by row; nrow = n says the matrix should have n rows. Ex: matrix(1:9, byrow = TRUE, nrow = 3)
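To see what byrow changes, the card's example can be run next to the same call with byrow = FALSE. The names by_row and by_col are illustrative.

```r
# Filled row by row: 1 2 3 / 4 5 6 / 7 8 9
by_row <- matrix(1:9, byrow = TRUE, nrow = 3)
print(by_row)

# Filled column by column (the default): first row is 1 4 7
by_col <- matrix(1:9, byrow = FALSE, nrow = 3)
print(by_col)
```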
Data frames
Can hold multiple data types; the variables of the data set are the columns and the observations are the rows
Assigning variable example
my_var <- 4
nominal vs ordinal categorical variables
Nominal: categories without an implied order. Ordinal: categories with a natural ordering
selecting multiple elements from a vector
poker_vector[c(1, 5)] selects the 1st and 5th elements; for a consecutive range, poker_vector[1:5] selects elements 1 through 5
naming rows and columns of a matrix
rownames(my_matrix) <- row_names_vector colnames(my_matrix) <- col_names_vector
Finding out which days you made money and how much you won on those days
selection_vector <- poker_vector > 0 marks the days with a positive result; Poker_winnings_days <- poker_vector[selection_vector] keeps the amounts for those days
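The two steps above as a runnable sketch. The daily amounts in poker_vector are made-up values for illustration.

```r
# Hypothetical winnings and losses for one week
poker_vector <- c(Monday = 140, Tuesday = -50, Wednesday = 20,
                  Thursday = -120, Friday = 240)

# TRUE for days with a positive result, FALSE otherwise
selection_vector <- poker_vector > 0
print(selection_vector)

# Indexing with a logical vector keeps only the TRUE positions
Poker_winnings_days <- poker_vector[selection_vector]
print(Poker_winnings_days)
```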
Factors
Statistical data type used to store categorical variables; a categorical variable can take only a limited number of categories
summarizing a factor
summary() gives you a quick overview of the contents of the variable; for a factor, that is the number of observations in each level
ordered factors
Two added arguments: ordered and levels. Ex: factor_speed_vector <- factor(speed_vector, ordered = TRUE, levels = c("slow", "medium", "high"))
class() function
Used to check the data type of a variable beforehand
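comparing elements of an ordered factor
Da2 <- factor_speed_vector[2]; Da5 <- factor_speed_vector[5]; Da2 < Da5 returns TRUE when the level of Da2 ranks below the level of Da5 (this works only for ordered factors)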
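A sketch that builds an ordered factor and then compares two of its elements. The contents of speed_vector are assumed for illustration; the level names follow the ordered-factor card.

```r
# Hypothetical observations
speed_vector <- c("medium", "slow", "slow", "medium", "high")

# ordered = TRUE plus an explicit levels order, lowest first
factor_speed_vector <- factor(speed_vector, ordered = TRUE,
                              levels = c("slow", "medium", "high"))
print(factor_speed_vector)

# Element 2 is "slow", element 5 is "high"
Da2 <- factor_speed_vector[2]
Da5 <- factor_speed_vector[5]

# Comparison uses the level order, not the alphabet
is_slower <- Da2 < Da5
print(is_slower)
```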
Selecting one or multiple elements from a matrix
Ex: my_matrix[1, 3] selects the element in row 1, column 3. To select all elements of a row or column, leave the position before or after the comma empty: my_matrix[, 1] selects all elements of the 1st column; my_matrix[1, ] selects all elements of the 1st row
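The three selections above on a small made-up matrix (the contents of my_matrix are an assumption).

```r
# Rows are 1 2 3 / 4 5 6 / 7 8 9
my_matrix <- matrix(1:9, byrow = TRUE, nrow = 3)

# Row 1, column 3
one_element <- my_matrix[1, 3]
print(one_element)

# Whole first column
first_col <- my_matrix[, 1]
print(first_col)

# Whole first row
first_row <- my_matrix[1, ]
print(first_row)
```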
cbind
Merges matrices and/or vectors together by column; in the example the matrices are listed ahead of the vector. Ex: cbind(matrix1, matrix2, vector1)
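A minimal cbind() sketch with made-up inputs; here only one matrix and one vector are combined, and both need the same number of rows (or elements).

```r
matrix1 <- matrix(1:6, nrow = 3)  # 3 rows, 2 columns
vector1 <- c(7, 8, 9)             # treated as one column

# Place the vector to the right of the matrix
combined <- cbind(matrix1, vector1)
print(dim(combined))  # 3 3
print(combined)
```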
naming parts of list
After you have created the list: names(my_list) <- c("name1", "name2"). At creation time: my_list <- list(name1 = your_comp1, name2 = your_comp2)
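Both naming routes side by side, as a sketch. The components and the names "numbers" and "greeting" are hypothetical.

```r
# Route 1: create first, name afterwards
my_list <- list(1:3, "hello")
names(my_list) <- c("numbers", "greeting")

# Route 2: name the components while creating the list
same_list <- list(numbers = 1:3, greeting = "hello")

# Both routes give the same result
print(identical(my_list, same_list))
print(names(my_list))
```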
lists
Allow you to gather a variety of objects under one name in an ordered way; the components can be matrices, vectors, data frames, or other lists
add elements to a list
Use the c() function. Ex: ext_list <- c(my_list, my_val). 1st part: the list you want the element added to. 2nd part: the element you want added to the list
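A sketch of extending a list with c(). The list contents and the component names year and extra are made up. One caveat worth noting: a plain vector with several elements is added element by element, so wrapping it in list() keeps it as a single component.

```r
my_list <- list(name = "example", values = 1:3)

# Add one named element
ext_list <- c(my_list, year = 1980)
print(length(ext_list))  # 3
print(ext_list$year)

# Add a multi-element vector as ONE component by wrapping it in list()
ext_list2 <- c(my_list, list(extra = c(1, 2)))
print(length(ext_list2))  # 3
```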
selecting elements from a list
Can use [[ ]] or $. Ex: shining_list[[2]][1]. The 1st part, [[2]], selects the second component; the 2nd part, [1], selects the first element of that component
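A sketch of the selection above. The notes do not say what shining_list contains, so the two components here (a title and a vector of actors) are assumptions for illustration.

```r
# Hypothetical list with two named components
shining_list <- list(moviename = "The Shining",
                     actors = c("Jack Nicholson", "Shelley Duvall"))

# Second component, then its first element
first_actor <- shining_list[[2]][1]
print(first_actor)

# The same selection by name with $
same_actor <- shining_list$actors[1]
print(same_actor)
```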
Selecting elements of a vector
Use square brackets; indexing starts at 1, not 0. Ex: poker_vector[1]
what is a matrix?
Collection of elements of the same data type arranged into a fixed number of rows and columns
subset()
Ex: subset(my_df, subset = some_condition). 1st argument: the data set you want to subset. 2nd argument: the condition R uses to select the matching rows
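A sketch of subset() on a hypothetical data frame; my_df, its columns, and the condition diameter < 1 are made up for illustration. Inside the condition, column names can be used directly without my_df$.

```r
my_df <- data.frame(name = c("a", "b", "c", "d"),
                    diameter = c(0.4, 0.9, 1.0, 11.2))

# Keep only the rows where the condition is TRUE
small_df <- subset(my_df, subset = diameter < 1)
print(small_df)
```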