R


Vectors: what's the difference between a vector and a list?

1. Vector: all elements must be of the same type. A list, by contrast, can hold elements of different types.
2. Array and matrix: a matrix is a special kind of vector. It is a vector with two additional attributes: the number of rows and the number of columns.
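A minimal sketch of both points in base R. The variable names are made up for illustration.

```r
# Mixing types in c() coerces everything to one common type (character here).
mixed <- c(1, "a", TRUE)
class(mixed)        # "character"

# A matrix is a vector plus a dim attribute (rows, columns).
v <- 1:6
dim(v) <- c(2, 3)   # 2 rows, 3 columns
is.matrix(v)        # TRUE
```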

Graph, graph, graph that thing up (barplot)

> class(height)
[1] "table"
> barplot(height)
That was easy! The input was a table.

set wd, absolute or relative

An example of an absolute path: setwd("C:/Users/Username/Documents/datasets"). An example of a relative path: setwd("./datasets"). In the relative path, the . character stands for the current working directory, so in a local R session whose working directory is "C:/Users/Username/Documents" it points to the same datasets folder. In DataCamp, it likewise takes the current working directory and combines it with the datasets folder.
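A small sketch of the relative form, assuming a throwaway folder under tempdir() rather than the Windows path above, so it can run anywhere. The names base_dir, old_wd and inside are hypothetical.

```r
old_wd <- getwd()                      # remember where we started
base_dir <- tempdir()                  # stand-in for C:/Users/Username/Documents
dir.create(file.path(base_dir, "datasets"), showWarnings = FALSE)

setwd(base_dir)                        # absolute path
setwd("./datasets")                    # relative to the current working directory
inside <- basename(getwd())            # "datasets"

setwd(old_wd)                          # restore the original working directory
```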

select an entire column

For instance, planet_df$planets would select the entire planets column from the dataframe planet_df.

mtcars[3:5,]

For instance, planet_df[1,2] selects the element in the first row and the second column of the dataframe planet_df. In the same way, mtcars[3:5,] selects rows 3 through 5 with all columns, because the column position is left empty.
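A sketch of column and element selection. The contents of planet_df here are illustrative values, not the original dataset; mtcars ships with R.

```r
# Hypothetical stand-in for planet_df (illustrative values only).
planet_df <- data.frame(
  planets  = c("Mercury", "Venus", "Earth"),
  diameter = c(0.382, 0.949, 1.000),
  stringsAsFactors = FALSE
)

planet_df$planets         # the whole planets column
planet_df[1, 2]           # first row, second column: 0.382
rows_3_to_5 <- mtcars[3:5, ]   # rows 3 to 5, every column
```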

Graph 1 is right skewed, graph 2 is normally distributed, graph 3 left skewed.

Graph skew

logical (TRUE/FALSE) vectors to subselect from vectors

In the last exercise we saw larger_than_ten consisted of a vector of TRUE and FALSE. We make use of this logical vector to select elements from another vector. For instance, numeric_vector[c(TRUE, FALSE, TRUE)] will select the first and the third element from the vector numeric_vector.
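A short sketch of logical selection, with made-up values for numeric_vector:

```r
numeric_vector <- c(5, 12, 20)               # illustrative values
larger_than_ten <- numeric_vector > 10       # FALSE TRUE TRUE

picked <- numeric_vector[larger_than_ten]            # 12 20
first_third <- numeric_vector[c(TRUE, FALSE, TRUE)]  # 5 20
```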

matrix(1:9, byrow = TRUE, nrow = 3, ncol = 3)

In the matrix() function: The first argument is the collection of elements that R will arrange into the rows and columns of the matrix. Here, we use 1:9, which constructs the vector c(1, 2, 3, 4, 5, 6, 7, 8, 9). The argument byrow = TRUE indicates that the matrix is filled by rows: from left to right, and when the first row is complete, filling continues on the second row. To fill the matrix by columns instead, set byrow = FALSE. The argument nrow indicates that the matrix should have three rows. The argument ncol indicates the number of columns that the matrix should have.
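To see the effect of byrow, here is a small comparison (the names m_by_row and m_by_col are just for this sketch):

```r
m_by_row <- matrix(1:9, byrow = TRUE,  nrow = 3, ncol = 3)
m_by_col <- matrix(1:9, byrow = FALSE, nrow = 3, ncol = 3)

m_by_row[1, ]   # 1 2 3  (first row filled left to right)
m_by_col[1, ]   # 1 4 7  (columns filled top to bottom)
```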

"indexing into" vectors

Indexing entails the use of square brackets [] to select elements from a vector. For instance, numeric_vector[1] will select the first element of the vector numeric_vector. numeric_vector[c(1,3)] will select the first and the third element of the vector numeric_vector.
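As a quick sketch with made-up values (note that R counts positions from 1):

```r
numeric_vector <- c(10, 20, 30)   # illustrative values

first <- numeric_vector[1]              # 10
first_and_third <- numeric_vector[c(1, 3)]   # 10 30
```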

Install packages

Install the package ggplot2 using install.packages("ggplot2"). Load the package ggplot2 using library(ggplot2) or require(ggplot2).

cases and variables

The first value returned by dim() is the number of cases (rows) and the second value is the number of variables (columns).
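For example, on the built-in mtcars dataset (used here only as a convenient stand-in):

```r
size <- dim(mtcars)
size[1]        # 32 cases (rows)
size[2]        # 11 variables (columns)
nrow(mtcars)   # same as size[1]
ncol(mtcars)   # same as size[2]
```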

Histo vs bar chart

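This was wrong, because hist() binned the counts instead of drawing one bar per category: carb_vector = table(mtcars$carb) hist(carb_vector, main = 'Carburetors'). A histogram is for a numeric variable; for a frequency table like this one, use barplot(carb_vector, main = 'Carburetors').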
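A sketch of the two chart types side by side. The pdf(NULL) line is an assumption added so the code also runs without a display; drop it to see the plots.

```r
pdf(NULL)  # optional: draw to a null device (no window, no file)

# Bar chart: one bar per category of a frequency table.
carb_table <- table(mtcars$carb)
mids <- barplot(carb_table, main = "Carburetors")  # returns the bar midpoints

# Histogram: bins a numeric variable; hist() picks the bins itself.
h <- hist(mtcars$mpg, main = "Miles per gallon")
sum(h$counts)   # 32, one count per car

invisible(dev.off())
```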

What are the inputs to this function

args(mean)

Coerce variable type

as.*(), for example as.numeric(), as.character(), as.factor()
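A few base R coercions as a sketch:

```r
as.numeric("3.14")    # 3.14
as.character(42)      # "42"
as.integer("7")       # 7
as.logical("TRUE")    # TRUE
```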

Label the bar graph!

barplot(height, names.arg = barnames, ylab = "number of cars")
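A runnable sketch of the labelled bar plot. The contents of height and barnames are assumptions (a table of cylinder counts from mtcars and made-up labels), and pdf(NULL) is only there so it runs without a display.

```r
pdf(NULL)  # optional: draw to a null device

height <- table(mtcars$cyl)               # assumed input: a frequency table
barnames <- c("4 cyl", "6 cyl", "8 cyl")  # hypothetical labels, one per bar

mids <- barplot(height, names.arg = barnames, ylab = "number of cars")

invisible(dev.off())
```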

read in data from the interwebs!

cars = read.csv("http://s3.amazonaws.com/assets.datacamp.com/course/uva/mtcars.csv")

data type check

class(some_variable_name)

turn into factor

factor(). Factors are stored as integers, and have labels associated with these unique integers. While factors look (and often behave) like character vectors, they are actually integers under the hood, and you need to be careful when treating them like strings. Once created, factors can only contain a pre-defined set of values, known as levels.
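A sketch of the integer codes and the fixed set of levels, using a made-up sizes vector:

```r
sizes <- factor(c("low", "high", "medium", "low"),
                levels = c("low", "medium", "high"))

codes <- as.integer(sizes)   # 1 3 2 1 (position of each value in the levels)
levels(sizes)                # "low" "medium" "high"

# A value outside the levels is not accepted: it becomes NA (with a warning).
suppressWarnings(sizes[2] <- "huge")
is.na(sizes[2])              # TRUE
```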

inspect dataframe

head: by default prints the first 6 rows of the dataframe
tail: by default prints the last 6 rows to the console
str: prints the structure of your dataframe
dim: prints the dimensions, that is, the number of rows and columns of your dataframe
colnames: prints the names of the columns of your dataframe
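Tried on the built-in mtcars dataframe, as a sketch:

```r
first_rows <- head(mtcars)       # first 6 rows by default
last_rows  <- tail(mtcars, 3)    # last 3 rows when a count is given
str(mtcars)                      # structure: type and first values per column
dim(mtcars)                      # 32 11
colnames(mtcars)[1:3]            # "mpg" "cyl" "disp"
```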

help me w dat function girl

help(mean) or ?mean

specific data type binary check

is.character()
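A sketch that pairs class() with the is.*() checks; x is a made-up variable.

```r
x <- "hello"
class(x)          # "character"
is.character(x)   # TRUE
is.numeric(x)     # FALSE

class(3.5)        # "numeric"
class(2L)         # "integer"
is.numeric(2L)    # TRUE (integers count as numeric)
```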

see the levels of a factor variable

levels()

list files in wd

list.files()

list variables

ls()

select more than one column

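planet_df[1,c(2,3)] selects the second and third columns of the first row. To get both columns for every row, leave the row position empty: planet_df[,c(2,3)]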
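The same pattern on the built-in mtcars dataframe, as a sketch (mtcars stands in for planet_df here):

```r
one_row  <- mtcars[1, c(2, 3)]          # row 1, columns 2 and 3 (cyl, disp)
all_rows <- mtcars[, c(2, 3)]           # every row, columns 2 and 3
by_name  <- mtcars[, c("mpg", "hp")]    # columns can also be picked by name

dim(all_rows)    # 32 2
names(by_name)   # "mpg" "hp"
```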

read in data

read.table: reads in tabular data, such as .txt files
read.csv: reads in data from a comma-separated file
readWorksheetFromFile: reads in an Excel worksheet (XLConnect package)
read.spss: reads in data from the .sav SPSS format (foreign package)
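A self-contained sketch of the two base R readers. It writes a tiny made-up file to a temporary path first, so the file name and its contents are assumptions.

```r
# Create a small CSV file to read back (illustrative data).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,age", "Ann,31", "Bob,17"), csv_path)

# read.csv assumes a header row and comma separators.
people <- read.csv(csv_path, stringsAsFactors = FALSE)

# read.table needs those two choices spelled out.
people_tab <- read.table(csv_path, header = TRUE, sep = ",",
                         stringsAsFactors = FALSE)

dim(people)   # 2 2
```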

Frequency table... group by / sum?

table()
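For example, counting cars per number of cylinders in the built-in mtcars data (a sketch; cyl_counts is a made-up name):

```r
cyl_counts <- table(mtcars$cyl)   # one count per distinct value

names(cyl_counts)       # "4" "6" "8"
as.vector(cyl_counts)   # 11 7 14
sum(cyl_counts)         # 32, the total number of rows
```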

vector comparison

The statement c(4,5,6) > 5 returns: FALSE FALSE TRUE. In other words, R tests for every element of the vector whether the condition stated by the comparison operator is TRUE or FALSE.
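The same vector with a few other comparison operators, as a sketch:

```r
c(4, 5, 6) > 5        # FALSE FALSE TRUE
c(4, 5, 6) >= 5       # FALSE TRUE TRUE
c(4, 5, 6) == 5       # FALSE TRUE FALSE

# TRUE counts as 1, so sum() gives the number of elements that pass.
sum(c(4, 5, 6) > 5)   # 1
```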

list

Elements can have different data types. Ordered, like a Python list; each element can be anything.
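A sketch of a list holding several types at once; my_list and its contents are made up.

```r
my_list <- list(name = "Earth", moons = 1, rings = FALSE, sizes = c(1, 2, 3))

class(my_list$name)    # "character"
class(my_list$rings)   # "logical"
my_list[[2]]           # 1 (second element, by position)
length(my_list)        # 4
```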

Add calculated column to a df

yourdata$newvariable[yourdata$age > 18] <- "adult" This assigns the value "adult" to the variable newvariable, for all cases where age is greater than 18.
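A runnable sketch of that assignment. The contents of yourdata are made up, and creating the column as NA first is an assumption added here so the rows that do not match are explicitly empty.

```r
yourdata <- data.frame(age = c(12, 25, 40, 17))   # illustrative data

yourdata$newvariable <- NA_character_             # assumption: start with an empty column
yourdata$newvariable[yourdata$age > 18] <- "adult"

yourdata$newvariable   # NA "adult" "adult" NA
```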

