R Programming Part 1


Provide an example of R code that reads elements from a web page into R using the URL function to create a connection to a website and then using the readLines function to read the text from that connection.

## This might take time
con <- url("http://www.jhsph.edu", "r")
x <- readLines(con)
head(x)

Provide the output you will see when the following R code is entered into the R console prompt: x <- list(a = 1, b = 2, c = 3) x

$a
[1] 1

$b
[1] 2

$c
[1] 3

The following R code is entered to create a matrix: m <- matrix(nrow = 2, ncol = 3) attributes(m) What is the output that is produced?

$dim
[1] 2 3

List and describe the 2 different types of vectors.

1) Atomic vectors - contains exactly one data type, (Note when referring to vectors, this is the type of vector commonly referred to). 2) Lists - may contain multiple data types.

Describe the 2 steps involved in specifying a column names for a data frame using the colnames( ) function.

1) Create a vector with the column names you would like. For example: <column vector name> <- c("<column name 1>", "<column name 2>") 2) Assign that vector to the data frame's column names: colnames(<data frame name>) <- <column vector name>

Describe the 2 basic steps of creating a matrix via binding columns or binding rows.

1) Create vector objects with sequences. 2) Use the cbind( ) or rbind( ) functions to bring together 2 vectors to form a matrix.
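The two steps above can be sketched as follows. The vector names x and y and their values are illustrative assumptions, not taken from the original cards.

```r
# Step 1: create two vectors of equal length from sequences.
x <- 1:3
y <- 10:12

# Step 2a: column-binding makes each vector a column, labelled with the object name.
by_col <- cbind(x, y)

# Step 2b: row-binding makes each vector a row, labelled with the object name.
by_row <- rbind(x, y)

print(by_col)
print(by_row)
print(dim(by_col))  # rows, then columns
print(dim(by_row))
```

Printing both results shows that the vector names end up as column names with cbind( ) and as row names with rbind( ).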

When using R with larger datasets, list 5 things that are useful to know about your system.

1) How much memory is available? 2) What other applications are in use? 3) Are there other users logged into the same system? 4) What operating system? 5) Is the OS (Operating System) 32 bit or 64 bit?

Give 3 examples of the special number NaN.

1) If you take zero over zero (or 0/0) that is not a number. It is not defined so you will get a NaN back. 2) If you enter the following equation: Inf-Inf or infinity minus infinity you will get a NaN back. 3) NaN can also be thought of as a missing value.

What are the 2 types of coercion that can occur when you mix types of objects in R? Describe each.

1) Implicit coercion - occurs behind the scenes, following R's conventions. There are no error messages. 2) Explicit coercion - you coerce objects from one class to another yourself, using functions whose names usually start with as. (for example as.numeric or as.character).
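A minimal sketch of both kinds of coercion, using made-up values. The object names are illustrative only.

```r
# Implicit coercion: c() silently converts everything to the most general type.
mixed_chr <- c(1.7, "a")    # numeric + character -> character
mixed_num <- c(TRUE, 2)     # logical + numeric   -> numeric (TRUE becomes 1)
mixed_lgl <- c("a", TRUE)   # character + logical -> character

print(class(mixed_chr))
print(class(mixed_num))
print(class(mixed_lgl))

# Explicit coercion: as.* functions convert on request.
x <- 0:3
as_num <- as.numeric(x)
as_lgl <- as.logical(x)     # 0 is FALSE, any nonzero value is TRUE
as_chr <- as.character(x)

# Coercion that makes no sense yields NA plus a warning (suppressed here).
bad <- suppressWarnings(as.numeric(c("a", "b")))
print(bad)
```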

List 3 different types of special numbers.

1) Inf, which represents infinity. 2) -Inf, which represents negative infinity. 3) NaN, which represents an undefined value ("not a number").

Describe 2 ways to use the paste( ) function.

1) Join the elements of a character vector (i.e. a character vector of length more than 1) together into one continuous character string (i.e. a character vector of length 1), using the collapse argument. 2) Join the elements of multiple character vectors element by element, using the sep argument. The result has length 1 when every input has length 1 (or when collapse is also given).
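Both uses of paste( ) can be sketched as below. The strings are made up for illustration.

```r
# Use 1: collapse one character vector into a single string.
words <- c("My", "name", "is")
one_string <- paste(words, collapse = " ")
print(one_string)

# Use 2: join several vectors; sep goes between the pieces.
greeting <- paste("Hello", "world!", sep = " ")
print(greeting)

# With longer inputs, paste() works element by element
# (numbers are coerced to character first).
labels <- paste(1:3, c("X", "Y", "Z"), sep = "")
print(labels)
```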

List 4 possible operating systems you could be using in R programming.

1) Mac 2) Windows 3) Linux 4) Unix

List 3 things (or optimizations) you can do to make your life easier and prevent R from choking when dealing with much larger datasets.

1) Read the help page for read.table, which contains many hints. 2) Make a rough calculation of the memory required to store your dataset. If the dataset is larger than the amount of RAM on your computer, you can probably stop right here. 3) Set comment.char = "" if there are no commented lines in your file.

What 3 things will R automatically do if you use the read.table function without specifying any arguments other than the file name?

1) Skip lines that begin with a #. 2) Figure out how many rows there are (and how much memory needs to be allocated). 3) Figure out what type of variable is in each column of the table. Telling R all these things directly makes R run faster and more efficiently. Note: the read.csv function works the same way as read.table; the most visible difference is that its default separator is a comma.

List the 2 different types of tabular formats.

1) Text files 2) CSV files (or comma separated values file)

List 3 advantages of textual formats.

1) Textual formats can work much better with version control programs like subversion or git which can only track changes meaningfully in text files. 2) Textual formats can be longer-lived; if there is corruption somewhere in the file, it can be easier to fix the problem. 3) Textual formats adhere to the "Unix philosophy".

Describe the 2 basic arguments of the vector function.

1) The class of the object, so the type of object that you want to have in the vector. 2) The length of the vector itself.

Describe 2 ways that the reading functions of read.table and read.csv are different.

1) The default separator for the read.csv function is the comma, whereas the default separator for read.table is whitespace (spaces, tabs or newlines). 2) The function read.csv sets the header argument to TRUE by default. Recall that the header argument tells the function whether the first line contains the variable names or whether it contains data right away.

What 2 things does the dim( ) function do?

1) The dim( ) function allows you to get the dimension of an object. 2) The dim( ) function allows you to set the dimension attribute for an R object.

List 3 different operators that you can use to extract subsets of R objects.

1) The single square bracket - [ 2) The double square bracket - [[ 3) The dollar sign - $
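A short sketch of how the three operators differ on a list. The list x and its element names are illustrative assumptions.

```r
x <- list(foo = 1:4, bar = 0.6)

# Single bracket: the result has the same class as the original (a list here).
sub_list <- x["foo"]
print(class(sub_list))

# Double bracket: extracts a single element itself.
foo_values <- x[["foo"]]
print(foo_values)

# Dollar sign: extracts a single element by name.
bar_value <- x$bar
print(bar_value)
```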

Outside of retaining metadata, list 2 advantages of using the textual format of the output functions of dput and dump.

1) The textual format is editable. 2) In the case of corruption, potentially recoverable.

What are the 2 types of factors used in R?

1) Unordered 2) Ordered

Describe 2 noticeable differences in the auto print output of a factor object when compared to a character vector object.

1) When you auto print the factor object, the output content is not in double quotes as you would see for the auto print of a character vector object. This is despite the input into a factor function being a character vector. 2) When you auto print the factor object, you will see a separate attribute called levels and a print out of the possible values of the factor.

List 5 things you definitely do not want to do when posting to a forum.

1) Do not claim that you have found a bug. 2) Do not grovel as a substitute for doing your homework; it is not well regarded. 3) Do not post homework questions on the mailing list or forums. 4) Do not ever email multiple mailing lists at once. 5) Do not ask others to debug your code without giving some sort of hint as to what the problem might be.

List 4 examples of why you would want to subset a vector.

1) You may only be interested in the first 20 elements of a vector. 2) You may only be interested in the elements that are not NA. 3) You may only be interested in the elements that are positive. 4) You may only be interested in the elements that correspond to a specific variable of interest

List the 5 basic types or atomic classes of objects used in R.

1) character 2) numeric (real numbers) 3) integer 4) complex 5) logical (True/False)

What are the 2 main functions for writing out data?

1) dput 2) dump

List 8 arguments used for the read.table function.

1) file 2) header 3) sep 4) colClasses 5) nrows 6) comment.char 7) skip 8) stringsAsFactors

List 4 connection interfaces used in R.

1) file, opens a connection to a file 2) gzfile, opens a connection to a file compressed with gzip 3) bzfile, opens a connection to a file compressed with bzip2 4) url, opens a connection to a webpage
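As a rough illustration of the file interface, the sketch below writes a small temporary file and reads it back in two steps through an open connection. The file contents and variable names are invented for the example.

```r
# Create a small text file to read from (temporary, illustrative content).
path <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line", "third line"), path)

# Open a read connection, read part of the file, then the remainder.
con <- file(path, "r")
first_two <- readLines(con, n = 2)
rest <- readLines(con)
close(con)

print(first_two)
print(rest)
```

Because the connection stays open between the two calls, the second readLines call continues where the first one stopped. gzfile, bzfile and url are called in the same way.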

Name 2 version control programs.

1) git 2) subversion

List the 4 different types of index vectors.

1) logical vectors 2) vectors of positive integers 3) vectors of negative integers 4) vectors of character strings
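The four kinds of index vector can be compared on one small named vector. The values and names are made up; each of the four expressions selects the same two elements.

```r
x <- c(a = 5, b = NA, c = -2, d = 8)

# 1) Logical vector: keep elements that are not NA and positive.
by_logical <- x[!is.na(x) & x > 0]

# 2) Positive integers: keep the 1st and 4th elements.
by_positive <- x[c(1, 4)]

# 3) Negative integers: drop the 2nd and 3rd elements.
by_negative <- x[-c(2, 3)]

# 4) Character strings: select by element name.
by_name <- x[c("a", "d")]

print(by_logical)
print(by_positive)
print(by_negative)
print(by_name)
```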

Give 2 examples of R objects that have a dimensions attribute.

1) matrix 2) array

List 5 different attributes an R object can have.

1) names, dimnames 2) dimensions 3) class 4) length 5) other user-defined attributes/metadata

List 4 different column types that could be used for a data frame.

1) numeric 2) factor 3) integer 4) logical

List 7 principal functions that can be used for reading data into R.

1) read.table 2) read.csv 3) readLines 4) source 5) dget 6) load 7) unserialize

List 6 analogous (to read functions) writing data functions for writing data to files.

1) write.table 2) writeLines 3) dump 4) dput 5) save 6) serialize

By the convention in R, when coercion occurs between the mixed types of logical and numeric, the logical type of true is represented as _______ and the logical type of false is represented as _______.

1, 0

True or False. The colClasses argument is required for the read.table function.

False. It is not required.

Provide an example of an R command that is a more direct way of creating a matrix rather than using the dim( ) function.

<matrix name> <- matrix(1:10, 2, 5) Here a matrix with <matrix name> is created with 2 rows and 5 columns with contents of 1 through 10.

What is the input into a factor function?

A character vector.

Describe a situation where you may need to use a connection to read the file.

A connection can be useful if you want to read parts of a file rather than the whole file.

What is a csv file?

A csv file is a comma separated value file. It is usually something that you get from a spreadsheet program, like Microsoft Excel or something similar to that. So csv is a very common format that most spreadsheet types of programs will understand. The read.csv function is used in R for reading this type of file.

What is a data frame?

A data frame is a key data type used in R and it is used to store tabular data.

What does double precision real numbers mean?

A double-precision floating-point number is a 64-bit approximation of a real number. The number can be zero or can range from about -1.79769E+308 to -2.225E-308, or from about 2.225E-308 to 1.79769E+308, with a precision of at least 15 decimal digits. A double-precision constant is an approximation of a real number.

What is a factor?

A factor is a special type of vector, which is used to store categorical data.

Describe how factors are represented by R.

A factor is really an integer vector with a levels attribute (like no and yes).
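The integer representation can be made visible with unclass( ). This is a small sketch with invented yes/no data.

```r
f <- factor(c("yes", "yes", "no", "yes", "no"))

# Auto-printing shows the values without quotes plus the levels.
print(f)

# By default the levels are sorted alphabetically.
print(levels(f))

# Stripping the class reveals the underlying integer codes
# together with the levels attribute.
codes <- unclass(f)
print(codes)

# table() counts how many observations fall in each level.
print(table(f))
```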

Contrast a list with a vector.

A list is like a vector except that every element of a list can be an object of a different class and so that makes lists very, very handy for carrying around different types of data. Lists are very useful in R and they are commonly used in conjunction with other types of functions.

Describe a list as it applies to R.

A list is represented as a vector. It is a sequence of objects, and each element of that vector can be an object of a different class. For example, you can have a list that holds a character, a numeric, and a logical element.

Describe the attribute of dimension.

A matrix has dimensions: a number of rows and a number of columns. A multidimensional array has more than two dimensions.

Multiple objects can be deparsed using the _______ function and read back in using _______ function.

dump, source

What is the most basic object used in R?

A vector.

What does a version control program do?

A version control program tracks changes between documents.

Describe a 3rd option for creating matrices in R other than using the matrix( ) function or by creating a dimension attribute on a vector.

Another way to create a matrix is by binding columns or binding rows. Matrices can be created by column-binding or row-binding with the functions: cbind( ) and rbind( ).

Describe how you can view the names of elements in a vector.

Apply the names( ) function to the vector.

Describe how you can view all aspects of a matrix.

Applying the attributes( ) function to the matrix object returns a list of its attributes. The first element, $dim, is the dimension attribute: an integer vector in which the first number is the number of rows and the second number is the number of columns of the matrix.

Describe a rough calculation of required memory to store a data frame in computer memory before you read in a table into R using either the read.table or read.csv functions.

Assume a data frame with 1,500,000 rows and 120 columns, all of which are numeric data. 1,500,000 × 120 × 8 bytes/numeric = 1,440,000,000 bytes = 1,440,000,000 / 2^20 bytes/MB = 1,373.29 MB = 1.34 GB
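The same arithmetic can be reproduced in R. The variable names are illustrative; the figures are the ones from the card above.

```r
rows <- 1500000
cols <- 120
bytes_per_numeric <- 8   # a double-precision number takes 8 bytes

total_bytes <- rows * cols * bytes_per_numeric
total_mb <- total_bytes / 2^20   # bytes per MB
total_gb <- total_mb / 2^10      # MB per GB

cat(sprintf("%.0f bytes = %.2f MB = %.2f GB\n", total_bytes, total_mb, total_gb))
```

This counts only the raw numeric storage, so the real requirement while reading the file is higher.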

Sometimes when you evaluate an expression into R console, nothing happens. Why is that?

Because there is nothing to show. For example, an assignment such as x <- 1 stores the value but does not print it.

Explain what happens to classes or types of objects when you use the data.frame( ) function to create a data frame.

Behind the scenes, the data.frame( ) function takes any number of arguments and returns a single object of class `data.frame` that is composed of the original objects and their respective classes. For example, the data frame can store a character vector of names alongside a matrix of numbers.

What is a big advantage of being able to name R objects?

Being able to give names to R objects is very useful for writing readable code and self-describing objects.

Explain what matrices and data frames have in common.

Both represent 'rectangular' data types, meaning that they are used to store tabular data, with rows and columns.

What do the load and unserialize functions have in common?

Both the load and unserialize functions are for reading binary objects into R.

What is the difference between the text formats of functions dput and dump when compared to the text format of tabular data?

Both the outputs of the dput and dump functions are text formats, but they are not really formatted in a way that is like a table because they contain a little bit more in the form of meta-data.

What is the default setting when a row or a column of elements of a matrix is retrieved or subsetted from a matrix?

By default, when a row or column of elements of a matrix is retrieved, it is returned as a vector rather than a matrix.

What is the default setting when a single element of a matrix is retrieved or subsetted from a matrix?

By default, when a single element of a matrix is retrieved, it is returned as a vector of length 1 rather than a 1 × 1 matrix.
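Both defaults, and how to switch them off with drop = FALSE, can be sketched on a small matrix. The matrix m is an illustrative example.

```r
m <- matrix(1:6, nrow = 2, ncol = 3)

# Single element: returned as a vector of length 1 by default.
elem <- m[1, 2]
print(elem)

# drop = FALSE keeps the result as a 1 x 1 matrix.
elem_mat <- m[1, 2, drop = FALSE]
print(dim(elem_mat))

# Whole row: returned as a plain vector by default.
row_vec <- m[1, ]
print(row_vec)

# drop = FALSE keeps the result as a 1 x 3 matrix.
row_mat <- m[1, , drop = FALSE]
print(dim(row_mat))
```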

How can you can find out what your working directory is currently set to be in R?

By using the getwd function. Type the following into the R console window: getwd( )

What is the most common connection interface in R?

Connections made to files are most common in R.

True or False. An NA value is also NaN but the converse is not true.

False. It is the opposite. A NaN value is also NA but the converse is not true. So an NA value is not necessarily a NaN value.
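The relationship can be checked with is.na( ) and is.nan( ). This is a small sketch on a made-up vector.

```r
x <- c(1, 2, NA, NaN, 5)

# is.na() is TRUE for both NA and NaN.
na_flags <- is.na(x)
print(na_flags)

# is.nan() is TRUE only for NaN.
nan_flags <- is.nan(x)
print(nan_flags)
```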

Describe data frames.

Data frames are basically represented as a special type of list, where every element of that list has the same length. Each column of the data frame is an element of the list, and in order to be a table, every column has to have the same length. However, each column doesn't have to be the same type. Each element of the list can be thought of as a column and the length of each element of the list is the number of rows.
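A minimal sketch of the list-like nature of a data frame. The column names id and flag are invented for the example.

```r
df <- data.frame(id = 1:4, flag = c(TRUE, TRUE, FALSE, FALSE))

# A data frame is a list under the hood.
print(is.list(df))

# Each list element is a column; each column has the same length (the row count).
print(nrow(df))
print(ncol(df))
print(lengths(df))

# Columns may have different classes.
col_classes <- sapply(df, class)
print(col_classes)
```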

Describe an alternative way to create a data frame other than read.table( ) or read.csv( ).

Data frames can be created using the data.frame( ) function.

How is a data frame similar to a list?

Data frames can store different classes of objects in each column (just like lists).

How is the amount of memory available determined?

Determine the amount of physical RAM (Random Access Memory) in the specification of your computer. These days most computers have anywhere from a few gigabytes up to many gigabytes of physical RAM.

Provide an example of creating a matrix by creating a dimension attribute on a vector.

Enter the following R code at the prompt of the R console:
m <- 1:10           ## creates a vector
dim(m) <- c(2, 5)   ## apply dimension attribute to vector
m                   ## view resulting matrix

For a given function used in R, how can you quickly find out all the arguments so that you can use that function properly?

Enter the following command at the prompt in the R console: str(<function name>) For example enter the following to see all the arguments for the file function in R: str(file)

Describe the attribute of length.

Every object also has the attribute of length. For a vector, the length of the object is just the number of elements in the vector.

Provide two examples of R code where you use the single bracket operator to return everything but the 2nd and 10th elements from a vector with 100 elements.

Example 1: x[c(-2, -10)] Example 2: x[-c(2, 10)]

Why are factors with labels generally better than using simple integer vectors?

Factors with labels are generally speaking better than using simple integer vectors because the factors are self describing. For example, having a variable that has values male and female is more descriptive than having a variable that just has ones and twos.

True or False. An auto print of a list is the same as that of a vector.

False. An auto print of a list does not print out like a vector because every element is different.

True or False. The paste( ) function can only be used to join elements of a character vector or multiple character vectors.

False. For example, you can use the paste( ) function to join elements of a numeric vector with elements of a character vector.

True or False. For subsetting a list, only the double bracket or dollar sign operators can be used.

False. For subsetting a list, 3 operators can be used: double bracket operator, single bracket operator or dollar sign operator.

True or False. When the matrix( ) function is used to create a matrix, the matrix is not full.

False. If you auto print the object of the matrix you will see the matrix is full and is initialized with NA values.

True or False. Matrices in R are constructed row-wise in R.

False. Matrices are constructed column-wise, so entries can be thought of starting in the "upper left" corner and running down the columns.

True or False. The missing value of NA in an object cannot have a class.

False. NA values can have a class also. There are integer NA, character NA and numeric NA values. So even though it may look like all NAs are the same, the NAs can potentially have different classes.

True or False. Every object in R has attributes.

False. Not every object in R necessarily has attributes, but attributes are a part of an object in R.

True or False. Numbers in R are generally treated as integer objects. So, even if you are looking at a number that is like 1 or 2, R thinks of those numbers as integer objects.

False. Numbers in R are generally treated as numeric objects (i.e. double precision real numbers). So, even if you are looking at a number like 1 or 2, R thinks of those numbers as numeric objects, not as integer objects. There is a way to explicitly say you want an integer.

True or False. Ordered factors in R have numerical order.

False. Ordered factors represent things that are ranked. Ordered factors have an order but the order is not numerical.

True or False. A successful posting of a question to a mailing list or forum includes the maximum amount of information.

False. Provide the minimum amount of information necessary, not the maximum. Posts that include lots of output are not very helpful, because volume doesn't help in diagnosing the problem. You need to narrow down exactly where the problem is.

True or False. R objects can only be identified by single character variables.

False. R objects can have names with multiple characters.

True or False. Setting the nrows argument makes R run faster when working with a large dataset.

False. Setting the nrows argument doesn't necessarily make R run faster, but it helps with memory usage.

True or False. The dput function will take all types of R objects to create some R code that will essentially reconstruct the object in R.

False. The dput function will take most types of R objects except for some more exotic ones to create some R code that will essentially reconstruct the object in R.

True or False. The single bracket operator can be used to select only one element of an object.

False. The single bracket operator can be used to select more than one element of an object.

True or False. All vectors in R must contain a single type of object.

False. There is an exception. A list is one type of vector that can have multiple different types of classes.

True or False. The dimnames and dimension attributes are the same thing.

False. They are different attributes.

True or False. A standard vector in R can contain multiple types of objects or classes.

False. You cannot have a standard vector with mixed types of objects. For example, you cannot have a vector of characters and numerics, or numerics and integers, or integers and logicals in it. Everything in a vector has to be the same class.

How can you identify compressed files?

Files that are compressed with gzip usually have a gz extension and files compressed with bzip2 usually have a bz2 extension.

What does the following expression mean if typed into the R console and then the object x is printed: x <- 1:20

It creates an object called x that holds the sequence 1 to 20; the colon operator is used to create a sequence. Here, 1:20 creates the sequence one, two, three, all the way up to 20. This is an integer vector. When you auto print x, you will see a long vector that may wrap onto more than one line. Each printed line starts with an index in brackets giving the position of the first element on that line. The first line starts with [1] because it begins with the first element. If the second line begins with the 16th element of the vector, it starts with [16]; where the line wraps depends on the width of your console.

What is the result of the following expression if typed into the R console: x <-

The expression x followed by the assignment operator and nothing else is not a complete expression. When you hit Enter, nothing is evaluated; R waits (typically showing a + continuation prompt) for the expression to be completed.

Describe what is happening when you enter the following into R console at the prompt: x <- c(0.5, 0.6)

Here an object is created called x by concatenating 0.5 and 0.6 and that will give you a numeric vector of length 2 where the first element is 0.5 and the second element is 0.6.

Describe what is happening when you enter the following into R console at the prompt: x <- c(TRUE, FALSE)

Here an object is created called x by concatenating TRUE and FALSE and that will give you a logical vector of length 2 where the first element is TRUE and the second element is FALSE.

Say you have used an assignment operator to assign a value of 1 to x. In the print(x) expression, what is the x considered to be?

Here the x is considered to be an R object that is also a numeric object and has one element.

Describe using a logical index with letters to subset and create a logical vector.

Here you create a logical vector which is given a name, and it is a true/false vector. The true/false vector tells you which elements of the character vector are greater than or less than a given character per the logical index.

Say you have used an assignment operator to assign a value of 1 to x. What is the meaning of the following expression: print(x)

Here, print(x) is a function call. The symbol x is being passed to the print function, so that when you print out x you get the value of x, which is 1.

Describe how meta-data can get lost and how to recover it or prevent meta-data from getting lost.

If metadata are not carried with the data set itself, they can get lost when the data are transferred somewhere else or when you no longer remember where the metadata are. For example, metadata such as the classes of the different columns can get lost, and then you have to reconstruct them from scratch. Using the dput and dump functions prevents the loss of metadata, since the metadata are included in the output of these functions.

Describe what happens when you mix different types of objects in creating a vector. Will you get an error?

If you create a vector and you mix two different types of objects, R will create the least common denominator type vector. R will not give you an error but what it will do is coerce the vector to be the class that is the least common denominator. This will happen behind the scenes.

Explain why specifying the colClasses argument is a good idea when working with large datasets.

If you don't specify it, then what R does by default is go through every column of your dataset and try to figure out what type of data it is. That is fine when the dataset is small to moderate. But reading each of these columns and trying to figure out what type of data it is takes time and memory, and it can generally slow things down. If you can tell R what type of data is in each column, then R doesn't have to spend the time to figure it out on its own, so it will generally make read.table run a lot faster and save you a lot of time.

What is a common courtesy extended to readers looking for solutions on mailing lists and forums?

If you find the solution later on, it is useful for everyone else in the community if you follow up with the solution and explain what the problem was and how you solved it.

What precaution must be taken when you use the data.matrix( ) function to convert a data frame to a matrix?

If you have a data frame that has many different types of objects, and then if you coerce that into a matrix, it is going to force each object to be coerced so that they are all the same type. So you may get something that is not exactly as you expected.

Explain why it can be convenient to have all columns have the same class when working with a large dataset. How can you take advantage of this convenience?

If you have only a few columns in your dataset, then you can usually just state what the classes are. But if all the columns are the same class, for example if all the columns are numeric, you can just set colClasses = "numeric". If you give colClasses a single value, read.table assumes that every column has that class. So if you just say "numeric", it will assume that every column is numeric. This can make read.table run much faster.
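A minimal sketch of both ways of supplying column classes to read.table. The temporary file and its columns a and b are invented so that the example is self-contained. The second approach, inferring the classes from the first few rows and reusing them, is a common idiom rather than something stated on the card.

```r
# Write a tiny whitespace-separated table to a temporary file (illustrative data).
path <- tempfile(fileext = ".txt")
write.table(data.frame(a = c(1.5, 2.5, 3.5), b = c(10, 20, 30)),
            file = path, row.names = FALSE)

# Option 1: every column has the same class, so one value is recycled.
all_numeric <- read.table(path, header = TRUE, colClasses = "numeric")
print(sapply(all_numeric, class))

# Option 2: read only the first rows, let R infer the classes once,
# then reuse them for the full read.
initial <- read.table(path, header = TRUE, nrows = 2)
classes <- sapply(initial, class)
full <- read.table(path, header = TRUE, colClasses = classes)
print(classes)
print(nrow(full))
```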

Explain how the nrows argument for the read.table function can be helpful when working with a large dataset.

If you have a huge data set, you can read in just the first 100 or first 1,000 rows by specifying the nrows argument.

How do you explicitly say you want an integer object for numbers in R?

If you just enter the number 1 in R, that gives you a numeric object. But entering 1 with a capital L next to it explicitly gives you an integer object.
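The difference is easy to confirm with class( ). The names a and b are illustrative.

```r
a <- 1    # plain number: stored as a numeric (double)
b <- 1L   # L suffix: stored as an integer

print(class(a))
print(class(b))

# Sequences built with the colon operator are integers as well.
print(class(1:3))
```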

How is inf or infinity mathematically expressed in R?

If you take one and divide it by zero (or 1/0), you will get infinity and if you take 1 and divide it by infinity (or 1/infinity) you will get zero.

Explain what a baseline level is and how it relates to a factor.

In modeling functions and when you include a factor variable it is important to know what is the baseline level. The baseline level is just the first level in the factor, and the way this is determined by R is critical.
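A short sketch of how the baseline level is determined and how it can be set. It assumes invented yes/no data and uses the levels argument of factor( ).

```r
answers <- c("yes", "yes", "no", "yes", "no")

# Default: levels are sorted alphabetically, so "no" is the baseline.
f_default <- factor(answers)
print(levels(f_default))

# Explicit order: the first entry of levels becomes the baseline.
f_explicit <- factor(answers, levels = c("yes", "no"))
print(levels(f_explicit))
```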

If you have a question about a function you can use the ? symbol. But what if you have a question about an operator like the colon?

In the case of an operator like the colon, you must enclose the symbol in backticks like this: ?`:`. (NOTE: The backtick (`) key is generally located in the top left corner of a keyboard, above the Tab key. If you don't have a backtick key, you can use regular quotes.)

What is one noticeable difference in the output of creating a matrix for column-binding or row-binding versus the other options of using the matrix( ) function or by creating a dimension attribute on a vector.

In the output, the respective vector object names are applied to the rows using rbind( ) or the respective vector object names are applied to the columns using cbind( ) rather than an index number that is used in the other options to create a matrix.

Explain what is happening when you type the following expression into the R console: msg <- "hello"

In this expression you are creating a new symbol msg. This symbol is assigned a value of the string hello. This is a character vector. And the first element of this character vector is the string hello. You can add other elements to this vector if you want to, but they would all have to be characters.

Explain the meaning of the following R expression: x <- 1

In this expression, the symbol that is created is called x, and the value that is assigned to it is 1. The assignment operator is used to do this. So "x <- 1" is an R expression that makes x equal to 1.

Give an example of an ordered factor.

In universities, you have categories of educators that have an order: assistant professors, associate professors, and full professors.

Why would you want to know what other applications are in use?

It can be helpful to know whether there are other applications running on your computer that are eating up processor time or memory. If you are on a multi-user system and other users are logged in, they could also be using up some of the resources on your computer.

How does the combination of using the nrows argument and determining the classes of each column help in managing a large dataset?

It doesn't necessarily make R run any faster, but it does help with memory usage. You tell R how many rows are going to be read in, so it can calculate the memory that will be required up front instead of having to figure it out on the go. Even if you mildly overestimate how many rows there are in the data set, that is okay, because R will still read the correct number of rows.

Explain why making a rough calculation of the memory required to store your dataset is a good idea when working with large datasets.

It is a good idea to make a very rough calculation of how much memory you need to store the data set you are about to read. This way you can get a sense of if there is enough memory on your computer to store the data set because R will have to store your entire dataset in memory unless you do otherwise. When you call read.table or read.csv it is reading your entire dataset into the RAM of your computer. It is important to know, roughly speaking, how much RAM the dataset will require.

Why is asking others to debug your code without giving the readers some sort of hint as to what the problem might be on the mailing list or forums a bad idea?

It is very difficult when a person posts a long listing of code and says "there is a problem in here somewhere, I don't know where, please help." It is better to specify where you think the problem is and what you are trying to do, so that everyone can save some time.

What are the default values for the following command that creates a vector: x <- vector("numeric", length = 10)

It will initialize the vector with the default value for its type. For a numeric vector the default value is zero, so x holds ten zeros.

Describe how the dump function can be used to reconstruct multiple R objects in R.

Like the dput function (except that it handles multiple R objects rather than a single R object), the dump function takes multiple R objects and creates R code that will essentially reconstruct those objects in R. You tell dump which objects to write by passing a character vector that contains the names of the objects.
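A minimal round trip with dump and source, sketched under a few assumptions: the objects x and y and the temporary file path are invented, and local = TRUE is used so that source recreates the objects in the current environment.

```r
x <- "foo"
y <- data.frame(a = 1, b = "a")

# Write R code that recreates both objects; they are named in a character vector.
path <- tempfile(fileext = ".R")
dump(c("x", "y"), file = path)

# Remove the objects, then rebuild them by running the dumped code.
rm(x, y)
source(path, local = TRUE)

print(x)
print(y)
```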

Give an example of an unordered factor.

Male and Female.

Explain how indexing in R is different from many other programming languages.

Many programming languages use what is called 'zero-based indexing', which means that the first element of a vector is considered element 0. R uses 'one-based indexing', which means the first element of a vector is considered element 1.

Describe the manner in which matrices are constructed.

Matrices are constructed column-wise. You can think of the matrix as taking a vector and inserting its numbers by column: the first column is filled from top to bottom, and once the maximum number of rows is reached the second column is filled, then the third column, and so forth.

Define matrices in R.

Matrices are vectors with a dimension attribute.

How are matrices different from data frames? How is this similar to the difference between vectors and lists?

Matrices like vectors can only contain a single class of data, while data frames like lists can consist of many different classes of data.

Explain how missing values should be handled.

Missing values play an important role in statistics and data analysis. Often, missing values must not be ignored, but rather they should be carefully studied to see if there's an underlying pattern or cause for their missingness.

What is the missing value of NA used for in R?

NA is the general-purpose missing value. It is used for any missing value that is not the result of an undefined mathematical operation (those are NaN).

Any operation involving NA generally yields _______ as the result. For example if you multiply NA with 3, you will get _______.

NA, NA

Missing values or 'not available' values in R are denoted by either _______ or _______.

NA, NaN

Provide the output you will see when the following R code is entered into the R console prompt: x <- 1:3 names(x)

NULL

What is the missing value of NaN used for in R?

NaN is used for undefined mathematical operations.

Describe the special number of NaN.

NaN represents an undefined value so you can name it as "not a number."

If all types of data are not tabular why are data frames considered to be important in R?

Not all types of data are tabular, but so much data comes in tabular form that data frames are very important in R.

What is the treatment of numbers in R?

Numbers in R are generally treated as what are called numeric objects and all numbers are treated as double precision real numbers.

Why would you want to know whether you are running a 32-bit or 64-bit operating system?

On a 64-bit system, you will generally be able to access more memory if the computer has a lot more memory.

What is the downside of the textual format?

One downside of textual formats is that they tend not to be space efficient, so they tend to take up a lot of space, and often need to be compressed.

Describe one key difference between the dollar sign operator and the double bracket operator.

One key difference is that the double bracket operator can index into a list with a computed index (for example, a variable that holds the element's name), whereas the dollar sign can only be used with a literal name.

Describe ordered factors in R.

Ordered factors have labels that are categorical and have ranking.

Why is a row name a useful attribute about a data frame?

Row names are useful for annotating the data. So for example, each row in a data frame could represent a subject enrolled in a study, and then the row names would be the subject ID.

What row name is commonly used if row names are not interesting?

Row names of 1, 2, 3, ... etc.

Describe how to use the matrix( ) function to create a matrix object and the numbered contents. Give an example.

Sample R code: m <- matrix(1:6, 2, 3) Here 1:6 specifies the numbered contents 1 through 6 of matrix m. The matrix m has 2 rows and 3 columns. The contents are filled in by going down each column, and once one column is completed, advancing to the next column.
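
The column-wise filling can be seen by printing the matrix. This sketch names the nrow and ncol arguments explicitly for readability:

```r
# Values 1..6 fill column 1 first, then column 2, then column 3
m <- matrix(1:6, nrow = 2, ncol = 3)
m
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
dim(m)
# [1] 2 3
```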

How much of an improvement over the default can you expect in the performance of the read.table function of R for large datasets by using the colClasses argument to specify the class of each column in a data frame?

Specifying this option instead of using the default can make 'read.table' run MUCH faster, often twice as fast.

Give some examples of the single square bracket or [ operator always returning an object of the same class as the original.

Subset a vector using single square brackets and you are going to get back a vector. If you subset a list using single square brackets you are going to get back a list. Any time you use the single bracket operator to subset an object, you will get the same object of the same class back.
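
A small sketch of this rule, using an illustrative vector v and list l (both names are assumptions for the example); the double bracket is included for contrast:

```r
v <- c("a", "b", "c")
l <- list(foo = 1:4, bar = 0.6)

class(v[1])    # "character": subsetting a character vector gives a character vector
class(l[1])    # "list": single bracket on a list gives back a list
class(l[[1]])  # "integer": double bracket returns the element itself
```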

Logical vectors can contain the values _______, ________, and _______ (for 'not available'). These values are generated as the result of ____________.

TRUE, FALSE, NA, logical 'conditions'

What is the "Unix philosophy"?

The "Unix philosophy" is to store all kinds of data in text.

What is the default comment character used in R?

The # symbol.

You enter code in the R console window and it auto prints the value of the object and you see the following output: [1] 5 What does it mean?

The 1 tells you what element of the vector is being shown. Here the number 5 is the first element of the vector.

What is the R engine? Under what conditions does it work?

The R engine is the interpreter for the R code you type into the R console. Once you have typed a syntactically valid and complete expression at the prompt and hit enter, the expression is evaluated by the R engine, and the result of that evaluation is returned.

Explain how to get a negation of a logical expression.

The `!` gives you the negation of a logical expression, so !is.na(x) can be read as 'is not NA'.

Describe the advantage of the meta-data included with the text format of dput and dump functions.

The advantage of using the meta-data type of mechanism to store data or to read data is that even though it is still a textual format you don't have to specify it (like the class of each column of the data frame) every single time you read it in.

What does the assignment operator or <- expression do?

The assignment operator is what assigns a value to a symbol.

Describe the attributes function.

The attributes ( ) function allows you to set or modify the attributes for an R object.

How does R determine the baseline level (or first level) for a factor variable?

The baseline level (or first level) is determined using alphabetical order.

Why is posting homework questions on the mailing list or forums a bad idea?

The reason is because people who write the homework questions are reading those mailing lists and will be able to identify all homework questions without a doubt. So we have seen them all, do not bother trying to get the answers to your homework on mailing lists.

Describe an R function that can be used to create vectors of objects.

The c( ) function can be used to create vectors of objects.

Describe the colClasses argument used for the read.table function.

The colClasses is a character vector indicating the class of the data for each column in the dataset. The length of colClasses is the same length as the number of columns of the data set.

Describe the comment.char argument used for the read.table function.

The comment.char argument is a character string indicating the comment character (the default is "#"). Anything to the right of that character on a line is ignored as a comment, so lines of the file that begin with the comment character are skipped entirely.

If you convert an object with a numeric type into a logical type using the function as.logical, what is the convention that is followed?

The convention that is followed in R is that 0 is FALSE and any nonzero number (positive or negative) is TRUE.
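
A quick check of this convention; the values in x are arbitrary examples:

```r
# Only 0 maps to FALSE; every other number, including negatives, maps to TRUE
x <- c(0, 1, 2, -1, 0.5)
as.logical(x)
# [1] FALSE  TRUE  TRUE  TRUE  TRUE
```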

What does the description argument do in the file( ) function?

The description argument is the name of the file.

How does the dget function read data into R?

Like source, the dget function reads R code files, but it is meant for reading R objects that have been deparsed into text files (for example, with dput).

Define the dimension attribute of matrices in R.

The dimension attribute is itself an integer vector of length 2 (nrow, ncol) where nrow is an integer quantity for number of rows and ncol is an integer quantity for number of columns.

Describe what the dollar sign - $ operator extracts as subsets of R objects.

The dollar sign is used to extract elements of a list or data frame that have a name. Various objects can have names, and one of the reasons names are used in an object is so that you can reference its elements by name.

Describe what the double square bracket - [[ operator extracts as subsets of R objects.

The double bracket operator is used to extract elements of a list or a data frame. It can only be used to extract a single element of that object, either the list or the data frame. The class of the returned object will not necessarily be a list or a data frame. So the idea with the double bracket operator is that lists can hold things that are of many different classes. They don't all have to be the same classes. So, the first element might be a numeric vector, the second element might be a data frame, the third element might be a complex vector, et cetera.

Contrast the dput and dget functions with the dump and source functions.

The dput and dget functions can only be used on a single R object whereas the dump and source functions can be used on multiple R objects.

Describe what the dput function does.

The dput function takes an arbitrary R object, and it will create some R code that will essentially reconstruct the object in R.
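
As a sketch, dput can write that code to a file and dget can read it back. The data frame y and the temporary file path are illustrative assumptions:

```r
y <- data.frame(a = 1, b = "a")

# With no file argument, dput() prints the reconstructing code to the console
dput(y)

# Write the code to a file, then rebuild the object with dget()
path <- tempfile(fileext = ".R")
dput(y, file = path)
new_y <- dget(path)
new_y
```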

Describe the output of a list when you auto print it.

The elements of the list are indexed by double brackets and each index in double brackets is followed by a vector with the location of the element of that vector in single brackets and then the element of that vector.

If you convert an object with a numeric type into a character type using the function as.character, what is the convention that is followed?

The explicit type change function as.character on an object takes all the numbers and converts them into characters. In the following example you have the string zero, the string one, two etc. x <- 0:6 as.numeric(x) Output: [1] 0 1 2 3 4 5 6 as.character(x) Output: [1] "0" "1" "2" "3" "4" "5" "6"

Describe the file argument used for the read.table function.

The file argument is the name of a file, or a connection. The file name is a character string giving the path to a file on your computer; a connection is an R object (created by functions such as file( ) or url( )) that read.table can read from.

What is the first step in crafting the body of the message when posting a question to a mailing list or a forum? Why is this prudent?

The first step is to describe the goal, not the steps. You may have many steps that you are going through, and maybe one of those steps is causing a problem. It is useful for other people to know what the bigger picture is in terms of what you are trying to do, because for example, they might have a better idea about how to go about achieving that goal, which may be faster or simpler, and may work around whatever problem you are having. Describe the ultimate goal, and then talk about what the problems are. And don't just narrow it down to the one little step that you are having a problem with. Be explicit about your questions. So remember, provide details about what you are trying to do.

How does the readLines function read data into R?

The function readLines is for reading lines of a text file (or of any type of file) and puts the text in a character vector in R.

What is the meaning of the hash symbol or # when typed into the R console? How is it useful?

The hash symbol indicates that everything to the right of that is a comment and is ignored by the R engine. You can put things like comments or notes to yourself in code and R will just ignore those comments.

What does the head( ) function do?

The head( ) function can be used to preview the first 6 lines of a dataset.

Describe the header argument used for the read.table function.

The header argument is a logical flag indicating whether the first line of the file is a header line. For example, if the first line has all the variable names in it, then it is not really a piece of data; it is just a line of labels. The header argument tells the read.table function whether the first line contains the variable names or whether the file starts with data right away.

Explain why reading the help page for the read.table function is useful when working with large datasets.

The help page has a lot of important information for how to optimize the read.table function when working with large datasets.

Describe how you can test objects to see if there are missing values of NA.

The is.na( ) function is used to test objects if they are NA.

Describe how you can test objects to see if there are missing values of NaN.

The is.nan( ) function is used to test objects if they are NaN.

What is in the meta-data included with the text format of dput and dump functions? Why is this feature useful?

The meta-data has the class of each column of the data frame so that you don't have to specify it when you read it in.

What is the most common attribute you will encounter in using R?

The most common attributes that you will encounter are names and dimnames.

Why would you want to subset an element from a list using a name?

The nice thing about being able to subset an element using its name is that you don't have to remember where it is in the list. If you cannot remember whether bar is the first element or the second element, you don't need to know its position in order to use a numeric index. You can just use its name, and R will extract that element from the list.

Describe the nrows argument used for the read.table function.

The nrows argument provides the number of rows in the dataset.

Say you have a factor variable where no is the baseline level (or first level) and yes is the second level (the default alphabetical order used in R). Explain how you can switch the baseline level (or first level) so yes is the baseline level and no is the second level for the factor variable.

The order of the levels can be set using the levels argument to factor( ). See R code below: x <- factor(c("yes", "yes", "no", "yes", "no"), levels = c("yes", "no")) ## placement of "yes" before "no"
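
The effect can be checked by comparing the levels with and without the levels argument. The names x_default and x_custom below are illustrative only:

```r
answers <- c("yes", "yes", "no", "yes", "no")

# Default: levels are sorted alphabetically, so "no" is the baseline
x_default <- factor(answers)
levels(x_default)
# [1] "no"  "yes"

# Explicit levels: "yes" becomes the baseline (first) level
x_custom <- factor(answers, levels = c("yes", "no"))
levels(x_custom)
# [1] "yes" "no"

# Frequency count of each level
table(x_custom)

# Strip the class to see the underlying integer codes
unclass(x_custom)
```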

What is one of the most commonly used functions for reading data?

The read.table function is one of the most commonly used functions for reading data.

Explain the similarity of the dollar sign - $ operator and the double square bracket - [[ operators.

The semantics of the dollar sign are similar to the double bracket in the sense that when you use the dollar sign to extract an element of an object it may or may not be of the same class as the original object.

Describe the sep argument used for the read.table function.

The sep argument stands for separator. It is a string indicating how the columns are separated. For example, if you have a file that is separated by commas then the separator is a comma. You may have files separated by semicolons or by tabs or by spaces. It is important to tell the read.table function how the columns are separated and this is done via the sep argument.

What is a basic principle to remember about the single square bracket or [ operator for extracting subsets of R objects?

The single square bracket or [ operator always returns an object of the same class as the original.

Describe the skip argument used for the read.table function.

The skip argument is the number of lines to skip from the beginning. Sometimes there is header information or some non-data region at the beginning of the file, and you want to skip right over it. You can tell the read.table function to skip, say, the first 10 lines or the first 100 lines and only start reading data after that.

How does the source function read data into R?

The source function will read anything written to a file (such as R functions) and read all that code into R.

Describe the stringsAsFactors argument used for the read.table function.

The stringsAsFactors argument answers the question: should character variables be coded as factors? In versions of R before 4.0.0 the default is TRUE, so any time the read.table function encounters a column that looks like a character variable, it assumes you mean to read in a factor variable; if you do not, set stringsAsFactors = FALSE. Since R 4.0.0 the default is FALSE, so character columns stay character unless you set stringsAsFactors = TRUE.

Describe how you can view the frequency count of each level of a factor object.

The table( ) function can be applied to a factor and it will give you a frequency count of how many of each level there are in that factor.

Why can you not use the dollar sign operator to index it to a list, where the index itself is computed in order to extract an element from a list?

The dollar sign looks for an element whose name is literally the symbol you typed. If you store an element's name in a variable called name and then type x$name, R looks for an element called "name", which does not exist in the list, so you get NULL. To use the dollar sign, you have to type the literal name of the element.
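
A minimal sketch of the difference, assuming an illustrative list x and a variable called name that holds the computed index:

```r
x <- list(foo = 1:4, bar = 0.6, baz = "hello")
name <- "foo"

x[[name]]  # computed index: extracts the element named "foo"
# [1] 1 2 3 4
x$name     # looks for an element literally called "name"; there is none
# NULL
x$foo      # literal name works
# [1] 1 2 3 4
```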

What function can you use in R to pass lines of the character vector where each element of the character vector becomes a line in a text file.

The writeLines function takes a character vector and writes each element one line at a time to a text file.
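
A small round trip with writeLines and readLines, using a temporary file as an assumption for illustration. The n argument of readLines limits how many lines are read, which is one way to read only part of a file:

```r
path <- tempfile(fileext = ".txt")

# Each element of the character vector becomes one line in the file
writeLines(c("first line", "second line", "third line"), path)

# Read everything back, or only the first two lines
all_lines <- readLines(path)
first_two <- readLines(path, n = 2)
first_two
# [1] "first line"  "second line"
```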

What is the output when you type in the following into the R console prompt: x <- c("a", "b", "c") as.numeric(x)

There is really no way to convert the characters of a, b, and c to numerical variables so you get the following output: [1] NA NA NA Warning message: NAs introduced by coercion

Describe user-defined attributes or metadata.

These are attributes that you can define separately, for an object using various attribute functions.

How do the read.table and read.csv functions read data into R?

These functions read text files that contain data that are stored in rows and columns type of format and return a data frame in R.

By default, when a single element of a matrix is retrieved, it is returned as a vector of length 1 rather than a 1 × 1 matrix. How can this default setting be changed? Why this default?

This behavior can be turned off by setting drop = FALSE when subsetting the matrix. This adds an extra argument, called drop, to the subsetting operation. By default drop is TRUE, which drops the dimensions: rather than getting a two-dimensional object back, you get a one-dimensional object back. If you want to preserve the dimensions of the object (i.e. keep a matrix), set drop = FALSE.
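
The difference shows up in the dim attribute of the result. A sketch with an illustrative 2 x 3 matrix m:

```r
m <- matrix(1:6, nrow = 2, ncol = 3)

m[1, 2]                      # plain vector of length 1 (value 3)
dim(m[1, 2])                 # NULL: the dimensions were dropped
dim(m[1, 2, drop = FALSE])   # 1 1: still a 1 x 1 matrix
dim(m[1, , drop = FALSE])    # 1 3: a whole row kept as a 1 x 3 matrix
```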

What happens when you convert an object of some other type into the complex type using the as.complex function?

This function changes the type of the object to a complex number where all the imaginary components are zero.

Why is claiming you found a bug on a forum or mailing list a bad idea?

This happens all the time, and usually, 99 times out of 100, it is not a bug, and it is just a misunderstanding about what should have happened, so a mistake in the expectations of the user.

Why is emailing multiple mailing lists or posting to multiple forums at once a bad idea?

This is a little bit annoying, because people will subscribe to different mailing lists and will be getting your message more than one time.

Which is the best operator (single bracket, double bracket or dollar sign) for extracting multiple elements of a list, like the first and third elements from a list?

To extract multiple elements of a list then you need to use the single bracket operator.

True or False. A key property of vectors in R is that all elements of a vector must be of the same class.

True

True or False. A non-sensical coercion will result in a vector of NAs and a warning message.

True

True or False. As a general rule, vectors can only contain elements of the same class.

True

True or False. Be cautious when using logical expressions anytime NA values might creep in, since a single NA value can derail the entire thing.

True

True or False. Because different operating systems have different conventions with regards to things like file paths, the outputs of these commands may vary across machines. It is important to note that R provides a common API (a common set of commands) for interacting with files, that way your code will work across different kinds of computers.

True

True or False. Connections can be very powerful and they can let you navigate files and other external objects in a more sophisticated way than just, like, reading the whole thing.

True

True or False. Even if you have more than adequate memory (twice the calculated amount) to store a data frame in computer memory it will still take some time to read in all the data but you won't be running out of memory.

True

True or False. Every row of a data frame has a row name.

True

True or False. For small to moderately sized datasets, you can usually call read.table without specifying any other arguments.

True

True or False. In many cases you don't have to deal with the connection interface directly, but sometimes it's useful.

True

True or False. In modeling functions and when you include a factor variable sometimes it is important to know what is the baseline level (or first level).

True

True or False. It is possible to strip out the classes or the labels from a factor and reduce it to an integer vector.

True

True or False. It is possible to subset nested elements of a list.

True

True or False. Tabular data make up a lot of what we use in statistics.

True

True or False. The # character indicates a comment. Anything to the right of the # (including the # itself) is ignored.

True

True or False. The 5 basic types (logical, character, numeric, complex, integer) or atomic classes of objects used in R, also apply to vectors since vectors are objects in R.

True

True or False. The grammar of the R language determines whether an expression is syntactically correct or not or whether it is complete.

True

True or False. The idea behind the connection interface is that it abstracts out the mechanism for connecting to different types of objects that are external to R, whether they be files, or webpages, or whatever.

True

True or False. The nrows argument is not required for the read.table function.

True

True or False. The order of the levels in the factor, can be set using the levels argument to factor( ).

True

True or False. The output from the is.na( ) and is.nan( ) functions to test objects is a logical vector.

True

True or False. The principal functions of read.table and read.csv are the two most commonly used functions for reading data into R.

True

True or False. The reading functions of read.table and read.csv are pretty much identical except for 2 differences.

True

True or False. The special number Inf can be used in calculations like an ordinary number and you will get the expected result (for example, 1 / Inf is 0).

True

True or False. Unlike writing out a table or csv file, the functions of dump and dput preserve the metadata (sacrificing some readability), so that another user doesn't have to specify it all over again.

True

True or False. Using dput and dump functions prevents the loss of metadata since metadata is included in the output of these functions.

True

True or False. When a complete expression is entered at the prompt, it is evaluated and the result of the evaluated expression is returned. The result may be auto-printed.

True

True or False. When you have a question that you want to post to a mailing list (or forum) it is important to figure out which mailing list (or forum) is the most appropriate mailing list (or forum) for your question and then send the message to that mailing list (or forum).

True

True or False. When you use the double bracket operator to extract an element of a list, the object that comes back may not be a list, it may be an object of a totally different class. So that is what the double bracket operator is useful for.

True

True or False. You can think of a factor as an integer vector where each integer has a label. For example, you can think of the vector as one, two and three, where one represents a high value and two represents a medium value and three represents a low value.

True

True or False. You cannot use the double bracket or the dollar sign operators when you extract multiple elements of a list.

True

True or False. You should always make sure that what you are asking for is within the bounds of the vector you are working with.

True

True or False. The output from the !is.na( ) and !is.nan( ) functions to test objects is a logical vector.

True. It is the opposite in the sense 'is not NA' but still a logical vector.

True or False. Being courteous on forums and mailing lists never hurts anyone.

True. Promoting civility on mailing lists is always a nice thing.

True or False. A standard vector in R can contain multiple copies of a single type of object.

True. You can have a vector of only characters or a vector of only integers.

Say you have used an assignment operator to assign a value of 1 to x. Describe an alternative to using the print(x) expression to print the value of x or 1.

Type x at the prompt in the R console (R is case-sensitive, so it must match the name you assigned) and hit enter; R prints out the value of x. This is another way to print out an object without explicitly calling the print function.

Describe how you can join the elements of a character vector (i.e. a character vector of length more than 1) together into one continuous character string (i.e. a character vector of length 1).

Type the following command into R: paste(<character vector name>, collapse = " ") Note: Make sure there's a space between the double quotes in the `collapse` argument. The `collapse` argument to the paste( ) function tells R that when you join together the elements of the <character vector name> character vector, we would like to separate them with single spaces.
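
For example, with an illustrative character vector my_char (the name and contents are assumptions):

```r
my_char <- c("My", "name", "is")

# collapse = " " joins all elements into one string separated by single spaces
joined <- paste(my_char, collapse = " ")
joined
# [1] "My name is"
length(joined)
# [1] 1
```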

How is a data frame different from a matrix?

Unlike matrices, data frames can store different classes of objects in each column; matrices must have every element be the same class.

Describe unordered factors in R.

Unordered factors have labels that are categorical but have no ordering.

Describe how to combine logical expressions. Provide an example.

Use the ampersand or & symbol. The following R code returns only the values of vector x that are both non-missing AND greater than zero: x[!is.na(x) & x > 0]
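
A worked sketch with an illustrative vector x that contains negative values, zeros and NAs:

```r
x <- c(-1, NA, 2, 3, NA, 0)

# Keep elements that are not NA AND greater than zero
pos <- x[!is.na(x) & x > 0]
pos
# [1] 2 3
```

Without the !is.na(x) part, x[x > 0] would also return an NA for each missing element.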

Describe how you can have all columns be the same class when working with a large dataset.

Use the colClasses argument. In order to use this option, you have to know the class of each column in your data frame. If all of the columns are "numeric", for example, then you can just set colClasses = "numeric".
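
When the classes are not known in advance, one common approach is to read a small number of rows first and reuse the classes R infers. The sketch below writes a small illustrative file to a temporary path so that it is self-contained; the file contents and the names initial, classes and full are assumptions:

```r
# Build a small example file (stand-in for a large dataset)
path <- tempfile(fileext = ".txt")
write.table(data.frame(id = 1:200, score = seq(0.5, 100, by = 0.5)),
            path, row.names = FALSE)

# Read only the first 100 rows and let R work out the column classes
initial <- read.table(path, header = TRUE, nrows = 100)
classes <- sapply(initial, class)
classes

# Reuse those classes for the full read so R does not have to guess again
full <- read.table(path, header = TRUE, colClasses = classes)
nrow(full)
```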

How do you find the dimensions of an object in R? Does this work for vectors?

Use the dim( ) function to find the dimensions of an object. For a matrix or data frame the output is 2 numbers: the 1st number is the number of rows or observations and the 2nd number is the number of columns or variables. This does not work for a plain vector, which has no dimension attribute; if you apply dim( ) to a vector you will get NULL.

Describe how names can be added as labels to a matrix.

Use the dimnames( ) function to assign it a list where the first element of the list is a vector of row names and the second element of the list is a vector of column names. The result will be row and column names added as labels to the matrix.
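
A short sketch, with illustrative row names "a", "b" and column names "c", "d":

```r
m <- matrix(1:4, nrow = 2, ncol = 2)

# First list element: row names; second list element: column names
dimnames(m) <- list(c("a", "b"), c("c", "d"))
m
#   c d
# a 1 3
# b 2 4

# The labels can then be used for subsetting
m["a", "d"]
# [1] 3
```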

Describe how you can compare 2 matrices to confirm if they are identical.

Use the identical( ) function. Example code: identical(<matrix name 1>, <matrix name 2>) Output: either TRUE or FALSE

Describe how you can compare 2 vectors to confirm if they are identical.

Use the identical( ) function. Example code: identical(<vector name 1>, <vector name 2>) Output: either TRUE or FALSE

Since the dim( ) function does not work on vectors, what does?

Use the length( ) function to find the length of a vector.

How can you determine the number of items in a vector?

Use the length( ) function, where the object name of the vector is the argument and the output is the number of items in the vector.

Describe how you can add names to the elements of a vector.

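Use the names( ) function: after creating the vector, assign a character vector of names to names(<vector name>). Alternatively, supply the names directly in c( ) when defining the vector and its elements.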
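
Both routes are sketched below; the names "foo", "bar", "norf", "a" and "b" are arbitrary examples:

```r
x <- 1:3
names(x)                              # NULL: the vector has no names yet
names(x) <- c("foo", "bar", "norf")   # assign names after creation
x
# foo  bar norf
#   1    2    3
x["bar"]                              # elements can now be extracted by name

y <- c(a = 1, b = 2)                  # or name the elements while defining the vector
names(y)
# [1] "a" "b"
```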

Describe how you can find the number of columns in a data frame.

Use the ncol( ) function.

Describe how you can find the number of rows in a data frame.

Use the nrow( ) function.

Describe how you can read parts of a file.

Use the readLines function to read parts of a file.

Describe how you can count the number of NA or missing values in a vector.

Use the sum( ) function with the is.na( ) function. See example below: sum(is.na(<vector name>)) Here R represents TRUE as the number 1 and FALSE as the number 0. So the sum( ) function can total up the values of 1 in the vector.
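
A worked sketch with an illustrative vector. Note that is.na( ) also returns TRUE for NaN, while is.nan( ) is TRUE only for NaN:

```r
x <- c(1, NA, 3, NA, NaN)

is.na(x)
# [1] FALSE  TRUE FALSE  TRUE  TRUE
sum(is.na(x))    # counts NA and NaN
# [1] 3
sum(is.nan(x))   # counts NaN only
# [1] 1
```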

Describe how you can extract the last few rows in a data frame.

Use the tail( ) function to extract the last few elements of an R object.

Describe how you can create a vector of a certain length and a certain type.

Using the vector( ) function, you can create a vector of a certain type and a certain length.

Describe what happens when you attempt to use the paste( ) function to join elements of multiple character or numeric vectors and the vectors are of different lengths?

Vector recycling occurs. R simply recycles the joining, or repeats the joining of the shorter vector until it matches the length of the longer vector.
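
For example, pasting a vector of length 4 with a vector of length 2 (the built-in LETTERS constant is used here purely for illustration):

```r
# 1:2 is recycled to 1, 2, 1, 2 to match the four letters
recycled <- paste(LETTERS[1:4], 1:2, sep = "")
recycled
# [1] "A1" "B2" "C1" "D2"
```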

Describe why textual formats work better with version control software.

Version control programs tend to be much more useful with textual data than with binary data, because they let you track changes to the text meaningfully. Textual formats also adhere to the general Unix philosophy.

Describe how names can be added to the elements of a list.

When defining the elements of the list in the list( ) function, for each element pick a name and set it equal to a value.

What happens when the explicit coercion of an object from one type to another doesn't work?

When it doesn't work you get output of what are called NA values and a "Warning message: NAs introduced by coercion".

Describe how to subset an entire column or row of a matrix.

When specifying the matrix subset, leave the index blank for the dimension you want in full. For example, m[1, ] leaves the column index blank and returns the entire first row, while m[, 2] leaves the row index blank and returns the entire second column.
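
A sketch using an illustrative 2 x 3 matrix m:

```r
m <- matrix(1:6, nrow = 2, ncol = 3)

m[1, ]   # entire first row (column index left blank)
# [1] 1 3 5
m[, 2]   # entire second column (row index left blank)
# [1] 3 4
```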

Provide a description of the read.table function opening up a connection to a file.

When you call read.table and pass it the name of a file, what it does behind the scenes is open a file connection to that file and then read from that file connection.

What is auto printing?

When you just type an object's name and hit enter, R will by default auto print the value of that object. This is the same as calling the print function on that object, which prints out its value. So you can explicitly print an object or you can auto print an object.

What happens to the type or class of the joined vector when you use the paste( ) function to join elements of a character vector to elements of a numeric vector?

With the joining of elements, the numeric vector is coerced into a character vector.

What is another way to create a matrix?

You can create a matrix by creating the dimension attribute on a vector. So you are taking a vector and transforming it into a matrix that is the specified number of rows and columns.
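
A minimal sketch of this approach, starting from an illustrative vector of 10 integers:

```r
m <- 1:10
is.matrix(m)       # FALSE: still a plain vector
dim(m) <- c(2, 5)  # assign the dimension attribute: 2 rows, 5 columns
m
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    1    3    5    7    9
# [2,]    2    4    6    8   10
is.matrix(m)       # TRUE
```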

Describe how a data frame can be used to create a matrix.

You can create a matrix from a data frame by calling the data.matrix( ) function. This converts a data frame to a matrix.

Provide an example of R code that creates a vector of complex numbers.

You can create a vector of complex numbers where the i is a special symbol, which indicates the imaginary part of the complex number: x <- c(1+0i, 2+4i)

Provide an example of R code that creates an integer vector.

You can create an integer vector by creating a sequence with the colon operator: x <- 9:29

How can you get the column names for a data frame?

You can get the column names for a data frame with the names( ) function.

What makes a list in R so useful?

Any element of a list can be anything: one element can be another list, and another can be a data frame. That flexibility is what makes a list so useful.

Describe how you can join the elements of multiple character vectors into one character vector of length 1.

You can join two character vectors that are each of length 1 (i.e. join two words) with the following command: paste("<character element 1>", "<character element 2>", sep = " ") Note: here the `sep` argument tells R that we want to separate the joined elements with a single space.
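
A small sketch with made-up words. The second call uses the `collapse` argument of paste( ), which is not mentioned in the answer above but is one way to join the elements of a single longer vector into a vector of length 1:

```r
# Join two length-1 character vectors, separated by a single space.
greeting <- paste("Hello", "world", sep = " ")
greeting

# Join the elements of one character vector into a single string.
sentence <- paste(c("My", "name", "is"), collapse = " ")
sentence
length(sentence)
```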

Why would you want to specify as many arguments as possible for a larger dataset? Why is this not so useful for a small to moderate-sized dataset?

You can specify arguments for R when reading files so it can run faster and more efficiently. This works well for larger datasets. However, for small and moderate-sized datasets there is not much of an advantage to doing that because reading will already be fast and efficient as it is.

Describe using a logical index with letters to subset.

You can subset letters using a logical index. The greater than sign or less than sign can be used with letters instead of just numbers. This is possible because there is a lexicographical ordering to the letters. For example all the letters that are greater than a are letters like b, c, d, e, and so forth.

How can you determine the number of lines in a file or dataset?

You can use the Unix tool wc to calculate the number of lines in a file.

How can you determine what arguments a function in R can take?

You can use the args( ) function on a function name to see what arguments a function can take.
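
For example, inspecting paste( ) (chosen here only as an illustration):

```r
# args() returns a function with the same argument list and an empty body;
# printing it shows the arguments and their defaults.
a <- args(paste)
a

# The argument names can also be pulled out programmatically.
names(formals(a))
```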

How can you get the number of rows for a data frame?

You can use the nrow( ) function to compute the number of rows in a data frame.

Describe how you can read elements from a web page into R.

You can use the url function to create a connection to a website and then use the readLines function to read the text from that connection. The url function is useful here because it creates a connection to a non-file object. The text that is read is stored in a character vector.

Describe how you can strip out the class attributes or the labels from a factor and reduce it to an integer vector.

You can use the unclass( ) function on a factor object to bring it down to an integer vector and see the underlying integer codes.

Say you have used an assignment operator to assign a value of 1 to x. What is another way to view a numeric object that has one element?

You can view it as a numeric vector where the first element is the number one.

What does the following command do when entered in the R console: seq(5, 10, length=30)

You get a vector of 30 evenly spaced numbers ranging from 5 to 10. Because 30 points span 29 gaps, the increment is 5/29 (about 0.172).
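
A quick check of the spacing. The full argument name `length.out` is used here; `length` works through partial matching:

```r
s <- seq(5, 10, length.out = 30)

length(s)   # number of elements
range(s)    # first and last value
diff(s)[1]  # step between neighbours, equal to 5/29
```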

Why is your rough calculation of required memory to store a data frame in computer memory not adequate? How much memory is adequate?

You will actually need a little more memory than what you have calculated to read the data in. This is because there is some overhead required for reading the data in. A rule of thumb is that you will need almost twice as much memory to read a dataset into R using read.table than the object itself requires. So if your computer only has two gigabytes of RAM and you are trying to read in a 1.34 gigabyte table, you might want to think twice about trying to do it.
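
The rough calculation can be sketched as follows. The dimensions (1,500,000 rows and 120 numeric columns) are an assumption chosen because they reproduce the 1.34 gigabyte figure; 8 bytes per value is the size of a double:

```r
rows <- 1500000   # assumed number of rows
cols <- 120       # assumed number of numeric columns

bytes <- rows * cols * 8   # 8 bytes per numeric value
mb <- bytes / 2^20         # bytes to megabytes
gb <- mb / 2^10            # megabytes to gigabytes
round(gb, 2)

# Rule of thumb: reading with read.table needs roughly twice that.
round(2 * gb, 2)
```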

What happens when you enter the following command into the R console for a vector with missing values: <vector name> == NA Why do you not get the same results as the following command: is.na(<vector name>)

You will get a vector of all NA values. This is because NA is not really a value, but just a placeholder for a quantity that is not available. Therefore the logical expression is incomplete and R has no choice but to return a vector of the same length as <vector name> that contains all NA values.
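
A small demonstration with a made-up vector:

```r
x <- c(1, NA, 3)

# Comparing with NA yields NA for every element, even the non-missing ones.
cmp <- x == NA
cmp

# is.na() is the way to test for missing values.
miss <- is.na(x)
miss
```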

What happens if there is not enough memory to store a data frame in computer memory? How can you be certain memory is the source of the problem?

You will get errors. Doing the calculation of required memory will help you know whether the error is caused by running out of memory or by something else. This kind of calculation is enormously useful when you are reading in large datasets because it gives you a sense of whether you have enough memory.

The following R code is entered to create a matrix: m <- matrix(nrow = 2, ncol = 3) What is the output when you auto print m?

[,1] [,2] [,3] [1,] NA NA NA [2,] NA NA NA

Provide the output you will see when the following R code is entered into the R console prompt: x <- 1:3 y <- 10:12 rbind(x, y)

[,1] [,2] [,3] x 1 2 3 y 10 11 12

Provide the output you will see when the following R code is entered into the R console prompt: m <- 1:10 dim(m) <- c(2, 5) m

[,1] [,2] [,3] [,4] [,5] [1,] 1 3 5 7 9 [2,] 2 4 6 8 10

Provide the output you will see when the following R code is entered into the R console prompt: names(x) <- c("foo", "bar", "norf") names(x)

[1] "foo" "bar" "norf"

What is the output in the R console when the following commands are entered: x <- vector("numeric", length = 10) x

[1] 0 0 0 0 0 0 0 0 0 0

What is the output if you enter the following into the prompt of the R console: 15:1

[1] 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1

Provide the output you will see when the following R code is entered into the R console prompt: x <- data.frame(foo = 1:4, bar = c(T, T, F, F)) ncol(x)

[1] 2

Provide the output you will see when the following R code is entered into the R console prompt: x <- factor(c("yes", "yes", "no", "yes", "no")) unclass(x)

[1] 2 2 1 2 1 attr(,"levels") [1] "no" "yes"

What is the output if you enter the following into the prompt of the R console: pi:9

[1] 3.141593 4.141593 5.141593 6.141593 7.141593 8.141593 The result is a vector of real numbers starting with pi (3.142...) and increasing in increments of 1. The upper limit of 9 is never reached, since the next number in our sequence would be greater than 9.

Provide the output you will see when the following R code is entered into the R console prompt: x <- data.frame(foo = 1:4, bar = c(T, T, F, F)) nrow(x)

[1] 4

Provide the output you will see when the following R code is entered into the R console prompt: x <- c(1, 2, NA, 10, 3) is.nan(x)

[1] FALSE FALSE FALSE FALSE FALSE

Provide the output you will see when the following R code is entered into the R console prompt: x <- c(1, 2, NA, 10, 3) is.na(x)

[1] FALSE FALSE TRUE FALSE FALSE

Provide the output you will see when the following R code is entered into the R console prompt: x <- c(1, 2, NaN, NA, 4) is.nan(x)

[1] FALSE FALSE TRUE FALSE FALSE

Provide the output you will see when the following R code is entered into the R console prompt: x <- c(1, 2, NaN, NA, 4) is.na(x)

[1] FALSE FALSE TRUE TRUE FALSE

Provide the output you will see when the following R code is entered into the R console prompt: x <- factor(c("yes", "yes", "no", "yes", "no")) x

[1] yes yes no yes no Levels: no yes

Provide the output you will see when the following R code is entered into the R console prompt: x <- factor(c("yes", "yes", "no", "yes", "no"), levels = c("yes", "no")) x

[1] yes yes no yes no Levels: yes no

Provide the auto print output of the following list: x <- list(1, "a", TRUE, 1 + 4i)

[[1]] [1] 1 [[2]] [1] "a" [[3]] [1] TRUE [[4]] [1] 1+4i

Objects can be explicitly coerced from one class to another using the ________________, if available

as.* functions

If you want to convert an object into a character type you can use the function called ____________.

as.character

If you want to convert an object into a logical type you can use the function called ____________.

as.logical

If you want to convert an object to a numeric type you can use the function called ____________

as.numeric

The <- expression is called the _______________.

assignment operator

Provide the R command that is an alternative command to dim( ) to view the dimensions of an object.

attributes(<object name>)

There are two logical values in R, also called ___________. They are TRUE and FALSE. In R you can construct logical expressions which will evaluate to either TRUE or FALSE.

boolean values

Provide the output you will see when the following R code is entered into the R console prompt: m <- matrix(1:4, nrow = 2, ncol = 2) dimnames(m) <- list(c("a", "b"), c("c", "d")) m

c d a 1 3 b 2 4

The ___________ function treats vectors as if they were columns of a matrix. It then takes those vectors and binds them together column-wise to create a matrix.

cbind( )

What is the type of object in the following vector: y <- c("a", TRUE)

character

What is the type of object in the following vector: y <- c(1.7, "a")

character

For the following types, provide the order of coercion when you mix different types of objects in creating a vector: numeric, character and logical.

Character over numeric, numeric over logical, and character over logical. In short: character > numeric > logical
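
The ordering can be checked directly with class( ) on a few mixed vectors (the variable names are only for illustration):

```r
a <- c(TRUE, 2)       # logical + numeric
b <- c(2, "a")        # numeric + character
d <- c(TRUE, "a")     # logical + character
e <- c(TRUE, 2, "a")  # all three mixed

class(a)
class(b)
class(d)
e
```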

Say you have a dataset and you would like to know in what format the variables have been stored. How can you find the class of each variable in the dataset?

class( ) function

What is a function you can use to determine the type or class of an object?

class( ) function

When different objects are mixed in a vector, ____________ occurs so that every element in the vector is of the same class.

coercion

The [[ or double bracket operator can be used with ____________ indices; $ or dollar sign operator can only be used with ___________ names.

computed, literal

Provide the R code using the connection interface that is equivalent to the following R code: data <- read.csv("foo.txt")

con <- file("foo.txt", "r") data <- read.csv(con) close(con)

Provide an example of R code that opens a connection to a compressed text file and reads the first 10 lines of the text file into a character vector.

con <- gzfile("words.gz") x <- readLines(con, 10) x
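
A self-contained variant of the same idea, assuming a temporary gzip file written first so that the example does not depend on an existing words.gz (the file contents are made up):

```r
# Create a small gzip-compressed text file to read from.
path <- tempfile(fileext = ".gz")
con <- gzfile(path, "w")
writeLines(paste("word", 1:25), con)
close(con)

# Open a connection to the compressed file and read the first 10 lines.
con <- gzfile(path, "r")
x <- readLines(con, 10)
close(con)
x
```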

You can think of the c in the c( ) function as standing for ____________ since it can be used to ___________ objects together.

concatenate, concatenate

Functions in R that help R to interface with the outside world are called ___________. Data are read in using ___________ interfaces.

connections, connection

Provide an example of R code that uses the read.table function.

data <- read.table("foo.txt")

Provide an example of R code that sets the dimension for an R object. For a vector, what does setting the dimensions do?

dim(<object name>) <- c(4,5) Here the dimensions of the object are given 4 rows and 5 columns. If the object name is a vector this command effectively converts the vector to a matrix.

Another name for the attribute of dim name is ________________.

dimension name

The dget function reads R code files back into R. What is its inverse?

dput

Another way to pass data around is by deparsing the R object with ________ and reading it back in using _______.

dput, dget

The _______ function, essentially writes R code, which can be used to reconstruct an R object using _______.

dput, dget

The output functions of _______ and _______ and the input functions of _______ and _______ are useful for preventing the loss of metadata.

dput, dump, dget, source

The source function reads R code files back into R. What is its inverse?

dump

The things that we type into the R prompt on the R console window are called ___________.

expressions

The ________ function opens a connection to a standard uncompressed file. This can be useful for reading text files.

file

Provide the output you will see when the following R code is entered into the R console prompt: x <- data.frame(foo = 1:4, bar = c(T, T, F, F)) x

foo bar 1 1 TRUE 2 2 TRUE 3 3 FALSE 4 4 FALSE

Provide the output you will see when the following R code is entered into the R console prompt: names(x) <- c("foo", "bar", "norf") x

foo bar norf 1 2 3

The ___________ and __________ functions are used for opening connections to compressed data files.

gzfile, bzfile

The gzfile function is used for files that are compressed with the ________ algorithm and the bzfile function is used for opening connections to files compressed with the ___________ algorithm.

gzip, bzip2

Give an example of baseline level (or first level) for a factor variable.

If you create a factor variable with the elements yes and no, the levels are ordered alphabetically by default. Because no comes before yes in the alphabet, no will be the baseline level (or first level) and yes will be the second level.

Provide the R code for a quick and dirty way to find out the classes of each column in a dataset named datatable.txt.

initial <- read.table("datatable.txt", nrows = 100) classes <- sapply(initial, class) tabAll <- read.table("datatable.txt", colClasses = classes)
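
A runnable sketch of the same approach, assuming a temporary file with made-up columns (`id`, `score`, `flag`) in place of datatable.txt:

```r
set.seed(1)

# Write a hypothetical 200-row table to a temporary file.
path <- tempfile(fileext = ".txt")
fake <- data.frame(id = 1:200,
                   score = rnorm(200),
                   flag = rep(c(TRUE, FALSE), 100))
write.table(fake, path, row.names = FALSE)

# Read only the first 100 rows and let R guess the column classes.
initial <- read.table(path, header = TRUE, nrows = 100)
classes <- sapply(initial, class)
classes

# Re-read the whole file with the classes supplied up front.
tabAll <- read.table(path, header = TRUE, colClasses = classes)
dim(tabAll)
```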

The : operator is used to create ___________ sequences.

integer

Modeling functions like ________ and ________ give special treatment to factors.

lm( ), glm( )

The _______ function is for reading data in saved workspaces into R.

load

Give an example of R code where the matrix( ) function is used to create an empty matrix where it is explicitly stated how many rows and how many columns there are.

m <- matrix(nrow = 2, ncol = 3)

Provide an example command of R code that will give you the dimensions of a matrix object. Explain how to interpret the results.

m <- matrix(nrow = 2, ncol = 3) ## creates matrix m dim(m) ## dimension function Output: [1] 2 3 Here, you can interpret the output of the dimension vector as 2 rows and 3 columns for the matrix object m.

In R, to take the average of an object, use the ___________ function.

mean( )

The ___________ function treats vectors as if they were rows of a matrix. It then takes those vectors and binds them together row-wise to create a matrix.

rbind( )

If you create a vector with an integer sequence, by default, there is no _______ for the elements of the vector and when you apply the names( ) function to the vector it gives you a _______ value.

names, NULL
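
For example:

```r
x <- 1:3
before <- names(x)   # no names yet
before

names(x) <- c("foo", "bar", "norf")
after <- names(x)
after
```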

What is considered the most important type of object used in R?

numbers or numeric

What is the type of object in the following vector: y <- c(TRUE, 2)

numeric

For numeric objects, their class is ____________ and for integer objects, their class is ___________.

numeric, integer

List the classes of data for each column in the dataset that could be defined by the colClasses argument.

numeric, logical, factor, etc

All the things that you manipulate in R, all the things that we encounter in R, are called ___________. ___________ can be all different kinds and can contain all different kinds of data. Everything in R, is an __________.

objects, Objects, object

What does the open argument do in the file( ) function? What possible values can be specified for open?

open is a code indicating the mode: "r" read only; "w" writing (and initializing a new file); "a" appending; "rb", "wb", "ab" reading, writing, or appending in binary mode (Windows).

Data frames are usually created by calling ___________ or ___________.

read.table( ), read.csv( )

The principal functions of _______ and _______ are for reading tabular data into R.

read.table, read.csv

Describe a function in R that can create a vector containing 1000 draws from a standard normal distribution.

rnorm(<number of draws>)

Matrices can be subsetted so the first index is going to be the ___________, the second index is going to be the ___________.

row index, column index

What is a special attribute about data frames?

row.names

Describe a function that selects 100 elements at random from a vector of over 100 elements.

sample(<vector object name>, 100)

Often, we will desire more control over a sequence we're creating than what the `:` operator gives us. The _______ function serves this purpose.

seq( )

Say you want a vector of numbers ranging from 0 to 10, incremented by 0.5. Provide the command in R.

seq(0, 10, by=0.5)

Provide the equivalent of the command of 1:20 using the seq( ) function.

seq(1, 20)

Provide two equivalent R commands to the following R command (note that my_seq is a vector of length 30): 1:length(my_seq)

seq(along.with = my_seq) seq_along(my_seq)
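
A quick check that the three forms agree, assuming my_seq is built with seq( ) as a vector of length 30:

```r
my_seq <- seq(5, 10, length.out = 30)

a <- 1:length(my_seq)
b <- seq(along.with = my_seq)
d <- seq_along(my_seq)

identical(a, b)
identical(a, d)
```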

In R, to take the square root of an object, use the _______ function and to take the absolute value of an object, use the _______ function.

sqrt( ), abs( )

The output from dump and dput are ________ formats.

textual

By convention in R, when coercion occurs between the mixed types of character and logical, the logical value TRUE is represented as ______________ and the logical value FALSE is represented as ______________.

the string "TRUE", the string "FALSE"
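
For example:

```r
# Explicit coercion of logical values to character.
s <- as.character(c(TRUE, FALSE))
s

# Implicit coercion when a logical is mixed with a character.
y <- c("a", TRUE)
y
```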

The _______ function is for reading single R objects in binary form into R.

unserialize

In R, a list is represented by a ____________.

vector

The simplest and most common data structure in R is the ___________.

vector

You can create an empty vector with the ______________.

vector( ) function

Provide the output you will see when the following R code is entered into the R console prompt: x <- factor(c("yes", "yes", "no", "yes", "no")) table(x)

x no yes 2 3

Provide an example of R code that passes data around by deparsing R objects with the dump function and reading them back in using the source function.

x <- "foo" y <- data.frame(a = 1, b = "a") dump(c("x", "y"), file = "data.R") rm(x, y) source("data.R") y x

Provide an example using the as.* function to explicitly coerce objects from one class to another.

x <- 0:6 ## creates vector with integer object type as.numeric(x) ## converts type from integer to numeric Output: [1] 0 1 2 3 4 5 6

Provide an example command that shows the current type of an object you have created.

x <- 0:6 ## creates vector with integer object type class(x) ## query that asks the type of the object Output: [1] "integer"

Provide an example of R code that creates a character vector that concatenates a bunch of characters.

x <- c("a", "b", "c")

Provide an example of R code with output where you use the single bracket operator to create a logical vector from a sequence of elements of a character vector using a logical index. The result is a logical vector with multiple elements in it.

x <- c("a", "b", "c", "c", "d", "a") u <- x > "a" u Output: [1] FALSE TRUE TRUE TRUE TRUE FALSE

Provide an example of R code with output where you use the single bracket operator to create a character vector from a sequence of elements of a character vector using a logical index. The result is a character vector with multiple elements in it.

x <- c("a", "b", "c", "c", "d", "a") u <- x > "a" x[u] Output: [1] "b" "c" "c" "d"

Provide an example of R code with output where you use the single bracket operator to extract a sequence of elements of a character vector using a numeric index. The result is another character vector with multiple elements in it. This is also known as an index vector.

x <- c("a", "b", "c", "c", "d", "a") x[1:4] Output: [1] "a" "b" "c" "c"

Provide an example of R code with output where you use the single bracket operator to extract the first element of a character vector using a numeric index. The result is another character vector with a single element in it. This is also known as an index vector.

x <- c("a", "b", "c", "c", "d", "a") x[1] Output: [1] "a"

Provide an example of R code with output where you use the single bracket operator to extract the second element of a character vector using a numeric index. The result is another character vector with a single element in it. This is also known as an index vector.

x <- c("a", "b", "c", "c", "d", "a") x[2] Output: [1] "b"

Provide an example of R code with output where you use the single bracket operator to extract a sequence of elements of a character vector using a logical index. The result is another character vector with multiple elements in it.

x <- c("a", "b", "c", "c", "d", "a") x[x > "a"] Output: [1] "b" "c" "c" "d"

Provide an example of R code that can be used as an alternative to the following logical vector and has the same objective: x <- c(TRUE, FALSE)

x <- c(T, F)

Provide an example of R code that creates a factor with the factor( ) function.

x <- factor(c("yes", "yes", "no", "yes", "no"))

Provide an example of R code that uses the list( ) function.

x <- list(1, "a", TRUE, 1 + 4i)

Provide an example of R code with output where the double bracket operator can take an integer sequence to recurse into the list to extract nested elements of a list. This example should use double sub-setting.

x <- list(a = list(10, 12, 14), b = c(3.14, 2.81)) x[[1]][[3]] Output: [1] 14

Provide an example of R code with output where the double bracket operator can take an integer sequence to recurse into the list to extract nested elements of a list. This example should use the c( ) function to create a vector of objects.

x <- list(a = list(10, 12, 14), b = c(3.14, 2.81)) x[[c(1, 3)]] Output: [1] 14

Provide a second example of R code with output where the double bracket operator can take an integer sequence to recurse into the list to extract nested elements of a list. This example should use the c( ) function to create a vector of objects.

x <- list(a = list(10, 12, 14), b = c(3.14, 2.81)) x[[c(2, 1)]] Output: [1] 3.14

Provide an example of R code with output where you use the dollar sign operator with the name of the element to return one element from a list with two elements. The result is an element with a different class from the list from which it is subsetted. In this case, the output is just a single numeric value.

x <- list(foo = 1:4, bar = 0.6) x$bar Output: [1] 0.6

Provide an example of R code with output where you use the single bracket operator with the name of the element to return one element from a list with two elements. The result is an element with the same class as the list from which it is subsetted. In this case, the class of the output is also a list.

x <- list(foo = 1:4, bar = 0.6) x["bar"] Output: $bar [1] 0.6

Provide an example of R code with output where you use the single bracket operator to return one element from a list with two elements. The result is an element with the same class as the list from which it is subsetted. In this case, the class of the output is also a list.

x <- list(foo = 1:4, bar = 0.6) x[1] Output: $foo [1] 1 2 3 4

Provide an example of R code with output where you use the double bracket operator with the name of the element to return one element from a list with two elements. The result is an element with a different class from the list from which it is subsetted. In this case, the output is just a single numeric value.

x <- list(foo = 1:4, bar = 0.6) x[["bar"]] Output: [1] 0.6

Provide an example of R code with output where you use the double bracket operator to return one element from a list with two elements. The result is an element with a different class from the list from which it is subsetted. In this case, the class of the output is just a sequence.

x <- list(foo = 1:4, bar = 0.6) x[[1]] Output: [1] 1 2 3 4

Provide an example of R code with output if you attempt to use a dollar sign operator to index it to a list, where the index itself is computed. This is in order to extract an element from a list.

x <- list(foo = 1:4, bar = 0.6, baz = "hello") name <- "foo" x$name ## element 'name' doesn't exist! Output: NULL

Provide an example of R code with output where you can use the double bracket operator to index it to a list, where the index itself is computed. This is in order to extract an element from a list.

x <- list(foo = 1:4, bar = 0.6, baz = "hello") name <- "foo" x[[name]] ## computed index for 'foo' Output: [1] 1 2 3 4

Provide an example of R code with output where you use the single bracket operator to return the first and third elements from a list with three elements. The result is an element with the same class as the list from which it is subsetted. In this case, the class of the output is also a list.

x <- list(foo = 1:4, bar = 0.6, baz = "hello") x[c(1, 3)] Output: $foo [1] 1 2 3 4 $baz [1] "hello"

Provide an example of R code with output where the entire column of a matrix is subsetted with (i,j) type indices.

x <- matrix(1:6, 2, 3) x[ , 2] Output: [1] 3 4

Provide an example of R code with output for retrieving a single element of a matrix that returns a 1 x 1 matrix.

x <- matrix(1:6, 2, 3) x[1, 2, drop = FALSE] Output: [,1] [1,] 3

Provide an example of R code with output where a matrix is subsetted with (i,j) type indices.

x <- matrix(1:6, 2, 3) x[1, 2] Output: [1] 3

Provide an example of R code with output where the entire row of a matrix is subsetted with (i,j) type indices.

x <- matrix(1:6, 2, 3) x[1, ] Output: [1] 1 3 5

Provide another example of R code with output where a matrix is subsetted with (i,j) type indices.

x <- matrix(1:6, 2, 3) x[2, 1] Output: [1] 2

Provide an example that uses the vector( ) function and creates a vector with a length of 10 and a type of numeric.

x <- vector("numeric", length = 10)

Provide the output you will see when the following R code is entered into the R console prompt: x <- 1:3 y <- 10:12 cbind(x, y)

x y [1,] 1 10 [2,] 2 11 [3,] 3 12

Provide an example of R code that passes data around by deparsing the R object with the dput function and reading it back in using the dget function.

y <- data.frame(a = 1, b = "a") dput(y) dput(y, file = "y.R") new.y <- dget("y.R") new.y

When you review the following auto print of a factor variable, what is the base line or first level? [1] yes yes no yes no Levels: yes no

yes

On which operating systems does the R language run?

Mac, Windows, Linux and Playstation 3

Explain how playing around with the problem and trying to find the answer by inspecting or experimentations can be helpful.

Maybe if you have a function that is not working right, maybe change the inputs and see if the outputs change or if the error message changes.

Discuss the following drawback of the R language: R is not ideal for all possible situations.

People have high expectations for R. They expect it to be able to do everything. But it doesn't do everything and so you should go into this knowing that fact.

Describe the vibrant community for R and how that is an advantage.

R has a very active and vibrant user community. The R-help and R-devel mailing lists are very active, with many posts per day, and there is also an R tag on Stack Overflow where questions can be answered. The user community is one of the most interesting aspects of R. It is where all the R packages come from, and it produces many interesting features.

Describe the command in R to set your working directory to a working directory name of your choice.

setwd("<directory name of your choice>")

Give the command to enter into the R console window to view the current directory contents or list all the files in your working directory.

dir( )

As you are learning to use R, list documents that you can find on the R website.

1) An Introduction to R 2) The Writing R Extensions Manual 3) The R Data Import and Export Manual 4) The R Installation Administration Manual 5) The R Internals Manual

List 6 additional book titles that can help with learning R.

1) Books by John Chambers, such as Software for Data Analysis and Programming with Data. 2) Modern Applied Statistics with S by Bill Venables and Brian Ripley. 3) S Programming by Bill Venables and Brian Ripley. 4) Mixed-Effects Models in S and S-PLUS by Jose Pinheiro and Douglas Bates. 5) R Graphics by Paul Murrell. 6) The Use R! series of books by Springer.

Name 5 packages included in the recommended packages of the base R system.

1) boot (bootstrap package) 2) class (classification package) 3) cluster package 4) codetools package 5) foreign package

Describe how to change directories in R.

1) In the R program go to File and then select Change dir... 2) In the menu, that appears, select the working directory you want to use.

List resources that you can use before posting a question to a forum or mailing list.

1) Search the archives of an R forum for answers. 2) Search the web. 3) Search R manuals. 4) Search the FAQ section of the R website. 5) Ask a skilled friend who knows something about R. 6) If you are a programmer, read the source code. 7) Play around with the problem and try to find the answer by inspection or experimentation.

Name the 2 conceptual parts of the R system.

1) The base R system that you download from CRAN, the Comprehensive R Archive Network. 2) Everything else.

List 5 items you should include in the subject line of posting of a question on a mailing list or forum.

1) The operating system. 2) The version of the operating system. 3) The version of R. 4) What function you are using. 5) A summary of what actually happened.

List 2 restrictions placed on R packages that are downloadable from CRAN.

1) There has to be documentation for all the functions that are in the package. 2) The packages have to pass a certain number of tests.

Name 3 packages included in the fundamental packages of the base R system.

1) utils package 2) stats package 3) datasets package. The graphics package is another.

What are your 2 options if you submit a command in the R console window to read a file that is not in your working directory and you get an error message?

1) You can move the file to your working directory. 2) You can change your working directory to be wherever the file you want to read is located.

How can changes be implemented in the R language, if the R-core group is the only party that controls the source code for R?

A number of people who are not in the core group have suggested changes to R, and these have been accepted by the R-core group.

Describe the reference document An Introduction to R.

An Introduction to R is a relatively long PDF document that goes through the basics of how to use R and how to use the language.

How can you access R's built-in help files if you have a question about a particular function?

Anytime you have questions about a particular function, you can access R's built-in help files via the `?` command. For example, if you want more information on the c( ) function, type ?c without the parentheses that normally follow a function name.

What is R?

Basically, R is a dialect of S.

What does CRAN stand for? What is CRAN good for?

CRAN is the Comprehensive R Archive Network. CRAN is the go-to place for all things R.

Describe how to change directories in R Studio.

Click on Session in upper left-hand corner >>> Set Working Directory >>> Choose Directory...

What is recommended for this course?

Create a single directory or folder where you can put all of your materials for the course so they are not scattered all over the place. Anytime you download something from the website or create a new file, it is best to store it in that one folder so that you don't have to search for it. That way you can always set your working directory to that directory and not have to worry about changing it.

Describe the command in R you can use to create a directory in the current working directory and give it a name of your choice.

Enter the following command into the R console prompt: dir.create("<directory name of your choice>")

True or False. Any changes made in the text editor are automatically loaded into the R console window.

False. Any changes made in the text editor like adding or removing R code or functions have to be saved first and then loaded again into R console using the source function to source that file back into R.
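
A small sketch of that workflow, assuming a temporary script file and a hypothetical function named `add_one`. Editing the file is simulated here with writeLines:

```r
# Save a function definition to a script file and load it.
script <- tempfile(fileext = ".R")
writeLines("add_one <- function(x) x + 1", script)
source(script)
first <- add_one(1)

# Change the file on disk; the console still holds the old definition.
writeLines("add_one <- function(x) x + 100", script)
stale <- add_one(1)

# Sourcing the file again picks up the change.
source(script)
fresh <- add_one(1)

c(first, stale, fresh)
```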

True or False. Anyone can write an R package and make it available for download from CRAN.

False. CRAN has a number of restrictions and standards that have to be met in order to get a package on to CRAN. The packages that you download have to meet a certain level of quality.

True or False. S has always contained statistical functions.

False. Early versions of the S language did not contain functions for statistical modelling. That did not come until roughly version three of the language.

True or False. R does not offer a text editor. You have to download a text editor to facilitate writing of code.

False. R comes with a rudimentary text editor.

True or False. When R was licensed under the GNU General Public License it became a paid software anyone could purchase.

False. The GNU General Public License made R free software.

True or False. You have to use the text editor provided with R.

False. There are many other text editors that you might see on the web that you can download. And those are fine to use but they are not really necessary. The text editor that comes with R should be sufficient for this course.

True or False. You can only have one function per saved file you load into R console.

False. You can have multiple functions that do different things all saved in the same file. Use the source function to load them into the R console window and then call the functions as you need them in the R console window.

True or False. You can only have one text editor window open at one time when using R.

False. You can have multiple text editor windows and give them each a unique file name. You don't have to use a single file. You can add a new file. So you can say New Script and it will open up another window. You can save this to be a different file if you want so that way you can separate code for different projects or different assignments.

How do you open a new text editor window in R?

Go to File and in the drop-down menu that appears, select the option of New script. This will give you a blank window that you can use to write R code.

Discuss the "functionality is based on consumer demand and user contributions" drawback of the R language.

If no one feels like implementing your favorite improvement suggestion, then that is your job to do. If you can't, there is no corporation or company that you can complain to. There is no helpline that you can call to demand a specific implementation or a specific feature. If the feature is not there, then you have to build it, or at least you can pay someone to build it.

Why is being able to answer the question of whether the problem can be reproduced important to know before posting a question on a mailing list or forum?

If someone else can reproduce your problem, it makes it a lot easier for that other person to figure out what the solution is going to be. And so if you can provide some code or a very simple example that will reproduce your problem, this will be enormously useful to everyone else involved.

Discuss the "objects that you manipulate in R have to be stored in the physical memory of the computer" drawback of the R language.

If the object is bigger than the physical memory of the computer, then you can't load it into memory, and therefore you can't do anything in R with that object. There have been a lot of advancements to deal with this, both in the R language and on the hardware side, where you can now buy computers with tremendous amounts of memory. Some of those problems have been resolved just by improvements in technology. Nevertheless, as we enter the big data era with larger and larger data sets, the model of loading objects into physical memory can be a limitation.
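
To get a feel for this limitation, one option is to measure how much memory an object occupies. The sketch below is an illustration added here, not part of the original card; it uses object.size( ) from the utils package, and the vector x is a hypothetical example object.

```r
# A numeric vector of one million values; each double takes 8 bytes.
x <- numeric(1e6)

# object.size() reports an estimate of the memory used by the object.
size_bytes <- as.numeric(object.size(x))
format(object.size(x), units = "auto")   # human-readable size

# Removing the object and running the garbage collector frees the memory.
rm(x)
invisible(gc())
```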

True or False.

In general, you will want your working directory to be someplace sensible, perhaps created for the specific project that you are working on.

What is S?

S is a language that was developed by John Chambers and others at the now-defunct Bell Labs. It was initiated in 1976 as an internal statistical analysis environment, one that people at Bell Labs could use to analyze data.

Describe S-Plus. How is it different from S?

S-Plus is an implementation of S. S-Plus has a number of fancy features built into it, for example graphical user interfaces and all kinds of nice tools.

Why is it a good idea to learn Unix if you are interested in the R language?

Some R commands are the same as their equivalent commands on Linux or on a Mac. Both Linux and Mac operating systems are based on an operating system called Unix. It's always a good idea to learn more about Unix!

Discuss the similarity between S or S-Plus and R languages. Why is this similarity important?

One feature of early R that was important back when people were still using S-Plus was that the R syntax was very similar to S, which made it easy for S-Plus users to switch over. This feature isn't quite so relevant today, when most people go to R directly. The semantics are superficially similar to S, in that R looks like S, but in reality they are quite different.

Why is knowing what operating system you are using important to know before posting a question on a mailing list or forum?

Sometimes it's important to know what operating system you're using, whether that is a Mac, Windows, Linux, or some other UNIX machine. Some problems can be traced to the type of operating system that you're using.

Describe the reference document The R Data Import and Export Manual.

The R Data Import and Export Manual is useful for learning the various ways of getting data into and out of R.

Describe the reference document The R Installation Administration Manual.

The R Installation Administration Manual is most useful if you want to build R from the source code.

Describe the reference document The R Internals Manual.

The R Internals Manual is a really technical document about how R is designed. It explains how R is implemented at a very low level. It is not really for the faint of heart. But if you are the kind of person who wants to know how R works at a very, very low level, this is the document for you.

Describe the reference document The Writing R Extensions Manual.

The Writing R Extensions Manual is useful to read if you are thinking of developing R packages. R packages are extensions to the main system.

Describe the base R system.

The base system contains what is called the base package, which has all the low-level fundamental functions that you need to run the R system, along with the other packages contained in the base system.

Describe the 2nd of 4 basic principles about free software as it relates to R.

The freedom to study how the program works and adapt it to your needs. You can look at the source code for R itself. You can make changes to it if you want. You may improve it or make a better version of it. You can sell changes to it if you want. You can modify the program any way you want and adapt it to your needs.

Describe the graphics capabilities of R and why that is an advantage.

The graphics capabilities of R are very sophisticated, give the user a lot of control over how graphics are created, and are considered to be better than those of most statistical packages.

Discuss the "based on 40-year-old technology" drawback of the R language.

The original S language developed in the 1970s was based on a couple of principles, and the basic ideas have not changed too much since then.

True or False. Primary source code for R can only be modified by members of the R core group.

True

What is the convention used for saving written R code or function?

Usually the name of the file has a .R extension. So if the file name is Mycode, then Mycode.R would be used as the file name.

Describe what happened with the development of version four of the S programming language.

Version four of the S language was released in 1998, and it is the version we more or less use today. The book Programming with Data, which is a reference for this course, was written by John Chambers and is sometimes called the green book. It documents version four of the S language.

Why is knowing the version of the product you are using important to know before posting a question on a mailing list or forum?

What version of R are you using? What version of the R packages are you using, if the problem is specific to a given package? Often there may be legitimate bugs in older versions of R or R packages, and your problem might be solvable if you just upgrade to the latest version. So if you are using the latest version of R, it is important to mention that.

Describe the 4th of 4 basic principles about free software as it relates to R.

You have the freedom to improve the program and release your improvements to the public so the whole community benefits. The idea is that when people make changes to the program they can release them to the public so that everyone gets those changes.

Describe the 3rd of 4 basic principles about free software as it relates to R.

You have the freedom to redistribute copies so you can help your neighbor. The idea is that you can give copies to other people. You can sell copies. You can do whatever you want with it.

Describe how dir.create( ) and file.path( ) can be combined all in one command to create a directory in the current working directory called "testdir2" and a subdirectory for it called "testdir3". What is one required setting with this combined command?

dir.create(file.path('testdir2', 'testdir3'), recursive = TRUE) Note: The 'recursive' argument. In order to create nested directories, 'recursive' must be set to TRUE.
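
The effect of the 'recursive' argument can be seen in a small sketch. The scratch location base_dir is a hypothetical name and the use of a temporary path is an assumption made here so that nothing is created in your real working directory.

```r
# Build the nested path testdir2/testdir3 under a fresh scratch location.
base_dir <- tempfile("demo_")    # hypothetical parent that does not exist yet
nested <- file.path(base_dir, "testdir2", "testdir3")

# Without recursive = TRUE the call fails because the parents are missing.
ok_without <- suppressWarnings(dir.create(nested))

# With recursive = TRUE every missing level is created in one call.
ok_with <- dir.create(nested, recursive = TRUE)
```

Here ok_without is FALSE and ok_with is TRUE, since dir.create( ) returns a logical value indicating success.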

Describe the command in R for making a copy of a file.

file.copy("<old file name>.R", "<new file name>.R") Note that the .R is the file extension for R files.
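
A short sketch of file.copy( ) in action. The file names come from tempfile( ) and are hypothetical; the point about overwriting reflects the function's default of overwrite = FALSE.

```r
# Create a small source file to copy; both paths are hypothetical temp files.
src <- tempfile(fileext = ".R")
dst <- tempfile(fileext = ".R")
writeLines("x <- 1", src)

copied <- file.copy(src, dst)                        # TRUE on success
same <- identical(readLines(src), readLines(dst))    # contents match

# By default an existing destination is not overwritten, so this returns FALSE.
again <- file.copy(src, dst)
```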

Describe the command in R to create a file and give it a name of your choice.

file.create("<file name of your choice>")

Describe the command in R to confirm if a file with a given name of your choice exists in your current working directory.

file.exists("<file name of your choice>")
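
The two commands file.create( ) and file.exists( ) work well together, as this sketch suggests. The path new_file is a hypothetical temporary file chosen so the example is safe to run anywhere.

```r
# A unique temporary path, so the file is known not to exist beforehand.
new_file <- tempfile("mytest_", fileext = ".R")

before <- file.exists(new_file)   # FALSE: nothing has been created yet
file.create(new_file)             # creates an empty file
after <- file.exists(new_file)    # TRUE: the file is now there
```

Note that file.create( ) truncates a file that already exists, so checking with file.exists( ) first can protect existing work.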

Why is being able to answer the question of what you expect the output to be important before posting a question on a mailing list or forum?

If your expectation is wrong, then of course it may or may not be an error, depending on what your expectation should be. So what you expect the output to be will indicate the nature of the error and what needs to be solved. Given your expectation, you need to say what you see instead. What was the unexpected thing that prompted the question?

What is an alternative command to enter into the R console window to view the current directory contents or list all the files in your working directory?

list.files( )
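
As a sketch, list.files( ) can also take a directory and a pattern. The scratch folder and the two file names below are hypothetical and exist only to give the listing something to show.

```r
# Set up a scratch folder holding two empty files.
scratch <- tempfile("listing_")
dir.create(scratch)
file.create(file.path(scratch, c("a.R", "b.csv")))

all_files <- list.files(scratch)                    # every entry in the folder
r_only <- list.files(scratch, pattern = "\\.R$")    # only names ending in .R
```

Called with no arguments, list.files( ) lists the current working directory; dir( ) is an alias for the same function.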

Assuming you have located the correct directory for the file you wish to read, give the command to enter into the R console window to read the file mydata.csv and print it to the same console window.

read.csv("mydata.csv")
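
To try this without an existing file, the sketch below first writes a small CSV and then reads it back. The data values and the temporary location are assumptions made for illustration; only the file name mydata.csv comes from the card.

```r
# Write a tiny example data set to mydata.csv in a temporary folder.
csv_path <- file.path(tempdir(), "mydata.csv")
write.csv(data.frame(id = 1:3, score = c(90, 85, 77)),
          csv_path, row.names = FALSE)

# Read it back; read.csv() returns a data frame.
mydata <- read.csv(csv_path)
print(mydata)
```

Assigning the result to a name such as mydata keeps the data available for later use instead of only printing it.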

In order to write R code you need to be able to use a ______________.

text editor

What is one very important detail to include in posting a question on a mailing list or forum?

If you don't find the answer it is important to let everyone know all the different methods you used to find a solution that did not work. Letting people know that you have done your homework, and you have looked in a variety of places is very useful, and it saves a lot of time.

Describe the world of packages available for download on CRAN that are not included in the base R system.

On CRAN there are about 4,000 packages that have been developed by users and programmers all around the world. These packages are user contributed. They are not controlled by the R core. CRAN has a lot of different packages written by users and the number is increasing every day. So it's very exciting to see all these packages on CRAN and to see new ones come up every day.

Discuss the "there is little built-in support for dynamic or 3-D graphics" drawback of the R language.

Things have improved greatly since the old days, and there are now a lot of interesting tools and packages for doing dynamic or 3-D graphics.

True or False. In 1998 the S language won the Association for Computing Machinery's Software System Award, a very prestigious honor.

True

True or False. R is an implementation of the S language, that was originally developed in Bell Labs.

True

True or False. The R system is divided into two, what you can think of as two conceptual parts.

True

True or False. The basic fundamentals of the S language have not really changed since 1998, and the language that existed in 1998 looks more or less like what we use today, at least superficially.

True

How do you see the names of the functions or R code that have been loaded into the console window? This is the same as the list of all the objects in your local workspace.

Type in the following command: ls( )

How do you run a function or some R code in the R console window?

Type in the name you have given to the function, followed by parentheses containing any arguments it needs, and hit Enter.

Describe 2 ways to load a function or R code from the text editor window into the console window?

1) Cut and paste the function or R code into the R console window. 2) Save the function or R code in the text editor and give it a file name with a .R extension. Type the following command into R console window to load the file: source("filename.R")
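
The second approach can be sketched end to end. The file name mycode.R and the functions add_one and double_it are hypothetical, and writeLines( ) stands in for saving the file from the text editor.

```r
# Stand-in for saving a script from the text editor: two functions in one file.
script_path <- file.path(tempdir(), "mycode.R")
writeLines(c(
  "add_one <- function(x) x + 1",
  "double_it <- function(x) x * 2"
), script_path)

source(script_path)   # load everything defined in the file

ls()                  # the workspace now includes add_one and double_it
add_one(2)            # call a loaded function by name with parentheses
double_it(5)
```

After editing the file, running source( ) again reloads the updated definitions.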

Name and describe the 2 types of packages included in the base R system.

1) Fundamental packages - packages everyone uses more or less. 2) Recommended packages - commonly used packages; they may not be critical packages, but they are commonly used by many people.

List the things you need to think about before posting a question on a mailing list or forum.

1) Is it possible to reproduce your problem? 2) What do you expect the output to be? 3) What version of the product are you using? For example, what version of R are you using? 4) What operating system are you using?
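
Several of these details can be collected from within R itself. The sketch below is one possible way to do it and is not part of the original card; the stats package is used only as an example of checking a package version.

```r
# The version of R that is currently running.
R.version.string

# The version of a specific package, here the built-in stats package.
packageVersion("stats")

# The name of the operating system R is running on.
Sys.info()[["sysname"]]

# A combined summary of R version, platform, and loaded packages.
sessionInfo()
```

Pasting the output of sessionInfo( ) into a posting answers the version and operating system questions in one step.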

List 9 advantages of using the R language.

1) It runs on any standard computing platform or operating system. 2) Very active development going on. 3) The core software of R is actually quite lean. 4) Its graphics capabilities are very sophisticated. 5) Best general purpose statistical package. 6) Contains a powerful programming language. 7) It is useful for developing new tools, so it eases the transition from user to programmer. 8) Vibrant user community. 9) It is available for free.

List the 5 basic drawbacks of the R language.

1) It's based on 40 year old technology. 2) There is little built in support for dynamic or 3-D graphics. 3) Functionality is based on consumer demand and basically user contributions. 4) Objects that you manipulate in R have to be stored in the physical memory of the computer. 5) R is not ideal for all possible situations.

Who are the 3 groups active with the R programming language? What does each group do?

1) R-help - is a general mailing list for questions. 2) R-devel - a more specific mailing list for people who are doing development work in R. 3) R-core - controls the source code for R.

Describe what happened with the development of version three of the S programming language.

In 1988, the system was rewritten in the C language to make it more portable across systems, and it began to resemble the system that we have today. This was version three. There was a seminal book called Statistical Models in S, written by John Chambers and Trevor Hastie and sometimes referred to as the white book. It documents all the statistical analysis functionality that came into that version of the language.

Describe how to recover a saved R code or function file.

In the R program, go to File and then select Open script... Then click on the file name of interest and you will see your code in a popped up script or text editor window.

What is another important detail to include in posting a question on a mailing list or forum?

Include as much useful information as possible in the subject line of the posting. Put the important details right away in the subject header before you even get to the body of the message.

How was S initially implemented?

Initially it was implemented as a series of FORTRAN libraries that provided statistical routines which were tedious to have to write over and over again.

Describe the free aspect of the R language.

It doesn't cost any money, so you can download the entire software from the web.

Describe a key design principle of the S programming language.

John Chambers, the original creator of the S language, laid out his key principle in designing the S language. The designers wanted to create an interactive environment where users did not have to think of themselves as programming. Then, as their needs became clearer and their sophistication increased, they should be able to slide gradually into programming, when the language and system aspects would become more important.

Describe the basic idea or philosophy behind the S language and then later the R language.

The basic idea behind the S language and then later the R language is that people would enter the language in an interactive environment, where they could use the environment without knowing anything about programming or having to know very detailed aspects of the language. They could use the environment to look at data and do basic analyses. Then, when they outgrew that environment, they could get into programming. They could learn the language aspects and develop their own tools, and the system would promote this kind of transition from user to programmer.

Why is the leanness of R an important feature about the R programming language?

The core software of R is actually quite lean. Its functionality is divided into modular packages, so you don't have to download and install a massive piece of software. Instead, you can download a very small core of fundamental functions and then add things on as you need them.

When you first start programming in R, what is the first thing you want to do? Why?

The first thing you want to do is figure out what your working directory is, because the working directory is where R finds all of its files for reading and for writing on your computer.
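
Checking and changing the working directory takes two commands, getwd( ) and setwd( ). In this sketch the move to tempdir( ) is only an assumption made for illustration; in practice you would pass the path of your own course folder.

```r
# Show the current working directory and remember it.
original_wd <- getwd()

# Change to another folder; tempdir() is a stand-in for your own project path.
setwd(tempdir())
getwd()                # now reports the temporary folder

# Go back to where we started.
setwd(original_wd)
```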

Describe the 1st of 4 basic principles about free software as it relates to R.

The freedom to run the program for any purpose that you need. There are no restrictions on how you can run the program, when you can run the program, or what you can or cannot do with it.

Why do you want to figure out or set your working directory when using R?

The reason it is important to know and to set your working directory is that when you read data or write things out using functions like read.csv or write.csv, the files will be read from or written to your working directory.

Why is very active development an important feature about the R programming language?

There are frequent releases: major releases come out annually, and bug fixes are often released in between. Development is very active, so things are always happening.

Describe the world of packages available for download other than CRAN that are not included in the base R system.

There are packages that people make available on their personal websites, and there is really no reliable way to keep track of how many packages are available in this fashion. So there are really thousands of packages out there, written by people, that you can discover and use to analyze data.

What is another great resource for finding books about R?

There is a long list of books available on the R website.

Where can you find the 4 basic principles about free software as it relates to R?

These basic freedoms are outlined by the Free Software Foundation, and you can read more about them on its website.

Describe the command in R to get access information about a file. How can you get specific items?

file.info("<file name of your choice>") file.info("<file name of your choice>")$mode Note in the above command, mode is a specific item of information you can get from file.info( ) command.
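
A brief sketch of pulling individual items out of the file.info( ) result. The file info_path is a hypothetical temporary file, and the items shown are a few of the columns the function returns.

```r
# Create a small file to inspect.
info_path <- tempfile(fileext = ".R")
writeLines("x <- 1", info_path)

info <- file.info(info_path)   # a data frame with one row per file

info$size     # file size in bytes
info$isdir    # FALSE, because this is a file and not a directory
info$mtime    # time of last modification
info$mode     # file permissions
```

Storing the result once and then using `$` avoids calling file.info( ) repeatedly for each item.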

Describe the command in R to find the relative path to a file.

file.path("<file name of your choice>")

Describe how to construct file and directory paths that are independent of the operating system your R code is running on. Pass 'folder1' and 'folder2' as arguments to the command to make a platform-independent pathname.

file.path('folder1', 'folder2')
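
A quick sketch of what file.path( ) returns. The extra component data.csv is a hypothetical file name added to show that any number of pieces can be joined.

```r
# Join path components with the platform's file separator.
p <- file.path("folder1", "folder2")
p   # "folder1/folder2"

# More components, including a file name, can be added the same way.
full <- file.path("folder1", "folder2", "data.csv")
```

R uses the forward slash as its separator on Windows as well, so paths built this way work across operating systems.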

Describe the command in R for deleting a file. Describe a scenario where this command won't work.

file.remove("<file name of your choice>") Note that if you have already renamed the file you wish to delete, this command won't work.

Describe the command in R to rename a file.

file.rename("<old file name>.R", "<new file name>.R") Note that the .R is the file extension for R files.
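
Renaming and deleting can be tried together in a small sketch, which also shows why file.remove( ) fails on a name that no longer exists. The paths old_name and new_name are hypothetical temporary files.

```r
# Two hypothetical temporary paths; only the first file is created.
old_name <- tempfile("old_", fileext = ".R")
new_name <- tempfile("new_", fileext = ".R")
file.create(old_name)

renamed <- file.rename(old_name, new_name)   # TRUE on success

# The old name is gone after renaming, so removing it fails and returns FALSE.
removed_old <- suppressWarnings(file.remove(old_name))

# Removing the file under its new name works.
removed_new <- file.remove(new_name)
```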

