R
Line plots
Data often comes as a series over time; a line plot shows how a value changes.
plot(t1,D2$DELL,type="l",main='Dell Closing Stock Price',xlab='Time',ylab='Price$')
type="l" draws a line instead of points
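A minimal sketch of the line plot above, runnable on its own. The notes do not show how t1 and D2 were created, so the values here are made up purely for illustration and are not real Dell prices.

```r
# Hypothetical data: 10 time points with made-up closing prices
t1 <- 1:10
D2 <- data.frame(DELL = c(25.1, 25.4, 24.9, 25.8, 26.2,
                          26.0, 26.5, 27.1, 26.8, 27.3))

# type = "l" joins consecutive points with a line
plot(t1, D2$DELL, type = "l",
     main = "Dell Closing Stock Price",
     xlab = "Time", ylab = "Price$")
```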
Paneling Graphs
To put more than one graphic on a panel, call par() first to set up a framework in which to panel the plots:
par(mfrow=c(nrow,ncol))
nrow- number of rows
ncol- number of columns
Puts multiple graphs on one image; the plots fill the grid row by row
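As a rough illustration of paneling, the sketch below places a histogram and a scatterplot side by side. The variables x and y are simulated stand-ins, not data from these notes.

```r
# Simulated values, for illustration only
set.seed(1)
x <- rnorm(50)
y <- x + rnorm(50)

old_par <- par(mfrow = c(1, 2))  # 1 row, 2 columns; keep the old settings
hist(x, main = "Histogram of x")
plot(x, y, main = "x vs. y")
par(old_par)                     # restore the previous layout
```

Saving the value returned by par() and passing it back afterwards is one common way to return to a single-plot layout.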
Data Frame
The most widely useful type of variable. Typically used to store the kind of data most often used for statistical analysis; the closest analog in R to an Excel spreadsheet. Has multiple columns; all values within a column must be of the same type, and all columns must have the same number of rows (observations)
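A small sketch of building and inspecting a data frame. The column names type and weightgain follow the ones used elsewhere in these notes; the values are invented.

```r
# Each column holds one type of value; all columns have 4 rows
mydata <- data.frame(
  type       = c("Low", "High", "Low", "High"),
  weightgain = c(1.2, 3.4, 0.8, 2.9)
)

str(mydata)         # shows each column and its type
nrow(mydata)        # number of observations
mydata$weightgain   # one column, accessed by name
```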
Formatting files for R
Replace headers with a simple name describing the data in a single character string (no spaces), allowing access to those variables by name
Missing values- write in NA; R treats these as missing, and many functions (e.g. mean) skip them only when given na.rm=TRUE
Save the files as CSV (comma separated values)
Data can also be reformatted within R
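To show how missing values behave once a CSV is read in, here is a sketch that writes a tiny made-up file to a temporary location and reads it back. The file contents and the name csv_path are assumptions for the example.

```r
# Write a small made-up CSV so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
writeLines(c("type,weightgain", "Low,1.2", "High,NA", "Low,0.8"), csv_path)

mydata <- read.csv(csv_path)            # first row is used as column names
mean(mydata$weightgain)                 # NA, because one value is missing
mean(mydata$weightgain, na.rm = TRUE)   # mean of the non-missing values
```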
Factor
Special type of character vector, where the text strings signify the levels of a factor to be used in statistical analysis and are encoded internally as integers
Can be treated as nominal data when the order does not matter, or as ordinal data when the order does matter
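The sketch below shows the integer encoding and the difference between an unordered and an ordered factor. The level names are hypothetical.

```r
type <- c("Low", "High", "Low", "Medium")

f <- factor(type)   # unordered; levels are sorted alphabetically
levels(f)
as.integer(f)       # the internal integer codes

# Ordered factor: the level order is stated explicitly
f_ord <- factor(type, levels = c("Low", "Medium", "High"), ordered = TRUE)
f_ord[1] < f_ord[2] # comparisons are meaningful for ordered factors
```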
What is R?
Statistical programming language
The language is very powerful for writing programs
Many statistical functions are already built in
Contributed packages expand the functionality to cutting-edge research
Since it is a programming language, writing computer code to complete tasks is required
History can be saved across sessions
Multiple commands can be put onto one line using ; as a separator between commands
Can be used as a calculator
Can create multiple values with one function, whereas in Excel we have to copy and paste (or copy-down) to get multiple values
Can assign values to variables; one variable can store many different pieces of information (like multiple values or even a whole data set)
R recognizes at least 15 different types of data
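A few of these points can be tried directly at the prompt. The variable names x, a, and b are arbitrary choices for this sketch.

```r
2 + 3 * 4             # R as a calculator

x <- c(1, 2, 3, 4)    # one variable storing several values
sqrt(x)               # one function call returns a value for each element

a <- 5; b <- a * 2    # two commands on one line, separated by ;
```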
Scatterplots
Work well for seeing the relationship between two variables
plot(x,y)
plot(D$metmin,D$wg)
plot(D$metmin,D$wg,main='Met Minutes vs. Weight Gain',xlab='Mets(min)',ylab='WeightGain(lbs)',pch=2)
Creates a plot of met minutes vs. weight gain; pch sets the plotting symbol
Programming Language
Write your own code: adaptable and flexible
Handles tasks with lots of steps
Reproduce exactly what you did in the past (written history)
Computer code provides a sequential, line-by-line record
Box plots
boxplot(mydata$weightgain,main='WeightGain',ylab='WeightGain(lbs)')
main- titles the graph
ylab- titles the y axis
To make boxplots side by side:
1. Subset the data into separate variables, ex.
wg.low <- mydata[mydata$type=="Low",]
wg.high <- mydata[mydata$type=="High",]
2. Create the boxplot
boxplot(wg.low$weightgain,wg.high$weightgain)
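The two steps for side-by-side boxplots can be sketched end to end as follows. The contents of mydata are invented here, since the notes do not include the original data set.

```r
# Made-up data with the column names used in these notes
mydata <- data.frame(
  type       = c("Low", "Low", "Low", "High", "High", "High"),
  weightgain = c(0.8, 1.2, 1.5, 2.9, 3.4, 4.1)
)

# 1. Subset by group: == tests equality, <- assigns
wg.low  <- mydata[mydata$type == "Low", ]
wg.high <- mydata[mydata$type == "High", ]

# 2. One box per group; names labels the boxes
boxplot(wg.low$weightgain, wg.high$weightgain,
        names = c("Low", "High"), ylab = "WeightGain(lbs)")
```

The formula form boxplot(weightgain ~ type, data = mydata) is an alternative that avoids the manual subsetting; it orders the boxes by the levels of type.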
Histograms
hist(mydata$weightgain,main='WeightGain',xlab='WeightGain',ylab='Frequency',col='blue')
main- gives the graph a title/heading
xlab- titles the x axis
ylab- titles the y axis
col- alters the color of the bars in the graph