SRA 365 Midterm with quizzes
descriptive statistics
help us understand general trends in the data by providing summary statistics of its shape, average, or the spread of the scores.
You decide to run a statistical test to evaluate whether gender predicts the perceived threat someone feels. In this test, gender would be your:
independent variable
You run another statistical test to evaluate the effect of percent of sensitive data breached (per_sensitive) on the length of the negative financial impact from the data breach (dys_impact). In this test, "per_sensitive" would be your:
independent variable
How would you rewrite your syntax to read in the file if it was comma separated?
install.packages("foreign")
library(foreign)
sralab3CVS <- read.csv("sralab3.csv", header = TRUE)
Use the space provided below to provide a copy of all of the syntax you used to read the data file into R.
install.packages("foreign")
library(foreign)
sralab3SPSS <- read.spss("sralab3.sav", use.value.labels = TRUE, to.data.frame = TRUE)
How would you rewrite your syntax to read in the file if it was tab-delimited?
install.packages("foreign")
library(foreign)
sralab3TXP <- read.delim("sralab3.txt", header = TRUE)
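A minimal sketch of the difference between the two readers, assuming a small made-up data frame (`demo` and the temporary file paths are illustrative, not the lab files). `read.csv()` and `read.delim()` ship with base R, so the foreign package is only needed for `read.spss()`.

```r
# Hypothetical data, written to temporary files only so the example runs on its own
demo <- data.frame(id = 1:3, income = c(21000, 35500, 50250))

csv_path <- tempfile(fileext = ".csv")
txt_path <- tempfile(fileext = ".txt")
write.csv(demo, csv_path, row.names = FALSE)                                # comma separated
write.table(demo, txt_path, sep = "\t", row.names = FALSE, quote = FALSE)  # tab-delimited

from_csv <- read.csv(csv_path, header = TRUE)    # expects commas between values
from_txt <- read.delim(txt_path, header = TRUE)  # expects tabs between values
```

Both calls should return the same data frame; only the separator in the file differs.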
dependent variable
is the variable we identify as the effect. Its value is influenced by (i.e. dependent on) the IV.
independent variable
is the variable we identify as the potential cause. It is used as the input for determining relationships between variables, and its value does not depend on the other variable.
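One way to see the IV/DV distinction in R is the formula notation, where the DV sits to the left of `~` and the IV to the right. The sketch below uses invented values and a linear model purely as an illustration; the cards do not say which statistical test the course uses.

```r
# Hypothetical breach data, invented for illustration (not the course data set)
breaches <- data.frame(
  per_sensitive = c(10, 25, 40, 55, 70, 85),  # IV: percent of sensitive data breached
  dys_impact    = c(3, 6, 8, 13, 15, 19)      # DV: length of the negative financial impact
)

# DV on the left of ~, IV on the right
model <- lm(dys_impact ~ per_sensitive, data = breaches)
coef(model)  # intercept and slope for the IV
```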
Sav file
lecturerSPSS <- read.spss("lecturer data.sav", use.value.labels =TRUE, to.data.frame = TRUE)
central tendency
mean: the average; median: the central (middle) value; mode: the most frequent value
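A short sketch of the three measures in R, using made-up incomes. Base R's `mode()` reports an object's storage type rather than the most frequent value, so `stat_mode` below is a hypothetical helper written for this example.

```r
# Hypothetical incomes (in thousands), invented for illustration
income <- c(30, 45, 45, 52, 60, 75, 120)

mean(income)    # average
median(income)  # middle value once sorted

# Hypothetical helper: returns the most frequent value(s)
stat_mode <- function(x) {
  counts <- table(x)                                # frequency of each distinct value
  as.numeric(names(counts)[counts == max(counts)])  # value(s) with the highest count
}
stat_mode(income)
```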
Mean
meanIncome <- mean(lecturerCSV$income)
What level of measurement is "ID"?
nominal
What level of measurement is "gender"?
nominal
Levels of Measurement
nominal, ordinal, interval, ratio
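As a rough sketch, R can represent these levels with different object types: an unordered factor for nominal data, an ordered factor for ordinal data, and a plain numeric vector for interval or ratio data. All values below are invented for illustration.

```r
# Hypothetical values, invented for illustration
gender <- factor(c("male", "female", "female"))   # nominal: labels only
threat <- factor(c("low", "high", "medium"),
                 levels = c("low", "medium", "high"),
                 ordered = TRUE)                   # ordinal: position is interpretable
income <- c(42000, 58000, 61000)                   # ratio: distances and a true zero

is.ordered(gender)     # FALSE
is.ordered(threat)     # TRUE
threat[2] > threat[1]  # TRUE, "high" ranks above "low"
```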
Install packages
install.packages("foreign")
library(foreign)
Creating charts
install.packages("ggplot2")
library(ggplot2)
Z-Score
z = (x - mean) / sd
Zincome <- (lecturerCSV$income - meanIncome) / sdIncome
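A self-contained sketch of the z-score calculation, assuming a made-up `income` vector in place of `lecturerCSV$income`. The comparison with `scale()` is added here only as a cross-check.

```r
# Hypothetical incomes, invented for illustration
income <- c(30, 45, 45, 52, 60, 75, 120)

meanIncome <- mean(income)
sdIncome   <- sd(income)

# Subtract the mean first, then divide by the standard deviation
zIncome <- (income - meanIncome) / sdIncome

# scale() performs the same standardization and returns a one-column matrix
zScaled <- as.numeric(scale(income))
```

The parentheses matter: without them R would divide the mean by the standard deviation before subtracting.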
Command
objects and functions
What level of measurement is "family.size"?
ordinal
What level of measurement is "perceived.threat"?
ordinal
What level of measurement is "family.income"?
ratio
Standard deviation
sdIncome <- sd(lecturerCSV$income)
first
set working directory
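A minimal sketch of setting the working directory from code rather than the menu. `tempdir()` stands in for your own project folder, which is an assumption made so the example runs anywhere.

```r
old_dir <- getwd()  # remember the current working directory
setwd(tempdir())    # tempdir() is a stand-in for your own project folder
getwd()             # confirm where R will now look for data files
setwd(old_dir)      # restore the original directory
```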
normal distribution key features
symmetric, unimodal, asymptotic
continuous
variables are numeric variables that have an infinite number of values between any two values. They can be numeric or date/time.
categorical
variables contain a finite number of categories or distinct groups. Examples: gender, ethnicity, shoe size.
CSV file
objectName <- read.csv("lecturer data.csv", header = TRUE)
Histogram
Histogram <- ggplot(lecturerCSV, aes(num_people))
Histogram + geom_histogram()
Histogram + geom_histogram() + labs(x = "Number of People (in millions)", y = "Frequency")
Scatterplot
Scatterplot <- ggplot(lecturerCSV, aes(num_records, num_people))
Scatterplot + geom_point()
Scatterplot + geom_point() + labs(x = "Number of records breached", y = "Number of people impacted")
Box plot
Boxplot <- ggplot(lecturerCSV, aes(num_people_v2, fin_loss))
Boxplot + geom_boxplot()
Boxplot + geom_boxplot() + labs(x = "Number of People Impacted", y = "Financial Loss")
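The three chart types can be tried without the course file. The sketch below assumes the ggplot2 package is installed and uses an invented `breaches` data frame; the column values are hypothetical, and the `requireNamespace()` guard simply skips the plots when ggplot2 is missing.

```r
# Hypothetical breach data, invented for illustration
breaches <- data.frame(
  num_records = c(120, 340, 560, 800, 950, 1300),
  num_people  = c(1.1, 2.4, 3.0, 4.8, 5.1, 7.9),
  data_type   = c("email", "email", "financial", "financial", "health", "health")
)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # One continuous variable: histogram
  histogram <- ggplot(breaches, aes(num_people)) + geom_histogram(bins = 5)
  # Two continuous variables: scatterplot
  scatterplot <- ggplot(breaches, aes(num_records, num_people)) + geom_point()
  # One categorical and one continuous variable: box plot
  boxplot_gg <- ggplot(breaches, aes(data_type, num_people)) + geom_boxplot()
}
```

Typing the name of a plot object (for example `scatterplot`) at the console draws it in the graphics pane.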
Why use R
- It's free
- It's powerful
- It's expandable
- It's commonly used in industry
Levels of measurement vary based on position and distance. Select the variable(s) in the list in which the position of the values is interpretable.
family.income, family.size, anger, fear, perceived.threat
What level of measurement is "anger"?
interval
good graphs should
- Show the data
- Induce the reader to think about the data being presented (rather than some other aspect of the graph, like the color)
- Avoid distorting the data
- Present many numbers with minimum ink
- Make large data sets coherent
- Encourage the reader to compare different pieces of data
- Reveal data
normal distribution
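68-95-99.7 rule: about 68%, 95%, and 99.7% of values fall within 1, 2, and 3 standard deviations of the mean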
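The percentages behind this rule of thumb can be checked with `pnorm()`, which gives the area under the standard normal curve to the left of a z-score. This is a quick verification sketch, not part of the original cards.

```r
# Area within 1, 2, and 3 standard deviations of the mean
within_1sd <- pnorm(1) - pnorm(-1)
within_2sd <- pnorm(2) - pnorm(-2)
within_3sd <- pnorm(3) - pnorm(-3)

round(c(within_1sd, within_2sd, within_3sd), 3)  # 0.683 0.954 0.997
```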
nominal
At the lowest end of the continuum is a nominal scale. With nominal scales, as the name suggests, numbers are assigned mainly to name or identify the variable. The value of the number is not meaningful nor is the distance between the numbers. A common example of this is gender. If we assign a 1 for male and a 2 for female in a data set, we cannot interpret that women are better because the value assigned to them is higher. We also cannot interpret fractions between the categories of male and female.
Give an example of a command, an object, and a function in the syntax you provided above.
Command: sralab3CVS <- read.csv("sralab3.csv", header = TRUE)
Object: sralab3CVS
Function: read.csv("sralab3.csv", header = TRUE)
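The same three parts show up in any R command of the form `object <- function(arguments)`. The sketch below uses made-up numbers so it can run without a data file.

```r
# Command: the whole line
# Object:  sdIncome (the name that stores the result)
# Function: sd(), applied to a hypothetical vector of incomes
sdIncome <- sd(c(30, 45, 60))

sdIncome           # typing an object's name prints its contents
is.function(sd)    # TRUE
```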
Parts
Console, editor, graphics
To set your working directory in RStudio you can use the shortcut keys: Ctrl + Shift +
H
Select the variable(s) in the list that would be considered categorical:
ID, gender, family.in.range, family.size, perceived.threat
To create a new script file in RStudio you can use the shortcut keys: Ctrl + Shift +
N
Tab-delimited file (txt)
objectName <- read.delim("lecturer data.txt", header = TRUE)
interval/ratio
On the other end of the spectrum, interval and ratio scales are more precise, with ratio scales being the more precise of the two. Both scales are considered to be continuous because it is possible to interpret the distances between the values. The difference between an interval and a ratio scale is that a ratio scale has a true 0 point whereas an interval scale doesn't. An example of this can be seen with the measurement of temperature: the Fahrenheit and Celsius scales are both considered interval scales because a 0 on either scale does not reflect the absence of heat. However, on the Kelvin scale there is a true 0 point, at which 0 indicates no heat. Teasing apart the differences between interval and ratio scales can be difficult. Throughout the rest of the course we will be collapsing these two scales of measurement into one. However, it is important to know that there is an underlying difference between the two.
Ordinal
Ordinal scales are more precise than nominal scales. With such scales you are able to interpret the value of the number but not the distances between the numbers. A classic example of this can be found in the context of sports. Athletes who place first and second in an Olympic race may be separated by only fractions of a second, whereas athletes who place first and second in a high school race may be separated by seconds or even minutes. Ordinal scales treat these two cases the same because they don't take into account the distances between the numbers. These scales only take into account the position or rank of the number. Both nominal and ordinal scales are considered to be categorical variables because in both scales the distance between the values is NOT interpretable.
inferential statistics
allow us to take the data analyses a step further by using our sample data to make inferences about the broader population
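As one hedged illustration of inference, the sketch below simulates anger scores for two hypothetical groups and runs a two-sample t-test. The data, group names, and the choice of `t.test()` are assumptions for this example; the cards do not specify which inferential test the course uses.

```r
set.seed(365)  # arbitrary seed so the simulated sample is reproducible

# Simulated anger scores for two hypothetical groups (not course data)
in_range  <- rnorm(30, mean = 5.5, sd = 1)
out_range <- rnorm(30, mean = 4.5, sd = 1)

# Welch two-sample t-test: uses the samples to infer a population difference
result <- t.test(in_range, out_range)
result$p.value   # probability of a difference this large if the populations were equal
result$conf.int  # 95% confidence interval for the difference in population means
```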
You decide to run a statistical test to evaluate whether the number of people impacted in a data breach (num_people) can be predicted by the type of data breached (data_type). In this test, "num_people" would be your:
dependent variable
You run another statistical test to evaluate whether levels of anger vary based on the family being in the rocket range. In this test, anger would be your:
dependent variable