SRA 365 Midterm with quizzes


descriptive statistics

help us understand general trends in the data by providing summary statistics of its shape, average, or the spread of the scores.

You decide to run a statistical test to evaluate whether gender predicts the perceived threat someone feels. In this test, gender would be your:

independent variable

You run another statistical test to evaluate the effect of percent of sensitive data breached (per_sensitive) on the length of the negative financial impact from the data breach (dys_impact). In this test, "per_sensitive" would be your:

independent variable

How would you rewrite your syntax to read in the file if it was comma separated?

install.packages("foreign")
library(foreign)
sralab3CSV <- read.csv("sralab3.csv", header = TRUE)

Use the space provided below to provide a copy of all of the syntax you used to read the data file into R.

install.packages("foreign")
library(foreign)
sralab3SPSS <- read.spss("sralab3.sav", use.value.labels = TRUE, to.data.frame = TRUE)

How would you rewrite your syntax to read in the file if it was tab-delimited?

install.packages("foreign")
library(foreign)
sralab3TXP <- read.delim("sralab3.txt", header = TRUE)

dependent variable

is the variable we identify as the effect. Its value is influenced by (i.e. dependent on) the IV.

independent variable

is the variable we identify as the potential cause. It is used as the input for determining relationships between variables, and its value does not depend on the other variable.
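
A minimal sketch of how the two roles appear in an R model formula: the DV goes on the left of `~` and the IV on the right. The data frame `breaches` and all of its values are made up for illustration; only the variable names `per_sensitive` and `dys_impact` come from the question text.

```r
# Hypothetical toy data: every value here is invented for illustration only
breaches <- data.frame(
  per_sensitive = c(10, 25, 40, 55, 70),  # IV: percent of sensitive data breached
  dys_impact    = c(12, 20, 33, 41, 58)   # DV: length of the negative financial impact
)

# In a formula, the DV is on the left of ~ and the IV is on the right
model <- lm(dys_impact ~ per_sensitive, data = breaches)

# The slope estimates how the DV changes for a one-unit change in the IV
coef(model)
```

Swapping the two sides of the formula would swap the roles, so the formula itself records which variable is treated as the cause and which as the effect.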

Sav file

lecturerSPSS <- read.spss("lecturer data.sav", use.value.labels =TRUE, to.data.frame = TRUE)

central tendency

mean: the average; median: the middle value; mode: the most frequent value
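
A short sketch of the three measures in R, using a made-up vector named `income` (not the course data). Base R has `mean()` and `median()`, but its `mode()` function reports an object's storage type, so the most frequent value is found here by counting with `table()`.

```r
# Made-up incomes for illustration only
income <- c(20, 30, 30, 40, 80)

mean(income)    # arithmetic average: 40
median(income)  # middle value of the sorted data: 30

# mode() in base R returns the storage type, not the statistical mode,
# so count each value and pick the one that occurs most often
counts <- table(income)
most_frequent <- as.numeric(names(counts)[which.max(counts)])
most_frequent   # 30
```

Note that `which.max()` returns only the first maximum, so this sketch reports a single value even when the data have several modes.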

Mean

meanIncome <- mean(lecturerCSV$income)

What level of measurement is "ID"?

nominal

What level of measurement is "gender"?

nominal

Levels of Measurement

nominal, ordinal, interval, ratio
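
One way these levels can be represented in R, as a sketch with made-up values: unordered factors for nominal data, ordered factors for ordinal data, and plain numeric vectors for interval/ratio data. The object names `gender`, `threat`, and `family_income` are illustrative only.

```r
# Nominal: categories with no meaningful order
gender <- factor(c("male", "female", "female"))

# Ordinal: ordered categories; rank is meaningful, distance is not
threat <- factor(c("low", "high", "medium"),
                 levels = c("low", "medium", "high"),
                 ordered = TRUE)

# Interval/ratio: plain numeric vectors, so arithmetic is meaningful
family_income <- c(42000, 58000, 61000)

is.ordered(gender)     # FALSE
is.ordered(threat)     # TRUE
threat[1] < threat[2]  # TRUE: "low" ranks below "high"
mean(family_income)
```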

Install packages

install.packages("foreign")
library(foreign)

Creating charts

install.packages("ggplot2")
library(ggplot2)

Z-Score

z = (x - mean) / sd
Zincome <- (lecturerCSV$income - meanIncome) / sdIncome
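
A self-contained sketch of the z-score calculation on a made-up vector (`income` here stands in for `lecturerCSV$income`, which is not available in this example). The parentheses around the subtraction matter: without them, R would divide the mean by the standard deviation first.

```r
# Made-up incomes for illustration only
income <- c(20, 30, 30, 40, 80)

meanIncome <- mean(income)
sdIncome   <- sd(income)

# Subtract the mean first, then divide by the standard deviation
Zincome <- (income - meanIncome) / sdIncome

# Standardized scores have mean 0 and standard deviation 1
mean(Zincome)
sd(Zincome)

# The built-in scale() function performs the same standardization
all.equal(Zincome, as.numeric(scale(income)))
```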

Command

made up of objects and functions

What level of measurement is "family.size"?

ordinal

What level of measurement is "perceived.threat"?

ordinal

What level of measurement is "family.income"?

ratio

Standard deviation

sdIncome <- sd(lecturerCSV$income)

first

set working directory

normal distribution key features

symmetric, unimodal, asymptotic (the tails approach the horizontal axis but never touch it)

continuous

variables are numeric variables that can take infinitely many values between any two values. They can be numeric or date/time.

categorical

variables contain a finite number of categories or distinct groups, e.g., gender, ethnicity, shoe size

CSV file

objectName <- read.csv("lecturer data.csv", header = TRUE)

Histogram

Histogram <- ggplot(lecturerCSV, aes(num_people))
Histogram + geom_histogram()
Histogram + geom_histogram() + labs(x = "Number of People (in millions)", y = "Frequency")

Scatterplot

Scatterplot <- ggplot(lecturerCSV, aes(num_records, num_people))
Scatterplot + geom_point()
Scatterplot + geom_point() + labs(x = "Number of records breached", y = "Number of people impacted")

Box plot

Boxplot <- ggplot(lecturerCSV, aes(num_people_v2, fin_loss))
Boxplot + geom_boxplot()
Boxplot + geom_boxplot() + labs(x = "Number of People Impacted", y = "Financial Loss")

Why use R

- It's free
- It's powerful
- It's expandable
- It's commonly used in industry

Levels of measurement vary based on position and distance. Select the variable(s) in the list in which the position of the values is interpretable.

family.income, family.size, anger, fear, perceived.threat

What level of measurement is "anger"?

interval

good statistical graphs should

- Show the data
- Induce the reader to think about the data being presented (rather than some other aspect of the graph, like the color)
- Avoid distorting the data
- Present many numbers with minimum ink
- Make large data sets coherent
- Encourage the reader to compare different pieces of data
- Reveal the data

normal distribution

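68-95-99.7 rule: about 68% of values fall within 1 standard deviation of the mean, about 95% within 2, and about 99.7% within 3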
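
The rule can be checked numerically with R's normal cumulative distribution function `pnorm()`. This is a small sketch; the helper name `within_sd` is made up for illustration.

```r
# Proportion of a normal distribution lying within k standard deviations
# of the mean: area below +k minus area below -k
within_sd <- function(k) pnorm(k) - pnorm(-k)

round(within_sd(1:3), 3)  # 0.683 0.954 0.997
```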

nominal

At the lowest end of the continuum is a nominal scale. With nominal scales, as the name suggests, numbers are assigned mainly to name or identify the variable. The value of the number is not meaningful nor is the distance between the numbers. A common example of this is gender. If we assign a 1 for male and a 2 for female in a data set, we cannot interpret that women are better because the value assigned to them is higher. We also cannot interpret fractions between the categories of male and female.

Give an example of a command, an object, and a function in the syntax you provided above.

Command: sralab3CSV <- read.csv("sralab3.csv", header = TRUE)
Object: sralab3CSV
Function: read.csv("sralab3.csv", header = TRUE)

Parts

Console, editor, graphics

To set your working directory in RStudio you can use the shortcut keys: Ctrl + Shift +

H

Select the variable(s) in the list that would be considered categorical:

ID, gender, family.in.range, family.size, perceived.threat

To create a new script file in RStudio you can use the shortcut keys: Ctrl + Shift +

N

Tab-delimited file (txt)

objectName <- read.delim("lecturer data.txt", header = TRUE)
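
A self-contained sketch comparing the comma-separated and tab-delimited readers. Because the course data files are not available here, it first writes a made-up data frame (`toy`) to temporary files; the object names and values are illustrative only.

```r
# Made-up data written to temporary files so the example runs anywhere
toy <- data.frame(id = 1:3, income = c(20, 30, 40))
csv_path <- tempfile(fileext = ".csv")
txt_path <- tempfile(fileext = ".txt")
write.csv(toy, csv_path, row.names = FALSE)
write.table(toy, txt_path, sep = "\t", row.names = FALSE, quote = FALSE)

# read.csv() expects commas between fields; read.delim() expects tabs
fromCSV <- read.csv(csv_path, header = TRUE)
fromTXT <- read.delim(txt_path, header = TRUE)

# Both readers should recover the same table
all.equal(fromCSV, fromTXT)
```

Both functions are part of base R, so neither requires the `foreign` package; that package is needed only for `read.spss()`.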

interval/ratio

On the other end of the spectrum, interval and ratio scales are more precise, with ratio scales being the more precise of the two. Both scales are considered continuous because the distances between the values are interpretable. The difference between them is that a ratio scale has a true 0 point whereas an interval scale does not. Temperature is a good example: the Fahrenheit and Celsius scales are both interval scales because a 0 on either scale does not reflect the absence of heat. On the Kelvin scale, however, there is a true 0 point, at which 0 indicates no heat. Teasing apart the differences between interval and ratio scales can be difficult. Throughout the rest of the course we will collapse these two scales of measurement into one, but it is important to know that there is an underlying difference between them.

Ordinal

Ordinal scales are more precise than nominal scales. With such scales you can interpret the value of the number but not the distances between the numbers. A classic example comes from sports. Athletes who place first and second in an Olympic race are separated by only fractions of a second, whereas athletes who place first and second in a high school race may be separated by seconds or even minutes. Ordinal scales treat these two cases the same because they do not take into account the distances between the numbers, only the position or rank of each number. Both nominal and ordinal scales are considered categorical because in both the distance between the values is NOT interpretable.

inferential statistics

allow us to take the data analyses a step further by using our sample data to make inferences about the broader population

You decide to run a statistical test to evaluate whether the number of people impacted in a data breach (num_people) can be predicted by the type of data breached (data_type). In this test, "num_people" would be your:

dependent variable

You run another statistical test to evaluate whether levels of anger vary based on the family being in the rocket range. In this test, anger would be your:

dependent variable

