Data Analytics - Exam 1
sample mean equation
x̄ = ∑x ÷ n (notation called x-bar)
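A minimal sketch of the sample mean formula in Python. The function name `sample_mean` and the sample values are made up for illustration only.

```python
def sample_mean(values):
    """Return x-bar: the sum of the values divided by the sample size n."""
    if not values:
        raise ValueError("sample mean is undefined for an empty sample")
    return sum(values) / len(values)

# Hypothetical sample, chosen only for illustration.
sample = [2, 4, 6, 8]
x_bar = sample_mean(sample)
print(x_bar)  # 5.0
```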
steps to analyze data
1. recognize a problem that needs to be solved (e.g., sales going down); 2. gather the data; 3. analyze it; 4. act on the analysis by changing policies
samples should be
1. representative of the population; 2. randomly chosen
Of these 1,000 customers, how many are married males at least 60 years old with salaries exceeding $125,000?
26
If a value represents the 95th percentile, this means that:
95% of all values are below this value
quartile
A division of the total into four intervals, each one representing one-fourth of the total.
measure of central tendency
A number that describes roughly where the data are located or centered along the number line
categorical data
Data that consists of names, labels, or other nonnumerical values
Examples of ordinal data
Rate your pain on a scale of 10; 1 - good, 2 - ok, 3 - bad
percentile
Specific point in a distribution of data that has a given percentage of cases below it.
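One way to check the "percentage of cases below it" idea is to count directly. This is a sketch under assumptions: the helper `percent_below` and the data are hypothetical, and textbooks and software use several slightly different percentile conventions.

```python
def percent_below(values, x):
    """Return the percentage of values strictly below x."""
    if not values:
        raise ValueError("need at least one value")
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)

# Hypothetical data: the integers 1 through 20.
scores = list(range(1, 21))
print(percent_below(scores, 20))  # 95.0, so 20 sits at the 95th percentile here
```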
mode
The value that occurs most frequently in a given data set
True or False: A variable (or field or attribute) is a characteristic of members of a population, whereas an observation (or case or record) is a list of all variable values for a single member of a population
True
True or False: Age, height, and weight are examples of numerical data
True
measure of location
A value reported as representative of the observations in a numerical data set
panel data
a combination of cross-sectional and time series data (*most desirable*)
data set
all the data collected in a particular study
median is often used for
annual wages
data
the facts and figures (text, voice ...) collected, analyzed, and summarized for presentation and interpretation
mean
average of all the data values *central location*
if data has two modes, the data is
bimodal
statistics
the art and science of collecting, analyzing, presenting, and interpreting data
Gender and states of residence are examples of
categorical data
qualitative
categorical data: classifies subjects into categories (two types: ordinal and nominal)
to present data, we use:
charts; numerical summaries (%, avg.); tables of summary measures
structure of data
cross-sectional, time series, panel data
cross-sectional data
data collected at the same or approximately the same point in time
time series data
data collected over several time periods
examples of nominal data
gender, hair color, where you live, religion
The average score for a class of 30 students was 75. The 20 male students in the class averaged 70. The 10 female students in the class averaged:
higher than the males
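The arithmetic behind this answer can be worked out from the fact that mean × n = sum. The variable names below are illustrative; the numbers come from the question.

```python
class_size, class_avg = 30, 75
male_count, male_avg = 20, 70
female_count = class_size - male_count      # 10 female students

total_points = class_size * class_avg       # 30 * 75 = 2250
male_points = male_count * male_avg         # 20 * 70 = 1400
female_avg = (total_points - male_points) / female_count

print(female_avg)  # 85.0, which is higher than the male average of 70
```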
The difference between the first and third quartile is called the:
interquartile range
example of coding categories
marital status; dummy variable
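A small sketch of coding a category as a dummy (0/1) variable. The status values and the choice of "married" as the coded category are assumptions made for illustration.

```python
# Hypothetical marital-status responses.
statuses = ["married", "single", "married", "divorced"]

# Dummy variable: 1 if the subject is married, 0 otherwise.
married_dummy = [1 if status == "married" else 0 for status in statuses]

print(married_dummy)  # [1, 0, 1, 0]
```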
most common central tendencies
mean, median, mode, percentile, quartile
descriptive statistics
measure of location, measure of central tendency, measure of variability
The interquartile range (IQR) encompasses what percent of the observations?
middle 50%
The median can also be described as the:
middle observation when the data values are arranged in ascending order
if the data has more than two modes, the data is
multimodal
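A sketch of finding the mode(s) with Python's standard `statistics.multimode` (Python 3.8+). The helper `describe_modes` and its labels are illustrative. One caveat: when every value occurs exactly once, `multimode` returns all of them, so the label is not meaningful in that case. The data are the streaming expenditures from the Abby question in these notes.

```python
from statistics import multimode

def describe_modes(values):
    """Return the list of modes and a label based on how many there are."""
    if not values:
        raise ValueError("need at least one value")
    modes = multimode(values)  # every value tied for the highest frequency
    if len(modes) == 1:
        label = "unimodal"
    elif len(modes) == 2:
        label = "bimodal"
    else:
        label = "multimodal"
    return modes, label

spending = [6, 4, 8, 9, 6, 12, 4]
print(describe_modes(spending))  # ([6, 4], 'bimodal')
```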
ordinal
natural order
nominal
no natural order
quantitative
numerical data
a nonnumerical label or numerical code may be used for both
ordinal and nominal variables
two categories of categorical data
ordinal and nominal
in stats, we are interested in obtaining info about the total collection of observations, which we will refer to as the
population
types of data
qualitative and quantitative
as a measure of variability, what is defined as the maximum value minus the minimum value?
range
measures of variability
range, standard deviation, variance
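The three measures can be sketched with Python's standard `statistics` module. The notes do not say whether the sample or population formulas are intended; this sketch assumes the sample versions (dividing by n − 1). The data are the streaming expenditures from the Abby question.

```python
import statistics

spending = [6, 4, 8, 9, 6, 12, 4]

data_range = max(spending) - min(spending)        # 12 - 4 = 8
sample_variance = statistics.variance(spending)   # sample variance, divides by n - 1
sample_std_dev = statistics.stdev(spending)       # square root of the sample variance

print(data_range, sample_variance, sample_std_dev)
```

For the population versions, `statistics.pvariance` and `statistics.pstdev` divide by N instead.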
the population is often too large... thus, we try to learn about the population by observing and examining a sub group of its elements which is a
sample (saves money and time)
what happens if the sample size (n) is odd
sample median is the single middle value
Student  Major     Exam Score  Campus
1        Psych     92          yes
2        Business  75          no
3        Com       88          no
4        Econ      33          yes
5        Art       15          yes
Who are the observations? Who are the variables? How many subjects are there?
students are the observations; major, exam score, and campus are the variables; there are 5 subjects
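The same data set can be sketched as a list of records, where each record is one observation and each key other than the student ID is a variable. The key names are illustrative; the values come from the card above.

```python
records = [
    {"student": 1, "major": "Psych",    "exam_score": 92, "campus": "yes"},
    {"student": 2, "major": "Business", "exam_score": 75, "campus": "no"},
    {"student": 3, "major": "Com",      "exam_score": 88, "campus": "no"},
    {"student": 4, "major": "Econ",     "exam_score": 33, "campus": "yes"},
    {"student": 5, "major": "Art",      "exam_score": 15, "campus": "yes"},
]

n_observations = len(records)  # one record per student (subject)
variables = [key for key in records[0] if key != "student"]

print(n_observations)  # 5
print(variables)       # ['major', 'exam_score', 'campus']
```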
what does observations mean in data?
subjects or elements
numerical data
tells you how much or how many of something there is ( x )
variables
the characteristics of subjects about which we collect info; they can be divided into two categories
Interquartile Range (IQR)
the difference between the first and third quartiles
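A sketch of computing the IQR with `statistics.quantiles` (Python 3.8+). Quartile conventions differ between textbooks and software, so other tools may give slightly different quartiles; this sketch assumes the module's default "exclusive" method. The data are the streaming expenditures from the Abby question.

```python
import statistics

spending = [6, 4, 8, 9, 6, 12, 4]

# n=4 splits the data into quartiles and returns the three cut points.
q1, q2, q3 = statistics.quantiles(spending, n=4)
iqr = q3 - q1  # third quartile minus first quartile

print(q1, q3, iqr)  # 4.0 9.0 5.0
```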
range
the difference between the highest and lowest scores in a distribution
what happens when a sample contains outliers
the median may be more representative of the sample than the mean is
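A quick illustration of why the median can be more representative when outliers are present. The wage figures (in thousands of dollars) are hypothetical, with one deliberately extreme value.

```python
import statistics

# Hypothetical annual wages in $1,000s; 400 is the outlier.
wages = [30, 32, 35, 38, 40, 400]

mean_wage = statistics.mean(wages)      # pulled upward by the outlier
median_wage = statistics.median(wages)  # average of the two middle values, 35 and 38

print(median_wage)  # 36.5
print(mean_wage > max(wages[:-1]))  # True: the mean exceeds every non-outlier wage
```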
what happens if the sample size (n) is even
the sample median is the average of the two middle values
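The odd and even cases can be sketched in one function. The name `sample_median` is illustrative, and the data are taken from the Abby streaming question (all seven weeks for the odd case, the first six for the even case).

```python
def sample_median(values):
    """Return the median: the middle value (odd n) or the mean of the two middle values (even n)."""
    if not values:
        raise ValueError("median is undefined for an empty sample")
    ordered = sorted(values)  # arrange in ascending order first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # average of the two middle values

print(sample_median([6, 4, 8, 9, 6, 12, 4]))  # 6   (n = 7, odd)
print(sample_median([6, 4, 8, 9, 6, 12]))     # 7.0 (n = 6, even)
```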
A variable is classified as ordinal if:
there is a natural ordering of categories
what is the purpose of interpreting data?
to make conclusions
True or False: Abby has been keeping track of what she spends to stream movies. The last seven weeks' expenditures, in dollars, were 6, 4, 8, 9, 6, 12, and 4. The mean amount Abby spends on streaming movies is $7
true
True or False: The mean is a measure of central tendency
true
True or False: The value of the mean times the number of observations equals the sum of all of the data values
true
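Both of the last two claims about the mean can be checked with Abby's streaming data. This is just a verification sketch; the variable names are illustrative.

```python
spending = [6, 4, 8, 9, 6, 12, 4]

n = len(spending)       # 7 observations
total = sum(spending)   # 49
mean = total / n        # 49 / 7

print(mean)               # 7.0
print(mean * n == total)  # True: mean times n equals the sum of the data values
```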
coding categories
two approaches: create codes that match your research purposes, or generate codes from the existing data
median
value in the middle when the data points are arranged in ascending order
population mean equation
µ = ∑x ÷ N (µ is Greek mu): the mean of all values in a POPULATION (sum of all DATA values divided by the number of DATA values in the POPULATION)