Descriptive Analysis + SPSS
What are the first steps in data analysis?
1. Data entry 2. Data cleaning 3. Prepare database for analyses 4. Start analyzing
Why does sample size matter in inferential analysis?
- More evidence → more accurate estimation → stronger inferences - Larger sample = more evidence = more confidence that the true population value will be close to your estimate - Larger sample = less sampling error = more accuracy. Sample size matters because sample size influences accuracy!
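The link between sample size and sampling error can be sketched with the standard error of the mean (sample SD divided by the square root of n). This is a minimal illustration, not part of the original notes: the function name `standard_error` and the SD of 10 are assumptions chosen for the example.

```python
import math


def standard_error(sample_sd: float, n: int) -> float:
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return sample_sd / math.sqrt(n)


# Hypothetical sample SD of 10, held constant while the sample size grows.
errors = {n: standard_error(10.0, n) for n in (25, 100, 400)}

for n, se in errors.items():
    print(f"n={n}: standard error = {se:.2f}")
```

Under these assumptions, quadrupling the sample size halves the standard error, which is why a larger sample gives a more accurate estimate.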
What is predictive analysis?
- Most complex and advanced type (often used for forecasting models) - Uses regression analysis. Management often worries about what will happen in the future, for instance if prices are increased → a prediction model is useful to see how "price increases" (X) affect "sales levels" (Y). e.g., if a band makes their new album more expensive, will that positively or negatively influence their revenue?
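The price (X) and sales (Y) idea above can be sketched as a simple least-squares regression line. The numbers and the helper `fit_line` are hypothetical and only illustrate the technique; in practice the model would be estimated in SPSS or a statistics library.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical data: album price in dollars (X) and units sold (Y).
prices = [8, 10, 12, 14, 16]
sales = [500, 460, 420, 380, 340]

slope, intercept = fit_line(prices, sales)

# Predict sales, and the resulting revenue, at a new price of 18 dollars.
predicted_sales_at_18 = intercept + slope * 18
predicted_revenue_at_18 = 18 * predicted_sales_at_18

print(f"slope={slope:.1f}, intercept={intercept:.1f}")
print(f"predicted sales at $18: {predicted_sales_at_18:.0f}")
print(f"predicted revenue at $18: {predicted_revenue_at_18:.0f}")
```

A negative slope means each price increase is associated with fewer units sold; whether revenue rises or falls depends on how steep that slope is.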
What is descriptive analysis?
- Often done early in the analysis process; the foundation for subsequent analyses - Describes the sample dataset and portrays the "typical" respondent - Uses mean, median, mode, SD, range, frequency tables
What is differences analysis?
- Determines whether groups differ on a certain variable → t-test, ANOVA. e.g., compare male vs. female soccer players' average income; e.g., compare several countries' average cost of living; e.g., compare the daily amount of food eaten by men versus women: who eats more, on average?; e.g., run an experiment to compare which ad was more effective (control group versus experimental group)
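As a rough sketch of the control-versus-experimental comparison, the t statistic for two independent groups can be computed by hand. The ratings are made up, and the Welch (unequal variances) form used here is an assumption; the notes only name the t-test in general.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic for two independent samples."""
    var_a, var_b = variance(group_a), variance(group_b)
    se = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


# Hypothetical ad-effectiveness ratings on a 1-10 scale.
control = [4, 5, 5, 6, 5]
experimental = [7, 8, 6, 7, 7]

t_stat = welch_t(experimental, control)
print(f"mean difference = {mean(experimental) - mean(control)}")
print(f"t statistic = {t_stat:.3f}")
```

The larger the t statistic in absolute value, the stronger the evidence that the group means differ. Turning it into a p-value requires the t distribution, which SPSS or a statistics library would handle.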
Sample size is to (blank) as sampling method is to (blank):
1. Accuracy 2. Representativeness
What are the 4 types of statistical analysis?
1. Descriptive Analysis 2. Differences Analysis 3. Associative Analysis 4. Predictive Analysis
What are 2 sets of descriptive measures in descriptive analysis?
1. Measures of central tendency 2. Measures of variability
Statistical inferences are a set of procedures to estimate or test population values, based on what?
1. The evidence of the sample (i.e., the sample statistic) 2. The sample size
What is Geospatial predictive modeling?
A process for analyzing events through a geographic filter in order to make statements of likelihood for event occurrence or emergence. It is a growing tool in intelligence-led policing and takes a more proactive approach towards disrupting criminal activity (anticipating and preventing crime and terrorism). Example: a crime forecast map of Washington, DC, in which red and orange indicate areas of high risk; the risk assessment was generated using an inductive predictive modeling tool called Signature Analyst.
What is a data codebook?
All variable names & the codes for each possible response to each question that make up the data set.
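One way to picture a codebook is as a lookup table from variable names to response codes. The variables, labels, and codes below are entirely hypothetical, as is the helper `decode`.

```python
# Hypothetical codebook: variable name -> question label and response codes.
codebook = {
    "gender": {
        "label": "Respondent gender",
        "codes": {1: "Male", 2: "Female", 9: "Missing"},
    },
    "satisfaction": {
        "label": "Overall satisfaction",
        "codes": {1: "Low", 2: "Medium", 3: "High", 9: "Missing"},
    },
}


def decode(variable: str, code: int) -> str:
    """Translate a stored numeric code back into its response label."""
    return codebook[variable]["codes"].get(code, "Unknown code")


print(decode("gender", 2))
print(decode("satisfaction", 3))
```

Keeping the codes in one place makes it easy to check during data cleaning that every stored value is a valid response.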
What is a (statistical) inference?
Any form of logic in which you make a general statement (generalization) about an entire population, based on what you know from a sample of that population: drawing a general conclusion based on some (small) evidence. e.g., You go to OB, you see 3 hippies, and say "OB is full of hippies!"... this is an inference you make about a population, based on limited sample data.
What is the relationship between statistics and parameters?
Every sample statistic has a corresponding population parameter. We use sample statistics to estimate population values (parameters). The results are called "population estimates" or statistical inferences.
What is associative analysis?
Investigates whether & how two variables are related (strength and direction of the association: positive or negative?). Uses crosstabs and correlation analysis. e.g., Is there a link between how much time you spend on the toilet and how good you are at playing Candy Crush? e.g., Are brand recall scores positively associated with purchase intentions? e.g., Is higher credibility associated with more message effectiveness? Remember: Correlation does not necessarily mean causation!
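The strength and direction of an association can be sketched with the Pearson correlation coefficient, which runs from -1 to +1. The scores below are hypothetical, and `pearson_r` is a hand-written helper for illustration rather than an SPSS procedure.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical brand recall scores and purchase intention scores.
brand_recall = [1, 2, 3, 4, 5]
purchase_intent = [2, 1, 4, 3, 5]

r = pearson_r(brand_recall, purchase_intent)
print(f"r = {r:.2f}")
```

A positive r means the two scores tend to rise together; the closer to +1 or -1, the stronger the association. It says nothing about which variable, if either, causes the other.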
What is standard deviation in descriptive analysis?
It is the degree of variation/diversity in the values (σ: sigma) - If the SD is small → the distribution is narrow, with a high peak - If the SD is large → the distribution is wide (flat), with a low peak
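A small sketch of the narrow-versus-wide idea: two hypothetical datasets with the same mean but very different standard deviations. `statistics.stdev` computes the sample SD (dividing by n - 1).

```python
from statistics import mean, stdev

# Hypothetical exam scores: same mean, different spread.
narrow = [48, 49, 50, 51, 52]
wide = [30, 40, 50, 60, 70]

narrow_sd = stdev(narrow)
wide_sd = stdev(wide)

print(f"narrow: mean={mean(narrow)}, SD={narrow_sd:.2f}")
print(f"wide:   mean={mean(wide)}, SD={wide_sd:.2f}")
```

Both groups center on the same value, but the second group's SD is ten times larger, so its distribution is much flatter.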
What can the mean tell us in descriptive analysis?
Mean: "the center of gravity/balance point" - Pulled up by large values, pulled down by small values - Can be skewed by outliers - Doesn't deal well with widely varying samples
What can the median tell us in descriptive analysis?
Median: "the middle item in a sorted list" - Handles outliers well (often the most accurate reflection of a group) - Splits data into 2 groups, each with the exact same number of items - To calculate: need to sort the list first - Often overshadowed by the better-known concept of the 'mean' (average)
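The difference in how the mean and the median react to an outlier can be sketched as follows. The income figures are hypothetical.

```python
from statistics import mean, median

# Hypothetical annual incomes in thousands of dollars.
incomes = [30, 32, 35, 38, 40]
incomes_with_outlier = incomes + [500]  # one very high earner joins the group

mean_before, median_before = mean(incomes), median(incomes)
mean_after = mean(incomes_with_outlier)
median_after = median(incomes_with_outlier)

print(f"without outlier: mean={mean_before:.1f}, median={median_before:.1f}")
print(f"with outlier:    mean={mean_after:.1f}, median={median_after:.1f}")
```

In this example a single extreme value pulls the mean far above what most of the group earns, while the median barely moves.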
What can mode tell us in descriptive analysis?
Mode: "most popular" - Suited to nominal data - Gives the choice that was picked most often (whereas the mean can give a value that nobody actually picked) - More effort to compute (tally up the 'votes')
What is mean, median, and mode?
Mode: occurs the most often Median: the middle value Mean: the average Bimodal is if you have 2 modes, trimodal if you have 3.
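All three measures, including the bimodal case, can be computed with Python's standard `statistics` module. The ratings below are hypothetical.

```python
from statistics import mean, median, multimode

# Hypothetical ratings on a 1-4 scale; 2 and 3 each occur twice.
ratings = [1, 2, 2, 3, 3, 4]

modes = multimode(ratings)      # every value tied for most frequent
middle = median(ratings)        # average of the two middle values here
average = mean(ratings)

print(f"mode(s): {modes}")
print(f"median:  {middle}")
print(f"mean:    {average}")
print("bimodal" if len(modes) == 2 else f"{len(modes)} mode(s)")
```

`multimode` returns a list, so a dataset with two equally frequent values shows up as two modes rather than one.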
When should you use the mode?
With nominal-scale data, where the categories have no numeric meaning.
At what measurement levels can you use the mean?
Only with metric scales, i.e., INTERVAL or RATIO measurement levels. Technically, you could calculate a mean for nominal data (through its numeric codes), but it would not make any sense, since the codes are only labels.
What are parameter estimates?
Parameter estimation is the process of using sample information to compute an interval that describes the range of a parameter such as the population mean or the population percentage.
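Computing such an interval for a population mean can be sketched as below. This is a minimal illustration under stated assumptions: it uses the normal (z) approximation, whereas a t-based interval would usually be preferred for a sample this small. The scores and the function name `mean_confidence_interval` are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    standard_error = stdev(sample) / math.sqrt(len(sample))
    center = mean(sample)
    return center - z * standard_error, center + z * standard_error


# Hypothetical exam scores from a sample of 10 students.
exam_scores = [72, 75, 78, 80, 85, 70, 90, 66, 74, 80]

low, high = mean_confidence_interval(exam_scores)
print(f"sample mean = {mean(exam_scores)}")
print(f"95% interval for the population mean: {low:.1f} to {high:.1f}")
```

The sample mean sits at the center of the interval; a larger sample or a smaller SD would make the interval narrower.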
What are parameters?
Parameters refer to the complete census (population) and are indicated with Greek letters. Parameters represent "what we wish to know" about a population, the unknown that we are trying to gauge (e.g., the mean or percentage of a population). This is of course the 'great unknown' that we're trying to find: because we never surveyed the whole population, all we can do is use the sample statistic as a "proxy" for estimating or testing what the corresponding population parameter could be.
What are measures of central tendency?
Report a single piece of information that describes 'the most typical/frequent' response to a question (i.e., average/mean, mode, median). ex: What the average on an exam was.
What are measures of variability?
Report how much the values in a set of values differ from one another (i.e., range, variance, standard deviation, uniformity). ex: Freshmen's views on the value of their degree will probably show more variability than seniors' views on the value of their degree (which are probably more uniform).
Assuming a probability sample, what is the difference between representativeness and accuracy?
Representativeness is about the composition of your sample: do your sample units represent the population? If you draw a probability sample, it is representative, so that is not in question. But you will always have error when you work with a sample. Drawing a sample always produces sampling error, which reflects accuracy: smaller sampling error = more accuracy. To reduce sampling error, YOU NEED A LARGER SAMPLE.
What is Signature Analyst?
Signature Analyst is used to analyze past events and predict where subsequent events are most likely to occur.
Is political polling statistics or parameters?
Statistics, because political polling is conducted only among a sample of voters, not the whole population, so these numbers are statistics, not parameters. After all, that's what a political poll is: a way to gauge, based on sample data, how the country will vote prior to the actual elections.
What is frequency distribution in descriptive analysis?
Tabulation of the number of times that each different value appears in a particular set of values. Remember: measures of variability report how much variation there is in your data (your string of numbers), or how widespread the distribution on a certain variable is.
What is a percent distribution?
A frequency distribution converted to percentages: divide the frequency for each value by the total number of observations for all values, then multiply by 100 to get a percent.
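Tallying a frequency distribution and converting it to a percent distribution can be sketched with `collections.Counter`. The survey answers are hypothetical.

```python
from collections import Counter

# Hypothetical answers to a single multiple-choice survey question.
answers = ["A", "B", "A", "C", "A", "B", "A", "C", "B", "A"]

# Frequency distribution: how many times each value appears.
frequencies = Counter(answers)

# Percent distribution: each frequency divided by the total, times 100.
total = sum(frequencies.values())
percents = {value: 100 * count / total for value, count in frequencies.items()}

for value in sorted(frequencies):
    print(f"{value}: n={frequencies[value]}, {percents[value]:.0f}%")
```

The percentages always add up to 100, which makes distributions from samples of different sizes easier to compare.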
What is data entry?
The creation of a computer file that holds the raw data taken from your questionnaire. a) Manual, one-by-one b) Automatic (e.g., Qualtrics)
What does range tell us in descriptive analysis?
The distance between the lowest value (min.) & the highest value (max.) in an ordered set of values, i.e., how far apart the extremes are.
What does a normal distribution/no skew mean in descriptive analysis?
Theoretically, with a *perfect* normal distribution: the median = the mode = the mean! These 3 values would be exactly the same if your data formed a perfectly symmetrical bell shape.
What are statistics?
They are values that are computed from information provided by a sample (e.g., the mean or percentage of a sample). They are indicated with Roman (Latin) letters.
What is the point of statistical analysis?
To draw conclusions about the population based on sample data (a.k.a. to make statistical inferences). The ultimate goal: estimating/testing population values.
What do we (researchers) mostly deal with and why?
With statistics, because we are taking SAMPLES of populations.