QBA 2302 Exam 1


*Construct and analyze a box and whisker plot*

*Box and Whisker Plot- A graphic display that shows the general shape of a variable's distribution. It is based on five statistics: the minimum, the first quartile (Q1), the median, the third quartile (Q3), and the maximum. (The position of the median within the box indicates the skew.)*
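A minimal Python sketch of the five statistics behind a box plot. It assumes Q1 and Q3 are found as the medians of the lower and upper halves of the ordered data, leaving out the middle value when n is odd. Other software may use a different quartile convention and give slightly different values. The function names and sample data are hypothetical.

```python
def median_of_sorted(values):
    """Median of a list that is already sorted (assumes at least one value)."""
    n = len(values)
    mid = n // 2
    if n % 2 == 1:
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum); assumes at least two values."""
    xs = sorted(data)
    n = len(xs)
    lower_half = xs[: n // 2]        # values below the median position
    upper_half = xs[(n + 1) // 2 :]  # values above the median position
    return (
        xs[0],
        median_of_sorted(lower_half),
        median_of_sorted(xs),
        median_of_sorted(upper_half),
        xs[-1],
    )


print(five_number_summary([2, 4, 4, 5, 7, 8, 9, 12, 15]))  # (2, 4.0, 7, 10.5, 15)
```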

Quasi-Experiment

(one type of correlational study) Studies that look like experiments on the surface but lack random assignment of subjects (ex. asking for volunteers to participate)

*Display a frequency table using a bar or pie chart.*

*Bar Charts- an illustration of qualitative data representing a variable's categories on the x-axis as independent vertical bars and simple frequencies on the y-axis* & *Pie Chart- an illustration of qualitative data representing a variable's categories as portions of a circle (or slices of a pie).*

Qualitative Variables

Can be translated with either summaries or illustrations

Experimentation

If ________________ is possible, you always want to conduct an experiment (because it will always give you the highest quality answer to your questions)

representative

If a sample is highly _____________, conclusions drawn from the sample are likely to reflect conclusions that would have been drawn from the population.

representative

If a sample is not _____________, conclusions drawn from the sample are unlikely to reflect the population.

"tail"

If the ______ points towards positive numbers, it's positive skew. If the _____ points towards negative numbers, it's negative skew.

Illustrate

If you are trying to translate data for a client or customer, you will usually choose to ___________ the data rather than to summarize them

Illustrations

If you want to deliver a clear, powerful message about a variable, _____________ are the most useful

nominal, ordinal, interval, and ratio (in that order, each is more specific and has more requirements than the one before)

What are the 4 scales of measurement?

temperature in degrees Celsius

What is an example of Interval Data?

Rank data

What is an example of Ordinal Data?

temperature in kelvins

What is an example of Ratio Data?

addition and subtraction

What kinds of math do Interval Scales use?

multiplication and division

What kinds of math do Ratio Scales use?

robust

When a statistic has fewer assumptions, it is described as being more _________ than statistics with more assumptions.

correlational study

When an experiment is not possible, a researcher often uses a ____________________

Sample Mean

$\bar{x} = \frac{\sum x}{n}$, where $\bar{x}$ = sample mean, $n$ = number of values in the sample, $x$ = any particular value, and $\sum$ (capital sigma) = sum of the x values in the sample
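The sample mean formula translates almost directly into code. A short sketch; the function name and data are made up for illustration:

```python
def sample_mean(xs):
    """x-bar = (sum of x) / n"""
    n = len(xs)          # number of values in the sample
    return sum(xs) / n   # sigma x divided by n


print(sample_mean([12, 15, 9, 20, 14]))  # 14.0
```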

precise

You want to use the most __________ statistics whose assumptions are met.

Mean/Arithmetic Mean

____________ is affected by extreme values/outliers

Quantitative variables

______________ are best translated with illustrations.

Scatterplots

_______________ can be used to examine qualitative OR quantitative data, and can even be used to compare quantitative and qualitative data with each other

Representativeness

_________________ can never be addressed with numbers, it is a rational argument.

Normal Distribution

a bell-shaped curved line (a common shape that data take)

Variable

a collection of data with different values based upon its source, represented in a dataset as a column

Fractions

a common way of representing portions of wholes

A frequency

a count of how many times a value appears in a variable within a dataset; frequencies are symbolized with an italicized "f" and are sometimes called "simple frequencies"

Case

a group of data, collected across one or more variables from one source, represented in a dataset as a complete row

Parameter

a measurable characteristic of a population

Datum

a single value collected in the context of research, could be a number/letter/word

Dichotomous data

a special type of nominal data with only two possible values (ex. "male" or "female")

Hypothesis

a testable relationship between operational definitions that reflects a theory

1. You could be fooled by others with their poor translations, 2. You could fool others with your poor translations, 3. You could fool yourself with your poor translations

Reasons not to be misled by a pretty picture:

Theory

Specific beliefs about what might be true in a relationship between constructs.

Arithmetic Mean

The ______________ is the most widely reported measure of location

Central tendency

refers to a family of statistics that can be used to determine where the "middle" of a data set is (the approach you choose to find the "middle" depends on the data's shape --- this dependency is called an "assumption")

Relative Frequency

represented by the term "rel.f" (rel.f = f/n)

n

sample size (the number of cases in a particular sample)

Frequency Polygon

similar to a histogram, also shows the shape of a distribution (these are good to use when comparing two or more distributions)

Subscripts

smaller letters used beneath terms in formulas to denote where a variable came from

Range

the difference between the maximum and minimum values in a set of data (= Maximum Value - Minimum Value)

Treatment Condition

the group you chose to receive the treatment

Control condition

the group you did not choose to receive the treatment, they receive nothing different

Q2

the median of the entire data set

Q1

the median of the lower half of the data

Q3

the median of the upper half of the data

Median

the midpoint of the values after they have been ordered from the minimum to the maximum values

Cumulative Frequency

the running total of frequencies (the number of cases at or below a given value or class), represented as "cum.f"

Class Frequency

the number of observations in a class

Cumulative Percentage

the running total of percentages (the percentage of cases at or below a given value or class), represented as "cum.%"

Random Assignment

the process by which a sample is randomly split into two or more groups during an experiment

Percentage

the value of a proportion multiplied by 100, followed by the percentage symbol (%)

Dispersion

the variation or spread in a set of data

Operational Definition

the way we represent a construct in a dataset (this definition is sometimes referred to as the "operationalization" of a construct)

Relative Frequency Distributions

to find the relative frequencies, simply take the class frequency and divide by the total number of observations

Conditions

two or more groups that a sample is randomly split into during an experiment

Inferential Statistics

used to estimate properties of a population, the methods used to estimate a property of a population on the basis of a sample

Descriptive Statistics

used to organize data into a meaningful form, methods of organizing, summarizing, and presenting data in an informative way

Dataset

when multiple related data are collected in one place

Descriptive & Inferential

What are the 2 types of Statistics?

Likert-type Item

A particular type of question found on surveys

"i" >=

(highest value - lowest value)/k, where "i" is the class interval (bin width) and k is the number of classes

*Differentiate between descriptive and inferential statistics.*

*Descriptive statistics: The techniques used to describe the important characteristics of a set of data. This includes organizing the data values into a frequency distribution, computing measures of location, and computing measures of dispersion and skewness.* vs. *Inferential statistics: The methods used to estimate a property of a population on the basis of a sample.*

*Summarize quantitative variables with frequency and relative frequency distributions.*

*Frequency Distribution: a grouping of quantitative data into mutually exclusive and collectively exhaustive classes showing the number of observations in each class*, 4 Steps to make a Frequency Distribution: 1. Decide on the # of classes (use 2^k > n rule), 2. Determine the class interval ("i"), 3. Set the individual class limits, 4. Tally the data into classes and determine the number of observations in each class, *Relative Frequency Distribution: to find the relative frequencies, simply take the class frequency and divide by the total number of observations*
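The four steps can be sketched in Python as below. Assumptions not stated in the source: the first class starts at the lowest value, i is rounded up to a whole number, each class includes its lower limit but not its upper limit, and extra classes are added if needed so the highest value is covered. Textbooks often round i and the starting limit to more convenient numbers instead. The function name and data are hypothetical.

```python
import math


def frequency_distribution(data):
    """Group data into classes of width i; returns a list of (lower, upper, f)."""
    n = len(data)
    # Step 1: smallest number of classes k with 2^k > n
    k = 1
    while 2 ** k <= n:
        k += 1
    # Step 2: class interval i >= (H - L) / k, rounded up to a whole number
    low, high = min(data), max(data)
    i = math.ceil((high - low) / k) or 1  # width 1 if all values are identical
    # Step 3: class limits [lower, lower + i), adding classes until H is covered
    classes = []
    lower = low
    while lower <= high:
        classes.append((lower, lower + i))
        lower += i
    # Step 4: tally the observations in each class
    return [(a, b, sum(1 for x in data if a <= x < b)) for a, b in classes]


data = [23, 27, 31, 35, 36, 38, 41, 42, 44, 45, 47, 52, 55, 58, 61]
for lower, upper, f in frequency_distribution(data):
    print(f"{lower} up to {upper}: f = {f}, rel.f = {f / len(data):.3f}")
```

The relative frequency column is simply each class frequency divided by n.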

*Summarize qualitative variables with frequency and relative frequency tables.*

*Frequency Table- A table containing all categories, simple frequencies, relative frequencies and cumulative frequencies for a given variable.* & *Relative Frequency- Count of how many times a value appears in a variable, calculated as a proportion of the total number of cases; the number of observations of a specific outcome compared to the total number of observations in the dataset*... *Did you notice how, in the frequency table, I put the classes in order already from most likely to least likely? This is because this is ordinal data. With ordinal data, make sure you always keep the order of the ordinal data in tables and graphs. If you are dealing with nominal data, I normally encourage you to organize graphic displays from greatest to least frequency and to organize tables alphabetically*

*Compute the mean of grouped data*

*To compute the mean of grouped data, first determine the midpoint (also called a class mark) of each interval, or class. Multiply each midpoint by the frequency of its class. The sum of these products divided by the total number of values is the mean:* $\bar{x} = \frac{\sum fM}{n}$, where M is a class midpoint and f is its class frequency. (Not to be confused with the *Geometric Mean- Variation of mean used for finding the average change of percentages, ratios, indexes, or growth rates over time*.)
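A minimal sketch of the grouped-data calculation, assuming each class is given as (lower limit, upper limit, frequency) and that the midpoint is the average of the two limits. The function name and class data are hypothetical.

```python
def grouped_mean(classes):
    """classes: list of (lower_limit, upper_limit, frequency) tuples."""
    total_fm = 0.0
    n = 0
    for lower, upper, f in classes:
        midpoint = (lower + upper) / 2  # class mark M
        total_fm += f * midpoint        # running sum of f * M
        n += f                          # total number of values
    return total_fm / n


print(grouped_mean([(0, 10, 2), (10, 20, 5), (20, 30, 3)]))  # 16.0
```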

*Display a frequency distribution using a histogram or frequency polygon.*

*Histograms- an illustration of quantitative data representing the range of a variable's values on the x-axis and frequencies of those ranges on the y-axis with no gaps between the bars.*, *Frequency Polygon- an illustration of quantitative data representing bins as segments on a line graph*, *(Bin- A range of values represented as a single bar in a histogram)*

*Compute and interpret the mean, the median, and the mode*

*Median- the midpoint of the values after they have been ordered from the minimum to the maximum values*, *Mode- the value of the observation that appears most frequently* , *Arithmetic Mean- (This is what is typically referred to as the mean) the average of a set of numerical values, calculated by adding all values together and dividing by the number of values in a set*
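Python's standard `statistics` module has functions for all three measures. `multimode` (Python 3.8+) returns every most-frequent value as a list, which reflects that the mode is not always unique. The data below are made up.

```python
import statistics

data = [3, 7, 7, 2, 9, 4, 7, 5]
print(statistics.mean(data))       # arithmetic mean: 44 / 8 = 5.5
print(statistics.median(data))     # midpoint of the ordered values: 6.0
print(statistics.multimode(data))  # most frequent value(s): [7]
```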

*Distinguish between nominal, ordinal, interval, and ratio levels of measurement. (And Dichotomous)*

*Nominal level of measurement: Data recorded at the nominal level of measurement is represented as labels or names. They have no order. They can only be classified and counted.*, *Ordinal level of measurement: Data recorded at the ordinal level of measurement is based on a relative ranking or rating of items based on a defined attribute or qualitative variable. Variables based on this level of measurement are only ranked or counted.*, *Interval level of measurement: For data recorded at the interval level of measurement, the interval or the distance between values is meaningful. The interval level of measurement is based on a scale with a known unit of measurement.*, *Ratio level of measurement: Data recorded at the ratio level of measurement are based on a scale with a known unit of measurement and a meaningful interpretation of zero on the scale.*

*Identify and compute measures of position*

*Percentile - a portion of a whole represented as its share of 100 (e.g., the 50th percentile)*, *Decile - a portion of a whole represented as its share of 10 (e.g., the 50th percentile is equal to the 5th decile)*, *Quartile - a portion of a whole represented as its share of 4 (e.g., the 50th percentile is equal to the 5th decile, which is equal to the 2nd quartile, or median)*
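One common textbook convention for locating a percentile in ordered data is the location $L_p = (n + 1)\frac{P}{100}$, interpolating between neighboring values when the location is not a whole number. The sketch below assumes that convention; other conventions (including the median-of-halves method for quartiles) can give slightly different answers. The function name and data are hypothetical.

```python
def percentile(data, p):
    """Value at the p-th percentile, using location L_p = (n + 1) * p / 100."""
    xs = sorted(data)
    n = len(xs)
    loc = (n + 1) * p / 100   # 1-based position in the ordered data
    if loc <= 1:
        return xs[0]
    if loc >= n:
        return xs[-1]
    whole = int(loc)          # position of the value just below the location
    frac = loc - whole        # how far to move toward the next value
    below, above = xs[whole - 1], xs[whole]
    return below + frac * (above - below)


scores = [5, 8, 12, 15, 19, 22, 26, 30, 33]
print(percentile(scores, 50))  # 50th percentile = 5th decile = 2nd quartile
print(percentile(scores, 25), percentile(scores, 75))  # Q1 and Q3
```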

*Classify variables as qualitative or quantitative, and discrete or continuous.*

*Qualitative variables: When data refers to qualities usually represented in words or letters.*, *Quantitative variables: when a variable can be reported numerically and mathematic functions can be performed*, *Discrete variable: can assume only certain values, and there are "gaps" between the values*, *Continuous variable: can assume any value within a specific range.*

*Construct a scatterplot for 2 quantitative variables*

*Scatterplot- an illustration of two qualitative or quantitative variables where one variable is represented on the x-axis and the other variable is represented on the y-axis. Each case is represented with a mark at the intersection of its scores on those two variables* Bin symbols: $2^k > n$, where k = approximately the number of bins you should use, n = the number of observations in the sample, i = the width of the bin, H = the highest value, and L = the lowest value

*Describe the Shape of Data*

*Shape- Distribution shape is characterized by the number of peaks, symmetry, skewness and uniformity.*, *Normal- a common shape in which data are found resembling a bell or hill*, *Uniform- Data is spread equally across the entire range.*, *Symmetric- the data on the left side of the distribution perfectly mirror the data on the right side. If folded in half, the two sides would match.*

Geometric Mean

*The ______________ is used to find the rate of change from one period to another*, $GM = \sqrt[n]{\frac{\text{Value at end of period}}{\text{Value at start of period}}} - 1$, where n is the number of periods
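A sketch of the rate-of-change form, assuming n is the number of periods between the starting and ending values (e.g., years). The function name and figures are hypothetical.

```python
def gm_rate_of_change(start_value, end_value, n_periods):
    """Average rate of change per period: nth root of (end / start), minus 1."""
    return (end_value / start_value) ** (1 / n_periods) - 1


# e.g., sales of 100 growing to 121 over 2 years
rate = gm_rate_of_change(100, 121, 2)
print(f"{rate:.1%}")  # 10.0%
```

Growing by 10% twice (100 to 110 to 121) reproduces the ending value, which is why the geometric mean, not the arithmetic mean, is the right average for rates of change.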

*Define statistics and provide an example of how statistics is applied.*

*The science of collecting, organizing, analyzing, and interpreting data for the purpose of making more effective decisions.*

*Compute a weighted mean.*

*Weighted Mean- Variation of arithmetic mean used when observations carry different weights (contributions to overall average).* $\bar{x}_w = \frac{w_1x_1 + w_2x_2 + \cdots + w_nx_n}{w_1 + w_2 + \cdots + w_n}$
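A minimal sketch of the weighted mean; the function name and the score/weight figures are hypothetical.

```python
def weighted_mean(values, weights):
    """x-bar_w = sum(w * x) / sum(w)"""
    if len(values) != len(weights):
        raise ValueError("each value needs exactly one weight")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Hypothetical: three scores where the last one counts the most
print(weighted_mean([90, 80, 70], [2, 3, 5]))  # 77.0
```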

Scatter Diagram

*a graphical tool to portray the relationship between two variables or bivariate data (both variables have to be measured with interval or ratio level scale -- must be quantitative)* - If the scatter of points moves from the *lower left to the upper right*, the variables under consideration are *directly or positively related* - If the scatter of points moves from the *upper left to the lower right*, the variables are *inversely or negatively related* - If the scatter of points are *all over the place*, there is *no correlation*

Frequency Table

*a grouping of qualitative data into mutually exclusive and collectively exhaustive classes showing the number of observations in each class*, *Mutually exclusive = the data fit in just one class (ex. can't have Waco and Texas)*, *Collectively exhaustive = there is a class for each value*

Coefficient of Skewness

*a measure of the symmetry of a distribution* - Pearson's Coefficient of Skewness: $sk = \frac{3(\text{Mean} - \text{Median})}{s}$, where s is the sample standard deviation - The coefficient of skewness can range from -3 to +3 - A value near -3 indicates considerable negative skewness - A value of 1.63 indicates moderate positive skewness - A value of 0 means the distribution is symmetrical
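A sketch of Pearson's coefficient using the standard `statistics` module, assuming s is the sample standard deviation (`statistics.stdev`, which divides by n - 1). The function name and data are made up.

```python
import statistics


def pearson_skewness(data):
    """sk = 3 * (mean - median) / s"""
    mean = statistics.mean(data)
    median = statistics.median(data)
    s = statistics.stdev(data)  # sample standard deviation
    return 3 * (mean - median) / s


print(pearson_skewness([1, 2, 3, 4, 5]))                     # symmetric: 0
print(round(pearson_skewness([1, 2, 2, 3, 3, 3, 4, 10]), 2)) # positive skew
```

The long right tail (the 10) pulls the mean above the median, so sk comes out positive.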

Bar Chart

*illustrates data by displaying categories on the x-axis as independent bars and simple frequencies on the y-axis*, The most common illustration for qualitative data, they represent simple frequencies as the height of a bar

Pie Chart

*illustrates data by displaying frequencies and relative frequencies or percentages as portions of a circle*; unless you specifically want to show each category as a portion of a whole, you should stick with a bar chart. Each number on a pie slice represents "f", while the percentage represents "rel.f" expressed as a percentage

Scatterplots

*place one variable of interest on the x-axis and a second variable of interest on the y-axis. This allows the viewer to see the relationship between the two variables*, used when you want to compare two variables to see how one variable behaves in relation to another (ex. as one variable increases, what does the other do), each point in a scatterplot represents the scores of two variables for a single case

Frequency Polygons

*similar to histograms in that they contain the same information (frequencies are represented on the y-axis, and bins are computed to group the data into manageable chunks), but instead of using bars to represent frequencies with bins, a line is drawn*, do not illustrate data as clearly as histograms, so unless you have a specific reason to use a polygon, you should use a histogram (ex. of a specific reason would be to compare two frequency distributions)

Histogram

*strongly resemble bar charts, but with no gaps between bars*, the most common illustration for a single quantitative variable; there are no gaps between the bars because each bar does not represent a unique, discrete value; instead, each bar represents a range of values called a bin

Geometric Mean

*the nth root of the product of n positive values*, $GM = \sqrt[n]{(x_1)(x_2)\cdots(x_n)}$
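A sketch of the product form using `math.prod` (Python 3.8+). The standard library also provides `statistics.geometric_mean`, which should agree up to floating-point rounding. The function name is hypothetical.

```python
import math


def geometric_mean(values):
    """nth root of the product of n positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("the geometric mean needs positive values")
    product = math.prod(values)          # (x1)(x2)...(xn)
    return product ** (1 / len(values))  # nth root


print(geometric_mean([2, 8]))  # 4.0
```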

Arithmetic Mean (Mean)

- Must be interval or ratio level - All values are included in computing the mean - The mean is unique (only one per data set) - The sum of the deviations of each value from the mean is zero

Median

- Not affected by extremely large or small values - Can be computed for ordinal level data or higher

Mode

- Not always unique - Used to find most typical value - Can be used on all levels of measurement

Major Characteristics of the Variance:

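- all observations are used in the calculation - the units are somewhat difficult to work with (they are the original units squared) - because the units are squared, they are not real-world metrics (taking the square root gives the standard deviation, which is in the original units) - it cannot be negative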

Major Characteristics of the Range:

- only 2 values are used in its calculation - it is influenced by extreme values - it is easy to compute and understand

one

How many populations are present in an experiment?

How to Create a Frequency Table:

1. Get a list of every unique value in your qualitative variable & put them in order (if your data are text based -- ex. apple, banana, this should be alphabetical order; if numeric, you should use numeric order), 2. Once you have your list of values in a meaningful order, determine the frequencies and relative frequencies of each of those values, and enter them into your table
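The two steps can be sketched with `collections.Counter`. This version sorts values alphabetically, as the steps suggest for text data; for ordinal data you would list the categories in their natural order instead. The function name and data are hypothetical.

```python
from collections import Counter


def frequency_table(values):
    """Rows of (value, f, rel.f, cum.f), with values in sorted order."""
    counts = Counter(values)
    n = len(values)
    rows = []
    cum_f = 0
    for value in sorted(counts):  # Step 1: unique values in a meaningful order
        f = counts[value]         # Step 2: simple frequency
        cum_f += f                # running total (cumulative frequency)
        rows.append((value, f, f / n, cum_f))
    return rows


fruit = ["banana", "apple", "cherry", "apple", "banana", "apple"]
for value, f, rel_f, cum_f in frequency_table(fruit):
    print(f"{value:<8} f={f}  rel.f={rel_f:.2f}  cum.f={cum_f}")
```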

Steps to Find the Standard Deviation:

1. Is it a sample or a population? 2. Find the mean (average). 3. Subtract the mean from each observation (this gives you each observation's "deviation" from the mean). 4. Square each deviation to get the squared deviation from the mean. 5. Add all the squared deviations to get the "sum of squared deviations from the mean." 6. Divide by N if it is a population; divide by (n - 1) if it is a sample (this quotient is the variance). 7. Take the square root of the quotient; the result is the standard deviation (roughly, the typical distance of each observation from the mean)
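The seven steps map one-to-one onto this sketch (function name and data are hypothetical); the value just before the square root is the variance. The result should match `statistics.pstdev` (population) or `statistics.stdev` (sample).

```python
import math


def standard_deviation(xs, is_sample=True):
    # Step 1: sample or population? (the is_sample flag)
    n = len(xs)
    mean = sum(xs) / n                      # Step 2: the mean
    deviations = [x - mean for x in xs]     # Step 3: deviations from the mean
    squared = [d ** 2 for d in deviations]  # Step 4: squared deviations
    total = sum(squared)                    # Step 5: sum of squared deviations
    divisor = n - 1 if is_sample else n     # Step 6: n - 1 (sample) or N (population)
    variance = total / divisor
    return math.sqrt(variance)              # Step 7: square root


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(standard_deviation(data, is_sample=False))  # 2.0
print(standard_deviation(data))                   # sample version, slightly larger
```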

Common Measures of Location/Central Tendency:

1. Mean, 2. Median, 3. Mode

How to Construct a Frequency Table:

1. Sort data into classes, 2. Count the number in each class and report as the class frequency, 3. Convert each frequency to a relative frequency

an experiment, a quasi-experiment, or a correlational study

3 Major Approaches to Testing a Hypothesis in a Business Context:

Median

50% of values are larger than this, and 50% of values are lower than this

Construct

A characteristic or property of interest in a population that cannot be measured directly; *a proposed attribute of a person that often cannot be measured directly, but can be assessed using a number of indicators*

Likert-type Scale

A grouping of multiple Likert-type Items (Ex. "Strongly Agree, Agree, Disagree")

Statistic

A measurable characteristic of a sample

population, sample

All = ______________, Some = ______________

guessing

Basing decisions on faulty translations is no better than ____________

Formulas

Constants are usually used in ______________

*Explain why knowledge of statistics is important.*

Data are collected everywhere and statistical knowledge is required to make that information useful, statistical techniques are used to make professional and personal decisions, a knowledge of statistics is needed to understand the world and your career *Statistics will help you make more effective personal and professional decisions*

multiple

Data will be either Nominal, Ordinal, Interval, or Ratio (one of the 4), but not ______________!

Skew

Data with ________ are data that deviate from the normal distribution.

Histogram

Each class is depicted as a rectangle, with the height of the bar representing the number in each class

assumptions

For a statistic to be meaningful, its ________________ must be met.

The Empirical Rule (aka the "68-95-99.7 Rule")

For a symmetrical, bell-shaped frequency distribution, approximately 68% of the observations will lie within plus/minus one standard deviation of the mean, about 95% of the observations will lie within plus/minus two standard deviations of the mean, and practically all (99.7%) will lie within three standard deviations of the mean
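A small helper that turns a mean and standard deviation into the three Empirical Rule intervals. It only makes sense for roughly symmetric, bell-shaped data; the function name, the mean of 100, and the standard deviation of 15 are hypothetical.

```python
def empirical_rule_intervals(mean, sd):
    """Map each Empirical Rule percentage to its (lower, upper) interval."""
    return {
        68: (mean - 1 * sd, mean + 1 * sd),
        95: (mean - 2 * sd, mean + 2 * sd),
        99.7: (mean - 3 * sd, mean + 3 * sd),
    }


for pct, (low, high) in empirical_rule_intervals(100, 15).items():
    print(f"about {pct}% of values between {low} and {high}")
```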

Discrete Data

Have only specific meaningful values (Ex. the number of applicants for a specific position -- you can't have a fraction of a person)

one

How many populations are present in a correlational study?

two or more

How many populations are present in a quasi-experiment?

Variables

In formulas, fractions are used to illustrate algebraic relationships between ___________ instead of portions of whole.

Ratio

Interval data with a meaningful zero point -- they have all the properties of interval data (meaningful ordered labels with consistent distances between values) plus a meaningful zero. (quantitative) *Based on a scale with a known unit of measurement and a meaningful interpretation of zero. Indicates difference, direction of difference, amount of difference, and has an absolute zero * ~ Can be classified, ranked, counted, added, subtracted, multiplied, and divided ~

nominal

It is inappropriate to use the median with ____________ data, because this type of data have no order.

Population Mean

$\mu = \frac{\sum x}{N}$, where $\mu$ (mu) = population mean, $N$ = number of values in the population, $x$ = any particular value, and $\sum$ (capital sigma) = sum of the x values in the population

level of the subject

Random assignment must take place at the _____________________, or it is still a quasi-experiment (not a real experiment)

Not starting the y-axis at zero on a chart/graph (this exaggerates the differences between the groups being compared, such as regions)

One of the most common ways to mislead people with graphics

Interval

Ordinal data with meaningful distances between values -- they have all the properties of ordinal data (meaningful labels that are ordered) plus meaningful distances (quantitative) *Distance between values is meaningful. This level of measurement is based on a scale with a known unit of measurement. Indicates difference, direction of difference, and amount of difference* ~ Can be classified, ranked, counted, added, subtracted ~

Lower Outlier Limit =

Q1 - 1.5 (IQR)

Upper Outlier Limit =

Q3 + 1.5 (IQR)
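A sketch of the outlier check, assuming Q1 and Q3 have already been computed (here by the median-of-halves method) and that the interquartile range is IQR = Q3 - Q1. Values exactly on a limit are not flagged. The function names and data are hypothetical.

```python
def outlier_limits(q1, q3):
    """Return (lower limit, upper limit) using the 1.5 * IQR rule."""
    iqr = q3 - q1  # interquartile range
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(data, q1, q3):
    low, high = outlier_limits(q1, q3)
    return [x for x in data if x < low or x > high]


data = [2, 4, 4, 5, 7, 8, 9, 12, 30]
print(outlier_limits(4, 10.5))        # (-5.75, 20.25)
print(find_outliers(data, 4, 10.5))   # [30]
```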

discrete

Qualitative Data (nominal and ordinal) are always ____________

discrete or continuous

Quantitative Data (interval and ratio) may be ___________________

backup choice

Quasi-Experiments and Correlational Studies are always a _________________ (but are sometimes the only choice)

sex, gender, race

What are some examples of Nominal Data?

Proportions

The decimal equivalents of the fractional values

Rational argument

The link between a construct and an operational definition, just like the link between a population and a sample, cannot be demonstrated with numbers; it too is a ___________________

Experiment

The most rigorous approach, the only approach that allows you to conclude something causes something else. Unfortunately it is also the most difficult approach to implement.

Statistics

The science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making more effective decisions

Skewness

The shape of data - Symmetric (left side matches the right) - Positively Skewed (the mean is greater than the median, tapers off to positive numbers) - Negatively Skewed (the mean is lower than the median, tapers off to negative numbers) - Bimodal (two peaks: higher frequency toward the outer parts of the distribution, lower frequency in the interior)

Populations

The theoretical group that you want to draw conclusions about

Mode

The value of the observation that occurs the most frequently

Median

There is only one median for a set of data (it is unique)

Proportions

To avoid confusion, most statisticians writing formulas represent most fractions as _____________

Summaries

Translating many numbers into fewer numbers

Illustrations

Translating many numbers into pictures

1. Bar Charts, 2. Pie Charts

Two Common Illustrations for Qualitative Data:

1. Histograms, 2. Frequency Polygons

Two Most Common Illustrations for Quantitative Variables:

Datasets

Variables are usually parts of ______________

causality

We can never make conclusions about __________ from correlational studies

Interval

We will consider survey data to be ____________

Correlational study

any study where multiple variables are measured at the same time without special treatment by the researcher

Continuous Data

can be subdivided infinitely without losing their meaning (Ex. money)

Negative Skew

data trails off to the left

Positive Skew

data trails off to the right

Nominal

data with meaningful labels -- the symbols used as data represent something in the real world (qualitative) *Data recorded is represented as labels or names. They have no order. They can only be classified or counted, indicates difference* ~ Can be classified & counted ~

Bar Charts have gaps between the bars because the data they represent are _____________ values

discrete, unique

Quantitative Data

quantities (usually represented as numbers)

Relative Frequency

each of the class frequencies is divided by the total number of observations, shows the fraction of the total number of observations in each class

Chebyshev's Theorem

for any set of observations (sample or population), the proportion of the values that lie within k standard deviations of the mean is at least 1-1/k^2, where k is any value greater than 1
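A sketch of Chebyshev's bound (function name hypothetical). For k = 2 it guarantees at least 75% of values, compared with about 95% under the Empirical Rule; the smaller figure is the price of making no assumption about the distribution's shape.

```python
def chebyshev_min_proportion(k):
    """At least 1 - 1/k^2 of values lie within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem applies only for k > 1")
    return 1 - 1 / k ** 2


for k in (1.5, 2, 3):
    print(k, round(chebyshev_min_proportion(k), 3))
```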

Samples

groups of subjects drawn from a population at random; because samples are much smaller than populations, we can more realistically collect measurements from them

"That doesn't look right" sense

helps you avoid making silly mistakes

Formulas

how statistical concepts are concisely represented

Weighted Mean

is found by multiplying each observation (x) by its corresponding weight (w), adding those products, and dividing by the sum of the weights

Cumulative Frequency Distribution

is used to determine the number of observations that lie above or below a particular value in a dataset

2^k > n rule

k is the number of classes and n is the number of values in the dataset; choose the smallest k for which 2^k > n

Qualitative Data

labels or qualities (usually represented as words or letters)

Data

more than one datum

Ordinal

nominal data with a meaningful order -- they have all the properties of nominal data (meaningful labels) plus ordering (qualitative) *Based on relative ranking or rating of items based on a defined attribute or qualitative variable. Variables on this level of measurement are only ranked or counted. Indicates difference and direction of difference* ~ Can be classified, counted, and ranked ~

N

population size (the total number of cases within a particular population)

