BUS 210 - Chapters 2 and 3 Study Guide for Exam 1
4) Pivot Tables and Pivot Charts
The pivot table is an Excel® tool that allows you to break data down by categories. Sometimes pivot tables are used to display tables of counts, often called crosstabs or contingency tables. However, crosstabs typically list only counts, whereas pivot tables can list counts, sums, averages, and other summary measures.
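As a rough illustration outside Excel, the sketch below builds a small pivot-style summary in plain Python. The records, the `region` and `sales` field names, and the numbers are all made up for the example.

```python
from collections import defaultdict

# Hypothetical records: each row has a category (region) and a numeric value (sales).
rows = [
    {"region": "East", "sales": 100},
    {"region": "West", "sales": 250},
    {"region": "East", "sales": 300},
    {"region": "West", "sales": 50},
    {"region": "East", "sales": 200},
]

# Break the data down by category.
groups = defaultdict(list)
for row in rows:
    groups[row["region"]].append(row["sales"])

# A crosstab would report only the count; a pivot table can also report
# sums, averages, and other summary measures.
pivot = {
    region: {"count": len(v), "sum": sum(v), "average": sum(v) / len(v)}
    for region, v in groups.items()
}

for region, summary in pivot.items():
    print(region, summary)
```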
d. Minimum
The smallest number in a set of data
5) Descriptive Measures for Numerical Variables
There are many ways to summarize ______________ variables, both with numerical summary measures and with charts. a. Count and Percentage Distributions b. Histograms
Categorical Data
_________ variables represent types of data which may be divided into groups such as Gender, Race, Regions, and States.
2) Relationships among Categorical Variables and a Numerical Variable
a. Comparison Problem b. Stacked or Unstacked Data Formats The data are stacked if there are two "long" variables, such as Gender and Salary. The idea is that the male salaries are stacked in with the female salaries. This is the format you will see in the vast majority of situations. You will occasionally see data in unstacked format, when there are two "short" variables, such as Male Salary and Female Salary.
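A minimal sketch of the two formats, assuming made-up salary values: the stacked data are a list of (Gender, Salary) pairs, and the unstacked version has one "short" variable per gender.

```python
from collections import defaultdict

# Stacked format: two "long" variables, Gender and Salary (values are invented).
stacked = [
    ("Male", 52000),
    ("Female", 61000),
    ("Female", 48000),
    ("Male", 57000),
]

# Unstacked format: one "short" variable per category.
unstacked = defaultdict(list)
for gender, salary in stacked:
    unstacked[f"{gender} Salary"].append(salary)

print(dict(unstacked))
```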
1) Relationships among Categorical Variables
a. Cross tabulation and Contingency Tables b. Row and Column percentages
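The sketch below shows one way to build a crosstab and its row and column percentages in plain Python. The two variables (gender and smoker status) and the observations are hypothetical.

```python
from collections import Counter

# Hypothetical observations of two categorical variables: (gender, smoker).
observations = [
    ("Female", "Yes"), ("Female", "No"), ("Female", "No"), ("Female", "No"),
    ("Male", "Yes"), ("Male", "Yes"), ("Male", "No"), ("Male", "No"),
]

counts = Counter(observations)                     # cell counts of the crosstab
row_totals = Counter(g for g, _ in observations)   # totals per gender (rows)
col_totals = Counter(s for _, s in observations)   # totals per smoker status (columns)

# Row percentage: share of a row's observations that fall in each column.
row_pct = {cell: 100 * n / row_totals[cell[0]] for cell, n in counts.items()}
# Column percentage: share of a column's observations that fall in each row.
col_pct = {cell: 100 * n / col_totals[cell[1]] for cell, n in counts.items()}

for cell in sorted(counts):
    print(cell, counts[cell], f"row {row_pct[cell]:.1f}%", f"col {col_pct[cell]:.1f}%")
```

Row percentages sum to 100 across each row, and column percentages sum to 100 down each column, so the two answer different questions about the same counts.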
3) Types of Data
a. Numerical b. Categorical c. Cross-sectional Data d. Time series Data
3) Relationships among Numerical Variables
a. Scatterplots and Trend Lines in Excel b. Correlation and Covariance
c. Cross-sectional Data
are data on a cross-section of a population at a distinct point in time.
d. Time series Data
data collected over time. i. Time Series Graphs
Quartiles
each of four equal groups into which a population can be divided according to the distribution of values of a particular variable.
Percentiles
each of the 100 equal groups into which a population can be divided according to the distribution of values of a particular variable.
Kurtosis
has to do with the "fatness" of the tails of the distribution relative to the tails of a normal distribution. A distribution with high kurtosis has many more extreme observations. In Excel®, kurtosis can be calculated with the KURT function.
Bin width
The width of each bin when the data are divided into equal bins: (Max - Min) / number of bins
a. Within Numerical
i. Discrete ii. Continuous iii. Binned
Categorical Ordinal data
i. Survey questionnaire, ratings, etc.
Categorical Nominal data
ii. States, Gender, etc.
Histogram
is a graphical representation of the distribution of numerical data. Bins: group the continuous values into a smaller number of bins (intervals). Bin width: the width of each bin when the data are divided into equal bins, (Max - Min) / number of bins. Bin frequency: the number of observations that fall into the bin.
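A minimal sketch of the bin calculations, using the data list from this guide's worked examples. The choice of 4 bins is an assumption made only for illustration.

```python
data = [2, 4, 6, 4, 3, 7, 5, 6, 8, 6, 1, 4, 5]
number_of_bins = 4  # assumed for the example

# Bin width = (Max - Min) / number of bins
bin_width = (max(data) - min(data)) / number_of_bins

# Bin frequency = number of observations that fall into each bin.
frequencies = [0] * number_of_bins
for x in data:
    index = int((x - min(data)) / bin_width)
    index = min(index, number_of_bins - 1)  # put the maximum value in the last bin
    frequencies[index] += 1

for i, freq in enumerate(frequencies):
    low = min(data) + i * bin_width
    print(f"{low:.2f} to {low + bin_width:.2f}: {freq}")
```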
Trend Lines in Excel
is a line or curve that "fits" the scatter as well as possible. This could be a straight line, or it could be one of several types of curves.
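For a straight trend line, one common reading of "fits as well as possible" is the least-squares line, and the sketch below assumes that criterion. The x and y values are hypothetical.

```python
# Hypothetical (x, y) observations.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares slope and intercept of the line y = intercept + slope * x.
slope = (
    sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    / sum((x - mean_x) ** 2 for x in xs)
)
intercept = mean_y - slope * mean_x

print(f"y = {intercept:.2f} + {slope:.2f} x")
```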
Scatterplots
is a scatter of points, where each point denotes the values of an observation for two selected variables. It is a graphical method for detecting relationships between two numerical variables. The two variables are often labeled generically as X and Y, so a ______________ is sometimes called an X-Y chart. The purpose of a ______________ is to make a relationship (or the lack of it) apparent.
Outlier
is a value or an entire observation (row) that lies well outside of the norm. Some statisticians define an outlier as any value more than three standard deviations from the mean, but this is only a rule of thumb.
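The three-standard-deviation rule of thumb can be sketched as follows. The data values are invented, and the sample standard deviation is assumed.

```python
import statistics

# Hypothetical data: 19 values near 50 and one value far from the rest.
values = [48, 52, 50, 49, 51, 50, 47, 53, 50, 49,
          51, 52, 48, 50, 51, 49, 50, 52, 48, 120]

mean = statistics.mean(values)
std_dev = statistics.stdev(values)  # sample standard deviation

# Flag any value more than three standard deviations from the mean.
outliers = [x for x in values if abs(x - mean) > 3 * std_dev]
print(outliers)
```

In very small data sets no value can lie three sample standard deviations from the mean, which is one reason this is only a rule of thumb.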
i. Boxplots (or box-whisker plot)
is an alternative type of chart for showing the distribution of a variable.
Mode
is the value that occurs most often. If no number is repeated, then there is no _____ for the list.
b. Correlation and Covariance
measure the strength and direction of a linear relationship between two numerical variables. The correlation is always between -1 and +1. The closer it is to either of these two extremes, the closer the points in a scatterplot are to a straight line. Excel® has a built-in CORREL function.
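A minimal sketch of both measures computed by hand, assuming the sample versions (division by n - 1 in the covariance). The data and the helper names `covariance` and `correlation` are hypothetical.

```python
import math

def covariance(xs, ys):
    """Sample covariance: average product of deviations, dividing by n - 1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Covariance rescaled by both standard deviations, so it lies in [-1, +1]."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
on_a_line = [2 * x + 1 for x in xs]  # points exactly on a straight line

print(covariance(xs, ys), correlation(xs, ys))
print(correlation(xs, on_a_line))
```

Unlike the correlation, the covariance depends on the units of the two variables, so its magnitude is hard to interpret on its own.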
Bin Frequency
number of observations that fall into the bin
iii. Skewness
occurs when there is a lack of symmetry. A variable can be skewed to the right (or positively skewed) because of some really large values (e.g., really large baseball salaries). Or it can be skewed to the left (or negatively skewed) because of some really small values (e.g., temperature lows in Antarctica). In Excel®, a measure of skewness can be calculated with the SKEW function.
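The sketch below computes a skewness measure in plain Python. It assumes the sample-adjusted formula n / ((n - 1)(n - 2)) times the sum of cubed standardized deviations, which is the formula documented for Excel's SKEW. The data sets are invented.

```python
import statistics

def skewness(values):
    """Sample-adjusted skewness: n / ((n - 1)(n - 2)) * sum(((x - mean) / s) ** 3)."""
    n = len(values)  # the formula needs at least 3 values
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)

right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]   # one really large value
left_skewed = [-x for x in right_skewed]   # mirror image: one really small value
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_skewed))  # positive
print(skewness(left_skewed))   # negative
print(skewness(symmetric))     # approximately zero
```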
iii. Binned
or discretized variable: a numerical variable that has been categorized into discrete categories called ______. Excel functions: VLOOKUP, COUNTIF.
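A rough Python analogue of this workflow: a sorted table of cutoffs is searched for each value (loosely similar to an approximate-match lookup), and the resulting categories are counted. The cutoffs, labels, and scores are assumptions for the example.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical cutoffs: below 60 is "Low", 60 to 79 is "Medium", 80 and above is "High".
cutoffs = [60, 80]
labels = ["Low", "Medium", "High"]

scores = [55, 60, 72, 79, 80, 91, 43]

# Look up the category for each numerical value.
binned = [labels[bisect_right(cutoffs, s)] for s in scores]

# Count how many observations fall into each category.
counts = Counter(binned)
print(binned)
print(counts)
```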
Mean
sum divided by the count.
Data set
(Usually) a rectangular array of data, with variables in columns, observations in rows, and variable names in the top row
Example: calculate the Minimum of 2,4,6,4,3,7,5,6,8,6,1,4,5
1
Describing the Distributions of a Single Variable
1) Populations and Samples 2) Data Sets, Variables, and Observations 3) Types of Data 4) Descriptive Measures for Categorical Variables 5) Descriptive Measures for Numerical Variables
c. Calculate the MODE of the following: 2,4,6,4,3,7,5,6,8,6,1,4,5
4 and 6 (each occurs three times, so the list has two modes)
c. Calculate the MEAN of the following: 2,4,6,4,3,7,5,6,8,6,1,4,5
4.69
c. Calculate the MEDIAN of the following: 2,4,6,4,3,7,5,6,8,6,1,4,5
5
Example: calculate the Maximum of 2,4,6,4,3,7,5,6,8,6,1,4,5
8
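The worked examples above can be checked with Python's built-in `statistics` module. This is only a verification sketch, not part of the Excel workflow.

```python
import statistics

data = [2, 4, 6, 4, 3, 7, 5, 6, 8, 6, 1, 4, 5]

minimum = min(data)
maximum = max(data)
mean = statistics.mean(data)        # sum (61) divided by count (13)
median = statistics.median(data)    # middle value of the sorted list
modes = statistics.multimode(data)  # every value tied for most frequent

print(minimum, maximum, round(mean, 2), median, modes)
```

Note that `multimode` reports both 4 and 6, since each occurs three times in this list.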
Dummy Variables
A variable coded 1 or 0: 1 for observations in a category, 0 for observations not in the category. Example: 0 and 1 for Female and Male.
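A minimal sketch of dummy coding. The list of genders is made up, and the choice of 1 = Female, 0 = Male is an assumption; the opposite coding works equally well.

```python
genders = ["Female", "Male", "Male", "Female"]

# Assumed coding for the example: 1 if the observation is Female, 0 otherwise.
is_female = [1 if g == "Female" else 0 for g in genders]

# The mean of a dummy variable is the proportion of observations in the category.
proportion_female = sum(is_female) / len(is_female)
print(is_female, proportion_female)
```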
Rule 1
Approximately 68% of the observations are within one standard deviation of the mean.
Rule 2
Approximately 95% of the observations are within two standard deviations of the mean.
Rule 3
Approximately 99.7% of the observations are within three standard deviations of the mean.
Variable (or field or attribute)
Attribute or measurement of members of a population, such as height, gender, or salary
b. Within Categorical
Categorical Ordinal data Categorical Nominal data Dummy Variables
i. Numerical Discrete data
Count data: 0, 1, 2, 3... Example: Number of children, Number of Students, number of accidents.
4) Descriptive Measures for Categorical Variables
Count the number of observations. (The resulting counts can be reported as "raw counts" or as percentages of totals.) a. Count (Frequency) and Percentage Distributions b. Column Chart c. Pie Chart
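Counts and percentages of totals for a categorical variable can be sketched as follows. The `regions` values are hypothetical.

```python
from collections import Counter

# Hypothetical categorical variable.
regions = ["East", "West", "East", "South", "East", "West", "South", "East"]

counts = Counter(regions)       # raw counts per category
total = sum(counts.values())

# The same counts reported as percentages of the total.
percentages = {category: 100 * n / total for category, n in counts.items()}

for category in counts:
    print(category, counts[category], f"{percentages[category]:.1f}%")
```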
ii. Numerical Continuous Data
Data that can take any value (within a range). Example: weight, height, salaries, prices, ratios
Bins
Group the continuous values into a smaller number of bins (intervals).
c. Empirical Rules
If the values of a variable are approximately normally distributed (symmetric and bell-shaped), then the following rules hold: Approximately 68% of the observations are within one standard deviation of the mean. Approximately 95% of the observations are within two standard deviations of the mean. Approximately 99.7% of the observations are within three standard deviations of the mean.
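The three percentages come from the normal distribution itself. A short sketch that recovers them, using the standard normal distribution from Python's `statistics` module (the helper name `share_within` is hypothetical):

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

shares = {k: share_within(k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} standard deviation(s): {share:.4f}")
```

Real data are only approximately normal, so observed percentages will usually differ somewhat from these values.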
Median
If there is an odd number of data values, it is the value in the middle of the sorted list. If there is an even number of data values, it is the mean of the two middle values.
Population
Includes all objects of interest in a study (people, households, machines, etc.). Example: all GMU students.
Observation (or case or record)
List of all variable values for a single member of a population
a. Measures of Variability
Range, Interquartile Range, Variance and Standard Deviation
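A sketch of the four measures on the data list from this guide's worked examples. The quartiles use the "inclusive" method of `statistics.quantiles`, which is one of several conventions; other conventions can give slightly different quartiles and therefore a different interquartile range.

```python
import statistics

data = [2, 4, 6, 4, 3, 7, 5, 6, 8, 6, 1, 4, 5]

# Range: maximum minus minimum.
data_range = max(data) - min(data)

# Interquartile range: third quartile minus first quartile.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Sample variance and sample standard deviation (both divide by n - 1).
variance = statistics.variance(data)
std_dev = statistics.stdev(data)

print(data_range, iqr, round(variance, 4), round(std_dev, 4))
```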
Sample
Representative subset of a population, usually chosen randomly. Example: a random sample of GMU students.
b. Difference for Measures between a Population and Sample
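Sample Variance, Population Variance. The sample variance divides the sum of squared deviations from the mean by n - 1; the population variance divides it by n.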
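The difference between the two variances is only the divisor. A minimal sketch, treating the same list first as a whole population and then as a sample:

```python
import statistics

data = [2, 4, 6, 4, 3, 7, 5, 6, 8, 6, 1, 4, 5]
n = len(data)
mean = sum(data) / n

# Sum of squared deviations from the mean.
sum_sq = sum((x - mean) ** 2 for x in data)

pop_var = sum_sq / n           # population variance: divide by n
sample_var = sum_sq / (n - 1)  # sample variance: divide by n - 1

# The statistics module provides both versions directly.
print(round(pop_var, 4), round(statistics.pvariance(data), 4))
print(round(sample_var, 4), round(statistics.variance(data), 4))
```

Because it divides by the smaller number, the sample variance is always at least as large as the population variance computed from the same values.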
Data type
Several categorizations are possible: numerical versus categorical, discrete versus continuous, cross-sectional versus time series; categorical can be nominal or ordinal