BUS 210 - Chapters 2 and 3 Study Guide for Exam 1

4) Pivot Tables and Pivot Charts

The pivot table is an Excel® tool that allows you to break data down by categories. Sometimes pivot tables are used to display tables of counts, often called crosstabs or contingency tables. However, crosstabs typically list only counts, whereas pivot tables can list counts, sums, averages, and other summary measures.
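
The core idea of a pivot table, grouping rows by a category and then summarizing a numerical field, can be sketched in plain Python. The records and field names below are invented for illustration, and this is only a rough analogue of what Excel does, not its implementation.

```python
from collections import defaultdict

# Hypothetical (gender, salary) records, invented for illustration.
rows = [("F", 52000), ("M", 48000), ("F", 61000), ("M", 55000), ("F", 58000)]

# Group salaries by category, as a pivot table does with a row field.
groups = defaultdict(list)
for gender, salary in rows:
    groups[gender].append(salary)

# Summarize each group with a count, a sum, and an average.
pivot = {
    g: {"count": len(v), "sum": sum(v), "average": sum(v) / len(v)}
    for g, v in groups.items()
}

for g in sorted(pivot):
    print(g, pivot[g])
```

Keeping only the "count" entry gives the crosstab-style table of counts; the sum and average are the extra summary measures a pivot table can show.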

d. Minimum

The smallest number in a set of data

5) Descriptive Measures for Numerical Variables

There are many ways to summarize ______________ variables, both with numerical summary measures and with charts. a. Count and Percentage Distributions b. Histograms

Categorical Data

_________ variables represent types of data which may be divided into groups such as Gender, Race, Regions, and States.

2) Relationships among Categorical Variables and a Numerical Variable

a. Comparison Problem b. Stacked or Unstacked Data Formats: The data are stacked if there are two "long" variables, such as Gender and Salary. The idea is that the male salaries are stacked in with the female salaries. This is the format you will see in the vast majority of situations. You will occasionally see data in unstacked format, when there are two "short" variables, such as Male Salary and Female Salary.
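
A small sketch of converting stacked data to unstacked form may make the two formats concrete. The salary values are invented for illustration; the column names follow the Gender/Salary example above.

```python
# Stacked format: two "long" variables, Gender and Salary (hypothetical values).
stacked = [("Male", 50000), ("Female", 52000), ("Male", 47000), ("Female", 61000)]

# Unstacked format: one "short" variable per category.
unstacked = {}
for gender, salary in stacked:
    unstacked.setdefault(f"{gender} Salary", []).append(salary)

print(unstacked)
```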

1) Relationships among Categorical Variables

a. Cross tabulation and Contingency Tables b. Row and Column percentages
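
As a rough illustration of a contingency table with row and column percentages, the sketch below counts pairs of categories and divides each count by its row total or column total. The observations are invented for illustration.

```python
from collections import Counter

# Hypothetical (gender, region) observations, invented for illustration.
obs = [("F", "East"), ("F", "West"), ("M", "East"),
       ("M", "East"), ("F", "East"), ("M", "West")]

counts = Counter(obs)                        # cell counts of the crosstab
row_totals = Counter(g for g, _ in obs)      # totals per gender (rows)
col_totals = Counter(r for _, r in obs)      # totals per region (columns)

# Row percentage: share of the row total; column percentage: share of the column total.
row_pct = {cell: 100 * c / row_totals[cell[0]] for cell, c in counts.items()}
col_pct = {cell: 100 * c / col_totals[cell[1]] for cell, c in counts.items()}

for cell in sorted(counts):
    print(cell, counts[cell], round(row_pct[cell], 1), round(col_pct[cell], 1))
```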

3) Types of Data

a. Numerical b. Categorical c. Cross-sectional Data d. Time series Data

3) Relationships among Numerical Variables

a. Scatterplots and Trend Lines in Excel b. Correlation and Covariance

c. Cross-sectional Data

are data on a cross-section of a population at a distinct point in time.

d. Time series Data

data collected over time. i. Time Series Graphs

Quartiles

each of four equal groups into which a population can be divided according to the distribution of values of a particular variable.

Percentiles

each of the 100 equal groups into which a population can be divided according to the distribution of values of a particular variable.

Kurtosis

has to do with the "fatness" of the tails of the distribution relative to the tails of a normal distribution. A distribution with high kurtosis has many more extreme observations. In Excel®, kurtosis can be calculated with the KURT function.
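
For reference, a sketch of a sample excess-kurtosis calculation follows. It assumes the bias-adjusted formula commonly documented for KURT; treat the exact match to Excel as an assumption rather than a guarantee. The function name is hypothetical.

```python
import statistics

def sample_excess_kurtosis(values):
    """Bias-adjusted sample excess kurtosis (assumed to match Excel's KURT)."""
    n = len(values)
    m = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation
    z4 = sum(((x - m) / s) ** 4 for x in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * z4 - correction

# A flat, evenly spaced list has thinner tails than a normal distribution,
# so its excess kurtosis is negative.
print(sample_excess_kurtosis([1, 2, 3, 4, 5]))
```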

Bin width

The width of each bin when the data are divided into equal-sized bins: (max - min) / number of bins.

a. Within Numerical

i. Discrete ii. Continuous iii. Binned

Categorical Ordinal data

i. Survey questionnaire, ratings, etc.

Categorical Nominal data

ii. States, Gender, etc.

Histogram

is a graphical representation of the distribution of numerical data. Bins: group the continuous values into a smaller number of bins, or intervals. Bin width: the width of each equal-sized bin, (max - min) / number of bins. Bin frequency: the number of observations that fall into the bin.
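
The bin width and bin frequency calculations can be sketched on the list of numbers used elsewhere in this guide. The choice of four bins is an assumption made only for illustration.

```python
data = [2, 4, 6, 4, 3, 7, 5, 6, 8, 6, 1, 4, 5]
num_bins = 4  # assumed for illustration

lo, hi = min(data), max(data)
bin_width = (hi - lo) / num_bins  # (max - min) / number of bins

# Count how many observations fall into each bin; the maximum value
# is placed in the last bin rather than opening a new one.
freq = [0] * num_bins
for x in data:
    i = min(int((x - lo) / bin_width), num_bins - 1)
    freq[i] += 1

for i, f in enumerate(freq):
    start = lo + i * bin_width
    print(f"{start:.2f} to {start + bin_width:.2f}: {f}")
```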

Trend Lines in Excel

is a line or curve that "fits" the scatter as well as possible. This could be a straight line, or it could be one of several types of curves.
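
For the straight-line case, one common way to "fit" the scatter is least squares. The sketch below assumes that method (the card itself does not name one) and uses invented points.

```python
# Hypothetical (x, y) observations, invented for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares slope: sum of cross-deviations over sum of squared x-deviations.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

print(f"y = {intercept:.2f} + {slope:.2f}x")
```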

Scatterplots

is a scatter of points, where each point denotes the values of an observation for two selected variables. It is a graphical method for detecting relationships between two numerical variables. The two variables are often labeled generically as X and Y, so a ______________ is sometimes called an X-Y chart. The purpose of a ______________ is to make a relationship (or the lack of it) apparent.

Outlier

is a value or an entire observation (row) that lies well outside of the norm. Some statisticians define it as any value more than three standard deviations from the mean, but this is only a rule of thumb.
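
The three-standard-deviation rule of thumb can be sketched as follows. The data are invented for illustration, and the sample standard deviation is an assumed choice.

```python
import statistics

# Hypothetical data: twenty values near 50 plus one extreme value.
values = [48, 50, 52, 49, 51] * 4 + [120]

m = statistics.mean(values)
s = statistics.stdev(values)  # sample standard deviation (assumed choice)

# Flag any value more than three standard deviations from the mean.
outliers = [v for v in values if abs(v - m) > 3 * s]
print(outliers)
```

Note that a single extreme value also inflates the standard deviation, so in very small data sets no value can be more than three standard deviations from the mean.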

i. Boxplots (or box-whisker plot)

is an alternative type of chart for showing the distribution of a variable.

Mode

is the value that occurs most often. If no number is repeated, then there is no _____ for the list.

b. Correlation and Covariance

measure the strength and direction of a linear relationship between two numerical variables. The correlation is always between -1 and +1. The closer it is to either of these two extremes, the closer the points in a scatterplot are to a straight line. Excel® has a built-in CORREL function.
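
A minimal sketch of both measures, using invented data and assuming the sample versions (dividing by n - 1): the correlation is the covariance rescaled by the two standard deviations, which is what keeps it between -1 and +1.

```python
import math

# Hypothetical paired observations, invented for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Sample covariance: sum of cross-deviations divided by n - 1.
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

# Sample standard deviations of each variable.
sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))

# Correlation: covariance divided by the product of the standard deviations.
corr_xy = cov_xy / (sd_x * sd_y)

print(cov_xy, round(corr_xy, 4))
```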

Bin Frequency

The number of observations that fall into the bin.

iii. Skewness

occurs when there is a lack of symmetry. A variable can be skewed to the right (or positively skewed) because of some really large values (e.g., really large baseball salaries). Or it can be skewed to the left (or negatively skewed) because of some really small values (e.g., temperature lows in Antarctica). In Excel®, a measure of skewness can be calculated with the SKEW function.
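
A sketch of a sample skewness calculation follows. It assumes the bias-adjusted formula commonly documented for SKEW; the exact match to Excel is an assumption. The function name and data are hypothetical.

```python
import statistics

def sample_skewness(values):
    """Bias-adjusted sample skewness (assumed to match Excel's SKEW)."""
    n = len(values)
    m = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation
    z3 = sum(((x - m) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * z3

# One really large value pulls the distribution to the right (positive skew).
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]
symmetric = [1, 2, 3, 4, 5]

print(sample_skewness(right_skewed) > 0)
print(abs(sample_skewness(symmetric)) < 1e-12)
```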

iii. Binned

or discretized variable: a numerical variable that has been categorized into discrete categories called ______. Excel functions: VLOOKUP, COUNTIF.
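
The binning step can be sketched in Python as a rough analogue of an approximate-match lookup followed by a conditional count. The cutoffs, labels, and salaries are invented for illustration.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical lookup table: the lower bound of each category and its label.
lower_bounds = [0, 50000, 100000]
labels = ["Low", "Medium", "High"]

salaries = [32000, 50000, 75000, 120000, 99000]  # invented values

# Assign each value the label of the largest lower bound not exceeding it.
binned = [labels[bisect_right(lower_bounds, s) - 1] for s in salaries]

# Count the observations in each category.
category_counts = Counter(binned)

print(binned)
print(category_counts)
```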

Mean

sum divided by the count.

Data set

(Usually) a rectangular array of data, with variables in columns, observations in rows, and variable names in the top row

Example: calculate the Minimum of 2,4,6,4,3,7,5,6,8,6,1,4,5

1

Describing the Distributions of a Single Variable

1) Populations and Samples 2) Data Sets, Variables, and Observations 3) Types of Data 4) Descriptive Measures for Categorical Variables 5) Descriptive Measures for Numerical Variables

c. Calculate the MODE of the following: 2,4,6,4,3,7,5,6,8,6,1,4,5

4 and 6 (each occurs three times, so the list has two modes)

c. Calculate the MEAN of the following: 2,4,6,4,3,7,5,6,8,6,1,4,5

4.69

c. Calculate the MEDIAN of the following: 2,4,6,4,3,7,5,6,8,6,1,4,5

5

Example: calculate the Maximum of 2,4,6,4,3,7,5,6,8,6,1,4,5

8
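
The worked examples on this list can be checked with Python's standard `statistics` module. In this list 4 and 6 each appear three times: `statistics.mode` reports only the first most-frequent value it encounters, while `statistics.multimode` lists every tied value.

```python
import statistics

data = [2, 4, 6, 4, 3, 7, 5, 6, 8, 6, 1, 4, 5]

mean_value = statistics.mean(data)      # sum divided by the count
median_value = statistics.median(data)  # middle value of the sorted list
single_mode = statistics.mode(data)     # first most-frequent value encountered
all_modes = statistics.multimode(data)  # every value tied for most frequent
smallest = min(data)
largest = max(data)

print(round(mean_value, 2), median_value, single_mode, all_modes, smallest, largest)
```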

Dummy Variables

A variable coded 1 or 0: 1 for observations in a category, 0 for observations not in the category (for example, a 0-1 coding of Female and Male).
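
A short sketch of dummy coding follows. Which category receives the 1 is an arbitrary choice; here Female is coded 1 purely as an assumption for illustration, and the data are invented.

```python
# Hypothetical categorical observations, invented for illustration.
genders = ["Female", "Male", "Female", "Male", "Male"]

# Dummy variable: 1 if the observation is in the category, 0 otherwise.
is_female = [1 if g == "Female" else 0 for g in genders]

print(is_female)
# The mean of a dummy variable is the proportion of observations in the category.
print(sum(is_female) / len(is_female))
```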

Rule 1

Approximately 68% of the observations are within one standard deviation of the mean.

Rule 2

Approximately 95% of the observations are within two standard deviations of the mean.

Rule 3

Approximately 99.7% of the observations are within three standard deviations of the mean.

Variable (or field or attribute)

Attribute or measurement of members of a population, such as height, gender, or salary

b. Within Categorical

Categorical Ordinal data Categorical Nominal data Dummy Variables

i. Numerical Discrete data

Count data: 0, 1, 2, 3... Example: Number of children, Number of Students, number of accidents.

4) Descriptive Measures for Categorical Variables

Count the number of observations. (The resulting counts can be reported as "raw counts" or as percentages of totals.) a. Count (Frequency) and Percentage Distributions b. Column Chart c. Pie Chart

ii. Numerical Continuous Data

Data that can take any value (within a range). Examples: weight, height, salaries, prices, ratios.

Bins

Group the continuous values into a smaller number of bins, or intervals.

c. Empirical Rules

If the values of a variable are approximately normally distributed (symmetric and bell-shaped), then the following rules hold: Approximately 68% of the observations are within one standard deviation of the mean. Approximately 95% of the observations are within two standard deviations of the mean. Approximately 99.7% of the observations are within three standard deviations of the mean.
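
The empirical rules can be checked approximately by simulation. The sketch below draws values from a normal distribution with an assumed mean of 100 and standard deviation of 15 (both invented for illustration) and measures the share of values within one, two, and three standard deviations of the mean.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Simulated, approximately normal data (mean and SD are assumed values).
values = [random.gauss(100, 15) for _ in range(100_000)]

m = statistics.mean(values)
s = statistics.stdev(values)

def share_within(k):
    """Fraction of observations within k standard deviations of the mean."""
    return sum(abs(v - m) <= k * s for v in values) / len(values)

for k in (1, 2, 3):
    print(k, round(share_within(k), 3))
```

The shares should land close to 68%, 95%, and 99.7%, though not exactly, since the data are a random sample.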

Median

If there is an odd number of data values, then it is the value in the middle. If there is an even number of data values, then it is the mean of the two data values in the middle.

Population

Includes all objects of interest in a study: people, households, machines, etc. Example: all GMU students.

Observation (or case or record)

List of all variable values for a single member of a population

a. Measures of Variability

Range, Interquartile Range, Variance and Standard Deviation
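
These four measures can be sketched on the list of numbers used in this guide's examples. Quartile conventions differ between tools; the "inclusive" method below is an assumed choice, and the sample (n - 1) versions of variance and standard deviation are used.

```python
import statistics

data = [2, 4, 6, 4, 3, 7, 5, 6, 8, 6, 1, 4, 5]

# Range: largest value minus smallest value.
data_range = max(data) - min(data)

# Interquartile range: third quartile minus first quartile.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Sample variance and standard deviation (dividing by n - 1).
variance = statistics.variance(data)
std_dev = statistics.stdev(data)

print(data_range, iqr, round(variance, 4), round(std_dev, 4))
```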

Sample

Representative subset of a population, usually chosen randomly. Example: a random sample of GMU students.

b. Difference for Measures between a Population and Sample

Sample Variance, Population Variance
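
The difference between the two variances is only the divisor: the sample variance divides the sum of squared deviations by n - 1, the population variance by n. A minimal sketch on the list of numbers used in this guide's examples:

```python
import statistics

data = [2, 4, 6, 4, 3, 7, 5, 6, 8, 6, 1, 4, 5]
n = len(data)

sample_var = statistics.variance(data)       # divides by n - 1
population_var = statistics.pvariance(data)  # divides by n

print(round(sample_var, 4), round(population_var, 4))
```

Because n - 1 is smaller than n, the sample variance is always the larger of the two for the same data.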

Data type

Several categorizations are possible: numerical versus categorical, discrete versus continuous, cross-sectional versus time series; categorical can be nominal or ordinal

