CHAPTER 2 - DESCRIPTIVE STATISTICS

Defining bins

Number of bins: Bins are formed by specifying the ranges used to group the data. As a general guideline, we recommend using from 5 to 20 bins. For a small number of data items, as few as five or six bins may be used to summarize the data. For a larger number of data items, more bins are usually required. The goal is to use enough bins to show the variation in the data, but not so many that some contain only a few data items. ex: for n = 20, use 5 bins.
Width of bins: We recommend that the width be the same for each bin. Thus, the choices of the number of bins and the width of bins are not independent decisions: a larger number of bins means a smaller bin width, and vice versa. To determine an approximate bin width, begin by identifying the largest and smallest data values, then compute approximate bin width = (largest data value - smallest data value) / number of bins.
Bin limits: The lower bin limit identifies the smallest possible data value assigned to the bin. The upper bin limit identifies the largest possible data value assigned to the bin.
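
As a rough illustration of the bin-width guideline above, the following Python sketch computes an approximate bin width and the resulting bin limits. The data values, the helper name `bin_limits`, and the choice of 5 bins are assumptions made for this example and do not come from the original notes.

```python
def bin_limits(values, num_bins):
    """Return (width, limits) for equal-width bins spanning the data."""
    smallest, largest = min(values), max(values)
    # approximate bin width = (largest - smallest) / number of bins
    width = (largest - smallest) / num_bins
    limits = [smallest + i * width for i in range(num_bins + 1)]
    return width, limits

# Hypothetical data set with n = 20 values
data = [10, 12, 14, 15, 15, 16, 17, 18, 18, 18,
        19, 20, 21, 22, 22, 23, 27, 28, 33, 35]
width, limits = bin_limits(data, num_bins=5)
print(width)   # 5.0
print(limits)  # [10.0, 15.0, 20.0, 25.0, 30.0, 35.0]
```

In practice the computed width is often rounded to a convenient value before the bin limits are set.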

histogram

A graphical presentation of a frequency distribution, relative frequency distribution, or percent frequency distribution of quantitative data constructed by placing the bin intervals on the horizontal axis and the frequencies, relative frequencies, or percent frequencies on the vertical axis

boxplot

A graphical summary of data based on the quartiles of a distribution

mean (arithmetic mean)

A measure of central location computed by summing the data values and dividing by the number of observations.

mode

A measure of central location defined as the value that occurs with the greatest frequency

median

A measure of central location provided by the value in the middle when the data are arranged in ascending order (smallest to largest)
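
The three measures of central location defined above (mean, mode, median) can be compared on one small data set. This is a minimal sketch using Python's standard `statistics` module; the five data values are made up for illustration.

```python
import statistics

# Hypothetical sample of five observations
data = [3, 5, 5, 7, 10]

mean = statistics.mean(data)      # sum of values / number of observations
median = statistics.median(data)  # middle value of the sorted data
mode = statistics.mode(data)      # value with the greatest frequency

print(mean, median, mode)  # 6 5 5
```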

geometric mean

A measure of central location that is calculated by finding the nth root of the product of *n* values
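
A short sketch of the nth-root calculation, assuming three made-up positive values. It computes the geometric mean by hand and with `statistics.geometric_mean` (available in Python 3.8 and later); both results are rounded because floating-point roots are not exact.

```python
import math
import statistics

values = [2, 8, 4]  # hypothetical positive values, n = 3

# nth root of the product of the n values
by_hand = math.prod(values) ** (1 / len(values))
# standard-library equivalent
builtin = statistics.geometric_mean(values)

print(round(by_hand, 6), round(builtin, 6))  # 4.0 4.0
```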

covariance

A measure of linear association between two variables. Positive values indicate a positive relationship; negative values indicate a negative relationship
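
The card does not give a formula, so the sketch below assumes the usual sample covariance, which divides the sum of cross-products of deviations by n - 1. The function name and the paired data are hypothetical.

```python
def sample_covariance(x, y):
    """Sample covariance: sum of (x_i - x_bar)(y_i - y_bar) / (n - 1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

# Hypothetical paired observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(sample_covariance(x, y))  # 1.5
```

The positive result indicates a positive linear relationship between x and y in this made-up sample.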

coefficient of variation

A measure of relative variability computed by dividing the standard deviation by the mean and multiplying by 100
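
A minimal sketch of this calculation, assuming the sample standard deviation (n - 1 divisor) is the one intended; the data values are made up.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

mean = statistics.mean(data)
std_dev = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
cv = std_dev / mean * 100         # coefficient of variation, in percent

print(round(cv, 2))  # 42.76
```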

Skewness

A measure of the lack of symmetry in a distribution

variance

A measure of variability based on the squared deviations of the data values about the mean

range

A measure of variability defined to be the largest value minus the smallest value
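
The variance and range definitions above can be worked through on a small data set. The sketch assumes the sample variance (dividing by n - 1), which the cards do not state explicitly, and uses made-up values.

```python
data = [46, 54, 42, 46, 32]  # hypothetical sample

n = len(data)
mean = sum(data) / n
# sample variance: squared deviations about the mean, divided by n - 1
variance = sum((x - mean) ** 2 for x in data) / (n - 1)
# range: largest value minus smallest value
data_range = max(data) - min(data)

print(mean, variance, data_range)  # 44.0 64.0 22
```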

Random variable/uncertain variable

A quantity whose values are not known with certainty

percent frequency distribution

A tabular summary of data showing the percentage of observations in each of several non-overlapping classes. Percent frequency = relative frequency x 100

cumulative frequency distribution

A tabular summary of quantitative data showing the number of data values that are less than or equal to the upper class limit of each bin (class). The cumulative relative frequency distribution can be obtained by summing the relative frequencies in the relative frequency distribution or by dividing the cumulative frequencies by the total number of items.
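
As an illustration, the sketch below builds cumulative frequencies from bin frequencies and then divides by the total number of items to get cumulative relative frequencies. The bin frequencies are hypothetical.

```python
from itertools import accumulate

# Hypothetical bin frequencies for n = 20 data values, in bin order
frequencies = [4, 8, 5, 2, 1]
n = sum(frequencies)

# running total of frequencies up to and including each bin
cumulative = list(accumulate(frequencies))
# cumulative frequency divided by the total number of items
cumulative_relative = [c / n for c in cumulative]

print(cumulative)           # [4, 12, 17, 19, 20]
print(cumulative_relative)  # [0.2, 0.6, 0.85, 0.95, 1.0]
```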

z-score

A value computed by dividing the deviation about the mean (x_i - x̄) by the standard deviation s: z_i = (x_i - x̄) / s. A z-score is referred to as a standardized value and denotes the number of standard deviations that x_i is from the mean. Excel function: STANDARDIZE

outliers

An unusually large or unusually small data value. A data value with a z-score less than -3 or greater than +3 is treated as an outlier. On a boxplot, the location of each outlier is shown with an asterisk (*)
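
The z-score rule for outliers can be sketched as follows. The helper name `z_scores` and the data are assumptions for illustration, and the sample standard deviation is used.

```python
import statistics

def z_scores(values):
    """Return the z-score (x_i - mean) / s for each value."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation
    return [(x - mean) / s for x in values]

# Hypothetical data: 14 values near 50 plus one extreme value
data = [50, 52, 48, 51, 49, 50, 53, 47, 50, 51, 49, 50, 52, 48, 120]
outliers = [x for x, z in zip(data, z_scores(data)) if abs(z) > 3]
print(outliers)  # [120]
```

With very small samples no value can reach a z-score beyond 3, so this rule is only informative when there are enough observations.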

Categorical

A Wall Street Journal subscriber survey asked 46 questions about subscriber characteristics and interests. State whether each of the following questions provides categorical or quantitative data. What type of vehicle are you considering for your next purchase? Nine response categories include sedan, sports car, SUV, minivan, and so on.

Categorical

A Wall Street Journal subscriber survey asked 46 questions about subscriber characteristics and interests. State whether each of the following questions provides categorical or quantitative data. When did you first start reading the WSJ? High school, college, early career, midcareer, late career, or retirement?

Categorical

Cross-sectional data and time series data

Cross-sectional data: Data collected from several entities at the same or approximately the same point in time. ex: a survey given to all students on the same day. Time series data: Data collected over several time periods -graphs of time series data are frequently found in business and economic publications -graphs help analysts understand what happened in the past, identify trends over time, and project future levels for the time series.

Cross-sectional data

Data collected from several entities at the same or approximately the same point in time

Time series data

Data collected over several time periods -graphs of time series data are frequently found in business and economic publications -graphs help analysts understand what happened in the past, identify trends over time, and project future levels for the time series.

Categorical Data

Data on which arithmetic operations cannot be performed

Nominal Scale

Data that can only be categorized or grouped. ex: stocks can be traded on the NASDAQ or NYSE - a number can be assigned to a characteristic being evaluated for grouping purposes

Experimental study

Sources of data: Experimental study and Non-experimental study/observational study

Experimental study: a variable of interest is first identified -then one or more other variables are identified and controlled or manipulated so that data can be obtained about how they influence the variable of interest. Non-experimental study/observational study: makes no attempt to control the variables of interest -a survey is perhaps the most common type of observational study

Non-experimental study/observational study

Non-experimental study/observational study: makes no attempt to control the variables of interest -a survey is perhaps the most common type of observational study

Population and sample data

Population: all elements of interest. Sample: a subset of the population -Random sampling: a sampling method to gather a representative sample of the population data

A Wall Street Journal subscriber survey asked 46 questions about subscriber characteristics and interests. State whether each of the following questions provides categorical or quantitative data. How long have you been in your present job or position?

Quantitative

Quantitative data

Quantitative data: data on which numeric and arithmetic operations such as addition, subtraction, multiplication, and division can be performed

Quantitative and categorical data

Quantitative data: data on which numeric and arithmetic operations such as addition, subtraction, multiplication, and division can be performed. ex: volume, average. Categorical data: data on which arithmetic operations cannot be performed. ex: industry (grouping)

Skewness is an important characteristic of the shape of a distribution

Ordinal scale

Stronger level of measurement than nominal; able to categorize and rank data with respect to a characteristic or trait. ex: ratings of excellent, good, fair, and poor - difficult to determine the difference between values; differences are not equal

imputation

Systematic replacement of missing values with values that seem reasonable

Quartiles

The 25th, 50th, and 75th percentiles, referred to as the first quartile, the second quartile (median), and third quartile, respectively. The quartiles can be used to divide a data set into four parts, with each part containing approximately 25% of the data. Excel function: QUARTILE.EXC

interquartile range

The difference between the upper quartile and the lower quartile.
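
The quartiles and the interquartile range can be computed with Python's `statistics.quantiles`. Its `method="exclusive"` option places each cut point at position p/100 x (n + 1) in the sorted data, which appears to be the same convention as the .EXC functions in Excel, though that correspondence is an assumption here. The data values are made up.

```python
import statistics

data = [2, 4, 5, 7, 8, 10, 12, 13, 15, 18, 21]  # hypothetical, n = 11

# n=4 asks for the three cut points that split the data into four parts
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
iqr = q3 - q1  # upper quartile minus lower quartile

print(q1, q2, q3, iqr)  # 5.0 10.0 15.0 10.0
```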

Variation

The differences in the values of a variable measured over observations

growth factor

The ratio of a value at the end of a period to its value at the start of the period. The percentage increase of the value over the period, expressed as a decimal, is calculated using the formula (growth factor - 1). A growth factor less than one indicates negative growth, whereas a growth factor greater than one indicates positive growth. The growth factor cannot be less than zero.
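
A small sketch of the relationship between growth factors and percentage changes, assuming the growth factor for a period is the ending value divided by the starting value. The investment values are hypothetical.

```python
# Hypothetical year-end values of an investment
values = [100, 110, 99]

# growth factor for each period = ending value / starting value
growth_factors = [end / start for start, end in zip(values, values[1:])]
# percentage change = (growth factor - 1) * 100
pct_changes = [(g - 1) * 100 for g in growth_factors]

print(growth_factors)                       # [1.1, 0.9]
print([round(p, 1) for p in pct_changes])   # [10.0, -10.0]
```

The second period has a growth factor below one, so its percentage change is negative.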

dimension reduction

The process of removing variables from the analysis without losing crucial information.

Missing completely at random (MCAR)

The tendency for an observation to be missing a value of some variable is entirely random.

Missing Not at Random (MNAR)

The tendency for an observation to be missing a value of some variable is related to the missing value.

Missing at random (MAR)

The tendency for an observation to be missing a value of some variable is related to the value of some other variable(s) in the data.

standard deviation

a measure of variability computed by taking the positive square root of the variance

Variable

a characteristic or quantity of interest that can take on different values

scatter chart

a graphical presentation of the relationship between two quantitative variables. One variable is shown on the horizontal axis and the other on the vertical axis (scatter chart or scatter plot)

Empirical rule

a rule that can be used to compute the approximate percent of data values that lie within 1, 2, or 3 standard deviations of the mean for data that exhibit a bell-shaped distribution -Approximately 68% of the data values will be within 1 standard deviation of the mean. -Approximately 95% of the data values will be within 2 standard deviations of the mean. -Almost all of the data values will be within 3 standard deviations of the mean.
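
The percentages in the empirical rule can be checked against an exact normal distribution. This sketch assumes a standard normal distribution (mean 0, standard deviation 1) as the bell-shaped model and uses `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# share of a normal distribution within k standard deviations of the mean
shares = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(k, round(share * 100, 1))
# 1 68.3
# 2 95.4
# 3 99.7
```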

Observation

a set of values corresponding to a set of variables

Sample

a subset of the population -Random sampling: a sampling method to gather a representative sample of the population data

Relative frequency distribution

a tabular summary of data showing the fraction or proportion of data values in each of several non-overlapping bins. Relative frequency of a bin = frequency of the bin / n

frequency distribution

a tabular summary of data showing the number of data values (frequency) in each of several non-overlapping bins. ex: how many times the same value appears in the sample
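
A frequency distribution, and the relative and percent frequency distributions derived from it, can be sketched with `collections.Counter`. The sample of soft drink purchases is made up for the example.

```python
from collections import Counter

# Hypothetical sample of 10 soft drink purchases
purchases = ["Coca-Cola", "Pepsi", "Coca-Cola", "Sprite", "Pepsi",
             "Coca-Cola", "Dr. Pepper", "Pepsi", "Coca-Cola", "Sprite"]
n = len(purchases)

frequency = Counter(purchases)  # number of data values in each bin
relative = {drink: count / n for drink, count in frequency.items()}
percent = {drink: 100 * count / n for drink, count in frequency.items()}

print(frequency["Coca-Cola"], relative["Coca-Cola"], percent["Coca-Cola"])
# 4 0.4 40.0
```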

percentile

a value such that approximately p% of the observations have values less than the pth percentile; hence approximately (100 - p)% of the observations have values greater than the pth percentile. The 50th percentile is the median. Excel function: PERCENTILE.EXC. Location of the pth percentile: L_p = (p/100)(n + 1)
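
The location formula L_p = (p/100)(n + 1) can be turned into a small function. The sketch below assumes linear interpolation between neighboring values when L_p is not a whole number; the function name `percentile_exc` and the data are hypothetical.

```python
def percentile_exc(values, p):
    """pth percentile using the location L_p = p/100 * (n + 1)."""
    data = sorted(values)
    n = len(data)
    loc = p * (n + 1) / 100  # 1-based position in the sorted data
    if loc < 1 or loc > n:
        raise ValueError("percentile location falls outside the data")
    k = int(loc)             # whole part of the location
    frac = loc - k           # fractional part, used to interpolate
    if frac == 0:
        return data[k - 1]
    return data[k - 1] + frac * (data[k] - data[k - 1])

scores = [10, 20, 30, 40, 50, 60, 70, 80, 90]  # hypothetical, n = 9
print(percentile_exc(scores, 50))  # 50
print(percentile_exc(scores, 85))  # 85.0
```

For p = 85 the location is 8.5, so the result lies halfway between the 8th value (80) and the 9th value (90).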

Population

all elements of interest

mesokurtic distributions

have kurtosis similar to that of a NORMAL distribution (zero excess kurtosis)

Interval scale

data can be categorized and ranked AND the differences between scale values are equal; adding and subtracting are meaningful BUT the value of zero is arbitrarily chosen and may not reflect a complete absence of what is being measured. ex: Fahrenheit scale for temperature (0 degrees does not reflect the absence of temperature)

continuous data

data that can have almost any numeric value (infinite) and can be meaningfully subdivided into finer and finer units of measurement. ex: sales, profits, and time

discrete data

data whose values are finite, unique, and different from other values. ex: the question "number of children" on a survey form (this question would yield discrete values from each respondent, as you can only have 1, 2, 3, 4, etc. children, not 1.28 or 4.56 children)

Data

facts and figures collected, analyzed, and summarized for presentation and interpretation

Platykurtic distributions

have NEGATIVE excess kurtosis

Leptokurtic distributions

have POSITIVE excess kurtosis

illegitimately missing data

missing data that do not occur naturally

legitimately missing data

missing data that occur naturally

correlation coefficient

standardized measure of linear association between two variables that takes on values between -1 and +1. Values near -1 indicate a strong negative linear relationship, values near +1 indicate a strong positive linear relationship, and values near zero indicate the lack of a linear relationship.
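
The card gives no formula, so this sketch assumes the usual sample correlation coefficient: the sample covariance divided by the product of the two sample standard deviations. The function name and data are hypothetical.

```python
import math

def correlation(x, y):
    """Sample correlation: covariance / (std dev of x * std dev of y)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((a - x_bar) ** 2 for a in x) / (n - 1))
    s_y = math.sqrt(sum((b - y_bar) ** 2 for b in y) / (n - 1))
    return s_xy / (s_x * s_y)

# Hypothetical paired observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(correlation(x, y), 4))  # 0.7746
```

Because the result is unit-free and bounded by -1 and +1, it is easier to interpret than the covariance of the same data.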

Ratio

strongest scale of measurement; data can be categorized and ranked, with equal distances between values and a meaningful zero, that is, a true absence of what is being measured; values can be added, subtracted, multiplied, or divided. ex: sales, profits, and inventory levels

Kurtosis

the heaviness or lightness of the tails of the distribution that make the distribution flatter or taller than the typical normal distribution

bins

the non-overlapping groupings of data used to create a frequency distribution. Bins for categorical data are also known as CLASSES. ex: Coca-Cola, Diet Coke, Dr. Pepper, Pepsi, Sprite
