ISDS Final

___ scale: the strongest level of measurement. ___ data may be _____ and ____ with respect to some characteristic or trait. Differences between interval values are equal and meaningful. There is an ______ or defined starting point: "0" does mean "the absence of..." Thus, meaningful ratios may be obtained.

ratio, ratio, categorized, ranked, "absolute 0"

Unlike qualitative data, _____ operations are valid on interval- and ratio-scaled data.

arithmetic

The ____ is the most common measure of central tendency.

arithmetic mean

A ____ depicts the frequency or the relative frequency for each category of the qualitative data as a bar rising vertically from the horizontal axis.

bar chart

The ____ is the extent to which all the data values group around a typical central value.

central tendency

slide 36-42

ch 2

class relative frequency=

class frequency/total number of observations

A _____ variable can take on any value within an interval; given sufficiently precise measurement, no two continuous values are identical.

continuous

A _____ identifies the proportion or fraction of values that fall into each class

relative frequency distribution

a subset of the population

sample

A ___ is used to determine if two variables are related. Each point is a pairing: (x1, y1), (x2, y2), and so on.

scatter plot

The ____ is the pattern of the distribution of values from the lowest value to the highest value

shape

The ________ describes how data are distributed; two useful shape-related statistics are ___ and ____.

shape of a distribution, skewness, kurtosis

The range: the ____ measure of variation. It is the difference between the largest and smallest values.

simplest

____ measures the amount of asymmetry in a distribution. Mean < median: ______. Mean = median: ______. Median < mean: ______.

skewness, left skewed, symmetric, right skewed
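
A minimal Python sketch of the mean-versus-median rule on the card above. The sample lists are made up for illustration, and comparing the mean with the median is only a rough indicator of skew, not a formal skewness statistic.

```python
from statistics import mean, median

def skew_direction(values):
    """Classify skew by comparing the mean with the median (rule of thumb)."""
    m, md = mean(values), median(values)
    if m < md:
        return "left skewed"
    if m > md:
        return "right skewed"
    return "symmetric"

print(skew_direction([1, 2, 3, 4, 5]))    # symmetric
print(skew_direction([1, 2, 3, 4, 50]))   # right skewed (long tail to the right)
print(skew_direction([-50, 2, 3, 4, 5]))  # left skewed (long tail to the left)
```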

numerical measure that describes a characteristic of a sample

statistic

___ is the methodology of extracting useful information from a data set.

statistics

A _____ provides a visual display of quantitative data. It gives an overall picture of the data's center and variability. Each value of the data set is separated into two parts: the ____ consists of the leftmost digits, while the _____ is the last digit.

stem and leaf diagram, stem, leaf
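
A small sketch of building a stem-and-leaf display, assuming non-negative whole-number data where the leaf is the last digit and the stem is everything to its left. The data values are hypothetical, and stems with no leaves are simply not printed here.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (leftmost digits) to its sorted list of leaves (last digit)."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # e.g. 37 -> stem 3, leaf 7
        plot[stem].append(leaf)
    return dict(plot)

data = [12, 15, 21, 24, 24, 37, 38, 41]  # hypothetical data
for stem, leaves in stem_and_leaf(data).items():
    print(f"{stem} | {' '.join(map(str, leaves))}")
# 1 | 2 5
# 2 | 1 4 4
# 3 | 7 8
# 4 | 1
```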

Shape of distribution: _____ means a mirror image on both sides of its center.

symmetric

Shape of distribution: typically ___ or ____.

symmetric or skewed

Shape is either ___ or ____

symmetrical, skewed

Example: when waiting in line at a bookstore, the wait time can be 10 minutes, 10.1 minutes, 10.11 minutes, and so on.

continuous quantitative variables

Data on many subjects at the same point in time, without regard to differences in time (subjects might include individuals, households, firms, countries)

cross sectional data

A _______ specifies how many observations fall below the upper limit of a particular class.

cumulative frequency distribution

A ______ gives the proportion or fraction of values that fall below the upper limit of each class

cumulative relative frequency distribution
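
The two cumulative distributions above can be sketched as running totals of the class frequencies. The four class frequencies below are hypothetical and assumed to be listed from the lowest class to the highest.

```python
from itertools import accumulate

def cumulative_distributions(frequencies):
    """Return (cumulative frequencies, cumulative relative frequencies), lowest class first."""
    total = sum(frequencies)
    cum = list(accumulate(frequencies))  # observations at or below each class's upper limit
    cum_rel = [c / total for c in cum]   # the same counts as proportions of all observations
    return cum, cum_rel

freqs = [4, 6, 8, 2]  # hypothetical class frequencies
cum, cum_rel = cumulative_distributions(freqs)
print(cum)      # [4, 10, 18, 20]
print(cum_rel)  # [0.2, 0.5, 0.9, 1.0]
```

The last cumulative relative frequency is always 1.0, because every observation falls below the upper limit of the highest class.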

Scatter plot: ____ relationship: as x increases, y increases at an increasing (or decreasing) rate, or as x increases, y decreases at an increasing (or decreasing) rate.

curvilinear

To do good statistics, you must find the right _____, use the appropriate statistical ____, and clearly translate the ___ information into ____ language.

data, tools, numerical, written

Use ____ statistics when all the data points are known: collect data (e.g., with surveys), present data (tables/graphs), and characterize data (e.g., the sample mean).

descriptive

collecting, organizing, and presenting the data

descriptive statistics

Two branches of statistics: ______ and ______

descriptive statistics, inferential statistics

Reasons for sampling from the population: it is too _____ to gather information on the entire population, and it is often ____ to do so.

expensive, impossible

A data value is considered an _____ if its z-score is less than -3.0 or greater than +3.0.

extreme outlier
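
A sketch of flagging extreme outliers with z-scores, assuming z is computed with the sample standard deviation (`statistics.stdev`). The data are hypothetical; 20 values are used because with 10 or fewer values no sample z-score can exceed 3 in absolute value.

```python
from statistics import mean, stdev

def z_scores(values):
    """z = (x - mean) / s for each value, where s is the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(x - m) / s for x in values]

def extreme_outliers(values, cutoff=3.0):
    """Values whose z-score is less than -cutoff or greater than +cutoff."""
    return [x for x, z in zip(values, z_scores(values)) if abs(z) > cutoff]

data = [9, 10, 11] * 6 + [10, 100]  # hypothetical: 20 values, one suspiciously large
print(extreme_outliers(data))  # [100]
```

The larger |z| is, the farther the value lies from the mean; here 100 sits a little over 4 standard deviations above it.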

The larger the absolute value of the z-score, the ____the data value is from the mean.

farther

A ____ for qualitative data groups data into categories and records how many observations fall into each category.

frequency distribution

A ____ variable is categorical: yes/no, day of the week, year classification, insurance provider.

qualitative

Types of variables: 1) ______: gender, race, political affiliation; 2) _____: test scores, age, weight (discrete or continuous).

qualitative, quantitative

___ values may be converted to ____ values for analysis purposes.

qualitative, quantitative

____ variables (numerical), continuous: an infinite number of values within some interval, e.g., weight, height, investment return, length of time.

quantitative

A _____ variable (numerical), discrete: a countable number of distinct values, e.g., number of children in a family, number of points scored in a basketball game, number of books purchased, number of classes taken. It takes on individually distinct values.

quantitative

The interval scale is used with ______ variables. Its main drawback is that the value of zero is ______: the zero point does not reflect a complete absence of what is being measured.

quantitative, arbitrarily chosen

Variation: 3 types (___, ___, ___). Measures of variation give information on the ____, ___, or ___ of the data values.

range, variance, standard deviation, spread, variability, dispersion

Q2

(n+1)/2

Q1

(n+1)/4

The majority of observations will lie within ___ to ____ standard deviations of the mean.

-1, +1

If you draw one card from a standard deck, what is the chance that you will draw a specific card? (1 in 52.) What is the chance of drawing a particular suit?

13/52 or 1/4
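
The same arithmetic with exact fractions, assuming a standard 52-card deck with 4 suits of 13 cards each and every card equally likely to be drawn.

```python
from fractions import Fraction

DECK = 52
SUITS = 4
CARDS_PER_SUIT = DECK // SUITS  # 13

p_specific_card = Fraction(1, DECK)
p_particular_suit = Fraction(CARDS_PER_SUIT, DECK)  # Fraction reduces 13/52 automatically
print(p_specific_card, p_particular_suit)  # 1/52 1/4
```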

Q3

3(n+1)/4
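
A sketch of the (n+1)-based quartile positions from these cards, applied to a hypothetical sorted data set of 11 values. The positions are 1-based ranks in the sorted data; when a position is not a whole number, textbooks differ on how to resolve it. Python's `statistics.quantiles` (default `method="exclusive"`, Python 3.8+) also works from (n+1) positions and interpolates linearly between neighbors, so it serves as a cross-check here.

```python
from statistics import quantiles

def quartile_positions(n):
    """1-based positions of Q1, Q2 (median), Q3 in data sorted smallest to largest."""
    return (n + 1) / 4, (n + 1) / 2, 3 * (n + 1) / 4

data = sorted([7, 3, 9, 1, 5, 11, 13, 15, 17, 19, 21])  # hypothetical, n = 11
positions = quartile_positions(len(data))
print(positions)  # (3.0, 6.0, 9.0)

# Whole-number positions index the sorted list directly (1-based -> 0-based)
print([data[int(p) - 1] for p in positions])  # [5, 11, 17]

# Cross-check with the standard library's (n + 1)-based method
print(quantiles(data, n=4))  # [5.0, 11.0, 17.0]
```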

The number of classes in a frequency distribution for quantitative data usually ranges from ____ to ____. Approximating the class width: ____

5, 20, (largest value - smallest value) / number of classes
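
The class-width approximation as a short function, using hypothetical data and 5 classes. Rounding the result up to a convenient number, as in the last line, is a common practice and an assumption here, not something the card specifies.

```python
import math

def approximate_class_width(values, num_classes):
    """(largest value - smallest value) / number of classes."""
    return (max(values) - min(values)) / num_classes

data = [12, 47, 33, 58, 21, 90, 65, 74, 39, 15]  # hypothetical observations
width = approximate_class_width(data, 5)
print(width)             # 15.6, i.e. (90 - 12) / 5
print(math.ceil(width))  # 16: rounded up to a whole number
```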

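Empirical rule: percentage of observations within 1 standard deviation of the mean

about 68% (68.27% for a normal distribution)

Empirical rule: percentage of observations within 3 standard deviations of the mean

99.7%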
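
Where the empirical-rule percentages come from, assuming the data follow a normal distribution: the share within k standard deviations of the mean equals erf(k / sqrt(2)), which Python's `math.erf` computes directly.

```python
import math

def within_k_sd(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 3):
    print(k, f"{within_k_sd(k):.2%}")
# 1 68.27%
# 3 99.73%
```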

It is now commonly believed that the average young person will have seen 100,000 beer commercials between the ages of two and eighteen. This statistic was stated in the New York Times, in Sports Illustrated, and in congressional testimony. Is it true? Is it even feasible?

But just think: sixteen years, or about 5,844 days, occur between a person's second and eighteenth birthdays. To see 100,000 beer commercials in that period, a person would have to see an average of more than seventeen a day! Common sense alone should have been enough to dispel the myth.
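
The sanity check above as a few lines of arithmetic, assuming an average year of 365.25 days to account for leap years.

```python
DAYS_PER_YEAR = 365.25     # average year length, counting leap years
days = 16 * DAYS_PER_YEAR  # from the 2nd to the 18th birthday
per_day = 100_000 / days   # commercials per day needed to reach 100,000

print(days)               # 5844.0
print(round(per_day, 2))  # 17.11
```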

Researchers showed that infants who sleep with a nightlight are much more likely to develop myopia. Problem with conclusion:

This is an example of the correlation-to-causation fallacy. Even if two variables are highly correlated, one does not necessarily cause the other.

____ example: the categories are rainy, sunny, or cloudy. For each category's frequency, count the days that fall in that category. Then calculate each relative frequency by dividing the category's frequency by the sample size. To express relative frequencies as percentages, multiply each proportion by 100%. Note that the proportions must total 1.0 and the percentages must total 100%.

frequency distribution
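
The weather example above, sketched with `collections.Counter`. The ten daily observations are hypothetical; they give frequencies of 2 rainy, 5 sunny, and 3 cloudy days, i.e. relative frequencies of 0.20, 0.50, and 0.30 (20%, 50%, 30%).

```python
from collections import Counter

# Hypothetical weather observations, one per day
days = ["sunny", "rainy", "sunny", "cloudy", "sunny",
        "rainy", "cloudy", "sunny", "sunny", "cloudy"]

freq = Counter(days)                            # category -> frequency
n = len(days)                                   # sample size
rel_freq = {c: f / n for c, f in freq.items()}  # category -> relative frequency

for category in ("rainy", "sunny", "cloudy"):
    share = rel_freq[category]
    print(category, freq[category], f"{share:.2f}", f"{share:.0%}")

# The proportions total 1.0 (up to floating-point rounding)
print(sum(rel_freq.values()))
```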

A ____ for quantitative data groups data into intervals called classes and records the number of observations that fall into each class. Guidelines when constructing one: classes are _______, and classes are ______.

frequency distribution, mutually exclusive, exhaustive
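
A sketch of grouping quantitative data into classes. It assumes equal-width, half-open classes [lower, lower + width), which makes them mutually exclusive, and it raises an error if a value falls outside every class, i.e. if the chosen classes are not exhaustive. The data and the choice of 5 classes of width 20 starting at 10 are hypothetical.

```python
def frequency_distribution(values, lower, width, num_classes):
    """Count how many values fall into each class.

    Class i covers [lower + i*width, lower + (i+1)*width), so no value can land
    in two classes. A ValueError signals a value outside every class.
    """
    counts = [0] * num_classes
    for v in values:
        i = int((v - lower) // width)  # index of the class containing v
        if not 0 <= i < num_classes:
            raise ValueError(f"{v} is outside all classes; the classes are not exhaustive")
        counts[i] += 1
    return counts


data = [12, 47, 33, 58, 21, 90, 65, 74, 39, 15]  # hypothetical observations
lower, width, k = 10, 20, 5
for i, count in enumerate(frequency_distribution(data, lower, width, k)):
    lo = lower + i * width
    print(f"[{lo}, {lo + width}): {count}")
```

These counts are what a histogram draws as bar heights, with each bar as wide as the class.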

Ratio scale: the following variables are measured on a ratio scale. ____ examples: weight, time, and distance. _____ examples: sales, profits, and inventory levels.

general, business

A ____ is a visual representation of a frequency or a relative frequency distribution. ____ represents the respective class frequency (or relative frequency). ____ represents the class width.

histogram, bar height, bar width

Why the range can be misleading: it ___ the way in which data are distributed, and it is sensitive to ____.

ignores, outliers

A newspaper headline states "What global warming?" after record amounts of snow in 2010. Problem with conclusion:

It is incorrect to draw conclusions based on a single data point.

Use ___ statistics when the actual count is not easily obtainable. Oftentimes, inferences are made about college students based on data obtained from a sample.

inferential

We could use the average age of students in this classroom to make an inference regarding the average age of LSU students or the average age of college students in general.

inferential

drawing conclusions about a population based on sample data from that population

inferential statistics

_____ scale: consider the ______ scale of temperature. This scale is interval because the data are ranked and differences (+ or -) may be obtained, but there is no "absolute 0" (what does 0 degrees mean?).

interval, Fahrenheit

_____ scale: data may be categorized and ranked with respect to some characteristic or trait. Differences between ____ values are equal and meaningful. There is no "absolute 0" or defined starting point. ______ ratios may not be obtained.

interval, interval, meaningful

measures the relative concentration of values in the center of a distribution as compared with the tails

kurtosis

Statistics is the ____ of data. It is the study of collecting, analyzing, presenting, and interpreting _____. It is a science of getting useful ______ from data.

language, data, information

scatter plot: ____ relationship: upward or downward sloping trend of the data. Positive linear relationship: as x increases, so does y. Negative linear relationship: as x increases, y decreases.

linear

Sample variance: (approximately) the average of the squared deviations of values from the _____. It _____ take into account how all the data values are distributed. SS = _______ is the numerator of the formula: the sum of all squared differences between the x values and the mean, sum of (x - x-bar)^2. The variance can never be ___ because the values are squared, and it will equal ___ only if all observations have the same value (no variation). The sample variance is the sum of the squared differences around the mean divided by the sample size minus 1: s^2 = SS / (n - 1).

mean, does, sum of squares, negative, zero
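
A direct translation of s^2 = SS / (n - 1) and of the standard deviation as its square root, with the standard library's `statistics.variance` as a cross-check. The data set is hypothetical (its mean is 5 and its SS is 32).

```python
import math
import statistics

def sample_variance(values):
    """s^2 = SS / (n - 1), where SS is the sum of squared deviations from the mean."""
    n = len(values)
    x_bar = sum(values) / n
    ss = sum((x - x_bar) ** 2 for x in values)  # never negative: every term is a square
    return ss / (n - 1)

def sample_std(values):
    """s = square root of the sample variance, in the same units as the data."""
    return math.sqrt(sample_variance(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]    # hypothetical data
print(sample_variance(data))       # 32 / 7, about 4.571
print(sample_std(data))            # about 2.138
print(statistics.variance(data))   # standard-library cross-check
print(sample_variance([3, 3, 3]))  # 0.0: all values equal, no variation
```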

The ___ is generally used, unless extreme values (outliers) exist. The ___ is often used, since it is not sensitive to extreme values. For example, ____ home prices may be reported for a region because they are less sensitive to outliers. In some situations it makes sense to report both the ___ and ____.

mean, median, median, mean, median
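
A quick illustration of why median home prices are often reported. The prices (in $1,000s) are hypothetical, with one very expensive home acting as an extreme value.

```python
from statistics import mean, median

# Hypothetical home prices in $1,000s; the last home is an extreme value
prices = [180, 195, 210, 225, 240, 2500]

print(round(mean(prices), 1))                  # 591.7: pulled far up by one home
print(median(prices))                          # 217.5: average of 210 and 225
print(mean(prices[:-1]), median(prices[:-1]))  # 210 210 without the extreme value
```

Dropping the single extreme value moves the mean by almost 382 but the median by only 7.5.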

There are 3 measures of central tendency:

mean, median, mode

central tendency: 3 types

mean, median, mode

Most data sets tend to cluster around a central tendency. When people talk about an "average value," they are talking about the ____; the "middle value," the _____; the "most frequent value," the ____. ____ denotes the sample mean/average: the sum of the values divided by the number of values.

mean, median, mode, x-bar

In an ordered array, the ____ is the middle number (50% above, 50% below). When the data set contains an odd number of values, it is the ______; when it contains an even number, take the _____ of the 2 middle values. It ______ affected by extreme values.

median, middle value, average, is not
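
The odd/even rule for the median, written out by hand. The small lists are hypothetical, and the last one shows that replacing the largest value with an extreme one leaves the median unchanged.

```python
def median(values):
    """Middle value of the ordered array; average of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average of the 2 middle values

print(median([5, 1, 3]))       # 3
print(median([5, 1, 3, 8]))    # 4.0
print(median([5, 1, 3, 800]))  # 4.0: the extreme value does not change it
```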

measure of variation: the ___ the data is spread out, the greater the range, variance, and standard deviation.

more

measure of variation: the ____ the data is concentrated, the smaller the range, variance, and standard deviation.

more

The mean: the _______ of central tendency. _____ = the sum of the values divided by the number of values. It is affected by _______ (outliers) because it is the only measure in which all values play an equal role. It acts like the _____ for the set. Avoid using the mean as a measure of central tendency when you have ______ (outliers).

most common measure, mean, extreme values, "balance point", extreme values

Sample standard deviation: the _____ used measure of variation. It shows variation about the ____, is the square root of the ____, and has the same units as the _____.

most commonly, mean, variance, original data

The mode: Value that occurs _______. _____ affected by extreme values. Used for either ___ or ___ data. There may be no mode. There may be several modes.

most often, not, numerical, categorical
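
A sketch of finding the mode(s) with `collections.Counter`. It returns every value tied for the highest count, which covers the several-modes case, and it works for categorical data too. Treating "no value repeats" as "no mode" is a convention assumed here.

```python
from collections import Counter

def modes(values):
    """All values that occur most often; [] when no value repeats (treated as 'no mode')."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once: no mode
    return [v for v, c in counts.items() if c == top]

print(modes([1, 2, 2, 3, 3, 4]))               # [2, 3]: several modes
print(modes(["red", "blue", "red", "green"]))  # ['red']: categorical data
print(modes([1, 2, 3, 4]))                     # []: no mode
print(modes([5, 5, 6, 7, 1000]))               # [5]: unaffected by the extreme value
```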

Location of the median when the values are in numerical order (smallest to largest). This gives only the position of the median in the ranked data, not its value.

(n+1)/2

Scatter plot: ___ relationship: data are randomly scattered with no discernible pattern. There is no apparent relationship between x and y.

no

Scales of measurement: qualitative variables: ___ & ___; quantitative variables: ___ & ___.

nominal, ordinal, interval, ratio

When summarizing qualitative data, typically count the ____ of, or calculate the percentage of, persons or objects that fall into each possible category. Meaningful arithmetic operations such as adding/subtracting cannot be performed.

number

An ____ is a visual representation of a cumulative frequency or a cumulative relative frequency distribution. Plot the cumulative frequency (or cumulative relative frequency) of each class above the upper limit of the corresponding class. The neighboring points are then connected.

ogive

___ scale: data may be categorized and ranked with respect to some characteristic or trait. For example, instructors are often evaluated on this scale (excellent, good, fair, poor).

ordinal

___ scale: differences between categories are meaningless because the actual numbers used may be arbitrary: there is no objective way to interpret the difference between instructor quality.

ordinal

The z-score is useful in identifying _____.

outliers

The range: measures total spread in the set of data; sensitive to _____. Range does not consider how data are distributed between smallest and largest values. ___ measure of variation

outliers, simplest
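
A tiny illustration of both weaknesses of the range, using hypothetical lists: one outlier inflates it, and two data sets with very different distributions can share the same range.

```python
def data_range(values):
    """Range = largest value - smallest value."""
    return max(values) - min(values)

a = [10, 11, 12, 13, 14]
b = [10, 11, 12, 13, 90]  # one outlier
c = [10, 10, 10, 10, 14]  # same range as a, but distributed very differently

print(data_range(a), data_range(b), data_range(c))  # 4 80 4
```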

numerical measure that describes a characteristic of a population

parameter

A _____ is a segmented circle whose segments portray the relative frequencies of the categories of some qualitative variable.

pie chart

A _____ is a visual representation of a frequency or a relative frequency distribution. Plot the class midpoints on x-axis and associated frequency (or relative frequency) on y-axis. Neighboring points are connected with a straight line.

polygon

consists of all items of interest

population

Shape of distribution: ____ skewed: data form a long, narrow tail to the right. ___ skewed: data form a long, narrow tail to the left.

positively, negatively

the least sophisticated level of measurement, data are simply categories for grouping the data.

the nominal scale

A gambler predicts that he will roll a 7 on his next roll of the dice since he was unsuccessful in the last three rolls. Problem with conclusion:

the probability of rolling a 7 stays constant with each roll of the dice.
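
Counting the outcomes shows why the gambler is wrong, assuming two fair six-sided dice and independent rolls. The 36 equally likely outcomes, and therefore the probability of a 7, are the same on every roll regardless of earlier results.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) outcomes of one roll of two dice
outcomes = list(product(range(1, 7), repeat=2))
sevens = [pair for pair in outcomes if sum(pair) == 7]
p_seven = Fraction(len(sevens), len(outcomes))
print(p_seven)  # 1/6 on every roll, whatever happened before
```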

One subject over several time periods: daily, weekly, monthly, quarterly, annual (e.g., monthly sales, daily price, weekly rate)

time series data

With knowledge of statistics: avoid risk of making ____ decisions and ____ mistakes. Differentiate between sound ____ conclusions and ____ conclusions.

uninformed, costly, statistical, questionable

___ are expressed in words but coded into numbers for processing purposes.

values

A ____ is the general characteristic being observed on an object of interest.

variable

2 common measures of variation:

variance, standard deviation

The ____ is the amount of dispersion or scattering of values

variation

____ is the number of standard deviations a data value is from the mean.

z-score

Measures of variation: if the values are all the same (no variation), all these measures will be ____; none of these measures are ever _____.

zero, negative

