MGSC2301 chapter 1 quiz
Which of the following is an example of continuous data? (a) number of children (b) amount of time it takes to assemble an IKEA bookcase (c) total number of phone calls made in a week (d) number of bathrooms in a house (e) all of the above.
(b) amount of time it takes to assemble an IKEA bookcase
A researcher should not use the mean as a measure of central tendency unless the data is at least on a(n) ______ scale: (a) nominal (b) ordinal (c) interval (d) ranking (e) ratio
(c) interval
Which of the following is a parameter? (a) sample mean (b) sample standard deviation (c) population mean (d) sample median (e) sample mode
(c) population mean
A manufacturer of supercomputers wants to sample 20 out of 500 that have been manufactured this year. An ID number is assigned to each of the 500 computers and then 20 random numbers are generated to see which computers to choose for the sample. This is an example of a: (a) random data listing (b) frequency distribution (c) simple random sample (d) census (e) none of the above.
(c) simple random sample
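The selection procedure in this question can be sketched in a few lines of Python. This is only an illustration: the ID range 1 to 500 and the seed value are assumptions, not part of the quiz question.

```python
import random

# Hypothetical ID numbers 1..500, one per manufactured computer.
computer_ids = list(range(1, 501))

# A fixed (arbitrary) seed makes the draw repeatable for this illustration.
rng = random.Random(2301)

# Draw 20 distinct IDs; every computer has the same chance of being chosen.
sample_ids = rng.sample(computer_ids, k=20)

print(sorted(sample_ids))
```

Because the draw is made without replacement, no computer can appear in the sample twice.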
Which of the following is an example of interval scale data? (a) religion of Americans (b) ethnicity of Americans (c) temperature in Centigrade (Celsius) (d) height of Americans (e) all of the above.
(c) temperature in Centigrade (Celsius)
Which of the following is an example of discrete data? (a) circumference of American women's wrists (b) amount of time spent playing computer games (c) total number of phone calls made in a week (d) length of elephant tusks (e) all of the above.
(c) total number of phone calls made in a week
Which of the following is an example of ratio scale data? (a) Score on an accounting final (b) temperature in Fahrenheit (c) weight of cows (d) occupation (e) all of the above.
(c) weight of cows
observational studies
In observational (nonexperimental) studies, no attempt is made to control or influence the variables of interest. Surveys are a common example. Studies of smokers and nonsmokers are observational studies because researchers do not determine or control who will smoke and who will not smoke.
Statistics
-The term statistics can refer to numerical facts such as averages, medians, percentages, and maximums that help us understand a variety of business and economic situations. -Statistics can also refer to the art and science of collecting, analyzing, presenting, and interpreting data.
Parameter
A characteristic of a population. E.g., the population mean, μ. - The population mean, μ and the population standard deviation, σ, are two examples of population parameters. - If you want to determine the population parameters, you have to take a census of the entire population. Taking a census is very costly.
Probability density function
A continuous probability distribution. The probability is interpreted as "area under the curve." Some continuous probability distributions: Normal distribution, Standard Normal (Z) distribution, Student's t distribution, Chi-square (χ²) distribution, F distribution.
Statistic
A measure derived from the sample data. E.g., the sample mean, X̄.
Stochastic process
A repetitive process which generates outcomes (called events) that are not identical, and not individually predictable with certainty, but that may be described in terms of relative frequencies.
Probability Sample
A sample collected in such a way that every element in the population has a known chance of being selected.
Simple Random Sample
A sample collected in such a way that every element in the population has an equal chance of being selected.
How is statistics used in accounting, economics, and finance?
Accounting: Public accounting firms use statistical sampling procedures when conducting audits for their clients. Economics: Economists use statistical information in making forecasts about the future of the economy or some aspect of it. Finance: Financial advisors use price-earnings ratios and dividend yields to guide their investment advice.
How is statistics used for marketing, production, and information systems
Marketing: Electronic point-of-sale scanners at retail checkout counters are used to collect data for a variety of marketing research applications. Production: A variety of statistical quality control charts are used to monitor the output of a production process. Information Systems: A variety of statistical information helps administrators assess the performance of computer networks.
Descriptive Analytics
This describes what has happened in the past.
sample
a subset of the population
Objective probabilities
are long-run frequencies of occurrence
Discrete data
arise from a counting process. Example: How many courses have you taken at this College? ____
Continuous data
arise from a measuring process. Example: How much do you weigh? ___
A Continuous random variable
can take on any value within an interval.
A Discrete random variable
can take on only specified, distinct values.
cross sectional data
collected at the same or approximately the same point in time. Example: data detailing the number of building permits issued in November 2013 in each of the counties of Ohio.
Time series data
collected over several time periods. Example: data detailing the number of building permits issued in Lucas County, Ohio, in each of the last 36 months. Graphs of time series data help analysts • understand what happened in the past • identify any trends over time, and • project future levels for the time series.
sample survey
collecting data for a sample
Census
collecting data for the entire population
tabular summary
frequency and percent frequency
most common numerical descriptive statistic
mean/average
Subjective probabilities
measure the strengths of personal beliefs. Objective probability refers to stochastic processes.
data mining
methods for developing useful decision making information from large data bases • Using a combination of procedures from statistics, mathematics, and computer science, analysts "mine the data" to convert it into useful information. • The most effective data mining systems use automated procedures to discover relationships in the data and predict future outcomes prompted by general and even vague queries by the user. • The major applications of data mining have been made by companies with a strong consumer focus such as retail, financial, and communication firms. • Data mining is used to identify related products that customers who have already purchased a specific product are also likely to purchase (and then pop-ups are used to draw attention to those related products). • Data mining is also used to identify customers who should receive special discount offers based on their past purchasing volumes. • Statistical methodology such as multiple regression, logistic regression, and correlation are heavily used. • Also needed are computer science technologies involving artificial intelligence and machine learning. • A significant investment in time and money is required as well. • Finding a statistical model that works well for a particular sample of data does not necessarily mean that it can be reliably applied to other data. • With the enormous amount of data available, the data set can be partitioned into a training set (for model development) and a test set (for validating the model). • There is, however, a danger of overfitting the model to the point that misleading associations and conclusions appear to exist. • Careful interpretation of results and extensive testing is important.
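The training/test partition mentioned above can be sketched as follows. This is a minimal illustration, not a prescribed method: the function name `split_train_test`, the 20% test fraction, and the 100 record IDs are all assumptions made for the example.

```python
import random

def split_train_test(records, test_fraction=0.2, seed=0):
    """Shuffle a copy of records and split it into (train, test) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    # The first n_test shuffled records are held out for validation.
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical data set: 100 customer record IDs.
records = list(range(100))
train, test = split_train_test(records)

print(len(train), len(test))
```

A model would be developed on `train` only and then checked against `test`; a model that fits the training set well but the test set poorly is a sign of overfitting.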
Do you Smoke? Yes_____ No_____ What type of data?
nominal
scales of measurement
nominal, ordinal, interval, ratio
Please rank the taste of the following soft drinks from 1 to 5 (1=best, 2=next best, etc.) ___Coke ___Pepsi ___7Up ___Sprite ___Dr. Pepper What type of data?
ordinal
How many cigarettes did you smoke in the last 3 days (72 hours)? What type of data?
ratio
Qualitative data
result in categorical responses. • Labels or names are used to identify an attribute of each element • Often referred to as categorical data • Use either the nominal or ordinal scale of measurement • Can be either numeric or nonnumeric • Appropriate statistical analyses are rather limited
Quantitative data
result in numerical responses, and may be discrete or continuous. In general, there are more alternatives for statistical analysis when the data are quantitative. • Quantitative data indicate how many or how much. • Quantitative data are always numeric. • Ordinary arithmetic operations are meaningful for quantitative data.
The mode
the value of the data that occurs with the greatest frequency.
Predictive Analytics
use models constructed from past data to predict the future or to assess the impact of one variable on another
descriptive statistics
• Most of the statistical information in newspapers, magazines, company reports, and other publications consists of data that are summarized and presented in a form that is easy to understand. • Such summaries of data, which may be tabular, graphical, or numerical, are referred to as descriptive statistics. Example The manager of Hudson Auto would like to have a better understanding of the cost of parts used in the engine tune-ups performed in her shop. She examines 50 customer invoices for tune-ups. The costs of parts, rounded to the nearest dollar, are listed on the next slide.
mean
demonstrates a measure of the central tendency, or central location of the data for a variable. Hudson's mean cost of parts, based on the 50 tune-ups studied, is $79 (found by summing up the 50 cost values and then dividing by 50).
The median
the data value such that half of the observations are larger than it and half are smaller.
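The mean, median, and mode can be compared on a small data set. The seven cost values below are made up for illustration; they are not the Hudson Auto invoices, which are not listed in these notes.

```python
import statistics

# Hypothetical parts costs in dollars (NOT the Hudson Auto data).
costs = [62, 71, 71, 79, 85, 93, 99]

mean_cost = statistics.mean(costs)      # sum of the values divided by their count
median_cost = statistics.median(costs)  # middle value of the sorted data
mode_cost = statistics.mode(costs)      # value that occurs most often

print(mean_cost, median_cost, mode_cost)
```

Here the three measures differ (80, 79, and 71), which shows that each describes "the center" of the data in a different way.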
Elements
the entities on which data are collected.
scales of measurement graph
the left side of the graph he seemed to be focusing on
The total number of data values in a complete data set is what?
the number of elements multiplied by the number of variables.
data warehousing
the process of capturing, storing, and maintaining data • Organizations obtain large amounts of data on a daily basis by means of magnetic card readers, bar code scanners, point of sale terminals, and touch screen monitors. • Walmart captures data on 20 to 30 million transactions per day. • Visa processes 6,800 payment transactions per second.
Nominal data
A type of qualitative (categorical) data. It is a classification and consists of categories. When objects are measured on a nominal scale, all we can say is that one is different from the other. Data are labels or names used to identify an attribute of the element. A nonnumeric label or numeric code may be used. Examples: sex, occupation, ethnicity, marital status, student major, etc. Appropriate statistics: mode, frequency, percentage. We cannot use an average. It would be meaningless here. [Try to ask: What is the average SEX in this room? What is the average RELIGION? It makes no sense!] Example: Say we have 20 males and 30 females. These are frequencies. 60% are female. The mode (the data value that occurs most frequently) is 'female'. Now suppose we "code" the data, 1 for male and 2 for female. Can we compute the average sex as (20 x 1 + 30 x 2) / 50 = 1.6? Is the average sex = 1.6? What are the units? 1.6 what? What does 1.6 mean? Example: Students of a university are classified by the school in which they are enrolled using a nonnumeric label such as Business, Humanities, Education, and so on. Alternatively, a numeric code could be used for the school variable (e.g. 1 denotes Business, 2 denotes Humanities, 3 denotes Education, and so on).
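The 20 males / 30 females example can be worked through in code. This sketch uses the counts and the 1/2 coding from the card; the variable names are assumptions for illustration.

```python
from collections import Counter

# Counts from the example: 20 males and 30 females.
responses = ["male"] * 20 + ["female"] * 30

# Frequencies, mode, and percentage are meaningful for nominal data.
counts = Counter(responses)
mode_label, mode_count = counts.most_common(1)[0]
percent_female = 100 * counts["female"] / len(responses)

# The numeric codes are arbitrary labels, so this "average" has no meaning;
# a different coding (say 0 and 1) would produce a different number.
codes = {"male": 1, "female": 2}
coded_mean = sum(codes[r] for r in responses) / len(responses)

print(mode_label, percent_female, coded_mean)
```

The arithmetic runs without complaint and returns 1.6, which is exactly the problem: the software cannot tell that the codes are only labels.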
Analytics
the scientific process of transforming data into insight for making better decisions
Please rank from 1 to 4 each of the following: ___being hit in the face with a dead rat ___being buried up to your neck in cow manure ___failing this course ___having nothing to eat except for chopped liver for a month What type of data is this?
Ordinal (for example, a response of 1, 2, 4, 3)
data
Data are the facts and figures collected, analyzed, and summarized for presentation and interpretation. All the data collected in a particular study are referred to as the data set for the study.
Primary data
Data compiled by the researcher.
Secondary data
Data compiled or published elsewhere, e.g., Statistical Abstracts, census data. The trick is to find data that is useful. The data was probably collected for some purpose other than helping to solve the researcher's problem at hand.
Interval Data
Equal intervals, but no "true" zero. The data have the properties of ordinal data, and the interval between observations is expressed in terms of a fixed unit of measure. Interval data are always numeric. Examples: IQ, temperature, GPA. Since there is no true zero (the complete absence of the characteristic you are measuring), you cannot speak about ratios. Example: Suppose New York temperature is 40 degrees and Buffalo temperature is 20 degrees. Does that mean it is twice as cold in Buffalo as in NY? No. Appropriate statistics: the same as for nominal and ordinal data, plus the mean. Example: Melissa has an SAT score of 1985, while Kevin has an SAT score of 1880. Melissa scored 105 points more than Kevin.
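One way to see why ratios fail on an interval scale is to convert the two temperatures to another scale. The sketch below assumes the 40 and 20 degree readings are in Fahrenheit, which the card does not state.

```python
def fahrenheit_to_celsius(deg_f):
    """Convert a Fahrenheit temperature to Celsius."""
    return (deg_f - 32) * 5 / 9

# Assumption: the New York and Buffalo readings are in degrees Fahrenheit.
ny_f, buffalo_f = 40, 20
ny_c = fahrenheit_to_celsius(ny_f)
buffalo_c = fahrenheit_to_celsius(buffalo_f)

ratio_f = ny_f / buffalo_f  # 2.0 on the Fahrenheit scale
ratio_c = ny_c / buffalo_c  # a different (negative) value on the Celsius scale

diff_f = ny_f - buffalo_f   # the gap in Fahrenheit degrees
diff_c = ny_c - buffalo_c   # the same gap in Celsius degrees

print(ratio_f, ratio_c, diff_f, diff_c)
```

The ratio changes when the unit changes, so it carries no real information. The difference survives the conversion (it is simply rescaled by 5/9), which is why differences and means are meaningful for interval data.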
Independent events
Events A and B are independent if knowledge of the occurrence of one has no effect on the probability that the other will occur. Example: P(Blue eyes | Male) = P(Blue eyes)
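The condition P(Blue eyes | Male) = P(Blue eyes) can be checked against a table of counts. The counts below are invented for illustration and were chosen so that the two events come out independent.

```python
# Hypothetical counts of people by sex and eye color (illustration only).
counts = {
    ("male", "blue"): 30, ("male", "other"): 70,
    ("female", "blue"): 30, ("female", "other"): 70,
}

total = sum(counts.values())
n_male = counts[("male", "blue")] + counts[("male", "other")]
n_blue = counts[("male", "blue")] + counts[("female", "blue")]

p_blue = n_blue / total                              # P(Blue eyes)
p_blue_given_male = counts[("male", "blue")] / n_male  # P(Blue eyes | Male)

# Independent if knowing the person is male does not change the probability.
independent = abs(p_blue - p_blue_given_male) < 1e-12

print(p_blue, p_blue_given_male, independent)
```

If the male row held a different share of blue-eyed people than the female row, the two probabilities would differ and the events would be dependent.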
statistical studies
Experimental • In experimental studies the variable of interest is first identified. Then one or more other variables are identified and controlled so that data can be obtained about how they influence the variable of interest. • The largest experimental study ever conducted is believed to be the 1954 Public Health Service experiment for the Salk polio vaccine. Nearly two million U.S. children (grades 1 through 3) were selected.
ethical guidelines for statistical practice
In a statistical study, unethical behavior can take a variety of forms including: • Improper sampling • Inappropriate analysis of the data • Development of misleading graphs • Use of inappropriate summary statistics • Biased interpretation of the statistical results. You should strive to be fair, thorough, objective, and neutral as you collect, analyze, and present data. As a consumer of statistics, you should also be aware of the possibility of unethical behavior by others.
Big Data and what are the Three Vs of Big Data
Big Data: a large and complex data set. The three Vs: Volume: the amount of available data. Velocity: the speed at which data is collected and processed. Variety: the different types of data.
Types of Samples
Nonprobability Samples - based on convenience or judgment -Convenience (or chunk) sample - students in a class, mall intercept -Judgment sample - based on the researcher's judgment as to what constitutes representativeness, e.g., one might say "these 20 stores are representative of the whole chain." -Quota sample - interviewers are given quotas based on demographics for instance, they may each be told to interview 100 subjects - 50 males and 50 females. Of the 50, say, 10 nonwhite and 40 white. -The problem with a nonprobability sample is that we do not know how representative our sample is of the population. Probability Sample. A sample collected in such a way that every element in the population has a known chance of being selected. -One type of probability sample is a Simple Random Sample. This is a sample collected in such a way that every element in the population has an equal chance of being selected. -Question: How do we collect a simple random sample? -Answer: Use a table of random numbers or a random number generator
Joint probability
P(A and B). P(A ∩ B). The probability of events A and B occurring together.
Simple probability:
P(A). The probability that an event (say, A) will occur. Also called a marginal probability.
Conditional probability
P(A|B), read "the probability of A given B." The probability that event A will occur given event B has occurred.
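Simple, joint, and conditional probabilities are linked by the standard identity P(A|B) = P(A ∩ B) / P(B). The sketch below illustrates this with a standard 52-card deck, taking A = "the card is a king" and B = "the card is a face card"; the card example and the helper names are assumptions, not part of these notes.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(ranks, suits))  # 52 equally likely (rank, suit) cards

def prob(event):
    """Probability that one randomly drawn card satisfies `event`."""
    favorable = sum(1 for card in deck if event(card))
    return Fraction(favorable, len(deck))

def is_king(card):
    return card[0] == "K"

def is_face(card):
    return card[0] in {"J", "Q", "K"}

p_a = prob(is_king)                                      # simple (marginal) P(A)
p_b = prob(is_face)                                      # simple (marginal) P(B)
p_a_and_b = prob(lambda c: is_king(c) and is_face(c))    # joint P(A and B)
p_a_given_b = p_a_and_b / p_b                            # conditional P(A|B)

print(p_a, p_b, p_a_and_b, p_a_given_b)
```

Knowing the card is a face card raises the probability of a king from 1/13 to 1/3, so these two events are not independent.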
Ratio Data
Ratio data have both equal intervals and a "true" zero. Examples: height, weight, length, units sold. The data have all the properties of interval data, and the ratio of two values is meaningful. Ratio data are always numeric. A zero value is included in the scale: all scales, whether they measure weight in kilograms or pounds, start at 0. The 0 means something and is not arbitrary. For example, 100 lbs. is double 50 lbs. (the same holds for kilograms), and $100 is half as much as $200. Example: The price of a book at a retail store is $200, while the price of the same book sold online is $100. The ratio property shows that the retail store charges twice the online price.
Sample
That portion of the population that is available, or is to be made available, for analysis. -A good sample is representative of the population. We will learn about probability samples and how they provide assurance that a sample is indeed representative. -The sample size is shown as lower case n. Example. If your company manufactures one million laptops, it might take a sample of say, 500, of them to test quality. The population size is N = 1,000,000 and the sample size is n= 500.
Random variable
That which is observed as the result of a stochastic process. A random variable takes on (usually numerical) values. Associated with each value is a probability that the value will occur.
statistical inference
The process of using data obtained from a sample to make estimates or test hypotheses about the characteristics of a population.
Statistical Inference
The process of using sample statistics to draw conclusions about population parameters. For instance, using X̄ (based on a sample of, say, n = 1000) to draw conclusions about μ (population of, say, 240 million). Example: GE manufactures LED bulbs and wants to know how many are defective. Suppose one million bulbs a year are produced in its new plant in Staten Island. The company might sample, say, 500 bulbs to estimate the proportion of defectives. So N = 1,000,000 and n = 500. If 5 out of 500 bulbs tested are defective, the sample proportion of defectives will be 1% (5/500). This statistic may be used to estimate the true proportion of defective bulbs (the population proportion).
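The arithmetic in the LED bulb example can be written out as below. The numbers come from the card; scaling the sample proportion up to a count of defective bulbs is an added illustration of how such an estimate might be used, and it says nothing about how precise the estimate is.

```python
population_size = 1_000_000  # N: bulbs produced per year
sample_size = 500            # n: bulbs tested
defective_in_sample = 5

# Sample statistic: proportion of defective bulbs in the sample.
sample_proportion = defective_in_sample / sample_size

# Point estimate of the number of defective bulbs in the whole population
# (an illustration only; it ignores sampling error).
estimated_defective = sample_proportion * population_size

print(sample_proportion, estimated_defective)
```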
What does the scale determine and indicate
The scale determines the amount of information contained in the data. The scale indicates the data summarization and statistical analyses that are most appropriate.
Population small definition
The set of all elements of interest in a particular study.
Prescriptive Analytics
The set of analytical techniques that yield a best course of action.
observation
The set of measurements obtained for a particular element. A data set with n elements contains n observations.
Probability
The word probability is actually undefined, but the probability of an event can be explained as the proportion of times, under identical circumstances, that the event can be expected to occur. It is the event's long-run frequency of occurrence.
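The idea of probability as a long-run frequency can be illustrated by simulation. The sketch below assumes a fair coin (probability of heads 0.5) and uses a fixed seed so the run is repeatable; both are choices made for this example.

```python
import random

def relative_frequency_of_heads(n_flips, seed=0):
    """Simulate n_flips tosses of a fair coin and return the share of heads."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

# The relative frequency tends to settle near 0.5 as the number of flips grows.
for n in (10, 1_000, 100_000):
    print(n, relative_frequency_of_heads(n))
```

No single flip is predictable, but the proportion of heads over many flips is, which is what makes the process describable in terms of relative frequencies.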
Descriptive Statistics
Those statistics that summarize a sample of numerical data in terms of averages and other measures for the purpose of description. This includes the presentation of data in the form of graphs, charts, and tables. Descriptive statistics, as opposed to inferential statistics, are not concerned with the theory and methodology for drawing inferences that extend beyond the particular set of data examined, in other words from the sample to the entire population. All that we care about are the summary measurements such as the average (mean). Thus, a teacher who gives an exam to a class of, say, 35 students is interested in the descriptive statistics to assess the performance of the class. What was the class average, the median grade, the standard deviation, etc.? The teacher is not interested in making any inferences to some larger population.
What are three data acquisition considerations and describe them
Time Requirement • Searching for information can be time consuming. • Information may no longer be useful by the time it is available. Cost of Acquisition • Organizations often charge for information even when it is not their primary business activity. Data Errors • Using any data that happen to be available or were acquired with little care can lead to misleading information.
Mutually Exclusive events
Two events are mutually exclusive if they cannot occur together. E.g., male or female; heads or tails.
Population
Universe. The total category under consideration. It is the data which we have not completely examined but to which our conclusions refer. The population size is usually indicated by a capital N. Examples: every lawyer in the United States; all single women in the United States.
variable
a characteristic of interest for the elements.
A probability distribution for a discrete random variable
a mutually exclusive listing of all possible numerical outcomes for that random variable, such that a particular probability of occurrence is associated with each outcome. Some discrete probability distributions: Binomial distribution, Hypergeometric distribution, Poisson distribution.
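As an illustration of such a listing, the sketch below builds a binomial distribution from the standard binomial formula. The parameters n = 4 trials and p = 0.5 are arbitrary choices for this example.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical parameters: 4 trials, success probability 0.5 on each.
n, p = 4, 0.5

# One probability for each possible outcome 0, 1, ..., n.
distribution = {k: binomial_pmf(k, n, p) for k in range(n + 1)}

for outcome, probability in distribution.items():
    print(outcome, probability)

print(sum(distribution.values()))
```

The outcomes are mutually exclusive and cover every possibility, so their probabilities add up to 1.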
Ordinal Data
arise from ranking, and the intervals between the points are not equal We can say that one object has more or less of the characteristic than another object when we rank them on an ordinal scale. Thus, a category 5 hurricane is worse than a category 4 hurricane which is worse than a category 3 hurricane, etc. The data have the properties of nominal data, and the order or rank of the data is meaningful. A nonnumeric label or numeric code may be used. Examples: social class, hardness of minerals scale, income as categories, class standing, rankings of football teams, military rank (general, colonel, major, lieutenant, sergeant, etc.), ... Example: Income (choose one) Under $20,000 - checked by, say, John Smith $20,000 - $49,999 - checked by, say, Jane Doe $50,000 and over - checked by, say, Bill Gates In this example, Bill Gates checks the third category even though he earns several billion dollars. The distance between Gates and Doe is not the same as the distance between Doe and Smith. Appropriate statistics: - same as those for nominal data, plus the median; but not the mean. -Ranking scales are obviously ordinal. There is nothing absolute here. -Just because someone chooses a "top" choice does not mean it is really a top choice. Students of a university are classified by their class standing using a nonnumeric label such as Freshman, Sophomore, Junior, or Senior. Alternatively, a numeric code could be used for the class standing variable (e.g. 1 denotes Freshman, 2 denotes Sophomore, and so on).
Please rate each brand of soft drink on the scale indicated: What type of data?
interval
