Chapter 1: Data and Statistics
field of statistics
The branch of science that is concerned with making decisions (usually involving uncertainty)
prescriptive analytics
The set of analytical techniques that yield a best course of action.
sample survey
a method of collecting data for a sample; only part of the total population is selected. ex. -one class of 90 ACMS 10145 students is randomly selected -every student in that class fills out a survey -this set of sample data can be summarized to estimate population parameter values for all ACMS 10145 students
population
a set of units (usually people, objects, transactions, or events) that we are interested in studying ex. -all U.S. males ages 18-35 -all sales made by the ND bookstore this week -all Toyota Camry vehicles manufactured in 2016
sample
a subset of the units of a population ex. -10,000 males ages 18-35 randomly selected from U.S. department of motor vehicles records -ND bookstore customer sales for every twentieth customer this week (systematic sample) -a random sample of 500 2016 Toyota Camrys
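The systematic sample in the bookstore example (every twentieth customer) can be sketched in Python. The customer list and the helper name `systematic_sample` are hypothetical, made up only for illustration.

```python
# Hypothetical data: 200 customer IDs, in the order they checked out this week.
customers = list(range(1, 201))

def systematic_sample(units, step, start=0):
    """Return every `step`-th unit, beginning at index `start`."""
    return units[start::step]

# Every twentieth customer, starting with the 20th customer (index 19).
sample = systematic_sample(customers, step=20, start=19)
print(sample)       # the sampled customer IDs
print(len(sample))  # number of units in the sample
```

The sample is a subset of the population: every sampled ID also appears in `customers`.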
population parameter
a summary of the variable for the entire population (fixed values)
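A minimal sketch of the difference between a population parameter and the sample estimate of it, assuming a made-up population of 500 scores (the values, the seed, and the sample size of 90 are illustrative assumptions, not data from the course):

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: one score for each of 500 students (made-up values).
population = [random.randint(50, 100) for _ in range(500)]

# Population parameter: computed from every unit, so it is a fixed value.
population_mean = statistics.mean(population)

# Sample statistic: computed from a random subset, so it changes from sample to sample.
sample = random.sample(population, 90)
sample_mean = statistics.mean(sample)

print(population_mean, sample_mean)
```

The sample mean is only an estimate; drawing a different sample would give a slightly different value, while the population mean stays the same.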
experimental unit
an object (person, thing, event, transaction, etc.) upon which we collect data
nominal
categories cannot be ordered ex. 1=Christian, 2=Jewish, and so on; the numeric codes are only labels and cannot be ordered with meaning
interval
numerical data where differences between values have meaning, but there is no meaningful zero, so you cannot make a logical statement about the ratio of two values
cross sectional data
collected from multiple subjects at the same time or at approximately the same point in time ex. midterm 1 scores for all sections of ACMS 10145 combined
time series data
collected over several time periods ex. average gas price per gallon from March 2006 to July 2009
census
collecting data for all units in the entire population. Because information is gathered from every entity in the population, the data are representative of the whole population, and detailed data can be made available right down to small areas
predictive analytics
consists of analytical techniques that use models constructed from past data to predict the future or to assess the impact of one variable on another
ordinal
data can be ordered ex. 1=dissatisfied, 2=somewhat satisfied, 3=highly satisfied are ordered lowest to highest
descriptive analytics
encompasses the set of techniques that describes what has happened in the past
elements
entities on which data are collected
data
facts and figures collected, analyzed, and summarized for presentation and interpretation (everything)
Example: Suppose we are considering the temperatures in South Bend and Las Vegas. On a given day, the temperature in South Bend was 51 degrees Fahrenheit. In Las Vegas, the temperature was 102 degrees Fahrenheit. Does it make sense to say it is "twice as hot" in Las Vegas as in South Bend? Is temperature interval or ratio?
no; interval, no meaningful zero
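One way to see why the ratio has no meaning here: converting the same two temperatures to Celsius changes the ratio, because the zero point of each scale is arbitrary. A short sketch (the function name is an illustrative choice):

```python
def fahrenheit_to_celsius(f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9

south_bend_f, las_vegas_f = 51, 102
south_bend_c = fahrenheit_to_celsius(south_bend_f)
las_vegas_c = fahrenheit_to_celsius(las_vegas_f)

ratio_f = las_vegas_f / south_bend_f  # 2.0 in Fahrenheit
ratio_c = las_vegas_c / south_bend_c  # about 3.68 in Celsius

print(ratio_f, round(ratio_c, 2))
```

The same two days give a ratio of 2 on one scale and about 3.68 on the other, so "twice as hot" is not a meaningful statement. Differences, by contrast, stay consistent up to the unit conversion, which is what makes the data interval.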
scales of measurement
nominal, ordinal, interval, ratio
statistics
numerical data such as averages, medians, percentages, and index numbers that help us understand a variety of business and economic situations; also refers to the art and science of collecting, analyzing, presenting, and interpreting data
descriptive statistics
numerical data used to measure and describe characteristics of groups. Includes measures of central tendency and measures of variation.
quantitative data
numerical data that can be manipulated mathematically in a meaningful way -can be interval or ratio
there are 32 rows in the dataset; one row corresponds to each NFL team. Thus we say there are 32 ____________ in the dataset
observations/elements
observational study
observes individuals and measures variables of interest but does not attempt to influence the responses
there are 9 ______ (yds,) recorded for each team
variables
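A small sketch of how observations (rows) and variables (columns) are counted. The three teams and their values are made up for illustration and are not real NFL statistics; only the variable name `yds` comes from the card above.

```python
# Hypothetical mini-dataset: one entry per element (team), made-up values.
dataset = {
    "Team A": {"yds": 5200, "wins": 10},
    "Team B": {"yds": 4800, "wins": 7},
    "Team C": {"yds": 5600, "wins": 12},
}

# One observation per element: the set of measurements recorded for that team.
n_observations = len(dataset)

# The variables are the characteristics recorded for every element.
variables = list(next(iter(dataset.values())).keys())
n_variables = len(variables)

print(n_observations, variables)
```

In the full dataset described above, the same counts would be 32 observations and 9 variables.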
Example: On the same day that temperatures were measured, the numbers of people who traveled through the South Bend and Las Vegas airports were also measured. Las Vegas had 132,800 passengers, while South Bend had 2,656 passengers. Does it make sense to say that "fifty times as many" passengers traveled through the Las Vegas airport compared to the South Bend airport? Is the number of passengers, interval or ratio?
yes; ratio. The ratio of two individual values in the data set has meaning (132,800 / 2,656 = 50), and zero has meaning (zero passengers means no passengers)
ratio
you can make a logical statement about the ratio between two data values ex. six times or half as much
categorical data
(sometimes called "qualitative data") may be grouped into specific categories. such data is often coded numerically. either nominal or ordinal ex. zip codes (a number that cannot be meaningfully summarized as an average; nominal)
variable
-A symbol used to represent a quantity that can change -characteristic of interest for the elements (name of what we're measuring)
why do we need statistics?
-a census is usually not practical -sampling saves money and time -helps us better understand the population we're interested in -a random sample can give accurate information about the population
what types of summary measures do we typically want to estimate
-mean -median -range -mode -standard deviation -IQR -variance -outliers -proportions
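Most of these summary measures can be computed with Python's standard `statistics` module. The data values below are made up for illustration, and the cutoff of 10 used for the proportion is an arbitrary assumption.

```python
import statistics

data = [4, 8, 15, 16, 23, 42, 8]  # made-up sample values

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
data_range = max(data) - min(data)
variance = statistics.variance(data)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(data)      # sample standard deviation

# Quartiles; the IQR is the distance between the first and third quartiles.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Proportion of values above an arbitrary cutoff of 10.
proportion_above_10 = sum(1 for x in data if x > 10) / len(data)

print(mean, median, mode, data_range, variance, std_dev, iqr, proportion_above_10)
```

Note that `statistics.quantiles` supports more than one method of computing quartiles, so the IQR may differ slightly from a hand calculation that uses another convention.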
experiment
A research method in which an investigator manipulates one or more factors to observe the effect on some behavior or mental process. conducted under controlled conditions
economics
The study of how people seek to satisfy their needs and wants by making choices -economists use statistical information in making forecasts about the future of the economy or some aspect of it -using historical data, an advisor can determine whether a stock is undervalued or overvalued. Economists frequently provide forecasts about the future of the economy or some aspect of it. They use a variety of statistical information in making such forecasts. For instance, in forecasting inflation rates, economists use statistical information on such indicators as the Producer Price Index, the unemployment rate, and manufacturing capacity utilization. Often these statistical indicators are entered into computerized forecasting models that predict inflation rates.
production
Today's emphasis on quality makes quality control an important application of statistics in production. A variety of statistical quality control charts are used to monitor the output of a production process. In particular, an x-bar chart can be used to monitor the average output. Suppose, for example, that a machine fills containers with 12 ounces of a soft drink. Periodically, a production worker selects a sample of containers and computes the average number of ounces in the sample. This average, or x-bar value, is plotted on an x-bar chart. A plotted value above the chart's upper control limit indicates overfilling, and a plotted value below the chart's lower control limit indicates underfilling. The process is termed "in control" and allowed to continue as long as the plotted x-bar values fall between the chart's upper and lower control limits. Properly interpreted, an x-bar chart can help determine when adjustments are necessary to correct a production process
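The x-bar check described above can be sketched as follows. The control limits of 11.9 and 12.1 ounces and the sample fill values are hypothetical; in practice the limits are derived from data on the process itself.

```python
import statistics

# Hypothetical control limits, in ounces (assumed values for illustration).
LOWER_CONTROL_LIMIT = 11.9
UPPER_CONTROL_LIMIT = 12.1

def check_sample(fill_ounces):
    """Compute the sample mean (x-bar) and compare it with the control limits."""
    x_bar = statistics.mean(fill_ounces)
    if x_bar > UPPER_CONTROL_LIMIT:
        return x_bar, "overfilling"
    if x_bar < LOWER_CONTROL_LIMIT:
        return x_bar, "underfilling"
    return x_bar, "in control"

# Two made-up samples of four containers each.
print(check_sample([12.02, 11.98, 12.01, 11.99]))
print(check_sample([12.30, 12.20, 12.25, 12.15]))
```

Each call corresponds to one plotted point on the chart: the process continues while the result is "in control" and is adjusted otherwise.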
accounting
planning, recording, analyzing, and interpreting financial information -public accounting firms use statistical sampling procedures when conducting audits for their clients -saves time and money -a representative, random sample gives accurate results -use past and current data to make accurate predictions (forecast). Public accounting firms use statistical sampling procedures when conducting audits for their clients. For instance, suppose an accounting firm wants to determine whether the amount of accounts receivable shown on a client's balance sheet fairly represents the actual amount of accounts receivable. Usually the large number of individual accounts receivable makes reviewing and validating every account too time-consuming and expensive. As common practice in such situations, the audit staff selects a subset of the accounts called a sample. After reviewing the accuracy of the sampled accounts, the auditors draw a conclusion as to whether the accounts receivable amount shown on the client's balance sheet is acceptable.
marketing
the activity, set of institutions, and processes for creating, communicating, delivering, and exchanging offerings that have value for customers, clients, partners, and society at large Electronic scanners at retail checkout counters collect data for a variety of marketing research applications. For example, data suppliers such as ACNielsen and Information Resources, Inc., purchase point-of-sale scanner data from grocery stores, process the data, and then sell statistical summaries of the data to manufacturers. Manufacturers spend hundreds of thousands of dollars per product category to obtain this type of scanner data. Manufacturers also purchase data and statistical summaries on promotional activities such as special pricing and the use of in-store displays. Brand managers can review the scanner statistics and the promotional activity statistics to gain a better understanding of the relationship between promotional activities and sales. Such analyses often prove helpful in establishing future marketing strategies for the various products.
finance
the management of large amounts of money, especially by governments or large companies. -financial advisors use price-earnings ratios and dividend yields to guide their investment advice Financial analysts use a variety of statistical information to guide their investment recommendations. In the case of stocks, analysts review financial data such as price/earnings ratios and dividend yields. By comparing the information for an individual stock with information about the stock market averages, an analyst can begin to draw a conclusion as to whether the stock is a good investment. For example, The Wall Street Journal (June 6, 2015) reported that the average dividend yield for the S&P 500 companies was 2%. Microsoft showed a dividend yield of 1.95%. In this case, the statistical information on dividend yield indicates a lower dividend yield for Microsoft than the average dividend yield for the S&P 500 companies. This and other information about Microsoft would help the analyst make an informed buy, sell, or hold recommendation for Microsoft stock.
statistical inference
the process of using data obtained from a sample to make estimates and test hypotheses about the characteristics of a population
analytics
the scientific process of transforming data into insight for making better decisions
observation
the set of measurements obtained for a particular element (more than one thing)
