Business Analytics - Chapter 1
nominal
-Categorical -Least sophisticated -Values differ by label or name; all we do is categorize or group the data -Example: marital status
ordinal - ranked
-Categorize and rank the data with respect to some characteristic or trait -Reflect labels or name, but can be ranked -Cannot interpret the difference between the ranked values (unable to add or subtract) -Example: reviews from 1 star (poor) to 5 stars (outstanding)
unstructured data
-Do not conform to a pre-defined, row-column format -Textual - written reports, emails, social media posts -Multimedia content - videos, photos -Do not conform to database structures
interval
-Numerical -Categorize and rank; differences are meaningful -Zero value is arbitrary and does not reflect absence of the characteristic -Ratios are not meaningful -Example: temperature in degrees Celsius or Fahrenheit
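A minimal sketch of why ratios are not meaningful on an interval scale: the same two temperatures give different ratios in Fahrenheit and Celsius, while the difference between them stays consistent across units. The readings 80 and 40 are made-up illustrative values.

```python
def fahrenheit_to_celsius(f):
    """Convert a Fahrenheit reading to Celsius."""
    return (f - 32) * 5 / 9

warm_f, cool_f = 80.0, 40.0  # hypothetical readings in Fahrenheit
warm_c = fahrenheit_to_celsius(warm_f)
cool_c = fahrenheit_to_celsius(cool_f)

# The ratio depends on the unit, so "twice as hot" has no real meaning.
ratio_f = warm_f / cool_f
ratio_c = warm_c / cool_c

# The difference is meaningful: it converts by the fixed factor 5/9.
diff_f = warm_f - cool_f
diff_c = warm_c - cool_c

print("ratio in F:", ratio_f, "ratio in C:", round(ratio_c, 2))
print("difference in F:", diff_f, "difference in C:", round(diff_c, 2))
```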
ratio - numerical
-Numerical -Strongest level of measurement -A true zero point, reflects absence of the characteristic -Ratios are meaningful -Example: profits, debt to GDP, wife-to-husband income
categorical (fixed) or qualitative
-Represent categories -Labels or names to identify distinguishing characteristics -Arithmetic operations on the labels/values are not meaningful -Coded into numbers for data processing -Example: marital status, gender, race, customer satisfaction, states
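A small sketch of what "coded into numbers" means in practice. The coding scheme and the responses are hypothetical; the point is that counting categories is meaningful, while arithmetic on the codes is not.

```python
from collections import Counter

# Hypothetical coding scheme: the numbers are labels, not quantities.
codes = {"single": 1, "married": 2, "divorced": 3}
responses = ["married", "single", "married", "divorced", "married", "single"]

# Code each label as a number for data processing.
coded = [codes[r] for r in responses]

# Counting how often each category occurs is a meaningful summary.
counts = Counter(responses)
most_common = counts.most_common(1)[0][0]

# The mean of the codes can be computed, but it has no interpretation:
# it would change if the labels were numbered differently.
mean_code = sum(coded) / len(coded)

print(counts)
print("most common category:", most_common)
```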
numerical (differ in degree) or quantitative
-Represent meaningful numbers -Arithmetic operations are meaningful -Discrete: assumes a countable number of values with nothing between them, e.g. 28 seats -Example: number of children in a family -Continuous: assumes an uncountable number of values within an interval -Example: investment returns, height, weight, income
structured data
-Reside in a pre-defined, row-column format -Spreadsheet or database applications -Enter, store, query, and analyze -Numerical information that is objective and not open to interpretation -Historically, companies relied most on structured data because of the high cost of storing and processing data and performance limitations -Example: point-of-sale and financial data, such as the sale of retail products, money transfers between bank accounts, and student enrollment in a university course
human or machine generated
-Structured human: price, income, retail sales -Structured machine: sensors, speed cameras, web server logs -Unstructured human: email, text, social media, presentations -Unstructured machine: satellite images, video data, camera images
variety
-all types, forms, and granularity; structured or unstructured
veracity
-credibility and quality of the data, reliability
Volume
-immense amount of data compiled from a single or multiple sources
value
-methodological plan for formulating questions, curating the right data, and unlocking hidden potential
Big data
-A massive volume of structured and unstructured data -Extremely difficult to manage, process, and analyze using traditional data processing tools -Presents great opportunities to gain knowledge and game-changing intelligence -"High-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation." -Big data does not necessarily imply complete (population) data
Why is analytics important?
Informed decision making generally increases efficiency, improves service, and raises customer satisfaction, sales, and profitability -Maximization = efficiency
What do we need to perform analysis?
Information / knowledge
BA begins with understanding the business context
-Ask the right questions: identify the problem -Identify the appropriate analysis -Communicate information
2 types of variables
categorical and numerical
variable
a characteristic of interest that differs in kind or degree among various observations (records) -Example: marital status, income
business analytics
combines qualitative reasoning with quantitative tools to identify key business problems and translate data analysis into decisions that improve business performance
data
compilations of facts, figures or other contents, both numerical and nonnumerical
population
consists of all observations or items of interest in an analysis -Obtaining population data is expensive -Often impossible to examine every member of the population
knowledge
derived from a blend of data, contextual information, experience and intuition
3 types of analytics techniques
descriptive, predictive, prescriptive
fixed-width format
each column starts and ends at the same place in every row, like an Excel document
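A minimal sketch of reading fixed-width data, assuming a hypothetical layout in which the name occupies characters 0-9, the age characters 10-12, and the city characters 13-22. The records are invented for illustration.

```python
# Hypothetical records used to build two fixed-width lines.
records = [("Alice", 34, "Boston"), ("Bob", 7, "Denver")]

# Pad every field to a fixed width so each column starts and ends
# at the same position in every row.
lines = [f"{name:<10}{age:>3}{city:<10}" for name, age, city in records]

def parse_fixed_width(line):
    """Slice a line at the assumed column positions."""
    return {
        "name": line[0:10].strip(),
        "age": int(line[10:13]),
        "city": line[13:23].strip(),
    }

parsed = [parse_fixed_width(line) for line in lines]
for row in parsed:
    print(row)
```

Because the positions are fixed, no separator character is needed; the trade-off is that every reader must know the column widths in advance.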
First step for making decisions
find the right data and prepare it -Data: compilation of facts, figures, or other content -Numerical and non-numerical -All types and formats, generated from multiple sources -Often we have a large amount of data -Even small data can give insights -Data that have been organized, analyzed, and processed in a meaningful and purposeful way become information -Use a blend of data, contextual information, experience, and intuition to derive knowledge
velocity
data are generated at a rapid speed; managing them is a critical issue
Improvements in BA
-Growing availability of vast amounts of data -Improved computational power -Development of sophisticated algorithms -Colleges have classes emphasizing BA
information
a set of data that are organized and processed in a meaningful and purposeful way
sample
a subset of the population; we examine sample data to make inferences about the population
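A rough sketch of using a sample to make an inference about a population. The population here is simulated (hypothetical incomes drawn from a normal distribution), and the sample size of 200 is an arbitrary choice for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 simulated incomes.
population = [random.gauss(50_000, 10_000) for _ in range(10_000)]

# Examine only a subset instead of every member of the population.
sample = random.sample(population, 200)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

# The sample mean serves as an estimate of the population mean.
print("population mean:", round(population_mean))
print("sample mean:    ", round(sample_mean))
```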
time series data
data collected over several time periods, focusing on certain groups of people, specific events, or objects
4 major scales of measurement
nominal, ordinal, interval, ratio (NOIR)
cross-sectional data
data collected by recording a characteristic of many subjects at the same point in time
delimited format
each piece of data is separated by a delimiter, such as a comma
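A short sketch of reading comma-delimited data with Python's standard `csv` module. The column names and rows are invented for illustration.

```python
import csv
import io

# Hypothetical comma-delimited text: the first line holds column names.
text = "name,age,city\nAlice,34,Boston\nBob,7,Denver\n"

# DictReader splits each line on the delimiter and pairs the pieces
# with the column names from the header row.
rows = list(csv.DictReader(io.StringIO(text)))

for row in rows:
    print(row["name"], row["age"], row["city"])
```

Note that every value is read as text; numeric columns such as `age` need an explicit conversion (for example `int(row["age"])`) before any arithmetic.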
HyperText Markup Language (HTML)
simple text based markup language for displaying content in web browsers
eXtensible Markup Language (XML)
simple text-based markup language for representing structured data; uses user-defined markup tags to specify the structure of the data
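A minimal sketch of user-defined tags describing structure, parsed with Python's standard `xml.etree.ElementTree` module. The tag names (`customers`, `customer`, `name`, `city`) and the values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML: the tag names are chosen by the data's author.
xml_text = """<customers>
  <customer id="1"><name>Alice</name><city>Boston</city></customer>
  <customer id="2"><name>Bob</name><city>Denver</city></customer>
</customers>"""

root = ET.fromstring(xml_text)

# Walk the structure the tags describe and collect one record per customer.
customers = [
    {
        "id": c.get("id"),
        "name": c.findtext("name"),
        "city": c.findtext("city"),
    }
    for c in root.findall("customer")
]

for record in customers:
    print(record)
```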
JavaScript Object Notation (JSON)
standard for transmitting human-readable data in compact files
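A short sketch of reading and writing JSON with Python's standard `json` module. The record itself is made up for illustration.

```python
import json

# Hypothetical JSON text: human-readable keys and values.
json_text = '{"name": "Alice", "age": 34, "cities": ["Boston", "Denver"]}'

# Parse the text into ordinary Python objects (dict, list, int, str).
record = json.loads(json_text)

# Write it back without extra spaces to keep the file compact.
compact = json.dumps(record, separators=(",", ":"))

print(record["name"], record["age"], record["cities"])
print(compact)
```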
3 characteristics of big data
volume, velocity, variety -Additional ones: veracity, value
Predictive analytics
what could happen in the future? -Likelihood of a specific outcome, e.g. what grade you are likely to get in the class -Use historical data to make predictions -Analytical models help identify associations -Associations used to estimate the likelihood of a favorable outcome -Commonly considered advanced predictions -Build models that help an organization understand what might happen in the future -Use statistics and data mining -Examples: identifying customers who are most likely to respond to specific marketing campaigns; transactions that are likely to be fraudulent; incidence of crime at certain regions and times
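One simple way to "use historical data to make predictions" is to fit a straight line by least squares and extend it to a new value. This is only a sketch of one possible technique, not the method these notes prescribe; the advertising and sales figures are invented.

```python
# Hypothetical historical data: advertising spend and resulting sales.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]

def fit_line(xs, ys):
    """Ordinary least squares for a single predictor."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Estimate the association from the historical data.
slope, intercept = fit_line(ad_spend, sales)

# Use the fitted line to predict sales for a spend not yet observed.
forecast = intercept + slope * 6.0

print("slope:", round(slope, 2), "intercept:", round(intercept, 2))
print("forecast for spend 6.0:", round(forecast, 2))
```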
Descriptive analytics
what has happened? -Example: financial reports, health care statistics, student reports; something already known -Gather, organize, tabulate, visualize, and summarize data -Can be presented as numbers, tables, or graphs -Referred to as business intelligence
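A small sketch of the "summarize" step, using Python's standard `statistics` module on made-up monthly sales figures.

```python
import statistics

# Hypothetical monthly sales for six months that have already happened.
monthly_sales = [120, 135, 150, 110, 160, 145]

# Summarize what has happened with a few descriptive measures.
summary = {
    "count": len(monthly_sales),
    "total": sum(monthly_sales),
    "mean": statistics.mean(monthly_sales),
    "median": statistics.median(monthly_sales),
    "min": min(monthly_sales),
    "max": max(monthly_sales),
}

# Tabulate the summary, one measure per row.
for measure, value in summary.items():
    print(f"{measure:<8}{value:>10.2f}")
```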
Prescriptive Analytics
what should we do? -Suggest action to meet goals -Optimization and simulation algorithms to provide advice -Explore several possible actions -Suggest course of action -Commonly considered advanced predictions -Build models that help an organization understand what might happen in the future -Use statistics and data mining -Examples: scheduling employees' work hours; selecting a mix of products to manufacture; choosing an investment portfolio
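A toy sketch of the product-mix example: explore every feasible action and suggest the one with the highest profit. All numbers (profits, labor hours, limits) are hypothetical, and brute-force search stands in for the optimization algorithms a real tool would use.

```python
# Hypothetical inputs for two products.
PROFIT = {"chairs": 30, "tables": 50}   # profit per unit
HOURS = {"chairs": 2, "tables": 4}      # labor hours per unit
MAX_HOURS = 40                          # labor hours available
MAX_CHAIRS = 12                         # assumed demand limit for chairs

best = None  # will hold (profit, chairs, tables)

# Explore every possible action within the limits.
for chairs in range(MAX_CHAIRS + 1):
    for tables in range(MAX_HOURS // HOURS["tables"] + 1):
        hours = chairs * HOURS["chairs"] + tables * HOURS["tables"]
        if hours > MAX_HOURS:
            continue  # infeasible: not enough labor hours
        profit = chairs * PROFIT["chairs"] + tables * PROFIT["tables"]
        if best is None or profit > best[0]:
            best = (profit, chairs, tables)

# Suggest the course of action with the highest profit.
print("suggested mix (profit, chairs, tables):", best)
```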
sample
a subset of the population that is used for analysis. Traditional statistical techniques use sample information to draw conclusions about the population