MGSC 300 Chapter 6
The four main tasks to which machine learning applies known rules are:
Categorizing people or things, predicting likely outcomes or actions based on identified patterns, identifying previously unknown patterns and relationships, and detecting unexpected behaviors
Three ways data can be analyzed using a time-series regression are:
Trend, Rate of Change, and Cycles
4 V's of Big Data:
Volume, Variety, Velocity, Veracity
Big data
a data set that is too large or complex to be analyzed using traditional data processing applications
Modern BI
a more flexible and accessible approach than traditional BI. The focus of modern BI is to provide visual, interactive, self-service analytics to improve the speed and quality of decision-making. It includes "embedded BI."
Data science
a multi-disciplinary field that uses domain expertise, scientific methods, programming skills, algorithms and statistics to extract knowledge and insights from structured, semi-structured and unstructured big data sets to predict future behavior and prescribe actions
Predictive modeling
a process that uses data mining and probabilities to create a statistical model that forecasts outcomes
Business Intelligence (BI)
a set of best practices, software, infrastructure and tools to acquire and transform raw highly structured data into actionable insights to help managers at all levels of the organization make informed business decisions
Text mining
a specialized form of data mining that interprets words and concepts in context
Digital dashboard
a static or interactive electronic interface used to acquire and consolidate data across an organization
Data product
a technical function that encapsulates an algorithm and is designed to integrate directly into core applications
Predictive model
a model based on several factors likely to influence future behavior that predicts, at some confidence level, the outcome of an event
Geocoding
can convert postal addresses to geospatial data that can then be measured and analyzed
Mashups
combine business data and applications from two or more sources
Descriptive data analytics
create a summary of historical data to yield useful information and possibly prepare the data for future more sophisticated analysis
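As a minimal sketch of what such a summary might look like, the Python below computes a few descriptive statistics over a small series. The `monthly_sales` figures and the `summarize` helper are hypothetical, made up for illustration only.

```python
from statistics import mean, median, pstdev

# Hypothetical monthly sales figures (illustrative data only).
monthly_sales = [120, 135, 128, 150, 142, 160]


def summarize(values):
    """Return a small descriptive summary of historical values."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
        "std_dev": pstdev(values),  # population standard deviation
    }


summary = summarize(monthly_sales)
print(summary)
```

A summary like this describes what has already happened; it makes no prediction about future values.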
Goals of Big Data
cut costs, gain market share, establish a data-driven culture, create new ways to innovate and disrupt with technology, accelerate speed of offering new capabilities and services, launch new products and services, and improve processes
Four of the most important tools used in descriptive analytics are:
data mining, data visualization, digital dashboards, and mashups
Rules-based decision-making
decision-making that helps novices make decisions like an expert
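One way to picture this is an ordered list of expert-written rules that a novice can apply mechanically. The sketch below assumes a hypothetical support-ticket scenario; the rule conditions, field names, and actions are invented for illustration.

```python
# Hypothetical expert rules, checked in priority order.
# Each rule pairs a condition with the action an expert would take.
RULES = [
    (lambda t: t["outage"], "escalate to on-call engineer"),
    (lambda t: t["customer_tier"] == "premium", "assign to senior agent"),
    (lambda t: t["days_open"] > 7, "flag for manager review"),
]
DEFAULT_ACTION = "place in standard queue"


def decide(ticket):
    """Return the action of the first rule whose condition matches."""
    for condition, action in RULES:
        if condition(ticket):
            return action
    return DEFAULT_ACTION


ticket = {"outage": False, "customer_tier": "premium", "days_open": 2}
decision = decide(ticket)
print(decision)
```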
Prescriptive data analytics
dedicated to finding the best course of action among various choices given the known parameters
Major components of a dashboard:
design, performance metrics (KPIs), APIs, and access
Linear regression
modeling used to predict the value of a variable that is dependent on the value of one or more other variables. Fits a straight line or surface that minimizes the discrepancies between predicted and actual output values and is used to make data-driven decisions rather than relying on experience and intuition.
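A minimal sketch of the straight-line case, using ordinary least squares with one predictor. The `ad_spend` and `sales` numbers are hypothetical and chosen so that the data lie exactly on the line y = 2x + 1; `fit_line` is an illustrative helper, not a library function.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope = covariance of x and y divided by variance of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical data: the independent variable and the dependent variable.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_line(ad_spend, sales)
predicted = slope * 6.0 + intercept  # prediction for an unseen ad_spend of 6.0
print(slope, intercept, predicted)
```

With more than one independent variable the fitted object becomes a plane or surface rather than a line, but the idea of minimizing the squared discrepancies is the same.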
Geographic information systems (GIS)
a naturally synergistic technology that connects data with geography to understand what belongs where. GIS is not just about mapping data; governments, businesses, and individuals find GIS useful in solving everyday problems using geospatial data
Traditional BI
provides managers with an easy-to-understand "snapshot" of what is happening now and what happened in the past to bring an organization to its current state. It is a relatively unsophisticated data analysis method that uses dashboards, data mashups, and data visualization
Machine learning
scientific algorithms that identify patterns in big data to learn from the data and create insights based on the data
Bounded Rationality
the idea that rationality is limited by the tractability of the decision, cognitive limitations of the mind and time available to make the decision
Heat maps
the most-used tool for representing complex statistical data; they use a warm-to-cool color spectrum to show differences in classes of data
Data visualization
the presentation of data in a graphical format to make it easier for decision-makers to grasp difficult concepts or identify new patterns in the data
Optimization
the process of calculating values of variables that lead to an optimal value of the event under investigation
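As a small illustration, the sketch below searches for the price that maximizes profit. The demand curve (`100 - 2 * price`), the unit cost, and the candidate price range are all hypothetical assumptions, and a brute-force search stands in for the more sophisticated solvers used in practice.

```python
UNIT_COST = 10  # hypothetical cost to produce one unit


def profit(price):
    """Profit under an assumed linear demand curve."""
    demand = 100 - 2 * price  # hypothetical: demand falls as price rises
    return (price - UNIT_COST) * demand


# Evaluate every candidate price and keep the one with the highest profit.
candidate_prices = range(10, 51)
best_price = max(candidate_prices, key=profit)
best_profit = profit(best_price)
print(best_price, best_profit)
```

Here the variable is the price and the "event under investigation" is profit; the search returns the variable value that yields the optimal outcome.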
Data analytics
the process of examining data sets to draw conclusions about the information they contain, usually with the help of computer software
Data discovery
the process of using BI to collect data from various databases and consolidate it into a single source that can be easily and instantly evaluated
Predictive data analytics
the process of using data analytics methods and techniques to model and make predictions about unknown events from data
Data mining
the process of using software to analyze unstructured, semi-structured and structured data from various perspectives, categorize them, and derive correlations or patterns among fields in the data
Cognitive computing
the technology that uses machine learning algorithms
7 key attributes of modern BI software:
• Speed • Visualization • Single source of truth • Real-time collaboration • Comprehensive governance • Scalability • Mobility
The most common predictive and prescriptive data analytics tools are:
• Text mining • Spatial data mining • Regression • Optimization and rules-based decision-making • Machine learning
Business value that organizations gain from data mining falls into three categories:
• Making more informed decisions at the time they need to be made. • Discovering unknown insights, patterns or relationships. • Automating and streamlining or digitizing business processes.
Time-series regression
An analysis of a collection of data values over time, performed by plotting a series of well-defined data points measured at consistent time intervals (such as monthly, quarterly or annually) over a specific period of time and attempting to predict what will happen in the future.
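A minimal sketch of two of these ideas, rate of change and trend, on evenly spaced data. The `quarterly_revenue` values are hypothetical, and the helper functions are illustrative; the trend is a plain least-squares line fitted against the period index.

```python
# Hypothetical revenue measured at consistent quarterly intervals.
quarterly_revenue = [100.0, 110.0, 121.0, 133.1]


def rate_of_change(series):
    """Period-over-period change as a fraction of the previous value."""
    return [(curr - prev) / prev for prev, curr in zip(series, series[1:])]


def linear_trend(series):
    """Least-squares line through the points (0, y0), (1, y1), ..."""
    n = len(series)
    ts = range(n)
    t_bar = sum(ts) / n
    y_bar = sum(series) / n
    slope = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, series)) / sum(
        (t - t_bar) ** 2 for t in ts
    )
    return slope, y_bar - slope * t_bar


rates = rate_of_change(quarterly_revenue)
slope, intercept = linear_trend(quarterly_revenue)
# Extrapolate the trend line one period past the last observation.
next_quarter = slope * len(quarterly_revenue) + intercept
print(rates, slope, next_quarter)
```

Detecting cycles would need a longer series than this example uses, for instance several years of monthly data in which a seasonal pattern can repeat.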
Augmented Reality (AR)
The highest level of data visualization currently available - the use of more contemporary 3-D visualization methods and techniques to illustrate the relationships within data, including smart mapping, smart routines, machine learning, and natural language processing
