BISM2202
What are the key policy areas for data governance?
- Data ownership - Data administration
What are the classifications of data analytics?
- Descriptive - Predictive - Prescriptive
What to look for in a dashboard?
- highlighting data and exceptions that require action - easy to use, don't need much training to use - combining data from different sources - enabling drill-down or drill-through and filtering - being dynamic (connected to data in a live fashion) - requiring little coding
What are the principles in managing data?
1. Need to manage data is permanent 2. Data can exist at several levels within an organisation 3. Application software should be separate from the database 4. Application software can be classified by how it treats data 5. Application software should be considered disposable 6. Data should be captured once 7. There should be strict data standards
What are the 3 types of performance dashboards?
1. Operational dashboards 2. Tactical dashboards 3. Strategic dashboards
What are the two methods of data reduction?
1. Variables (dimensional reduction, variable selection) 2. Cases/samples (sampling, balancing/stratification)
Give a taxonomy of data mining tasks
3 main types: Prediction (classification, regression) [supervised learning method] Association aka "market basket" (link analysis, sequence analysis) [unsupervised learning method] Clustering (outlier analysis) [unsupervised learning method]
What is data?
A collection of facts obtained through experiments, observations, etc.
What is the primary source of accuracy estimation for classification problems?
A confusion matrix
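A minimal sketch of how a confusion matrix gives an accuracy estimate for a two-class problem. The labels and predictions are made-up illustration data, and `confusion_matrix` is a hypothetical helper written for this note, not a library function.

```python
from collections import Counter

def confusion_matrix(actual, predicted, positive="yes"):
    """Count true/false positives and negatives for a two-class problem."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            counts["TP"] += 1   # correctly predicted positive
        elif a != positive and p == positive:
            counts["FP"] += 1   # predicted positive, actually negative
        elif a == positive and p != positive:
            counts["FN"] += 1   # predicted negative, actually positive
        else:
            counts["TN"] += 1   # correctly predicted negative
    return counts

# Made-up example data (assumption, not from the course material)
actual    = ["yes", "yes", "no", "no",  "yes", "no", "no", "yes"]
predicted = ["yes", "no",  "no", "yes", "yes", "no", "no", "yes"]

cm = confusion_matrix(actual, predicted)
# Accuracy = correct predictions / all predictions
accuracy = (cm["TP"] + cm["TN"]) / len(actual)
print(dict(cm), accuracy)
```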
What is usually the source of data for Data Mining?
A consolidated data warehouse (not always - can be unstructured data)
What provides visual displays of important information that is consolidated and arranged on a single screen so that information can be digested at a single glance and easily drilled in and further explored?
A dashboard
What is a data warehouse?
A large data storage facility containing data on major aspects of the enterprise. A physical repository where relational data are specially organised to provide enterprise-wide, cleansed data in a standardised format. A collection of integrated, subject-oriented databases designed to support DSS functions, where each unit of data is non-volatile and relevant to some moment in time.
What is a performance dashboard?
A multilayered application built on a business intelligence and data integration infrastructure that enables organisations to measure, monitor and manage business performance more effectively. Commonly used in BPM software suites and BI platforms
Where are data standards kept?
A standards database called the data dictionary or repository
What is a data cube?
A two-, three-, or higher dimensional object where each dimension represents a measure of interest
What is online analytical processing (OLAP)?
An information system that enables the user, while at a PC, to query the system, conduct an analysis and so on. The result is generated in seconds. It is a direct decision support method.
What is a report?
Any communication artifact prepared to convey specific information
What are the four patterns DM can extract from data?
Association Prediction Cluster (segmentation) Sequential (or time series) relationships
Why is data visualisation needed?
Because data is so voluminous, there is a need for visual tools to help people understand it.
What are the outcomes of prescriptive analytics?
Best possible business decisions/transactions
Who maintains the integrity of the database and quality of the data?
Both DA and DBA
Name a popular decision tree algorithm
CHAID (chair with a d)
What are the most common standard processes for DM?
CRISP-DM SEMMA (sample, explore, modify, model and assess) KDD (knowledge discovery in databases)
T! Why should storytelling be a part of your reporting and data visualisation?
Central idea of business reporting is to tell a story. Stories bring life to facts and data.
What are the five data mining goals?
Classifications -> definite target in mind Clusters -> no definite target Regressions -> definite target but target variable is continuous Sequences -> e.g. series of courses a uni student might take Forecasting -> estimating future value of some variable
In the CRISP-DM process, which step accounts for ~85% of project time?
Cleaning up the data i.e. data preparation
A _______ can help a decision maker sketch out the important qualitative factors and their causal relationships in a messy decision-making situation.
Cognitive map
Rules of thumb for presenting data visually?
Colour should be used to call attention to specific values or to differentiate categorical variables Use colour intentionally (use of colour should be restrained) Say what you mean
What do pie charts do?
Compare categorical data
What are the five main methodologies of models (how data will be collected and processed)
Complete enumeration -> applying neural nets Algorithmic -> nurses to shifts Heuristic -> "good enough" solution Simulations -> e.g. traffic Analytical -> e.g. regression
Fundamental difference between data quality and data integrity?
Concept of data integrity to do with database technology. Wider concept of data quality is rooted in the business.
Limitations of multidimensionality?
Consumes lots of system resources Costs a lot More complex interfaces Harder to perform maintenance
How has the evolution of data science occurred?
DSS (decision support systems) -> EIS (executive information systems) -> BI (business intelligence) -> Analytics -> big data
What are 3 notable tools used in the visual component of BA?
Dashboards GIS OLAP
What is the lowest level of abstraction (from which information and knowledge is derived)?
Data
What parts make up the business intelligence framework?
Data Models Knowledge User interface
What are the stages of a business report?
Data acquisition -> information generation (reporting) -> decision making -> process management
What is a DA
Data administrator (or data administration unit) - decides on standards within administration - more high level, executive position than DBA
What are the stages of data preprocessing?
Data consolidation Data cleaning Data transformation Data reduction
What are the steps in data preparation (possibly not that important)
Data consolidation -> data cleaning -> data transformation -> data reduction
What are some common data mining myths?
Data mining... -provides instant solutions/predictions -not yet viable for business applications -requires a separate, dedicated database -only for large firms with lots of customer data -can only be done by those with advanced degrees
What are some classification algorithm/techniques?
Decision trees Neural networks Statistical analysis Support vector machines Case-based reasoning Bayesian classifiers Genetic algorithms Rough sets
What DM algorithm/technique employs the divide and conquer method?
Decision trees - recursively divides a training set until ...
What is a descriptive model?
Describes things as they are Investigating consequences of various courses of action No guarantee a solution is optimal - but will often be GOOD ENOUGH
What is business intelligence (narrow definition)?
Descriptive analytics tools and techniques (i.e. reporting tools)
How is classification different to clustering?
Different because classification is supervised learning, whereas clustering is unsupervised
How is classification different to regression?
Different because classification has categorical output
What are the 3 components of multidimensional presentation?
Dimensions Measures Time
What is the fundamental challenge of dashboard design?
Displaying all the required information on a single screen, clearly and without distraction, in a manner that can be assimilated quickly
What does a scatter chart matrix do?
Displays multiple variables
What are some functions of a report?
Ensure proper departmental functioning Provide information Provide results of an analysis Persuade others to act Create an organisational memory
For those executives who don't have the time to go through lengthy reports, the best alternative is the ___________
Executive summary
What is ETL?
Extraction, transformation and load = the programs you use to move the data from legacy databases to data warehouses
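A rough sketch of the three ETL steps on a toy scale: pull rows out of a legacy export, clean them up, then load them into a warehouse table. The CSV text, the `sales` table and its column names are all made up for illustration, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Stand-in for a legacy system export (made-up data)
legacy_export = io.StringIO(
    "customer,amount\n alice ,10.50\nBOB,4.00\nalice,2.25\n"
)

# Extract: read the raw rows out of the source
rows = list(csv.DictReader(legacy_export))

# Transform: trim whitespace, standardise case, convert text to numbers
cleaned = [(r["customer"].strip().lower(), float(r["amount"])) for r in rows]

# Load: write the cleaned rows into the (toy) warehouse
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
warehouse.executemany("INSERT INTO sales VALUES (?, ?)", cleaned)

totals = dict(
    warehouse.execute("SELECT customer, SUM(amount) FROM sales GROUP BY customer")
)
print(totals)
```

Note how " alice " and "alice" only add up to one customer because of the transform step, which is why cleansing happens before the load.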
What does association rule mining do?
Finds interesting relationships (affinities) between variables (items or events) - diapers & beers
What is data quality?
Fitness for purpose - as defined by the business users of the data and conformance to enterprise data quality standards
What is data capture?
Gathering data and populating databases
What is a dashboard type report?
Graphical presentation of several performance indicators in a single page using dials/gauges
What is a metric management report?
Help manage business performance through metrics (SLAs for externals and KPIs for internals)
What are the specialised charts and graphs?
Histogram Gantt chart PERT chart Geographic map Bullet graph Heat map/tree map Highlight table
Difference between bar chart and histogram?
Histogram has no gaps and number ranges on x-axis Bar chart has gaps and categories on x-axis
Decisions are often made by ___________ especially at lower managerial levels and in small organisations.
Individuals
Decision making that introduces too much information may lead to a condition known as ______
Information overload
T! What is the difference between information visualisation and visual analytics?
Information visualisation - descriptive and closely associated with business intelligence (reports, dashboards, scorecards, etc) VA - combines visualisation with predictive analytics - more predictive and closely associated with business analytics (forecasting, segmentation and correlation analysis)
What is visual analytics?
Information visualisation (descriptive) + predictive analytics (predictive) There is a strong move toward VA
What are the inputs and outputs for association rule mining
Input - simple point-of-sale transaction data Output - most frequent affinities among items
What is a new direction in data visualisation?
Integrating data visualisation with decision support tools/applications Intelligent visualisation - includes data interpretation
What is a popular cluster/outlier analysis algorithm?
K-means
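A minimal sketch of the k-means idea on one-dimensional data: assign each value to its nearest centroid, move each centroid to the mean of its cluster, repeat. The spending figures and starting centroids are made up, and `k_means_1d` is a hypothetical helper, not a library call.

```python
def k_means_1d(values, centroids, iterations=10):
    """Plain k-means on one-dimensional data, starting from given centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each value joins the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid)
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Made-up weekly spend per customer
spend = [10, 12, 11, 50, 52, 49]
centroids, clusters = k_means_1d(spend, centroids=[0.0, 100.0])
print(centroids, clusters)
```

No class labels are supplied anywhere, which is what makes this unsupervised: the two groups (low and high spenders) come out of the data itself.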
T! Main differences between line, bar and pie charts? When to use one over the other?
Line - good for time series data Bar - good for nominal or numeric data that can be easily categorised Pie - good for depicting proportions (don't use if high number of categories)
What are the basic charts and graphs?
Line chart Bar chart Pie chart Scatter plot Bubble chart
What is a popular regression algorithm?
Linear regression
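A small sketch of simple (one-variable) linear regression using the ordinary least squares formulas. The advertising and sales numbers are made up, and `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data: advertising spend vs. sales
ad_spend = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]

slope, intercept = fit_line(ad_spend, sales)
# Predict a continuous (numeric) target for a new input
predicted_sales = intercept + slope * 6
print(slope, intercept, predicted_sales)
```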
There is a continuous flow of activity from one phase to the next in a decision making process, but at any phase there may be a return to a previous phase. ________ is an essential part of this process.
Modelling
What are the 3 levels/layers of information that needs to be displayed on a dashboard?
Monitoring Analysis Management
What do you think is the "next big thing" in data visualisation?
More 3D visualisation Virtual reality environment - immersive experience Holographic visualisations
What is satisficing?
Most humans will settle for a good enough solution - limited capacity for rational thinking (bounded rationality). Tradeoff: time and cost of searching for an optimum vs. value of obtaining one
What is data transfer?
Move data from one database to another or otherwise bring data together
Common components/characteristics of business reporting systems?
OLTP Data supply (volume, variety, velocity...) ETL Data storage Business logic Publication medium
Dashboards are used for monitoring, analysis and management. Which data are most useful at the management layer?
Operational data that can identify what actions to take to resolve a problem (management concerned with operations)
In the design phase of decision making, selecting a principle of choice or criteria means that _________________________________________________.
Optimality is not the only criterion for acceptable solutions (satisficing)
Which type of visualisation tool is best used to show relative proportions of dollars per department allocated by a university department?
Pie chart
What is data visualisation?
Presentation of data and the results of data analysis
Why might DM be questioned as art or science?
Process is highly repetitive and experimental
What are the three main goals of maintaining data integrity?
Protecting existence Maintaining quality Ensuring confidentiality
Name some design principles in visualising data for reports
Ratio between data and ink Colour ?Pre-attentive attributes? etc.
What are the "enablers" of descriptive analytics?
Reporting Visualisation BPM (business process management) also: dashboards scorecards data warehousing
What are the dimensions of models?
Representation -> objective vs subjective Time dimension Linearity of relationship Deterministic vs stochastic Descriptive vs. normative Causality vs correlation Methodology dimension
What is an estimation methodology for classification?
Simple split: training data ~70% testing data ~30%
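A minimal sketch of the simple split, assuming a 70/30 cut after shuffling. `simple_split` and the fixed seed are illustrative choices, not a prescribed method.

```python
import random

def simple_split(records, train_fraction=0.7, seed=42):
    """Shuffle the records, then cut them into training and testing sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)   # seeded so the split is repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

records = list(range(100))   # stand-in for 100 data records
train, test = simple_split(records)
print(len(train), len(test))
```

The model is built on `train` only; `test` is held back so the accuracy estimate is made on records the model has not seen.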
Examples of descriptive models?
Simulation, what-if analysis, COGNITIVE MAPPING
What do decision tree algorithms mainly differ on?
Splitting criteria Stopping criteria Pruning (generalisation method)
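A sketch of one possible splitting criterion. Gini impurity is used here as an assumption (the card does not name a specific criterion); the age/label rows and candidate thresholds are made up.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0 means the group contains a single class."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_gini(rows, threshold):
    """Weighted Gini impurity after splitting (value, label) rows at a threshold."""
    left = [label for value, label in rows if value < threshold]
    right = [label for value, label in rows if value >= threshold]
    n = len(rows)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Made-up training rows: (age, bought product?)
rows = [(22, "no"), (25, "no"), (31, "no"), (40, "yes"), (47, "yes"), (52, "yes")]

# Pick the candidate threshold that leaves the purest two groups
best = min([25, 35, 50], key=lambda t: split_gini(rows, t))
print(best, split_gini(rows, best))
```

A tree algorithm would repeat this choice inside each resulting group (divide and conquer) until its stopping criterion is met.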
Types of models in decision making (not talking about normative or descriptive here)
Statistical models (e.g. regression analysis, ANOVA) Accounting models (e.g. depreciation models) Personnel models (e.g. role playing) Marketing models (e.g. advertising strategy product switch models)
T! What are the main categories of data? What types of data can we use for BI and analytics?
Structured and unstructured Both can be used - easier to use structured
Difference between supervised and unsupervised learning?
Supervised - training data contains both the descriptive attributes (independent variables) as well as the class attribute (output variable). Unsupervised - training data only has the descriptive attributes
What is structured data?
Targeted for computers to process Numeric vs categorical
What is unstructured data?
Targeted for humans to digest
What is supervised learning?
Tell it what you're looking for
What is the format of a business report?
Text + tables + graphs/charts
What is data integrity?
The degree to which attributes of data associated with a specific occurrence of a given entity accurately describe that occurrence of the entity.
T! Where does the data for business analytics come from?
The internet and social media Business processes and systems Machines The internet of things
What is the definition of data mining?
The nontrivial process of identifying valid, novel, potentially useful and ultimately understandable patterns in data stored in a structured database.
What do variables have to do with modelling?
The process of modelling involves determining (usually mathematical) relationships between variables
T! What is prescriptive analytics? What kind of problems can be solved by prescriptive analytics?
The use of descriptive data and forecasts to identify the optimal decisions to maximise performance. Businesses can use it to solve problems e.g. how much of a good to produce, what price to charge, identify best locations for a store.
T! What is predictive analytics? How can organisations employ predictive analytics?
The use of statistical techniques and data mining to determine what is likely to happen in the future. Businesses can use predictive analytics to forecast e.g. what customers will buy, how they will respond to a business decision, whether a customer is creditworthy.
T! What is descriptive analytics? What are the various tools employed?
The use of statistical techniques to present current or past circumstances in an understandable way and identify underlying causes. Tools include - visualisation, reporting, BPM, data warehouses, dashboards, scorecards.
What is the output variable for association rule mining?
There is none
Name 3 benefits of BI
Time savings Improved customer service Increased revenue Improved decision making Cost savings Faster, more accurate reporting
What is the purpose of a managerial report?
To improve managerial decisions
What are the two types of data
Unstructured vs structured
Is association rule mining supervised or unsupervised?
Unsupervised
What is cluster analysis used for and how does it work?
Used for automatic identification of natural groupings of things It learns the clusters of things from past data, then assigns new instances
What is usually the Data Mining environment?
Usually a client-server or web-based information systems architecture
What are the V's that define big data
VOLUME, VARIETY, VELOCITY and veracity, variability, value (proposition?)
What is another word for 'dimensions' in multidimensional presentation?
Variables
What are the outcomes of descriptive analytics?
Well-defined business problems and opportunities
What questions do descriptive analytics deal with?
What happened? What is happening?
What questions do prescriptive analytics deal with?
What should I do? Why should I do it?
What questions do predictive analytics deal with?
What will happen? Why will it happen?
When would cluster analysis be an appropriate data mining technique?
When the data records do not have predefined class identifiers
T! Why would you use a geographic map? What other types of charts can be combined with a geographic map?
When the data set contains location data Pie charts can!
Who is the inventor of the modern chart?
William Playfair
What is the generic rule for association rule mining?
X -> Y [S%, C%] where X, Y are products/services S - support: how often X and Y go together C - confidence: how often Y goes together with X
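A small sketch of how S and C could be computed for one rule, assuming support is the share of all transactions containing both X and Y, and confidence is the share of X-transactions that also contain Y. The baskets are made up and `support_and_confidence` is a hypothetical helper.

```python
def support_and_confidence(transactions, x, y):
    """Return (support, confidence) for the rule x -> y as fractions."""
    n = len(transactions)
    with_x = sum(1 for t in transactions if x in t)
    with_both = sum(1 for t in transactions if x in t and y in t)
    support = with_both / n                              # X and Y together, out of everything
    confidence = with_both / with_x if with_x else 0.0   # Y present, given X is present
    return support, confidence

# Made-up point-of-sale baskets
transactions = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"milk", "bread"},
    {"beer", "chips"},
]

support, confidence = support_and_confidence(transactions, "diapers", "beer")
print(support, confidence)
```

Here the rule diapers -> beer appears in 2 of 5 baskets (S = 40%), and 2 of the 3 diaper baskets also contain beer (C is about 67%).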
What does a clustered column chart do?
alternative to stacked column chart
What is a popular link analysis algorithm?
apriori algorithm
What is a popular sequence analysis algorithm?
apriori algorithm
What type of variable is the output variable for classification data mining?
categorical (nominal or ordinal)
What does a stacked column chart do?
compare relative values of quantitative variables for the same category, in a bar chart
What do you look for in a good rule in a database?
confidence + support
What are some applications of association rule mining in business?
cross-marketing, cross-selling, store design, catalog design
What are the two main types of DM
hypothesis-driven DM (target in mind) discovery-driven DM (no target in mind)
How does classification work?
learn from past data, classify new data
Visual analytics is widely regarded as the combination of visualisation and _______ analytics
predictive
What are some applications of association rule mining in medicine?
relationships between symptoms and illness - diagnosis
What is the output variable for cluster analysis
there isn't one
What do heat maps do?
two-dimensional, uses shades of colour to indicate magnitude
What is a data lake
unstructured data storage technology for big data
Is cluster analysis supervised or unsupervised
unsupervised
What do bubble charts do?
visualise three variables in a two-dimensional graph (good alternative to 3d)
What is a petabyte?
10^15 bytes
What is a business report?
A written document that contains information regarding business matters
What are the outcomes of predictive analytics?
Accurate projections of future states/conditions
What is business intelligence (broad definition)?
An umbrella term that combines architectures, tools, databases, applications and methodologies
What is application independence?
Application software should be separate from the database
What are the "enablers" of prescriptive analytics?
Automated decision making Knowledge management Collaborative systems also: optimisation simulation decision modelling expert systems
Under structured data, what are the types of data?
Categorical (nominal, ordinal) Numerical (ratio, interval)
What is the source of a business report?
Data from inside and outside the organisation (via the use of ETL)
What are the "enablers" of predictive analytics?
Data mining Text mining Web analytics also: forecasting
What is big data?
Data that can't be stored or processed easily by traditional means
What are the challenges of big data analytics?
Data volume Data integration Processing capabilities Data governance Skill availability Solution cost
What are BI's architecture and components?
Data warehouse Business analytics Automated decision systems Performance and strategy Possibly also: Data sources
What are the metrics for analytics-ready data?
Data.... Source reliability Content accuracy Accessibility Security and privacy Richness Consistency Currency/timeliness Granularity Validity and relevancy
What is a DBA
Database administrator - manages an organisation's electronic databases - more technical position
What is a multidimensional database?
Database which supports multidimensional analysis
What variables are involved in modelling?
Decision variables - describe the alternatives from which a manager can choose e.g. how many cars to buy Result variables - describe the objective of the decision making problem e.g. sales, revenue, profit Uncontrollable variables - parameters that describe the environment e.g. economic conditions
What is the DBMS responsible for?
Defining, creating, modifying, deleting and reading data in an information system. Enforcing integrity constraints
What are the areas in which control activities must be taken to ensure data integrity?
Definition control Existence control Access control Update control Concurrency control Quality control
What is master data management (MDM)?
Disciplines, technologies and methods to ensure the currency, meaning and quality of reference data within and across subject areas. Simply, it's a way of managing the single version of the truth.
What is it called when you turn numerical data into categorical?
Discretisation
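A minimal sketch of discretisation: numeric ages are mapped to named bands. The boundaries (18 and 65) and the band names are made-up choices, and `discretise` is a hypothetical helper.

```python
def discretise(value, boundaries, labels):
    """Return the label of the first bin whose upper boundary exceeds the value."""
    for upper, label in zip(boundaries, labels):
        if value < upper:
            return label
    return labels[-1]   # anything at or above the last boundary

ages = [15, 34, 67, 45, 18]
age_bands = [
    discretise(a, boundaries=[18, 65], labels=["minor", "adult", "senior"])
    for a in ages
]
print(age_bands)
```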
What are the 5 types of data standards a business must establish?
Identifier Naming Definition Integrity rule Usage rights
The _______ of a proposed solution to a problem is the initiation of a new order of things or the introduction of change.
Implementation
What are the enablers of big data analytics?
In-memory analytics In-database analytics Grid computing & MPP Appliances
What is the distribution of a business report?
In-print, email, portal/intranet
What is a balanced scorecard type report?
Include financial, customer, business process, and learning & growth indicators
T! You are about to buy a car. Using Simon's four-phase model, describe your activities at each step.
Intelligence - recognise you need a better car Design - determine parameters that describe the appropriate car to buy Choice - choose the car Implementation - purchase the car
What was Herbert Simon's model of decision-making he developed in the 1970s?
Intelligence -> Design -> Choice -> Implementation
What forms does big data come in?
Many forms Large, structured, unstructured, continuous
What are the types of business reports?
Metric management reports Dashboard type reports Balanced scorecard type reports
What are the types of models?
Normative Descriptive Heuristic
What form of decision theory assumes that decision makers are rational beings who always seek to strictly maximise economic goals?
Normative decision theory
Difference between OLAP and OLTP?
OLTP concentrates on processing repetitive transactions in large quantities and conducting simple manipulations. OLAP involves examining many data items, analysing complex relationships, looking for patterns, trends and exceptions
What is a normative model?
One in which the chosen alternative is demonstrably the best of all possible alternatives. Assume decision makers are rational - optimisation - rationalisation - suboptimisation
What is a model?
Representations of systems or problems (i.e. reality) with varying degrees of abstraction. -> Simplification through assumptions
T! You are about to sell your car. What principles of choice are you most likely to use in deciding whether to reject or accept offers? Why?
Satisficing - you cannot optimise so you must set an aspiration level and accept anything that exceeds it.
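A sketch of satisficing as a decision rule: take the first offer that exceeds the aspiration level instead of waiting to see every offer. The offer amounts and the aspiration level of 8000 are made up.

```python
def satisfice(offers, aspiration_level):
    """Accept the first offer that exceeds the aspiration level, else None."""
    for offer in offers:
        if offer > aspiration_level:
            return offer
    return None

# Offers in the order they arrive (made-up figures)
offers = [7200, 7900, 8100, 8600, 8300]

accepted = satisfice(offers, aspiration_level=8000)
best_possible = max(offers)   # only knowable in hindsight
print(accepted, best_possible)
```

The seller cannot know that a better offer is coming, so the "good enough" offer is accepted; the gap to `best_possible` is the price paid for not searching further.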
_________ is a study of the effects of a change in one or more input variables on a proposed solution
Sensitivity analysis
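A rough sketch of a one-at-a-time sensitivity analysis: each input is raised by 10% in turn and the change in the result is recorded. The profit model and all base figures are made up for illustration.

```python
def profit(price, units, unit_cost, fixed_cost):
    """Toy profit model (an assumption, not from the course material)."""
    return (price - unit_cost) * units - fixed_cost

base = {"price": 20.0, "units": 1000, "unit_cost": 12.0, "fixed_cost": 5000.0}
base_profit = profit(**base)

# Raise one input at a time by 10% and record the effect on profit
changes = {}
for name in base:
    scenario = dict(base)
    scenario[name] = base[name] * 1.10
    changes[name] = profit(**scenario) - base_profit

most_sensitive = max(changes, key=lambda k: abs(changes[k]))
print(base_profit, changes, most_sensitive)
```

The input with the largest absolute effect is the one the proposed solution is most sensitive to, so its estimate deserves the most scrutiny.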
What types of unstructured data are there?
Textual Multimedia (image, audio, video) XML/JSON [semi-structured]
What is multidimensionality?
The ability to organise, present and analyse data by multiple dimensions
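A small sketch of multidimensional analysis: the same sales measure is summarised along different dimensions, which is the kind of roll-up an OLAP tool performs on a data cube. The fact rows, dimension names and `roll_up` helper are all made up for illustration.

```python
from collections import defaultdict

def roll_up(facts, dimensions):
    """Sum the 'sales' measure over every combination of the chosen dimensions."""
    totals = defaultdict(float)
    for fact in facts:
        key = tuple(fact[d] for d in dimensions)
        totals[key] += fact["sales"]
    return dict(totals)

# Made-up facts with three dimensions (region, product, quarter) and one measure (sales)
facts = [
    {"region": "East", "product": "Tea",    "quarter": "Q1", "sales": 100.0},
    {"region": "East", "product": "Coffee", "quarter": "Q1", "sales": 150.0},
    {"region": "West", "product": "Tea",    "quarter": "Q1", "sales": 80.0},
    {"region": "West", "product": "Tea",    "quarter": "Q2", "sales": 120.0},
]

by_region = roll_up(facts, ["region"])
by_product_quarter = roll_up(facts, ["product", "quarter"])
print(by_region)
print(by_product_quarter)
```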