BISM2202

What are the key policy areas for data governance?

- Data ownership - Data administration

What are the classifications of data analytics?

- Descriptive - Predictive - Prescriptive

What to look for in a dashboard?

- highlighting data and exceptions that require action
- easy to use, don't need much training to use
- combining data from different sources
- enabling drill-down or drill-through and filtering
- being dynamic (connected to data in a live fashion)
- requiring little coding

What are the principles in managing data?

1. Need to manage data is permanent
2. Data can exist at several levels within an organisation
3. Application software should be separate from the database
4. Application software can be classified by how it treats data
5. Application software should be considered disposable
6. Data should be captured once
7. There should be strict data standards

What are the 3 types of performance dashboards?

1. Operational dashboards 2. Tactical dashboards 3. Strategic dashboards

What are the two methods of data reduction?

1. Variables (dimensional reduction, variable selection) 2. Cases/samples (sampling, balancing/stratification)

Give a taxonomy of data mining tasks

3 main types:
- Prediction (classification, regression) [supervised learning method]
- Association, aka "market basket" (link analysis, sequence analysis) [unsupervised learning method]
- Clustering (outlier analysis) [unsupervised learning method]

What is data?

A collection of facts obtained through experiments, observations, etc.

What is the primary source of accuracy estimation for classification problems?

A confusion matrix
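
A minimal sketch of how a two-class confusion matrix is tallied and how accuracy is read off it. The labels and the helper names (`confusion_matrix`, `accuracy`) are made up for illustration.

```python
from collections import Counter

def confusion_matrix(actual, predicted, positive):
    """Tally true/false positives and negatives for a two-class problem."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["FN" if a == positive else "TN"] += 1
    return counts

def accuracy(counts):
    """Share of all cases that were classified correctly."""
    return (counts["TP"] + counts["TN"]) / sum(counts.values())

# Hypothetical test-set labels and model predictions.
actual    = ["yes", "yes", "no", "no",  "yes", "no"]
predicted = ["yes", "no",  "no", "yes", "yes", "no"]

cm = confusion_matrix(actual, predicted, positive="yes")
acc = accuracy(cm)  # 4 of the 6 cases are classified correctly
```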

What is usually the source of data for Data Mining?

A consolidated data warehouse (not always - can be unstructured data)

What provides visual displays of important information that is consolidated and arranged on a single screen so that information can be digested at a single glance and easily drilled in and further explored?

A dashboard

What is a data warehouse?

A large data storage facility containing data on major aspects of the enterprise. A physical repository where relational data are specially organised to provide enterprise-wide, cleansed data in a standardised format. A collection of integrated, subject-oriented databases designed to support DSS functions, where each unit of data is non-volatile and relevant to some moment in time.

What is a performance dashboard?

A multilayered application built on a business intelligence and data integration infrastructure that enables organisations to measure, monitor and manage business performance more effectively. Commonly used in BPM software suites and BI platforms.

Where are data standards kept?

A standards database called the data dictionary or repository

What is a data cube?

A two-, three-, or higher dimensional object where each dimension represents a measure of interest
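
One way to picture a small data cube, assuming three dimensions (product, region, quarter) and one measure (sales). The facts and the `roll_up` helper are hypothetical; real OLAP tools do this at far larger scale.

```python
from collections import defaultdict

# Hypothetical sales facts: (product, region, quarter, sales amount).
facts = [
    ("laptop", "north", "Q1", 100),
    ("laptop", "south", "Q1", 150),
    ("phone",  "north", "Q1", 80),
    ("laptop", "north", "Q2", 120),
    ("phone",  "south", "Q2", 60),
]

DIMENSIONS = ("product", "region", "quarter")

def roll_up(facts, keep):
    """Sum the measure over every dimension that is not listed in `keep`."""
    positions = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[i] for i in positions)
        totals[key] += row[-1]
    return dict(totals)

by_product = roll_up(facts, ["product"])
by_region_quarter = roll_up(facts, ["region", "quarter"])
```

Rolling up to fewer dimensions gives summary views; keeping more dimensions corresponds to drilling down.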

What is online analytical processing (OLAP)?

An information system that enables the user, while at a PC, to query the system, conduct an analysis and so on. The result is generated in seconds. It is a direct decision support method.

What is a report?

Any communication artifact prepared to convey specific information

What are the four patterns DM can extract from data?

Association Prediction Cluster (segmentation) Sequential (or time series) relationships

Why is data visualisation needed?

Because data is so voluminous, there is a need for visual tools to help people understand it.

What are the outcomes of prescriptive analytics?

Best possible business decisions/transactions

Who maintains the integrity of the database and quality of the data?

Both DA and DBA

Name a popular decision tree algorithm

CHAID (chair with a d)

What are the most common standard processes for DM?

- CRISP-DM
- SEMMA (sample, explore, modify, model and assess)
- KDD (knowledge discovery in databases)

T! Why should storytelling be a part of your reporting and data visualisation?

Central idea of business reporting is to tell a story. Stories bring life to facts and data.

What are the five data mining goals?

- Classifications -> definite target in mind
- Clusters -> no definite target
- Regressions -> definite target but target variable is continuous
- Sequences -> e.g. series of courses a uni student might take
- Forecasting -> estimating future value of some variable

In the CRISP-DM process, which step accounts for ~85% of project time?

Cleaning up the data i.e. data preparation

A _______ can help a decision maker sketch out the important qualitative factors and their causal relationships in a messy decision-making situation.

Cognitive map

Rules of thumb for presenting data visually?

- Colour should be used to call attention to specific values or to differentiate categorical variables
- Use colour intentionally (use of colour should be restrained)
- Say what you mean

What do pie charts do?

Compare categorical data

What are the five main methodologies of models (how data will be collected and processed)

- Complete enumeration -> applying neural nets
- Algorithmic -> nurses to shifts
- Heuristic -> "good enough" solution
- Simulations -> e.g. traffic
- Analytical -> e.g. regression

Fundamental difference between data quality and data integrity?

Concept of data integrity to do with database technology. Wider concept of data quality is rooted in the business.

Limitations of multidimensionality?

Consumes lots of system resources Costs a lot More complex interfaces Harder to perform maintenance

How has the evolution of data science occurred?

DSS (decision support systems) -> EIS (executive information systems) -> BI (business intelligence) -> Analytics -> big data

What are 3 notable tools used in the visual component of BA?

Dashboards GIS OLAP

What is the lowest level of abstraction (from which information and knowledge is derived)?

Data

What parts make up the business intelligence framework?

Data Models Knowledge User interface

What are the stages of a business report?

Data acquisition -> information generation (reporting) -> decision making -> process management

What is a DA

Data administrator (or data administration unit) - decides on standards within administration - more high level, executive position than DBA

What are the stages of data preprocessing?

Data consolidation Data cleaning Data transformation Data reduction

What are the steps in data preparation (possibly not that important)

Data consolidation -> data cleaning -> data transformation -> data reduction

What are some common data mining myths?

Data mining...
- provides instant solutions/predictions
- is not yet viable for business applications
- requires a separate, dedicated database
- is only for large firms with lots of customer data
- can only be done by those with advanced degrees

What are some classification algorithm/techniques?

Decision trees, neural networks, statistical analysis, support vector machines, case-based reasoning, Bayesian classifiers, genetic algorithms, rough sets

What DM algorithm/technique employs the divide and conquer method?

Decision trees - recursively divides a training set until ...

What is a descriptive model?

Describes things as they are Investigating consequences of various courses of action No guarantee a solution is optimal - but will often be GOOD ENOUGH

What is business intelligence (narrow definition)?

Descriptive analytics tools and techniques (i.e. reporting tools)

How is classification different to clustering?

Classification is supervised learning (the class labels are known in advance), whereas clustering is unsupervised

How is classification different to regression?

Classification has a categorical output, whereas regression has a continuous (numeric) output

What are the 3 components of multidimensional presentation?

Dimensions Measures Time

What is the fundamental challenge of dashboard design?

Displaying all the required information on a single screen, clearly and without distraction, in a manner that can be assimilated quickly

What does a scatter chart matrix do?

Displays multiple variables

What are some functions of a report?

Ensure proper departmental functioning Provide information Provide results of an analysis Persuade others to act Create an organisational memory

For those executives who don't have the time to go through lengthy reports, the best alternative is the ___________

Executive summary

What is ETL?

Extraction, transformation and load = the programs you use to move data from legacy databases into data warehouses
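
A toy sketch of the three ETL stages, assuming a couple of messy legacy rows and an in-memory SQLite table standing in for the warehouse. The row contents, the `sales` table and the function names are all hypothetical.

```python
import sqlite3

# Hypothetical rows as they might come out of a legacy system.
legacy_rows = [
    {"cust": " alice ", "amount": "10.50"},
    {"cust": "BOB", "amount": "7"},
]

def extract(rows):
    """Extract: read the raw records from the source."""
    return list(rows)

def transform(rows):
    """Transform: tidy up names and convert amounts to numbers."""
    return [(r["cust"].strip().title(), float(r["amount"])) for r in rows]

def load(conn, rows):
    """Load: write the cleaned records into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(legacy_rows)))
loaded = conn.execute(
    "SELECT customer, amount FROM sales ORDER BY customer"
).fetchall()
```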

What does association rule mining do?

Finds interesting relationships (affinities) between variables (items or events) - diapers & beers

What is data quality?

Fitness for purpose - as defined by the business users of the data and conformance to enterprise data quality standards

What is data capture?

Gathering data and populating databases

What is a dashboard type report?

Graphical presentation of several performance indicators in a single page using dials/gauges

What is a metric management report?

Help manage business performance through metrics (SLAs for external groups and KPIs for internal management)

What are the specialised charts and graphs?

Histogram Gantt chart PERT chart Geographic map Bullet graph Heat map/tree map Highlight table

Difference between bar chart and histogram?

Histogram: no gaps between bars, number ranges on the x-axis
Bar chart: gaps between bars, categories on the x-axis

Decisions are often made by ___________ especially at lower managerial levels and in small organisations.

Individuals

Decision making that introduces too much information may lead to a condition known as ______

Information overload

T! What is the difference between information visualisation and visual analytics?

Information visualisation - descriptive and closely associated with business intelligence (reports, dashboards, scorecards, etc) VA - combines visualisation with predictive analytics - more predictive and closely associated with business analytics (forecasting, segmentation and correlation analysis)

What is visual analytics?

Information visualisation(descriptive) + predictive analytics(predictive) There is a strong move toward VA

What are the inputs and outputs for association rule mining

Input - simple point-of-sale transaction data Output - most frequent affinities among items

What is a new direction in data visualisation?

Integrating data visualisation with decision support tools/applications
Intelligent visualisation - includes data interpretation

What is a popular cluster/outlier analysis algorithm?

K-means
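
A bare-bones sketch of the k-means idea on one-dimensional data: assign each point to its nearest centroid, recompute each centroid as the mean of its cluster, and repeat. The points, starting centroids and fixed iteration count are assumptions chosen for illustration.

```python
def k_means_1d(points, centroids, iterations=10):
    """Plain k-means on numbers: assign to nearest centroid, then re-average."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for x in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
            clusters[nearest].append(x)
        # Keep the previous centroid if a cluster ends up empty.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical data with two obvious groupings.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = k_means_1d(points, centroids=[0.0, 5.0])
```

Note that no output variable is supplied: the groupings come from the data alone, which is what makes the method unsupervised.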

T! Main differences between line, bar and pie charts? When to use one over the other?

Line - good for time series data
Bar - good for nominal or numeric data that can be easily categorised
Pie - good for depicting proportions (don't use if high number of categories)

What are the basic charts and graphs?

Line chart Bar chart Pie chart Scatter plot Bubble chart

What is a popular regression algorithm?

Linear regression
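
A small sketch of simple linear regression fitted by ordinary least squares, using only the standard library. The data points are invented so that they lie exactly on a line; `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical observations that follow y = 2x + 1 exactly.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)
prediction = slope * 6 + intercept  # continuous (numeric) output for x = 6
```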

There is a continuous flow of activity from one phase to the next in a decision making process, but at any phase there may be a return to a previous phase. ________ is an essential part of this process.

Modelling

What are the 3 levels/layers of information that needs to be displayed on a dashboard?

Monitoring Analysis Management

What do you think is the "next big thing" in data visualisation?

More 3D visualisation Virtual reality environment - immersive experience Holographic visualisations

What is satisficing?

Most humans will settle for a good enough solution - limited capacity for rational thinking (bounded rationality). Tradeoff: time and cost of searching for an optimum vs. value of obtaining one

What is data transfer?

Move data from one database to another or otherwise bring data together

Common components/characteristics of business reporting systems?

OLTP Data supply (volume, variety, velocity...) ETL Data storage Business logic Publication medium

Dashboards are used for monitoring, analysis and management. Which data are most useful at the management layer?

Operational data that can identify what actions to take to resolve a problem (management concerned with operations)

In the design phase of decision making, selecting a principle of choice or criteria means that _________________________________________________.

Optimality is not the only criterion for acceptable solutions (satisficing)

Which type of visualisation tool is best used to show relative proportions of dollars per department allocated by a university department?

Pie chart

What is data visualisation?

Presentation of data and the results of data analysis

Why might DM be questioned as art or science?

Process is highly repetitive and experimental

What are the three main goals of maintaining data integrity?

Protecting existence Maintaining quality Ensuring confidentiality

Name some design principles in visualising data for reports

Ratio between data and ink, colour, pre-attentive attributes, etc.

What are the "enablers" of descriptive analytics?

Reporting Visualisation BPM (business process management) also: dashboards scorecards data warehousing

What are the dimensions of models?

Representation -> objective vs subjective Time dimension Linearity of relationship Deterministic vs stochastic Descriptive vs. normative Causality vs correlation Methodology dimension

What is an estimation methodology for classification?

Simple split: training data ~70% testing data ~30%
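
A minimal sketch of a simple split, assuming the records can be shuffled freely (no time ordering) and using a fixed seed so the split is repeatable. The function name and the 100 dummy records are illustrative only.

```python
import random

def simple_split(records, train_fraction=0.7, seed=42):
    """Shuffle the records, then cut them into training and testing sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

records = list(range(100))          # stand-in for 100 labelled cases
train, test = simple_split(records)  # roughly 70% / 30%
```

The model is built on `train` only; `test` is held back to estimate accuracy on cases the model has not seen.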

Examples of descriptive models?

Simulation, what-if analysis, COGNITIVE MAPPING

What do decision tree algorithms mainly differ on?

Splitting criteria Stopping criteria Pruning (generalisation method)

Types of models in decision making (not talking about normative or descriptive here)

Statistical models (e.g. regression analysis, ANOVA) Accounting models (e.g. depreciation models) Personnel models (e.g. role playing) Marketing models (e.g. advertising strategy product switch models)

T! What are the main categories of data? What types of data can we use for BI and analytics?

Structured and unstructured Both can be used - easier to use structured

Difference between supervised and unsupervised learning?

Supervised - training data contains both the descriptive attributes (independent variables) as well as the class attribute (output variable). Unsupervised - only has descriptive attribute

What is structured data?

Targeted for computers to process Numeric vs categorical

What is unstructured data?

Targeted for humans to digest

What is supervised learning?

Tell it what you're looking for

What is the format of a business report?

Text + tables + graphs/charts

What is data integrity?

The degree to which attributes of data associated with a specific occurrence of a given entity accurately describe that occurrence of the entity.

T! Where does the data for business analytics come from?

The internet and social media Business processes and systems Machines The internet of things

What is the definition of data mining?

The nontrivial process of identifying valid, novel, potentially useful and ultimately understandable patterns in data stored in a structured database.

What do variables have to do with modelling?

The process of modelling involves determining (usually mathematical) relationships between variables

T! What is prescriptive analytics? What kind of problems can be solved by prescriptive analytics?

The use of descriptive data and forecasts to identify the optimal decisions to maximise performance. Businesses can use it to solve problems such as how much of a good to produce, what price to charge, or where best to locate a store.

T! What is predictive analytics? How can organisations employ predictive analytics?

The use of statistical techniques and data mining to determine what is likely to happen in the future. Businesses can use predictive analytics to forecast e.g. what customers will buy, how they will respond to a business decision, whether a customer is creditworthy.

T! What is descriptive analytics? What are the various tools employed?

The use of statistical techniques to present current or past circumstances in an understandable way and identify underlying causes. Tools include - visualisation, reporting, BPM, data warehouses, dashboards, scorecards.

What is the output variable for association rule mining?

There is none

Name 3 benefits of BI

Time savings Improved customer service Increased revenue Improved decision making Cost savings Faster, more accurate reporting

What is the purpose of a managerial report?

To improve managerial decisions

What are the two types of data

Unstructured vs structured

Is association rule mining supervised or unsupervised?

Unsupervised

What is cluster analysis used for and how does it work?

Used for automatic identification of natural groupings of things It learns the clusters of things from past data, then assigns new instances

What is usually the Data Mining environment?

Usually a client-server or web-based information systems architecture

What are the V's that define big data

VOLUME, VARIETY, VELOCITY and veracity, variability, value (proposition?)

What is another word for 'dimensions' in multidimensional presentation?

Variables

What are the outcomes of descriptive analytics?

Well-defined business problems and opportunities

What questions do descriptive analytics deal with?

What happened? What is happening?

What questions do prescriptive analytics deal with?

What should I do? Why should I do it?

What questions do predictive analytics deal with?

What will happen? Why will it happen?

When would cluster analysis be an appropriate data mining technique?

When the data records do not have predefined class identifiers

T! Why would you use a geographic map? What other types of charts can be combined with a geographic map?

When the data set contains location data Pie charts can!

Who is the inventor of the modern chart?

William Playfair

What is the generic rule for association rule mining?

X -> Y [S%, C%], where X and Y are products/services
S - support: how often X and Y appear together, as a share of all transactions
C - confidence: how often Y appears in the transactions that contain X
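
Support and confidence for a rule X -> Y can be worked out directly from transaction counts. The sketch below uses five invented shopping baskets; the function names are hypothetical.

```python
# Hypothetical point-of-sale baskets.
transactions = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"milk", "bread"},
    {"diapers", "beer", "bread"},
]

def count_containing(transactions, items):
    """Number of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t)

def support(transactions, x, y):
    """Share of all transactions that contain both X and Y."""
    return count_containing(transactions, set(x) | set(y)) / len(transactions)

def confidence(transactions, x, y):
    """Share of the transactions containing X that also contain Y."""
    both = count_containing(transactions, set(x) | set(y))
    return both / count_containing(transactions, x)

s = support(transactions, {"diapers"}, {"beer"})     # 3 of 5 baskets
c = confidence(transactions, {"diapers"}, {"beer"})  # 3 of 4 diaper baskets
```

So the rule diapers -> beer would be written with S = 60% and C = 75% for this toy data.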

What does a clustered column chart do?

alternative to stacked column chart

What is a popular link analysis algorithm?

apriori algorithm

What is a popular sequence analysis algorithm?

apriori algorithm

What type of variable is the output variable for classification data mining?

categorical (nominal or ordinal)

What does a stacked column chart do?

compare relative values of quantitative variables for the same category, in a bar chart

What do you look for for a good rule in a database?

confidence + support

What are some applications of association rule mining in business?

cross-marketing, cross-selling, store design, catalog design

What are the two main types of DM

hypothesis-driven DM (target in mind) discovery-driven DM (no target in mind)

How does classification work?

learn from past data, classify new data

Visual analytics is widely regarded as the combination of visualisation and _______ analytics

predictive

What are some applications of association rule mining in medicine?

relationships between symptoms and illness - diagnosis

What is the output variable for cluster analysis

there isn't one

What do heat maps do?

two-dimensional, uses shades of colour to indicate magnitude

What is a data lake

unstructured data storage technology for big data

Is cluster analysis supervised or unsupervised

unsupervised

What do bubble charts do?

visualise three variables in a two-dimensional graph (good alternative to 3d)

What is a petabyte?

10^15 bytes

What is a business report?

A written document that contains information regarding business matters

What are the outcomes of predictive analytics?

Accurate projections of future states/conditions

What is business intelligence (broad definition)?

An umbrella term that combines architectures, tools, databases, applications and methodologies

What is application independence?

Application software should be separate from the database

What are the "enablers" of prescriptive analytics?

Automated decision making Knowledge management Collaborative systems also: optimisation simulation decision modelling expert systems

Under structured data, what are the types of data?

Categorical (nominal, ordinal) Numerical (ratio, interval)

What is the source of a business report?

Data from inside and outside the organisation (via the use of ETL)

What are the "enablers" of predictive analytics?

Data mining Text mining Web analytics also: forecasting

What is big data?

Data that can't be stored or processed easily by traditional means

What are the challenges of big data analytics?

Data volume Data integration Processing capabilities Data governance Skill availability Solution cost

What are BI's architecture and components?

Data warehouse Business analytics Automated decision systems Performance and strategy Possibly also: Data sources

What are the metrics for analytics-ready data?

Data.... Source reliability Content accuracy Accessibility Security and privacy Richness Consistency Currency/timeliness Granularity Validity and relevancy

What is a DBA

Database administrator - manages an organisation's electronic databases - more technical position

What is a multidimensional database?

Database which supports multidimensional analysis

What variables are involved in modelling?

- Decision variables - describe the alternatives from which a manager can choose e.g. how many cars to buy
- Result variables - describe the objective of the decision making problem e.g. sales, revenue, profit
- Uncontrollable variables - parameters that describe the environment e.g. economic conditions

What is the DBMS responsible for?

Defining, creating, modifying, deleting and reading data in an information system. Enforcing integrity constraints

What are the areas in which control activities must be taken to ensure data integrity?

Definition control Existence control Access control Update control Concurrency control Quality control

What is master data management (MDM)?

Disciplines, technologies and methods to ensure the currency, meaning and quality of reference data within and across subject areas. Simply, it's a way of managing the single version of the truth.

What is it called when you turn numerical data into categorical?

Discretisation
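
A short sketch of discretisation by fixed cut points, turning numeric ages into categories. The boundaries and labels are assumptions picked for the example.

```python
from bisect import bisect_right

def discretise(value, cut_points, labels):
    """Map a number to the label of the bin it falls into."""
    return labels[bisect_right(cut_points, value)]

cut_points = [18, 65]                  # hypothetical age boundaries
labels = ["minor", "adult", "senior"]  # one more label than cut points

ages = [5, 18, 30, 64, 65, 80]
categories = [discretise(a, cut_points, labels) for a in ages]
```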

What are the 5 types of data standards a business must establish?

Identifier Naming Definition Integrity rule Usage rights

The _______ of a proposed solution to a problem is the initiation of a new order of things or the introduction of change.

Implementation

What are the enablers of big data analytics?

In-memory analytics In-database analytics Grid computing & MPP Appliances

What is the distribution of a business report?

In-print, email, portal/intranet

What is a balanced scorecard type report?

Include financial, customer, business process, and learning & growth indicators

T! You are about to buy a car. Using Simon's four-phase model, describe your activities at each step.

Intelligence - recognise you need a better car
Design - determine parameters that describe the appropriate car to buy
Choice - choose the car
Implementation - purchase the car

What was Herbert Simon's model of decision-making he developed in the 1970s?

Intelligence -> Design -> Choice -> Implementation

What forms does big data come in?

Many forms Large, structured, unstructured, continuous

What are the types of business reports?

Metric management reports Dashboard type reports Balanced scorecard type reports

What are the types of models?

Normative Descriptive Heuristic

What form of decision theory assumes that decision makers are rational beings who always seek to strictly maximise economic goals?

Normative decision theory

Difference between OLAP and OLTP?

OLTP concentrates on processing repetitive transactions in large quantities and conducting simple manipulations. OLAP involves examining many data items, analysing complex relationships, looking for patterns, trends and exceptions

What is a normative model?

One in which the chosen alternative is demonstrably the best of all possible alternatives. Assume decision makers are rational - optimisation - rationalisation - suboptimisation

What is a model?

Representations of systems or problems (i.e. reality) with varying degrees of abstraction. -> Simplification through assumptions

T! You are about to sell your car. What principles of choice are you most likely to use in deciding whether to reject or accept offers? Why?

Satisficing - you cannot optimise so you must set an aspiration level and accept anything that exceeds it.
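
A sketch of satisficing when offers arrive one at a time, assuming an aspiration level of 8000 and a made-up sequence of offers. The seller takes the first offer above the aspiration level instead of waiting for the best one.

```python
def satisfice(offers, aspiration_level):
    """Accept the first offer that exceeds the aspiration level."""
    for offer in offers:
        if offer > aspiration_level:
            return offer
    return None  # no offer was good enough

# Hypothetical offers, in the order they arrive.
offers = [7200, 7900, 8300, 9100, 8000]

accepted = satisfice(offers, aspiration_level=8000)
best_in_hindsight = max(offers)
```

The accepted offer is lower than the best one seen in hindsight, which is the trade-off: less time and cost spent searching, in exchange for a good-enough result.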

_________ is a study of the effects of a change in one or more input variables on a proposed solution

Sensitivity analysis
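
A rough sketch of one-at-a-time sensitivity analysis on a toy profit model: change a single input by a percentage, hold the others fixed, and watch the result variable. The model, its base values and the function names are all assumptions.

```python
def profit(units, price, unit_cost, fixed_cost):
    """Result variable for a simple, hypothetical product model."""
    return units * (price - unit_cost) - fixed_cost

base = {"units": 1000, "price": 20.0, "unit_cost": 12.0, "fixed_cost": 5000.0}

def sensitivity(model, base, variable, percent_changes):
    """Re-run the model while changing one input and holding the rest fixed."""
    results = {}
    for pct in percent_changes:
        inputs = dict(base)
        inputs[variable] = base[variable] * (100 + pct) / 100
        results[pct] = model(**inputs)
    return results

price_effect = sensitivity(profit, base, "price", [-10, 0, 10])
cost_effect = sensitivity(profit, base, "unit_cost", [-10, 0, 10])
```

Comparing the two dictionaries shows which input the proposed solution is more sensitive to; in this toy model a 10% price change moves profit more than a 10% unit-cost change.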

What types of unstructured data are there?

Textual Multimedia (image, audio, video) XML/JSON [semi-structured]

What is multidimensionality?

The ability to organise, present and analyse data by multiple dimensions

