ITM

Itemset

A collection of one or more items

Data Lake

A storage repository that holds a vast amount of raw data in its original format until the business needs it

Time Series Patterns - Seasonal Pattern

A time series that shows a recurring pattern over one year or less

Stationary time series

A time series whose statistical properties are independent of time.

Trivial Rule

Already known by anyone who is familiar with the business

Basket Analysis

An analysis of items frequently co-occurring in transactions

Data Warehouse

An integrated, subject-oriented, time-variant, non-volatile database that provides support for decision making.

Analytic implementation - Localized analytics

Analytic efforts are isolated. The organization collects transaction data efficiently but often lacks the right data for better decision-making.

Universal Principles for Data Ethics 10-12

Aspire to design practices that incorporate transparency, configurability, accountability, and auditability. Products and research practices should be subject to internal (and potentially external) ethical review. Governance practices should be robust, known to all team members and regularly reviewed.

Customer privacy dilemma - more customer data

Better understanding of the customer; bigger privacy concern

Characteristics of ERP

Broad functionality; integration of "modules"; operates in real time; evolving

Kotter's Steps for Leading Organizational Change - Manage

Build on the change; embed the change into the culture

Outliers

Can severely distort the representativeness of the clustering results

Applications of Cluster Analysis - Marketing

Cluster customers into smaller groups for additional analysis

Applications of Cluster Analysis - Psychology

Clusters are used to identify subcategories of illnesses

ERP (Enterprise Resource Planning)

Collection of integrated software for all functions of business management

Silhouette Coefficient

Combines the ideas of both cohesion and separation for individual points
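
As a minimal sketch of how cohesion and separation combine for one point, the usual silhouette formula is s = (b - a) / max(a, b), where a is the mean distance to the other points in the same cluster and b is the smallest mean distance to any other cluster. The 1-D points, the labels, and the function name below are hypothetical, and the sketch assumes at least two clusters.

```python
from statistics import mean


def silhouette(i, points, labels):
    """Silhouette value of points[i], using absolute distance on 1-D points."""
    p, own = points[i], labels[i]
    # a: cohesion, the mean distance to the other points in the same cluster.
    same = [abs(p - q) for j, q in enumerate(points)
            if labels[j] == own and j != i]
    if not same:
        return 0.0  # common convention for a single-point cluster
    a = mean(same)
    # b: separation, the smallest mean distance to any other cluster.
    b = min(
        mean([abs(p - q) for j, q in enumerate(points) if labels[j] == other])
        for other in set(labels) if other != own
    )
    # Identical points give a = b = 0; treat that as no evidence either way.
    return 0.0 if max(a, b) == 0 else (b - a) / max(a, b)


# Hypothetical data: two tight, well-separated clusters.
points = [1.0, 2.0, 8.0, 9.0]
labels = ["A", "A", "B", "B"]
scores = [silhouette(i, points, labels) for i in range(len(points))]
```

Every score here is above 0.5, which this set's scale reads as good evidence that the clusters are real.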

Design of Data Warehouse storage - Data Cube

Combines three different attributes (dimensions) in a 3D shape

Kotter's Steps for Leading Organizational Change - Implement

Communicate the vision; remove obstacles; create short-term wins

Useful Rule

Contains high quality, actionable information

Statistic-based Text Mining

Counts the number of times words occur; calculates the statistical proximity; produces many irrelevant results, or noise
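
The counting step can be sketched in a few lines. This is only an illustration: the sample sentence and the function name are invented, and the tokenizer simply keeps runs of the letters a-z.

```python
import re
from collections import Counter


def word_counts(text):
    """Count how many times each lowercase word occurs in `text`."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(words)


# Hypothetical sample text.
sample = "Data lakes hold raw data. Warehouses hold curated data."
counts = word_counts(sample)
top = counts.most_common(2)  # the two most frequent words
```

Raw counts treat every word alike, so common but uninformative words rank high, which is one source of the noise mentioned above.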

Kotter's Steps for Leading Organizational Change - Prepare

Create a sense of urgency; form a powerful coalition; create a vision

data lake vs data warehouse - analytics

DW - batch reporting, visualisations; DL - machine learning, predictive analytics

data lake vs data warehouse - data

DW - data comes from business applications and operational databases; DL - data comes from IoT devices and social media

data lake vs data warehouse - schema

DW - designed prior to implementation; DL - designed at time of analysis

data lake vs data warehouse - price/performance

DW - fast query results using higher-cost storage; DL - slower query results using low-cost storage

data lake vs data warehouse - data quality

DW - highly curated data that serves as the truth; DL - raw data that is not curated

DELTA Model

Data - unique, accessible and available to you; Enterprise-wide - data and analytics available across the firm; Leaders - at all levels, who promote a data analytics culture; Targets - identifying business areas that benefit; Analysts - to execute the strategy

Role of Data

Data --> Information --> Decision-making

Output of a data warehouse

Data mining; business intelligence; data/business analytics for decision making (explanatory analytics, predictive analytics)

Universal Principles for Data Ethics 7-9

Data can be a tool of both inclusion and exclusion. As far as possible, explain methods for analysis and marketing to data disclosers. Data scientists and practitioners should accurately represent their qualifications (and limits to their expertise), adhere to professional standards, and strive for peer accountability.

Unstructured Data

Data does not exist in a fixed location and can include text documents, PDFs, voice messages, emails

Types of Data Warehouses - Operational Data Store

Data warehouse is refreshed in real time; preferred for routine activities such as storing employee records

Databases in Organizations - Middle management level

Deliver the data required for tactical planning; monitor the use of resources; evaluate performance; enforce security and privacy of data in the database

The relational (operational) database

Describes a precise set of data manipulation constructs

Issues for analytic project implementation - Data-related Challenges

Disparate data sources and data silos; data warehouses are not the only option; dirty data

Benefits of ERP

Efficiency; forecasting; collaboration; scalability; integrated information; cost savings

Regulations & Compliances for privacy - FCC Privacy Act

Ensures the accuracy and protects the privacy of every individual whose protected information is stored in Commission systems or records; regulates the collection, maintenance, use, and dissemination of Privacy Act-protected information

Analytic implementation - Analytical aspirations

Executives make a commitment to broader use of analytics. The organization has business intelligence tools and data marts. Most data remains un-integrated, non-standardized and inaccessible.

Natural Language Processing (NLP)

Finds meanings in the text by recognizing a variety of word forms as having similar meanings and by analyzing sentence structure to provide a framework for understanding the text; achieves both speed and accuracy

Cluster Analysis - Objectives

Find useful groups of objects in data; find similar items in a group; find dissimilar items in a group

Data Lake Benefits

Flexibility - allows data to remain in its native form making more data available for analysis

Support

Frequency of transactions that contain both X and Y
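
A minimal sketch of the calculation, expressing support as a fraction of all transactions. The toy transactions and the function name are hypothetical, chosen only to make the arithmetic easy to follow.

```python
# Hypothetical toy transactions, invented for illustration only.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    items = set(itemset)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


# X = {diapers}, Y = {beer}: three of the five transactions contain both.
support_xy = support({"diapers", "beer"}, TRANSACTIONS)
```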

Modular structure of an ERP

Functionalities are logically put into different business processes and structured into a module. The module can be detached without affecting other modules. Modules are decided per department.

Silhouette Value - 0.5 or more

Good evidence of reality of the clusters in the data

Types of Clustering - Partitional

Group objects into non-overlapping clusters, so that each data object is in exactly one cluster

Applications of Cluster Analysis - Information Technology

Group search terms in clusters that best captures the query

Applications of Cluster Analysis - Biology

Group similar living things together

Types of Clustering - Hierarchical

Grouping data into clusters where all data in each cluster is very similar. Does not have to assume any particular number of clusters

Analytic implementation - Analytical companies

High-quality data; enterprise-wide analytical plan; IT processes and governance principles; some embedded or automated analytics.

Characteristics of a data warehouse - time-variant

Historical data is accumulated over time.

Association Rule Mining - Reducing Candidates (Apriori principle)

If an itemset is frequent, then all of its subsets must also be frequent; the support of an itemset never exceeds the support of its subsets
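
The pruning idea can be sketched as a level-wise search: a (k+1)-item candidate is counted only if all of its k-item subsets are already known to be frequent. This is a simplified illustration on hypothetical transactions, not an optimized Apriori implementation; the function name and the minsup value of 0.6 are assumptions.

```python
from itertools import combinations

# Hypothetical toy transactions, invented for illustration only.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def frequent_itemsets(transactions, minsup):
    """Return {itemset: support} for every itemset with support >= minsup."""
    n = len(transactions)

    def sup(items):
        return sum(1 for t in transactions if items <= t) / n

    singles = {frozenset([i]) for t in transactions for i in t}
    current = {s for s in singles if sup(s) >= minsup}
    result = {}
    k = 1
    while current:
        result.update({s: sup(s) for s in current})
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        candidates = {a | b for a in current for b in current
                      if len(a | b) == k + 1}
        # Prune step (Apriori principle): drop a candidate if any of its
        # k-item subsets is not frequent, without counting its support.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k))
        }
        current = {c for c in candidates if sup(c) >= minsup}
        k += 1
    return result


frequent = frequent_itemsets(TRANSACTIONS, minsup=0.6)
```

With these transactions, {bread, diapers, beer} is never counted, because its subset {bread, beer} is already known to be infrequent.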

Approach to solve privacy issues - Utilitarian approach

If the overall harm exceeds the overall benefit, the practice is regarded as unethical. If personal harm (e.g., loss of pleasure) exceeds any benefit (e.g., convenience), it is regarded as unethical.

Goal of ERP

Integrate everything

Role of the DBMS

Intermediary between the user and the database; enables data to be shared; presents the end user with an integrated view of the data; receives and translates application requests into the operations required to fulfill them

Classification

Known number of groups; assign new observations to a predetermined set of groups

Data Lake disadvantages

Lack of governance

Customer privacy dilemma - less customer data

Less understanding of the customer; smaller privacy concern

Silhouette Value - 0.25 or less

Little to no evidence of cluster reality

Confidence

Measures how often items in Y appear in transactions that contain X

Lift

A measure that takes into account the statistical dependence between X and Y; the higher the lift, the stronger the association rule
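
A minimal sketch of confidence and lift for a rule X -> Y, taking lift as the confidence of the rule divided by the support of Y. The transactions and function names are hypothetical, and the sketch assumes X and Y each occur in at least one transaction.

```python
# Hypothetical toy transactions, invented for illustration only.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def confidence(x, y, transactions):
    """Share of the transactions containing X that also contain Y."""
    return count(set(x) | set(y), transactions) / count(x, transactions)


def lift(x, y, transactions):
    """Confidence of X -> Y divided by the support of Y."""
    support_y = count(y, transactions) / len(transactions)
    return confidence(x, y, transactions) / support_y


conf_xy = confidence({"diapers"}, {"beer"}, TRANSACTIONS)  # 3 of 4 baskets
lift_xy = lift({"diapers"}, {"beer"}, TRANSACTIONS)
```

A lift above 1 means X and Y occur together more often than they would if they were independent; a lift of 1 means no association.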

Analytic implementation - Analytically impaired

Missing or poor-quality data; multiple definitions of data; poorly integrated systems.

Issues for analytic project implementation - Team-related Challenges

Need for an analytics roadmap; internal vs. external expertise

Issues for analytic project implementation - Leadership Team Challenges

Old-school mindset; lack of continuous involvement

Characteristics of a data warehouse - non-volatile

Only aggregated data is integrated into the DW, and it is never revised, as opposed to transactional data, which can be changed

Approach to solve privacy issues - Kantian approach

People should be respected and treated as individuals capable of rational choice with regard to electronic monitoring

Regulations & Compliances for privacy - HIPAA

Privacy Rule - establishes national standards for the protection of certain health information; Security Rule - establishes national security standards for protecting certain health information that is held or transferred in electronic form

Types of Data Warehouses - Enterprise Data Warehouse

Provides decision-making service; unified approach for organizing and representing data; ability to classify data according to subject and give access according to those divisions

Regulations & Compliances for privacy - General Data Protection Regulation (GDPR)

Regulation in EU law on data protection and privacy for all individuals within the European Union

Databases in Organizations - Operational management level

Represent and support company operations; produce query results; enhance the company's short-term operations

Text Analytics

Searches through unstructured text data to look for useful patterns. Around 80% of data in an organization is in the form of text documents

Universal Principles for Data Ethics 4-6

Seek to match privacy and security safeguards with privacy and security expectations. Always follow the law, but understand that the law is often a minimum bar. Be wary of collecting data just for the sake of having more data.

Inexplicable Rule

Seems to have no explanation and does not suggest a course of action

Time Series Patterns - Cyclical Pattern

Shows a periodic pattern lasting more than one year

Silhouette Value - 0.25-0.5

Some evidence of reality of the clusters in the data; more investigation needed

Types of Data Warehouses - Data Mart

Specially designed for a particular line of business, such as sales or finance; data can be collected from multiple sources

Design of Data Warehouse storage - Star Schema

Stores multi-dimensional data in tables; each table is connected to a central main table
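
A small sketch of the layout, using Python's built-in sqlite3 module: one central fact table references two dimension tables, and a query joins them to aggregate sales by city. All table names, column names, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables describe the "who/what/where" of each fact.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, city TEXT);
-- The central fact table holds the measures plus keys to each dimension.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id INTEGER REFERENCES dim_store(store_id),
    amount REAL
);
INSERT INTO dim_product VALUES (1, 'Laptop'), (2, 'Phone');
INSERT INTO dim_store VALUES (1, 'Austin'), (2, 'Boston');
INSERT INTO fact_sales VALUES (1, 1, 1000), (2, 1, 500), (1, 2, 1200);
""")

# Join the fact table to one dimension and aggregate the measure.
rows = conn.execute("""
    SELECT s.city, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_store AS s ON f.store_id = s.store_id
    GROUP BY s.city
    ORDER BY s.city
""").fetchall()
```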

Databases in Organizations - Top management level

Strategic decision planning; identify growth opportunities; define and enforce organizational policies; reduce costs and boost productivity; provide feedback

Cluster Cohesion

Sum of the weight of all links in a cluster

Cluster Separation

Sum of the weights between nodes in the cluster and nodes outside the cluster
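
Both sums can be sketched on a small weighted graph. The link weights, node names, and function names below are hypothetical; cohesion adds up the links with both ends inside the cluster, and separation adds up the links with exactly one end inside it.

```python
# Hypothetical weighted links (similarities) between five nodes.
LINKS = {
    ("a", "b"): 5,
    ("a", "c"): 3,
    ("b", "c"): 4,
    ("c", "d"): 1,
    ("d", "e"): 5,
}


def cohesion(cluster, links):
    """Sum of the weights of links whose two ends are both in the cluster."""
    return sum(w for (u, v), w in links.items()
               if u in cluster and v in cluster)


def separation(cluster, links):
    """Sum of the weights of links that cross the cluster boundary."""
    return sum(w for (u, v), w in links.items()
               if (u in cluster) != (v in cluster))


cluster = {"a", "b", "c"}
inside = cohesion(cluster, LINKS)     # 5 + 3 + 4
crossing = separation(cluster, LINKS)  # only the c-d link crosses
```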

Issues for analytic project implementation - External Challenges

The big-bang approach vs. the low-risk approach; pretty visualizations vs. actionable insights

Cluster Analysis

The data preparation technique used in market segmentation to divide consumers into different homogeneous groups

Universal Principles for Data Ethics 1-3

The highest priority is to respect the persons behind the data. Account for the downstream uses of datasets. The consequences of utilizing data and analytical tools today are shaped by how they've been used in the past.

Analytic implementation - Analytical competitors

The organization is routinely reaping big benefits from its enterprise-wide analytics capability. The organization has a full-fledged analytic architecture that is enterprise-wide, fully automated and integrated into processes.

ETL Process (Extract, Transform, and Load)

The process of extracting data from source systems, transforming it, and loading it into the data warehouse

Privacy Definition

The right to be left alone; the right to control access to one's personal information; the right to withhold certain facts from public knowledge

Time-series Analysis

To uncover a pattern in a time series and then extrapolate the pattern into the future; the assumption is that similar patterns in the past will be repeated in the future; based solely on past values

Structured Data

Typically numeric or categorical; can be organized and formatted in a way that is easy for computers to read, organize, and understand; can be inserted into a database in a seamless fashion.

Identifying Time Series Patterns (Forecasting)

Ultimately, the user should decide which model to use based on the software output and their managerial knowledge; uses linear regression
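
As one possible illustration of the linear regression step, the sketch below fits a least-squares line to past values indexed by time period and extrapolates it one period ahead. The history values and function names are hypothetical, and the sketch assumes at least two observations.

```python
def fit_trend(values):
    """Least-squares line through (0, values[0]), (1, values[1]), ..."""
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return slope, intercept


def forecast(values, steps_ahead=1):
    """Extrapolate the fitted line `steps_ahead` periods past the data."""
    slope, intercept = fit_trend(values)
    return intercept + slope * (len(values) - 1 + steps_ahead)


# Hypothetical history with a steady upward trend of 2 per period.
history = [10, 12, 14, 16, 18]
slope, intercept = fit_trend(history)
next_value = forecast(history)
```

The forecast relies only on past values, so it is only as good as the assumption that the past trend continues.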

Clustering

Unknown number of groups; assign new observations based on having similarities and differences

Association Rule Mining - Min support/confidence level

Used in frequent itemset and rule generation: keep all itemsets whose support >= the minsup threshold and all rules whose confidence >= the minconf threshold

Characteristics of a data warehouse - subject-oriented

Uses multi-dimensions to slice-and-dice the aggregated data

Time Series Patterns - Horizontal Pattern

When data fluctuates randomly around a constant mean

Time Series Patterns - Trend Pattern

Gradual shifts or movements to relatively higher or lower values over a longer period of time, such as population change

