Business Intelligence & Data Warehousing


Human-generated data

Human-generated data is data that humans generate in interaction with computers. Human-generated structured data includes input data, click-stream data, and gaming data.

Types of analytical processing

MOLAP (multidimensional online analytical processing) is an alternative to ROLAP (relational OLAP) technology; it indexes directly into a multidimensional database. ROLAP (relational online analytical processing) is, in turn, an alternative to MOLAP that stores and analyzes multidimensional data in relational databases. HOLAP (hybrid online analytical processing) is a combination of ROLAP and MOLAP.

SQL

SQL (pronounced "ess-que-el") stands for Structured Query Language. SQL is used to communicate with a database. According to ANSI (American National Standards Institute), it is the standard language for relational database management systems.

The sources of structured data include:

Machine-generated data & Human-generated data (structured)

The sources of unstructured data include:

Machine-generated unstructured data & Human-generated unstructured data

Business Advantages of a Relational Database 5) Increased Information Security

Managers must protect information, like any asset, from unauthorized users or misuse. Security risks are increasing as more and more databases and DBMS systems move to data centers run in the cloud.

Kimball model

Model with the data mart approach (bottom up)

Inmon model

The Inmon model, also known as the EDW approach, emphasizes top-down development, employing established database development methodologies and tools, such as entity-relationship diagrams (ERD), and an adjustment of the spiral development approach.

Kimball model

The Kimball model, also known as the data mart approach, is a "plan big, build small" approach. A data mart is a subject-oriented or department-oriented data warehouse. It is a scaled-down version of a data warehouse that focuses on the requests of a specific department, such as marketing or sales.

Operational data store

An ODS provides a fairly recent form of customer information file (CIF). This type of database is often used as an interim staging area for a data warehouse. It is used for short-term decisions and holds only recent information, not information for long-term use; a data warehouse, on the other hand, stores permanent information. An ODS consolidates data from multiple source systems and provides a near-real-time, integrated view of volatile, current data.

Most common analysis technique in a data warehouse?

OLAP (online analytical processing).

Types of analytical processing activities:

Online analytical processing (OLAP), data mining, querying, reporting, and other decision-support applications.

Business Analytics and its goals

The process of creating new insights from information is known as business analytics. a) Business intelligence --> operational --> here and now b) Business analytics --> strategic --> future. Goals: extracting the knowledge buried inside enterprise databases (discovering unknown relationships), and putting analytical decisions on a repeatable basis instead of treating them as an ad hoc activity.

BI

The set of techniques and tools for the transformation of raw data into meaningful and useful information for business analysis/decision support purposes.

Data mart bus architecture

This architecture is a viable alternative to the independent data marts where the individual marts are linked to each other via some kind of middleware. Not optimal for complex data queries.

Data Mining Analysis Methods

- Analyzing customer buying patterns to predict future marketing and promotion campaigns.
- Building budgets and other financial information.
- Detecting fraud by identifying deceptive spending patterns.
- Finding the best customers who spend the most money.
- Keeping customers from leaving or migrating to competitors.
- Promoting and hiring employees to ensure success for both the company and the individual.

KPIs

- linked to a strategy with an objective
- defines the target and actual performance measure (e.g., increase repeat business for bike customers by 15%)

Financial KPIs

"What are the economic consequences of the organization's past actions?" Examples: operating income, expenses, return on capital, profit margin, cash flow, economic value added

Business Process KPIs

"What are the existing and emerging internal business processes in which the supply chain organization must excel?" Examples: efficiency, cost, throughput, quality, effectiveness

Learning and Growth KPIs

"What infrastructure is needed to foster long-term growth and improvement?" Examples: employee satisfaction, employee retention, skill sets, education and training, information technology

Customer KPIs

"What value proposition is delivered to key customer segments?" Examples: customer satisfaction, customer retention, customer acquisition, market share in target segments, valued services

Grievances

/ˈgri vəns/: a real or imagined wrong or other cause for complaint or protest, especially unfair treatment.

4 categories of KPI examples

1) Financial 2) Customer 3) Business Process 4) Learning and Growth

Costs of Using Low-Quality Information

1) Inability to track customers accurately. 2) Difficulty identifying the organization's most valuable customers. 3) Inability to identify selling opportunities. 4) Lost revenue opportunities from marketing to nonexistent customers. 5) The cost of sending undeliverable mail. 6) Difficulty tracking revenue because of inaccurate invoices. 7) Inability to build strong relationships with customers.

6 Dashboard Elements in PerformancePoint

1) Indicators 2) Filters 3) Reports 4) KPIs 5) Scorecards 6) Dashboard

The four primary reasons for low-quality information

1) Online customers intentionally enter inaccurate information to protect their privacy. 2) Different systems have different information entry standards and formats. 3) Data-entry personnel enter abbreviated information to save time or erroneous information by accident. 4) Third-party and external information contains inconsistencies, inaccuracies, and errors.

IMPORTANT difference between data, information, knowledge

1) data = facts, observations, raw numbers 2) information = a subset of data given meaning by its context, derived from manipulated raw data, e.g., number of sales today 3) knowledge = derived information and justified beliefs (from logic or empirical observations) about relationships among concepts; decisions are more reliable if based on knowledge, not just data or information

6 Distinguishing features of KPIs

1) embody strategic objectives 2) measure performance against specific targets 3) targets have performance ranges (above, on, below) 4) ranges are encoded in software, enabling visual display (e.g., red, yellow, green) 5) targets typically are assigned time frames by which they must be accomplished 6) targets are often measured against a benchmark (e.g., previous year's results)
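Feature 4 (performance ranges encoded in software for a red/yellow/green display) can be sketched as a small function. This is only an illustration: the function name `kpi_status`, the 5 percent yellow band, and the benchmark of 200 repeat customers are hypothetical assumptions, and higher values are assumed to be better.

```python
def kpi_status(actual: float, target: float, tolerance: float = 0.05) -> str:
    """Map an actual value against its target to a stoplight color (hypothetical helper).

    green: at or above target; yellow: within `tolerance` (a fraction) below target;
    red: further below. Assumes higher values are better.
    """
    if actual >= target:
        return "green"
    if actual >= target * (1 - tolerance):
        return "yellow"
    return "red"

# KPI example: raise repeat bike business by 15% over a benchmark.
benchmark = 200                  # hypothetical: last year's repeat customers
target = benchmark * 115 / 100   # 15% above the benchmark (230.0)
for actual in (240, 225, 180):
    print(actual, kpi_status(actual, target))
```

In a dashboard tool the bands would usually be configured rather than coded, but the logic is the same comparison of actual against target.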

IMPORTANT four synergistic capabilities of BI

1) organizational memory: collect quantitative data, accumulated over time 2) information integration: non-quantitative and external data 3) insight creation: apply analytics 4) presentation: display in visual and user-friendly formats --> they provide input to each other

two types of integrity constraints

1) relational 2) business critical

6 Dashboard Characteristics

1) use of visual components (e.g. charts, performance bars, spark lines, gauges, meters, stoplights) to highlight, at a glance, the data and exceptions that require action 2) transparent to the user, meaning that they require minimal training and are extremely easy to use 3) combine data from a variety of systems into a single, summarized, unified view of the business 4) enable drill-down or drill-through to underlying data sources or reports 5) present a dynamic, real-world view with timely data updates 6) require little, if any, customized coding to implement, deploy, and maintain

IMPORTANT ETL process

BI tools can also directly help in obtaining data and information (for example, through the extraction, transformation, and loading of data).
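A minimal Python sketch of the three ETL steps, assuming two hypothetical source systems (`crm_rows` and `pos_rows`, with different formats) and an in-memory SQLite database standing in for the warehouse. The function and table names are illustrative, not part of any BI product.

```python
import sqlite3

# Hypothetical raw rows from two operational systems with inconsistent formats.
crm_rows = [{"cust": " Alice ", "spend": "120.50"}, {"cust": "BOB", "spend": "80"}]
pos_rows = [("alice", 30.0), ("carol", 45.25)]

def extract():
    """Extract: pull raw rows from each (hypothetical) source system."""
    return crm_rows, pos_rows

def transform(crm, pos):
    """Transform: standardize names and types, then aggregate spend per customer."""
    totals = {}
    for row in crm:
        name = row["cust"].strip().lower()
        totals[name] = totals.get(name, 0.0) + float(row["spend"])
    for name, amount in pos:
        key = name.strip().lower()
        totals[key] = totals.get(key, 0.0) + amount
    return sorted(totals.items())

def load(rows, conn):
    """Load: write the cleaned rows into a warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS customer_spend (name TEXT PRIMARY KEY, total REAL)")
    conn.executemany("INSERT INTO customer_spend VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(*extract()), conn)
print(conn.execute("SELECT * FROM customer_spend ORDER BY name").fetchall())
```

Real ETL tools add scheduling, error handling, and incremental loads; the sketch only shows the order of the steps and why transformation is needed before two sources can be combined.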

Info & Info 2

Big data is one of the most promising technology trends occurring today. Of course, notable companies such as Facebook, Google, and Netflix are gaining the most business insights from big data currently, but many smaller markets are entering the scene, including retail, insurance, and health care. Over the next decade, as big data starts to improve your everyday life by providing insights into your social relationships, habits, and careers, you can expect to see the need for data scientists and data artists dramatically increase.

The four common characteristics of big data

Big data requires sophisticated tools to analyze all the unstructured information from millions of customers, devices, and machine interactions. Big data are analyzed for marketing trends in business as well as in the fields of manufacturing, medicine, and science. The four characteristics are commonly given as variety, veracity, volume, and velocity.

the three key factors that affect the presentation ability

Role: different user groups (CEO, middle manager, customer support, ...). Task: every task requires different content and format of the information. Preference: individuals differ in their preference (big picture vs. detail). --> a good BI solution should

Types of integration technologies that enable data and metadata integration:

Enterprise application integration (EAI), which provides a vehicle for pushing data from source systems into the data warehouse, and enterprise information integration (EII), which promotes real-time data integration.

ETL stands for

Extraction, transformation, and load

What does ETL stand for?

Extraction, Transformation, and Load

Hub-and-spoke architecture

One of the most famous data warehousing architectures today. It focuses on building a scalable and maintainable infrastructure that includes a centralized data warehouse and several dependent data marts, and it allows for easy customization of user interfaces and reports. On the negative side, it lacks a holistic enterprise view and may lead to data redundancy and data latency.

ad-hoc reports

With ad-hoc reporting, the actual reports are created by business end users. Ad hoc is Latin for "as the occasion requires." This means that with this BI model, users can use their reporting and analysis solution to answer their business questions "as the occasion requires," without having to request queries from IT.

The technologies that come with Big Data are

Hadoop, MapReduce, NoSQL, and Hive
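To illustrate the MapReduce idea behind Hadoop, here is a single-process Python sketch of the map, shuffle, and reduce phases for a word count. It does not use Hadoop's API; the function names are hypothetical, and in a real cluster the map and reduce calls would run in parallel on many machines.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in one input split.
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: combine the grouped values for each key into one result.
    return {key: sum(values) for key, values in groups.items()}

splits = ["big data needs big tools", "Data tools"]  # hypothetical input splits
mapped = chain.from_iterable(map_phase(s) for s in splits)
counts = reduce_phase(shuffle(mapped))
print(counts)
```

Because each split is mapped independently and each key is reduced independently, both phases can be distributed, which is what lets the pattern scale to big data.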

Decision Support Data

Historical data that is queried intensively and stored in fewer, less-normalized tables. Has large data volumes.

business intelligence

comprehensive, cohesive, and integrated set of tools and processes used to capture, collect, integrate, store, and analyze data with the purpose of generating and presenting information used to support business decision making.

Business-critical integrity

constraints that enforce business rules vital to an organization's success; they often require more insight and knowledge than relational integrity constraints. Example: no product returns are accepted after 15 days past delivery (this makes sense for produce, which spoils).

A data mart...

contains data on one topic (e.g., marketing). A data mart can be a replication of a subset of data in the data warehouse. Data marts are a less expensive solution that can be replaced by or can supplement a data warehouse. Data marts can be independent of or dependent on a data warehouse.

biggest pitfalls associated with real-time information

continual change

Machine-generated data

created by a machine without human intervention. Machine-generated structured data includes sensor data, point-of-sale data, and web log data.

database management system (DBMS)

creates, reads, updates, and deletes data in a database while controlling access and security. Managers send requests to the DBMS, and the DBMS performs the actual manipulation of the data in the database
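A minimal sketch of the four DBMS operations (create, read, update, delete), using Python's built-in `sqlite3` module as the DBMS; the `inventory` table is hypothetical. SQLite itself has no user accounts, so the access-control side of a DBMS is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # sqlite3 plays the role of the DBMS here
conn.execute("CREATE TABLE inventory (item TEXT PRIMARY KEY, qty INTEGER)")

# Create: add a new row.
conn.execute("INSERT INTO inventory VALUES (?, ?)", ("bike", 5))
# Read: query the row back.
qty = conn.execute("SELECT qty FROM inventory WHERE item = ?", ("bike",)).fetchone()[0]
# Update: change the stored value.
conn.execute("UPDATE inventory SET qty = qty - 1 WHERE item = ?", ("bike",))
updated = conn.execute("SELECT qty FROM inventory WHERE item = ?", ("bike",)).fetchone()[0]
# Delete: remove the row.
conn.execute("DELETE FROM inventory WHERE item = ?", ("bike",))
remaining = conn.execute("SELECT COUNT(*) FROM inventory").fetchone()[0]
print(qty, updated, remaining)
```

The user (or program) only sends requests; the DBMS performs the actual manipulation of the stored data.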

What is metadata

data about the data. in a data warehouse, metadata describe the contents of a data warehouse and the manner of its acquisition and use

Data integration uses three things:

data access, data federation (integration of business views across multiple data stores), and change capture (based on the identification, capture, and delivery of changes made to enterprise data sources).

What solutions does business intelligence provide

data access, storage, data analysis and visualization technologies to support better decision making

data mart (1 of 3 core concepts of data warehousing)

data mart contains a subset of data warehouse information. To distinguish between data warehouses and data marts, think of data warehouses as having a more organizational focus and data marts as having a functional focus

A web-server is backed by both a

data warehouse and an application server. It is used for ease of access, platform independence, and lower cost.

The federated data warehouse

data warehouse architecture involves integrating disparate systems and analytical resources from multiple sources to meet changing needs or business conditions.

data warehouse parts

data warehouse itself, data acquisition (back-end), client (front-end).

Business Advantages of a Relational Database 4) Increased Information Integrity (Quality)

database design needs to consider integrity constraints

physical view of information

deals with the physical storage of information on a storage device

business rule

defines how a company performs certain aspects of its business and typically results in either a yes/no or true/false answer. Stating that merchandise returns are allowed within 10 days of purchase is an example of a business rule.
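Because a business rule reduces to a yes/no answer, it maps naturally onto a boolean function. A minimal sketch of the 10-day return rule; the function name `return_allowed` is hypothetical, and treating day 10 as still inside the window is an assumption.

```python
from datetime import date

RETURN_WINDOW_DAYS = 10  # from the example rule; inclusive of day 10 (assumption)

def return_allowed(purchase_date: date, return_date: date) -> bool:
    """Business rule: merchandise returns are allowed within 10 days of purchase."""
    days = (return_date - purchase_date).days
    # A return dated before the purchase is rejected as well.
    return 0 <= days <= RETURN_WINDOW_DAYS

print(return_allowed(date(2024, 3, 1), date(2024, 3, 11)))
print(return_allowed(date(2024, 3, 1), date(2024, 3, 12)))
```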

data quality audits

determine the accuracy and completeness of its data. Most organizations determine a percentage of accuracy and completeness high enough to make good decisions at a reasonable cost, such as 85 percent accurate and 65 percent complete.
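A rough sketch of how an audit could compute the two percentages, assuming hypothetical customer records where `None` marks a missing value, and approximating "accurate" with a single validity check (the email field must contain "@"). Real audits use many more checks; the 85/65 targets are the example figures above.

```python
# Hypothetical customer records; None marks a missing value.
records = [
    {"first": "Ann", "phone": "555-0100", "email": "ann@example.com"},
    {"first": None,  "phone": "555-0101", "email": "555-0101"},
    {"first": "Raj", "phone": None,       "email": "raj@example.com"},
    {"first": "Li",  "phone": "555-0103", "email": "li@example.com"},
]

def completeness(rows):
    """Share of all fields that are filled in."""
    cells = [v for row in rows for v in row.values()]
    return sum(v is not None for v in cells) / len(cells)

def accuracy(rows):
    """Share of records passing a simple validity check (email contains '@')."""
    valid = [row for row in rows if row["email"] and "@" in row["email"]]
    return len(valid) / len(rows)

TARGET_ACCURACY, TARGET_COMPLETENESS = 0.85, 0.65  # example thresholds from the text
print(f"accuracy={accuracy(records):.0%}, completeness={completeness(records):.0%}")
print("meets targets:",
      accuracy(records) >= TARGET_ACCURACY and completeness(records) >= TARGET_COMPLETENESS)
```

With this sample, completeness clears its target but accuracy does not, which is the kind of result that tells an organization where to spend its cleanup effort.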

several obstacles of BI introduction

1) it is difficult to find a fitting BI solution, because solutions are often expensive and the benefits are rather long-term 2) business processes are often not consistently defined 3) the BI needs of business users are difficult to identify

Business Advantages of a Relational Database 1) Increased Flexibility

distinction between logical and physical views is important in understanding flexible database user views

Transactional information

encompasses all of the information contained within a single business process or unit of work, and its primary purpose is to support daily operational tasks (Organizations need to capture and store transactional information to perform operational tasks and repetitive decisions such as analyzing daily sales reports and production schedules to determine how much inventory to carry)

Inmon vs. Kimball

Inmon: top-down, enterprise-wide, complex, subject-driven, low end-user accessibility (aimed at IT professionals). Kimball, by contrast: bottom-up, simple method, data marts, process-oriented, dimensional modeling, high end-user accessibility.

Federated data warehouse

integrates analytical resources from multiple sources to meet changing needs or business conditions.

snowflake schema

is a logical arrangement of tables in a multidimensional database in such a way that the entity relationship diagram resembles a snowflake in shape.

Enterprise information integration (EII)

is a mechanism for pulling data from source systems to satisfy a request for information. It is an evolving tool space that promises real-time data integration from a variety of sources, such as relational databases, Web services, and multidimensional databases.

Dependent Data Mart

is a subset that is created directly from the data warehouse. It has the advantage of using a consistent data model and providing quality data. A dependent data mart ensures that the end user is viewing the same version of the data that is accessed by all other data warehouse users. The high cost of data warehouses limits their use to large companies.

Unstructured data

is not defined, does not follow a specified format, and is typically free-form text such as emails, Twitter tweets, and text messages (Unstructured data accounts for about 80 percent of the data that surrounds us)

Data models

logical data structures that detail the relationships among data elements by using graphics or pictures

database

maintains information about various types of objects (inventory), events (transactions), people (employees), and places (warehouses) (store information) (core component of any system, regardless of size, is a database and a database management system)

Data warehousing used primarily to help

make informed decisions.

Relational databases: strengths and limitations

Strengths: they are well suited for manipulating records, support a lot of data, support dynamic joining of data, and are a proven technology. Limitations: performance is less than optimal, and they cannot be used for purely optimized processing.

Information integrity

measure of the quality of information

Dimensional modeling

a retrieval-based system that supports high-volume query access.

Data visualization tools

move beyond Excel graphs and charts into sophisticated analysis techniques such as controls, instruments, maps, time-series graphs, and more. Data visualization tools can help uncover correlations and trends in data that would otherwise go unrecognized.

Information integrity issues

occur when a system produces incorrect, inconsistent, or duplicate data (this can cause managers to consider the system's reports invalid and to make decisions based on other sources)

Information inconsistency

occurs when the same data element has different values

Analysis paralysis

occurs when the user goes into an emotional state of over-analysis (or over-thinking) of a situation so that a decision or action is never taken, in effect paralyzing the outcome. In the era of big data, analysis paralysis is a growing problem. One solution is to use data visualizations to help people make decisions faster.

How are data warehouses different from operational databases

Operational databases are more product-oriented, while data warehouses use subject orientation to give a more comprehensive view of the organization.

Master data management (MDM)

practice of gathering data and ensuring that it is uniform, accurate, consistent, and complete, including such entities as customers, suppliers, products, sales, employees, and other critical entities that are commonly integrated across organizational systems

Data mining takes analysis further by sifting through large amounts of data to find information, using algorithms such as:

predictive modeling, database segmentation, link analysis, deviation detection.
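Of the algorithms above, deviation detection is the simplest to sketch. This example flags values far from the mean using a z-score, which is one common technique among many; the function name, the threshold of 2 standard deviations, and the spending data are all hypothetical assumptions.

```python
from statistics import mean, pstdev

def deviations(values, threshold=2.0):
    """Return indices of values more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing deviates
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

daily_card_spend = [42, 38, 45, 40, 39, 41, 44, 400]  # hypothetical daily spending
print(deviations(daily_card_spend))
```

A flagged index is only a candidate for review (for example, possible fraud); it does not by itself prove anything is wrong.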

Infographics

present the results of data analysis, displaying the patterns, relationships, and trends in a graphical format (exciting and quickly convey a story users can understand without having to analyze numbers, tables, and boring charts)

Distributed computing

processes and manages algorithms across many machines in a computing environment

Reports

provide access to interactive and static data in a variety of forms (e.g. analytic chart, analytic grid, Excel services, KPI details, web page)

OLAP tools

provide data access to end users. allow a user to "drill-down" into their data to view it at whatever level of detail they need.

Real-time systems

provide real-time information in response to requests. Many organizations use real-time systems to uncover key corporate transactional information

Metadata

provides details about data. For example, metadata for an image could include its size, resolution, and date created. Metadata about a text document could contain document length, date created, author's name, and summary.

Two primary tools are available for retrieving information from a DBMS

query-by-example (QBE) tool and a structured query language (SQL)

Data governance

refers to the overall management of the availability, usability, integrity, and security of company data

association detection

reveals the relationship between variables along with the nature and frequency of the relationships
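A minimal sketch of association detection on hypothetical shopping baskets. Support (share of baskets containing both items) captures how frequent a relationship is; confidence (share of baskets with the first item that also contain the second) captures how strong it is. The 0.5 support cutoff and helper name are assumptions; production tools use algorithms such as Apriori.

```python
from collections import Counter
from itertools import combinations

baskets = [  # hypothetical transactions
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
]

def pair_rules(transactions, min_support=0.5):
    """Return (a, b, support, confidence of a -> b) for item pairs with enough support."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    # Pairs are generated in sorted order, so each rule reads first item -> second item.
    pair_counts = Counter(pair for t in transactions for pair in combinations(sorted(t), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support >= min_support:
            rules.append((a, b, support, count / item_counts[a]))
    return sorted(rules)

for a, b, support, confidence in pair_rules(baskets):
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```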

Relational integrity constraints

rules that enforce basic and fundamental information-based constraints. For example, a relational integrity constraint would not allow someone to create an order for a nonexistent customer, provide a markup percentage that was negative, or order zero pounds of raw materials from a supplier
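The three example constraints can be declared directly in SQL so the DBMS rejects bad rows. A minimal sketch using Python's `sqlite3` module with hypothetical `customer` and `orders` tables; note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),  -- no orders for nonexistent customers
        markup_pct REAL CHECK (markup_pct >= 0),                -- no negative markup
        pounds REAL CHECK (pounds > 0)                          -- no zero-pound orders
    )""")
conn.execute("INSERT INTO customer VALUES (1, 'Acme')")
conn.execute("INSERT INTO orders VALUES (1, 1, 10.0, 50.0)")  # a valid order

rejected = []
for bad in [(2, 99, 10.0, 50.0), (3, 1, -5.0, 50.0), (4, 1, 10.0, 0.0)]:
    try:
        conn.execute("INSERT INTO orders VALUES (?, ?, ?, ?)", bad)
    except sqlite3.IntegrityError as err:
        rejected.append((bad[0], str(err)))  # record which order id was refused and why
print(rejected)
```

Business-critical rules (such as the 15-day produce return policy) usually need more context than a single column check, so they are often enforced in application logic instead.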

Integrity constraints

rules that help ensure the quality of information

Machine-generated unstructured data

satellite images, scientific atmosphere data, and radar data

Business Advantages of a Relational Database 2) Increased Scalability and Performance

scalable to handle the massive volumes of information, the large numbers of users expected for the launch of the website, and need to perform quickly under heavy use

star schema

simplest form of dimensional modeling. It contains a central fact table surrounded by and connected to several dimension tables. The fact table contains a large number of rows that correspond to observed facts and external links (foreign keys to the dimension tables).
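A minimal star schema sketch with one fact table and two dimension tables, built with Python's `sqlite3`; all table and column names are hypothetical. The query shows the typical pattern of joining the fact table to its dimensions and aggregating by dimension attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive attributes
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, region   TEXT);
    -- Central fact table: one row per observed sale, with foreign keys to each dimension
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        store_id   INTEGER REFERENCES dim_store(store_id),
        amount     REAL
    );
    INSERT INTO dim_product VALUES (1, 'Bikes'), (2, 'Helmets');
    INSERT INTO dim_store   VALUES (10, 'East'), (20, 'West');
    INSERT INTO fact_sales  VALUES (1, 10, 500), (2, 10, 40), (1, 20, 450), (2, 20, 60);
""")

# A typical star join: aggregate the facts by attributes taken from the dimensions.
rows = conn.execute("""
    SELECT s.region, p.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_store   s ON s.store_id   = f.store_id
    GROUP BY s.region, p.category
    ORDER BY s.region, p.category
""").fetchall()
print(rows)
```

Normalizing a dimension further (for example, splitting category into its own table referenced by `dim_product`) would turn this into a snowflake schema.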

slice and dice

a phrase meaning to divide a quantity of information into smaller parts, especially in order to analyze it more closely or in different ways.
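A minimal sketch of slice and dice over a tiny cube stored as a Python dictionary keyed by (year, region, product); the data and helper names are hypothetical. Here a slice keeps the full keys, whereas OLAP tools usually drop the fixed dimension from the result.

```python
# A tiny sales cube keyed by (year, region, product); amounts are hypothetical.
cube = {
    (2023, "East", "Bikes"): 500, (2023, "East", "Helmets"): 40,
    (2023, "West", "Bikes"): 450, (2024, "East", "Bikes"): 520,
    (2024, "West", "Helmets"): 70,
}
DIMS = ("year", "region", "product")

def slice_cube(cube, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return {k: v for k, v in cube.items() if k[i] == value}

def dice_cube(cube, **allowed):
    """Dice: keep cells whose coordinates fall in chosen value sets on several dimensions."""
    return {k: v for k, v in cube.items()
            if all(k[DIMS.index(d)] in vals for d, vals in allowed.items())}

print(slice_cube(cube, "year", 2023))
print(dice_cube(cube, region={"East"}, product={"Bikes", "Helmets"}))
```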

The growing demand for real-time information

stems from organizations' need to make faster and more effective decisions, keep smaller inventories, operate more efficiently, and track performance more carefully

entity (also referred to as a table)

stores information about a person, place, thing, transaction, or event (e.g., TRACKS, RECORDINGS, MUSICIANS, and CATEGORIES). Its columns are also called attributes or fields. Other examples: supplier, inventory, materials, distribution.

relational database model

stores information in the form of logically related two-dimensional tables

Data visualization

technologies that allow users to see or visualize data to transform information into a business perspective. Data visualization is a powerful way to simplify complex data sets by placing data in a format that is easily grasped and understood far quicker than the raw data alone.

Human-generated unstructured data

text messages, social media data, and emails

structured query language (SQL)

a query language that asks users to write lines of code to answer questions against a database

information cube

the common term for the representation of multidimensional information

retention /rɪˈtɛn ʃən/

the continued possession, use, or control of something. Membership retention, pro-mentorship, retain, the meeting.

Attributes (also called columns or fields)

the data elements associated with an entity (the attributes for the entity TRACKS are TrackNumber, TrackTitle, TrackLength, and RecordingID; the attributes for the entity MUSICIANS are MusicianID, MusicianName, MusicianPhoto, and MusicianNotes)

Distributed database management system

would pull the requested data from databases across the organization, bring all the data back to the same place, and then consolidate it, sort it, and do whatever else was necessary to answer the user's question. The islands-of-data problem still existed.

3 tiers of data warehousing architecture (a 2-tier architecture, in which the last two tiers are combined, is more economical but not well suited to large companies).

Tier 1: Client workstation. Tier 2: Application server. Tier 3: Database server.

online analytical processing (OLAP)

Tools to create an advanced data analysis environment that supports decision making, business modeling, and operations research.

environmental scanning

Undirected viewing mode: limited, irregular information. Conditional viewing mode: controlling for internal data, external data monitored. Searching mode: seeking information to update existing knowledge. Enacting mode: experimentation and trying new behaviors.

Additional data warehouse characteristics include:

Web-based, relational/multidimensional, client/server (for easy access by end users), real-time (newer data warehouses provide real-time or active data-access and analysis capabilities), and metadata (data about data: how it is all organized, how to use it, etc.).

dimensional modeling is

a retrieval based system that supports high-volume query access.

Independent Data Mart

a small warehouse designed for a strategic business unit (SBU) or a department, but its source is not an EDW.

A data warehouse is

a specially constructed data repository where data are organized so that they can be easily accessed by end users for several applications.

What is PerformancePoint Dashboard Designer?

a tool that you can use to create dashboards, scorecards, and reports and then publish them to a SharePoint site; Dashboard Designer is part of PerformancePoint Services in Microsoft SharePoint Server.

What is an operational data store (ODS)?

a type of database often used as an interim area for a data warehouse

Dashboard definition

a visual display of the most important information needed to achieve one or more objectives; consolidated and arranged on a single screen so the information can be monitored at a glance

BI-... a) tool b) solution c) product d) process

a) BI tools are generic software sold by vendors such as Oracle, SAP, Microsoft Dynamics, and Sage b) BI solutions are customized software deployed within organizations c) a BI product is the result of BI, in which information and knowledge are created d) the BI process is how the organization obtains, analyzes, and distributes information

input and output for Organizational Memory

a) Input: Data, information and knowledge is stored as events occur b) Output: accumulated information & knowledge about the past (not necessarily integrated)

Online Analytical Processing (OLAP), its goals and features

a) OLAP queries the data warehouse; responses are pre-calculated b) it organizes data into cubes c) dimensions summarize data and can be hierarchically drilled down d) OLAP allows users to quickly manipulate the analytic results across the different dimensions, with no waiting for queries or calculations
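To show why OLAP responses can be pre-calculated, this sketch computes totals for every combination of dimensions up front (similar in spirit to SQL's `GROUP BY CUBE`), so rolling up or drilling down becomes a dictionary lookup instead of a new query. The data, the `precompute` helper, and using `None` to mean "all values" are hypothetical choices.

```python
from collections import defaultdict
from itertools import combinations

# Base facts: (year, quarter, region) -> sales; hypothetical numbers.
facts = {
    (2024, "Q1", "East"): 100, (2024, "Q1", "West"): 80,
    (2024, "Q2", "East"): 120, (2024, "Q2", "West"): 90,
}
DIMS = ("year", "quarter", "region")

def precompute(facts):
    """Pre-calculate totals for every subset of dimensions (None means 'all values')."""
    agg = defaultdict(int)
    for key, value in facts.items():
        for r in range(len(DIMS) + 1):
            for kept in combinations(range(len(DIMS)), r):
                agg[tuple(key[i] if i in kept else None for i in range(len(DIMS)))] += value
    return dict(agg)

cube = precompute(facts)
print(cube[(2024, None, None)])    # year total
print(cube[(2024, "Q1", None)])    # drill down to quarter
print(cube[(2024, "Q1", "East")])  # drill down to region
```

The trade-off is storage: with n dimensions each fact contributes to 2^n aggregates, which is why real OLAP engines pre-calculate selectively.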

In-Flow DS flow

capturing data from legacy systems, validating it to test it against reality, repairing it (examining and rebuilding the data), transforming it for consolidation, and applying it to move and load the data.

chord /kɔrd/ or circus chart

chord chart is already implemented in Power BI: A chord diagram is a graphical method of displaying the inter-relationships between data in a matrix. The data is arranged radially around a circle with the relationships between the points typically drawn as arcs connecting the data together.

data dictionary

compiles all of the metadata about the data elements in the data model

advantages to using the web to access company databases

1) web browsers are much easier to use than directly accessing the database by using a custom-query tool 2) the web interface requires few or no changes to the database model 3) it costs less to add a web interface in front of a DBMS than to redesign and rebuild the system to support changes. Additional data-driven website advantages include: -Easy to manage content: Website owners can make changes without relying on MIS professionals; users can update a data-driven website with little or no training. -Easy to store large amounts of data: Data-driven websites can keep large volumes of information organized. Website owners can use templates to implement changes for layouts, navigation, or website structure. This improves website reliability, scalability, and performance. -Easy to eliminate human errors: Data-driven websites trap data-entry errors, eliminating inconsistencies while ensuring that all information is entered correctly.

Zhao described five levels of metadata management maturity:

1) ad hoc 2) discovered 3) managed 4) optimized 5) automated

complete but inaccurate information

2/31/10 is an example of complete but inaccurate information (February 31 does not exist)
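Checks like this are easy to automate at data entry. A minimal sketch using Python's `datetime.strptime`, assuming the US month/day/two-digit-year format of the example; the function name `is_valid_date` is hypothetical.

```python
from datetime import datetime

def is_valid_date(text: str, fmt: str = "%m/%d/%y") -> bool:
    """Return True only if the string names a real calendar date in the given format."""
    try:
        datetime.strptime(text, fmt)  # raises ValueError for impossible dates
        return True
    except ValueError:
        return False

print(is_valid_date("2/31/10"))  # February 31 does not exist
print(is_valid_date("2/28/10"))
```

The field is complete (nothing is missing), yet the check fails, which is exactly the difference between completeness and accuracy.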

Enterprise Application Integration (EAI)

An alternative to ERP. EAI is middleware that can parse, duplicate, or transform data between applications. It allows integration without redefining business practices: EAI connects multiple isolated systems and makes them work together and share their data. ERP, in contrast, is a monolithic software block.

data store

A data repository, either permanent or temporary, for data transformed by processes. Data stores can be files or full database systems.

What is a data mart?

A departmental data warehouse that stores only relevant data

Data warehouse

A physical repository where relational data are specially organized to provide enterprise-wide, cleansed data in a standardized format.

RDBMS vs DBMS

A relational database management system (RDBMS) is a database management system (DBMS) that is based on the relational model

What is an independent data mart

A small warehouse designed for a strategic business unit or department

data cube

A special database used to store data in OLAP reporting

Key Performance Indicator (KPI)

A strategic objective AND METRICS that measure performance against a goal

What is a dependent data mart

A subset that is created directly from a data warehouse

What is a data cube

A two-dimensional, three-dimensional, or higher-dimensional object in which each dimension of the data represents a measure of interest

Out flow:

Accessing: consumers obtain data, both ad hoc and routinely. Delivery: the warehouse renders data via publish-and-subscribe mechanisms.

Down-Flow

Aging: archiving data into the storage hierarchy.

business intelligence examples

Airlines: Analyze popular vacation locations with current flight listings. Banking: Understand customer credit card usage and nonpayment rates. Health care: Compare the demographics of patients with critical illnesses. Insurance: Predict claim amounts and medical coverage costs. Law enforcement: Track crime patterns, locations, and criminal behavior. Marketing: Analyze customer demographics. Retail: Predict sales, inventory levels, and distribution. Technology: Predict hardware failures.

relational online analytical processing (ROLAP)

Analytical processing functions that use relational databases and familiar relational query tools to store and analyze multidimensional data

EDW's are used to provide data for many types of DSS including:

CRM, supply chain management (SCM), business performance management (BPM), business activity monitoring (BAM), product life-cycle management (PLM), revenue management, and sometimes even Knowledge Management Systems (KMS).

Example of Low-Quality Information

Completeness. The customer's first name is missing. Another issue with completeness. The street address contains only a number and not a street name. Consistency. There may be a duplication of information since there is a slight difference between the two customers in the spelling of the last name. Similar street addresses and phone numbers make this likely. Accuracy. This may be inaccurate information because the customer's phone and fax numbers are the same. Some customers might have the same number for phone and fax, but the fact that the customer also has this number in the email address field is suspicious. Another issue with accuracy. There is inaccurate information because a phone number is located in the email address field. Another issue with completeness. The information is incomplete because there is not a valid area code for the phone and fax numbers.
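Several of these problems can be caught by simple rule-based checks. A minimal sketch, assuming hypothetical field names and a North American phone format with area code; detecting the possible duplicate customer (the consistency issue) would need record matching and is not shown.

```python
import re

# Assumed format: area code plus 7-digit number, e.g. "212-555-1234" or "(212) 555-1234".
PHONE = re.compile(r"^\(?\d{3}\)?[ -]?\d{3}-\d{4}$")

def quality_issues(rec: dict) -> list:
    """Return a list of completeness/accuracy problems found in one customer record."""
    issues = []
    if not rec.get("first_name"):
        issues.append("completeness: first name missing")
    if rec.get("street", "").strip().isdigit():
        issues.append("completeness: street address has no street name")
    if not PHONE.match(rec.get("phone", "")):
        issues.append("completeness: phone lacks a valid area code")
    if rec.get("phone") and rec.get("phone") == rec.get("fax"):
        issues.append("accuracy: phone and fax are identical (suspicious)")
    if "@" not in rec.get("email", ""):
        issues.append("accuracy: email field does not contain an email address")
    return issues

record = {"first_name": "", "last_name": "Smith", "street": "123",
          "phone": "555-1234", "fax": "555-1234", "email": "555-1234"}
for issue in quality_issues(record):
    print(issue)
```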

data mart

Contains a subset of data warehouse information

Data warehousing depends on:

DBMS, Extraction and conversion tools, internetworking techniques, front-end analysis tools, graphics

Inmon model

EDW approach (top down)

EDW stands for

Enterprise Data Warehouse

What are the three main types of data warehouses?

Data marts, operational data store (ODS), and enterprise data warehouses (EDW)

The four major components of the data warehousing process

Data sources, data extraction (using custom-written or commercial software called ETL), data loading (data loaded into a staging area), the comprehensive database, and metadata (used by IT personnel and users).

Data-mining tools

Data-mining tools use a variety of techniques to find patterns and relationships in large volumes of information that predict future behavior and guide decision making. They help users uncover business intelligence in their data.

A data warehouse enables business users, typically managers, to be more effective in many ways, including:

Developing customer profiles. Identifying new-product opportunities. Improving business operations. Identifying financial issues. Analyzing trends. Understanding competitors. Understanding product performance

Four contributions of BI and the improvements they bring

Dissemination of real-time information in a user-friendly fashion; creation of new knowledge based on the past; responsive and anticipative decisions based more closely on all the latest information; improved planning for the future through data and information about the past. Together these lead to improvements in operational performance, customer service, and identifying new opportunities.

four enterprise architecture models

Diversification model (low standardization, low integration): decentralized; different markets with different products and services; benefits from local autonomy.
Coordination model (low standardization, high integration): sharing of customers, products, suppliers, and partners; business unit leaders have autonomy.
Replication model (high standardization, low integration): independent units following highly standardized processes (e.g., McDonald's); units do not depend on each other.
Unification model (high standardization, high integration): integrated supply chains that share customer and supplier data (e.g., Dow Chemical).

OLAP vs OLTP

Online analytical processing vs. online transaction processing. OLTP is for capturing and storing data for day-to-day business functions such as ERP, CRM, SCM, point of sale, and so forth; it is not suited to ad hoc, complex queries that deal with many data items. OLAP, on the other hand, is designed to address this need by providing ad hoc analysis of organizational data much more effectively and efficiently. OLAP and OLTP rely on each other: OLAP uses the data captured by OLTP, and OLTP automates the business processes that are managed by decisions supported by OLAP.
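The contrast can be sketched with Python's built-in `sqlite3` module. The `sales` table and its columns are hypothetical, and both workloads share one in-memory database only to keep the example short; in practice the analytical queries would usually run against a data warehouse fed from the OLTP system.

```python
import sqlite3

# In-memory database standing in for an operational (OLTP) system.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE sales (
    sale_id INTEGER PRIMARY KEY,
    store   TEXT NOT NULL,
    product TEXT NOT NULL,
    amount  REAL NOT NULL)""")

def record_sale(store, product, amount):
    """OLTP: one short transaction per business event (e.g., a point-of-sale scan)."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO sales (store, product, amount) VALUES (?, ?, ?)",
                     (store, product, amount))

for row in [("North", "Widget", 10.0), ("North", "Gadget", 25.0),
            ("South", "Widget", 12.5), ("South", "Widget", 7.5)]:
    record_sale(*row)

def revenue_by_store_and_product():
    """OLAP-style: an ad hoc query that summarizes many rows at once."""
    return conn.execute("""SELECT store, product, SUM(amount) AS revenue, COUNT(*) AS n
                           FROM sales
                           GROUP BY store, product
                           ORDER BY store, product""").fetchall()
```

Calling `revenue_by_store_and_product()` returns one summary row per store/product combination rather than the individual transactions.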

An ODS is a

Operational data store: a type of customer-information-file database that is often used as a staging area for a data warehouse.

Operational Data

Real-time data stored in a relational database optimized to support daily transactions; it has many normalized tables and is updated intensively.

Self-service business intelligence (SSBI)

Self-service business intelligence (SSBI) is an approach to data analytics that enables business users to access and work with corporate data even though they do not have a background in statistical analysis, business intelligence (BI) or data mining. Allowing end users to make decisions based on their own queries and analyses frees up the organization's business intelligence and information technology (IT) teams from creating the majority of reports and allows those teams to focus on other tasks that will help the organization reach its goals.

Centralized data warehouse

Similar to the hub-and-spoke architecture, except that there are no dependent data marts; instead, one large enterprise data warehouse serves the needs of all organizational units. This gives a more holistic view.

Slice And Dice

Slice and dice refers to a strategy for segmenting, viewing, and understanding data in a database. Users slice and dice by cutting a large segment of data into smaller parts, repeating this process until they arrive at the right level of detail for analysis. Slicing and dicing provides a closer view of data for analysis and presents data from new and diverse perspectives.
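A minimal sketch of the idea in plain Python, assuming hypothetical fact rows with three dimensions (region, product, quarter) and one measure (units). A slice fixes one dimension to a single value; a dice keeps a chosen subset of values on several dimensions. Applying them repeatedly narrows the data step by step, as described above.

```python
# Hypothetical fact rows: (region, product, quarter, units)
facts = [
    ("East", "Laptop", "Q1", 120),
    ("East", "Phone",  "Q1", 300),
    ("East", "Laptop", "Q2", 150),
    ("West", "Laptop", "Q1", 90),
    ("West", "Phone",  "Q2", 280),
]
DIMENSIONS = ("region", "product", "quarter")

def slice_(rows, dimension, value):
    """Slice: keep only rows where one dimension equals a single value."""
    i = DIMENSIONS.index(dimension)
    return [r for r in rows if r[i] == value]

def dice(rows, **allowed):
    """Dice: keep rows whose values fall in a chosen subset on several dimensions."""
    idx = {DIMENSIONS.index(d): set(vals) for d, vals in allowed.items()}
    return [r for r in rows if all(r[i] in vals for i, vals in idx.items())]

def total_units(rows):
    """Sum the measure (last column) over the remaining rows."""
    return sum(r[-1] for r in rows)

q1 = slice_(facts, "quarter", "Q1")            # first cut: Q1 only
east_q1 = dice(q1, region=["East"])            # cut again: East region within Q1
```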

Characteristics of Data Warehousing include

Subject oriented (data organized by detailed subject, such as sales or customers); integrated (consistent format); time variant (maintains historical data); nonvolatile (users can't change data; changes are recorded as new data).

Meta-flow:

System modeling: defining the structure of legacy systems; synthesizing to create value; regulating to create modules for capturing.

Value of information

The ability to understand, digest, analyze, and filter information is key to growth and success for any professional in any industry

Information cleansing or scrubbing (the second of three core concepts of data warehousing)

a process that weeds out and fixes or discards inconsistent, incorrect, or incomplete information
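The three outcomes named in this definition (fix, weed out duplicates, discard) can be illustrated with a small sketch. The raw rows, the state-code lookup table, and the rule that a usable phone number has exactly 10 digits are all hypothetical; commercial cleansing tools apply far richer rules.

```python
import re

# Hypothetical raw customer rows pulled from two source systems.
raw = [
    {"name": "  steve SMITH ", "state": "il",  "phone": "(312) 555-0101"},
    {"name": "Steve Smith",    "state": "IL",  "phone": "312.555.0101"},
    {"name": "Ann Lee",        "state": "Ill", "phone": "555-0199"},
]

STATE_CODES = {"IL": "IL", "ILL": "IL", "ILLINOIS": "IL"}  # assumed lookup table

def cleanse(row):
    """Fix what can be fixed; return None to discard what cannot."""
    name = " ".join(row["name"].split()).title()          # trim spaces, standardize case
    state = STATE_CODES.get(row["state"].strip().upper())  # map spelling variants to one code
    digits = re.sub(r"\D", "", row["phone"])               # keep digits only
    if state is None or len(digits) != 10:                 # incomplete: discard
        return None
    return {"name": name, "state": state, "phone": digits}

def cleanse_all(rows):
    """Cleanse every row and drop records that become identical after cleansing."""
    seen, result = set(), []
    for row in rows:
        clean = cleanse(row)
        if clean is None:
            continue
        key = (clean["name"], clean["phone"])
        if key not in seen:
            seen.add(key)
            result.append(clean)
    return result
```

Here the two "Steve Smith" rows collapse into one record, and the "Ann Lee" row is discarded because its phone number lacks an area code.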

How BI Can Answer Tough Customer Questions 2

Where has the business been? Historical perspective offers important variables for determining trends and patterns. Where is the business now? Looking at the current business situation allows managers to take effective action to solve issues before they grow out of control. Where is the business going? Setting strategic direction is critical for planning and creating solid business strategies

data artist

a business analytics specialist who uses visual tools to help people understand complex data

Dashboard (in PP)

a collection of one or more related scorecards or report elements arranged in a set of web pages, hosted by SharePoint Server

Big data

a collection of large, complex data sets, including structured and unstructured data, which cannot be analyzed using traditional database methods and tools

record

a collection of related data elements (in the MUSICIANS table, these include "3, Lady Gaga, gag.tiff, Do not bring young kids to live shows")

Enterprise Data Warehouse (EDW)

a large-scale data warehouse used across the whole enterprise for decision support

star schema

a data-modeling technique used to map multidimensional decision support data into a relational database.
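A small star schema can be sketched in SQLite through Python's `sqlite3` module. The table names (`fact_sales`, `dim_date`, `dim_product`, `dim_store`) and sample rows are hypothetical. The fact table in the center holds the measures plus one foreign key per dimension, and analysis queries join it to only the dimensions they need.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store   (store_id INTEGER PRIMARY KEY, city TEXT);
-- Fact table at the center of the star: measures plus a key to each dimension.
CREATE TABLE fact_sales (
    date_id    INTEGER REFERENCES dim_date(date_id),
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id   INTEGER REFERENCES dim_store(store_id),
    units      INTEGER,
    revenue    REAL
);
INSERT INTO dim_date    VALUES (1, 2024, 'Q1'), (2, 2024, 'Q2');
INSERT INTO dim_product VALUES (1, 'Laptop', 'Computers'), (2, 'Mouse', 'Accessories');
INSERT INTO dim_store   VALUES (1, 'Chicago'), (2, 'Denver');
INSERT INTO fact_sales  VALUES
    (1, 1, 1, 2, 2000.0), (1, 2, 1, 10, 200.0),
    (2, 1, 2, 1, 1000.0), (2, 2, 2, 5, 100.0);
""")

# A typical "star join": fact table joined to the dimensions needed, then aggregated.
query = """
SELECT d.quarter, p.category, SUM(f.revenue)
FROM fact_sales f
JOIN dim_date d    ON f.date_id = d.date_id
JOIN dim_product p ON f.product_id = p.product_id
GROUP BY d.quarter, p.category
ORDER BY d.quarter, p.category
"""
rows = conn.execute(query).fetchall()
```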

primary key

a field (or group of fields) that uniquely identifies a given record in a table. In the table RECORDINGS, the primary key is the field RecordingID, which uniquely identifies each record in the table. Primary keys are a critical piece of a relational database because they provide a way of distinguishing each record in a table; for instance, imagine you need to find information on a customer named Steve Smith. Simply searching the customer name would not be an ideal way to find the information, because there might be 20 customers with the name Steve Smith.
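The Steve Smith example can be reproduced in SQLite via Python's `sqlite3` module. The `customers` table and the key values 101 and 102 are hypothetical. Searching by name returns two rows, searching by key returns exactly one, and the database refuses a second row with an existing key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
# Two different customers who happen to share a name.
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(101, "Steve Smith"), (102, "Steve Smith")])

def find_by_name(name):
    """Ambiguous: every customer with this name comes back."""
    return conn.execute("SELECT customer_id FROM customers WHERE name = ? ORDER BY customer_id",
                        (name,)).fetchall()

def find_by_key(customer_id):
    """Unambiguous: the primary key identifies at most one row."""
    return conn.execute("SELECT name FROM customers WHERE customer_id = ?", (customer_id,)).fetchone()

def insert_duplicate_key():
    """Return True if the database rejects a second row with an existing primary key."""
    try:
        conn.execute("INSERT INTO customers VALUES (101, 'Someone Else')")
        return False
    except sqlite3.IntegrityError:
        return True
```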

Scorecards

a high-level snapshot of organizational performance; displays a collection of KPIs and the performance targets for those KPIs

data warehouse

a logical collection of information, gathered from many operational databases, that supports business analysis activities and decision-making tasks. Its primary purpose is to combine information, more specifically strategic information, from throughout an organization into a single repository in such a way that the people who need that information can make decisions and undertake business analysis (collecting information from multiple systems in a common location that uses a universal querying tool)

In an OLAP a cube is

a multidimensional data structure, actual or virtual, that allows fast analysis of data: the capability of efficiently manipulating and analyzing data from multiple perspectives. It is aimed at overcoming a limitation of relational databases. An analyst can navigate through the database and screen for a particular subset of the data by changing the data's orientation and defining analytical calculations. A cube is not as well suited to very large amounts of data as the standard relational format is.
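As a toy illustration only (real OLAP servers use specialized storage, not a Python dictionary), the sketch below pre-aggregates hypothetical sales for every combination of dimension values, using "ALL" for a rolled-up dimension. Any total can then be read directly instead of recomputed, which is the source of the fast analysis; the number of cells grows with every added dimension and value, which hints at why cubes struggle with very large data.

```python
from collections import defaultdict
from itertools import product

# Hypothetical fact rows: (region, product, sales)
facts = [("East", "Laptop", 100), ("East", "Phone", 50), ("West", "Laptop", 70)]

def build_cube(rows):
    """Pre-aggregate sales into every cell, with 'ALL' meaning 'summed over this dimension'."""
    cube = defaultdict(int)
    for region, prod, sales in rows:
        # Each fact contributes to its own cell and to every rolled-up cell above it.
        for r, p in product((region, "ALL"), (prod, "ALL")):
            cube[(r, p)] += sales
    return dict(cube)

cube = build_cube(facts)
```

Reading `cube[("ALL", "Laptop")]` gives total laptop sales across regions without scanning the fact rows again.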

foreign key

a primary key of one table that appears as an attribute in another table and acts to provide a logical relationship between the two tables
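The MUSICIANS/RECORDINGS relationship from the earlier cards can be sketched in SQLite via Python's `sqlite3` module. The column names and the recording title are hypothetical. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`; other database systems enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE musicians (musician_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE recordings (
    recording_id INTEGER PRIMARY KEY,
    title        TEXT,
    musician_id  INTEGER REFERENCES musicians(musician_id)  -- foreign key
);
INSERT INTO musicians  VALUES (3, 'Lady Gaga');
INSERT INTO recordings VALUES (1, 'Example Track', 3);
""")

def join_titles():
    """Use the foreign key to relate each recording to its musician."""
    return conn.execute("""SELECT m.name, r.title
                           FROM recordings r
                           JOIN musicians m ON r.musician_id = m.musician_id""").fetchall()

def insert_orphan():
    """Return True if a recording pointing at a nonexistent musician is rejected."""
    try:
        conn.execute("INSERT INTO recordings VALUES (2, 'Unknown', 99)")
        return False
    except sqlite3.IntegrityError:
        return True
```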

Extraction, transformation, and loading (ETL)

a process that extracts information from internal and external databases, transforms it using a common set of enterprise definitions, and loads it into a data warehouse. The data warehouse then sends portions (or subsets) of the information to data marts
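A minimal ETL sketch under stated assumptions: two hypothetical source systems describe customers with different field names, country codes, and currency units; the transform step maps both onto one common set of definitions; the load step writes them into an in-memory SQLite table standing in for the warehouse. The final query only mimics handing a subset to a data mart; a real mart would be a separate store.

```python
import sqlite3

# Extract: hypothetical rows from two source systems with different conventions.
crm_rows = [{"cust": "C1", "country": "usa", "spend_usd": "120.50"}]
erp_rows = [{"customer_id": "C2", "country_code": "US", "spend_cents": 9900}]

COUNTRY = {"USA": "US", "US": "US"}  # assumed enterprise-wide code table

def transform(crm, erp):
    """Map both sources onto common definitions: (customer_id, country code, spend in dollars)."""
    out = [(r["cust"], COUNTRY[r["country"].upper()], float(r["spend_usd"])) for r in crm]
    out += [(r["customer_id"], COUNTRY[r["country_code"].upper()], r["spend_cents"] / 100) for r in erp]
    return out

def load(rows, conn):
    """Load transformed rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS customer_spend "
                 "(customer_id TEXT PRIMARY KEY, country TEXT, spend REAL)")
    with conn:
        conn.executemany("INSERT INTO customer_spend VALUES (?, ?, ?)", rows)

warehouse = sqlite3.connect(":memory:")
load(transform(crm_rows, erp_rows), warehouse)

# A data mart would receive only a subset of the warehouse, e.g. US customers.
mart_rows = warehouse.execute(
    "SELECT customer_id, spend FROM customer_spend WHERE country = 'US' ORDER BY customer_id"
).fetchall()
```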

Why BI is becoming more important

a) Exploding data volumes: large data collections can make decisions even more difficult. b) More complicated decisions: decisions are increasingly difficult because of complex 24/7 worldwide processes and the larger diversity of information required to make a decision. c) Need for quick reflexes: market influences cause quick changes, so decisions have to be made within a window of opportunity, despite delays in converting, integrating, or producing information and knowledge. d) Technological progress: better tools for the organization, such as ERP and DW systems, and a growing need for data or text mining.

drill down

access data that is in a lower level of a hierarchically structured database.
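Drilling down along a time hierarchy (year, then quarter, then month) can be sketched as re-aggregating the same hypothetical facts at a finer level while fixing the parent values already chosen. The data and the `aggregate` helper are illustrative assumptions, not a particular tool's API.

```python
from collections import defaultdict

# Hypothetical sales facts keyed by a time hierarchy: (year, quarter, month, sales)
facts = [
    (2024, "Q1", "Jan", 10), (2024, "Q1", "Feb", 15),
    (2024, "Q2", "Apr", 20), (2025, "Q1", "Jan", 30),
]
LEVELS = ("year", "quarter", "month")

def aggregate(rows, depth, **fixed):
    """Total sales at a hierarchy depth (1=year, 2=quarter, 3=month), restricted to fixed parent values."""
    totals = defaultdict(int)
    for row in rows:
        if all(row[LEVELS.index(level)] == value for level, value in fixed.items()):
            totals[row[:depth]] += row[-1]
    return dict(totals)

by_year = aggregate(facts, 1)                                 # top level
by_quarter_2024 = aggregate(facts, 2, year=2024)              # drill down into 2024
by_month_2024_q1 = aggregate(facts, 3, year=2024, quarter="Q1")  # drill down again
```

Reading the levels in the opposite order (month, quarter, year) corresponds to drilling up.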

Middleware tools enable

access to the data warehouse. Power users such as analysts may write their own SQL queries.

Relational DBMS

allow multiple access queries.

Active Data warehousing (as opposed to traditional data warehousing)

allows for large numbers of users and operational staff. An active data warehouse is a repository of any form of captured transactional data, kept so that it can be used for finding trends.

A relational database management system

allows users to create, read, update, and delete data in a relational database. Although the hierarchical and network models are important, this text focuses only on the relational database model
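The four operations map directly onto SQL statements. This sketch uses SQLite through Python's `sqlite3` module; the `products` table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT, price REAL)")

def read_price(name):
    """Read: look up one product's price, or None if it does not exist."""
    row = conn.execute("SELECT price FROM products WHERE name = ?", (name,)).fetchone()
    return None if row is None else row[0]

# Create
conn.execute("INSERT INTO products (name, price) VALUES (?, ?)", ("Widget", 9.99))
before = read_price("Widget")

# Update
conn.execute("UPDATE products SET price = ? WHERE name = ?", (12.49, "Widget"))
after_update = read_price("Widget")

# Delete
conn.execute("DELETE FROM products WHERE name = ?", ("Widget",))
after_delete = read_price("Widget")

conn.commit()  # make the changes permanent
```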

Metric

an analytical measurement intended to quantify the state of a system

dynamic catalog

an area of a website that stores information about products in a database (dynamic website information)

decision support system

an information system that helps managers understand specific kinds of problems and potential solutions and analyze the impact of different decision options using what-if scenarios
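A what-if analysis recomputes an outcome after changing one or more inputs. The sketch below uses a simple profit model, profit = (price - unit cost) x units - fixed cost; the model and all baseline figures are hypothetical and stand in for whatever model a real DSS would use.

```python
# Hypothetical baseline figures for one product line.
BASELINE = {"price": 20.0, "units": 1000, "unit_cost": 12.0, "fixed_cost": 5000.0}

def profit(price, units, unit_cost, fixed_cost):
    """Simple profit model used for every scenario."""
    return (price - unit_cost) * units - fixed_cost

def what_if(**changes):
    """Recompute profit after overriding some baseline inputs."""
    inputs = {**BASELINE, **changes}
    return profit(**inputs)

scenarios = {
    "baseline": what_if(),
    "raise price 10%, lose 15% of units": what_if(price=22.0, units=850),
    "cut unit cost to 11": what_if(unit_cost=11.0),
}
```

With these assumed numbers, the baseline profit is 3000, the price increase yields 3500 despite lower volume, and the cost cut yields 4000.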

data warehouse

an integrated, subject-oriented, time-variant, nonvolatile collection of data that provides support for decision making.

data-driven website

an interactive website kept constantly updated and relevant to the needs of its customers using a database (especially useful when a firm needs to offer large amounts of information, products, or services. Can help limit the amount of information displayed to customers based on unique search requirements)

What is an oper mart?

an operational data mart

market basket analysis

analyzes such items as websites and checkout scanner information to detect customers' buying behavior and predict future behavior by identifying affinities among customers' choices of products and services
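One common way to measure affinities is support (how often two items appear together) and confidence (how often a basket containing one item also contains the other). The sketch below computes both for item pairs; the baskets and the 0.5 minimum-support threshold are hypothetical, and full association-rule algorithms such as Apriori extend this idea to larger item sets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical checkout-scanner baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
]

def pair_rules(baskets, min_support=0.5):
    """Return {(a, b): (support, confidence of a -> b)} for item pairs meeting min_support."""
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in b)
    pair_counts = Counter(pair for b in baskets for pair in combinations(sorted(b), 2))
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        if support >= min_support:
            rules[(a, b)] = (support, count / item_counts[a])  # P(b in basket | a in basket)
            rules[(b, a)] = (support, count / item_counts[b])  # P(a in basket | b in basket)
    return rules

rules = pair_rules(baskets)
```

In this data every basket with milk also has bread (confidence 1.0), while only two of the three bread baskets have milk.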

Oper marts

are created when operational data needs to be analyzed multidimensionally. The data for an oper mart come from an ODS.

Analytical information

encompasses all organizational information, and its primary purpose is to support the performance of managerial analysis tasks (Analytical information is useful when making important decisions such as whether the organization should build a new manufacturing plant or hire additional sales personnel. Analytical information makes it possible to do many things that previously were difficult to accomplish, such as spot business trends, prevent diseases, and fight crime; identify many unusual trends)

primary concepts of the relational database model

entities, attributes, keys, and relationships

technologies used for information integration

Environmental scanning: events, trends, relationships, and the external environment that could influence the company (law changes, new technology, competitors). Text mining: "reading" and analyzing text written in natural language. Web mining: searching the web (forums, social media) and online text. RFID: information regarding the location of goods.

Dirty data

erroneous or flawed data (complete removal of dirty data from a source is impractical or virtually impossible). Dirty data is a business problem, not an MIS problem.

Specialized software tools

exist that use sophisticated procedures to analyze, standardize, correct, match, and consolidate data warehouse information

data scientist

extracts knowledge from data by performing statistical analysis, data mining, and advanced analytics on big data to identify trends, market changes, and other relevant information

Advanced analytics

focuses on forecasting future trends and producing insights using sophisticated quantitative methods, including statistics, descriptive and predictive data mining, simulation, and optimization (uses data patterns to make forward-looking predictions to explain to the organization where it is headed)

logical view of information

focuses on how individual users logically access information to meet their own particular business needs

Indicators

graphical symbols used in KPIs to show whether performance is on or off target (e.g. stoplight symbols)
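The stoplight logic can be sketched as a function that compares a KPI's actual value with its target. The 10% warning band, the sample KPIs, and their values are hypothetical; dashboard products let administrators configure their own thresholds.

```python
def indicator(actual, target, warning_band=0.10, higher_is_better=True):
    """Map a KPI value to a stoplight symbol.

    green: target met; yellow: missed by no more than warning_band (a fraction of the target);
    red: missed by more than that.
    """
    gap = (actual - target) if higher_is_better else (target - actual)
    if gap >= 0:
        return "green"
    if -gap <= warning_band * abs(target):
        return "yellow"
    return "red"

# A tiny scorecard: each KPI with its indicator.
scorecard = {
    "Revenue ($K)":       indicator(actual=980, target=1000),
    "On-time delivery %": indicator(actual=97, target=95),
    "Defect rate %":      indicator(actual=4.0, target=2.0, higher_is_better=False),
}
```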

Structured data

has a defined length, type, and format and includes numbers, dates, or strings such as Customer Address. (typically stored in a traditional system such as a relational database or spreadsheet and accounts for about 20 percent of the data that surrounds us)

DBMS use three primary data models for organizing information

hierarchical, network, and relational (the relational database model is the most prevalent)

Real-time information

immediate, up-to-date information

Dynamic information

includes data that change based on user actions. For example, static websites supply only information that will not change until the content editor changes the information. Dynamic information changes when a user requests information. A dynamic website changes information based on user requests such as movie ticket availability, airline prices, or restaurant reservations

Static information

includes fixed data incapable of change in the event of a user action

Filters

individual dashboard items that enable dashboard users to focus on specific information (e.g. geography filter enabling a user to view information for a specific geographical region)

multidimensional cube is

inflexible and does not support the ad hoc creation of multidimensional views of products, services, and customers; cannot handle more than 30 gigabits of data.

Information redundancy / Business Advantages of a Relational Database 3) Reduced Information Redundancy

the duplication of data, or the storage of the same data in multiple places (can cause storage issues along with data integrity issues, making it difficult to determine which values are the most current or most accurate. Employees become confused and frustrated when faced with incorrect information causing disruptions to business processes and procedures. One primary goal of a database is to eliminate information redundancy by recording each piece of information in only one place in the database)

Information granularity /ˈgræn yə lər/

the extent of detail within the information (fine and detailed or coarse and abstract)

content creator

the person responsible for creating the original website content

content editor

the person responsible for updating and maintaining website content

Data mining

the process of analyzing data to extract information not offered by the raw data alone (can also begin at a summary information level (coarse granularity) and progress through increasing levels of detail (drilling down) or the reverse (drilling up))

extraction, transformation, and loading (ETL)

the processes used in a data warehouse. It includes extracting data from outside sources, transforming it to fit operational needs, and loading it into the end target (database or data warehouse)

multidimensional databases lack

the scalability and flexibility for DSS

data element (or data field)

the smallest or basic unit of information (can include a customer's name, address, email, discount rate, preferred shipping method, product name, quantity ordered, and so on)

Time-series information

timestamped information collected at a particular frequency

performing extensive ETL (extraction, transformation, load)

to move data to the data warehouse may be a sign of poorly managed data and a fundamental lack of a coherent data management strategy.

Why do we need BI?

to support better decision making and to increase organizational knowledge base

query-by-example (QBE)

a tool that helps users graphically design the answer to a question against a database

Business intelligence dashboards

track corporate metrics such as critical success factors and key performance indicators and include advanced capabilities such as interactive controls, allowing users to manipulate data for analysis. The majority of business intelligence software vendors offer a number of data visualization tools and business intelligence dashboards

two primary types of information

transactional and analytical

Multidimensional Database

usually contains a star model; designed for slice-and-dice and drill-down analysis; highly indexed; provides data mining and drill-down capabilities.

differences between BI and other information technologies like: a) knowledge management b) data warehousing c) data mining d) decision support systems

BI: uses all kinds of data; takes data and information as input and results in *NEW* knowledge. The other technologies focus mainly on internal, structured data: a) Knowledge management: takes information and knowledge as input and uses the existing knowledge optimally. b) Data warehousing: ETL obtains data from multiple systems and stores them in a single repository. c) Data mining: discovers hidden patterns in data and produces information. d) Decision support systems: support making appropriate decisions.
