Business Intelligence

Dimensional Modeling Objectives

1. To produce database structures that end users can easily understand and query.
2. To optimize query performance (as opposed to update performance).

Digital transformation Building Blocks

Customer Experience, Operational Process, Business Model

Data Analyst

Data analysts deliver value to their companies by taking data, using it to answer questions, and communicating the results to help make business decisions. Common tasks include data cleaning, performing analysis, and creating data visualizations.
Tasks:
- Cleaning and organizing raw data.
- Using descriptive statistics to get a big-picture view of the data.
- Analyzing interesting trends found in the data.
- Creating visualizations and dashboards to help the company interpret the data and make decisions with it.
- Presenting the results of a technical analysis to business clients or internal teams.

Metadata

Data about data. In a data warehouse, metadata describe the contents of the warehouse and the manner of its acquisition and use. Metadata design, creation, and usage can involve ethical questions.

OLAP: Drill-Down and Roll-Up

Data can be disaggregated and aggregated along a dimension according to their natural hierarchy
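
As an illustration, roll-up and drill-down along a time hierarchy can be sketched in a few lines. The sales rows and the year > month > day hierarchy below are hypothetical; this is a minimal sketch, not how an OLAP engine is implemented.

```python
from collections import defaultdict

# Hypothetical sales facts at the finest grain: (year, month, day, amount).
SALES = [
    (2023, 1, 5, 100.0),
    (2023, 1, 20, 50.0),
    (2023, 2, 3, 70.0),
    (2024, 1, 9, 30.0),
]

# The time hierarchy, from coarsest to finest level.
LEVELS = ("year", "month", "day")


def aggregate(facts, level):
    """Sum amounts at the given level of the year > month > day hierarchy."""
    depth = LEVELS.index(level) + 1  # how many hierarchy columns form the key
    totals = defaultdict(float)
    for *date_parts, amount in facts:
        totals[tuple(date_parts[:depth])] += amount
    return dict(totals)


by_month = aggregate(SALES, "month")  # more detailed view
by_year = aggregate(SALES, "year")    # roll-up: month -> year
```

Going from `by_year` to `by_month` is a drill-down; going the other way is a roll-up.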

Metadata Classification

Metadata can be classified as technical vs. business metadata. A further distinction is made between:
- Syntactic metadata: data describing the syntax of the data
- Structural metadata: data describing the structure of the data
- Semantic metadata: data describing the meaning of the data in a specific domain

Data Warehouse (DW)

- A physical repository where relational data are specially organized to provide enterprise-wide, cleansed data in a standardized format
- A data set produced to support decision making
- Data are usually structured to be available in a form ready for analytical processing activities (OLAP, data mining, querying, reporting, other decision support applications)
"The data warehouse is a collection of integrated, subject-oriented databases designed to support DSS functions, where each unit of data is non-volatile and relevant to some moment in time"

Independent Data Mart

- Lower-cost, scaled-down version of a data warehouse.
- A small data mart (DM), designed for a strategic business unit or a department, whose source is not an EDW.

Challenges of processing web data

- The Web is too big for effective data mining
- The Web is too complex
- The Web is too dynamic
- The Web is not specific to a domain
- The Web has everything

Data Warehousing

- Whereas a data warehouse is a repository of data, data warehousing is the entire process.
- Data warehousing is a discipline that results in applications that provide decision support capabilities, allow ready access to business information, and create business insight.

Analytics

The process of developing actionable decisions or recommendations for actions based on insights generated from historical data. Analytics represents the combination of computer technology, management science techniques, and statistics to solve real problems.

Step-by-Step Dimensional Modeling

1. Identify the business process. Identify the business process the DW will represent. This process will be the source of the metrics and measurements.
2. Identify the grain. Determine what each fact table row means. In the example we saw, the grain was 'monthly sales per location per product'.
3. Identify the dimensions. The dimensions should be, as much as possible, descriptive (SQL VARCHAR or CHARACTER) and consistent with the grain.
4. Identify the facts. In this step the measures (or metrics or facts) are identified. These facts should be numerical and also consistent with the grain defined in step 2.
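
The four steps can be made concrete with a small star schema. The sketch below uses SQLite through Python's `sqlite3` module; the table names, column names, and sample rows are all hypothetical, chosen only to match the 'monthly sales per location per product' grain mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Step 3: descriptive dimensions.
CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE dim_location (location_key INTEGER PRIMARY KEY, city TEXT, country TEXT);
CREATE TABLE dim_month    (month_key    INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
-- Step 2: the grain is one row per month, per location, per product.
-- Step 4: the facts are numeric measures at that grain.
CREATE TABLE fact_sales (
    month_key    INTEGER REFERENCES dim_month(month_key),
    location_key INTEGER REFERENCES dim_location(location_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    sales_amount REAL,
    units_sold   INTEGER,
    PRIMARY KEY (month_key, location_key, product_key)
);
INSERT INTO dim_product  VALUES (1, 'Espresso', 'Coffee'), (2, 'Green tea', 'Tea');
INSERT INTO dim_location VALUES (1, 'Lisbon', 'Portugal'), (2, 'Porto', 'Portugal');
INSERT INTO dim_month    VALUES (202401, 2024, 1), (202402, 2024, 2);
INSERT INTO fact_sales VALUES
    (202401, 1, 1, 500.0, 250),
    (202401, 2, 1, 300.0, 150),
    (202402, 1, 2, 120.0,  40);
""")

# A typical analytical query: aggregate a fact by a dimension attribute.
sales_by_category = conn.execute("""
    SELECT p.category, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
```

The same fact table can be summed by any attribute of any dimension (city, year, category) with a single join per dimension.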

Benefits of the Dimensional Model / DW

1. It provides a simplified model structure, so it is more intuitive for business users to write queries.
2. Because there are fewer tables, there are fewer joins between tables, which means less processing and higher performance. Essentially, you can sum or aggregate the data in the fact table based on any of the columns in any of the related dimensions to get the information you are interested in.
3. Dimensions can be re-used as new business processes are modeled. This makes it possible to have one version of the truth. For example, the marketing department may need to understand its advertising better. The product dimension could be re-used for this new model along with any new dimensions that might be needed.

Tips for Storytelling in Reports

1. Know and understand the business or the process being reported
2. Know and understand your data
3. Develop a hierarchy (divide your report into parts)
4. Use the right visuals
5. Use a visual language

Examples of Business Analytics

1. Programmatic Advertising & Media Buying 2. Target 3. Attention, Shoppers 4. Watson: IBM Watson computer accuracy in cancer diagnosis is better than human doctors 5. Smart Cities > Urban Radiology

Continuous Industry Transformation

1st Platform, 2nd Platform, 3rd Platform. Innovation Accelerators: Robotics, Natural Interfaces, 3D Printing, Internet of Things, Cognitive Systems, Next Gen Security

Text Mining

- 85-90 percent of all corporate data is in some kind of unstructured form (e.g., text)
- Unstructured corporate data is doubling in size every 18 months
- Tapping into these information sources is not an option, but a need to stay competitive
- Answer: text mining
Text mining is a semiautomated process of extracting knowledge from unstructured data sources, a.k.a. text data mining or knowledge discovery in textual databases.

Data Warehouse vs. Database

A data warehouse is built to store large quantities of historical data and enable fast, complex queries across all data.
Use cases:
- Data mining to get new insights
- Market research
- Analyzing online behaviour
A database is built to store current transactions and enable fast access to specific transactions for ongoing business processes.
Use cases:
- An airline using an online booking system
- A hospital registering a patient
- A bank adding an ATM withdrawal transaction to an account

Extraction, transformation, and loading (ETL)

A great deal of data engineering work lies in the processes related to extraction, transformation, and loading (ETL). In the context of data warehousing:
- Extraction: selecting data from one or more sources and reading the selected data
- Transformation: converting data from their original form to whatever form the DW needs. This step often also includes cleansing the data to remove as many errors as possible.
- Loading: putting the converted (transformed) data into the DW
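
The three ETL stages can be sketched as three small functions. The CSV text, the `orders` table, and the cleansing rule below are assumptions made for illustration; real pipelines read from files or source systems and apply many more rules.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; in practice this would be a file or a source system.
SOURCE_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,not_a_number
3,Carol,7
"""


def extract(text):
    """Extraction: read the selected rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transformation: convert types, normalize names, drop rows that fail cleansing."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # cleansing: skip rows whose amount is not numeric
        clean.append((int(row["order_id"]), row["customer"].strip().title(), amount))
    return clean


def load(conn, rows):
    """Loading: put the transformed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract(SOURCE_CSV)))
loaded = warehouse.execute("SELECT * FROM orders ORDER BY order_id").fetchall()
```

Keeping the stages as separate functions makes each one testable on its own, which is one common reason ETL code is organized this way.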

Information System

A set of interrelated components that collect (or retrieve), process, store, and distribute information to support decision making, coordination, and control in an organization.

Surrogate Key

A system-assigned primary key, generally numeric and auto-incremented.
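
A minimal sketch of a surrogate key in SQLite (via Python's `sqlite3`), assuming a hypothetical `dim_customer` table in which `customer_code` plays the role of the natural business key:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_customer (
        customer_key  INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        customer_code TEXT NOT NULL,                      -- natural (business) key
        name          TEXT
    )
""")
# The surrogate key is never supplied by the application; the database assigns it.
conn.executemany(
    "INSERT INTO dim_customer (customer_code, name) VALUES (?, ?)",
    [("C-ALPHA", "Alice"), ("C-BRAVO", "Bob")],
)
customers = conn.execute(
    "SELECT customer_key, customer_code FROM dim_customer ORDER BY customer_key"
).fetchall()
```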

Relational Database

A type of database in which information is organized into separate subject-based tables, and the relationship of the data in one or more tables is used to bring the data together.

Common Activities in the Data Integration Process

Arbitrary pieces of code that take data from a source and convert it into data for the data warehouse for analysis:
- Data loading: read and convert from data sources
- Data transformations: join, aggregate, filter, convert data
- Data de-duplication: find multiple records referring to the same entity and merge them
- Data profiling: build tables, histograms, etc. to summarize data
- Data quality: test against master values, known business rules, constraints, etc.

Online Transaction Processing (OLTP)

Capturing transaction and event information using technology to process, store, and update it -> Systems that handle a company's routine ongoing business. Operational databases: ERP, SCM, CRM, ... Goal: data capture

Characteristics of Effective Data Visualization

- Clear purpose
- High degree of relevancy
- Easy to understand
- Engaging storylines and flow
- Visually appealing
- User-centric
- Problem-solving
- Alignment of intent with the visuals
- Simple and self-describing
- Easy to navigate
- Right charts for the right data
- Use of colors is meaningful
- Minimal errors, broken views, and confusing interactions

Information

Clusters of facts that are meaningful and useful to human beings in processes such as making decisions

DAX

DAX is a collection of functions, operators, and constants that can be used in a formula, or expression, to calculate and return one or more values. Stated more simply, DAX helps you create new information from data already in your model, such as new measures and new calculated columns. It's important that you don't confuse it with M Language. DAX is the language you use when you create transformations in the Power BI Desktop main screens rather than via the query editor. For example, if you add a new column and need to do a calculation, you use the DAX expression language. M is the query language that can be viewed in the Power Query Editor.

Critical Issues on Data

Data is the Oil of 21st Century

web usage mining applications

- Determine the lifetime value of clients
- Design cross-marketing strategies across products
- Evaluate promotional campaigns
- Target electronic ads and coupons at user groups based on user access patterns
- Predict user behavior based on previously learned rules and users' profiles
- Present dynamic information to users based on their interests and profiles

OLAP Features. Rotating the Data

Different users will require different views of the multidimensional cube - OLAP allows easy rotation of data

Entity Integrity

Every database table should have one or more columns designated as the Primary Key. A primary key is one or more fields that uniquely identifies a row in a table. The primary key cannot be null (blank). Entity integrity is ensuring that the primary key in a table is unique and that the value is not set to null.
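
A short sketch of entity integrity being enforced by the database, using SQLite through Python's `sqlite3`. The `employee` table and the `try_insert` helper are hypothetical names introduced for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is stated explicitly: SQLite, unlike most engines, would otherwise
# accept NULL in a non-INTEGER primary key column.
conn.execute("CREATE TABLE employee (employee_id TEXT NOT NULL PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO employee VALUES ('E1', 'Alice')")


def try_insert(employee_id, name):
    """Return True if the row is accepted, False if entity integrity rejects it."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?)", (employee_id, name))
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_ok = try_insert("E1", "Bob")  # same key as an existing row
null_ok = try_insert(None, "Carol")     # missing key
new_ok = try_insert("E2", "Dave")       # unique, non-null key
```

Only the last insert succeeds: the first violates uniqueness and the second violates the not-null rule.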

Search Engines

Examples: Google, Bing, Yahoo. A search engine is a software program that searches for documents (Internet sites or files) based on the keywords (individual words, multi-word terms, or a complete sentence) that users provide about the subject of their inquiry. Search engines are the workhorses of the Internet.

one-to-one relationship

In databases, a relationship in which each record in Table A can have only one matching record in Table B, and vice versa. This is not a common relationship type, as the data stored in Table B could just as easily have been stored in Table A. However, there are some valid reasons for using it: a one-to-one relationship can be used for security purposes, to divide a large table, and for various other specific purposes. For example, given an Employee table and a Pay table, we could just as easily have put an HourlyRate column straight into the Employee table and not bothered with the Pay table. However, hourly rate could be sensitive data that only certain database users should see. By putting the hourly rate into a separate table, we can provide extra security around the Pay table so that only certain users can access the data in it.

Text Analytics

Text Analytics = Information Retrieval + Information Extraction + Data Mining + Web Mining, or simply Text Analytics = Information Retrieval + Text Mining

Referential Integrity

The requirement that every value in a foreign key column must be found in the primary key of the table it references.
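
The rule can be seen in action with SQLite through Python's `sqlite3`. The `department` and `employee` tables are hypothetical; note that SQLite only enforces foreign keys once the `foreign_keys` pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,
        name        TEXT,
        dept_id     INTEGER REFERENCES department(dept_id)
    )
""")
conn.execute("INSERT INTO department VALUES (10, 'Finance')")
conn.execute("INSERT INTO employee VALUES (1, 'Alice', 10)")  # department 10 exists

try:
    conn.execute("INSERT INTO employee VALUES (2, 'Bob', 99)")  # department 99 does not exist
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```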

Search Engine Optimization

The intentional activity of affecting the visibility of an e-commerce site or a Web site in a search engine's natural (unpaid or organic) search results.
- Part of an Internet marketing strategy
- Based on knowing how a search engine works (content, HTML, keywords, external links)
- Indexing is based on webmaster submission of URLs and on proactively and continuously crawling the Web

Many-to-Many

Lastly, entities can also have a many-to-many relationship. Let's say you have a list of books, and a list of authors—each book may have one or more authors, and each author may have written multiple books. In this case, you have many books related to many authors.
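
In a relational database, a many-to-many relationship is usually stored in a third table that holds one row per pair. Below is a sketch of the books and authors example in SQLite via Python's `sqlite3`; the table names and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE book   (book_id   INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE author (author_id INTEGER PRIMARY KEY, name  TEXT);
-- Junction table: one row per (book, author) pair.
CREATE TABLE book_author (
    book_id   INTEGER REFERENCES book(book_id),
    author_id INTEGER REFERENCES author(author_id),
    PRIMARY KEY (book_id, author_id)
);
INSERT INTO book   VALUES (1, 'Book A'), (2, 'Book B');
INSERT INTO author VALUES (1, 'Author X'), (2, 'Author Y');
INSERT INTO book_author VALUES (1, 1), (1, 2), (2, 1);
""")

# One book, many authors.
authors_of_book_a = [row[0] for row in conn.execute("""
    SELECT a.name FROM author AS a
    JOIN book_author AS ba ON ba.author_id = a.author_id
    WHERE ba.book_id = 1 ORDER BY a.name
""")]
# One author, many books.
books_by_author_x = [row[0] for row in conn.execute("""
    SELECT b.title FROM book AS b
    JOIN book_author AS ba ON ba.book_id = b.book_id
    WHERE ba.author_id = 1 ORDER BY b.title
""")]
```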

M (language)

Microsoft Power Query provides a powerful data import experience that encompasses many features. Power Query works with Analysis Services, Excel, and Power BI workbooks. A core capability of Power Query is to filter and combine, that is, to mash-up data from one or more of a rich collection of supported data sources. Any such data mashup is expressed using the Power Query M Formula Language.

Web Content/Structure Mining

- Mining the textual content on the Web
- Data collection via Web crawlers
- Web pages include hyperlinks: authoritative pages, hubs, hyperlink-induced topic search (HITS) algorithm
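
The idea behind hubs and authorities can be sketched as a simple iteration: a page is a good authority if good hubs link to it, and a good hub if it links to good authorities. The link graph, the fixed iteration count, and the function name below are assumptions for illustration, not a reference implementation of HITS.

```python
import math

# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "portal": ["docs", "blog"],
    "directory": ["docs"],
    "docs": [],
    "blog": ["docs"],
}


def hits(links, iterations=20):
    """Iteratively compute hub and authority scores, normalizing each round."""
    pages = list(links)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority: sum of the hub scores of the pages linking to it.
        auth = {p: sum(hub[q] for q in pages if p in links[q]) for p in pages}
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        # Hub: sum of the authority scores of the pages it links to.
        hub = {p: sum(auth[q] for q in links[p]) for p in pages}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return hub, auth


hub_scores, authority_scores = hits(LINKS)
best_authority = max(authority_scores, key=authority_scores.get)
best_hub = max(hub_scores, key=hub_scores.get)
```

In this toy graph the page that everything links to comes out as the top authority, and the page that links to the most authoritative pages comes out as the top hub.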

Text Mining Application

- Marketing applications: enables better CRM
- Security applications: ECHELON, OASIS; deception detection (...)
- Medicine and biology: literature-based gene identification (...)
- Academic applications: research stream analysis

Elements of Data Management

Relevance, Access, Quantity, Quality, Governance, Security and Privacy

Most basic SQL statement

SELECT field_1 -> Which column do we want
FROM table_1 -> From which table
WHERE criterion_1 -> With what condition
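
To see the statement run end to end, here is a minimal sketch using an in-memory SQLite database through Python's `sqlite3`. The `customers` table and its rows are hypothetical, and `ORDER BY` is added only to make the result order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Alice", "PT"), ("Bob", "DE"), ("Carol", "PT")],
)

rows = conn.execute("""
    SELECT name            -- which column do we want
    FROM customers         -- from which table
    WHERE country = 'PT'   -- with what condition
    ORDER BY name
""").fetchall()
```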

Sentiment Analysis

Sentiment: belief, view, opinion, and conviction. Sentiment analysis answers the question "What do people feel about a certain topic?" by analyzing data related to the opinions of many people using a variety of automated tools. It is used in a variety of domains, but its applications in CRM, which relate to the opinions of customers and consumers, are especially noteworthy.

Business Analytics (BA)

- Set of techniques and applications for the collection, storage, analysis, and access of data that support the users of the organization in decision making
- Includes the direct application of analytical models to the business data
- With BA software the user can perform queries, request ad hoc reports, and perform different types of analysis
- Should deliver information to the end user in an actionable way to support decision making

Social Analytics / Social Network Analysis

Social network: a social structure composed of individuals linking to each other. Social network analysis is the analysis of social dynamics, an interdisciplinary field drawing on social psychology, sociology, statistics, and graph theory. Social networks help study relationships between individuals, groups, organisations, and societies; they are self-organizing, emergent, and complex. Typical social network types: communication networks, community networks, criminal networks, innovation networks.

Database

Something that stores data in a structured way, allowing everyone to share, manage and use data

RDBMS

Specialized tools that help to manage a database: Relational Database Management Systems. Examples: SQL Server 2019, Oracle 11g/12c, MySQL, PostgreSQL, IBM DB2, Access, etc.

Sentiment Analysis Process

Step 1 - Sentiment detection: comes right after the retrieval and preparation of the text documents. It is also called detection of objectivity: fact (= objectivity) vs. opinion (= subjectivity).
Step 2 - N-P polarity classification: given an opinionated piece of text, the goal is to classify the opinion as falling under one of two opposing sentiment polarities, N (= negative) vs. P (= positive).
Step 3 - Target identification: the goal of this step is to accurately identify the target of the expressed sentiment (e.g., a person, a product, an event). The level of difficulty depends on the application domain.
Step 4 - Collection and aggregation: once the sentiments of all text data points in the document are identified and calculated, they are aggregated: word -> statement -> paragraph -> document.
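
The four steps can be illustrated with a deliberately naive, lexicon-based sketch. The word lists, target list, and scoring rule are all hypothetical; real systems rely on much larger lexicons or trained models.

```python
# Hypothetical, tiny lexicons for illustration only.
POSITIVE = {"great", "love", "excellent"}
NEGATIVE = {"slow", "terrible", "hate"}
TARGETS = {"battery", "screen", "delivery"}


def tokenize(sentence):
    """Lowercase the words and strip basic punctuation."""
    return [word.strip(".,!?").lower() for word in sentence.split()]


def analyze_sentence(sentence):
    """Return (target, polarity) for an opinionated sentence, or None otherwise."""
    words = tokenize(sentence)
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score == 0:
        # Step 1: treated as factual (a naive rule: balanced opinions also land here).
        return None
    polarity = "P" if score > 0 else "N"                      # Step 2: N-P polarity
    target = next((w for w in words if w in TARGETS), None)   # Step 3: target
    return target, polarity


def analyze_document(sentences):
    """Step 4: aggregate sentence-level polarities into a document-level score."""
    results = [r for r in map(analyze_sentence, sentences) if r is not None]
    score = sum(1 if polarity == "P" else -1 for _, polarity in results)
    return results, score


review = ["The phone has a 6 inch screen.", "I love the battery!", "Delivery was terrible."]
results, document_score = analyze_document(review)
```

Here the first sentence is filtered out as factual, and the two opinionated sentences cancel out at document level.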

Data

Streams of raw facts representing events such as business transactions

Goal for dimensional modeling

Surround facts with as much context (dimensions) as possible. But you should not try to model all relationships in the data (unlike ER modeling!)

Data-Driven-Organizations

The Five Questions: What happened? What is happening? Why did it happen? What will happen? What do I want to happen? Past - Present - Future

PowerQuery - Combine queries

There are two primary ways of combining queries: merging and appending. When you have one or more columns that you'd like to add to another query, you merge the queries. When you have additional rows of data that you'd like to add to an existing query, you append the query.
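
The difference is easiest to see on small data. The sketch below mimics the two operations in plain Python rather than in Power Query itself; the query contents and both function names are hypothetical.

```python
# Hypothetical query results, represented as lists of row dictionaries.
orders_q1 = [{"order_id": 1, "customer_id": "C1"}, {"order_id": 2, "customer_id": "C2"}]
orders_q2 = [{"order_id": 3, "customer_id": "C1"}]
customers = [{"customer_id": "C1", "name": "Alice"}, {"customer_id": "C2", "name": "Bob"}]


def append_queries(first, second):
    """Append: stack the rows of the second query under the first."""
    return first + second


def merge_queries(left, right, key):
    """Merge: add the columns of `right` to matching rows of `left` (left outer join).

    Assumes `key` is unique in `right`.
    """
    lookup = {row[key]: row for row in right}
    return [{**row, **lookup.get(row[key], {})} for row in left]


all_orders = append_queries(orders_q1, orders_q2)                        # more rows
orders_with_names = merge_queries(all_orders, customers, "customer_id")  # more columns
```

Appending grows the table downward (three order rows); merging grows it sideways (each order gains a `name` column).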

Foreign Key

These keys are used to create relationships between tables. Natural relationships exist between tables in most database structures.

One-to-Many

This is the most common relationship type. In this type of relationship, a row in table A can have many matching rows in table B, but a row in table B can have only one matching row in table A.

Web Analytics Metrics

- Web site usability: How were the visitors using my Web site?
- Traffic sources: Where did they come from?
- Visitor profiles: What do my visitors look like?
- Conversion statistics: What does it all mean for the business?

Report Requirements

What's the business need? How are the readers going to use this data? Who's going to use this data? What decisions does the reader want to make based on this report?

Data Mart

Whereas a data warehouse combines databases across an entire enterprise, a data mart is usually smaller and focuses on a particular subject or department. A data mart is a subset of a data warehouse, typically consisting of a single subject area (e.g. marketing, operations).

Table

Within a specific database, there will be a collection of tables that represent certain entities. These tables store records as rows, which are defined by columns (attributes). Each of these columns is of a specific data type (e.g., number, text, date).

Business Intelligence

a broad category of applications, technologies, and processes for gathering, storing, accessing, and analyzing data to help business users make better decisions

Three types of Business Analytics

descriptive, predictive, prescriptive

Stream Analytics Applications

e-Commerce, Telecommunication, Law Enforcement and Cyber Security, Power Industry, Financial Services, Health Services, Government, Smart Cities

Slowly Changing Dimensions

The technique used to manage attribute changes in a dimension over time.

web usage mining

The extraction of information from data generated through Web page visits and transactions. Clickstream data; clickstream analysis.

Web mining (or Web data mining)

the process of discovering intrinsic relationships from Web data (textual, linkage, or usage)

SCD Type 1

• For this type of slowly changing dimension you simply overwrite the existing data values with new data values. • The drawback of this is you lose the historical value of the data because the dimension will always contain the current values for each attribute.

SCD Type 2

• This is the most commonly used type of slowly changing dimension. For this type of slowly changing dimension, add a new record encompassing the change and mark the old record as inactive. • This allows the fact table to still use the data stored under the old dimension key for historical reporting.
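
The contrast between Type 1 and Type 2 can be sketched on an in-memory dimension. The column names (`valid_from`, `valid_to`, `is_current`) are a common convention but are assumptions here, as are the sample row and both function names.

```python
from datetime import date

# Hypothetical customer dimension; `key` is the surrogate key.
dimension = [
    {"key": 1, "customer_code": "C1", "city": "Lisbon",
     "valid_from": date(2020, 1, 1), "valid_to": None, "is_current": True},
]


def apply_type1(rows, customer_code, new_city):
    """Type 1: overwrite the attribute in place; history is lost."""
    for row in rows:
        if row["customer_code"] == customer_code:
            row["city"] = new_city


def apply_type2(rows, customer_code, new_city, change_date):
    """Type 2: close the current row and add a new row under a new surrogate key."""
    for row in rows:
        if row["customer_code"] == customer_code and row["is_current"]:
            row["valid_to"] = change_date
            row["is_current"] = False
    rows.append({
        "key": max(r["key"] for r in rows) + 1,
        "customer_code": customer_code,
        "city": new_city,
        "valid_from": change_date,
        "valid_to": None,
        "is_current": True,
    })


apply_type2(dimension, "C1", "Porto", date(2024, 6, 1))
current = [r for r in dimension if r["is_current"]]
```

After the Type 2 change the dimension holds two rows for the same customer: fact rows recorded before the change keep pointing at key 1 (Lisbon), while new facts use key 2 (Porto).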

Sentiment Analysis Applications

• Voice of the customer (VOC) • Voice of the Market (VOM) • Voice of the Employee (VOE) • Brand Management • Financial Markets • Politics • Government Intelligence • ... others

Natural Language Processing (NLP)

Structuring a collection of text:
- Old approach: bag-of-words
- New approach: natural language processing
NLP is:
- a very important concept in text mining
- a subfield of artificial intelligence and computational linguistics
- the study of "understanding" natural human language
Syntax-based versus semantics-based text mining

Requirements for Successful BI

- A clear business need
- Strong, committed sponsorship
- Alignment between the business and IT strategy
- A fact-based decision making culture
- A strong data infrastructure

Three Main Types of Data Warehouses

Data marts, Operational Data Stores (ODS), Enterprise Data Warehouses (EDW)

Successful BI Implementation

Implementing and deploying a BI initiative is a lengthy, expensive, and risky endeavor! The success of a BI system is measured by its widespread usage for better decision making. The typical BI user community includes all levels of the management hierarchy (not just the top executives, as was the case for EIS), so the system must provide what is needed to whoever needs it. A successful BI system must be of benefit to the enterprise as a whole.

Changing Business Environments and Evolving Needs for Decision Support and Analytics

- Increased hardware, software, and network capabilities
- Group communication and collaboration
- Improved data management
- Managing giant data warehouses and Big Data
- Analytical support
- Overcoming cognitive limits in processing and storing information
- Knowledge management
- Anywhere, anytime support

IDC FutureScape: External Drivers

- Next chapter of DX: technology-driven transformation altering business and society
- The race to innovate: speed of change, delivery, and operations separates thrivers and survivors
- Platforms, platforms, platforms: industry competes for innovation at scale
- Sense, compute, actuate: turning data into value
- Emerging autonomy: learning to live with AI
- Rising customer expectations: more convenience, customization, and control
- Legacy inertia: retrofit the old into the DX world

The Benefits of BI

- Time savings
- Single version of truth
- Improved strategies and plans
- Improved tactical decisions
- More efficient processes
- Cost savings
- Faster, more accurate reporting
- Improved decision making
- Improved customer service
- Increased revenue

BI and Business Strategy

To be successful, BI must be aligned with the company's business strategy. BI changes the way a company conducts business by improving business processes and transforming decision making into a more data/fact/information-driven activity. BI should help execute the business strategy and not be an impediment to it!

Cubes

A "cube" may have many dimensions:
- Theoretically there is no limit on the number of dimensions
- Typical cubes have 4-12 dimensions
- But only 2-3 dimensions can be viewed at a time, so dimensionality is reduced by queries via projection/aggregation
A cube consists of cells:
- A cell is a given combination of dimension values
- Empty cell = no data for this combination
- Sparse cube: few non-empty cells
- Dense cube: many non-empty cells
- Cubes become sparse at high dimensionality
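
A small sketch of cells, sparsity, and projection, storing only the non-empty cells of a hypothetical three-dimensional cube in a dictionary. The dimension values, amounts, and the `project` helper are made up for illustration.

```python
from itertools import product

# Hypothetical dimension values for a small three-dimensional cube.
products = ["Espresso", "Green tea", "Cocoa"]
stores = ["Lisbon", "Porto"]
months = ["Jan", "Feb"]

# Only non-empty cells are stored: (product, store, month) -> sales amount.
cells = {
    ("Espresso", "Lisbon", "Jan"): 500.0,
    ("Espresso", "Porto", "Jan"): 300.0,
    ("Green tea", "Lisbon", "Feb"): 120.0,
}

total_cells = len(list(product(products, stores, months)))  # every possible combination
density = len(cells) / total_cells                          # share of non-empty cells


def project(cube, keep):
    """Reduce dimensionality: keep the dimensions at the given positions, sum the rest."""
    result = {}
    for coords, value in cube.items():
        key = tuple(coords[i] for i in keep)
        result[key] = result.get(key, 0.0) + value
    return result


by_product = project(cells, keep=(0,))  # the 3-D cube viewed along one dimension
```

Adding a fourth dimension would multiply `total_cells` while the number of stored cells stays about the same, which is why cubes become sparse at high dimensionality.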

Enterprise-Wide Strategy

- A comprehensive warehouse is built initially
- An initial dependent data mart is built using a subset of the data in the warehouse
- Additional data marts are built using subsets of the data in the warehouse
- Like all complex projects, it is expensive, time consuming, and prone to failure
- When successful, it results in an integrated, scalable warehouse

Data Scientist

A data scientist is a specialist who applies expertise in statistics and in building machine learning models to make predictions and answer key business questions.
Tasks:
- Evaluating statistical models to determine the validity of analyses.
- Using machine learning to build better predictive algorithms.
- Testing and continuously improving the accuracy of machine learning models.
- Building data visualizations to summarize the conclusion of an advanced analysis.

Fact Tables

A fact table contains a large number of rows that correspond to observed business events or facts. A fact table contains the attributes needed to perform decision analysis, descriptive attributes used for query reporting, and foreign keys to link to dimension tables. The decision analysis attributes consist of performance measures, operational metrics, aggregated measures, and all the other metrics needed to analyze the organization's performance. In other words, the fact table primarily addresses what the data warehouse supports for decision analysis.

Geographical Information System (GIS)

A geographic information system (GIS) is a system designed to capture, store, manipulate, analyze, manage, and present all types of spatial or geographical data. In a general sense, the term describes any information system that integrates, stores, edits, analyzes, shares, and displays geographic information. GIS applications are tools that allow users to create interactive queries (user-created searches), analyze spatial information, edit data in maps, and present the results of all these operations. Geographic information science is the science underlying geographic concepts, applications, and systems.

Dimensional Modeling

A retrieval-based system that supports high-volume query access. The dimensional modeling database design technique was conceived specifically for data warehouse design, answering the limitations found in traditional approaches to database design. The approach was initially defined by Kimball (1996). It was a fundamental development for data warehousing, since it provided a viable way to represent data so that end users can understand and query them.

Dependent Data Mart

A subset that is created directly from the data warehouse. It has the advantages of using a consistent data model and providing quality data. Dependent data marts support the concept of a single enterprise-wide data model, but the data warehouse must be constructed first. A dependent data mart ensures that the end user is viewing the same version of the data that is accessed by all other data warehouse users.

Operational Data Stores (ODS)

A type of database often used as an interim area for a data warehouse. Unlike the static content of a data warehouse, the contents of an ODS are updated through the course of business operations. An ODS is used for short-term decisions involving mission-critical applications rather than for the medium- and long-term decisions associated with an EDW. An ODS consolidates data from multiple source systems and provides a near-real-time, integrated view of volatile, current data.

Star Schema Advantages and Disadvantages

Advantages:
- Easily understandable, even by non-technical users
- Supports increased performance and lower query times
- Easily extensible, answering the needs for future changes
Disadvantage: more redundancy.
Characteristics: less complexity; dimensional schema; denormalized schema; Kimball's reference.

Snowflake Schema Advantages and Disadvantages

Advantage: less redundancy.
Disadvantage: more complexity.
Characteristics: relational schema; normalized schema; Inmon's reference.

Cube: Aggregates

An aggregate is a value formed by combining values from a given dimension or set of dimensions to create a single value. This is often done by adding the values together using the sum aggregate, but other aggregation calculations can also be used.

Enterprise Data Warehouse (EDW)

An enterprise data warehouse (EDW) is a large-scale data warehouse that is used across the enterprise for decision support. The large-scale nature provides integration of data from many sources into a standard format for effective BI and decision support applications. EDWs are used to provide data for many types of DSS, including CRM, SCM, BPM, BAM, PLM, KMS, etc.

Oper Marts

An operational data mart. An oper mart is a small-scale data mart typically used by a single department or functional area in an organization. Oper marts are created when operational data need to be analyzed multidimensionally. The data for an oper mart comes from an ODS

Big Data

- Big Data is data that cannot be stored or processed easily using traditional tools/means
- Big Data typically refers to data that comes in many different forms: large, structured, unstructured, continuous
- 3Vs: Volume, Variety, Velocity
- Data (Big Data or otherwise) is worthless if it does not provide business value (and for it to provide business value, it has to be analyzed)

BI and BA

BI involves the acquisition of data and information (eventually knowledge) from a significant variety of sources, promoting its organization in data warehouses and its use to support decision making. Business Analytics supplies the analysis models and the procedures for Business Intelligence.

A Framework for Business Intelligence

- BI is an evolution of decision support concepts over time
- Meaning of EIS/DSS. Then: Executive Information System. Now: Everybody's Information System (BI)
- BI systems are enhanced with additional visualizations, alerts, and performance measurement capabilities
- The term BI emerged from industry apps

Text Mining Concepts

The benefits of text mining are obvious, especially in text-rich data environments, e.g., law (court orders), academic research (research articles), finance (quarterly reports), medicine (discharge summaries), biology (molecular interactions), technology (patent files), marketing (customer comments), etc.
Electronic communication records (e.g., e-mail):
- Spam filtering
- E-mail prioritization and categorization
- Automatic response generation

Data Mining vs. Text Mining

- Both seek novel and useful patterns
- Both are semi-automated processes
- The difference is the nature of the data: structured versus unstructured
- Structured data: in databases
- Unstructured data: Word documents, PDF files, text excerpts, XML files, and so on
- To perform text mining, first impose structure on the data, then mine the structured data

Dimensions

Core of multidimensional databases. Dimensions are used for:
- Selection of data
- Grouping of data at the right level of detail
Dimensions consist of dimension values. Dimension values may have an ordering, which is used for comparing cube data across values and is especially relevant for the time dimension.

Data Engineer

Data engineers build and optimize the systems that allow data scientists and analysts to perform their work. Every company depends on its data being accurate and accessible to the individuals who need to work with it. The data engineer ensures that any data is properly received, transformed, stored, and made accessible to other users.
Tasks:
- Building APIs for data consumption.
- Integrating external or new datasets into existing data pipelines.
- Applying feature transformations for machine learning models on new data.
- Continuously monitoring and testing the system to ensure optimized performance.

Data Facts

Data stored grows 4X faster than the world economy. 90% of the world's data has been created in the last 2 years.

Big Data and Stream Analytics

- Data-in-motion analytics and real-time data analytics -> one of the Vs in Big Data = Velocity
- Analytic process of extracting actionable information from continuously flowing data
- Why stream analytics? -> It may not be feasible to store the data, or the data may lose its value
- Stream analytics versus perpetual analytics
- Critical event processing?

Two Data Warehousing Strategies

- Enterprise-wide warehouse, top down, the Inmon methodology
- Data mart, bottom up, the Kimball methodology

Types of Facts

- Event fact (transaction): a fact for every business event (e.g., a sale)
- "Fact-less" fact: a fact per event (e.g., a customer contact) with no numerical measures; it records that an event has happened for a given combination of dimension values
- Snapshot fact: a fact for every dimension combination at given time intervals; captures current status (e.g., inventory)
- Cumulative snapshot fact: a fact for every dimension combination at given time intervals; captures cumulative status up to now (e.g., sales in year to date)

Facts

Facts represent the subject of the desired analysis. A fact is most often identified via its dimension values. Generally a fact should:
- be attached to exactly one dimension value in each dimension
- only be attached to dimension values in the bottom levels

Location-Based Analytics

- Geospatial analytics
- Geocoding
- Visual maps
- Postal codes
- Latitude & longitude
- Enables an aggregate view of a large geographic area
- Integrates "where" into the customer view

Location Intelligence (LI)

Interactive maps that further drill down to details about any location

Measures

Measures represent the fact property that the users want to study and optimize. A measure has two components: a numerical value, and an aggregation formula (e.g. SUM) used for aggregating/combining a number of measure values into one.
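The two components of a measure can be sketched in Python as a name paired with an aggregation function. The `Measure` class and the example measures are hypothetical; the source only names SUM, so the use of an average for unit price is an added assumption to show that not every measure should be summed.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Sequence


@dataclass(frozen=True)
class Measure:
    """A measure: a named numerical property plus its aggregation formula."""
    name: str
    aggregate: Callable[[Sequence[float]], float]

    def combine(self, values: Sequence[float]) -> float:
        """Combine several measure values into one using the formula."""
        return self.aggregate(values)


# Sales amounts are additive, so SUM is the natural formula.
sales_amount = Measure("sales_amount", sum)
# Assumption: a unit price is not additive, so it is averaged instead.
unit_price = Measure("unit_price", mean)

total_sales = sales_amount.combine([100, 50, 70])
average_price = unit_price.combine([10, 20, 30])
print(total_sales, average_price)
```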

Granularity

Level of Detail: given by the combination of bottom levels. Example: Total Sales per Store per Day per Product. Important for scalability. Often the granularity is a single business transaction (example: a sale). Sometimes the data is aggregated.

Online Analytical Processing (OLAP)

Manipulation of information to create business intelligence in support of strategic decision making -> An information system that enables the user, while at a PC, to query the system, conduct an analysis, and so on. The result is generated in seconds Data warehouses Goal: decision support

Real-Time Location Intelligence

Many devices are constantly sending out their location information Cars, airplanes, ships, mobile phones, cameras, navigation systems, ... GPS, Wi-Fi, RFID, cell tower triangulation Reality mining? Real-time location information = real-time insight Path Intelligence (pathintelligence.com) Footpath - movement patterns within a city or store How to use such movement information

A Brief History of BI

It is often said that the term was used as early as September 1996, when a Gartner Group report said: "Data analysis, reporting, and query tools can help business users wade through a sea of data to synthesize valuable information from it - today these tools collectively fall into a category called 'Business Intelligence'." "A Business Intelligence System": a paper written by H. P. Luhn in the IBM Journal.

IDC Futurescape Predictions

Prediction 1: By 2023, commoditization of ever-higher layers of analytics and AI technology will result in 15% of the current spend on these items being replaced by spend on insights as a service.
Prediction 2: By 2024, annual 7% rise in AI-based IT implementation project automation will drive a new wave of business process redesign, requiring services from firms with deep industry and functional expertise.
Prediction 3: By 2021, algorithm opacity, decision bias, malicious use of AI, and data regulations will result in doubling of spending on relevant governance and compliance staff and explainability teams.
Prediction 4: By 2022, reflecting the need for localized data processing and enabled by 5G, 25% of endpoint devices and systems will contain AI algorithms, driving two-thirds of the total annually shipped compute power.
Prediction 5: By 2024, AI-enabled human-computer interfaces and business process automation will replace a third of today's screen-based B2B and B2C applications.
Prediction 6: By 2021, 65% of new spending in analytics will use an event-driven architecture and streaming pipelines to ingest data, process data, evaluate and score predictions, make decisions, and initiate actions.
Prediction 7: By 2023, growth in demand for the management of data in multiple formats will cause spending on multimodel databases to represent 25% of the spending on NoSQL databases.
Prediction 8: By 2021, synthetic AI model training data created using small amounts of actual data and large amounts of simulated data will be available via data markets, doubling new AI models development speed.
Prediction 9: By 2022, affective computing (emotion AI) will include vision and voice technologies and see an increase of 25% in real-world applications.
Prediction 10: By 2022, 75% of IT operations will be supplanted by AI or analytics-driven automation, resulting in over 25% opex savings.

Descriptive Analytics

Questions What happened? What is happening? Enablers Business reporting Dashboards Scorecards Data warehousing OLAP / DW Descriptive or reporting analytics Answering the question of what happened Retrospective analysis of historic data

Prescriptive Analytics

Questions: What should I do? Why should I do it? Enablers: Optimization, Simulation, Decision modelling, Expert systems. Aims to determine the best possible decision. Uses both descriptive and predictive analytics to create the alternatives, and then determines the best one. Analytics Applied to Many Domains. Analytics or Data Science?

Predictive Analytics

Questions: What will happen? Why will it happen? Enablers Data mining Text Mining Web/media mining Forecasting Aims to determine what is likely to happen in the future (foreseeing the future events) Looking at the past data to predict the future

OLAP Operations

Slice: a subset of a multidimensional array
Dice: a slice on more than two dimensions
Drill-Down: navigating among levels of data ranging from the most summarized to the most detailed
Roll-Up: computing all of the data relationships for one or more dimensions
Pivot: used to change the dimensional orientation of a report or an ad hoc query-page display
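The operations above can be sketched on a tiny in-memory cube. This is an illustrative Python sketch, not how an OLAP server is implemented: the cube is a plain dictionary keyed by (year, quarter, region, product), the sales figures are invented, and the function names are assumptions. Roll-up is shown as a SUM over the dimensions that are dropped, and drill-down as the same function called with a finer level kept.

```python
from collections import defaultdict

DIMS = ("year", "quarter", "region", "product")

# Hypothetical cube cells: (year, quarter, region, product) -> sales
cells = {
    (2023, "Q1", "North", "widget"): 10,
    (2023, "Q1", "South", "widget"): 20,
    (2023, "Q2", "North", "gadget"): 30,
    (2024, "Q1", "North", "widget"): 40,
    (2024, "Q1", "South", "gadget"): 50,
}


def dice(cube, **allowed):
    """Keep cells whose value on each named dimension is in the allowed set."""
    index = {dim: i for i, dim in enumerate(DIMS)}
    return {
        key: value
        for key, value in cube.items()
        if all(key[index[dim]] in values for dim, values in allowed.items())
    }


def slice_(cube, dim, value):
    """Fix a single dimension to a single value."""
    return dice(cube, **{dim: {value}})


def roll_up(cube, keep):
    """Aggregate (SUM) the cube down to the dimensions listed in `keep`."""
    positions = [DIMS.index(dim) for dim in keep]
    totals = defaultdict(int)
    for key, value in cube.items():
        totals[tuple(key[p] for p in positions)] += value
    return dict(totals)


def pivot(table):
    """Swap the two dimensions of a two-dimensional result."""
    return {(b, a): value for (a, b), value in table.items()}


by_year = roll_up(cells, ["year"])                      # most summarized
by_year_quarter = roll_up(cells, ["year", "quarter"])   # drill-down one level
north_only = slice_(cells, "region", "North")
diced = dice(cells, year={2024}, region={"South"}, product={"gadget"})
region_by_year = pivot(roll_up(cells, ["year", "region"]))
print(by_year, by_year_quarter, region_by_year, sep="\n")
```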

Characteristics of Data Warehousing

Subject oriented - data are organized by detailed subject, such as sales, products, or customers, containing information relevant for decision support. For example, to learn more about your company's sales data, you can build a warehouse that concentrates on sales. Using this warehouse, you can answer questions like "Who was our best customer for this item last year?" This ability to define a data warehouse by subject matter, sales in this case, makes the data warehouse subject oriented.
Integrated - data are gathered into the data warehouse from a variety of sources and merged into a coherent whole. Integration is closely related to subject orientation. Data warehouses must put data from disparate sources into a consistent format. They must resolve such problems as naming conflicts and inconsistencies among units of measure. When they achieve this, they are said to be integrated.
Time-variant - in order to discover trends in business, analysts need large amounts of data. This is very much in contrast to online transaction processing (OLTP) systems, where performance requirements demand that historical data be moved to an archive. A data warehouse's focus on change over time is what is meant by the term time variant.
Nonvolatile - once entered into the warehouse, data should not change. This is logical because the purpose of a warehouse is to enable you to analyze what has occurred.

Dimension Tables

Surrounding the central fact tables (and linked via foreign keys) are dimension tables. The dimension tables contain classification and aggregation information about the central fact rows. Dimension tables contain attributes that describe the data contained within the fact table; they address how the data will be analyzed. Dimension tables have a one-to-many relationship with rows in the central fact table. Some examples of dimensions that would support a product fact table are location, time, and size.
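A minimal sketch of this layout, using Python's built-in sqlite3 module with an in-memory database. The table names, column names, and rows are invented for illustration; only the shape (a central fact table with foreign keys into location and time dimension tables) follows the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes used to classify and aggregate
CREATE TABLE dim_location (
    location_id INTEGER PRIMARY KEY,
    city        TEXT,
    country     TEXT
);
CREATE TABLE dim_time (
    time_id INTEGER PRIMARY KEY,
    day     TEXT,
    year    INTEGER
);
-- Central fact table: foreign keys to each dimension plus a measure
CREATE TABLE fact_sales (
    location_id INTEGER REFERENCES dim_location(location_id),
    time_id     INTEGER REFERENCES dim_time(time_id),
    amount      REAL
);
INSERT INTO dim_location VALUES
    (1, 'Lisbon', 'Portugal'), (2, 'Porto', 'Portugal'), (3, 'Madrid', 'Spain');
INSERT INTO dim_time VALUES
    (1, '2024-01-01', 2024), (2, '2024-01-02', 2024);
INSERT INTO fact_sales VALUES
    (1, 1, 100.0), (1, 2, 50.0), (2, 1, 70.0), (3, 2, 30.0);
""")

# One dimension row matches many fact rows, so joining and grouping by a
# dimension attribute aggregates the facts along that dimension.
rows = conn.execute("""
    SELECT l.country, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_location AS l ON f.location_id = l.location_id
    GROUP BY l.country
    ORDER BY l.country
""").fetchall()
print(rows)
```

Grouping by `city` instead of `country` in the same query would give the finer level of the location hierarchy without touching the fact table.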

Data Mart Strategy

The most common approach Begins with a single mart and architected marts are added over time for more subject areas Relatively inexpensive and easy to implement Can be used as a proof of concept for data warehousing Can perpetuate the "silos of information" problem Can postpone difficult decisions and activities Requires an overall integration plan

Geographic Information System (GIS)

Used to capture, store, analyze, and manage the data linked to a location Combined with integrated sensor technologies and global positioning systems (GPS)

Text Mining Application Area

• Information extraction • Topic tracking • Summarization • Categorization • Clustering • Concept linking • Question answering

Challenges in NLP

• Part-of-speech tagging • Text segmentation • Word sense disambiguation • Syntax ambiguity • Imperfect or irregular input • Speech acts

NLP Task Categories

• Question answering • Automatic summarization • Natural language generation & understanding • Machine translation • Foreign language reading & writing • Speech recognition • Text proofing • Optical character recognition

