K201 Chapter 3 Q3.1-3.8

Two human factors that inhibit knowledge sharing in organizations

1. Employees can be reluctant to exhibit their ignorance. 2. Employee competition.

BI server

A Web server application that is purpose-built for publishing business intelligence. EX: Microsoft SQL Server Report Manager (part of Microsoft SQL Server Reporting Services) is the most popular such product today.

Cluster analysis

A common unsupervised technique. Statistical techniques identify groups of entities that have similar characteristics. A common use for _____ is to find groups of similar customers from customer order and demographic data.
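
As a rough illustration of finding groups of similar customers, here is a minimal k-means sketch in pure Python. The card does not name a specific clustering method, so k-means is an assumption, as are the customer figures, the choice of two clusters, and the function name `k_means`.

```python
def k_means(points, centers, rounds=10):
    """Group 2-D points around the nearest center (plain k-means)."""
    clusters = [[] for _ in centers]
    for _ in range(rounds):
        # Assignment step: put each point with its closest center.
        clusters = [[] for _ in centers]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centers]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = []
        for c, members in zip(centers, clusters):
            if members:
                new_centers.append((sum(m[0] for m in members) / len(members),
                                    sum(m[1] for m in members) / len(members)))
            else:
                new_centers.append(c)  # keep an empty cluster's center in place
        centers = new_centers
    return centers, clusters

# Hypothetical customers: (age, orders per year)
customers = [(22, 30), (25, 28), (24, 35), (61, 4), (58, 6), (65, 3)]
centers, clusters = k_means(customers, centers=[customers[0], customers[3]])
```

No hypothesis is supplied up front; the analyst looks at the resulting groups (here, younger frequent buyers and older infrequent buyers) and explains them afterward.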

Data mart

A data collection, smaller than the data warehouse, that addresses the needs of a particular department or functional area of the business. If the data warehouse is the distributor in a supply chain, then a _____ is like a retail store in a supply chain. Users in the _____ obtain data that pertain to a particular business function from the data warehouse. Such users do not have the data management expertise that data warehouse employees have, but they are knowledgeable analysts for a given business function.

Data warehouse

A facility for managing an organization's BI data. The functions of a data warehouse are to: obtain data, cleanse data, organize and relate data, catalog data. The prepared data is stored in a ______ database. ________ include data that is purchased from outside sources. The _____ extracts and provides data to BI applications. Think of a ______ as a distributor in a supply chain. The ______ takes data from the data manufacturers (operational systems and other sources), cleans and processes the data, and locates the data on the shelves, so to speak, of the ______. The data analysts who work with a data warehouse are experts at data management, data cleaning, data transformation, data relationships, and the like. However, they are not usually experts in a given business function.

Turing test

A machine could be considered intelligent if a human could have a conversation with it and not be able to tell whether it was a machine or a human. This standard became known as the ______. A weak AI could pass this test, because the test checks only for intelligent action, which is what most AI research aims at.

OLAP (online analytical processing)

A second type of reporting application; is more generic than RFM. _______ provides the ability to sum, count, average, and perform other simple arithmetic operations on groups of data. The defining characteristic of _______ reports is that they are dynamic. The viewer of the report can change the report's format—hence the term online. An ______ report has measures and dimensions. The distinguishing characteristic of an ______ report is that the user can alter the format of the report. With an _____ report, it is possible to drill down into the data. ______ reports provide both perspectives (product vs. sales manager), and the user can switch between them while viewing the report.
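
The ideas of measure, dimension, switching perspective, and drill down can be sketched in a few lines of Python. This is only an illustration with hypothetical sales rows and a hypothetical helper `olap_sum`; real OLAP products are far more capable.

```python
from collections import defaultdict

# Hypothetical sales rows: (product, region, city, sales)
rows = [
    ("Widget", "East", "Boston", 100),
    ("Widget", "East", "New York", 150),
    ("Widget", "West", "Seattle", 80),
    ("Gadget", "East", "Boston", 60),
    ("Gadget", "West", "Seattle", 40),
]
FIELDS = {"product": 0, "region": 1, "city": 2}

def olap_sum(rows, dimensions):
    """Sum the measure (sales) grouped by the chosen dimensions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[FIELDS[d]] for d in dimensions)
        totals[key] += row[3]
    return dict(totals)

by_product = olap_sum(rows, ["product"])          # one perspective
by_region = olap_sum(rows, ["region"])            # same data, another perspective
drill_down = olap_sum(rows, ["region", "city"])   # region divided into cities
```

Here sales is the measure, and product, region, and city are dimensions. Changing the `dimensions` argument plays the role of the viewer changing the report's format.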

Machine Learning

A subset of AI. The extraction of knowledge from data based on algorithms created from training data. Essentially, _______ is focused on predicting outcomes based on previously known training data. ______ can also help you make decisions. Can be used in a variety of tasks. Learning the same way you do—through experience. _______ allows automated systems to learn from users as they tag emails as spam and then filters future emails based on the content of those spam messages. Again, it's not perfect, but it's amazingly accurate.

RFM analysis

A technique readily implemented with basic reporting operations to analyze and rank customers according to their purchasing patterns. Stands for recency, frequency, monetary and the analysis is a marketing technique used to determine quantitatively which customers are the best ones by examining how recently a customer has purchased (recency), how often they purchase (frequency), and how much the customer spends (monetary). See page 79 for more in-depth explanation.
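
A minimal sketch of RFM scoring follows. It assumes one common convention that the card does not spell out: customers are sorted on each measure and split into five groups, scored 1 (best 20%) to 5 (worst 20%). The customer data and the helper `quintile_scores` are hypothetical.

```python
def quintile_scores(values, best_is_high):
    """Score each customer 1 (best 20%) to 5 (worst 20%) on one measure."""
    # Order customers from best to worst on this measure.
    order = sorted(values, key=values.get, reverse=best_is_high)
    # Rank position i of n falls in quintile (i * 5 // n) + 1.
    return {name: (i * 5) // len(order) + 1 for i, name in enumerate(order)}

# Hypothetical customers: days since last order, orders per year, average order value
days_since = {"Ann": 3, "Bob": 40, "Cy": 200, "Di": 10, "Ed": 90}
orders = {"Ann": 50, "Bob": 5, "Cy": 1, "Di": 20, "Ed": 12}
avg_value = {"Ann": 20, "Bob": 900, "Cy": 50, "Di": 300, "Ed": 100}

r = quintile_scores(days_since, best_is_high=False)  # fewer days is better
f = quintile_scores(orders, best_is_high=True)
m = quintile_scores(avg_value, best_is_high=True)
rfm = {name: (r[name], f[name], m[name]) for name in days_since}
```

Under these assumptions Ann scores (1, 1, 5): she has ordered recently and orders often, but spends little per order.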

OLAP cube / cube

An _____ and an OLAP report are the same thing. The reason for this term is that some software products show these displays using three axes, like a _____ in geometry.

Naïve Bayes Classifier

An algorithm that predicts the probability of a certain outcome based on prior occurrences of related events. In other words, we're going to try to predict whether a new email is spam or not based on attributes of previous spam messages.
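
To make the spam example concrete, here is a small Naive Bayes sketch in pure Python. The training emails, the labels "spam" and "ham", the add-one smoothing, and the function names are all assumptions made for illustration; a production filter would use far more data and features.

```python
import math
from collections import Counter

def train(messages):
    """Count word occurrences per label from (text, label) pairs."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def spam_probability(text, word_counts, label_counts):
    """Return P(spam | words), using Bayes' rule with add-one smoothing."""
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    log_scores = {}
    for label in ("spam", "ham"):
        total_words = sum(word_counts[label].values())
        # Prior: share of training messages with this label.
        score = math.log(label_counts[label] / sum(label_counts.values()))
        for word in text.lower().split():
            # Likelihood of each word given the label (smoothed).
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        log_scores[label] = score
    spam, ham = math.exp(log_scores["spam"]), math.exp(log_scores["ham"])
    return spam / (spam + ham)

# Hypothetical training emails that users have already tagged
training = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]
counts, labels = train(training)
p_spam = spam_probability("free money", counts, labels)
p_ham_like = spam_probability("monday meeting", counts, labels)
```

The "naive" part is the assumption that words occur independently given the label, which is why the per-word likelihoods can simply be multiplied (added, in log form).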

Hadoop

An open source program supported by the Apache Foundation that manages thousands of computers and that implements MapReduce. At present, deep technical skills are needed to run and use ____. For now, understand that experts are required to use it; you may be involved, however, in planning a Big Data study or in interpreting results.

Unsupervised Data Mining

Analysts do not create a model or hypothesis before running the analysis. Instead, they apply a data mining application to the data and observe the results. With this method, analysts create hypotheses after the analysis, in order to explain the patterns found. These findings are obtained solely by data analysis. There is no prior model about the patterns and relationships that exist. It is up to the analyst to form hypotheses, after the fact, to explain why such and such does so and so.

Reduce phase

As the processors finish, their results are combined in what is referred to as the ______. The result is a list of all the terms searched for on a given day and the count of each. The process is considerably more complex than described here, but this is the gist of the idea.

Static reports

BI documents that are fixed at the time of creation and do not change. In the BI context, most _____ are published as PDF documents. Little skill is needed. EX: A printed sales analysis

Dynamic reports

BI documents that are updated at the time they are requested. In almost all cases, publishing a dynamic report requires the BI application to access a database or other data source at the time the report is delivered to the user. Is more difficult; it requires the publisher to set up database access when documents are consumed. EX: A sales report that is current at the time the user accessed it on a Web server

Primary Publishing Alternatives

BI servers, knowledge management systems, and content management systems.

MapReduce

Because Big Data is huge, fast, and varied, it cannot be processed using traditional techniques. ______ is a technique for harnessing the power of thousands of computers working in parallel. Contain both the map and reduce phases.
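
The map and reduce phases can be imitated on a single machine to show the idea. This sketch is not the Hadoop API; the search-log pieces and the function names `map_phase` and `reduce_phase` are hypothetical, and the list comprehension stands in for work that would run in parallel on many computers.

```python
from collections import Counter
from functools import reduce

# Hypothetical search log, already split into pieces for separate processors
pieces = [
    ["bi server", "hadoop", "olap"],
    ["hadoop", "hadoop", "rfm"],
    ["olap", "bi server"],
]

def map_phase(piece):
    """Each processor counts the search terms in its own piece."""
    return Counter(piece)

def reduce_phase(left, right):
    """Combine the partial counts from two processors."""
    return left + right

partial_counts = [map_phase(p) for p in pieces]   # would run in parallel
totals = reduce(reduce_phase, partial_counts, Counter())
```

The result, `totals`, is the list of all search terms and the count of each.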

Business Intelligence

Data analysis where you don't know the second question to ask until you see the answer to the first one. The patterns, relationships, trends, and predictions produced by BI systems are referred to as _____. Can be used both to determine what is and to determine what should be.

Structured Data

Data in the form of rows and columns; most of the time _____ means tables in a relational database, but it can refer to spreadsheet data as well.

Supervised Data Mining

Data miners develop a model prior to the analysis and apply statistical techniques to data to estimate parameters of the model.

Challenges of Content Management

Databases are huge; content is dynamic; documents do not exist in isolation; contents are perishable; content exists in many languages.

Push publishing

Delivers business intelligence to users without any request from the users; the BI results are delivered according to a schedule or as a result of an event or particular data condition.

Pig

Hadoop's query language

Watson

IBM's artificial intelligence: a question-answering system that draws on several areas of AI.

Forces Driving AI Innovation

Increases in computing power; availability of large data sets for training; cloud scalability and applications; increased device connectivity; new AI techniques; practical AI applications.

Business Intelligence (BI) Systems

Information systems that can produce patterns, relationships, and other information from organizational data (structured and unstructured) as well as from external, purchased data. Can be used to create value for an organization. Are information systems that process operational, social, and other data to identify patterns, relationships, and trends for use by business professionals and other knowledge workers. BI can be buried in data, and the function of a _____ is to extract it and make it available to those who need it. The boundaries of a ____ are blurry.

Content Management Systems (CMS)

Information systems that support the management and delivery of documents including reports, web pages, and other expressions of employee knowledge. _____ face serious challenges. EX: Typical users are companies that sell complicated products and want to share their knowledge of those products with employees and customers.

Types of Business Intelligence Systems

Informing, deciding, problem solving, and project management.

Dimension

Is a characteristic of a measure. EX: purchase date, customer type, customer location, and sales region

Neural Network

Is a computing system modelled after the human brain that is used to predict values and make classifications.

Big Data (analysis)

Is a term used to describe data collections that are characterized by huge volume, rapid velocity, and great variety. Considering volume, _____ refers to data sets that are at least a petabyte in size, and usually larger. High velocity: meaning that it is generated rapidly. (If you know physics, you know that speed would be a more accurate term, but speed doesn't start with a v, and the vvv description has become a common way to describe _____). Varied: ______ may have structured data, but it also may have free-form text, dozens of different formats of Web server and database log files, streams of data about user responses to page content, and possibly graphics, audio, and video files. Can involve both reporting and data mining techniques. The chief difference is, however, that _____ has volume, velocity, and variation characteristics that far exceed those of traditional reporting and data mining.

Process Quality

Is measured by effectiveness and efficiency, and KM can improve both.

Knowledge Management (KM)

Is the process of creating value from intellectual capital and sharing that knowledge with employees, managers, suppliers, customers, and others who need that capital. The goal of _____ is to prevent the kinds of problems just described. ______ was done before social media. Before we turn to specific technologies, however, consider the overall goals and benefits of ____. ____ benefits organizations in two fundamental ways: 1. It improves process quality. 2. It increases team strength. _____ enables employees to share knowledge with each other and with customers and other partners. By doing so, it enables the employees in the organization to better achieve the organization's strategy. At the same time, sharing knowledge enables employees to solve problems more quickly and to otherwise accomplish work with less time and other resources, hence improving process efficiency. The goal of _____ is to enable employees to be able to use knowledge possessed collectively by people in the organization. By doing so, both process quality and team capability improve.

BI application process

Processes the data with reporting applications, data mining applications, and Big Data applications to produce business intelligence for knowledge workers.

Benefits of automated labor

Reduction in labor costs; productivity gains; less employee fraud within the organization; no need for severance packages; no union overhead. Automated workers can't go on strike, don't steal intellectual property, won't file discrimination lawsuits, and can't harass coworkers.

3 techniques for processing BI data

Reporting analysis, data mining analysis, and Big Data.

Pull publishing

Requires the user to request BI results. Publishing media include print as well as online content delivered via Web servers, specialized Web servers known as report servers, automated applications, knowledge management systems, and content management systems.

artificial intelligence (AI)

The ability of a machine to simulate human abilities such as vision, communication, recognition, learning, and decision making in order to achieve a goal. Organizations hope to use ____ to increase the automation of mundane tasks typically done by humans. Is now becoming widely used, and it's becoming a core part of many tech companies' strategic advantage. The overall goal of _____ is to create a machine that can complete the same tasks as a human. It's the ability of a machine to simulate ALL human abilities.

Data mining

The application of statistical techniques to find patterns and relationships among data for classification and prediction. _____ resulted from a convergence of disciplines, including artificial intelligence and machine learning. Most _______ techniques are sophisticated, and many are difficult to use well. Such techniques are valuable to organizations, however. ______ techniques fall into two broad categories: unsupervised and supervised.

Map phase

The basic idea is that the Big Data collection is broken into pieces, and hundreds or thousands of independent processors search these pieces for something of interest. That process is referred to as the ________.

Measure

The data item of interest. It is the item that is to be summed or averaged or otherwise processed in the OLAP report. EX: total sales, average sales, and average cost

Big Data Analysis

The goal is to find patterns and relationships in the enormous amounts of data generated from sources like social media sites or Web server logs. As indicated, _____ techniques can include reporting and data mining as well. Consider the characteristics of each type.

Strong AI vs Weak AI

The goal of AI research is to create artificial general intelligence, or Strong AI: we can develop artificial systems that really think; that can complete all of the same tasks a human can. This includes the ability to process natural language; to sense, learn, interact with the physical world; to represent knowledge; to reason; and to plan. Weak AI: we can develop artificial systems that act intelligently; focused on completing a single specific task. SuperIntelligence: capable of intelligence more advanced than human intelligence. Some researchers see superintelligence as a potential threat to humans.

Granularity

The level of detail represented by the data; can be too fine or too coarse. We need to obtain data that is fine enough for the lowest-level report we want to produce. In general, it is better to have too fine a granularity than too coarse. If the granularity is too fine, the data can be made coarser by summing and combining. If the granularity is too coarse, however, there is no way to separate the data into constituent parts.
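
Making fine data coarser by summing can be sketched as follows. The daily sales rows and the helper `coarsen_to_month` are hypothetical.

```python
from collections import defaultdict

# Hypothetical fine-grained data: one row per (day, store) with sales
daily_sales = [
    ("2024-01-01", "Store A", 120),
    ("2024-01-02", "Store A", 80),
    ("2024-01-01", "Store B", 50),
    ("2024-02-03", "Store A", 200),
]

def coarsen_to_month(rows):
    """Sum daily rows into monthly totals per store (fine -> coarse)."""
    monthly = defaultdict(int)
    for day, store, sales in rows:
        month = day[:7]  # keep only "YYYY-MM"
        monthly[(month, store)] += sales
    return dict(monthly)

monthly_sales = coarsen_to_month(daily_sales)
```

The reverse direction is not possible: a January total of 200 for Store A does not reveal that it came from daily sales of 120 and 80.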

BI analysis

The process of creating business intelligence. The three fundamental categories are reporting, data mining, and Big Data.

automation

The process of making systems operate without human intervention. A good example of an organization benefiting from an _____ system is online banking.

Deep Learning

This multilayered neural network technique was applied to learning tasks and is now commonly known as ______. Has greatly increased the accuracy and practical usefulness of AI.

Data Mining Analysis

Used primarily for classifying and predicting.

Reporting Analysis

Used to create information about past performance. The process of sorting, grouping, summing, filtering, and formatting structured data.

Subscriptions

User requests for particular BI results on a particular schedule or in response to particular events. EX: A user can subscribe to a daily sales report, requesting that it be delivered each morning; OR the user might request that RFM analyses be delivered whenever a new result is posted on the server; OR a sales manager might subscribe to receive a sales report whenever sales in his region exceed $1M during the week.
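
One way to picture schedule-based and event-based subscriptions is as a list of users, each with a condition that decides when a report is delivered. This is only a sketch; the event format, the user names, and the function `due_deliveries` are hypothetical and do not correspond to any particular BI server.

```python
def due_deliveries(subscriptions, event):
    """Return the users whose subscription condition matches this event."""
    return [s["user"] for s in subscriptions if s["condition"](event)]

subscriptions = [
    # Delivered on a schedule: every morning's daily run
    {"user": "analyst",
     "condition": lambda e: e["type"] == "daily_schedule"},
    # Delivered on a data condition: weekly regional sales above $1M
    {"user": "sales_manager",
     "condition": lambda e: e["type"] == "weekly_sales" and e["amount"] > 1_000_000},
]

big_week = due_deliveries(subscriptions, {"type": "weekly_sales", "amount": 1_200_000})
slow_week = due_deliveries(subscriptions, {"type": "weekly_sales", "amount": 900_000})
morning = due_deliveries(subscriptions, {"type": "daily_schedule"})
```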

Metadata

______ concerning the data—its source, its format, its assumptions and constraints, and other facts about the data—is kept in a data warehouse _____ database.

Successful Teams

_______ not only accomplish their assigned tasks, but they also grow in capability, both as a team and as individuals. By sharing knowledge, team members learn from one another, avoid making repetitive mistakes, and grow as business professionals.

Reporting Application

a BI application that inputs data from one or more sources and applies reporting operations to that data to produce business intelligence

Corpus of knowledge

a large set of related data and texts

a problem

a perceived difference between what is and what ought to be

Algorithm

a set of procedures used to solve a mathematical problem that best fits our situation

decision support systems

an older term for decision-making BI systems, used by people who define BI systems as supporting decision making only

3 primary activities in the BI process

acquire data, perform analysis, publish results

Possible problems w/ source data

dirty data, missing values, inconsistent data, data not integrated, wrong granularity, too much data

BI systems: 5 standard components

hardware, software, data, procedures, people

2 Functions of a BI service

management and delivery

Regression analysis

measures the impact of a set of variables on another variable
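
For a single explanatory variable, the impact can be estimated with an ordinary least squares line. The sketch below is a simplification (the card speaks of a set of variables, and this uses only one); the data and the function name `fit_line` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: advertising spend vs. sales (both in $ thousands)
ad_spend = [1, 2, 3, 4, 5]
sales = [12, 14, 16, 18, 20]
intercept, slope = fit_line(ad_spend, sales)
```

In this made-up data set the slope is the estimated impact: each additional $1K of advertising goes with $2K more in sales.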

Knowledge Workers

non-specialist users of BI results. EX: A loan approval officer at a bank is a _____ but not a BI user.

Purchased data

often contains missing elements

Dirty data

problematic data. Examples are a value of B for customer gender and of 213 for customer age. Other examples are a value of 999-999-9999 for a U.S. phone number, a part color of gren, and an email address of [email protected]. All of these values can be problematic for BI purposes.
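
A simple way to catch values like these is to check each field against a rule. The rules below (allowed gender codes, an age range, a phone pattern, a color list) are assumptions chosen only to flag the examples on this card; a real system would define its own.

```python
import re

# Assumed validation rules for a hypothetical customer/parts system
VALID_GENDERS = {"M", "F"}
VALID_COLORS = {"red", "green", "blue"}
PHONE_PATTERN = re.compile(r"^\d{3}-\d{3}-\d{4}$")

def find_problems(record):
    """Return the names of fields whose values look dirty."""
    problems = []
    if record["gender"] not in VALID_GENDERS:
        problems.append("gender")
    if not 0 <= record["age"] <= 120:
        problems.append("age")
    phone = record["phone"]
    # A well-formed but obviously fake number is still dirty.
    if not PHONE_PATTERN.match(phone) or phone == "999-999-9999":
        problems.append("phone")
    if record["part_color"] not in VALID_COLORS:
        problems.append("part_color")
    return problems

dirty = {"gender": "B", "age": 213, "phone": "999-999-9999", "part_color": "gren"}
clean = {"gender": "F", "age": 34, "phone": "812-555-0142", "part_color": "green"}
```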

Exception Reports

reports produced when something out of predefined bounds occurs. EX: A hospital might want an ______ showing which doctors are prescribing more than twice the amount of pain medication prescribed by the average doctor. This could help the hospital reduce the potential for patient addiction to pain medications.
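
The hospital example can be sketched as a filter against a predefined bound. The prescription counts and the function name `exception_report` are hypothetical.

```python
def exception_report(prescriptions, factor=2.0):
    """List doctors whose count exceeds `factor` times the average."""
    average = sum(prescriptions.values()) / len(prescriptions)
    return sorted(name for name, count in prescriptions.items()
                  if count > factor * average)

# Hypothetical monthly pain-medication prescriptions per doctor
prescriptions = {"Dr. A": 10, "Dr. B": 12, "Dr. C": 8, "Dr. D": 70, "Dr. E": 10}
flagged = exception_report(prescriptions)
```

With these numbers the average is 22, so the bound is 44 and only Dr. D appears on the report. When no doctor exceeds the bound, the list is empty and nothing needs to be published.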

Business Intelligence Users

specialists in data analysis

Natural Language Processing (NLP)

the ability of a computer system to understand spoken human language, to answer questions

Curse of Dimensionality

the more attributes there are, the easier it is to build a model that fits the sample data but that is worthless as a predictor

Source data for a BI system

the organization's own operational data, social media data, data that the organization purchases from data vendors, employee knowledge

Publish Results

the process of delivering business intelligence to the knowledge workers who need it. See the bottom of page 72 for specific examples.

Data Acquisition

the process of obtaining, cleaning, organizing, relating, and cataloging source data

BI application

the software component of a BI system

Drill down

to further divide the data into more detail

