Business Intelligence Mid-Term E

What is it called when an individual's problem solving capability is limited when a wide range of diverse information and knowledge is required?

Cognitive Limits

What is a system that stores data tables as sections of columns of data rather than as rows of data?

Columnar Database / Column Oriented Database Management Systems **much finer grain of control**

What is the EDW to support all decision analysis by providing relevant summarized and detailed information originating from many different sources?

Comprehensive Database

What focuses on a specific industry sector and builds on its existing relationships in that industry through niche platforms and services for data collection?

Data Service Providers

When would the data be transformed for better processing and aggregated?

Data Transformation

What contains a wide variety of data that presents a coherent picture of business conditions at a single point in time?

Data Warehouse

What is a discipline that results in applications that provide decision support capability, allows ready access to business information, and creates business insight?

Data Warehouse

What is a pool of data produced to support decision making and is a repository of current and historical data of potential interest to managers throughout the organization?

Data Warehouse

What is a subject oriented, integrated, time variant, nonvolatile collection of data in support of management's decision making process?

Data Warehouse

Who should possess solid business insight and be familiar with high performance software, hardware, and networking technologies?

Data Warehouse Administrator

What consists of an integrated set of servers, storage, operating systems, database management systems, and software specifically preinstalled and preoptimized for data warehousing?

Data Warehouse Appliances

What provides solutions for the mid-size to Big Data warehouse market, offering low-cost performance on data volumes in the terabyte to petabyte range?

Data Warehouse Appliances (low cost of ownership)

What are companies that include their own hardware to provide efficient data storage, retrieval, and processing?

Data Warehouse Providers i.e. IBM, Oracle, and Teradata

What describes where the company wants to go, why it wants to go there, and what it will do when it gets there?

Data Warehousing Strategy

What means that data are easily and readily obtainable?

Data accessibility Answers the question "Can we easily get the data when we need to?"

When would the data be cleaned and the values identified and dealt with?

Data cleaning

What means that the data are accurately collected and combined/merged?

Data consistency

When would relevant data be collected from identified sources, necessary records and variables be selected, and the records coming from multiple data sources be integrated and merged?

Data consolidation

What means that data are correct and are a good match for the analytics problem?

Data content accuracy Answers the question "Do we have the right data for the job?"

What means that the data should be up to date for a given analytics model and is recorded at or near the time of the event or observation so that time-delay-related misrepresentation of the data is prevented?

Data currency/data timeliness

What is the evolution of decision support, business intelligence, and analytics?

Decision Support Systems --> Enterprise/Executive Information Systems --> Business Intelligence --> Analytics --> Big Data

What conveniently organizes information and knowledge in a systematic, tabular manner to prepare it for analysis?

Decision Tables

What divides a training set until each division consists entirely or primarily of examples for one class?

Decision Tree

What shows the relationships of the problem graphically and can handle complex situations in a complex form?

Decision Tree

What classifies data into a finite number of classes based on the values of the input variables?

Decision Trees

What includes many input variables / attributes that may have an impact on the classification of different patterns?

Decision Trees

What is a hierarchy of if then statements and are thus significantly faster than neural networks?

Decision Trees -Classify data into a finite number of classes based on the values of the input variables
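
As a minimal sketch of the "hierarchy of if-then statements" idea, the function below hard-codes a tiny tree. The attribute names (`income`, `has_collateral`), the threshold, and the class labels are all hypothetical and chosen only for illustration; a real tree would be learned from training data.

```python
def classify_loan(income, has_collateral):
    """Classify an applicant into one of three classes with nested if-then rules.

    The rules and the 50000 threshold are made up for illustration.
    """
    if income >= 50000:          # root node: test the first input variable
        return "approve"
    if has_collateral:           # second-level node: test another variable
        return "review"
    return "reject"              # leaf reached when both tests fail


decision_high_income = classify_loan(60000, False)
decision_collateral = classify_loan(20000, True)
decision_neither = classify_loan(20000, False)
```

Classifying a new case only requires walking a few comparisons from the root to a leaf, which is why such a model is cheap to evaluate.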

What describe alternative courses of action?

Decision Variables

What are some terms that are content free expressions and there is no universally accepted definition?

Decision support system, management information system,

What are examples of an enterprise data warehouse?

Decision support systems, customer relationship management, supply chain management, revenue management, etc.

What involves a situation with a limited number of events that can take on only a finite number of values?

Discrete Distributions

What refers to building a model of a system where the interaction between different entities is studied?

Discrete Event Simulation

What is used to estimate or describe the degree of variation in a given variable of interest?

Dispersion -- used for judging central tendency.

What is a common representation schema of the frequency based relationship between the terms and documents in tabular format where terms are listed in columns?

Occurrence Matrix / Term by Document Matrix
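
A small sketch of how such a matrix might be built, assuming whitespace tokenization and raw term counts (real text mining pipelines usually add stop-word removal, stemming, and weighting). The function name `term_document_matrix` and the sample documents are illustrative only.

```python
from collections import Counter


def term_document_matrix(docs):
    """Return (terms, matrix): one row per document, one column per term."""
    tokenized = [doc.lower().split() for doc in docs]
    # Column order: the sorted vocabulary across all documents.
    terms = sorted({tok for toks in tokenized for tok in toks})
    matrix = []
    for toks in tokenized:
        counts = Counter(toks)
        # Counter returns 0 for terms that do not occur in this document.
        matrix.append([counts[t] for t in terms])
    return terms, matrix


docs = ["data warehouse data", "data mining"]
terms, matrix = term_document_matrix(docs)
```

Each cell holds how often the column's term appears in the row's document.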

What refers to web measurement and analysis about you and your products that takes place outside your web site?

Off Site Web Analytics

What are the two main categories of Web analytics?

Off site and on site.

What measures visitors' behavior once they are on the web site and measures the performance in a commercial context?

On Site Web Analytics

What is the term used for analyzing, characterizing, and summarizing structured data stored in organizational databases?

Online Analytical Processing (OLAP)

What handles a company's routine ongoing business and responds immediately to user requests?

Online Transaction Processing (OLTP)

What consolidates data from multiple source systems and provides a near real time, integrated view of volatile current data?

Operational Data Stores

What provides a fairly recent form of customer information file and is used as an interim staging area for a data warehouse?

Operational Data Stores

What is used for short term decisions involving mission critical applications rather than for the medium and long term decisions associated with EDW?

Operational Data Stores (think short term memory)

What are the two metrics to evaluate search engines?

Effectiveness and Efficiency

How many players are involved in the analytics environment?

Eleven clusters Inner and outer petals & seed of the flower

What is the process of intelligently combining the information created and provided by two or more information sources?

Ensemble Models. -Improving accuracy and robustness of information outcomes while reducing uncertainty and bias associated with individual models.

What provides a vehicle for pushing data from source systems into the data warehouse and involves integrating application functionality and is focused on sharing functionality across systems?

Enterprise Application Integration (EAI)

What is a large scale data warehouse that is used across the enterprise for decision support and provides integration of data from many sources into a standard format?

Enterprise Data Warehouse

What is a mechanism for pulling data from source systems to satisfy a request for information and uses predefined metadata to populate views that make integrated data appear relational to end users?

Enterprise Information Integration (EII)

What is an evolving tool space that promises real time data integration from a variety of sources, such as relational databases, Web services, and multi-dimensional databases?

Enterprise Information Integration (EII)

What system collects all the data from every corner of the enterprise and integrates it into a consistent schema so that every part of the organization has access to the single version when and where needed?

Enterprise Resource Planning (ERP) systems

What is used to identify and stop malicious attacks on critical information infrastructure?

Information Warfare

What is the most commonly used data analysis technique in data warehouses and has been growing in popularity due to the exponential increase in data volumes and the recognition of the business value of data driven analytics?

OLAP

What is used for a transaction system that is primarily responsible for capturing and storing data related to day-to-day business functions such as ERP, CRM, SCM, POS, and so forth?

OLTP

What translates an organization's strategic objectives and goals into a set of well defined tactics and initiatives, resource requirements, and expected results for some future time period?

Operational Plan *key to success is integration

What decision support model used data obtained from domain experts through manual processes to build mathematical or knowledge-based models to solve constrained optimization problems?

Operations Research

What are some other names for sentiment analysis?

Opinion mining, subjectivity analysis, and appraisal extraction

What is the solution that has the highest degree of goal attainment associated with it known as?

Optimal Solution

What are enablers of prescriptive analytics?

Optimization, simulation, decision modeling, and expert systems.

What has finite ordered values?

Ordinal Data

What contains codes assigned to objects or events as labels that also represent the rank order among them?

Ordinal Data e.g. credit score, age group.

What aims to minimize the sum of squared residuals and leads to a mathematical expression for the estimated value of the regression line?

Ordinary Least Squares Method
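
For a single input variable, minimizing the sum of squared residuals has a well-known closed-form solution: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the means. The sketch below applies that formula; the function name `ols_fit` and the sample data are illustrative.

```python
def ols_fit(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Points that lie exactly on y = 1 + 2x, so the fit should recover that line.
intercept, slope = ols_fit([1, 2, 3, 4], [3, 5, 7, 9])
```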

What can be made at the word, term, sentence, or document level?

Polarity Identification

What are other areas that utilize sentiment analysis applications?

Politics, government intelligence, and e-Commerce sites.

What are the two technical ways of collecting the data for on site analytics?

Server log files analysis and page tagging

What is the main difference between the SEMMA and Crisp DM?

The CRISP DM takes a more comprehensive approach and SEMMA implicitly assumes that the data mining project's goals and objectives along with the appropriate data sources have been identified and understood.

What adapts traditional relational database tools to the development needs of an enterprise wide data warehouse and provides a consistent & comprehensive view of the enterprise?

Top Down Development / EDW Approach

What is used by the model builder and has 2/3 of the data?

Training Set

In a simple split, what are the three mutually exclusive subsets used to prevent overfitting?

Training, validation, and testing.
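
A minimal sketch of a simple split into three mutually exclusive subsets. The 60/20/20 proportions, the fixed seed, and the function name `simple_split` are assumptions for illustration; the cards only state that the training set is the largest share.

```python
import random


def simple_split(records, train_frac=0.6, valid_frac=0.2, seed=0):
    """Shuffle once, then cut the list into training, validation, and test parts."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_valid = int(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]   # whatever remains
    return train, valid, test


train, valid, test = simple_split(range(10))
```

Because the three slices come from one shuffled list, no record can appear in more than one subset.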

What is a computerized record of a discrete event?

Transaction

Online ________ is a term used for a transaction system that is primarily responsible for capturing and storing data related to day-to-day business functions such as ERP, CRM, SCM, and point of sale.

Transaction Processing

In which stage of extraction, transformation, and load (ETL) into a data warehouse are data aggregated?

Transformation

What is the process of discovering intrinsic relationships from Web data which are expressed in the form of textual, linkage, or useful information?

Web Mining

What is the extraction of useful information from data generated through web page visits and transactions?

Web Usage Mining

What are characteristics that enable data warehouses to be tuned exclusively for data access?

Web based, relational/multidimensional, client/server, real time, include metadata.

What involves converting the extracted data from its previous form into the form in which it needs to be so that it can be placed into a data warehouse or simply another database?

Transformation

What is the extraction of useful information from Web pages?

Web content mining

What is the taxonomy of web analytics?

Web content mining, web structure mining, and web usage mining.

What are automated techniques that are used to read through the content of a Web site?

Web crawlers

What conforms to the search engine's guidelines and involves no deception?

White Hat SEO

What are some tools used for predictive analytics?

SAS, SPSS, and IBM

What is a categorized block of text in a sentence?

Tokenizing

What are important criteria when selecting an ETL tool?

-Ability to read from and write to an unlimited number of data source architectures -Automatic capturing and delivery of metadata -History of conforming to open standards -Easy to use interface for the developer and function user

When are column oriented organizations more efficient?

-An aggregate needs to be computed over many rows, but for a notably smaller subset of all columns of data -New values of a column are supplied for all rows at once because that column data can be written efficiently
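
The difference can be sketched with plain Python structures: a row-oriented layout as a list of records and a column-oriented layout as one list per column. The table contents and function names are hypothetical, and this models only the access pattern, not how a real DBMS stores data on disk.

```python
# Row-oriented layout: each record keeps all of its columns together.
rows = [
    {"id": 1, "region": "east", "sales": 100},
    {"id": 2, "region": "west", "sales": 150},
    {"id": 3, "region": "east", "sales": 200},
]

# Column-oriented layout: each column is stored as its own contiguous list.
columns = {
    "id": [1, 2, 3],
    "region": ["east", "west", "east"],
    "sales": [100, 150, 200],
}


def total_sales_rows(rows):
    """Aggregate by visiting every record, even though only one column is needed."""
    return sum(record["sales"] for record in rows)


def total_sales_columns(columns):
    """Aggregate by reading only the one column the query needs."""
    return sum(columns["sales"])


row_total = total_sales_rows(rows)
column_total = total_sales_columns(columns)
```

Both functions return the same total; the columnar version simply never touches `id` or `region`, which is the source of its advantage for aggregates over few columns.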

What are the four major types of patterns that data mining seeks to identify?

-Associations -Predictions -Clusters -Sequential Relationships

What are examples of social networks relevant to business activities?

-Communication Networks -Community Networks -Criminal Networks -Innovation Networks

What are the major characteristics and objectives of data mining?

-Data is presented in many formats. -Data mining environment is usually a client/server architecture or a Web based IS architecture. -Sophisticated new tools help to remove the information ore buried in corporate files and public records. -Miner has little or no programming skill. -Striking it rich finds an unexpected result and requires end user to think creatively throughout the process. -Data mining can be analyzed and deployed quickly and easily. -Parallel processing is often necessary for data mining

What are the components of a Linear Programming Model?

-Decision Variables -Objective Function -Objective Function Coefficients -Constraints -Capacities -Input/Output Coefficients

What are the analysis tools for measuring social media?

-Descriptive analytics -Social network analysis -Advanced analytics

What are direct benefits of implementing a data warehouse?

-End users can perform extensive analysis in numerous ways. -Consolidated view of corporate data -Better and more timely information -Enhance system performance -Data access is simplified

What are indirect benefits of implementing a data warehouse?

-Enhance business knowledge -Present a competitive advantage -Improve customer service and satisfaction -Facilitate decision making -Help reform business processes

What are the four main areas of effective security in a data warehouse?

-Establishing effective corporate and security policies and procedures; start at top management -Implementing logical security procedures and techniques to restrict access -Limit physical access to the data center environment. -Establish an effective internal control review process with an emphasis on security and privacy.

What are some factors other than hardware, software, and network capabilities that have contributed to facilitating growth of decision support and analytics?

-Group communication and collaboration -Improved data management -Managing giant data warehouses and Big Data -Analytical support -Overcoming cognitive limits in processing and storing information -Knowledge management -Anytime, anywhere support

What can cluster results be used for?

-Identify a classification scheme -Suggest statistical models to describe populations -Indicate rules for assigning new cases to classes for identification, targeting, and diagnostic purposes. -Provide measures of definition, size, and change in what were previously broad concepts -Find typical cases to label and represent classes. -Decrease the size and complexity -Identify outliers in a specific domain

What are factors that affect the architecture selection decision?

-Information interdependence between organizational units -Upper management's information needs -Urgency of need for a data warehouse. -Nature of end user tasks -Constraints on resources -Strategic view of the data warehouse prior to implementation -Compatibility with existing systems -Perceived ability of the in-house IT staff -Technical issues -Social and political factors

What are the benefits of implementing a data warehouse?

-Keepers: money saved by improving traditional decision support functions (20%) -Gathers: money saved due to automated collection and dissemination of information (30%) -Users: money saved or gained from decisions made using the data warehouse (50%)

How can visitor profiles be leveraged with web analytics and segmentation?

-Keywords -Content groupings -Geography -Time of Day -Landing page profiles

When are row oriented organizations more efficient?

-Many columns of a single row are required at the same time and the row size is relatively small -Writing a new row if all of the column data is supplied at the same time.

Why has data mining become more popular?

-More intense competition at the global scale. -General recognition of the untapped value hidden in large data sources. -Consolidation and integration of database records. -Exponential increase in data processing and storage technologies. -Significant reduction in the cost of hardware and software for data storage. -Movement toward demassification of business practice

What are conversion statistics?

-New & returning visitors -Leads -Sales Conversion -Abandonment / Exit Rates

What are challenges associated with implementing NLP?

-Part of speech tagging. -Text segmentation -Word sense disambiguation -Syntactic Ambiguity -Imperfect or Irregular Input -Speech Acts

What are the types of organizations or professionals that comprise the analytics industry?

-Provide advice to the analytics industry providers and users -Professional societies or organizations that are membership based and organized. -Analytics ambassadors, influencers, or evangelists that have presented their enthusiasm for analytics through seminars, books, or other publications.

What are the main types of a data warehouse?

-Provide decision support capability -Allows ready access to business information -Creates business insight

What are characteristics that differentiate between social and industrial media?

-Quality -Reach -Frequency -Accessibility -Usability -Immediacy -Updatability

What are the reasons for the upswing of open source software?

-Recession has driven up interest in low cost open source software -Open source tools are coming into a new level of maturity -Open source software augments traditional enterprise software without replacing it.

What are components of the inner petal of the analytics ecosystem?

-Regulators and policy makers -Analytics industry analysts & influencers -Academic institutions and certification agencies -Application Developers: industry specific or general

What does an LP allocation model assume?

-Returns from different allocations can be compared -Return from any allocation is independent of others. -All data are known with certainty -The resources are used in the most economical manner.

What are the two broad categories of SEOs?

-Techniques that search engines recommend as part of a good site design. -Techniques of which search engines do not approve.

What are the data mining mistakes?

-Selecting the wrong problem for data mining. -Ignoring what your sponsor thinks data mining is and what it can/can't do. -Beginning without the end in mind. -Define the project around a foundation that your data can't support. -Leaving insufficient time for data preparation. -Looking only at aggregated results and not at individual records. -Not keeping track of the data mining procedure and results. -Using data from the future to predict the future. -Ignoring suspicious findings and quickly moving on. -Starting with high profile complex project first. -Running data mining algorithms repeatedly and blindly. -Ignore the subject matter experts. -Believing everything you are told about the data. -Assuming full cooperation. -Measuring your results differently from the way your sponsor does. -If you build, they will come mindset.

What are the three key components of a BPM?

-Set of integrated, closed loop management and analytic processes that address financial and operational activities -Tools for business to define strategic goals and measure / manage performance against those goals. -Core set of processes linked to organizational strategy

What are various risks and issues when developing a successful data warehouse?

-Starting with the wrong sponsorship chain -Setting expectations you can't meet -Engaging in politically naive behavior -Loading the warehouse with data just because it's available -Believing that data warehousing is the same as transactional database design -Choosing a data warehouse manager who is technology oriented rather than user oriented. -Focusing on traditional internal record-oriented data and ignoring the value of external data -Delivering data with overlapping and confusing definitions. -Believing promises of performance, capacity, and scalability. -Believing that your problems are over when the data warehouse is up and running. -Focusing on ad hoc data mining and periodic reporting instead of alerts

What is the main difference between statistics and data mining?

-Statistics collects sample data to test the hypothesis whereas data mining and analytics use all the existing data to discover novel patterns and relationships. -Size of data varies

What are best practices in social media analytics?

-Think of measurement as a guidance -Track the elusive statement -Improve the accuracy of text analysis -Look at the ripple effect -Look beyond the brand -Identify your most powerful influencers -Look closely at the accuracy of your analytic tool -Incorporate social media intelligence into planning

Why is master data management gaining popularity?

-Tighter integration with operational systems demands -Most data warehouses still lack MDM and data quality functions -Regulatory and financial reports must be perfectly clean and accurate

What are challenges with the Web?

-Too big for effective data mining -Too complex & dynamic -Not specific to a domain -Web has everything

What are difficulties that arise when analyzing multiple goals?

-Usually difficult to obtain an explicit statement of the organization's goals. -Goals and subgoals are viewed differently -Decision maker may change the importance assigned to specific goals over time or for different decision scenarios. -Personal agendas -Importance is assessed differently.

What are the ways to manage multiple goals?

-Utility theory -Goal Programming -Expression of goals as constraints -Points system

What are the four categories of web analytics?

-Web site usability -Traffic sources -Visitor profiles -Conversion statistics

What will play a significant role in defining the future of data warehouse?

-Web, social media, and Big Data -Open source software -SaaS -Cloud Computing -Data lakes

What identifies the natural grouping of things based on their known characteristics?

Clusters

What are the steps of CRISP DM Process?

1. Business Understanding 2. Data Understanding 3. Data Preparation 4. Model Building 5. Testing & Evaluation 6. Deployment

What are the steps of data preprocessing?

1. Data Consolidation -- collect, select, and integrate 2. Data Cleaning -- impute values, reduce noise, eliminate duplicates 3. Data Transformation -- normalize, discretize, and create attributes 4. Data Reduction -- dimension, volume, and balance data

What are the steps of simulation?

1. Define the problem. 2. Construct the simulation model. 3. Test & validate the model 4. Design the experiment 5. Conduct the experiment

What are the steps of the text mining process?

1. Establish the corpus 2. Create the term document matrix. 3. Extract knowledge

What are the steps for a sentiment analysis?

1. Sentiment Detection: calculate the OS Polarity 2. NP Polarity Classification 3. Target Identification: Identify the target for sentiment 4. Collection & aggregation

How far does data warehousing trace back to?

1970s

What are examples of traffic sources?

-Referral Web Sites -Search Engines -Direct Searches via bookmarking of web page or using URL -Offline campaigns -Online Campaigns

What is the difference between the clustering and the classification?

Classification learns the function between the characteristics of things and their membership through a supervised learning process whereas clustering is an unsupervised learning process where only the input variables are presented to the algorithm.

What are examples of transaction processing?

ATM withdrawals, bank deposits, cash register scans at the grocery store, etc.

What are subcategories of prediction?

Classification, regression, and time series.

What is the percentage of test data set samples correctly classified by the model?

Accuracy Rate

What is the outcome of predictive analytics?

Accurate projections of future events and outcomes.

What includes predictive analysis and text analytics that examine the content in online conversations?

Advanced analytics

When would all items start in individual clusters and the clusters are joined together?

Agglomerative
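
A minimal sketch of the bottom-up idea on one-dimensional points: every item starts as its own cluster and the two closest clusters are merged until the requested number remains. Single linkage (distance between the closest members) and the stopping rule are assumptions for illustration; other linkage criteria are equally common.

```python
def agglomerative(points, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[p] for p in points]          # each item starts alone
    while len(clusters) > n_clusters:
        best = None                           # (distance, i, j) of closest pair
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]   # join the pair
        del clusters[j]
    return clusters


groups = agglomerative([1, 2, 9, 10, 25], n_clusters=3)
```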

What is the process of developing actionable decisions or recommendations for actions based on insights generated from historical data?

Analytics

Who has developed analytics software for general use with data that has been collected in a data warehouse or is available through one of the platforms?

Analytics Focused Software Developers

What is the most commonly used algorithm to discover association rules that attempts to find subsets that are common to at least a minimum number of the itemsets ?

Apriori Algorithm *uses bottom up approach*
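
The bottom-up approach can be sketched as follows: count single items first, keep those that meet the minimum support, then build larger candidates only from itemsets that were frequent at the previous level. This is a compact teaching sketch, not an optimized implementation; the function name, the absolute-count support threshold, and the sample baskets are illustrative.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset found in >= min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    candidates = {frozenset([item]) for item in items}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: combine frequent itemsets into candidates of size k.
        candidates = {a | b for a, b in combinations(list(level), 2)
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent


baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]
frequent_itemsets = apriori(baskets, min_support=3)
```

With this data, all four single items are frequent but only the pair bread and milk appears in at least three baskets.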

What is a graphical assessment technique where the true positive rate is plotted on the y axis and the false positive is plotted on the x-axis?

Area Under the ROC Curve
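
A short sketch of how the curve and its area might be computed from classifier scores: sort cases by score, lower the threshold one case at a time, and record the false positive rate (x) and true positive rate (y) after each step. It assumes labels coded as 1 and 0, at least one case of each class, and no tied scores; the function names and sample data are illustrative.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points for a ROC curve."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), reverse=True)   # highest score first
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, label in ranked:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under a list of (x, y) points ordered by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(labels, scores)
auc = area_under_curve(curve)
```

An area of 1.0 would mean every positive case is scored above every negative case; 0.5 corresponds to random ordering.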

What is the most popular and most commonly used measure of central tendency?

Arithmetic Mean

What is the sum of all the values/observations divided by the number of observations in the data set?

Arithmetic Mean

The data mining algorithm type used for classification somewhat resembling the biological neural networks in the human brain is

Artificial neural networks

What aims to find interesting relationships between variables in large databases?

Association Rule Mining

What finds the commonly co-occurring groupings of things?

Associations

What is a popular and well-researched technique for discovering interesting relationships among variables in large databases?

Associations

What is the creation of a shortened version of a textual document by a computer program that contains the most important points of the original document?

Automatic Summarization

What is the clickable photos, text links in the copy, downloads, and navigation on a page?

Click map

What can reveal where you might be losing visitors in a specific process?

Click paths

What is the analysis of the information collected by web servers that can help better understand user behavior?

Clickstream Analysis

What is the main difference between BSC and Six Sigma?

BSC is focused on improving overall strategy and the Six Sigma is focused on improving processes.

What is used when introducing structure to a collection of text based documents to classify them into two or more predetermined classes or to cluster them into natural groupings?

Bag of Words i.e. Spam Filtering

What is the best known and most widely used performance management system that suggests people view the organization from four perspectives?

Balanced Scorecard (BSC)

What is both a performance measurement and a management methodology that helps translate an organization's financial, customer, internal process, and learning / growth objectives and targets into a set of actionable initiatives?

Balanced Scorecard

What is the outcome of prescriptive analytics?

Best possible business decisions and actions

What refers to data that is structured, unstructured, in a stream, and so forth?

Big Data

Who is the father of data warehousing?

Bill Inmon

What attempts to improve rankings in ways that are not approved by the search engines or involve deception?

Black Hat SEO

What is it called when a fixed number of instances from the original data are sampled for training and the rest of the data set is used for testing?

Bootstrapping
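
A minimal sketch of one bootstrap round. It assumes the usual convention that the training sample is drawn with replacement and has the same size as the original data, with the never-drawn records held out for testing; the card itself does not spell out those details. The function name and seed are illustrative.

```python
import random


def bootstrap_split(records, seed=0):
    """Draw len(records) items with replacement for training; the rest are for testing."""
    rng = random.Random(seed)
    records = list(records)
    n = len(records)
    chosen = [rng.randrange(n) for _ in range(n)]   # indices, repeats allowed
    chosen_set = set(chosen)
    train = [records[i] for i in chosen]
    test = [records[i] for i in range(n) if i not in chosen_set]
    return train, test


boot_train, boot_test = bootstrap_split(range(20))
```

In practice the procedure is repeated many times and the test results are averaged.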

What is the plan big, build small approach that focuses on the request of a specific department?

Bottom Up Approach / Data Mart Approach (DM)

What is a graphical illustration of several descriptive statistics about a given data set?

Box & Whiskers Plot / Box Plot
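
The descriptive statistics a box plot typically draws can be sketched as a five-number summary. This uses `statistics.quantiles` from the Python standard library (3.8+) with its default method; quartile conventions differ between tools, so other software may report slightly different Q1 and Q3 values. The function name and data are illustrative.

```python
import statistics


def five_number_summary(values):
    """Return the minimum, quartiles, and maximum that a box plot displays."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


summary = five_number_summary([1, 2, 3, 4, 5, 6, 7])
```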

What approach uses probability theory to build classification models based on the past occurrence that are capable of placing a new instance into a most probable class/category?

Bayesian Classifiers

What represents the outcome of a test to classify a pattern using one of the attributes?

Branch

What focuses on listening to social media where anyone can post opinions that can damage or boost your reputation?

Brand Management

Who is an individual whose weak ties fill a structural hole, providing the only link between two individuals or clusters?

Bridge

What are the subcategories of distributions (SNA)?

Bridge, centrality, density, distance, structural holes, and tie strength

What is a collection of tools for manipulating, mining, and analyzing the data in the warehouse?

Business Analytics

What enables interactive access to data, manipulation of data, and the ability to conduct appropriate analysis?

Business Intelligence

What is an umbrella term that combines architectures, tools, data bases, analytical tools, applications, and methodologies?

Business Intelligence

What is based on the transformation of data to information, then decisions, and finally actions?

Business Intelligence

______ is an umbrella term that combines architectures, tools, databases, analytical tools, applications, and methodologies.

Business Intelligence

What are the business processes, methodologies, metrics, and technologies used by enterprises to measure, monitor, and manage business performance?

Business Performance Management (BPM)

What is used for monitoring and analyzing performance?

Business Performance Management

What are enablers of descriptive analytics?

Business reporting, dashboards, scorecards, and data warehousing

What uses a sequence of six steps that starts with a good understanding of the business and the need for the data mining project and ends with the deployment of the solution that satisfies the specific business need?

CRISP DM

Which data mining process/methodology is advocated by IBM-SPSS?

CRISP-DM

What represents the labels of multiple classes used to divide a variable into specific groups and represents a finite number of values with no continuum between them?

Categorical data / Discrete data i.e. race, sex, age group, and educational level

What refers to a group of metrics that aim to quantify the importance or influence of a particular node within a network?

Centrality

What warehousing architecture has a gigantic EDW that serves the needs of all organizational units and provides users with access to all the data in the data warehouse?

Centralized Data Warehouse

What is assumed that complete knowledge is available so that the decision maker knows exactly what the outcome of each course of action is?

Certainty

What is based on the identification, capture, and delivery of the changes made to enterprise data sources?

Change Capture

What is the most common data mining task and analyzes the historical data stored in a database and automatically generates a model that can predict future behaviors?

Classification

What is the most frequently used data mining method for real world problems?

Classification

What is the primary source for accuracy estimation in classification problems?

Classification Matrix

What are the subcategories of segmentation for SNA?

Cliques and social orders, clustering coefficient, and cohesion.

What implies that optimum performance is achieved by setting goals and objectives, establishing initiatives and plans to achieve those goals, monitoring actual performance, and taking corrective action?

Closed Loop BPM Cycle

What has been used extensively for fraud detection and market segmentation of customers in contemporary CRM systems?

Cluster Analysis

What is the means of identifying classes of items so that items in a cluster have more in common with each other than with items in other clusters, and of identifying natural groupings of events or objects that share a common set of characteristics?

Cluster Analysis

What is used to sort cases into groups or clusters so that the degree of association is strong among members of the same cluster and weak among members of different clusters?

Cluster Analysis

What partitions a collection of things into segments whose members share similar characteristics, but the class labels are unknown?

Clustering

What are subcategories of segmentation?

Clustering and outlier analysis

What is the measure of the likelihood that two associates of a node are themselves associates?

Clustering coefficient
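
The clustering coefficient can be made concrete with a small calculation. The sketch below is an assumption-laden illustration, not a textbook implementation: it treats the network as undirected, stores it as a dictionary of neighbor sets, and uses hypothetical names (`local_clustering`, `network`).

```python
from itertools import combinations

def local_clustering(adjacency, node):
    """Share of a node's associate pairs that are themselves tied."""
    neighbors = adjacency[node]
    if len(neighbors) < 2:
        return 0.0  # fewer than two associates: no pairs to check
    pairs = list(combinations(sorted(neighbors), 2))
    linked = sum(1 for a, b in pairs if b in adjacency[a])
    return linked / len(pairs)

# Hypothetical undirected network: each name maps to its direct ties.
network = {
    "Ann": {"Bob", "Cat", "Dan"},
    "Bob": {"Ann", "Cat"},
    "Cat": {"Ann", "Bob"},
    "Dan": {"Ann"},
}
ann_score = local_clustering(network, "Ann")
```

Ann has three possible pairs of associates and only Bob and Cat are tied to each other, so her coefficient is one third.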

What enables people to overcome their cognitive limits by quickly accessing and processing vast amounts of stored information?

Computerized Systems

What are features generated from a collection of documents by means of manual, statistical, rule based, or hybrid categorization methodology?

Concepts

What is the process called that predicts machinery failures before they occur through the use of sensory data?

Condition Based Maintenance

What are the metrics of measuring social network analysis?

Connections, distributions, and segmentation.

What represents the dimensional information coming from potentially disparate sources, but pertaining to the same subject?

Consistent data

What is the assumption that states that the response variables have the same variance in their error?

Constant Variance/Homoscedasticity

What are situations with unlimited numbers of possible events that follow density functions?

Continuous Distributions

What is a large and structured set of texts prepared for the purpose of conducting knowledge discovery?

Corpus

What gives an estimate on the degree of association between the variables?

Correlation

What is interested in low level relationships between two variables?

Correlation

When would you introduce structure to the corpus?

Create the term document matrix
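
As a rough illustration of what "introducing structure" means, the sketch below turns a tiny corpus into a term document matrix of raw counts. It assumes whitespace tokenization and lowercasing only (no stemming or stop-word removal), and the function name `term_document_matrix` is hypothetical.

```python
from collections import Counter

def term_document_matrix(docs):
    """Return (terms, matrix) where matrix[i][j] counts terms[j] in docs[i]."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    terms = sorted(set().union(*counts))
    matrix = [[c[t] for t in terms] for c in counts]
    return terms, matrix

# Hypothetical two-document corpus.
corpus = ["data mining finds patterns", "text mining mines text"]
terms, tdm = term_document_matrix(corpus)
```

Each row is a document and each column a term, so the unstructured text becomes a numeric table that mining algorithms can work on.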

What is a multidimensional data structure that allows fast analysis of data and is defined as the capability of efficiently manipulating and analyzing data from multiple perspectives?

Cube

What creates one-on-one relationships with customers by developing an intimate understanding of their needs and wants?

Customer Relationship Management

What are the perspectives from which an organization should develop objectives, measures, targets, and initiatives?

Customer, financial, internal business process, and learning & growth.

What was used to describe the process through which previously unknown patterns in data were discovered?

Data Mining

What is the tedious and time demanding process that is necessary to convert the raw real world data into a well refined form for analytics algorithms?

Data Preprocessing

What is the term for professionals who utilize predictive analysis, statistical analysis, and more advanced analytical tools and algorithms?

Data Scientists

What is frequently a convenient first step to acquiring experience in constructing and managing a data warehouse while presenting business users with the benefits of better access to their data?

DM Approach

What is a closed loop business improvement model and encompasses the steps of defining, measuring, analyzing, improving, and controlling a process?

DMAIC

What is the collection of facts usually obtained as the result of experiments, observations, transaction, or experiences?

Data

What is the main ingredient for any BI, data science, and business analytics initiative?

Data

What is the ability to access and extract data from any data source?

Data Access

What is a term for professionals who were doing BI in the form of data compilation, cleaning, reporting, and perhaps some visualization?

Data Analyst

What is the integration of business view across multiple data stores?

Data Federation

What companies enable the generation and collection of data that may be used for developing analytical insights?

Data Generation Infrastructure Providers

What comprises three major processes that permit data to be accessed and made accessible to an array of ETL and analysis tools and the data warehousing environment: data access, data federation, and change capture?

Data Integration

What is a large storage location that can hold vast quantities of data in its native/raw format for future potential analytics consumption?

Data Lakes

What includes the organizations that provide hardware and software targeting the basic foundation for all management solutions?

Data Management Infrastructure Providers

What is usually smaller and focuses on a particular subject or department?

Data Mart

What architecture has the individual marts linked to each other via some kind of middleware?

Data Mart Bus Architecture

What is the process that uses statistical, mathematical, and artificial intelligence techniques to extract and identify useful information and subsequent knowledge from large sets of data?

Data Mining

What is used to describe discovering or mining knowledge from large amounts of data?

Data Mining

What requires that the variables and data values be defined at the lowest level of detail for the intended use of the data?

Data granularity

What are enablers of predictive analytics?

Data mining, text mining, web/media mining, and forecasting

Where is the most time spent in analytics tasks?

Data preprocessing

What is the term that means that the variables in the data set are all relevant to the study being conducted?

Data relevancy

What means that all required data elements are included in the data set to build a predictive or prescriptive analytics model?

Data richness -- the available variables portray enough dimensions of the underlying subject matter for an accurate and worthy analytics study.

What means that data is secured to only allow those people who have the authority and the need to access it and to prevent anyone else from reaching it?

Data security and data privacy

What refers to the originality and appropriateness of the storage medium where the data is obtained?

Data source reliability -- answers the question "Do we have the right confidence and belief in this data source?"

What are the major components of the data warehousing process?

Data sources, data extraction/transformation, data loading, comprehensive database, metadata, and middleware tools

What is the term used to describe a match/mismatch between the actual and expected data values of a given variable?

Data validity

What are the four major components of business intelligence?

Data warehouse, business analytics, business performance management, and user interface.

What are the parts that comprise the data warehousing architectures?

Data warehouse, data acquisition software (application server), client front end software (database server)

What role requires the skills and proficiency that the successful administration and management of a data warehouse entails?

Database Administrator

What is the component where the most work must be done to implement a data model and optimize it for query performance?

Database Management Systems (DBMS)

What area are the prediction models that differentiate deceptive statements from truthful ones classified as?

Deception Detection

What are storage solution providers?

Dell and NetApp

What is the proportion of direct ties in a network relative to the total number possible?

Density
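
Density reduces to a single ratio. The sketch below assumes an undirected network with no self-ties, so the number of possible ties among n nodes is n(n-1)/2; the function name `network_density` and the sample ties are hypothetical.

```python
def network_density(node_count, ties):
    """Density of an undirected network without self-ties."""
    possible = node_count * (node_count - 1) / 2
    if possible == 0:
        return 0.0
    # A tie (A, B) is the same tie as (B, A), so count each pair once.
    distinct = {frozenset(t) for t in ties}
    return len(distinct) / possible

# Four actors joined in a chain: 3 direct ties out of 6 possible.
density = network_density(4, [("A", "B"), ("B", "C"), ("C", "D")])
```

For a directed network the denominator would be n(n-1) instead.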

What is a subset that is created directly from the data warehouse and uses a consistent data model to provide quality data?

Dependent Data Mart

What ensures that the end user is viewing the same version of the data that is accessed by all other data warehouse users?

Dependent Data Mart --high cost limits this to large companies

________ analytics help managers understand current events in the organization including causes, trends, and patterns.

Descriptive

What refers to knowing what is happening in the organization and understanding some underlying trends and causes of such occurrences?

Descriptive / Reporting Analytics

What answers the questions "What happened?" and "What is happening?"

Descriptive Analytics

What is the entry level in the business analytics taxonomy?

Descriptive Analytics

What helps us convert our numbers and symbols into meaningful representations for anyone to understand and use?

Descriptive Statistics

What is used to describe the sample data on hand and summarize it in a meaningful way so that easily understandable patterns emerge?

Descriptive Statistics

What are the levels of decision/normative analytics?

Descriptive, predictive, and prescriptive

What cycle creates a huge database of documents / pages organized and indexed based on their content and information value?

Development Cycle

What are the two main cycles of a search engine?

Development Cycle and a responding cycle

What is the slice on more than two dimensions of a data cube?

Dice

What has one to many relationships with rows in the central fact table?

Dimension Table

What contain classification and aggregation information about central fact rows and the attributes that describe the data contained within the fact table?

Dimension Tables

What is a retrieval based system that supports high volume query access?

Dimensional Modeling -- star and snowflake schema

When the number of variables can be rather large, the analyst must reduce the number down to a manageable size. What is the process called?

Dimensional Reduction / Variable Selection

What is the minimum number of ties required to connect two particular actors?

Distance

What is the frequency of data points counted and plotted over a small number of class labels or numerical ranges?

Distribution

When would all items start in one cluster and then be broken apart?

Divisive

What happens when the user navigates among levels of data ranging from the most summarized up to the most detailed?

Drill Up / Down

What are the leading indicators / value drivers that measure activities that have a significant impact on outcome KPIs?

Driver KPI

What is an integral component of any data-centric project and consists of extracting, transforming, and loading integrated and cleansed data?

ETL

What measures the extent of uncertainty or randomness in a data set and is used to build subtrees so that the entropy of each final subset is 0?

Entropy
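
A short worked example may help here. The sketch below computes Shannon entropy (base 2) over a list of class labels, which is the usual form used in decision tree induction; the function name `entropy` and the sample labels are illustrative assumptions.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy in bits: 0 for a pure subset, 1 for a 50/50 binary split."""
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())

mixed = entropy(["yes", "yes", "no", "no"])   # maximally uncertain for two classes
pure = entropy(["yes", "yes", "yes", "yes"])  # no uncertainty left
```

A tree builder keeps splitting until subsets look like `pure`, which is what "entropy of each final subset is 0" refers to.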

What is the monitoring, scanning, and interpretation of collected information?

Environmental Scanning and Analysis

When would you collect and organize the domain specific unstructured data?

Establish the corpus

What systems were designed as graphical dashboards and scorecards so that they could serve as visually appealing displays while focusing on the most important factors for decision makers to keep track of the key performance indicators?

Executive Information Systems

What issues affect whether an organization will purchase tools or build the transformation process itself?

Expensive, long learning curve, and it's difficult to measure how the IT organization is doing until it has learned to use the tools.

What is an independent variable also known as?

Explanatory or input

How does sentiment appear in text?

Explicit: a subjective sentence directly expresses an opinion. Implicit: the text implies an opinion.

When would you discover novel patterns from the T-D matrix?

Extract knowledge

What involves reading data from one or more databases?

Extraction

What are the sub categories of connections in SNA?

Homophily, multiplexity, mutuality, network closure, propinquity.

What is it called when another firm develops and maintains the data warehouse?

Hosted Data Warehouse

What contains the descriptive attributes needed to perform decision analysis and query reporting?

Fact Table

True or false: The ETL process in data warehousing usually takes up a small portion of the time in a data-centric project.

False

What is the outcome when the predicted class is negative and the observed class is positive?

False Negative

What is the outcome when the predicted class is positive and the observed class is negative?

False Positive
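
The false negative and false positive outcomes are two of the four cells of a classification (confusion) matrix. The sketch below tallies all four from paired observed and predicted labels; the function name `confusion_counts` and the sample data are hypothetical.

```python
def confusion_counts(observed, predicted, positive=True):
    """Tally true/false positives and negatives for a two-class problem."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for obs, pred in zip(observed, predicted):
        if pred == positive:
            counts["TP" if obs == positive else "FP"] += 1
        else:
            counts["FN" if obs == positive else "TN"] += 1
    return counts

# Made-up labels for five cases.
observed = [True, True, False, False, True]
predicted = [True, False, True, False, True]
counts = confusion_counts(observed, predicted)
accuracy = (counts["TP"] + counts["TN"]) / len(observed)
```

Predictive accuracy is then the share of cases on the diagonal (true positives plus true negatives).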

What uses all possible means to integrate analytical resources from multiple sources to meet changing needs or business conditions?

Federated Data Warehouse

Where has the most common use of data mining been used on the commercial side?

Finance, retail, and healthcare sectors.

What has been used in economics to measure the diversity of a population and can be used to determine the purity of a specific class as a result of a decision to branch along a particular attribute or variable?

Gini Index
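
For comparison with entropy, the sketch below computes the Gini index in its common impurity form, one minus the sum of squared class proportions. The function name `gini_index` is an illustrative assumption.

```python
from collections import Counter

def gini_index(labels):
    """Gini impurity: 0 for a pure subset, 0.5 for an even two-class mix."""
    if not labels:
        return 0.0
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

mixed = gini_index(["a", "a", "b", "b"])
pure = gini_index(["a", "a", "a"])
```

A candidate branch is preferred when it lowers the weighted Gini index of the resulting subsets.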

What are factors that are forcing business managers to rethink how they integrate and manage their businesses?

Global competitive pressures, demand for ROI, management, investor inquiry, and government regulations

What calculates the values of the inputs necessary to achieve a desired level of an output/goal?

Goal Seeking
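
Goal seeking can be sketched as a root-finding search. The example below uses bisection, which is only one possible approach, and assumes the model's output rises with the input over the search range. The `profit` model and its figures are made up for illustration.

```python
def goal_seek(model, target, low, high, tolerance=1e-9):
    """Find the input whose output equals target, by bisection.

    Assumes model is increasing on [low, high] and the target is bracketed.
    """
    if not model(low) <= target <= model(high):
        raise ValueError("target is not bracketed by [low, high]")
    while high - low > tolerance:
        mid = (low + high) / 2
        if model(mid) < target:
            low = mid
        else:
            high = mid
    return (low + high) / 2

def profit(units):
    # Hypothetical model: 25 per unit margin, 10,000 fixed cost.
    return 25 * units - 10_000

units_needed = goal_seek(profit, target=5_000, low=0, high=10_000)
```

Here the search works backward from the desired profit of 5,000 to the required input of about 600 units.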

What combines the outcomes of two or more of the same type of models such as decision trees?

Homogeneous Ensemble Model

What is the extent to which actors form ties with similar vs. dissimilar others?

Homophily

What is one or more Web pages that provide a collection of links to authoritative pages and implicitly conferring the authorities on a narrow field?

Hub

What is the most famous data warehousing architecture today because it's focused on building a scalable and maintainable infrastructure?

Hub & Spoke Architecture -allows for easy customization of user interfaces and reports, but can have data redundancy and latency.

What is the most popular publicly known and referenced algorithm used to calculate hubs and authorities?

Hyperlink Induced Topic Search

What are the major hardware players that provide the infrastructure for database computing?

IBM, Dell, HP, Oracle.

What are tools used for predictive analytics?

IBM, Oracle, SAP, Teradata, Informatica

What companies provide indigenous hardware and software platforms?

IBM, Oracle, and Teradata

What is the level of understanding and insight provided by the model?

Interpretability

What is the OS polarity?

If the objectivity value is close to 1, then there is no opinion to mine.

How does clustering improve search effectiveness for text mining?

Improved search recall and search precision.

What is the integration of the algorithmic extent of data analytics into data warehousing?

In Database Processing / In Database Analytics *used for high throughput, real time application environments, including fraud detection, credit scoring, risk management, etc.*

What keeps the data permanently in the main memory?

In Memory Database

What assumption states that the errors of the response variable are uncorrelated with each other?

Independence (weaker than actual statistical independence)

What is a small warehouse designed for strategic business unit or a department, but its source is not an enterprise data warehouse?

Independent Data Mart --lower cost & lower scale

What is the simplest and least costly architecture alternative?

Independent Data Marts *Developed to operate independent of each other and serve the needs of individual organizational units*

What is used to draw inferences or conclusions about the characteristics of the population?

Inferential Statistics

What are graphical models of a model that can facilitate the identification process?

Influence Diagram

What is the identification of key phrases and relationships within text by looking for predefined objects and sequences in text by way of pattern matching?

Information Extraction

What is the splitting mechanism used in ID3 which is perhaps the most widely known decision trees algorithm and was developed by Ross Quinlan?

Information Gain

What are the measures to assess the success of an architecture?

Information quality, system quality, individual impacts, and organizational impacts.

What does it mean to place data from different sources into a consistent format?

Integrated

What reflects intermediate outcomes in mathematical models?

Intermediate Result Variables

What are variables that can be measured on interval scales?

Interval Data e.g. temperature

When is the accuracy calculated by leaving one sample out at each iteration of the estimation process?

Jackknifing

When is the complete data set randomly split into k mutually exclusive subsets of approximately equal size?

K-Fold Cross Validation / Rotation Estimation
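
The split itself is easy to sketch. The example below shuffles sample indices and deals them into k folds; it is a minimal illustration using only the standard library, and the function name `k_fold_indices` and the seed are assumptions.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Randomly split indices 0..n_samples-1 into k mutually exclusive folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Dealing every k-th index keeps fold sizes within one of each other.
    return [indices[i::k] for i in range(k)]

folds = k_fold_indices(10, 3)
for test_fold in folds:
    # Each fold is held out once; the remaining folds form the training set.
    train = [i for fold in folds if fold is not test_fold for i in fold]
```

In practice a model is fit on `train` and scored on `test_fold` in each round, and the k scores are averaged.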

What represents a strategic objective and measures performance against a goal?

Key Performance Indicator (KPI)

The ________ Model, also known as the data mart approach, is a "plan big, build small" approach. A data mart is a subject-oriented or department-oriented data warehouse. It is a scaled-down version of a data warehouse that focuses on the requests of a specific department, such as marketing or sales.

Kimball

What is a process of using data mining methods to find useful information and patterns in the data which involved using algorithms to identify patterns in data?

Knowledge Discovery in Databases (KDD)

When would an expert's knowledge about the categories be encoded into the system either declaratively or in the form of procedural classification rules?

Knowledge Engineering Approach

What are the two main approaches to text classification?

Knowledge engineering and machine learning.

What are other names of data mining?

Knowledge extraction, pattern analysis, data archaeology, information harvesting, pattern searching, and data dredging

What measures the degree to which a distribution is more or less peaked than a normal distribution?

Kurtosis

What is used to change the dimensional orientation of a report or ad hoc query page display?

Pivot

What represents the final class choice for a pattern?

Leaf Node

What is used when every data point is used for testing once and as many models are developed as there are data points?

Leave One Out *time consuming, but best for small data sets

What is the catalog of words, their synonyms, and their meanings for a given language, from which a variety of special purpose versions are created for use in sentiment analysis projects?

Lexicon

What is the best known technique in a family of optimization tools called mathematical programming?

Linear Programming *all relationships among variables are linear

What assumption states that the relationship between the response variable and the explanatory variables is linear?

Linearity

What are the assumptions associated with linear regression?

Linearity, independence, normality, constant variance, and multicollinearity

When is the linkage among many objects of interest discovered automatically?

Link Analysis

What involves putting the data into the data warehouse?

Load

What is a very popular, statistically sound, probability-based classification algorithm that employs supervised learning?

Logistic Regression

What is used to classify a categorical variable?

Logistic Regression

When would a general inductive process build a classifier by learning from a set of preclassified examples?

Machine Learning Approach

What does an analyst use to navigate through the database and screen for a particular subset of the data by changing the data's orientations and defining analytical calculations?

OLAP

When can text mining be used to increase cross selling and up selling by analyzing the unstructured data generated by call centers?

Market Applications -Invaluable for CRM!

What are subcategories of association?

Market basket, link analysis, and sequence analysis.

What is a family of tools designed to help solve managerial problems in which the decision maker must allocate scarce resources among competing activities to optimize a measurable goal?

Mathematical Programming

What is a simpler way to calculate the overall deviation from the mean and is calculated by measuring the absolute values of the differences between each data point and the mean?

Mean Absolute Deviation
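
The definition translates almost word for word into code. The sketch below is a minimal illustration with a hypothetical function name and made-up data.

```python
def mean_absolute_deviation(values):
    """Average absolute distance of each data point from the mean."""
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)

# Mean is 5; absolute deviations are 3, 1, 1, 3.
mad = mean_absolute_deviation([2, 4, 6, 8])
```

Unlike variance, no squaring is involved, so the result stays in the same units as the data.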

What is a single numerical value that aims to describe a set of data by simply identifying or estimating the central position within the data?

Measure of Central Tendency

What is the measure of center value in a given data set?

Median

What is the most standardized and orderly, making it a more minable information source?

Medical Literature

What can drive changes in business intelligence?

Mergers & acquisitions, regulatory requirements, and introduction of new channels.

What describes the structure and some meaning about data contributing to their effective or ineffective use?

Metadata

What is a worldwide source for access to Microsoft's SQL Server suite for academic teaching and research purposes?

Microsoft Enterprise Consortium

Where are data and models stored in the same relational database environment, making model management a considerably easier task?

Microsoft SQL Server

Who provides easy to use tools for reporting or descriptive analytics?

Middleware Providers e.g. Oracle, SAP, and IBM

Who provides tools that enable reporting or descriptive analytics?

Middleware industry players e.g. Microsoft SQL, Tableau, SAS

What enables access to the data warehouse?

Middleware tools

What is the observation that occurs most frequently?

Mode *most useful for data with a small number of unique values

What is the most common two step methodology of classification type?

Model development/training and model testing/deployment.

What is the most common simulation method for business decisions that begins with building a model of the decision problem without having to consider the uncertainty of any variables?

Monte Carlo Simulation
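
The two stages in this card (a deterministic model first, uncertainty added afterward) can be sketched as follows. The profit model, its figures, and the uniform demand range are all assumptions made up for illustration.

```python
import random

def profit(demand, price=12.0, unit_cost=7.0, fixed_cost=3_000.0):
    """Stage 1: deterministic decision model, no uncertainty yet."""
    return demand * (price - unit_cost) - fixed_cost

def simulate(trials=10_000, seed=42):
    """Stage 2: sample the uncertain input many times and summarize outcomes."""
    rng = random.Random(seed)
    # Assumed uncertainty: demand is uniform between 400 and 1,000 units.
    outcomes = [profit(rng.uniform(400, 1_000)) for _ in range(trials)]
    mean_profit = sum(outcomes) / trials
    loss_share = sum(o < 0 for o in outcomes) / trials
    return mean_profit, loss_share

mean_profit, loss_share = simulate()
```

The output is a distribution of outcomes rather than a single number, which is what lets the decision maker see both the average profit and the chance of a loss.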

What is a branch of the field of linguistics and a part of NLP that studies the internal structure of words?

Morphology

What is the assumption that states that the explanatory variables are not correlated?

Multicollinearity

What involves data analysis in several dimensions and are generally shown in a spreadsheet format?

Multidimensional Analysis

What is the number of content forms contained in a tie?

Multiplexity

What is the approach to quickly answer ad hoc questions by executing multidimensional analytical queries against organizational data repositories?

OLAP

What are the systems that convert information from computer databases into readable human language?

Natural Language Generation

What is a subfield of artificial intelligence and computational linguistics that studies the problem of understanding natural human language, with the view of converting depictions of human language into more formal representations that are easier for computer programs to manipulate?

Natural Language Processing

What is the measure of the completeness of relational triads?

Network Closure

What is the term for describing analytics that relate to groups of people, social networks, supply chain networks, etc?

Network Science

What involves the development of mathematical structures that have the capability to learn from past experiences presented in the form of well structured data sets?

Neural Networks -Classification algorithm

What has finite non-ordered values?

Nominal Data

What contains measurements of simple codes assigned to objects as labels which are not measurements?

Nominal data - can be represented with binomial values having two possible values, or with more, e.g. the variable marital status (single, married, divorced)

What means that some experimentation type search or inference is involved?

Nontrivial

What does it mean if users can't change or update the data?

Nonvolatile

What assumption states that the errors of the response variable are normally distributed?

Normality

What means that the patterns are not previously known to the user within the context of the system being analyzed?

Novel

What are numeric values?

Numeric Data

What represents the numeric values of specific variables?

Numeric Data / Continuous Data (scalable data) -- can be integer or real.

What are lagging indicators that measure the output of past activity?

Outcome KPI (financial in nature)

What uses JavaScript embedded in the site page code to make image requests to a 3rd party analytics dedicated server whenever a page is rendered by a web browser?

Page Tagging

What is the most basic of measurements and is presented as the average page views per visitor?

Page Views

What enables multiple CPUs to process data warehouse query requests simultaneously and provides scalability?

Parallel Processing

What assists managers in tracking the implementation of business strategy by comparing actual results against strategic goals and objectives?

Performance Measurement Systems

What are examples of decision analysis attributes?

Performance measures, operational metrics, aggregated measures, and all the others to analyze the organization's performance.

What means that the discovered patterns should lead to some benefit to the user or task?

Potentially Useful

What is the act of telling about the future?

Prediction / Forecasting

What are categories of data mining tasks?

Prediction, Association, and Segmentation

What tells the nature of future occurrences of certain events based on what has happened in the past?

Predictions

What is the most commonly used assessment factor for classification models, measuring how well the model predicts the class label of new or previously unseen data?

Predictive Accuracy

What aims to determine what is likely to happen in the future and is based on statistical techniques?

Predictive Analytics

What answers the question "What will happen?" and "Why will it happen?"

Predictive Analytics

Where has the biggest growth in analytics been?

Predictive Analytics

What answers the question "What should I do?" or "Why should I do it?"

Prescriptive Analytics

What is used to provide a decision or a recommendation for a specific action?

Prescriptive Analytics

What is used to recognize what is going on as well as the likely forecast and make decisions to achieve the best performance possible?

Prescriptive Analytics

What is modeling a key element for?

Prescriptive analytics

In what simulation would one or more of the independent variables be probabilistic?

Probabilistic Simulation

What implies the data mining comprises many iterative steps?

Process

What is the tendency for actors to have more ties with geographically close others?

Propinquity

What contains both nominal and ordinal data?

Qualitative Data / Categorical Data

What is made up of result variables, decision variables, uncontrollable variables, and intermediate result variables?

Quantitative Models

What is a quarter of the number of data points given in a data set?

Quartile

What is a useful measure of dispersion because they are much less affected by outliers or a skewness in the data set?

Quartile -- reported along with the median as the best choice of measure of dispersion and central tendency

What are the two components of a response cycle?

Query Analyzer and Document Matcher/Ranker

What employs a hierarchical clustering approach where the most relevant documents to the posed query appear in small tight clusters that are nested in larger clusters containing less similar documents, creating a spectrum of relevance levels among the documents?

Query Specific Clustering

What is the task of automatically answering a question posed in natural language?

Question Answering

What are the open source platforms that have emerged as popular industrial strength software tools for predictive analytics?

R, Rapid Miner, and KNIME

What ranges from 0 to 1 with 0 indicating that the proposed model is NOT a good fit and 1 indicating that the proposed model is a perfect fit?

R² (R-squared)
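
To see where the 0 and 1 endpoints come from, the sketch below computes the coefficient of determination as one minus the ratio of residual to total sum of squares. The function name `r_squared` and the sample values are illustrative, and the code assumes the observed values are not all identical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean = sum(observed) / len(observed)
    ss_total = sum((y - mean) ** 2 for y in observed)
    ss_residual = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_residual / ss_total

perfect = r_squared([1, 2, 3], [1, 2, 3])     # predictions match exactly
mean_only = r_squared([1, 2, 3], [2, 2, 2])   # model just predicts the mean
```

The 0 to 1 range holds for a least-squares fit with an intercept; predictions worse than the mean would push this formula below 0.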

Cost of the business intelligence system software licensing is one component included in the _______ calculation.

ROBII

What is the difference between the largest and smallest values in a given data set?

Range (simplest measure of dispersion)

What is the most popular general platform for data mining/data science?

RapidMiner

What includes measurement variables commonly found in the physical sciences and engineering?

Ratio Data e.g. mass, length, time, plane angle, energy

What implies that the refresh cycle of an existing data warehouse to update the data is more frequent?

Real-Time Data Warehousing

What attempts to describe the dependence of a response variable on one or more explanatory variables, implicitly assuming that there is a one-way causal effect from the explanatory variables to the response variable?

Regression

What is a simple statistical technique to model the dependence of a variable on one or more explanatory variables?

Regression

What is concerned with the relationships between all explanatory variables and the response variable?

Regression

What is the most widely known and used analytics technique in statistics, used for hypothesis testing and prediction/forecasting?

Regression

What is a dependent variable also known as?

Response or output

What reflects the level of effectiveness of a system by indicating how well the system performs or attains its goals?

Result/Outcome Variables

What is the mantra for business intelligence?

Right information at the right time and in the right place.

When must the decision maker consider several possible outcomes for each alternative each with a given probability of occurrence?

Risk / Probabilistic / Stochastic Decision Making

What is a decision making method that analyzes the risk associated with different alternatives?

Risk Analysis

What is a model's ability to make reasonably accurate predictions given noisy data or data with missing and erroneous values?

Robustness

What involves computing all the data relationships for one or more dimensions?

Roll-Up

What system captured experts' knowledge in a format that computers could process so that it could be used for consultation, allowing scarce expertise to be made available where and when needed?

Rule Based Expert Systems

Who are some examples of ETL providers?

SAS, Microsoft, Oracle, IBM

What process takes a statistically representative sample of the data, applies exploratory statistical and visualization techniques, selects and transforms the most significant predictive variables, models the variables to predict outcomes, and confirms a model's accuracy?

SEMMA

What are some data solution providers offering hardware and platform independent database management systems?

SQL Server family of Microsoft and SAP

What are the most commonly used database management systems?

SQL Server, Oracle, and DB2

What is a creative way of deploying information systems applications where the provider licenses its applications to customers for use as a service on demand?

SaaS (Extended ASP Model)

What are the steps of the SEMMA Data Mining Process?

Sample, Explore, Modify, Model, and Assess.

What is the ability to construct a prediction model efficiently given a rather large amount of data?

Scalability

What are the two most popular clustering methods for text mining?

Scatter/gather and query specific clustering

What is a software program that searches for documents, based on keywords users have provided, that have to do with the subject of their inquiry?

Search Engine

What is the intentional activity of affecting the visibility of an e-commerce site or a web site in a search engine's natural search results?

Search Engine Optimization (SEO)

What are the main concerns for a data warehouse professional?

Security & privacy of information

What are the URLs that a Web crawler starts from known as?

Seeds

What is the most common method for solving a risk analysis problem?

Select the alternative with the greatest expected value.
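
The expected-value rule can be worked through in a few lines. This is a minimal sketch, assuming two hypothetical alternatives whose payoffs and probabilities are invented for illustration.

```python
def expected_value(outcomes):
    """Sum of payoff * probability over an alternative's possible outcomes."""
    return sum(payoff * prob for payoff, prob in outcomes)

# Hypothetical alternatives: each maps to (payoff, probability) pairs
# whose probabilities sum to 1.
alternatives = {
    "bonds": [(50, 0.5), (40, 0.5)],
    "stocks": [(200, 0.25), (0, 0.75)],
}

# Pick the alternative with the greatest expected value.
best = max(alternatives, key=lambda name: expected_value(alternatives[name]))
```

With these figures "bonds" has an expected value of 45 and "stocks" 50, so the rule selects "stocks" even though its most likely outcome pays nothing.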

What attempts to assess the impact of change in the input data or parameters on the proposed solution?

Sensitivity Analysis

What collect massive amounts of data at a fast rate and have been adopted by various sectors such as healthcare, sports, and energy?

Sensors

What is a technique used to detect favorable and unfavorable opinions toward specific products and services using a large number of textual data sources?

Sentiment Analysis

When are the relationships examined in terms of their order of occurrence to identify associations over time?

Sequence Mining

What is the discovery of time ordered events?

Sequential Relationships

What partitions the data into two mutually exclusive subsets called a training set and a test set?

Simple Split
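
A simple split might look like the sketch below. It assumes a one-third test fraction, a fixed random seed for repeatability, and a hypothetical `simple_split` helper; none of these names come from the course material.

```python
import random

def simple_split(records, test_fraction=1 / 3, seed=42):
    """Shuffle the records, then cut them into disjoint training and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# 30 hypothetical record IDs: 20 go to training, 10 to testing.
train, test = simple_split(range(30))
```

Every record lands in exactly one of the two subsets, which is what makes them mutually exclusive.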

What is normally used when a problem is too complex to be treated using numerical optimization techniques?

Simulation

What is the appearance of reality and is a technique for conducting experiments with a computer on a model of a management system?

Simulation

What reduces the overall dimensionality of the input matrix to a lower dimensional space where each consecutive dimension represents the largest degree of variability possible?

Singular Value Decomposition

What is a performance management methodology aimed at reducing the number of defects in a business process to as close to 0 DPMO as possible?

Six Sigma

What is a measure of asymmetry in a distribution of data that has a unimodal structure, where only one peak exists in the distribution?

Skewness

What is a subset of multidimensional array corresponding to a single value set for one or more of the dimensions not in the subset?

Slice -3D Cube

What are commonly used OLAP operations?

Slice & Dice, drill down, roll-up, and pivot.

What is a logical arrangement of tables in a multidimensional database in such a way that the entity relationship diagram resembles a snowflake in shape?

Snowflake Schema -dimensions are normalized into multiple related tables.

What is the mining of textual content created in social media and the analysis of socially established networks for the purpose of gaining insight about existing and potential customers' current and future behaviors and about their likes and dislikes toward a firm's product/service?

Social Analytics

What are the enabling technologies of social interactions among people in which they create, share, and exchange information?

Social Media

What are the systematic and scientific ways to consume the vast amount of content created by Web-based social media outlets, tools, and techniques for the betterment of an organization's competitiveness?

Social Media Analytics

What is a social structure composed of individuals/people linked to one another with some type of connections/relationships and provides a holistic approach to analyzing the structure and dynamics of social entities?

Social Network

What is a theoretical construct useful in the social sciences to study relationships between individuals, groups, organizations, or even societies?

Social Network

What follows the links between friends, fans, and followers to identify connections of influence as well as the biggest sources of influence?

Social Network Analysis

What is the systematic examination of social networks that view social relationships in terms of network theory?

Social Network Analysis (SNA)

What can be placed on a separate server in the network or on the transactional application databases themselves and can use event and process based approaches to proactively and intelligently measure and monitor operational processes?

Software Monitors / Intelligent Agents

What is the semiautomated process of extracting patterns from large amounts of unstructured data sources?

Text Mining

What converts spoken words to machine readable input?

Speech Recognition

What are the computation costs involved in generating and using the model where faster is deemed to be better?

Speed

What is a test on one or more attributes and determines how the data are to be divided further?

Split Point

What is the most popular end user modeling tool because it incorporates many powerful financial, statistical, mathematical, and other functions?

Spreadsheet

What is the measure of the spread of values within a set of data?

Standard Deviation

What is the most commonly used and the simplest style of dimensional modeling that contains a central fact table surrounded and connected to dimension tables?

Star Schema

What is designed to provide fast query response time, simplicity, and ease of maintenance for read only database structures?

Star Schema -dimensions are denormalized with each dimension being represented by a single table

When representing data in a data warehouse, using several dimension tables that are each connected only to a fact table means you are using which warehouse structure?

Star schema

What is a collection of mathematical techniques to characterize and interpret data?

Statistics

What is the process of reducing inflected words to their stem form?

Stemming

What are words that are filtered out prior to or after processing of natural language data?

Stop words

What are the two aspects to managing data that can't be stored in a single unit?

Storing and processing.

What is a computer program that automatically converts normal language text into human speech?

Text to Speech / Speech Synthesis

What are the steps of a closed loop BPM strategy?

Strategize, plan, monitor/analyze, and act/adjust

What is a high level plan of action, encompassing a long period of time to achieve a defined goal?

Strategy

What are features of a KPI?

Strategy, targets, ranges, encoding, time frames, and benchmarks.

What is the absence of ties between two parts of a network?

Structural Holes

What do data mining algorithms use that can be classified as categorical or numeric?

Structured data -Categorical: Nominal, ordinal -Numerical: Interval, ratio

What enables users to determine how their business is performing and why?

Subject Oriented -- provides a more comprehensive view of the organization.

What are characteristics of data warehousing?

Subject oriented, integrated, time variant, nonvolatile.

What are the types of metadata (based on pattern)?

Syntactic, structural, and semantic.

What is a single word or multi-word phrase extracted directly from the corpus of a specific domain by means of NLP methods?

Term

When would rows represent documents and columns represent terms?

Term Document Matrix
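
A small term-document matrix can be built by hand to make the layout concrete. This is a minimal sketch, assuming two made-up documents and plain whitespace tokenization (no stemming or stop-word removal).

```python
from collections import Counter

# Hypothetical corpus of two short documents.
docs = ["data mining finds patterns", "text mining mines text"]

tokenized = [doc.split() for doc in docs]
terms = sorted({term for doc in tokenized for term in doc})

# One row per document, one column per term; each cell is a raw term count.
matrix = [[Counter(doc)[term] for term in terms] for doc in tokenized]
```

The columns follow the sorted term list (`data`, `finds`, `mines`, `mining`, `patterns`, `text`), so the second row records that "text" occurs twice in the second document.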

What is used to test the classifier after it is built and holds 1/3 of the data?

Test Set

What is more commonly used in a business application context?

Text Analytics -- relatively new term

What is frequently used in academic research?

Text Mining

What are issues that are pertaining to scalability?

The amount of data in a warehouse, how quickly the warehouse is expected to grow, the number of concurrent users, and the complexity of user queries. **must scale both horizontally and vertically

What are popular techniques for time series forecasting?

The averaging methods -- simple average, moving average, weighted moving average, and exponential smoothing.
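
The four averaging methods can be compared on one short series. This is a minimal sketch, assuming a hypothetical demand series, weights listed from oldest to newest, and exponential smoothing seeded with the first observation; each function returns a forecast for the next period.

```python
def simple_average(series):
    """Forecast = mean of all past observations."""
    return sum(series) / len(series)

def moving_average(series, window):
    """Forecast = mean of the last `window` observations."""
    recent = series[-window:]
    return sum(recent) / len(recent)

def weighted_moving_average(series, weights):
    """Forecast = weighted mean of the last len(weights) observations."""
    recent = series[-len(weights):]
    return sum(w * x for w, x in zip(weights, recent)) / sum(weights)

def exponential_smoothing(series, alpha):
    """Forecast = alpha * latest actual + (1 - alpha) * previous forecast."""
    forecast = series[0]  # assumption: seed with the first observation
    for actual in series[1:]:
        forecast = alpha * actual + (1 - alpha) * forecast
    return forecast

demand = [10, 12, 14, 16]  # hypothetical, upward-trending series
```

On this series the forecasts are 13.0 (simple average), 15.0 (moving average, window 2), 15.5 (weights 1 and 3), and 14.25 (alpha 0.5). Methods that favor recent observations track the trend more closely.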

What is the purpose of technology providers or the outer petals?

They provide technology, solutions, and training to analytics user organizations so they can employ these technologies in the most effective and efficient manner.

A Web client that connects to a Web server, which is in turn connected to a BI application server, is reflective of a

Three tier architecture

What is defined by the linear combination of time, emotional intensity, intimacy, and reciprocity?

Tie Strength

What is the structure of a two tier architecture?

Tier 1: Client Workstation Tier 2: Application & Database Server **more economical, but more performance problems

What is the structure of a three tier architecture?

Tier 1: Client Workstation Tier 2: Application Server Tier 3: Database Server **eliminates resource constraints and makes it possible to easily create DMs

What is a situation in which it's not important to know exactly when the event occurred?

Time Independent

What is a sequence of data points of the variable of interest, measured and represented as successive points in time spaced at uniform time intervals?

Time Series

What assumes all the explanatory variables are aggregated and consumed in the response variable's time variant behavior?

Time Series Forecasting

What is the use of mathematical modeling to predict future values of the variable of interest based on previously observed values?

Time Series Forecasting

What measures the visitor's interaction with the website?

Time on Site

True or false: During classification in data mining, a false positive is an occurrence classified as true by the algorithm while being false in reality.

True

True or false: The cost of data storage has plummeted recently, making data mining feasible for more firms.

True

True or false: The number of users of free/open source data mining software now exceeds that of users of commercial software versions.

True

What is the outcome when the predicted class is negative and the observed class is negative?

True Negative

What is the outcome when the predicted class is positive and the observed class is positive?

True Positive
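
The four classification outcomes can be tallied from paired observed and predicted labels. This is a minimal sketch, assuming hypothetical boolean labels where `True` means the positive class.

```python
def confusion_counts(observed, predicted):
    """Count true/false positives and negatives from paired boolean labels."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for obs, pred in zip(observed, predicted):
        if pred and obs:
            counts["TP"] += 1  # predicted positive, observed positive
        elif not pred and not obs:
            counts["TN"] += 1  # predicted negative, observed negative
        elif pred and not obs:
            counts["FP"] += 1  # predicted positive, observed negative
        else:
            counts["FN"] += 1  # predicted negative, observed positive
    return counts

# Hypothetical labels for five cases.
observed = [True, True, False, False, True]
predicted = [True, False, False, True, True]
counts = confusion_counts(observed, predicted)
```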

What means that the pattern should make business sense that leads to the user saying they understand?

Ultimately Understandable

When would the decision maker consider situations in which several outcomes are possible for each course of action and the decision maker does not know the probability of occurrence of the possible outcomes?

Uncertainty

What are the factors that affect the result variables, but are not under the control of the decision maker?

Uncontrollable variables/Parameters

What is composed of any combination of textual, imagery, voice, and Web content?

Unstructured data

What is a critical success factor in data warehouse development?

User participation

What uses animated computer graphic displays to present the impact of different managerial decisions?

VIS

What means that the discovered patterns should hold true on new data with a sufficient degree of certainty?

Valid

What is used to calculate the deviation of all data points in a given data set from the mean?

Variance
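
Variance and standard deviation are closely linked: the standard deviation is the square root of the variance. This is a minimal sketch, assuming the population form (dividing by N; the sample form divides by N - 1) and a made-up data set.

```python
import math

def population_variance(data):
    """Mean of the squared deviations of each point from the mean."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data) / len(data)

def population_std_dev(data):
    """Square root of the population variance."""
    return math.sqrt(population_variance(data))

# Hypothetical data set with mean 5.
values = [2, 4, 4, 4, 5, 5, 7, 9]
variance = population_variance(values)
std_dev = population_std_dev(values)
```

For this data the squared deviations sum to 32, giving a variance of 4.0 and a standard deviation of 2.0.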

What is a simulation method that lets decision makers see what the model is doing and how it interacts with the decisions made, as they are made?

Visual Interactive Simulation (VIS)

What is a significant technology that has become a key player in descriptive analytics?

Visualization

What is an integral part of analytic CRM and customer experience management systems that helps to better understand and better manage customer complaints/praises?

Voice of the Customer

What has been limited to employee satisfaction surveys and is a way to listen to what employees are saying?

Voice of the Employee

What is about understanding aggregate opinions and trends and helps companies with competitive intelligence and product development and positioning?

Voice of the Market

What is primarily Web site usage data focused and aims to describe what has happened on the Web site?

Web Analytics

What is the process of extracting useful information from the links embedded in Web documents and identifies authoritative pages and hubs?

Web structure mining

What is the integration of data warehousing and Internet that offer important solutions for managing corporate data?

Web-Based Data Warehousing

What are the two main components of a search engine's development cycle?

Webcrawler & Document Indexer

What open source data mining software includes a large number of algorithms for different data mining tasks and has an intuitive user interface most popular in educational circles?

Weka

What are examples of open source, free data mining software?

Weka, KNIME, RapidMiner

What is the outcome of descriptive analytics?

Well-defined business problems and opportunities.

What is structured as "What will happen to the solution if an input variable, an assumption, or a parameter value is changed?"

What If Analysis

________ analytics help managers make decisions to achieve the best performance in the future.

prescriptive

What is the difference between white hat and black hat SEO?

White hats tend to produce results that last a long time, whereas black hats anticipate that their sites may eventually be banned, either temporarily or permanently, once the search engines discover what they are doing.

What is a laboriously hand coded database of English words, their definitions, sets of synonyms, and various semantic relations between synonym sets?

WordNet -- expensive to build and maintain for NLP

What is the purpose of the analytics accelerators or the inner petals?

Works with both technology providers and users.

What is the most granular polarity identification?

Word Level

What is the world's largest data and text repository?

World Wide Web (WWW)

What is the process used to optimally price services to maximize revenues as a function of time varying transactions?

Yield Management

Data warehouses are intended to work with informational data used for online ________ processing systems.

analytical

Which broad area of data mining applications partitions a collection of objects into natural groupings with similar features?

clustering

At a very high level, the text mining process can be broken down into three consecutive tasks, the first of which is to establish the ________.

corpus

Knowledge extraction, pattern analysis, data archaeology, information harvesting, pattern searching, and data dredging are all alternative names for ________.

data mining

The three main types of data warehouses are data marts, operational ________, and enterprise data warehouses.

data stores

From the data taxonomy, YouTube videos are an example of _________ data.

unstructured

In The Business Pressures-Responses-Support Model, globalization, customer demand, and competition are examples of ___________ factors.

environment

The ________ Model, also known as the EDW approach, emphasizes top-down development, employing established database development methodologies and tools, such as entity-relationship diagrams (ERD), and an adjustment of the spiral development approach.

Inmon

Data ________ comprises data access, data standardization, and change capture.

integration

What is used to develop probabilistic models between one or more explanatory/predictor variables and a class/response variable?

Logistic Regression

Because of its successful application to retail business problems, association rule mining is commonly called ________.

market basket analysis

Prediction problems where the variables have numeric values are most accurately defined as

regressions

A(n) ________ engine is a software program that finds Web sites or files based on keywords.

search

In which stage of extraction, transformation, and load (ETL) into a data warehouse are anomalies detected and corrected?

transformation

