BCOR 2020 Exam 1

Benefits of BI and Analytics

1. Detect fraud 2. Improve forecasting 3. Increase sales 4. Optimize operations 5. Reduce costs

Options for addressing missing data

1. Discard observations (rows) with any missing values 2. Discard any variable (column) with missing values 3. Fill in missing entries with estimated values 4. Apply a data-mining algorithm that can handle missing values
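The first three options can be sketched in a few lines of Python. This is only an illustration: the toy records, the column names, and the choice of the column mean as the "estimated value" are assumptions, not part of the course material.

```python
# Hypothetical toy data: each dict is an observation, None marks a missing value.
rows = [
    {"id": 1, "age": 34, "income": 52000},
    {"id": 2, "age": None, "income": 61000},
    {"id": 3, "age": 45, "income": None},
    {"id": 4, "age": 29, "income": 48000},
]

def drop_incomplete_rows(data):
    """Option 1: discard observations (rows) with any missing value."""
    return [r for r in data if all(v is not None for v in r.values())]

def drop_incomplete_columns(data):
    """Option 2: discard variables (columns) with any missing value."""
    keep = [k for k in data[0] if all(r[k] is not None for r in data)]
    return [{k: r[k] for k in keep} for r in data]

def impute_with_mean(data, column):
    """Option 3: fill missing entries with an estimate (assumed here: the column mean)."""
    observed = [r[column] for r in data if r[column] is not None]
    mean = sum(observed) / len(observed)
    return [{**r, column: mean if r[column] is None else r[column]} for r in data]

complete_rows = drop_incomplete_rows(rows)
complete_columns = drop_incomplete_columns(rows)
imputed = impute_with_mean(rows, "age")
```

Dropping rows or columns is simple but throws data away; imputation keeps every observation at the cost of introducing estimated values.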

Challenges of Big Data

1. How to choose what subset of the data to store 2. Where and how to store the data 3. How to find the nuggets of data that are relevant to the decision making at hand 4. How to derive value from the relevant data 5. How to identify which data needs to be protected from unauthorized access

Decision Making Process

1. Identify and define the problem 2. Determine the criteria 3. Determine the set of alternative solutions 4. Evaluate the alternatives 5. Choose an alternative

Data mining

A BI analytics tool used to explore large amounts of data for hidden patterns to predict future trends and behaviors for use in decision making

Data Attribute

A characteristic of an entity

Data modeling

A diagram of data entities and their relationships

Primary Key

A field or set of fields that uniquely identifies the record

Histogram

A graphical display of a frequency distribution, relative frequency distribution, or percent frequency distribution of quantitative data constructed by placing the class intervals on the horizontal axis and the frequencies, relative frequencies, or percent frequencies on the vertical axis.

Conversion Funnel

A graphical representation that summarizes the steps a consumer takes in making the decision to buy your product and become a customer.

Database Management System (DBMS)

A group of programs that manipulate the database and provide an interface between the database and the user of the database and other application programs.

Data warehouse

A large database that collects business information from many sources in the enterprise in support of management decision making

Linear Regression

A mathematical technique for predicting the value of a dependent variable based on a single independent variable and the linear relationship between the two
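A minimal sketch of fitting a line with one independent variable. It assumes the usual least-squares fit, which the card does not name explicitly; the advertising and sales figures are made up for illustration.

```python
def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

ad_spend = [1, 2, 3, 4, 5]   # independent variable (hypothetical)
sales = [3, 5, 7, 9, 11]     # dependent variable (hypothetical)

slope, intercept = fit_line(ad_spend, sales)
predicted_sales = intercept + slope * 6  # prediction for a new x value
```

For this perfectly linear toy data the fit recovers slope 2 and intercept 1, so the prediction at x = 6 is 13.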

Geometric Mean

A measure of location that is calculated by finding the nth root of the product of n values, used in analyzing growth rates in financial data
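As a rough illustration of the definition above, the geometric mean of n growth factors is the nth root of their product. The growth factors below are hypothetical.

```python
import math

def geometric_mean(values):
    """nth root of the product of n positive values."""
    return math.prod(values) ** (1 / len(values))

# Hypothetical yearly growth factors: +10%, -5%, +20%.
growth_factors = [1.10, 0.95, 1.20]
average_factor = geometric_mean(growth_factors)
```

Compounding `average_factor` for three years gives the same total growth as the three actual factors, which is why this measure suits growth rates better than the arithmetic mean.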

Variance

A measure of variability that uses all of the data. It is based on the deviations about the mean (the difference between each observation and the mean): the deviations are squared, summed, and divided by n - 1 for a sample (or by n for a population), so the variance measures how far a set of numbers is spread out from its average value
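A short sketch of the calculation, assuming the sample variance (divisor n - 1, the same convention as Excel's VAR.S). The data values are hypothetical.

```python
import math

def sample_variance(values):
    """Sum of squared deviations about the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

class_sizes = [46, 54, 42, 46, 32]  # hypothetical data, mean = 44
variance = sample_variance(class_sizes)
standard_deviation = math.sqrt(variance)  # same units as the original data
```

Here the squared deviations sum to 256, so the variance is 256 / 4 = 64 and the standard deviation is 8.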

Online Analytical Processing (OLAP)

A method to analyze multidimensional data from many different perspectives. OLAP enables users to identify issues and opportunities and perform trend analysis

Data Entity

A person, place, or thing for which data is collected, stored, or maintained

Data Query

A request for information with certain characteristics

Value Chain

A series (chain) of activities that an organization performs to transform inputs into outputs in such a way that the value of the input is increased.

Information System

A set of interrelated elements that collect, process, store, and disseminate data and information, and provide a feedback mechanism to monitor and control the system's operation so that it continues to meet its goals and objectives

Process

A set of logically related tasks performed to achieve a defined outcome

Computer-based information system (CBIS)

A single set of hardware, software, databases, networks, people, and procedures that are configured to collect, manipulate, store, and process data into information.

Sample data

A subset of the population

Frequency Distributions for Categorical Data

A summary of data that shows the frequency of observations in each of several non-overlapping classes (bins)
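A minimal sketch of building a frequency distribution for categorical data; it also reports the relative and percent frequency of each class. The soft-drink purchases are invented for illustration.

```python
from collections import Counter

def frequency_table(values):
    """Map each category to its frequency, relative frequency, and percent frequency."""
    counts = Counter(values)
    total = len(values)
    return {
        category: {
            "frequency": count,
            "relative": count / total,
            "percent": 100 * count / total,
        }
        for category, count in counts.items()
    }

# Hypothetical sample of 10 purchases.
purchases = ["Coke", "Pepsi", "Coke", "Sprite", "Coke",
             "Pepsi", "Coke", "Sprite", "Coke", "Pepsi"]
table = frequency_table(purchases)
```

Because the classes do not overlap, the frequencies add up to the number of observations and the relative frequencies add up to 1.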

Scatter chart

A useful graph for analyzing the relationship between two variables. Positive relationship: as one variable increases, the other generally increases as well

Word cloud

A visual depiction of a set of words that have been grouped together because of the frequency of their occurrence.

Tactical (Managerial) Decisions

Concern how the organization should achieve the goals and objectives set by its strategy

Operational Decisions

Affect how the firm runs its day-to-day operations

Technology infrastructure

All the hardware, software, databases, telecommunications, people, and procedures that are configured to collect, manipulate, store, and process data into information.

Database as a Service (DaaS)

An arrangement where the database is stored on a service provider's servers and accessed by the service subscriber over a network, typically the Internet, with the database administration handled by the service provider.

Database

An organized collection of data

Data with bell-shaped distribution

Approx. 68% will be within 1 standard deviation, approx. 95% of the data will be within 2 standard deviations, and almost all data will be within 3 standard deviations.

Categorical data

Arithmetic operations can't be performed on them

Utility Theory

Assigns values to outcomes based on the decision maker's attitude toward risk, loss, and other factors

Knowledge

Awareness and understanding of a set of information and the ways it can be useful to support a task. The process of defining relationships among data to create useful information requires knowledge

Spreadsheets

Business managers can often import data into a spreadsheet program, which can then perform operations on the data based on formulas created by the end user. Spreadsheets are also used to create reports and graphs based on that data. Excel Scenario Manager: used to perform "what-if" analysis to evaluate various alternatives

Range

Can be found by subtracting the smallest value from the largest value in the data set. Drawback: range is based on only two of the observations and thus is highly influenced by extreme values

Cross-sectional data

Collected from several entities at the same, or approximately the same, point in time

Time Series data

Collected over several time periods

Random sampling

Collecting a sample in a way that ensures that (i) each element selected comes from the same population and (ii) each element is selected independently

Data Dashboard

Collections of tables, charts, maps, and summary statistics that are updated as new data become available

Simulation Optimization

Combines the use of probability and statistics to model uncertainty with optimization techniques to find good decisions in highly complex and highly uncertain settings

Predictive Analytics

Consists of techniques that use models constructed from past data to predict the future or ascertain the impact of one variable on another. Examples include: Linear regression, time series analysis, some data-mining techniques, and simulation (risk-analysis)

Data Cubes

Contain numeric facts called measures which are categorized by dimensions, such as time and geography. Can be built to summarize unit sales of a specific item on a specific day for a specific store
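One way to picture a data cube is as a set of facts that can be summed along any combination of dimensions. The sketch below is a simplified assumption of how that works, using a hypothetical `roll_up` helper and made-up sales records; real OLAP tools are far more elaborate.

```python
from collections import defaultdict

# Hypothetical facts: the measure is "units"; the dimensions are item, day, and store.
sales = [
    {"item": "widget", "day": "2024-01-01", "store": "A", "units": 5},
    {"item": "widget", "day": "2024-01-01", "store": "A", "units": 3},
    {"item": "widget", "day": "2024-01-01", "store": "B", "units": 2},
    {"item": "gadget", "day": "2024-01-01", "store": "A", "units": 4},
    {"item": "widget", "day": "2024-01-02", "store": "A", "units": 6},
]

def roll_up(facts, dimensions, measure="units"):
    """Total the measure for every combination of the chosen dimensions."""
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[d] for d in dimensions)
        totals[key] += fact[measure]
    return dict(totals)

by_cell = roll_up(sales, ["item", "day", "store"])  # one item, one day, one store
by_store = roll_up(sales, ["store"])                # summarized across items and days
```

The same facts answer both the fine-grained question (units of a specific item on a specific day at a specific store) and the summarized one (units per store).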

Before building a database

Content, access, logical structure, physical organization, archiving, security

Information

Data by themselves aren't very useful. Information is a collection of data organized so that they have value beyond the facts themselves

Enterprise data modeling

Data modeling done at the level of the entire enterprise

Entity relationship (ER) diagrams

Data models that use basic graphical symbols to show the organization of and relationship between data

Sources of Data

Data necessary to analyze a business problem or opportunity can often be obtained with an appropriate study (experimental or observational)

Hierarchy of Data

Database > Files > Records > Fields > Characters (8 bits)

Covariance

Descriptive measure of the linear association between two variables. If the covariance is > 0, it indicates a positive relationship. If it is near 0, the variables aren't linearly related. If it is < 0, they're negatively related. In Excel: =COVARIANCE.S(array1, array2)
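A small sketch of the sample covariance, which is what Excel's COVARIANCE.S computes (divisor n - 1). The data values are hypothetical.

```python
def sample_covariance(x, y):
    """Average product of paired deviations about the means, using divisor n - 1."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

x = [1, 2, 3, 4, 5]
rising = [3, 5, 7, 9, 11]    # moves with x: positive covariance
falling = [11, 9, 7, 5, 3]   # moves against x: negative covariance

positive = sample_covariance(x, rising)
negative = sample_covariance(x, falling)
```

Only the sign is easy to interpret; the size of the covariance depends on the units of both variables.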

Coefficient of Variation

Descriptive statistic that indicates how large the standard deviation is relative to the mean. Expressed as a percentage: (standard deviation / mean) × 100
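A brief sketch of the calculation, assuming the sample standard deviation is used. The data are hypothetical.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation expressed as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

class_sizes = [46, 54, 42, 46, 32]  # hypothetical data: mean 44, standard deviation 8
cv = coefficient_of_variation(class_sizes)
```

For this data the standard deviation is about 18.2% of the mean. Because the result is unit-free, it can be used to compare the variability of data sets measured on different scales.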

Supply Chain Management (SCM)

Encompasses all the activities required to get the right product into the right consumer's hands in the right quantity at the right time and at the right cost

Descriptive Analytics

Encompasses the set of techniques that describes what has happened in the past. Examples include: data queries, reports, descriptive statistics, and data visualization including data dashboards, some data-mining techniques, and basic what-if spreadsheet models

Business Problems

Every business has problems. Ex. How much to produce? How much to buy? When to open the store? Products to promote?

ETL process

Extract, transform, and load

Identifying Outliers

Extreme values in a data set. Can be identified using z-scores. Any data value with a z-score less than -3 or greater than +3 is an outlier. Such data values can be reviewed to determine their accuracy and whether they belong in the data set.
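The z-score rule can be sketched as follows, assuming the sample standard deviation; the data and the helper names are hypothetical.

```python
import statistics

def z_scores(values):
    """Standardize each value: (value - mean) / standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def find_outliers(values, threshold=3.0):
    """Return the values whose z-score is below -threshold or above +threshold."""
    return [v for v, z in zip(values, z_scores(values)) if abs(z) > threshold]

# Hypothetical data with one suspicious entry (50).
wait_times = [10, 12, 11, 9, 10, 11, 12, 10, 9, 11, 10, 12, 11, 10, 9, 50]
flagged = find_outliers(wait_times)
```

A flagged value is a candidate for review, not automatically an error: it should be checked for accuracy before deciding whether it belongs in the data set.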

Box plot

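Graphical summary of the distribution of data, developed from the quartiles of a data set. Limits are located using the interquartile range (IQR = Q3 - Q1): Lower limit = Q1 - 1.5(IQR) and Upper limit = Q3 + 1.5(IQR). Data values outside these limits are considered outliers.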
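A minimal sketch of locating box plot limits at Q1 - 1.5 × IQR and Q3 + 1.5 × IQR. Quartile conventions differ between textbooks and tools; this version assumes the default "exclusive" method of Python's `statistics.quantiles`, and the data are hypothetical.

```python
import statistics

def box_plot_limits(values):
    """Return (lower limit, upper limit) based on the interquartile range."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def outside_limits(values):
    """Values that fall below the lower limit or above the upper limit."""
    lower, upper = box_plot_limits(values)
    return [v for v in values if v < lower or v > upper]

orders = [1, 2, 3, 4, 5, 6, 30]  # hypothetical data: Q1 = 2, Q3 = 6, IQR = 4
limits = box_plot_limits(orders)
outliers = outside_limits(orders)
```

With Q1 = 2 and Q3 = 6 the limits are -4 and 12, so the value 30 is plotted as an outlier.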

Strategic Decisions

High level issues concerned with overall directions of the organization. Define goals and strategies.

Neural computing

Historical data is examined for patterns that are then used to make predictions

Case-based reasoning

Historical if-then-else cases are used to recognize patterns

Business Intelligence

Includes a wide range of applications, practices, and technologies for the extraction, transformation, integration, visualization, analysis, interpretation, and presentation of data to support improved decision making. Data used in BI is often pulled from multiple sources and may come from sources internal and external to the organization

Group IS

Includes information systems that improve communications and support collaboration among members of a work group

Personal IS

Includes information systems that improve the productivity of individual users

Enterprise IS

Includes information systems that organizations use to define structured interactions among their own employees and/or external customers, suppliers, government agencies, etc.

Prescriptive Analytics

Indicates a best course of action to take. Predictive models provide a forecast or prediction but don't provide a decision; a forecast or prediction combined with a rule becomes a prescriptive model. Examples include: rule-based models, portfolio models in finance, supply network design models in operations, and price-markdown models in retailing (optimization models)

Data Preparation

Involves descriptive stats and data visualization. Treating missing data and identifying erroneous data and outliers

Supply Chain

Key value chain in a manufacturing organization

Non-experimental (observational)

Make no attempt to control the variables of interest

Correlation Coefficient

Measures the strength of the linear relationship between two variables. Not affected by the units of measurement for x and y. Ranges from -1 to +1. If it's less than 0, the relationship is negative linear; if it's near 0, there is no linear relationship; and if it's greater than 0, the relationship is positive linear. In Excel: =CORREL(array1, array2)
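A small sketch of the Pearson correlation coefficient, which is what Excel's CORREL returns. The data are hypothetical, and the second call illustrates that changing the units of y leaves the result unchanged.

```python
import math

def correlation(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4]
y_dollars = [2, 1, 4, 3]
y_cents = [100 * v for v in y_dollars]  # same data in different units

r_dollars = correlation(x, y_dollars)
r_cents = correlation(x, y_cents)
```

Both calls give 0.6, a moderately positive linear relationship, regardless of the units used for y.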

Z-Score

Measures the relative location of a value in the data set. Helps to determine how far a value is from the mean relative to the data set's standard deviation. Often called the standardized value.

Optimization Models

Models that give the best decision subject to the constraints of the situation

Mean/Arithmetic Mean

Most common measure of location and average value for a variable

Frequency distributions for quantitative data

Must be more careful in defining the non-overlapping bins to be used in distribution

Legitimately Missing Data

Naturally missing data. Generally no remedial action taken

Quantitative data

Numeric and arithmetic operations can be performed on them

Managers

Plan, coordinate, organize, and lead their organizations to better performance

Types of data

Population and sample data, Quantitative and categorical data, and cross-sectional and time-series data

Database activities

Providing a user view of the database Adding and modifying data Storing and retrieving data Manipulating data and Generating reports

Domain

Range of allowable values for a data attribute

Histogram Skewness

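A histogram is skewed in the direction in which its tail extends farther: a longer right tail means skewed right, a longer left tail means skewed left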

Three types of business decisions

Strategic, Tactical (Managerial), and Operational

Percent frequency

Summarizes the percent frequency of the data for each bin

Relative frequency

Tabular summary of data showing the relative frequency for each bin

Data lake

Takes a "store everything" approach to big data, saving all the data in its raw, unaltered form. Also called enterprise data hub. Raw data is available when the users decide just how they want to use the data. Only when the data is accessed for a specific analysis is it extracted from the data lake

Data

The facts and figures collected, analyzed, and summarized for presentation and interpretation. Raw facts: Alphanumeric, audio, image, and video

Standard Deviation

The positive square root of the variance, measured in the same units as the original data, used to quantify the amount of variation or dispersion of a set of data values

Data Visualization Tools

The presentation of data in a pictorial or graphical format. Representing data in a visual form brings immediate impact to dull and boring numbers

Population data

The set of all elements of interest in a particular study

Business Analytics

The scientific process of transforming data into insights for making better decisions. Creates insights from data, improves our ability to forecast more accurately for planning, helps us quantify risk, and yields better alternatives through analysis and optimization.

Data Item

The specific value of an attribute

Imputation

The systematic replacement of missing values with values that seem reasonable

Missing at random (MAR)

The tendency for an observation to be missing a value for some variable is related to the value of some other variable(s) in the data. Ex. Diagnostic tests are missing when the patient is too sick to undergo the procedure

Missing completely at random (MCAR)

The tendency for an observation to be missing the value for some variable is entirely random; whether data are missing does not depend on either the value of the missing data or any other variable in the data

Missing not at random (MNAR)

The tendency for the value of a variable to be missing is related to the value that's missing. Ex. High-income individuals may not want to report their income

Data Mining

The use of analytical techniques for better understanding patterns and relationships that exist in large data sets

Percentile

The value of a variable at which a specified (approximate) percentage of observations are below that value. The pth percentile is the point in the data where approx. p percent of the observations have values less than the pth percentile and approx. (100-p) percent of the observations have values greater than the pth percentile.
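Several conventions exist for computing percentiles. The sketch below assumes one common textbook rule: sort the data, find the location p/100 × (n + 1), and interpolate between neighboring values. Other tools may give slightly different answers, and the data are hypothetical.

```python
def percentile(values, p):
    """pth percentile using the location rule p/100 * (n + 1) with linear interpolation."""
    data = sorted(values)
    n = len(data)
    position = p / 100 * (n + 1)  # 1-based location in the sorted data
    if position <= 1:
        return data[0]
    if position >= n:
        return data[-1]
    lower = int(position)          # whole part of the location
    fraction = position - lower    # how far to move toward the next value
    return data[lower - 1] + fraction * (data[lower] - data[lower - 1])

scores = [10, 20, 30, 40]  # hypothetical data
median_score = percentile(scores, 50)
```

The 50th percentile falls at location 2.5, halfway between 20 and 30, so it equals 25, which is also the median.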

Common approaches to decision making

Tradition ("we've always done it this way"), intuition ("gut feeling"), rules of thumb (e.g., offer two sections of BCOR 2020 each semester), and using relevant data

The Database Approach

Traditional approach to data management: each distinct operational system uses data files dedicated to that system. The database approach: information systems share a pool of related data, which offers the ability to share data and information resources; a database management system (DBMS) is required

Challenges of decision making

Uncertainties and enormous number of alternatives

Illegitimately missing data

Unnaturally occurring missing data

Simulation (Risk-Analysis)

Use of probability and statistics to construct a computer model to study the impact of uncertainty on a decision
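A toy Monte Carlo sketch of this idea. Everything specific here is an assumption for illustration: the ordering decision, the unit cost and price, and the uniformly distributed demand.

```python
import random

def simulate_profit(order_qty, trials=10_000, seed=42):
    """Estimate mean profit and probability of a loss for a given order quantity."""
    rng = random.Random(seed)           # fixed seed so the run is repeatable
    unit_cost, unit_price = 6.0, 10.0   # hypothetical economics
    profits = []
    for _ in range(trials):
        demand = rng.randint(50, 150)   # uncertain input (assumed uniform)
        sold = min(demand, order_qty)   # cannot sell more than was ordered
        profits.append(unit_price * sold - unit_cost * order_qty)
    mean_profit = sum(profits) / trials
    loss_probability = sum(p < 0 for p in profits) / trials
    return mean_profit, loss_probability

safe_mean, safe_risk = simulate_profit(50)     # demand always covers the order
risky_mean, risky_risk = simulate_profit(100)  # leftover stock is possible
```

Ordering 50 units always earns 200 with no chance of a loss, while ordering 100 units trades a chance of loss for a different expected profit. Comparing such outputs across alternatives is how simulation supports the decision.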

Cumulative distributions

Uses the number of classes, class widths and class limits developed for the frequency distribution, shows the number of data items with values less than or equal to the upper class limit of each class

Median

The value in the middle when the data are arranged in ascending order; with an even number of observations, take the mean of the middle two values. The mean is generally the preferred measure of central location, but it is influenced by extremely small and large data values, so the median is preferred when a data set contains extreme values
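A short sketch of the median calculation, with hypothetical data showing how one extreme value moves the mean but not the median.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    data = sorted(values)
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2

typical = [46, 54, 42, 46, 32]     # hypothetical data
with_extreme = typical + [1000]    # same data plus one extreme value

mean_typical = sum(typical) / len(typical)
mean_extreme = sum(with_extreme) / len(with_extreme)
median_typical = median(typical)
median_extreme = median(with_extreme)
```

Adding the extreme value raises the mean from 44 to about 203, while the median stays at 46.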

Mode

Value that occurs the most frequently in a given data set. Multimodal data: data with multiple modes. Bimodal data: data that contains exactly two modes.
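Python's standard library can report every mode of a data set, which makes bimodal and multimodal data easy to spot. The data below are hypothetical.

```python
import statistics

one_mode = [1, 2, 2, 3]
two_modes = [1, 2, 2, 3, 3]  # 2 and 3 both occur twice

# multimode returns all of the most frequent values, in order of first appearance.
modes_single = statistics.multimode(one_mode)
modes_bimodal = statistics.multimode(two_modes)
is_bimodal = len(modes_bimodal) == 2
```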

Experimental

Variable of interest is first identified then one or more other variables are identified and controlled or manipulated to obtain data about how these variables influence the variable of interest

Quartiles

The values that divide the data into 4 equal parts, each containing approx. 25% of the observations. Second quartile = the median. The difference between the third and first quartiles is the interquartile range (IQR).

Empirical rule

When the distribution of the data exhibits a symmetrical bell-shape the empirical rule can be used to determine the percentage of data values that are within a specified number of standard deviations of the mean

Relational Database Model

A simple but highly useful way to organize data into collections of two-dimensional tables called relations. Each row in the table represents an entity and each column represents an attribute of that entity

SQL Databases

SQL is a special-purpose programming language for accessing and manipulating data stored in a relational database. SQL databases conform to ACID (atomicity, consistency, isolation, and durability) properties. In 1986, SQL was adopted by ANSI as the standard query language for relational databases
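To make the relational ideas concrete, here is a small sketch using SQLite through Python's built-in `sqlite3` module. The `customer` table, its columns, and the sample rows are hypothetical; SQLite is chosen only because it needs no server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# One relation (table): each row is an entity, each column an attribute.
conn.execute(
    "CREATE TABLE customer ("
    " customer_id INTEGER PRIMARY KEY,"  # primary key uniquely identifies the record
    " name TEXT NOT NULL,"
    " city TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Ana", "Denver"), (2, "Ben", "Boulder"), (3, "Cy", "Denver")],
)

# A data query: request the records with certain characteristics.
rows = conn.execute(
    "SELECT name FROM customer WHERE city = ? ORDER BY name", ("Denver",)
).fetchall()
denver_names = [name for (name,) in rows]

# The DBMS rejects a second record that reuses an existing primary key value.
try:
    conn.execute("INSERT INTO customer VALUES (1, 'Dup', 'Aurora')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

conn.close()
```

The query returns Ana and Cy, and the duplicate insert fails, which shows the primary key constraint being enforced by the database rather than by the application.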

Association analysis

A specialized set of algorithms sorts through data and forms statistical rules about relationships among the items

Data mart

A subset of a data warehouse that is used by small- and medium-sized businesses and departments within large companies to support decision making. A specific area in the data mart might contain more detailed data than the data warehouse

Reporting and Querying Tools

Can present data in an easy-to-understand fashion via formatted data, graphs, and charts. Many tools enable users to make their own data requests and format the results without the need for additional help from the IT organization

Big data

Extremely large and complex datasets, typically characterized as being of high volume, variety, and velocity

Database Administrators (DBAs)

Skilled and trained IS professionals who work with users to define their data needs, apply database programming languages to craft a set of databases to meet those needs, test and evaluate databases, implement changes to improve database performance, and ensure that data is secure from unauthorized access

