cis


There are two types of integrity constraints:

(1) relational and (2) business-critical.

The Solution: Business Intelligence

Employee decisions are numerous, and they include providing service information, offering new products, and supporting frustrated customers.

Exploratory Data Analysis

Identifies patterns in data, including outliers, uncovering the underlying structure to understand relationships between the variables.

Identity management

Identity management

Increased Scalability and Performance

In its first year of operation, the official website of the American Family Immigration History Center, www.ellisisland.org, generated more than 2.5 billion hits. The site offers immigration information about people who entered America through the Port of New York and Ellis Island between 1892 and 1924. The database contains more than 25 million passenger names that are correlated to 3.5 million images of ships' manifests.4 The database had to be scalable to handle the massive volumes of information and the large numbers of users expected for the launch of the website. In addition, the database needed to perform quickly under heavy use.

Increased Information Integrity (Quality)

Integrity constraints Information integrity

Pattern Recognition Analysis

The classification or labeling of an identified pattern in the machine learning process.

Speech Analysis

The process of analyzing recorded calls to gather information; brings structure to customer interactions and exposes information buried in customer contact center interactions with an enterprise. Speech analysis is heavily used in the customer service department to help improve processes by identifying angry customers and routing them to the appropriate customer service representative.

Miners:

Transactions are authenticated by a network of 'miners' who complete complex mathematical problems. When all miners arrive at the same unique solution, the transaction is verified and recorded as a new 'block'. New transaction blocks are added to the digital ledger in a chained fashion, forming a 'blockchain'. The distribution of miners means that the system cannot be hacked by a single source. If anyone tries to tamper with one ledger, all of the nodes will disagree on the integrity of that ledger and will refuse to incorporate the transaction into the blockchain.[1]
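As a rough illustration of the kind of puzzle miners solve, the sketch below searches for a number (a nonce) that makes a block's SHA-256 hash start with a run of zeros. The function names, the sample transaction string, and the difficulty of three hex zeros are assumptions made for this example; Bitcoin's real scheme hashes a structured block header against a numeric target.

```python
import hashlib

def mine(block_data: str, difficulty: int = 3) -> tuple[int, str]:
    """Try nonces until the SHA-256 digest starts with `difficulty` hex zeros."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1

def verify(block_data: str, nonce: int, difficulty: int = 3) -> bool:
    """Checking a claimed solution takes a single hash."""
    digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)

# Hypothetical transaction text used only for this sketch.
nonce, digest = mine("alice pays bob 5")
print(nonce, digest)
```

Finding the nonce takes many attempts, while any other node can confirm it with one call to `verify`. That asymmetry is what lets the network agree on a block cheaply.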

Behavioral Analysis

Using data about people's behaviors to understand intent and predict future actions.

A repository

a central location in which data is stored and managed.

data warehouse

a logical collection of information—gathered from many different operational databases—that supports business analysis activities and decision-making tasks.

Information cleansing or scrubbing

a process that weeds out and fixes or discards inconsistent, incorrect, or incomplete information.
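A minimal sketch of cleansing, assuming customer records with hypothetical `name` and `email` fields: fixable problems (stray spaces, inconsistent capitalization) are repaired, while incomplete, incorrect, or duplicate records are discarded.

```python
def cleanse(records):
    """Fix what can be fixed; discard what cannot."""
    clean, seen = [], set()
    for rec in records:
        name = (rec.get("name") or "").strip().title()
        email = (rec.get("email") or "").strip().lower()
        if not name or "@" not in email:  # incomplete or incorrect: discard
            continue
        if email in seen:                 # duplicate: discard
            continue
        seen.add(email)
        clean.append({"name": name, "email": email})
    return clean

# Hypothetical raw records for illustration.
raw = [
    {"name": "  ada lovelace ", "email": "ADA@example.com"},
    {"name": "Ada Lovelace", "email": "ada@example.com"},    # duplicate
    {"name": "", "email": "nobody@example.com"},             # missing name
    {"name": "Grace Hopper", "email": "grace.example.com"},  # malformed email
]
print(cleanse(raw))
```

Only one record survives here. The rules themselves (what counts as fixable, what counts as a duplicate) are choices each organization has to make.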

relational database management system

allows users to create, read, update, and delete data in a relational database. (Although the hierarchical and network models are important, this text focuses only on the relational database model.)

Data-driven decision management

an approach to business governance that values decisions that can be backed up with verifiable data. The success of the data-driven approach is reliant upon the quality of the data gathered and the effectiveness of its analysis and interpretation.

Attributes

(also called columns or fields) are the data elements associated with an entity. A record in an entity occupies one row in its respective table.

data dictionary

compiles all of the metadata about the data elements in the data model. Looking at a data model along with reviewing the data dictionary provides tremendous insight into the database's functions, purpose, and business rules.

Data Mart

contains a subset of data warehouse information. To distinguish between data warehouses and data marts, think of data warehouses as having a more organizational focus and data marts as having focused information subsets particular to the needs of a given business unit such as finance or production and operations.

Bitcoin uses encryption to maintain the integrity of transactions, which is why it is called a ____

cryptocurrency

Business-critical integrity constraints

enforce business rules vital to an organization's success and often require more insight and knowledge than relational integrity constraints. Consider a supplier of fresh produce to large grocery chains such as Kroger. The supplier might implement a business-critical integrity constraint stating that no product returns are accepted after 15 days past delivery. That would make sense because of the chance of spoilage of the produce. Business-critical integrity constraints tend to mirror the very rules by which an organization achieves success.

Dirty data

erroneous or flawed data (incorrect data, misleading data, duplicated data). The complete removal of dirty data from a source is impractical or virtually impossible. To increase the quality of organizational information and thus the effectiveness of decision making, businesses must formulate a strategy to keep information clean.

Bitcoin

every user is allowed to connect to the network, send new transactions to it, verify transactions, and create new blocks (peer-to-peer).

logical view of information

focuses on how individual users logically access information to meet their own particular business needs.

data broker

is a business that collects personal information about consumers and sells that information to other organizations.

Information integrity

is a measure of the quality of information.

predictive and prescriptive

The difference is that the former (predictive analytics) forecasts potential future outcomes, while the latter (prescriptive analytics) helps you draw up specific recommendations.

Anomaly detection

is the process of identifying rare or unexpected items or events in a data set that do not conform to other items in the data set. One of the key advantages of performing advanced analytics is to detect anomalies in the data to ensure they are not used in models creating false results.

Infographics (information graphics)

present the results of data analysis, displaying the patterns, relationships, and trends in a graphical format. Infographics are exciting and quickly convey a story users can understand without having to analyze numbers, tables, and boring charts.

foreign key

a primary key of one table that appears as an attribute in another table and acts to provide a logical relationship between the two tables.
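A small sketch using Python's built-in `sqlite3` module shows the idea. The `customer` and `orders` tables and their columns are hypothetical; `orders.customer_id` is the foreign key that points back to the primary key of `customer`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,   -- primary key
        name        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),  -- foreign key
        total       REAL NOT NULL
    );
    INSERT INTO customer VALUES (1, 'Kroger');
    INSERT INTO orders VALUES (100, 1, 250.0);
""")

# The shared key lets the two tables be joined into one logical result.
rows = conn.execute("""
    SELECT o.order_id, c.name, o.total
    FROM orders AS o
    JOIN customer AS c ON c.customer_id = o.customer_id
""").fetchall()
print(rows)
```

The customer's name is stored once in `customer`; each order carries only the key, and the join rebuilds the relationship when it is needed.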

A relational database model

stores information in the form of logically related two-dimensional tables.

Data aggregation

the collection of data from various sources for the purpose of data processing.

Data latency

the time it takes for data to be stored or retrieved. Some organizations must be able to support hundreds or thousands of users including employees, partners, customers, and suppliers, who all want to access and share the same information with minimal data latency.

use primary keys and foreign keys

to create logical relationships

Business intelligence dashboards

track corporate metrics such as critical success factors and key performance indicators and include advanced capabilities such as interactive controls, allowing users to manipulate data for analysis. The majority of business intelligence software vendors offer a number of data visualization tools and business intelligence dashboards.

An entity

(also referred to as a table) stores information about a person, place, thing, transaction, or event.

Which of the following properly characterize Analytics?

1) Analytics is the use of math and statistics to derive meaning from data. 2) Analytics can be prescriptive. 3) Analytics can be predictive. 4) Analytics has a goal to make better business decisions. 5) Analytics can be descriptive.

Advantage of bitcoin

1) Any well-connected node in the Bitcoin blockchain can determine, with certainty, whether a transaction does or does not exist in the data set. 2) The cost of attempting to rewrite or alter transaction history is prohibitively high.

Business Intelligence is commonly discussed using the shorthand label BI (pronounced Bee Eye). Which of the following properly characterize BI?

1) BI is about delivering relevant and reliable information to the right people at the right time. 2) BI has a goal of achieving better business decisions. 3) BI is a broad field that includes Analytics, predictive modeling, text mining and many other approaches for using data.

Big Data refers to the vast volumes and types of information that companies can now collect and process using increasingly high-tech systems. Where does big data come from?

1) Structured data 2) External data sources 3) Internal data sources 4) Unstructured data

DBMS use three primary data models for organizing information

1) hierarchical, 2) network, 3) relational database

Business Advantages of a Relational Database

1. Increased Flexibility 2. Increased Scalability and Performance 3. Reduced Information Redundancy 4. Increased Information Integrity (Quality) 5. Increased Information Security

Social Media Analysis

Analyzes text flowing across the Internet, including unstructured text from blogs and messages.

Web Analysis

Analyzes unstructured data associated with websites to identify consumer behavior and website navigation.

Text Analysis

Analyzes unstructured data to find trends and patterns in words and sentences. Text mining a firm's customer support email might identify which customer service representative is best able to handle the question, allowing the system to forward it to the right person.

Hash and Digital Signature (thus the label 'crypto-currency'):

Computer science and advanced mathematics (in the form of cryptographic hash functions) protect the blockchain's integrity and anonymity. Each transaction has a digital hash calculated and attached. The hash includes digital signatures from the existing blockchain as well as the new transaction. In this way each block vouches for the integrity of all prior blocks and thereby prevents falsification or manipulation of prior transactions. The result is a fully transparent ledger with strong collective trust.
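The way each block vouches for earlier blocks can be sketched with a simple hash chain. This is a simplified model under stated assumptions: the block layout, function names, and sample transactions are hypothetical, and digital signatures are left out entirely.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block

def block_hash(index, transactions, prev_hash):
    """Hash the block's contents together with the previous block's hash."""
    payload = json.dumps({"index": index, "tx": transactions, "prev": prev_hash},
                         sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def add_block(chain, transactions):
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV
    index = len(chain)
    chain.append({"index": index, "tx": transactions, "prev": prev_hash,
                  "hash": block_hash(index, transactions, prev_hash)})

def is_valid(chain):
    """Recompute every hash and check every link back to the first block."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i else GENESIS_PREV
        if block["prev"] != expected_prev:
            return False
        if block["hash"] != block_hash(block["index"], block["tx"], block["prev"]):
            return False
    return True

chain = []
add_block(chain, ["alice->bob:5"])
add_block(chain, ["bob->carol:2"])
ok_before = is_valid(chain)

chain[0]["tx"] = ["alice->bob:500"]  # tamper with an earlier transaction
ok_after = is_valid(chain)
print(ok_before, ok_after)
```

Changing an old transaction changes that block's hash, which no longer matches what was recorded, so the tampering is detected on the next validation pass.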

1. Increased Flexibility

Databases tend to mirror business structures, and a database needs to handle changes quickly and easily, just as any business needs to be able to do. Equally important, databases need to provide flexibility in allowing each user to access the information in whatever way best suits his or her needs. The distinction between logical and physical views is important in understanding flexible database user views.

Correlation Analysis

Determines a statistical relationship between variables, often for the purpose of identifying predictive factors among the variables.
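One common measure of such a relationship is the Pearson correlation coefficient, which ranges from -1 to 1. The sketch below computes it by hand; the advertising and sales figures are made-up values chosen so that sales rise in a straight line with spending.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data: sales = 2 * ad_spend + 5, a perfect linear relationship.
ad_spend = [10, 20, 30, 40, 50]
sales = [25, 45, 65, 85, 105]
r = pearson(ad_spend, sales)
print(r)
```

A value near 1 suggests ad spend may be a useful predictive factor for sales, though correlation alone does not show that one causes the other.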

Increased Information Security

Managers must protect information, like any asset, from unauthorized users or misuse. As systems become increasingly complex and highly available over the Internet on many different devices, security becomes an even bigger issue. Databases offer many security features including passwords to provide authentication, access levels to determine who can access the data, and access controls to determine what type of access they have to the information.

The Problem: Data Rich, Information Poor

Source data

data lake

a storage repository that holds a vast amount of raw data in its original format until the business needs it.

data map

a technique for establishing a match, or balance, between the source data and the target data warehouse. This technique identifies data shortfalls and recognizes data issues. It can also alert managers to inconsistencies or help determine the cause and effects of enterprisewide business decisions.

A data set

an organized collection of data.

Algorithms

are mathematical formulas placed in software that perform an analysis on a data set.

Integrity constraints

are rules that help ensure the quality of information. The database design needs to consider integrity constraints. The database and the DBMS ensure that users can never violate these constraints.

The specification and enforcement of integrity constraints produce higher-quality information that will provide

better support for business decisions. Organizations that establish specific procedures for developing integrity constraints typically see an increase in accuracy that then increases the use of organizational information by business professionals.

comparative analysis

can compare two or more data sets to identify patterns and trends. Employees can base their decisions on data sets, experience, or knowledge and preferably a combination of all three. Business intelligence can provide managers with the ability to make better decisions.

The physical view of information

deals with the physical storage of information on a storage device.

A business rule

defines how a company performs certain aspects of its business and typically results in either a yes/no or true/false answer. Stating that merchandise returns are allowed within 10 days of purchase is an example of a business rule.
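Because a business rule resolves to yes/no, it maps naturally onto a function that returns true or false. The sketch below encodes the 10-day return rule; the function name is hypothetical, and treating day 10 itself as still allowed is an assumption about how "within 10 days" is read.

```python
from datetime import date

RETURN_WINDOW_DAYS = 10  # the window named in the rule

def return_allowed(purchased: date, returned: date) -> bool:
    """True if the return falls within the window (day 10 included, by assumption)."""
    days_elapsed = (returned - purchased).days
    return 0 <= days_elapsed <= RETURN_WINDOW_DAYS

print(return_allowed(date(2024, 3, 1), date(2024, 3, 11)))  # 10 days later
print(return_allowed(date(2024, 3, 1), date(2024, 3, 12)))  # 11 days later
```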

Data visualization

describes technologies that allow users to see or visualize data to transform information into a business perspective. Data visualization is a powerful way to simplify complex data sets by placing data in a format that is easily grasped and understood far quicker than the raw data alone.

A data scientist

extracts knowledge from data by performing statistical analysis, data mining, and advanced analytics on big data to identify trends, market changes, and other relevant information.

primary key

a field (or group of fields) that uniquely identifies a given record in a table. Primary keys are a critical piece of a relational database because they provide a way of distinguishing each record in a table.

Source data

identifies the primary location where data is collected. Source data can include invoices, spreadsheets, time-sheets, transactions, and electronic sources such as other databases. Managers send their information requests to the MIS department where a dedicated person compiles the various reports. In some situations, responses can take days, by which time the information may be outdated and opportunities lost.

A data artist

is a business analytics specialist who uses visual tools to help people understand complex data. Great data visualizations provide insights into something new about the underlying patterns and relationships. Just think of the periodic table of elements and imagine if you had to look at an Excel spreadsheet showing each element and the associated attributes in a table format.

An outlier

is a data value that is numerically distant from most of the other data points in a set of data. Anomaly detection helps to identify outliers in the data that can cause problems with mathematical modeling.
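One widely used way to flag outliers is the interquartile range (IQR) rule: a value is suspect if it lies more than 1.5 IQRs below the first quartile or above the third. The sketch below applies that rule; the order-size data and the 1.5 multiplier are illustrative assumptions, and other detection methods exist.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the middle 50% of the data."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical daily order counts; 95 is far from the rest.
orders = [10, 12, 11, 13, 12, 11, 95]
outliers = iqr_outliers(orders)
print(outliers)
```

Whether a flagged value is an error to remove or a real event worth investigating is a judgment call that the rule itself cannot make.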

Blockchain

is a distributed ledger that provides a way for information to be recorded and shared by a community. In this community, each member maintains his or her own copy of the information and all members must validate any updates collectively. The information could represent transactions, contracts, assets, identities, or practically anything else that can be described in digital form. Entries are permanent, transparent, and searchable, which makes it possible for community members to view transaction histories in their entirety. Each update is a new 'block' added to the end of the 'chain.' A protocol manages how new edits or entries are initiated, validated, recorded, and distributed. With blockchain, cryptology replaces third-party intermediaries as the keeper of trust, with all blockchain participants running complex algorithms to certify the integrity of the whole.

data point

is an individual item on a graph or a chart. Organizational data includes far more than simple structured data elements in a database; the set of data also includes unstructured data such as voice mail, customer phone calls, text messages, video clips, along with numerous new forms of data, such as tweets from Twitter.

Fast data

is the application of big data analytics to smaller data sets in near-real time or real time in order to solve a problem or create business value. The term fast data is often associated with business intelligence, and the goal is to quickly gather and mine structured and unstructured data so that action can be taken.

Analytics

is the science of fact-based decision making. Analytics uses software-based algorithms and statistics to derive meaning from data. Advanced analytics uses data patterns to make forward-looking predictions to explain to the organization where it is headed.

Information redundancy (one primary goal of a database)

is to eliminate information redundancy by recording each piece of information in only one place in the database. This saves disk space, makes performing information updates easier, and improves information quality.

Data visualization tools

move beyond Excel graphs and charts into sophisticated analysis techniques such as controls, instruments, maps, time-series graphs, and more. Data visualization tools can help uncover correlations and trends in data that would otherwise go unrecognized.

Analysis paralysis

occurs when the user goes into an emotional state of over-analysis (or over-thinking) a situation so that a decision or action is never taken, in effect paralyzing the outcome. In the time of big data, analysis paralysis is a growing problem. One solution is to use data visualizations to help people make decisions faster.

logical view of information

one user could perform a query to determine which recordings had a track length of four minutes or more. At the same time, another user could perform an analysis to determine the distribution of recordings as they relate to the different categories. For example, are there more R&B recordings than rock, or are they evenly distributed? This example demonstrates that while a database has only one physical view, it can easily support multiple logical views, which provides flexibility.
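The two queries in this example can be sketched with Python's built-in `sqlite3` module. The `recording` table, its columns, and the four sample rows are hypothetical; the point is that both users read the same stored table in different ways.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE recording (
        title          TEXT,
        category       TEXT,
        length_seconds INTEGER
    );
    INSERT INTO recording VALUES
        ('Track A', 'R&B',  255),
        ('Track B', 'Rock', 198),
        ('Track C', 'R&B',  241),
        ('Track D', 'Rock', 312);
""")

# User 1: which recordings run four minutes (240 seconds) or more?
long_tracks = conn.execute(
    "SELECT title FROM recording WHERE length_seconds >= 240 ORDER BY title"
).fetchall()

# User 2: how are recordings distributed across categories?
by_category = conn.execute(
    "SELECT category, COUNT(*) FROM recording GROUP BY category ORDER BY category"
).fetchall()

print(long_tracks)
print(by_category)
```

Neither query changes how the rows are physically stored; each is simply a different logical view of the same data.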

Distributed computing

processes and manages algorithms across many machines in a computing environment. A key component of big data is a distributed computing environment that shares resources ranging from memory to networks to storage. With distributed computing, individual computers are networked together across geographical areas and work together to execute a workload or computing processes as if they were one single computing environment.

1) Relational integrity constraints

rules that enforce basic and fundamental information-based constraints. For example, a relational integrity constraint would not allow someone to create an order for a nonexistent customer, provide a markup percentage that was negative, or order zero pounds of raw materials from a supplier.
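The three examples above can be expressed as constraints that the DBMS itself enforces. This sketch uses Python's built-in `sqlite3` module; the schema is hypothetical and folds all three rules into a single `orders` table for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        markup_pct  REAL    NOT NULL CHECK (markup_pct >= 0),
        pounds      REAL    NOT NULL CHECK (pounds > 0)
    );
    INSERT INTO customer VALUES (1, 'Kroger');
""")

def try_insert(row):
    """Return True if the DBMS accepts the row, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO orders VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

results = {
    "valid order":          try_insert((1, 1, 10.0, 50.0)),
    "nonexistent customer": try_insert((2, 99, 10.0, 50.0)),
    "negative markup":      try_insert((3, 1, -5.0, 50.0)),
    "zero pounds":          try_insert((4, 1, 10.0, 0.0)),
}
print(results)
```

Only the first row is accepted. Because the rules live in the database rather than in each application, no user or program can bypass them.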

The vision for highly effective BI will____?

take messy information and turn it into a tidy and accessible grocery store.

cube

the common term for the representation of multidimensional information. For example, a cube can represent store information (the layers), product information (the rows), and promotion information (the columns).
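A cube can be modeled, very roughly, as totals keyed by one value per dimension. The sketch below uses store, product, and promotion as the three dimensions; the sales figures and the `slice_total` helper are hypothetical, and real BI tools use far more elaborate storage.

```python
from collections import defaultdict

# Hypothetical sales facts: (store, product, promotion, units sold).
sales = [
    ("Store 1", "Apples", "Coupon",   10),
    ("Store 1", "Apples", "No promo",  4),
    ("Store 1", "Pears",  "Coupon",    7),
    ("Store 2", "Apples", "Coupon",    3),
]

# Build the cube: one cell per (store, product, promotion) combination.
cube = defaultdict(int)
for store, product, promo, units in sales:
    cube[(store, product, promo)] += units

def slice_total(cube, store=None, product=None, promo=None):
    """Sum units over every cell matching the dimensions that were specified."""
    return sum(
        units
        for (s, p, m), units in cube.items()
        if store in (None, s) and product in (None, p) and promo in (None, m)
    )

print(slice_total(cube, store="Store 1"))                  # one layer
print(slice_total(cube, product="Apples"))                 # one row across layers
print(slice_total(cube, store="Store 1", promo="Coupon"))  # layer and column
```

Fixing one dimension and summing over the others is the "slicing" that lets users look at the same information by store, by product, or by promotion.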

Virtualization

the creation of a virtual (rather than actual) version of computing resources, such as an operating system, a server, a storage device, or network resources. With big data it is now possible to virtualize data so that it can be stored efficiently and cost-effectively. Improvements in network speed and network reliability have removed the physical limitations of being able to manage massive amounts of data at an acceptable pace.

3. Reduced Information Redundancy

the duplication of data, or the storage of the same data in multiple places. Redundant data can cause storage issues along with data integrity issues, making it difficult to determine which values are the most current or most accurate.

A key idea within data warehousing

to collect information from multiple systems in a common location that uses a universal querying tool. This allows operational databases to run where they are most efficient for the business, while providing a common location using a familiar format for the strategic or enterprisewide reporting information.

The primary purpose of a data warehouse

to combine information, more specifically, strategic information, throughout an organization into a single repository in such a way that the people who need that information can make decisions and undertake business analysis.

BI can help managers with competitive monitoring

where a company keeps tabs on its competitors' activities on the web using software that automatically tracks all competitor websites.

The data warehouse compiles information from internal (transactional/operational) databases and external databases through extraction, transformation, and loading (ETL)

which is a process that extracts information from internal and external databases, transforms the information using a common set of enterprise definitions, and loads the information into a data warehouse. The data warehouse then sends subsets of the information to data marts.
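The three ETL steps can be sketched in a few lines. Everything here is a simplified assumption: the two source lists stand in for internal and external databases, the field names are invented, and an in-memory SQLite database plays the role of the warehouse.

```python
import sqlite3

# Extract: hypothetical rows pulled from an internal and an external source,
# each with its own field names and units.
internal_sales = [{"cust": "kroger", "amount_usd": "1200.50"},
                  {"cust": "Safeway ", "amount_usd": "830"}]
external_sales = [{"customer_name": "KROGER", "amount_cents": 45000}]

def transform(internal, external):
    """Convert both sources to one common definition: (customer, amount in USD)."""
    rows = []
    for r in internal:
        rows.append((r["cust"].strip().title(), float(r["amount_usd"])))
    for r in external:
        rows.append((r["customer_name"].strip().title(), r["amount_cents"] / 100))
    return rows

def load(conn, rows):
    """Load the standardized rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales_fact (customer TEXT, amount_usd REAL)")
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(internal_sales, external_sales))

# A data mart would receive a focused subset, for example totals per customer.
totals = warehouse.execute(
    "SELECT customer, SUM(amount_usd) FROM sales_fact GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```

The transform step is where the common set of enterprise definitions is applied: without it, "kroger" and "KROGER" would be counted as different customers and dollars would be mixed with cents.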

