Information Technology Management, Chapter 3

Factors That Determine the Performance of a DBMS

* Data latency
* Ability to handle the volatility of the data
* Query response time
* Data consistency
* Query predictability

Three technologies involved in preparing raw data for analytics

- ETL, - change data capture (CDC), - data deduplication ("deduping the data")

Hadoop

- It places no conditions on the structure of the data it can process.
- It distributes computing problems across a number of servers.

Business value categories

- Making more informed decisions at the time they need to be made
- Discovering unknown insights, patterns, or relationships
- Automating and streamlining or digitizing business processes

Four factors contributing to increased use of BI.

- Smart Devices Everywhere have created demand for effortless 24/7 access to insights.
- Data are Big Business when they provide insight that supports decisions and action.
- Advanced BI and Analytics help to ask questions that were previously unknown and unanswerable.
- Cloud-Enabled BI and Analytics are providing low-cost and flexible solutions.

Benefits of centralized database

1. Better control of data quality. Data consistency is easier when data are kept in one physical location because data additions, updates, and deletions can be made in a supervised and orderly fashion.
2. Better IT security. Data are accessed via the centralized host computer, where they can be protected more easily from unauthorized access or modification.

BI governance program

1. Clearly articulate business strategies.
2. Deconstruct the business strategies into a set of specific goals and objectives—the targets.
3. Identify the key performance indicators (KPIs) that will be used to measure progress toward each target.
4. Prioritize the list of KPIs.
5. Create a plan to achieve goals and objectives based on the priorities.
6. Estimate the costs needed to implement the BI plan.
7. Assess and update the priorities based on business results and changes in business strategy.

Text Analytics Steps

1. Exploration. 2. Preprocessing. 3. Categorizing and Modeling.

data moved from databases to a warehouse are:

1. Extracted from designated databases.
2. Transformed by standardizing formats, cleaning the data, and integrating them.
3. Loaded into a data warehouse. (A minimal code sketch of these steps follows.)
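A minimal Python sketch of these three steps, using sqlite3 with a made-up sales table, warehouse table, and column names (not from the text):

```python
import sqlite3

# Hypothetical source database with a small "sales" table.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE sales (order_id INTEGER, customer TEXT, amount TEXT, order_date TEXT)")
source.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)",
                   [(1, "  acme corp ", "19.5", "2024-01-05"),
                    (2, "Bolt LLC", None, "2024-01-06")])

# Hypothetical data warehouse target.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales_fact (order_id INTEGER, customer TEXT, amount REAL, order_date TEXT)")

# 1. Extract from the designated database.
rows = source.execute("SELECT order_id, customer, amount, order_date FROM sales").fetchall()

# 2. Transform: standardize formats, clean the data, and drop incomplete records.
cleaned = [(oid, cust.strip().title(), round(float(amt), 2), odate)
           for oid, cust, amt, odate in rows
           if amt is not None]

# 3. Load into the data warehouse.
warehouse.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", cleaned)
warehouse.commit()
print(warehouse.execute("SELECT * FROM sales_fact").fetchall())  # [(1, 'Acme Corp', 19.5, '2024-01-05')]
```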

MapReduce Stages

1. Map stage: MapReduce breaks up the huge dataset into smaller subsets, then distributes the subsets among multiple servers where they are partially processed.
2. Reduce stage: The partial results from the map stage are then recombined and made available for analytic tools. (A toy word-count sketch of the two stages follows.)
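A toy, single-machine word-count sketch in Python; the "servers" are simulated by processing slices of the input separately, which is an illustration rather than real Hadoop:

```python
from collections import defaultdict

documents = ["big data big insight", "data drives decisions", "big decisions"]

# Map stage: break the dataset into subsets and partially process each one.
def map_subset(subset):
    partial = defaultdict(int)
    for doc in subset:
        for word in doc.split():
            partial[word] += 1
    return partial

subsets = [documents[:2], documents[2:]]            # pretend each subset goes to a different server
partial_results = [map_subset(s) for s in subsets]

# Reduce stage: recombine the partial results into one answer for analytic tools.
totals = defaultdict(int)
for partial in partial_results:
    for word, count in partial.items():
        totals[word] += count

print(dict(totals))  # {'big': 3, 'data': 2, 'insight': 1, 'drives': 1, 'decisions': 2}
```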

Three general data principles relate to the data life cycle

1. Principle of diminishing data value. 2. Principle of 90/90 data use. 3. Principle of data in context.

Four V's of Data Analytics

1. Variety: 2. Volume: 3. Velocity: 4. Veracity:

Principle of Integrity.

A recordkeeping program will be able to reasonably guarantee the authenticity and reliability of records and data.

Principle of Accountability.

An organization will assign a senior executive to oversee a recordkeeping program; adopt policies and procedures to guide personnel; and ensure program auditability.

Data security:

Check and control data integrity over time.

Data integrity and maintenance:

Correct, standardize, and verify the consistency and integrity of the data.

Enterprise data warehouses (EDW)

Data warehouses that pull together data from disparate sources and databases across an entire enterprise

HDFS

Hadoop Distributed File System

Data synchronization:

Integrate, match, or link data from disparate sources.

electronic records management (ERM) system

Keeps most records in electronic format and maintains them throughout their life cycle—from creation to final archiving or destruction

Volume:

Large volumes of structured and unstructured data are analyzed.

Eventual consistency

means that not all query responses will reflect data changes uniformly; some queries may briefly return stale values until an update has propagated to every copy of the data
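A toy Python illustration, assuming two in-memory "replicas" of the same record where replication lags:

```python
# Two copies of the same record; replication to replica_b lags behind.
replica_a = {"balance": 100}
replica_b = {"balance": 100}

replica_a["balance"] = 150          # write lands on replica A first

# Before replication finishes, queries can see different values (eventual consistency).
print(replica_a["balance"], replica_b["balance"])   # 150 100

replica_b["balance"] = 150          # replication catches up
print(replica_a["balance"], replica_b["balance"])   # 150 150  (now consistent)
```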

OLTP

Online transaction processing systems

The highest-ranking enterprise DBMSs in mid-2014

Oracle's MySQL, Microsoft's SQL Server, PostgreSQL, IBM's DB2, and Teradata Database

Data filtering and profiling:

Process and store data efficiently. Inspect the data for errors, inconsistencies, redundancies, and incomplete information.

Data Access

Provide authorized access to data in both planned and ad hoc ways within acceptable response times

Principle of Retention.

Records and data will be maintained for an appropriate time based on legal, regulatory, fiscal, operational, and historical requirements.

Principle of Availability.

Records will be maintained in a manner that ensures timely, efficient, and accurate retrieval of needed information.

Sentiment Analysis

Social commentary and social media are being mined to understand consumer intent
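A minimal lexicon-based sketch in Python; the word lists and posts are made up, and real sentiment analysis relies on far richer models and context handling:

```python
# Hypothetical sentiment lexicons for illustration only.
POSITIVE = {"love", "great", "fast", "helpful"}
NEGATIVE = {"hate", "slow", "broken", "awful"}

def sentiment_score(post: str) -> int:
    # Count positive words minus negative words.
    words = post.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

posts = ["Love the new app and support was helpful", "Checkout is slow and the cart is broken"]
for post in posts:
    print(post, "->", sentiment_score(post))   # 2, then -2
```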

Velocity:

Speed of access to reports that are drawn from data defines the difference between effective and ineffective analytics.

Variety:

The analytic environment has expanded from pulling data from enterprise systems to include big data and unstructured sources.

Principle of data in context.

The capability to capture, process, format, and distribute data in near real time or faster requires a huge investment in data architecture and infrastructure to link remote POS systems to data storage, data analysis systems, and reporting apps. The investment can be justified on the principle that data must be integrated, processed, analyzed, and formatted into "actionable information."

Principle of Transparency.

The processes and activities of an organization's recordkeeping program will be documented in an understandable manner and available to all personnel and appropriate parties.

Principle of Protection.

The recordkeeping program will be constructed to ensure a reasonable level of protection to records and information that are private, confidential, privileged, secret, or essential to business continuity.

Principle of Compliance.

The recordkeeping program will comply with applicable laws, authorities, and the organization's policies.

Principle of diminishing data value.

The value of data diminishes as they age. This is a simple, yet powerful principle. Most organizations cannot operate at peak performance with blind spots (lack of data availability) of 30 days or longer. Global financial services institutions rely on near-real-time data for peak performance.

Veracity:

Validating data and extracting insights that managers and workers can trust are key factors of successful analytics. Trust in analytics has grown more difficult with the explosion of data sources.

Text mining

a broad category that involves interpreting words and concepts in context; it helps companies tap into the explosion of customer opinions expressed online

Principle of 90/90 data use.

a majority of stored data, as high as 90 percent, is seldom accessed after 90 days (except for auditing purposes). That is, roughly 90 percent of data lose most of their value after 3 months

distributed database system

allows apps on computers and mobile devices to access data from both local and remote databases

Queries

are ad hoc (unplanned) user requests for specific data

Databases

are collections of datasets or records stored in a systematic way

Data marts

are lower-cost, scaled-down versions of data warehouses that can be implemented in a much shorter time, for example, in less than 90 days. They serve a specific department or function, such as finance, marketing, or operations

Data marts

are small-scale data warehouses that support a single function or one department

Master data entities

are the main entities of a company, such as customers, products, suppliers, employees, and assets.

Data and text mining

are used to discover knowledge that you did not know existed in the databases

ERM systems

consist of hardware and software that manage and archive electronic documents and image paper documents; then index and store them according to company policy

Record examples

contracts, research and development, accounting source documents, memos, customer/client communications, hiring and promotion decisions, meeting minutes, social posts, texts, e-mails, website content, database records, and paper and electronic files

Business analytics

describes the entire function of applying technologies, algorithms, human expertise, and judgment

SQL Server

ease of use, availability, and Windows operating system integration make it an easy choice for firms that choose Microsoft products for their enterprises

Data mining software

enables users to analyze data from various dimensions or angles, categorize them, and find correlations or patterns among fields in the data warehouse
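A minimal sketch of one such pattern search, using Python's standard statistics.correlation (Python 3.10+) on made-up monthly figures:

```python
from statistics import correlation

# Made-up monthly figures: does advertising spend move with units sold?
ad_spend   = [10, 12, 9, 15, 20, 18]
units_sold = [105, 118, 98, 140, 171, 160]

# A Pearson correlation close to +1 suggests the two fields rise and fall together;
# data mining tools automate this kind of search across many fields at once.
print(correlation(ad_spend, units_sold))
```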

Data ownership problems

exist when there are no policies defining responsibility and accountability for managing data

ETL

extract, transform, and load

GIGO

garbage in, garbage out

BI tools

integrate and consolidate data from various internal and external sources and then process them into information to make smart decisions

Data warehouses

integrate data from multiple databases and data silos, and organize them for complex analysis, knowledge discovery, and to support decision making

Database, data warehouse, big data, and business intelligence (BI) technologies

interact to create a new biz-tech ecosystem

Online transaction processing (OLTP) systems

use a database design that breaks down complex information into simpler data tables to strike a balance between transaction-processing efficiency and query efficiency

Operating margin

is a measure of the percent of a company's revenue left over after paying for its variable costs, such as wages and raw materials
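A quick worked example in Python with made-up figures, following the definition above:

```python
# Made-up figures for illustration.
revenue = 500_000
variable_costs = 380_000          # wages, raw materials, etc.

# Share of revenue left over after paying variable costs.
operating_margin = (revenue - variable_costs) / revenue * 100
print(f"{operating_margin:.1f}%")  # 24.0%
```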

The data life cycle

is a model that illustrates the way data travel through an organization

SQL

is a standardized query language for accessing databases

online analytics-processing systems (OLAP)

is a term used to describe the analysis of complex data from the data warehouse

A data entity

is anything real or abstract about which a company wants to collect and store data.

A record

is documentation of a business event, action, decision, or transaction

Big data analytics

is not just about managing more or varied data. Rather, it is about asking new questions, formulating new hypotheses, exploration and discovery, and making data-driven decisions.

Latency

is the elapsed time (or delay) between when data are created and when they are available for a query or report

PostgreSQL

is the most advanced open source database, often used by online gaming applications and Skype, Yahoo!, and MySpace

Market share

is the percentage of total sales in a market captured by a brand, product, or company
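A quick worked example in Python with made-up figures:

```python
# Made-up figures for illustration.
brand_sales = 12_500_000
total_market_sales = 50_000_000

# The brand's sales as a percentage of total sales in the market.
market_share = brand_sales / total_market_sales * 100
print(f"{market_share:.1f}%")   # 25.0%
```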

DB2

is widely used in data centers and runs on Linux, UNIX, Windows, and mainframes

Immediate consistency

means that as soon as data are updated, responses to any new query will return the updated value.

Fault tolerance

means that no single failure results in any loss of service

Scalability

means the system can increase in size to handle data growth or the load of an increasing number of concurrent users

CDC

minimizes the resources required for ETL processes by only dealing with data changes
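A minimal Python sketch of the idea, using a hypothetical "last load" timestamp as the watermark; production CDC tools more often read database transaction logs:

```python
from datetime import datetime

# Timestamp of the previous warehouse load (the watermark); hypothetical value.
last_load = datetime(2024, 1, 1)

# Hypothetical source rows with a last-modified column.
source_rows = [
    {"id": 1, "customer": "Acme", "updated_at": datetime(2023, 12, 30)},
    {"id": 2, "customer": "Bolt", "updated_at": datetime(2024, 1, 3)},
]

# Only rows changed since the last load flow into the ETL process.
changed = [row for row in source_rows if row["updated_at"] > last_load]
print(changed)   # only row 2 is extracted and loaded
```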

ETL processes

move data from databases into data warehouses or data marts, where the data are available for access, reports, and analysis.

OLAP

online analytics-processing systems

Dirty data

poor-quality data

master data management (MDM)

processes integrate data from various sources or enterprise applications to create a more complete (unified) view of a customer, product, or other entity
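A minimal Python sketch of matching and merging, with made-up field names and email as the (assumed) matching key:

```python
# Two source systems hold partial views of the same customer.
crm_record     = {"email": "pat@example.com", "name": "Pat Lee", "phone": None}
billing_record = {"email": "pat@example.com", "name": "P. Lee", "address": "12 Main St"}

def merge(primary, secondary):
    # Start from the secondary system, then prefer the primary system's
    # values wherever it actually has data.
    unified = dict(secondary)
    unified.update({k: v for k, v in primary.items() if v is not None})
    return unified

golden_record = merge(crm_record, billing_record)
print(golden_record)
# {'email': 'pat@example.com', 'name': 'Pat Lee', 'address': '12 Main St'}
```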

Deduping

processes remove duplicates and standardize data formats, which helps to minimize storage requirements and data synchronization effort
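A minimal Python sketch, assuming e-mail addresses are the field being standardized and deduplicated:

```python
# Raw values that only differ in case and spacing.
raw_contacts = ["Pat.Lee@Example.com ", "pat.lee@example.com", "dana@example.com"]

seen = set()
deduped = []
for contact in raw_contacts:
    standardized = contact.strip().lower()   # standardize the format first
    if standardized not in seen:             # then drop duplicates
        seen.add(standardized)
        deduped.append(standardized)

print(deduped)   # ['pat.lee@example.com', 'dana@example.com']
```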

Relational database management systems (RDBMSs)

provide access to data using a declarative language—structured query language (SQL)

A decision model

quantifies the relationship between variables, which reduces uncertainty

active data warehouse (ADW)

real-time data warehousing and analytics

Volatile

refers to data that change frequently

Declarative languages

simplify data access by requiring that users only specify what data they want to access without defining how access will be achieved
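A small Python/sqlite3 sketch; the customers table and its columns are made up. The SQL states what data are wanted, and the DBMS decides how to retrieve them:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (name TEXT, state TEXT)")
db.executemany("INSERT INTO customers VALUES (?, ?)",
               [("Acme", "OH"), ("Bolt", "CA"), ("Core", "OH")])

# Declarative access: specify *what* (customers in Ohio), not *how* to scan or index.
for row in db.execute("SELECT name FROM customers WHERE state = 'OH'"):
    print(row)   # ('Acme',) then ('Core',)
```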

business-driven development approach

starts with a business strategy and works backward to identify data sources and the data that need to be acquired and analyzed

Data Warehouses

store data from various source systems and databases across an enterprise in order to run analytical queries against huge datasets collected over long time periods. They are the primary source of cleansed data for analysis, reporting, and BI

Relational databases

store data in tables consisting of columns and rows, similar to the format of a spreadsheet

Centralized database

stores data at a single location that is accessible from anywhere. Searches can be fast because the search engine does not need to check multiple distributed locations to find responsive data

Data

the driving force behind any successful business

Business intelligence (BI)

tools and techniques process data and do statistical analysis for insight and discovery—that is, to discover meaningful relationships in the data, keep informed in real time, detect trends, and identify opportunities and risks

MySQL

which came under Oracle's ownership through its acquisition of Sun Microsystems in January 2010, powers hundreds of thousands of commercial websites and a huge number of internal enterprise applications

The major ERM tools

workflow software, authoring tools, scanners, and databases

Functions performed by a DBMS

• Data filtering and profiling
• Data integrity and maintenance
• Data synchronization
• Data security
• Data access

Data warehouses are:

• Designed and optimized for analysis and quick response to queries.
• Nonvolatile. This stability is important for analysis: once stored, data are not changed or deleted, so trend analysis and comparisons with newer data remain possible.
• OLAP systems.
• Subject-oriented, which means that the data captured are organized so that similar data are linked together.

Databases are:

• Designed and optimized to ensure that every transaction gets recorded and stored immediately.
• Volatile, because data are constantly being updated, added, or edited.
• OLTP systems.

An ERM can help a business to become more efficient and productive by:

• Enabling the company to access and use the content contained in documents.
• Cutting labor costs by automating business processes.
• Reducing the time and effort required to locate information the business needs to support decision making.
• Improving the security of content, thereby reducing the risk of intellectual property theft.
• Minimizing the costs associated with printing, storing, and searching for content.

Advantages of NoSQL

• Higher performance
• Easy distribution of data on different nodes, which enables scalability and fault tolerance
• Greater flexibility
• Simpler administration

HDFS Stages

• Loads data into HDFS. • Performs the MapReduce operations. • Retrieves results from HDFS.

Cost of Poor Quality Data

• Lost business. • Time spent preventing errors. • Time spent correcting errors.

Data Warehouses support

• Marketing and sales. • Pricing and contracts. • Forecasting. • Sales. • Financial

Generally Accepted Recordkeeping Principles

• Principle of Accountability. • Principle of Transparency. • Principle of Integrity. • Principle of Protection. • Principle of Compliance. • Principle of Availability. • Principle of Retention. • Principle of Disposition.

