IS chapter 3
benefits of data warehousing
-access data quickly and easily via web browsers because the data are located in one place -conduct extensive analysis with data in ways that may not have been possible before -get a consolidated view of organizational data
characteristics of high quality info include
-accuracy -completeness -consistency -uniqueness -timeliness
data warehouse
-captures organizational-level data and information -data is cleaned -data does not change -used for mining the data for historical trends and patterns
master data management
a strategy for data governance involving a process that spans all organizational business processes and applications, providing companies with the ability to store, maintain, exchange, and synchronize a consistent, accurate, and timely "single version of the truth" for the company's master data.
relational database model
based on the concept of two-dimensional tables; it is usually designed with a number of related tables, each of which contains records (listed in rows) and attributes (listed in columns)
variety
big data formats change rapidly and can include satellite imagery, broadcast audio streams, digital music files, and web page content
what is the order of the data hierarchy
bit, byte, field, record, data file, database
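The hierarchy above can be pictured with plain Python values. This is only an illustrative sketch: the student and course values are made up, and a real database stores these levels very differently.

```python
# Hypothetical illustration of the data hierarchy; all names and values are made up.

# bit: a single binary digit (0 or 1)
bit = 1

# byte: a group of 8 bits, enough to represent one character
byte = 0b01000001           # the eight bits 01000001
character = chr(byte)       # the character those bits encode in ASCII

# field: a logical grouping of characters
last_name = "Smith"

# record: a logical grouping of related fields
record = {"student_id": 1, "last_name": last_name, "course": "IS 101"}

# data file (table): a logical grouping of related records
students_file = [
    record,
    {"student_id": 2, "last_name": "Jones", "course": "IS 101"},
]

# database: a logical grouping of related data files
database = {
    "students": students_file,
    "courses": [{"course": "IS 101", "title": "Intro to IS"}],
}
```

Each level simply wraps the one before it, which is why the order runs from smallest (bit) to largest (database).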
explicit
knowledge that can be articulated and written down
volume
big data involves a very large amount of data
database management system
-a collection of programs to store, delete, access, and analyze data -security -data dictionary / metadata -a set of programs that provides users with tools to create and manage a database
the Big Data Institute (TBDI) defines big data as data sets that
-exhibit variety -include structured, unstructured, and semi-structured data -are generated at high velocity with an uncertain pattern -do not fit neatly into traditional structured relational databases -can be captured, processed, transformed, and analyzed in a reasonable amount of time only by sophisticated information systems.
sources of error of information include
-intentionally inaccurate information to protect privacy -different entry standards and formats -abbreviated or erroneous information by accident or to save time -external information contains inconsistencies, inaccuracies and errors
normalization
-minimum redundancy -maximum data integrity -best processing performance
data warehouse and data marts characteristics
-organized by business dimension or subject -use online analytical processing (OLAP) -integrated -time variant -nonvolatile -multidimensional
what does big data consist of
-traditional enterprise data -machine generated / sensor data -social data -images captured by billions of devices located throughout the world
data mart
A low-cost, scaled-down version of a data warehouse that is designed for the end-user needs in a strategic business unit (SBU) or a department.
knowledge management KM
A process that helps organizations identify, select, organize, disseminate, transfer, and apply information and expertise that are part of the organization's memory and that typically reside within the organization in an unstructured manner.
Master Data
A set of core data, such as customer, product, employee, vendor, geographic location, and so on, that spans an enterprise's information systems.
storing the data
A variety of architectures can be used to store decision-support data. The most common architecture is one central enterprise data warehouse, without data marts.
integrated
Data are collected from multiple systems and then integrated around subjects.
unstructured data
Data that do not reside in a fixed location; examples include text documents, PDFs, voice messages, and emails.
nonvolatile
Data warehouses and data marts are nonvolatile—that is, users cannot change or update the data.
time variant
Data warehouses and data marts maintain historical data (i.e., data that include time as a variable).
create knowledge
Knowledge is created as people determine new ways of doing things or develop know-how. Sometimes external knowledge is brought in.
disseminate knowledge
Knowledge must be made available in a useful format to anyone in the organization who needs it, anywhere and anytime.
manage knowledge
Like a library, the knowledge must be kept current. It must be reviewed regularly to verify that it is relevant and accurate.
Capture Knowledge
New knowledge must be identified as valuable and be represented in a reasonable way.
Refine Knowledge
New knowledge must be placed in context so that it is actionable. This is where tacit qualities (human insights) must be captured along with explicit facts.
Knowledge Management Systems
Refer to the use of modern information technologies (the Internet, intranets, extranets, databases) to systematize, enhance, and expedite intrafirm and interfirm knowledge management.
source systems
Systems that provide a source of organizational data.
tacit knowledge
The cumulative store of subjective or experiential learning, which is highly personal and hard to formalize.
data quality
The quality of the data in the warehouse must meet users' needs.
bit (binary digit)
The smallest unit of data stored in a computer. A bit can have the value of 0 or 1.
users
There are many potential BI users, including IT developers; frontline workers; analysts; information workers; managers and executives; and suppliers, customers, and regulators.
governance
To ensure that BI is meeting their needs, organizations must implement governance to plan and control their BI activities. Governance requires that people, committees, and processes be in place.
multidimensional
Typically the data warehouse or mart uses a multidimensional data structure. Recall that relational databases store data in two-dimensional tables.
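A multidimensional structure can be sketched as values keyed by several business dimensions at once. The example below is a minimal illustration, assuming three made-up dimensions (product, region, year) and invented sales figures; it is not how any particular OLAP product stores its data.

```python
from collections import defaultdict

# Hypothetical sales facts: (product, region, year, units sold).
sales_facts = [
    ("nuts", "east", 2023, 50),
    ("nuts", "west", 2023, 60),
    ("bolts", "east", 2023, 40),
    ("nuts", "east", 2024, 70),
]

# The "cube": one cell per combination of the three dimensions.
cube = defaultdict(int)
for product, region, year, units in sales_facts:
    cube[(product, region, year)] += units


def total(product=None, region=None, year=None):
    """Sum units over the cube; a dimension left as None is not filtered."""
    return sum(
        units
        for (p, r, y), units in cube.items()
        if product in (None, p) and region in (None, r) and year in (None, y)
    )


nuts_all = total(product="nuts")               # slice on one dimension
east_2023 = total(region="east", year=2023)    # slice on two dimensions
```

Fixing one or more dimensions and summing over the rest is the kind of question a two-dimensional table answers less directly.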
issues with big data
-big data can come from untrusted data sources -big data is dirty -big data changes, especially in data streams
store knowledge
Useful knowledge must then be stored in a reasonable format in a knowledge repository so that other people in the organization can access it.
data file
a collection of logically related records
field
a column of data containing a logical grouping of characters into a word or a small group of words (e.g., last name, Social Security number)
data model
a diagram that represents entities in the database and their relationships
what is big data according to gartner.com
diverse, high-volume, high-velocity information assets that require new forms of processing to enable enhanced decision making, insight discovery, and process optimization
primary key
a field in a database that uniquely identifies each record so that it can be retrieved, updated, and sorted
foreign key
a field or group of fields in one table that uniquely identifies a row of another table. it is used to establish and enforce a link between two tables
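A small sketch of primary and foreign keys, using Python's built-in sqlite3 module with an in-memory database. The table and column names (customer, purchase, and so on) are made up for illustration. Note that SQLite only enforces foreign keys when the `foreign_keys` pragma is turned on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# customer_id is the primary key: it uniquely identifies each customer record.
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# purchase.customer_id is a foreign key: it must match a row in customer.
conn.execute(
    """
    CREATE TABLE purchase (
        purchase_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        amount REAL
    )
    """
)

conn.execute("INSERT INTO customer VALUES (1, 'Ann')")
conn.execute("INSERT INTO purchase VALUES (100, 1, 25.0)")

# A purchase that points at a customer who does not exist is rejected.
try:
    conn.execute("INSERT INTO purchase VALUES (101, 999, 10.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# The key pair is also what links the two tables in a query.
rows = conn.execute(
    "SELECT c.name, p.amount FROM purchase p "
    "JOIN customer c ON c.customer_id = p.customer_id"
).fetchall()
```

The rejected insert is the "enforce a link" part of the definition: the database refuses a row whose foreign key has no matching primary key.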
secondary key
a field that has some identifying information but typically does not identify the record with complete accuracy, and therefore cannot serve as the primary key
byte
a group of 8 bits that represents a single character
database
a logical grouping of related data files aka database tables
record
a logical grouping of related fields in a row (e.g., a student's name, the courses taken, the date)
data warehouse
a repository of historical data that are organized by subject to support decision makers in the organization
data governance
an approach to managing information across an entire organization involving a formal set of unambiguous rules for creating, collecting, handling, and protecting its info
information silos
an info system that does not communicate with other related info systems in an org
normalized data occurs when
attributes in the table depend only on the primary key
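A rough sketch of what normalizing a table can look like, using plain Python lists and dictionaries. The order and customer values are invented. In the flat version, customer name and city repeat on every order; after the split, each attribute sits in the table whose primary key it depends on.

```python
# Unnormalized: customer_name and city depend on customer_id, not on order_id,
# so they repeat on every order row (redundancy).
orders_flat = [
    {"order_id": 1, "customer_id": 10, "customer_name": "Ann", "city": "Boise", "total": 25.0},
    {"order_id": 2, "customer_id": 10, "customer_name": "Ann", "city": "Boise", "total": 40.0},
    {"order_id": 3, "customer_id": 11, "customer_name": "Raj", "city": "Tampa", "total": 15.0},
]

customers = {}  # keyed by customer_id (the primary key of the customer table)
orders = []     # each row keyed by order_id, linked to a customer by customer_id

for row in orders_flat:
    # Customer attributes are stored once per customer.
    customers[row["customer_id"]] = {
        "customer_name": row["customer_name"],
        "city": row["city"],
    }
    # Order rows keep only attributes that depend on order_id, plus the link.
    orders.append(
        {"order_id": row["order_id"], "customer_id": row["customer_id"], "total": row["total"]}
    )
```

Changing Ann's city now means updating one customer row instead of every one of her orders, which is the redundancy and integrity benefit of normalization.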
data file
a logical grouping of related records; also called a table, it looks similar to an Excel spreadsheet with multiple columns and rows
entity
a person, place, thing, or event
external data sources
commercial databases, government reports, and corporate web sites
internal data sources
corporate databases and company documents
the KMS cycle consists of six steps what are they
create knowledge, capture knowledge, refine knowledge, store knowledge, manage knowledge, disseminate knowledge
metadata
data maintained about the data within the data warehouse (e.g., database, table, and column names; refresh schedules; and data-usage measures)
a DBMS minimizes the following problems
data redundancy, data isolation, data inconsistency
a DBMS maximizes the following
data security, data integrity, data independence from applications
what does DBMS stand for
database management system
big data is dirty
dirty data refers to inaccurate, incomplete, incorrect, duplicate, or erroneous data
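A minimal sketch of flagging dirty records. The records, the field names, and the rule that a plausible age lies between 0 and 120 are all assumptions made for this example; real data-quality checks depend on the data set.

```python
# Hypothetical customer records containing typical kinds of dirty data.
records = [
    {"id": 1, "email": "ann@example.com", "age": 34},
    {"id": 2, "email": None, "age": 29},                # incomplete: missing email
    {"id": 3, "email": "raj@example.com", "age": -5},   # inaccurate: impossible age
    {"id": 1, "email": "ann@example.com", "age": 34},   # duplicate of the first record
]


def find_problems(rows):
    """Return (id, problem) pairs for duplicate, incomplete, or inaccurate rows."""
    problems = []
    seen_ids = set()
    for row in rows:
        if row["id"] in seen_ids:
            problems.append((row["id"], "duplicate"))
            continue
        seen_ids.add(row["id"])
        if any(value is None for value in row.values()):
            problems.append((row["id"], "incomplete"))
        # Assumed plausibility rule: ages outside 0..120 are treated as inaccurate.
        if row["age"] is not None and not 0 <= row["age"] <= 120:
            problems.append((row["id"], "inaccurate"))
    return problems


problems = find_problems(records)
```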
attribute
each characteristic or quality of a particular entity
what do ER diagrams consist of
entities, attributes, and relationships
database designers plan the database design in a process called
entity relationship modeling (ER)
big data changes
especially in data streams: Organizations must be aware that data quality in an analysis can change, or the data itself can change, because the conditions under which the data are captured can change.
social data
examples are customer feedback comments; microblogging sites such as Twitter; and social media sites such as Facebook, YouTube, and LinkedIn.
traditional enterprise data
examples are customer information from customer relationship management systems, transactional enterprise resource planning data, Web store transactions, operations data, and general ledger data
Machine-generated data
examples are smart meters, manufacturing sensors, sensors integrated into smartphones, automobiles, airplane engines and industrial machines, and trading system data
managing big data
first step: integrate information silos into a database environment and develop data warehouses for decision making; second step: make sense of the proliferating data
NoSQL database
many organizations are turning to them; they can manipulate structured as well as unstructured data, and inconsistent or missing data, providing an alternative for firms that have more and different kinds of data (big data) in addition to the traditional structured data that fit neatly into the rows and columns of relational databases
cardinality
the maximum number of times an instance of one entity can be associated with an instance of the related entity
modality
the minimum number of times an instance of one entity can be associated with an instance of the related entity
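Cardinality and modality are design rules, but sample data can be checked against them. The sketch below assumes a made-up rule that a student may take between 0 and 3 courses, and the student and course identifiers are hypothetical.

```python
from collections import Counter

# Hypothetical design rule for the student-to-course relationship.
MODALITY = 0      # minimum number of courses per student
CARDINALITY = 3   # maximum number of courses per student

students = ["s1", "s2", "s3"]
enrollments = [("s1", "c1"), ("s1", "c2"), ("s2", "c1")]  # (student, course) pairs

# How many courses each student is associated with (0 for students not listed).
courses_per_student = Counter(student for student, _course in enrollments)


def violations(student_ids, counts, minimum, maximum):
    """Return the students whose course count falls outside [minimum, maximum]."""
    return [s for s in student_ids if not minimum <= counts[s] <= maximum]


bad = violations(students, courses_per_student, MODALITY, CARDINALITY)
```

With a modality of 0, student s3 (no courses) is allowed; raising the minimum to 1 would make s3 a violation.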
online analytical processing is abbreviated as
OLAP
common examples of source systems include
operational/transactional systems, enterprise resource planning (ERP) systems, Web site data, third-party data (e.g., customer demographic data), operational databases
what's the problem with big data
organizations collect more data than they can hope to analyze and use
personal data sources
personal thoughts, opinions, and experiences
data rot
refers primarily to problems with the media on which the data are stored. Over time, temperature, humidity, and exposure to light can cause physical problems with storage media and thus make it difficult to access the data.
data integration
reflects the growing number of ways that source system data can be handled. Typically, organizations need to Extract, Transform, and Load (ETL) data from source systems into a data warehouse or data mart.
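The three ETL steps can be sketched in a few lines. Everything here is assumed for illustration: the source rows, the `STATE_CODES` lookup, the `transform` helper, and the `sales` table are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Extract: rows as they might arrive from a hypothetical source system,
# with inconsistent formats and everything stored as text.
source_rows = [
    {"id": "1", "name": " ann ", "state": "ca", "amount": "25.50"},
    {"id": "2", "name": "RAJ", "state": "Calif.", "amount": "40"},
]

# Assumed lookup that maps the different entry standards to one code.
STATE_CODES = {"ca": "CA", "calif.": "CA"}


def transform(row):
    """Clean one source row: fix types, trim names, standardize the state."""
    return (
        int(row["id"]),
        row["name"].strip().title(),
        STATE_CODES[row["state"].lower()],
        float(row["amount"]),
    )


# Load: write the cleaned rows into the warehouse table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, name TEXT, state TEXT, amount REAL)"
)
warehouse.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [transform(r) for r in source_rows],
)

loaded = warehouse.execute("SELECT id, name, state, amount FROM sales ORDER BY id").fetchall()
```

The transform step is where different entry standards and formats from the source systems get reconciled before the data reach the warehouse.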
federal regulations of managing data
the Sarbanes-Oxley Act of 2002 requires that -public companies evaluate and disclose the effectiveness of their financial controls -independent auditors for these companies agree to this disclosure
What does SQL stand for
structured query language
tacit
knowledge that is difficult to encode and cannot be fully written down
explicit knowledge
the more objective, rational, and technical types of knowledge
velocity
the rate at which data flow into an org is rapidly increasing; velocity is critical because it increases the speed of the feedback loop between a company and its customers
how are ER relationships described
by their cardinality and modality
clickstream data
those data that visitors and customers produce when they visit a Web site and click on hyperlinks
what can big data reveal
valuable patterns, trends, and information that were previously hidden -spotting business trends more rapidly and accurately -tracking the spread of disease -tracking crime -detecting fraud
characteristics of big data
volume, velocity, variety
how are database relationships established
with a primary key (referenced by a foreign key in the related table)