CIS NOTES
The KMS Cycle
1. Create knowledge. Knowledge is created as people determine new ways of doing things or develop know-how. Sometimes external knowledge is brought in.
2. Capture knowledge. New knowledge must be identified as valuable and be represented in a reasonable way.
3. Refine knowledge. New knowledge must be placed in context so that it is actionable. This is where tacit qualities (human insights) must be captured along with explicit facts.
4. Store knowledge. Useful knowledge must then be stored in a reasonable format in a knowledge repository so that others in the organization can access it.
5. Manage knowledge. Like a library, the knowledge must be kept current. It must be reviewed regularly to verify that it is relevant and accurate.
6. Disseminate knowledge. Knowledge must be made available in a useful format to anyone in the organization who needs it, anywhere and anytime.
data file
A data file is a collection of logically related records. In a file management environment, each application has a specific data file related to it. This file contains all of the data records the application requires. Over time, organizations developed numerous applications, each with an associated, application-specific data file.
data mart
A data mart is a low-cost, scaled-down version of a data warehouse that is designed for the end-user needs in a strategic business unit (SBU) or an individual department. Data marts can be implemented more quickly than data warehouses, often in less than 90 days.
data model
A data model is a diagram that represents entities in the database and their relationships.
A data warehouse
A data warehouse is a repository of historical data that are organized by subject to support decision makers in the organization.
record.
A logical grouping of related fields, such as the student's name, the courses taken, the date, and the grade, comprises a record.
data file or a table
A logical grouping of related records is called a data file or a table. For example, a grouping of the records from a particular course, consisting of course number, professor, and students' grades, would constitute a data file for that course.
secondary keys
A secondary key is another field that has some identifying information but typically does not identify the record with complete accuracy.
A/B experiments,
Such tests are called A/B experiments because each experiment has only two possible outcomes (version A or version B performs better).
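A minimal sketch of the idea, with made-up visitor and conversion counts:

```python
# Hypothetical A/B experiment: each visitor sees version A or version B,
# and the outcome is simply whether the visitor converted or not.
visitors = {"A": 1000, "B": 1000}     # visitors shown each version (made up)
conversions = {"A": 120, "B": 150}    # conversions observed (made up)

# Conversion rate per version: conversions / visitors.
rates = {v: conversions[v] / visitors[v] for v in visitors}

# The "winning" version is the one with the higher conversion rate.
winner = max(rates, key=rates.get)
print(rates, winner)
```

In practice a statistical test would confirm that the difference is not due to chance; this sketch only shows the two-outcome comparison itself.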
entity
An entity is a person, place, thing, or event—such as a customer, an employee, or a product—about which information is maintained.
instance
An instance of an entity is a specific, unique representation of the entity. For example, an instance of the entity STUDENT would be a particular student.
Defining Big Data
One widely cited definition describes Big Data as diverse, high-volume, high-velocity information assets that require new forms of processing to enable enhanced decision making, insight discovery, and process optimization. Second, the Big Data Institute (TBDI; www.the-bigdatainstitute.com) defines Big Data as vast data sets that:
• Exhibit variety;
• Include structured, unstructured, and semi-structured data;
• Are generated at high velocity with an uncertain pattern;
• Do not fit neatly into traditional, structured, relational databases (discussed later in this chapter); and
• Can be captured, processed, transformed, and analyzed in a reasonable amount of time only by sophisticated information systems.
Cardinality
Cardinality refers to the maximum number of times an instance of one entity can be associated with an instance in the related entity.
database
Continuing up the hierarchy, a logical grouping of related files constitutes a database. Using the same example, the student course file could be grouped with files on students' personal histories and financial backgrounds to create a student database.
Leveraging Big Data
Organizations must do more than simply manage Big Data; they must also gain value from it. In general, there are six broadly applicable ways to leverage Big Data to gain value.
Creating Transparency. Simply making Big Data easier for relevant stakeholders to access in a timely manner can create tremendous business value.
Enabling Experimentation. Experimentation allows organizations to discover needs and improve performance. As organizations create and store more data in digital form, they can collect more accurate and detailed performance data (in real or near-real time) on everything from product inventories to personnel sick days.
Segmenting Populations to Customize Actions. Big Data allows organizations to create narrowly defined customer segmentations and to tailor products and services to precisely meet customer needs.
Replacing/Supporting Human Decision Making with Automated Algorithms. Sophisticated analytics can substantially improve decision making, minimize risks, and unearth valuable insights.
Innovating New Business Models, Products, and Services. Big Data enables companies to create new products and services, enhance existing ones, and invent entirely new business models.
Data governance
Data governance is an approach to managing information across an entire organization. It involves a formal set of business processes and policies that are designed to ensure that data are handled in a certain, well-defined fashion. That is, the organization follows unambiguous rules for creating, collecting, handling, and protecting its information.
entity-relationship diagram.
Designers plan and create the database through the process of entity-relationship modeling, using an entity-relationship diagram. There are many approaches to ER diagramming.
attribute.
Each characteristic or quality of a particular entity is called an attribute. For example, if our entities were a customer, an employee, and a product, entity attributes would include customer name, employee number, and product color.
The benefits of data warehousing include the following:
• End users can access needed data quickly and easily via Web browsers because these data are located in one place.
• End users can conduct extensive analysis with data in ways that were not previously possible.
• End users can obtain a consolidated view of organizational data.
primary key
Every record in a file must contain at least one field that uniquely identifies that record so that it can be retrieved, updated, and sorted. This identifier field is called the primary key.
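A minimal Python sketch of the idea, using hypothetical student records: treating the unique identifier field as a dictionary key makes retrieval and update by primary key direct, while a secondary key may match several records.

```python
# Hypothetical student file: each record's primary key is the student ID,
# a field guaranteed to be unique, so it can serve as the lookup key.
students = {
    "S001": {"name": "Ana", "major": "CIS"},
    "S002": {"name": "Ben", "major": "MKT"},
}

# Retrieve a record by its primary key.
record = students["S002"]

# Update a record by its primary key.
students["S001"]["major"] = "FIN"

# A secondary key (e.g., major) can match several records, so it does not
# identify a record with complete accuracy -- it narrows the search instead.
fin_majors = [sid for sid, r in students.items() if r["major"] == "FIN"]
print(record, fin_majors)
```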
Explicit knowledge
Explicit knowledge deals with more objective, rational, and technical knowledge. In an organization, explicit knowledge consists of the policies, procedural guides, reports, products, strategies, goals, core competencies, and IT infrastructure of the enterprise. In other words, explicit knowledge is the knowledge that has been codified (documented) in a form that can be distributed to others or transformed into a process or a strategy.
Governance.
Governance requires that people, committees, and processes be in place. Companies that are effective in BI governance often create a senior-level committee comprised of vice-presidents and directors who (1) ensure that the business objec- tives and BI strategies are in alignment, (2) prioritize projects, and (3) allocate resources.
Intellectual capital (or intellectual assets)
Intellectual capital (or intellectual assets) is another term for knowledge.
Knowledge management (KM)
Knowledge management (KM) is a process that helps organizations manipulate important knowledge that comprises part of the organization's memory, usually in an unstructured format. For an organization to be successful, knowledge, as a form of capital, must exist in a format that can be exchanged among persons. In addition, it must be able to grow.
Knowledge management systems (KMSs)
Knowledge management systems (KMSs) refer to the use of modern information technologies—the Internet, intranets, extranets, databases—to systematize, enhance, and expedite intrafirm and interfirm knowledge management. KMSs are intended to help an organization cope with turnover, rapid change, and downsizing by making the expertise of the organization's human capital widely accessible.
Machine-generated/sensor data
Machine-generated/sensor data—examples are smart meters; manufacturing sensors; sensors integrated into smartphones, automobiles, airplane engines, and industrial machines; equipment logs; and trading systems data.
Master data
Master data are a set of core data, such as customer, product, employee, vendor, geographic location, and so on, that span the enterprise information systems.
Master data management
Master data management is a process that spans all organizational business processes and applications. It provides companies with the ability to store, maintain, exchange, and synchronize a consistent, accurate, and timely "single version of the truth" for the company's master data.
Metadata.
Metadata. It is important to maintain data about the data, known as metadata, in the data warehouse. Both the IT personnel who operate and manage the data warehouse and the users who access the data need metadata.
Modality
Modality refers to the minimum number of times an instance of one entity can be associated with an instance in the related entity.
NoSQL databases
Organizations use NoSQL databases (think of them as "not only SQL" databases) to process Big Data. These databases provide an alternative for firms that have more and different kinds of data (Big Data) in addition to the traditional, structured data that fit neatly into the rows and columns of relational databases. As you will see later in this chapter, traditional relational databases such as Oracle and MySQL store data in tables organized into rows and columns. Each row is associated with a unique record, for instance a customer account, and each column is associated with a field that defines an attribute of that account.
NoSQL databases
NoSQL databases can manipulate structured as well as unstructured data and inconsistent or missing data. For this reason, NoSQL databases are particularly useful when working with Big Data. Many products utilize NoSQL databases.
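The flexibility can be sketched with the "document" style many NoSQL databases use: records are free-form, so fields can vary or be missing from record to record (the customer data below is made up).

```python
# Document-style records: unlike rows in a relational table, each record
# can have its own set of fields (hypothetical customer data).
customers = [
    {"id": 1, "name": "Ana", "email": "ana@example.com"},
    {"id": 2, "name": "Ben", "tweets": ["great product!"]},  # no email field
    {"id": 3, "name": "Cy",  "email": None},                 # inconsistent value
]

# A missing or inconsistent field is not an error; code simply supplies a
# default when the field is absent or empty.
emails = [c.get("email") or "unknown" for c in customers]
print(emails)
```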
Normalization
Normalization is a method for analyzing and reducing a relational database to its most streamlined form to ensure minimum redundancy, maximum data integrity, and optimal processing performance.
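A sketch of the redundancy argument using hypothetical course data: the unnormalized table repeats the professor's office in every enrollment row, while the normalized form stores that fact exactly once.

```python
# Unnormalized: the professor's office is repeated in every row, so changing
# it means updating many rows consistently (a redundancy and integrity risk).
unnormalized = [
    {"course": "CIS101", "professor": "Lee", "office": "B210", "student": "Ana"},
    {"course": "CIS101", "professor": "Lee", "office": "B210", "student": "Ben"},
]

# Normalized: each fact is stored once; the tables are related by the course key.
courses = {"CIS101": {"professor": "Lee", "office": "B210"}}
enrollments = [("CIS101", "Ana"), ("CIS101", "Ben")]

# Now an office change is a single update, so no copies can disagree.
courses["CIS101"]["office"] = "C305"
offices = {courses[c]["office"] for c, _ in enrollments}
print(offices)
```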
The basic characteristics of data warehouses and data marts include the following:
• Organized by business dimension or subject.
• Use online analytical processing.
• Integrated. Data are collected from multiple systems and then integrated around subjects.
• Time variant. Data warehouses and data marts maintain historical data (i.e., data that include time as a variable).
• Nonvolatile. Data warehouses and data marts are nonvolatile—that is, users cannot change or update the data.
• Multidimensional. Typically the data warehouse or mart uses a multidimensional data structure.
Social data
Social data—examples are customer feedback comments; microblogging sites such as Twitter; and social media sites such as Facebook, YouTube, and LinkedIn.
Source Systems
Source Systems. There is typically some "organizational pain" (i.e., business need) that motivates a firm to develop its BI capabilities. Working backward, this pain leads to information requirements, BI applications, and source system data requirements. The data requirements can range from a single source system, as in the case of a data mart, to hundreds of source systems, as in the case of an enterprisewide data warehouse.
The environment for data warehouses and marts includes the following:
• Source systems that provide data to the warehouse or mart
• Data-integration technology and processes that prepare the data for use
• Different architectures for storing data in an organization's data warehouse or data marts
• Different tools and applications for the variety of users (you will learn about these tools and applications in Chapter 5)
• Metadata, data-quality, and governance processes that ensure that the warehouse or mart meets its purposes
Structured query language
Structured query language (SQL) is the most popular query language used for this operation. SQL allows people to perform complicated searches by using relatively simple statements or key words.
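A runnable sketch using Python's built-in sqlite3 module (the table and data are made up): a few simple keywords—SELECT, WHERE, ORDER BY—express a fairly complicated search.

```python
import sqlite3

# In-memory relational database with a hypothetical student grades table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE grades (student TEXT, course TEXT, grade REAL)")
con.executemany("INSERT INTO grades VALUES (?, ?, ?)",
                [("Ana", "CIS101", 91.0),
                 ("Ben", "CIS101", 78.5),
                 ("Cy",  "MKT200", 85.0)])

# Filter rows (WHERE), sort them (ORDER BY), and restrict the columns
# returned (SELECT list) -- all with simple statements.
rows = con.execute(
    "SELECT student, grade FROM grades "
    "WHERE course = 'CIS101' AND grade >= 80 "
    "ORDER BY grade DESC").fetchall()
print(rows)
```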
Storing the Data
The most common architecture is one central enterprise data warehouse, without data marts. Most organizations use this approach, because the data stored in the warehouse are accessed by all users and represent the single version of the truth.
Data Quality.
The quality of the data in the warehouse must meet users' needs. If it does not, the data will not be trusted and ultimately will not be used.
relational database model
The relational database model is based on the concept of two-dimensional tables. A relational database generally is not one big table—usually called a flat file—that contains all of the records and attributes. Such a design would entail far too much data redundancy. Instead, a relational database is usually designed with a number of related tables. Each of these tables contains records (listed in rows) and attributes (listed in columns).
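A sketch of "related tables instead of one flat file," again with Python's built-in sqlite3 module and made-up customer and order data: the customer's name is stored once, and the orders table refers to it through a key.

```python
import sqlite3

# Two related tables: customers stored once, orders referring to them by key.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, name TEXT)")
con.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, "
            "cust_id INTEGER, amount REAL)")
con.execute("INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben')")
con.execute("INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 9.5)")

# Relating the tables at query time (JOIN) avoids repeating the customer's
# name in every order row, which is the redundancy a flat file would have.
rows = con.execute(
    "SELECT c.name, SUM(o.amount) FROM customers c "
    "JOIN orders o ON o.cust_id = c.cust_id "
    "GROUP BY c.name ORDER BY c.name").fetchall()
print(rows)
```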
Transaction data,
Transaction data, which are generated and captured by operational systems, describe the business's activities, or transactions. In contrast, master data are applied to multiple transactions and are used to categorize, aggregate, and evaluate the transaction data.
Users.
Users. Once the data are loaded in a data mart or warehouse, they can be accessed. At this point the organization begins to obtain business value from BI; all of the prior stages constitute creating BI infrastructure. There are many potential BI users, including IT developers; frontline workers; analysts; information workers; managers and executives; and suppliers, customers, and regulators. Some of these users are information producers whose primary role is to create information for other users. IT developers and analysts typically fall into this category.
Characteristics of Big Data
Volume: Big Data is high-volume; organizations generate and store enormous and ever-growing quantities of data.
Velocity: The rate at which data flow into an organization is rapidly increasing. Velocity is critical because it increases the speed of the feedback loop between a company and its customers.
Variety: Traditional data formats tend to be structured, relatively well described, and slow to change. Big Data formats, in contrast, are varied, including structured, unstructured, and semi-structured data.
exabyte
an exabyte is one quintillion (10^18) bytes—that is, one million terabytes
attributes,
attributes, or properties, that describe the entity's characteristics.
best practices,
best practices, the most effective and efficient ways of doing things, readily available to a wide range of employees. Enhanced access to best-practice knowledge improves overall organizational performance.
The Data Hierarchy
bit
A bit (binary digit) represents the smallest unit of data a computer can process. The term binary means that a bit can consist only of a 0 or a 1. A group of eight bits, called a byte, represents a single character. A byte can be a letter, a number, or a symbol.
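The bit/byte relationship can be shown directly in Python: the character "A" occupies one byte, and that byte is a group of eight bits.

```python
# One character -> one byte -> eight bits (each a 0 or a 1).
text = "A"
raw = text.encode("ascii")    # the character as raw bytes
bits = format(raw[0], "08b")  # that single byte written out as eight bits

print(len(raw), bits)
```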
data dictionary
data dictionary defines the required format for entering the data into the database. The data dictionary provides information on each attribute, such as its name, whether it is a key or part of a key, the type of data expected (alphanumeric, numeric, dates, and so on), and valid values.
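A minimal sketch of what a data dictionary records per attribute (the attribute names and rules below are made up for illustration):

```python
# Hypothetical data dictionary: for each attribute, whether it is (part of)
# a key, the type of data expected, and the valid values.
data_dictionary = {
    "student_id": {"key": True,  "type": "numeric",      "valid": range(1, 100000)},
    "grade":      {"key": False, "type": "alphanumeric", "valid": {"A", "B", "C", "D", "F"}},
}

def is_valid(attribute, value):
    """Check a value against the required format defined in the dictionary."""
    return value in data_dictionary[attribute]["valid"]

ok = is_valid("grade", "B")    # an allowed grade
bad = is_valid("grade", "Z")   # not in the set of valid values
print(ok, bad)
```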
data integration
Data-integration technology and processes extract the data, transform them, and then load them into a data mart or warehouse. This process is often called ETL (extract, transform, load), but the term data integration is increasingly being used to reflect the growing number of ways that source system data can be handled.
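The three ETL steps can be sketched in a few lines (all source systems and data below are made up): two sources record sales in different formats, and the transform step integrates them into one consistent shape before loading.

```python
# Minimal ETL sketch with two hypothetical source systems.

def extract():
    # Source systems that record the same kind of fact differently.
    crm = [{"cust": "ana", "amt_usd": 25.0}]
    web_store = [{"customer": "BEN", "amount_cents": 950}]
    return crm, web_store

def transform(crm, web_store):
    # Integrate around one consistent format: (customer, amount in dollars).
    rows = [(r["cust"].title(), r["amt_usd"]) for r in crm]
    rows += [(r["customer"].title(), r["amount_cents"] / 100) for r in web_store]
    return rows

def load(rows, warehouse):
    # Append the cleaned rows to the target store.
    warehouse.extend(rows)

warehouse = []
load(transform(*extract()), warehouse)
print(warehouse)
```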
Traditional enterprise data
examples are customer information from customer relationship management systems, transactional enterprise resource planning data, Web store transactions, operations data, and general ledger data.
field.
A grouping of characters into a word, a small group of words, or an identification number is called a field.
identifiers
identifiers, which are attributes (attributes and identifiers are synonymous) that are unique to that entity instance.
multidimensional structure
multidimensional structure. A common representation for this multidimensional structure is the data cube.
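A data cube can be sketched as cells addressed by one member from each dimension—here product, region, and year, with made-up sales figures—and analysis then aggregates along a dimension.

```python
# Hypothetical data cube: each cell is addressed by (product, region, year).
cube = {
    ("nuts",  "east", 2023): 100,
    ("nuts",  "west", 2023): 150,
    ("bolts", "east", 2023): 80,
    ("nuts",  "east", 2024): 120,
}

# Slicing the cube: total 2023 sales of nuts across all regions.
nuts_2023 = sum(v for (product, region, year), v in cube.items()
                if product == "nuts" and year == 2023)
print(nuts_2023)
```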
tacit knowledge
tacit knowledge is the cumulative store of subjective or experiential learning. In an organization, tacit knowledge consists of an organization's experiences, insights, expertise, know-how, trade secrets, skill sets, understanding, and learning.
This system minimizes the following problems:
• Data redundancy: The same data are stored in multiple locations.
• Data isolation: Applications cannot access data associated with other applications.
• Data inconsistency: Various copies of the data do not agree.

In addition, database systems maximize the following:

• Data security: Because data are "put in one place" in databases, there is a risk of losing a lot of data at once. Therefore, databases have extremely high security measures in place to minimize mistakes and deter attacks.
• Data integrity: Data meet certain constraints; for example, there are no alphabetic characters in a Social Security number field.
• Data independence: Applications and data are independent of one another; that is, applications and data are not linked to each other, so all applications are able to access the same data.
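The data-integrity point can be sketched with Python's built-in sqlite3 module (the table and rule are made up): a CHECK constraint makes the database itself reject a Social Security number containing a letter.

```python
import sqlite3

# Hypothetical employees table: the ssn field must be exactly nine digits,
# so alphabetic characters cannot get in.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE employees (name TEXT, ssn TEXT CHECK "
    "(ssn GLOB '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'))")

con.execute("INSERT INTO employees VALUES ('Ana', '123456789')")  # accepted

try:
    con.execute("INSERT INTO employees VALUES ('Ben', '12345678X')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # the database itself refused the invalid value

count = con.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
print(rejected, count)
```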