C175 Data Management
Punch Card
A card that is perforated and can hold commands or data
Punch Card
A card that is perforated and can hold commands or data.
attribute
A characteristic of an entity or object. An attribute has a name and a data type.
HAVING
A clause applied to the output of a GROUP BY operation to restrict selected rows.
single-user database
A database that supports only one user at a time.
business rule
A description of a policy, procedure, or principle within an organization. For example, a pilot cannot be on duty for more than 10 hours during a 24-hour period, or a professor may teach up to four classes during a semester.
entity relationship diagram (ERD)
A diagram that depicts an entity relationship model's entities, attributes, and relations.
class diagrams
A diagram used to represent data and their relationships in UML object notation.
derived attribute
An attribute that does not physically exist within the entity and is derived via an algorithm. For example, the Age attribute might be derived by subtracting the birth date from the current date.
nonprime attribute
An attribute that is not part of a key.
dependent
An attribute whose value is determined by another attribute.
network model
An early data model that represented data as a collection of record types in 1:M relationships.
repeating group
In a relation, a characteristic describing a group of multiple entries of the same type for a single key attribute occurrence. For example, a car can have multiple colors for its top, interior, bottom, trim, and so on.
data integrity
In a relational database, a condition in which the data in the database complies with all entity and referential integrity constraints.
master data management (MDM)
In business intelligence, a collection of concepts, techniques, and processes for the proper identification, definition, and management of data elements within an organization.
dashboards
In business intelligence, a web-based system that presents key business performance indicators or information in a single, integrated view with clear and concise graphics.
disaster recovery
Involves rebuilding an entire information system or significant parts of one after a catastrophic natural disaster such as a hurricane, tornado, earthquake, building collapse, or even a major fire.
SQL
Is a comprehensive database management language which incorporates DML and DDL
Foreign Key
Is a field (or collection of fields) in one table that uniquely identifies a row of another table.
logical view
Is a mapping onto a physical table or tables that allows an end user to access only a specified portion of data.
Data normalization
Is a methodology for organizing attributes into tables so that redundancy among the non-key attributes is eliminated.
Candidate Key
Is any column or a combination of columns that can qualify as unique key in database.
AND operator
It displays a record if more than one condition is true.
Unary Relationship
It is an associate occurrence of an entity type with other occurrences of the same entity type.
Clustering
It is the task of taking a large collection of entities and dividing that collection into smaller groups of entities that exhibit some similarity.
JOIN clause
It is used to combine rows from more than one table, based on a common field between them.
DISTINCT operator
It is used to eliminate duplicate rows in a query result.
What are the fundamental aspects of a Data Warehouse?
- Centralized repository of information. - Organized around relevant subject areas. - Provides platform for queries. - Used for analysis and not transactional processing. - Data is nonvolatile. - Target location for integrating data from multiple sources.
SELECT command
- Data retrieval in SQL is accomplished with the SELECT command.
DROP TABLE command
- Discards a table from a database.
What are the four major DBMS approaches?
- Hierarchical - Network - Relational - Object-oriented
What is Controller Card RAID? Efficient or inefficient?
- Host-based RAID implementation in which a specialized RAID controller is installed in the host, and disk drives are connected to it. - Not an efficient solution in a data center environment with a large number of hosts.
Candidate Key
- If a relation has more than one attribute or minimum group of attributes that represents a way of uniquely identifying the entities, then they are each called a candidate key.
Define the delete rule SET-TO-NULL...
- If the delete rule between the two relations is SET-TO-NULL and an attempt is made to delete a record on the "one side" of the one-to-many relationship, that record will be deleted and the matching foreign key values in the records on the "many side" of the relationship will be set to null.
IN operator
- It allows you to specify a list of character strings to be included in a search
BETWEEN operator
- It allows you to specify a range of numeric values in a search.
AND operator
- It displays a record if more than one condition is true
DISTINCT operator
- It is used to eliminate duplicate rows in a query result.
AVG() function
- It returns the average value of a numeric column.
ORDER BY clause
- It simply takes the result of a SQL query and orders them by one or more specified attributes.
Describe the ER model for "(L to R) A salesperson is authorized to sell many products; (R to L) a product can be sold by many salespersons." - What is the name of this relationship?
- Many-to-many binary relationship
router
(1) An intelligent device used to connect dissimilar networks. (2) Hardware/software equipment that connects multiple and diverse networks.
entity set
A collection of like entities.
relational database management system (RDBMS)
A collection of programs that manages a relational database. The RDBMS software translates a user's logical requests (queries) into commands that physically locate and retrieve the requested data.
record
A collection of related (logically connected) fields.
class
A collection of similar objects with shared structure (attributes) and behavior (methods). A class encapsulates an object's data representation and a method's implementation. Classes are organized in a class hierarchy.
Business intellegence (BI)
A comprehensive, cohesive, and integrated set of tools and processes used to capture, collect, integrate, store, and analyze data with the purpose of generating and presenting information to support business decision making.
referential integrity
A condition by which a dependent table's foreign key must have either a null entry or a matching entry in the related table.
hardware independence
A condition in which a model does not depend on the hardware used in the model's implementation. Therefore, changes in the hardware will have no effect on the database design at the conceptual level.
transitive dependency
A condition in which an attribute is dependent on another attribute that is not part of the primary key.
partial dependency
A condition in which an attribute is dependent on only a portion (subset) of the primary key.
full functional dependence
A condition in which an attribute is functionally dependent on a composite key but not on any subset of the key.
data independance
A condition in which data access is unaffected by changes in the physical data storage characteristics.
data inconsistency
A condition in which different versions of the same data yield different (inconsistent) results.
data anomaly
A data abnormality in which inconsistent changes have been made to a database. For example, an employee moves, but the address change is not corrected in all files in the database.
structural dependence
A data characteristic in which a change in the database schema affects data access, thus requiring changes in all access programs.
key-value
A data model based on a structure composed of two data elements: a key and a value, in which every key has a corresponding value or set of values. The key-value data model is also called the associative or attribute-value data model.
ERM
A data model that describes relationships (1:1, 1:M, and M:N) among entities at the conceptual level with the help of ER diagrams. The model was developed by Peter Chen.
entity
A person, place, thing, concept, or event for which data can be stored. See also attribute.
Structured Query Language (SQL)
A powerful and flexible relational database language composed of commands that enable users to create database and table structures, perform various types of data manipulation and data administration, and query the database to extract useful information.
iterative process
A process based on repetition of steps and procedures.
denormalization
A process by which a table is changed from a higher-level normal form to a lower-level normal form, usually to increase processing speed. Denormalization potentially yields data anomalies.
normalization
A process that assigns attributes to entities so that data redundancies are reduced or eliminated.
data management
A process that focuses on data collection, storage, and retrieval. Common data management functions include addition, deletion, modification, and listing.
Write the SQL query to "Add a new salesperson into the SALESPERSON table whose salesperson number is 489, name is Quinlan, commission percentage is 15, year of hire is 2011, and department number is 59.":
INSERT INTO SALESPERSON VALUES ('489','Quinlan',15,'2011','59');
What is the correct syntax of the INSERT command?
INSERT INTO table_name VALUES (value1,value2,value3,...):
Candidate key
If a relation has more than one attribute or minimum group of attributes that represents a way of uniquely identifying the entities, then they are each called a candidate key.
optional participation
In ER modeling, a condition in which one entity occurrence does not require a corresponding entity occurrence in a particular relationship.
composite identifier
In ER modeling, a key composed of more than one attribute.
optional attribute
In ER modeling, an attribute that does not require a value; therefore, it can be left empty.
required attribute
In ER modeling, an attribute that must have a value. In other words, it cannot be left empty.
EXISTS
In SQL, a comparison operator that checks whether a subquery returns any rows.
existence-independent
A property of an entity that can exist apart from one or more related entities. Such a table must be created first when referencing an existence-dependent table.
existence-dependent
A property of an entity whose existence depends on one or more other entities. In such an environment, the existence-independent table must be created and loaded first because the existence-dependent key cannot reference a table that does not yet exist.
cardinality
A property that assigns a specific value to connectivity and expresses the range of allowed entity occurrences associated with a single occurrence of the related entity.
subquery
A query that is embedded (or nested) inside another query. Also known as a nested query or an inner query.
inner query
A query that is embedded or nested inside another query. Also known as a nested query or a subquery.
query
A question or task asked by an end user of a database in the form of SQL code. A specific request for data manipulation issued by the end user or the application to the DBMS.
Functional Dependency
A relationship between attributes in which one attribute (or group of attributes) determines the value of another attribute in the same table Example The price of one delicious Girl Scout cookie can determine the price of a box of 12 cookies (CookiePrice, Qty) -------> BoxPrice Functional Dependency = BoxPrice
recursive relationship
A relationship found within a single entity type. For example, an EMPLOYEE is married to an EMPLOYEE or a PART is a component of another PART.
mandatory participation
A relationship in which one entity occurrence must have a corresponding occurrence in another entity. For example, an EMPLOYEE works in a DIVISION. (A person cannot be an employee without being assigned to a company's division.)
non-identifying relationship
A relationship in which the primary key of the related entity does not contain a primary key component of the parent entity.
weak relationship
A relationship in which the primary key of the related entity does not contain a primary key component of the parent entity.
strong (identifying) relationship
A relationship that occurs when two entities are existence-dependent; from a database design perspective, this relationship exists whenever the primary key of the related entity contains the primary key of the parent entity.
dependency diagram
A representation of all data dependencies (primary key, partial, or transitive) within a table.
internal schema
A representation of an internal model using the database constructs supported by the chosen database.
conceptual schema
A representation of the conceptual model, usually expressed graphically. See also conceptual model.
Crow's Foot notation
A representation of the entity relationship diagram that uses a three-pronged symbol to represent the "many" sides of the relationship.
data model
A representation, usually graphic, of a complex "real-world" data structure. Data models are used in the database design phase of the Database Life Cycle.
constraint
A restriction placed on data, usually expressed in the form of rules. For example, "A student's GPA must be between 0.00 and 4.00." Constraints are important because they help to ensure data integrity.
entity occurrence
A row in a relational table. Also known as entity instance.
entity instance
A row in a relational table. Also known as entity occurrence.
desktop database
A single-user database that runs on a personal computer.
unary relationship
An ER term used to describe an association within an entity. For example, an EMPLOYEE might manage another EMPLOYEE.
object
An abstract representation of a real world entity that has a unique identity, embedded properties, and the ability to interact with other objects and itself.
field
An alphabetic or numeric character or group of characters that defines a characteristic of a person, place, or thing. For example, a person's Social Security number, address, phone number, and bank balance all constitute fields.
alias
An alternative name for a column or table in a SQL statement.
decision support system (DSS)
An arrangement of computerized tools used to assist managerial decision making within a business.
relationship
An association between entities.
Associative Entity
An associative entity is an element of the entity-relationship model. All relationships for the associative entity should be many.
multidimensional online analytical processing (MOLAP)
An extension of online analytical processing to multidimensional database management systems.
unique index
An index in which the index key can have only one associated pointer value (row).
data warehouse
An integrated, subject-oriented, time-variant, nonvolatile collection of data in a specialized database that stores historical and aggregated data in a format that provides support for decision making.
complex object
An object formed by several different objects in complex relationships. See also abstract data types.
MapReduce
An open-source application programming interface (API) that provides fast data analytics services; one of the main Big Data technologies that allows organizations to process massive data stores.
index
An ordered array of index key values and row ID values (pointers). Indexes are generally used to speed up and facilitate data retrieval. Also known as an index key.
International Organization for Standardization (ISO)
An organization formed to develop standards for diverse network systems.
database system
An organization of components that defines and regulates the collection, storage, management, and use of data in a database environment.
Heap Files
An unsorted set of records
Heap File
An unsorted set of records.
relational online analytical processing (ROLAP)
Analytical processing functions that use relational databases and familiar relational query tools to store and analyze multidimensional data.
rules of precedence
Basic algebraic rules that specify the order in which operations are performed. For example, operations within parentheses are executed first, so in the equation 2 + (3 5), the multiplication portion is calculated first, making the correct answer 17.
What is the correct syntax of the CREATE TABLE command?
CREATE TABLE table_name (column_name1 data_type(size), column_name2 data_type(size),...);
What is the correct syntax of the CREATE VIEW command?
CREATE VIEW view_name AS SELECT column_name(s) FROM table_name WHERE condition
Cardinality
Cardinality represents the maximum number of entities that can be involved in a particular relationship.
What data mining activity is the process of organizing data into predefined classes?
Classification
What activity groups data members that are similar?
Clustering
Equijoin
Combines two or more tables based on a column that is common to the tables.
Write the SQL query to "Delete the row for salesperson 186 from the SALESPERSON table.":
DELETE FROM SALESPERSON WHERE SPNUM = '186';
metadata
Data about data; that is, data about data characteristics and relationships. See also data dictionary.
Association Rules
Descriptive; discovers links or associations amongst data.
Sequence Discovery
Descriptive; discovers sequential patterns.
Clustering
Descriptive; groups similar data together into clusters.
Summarization Rules
Descriptive; maps data into subsets with associated simple descriptions or generalizations.
Response time
Is the delay from the time that the Enter Key is pressed to execute a query until the result appears on screen.
Throughput
Is the measure of how many queries from simultaneous users must be satisfied in a given period of time by the application set and the database that it supports.
IN operator
It allows you to specify a list of character strings to be included in a search.
BETWEEN operator
It allows you to specify a range of numeric values in a search.
LIKE operator
It allows you to specify partial character strings in a "wildcard" sense.
OR operator
It displays a record if either the first condition OR the second condition is true.
* in the SELECT clause
It indicates that all attributes of the selected row are to be retrieved.
Modality
It is a minimum number of entity occurrences that can be involved in a relationship.
Estimation
It is a process of assigning some continuously valued numeric value to an object.
Affinity grouping
It is a process of evaluating relationships or associations between data elements.
Binary Relationship
It is a relationship between two entity types.
Structured Query Language
It is a standard language for data management in relational databases, known as Structured Query Language or SQL.
Primary key
It is an attribute or group of attributes whose values are unique throughout all ***rows*** of the re
Intersection Data
It is data that describes a many-to-many relationship.
Data definition
It is operationalized with a data definition language (DDL), involves instructing the DBMS software on what tables will be in the database, what attributes will be in the tables, which attributes will be indexed, and so forth.
Cardinality
It is the maximum number of entities that can be involved in a particular relationship.
Description
It is the process of trying to characterize what has been discovered or trying to explain the results of the data mining process.
Unique identifier
It is used to uniquely identify each record in a database table.
E-R Model
It is well named, as it diagrams entities (together with their attributes) and the relationship among them.
One-to-One Binary Relationship
It means that a single occurrence of one entity type can be associated with a single occurrence of the other entity type and vice versa.
One-to-one Binary Relationship
It means that a single occurrence of one entity type can be associated with a single occurrence of the other entity type and vice versa.
AVG() function
It returns the average value of a numeric column.
ORDER BY clause
It simply takes the results of a SQL query and orders them by one or more specified attributes.
Join
Joins 2 tables together
Modality
Modality represents the minimum number of entity occurrences that can be involved in a relationship.
subquery
One SELECT statement is "nested" within another.
client node
One of three types of nodes used in the Hadoop Distributed File System (HDFS). The client node acts as the interface between the user application and the HDFS. See also name node and data node.
data node
One of three types of nodes used in the Hadoop Distributed File System (HDFS). The data node stores fixed-size data blocks (that could be replicated to other data nodes). See also client node and name node.
name node
One of three types of nodes used in the Hadoop Distributed File System (HDFS). The name node stores all the metadata about the file system. See also client node and data node.
key
One or more attributes that determine other attributes. See also superkey, candidate key, primary key (PK), secondary key, and foreign key.
identifiers
One or more attributes that uniquely identify each entity instance.
What is the key factor based on the need for linear scalability?
Performance
active-active array
- high-end storage system - large number of controllers and cache memory - large storage capacity - large amounts of cache - fault tolerance - ability to handle large amounts of concurrent i/os
What are characteristics of cloud computing?
- on-demand self-service - broad network access - resource pooling - rapid elasticity - measured service
What are the layers of the cloud computing infrastructure?
- physical - virtual - applications and platform software - cloud management and service creation tools
Benefits of cloud computing?
- reduced it cost - business agility - flexible scaling - high availability
Front-end controller
- routes data to and from cache via the internal data bus
Cache
- semiconductor memory where data is placed temporarily to reduce the time required to service i/o requests from the host.
RAID 4
- similar to RAID 3, but data is not striped on the parity disk - good read throughput and reasonable write throughput
RAID 3
- stripes data for performance and uses parity for fault tolerance - good option for applications that involve large sequential data access, such as data backup or video streaming
What are the three RAID techniques?
- striping: spreading of data across multiple drives to use the drives in parallel. - mirroring: the same data is stored on two different disk drives, yielding two copies of the data. - parity: redundancy technique that ensures protection of data w/o maintaining a full set of duplicate data.
RAID 5
- versatile - similar to RAID 4, but the parity is distributed across all disks to overcome the write bottleneck in RAID 4 - good for read-intensive i/o applications, preferred for messaging, data mining, and relational database management systems.
relational model
Developed by E. F. Codd of IBM in 1970, the relational model is based on mathematical set theory and represents data as independent relations. Each relation (table) is conceptually represented as a two dimensional structure of intersecting rows and columns. The relations are related to each other through the sharing of common entity characteristics (values in columns).
ad hoc query
A "spur-of-the-moment" question.
object/relational database management system (O/R DBMS)
A DBMS based on the extended relational model (ERDM). The ERDM, championed by many relational database researchers, constitutes the relational model's response to the OODM. This model includes many of the object-oriented model's best features within an inherently simpler relational database structure.
data dictionary
A DBMS component that stores metadata—data about data. Thus, the data dictionary contains the data definition as well as their characteristics and relationships. A data dictionary may also include data that are external to the DBMS. Also known as an information resource dictionary. See also active data dictionary, metadata, and passive data dictionary.
Write the SQL query to "List the NAMES of the products of which salesperson Adams has sold more than 2000 units":
SELECT PRODNAME FROM SALESPERSON, PRODUCT, SALES WHERE SALESPERSON.SPNUM=SALES.SPNUM AND SALES.PRODNUM=PRODUCT.PRODNUM AND SPNAME='Adams' AND QUANTITY>2000;
Write the SQL query to "Find the name of the salesperson responsible for Customer Number 1525":
SELECT SPNAME FROM SALESPERSON, CUSTOMER WHERE SALESPERSON.SPNUM=CUSTOMER.SPNUM AND CUSTNUM=1525;
Write the SQL query to "List the salesperson numbers and salesperson names of those salespersons who have a commission percentage of 10.":
SELECT SPNUM, SPNAME FROM SALESPERSON WHERE COMMPERCT=10;
Write the SQL query to "List the salesperson numbers, salesperson names, and commission percentages of the salespersons whose commission percentage is less than 12.":
SELECT SPNUM, SPNAME, COMMPERCT FROM SALESPERSON WHERE COMMPERCT<12;
Using the Dimensional Model for Business Intelligence
Simplicity Lack of bias. Extensibility
application programming interface (API)
Software through which programmers interact with middleware. An API allows the use of generic SQL code, thereby allowing client processes to be database server-independent.
flags
Special codes implemented by designers to trigger a required response, alert end users to specified conditions, or encode values. Flags may be used to prevent nulls by bringing attention to the absence of a value in a table.
ETL Stages - Loading
Target dependencies Refresh volume and frequency
embedded mode
The SELECT command is embedded within the lines of a higher-level language program and functions as an input or "read" statement for the program.
COMMIT
The SQL command that permanently writes data changes to a database.
ALTER TABLE
The SQL command used to make changes to table structure. When the command is followed by a keyword (ADD or MODIFY), it adds a column or changes column characteristics.
OR
The SQL logical operator used to link multiple conditional expressions in a WHERE or HAVING clause. It requires only one of the conditional expressions to be true.
AND
The SQL logical operator used to link multiple conditional expressions in a WHERE or HAVING clause. It requires that all conditional expressions evaluate to true.
slice and dice
The ability to focus on slices of a data cube (drill down or roll up) to perform a more detailed analysis.
external model
The application programmer's view of the data environment. Given its business focus, an external model works with a data subset of the global database schema.
Determinants
The attribute (or attributes)that we use as the starting point (the variable on the left side of the question) is called a determinant Example (cookiePrice, Qty) -------> BoxPrice Determinant= cookiePrice
first normal form (1NF)
The first stage in the normalization process. It describes a relation depicted in tabular format, with no repeating groups and a primary key identified. All nonkey attributes in the relation are dependent on the primary key.
American National Standards Institute (ANSI)
The group that accepted the DBTG recommendations and augmented database standards in 1975 through its SPARC committee.
intersection data
describes the relationship between the two entities(Quantity)
OLAP cube
environment provides an aggregate view of data variables across the dimensions across each dimension's hierarchy.
referential integrity
problem because it revolves around the circumstance of trying to refer to data in one relation in the database, based on values in another relation.
unique identifier
primary key
Column also called
Field or Attribute
Table also called
File or Relation
Why a Business Intelligence Program?
Financial value associated with increased profitability, whether derived from lowered costs or increased revenues; Productivity value associated with increased throughput with decreased workloads, diminished time for executing end-to-end processes (such as manufacturing or operational workflows), and increasing the percentage of high quality products or outcomes; Trust value, such as greater customer, employee, or supplier satisfaction, as well as increasing confidence in forecasting, maintaining consistent operational and management reports, reductions in time spent in "analysis paralysis," and better results from decisions; and Risk value associated with improved visibility into credit exposure, confidence in capital and asset investments, and auditable compliance with jurisdictional and industry standards and regulations.
UNION
To create a result set that combines the results from several queries
drill down
To decompose data into more atomic components—that is, data at lower levels of aggregation. This approach is used primarily in a decision support system to focus on specific geographic areas, business types, and so on.
Third Normal Form
- Non-key attributes are not allowed to define other non-key attributes.
Describe the ER model for "A salesperson sells to many customers." - What is the name of this relationship?
- One-to-many binary relationship
Describe the ER model for "(L to R) A salesperson manages zero to many other salespersons." and "(R to L) A salesperson is managed by exactly one other salesperson." - What is the name of this model?
- One-to-many unary relationship
Describe the ER model for "A salesperson works in one office." - What is the name of this relationship?
- One-to-one binary relationship
Describe the ER model for "One salesperson backs-up another salesperson." - What is the name of this model?
- One-to-one unary relationship
What are the limitations of Software RAID?
- Performance: affects overall system performance. - Supported features: does not support all RAID levels. - O/S compatibility: inflexibility in the data-processing environment.
Attribute
- Property, characteristic, or fact that we know about an entity.
Nested RAID
- RAID 1+0 or RAID 0+1 - combine the performance benefits of RAID 0 with the redundancy benefits of RAID 1 - requires even number of disks with the minimum being 4 disks
What is the general theme of the ETL process?
1. Get the data 2. Map the data to staging area 3. Validate and clean the data 4. Apply necessary transformations 5. Map data to loading model 6. Move data to repository 7. Load data to warehouse
What are three important point about Third Normal Form?
1. It is completely free of redundancy 2. All foreign keys appear where needed to logically tie together related tables. 3. It is the same structure that would have been derived from a properly drawn entity-relationship diagram of the same business environment.
What are the five basic principles of The Database Concept?
1. The creation of a datacentric environment that is a significant company resource, which can be shared inside and outside the company. 2. The ability to achieve data integration while storing data in a non-redundant fashion. 3. The ability to store data representing entities involved in multiple relationships w/o introducing data redundancy. 4. Managing data control issues such as data security, backup and recovery, and concurrency control. 5. High degree of data independence.
Hadoop
A Java based, open source, high speed, fault-tolerant distributed storage and computational framework. Hadoop uses low-cost hardware to create clusters of thousands of computer nodes to store and process data.
AVG
A SQL aggregate function that outputs the mean average for a specified column or expression.
COUNT
A SQL aggregate function that outputs the number of rows containing not null values for a given column or expression, sometimes used in conjunction with the DISTINCT clause.
MAX
A SQL aggregate function that yields the maximum attribute value in a given column.
MIN
A SQL aggregate function that yields the minimum attribute value in a given column.
SUM
A SQL aggregate function that yields the sum of all values for a given column or expression.
WHERE
A SQL clause that adds conditional restrictions to a SELECT statement that limit the rows returned by the query.
ORDER BY
A SQL clause that is useful for ordering the output of a SELECT query (for example, in ascending or descending order).
DISTINCT
A SQL clause that produces only a list of values that are different from one another.
FROM
A SQL clause that specifies the table or tables from which data is to be retrieved.
GROUP BY
A SQL clause used to create frequency distributions when combined with any of the aggregate functions in a SELECT statement.
Fundamental aspects of a data warehouse
A data warehouse is a centralized repository of information. A data warehouse is organized around the relevant subject areas important to the organization. A data warehouse provides a platform for different consumers (both human and automated) to submit queries about enterprise information. A data warehouse is used for analysis and not for transaction processing. The data in a data warehouse is nonvolatile. A data warehouse is the target location for integrating data from multiple sources, both internal and external to an enterprise.
Operational Database
A database designed primarily to support a company's day-to-day operations. Also known as a transactional database, OLTP database, or production database.
analytical database
A database focused primarily on storing historical data and business metrics used for tactical or strategic decision making.
Database
A database is a collection of information that is organized so that it can easily be accessed, managed, and updated.
Centralized
A database located at a single site.
multidimensional database management systems (MDBMSs)
A database management system that uses proprietary techniques to store data in matrixlike arrays of n dimensions known as cubes.
XML database
A database system that stores and manages semistructured XML data.
general-purpose databases
A database that contains a wide variety of data used in multiple disciplines.
discipline-specific databases
A database that contains data focused on specific subject areas.
cloud database
A database that is created and maintained using cloud services, such as Microsoft Azure or Amazon AWS.
multiuser database
A database that supports multiple concurrent users.
relation
A logical construct perceived to be a two dimensional structure composed of intersecting rows (entities) and columns (attributes) that represents an entity set in the relational model.
table
A logical construct perceived to be a two dimensional structure composed of intersecting rows (entities) and columns (attributes) that represents an entity set in the relational model.
schema
A logical grouping of database objects, such as tables, indexes, views, and queries, that are related to each other. Usually, a schema belongs to a single user or application.
Extensible Markup Language (XML)
A meta-language used to represent and manipulate data elements. Unlike other markup languages, XML permits the manipulation of a document's data elements. XML facilitates the exchange of structured documents such as orders and invoices over the Internet.
candidate key
A minimal superkey; that is, a key that does not contain a subset of attributes that is itself a superkey. See key.
eventual consistency
A model for database consistency in which updates to the database will propagate through the system so that all data copies will be consistent eventually.
extended relational data model
A model that includes the object-oriented model's best features in an inherently simpler relational database structural environment. See extended entity relationship model (EERM).
Big Data
A movement to find new and better ways to manage large amounts of web-generated data and derive business insight from it, while simultaneously providing high performance and scalability at a reasonable cost.
composite key
A multiple-attribute key.
workgroup database
A multiuser database that usually supports fewer than 50 users or is used for a specific department in an organization.
file
A named collection of related records.
cascading order sequence
A nested ordering sequence for a set of rows, such as a list in which all last names are alphabetically ordered and, within the last names, all first names are ordered.
noSQL
A new generation of database management systems that is not based on the traditional relational database model.
query language
A nonprocedural language that is used by a DBMS to manipulate its data. An example of a query language is SQL.
set theory
A part of mathematical science that deals with sets, or groups of things, and is used as the basis for data manipulation in the relational model.
wildcard character
A symbol that can be used as a general substitute for: (1) all columns in a table (*) when used in an attribute list of a SELECT statement or, (2) zero or more characters in a SQL LIKE clause condition ( % and _ ).
third normal form (3NF)
A table is in 3NF when it is in 2NF and no nonkey attribute is functionally dependent on another nonkey attribute; that is, it cannot include transitive dependencies.
fourth normal form (4NF)
A table is in 4NF if it is in 3NF and contains no multiple independent sets of multivalued dependencies.
Ternary Relationship
A ternary relationship is when three entities participate in the relationship.
Unique Identifier
A unique identifier is any identifier which is guaranteed to be unique among all identifiers used for those objects and for a specific purpose.
hub
A warehouse of data packets housed in a central location on a local area network. It contains multiple ports that copy the data in the data packets to make it accessible to selected or all segments of the network.
data visualization
Abstracting data to provide information in a visual format that enhances the user's ability to effectively comprehend the meaning of the data.
performance tuning
Activities that make a database perform more efficiently in terms of storage and access speed.
What is used for populating summaries or any cube dimensions that can be performed at the staging area (ETL)?
Aggregation
E-R model
An E-R model is a data model for describing a database in an abstract way.
binary relationship
An ER term for an association (relationship) between two entities. For example, PROFESSOR teaches CLASS.
participants
An ER term for entities that participate in a relationship. For example, in the relationship "PROFESSOR teaches CLASS," the teaches relationship is based on the participants PROFESSOR and CLASS.
determinant
Any attribute in a specific row whose value directly determines other values in that row. See also Boyce-Codd normal form (BCNF).
client
Any process that requests specific services from server processes in a client/server environment.
many-to-many (M:N or *..*) relationship
Association among two or more entities in which one occurrence of an entity is associated with many occurrences of a related entity and one occurrence of the related entity is associated with many occurrences of the first entity.
one-to-one (1:1 or 1..1) relationship
Associations among two or more entities that are used by data models. In a 1:1 relationship, one entity instance is associated with only one instance of the related entity.
one-to-many (1:M or 1..*) relationship
Associations among two or more entities that are used by data models. In a 1:M relationship, one entity instance is associated with many instances of the related entity.
Which entity is uniquely identified by concatenating the primary keys of the two entities it connects?
Associative entity
Summarization Rules
Descriptive - maps data into subsets with associated simple descriptions or generalizations.
object-oriented database management system (OODBMS)
Data management software used to manage data in an object-oriented database model.
DML
Data manipulation language refers to the four basic operations that can and must be performed on data stored in any DBMS: data retrieval, data update, insertion of new records, and deletion of existing records.
What is called a decomposition process?
Data normalization
SELECT command
Data retrieval in SQL is accomplished with the SELECT command.
unstructured data
Data that exists in its original, raw state; that is, in the format in which it was collected.
semistructured data
Data that has already been processed to some extent.
structured data
Data that has been formatted to facilitate storage, use, and information generation.
ETL Stages - Transformation
Data type conversion Data cleansing. Integration. Referential integrity checking. Derivations Denormalization and renormalization. Aggregation Audit information. Null conversion.
abstract data type (ADT)
Data type that describes a set of similar objects with shared and encapsulated data representation and methods. An abstract data type is generally used to describe complex objects. See also class.
very large databases (VLDBs)
Database that contains huge amounts of data—gigabyte, terabyte, and petabyte ranges are not unusual.
Definer
Definer is a MySQL term where AuthID is the same for another DBMS
Extraction/Transformation/Loading
ETL is the process of extracting raw data and then transforming and loading into a target to be used with Business intelligence.
referential integrity
Enforces rules to guarantee that the foreign key relationship stays intact with no mismatches.
balancing
Ensuring that the processing load is distributed evenly among multiple servers.
What data mining activity is a process of assigning some continuously valued numeric value to an object?
Estimation
data redundancy
Exists when the same data is stored unnecessarily at different places.
Data
Facts gathered together for analysis
cube cache
In multidimensional OLAP, the shared, reserved memory area where data cubes are held. Using the cube cache assists in speeding up data access.
sparsity
In multidimensional data analysis, a measurement of the data density held in the data cube.
portals
In terms of business intelligence, a unified, single point of entry for information distribution.
availability
In the context of data security, it refers to the accessibility of data whenever required by authorized users and for authorized purposes.
segment (SEGM)
In the hierarchical data model, the equivalent of a file system's record type.
method
In the object-oriented data model, a named set of instructions to perform an action. Methods represent real-world actions, and are invoked through messages.
inheritance
In the object-oriented data model, the ability of an object to inherit the data structure and methods of the classes above it in the class hierarchy. See also class hierarchy.
islands of information
In the old file system environment, pools of independent, often duplicated, and inconsistent data created and managed by different departments.
tuple
In the relational model, a table row.
linking table
In the relational model, a table that implements an M:M relationship. See also composite entity.
primary key (PK)
In the relational model, an identifier composed of one or more attributes that uniquely identifies a row. Also, a candidate key selected as a unique entity identifier. See also key.
Unstructured Data
Information that does not have structure (such as text)
Structured Data
Information with a high degree of organization
Structured Data
Information with a high degree of organization.
What includes exploiting the discovery of table and foreign keys for representing linkage between different tables?
Integration
Intersection Data
Intersection Data associated with the concatenation of two segments.
Ternary Relationship
Involves three different entity types.
Time Series Analysis
Predictive - analysis information over time to predict future data
Classification
Predictive - maps data into predefined groups or classes
Regression
Predictive - used to map a data item to a real valued prediction variable
Classification
Predictive; maps data into predefined groups or classes.
Regression
Predictive; used to map a data item to a real valued prediction variable.
data
Raw facts, or facts that have not yet been processed to reveal their meaning to the end user.
Row also called
Record or Tuple
What is the correct syntax of the UPDATE command?
UPDATE table_name SET column1=value1,column2=value2,... WHERE some_column=some_value;
Referential Integrity
Referential integrity is a database concept that ensures that relationships between tables remain consistent.
website
Refers to the Web server and the collection of Web pages stored on the local hard disk of the server computer.
column subquery
Returns a single column of one or more values.
row subquery
Returns a single row of one or more values.
table subquery
Returns a table of one or more rows of one or more columns.
Write the SQL query to "List the customer records for those customers whose names begin with the letter 'A' ":
SELECT * FROM CUSTOMER WHERE CUSTNAME LIKE 'A%';
Write the SQL query to "Retrieve the entire record for salesperson 186":
SELECT * FROM SALESPERSON WHERE SPNUM=186;
Write the SQL query to "Find the average number of units of the different products that Salesperson 137 has sold":
SELECT AVG(QUANTITY) FROM SALES WHERE SPNUM=137;
Write the SQL query to "Find the commission percentage and year of hire of salesperson 186":
SELECT COMMPERCT, YEARHIRE FROM SALESPERSON WHERE SPNUM=186;
Write the SQL query to "Find the customer numbers, customer names, and headquarters cities of those customers with the customer numbers greater than 1000. List the results in alphabetic order by headquarters cities (and have the customer names within the same city alphabetized)":
SELECT CUSTNUM, CUSTNAME, HQCITY FROM CUSTOMER WHERE CUSTNUM>1000 ORDER BY HQCITY, CUSTNAME;
index records in the tree
The index records contain salesperson number key values copied from the salesperson records. Each key value in the tree is associated with a pointer that is the address of either a lower-level index record or a cylinder containing the salesperson records. Each index record, at every level of the tree, contains space for the same number of key value/pointer pairs (four in this example). This index record capacity is arbitrary, but once it is set, it must be the same for every index record at every level of the index. Each index record is at least half full (in this example each record actually contains at least two key value/pointer pairs).
data definition language (DDL)
The language that allows a database administrator to define the database structure, schema, and subschema.
granularity
The level of detail represented by the values stored in a table's row. Data stored at its lowest level of granularity is said to be atomic data.
data cube
The multidimensional data structure used to store and manipulate data in a multidimensional DBMS. The location of each data value in the data cube is based on its x-, y-, and z-axes. Data cubes are static, meaning they must be created before they are used, so they cannot be created by an ad hoc query.
relationship degree
The number of entities or participants associated with a relationship. A relationship degree can be unary, binary, ternary, or higher.
relational schema
The organization of a relational database as described by the database administrator.
class hierarchy
The organization of classes in a hierarchical tree in which each parent class is a superclass and each child class is a subclass. See also inheritance.
conceptual model
The output of the conceptual design process. The conceptual model provides a global view of an entire database and describes the main data objects, avoiding details.
Enterprise Database
The overall company data representation, which provides support for present and expected future needs.
data processing (DP) specialist
The person responsible for developing and managing a computerized file processing system.
subschema
The portion of the database that interacts with application programs.
data manipulation language
The set of commands that allows an end user to manipulate the data in the database, such as SELECT, INSERT, UPDATE, DELETE, COMMIT, and ROLLBACK.
class diagram notation
The set of symbols used in the creation of class diagrams in UML object modeling.
external schema
The specific representation of an external view; the end user's view of the data environment.
Prediction
The subtle difference between prediction and the previous two tasks is that prediction is the attempt to classify objects according to some expected future behavior. Classification and estimation can be used for the purposes of prediction by using historical data, where the classification is already known, to build a model (this is called training). That model can then be applied to new data to predict future behavior. You must be careful when using training sets for prediction. There may be a risk of an inherent bias in the data that may lead you to draw inferences or conclusions that are relevant in relation to the bias. Use different data sets for training and test, test, test!
atomicity
The transaction property that requires all parts of a transaction to be treated as a single, indivisible, logical unit of work. All parts of a transaction must be completed or the entire transaction is aborted.
Information
The transformation of raw data into useful facts
Information
The transformation of raw data into useful facts.
Business Intelligence
The transformation of raw data into useful information.
connectivity
The type of relationship between entities. Classifications include 1:1, 1:M, and M:N.
physical data format
The way a computer "sees" (stores) data.
logical data format
The way a person views data within the context of a problem domain.
Data management
There are two aspects of data management: data definition and data manipulation.
3 Vs
Three basic characteristics of Big Data databases: volume, velocity, and variety.
Unary Relationships
Unary relationships associate occurrences of an entity type with other occurrences of the same entity type.
Data Volume Assessment
Understanding of how much data will be in a database or a table within a database
Primary Key
Uniquely identifies each record in the table.
predicate logic
Used extensively in mathematics to provide a framework in which an assertion (statement of fact) can be verified as either true or false.
Index
Used to help a DBMS find data quickly
Cartesian product
Usually the result of a missing join condition or a method of expanding the data of 1 table by the number of rows in the second table.
Which type of entity is also called a dependent entity?
Weak entity
social media
Web and mobile technologies that enable "anywhere, anytime, always on" human interactions.
ETL Stages - Extraction
What data should be extracted? How should that data be extracted?
data encryption
When data is encrypted, it is changed, bit by bit or character by character, into a form that looks totally garbled. It can and must be reconverted, or decrypted, back to its original form to be of use.
Subquery
When one SELECT statement is "nested" within another in a format, it is known as subquery.
Cascade Delete
Will delete all records that reference the primary key
Restrict Delete
Will not allow deletes if the primary key is referenced
Set-to-Null on Delete
Will set values to null when primary key is deleted
functional dependence
Within a relation R, an attribute B is functionally dependent on an attribute A if and only if a given value of attribute A determines exactly one value of attribute B. The relationship "B is dependent on A" is equivalent to "A determines B," and is written as A B.
World Wide Web (WWW or the web)
Worldwide network collection of specially formatted and interconnected documents known as Web pages. Also called the Web.
Flat Files
a file having no internal hierarchy
Affinity grouping
is a process of evaluating relationships or associations between data elements that demonstrate some kind of affinity between objects.
attribute
is a property or characteristic of an entity. Examples of attributes include an employee's employee number, the weight of an automobile, a company's address, or the date of a sales meeting.
entity
is an object or event in our environment that we want to keep track of. A person is an entity. So is a building, a piece of inventory sitting on a shelf, a finished product ready for sale, and a sales meeting (an event).
The difference between clustering and classification
is that during the clustering task, the classes are not defined beforehand.
scalar subquery
is the most restrictive subquery because it produces only a single value
data warehouse
is the primary source of information that feeds the analytical processing within an organization.
cardinalities
maximums
modalities
minimums
role
n Oracle, a named collection of database access privileges that authorize a user to connect to a database and use its system resources.
IS NULL
n SQL, a comparison operator used to check whether an attribute has a value.
dimensional model
standard for representing and managing data in a data warehouse
referential integrity problems can surface in any of the three operations
that result in changes to the database—insert, delete, and update records
NOT
A SQL logical operator that negates a given predicate.
Binary relationship
A binary relationship is a relationship between two entity types.
roll up
(1) To aggregate data into summarized components, that is, higher levels of aggregation. (2) In SQL, an OLAP extension used with the GROUP BY clause to aggregate data by different dimensions. Rolling up the data is the exact opposite of drilling down the data.
Foreign key
(Not Unique identifer)If, in a collection of relations that make up a relational database, an attribute or group of attributes serves as the primary key of one relation and also appears in another relation, then it is called a foreign key in that other relation.
DELETE command
- Specify which row(s) of a table are to be deleted based on data values within those rows.
Boolean algebra
A branch of mathematics that uses the logical operators OR, AND, and NOT.
Common DDL commands:
- CREATE - ALTER - DROP - TRUNCATE - RENAME
What is the positioning and meaning for Cardinality and Modality on an ER model?
- Cardinality is the outer symbol; represents the maximum. - Modality is the inner symbol; represents the minimum.
DROP VIEW command
- Discards views.
What are the values of Business Intelligence?
- Financial value associated w/ increased profitability. - Productivity value associated with increased throughput. - Trust value (customer, employee, supplier satisfaction) as well as increased confidence in forecasting. - Risk value
Define the delete rule CASCADE...
- If the delete rule between two relations is CASCADE and an attempt is made to delete a record on the "one side" of the relationship, not only will the record be deleted but all of the records on the "many side" of the relationship that have a matching foreign key value will also be deleted. - In other words, the delete will "cascade" from one relation to the other.
Define the delete rule RESTRICT...
- If the delete rule between two relations is RESTRICT and an attempt is made to delete a record on the "one side" of the one-to-many relationship, the system will forbid the delete to take place if there are any matching foreign key values in the relation on the "many side".
LIKE operator
- It allows you to specify partial character strings in a "wildcard" sense.
OR operator
- It displays a record it either the first condition OR the second condition is true.
* in the SELECT clause
- It indicates that all attributes of the selected row are to be retrieved
JOIN clause
- It is used to combine rows from more than one table, based on a common field between them.
Describe the ER model for "(L to R) A product can be part of no other products or part of several other products." and "(R to L) A product can be composed of no other products or be composed of several other products." - What is the name of this model?
- Many-to-many unary relationship
Natural Join
- Matches each row in a table against each row in another table based on common values found in columns sharing a common name and data type.
What does 'refer' in Referential Integrity imply?
- Referential integrity revolves around the circumstance of trying to refer to data in one relation in the database, based on values in another relation.
Tuple
- Rows are referred to as tuples.
Common DML commands:
- SELECT - INSERT - UPDATE - DELETE - MERGE
What are the reasons for using the Dimensional Model for BI?
- Simplicity. - Lack of bias. - Extensibility.
CREATE VIEW command
- Specifies the base tables on which the view is to be based and the attributes and rows of the table that are to be included in the view.
What are four key differences between a RELATION and a FILE?
- The columns of a relation can be arranged in any order w/o affecting the meaning of the data. That is not true of a file. - Similarly, the rows of a relation can be arranged in any order, which is not true of a file. - Every row/column position, sometimes referred to as a "cell", can have only a single value, which is not necessarily true in a file. - No two rows of a relation are identical, which is not necessarily true in a file.
CREATE TABLE command
- The command that creates base tables and tells the system what attributes will be in them.
Normalization
- The process of organizing the fields and tables of a relational database to minimize redundancy (duplication) and dependency.
Second Normal Form
- The second step in normalizing a database. - Builds on the first normal form. - All non-key attributes must be fully functionally dependent on the entire key of that table.
Foreign Key
- When an attribute or group of attributes serves as the primary key of one relation and also appears in another relation.
Subquery
- When on SELECT statement is "nested" within another in a format, it is known as subquery.
Concurrency Problem
- When two or more users are trying to update the same record simultaneously.
active-passive arrays
- aka midrange storage system - provide optimal storage solutions at a lower cost - less storage capacity and cache than high-end storage - fewer front-end ports - ensure high redundancy and high performance for applications with predictable workloads
RAID 1
- based on the mirroring technique - consists of two drives and every write is written to both drives - good option for applications that require high availability and cost is no constraint
RAID 0
- configuration uses data striping - data is striped across all disks within the RAID set - uses the full storage capacity of a RAID set - good option for applications that need high i/o throughput
What are the four key components of an intelligent storage system?
- front end - cache - back end - physical disks
DROP TABLE
A SQL command used to delete database objects such as tables, views, indexes, and users.
UPDATE
A SQL command that allows attribute values to be changed in one or more rows of a table.
DELETE
A SQL command that allows data rows to be deleted from a table.
INSERT
A SQL command that allows the insertion of one or more data rows into a table.
CREATE TABLE
A SQL command that creates a table's structures using the characteristics and attributes given.
CREATE INDEX
A SQL command that creates indexes on the basis of a selected attribute or attributes.
ROLLBACK
A SQL command that restores the database table contents to the condition that existed after the last COMMIT statement.
DROP INDEX
A SQL command used to delete database objects such as tables, views, indexes, and users.
data dependence
A data condition in which data representation and manipulation are dependent on the physical data storage characteristics.
entity relationship (ER) model
A data model that describes relationships (1:1, 1:M, and M:N) among entities at the conceptual level with the help of ER diagrams. The model was developed by Peter Chen.
object-oriented data model (OODM)
A data model whose basic modeling structure is an object.
Flat Files
A file having no internal hierarchy
Hashed Files
A file that has been encrypted for security purposes
Hashed Files
A file that has been encrypted for security purposes.
Internet
A global network of computers connected together through a standard network protocol known as Transmission Control Protocol/Internet Protocol (TCP/IP). You can think of the Internet as the "highway" on which the data travel. The terms Internet and World Wide Web are often used interchangeably, but they are not synonyms.
relational diagram
A graphical representation of a relational database's entities, the attributes within those entities, and the relationships among the entities.
Hadoop Distributed File System (HDFS)
A highly distributed, fault-tolerant file storage system designed to manage large amounts of data at high speeds.
prime attribute
A key attribute; that is, an attribute that is part of a key or is the whole key. See also key attributes.
secondary key
A key used strictly for data retrieval purposes. For example, customers are not likely to know their customer number (primary key), but the combination of last name, first name, middle initial, and telephone number will probably match the appropriate table row. See also key.
Unified Modeling Language (UML)
A language based on object-oriented concepts that provides tools such as diagrams and symbols to graphically model a system.
versioning
A property of an OODBMS that allows the database to keep track of the different transformations performed on an object.
database
A shared, integrated computer structure that houses a collection of related data. A database contains two types of data: end-user data (raw facts) and metadata.
data mart
A small, single-subject data warehouse subset that provides decision support to a small group of people.
Boyce-Codd normal form (BCNF)
A special type of third normal form (3NF) in which every determinant is a candidate key. A table in BCNF must be in 3NF. See also determinant.
ternary relationship
An ER term used to describe an association (relationship) between three entities. For example, a DOCTOR prescribes a DRUG for a PATIENT.
Attribute
An attribute is a property or characteristic of an entity.
foreign key (FK)
An attribute or attributes in one table whose values must match the primary key in another table or whose values must be null. See key.
superkey
An attribute or attributes that uniquely identify each entity in a table. See key.
composite attribute
An attribute that can be further subdivided to yield additional attributes. For example, a phone number such as 615-898-2368 may be divided into an area code (615), an exchange number (898), and a four-digit code (2368). Compare to simple attribute.
multivalued attributes
An attribute that can have many values for a single entity occurrence. For example, an EMP_DEGREE attribute might store the string "BBA, MBA, PHD" to indicate three different degrees held.
single-valued attribute
An attribute that can have only one value.
atomic attribute
An attribute that cannot be further subdivided to produce meaningful components. For example, a person's last name attribute cannot be meaningfully subdivided.
simple attribute
An attribute that cannot be subdivided into meaningful components. Compare to composite attribute.
hierarchical model
An early database model whose basic concepts and characteristics formed the basis for subsequent database development. This model is based on an upside-down tree structure in which each record is called a segment. The top record is the root segment. Each segment has a 1:M relationship to the segment directly below it.
composite entity
An entity designed to transform an M:N relationship into two 1:M relationships. The composite entity's primary key comprises at least the primary keys of the entities that it connects. Also known as a bridge entity or associative entity. See also linking table.
Entity
An entity is an object or event in our environment that we want to keep track of.
weak entity
An entity that displays existence dependence and inherits the primary key of its parent entity. For example, a DEPENDENT requires the existence of an EMPLOYEE.
strong entity
An entity that is existence-independent, that is, it can exist apart from all of its related entities. Also called a regular entity.
DCL
Data control language is used to control access to data stored in a database.
DDL
Data definition language - involves instructing the DBMS software on what tables will be in the database, what attributes will be in the tables, which attributes will be indexed, and so forth.
ETL process
Get the data from the source location. Map the data from its original form into a data model that is suitable for manipulation at the staging area. Validate and clean the data. Apply any transformations to the data that are required before the data sets are loaded into the repository. Map the data from its staging area model to its loading model. Move the data set to the repository. Load the data into the warehouse.
factors that lead to redundant data across multiple files:
Data was stored in different formats in different files. Data was often not shared among different programs that needed it, necessitating the duplication of data in redundant files. Little was understood about file design, resulting in redundant data within individual files. Files often could not be rebuilt after damage by a software error or a hardware failure. Programs were usually written in such a manner that if the way that the data was stored changed, the program had to be modified to continue working. Changes in everything from access methods to tax tables required programming changes.
online analytical processing (OLAP)
Decision support system (DSS) tools that use multidimensional data analysis techniques. OLAP creates an advanced data analysis environment that supports decision making, business modeling, and operations research.
Data volatility
Describes how often stored data is updated.
Association Rules
Descriptive - discovers links or associations amongst data
Sequence Discovery
Descriptive - discovers sequential patterns
Clustering
Descriptive - groups similar data together into clusters
IN
In SQL, a comparison operator used to check whether a value is among a list of specified values.
LIKE
In SQL, a comparison operator used to check whether an attribute's text value matches a specified string pattern.
nested query
In SQL, a query that is embedded in another query. See subquery.
BETWEEN
In SQL, a special comparison operator used to check whether a value is within a range of specified values.
superclass
In a class hierarchy, the superclass is the more general classification from which the subclasses inherit data structures and behaviors.
dimension tables
In a data warehouse, tables used to search, filter, or classify facts within a star schema.
extraction, transformation, and loading (ETL)
In a data warehousing environment, the integrated processes of getting data from original sources into the data warehouse. ETL includes retrieving data from original data sources (extraction), manipulating the data into an appropriate form (transformation), and storing the data in the data warehouse (loading).
key performance indicators (KPIs)
In business intelligence, quantifiable numeric or scale-based measurements that assess a company's effectiveness or success in reaching strategic and operational goals. Examples of KPIs are product turnovers, sales by promotion, sales by employee, and earnings per share.
governance
In business intelligence, the methods for controlling and monitoring business health and promoting consistent decision making.
domain
In data modeling, the construct used to organize and describe an attribute's set of possible values.
internal model
In database modeling, a level of data abstraction that adapts the conceptual model to a specific DBMS model for implementation. The internal model is the representation of a database as "seen" by the DBMS. In other words, the internal model requires a designer to match the conceptual model's characteristics and constraints to those of the selected implementation model.
Write the SQL query to "List the customer numbers, customer names, and headquarters cities of the customers that are headquartered in New York and that have a customer number higher than 1500":
SELECT CUSTNUM, CUSTNAME, HQCITY FROM CUSTOMER WHERE HQCITY='New York' AND CUSTNUM>1500;
Write the SQL query to "List the customers, customer names, and headquarters cities of the customers that are headquartered in New York or that satisfy the two conditions of having a customer number higher than 1500 and being headquartered in Atlanta":
SELECT CUSTNUM, CUSTNAME, HQCITY FROM CUSTOMER WHERE HQCITY='New York' OR (CUSTNUM>1500 AND HQCITY='Atlanta');
Write the SQL query to "List the customer numbers, customer names, and headquarters cities of the customers that are headquartered in New York OR that have customer numbers higher than 1500":
SELECT CUSTNUM, CUSTNAME, HQCITY FROM CUSTOMER WHERE HQCITY='New York' OR CUSTNUM>1500;
Write the SQL query to "List the customer numbers and headquarters cities of all customers that have a customer number of at least 1700":
SELECT CUSTNUM, HQCITY FROM CUSTOMER WHERE CUSTNUM>=1700;
Write the SQL query to "Find the total number of units of all products that Salesperson 137 has sold":
SELECT SUM(QUANTITY) FROM SALES WHERE SPNUM=137;
Write the basic SQL query command:
SELECT<columns> FROM<table> WHERE<predicates identifying rows to be included>
In which of the normal forms should every non-key attribute be fully functionally dependent on the entire key of a table?
Second form
subclasses
See class hierarchy.
associative entity
See composite entity.
bridge entity
See composite entity.
Chen notation
See entity relationship (ER) model.
index key
See index.
nonkey attribute
See nonprime attribute.
online transaction processing (OLTP) database
See operational database.
production database
See operational database.
transactional database
See operational database.
regular entity
See strong entity.
data retrieved before database management systems
Sequentially from simple files
Inner Join
Shows row that have matches in both tables
outer join
Shows rows in one table that have no match in the other table. Two kinds of outer joins are left and right joins.
null
The absence of an attribute value. Note that a null is not a blank.
Denormalization
The act of duplicating data in a database for performance or the preservation of historical information
key attribute
The attributes that form a primary key. See also prime attribute.
knowledge
The body of information and facts about a specific subject. Knowledge implies familiarity, awareness, and understanding of information as it applies to an environment. A key characteristic is that new knowledge can be derived from old knowledge.
query result set
The collection of data rows returned by a query.
database management system (DBMS)
The collection of programs that manages the database structure and controls access to the data stored in the database.
query mode
The command goes directly to the relational DBMS, which evaluates the query and processes it against the database.
data modeling
The process of creating a specific data model for a determined problem domain.
database design
The process that yields the description of the database structure and determines the database components. The second phase of the Database Life Cycle.
entity integrity
The property of a relational table that guarantees each entity has a unique value in a primary key and that the key has no null values.
Information
The result of processing raw data to reveal its meaning. Information consists of transformed data and facilitates decision making.
determination
The role of a key. In the context of a database table, the statement "A determines B" indicates that knowing the value of attribute A means that the value of attribute B can be looked up.
second normal form (2NF)
The second stage in the normalization process, in which a relation is in 1NF and there are no partial dependencies (dependencies in only part of the primary key).