Data Management Test #1
ad hoc query
A "spur-of-the-moment" question.
data dictionary
A DBMS component that stores metadata (data about data). Thus, the data dictionary contains the data definitions as well as their characteristics and relationships. A data dictionary may also include data that are external to the DBMS. Also known as an information resource dictionary.
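As a rough illustration, most DBMSs expose their data dictionary through catalog tables. The sketch below assumes SQLite (via Python's standard sqlite3 module); the student table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (stu_id INTEGER PRIMARY KEY, stu_name TEXT NOT NULL)")

# sqlite_master is SQLite's built-in catalog table: it holds metadata, not end-user data.
catalog = conn.execute("SELECT type, name FROM sqlite_master").fetchall()

# PRAGMA table_info returns one row per column: (cid, name, type, notnull, default, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")]

print(catalog)   # the objects defined in the database
print(columns)   # each column's name and declared data type
```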
Hadoop
A Java-based, open-source, high-speed, fault-tolerant distributed storage and computational framework. Hadoop uses low-cost hardware to create clusters of thousands of computer nodes to store and process data.
entity set
A collection of like entities.
relational database management system (RDBMS)
A collection of programs that manages a relational database. The RDBMS software translates a user's logical requests (queries) into commands that physically locate and retrieve the requested data.
class
A collection of similar objects with shared structure (attributes) and behavior (methods). Encapsulates an object's data representation and a method's implementation.
data quality
A comprehensive approach to ensuring the accuracy, validity, and timeliness of data.
hardware independence
A condition in which a model does not depend on the hardware used in the model's implementation. Therefore, changes in the hardware will have no effect on the database design at the conceptual level.
data independence
A condition in which data access is unaffected by changes in the physical data storage characteristics.
data inconsistency
A condition in which different versions of the same data yield different (inconsistent) results.
logical independence
A condition in which the internal model can be changed without affecting the conceptual model. (The internal model is hardware-independent because it is unaffected by the computer on which the software is installed. Therefore, a change in storage devices or operating systems will not affect the internal model.)
physical independence
A condition in which the physical model can be changed without affecting the internal model.
data anomaly
A data abnormality that exists when inconsistent changes to a database have been made. For example, an employee moves, but the address change is made in only one file and not across all files in the database.
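A minimal sketch of the address example, using two hypothetical in-memory "files" (hr_file and payroll_file are illustrative names only) that store the same employee redundantly:

```python
# Two departmental files that each keep their own copy of the employee's address.
hr_file = {"E100": {"name": "Lee", "address": "12 Oak St"}}
payroll_file = {"E100": {"name": "Lee", "address": "12 Oak St"}}

# The employee moves, but the change is applied to only one file.
hr_file["E100"]["address"] = "9 Elm Ave"

# The same employee now has two different addresses: a data anomaly.
inconsistent = [emp for emp in hr_file
                if hr_file[emp]["address"] != payroll_file[emp]["address"]]
print(inconsistent)
```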
structural dependence
A data characteristic in which a change in the database schema affects data access, thus requiring changes in all access programs.
structural independence
A data characteristic in which changes in the database schema do not affect data access.
data dependence
A data condition in which data representation and manipulation are dependent on the physical data storage characteristics.
entity relationship (ER) model (ERM)
A data model that describes relationships (1:1, 1:M, and M:N) among entities at the conceptual level with the help of ER diagrams.
object-oriented data model (OODM)
A data model whose basic modeling structure is an object.
analytical database
A database focused primarily on storing historical data and business metrics used for tactical or strategic decision making.
XML database
A database system that stores and manages semistructured XML data.
general-purpose database
A database that contains a wide variety of data used in multiple disciplines.
discipline-specific database
A database that contains data focused on specific subject areas.
cloud database
A database that is created and maintained using cloud services, such as Microsoft Azure or Amazon AWS.
operational database, Online transaction processing (OLTP) database
A database that is designed primarily to support a company's day-to-day operations. Also known as a transactional database or production database.
multiuser database
A database that supports multiple concurrent users.
single-user database
A database that supports only one user at a time.
business rules
A description of a policy, procedure, or principle within an organization. For example, a pilot cannot be on duty for more than 10 hours during a 24-hour period, or a professor may teach up to four classes during a semester.
Entity Relationship Diagram (ERD)
A diagram that depicts an entity relationship model's entities, attributes, and relations.
class diagram
A diagram used to represent data and their relationships in UML object notation.
relational diagram
A graphical representation of a relational database's entities, the attributes within those entities, and the relationships among the entities.
Hadoop Distributed File System (HDFS)
A highly distributed, fault-tolerant file storage system designed to manage large amounts of data at high speeds.
Unified Modeling Language (UML)
A language based on object-oriented concepts that provides tools such as diagrams and symbols to graphically model a system.
table (relation)
A logical construct perceived to be a two-dimensional structure composed of intersecting rows (entities) and columns (attributes) that represents an entity set in the relational model.
record
A logically connected set of one or more fields that describes a person, place, or thing.
distributed database
A logically related database that is stored in two or more physically independent sites.
Extensible Markup Language (XML)
A metalanguage used to represent and manipulate data elements. Unlike other markup languages, XML permits the manipulation of a document's data elements. XML facilitates the exchange of structured documents such as orders and invoices over the Internet. XML uses semistructured and unstructured data.
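A short sketch of what "manipulating a document's data elements" can look like, using Python's standard xml.etree.ElementTree module. The invoice layout (invoice, line, qty, price, total) is made up for illustration and is not a standard format.

```python
import xml.etree.ElementTree as ET

# A hypothetical invoice exchanged as an XML document.
doc = """<invoice number="1001">
  <line product="widget" qty="2" price="9.50"/>
  <line product="gadget" qty="1" price="20.00"/>
</invoice>"""

root = ET.fromstring(doc)

# Read the data elements: quantity times price for every invoice line.
total = sum(int(line.get("qty")) * float(line.get("price"))
            for line in root.findall("line"))

# Manipulate the document: store the computed total back on the root element.
root.set("total", f"{total:.2f}")
print(ET.tostring(root, encoding="unicode"))
```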
physical model
A model in which physical characteristics such as location, path, and format are described for the data. The physical model is both hardware- and software-dependent.
extended relational data model (ERDM)
A model that includes the object-oriented model's best features in an inherently simpler relational database structural environment.
Big Data
A movement to find new and better ways to manage large amounts of web-generated data and derive business insight from it, while simultaneously providing high performance and scalability at a reasonable cost.
workgroup database
A multiuser database that usually supports fewer than 50 users or is used for a specific department in an organization.
NoSQL
A new generation of database management systems that is not based on the traditional relational database model.
query language
A nonprocedural language that is used by a DBMS to manipulate its data. An example of a query language is SQL.
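"Nonprocedural" means the user states what data is wanted, not how to find it. The sketch below contrasts the two styles, assuming SQLite through Python's sqlite3 module; the student table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, gpa REAL)")
conn.executemany("INSERT INTO student VALUES (?, ?)",
                 [("Ana", 3.9), ("Ben", 2.4), ("Cy", 3.5)])

# Nonprocedural: state WHAT is wanted; the DBMS decides how to locate the rows.
declarative = [row[0] for row in conn.execute(
    "SELECT name FROM student WHERE gpa >= 3.5 ORDER BY name")]

# Procedural equivalent: spell out HOW, one row at a time.
procedural = []
for name, gpa in conn.execute("SELECT name, gpa FROM student"):
    if gpa >= 3.5:
        procedural.append(name)
procedural.sort()

print(declarative, procedural)
```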
entity
A person, place, thing, concept, or event for which data can be stored.
Structured Query Language (SQL)
A powerful and flexible relational database language composed of commands that enable users to create database and table structures, perform various types of data manipulation and data administration, and query the database to extract useful information.
data management
A process that focuses on data collection, storage, and retrieval. Common data management functions include addition, deletion, modification, and listing.
software independence
A property of any model or application that does not depend on the software used to implement it.
query
A question or task asked by an end user of a database in the form of SQL code. A specific request for data manipulation issued by the end user or the application to the DBMS.
internal schema
A representation of an internal model using the database constructs supported by the chosen database.
conceptual schema
A representation of the conceptual model, usually expressed graphically. See also conceptual model.
Crow's Foot notation
A representation of the entity relationship diagram that uses a three-pronged symbol to represent the "many" sides of the relationship.
data model
A representation, usually graphic, of a complex "real-world" data structure. Data models are used in the database design phase of the Database Life Cycle.
constraint
A restriction placed on data, usually expressed in the form of rules. For example, "A student's GPA must be between 0.00 and 4.00." Constraints are important because they help to ensure data integrity.
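One way the GPA rule could be enforced is with a CHECK constraint. This is a sketch assuming SQLite through Python's sqlite3 module; the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        stu_id INTEGER PRIMARY KEY,
        gpa    REAL CHECK (gpa BETWEEN 0.00 AND 4.00)  -- the constraint
    )""")

conn.execute("INSERT INTO student (stu_id, gpa) VALUES (1, 3.2)")  # valid value

try:
    conn.execute("INSERT INTO student (stu_id, gpa) VALUES (2, 4.7)")  # out of range
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # the DBMS refused the row, protecting data integrity

row_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(rejected, row_count)
```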
Business Intelligence
A set of tools and processes used to capture, collect, integrate, store, and analyze data to support business decision making.
Online Analytical Processing (OLAP)
A set of tools that provide advanced data analysis for retrieving, processing, and modeling data from the data warehouse.
database
A shared, integrated computer structure that houses a collection of related data. A database contains two types of data: end-user data (raw facts) and metadata.
desktop database
A single-user database that runs on a personal computer.
data warehouse
A specialized database that stores historical and aggregated data in a format optimized for decision support.
Logical Design
A stage in the design phase that matches the conceptual design to the requirements of the selected DBMS and is therefore software-dependent. Logical design is used to translate the conceptual design into the internal model for a selected database management system, such as DB2, SQL Server, Oracle, IMS, Informix, Access, or Ingres.
performance tuning
Activities that make a database perform more efficiently in terms of storage and access speed.
object
An abstract representation of a real-world entity that has a unique identity, embedded properties, and the ability to interact with other objects and itself.
relationship
An association between entities.
network model
An early data model that represented data as a collection of record types in 1:M relationships.
hierarchical model
An early database model whose basic concepts and characteristics formed the basis for subsequent database development. This model is based on an upside-down tree structure in which each record is called a segment. The top record is the root segment. Each segment has a 1:M relationship to the segment directly below it.
MapReduce
An open-source application programming interface (API) that provides fast data analytics services; one of the main Big Data technologies that allows organizations to process massive data stores.
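The idea behind MapReduce can be sketched in plain Python as a word count. This is not the Hadoop API; map_phase, shuffle, and reduce_phase are illustrative names, and everything runs in one process instead of across a cluster.

```python
from collections import defaultdict

def map_phase(line):
    # Map: emit a (key, value) pair for every word in one input record.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle: group all values that share the same key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: collapse each key's values into a single result.
    return key, sum(values)

lines = ["big data big clusters", "data nodes store data"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts)
```

In a real cluster the map and reduce steps would run in parallel on many nodes, which is what lets the approach scale to massive data stores.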
database system
An organization of components that defines and regulates the collection, storage, management, and use of data in a database environment.
Many-to-many (M:N or *..*) relationship
Associations among two or more entities in which one occurrence of an entity is associated with many occurrences of a related entity and one occurrence of the related entity is associated with many occurrences of the first entity.
one-to-one (1:1 or 1..1) relationship
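Associations among two or more entities that are used by data models. In a 1:1 relationship, one entity instance is associated with only one instance of the related entity.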
one-to-many (1:M or 1..*) relationship
Associations among two or more entities that are used by data models. In a 1:M relationship, one entity instance is associated with many instances of the related entity.
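A sketch of how a 1:M relationship is commonly implemented in a relational database: the "many" side carries a column that refers to the "one" side. It assumes SQLite through Python's sqlite3 module; the professor and course tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE professor (prof_id INTEGER PRIMARY KEY, prof_name TEXT);
    -- Each course row points at exactly one professor (the "one" side).
    CREATE TABLE course (
        course_id INTEGER PRIMARY KEY,
        title     TEXT,
        prof_id   INTEGER REFERENCES professor(prof_id)
    );
    INSERT INTO professor VALUES (1, 'Ng');
    INSERT INTO course VALUES (10, 'Databases', 1), (11, 'Networks', 1);
""")

# One professor instance is associated with many course instances.
titles = [row[0] for row in conn.execute(
    "SELECT title FROM course WHERE prof_id = 1 ORDER BY course_id")]
print(titles)
```

An M:N relationship is usually handled the same way, with an extra table whose rows pair one instance from each side.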
Metadata
Data about data; that is, data about data characteristics and relationships. See also data dictionary.
object-oriented database management systems (OODBMS)
Data management software used to manage data in an object-oriented database model.
unstructured data
Data that exists in its original, raw state; that is, in the format in which it was collected.
semistructured data
Data that has already been processed to some extent.
structured data
Data that has been formatted to facilitate storage, use, and information generation.
relational model
Developed by E. F. Codd (of IBM) in 1970, it represents a major breakthrough for users and designers because of its conceptual simplicity. The relational model, based on mathematical set theory, represents data as independent relations. Each relation (table) is conceptually represented as a matrix of intersecting rows and columns. The relations are related to each other through the sharing of common entity characteristics (values in columns).
data redundancy
Exists when the same data is stored unnecessarily at different places.
data integrity
In a relational database, a condition in which the data in the database complies with all entity and referential integrity constraints.
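A sketch of both kinds of integrity being enforced, assuming SQLite through Python's sqlite3 module. The department and employee tables and the violates helper are hypothetical. Note that SQLite only enforces foreign keys when the foreign_keys pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY)")
conn.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        dept_id INTEGER NOT NULL REFERENCES department(dept_id)
    )""")
conn.execute("INSERT INTO department VALUES (1)")
conn.execute("INSERT INTO employee VALUES (100, 1)")

def violates(sql):
    # Hypothetical helper: True if the DBMS rejects the statement.
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# Entity integrity: a second row with the same primary key is refused.
duplicate_key = violates("INSERT INTO employee VALUES (100, 1)")
# Referential integrity: a row pointing at a nonexistent department is refused.
orphan_row = violates("INSERT INTO employee VALUES (101, 99)")
print(duplicate_key, orphan_row)
```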
internal model
In database modeling, a level of data abstraction that adapts the conceptual model to a specific DBMS model for implementation. The internal model is the representation of a database as "seen" by the DBMS. In other words, the internal model requires a designer to match the conceptual model's characteristics and constraints to those of the selected implementation model.
segment
In the hierarchical data model, the equivalent of a file system's record type.
method
In the object-oriented data model, a named set of instructions to perform an action. Methods represent real-world actions, and are invoked through messages.
Inheritance
In the object-oriented data model, the ability of an object to inherit the data structure and methods of the classes above it in the class hierarchy.
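The object-oriented terms (class, attribute, method, inheritance) map fairly directly onto an object-oriented programming language. A small Python sketch, with hypothetical Person and Student classes:

```python
class Person:
    def __init__(self, name):
        self.name = name                      # attribute (shared structure)

    def describe(self):                       # method (shared behavior)
        return f"{self.name} is a {type(self).__name__.lower()}"

class Student(Person):                        # Student is a subclass of Person
    def __init__(self, name, gpa):
        super().__init__(name)                # reuse the superclass's data structure
        self.gpa = gpa                        # attribute added by the subclass

s = Student("Ana", 3.5)
description = s.describe()                    # describe() is inherited, not redefined
print(description)
```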
islands of information
In the old file system environment, pools of independent, often duplicated, and inconsistent data created and managed by different departments.
tuple
In the relational model, a table row.
client node
One of three types of nodes used in the Hadoop Distributed File System (HDFS). The client node acts as the interface between the user application and the HDFS. See also name node and data node.
data node
One of three types of nodes used in the Hadoop Distributed File System (HDFS). The data node stores fixed-size data blocks (that could be replicated to other data nodes). See also client node and name node.
name node
One of three types of nodes used in the Hadoop Distributed File System (HDFS). The name node stores all the metadata about the file system. See also client node and data node.
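How the three node types cooperate can be sketched with a toy, in-memory simulation. This is not HDFS code: the block size, replication factor, placement rule, and every name below are assumptions chosen for illustration, and real HDFS blocks are far larger.

```python
BLOCK_SIZE = 4   # bytes; deliberately tiny for the example
REPLICAS = 2     # hypothetical number of copies kept of each block

data_nodes = {"dn1": {}, "dn2": {}, "dn3": {}}  # data nodes: block_id -> bytes
name_node = {}                                  # name node: file -> [(block_id, [node ids])]

def split_into_blocks(data, size=BLOCK_SIZE):
    # Cut the file into fixed-size blocks (the last one may be shorter).
    return [data[i:i + size] for i in range(0, len(data), size)]

def put(filename, data):
    # Client-node role: split the file, place replicas, record metadata.
    node_ids = sorted(data_nodes)
    entries = []
    for n, block in enumerate(split_into_blocks(data)):
        block_id = f"{filename}#{n}"
        targets = [node_ids[(n + r) % len(node_ids)] for r in range(REPLICAS)]
        for target in targets:
            data_nodes[target][block_id] = block
        entries.append((block_id, targets))
    name_node[filename] = entries

def get(filename):
    # Ask the name node where the blocks live, then read the first replica of each.
    return b"".join(data_nodes[nodes[0]][block_id]
                    for block_id, nodes in name_node[filename])

put("log.txt", b"hello hadoop")
print(name_node["log.txt"])
print(get("log.txt"))
```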
Data
Raw facts, or facts that have not yet been processed to reveal their meaning to the end user.
external model
The application programmer's view of the data environment. Given its business focus, an external model works with a data subset of the global database schema.
knowledge
The body of information and facts about a specific subject. Knowledge implies familiarity, awareness, and understanding of information as it applies to an environment. A key characteristic is that new knowledge can be derived from old knowledge.
query result set
The collection of data rows returned by a query.
Database Management System (DBMS)
The collection of programs that manages the database structure and controls access to the data stored in the database.
semantic data model
The first of a series of data models that more closely represented the real world, modeling both data and their relationships in a single structure known as an object.
American National Standards Institute (ANSI)
The group that accepted the DBTG recommendations and augmented database standards in 1975 through its SPARC committee.
Data Definition Language (DDL)
The language that allows a database administrator to define the database structure, schema, and subschema.
class hierarchy
The organization of classes in a hierarchical tree in which each parent class is a superclass and each child class is a subclass. See also inheritance.
conceptual model
The output of the conceptual design process. Provides a global view of an entire database and describes the main data objects, avoiding details.
enterprise database
The overall company data representation, which provides support for present and expected future needs.
data processing (DP) specialist
The person responsible for developing and managing a computerized file processing system.
subschema
The portion of the database that interacts with application programs.
Data Modeling
The process of creating a specific data model for a determined problem domain.
database design
The process that yields the description of the database structure and determines the database components. The second phase of the Database Life Cycle.
information
The result of processing raw data to reveal its meaning. Information consists of transformed data and facilitates decision making.
Data Manipulation Language (DML)
The set of commands that allows an end user to manipulate the data in the database. The commands include SELECT, INSERT, UPDATE, DELETE, COMMIT, and ROLLBACK.
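A sketch that exercises each of the listed commands, assuming SQLite through Python's sqlite3 module (where COMMIT and ROLLBACK are issued through conn.commit() and conn.rollback()). The product table is hypothetical, and its CREATE TABLE statement is DDL, included only as setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (code TEXT PRIMARY KEY, price REAL)")  # DDL setup

conn.execute("INSERT INTO product VALUES ('A1', 10.0)")             # INSERT
conn.execute("INSERT INTO product VALUES ('B2', 25.0)")
conn.execute("UPDATE product SET price = 12.0 WHERE code = 'A1'")   # UPDATE
conn.commit()                                                       # COMMIT: changes become permanent

conn.execute("DELETE FROM product WHERE code = 'B2'")               # DELETE
conn.rollback()                                                     # ROLLBACK: the uncommitted DELETE is undone

rows = conn.execute(
    "SELECT code, price FROM product ORDER BY code").fetchall()     # SELECT
print(rows)
```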
class diagram notation
The set of symbols used in the creation of class diagrams.
External Schema
The specific representation of an external view, that is, the end user's view of the data environment.
Connectivity
The type of relationship between entities. Classifications include 1:1, 1:M, and M:N.
physical data format
The way a computer "sees" (stores) data.
logical data format
The way a person views data within the context of a problem domain.
3 Vs
Three basic characteristics of Big Data databases: volume, velocity, and variety.
social media
Web and mobile technologies that enable "anywhere, anytime, always on" human interactions.
field
A character or group of characters (alphabetic or numeric) that has a specific meaning. A field is used to define and store data.
attribute
A characteristic of an entity or object. Has a name and data type.
file
A collection of related records.
centralized database
A database located at a single site.
Schema
A logical grouping of database objects, such as tables, indexes, views, and queries, that are related to each other.
entity instance (entity occurrence)
A row in a relational table.
data type
Defines the kind of values that can be used or stored. Also used in programming languages and database systems to determine the operations that can be applied to such data.