Data Management Foundations - C175
- Prime Attribute - Nonprime attribute
- In normalization terminology, any attribute that is at least part of a key - An attribute that is not part of a key
Between objects
Where does affinity grouping occur in data mining?
- DATE () - SYSDATE - AND, OR, and NOT - Boolean Algebra
- What function returns the current date in MS Access? - What about Oracle? - What are the three main logical operators in SQL? - What is the name of a branch of mathematics that uses these operators?
API (Application Programming Interface)
Software through which programmers interact with middleware. Allows use of generic SQL code, allowing client processes to be database server-independent.
Flags
Special codes implemented by designers to trigger a required response, alert end users to specified conditions, or encode values. Can also be used to prevent Nulls by bringing attention to the absence of a value in a table.
Index File
The concept of this file is similar to the concept of an index in the back of a book. It contains keywords and phrases that are located in a particular file. It also contains a pointer to the location in the file where those keywords can be found.
Data Management
A process that focuses on data generation, storage, and retrieval. Common functions include addition, deletion, modification, and listing.
UML (Unified Modeling Language)
Object-oriented data models are typically depicted using this type of class diagram. This is a language based on OO concepts that describes a set of diagrams and symbols you can use to graphically model a system.
DSS (Decision Support System)
The precursor to the modern BI environment; it is an arrangement of computerized tools used to assist managerial decision making
Identifiers (Primary Keys)
These are underlined in an ER diagram
Flat File
These are unstructured, plain text or binary files. To access or edit data, the entire file is read into computer memory. After the database options have completed, the entire file is then written back out.
Data Dictionary
This compiles all of the metadata about the data elements in the data model
Data Inconsistency
This exists when different versions of the same data appears in different places. For example: when a company's sales department stores a sales representative's name as Bill Brown and the company's personnel department stores that same person's name as William G. Brown
Required Attribute
This is an attribute that must have a value
1. Volume: amounts of data being stored. 2. Velocity: speed data grows and need to process data quickly 3. Variety: data being collected comes in multiple different formats
What are the basic characteristics of Big Data databases that are known as the "3 V's"?
SQL (Structured Query Language)
This is the de facto query language and data access standard supported by the majority of DBMS vendors. It lets the user specify what must be done without specifying how to do it.
Entity Types
What is the uniquely identifiable element about which data can be categorized in an entity-relationship diagram?
1. Delete Rules: involved when a table record is considered for deletion 2. Update Rules: when an existing record is about to be updated 3. Insert Rules: when a new record is about to be inserted into a table
What three types of rules do Relational DBMSs put in place to help enforce referential integrity?
- ORDER BY - Cascading Order Sequence - DISTINCT - SELECT DISTINCT Department
- A SQL clause that is useful for ordering the output of a SELECT query (for example, in ascending or descending order). - What is it called when you use the above in order to create a nested ordering sequence for a set of rows, such as a list organized alphabetically by last name, then first name, then middle initial? - What operator would filter out redundant results so, for example, if we only wanted a list of School Departments to appear only once instead of each time a record contains the same Department? - How would you write this last example?
- Structural Independence - Data independence - Logical data format - Physical data format
- A data characteristic in which changes in the database schema do not affect an application's ability to access data. - This exists when you can change the data storage characteristics without affecting the program's ability to access the data. - How a human being views the data - How the computer must work with the data
- MDBMS (Multidimensional Database Management System) - VLDBs (Very Large Databases)
- A database management system that uses proprietary techniques to store data in matrixlike arrays of n dimensions known as cubes. - A database that contains a huge amounts of data, sometimes into the petabyte range.
- Data Mart - Data Warehouse - Data Visualization
- A small, single-subject data warehouse subset that provides decision support to a small group of people. - An integrated, subject-oriented, time-variant, nonvolatile collection of data that provides support for decision making. - Abstracting data to provide information in a visual format that enhances the user's ability to effectively comprehend the meaning of the data.
- Atomic Attribute (see also: simple attribute) - Ganularity; data stored at its lowest level is said to be Atomic Data
- An attribute that cannot be further subdivided to produce meaningful components. For example, a person's last name attribute cannot be meaningfully subdivided. - This refers to the level of detail represented by the values stored in a table's row
- Composite Entity, aka Bridge or Associative Entity. For example, the M:N relationship of STUDENT to CLASS can be change to a 1:M with the inclusion of an ENROLL table. - a Linking Table
- An entity designed to transform an M:N relationship into two 1:M relationships. Its primary key comprises at least the primary keys of the entities that it connects. - What is the implementation of this called in table form?
- Existent Dependent - Existent Independent - Strong Entity or Regular Entity - Crow's Foot since they are used extensively to design databases. The Chen notation is conceptual only and does not distinguish between weak or strong relationships
- An entity is said to be this if it can exist in the database only when it is associated with another related entity occurrence. A sign of this is if it has a mandatory foreign key—that is, a foreign key attribute that cannot be null. - This is when an entity can exist apart from all of its related entities. - What are two other names for the latter? - Which type of notation do these concepts go with?
- ROLAP (Relational Online Analytical Processing) - MOLAP (Multidimensional Online Analytical Processing)
- Analytical processing functions that use relational databases and familiar relational query tools to store and analyze multidimensional data. Supports medium to large databases with unlimited dimensions. - This is similar to the above but for MDMSs (Multidimensional Database Management Systems). Uses business techniques to store data in matrix like arrays known as data cubes. Supports large data sets with predefined dimensions.
- Dashboards - Portals
- In BI (Business Intelligence), a web-based system that presents key business performance indicators or information in a single, integrated view with clear and concise graphics. - In BI, a unified, single point of entry for information distribution
- Dimension Tables
- In a data warehouse, tables used to search, filter, or classify facts within a star schema.
- Repeating Groups - A Relational Table must not contain any Repeating Groups
- In a relation, this is a characteristic describing a group of multiple entries of the same type for a single key attribute occurrence. For example, a car can have multiple colors for its top, interior, bottom, trim, and so on - How do these interact with Relational Tables
- Tuple - Domain
- In a relational model, this is the name for a table row. - In data modeling, this is the construct used to organize and describe an attribute's set of possible values; it is the specific range of permissible values for each column in a relational table
- Functional Dependence: The attribute B is fully functionally dependent on the attribute A if each value of A determines one and only one value of B. For example: PROJ_NUM → PROJ_NAME - Determinant (PROJ_NUM) - Dependent (PROJ_NAME)
- Occurs when the value of one or more attributes determines the value of one or more other attributes; when a specific terminology and notation is used to describe relationships based on determination. - This is the attribute whose value determines another - This is the attribute whose value is determined by the other attribute
- ALTER TABLE - COMMIT - COUNT - DELETE - DROP
- The SQL command used to make changes to table structure. When the command is followed by a keyword (ADD or MODIFY), it adds a column or changes column characteristics. - The SQL command that permanently writes changes to a datbase - A SQL aggregate function that outputs the number of rows containing not null values for a given column or expression, sometimes used in conjunction with the DISTINCT clause. - A SQL command that allows rows to be deleted from a table - A SQL command that is used to delete an index or table
- AND - GROUP BY - HAVING - ROLLBACK
- The SQL logical operator used to link multiple conditional expressions in a WHERE or HAVING clause. It requires that all conditional expressions evaluate to true. - A SQL clause used to create frequency distributions when combined with any of the aggregate functions in a SELECT statement - A command applied to the output of the above clause which restricts selected rows based on a condition - This command restores the database table contents to the condition that existed after the last COMMIT statement
- Data Cube - Cube Cache - Sparsity
- The multidimensional data structure used to store and manipulate data in a multidimensional DBMS. The location of each data value is based on its x-, y-, and z-axes. These are static, meaning they must be created before they are used, so they cannot be created by an ad hoc query. - The reserved memory area where the above are held in multidemensional OLAP - This is the measurement of the data density of the first item
- OLAP (online analytical processing)
- These are DSS (Decision Support System) tools that use multidimensional data analysis techniques.
- Data - Field - Record - File
- These are raw facts, such as a telephone number, a birth date, a customer name, and a year-to-date (YTD) sales value. This has little meaning unless it has been organized in some logical manner. - In a File system, a character or group of characters (alphabetic or numeric) that has a specific meaning. It is used to define and store data. - In a File system, a logically connected set of one or more fields that describes a person, place, or thing. - In a File system, a collection of related records.
- SELECT - FROM - WHERE - An example might look like this: SELECT EMP_NAME, EMP_NUMBER FROM EMPLOYEES WHERE SALARY <= 50000
- This SQL statement retrieves all rows that match the specified condition(s) - This SQL clause identifies which table to look in - This SQL clause adds conditional restrictions to the first statement that limit the rows returned by the query. The conditional statements are separated by logical operators
- Key - Determination - PK (Primary Key)
- This consists of one or more attributes that determine other attributes. For example, an invoice number identifies all of the invoice attributes, such as the invoice date and the customer name. - This is the state in which knowing the value of one attribute makes it possible to determine the value of another. The role of the above item is based on this concept. An example is being able to determine Profit if you are provided Revenue and Cost. - An attribute or combination of attributes that uniquely identifies any given row in a table.
- Data - Information - Knowledge
- This consists of raw facts which have not yet been processed to reveal their meaning. - This is the result of processing the above to reveal its meaning - This is the body of information and facts about a specific subject
- Modality - Modality 1 is a straight line, while Modality 0 has a little circle put on the straight line
- This denotes an instance of a specific entity is optional or mandatory in a relationship. For example, every OPEN ORDER must have a CUSTOMER, but not every CUSTOMER needs to have an OPEN ORDER. - How is this expressed in Crow's Foot notation?
- Data anomaly 1. Update anomaly: when a change to data requires hundreds or more manual updates in records, this is easy to occur 2. Insertion anomaly: if only one file exists and you need to add a new sales agent, for example, you would have to also create a dummy customer data entry to reflect the new agent's creation. 3. Deletion anomaly: in a flat file, deleting customers would also delete the information of an active sales agent.
- This develops when not all of the required changes in the redundant data are made successfully. - What are three common types of the above?
- Data redundancy 1. Poor data security 2. Data inconsistency: when different and conflicting versions of the same data appear in different places. 3. Data entry errors 4. Data integrity problems
- This exists when the same data is stored unnecessarily at different places. - What are four problems this can lead to?
- Relationship Degree 1. A binary relationship: when two entities participate, and is the most common relationship degree. 2. A unary relationship: when both participants in the relationship are the same entity; an association is maintained within a single entity 3. A ternary relationship: when three entities participate in the relationship.
- This indicates the number of entities or participants associated with a relationship. - What are the three most common types of these in ER models?
- Weak Entity - The weak entity will have a double-walled rectangle - There will be a "PK/FK" designation inside the rectangle
- This is an entity that meets two conditions: 1. It is existence dependent; cannot exist w/out the entity with which it has a relationship, and 2. It has a primary key that is partially or totally derived from the parent entity in the relationship - How is this distinguished in the Chen notation? - The Crow's Foot notation?
- Hadoop - MapReduce - NoSQL
- This is a Java based, open source, high speed, fault-tolerant distributed storage and computational framework. It uses low-cost hardware to create clusters of thousands of computer nodes to store and process data. - This is an open source application programming interface (API) that provides fast data analytics services. It distributes the processing of the data among thousands of nodes in parallel. - This is a large-scale distributed database system that stores structured and unstructured data in efficient ways.
- Business Rule - Generally speaking, a noun in a business rule will translate into an entity in the model, and a verb (active or passive) that associates the nouns will translate into a relationship among the entities. For example, the business rule "a customer may generate many invoices" contains two nouns (customerand invoices) and a verb (generate) that associates the nouns. "Customer" and "invoices" are objects of interest and there is a "generate" relationship between them.
- This is a brief, precise, and unambiguous description of a policy, procedure, or principle within a specific organization. Properly written, they are used to define entities, attributes, relationships, and constraints. For example: "an agent can serve many customers, and each customer can be served by only one agent." - What is a general rule for how these define things like entities?
- HDFS (Hadoop Distributed File System)
- This is a highly distributed, fault-tolerant file storage system designed to manage large amounts of data at high speeds; uses the write once, read many model. Uses three types of nodes: a Name Node that stores all the metadata about the file system, a Data Node that stores fixed size data blocks, and a Client Node that acts as the interface with the user.
- Composite Key - Key Attribute - Superkey - Candidate Key - FK (Foreign Key) - Secondary Key
- This is a key that is composed of more than one attribute. - An attribute that is a part of a key is called a___ - This is a key that can uniquely identify any row in the table; it functionally determines every attribute in the row. - This is a minimal, irreducible type of the above key—that is, it is without any unnecessary attributes. It is based on a full functional dependency. - This is the primary key of one table that has been placed into another table to create a common attribute. - This is a key that is used strictly for data retrieval purposes. An example would be a customer using their phone number instead of customer ID #.
- Composite Identifier - Simple Attribute - Composite Attribute - Single-Valued Attribute - Multivalued Attribute - Derived or Computed Attribute
- This is a primary key composed of more than one attribute - This is an attribute that cannot be further subdivided - This is an attribute that can be further subdivided into additional attributes; for example, an ADDRESS can be subdivided into street, state, and zip code - This is an attribute that can have only a single value; for example, a car's VIN - This is an attribute that can have many values; for example, a person who has several certifications - This is an attribute whose value is calculated from other attributes and they don't need to be physically stored in the database; for example, EMPLOYEE_AGE can be found by computing the difference between the current date and EMP_DOB
- Normalization - 1NF, 2NF, 3NF (best) - Generally, the higher the normal form, the more relational join operations you need to produce a specified output. Also, more resources are required by the database system to respond to end-user queries. - Denormalization; this produces a lower normal form. It can increase performance, but lead to greater data redundancy.
- This is a process for evaluating and correcting table structures to minimize data redundancies, thereby reducing the likelihood of data anomalies. - What are the first three stages of this called and which one is considered best from a structural view? - What is the drawback of using the higher levels? - What process will let you overcome these limitations if the database has performance requirements?
- Relational Diagram
- This is a representation of the relational database's entities, the attributes within those entities, and the relationships between those entities.
- Database - Metadata - DBMS (Database Management System)
- This is a shared, integrated computer structure that stores a collection of end-user data and metadata. - This is data about data, through which the end-user data is integrated and managed. It describes the data characteristics and the set of relationships that links the data found within the database - This is a collection of programs that manages the database structure and controls access to the data stored in the database; serves as the intermediary between the user and the database.
- XML (Extensible Markup Language) - NoSQL (Not Only SQL)
- This is a special language used to represent and manipulate data elements in a textual format. It is being used to address unstructured and semistructured data storage and management needs with its own type of database - This term is generally used to describe a new generation of database management systems that is not based on the traditional relational database model. Used with companies like FB, Twitter, LinkedIn, etc that have tons of new data coming in every day. These are databases designed to handle the unprecedented volume of data, variety of data types and structures, and velocity of data operations that are characteristic of these new business requirements.
- Query - Ad-hoc query - Query result set - Data quality
- This is a specific request issued to the DBMS for data manipulation—for example, to read or update the data. - This is a spur of the moment question - The DBMS sends back this answer to the application that is asking the question - This is a comprehensive approach to promoting the accuracy, validity, and timeliness of data
- Business Intelligence (BI) - MDM (Master Data Management)
- This is a term that describes a comprehensive, cohesive, and integrated set of tools and processes used to capture, collect, integrate, store, and analyze data with the purpose of generating and presenting information to support business decision making. - This is a collection of concepts, techniques, and processes for the proper identification, definition, and management of data elements within an organization.
- Alias
- This is an alternate name given to a column or table in any SQL statement. For example, changing the default name if you are exporting a value calculated from existing columns & rows
- Use the "does not equal" sign. Depending on your version this may be "<>" or "!=" - the "*"
- What is a workaround if your version of SQL doesn't support the use of the logical NOT statement? - What character would retrieve all of the columns from a table if the conditions of the SQL statement are met?
- ERDM (Extended Relational Data Model) - O/R DBMS (Object/Relational Database Management System)
- This is an evolution of the relational model which adds many of the OO model's features within the inherently simpler relational database structure. It gave birth to a new generation of relational databases that support OO features such as objects, extensible data types based on classes, and inheritance. - A DBMS that is based on the above model is referred to as what?
- An Index; for example, a resource that would let you find a book in a library without having to wander around hoping you stumble on the correct book - Index Key
- This is an orderly arrangement used to logically access rows in a table. - This is the reference point of the above; an ordered arrangement of keys and pointers
- Entity Integrity 1. All of the values in the primary key must be unique 2. No key attribute in the primary key can contain a null (reminder: a null is no value at all, NOT a zero or a space). - Null - Referential Integrity
- This is is the condition in which each row (entity instance) in the table has its own unique identity. - To ensure the above, what are the two requirements that the primary key must have? - This is the absence of any data value and should be avoided as they are problematic with the relational model. - This is the condition in which every reference to an entity instance by another entity instance is valid
- Connectivity - Cardinality - In terms of (x,y) with x being min and y being max. so next to COURSES would be (1,4) since a professor much teach at least one course and can do a maximum of 4 courses, and next to PROFESSOR would be (1,1) since each course can only be taught by one professor - Through Business Rules
- This is the classification of the relationship between entities such as 1:1, 1:M, or M:N - This expresses the minimum and maximum number of entity occurrences associated with one occurrence of the related entity - Given that one PROFESSOR can teach up to four COURSES, how would the above be expressed? - How are these established?
- Schema - Subschema - DML (Data Manipulation Language) - DDL (Data Definition Language)
- This is the conceptual organization of the entire database as viewed by the database administrator - This defines the portion of the database "seen" by the application programs that actually produce the desired information from the data within the database. - This defines the environment in which data can be managed and is used to work with the data in the database. - This enables the database administrator to define the schema components.
- Data Modeling - Data Model - Problem Domain
- This is the first step in designing a database; it refers to the process of creating a specific data model for a determined problem domain. - This is a relatively simple representation, usually graphical, of more complex real-world data structures. - This is a clearly defined area within the real-world environment, with a well-defined scope and boundaries that will be systematically addressed.
- Data Warehouse - OLAP (Online Analytical Processing) - Business Intelligence
- This major part of an analytical database is a specialized database that stores data in a format optimized for decision support. It contains historical data obtained from the operational databases as well as data from other external sources. - The second major part of an analytical database; a set of tools that work together to provide an advanced data analysis environment for retrieving, processing, and modeling data from the data warehouse. - This second area has grown so much that it has evolved into its own discipline called ______. It is a comprehensive approach to capture and process business data with the purpose of generating information to support business decision making.
- Optional Relationship Participation - Mandatory Relationship Participation
- This means that one entity occurrence does not require a corresponding entity occurrence in a particular relationship. - This is the opposite and means that one entity occurrence requires a corresponding entity occurrence in a particular relationship.
- Chen - Crow's Foot - UML
- This notation of modeling is used primarily for Conceptual models - This notation is used primarily for implementation models - This notation can be used either for conceptual or implementation modeling
- Cascade Delete Rule - Restrict Delete Rules
- This rule states that if an attempt is made to delete a record in one table where one or more records with matching foreign key values exist in another table, all associated records will be deleted. - This rule states that in the above situation, that the delete operation will not be allowed. - This rule states that in the same situation, then the foreign key values are set to NULL so we know that the record they used to point to has been deleted. This would prevent something like a unique Customer ID being issued out twice when the original customer was deleted.
- Intersection Data - Associative Entity
- This type of data describes an attribute of a many-to-many relationship; it describes the relationship between two entities and is data which would not make sense to place as a description or characteristic of one particular entity alone. An example would be how many units of a particular product a particular salesman has sold. That number would not make sense to place with just the salesman or the product, but describes the relationship of each. - This is a special type of entity created by utilizing the type of data above and contains attributes particular to the many-to-many relationship of two other entities.
- Operational, Online Transaction Processing (OLTP), Transactional, or Production database - Analytical database
- This type of database is designed primarily to support a company's day-to-day operations - This type focuses primarily on storing historical data and business metrics used exclusively for tactical or strategic decision making; they allow the end user to perform advanced analysis of business data using sophisticated tools.
- A Derived Attribute; the Crow's Foot notation does not distinguish this - A Weak Relationship
- What does a dotted line signify in the Chen notation? - What does a dotted line signify in the Crow's Foot notation?
- Weak or Non-Identifying Relationship - Strong or Identifying Relationship
- What exists if the primary key of the related entity does not contain a primary key component of the parent entity? For example, if a COURSE entity has a PK of CRS_CODE, but CRS_CODE is only a Foreign Key in the related entity CLASS and not part of its PK - What exists when the opposite occurs and the primary key of the related entity contains a primary key component of the parent entity? To use the above example, it would be if CLASS did inherit the CRS_CODE as part of its PK
- LIKE - "%" or "*" depending on DBMS - "_" (underscore) or "?" depending on DBMS; an example of use for this would be if you weren't sure if an employee spelled their name "Johnson" or "Johnsen", you would write "WHERE EMP_NAME LIKE 'Johns_n' ;" - WHERE EMP_NAME LIKE 'J%' ;
- What special operator in SQL is used in conjunction with wildcards to find patterns within string attributes? - What wildcard character means that any and all following or preceding characters are eligible? - What wildcard character means any one character may be substituted? - How would this be written to find all employee names that start with the letter "J"?
- BETWEEN - IS NULL - LIKE - IN - EXISTS
- What special operator is used to check whether an attribute value is within a range? - Used to check whether an attribute has a value or not? - Used to check whether an attribute value matches a given string pattern? - Used to check whether an attribute value matches any value within a value list? - Used to check whether a subquery returns any rows?
1. All of the key attributes are defined 2. There are no repeating groups in the table, which means each row contains one and only one value per cell 3. All attributes are dependent on the Primary Key - When it is in 1NF and also includes no Partial Dependencies, that is no attribute is dependent on one part of a composite key (2NF can still contain transitive dependencies) - When it is in 2NF and contains no transitive dependencies - BCNF (Boyce-Codd Normal Form)
- What three things make a table considered to be in 1NF of normalization? - When is a table considered to be in 2NF? - When is a table considered to be in 3NF? - This is a special type of 3NF in which every determinant is a candidate key
- O/R DBMS - OO DBMS - XML (Extensible Markup Language)
- What type of DBMS represents the dominant market share of OLTP and OLAP database applications? - What type of DBMS is popular in niche markets such as computer-aided drawing/computer-aided manufacturing (CAD/CAM), geographic information systems (GIS), telecommunications, and multimedia, which require support for more complex objects? - This has emerged as the de facto standard for the efficient and effective exchange of structured, semistructured, and unstructured data.
- Single user database - Workgroup database - Enterprise database - Centralized database - Distributed database - Cloud database - General-purpose databases - Discipline-specific databases
- With this type of database, if one user is using it, any other users must wait until the first user is done - This database supports small numbers of users, usually less than 50 or a specific department in an organization - This database is used by many users or everyone within an organization - This database supports data located at a single site' - This type supports data distributed across several sites - This type of database has growing in popularity in recent years; examples are AWS and Microsoft Azure - These contain a wide variety of data used in multiple disciplines - These focus focused on specific subject areas. They are used mainly for academic or research purposes within a small set of disciplines.
Recursive Relationship
A relationship found within a single entity type. For example, an EMPLOYEE is married to an EMPLOYEE or a PART is a component of another PART. This can be 1:1, 1:M, or M:N
ADT (Abstract Data Type)
Data type that describes a set of similar objects with shared and encapsulated data representation and methods. It is generally used to describe complex objects.
A domain of values
What is necessary for a primary key in one relation of a database to match with its corresponding foreign key in another relation of the same database?
1. Flat Files: files having no internal hierarchy 2. Heap files: files containing an unsorted set of records that are uniquely identified by a record ID which allows them to be inserted or deleted by that ID. 3. Index Files: files that store a list of lookup field values from a data file, along with the location (address) in the data file of the corresponding record. 4. Hashed Files: files which use a hash function to decide where a record should be placed on a disk. This allows for a faster data lookup without the use of an index file.
What are four different file types that can be utilized in an older, file-based system?
1. Entities: a person, place, thing, or event about which data will be collected and stored. It is "distinguishable" which means each occurrence is unique. They are usually capitalized to make them easily identifiable. For example, an entity called "CUSTOMER" would have many unique occurrences 2. Attributes: a characteristic of an entity. Has a name and a data type. For example: CUSTOMER last name, first name, zip code, etc. Equivalent of Fields in file systems 3. Relationships: describes an association among entities. For example, a relationship exists between customers and agents that can be described as follows: an agent can serve many customers, and each customer may be served by one agent. Can be one-to-many, many-to-many, or one-to-one. These are bi-directional; for example of 1:M, one CUSTOMER can generate many INVOICEs, but each INVOICE can only be generated by one CUSTOMER. 4. Constraints: a restriction placed on the data. They help to ensure data integrity. For example: a student's GPA must be between 0.00 and 4.00.
What are the four building blocks of all data models?
1. Objects: an abstraction of a real-world entity; these are roughly equivalent to Entities in the ER database model. Represents only one occurrence of an entity. 2. Attributes: these describe the properties of an object.For example, a PERSON object includes the attributes Name, Social Security Number, and Date of Birth. 3. Classes: a collection of similar objects with shared structure (attributes) and behavior (methods). Similar to an entity set from the ER model, but classes contain Methods which represent a real-world action such as "finding", "changing", or "printing" a selected PERSON's name 4. Class Hierarchy: how classes are organized. This resembles an upside-down tree in which each class has only one parent. For example, the CUSTOMER and EMPLOYEE both share a parent PERSON class. 5. Inheritance: the ability of an object within the class hierarchy to inherit the attributes and methods of the. classes above it, e.g. CUSTOMER and EMPLOYEE would inherit all attributes & methods from PERSON
What are the main parts of the OODM (Object-Oriented Data Model)?
1. ETL Tools (Data Extraction, Transformation, and Loading): these collect, filter, integrate, and aggregate internal and external data to be saved into a data store optimized for decision support; a process where data is migrated from source systems into data warehouses 2. Data Store: optimized for decision support and is generally represented by a data warehouse or a data mart. 3. Query & Reporting: performs data selection and retrieval 4. Data Visualization: presents data to the end user in a variety of meaningful and innovative ways. 5. Data Monitoring & Alerting: allows real-time monitoring of business activities. 6. Data Analytics: performs data analysis and data-mining tasks using the data in the data store
What are the six basic components of a BI (Business Intelligence) framework?
1. Name Node: stores all the metadata about the file system 2. Data Node: stores fixed size data blocks 3. Client Node: acts as the interface with the user.
What are the three types of nodes used with the HDFS?
1. One-to-Many (1:M or 1..*): A painter creates many different paintings, but each is painted by only one painter. Thus, the painter (the "one") is related to the paintings (the "many"). Therefore, database designers label the relationship "PAINTER paints PAINTING" as 1:M 2. Many-to-Many (M:N or *..*): An employee may learn many job skills, and each job skill may be learned by many employees. Database designers label the relationship "EMPLOYEE learns SKILL" as M:N. 3. One-to-one (1:1 or 1..1): A retail company's management structure may require that each of its stores be managed by a single employee. In turn, each store manager, who is an employee, manages only a single store. Therefore, the relationship "EMPLOYEE manages STORE" is labeled 1:1.
What are the three types of relationships among entities?
It is copied to the index
What happens to the original data in database indexing?
Candidate Key
What is a term for a set of columns in a table that can uniquely identify any record in that table without referring to other data?
It has a Primary Key that consists of only one attribute. Since a partial dependency can only exist when a table's PK is composed of several attributes, a table whose PK is just one attribute is automatically in 2NF
What is an easy sign that a table is automatically in 2NF normalization?
Reference to data in one relation is based on values in another relation.
What is an important aspect of "referential integrity"?
1. Partial Dependency: exists when there is a functional dependence in which the determinant is only part of the primary key. For example: if PROJ_NUM and EMP_NUM together form the PK, but PROJ_NAME is only determined by the PROJ_NUM, there is a partial dependency between those two since PROJ_NUM is not related to EMP_NUM. These tend to be straightforward and easy to identify. 2. Transitive Dependency: condition in which an attribute is dependent on another attribute that is not part of the Primary Key. For example: if EMP_NUM determines EMP_NAME, JOB_CLASS, and CHG_HOUR but you study the data and see that the hourly charges are determined by job class and not by the employee name, then a transitive dependency exists between JOB_CLASS and CHG_HOUR b/c neither of them are prime attributes
What two types of dependencies are related to normalization?
Alternate Key
What uniquely identifies each entity in a collection of entities but is not the primary key?