Chapter 11 Review Questions

¡Supera tus tareas y exámenes ahora con Quizwiz!

Name three ways that null values in a database can be interpreted. Why is this problematic?

A null value in a field can indicate that there should not be a value in the field (i.e., blank is correct). It can also mean that an error was made, and a value that should have been entered was incorrectly omitted. It can also indicate that a value for the field has been deleted, which may or may not be correct. The difficulty in really knowing why the null exists is the major problem with nulls.

What are the key issues in deciding between using perfectly normalized databases and denormalized databases?

A perfectly normalized database is optimized for storage efficiency, minimizing wasted storage space. This data storage design is not as useful when data must be frequently queried, since the data is spread across many tables that must be joined in processing the query. Access speed will degrade in these circumstances. Therefore, if the data is going to be accessed frequently, it may be valuable to denormalize the design to reduce the number of joins that must be processed in a query.

Describe the purpose of the primary and foreign keys.

A primary key serves as the unique identifier for each record to be stored in a table, and one is required for identified for each table. A foreign key is one in which an attribute in one table is the primary key in another. Identifying foreign keys is important in enforcing referential integrity.

What is the difference between an end-user database and an enterprise database? Provide an example of each one.

An end-user database is one that is designed to run on a PC and is used to create personal database applications. An end-user in sales might develop a Microsoft Access database, for example, to keep track of current and prospective client contacts. An enterprise database is one that is capable of handling huge volumes of information for the entire organization. Applications that serve the entire enterprise can be built upon these enterprise databases. These databases are fast, high capacity, but also complex. Oracle is a vendor of enterprise database management systems.

What is an index, and how can it improve the performance of a system?

An index is a small, quickly searchable table that contains values from the table and indicates where in the table those values can be found. System performance is improved with an index because it is no longer necessary to search the entire table for the desired values. The small index table can be quickly searched to reveal exactly where the desired values are stored.

Describe several techniques that can improve performance of a database.

Denormalization adds selected fields back to tables in a data model. This adds a little redundancy, but improves the data access speed. Clustering involves physically placing records together so that like records are stored close to each other. Indexing creates small, quickly searchable tables that contain values from the table and indicate where in the table those values can be found. Finally, proper estimation of the data set size is important to assure that adequate hardware is obtained for the system.

Describe three situations that can be good candidates for denormalization.

Denormalization is performed to speed up data access. Redundancy is added back into tables in order to reduce the number of joins that are required to produce the desired information. In a normalized Order table, the customer name will not be included; however it may be added back in to the Order table to improve processing speed. This represents a situation in which some parent entity attributes are included in the child entity. Similarly, a lookup table of zip codes and states may be set up in the normalized data model, but could be added back in to the physical model design. Another situation is where a table of product codes lists the description and price. These may also be added back into the physical model to improve application performance. Lookup tables are common candidates for denormalization. Finally, 1:1 relationships may be good candidates for denormalization, since the information may be accessed together frequently.

How are a file and a database different from each other?

Files are essentially an electronic list of information that is formatted for a particular transaction. Any programs that are written must be developed to work with the file exactly as it is laid out. If there is a need to combine data in a new way, a new file must be created (usually by extracting data from other files) and a program written to work specifically with that new file. Databases, on the other hand, are made up of a collection of data sets that are related to each other in some way. Database management system software creates these data groupings. The DBMS provides access to the data and can usually provide access to any desired subset of data. It is not necessary to write new programs to build a new file in order to retrieve data from the database in a new way.

What are the two most important factors in determining the type of data storage format that should be adopted for a system? Why are these factors so important?

First, evaluate the type of data that will be stored. Relational databases are the standard for simple data such as numbers, text, and dates. If the data is more complex (video, images, or audio), then object databases may be required. If the data needs to be aggregated, then multidimensional databases are recommended. The second factor is the type of system being developed. Transaction processing systems require rapid update and retrieval capability, and will best be constructed using files, relational databases, or object databases. Decision support types of applications require rapid access to data in ad hoc ways. These types of systems are best implemented using relational or multidimensional databases. These two factors are very important because you must select a data storage format that is suitable for the data the system will include and the uses planned for that data.

Name two types of legacy databases and the main problems associated with each type.

Hierarchical databases use hierarchies, or inverted trees, to represent relationships. The main problem with this database model is that it cannot be used efficiently to represent non-hierarchical associations. Network databases avoid this problem, but require a considerable amount of programming effort. Programs must be written to follow the database structure, and if the database structure changes, then complex programming must be done to change the application programs as well.

What is the difference between interfile and intrafile clustering? Why are they used?

Interfile clustering physically orders records within a table in some meaningful way, such as by primary key value. Interfile clustering identifies records for separate tables that are typically retrieved together and physically stores them together.

Name five types of files and describe the primary purpose of each type.

Master files store the business's or application's core data. The data in a master file is considered fairly permanent, does not change much, and is usually retained for long periods. Look-up files contain reference information that is used primarily during validation processing. A list of valid code values for a field might be referred to during data entry to ensure that a valid code was entered. Transaction files contain information that will be used to update a master file. These files are usually temporary in nature; they are used to collect transactions, the transactions update the master file, and then the transaction files are archived. Audit files are used to help trace changes made to files. An image of a record before and after a change may be written to an audit file so that the change is documented. History files serve as archives for older information that is rarely used. These files can be accessed if necessary, but are usually stored on tape to remove the little-used data from the active data storage areas.

Describe the metadata associated with the physical ERD.

Metadata included in the physical ERD includes information regarding attributes such as data type, field size, format, default values, primary keys, and foreign keys.

How does the multidimensional database store data?

Multidimensional databases store data using several dimensions. Data may be aggregated and/or detailed, depending upon the access needs of the users.

What is referential integrity and how is it implemented in a relational database?

Referential integrity refers to the need to make sure that the values linking the table together through the primary and foreign keys are valid and correctly synchronized. For example, if a customer is placing an order, we need to have information on the customer in the customer table. The RDBMS will check to see if there is a record for that customer in the Customer table before it will let an order be entered. Checking for known required relationships helps assure referential integrity.

What is the most popular kind of database today? Provide three examples of products that are based on this database technology.

Relational databases are most popular today due to their ease of use and conceptual simplicity. Examples of relational DBMSs on the market include MS Access, Oracle, DB2, Sybase, Informix, and MS SQL Server.

What is the biggest strength of the object database? Describe two of its weaknesses.

The biggest strength of object databases is the reusability of objects. This accelerates system development and helps keep costs manageable. Object databases are also very suitable to store complex data (e.g., graphics, video, and sound). Two weaknesses of object databases and the lack of experienced developers and the steep learning curve associated with OODBMSs.

Why is it important to understand the initial and projected size of a database during the design phase?

The design team needs to be sure that the hardware that is specified for the system is adequate to support the size of the database. If inadequate hardware is chosen, the performance of the system will be poor regardless of the 'tuning' techniques that are applied.

Describe the two steps to data storage design.

The first step is to select the appropriate format for the data storage. There are several different methods of storing data (files, relational data bases, multidimensional databases, object-oriented databases) and the analyst should select the one that will provide the best approach to storing the system data. Second, the data storage must be designed to optimize its processing efficiency, which involves considering how the data will be used, and making the appropriate design decisions.

What are the differences between the logical and physical ERDs?

The logical ERD represents the data required by the application, and presents a 'business view' of the data without including implementation details. The physical ERD includes all elements of a logical ERD, but includes implementation details which aid in describing characteristics of the system and presenting a 'systems view' of the new system.

What is the purpose of normalization?

The purpose of normalization is to optimize the data storage design for storage efficiency. Normalization helps ensure that data redundancy and null values are kept to a minimum.

Describe what should be considered when estimating the size of a database.

The size of the database will be based on the amount of raw data expected, the growth rate of raw data that is expected, and the overhead requirements of the DBMS.

What are the two dimensions in which to optimize a relational database?

The two dimensions in which to optimize a relational database are for storage efficiency and for speed of access.

Why should you consider the storage formats that already exist in an organization when deciding upon a storage format for a new system?

This factor is important because the project team needs to be aware of the existing base of technical skills that are available to work with the data storage format. If a data storage format is chosen that is new to the organization, then the team must allocate training and learning time into the project schedule.


Conjuntos de estudio relacionados

Speakout intermediate - Entries - sentences

View Set

NU141- Chapter 22 Psychotherapeutic Agents

View Set

Effects of Exercise on the Human Body

View Set

Brunner and Suddharth Med Surg Chapter 42 Study Guide Questions Part 2

View Set

Chapter 4: Igneous Rocks (chart into word form)

View Set

Prechapter 10 reading financial accounting

View Set