Database Final


Name the different techniques that can be used to implement database auditing.

traces, log scanning, audit columns, network packet sniffing, and tapping the database server.

What role does the database log play in disaster recovery?

A backup of the database log is made so that if a disaster occurs after a database backup has been made and sent to the remote site you have a log of the subsequent modifications that have occurred to the database since the last backup.

What is the difference between a check constraint and a rule?

A check constraint can evaluate to null, and a null is treated as passing the constraint. Also, some loading utilities do not apply check constraints when loading data, so the table can end up containing data that violates the constraint. Rules are checked every time data is inserted or updated.
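The null behavior can be sketched quickly with SQLite from Python; the `item` table and its columns below are invented for illustration. A NULL value passes the check (the predicate evaluates to unknown rather than false), while a genuinely violating value is rejected:

```python
import sqlite3

# Sketch using SQLite; table and column names are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (name TEXT, price REAL CHECK (price > 0))")

# A NULL price passes the check: the predicate evaluates to UNKNOWN, not FALSE.
conn.execute("INSERT INTO item VALUES ('widget', NULL)")

# A value that actually violates the predicate is rejected.
rejected = False
try:
    conn.execute("INSERT INTO item VALUES ('gadget', -5)")
except sqlite3.IntegrityError:
    rejected = True
```

After this runs, the table holds the NULL-priced row but not the negative-priced one.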

What is the difference between a data warehouse and a data mart?

A data mart is basically a departmental data warehouse defined for a single business unit or area. Actually, there is no universally agreed-on difference between a data warehouse and a data mart. A data mart is not necessarily smaller than an enterprise data warehouse. It may be smaller, but size is determined based on business needs.

What is a data steward?

A data steward is accountable for actions taken using a defined set of data. A data stewardship policy will identify individuals within the organization whose responsibility it is to create, maintain, and delete data. A data steward is not necessarily the data owner. A comprehensive data stewardship policy will also define the consumers of the data—that is, those who directly use the data during the course of their jobs.

Describe the purpose of a repository.

A repository stores information about an organization's data assets. In other words, repositories are used to store metadata.

What is a trigger, and how can triggers be used to enhance data integrity?

A trigger is an event-driven procedure attached to a database table. Triggers can be used to check and enforce data integrity rules whenever data is inserted into, updated in, or deleted from the table.
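As a sketch of the idea, the SQLite example below attaches an invented trigger (`check_balance`) to an invented `account` table and uses it to reject rows that violate an integrity rule:

```python
import sqlite3

# Minimal SQLite sketch; the table and trigger names are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL);

-- Event-driven procedure attached to the table: fires on every INSERT
-- and aborts the statement when the new row fails the integrity rule.
CREATE TRIGGER check_balance
BEFORE INSERT ON account
WHEN NEW.balance < 0
BEGIN
    SELECT RAISE(ABORT, 'balance may not be negative');
END;
""")

conn.execute("INSERT INTO account VALUES (1, 100.0)")      # passes
blocked = False
try:
    conn.execute("INSERT INTO account VALUES (2, -50.0)")  # trigger aborts
except sqlite3.IntegrityError:
    blocked = True
```

The trigger enforces the rule no matter which program issues the INSERT, which is exactly why triggers enhance data integrity.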

What is a potential drawback of implementing a database auditing solution?

Auditing can be a large consumer of system resources. When the audit queue is full, tasks that generate audit records will wait until the auditing task can resume. Consider using a larger audit queue if performance suffers. As a last resort, discontinue auditing when performance is unacceptable.

What type of problem can be caused when using a table editor to modify data in a production database table?

All columns are available for update, and a simple miskeying can cause unwanted updates.

Why can't standby databases, replication, or disk mirroring be used to replace traditional backup and recovery?

All of these options only maintain copies of the current database; they do not provide a way to return to a previous version of the database.

What is a disaster?

Any unplanned, extended loss of critical business applications due to a lack of computer processing capabilities for more than a 48-hour period.

Denormalization is always a good option for data warehouse databases: true or false?

False. As you design the data warehouse, be alert for situations where denormalization may be beneficial. In general, denormalization speeds data retrieval, which may be desirable for a data warehouse. However, proceed with caution when denormalizing the data warehouse design.

Cite examples of flaws introduced during the design phase of an Internet-enabled database that could lead to the creation of a hostile database.

Assigning inappropriate table and column names, designing the database with output in mind, overstuffing columns, and poorly designing keys.

Define what is meant by entity integrity.

Basic data integrity provided by relational databases requiring that each occurrence of an entity be uniquely identifiable.
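A minimal illustration using SQLite from Python (the `employee` table is made up): the primary key guarantees that each occurrence is uniquely identifiable, so a second row with the same identifier is refused:

```python
import sqlite3

# Entity integrity sketch: the primary key makes each row uniquely identifiable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO employee VALUES (1, 'Ada')")

duplicate_refused = False
try:
    conn.execute("INSERT INTO employee VALUES (1, 'Grace')")  # same identifier
except sqlite3.IntegrityError:
    duplicate_refused = True
```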

Why would it be important to contractually guarantee that the vendor escrow the source code for any tools your organization purchases?

Because the functionality of the tool, and the way it satisfies the needs of your organization, depends on the continued viability of the product vendor. If the vendor goes out of business, escrowed source code allows your organization to continue maintaining the tool. The nature and stability of the product vendor are therefore important considerations as well.

Why are most database files stored on disk devices instead of other storage media?

Because disk devices are the predominant vehicle for persistent data storage: unlike sequential media such as tape, they provide the fast, random access to data that a DBMS requires.

Why is it beneficial to place data files and index files on separate disk devices?

Because when index files are separated from data files, index and data I/O do not contend for the same device and can proceed in parallel, providing better performance.

What is the difference between OLAP and data mining?

Both OLAP and data mining are methods for searching out patterns in corporate data. However, OLAP is user driven, while data mining is computer driven.

Ensuring compliance requires a collaborative effort among which three groups?

Business: must understand the legal requirements imposed on their data and systems as dictated in regulations. Legal: must be involved to interpret the legal language of the regulations and ensure that the business is taking proper steps to protect itself. IT: must be involved to implement the policies and procedures to enact the technology to support the regulatory mandates.

What is the benefit of purchasing database utilities from a third-party vendor when most DBMS products already ship with a set of utilities "in the box"?

By purchasing an online standards manual, new (and even more experienced) shops can quickly come up to speed with each new DBMS. Everyone benefits from these products if the third-party vendor automatically ships updates whenever there is a new version or release of the DBMS product. Online manuals can function as inexpensive training in the new DBMS release.

Discuss the capabilities disaster recovery team members must possess to assure a viable recovery.

Team members must be capable of installing and configuring the DBMS system software; assuring the integration of the DBMS with other system software components; recovering individual databases and testing their integrity; recovering related data that may not be stored in a database; and installing and configuring application software and testing the applications.

What type(s) of tool(s) would be most helpful to an organization that is brand-new to database management, and why?

Catalog Query and Analysis Tools The system catalog or data dictionary contains a wealth of information essential to the operation of the DBMS. Information about all database objects, authority, and recovery is stored and maintained in the system catalog. DBAs rely on this information to perform their job. The system catalog is composed of relational tables and can be queried using SQL and/or vendor-supplied stored procedures

What is connection pooling?

Connection pools are caches of database connections that are maintained so that the connections can be reused when future requests to the database are required.
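A toy sketch of the idea (the class and its names are invented; production systems would use an established pooling library rather than this):

```python
import sqlite3
import queue

# Minimal connection pool sketch: connections are created once and recycled.
class ConnectionPool:
    def __init__(self, size, database):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(sqlite3.connect(database, check_same_thread=False))

    def acquire(self):
        return self._pool.get()          # blocks until a connection is free

    def release(self, conn):
        self._pool.put(conn)             # return the connection for reuse

pool = ConnectionPool(size=2, database=":memory:")
c = pool.acquire()
c.execute("SELECT 1")
pool.release(c)                          # available for the next request
```

Reuse avoids the cost of establishing a fresh connection for every request, which is the whole point of pooling.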

What is the difference between data and information?

Data can be any character, text, words, number, pictures, sound, or video and, if not put into context, means little or nothing to a human. However, information is useful and usually formatted in a manner that allows it to be understood by a human.

What is the purpose of data masking?

Data masking is the process of protecting sensitive information in non-production databases from inappropriate visibility.
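As an illustration of one common masking technique, character substitution (the function and data format are invented; real masking tools offer many more techniques, such as shuffling, variance, and synthetic data generation):

```python
# Sketch: mask a US SSN by keeping only its format and last four digits,
# so the value remains realistic for testing but is no longer identifying.
def mask_ssn(ssn):
    return "***-**-" + ssn[-4:]

masked = mask_ssn("123-45-6789")   # "***-**-6789"
```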

Define database archiving. Contrast it with data purging.

Database archiving is the process of removing selected data records from operational databases that are not expected to be referenced again and storing them in an archive data store where they can be retrieved if needed. Data Purging is the process of removing data from the operational data store completely then discarding the data.

What is e-vailability?

E-vailability is a coined term emphasizing the necessity of having your servers and software working all the time, so that the business is available to customers at any time and from any place, ideally achieving "five nines" (99.999 percent) availability.

What does ETL stand for?

ETL stands for extract, transform, and load

What are the four perspectives of data warehouse performance management?

Extract performance—how smoothly data is updated and refined in the warehouse
Data management—the quality of the data warehouse design and database maintenance
Query performance—how well the database is tuned for analytical processing and how well each query is written
Server performance—the configuration and performance of the hardware

In a client/server environment, there are always multiple clients communicating with only one server: TRUE or FALSE

FALSE. Some server software is capable of distributed computing, in which an environment contains many servers.

Calculating the amount of storage required for a table is as simple as adding up the number of bytes for the columns in a row and multiplying by the total number of rows: true or false?

False. You must also account for overhead such as row headers, page headers, free space, and index storage.

Once the disaster recovery plan is written it should not be changed: true or false?

False. The plan must be kept up-to-date as systems, applications, and personnel change.

DBAs never require business metadata to do their job, and business users never need technology metadata: true or false? Explain why it is true or false.

False. The metadata in the repository can be used to integrate views of multiple systems, helping developers to understand how the data is used by those systems. Usage patterns can be analyzed to determine how data is related in ways that may not be formally understood within the organization.

What is the difference between fat clients and thin clients?

Fat clients are fully fledged, independent machines that connect to an application service. Thin clients have almost no built-in software and very minimal computing hardware; they connect to a central, high-powered machine and perform their operations there.

With regard to data, what are the three categories of risk?

Financial loss, business service interruption, and legal responsibilities.

What factors should be considered when determining the criticality of a database object for disaster recovery planning?

Financial cost of the application being unavailable. Legal responsibilities your organization has for each application and its data. Cost of losing customers and future business due to a delay in bringing the application back online.

Why should the DBA participate in developing a metadata management strategy even in organizations with a well-defined data administration function?

For DBAs, the DBMS itself is a good source of metadata. The system catalog used to store information about database objects is a vital store of DBA metadata—technology metadata. DBAs and developers make regular use of the metadata in the DBMS system catalog to help them better understand database objects and the data contained therein.

What is the difference between a logical backup and an image copy backup?

A logical backup backs up just the data, not the entire physical file; an image copy backup is a physical copy of the entire database file.

Your organization has decided to purchase RAID arrays to support your DBMS. Which levels of RAID would you advise them to use and why?

I would advise them to use RAID-1: it provides fault tolerance (through mirroring) at the price of somewhat reduced write performance, and it is less costly than the other fault-tolerant levels.

When might data need to be purged from the data warehouse?

If large amounts of unused data are identified, the DBA may decide to remove the data from the warehouse. Such action should not be taken indiscriminately. Only after verifying that the data has been unused for at least a year—to take into account seasonal access variations—should any action be taken on that data. The unused data may then be purged from the data warehouse or perhaps moved to cheaper storage media (tape, optical disc, or CD-R, for example). Of course, purged data should be stored for a period of time to assure that it was not purged in error.

Why are metadata and data quality important aspects of regulatory compliance with respect to database systems and applications?

If the data is not accurate, how can you be sure that the proper controls are being applied to the right pieces of data to comply with the appropriate regulations? Metadata identifies which data falls under which regulation, so accurate metadata is just as essential as accurate data.

The DBA should grade each database object in terms of its _______________ and ____________ to determine how frequently it should be backed up

Importance of the data and volatility of the data.

Describe the star schema and why it is an effective design for data warehouse databases.

In a star schema design, a central fact table stores the primary business activity at the center of the star. This fact table is usually based on a numeric value or a group of numeric values. The fact table is surrounded by the dimensions that affect the activity, and the dimension tables make up the points of the star. The star schema is well suited for the design of data warehouse databases for the following reasons:
• The design is flexible enough to facilitate quick changes or additions as the data warehouse grows or the application changes.
• The design is easy to understand for both developers and end users.
• The design mimics the way end users typically think about and use their business data.
• The design is easily implemented as a physical database and results in efficient queries because many DBMSs are built to recognize and optimize star joins.
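The structure can be sketched in miniature with SQLite (all table, column, and value names below are invented):

```python
import sqlite3

# Star schema sketch: one central fact table, two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: the points of the star.
CREATE TABLE date_dim    (date_id INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, name TEXT);

-- Central fact table: numeric business activity keyed to each dimension.
CREATE TABLE sales_fact (
    date_id    INTEGER REFERENCES date_dim(date_id),
    product_id INTEGER REFERENCES product_dim(product_id),
    amount     REAL
);

INSERT INTO date_dim    VALUES (1, 2023), (2, 2024);
INSERT INTO product_dim VALUES (10, 'widget');
INSERT INTO sales_fact  VALUES (1, 10, 100.0), (2, 10, 250.0);
""")

# A typical star join: the fact table joined to each of its dimensions.
rows = conn.execute("""
    SELECT d.year, p.name, SUM(f.amount)
    FROM sales_fact f
    JOIN date_dim d    ON f.date_id = d.date_id
    JOIN product_dim p ON f.product_id = p.product_id
    GROUP BY d.year, p.name
    ORDER BY d.year
""").fetchall()
```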

Describe two ways to recover an index.

Indexes can either be backed up and recovered after a failure like data, or rebuilt from the recovered table data: some DBMSs re-create the indexes when the database is rolled forward after a full backup is restored.

What is metadata?

Information about data is referred to as metadata. The simplest definition of metadata is "data about data." To be a bit more precise, metadata describes data, providing information such as type, length, textual description, and other characteristics.

Name and describe the three types of database failures that may require recovery.

Instance failure: can be caused by internal exceptions within the DBMS, operating system failure, or other software-related database failure.
Application failure: can be caused by programs or scripts that are run at the wrong time, using the wrong input, or in the wrong order.
Media failure: can be caused by damage to disk storage devices, file system failures, tape degradation or damage, or deleted data files.

What is a user-defined data type, and how can it be used to enhance data integrity?

A user-defined data type (UDT) allows the DBA to further clarify the legal values for a column. For example, if a business needs to handle multiple currencies, it can create a UDT for each currency, which helps prevent values in different currencies from being combined in error.

How can LOAD and UNLOAD be used to maintain test beds of data for applications?

LOAD and UNLOAD can be used to maintain a consistent test bed of data for the purposes of testing application programs

What are the considerations for choosing a viable disaster recovery site?

A location far enough away from the primary site not to be involved in the same disaster: on a different power grid and floodplain, and not on the same earthquake fault line as the primary site.

Describe two different approaches to database disaster backup and recovery.

Making it an integral component of your overall business recovery plan and global in scope; and assessing all risks associated with data loss, the impact each would have on your company and business, and which areas need more attention than others.

Why is metadata important to an organization?

Metadata characterizes data. It is used to provide documentation so that data can be understood and more readily consumed by your organization. Metadata answers the who, what, when, where, why, and how questions for users of the data.

What techniques can be used to limit the amount of data that is unloaded?

Most UNLOAD utilities provide options to specify a limited number of rows to unload. These options typically take three forms:
• LIMIT: limits the number of rows to be unloaded by the UNLOAD utility.
• SAMPLE: unloads a sampling of the data instead of the entire table.
• WHEN: supplies SQL predicates to the UNLOAD utility such that only certain data is unloaded.
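Standard SQL analogues of the three forms can be sketched with SQLite (the actual UNLOAD syntax varies by utility, and the table here is invented):

```python
import sqlite3

# Populate a throwaway table with 100 rows, half 'east' and half 'west'.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, region TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(i, 'east' if i % 2 else 'west') for i in range(100)])

# LIMIT: unload only a fixed number of rows.
limited = conn.execute("SELECT * FROM t LIMIT 10").fetchall()

# SAMPLE: unload a random sampling instead of the entire table.
sampled = conn.execute("SELECT * FROM t ORDER BY RANDOM() LIMIT 10").fetchall()

# WHEN: unload only rows matching an SQL predicate.
filtered = conn.execute("SELECT * FROM t WHERE region = 'east'").fetchall()
```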

Why is database design a potential problem area for Internet-enabled database application development projects?

Most of these projects are designed with speed in mind: businesses want their product online as quickly as possible (and understandably so). However, this rushed approach can be detrimental to the long-term lifespan of the database if it is made to work with only a couple of initial applications. A database needs to be designed to be easily used long into the future, which is not something that normally happens overnight; the design needs to be planned out and vetted for as much time as can be allowed.

What is the biggest threat to efficient performance for a distributed database system?

Network Traffic

How are nulls handled when loading data into a table using the LOAD utility?

Nulls are usually handled with an indicator byte in the input record and a corresponding clause in the LOAD control statement that tells the utility to check that byte.
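The indicator-byte idea can be sketched in Python; the record layout below (a one-byte flag, 0xFF meaning NULL, followed by a fixed-width field) is entirely invented, since real LOAD input formats are DBMS-specific:

```python
# Sketch: the data field always occupies its full width, so a separate
# indicator byte is what distinguishes NULL from an actual value.
def parse_field(record, offset, width):
    indicator = record[offset:offset + 1]
    value = record[offset + 1:offset + 1 + width]
    return None if indicator == b"\xff" else value.decode().rstrip()

row = b"\x00Alice     " + b"\xff          "
name = parse_field(row, 0, 10)    # indicator 0x00 -> real value "Alice"
nick = parse_field(row, 11, 10)   # indicator 0xff -> NULL
```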

Discuss the barriers to successful repository implementation.

One of the biggest challenges in implementing and using repository technology is keeping the repository up-to-date. The repository must be populated using data from multiple sources—all of which can change at any time.

What is the biggest DBA challenge when managing an Internet-enabled database?

One of the biggest challenges for DBAs is being asked to get it done NOW. Doing things too quickly can lead to huge problems in the future, and not all business administrators understand that speed is not always the best way to approach database work.

What is PII?

Personally Identifiable Information (PII) is information that can be used to uniquely identify, contact, or locate a person or can be used with other sources to uniquely identify an individual.

A client/ server application comprises what three components?

Presentation, Business Logic, Database.

Name five goals to consider when building a storage system

• Preventing loss of data—the number-one priority
• Assuring that adequate capacity is available and that the storage solution can easily scale as storage needs grow
• Selecting a solution that provides fast access to data with minimal or no interruptions to service
• Choosing storage solutions that are fault tolerant and that can be repaired quickly when a failure occurs
• Selecting a storage solution that allows you to add or replace disks without an outage

What is the difference between a propagation tool and a replication tool?

Propagation tools capture data as it changes at the source (external applications and heterogeneous databases) and then insert it into target database tables. Replication tools extract data from external application systems and heterogeneous databases for population into target database tables

Describe the pros and cons of using RAID-4 for database files.

Pros: RAID-4 provides good performance for sequential data access. Cons: its single dedicated parity disk can become a bottleneck for writes, and it is not used very much in practice.

What storage management actions should be taken to assure the viability and speed of the transaction log file(s)?

Provide sufficient space for the log file to capture all database modifications that will occur between log archivals during your busiest processing period.

Name the different types of recovery and discuss the factors that influence when each type of recovery should be performed

Recover to current: the recovery process resets the contents of the database to the way it looked at (or right before) the point of failure. This is for applications that will not work unless they are brought completely back to where they were.
Point-in-time recovery: like recover to current, except the database is recovered to a specific past time; you restore the image copy and then roll forward, or roll backward through the logs, to the point just before the error.
Transaction recovery: an application recovery whereby the effects of specific transactions during a specified time frame are removed from the database. UNDO recovery removes only the effects of the bad transactions; REDO recovery removes all transactions after a given point in time and then redoes only the good transactions.
Off-site disaster recovery: performed at a remote site when a disaster, such as a natural catastrophe, makes the primary site unusable.

Describe the difference between database structural integrity and semantic data integrity.

Semantic data integrity: keeping the actual data and its relationships pure and maintained. Structural integrity: keeping the underlying structures, such as tables, indexes, and tablespaces, sound and maintained.

It is a good idea to perform a regularly planned disaster recovery test. However, what other events might cause you to test your disaster recovery plan more frequently?

The plan should be tested regularly to assure the readiness of your personnel. You should also test the plan after any of these events:
• Significant change in daily operations
• Change in system hardware configuration
• Upgrade of the DBMS
• Loss of personnel responsible for the recovery
• Move of the primary data center to a new location
• Change in daily backup procedures
• Addition of major new applications or significant upgrades of existing critical applications
• Major increase in the amount of data or the number of daily transactions

Is it necessary to develop a backup and recovery plan for the data warehouse, since it is a read-only environment?

Yes. For all of the reasons you need a backup and recovery plan for traditional databases, you need one for the data warehouse databases as well. Of course, the backups need be taken only when data is added to the data warehouse.

What are transition variables and tables, and what benefit do they provide?

Transition variables and tables hold the before and after images of data affected by a trigger while the trigger runs. For example, if a trigger fires because a row was deleted, there would otherwise be nothing for the trigger to examine, since the row is already gone; the transition table holds the deleted values so they can still be checked.

What is the difference between SAN and NAS?

Storage area network (SAN) generally refers to an interconnected network of storage devices. Network-attached storage (NAS) refers to a storage device that can be accessed directly over the network.

Name the three categories of performance tools and briefly summarize their typical capabilities.

System performance tools examine the database server, its configuration, and its usage; the most commonly used system performance tool is the performance monitor. Database performance tools monitor and analyze database performance, supporting many types of performance-oriented requests, and some can proactively and automatically monitor your database environment. Application performance tools help the application development team analyze and tune the SQL statements written to access database tables.

Describe the purpose of the start and end date/time columns within the context of a temporal database.

Temporal databases keep track of data over time. The start and end date/time columns record the period during which each row is valid (or was present in the database), which is what makes "as of" queries possible.
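A minimal sketch with SQLite (table, columns, and dates invented): the start/end columns let you ask which row was in effect at a given moment:

```python
import sqlite3

# Temporal sketch: each row carries the period during which it is valid.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE price_hist (
    item TEXT, price REAL, start_date TEXT, end_date TEXT)""")
conn.executemany("INSERT INTO price_hist VALUES (?, ?, ?, ?)", [
    ('widget', 1.00, '2023-01-01', '2023-07-01'),
    ('widget', 1.25, '2023-07-01', '9999-12-31'),   # current row: open-ended
])

# "As of" query: which price was in effect on a given date?
as_of = '2023-03-15'
row = conn.execute("""
    SELECT price FROM price_hist
    WHERE item = 'widget' AND start_date <= ? AND ? < end_date
""", (as_of, as_of)).fetchone()
```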

What is the purpose of a catalog visibility tool?

The basic feature common to all catalog tools is the capability to request catalog information using a GUI (or panel-driven) interface without using SQL statements.

Your organization deploys PCs, midrange systems, and mainframe computers. Discuss the three layers of a client/server application and which computing platform would work best for each layer.

The three layers are presentation logic, business logic, and the database management system. Presentation logic consists of the tasks required to display information on a computer screen. Business logic consists of the core elements of an application needed by the end user to manipulate information for managing the business. The DBMS is the software that manages the data. PCs generally work best for presentation logic tasks; midrange or mainframe servers handle business logic well; and the DBMS runs best on dedicated mainframes.

What is meant by the term log archival, and why is it important to assure recoverability?

Log archival is the process of copying filled active log files to archive logs so that a complete record of database changes is retained. It is important because a backup does not contain the most recent changes; recovering to current requires applying the archived and active logs to the restored backup.

Describe the two-phase COMMIT process.

The two-phase COMMIT provides a consistent outcome, guaranteeing the integrity of the data across platforms, regardless of communication or system failures. One of the database servers acts as the coordinator of the two-phase COMMIT, while the other database servers are participants. In the first phase, the coordinator asks each participant to prepare and vote on the commit; in the second phase, if all participants vote yes, the coordinator instructs them all to commit, otherwise all of them roll back.
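A toy Python sketch of the protocol (no real DBMS involved; the class and function names are invented):

```python
# Toy coordinator/participant sketch of two-phase COMMIT.
class Participant:
    def __init__(self, can_commit=True):
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):                 # phase 1: vote yes or no
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):                  # phase 2: make changes permanent
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"

def two_phase_commit(participants):
    # Phase 1: the coordinator asks every participant to prepare.
    if all(p.prepare() for p in participants):
        for p in participants:         # Phase 2: unanimous yes -> commit all
            p.commit()
        return "committed"
    for p in participants:             # any no vote -> roll back everywhere
        p.rollback()
    return "aborted"

outcome = two_phase_commit([Participant(), Participant()])
```

The unanimity requirement is what produces the consistent outcome: either every participant commits, or none does.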

What is the preferred method of implementing referential integrity in a relational database?

Declarative referential integrity: defining primary key and foreign key constraints in the DDL and letting the DBMS enforce them, rather than using triggers or application code.

If you are asked by management to develop a cost justification for DBA tools, how can you minimize the amount of work it takes to produce such a justification?

One way to minimize the work is to compare the cost of the tools against the cost of creating equivalent tools yourself, although building your own tools is not always successful.

Name the four types of database drivers, and discuss the pros and cons of each type.

There are actually two main types of drivers, with one of those types having four different architectures. The first type of driver is ODBC (Open Database Connectivity) and the second is JDBC (Java Database Connectivity). ODBC drivers are coded in C and can theoretically work with any DBMS or operating system. JDBC drivers are coded in Java and were initially created to work with Java-based applications.

The four JDBC architectures are bridge, client-based, network-protocol, and database wire-protocol drivers. A bridge driver accepts JDBC calls and relays them as ODBC calls; however, these drivers are platform dependent, are not secure, and perform poorly. Client-based JDBC drivers are proprietary drivers installed on client machines that contain the DBMS front-end software; they tend to do what is required, but each installation must be managed and upgraded independently, which causes a lot of work. Network-protocol drivers sit between the calling program and the database; they are quick, with little latency, but can be difficult to deploy because they are written for specific versions of specific databases. Database wire-protocol drivers are among the best drivers for JDBC implementations: they install directly on the client's JVM and interact directly with the database itself, performing well without the per-client maintenance burden of client-based drivers.

When calculating record size for a table, how should you treat the size calculation for variable-length columns?

The length of a column depends on its data type. For variable-length columns, you can either use the maximum length (a conservative estimate) or an average length (a tighter estimate). To calculate the length of the entire record, add the per-row overhead required for the type of row (variable or fixed length) to the sum of the column lengths.
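A back-of-the-envelope sketch in Python; the overhead byte counts are invented assumptions, since the real values are DBMS-specific:

```python
# Assumed overheads for illustration only; real values vary by DBMS.
FIXED_ROW_OVERHEAD = 4       # assumed per-row header bytes
VARLEN_COL_OVERHEAD = 2      # assumed length-prefix bytes per variable column

def estimate_row_size(fixed_cols, var_cols, use_max=True):
    """fixed_cols: list of byte lengths; var_cols: list of (avg, max) lengths."""
    size = FIXED_ROW_OVERHEAD + sum(fixed_cols)
    for avg, mx in var_cols:
        # Conservative sizing uses the maximum; average gives a tighter estimate.
        size += (mx if use_max else avg) + VARLEN_COL_OVERHEAD
    return size

# Two fixed columns (4 and 8 bytes) plus one VARCHAR averaging 20, max 50.
conservative = estimate_row_size([4, 8], [(20, 50)])                 # 68 bytes
typical      = estimate_row_size([4, 8], [(20, 50)], use_max=False)  # 38 bytes
```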

Why is it a good idea to rank your applications and database objects for disaster recovery planning purposes?

To determine which applications have the biggest impact if they are not available.

Name four factors that impact the duration of a recovery.

Transaction identification, data integrity, speed, availability, invasiveness

Prosecution for compliance failure is not just an idle threat: true or false?

True. It's very real and harsh.

Some organizations have identified the need to retain data for more than 100 years: true or false?

True.

What factors determine whether a full or incremental image copy backup should be taken for a database object?

Primarily, how long each will take. If a full backup is not required, it can be more efficient to take an incremental backup; however, multiple incremental backups must later be applied to a full backup when recovering, which lengthens the recovery.

Describe a technique for unloading data from multiple tables into a single UNLOAD file.

Unloading from Views: Being able to unload data from a view comes in handy when it is necessary to access data from multiple tables and place it into a single file. By creating the view as a join of those tables and unloading from the view, you can unload data from multiple tables to a single file.
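The technique can be sketched with SQLite (tables, view, and data are invented), using an in-memory buffer to stand in for the unload file:

```python
import sqlite3
import csv
import io

# Sketch: join two tables in a view, then "unload" the view to a single file.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
CREATE TABLE emp  (emp_id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER);
INSERT INTO dept VALUES (1, 'IT');
INSERT INTO emp  VALUES (100, 'Ada', 1);
CREATE VIEW emp_dept AS
    SELECT e.emp_id, e.name, d.dept_name
    FROM emp e JOIN dept d ON e.dept_id = d.dept_id;
""")

buf = io.StringIO()                      # stands in for the unload file
writer = csv.writer(buf)
for row in conn.execute("SELECT * FROM emp_dept"):
    writer.writerow(row)                 # one file, data from both tables
unloaded = buf.getvalue()
```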

Name the seven questions that a database auditing tool should be able to answer.

1. Who accessed the data?
2. At what date and time was the access?
3. What program or client software was used to access the data?
4. From what location was the request issued?
5. What SQL was issued to access the data?
6. Was the request successful, and if so, how many rows of data were retrieved?
7. If the request was a modification, what data was changed? (A "before" and "after" image of the change should be accessible.)

Name and describe four types of database structural integrity problems that may be encountered by a DBA.

Index corruption: an index provides an alternate path to the data using an ordered b-tree structure whose leaf pages contain pointers to physical data locations in the base table. If the pointers do not point to the correct data, the index is useless and will return incorrect data, effectively corrupting the database.
Page header corruption: if a page header becomes corrupted, the DBMS can no longer interpret the data stored on that page; such situations require the database file to be recovered from backups.
Backup file corruption: backup files that are formatted incorrectly, backed up to the wrong location, or damaged by media failures cannot be used for recovery.

Describe the three RI DELETE rules and the impact of each on foreign key rows in the child table when a DELETE is issued for the primary key row in the parent table.

• Restricted DELETE: deletion of the primary key row is not allowed if a foreign key value referencing it exists.
• Neutralizing DELETE: foreign key values equal to the primary key value of the row being deleted are set to null.
• Cascading DELETE: all foreign key rows with a value equal to the primary key of the row being deleted are deleted as well.
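The three rules map directly onto standard SQL foreign key actions (RESTRICT, SET NULL, CASCADE), sketched here with SQLite; the table names are invented:

```python
import sqlite3

# One parent table, three child tables, one per DELETE rule.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite requires this pragma
conn.executescript("""
CREATE TABLE parent (id INTEGER PRIMARY KEY);
CREATE TABLE child_restrict (pid INTEGER REFERENCES parent(id) ON DELETE RESTRICT);
CREATE TABLE child_neutral  (pid INTEGER REFERENCES parent(id) ON DELETE SET NULL);
CREATE TABLE child_cascade  (pid INTEGER REFERENCES parent(id) ON DELETE CASCADE);
INSERT INTO parent VALUES (1), (2), (3);
INSERT INTO child_restrict VALUES (1);
INSERT INTO child_neutral  VALUES (2);
INSERT INTO child_cascade  VALUES (3);
""")

restricted = False
try:
    conn.execute("DELETE FROM parent WHERE id = 1")    # RESTRICT: refused
except sqlite3.IntegrityError:
    restricted = True

conn.execute("DELETE FROM parent WHERE id = 2")        # SET NULL: fk nulled
conn.execute("DELETE FROM parent WHERE id = 3")        # CASCADE: child deleted
```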

At a high level, what are the two types of metadata?

Technology metadata describes the technical aspects of the data as it relates to storing and managing the data in computerized systems. Business metadata describes aspects of how the data is used by the business and is needed for the data to have value to the organization.

Why is the DBMS system catalog useful for metadata analysis?

The DBMS system catalog is a particularly effective source of metadata because it is active, integrated, and nonsubvertible. It is active because the metadata is automatically built and maintained as database objects are created and modified. Its integration with the DBMS, coupled with its active nature, keeps the technology metadata in the system catalog accurate and up-to-date. Additionally, the system catalog is nonsubvertible, meaning that normal DBMS operations are the only mechanism for populating it.

Name two traditional DBA duties that may require additional attention in the light of a regulatory compliance project.

Change management, and backup and recovery.

Why is a quiesce point important for a point-in-time recovery?

A quiesce point establishes a point of consistency at which no transactions are in flight, so recovering related objects to that point leaves the data transactionally consistent, with the relationships in the data intact.

What does the acronym ETL mean?

extract, transform, and load.

Describe a few data cleansing techniques that may need to be applied to data before it can be loaded into the data warehouse.

Identification of invalid and coded data, and transformation of that data into useful, queryable information.

Discuss the various options that must be considered specifically when creating the physical design of the data warehouse

The physical design of the data warehouse must be considered on its own terms, instead of just mimicking the design of similar production OLTP databases. Common physical design techniques used for OLTP databases, such as a database in third normal form, can in fact inhibit an OLAP database from performing well. This is so because the data access requirements for file and table structures, indexes, and clustering sequence in a data warehouse differ from those of OLTP databases, as do the access requirements and patterns.

When designing a data warehouse, how should problems that are identified in the operational OLTP systems be handled?

You will need to examine the budget and deadlines associated with the data warehouse project to determine whether production problems can be remedied during the project. If not, you can fix the data problems as the data flows into the data warehouse. You should also plan on tracking the source of the errors in the production systems, if possible, so that a problem, once identified, is at least documented.

Name and define the three capabilities used to describe distributed technology and databases.

· Autonomy: the degree to which each site in the distributed environment can operate independently.
· Isolation: the degree to which the individual sites are separate from, and unaware of, one another.
· Transparency: the degree to which users and applications need not know where the data physically resides.

What issues and constructs can cause data integrity problems after a LOAD?

· If the database and application rely on a trigger to calculate data or enforce constraints, because most LOAD utilities do not fire triggers.
· If data is being loaded that does not conform to a unique constraint.

What advantages does a SAN have over SCSI devices?

• Shared storage among multiple hosts • High I/O performance • Server and storage consolidation

