Exam 3 ALL

What are common functions/tasks of a DBA?

- Auditing - Application Integration - Backup and recovery - Business Intelligence/Data warehousing - Capacity planning - Change management - Database application development

How does authentication differ from access control?

- Authentication is the process of verifying who you are. When you log on to a PC with a username and password you are authenticating. - Authorization is the process of verifying that you have access to something. Gaining access to a resource (e.g. directory on a hard disk) because the permissions configured on it allow you access is authorization.

When should NoSQL be used?

- Client wants 99.999% availability on a high traffic site. - Your data makes no sense in SQL, you find yourself doing multiple JOIN queries for accessing some piece of information. - You are breaking the relational model, you have CLOBs that store denormalized data and you generate external indexes to search that data.

What are key success factors for BI implementation?

- Conduct research and don't go it alone - Accuracy of data - Self-service business intelligence: fast, trustworthy, accessible - Make your key customers part of your project team - Ensure that business and customer needs drive the technology

What is the process of building a data warehouse?

- Determine business objectives - Collect and analyze information - Identify core business processes - Construct a conceptual data model - Locate data sources and plan data transformations - Set tracking duration - Implement the plan

Why has NoSQL become a popular solution for some organizations?

- Improved ability to keep data consistent - Faster access to data than relational database management systems (RDBMS) - More easily allows for data to be held across multiple servers

What are characteristics of a Database Administrator's work?

- Installation of database software and servers
- Communicating the hardware prerequisites and requirements to the systems administrator to ensure efficiency
- Configuration of the database software program before it is deployed
- Installation of updates and new patches within the program
- Transferring old data to new systems when deployed
- Development and implementation of backup and recovery plans
- Testing of the recovery plan that is currently being implemented
- Returning the system to operational status when there is a shutdown or failure
- Communicating the risks and the trade-offs of the backup methods that are available
- Development of a security model to keep data secure from viruses and attacks
- Monitoring storage space and capacity planning
- Monitoring the performance of the server hardware and how it works with the OS
- Troubleshooting the issues on the server and finding a solution quickly

What are major responsibilities of a data warehouse administrator?

- Monitor the data and activities inside the data warehouse. - Administer the data warehousing environment, including security, data growth, performance, platform upgrades, support agreements, and disaster recovery. - Administer the data warehousing services, including query services and production services. - Support the data warehousing users.

What are 2 benefits of a data warehouse?

- Potential high returns on investment - Competitive advantage - Increased productivity of corporate decision-makers - More cost-effective decision-making - Better enterprise intelligence

How does business intelligence differ from data warehousing?

- Data warehousing deals with managing the development, implementation, and operation of a data warehouse. It also covers metadata management, data cleansing, data transformation, data acquisition, persistence management, and data archiving. - In business intelligence, the organization analyzes measurements of aspects of the business such as sales, marketing, efficiency of operations, profitability, and market penetration within customer groups. Business intelligence typically encompasses OLAP, data visualization, data mining, and reporting tools.

What are the main disadvantages of the dimensional approach of Ralph Kimball?

1. In order to maintain the integrity of facts and dimensions, loading the data warehouse with data from different operational systems is complicated, and 2. It is difficult to modify the data warehouse structure if the organization changes the way in which it does business.

What is SQL?

A declarative or non procedural set-oriented query language.

What is a dimension table?

A dimension table is a table in a star schema of a data warehouse. A dimension table stores attributes, or dimensions, that describe the objects in a fact table.

What is Apache Hadoop?

A framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.

How does a procedure differ from a function?

A function returns a value and a procedure only executes commands.

What is Database performance tuning?

A group of activities used to optimize and homogenize the performance of a database

What is the Systems DBA managerial role?

A system DBA focuses on technical rather than business issues, primarily in the system administration area.

What is a data-driven decision support system?

A type of DSS that emphasizes access to and manipulation of a time-series of internal company data and sometimes external data.

What is an enterprise data warehouse?

A unified database that holds all the business information of an organization.

What are types of information security control?

Administrative, Logical, Physical

After you create a key for a record in a key-value store, how do you store the fields that are available? Are all records the same length?

All fields are stored together in the value record, separated by a delimiter or encoded as XML.
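
A minimal sketch of the delimiter approach, using a plain Python dictionary as a stand-in for a key-value store. The `put`/`get` helpers, the `cust:` key prefix, and the `|` delimiter are illustrative assumptions, not part of any particular product. It also suggests why records need not share a length: the store treats each value as one opaque string.

```python
# Hypothetical in-memory stand-in for a key-value store.
store = {}

def put(key, fields, sep="|"):
    """Pack all fields into one delimited value string under the key."""
    store[key] = sep.join(fields)

def get(key, sep="|"):
    """Unpack the delimited value string back into its fields."""
    return store[key].split(sep)

put("cust:1", ["Bob", "5 Oak St.", "sailing"])
put("cust:2", ["Ann"])  # fewer fields, so a shorter value

print(get("cust:1"))
print(len(store["cust:1"]), len(store["cust:2"]))
```

The application, not the store, has to know the field order and pick a delimiter that never occurs inside a field.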

What is analytics?

Analytics - creating quantitative processes for a business to arrive at fact-based decisions and perform business knowledge discovery. Frequently involves: data mining, process mining, statistical analysis, predictive analytics, predictive modeling, business process modeling, complex event processing and prescriptive analytics.

What company originally developed the NoSQL database BigTable? A) Amazon. B) Facebook. C) Google. D) Twitter.

Ans: C

What is an example of a post-relational database that is NOT a NoSQL database? A) BigTable. B) CouchDB. C) Hive. D) SQLite.

Ans: D

What is the normal form of a Fact table in a Star schema? A) Completely denormalized B) Normalized to at least 3rd normal form C) Normalized to 2nd normal form D) Partially denormalized to speed query performance

Ans: D

What is the term used in data warehousing for a value or measurement? A) Attribute B) Dimension C) Domain D) Fact

Ans: D

What test is used to assess an online transaction processing environment? A. BASE. B. Benchmark. C. OLTP. D. ACID.

Ans: D

What is an XML database? A. A category of NoSQL databases. B. A database that allows data to be specified in eXtensible Markup Language format. C. A type of document-oriented database. D. A and B. E. All of the above.

Ans: E

The degree to which the administration of a database is _______________ dictates the skills and personnel required to manage databases.

Automated

________________ causes all data changes in a transaction to be made permanent.

COMMIT

What SQL statements are used in large, multiuser database systems to control transactions, i.e., sequences of changes to a database?

COMMIT and ROLLBACK
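
As a rough illustration, the sketch below runs one transaction that is committed and one that is rolled back, using Python's built-in `sqlite3` module. The `account` table and its values are made up for the example.

```python
import sqlite3

# isolation_level=None puts sqlite3 in autocommit mode, so the explicit
# BEGIN / COMMIT / ROLLBACK statements below control the transactions.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100), (2, 50)")

# Transaction 1: move 30 from account 1 to account 2, then make it permanent.
conn.execute("BEGIN")
conn.execute("UPDATE account SET balance = balance - 30 WHERE id = 1")
conn.execute("UPDATE account SET balance = balance + 30 WHERE id = 2")
conn.execute("COMMIT")

# Transaction 2: start another change, then undo it.
conn.execute("BEGIN")
conn.execute("UPDATE account SET balance = 0 WHERE id = 1")
conn.execute("ROLLBACK")

balances = conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()
print(balances)  # [(1, 70), (2, 80)]
```

Only the committed transfer survives; the rolled-back update leaves no trace.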

What is the problem of database security?

Concerns the use of a broad range of information security controls to protect databases (potentially including the data, the database applications or stored functions, the database systems, the database servers and the associated network links) against compromises of their confidentiality, integrity and availability.

A ____________ dimension has the same values for all areas of a business; it is a dimension that has the same meaning to every fact with which it relates.

Conformed

What does C A P stand for in CAP theorem?

Consistency, Availability, Partition Tolerance

Apache _________________________ is a database that uses JSON for documents and JavaScript for MapReduce queries.

CouchDB

What company originally developed the NoSQL database Apache Cassandra?

Facebook

A _________________ is a value or measurement about a specific event.

Fact

In data warehousing, a _____________ table consists of the measurements or metrics of a business process.

Fact Table

Why would anyone want to create a star schema data model?

Fast interactive analyzing of data

What is an example of network security controls?

Firewalls, Access control

What are the Set operators represented in the Venn diagrams?

INNER JOIN, LEFT JOIN, RIGHT JOIN, FULL JOIN

The _______________ operator takes the results of two queries and returns only rows that appear in both result sets.

INTERSECT
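
A small sketch comparing INTERSECT with the other set operators, run through Python's `sqlite3` module. The tables `q1` and `q2` and the `run` helper are hypothetical; note that SQLite spells the difference operator EXCEPT, where Oracle uses MINUS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE q1 (name TEXT)")
conn.execute("CREATE TABLE q2 (name TEXT)")
conn.executemany("INSERT INTO q1 VALUES (?)", [("Ann",), ("Bob",), ("Cy",)])
conn.executemany("INSERT INTO q2 VALUES (?)", [("Bob",), ("Cy",), ("Dee",)])

def run(op):
    """Combine the two single-column queries with the given set operator."""
    sql = f"SELECT name FROM q1 {op} SELECT name FROM q2 ORDER BY name"
    return [row[0] for row in conn.execute(sql)]

in_both = run("INTERSECT")   # rows that appear in both result sets
in_either = run("UNION")     # rows that appear in at least one result set
only_first = run("EXCEPT")   # rows in the first result set but not the second

print(in_both)     # ['Bob', 'Cy']
print(in_either)   # ['Ann', 'Bob', 'Cy', 'Dee']
print(only_first)  # ['Ann']
```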

The concept of ___________ was introduced into SQL to handle "missing data" in the relational model.

NULL

What relational algebra operator is called "bowtie" because of the symbol used to represent it in notation?

Natural Join

When are the 1) Header, 2) Declaration, 3) Executable, and 4) Exception sections used?

1) Header: only used for named procedures and functions. 2) Declaration: optional. 3) Executable: mandatory. 4) Exception: optional.

How does a data warehouse differ from an operational system?

Operational systems maintain records of daily business transactions whereas a Data Warehouse is a special database that serves as the integrated repository of company data, for reporting and decision support purpose. In other words operational systems are where the data is put in, and the data warehouse is where we get the data out.

What are five techniques that are used to enhance security?

Password, Authorization, Antivirus, Firewall, Data backup

What is the meaning of PL/SQL?

Procedural Language/Structured Query Language

What are distinct purposes of backups?

1) Recover data after it is lost. 2) Recover data from an earlier time.

If you don't have enough information available when you design a query to determine which rows you want, what would you do?

SELECT * with no filters

In what clauses can you place a subquery?

SELECT, FROM, WHERE, HAVING, INSERT, UPDATE, DELETE

Why did Google develop its own database?

Scalability and better control of performance

What are major problems/limitations with NoSQL databases?

Security, Data consistency, Lack of standardization, Scalability caveats

What are the primary purposes of a database?

Store and Record Data, Provide Data to Applications, Provide a Record of Transactions (Hopefully Permanent and Unalterable)

Queries can be nested so that the results of one query can be used in another query. A nested query is also known as a _____________.

Subquery

What is the purpose of the MAP function in the MapReduce Framework?

Takes a series of key/value pairs, processes each, and generates zero or more output key/value pairs. The input and output types of the map can be (and often are) different from each other.
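
The toy word count below sketches this in plain Python, without any MapReduce framework. The names `map_fn` and `reduce_fn` and the sample records are assumptions for illustration. The map input is a (line number, text) pair, while its output is a (word, count) pair, so the input and output types differ, and an empty line produces zero output pairs.

```python
from collections import defaultdict

def map_fn(line_number, line):
    """Map: one (line_number, text) pair in, zero or more (word, 1) pairs out."""
    for word in line.lower().split():
        yield (word, 1)

def reduce_fn(word, counts):
    """Reduce: all counts collected for one word in, one (word, total) pair out."""
    return (word, sum(counts))

records = [(1, "to be or not to be"), (2, ""), (3, "be quick")]

# Shuffle step: group every mapped value under its output key.
grouped = defaultdict(list)
for key, value in records:
    for out_key, out_value in map_fn(key, value):
        grouped[out_key].append(out_value)

word_counts = dict(reduce_fn(word, counts) for word, counts in grouped.items())
print(word_counts)
```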

In 1983, what company introduced a database management system specifically designed for decision support?

Teradata

The UNION compatible rule means the tables have same number of columns and _____________.

The columns must have the same data types in the same order as the first table

What is data granularity or data grain?

The grain of the dimensional model is the finest level of detail that is implied when the fact and dimension tables are joined.

What is a slowly changing dimension?

They are dimensions that change slowly over time, rather than changing on a regular, time-based schedule.

What is the three dimensional mapping in Google BigTable? What are the row key, column key and timestamp?

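The map is indexed by (row key, column key, timestamp). Row keys and column keys are arbitrary string values; the timestamp identifies the version of the stored value.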

What is the purpose of the HAVING clause?

To filter grouped data. WHERE cannot be used with aggregate functions, so HAVING specifies that the SQL statement should only return rows (groups) where aggregate values meet the specified conditions.
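
A brief sketch of WHERE and HAVING working together, executed with Python's `sqlite3` module. The `sale` table and its figures are invented for the example: WHERE filters individual rows before grouping, and HAVING filters the groups afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sale VALUES (?, ?)",
    [("East", 100), ("East", 250), ("West", 40), ("West", 30), ("North", 500)],
)

rows = conn.execute(
    """
    SELECT region, SUM(amount) AS total
    FROM sale
    WHERE amount >= 40          -- row filter: drops the single 30 sale
    GROUP BY region
    HAVING SUM(amount) > 200    -- group filter: drops West (total 40)
    ORDER BY region
    """
).fetchall()
print(rows)  # [('East', 350), ('North', 500)]
```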

What is a discrete unit of work called that must be processed completely or not at all in an operational system?

Transaction

What is it called when changes to the database are recorded in a journal file to facilitate automatic recovery of an in-memory database?

Transaction logging
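
The idea can be sketched with a dictionary as the in-memory database and a text file as the journal. The file layout (one JSON entry per line) and the helper names are assumptions for illustration, not how any specific product stores its log.

```python
import json
import os
import tempfile

def apply_entry(db, entry):
    """Apply one logged change to the in-memory dictionary."""
    if entry["op"] == "set":
        db[entry["key"]] = entry["value"]
    elif entry["op"] == "delete":
        db.pop(entry["key"], None)

def log_and_apply(db, journal_path, entry):
    """Record the change in the journal first, then apply it in memory."""
    with open(journal_path, "a") as journal:
        journal.write(json.dumps(entry) + "\n")
    apply_entry(db, entry)

def recover(journal_path):
    """Rebuild the in-memory state by replaying the journal from the start."""
    db = {}
    with open(journal_path) as journal:
        for line in journal:
            apply_entry(db, json.loads(line))
    return db

journal_path = os.path.join(tempfile.mkdtemp(), "journal.log")
db = {}
log_and_apply(db, journal_path, {"op": "set", "key": "a", "value": 1})
log_and_apply(db, journal_path, {"op": "set", "key": "b", "value": 2})
log_and_apply(db, journal_path, {"op": "delete", "key": "a"})

# Simulate a restart: the in-memory copy is gone, only the journal remains.
recovered = recover(journal_path)
print(recovered)  # {'b': 2}
```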

In what type of join are all columns from each table that is joined returned, and an instance for each row of each table?

UNION JOIN

The _______________ operator takes the results of two queries and returns only rows that appear in both result sets.

INTERSECT

How many queries can be nested in a Where clause of an Outer query?

Unlimited number only constrained by computer memory, DBMS and processor speed.

What is it called when a SELECT query has been given a name and saved in the database?

User View/Stored Query

When are main memory databases often used?

When time response is critical

What is ETL? A) Enter, Transact and Load actions to populate a data warehouse B) Extract, Transform and Load actions to populate a data warehouse C) Extract, Transact and Load actions to populate a database

Ans: B

What is JSON? A) Java Simple Object Notation B) JavaScript Object Notation C) JavaScript Object Numbering

Ans: B

How does authentication differ from access control?

Authentication verifies your identity, and it is a prerequisite for authorization. Authorization policies define what an individual identity or group/role may access. Access controls, also called permissions or privileges, are the methods used to enforce such policies. For example, the user ProjectManager has the ModifyProject privilege on a data realm comprising Team A's projects.

What are controls called that are designed to restrict access and activities?

Authorization

What is changing the database design to improve the performance called? A. refining the design B. optimizing the design C. tuning the design D. modifying the design

Ans: C

What is the relationship between operational data and a data warehouse? A) The data warehouse consists of many data marts and operational data B) The data warehouse is used as a source for the operational data C) The operational data are used as a source for the data warehouse

Ans: C

What term is used for descriptions of the data contained in the data warehouse? A. Relational data. B. Operational data. C. Metadata. D. Informational data.

Ans: C

What type of data in a data warehouse is never found in the operational database environment? A. normalized. B. informational. C. summary. D. denormalized.

Ans: C

Why is database administration such a challenging job?

- The position requires a broad spectrum of knowledge - The consequences of failure are usually greater for a DBA - The better a DBA does their job the less visibility they have

What are key security techniques?

Access controls. Authentication. Authorization. Internal controls.

As database administration automation increases, what is the impact on personnel needs?

Fewer needed

In what type of join are the rows that do not have matching values in common columns nonetheless included in the result table?

OUTER JOIN

The _________________________ is the simplest style of data mart schema. It consists of one or more fact tables referencing any number of dimension tables.

Star Schema

What does the CAP theorem assert about a distributed, networked system?

States that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees: consistency, availability, and partition tolerance.

Queries can be nested so that the results of one query can be used in another query via a relational operator or aggregation function. A nested query is also known as a _____________.

Subquery

What type of DBA focuses on physical aspects of database administration like installation and configuration?

Systems DBAs

What are the disadvantages of using a Star schema?

The main disadvantage of any denormalized schema is that data integrity is not enforced as it would be in a normalized database. Extensive data redundancy is common in a Star schema. Inserts and updates can result in data anomalies which normalized schemas are designed to avoid.

What type of subquery is executed before the outer query and is executed only once?

Type I subquery or Non-Correlated

A _____________________ is a subquery that uses values from the outer query. The subquery is evaluated once for each row processed by the outer query.

Type II Subquery

What type of subquery is nested inside another outer query from which it uses values?

Type II subquery or Correlated
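
To make the two types concrete, the sketch below runs one of each with Python's `sqlite3` module. The `emp` table and its salaries are hypothetical. The Type I inner query can be evaluated once on its own, while the Type II inner query refers to `e.dept` from the outer row and so is logically evaluated per outer row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [("Ann", "IT", 90), ("Bob", "IT", 80), ("Cy", "HR", 50), ("Dee", "HR", 60)],
)

# Type I (non-correlated): the inner query has no reference to the outer query.
# The overall average is 70, so Ann and Bob qualify.
above_overall = [r[0] for r in conn.execute(
    "SELECT name FROM emp "
    "WHERE salary > (SELECT AVG(salary) FROM emp) ORDER BY name"
)]

# Type II (correlated): the inner query uses e.dept from the outer row.
# IT averages 85 and HR averages 55, so Ann and Dee qualify.
above_dept = [r[0] for r in conn.execute(
    "SELECT name FROM emp e "
    "WHERE salary > (SELECT AVG(salary) FROM emp WHERE dept = e.dept) "
    "ORDER BY name"
)]

print(above_overall)  # ['Ann', 'Bob']
print(above_dept)     # ['Ann', 'Dee']
```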

What is the Relational Algebra notation for UNION, INTERSECTION, and MINUS Set Operators?

UNION: ∪, INTERSECTION: ∩, MINUS: −

Which of the following is the preferred way to recover a database after a transaction in progress terminates abnormally? A. Rollback B. Rollforward C. Switch to a duplicate database D. Reprocess transactions

Ans: A

How does the Kimball approach to design a data warehouse differ from the Inmon approach?

Kimball (bottom-up): in the bottom-up approach, data marts are first created to provide reporting and analytical capabilities for specific business processes. Inmon (top-down): in the top-down approach to data warehouse design, the data warehouse is designed using a normalized enterprise data model. "Atomic" data at the lowest level of detail are stored in the data warehouse. Dimensional data marts containing data needed for specific business processes or specific departments are created from the data warehouse.

How does a Left Outer Join differ from a Right Outer Join?

A Left Outer Join includes unmatched rows from the table written on the left. A Right Outer Join includes unmatched rows from the table written on the right.
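
A short sketch of the difference using Python's `sqlite3` module, with made-up `customer` and `orders` tables. Because older SQLite builds may not accept RIGHT OUTER JOIN, the right outer join is written here in its equivalent form: a left outer join with the two tables swapped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER)")
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
# Order 11 points at customer 3, which does not exist.
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(10, 1), (11, 3)])

# customer LEFT OUTER JOIN orders: keeps Bob, who has no order.
left_rows = conn.execute(
    "SELECT c.name, o.id FROM customer c "
    "LEFT OUTER JOIN orders o ON o.customer_id = c.id ORDER BY c.id"
).fetchall()

# Equivalent of customer RIGHT OUTER JOIN orders: keeps order 11,
# which has no matching customer.
right_rows = conn.execute(
    "SELECT c.name, o.id FROM orders o "
    "LEFT OUTER JOIN customer c ON o.customer_id = c.id ORDER BY o.id"
).fetchall()

print(left_rows)   # [('Ann', 10), ('Bob', None)]
print(right_rows)  # [('Ann', 10), (None, 11)]
```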

Which of the following is the simplest type of NoSQL database? a) Key-value b) Wide-column c) Document

Ans: A

Which of the following statements is incorrect? a) Non or Post Relational databases require that schemas be defined before you can add data b) NoSQL databases are built to allow the insertion of data without a predefined schema c) NewSQL databases are built to allow the insertion of data without a predefined schema

Ans: A

Why is concurrency control important? A. To ensure data integrity when updates occur to the database in a multiuser environment B. To ensure data integrity when updates occur to the database in a single-user environment C. To ensure data integrity while reading data occurs to the database in a multiuser environment D. To ensure data integrity while reading data occurs to the database in a single-user environment

Ans: A

"Sharding" a database across many server instances can be achieved with: a) LAN b) SAN c) MAN

Ans: B

In general, who determines the access privileges for a user and enters the appropriate authorization rules in the DBMS catalog to ensure users only access a database in appropriate ways? A. Security consultant B. Data Administrator C. DBA D. IT manager

Ans: C

__________ is a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management decisions. A. Data Mining. B. Data Warehouse. C. Web Mining. D. Text Mining.

Ans: B

A traditional data administrator performs which of the following roles? A. Tune database performance B. Establish backup and recovery procedures C. Resolve data ownership issues D. Protect the security of the database.

Ans: C

How is a data warehouse organized? A. by end users. B. by naming conventions and formats C. around important subject areas. D. around departments and functions

Ans: C

In what environment can data be updated? A. data warehouse. B. data mining. C. operational. D. informational.

Ans: C

What does the acronym EIS commonly mean? A. Extensible interface system. B. Executive interface system. C. Executive information system. D. Enterprise information system.

Ans: C

What type of join is needed when you wish to include rows that do not have matching values? a. Equi-join b. Natural join c. Outer join d) Any of the above.

Ans: C

What is SQL? A) A declarative or nonprocedural query language. B) An imperative language that specifies an explicit sequences of steps to follow. C) A set-oriented language. D) A and C. E) All of the above.

Ans: D

If we want to select output rows based on the results of the group function, what clause would you use?

HAVING CLAUSE

What is an ad hoc query tool?

An ad hoc query is one that cannot be determined prior to the moment it is issued. It is created to get information when the need arises, and it consists of dynamically constructed SQL, usually built by desktop-resident query tools.

What records/logs of facts should be kept by a DBA?

All structural changes to a database schema and a DBMS must be carefully documented and logged. Data extract, transform, and load actions should be documented. Logs and user actions should be regularly audited. Auditing involves monitoring and recording selected user database actions. Auditing can be based on individual actions, such as the type of SQL statement executed, or on combinations of factors that can include user name, application, time, and so on. Security policies can trigger auditing when specified elements in a database are accessed or altered, including the contents within a specified object.

After the database designers complete the logical design, what Database design does a DBA typically create? A. physical B. machine C. logical D. normalized

Ans: A

The data Warehouse is__________. A. read only. B. write only. C. read and write.

Ans: A

Which of the following is a NoSQL Database Type ? a) SQL b) Document databases c) JSON d) All of the mentioned

Ans: B

Most NoSQL databases support automatic __________, meaning that you get high availability and disaster recovery a) processing b) scalability c) replication

Ans: C

What are the four ACID guarantees for transactions in a database? A) Atomic, Consistent, Identified, Durable. B) Atomicity, Concurrency, Integrity, Durability. C) Atomicity, Consistency, Isolation, Durability.

Ans: C

What are the four steps in the proactive cycle to improve security? A. 1) analysis, 2) control, 3) detection, and 4) prevention B. 1) detection, 2) analysis, 3) correction, and 4) control C. 1) prevention, 2) detection, 3) analysis, and 4) control

Ans: C

What are the two major approaches for storing data in a data warehouse? A) Data mart and operational data store. B) Denormalized and flat file. C) Dimensional and normalized. D) Star schema and snowflake schema.

Ans: C

What are two approaches to scaling up (out) a NoSQL database? (very challenging question) A) Clusters and In-Memory. B) Distributed and Parallel processing. C) Master-slave and Sharding. D) Vertical and Horizontal partitioning.

Ans: C

What does poor data administration often lead to or cause? A. A unified definition of the same data entity B. Over familiarity with existing data C. Performance problems D. Security breach

Ans: C

What is detailed data in a fact table called? A. homogeneous data. B. attribute data. C. atomic data. D. reporting data.

Ans: C

What type of relationship exists between a dimension and fact table in a Star schema? A) Many-to-many, from dimension tables to a central fact table. B) One-to-one C) One-to-many, from dimension table to the fact table.

Ans: C

What type of relationships exist in a star schema between dimension and the fact table? A. many-to-many. B. one-to-one. C. one-to-many. D. many-to-one.

Ans: C

Which of the following statements is true of a data warehouse? A) Can be updated by end users. B) Contains numerous naming conventions and formats. C) Organized around important subject areas. D) Contains only current data.

Ans: C

How is isolation of transactions achieved? A. avoiding simultaneous transactions. B. storing updates permanently. C. preventing system failure. D. concurrency control.

Ans: D

How many queries can be nested in a Where clause of an Outer query? a. A maximum of two b. A maximum of 16 levels of nesting is possible, but depends upon available memory and the length of the query. c. Generally a maximum of 32 levels of nesting, but the limit varies based on available memory and the complexity of other expressions in the query. d. Unlimited number only constrained by computer memory, DBMS and processor speed.

Ans: D

In what type of join are the rows that do not have matching values in common columns nonetheless included in the result table? a. Cross Join b. Inner Join c. Nested Join d. Outer Join

Ans: D

SELECT E.EmployeeID, E.EmployeeName, M.EmployeeName AS Manager FROM Employee_T E, Employee_T M WHERE E.EmployeeSupervisor = M.EmployeeID; What type of join is used in the above query? A) CROSS JOIN B) INNER JOIN C) OUTER JOIN D) SELF JOIN

Ans: D
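
The query from this question can be tried out with Python's `sqlite3` module. The three sample rows are invented, and an ORDER BY is added only to make the output order predictable. The same table appears twice under the aliases E (employee) and M (manager).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee_T ("
    "EmployeeID INTEGER, EmployeeName TEXT, EmployeeSupervisor INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee_T VALUES (?, ?, ?)",
    [(1, "Ann", None), (2, "Bob", 1), (3, "Cy", 1)],  # Ann has no supervisor
)

rows = conn.execute(
    "SELECT E.EmployeeID, E.EmployeeName, M.EmployeeName AS Manager "
    "FROM Employee_T E, Employee_T M "
    "WHERE E.EmployeeSupervisor = M.EmployeeID "
    "ORDER BY E.EmployeeID"
).fetchall()
print(rows)  # [(2, 'Bob', 'Ann'), (3, 'Cy', 'Ann')]
```

Ann is missing from the result because her supervisor value is NULL and therefore matches no row in M.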

What are characteristics of an active data warehouse architecture? A. at least one data mart. B. data that has been extracted from multiple internal and external sources. C. near real-time data updates. D. all of the above.

Ans: D

What is a common time horizon for data stored in a Data warehouse? A. Current, most recent year. B. 2-4 years. C. 5-6 years. D. 6 or more years.

Ans: D

What are the two main ways of storing PL/SQL code in the Oracle database?

CREATE PROCEDURE and CREATE TRIGGER

What is {FirstName:"Bob", Address:"5 Oak St.", Hobby:"sailing"}?

Document

What database offers an API or query language that allows a user to retrieve based on content?

Document Store
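
A minimal sketch of content-based retrieval, using a Python list of parsed JSON documents in place of a real document store. The `find` helper and the sample documents are hypothetical and do not reflect any product's query API.

```python
import json

# Documents in one collection need not share the same fields.
raw_documents = [
    '{"FirstName": "Bob", "Address": "5 Oak St.", "Hobby": "sailing"}',
    '{"FirstName": "Ann", "Hobby": "sailing", "Pets": ["cat"]}',
    '{"FirstName": "Cy", "Hobby": "chess"}',
]
collection = [json.loads(text) for text in raw_documents]

def find(documents, **criteria):
    """Return the documents whose fields match every given criterion."""
    return [
        doc for doc in documents
        if all(doc.get(field) == value for field, value in criteria.items())
    ]

sailors = [doc["FirstName"] for doc in find(collection, Hobby="sailing")]
print(sailors)  # ['Bob', 'Ann']
```

The lookup goes by what is inside the document rather than by a key, which is what separates a document store from a plain key-value store.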

What is the intention of DBA automation?

Enable DBAs to focus on more proactive activities around database architecture, deployment, performance and service level management

What is the central table in a star schema?

Fact table

What databases store data as a collection of nodes, connected by edges?

Graph

What are the 3 fundamental Relational Algebra operators?

Projection (π), Selection (σ), Natural join (⋈)

What is the purpose of backups?

Refers to the copying and archiving of computer data so it may be used to restore the original after a data loss event.

What are the responsibilities of a Database Administrator?

Responsible for the performance, integrity and security of a database.

What is the role of a database administrator?

Responsible for the performance, integrity and security of a database. They will also be involved in the planning and development of the database, as well as troubleshooting any issues on behalf of the users.

Why are SQL databases not natively suited to a cloud environment?

SQL databases are difficult to scale out across many servers, so they are not natively suited to a cloud environment.

What are major problems with NoSQL databases?

Security, Data consistency, Lack of standardization, Scalability caveats

What are reasons for data backup? Why is it important?

So that data can be restored after a data loss event instead of being lost permanently.

What does soft state mean in a NoSQL database?

Soft state is information (state) the user put into the system that will go away if the user doesn't maintain it. Stated another way, the information will expire unless it is refreshed.
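
One way to picture soft state is an entry with a time-to-live that must be refreshed. The `SoftStateStore` class below is a hypothetical sketch; the clock is passed in as `now` so the behavior is easy to follow and repeat.

```python
class SoftStateStore:
    """Keeps each value only until its time-to-live runs out."""

    def __init__(self, ttl):
        self.ttl = ttl
        self._data = {}  # key -> (value, expiry time)

    def put(self, key, value, now):
        """Store or refresh a value; it expires ttl time units after now."""
        self._data[key] = (value, now + self.ttl)

    def get(self, key, now):
        """Return the value if it has not expired, otherwise None."""
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if now >= expires_at:
            del self._data[key]  # the state has gone away
            return None
        return value

sessions = SoftStateStore(ttl=10)
sessions.put("session", "bob", now=0)
print(sessions.get("session", now=5))   # still present
sessions.put("session", "bob", now=8)   # refreshed, now expires at 18
print(sessions.get("session", now=15))  # still present thanks to the refresh
print(sessions.get("session", now=20))  # expired, returns None
```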

What are the 3 types of DBAs?

System DBA, Application DBA, Task-oriented DBA

What are the major tasks in the maintenance phase?

They include: database backup and recovery, performance tuning, design modifications, access management and audits, usage monitoring, hardware maintenance, upgrades.

What is the UNION compatible rule?

Two or more tables are UNION compatible when they have the same number of columns and the corresponding columns have compatible domains; that is, the two relations must have the same set of attributes.

What is a typical hierarchy for database administration?

VP, Senior, Junior

Authentication means ______________ of someone (a user, device, or other entity) who wants to use data, resources, or applications.

Verifying

What are two primary methods to run a database on the cloud?

Virtual machine image, Database as a service

One technique for evaluating database security involves performing _________________________________ or penetration tests against the database.

Vulnerability Assessments

What are phases in the database life cycle?

1. Requirements analysis, 2. Logical and Physical Design, 3. Operation, 4. Maintenance. Operation stage includes adding of new data, modifying existing data and deletion of obsolete data.

What are the four attributes of a data warehouse specified by Inmon?

1. Subject-oriented. The data in the data warehouse is organized so that all the data elements relating to the same real-world event or object are linked together, e.g., customers or employees.
2. Non-volatile. Data in the data warehouse are never overwritten or deleted; once committed, the data are static, read-only, and retained for future reporting.
3. Integrated. An enterprise data warehouse contains data from most or all of an organization's operational systems, and these data are made consistent.
4. Time-variant. A data warehouse contains the history of data values.

Briefly, what is the history of data warehousing?

• 1960s: Dimensional approach developed to model data into fact and dimension tables (Star schema) in a joint research project by General Mills and Dartmouth College.
• 1970s: A. C. Nielsen commercialized dimensional data marts for retail sales.
• Mid-1970s: Bill Inmon defines and promotes the term data warehouse.
• 1983: Teradata introduces a database management system specifically designed for decision support with parallel processing. The beta system was shipped to Wells Fargo Bank just in time for Christmas.
• 1984: Metaphor Computer Systems released Data Interpretation System (DIS), a hardware/software package and GUI for business users to create a database management and analytic system.
• 1988: Barry Devlin and Paul Murphy publish the article "An architecture for a business and information system" in the IBM Systems Journal, where they introduced the term "business data warehouse".
• 1990: Red Brick Systems, founded by Ralph Kimball, introduced the Red Brick Warehouse, a database management system specifically for data warehousing.
• 1991: Prism Solutions, founded by Bill Inmon, introduced Prism Warehouse Manager, a software tool for developing a data warehouse.
• 1992: Bill Inmon published the book Building the Data Warehouse.
• 1995: The Data Warehousing Institute was founded. It is a for-profit organization that promotes data warehousing.
• 1996: Ralph Kimball published the book The Data Warehouse Toolkit.

What is a computer virus?

A computer virus is a computer program that can damage a computer's software, hardware or data. It is referred to as a 'virus' because it has the capability to replicate itself and hide inside other computer files. There are many types of viruses. New viruses are regularly created and released through security gaps, PHP injection, and malicious web links.

What is eventual consistency? What databases provide this feature?

A consistency model used in many large distributed databases. Such databases require that all changes to a replicated piece of data eventually reach all affected replicas. Examples: MongoDB, CouchDB, Amazon SimpleDB, Amazon DynamoDB, Riak, DeeDS, ZATARA

What is an in-memory database?

A database management system that primarily relies on main memory for computer data storage.

What is a Data Warehouse?

A database used for reporting and data analysis. It is a central repository of data which is created by integrating data from one or more disparate data sources.

What is an enterprise data warehouse?

A data warehouse that serves the whole organization: a unified, central repository that holds all of the organization's business information, integrated from one or more disparate data sources and used for reporting and data analysis.

What is hacking?

A major security threat is unauthorized access. Many databases contain sensitive information, and it could be very harmful if this information were to fall into the wrong hands. Imagine someone stealing your social security number, date of birth, address, and bank information. With this data someone could obtain a credit card using your identity and incur charges without your knowledge. Gaining unauthorized access to computer systems is known as hacking. Computer hackers have developed sophisticated methods to obtain data from databases, which they may use for personal gain or to harm others. Have you ever received an e-mail telling you to log in to your credit card account, with a link for you to follow? This is called phishing. Most likely, a hacker is trying to obtain your login details. Be careful!

What is a decision support system?

A specific class of computerized information system that supports business and organizational decision-making activities.

What is a star schema?

A star schema is a data mart design that consists of one or more fact tables that reference multiple dimension tables.

What is an advantage of using a subquery rather than a query with DISTINCT when both can yield the correct answer? a. A query will run more efficiently if you use the subquery to eliminate duplicates b. Use of subqueries reduces the hierarchy found in execution which can be useful c. Using subqueries makes it easier to read and understand the processing d. a and c e. All of the above.

Ans: D

What are three benefits of a data warehouse?

Benefits: • Consolidate data from multiple sources into a single database for querying. • Maintain data history, even if source transaction systems do not. • Improve data quality by "scrubbing" the data to provide consistent, uniform codes and field descriptions, flagging and in some cases fixing bad data. • Provide a single common data model for all data of interest regardless of the data's source. • Restructure data so it is understandable for business users. • Restructure data so that it delivers excellent query performance, even for complex analytic queries, without impacting the operational systems. • Add value to operational business applications, notably customer relationship management (CRM) systems. • Improve turnaround time for data access and reporting. • Standardize data across the organization so there will be one view of the "truth." • Lower costs to create and distribute information and reports. • Share data and allow others to access and analyze the data. • Encourage and improve fact-based decision making.

What is business intelligence?

Business intelligence (BI) is a set of theories, methodologies, processes, architectures, and technologies that transform raw data into meaningful and useful information for business purposes.

What tools are included in the expanded definition of data warehousing?

- Business intelligence tools - Tools to extract, transform, and load data into the repository - Tools to manage and retrieve metadata

In the bottom-up approach to data warehouse design, ___________________ are first created to provide reporting and analytical capabilities for specific business processes.

Data Marts

What is an aggregate?

Data combined from several measurements

What is an Integrated data warehouse?

Data integration involves combining data residing in different sources and providing users with a unified view of these data.

What is a data mart?

Data marts are generally designed for a single subject area. An organization may have data pertaining to different departments like Finance, HR, Marketing, etc. stored in data warehouse and each department may have separate data marts. These data marts can be built on top of the data warehouse.

What do we call the function of managing and maintaining database management systems (DBMS) software?

Database administration

What do we call the technical function responsible for database design, security, and disaster recovery?

Database administration

What are DBMS policies, procedures, and standards?

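Policies are general statements of direction that communicate and support DBA goals. Standards are more detailed and specific than policies and describe the minimum requirements of a given DBA activity. Procedures are written instructions that describe the series of steps to follow when performing a given activity. Database managers don't simply oversee the storage of information in a system. As technology changes and new advances are made, database managers must keep up with the current developments in database design and application.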

What type of DBA is responsible for data model design and maintenance and DDL generation?

Development DBAs

What is parallel computing?

Involves carrying out many operations simultaneously, dividing large problems and queries into smaller ones solved concurrently

What relational operation causes two or more tables with a common domain to be combined into a single table?

JOIN

What is JSON?

JSON is short for JavaScript Object Notation, and is a way to store information in an organized, easy-to-access manner. In a nutshell, it gives us a human-readable collection of data that we can access in a really logical manner.
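
A short sketch of reading and writing JSON, assuming Python's standard json module; the student record is made up for illustration.

```python
import json

# A JSON document is plain text: objects use {}, arrays use [], keys are strings.
text = '{"name": "Ada", "courses": ["Databases", "Security"], "enrolled": true}'

student = json.loads(text)             # parse the text into a Python dict
first_course = student["courses"][0]   # values are reached by key and by position

round_trip = json.dumps(student, sort_keys=True)  # serialize back to JSON text
print(first_course)
print(round_trip)
```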

What are the four types of NoSQL data models?

- Key-Value Store - Document-Based Store - Column-Based Store - Graph-Based Store

What are the four major post-relational data models?

- Key-Value Store (for example, Riak or Redis) - Document Store - Column Data Store (for example, Bigtable or Cassandra) - Graph Model

In a distributed data store, what plays a role similar to that of a schema in a relational database?

Keyspace

What are three limitations of a data warehouse?

Limitations: • User expectations • Lack of data • Poor data

Designers of distributed data stores have increased what feature at the expense of consistency?

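Availability. In distributed data stores the ability to run arbitrary queries matters less than availability, so designers have increased availability at the cost of consistency.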

What occurs during database implementation?

The database and application designs are physically realized. This is the programming phase of systems development.

What are three features of key value stores?

- Consistency: all replicas of a database hold the same data - Availability: every request receives a response - Partition tolerance: the system keeps working when some replicas encounter problems

Why is denormalized data acceptable when it's used for Decision Support?

Database designers have argued non-normalized or de-normalized data is bad. Database Administrators (DBAs) sometimes denormalize to improve query performance. Normalization of a relational database for transaction processing avoids processing anomalies and results in the most efficient use of database storage. A data warehouse for Decision Support is not intended to achieve these same goals. For Data-driven Decision Support, the main concern is to provide information to the user as fast as possible. Because of this, storing data in a de-normalized fashion, including storing redundant data and pre-summarizing data, provides the best retrieval results. Also, data warehouse data is usually static so anomalies will not occur from operations like add, delete and update of a record or field.

What are a DBA's managerial responsibilities?

Database managers don't simply oversee the storage of information in a system. As technology changes and new advances are made, database managers must keep up with the current developments in database design and application.

What are the two leading approaches to storing data in a data warehouse? Who championed each approach?

- Dimensional: Ralph Kimball - Normalized: Bill Inmon

What are the two major approaches for storing data in a data warehouse?

Dimensional and normalized

What is the evolution of the DBA function?

From Slide: In the beginning, Data Processing departments • Information Systems Department • Provide end users active data management support • Provide solutions to information needs (application development) • Starting 1980s, Database Administrators • DBA function created to handle increasingly complex data management tasks • Number of DBAs and Role varies from company to company • DBA's function is very dynamic • Mid-90s, added Data Warehouse administrators

What can a database administrator (DBA) do to control access to data for different types of users?

GRANT & REVOKE

What is the name of a proprietary (closed-source) database created by Google that uses columns?

Bigtable

What kind of database is designed for data whose elements are interconnected with an undetermined number of relations between them?

Graph database

The SQL ______________ clause includes a predicate used to filter rows resulting from the GROUP BY clause.

HAVING
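
A minimal sketch of HAVING at work, assuming Python's sqlite3 module and a hypothetical `orders` table: WHERE would filter individual rows before grouping, while HAVING filters the groups that GROUP BY produces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 50.0), ("ann", 70.0), ("bob", 20.0), ("cy", 200.0)],
)

# Keep only the customer groups whose summed amount exceeds 100.
big_spenders = conn.execute(
    """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    HAVING SUM(amount) > 100
    ORDER BY customer
    """
).fetchall()
print(big_spenders)
conn.close()
```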

What is data administration?

The process by which data is monitored, maintained and managed by a data administrator and/or an organization.

Why do we need database administrators?

The purpose of the DBA is to do all of this on a daily basis by performing a specific set of general tasks.

What does it mean to say "the star model is about slicing and dicing the fact table"?

The star schema supports rapid aggregations (such as count, sum, and average) of many fact records, and these aggregations can be easily filtered and grouped ("sliced & diced") by the dimensions.
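
A small sketch of slicing and dicing, assuming Python's sqlite3 module and a made-up star schema (one `fact_sales` table referencing `dim_product` and `dim_date`). The "slice" fixes one dimension value; the "dice" groups the same facts by two dimensions at once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, year INTEGER);
    CREATE TABLE fact_sales  (product_id INTEGER, date_id INTEGER, amount REAL);
    INSERT INTO dim_product VALUES (1, 'toys'), (2, 'books');
    INSERT INTO dim_date    VALUES (1, 2023), (2, 2024);
    INSERT INTO fact_sales  VALUES (1, 1, 10), (1, 2, 20), (2, 1, 5), (2, 2, 15), (1, 2, 30);
    """
)

JOINED = """
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_date d ON d.date_id = f.date_id
"""

# Slice: fix the date dimension at 2024, then aggregate by product category.
slice_2024 = conn.execute(
    "SELECT p.category, SUM(f.amount)" + JOINED +
    "WHERE d.year = 2024 GROUP BY p.category ORDER BY p.category"
).fetchall()

# Dice: aggregate the same facts by category and year together.
dice = conn.execute(
    "SELECT p.category, d.year, SUM(f.amount)" + JOINED +
    "GROUP BY p.category, d.year ORDER BY p.category, d.year"
).fetchall()

print(slice_2024)
print(dice)
conn.close()
```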

What are common security risks to database systems?

- Deployment failures - Broken databases - Data leaks - Stolen database backups - The abuse of database features - A lack of segregation - Hopscotch - SQL injections - Sub-standard key management - Database inconsistencies

What are 2 motivations for using the NoSQL approach to storing data?

- Simplicity of design - Simpler "horizontal" scaling to clusters of machines (which is a problem for relational databases) - Finer control over availability.

With a _________________ database, application owners do not have to install and maintain the database on their own. Instead, a provider takes responsibility for installing and maintaining the database, and application owners pay according to their usage.

Cloud (Database-As-A-Service)

What are different ways of organizing and/or grouping documents?

- Collections - Tags - Non-visible metadata - Directory hierarchies

What is Online Analytical Processing (OLAP)?

Computer processing that enables a user to easily and selectively extract and view data from different points of view.

What is done to protect against loss of data in the case of a complete system crash of a main memory database system (MMDB)?

High-availability implementations rely on database replication, with automatic failover to an identical standby database if the primary database fails. To protect against loss of data in a complete system crash, replication of an IMDB is normally used in addition to one or more other durability mechanisms, such as snapshot files or transaction logging.

What data is stored in a data warehouse?

- Historical data - Derived data - Metadata

What are common database administration tools?

- Native tools (pg 300) - Microsoft SQL Server: SQL Server Enterprise Manager - Oracle: SQL*Plus and Oracle Enterprise Manager/Grid Control - GUI tools

What type of database provides a mechanism for storage and retrieval of data that uses less constrained consistency models than traditional relational databases?

NoSQL Database

What are the sub-parts of PL/SQL programs?

The sub-parts are: 1) Header - This is the optional first section of the code block. It is used to identify the type of code block and its name. The code block types are: anonymous procedure, named procedure, and function. A header is only used for the latter two types; 2) Declaration - This is an optional section of the code block. It contains the name of the local objects that will be used in the code block. These include variables, cursor definitions, and exceptions. This section begins with the keyword Declare; 3) Executable - This is the only mandatory section. It contains the statements that will be executed. These consist of SQL statements, DML statements, procedures (PL/SQL code blocks), functions (PL/SQL code blocks that return a value), and built-in subprograms. This section starts with the keyword Begin; 4) Exception - This is an optional section. It is used to "handle" any errors that occur during the execution of the statements and commands in the executable section. This section begins with the keyword Exception. Every PL/SQL statement ends with a semicolon (;). PL/SQL blocks can be nested within other PL/SQL blocks using BEGIN and END.

What technology created by Google is the foundation for Hadoop? A) Hive. B) MapReduce. C) Oozie.

Ans: B

Which of the following is not a NoSQL database ? a) SQL Server b) MongoDB c) Cassandra d) None of the mentioned

Ans: A

How does rollback differ from roll forward in database recovery?

- Roll forward: redoes the changes made by committed transactions, reapplying the changed values (after-images) to an earlier copy of the database to bring it up to date and keep it consistent. - Roll back: undoes the changes of a transaction, returning the database to its state at the beginning of that transaction (before-images). A rollback can be issued only before the transaction is committed.

What type of join query will return all of the records in the left table (table A) that have a matching record in the right table (table B)? A) INNER JOIN B) LEFT JOIN C) OUTER JOIN D) RIGHT JOIN

Ans: A
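
A quick check of the difference, assuming Python's sqlite3 module and two hypothetical tables named after the question. The INNER JOIN returns only the rows of table A that have a match in table B; the LEFT JOIN also keeps the unmatched row, padded with NULL (shown as None in Python).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table_a (id INTEGER, name TEXT);
    CREATE TABLE table_b (a_id INTEGER, note TEXT);
    INSERT INTO table_a VALUES (1, 'x'), (2, 'y'), (3, 'z');
    INSERT INTO table_b VALUES (1, 'n1'), (3, 'n3');
    """
)

inner_rows = conn.execute(
    "SELECT a.id, b.note FROM table_a a "
    "INNER JOIN table_b b ON b.a_id = a.id ORDER BY a.id"
).fetchall()

left_rows = conn.execute(
    "SELECT a.id, b.note FROM table_a a "
    "LEFT JOIN table_b b ON b.a_id = a.id ORDER BY a.id"
).fetchall()

print(inner_rows)  # matched rows only
print(left_rows)   # every row of table_a, with None where table_b has no match
conn.close()
```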

What are major functions of data administration?

Data administration: - Data policies, procedures, and standards - Planning: development of the organization's IT strategy, enterprise model, cost/benefit model, design of the database environment, and administration plan - Data conflict (ownership) resolution - Data analysis: define and model data requirements, business rules, and operational requirements, and maintain the corporate data dictionary - Internal marketing of DA concepts - Managing the data repository. Database administration: - Selection of hardware and software -- Keep up with current technological trends -- Predict future changes -- Emphasis on established off-the-shelf products - Managing data security and privacy -- Protection of data against accidental or intentional loss, destruction, or misuse -- Firewalls -- Establishment of user privileges -- Complicated by use of distributed systems such as internet access and client/server technology.

What are five characteristics or features of a data warehouse?

- Some data is denormalized for simplification and to improve performance. - Large amounts of historical data are used. - Queries often retrieve large amounts of data. - Both planned and ad hoc queries are common. - The data load is controlled.

What type of subquery is nested inside another outer query from which it uses values? A) Correlated Type II subquery. B) Dependent subquery. C) Simple Type I subquery.

Ans: A
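
A minimal sketch of a correlated subquery, assuming Python's sqlite3 module and a hypothetical `employees` table. The inner query refers to `e.dept` from the outer query, so it is evaluated against each outer row rather than once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("ann", "it", 100), ("bob", "it", 60), ("cy", "hr", 50), ("di", "hr", 70)],
)

# Employees who earn more than the average salary of their own department.
above_dept_avg = conn.execute(
    """
    SELECT e.name
    FROM employees e
    WHERE e.salary > (SELECT AVG(i.salary)
                      FROM employees i
                      WHERE i.dept = e.dept)   -- value taken from the outer row
    ORDER BY e.name
    """
).fetchall()
print(above_dept_avg)
conn.close()
```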

Approximately how many steps are involved in the process of creating a new database?

5

What is data warehouse administration? How does it differ from database administration?

A DW admin provides the overall management of a data warehouse. From slide: Data Warehouse administration involves the overall management of a data warehouse. Administration tasks include archiving, consistency checks, developing/maintaining indexing and retrieval functionality, tracking data changes, migration, monitoring, performance issues, replication issues, data quality, and sizing/space management.

What are the main responsibilities of a Data Warehouse Administrator? A. monitor the data and activities inside the data warehouse. B. administer the data warehousing environment, including security, data growth, performance, platform upgrades, support agreements, and disaster recovery. C. administer the data warehousing services, including query services and production services. D. support the data warehousing user and ensure timely decision making and decision support. E. All of the above

Ans. E

Generally, a star schema is composed of __________ fact table(s)? A. one. B. two. C. three. D. four.

Ans: A

NoSQL databases are used mainly for handling large volumes of ______________ data. a) unstructured b) structured c) semi-structured

Ans: A

What SQL statement returns varying results based upon the evaluation of expressions? a. CASE b. DIFFERENCE c. DISTINCT d. UNION

Ans: A
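
A short sketch of CASE, assuming Python's sqlite3 module; the `scores` table and the grade cut-offs are invented for illustration. The WHEN branches are tested in order and the first one that is true supplies the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (student TEXT, score INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?)", [("a", 95), ("b", 72), ("c", 40)])

grades = conn.execute(
    """
    SELECT student,
           CASE
               WHEN score >= 90 THEN 'A'
               WHEN score >= 70 THEN 'C'
               ELSE 'F'
           END AS grade
    FROM scores
    ORDER BY student
    """
).fetchall()
print(grades)
conn.close()
```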

What SQL statements are used in large, multiuser database systems to control transactions, i.e., sequences of changes to a database? a. COMMIT and ROLLBACK b. DCL and DML c. ROLLBACK and ROLLFORWARD d. UPDATE and COMMIT

Ans: A
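
A sketch of transaction control, assuming Python's sqlite3 module with its default settings (where `commit()` and `rollback()` on the connection issue COMMIT and ROLLBACK). The `accounts` table and the overdraft rule are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()  # COMMIT: the starting balances are now permanent

# A transfer is a sequence of changes that must succeed or fail as a unit.
conn.execute("UPDATE accounts SET balance = balance - 150 WHERE name = 'a'")
conn.execute("UPDATE accounts SET balance = balance + 150 WHERE name = 'b'")

balance_a = conn.execute("SELECT balance FROM accounts WHERE name = 'a'").fetchone()[0]
if balance_a < 0:
    conn.rollback()  # ROLLBACK: undo both updates, back to the last commit
else:
    conn.commit()

balances = conn.execute("SELECT name, balance FROM accounts ORDER BY name").fetchall()
print(balances)
conn.close()
```

Because the transfer would overdraw account a, both updates are undone together and the balances match the last committed state.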

What are major characteristics of a Database Administrator's work? A. Complex, repetitive, time-consuming and requires specialized training B. Easy, rewarding, time-consuming and requires specialized training C. High stress, demanding, tedious and requires an advanced degree D. Low stress, repetitive, and intellectually challenging

Ans: A

What are the four types of NoSQL data models? A) Column, Document, Graph, Key Value B) Column, Document, In-memory, Key Value C) Column, Document, Key Value, Network

Ans: A

What data from the operational environment is most commonly extracted and loaded into a data warehouse? A. Current detail data. B. Older detail data. C. Lightly summarized data. D. Highly summarized data.

Ans: A

What does DSS mean in a data warehouse context? A. Decision Support System. B. Decision Security System. C. Data Storage System. D. Data Support Service.

Ans: A

What does data transformation include? A. a process to change data from a detailed level to a summary level. B. a process to change data from a summary level to a detailed level. C. joining data from one source into various sources of data. D. separating data from one source into various sources of data.

Ans: A

What does database transaction durability ensure in the event of a system failure? A. a transaction is not lost once it has been committed. B. a transaction is completed uninterrupted. C. a transaction is reviewed before it is executed. D. a transaction is saved before the failure occurs.

Ans: A

Which operation can return its results as a document or write the results to collections? a) MapReduce b) Mapper c) ReduceMap d) none of the above

Ans: A

What is a Full Outer Join? a. A join that combines the effect of applying both Left and Right Outer Joins. b. A join Where a result set will have no NULL values c. A join where only a single row will be produced in the result set containing fields populated from joined tables d. a and c e. b and c

Ans: A

What is an open-source DBMS? A. Free or nearly free database software whose source code is publicly available B. Free or nearly free database software, but source code is not publicly available C. Competitive compared to PC-oriented packages, but are not fully SQL compliant D. Clones of proprietary DBMS that have limited features

Ans: A

What is backward recovery? A. Where the before-images are applied to the database B. Where the after-images are applied to the database C. Where the after-images and before-images are applied to the database D. Switching to an alternative, existing copy of the database

Ans: A

What is the UNION operator? a. An SQL operator that allows you to stack datasets b. An SQL operator that joins two datasets side-by-side c. UNION displays multiple SELECT statements

Ans: A

What is the input to each of the two phases of the MapReduce algorithm? A) Key-value pairs. B) Relations. C) XML and JSON.

Ans: A
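
A single-process sketch of the two phases in plain Python (not Hadoop code; the function names and documents are made up). The map phase takes (document id, text) pairs and emits (word, 1) pairs; after grouping by key, the reduce phase takes (word, list of counts) pairs.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Input: one (key, value) pair. Output: zero or more (key, value) pairs."""
    for word in text.lower().split():
        yield word, 1


def reduce_phase(word, counts):
    """Input: a key with all values emitted for it. Output: one (key, value) pair."""
    return word, sum(counts)


documents = {1: "big data big ideas", 2: "data stores"}

# Shuffle step: group every value emitted by the mappers under its key.
grouped = defaultdict(list)
for doc_id, text in documents.items():
    for key, value in map_phase(doc_id, text):
        grouped[key].append(value)

word_counts = dict(reduce_phase(key, values) for key, values in grouped.items())
print(word_counts)
```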

What is the intention of DBA automation? A. Enable DBAs to focus on proactive activities like performance and service level management B. Increase the number of databases and reduce standardization of database schema C. Reduce the number of DBAs needed to operate a database.

Ans: A

What is the purpose of a query tool? A. data retrieval. B. information delivery. C. information exchange. D. run routine reports.

Ans: A

Which of the following is a wide-column store ? a) Cassandra b) Riak c) MongoDB d) Redis

Ans: A

A data warehouse administrator is concerned with which of the following? A. The amount of time needed to make a decision but not the typical roles of a database administrator B. The time to make a decision and the typical roles of a database administrator C. The typical roles of a data administrator and redesigning existing applications D. The typical roles of a database administrator and redesigning existing applications

Ans: B

After a DBMS is purchased, who has primary responsibility for installation and maintenance? A. IT manager B. DBA C. Vendor D. Systems Administrator

Ans: B

An operational system is _____________. A. used to run the business in real time and is based on historical data. B. used to run the business in real time and is based on current data. C. used to support decision making and is based on current data. D. used to support decision making and is based on historical data.

Ans: B

Data is stored, retrieved and updated in a ____________ environment. A. OLAP. B. OLTP. C. Business Intelligence (bi).

Ans: B

If both data and database administration exist in an organization, the database administrator is responsible for which of the following? A. Data modeling B. Database design C. Metadata creation D. None of the above.

Ans: B

If both data and database administration exist in an organization, typically the database administrator is responsible for which one of the following? A. Data modeling B. Database design C. Metadata D. Stewardship

Ans: B

In what clause(s) can you place a subquery? a. A subquery can be nested inside the IN or WHERE clauses of an outer SELECT, INSERT, UPDATE, or DELETE statement, or inside another subquery. b. A subquery can be nested inside the WHERE or HAVING clause of an outer SELECT, INSERT, UPDATE, or DELETE statement, or inside another subquery. c. A subquery can be nested only inside the WHERE or HAVING clause of an outer or inner SELECT subquery.

Ans: B

The UNION compatible rule means the tables have same number of columns and ____________________? a. corresponding attributes have compatible data types b. corresponding columns have identical data types and lengths c. corresponding rows have the same data types d. rows are ordered and have corresponding attributes

Ans: B

The _______ operator takes the results of two queries and returns only rows that appear in both result sets. a. Union b. Intersect c. Difference d. Projection

Ans: B
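
A small comparison of the set operators, assuming Python's sqlite3 module and two hypothetical one-column tables. UNION stacks both result sets and drops duplicates; INTERSECT keeps only the rows present in both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE t1 (n INTEGER);
    CREATE TABLE t2 (n INTEGER);
    INSERT INTO t1 VALUES (1), (2), (3);
    INSERT INTO t2 VALUES (2), (3), (4);
    """
)

union_rows = conn.execute(
    "SELECT n FROM t1 UNION SELECT n FROM t2 ORDER BY n"
).fetchall()

intersect_rows = conn.execute(
    "SELECT n FROM t1 INTERSECT SELECT n FROM t2 ORDER BY n"
).fetchall()

print(union_rows)
print(intersect_rows)
conn.close()
```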

The _________ operation allows the combining of two relations by merging pairs of tuples, one from each relation, into a single tuple. a. Select b. Join c. Union d. Intersection

Ans: B

The concept of ___________ was introduced into SQL to handle "missing data" in the relational model. It indicates that a data value does not exist in the database. A) EXISTS. B) NULL. C) UNKNOWN. D) VIEW.

Ans: B

What are conditional CASE expressions? a. PL/SQL expression introduced in SQL-92. b. SQL statement that handles If/Then logic. c. Works like Else If in other programming languages.

Ans: B

What are the four attributes of a data warehouse specified by Inmon? A) Facts, Dimensions, Subjects, Time Variant. B) Integrated, Nonvolatile, Subject-Oriented, Time Variant. C) Integrated, Subject-Oriented, Tested, Time Variant. D) Nonvolatile, Subject-Oriented, Tested, Time Variant

Ans: B

What are the four steps in the proactive cycle to improve security? A. Steps are: 1) analysis, 2) control, 3) detection, and 4) prevention B. Steps are: 1) prevention, 2) detection, 3) analysis, and 4) control C. Steps are: 1) detection, 2) analysis, 3) correction, and 4) control

Ans: B

What are the two basic functions of a query optimizer? a. Calculate cost and execution time for a query. b. Determine Join order and Join method. c. Minimize cost and execution time for a query. d. Maximizes the execution plan that provides an optimum method of execution.

Ans: B

What two processes ensure that distributed databases remain up-to-date and current? A) Backups and Sharding. B) Duplication and Replication. C) Parity and RAID.

Ans: B

What do dimension tables commonly describe in a star schema? A. descriptive entities. B. relevant facts. C. units of measures. D. data domains.

Ans: B

What is Apache Hadoop? A) A post-relational database management system. B) An open-source software framework that supports data-intensive distributed applications. C) A proprietary software framework for large scale processing of data. D) Software for MapReduce in-memory processing.

Ans: B

What is a vulnerability assessment? A. process of identifying and correcting the vulnerabilities in a system. B. process of identifying, quantifying, and prioritizing the vulnerabilities in a system. C. process of prioritizing the vulnerabilities in a database system or data warehouse.

Ans: B

What is the extract process? A. capturing all of the data contained in various operational systems. B. capturing a subset of the data contained in various operational systems. C. capturing all of the data contained in various decision support systems. D. capturing a subset of the data contained in various decision support systems.

Ans: B

What is the main organizational justification for implementing a data warehouse? A. cheaper ways of handling data movement. B. decision support. C. storing large volumes of data. D. providing access to data.

Ans: B

What join returns records from the right table that have no matching key in the left table in the result set? a. Left outer join b. Right outer join c. Full outer join d. Cross join

Ans: B

What programming language is used to write custom functions to perform the map and reduce operations? a) Java b) Javascript c) JSON

Ans: B

When are two relations union-compatible? a. If they have the same attributes and the same columns b. If they have the same number of attributes and each attribute is from the same domain c. If they have the same number of rows and columns d. If they have the same number of rows and columns and each attribute is from the same domain

Ans: B

What is business intelligence? A) A set of theories, methodologies, processes, architectures, and technologies that transform information into meaningful and useful data for business purposes. B) A type of data warehouse used for competitive intelligence activities. C) An umbrella term to describe "concepts and methods to improve business decision making by using fact-based support systems." D) A and C

Ans: D

What is data scrubbing? A) A process to reject data from the data warehouse and to create the necessary indexes B) A process to load the data in the data warehouse and to create the necessary indexes C) A process to upgrade the quality of data after it is moved into a data warehouse D) A process to upgrade the quality of data before it is moved into a data warehouse

Ans: D

What is data scrubbing? A. a process to reject data from the data warehouse and to create the necessary indexes. B. a process to load the data in the data warehouse and to create the necessary indexes. C. a process to upgrade the quality of data after it is moved into a data warehouse. D. a process to improve the quality of data before it is moved into a data warehouse

Ans: D

What is the purpose of backups? A. Provide a structure for the storage and recovery of data. B. Recover data after it is lost by data deletion or corruption. C. Recover data from an earlier time, according to a user-defined data retention policy. D. B and C. E. All of the above.

Ans: D

What tasks are Business Intelligence and data warehousing systems used for in an organization? A. Forecasting. B. Reporting. C. Analysis of large volumes of product sales data. D. All of the above.

Ans: D

What types of data stores are used to store information about networks, such as social connections. a) Key-value b) Wide-column c) Document d) Graph

Ans: D

Which of the following is NOT a property of database transactions? A. Atomicity B. Consistency C. Durability D. Identifiable

Ans: D

Which of the following statements is true? a) Documents can contain many different key-value pairs, or key-array pairs, or even nested documents b) MongoDB can link to only proprietary programming languages and development environments c) When compared to relational databases, NoSQL databases are more scalable and provide superior performance d) a and c e) All of the above.

Ans: D

Which of the following topics are part of an administrative policy to secure a database? A. Authentication policies B. Limiting particular areas within a building to only authorized people C. Backup procedures D. A and C. E. All of the above.

Ans: E

Who is responsible for running queries and reports against data warehouse tables? A. DBA. B. Software applications. C. End users. D. Database analysts. E. C and D

Ans: E

What is an informational system?

Any organized system for the collection, organization, storage and communication of information. ... The term is also sometimes used in more restricted senses to refer to only the software used to run a computerized database or to refer to only a computer system.

What is an example of an access control?

Any selective restriction on access like user permissions. An access control system determines who can access what. Access is gained using identification like a username and password, biometrics like facial or fingerprint recognition, or check keycards. The identification must be authenticated and then an access control system determines what the authenticated user is authorized to do and restricts or limits access to only authorized uses.

What property requires that each transaction is "all or nothing"?

Atomicity

What are the four ACID guarantees for transactions in a database?

Atomicity, Consistency, Isolation, Durability

Which ACID property ensures that any transaction will bring the database from one valid state to another? What does the consistency property ensure?

Consistency. It ensures that only valid data, following all rules and constraints, is written to the database.

What is dimensional modeling?

Dimensional modeling is a data modeling technique in data warehouse design. Dimensional models use facts and dimensions to describe data for the business. Facts are typically numeric values that are additive (can be aggregated). Dimensions are descriptive elements used for grouping, labeling, and filtering facts.

What is the difference between a dimension and a fact in a star schema? Can facts be aggregated?

- Dimension: a category of data; qualifying characteristics that provide additional perspectives to a given fact - Fact: numeric measurements that represent a specific business aspect or activity - Each dimension record is related to thousands of fact records - Yes, facts can be aggregated

How does a materialized view differ from a named query or dynamic view?

Dynamic view -- A virtual table that is created dynamically upon request by user. A dynamic view is not a temporary table. Rather, its definition is stored in the system catalog and the contents of the views are materialized as a result of an SQL query that uses the view. Materialized View -- Copies or replicas of data based on SQL queries created in the same manner as dynamic views. However, a materialized view exists as a table and thus care must be taken to keep it synchronized with its associated base tables.
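
A sketch of the contrast, assuming Python's sqlite3 module. SQLite has no materialized views, so the stored copy is emulated here with CREATE TABLE ... AS, and the manual refresh is this example's assumption rather than a standard command. All table and view names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales (amount INTEGER);
    INSERT INTO sales VALUES (10), (20);
    -- Dynamic view: only the definition is stored; rows are computed on request.
    CREATE VIEW v_total AS SELECT SUM(amount) AS total FROM sales;
    -- Stand-in for a materialized view: the result is stored as a real table.
    CREATE TABLE m_total AS SELECT SUM(amount) AS total FROM sales;
    """
)

conn.execute("INSERT INTO sales VALUES (5)")  # the base table changes

view_total = conn.execute("SELECT total FROM v_total").fetchone()[0]
stale_total = conn.execute("SELECT total FROM m_total").fetchone()[0]

# The stored copy must be refreshed to stay synchronized with the base table.
conn.execute("DELETE FROM m_total")
conn.execute("INSERT INTO m_total SELECT SUM(amount) FROM sales")
refreshed_total = conn.execute("SELECT total FROM m_total").fetchone()[0]

print(view_total, stale_total, refreshed_total)
conn.close()
```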

An ___________________ is a unified database that holds all the business information of an organization.

Enterprise Data Warehouse

Apache _____________________ is an open-source software framework that supports data-intensive distributed applications. It supports running applications in parallel on large clusters of commodity hardware. It derives from Google's MapReduce and Google File System (GFS) papers.

Hadoop
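
The MapReduce idea that Hadoop derives from can be sketched in plain Python. This is not the Hadoop API: the function names `map_phase`, `shuffle`, and `reduce_phase` are hypothetical, and everything runs in one process, whereas a real cluster would spread the map and reduce work across many machines.

```python
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Combine the values for each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}

documents = ["big data big clusters", "data on commodity hardware"]
mapped = [pair for doc in documents for pair in map_phase(doc)]
word_counts = reduce_phase(shuffle(mapped))
print(word_counts)
```

Because each document is mapped independently and each key is reduced independently, both phases can run in parallel, which is what makes the model suit large clusters.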

What is HDFS?

Hadoop Distributed File System is a distributed file system that provides high-performance access to data across Hadoop clusters. Like other Hadoop-related technologies, HDFS has become a key tool for managing pools of big data and supporting big data analytics applications.

What makes databases especially challenging to implement and manage?

Having a database does not mean the data will be used properly, efficiently, or correctly. DBA work is complex, often repetitive, and time-consuming, and it requires significant, ongoing training. A DBMS is just a tool for managing data; it must be used correctly, which requires effective management. Implementing a DBMS involves three main processes: technological (DBMS software and hardware), managerial (administrative functions), and cultural (corporate resistance to change and innovation). The DBA is key to the implementation of a DBMS.

What types of data are stored in a data warehouse?

Historical enterprise data and metadata. The data is structured or semi-structured and primarily atomic, though it may include derived data.

What are the BASE guarantees?

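BASE stands for Basically Available, Soft state, Eventual consistency. Eventual consistency means that if no new updates are made to a given data item, eventually all accesses to that item will return the last updated value.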
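
Eventual consistency can be pictured with a toy simulation. The `Replica` class, the version counter, and the `sync` function are hypothetical, and the "highest version wins" rule is an assumption made for the sketch rather than a description of any particular NoSQL system.

```python
class Replica:
    """One copy of a single data item, tagged with a version number."""

    def __init__(self):
        self.value = None
        self.version = 0

    def apply(self, value, version):
        # Accept an update only if it is newer than what is stored.
        if version > self.version:
            self.value, self.version = value, version

def sync(replicas):
    """Propagate the newest version to every replica."""
    newest = max(replicas, key=lambda r: r.version)
    for replica in replicas:
        replica.apply(newest.value, newest.version)

replicas = [Replica() for _ in range(3)]
replicas[0].apply("v1", 1)  # the write reaches one replica first

stale_reads = [r.value for r in replicas]      # replicas disagree for a while
sync(replicas)                                 # no new updates, copies converge
converged_reads = [r.value for r in replicas]
print(stale_reads, converged_reads)
```

Between the write and the sync, a read may return an old value; once updates stop and propagation finishes, every read returns the last written value.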

What is a Full Outer Join?

In SQL, a FULL OUTER JOIN combines the results of the left and right outer joins and returns all rows, matched or unmatched, from the tables on both sides of the join clause.
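
The sketch below shows which rows a full outer join returns, using Python's `sqlite3` module. As an assumption for portability, it emulates the join with a LEFT JOIN plus the unmatched right-side rows, because older SQLite versions lack the `FULL OUTER JOIN` keyword. The `employees` and `departments` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (name TEXT, dept_id INTEGER);
CREATE TABLE departments (dept_id INTEGER, dept_name TEXT);
INSERT INTO employees VALUES ('Ann', 1), ('Bob', NULL);
INSERT INTO departments VALUES (1, 'Sales'), (2, 'Legal');
""")

rows = conn.execute("""
    -- Every employee, with the department when there is a match.
    SELECT e.name, d.dept_name
    FROM employees AS e
    LEFT JOIN departments AS d ON d.dept_id = e.dept_id
    UNION ALL
    -- Plus the departments that matched no employee.
    SELECT NULL, d.dept_name
    FROM departments AS d
    LEFT JOIN employees AS e ON e.dept_id = d.dept_id
    WHERE e.dept_id IS NULL
""").fetchall()

for row in rows:
    print(row)
```

The result contains the matched pair, the employee with no department, and the department with no employee, with NULL (`None` in Python) filling the missing side.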

What is the CAP theorem?

The CAP theorem states that a distributed data store can guarantee at most two of three properties at the same time: Consistency, Availability, and Partition tolerance.

What are conformed dimensions? Why are they important?

In data warehousing, a conformed dimension is a dimension that has the same meaning to every fact with which it relates. Conformed dimensions allow facts and measures to be categorized and described in the same way across multiple facts and/or data marts, ensuring consistent reporting across the enterprise.

An _______________________ database is a database management system that primarily relies on main memory for computer data storage.

In-memory

What are major advantages/benefits of creating a star schema for a data warehouse?

• Simpler queries - star schema join logic is generally simpler than the join logic required to retrieve data from a highly normalized transactional schema. A snowflake schema, by contrast, does have normalized tables. • Simplified business reporting logic - compared with highly normalized schemas, a dimensional schema simplifies common business reporting logic, such as period-over-period and as-of reporting. • Query performance gains - a star schema can provide performance enhancements for read-only reporting applications compared with highly normalized schemas. • Fast aggregations - the simpler queries against a star schema can result in improved performance for aggregation operations. Snowflake schema queries are more complex and require longer execution time. • Feeding cubes - star and snowflake schemas are used by OLAP systems to build proprietary OLAP cubes efficiently; in fact, most major OLAP systems provide a ROLAP mode of operation that can use a denormalized schema directly as a source without building a proprietary cube structure.

