Module 4: Databases, Data Warehousing, and Data Mining (Did I Get This, Learn by Doing)

True or False. Predictive analytics is used by businesses to predict trends or behaviors from customers.

True. Predictive analytics helps to determine patterns and predict future outcomes and trends. It is a process in data mining that attempts to deduce what might happen in the future.

There are five rules that pertain to the relational database model. Which one of the following examples breaks one of the rules? The video table is sorted by production date. In the orders table, the video title field is before the video production date. The primary key of the orders table is the video title.

Video title is not a good primary key because it is possible to have two videos with the same name. A better choice would be a video ID number. Hint: The order of records and fields is not important.
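To make the primary key rule concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names (`videos`, `videos_by_title`, `video_id`, `title`) are assumptions made up for illustration; they are not part of the course material.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table keyed on an ID number: two videos may share a title.
conn.execute("CREATE TABLE videos (video_id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
conn.execute("INSERT INTO videos VALUES (1, 'Hamlet')")
conn.execute("INSERT INTO videos VALUES (2, 'Hamlet')")  # same title, different video

# Hypothetical counter-example: the title itself is the primary key.
conn.execute("CREATE TABLE videos_by_title (title TEXT PRIMARY KEY)")
conn.execute("INSERT INTO videos_by_title VALUES ('Hamlet')")
try:
    conn.execute("INSERT INTO videos_by_title VALUES ('Hamlet')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    # The database refuses a second row with the same key value.
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM videos").fetchone()[0]
```

With an ID as the key, both videos are stored; with the title as the key, the second video cannot be recorded at all.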

You are helping to design a spreadsheet table for accounting. You explain to the team that a spreadsheet table is similar to a database; the difference is that database tables have

Relationships. Relational databases have strong relationships between tables, allowing for queries that lead to information.

Big data can be structured, semistructured, or

Unstructured. Big data algorithms can handle any type of data set.

Which of the following is not a database model? Object defined Hierarchical Flat file Relational

"Object defined" is not the correct name of any database model. Simply named, an object database is a database that is used to store objects and is the basis for an object-oriented DBMS. Hierarchical is a type of database that connects tables in a master/slave format. A flat file is a type of database model and resembles a file cabinet's functionality. Relational is a type of database that connects tables of records together with complex relationships.

Which term describes the procedure in ETL (extract, transform, and load) that correlates the data with the data warehouse format? Consolidation algorithm Transfer procedure application Scrub algorithm

A consolidation algorithm correlates the data with the data warehouse format.

A data warehouse can be described or defined as a platform or capability for businesses to tap or develop business intelligence used for decision making. Describe what a data warehouse is and how it can benefit business.

A data warehouse is a collection of historical business transaction data, coupled with other, external market data or information. It serves to integrate data from multiple sources within the enterprise. The key benefit to business is that this data set can be queried and analyzed to support business decision making.

Consumers use data warehouses everyday, many times without even knowing it. Facebook is an example of a data warehouse that most consumers use on a daily basis. Which of the following best defines a data warehouse? A data warehouse is used to analyze sales data. A data warehouse is a data set of competitive pricing. A data warehouse is where big data is stored. A data warehouse is a collection of business data from multiple systems used for data analysis and mining.

A data warehouse is a comprehensive business data set from multiple systems that is analyzed and mined for business intelligence. Facebook gathers personal data about your friends, your hobbies, and what you like, and stores that data in one central repository. Hint: Large amounts of historical data are stored.

Companies rely on data to drive information and decision making. Why would a business or enterprise implement a data warehouse? A data warehouse stores legacy paper files. A data warehouse improves decision making. A data warehouse stores employee files.

A data warehouse is needed for successful business intelligence. A key reason for implementing a data warehouse is to store and access data from multiple sources to be queried and analyzed to help manage decision making.

Select the best definition to describe a database. A database is a file containing business intelligence. A database is a set of data that are organized and easily searchable through queries. A database is a computer application that has business information.

A database is a set of data, in electronic format, organized in tables and easily searchable. This is a collection of data.

Assume you are explaining the types of servers involved in databases to a group of coworkers. You note that a -------- is a dedicated computer that stores database files and database management systems.

A database server is a powerful computer that stores the databases and the DBMS used to access and administer them. Note on the other choices: A SQL server is a database server, but not a generic database server; it is a Microsoft product. A web server processes requests to distribute information over the Internet.

Which of the following terms represents a set of data, typically aligned as a row in a table? Record Table Field

A record is a set of related data typically aligned in a row. Data are organized in tables. A table is a set of related records. A record is not a field, but a set of attributes related to one item. A record contains fields.

What is a required process in implementing and maintaining a data warehouse? Partitioning Formatting Defragging

A typical relational data warehouse is made up of indexed tables that can be implemented using a partitioned approach. Relational data warehouses benefit from partitioning the data as part of implementation and maintenance. Hint: Data warehouses should be scaled down by dividing database objects into smaller pieces, so the individual objects are more manageable.

Which of the following attributes or fields would be a good choice for a primary key? Customer Name Address Student ID

A unique identifier can be made for each new student. While multiple students may have the same name, no two students will have the same student ID. A primary key must be a unique identifier.

What is a key advantage in business in having a web-based database rather than a client-server traditional database? Security of data Data integrity Access from anywhere

Access from anywhere. This is a cloud-based application. A web-based database allows users to access data from anywhere; it is useful for remote or mobile staff and to maintain database consistency.

Frank works as a database administrator and focuses a lot of time and attention on data integrity and the quality of data in the database. What are some of the desirable characteristics of data integrity that he should look for in his work? (Select all that apply.) Accuracy Duplication Consistency

Accuracy and consistency define data integrity in database design.

Which of the following terms defines a database entity? A collection of related records Something for which a business collects data A collection of related fields

An entity is an event, person, or thing for which a business collects data. Hint: Organizations collect data about their customers, products, and employees. A record is a set of related fields. A table is a set of related records.

What is one tool a business can use to map process requirements and data to develop a supporting relational database? Decision tree Visio Entity relationship diagram

An entity relationship diagram (ERD) outlines the business processes and subsequent relationships and possible data requirements. It is used for organizing data in databases. Visio is a vector tool for outlining processes; it could be used to develop the entity relationship model. A decision tree is a tool used in decision analysis.

A relational database which stores data in tables can be designed using a(n)

An entity-relationship model is the basic tool when designing a relational database.

Chris is explaining to management some ways in which an operational database can be used to improve performance and business decision making. What is a common business use of an operational database? Operational databases store statistical data for analysis. Operational databases can be used to track real time inventory. Operational database is a type of OLAP.

An operational database is an online transaction processing (OLTP) database and can be used to gather real-time transactional or operational data.

Big data has many characteristics, each with their own challenges. What is meant by "velocity" when discussing big data characteristics? Velocity is the high speed of the CPU bus. Velocity is the change measurement of big data. Velocity is the speed required to process or analyze big data.

Big data analytics must be performed at high speeds with powerful computers. This is referred to as the high-velocity requirement.

Which of the following is not a characteristic of big data? Velocity of processing Large volume Object data

Big data has three characteristics: large volume, variety, and velocity in which it can be processed through analytics. Object data is not one of them. Hint: Big data is large amounts of data.

Sarah is doing a presentation on big data at a company lunch-and-learn to help employees understand the ways IT can help the business. Which of the following is the best definition of big data? Big data is all the data a business enterprise collects and stores. Big data is another term for Big Blue, or IBM. Big data is used to describe large, complex data sets. Big data is data in the form of media objects like video.

Big data is a very large data set (structured or unstructured) that can be mined or analyzed to find new trends or relationships for business intelligence. A collection of data sets that can be mined.

Data mining, predictive analytics, and online analytic processing are all part of a category called -Select- . Business intelligence is a strategic element and helps companies gain a -Select-

Business intelligence is a broad subject that encompasses many areas of data analysis. Large businesses that gather, store, and analyze their data using analytics will gain business intelligence, which will provide a competitive advantage.

Databases are used in business to improve performance and decision making. When using entity-relationship modeling to map business processes, relationships, and data requirements, what is the main purpose of the conceptual data model, and how does this help businesses?

Conceptual data models are high-level models that map the scope of the enterprise data architecture model and help support system and documentation requirements. Besides modeling, businesses use their databases for many things, including something as simple as keeping track of basic transactions. Business intelligence tools can help with analyzing vast amounts of data from data warehouses and data marts to help run the business more efficiently. Data analysis provides managers and employees with information from multiple systems that will help them make better decisions.

What is the name of the notation used to diagram the relationship between entities? Cardinality Crow's foot Schema

Crow's foot notation is used to indicate one, one and only one, none, or many. For example, one and only one customer can buy none to many products. One sales rep can help none or many customers. Hint: Before building a database, the relationships between entities should be understood. Cardinality describes the numeric relationship between two entities, but the notation used to indicate cardinality is called crow's foot notation. A schema is the overall design of the database, tables, and relationships.

True or False. Data mining is used by businesses to analyze internal factors in order to improve business performance.

Data mining is primarily an analysis of internal business data. Think about a technique that looks at data from different perspectives.

Data warehouses can be described as repositories of data from several sources and are mostly used for -Select- and for -Select- to help management decision making.

Data warehouses are key for management reporting. Data warehouses are key for management analysis.

Which database model connects or relates different data tables using common fields called primary keys? Hierarchical Relational

Databases are tables of information, and the relational database model links these tables using common fields called primary keys. Hint: This type of database stores related information. Hierarchical is a type of database that connects tables in a master/slave format.

In database modeling and design, what is meant by database normalization? It restores lost relationships. It means adjusting data values. It means removing redundancy in conceptual model.

Databases must go through normalization in the design process to simplify complexity where possible and remove redundancy between elements. Hint: Grouping something by similar qualities or features.

There are several levels of analysis and methods. One data mining methodology is -Select- . Another more complex data mining methodology is -Select- . A third type of data mining is -Select-

Decision trees are used in data mining to generate rules and to classify data sets. Artificial neural networks are nonlinear algorithm models used in data mining. This can be described as machine learning or advanced algorithms. Rule induction is another type of data mining technique based on statistical importance.

What is the process to update or backup a data warehouse? Identify, restore, and backup Extract, transform, and load Identify, transfer, and load

ETL refers to the process of data identification, consolidation and scrubbing, formatting, and transferring to the target data warehouse.

A database administrator will encounter problems feeding the data warehouse database during the -Select- scheduled activities. Therefore, it is important to apply effective -Select- strategies.

ETL. A key challenge in maintaining a data warehouse database is loading homogeneous data from various business systems during the ETL process. Partition. To improve database manageability and performance, a database partition strategy must be considered.

________ are used to describe relationships for a database. In this example, there is one customer who orders none to many videos.

Entity-relationship diagrams, or ERDs,

True or False. For referential integrity to be enforced in a relational database, a primary key does not need a corresponding entry of equal value in the referenced table.

False. Referential integrity requires that every foreign key value in one table match an equal primary key value in the referenced table.
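A minimal sketch of referential integrity being enforced, using Python's built-in `sqlite3` module. The `customers` and `orders` tables are hypothetical examples. Note that SQLite checks foreign keys only after `PRAGMA foreign_keys = ON`; other database systems typically enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Hypothetical parent and child tables linked by customer_id.
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(customer_id))"
)

conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders VALUES (100, 1)")  # accepted: customer 1 exists

try:
    conn.execute("INSERT INTO orders VALUES (101, 99)")  # there is no customer 99
    orphan_rejected = False
except sqlite3.IntegrityError:
    # The foreign key has no matching primary key, so the row is refused.
    orphan_rejected = True

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

The rejected insert is exactly the kind of inconsistency (an order pointing at a nonexistent customer) that referential integrity prevents.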

Data warehouses are implemented and widely used in large businesses, usually with more than 1,000 employees. Their purpose is to store all transactions and analytical operations data for data mining later. Describe the process and key tasks required to operate and maintain a data warehouse.

First, one must identify the data source that has the critical data to be warehoused. Second, one must create a consolidation algorithm that also correlates the data with the data warehouse format. Third, one has to create a scrub algorithm to ensure data quality and integrity. Finally, a transfer procedure application is run to update the data warehouse database.
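The four steps above can be sketched in a few lines of Python. Everything here is an illustrative assumption: the two source systems, their field names, the warehouse format (`name`, `amount`), and the helper functions `consolidate`, `scrub`, and `transfer` are hypothetical, and a plain list stands in for the warehouse database.

```python
# Step 1 (identify): hypothetical rows from two source systems with different field names.
crm_rows = [{"CustomerName": " Ada ", "Total": "10.50"}, {"CustomerName": "", "Total": "3"}]
pos_rows = [{"customer": "Grace", "amount": 7}]


def consolidate(row):
    """Step 2: map a source row onto the assumed warehouse format (name, amount)."""
    name = row.get("CustomerName", row.get("customer"))
    amount = row.get("Total", row.get("amount"))
    return {"name": name, "amount": amount}


def scrub(row):
    """Step 3: return a cleaned row, or None if it fails the quality check."""
    name = (row["name"] or "").strip()
    if not name:
        return None  # reject rows without a customer name
    return {"name": name, "amount": float(row["amount"])}


def transfer(rows, warehouse):
    """Step 4: append the clean rows to the warehouse (a list in this sketch)."""
    warehouse.extend(rows)


warehouse = []
consolidated = [consolidate(r) for r in crm_rows + pos_rows]
clean = [r for r in map(scrub, consolidated) if r is not None]
transfer(clean, warehouse)
```

The row with an empty customer name is dropped by the scrub step, so only two of the three source rows reach the warehouse.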

Horizontal and vertical partitioning are the two most common processes in partitioning databases. What is horizontal partitioning? Horizontal partitioning restricts the columns to replicate in a back up process. Horizontal partitioning transforms the table from two dimensions to three dimensions. Horizontal partitioning restricts the rows to replicate in a back up process.

Horizontal partitioning in a relational database deals with rows, not columns.
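As a rough illustration of the row-versus-column distinction, the sketch below partitions a small in-memory table both ways. The `orders`-style rows and the choice of `year` as the partitioning column are assumptions for the example only.

```python
# Hypothetical table held as a list of row dictionaries.
rows = [
    {"order_id": 1, "year": 2022, "total": 10.0},
    {"order_id": 2, "year": 2023, "total": 25.0},
    {"order_id": 3, "year": 2023, "total": 5.0},
]

# Horizontal partitioning: split the table by rows (here, by the "year" value).
# Every partition keeps all of the columns.
horizontal = {}
for row in rows:
    horizontal.setdefault(row["year"], []).append(row)

# Vertical partitioning, for contrast: keep every row but only some columns.
vertical = [{"order_id": r["order_id"], "total": r["total"]} for r in rows]
```

Each horizontal partition holds a subset of the rows with the full set of columns, while the vertical partition holds all rows with a subset of the columns.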

In contrast with a -Select- , a -Select- uses a flexible model in which data are distributed among several machines, often in a cloud-computing format.

In a machine-based relational database, the data are stored on one central machine and not multiple machines. A web- or cloud-based database uses a flexible model in which data are distributed among several machines, often in a cloud-computing format.

When a user visits a web page, web server log files can determine the page a user visits, specific search requests, and possibly, the intention of the user. Assume you work in marketing and want to describe how this information could be collected and how e-commerce companies can leverage this information. What would you say?

In web terms, when a user visits a website, he is following a track or path via clicks. These clicks are recorded and saved in web server log files. A clickstream is an application that records the location on the screen in which a user clicks while browsing the Internet. These data, if dumped into a database, can be "mined" for information, which could lead to the discovery of user intent. This user intent could then be used for targeted marketing. Identifying which pages are visited, how long a user stays on a page, and which pages are subsequently visited is also helpful, as is abandoned cart information.
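A minimal sketch of how such clickstream data might be mined once it has been loaded from the log files. The record layout `(session_id, page)` and the page paths are simplified assumptions; real web server logs carry more fields (timestamps, referrers, and so on).

```python
from collections import Counter

# Hypothetical, simplified clickstream records: (session_id, page visited).
clicks = [
    ("s1", "/home"), ("s1", "/videos/42"), ("s1", "/cart"),
    ("s2", "/home"), ("s2", "/videos/42"), ("s2", "/cart"), ("s2", "/checkout"),
]

# Which pages are visited, and how often.
page_views = Counter(page for _, page in clicks)

# Reconstruct each visitor's path through the site.
pages_by_session = {}
for session_id, page in clicks:
    pages_by_session.setdefault(session_id, []).append(page)

# Abandoned carts: sessions that reached the cart but never checked out.
abandoned = sorted(
    s for s, pages in pages_by_session.items()
    if "/cart" in pages and "/checkout" not in pages
)
```

Here session `s1` is flagged as an abandoned cart, which is the kind of signal a marketing team could use for targeted follow-up.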

Which of the following is an effective use of predictive analysis? An insurance company is analyzing medical records to determine malpractice rates. A local restaurant has decided to open another location across town. A fitness center had a decrease in profits last year, so they are increasing customer membership fees.

Insurance companies rely on predictive analysis to determine how to charge customers.

Data mining is sometimes referred to as ___________. deep dive knowledge discovery big data

Knowledge discovery is data mining or extracting useful information from data.

Predictive analytics of multidimensional data can lead to better decisions and lower costs. What is a multidimensional database? An online transaction processing database A relational database that is structured as a cube A flat file database

Multidimensional databases are typically relational databases that store data in a cube, which contains aggregated data related to all dimensions. They allow businesses to visualize data assets across multiple dimensions to see what is happening in real time.
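One way to picture a cube of aggregated data is as a lookup keyed by one value per dimension, with a special "ALL" value meaning "summed over this dimension". The sketch below builds such a structure in plain Python; the sales facts, the three dimensions, and the "ALL" convention are illustrative assumptions, not how any particular OLAP product stores its data.

```python
from collections import defaultdict

# Hypothetical sales facts: (region, product, quarter, amount).
facts = [
    ("East", "Video", "Q1", 100),
    ("East", "Video", "Q2", 150),
    ("West", "Video", "Q1", 80),
    ("West", "Game", "Q1", 40),
]

cube = defaultdict(int)
for region, product, quarter, amount in facts:
    # Add the amount to every cell this fact belongs to: for each dimension,
    # either its specific value or the "ALL" roll-up.
    for r in (region, "ALL"):
        for p in (product, "ALL"):
            for q in (quarter, "ALL"):
                cube[(r, p, q)] += amount

total_sales = cube[("ALL", "ALL", "ALL")]
east_sales = cube[("East", "ALL", "ALL")]
video_q1_sales = cube[("ALL", "Video", "Q1")]
```

Because the aggregates are precomputed, questions such as "total Video sales in Q1 across all regions" become a single lookup.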

________ is a process or practice to eliminate data redundancy.

Normalization eliminates data redundancy.
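A small sketch of what eliminating redundancy looks like in practice. The flat `flat_orders` rows and the resulting `customers` and `orders` structures are hypothetical; the point is only that the customer's city is stored once and then referenced by an ID.

```python
# Hypothetical denormalized rows: the customer's city repeats on every order.
flat_orders = [
    {"order_id": 1, "customer": "Ada", "city": "London", "video": "Hamlet"},
    {"order_id": 2, "customer": "Ada", "city": "London", "video": "Macbeth"},
    {"order_id": 3, "customer": "Grace", "city": "New York", "video": "Hamlet"},
]

customers = {}  # customer name -> {"customer_id": ..., "city": ...}
orders = []     # each order refers to a customer by ID only

for row in flat_orders:
    if row["customer"] not in customers:
        # Store each customer's details exactly once.
        customers[row["customer"]] = {
            "customer_id": len(customers) + 1,
            "city": row["city"],
        }
    orders.append({
        "order_id": row["order_id"],
        "customer_id": customers[row["customer"]]["customer_id"],
        "video": row["video"],
    })
```

After the split, changing a customer's city means updating one entry instead of every order that customer has placed.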

Which of the following is considered an efficient database backup strategy? Reducing volume of data for regular backups Distributing database Backing up entire database

Not all data in a database changes all the time; historical data, for example, does not. It is therefore good practice to reduce the volume of data that is backed up on a regular basis.

When designing a database, you must consider its use or function. One type of database is OLAP. What does it stand for and what is it used for? OLAP stands for online analytical processing. The records are transactional and manipulated by users. OLAP stands for online analytical processing and are used for decision making and can be modified or deleted by users. OLAP stands for online analytical processing and are databases used to assist decision making.

OLAP stands for online analytical processing. OLAP databases are analytical, fixed databases containing data that are used to assist decision making.

What kind of database management system enables businesses to create new records and update and delete records, providing real time information for decision making? Online transaction processing Object oriented Hierarchical Flat file

OLTP stands for "online transaction processing" and is a database design used for browsing and manipulating business transaction activity to enable real time business analysis of records to help in business decision making. An information system used for data entry.

True or False. Relational database systems give organizations the ability to analyze data to make better and faster managerial decisions

One key benefit to databases is that they provide businesses an ability to perform data analytics. Hint: In a database, data are stored once.

Which of the following examples represents a one-to-one type of relationship between entities? Each person has a social security number. A customer orders three videos. College classes have students.

One social security number is assigned to only one person; each person only has one social security number. Hint: One-to-many means one instance of the first entity can relate to several instances of the second entity.

Relational database systems are valuable to organizations because ________. they use SQL, which is easy to master, making it a very productive tool they are inexpensive systems, which make it more affordable for businesses their design and implementation requirements are minimal

Users are easily trained in SQL and therefore become very productive in a short period of time. Hint: Relational databases allow for complex queries to be carried out easily.

When considering data mining and its benefits to business, what is required for an enterprise today to be competitive and gain business intelligence from day-to-day operations? Perform analytics on enterprise data stored in data warehouses. Hire consultants to provide business intelligence. Implement the office of chief business intelligence officer.

Perform analytics on enterprise data stored in data warehouses. A large business or enterprise must implement a data warehouse application and perform data mining or analytics to gain business intelligence.

________ is the process of establishing relationships between tables using keys.

Referential integrity establishes relationships between objects in different tables.

As a database administrator, you must avoid inconsistencies, which cause errors and integrity issues. Database inconsistencies are avoided by implementing what concept? Referential integrity Flowcharting Planning

Referential integrity stops database inconsistency by creating relationships between tables using primary and foreign keys.

If you were setting up a new database, which of the following items would represent "objects" in your database? Schema Reports Mirror

Reports are objects in a database. A schema is the definition of the database. A mirror is a backup or distributed database snapshot, not an object.

Which item below is NOT a database model? Object Structured Relational

Structured is not a database model. Relational and object databases are two types of database models.

What term is used to describe the properties of data components? Relationships Entities Attributes

The attributes are the properties of the data components. The relationship describes how the entities are related. Entities are the actual data objects.

We hear about "Business Intelligence," "Big Data," "Analytics," and "Data Mining" often. The business community has been analyzing data—operational data, analytical data, financial data, market data—for many years. Describe the process of data mining.

The business world has been analyzing operations and markets to increase profitability, that is, to increase revenues or reduce cost. The foundation of any analysis is the collection of large amounts of data or facts. When those data are analyzed or mined, patterns and relationships emerge, the analysis of which leads to information. This information can lead to the discovery of trends or insights, which can be leveraged by business.

One goal of a database administrator is to have a database that has the same entities and values in one or more tables. The database should go through what process to optimize? Normalization Equalization Duplication

The process of normalization eliminates first-pass design redundancies that could create query and reporting problems in a database. Creating objects and identifying relationships in a database.

There are many types of data warehouses with many different applications. Which of the following item is not a data warehouse application? Transaction processing Information processing Data mining Analytical processing

The data in a data warehouse system comes from business transactions and other sources. Transaction processing involves running fundamental tasks used to run the business in an operational database. Data warehouses allow data to be stored and processed using queries and statistical analysis. Data mining is a data warehouse application that finds hidden patterns and associations to help with future predictions. Data warehouses support analytical processing. The data can be analyzed using OLAP.

Why does the ETL process run in parallel and not in sequential order? The ETL process to update a data warehouse runs in parallel because the extraction step is lengthy. The ETL process to update a data warehouse runs in parallel due to multicore CPU. The ETL process to update a data warehouse runs in parallel to minimize disruption.

The extraction process in ETL can be lengthy, so data already extracted can be processed and loaded.
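A minimal sketch of this overlap, assuming a single extractor thread and a single transform-and-load thread connected by a queue. The record values, the `extract` and `transform_and_load` function names, and the upper-casing "transform" are placeholders; a real ETL tool would read from source systems and write to the warehouse.

```python
import queue
import threading


def extract(out_q):
    """Put each extracted record on the queue as soon as it is available."""
    for record in ["a", "b", "c"]:  # stand-in for a slow read from a source system
        out_q.put(record)
    out_q.put(None)  # sentinel: extraction has finished


def transform_and_load(in_q, warehouse):
    """Process records as they arrive instead of waiting for extraction to end."""
    while (record := in_q.get()) is not None:
        warehouse.append(record.upper())  # placeholder transform, then load


q = queue.Queue()
warehouse = []
producer = threading.Thread(target=extract, args=(q,))
consumer = threading.Thread(target=transform_and_load, args=(q, warehouse))
producer.start()
consumer.start()
producer.join()
consumer.join()
```

The consumer starts working on the first record while later records are still being extracted, which is the benefit of running the stages in parallel.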

Big data is in the news a lot, specifically when it involves issues of personal privacy. Big data is a collection of an enormous amount of unrelated, raw data that is complex and difficult to work with using traditional database management tools. As a result, big data is usually associated with predictive analytics. Describe the objective of predictive analytics.

The main objective of predictive analytics is to offer statistical or probability trends that can influence business practices or business strategic decision making. Large businesses must create predictive models to ensure a competitive advantage. This requires the collection of large amounts of data along with data mining techniques, such as predictive analytics.

Relational database models were first developed in the 1970s. They have become the most widely used database model in business, with leading products from Oracle, Sybase, and others, including Microsoft. Can you explain why this type of database model has become so widely used and how you might use it in your business?

Relational database models are widely used in business applications because of their ability to support transactions. They store data in tabular form; they use SQL, which offers ease of use; they have data relationships between tables; and they minimize data redundancy. They dominate the market because of data reuse, normalization, separation of concerns, maturity, and momentum.

Relational database models were first developed in the 1970s. They have become the most widely used database model in business. Which one of the following traits is NOT one of the benefits leading to widespread use of relational databases? Tabular format Use SQL language Tree-like structure

Tree-like structure is a hierarchical database characteristic, not a relational database characteristic. The SQL language is one reason why relational databases are common, because it is easy to use. Tabular format is another reason why relational databases are so common.

"Big data" has become a term that is all around us, and so is the term "the cloud." When we hear the term "big data,''' we also hear business intelligence, analytics, trend analysis, discoveries, etc. Please describe what is meant by big data and what big data offers to the business community.

Big data can be thought of as a very large depository or distributed data set that is too complex or too big for standard computers and applications to handle for analysis or processing. The business community can gain insights from analyzing big data. The insight gained can lead to the exploitation of new trends, new innovations or discoveries, or new predictions that can be leveraged in business strategies.

There are many data management challenges created by big data. Which of the following items is NOT one of those challenges? Cost Speed Variety Size

Cost. Big data technologies and cloud-based analytics can provide substantial cost advantages.

The decision to implement a relational database system depends mostly on two factors. The first factor to consider is the -Select- . The second factor to consider when implementing a relational database is -Select-

Data. The amount of information in the form of data is one key determinant in choosing to implement a relational database. Users. The second determinant is the number of users of the information in an organization.

It is critical to build a data integrity check algorithm to ensure the quality of data in a database before the -Select- phase in the ETL process.

Load. Data from the various sources that feed the data warehouse database will cause data integrity and quality issues if not checked before the load phase in the ETL process.

Because data warehouses consist of data from -Select- systems with different formats, the ETL process must be carefully designed to ensure data -Select- during the load process

Several. Data warehouses contain data sets from several databases or application systems. Integrity. Data integrity means data is accurate and consistent.

