D075 - Information Technology - Unit 3 Modules 4 - 6

A data analyst is using the ETL process to enter data into the company's relational database. The data contain many redundancies. Which process transforms the data into an accurate, clean, and error-free form?

Normalization

The data are full of missing, misplaced, or duplicate data, which the data analyst needs to remove. Which process can this data analyst use to remove such data?

Normalization

Freedom Rock Bicycles found that it has multiple records due to multiple sales to the same individual. This has caused access problems when looking for specific customers. What can Freedom Rock Bicycles do to alleviate this problem?

Normalize the database

Load

Once data are transformed, they are ready to finally be transferred into the data warehouse and data mart. The more often this is done, the more up-to-date analytic reports can be. The timing and frequency of the load must be known by the managers so that they know how up to date their data are.

Extract

Once you have determined where your data are coming from and where you want them to reside, you can start extracting. The data are often extracted from customer relationship management (CRM) or enterprise resource planning (ERP) systems.

Transform

Once you have extracted data, they need to be transformed to fit into the database table. This may involve removing decimals and dollar signs from financial transactions so that the values fit into the structured data table.
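The three ETL cards above can be tied together with a minimal sketch. It assumes a hypothetical CRM export held in a string (`CRM_EXPORT`) and an in-memory SQLite database standing in for the data warehouse; the table and function names are illustrative only, not part of the course material.

```python
import csv
import io
import sqlite3

# Hypothetical CRM export; a real extract would read from the source system.
CRM_EXPORT = 'customer,amount\nAna Ruiz,"$1,200.00"\nBen Cho,$35.50\n'


def extract(text: str) -> list[dict]:
    """Extract: read raw rows from the source export."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_amount(raw: str) -> int:
    """Turn a currency string such as "$1,200.00" into whole cents."""
    # Drop the dollar sign and thousands separators, then remove the decimal
    # point by storing the value as an integer number of cents.
    digits = raw.strip().replace("$", "").replace(",", "")
    dollars, _, cents = digits.partition(".")
    return int(dollars) * 100 + int((cents + "00")[:2])


def transform(rows: list[dict]) -> list[tuple[str, int]]:
    """Transform: reshape each row to match the target table's columns."""
    return [(row["customer"].strip(), clean_amount(row["amount"])) for row in rows]


def load(conn: sqlite3.Connection, rows: list[tuple[str, int]]) -> None:
    """Load: write the transformed rows into the warehouse table."""
    conn.executemany("INSERT INTO sales (customer, amount_cents) VALUES (?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")  # stand-in for the data warehouse
warehouse.execute("CREATE TABLE sales (customer TEXT, amount_cents INTEGER)")
load(warehouse, transform(extract(CRM_EXPORT)))
loaded = warehouse.execute(
    "SELECT customer, amount_cents FROM sales ORDER BY rowid"
).fetchall()
```

Storing amounts as integer cents is one way to "remove decimals and dollar signs"; a real pipeline might choose a decimal column type instead.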

Freedom Rock Bicycles has installed a new database to keep track of sales by region, salesperson, product, and sales prices, among other items, to keep information on business processes. After installation, Freedom Rock Bicycles discovered additional benefits of the information accumulated in the database and how the information can improve the business profitability and future plans. Freedom Rock Bicycles found that daily reports could be run using SQL to indicate product and parts inventory and sales for the day. Which approach can be used to retrieve this information efficiently?

Online analytical processing (OLAP)

What is "C2:F2" called in the formula =SUM(C2:F2)?

Range

Database Management Systems (DBMS)

Software systems where data are stored in computer files called tables, and tables are connected to other tables with related information—hence, a relational database. Within a business, databases are created and made available to multiple users on the network and are secured to ensure the accuracy of the data and to prevent access by unauthorized users.

Normalization

Sorting through structured, unstructured, and semi-structured data, organizing the data into the fields and records of a relational database, and removing duplicate data completes the normalization process. Normalization increases the reliability and consistency of the data. Once all the data are collected, scrubbed, and normalized, the data can then be analyzed. These are all important functions of master data management and data governance, which help ensure that an organization's data retains its value to an organization.
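As a rough illustration of removing duplicate data during normalization, the sketch below splits a flat list of sales (where the same customer is repeated on every sale) into a customer table stored once and a sales table that points back to it. The sample rows and field names are made up for the example.

```python
# Flat data: the customer's details are repeated on every sale.
flat_rows = [
    {"customer": "Ana Ruiz", "email": "ana@example.com", "item": "Helmet"},
    {"customer": "Ana Ruiz", "email": "ana@example.com", "item": "Road bike"},
    {"customer": "Ben Cho", "email": "ben@example.com", "item": "Pump"},
]

customers = {}  # email -> customer record, stored exactly once
sales = []      # one row per sale, referring to the customer by id

for row in flat_rows:
    key = row["email"]
    if key not in customers:
        # First time this customer is seen: create a single record.
        customers[key] = {
            "id": len(customers) + 1,
            "name": row["customer"],
            "email": key,
        }
    # Each sale keeps only a reference to the customer, not a copy.
    sales.append({"customer_id": customers[key]["id"], "item": row["item"]})
```

After the loop there are two customer records and three sales, so looking up a specific customer no longer returns one record per purchase.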

What level of security protects the hardware that the database resides on?

System level security

Which level of security is required to protect the hardware and communications equipment that support a database?

System-level security

A large clothing retailer has an extremely active social media account and hundreds of thousands of followers who create millions of comments per year. Once a quarter, the marketing department offers a major discount to its followers on social media. The company wants to know the return on investment (ROI) of the discounts and store the information in its data warehouse. A data analyst in the IT department is given the task of extracting, processing, and analyzing the data. The data are currently unstructured. The data will be used to create information that will be given to the CIO to inform future decisions. A business analyst wants to use the social media data to create and present business intelligence. She will create visualizations that will be used by the executive team. Which tool is appropriate for creating and presenting this business intelligence?

Tableau

A data analyst wants to search through unstructured data from social media posts to look for useful customer behavior patterns and sentiments. Which type of analytics is appropriate for this task?

Text analytics

$C3

The column reference ($C) is absolute and will remain constant when copied and pasted to other cells.

C$3

The column reference (C) is relative and will change when copied and pasted to cells in other columns of the worksheet.

Veracity

The data must also be of high quality and trustworthy. Do the data represent what you believe they should? Are there discrepancies within the data that must be scrubbed to make the data worthwhile and valuable?

Volume

The main characteristic of big data is that it is big, and the remarkable volume it takes just to hold and manage digital data requires significant resources. The total volume of big data is growing exponentially because the sources that are producing big data are ever increasing.

Schema

The organization or layout of a database that defines the tables, fields and constraints, keys, and integrity of the database. It is the reference or blueprint of the database.

Velocity

This is the accelerating speed at which data are produced over a given time period. Streaming applications such as Amazon Web Services and Netflix are examples of high-velocity data, as are the data your cell phone generates each minute it is turned on.

Record

Row in a table

Big Data

The vast amounts of data that organizations want are not in readily available, neat, and tidy database tables. Data come from everywhere (including smartphone metadata, internet usage records, social media activity, computer usage records, and countless other data sources) to be sifted for patterns and trends. These large and expansive collected data sets are called big data.

Field

A column in a table.

Relational Database

A group of database tables that is connected or linked by a defined relationship that ties the information together.

logical_test

A test or logical comparison of value that is either TRUE or FALSE.

In the following "IF" statement, what is "C2>B2" called? =IF(C2>B2,"Over Budget","Under Budget")

Argument
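For readers more familiar with code than spreadsheets, the IF formula in the card above can be mirrored with a small Python function. This is only an analogy: `excel_if` is a hypothetical helper, and the values assigned to `c2` and `b2` are made-up cell contents.

```python
def excel_if(logical_test: bool, value_if_true, value_if_false):
    """Mimic =IF(logical_test, value_if_true, value_if_false)."""
    return value_if_true if logical_test else value_if_false


# Made-up cell values: C2 holds actual spending, B2 holds the budget.
c2, b2 = 1200, 1000

# "c2 > b2" plays the role of the logical_test argument.
status = excel_if(c2 > b2, "Over Budget", "Under Budget")
```

Each of the three comma-separated items inside the parentheses is an argument; the first one is the comparison that evaluates to TRUE or FALSE.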

Why is it important to conduct data hygiene practices?

Because data become decayed and outdated

C3

Both the column and row references are relative and will change when the reference is copied and pasted to other cells.

$C$3

Both the column and the row references are absolute and will remain constant when the reference is copied and pasted to other cells.
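The four reference styles in this set (C3, $C3, C$3, and $C$3) can be compared with a small simulation of what happens when a formula is copied. This is a simplified sketch, not how a spreadsheet is implemented: `copy_reference` is a hypothetical helper and it assumes single-letter columns and small moves that stay within A to Z.

```python
import re


def copy_reference(ref: str, rows_down: int, cols_right: int) -> str:
    """Return the reference as it would appear after copying the cell."""
    match = re.fullmatch(r"(\$?)([A-Z])(\$?)(\d+)", ref)
    if match is None:
        raise ValueError(f"unsupported reference: {ref!r}")
    col_abs, col, row_abs, row = match.groups()
    if not col_abs:
        # No "$" before the column letter: the column is relative and shifts.
        col = chr(ord(col) + cols_right)
    if not row_abs:
        # No "$" before the row number: the row is relative and shifts.
        row = str(int(row) + rows_down)
    return f"{col_abs}{col}{row_abs}{row}"


# Copy each style one row down and one column to the right.
results = {ref: copy_reference(ref, 1, 1) for ref in ["C3", "$C3", "C$3", "$C$3"]}
```

In `results`, only the parts without a dollar sign change: C3 becomes D4, $C3 becomes $C4, C$3 becomes D$3, and $C$3 stays $C$3.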

An analyst uses software to analyze data in the company's data warehouse and produce information presented in understandable charts and graphs on a dashboard. This information is used to inform decisions in the organization. Which software is used to conduct this data mining and discovery for presentation of the information?

Business intelligence software

A large clothing retailer has an extremely active social media account and hundreds of thousands of followers who create millions of comments per year. Once a quarter, the marketing department offers a major discount to its followers on social media. The company wants to know the return on investment (ROI) of the discounts and store the information in its data warehouse. A data analyst in the IT department is given the task of extracting, processing, and analyzing the data. The data are currently unstructured. The data will be used to create information that will be given to the CIO to inform future decisions. A company's data warehouse is time consuming to manage. During a discovery, the data analyst determines specific departments only used 20 percent of the data warehouse capacity. How can this company reduce the time associated with data management and better support the needs of individual departments?

By switching to a data mart

A large clothing retailer has an extremely active social media account and hundreds of thousands of followers who create millions of comments per year. Once a quarter, the marketing department offers a major discount to its followers on social media. The company wants to know the return on investment (ROI) of the discounts and store the information in its data warehouse. A data analyst in the IT department is given the task of extracting, processing, and analyzing the data. The data are currently unstructured. The data will be used to create information that will be given to the CIO to inform future decisions. A data analyst wants to use software to look for useful patterns and hidden relationships in this large set of social media data. Which process can be used to look for these patterns and relationships?

Data Mining

Variety

Data come from both structured and unstructured sources and in various forms. Structured data are data that you can easily recognize and that fit easily into a relational database. Unstructured data come from more fragmented sources; they contain valuable information but do not fit a fixed form.

Which term refers to managing the availability, integrity, and security of an organization's data to ensure that the data remain high quality and valid?

Data governance

data hygiene process

Part of data governance: maintaining clean and up-to-date data. Having clean data starts when the database is created, by including database field (column) controls called validity checks.
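One common way to express field-level validity checks is a SQL CHECK constraint. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the table, the very loose email pattern, and the age range are assumptions chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        -- Validity check: require something on both sides of an "@".
        email TEXT NOT NULL CHECK (email LIKE '%_@_%'),
        -- Validity check: reject impossible ages.
        age INTEGER CHECK (age BETWEEN 0 AND 120)
    )
    """
)


def try_insert(email: str, age: int) -> bool:
    """Return True if the row passes the validity checks and is stored."""
    try:
        conn.execute("INSERT INTO customers (email, age) VALUES (?, ?)", (email, age))
        return True
    except sqlite3.IntegrityError:
        # The database refused the row, so bad data never enters the table.
        return False


accepted = try_insert("ana@example.com", 34)
rejected_email = try_insert("not-an-email", 34)
rejected_age = try_insert("ben@example.com", 200)
```

Checks like these stop bad values at entry time, which is cheaper than scrubbing them out later.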

Which term refers to looking for patterns, trends, and relationships between data so organizations can try to determine connections and outliers?

Data mining

A large clothing retailer has an extremely active social media account and hundreds of thousands of followers who create millions of comments per year. Once a quarter, the marketing department offers a major discount to its followers on social media. The company wants to know the return on investment (ROI) of the discounts and store the information in its data warehouse. A data analyst in the IT department is given the task of extracting, processing, and analyzing the data. The data are currently unstructured. The data will be used to create information that will be given to the CIO to inform future decisions. What can the data analyst use to retrieve and process the data to put into the data warehouse?

ETL

Which restriction applies to the data in the primary key field of a database?

Each key must be unique
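The uniqueness rule can be seen in action with a short sketch using Python's built-in sqlite3 module. The `customers` table and its rows are hypothetical; the point is only that a second row with the same primary key value is refused.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Ana Ruiz')")

try:
    # Same primary key value as the existing row: this must fail.
    conn.execute("INSERT INTO customers VALUES (1, 'Ben Cho')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
```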

1. Vast amount of data that are collected 2. Determining what is meaningful and what is not 3. Data from multiple sources 4. Accessibility 5. Budgets 6. Lack of Skills 7. Presentation of information 8. Security and Privacy

Eight challenges of data analytics and process management

ETL

Extract, Transform, Load

True or False: Written posts extracted from a social media feed are generally classified as structured data.

False

A database manager is setting up a relational database. The manager creates a field in a database table that provides a link between two tables in a relational database. What is this field called?

Foreign key

The data analyst at Freedom Rock Bicycles wants to link two tables in the database. What is the name of the field in a database table that is used to connect or link two tables together?

Foreign key
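A minimal sketch of a foreign key linking two tables, using Python's built-in sqlite3 module. The `customers` and `sales` tables are invented for the example; `sales.customer_id` is the foreign key that refers to the primary key of `customers`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE sales (
        sale_id INTEGER PRIMARY KEY,
        -- Foreign key: each sale points at exactly one customer.
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        item TEXT NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Ana Ruiz'), (2, 'Ben Cho');
    INSERT INTO sales VALUES (10, 1, 'Helmet'), (11, 1, 'Road bike'), (12, 2, 'Pump');
    """
)

# The foreign key is what lets a query join the two tables together.
rows = conn.execute(
    """
    SELECT c.name, s.item
    FROM sales AS s
    JOIN customers AS c ON c.customer_id = s.customer_id
    ORDER BY s.sale_id
    """
).fetchall()
```

Because the customer's name is stored once and referenced by id, multiple sales to the same person do not create multiple customer records.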

Excel Arguments

The values inside the parentheses of a function.

What are examples of return on investment (ROI) based on qualitative investments?

Investing in network security to protect the network; Purchasing business intelligence (BI) software

Freedom Rock Bicycles is considering adopting a cloud-database system. What is the primary advantage of a cloud database?

It can be accessed from anywhere there is an internet connection

Which valuable business intelligence (BI) can Freedom Rock Bicycles acquire from the database that directly relates to overall revenue and profitability based on this scenario?

Knowing which products sell the best in each region

Data level security

This is where businesses implement processes to protect the actual data from being stolen or tampered with on the database computers. One method of securing the information is database encryption (encrypting the data so that only those with authorized access can decrypt them).

System level security

This means protecting the hardware that the database resides on and other communications equipment from malicious software that tries to enter the system. Firewalls and other network security systems prevent unauthorized intrusion into the system. Corporations fear a ransomware attack where their systems are invaded and ransom software is installed on their database systems so that it cannot be accessed unless a ransom is paid and a key is provided by the malicious person.

[value_if_true]

Used to specify the result of the IF function if the Boolean expression result is TRUE.

User level security

User level security starts with log-on IDs and passwords but can go much further in verification to restrict the user from visiting unauthorized websites or downloading from untrusted sources. Network administrators, along with corporate policies, define who has access to what, and what type of access that can be (read, write, etc.).

Which attribute of big data relates to whether the data are structured or unstructured?

Variety

data governance

concerned with data policies, data procedures, and standards to govern business-critical data. Basically, that means securing data so that they remain confidential and safe

Conditional Functions

allow you to perform calculations where the cell references used to complete the calculations depend on the values of other cells in a worksheet. The following table shows a summary of useful conditional functions.

Hadoop

an infrastructure for storing and processing large sets of data across multiple servers. Instead of centralized files in one place like a data warehouse or data mart, Hadoop uses a distributed file system that allows files to be stored on multiple servers. Unlike storing centralized data, Hadoop attempts to identify data files on other servers. Hadoop is flexible enough that it allows for one query to be issued that searches through multiple servers. Typically, Hadoop needs a highly qualified data scientist to run it. Right now, Hadoop is best for large companies such as Uber, Airbnb, and Spotify that create terabytes and petabytes of data every day.

SQL (Structured Query Language)

an international standard language for processing a database

Data Analysis

applying statistics and logic techniques to define, illustrate, and evaluate data. Simply stated, data analysis attempts to make sense of an organization's collected data, turn those data into useful information, and validate the organization's future decisions (like what product to sell or whom the organization should hire).

Business analytics

attempts to make connections between data so organizations can try to predict future trends that may give them a competitive advantage. Business analytics can also uncover computer system inadequacies within an organization. The following are forms of business analytics

Predictive analytics

attempts to reveal future patterns in a marketplace, essentially trying to predict the future by looking for data correlations between one thing and any other things that pertain to it.

Decision analytics

builds on predictive analysis to make decisions about future industries and marketplaces. Decision analytics looks at an organization's internal data, analyzes external conditions like supply abundance, and then endorses the best course of action.

data management (DM)

consists of the practices, architectural techniques, and tools for achieving consistent access to and delivery of data across the spectrum of data subject areas and data structure types in the enterprise.

The Data Management Process

defined as acquiring data, making sure the data are valid, and then storing and processing the data into usable information for a business.

Topic analytics

enables you to sift through large sets of data and identify the most common and most important topics in an easy, fast, and scalable way. For example, if a customer said, "the barista was friendly," that would be categorized under the topic "Employee Friendliness."
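The barista example can be approximated with a very simple keyword lookup. Real topic analytics tools are far more sophisticated; this sketch only illustrates the idea of mapping comments to topics. The keyword lists and the "Wait Time" topic are assumptions made up for the example.

```python
# Hypothetical topic definitions: topic name -> trigger words.
TOPIC_KEYWORDS = {
    "Employee Friendliness": {"friendly", "rude", "helpful"},
    "Wait Time": {"wait", "slow", "line"},
}


def tag_topics(comment: str) -> list[str]:
    """Return the topics whose keywords appear in the comment."""
    # Lowercase each word and strip surrounding punctuation before matching.
    words = {word.strip(".,!?\"'").lower() for word in comment.split()}
    return sorted(
        topic for topic, keywords in TOPIC_KEYWORDS.items() if words & keywords
    )


barista_topics = tag_topics("The barista was friendly.")
```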

Online Analytical Processing (OLAP)

end users can submit queries against the database to gain insight into data relations such as trend analysis and also to create data models that guide future decisions. A common use of OLAP is the creation of what-if scenarios for budgeting and forecasting.
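Dedicated OLAP tools work on multidimensional data, but the kind of question they answer can be hinted at with an ordinary aggregate query. The sketch below is a simplified stand-in, not an OLAP engine: it totals made-up sales by two dimensions (region and product) using Python's built-in sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('East', 'Road bike', 900),
        ('East', 'Helmet', 60),
        ('West', 'Road bike', 1800),
        ('West', 'Helmet', 30),
        ('East', 'Road bike', 900);
    """
)

# Summarize sales along two dimensions at once: region and product.
totals = conn.execute(
    """
    SELECT region, product, SUM(amount)
    FROM sales
    GROUP BY region, product
    ORDER BY region, product
    """
).fetchall()
```

A report like this shows which products sell best in each region, which is the sort of insight the Freedom Rock Bicycles cards describe.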

Summary Statistics

give a quick overall picture of the data.
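A short sketch of typical summary statistics, computed with Python's standard statistics module. The `daily_sales` figures are invented sample data.

```python
import statistics

daily_sales = [120, 95, 130, 110, 95]  # made-up units sold per day

summary = {
    "count": len(daily_sales),
    "min": min(daily_sales),
    "max": max(daily_sales),
    "mean": statistics.mean(daily_sales),      # arithmetic average
    "median": statistics.median(daily_sales),  # middle value when sorted
    "mode": statistics.mode(daily_sales),      # most frequent value
}
```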

Master Data Management (MDM)

is a methodology or process used to define, organize, and manage all the data of an organization that provides a reference for decision-making. Master data management tools can be used to support master data management by removing duplicates, standardizing data, and incorporating rules to prevent incorrect data from entering the system, thus creating an accurate source of master data.

Business intelligence

is a set of software and services that turn data into information that helps leaders in an organization make wise decisions. Very often the data in a data warehouse or data mart contain insights that are not easily discernible without advanced computing power and tools. The term also refers to the assortment of software applications used to analyze an organization's raw data, that is, computer applications that change data into significant, meaningful information that helps organizations make better decisions. Keep in mind that data are raw, unorganized facts, and information is essentially processed data that have meaning.

Descriptive analytics

is the baseline that other types of analytics are built on. Descriptive analytics defines past data you already have that can be grouped into significant pieces, like a department's sales results, and starts to reveal trends. This is categorizing the information.

Data mining

is the examination of huge sets of data to find patterns and connections and identify outliers. Data mining also provides insight into relationships that the user may not recognize but that are useful as information.
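Identifying outliers is one small piece of data mining that is easy to sketch. The example below flags values that sit more than two standard deviations from the mean; the threshold of 2 and the `order_totals` data are assumptions for illustration, and real data mining tools use a much wider range of techniques.

```python
import statistics

order_totals = [100, 102, 98, 101, 99, 250]  # made-up order amounts

mean = statistics.mean(order_totals)
spread = statistics.pstdev(order_totals)  # population standard deviation

# Flag values whose distance from the mean exceeds two standard deviations.
THRESHOLD = 2
outliers = [value for value in order_totals if abs(value - mean) / spread > THRESHOLD]
```

Here only the 250 order is flagged, since the other values all lie close to the mean.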

Data Mining (Data Discovery)

is the examination of huge sets of data to find patterns, connections, outliers, and hidden relationships. Data mining is a business intelligence tool used for decision making.

Data Governance

managing the availability, integrity, and security of the data to ensure that the data remain high quality and valid for data analytics. Policies and procedures are established that define the data governance program, such as who has access, who has update capabilities, when and how backups are made and stored, and who administers the policies to ensure that they are followed.

Text analytics

the process of extracting information from written sources such as websites, e-books, and emails and inserting the data into a database to evaluate and interpret relevance or to understand customers' feedback on products and services.

Predictive Analytics

uses both new and historical data to forecast activity, behavior, and trends. This type of business intelligence involves applying statistical analysis techniques along with business knowledge to data to create predictive models that place a numerical value (or score) on the likelihood of a particular event happening.
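As a very reduced illustration of forecasting from historical data, the sketch below fits a straight trend line by least squares and projects it one period ahead. It is only one simple technique among many used in predictive analytics, and the monthly sales figures are invented.

```python
months = [1, 2, 3, 4]
sales = [100, 120, 140, 160]  # made-up historical sales per month

mean_x = sum(months) / len(months)
mean_y = sum(sales) / len(sales)

# Least-squares slope and intercept of the line sales = intercept + slope * month.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(months, sales)) / sum(
    (x - mean_x) ** 2 for x in months
)
intercept = mean_y - slope * mean_x

# Project the fitted trend one month past the historical data.
forecast_month_5 = intercept + slope * 5
```

With these perfectly linear sample figures the trend rises by 20 per month, so the projection for month 5 is 180; real data would be noisier and would call for a measure of uncertainty.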

