QMB 3206 Ch 2 SB

Data '_______' is a process that an organization uses to acquire, organize, store, manipulate, and distribute data

Management

T/F: A relational database consists of one or more logically related data files, where each data file is a two-dimensional grid that consists of rows and columns.

True

T/F: Data in a data mart are organized using a multidimensional data model called a star schema, which includes dimension and fact tables.

True

T/F: In a business setting, we might use a 1:1 relationship to describe a situation where each department can have only one manager and each manager can only manage one department.

True

Oftentimes, a categorical variable is defined by more than two categories. For example, the mode of transportation used to commute may be described by three categories: Public Transportation, Driving Alone, and Car Pooling. Given k categories of a variable, the general rule is to create how many dummy variables?

k-1
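
The k-1 rule can be illustrated with a minimal Python sketch. The function name `make_dummies`, the sample data, and the choice of "Car Pooling" as the omitted reference category are all assumptions made for illustration, not part of the course material.

```python
def make_dummies(values, reference):
    """Build k-1 dummy (0/1) columns, leaving out the reference category."""
    categories = sorted(set(values))
    kept = [c for c in categories if c != reference]
    # One 0/1 list per kept category; the reference is implied when all are 0.
    return {c: [1 if v == c else 0 for v in values] for c in kept}


# Hypothetical commute data with k = 3 categories.
commute = ["Public Transportation", "Driving Alone", "Car Pooling", "Driving Alone"]
dummies = make_dummies(commute, reference="Car Pooling")
print(dummies)
```

With three categories only two dummy variables are created; an observation with 0 in both columns belongs to the reference category.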

List the following steps in the order they are performed to 'bin' customers into equal groups using R.
Import the customer data into a data frame and label it myData
With the Customers worksheet active, choose Data Mining > Transform > Transform Continuous Data > Bin
We use the cut function to bin the data. The breaks argument of the cut function specifies the ranges of the bins
We now create 5 equal-sized bins for DaysSinceLastReverse (recency), NumOfOrders (frequency), and Spending2018 (monetary)

1. Import the customer data into a data frame and label it myData
2. We now create 5 equal-sized bins for DaysSinceLastReverse (recency), NumOfOrders (frequency), and Spending2018 (monetary)
3. With the Customers worksheet active, choose Data Mining > Transform > Transform Continuous Data > Bin
4. We use the cut function to bin the data. The breaks argument of the cut function specifies the ranges of the bins
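
As a rough illustration of equal-sized (equal-count) binning, here is a minimal Python sketch. It is a Python analogue written for this note, not the R `cut` call the card refers to; the function name `equal_count_bins` and the spending figures are hypothetical.

```python
def equal_count_bins(values, n_bins):
    """Assign each value a bin number 1..n_bins so bins hold about the same count.

    Values are ranked from smallest to largest and the ranks are split evenly.
    Tied values may land in different bins under this simple scheme.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(values) + 1
    return bins


# Hypothetical spending amounts for ten customers.
spending = [120, 45, 300, 80, 15, 220, 60, 500, 95, 10]
bins = equal_count_bins(spending, 5)
print(bins)  # -> [4, 2, 5, 3, 1, 4, 2, 5, 3, 1]
```

Each of the five bins receives two of the ten customers, with bin 1 holding the lowest spenders and bin 5 the highest.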

List the following steps in order they are performed to 'bin' customers into equal groups using Analytic Solver.

1. Open the Customer data file
2. Choose Data Mining > Transform > Transform Continuous Data > Bin. Select data range $A$1:$O$201. Check the box Variable names in the first row
3. Change #bins for variable to 5. Choose Equal count for the Bins to be made with option

According to interviews and expert estimates, analytics professionals spend from __________ of their time on the mundane task of collecting and preparing unruly data before analytics can be applied (The New York Times, August 17, 2014).

50-80%

Examples of common necessary mathematical data transformations include:
A company might convert sales into happy customers or sad customers
A retail company might convert customers' birth dates into ages
Transformation of date values is often performed to help bring useful information out of the data
In order to analyze trends, we often transform raw data values into percentages

A retail company might convert customers' birth dates into ages
Transformation of date values is often performed to help bring useful information out of the data
In order to analyze trends, we often transform raw data values into percentages

Which of the following are reasons for data professionals to learn data wrangling skills?
Analytics professionals need broader skill sets than data mining techniques
Analytics professionals are superior to all other IT professionals
Organizations will be able to make decisions more rapidly
Analytics professionals can no longer rely on the IT department to provide data

Analytics professionals need broader skill sets than data mining techniques
Organizations will be able to make decisions more rapidly
Analytics professionals can no longer rely on the IT department to provide data

In addition to __________, another common approach is to create new variables through mathematical transformations of existing variables.

Binning

_______________ is the process of transforming numerical variables into categorical variables by grouping the numerical values into a small number of groups

Binning

Which of the following statements about 'binning' are accurate?
Bins must be consecutive
Binning reduces the noise in the data
Bins must be overlapping
Bins must have equal intervals

Bins must be consecutive
Binning reduces the noise in the data

Examples of transforming numerical data include:
Combining height and weight to create body mass index
Calculating percentages
There is no need to transform data
Converting an individual's date of birth to age

Combining height and weight to create body mass index
Calculating percentages
Converting an individual's date of birth to age
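
The three transformations named above can be sketched in Python as follows. The function names, the metric units assumed for BMI, and the sample inputs are illustrative assumptions only.

```python
from datetime import date


def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2


def age_on(birth_date, as_of):
    """Age in completed years on the date `as_of`."""
    had_birthday = (as_of.month, as_of.day) >= (birth_date.month, birth_date.day)
    return as_of.year - birth_date.year - (0 if had_birthday else 1)


def percent_change(old, new):
    """Change from `old` to `new`, expressed as a percentage of `old`."""
    return (new - old) / old * 100


print(round(bmi(70, 1.75), 1))                        # -> 22.9
print(age_on(date(1990, 6, 15), date(2024, 6, 14)))   # -> 33
print(percent_change(200, 250))                       # -> 25.0
```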

A __________ variable, also referred to as an indicator or a binary variable, is commonly used to describe two categories of a variable.

Dummy

Recall Organic Food Superstore from the introductory case. In that case, an entity-relationship diagram (ERD) for the store illustrates three entities: CUSTOMER, ORDER, and PRODUCT. The relationship between the CUSTOMER and ORDER entities is 1:M because:

Each order can only belong to one customer
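
One way to picture a 1:M relationship is a small sketch using Python's built-in `sqlite3` module. The table and column names (`customer`, `customer_order`, `customer_id`) and the sample rows are hypothetical and are not taken from the Organic Food Superstore case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE customer_order ("
    " order_id INTEGER PRIMARY KEY,"
    # Each order row stores exactly one customer_id: the 'one' side of 1:M.
    " customer_id INTEGER NOT NULL REFERENCES customer(customer_id))"
)
conn.execute("INSERT INTO customer VALUES (1, 'Ana')")
# The same customer can appear on many orders: the 'many' side of 1:M.
conn.executemany("INSERT INTO customer_order VALUES (?, ?)", [(10, 1), (11, 1)])
order_count = conn.execute(
    "SELECT COUNT(*) FROM customer_order WHERE customer_id = 1"
).fetchone()[0]

try:
    # An order pointing at a customer that does not exist is rejected.
    conn.execute("INSERT INTO customer_order VALUES (12, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
conn.close()
print(order_count, orphan_rejected)
```

The foreign key in the order table is the primary key of the related customer table, which is what ties each order to a single customer.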

An entity-relationship diagram (ERD) is a graphical representation used to illustrate the structure of the data. An '______' is a generalized category to represent persons, places, things, or events about which we want to store data in a database table. A single occurrence of an entity is called an '______'

Entity; Instance

Recall that we use nominal and ordinal measurement scales to represent categorical variables. Which of the examples below represent a nominal scale representation of a categorical variable?

Marital status (single, married, widowed, divorced, separated)

There are two common strategies for dealing with missing values: __________ and ___________

Omission and Imputation
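
The two strategies can be contrasted in a short Python sketch. Mean imputation is used here only as one common choice, which is an assumption since the card does not name a specific method; the function names and the sample rows are also hypothetical.

```python
from statistics import mean


def omit_missing(rows, column):
    """Omission: drop every row whose value in `column` is missing (None)."""
    return [r for r in rows if r[column] is not None]


def impute_mean(rows, column):
    """Imputation: replace missing values in `column` with the mean of observed values."""
    fill = mean(r[column] for r in rows if r[column] is not None)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


# Hypothetical survey rows; the second respondent did not report income.
rows = [
    {"id": 1, "income": 50},
    {"id": 2, "income": None},
    {"id": 3, "income": 70},
]
print(len(omit_missing(rows, "income")))        # -> 2
print(impute_mean(rows, "income")[1]["income"]) # -> 60
```

Omission shrinks the data set, while imputation keeps every row at the cost of inserting an estimated value.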

Which of the following are true of a data warehouse?
One of its primary purposes is to support decision making
Data in a data warehouse are usually organized around subjects such as sales, customers, or products that are relevant to business decision making
It can be designed to support the marketing department for analyzing customer behaviors, and it contains only the data relevant to such analyses
It is a small-scale data warehouse or a subset of the enterprise data warehouse that focuses on one particular subject or decision area

One of its primary purposes is to support decision making
Data in a data warehouse are usually organized around subjects such as sales, customers, or products that are relevant to business decision making

An effective strategy for dealing with these issues is category reduction, where we collapse some of the categories to create fewer nonoverlapping categories. The first guideline states that categories with very few observations may be combined to create the ___________ category

Other
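
A minimal Python sketch of this guideline follows. The function name `collapse_rare`, the threshold of two observations, and the sample values are assumptions chosen for illustration.

```python
from collections import Counter


def collapse_rare(values, min_count):
    """Relabel every category seen fewer than `min_count` times as 'Other'."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else "Other" for v in values]


# Hypothetical product categories; Toys and Garden each occur only once.
categories = ["Produce", "Dairy", "Produce", "Toys", "Dairy", "Garden", "Produce"]
reduced = collapse_rare(categories, min_count=2)
print(reduced)
# -> ['Produce', 'Dairy', 'Produce', 'Other', 'Dairy', 'Other', 'Produce']
```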

Finally, another common transformation of categorical variables is to create category __________

scores

Another common transformation for numerical data is ____________ which is performed when the variables in a data set are measured using different scales.

Rescaling
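
Two widely used ways to rescale are sketched below in Python: min-max scaling to the range 0 to 1 and standardization to z-scores. The card does not say which method the text uses, so both are shown as assumptions, and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def min_max(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def z_scores(values):
    """Standardize values: subtract the mean, divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


incomes = [10, 20, 30]
print(min_max(incomes))   # -> [0.0, 0.5, 1.0]
print(z_scores(incomes))
```

After either transformation, variables originally measured in different units share a comparable scale.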

Which of the following are reasons for missing values in data?
Respondents decline to provide the information due to its sensitive nature
Some of the questions do not apply to every respondent
There are never missing values in data
Respondents always provide all the requested information

Respondents decline to provide the information due to its sensitive nature
Some of the questions do not apply to every respondent

The most popular query language used today is _________. This popular query language is used for manipulating data in a relational database using relatively simple and intuitive commands.

SQL

________ data also allows us to review the range of values for each variable.

Sorting

Which of the following are the very first tasks most data analysts perform to gain a better understanding of and insights into the data?
Sorting the data
Copying the data
Counting the data
Visually reviewing

Sorting the data
Counting the data
Visually reviewing

Which of the following is NOT a correct statement about entity-relationship diagram (ERD) attributes?
An entity is a generalized category to represent persons, places, things, or events
A foreign key is the primary key of a related entity
A primary key is an attribute that uniquely identifies each instance of an entity
The relationships between entities can only be one-to-many

The relationships between entities can only be one-to-many

Sometimes nominal or ordinal variables come with too many categories. This presents a number of potential problems. Which of the following are potential problems highlighted in the text?
Since collecting the data is simpler, categorical data never creates difficulties in data analysis
Variables with too many categories pull down model performance
Categories should be overlapping to complicate the analysis
If a variable has some categories that rarely occur, it is difficult to capture the impact of these categories accurately

Variables with too many categories pull down model performance
If a variable has some categories that rarely occur, it is difficult to capture the impact of these categories accurately

The basic structure of a SQL statement is relatively simple and usually consists of three keywords. Which of the following are SQL keywords? Select all that apply.
Where
From
Select
Choose

Where
From
Select
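
The three keywords can be seen together in a small sketch that runs a query through Python's built-in `sqlite3` module. The `customer` table, its columns, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customer (name, city) VALUES (?, ?)",
    [("Ana", "Tampa"), ("Ben", "Miami"), ("Cy", "Tampa")],
)

# SELECT names the columns to return, FROM names the table,
# and WHERE keeps only the rows that satisfy the condition.
rows = conn.execute("SELECT name FROM customer WHERE city = 'Tampa'").fetchall()
conn.close()
print(sorted(rows))  # -> [('Ana',), ('Cy',)]
```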

In customer satisfaction surveys, we often use ordinal scales, such as very dissatisfied, somewhat dissatisfied, neutral, somewhat satisfied, and very satisfied, to indicate the level of satisfaction. In such cases, we can recode the categories using numbers 1 through 5, with 1 being very dissatisfied and 5 being very satisfied. This transformation allows the categorical variable to be treated as a _________ variable in certain analytical models.

numerical
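
The recoding described above amounts to a lookup from category to score. In this Python sketch the dictionary name `SATISFACTION_SCORES` and the sample responses are assumptions; the 1 through 5 mapping follows the card.

```python
# Ordinal categories mapped to scores, 1 = very dissatisfied ... 5 = very satisfied.
SATISFACTION_SCORES = {
    "very dissatisfied": 1,
    "somewhat dissatisfied": 2,
    "neutral": 3,
    "somewhat satisfied": 4,
    "very satisfied": 5,
}

# Hypothetical survey responses.
responses = ["neutral", "very satisfied", "somewhat dissatisfied", "very satisfied"]
scores = [SATISFACTION_SCORES[r] for r in responses]
average_score = sum(scores) / len(scores)
print(scores, average_score)  # -> [3, 5, 2, 5] 3.75
```

Once recoded, the responses can be averaged or passed to models that expect numerical input, which is the sense in which the variable is treated as numerical.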

Data ________ is the data conversion process from one format or structure to another

transformation

