Test 1 Data Analytics

10. Data analytics professionals estimate that they spend between __________ of their time cleaning data so it can be analyzed. A. 50 percent and 90 percent B. 10 percent and 20 percent C. 20 percent and 50 percent D. 70 percent and 95 percent

A. 50 percent and 90 percent

19. Which of the following best describes the profiling approach to data analytics? A. An attempt to characterize the typical behavior of an individual, group or population by generating summary statistics about the data. B. An attempt to predict a relationship between two data items. C. An attempt to reduce the amount of information that needs to be considered to focus on the most critical items. D. An attempt to discover associations between individuals based on transactions involving them.

A. An attempt to characterize the typical behavior of an individual, group or population by generating summary statistics about the data.
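As a rough illustration of profiling, the sketch below generates summary statistics for a small group using Python's standard `statistics` module. The `amounts` list is made-up data, and the choice of statistics is an assumption rather than a prescribed set.

```python
import statistics

# Hypothetical transaction amounts for one customer group (illustrative data only).
amounts = [12.0, 15.5, 14.0, 13.5, 90.0, 12.5]

# A profile is a set of summary statistics describing "typical" behavior.
profile = {
    "count": len(amounts),
    "mean": statistics.mean(amounts),
    "median": statistics.median(amounts),
    "stdev": statistics.stdev(amounts),  # sample standard deviation
    "max": max(amounts),
}

for name, value in profile.items():
    print(f"{name}: {value}")
```

The gap between the mean and the median in a profile like this hints that a single large value is pulling the average up.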

16. Which of the following best describes the regression approach to data analytics? A. An attempt to estimate or predict, for each unit, the numerical value of some variable using some type of statistical model. B. An attempt to predict a relationship between two data items. C. An attempt to divide individuals into groups in a useful or meaningful way. D. An attempt to discover associations between individuals based on transactions involving them.

A. An attempt to estimate or predict, for each unit, the numerical value of some variable using some type of statistical model.

73. Which approach to data analytics attempts to assign each unit in a population into a small set of categories? A. Classification B. Regression C. Similarity matching D. Co-occurrence grouping

A. Classification

74. Which approach to data analytics attempts to divide individuals into groups in a useful or meaningful way? A. Clustering B. Data reduction C. Similarity matching D. Co-occurrence grouping

A. Clustering

80. Short surveys regarding dining preferences, sometimes requested at the bottom of a restaurant bill, are an attempt to collect data that will facilitate which data approach? A. Clustering B. Regression C. Similarity matching D. Link prediction

A. Clustering

84. Retail stores often request customers' zip codes at the end of a sales transaction. This is an example of which data approach? A. Clustering B. Regression C. Similarity matching D. Classification

A. Clustering

86. Data profiling is used to assess data quality and internal controls and typically involves the following steps except: A. Filter the results. B. Identify the objects or activity you want to profile. C. Determine the types of profiling you want to perform. D. Set boundaries or thresholds for the activity.

A. Filter the results.

61. Which of the following describes a means of maintaining all of your data in one place, instead of across different related tables? A. Flat File B. Microsoft Access C. SQLite D. Microsoft SQL Server

A. Flat File

66. Which of Excel's tools is roughly equivalent to the VLOOKUP function? A. INDEX/MATCH B. SELECT C. LINK D. LINKUP

A. INDEX/MATCH

85. Data reduction typically involves the following steps except: A. Identify the attribute you would like to reduce to focus on more relevant fields. B. Identify the parameters of the model. C. Filter the results. D. Interpret the results.

B. Identify the parameters of the model.

99. Understanding and predicting warranty expense is an important determination for manufacturing firms. When using historical claims data to estimate the current period's warranty expense, the historical claims data represents which of the following: A. Independent variable B. Dependent variable C. Function D. Statistical Model

A. Independent variable

78. Which approach to data analytics attempts to forecast a relationship between two data items? A. Link prediction B. Regression C. Similarity matching D. Co-occurrence grouping

A. Link prediction

39. What would be the best primary key for the LendingClub rejected loan dataset? A. Loan application number B. Zip code C. Customer number D. Loan number

A. Loan application number

97. Which of the following best describes a dependent variable? A. Output B. Input C. Application D. Operation

A. Output

94. In general, the more complex the model, the greater the chance of __________. A. Overfitting the data B. Underfitting the data C. Pruning the data D. The need to reduce the amount of data considered

A. Overfitting the data

81. What is the analytics type for procedures that explore the current data to determine why an outcome has happened? A. Prescriptive B. Predictive C. Diagnostic D. Descriptive

C. Diagnostic

83. The decision support systems approach is most associated with the __________ analytics type. A. Prescriptive B. Predictive C. Diagnostic D. Descriptive

A. Prescriptive

92. Regression might be used to predict the interest rate given to a borrower by the lender based on certain risk characteristics. In this regression, potential independent variables would be which of the following? A. Risk characteristics. B. Interest rate. C. Demographic characteristics. D. Loan acceptance.

A. Risk characteristics.

22. Which approach to data analytics attempts to identify similar individuals based on data known about them? A. Similarity matching. B. Clustering. C. Co-occurrence grouping. D. Link prediction.

A. Similarity matching.

56. At which step of the ETL process should you try to answer the question "What business problem will the data address?" A. Step 1: Determine the purpose and scope of the data request. B. Step 2: Obtain the data. C. Step 3 or 4: Transformation. D. Step 5: Loading the data for data analysis.

A. Step 1: Determine the purpose and scope of the data request.

63. What is the purpose of the Audit Data Standards? A. To provide a guide to standardize audit data requests B. To increase the cost of audits C. To help auditors learn SQL code D. To create standardized data storage systems

A. To provide a guide to standardize audit data requests

50. Which of the following best describes the purpose of a primary key? A. To uniquely identify each row in a table. B. To create the relationship between two tables. C. To provide business information, but are not required to build a database. D. To support business processes across the organization.

A. To uniquely identify each row in a table.

41. The objective of data transformation is: A. To validate the data for completeness and integrity B. To load the data into the appropriate tool for analysis C. To identify and obtain the data from the appropriate source D. To identify which approach to data analytics should be used

A. To validate the data for completeness and integrity

60. Comparing the number of records that were extracted to the number of records in the source database is an example of which ETL step? A. Validating the Data B. Obtaining the Data C. Cleaning the Data D. Loading the Data for Analysis

A. Validating the Data
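A record-count comparison of this kind could be sketched as follows. The control total, the column names, and the inline CSV text are all hypothetical stand-ins for a real source system and extract.

```python
import csv
import io

# Hypothetical control total reported by the source database.
source_record_count = 4

# Hypothetical extract; in practice this would be a file received from the data owner.
extracted_csv = io.StringIO(
    "invoice_id,amount\n"
    "1001,250.00\n"
    "1002,75.50\n"
    "1003,310.25\n"
)
rows = list(csv.DictReader(extracted_csv))

# Completeness check: the extract should contain as many records as the source.
extracted_count = len(rows)
is_complete = extracted_count == source_record_count
missing = source_record_count - extracted_count

print(f"extracted {extracted_count} of {source_record_count} records; complete={is_complete}")
```

Here the counts disagree, so the extract would need to be investigated before any analysis is run on it.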

46. Removing leading zeroes and non-printable characters from the data is an example of which of the following? A. Validating the data for completeness B. Validating the data for integrity C. Cleaning the data D. Obtaining the data

C. Cleaning the data
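Removing leading zeroes and non-printable characters from a text field might look like the sketch below. The function name `clean_field` is hypothetical, and the rule of always dropping leading zeroes is an assumption; identifiers such as zip codes may need them kept.

```python
def clean_field(raw: str) -> str:
    """Strip non-printable characters, surrounding spaces, and leading zeroes."""
    # Keep only printable characters (drops tabs, NUL bytes, and similar).
    printable = "".join(ch for ch in raw if ch.isprintable())
    # Trim surrounding spaces, then remove zeroes at the start of the value.
    stripped = printable.strip().lstrip("0")
    # A value made up entirely of zeroes should stay "0", not become empty.
    return stripped or "0"


print(clean_field("00123\x00"))
print(clean_field(" 0007\t"))
print(clean_field("000"))
```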

47. Comparing the number of records within the data is an example of which of the following? A. Validating the data for completeness B. Validating the data for integrity C. Cleaning the data D. Obtaining the data

A. Validating the data for completeness

64. There are many times when using SQL is the best option for extracting data, but sometimes it is not preferred. Which of the following is an example of when SQL would NOT be a preferred method of data extraction? A. When the data is already stored in Excel B. When the data is stored across different tables in a relational database C. When the data in the table you wish to analyze is too large for Excel's resources D. When you wish to extract precise attributes and records that fit your criteria

A. When the data is already stored in Excel

69. The global standard for exchanging financial reporting information that uses XML is called __________. A. XBRL B. XFRL C. Yahoo! Finance D. Bloomberg

A. XBRL

34. According to the text, as the debt-to-income ratio increases, there is __________ chance of a loan getting rejected by the bank. A. a greater B. a lesser C. no effect on the

A. a greater

27. If we are predicting which companies go bankrupt, bankruptcy would be the A. dependent variable B. independent variable C. explanatory variable D. classification variable

A. dependent variable

29. If a bank uses credit risk score to determine who will receive a loan, the variable predicting who will receive a loan would be considered the: A. dependent variable B. independent variable C. determinant variable D. classification variable

A. dependent variable

70. The acronym XBRL stands for __________. A. eXtensible Business Reporting Language B. eXtensive Business Reporting Language C. eXtensive Business Reporting Lingo D. eXtensive Business Reporting Language

A. eXtensible Business Reporting Language

2. Patterns discovered from __________ enable businesses to identify opportunities and risks and better plan for __________. A. past archives; the future B. current data; the future C. current data; today D. past archives; today

A. past archives; the future

26. One of the most important aspects of data analytics that impacts tax is: A. predictive analytics. B. co-occurrence grouping. C. similarity matching. D. data quality.

A. predictive analytics.

5. Which of the following best describes the goal of data quality: A. recognize what is meant by data quality, be it completeness, reliability or validity B. perform basic analysis to understand the quality of the underlying data and its ability to address the business question C. demonstrate ability to sort, rearrange, merge, and reconfigure data in a manner that allows enhanced analysis D. comprehend the process needed to clean and prepare the data before analysis

A. recognize what is meant by data quality, be it completeness, reliability or validity

7. Which of the following best describes the goal of developing an analytics mindset: A. recognize when and how data analytics can address business questions B. perform basic analysis to understand the quality of the underlying data and its ability to address the business question C. recognize what is meant by data quality, be it completeness, reliability or validity D. comprehend the process needed to clean and prepare the data before analysis

A. recognize when and how data analytics can address business questions

30. The 4V's of Big Data include all but the following: A. volatility B. variety C. velocity D. veracity

A. volatility

75. The __________ is a statistical concept that assigns a value to a number based on how many standard deviations it stands from the mean. A. z-score B. F-score C. q-score D. median

A. z-score
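As a worked illustration, a z-score is the distance of a value from the mean, divided by the standard deviation. The sketch below assumes the population standard deviation and uses a made-up data set; the helper name `z_score` is hypothetical.

```python
import statistics


def z_score(value: float, data: list) -> float:
    """Number of standard deviations that `value` lies from the mean of `data`."""
    mean = statistics.mean(data)
    stdev = statistics.pstdev(data)  # population standard deviation (an assumption)
    return (value - mean) / stdev


# Illustrative data with mean 5 and population standard deviation 2.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(z_score(9, data))  # 9 is two standard deviations above the mean
print(z_score(5, data))  # the mean itself has a z-score of zero
```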

33. According to PwC's 18th Annual Global CEO survey, __________ percent of chief executive officers put a high value on data analytics. A. 95 B. 86 C. 55 D. 35

B. 86

15. Which of the following best describes the similarity matching approach to data analytics? A. An attempt to assign each unit (or individual) in a population into a few categories. B. An attempt to identify similar individuals based on data known about them. C. An attempt to divide individuals into groups in a useful or meaningful way. D. An attempt to discover associations between individuals based on transactions involving them.

B. An attempt to identify similar individuals based on data known about them.

18. Which of the following best describes the link prediction approach to data analytics? A. An attempt to characterize the typical behavior of an individual, group or population by generating summary statistics about the data. B. An attempt to predict a relationship between two data items. C. An attempt to reduce the amount of information that needs to be considered to focus on the most critical items. D. An attempt to discover associations between individuals based on transactions involving them.

B. An attempt to predict a relationship between two data items.

24. Which approach to data analytics attempts to divide individuals into groups in a useful or meaningful way? A. Similarity matching. B. Clustering. C. Co-occurrence grouping. D. Link prediction.

B. Clustering.

68. All of the following are examples of a supervised approach to evaluating data except: A. Causal modeling B. Data reduction C. Link prediction D. Regression

B. Data reduction

98. Understanding and predicting inventory obsolescence is an important determination for retail companies. When using competitor selling prices to estimate the inventory obsolescence reserve, the inventory obsolescence reserve represents which of the following: A. Independent variable B. Dependent variable C. Function D. Statistical Model

B. Dependent variable

37. Mastering the data can also be described via the ETL process. The ETL process stands for: A. Extract, total, and load data. B. Extract, transform, and load data. C. Enter, transform, and load data. D. Enter, total, and load data.

B. Extract, transform, and load data.

54. A data dictionary is paramount in helping data analysts do which of the following? A. Maintain databases. B. Identify the data they need to use. C. Communicating insights. D. Track outcomes.

B. Identify the data they need to use.

96. Which of the following best describes an independent variable? A. Output B. Input C. Application D. Operation

B. Input

91. Regression might be used to predict the interest rate given to a borrower by the lender based on certain risk characteristics. In this regression, the dependent variable would be which of the following? A. Risk characteristics. B. Interest rate. C. Demographic characteristics. D. Loan acceptance.

B. Interest rate.
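To make the roles of the variables concrete, here is a minimal least-squares sketch in which a single risk score is the independent variable (input) and the interest rate is the dependent variable (output). The data are invented and perfectly linear, and `fit_line` is a hypothetical helper, not a method from the text.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one independent variable: y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: risk score (independent) and interest rate in percent (dependent).
risk = [1, 2, 3, 4, 5]
rate = [5.0, 7.0, 9.0, 11.0, 13.0]

slope, intercept = fit_line(risk, rate)

# Use the fitted model to predict the rate for a new borrower with risk score 6.
predicted_rate = intercept + slope * 6
print(slope, intercept, predicted_rate)
```

Real loan data would not fall on a straight line, and a lender would usually consider several risk characteristics at once.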

12. Which approach to data analytics attempts to characterize the typical behavior of an individual, group or population by generating summary statistics about the data? A. Similarity matching. B. Profiling. C. Data reduction. D. Regression.

B. Profiling.

55. At which step of the ETL process should you try to answer the question "What tools will be used to perform data analytic tests or procedures and why?" A. Step 1: Determine the purpose and scope of the data request. B. Step 2: Obtain the data. C. Step 3 or 4: Transformation. D. Step 5: Loading the data for data analysis.

B. Step 2: Obtain the data.

57. At which step of the ETL process should you try to answer the question "Where are the data located in the financial or other related systems?" A. Step 1: Determine the purpose and scope of the data request. B. Step 2: Obtain the data. C. Step 3 or 4: Transformation. D. Step 5: Loading the data for data analysis.

B. Step 2: Obtain the data.

4. Which of the following Tableau software tools specializes in data transformation? A. Tableau Desktop B. Tableau Prep Builder C. Tableau Public D. Tableau Visualize

B. Tableau Prep Builder

100. Training data are existing data that have been manually evaluated and assigned a class and __________ are existing data used to evaluate the model. A. Control data B. Test data C. Unstructured data D. Structured data

B. Test data

53. Which of the following best describes the purpose of a foreign key? A. To ensure that each row in the table is unique B. To create the relationship between two tables C. To provide business information D. To support business processes across the organization

B. To create the relationship between two tables
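The relationship a foreign key creates can be seen in a small join. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the `Employee` and `Sale` tables, their columns, and the rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Employee (
        EmployeeID INTEGER PRIMARY KEY,   -- primary key: uniquely identifies each row
        Name       TEXT NOT NULL          -- non-key attribute: business information
    );
    CREATE TABLE Sale (
        SaleID     INTEGER PRIMARY KEY,
        EmployeeID INTEGER REFERENCES Employee(EmployeeID),  -- foreign key
        Amount     REAL
    );
    INSERT INTO Employee VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO Sale VALUES (10, 1, 99.0), (11, 1, 45.0), (12, 2, 120.0);
    """
)

# The foreign key lets each sale be matched to the employee who recorded it.
totals = conn.execute(
    """
    SELECT e.Name, SUM(s.Amount)
    FROM Employee AS e
    JOIN Sale AS s ON s.EmployeeID = e.EmployeeID
    GROUP BY e.EmployeeID
    ORDER BY e.EmployeeID
    """
).fetchall()
conn.close()

print(totals)
```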

42. The objective of loading data is: A. To validate the data for completeness and integrity B. To load the data into the appropriate tool for analysis C. To identify and obtain the data from the appropriate source D. To identify which approach to data analytics should be used

B. To load the data into the appropriate tool for analysis

65. What is one of Excel's tools for joining data from two separate spreadsheets? A. SUMIF B. VLOOKUP C. SQL D. DATAREQUEST

B. VLOOKUP

35. According to the text, as the length of employment increases, there is __________ chance of a loan getting rejected by the bank. A. a greater B. a lesser C. no effect on the

B. a lesser

36. In the LendingClub dataset, a credit score is synonymous with: A. a debt score. B. a risk score. C. a credit card score. D. a premium score.

B. a risk score.

93. When working with a predictive model, underfitting the data is most likely caused by __________. A. an overly complex model B. an overly simple model C. over pruning the data D. a lack of data reduction

B. an overly simple model

28. If a bank uses credit risk score to determine who will receive a loan, the credit risk score would be considered the: A. dependent variable B. independent variable C. response variable D. classification variable

B. independent variable

3. Which of the following best describes the goal of descriptive data analysis: A. recognize what is meant by data quality, be it completeness, reliability or validity B. perform basic analysis to understand the quality of the underlying data and its ability to address the business question C. demonstrate ability to sort, rearrange, merge, and reconfigure data in a manner that allows enhanced analysis D. comprehend the process needed to clean and prepare the data before analysis

B. perform basic analysis to understand the quality of the underlying data and its ability to address the business question

31. A recent study from McKinsey Global Institute estimates that Data Analytics could generate up to $2 __________ in value. A. billion B. trillion C. million D. thousand

B. trillion

14. Which of the following best describes the clustering approach to data analytics? A. An attempt to assign each unit (or individual) in a population into a few categories. B. An attempt to identify similar individuals based on data known about them. C. An attempt to divide individuals into groups in a useful or meaningful way. D. An attempt to discover associations between individuals based on transactions involving them.

C. An attempt to divide individuals into groups in a useful or meaningful way.

20. Which of the following best describes the data reduction approach to data analytics? A. An attempt to characterize the typical behavior of an individual, group or population by generating summary statistics about the data. B. An attempt to predict a relationship between two data items. C. An attempt to reduce the amount of information that needs to be considered to focus on the most critical items. D. An attempt to discover associations between individuals based on transactions involving them.

C. An attempt to reduce the amount of information that needs to be considered to focus on the most critical items.

88. The observation that, in many naturally occurring collections of numbers, the leading significant digit is likely to be small is called __________. A. Leading digits hypothesis B. Moore's law C. Benford's law D. Classification

C. Benford's law

89. Unaware of data analysis tools available to the internal auditors, a store employee frequently processes cash returns without a receipt for $99, which is just below the amount requiring manager approval of $100. An analysis using which of the following would likely (and quickly) identify the employee's fraudulent behavior? A. Leading digits hypothesis B. Moore's law C. Benford's law D. Clustering

C. Benford's law
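A leading-digit test along these lines can be sketched in a few lines. Benford's law gives the expected share of leading digit d as log10(1 + 1/d); the refund amounts below are invented, and the helper names are hypothetical.

```python
import math
from collections import Counter


def benford_expected(digit: int) -> float:
    """Expected share of values whose leading digit is `digit` under Benford's law."""
    return math.log10(1 + 1 / digit)


def leading_digit(amount: float) -> int:
    """Return the first nonzero digit of an amount, read at two decimal places."""
    for ch in f"{abs(amount):.2f}":
        if ch in "123456789":
            return int(ch)
    raise ValueError("amount has no nonzero digit at two decimal places")


# Hypothetical refund amounts processed by one employee (made-up data).
refunds = [99.0, 99.0, 99.0, 12.5, 99.0, 23.1, 99.0, 150.0, 99.0, 18.75]

observed = Counter(leading_digit(r) for r in refunds)
observed_share_9 = observed[9] / len(refunds)
expected_share_9 = benford_expected(9)

# A leading digit of 9 is expected under 5% of the time, so a 60% share stands out.
print(f"observed {observed_share_9:.1%} vs expected {expected_share_9:.1%}")
```

With so few records this is only an illustration; in practice the comparison would be run over a large population of transactions.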

44. Removing headings or subtotals from data is an example of which of the following? A. Validating the data for completeness B. Validating the data for integrity C. Cleaning the data D. Obtaining the data

C. Cleaning the data

45. Correcting inconsistencies across data is an example of which of the following? A. Validating the data for completeness B. Validating the data for integrity C. Cleaning the data D. Obtaining the data

C. Cleaning the data

21. Which approach to data analytics attempts to discover associations between individuals based on transactions involving them? A. Similarity matching. B. Clustering. C. Co-occurrence grouping. D. Link prediction.

C. Co-occurrence grouping.

71. Which of the following best describes an unsupervised approach to the evaluation of data? A. Data exploration that is free from oversight by a superior B. Data exploration to examine the relationships between variables that are hypothesized to exist C. Data exploration looking for potential patterns of interest D. Data exploration that is conducted with direct oversight by a superior

C. Data exploration looking for potential patterns of interest

72. Which of the following best describes a supervised approach to the evaluation of data? A. Data exploration that is free from oversight by a superior B. Data exploration that is conducted with direct oversight by a superior C. Data exploration to examine the relationships between variables that are hypothesized to exist D. Data exploration looking for potential patterns of interest

C. Data exploration to examine the relationships between variables that are hypothesized to exist

13. Which approach to data analytics attempts to reduce the amount of information that needs to be considered to focus on the most critical items? A. Similarity matching. B. Profiling. C. Data reduction. D. Regression.

C. Data reduction.

48. Which of the following questions is NOT suggested by the Institute of Business Ethics to allow a business to create value from data use and analysis, and still protect the privacy of stakeholders? A. How does the company use data, and to what extent is it integrated into firm strategy? B. Does the company send a privacy notice to individuals when their personal data is collected? C. Is the data kept in a secure location preventing access from unauthorized users? D. Does the company have the appropriate tools to mitigate the risks of data misuse?

C. Is the data kept in a secure location preventing access from unauthorized users?

38. When using [EmployeeID] as the unique identifier of the Employee table, [EmployeeID] is an example of which of the following: A. Foreign key B. Composite key C. Primary key D. Key attribute

C. Primary key

79. Which approach to data analytics attempts to characterize the typical behavior of an individual, group or population by generating summary statistics about the data? A. Classification B. Regression C. Profiling D. Link prediction

C. Profiling

90. Regression analysis typically involves the following steps except: A. Identify the variables that might predict an outcome. B. Identify the parameters of the model. C. Set boundaries or thresholds. D. Determine the functional form of the relationship.

C. Set boundaries or thresholds.

77. Which approach to data analytics attempts to identify similar individuals based on data known about them? A. Classification B. Clustering C. Similarity matching D. Co-occurrence grouping

C. Similarity matching

87. Data that are stored in a database or spreadsheet that is readily searchable is called __________. A. Training data B. Unstructured data C. Structured data D. Test data

C. Structured data

40. The objective of data extraction is: A. To validate the data for completeness and integrity B. To load the data into the appropriate tool for analysis C. To identify and obtain the data from the appropriate source D. To identify which approach to data analytics should be used

C. To identify and obtain the data from the appropriate source

51. Which of the following best describes the purpose of a non-key attribute? A. To ensure that each row in the table is unique B. To create the relationship between two tables C. To provide business information D. To support business processes across the organization

C. To provide business information

1. With a goal to give organizations the information they need to make sound and timely business decisions, data analytics often involves all of the following except: A. technologies. B. statistics. C. strategies. D. databases.

C. strategies.

17. Which of the following best describes the co-occurrence grouping approach to data analytics? A. An attempt to characterize the typical behavior of an individual, group or population by generating summary statistics about the data. B. An attempt to predict a relationship between two data items. C. An attempt to reduce the amount of information that needs to be considered to focus on the most critical items. D. An attempt to discover associations between individuals based on transactions involving them.

D. An attempt to discover associations between individuals based on transactions involving them.

67. What follows the ETL process of "Validating the Data"? A. Loading the Data for Analysis B. Obtain the data C. Determining the scope of the data request D. Cleaning the data

D. Cleaning the data

82. What is the analytics type for procedures that summarize existing data to determine what has happened in the past? A. Prescriptive B. Predictive C. Diagnostic D. Descriptive

D. Descriptive

49. Which of the following questions is NOT suggested by the Institute of Business Ethics to allow a business to create value from data use and analysis, and still protect the privacy of stakeholders? A. Does our company conduct appropriate due diligence when sharing with or acquiring data from third parties? B. How does the company use data, and to what extent is it integrated into firm strategy? C. Does the company send a privacy notice to individuals when their personal data is collected? D. Does the company require analysts to sign a confidentiality agreement covering the information found in the data?

D. Does the company require analysts to sign a confidentiality agreement covering the information found in the data?

58. When obtaining the data yourself, you should do all of the following before you begin except: A. Identify the tables that contain the information you need. B. Identify which attributes specifically hold the information you need in each table. C. Identify how those tables are related to each other. D. Identify any errors or issues from the extraction.

D. Identify any errors or issues from the extraction.

62. Relational databases help to reduce redundant data. Which of the following is NOT a reason to reduce redundant data? A. It takes up unnecessary space B. It is expensive C. It increases the risk of data-entry errors D. It is easier to perform analysis in spreadsheets

D. It is easier to perform analysis in spreadsheets

23. Which approach to data analytics attempts to predict a relationship between two data items? A. Similarity matching. B. Clustering. C. Co-occurrence grouping. D. Link prediction.

D. Link prediction.

25. The IMPACT cycle includes all the following processes except: A. Identify the questions. B. Address and refine results. C. Track outcomes. D. Predict the results.

D. Predict the results.

11. Which approach to data analytics attempts to estimate or predict, for each unit, the numerical value of some variable using some type of statistical model? A. Similarity matching. B. Classification. C. Data reduction. D. Regression.

D. Regression.

43. All of the following are included in the five steps of the ETL process except: A. Determine the purpose and scope of the data request B. Obtain the data C. Validate the data for completeness and integrity D. Scrub the data

D. Scrub the data

59. There are a variety of methods that you could take to retrieve the data, including SQL. What does SQL stand for? A. Systems Query Language. B. Systems Question Language. C. Structured Question Language. D. Structured Query Language.

D. Structured Query Language.

52. Which of the following best describes the purpose of relational databases? A. To ensure that business rules are enforced B. To increase information redundancy in the organization C. To provide business information to data analysts D. To support business processes across the organization

D. To support business processes across the organization

32. According to PwC's 6th Annual Digital IQ survey of more than 1,400 leaders from digital businesses, the area of investment that tops CEOs' list of priorities is: A. information technology B. capital expenditures including hardware and software C. accounting data analytics D. business analytics

D. business analytics

6. Which of the following best describes the goal of data scrubbing and data preparation: A. recognize what is meant by data quality, be it completeness, reliability or validity B. perform basic analysis to understand the quality of the underlying data and its ability to address the business question C. demonstrate ability to sort, rearrange, merge and reconfigure data in a manner that allows enhanced analysis D. comprehend the process needed to clean and prepare the data before analysis

D. comprehend the process needed to clean and prepare the data before analysis

95. While overfitting data could lead to an error rate of 0 (zero), it is unlikely that you would be able to __________ your results. A. define B. specify C. articulate D. generalize

D. generalize

9. Which of the following best describes the goal of defining and addressing problems through statistical data analysis: A. recognize what is meant by data quality, be it completeness, reliability or validity B. perform basic analysis to understand the quality of the underlying data and its ability to address the business question C. demonstrate ability to sort, rearrange, merge and reconfigure data in a manner that allows enhanced analysis D. identify and implement an approach that will use statistical data analysis to draw conclusions and make recommendations on a timely basis

D. identify and implement an approach that will use statistical data analysis to draw conclusions and make recommendations on a timely basis

76. The cutoff for the second quartile of a distribution is its __________. A. mean B. 75th percentile C. 25th percentile D. median

D. median
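A quick check of this relationship, using Python's standard `statistics` module on a made-up data set, is sketched below. `statistics.quantiles` with `n=4` returns the three quartile cut points, and the middle one coincides with the median.

```python
import statistics

# Illustrative data only.
data = [1, 3, 5, 7, 9, 11, 13]

# Three cut points split the data into four quartiles.
q1, q2, q3 = statistics.quantiles(data, n=4)
median = statistics.median(data)

print(q1, q2, q3, median)
```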

8. Which of the following best describes the goal of data visualization and data reporting: A. recognize when and how data analytics can address business questions B. perform basic analysis to understand the quality of the underlying data and its ability to address the business question C. recognize what is meant by data quality, be it completeness, reliability or validity D. report results of analysis in an accessible way to each varied decision maker and their specific needs

D. report results of analysis in an accessible way to each varied decision maker and their specific needs
