Test 1 (Data Analytics)


Benford's Law is an absolute and all data must conform. True False

False

Which of the following best describes the purpose of a non-key attribute? a. To ensure that each row in the table is unique b. To provide business information c. To create the relationship between two tables d. To support business processes across the organization

b.

Which type of chart is preferred when quartiles, median, and outliers are required for analysis and insights? a. Scatter plots b. Box and whisker plots c. Line chart d. Pie chart

b.
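A minimal sketch of the numbers a box and whisker plot summarizes. The function name `five_number_summary` is hypothetical, and the 1.5 x IQR outlier fences are a common convention assumed here, not something the card specifies.

```python
import statistics

def five_number_summary(values):
    """Return the quartiles, extremes, and outliers a box plot would display."""
    data = sorted(values)
    # "inclusive" treats the data as the full population, so min and max are kept
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr   # assumed convention for flagging outliers
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
        "outliers": outliers,
    }

summary = five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 100])
```

With this sample, the value 100 falls outside the upper fence and would be drawn as an individual point beyond the whisker.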

Which of the following best describes a dependent variable? a. Input b. Operation c. Application d. Output

d.

When considering a question such as "Do our customers form natural groups based on similar attributes?" you would use an unsupervised approach. True False

True

XBRL is a global standard for exchanging financial reporting information that uses XML. True False

True

All of the following are examples of a supervised approach to the evaluation of data except: a. Data reduction b. Regression c. Link prediction d. Causal modeling

a.

Comparing descriptive statistics for numeric fields within the data is an example of which of the following? a. Validating the data for completeness b. Validating the data for integrity c. Obtaining the data d. Cleaning the data

a.

Diagnostic analytics include all of the following except: a. All of the choices are correct b. Profiling c. Clustering d. Similarity matching

a.

In general, the more complex the model, the greater the chance of: a. overfitting the data. b. underfitting the data. c. pruning the data. d. the need to reduce the amount of data considered.

a.

In the late 1960s, Ed Altman developed a model to predict if a company was at severe risk of going bankrupt. He called his statistic Altman's Z-score, now a widely used score in finance. Based on the name of the statistic, which statistical distribution would you guess this came from? a. Standardized normal distribution b. Poisson distribution c. Uniform distribution d. Normal distribution

a.
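A small sketch of what "standardized" means in general: subtracting the mean and dividing by the standard deviation gives values with mean 0 and standard deviation 1. This illustrates standardization only; it is not Altman's actual formula, and the function name `z_scores` is an assumption.

```python
import statistics

def z_scores(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

scores = z_scores([2, 4, 4, 4, 5, 5, 7, 9])
```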

Justin Zobel suggests that revising your writing requires you to "be egoless—ready to dislike anything you have previously written," suggesting that it is __________ you need to please. a. the reader b. the customer c. yourself d. your boss

a.

Letter grades of A, B, and C would be best described as an example of: a. Ordinal data b. Ratio data c. Nominal data d. Interval data

a.

Mastering the data can also be described via the ETL process. The ETL process stands for: a. Extract, transform, and load data. b. Enter, total, and load data. c. Enter, transform, and load data. d. Extract, total, and load data.

a.

Removing headings or subtotals from data is an example of which of the following? a. Cleaning the data b. Validating the data for integrity c. Obtaining the data d. Validating the data for completeness

a.
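One way the cleaning step might look for a report-style extract, as a rough sketch. The row layout, the labels "Subtotal" and "Total", and the function name are all hypothetical.

```python
def drop_headings_and_subtotals(rows):
    """Keep only detail rows; return the header once plus the cleaned rows."""
    header = rows[0]
    cleaned = []
    for row in rows[1:]:
        label = row[0].strip().lower()
        if row == header:
            continue  # repeated page heading
        if label.startswith(("subtotal", "total")):
            continue  # summary line, not a transaction
        cleaned.append(row)
    return header, cleaned

# Hypothetical extract with a repeated heading and summary lines
raw_rows = [
    ["Invoice", "Amount"],
    ["INV-001", "120.00"],
    ["INV-002", "80.00"],
    ["Subtotal", "200.00"],
    ["Invoice", "Amount"],
    ["INV-003", "50.00"],
    ["Total", "250.00"],
]
header, detail_rows = drop_headings_and_subtotals(raw_rows)
```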

The IMPACT cycle includes all of the following processes except: a. data preparation. b. communicate insights. c. address and refine results. d. perform test plan.

a.

The following chart types are preferred for depicting quantitative data except: a. Pie chart b. Box plots c. Line chart d. Scatter plots

a.

The purpose of transforming data is: a. to validate the data for completeness and integrity. b. to load the data into the appropriate tool for analysis. c. to obtain the data from the appropriate source. d. to identify which data are necessary to complete the analysis.

a.

Which approach to data analytics attempts to assign each unit in a population into a small set of classes where the unit belongs? a. Classification b. Regression c. Similarity matching d. Co-occurrence grouping

a.

Which approach to data analytics attempts to characterize the typical behavior of an individual, group or population by generating summary statistics about the data? a. Profiling. b. Data reduction. c. Similarity matching. d. Regression.

a.

Which of the following best describes the goal of data quality: a. recognize what is meant by data quality, be it completeness, reliability or validity b. comprehend the process needed to clean and prepare the data before analysis c. perform basic analysis to understand the quality of the underlying data and its ability to address the business question d. demonstrate ability to sort, rearrange, merge, and reconfigure data in a manner that allows enhanced analysis

a.

Which of the following best describes the goal of data visualization and data reporting: a. report results of analysis in an accessible way to each varied decision maker and their specific needs b. perform basic analysis to understand the quality of the underlying data and its ability to address the business question c. recognize when and how data analytics can address business questions d. recognize what is meant by data quality, be it completeness, reliability or validity

a.

Which of the following best describes the goal of defining and addressing problems through statistical data analysis: a. identify and implement an approach that will use statistical data analysis to draw conclusions and make recommendations on a timely basis b. perform basic analysis to understand the quality of the underlying data and its ability to address the business question c. recognize what is meant by data quality, be it completeness, reliability or validity d. demonstrate ability to sort, rearrange, merge and reconfigure data in a manner that allows enhanced analysis

a.

Which of the following best describes the goal of descriptive data analysis: a. perform basic analysis to understand the quality of the underlying data and its ability to address the business question b. recognize what is meant by data quality, be it completeness, reliability or validity c. demonstrate ability to sort, rearrange, merge, and reconfigure data in a manner that allows enhanced analysis d. comprehend the process needed to clean and prepare the data before analysis

a.

Which of the following best describes the purpose of a foreign key? a. To create the relationship between two tables b. To support business processes across the organization c. To ensure that each row in the table is unique d. To provide business information

a.

Why is Supplier ID considered to be a primary key for a Supplier table? a. It contains a unique identifier for each supplier. b. It is a 10-digit number. c. It can either be for a vendor or miscellaneous provider. d. It is used to identify different supplier categories.

a.
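A rough sketch of how a primary key and a foreign key work together, using Python's built-in `sqlite3` module. The table and column names (`Supplier`, `PurchaseOrder`, `SupplierName`, `Amount`) and the helper `insert_order` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# SupplierID is the primary key: one unique value per supplier row
conn.execute(
    "CREATE TABLE Supplier (SupplierID INTEGER PRIMARY KEY, SupplierName TEXT NOT NULL)"
)
# PurchaseOrder.SupplierID is a foreign key that links each order to a supplier
conn.execute(
    "CREATE TABLE PurchaseOrder ("
    "OrderID INTEGER PRIMARY KEY, "
    "SupplierID INTEGER NOT NULL REFERENCES Supplier(SupplierID), "
    "Amount REAL)"
)
conn.execute("INSERT INTO Supplier VALUES (1, 'Acme Paper')")
conn.execute("INSERT INTO PurchaseOrder VALUES (100, 1, 250.0)")

# The foreign key is what lets the two tables be joined
rows = conn.execute(
    "SELECT o.OrderID, s.SupplierName, o.Amount "
    "FROM PurchaseOrder o JOIN Supplier s ON s.SupplierID = o.SupplierID"
).fetchall()

def insert_order(order_id, supplier_id, amount):
    """Return True if the order was stored, False if it broke a key constraint."""
    try:
        conn.execute(
            "INSERT INTO PurchaseOrder VALUES (?, ?, ?)",
            (order_id, supplier_id, amount),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

An order that points at a supplier that does not exist, such as `insert_order(101, 99, 10.0)`, is rejected, which is one way a relational database helps enforce business rules.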

________ are existing data that have been manually evaluated and assigned a class and ________ are existing data used to evaluate the model. a. Training data; Test data b. Unstructured data; Structured data c. Structured data; Unstructured data d. Test data; Training data

a.
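A minimal sketch of separating labeled records into training data and test data. The 25% test share, the fixed seed, and the function name are assumptions chosen for illustration.

```python
import random

def split_train_test(records, test_fraction=0.25, seed=0):
    """Shuffle the labeled records, then hold out a share for evaluating the model."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = round(len(shuffled) * test_fraction)
    test = shuffled[:n_test]    # used only to evaluate the model
    train = shuffled[n_test:]   # used to build the model
    return train, test

train, test = split_train_test(range(20))
```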

________ states that in many naturally occurring collections of numbers, the leading significant digit is likely to be small. a. Benford's law b. Moore's law c. Leading digits hypothesis d. Classification

a.
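Benford's law gives the expected share of leading digit d as log10(1 + 1/d). A short sketch of that formula follows; the function name `benford_expected` is an assumption.

```python
import math

def benford_expected(digit):
    """Expected share of numbers whose leading significant digit is `digit` (1-9)."""
    return math.log10(1 + 1 / digit)

expected = {d: benford_expected(d) for d in range(1, 10)}
```

Under this formula a leading 1 is expected about 30% of the time and a leading 9 under 5% of the time, which is why small leading digits dominate.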

Big Data is often described by the three Vs, or a. volume, velocity, and variability. b. volume, velocity, and variety. c. volume, volatility, and variability. d. variability, velocity, and variety.

b.

By the year 2020, about 1.7 megabytes of new information will be created every: a. week. b. second. c. minute. d. day.

b.

In general, the simpler the model, the greater the chance of: a. overfitting the data. b. underfitting the data. c. pruning the data. d. the need to reduce the amount of data considered.

b.

Line charts are not recommended for what type of data? a. Normalized data b. Qualitative data c. Continuous data d. Trend lines

b.

Regression analysis typically involves the following steps except: a. Identify the parameters of the model. b. Set boundaries or thresholds. c. Determine the functional form of the relationship. d. Identify the variables that might predict an outcome.

b.

Retail stores often request customers' zip codes at the end of a sales transaction. This is an example of which data approach? a. Similarity matching b. Clustering c. Classification d. Regression

b.

The IMPACT cycle includes all of the following processes except: a. perform test plan. b. visualize the data. c. master the data. d. track outcomes.

b.

The metadata that describes each attribute in a database is which of the following? a. Composite primary key b. Data dictionary c. Descriptive attributes d. Flat file

b.

What are attributes that exist in a relational database that are neither primary nor foreign keys? a. Nondescript attributes b. Descriptive attributes c. Composite key d. Relational table attributes

b.

When working with a predictive model, underfitting the data is most likely caused by ________. a. over pruning the data b. a lack of data reduction c. an overly simple model d. an overly complex model

c.

Which of the following best describes a supervised approach to the evaluation of data? a. Data exploration that is free from oversight by a superior b. Data exploration to examine the relationships between variables that are hypothesized to exist c. Data exploration that is conducted with direct oversight by a superior d. Data exploration looking for potential patterns of interest

b.

Which of the following best describes a supervised approach to the evaluation of data? a. Data exploration that is conducted with direct oversight by a superior b. Data exploration to examine the relationships between variables that are hypothesized to exist c. Data exploration looking for potential patterns of interest d. Data exploration that is free from oversight by a superior

b.

Which of the following best describes the goal of data manipulation: a. comprehend the process needed to clean and prepare the data before analysis b. demonstrate ability to sort, rearrange, merge, and reconfigure data in a manner that allows enhanced analysis c. perform basic analysis to understand the quality of the underlying data and its ability to address the business question d. recognize what is meant by data quality, be it completeness, reliability or validity

b.

__________ is a discriminating classifier that is defined by a separating hyperplane that works first to find the widest margin (or biggest pipe) and then works to find the middle line. a. Linear classifier b. Support vector machines c. Decision trees d. Multiple regression

b.

A data dictionary is paramount in helping database administrators do which of the following? a. Identify the data they need to use. b. Communicating insights. c. Maintain databases. d. Track outcomes.

c.

An observation about the frequency of leading digits in many real-life sets of numerical data is called: a. leading digits hypothesis. b. Moore's law. c. Benford's law. d. clustering.

c.

As mentioned in the chapter, which of the following is not a common way that data will need to be cleaned after extraction and validation? a. Remove headings and subtotals. b. Format negative numbers. c. Clean up trailing zeroes. d. Correct inconsistencies across data.

c.

Data that are organized and reside in a fixed field within a record or a file are generally contained in a relational database or spreadsheet and are readily searchable by search algorithms. The term matching this definition is: a. training data. b. unstructured data. c. structured data. d. test data.

c.

In general, the more complex the model, the greater the chance of ________. a. Underfitting the data b. Pruning the data c. Overfitting the data d. The need to reduce the amount of data considered

c.

Mastering the data can also be described via the ETL process. The ETL process stands for: a. extract, total, and load data. b. enter, transform, and load data. c. extract, transform, and load data. d. enter, total, and load data.

c.

Unaware of data analysis tools available to the internal auditors, a store employee frequently processes cash returns without a receipt for $99, which is just below the amount requiring manager approval of $100. An analysis using which of the following would likely (and quickly) identify the employee's fraudulent behavior? a. Leading digits hypothesis b. Clustering c. Benford's law d. Moore's law

c.
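A rough sketch of how such an analysis might flag the $99 returns: compare the observed share of each leading digit with the Benford expectation and report digits that appear far too often. The return amounts, the 0.10 tolerance, and the function names are hypothetical; a real audit would more likely use a formal statistical test.

```python
import math
from collections import Counter

def leading_digit(amount):
    """First significant digit, read from scientific notation (e.g. 9.9e+01 -> 9)."""
    return int(f"{abs(amount):.15e}"[0])

def flag_overused_digits(amounts, tolerance=0.10):
    """Return digits whose observed share exceeds the Benford share by > tolerance."""
    digits = [leading_digit(a) for a in amounts if a != 0]
    counts = Counter(digits)
    flagged = {}
    for d in range(1, 10):
        observed = counts[d] / len(digits)
        expected = math.log10(1 + 1 / d)
        if observed - expected > tolerance:
            flagged[d] = (round(observed, 3), round(expected, 3))
    return flagged

# Hypothetical no-receipt returns processed by one employee
returns = [
    12.50, 18.00, 23.75, 31.20, 14.99, 47.00, 11.10, 26.40, 15.30, 62.10,
    19.95, 21.00, 99.00, 99.00, 99.00, 99.00, 99.00, 99.00, 99.00, 99.00,
]
flagged = flag_overused_digits(returns)
```

Here the leading digit 9 accounts for 40% of the returns against an expected share under 5%, so it is the only digit flagged.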

When considering the color theme for a visualization, which is the best color theme for a color blind audience? a. red/blue scale b. gray scale c. orange/blue scale d. red/green scale

c.

Which approach to Data Analytics attempts to identify similar individuals based on data known about them? a. Classification b. Regression c. Similarity matching d. Data reduction

c.

Which approach to Data Analytics attempts to predict a relationship between two data items? a. Profiling b. Classification c. Link prediction d. Regression

c.

Which approach to data analytics attempts to estimate or predict, for each unit, the numerical value of some variable using some type of statistical model? a. Similarity matching. b. Classification. c. Regression. d. Data reduction.

c.

Which approach to data analytics attempts to predict a relationship between two data items? a. Similarity matching b. Classification c. Link prediction d. Co-occurrence grouping

c.

Which attribute is required to exist in each table of a relational database and serves as the "unique identifier" for each record in a table? a. Foreign key b. Unique identifier c. Primary key d. Key attribute

c.

Which of the following best describes an independent variable? a. Operation b. Output c. Input d. Application

c.

Which of the following best describes an unsupervised approach to the evaluation of data? a. Data exploration that is conducted with direct oversight by a superior b. Data exploration that is free from oversight by a superior c. Data exploration looking for potential patterns of interest d. Data exploration to examine the relationships between variables that are hypothesized to exist

c.

Which of the following is not a typical example of nominal data? a. Ethnic group b. Hair color c. SAT scores d. Gender

c.

Which of these terms is defined as being a central repository of descriptions for all of the data attributes of the dataset? a. Big Data b. Data warehouse c. Data dictionary d. Data Analytics

c.

Which of the following was not emphasized as a skill that analytic-minded accountants should have? a. Develop an analytics mindset b. Data scrubbing and data preparation c. Classification of test approaches d. Define and address problems through statistical data analysis

c.

While accountants don't need to become data scientists, they must know how to do the following except: a. Communicate with the data scientists about specific data needs and understand the underlying quality of the data b. Clearly articulate the business problem the company is facing c. Build a data repository d. Comprehend the process needed to clean and prepare the data before analysis

c.

________ are the kind of visualizations that present findings to an audience. a. Static visualizations b. Exploratory visualizations c. Declarative visualizations d. Interactive visualizations

c.

________ is more complex than ________. a. Training data; test data b. Qualitative data; quantitative data c. Quantitative data; qualitative data d. Test data; training data

c.

________ refers to data that are stored in a database or spreadsheet that is readily searchable. a. Training data b. Unstructured data c. Structured data d. Test data

c.

__________ data would be considered the most sophisticated type of data. a. Ordinal b. Nominal c. Ratio d. Interval

c.

__________ mark the split between one class and another. a. Decision trees b. Identified questions c. Decision boundaries d. Linear classifiers

c.

All of the following are included in the five steps of the ETL process except: a. Determine the purpose and scope of the data request b. Validate the data for completeness and integrity c. Obtain the data d. Scrub the data

d.

Comparing the number of records within the data is an example of which of the following? a. Validating the data for integrity b. Cleaning the data c. Obtaining the data d. Validating the data for completeness

d.
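A minimal sketch of a completeness check: compare the record count (and, as an extra control, a total) reported by the source system with what was actually extracted. The field name `amount`, the sample rows, and the function name are hypothetical.

```python
import math

def check_completeness(control_count, control_total, extracted_rows, amount_field="amount"):
    """Compare source-system control figures with the extracted data."""
    extracted_count = len(extracted_rows)
    extracted_total = sum(row[amount_field] for row in extracted_rows)
    return {
        "count_matches": extracted_count == control_count,
        # allow for small floating-point differences in the total
        "total_matches": math.isclose(extracted_total, control_total, abs_tol=0.005),
    }

extracted = [
    {"invoice": "INV-001", "amount": 100.0},
    {"invoice": "INV-002", "amount": 250.5},
    {"invoice": "INV-003", "amount": 49.5},
]
result = check_completeness(3, 400.0, extracted)
```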

Exhibit 4-8 gives chart suggestions for what data you'd like to portray. Those options include all of the following except: a. relationship. b. distribution. c. comparison. d. normalization.

d.

Gold, silver, and bronze medals would be examples of: a. nominal data. b. structured data. c. test data. d. ordinal data.

d.

Models associated with regression and classification data approaches have all except this important part: a. identifying which variables (we'll call these independent variables) might help predict an outcome (we'll call this the dependent variable). b. the functional form of the relationship (linear, nonlinear, etc.). c. the numeric parameters of the model (detailing the relative weights of each of the variables associated with the prediction). d. test data.

d.

Red, Yellow, and Blue would be best described as an example of: a. Continuous Data b. Ordinal data c. Structured data d. Nominal data

d.

Regression analysis typically involves the following steps except: a. Identify the parameters of the model. b. Identify the variables that might predict an outcome. c. Determine the functional form of the relationship. d. Set boundaries or thresholds.

d.

Results using the Fahrenheit scale would be best described as an example of: a. Ordinal data b. Ratio data c. Nominal data d. Interval data

d.

The Fahrenheit scale of temperature measurement would best be described as an example of: a. discrete data. b. nominal data. c. continuous data. d. interval data.

d.
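A short sketch of why Fahrenheit readings are interval rather than ratio data: differences are meaningful, but ratios are not, because 0 °F is not a true zero. The conversion to Kelvin (which does have a true zero) is used here only to make that visible.

```python
def fahrenheit_to_kelvin(deg_f):
    """Convert a Fahrenheit reading to Kelvin, a scale with a true zero."""
    return (deg_f - 32) * 5 / 9 + 273.15

difference = 80 - 40       # meaningful: 40 degrees warmer
naive_ratio = 80 / 40      # misleading: suggests "twice as hot"
true_ratio = fahrenheit_to_kelvin(80) / fahrenheit_to_kelvin(40)  # about 1.08
```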

What is the most appropriate chart when showing a relationship between two variables (according to Exhibit 4-8)? a. Bar chart b. Pie graph c. Histogram d. Scatter chart

d.

When using [EmployeeID] as the unique identifier of the Employee table, [EmployeeID] is an example of which of the following: a. Foreign key b. Key attribute c. Composite key d. Primary key

d.

Which approach to Data Analytics attempts to assign each unit in a population into a small set of classes (or groups) where the unit best fits? a. Regression b. Similarity matching c. Co-occurrence grouping d. Classification

d.

Which of the following best describes an independent variable? a. Operation b. Output c. Application d. Input

d.

Which of the following best describes an unsupervised approach to the evaluation of data? a. Data exploration to examine the relationships between variables that are hypothesized to exist b. Data exploration that is free from oversight by a superior c. Data exploration that is conducted with direct oversight by a superior d. Data exploration looking for potential patterns of interest

d.

Which of the following best describes the goal of data scrubbing and data preparation: a. recognize what is meant by data quality, be it completeness, reliability or validity b. perform basic analysis to understand the quality of the underlying data and its ability to address the business question c. demonstrate ability to sort, rearrange, merge and reconfigure data in a manner that allows enhanced analysis d. comprehend the process needed to clean and prepare the data before analysis

d.

Which of the following best describes the profiling approach to data analytics? a. An attempt to predict a relationship between two data items. b. An attempt to discover associations between individuals based on transactions involving them. c. An attempt to reduce the amount of information that needs to be considered to focus on the most critical items. d. An attempt to characterize the typical behavior of an individual, group or population by generating summary statistics about the data.

d.

Which of the following best describes the purpose of relational databases? a. To ensure that business rules are enforced b. To increase information redundancy in the organization c. To provide business information to data analysts d. To support business processes across the organization

d.

Which of the following describes part of the goal of the ETL process: a. identify which approach to data analytics should be used. b. load the data into a relational database for storage. c. communicate the results and insights found through the analysis. d. identify and obtain the data needed for solving the problem.

d.

Which of the following is not one of the four main questions to consider when creating your data scale and increments? a. What scale should be used to display the data? b. How much data should be displayed in the visualization? c. Should outliers be displayed? d. Which colors should be displayed?

d.

Which of these is not included in the five steps of the ETL process? a. Determine the purpose and scope of the data request. b. Obtain the data. c. Validate the data for completeness and integrity. d. Learn what data is available in the data warehouse.

d.

Which of the following was not emphasized as a skill that analytic-minded accountants should have? a. Data quality b. Descriptive data analysis c. Data visualization d. Data and systems analysis and design

d.

Which type of chart is best described as useful for identifying the correlation between two variables, for identifying a trend line, or line of best fit? a. Line chart b. Box and whisker plots c. Pie chart d. Scatter plots

d.
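A minimal sketch of the correlation and line of best fit that a scatter plot helps reveal, computed with ordinary least squares. The function name `fit_line` and the sample points are assumptions.

```python
import math
import statistics

def fit_line(xs, ys):
    """Return (slope, intercept, r) for the least-squares line through the points."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient
    return slope, intercept, r

slope, intercept, r = fit_line([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
```

These sample points lie exactly on y = 2x + 1, so the correlation is 1; real data would scatter around the line.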

________ mark the split between one class and another. a. Decision trees b. Linear classifiers c. Identifying questions d. Decision boundaries

d.

__________ data would be considered the least sophisticated type of data. a. Ordinal b. Interval c. Ratio d. Nominal

d.

__________ is a set of data used to assess the degree and strength of a predicted relationship. a. Training data b. Unstructured data c. Structured data d. Test data

d.

The advantages of storing data in a relational database include which of the following? Option A - Help in enforcing business rules Option B - Increased information redundancy Option C - Integrating business processes a. Option A. b. Option B. c. Option C. d. All of these options are advantages of a relational database. e. Only Option A and Option B. f. Only Option B and Option C. g. Only Option A and Option C.

g.
