Data analytics exam 1


Define Stratified sampling

When the population to be sampled is divided into subpopulations (strata) and proportional samples are taken from each subpopulation.

Which of the following is NOT true of data preparation and integration? A. The entire process of data preparation and integration is the most enjoyable and time-consuming part of data science. B. Relevant information may be spread over several tables and databases, so we need to integrate the data. C. Leaders must think multidimensionally and refine insight from all relevant perspectives about their customers, and therefore data from different sources must be integrated. D. Data, especially big data, are often disorganized and overwhelming like runoffs, so they must be integrated to achieve higher value. E. Because data silos exist in an organization, data must be integrated to reduce data inconsistency.

A

Which of the following is NOT one of the fundamental concepts/principles mentioned in Chapter 1 of our textbook? A. Data science involves the judicious integration of human knowledge and computer-based techniques to achieve what neither of them could achieve alone. B. From a large mass of data, information technology can be used to find informative descriptive attributes of entities of interest. C. If you look too hard at a set of data, you will find something—but it might not generalize beyond the data you're looking at. D. Extracting useful knowledge from data to solve business problems can be treated systematically by following a process with reasonably well-defined stages.

A

Which of the following is a continuous numerical variable? A. Internal data is the data that is procured and consolidated from different branches within an organization. B. All the customer's profile data is stored in a company's database, so the company can use it to generate new value without any concern. C. Data privacy must be considered when an organization or individual plans to use, retain, and/or disclose any personal information, but it is not necessarily involved in the collection of personal information. D. When a researcher uses the data which is collected by others, s/he is using the primary source of data. E. A data breach is the intentional release of secure or private/confidential information to an untrusted environment, so an unintentional release cannot be considered as a data breach.

A

Difference between a continuous and discrete variable

A discrete variable is the result of counting, while a continuous variable is measured. (Discrete: number of children. Continuous: weight.)

What is ERP (Enterprise Resource Planning)?

A suite of applications, a centralized database, and a set of inherent processes to consolidate business operations into a single computing platform. Adopting it requires a disruptive conversion process.

What is CRM (Customer Relationship Management)?

A suite of applications, a centralized database, and a set of inherent processes to manage all interactions with customers.

What is EAI (Enterprise Application Integration)?

A suite of software applications that integrates existing systems by providing layers of software that connect current applications.

According to GDPR, which of the following information is considered as personal data? Choose all that apply. A. A person's email address B. A person's government-issued ID number C. A person's phone number D. The organization that a person works for E. A person's name F. The name of the university a person attends

A, B, C, D, E

The assigned article summarizes multiple perspectives of analytics, indicating that Analytics is not simply a buzzword, but is _______. Choose all that apply. A. A Collection of Specific Activities B. A Collection of Practices & Technologies C. A Transformation Process D. A Capability Set E. A Decision Paradigm F. A Movement G. A Fashion H. A Hype

A, B, C, D, E, F

Which of the following statements is true of analytics? Choose all that apply. A. Analytics problems must be defined in a clear and precise manner. B. We need to check whether a method's fundamental assumptions are satisfied before we use it in analytics. C. Quantitative data is easier to analyze than qualitative data, so we should avoid using any qualitative data in analytics. D. Data Analytics can be used for solving many business problems, but it cannot help us recognize problems. E. Analytics deals with a lot of numbers, so we do not need to consider ethical issues in analytics. F. It is very important to check whether an analytics model is internally consistent.

A, B, F

Which of the following is true of databases? Please choose all that apply. A. In a relational database, data is organized in a tabular format. B. Normalization combines tables in a database into a single table while denormalization breaks a single table apart into many tables. C. A row in a table is also called a tuple, record, observation, example, or case. D. A primary key is a column or group of columns in a table that provides a unique link between data in two tables. E. Compared with normalization, denormalization makes it easier to query your data. F. Normalization reduces data redundancy, while denormalization may introduce redundancy.

A, C, E, F

Which of the following is NOT integrated into the episode of evidence-based problem findings & solving? A. Processors B. Mission C. Evidence D. Processes

B

Which of the following is NOT true based on Chapter 13 in Data Science for Business? A. This chapter mentions that good data science relies on heuristic ("art") as well as algorithmic thinking ("science"). B. This chapter mentions two important factors in ensuring successful data analytics: management must think data-analytically and create a data science thriving culture. C. This chapter mentions that culture that supports data science must be created in a bottom-up manner. D. This chapter is relevant to the Business Analytics Framework developed by Dr. Holsapple and his colleagues.

C

Which of the following is NOT true of analytics episodes? A. They could be categorized based on the intent, such as prediction episodes. B. They are subject to analytics managerial influences. C. They are triggered only by the intent to solve a problem. D. They must operate on the available evidence only.

C

Which of the following is true of Data warehousing and Hadoop? A. Both of them can be used to deal with big data, but one will replace the other. B. Both of them can be used to deal with big data, but they exclude each other. C. Both of them can be used to deal with big data, but one complements the other. D. Only one of them can be used to deal with big data.

C

Which of the following is true of transactional data and incidental data? A. Transactional data initially has the value of the transaction itself and that value does not change over time. B. Incidental data is often generated around the event of a transaction, and thus incidental data must be required to complete a transaction. C. Incidental data may be stored for varying periods of time but is rarely curated and analyzed the way transactional data is. D. Incidental data has immediate value upon creation, while transactional data has little value when it is created.

C

ABC company found that its profit decreased by 30% in 2019. As a starting point, its CEO assigned an analytics team to investigate why this situation occurred and then report the results to the next executive meeting. What type of analytics is the team going to conduct in this case? A. Prescriptive analytics B. Predictive analytics C. Diagnostic analytics D. Descriptive analytics

C. Diagnostic analytics

1st order activities

Directly manipulating the data

In the Business Analytics Framework (BAF), which of the following perspectives may be considered as an umbrella concept which covers a combination of definitions in other five perspectives? A. A collection of practices and technologies B. A movement C. A capability set D. A Transformation process E. A decisional paradigm

E

Based on Class 02, which of the following is a feature or attribute of evidence? (Choose all that apply) A. Granularity B. Activity C. Perspective D. Variety E. Complexity F. Measurability G. Volume

Granularity, Variety, Complexity, Measurability, Volume

Systematic sampling

Sampling in which records are selected from the population at a fixed interval, where the interval equals the total population size divided by the desired sample size.
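
A minimal Python sketch of the interval rule described in this card. The function name `systematic_sample` and the random starting offset are assumptions added for illustration (a random start within the first interval is a common convention, not something the card specifies).

```python
import random

def systematic_sample(records, sample_size, seed=None):
    """Pick every k-th record, where k = population size // sample size."""
    interval = len(records) // sample_size
    if interval < 1:
        raise ValueError("sample_size cannot exceed the population size")
    # Assumption: start at a random position inside the first interval.
    start = random.Random(seed).randrange(interval)
    return records[start::interval][:sample_size]

# Hypothetical population of 100 record IDs, sample of 10 -> interval of 10.
population = list(range(1, 101))
sample = systematic_sample(population, 10, seed=1)
print(sample)
```

Every pair of neighbouring values in `sample` is exactly 10 apart, which is the population size (100) divided by the sample size (10).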

Chapter 13 of our textbook mentions that the firm's management must think data-analytically. Based on this criterion, which of the following statements is NOT true? A. Managers have to understand the fundamental principles well enough to envision and/or appreciate data science opportunities. B. Managers have to understand the fundamental principles well enough to be willing to invest in data and experimentation. C. Managers have to understand the fundamental principles well enough to steer the data science team carefully to make sure that the team stays on track toward an eventually useful business solution. D. Managers have to understand the fundamental principles well enough to be data scientists. E. Managers have to understand the fundamental principles well enough to supply the appropriate resources to the data science teams.

D

Features of evidence

Volume, Variety, Velocity, Volatility, Complexity, Granularity, Scope (Others on Class 2 slide 5)

3 key challenges of analytics

Deciding what data to use, getting the right skills on your team, and taking the insights and transforming how the business operates.

Which of the following is true of data understanding and data preparation? A. Data cleansing, selection, and transformation are included in data preparation, while data integration is included in data understanding. B. Data preparation provides information about the data, which will be used in data understanding. C. Characteristics of attributes and the dependencies between attributes are recognized in data preparation. D. Generally, data preparation takes much more time than data understanding. E. Data preparation occurs before data understanding.

D

Which of the following is true of eliminating information/data silos? A. ERP and EAI have a centralized database, but CRM does not. B. ERP is included in most CRM systems. C. Among ERP, EAI, and CRM, EAI is the most disruptive for an organization to convert to. D. ERP, EAI, and CRM can be used for eliminating problems of information silos. E. CRM integrates existing systems by providing layers of software that connect applications together.

D

2nd order activities

Influencing HOW the data is manipulated

Which of the following data can Hadoop deal with? A. Structured data B. Unstructured data C. Heterogeneous Data D. All of the above E. Two but not all of A, B, and C

D. All of the above

Which of the following firms would be most likely to find it advantageous to use a NoSQL database management system? A. A medium-sized manufacturing firm that serves one U.S. region B. A family-owned pizza restaurant with one location C. A very large social media provider D. A sole proprietor who provides consulting services

C

Database Application

•A collection of forms, reports, queries, and application programs that serves as an intermediary between users and database data.

Normalization

•Breaking tables apart to reduce data redundancy

Denormalization

•Combining tables that have been normalized into a single table
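
To make the two cards above concrete, here is a small Python sketch. The `customers` and `orders` tables, their field names, and the `denormalize` helper are all hypothetical. The normalized form stores each customer's details once; the denormalized form copies them onto every order row, which is easier to query but repeats data.

```python
# Normalized: customer details are stored once and referenced by key.
customers = {
    1: {"name": "Ana", "city": "Lexington"},
    2: {"name": "Raj", "city": "Louisville"},
}
orders = [
    {"order_id": 101, "customer_id": 1, "amount": 25.0},
    {"order_id": 102, "customer_id": 1, "amount": 40.0},
    {"order_id": 103, "customer_id": 2, "amount": 15.0},
]

def denormalize(orders, customers):
    """Join each order with its customer's details into one flat row."""
    return [{**order, **customers[order["customer_id"]]} for order in orders]

flat = denormalize(orders, customers)
for row in flat:
    print(row)
```

In `flat`, Ana's name and city appear on two rows. That repetition is the redundancy that normalization removes and denormalization reintroduces.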

DBMS (Database Management System)

•Program used to create, process, and administer a database

3 most common Dimensions of big data

Volume, Variety, Velocity

Difference between Data understanding and data preparation

Data understanding provides general information about the data, while data preparation uses that information to cleanse, reduce, integrate, and transform the data.

Which of the following modules in the Hadoop framework harnesses the power of thousands of computers working in parallel and therefore is called the heart of Hadoop?

MapReduce
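
A rough single-process sketch of the MapReduce idea in Python, using a word count. This is not the Hadoop API: the function names `map_phase`, `shuffle`, and `reduce_phase` are hypothetical, and a real cluster would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)

# Hypothetical input split into two "documents".
documents = ["big data big value", "data drives value"]
mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

Each document can be mapped independently and each key can be reduced independently, which is what lets the work be spread over many computers.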

Cluster sampling

The population is divided into groups (clusters), and then a random selection of groups is sampled.
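
A minimal Python sketch of cluster sampling, assuming a one-stage design in which every member of a selected group is kept. The store names, customer IDs, and the `cluster_sample` helper are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole groups; every member of a picked group is sampled."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical population: customers grouped by the store they shop at.
stores = {
    "store_a": ["c1", "c2", "c3"],
    "store_b": ["c4", "c5"],
    "store_c": ["c6", "c7", "c8"],
    "store_d": ["c9"],
}
picked = cluster_sample(stores, 2, seed=42)
print(picked)
```

Unlike stratified sampling, which draws some records from every group, only the selected groups contribute records here.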

Generally, which of the following types of analytics is the most difficult to conduct? A. Diagnostic analytics B. Predictive analytics C. Descriptive analytics D. Prescriptive analytics

Predictive Analysis

What is the general rule of databases vs spreadsheets?

1 theme, use a spreadsheet. Multiple themes, use a database.

Note the challenges of analytics

Start with the hypothesis and design how your organization will change

The focal point of analytics is

The Problem (What are the Key resources and Key activities?)

Suppose that 10,000 customers in a retailer's customer database are categorized by three customer types: 3,500 prospective buyers, 4,500 first time buyers, and 2,000 repeat (loyal) buyers. A sample of 1,000 customers is needed. What type of sampling is best for this case?

Stratified Sampling
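
For the retailer example above, proportional allocation can be sketched in Python as follows. The helper names and stratum labels are hypothetical, and simple rounding is an assumption: it works cleanly for these numbers but may not sum exactly to the sample size in general.

```python
import random

def proportional_allocation(strata_sizes, sample_size):
    """Number of records to draw from each stratum, proportional to its size."""
    total = sum(strata_sizes.values())
    # Assumption: plain rounding; other data may need a remainder adjustment.
    return {name: round(sample_size * size / total)
            for name, size in strata_sizes.items()}

def stratified_sample(strata, sample_size, seed=None):
    """Draw a simple random sample of the allocated size from each stratum."""
    rng = random.Random(seed)
    sizes = {name: len(records) for name, records in strata.items()}
    quotas = proportional_allocation(sizes, sample_size)
    return {name: rng.sample(records, quotas[name])
            for name, records in strata.items()}

strata_sizes = {"prospective": 3500, "first_time": 4500, "repeat": 2000}
quotas = proportional_allocation(strata_sizes, 1000)
print(quotas)  # {'prospective': 350, 'first_time': 450, 'repeat': 200}
```

Each customer type keeps the same share in the sample as in the database: 35%, 45%, and 20% of the 1,000 sampled customers.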

Other compositions of Big data

The base three (Volume, Variety, Velocity) plus Veracity (IBM), Validity, and Value

Which type of data accounts for the majority of all data in organizations? A. Structured data B. Unstructured data C. Semi-structured data D. None of the above

B

According to Chapter 14 and our lecture, which of the following statements is true? Choose all that apply. A. Human creativity, knowledge, and common sense add value in selecting the right data for data analytics, but after that, computers can do the remaining tasks alone. B. Ethical issues should not be a problem for data analytics because it extracts useful knowledge from a large amount of data without leaking any privacy information. C. Before solving a problem, we need to define the problem in a clear and precise manner. D. Data science involves the judicious integration of human knowledge and computer-based techniques to achieve what neither could achieve alone. E. There is an intense tension between privacy and improving business decisions because of the direct relationship: the more fine-grained data you collect on individuals, the better you can predict things about them that are important for business decision-making.

C, D, E

Which of the following is an example of procedural knowledge? Choose all that apply. A. A company's code of conduct B. A decision tree rule generated by data mining C. A definition of data analytics D. A manual to install a bicycle E. A recipe for pepperoni bread

D. A manual to install a bicycle E. A recipe for pepperoni bread

Which of the following is NOT true based on p.30 in the book Data Science for Business? A. Numerical data is often normalized or scaled so that they can be comparative. B. Data scientists may spend considerable time early in the process defining the variables used later in the process. C. Leakage must be considered carefully during data preparation; otherwise the prediction model would be meaningless because the information on the target variable is leaked by the historical data. D. Even though the analytic technologies are powerful nowadays, they still have certain requirements on the data they use. E. The data to be analyzed is often in the same form as that is provided naturally.

E

Which of the following is NOT an evidence manipulation activity? A. Evidence Selection B. Evidence Simulation C. Evidence Emission D. Evidence Acquisition E. Evidence Generation

Evidence Simulation

Which of the following is NOT a factor influencing the usability of a particular representation of evidence to a particular processor? A. The process knowledge used by the processor B. The fit between the evidence representation and its processor. C. The environment within which the action is to take place. D. The action/task being attempted by the processor.

A
