Chapter Eight: Understanding Big Data and Its Impact on Business Review Questions


What are the six steps in the data-mining process and why is each important?

1. Business Understanding: frames the problem the analysis must solve.
2. Data Understanding: identifies what data is available and assesses its quality.
3. Data Preparation: cleans and formats the data for modeling.
4. Data Modeling: applies analysis techniques to find patterns in the prepared data.
5. Evaluation: checks whether the model actually meets the business objectives.
6. Deployment: puts the model into everyday use.

Data mining is a continuous cycle of activity: each new project revisits earlier problems, so past models can be effectively reused to look for new opportunities in the present and future. This lets users recycle their work and become more effective and efficient at solving future problems (a sketch of the cycle in code follows).
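Below is a minimal, runnable sketch of the six steps in Python. The library (scikit-learn), the built-in dataset, and the model choice are illustrative assumptions, not part of the chapter.

    # Illustrative walk-through of the six data-mining steps (assumes scikit-learn).
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    import joblib

    # 1. Business Understanding: decide the question (here: classify tumor samples).
    # 2. Data Understanding: load and inspect the available data.
    X, y = load_breast_cancer(return_X_y=True)

    # 3. Data Preparation: split into train/test sets and scale the features.
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    scaler = StandardScaler().fit(X_train)

    # 4. Data Modeling: fit a predictive model on the prepared data.
    model = LogisticRegression(max_iter=5000).fit(scaler.transform(X_train), y_train)

    # 5. Evaluation: measure the model against held-out data.
    print("accuracy:", accuracy_score(y_test, model.predict(scaler.transform(X_test))))

    # 6. Deployment: persist the model so the next cycle can reuse it.
    joblib.dump((scaler, model), "model.joblib")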

What is distributed computing and how has it helped drive the big data era?

Distributed computing processes and manages algorithms across many machines in a computing environment. A key component of big data is a distributed computing environment that shares resources ranging from memory to networks to storage. With distributed computing, individual computers networked together across geographic areas work together to execute a workload or computing process as if they were a single computing environment (the sketch below simulates this split-and-combine pattern).
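The sketch below simulates the core pattern on one machine using Python's standard multiprocessing module: partition a workload, process the pieces in parallel, and combine the partial results. A real big data cluster (Hadoop or Spark, for example) applies the same pattern across networked machines; the word-count task and all names here are invented for illustration.

    from multiprocessing import Pool

    def count_words(chunk):
        # Each worker plays the role of one machine in the cluster,
        # handling its slice of the data independently.
        return sum(len(line.split()) for line in chunk)

    if __name__ == "__main__":
        lines = ["big data needs many machines"] * 100_000
        chunks = [lines[i::4] for i in range(4)]    # partition the workload
        with Pool(processes=4) as pool:
            partial_counts = pool.map(count_words, chunks)
        print("total words:", sum(partial_counts))  # combine partial results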

What are the four data-mining techniques for predictions and why are they important to a business?

1. Optimization Model: a statistical process that finds the way to make a design, system, or decision as effective as possible, for example, finding the values of controllable variables that determine maximal productivity or minimal waste.
2. Forecasting Model: time-series information is time-stamped information collected at a particular frequency. Forecasts are predictions based on time-series information, allowing users to manipulate the time series for forecasting activities.
3. Regression Model: a statistical process for estimating the relationships among variables. Regression models include many techniques for modeling and analyzing several variables when the focus is on the relationship between a dependent variable and one or more independent variables (see the sketch after this list).
4. ?
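A minimal regression sketch in Python follows. The ad-spend and sales numbers are invented for illustration, and numpy's polyfit stands in for the many techniques the regression family includes.

    import numpy as np

    # Invented example: estimate how sales (dependent variable) respond
    # to advertising spend (independent variable).
    ad_spend = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    sales    = np.array([2.1, 4.3, 6.2, 8.1, 9.9])

    # Fit a straight line: sales = slope * ad_spend + intercept.
    slope, intercept = np.polyfit(ad_spend, sales, deg=1)
    print(f"sales ~ {slope:.2f} * ad_spend + {intercept:.2f}")

    # The fitted relationship can also be used as a simple forecast.
    print("predicted sales at spend=6:", slope * 6 + intercept)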

What is virtualization and how has it helped drive the big data era?

Virtualization is the creation of a virtual (rather than actual) version of computing resources, such as an operating system, a server, a storage device, or network resources. With big data it is now possible to virtualize data so that it can be stored efficiently and cost-effectively. Improvements in network speed and network reliability have removed the physical limitations of managing massive amounts of data at an acceptable pace.

What are the four data-mining techniques? Provide examples of how you would use each one in business.

1. Estimation Analysis: determines values for an unknown continuous variable behavior or estimated future value. Example: the percentage of high school students who will graduate, based on student-teacher ratio or income levels.
2. Affinity Grouping Analysis: reveals the relationships between variables along with the nature and frequency of those relationships. Example: 55% of the time, events A and B occur together.
3. Cluster Analysis: a technique used to divide an information set into mutually exclusive groups such that the members of each group are as close together as possible to one another and the different groups are as far apart as possible. Example: creating target-marketing strategies based on ZIP codes, which allows a business to assign a level of importance to each segment (see the sketch after this list).
4. Classification Analysis: the process of organizing data into categories or groups for its most effective and efficient use. Example: grouping people by political affiliation or identifying likely charity donors.
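A minimal cluster-analysis sketch follows, segmenting invented customers into two groups with scikit-learn's KMeans. The two columns (age, annual spend) and all values are assumptions for illustration.

    import numpy as np
    from sklearn.cluster import KMeans

    # Invented customer attributes: [age, annual spend in dollars].
    customers = np.array([
        [25, 300], [27, 320], [24, 310],   # younger, lower-spend shoppers
        [52, 900], [55, 880], [50, 950],   # older, higher-spend shoppers
    ])

    # Divide the customers into two mutually exclusive segments.
    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)
    print("segment for each customer:", kmeans.labels_)
    print("segment centers:", kmeans.cluster_centers_)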

What are the four common characteristics of big data?

1. VARIETY: different forms of structured and unstructured data.
- Data from spreadsheets and databases as well as from email, videos, photos, and PDFs, all of which must be analyzed.
2. VERACITY: the uncertainty of data, including biases, noise, and abnormalities.
- Uncertain or untrustworthy data must be addressed; data must be meaningful to the problem being analyzed.
- Organizations must keep data clean and implement processes to keep dirty data from accumulating in their systems.
3. VOLUME: the scale of data.
- Includes the enormous volumes of data generated daily, much of it created by machines and networks.
- Big data tools are necessary to analyze zettabytes and brontobytes.
4. VELOCITY: the analysis of streaming data as it travels around the Internet.
- For example, analyzing social media messages as they spread globally.

What is big data?

Big data is a collection of large, complex data sets, including structured and unstructured data, that cannot be analyzed using traditional database methods and tools. Big data sources combine extremely large volumes of data with high velocity, wide variety, and a need to understand the data's veracity.

What is data-driven decision management?

Data-driven decision management is an approach to business governance that values decisions that can be backed up with verifiable data.

