ISDS- Chapter 8: Understanding Big Data and Its Impact on Business
variety
-different forms of structured and unstructured data -data from spreadsheets and databases as well as from email, videos, photos, and PDFs, all of which must be analyzed
velocity
-the analysis of streaming data as it travels around the internet -requires analysis of social media messages as they spread globally
volume
-the scale of data -includes enormous volumes of data generated daily -massive volume created by machines and networks -big data tools necessary to analyze zettabytes and brontobytes
veracity
-the uncertainty of data, including biases, noise, and abnormalities -uncertainty or untrustworthiness of data -data must be meaningful to the problem being analyzed -must keep data clean and implement processes to keep dirty data from accumulating in systems
big data includes the four common characteristics
-variety -veracity -volume -velocity
optimization model
A statistical process that finds the way to make a design, system, or decision as effective as possible, for example, finding the values of controllable variables that determine maximal productivity or minimal waste.
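A minimal sketch of the idea, assuming a made-up demand curve: the controllable variable is price, and an exhaustive search over candidate prices picks the one with maximal revenue. The function names and the demand formula are hypothetical, chosen only for illustration; real optimization models use more sophisticated solvers.

```python
def revenue(price):
    # Hypothetical demand curve: each extra dollar of price loses 2 units sold.
    units_sold = 100 - 2 * price
    return price * units_sold


def best_price(candidates):
    # Try every candidate value of the controllable variable
    # and keep the one that maximizes the objective.
    return max(candidates, key=revenue)


optimal = best_price(range(0, 51))
print(optimal, revenue(optimal))
```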
regression model
includes many techniques for modeling and analyzing several variables when the focus is on the relationship between a dependent variable and one or more independent variables.
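As a rough illustration, the simplest case (one independent variable, one dependent variable) can be fit with ordinary least squares. The data and the names ad_spend and sales below are invented for the example.

```python
def fit_line(xs, ys):
    # Ordinary least squares for y = slope * x + intercept.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


ad_spend = [1.0, 2.0, 3.0, 4.0]   # independent variable (made-up data)
sales = [3.0, 5.0, 7.0, 9.0]      # dependent variable (made-up data)
slope, intercept = fit_line(ad_spend, sales)
print(slope, intercept)
```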
databases
___________ contain information in a series of two-dimensional tables
data artist
a business analytics specialist who uses visual tools to help people understand complex data
big data
a collection of large, complex data sets, including structured and unstructured data, which cannot be analyzed using traditional database methods and tools
dimension
a particular attribute of information
cluster analysis
a technique used to divide an information set into mutually exclusive groups such that the members of each group are as close together as possible to one another and the different groups are as far apart as possible
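One common way to do this is k-means clustering, sketched here for one-dimensional data. This is only one of several clustering techniques, and the spending figures and starting centers are assumptions made for the example.

```python
def kmeans_1d(values, centers, iterations=10):
    # Repeatedly assign each value to its nearest center,
    # then move each center to the mean of its group.
    groups = [[] for _ in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return groups, centers


spend = [10, 12, 11, 95, 100, 105]  # made-up customer spending
groups, centers = kmeans_1d(spend, centers=[min(spend), max(spend)])
print(groups, centers)
```

Each value lands in exactly one group (mutually exclusive), members of a group sit close to their center, and the two centers end up far apart.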
-optimization -forecast -regression
data mining prediction analysis methods
-classification -estimations -affinity grouping -clustering
data mining techniques
data visualization
describes technologies that allow users to see or visualize data to transform information into a business perspective
estimation analysis
determines values for an unknown continuous variable behavior or estimated future value.
analysis paralysis
occurs when the user goes into an emotional state of over-analyzing (or overthinking) a situation so that a decision or action is never taken, in effect paralyzing the outcome
forecasting model
predictions based on time-series information allowing users to manipulate the time series for forecasting activities.
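A simple moving average is one very basic way to forecast from a time series; the sketch below assumes that method and uses invented monthly figures. The window parameter is the part a user would manipulate.

```python
def moving_average_forecast(series, window):
    # Forecast the next value as the mean of the last `window` observations.
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return sum(series[-window:]) / window


monthly_sales = [100, 110, 120, 130]  # made-up time series
forecast = moving_average_forecast(monthly_sales, window=3)
print(forecast)
```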
infographics
present the results of data analysis, displaying the patterns, relationships, and trends in a graphical format.
distributed computing
processes and manages algorithms across many machines in a computing environment
affinity grouping analysis
reveals the relationship between variables along with the nature and frequency of the relationships
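Market basket analysis is a typical case: count how often pairs of items appear together in the same purchase. The sketch below assumes that use case; the baskets are invented and pair_counts is a hypothetical helper name.

```python
from collections import Counter
from itertools import combinations


def pair_counts(baskets):
    # Count how many baskets contain each pair of items.
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


baskets = [
    ["chips", "salsa", "soda"],
    ["chips", "salsa"],
    ["bread", "milk"],
    ["chips", "soda"],
]
counts = pair_counts(baskets)
print(counts.most_common(2))
```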
cube
the common term for the representation of multidimensional information
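As a toy sketch, a cube can be pictured as values keyed by several dimensions at once, here product, region, and quarter. The data and the total_by helper are hypothetical; real BI tools store and query cubes very differently.

```python
# Each key is (product, region, quarter); each value is units sold (made-up data).
cube = {
    ("laptop", "east", "Q1"): 10,
    ("laptop", "west", "Q1"): 5,
    ("phone", "east", "Q1"): 7,
    ("phone", "east", "Q2"): 3,
}


def total_by(cube, axis):
    # Roll the cube up along one dimension (0=product, 1=region, 2=quarter).
    totals = {}
    for key, value in cube.items():
        totals[key[axis]] = totals.get(key[axis], 0) + value
    return totals


print(total_by(cube, 0))
print(total_by(cube, 1))
```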
virtualization
the creation of a virtual version of computing resources such as an operating system, a server, storage device, or network resources
data mining
the process of analyzing data to extract information not offered by the raw data alone
classification analysis
the process of organizing data into categories or groups for its most effective and efficient use
business intelligence dashboards
track corporate metrics such as critical success factors and key performance indicators and include advanced capabilities such as interactive controls, allowing users to manipulate data for analysis.
data mining tools
use a variety of techniques to find patterns and relationships in large volumes of information