ISDS final
6 V's that define Big data
- volume - variety - velocity - veracity - variability - value proposition
Variability
data flows can be highly inconsistent, with periodic peaks, making data loads hard to manage
Variety
data today comes in all types of formats, ranging from traditional databases to hierarchical data stores created by end users and OLAP systems, to text documents, email, XML, meter-collected and sensor-captured data, to video, audio, and stock ticker data. By some estimates, 80 to 85% of all organizations' data is in some sort of unstructured or semi-structured format
Critical success factors for big data analytics
- a clear business need (alignment with the vision and the strategy) - strong, committed sponsorship (executive champion) - a fact-based decision-making culture
Hadoop
- an open source framework for storing and analyzing massive amounts of distributed, unstructured data - originally created by Doug Cutting at Yahoo
big data technologies
- MapReduce - Hadoop - NoSQL
NoSQL
- not only SQL - a new style of database - used to store and process large volumes of unstructured, semi-structured, and multi-structured data - can handle big data better than traditional relational database technology
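A rough sketch of the "no fixed schema" idea behind many NoSQL stores, using only the Python standard library. The `collection` dict and the `put` and `find` helpers are made up for illustration and are not the API of any real NoSQL product; the point is only that records with different fields can sit side by side without a table definition.

```python
import json

# Hypothetical in-memory "collection": each record is a free-form document.
collection = {}

def put(doc_id, document):
    """Store a document as JSON text; no fixed schema is enforced."""
    collection[doc_id] = json.dumps(document)

def find(field, value):
    """Return every stored document whose `field` equals `value`."""
    results = []
    for raw in collection.values():
        doc = json.loads(raw)
        if doc.get(field) == value:
            results.append(doc)
    return results

# Records of different shapes live in the same collection.
put("t1", {"type": "tweet", "user": "ana", "text": "hello"})
put("s1", {"type": "sensor", "device": "meter-7", "reading": 21.5, "unit": "C"})
put("t2", {"type": "tweet", "user": "ben", "text": "hi", "hashtags": ["bigdata"]})

tweets = find("type", "tweet")
```

In a relational table, the sensor reading and the tweets would need separate tables or many empty columns; here each document simply carries the fields it has.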
What should companies do to succeed with big data?
- simplify - coexist - visualize - empower - integrate - govern - evangelize
data scientists
- use a combination of business, communication, and technical skills to investigate big data, looking for ways to improve current business analytics practices (from descriptive to predictive and prescriptive) and hence to improve decisions for new business opportunities - considered a big data guru - positions are in high demand and offered with very high salaries and very high expectations
simplify
it's hard to keep track of all of the new database vendors, open source projects, and big data service providers. it will be even more crowded and complicated in the years ahead
Volume
most common trait of big data. Many factors contributed to the exponential increase in data volume, such as transaction-based data stored through the years, text data constantly streaming in from social media, increasing amounts of sensor data being collected, automatically generated RFID and GPS data, and so forth
Velocity
refers to both how fast data is being produced and how fast the data must be processed (i.e., captured, stored, and analyzed) to meet the need or demand. RFID tags, automated sensors, GPS devices, and smart meters are driving an increasing need to deal with torrents of data in near-real time
Veracity
refers to conformity to facts: the accuracy, quality, truthfulness, or trustworthiness of big data
challenges of big data analytics
skill availability (data scientists are in short supply)
MapReduce
technique popularized by Google that distributes the processing of very large multi-structured data files across a large cluster of ordinary machines/processors
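A minimal single-machine sketch of the MapReduce pattern, using word counting as the example task. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample `chunks` are assumptions chosen for illustration; this is not Hadoop's or Google's actual API. On a real cluster each chunk would be mapped on a different machine, and the shuffle step would move data across the network.

```python
from collections import defaultdict

def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(mapped_chunks):
    """Shuffle: group all emitted values by key across every chunk."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine each key's list of values into a single count."""
    return {key: sum(values) for key, values in groups.items()}

def word_count(chunks):
    # Each chunk is mapped independently, which is what lets the work
    # be spread over many ordinary machines in a real deployment.
    mapped = [map_phase(chunk) for chunk in chunks]
    return reduce_phase(shuffle(mapped))

# Hypothetical input: three pieces of one large file.
chunks = ["big data big value", "data velocity", "big volume"]
counts = word_count(chunks)
```

Because no map call depends on another, adding machines lets more chunks be processed at the same time; only the shuffle and reduce steps need to bring results together.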
Value proposition
this characteristic of big data is its potential to contain more useful patterns and interesting anomalies than "small" data