Big Data
Describe at least three sources of Big Data.
Archives, Machine logs, Public Web, Sensor Data, Social Media
Handling Big Data can be problematic because certain attributes are private. Which of the following attributes are deemed 'private': A) Salary Records B) Work Experience C) Student Records D) Personal Contact Information
C) Student Records
When was MapReduce developed and what purpose did it serve?
Introduced by Google in 2004. It splits a large job into small chunks, processes them in parallel across a distributed network of machines (map), and then combines the partial results (reduce).
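As a rough illustration of the split-then-combine idea behind MapReduce, here is a minimal single-process sketch in Python. The function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and do not come from any real framework; in a real cluster each chunk would be handled by a different node.

```python
from collections import defaultdict


def map_phase(chunk):
    # Emit a (word, 1) pair for every word in one chunk of input.
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    # Group all emitted values by key, as a framework would between phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    # Combine the values collected for each key into a single result.
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks):
    pairs = []
    for chunk in chunks:  # each chunk could be processed by a separate worker
        pairs.extend(map_phase(chunk))
    return reduce_phase(shuffle(pairs))


print(word_count(["big data big", "data sets"]))
# prints {'big': 2, 'data': 2, 'sets': 1}
```

Word counting is used here only because it is the smallest job that needs both phases; the same three steps apply to any aggregation that can be expressed per key.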
Why are E-R Models not scalable with Big Data?
E-R tables in SQL take longer to search for relations than clustering does.
T/F: Direct-attached hyperscale computer environments include shared storage.
False
T/F: The big data itself can provide information on the domain it represents.
False
List a few major open source technologies used to manipulate big data.
Hadoop, MongoDB, CouchDB, Cassandra
Provide a real world example of Big Data.
Obama's 2012 reelection campaign.
Describe some benefits of using a column-oriented database for storing big data.
Reduces the computation required for queries, because a query reads only the columns it refers to instead of every full row.
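A small sketch of why a column layout can cut query work, assuming a toy in-memory table. The table contents and the function names total_salary_rows and total_salary_columns are made up for illustration and do not reflect how any particular database stores data.

```python
# Row-oriented layout: all attributes of a record are stored together.
rows = [
    {"id": 1, "name": "Ann", "salary": 50000},
    {"id": 2, "name": "Bob", "salary": 62000},
    {"id": 3, "name": "Cy", "salary": 58000},
]

# Column-oriented layout: all values of one attribute are stored together.
columns = {
    "id": [1, 2, 3],
    "name": ["Ann", "Bob", "Cy"],
    "salary": [50000, 62000, 58000],
}


def total_salary_rows(records):
    # Has to visit every full record even though only one field is needed.
    return sum(record["salary"] for record in records)


def total_salary_columns(table):
    # Touches only the single column the query refers to.
    return sum(table["salary"])


print(total_salary_rows(rows))        # prints 170000
print(total_salary_columns(columns))  # prints 170000
```

Both functions return the same answer; the difference is how much stored data has to be read to get it, which grows with the number of attributes per record.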
State the four main architectures of parallel databases.
Shared memory, shared disk, shared nothing, hierarchical (hybrid)
What are Enterprise Resource Planning Systems and when were they first developed?
Software solutions used by businesses to assist in the organizing of resources used by a firm. 1976.
State and explain the characteristics of Big Data: Complexity
The challenges of linking various sources of data to infer a trend.
State and explain the characteristics of Big Data: Veracity
The inaccuracies which are often found within big data.
Why are NoSQL databases good for implementing big data storage solutions?
They are designed to be scalable which helps facilitate big data storage.
T/F: Big Data is an objective term?
False.
Explain the difference between Shared Disk, Shared Memory, and Shared Nothing Architectures.
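In shared disk, each node has its own memory and only the disks are shared. In shared memory, all processors share both the memory and the disks. In shared nothing, each node has its own memory and disk, and nodes only communicate with one another over the network.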
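The shared-nothing case can be sketched as follows, assuming integer record keys and simple modulo partitioning. The Node and Cluster classes are hypothetical stand-ins for real machines, and method calls stand in for network messages.

```python
class Node:
    """One shared-nothing node: private storage, no shared memory or disk."""

    def __init__(self, name):
        self.name = name
        self.storage = {}  # stands in for this node's own disk

    def put(self, key, value):
        self.storage[key] = value

    def get(self, key):
        return self.storage.get(key)


class Cluster:
    def __init__(self, node_count):
        self.nodes = [Node(f"node-{i}") for i in range(node_count)]

    def _owner(self, key):
        # Assumption: integer keys, assigned to nodes by modulo partitioning.
        return self.nodes[key % len(self.nodes)]

    def put(self, key, value):
        # Only the owning node ever sees this record.
        self._owner(key).put(key, value)

    def get(self, key):
        return self._owner(key).get(key)


cluster = Cluster(3)
for key in range(6):
    cluster.put(key, f"record-{key}")

print(cluster.get(4))            # prints record-4
print(cluster.nodes[1].storage)  # prints {1: 'record-1', 4: 'record-4'}
```

Because no node reads another node's storage, adding nodes adds capacity without contention on a shared disk or memory bus, which is the usual argument for this architecture at Big Data scale.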
What advice would you give to someone about to venture into big data analytics?
Gather as much data as possible that is relevant to the domain being analyzed, and avoid queries that will not provide any value.
What is Hadoop and why is it used?
Hadoop is an open source software framework for distributed storage and processing of Big Data. It implements a parallel architecture that runs across many different nodes.
How is Big Data used?
It is used to draw trends and patterns from large and varied data sets.
Today's data is mostly unstructured. What does that mean when merging Big Data sets? Provide an example of an unstructured data type.
Something like Hadoop is needed to cluster the sets. Example of an unstructured data type: audio.
State and explain the characteristics of Big Data: Variability
The inconsistencies which are often found in Big Data sets.
State and explain the characteristics of Big Data: Variety
The many sources from which Big Data can be drawn.
State and explain the characteristics of Big Data: Velocity
The speed at which data is being received and processed.
State and explain the characteristics of Big Data: Volume
The vast amount of data that must be dealt with.
State a benefit and drawback to using direct-attached storage in a hyperscale computing environment.
They allow mirrors to support constant availability.
Why do traditional DMSs fail when big data is involved?
They aren't flexible enough to handle the variety and velocity of the data.
Briefly explain how big data analytics can be used in the financial industry.
They can be used to make strategic trading decisions.
Briefly explain how big data analytics can be used to benefit a business.
They can be used to predict customer behaviours and preferences.
Why is Big Data used?
Used to show relationships and dependencies between events.
What is Big Data?
Big data is a term which is used to describe any data set that is so large and complex that it is difficult to process using traditional applications.
Dirty data is defined as "unreadable data or attributes due to irrelevant data and becomes inconsistent with other data". What is one negative effect of that?
Can't merge data sets
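A minimal sketch of how inconsistent values can block a merge, assuming two toy data sets keyed by customer name. The data, the cleaning rule in clean_key, and the function names are all invented for illustration.

```python
def merge_on_key(left, right):
    # Inner join of two dictionaries on exactly matching keys.
    return {key: (left[key], right[key]) for key in left if key in right}


def clean_key(key):
    # Assumed cleaning rule: trim whitespace and ignore letter case.
    return key.strip().lower()


def clean(dataset):
    return {clean_key(key): value for key, value in dataset.items()}


orders = {"Alice ": 3, "BOB": 1}  # dirty keys: stray space, inconsistent case
emails = {"alice": "alice@example.com", "bob": "bob@example.com"}

print(merge_on_key(orders, emails))
# prints {}
print(merge_on_key(clean(orders), emails))
# prints {'alice': (3, 'alice@example.com'), 'bob': (1, 'bob@example.com')}
```

The first merge silently returns nothing because no key matches exactly; only after the keys are made consistent do the two sets line up.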
Given lots of sharing of Big Data, what is it called when network speeds are at a loss? A) Big Data Research and Development Initiative B) Time-Sensitive Network Cleaning C) Distributed Systems D) Bottleneck Networking
D) Bottleneck Networking