Big Data


Describe at least three sources of Big Data.

Archives, Machine logs, Public Web, Sensor Data, Social Media

Handling Big Data can be problematic because certain attributes are private. Which of the following attributes are deemed 'private': A) Salary Records B) Work Experience C) Student Records D) Personal Contact Information

C) Student Records

When was MapReduce developed and what purpose did it serve?

Introduced by Google in a 2004 paper. It is a programming model that breaks the input into small chunks and processes them in parallel across a distributed cluster of machines.
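To make the chunk-and-combine idea concrete, here is a minimal single-machine sketch of a MapReduce-style word count. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative assumptions, not part of any real framework's API, and a real system would run the map and reduce steps on many machines.

```python
from collections import defaultdict


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between the phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    # Each chunk could be mapped on a different node; here we loop locally.
    pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical input, already split into chunks.
chunks = ["big data big", "data sets"]
counts = word_count(chunks)
print(counts)
```

Because each chunk is mapped independently and each key is reduced independently, both phases can be spread over many nodes without coordination beyond the shuffle step.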

Why are E-R Models not scalable with Big Data?

E-R tables in SQL take longer to search for relations than clustering.

T/F: Direct-attached hyperscale computer environments include shared storage.

False

T/F: The big data itself can provide information on the domain it represents.

False

List a few major open source technologies used to manipulate big data.

Hadoop, MongoDB, CouchDB, Cassandra

Provide a real world example of Big Data.

Obama's 2012 reelection campaign.

Describe some benefits of using a column-oriented database for storing big data.

Reduces the computation required for queries.
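One way to see why a column layout can reduce query work is to compare the two layouts on a single-column aggregate. This is a toy in-memory sketch with made-up data and a hypothetical helper `to_columns`, not how any particular database stores data.

```python
# Row-oriented layout: each record keeps all of its attributes together.
rows = [
    {"id": 1, "region": "east", "sales": 120},
    {"id": 2, "region": "west", "sales": 80},
    {"id": 3, "region": "east", "sales": 200},
]


def to_columns(records):
    """Pivot a list of records into one list of values per attribute."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row store: the query walks every full record to pick out one field.
total_row_store = sum(record["sales"] for record in rows)

# Column store: the query reads only the "sales" column.
total_column_store = sum(columns["sales"])

print(total_row_store, total_column_store)
```

Both queries return the same total, but the column version never touches `id` or `region`, which is the saving that matters when a table has many wide columns.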

State the four main architectures of parallel databases.

Shared memory, shared disk, shared nothing, hierarchical (hybrid)

What are Enterprise Resource Planning Systems and when were they first developed?

Software solutions used by businesses to assist in the organizing of resources used by a firm. 1976.

State and explain the characteristics of Big Data: Complexity

The challenges of linking various sources of data to infer a trend.

State and explain the characteristics of Big Data: Veracity

The inaccuracies which are often found within big data.

Why are NoSQL databases good for implementing big data storage solutions?

They are designed to be scalable which helps facilitate big data storage.

T/F: Big Data is an objective term.

False.

Explain the difference between Shared Disk, Shared Memory, and Shared Nothing Architectures.

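In shared disk, each processor has its own memory but all processors share the disks. In shared memory, the processors share both memory and disks. In shared nothing, each node has its own memory and disk, and nodes only communicate with one another over the network.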
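The shared-nothing case can be sketched as hash partitioning: each key is owned by exactly one node, and no node reads another node's storage. The `Cluster` class and `node_for` function below are hypothetical names for illustration, and plain dictionaries stand in for each node's private disk.

```python
import zlib


def node_for(key, num_nodes):
    """Pick the owning node for a key with a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


class Cluster:
    """Toy shared-nothing cluster: one private store per node."""

    def __init__(self, num_nodes):
        self.nodes = [{} for _ in range(num_nodes)]

    def put(self, key, value):
        # Only the owning node's private store is written.
        self.nodes[node_for(key, len(self.nodes))][key] = value

    def get(self, key):
        # The request is routed to the single node that owns the key.
        return self.nodes[node_for(key, len(self.nodes))].get(key)


cluster = Cluster(num_nodes=3)
cluster.put("user:1", "Ada")
cluster.put("user:2", "Grace")
cluster.put("user:3", "Edsger")
print(cluster.get("user:2"))
```

Since nodes hold disjoint data, capacity grows by adding nodes, which is a common reason given for preferring shared nothing at large scale.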

What advice would you give to someone about to venture into big data analytics?

Gather as much data as possible that is relevant to the domain being analyzed, and avoid queries that will not provide any value.

What is Hadoop and why is it used?

Hadoop is an open source software product for distributed storage and processing of Big Data. It implements a parallel database architecture running across many different nodes.

How is Big Data used?

It is used to draw trends and patterns from large and varied data sets.

Today's data is mostly unstructured. What does that mean when merging Big Data sets? Provide an example of an unstructured data type.

Something like Hadoop is needed to cluster the sets. Example of an unstructured data type: audio.

State and explain the characteristics of Big Data: Variability

The inconsistencies which are often found in Big Data sets.

State and explain the characteristics of Big Data: Variety

The many sources from which Big Data can be drawn.

State and explain the characteristics of Big Data: Velocity

The speed at which data is being received and processed.

State and explain the characteristics of Big Data: Volume

The vast amount of data that must be dealt with.

State a benefit and drawback to using direct-attached storage in a hyperscale computing environment.

They allow mirrors to support constant availability.

Why do traditional DMS's fail when big data is involved?

They aren't flexible enough to handle the variety and velocity of the data.

Briefly explain how big data analytics can be used in the financial industry.

They can be used to make strategic trading decisions.

Briefly explain how big data analytics can be used to benefit a business.

They can be used to predict customer behaviours and preferences.

Why is Big Data used?

Used to show relationships and dependencies between events.

What is Big Data?

Big data is a term which is used to describe any data set that is so large and complex that it is difficult to process using traditional applications.

Dirty data is defined as "unreadable data or attributes due to irrelevant data and becomes inconsistent with other data". What is one negative effect of that?

Can't merge data sets
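As a small illustration of why inconsistent values block a merge, the sketch below joins two made-up data sets on an email key. The data and the `join` helper are hypothetical; the point is only that a key with stray whitespace and different casing fails to match until it is cleaned.

```python
# Hypothetical data sets that should join on email address.
customers = {"ann@example.com": "Ann", "bob@example.com": "Bob"}
orders = [(" Ann@Example.com ", 30), ("bob@example.com", 45)]


def join(customers, orders, clean=False):
    """Match each order to a customer name, optionally normalizing the key."""
    matched = []
    for email, amount in orders:
        key = email.strip().lower() if clean else email
        if key in customers:
            matched.append((customers[key], amount))
    return matched


dirty_join = join(customers, orders)              # Ann's order is lost
clean_join = join(customers, orders, clean=True)  # both orders match

print(dirty_join)
print(clean_join)
```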

Given lots of sharing of Big Data, what is it called when network speeds are at a loss? A) Big Data Research and Development Initiative B) Time-Sensitive Network Cleaning C) Distributed Systems D) Bottleneck Networking

D) Bottleneck Networking

