ISDS final- chapter 6
T/F Hadoop and MapReduce require each other to work
False
what produces big data the fastest?
RFID
Using data to understand customers/clients and business operations to sustain and foster growth and profitability is:
a) an increasingly challenging task for today's enterprises. b) is not a new technological fad, rather, it's a business priority.
Variability
data flows can be highly inconsistent with periodic peaks, making data loads hard to manage
Where does big data come from?
everywhere, but most big data is generated by machines
is there a strong case for large cities to use big data and related information technologies?
for Dublin, big data was used to ease traffic problems and better understand the traffic network
What does big data mean to Luxottica?
for Luxottica, big data includes everything they can find about their customer interactions in the form of transactions, click streams, product reviews and social media patterns
critical success factors for big data analytics
1. clear business need (alignment with the vision and the strategy) 2. strong, committed sponsorship (executive champion) 3. alignment between the business and IT strategy 4. a fact based decision making culture 5. a strong data infrastructure 6. the right analytics tools 7. right people with the right skills
Skills that define a data scientist
1. domain expertise, problem definition and decision modeling 2. data access and management (traditional and new data systems) 3. programming, scripting and hacking 4. internet and social media/social networking technologies 5. curiosity and creativity 6. communication and interpersonal skills
challenges of big data analytics
1. effectively and efficiently capturing, storing and analyzing big data 2. new breed of technologies needed (developed or purchased or hired or outsourced) 3. data volume 4. data integration 5. processing capabilities 6. data governance 7. skill availability (data scientists are in short supply) 8. solution cost (ROI)
know the inputs to the analytics system
1. market research 2. social media 3. census data 4. election databases
What should companies do to succeed with big data?
1. simplify 2. coexist 3. visualize 4. empower 5. integrate 6. govern 7. evangelize
know the analytic system outputs or goals
1. voter mobilization 2. organize movements 3. increase # of volunteers 4. raise money contributions
Variety
80-85% of all organizations' data is in some sort of unstructured or semi-structured format. data formats range from traditional databases to hierarchical data stores.
What is CERN and why is it important to the world of science?
CERN is the European Organization for Nuclear Research. it plays a leading role in fundamental studies of physics and has been instrumental in many key global innovations and breakthrough studies in theoretical physics. it operates the world's largest particle physics laboratory, located near Geneva, Switzerland
why did eBay need a big data solution?
eBay is the world's largest online marketplace and requires the ability to turn the enormous volumes of data it generates into useful insight for customers.
T/F Big data simplifies data governance issues, especially for global firms
False
What were the results obtained by Luxottica?
Luxottica did not outsource their data storage and promotional campaign development and management, nor did they merge with companies in Asia.
In the world of Big Data, ________ aids organizations in processing and analyzing large volumes of multi-structured data. Examples include indexing and search, graph analysis, etc.
MapReduce
What are the big data technologies?
MapReduce, Hadoop, NoSQL
HBase, Cassandra, MongoDB, and Accumulo are examples of ________ databases.
NoSQL
Turning Machine-Generated Streaming Data into Valuable Business Insights
The company uses stream analytics to boost customer satisfaction and competitive advantage. The company chose to work with Splunk, one of the leading analytics service providers in the area of turning machine-generated streaming data into valuable insights. The results were beneficial in the areas of application troubleshooting, operations, compliance, and security.
A case in the energy industry for stream analytics?
a classic smart grid application for the electric power supply chain
Business investments ought to be made for the good of the business, not for the sake of mere technology advancements. Therefore the main driver for Big Data analytics should be an alignment with the vision and the strategy at any level: strategic, tactical, and operational. Which of the critical success factors for Big Data analytics is being described?
a clear business need
In ____?______, the numbers rather than intuition, gut feeling, or supposition drive decision making. There is also a culture of experimentation to see what works and doesn't. To create ____?____, senior management needs to do the following: recognize that some people can't or won't adjust; be a vocal supporter; stress that outdated methods must be discontinued; ask to see what analytics went into decisions; link incentives and compensation to desired behaviors
a fact based decision making culture
It is a well-known fact that if you don't have committed executive backing, it is difficult (if not impossible) to succeed. If the scope is a single or a few analytical applications, the support can be at the departmental level. However, if the target is enterprise-wide organizational transformation, which is often the case for Big Data initiatives, _____________________ needs to be at the highest levels and organization-wide. Which Critical Success Factor for Big Data Analytics best fills the blank in the previous sentence?
strong, committed sponsorship
data volume
ability to capture, store and process the huge volume of data in a timely manner
data integration
ability to combine data quickly and at a reasonable cost
what is the goal of MapReduce?
achieving high performance with "simple" computers
Stream analytics
also called data-in-motion analytics and real-time analytics. the analytic process of extracting actionable information from continuously flowing/streaming data. - relates to one of the V's of big data: velocity
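A minimal sketch of the data-in-motion idea: each reading is analyzed as it arrives, and only a small window of recent values is kept rather than the full history. The function name `rolling_mean`, the window size and the sample readings are illustrative assumptions, not taken from the chapter.

```python
from collections import deque
from typing import Iterable, Iterator


def rolling_mean(stream: Iterable[float], window: int) -> Iterator[float]:
    """Yield the mean of the most recent `window` readings as each one arrives."""
    recent = deque(maxlen=window)  # the oldest reading is dropped automatically
    total = 0.0
    for reading in stream:
        if len(recent) == window:
            total -= recent[0]  # subtract the value about to be evicted
        recent.append(reading)
        total += reading
        yield total / len(recent)


# Hypothetical sensor readings; a real stream would never be held in a list.
readings = [10.0, 20.0, 30.0, 40.0]
averages = list(rolling_mean(readings, window=2))
print(averages)
```

Because the function is a generator, it produces an answer per reading without waiting for the stream to end, which is the property that separates stream analytics from analysis of stored data.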
In-motion ________ is often overlooked today in the world of BI and Big Data.
analytics
In the eBay use case study, load ________ helped the company meet its Big Data needs with the extremely fast data handling and application availability requirements.
balancing
How can big data benefit large-scale trading bank?
big data can handle the high volume, high variability and continuously streaming data that trading banks need to deal with
MapReduce + Hadoop=
big data core technology
Data scientist
big data guru, one with skills to investigate big data. very high salaries, very high expectations
How can big data help ease traffic in large cities?
by integrating geospatial data from buses into a central geographic information system you can create a digital map of the city. Then, using the dashboard screen, operators can drill down to see the number of buses that are on time or delayed. users can produce detailed reports on areas frequently delayed and take prompt action to ease congestion
What were the challenges, solutions and results for the investment bank?
the challenge was that the bank was not fast enough to respond to growing business needs and requirements. big data offered the scalability to address the problem. the major benefit was providing real-time access to trading data. the bank achieved a single version of the truth.
what were the challenges, solutions and results for eBay?
eBay needed a solution to perform rapid analysis on a broad assortment of structured and unstructured data. the solution did NOT integrate into a single big data center infrastructure. eBay can now more cost effectively process massive amounts of data at very high speeds.
stream analytics applications
e-commerce, telecommunications, law enforcement and cyber security, power industry, financial services, health services (biggest potential source of big data comes from patient monitoring), government
perpetual analytics
evaluates every incoming observation against all prior observations. in the context of intelligent systems, recognizing how the new observation relates to all prior observations enables the discovery of real-time insights.
open source
hundreds of contributors continuously improve the core technology
________ speeds time to insights and enables better data governance by performing data integration and analytic functions inside the database.
in-database analytics
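A rough illustration of in-database analytics: the aggregation runs inside the database engine as SQL, so only the small result leaves the database instead of every raw row. This sketch uses Python's built-in `sqlite3` module as a stand-in for an enterprise database; the `sales` table, its columns and the sample rows are assumptions made up for the example. The `":memory:"` connection only keeps the sketch self-contained and is unrelated to in-memory analytics.

```python
import sqlite3
from typing import Dict, Iterable, Tuple


def region_totals(rows: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Load rows into a database, then let the database compute totals per region."""
    conn = sqlite3.connect(":memory:")  # throwaway database for the sketch
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        # The SUM and GROUP BY run inside the database engine;
        # Python only receives one row per region.
        cursor = conn.execute(
            "SELECT region, SUM(amount) FROM sales GROUP BY region"
        )
        return dict(cursor.fetchall())
    finally:
        conn.close()


totals = region_totals([("east", 100.0), ("east", 50.0), ("west", 70.0)])
print(totals)
```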
Allowing big data to be processed in memory and distributed across a dedicated set of nodes can solve complex problems in near real time. this process is called
in-memory analytics
What are some example tasks of MapReduce?
indexing the web for search, graph analysis, text analysis, machine learning
Why stream analytics?
it may not be feasible to store the data, or the data may lose its value
challenges for Dublin city council?
the major problem was the difficulty in getting a good picture of traffic in the city from a high-level perspective. the big data solution gave operators the ability to see the system as a whole instead of just individual corridors.
What does big data mean traditionally?
massive amounts of data
Do you think big data analytics could change the outcome of an election?
it may well have in 2008 and 2012. many agree that Democrats clearly had the advantage in utilizing big data
critical event processing
method of capturing, tracking and analyzing streams of data to detect events (out of normal happenings) of certain types that are worthy of the effort
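One simple way to picture critical event processing is a filter that watches a stream and reports only the out-of-normal readings. The sketch below treats "normal" as a fixed range; the function name `detect_events`, the bounds and the readings are hypothetical, and real systems typically use richer rules than a single threshold.

```python
from typing import Iterable, Iterator, Tuple


def detect_events(
    stream: Iterable[float], low: float, high: float
) -> Iterator[Tuple[int, float]]:
    """Yield (position, value) for each reading outside the normal range."""
    for position, value in enumerate(stream):
        if value < low or value > high:
            yield position, value  # an out-of-normal happening worth reporting


# Hypothetical readings with two abnormal values.
events = list(detect_events([50, 52, 95, 51, 12], low=40, high=60))
print(events)
```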
volume
most common trait of big data. factors of the exponential increase in data volume are: transaction based data stored through the years, text data from social media and increasing amounts of sensor data being collected.
petabyte (PB)
newly popular unit of data in the big data era which is 10^15 bytes
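For a sense of scale, the arithmetic behind 10^15 bytes can be checked directly. The constant names and the 1 TB drive comparison are illustrative choices, using decimal (power-of-ten) units throughout.

```python
GIGABYTE = 10**9   # bytes
TERABYTE = 10**12  # bytes
PETABYTE = 10**15  # bytes


def to_petabytes(n_bytes: int) -> float:
    """Convert a byte count to petabytes (decimal units)."""
    return n_bytes / PETABYTE


terabytes_per_petabyte = PETABYTE // TERABYTE  # 1 TB drives needed to hold 1 PB
gigabytes_per_petabyte = PETABYTE // GIGABYTE
print(terabytes_per_petabyte, gigabytes_per_petabyte)
```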
NoSQL
not only SQL, a new style of database to store and process large volumes of unstructured, semi-structured and multi-structured data. can handle big data better than traditional relational database technology.
Hadoop
open source framework for storing and analyzing massive amounts of distributed, unstructured data. originally created by Doug Cutting at Yahoo. breaks up big data into multiple parts so each part can be processed and analyzed at the same time on multiple computers.
What is Big Data?
popular term for exponential growth, availability and use of information, both structured and unstructured. - relative term, "big" depends on organization size. - big data by itself, regardless of its size, type, or speed is worthless
grid computing
promotes efficiency, lower cost and better performance by processing jobs in a shared, centrally managed pool of IT resources
Velocity
refers to both how fast data is being produced and how fast the data must be processed (captured, stored and analyzed) to meet need/demand.
Veracity
refers to the conformity to facts: accuracy, quality, truthfulness, or trustworthiness of big data
data governance
security, privacy, access
MapReduce
technique popularized by Google that distributes the processing of very large multi-structured data files across a large cluster of ordinary machines/computer processors - good at processing and analyzing large volumes of multi-structured data in a timely manner
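The map, group and reduce steps can be sketched on a single machine with a word count, the usual introductory example. This is a toy illustration only: a real MapReduce job spreads the same steps across a cluster, and the function names and sample documents here are assumptions for the sketch.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(document: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine the values for one key into a single count."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    # On a cluster, each document (or file split) would be mapped on a different machine.
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data big value", "data in motion"])
print(counts)
```

Each call to `map_phase` depends only on its own document, and each call to `reduce_phase` only on its own key, which is why the work can be divided among many ordinary machines.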
processing capabilities
the ability to process the data quickly as it is captured (i.e. stream analytics)
What were the main challenges for Luxottica?
there was a disconnect between data analytics and marketing execution. the technique the company uses to gain visibility into its customers is data integration.
value proposition
this characteristic of big data is its potential to contain more useful patterns and interesting anomalies than small data. - with the value proposition, big data also brought big challenges
Why is there a need for big data?
traditional data warehouses have not been able to keep up with the variety and complexity of data, so a new breed of technologies is needed to take on big data (developed or purchased or hired or outsourced)
T/F Hadoop was designed to handle petabytes and exabytes of data distributed over multiple nodes in parallel
true
T/F: many analytics tools are too complex for the average user and this is one justification for big data
true
Big data + "big" analytics=
value
The ________ of Big Data is its potential to contain more useful patterns and interesting anomalies than "small" data.
value proposition
Data flows can be highly inconsistent with periodic peaks making loads hard to manage. which V is this?
variability
refers to the conformity to facts: accuracy, quality, truthfulness, or trustworthiness of the data.
veracity
What are the 3 main V's?
volume, variety, velocity
6 v's that characterize/define big data
volume, variety, velocity, veracity, variability, value proposition
What is the role of analytics and big data in modern day politics?
volume, variety and velocity readily apply to the kind of data used for political campaigns. big data analytics can help predict election outcomes as well as target potential voters and donors, and has become a critical part of political campaigns.