ISDS Test 5- Chapter 6

What is NoSQL? How does it fit into the Big Data analytics picture?

"Not Only SQL" is a new style of database for processing large volumes of multi-structured data. NoSQL databases are mostly aimed at serving up discrete data stored among large volumes of multi-structured data to end-user and automated Big Data application. NoSQL databases trade ACID compliance for performance and scalability.

What are the common business problems addressed by using Big Data analytics?

- Process efficiency and cost reduction
- Brand management
- Revenue maximization, cross-selling, and up-selling
- Enhanced customer experience
- Churn identification, customer recruiting
- Improved customer service
- Identifying new products and market opportunities
- Risk management
- Regulatory compliance
- Enhanced security capabilities

What does ACID stand for?

Atomicity, consistency, isolation, durability
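
As a rough illustration of atomicity (the "A" in ACID), the sketch below uses Python's built-in `sqlite3` module. The `accounts` table, the names, and the `transfer` helper are hypothetical and exist only for this example. A transfer that would overdraw an account fails partway through, and the whole transaction is rolled back.

```python
import sqlite3

# Hypothetical two-account ledger in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing transaction."""
    try:
        # The connection context manager commits on success and rolls back
        # every statement in the block if an exception is raised.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance.
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))
```

Calling `transfer(conn, "alice", "bob", 500)` credits bob first and then fails on the debit. Because both updates belong to one transaction, the credit is undone as well and the balances stay unchanged.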

Function of Secondary Nodes

Serve as backups for the Name Nodes.

What were Luxottica's main challenges?

Because Luxottica outsourced both data storage and promotional campaign development and management, there was a disconnect between data analytics and marketing execution. The company lacked an individualized view of its customers and could not act decisively and consistently. It used data integration to gain visibility into its customers.

What is Big Data Analytics and how does it differ from regular analytics?

Big Data analytics is analytics applied to Big Data architectures. In order to keep up with the needs of Big Data, high-performance computing has been developed. This includes in-memory analytics, in-database analytics, grid computing, and appliances. This differs from regular analytics, which mainly focuses on relational database technologies.

How can Big Data benefit large-scale trading banks?

Big Data can potentially handle the high-volume, high-variability, continuously streaming data that trading banks need to deal with. Traditional relational databases are often unable to keep up with the data demand.

What is Big Data? Why is it difficult to define?

Big Data means different things to different people. Traditionally, it has been used to describe the massive volumes of data analyzed by large organizations (e.g., Google and NASA). Big Data includes both structured and unstructured data, and it is not just about volume; it is also about variety, velocity, veracity, and value proposition.

How can big data analytics help ease the traffic problem in large cities?

Big Data helps identify traffic congestion in its early stages. By integrating geospatial data from buses and data on bus timetables into a central geographic information system, you can create a digital map of the city. Operators can drill down to see the number of buses that are on time or delayed on each route. Users can produce detailed reports on areas of the network where buses are frequently delayed and take prompt action to ease congestion. Data and analytics can also assist with future planning of roads, infrastructure, and public transportation.

How can stream analytics be used in e-commerce?

Companies such as Amazon and eBay use stream analytics to analyze customer behavior in real time (every page visit, product looked at, search conducted, and click).

What are the critical success factors for Big Data analytics?

Critical factors include a clear business need, strong and committed sponsorship, alignment between the business and IT strategies, a fact-based decision culture, a strong data infrastructure, the right analytics tools, and personnel with advanced analytic skills.

Why is Big Data important? What has changed to put it in the center of the analytic world?

As more and more data becomes available, timely processing of that data with traditional means becomes impractical. Big Data brought exponential growth, availability, and use of structured and unstructured information to the analytic world.

What is the role of analytics and Big Data in modern day politics?

Elections are suitable arenas for Big Data because they are unpredictable and exhibit the three Vs: volume, variety, and velocity. Big Data analytics can help predict election outcomes and target potential voters and donors.

In what scenarios can Hadoop and RDBMS coexist?

Example: You can use Hadoop for storing and archiving multi-structured data. This data is then extracted from Hadoop and analyzed by the relational DBMS. Hadoop can also be used to filter, transform, and analyze multi-structured data.

In what other industries and application areas is stream analytics used?

Example: News industry: being able to rapidly sift through data to recognize "newsworthy" events. Weather: predict tornadoes and other natural disasters.

What are the most fruitful industries for stream analytics?

Examples: e-commerce, telecommunications, law enforcement, cyber security, the power industry, health sciences, and the government

What are the common characteristics of data scientists and which is most important?

Expertise in both technical and business application domains is the most sought-after characteristic. Data scientists also need soft skills (creativity, communication skills, etc.) and sound technical skills (data manipulation, programming).

Should large cities use Big Data Analytics and related information technologies?

In the Dublin case, Big Data analytics was used primarily to ease traffic problems and better understand the traffic network. Other departments that could benefit from Big Data include social services, tax collection, sanitation services, crime prevention, and police and fire departments.

What is Hadoop? How does it work?

Hadoop is an open-source framework for processing, storing, and analyzing massive amounts of unstructured data. It handles petabytes and exabytes of data distributed over multiple nodes in parallel, typically commodity machines connected via the Internet. It uses the MapReduce framework to implement distributed parallelism.

What are the use cases for Big Data and Hadoop?

Hadoop is differentiated in two ways: first, as the repository and refinery of raw data, and second, as an active archive of historical data. Its distributed file system and flexibility of data formats are advantageous when working with information from the Web, social media, multimedia, and text. It can handle huge volumes of data at a low cost, and historical data can be managed easily with this approach.

What is the role of visual analytics in the world of Big Data?

It helps organizations uncover trends, relationships, and anomalies by visually sifting through large quantities of data. To be successful, a visual analytics application must allow for the coexistence and integration of relational and multi-structured data.

What is critical event processing? How does it relate to stream analytics?

It is a method of capturing, tracking, and analyzing streams of data to detect special events. It involves combining data from multiple sources to infer events or patterns of interest. An event (also known as a "change of state") may be detected as a measurement exceeding a predefined threshold of time, temperature, or some other value.
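
The threshold idea can be sketched in a few lines of Python. This is only an illustration: the `Reading` record, the `detect_events` function, and the rule "at least two distinct sources over the threshold within a short window" are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    source: str      # hypothetical sensor name
    timestamp: int   # seconds since the start of the stream
    value: float     # e.g. a temperature measurement


def detect_events(readings, threshold, min_sources=2, window=10):
    """Yield (timestamp, sources) whenever at least `min_sources` distinct
    sources exceeded `threshold` within the last `window` seconds.

    Readings are assumed to arrive in time order.
    """
    recent = {}  # source -> timestamp of its latest over-threshold reading
    for r in readings:
        over = r.value > threshold
        if over:
            recent[r.source] = r.timestamp
        # Forget over-threshold readings that fell out of the window.
        recent = {s: t for s, t in recent.items() if r.timestamp - t <= window}
        if over and len(recent) >= min_sources:
            yield r.timestamp, sorted(recent)
```

With a threshold of 90, a reading of 95 from sensor `b` at t=2 followed by a reading of 96 from sensor `a` at t=5 produces one event at t=5. A lone spike from `b` at t=30 does not, because the earlier reading from `a` has aged out of the window.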

What is MapReduce? What does it do and how?

It is a programming model, popularized by Google, that allows the processing of large-scale data analysis problems to be distributed and parallelized. It distributes the processing of very large multi-structured data files across a large cluster of machines. High performance is achieved by breaking the processing into small units that can be run in parallel across hundreds, potentially thousands, of nodes. The "map" function breaks a problem into sub-problems. The "reduce" function merges the results from each of these nodes into a final result.
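
A minimal single-machine sketch of the model, using the classic word-count problem, may make the two functions concrete. It is not Hadoop code. The function names are chosen for this example, and the `shuffle` step stands in for the grouping that a real framework performs between the map and reduce phases across nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(mapped_pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: merge all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of documents."""
    # In a real cluster each document (split) would be mapped on a different node.
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```

Because each call to `map_phase` depends only on its own document and each call to `reduce_phase` only on its own key, both phases can be spread across many machines without coordination.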

What is a stream?

It is an unbounded flow or sequence of data elements arriving continuously at a high velocity. They often cannot be efficiently or effectively stored for subsequent processing.

What is CERN? Why is it important to science?

It is the European Organization for Nuclear Research and plays a leading role in the study of physics. It operates the world's largest particle physics laboratory, home to the Large Hadron Collider (LHC). It lies near Geneva, Switzerland.

What is stream analytics? How does it differ from regular analytics?

It is the process of extracting actionable information from continuously flowing/streaming data. It differs because it deals with high velocity data streams instead of more permanent data stores like databases, files, or web pages.

What is Data Stream Mining? What are its challenges?

It is the process of extracting novel patterns and knowledge structures from continuous, rapid data records. Processing data streams is more challenging than processing permanent data stores: a data stream is a continuous flow of an ordered sequence of instances that can be read only once and must be processed immediately as they arrive.
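
The "read once, process immediately" constraint can be illustrated with a running-statistics sketch in Python. It uses Welford's online algorithm, chosen here as an example technique rather than taken from the source, to maintain a mean and variance without storing the stream. The class name `RunningStats` is hypothetical.

```python
class RunningStats:
    """Single-pass mean and sample variance (Welford's online algorithm)."""

    def __init__(self):
        self.n = 0        # number of elements seen so far
        self.mean = 0.0   # running mean
        self._m2 = 0.0    # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one stream element into the statistics, then discard it."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance of everything seen so far (0.0 for fewer than 2 items)."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


def consume(stream):
    """Process an iterable exactly once, keeping only constant-size state."""
    stats = RunningStats()
    for x in stream:
        stats.update(x)
    return stats
```

The state is three numbers regardless of how many elements arrive, which is the kind of property a stream-mining method needs when the data cannot be stored for a second pass.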

Could Big Data analytics change the outcome of an election?

It may have changed the outcome of the 2008 and 2012 elections. Democrats clearly had a competitive advantage over Republicans in utilizing Big Data and analytics. The usage and expertise gap between the parties may disappear over time, evening the playing field.

What were the proposed solution and the obtained results?

Luxottica deployed the Customer Intelligence Appliance (CIA) from IBM Business Partner Aginity LLC. The appliance helped Luxottica segment customer behavior and provide a platform and smart database for marketing execution systems, such as campaign management, email services, and direct marketing. Luxottica did not outsource their data storage and promotional campaign development and management, nor did they merge with companies in Asia.

What are the main Hadoop components?

Major components are the HDFS, a Job Tracker operating on the master node, Name Nodes, Secondary Nodes, and Slave Nodes.

What are the inputs to the analytic system?

Market research, social media, census data, and election databases

Where do data scientists come from?

A Master of Science (or Ph.D.) in Computer Science, MIS, or Industrial Engineering is most common.

How do you think the Big Data vendor landscape will change in the near future?

More traditional data vendors will incorporate Big Data into their architectures.

Examples of data streams

Sensor data, computer network traffic, phone conversations, ATM transactions, web searches, and financial data

Function of Slave Nodes

Store data and take direction to process it from the job tracker.

What were the challenges, solution, and results in the case "Top 5 Investment Bank Achieves Single Source of the Truth"?

The bank's legacy system was built on relational database technology. As volume and variability increased, the legacy system was not fast enough and was unable to deliver real-time alerts to manage market and counterparty credit positions. Big Data offered scalability and benefits including a new alert feature, less downtime for maintenance, a faster capacity to process complex changes, and reduced costs. The major benefit was providing real-time access to trading data. The system was also unified.

What will the future of Big Data be like? Will it lose its popularity to something else? If so, what?

The buzzword "Big Data" might change to something else, but the trend toward increased computing capabilities, analytics, methodologies, and data management of high volume information will continue.

Function of HDFS

The default storage layer in any given Hadoop cluster.

What is the Hadoop Distributed File System (HDFS)?

The default storage layer in any given Hadoop cluster. A file organization system adept at storing large volumes of unstructured and semistructured data. It is an alternative to the traditional tables/rows/columns structure of a relational database. Data is replicated across multiple nodes, allowing for fault tolerance in the system.
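
The link between replication and fault tolerance can be shown with a toy placement sketch in Python. It is a simplification under stated assumptions: the round-robin placement, the node names, and both function names are made up for illustration and do not reflect HDFS's actual placement policy. A replication factor of 3 matches the usual HDFS default.

```python
def place_blocks(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes (toy round-robin)."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    return {
        block: [nodes[(block + i) % len(nodes)] for i in range(replication)]
        for block in range(num_blocks)
    }


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one replica on a live node."""
    return [
        block
        for block, replicas in placement.items()
        if any(node not in failed_nodes for node in replicas)
    ]
```

With four nodes and three replicas per block, any two nodes can fail and every block remains readable. Data is lost only when all nodes holding a given block fail together.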

What were the challenges Dublin City was facing; proposed solution, initial results, and future plans?

The major problem was the difficulty in getting a good picture of traffic in the city from a high-level perspective. The proposed solution was to team up with IBM Research, and especially their Smarter Cities Technologies Centre. IBM researchers created a digital map of the city, with real-time positions of Dublin's 1,000 buses. This gave operators the ability to see the system as a whole instead of just individual corridors. The managers could now answer questions such as "Are the bus lane start times correct?" and "Where do we need to add additional bus lanes and bus-only traffic signals?" The Dublin City Council and IBM plan to enhance the system by incorporating meteorological data, under-road sensors, and bicycle-usage data into their predictive analytics.

Function of Name Nodes

The node in a Hadoop cluster that tells the client where in the cluster particular data is stored and whether any nodes have failed.

Function of Job Tracker

The node of a Hadoop cluster that initiates and coordinates MapReduce jobs or the processing of the data.

Case 6.7: Turning Machine-Generated Streaming Data into Valuable Business Insights

The use of stream analytics via dashboards has helped to improve the effectiveness of the company's threat assessments and security monitoring. The company uses stream analytics to boost customer satisfaction and competitive advantage. It selected Splunk, one of the leading analytics service providers in the area of turning machine-generated streaming data into valuable insights, which provided beneficial results in the areas of application troubleshooting, operations, compliance, and security.

What are the challenges facing data warehousing and Big Data?

The variety and complexity of data make many data warehouses incapable of keeping up. Variety and velocity forced the IT world to develop "Big Data," but this does not mean the end of data warehousing.

What is special about the Big Data vendor landscape? Who are the big players?

The vendor landscape is developing rapidly, and entrepreneurial startup firms are bringing innovative solutions to the marketplace. Hadoop: Cloudera (leader), MapR, Hortonworks. NoSQL: DataStax, Informatica, Pervasive Software, Syncsort, and MicroStrategy. Data warehouse leaders: Netezza, Greenplum, Vertica, and Aster Data. Mega vendors: Oracle and IBM. Other: EMC, HP, Teradata.

What are the common characteristics of emerging Big Data technologies?

They take advantage of commodity hardware to enable scale-out, parallel processing techniques; employ nonrelational data storage capabilities in order to process unstructured/semistructured data; and apply advanced analytics and data visualization technology to convey insights to end users.

What is a data scientist and why are they in demand?

They use a combination of their business and technical skills to investigate Big Data, looking to improve predictive and prescriptive business analytics practices. They investigate and look for new possibilities, while a BI user analyzes existing business situations and operations.

What are the use cases for data warehousing and RDBMS?

Three Use Cases: performance, integration, and availability of a variety of BI tools.

What does "big data" mean to Luxottica?

To them, Big Data includes everything they can find about their customer interactions. This is a source of BI for potential product, marketing, and sales opportunities.

What are the motivations for stream analytics?

Traditional analytics approaches often either arrive at the wrong decisions because they use too much out-of-context data, or arrive at the correct decisions but too late to be of any use. It is critical to be able to analyze data quickly.

What are the big challenges when considering implementation of Big Data analytics?

Traditional ways of capturing, storing, and analyzing data are not sufficient for Big Data. Major challenges are the vast volume of data, the need for data integration to combine data of different structures in a cost-effective manner, the need to process data quickly, data governance issues, skill availability, and solution costs.

Which "V" is most important when defining Big Data?

Value proposition is most important for decision makers, because Big Data has the potential to contain more patterns and anomalies than small data. By analyzing large, feature-rich data, organizations gain greater business value.

What are the system outputs or goals?

Mobilize voters, organize movements, increase the number of volunteers, and raise money contributions

Why did eBay need a Big Data solution?

eBay is the world's largest online marketplace, and it needs the ability to turn the volumes of data it generates into useful insights for customers.

What were the challenges, the proposed solution, and obtained results for eBay's Big Data Solution?

eBay was experiencing explosive data growth and needed a solution that did not have the scalability issues and transactional constraints associated with common relational database approaches. It needed a solution to perform rapid analysis on a broad assortment of the structured and unstructured data it captured. The solution did NOT integrate into a single Big Data Center infrastructure. Now, eBay can more cost-effectively process massive amounts of data at very high speeds. The new architecture's reliability and fault tolerance have been greatly enhanced.

