Chapter 5 Study Guide
Big Data
A collection of data so large and complex that it is difficult to manage using traditional database management systems.
Data Cube:
A common representation for the multidimensional structure in which data warehouses and data marts store data.
ex of a foreign key
A foreign key is used to establish and enforce a link between two tables.
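A minimal sketch of a foreign key, using Python's built-in sqlite3 module. The table and column names (major, student, major_id) are hypothetical and chosen only for illustration. Note that SQLite enforces foreign keys only after the foreign_keys pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default

# Parent table: each major has a unique identifier.
conn.execute("CREATE TABLE major (major_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# Child table: student.major_id is a foreign key that links to major.major_id.
conn.execute(
    """CREATE TABLE student (
           student_id INTEGER PRIMARY KEY,
           name TEXT NOT NULL,
           major_id INTEGER REFERENCES major(major_id)
       )"""
)

conn.execute("INSERT INTO major VALUES (1, 'Accounting')")
conn.execute("INSERT INTO student VALUES (100, 'Pat', 1)")  # major 1 exists, so this is accepted

try:
    # Major 99 does not exist, so the link cannot be established.
    conn.execute("INSERT INTO student VALUES (101, 'Lee', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

student_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```

The second insert fails because the database enforces the link: a student row may only point at a major that actually exists.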
Managing data in organizations is difficult for many reasons. Reason 1
The amount of data is increasing exponentially with time. Much historical data must be kept for a long time, and new data are added rapidly. For example, to support millions of customers, large retailers such as Walmart must manage many petabytes of data. (A petabyte is approximately 1,000 terabytes, and a terabyte is approximately one trillion bytes.)
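The unit arithmetic can be checked with a short sketch. It assumes decimal (SI) units, where a terabyte is one trillion bytes; the constant names are illustrative.

```python
# Assumption: decimal (SI) units, so 1 terabyte = 10**12 bytes.
TERABYTE = 10**12
PETABYTE = 1_000 * TERABYTE  # approximately 1,000 terabytes

# Format with thousands separators to make the size easier to read.
petabyte_in_bytes = f"{PETABYTE:,}"
```

Under that assumption one petabyte works out to a quadrillion bytes, that is, a thousand trillions.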
Using databases eliminates many problems that arose from previous methods of storing and accessing data, such as file management systems. T/F
True
A(n) ____ generally describes a(n) ____
record; entity
Ex of an entity and instance:
your university's student database contains an entity called "student." An instance of the student entity would be a particular student. Thus, you are an instance of the student entity in your university's student database.
What makes a record and what's an example?
-A logical grouping of related fields, such as the student's name, the courses taken, the date, and the grade
What is data governance and what are some types?
-An approach to managing information across an entire organization. -Master Data -Master Data Management
Issues with big data untrusted data sources
-Big Data can come from numerous, widely varied sources. -These sources may be internal or external to the organization. -For example, a company might want to integrate data from unstructured sources such as e-mails, call center notes, and social media posts with structured data about its customers from its data warehouse.
Big Data Used in the Functional Areas of the Organization Product Development
-Big Data can help capture customer preferences and put that information to work in designing new products. -ex: Ford conducting a test on whether or not it should implement a three-blink turn indicator
Industry analysts estimate that
80 to 90 percent of the data in an organization are unstructured.
bit
A binary digit—that is, a 0 or a 1.
examples of Big Data:
-Facebook -Google's YouTube -Twitter -autonomous cars
Big Data Used in the Functional Areas of the Organization Human Resources:
-Big Data is also having an impact on hiring. -An example is Catalyst IT Services, a technology outsourcing company that hires teams for programming jobs. -the company collects more data based on how candidates answer than on what they answer. -the assessment might give a problem requiring calculus to an applicant who is not expected to know the subject. How the candidate responds—laboring over an answer, answering quickly and then returning later, or skipping the problem entirely—provides insight into how that candidate might deal with challenges that he or she will encounter on the job. That is, someone who labors over a difficult question might be effective in an assignment that requires a methodical approach to problem solving, whereas an applicant who takes a more aggressive approach might perform better in a different job setting -benefit of this Big Data approach is that it recognizes that people bring different skills to the table and there is no one-size-fits-all person for any job
Database systems minimize the following problems:
-Data redundancy: The same data are stored in multiple locations. -Data isolation: Applications cannot access data associated with other applications. -Data inconsistency: Various copies of the data do not agree.
Characteristics of data warehouses and data marts -nonvolatile
-Data warehouses and data marts are nonvolatile—that is, users cannot change or update the data. -Therefore, the warehouse or mart reflects history, which, as we just saw, is critical for identifying and analyzing trends. -Warehouses and marts are updated, but through IT-controlled load processes rather than by users.
Issues with big data Big Data is dirty
-Dirty data refers to inaccurate, incomplete, incorrect, duplicate, or erroneous data. Examples of such problems are misspelling of words, and duplicate data such as retweets or company press releases that appear multiple times in social media. -Suppose a company is interested in performing a competitive analysis using social media data. The company wants to see how often a competitor's product appears in social media outlets as well as the sentiments associated with those posts. The company notices that the number of positive posts about the competitor is twice as great as the number of positive posts about itself. This finding could simply be a case of the competitor pushing out its press releases to multiple sources; in essence, blowing its own horn. Alternatively, the competitor could be getting many people to retweet an announcement.
Managing Big Data Database environment:
-For many organizations the first step toward managing data was to integrate information silos into a database environment and then to develop data warehouses for decision making. -An information silo is an information system that does not communicate with other related information systems in an organization
Big Data Used in the Functional Areas of the Organization Marketing:
-Marketing managers have long used data to better understand their customers and to target their marketing efforts more directly. -ex: The United Kingdom's InterContinental Hotels Group markets specific offers to each individual customer
Issues with big data Big Data changes, especially with data streams
-Organizations must be aware that data quality in an analysis can change, or the data themselves can change, because the conditions under which the data are captured can change. -For example, imagine a utility company that analyzes weather data and smart-meter data to predict customer power usage.
Organizations Can Analyze More Data:
-Random sampling works well, but it is not as effective as analyzing an entire dataset.
A Generic Data Warehouse Environment
-Source systems that provide data to the warehouse or mart -Data-integration technology and processes that prepare the data for use -Different architectures for storing data in an organization's data warehouse or data marts -Different tools and applications for the variety of users. -Metadata (data about the data in a repository), data quality, and governance processes that ensure that the warehouse or mart meets its purposes
Big Data Used in the Functional Areas of the Organization Government Operations:
-The growing availability of Big Data sources within London—for example, traffic cameras and sensors on cars and roadways—can help to create a new era of smart transport. -Analyzing this Big Data offers new ways for traffic analysts in London to "sense the city" and enhance transport via real-time estimation of traffic patterns and rapid deployment of traffic management strategies.
What is the difference between master data and transactional data?
-Transactional data, which are generated and captured by operational systems, describe the business's activities, or transactions. - In contrast, master data are applied to multiple transactions, and they are used to categorize, aggregate, and evaluate the transactional data
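A minimal sketch of how master data are applied to many transactions in order to categorize and aggregate them. The product identifiers, names, categories, and amounts are all invented for illustration.

```python
# Master data: core facts about each product, shared by every transaction.
master_products = {
    "P1": {"name": "Widget", "category": "Hardware"},
    "P2": {"name": "Gizmo", "category": "Electronics"},
}

# Transactional data: one entry per business activity (here, a sale).
transactions = [
    {"product_id": "P1", "quantity": 2, "amount": 20.0},
    {"product_id": "P2", "quantity": 1, "amount": 15.0},
    {"product_id": "P1", "quantity": 1, "amount": 10.0},
]

# Use the master data to categorize and aggregate the transactions.
sales_by_category = {}
for sale in transactions:
    category = master_products[sale["product_id"]]["category"]
    sales_by_category[category] = sales_by_category.get(category, 0.0) + sale["amount"]
```

Each transaction stores only a product identifier; the descriptive details live once in the master data and are looked up when the sales are evaluated.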
Characteristics of data warehouses and data marts -multidimensional
-Typically, the data warehouse or mart uses a multidimensional data structure. Recall that relational databases store data in two-dimensional tables. -In contrast, data warehouses and marts store data in more than two dimensions. -For this reason, the data are said to be stored in a multidimensional structure. A common representation for this multidimensional structure is the data cube.
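One way to picture a data cube is as a set of cells addressed by one value per dimension. The sketch below assumes three hypothetical dimensions (product, region, year) and invented sales figures; real warehouse tools implement this very differently.

```python
from collections import defaultdict

# Hypothetical cube cells: (product, region, year) -> units sold.
cube = {
    ("nuts", "East", 2023): 50,
    ("nuts", "West", 2023): 60,
    ("bolts", "East", 2023): 40,
    ("nuts", "East", 2024): 70,
    ("bolts", "West", 2024): 30,
}

def slice_cube(cube, product=None, region=None, year=None):
    """Return the cells that match every dimension value that was given."""
    wanted = (product, region, year)
    return {
        key: value
        for key, value in cube.items()
        if all(w is None or w == k for w, k in zip(wanted, key))
    }

def roll_up(cube, axis):
    """Total the cube along one dimension (0 = product, 1 = region, 2 = year)."""
    totals = defaultdict(int)
    for key, value in cube.items():
        totals[key[axis]] += value
    return dict(totals)

east_2023 = slice_cube(cube, region="East", year=2023)
units_by_product = roll_up(cube, axis=0)
```

A two-dimensional relational table would need one row per cell; the cube view lets an analyst fix some dimensions (a slice) or total along one of them (a roll-up).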
What are some issues with Big Data?
-Untrusted data sources -Big Data is dirty -Big Data changes, especially in data streams
What makes a data hierarchy?
-a bit (smallest unit of data) -a byte (a group of eight bits) -a field (a logical grouping of characters) -a record (a logical grouping of related fields) -a data file/table (a logical grouping of related records) -a database (a logical grouping of related files)
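The levels of the hierarchy can be sketched with ordinary Python values. This is only an illustration: the student names, course, and grades are invented, and a real database stores these levels very differently.

```python
# A bit is a single 0 or 1; eight bits make one byte.
bits = "01000001"
byte_value = int(bits, 2)      # the byte as a number
character = chr(byte_value)    # the character that byte represents

# A field is a logical grouping of characters.
name_field = "Ada"

# A record is a logical grouping of related fields.
record = {"student_id": 1, "name": name_field, "course": "MIS 101", "grade": "A"}

# A data file (table) is a logical grouping of related records.
grades_table = [
    record,
    {"student_id": 2, "name": "Ben", "course": "MIS 101", "grade": "B"},
]

# A database is a logical grouping of related files (tables).
database = {"grades": grades_table}
```

Reading from the top down, each level is built from the one before it, which is why the hierarchy runs from bits to databases.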
ex of primary key
-a student record in a U.S. university would use a unique student number as its primary key. -In the past, your Social Security number served as the primary key for your student record. However, for security reasons, this practice has been discontinued.
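A minimal sketch of a primary key, using Python's built-in sqlite3 module. The table layout and the student numbers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# student_number is the primary key: it must be unique for every record.
conn.execute(
    "CREATE TABLE student (student_number INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute("INSERT INTO student VALUES (1001, 'Pat')")

try:
    # A second record with the same student number is not allowed.
    conn.execute("INSERT INTO student VALUES (1001, 'Lee')")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False

# Because the key is unique, it identifies exactly one record.
row = conn.execute(
    "SELECT name FROM student WHERE student_number = ?", (1001,)
).fetchone()
```

The database rejects the duplicate, so looking up a student number can never return more than one student.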
Example of a field
-a student's name in a university's computer files would appear in the "name" field, and her or his Social Security number would appear in the "Social Security number" field. -a field can also hold data other than text, for example a driver's photograph in a motor vehicle department's licensing database, or a voice sample used to authorize access to a secure facility.
Big Data Used in the Functional Areas of the Organization Operations
-companies have been using information technology to make their operations more efficient -ex: By combining GPS information with data from sensors installed on more than 46,000 vehicles, UPS reduced fuel consumption by 8.4 million gallons and cut 85 million miles off its routes.
Database systems also maximize the following:
-data security -data integrity -data independence
ex of unstructured data
-e-mail messages, -word processing documents, -videos, -images, -audio files, -PowerPoint presentations, -Facebook posts, -Tweets, -Snaps, -ratings and recommendations, -web pages.
Example of A/B experiment and what is it?
-each experiment has only two possible outcomes. -When Etsy analysts noticed that one of its web pages attracted customer attention but failed to maintain it, they looked more closely at the page and discovered that it had few "calls to action." (A call to action is an item, such as a button, on a web page that enables a customer to do something.) On this particular Etsy page, customers could leave, buy, search, or click on two additional product images. The analysts decided to show more product images on the page. Consequently, one group of visitors to the page saw a strip across the top of the page that displayed additional product images. Another group saw only the two original product images. On the page with additional images, customers viewed more products and, significantly, bought more products. The results of this experiment revealed valuable information to Etsy.
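The comparison behind an A/B experiment can be sketched as follows. The visitor and purchase counts are invented purely for illustration and are not Etsy's actual results; a real analysis would also test whether the difference is statistically significant.

```python
def conversion_rate(purchases, visitors):
    """Share of visitors who bought something."""
    return purchases / visitors

# Hypothetical counts, invented for illustration only.
group_a = {"visitors": 1000, "purchases": 30}  # page with the two original images
group_b = {"visitors": 1000, "purchases": 45}  # page with the extra strip of images

rate_a = conversion_rate(group_a["purchases"], group_a["visitors"])
rate_b = conversion_rate(group_b["purchases"], group_b["visitors"])

# Relative improvement of version B over version A.
lift = (rate_b - rate_a) / rate_a
better_version = "B" if rate_b > rate_a else "A"
```

Each visitor sees exactly one of the two versions, so the difference in conversion rate can be attributed to the change being tested.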
Managing Big Data: Big Data makes it possible to do many things that were previously much more difficult;
-for example, to spot business trends more rapidly and accurately, to prevent disease, to track crime, and so on. -When Big Data is properly analyzed, it can reveal valuable patterns and information that were previously hidden because of the amount of work required to discover them.
Characteristics of data warehouses and data marts:
-organized by business dimension or subject -use online analytical processing -integrated -time variant -nonvolatile -multidimensional
Big Data generally consists of the following:
-traditional enterprise data -machine generated data/sensor data -social data -Images captured by billions of devices located throughout the world, from digital cameras and camera phones to medical scanners and security cameras.
Managing Big Data Traditional relational databases versus NoSQL databases
-traditional relational databases such as Oracle and MySQL store data in tables organized into rows and columns -NoSQL databases can manipulate structured and unstructured data, as well as inconsistent or missing data. For this reason, NoSQL databases are particularly useful when working with Big Data. Hadoop and MapReduce are particularly useful when analyzing massive databases.
Master data management
-type of data governance -is a process that spans all of an organization's business processes and applications. -It provides companies with the ability to store, maintain, exchange, and synchronize a consistent, accurate, and timely "single version of the truth" for the company's master data
Master data
-type of data governance -a set of core data, such as customer, product, employee, vendor, geographic location, and so on, that span the enterprise's information systems
Two ways to define big data Big Data Institute
-vast datasets that: -Exhibit variety; -Include structured, unstructured, and semistructured data; -Are generated at high velocity with an uncertain pattern; -Do not fit neatly into traditional, structured, relational databases; and -Can be captured, processed, transformed, and analyzed in a reasonable amount of time only by sophisticated information systems.
Characteristics of Big Data
-volume -velocity -variety
Difficulties in Managing Data 6 reasons
1) Data increases exponentially over time 2) Multiple sources of data 3) New sources of data 4) Data rot, or data degradation 5) Data security, quality, and integrity 6) Government regulation
The KMS Cycle (6 steps)
1) Create knowledge. Knowledge is created as people determine new ways of doing things or develop know-how. Sometimes external knowledge is brought in. 2) Capture knowledge. New knowledge must be identified as valuable and be presented in a reasonable way. 3) Refine knowledge. New knowledge must be placed in context so that it is actionable. This is where tacit qualities (human insights) must be captured along with explicit facts. 4) Store knowledge. Useful knowledge must then be stored in a reasonable format in a knowledge repository so that other people in the organization can access it. 5) Manage knowledge. Like a library, the knowledge must be kept current. Therefore, it must be reviewed regularly to verify that it is relevant and accurate. 6) Disseminate knowledge. Knowledge must be made available in a useful format to anyone in the organization who needs it, anywhere and any time.
Managing data in organizations is difficult for many reasons. Reason 4
Adding to these problems is the fact that new sources of data such as blogs, podcasts, tweets, Facebook posts, YouTube videos, texts, and RFID tags and other wireless sensors are constantly being developed, and the data these technologies generate must be managed. Also, the data become less current over time. For example, customers move to new addresses or they change their names, companies go out of business or are bought, new products are developed, employees are hired or fired, and companies expand into new countries.
Managing data in organizations is difficult for many reasons. Reason 3
Another problem is that data are generated from multiple sources: internal sources (for example, corporate databases and company documents); personal sources (for example, personal thoughts, opinions, and experiences); and external sources (for example, commercial databases, government reports, and corporate websites). Some of these data sources are in the form of data streams, which are data that are continuously generated by point-of-sale systems, clickstream data, social media, and sensors.
Database systems also maximize the following: data independence:
Applications and data are independent of one another; that is, applications and data are not linked to each other, so all applications are able to access the same data.
examples of Big Data: autonomous cars
Autonomous cars generate up to 20 terabytes of data per car per day.
Database systems also maximize the following: data security:
Because data are "put in one place" in databases, there is a risk of losing a lot of data at one time. Therefore, databases must have extremely high security measures in place to minimize mistakes and deter attacks.
Data that consist of both structured and unstructured data are called
Big Data
Companies are able to use:
Big Data to create new business models. -ex: The company recently placed sensors on all of its trucks. These sensors wirelessly communicate sizeable amounts of information to the company, a process called telematics. The sensors collect data on vehicle usage—including acceleration, braking, cornering, and so on—in addition to driver performance and vehicle maintenance. By analyzing this Big Data, the company was able to improve the condition of its trucks through near-real-time analysis that proactively suggested preventive maintenance
Managing data in organizations is difficult for many reasons. Data streams: Clickstream Data
Clickstream data are those data that visitors and customers produce when they visit a website and click on hyperlinks. Clickstream data include the terms that the visitor to the website entered into a search engine to reach that website, all links that users click, how long they spend on each page, if they click the "back" button, if they add or remove items from a shopping cart, and many other data points.
Managing data in organizations is difficult for many reasons. Reason 2
Data are also scattered throughout organizations, and they are collected by many individuals using various methods and devices. These data are frequently stored in numerous servers and locations and in different computing systems, databases, formats, and human and computer languages.
Managing data in organizations is difficult for many reasons. Reason 5
Data are also subject to data rot. Data rot refers primarily to problems with the media on which the data are stored. Over time, temperature, humidity, and exposure to light can cause physical problems with storage media and thus make it difficult to access data. The second aspect of data rot is that finding the machines needed to access the data can be difficult. For example, it is almost impossible today to find 8-track players to listen to music on. Consequently, a library of 8-track tapes has become relatively worthless, unless you have a functioning 8-track player or you convert the tapes to a more modern medium such as DVDs.
Characteristics of data warehouses and data marts integrated:
Data are collected from multiple systems and are then integrated around subjects. For example, customer data may be extracted from internal (and external) systems and then integrated around a customer identifier, thereby creating a comprehensive view of the customer.
Characteristics of data warehouses and data marts organized by business dimension or subject:
Data are organized by subject—for example, by customer, vendor, product, price level, and region. This arrangement differs from transactional systems, where data are organized by business process such as order entry, inventory control, and accounts receivable
Database systems also maximize the following: data integrity:
Data meet certain constraints; for example, there are no alphabetic characters in a Social Security number field.
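A minimal sketch of this integrity rule as a database constraint, using Python's built-in sqlite3 module. The table layout, the nine-digit length rule, and the sample values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The CHECK constraint accepts only values made of exactly nine digits.
# GLOB '*[^0-9]*' matches any text containing a non-digit character.
conn.execute(
    """CREATE TABLE student (
           student_number INTEGER PRIMARY KEY,
           ssn TEXT NOT NULL
               CHECK (length(ssn) = 9 AND ssn NOT GLOB '*[^0-9]*')
       )"""
)

def try_insert(student_number, ssn):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO student VALUES (?, ?)", (student_number, ssn))
        return True
    except sqlite3.IntegrityError:
        return False

digits_accepted = try_insert(1, "123456789")   # digits only
letters_accepted = try_insert(2, "12345678A")  # contains an alphabetic character
stored_rows = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```

Because the rule lives in the database itself, every application that writes to the table is held to the same constraint.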
Managing data in organizations is difficult for many reasons. Reason 6
Data security, quality, and integrity are critical, yet they are easily jeopardized. Legal requirements relating to data also differ among countries as well as among industries, and they change frequently.
Managing data in organizations is difficult for many reasons. ____ ____ hinder the process of gaining actionable insights from organizational data, create barriers to an overall view of the enterprise and its data, and delay digital transformation efforts
Data silos
Characteristics of data warehouses and data marts time variant:
Data warehouses and data marts maintain historical data; that is, data that include time as a variable. Unlike transactional systems, which maintain only recent data (such as for the last day, week, or month), a warehouse or mart may store years of data. Organizations use historical data to detect deviations, trends, and long-term relationships.
attribute
Each characteristic or quality of a particular entity.
examples of Big Data: -Facebook
Facebook's 2.45 billion users upload more than 350 million new photos every day. They also click a "like" button or leave a comment more than 5 billion times every day. Facebook's data warehouse stores more than 300 petabytes of data, and the platform receives 600 terabytes of incoming data per day.
examples of Big Data: -Twitter
In July 2020 industry analysts estimated that Twitter users sent some 550 million tweets per day.
Managing data in organizations is difficult for many reasons. Data streams: Point-of-sale data
Organizations capture data from each customer purchase with their POS systems. Clerks (or customers themselves using self-checkout) use bar code scanners to scan each item purchased. POS systems collect data in real time, such as the name, product identification number, and unit price of each item; the total amount of all items purchased; the sales tax on that amount; the payment method used; a time stamp of the purchase; and many other data points.
ex of microsegmentation
Paytronix Systems provides loyalty and rewards program software for thousands of different restaurants. Paytronix gathers restaurant guest data from a variety of sources beyond loyalty and gift programs, including social media. Paytronix analyzes this Big Data to help its restaurant clients microsegment their guests. Restaurant managers are now able to more precisely customize their loyalty and gift programs.
business rules
Precise descriptions of policies, procedures, or principles in any organization that stores and uses data to generate information
Managing data in organizations is difficult for many reasons. Data streams: Social Media Data
Social media data (also called social data) are the data collected from individuals' activity on social media websites, including Facebook, YouTube, LinkedIn, Twitter, and many others. These data include shares, likes and dislikes, ratings, reviews, recommendations, comments, and many other examples.
examples of Big Data: Google's YouTube
The 2 billion users of Google's YouTube service upload more than 300 hours of video per minute. Google itself processes on average more than 70,000 search queries per second.
Managing data in organizations is difficult for many reasons. Data streams: Sensor Data
The Internet of Things (IoT; see Chapter 8) is a system in which any object, natural or manmade, contains internal or external wireless sensor(s) that communicate with each other without human interaction. Each sensor monitors and reports data on physical and environmental conditions around it, such as temperature, sound, pressure, vibration, and movement. Sensors can also control physical systems, such as opening and closing a valve and adjusting the fuel mixture in your car.
Characteristics of Big Data velocity:
The rate at which data flow into an organization is rapidly increasing. Velocity is critical because it increases the speed of the feedback loop between a company, its customers, its suppliers, and its business partners. For example, the Internet and mobile technology enable online retailers to compile histories not only on final sales, but on their customers' every click and interaction. Companies that can quickly use that information—for example, by recommending additional purchases—gain competitive advantage.
Characteristics of Big Data variety:
Traditional data formats tend to be structured and relatively well described, and they change slowly. Traditional data include financial market data, point-of-sale transactions, and much more. In contrast, Big Data formats change rapidly. They include satellite imagery, broadcast audio streams, digital music files, web page content, scans of government documents, and comments posted on social networks.
Characteristics of Big Data volume:
We have noted the huge volume of Big Data. Consider machine-generated data, which are generated in much larger quantities than nontraditional data. For example, sensors in a single jet engine can generate 10 terabytes of data in 30 minutes. With more than 25,000 airline flights per day, the daily volume of data from just this single source is incredible. Smart electrical meters, sensors in heavy industrial equipment, and telemetry from automobiles compound the volume problem.
Example of a data file/table?
a grouping of the records from a particular course, consisting of course number, professor, and students' grades, would constitute a data file for that course.
Managing a database refers to the processes of:
adding, deleting, accessing, modifying, and analyzing data that are stored in a database.
Databases are arranged so that one set of software programs—the database management system—provides ____
all users with access to all of the data.
Open data
are accessible public data that individuals and organizations can use to create new businesses and solve complex problems.
Data are organized in a hierarchy that begins with ____ and proceeds all the way to ____
bits; databases
Big Data generally consists of the following Social data:
customer feedback comments; microblogging sites such as Twitter; and social media sites such as Facebook, YouTube, and LinkedIn.
Big Data generally consists of the following traditional enterprise data:
customer information from customer relationship management systems, transactional enterprise resource planning data, Web store transactions, operations data, and general ledger data.
Two ways to define big data Gartner:
defines Big Data as diverse, high-volume, high-velocity information assets that require new forms of processing in order to enhance decision making, lead to insights, and optimize business processes.
Microsegmentation simply means
dividing customers up into very small groups, or even down to the individual customer.
Segmentation of a company's customers means
dividing them into groups that share one or more characteristics.
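The difference between segmentation and microsegmentation can be sketched as grouping on one characteristic versus grouping on many. The customers and their characteristics below are invented for illustration.

```python
from collections import defaultdict

# Hypothetical restaurant guests.
customers = [
    {"name": "Ana", "city": "Austin", "favorite": "tacos", "frequent": True},
    {"name": "Bo", "city": "Austin", "favorite": "pizza", "frequent": True},
    {"name": "Cy", "city": "Austin", "favorite": "tacos", "frequent": False},
    {"name": "Di", "city": "Boston", "favorite": "pizza", "frequent": True},
]

def segment(customers, *characteristics):
    """Group customer names by the values of the chosen characteristics."""
    groups = defaultdict(list)
    for customer in customers:
        key = tuple(customer[c] for c in characteristics)
        groups[key].append(customer["name"])
    return dict(groups)

# Segmentation: groups that share one characteristic.
segments = segment(customers, "city")

# Microsegmentation: so many characteristics that groups shrink toward one customer.
microsegments = segment(customers, "city", "favorite", "frequent")
```

With one characteristic the four guests fall into two groups; with three characteristics every guest ends up in a group of one.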
Organizations implement databases to
efficiently and effectively manage their data.
ex of attribute
if our entities were a customer, an employee, and a product, entity attributes would include customer name, employee number, and product color.
Big Data allows organizations to
improve performance by conducting controlled experiments. ex: Amazon (and many other companies such as Google and LinkedIn) constantly experiments by offering slightly different looks on its website
A byte can be a ____, ____, or a ____
letter, a number, or a symbol.
ex of structured data
must be defined in terms of field name and type (e.g., alphanumeric, numeric, and currency)
Hot Data
refers to data that must be accessed frequently and rapidly. -ex eHarmony
Cold Data
refers to the storage of relatively inactive data that does not have to be accessed frequently or rapidly
Tables enable people to compare information quickly by:
row or column
telematics
sensors wirelessly communicating sizeable amounts of information
Big Data generally consists of the following Machine generated data/sensor data:
smart meters; manufacturing sensors; sensors integrated into smartphones, automobiles, airplane engines, and industrial machines; equipment logs; and trading systems data.
Data is processed in several ____ and multiple ____
stages; locations
Making Big Data available for relevant
stakeholders can help organizations gain value
Fields can contain data other than _____
text and numbers, such as an image, or any other type of multimedia.
ex of secondary key
the student's major might be a secondary key if a user wanted to identify all of the students majoring in a particular field of study.
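A minimal sketch of a secondary key, using Python's built-in sqlite3 module. The table, the sample students, and the index name idx_student_major are hypothetical. Unlike a primary key, the major does not identify one record; it helps locate a group of records.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (student_number INTEGER PRIMARY KEY, name TEXT, major TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [
        (1001, "Pat", "Finance"),
        (1002, "Lee", "Marketing"),
        (1003, "Sam", "Finance"),
    ],
)

# An index on the secondary key lets the database find matching rows quickly.
conn.execute("CREATE INDEX idx_student_major ON student (major)")

# The same major value can appear in many records.
finance_students = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM student WHERE major = ? ORDER BY student_number",
        ("Finance",),
    )
]
```

The query returns several students for one major, which is exactly why the major cannot serve as a primary key.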
Managing data in organizations is difficult for many reasons. Organizations have developed information systems for specific business processes, such as
transaction processing, supply chain management, and customer relationship management.
Characteristics of data warehouses and data marts use online analytical processing:
-Organizational databases typically use online transaction processing (OLTP), where business transactions are processed online as soon as they occur. -The objectives are speed and efficiency, which are critical to a successful Internet-based business operation. -In contrast, data warehouses and data marts, which are designed to support decision makers but not OLTP, use online analytical processing (OLAP), which involves the analysis of accumulated data by end users.
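The contrast between online transaction processing (OLTP) and online analytical processing (OLAP) can be sketched with Python's built-in sqlite3 module. The sale table and its figures are invented, and a single in-memory database stands in for what would normally be separate operational and warehouse systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sale (sale_id INTEGER PRIMARY KEY, sale_date TEXT, region TEXT, amount REAL)"
)

def record_sale(conn, sale_date, region, amount):
    """OLTP style: process one business transaction as soon as it occurs."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO sale (sale_date, region, amount) VALUES (?, ?, ?)",
            (sale_date, region, amount),
        )

# Hypothetical sales arriving one at a time.
record_sale(conn, "2023-03-01", "East", 100.0)
record_sale(conn, "2023-07-15", "West", 250.0)
record_sale(conn, "2024-01-10", "East", 300.0)
record_sale(conn, "2024-02-20", "East", 50.0)

# OLAP style: analyze the accumulated data, here total sales by year and region.
totals = conn.execute(
    """SELECT substr(sale_date, 1, 4), region, SUM(amount)
       FROM sale
       GROUP BY substr(sale_date, 1, 4), region
       ORDER BY 1, 2"""
).fetchall()
```

The inserts are small, frequent, and need to be fast; the analytical query touches many rows at once and is run by someone looking for trends.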
Entities can typically be identified in the user's:
work environment.