Information Systems Chapter 15
What 2 critical conditions need to be present for data mining to work?
(1) the organization must have clean, consistent data, and (2) the events in that data should reflect current and future trends
Tools that put users in control so that they can create custom reports on an as-needed basis by selecting fields, ranges, summary conditions, and other parameters.
Ad hoc reporting tools
A term describing the extensive use of data, statistical and quantitative analysis, explanatory and predictive models, and fact-based management to drive decisions and actions.
Analytics
Computer software that can mimic or improve upon functions that would otherwise require human intelligence.
Artificial Intelligence (AI)
computer software that seeks to reproduce or mimic (perhaps with improvements) human thought, decision-making, or brain functions.
Artificial intelligence
_____ is computer software that seeks to reproduce or mimic human thought, decision making, or brain functions.
Artificial intelligence software
A general term used to describe the massive amount of data available to today's managers. Big data are often unstructured and are too big and costly to easily work through use of conventional databases, but new tools are making these massive datasets available for analysis and insight.
Big Data
iBeacon technology works by using ___________ technology.
Bluetooth Low Energy
What does it mean to over-engineer a model?
Build a model with so many variables that the solution arrived at might only work on the subset of data you've used to create it.
A term combining aspects of reporting, data exploration and ad hoc queries, and sophisticated data modeling and analysis.
Business Intelligence (BI)
An acronym standing for completely automated public Turing test to tell computers and humans apart. The Turing Test is, rather redundantly, an idea (rather than an official test) that one can create a test to tell computers apart from humans.
CAPTCHAs
Data Quality
Can this data be trusted; is it accurate, clean, complete, and reasonably free of errors? How can our data be made more accurate and valuable for analysis? Will we need to "scrub," calculate, and consolidate data so that it can be used?
Data Sourcing
Can we obtain all the data we'll need? From where? Can we get it through our internal systems or from third-party data aggregators, suppliers, or sales partners? Do we need to set up new collection efforts, surveys, or systems to obtain the data we need?
Reports that provide regular summaries of information in a predetermined format. - often developed by information systems staff - formats can be difficult to alter
Canned reports
personalizing an individual customer's experience based on the trends and preferences identified across similar customers.
Collaborative filtering
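A minimal sketch of the idea behind collaborative filtering, assuming purchase histories are stored as sets and that "similar customers" means the largest Jaccard overlap. The customer names, items, and the `recommend` helper are hypothetical, and production recommenders use far richer signals.

```python
def recommend(target, purchases):
    """Suggest items bought by the customer most similar to `target`.

    `purchases` maps a customer name to a set of purchased items (assumed layout).
    """
    mine = purchases[target]

    def similarity(other):
        # Jaccard overlap: shared items divided by all items either customer bought.
        theirs = purchases[other]
        union = mine | theirs
        return len(mine & theirs) / len(union) if union else 0.0

    others = [name for name in purchases if name != target]
    if not others:
        return set()
    nearest = max(others, key=similarity)
    # Recommend what the nearest neighbor bought that the target has not.
    return purchases[nearest] - mine


purchases = {
    "ann": {"tent", "stove", "boots"},
    "bob": {"tent", "stove", "lantern"},
    "cy": {"novel", "lamp"},
}
suggestions = recommend("ann", purchases)
```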
determining which customers are likely to leave, and what tactics can help the firm avoid unwanted defections.
Customer churn
used to empower employees to track and record data at nearly every point of customer contact
Customer relationship management (CRM) systems
figuring out which customers are likely to be the most valuable to a firm.
Customer segmentation
A heads-up display of critical indicators that allow managers to get a graphical glance at key performance metrics.
Dashboards
raw facts and figures
Data
A catch-all term for storage and access technologies used in Big Data. Data lakes are systems that allow for the storage of data in both structured as well as "raw," "unfiltered" formats. They also provide the tools to "pipe out" data, filter it, and refine it so that it can be turned into information.
Data Lake
Setting up new collection efforts, surveys, or systems to obtain the data is called ________.
Data Sourcing
A database or databases focused on addressing the concerns of a specific problem (e.g., increasing customer retention, improving product quality) or business unit (e.g., marketing, engineering).
Data mart
A(n) _____ is a database or databases focused on addressing the concerns of a specific problem.
Data mart
The process of using computers to identify hidden patterns in, and to build models from, large datasets.
Data mining
A set of databases designed to support decision-making in an organization. - It is structured for fast online queries and exploration. - It may aggregate enormous amounts of data from many different operational systems.
Data warehouse
A single table or a collection of related tables of data
Database
Job title focused on directing, performing, or overseeing activities associated with a database or set of databases. These may include (but not necessarily be limited to): database design, creation, implementation, maintenance, backup and recovery, policy setting and enforcement, and security.
Database administrator (DBA)
A type of machine learning that uses multiple layers of interconnections among data to identify patterns and improve predicted results. Deep learning most often uses a set of techniques known as neural networks and is popularly applied in tasks like speech recognition, image recognition, and computer vision.
Deep Learning
The process of identifying and retrieving relevant electronic information to support litigation efforts.
E-discovery
_____ refers to identifying and retrieving relevant electronic information to support litigation efforts.
E-discovery
AI systems that leverage rules or examples to perform a task in a way that mimics applied human expertise.
Expert Systems
Google is consistently seen as a better source of information on flu outbreaks than the U.S. Centers for Disease Control and Prevention. (T/F)
False
One career thought to be 'safe' from disruption by technology is journalist, since a computer could never write a newspaper article or a book. (T/F)
False
Collaborative filtering is determining which customers are likely to leave, and what tactics can help the firm avoid unwanted defections. (T/F)
False (Collaborative filtering is personalizing an individual customer's experience based on the trends and preferences identified across similar customers.)
Proven relational technology is considered especially effective for "Big Data" work. (T/F)
False (Conventional tools often choke when trying to sift through the massive amounts of data collected by many of today's firms.)
Dynamic pricing is considered especially appropriate for retailers, such as grocery stores or department stores. (T/F)
False (Dynamic pricing is especially tricky in situations where consumers make repeated purchases and are more likely to remember past prices, and when they have alternative choices, like grocery or department store shopping.)
The three "Vs" of Big Data refer to the names of the three leading commercial and open source technologies used in most of these efforts. (T/F)
False (The three Vs of "Big Data"-volume, velocity, and variety-distinguish it from conventional data analysis problems and require a new breed of technology.)
Server crash? No problem. Big data storage is designed in such a way so that there will be no single point of failure. The system will continue to work, relying on the remaining hardware.
Fault-tolerance
What are the 4 primary advantages cited when using big data technologies?
Flexibility, Scalability, Cost effectiveness, Fault-tolerance
building trading systems to capitalize on historical trends.
Financial modeling
Transaction processing systems (TPS)
For most organizations that sell directly to their customers, _________ represents a fountain of potentially insightful data
uncovering patterns consistent with criminal activity.
Fraud detection
model-building techniques where computers examine many potential solutions to a problem, iteratively modifying (mutating) various mathematical models, and comparing the mutated models to search for a best alternative function.
Genetic algorithms
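A toy sketch of a genetic algorithm, assuming the candidate "models" are straight lines (slope, intercept) scored by squared error against made-up data. The data, population size, mutation scale, and the `evolve` helper are all illustrative assumptions, not a reference implementation.

```python
import random

# Hypothetical training data generated from y = 2x + 1.
DATA = [(x, 2 * x + 1) for x in range(10)]


def error(model, data=DATA):
    """Sum of squared errors for a candidate linear model (slope, intercept)."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in data)


def evolve(generations=150, pop_size=40, sigma=0.1, seed=0):
    """Keep the better half of the models, then breed and mutate replacements."""
    rng = random.Random(seed)
    population = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(pop_size)]
    for _ in range(generations):
        # Compare models: lowest error first, and only the top half survives.
        population.sort(key=error)
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            # Crossover (average the two parents) plus a small random mutation.
            child = tuple((p + q) / 2 + rng.gauss(0, sigma) for p, q in zip(a, b))
            children.append(child)
        population = survivors + children
    return min(population, key=error)


best_slope, best_intercept = evolve()
```

Because survivors are carried over unchanged, the best model found never gets worse from one generation to the next.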
A set of mostly open-source tools to manage massive amounts of unstructured data for storage, extraction, and computation.
Hadoop
identifying characteristics consistent with employee success in the firm's various roles.
Hiring and promotion
Data Quantity
How much data do we need?
How does a relational database system work?
In relational database systems, several data fields make up a data record, multiple data records make up a table or data file, and one or more tables or data files make up a database. Files that are related to one another are linked based on a common field (or fields) known as a key. If the value of a key is unique to a record in a table, and that value can never occur in that field while referring to another record in that table, then it is a primary key. If a key can occur many times over multiple records in a table but relates back to a primary key in another table, then it is a foreign key.
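The key relationships described above can be sketched with Python's built-in `sqlite3` module. The `customers` and `orders` tables, their columns, and the sample rows are hypothetical, chosen only to show a primary key, a foreign key, and a join on the common field.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two related tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this setting is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    # customer_id is the primary key: unique for every record in customers.
    conn.execute(
        "CREATE TABLE customers ("
        " customer_id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL)"
    )
    # orders.customer_id is a foreign key: it may repeat across many orders,
    # but each value must match a primary key in customers.
    conn.execute(
        "CREATE TABLE orders ("
        " order_id INTEGER PRIMARY KEY,"
        " customer_id INTEGER NOT NULL REFERENCES customers(customer_id),"
        " amount REAL NOT NULL)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0)],
    )
    return conn


def total_spent_by_customer(conn):
    """Join the two tables on their common key and total each customer's orders."""
    return conn.execute(
        "SELECT c.name, SUM(o.amount)"
        " FROM customers c JOIN orders o ON o.customer_id = c.customer_id"
        " GROUP BY c.customer_id ORDER BY c.customer_id"
    ).fetchall()


conn = build_demo_db()
totals = total_spent_by_customer(conn)
```

With foreign keys switched on, an order that points at a customer ID not present in `customers` is rejected, which is how the database keeps related files consistent.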
Data presented in a context so that it can answer a question or support decision making.
Information (the goal is to turn data into information)
Insight derived from experience and expertise.
Knowledge
Older/outdated information systems that are often incompatible with other systems, technologies, and ways of conducting business. Incompatible legacy systems can be a major roadblock to turning data into information, and they can inhibit firm agility, holding back operational and strategic initiatives.
Legacy systems
Systems that provide rewards and usage incentives, typically in exchange for a method that provides a more detailed tracking and recording of customer activity. In addition to enhancing data collection, loyalty cards can represent a significant switching cost.
Loyalty card
A type of artificial intelligence that leverages massive amounts of data so that computers can improve the accuracy of actions and predictions on their own without additional programming.
Machine Learning
determining which products customers buy together, and how an organization can use this information to cross-sell more products or services.
Market basket analysis
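A minimal sketch of market basket analysis, assuming each transaction is just a list of item names. It counts how often every pair of products appears in the same basket; the sample baskets and the `pair_counts` helper are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_counts(baskets):
    """Count how many baskets contain each pair of distinct items."""
    counts = Counter()
    for basket in baskets:
        # Sort so that ("beer", "diapers") and ("diapers", "beer") count as one pair.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


baskets = [
    ["bread", "milk"],
    ["bread", "diapers", "beer"],
    ["milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "beer"],
]
counts = pair_counts(baskets)
top_pair, top_count = counts.most_common(1)[0]
```

The most frequent pair is a natural candidate for cross-selling, for example by shelving the two products together or bundling them in a promotion.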
identifying which customers will respond to which offers at which price at what time.
Marketing and promotion targeting
Statistical techniques used in AI, and particularly in machine learning. They hunt down and expose patterns, building multilayered relationships that humans can't detect on their own.
Neural Networks
L.L. Bean's rollout of Big Data efforts for improved customer service involved all of the following except: - Reporting tools were made available to non-technical staff - Training technical staff - Training marketing staff - Creating NoSQL to adapt to a business with some 30 different customer engagement channels - None of the above is incorrect, the effort leveraged all of the above.
None of the above is incorrect, the effort leveraged all of the above.
Software that can scan images and identify text within them.
OCR (Optical Character Recognition)
_____ refers to a method of querying and reporting that takes data from standard relational databases, calculates and summarizes the data, and then stores the data in a special database.
Online analytical processing
The most common standard for expressing databases, whereby tables (files) are related based on common keys.
Relational databases
Which of the following is by far the most common standard for expressing databases?
Relational databases
Which of the following is not considered an advantage of Hadoop? - Cost effectiveness - Fault tolerance - Scalability - Flexibility - Relational structure
Relational structure
Where data used to build models that determine an end result may contain data that has outputs explicitly labeled as well as unlabeled, e.g., "hey software, take a look at my categorizations and see if they are valid or you can come up with better or missing ones"
Semi-supervised learning
A type of cloud computing where a third-party vendor manages servers, replication, fault-tolerance, computing scalability, and certain aspects of security, freeing software developers to focus on building "Business Solutions" and eliminating the need to spend time and resources managing the technology complexity of much of the underlying "IT Solution."
Serverless computing efforts
(the most common) language used to create and manipulate databases
Structured query language (SQL)
where algorithms are trained by providing explicit examples of results sought, like defective vs. error-free, or stock price
Supervised learning
How can data aggregators be controversial?
They represent a big target for identity thieves, are a method for spreading potentially incorrect data, and raise privacy concerns
Some kind of business exchange.
Transaction
Systems that record a transaction (some form of business-related exchange), such as a cash register sale, ATM withdrawal, or product return.
Transaction processing systems (TPS)
A lack of diverse perspectives among software development and managerial teams can lead to systems that put an organization at risk. (T/F)
True
Data quantity answers the question: How much data do we need? (T/F)
True
Expert systems are used in tasks ranging from medical diagnoses to product configuration. (T/F)
True
Firms may turn to third parties and outside services to acquire data for predictive models. (T/F)
True
Modern datasets can be so large that it might be impossible for humans to spot underlying trends without the use of data mining tools. (T/F)
True
NoSQL technologies are often used with massive, disparately structured data. (T/F)
True
Spotify's EchoNest subsidiary analyzes both tracks themselves, as well as what others are saying about music in order to build automated music recommendations. (T/F)
True
Walmart uses Hadoop to sift through social media posts about the firm. (T/F)
True
Where data are not explicitly labeled and don't have a predetermined result. Clustering customers into previously unknown groupings might be one example.
Unsupervised learning
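The clustering example can be sketched with a tiny one-dimensional k-means, assuming two groups and using the smallest and largest values as starting centers. The spending figures and the `two_means` helper are hypothetical; no labels are supplied, so the groupings emerge from the data alone.

```python
def two_means(values, iterations=10):
    """Split 1-D values into two groups by repeated re-centering (k-means, k=2)."""
    centers = [min(values), max(values)]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            # Assign each value to whichever center is closer.
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        new_centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    return groups


# Hypothetical monthly spend per customer.
low_spenders, high_spenders = two_means([10, 95, 12, 100, 11, 105])
```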
What are the 3 V's of Big Data?
Velocity, Volume, Variety
Data Relevance
What data do we need in order to compete on analytics? What data do we need to meet our current and future goals?
Data Governance
What rules and processes are needed to manage this data, from creation through retirement? Are there operational (backup, disaster recovery), legal, or privacy concerns? How should the company handle access and security?
Data hosting
Where will the data systems be housed? What are the hardware and networking requirements for that effort?
What is enterprise software?
a source for customer, supply chain, and enterprise data
What is used to transform data into information? - canned reports - ad hoc reports - digital dashboards - OLAP - all of the above
all of the above
Algorithms that learn based on data that includes bias may be ______.
biased
The ___________ nature of AI makes it especially difficult to transparently demonstrate how machine learning software has arrived at decisions.
black box
A column in a database table. Columns represent each category of data contained in a record (e.g., first name, last name, ID number, date of birth); it defines the data that a table can hold
column or field
Firms that collect and resell data.
data aggregators
A special database used to store data in OLAP reporting.
data cube
Data used in online analytical processing (OLAP) reporting are stored in _____.
data cubes
Flexibility
data lakes can absorb any type of data, structured or not, from any type of source (geeks would say such a system is schema-less). But this disparate data can still be aggregated and analyzed.
For ____________ to work, the organization must have clean, consistent data; and the events in that data should reflect current and future trends.
data mining
___________, when combined with a firm's internal data assets, can give the firm a competitive edge
data obtained from outside sources
An unusable pile of big data is sometimes referred to as a
data swamp
A set of databases designed to support decision making in an organization is known as a(n):
data warehouse
Sometimes referred to as database software; software for creating, maintaining, and manipulating data.
database management systems (DBMS)
Walmart stopped sharing its data assets with information brokers like ACNielsen and Information Resources because: - Walmart realized that there was money to be made by selling its data, instead of sharing it with information brokers. - these agencies violated the terms of agreement under which data was originally shared. - Walmart found that it could access industry data in a much more efficient manner from other third-party sources. - Walmart decided to directly pass on the data to its competitors to leverage network effects. - due to Walmart's huge scale, the agencies offered no extra value with their additional data.
due to Walmart's huge scale, the agencies offered no extra value with their additional data.
Changing pricing based on demand conditions is known as:
dynamic pricing
The terms table, column, and row are also referred to by the names ___________, respectively.
file, field, record
A(n) ______ is a model building technique in which computers examine many potential solutions to a problem, iteratively modifying various mathematical models, and comparing the modified models to search for a best alternative.
genetic algorithm
What is the big culprit limiting BI initiatives?
getting data into a form where it can be used, analyzed, and turned into information
RFID technology is often used to:
identify products as they move through an organization's value chain (With radio frequency identification (RFID), inventory can literally announce its presence so that firms can precisely journal every hop their products make along the value chain: "I'm arriving in the warehouse," "I'm on the store shelf," "I'm leaving out the front door.")
What 3 skills should a competent business analytics team possess?
information technology, statistics, and business knowledge
The ratio of a company's annual sales to its inventory is known as the _____ ratio.
inventory turnover
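As a quick worked example of the ratio above, here is a sketch using made-up figures. The `inventory_turnover` helper is hypothetical; note that some accounting texts use cost of goods sold rather than sales in the numerator.

```python
def inventory_turnover(annual_sales, inventory):
    """Return annual sales divided by inventory, per the definition on this card."""
    if inventory <= 0:
        raise ValueError("inventory must be positive")
    return annual_sales / inventory


# Hypothetical firm: $1.2 million in annual sales against $150,000 of inventory.
ratio = inventory_turnover(1_200_000, 150_000)
```

A higher ratio means inventory is sold and replaced more often during the year.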
What are CAPTCHAs meant to do?
keep out automated software that may create accounts used in spamming or other nefarious activity.
The problem of incompatible legacy systems limiting firms' ability to turn data into information is compounded by:
mergers and acquisitions
An integrated shopping experience and unified customer view across channels is sometimes referred to as _______________
omnichannel
Providing customers with a unified experience across customer channels, which may include online, mobile, catalog, phone, and retail. Pricing, recommendations, and incentives should reflect a data-driven, accurate, single view of the customer.
omnichannel
A method of querying and reporting that takes data from standard relational databases, calculates and summarizes the data, and then stores the data in a special database called a data cube.
online analytical processing (OLAP)
A row in a database table. Records represent a single instance of whatever the table keeps track of (e.g., student, faculty, course title).
row or record
Systems that can absorb any type of data, structured or not, from any type of source are often referred to as:
schema-less
You will often hear technologists refer to the SQL standard by pronouncing it as
sequel
A problem limiting the turning of data into information is that most transactional databases are not set up to be: - functional in scalable, high volume environments. - used across different domains. - used for recording customer activity. - simultaneously accessed for reporting and analysis. - capable of storing large volumes of data.
simultaneously accessed for reporting and analysis.
What happens when information is combined with knowledge?
stronger decisions are made
list of data, arranged in columns (fields) and rows (records)
table or file
Cost effectiveness
the technology, much of it open source and that can be started on limited computing resources before scaling, is often considered cheap by data-warehousing standards. Using a cloud service data lake offering may further reduce hardware and management costs.
Scalability
these systems can start on a single PC, but thousands of machines can eventually be combined to work together for storage and analysis. Depending on the solution, a cloud provider may be able to mask the complexity required to scale systems and simply "spin up" resources whenever more capacity is needed.
How can OLAP tools present results?
through multidimensional graphs or via spreadsheet-style cross-tab reports
What is the goal of AI?
to create computer programs that are able to mimic or improve upon functions that would otherwise require human intelligence.
What is the idea behind query and reporting tools?
to present users with a subset of requested data, selected, sorted, ordered, calculated, and compared, as needed
How can survey data be used?
to supplement a firm's operational data
The amount of data being created doubles every ____ years
two