MGMT 300 Chapter 15

Two Conditions for Data Mining to work

(1) organization must have clean, consistent data (2) events in that data should reflect current and future trends

Tools Fueling the Current Spread of Artificial Intelligence

*1. new generation of hardware chips* - that fuel AI through designs tailored to find patterns faster *2. cloud resources* *3. open source algorithms* - that can be applied to creating custom insights *4. software development kits* - that create standards for building AI into apps and other products *5. data-capture tools* - that include sensors, cameras, and microphones.

Transactional Databases

- *are not* set up to be simultaneously accessed for reporting and analysis - *in order to run analytics* the data must be ported to a data warehouse or data mart.

Airbnb

- Airbnb's 150-plus data scientists crunch 15 petabytes of data daily - over 3.5 million listings across nearly 200 countries generate upwards of 15 billion logged events that help maximize satisfaction for both property hosts and their renting customers. - Airbnb's *smart pricing feature* uses machine learning to constantly tweak the accuracy of models that suggest the perfect rate (known as Aerosolve)

Walmart HR (Virtual Reality)

- With 1.5 million employees in the US and 2.1 million employees worldwide, Walmart is the world's largest private employer - Walmart now has virtual reality headsets in the back of its 4,600 US locations, and over a million Walmart workers have used the headsets in programs the firm has created to learn procedures. *Walmart executives also hope that this VR technology will help* - decrease employee bias - reduce bias in hiring - increase diversity - contribute to improved employee retention *VR is now proving to be a sort of "flight simulator"* for employees to train in various scenarios, and the system provided by the startup STRIVR is now in use at other firms that include Chipotle, JetBlue, and Fidelity Investments.

Data Mart

- a database(s) focused on addressing the concerns of a specific problem or business unit *such as* - increasing customer retention - improving product quality - marketing - engineering

Data Swamp

- an unusable pile of big data

Event-Driven Medicine System

- built by Dr. John Halamka and his team at Boston's Beth Israel Deaconess Medical Center (part of the Harvard Medical School network). *when doctors using this system encounter a patient with a chronic disease* - they generate a decision support *"screening sheet."* - *each event in the system* (an office visit, a lab results report) updates the patient database - the medical equivalent of transactions and customer interactions - *combine that with Artificial Intelligence* and the system can offer recommendations for care - system helps to ensure that key issues are on a provider's radar.

Tableau (Business Intelligence Tool)

- can gather data from multiple resources, and allow users to explore relationships and create powerful, consolidated reports and charts.

Business Intelligence (BI)

- combines aspects of reporting, data exploration and ad hoc queries, and sophisticated data modeling and analysis - *often* used interchangeably with *analytics*, meaning using data for better decision making

Dashboards

- heads-up display of critical indicators that allows managers to get a graphical glance at key performance metrics. - some tools allow data to be exported into spreadsheets

Walmart is a mature business that needs to find huge markets or find dramatic cost savings

- in order to boost profits and continue to move its stock price higher. - Walmart's success also makes it a high-impact target for criticism and activism. - Walmart's data assets could not predict impactful industry trends such as the rise of Target and other upscale discounters.

Knowledge

- insight derived from experience and expertise - a stronger decision can be made - based on data and information

Engineering Society IEEE

- is crafting ethical guidelines for machine learning in products such as the design of autonomous systems

Federal IT Dashboard

- offers federal agencies, and the general public, information about the government's IT investments.

____________ provides customers with a unified experience across customer channels, which may include online, mobile, catalog, phone, and retail.

Omnichannel

Why do firms need to create separate data repositories for their reporting and analytics work?

Running analytics against transactional data can bog down a TPS.

Canned and ad hoc reports, digital dashboards, and OLAP

are all used to transform data into information.

The __________ nature of machine learning is especially difficult to break apart in a clear demonstration of how systems make their decisions.

black box

Major Factor *Limiting Business Intelligence* initiatives

is getting data into a form where it can be used, analyzed, and turned into information

Roll-your-own Offers using a basket of Hadoop

preferred by organizations when more control in managing all aspects of technology is needed and firms have the resources to provide this kind of support.

Supervised Learning

- where algorithms are trained by providing explicit examples of results sought, like defective vs. error-free, or stock price
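
As a minimal sketch of training from labeled examples, the snippet below uses a nearest-centroid classifier. The technique, the two numeric features, and the sample values are illustrative assumptions and do not come from the course text; only the "defective" and "error-free" labels do.

```python
from statistics import mean

def train_centroids(examples):
    """Average the feature vectors of each label: {label: centroid}."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }

def predict(centroids, features):
    """Return the label whose centroid is closest to the new item."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: squared_distance(centroids[label], features))

# Hypothetical training data: (sensor readings, known result).
training = [
    ((0.1, 0.2), "error-free"),
    ((0.2, 0.1), "error-free"),
    ((0.9, 0.8), "defective"),
    ((0.8, 0.9), "defective"),
]
centroids = train_centroids(training)
print(predict(centroids, (0.85, 0.95)))
```

The explicit labels in `training` are what make this "supervised": the algorithm is told the result sought for each example.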

Amount of Data created

- doubles every two years
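
The arithmetic behind "doubles every two years" can be sketched as below. The function name and the starting volume of 1 unit are assumptions chosen only for illustration.

```python
def data_volume(initial, years, doubling_period=2):
    """Volume after `years` if it doubles every `doubling_period` years."""
    return initial * 2 ** (years / doubling_period)

# Starting from 1 unit, ten years means five doublings.
print(data_volume(1, 10))
```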

Issues Managers (as well as concerned citizens) want to be aware of involving Artificial Intelligence

*1. Data quality, inconsistent data, or the inability to integrate data sources* into a single dataset capable of input into machine learning systems can all stifle efforts. *2. Not enough data.* *3. Technical staff may require training* in developing and maintaining such systems, and such skills are rare. *4. AI systems also involve a discipline known as "change management"* that goes hand-in-hand with many IS projects. - *change management* seeks to identify how workflows and processes are to be altered, and how to manage the worker and organizational transition from one system to another *5. Some types of machine learning may be legally prohibited* because of the data used or the inability to identify how a model works and whether or not it might be discriminatory *6. The negative unintended consequences of data misuse* might also lead to regulation that limits techniques currently used *7. Workers are startled to find that in the United States*, just about anything done on organizational networks or using a firm's computer hardware can be monitored *8. Firms that gain an early lead and benefit from scale may be in a position to collect more data than competitors*, fueling a virtuous cycle where early winners generate more data, have stronger predictive capabilities, and can have an edge in entering new markets, offering new services, attracting customers, and cutting prices

Key Areas Leveraging *Data Mining*

*1. Customer segmentation* - figuring out which customers are likely to be the most valuable to a firm. *2. Marketing and promotion targeting* - identifying which customers will respond to which offers at which price at what time. *3. Market basket analysis* - determining which products customers buy together, and how an organization can use this information to cross-sell more products or services. *4. Collaborative filtering* - personalizing an individual customer's experience based on the trends and preferences identified across similar customers. *5. Customer churn* - determining which customers are likely to leave, and what tactics can help the firm avoid unwanted defections. *6. Fraud detection* - uncovering patterns consistent with criminal activity. *7. Financial modeling* - building trading systems to capitalize on historical trends. *8. Hiring and promotion* - identifying characteristics consistent with employee success in the firm's various roles.
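
Market basket analysis (item 3 above) can be sketched as counting how often pairs of products appear in the same purchase. The baskets below are made-up sample data, and simple pair counting is only one assumed way to approach the task.

```python
from collections import Counter
from itertools import combinations

def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of products."""
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) removes duplicates and gives each pair one canonical order
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts

# Hypothetical transactions.
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "cereal"],
    ["bread", "butter", "cereal"],
]
counts = pair_counts(baskets)
top_pair, top_count = counts.most_common(1)[0]
print(top_pair, top_count)
```

A retailer could use the most frequent pairs as candidates for cross-selling offers.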

Broader Issues Needed to Design, Develop, Deploy, and Maintain Systems

*1. Data relevance* - What data do we need in order to compete on analytics? - What data do we need to meet our current and future goals? *2. Data sourcing* - Can we obtain all the data we'll need? - Can we get it through our internal systems or from third-party data aggregators, suppliers, or sales partners? - Do we need to set up new collection efforts, surveys, or systems to obtain the data we need? *3. Data quantity* - How much data do we need? *4. Data quality* - Can this data be trusted; is it accurate, clean, complete, and reasonably free of errors? - How can our data be made more accurate and valuable for analysis? - Will we need to "scrub," calculate, and consolidate data so that it can be used? *5. Data hosting* - Where will the data systems be housed? - What are the hardware and networking requirements for that effort? *6. Data governance* - What rules and processes are needed to manage this data, from creation through retirement? - Are there operational (backup, disaster recovery), legal or privacy concerns? - How should the company handle access and security?

Four Primary Advantages of using Big Data Technologies (under the Hadoop Umbrella)

*Flexibility:* - data lakes can absorb any type of data, structured or not, from any type of source *Scalability:* - these systems can start on a single PC, but thousands of machines can eventually be combined to work together for storage and analysis. *Cost Effectiveness:* - the technology, much of it open source and that can be started on limited computing resources before scaling, is often considered cheap by data-warehousing standards. *Fault-tolerance:* - big data storage is designed in a way so that there will be no single point of failure - system will continue to work, relying on the remaining hardware.

AI systems Racist or Sexist

- *AI systems (or algorithms) that learn based on data* make any biases in the data a part of the model *Facial Recognition Systems* built using data with mostly Caucasian faces have been shown to be weaker in identifying people of color - misidentified darker-skinned women 35% of the time and dark-skinned men 12% of the time *implications of such AI Systems* - range from poor customer service to harmful, incarceration-causing misclassification by criminal justice and security systems. - *Ex: Amazon.com* stopped use of a system for analyzing job applicant resumes when it turned out the system had learned to favor male applicants and downgrade women - diversity of systems development staff can influence model development - AI systems may reveal biases in processes and workforce that firms aren't even aware exist

Customer Relationship Management Systems (CRM)

- *allows firms* to set up systems to gather additional data beyond conventional purchase transactions or website monitoring. - *used to empower employees* to track and record data at nearly every point of customer contact. - *can capture* all these events for subsequent analysis or for triggering follow-up events. *enterprise software* includes: - CRM systems - Supply Chain Management (SCM) - Enterprise Resource Planning (ERP) systems *enterprise software* (CRM, SCM, and ERP) is a source for customer, supply chain, and enterprise data.

Walmart Shares Sales Data with only Relevant Suppliers

- *data can help firms become more efficient* so Walmart can keep dropping prices - *data can help firms uncover patterns* that help suppliers sell more. - *more than 17,000 suppliers are given access to their products' Walmart performance across metrics* including daily sales, shipments, returns, purchase orders, invoices, claims, and forecasts - Walmart stopped sharing data with information brokers - Walmart custom builds large portions of its information systems to keep competitors off its trail - *To help suppliers become more efficient, and as a result lower prices, Walmart shares data with them*

Benefits from Data Mastery

- *data leverage* is at the center of competitive advantage in firms such as Amazon, Netflix, and Zara. *Walmart:* helped them to the top of the Fortune 500 list. *Spotify*: helped them craft you a killer playlist for your run. *Google*: helped make Google's voice recognition a better listener and Google Search a better detective - encourages firms to make their product the best it possibly can - data-driven insights are credited with helping politicians win elections.

Walmart (Big Data)

- *every hour*, Walmart gathers data on over a million transactions and crunches over 2.5 petabytes of data for continued insights. - *claim* they have built "the world's biggest private cloud," with over 40 petabytes of total data, and growing.

Partnership for Artificial Intelligence to Benefit People and Society

- *industry leaders including Amazon, Apple, Facebook, Google, IBM, and Microsoft*, working with additional participants, such as the human rights organization Amnesty International, have come together to co-create a set of best practices and guidelines - *goal* is to develop and share best practices, ensuring that technology is developed to be ethical, fair, inclusive, transparent, secure, reliable, interoperable, and trustworthy

Data that can be purchased from aggregators

- *may not in and of itself yield sustainable competitive advantage* since others may have access to this data - *BUT* when combined with a firm's proprietary data or integrated with a firm's proprietary procedures or other assets, third-party data can be a key tool for *enhancing organizational performance*

Analytics may not always provide the total solution for a problem

- *sometimes a pattern is uncovered*, but determining the best choice for a response is less clear.

AI System Facts

- AI education is made up of 80% male professors and 75% male undergraduates (meaning 20% female professors and 25% female undergraduates) - Facebook's AI team is only 15% female - Google's AI team is only 10% female, and just 2.5% of Google's workforce is black - at Facebook and Microsoft, just 4% of the workforce is black vs. over 12% across the US population.

Walmart Criticism

- Accusations of sub-par wages draw union activists. - Poor labor conditions at some of the firm's contract manufacturers. - Demand prices so aggressively low that suppliers end up cannibalizing their own sales at other retailers.

New Tools Supporting

- big data, business intelligence, analytics, and machine learning are helping managers make sense of this data torrent.

Data lake

- a catch-all term for storage and access technologies used in Big Data. - *systems that allow* for the storage of data in both structured as well as "raw," "unfiltered" formats. - *provide* the tools to be able to "pipe out" data, filter it, and refine it so that it can be turned into information. - not only hold all data inflows, they also contain the tools allowing users to "dive in" or take samples - *used by* Amazon, Google, IBM, Microsoft, and SAS to refer to product and service offerings.

Data Warehouse

- a set of databases designed to support decision-making in an organization. - is structured for fast online queries and exploration - aggregate enormous amounts of data from many different operational systems. - large data warehouses are complex, can cost millions, and take years to build

Database

- a single table or a collection of related tables - a list (or more likely, several related lists) of data - most organizations have several databases or even 100s or 1000s - *various databases focus on any combination of functional areas* sales, product returns, inventory, payroll, geographical regions, or business units - *specialized databases* are created for recording transactions or aggregate data from multiple sources in order to support reporting and analysis

Neural Networks

- a statistical technique used in AI (particularly in machine learning, and popular in data mining) - examine data and hunt down and expose patterns, in order to build models to exploit findings - *identify patterns* by testing multilayered relationships that humans can't detect on their own - *are often referred to as a "black box,"* meaning that the weights and relationships of data that identify patterns approximate a mathematical function, but are difficult to break out as you would in a traditional mathematical formula

Machine Learning

- a type of artificial intelligence that leverages massive amounts of data so that computers can learn and improve the accuracy of actions and predictions on their own without additional programming - software that contains the ability to learn or improve without being explicitly programmed.

Serverless

- a type of cloud computing where a third-party vendor manages servers, replication, fault-tolerance, computing scalability, and certain aspects of security. - *computing efforts allow firms* to store large amounts of data, including unstructured data, along with tools to extract and use this data - *serverless computing is attractive* in projects where it's OK to have someone else manage the "IT solution" of server, storage, security - *instead focus* on the "business" solution of creating serverless business logic and piping data in and out of serverless storage.

Data Lakes provide

- access to a large pool of data that may be structured or in its "raw" form - *also include tools* for access and analysis, which sets them apart from inaccessible and unusable "data swamps."

Transaction

- any kind of business exchange - *occurs* every time a consumer uses a point-of-sale system, an ATM, or a service desk - representing an event that's likely worth tracking.

Expert Systems

- are AI systems that leverage rules or examples to perform a task in a way that mimics applied human expertise - are used in tasks ranging from medical diagnoses to product configuration. - may be programmed with explicit rules or rules may be automatically built by analyzing specific cases against outcomes
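
A minimal sketch of an expert system "programmed with explicit rules," using product configuration as the task. Every rule, field name, and recommendation here is hypothetical and chosen only to show the shape of a rule base.

```python
# Each rule pairs a condition on the case with the advice to give when it holds.
RULES = [
    (lambda case: case["use"] == "gaming",
     "add dedicated graphics card"),
    (lambda case: case["ram_gb"] < 16 and case["use"] in ("gaming", "video editing"),
     "upgrade to 16 GB RAM"),
    (lambda case: case["travels"],
     "choose lightweight chassis"),
]

def recommend(case):
    """Return the advice of every rule whose condition matches the case."""
    return [advice for condition, advice in RULES if condition(case)]

print(recommend({"use": "gaming", "ram_gb": 8, "travels": False}))
```

Rules that are built automatically from past cases would replace the hand-written `RULES` list, but the matching step would look much the same.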

Machine Learning and AI Challenges

- are increasingly being mentioned in filings with the US Securities and Exchange Commission (SEC) - at least 55 firms mentioned AI risks in 2018, more than double the number from a year earlier.

Genetic Algorithms

- are model-building techniques where computers examine many potential solutions to a problem - modifying (mutating) various mathematical models, and comparing the models to search for a best alternative function.
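
The mutate-and-compare loop can be sketched as follows. This is a simplified illustration under stated assumptions: each candidate is a single number rather than a full mathematical model, the fitness function (closeness to 3) is invented, and there is no crossover step.

```python
import random

def fitness(x):
    """Higher is better; the best possible candidate is x = 3."""
    return -(x - 3.0) ** 2

def evolve(generations=50, pop_size=20, seed=42):
    rng = random.Random(seed)  # seeded so the run is repeatable
    population = [rng.uniform(-10, 10) for _ in range(pop_size)]
    for _ in range(generations):
        # Compare candidates and keep the better half.
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        # Mutate each survivor slightly to create a new candidate.
        children = [parent + rng.gauss(0, 0.5) for parent in survivors]
        population = survivors + children
    return max(population, key=fitness)

best = evolve()
print(best)
```

Because the survivors are carried over unchanged, the best candidate never gets worse from one generation to the next.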

Which reasons indicate why legacy systems often limit data utilization?

- are often not aligned with a firm's current business needs - are often not designed to share data - are often not compatible with newer technologies

Legacy Systems

- are older information systems that are often incompatible with other systems, newer technologies, and ways of conducting business *incompatible legacy systems* can be a major roadblock to turning data into information *incompatible legacy systems* can inhibit firm agility, holding back operational and strategic initiatives. - *problem made worse* by mergers and acquisitions

Canned Reports

- are reports that provide regular summaries of information in a predetermined format. - developed by information systems staff - formats can be difficult to alter.

Data Warehouses and Data Marts

- are repositories for large amounts of transactional data awaiting analytics and reporting. - data stored in data warehouses and data marts is distilled, cleansed, and ready to consume - contain huge volumes of data

Relational Database Management Systems (RDBMS)

- are the most common database standard for creating and manipulating databases - *SQL (or structured query language)* is the most popular standard for relational database systems.

Over-Engineer

- build a model with so many variables that the solution arrived at might only work on the subset of data you've used to create it. - *test to see if you're looking at a random occurrence in the numbers by* dividing your data, building your model with one portion of the data, and using another portion to verify your results
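
Dividing the data into a portion for building the model and a portion for verifying it can be sketched like this. The 75/25 split, the function name, and the fixed seed are assumptions for illustration.

```python
import random

def holdout_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows, then split them into (build, verify) portions."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

# Stand-in dataset of 100 records.
build_rows, verify_rows = holdout_split(range(100))
print(len(build_rows), len(verify_rows))
```

A model that scores well on `build_rows` but poorly on `verify_rows` is a sign of over-engineering.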

Artificial Intelligence (AI)

- computer software that can mimic or improve upon functions that would otherwise require human intelligence. - AI starts with what are sometimes referred to as "naked algorithms" - an explosion of tools is fueling the current spread of AI - data mining has its roots in AI - Google CEO Sundar Pichai says that AI will have a "more profound" impact than electricity or fire - Gartner estimates that AI delivered some $1.2 trillion in aggregate business value in 2018 - IDC predicts worldwide spending on cognitive and Artificial Intelligence systems will reach $77.6B in 2022.

General Data Protection Regulation

- created and drafted by European regulators to provide data protection for Internet users - GDPR rulings have made many accustomed to click-and-continue without paying attention to consent granted, creating increased security concerns

Information

- data presented in a context so that it can answer a question or support decision making. - is the *goal* of data - often combined with a manager's knowledge

Analytics

- describes the extensive use of data, statistical and quantitative analysis, explanatory and predictive models, and fact-based management to drive decisions and actions.

Survey Data

- direct surveys can tell you what your cash register can't - can be used to supplement a firm's operational data. *example* - Zara store managers informally survey customers in order to help shape designs and product mix - *many CRM products* have survey capabilities that allow for additional data gathering at all points of customer contact.

US Health Care System

- estimates suggest that health care spending makes up 18% of US gross domestic product - *example*: US automakers spend more on health care than they do on steel *medical errors are the third leading cause of death in the United States* - responsible for as many as 250,000 unnecessary deaths in the United States each year - more than motor vehicle accidents, breast cancer, or AIDS - *technology* has the potential to reduce errors, improve healthcare quality, and save costs. - pioneering hospital networks and technology companies are partnering to help tackle cost and quality issues.

Data Aggregators

- firms that collect and resell data - are part of a multibillion-dollar industry that provides genuinely helpful data to a wide variety of organizations. - companies that you've likely never heard of but that are thought to have more data on you than Facebook or Google - *according to a report by the US Federal Trade Commission* these include Acxiom, CoreLogic, Datalogix, TransUnion, ID Analytics, Intelius, PeekYou, TowerData, and Recorded Future *information includes* - name - address - Social Security number - category labeled "ability to afford products" - granular categorization *common individuals targeted include* - thrifty elders - new age/organic lifestyle adherents - bikers/Hell's Angels - people who do a lot of medical Googling - *internet* allows for easy access to data that had been public but was otherwise difficult to access (ex: home sale prices and home value assessments)

Catching the Golden State Killer (AI Facial Recognition)

- fugitive known as the Golden State Killer is one of the most heinous criminals in US history. - Police linked him to at least 12 murders, 50 rapes, and over 100 burglaries from 1976 to 1986. *Using a well-preserved piece of biological evidence, officers sequenced the DNA of the Golden State Killer and uploaded it to the GEDMatch website*, an open-source clearinghouse of publicly-shared DNA. - this evidence was all available without requiring a warrant or a court order. - case brought to light additional issues of privacy and ethical data use. - big data and machine learning can keep us safe and deliver social good BUT can expose others to discrimination, extortion, or the public release of sensitive information

Database Administrator (DBA)

- is a job title focused on directing, performing, or overseeing activities associated with a database or set of databases. *include*: - database design - database creation - implementation - maintenance - backup and recovery - policy setting and enforcement - security.

Deep Learning

- is a subcategory of machine learning (and artificial intelligence) that uses multiple layers of interconnections among data to identify patterns and improve predicted results. - uses a set of techniques known as neural networks and is popularly applied in tasks like speech recognition, image recognition, and computer vision. - "deep" refers to the layers of interconnections and analysis that are examined to arrive at results.

Data

- is considered a defensible source of competitive advantage - *raw facts and figures* that must be turned into information in order to be useful and valuable *data a firm can leverage is a true strategic asset when* - rare, valuable, imperfectly imitable, and lacking in substitutes - *early capture of data assets* can be the difference between a dominating firm and an also-ran. - *data is oftentimes considered a defensible source of competitive advantage;* however, advantages based on capabilities and data that others can acquire will be short-lived.

Walmart

- is the largest retailer in the world - over the past several years Walmart has popped in and out of the top spot on the Fortune 500 list - meaning that the firm has had revenues greater than any firm in the United States. - Walmart is so big that in 3 months it sells more than a whole year's worth of sales at Home Depot *Retail Link*: used by Walmart to record a sale and automatically trigger inventory reordering, scheduling, and delivery. - AMR report ranked Walmart as having the 7th best supply chain in the country - deliveries are choreographed to arrive at intervals less than ten minutes apart - shelves are replenished every two days

Structured Query Language (SQL)

- the most common language used to create and manipulate databases - *variants of SQL* inhabit everything from lowly desktop software to high-powered enterprise products. - *used by* LinkedIn, Monster.com, or another job site

Firms that mismanage their customer data assets *risk*

- lawsuits - brand damage - lower sales and fleeing customers - prompt more restrictive legislation.

Types of Artificial Intelligence

- machine learning - deep learning - neural network - expert systems - genetic algorithms

Data Rich, Information Poor

- many organizations are data rich but information poor - survey by Accenture found 57% of companies reporting that they didn't have a beneficial, consistently updated, companywide analytical capability - among major decisions, only 60% were backed by analytics and 40% were made by intuition and gut instinct

Online Analytical Processing (OLAP)

- method of querying and reporting that takes data from standard relational databases, calculates and summarizes the data, and then stores the data in a special database called a *data cube* - *manager using an OLAP tool* can quickly explore and compare data across multiple factors such as time, geography, product lines, etc. - *OLAP users often talk about how they* can "slice and dice" their data, "drilling down" inside the data to uncover new insight - *OLAP tools* can present results through multidimensional graphs, or via spreadsheet-style cross-tab reports.

Operational Data Cannot always be Queried

- most transactional databases aren't set up to be simultaneously accessed for reporting and analysis, *as a consequence* data is not efficiently transformed into information - if a database is asked to analyze historical sales trends showing the most and least profitable products over time, this database analysis requires significant processing

External Sources (for data)

- organizations can have their products sold by partners and can rely heavily on data collected by others *data from external sources might not yield competitive advantage* on its own - BUT can provide key operational insight for increased efficiency and cost savings - when combined with a firm's unique data assets, it can give firms a high-impact edge (or competitive edge when combined with internal data assets) *examples* 1. *Brinker* restaurant chain that runs 1,700 eateries in 27 countries - it supplements its own data with external feeds on weather, employment statistics, gas prices, and other factors - uses this in predictive models that help the firm with everything from determining staffing levels to switching around menu items 2. *Carnival Cruise Lines* - combines its own customer data with third-party information tracking household income and other key measures - helps the firm target limited marketing dollars on those past customers that are able to afford to go on a cruise - during the first 3 years of this system, it experienced double-digit increases in bookings by repeat customers

IMPORTANT STATISTIC

- over 91% of Fortune 1000 senior executives surveyed said big data initiatives were planned or underway, with half of these execs expecting efforts to cost $10 million or more - many organizations lack the skills required to exploit big data *McKinsey estimates* - a US talent shortfall of 140,000 to 190,000 data scientists - further need for 1.5 million more managers and analysts who will have to be savvy consumers of big data analytics

E-discovery

- process of identifying and retrieving relevant electronic information to support litigation efforts *includes* - e-mail, HR reports, memos, voice call logs, transaction data, sensor data, repair records, etc. - should be accounted for in a firm's archiving and data storage plans. - there's no profit in complying with a judge's order, just a sunk cost. *example* - Office of Federal Housing Enterprise Oversight (OFHEO) was subpoenaed for documents in litigation involving mortgage firms Fannie Mae and Freddie Mac - an effort that cost $6 million, a full 9 percent of its total yearly budget

Data Mining

- process of using computers to identify hidden patterns in, and to build models from, large datasets. - *used because* modern datasets can be so large that it might be impossible for humans to spot underlying trends

Inventory Turnover Ratio

- ratio of a company's annual sales to its inventory
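
The definition above translates directly into a one-line calculation. The dollar figures are invented for illustration.

```python
def inventory_turnover(annual_sales, inventory):
    """Ratio of a company's annual sales to its inventory."""
    return annual_sales / inventory

# Hypothetical firm: $1,000,000 in annual sales, $250,000 held in inventory.
print(inventory_turnover(1_000_000, 250_000))
```

A ratio of 4 here would mean the firm sells through its inventory roughly four times a year.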

Database Management System (DBMS)

- refers to software for creating, maintaining, and manipulating data (or databases) - referred to as *database software* *Oracle is the world's largest database software vendor* - Oracle co-founder and CEO Larry Ellison has made significant profits - Ellison ranks in the Top 10 of the Forbes 400 list of wealthiest Americans

Data Aggregators can be *controversial*

- represent a big target for identity thieves - are a method for spreading potentially incorrect data - raise privacy concerns.

IMPORTANT FACT

- research has found that companies ranked in the top third of their industry in the use of data-driven decision making were on average 5% more productive and 6% more profitable than competitors

Corporate Data

- roughly 80% of corporate data is messy and unstructured - *data is not stored* in conventional, relational formats - *data is rather stored* in office productivity documents, e-mail, call center conversations, social media, and data streaming in from disparate sensors in the Internet of Things

Serverless Computing

- separates many of the devops tasks associated with back-end database management, scaling, and security - allows firms to develop and deploy solutions faster than Hadoop-style roll-your-own efforts. - *speed of serverless offerings* may come at the expense of vendor lock-in, as many of the services provided by one firm are proprietary and cannot be easily migrated to another vendor

Hadoop

- set of mostly open-source tools to manage massive amounts of unstructured data for storage, extraction, and computation. *example* of a firm offering Hadoop-based solutions is Cloudera - provides a collection of technologies for manipulating massive amounts of unstructured data - system is flexible, scalable, cost-effective, and fault-tolerant

Relational Database System

- several data fields make up a data record - multiple data records make up a table or data file - one or more tables or data files make up a database - files that are related to one another are linked based on a common field (or fields) known as a key. - *If the value of a key is unique to a record in a table*, and that value can never occur in that field while referring to another record in that table, then it is a *primary key*. - *If a key can occur many times over multiple records in a table* but relates back to a primary key in another table, then it is a *foreign key*
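
The primary key and foreign key ideas above can be sketched with Python's built-in `sqlite3` module. The table names, column names, and sample rows are hypothetical, chosen only to illustrate the definitions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# student_id is the primary key: each value identifies exactly one student record.
conn.execute("""
    CREATE TABLE students (
        student_id INTEGER PRIMARY KEY,
        last_name  TEXT NOT NULL
    )""")

# enrollments.student_id is a foreign key: it can repeat across many records,
# but each value relates back to a primary key in the students table.
conn.execute("""
    CREATE TABLE enrollments (
        enrollment_id INTEGER PRIMARY KEY,
        student_id    INTEGER NOT NULL REFERENCES students(student_id),
        course        TEXT NOT NULL
    )""")

conn.executemany("INSERT INTO students VALUES (?, ?)", [(1, "Lee"), (2, "Ortiz")])
conn.executemany(
    "INSERT INTO enrollments (student_id, course) VALUES (?, ?)",
    [(1, "MGMT 300"), (1, "STAT 225"), (2, "MGMT 300")],
)

# The common field (the key) links the two tables.
rows = conn.execute("""
    SELECT s.last_name, e.course
    FROM students s JOIN enrollments e ON e.student_id = s.student_id
    ORDER BY s.student_id, e.course""").fetchall()
print(rows)

# A second record with the same primary key value is rejected.
try:
    conn.execute("INSERT INTO students VALUES (1, 'Duplicate')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```

Student 1 appears twice in `enrollments` (the foreign key repeats) but only once in `students` (the primary key is unique).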

Data Cube

- special database used to store data in OLAP reporting. - makes OLAP fast, sometimes 1,000 times faster than performing comparable queries against conventional relational databases - *data cubes for OLAP access* are often part of a firm's data mart and data warehouse efforts.
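
One way to picture why a data cube answers questions quickly is that totals are summarized ahead of time along every dimension. The sketch below illustrates that idea only. It is not how any particular OLAP product is built, and the dimensions, sample sales, and `ALL` marker are hypothetical.

```python
from collections import defaultdict
from itertools import product

# Hypothetical sales facts: (region, item, quarter, amount)
sales = [
    ("East", "Shoes", "Q1", 100),
    ("East", "Shoes", "Q2", 150),
    ("East", "Hats", "Q1", 40),
    ("West", "Shoes", "Q1", 90),
    ("West", "Hats", "Q2", 60),
]

ALL = "*"  # stands for "every value of this dimension"
cube = defaultdict(int)

# Pre-summarize each sale into every combination of specific value or ALL,
# so later questions become simple lookups instead of scans of the raw data.
for region, item, quarter, amount in sales:
    for key in product((region, ALL), (item, ALL), (quarter, ALL)):
        cube[key] += amount

print(cube[("East", ALL, ALL)])    # total for East: 290
print(cube[(ALL, "Shoes", "Q1")])  # Shoes in Q1, all regions: 190
print(cube[(ALL, ALL, ALL)])       # grand total: 440
```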

CAPTCHAs

- stands for Completely Automated Public Turing test to tell Computers and Humans Apart. - *Turing test* is an idea (rather than an official test) that one can create a test to tell computers apart from humans. *example* the "I am not a robot" checkmark online *Google reCAPTCHA* - asked users to prove their superior-to-robot chops by classifying images, such as hard-to-read letters and house numbers, and by identifying squares containing stop lights and traffic signs *OCR* - Optical Character Recognition - software that can scan images and identify text within them.

Loyalty Card

- systems that provide rewards and usage incentives, typically in exchange for a method that provides a more detailed tracking and recording of customer activity - *represent* a significant switching cost - *when used by* grocers and retailers, can tie you to cash transactions and track customer activity - *by using one of these cards* you are giving up information about yourself in exchange for some kind of financial incentive. - firms use retailer cards to make you a more loyal and satisfied customer. *examples* CVS Pharmacy ExtraCare card - provides an instant discount; Best Buy's Reward Zone - allows you to build up points over time

Transaction Processing System

- systems that record a transaction or some form of business-related exchange, such as a cash register sale, ATM withdrawal or product return. - *used by* most organizations that sell directly to their customers

DevOps (development operations)

- the costly care and feeding of a firm's technology infrastructure

Ad Hoc Reporting Tools

- tools that put users in control so that they can create custom reports on an as-needed basis by selecting fields, ranges, summary conditions, and other parameters. - *allow users to dive in* and build their own reports on the fly

Big Data

- used to describe the massive amount of data available to today's managers. - are *often unstructured* and are too big and costly to easily work through using conventional databases - collection, storage, and analysis of extremely large, complex, and often unstructured data sets that can be used by organizations to generate insights that would otherwise be impossible to make. - 2 petabytes is a data amount greater than the combined contents of all of the academic research libraries in the United States

Three Vs of Big Data

- volume, velocity, and variety - these distinguish big data from conventional data analysis problems and require a new breed of technology - *big data* is forcing companies to rethink both the technology infrastructure and the necessary skills needed to successfully interpret and act on information

Walmart's Data Mining Prowess

- Walmart mines its mother lode of data to get its product mix right under all sorts of *varying environmental conditions* - *data mining protects Walmart* from "a retailer's twin nightmares": too much inventory or too little inventory - *data mining helps Walmart* tighten operational forecasts, helping to predict things like how many cashiers are needed at a given store at various times of day throughout the year. - *data mining drives the organization*, with its data reports forming the basis of sales meetings and executive strategy sessions

Unsupervised Learning

- where data are not explicitly labeled and don't have a predetermined result. - *example* clustering customers into previously unknown groupings
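
The clustering example above can be sketched with a bare-bones k-means (k = 2) on one-dimensional data. K-means is swapped in here purely for illustration, since the card does not name a clustering method. The function name and the annual-spend figures are hypothetical.

```python
def two_means(values, iterations=10):
    """Split 1-D values into two clusters with a bare-bones k-means (k=2)."""
    # Start the two cluster centers at the smallest and largest values.
    centroids = [min(values), max(values)]
    clusters = [[], []]
    for _ in range(iterations):
        clusters = [[], []]
        # Assign each value to whichever center is nearer.
        for v in values:
            nearest = 0 if abs(v - centroids[0]) <= abs(v - centroids[1]) else 1
            clusters[nearest].append(v)
        # Move each center to the mean of its assigned values.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments have stopped changing
        centroids = new_centroids
    return clusters


# Hypothetical annual spend per customer; no labels are supplied.
annual_spend = [120, 150, 130, 900, 950, 1000]
low_spenders, high_spenders = two_means(annual_spend)
print(low_spenders)   # [120, 150, 130]
print(high_spenders)  # [900, 950, 1000]
```

No one told the code which customers were "low" or "high" spenders; the groupings emerge from the data alone.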

Semi-Supervised Learning

- where the data used to build models that determine an end result may contain both data with explicitly labeled outputs and unlabeled data

Redlining Laws

- redlining laws (which address cases where geographic exclusion is a proxy for race) mean that firms may need to prove that inferences buried deep in their systems aren't simply proxies for illegal exclusionary criteria, such as race

Differentiation

- will be key in distinguishing operationally effective data use from those efforts that can yield true strategic positioning.

Problems with using Bad Data (when data mining)

- wrong estimates from bad data leave firms grossly overexposed to risk (a factor in the 2008 financial crisis) *when the market does not behave as it has in the past, computer-driven investment models are not effective* - failures can result from "hundred-year floods" or black swans, where the events are so extreme or unusual that they never show up in the data used to build the model *models influenced by bad data, missing or incomplete historical data, and over-engineering are prone to yield bad results*

Key Concepts Associated with Database Systems

*1. Row or Record* - *row* in a database table represents one entry; for example, each row in a "students" table represents a student - *record* represents a single instance of whatever the table keeps track of *2. Column or Field* - *columns* represent each category of data contained in a record (define the data that a table can hold) - categories of data (first name, last name, ID number, date of birth) *3. Table (also known as a File)* - list of data, arranged in columns or fields and rows or records. *4. Relational Database* - most common standard for expressing databases, whereby tables (files) are related based on common keys. - all SQL databases are relational databases *database* is either a single table or a collection of related tables

Steps in Developing and Deploying More Ethical, Less Risk-Prone Systems

1. *Hire ethicists* - few technologists (or managers in general) have had significant training in technology ethics. - Microsoft and Salesforce are among firms that have brought in professional ethicists 2. *Develop a code of technology ethics* - many organizations have "core values" shared throughout the organization. - Firms should develop and continue to refine systems development ethics that lay out how various issues will be handled, in an effort to keep risks and responsibility top-of-mind among technologists and other decision-makers 3. *Create a systems review board* - It's important that many voices are involved in identifying, heading off, or responding to issues that may arise. - The board should have the perspective of deeply knowledgeable technologists, senior executives, and legal and public relations representatives. 4. *Create and enforce technology audit trails* - an audit trail exposes how and when information systems are used so that the way a firm arrived at a particular outcome can be identified. 5. *Implement strong tech and procedural training programs* - a broad training program would raise legal, ethical, and technical issues so that everyone at all levels in the organization becomes a partner in improvement. 6. *Provide a means for remediation* - a study led by technology and audit firms found that only 43% of organizations have clear procedures for overriding AI results that are suspicious or questionable.

Data mining and Business Analytics team needs *three skills*

1. *information technology* - used for understanding how to pull together data, and for selecting analysis tools 2. *statistics* - used for building models and interpreting the strength and validity of results 3. *business knowledge* - used for helping set system goals and requirements, and offering deeper insight into what the data really says about the firm's operating environment

Lack of Diverse Perspectives Among Software Development and Managerial Teams

can lead to systems that put an organization at risk

