MIS 180 - Ch. 6


Techniques a Data Scientist Will Use to Perform Big Data Advanced Analytics

Behavioral analysis: using data about people's behaviors to understand intent and predict future actions.
Correlation analysis: determines a statistical relationship between variables, often for the purpose of identifying predictive factors among the variables.
Exploratory data analysis: identifies patterns in data, including outliers, uncovering the underlying structure to understand relationships between the variables.
Pattern recognition analysis: the classification or labeling of an identified pattern in the machine learning process.
Social media analysis: analyzes text flowing across the Internet, including unstructured text from blogs and messages.
Speech analysis: the process of analyzing recorded calls to gather information; it brings structure to customer interactions and exposes information buried in customer contact center interactions with an enterprise. Speech analysis is heavily used in customer service departments to help improve processes by identifying angry customers and routing them to the appropriate customer service representative.
Text analysis: analyzes unstructured data to find trends and patterns in words and sentences. Text mining a firm's customer support email might identify which customer service representative is best able to handle the question, allowing the system to forward it to the right person.
Web analysis: analyzes unstructured data associated with websites to identify consumer behavior and website navigation.
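Correlation analysis can be made concrete with the Pearson correlation coefficient r. This is one common measure of linear relationship; the text does not name a specific statistic. The sketch below uses only the standard library, and the ad-spend and sales figures are hypothetical.

```python
from math import sqrt


def pearson_correlation(xs, ys):
    """Return the Pearson correlation coefficient r for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance numerator and the two variance terms (the 1/n factors cancel out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: weekly ad spend (in $1,000s) and units sold that week.
ad_spend = [1, 2, 3, 4, 5]
units_sold = [12, 15, 21, 24, 30]
r = pearson_correlation(ad_spend, units_sold)
print(round(r, 3))  # 0.993 -> strong positive relationship
```

Values of r near +1 or -1 indicate a strong linear relationship and values near 0 a weak one. A high correlation alone does not show that one variable causes the other.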

Big data is one of the most promising technology trends occurring today

Of course, notable companies such as Facebook, Google, and Netflix are currently gaining the most business insights from big data, but many smaller markets are entering the scene, including retail, insurance, and health care. Over the next decade, as big data starts to improve your everyday life by providing insights into your social relationships, habits, and careers, you can expect the need for data scientists and data artists to increase dramatically.

data artist

a business analytics specialist who uses visual tools to help people understand complex data. Great data visualizations provide insights into something new about the underlying patterns and relationships. Just think of the periodic table of elements and imagine if you had to look at an Excel spreadsheet showing each element and its associated attributes in a table format; this would be not only difficult to understand but also easy to misinterpret. By placing the elements in the visual periodic table, you quickly grasp how the elements relate and the associated hierarchy. Data artists are experts at creating a story from the information. Infographics perform the same function for business data as the periodic table does for chemical elements.

repository

a central location in which data is stored and managed

record

a collection of related data elements. Each record in an entity occupies one row in its respective table.

data warehouse

a logical collection of information, gathered from many different operational databases, that supports business analysis activities and decision-making tasks. The primary purpose of a data warehouse is to combine information, more specifically strategic information, throughout an organization into a single repository in such a way that the people who need that information can make decisions and undertake business analysis. A key idea within data warehousing is to collect information from multiple systems in a common location that uses a universal querying tool. This allows operational databases to run where they are most efficient for businesses, while providing a common location using a familiar format for strategic or enterprisewide reporting information. Data warehouses go even a step further by standardizing information. Gender, for instance, can be referred to in many ways (male, female, M/F, 1/0), but it should be standardized in a data warehouse with one common way of referring to each data element that stores gender (M/F). Standardization of data elements allows for greater accuracy, completeness, and consistency, and it increases the quality of the information used in making strategic business decisions. The data warehouse is simply a tool that enables business users, typically managers, to be more effective in many ways, including:
- Developing customer profiles
- Identifying new-product opportunities
- Improving business operations
- Identifying financial issues
- Analyzing trends
- Understanding competitors
- Understanding product performance
Applebee's has found tremendous value in its data warehouse by being able to make business decisions about customers' regional needs. The company also uses data warehouse information to:
- Base labor budgets on the actual number of guests served per hour
- Develop promotional sale item analysis to help avoid losses from overstocking or understocking inventory
- Determine theoretical and actual costs of food and the use of ingredients

information integrity

a measure of the quality of information

foreign key

a primary key of one table that appears as an attribute in another table and acts to provide a logical relationship between the two tables. Creating the logical relationship between the tables allows managers to search the data and turn it into useful information. To manage and organize various entities within the relational database model, you use primary keys and foreign keys to create logical relationships.
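The sketch below uses Python's built-in `sqlite3` module. It shows a primary key in a `customer` table appearing as a foreign key in an `orders` table, and a join query that follows that logical relationship. The table names, column names, and rows are hypothetical. Two customers share the name Steve Smith to show why the query filters on the ID rather than the name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,   -- primary key: uniquely identifies each customer
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    -- foreign key: the customer table's primary key stored as an attribute here
    customer_id INTEGER REFERENCES customer(customer_id),
    total       REAL
);
INSERT INTO customer VALUES (1, 'Steve Smith'), (2, 'Steve Smith'), (3, 'Anne Logan');
INSERT INTO orders VALUES (100, 1, 25.0), (101, 1, 40.0), (102, 3, 15.5);
""")

# Join the tables through the foreign key; filter on the unique ID, not the name.
rows = conn.execute("""
    SELECT c.name, o.order_id, o.total
    FROM orders AS o
    JOIN customer AS c ON c.customer_id = o.customer_id
    WHERE c.customer_id = ?
    ORDER BY o.order_id
""", (1,)).fetchall()
print(rows)  # [('Steve Smith', 100, 25.0), ('Steve Smith', 101, 40.0)]
```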

infographic (information graphic)

a representation of information in a graphic format designed to make the data easily understandable at a glance. Infographics present the results of data analysis, displaying the patterns, relationships, and trends in a graphical format. Traditional bar graphs and pie charts are boring and, at best, confusing and, at worst, misleading. As databases and graphics collide more and more, people are creating infographics, which display information graphically so it can be easily understood. Infographics are exciting and quickly convey a story users can understand without having to analyze numbers, tables, and boring charts.

relational database management system

allows users to create, read, update, and delete data in a relational database

dynamic catalog

an area of a website that stores information about products in a database. Dynamic website information is stored in a dynamic catalog.

outlier

a data value that is numerically distant from most of the other data points in a data set
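One common way to flag outliers is the 1.5 × IQR rule: a value is flagged if it lies more than 1.5 interquartile ranges outside the middle half of the data. The text does not prescribe a method, so treat this rule and the sample counts below as assumptions. The sketch uses `statistics.quantiles` from the standard library.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Flag values more than k * IQR below the first quartile or above the third."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points (default 'exclusive' method)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical daily order counts with one suspicious spike.
daily_orders = [10, 12, 11, 13, 12, 95, 11, 12]
print(iqr_outliers(daily_orders))  # [95]
```

Quartiles are used here because, unlike the mean and standard deviation, they are barely moved by the outlier itself.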

data visualization

describes technologies that allow users to "see" or visualize data to transform information into a business perspective. Data visualization is a powerful way to simplify complex data sets by placing data in a format that is easily grasped and understood far more quickly than the raw data alone.

transactional information

encompasses all of the information contained within a single business process or unit of work, and its primary purpose is to support daily operational tasks. Organizations need to capture and store transactional information to perform operational tasks and repetitive decisions, such as analyzing daily sales reports and production schedules to determine how much inventory to carry. Examples include Walmart, which handles more than 1 million customer transactions every hour, and Facebook, which keeps track of 400 million active users along with their photos and web links. In addition, every time a cash register rings up a sale, a deposit or withdrawal is made from an ATM, or a receipt is given at the gas pump, transactional information must be captured and stored. Examples: airline ticket, packing slip, sales receipt.

data visualization tools

move beyond Excel graphs and charts into sophisticated analysis techniques such as pie charts, controls, instruments, maps, time-series graphs, etc. Data visualization tools can help uncover correlations and trends in data that would otherwise go unrecognized.

analysis paralysis

occurs when a user goes into an emotional state of over-analyzing (or over-thinking) a situation so that a decision or action is never taken, in effect paralyzing the outcome. In the time of big data, analysis paralysis is a growing problem. One solution is to use data visualizations to help people make decisions faster.

business intelligence dashboard

tracks corporate metrics such as critical success factors and key performance indicators and includes advanced capabilities such as interactive controls, allowing users to manipulate data for analysis. The majority of business intelligence software vendors offer a number of data visualization tools and business intelligence dashboards.

information types

- transactional information
- analytical information

data mining process model overview

1. Business understanding: gain a clear understanding of the business problem that must be solved and how it impacts the company. Activities: identify business goals, situation assessment, define data-mining goals, create project plan.
2. Data understanding: analysis of all current data along with identifying any data quality issues. Activities: gather data, describe data, explore data, verify data quality.
3. Data preparation: gather and organize the data in the correct formats and structures for analysis. Activities: select data, cleanse data, integrate data, format data.
4. Data modeling: apply mathematical techniques to identify trends and patterns in the data. Activities: select modeling technique, design tests, build models.
5. Evaluation: analyze the trends and patterns to assess the potential for solving the business problem. Activities: evaluate results, review process, determine next steps.
6. Deployment: deploy the discoveries to the organization for work in everyday business. Activities: plan deployment, monitor deployment, analyze results, review final reports.

The Four Primary Reasons for Low Quality Information

1. Online customers intentionally enter inaccurate information to protect their privacy.
2. Different systems have different information entry standards and formats.
3. Data-entry personnel enter abbreviated information to save time or erroneous information by accident.
4. Third-party and external information contains inconsistencies, inaccuracies, and errors.

data latency

the time it takes for data to be stored or retrieved. Databases today scale to exceptional levels, allowing all types of users and programs to perform information-processing and information-searching tasks. Some organizations must be able to support hundreds or thousands of users, including employees, partners, customers, and suppliers, who all want to access and share the same information with minimal data latency.

data warehousing components

- Data mart
- Information cleansing
- Business intelligence

data mining techniques

- Estimation analysis
- Affinity grouping analysis
- Cluster analysis
- Classification analysis

There are a number of advantages to using the web to access company databases

First, web browsers are much easier to use than directly accessing the database with a custom-query tool. Second, the web interface requires few or no changes to the database model. Finally, it costs less to add a web interface in front of a DBMS than to redesign and rebuild the system to support changes. Additional data-driven website advantages include:
- Easy to manage content: website owners can make changes without relying on MIS professionals; users can update a data-driven website with little or no training.
- Easy to store large amounts of data: data-driven websites can keep large volumes of information organized. Website owners can use templates to implement changes for layouts, navigation, or website structure. This improves website reliability, scalability, and performance.
- Easy to eliminate human errors: data-driven websites trap data-entry errors, eliminating inconsistencies while ensuring all information is entered correctly.
Zappos credits its success as an online shoe retailer to its vast inventory of nearly 3 million products available through its dynamic data-driven website. The company built its data-driven website to cater to a specific niche market: consumers who were tired of finding that their most desired items were always out of stock at traditional retailers. Zappos' highly flexible, scalable, and secure database helped it rank as one of the most-available Internet retailers. Companies can gain valuable business knowledge by viewing the data accessed and analyzed from their website. Running queries or using analytical tools, such as a PivotTable, on the database attached to the website can offer insight into the business, such as items browsed, frequent requests, items bought together, and so on.

forecasting model

Forecasts are predictions based on time-series information, allowing users to manipulate the time series for forecasting activities.
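A simple moving average is one of the most basic forecasting techniques; the text does not specify a method. It forecasts the next period as the average of the most recent periods. In the sketch below, the time series is a list of (period, value) pairs, and the sales figures and window size are hypothetical.

```python
def moving_average_forecast(series, window=3):
    """Forecast the next period as the mean of the last `window` observed values.

    `series` is a list of (period, value) pairs in time order, i.e. time-stamped
    time-series information.
    """
    if not 0 < window <= len(series):
        raise ValueError("window must be between 1 and the length of the series")
    recent_values = [value for _, value in series[-window:]]
    return sum(recent_values) / window


# Hypothetical monthly sales (units), oldest first.
monthly_sales = [
    ("2024-01", 120), ("2024-02", 130), ("2024-03", 125),
    ("2024-04", 140), ("2024-05", 150), ("2024-06", 145),
]
print(moving_average_forecast(monthly_sales))            # (140 + 150 + 145) / 3 = 145.0
print(moving_average_forecast(monthly_sales, window=6))  # mean of all six months: 135.0
```

Changing `window` is one simple way of manipulating the time series. A short window reacts quickly to recent changes, while a long window smooths out noise.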

history of the data warehouse

In the 1990s, as organizations began to need more timely information about their business, they found that traditional management information systems were too cumbersome to provide relevant information efficiently and effectively. Most of these systems were operational databases designed for specific business functions, such as accounting, order entry, customer service, and sales, and they were not appropriate for business analysis for the reasons listed under "reasons business analysis is difficult from operational databases." During the latter half of the 20th century, the number and types of operational databases increased. Many large businesses found themselves with information scattered across multiple systems with different file types (such as spreadsheets, databases, and even word processing files), making it almost impossible for anyone to use the information from multiple sources. Completing reporting requests across operational systems could take days or weeks using antiquated reporting tools that were ineffective for running a business. From this idea, the data warehouse was born as a place where relevant information could be stored and accessed for making strategic queries and reports. Lands' End created an organizationwide data warehouse so all its employees could access organizational information. Lands' End soon found out that there could be too much of a good thing: many of its employees would not use the data warehouse because it was simply too big, too complicated, and full of irrelevant information. Lands' End knew there was valuable information in its data warehouse, and it had to find a way for its employees to easily access that information; data marts were the perfect solution to the company's information overload problem. Once the employees began using the data marts, they were ecstatic at the wealth of information.

reasons business analysis is difficult from operational databases

- Inconsistent Data Definitions: every department had its own method for recording data, so when trying to share information, data did not match and users did not get the data they really needed.
- Lack of Data Standards: managers needed to perform cross-functional analysis using data from all departments, which differed in granularities, formats, and levels.
- Poor Data Quality: the data, if available, were often incorrect or incomplete; therefore, users could not rely on the data to make decisions.
- Inadequate Data Usefulness: users could not get the data they needed; what was collected was not always useful for intended purposes.
- Ineffective Direct Data Access: most data stored in operational databases did not allow users direct access; users had to wait to have their queries or questions answered by MIS professionals who could code SQL.

Business Advantages of a Relational Database

- Increased flexibility: Databases tend to mirror business structures, and a database needs to handle changes quickly and easily, just as any business needs to be able to do. Equally important, databases need to provide flexibility in allowing each user to access the information in whatever way best suits his or her needs. The distinction between logical and physical views is important in understanding flexible database user views. While a database has only one physical view, it can easily support multiple logical views that provide for flexibility. One user could perform a query to determine which recordings had a track length of four minutes or more, while at the same time another user performs an analysis to determine the distribution of recordings across the different categories: for example, are there more R&B recordings than rock, or are they evenly distributed? In another example, a mail-order business, one user might want a report presented in alphabetical format, in which case last name should appear before first name. Another user, working with a catalog mailing system, would want customer names appearing as first name and then last name. Both are easily achievable, but they are different logical views of the same physical information.
- Increased scalability and performance: a database has to be scalable to handle the massive volumes of information and the large numbers of users expected, for example, at the launch of a website; in addition, the database needs to perform quickly under heavy use.
- Reduced information redundancy
- Increased information integrity
- Increased information security: Managers must protect information, like any asset, from unauthorized users or misuse. As systems become increasingly complex and highly available over the Internet on many different devices, security becomes an even bigger issue. Databases offer many security features, including passwords to provide authentication, access levels to determine who can access the data, and access controls to determine what type of access they have to the information. For example, customer service representatives might need read-only access to customer order information so they can answer customer order inquiries; they might not have or need the authority to change or delete order information. Various security features of databases can ensure that individuals have only certain types of access to certain types of information.

information cube

the common term for the representation of multidimensional information. A relational database contains information in a series of two-dimensional tables. With big data, information is multidimensional, meaning it contains layers of columns and rows. A dimension is a particular attribute of information. Each layer in big data represents information according to an additional dimension. Cube a represents store information (the layers), product information (the rows), and promotion information (the columns). Once a cube of information is created, users can begin to slice and dice the cube to drill down into the information. The second cube (cube b) displays a slice representing promotion II information for all products at all stores. The third cube (cube c) displays only information for promotion III, product B, at store 2. By using multidimensional analysis, users can analyze information in a number of different ways and with any number of different dimensions. For example, users might want to add dimensions of information to a current analysis, including product category, region, and even forecasts for actual weather. The true value of big data is its ability to provide multidimensional analysis that allows users to gain insights into their information. Big data is ideal for offloading some of the querying against a database. For example, querying a database to obtain an average of sales for product B at store 2 while promotion III is under way might create a considerable processing burden for a database, essentially slowing down the time it takes another person to enter a new sale into the same database. If an organization performs numerous queries against a database (or multiple databases), aggregating that information into big data databases could be beneficial.
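A minimal sketch of slicing and dicing, assuming the cube is held in memory as a Python dictionary keyed by (store, product, promotion). The dimension values follow the text's example; the sales numbers are generated placeholders, not real data. Passing one dimension value gives a slice like cube b, and fixing all three picks out a single cell like cube c.

```python
# Dimension values, following the text's store / product / promotion example.
stores = ["Store 1", "Store 2", "Store 3"]
products = ["A", "B", "C"]
promotions = ["I", "II", "III"]

# Hypothetical sales value for every (store, product, promotion) cell of the cube.
cube = {
    (store, product, promo): (s + 1) * 100 + (p + 1) * 10 + (m + 1)
    for s, store in enumerate(stores)
    for p, product in enumerate(products)
    for m, promo in enumerate(promotions)
}


def slice_cube(cube, store=None, product=None, promotion=None):
    """Keep only the cells matching each dimension value given; None keeps every value."""
    wanted = (store, product, promotion)
    return {
        key: value
        for key, value in cube.items()
        if all(w is None or k == w for k, w in zip(key, wanted))
    }


promo_ii = slice_cube(cube, promotion="II")  # like cube b: promotion II, all products, all stores
cell = slice_cube(cube, store="Store 2", product="B", promotion="III")  # like cube c
print(len(promo_ii), sum(promo_ii.values()))  # 9 1998
print(cell)  # {('Store 2', 'B', 'III'): 223}
```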

model of a typical data warehouse

Internal databases: marketing, sales, inventory, billing
External databases: competitor information, industry information, mailing lists, stock market analysis
Both are compiled into a data warehouse through ETL.
Data warehouse: marketing, inventory, sales, billing, competitor, industry, mailing list, and stock market analysis information, which then passes through ETL to the marketing data mart, inventory data mart, or sales data mart.
Ask a simple question, such as who is my best customer or what is my worst-selling product, and you might get as many answers as you have employees. Databases, data warehouses, and data marts can provide a single source of "trusted" data that can answer questions about customers, products, suppliers, production, finances, fraud, and even employees.

two primary tools available for retrieving information from a DBMS

- Query-by-example (QBE) tool
- Structured query language (SQL)
A DBMS uses three primary data models for organizing information: hierarchical, network, and relational, the relational database being the most prevalent. Although the hierarchical and network models are important, this text focuses only on the relational database model.

two types of integrity constraints

- Relational integrity constraint
- Business-critical integrity constraint
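A sketch of how a DBMS can enforce both kinds of constraint, using Python's `sqlite3` module. The assumptions are that a relational integrity constraint is expressed as a foreign key and that a business rule is expressed as a `CHECK` constraint. The rule "quantity must be positive" is a hypothetical example, not one from the text. SQLite rejects both bad inserts, and the script reports why.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled on the connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    -- Relational integrity: every order must point at an existing customer.
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    -- Hypothetical business rule: an order must contain at least one unit.
    quantity    INTEGER NOT NULL CHECK (quantity > 0)
);
INSERT INTO customer VALUES (1, 'Hans Hultgren');
""")


def try_insert(order_id, customer_id, quantity):
    """Attempt to insert an order; report whether the DBMS accepted it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO orders VALUES (?, ?, ?)",
                (order_id, customer_id, quantity),
            )
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


ok = try_insert(1, 1, 5)            # valid order
no_customer = try_insert(2, 99, 5)  # customer 99 does not exist
bad_quantity = try_insert(3, 1, 0)  # violates the business rule
print(ok, no_customer, bad_quantity, sep="\n")
```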

Understanding the costs of using low-quality information

Some of the serious business consequences that occur due to using low-quality information to make decisions are:
- Inability to accurately track customers
- Difficulty identifying the organization's most valuable customers
- Inability to identify selling opportunities
- Lost revenue opportunities from marketing to nonexistent customers
- The cost of sending undeliverable mail
- Difficulty tracking revenue because of inaccurate invoices
- Inability to build strong relationships with customers

structured and unstructured data examples

Structured data:
- Sensor data
- Weblog data
- Financial data
- Click-stream data
- Point-of-sale data
- Accounting data
Unstructured data:
- Satellite images
- Photographic data
- Video data
- Social media data
- Text messages
- Voice mail data

data mart

contains a subset of data warehouse information; the data warehouse sends subsets of its information to data marts. To distinguish between data warehouses and data marts, think of data warehouses as having a more organizational focus and data marts as having focused information subsets particular to the needs of a given business unit, such as finance or production and operations.

identity management

a broad administrative area that deals with identifying individuals in a system (such as a country, a network, or an enterprise) and controlling their access to resources within that system by associating user rights and restrictions with the established identity. Security risks are increasing as more and more databases and DBMS systems move to data centers run in the cloud. The biggest risks when using cloud computing are ensuring the security and privacy of the information in the database. Implementing data governance policies and procedures that outline the data management requirements can ensure safe and secure cloud computing.

data broker

a business that collects personal information about consumers and sells that information to other organizations. Before the start of the information age in the late 20th century, businesses sometimes collected information from nonautomated sources. Businesses then lacked the computing resources to properly analyze the information and often made commercial decisions based primarily on intuition. As businesses started automating more and more systems, more and more information became available. However, collection remained a challenge due to the lack of infrastructure for information exchange or to incompatibilities between systems; reports sometimes took months to generate. Such reports allowed informed long-term strategic decision making. However, short-term tactical decision making continued to rely on intuition. In modern businesses, increasing standards, automation, and technologies have led to vast amounts of available information. Data warehouse technologies have set up repositories to store this information. Improved ETL has increased the speed of collecting information. Business intelligence has now become the art of sifting through large amounts of data, extracting information, and turning that information into actionable knowledge.

recommendation engine

a data-mining algorithm that analyzes a customer's purchases and actions on a website and then uses the data to recommend complementary products. Netflix uses a recommendation engine, Cinematch, to analyze each customer's film-viewing habits and provide recommendations for other customers. Using Cinematch, Netflix can present customers with a number of additional movies they might want to watch based on the customer's current preferences. Netflix's innovative use of data mining provides its competitive advantage in the movie rental industry.
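A minimal "bought together" recommender based on co-occurrence counts, one simple approach among many. It is not Cinematch, whose internals the text does not describe. Products that often appear in the same purchase histories as items the customer already owns score highest. The customers and products are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase histories: customer -> set of products bought.
histories = {
    "ann": {"tent", "sleeping bag", "lantern"},
    "bob": {"tent", "sleeping bag"},
    "cy": {"tent", "stove"},
    "dee": {"sleeping bag", "lantern"},
}


def co_occurrence(histories):
    """Count how often each ordered pair of products was bought by the same customer."""
    counts = Counter()
    for items in histories.values():
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def recommend(customer, histories, top_n=2):
    """Score products the customer lacks by how often they co-occur with products they own."""
    owned = histories[customer]
    scores = Counter()
    for (a, b), count in co_occurrence(histories).items():
        if a in owned and b not in owned:
            scores[b] += count
    # Highest score first; ties broken alphabetically so the output is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


print(recommend("bob", histories))  # ['lantern', 'stove']
print(recommend("cy", histories))   # ['sleeping bag', 'lantern']
```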

primary key

a field, or group of fields, that uniquely identifies a given record in a table. Primary keys are a critical piece of a relational database because they provide a way of distinguishing each record in a table. For instance, imagine you need to find information on a customer named Steve Smith. Simply searching the customer name would not be an ideal way to find information because there might be 20 customers with the name Steve Smith. This is the reason the relational database model uses primary keys to uniquely identify each record; using Steve Smith's unique ID allows a manager to search the database to identify all information associated with this customer.

extraction, transformation, and loading (ETL)

a process that extracts information from internal (transactional or operational) databases and external databases, transforms it using a common set of enterprise definitions, and loads it into a data warehouse
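A toy ETL pass, assuming two in-memory operational sources that name fields differently and record gender in different formats. This echoes the male/female, M/F, and 1/0 example from the data warehouse card. The transform step applies one common enterprise definition (M/F), and the load step appends the results to a list standing in for the warehouse. All rows, field names, and the 1 → M, 0 → F mapping are hypothetical.

```python
# Hypothetical operational sources that record the same facts in different formats.
billing_rows = [
    {"cust_id": "C1", "name": "Hans Hultgren", "gender": "male"},
    {"cust_id": "C2", "name": "Anne Logan", "gender": "Female"},
]
marketing_rows = [
    {"id": 3, "full_name": "Paul Bauer", "sex": 1},
    {"id": 4, "full_name": "Deborah Wallbridge", "sex": 0},
]

# Common enterprise definition for gender: M/F. Mapping 1 -> M and 0 -> F is an assumption.
GENDER_CODES = {"male": "M", "m": "M", "1": "M", "female": "F", "f": "F", "0": "F"}


def extract():
    """Pull raw rows from each source, tagged with where they came from."""
    for row in billing_rows:
        yield "billing", row
    for row in marketing_rows:
        yield "marketing", row


def transform(source, row):
    """Rename fields and standardize codes so every record shares one layout."""
    if source == "billing":
        customer_id, name, gender = row["cust_id"], row["name"], row["gender"]
    else:
        customer_id, name, gender = f"C{row['id']}", row["full_name"], row["sex"]
    return {
        "customer_id": customer_id,
        "name": name,
        "gender": GENDER_CODES[str(gender).strip().lower()],
    }


def load(records, warehouse):
    """Append the transformed records to the (in-memory) warehouse table."""
    warehouse.extend(records)


warehouse = []
load((transform(source, row) for source, row in extract()), warehouse)
for record in warehouse:
    print(record)
```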

information cleansing or scrubbing

a process that weeds out and fixes or discards inconsistent, incorrect, or incomplete information. Specialized software tools exist that use sophisticated procedures to analyze, standardize, correct, match, and consolidate data warehouse information. This step is vitally important because data warehouses often contain information from several databases, some of which can be external to the organization. In a data warehouse, information cleansing occurs first during the ETL process and again once the information is in the data warehouse. Companies can choose information cleansing software from several vendors, including Oracle, SAS, IBM, and Tableau. Ideally, scrubbed information is accurate and consistent.
Looking at customer information highlights why information cleansing is necessary. Customer information exists in several operational systems. In each system, all the details could change, from the customer ID to contact information, depending on the business process the user is performing. Contact information in operational systems:
- Billing: Hans Hultgren, 555-1211
- Customer service: Anne Logan and Deborah Wallbridge (phone numbers provided)
- Marketing: Paul Bauer and Don McCubbrey
- Sales: Paul Bauer and Don McCubbrey
If a customer name is entered differently in multiple operational systems, information cleansing allows an organization to fix these types of inconsistencies and clean the information in the data warehouse.
Achieving perfect information is almost impossible. The more complete and accurate an organization wants its information to be, the more it costs. The trade-off for perfect information lies in accuracy versus completeness. Accurate information means it is correct, while complete information means there are no blanks.
- Accurate and complete: perfect information (expensive)
- Very incomplete but accurate
- Complete but with known errors
- Inaccurate and incomplete: not very useful, may be a prototype only (costs the least)
For their information, most organizations determine a percentage high enough to make good decisions at a reasonable cost, such as 85% accurate and 65% complete. Maintaining quality information in a data warehouse or data mart is extremely important. The Data Warehousing Institute estimates that low-quality information costs US businesses $600 billion annually; that number may seem high, but it is not. If an organization is using a data warehouse or data mart to allocate dollars across advertising strategies, low-quality information will definitely have a negative impact on its ability to make the right decision.
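A small sketch of two cleansing ideas from this card. The first is measuring completeness as the share of non-blank fields. The second is merging records whose customer names differ only in spacing or capitalization, filling blanks from the duplicates. The records are hypothetical, and accuracy is not measured here because checking correctness would need a trusted reference source.

```python
# Hypothetical customer rows pulled from two operational systems.
records = [
    {"name": "Hans Hultgren", "phone": "555-1211", "email": ""},
    {"name": "hans  hultgren", "phone": "555-1211", "email": "hans@example.com"},
    {"name": "Anne Logan", "phone": "555-0100", "email": ""},
]


def completeness(rows):
    """Share of fields that are not blank (the text's 'no blanks' notion of complete)."""
    values = [str(v).strip() for row in rows for v in row.values()]
    return sum(1 for v in values if v) / len(values)


def normalize_name(name):
    """Collapse extra whitespace and use title case so spelling variants match."""
    return " ".join(name.split()).title()


def cleanse(rows):
    """Merge rows that refer to the same customer, filling blanks from the duplicates."""
    merged = {}
    for row in rows:
        key = normalize_name(row["name"])
        target = merged.setdefault(key, {"name": key, "phone": "", "email": ""})
        for field in ("phone", "email"):
            if not target[field] and row[field].strip():
                target[field] = row[field].strip()
    return list(merged.values())


cleaned = cleanse(records)
print(round(completeness(records), 2))  # 0.78 (7 of 9 fields filled)
print(round(completeness(cleaned), 2))  # 0.83 (5 of 6 fields filled)
```

Title-casing is a deliberately crude matching rule. It would turn a name such as McCubbrey into Mccubbrey, which is one reason commercial cleansing tools use more sophisticated matching.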

prediction

a statement about what will happen or might happen in the future, for example, predicting future sales or employee turnover. Please note the primary difference between forecasts and predictions: all forecasts are predictions, but not all predictions are forecasts. For example, using regression to explain the relationship between two variables is a prediction but not a forecast.

regression model

a statistical process for estimating the relationships among variables. Regression models include many techniques for modeling and analyzing several variables when the focus is on the relationship between a dependent variable and one or more independent variables. Examples:
- Predict the winners of a marathon based on gender, height, weight, and hours of training.
- Explain how the quantity of weekly sales of a popular brand of beer depends on its price at a small chain of supermarkets.
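The beer example can be sketched as a simple linear regression fitted by ordinary least squares, with one independent variable (price) and one dependent variable (cases sold). The prices and sales below are invented for illustration. A real analysis would use more data and probably a statistics library.

```python
def fit_simple_regression(x, y):
    """Ordinary least squares for one independent variable: y ≈ intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data for the beer example: price per six-pack ($) and cases sold per week.
price = [5.0, 5.5, 6.0, 6.5, 7.0]
cases_sold = [202, 183, 171, 154, 140]

intercept, slope = fit_simple_regression(price, cases_sold)
print(f"cases ≈ {intercept:.1f} + ({slope:.1f}) * price")  # cases ≈ 353.6 + (-30.6) * price
predicted = intercept + slope * 6.25  # estimated weekly cases at a $6.25 price
```

The negative slope expresses the relationship the example asks about: in this made-up data, each $1 price increase is associated with about 30.6 fewer cases sold per week.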

optimization model

a statistical process that finds the way to make a design, system, or decision as effective as possible, for example, finding the values of controllable variables that determine maximal productivity or minimal waste. Examples:
- Determine which products to produce given a limited amount of ingredients.
- Choose a combination of projects to maximize overall earnings.
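The first example, deciding what to produce from limited ingredients, can be sketched as a tiny brute-force search. It tries every whole-number production plan, discards plans that use more flour or sugar than is in stock, and keeps the most profitable one. The recipes, profits, and stock levels are hypothetical.

```python
from itertools import product

# Hypothetical recipes (units of each ingredient per item), profits, and stock on hand.
recipes = {"bread": {"flour": 2, "sugar": 1}, "cake": {"flour": 1, "sugar": 2}}
profit = {"bread": 3, "cake": 4}
stock = {"flour": 10, "sugar": 8}


def best_plan(recipes, profit, stock, max_units=20):
    """Brute-force search over whole-number production quantities for maximum profit."""
    names = list(recipes)
    best, best_profit = None, float("-inf")
    for quantities in product(range(max_units + 1), repeat=len(names)):
        plan = dict(zip(names, quantities))
        # Constraint: total use of each ingredient must not exceed the stock.
        feasible = all(
            sum(recipes[n].get(ingredient, 0) * plan[n] for n in names) <= amount
            for ingredient, amount in stock.items()
        )
        if not feasible:
            continue
        value = sum(profit[n] * plan[n] for n in names)  # objective to maximize
        if value > best_profit:
            best, best_profit = plan, value
    return best, best_profit


print(best_plan(recipes, profit, stock))  # ({'bread': 4, 'cake': 2}, 20)
```

Exhaustive search only works for toy problems. Realistic optimization models are usually solved with linear or integer programming solvers.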

data lake

a storage repository that holds a vast amount of raw data in its original format until the business needs it. While a traditional data warehouse stores data in files or folders, a data lake uses a flat architecture to store data. Each data element in a data lake is assigned a unique identifier and tagged with a set of extended metadata tags. When a business question arises, the data lake can be queried for all of the relevant data, providing a smaller data set that can then be analyzed to help answer the question.
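A toy model of the flat architecture described above: one dictionary from unique identifier to raw data plus metadata tags, queried by tag to pull out a smaller relevant data set. The tag names and sample items are hypothetical. Real data lakes sit on object storage with a metadata catalog rather than an in-memory dictionary.

```python
import uuid

# The lake is one flat mapping: unique identifier -> raw data plus metadata tags.
lake = {}


def ingest(raw, **tags):
    """Store data in its original format under a new unique ID, tagged with metadata."""
    item_id = str(uuid.uuid4())
    lake[item_id] = {"raw": raw, "tags": tags}
    return item_id


def query(**criteria):
    """Return every item whose tags match all of the given key/value pairs."""
    return {
        item_id: item
        for item_id, item in lake.items()
        if all(item["tags"].get(key) == value for key, value in criteria.items())
    }


# Hypothetical raw items of different kinds, kept as-is (no upfront schema).
ingest("store2,productB,19.99", source="pos", kind="csv", region="west")
ingest(b"\x89PNG...", source="camera", kind="image", region="west")
ingest("Great shoes, fast delivery!", source="social", kind="text", region="east")

west = query(region="west")
print(len(west))  # 2
```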

data map

a technique for establishing a match, or balance, between the source data and the target data warehouse. This technique identifies data shortfalls and recognizes data issues. Data maps can also alert managers to inconsistencies or help determine the cause and effect of enterprisewide business decisions.

cluster analysis

a technique used to divide information sets into mutually exclusive groups such that the members of each group are as close together as possible to one another and the different groups are as far apart as possible. Cluster analysis identifies similarities and differences among data sets, allowing similar data sets to be clustered together. A customer database includes attributes such as name and address, demographic information such as gender and age, and financial attributes such as income and revenue spent. A cluster analysis groups similar attributes together to discover segments or clusters and then examines the attributes and values that define the clusters or segments. Marketing managers can drive promotion strategies that target the specific group identified by the cluster analysis. A great example of using cluster analysis in business is to create target-marketing strategies based on zip codes. Evaluating customer segments by zip code allows a business to assign a level of importance to each segment. Zip codes offer valuable insight into such things as income levels, demographics, lifestyles, and spending habits. With target marketing, a business can decrease its costs while increasing the success rate of the marketing campaign.
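k-means is one widely used clustering algorithm; the text does not name a specific one. It alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its members, until nothing changes. The sketch below uses a naive initialization (the first k points) and hypothetical customers described by age and income.

```python
from math import dist


def k_means(points, k, iterations=10):
    """Plain k-means: assign points to the nearest centroid, then move centroids to the mean."""
    centroids = list(points[:k])  # naive initialization: the first k points
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: dist(point, centroids[i]))
            clusters[nearest].append(point)
        new_centroids = [
            tuple(sum(coords) / len(members) for coords in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # assignments have stabilized
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical customers as (age, annual income in $1,000s).
customers = [(22, 30), (25, 32), (47, 90), (51, 95), (23, 28), (49, 88)]
centroids, clusters = k_means(customers, k=2)
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

With this data the algorithm separates younger, lower-income customers from older, higher-income ones. In practice, attributes measured on different scales (age versus income) are usually standardized first so that one attribute does not dominate the distance.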

data-driven decision management

an approach to business governance that values decisions that can be backed up with verifiable data. The success of the data-driven approach relies on the quality of the data gathered and the effectiveness of its analysis and interpretation. In the early days of computing, it usually took a specialist with a strong background in technology to mine data for information because that person needed to understand how databases and data warehouses worked. Today, business intelligence tools often require very little, if any, support from the MIS department. Business managers can customize dashboards to display the data they want to see and run custom reports on the fly. These changes in how data can be mined and visualized allow business executives with no technology background to work with analytics tools and make data-driven decisions. Data-driven decision management is usually undertaken as a way to gain a competitive advantage. A study from the MIT Center for Digital Business found that the organizations driven most by data-based decision making had 4 percent higher productivity rates and 6 percent higher profits. However, integrating massive amounts of information from different areas of the business and combining it to derive actionable data in real time can be easier said than done. Errors can creep into data analytics processes at any stage of the endeavor, and serious issues can result when they do.

data point

an individual item on a graph or chart. Organizational data includes far more than simple structured data elements in a database; the set of data also includes unstructured data such as voicemail, customer phone calls, text messages, and video clips, along with numerous new forms of data, such as tweets from Twitter. An early reference to business intelligence occurs in Sun Tzu's book The Art of War. Sun Tzu claims that to succeed in war, one should have full knowledge of one's own strengths and weaknesses and full knowledge of the enemy's strengths and weaknesses; lacking either one might result in defeat. A certain school of thought draws parallels between the challenges in business and those of war, specifically: collecting information; discerning patterns and meaning in the information; and responding to the resultant information. Many organizations today find it next to impossible to understand their own strengths and weaknesses, let alone their biggest competitors', because the enormous volume of organizational data is inaccessible to all but the MIS department.

data-driven website

an interactive website kept constantly updated and relevant to the needs of its customers using a database. Data-driven capabilities are especially useful when a firm needs to offer large amounts of information, products, or services. Visitors can quickly become annoyed if they find themselves buried under an avalanche of information when searching a website. A data-driven website can help limit the amount of information displayed to customers based on their unique search requirements. Companies even use data-driven websites to make information in their internal databases available to customers and business partners.

data set

an organized collection of data. Employee decisions are numerous, and they include providing service information, offering new products, and supporting frustrated customers. Employees can base their decisions on data sets, experience, or knowledge, and preferably on a combination of all three.

comparative analysis

compares two or more data sets to identify patterns and trends

data dictionary

compiles all of the metadata about the data elements in the data model. Looking at a data model along with reviewing the data dictionary provides tremendous insight into the database's functions, purpose, and business rules.

virtualization

creates multiple "virtual" machines on a single computing device; more generally, it is the creation of a virtual (rather than actual) version of computing resources, such as an operating system, a server, a storage device, or network resources. With big data, it is now possible to virtualize data so that it can be stored efficiently and cost-effectively. Improvements in network speed and reliability have removed the physical limitations of being able to manage massive amounts of data at an acceptable pace. The decrease in the price of storage and computer memory allows companies to leverage data that would have been inconceivable to collect only 10 years ago.

database management system (DBMS)

creates, reads, updates, and deletes data in a database while controlling access and security. Managers send requests to the DBMS, and the DBMS performs the actual manipulation of the data in the database. Popular examples of DBMSs include MySQL, Microsoft Access, SQL Server, FileMaker, Oracle, and FoxPro. In the relationship among the database, the DBMS, and the user, the DBMS sits in the middle, passing requests and results back and forth. For example, the database stores customers, orders, products, and distributors, while users work through the DBMS to (1) enter a new customer, (2) find a customer order, or (3) enter new products.
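
The four DBMS operations (create, read, update, delete) can be illustrated with Python's built-in `sqlite3` module. This is a sketch that assumes an in-memory SQLite database stands in for the DBMS; the `customer` table and its values are hypothetical.

```python
import sqlite3

# An in-memory SQLite database plays the role of the DBMS in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# Create: enter a new customer
conn.execute("INSERT INTO customer (id, name) VALUES (?, ?)", (1, "Ann Smith"))
# Read: find the customer
found = conn.execute("SELECT name FROM customer WHERE id = ?", (1,)).fetchone()
# Update: change the customer's last name
conn.execute("UPDATE customer SET name = ? WHERE id = ?", ("Ann Jones", 1))
updated = conn.execute("SELECT name FROM customer WHERE id = ?", (1,)).fetchone()
# Delete: remove the customer
conn.execute("DELETE FROM customer WHERE id = ?", (1,))
remaining = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
conn.close()

print(found, updated, remaining)  # ('Ann Smith',) ('Ann Jones',) 0
```

The user never touches the stored data directly; every request goes through the DBMS, which is what lets it enforce access and security.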

business rule

defines how a company performs a certain aspect of its business and typically results in either a yes/no or true/false answer. Example: stating that merchandise returns are allowed within 10 days of purchase.
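
The return-policy example can be written as a function that yields a yes/no (True/False) answer. This sketch assumes "within 10 days" means up to and including day 10; the function name is hypothetical.

```python
from datetime import date

RETURN_WINDOW_DAYS = 10  # the example rule; assumed to include day 10

def return_allowed(purchase_date: date, return_date: date) -> bool:
    """Apply the business rule; the answer is simply True (yes) or False (no)."""
    return 0 <= (return_date - purchase_date).days <= RETURN_WINDOW_DAYS

print(return_allowed(date(2024, 3, 1), date(2024, 3, 8)))   # True
print(return_allowed(date(2024, 3, 1), date(2024, 3, 15)))  # False
```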

estimation analysis

determines values for an unknown continuous variable behavior or estimated future value. Estimation models predict numeric outcomes based on historical data, for example, the percentage of high school students that will graduate based on student-teacher ratio or income levels. An estimate is similar to a guess and is one of the least expensive modeling techniques. Many organizations use estimation analysis to determine the overall costs of a project from start to completion or to estimate the profits from introducing a new product line.
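
A simple way to predict a numeric outcome from historical data is a least-squares straight-line fit. The source does not specify a model, so linear regression here is an assumed, illustrative choice, and the ratio and graduation figures are made up.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b

# Hypothetical historical data: student-teacher ratio vs. graduation rate (%)
ratios = [12, 15, 18, 21, 24]
grad_rates = [92, 89, 85, 82, 78]

a, b = fit_line(ratios, grad_rates)
estimate = a + b * 20  # estimated graduation rate for a ratio of 20
print(round(estimate, 1))  # 82.9
```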

analytical information

encompasses all organizational information, and its primary purpose is to support the performance of managerial analysis tasks. Analytical information is useful when making important decisions such as whether the organization should build a new manufacturing plant or hire additional sales personnel. Analytical information makes it possible to do many things that previously were difficult to accomplish, such as spot business trends, prevent diseases, and fight crime. For example, credit card companies crunch through billions of transactional purchase records to identify fraudulent activity; indicators such as charges in a foreign country or consecutive purchases of gasoline raise a red flag highlighting potential fraud. Walmart was able to use its massive amount of analytical information to identify many unusual trends, such as a correlation between storms and Pop-Tarts. Armed with this valuable information, the retail chain was able to stock up on Pop-Tarts so they were ready for purchase when customers arrived. Examples: product statistics, sales projections, future growth, trends.

business-critical integrity constraint

enforces business rules vital to an organization's success and often requires more insight and knowledge than relational integrity constraints. Consider a supplier of fresh produce to large grocery chains such as Kroger. The supplier might implement a business-critical integrity constraint stating that no product returns are accepted more than 15 days after delivery, which makes sense because of the chance of spoilage of the produce. Business-critical integrity constraints tend to mirror the very rules by which an organization achieves success.
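
One way to have the database itself enforce the produce supplier's 15-day rule is a SQL `CHECK` constraint. This sketch uses SQLite through Python's `sqlite3` module; the `produce_return` table, its columns, and the `record_return` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the CHECK clause encodes the supplier's 15-day rule.
conn.execute("""
    CREATE TABLE produce_return (
        return_id INTEGER PRIMARY KEY,
        product TEXT NOT NULL,
        days_after_delivery INTEGER NOT NULL
            CHECK (days_after_delivery BETWEEN 0 AND 15)
    )
""")

def record_return(return_id, product, days_after_delivery):
    """Try to record a return; the database refuses any that break the rule."""
    try:
        conn.execute("INSERT INTO produce_return VALUES (?, ?, ?)",
                     (return_id, product, days_after_delivery))
        return True
    except sqlite3.IntegrityError:
        return False

print(record_return(1, "lettuce", 9))    # True
print(record_return(2, "tomatoes", 19))  # False
```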

dirty data

erroneous or flawed data. Common types include duplicate, misleading, incorrect, non-formatted, non-integrated, and inaccurate data, as well as data that violates business rules. The complete removal of dirty data from a source is impractical or virtually impossible. According to Gartner Inc., dirty data is a business problem, not an MIS problem. Over the next two years, more than 25 percent of critical data in Fortune 1000 companies will continue to be flawed; that is, the information will be inaccurate, incomplete, or duplicated. To increase the quality of organizational information in a data warehouse or data mart, and thus the effectiveness of decision making, businesses must formulate a strategy to keep information clean.

market basket analysis

evaluates such items as websites and checkout scanner information to detect customers' buying behavior and predict future behavior by identifying affinities among customers' choices of products and services. One of the most common forms of association detection analysis is market basket analysis. Market basket analysis is frequently used to develop marketing campaigns for cross-selling products and services (especially in banking, insurance, and finance) and for inventory control, shelf-product placement, and other retail and marketing applications.
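
At its simplest, market basket analysis counts which products show up in the same checkout basket. The sketch below does this for product pairs using only the standard library; the baskets are hypothetical scanner data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical checkout-scanner baskets
baskets = [
    {"bread", "milk", "eggs"},
    {"bread", "milk"},
    {"milk", "cereal"},
    {"bread", "milk", "cereal"},
    {"eggs", "bacon"},
]

pair_counts = Counter()
for basket in baskets:
    # sorted() gives each pair one canonical order, e.g. ("bread", "milk")
    pair_counts.update(combinations(sorted(basket), 2))

for pair, count in pair_counts.most_common(2):
    print(pair, count)
# ('bread', 'milk') 3
# ('cereal', 'milk') 2
```

Pairs with high counts are candidates for cross-selling or for placing the products near each other on the shelf.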

data scientist

extracts knowledge from data by performing statistical analysis, data mining, and advanced analytics on big data to identify trends, market changes, and other relevant information

Information Granularity

the extent of detail within the information (fine and detailed or coarse and abstract). Employees must be able to correlate the different levels, formats, and granularities of information when making decisions. For example, a company might be collecting information from various suppliers to make needed decisions, only to find that the information is in different levels, formats, and granularities. One supplier might send detailed information in a spreadsheet, while another supplier might send summary information in a Word document, and still another might send a collection of information from emails. Employees will need to compare these different types of information for what they commonly reveal to make strategic decisions.
Levels, Formats, and Granularities of Organizational Information:
Information levels:
  Individual: knowledge, goals, and strategies
  Department: goals, revenues, expenses, processes, and strategies
  Enterprise: revenues, expenses, processes, and strategies
Information formats:
  Document: letters, memos, faxes, emails, reports, marketing materials, and training materials
  Presentation: product, strategy, process, financial, customer, and competitor presentations
  Spreadsheet: sales, marketing, industry, financial, competitor, customer, and order spreadsheets
  Database: customer, employee, sales, order, supplier, and manufacturer databases
Information granularities:
  Detail (fine): reports for each salesperson, product, and part
  Summary: reports for all sales personnel, all products, and all parts
  Aggregate (coarse): reports across departments, organizations, and companies
Exciting and unexpected results from successfully collecting, compiling, sorting, and finally analyzing information from multiple levels, in varied formats, and exhibiting different granularities can include potential new markets, new ways of reaching customers, and even new methods of doing business.

query-by-example (QBE) tool

helps users graphically design the answer to a question against a database. Managers typically interact with QBE tools.

logical view of information

shows how individual users logically access information to meet their own particular business needs

source data

identifies the primary location where data is collected. Source data can include invoices, spreadsheets, timesheets, transactions, and electronic sources such as other databases. Managers send their information requests to the MIS department, where a dedicated person compiles the various reports. In some situations, responses can take days, by which time the information may be outdated and opportunities lost. Many organizations find themselves in the position of being data rich and information poor; even in today's electronic world, managers struggle with the challenge of turning their business data into business intelligence. For many companies, being able to find all the data necessary to help a client is a pipe dream; attempting to gather all of the client information would actually take hours or even days. With so much data available, it is surprisingly hard for managers to get information such as inventory levels, past order history, or shipping data.

real-time information

immediate, up-to-date information. The growing demand for real-time information stems from organizations' needs to make faster and more effective decisions, keep smaller inventories, operate more efficiently, and track performance more carefully. Information also needs to be timely in the sense that it meets employees' needs, but no more. If employees can absorb information only on an hourly or daily basis, there is no need to gather real-time information in smaller increments.

the two primary computing models that have shaped the collection of big data

include distributed computing and virtualization

dynamic information

includes data that change based on user actions. Dynamic information changes when a user requests information. A dynamic website changes information based on user requests, such as movie ticket availability, airline prices, or restaurant reservations; the site changes for visitors depending on the type of information they request. An automobile dealer would create a database containing data elements for each car it has available for sale, including make, model, color, year, mpg, a photograph, and so on. A website visitor might click on Porsche and then enter specific requests such as price range or year made; once the user hits Go, the website automatically provides a custom view of the requested information. The dealer must create, update, and delete automobile information as the inventory changes.
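
The dealer example boils down to filtering stored data elements by whatever the visitor asks for. In this sketch a Python list stands in for the dealer's database, and the `search` function and inventory values are hypothetical.

```python
# Hypothetical dealer inventory; each dict holds one car's data elements.
inventory = [
    {"make": "Porsche", "model": "911", "year": 2021, "price": 98000},
    {"make": "Porsche", "model": "Cayenne", "year": 2019, "price": 62000},
    {"make": "Ford", "model": "Mustang", "year": 2020, "price": 35000},
]

def search(make, max_price=None, min_year=None):
    """Build a custom view of the inventory from the visitor's request."""
    results = [car for car in inventory if car["make"] == make]
    if max_price is not None:
        results = [car for car in results if car["price"] <= max_price]
    if min_year is not None:
        results = [car for car in results if car["year"] >= min_year]
    return results

print([car["model"] for car in search("Porsche", max_price=70000)])  # ['Cayenne']
```

Because the page is generated from the data, updating `inventory` (a sale, a new arrival) changes what every later visitor sees without anyone editing the page itself.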

static information

includes fixed data that are not capable of change in the event of a user action. For example, static websites supply only information that will not change until the content editor changes the information.

data validation

includes the tests and evaluations used to determine compliance with data governance policies to ensure correctness of data. Data validation helps to ensure that every data value is correct and accurate. In Excel, you can use data validation to control the type of data or the values that users enter into a cell. For example, you may want to restrict data entry to a certain range of data, limit choices by using a list, or make sure that only positive whole numbers are entered.
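
The three kinds of Excel validation mentioned above (a range, a list, positive whole numbers) can be mimicked in code. This is an illustrative Python sketch, not Excel's own feature; the field names and limits are hypothetical.

```python
def in_range(low, high):
    """Rule: value must be a number between low and high, inclusive."""
    return lambda value: isinstance(value, (int, float)) and low <= value <= high

def in_list(choices):
    """Rule: value must be one of a fixed list of choices."""
    return lambda value: value in choices

def positive_whole_number(value):
    """Rule: value must be an integer greater than zero."""
    return isinstance(value, int) and not isinstance(value, bool) and value > 0

# Hypothetical validation rules for three input fields
rules = {
    "discount_pct": in_range(0, 50),
    "region": in_list({"North", "South", "East", "West"}),
    "quantity": positive_whole_number,
}

def validate(record):
    """Return the names of fields whose values break their rule."""
    return [field for field, rule in rules.items() if not rule(record.get(field))]

print(validate({"discount_pct": 15, "region": "North", "quantity": 3}))   # []
print(validate({"discount_pct": 75, "region": "Central", "quantity": -2}))
# ['discount_pct', 'region', 'quantity']
```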

the 4 primary traits of the value of information

Information type: the two primary types of information are transactional and analytical information.
Timeliness: timeliness is an aspect of information that depends on the situation. In some firms or industries, information that is a few days or weeks old can be relevant, while in others, information that is a few minutes old can be almost worthless. Some organizations, such as 911 response centers, stock traders, and banks, require up-to-the-second information. Other organizations, such as insurance or construction companies, require only daily or even weekly information.
Information quality: business decisions are only as good as the quality of the information used to make them.
Information governance: information is a vital resource, and users need to be educated on what they can and cannot do with it. To ensure a firm manages its information correctly, it will need special policies and procedures establishing rules on how the information is organized, updated, maintained, and accessed.

big data

is a collection of large, complex data sets, including structured and unstructured data, which cannot be analyzed using traditional database methods and tools. Big data came to fruition primarily due to the last 50 years of technology evolution. Revolutionary technological advances in software, hardware, storage, networking, and computing models have transformed the data landscape, making new opportunities for data collection possible. Big data is one of the latest trends emerging from the convergence of technological factors. For example, cell phones generate tremendous amounts of data, and much of it is available for use with analytical applications. Big data includes data sources with extremely large volumes of data, high velocity, and wide variety, along with an understanding of the data's veracity.
Four Common Characteristics of Big Data:
Variety: different forms of structured and unstructured data. Data comes from spreadsheets and databases as well as from email, videos, photos, and PDFs, all of which must be analyzed.
Veracity: the uncertainty of data, including biases, noise, and abnormalities. Data may be uncertain or untrustworthy, and it must be meaningful to the problem being analyzed. Organizations must keep data clean and implement processes to keep dirty data from accumulating in their systems.
Volume: the scale of data. Enormous volumes of data are generated daily, with massive volumes created by machines and networks. Big data tools are necessary to analyze zettabytes and brontobytes.
Velocity: the analysis of streaming data as it travels around the Internet, such as social media messages spreading globally.
Four V's of Big Data (Big Data Will Create 4.4 Million Global MIS Jobs):
Volume (scale of data): 40 zettabytes of data created by 2020; 2.5 quintillion bytes of data created daily (10 million Blu-ray discs); 100 terabytes of data per company; 6 billion cell phones creating data.
Variety (different forms of data): 90 percent of data created is unstructured; 400 million wireless monitors; 4 billion hours of video created; 400 million tweets; 30 billion pieces of content shared on Facebook monthly.
Velocity (analysis of streaming data): every minute we create 72 hours of YouTube video, 200,000 Instagram posts, and 205 million emails; 100 sensors in every connected car; 19 billion network connections.
Veracity (uncertainty of data): 1 in 3 business leaders do not trust data to make decisions; $3.1 trillion in poor data costs per year.
The move to big data combines business with science, research, and government activities. A company can now analyze petabytes of data for patterns, trends, and anomalies, gaining insights into data in new and exciting ways. A petabyte of data is equivalent to 20 million four-drawer file cabinets filled with text files or 13 years of HDTV content. Big data requires sophisticated tools to analyze all of the structured and unstructured data from millions of customers, devices, and machine interactions. With the onset of big data, organizations are collecting more data than ever. Historically, data were housed in functional systems that were not integrated, such as customer service, finance, and human resources. Today, companies can gather all of the functional data together by the petabyte, but finding a way to analyze the data is incredibly challenging. Business focus areas of big data: data mining, data analysis, and data visualization.

time-series information

is time-stamped information collected at a particular frequency. Examples: web visits per hour, sales per month, customer service calls per day.
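
Turning raw time stamps into a time series such as "web visits per hour" means truncating each time stamp to the chosen frequency and counting. The visit log below is hypothetical.

```python
from collections import Counter
from datetime import datetime

# Hypothetical time-stamped web visits
visits = [
    "2024-06-01 09:05", "2024-06-01 09:40", "2024-06-01 10:15",
    "2024-06-01 10:20", "2024-06-01 10:55", "2024-06-01 11:30",
]

# Truncate each time stamp to the start of its hour, then count per hour.
visits_per_hour = Counter(
    datetime.strptime(v, "%Y-%m-%d %H:%M").replace(minute=0) for v in visits
)
for hour, count in sorted(visits_per_hour.items()):
    print(hour.strftime("%H:00"), count)
# 09:00 2
# 10:00 3
# 11:00 1
```

Changing the truncation (to the day or month) produces the other series listed, such as calls per day or sales per month.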

data model

logical data structures that detail the relationships among data elements using graphics and pictures

database

maintains information about various types of objects (inventory), events (transactions), people (employees), and places (warehouses). The core component of any system, regardless of size, is a database and a database management system. Companies store their information in databases, and managers access these systems to answer operational questions such as how many customers purchased product A in December or what the average sales by region were.

algorithm

mathematical formulas placed in software that perform an analysis on a data set. Analytics: the science of fact-based decision making. Analytics uses software-based algorithms and statistics to derive meaning from data. Advanced analytics uses data patterns to make forward-looking predictions that explain to the organization where it is headed.

data gap analysis

occurs when a company examines its data to determine if it can meet business expectations, while identifying possible data gaps or where missing data might exist

information integrity issue

occurs when a system produces incorrect, inconsistent, or duplicate data. Data integrity issues can cause managers to consider the system reports invalid and to make decisions based on other sources. To ensure your systems do not suffer from data integrity issues, review the Five Common Characteristics of High-Quality Information:
Accurate: Is there an incorrect value in the information? Example: Is the name spelled correctly? Is the dollar amount recorded properly?
Complete: Is a value missing from the information? Example: Is the address complete, including street, city, state, and zip code?
Consistent: Is aggregate or summary information in agreement with detailed information? Example: Do all total columns equal the true total of the individual items?
Timely: Is the information current with respect to business needs? Example: Is information updated weekly, daily, or hourly?
Unique: Is each transaction and event represented only once in the information? Example: Are there any duplicate customers?
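
Three of the five characteristics (complete, unique, consistent) lend themselves to simple automated checks; accuracy and timeliness usually need outside reference data, so they are not covered here. The records, field names, and totals below are hypothetical.

```python
# Hypothetical customer records with deliberate quality problems
customers = [
    {"id": 1, "name": "Ann Lee", "street": "1 Main St", "city": "Tempe", "state": "AZ", "zip": "85281"},
    {"id": 2, "name": "Bo Chan", "street": "9 Elm St", "city": "Mesa", "state": "AZ", "zip": ""},
    {"id": 1, "name": "Ann Lee", "street": "1 Main St", "city": "Tempe", "state": "AZ", "zip": "85281"},
]
order_lines = [40.0, 25.5, 10.0]  # detail
reported_total = 80.0             # summary

ADDRESS_FIELDS = ("street", "city", "state", "zip")

# Complete: is a value missing from the address?
incomplete = [c["id"] for c in customers if not all(c[f] for f in ADDRESS_FIELDS)]

# Unique: is each customer represented only once?
seen, duplicates = set(), []
for c in customers:
    if c["id"] in seen:
        duplicates.append(c["id"])
    seen.add(c["id"])

# Consistent: does the summary total agree with the detail lines?
consistent = abs(sum(order_lines) - reported_total) < 0.005

print(incomplete, duplicates, consistent)  # [2] [1] False
```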

information inconsistency

occurs when the same data element has different values. Consider, for example, the amount of work required to update a customer who has changed her last name due to marriage: changing this information in only a few organizational systems will lead to data inconsistencies, causing customer 123456 to be associated with two last names.
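
The married-name example can be detected by gathering every value a data element has across systems and flagging IDs with more than one. The system names and last names below are hypothetical.

```python
# Hypothetical copies of the same customer held in three systems
systems = {
    "billing":  {123456: "Garcia"},
    "shipping": {123456: "Nguyen"},
    "support":  {123456: "Garcia"},
}

def inconsistent_customers(systems):
    """Find customer IDs whose last name differs between systems."""
    names = {}
    for records in systems.values():
        for cust_id, last_name in records.items():
            names.setdefault(cust_id, set()).add(last_name)
    return {cid: sorted(vals) for cid, vals in names.items() if len(vals) > 1}

print(inconsistent_customers(systems))  # {123456: ['Garcia', 'Nguyen']}
```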

distributed computing

processes and manages algorithms across many machines in a computing environment. A key component of big data is a distributed computing environment that shares resources ranging from memory to networks to storage. With distributed computing, individual computers are networked together across geographical areas and work together to execute a workload of computing processes as if they were one single computing environment. For example, you can distribute a set of programs on the same physical server and use a message service to allow them to communicate and pass information. You can also have a distributed computing environment where many different systems or servers, each with its own computing memory, work together to solve a common problem.
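
The split-process-combine pattern behind distributed computing can be sketched on one machine. Here threads from Python's `concurrent.futures` stand in for networked computers, which is a simplifying assumption; a real distributed system would send each chunk to a separate machine. The sales figures are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_total(chunk):
    """Work done by one 'node': total the sales in its share of the data."""
    return sum(chunk)

def distributed_total(sales, workers=4):
    # Split the workload into one chunk per worker...
    chunks = [sales[i::workers] for i in range(workers)]
    # ...let the workers process their chunks concurrently...
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_total, chunks))
    # ...then combine the partial results into one answer.
    return sum(partials)

sales = list(range(1, 1001))  # hypothetical sales amounts
print(distributed_total(sales))  # 500500
```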

metadata

provides details about data. For example, metadata for an image could include its size, resolution, and date created. Metadata about a text document could contain document length, date created, author's name, and summary. Each data element is given a description, such as Customer Name; metadata is provided for the type of data (text, numeric, alphanumeric, date, image, binary value) and descriptions of potential predefined values, such as a certain area code; and finally, the relationship is defined.
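
Metadata of this kind (a description, a data type, and predefined values) can be recorded per data element and then used to check incoming values. The element names, types, and area codes here are hypothetical.

```python
# Hypothetical metadata for three data elements
metadata = {
    "customer_name": {"description": "Customer Name", "type": "text"},
    "area_code": {"description": "Telephone area code", "type": "text",
                  "allowed_values": {"480", "602", "623"}},
    "credit_limit": {"description": "Credit limit in dollars", "type": "numeric"},
}

TYPE_CHECKS = {
    "text": lambda v: isinstance(v, str),
    "numeric": lambda v: isinstance(v, (int, float)),
}

def conforms(element, value):
    """Check a value against the type and predefined values recorded for its element."""
    meta = metadata[element]
    if not TYPE_CHECKS[meta["type"]](value):
        return False
    allowed = meta.get("allowed_values")
    return allowed is None or value in allowed

print(conforms("area_code", "602"))      # True
print(conforms("area_code", "999"))      # False
print(conforms("credit_limit", "high"))  # False
```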

real-time systems

provides real-time information in response to requests. Many organizations use real-time systems to uncover key corporate transactional information. Most people request real-time information without understanding one of the biggest pitfalls associated with it: continual change. Imagine the following scenario: three managers meet at the end of the day to discuss a business problem. Each manager has gathered information at different times during the day to create a picture of the situation, so each manager's picture may be different because of the time differences. Their views on the business problem may not match because the information they are basing their analysis on is continually changing; this approach may not speed up decision making, and it may actually slow it down. Business decision makers must evaluate the timeliness of the information for every decision. Organizations do not want to find themselves using real-time information to make a bad decision faster.

data governance

refers to the overall management of the availability, usability, integrity, and security of company data. Every firm, large and small, should create an information policy concerning data governance. It is important to note the difference between data governance and data stewardship: data governance focuses on enterprise-wide policies and procedures, while data stewardship focuses on the strategic implementation of the policies and procedures.

data steward

responsible for ensuring the policies and procedures are implemented across the organization and acts as a liaison between the MIS department and the business. One analysis found that Phoenix, Arizona, is not a good place to sell golf clubs because typical golfers in Phoenix are tourists and conventioneers who usually bring their clubs with them; the analysis further revealed that two of the best places to sell golf clubs in the US are Rochester, New York, and Detroit, Michigan.

affinity grouping analysis

reveals the relationship between variables along with the nature and frequency of the relationships. Many people refer to affinity grouping algorithms as association rule generators because they create rules to determine the likelihood of events occurring together at a particular time or following each other in a logical progression. Percentages usually reflect the patterns of these events, for example, "55% of the time, events A and B occurred together" or "80% of the time that items A and B occurred together, they were followed by item C within three days."
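
The percentages in the examples correspond to two standard association-rule measures: support (how often items occur together) and confidence (how often the rule's outcome appears given its condition). The sketch below computes both for hypothetical transactions; it treats C as simply co-occurring, so the "within three days" time ordering is not modeled.

```python
# Hypothetical transactions (each set holds the items bought together)
transactions = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"A", "B", "C"},
    {"B", "C"},
]

def support(itemset):
    """Share of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Share of transactions with the antecedent that also contain the consequent."""
    return support(antecedent | consequent) / support(antecedent)

print(f"A and B occurred together {support({'A', 'B'}):.0%} of the time")
print(f"When A and B occurred together, C was present {confidence({'A', 'B'}, {'C'}):.0%} of the time")
# A and B occurred together 60% of the time
# When A and B occurred together, C was present 67% of the time
```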

relational integrity constraint

rules that enforce basic and fundamental information-based constraints. For example, a relational integrity constraint would not allow someone to create an order for a nonexistent customer, provide a markup percentage that was negative, or order zero pounds of raw materials from a supplier.
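
The three examples map naturally onto SQL constraints: a foreign key rejects orders for nonexistent customers, and `CHECK` clauses reject negative markups and zero quantities. This sketch uses SQLite via Python's `sqlite3`; the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE customer_order (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        markup_pct REAL CHECK (markup_pct >= 0),
        pounds REAL CHECK (pounds > 0)
    )
""")
conn.execute("INSERT INTO customer VALUES (1, 'Acme Foods')")

attempts = {
    "valid order": (1, 1, 10.0, 500.0),
    "nonexistent customer": (2, 99, 10.0, 500.0),
    "negative markup": (3, 1, -5.0, 500.0),
    "zero pounds": (4, 1, 10.0, 0.0),
}
results = {}
for label, row in attempts.items():
    try:
        conn.execute("INSERT INTO customer_order VALUES (?, ?, ?, ?)", row)
        results[label] = "accepted"
    except sqlite3.IntegrityError:
        results[label] = "rejected"

print(results)
```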

integrity constraint

rules that help ensure the quality of information. The database design needs to consider integrity constraints. The database and DBMS ensure that users can never violate these constraints. The specification and enforcement of integrity constraints produce higher-quality information that will provide better support for business decisions. Organizations that establish specific procedures for developing integrity constraints typically see an increase in accuracy that then increases the use of organizational information by business professionals.

entity (also referred to as a table)

stores information about a person, place, thing, transaction, or event. Each entity is stored in a different two-dimensional table with rows and columns.

relational database model

stores information in the form of logically related two-dimensional tables. For flexibility in supporting business operations, managers need to query or search for answers to business questions such as which artist sold the most albums during a certain month. The relationships in the relational database model help managers extract this information. Many business managers are familiar with Excel and other spreadsheet programs that they can use to store business data. Although spreadsheets are excellent for supporting some data analysis, they offer limited functionality in terms of security, accessibility, and flexibility, and they rarely scale to support business growth. From a business perspective, relational databases offer many advantages over using a text document or a spreadsheet.
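
The "which artist sold the most albums during a certain month" question shows how related tables are joined to answer a business question. This sketch uses SQLite via Python's `sqlite3`; the three-table design and all data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artist (artist_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE album  (album_id INTEGER PRIMARY KEY, title TEXT,
                         artist_id INTEGER REFERENCES artist(artist_id));
    CREATE TABLE sale   (sale_id INTEGER PRIMARY KEY,
                         album_id INTEGER REFERENCES album(album_id),
                         sale_date TEXT, quantity INTEGER);

    INSERT INTO artist VALUES (1, 'Artist One'), (2, 'Artist Two');
    INSERT INTO album  VALUES (10, 'First Album', 1), (20, 'Second Album', 2);
    INSERT INTO sale   VALUES (1, 10, '2024-01-05', 300), (2, 20, '2024-01-09', 450),
                              (3, 10, '2024-01-20', 200), (4, 20, '2024-02-02', 900);
""")

# Follow the relationships sale -> album -> artist, limited to January 2024.
top = conn.execute("""
    SELECT artist.name, SUM(sale.quantity) AS albums_sold
    FROM sale
    JOIN album  ON album.album_id = sale.album_id
    JOIN artist ON artist.artist_id = album.artist_id
    WHERE sale.sale_date BETWEEN '2024-01-01' AND '2024-01-31'
    GROUP BY artist.artist_id
    ORDER BY albums_sold DESC
    LIMIT 1
""").fetchone()
print(top)  # ('Artist One', 500)
```

Each fact (artist name, album title, each sale) is stored once in its own table, and the query combines them only when the question is asked.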

fast data

the application of big data analytics to smaller data sets in near-real or real time in order to solve a problem or create business value. The term fast data is often associated with business intelligence, and the goal is to quickly gather and mine structured and unstructured data so that action can be taken. As the flood of data from sensors, actuators, and machine-to-machine (M2M) communication in the Internet of Things (IoT) continues to grow, it has become more important than ever for organizations to identify what data is time sensitive and should be acted upon right away and what data can sit in a data warehouse or data lake until there is reason to mine it.

data aggregation

the collection of data from various sources for the purpose of data processing. One example of data aggregation is gathering information about particular groups based on specific variables such as age, profession, or income. Businesses collect a tremendous amount of transactional information as part of their routine operations. Marketing, sales, and other departments would like to analyze these data to understand their operations better. Although databases store the details of all transactions (for instance, the sale of a product) and events (hiring a new employee), data warehouses store the same information but in an aggregated form more suited to supporting decision-making tasks. Aggregation in this instance can include totals, counts, averages, and the like.
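
Aggregating detailed transactions by a grouping variable (here, profession) into counts, totals, and averages looks like this in plain Python. The transactions are hypothetical.

```python
from collections import defaultdict

# Hypothetical transaction details: (customer profession, purchase amount)
transactions = [
    ("teacher", 120.0), ("nurse", 80.0), ("teacher", 60.0),
    ("engineer", 200.0), ("nurse", 40.0), ("engineer", 100.0),
]

# Group the detail records by profession.
groups = defaultdict(list)
for profession, amount in transactions:
    groups[profession].append(amount)

# Aggregate each group into counts, totals, and averages.
summary = {
    profession: {"count": len(amounts), "total": sum(amounts),
                 "average": sum(amounts) / len(amounts)}
    for profession, amounts in groups.items()
}
for profession in sorted(summary):
    print(profession, summary[profession])
# engineer {'count': 2, 'total': 300.0, 'average': 150.0}
# nurse {'count': 2, 'total': 120.0, 'average': 60.0}
# teacher {'count': 2, 'total': 180.0, 'average': 90.0}
```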

attribute (also called columns or fields)

the data elements associated with an entity; they appear as the different category headings at the top of the table

information redundancy

the duplication of data, or the storage of the same data in multiple places. Redundant data can cause storage issues along with data integrity issues, making it difficult to determine which values are the most current or most accurate. Employees become confused and frustrated when faced with incorrect information, causing disruptions to business processes and procedures. One primary goal of a database is to eliminate information redundancy by recording each piece of information in only one place in the database; this saves disk space, makes performing information updates easier, and improves information quality.

data stewardship

the management and oversight of an organization's data assets to help provide business users with high-quality data that is easily accessible in a consistent manner

content creator

the person responsible for creating the original website content

content editor

the person responsible for updating and maintaining website content

physical view of information

the physical storage of information on a storage device

master data management (MDM)

the practice of gathering data and ensuring that it is uniform, accurate, consistent, and complete, including such entities as customers, suppliers, products, sales, employees, and other critical entities that are commonly integrated across organizational systems. MDM is commonly included in data governance. A company that supports a data governance program has a defined policy that specifies who is accountable for various portions or aspects of the data, including its accuracy, accessibility, consistency, timeliness, and completeness. The policy should clearly define the processes concerning how to store, archive, back up, and secure the data. In addition, the company should create a set of procedures identifying accessibility levels for employees. Then, the firm should deploy controls and procedures that enforce government regulations and compliance with mandates such as Sarbanes-Oxley.

data mining

the process of analyzing data to extract information not offered by the raw data alone. Reports piled on a manager's desk provide summaries of past business activities and stock market data; unfortunately, these reports don't offer much insight into why these things are happening or what might happen over the next few months. Data mining to the rescue. Data mining can also begin at a summary information level (coarse granularity) and progress through increasing levels of detail (drilling down), or the reverse (drilling up). Companies use data mining techniques to compile a complete picture of their operations, all within a single view, allowing them to identify trends and improve forecasts. The three elements of data mining are data (the foundation for data-directed decision making), discovery (the process of identifying new patterns, trends, and insights), and deployment (the process of implementing discoveries to drive success). One retailer discovered that loyalty program customers spent more over time, so it strategically invested in specific marketing campaigns focused on these high spenders, thereby maximizing revenue and reducing marketing costs. One manufacturer discovered a sequence of events that preceded accidental releases of toxic chemicals, allowing the factory to remain operational while it prevented dangerous accidents. One insurance company discovered that one of its offices was able to process certain common claim types more quickly than other offices of comparable size; armed with this valuable information, the company mimicked this office's best practices across its entire organization, improving customer service. DATA MINING PROCESS MODEL: data mining is a continuous process or cycle of activity in which you continually revisit problems with new projects. This allows past models to be effectively reused to look for new opportunities in the present and future, so data mining lets users recycle their work to become more efficient and effective at solving future problems. It is similar to creating a household budget and reusing the same basic budget year after year even though expenses and income change.
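Drilling down and drilling up just mean regrouping the same data at a finer or coarser level. The sketch below is a minimal illustration, assuming a hypothetical list of point-of-sale rows (region, store, product, amount); the data and the `summarize` helper are invented for this example, not taken from any data mining tool.

```python
from collections import defaultdict

# Hypothetical point-of-sale rows: (region, store, product, amount).
SALES = [
    ("East", "Store 1", "Shoes", 120.0),
    ("East", "Store 1", "Hats", 30.0),
    ("East", "Store 2", "Shoes", 80.0),
    ("West", "Store 3", "Shoes", 50.0),
    ("West", "Store 3", "Hats", 70.0),
]

def summarize(rows, *levels):
    """Total sales grouped by the given column indexes (0=region, 1=store, 2=product).

    Fewer levels = coarser granularity (drilling up);
    more levels = finer detail (drilling down).
    """
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[i] for i in levels)
        totals[key] += row[3]
    return dict(totals)

# Coarse summary: one total per region.
print(summarize(SALES, 0))     # → {('East',): 230.0, ('West',): 120.0}
# Drill down one level: region -> store.
print(summarize(SALES, 0, 1))  # → {('East', 'Store 1'): 150.0, ('East', 'Store 2'): 80.0, ('West', 'Store 3'): 120.0}
```

Calling `summarize(SALES, 0, 1, 2)` drills all the way down to product level; going from that call back to `summarize(SALES, 0)` is drilling up.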

data profiling

the process of collecting statistics and information about data in an existing source. Insights extracted from data profiling can determine how easy or difficult it will be to use existing data for other purposes, along with providing metrics on data quality. (Topic: data mining analysis techniques)
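As a rough illustration of what "collecting statistics about data in an existing source" can look like, here is a minimal profiling sketch. It assumes records arrive as Python dicts and that values within a field are comparable; the field names and sample rows are hypothetical. The "missing" count is one simple data quality metric.

```python
def profile(records):
    """Collect simple per-field statistics: count, missing values, distinct values, min/max."""
    fields = sorted({key for record in records for key in record})
    report = {}
    for field in fields:
        values = [r.get(field) for r in records]
        # Treat None and empty strings as missing (an assumption for this sketch).
        present = [v for v in values if v is not None and v != ""]
        stats = {
            "count": len(values),
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
        if present:
            stats["min"] = min(present)
            stats["max"] = max(present)
        report[field] = stats
    return report

# Hypothetical customer records with a few quality problems.
CUSTOMERS = [
    {"name": "Ana", "email": "ana@example.com", "discount_rate": 0.10},
    {"name": "Ben", "email": "", "discount_rate": 0.05},
    {"name": "Cy", "email": "cy@example.com", "discount_rate": None},
]

for field, stats in profile(CUSTOMERS).items():
    print(field, stats)
```

A profile like this answers the card's question quickly: one email and one discount rate are missing, so reusing this source for an email campaign would first need some cleanup.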

anomaly detection

the process of identifying rare or unexpected items or events in a data set that do not conform to other items in the data set. One of the key advantages of performing advanced analytics is detecting anomalies in the data so that they are not used in models that would then produce false results. Anomaly detection helps to identify outliers in the data that can cause problems with mathematical modeling.
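One common, simple way to flag outliers is Tukey's fences based on the interquartile range (IQR). This is a technique chosen for illustration, not one the card prescribes; the daily order counts below are hypothetical, and `k=1.5` is the conventional default multiplier.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # needs at least 2 values
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical daily order counts with one suspicious spike.
daily_orders = [52, 48, 50, 55, 49, 51, 47, 53, 400]
outliers = iqr_outliers(daily_orders)
print(outliers)  # → [400]

# Exclude the anomaly before fitting a model so it cannot skew the results.
clean = [v for v in daily_orders if v not in outliers]
```

The IQR rule is used here because quartiles are barely affected by the spike itself, whereas a mean-and-standard-deviation rule can be dragged toward the very outlier it is trying to detect.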

classification analysis

the process of organizing data into categories or groups for its most effective and efficient use, for example, groups of political affiliation and charity donors. The primary goal of a classification analysis is not to explore data to find interesting segments, but to decide the best way to classify records. It is important to note that classification analysis is similar to cluster analysis because it segments data into distinct segments called classes; however, unlike cluster analysis, a classification analysis requires that all classes are defined before the analysis begins. For example, in a classification analysis, the analyst defines two classes: (1) a class for customers who default on a loan and (2) a class for customers who do not default on a loan. Cluster analysis is exploratory analysis, whereas classification analysis is much less exploratory and more about grouping. Example (a decision tree): Age: if young, it asks "student?" (yes or no); if old, it asks about credit score (yes or no).
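The Age example reads like a small decision tree. The sketch below hand-codes that tree, assuming the two loan classes from the card are defined up front. Which leaf maps to which class is a hypothetical choice made only for illustration; a real analysis would learn the tree from historical loan data.

```python
from dataclasses import dataclass

# Classification requires the classes to be defined before the analysis begins.
# These two classes follow the loan example on the card.
CLASSES = ("default", "no default")

@dataclass
class Customer:
    age: str           # hypothetical bucket: "young" or "old"
    student: bool      # only asked on the "young" branch
    good_credit: bool  # only asked on the "old" branch

def classify(c: Customer) -> str:
    """Hand-built decision tree mirroring the Age example.

    The leaf labels chosen here are hypothetical, for illustration only.
    """
    if c.age == "young":
        return CLASSES[1] if c.student else CLASSES[0]
    return CLASSES[1] if c.good_credit else CLASSES[0]

customers = [
    Customer("young", student=True, good_credit=False),
    Customer("old", student=False, good_credit=False),
]
for c in customers:
    print(c, "->", classify(c))
```

Every record is forced into one of the predefined classes, which is the contrast with cluster analysis: clustering would discover the groups instead of being handed them.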

data replication

the process of sharing information to ensure consistency between multiple data sources. Data mining can determine relationships among such internal factors as price, product positioning, or staff skills, and external factors such as economic indicators, competition, and customer demographics. In addition, it can determine the impact on sales, customer satisfaction, and corporate profits, and drill down into summary information to view detailed transactional data. With data mining, a retailer could use point-of-sale records of customer purchases to send targeted promotions based on an individual's purchase history. By mining demographic data from comment or warranty cards, the retailer could develop products and promotions to appeal to specific customer segments. (Topic: data mining analysis techniques)
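To make "sharing information to ensure consistency" concrete, here is a minimal sketch of the simplest replication strategy, a full copy from one primary source to several replicas, plus a consistency check. The dict-based "databases" and the helper names are hypothetical; production systems usually replicate incrementally from a change log rather than copying everything.

```python
import copy

def replicate(primary, replicas):
    """Push a full copy of the primary data set to every replica."""
    for replica in replicas:
        replica.clear()
        replica.update(copy.deepcopy(primary))

def inconsistent(primary, replicas):
    """Return the indexes of replicas whose contents differ from the primary."""
    return [i for i, r in enumerate(replicas) if r != primary]

primary = {"C001": {"name": "Ana", "discount_rate": 0.10}}
replicas = [{}, {}]
replicate(primary, replicas)

primary["C001"]["discount_rate"] = 0.15  # an update made only at the source
print(inconsistent(primary, replicas))   # → [0, 1] until the next replication
replicate(primary, replicas)
print(inconsistent(primary, replicas))   # → []
```

The deep copy matters: without it, replicas would share the same nested dicts as the primary and would appear "consistent" without ever holding their own copy of the data.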

data element (or data field)

the smallest or most basic unit of information. Data elements can include a customer's name, address, email, discount rate, preferred shipping method, product name, quantity ordered, and so on.

structured query language (SQL)

users write lines of code to answer questions against a database. MIS professionals typically have the skills required to code SQL.
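Here is what "writing lines of code to answer questions against a database" can look like in practice. This sketch uses Python's built-in `sqlite3` module with an in-memory database; the `orders` table and its rows are hypothetical, and each column corresponds to a data element such as product name or quantity ordered.

```python
import sqlite3

# In-memory database with a hypothetical orders table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (customer TEXT, product TEXT, quantity INTEGER, price REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        ("Ana", "Shoes", 2, 60.0),
        ("Ben", "Hats", 1, 25.0),
        ("Ana", "Hats", 3, 25.0),
    ],
)

# The business question "How much has each customer spent?" written as SQL.
query = """
    SELECT customer, SUM(quantity * price) AS total_spent
    FROM orders
    GROUP BY customer
    ORDER BY total_spent DESC
"""
results = conn.execute(query).fetchall()
for customer, total in results:
    print(customer, total)  # Ana 195.0, then Ben 25.0
conn.close()
```

The Python wrapper only opens the database and prints rows; the question itself lives entirely in the `SELECT` statement, which is the part an MIS professional would write.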

data mining tool

uses a variety of techniques to find patterns and relationships in large volumes of information that predict future behavior and guide decision making; to perform data mining, users need data mining tools. Data mining uncovers trends and patterns, which analysts use to build models that, when exposed to new information sets, perform a variety of information analysis functions. Data mining tools for data warehouses help users uncover business intelligence in their data. Data mining uncovers patterns and trends for business analysis such as: analyzing customer buying patterns to predict future marketing and promotion campaigns; building budgets and other financial information; detecting fraud by identifying deceptive spending patterns; finding the best customers who spend the most money; keeping customers from leaving or migrating to competitors; and promoting and hiring employees to ensure success for both the company and the individual.
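"Analyzing customer buying patterns" often starts with counting which products are bought together. The sketch below shows that first step of market basket analysis, a plain co-occurrence count, rather than a full algorithm such as Apriori. The baskets and the `min_support` threshold of 2 are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets (one set of products per transaction).
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"beer", "chips", "milk"},
]

def frequent_pairs(baskets, min_support=2):
    """Count how often each pair of products appears in the same basket
    and keep the pairs seen at least min_support times."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives each pair one canonical order, e.g. ("bread", "milk").
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

print(frequent_pairs(baskets))
```

A retailer could use pairs like bread and butter to plan joint promotions, which is the "predict future marketing and promotion campaigns" use listed on the card.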

competitive monitoring

when a company keeps tabs on its competitors' activities on the web, using software that automatically tracks all competitor website activities such as discounts and new products. BI can help managers with competitive monitoring. A few examples of how managers can use BI to answer tough business questions: Where has the business been? A historical perspective offers important variables for determining trends and patterns. Where is the business now? Looking at the current business situation allows managers to take effective action to solve issues before they grow out of control. Where is the business going? Setting strategic direction is critical for planning and creating solid business strategies.
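The tracking software described above boils down to comparing what a competitor's site showed last time with what it shows now. This is a minimal sketch of that comparison step only, assuming the catalog has already been collected as a hypothetical product-to-price mapping; how the data is gathered from the website is out of scope here.

```python
def compare_snapshots(previous, current):
    """Compare two snapshots of a competitor's catalog (product -> price)
    and report new products and price cuts (possible discounts)."""
    new_products = sorted(set(current) - set(previous))
    price_cuts = {
        product: (previous[product], price)
        for product, price in current.items()
        if product in previous and price < previous[product]
    }
    return new_products, price_cuts

# Hypothetical snapshots taken one week apart.
last_week = {"Trail Shoe": 89.99, "Rain Jacket": 120.00}
this_week = {"Trail Shoe": 74.99, "Rain Jacket": 120.00, "Running Cap": 19.99}

new, cuts = compare_snapshots(last_week, this_week)
print("New products:", new)  # → New products: ['Running Cap']
print("Price cuts:", cuts)   # → Price cuts: {'Trail Shoe': (89.99, 74.99)}
```

Keeping the old and new price side by side gives managers the "where has the business been" versus "where is it now" view for each competitor product.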
