Information Systems Final
Unstructured data in NoSQL
- Applications now create massive volumes of new, rapidly changing data types: structured, semi-structured, and unstructured data. -Relational databases were not designed to cope with the scale and agility challenges that face modern applications.
Evaluating information quality: Purpose
--Knowing the motive behind the page's creation can help you judge its content. --Who is the intended audience? Scholarly audience or experts? General public or novices? --If not stated, what do you think is the purpose of the site? Is the purpose to: Inform or Teach? Explain or Enlighten? Persuade? Sell a Product?
Mobile Site vs. Responsive Site
-Should you create a dedicated mobile site or use a responsive design? It's a question experts are divided on. -For many sites, content and feature parity is the right answer -For others a good mobile experience means trimming down on content and features --A deep information architecture (often supported by cascading menus on the desktop) doesn't translate well on mobile; it often forces the user to take too many steps to reach the content.
T & E Frauds - occurrence and losses
-T&E frauds alone account for about 16% of all frauds uncovered. -T & E fraud total losses can represent a significant sum for a company.
Scraping with the Google Chrome Extension:
-Web Scraper is an extension for the Chrome browser made exclusively for web data scraping. -You can set up a plan (sitemap) on how to navigate a website and specify the data to be extracted. -The scraper will traverse the website according to the setup and extract the relevant data. -It lets you export the extracted data to CSV.
Evaluating information quality
-You can find information related to companies, products, people, governments, research, etc. that is readily available through the Internet -Accessibility and volume of information do not equal Quality
Web query in Excel
-You can use Excel to import data appearing on a web page -Use the "Data" ribbon in Excel and then choose the "From Web" option -Next you provide a URL in a dialog box that looks like a web browser -A box with an arrow appears next to HTML tables on the page -You can select a table on the page by clicking on the arrow next to it -You can now import data from the selected table into Excel by clicking on the "Import" button -Next, specify where you would like your imported data to appear -You can later click Properties to modify the refresh frequency
A company buys and receives products from suppliers that they will sell on their e-commerce site. Once they receive the products from the suppliers and record them in their Purchasing System, they create a spreadsheet of products and quantities received and then manually update the quantity available for sale in their e-commerce system. Options for real-time system integration software are: a. Design and build integration software that automates ETL for weekly data transfers b. Avoid using a database and instead, rely on integration software for better control c. Buy real-time system integration software from a software provider and implement it d. Design and build a new Purchasing System that uses larger relational database software e. 2 of the above are correct
c. Buy real-time system integration software from a software provider and implement it
Big and unstructured data is often stored in a NoSQL database for which of these reasons: a. It needs a tabular row and column structure b. It needs a relationship structure with key fields and foreign keys c. Data is spread over multiple servers leading to better response times for queries d. Data is concentrated in one server leading to a better reporting response time e. 2 of the above are correct
c. Data is spread over multiple servers leading to better response times for queries
Which of the following applies to a company that is OK with losing 12 hours' worth of newly entered data: a. RTO = 12 b. RTO = .5 c. RPO = 12 d. Data backup is done on a tape (a storage medium) at your place of business each evening e. None of the above are correct
c. RPO = 12
A(n) _____ can replace many applications with one unified set of programs; this system can then be used to manage all of a company's vital business operations. a. materials resource planning system b. management information system c. enterprise resource planning system d. decision support system e. None of them
c. enterprise resource planning system
Transform
clean and manipulate data
Security
consists of the protections or safeguards put in place to secure protected information. It requires that administrative, technical, and physical safeguards are developed and used
Which of the following is an example that would need to use an ETL process: a. Moving data from the old system into a newly developed and deployed system b. Entering a Sale Order in the transaction processing system c. Entering a Car Rental in an ERP system d. Merging sales data with advertising data and deploying the merged data into a business intelligence tool e. 2 of the above are correct.
e. 2 of the above are correct.
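As a rough illustration of option "merging sales data with advertising data", the sketch below walks through extract, transform, and load on a tiny scale. The sample rows, field names, and the `transform` and `load` helpers are all made up for illustration; a real ETL job would read from actual source systems and write to a real BI data store.

```python
# Extract: hypothetical rows as they might come out of two source systems.
sales_rows = [
    {"date": "2018-11-26", "product": " widget ", "revenue": "120.50"},
    {"date": "2018-11-27", "product": "Widget", "revenue": "80.00"},
]
ad_rows = [
    {"date": "2018-11-26", "ad_spend": "30.00"},
    {"date": "2018-11-27", "ad_spend": "25.00"},
]


def transform(sales, ads):
    """Transform: clean both sources and merge them on the shared date field."""
    spend_by_date = {row["date"]: float(row["ad_spend"]) for row in ads}
    merged = []
    for row in sales:
        merged.append({
            "date": row["date"],
            # Trim stray spaces and normalize capitalization.
            "product": row["product"].strip().title(),
            "revenue": float(row["revenue"]),
            # Dates with no advertising record get a spend of 0.0.
            "ad_spend": spend_by_date.get(row["date"], 0.0),
        })
    return merged


def load(rows, target):
    """Load: append the cleaned rows to the target store (a plain list here)."""
    target.extend(rows)
    return len(rows)


bi_table = []  # stands in for the table a BI tool would read
loaded = load(transform(sales_rows, ad_rows), bi_table)
print(loaded, bi_table[0])
```

Entering a single sale or car rental, by contrast, is ordinary transaction processing and needs no step like this.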
Transaction processing system output might be used for: a. Developing a predictive model for future student enrollment b. Reporting for evaluating revenue in an e-commerce sales system c. Input into a separate system d. 2 of the above are correct e. All of the above are correct.
e. All of the above are correct.
Multi-valued fields in relational databases are: a. Allowed in line item tables b. Allowed because they make update queries faster c. Allowed because they make the retrieving of data in a query faster d. Not allowed e. 2 of the above are correct
d. Not allowed
Which of the following is true about structured data from an Online Transaction Processing System? a. Data is batched, and transactions are created once a day b. Transactions are typically stored in a NoSQL database c. Involves unstructured data about 80% of the time d. Point of Sale system that reads a UPC code is an example e. 2 of the above are correct
d. Point of Sale system that reads a UPC code is an example
After assessing security threats and procedures for IT Systems: a. Applications and endpoints will always share the same risk level b. Applications and servers will always share the same risk level c. Servers and endpoints will always share the same risk level d. Servers and endpoints might have different security risk levels e. All parts of an IT system share the same risk level if they are networked
d. Servers and endpoints might have different security risk levels
A business intelligence (BI) predictive model that predicts whether a student accepted to the Tippie COB will actually enroll in the Tippie COB might use: a. Structured Data only b. Unstructured Data only c. Internal Unstructured Data only d. Structured and Unstructured Data e. None of the above are used in a BI model
d. Structured and Unstructured Data
Which of the following is NOT considered a business intelligence (BI) practice or tool? a. Data extraction b. Visualization c. Predictive Modeling d. Transaction processing controls e. All of the above are BI related
d. Transaction processing controls
Transaction processing systems that support a large enterprise typically store data in a: a. NoSQL database b. Excel sheet c. Access database d. Relational database e. None of the above
d. Relational database
A TPS provides valuable input to
decision support systems, knowledge management systems
Which of the following is an example of unstructured data: a. University of Iowa course registrations b. Social media posts c. University of Iowa payment amounts received for tuition d. Clickstream data e. 2 of the above are examples
e. 2 of the above are examples
Types of data that can be moved from the source to Access include: a. Transaction data b. Unstructured data like web ad data c. Data stored in a different relational database d. Sentiment Analysis customer scores stored in your ERP system e. All of the above
e. All of the above
Data used in business intelligence (BI) analysis could come from which of the following sources: a. An OLTP internal system b. Unstructured data from sources outside the company c. Unstructured data from sources inside the company d. 2 of the above are correct e. All of the above are correct
e. All of the above are correct
NoSQL databases require a company to: a. Model a business process prior to building the database and then not allow changes to the database b. Specifically define the data type for each attribute in the NoSQL table c. Store data in rows and columns d. All of the above are correct e. None of the above are correct
e. None of the above are correct
Before importing data into Access from another source, like Excel, the data should be cleaned. Converting multi-valued columns in Excel into multiple columns in Excel prior to moving to Access: a. Is not necessary as Access uses multi-valued fields b. Is not necessary as Access converts multi-valued Excel columns into multiple columns when you run a Make Table query c. Is not necessary as Access converts multi-valued Excel columns into multiple columns when you run an Append Table query d. Is necessary only if you have more than 1 table in Access e. None of the above are correct.
e. None of the above are correct.
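The cleaning step described in this question can be sketched outside Excel as well. The snippet below is only an illustration: the `split_multi_valued` helper, the `Phone` column, and the semicolon separator are assumptions, not part of the course material.

```python
def split_multi_valued(rows, column, separator=";", max_values=2):
    """Replace one multi-valued column with numbered single-valued columns.

    Values beyond max_values are dropped, so pick max_values to fit the data.
    """
    cleaned = []
    for row in rows:
        values = [v.strip() for v in row[column].split(separator) if v.strip()]
        new_row = {k: v for k, v in row.items() if k != column}
        for i in range(max_values):
            # Pad with an empty string when a row has fewer values.
            new_row[f"{column}{i + 1}"] = values[i] if i < len(values) else ""
        cleaned.append(new_row)
    return cleaned


# Hypothetical spreadsheet rows with two phone numbers typed into one cell.
sheet = [
    {"Name": "Ann", "Phone": "555-0101; 555-0102"},
    {"Name": "Bob", "Phone": "555-0199"},
]
cleaned = split_multi_valued(sheet, "Phone")
print(cleaned)
```

After this step every cell holds a single value, which is what a relational table expects.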
RPO refers to
generally refers to data backups and data availability - for example, does a company require data backups that are no more than four hours old? A day? A week? Depending on the nature of the business, the RPO could differ. -RPO is all about data and how much data a client can afford to lose in a disaster scenario. -The RPO defines the extent of acceptable data loss for a client. -Critical business decisions need to be made regarding backups (data replication, offsite backup data storage)
Structured data
information easily displayed in titled columns and rows which can easily be ordered and processed. -could be visualized as a perfectly organized filing cabinet where everything is identified, labeled, and easy to access -uploads neatly into a relational database (traditional row database structure) -easily detectable via queries, search operations, or simple algorithms -leaves out immense amounts of material that does not fit simply into a firm's organization of information
Load
into new system, divide into tables
Transaction Processing System (TPS)
monitors, collects, and stores transaction data in real time -Capture and process detailed data necessary to update the organization's records about fundamental business operations
How to deal with unstructured data
need to extract elements that matter and figure out how they are related (analyze)
Extract
need to know where to extract data from
What is included in TPS
order entry, inventory control, payroll, accounts payable, accounts receivable, general ledger, etc.
Transaction Processing System (book definition)
organized collection of people, procedures, software, databases, and devices used to process and record business transactions
Unstructured data
raw and unorganized; no pre-defined data model (text and relationships are ambiguous) -sometimes there is some structure but it is not neatly spelled out and traditional software cannot easily digest it
Privacy
refers to an individual's right to control both access to and use of his or her information
Confidentiality
relates to the right of an individual to the protection of their information during storage, transfer, and use, in order to prevent unauthorized disclosure of that information to third parties
relational database model
simple but highly useful way to organize data into collections of two-dimensional tables called relations - each row in the table represents an entity and each column represents an attribute of that entity
Privacy policy
statement that describes what the organization's practices are
What should be done to make the best business decisions
structured and unstructured data must both be consulted, queried, and leveraged to make the best business decisions
Information security
tasks and policies designed to guard digital information, which may be... -Processed by a computer (e.g. personal computer or hand-held device) -Stored on a magnetic or optical storage device (e.g. hard drive, DVD, or USB drive) -Transmitted over a network space -Layered Approach
Types of unstructured data
text files, email, social media, websites, mobile data (text messages, locations), communications, media, business applications, customer service interactions
RPO (Recovery Point Objective)
the maximum possible time period in which data can be lost after a system failure. -If your RPO is ten days then that means that any database outage should compromise a maximum of ten days' worth of data. -This parameter should consider how much data can be lost without hampering normal business operations, and it helps to determine which files should be backed up and how often.
Recovery Time Objective (RTO)
the maximum tolerable time allowed to recover systems after a disaster scenario. -This could be failure of servers, network, storage or even a complete outage of the entire infrastructure. -RTO is the time a business can afford to be without critical services before incurring significant losses
What type of data is big data
unstructured data because it is collecting data on virtually every human activity
Unstructured Data Types
Often, unstructured data is ultimately related back to the company's structured data records. --Blend this data with systems of record transactional data so that employees have more complete information at their fingertips.
Businesses rely on enterprise systems to perform daily activities in areas such as:
Product supply and distribution, sales and marketing, human resources, manufacturing, accounting and taxes
TPS -> ?
Relational Database
Web Scraping Steps
Review powerpoint
What might you web scrape
--Product prices for your competitors. https://www.binnys.com/beer?binnysStore=35&cat=375&product_list_limit=72 --A listing of real estate agents with phone number and email for marketing your Home Selling Software product. https://homes.lepickroeger.com/idx/roster --A list of all retail stores in a city for a location analysis project; where might I open my new business? Iowa City Downtown District https://downtowniowacity.com/listing-category/shops/ --A list of last month's restaurant inspections for a research project. http://apps.macombgov.org/inspectionresult/FacilitySelection.aspx?ZipCode=48026 --A list of all apartment buildings in a city. --A list of products that large e-commerce sites sell that are Made-in-USA; where might I sell my Made-in-USA product? Does it fit within their product lines? Is my pricing reasonable? https://huckberry.com/store/t/category/clothing/outerwear
Analyze transactions for indicators of known fraud risks example
-An employee may be authorized, for example, to use a P-Card for purchases of specific business items. --If an analysis of P-Card data shows that a purchase was made from a consumer products store, this could be a strong indication of an actual fraud. -Identify expenses relating to airfares and hotels in non-standard locations (e.g., exotic resorts)
Card and Card-like Layout design patterns
-Cards are a way of organizing different topics. -Card layouts also respond well to responsive designs. -Smaller screens have actually helped the web to become more modular -Pages can be broken up into their constituent parts, and reordered on the fly, depending on browser or screen sizes. -Cards work for displaying information in bite-sized chunks. -Cards will often include information such as a title, a user name, a picture, and various icons, a brief amount of text (like a product description)
Dashboards
-Create a spreadsheet and map out the data behind your most important KPIs. --Each KPI you track will have at least one data point originating from one system or another. -Gathering your data to feed your dashboard likely requires going to multiple services. --SQL queries for your structured data --Maybe using APIs to automate data retrieval from other sources like social media --Tableau can connect to your TPS database and get real-time data, and it can blend data from sources such as Excel, Google Analytics and APIs. -You can build an effective KPI dashboard in Excel. --How do you update the data on that dashboard? Many use software like Tableau
Mobile E-Commerce: Cyber Monday
-On Cyber Monday 2017, mobile saw its first $2 billion day, claiming just under a third of the money spent online. -According to Business Insider, smartphones accounted for 37.6 percent of retail visits on Cyber Monday, and 21 percent of the revenue. -Smartphone conversion rates were up by roughly 10 percent.
What situation is data analysis particularly effective related to employee fraud
-Effective controls over both P-Card purchases and T&E expense claims usually depend upon regular approvals by appropriately authorized individuals. -Over time, review and approval processes become less stringent and effective. -Fortunately, this situation is one in which data analysis can be particularly effective. By analyzing millions of transactions and looking for a variety of indicators of fraud, data analysis can make up for control weaknesses and rapidly identify where fraud has occurred.
Examples of well-defined KPIs:
-For a university. Increase (direction) the five-year graduation rate for incoming freshmen (measure) to at least 80 percent (target) starting with the graduating class of 2022 (time frame) -For a customer service department. Increase (direction) the number of customer phone calls answered within the first four rings (measure) to at least 90 percent (target) within the next three months (time frame) -For an HR organization. Reduce (direction) the number of voluntary resignations and terminations for performance (measure) to 6 percent or less (target) for the 2018 fiscal year and subsequent years (time frame)
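The four parts of a KPI (direction, measure, target, time frame) map naturally onto a small record. The sketch below is one possible way to model that; the `KPI` class and its `is_met` method are hypothetical names, and the actual values passed in at the end are invented for the demonstration.

```python
from dataclasses import dataclass


@dataclass
class KPI:
    direction: str   # "increase" or "reduce"
    measure: str     # what is being tracked
    target: float    # target value, in percent
    time_frame: str  # when the target applies

    def is_met(self, actual):
        """Compare an observed value with the target, respecting the direction."""
        if self.direction == "increase":
            return actual >= self.target
        return actual <= self.target


graduation = KPI("increase", "five-year graduation rate for incoming freshmen",
                 80.0, "starting with the graduating class of 2022")
attrition = KPI("reduce", "voluntary resignations and terminations for performance",
                6.0, "2018 fiscal year and subsequent years")

# Made-up observed values, only to show the comparison.
print(graduation.is_met(82.5), attrition.is_met(7.0))
```

A KPI missing any of the four fields cannot be checked this way, which is one reason the examples spell out all four.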
Scraping with the Google Chrome Extension: Installation and setup
-Go to https://chrome.google.com/webstore/detail/web-scraper/jnhgnonknehpejjnehehllkliplmbmhn?hl=en -Add the Web Scraper extension to Chrome. -Once this is done, you are ready to start scraping any website using your Chrome browser.
RPO Example
-If you're forced to restore your database from a backup, you want to lose as few transactions as possible. -Consider an e-commerce website that receives an average of one transaction every three hours. -Dealing with one lost transaction might be manageable, but dealing with eight would cause more of a problem. -In this instance, you'd want to set your RPO at three hours (or shorter) to minimize damage. -A daily backup wouldn't be good enough, because up to 24 hours of data (about eight transactions) could be lost.
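The numbers in this example follow from simple division: the worst case is a failure just before the next backup, so the data at risk is one full backup interval. A minimal sketch, where the function name is made up for illustration:

```python
def worst_case_lost_transactions(backup_interval_hours, hours_per_transaction):
    """Transactions entered since the last backup if failure hits just before the next one."""
    return backup_interval_hours / hours_per_transaction


daily = worst_case_lost_transactions(24, 3)        # one backup per day
three_hourly = worst_case_lost_transactions(3, 3)  # one backup every three hours
print(daily, three_hourly)
```

With one transaction every three hours, a daily backup risks about eight transactions while a three-hour interval risks about one.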
NoSQL databases address several issues that the relational model is not designed to address:
-Large volumes of rapidly changing structured, semi-structured, and unstructured data -Relational databases require that schemas be defined before you can add data. For example, you might want to store data about your customers such as phone numbers, first and last name, address, city and state - a SQL database needs to know what you are storing in advance. -NoSQL databases are built to allow the insertion of data without a predefined schema. -Document databases do away with the table-and-row model altogether, storing all relevant data together in a single 'document' in JSON or XML
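To make "insertion without a predefined schema" concrete, the sketch below uses a plain Python list as a stand-in for a document collection. It is not MongoDB's API or any real database client; the `insert` helper and the customer fields are assumptions chosen for illustration.

```python
import json

# A plain list stands in for a document collection.
customers = []


def insert(collection, document):
    """Store a document as-is; no schema is declared or checked in advance."""
    # Round-trip through JSON text, the way a document store might persist it.
    collection.append(json.loads(json.dumps(document)))


# Two documents in the same collection with different fields.
insert(customers, {"first_name": "Ann", "last_name": "Lee", "phone": "555-0101"})
insert(customers, {"first_name": "Bob", "city": "Iowa City",
                   "orders": [{"sku": "A1", "qty": 2}]})

fields_seen = sorted({field for doc in customers for field in doc})
print(fields_seen)
```

The second document adds `city` and a nested `orders` list without any change to a table definition, which a relational database would require first.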
Analyze entire populations of transactional data
-Look for various forms of anomalies. -Transactional data analysis does not necessarily prove that fraud has occurred, but it can be a very effective way of highlighting a situation that just does not seem to make sense and warrants further investigation. --Why, for example, would one employee with the same job responsibilities as a hundred others claim 50% more travel expenses?
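The "50% more travel expenses" question can be turned into a simple test over the whole population of claims. The sketch below compares each employee with the peer median; the function name, the 1.5 threshold, and the sample totals are illustrative assumptions, and a flag only marks a case for further investigation.

```python
from statistics import median


def flag_high_claimers(totals_by_employee, threshold=1.5):
    """Return employees whose total claims are at least `threshold` times the peer median."""
    typical = median(totals_by_employee.values())
    return sorted(name for name, total in totals_by_employee.items()
                  if total >= threshold * typical)


# Hypothetical annual T&E totals for employees with the same job responsibilities.
claims = {"E01": 4000, "E02": 4200, "E03": 3900, "E04": 6300, "E05": 4100}
flagged = flag_high_claimers(claims)
print(flagged)
```

The median is used rather than the mean so that one very large claimer does not pull the baseline upward.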
Measures
-Metrics that track progress in executing chosen strategies to attain organizational objectives and goals --These metrics are also called key performance indicators (KPIs) and consist of a direction, measure, target, and time frame --KPIs have specific targets that directly impact business outcomes. --Revenue is a solid KPI for every business, but how about social media followers?
Web Scraping
-Process of extracting data from websites -You could go website to website and copy and paste data into Excel or Word; clean it; then use it. -You could automate the extraction with software; clean it; then use it.
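The automated route (extract, clean, use) can be sketched with Python's standard library alone. This is not how the Chrome extension works internally; it is a swapped-in illustration, and the sample HTML, the `TableScraper` class, and the helper functions are all hypothetical. A real scraper would first download the page.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would download this from a website.
SAMPLE_HTML = """
<table>
  <tr><th>Product</th><th>Price</th></tr>
  <tr><td>Pale Ale</td><td> $9.99 </td></tr>
  <tr><td>Stout</td><td>$11.49</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collects the text of every table cell, one list per table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data


def scrape_prices(html):
    # Extract: pull the raw cell text out of the page.
    scraper = TableScraper()
    scraper.feed(html)
    _header, *records = scraper.rows
    # Clean: trim whitespace and turn "$9.99" into the number 9.99.
    return [(name.strip(), float(price.strip().lstrip("$"))) for name, price in records]


def to_csv(records):
    # Use: write the cleaned records as CSV text, ready to open in Excel.
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["product", "price"])
    writer.writerows(records)
    return buffer.getvalue()


prices = scrape_prices(SAMPLE_HTML)
print(to_csv(prices))
```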
Employee Fraud
Fraudulent use of purchasing or procurement cards (P-Cards) and fraudulent claims for travel and entertainment expenses (T&E) rank among the most commonly occurring types of employee fraud.
How much of business-relevant information originates in unstructured form?
80%
Why process and system integration is important
After an order is made, inventory needs to be updated and after an order ships, sales channel needs to be updated with delivery information to avoid errors
Example of Integration Software: SelluSeller: Integrate with your ERP Order Management Module (system of record)
Allows you to integrate with your existing ERP Order Management Module (like SAP) to update orders in real time, update warehouse inventory in real time, and directly sync your warehouse inventory with your marketplaces/channels.
Evaluating information quality: Accuracy
Are the sources for factual information clearly listed so that the information can be verified? Is it clear who has the ultimate responsibility for the accuracy of the content of the material? Can you verify any of the information in independent sources or from your own knowledge? Has the information been reviewed or refereed? Is the information free of grammatical, spelling, or typographical errors?
Integration Software - how to do it
Build it or buy it
What principles do companies usually follow
Companies usually follow the Fair Information Practices Principles (FIPP) set forth by the Federal Trade Commission (FTC) --FIPP are guidelines that represent widely accepted concepts concerning fair information practice in an electronic marketplace --Provide guidance for how to deal specifically with personal information
Scraping with the Google Chrome Extension: What you need
-Google Chrome browser -Login to Google and Chrome (sync if you want it to follow you) -A working internet connection
Acceptable downtime and data loss
How long they can afford to be offline during a disaster (the recovery time objective or RTO) and how much data they can afford to lose (the recovery point objective or RPO). -These factors influence the way that data is copied and made available.
Evaluating information quality: Currency
If timeliness of the information is important, is it kept up-to-date? Is there an indication of when the site was last updated?
Dashboards should be...
Informative and geared towards monitoring and tracking KPIs.
Structured Data
Structured data is information easily displayed in titled columns and rows which can easily be ordered and processed. --TPS -> Relational DB
Evaluating information quality: Objectivity
Is the information covered fact, opinion, or propaganda? Is the author's point-of-view objective and impartial? Is the language free of emotion-rousing words and bias? Is the author affiliated with an organization? Does the author's affiliation with an institution or organization appear to bias the information? Does the content of the page have the official approval of the institution, organization, or company?
Compare data across different databases and systems example
T&E payment data compared to HR records to see if there are instances in which an employee has been using a P-Card or claiming expenses while on vacation
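A cross-system comparison like this amounts to matching records from two sources on employee and date. The sketch below assumes the HR and payment data have already been extracted into simple Python structures; the function name, field names, and sample records are invented for illustration.

```python
from datetime import date


def expenses_during_vacation(expenses, vacations):
    """Return expense records dated inside the same employee's recorded vacation."""
    flagged = []
    for expense in expenses:
        for start, end in vacations.get(expense["employee"], []):
            if start <= expense["date"] <= end:
                flagged.append(expense)
                break  # one matching vacation is enough to flag the record
    return flagged


# Hypothetical HR records: employee -> list of (first day, last day) vacations.
hr_vacations = {"E01": [(date(2018, 7, 2), date(2018, 7, 13))]}
# Hypothetical T&E / P-Card payment records.
te_payments = [
    {"employee": "E01", "date": date(2018, 7, 5), "amount": 310.0},
    {"employee": "E01", "date": date(2018, 8, 1), "amount": 95.0},
    {"employee": "E02", "date": date(2018, 7, 5), "amount": 120.0},
]
suspicious = expenses_during_vacation(te_payments, hr_vacations)
print(suspicious)
```

Only the first payment is flagged: it belongs to E01 and falls inside E01's vacation, while the same date for E02 is not flagged.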
Evaluating information quality: author
Is the name of the author/creator on the page? Are his/her credentials listed (occupation, years of experience, position or education)? Is the author qualified to write on the given topic? Why? Is there contact information, such as an email address, somewhere on the page? Is there a link to a homepage? If there is a link to a homepage, is it for an individual or for an organization? If the author is with an organization, does it appear to support or sponsor the page? What does the domain name/URL reveal about the source of the information, if anything? If the owner is not identified, what can you tell about the origin of the site from the address?
Why is the relational database model widely used
It is easier to control, more flexible, and more intuitive than other approaches because it organizes data in tables
Example of NoSQL
MongoDB
Types of structured data
OLTP, Input/processing/output, transaction data, master data, relational DB, Application controls, Prevent controls
Employee Fraud and Data Analysis
Data analysis can be particularly effective. --By analyzing millions of transactions and looking for a variety of indicators of fraud, data analysis can make up for control weaknesses and rapidly identify where fraud has occurred
Online Transaction Processing (OLTP)
Data processing in which each transaction is processed immediately -at any time, the data in an online system reflects the current status -many organizations find that OLTP enables them to provide faster, more efficient service -put data into databases quickly, reliably and efficiently but do not support big data analysis that today's businesses and organizations require
Security (midterm definition)
Determine the overall risk level by reviewing the data, server, and application risk classification examples and selecting the highest applicable risk designation across all -Implement the security standards for the level of risk based on loss of confidentiality, loss of integrity, and loss of availability --amount of security is directly related to risk
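"Selecting the highest applicable risk designation across all" is a maximum over the three classifications. A minimal sketch, assuming three risk labels named low, moderate, and high (the labels and the function name are assumptions, not taken from the course material):

```python
RISK_ORDER = ["low", "moderate", "high"]  # assumed labels, lowest to highest


def overall_risk(data, server, application):
    """Pick the highest risk designation among the three classifications."""
    return max((data, server, application), key=RISK_ORDER.index)


level = overall_risk(data="moderate", server="low", application="high")
print(level)
```

Because the highest classification wins, a low-risk server hosting a high-risk application is still treated as high risk.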
Example of unstructured data
Email holds information such as the time sent, subject, and sender (all uniform fields), but the content of the message is not so easily broken down and categorized.
Enterprise Resource Planning System
Ensures that information can be shared across all business functions and all levels of management to support the running and managing of a business
http://www.hangoutmusicfest.com/
Example of cards
True or False: Responsive design helps to avoid an inconsistent experience for users on different devices.
True
Evaluating information quality: Reliability and Credibility
Why should anyone believe information from this site? Does the information appear to be valid and well-researched, or is it unsupported by evidence? Are quotes and other strong assertions backed by sources that you could check through other means? What institution (company, government, university, etc.) supports this information? If it is an institution, have you heard of it before? Can you find more information about it? Is there a non-Web equivalent of this material that would provide a way of verifying its legitimacy?
Example of evaluating information quality
You can find many sites that rank high schools: What data do they use? Where is the data from? Does the information truly represent the overall population? Is there a potential bias?
NoSQL database
a way to store and retrieve data that is modeled using some means other than the simple two-dimensional tabular relations used in relational databases
Responsive design helps to furnish: a. A full experience at the mobile level b. A full experience at the desktop level c. A full experience at the mobile level only d. Site design changes as the display used by the user increases in screen size
a. A full experience at the mobile level b. A full experience at the desktop level d. Site design changes as the display used by the user increases in screen size
Which of the following, (1) money deposited in a bank account, (2) student recording her answer to a question in an online test, (3) customer adding an item to the online shopping cart, are considered transactions in an information system? a. All of them b. 1 and 3 only c. 1 only d. None of them
a. All of them
Which of the following is true about structured data from an Online Transaction Processing System? a. Each transaction is processed at the time of entry b. Is based upon Business Intelligence analysis c. Data is always captured with a paper document d. Involves unstructured data about 80% of the time e. 2 of the above are correct
a. Each transaction is processed at the time of entry
Designing a site with a mobile-first approach requires: a. Prioritizing what content goes on the site. b. Taking your existing website and using software to convert it to a mobile site while keeping the existing content. c. A stripped down site with little functionality provided. d. A careful analysis of the functionality needed by users.
a. Prioritizing what content goes on the site. d. A careful analysis of the functionality needed by users.
A company buys and receives products from suppliers that they will sell on their e-commerce site. Once they receive the products from the suppliers and record them in their Purchasing System, they create a spreadsheet of products and quantities received and then manually update the quantity available for sale in their e-commerce system. Real-time system integration might enable them to: a. Sell more products by having more accurate and real-time data b. Avoid using a database and instead, rely on integration software for better control c. Avoid using a spreadsheet and instead, rely on an Access database for data entry and control d. Avoid using a spreadsheet and instead, rely on ETL for weekly data transfers e. 2 of the above are correct
a. Sell more products by having more accurate and real-time data
Advantages of a relational database
allows tables to be linked which reduces data redundancy and allows data to be organized more logically
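Linking tables can be shown with Python's built-in `sqlite3` module. In the sketch below the customer's name and city are stored once and reached from each order through a key; the table names, columns, and sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        city        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        amount      REAL NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Ann', 'Iowa City');
    INSERT INTO orders VALUES (100, 1, 25.0), (101, 1, 40.0);
""")

# The join follows the foreign key, so customer details are not repeated per order.
rows = conn.execute("""
    SELECT c.name, c.city, o.order_id, o.amount
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()
conn.close()
print(rows)
```

If Ann moves, only one row in `customers` changes, and both orders pick up the new city through the link.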
An RPO is: a. The same for all systems at a business as all data is backed-up b. Different depending on each system at a business c. Equal to the RTO for a business d. Always less than the RTO for a business e. None of the above are correct
b. Different depending on each system at a business
When assessing security threats and procedures for endpoints, which of the following is True: a. All endpoints have the same risk level b. All endpoints implement the same standard security procedures c. All endpoints should have an up-to-date Operating System installed d. All endpoints should have data encrypted e. 2 of the above are True
c. All endpoints should have an up-to-date Operating System installed
When assessing security threats and procedures for servers, which of the following is True: a. All servers have the same risk level if they are networked b. All servers implement the same standard security procedures c. All servers should have an up-to-date Operating System installed d. All servers should have data encrypted e. 2 of the above are True
c. All servers should have an up-to-date Operating System installed