ISA 235 Exam 2
Four common characteristics of Big Data:
-Variety: different forms of structured and unstructured data
-Veracity: uncertainty of data, including biases, noise, and abnormalities
-Volume: the scale of the data
-Velocity: analysis of streaming data as it travels around the internet
Explain 4 data mining techniques
1) Estimation Analysis: determines values for an unknown continuous variable behavior or estimated future value
2) Affinity Grouping Analysis: reveals the relationship between variables along with the nature and frequency of the relationships
3) Cluster Analysis: technique used to divide information sets into mutually exclusive groups such that the members of each group are as close together as possible to one another and the different groups are as far apart as possible
4) Classification Analysis: process of organizing data into categories or groups for its most effective and efficient use
Variety
-Different forms of structured and unstructured data
-Data from spreadsheets and databases as well as from email, videos, photos, and PDFs, all of which must be analyzed
Veracity
-The uncertainty of data, including biases, noise, and abnormalities
-Uncertainty or untrustworthiness of data
-Data must be meaningful to the problem being analyzed
-Must keep data clean and implement processes to keep dirty data from accumulating in systems
Volume
-The scale of data
-Includes enormous volumes of data generated daily
-Massive volume created by machines and networks
-Big data tools necessary to analyze zettabytes and brontobytes
Velocity
-The analysis of streaming data as it travels around the internet
-Analysis necessary of social media messages spreading globally
4-V's of BigData - four common characteristics of big data:
·7 abilities of Agile MIS infrastructure
Accessibility, Availability, Maintainability, Portability (system is available to operate on different devices or software platforms), Reliability, Scalability (system can "scale up" or adapt to the increased demands in growth), Usability
have their own transmitter and a power source (typically a battery)
Active RFID Tags
Two major advantages over dial-up:
-Can transmit and receive data much faster
-Has an always-on connection to the ISP, so users can talk on the phone and access the internet simultaneously
Disadvantages:
-Works over a limited physical distance
-Remains unavailable in many areas where the local telephone infrastructure does not support DSL technology
Advantage and Disadvantages of Digital subscriber line (DSL)
the maximum amount of data that can pass from one point to another in a unit of time. Similar to a hose: if the hose is large, water can flow through it quickly. The speed of transmission of a network is therefore determined by the speed of its smallest bandwidth. Measured in terms of bit rate (or data rate), the number of bits transferred or received per unit of time
Bandwidth
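A minimal sketch of the hose analogy above, assuming a hypothetical path made of three links. The function names and link speeds are illustrative assumptions, not course material.

```python
def bottleneck_bps(link_speeds_bps):
    """End-to-end speed is limited by the slowest link on the path."""
    return min(link_speeds_bps)


def transfer_seconds(size_bytes, link_speeds_bps):
    """Time to move a file: total bits divided by the bottleneck bit rate."""
    return (size_bytes * 8) / bottleneck_bps(link_speeds_bps)


# Hypothetical path: 100 Mbps LAN, 10 Mbps DSL line, 1 Gbps backbone.
path = [100_000_000, 10_000_000, 1_000_000_000]
print(bottleneck_bps(path))               # slowest link, in bits per second
print(transfer_seconds(5_000_000, path))  # seconds to move a 5 MB file
```

The DSL line sets the pace here even though the other two links are much faster.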
Easy to access large volumes of information Increased communications New marketplace
Benefits of Connected World
quickly replacing dial-up, a high-speed internet connection that is always connected High-speed in this case refers to any bandwidth greater than 2 Mbps
Broadband
details how a company recovers and restores critical business operations and systems after a disaster or extended disruption
Business continuity plan
1) Accurate 2) Timely 3) Consistent 4) Complete 5) Unique
Categorize the five common characteristics of high-quality information and rank them in order of importance for Hotels.com.
Ethics Privacy Legal
Challenges of Connected World
- Infrastructure as a Service (IaaS): delivery of computer hardware capability, including the use of servers, networking, and storage, as a service, EX. Amazon's Elastic Compute Cloud. - Data as a Service (DaaS): Facilitates the accessibility of business-critical data in a timely, secure, and affordable manner - Platform as a Service (PaaS): supports the deployment of entire systems including hardware, networking, and applications using a pay-per-use revenue model - Big Data as a Service (BDaaS): offers a cloud-based Big Data service to help organizations analyze massive amounts of data to solve business dilemmas
Cloud hosting services
technique used to divide information sets into mutually exclusive groups such that the members of each group are as close together as possible to one another and the different groups are as far apart as possible
Clustering
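One way to picture "as close together as possible" is a tiny one-dimensional k-means loop. This is only a sketch of one clustering method; the spending figures and starting centers are made-up assumptions.

```python
def k_means_1d(values, centers, rounds=10):
    """Group numbers around the nearest center, then recompute the centers."""
    clusters = []
    for _ in range(rounds):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its members (keep it if the group is empty).
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return clusters, centers


# Hypothetical customer spending amounts with two obvious groups.
spend = [10, 12, 11, 95, 100, 105]
groups, centers = k_means_1d(spend, centers=[0.0, 50.0])
print(groups)
print(centers)
```

Each value ends up in exactly one group, which is what "mutually exclusive" means in the definition.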
Hot site - location that is ready for employees to use immediately in the event of a disaster; includes the facility and fully ready-to-use equipment
Warm site - location that is semi-ready for employees to use after a disaster; includes the facility and computer equipment that still needs to be set up
Cold site - location that employees can move to in the event of an emergency; consists only of the location, with no computer equipment
Compare the differences among a hot, cold, and warm site.
track corporate metrics such as critical success factors and key performance indicators and include advanced capabilities such as interactive controls, allowing users to manipulate data for analysis
Dashboard
contains a subset of data warehouse information. To distinguish between data warehouses and data marts, think of data warehouses as having a more organizational focus and data marts as having focused information subsets particular to the needs of a given business unit such as finance or production and operations
Data Mart
is a facility used to house management information systems and associated components, such as telecommunications and storage systems
Data centers
Three elements of data mining include:
Data: foundation for data-directed decision making -Discovery: Process of identifying new patterns, trends, and insights -Deployment: Process of implementing discoveries to drive success
creates, reads, updates, and deletes data in a database while controlling access and security There are two primary tools available for retrieving information from DBMS: - Query-by-example (QBE) tool: helps users graphically design the answer to a question against a database Managers typically interact with QBE tools - Structured Query Language (SQL): users write lines of code to answer questions against the database
Database Management System (DBMS):
-Creates, reads, updates, and deletes data in a database while controlling access and security There are two primary tools available for retrieving information from a DBMS: - Query-by-example (QBE) tool - that helps users graphically design the answer to a question against a database - Structured query language (SQL) - that asks users to write lines of code to answer questions against a database
Database management system (DBMS)
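To make "creates, reads, updates, and deletes" concrete, here is a small sketch using Python's built-in sqlite3 module as the DBMS and SQL as the retrieval tool. The guest table and its rows are hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical "guest" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE guest (guest_id INTEGER PRIMARY KEY, name TEXT, city TEXT)")

# Create: add two records.
conn.execute("INSERT INTO guest (name, city) VALUES (?, ?)", ("Ana", "Oxford"))
conn.execute("INSERT INTO guest (name, city) VALUES (?, ?)", ("Ben", "Dayton"))
# Update: change one field of an existing record.
conn.execute("UPDATE guest SET city = ? WHERE name = ?", ("Cincinnati", "Ben"))
# Delete: remove a record.
conn.execute("DELETE FROM guest WHERE name = ?", ("Ana",))
# Read: answer a question against the database with SQL.
rows = conn.execute("SELECT name, city FROM guest ORDER BY name").fetchall()
print(rows)
```

A QBE tool would build the same SELECT statement from a graphical form instead of typed code.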
includes hardware, software, and telecommunications equipment that, when combined, provides the underlying foundation to support the organization's goals. The future of a company depends on its ability to meet its partners, suppliers, and customers any time of the day in any geographic location
Describe the characteristics of an agile MIS infrastructure. What are the seven abilities?
Local area network (LAN) - connects a group of computers in close proximity, such as in an office building, school, or home.
Wide area network (WAN) - spans a large geographic area such as a state, province, or country; the best example is the Internet.
Metropolitan area network (MAN) - a large computer network usually spanning a city.
Describe the different wireless network categories.
NEEDS DONE
Develop a list of some possible entities and attributes located in Hotels.com database.
provides high-speed digital data transmission over standard telephone lines using broadband modem technology, allowing both Internet and telephone services to work over the same phone lines
Digital subscriber line (DSL)
Detailed process for recovering information or a system in the event of a catastrophic disaster includes such factors as which files and systems need to have backups and their corresponding frequency and methods along with the strategic location of the storage in a separate physical site that is geographically dispersed
Disaster recovery plan
converts IP addresses into domains, or identifying labels that use a variety of recognizable naming conventions. Users don't have to remember 97.17.237.15; they can just remember www.apple.com
Domain Name System (DNS)
A marketing department could use a data visualization tool, like a time-series graph which is good at identifying trends in counts or numerical values over time, to see how people of different demographics they might be targeting are showing interest in a particular ad and which direction their interest is trending.
Explain how a marketing department could use a data visualization tool to help with the release of a new product.
Grid computing is a collection of computers, often geographically dispersed, that are coordinated to solve a common problem. It breaks down a problem into pieces and distributes it to many machines, allowing faster processing than could occur with a single system. It takes advantage of unused computer processing power to create a "virtual supercomputer". Advantages: makes better use of MIS resources and allows greater scalability, as systems can easily grow
Explain the advantages of grid computing
an interactive website kept constantly updated and relevant to the needs of its customers using a database Advantages include: -Easy to manage content -Easy to store large amounts of data -Easy to eliminate human error
Explain the business benefits of a data-driven website.
Fault tolerance is a general term meaning that a system as a whole can handle problems even if parts of the system fail. Failover is usually one part of fault tolerance: a secondary server keeps an exact replica of the real-time data, and if the primary server crashes, users are automatically directed to the secondary (backup) server
Explain the difference between fault tolerance and failover.
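The failover idea can be sketched as a loop that tries the primary server first and moves on to the backup when it fails. The server functions below are hypothetical stand-ins, not a real network API.

```python
def fetch(servers, request):
    """Try each server in order; fail over to the next one on a connection error."""
    for server in servers:
        try:
            return server(request)
        except ConnectionError:
            continue  # this server is down, so redirect to the next one
    raise ConnectionError("all servers failed")


def primary(request):
    """Hypothetical primary server that has crashed."""
    raise ConnectionError("primary is down")


def backup(request):
    """Hypothetical secondary server holding a replica of the data."""
    return f"backup handled {request}"


print(fetch([primary, backup], "login"))
```

The user calling fetch never sees the crash, which is the fault-tolerant behavior the system as a whole provides.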
NEEDS DONE
Explain why a business today would want to follow sustainable MIS practices.
Protocols are essential for telecommunications to happen because they allow for sophisticated data encryption and user authentication, which prevent people who were not intended to receive a message from reading it. An example of one of these protocols is wired equivalent privacy (WEP), which encrypts messages so that they require a key in order to be decrypted. A more secure protocol is Wi-Fi protected access (WPA), a newer standard that improves on WEP.
Explain why protocols are essential for telecommunications to happen. Correctly use one or more of the protocols that were discussed in class to illustrate your answer.
which is a process that extracts information from internal and external databases, transforms the information using a common set of enterprise definitions, and loads the information into a data warehouse. The data warehouse then sends subsets of the information to data marts
Extraction, transformation, and loading (ETL)
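A rough sketch of the three ETL steps, assuming two hypothetical source systems that record the same kind of sale in different formats. All field names and values are invented for illustration.

```python
# Extract: rows pulled from two hypothetical source systems.
store_rows = [{"cust": "ana", "amt": "19.99"}, {"cust": "ben", "amt": "5.00"}]
web_rows = [{"customer_name": "ANA", "total_cents": 250}]


def transform_store(row):
    """Map a store row onto the common enterprise definition."""
    return {"customer": row["cust"].title(), "amount": float(row["amt"]), "channel": "store"}


def transform_web(row):
    """Map a web row onto the same definition (cents become dollars)."""
    return {"customer": row["customer_name"].title(), "amount": row["total_cents"] / 100, "channel": "web"}


# Load: put the standardized rows into the warehouse.
warehouse = [transform_store(r) for r in store_rows] + [transform_web(r) for r in web_rows]

# The warehouse then sends a subset to a data mart, here for the web business unit.
web_mart = [row for row in warehouse if row["channel"] == "web"]
print(warehouse)
print(web_mart)
```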
stores, views, and analyzes geographic data creating, multidimensional charts or maps
Geographic information systems (GIS)
is a satellite-based navigation system providing extremely accurate position, time, and speed information
Global positioning system (GPS)
creative environment of animated movies. With increased grid computing power, DreamWorks animators were able to add more realistic movement to water, fire, and magic scenes. They can work faster and more efficiently, providing a potential competitive advantage and additional cost savings
Grid Computing Example
are designated locations where Wi-Fi access points are publicly available
Hotspot
describes technologies that allow users to see or visualize data to transform information into a business perspective Using these tools beyond Excel graphs and charts to turn data into sophisticated analysis techniques such as controls, instruments, maps, time-series graphs, and more we can help uncover correlations and trends in data that would otherwise go unnoticed
How could a business use data visualization to identify new trends?
provides the means for portable devices to connect wirelessly to a local area network and works by using access points that send and receive data via radio waves. Its max range is about 1000 feet in open areas such as parks and around 250 to 400 feet in closed areas such as an office building
How does Wi-Fi work? What is its range?
Domain name system (DNS) - converts IP addresses into domains, or identifying labels that use a variety of recognizable naming conventions. Users don't have to remember 97.17.237.15; they can just remember www.apple.com. A DNS is like a phone book for the internet: when you enter a website domain that you can easily remember, the DNS converts it into the IP address associated with that domain
How does a domain name system work?
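The phone-book analogy can be sketched as a dictionary lookup. This is a toy model only: the record table is a hypothetical stand-in, and a real resolver queries name servers over the network.

```python
# Hypothetical DNS records; the address is the example used in these notes.
DNS_RECORDS = {
    "www.apple.com": "97.17.237.15",
}


def resolve(domain):
    """Look up the IP address for a domain, like finding a number in a phone book."""
    try:
        return DNS_RECORDS[domain.lower()]  # domain names are not case sensitive
    except KeyError:
        raise LookupError(f"no record for {domain}") from None


print(resolve("www.apple.com"))
```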
works by using 24 global satellites that orbit Earth, sending signals to a receiver that can communicate with three or four satellites at a time. The satellites broadcast signals constantly while the receiver measures the time it takes for the signals to reach it. This measurement, which uses the speed of the signal to determine its location is taken from three distinct satellites to provide precise location information.
How does the GPS work?
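The timing step above comes down to distance = signal speed x travel time. A minimal sketch, assuming made-up travel times for three satellites (a real receiver also has to correct for clock error, which is skipped here):

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # GPS radio signals travel at the speed of light


def distance_m(travel_time_s):
    """Distance to a satellite from how long its signal took to arrive."""
    return SPEED_OF_LIGHT_M_PER_S * travel_time_s


# Hypothetical signal travel times, in seconds, from three satellites.
travel_times = [0.067, 0.070, 0.072]
for t in travel_times:
    print(round(distance_m(t) / 1000), "km")
```

With three such distances the receiver can narrow its position down to the point where the three ranges agree.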
is a unique number that identifies where computers are located on the network
IP Address
-Increased Flexibility -Increased Scalability and Performance -Reduced Information Redundancy -Increased Information Integrity (quality) -Increased Information Security
Identify the business advantages of a relational database.
Supporting operations: Information MIS infrastructure Identifies where and how important information, such as customer records, is maintained and secured Supporting change: Agile MIS infrastructure Includes the hardware, software, and telecommunications equipment that, when combined, provides the underlying foundation to support the organization's goals Supporting the environment: Sustainable MIS infrastructure Identifies ways that a company can grow in terms of computing resources while simultaneously becoming less dependent on hardware and energy consumption
Identify the three primary areas associated with an information MIS infrastructure.
refers to the extent of detail within the information (fine and detailed or coarse and abstract)
Information granularity
delivers hardware networking capabilities, including the use of servers, networking, and storage, over the cloud using a pay-per-use revenue model EX. Amazon's Elastic Compute Cloud Offers cost-effective solution for companies that need their computing resources to grow and shrink as business demands change Perfect for companies with research-intensive projects that need to process large amounts of information at irregular intervals
Infrastructure as a service (IaaS)
specializes in providing management, support, and maintenance to a network A level down from regional services providers
Internet service provider (ISP)
Things like data visualization allow users to visualize data to transform information into business perspective and to identify trends that would not be visible from the raw data
List the reasons a business would want to display information in a graphic or visual format.
connects a group of computers in proximity, such as in an office building, school, or home. LANs are often connected to other LANs and to wide area networks. They allow sharing of files, printers, games, and other resources
Local Area Networks (LAN)
evaluates such items as websites and checkout scanner information to detect customer's buying behavior and predict future behavior by identifying affinities among customers' choices of products and services Inventory control, shelf-product placement, and other retail and marketing applications
Market-basket analysis
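A small sketch of "identifying affinities among customers' choices": count how often each pair of products appears in the same basket. The baskets are invented checkout data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical checkout scanner data: one set of products per customer visit.
baskets = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"milk", "eggs"},
    {"bread", "milk"},
]

pair_counts = Counter()
for basket in baskets:
    # Sort so that each pair is always counted under the same key.
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

print(pair_counts.most_common(1))
```

A retailer could use the most frequent pairs to decide which products to shelve near each other.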
a large computer network usually spanning a city. Most colleges and large companies that span a campus use an infrastructure supported by a MAN
Metropolitan Area Network (MAN)
refers to computer chip performance per dollar doubling every 18 months
Moore's law
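As a worked example of the doubling rule, the growth factor after a given number of months is 2 raised to (months / 18). The function name below is just an illustrative choice.

```python
def performance_multiplier(months, doubling_period=18):
    """How many times better performance per dollar is after `months`."""
    return 2 ** (months / doubling_period)


print(performance_multiplier(18))  # one doubling period
print(performance_multiplier(36))  # two doubling periods
```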
Local Area Networks (LAN), Wide Area Networks (WAN), Metropolitan Area Network(MAN)
Network Categories
are traffic exchange points in the routing hierarchy of the Internet that connects NSPs Typically have regional or national coverage and connect to only a few NSPs
Network access points (NAP)
A new generation of database management systems that is not based on the traditional relational database model.
Pros compared to SQL:
-Good for non-relational data
-Schema-less architecture allows for frequent changes to the database and easy addition of varied data to the system
-Easily scalable, runs well on the cloud
Cons compared to SQL:
-Installation
-Management
-Can have slower response time
Examples: Amazon DynamoDB, MongoDB, Couchbase, Riak
NoSQL
Cube - A multidimensional structure consisting of "Data Cubes"
Dimension - Sides of a cube
Measure - Facts in a fact table
Aggregation - Projection of the cube
4 types of aggregation: Sum, Count, MIN, MAX
OLAP Buzz Words
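The buzz words above can be tied together in a short sketch: a fact table holds the measure (sales), and aggregating along one dimension (region) produces Sum, Count, MIN, and MAX. The table contents and function name are assumptions for illustration.

```python
# Hypothetical fact table: two dimensions (region, product) and one measure (sales).
fact_table = [
    {"region": "East", "product": "A", "sales": 100},
    {"region": "East", "product": "B", "sales": 50},
    {"region": "West", "product": "A", "sales": 70},
]


def aggregate(rows, dimension, measure):
    """Compute the four aggregation types for each value of one dimension."""
    result = {}
    for row in rows:
        value = row[measure]
        agg = result.setdefault(
            row[dimension], {"sum": 0, "count": 0, "min": value, "max": value}
        )
        agg["sum"] += value
        agg["count"] += 1
        agg["min"] = min(agg["min"], value)
        agg["max"] = max(agg["max"], value)
    return result


by_region = aggregate(fact_table, "region", "sales")
print(by_region)
```

Calling aggregate with "product" instead of "region" projects the same facts along the other side of the cube.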
provides advanced query capabilities to the warehouse that standard SQL cannot Complex queries that need to aggregate data can take hours to run End users cannot be expected to issue SQL statements
OLAP cube
do not have a power source Draw power from the RFID reader, which sends out electromagnetic waves that induce a current in the tag's antenna
Passive RFID Tags
supports the deployment of entire systems, including hardware, networking, and applications, using a pay-per-use revenue model. Helps companies minimize operational costs and increase productivity by providing all the following without up-front investment: increased security, access to information anywhere and anytime, centralized information management, easy collaboration with partners, suppliers, and customers, and increased speed to market with significantly less cost
Platform as a service (PaaS)
is an electronic identification device that is made up of a chip and antenna
RFID Tag
stores information in the form of logically related two-dimensional tables
Relational Database Model
To create an SSL connection, a web server requires a SSL Certificate, an electronic document that confirms the identity of a website or server and verifies that a public key belongs to a trustworthy individual or company
SSL Certificate
combination of HTTP and SSL to provide encryption and secure identification of an internet server. HTTPS protects against interception of communications, transferring credit card information safely and securely. When a user enters a web address using https:// the browser will encrypt the message
Secure hypertext transfer protocol (SHTTP or HTTPS)
is a standard security technology for establishing an encrypted link between a web server and a browser, ensuring that all data passed between them remain private
Secure sockets layer (SSL)
use a battery to run the microchip's circuitry, but communicate by drawing power from the RFID reader
Semi-Passive RFID Tags
delivers applications over the cloud using a pay-per-use revenue model
Software as a service (SaaS)
explain how Data Warehousing and Data marts support business decisions.
Data warehouses and data marts support business decisions by giving each department only the data it needs. With a large amount of data and no data marts, employees in every department would receive all of the data, which can be overwhelming and too complex to use
provides the technical foundation for the public internet as well as for large numbers of private networks. Another way to understand TCP/IP: consider a letter that needs to go from Denver to California. TCP makes sure the envelope is delivered and does not get lost along the way. IP acts as the sending and receiving labels, telling the letter carrier where to deliver the envelope and whom it was from
Transmission control protocol/internet protocol (TCP/IP)
private secure internet access in effect a "private tunnel"
Virtual private network (VPN)
creates multiple "virtual" machines on a single computing device. 3 basic categories:
-Storage virtualization: combines multiple network storage devices so they appear to be a single storage device
-Network virtualization: combines networks by splitting the available bandwidth into independent channels that can be assigned in real time to a specific device
-Server virtualization: combines the physical resources, such as servers, processors, and operating systems, from the applications (this is the most common form; when you hear the term virtualization, you can typically assume server virtualization)
Virtualization
applications are in every kind of company vehicle these days from police cars to bulldozers, from dump trucks to mayoral limousines. Emergency response systems use this to track each of their vehicles and so dispatch those closest to the scene of an accident.
What are Advantages of GPS?
1) Information type 2) Information timeliness 3) information quality 4) information governance
What are the four primary traits that help determine the value of information?
NEEDS DONE
What are the steps taken by big technology companies such as Google, Microsoft, and Amazon to build a sustainable MIS infrastructure?
collection of large, complex data sets, including structured and unstructured data, which cannot be analyzed using traditional database methods and tools
What is BigData
table that provides the data for the elements of the Cube. There can be only one per Cube!
What is Fact Table?
refers to the computer chip performance per dollar doubling every 18 months. Great for supporting a connected corporation, but significant unintended side effects include our dependence on fossil fuels and increased need for safe disposal of outdated computing equipment
What is Moore's Law and how does it affect companies?
uses electronic tags and labels to identify objects wirelessly over short distances. It holds the promise of replacing existing identification technologies such as the bar code
What is RFID
A hot site is a separate and fully equipped facility where the company can move immediately after a disaster and resume business. A warm site is one step down from this, it is a facility with computer equipment that still requires installation and configuration before use. Finally a cold site is another step down it is only the facility with no computer equipment where employees can move after a disaster
What is a hot site? How is it different from cold and warm sites?
is a service that uses GPS technology to track a person's location and to provide targeted information accordingly. Examples include google maps, yelp, and facebook places
What is a location based service? Give an example.
is a unique number that identifies where computers are located on the network to allow data to reach its intended destination Changes with Wi-Fi
What is an IP address? Does it change with location or Wi-Fi?
refers to the overall management of the availability, usability, integrity, and security of company data
What is data governance and its importance to a company?
continuous process or cycle of activity where you continually revisit problems with new projects Goal: not to explore the data to find interesting segments, but to decide the best way to classify records
What is data mining
is the efficient coexistence of telephone, video, and data communication within a single network, offering convenience and flexibility not possible with separate infrastructures. Important to business: allows for multiple services and multiple devices, but one network, one vendor, and one bill
What is network convergence and why is it important to a business?
the Dimensional Data Warehouse is a Summary of Tables That Exist in the Relational Database
What is the Cube?
VoIP uses IP technology to transmit telephone calls (for example, Skype), while IPTV distributes video content using IP across the internet and private IP networks (for example, Comcast).
What is the difference between VoIP and IPTV?
The key difference is when the plan takes effect. For example, business continuity requires you to keep operations functional during the event and immediately after. Disaster recovery focuses on how you respond after the event has completed and how you return to normal.
What is the difference between a disaster recovery plan and a business continuity plan?
An intranet is a private network, operated by a large company or other organisation, which uses internet technologies, but is insulated from the global internet. An extranet is an intranet that is accessible to some people from outside the company, or possibly shared by more than one organisation
What is the difference between an intranet and extranet?
Business decisions are only as good as the quality of the information used to make them. Bad data can lead to information inconsistency which occurs when the same data element has different values. This then leads to information integrity issues which occur when a system produces incorrect, inconsistent, or duplicate data. Using the wrong information can lead managers to make erroneous decisions. These in turn can cost time, money, reputations, and even jobs Business managers can ensure they do not suffer from data integrity issues by going over the five characteristics common to high quality information which are: accuracy, completeness, consistency, timeliness, and uniqueness
Why does a business need to be concerned with the quality of its data?
A company would want to use virtualization because it can increase availability of applications that can give a higher level of performance depending on the hardware used, increase energy efficiency by requiring less hardware to run multiple systems or applications, and it can increase hardware usability by running multiple operating systems on a single computer
Why would a company want to use virtualization?
a wireless security protocol to protect Wi-Fi networks it is an improvement on the original Wi-Fi security standard, WEP, and provides more sophisticated data encryption and user authentication
Wi-Fi protected access (WPA)
Spans a large geographic area such as a state, province, or country Essential for carrying out day-to-day activities of many companies and government organizations, allowing them to transmit and receive information among their employees, customers, suppliers, business partners, and other organizations across cities, regions, and countries around the world
Wide Area Network (WAN)
is an encryption algorithm designed to protect wireless transmission data. If you are using a Wi-Fi connection, WEP encrypts the data by using a key that converts the data to a non-human-readable form. The purpose was to provide wireless networks with the equivalent level of security as wired networks. Unfortunately, the technology behind it has been demonstrated to be relatively insecure compared to newer protocols such as WPA.
Wired equivalent privacy (WEP)
is a means by which portable devices can connect wirelessly to a local area network, using access points that send and receive data via radio waves. It operates at considerably higher frequencies than cell phones use, which allows greater bandwidth
Wireless fidelity (Wi-Fi)
information collected from multiple sources, such as suppliers, customers, competitors, partners, and industries, that analyzes patterns, trends, and relationships for strategic decision making
business intelligence
includes the carbon dioxide and carbon monoxide produced by business processes and systems
carbon emission
Public Cloud: promotes massive, global, industrywide applications offered to the general public. EX. Amazon Web Services, Google Cloud Connect
Private Cloud: serves only one customer or organization and can be located on or off the customer's premises. Downside: requires significant investment of time and money to set up
Community Cloud: serves a specific community with common business models, security requirements, and compliance considerations
Hybrid Cloud: includes two or more private, public, or community clouds, but each cloud remains separate and is only linked by technology that enables data and application portability
NEEDS EXAMPLES FOR REST
cloud computing environments
process that weeds out and fixes or discards inconsistent, incorrect, or incomplete information. Occurs first during the ETL process of data warehousing and occurs again once the information is in the data warehouse
data cleansing
contains a subset of data warehouse information. To distinguish between data warehouses and data marts, think of data warehouses as having a more organizational focus and data marts as having focused information subsets particular to the needs of a given business unit such as finance or production and operations
data mart
is the process of analyzing data to extract information not offered by the raw data alone. Can also begin at a summary information level (coarse granularity) and progress through increasing levels of detail (drilling down) or the reverse (drilling up)
data mining
describes technologies that allow users to see or visualize data to transform information into a business perspective
data visualization
is a logical collection of information, gathered from many different operational databases, that supports business analysis activities and decision-making tasks. Its primary purpose is to combine information, more specifically strategic information, throughout an organization into a single repository in such a way that the people who need that information can make decisions and undertake business analysis. Data warehouses go even a step further by standardizing information. Gender, for instance, can be referred to in many ways (Male, Female, M/F, 1/0) and should be standardized in a data warehouse with one common way of referring to each data element
data warehouse
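The gender example above can be sketched as a lookup that maps every source spelling onto one common code. Which source value means what (for instance, that 1 stands for male) is an assumption made here for illustration; a real warehouse would take that from each source system's documentation.

```python
# Assumed mapping from source-system spellings to one warehouse code.
GENDER_CODES = {
    "male": "M", "m": "M", "1": "M",
    "female": "F", "f": "F", "0": "F",
}


def standardize_gender(value):
    """Return the common code, or None when the value is unrecognized."""
    return GENDER_CODES.get(str(value).strip().lower())


print([standardize_gender(v) for v in ["Male", "F", 1, 0, "unknown"]])
```

Returning None for unrecognized values leaves them visible for data cleansing instead of guessing.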
Data warehouse - a logical collection of information, gathered from many different operational databases, that supports business analysis activities and decision-making tasks. Its primary purpose is to combine information, more specifically strategic information, throughout an organization into a single repository in such a way that the people who need that information can make decisions and undertake business analysis. The data warehouse then is simply a tool that enables business users, typically managers, to be more effective in many ways, including:
-Developing customer profiles
-Identifying new-product opportunities
-Improving business operations
-Identifying financial issues
-Analyzing trends
-Understanding competitors
-Understanding product performance
data warehousing
maintains information about various types of objects (inventory), events (transactions), people (employees), and places (warehouses)
database
refers to discarded, obsolete, or broken electronic devices
e-waste
a primary key of one table that appears as an attribute in another table and acts to provide a logical relationship between the two tables
foreign key
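A short sketch of a foreign key linking two tables, using Python's built-in sqlite3 module. The customer and booking tables are hypothetical: booking.customer_id is a foreign key that refers to the primary key of customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE booking ("
    "booking_id INTEGER PRIMARY KEY, "
    "customer_id INTEGER REFERENCES customer(customer_id), "
    "hotel TEXT)"
)
conn.execute("INSERT INTO customer VALUES (1, 'Ana')")
conn.execute("INSERT INTO booking VALUES (10, 1, 'Hotel Oxford')")

# The foreign key provides the logical relationship used to join the tables.
rows = conn.execute(
    "SELECT c.name, b.hotel FROM booking b "
    "JOIN customer c ON c.customer_id = b.customer_id"
).fetchall()
print(rows)

# A booking that points at a customer who does not exist is rejected.
try:
    conn.execute("INSERT INTO booking VALUES (11, 99, 'Nowhere Inn')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

Rejecting the orphan booking is one way a relational database increases information integrity.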
a collection of computers, often geographically dispersed, that are coordinated to solve a common problem. With this a problem is broken into pieces and distributed to many machines, allowing faster processing than could occur with a single system
grid computing
A large retailer could track inventory with RFID technology if they had RFID tags, which are electronic identification devices made up of a chip and antenna, on their products. This would allow them to use a RFID reader which is a transmitter/receiver that reads the contents of RFID tags in an area to easily scan through a warehouse and count all of the inventory.
how could RFID help a large retailer track inventory?
is the practice of gathering data and ensuring that it is uniform, accurate, consistent, and complete, including such entities as customers, suppliers, products, sales, employees, and other critical entities that are commonly integrated across organizational systems
master data management (MDM)
is a field (or group of fields) that uniquely identifies a given record in a table
primary key
Stores information in the form of logically related two-dimensional tables. Allows users to create, read, update, and delete data.
Business advantages:
-Increased flexibility
-Increased scalability and performance
-Reduced information redundancy
-Increased information integrity
-Increased information security
relational database