Exam 3 ISM3004
Protocol
"rules of the road"; Requirements that make sure that system functions reliably
What is a Data Warehouse? What are its characteristics?
Collection of databases that supports decision making -Many sources -Operational systems-periodic transfer -Historical data -Fast Queries -Exploration
Network
Collection of devices connected together via communications devices and transmission media
Business Intelligence
Combining aspects of reporting, data exploration & ad hoc queries, & sophisticated data modeling & analysis
Views
Display relationships by combining them for reporting and display
What is the problem with using the public Internet to connect remote offices and mobile users to an organization's private resources?
Easy to use when WAN does not connect, but not safe. Too risky. Lost functionality.
What are "Rogue APs" and why are they a problem?
Enterprise IT runs the network for your business, but an employee who doesn't like the wireless coverage goes out, buys their own wireless router, and plugs it in. The instant this is done, all the security and reliability that has been configured for the network can be bypassed.
ERP
Enterprise Resource Planning System-paychecks, invoices, payments=business transactions/data to seek insights
What is operational data?
Operational Data is exactly what it sounds like - data that is produced by your organization's day to day operations. Things like customer, inventory, and purchase data fall into this category.
What customers are most likely to benefit from a service like Starlink?
Serving people in areas w/ no high speed internet service.
Ad-Hoc Reporting Tools
Tools that put users in control so that they can create custom reports on an as-needed basis by selecting fields, ranges, summary conditions, and other parameters.
What is TPS?
Transaction Processing Systems Examples: ATM, retail sales transactions, websites, searches
What is "information abundance"? What are its implications for knowledge workers?
Treasure side-The world has changed, jobs have changed. Some people not caught up to change. Geek up! Take advantage of opportunities.
What does an Ethernet Switch do?
What you use to connect a group of nodes together. Also has a speed capacity and management capabilities. Port count: available in a variety of port counts based on needs; this limits the number of devices you can hook up.
How do inconsistent data formats impact a business?
When bringing data together from different repositories, data formats can be inconsistent
PoE
PoE-Power Over Ethernet - Using unused ethernet wires for powering other devices
Benefits of using relational databases
-Combines and simplifies data
Cons of ad hoc reporting tools
-Demanding of user -Potentially steep learning curve -Business knowledge -Understand data schema
What are the 2 valid relationship types?
-The field must be found in 2 separate tables -It must be a key field in one of the two tables
Dashboards
-Graphic view of what is happening inside the software system -Some customization -A picture is worth a thousand words
Business operations examples
-Health care: patient data -Michigan tags cows at birth -Transportation: a plane engine produces 10 TB of data every 30 min -Swiss Rails: 100 data items a second
Pros of OLAP
-Huge data -Pre-processed + Summarized -User reports fast
Cons of OLAP
-No access to details; user only sees summary
Characteristics of unstructured data
-Not organized-no schema -Ex: Text (email, facebook pages, news stories, etc) -Binary-images, audio, video
OLAP
-Online analytical processing: the manipulation of information to create business intelligence in support of strategic decision making -Used for enormous amounts of data
Characteristics of structured data
-Organized: Rows and columns, structure -Predefined characteristics known as schemas: rules for organizing data (data type, data ranges)
MapReduce
-Programming model in Hadoop -Map: process input data in parallel -Reduce: combine data from Map to create final results
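A minimal sketch of the Map and Reduce steps as a word count. This is plain Python, not the Hadoop API, and it runs sequentially rather than in parallel; the function names (`map_phase`, `shuffle`, `reduce_phase`) are made up for illustration.

```python
from collections import defaultdict

def map_phase(line):
    """Map: turn one input line into (word, 1) pairs."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group values by key, the step a framework performs between Map and Reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine each key's values into a final count."""
    return {key: sum(values) for key, values in groups.items()}

def word_count(lines):
    # Each line could be mapped on a different machine; here it is a simple loop.
    pairs = [pair for line in lines for pair in map_phase(line)]
    return reduce_phase(shuffle(pairs))

counts = word_count(["big data is big", "data is everywhere"])
print(counts)
```

Because each line is mapped independently, the Map work can be spread across many machines, which is the point of the model.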
What technology was described that can protect against total server failure?
Clustering-When multiple servers share the same file storage
For legacy WiFi Lans (802.11b and 802.11g) What is the max bandwidth? What radio frequency spectra are used?
802.11b: -Max bandwidth: 11 Mbps -Radio frequency: 2.4 GHz; 802.11g: -Max bandwidth: 54 Mbps -Radio frequency: 2.4 GHz
How well does WAN address remote office needs? What about mobile users?
A company has WAN for three offices- Florida, Texas, and Wisconsin. If CEO goes to Colorado for vacation, he cannot connect to WAN. Therefore, WAN works for remote office needs, but not mobile users.
Data Aggregator
A company whose sole job is to collect data from a wide variety of sources and organize it, clean it, and connect it to each other and then sell access to it to others. Example: Acxiom
What is a server? (i.e. what does it do?)
A computer whose job is to provide services, and share resources to other nodes on the network
Domain Registrar - what is it? why would you use one?
A person or entity who helps you to buy and register a domain. Helpfully, most registrars offer intuitive tools that help you search available names. On your part, you simply fill in the name of your choice and make payment. It is important for a business to register a domain name to protect copyrights and trademarks, build credibility, increase brand awareness, and improve search engine positioning.
What is a Router? What does it do?
A router helps you connect multiple devices to the Internet, and connect the devices to each other. Also, you can use routers to create local networks of devices. These local networks are useful if you want to share files among devices or allow employees to share software tools. Also has security features like a firewall, VPN, access control
RAID 5
A technique that stripes data across three or more drives and uses parity checking, so that if one drive fails, the other drives can re-create the data stored on the failed drive. RAID 5 drives increase performance and provide fault tolerance. Windows calls these drives RAID-5 volumes.
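A rough sketch of how parity lets the surviving drives re-create a failed one. It assumes two data blocks and one parity block; real RAID 5 rotates parity across all drives and works at the block-device level, so treat this only as an illustration of the XOR idea. The names `xor_blocks`, `drive_a`, and `drive_b` are hypothetical.

```python
def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)

drive_a = b"ISM3"
drive_b = b"004!"
parity = xor_blocks([drive_a, drive_b])   # would be stored on a third drive

# Simulate losing drive_b: rebuild it from the surviving drive and the parity.
rebuilt_b = xor_blocks([drive_a, parity])
print(rebuilt_b == drive_b)
```

XOR works here because applying it twice with the same value cancels out, so any one missing block can be recovered from the rest.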
What is Automated Data Tiering? How does it address these problems of data growth?
ADT: match storage performance to access frequency; the most frequently accessed data is the most relevant. Current working data: top-tier storage (fastest, highest quality), SSDs. Recently used: mid-tier storage, hard drives. Historical: bottom-tier storage (cheapest/slowest), tape.
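The tiering rule can be sketched as a small function. The cutoffs of 7 and 90 days are assumptions chosen for illustration, not figures from the lecture, and `choose_tier` is a hypothetical name.

```python
def choose_tier(days_since_last_access, hot_days=7, warm_days=90):
    """Pick a storage tier from how recently the data was used.

    hot_days and warm_days are illustrative thresholds only.
    """
    if days_since_last_access <= hot_days:
        return "SSD"          # current working data: fastest tier
    if days_since_last_access <= warm_days:
        return "hard drive"   # recently used: mid tier
    return "tape"             # historical: cheapest, slowest tier

for days in (1, 30, 400):
    print(days, choose_tier(days))
```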
What is AP?
AP (access point): provides coverage to multiple users. Scatter them throughout the building and connect each one to the wired network: unobtrusive, good coverage, many access points.
Be able to recognize each of the last mile technologies and describe it in very general terms (go back to lecture for this maybe-- this makes no sense)
Analog modems: POTS (standard telephone lines). Cable broadband: digital connection over the cable TV system; shared with neighbors. DSL (Digital Subscriber Line): telephone companies' solution using existing telephone wires; works over limited distances, not for rural customers. FTTH: Fiber To The Home. Cellular wireless. Satellite wireless.
What is a "transaction"? What are its two key characteristics?
Any business exchange 1. Standardized-schema 2. Occurs repeatedly
Enterprise Software (3 different kinds)
Applications that address the needs of multiple users throughout an organization or work group.
How does VPN work?
Buy hardware with VPN capability. Connect the VPN hardware to internet service providers. Install VPN software on remote nodes. This connects remote and mobile users.
How do workstation NICs, Ethernet switches, and cables connect together to build a typical Ethernet LAN?
By buying another switch and connecting the two switches together with cable
Cash anonymous
Cash is always anonymous
Workstation
Client PCs that a human being uses to interact with the network and its resources
Client-Server service model
Client requests services, server provides Clear division of labor. Clients consume services and resources. Servers provide services, share resources. Servers are controlled and managed by IT to keep them secured, patched and efficient. Clients are user devices that could be BYOD or corporate supplied etc.
CRM
Customer Relationship Management System: every sales call, every customer inquiry, every follow-up call = data
Information
Data that has been presented in such a way that it answers questions or supports decision making
What is DeDupe? How does it address these problems?
DeDupe: Oftentimes, we have the same data repeatedly stored in our systems. -Match duplication in unstructured data -Single storage for any data. DeDuplication system: goes through the storage system, looks for unique images, and keeps them. When it discovers duplicates, it eliminates the duplicate storage and replaces it with a pointer to the single copy.
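One way to picture the "pointer to the single copy" idea is a store keyed by a content hash. This is only a sketch: the class name `DedupeStore` and the file names are invented, and commercial deduplication systems typically work on blocks or chunks rather than whole files.

```python
import hashlib

class DedupeStore:
    def __init__(self):
        self.blobs = {}     # content hash -> the single stored copy
        self.pointers = {}  # file name -> content hash

    def save(self, name, data):
        digest = hashlib.sha256(data).hexdigest()
        # Store the bytes only if this content has not been seen before.
        self.blobs.setdefault(digest, data)
        self.pointers[name] = digest

    def load(self, name):
        return self.blobs[self.pointers[name]]

store = DedupeStore()
store.save("q3_report.pdf", b"same bytes")
store.save("q3_report_copy.pdf", b"same bytes")
store.save("logo.png", b"different bytes")
print(len(store.pointers), "files,", len(store.blobs), "stored copies")
```

Three file names end up pointing at only two stored copies, which is where the storage saving comes from.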
What is meant by the term "last mile"? Why should an organization care about the last mile?
Describes the portion of the telecommunications network chain that physically reaches the end-user's premises. The core of the internet is fast enough for today's needs, but when you get to the last mile, where it enters a home or business, the speed plummets. Why care? Bandwidth: how much data you can fit through in a certain period of time; a decent amount is measured in megabits/second.
Server
Device attached to the network whose primary purpose is to provide a service to users/workstations.
What is DAS? What problem does it solve?
Distributed Antenna System: used to improve wireless signals in an indoor or outdoor space, essentially anywhere with an obstructed signal. Install multiple small antennas inside a building to boost the cellular signal there.
DNS - What does the acronym mean? What does it do for us?
Domain Name System-DNS translates domain names to IP addresses so browsers can load Internet resources. A global distributed system of servers, software, and protocols that enable us to convert the billions of different host names into the appropriate IP numbers so that the computers and routers can get the work done. All we need to know is the name
Dark Data
Dormant data that is spread across servers on incompatible systems where it cannot be turned into anything of value
What is the rate of data growth?
Doubling every 6 months...unprecedented and will continue regardless of budget restraints.
How do remote offices connect (VPN)?
Each office has a corporate router attached to service provider; router has VPN software built into it. All routers at different offices configured to be single VPN: seems as if they are on the same LAN. Encrypts data before sending it to other offices. Router can also send unencrypted information when going across internet.
Fiber Optic cable is non-conducting - why is that good?
Good for connecting buildings together during lightning strike - voltage will not ruin system
Top CIOs say that data growth is the #1 challenge today. What two problems arise from that challenge?
1. How are we going to handle that explosive growth with constrained budgets? 2. How are we going to exploit that data? Process and present it to managers so they can make decisions.
What is the "backhoe problem"? How do you protect your network against this problem?
If a site is connected by a single cable, one cut (for example, by a backhoe digging nearby) takes away all connectivity. You want redundancy in lines, meaning multiple paths of connection out of the building, so that if something happens to one cable the site does not lose connectivity.
Knowledge
Insight derived from experience and enterprise-savvy information
IP
Internet Protocol. The main delivery system for information over the Internet; enables billions of devices on this planet to communicate at high speed around the world.
Satellite wireless - what is "latency"?
It is the amount of delay, measured in milliseconds (ms), that occurs in a round-trip data transmission to a (geosynchronous) satellite about 22,000 miles away in space. The round trip takes about ½ second, even at the speed of light, because of that distance.
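The ½ second figure can be checked with a back-of-the-envelope calculation. This sketch assumes the satellite is directly overhead, counts only signal travel time (no processing or queuing delay), and assumes four hops: user to satellite to ground station for the request, then the same path back for the reply. The 550 km low-orbit altitude is an assumed round figure for comparison.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458   # in a vacuum
GEO_ALTITUDE_KM = 35_786            # about 22,236 miles
LEO_ALTITUDE_KM = 550               # assumed low-orbit altitude for comparison

def round_trip_ms(altitude_km, hops=4):
    """Propagation delay in milliseconds for a request plus its reply."""
    return hops * altitude_km / SPEED_OF_LIGHT_KM_S * 1000

geo_ms = round_trip_ms(GEO_ALTITUDE_KM)
leo_ms = round_trip_ms(LEO_ALTITUDE_KM)
print(f"GEO: {geo_ms:.0f} ms, LEO: {leo_ms:.1f} ms")
```

The geosynchronous result lands just under half a second, while the low-orbit figure is a few milliseconds, which is why orbit height dominates satellite latency.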
Briefly explain the differences between Low-Earth Orbit (LEO) and Geosynchronous orbit
LEO: satellites are much smaller and their orbits are much closer to Earth, so the rockets needed to launch them are also smaller and cheaper. The downside is that many are needed to cover any specific geographical area. LEO satellites orbit the Earth many times per day at an altitude of 160 to 2,000 km. Geosynchronous orbit (GSO/GEO): the satellite stays in the same place above Earth as Earth turns, meaning it takes one day to complete one orbit. Not many parking spots in orbit. GEO satellites are bigger and more expensive to deploy, but the network operator can gradually add to their coverage as their business grows. Takes a long time to get data out there and back: about 65x farther away than Starlink.
LAN
Local Area Network; a network that covers a relatively small geographic area such as a building or a small campus, with no more than a mile between computers. Ethernet (protocol): -Physical: wires, radio waves -MAC address -Packet structure -Rules for "speaking" and "listening" -802.11 (WiFi)
MAC vs. IP addresses -- what do they do? how are they different?
MAC address (Media Access Control): unique number that identifies the network device on the LAN side (local network); totally unique, tells nothing about where the computer is located, and does not change (example: SSN). IP address: address assigned uniquely to every device on the internet; tells uniquely who it is and where they are, and changes as you move from place to place (example: mailing address).
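A small sketch of the SSN-versus-mailing-address analogy, using Python's standard `ipaddress` module. The MAC address, IP addresses, network ranges, and the `Laptop` class are all made up for illustration.

```python
import ipaddress
from dataclasses import dataclass

@dataclass
class Laptop:
    mac: str                    # fixed identifier of the network adapter
    ip: ipaddress.IPv4Address   # assigned by whichever network it joins

office_lan = ipaddress.ip_network("192.168.1.0/24")
hotel_lan = ipaddress.ip_network("10.0.5.0/24")

laptop = Laptop(mac="00:1A:2B:3C:4D:5E", ip=ipaddress.ip_address("192.168.1.20"))
at_office = laptop.ip in office_lan   # the IP reveals which network it is on

laptop.ip = ipaddress.ip_address("10.0.5.77")   # moved to the hotel network
at_hotel = laptop.ip in hotel_lan

print(laptop.mac, at_office, at_hotel)
```

The MAC string never changes as the laptop moves, while the IP address changes and shows which network the laptop currently sits on.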
For current 802.11ac, what is the max bandwidth and radio frequency spectra?
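-Max bandwidth: 433 Mbps to 6.77 Gbps -Radio frequency: 5 GHz (802.11ac itself uses only the 5 GHz band; dual-band 802.11ac routers also serve 2.4 GHz devices using older standards such as 802.11n)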
How do loyalty cards generate valuable data?
Membership program in which the company pays you through bonuses for data about you that you otherwise would not give them. The company wants to know WHAT was sold to WHOM.
Net Neutrality - what's the basic issue? Who is on each side of the issue and why?
Net neutrality is the idea that internet service providers like Comcast and Verizon should treat all content flowing through their cables and cell towers equally. That means they shouldn't be able to slide some data into "fast lanes" while blocking or otherwise discriminating against other material. In other words, these companies shouldn't be able to block you from accessing a service like Skype, or slow down Netflix or Hulu, in order to encourage you to keep your cable package or buy a different video-streaming service.
What is a table?
Organized collection of data made up of records and fields
Key field
Part of a relational database; one of the fields in a table, whose data items are unique
Field
Part of a table; a column. Attribute for data (fixed schema, textual data). Example: address
Record
Part of a table; a row of data (an individual observation)
What common devices can interfere with WiFi networks? Which radio spectrum is affected?
Phones, Microwaves, etc can interfere because they use 2.4 GHz
Be familiar with the three examples of big data provided in the lecture. How do you see the Three V's in each?
Predictive Policing-Los Angeles Big Data is Cool! Tesco grocery chain Actions speak louder than words
PIG
Programming language of Hadoop
Two basic versions of RAID technology
RAID 1 and RAID 5
Velocity
Rapid arrival=too fast. Cannot react fast enough. Feedback Loop-data comes in and we need to get it into a system and process it
Relational Database
Real power when we correlate data from multiple tables and link them together -multiple tables that are related
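To illustrate linking tables through a shared key field, here is a sketch using Python's built-in `sqlite3` module with an in-memory database. The `customers` and `orders` tables and their contents are invented for the example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        total       REAL
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")

# customer_id appears in both tables and is the key field in customers,
# so the two tables can be related and queried together.
rows = db.execute("""
    SELECT c.name, SUM(o.total)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(rows)
```

Neither table alone can answer "how much has each customer spent?"; the join on the shared field is what makes the question answerable.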
RAID - what risk does this protect against?
Redundant array of inexpensive disks. Protects against hard drive failure.
What is Cybersquatting?
Registering a domain [URL] that you have no rights to [ex: auburntigers.com]
Canned Reports
Reports that provide regular summaries of information in a predetermined format. -answer specific questions -easy for users -IT overhead
Point of Sale Systems
Retail computer systems that collect sales data and are hooked directly into the store's inventory-control system Scan barcode, transaction happens. Data.
How is a Data Mart different from a Data Warehouse?
Same thing, different scale- Looks at specific problem/unit rather than the enterprise
What can a company do about this problem?
Separate data repositories: -One for operational data -One for reporting and analytics. Combine data from many sources, cleaning it. Historical data: builds as days and months go by; used as a resource to see trends. Periodic import from operational systems: keeps the analytical system up to date enough to draw inferences.
What are "data silos"? How do they come into being? Why is this a problem?
Silo: implies that data collections are completely separated with no possibility of communication or sharing -Company may have some data trapped inside obsolete legacy systems -Incompatible systems. Problem: missed opportunities to see patterns, trends, and correlations, develop new insights to answer questions, and make decisions.
What is an SSD? How does it address these problems of data growth?
Solid State Drive: storage that uses flash memory -Faster than magnetic hard drives -Lower latency: no waiting for a disk to spin the data to a place where it can be read -Greater throughput -Lower power consumption: less electricity, cheaper, generates less heat = less AC -RAID: used to link multiple SSDs together, sharing the workload by spreading data among them for improved performance and capacity -Prices dropping: a viable alternative for many forms of corporate data
VPN (Virtual Private Network)
Solution to when WAN and internet cannot be used for business. All data is encrypted. Network that uses a public telecommunication infrastructure, such as the Internet, to provide remote offices or individual users with secure access to their organization's network. Affordable!
HDFS
Stands for Hadoop Distributed File System and is the way that Hadoop structures its files.
Briefly describe the Starlink service and the "constellation" they are building
Starlink is a plan by SpaceX to put 12,000 satellites (a constellation of satellites) into low Earth orbit (LEO) that offer high-speed, low-latency, cheap internet access to anyone anywhere on the planet. Unlimited data for cheap price-beta out right now known as "Better Than Nothing." Placed 340 miles up.
What is SQL?
Structured Query Language. Most common language for creating & manipulating databases. Ruling champion database language in the business world.
SCM
Supply Chain Management System- Each order for finished goods/raw materials=transactions
Sources of customer-provided data
Surveys -customer surveys -product registration cards -contests External sources -General info (weather, news) -Public Records
What is a "site survey"? Why is it important?
Take wireless devices and set them up in temporary locations, then take a sensor and walk around the site to see where signal levels are weak and strong, and tweak the locations until there is good coverage everywhere. Important for business installations and large sites.
Hadoop
Technically: an open source system designed to consume any data you want (unstructured, structured, etc.); a distributed computing platform. Practically: highly scalable, open source, cost-effective, flexible, fault-tolerant
Bandwidth
The amount of data that can be transmitted over a network in a given amount of time. (bits per second)
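A quick worked example of what bandwidth means in practice: how long a file takes to move over a link. This is an idealized sketch that ignores protocol overhead and congestion; the 100 MB file size is an arbitrary assumption, and `transfer_seconds` is a hypothetical helper.

```python
def transfer_seconds(file_megabytes, link_megabits_per_second):
    """Ideal transfer time; ignores protocol overhead and congestion."""
    file_megabits = file_megabytes * 8   # 8 bits per byte
    return file_megabits / link_megabits_per_second

slow = transfer_seconds(100, 11)   # an 11 Mbps link
fast = transfer_seconds(100, 54)   # a 54 Mbps link
print(f"{slow:.1f} s vs {fast:.1f} s")
```

Note the bits-versus-bytes conversion: file sizes are usually quoted in bytes, while link speeds are quoted in bits per second.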
For remote office and mobile user VPNs, identify where on the network diagram the data is encrypted.
The data is encrypted immediately after being sent from remote office/mobile user VPN.
"CAT" ratings for Ethernet cables
The number tells you specifically how the cable has been engineered and how fast it can safely transmit data. CAT 5 is the minimum remotely acceptable quality
Packet
The small unit into which information is broken down before being sent across a network.
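A toy sketch of breaking a message into packets and putting it back together. Real packets also carry addresses and error-checking information, which are omitted here; the 8-byte packet size and the function names are assumptions for illustration.

```python
def to_packets(message, size):
    """Split a message into numbered packets of at most `size` bytes."""
    return [
        {"seq": seq, "payload": message[start:start + size]}
        for seq, start in enumerate(range(0, len(message), size))
    ]

def reassemble(packets):
    """Packets may arrive out of order, so sort by sequence number first."""
    ordered = sorted(packets, key=lambda p: p["seq"])
    return b"".join(p["payload"] for p in ordered)

original = b"Exam 3 covers networks and data."
packets = to_packets(original, size=8)
shuffled = list(reversed(packets))   # simulate out-of-order arrival
print(len(packets), reassemble(shuffled) == original)
```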
What guidance did Mr. Olson offer about WiFi range?
They are radio waves; they don't like things that get in their way, like walls. Maybe 100 feet indoors, more outdoors.
What is "information overload"? What is its alleged impact?
Tumult side: $900 billion cost to the economy. "Drinking from a fire hydrant"
RAID 1 (mirroring)
Two drives are used in unison, and all data is written to both drives, giving you a mirror or extra copy of the data, in the case that one drive fails
URL - what are the component parts and what does each do for you?
Uniform Resource Locator -Tells our web browser what it is we are looking for -The way we can tell where any resource is on the internet. Components of http://www.nytimes.com/tech/index.html: Application transfer protocol {http}: tells software what the data is (podcast, webpage, video, etc.); Hostname {www.}: name of the server that has the data; Domain name {nytimes}: tells what organization on the network owns that host, consists of two or three parts; Top-level domain {.com}: fixed; Path {tech}: tells where within the server the piece of data lives (folder in the file system); File {index.html}: specific piece of content (case sensitive)
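The component breakdown can be reproduced with Python's standard `urllib.parse` module, using the same example URL. The variable names are chosen for illustration; note that the library reports the full host (`www.nytimes.com`) as one piece, so the domain labels are split out by hand.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

parts = urlsplit("http://www.nytimes.com/tech/index.html")

protocol = parts.scheme              # application transfer protocol
host = parts.hostname                # server name plus domain
labels = host.split(".")             # ["www", "nytimes", "com"]
top_level_domain = labels[-1]

path = PurePosixPath(parts.path)
folder = str(path.parent)            # where on the server the content lives
filename = path.name                 # the specific piece of content

print(protocol, host, top_level_domain, folder, filename)
```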
UPS - what risk does this protect against? how does it work?
Uninterruptible Power Supply -Protects against power outages -A battery with electronics in it that is constantly charged from the electricity in the wall. Clean, filtered power is then provided to the servers. If power is gone altogether, the batteries provide the power needed to keep the servers running.
Copper UTP
Unshielded Twisted Pair: cable contains 4 sets of 2 wires each; each pair is twisted at a different rate so that the signals going down one pair do not interfere with the signals going down the other pairs -Star topology: each cable goes to one node -Distance: cables can be up to 90 m in the wall, 10 m to equipment (short runs) -Quality: "CAT rating" tells us how the cable was engineered and how fast it can safely transmit data -Installation: not DIY; professional installation with test results for every single cable
What is the single biggest cause of data loss? How do you protect against that risk?
User error. Back everything up. Data and image backup. Image is a snapshot of the entire contents of the hard drive so that if the server crashes you can restore the server image to a previous image in time. Back up systems- tape, disk, software
Pros of ad hoc reporting tools
Users define their own reports; powerful/flexible
Fiber Optic Cabling
Uses glass or plastic fiber to carry information as light pulses -Long runs: connecting buildings together, etc. -Multimode fiber: up to 550 meters -Single-mode fiber: 5 km to 40 km -Made out of glass: does not conduct electricity -Network infrastructure: between buildings, closets
Variety
Variety too great; too little consistency: Text...Images...Sound...Video...Human input...sensors...servers...
What three characteristics are necessary for something to be "Big Data"? (three V's)
Volume, Velocity, Variety
How do mobile users connect to a VPN?
We do not trust alternate routers/internet. Install VPN software on mobile devices (CEO's laptop). VPN then encrypts data and then sends to hotel internet and then to corporate router in another state and then decrypts info.
WAN-What is it and how is it constructed?
Wide Area Network- dedicated private data circuits that stretch from state to state to different offices. Redundant, high speed, expensive. Does not work for mobile users, only remote offices.
WAN
Wide Area Network; largest type of network in terms of geographic area; largest WAN is the Internet
Ethernet
a system for connecting a number of computer systems to form a local area network, with protocols to control the passing of information and to avoid simultaneous transmission by two or more systems.
Analytics
a term describing the extensive use of data, statistical and quantitative analysis, explanatory and predictive models, and fact-based management to drive decisions and actions -Base decisions on data and analysis
Peer-to-Peer service model
all machines are equal (considered servers and clients at the same time) Everyone is a client and a server. Every computer could share its resources and provide services with the others. Hard to manage especially as you scale up. Reduced security and reliability.
Node
any device connected to a network
How does the analysis of operational data compete with customers?
delays and lost sales due to significant amount of additional load to the system during business hours (best if we do not query operational data)
Pros of Canned Reports
easy + useful
Cons of Canned Reports
inflexible + IT overhead
Volume
the notion that data is "too big" to be analyzed with traditional methods (hundreds of millions of data items)
Satellite wireless
radio transmission systems in space
Data
raw facts and figures-tells you nothing alone-very valuable, but needs to be turned into information. Data integrity is key and you must understand your data schema
Data Mining
the process of analyzing data to extract information not offered by the raw data alone -Enormous historical datasets -Identify patterns -Build Models -Predict Future