BCIS unit 3 & 4
Primary Key
a field or set of fields that uniquely identifies the record
Hadoop two primary components
- A data processing component (MapReduce) - A distributed file system (Hadoop Distributed File System, HDFS)
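The MapReduce idea can be sketched in a few lines of plain Python. This is only an illustration of the map, shuffle, and reduce steps on a word count, not Hadoop's actual API; the function names and the sample documents are made up for the example.

```python
from collections import defaultdict

def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce step: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}

# Stand-ins for blocks of a large file spread across a distributed file system.
documents = ["big data big value", "data lake"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
word_counts = reduce_phase(shuffle(pairs))
print(word_counts)
```

In a real cluster the map calls would run in parallel on the servers that hold each block of data, which is why the processing component and the file system are paired.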
In-memory database (IMDB)
- A database management system that stores the entire database in random access memory (RAM) - Provides access to data at rates much faster than storing data on some form of secondary storage - Enables the analysis of big data and other challenging data-processing applications - Performs best on multiple multicore CPUs
Private Cloud Computing
- A single-tenant cloud - Organizations often implement one due to concerns that their data will not be secure in a public cloud - Can be divided into two types: on-premise and service provider managed
Wi-Fi
- A wireless telecommunications technology brand owned by the Wi-Fi Alliance - The user's device has a wireless adapter that translates data into a radio signal and transmits it using an antenna - A wireless access point (a transmitter with an antenna) receives the signal and decodes it; in the other direction, it translates data into a radio signal and sends it to the device's wireless adapter
Three most common network topologies
- star network - bus network - mesh network
Software Defined Networking (SDN)
- An emerging approach to networking - Allows network administrators to have programmable central control of the network via a controller without requiring physical access to all the network devices - Google is implementing Andromeda, the underlying SDN architecture that will enable Google's cloud computing services to scale better, more cheaply, and more quickly
ACID properties
- Atomicity: all changes to data are performed as if they are a single operation (either all of them happen or none do) - Consistency: data is in a consistent state when a transaction starts and when it ends (everything adds up) - Isolation: the intermediate state of a transaction is invisible to other transactions; all transactions are separate - Durability: after a transaction completes successfully, its changes persist and will not be undone
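Atomicity is the easiest of these properties to see in code. The sketch below uses Python's built-in sqlite3 module; the accounts table, the names, and the CHECK constraint are assumptions made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, source, target, amount):
    """Move money between two accounts as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False

# The debit would make alice's balance negative, so the whole transfer is undone,
# including the credit to bob that had already been applied.
ok = transfer(conn, "alice", "bob", 500)
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, balances)
```

Because the credit and the debit are in one transaction, a failure in the second statement leaves both balances exactly as they were.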
Affiliation IDs and their affiliations
- Biz: business sites - Com: all types of entities, including nonprofits, schools, and private individuals (ex: Google's domain name is google.com, so its affiliation ID is Com) - Edu: post-secondary educational sites - Gov: government sites - Net: networking sites - Org: nonprofit organization sites
considerations when building a database
- Content: what data should be collected? cost? - Access: what data should be provided to which users and when? - Logical structure: how should data be arranged so that it makes sense? - Physical organization: where should data be physically located? - Archiving: how long to store? - Security: how can data be protected?
Two broad categories of communications media
- Guided (Wired) transmission media - Wireless
Challenges of Big Data
- How to choose what subset of the data to store - Where and how to store the data - How to find the nuggets of data that are relevant to the decision making at hand - How to derive value from the relevant data - How to identify which data needs to be protected from unauthorized access
Examples of big data
- Retail organizations monitor social networks to engage brand advocates, identify brand adversaries - Advertising and marketing agencies track comments on social media - Hospitals analyze medical data and patient records - Consumer product companies monitor social networks to gain insight into consumer behavior - Financial service organizations use data to identify customers who are likely to be attracted to increasingly targeted and sophisticated offers
Accessing the Internet
- There are several ways, including using a LAN Server, telephone lines, a high-speed service, or a wireless network - Dial-up internet connection uses modem and standard phone line - Other options include cable modem connections, DSL connections, and satellite connections
Guided transmission media types
- Twisted-pair wire - Coaxial cable - Fiber-optic cable
NoSQL database advantages
- ability to spread data over multiple servers so that each server contains only a subset of the total data - do not require a predefined schema - data structures are more flexible and can provide improved access speed and redundancy
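A rough sketch of the first two advantages: records with different fields (no predefined schema) are spread over several servers by hashing each key. The server names and sample records are hypothetical, and real NoSQL products use more elaborate partitioning schemes.

```python
import hashlib

SERVERS = ["server-0", "server-1", "server-2"]  # hypothetical server names

def shard_for(key):
    """Pick a server from a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]

# Schema-free records: each one may carry different fields.
records = {
    "user:1": {"name": "Ana", "email": "ana@example.com"},
    "user:2": {"name": "Ben", "tags": ["vip"]},
    "order:9": {"total": 42.5},
}

# Each server ends up holding only a subset of the total data.
shards = {server: {} for server in SERVERS}
for key, doc in records.items():
    shards[shard_for(key)][key] = doc

for server, docs in shards.items():
    print(server, sorted(docs))
```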
Types of IoT applications
- connect and monitor - control and react - predict and adapt - transform and explore
Data management factors
- the need to meet external regulations designed to manage risk associated with financial misstatement - the need to avoid the inadvertent release of sensitive data - the need to ensure that high data quality is available for key decisions
data governance requires business leadership and active participation
- use of a cross-functional team is recommended - team should consist of executives, project managers, line-of-business managers, and data stewards - a data steward is an individual responsible for management of critical data elements
ETL process
- Extract - Transform - Load
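The three steps can be sketched as three small functions. This is a toy pipeline under stated assumptions: the CSV text, the sales table, and the cleaning rules are invented for the example, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

RAW_CSV = "customer,amount\n ana ,10.50\nBEN,n/a\ncara,7\n"  # made-up source data

def extract(text):
    """Extract: read rows from the source (here, CSV text)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalize names and drop rows whose amount is not numeric."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows that fail the edit
        cleaned.append((row["customer"].strip().title(), amount))
    return cleaned

def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute("SELECT customer, amount FROM sales ORDER BY customer").fetchall()
print(loaded)
```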
How the Internet of Things (IoT) works
1. Sensors gather data 2. Data passes over the network 3. Data from across the IoT is gathered and stored, often in the cloud 4. Data is combined with other data from other systems 5. Data is analyzed to gain insights into the operation of devices on the IoT 6. Alerts are sent to people, enterprise systems, or IoT devices based on these insights
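The six steps above can be sketched as a tiny pipeline. Everything here is hypothetical: the device names, the temperature threshold, and the hard-coded readings stand in for real sensors, networks, and cloud storage.

```python
from statistics import mean

THRESHOLD_C = 80.0  # hypothetical alert threshold, in degrees Celsius

def gather():
    """Steps 1-2: sensor readings arriving over the network (hard-coded here)."""
    return [
        {"device": "pump-1", "temp_c": 71.0},
        {"device": "pump-1", "temp_c": 90.0},
        {"device": "pump-2", "temp_c": 65.0},
    ]

def store(readings, storage):
    """Step 3: collect readings per device in a central store."""
    for reading in readings:
        storage.setdefault(reading["device"], []).append(reading["temp_c"])

def combine(storage, asset_info):
    """Step 4: combine sensor data with data from another system."""
    return {
        device: {"temps": temps, "site": asset_info.get(device, "unknown")}
        for device, temps in storage.items()
    }

def analyze(combined):
    """Steps 5-6: analyze the data and build alerts for devices over the threshold."""
    alerts = []
    for device, info in combined.items():
        average = mean(info["temps"])
        if average > THRESHOLD_C:
            alerts.append(f"{device} at {info['site']}: average {average:.1f} C")
    return alerts

storage = {}
store(gather(), storage)
combined = combine(storage, {"pump-1": "plant-a", "pump-2": "plant-b"})
alerts = analyze(combined)
print(alerts)
```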
IP address
A number that identifies a computer on the Internet: 32 bits long in IPv4 and 128 bits long in IPv6
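Python's standard ipaddress module shows that an IP address is just a number with a fixed bit length: 32 bits for IPv4 and 128 bits for IPv6. The two addresses below come from ranges reserved for documentation and are used only as examples.

```python
import ipaddress

v4 = ipaddress.ip_address("192.0.2.10")   # example IPv4 address
v6 = ipaddress.ip_address("2001:db8::1")  # example IPv6 address

# max_prefixlen is the total number of bits in the address.
print(v4.version, v4.max_prefixlen)
print(v6.version, v6.max_prefixlen)

# The dotted notation is a readable form of a single integer.
print(int(v4))
```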
Uniform Resource Locator (URL)
A Web address that specifies the exact location of a Web page using letters and words that map to an IP address and a host location
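A URL can be split into its parts with the standard urllib.parse module. The address below is a made-up example; the host name is the part that gets mapped to an IP address, and the path gives the location on that host.

```python
from urllib.parse import urlsplit

parts = urlsplit("https://www.example.com/catalog/index.html?item=42")

print(parts.scheme)    # protocol used to fetch the page
print(parts.hostname)  # host name that maps to an IP address
print(parts.path)      # location of the page on that host
print(parts.query)     # extra parameters passed to the server
```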
HTML Tag
A code that tells the Web browser how to format text (as a heading, as a list, or as body text) and whether images, sound, and other elements should be inserted. ex: <p style="text-align: center"> starts a centered paragraph
Data Definition Language (DDL)
A collection of instructions and commands used to define and describe data and relationships in a specific database. Allows the database's creator to describe the data and relationships that are to be contained in the schema.
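In SQL-based systems, the DDL is the set of CREATE, ALTER, and DROP statements. A minimal sketch with Python's built-in sqlite3 module follows; the department and employee tables are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: describe the data (columns and types) and the relationship between tables.
conn.executescript("""
CREATE TABLE department (
    dept_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    dept_id INTEGER REFERENCES department(dept_id)
);
""")

# SQLite keeps the resulting schema in its sqlite_master catalog table.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(tables)
```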
Cloud Computing
A computing environment where software and storage are provided as an Internet service and are accessed with a Web browser. - Advantages to businesses: they can save on system design, installation, and maintenance; they can increase efficiency and reduce the costs of new product and service launches; employees can access corporate systems from any Internet-connected computer.
Client
any computer that sends messages requesting services from the servers on the network
data warehouse
A large database that collects business information from many sources in the enterprise, covering all aspects of the company's processes, products, and customers, in support of management decision making. helps relate information in innovative ways.
Bus Network
A network in which all network devices are connected to a common backbone that serves as a shared communications medium. Long backbone with network nodes branching off of it.
Star Network
A network in which all network devices connect to one another through a single central device called the hub node. The hub node sits in the center with network nodes branching out from it.
Internet of Things (IoT)
A network of physical objects or "things" embedded with sensors, processors, software, and network connectivity capability to enable them to exchange data with the manufacturer of the device, device operators, and other connected devices.
Mesh Network
A network that uses multiple access points to link a series of devices that speak to each other to form a network connection across a large area. Lots of communicating network nodes.
Data Lifecycle Management (DLM)
A policy-based approach to managing the flow of an enterprise's data, from its initial acquisition or creation and storage to the time when it becomes outdated and is deleted.
JavaScript
A popular programming language for client-side applications. Used to create Web pages that respond to user actions.
Search Engine Optimization (SEO)
A process for driving traffic to a Web site by using techniques that improve the site's ranking in search results.
Public Cloud Computing
A service provider owns and manages the infrastructure, with cloud user organizations (tenants) accessing slices of shared hardware resources via the Internet - Can be faster, cheaper, and more agile than building and managing your own IT infrastructure - Data security is a key concern because when using a public cloud computing service, you are relying on someone else to safeguard your data
Hadoop
An open-source software framework that includes several software modules that provide a means for storing and processing extremely large data sets. Can be used as a staging area for data to be loaded into a data warehouse or data mart.
Data Manipulation Language (DML)
A specific language, provided with a DBMS, which allows users to access and modify the data, to make queries, and to generate reports.
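In SQL-based systems, the DML is the set of INSERT, UPDATE, DELETE, and SELECT statements. The sketch below uses Python's built-in sqlite3 module with a made-up product table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (sku TEXT PRIMARY KEY, price REAL)")

# DML: add, modify, and remove data.
conn.executemany("INSERT INTO product VALUES (?, ?)", [("A1", 9.99), ("B2", 24.50)])
conn.execute("UPDATE product SET price = 19.99 WHERE sku = 'B2'")
conn.execute("DELETE FROM product WHERE sku = 'A1'")

# DML: query what is left.
rows = conn.execute("SELECT sku, price FROM product").fetchall()
print(rows)
```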
Bluetooth
A wireless communications specification that describes how cell phones, computers, personal digital assistants, etc., can be interconnected
Other programming languages used for Web development include:
ASP.NET, C, C++, Perl, PHP, and Python
Popular tools for creating Web pages and managing Web sites
Adobe Dreamweaver, RapidWeaver (for Mac developers), and Nvu
data management
An integrated set of functions that defines the processes by which data is obtained, certified fit for use, stored, secured, and processed in such a way as to ensure that the accessibility, reliability, and timeliness of the data meet the needs of the data users within an organization.
Java
An object-oriented programming language from Sun Microsystems, based on C++. Allows small programs (applets) to be embedded within an HTML document. *Can be used on any computer*
ARPANET (Advanced Research Projects Agency Network)
Ancestor of the Internet. Project started by the U.S. Department of Defense (DoD) in 1969.
Internet Service Provider (ISP)
Any organization that provides Internet access to people.
Examples of using sensors and the IoT to monitor and control key operational activities
Asset monitoring, construction, agriculture, manufacturing, monitoring parking spaces, predictive maintenance, retailing, and traffic monitoring
Amazon Web Services (AWS)
Basic infrastructure that Amazon employs to make the contents of its online catalog available to other Web sites or software applications
HTML file
Draws on a CSS file (fonts, colors, layout) and an XML file (content), which together make up the look of a Web page
On-premise private cloud
Cloud infrastructure is deployed by an organization in its own data centers on its premises. Provides complete control over the infrastructure and data. Enables standardization of IT resources, processes, and services.
Connecting via LAN server
Connection method of businesses and organizations that manage a local area network (LAN)
The World Wide Web (Web)
Consists of server and client software, the hypertext transfer protocol (http), standards, and markup languages that combine to deliver information and services over the Internet
Sources of an organization's useful data
Documents, data from business apps, social media, sensor data, media, machine log data, public data, and archives, which together make up the organization's Big Data.
traditional approach to data management
Each distinct operational system used data files dedicated to that system
Hyperlink
Highlighted text or graphics in a Web document that, when clicked, opens a new Web page containing related content. - Using these, Web users can jump between Web Pages stored on various Web servers, creating the illusion of interacting with one big computer
database approach to data management
Information systems share a pool of related data. Offers the ability to share data and information resources. A database management system (DBMS) is required.
instant messaging
The online, real-time communication between two or more people who are connected via the Internet.
wireless connection
Internet service over cellular and Wi-Fi networks has become common
predict and adapt
IoT application - Degree of sensing: External data is used to augment sensor data - Degree of action: Data used to perform predictive analysis and initiate preemptive action
Control and react
IoT application - Degree of sensing: Individual devices each gathering a small amount of data - Degree of action: Automatic monitoring combined with remote control with trend analysis and reporting
Transform and explore
IoT application - Degree of sensing: Sensor and external data used to provide new insights - Degree of action: New business models, products, and services are created
connect and monitor
IoT application - Degree of sensing: individual devices each gathering a small amount of data - Degree of Action: Enables manual monitoring using simple threshold-based exception alerting
mobile device management (MDM) software
Manages and troubleshoots mobile devices remotely, pushing out applications, data, patches, and settings. A central control group can maintain group policies for security, control system settings, ensure malware protection is in place for mobile devices used across the network, and make it mandatory to use passwords to access the network.
relational DBMS for individuals and workgroups
Microsoft Access, IBM Lotus Approach, Google Base, OpenOffice Base
Open source relational DBMS
MySQL, PostgreSQL, MariaDB, SQLite, CouchDB (note: CouchDB is a document-oriented NoSQL database rather than a relational one)
The internet and the web have provided an online access to:
News, Education and training, Job information, messaging, Conferencing, blogging, podcasting, vlogging, media and entertainment, music, TV, Games, shopping, and Maps
Internet backbone
One of the Internet's high-speed, long-distance communications links.
relational DBMS for workgroups and enterprise
Oracle, IBM DB2, Sybase Adaptive Server, Teradata, Microsoft SQL Server, Progress OpenEdge
Network Management Software
Protects software from being copied, modified, or downloaded illegally. Locates telecommunications errors and potential network problems.
NoSQL database
Provides a means to store and retrieve data that is modeled using some means other than the simple two-dimensional tabular relations used in relational databases
Database Activities
- Providing a user view of the database - Adding and modifying data - Storing and retrieving data - Manipulating the data and generating reports
Database Administrator (DBA)
Skilled and trained IS professionals - work with users to define their data needs - apply database programming languages to craft a set of databases to meet those needs - test and evaluate databases - implement changes to improve their databases' performance - assure that data is secure from unauthorized access
the Social Web
Social networking Web sites enable users to share information about themselves and to find, meet, and converse with others
Internet Censorship
Some countries try to control Internet content and services
Web 2.0
The Web as a computing platform that supports software applications and the sharing of information among users.
Autonomic computing
The ability of IT systems to manage themselves and adapt to changes in the computing environment, business policies, and operating objectives. - Goal: to create complex systems that run themselves, while keeping the system's complexity invisible to the end user - Addresses four key functions: self-configuring, self-healing, self-optimizing, and self-protecting
Hotspot
The area covered by one or more interconnected wireless access points
Database as a Service (DaaS)
The database is stored on a service provider's servers. The database is accessed by the client over a network, typically the Internet. Database administration is handled by the service provider. ex: Amazon Relational Database Service (Amazon RDS)
The internet size and impact
The Internet is international in scope, with users on every continent. Internet sites have a profound impact on world politics. The number of worldwide Internet users is expected to continue growing.
Extensible Markup Language (XML)
The markup language designed to transport and store data on the Web. - The key to Web services - Used within a Web page to describe and transfer data between Web service applications
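A short sketch of XML describing data rather than formatting, read with Python's standard xml.etree.ElementTree module. The order document and its tag names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Tags describe what the data means, not how it should look.
XML_TEXT = """<order id="1001">
  <customer>Ana</customer>
  <item sku="A1" qty="2"/>
  <item sku="B2" qty="1"/>
</order>"""

root = ET.fromstring(XML_TEXT)
order_id = root.get("id")
customer = root.findtext("customer")
quantities = {item.get("sku"): int(item.get("qty")) for item in root.findall("item")}

print(order_id, customer, quantities)
```

Any application that knows these tag names can read the same data, which is what makes XML useful for exchanging data between Web services.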
channel bandwidth
The rate at which data is exchanged, usually measured in bits per second (bps)
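Because bandwidth is quoted in bits per second while file sizes are usually quoted in bytes, a quick conversion helps. This is an idealized estimate that ignores protocol overhead and congestion; the file size and link speed are example values.

```python
def transfer_seconds(size_bytes, bandwidth_bps):
    """Ideal time to send size_bytes over a channel of bandwidth_bps bits per second."""
    return size_bytes * 8 / bandwidth_bps

# A 25-megabyte file over a 100 Mbps link: 200,000,000 bits / 100,000,000 bps.
seconds = transfer_seconds(25_000_000, 100_000_000)
print(seconds)
```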
network topology
The shape or structure of a network, including the arrangement of the communication links and hardware devices on the network.
Hypertext Markup Language (HTML)
The standard page description language for Web pages. - tells the browser how to display font characteristics, paragraph formatting, page layout, image placement, hyperlinks, and the content of a web page
Storing and retrieving data
When an application program needs data, it requests the data through the DBMS. Requesting the data through the DBMS is a process called querying. Concurrency control deals with the situation in which two or more users or applications need to access the same record at the same time.
Wireless Technologies
Wireless transmission involves the broadcast of communications in one of three frequency ranges: radio, microwave, or infrared. In some cases, use of wireless communications is regulated, meaning the signal must be broadcast within a specific frequency range to avoid interference with other wireless transmissions.
Radio frequency
Wireless transmission operating in the 3 kHz-300 MHz range - Advantages: Supports mobile users; costs are dropping - Disadvantages: Signal is highly susceptible to interception
microwave (terrestrial and satellite) frequency range
Wireless transmission operating with a high-frequency radio signal (300 MHz-300 GHz) sent through the atmosphere and space (often involves communications satellites) - Advantages: Avoids cost and effort to lay cable or wires; capable of high-speed transmission - Disadvantages: Must have unobstructed line of sight between sender and receiver; signal is highly susceptible to interception - Common forms of satellite communications: geostationary satellite and low earth orbit (LEO) satellite
Infrared frequency range
Wireless transmission with signals in the 300 GHz-400 THz frequency range - Advantages: Lets you move, remove, and install devices without expensive wiring - Disadvantages: Must have unobstructed line of sight between sender and receiver; transmission is effective only for short distances
Connection via internet service providers
You must have an account with the service provider along with software and devices that support a connection via TCP/IP
a bit
a binary digit that represents a circuit that is either on or off
attribute
a characteristic of an entity
Web site
a collection of pages on one particular topic, accessed under one Web domain
record
a collection of related data fields
file
a collection of related records
Schema
a description of the entire database. can be part of the database or a separate schema file. DBMS can reference a schema to find where to access the requested data in relation to another piece of data.
data dictionary
a detailed description of all the data used in the database - Can also include a description of data flows, information about the way records are organized, and the data-processing requirements
Sensor
a device that is capable of sensing something about its surroundings such as pressure, temperature, humidity, pH level, motion, vibration, or level of light
data model
a diagram of data entities and their relationships
Cascading Style Sheets (CSS)
a file or portion of an HTML file that defines the visual appearance of content in a Web page - uses special HTML tags to globally define characteristics for a variety of page elements as well as how those elements are laid out on the Web page
Database Management System (DBMS)
a group of programs that manipulate the database and provide an interface between the database and its users and other application programs. can produce a wide variety of documents, reports, and other output that can help organizations achieve their goals.
field
a name, number, or combination of characters that describes an aspect of a business object or activity
Extranet
a network based on Web technologies that links resources of a company's intranet with its customers, suppliers, or other business partners
Data Administrator (DA)
a nontechnical position responsible for defining and implementing consistent principles for a variety of data issues including setting data standards and data definitions that apply across all the databases in an organization. can be a high-level position reporting to top-level managers
entity
a person, place, or thing for which data is collected, stored, and maintained
broadband communications
a relative term; a telecommunications system that can transmit data very quickly
Virtual Private Network (VPN)
a secure connection between two points across the Internet
relational model
a simple but highly useful way to organize data into collections of two-dimensional tables called relations. each row in the table represents an entity and each column represents an attribute of that entity. if relations share at least one common attribute they can be linked to provide useful information.
SQL (Structured Query Language)
a special-purpose programming language for accessing and manipulating data stored in a relational database
data mart
a subset of a data warehouse that is used by small- and medium-sized businesses and departments within large companies to support decision making - a specific area in the data mart might contain more detailed data than the data warehouse
Near Field Communication (NFC)
a very short-range wireless connectivity technology designed for consumer electronics, cell phones, and credit cards
query by example (QBE)
a visual approach to developing database queries or requests
in 1986 SQL was
adopted by ANSI as the standard query language for relational databases.
Intranet
an internal corporate network built using internet and world wide web standards and technologies
Database
an organized collection of data. a collection of integrated and related files.
hierarchy of data
bits, characters, fields, records, files, and databases
Service provider managed private cloud
built and managed by a service provider, which provides guaranteed security of cloud information
manipulating data by joining
combining two or more tables
manipulating data by linking
combining two or more tables through common data attributes to form a new table with only the unique data attributes
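A minimal sketch of combining two tables through a common attribute, using Python's built-in sqlite3 module. The tables and rows are made up; dept_id is the shared attribute that ties them together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, emp_name TEXT, dept_id INTEGER);
INSERT INTO department VALUES (1, 'Sales'), (2, 'IT');
INSERT INTO employee VALUES (10, 'Ana', 1), (11, 'Ben', 2), (12, 'Cara', 1);
""")

# Match rows on the common dept_id attribute and keep only the columns of interest.
joined = conn.execute("""
    SELECT e.emp_name, d.dept_name
    FROM employee AS e
    JOIN department AS d ON e.dept_id = d.dept_id
    ORDER BY e.emp_id
""").fetchall()
print(joined)
```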
satellite transmission
communications satellites are relay stations that receive signals from one earth station and rebroadcast them to another
Hybrid Cloud Computing
composed of both private & public clouds integrated through networking - Organizations typically use the public cloud to run applications with less sensitive security requirements, and run more critical applications on the private portion of the cloud
SQL databases
conform to ACID properties
Local Area Network (LAN)
connects computer systems and devices within a small area such as an office or a home
Wide Area Network (WAN)
connects large geographic regions. Consists of: - computer equipment owned by the user - data communications equipment and telecommunications links provided by various carriers and service providers. Communications may involve transborder data flow.
Metropolitan Area Network (MAN)
connects users and their devices in an area that spans a campus or city
Routing messages over the internet
data is transmitted from one host computer to another on the internet
Enterprise data modeling
data modeling done at the level of the entire enterprise. provides a roadmap for building databases and information systems.
entity-relationship (ER) diagrams
data models that use basic graphical symbols to show the organization of and relationships between data. help ensure that the logical structure of application programs is consistent with the data relationships in the database.
Data governance
defines the roles, responsibilities, and processes for ensuring that data can be trusted and used by the entire organization
manipulating data by projecting
eliminating columns in a table
manipulating data by selecting
eliminating rows according to certain criteria
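Selecting and projecting can be sketched on a plain list of rows: selecting keeps or drops whole rows, while projecting keeps or drops whole columns. The employee rows and the helper names select and project are made up for the illustration.

```python
employees = [
    {"emp_id": 10, "name": "Ana", "dept": "Sales", "salary": 52000},
    {"emp_id": 11, "name": "Ben", "dept": "IT", "salary": 61000},
    {"emp_id": 12, "name": "Cara", "dept": "Sales", "salary": 48000},
]

def select(rows, predicate):
    """Selecting: eliminate rows that do not meet the criteria."""
    return [row for row in rows if predicate(row)]

def project(rows, columns):
    """Projecting: eliminate every column except the ones listed."""
    return [{col: row[col] for col in columns} for row in rows]

sales_only = select(employees, lambda row: row["dept"] == "Sales")
names_only = project(sales_only, ["name"])
print(names_only)
```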
Internet Protocol (IP)
enables computers to route communications traffic from one network to another
Big Data
extremely large and complex data collections - traditional data management software, hardware, and analysis processes are incapable of dealing with them. Three characteristics: Volume, Velocity, and Variety
client/server architecture
features multiple computer platforms dedicated to special functions, e.g., database management, printing, or communications
DBMS Front-end applications
interact directly with people
coaxial cable
guided transmission media type with inner conductor wire surrounded by insulation. - Advantages: cleaner and faster data transmission than twisted-pair wire - Disadvantages: more expensive than twisted-pair wire
fiber-optic cable
guided transmission media type with many extremely thin strands of glass bound together in a sheathing; uses light beams to transmit signals. - Advantages: diameter of cable is much smaller than coaxial, less distortion of signal, capable of high transmission rates - Disadvantages: expensive to purchase and install
Twisted-Pair wire
guided transmission media type with twisted pairs of copper wire, shielded or unshielded, used for telephone service. - Advantages: widely available - Disadvantages: limitations on transmission speed and distance
Data Validation
identifying bad data and rejecting it at the time of data entry.
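A small sketch of rejecting bad data at entry time. The field names and the rules (required name, age range, basic email check) are assumptions chosen for the example, not rules from any particular system.

```python
def validate_record(record):
    """Return a list of problems; an empty list means the record is accepted."""
    errors = []
    if not record.get("name", "").strip():
        errors.append("name is required")
    age = record.get("age")
    if not isinstance(age, int) or not 0 <= age <= 120:
        errors.append("age must be an integer from 0 to 120")
    if "@" not in record.get("email", ""):
        errors.append("email must contain '@'")
    return errors

good = {"name": "Ana", "age": 34, "email": "ana@example.com"}
bad = {"name": " ", "age": 250, "email": "not-an-email"}

print(validate_record(good))
print(validate_record(bad))
```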
search engine
a tool that finds information on the Web based on keywords the user specifies - the market is dominated by Google
DBMS Back-end applications
interact with other programs or applications ex- The Library of Congress (LOC) provides a back-end application that allows Web access to its databases, which include references to books and digital media in the LOC collection.
a byte
made up of 8 bits. each one represents a character.
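The bit-to-byte-to-character relationship can be shown directly, assuming a single-byte encoding such as ASCII (in other encodings, one character can take more than one byte).

```python
char = "A"

byte_value = char.encode("ascii")[0]  # the character stored as one byte
bits = format(byte_value, "08b")      # that byte written out as 8 bits

print(byte_value, bits)

# Going the other way: 8 bits back to a character.
restored = chr(int(bits, 2))
print(restored)
```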
Microsoft's .NET platform
product that allows developers to use various programming languages to create and run programs - many other products make it easy to develop Web content and interconnect Web services as well
domain
range of allowable values for a data attribute
Internet Corporation for Assigned Names and Numbers (ICANN)
responsible for managing IP addresses and Internet domain names - domain names must adhere to strict rules
Database Server
sends only the data that meets a specific query—not the entire file
IP Protocol
set of rules used to pass packets from one host to another
Guided (wired) transmission media
signals are guided along a solid medium
Personal Area Network (PAN)
supports the interconnection of information technology close to one person. Personal and private accounts.
network operating system (NOS)
systems software that controls the computer systems and devices on a network ex- Linux, UNIX, Windows Server, and Mac OS X
Data lake
takes a "store everything" approach to big data, saving all the data in its raw and unaltered form - also called an enterprise data hub - raw data is available when users decide just how they want to use the data - only when the data is accessed for a specific analysis is it extracted from the data lake
Computer Network
the communications media, devices, and software needed to connect two or more computer systems or devices. Organizations can use networks to share hardware, programs, and databases.
Network nodes
the computers and devices on the networks
The internet
the infrastructure on which the Web exists - Made up of computers, network hardware such as routers and fiber-optic cables, software, and the TCP/IP protocols
Tunneling
the process by which VPNs transfer information by encapsulating traffic in IP packets over the Internet
Data Cleansing
the process of detecting and then correcting or deleting incomplete, incorrect, inaccurate, or irrelevant records that reside in a database. Also called data cleaning or scrubbing. The cost can be quite high.
wireless
the signal is broadcast over airwaves as a form of electromagnetic radiation
data item
the specific value of an attribute
Transmission Control Protocol (TCP)
the widely used transport layer protocol that most internet applications use with IP
Web Browser
Web client software used to view Web pages. ex: Internet Explorer, Firefox, Chrome, and Safari