DSS Chapter 6
database server
A computer in a client/server environment that is responsible for running a DBMS to process SQL statements and perform database management tasks
database administration
Database design and management group responsible for defining and organizing the structure and content of the database, and for maintaining the database. Typically found in large companies.
Data Mining
Finds hidden patterns and relationships in large databases and infers rules from them to predict future behavior; discovery-driven
Databases and the Web: 1. Web server 2. Application servers or CGI 3. Database server
Firms use the web to make information from their internal databases available to customers and partners. What makes this possible? (Web interfaces provide familiarity to users and savings over redesigning legacy systems.)
1. Redundant and inconsistent data produced by multiple systems 2. Data input errors
How are data quality problems caused?
Amazon Relational Database Service
Offers MySQL, Microsoft SQL Server, and Oracle Database engines
1. Software for database querying and reporting 2. Multidimensional data analysis (OLAP) 3. Data mining
Once data are gathered, tools are required for consolidating and analyzing them and for using the insights to improve decision making
byte
a group of bits that represents a single character, which can be a letter, a number, or another symbol
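A small sketch of the bit/byte relationship, assuming the common 8-bit byte and ASCII encoding (neither is stated on the card itself):

```python
# Assumes ASCII encoding and 8-bit bytes; both are illustrative assumptions.
char = "A"
encoded = char.encode("ascii")      # one character becomes one byte
byte_value = encoded[0]             # integer value stored in that byte
bits = format(byte_value, "08b")    # the same byte written out as 8 bits

print(len(encoded), byte_value, bits)
```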
Data quality audit
a survey and/or sample of files to determine the accuracy and completeness of data in an information system. Can survey entire data files, samples from data files, or end users for their perceptions of data quality.
data dictionary
automated or manual file storing definitions of data elements and their characteristics
project
creates a subset consisting of columns in a table; permits the user to create new tables containing only the desired information
select
creates a subset of all records meeting stated criteria
clustering
discovering as yet unclassified groupings
querying and reporting
1. Data manipulation a. Structured Query Language (SQL) b. Microsoft Access query-building tools 2. Report generation, e.g., Crystal Reports
relational database
Minimizing the number of times a piece of information appears in the database does two things: a. reduces the possibility of error b. simplifies the process of updating the database
key field
A field in a record that uniquely identifies instances of that record so that it can be retrieved, updated, or sorted
field
A grouping of characters into a word, a group of words, or a complete number, such as a person's name or age.
Common Gateway Interface (CGI)
A specification for processing data on a web server, application server
1. customer behavior 2. weather patterns
Big data sets offer more patterns and insights than smaller data sets, for example into (2 examples)
1. Data Warehouse 2. Data marts 3. Hadoop 4. In-memory computing 5. Analytical platforms
Business intelligence infrastructure: an array of tools for obtaining useful information from internal and external systems and big data
join
Combines relational tables to present the user with more information than is available from individual tables
big data
Data sets with volumes so huge that they are beyond the ability of a typical relational DBMS to capture, store, and analyze. The data are often unstructured or semi-structured and arrive in massive quantities from the Internet and other sources.
Non-relational database management systems
Database management system for working with large quantities of structured and unstructured data that would be difficult to analyze with a relational model ("NoSQL"). Handles large data sets that are not easily organized into tables, columns, and rows. Uses a more flexible data model. Doesn't require extensive structuring. Can manage unstructured data, such as social media and graphics. E.g., Amazon's SimpleDB, MetLife's MongoDB
Data Warehouse
Database that stores current and historical data that may be of interest to decision makers. Consolidates and standardizes data from many systems, including operational and transactional databases. Data can be accessed but not altered.
web mining
Discovery and analysis of useful patterns and information from the web, e.g., to understand customer behavior, evaluate a website, or quantify the success of marketing. Content mining: mines the content of websites. Structure mining: mines website structural elements, such as links. Usage mining: mines user interaction data gathered by web servers.
Sentiment analysis
Mines text comments online or in email to measure customer sentiment
Hadoop
Open-source software framework for big data. Breaks a data task into sub-problems, distributes the processing to many inexpensive computer processing nodes, then combines the results into a smaller data set that is easier to analyze. Key services: Hadoop Distributed File System (HDFS) and MapReduce. You have probably used this to find the best airfare on the Internet, get directions, do a search on Google, or connect on Facebook.
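The split, distribute, and combine idea behind MapReduce can be sketched in plain Python. This is a single-process imitation for study purposes, not Hadoop's actual API; the input text and the function names `map_phase`, `shuffle`, and `reduce_phase` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical input, already split into chunks as if spread across nodes.
chunks = ["big data big insights", "data lakes store raw data"]

def map_phase(chunk):
    # Emit a (word, 1) pair for every word in one chunk.
    return [(word, 1) for word in chunk.split()]

def shuffle(pairs):
    # Group the emitted values by key so each key can be reduced on its own.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Combine each key's values into a single, smaller result.
    return {key: sum(values) for key, values in grouped.items()}

mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
word_counts = reduce_phase(shuffle(mapped))
print(word_counts)
```

In a real cluster each chunk would be mapped on a different node; here the list comprehension simply stands in for that distribution step.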
Analytic Platforms
Preconfigured hardware-software systems designed for query processing and analytics. Use both relational and non-relational technology to analyze large data sets. Include in-memory systems and NoSQL DBMS. E.g., IBM PureData System for Analytics: integrated database, server, and storage components. Data lakes.
Pricing based on usage; appeal to small or medium-sized businesses
Relational database engines provided by cloud computing services
in-memory computing
Relies on the computer's main memory (RAM) for data storage; users access data stored in the system's primary memory. Eliminates bottlenecks in retrieving and reading data. Dramatically shortens query response times. Enabled by high-speed processors and multicore processing. Lowers processing costs.
Data administration
Responsible for the specific policies and procedures through which data can be managed as a resource; needed in large organizations. Responsibilities include developing information policy, planning for data, overseeing logical database design and data dictionary development, and monitoring how information systems specialists and end-user groups use data.
database management systems (DBMS)
Software for creating, storing, organizing, and accessing data from a database. Separates the logical and physical views of the data. Logical view: how end users view data. Physical view: how data are actually structured and organized. Examples: Microsoft Access, DB2, Oracle Database, Microsoft SQL Server, MySQL
attributes
Specific characteristics of each entity. E.g., the entity SUPPLIER has the attributes name and address; the entity PART has the attributes description, unit price, and supplier.
normalization
Streamlining complex groupings of data to minimize redundant data elements and awkward many-to-many relationships; the process of creating small, stable data structures from complex groups of data when designing a relational database
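A minimal sketch of the idea, using plain Python structures instead of a real database. The part and supplier values are hypothetical; the point is only that supplier details stop repeating once they move to their own structure.

```python
# Hypothetical unnormalized rows: supplier details repeat on every part row.
flat_rows = [
    {"part": "Door latch", "price": 22.0, "supplier_no": 8259, "supplier": "CBM Inc."},
    {"part": "Door lock", "price": 31.0, "supplier_no": 8259, "supplier": "CBM Inc."},
    {"part": "Door molding", "price": 6.0, "supplier_no": 8261, "supplier": "B. R. Molds"},
]

# Each supplier is stored once, keyed by its supplier number.
suppliers = {row["supplier_no"]: row["supplier"] for row in flat_rows}

# Parts keep only the supplier number, which points back to the supplier data.
parts = [
    {"part": row["part"], "price": row["price"], "supplier_no": row["supplier_no"]}
    for row in flat_rows
]

# Updating a supplier name now touches one place instead of every part row.
suppliers[8259] = "CBM Incorporated"
print(suppliers[parts[0]["supplier_no"]])
```

With the flat rows, the same rename would have required editing two records, which is where update errors tend to creep in.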
text mining
Unstructured data (mostly text files) is believed to account for about 80 percent of an organization's useful information. Text mining allows businesses to extract key elements from, discover patterns in, and summarize large unstructured data sets. Businesses might turn to it, for example, to analyze calls to customer service.
1. Associations 2. Sequences 3. Classifications 4. Clustering 5. Forecasting
What are the types of information obtainable from data mining?
database
a collection of related files containing records on people, places, or things. Databases are at the heart of information systems because they keep track of the people, places, and things that a business must deal with on a continuing, often instant basis.
record
a group of related fields, such as a student's identification number (ID), the course taken, the date, and the grade
file
a group of records of the same type
Data Manipulation Language (DML)
a language associated with a database management system that end users and programmers use to manipulate data in the database. Used to add, change, delete, and retrieve the data in the database. Contains commands that permit end users and programming specialists to extract data from the database to satisfy information requests and develop applications.
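The four kinds of manipulation (add, change, delete, retrieve) can be sketched with SQL run through Python's built-in `sqlite3` module. The `student` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Set up a hypothetical table to manipulate.
conn.execute(
    "CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT, grade TEXT)"
)

# Add
conn.execute("INSERT INTO student VALUES (1, 'Ana', 'B'), (2, 'Ben', 'C')")
# Change
conn.execute("UPDATE student SET grade = 'A' WHERE student_id = 1")
# Delete
conn.execute("DELETE FROM student WHERE student_id = 2")
# Retrieve
rows = conn.execute("SELECT student_id, name, grade FROM student").fetchall()
print(rows)
```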
data lake
a repository (a place where things are stored) for raw unstructured or structured data that for the most part have not yet been analyzed; the data can be accessed in many ways. Used with large analytic platforms.
query
a request for information from a database
data cleansing
also known as data scrubbing. Consists of activities for detecting and correcting data in a database that are incorrect, incomplete, improperly formatted, or redundant. Enforces consistency.
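A rough sketch of what such activities can look like in code. The records, the formatting rules, and the function name `cleanse` are all assumptions made for illustration; real cleansing tools apply far richer rules.

```python
# Hypothetical customer records with formatting problems, a gap, and a duplicate.
raw_records = [
    {"name": "  alice smith ", "state": "ny"},
    {"name": "Alice Smith", "state": "NY"},
    {"name": "Bob Jones", "state": ""},
]

def cleanse(records):
    cleaned, seen, flagged = [], set(), []
    for record in records:
        # Enforce one consistent format for each field.
        name = " ".join(record["name"].split()).title()
        state = record["state"].strip().upper()
        if not state:
            flagged.append(name)  # incomplete: set aside for manual correction
            continue
        if (name, state) in seen:
            continue  # redundant: drop the duplicate
        seen.add((name, state))
        cleaned.append({"name": name, "state": state})
    return cleaned, flagged

cleaned, flagged = cleanse(raw_records)
print(cleaned, flagged)
```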
3 Vs: 1. Extreme volume 2. Wide variety 3. Velocity
Big data are characterized by the
distributed databases
databases stored in multiple physical locations, e.g., Google's Spanner cloud service
referential integrity rules
ensure the relationships between coupled tables remain consistent
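One way to see such a rule at work is with SQLite through Python's `sqlite3` module, which enforces foreign keys only after `PRAGMA foreign_keys = ON`. The `supplier` and `part` tables below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE supplier (supplier_number INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE part (
        part_number INTEGER PRIMARY KEY,
        name TEXT,
        supplier_number INTEGER REFERENCES supplier(supplier_number)
    )
""")
conn.execute("INSERT INTO supplier VALUES (1, 'Acme')")
conn.execute("INSERT INTO part VALUES (10, 'Door latch', 1)")  # supplier 1 exists

try:
    # Supplier 99 does not exist, so this part would point at nothing.
    conn.execute("INSERT INTO part VALUES (11, 'Door lock', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

part_count = conn.execute("SELECT COUNT(*) FROM part").fetchone()[0]
print(rejected, part_count)
```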
sequences
events linked over time
columns
fields in a relational database are also called
entity
generalized category representing person, place or thing on which we store information
foreign key
a field in one table that refers to the primary key of another table; essentially a lookup field, e.g., to find data about the supplier of a specific part
Structured Query Language (SQL)
most prominent data manipulation language today; the standard data manipulation language for relational database management systems
associations
occurrences linked to a single event
select, join, project
operations of a relational DBMS
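The three operations can be tried side by side with SQL through Python's built-in `sqlite3` module. The `supplier` and `part` tables and their rows are hypothetical and exist only to make the sketch runnable.

```python
import sqlite3

# Hypothetical SUPPLIER and PART tables; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplier (supplier_number INTEGER PRIMARY KEY, supplier_name TEXT);
    CREATE TABLE part (part_number INTEGER PRIMARY KEY, part_name TEXT,
                       unit_price REAL, supplier_number INTEGER);
    INSERT INTO supplier VALUES (8259, 'CBM Inc.'), (8261, 'B. R. Molds');
    INSERT INTO part VALUES (137, 'Door latch', 22.00, 8259),
                            (150, 'Door molding', 6.00, 8261),
                            (152, 'Door lock', 31.00, 8259);
""")

# select: keep only the rows (records) that meet a stated criterion
selected = conn.execute(
    "SELECT * FROM part WHERE unit_price > 20 ORDER BY part_number"
).fetchall()

# project: keep only the desired columns
projected = conn.execute(
    "SELECT part_number, part_name FROM part ORDER BY part_number"
).fetchall()

# join: combine the two tables through their shared supplier_number column
joined = conn.execute("""
    SELECT part.part_number, part.part_name, supplier.supplier_name
    FROM part JOIN supplier ON part.supplier_number = supplier.supplier_number
    ORDER BY part.part_number
""").fetchall()

print(selected, projected, joined, sep="\n")
```

Note the naming trap: the relational "select" operation corresponds to the `WHERE` clause, while the relational "project" operation corresponds to the column list after the SQL keyword `SELECT`.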
relational database
organizes data into two-dimensional tables (relations) with columns and rows; the most common type of database today. It can relate data stored in one table to data in another as long as the two tables share a common data element.
classifications
patterns describing a group an item belongs to
1. One-to-one relationship 2. One-to-many relationship 3. Many-to-many relationship (requires a "join table" or intersection relation that links the two tables to join information)
Relationships that database tables may have:
bit
represents the smallest unit of data a computer can handle
tuples
rows or records in a relational database
report generator
software designed to take data from a source such as a database and use the data to produce a report in a polished format
data definition
specifies the structure of the content of a database
information policy
states the organization's rules for organizing, managing, storing, and sharing information
data mart
subset of a data warehouse (smaller and decentralized) that is highly focused and isolated for a specific population of users
Online Analytical Processing (OLAP)
supports multidimensional data analysis, enabling users to view the same data in different ways using multiple dimensions. Each aspect of information (product, pricing, cost, region, or time period) represents a different dimension. E.g., comparing sales in the East in June versus May and July. Enables users to obtain online answers to ad hoc questions such as these in a fairly rapid amount of time.
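The "same data, different views" idea can be sketched in plain Python. The sales figures, the `DIMENSIONS` mapping, and the `summarize` helper are hypothetical; real OLAP tools precompute and index such summaries rather than scanning a list.

```python
from collections import defaultdict

# Hypothetical sales facts: (product, region, month, units sold).
sales = [
    ("nuts", "East", "May", 100), ("nuts", "East", "June", 120),
    ("nuts", "West", "June", 80), ("bolts", "East", "June", 50),
    ("bolts", "East", "July", 70), ("nuts", "East", "July", 90),
]
DIMENSIONS = {"product": 0, "region": 1, "month": 2}

def summarize(facts, *dims, **filters):
    # Total units grouped by the chosen dimensions, after optional filters.
    totals = defaultdict(int)
    for fact in facts:
        if all(fact[DIMENSIONS[d]] == v for d, v in filters.items()):
            key = tuple(fact[DIMENSIONS[d]] for d in dims)
            totals[key] += fact[3]
    return dict(totals)

# The same data viewed two ways:
east_by_month = summarize(sales, "month", region="East")       # East: May vs. June vs. July
by_product_and_region = summarize(sales, "product", "region")  # product by region
print(east_by_month)
print(by_product_and_region)
```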
row
the actual information about a single instance of an entity (e.g., a single supplier) that resides in a table; rows are also called records or tuples
primary key
unique identifier for all the information in any row of a database table; cannot be duplicated
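A short sketch of the "cannot be duplicated" rule, using SQLite through Python's `sqlite3` module. The `supplier` table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE supplier (supplier_number INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO supplier VALUES (8259, 'CBM Inc.')")

try:
    # Reusing an existing primary key value is refused by the DBMS.
    conn.execute("INSERT INTO supplier VALUES (8259, 'Another Co.')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The key still identifies exactly one row.
row = conn.execute(
    "SELECT name FROM supplier WHERE supplier_number = 8259"
).fetchone()
print(duplicate_rejected, row)
```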
Entity Relationship Diagram
used to clarify table relationships in a relational database
forecasting
uses series of values to forecast future values
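One very simple way to turn a series of values into a forecast is a moving average. This is only one of many forecasting techniques, chosen here for brevity; the sales figures and the function name are hypothetical.

```python
# Hypothetical monthly sales figures.
monthly_sales = [100, 110, 120, 130]

def moving_average_forecast(values, window=3):
    # Forecast the next value as the mean of the last `window` observations.
    recent = values[-window:]
    return sum(recent) / len(recent)

next_month = moving_average_forecast(monthly_sales)
print(next_month)
```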