TEC 200 Exam 2 - Database
Three Approaches to Database Management
--Centralized Database ALL data stored in a central repository Data kept in the same location is easier to change and backup Updates to any given set of data are immediately available. Represents a single point of falure --Decentralized Database A collection of independent but complete databases Data is available in different locations Faster access at each site More difficult to keep data consistent among all databases and produce enterprise reports A network connects to synchronize data --Distributed Database A single logical database that is spread physically across computers in multiple locations that are connected by a communications link Database is Fragmented and Partitioned Each "location partition" has only data it needs Synchronize different parts with full database to ensure integrity
Database Management Systems (DBMS)
-Database: repository for collections of related facts or data. It is an organized collection of information -DBMS is the software that allows multiple users to use a database sort, list, search, organize, print, etc.
Database Anatomy
-database is a collection of one or more database files -file is a collection of related information (records)
Report Generators
-present data in an organized and easily understood manner Often reports are based on queries Generate totals and subtotals ◦ a report with subtotals is often called a control break report and must be run on sorted data Reports can be quite varied ◦ bills, invoices, annual reports, etc.
Databases - Created with Logical Structures
-row numbers are irrelevant -columns have logical names such as Order Date and Customer Number
Modeling Relationships
A Relational Database can model THREE types of relationships: -One to One: each row of Table A matches one and only one row from Table B A 1:1 relationship typically indicates that you have partitioned a larger entity for some reason. For example a student record may have a table for their local data and another that stores their "home" information -One to Many: Each row of Table A, matches many rows of Table B A Customer may have many Orders, Customer number would be the PK and it would appear in the Orders table as a Foreign Key to match the Order to the correct customer. -Many to Many: each row of Table A matches many rows of Table B and many rows of Table B match many rows of Table A A M:M relationship cannot be directly modeled, you must create a composite entity that contains the PKs of each table that will have a 1:M relationship with each original table.
Data Marts
A data mart: simple form of a data warehouse that is focused on a single subject (or functional area), such as Sales, Finance, or Marketing. Data marts are often built and controlled by a single department within an organization Data mart ◦ Smaller version of data warehouse ◦ Used by single department or function Advantages over data warehouses ◦ Easier to work with ◦ More limited scope than data warehouses
DBMS Advantages/Disadvantages
A well-designed relational database management system has several advantages ◦ Reduced information redundancy (duplicate information) ◦ Increased information integrity (quality of information ) ◦ Increased information security (access control) ◦ Provides flexibility ◦ Handle changes quickly and easily ◦ Provides scalability ◦ Scalability - refers to how well a system can adapt to increased demands --Two major disadvantages are ◦ Complexity requires trained (expensive) personnel ◦ Greater impact of failure
Big Data Applications
Analysis requires special big data hardware and that is more effective for this type of data than traditional relational DBMSs NoSQL and Hadoop are two software packages that deal with Big Data Some people have concerns that organizations are harvesting huge amounts of personal data without the consumers/citizens knowledge Risks include ◦ who has access to the data, ◦ how data can be leaked, and ◦ whether it is interpreted correctly
Special-Purpose Database Systems
Apple's iTunes software uses special-purpose database to allow users to find songs NoSQL DBMSs can handle data that does not fit into tables required by traditional relational databases ◦ Much of the data collected via the internet (particularly in social media) do not conform to the strict structure required for Relational databases.
Queries
Ask a question of the database ◦ search and list queries ◦ action queries ◦ delete records ◦ update data ◦ duplicate tables Queries can use one or more tables at a time Queries can limit the rows (records) and the number of columns (fields) displayed ◦ Records are chosen by criteria, which are formed using relational operators ◦ e.g. credit limit > 1000; gender = 'f' ◦ Fields are selected by name ◦ Last Name, Department, Gender,
The database structure
Based upon the hierarchy of data: ◦ a field is a specific fact ◦ a record is a collection of related fields ◦ a table is a collection of related records ◦ a database is a collection of related tables
The Hierarchy of Data
Bit (Binary): circuit that is either on or off Byte: Typically made up of eight bits, Character-Basic building block of information Field: Name, number, or combination of characters that describes an aspect of a business object or activity Record: Collection of related data fields File: Collection of related records Database: Collection of integrated and related files
Other Data Mining Techniques
Cluster analysis: technique used to divide an information set into mutually exclusive groups such that the members of each group are as close together as possible to one another and the different groups are as far apart as possible -Association detection: reveals the degree to which variables are related and the nature and frequency of these relationships in the information -Market basket analysis what items does a consumer buy together? Statistical analysis: performs such functions as information correlations, distributions, calculations, and variance analysis -Forecast: predictions made on the basis of time-series information -Time-series information: time-stamped information collected at a particular frequency (interval)
Using DBMS with Other Software
DBMSs can act as front-end or back-end applications: ◦ Front-end applications interact directly with people such as ◦ Webpages ◦ Kiosks ◦ Back-end applications interact with other programs or applications, examples would include ◦ Processing payments ◦ Processing orders ◦ Presenting Financial information to user requests
Data Warehouse Fundamentals
Data warehouse: logical collection of information - gathered from many different operational databases - that supports business analysis activities and decision-making tasks The primary purpose of a data warehouse is to aggregate information throughout an organization into a single repository for decision-making purposes -Extraction, -transformation, -loading (ETL) - a process that extracts information from internal and external databases, transforms the information and loads the information into a data warehouse A data warehouse is static data
Data Mining
Data-mining analysis ◦ Discover patterns and relationships Reports ◦ Cross-reference segments of an organization's operations for comparison purposes ◦ Find patterns and trends that can't be found with databases ◦ Analyze large amounts of historical data quickly Online analytical processing (OLAP) ◦ Uses multiple sources of information and provides multidimensional analysis ◦ Hypercube ◦ Drill down and drill up Predictive analysis: ◦ Form of data mining that combines historical data with assumptions about future conditions to predict outcomes of events ◦ Used by retailers to upgrade occasional customers into frequent purchasers ◦ Used to predict future sales up to a year in the future
Database example
Database Concept Field Record Table Database Example Last Name, Wage Rate, Part No. Employee Record, Inventory Record Employee Table, Inventory Table Personnel Database, Student Database
Data Center
Date is an organizational asset, thus physical protection is a priority as well. Climate-controlled building or set of buildings that house database servers and the systems that deliver mission-critical information and services Traditional data centers: ◦ Consist of warehouses filled with row upon row of server racks and powerful cooling systems Modular Data Centers: ◦ Many organizations now use large shipping containers packed with racks of servers and cooled to easily connect and set up
Create Table
Design a table ◦ each table represents a single entity (person, place, thing, or event) Once you have a table there is a three step process ◦ Name the field ◦ Specify the data type ◦ text, numeric, date/time, logical, BLOB, memo ◦ Specify the field size
Enter/Edit Data
Enter directly into the table Use a form as the input method ◦ forms provide features to assist in data accuracy ◦ masks ◦ case restrictions ◦ value restrictions (e.g. no negative ages) ◦ type restrictions (e.g. no letters in wage rate field)
Types of database structures
Flat File Database - like a spreadsheet, easy to learn, but very limited and difficult to maintain Relational Database - collection of tables linked together by common fields, called key fields (see fig. 16.6) ◦ primary key field - unique unduplicated identifier Hierarchical database - older style, used on mainframe systems. Operate on the basis of a parent-child relationship. One parent can have many children, child can have only one parent Network database - supports a many to many relationship parent can have many children, children can have many parents Both of these models came about when data was physically stored much differently than today.
DBMS Enforce Rules to Maintain Data Integrity
Integrity constraints - rules that help ensure the quality of information -Relational Integrity Constraint -- Do records in different tables match up correctly -Domain Constraint --No illegal data, gender not a M or F, no letters in the balance field, etc. -Business-critical integrity constratint --Credit limit to large, no orders for non-existent cutomers, etc.
Databases Are Complex Systems
It is important that they be carefully designed, which is not a task of the DBMS, it is completed before the DBMS is implemented Designing and implementing a database requires attention to two "views" of the data ◦ Physical view - deals with the physical storage of information on a storage device ◦ A poor physical design an lead to poor performance ◦ Logical view - focuses on how users logically access information ◦ This is based upon the data modeling and final database design Data Modeling is a crucial step in designing a functional database
Sorting Records
Ordering records ◦ Ascending ◦ Descending ◦ Primary sort key ◦ Secondary sort keys (used in tie of primary)
Popular Database Management Systems
Popular Corporate DBMS ◦ Oracle ◦ SQLServer ◦ Sybase ◦ MySQL Server Popular DBMSs for end users: ◦ Microsoft's Access and FileMaker Pro ◦ MySQL, and ◦ CouchDB
DBMS
Software to support data management functions ◦ Create tables ◦ Enter/edit data ◦ View data ◦ Sort records ◦ Create and Run Queries ◦ Generate Reports
Popular Methods for Creating Queries
Structured Query Language (SQL) ◦ Uses a specific language with reserved words are required syntax Query By Example (QBE) a visual method using forms to generate queries. ◦ Uses a click and drag interface
Viewing Records
Table (or data sheet) view Read Only Forms (do not allow data entry) Filters ◦ restrict records to only those matching certain field values
DBMS vs. Database
The DBMS is the software that provides tools for creating, using and maintaining your database ◦ DBMS allows organizations to create, edit, and manipulate the database ◦ Provides security ◦ Enforces integrity constraints ◦ Provides back up processes ◦ Enforces policies (e.g. concurrency control) The relational database itself, is based on a rigid design process called normalization ◦ Leading to a well-formed database ◦ The logical design is created before the database is created
Databases - Collections of Information
The tables (entities) include: ◦ Order ◦ Customer ◦ Concrete Type ◦ Employee ◦ Truck
A DBMS helps to enforce polices keep data reliable
When an application program needs data it requests the data through the DBMS, just like a user Concurrency control deals with the situation in which two or more users or applications need to access the same record at the same time This is an issue for large databases with many users Example: ◦ one user adds a credit to customer account ◦ At the same time a second user subtracts a payment ◦ Depending on the order they are done, the account could be wrong when both transactions are processed
Data Management
Without data and the ability to process it: ◦ An organization could not successfully complete most business activities Data consists of raw facts To transform data into useful information: ◦ It must first be organized in a meaningful way A database is a repository for collections of related facts or data. It is an organized collection of information
Database Professionals
entire organization depends upon it to -deliver timely and -correct data. Managing the database and the data should be left to professionals trained in DBMS DBA: (Database Administrator) ◦ Works with users to decide the content of the database ◦ Works with programmers as they build applications to ensure that their programs comply with database management system standards and conventions Data administrator: ◦ Responsible for defining and implementing consistent principles for a variety of data issues ◦ Duties often include access authorization, monitoring and ensuring the functionality of the software
Database Programs
information management program that allows people to store, organize, communicate and manage information in ways that wouldn't be possible without computers. Database programs are designed to maintain databases--collections of information stored on computer disks.
Big Data
massive volume of both structured and unstructured data that is so large it is difficult to process using traditional database and software tools. In many cases, it is too big, changes too quickly or exceeds existing processing capabilities. Big data consists of billions or trillions of records from many different sources such as -web sales, -customer contacts, -social media, -mobile data, etc. It is loosely structures and often incomplete. -They use the information to find patterns of behavior -Merging data from vastly different resources is big data You have no control because no one is really sure where it is all going.
Databases - Logical Ties within the Information
relationships: Logical ties, they are created with primary and foreign keys -Primary key - field (or group of fields in some cases) that uniquely describe each record -Foreign key: primary key of one file that appears in another file Foreign keys help create relationships among tables Table = file = relation (don't confuse yourself) Customer File: customer number, PK Concrete Type File: concrete type, PK Truck File: Truck number - PK Employee File: Employee ID - PK Order File: Order Number, PK; Customer Number -FK; Concrete Type-FK; Truck Number-FK; Driver ID-FK
Terminology
• There are many sets of terms that mean the same thing • These terms are used interchangeably and often mixed General: entity, occurence, attribute Relational: table, row, column Classic: file, record, field