Databases and Data Warehouses

Lakukan tugas rumah & ujian kamu dengan baik sekarang menggunakan Quizwiz!

Database Design

-Conceptual / Logical Design - Abstract model of database from business perspective. -Physical Design - determines how the database is arranged, optimized and tuned on storage devices. modeling database- conceptual/logical design thinking about database from business perspective- writing out on paper- not building it yet physical- DB administrator- manages databases, makes sure databases stay up and running, backed up, data dictionary is backed up frequently, overall oversight; monitor performance of database -how database is arranged on storage -decide what storage device it can be stored on faster storage device- solid state drive enterprise server equipment have many hard drive options large organizations may choose a server farm -tuning (lots of settings in enterprise) -can arrange and optimize indexes you can create an index on a table which will speed up how fast you can query and return records on a particular field ex. you want to search or sort on zip code with 80000 customers. if you build an index, db keeps track of every record in the database and what the zip code value is, then it sorts them so they are easy to look up based on what record number it is and what the zip code value is speeds up query indexes are important in making sure database queries are responsive cost of adding an index is every time you add a new record to table or update a value in the table, index has to be rebuild, will take longer in the long run if you have to add a record index property- dont touch it- will create problems if you build wrong index he has used different indexes to see which one makes it go faster when building a program (even 1-3 seconds) physical design- which hard drive you put it on and how

Traditional File Environment Issues

-Data Redundancy-in multiple places. no central place for data to go- it was in multiple different depts -Data Inconsistency- data can get changed in some places but not others. inconsistent data is bad data -Data Isolation- only certain depts or individuals had access to that data -Data Integrity -Security -Application / Data Dependence application- software different apps are dependent on each other based on their codes once you wanted to change the date file, you had to rewrite the program. became problematic and would be too long and costly before we had databases, we had traditional file environments issue- technically wasn't sustainable as a result, databases were created

Data Warehousing

-Data warehouse - a logical collection of information gathered from many different operational databases (Extract, Transform, Load) used to create business intelligence that supports business analysis activities and decision-making tasks. -helps you answer questions like customer profile changes support a high price product, how many products sold for $10000 last month allows you to more quickly analyze data in a summarized fashion DW- pulls data from all across the organization into one process- ETL; how to get data from one system and summarize it and get it to another system -how to make it consistent across all data sources -sometimes you store data in different places and you have to :re-scrub: it (get into a consistent format) load- how to get the data loaded into the DW there are jobs that focus on ETL

Data Hierarchy

-Database -Table, File, Relation -Records, Rows, Tuples -Fields, Columns, Attributes -Bytes -Bits database- collections of data organized in a manner to access retrieval and use of data bytes- made of bits

Network Database

-Depicts data logically as many-to-many relationships -Less flexible compared to RDBMS -Lack support for ad-hoc and English language-like queries downside- little flexibility cannot do ad-hoc queries- you cant make up your own query and pull data the way you want to can only pull out data the way it was originally designed

business intelligence cont

-Online transaction processing (OLTP) - the gathering of input information, processing that information, and updating existing information to reflect the gathered and processed information. systems were talked about, what were building any time you buy something online, its interacting w an OLTP day to day transactions -Operational databases - databases that support OLTP. databases that run an organization on a day to day basis -Online analytical processing (OLAP)- the manipulation of information to support decision making. taking the info to manipulate it and make decisions from it (the summarized view of the data) analogy to pivot tables in excel - summarizes things w a line that drags rows to columns, etc and orient the data differently data warehouse- pulls data from transactional databases and dumps it to one single data store goal- help analyze summary data dont care about individual customer names or product details summary= agrogate you care about summary because its too slow to query through millions of record data warehouse also summarizes data for you (the data is summarized before it gets to DW) you won't find customer names in a DW may find customer zip code won't keep track of every product sold at every location- you'll have an aggregate amount (each day- this is how many you sold)

Hierarchical Database

-Organizes data in a tree-like structure -Supports one-to-many parent-child relationships -Prevalent in large legacy systems fairly intuitive- the way it organizes data is fairly consistent with how we like to organize things in our mind see this in older systems employees(root)- want to track compensation(first child), job assignments, etc when you want to compare "lower children" (second child) becomes inefficient to query and extract that data out ex: manufacturing environment- finished product at top and sub-components that make up finished product then raw materials at bottom- determines how much materials you need on hand to produce whats needed and when having understanding of finished product inventory is important for decision making -to end user- you would really know what type you're using

Relational Database

-Represents data as two-dimensional tables called relations -Relates data across tables based on common data element -Relational DatabaseExamples: DB2, Oracle, MS SQL Server, MySQL tables represented in 2 dimensions (columns and rows) common data element- foreign keys connect to primary key first week of class about phone number of an organization- primary key and foreign key

Evolution of Databases

1960s- traditional files 1970s- hierarchal and network 1980s- relational (most popular) balance of power and flexibility 1990s- object-oriented and object-relational

DBMS Components

3 key components: DDL, DML, data dictionary -Data definition language (DDL): Specifies content and structure of database and defines each data element (data type, length, properties) -CREATE TABLE Customers (Cust_novarchar2(12), FNamechar(30), Lnamechar(30), DOB date, Address varchar2(50)); language will define what tables, fields, properties the database has different data types need different options in different platforms access- go to design view and add tables in a graphical fashion; nothing to do with data stores=d in table just defines structure with no data in it -Data manipulation language (DML): Manipulates data records in a database Select * from orders where customer_IDin (45, 16, 212) and order date > '20180220'; often quieres SQL ex- select * from orders...]allows you to retrieve records from database, add, and delete records recommends everyone learns a little bit of SQL- all about how you can turn data into info

Business Intelligence

Knowledge about your: -Customers -Competitors -Partners -Competitive environment -Internal operations data-> info intelligence- how you can ensure your business has the info on hand so you dont have to keep generating it having knowledge on hand will make you a better business, will help ups monitor better data analytics- we've had elements of it for decades online transactional processing information: (different databases focused on data and info) building transactional systems that run your business, collect data, and provide info -product database: add a product line, change a product like -advertising database: change advertising table, increase radio budget -customer database: increase customer credit limit, change customer salary level online analytical processing business intelligence: one big data warehouse: info used in decision making -how many products sold $10000 last month -if inventory levels are decreased by 10%, what is the new total cost of inventory carried -can customer profile changes support a high priced product

Data dictionary

Stores definitions of data elements, and data characteristics database uses its own tables to keep track of itself ex. an oracle database- when you build a table there is another table that keeps track od the tables and fields you build overall definition of all tables in your database -captures structure you define important because if you need to restore a database you have to restore the data dictionary because DD makes all tables appear again w its structure and relationships once thats done you can start refilling with data from backups -looks like design view in access nothing to do with data itself- it is the definition of your data

data mining

Used to find hidden patterns and previously unknown trends in data. data analytics and AI uses it about finding patterns and previously unknown trends in data ex. marketers will look at purchasing patterns in customers because you want to be successful and know what and how your customer is going to buy- getting into the mind of a consumer (data driven) -basket analysis- identifying a combo of products that are often purchased together- helps with promotions and product placement -data mining has found combos like diapers and beer what they found out is wife is home w child and calls husband on way home from work to get diapers and he grabs beer ex from website how companies learn your secrets target was trying to figure out how to identify customers who are expecting a baby because new mothers will start buying baby stuff and getting into a routine on where they buy- if target gets someones business early on, they'll have their business for a long time- marketing strategy they wanted to see if they could predict who was expecting based on purchasing patterns they were pretty successful in finding purchasing product combos -in the article, a father with an 18 yr old daughter who kept getting baby ads for mad and went to target to complain- the daughter ended ip indeed being pregnant before the dad knew -there are serious privacy concerns multidimensional analysis tool- OLAP

data mart

a subset of a data warehouse in which only a focused portion of the data warehouse information is kept. -within large cube (organization wide data warehouse) there are smaller cubes (merchandising, advertising, distributing, etc) when you have the summarized data- leads to more Qs and As ex. mid 2000s applicants jumped up- MTC did "sorority Life" at UB- portrayed UB as party school. therefore, the more data you get you wonder why that data exists why is one major more popular?

Multidimensional Data Model

cube of data up to 3 dimensions (3 factors were analyzing) Can illustrate up to 3 but warehouse can store dozens OLAP- "slice and dice" the cubes= summarize and analyze the results in each little cube, theres a number DWs store summarized data from multiple sources in one place advantage of data warehouseing- sabres time when querying because there aren't millions of records to query -when he worked at CIT- worked on a project w data warehousing (technically a data mart) they were interested in how students came into the school, what majors and when and how do they graduate (flow of student through school ) interested in this because of money have to know how many people are coming in, whats their budget, and how to allocate their money university gets state support for each student so they factor in that $ they wanted a DW to track enrollment for budget purposes analyze students based on gender, race, ethnicity, SAT, zip code, HS rank, intended major, GPA records for these are how many students fit a specific criteria every metric you add shrinks the numbers in those categories because you split them up more and more

traditional file processing

different depts had access to different files. some had access to the same file only one dept in example had access to "F" file so it was isolated

database design story

he was working at CIT; they had an oracle database running on a server with 4 processors and a bunch of ram. they were putting data on a website for people to access and login, one table was going to be queried every time someone clicked something (basically a security table- authorization) they forced oracle to store that table in RA< all the time because they didn't want it to have to read the hardware beer time, it was just instantly there in memory. made a big difference, saved time, improved user experience

Database Approach Advantages

if designed properly -Minimal data redundancy -Data consistency -Integration of data -Sharing of data -Uniform security, privacy and integrity -Data independence integrates sharing of data from a centralized perspective one place where the data is shared -database approach is data-centric. database is center and everything else is built off of it as a result- data independence; can add tables but cannot delete things because the program will still look for that table even if it is deleted

Database Environment

uniform security and privacy and access to data- can all be managed centrally

databases and the web

web servers are usually talking to a database ex. ublearns- server authenticates w username and password and checks database to see if its correct, then queries things like class schedule, pulls that data, dumps it into a webpage that generates and sends it back to you strength of building a database driven website- you can create 20,000 my ub pages for ex. through databases and no one has to individually update class schedule, etc they likely have a handful of template pages (faculty, students) templates query data a certain way and pull it back -these allow broad access to a lot of data


Set pelajaran terkait

Live Virtual Machine Lab 8.2: Module 08 Network Security Assessment Tools

View Set

Week 16 - Palliative Care & End of Life

View Set

BIOLOGY - UNIT 6: MICROBIOLOGY ARCHAEA

View Set

Compensation/Benefits Administration Ch. 8, 9, 10, & 11

View Set

Biology Concepts and Investigations: Chapter 1

View Set

AP Physics Chapter 7 & 8 practice quizzes

View Set