Introduction to Databases - Module 3
Conduct a Survey
Advantages: - a lot of ground can be covered in a short time - questions are presented in the same manner Disadvantages: - poor response rate - unbiased survey questions are difficult to compose - no benefit of nonverbal clues
Document Review
Advantages: - less time consuming than any of the other methods - provide more thought out information - pictures and diagrams really are worth a thousand words Disadvantages: - reflect what should happen, not what does happen - often out of date
Conduct Interviews
Advantages: - more information on side topics - it easier to detect uncertainty in person Disadvantages: - time consuming - poorly skilled interviewers can give away the answer they are hoping for
Observation
Advantages: - you get to see people following normal processes in everyday use (which may or may not be inline with the processes that management believes are being followed) - you may observe events that people would not think (or dare) to mention in response to questionnaires or interview questions Disadvantages: - behavior can change if people know they are being watched (Hawthorne Effect) - would take a long time to gather useful information - travel costs to various business locations
Requirements Gathering (2)
Project Activities: - Collect requirements - Analyze requirements Database Activities: - Collect and analyze user views - Identify preliminary entities
Construction (6)
Project Activities: - Construct application software - Build application development and test environments Database Activities: - Create development/test databases - Test any required data conversions
Implementation and Rollout (7)
Project Activities: - Create production environment - Install application components Rollout to users Database Activities: - Create production databases - Perform required data conversions
Planning (1)
Project Activities: - Feasibility study - Form project team Database Activities: - Review DBMS options - Assign database specialist to team
Ongoing Support (8)
Project Activities: - Respond to reported problems - Apply mandated changes Respond to change request Database Activities: - Database performance tuning - Database software patches - Schema change to support application changes
Physical Design (5)
Project Activities: - Specify physical system software - Specify physical hardware Database Activities: - Physical database design
Deletion Anomaly
a DELETE operation destroys unintended information. For example, deleting the last employee on a project destroys all the information about the project itself
Unique Identifier
a collection of one or more attributes that uniquely identify each occurrence (each tuple) of a relation
Bridge Table
a dimension that resolves a many-to-many relationship with another dimension technically it departs from the star schema rule against dimensions have relationships with other dimensions
Repeating Group
a group of attributes that repeat together all attributes in a repeating group should be moved to the same new relation because they repeat together
Second Normal Form (2NF)
a relation that is in 1NF where all non-key attributes are functionally dependent on the entire primary key said another way, a relation in 2NF has no repeating attributes and each non-key attribute is functionally dependent on the entire key
Third Normal Form (3NF)
a relation that is in 2NF where all the non-key attributes depend only on the primary key said another way, a relation that is in 2NF and there are no transitive dependencies also remember to remove easily calculated attributes
Fourth Normal Form (4NF)
a relation that is in 3NF and contains no multiple independent sets of multivalued dependencies can be avoided with strict interpretation of 1NF ex: employee (PK) skills (PK) and language does language imply that skills such as typing can only be done in the specified language?
Bridge Attributes
a repeating list of attributes
Boyce-Codd Normal Form (BCNF)
addresses anomalies that occur when an attribute that is part of the primary key is functionally dependent on a non-key attribute essentially just a stronger version of 3NF
Conceptual Design (3)
also known as External Design Project Activities: - Design screens/forms/reports - Design storyboards or screen flows Database Activities: - Develop conceptual data model - Update enterprise conceptual model
Logical Design (4)
also known as Internal Design Project Activities: - Specify logical system software - Specify logical hardware Database Activities: - Develop logical data model - Perform normalization
Insertion Anomaly
an INSERT operation is blocked by an artificial dependency. For example, we cannot insert a new project without an employee to assign to it
Star Schema
an alternative to 3NF schemas intended for analytical databases divides tables (relations) into fact tables and dimension tables star schema diagrams are often drawn with a fact table in the center, surrounded by dimension tables, which forms a star pattern, and hence the name star schema normalization rules are generally not applied to star schemas because analytical applications work with historic data, which is usually not updated (not transaction-based) no primary key because the unique identifier is assumed to be the combination of all the foreign keys for all the dimensions connected to the fact table
Superkey
an attribute that adds more attributes than the minimum required for uniqueness
Surrogate Identifier or Artificial Identifier
an invented artificial identifier that has no real-world meaning
Fifth Normal Form (5NF) = Fully Normalized
any further splitting would lead to relations where the original view cannot be reconstructed with joins only trivial splits remain
Modification (Update) Anomaly
changing a single data value requires an UPDATE operation on multiple tuples (rows of data). For example, changing a project due date requires an UPDATE to the record of every person assigned to the project.
Tuple
collection of data values that form one occurrence of an entity in a physical database, a tuple is called a row of data
Database Life Cycle
consists of all the events from the time the database is first proposed to the time it is retired from service
Dimension Tables
contain dimension attributes (or simply attributes), which characterize or describe the facts in some way, such as Purchase Date, Product Code, and Product Description dimensions provide the business context for the facts, and are typically used for grouping, sorting and filtering the data contained in fact tables
Fact Tables
contain facts, which are quantitative measures of a business process, such as Purchase Quantity or Amount Paid facts are generally numeric and cumulative, which means they can be summed without losing business meaning (1) additive (2) non-additive (3) semi-additive
Agile Methods
cross-functional teams collaborate in order to develop requirements and solutions "time-boxing" development cycles, in that requirements that become blockers to the build deadline are moved to another build iterative, incremental and evolutionary development face-to-face communication very short feedback loop and adaptation cycles focus on quality ex: Rational Unified Process (RUP)
Relation or Entity
data represented logically in a two-dimensional format using rows and columns
Feasibility Study
determine whether the project can be reasonably expected to achieve (or help achieve) the objective and whether preliminary estimates of time, staff, and materials required for the project fit within the required timeframe and available budget
Prototyping (1)
development methods involve rapid development of the application using iterative sets of design, development, and implementation steps as a method for determining requirements
Projects
division of work to be accomplished over the long term
Cold Turkey Method
every user starts on the new version at the same time
Bugs
genuine errors found in the existing application or database
Functional Dependency
if X determines / uniquely identifies Y, then Y is functionally dependent on X X determines Y X is a determinate of Y Y does not necessarily determine X
Modules
individual units of application programming that will be written and tested together
Rollout
is the process of placing groups of business users on the new application. Sometimes a new project is implemented cold turkey, meaning every user starts on the new version at the same time
Grain
level of detail represented in a fact facts with the same grain are placed in the same fact table
Normalization
logical database design technique for producing a set of relations with desirable properties, given the data requirements of an enterprise the process of evaluating attributes and optimizing their placement in entities teaches you how best to organize your data into tables developed by E.F. Codd for designing relational database tables that are best for transaction-based systems (that is, those that insert, update, and delete data in the relational database tables)
User View
method employed for presenting a set of data to the database user in a manner tailored to the needs of that person or application a tailored presentation of the data
Nontraditional Life Cycles
method for developing computer systems in response to the belief that SDLC projects take too much time and consume too many resources (1) prototyping (2) rapid application development (RAD) (3) agile methods
Key Performance Indicator (KPI)
metrics
First Normal Form (1NF)
move multivalued attributes and repeating groups to new relations a relation where the intersection of each row and column contains only one value (i.e. a relation with no repeating groups of attributes and no multi-valued attributes)
Multivalued Attributes
multiple values for at least some tuples (rows) in the relation ex: skill (attribute) -- fast, smart, etc
Over Normalizing
normalizing that does not mitigate insert, update or delete anomalies
Trivial Splits
occur when resulting relations have a primary key consisting only of the primary key of another relation
Denormalization
occurs as a last resort when database users suffer performance problems that cannot be resolved by other means - recombining relations that were split to satisfy normalization rules - storing redundant data in tables - storing summarized data in tables
Transitive Dependency
occurs when an attribute that depends on another attribute that is not the primary key of the relation in other words, occurs when a non-key attribute is dependent on another non-key attribute
Chicken Method
old and new versions of the application must run in parallel for a time while groups of users are trained and migrated over to the new application
Joint Application Design (JAD)
one of the most success prototyping techniques (1) rapidly determine requirements and the user-facing parts of the conceptual design and then (2) complete the project using a traditional SDLC methodology
Candidate
possible unique identifiers for the same relation
Subject Matter Experts
professionals who have expertise in the field of the application but who usually do not have technical computer system knowledge
Deliverables
project goals
Splitting
recommended when the parent table in the hierarchy is likely to be used in the grain of a fact table that does not include the child table
Degenerating Dimensions
removing a dimension and placing it in the fact table commonly used simplification for dimensions containing just one or perhaps two attributes that are not expected to be used as part of the grain of other fact tables
Project Manager
responsible for the overall management and execution of the project
Conformed Dimensions
shared dimensions between fact tables
Database Specialist (Database Administrator or Data Modeler)
someone skilled at analyzing the data especially necessary if a data-driven approach is being utilized
The Project Triangle
the dilemma that you must compromise on either (1) quality, (2) delivery time, or (3) cost good, fast, and cheap when two of objectives are optimized, the third objective always suffers
Analysis Paralysis
the entire project stalls while analysts spin their wheels looking for answers and clarifications that are not forthcoming (not to happen in the near future)
Rapid Application Development (2)
the premise of this type of development method is to build functioning application systems in as little as 60-90 days The 80/20 rule is employed when choosing which requirements will be built in the next release, with the ultimate goal of eventually building an application that implements 100% of the requirements
Implementation
the process of installing the new application system's components (application programs, forms or web pages, reports, database objects, and so on) into the live system and carrying out any required data conversions
System Development Life Cycle (SDLC)
the traditional method for developing computer systems (1) planning (2) requirements gathering (3) conceptual design (4) logical design (5) physical design (6) construction (7) implementation and rollout (8) ongoing support
Primary Key
unique identifier
Natural Identifiers
unique identifiers that have real-world meaning
Semi-Additive
values can only be summed within some known context ex: if I had $1,000 in a savings account at the end of November and I deposited $100 during December with no other activity, my balance at the end of December would be $1100. However, adding the two balances would yield $2100, and I never had $2100
Non-Additive
values cannot be summed without losing business meaning ex: employee hourly rate
Additive
values that can be summed without losing business meaning you should strive to include only additive facts in your fact tables ex: hours worked
Relation (Entity) Consolidation
you should not have any two relations with the exact same unique identifier