1.3.1
Data Bases
A database is a structured, collection of data. It is special because the data it contains is organised. This is done to: - Make processing more efficient. - Reduces storage requirements. - Avoid redundancy. Data Bases allows data to be: - Retrieved quickly. - Updated. - Filtered.
flat file database
A database which is stored on its host computer system as an ordinary 'flat file'. To access the structure of the data and manipulate it, the file must be read in its entirety into the computers memory.
Normalization
A normalized database will have its tables connected by one to many relationships like seen on the previous flash card. Give us sensible tables with the minimum amount of data redundancy.
Secondary key
A primary key is chosen to provide a unique row or combination of rows for each table. This allows software to find a record clearly. For example, there must be only one customer with a particular account number. The primary key is normally indexed automatically by the database software to allow fast searches. Sometimes you need to have this fast search facility using a different field.
Entity relation diagrams
A properly normalized table design can be expressed in various ways as a diagram. The development of the diagram can be useful during the normalization process. A common method of representing the tables and relationships is using crow's feet diagrams. These connect tables using symbols shaped like crow's feet. One prong means 'one'. Three prongs mean 'many'. If we have a situation where each customer can place many orders and each order can contain many items, we represent the data model like this ONE TO MANY RELATIONSHIP Customer ---> Order ---> Item MANY TO MANY RELATIONSHIP Teacher <---> Student
Serial
A serial file is one where records are organised one after another. It is the only possible way to store data on a long, thin medium such as tape. It is possible to divide the data into records in order to help locate related data together. Can be organised any way that was useful to the business using them. They could have as many or as few as necessary.
ER Modelling
Because of this, relational databases are designed to reduce the amount of duplicate data. This means separating out each entity and storing data about each entity in a separate table. We can see the advantages of separating data about each entity. In the online vendor example, if we keep data about customers separate, then when an invoice is created, the customer details will be accessed from the one up to date copy. To achieve the best possible relational database design, it is necessary to apply rules. This process is known as data normalization.
Data Capture
Data capture forms often use boxes or a set amount of spaces and occasionally provide examples too. This is to make sure each field is completed correctly. Data is often entered as a code in a database, for example in the above data capture form Y is used for Yes and N for No. Codes like this are used because they: -are quicker to type in -use less disk space -are easy to validate -make searching the database easier as data is entered in a standard format
Files
Data is typically stored on magnetic tape, which has to be written to or read in an orderly sequence. In a database there are a
Relation databases
Entity; A real world thing that is modeled in a database. it might be a physical object such as a student or a stock item in a shop or it might be an event such as a sale. Relation; A table is called a relation in a relational database. Tuple; A row in a table, equivalent to a record. A Tuple is data about one rather than many. Relational databases organize files in a more efficient way.Data is stored in separate tables. Each table stores data about a single entity.
Fixed and variable length fields
Fixed- Each field always has the same number of bytes in length. Any unused bytes are filled with blank space. Variable- Inserts a comma or a marker to define a field. Flexible as does not waste as much space as a fixed field length structure. Software can advance through records by counting markers.
Serial files
Here is part of a serial file with two fields per record, name and date of birth. To locate a particular record it is necessary to start at the beginning of the file and examine each record in turn until the required record is found or the end of the file is found end is reached.
Foreign Key
Here the field customer_Reference forms the primary key in TABLE CUSTOMER, but is the foreign key in TABLE ROOM. It allows a relationship to link the tables
Indexed sequential files
In which records are stored according to some key field and where one or more indexes have been created to make searching even more efficient. 1,000,000 people Field of surnames Set so starting with letter 'N' starts at 560,216
Random Files
In which records have been placed in the file according to some mathematical hash algorithm. Takes a record (collection of data). converts a code in to an address. address is stored in the data base. needs the same algorithm used to convert it in to an address to retrieve it, can be very quick.
Record
It is a collection of data under one identifier. An example would be your information on a school register.
Database Views
Physical View- This is how the data is actually written and recorded or written to the storage medium. All stored data is held a succession of data bits. The designers of the database will have no interest in this. Logical view: this is how the data is organised for processing. It looks at the construction of tables, queries, reports and the software that will deliver database functionality to the owners of the system. User view: this is about the appearance and functionality of the database. The user of a database is not concerned with the structure of tables and links between them. The user just needs a well-designed interface to allow access to whatever data is necessary to do his or her job.
Indexing
Sequential files can be searched more quickly by producing a separate index file. This is just like an index in a book. The data is divided up in to categories, such as names. Each category is linked to a position in a data file where that category starts., so a tape can be used to fast forward to a better position for starting a sequential search.
Entity Relationship Modelling
The entity-relationship model (or ER model) is a way of graphically representing the logical relationships of entities in order to create a database. Relational databases are made up of several data tables to avoid redundancy. Redundancy is the unnecessary repetition of data. This is avoided in databases because of the risk of inconsistencies between different copies of the same data.
Data normalization
The process of organizing the columns (attributes) and tables (relations) of a relational database to minimize data redundancy.
Sequential files
The records are still arranged one after another but in particular record. This order may be something like a customer number but could also be in alphabetical order by name. These makes searches in a data base easier. A transaction is a change in the state of a database, amendment or deletion of data. A transaction file is a file of events that occur as part of a businesses activity. E.G stock removed, lower the number of stock for that product on the data base.
Fixed Fields
This allows software to count bytes in order to count fields and hence records. Every 15bytes in a name field brings it to the next field. Then the next field can be similarly treated as its length will also be known to the software.
Queries
This is a list of common terms used in query language with example usage based on the table above: AND - use to find all records where the make is a Ford AND the price is less than 8000. This would return the Fiesta but not the Mondeo. OR - use to find all records where the make is a Ford OR Rover. This would only return the Ford Fiesta, the Ford Mondeo and the Rover 200. NOT - use to find all records where the make is NOT Citroen. This would return all records except the Citroen C4.
Master File
This is a principle file held by an organisation that stores basic details about some crucial aspects of the business. It is generally a large file that tends to not change very often. For a super market, it could be a stock file; for a school it could be a file of a students details.
Transaction Processing
This is a type of processing that attempts to provide a response to a user within a short time frame. It is not as time critical as a real time system and normally features a limited range of operations planned in advance. E.g. bank withdrawal. -Create -Read -Update -Delete
Hashing
Using disk file storage, another method of quickly writing and reading files is possible. This method is called hashing. The key field of a record can be transformed in a way to generate disk access. This allows a random access device such as a disk drive to go directly to part of a disk and start working from there.
Validation
Validation is an automatic computer check to ensure that the data entered is sensible and reasonable. It does not check the accuracy of data. For example, a secondary school student is likely to be aged between 11 and 16. The computer can be programmed only to accept numbers between 11 and 16. This is a range check. However, this does not guarantee that the number typed in is correct. For example, a student's age might be 14, but if 11 is entered it will be valid but incorrect.
Verification
Verification is performed to ensure that the data entered exactly matches the original source. There are two main methods of verification: Double entry - entering the data twice and comparing the two copies. This effectively doubles the workload, and as most people are paid by the hour, it costs more too. Proofreading data - this method involves someone checking the data entered against the original document. This is also time consuming and costly.