Chapter 10, File-System Implementation


Basic File System

1. Issues generic commands to the appropriate device driver to read and write physical blocks on the disk. Each physical block is identified by a disk address (e.g., drive 1, track 2, sector 10).
2. Manages the memory buffers and caches that hold various file-system, directory, and data blocks.
3. A block in the buffer is allocated before the transfer of a disk block can occur. When the buffer is full, the buffer manager must find more buffer memory or free up buffer space to allow a requested I/O to complete. Caches hold frequently used file-system data to improve performance.

File Control Block

A file control block (FCB) contains information about a file, such as ownership, permissions, and the location of the file contents.

Layered Design

A file system is composed of many levels:
Application programs --> logical file system --> file-organization module --> basic file system --> I/O control --> devices
Each level in the design uses the features of lower levels to create new features for use by higher levels. At the lowest level, I/O control consists of device drivers and interrupt handlers that transfer information between main memory and the disk system. A device driver can be thought of as a translator: its input consists of high-level commands such as "retrieve block 123", and its output consists of low-level, hardware-specific instructions used by the hardware controller. The device driver writes specific bit patterns to special locations in the I/O controller's memory to tell the controller which device location to act on and what actions to take.

Bit Vector

The free-space list is often implemented as a bit map or bit vector. Each block is represented by 1 bit. If the block is free, the bit is 1. If the block is allocated, the bit is 0. The advantage of this approach is its simplicity and its efficiency in finding the first free block or n consecutive free blocks on a disk. Bit vectors are inefficient unless the entire vector is kept in main memory.
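A minimal Python sketch of the first-free-block search, assuming that bit 1 means free and that block 0 is the most significant bit of the first byte (the bit ordering and the name first_free_block are assumptions made for illustration):

```python
def first_free_block(bitmap: bytes) -> int:
    """Return the number of the first free block, or -1 if none is free.

    Assumptions: a 1 bit marks a free block, a 0 bit marks an allocated
    block, and block 0 is the most significant bit of byte 0.
    """
    for byte_index, byte in enumerate(bitmap):
        if byte != 0:  # skip bytes in which every block is allocated
            # Offset of the first 1 bit, counted from the most significant bit.
            offset = 8 - byte.bit_length()
            return byte_index * 8 + offset
    return -1


# Blocks 0-9 are allocated and block 10 is free.
print(first_free_block(bytes([0x00, 0x20])))
```

The block number is (bits per word) x (number of all-zero words) + (offset of the first 1 bit), which is why whole zero-valued bytes can be skipped without inspecting individual bits.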

When is each partition used?

A raw disk is used where no file system is appropriate.
1. UNIX swap space can use a raw partition.
2. A raw disk can hold information needed by disk RAID systems.
Boot information can be stored in a separate partition. It has its own format because, at boot time, the system does not yet have the file-system code loaded and cannot interpret a file-system format. Instead, boot information is a sequential series of blocks loaded as an image into memory, and execution of the image starts at a predefined location. The boot loader knows enough about the file-system structure to find and load the kernel and start it executing. It can contain more than the instructions for booting a specific OS.

Inodes

An inode is a data structure on a filesystem on Linux and other Unix-like operating systems that stores all the information about a file except its name and its actual data.

Linked List

Another approach to free-space management is to link together all the free disk blocks, keeping a pointer to the first free block in a special location on the disk and caching it in memory. The first block contains a pointer to the next free disk block, and so on. To traverse the list, we must read each block, which requires substantial I/O time. Fortunately, traversing the free list is not a frequent action: usually the OS simply needs one free block to allocate to a file, so the first block in the list is used.

More performance I/O

Another issue that affects I/O performance is whether writes to the file system occur synchronously or asynchronously.

3 Ways of Allocating Disk Space

Contiguous, linked, and indexed. A system typically uses one method for all files within a file-system type.

Efficiency and Performance

Disks tend to be a major bottleneck in system performance, since they are the slowest main computer component. The efficient use of disk space depends heavily on the disk-allocation and directory algorithms. Inodes on a disk take up a percentage of its space because they are preallocated. By preallocating the inodes and spreading them across the volume, we improve performance: a file's data blocks are kept near the file's inode block to reduce seek time.

Where are file systems stored

The file system resides permanently on secondary storage, which is designed to hold a large amount of data permanently. Disks provide the bulk of the secondary storage on which file systems are maintained. Two characteristics make disks a convenient medium for storing multiple files:
1. A disk can be rewritten in place; it is possible to read a block from the disk, modify the block, and write it back to the same place.
2. A disk can directly access any block of information it contains, so it is simple to access a file either sequentially or randomly.
To improve I/O efficiency, I/O transfers between memory and disk are performed in units of blocks. Each block has one or more sectors.

File system and data

File systems provide efficient and convenient access to the disk by allowing data to be stored, located, and retrieved easily. A file system poses two design problems:
1. Defining how the file system should look to the user. This involves defining a file and its attributes, the operations allowed on a file, and the directory structure for organizing files.
2. Creating algorithms and data structures to map the logical file system onto the physical secondary-storage devices.

Virtual File Systems

How does an OS allow multiple types of file systems to be integrated into a directory structure?
An obvious but suboptimal method is to write separate directory and file routines for each type. Instead, most operating systems, including UNIX, use object-oriented techniques to organize the implementation, so that users can access files contained within multiple file systems on the local disk or even on file systems available across the network. Data structures and procedures are used to isolate the basic system-call functionality from the implementation details.
The file-system implementation is made up of 3 major layers:
(1) File-system interface: based on the open(), read(), write(), and close() calls and on file descriptors.
(2) Virtual file system (VFS) layer. It serves 2 functions:
1. It separates file-system-generic operations from their implementation by defining a clean VFS interface; several implementations of the interface can coexist on the same machine.
2. It provides a mechanism for uniquely representing a file throughout a network.
The VFS distinguishes local files from remote ones, and local files are further distinguished according to their file-system types. The VFS activates file-system-specific operations to handle local requests according to their file-system types. File handles are constructed from the relevant vnodes and passed as arguments to these procedures.
(3) The third layer implements the file-system types and the remote file-system protocol.

Performance

Important criteria for performance are storage efficiency and data-block access times.
Before selecting an allocation method, we need to determine how the system will be used (mostly sequential access or mostly random access).
Contiguous allocation: for any type of access, it requires only 1 access to get a disk block, especially if we keep the initial address of the file in memory.
Linked allocation: we can also keep the address of the next block in memory and read it directly. This is fine for sequential access, but for direct access, an access to the ith block might require i disk reads.
Indexed allocation: if the index block is already in memory, the access can be made directly, but keeping the index block in memory requires considerable space. If that memory is not available, we have to read the index block first and then the desired data block. For a two-level index, two index-block reads may be necessary. For an extremely large file, accessing a block near the end of the file may require reading several index blocks before the data block can be read.
The performance of indexed allocation therefore depends on the index structure, the size of the file, and the position of the desired block. Some systems use contiguous allocation for small files and switch to indexed allocation if the file grows large.

Grouping

In grouping, we store the addresses of n free blocks in the first free block. The first n - 1 of these blocks are actually free; the last block contains the addresses of another n free blocks, and so on. The addresses of a large number of free blocks can be found quickly.

Counting

Counting takes advantage of the fact that several contiguous blocks are often allocated or freed at the same time, particularly when space is allocated with the contiguous-allocation algorithm. Rather than keeping a list of every free block, we keep the address of the first free block and the number of free contiguous blocks that follow it. Each entry in the free-space list then consists of a disk address and a count. Although each entry requires more space than a simple disk address, the overall list is shorter as long as the count is generally greater than 1.
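As an illustration, the Python sketch below collapses a set of free block numbers into (address, count) entries. The function name to_extents and the sample block numbers are assumptions chosen for the example:

```python
def to_extents(free_blocks):
    """Collapse free block numbers into (first_block, count) entries."""
    extents = []
    for block in sorted(free_blocks):
        if extents and extents[-1][0] + extents[-1][1] == block:
            # The block directly follows the last run, so extend that run.
            start, count = extents[-1]
            extents[-1] = (start, count + 1)
        else:
            # The block starts a new run of contiguous free blocks.
            extents.append((block, 1))
    return extents


free = [2, 3, 4, 5, 8, 9, 10, 11, 12, 13, 17, 18, 25, 26, 27]
print(to_extents(free))
```

Fifteen individual block addresses shrink to four entries here; the saving disappears only if most runs have length 1.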

Partitions and Mounting

A disk can be sliced into multiple partitions, or a volume can span multiple partitions on multiple disks.

Indexed Allocation

Linked allocation solves the external-fragmentation and size-declaration problems of contiguous allocation. Without a FAT, however, linked allocation cannot support efficient direct access, since the pointers to the blocks are scattered with the blocks themselves all over the disk.
Indexed allocation solves this problem by bringing all the pointers together into one location: the index block. Each file has its own index block, which is an array of disk-block addresses. The ith entry in the index block points to the ith block of the file. The directory contains the address of the index block. To find and read the ith block, we use the pointer in the ith index-block entry.
1) When the file is created, all pointers in the index block are set to nil.
2) When the ith block is first written, a block is obtained from the free-space manager and its address is put in the ith index-block entry.
Indexed allocation supports direct access without suffering from external fragmentation, because any free block on the disk can satisfy a request for more space. It does suffer from wasted space, however: the pointer overhead of the index block is generally greater than the pointer overhead of linked allocation.
How large should the index block be?
-Linked scheme: An index block is normally one disk block, so it can be read and written directly by itself. To allow for large files, we can link together several index blocks.
-Multilevel index: A variant of the linked representation uses a first-level index block to point to a set of second-level index blocks, which in turn point to the file blocks. To access a block, the OS uses the first-level index to find a second-level index block and then uses that block to find the desired data block.
-Combined scheme: We keep the first, say, 15 pointers of the index block in the file's inode (the structure that describes a file-system object). The first 12 of these pointers point to direct blocks; they contain the addresses of blocks that contain data of the file. The next three pointers point to indirect blocks. The first points to a single indirect block, which is an index block containing not data but the addresses of blocks that do contain data. The second points to a double indirect block, which contains the address of a block that contains the addresses of blocks that contain data. The last pointer contains the address of a triple indirect block.
A file is therefore made up of direct blocks plus single, double, and triple indirect blocks. Under this method, the number of blocks that can be allocated to a file can exceed the amount of space addressable by the file pointers that many operating systems use.
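The reach of the combined scheme can be worked out with a little arithmetic. The Python sketch below assumes 12 direct pointers plus one single, one double, and one triple indirect pointer; the 4 KB block size and 4-byte pointer size in the usage line are assumptions for illustration, not values fixed by the scheme:

```python
def max_file_blocks(block_size, pointer_size, direct=12):
    """Upper bound on data blocks reachable from one inode."""
    per_block = block_size // pointer_size  # pointers held by one index block
    return direct + per_block + per_block ** 2 + per_block ** 3


def level_for_block(i, per_block, direct=12):
    """Name the pointer level through which logical block i is reached."""
    if i < 0:
        raise ValueError("block number must be non-negative")
    if i < direct:
        return "direct"
    i -= direct
    if i < per_block:
        return "single indirect"
    i -= per_block
    if i < per_block ** 2:
        return "double indirect"
    i -= per_block ** 2
    if i < per_block ** 3:
        return "triple indirect"
    raise ValueError("block number exceeds what the inode can address")


# Assumed geometry: 4096-byte blocks and 4-byte block pointers.
print(max_file_blocks(4096, 4))
print(level_for_block(5000, 4096 // 4))
```

With these assumed sizes, each index block holds 1024 pointers, so only the first 12 blocks are reached with no extra index-block read, the next 1024 need one, and so on.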

Linked Allocation

Linked allocation solves the problems of contiguous allocation. Each file is a linked list of disk blocks, and the disk blocks may be scattered anywhere on the disk. The directory contains a pointer to the first and last blocks of the file.
To create a new file, we create a new entry in the directory. The pointer is initialized to nil (the end-of-list pointer value) to signify an empty file, and the size field is set to 0. A write to the file causes the free-space management system to find a free block; this new block is written to and linked to the end of the file. To read a file, we read blocks by following the pointers from block to block. There is no external fragmentation with linked allocation, and any free block on the free-space list can be used to satisfy a request. A file can continue to grow as long as free blocks are available.
Disadvantages:
-It can be used effectively only for sequential-access files. To find the ith block we must start at the beginning and follow the pointers, and each pointer access requires a disk read and sometimes a disk seek.
-The pointers take up space. The usual solution is to collect blocks into multiples, called clusters, and to allocate clusters rather than blocks. That way, the pointers use a much smaller percentage of the file's disk space, and less space is needed for block allocation. The cost of this approach is an increase in internal fragmentation: more space is wasted when a cluster is partially full than when a block is partially full.
-Reliability is another issue. If a pointer is lost or damaged, following the chain could pick up the wrong pointer and link into the free-space list or into another file.

Logical File System

The logical file system manages metadata information. Metadata includes all of the file-system structure except the actual data (the contents of the files). The logical file system manages the directory structure to provide the file-organization module with the information it needs, and it maintains file structure through file-control blocks. With this layered structure, duplication of code is minimized: the I/O control and basic file-system code can be used by multiple file systems, and each file system then has its own logical file-system and file-organization modules. UNIX uses the UNIX file system (UFS).

Synchronous Writes

Synchronous writes occur in the order in which the disk subsystem receives them, and the writes are not buffered. The calling routine must wait for the data to reach the disk drive before it can proceed.

Dual-Booted

PCs and other systems can have multiple operating systems installed (dual-booted). A boot loader that understands multiple file systems and multiple operating systems can occupy the boot space. Once loaded, it can boot one of the operating systems available on the disk. A disk can have multiple partitions, each containing a different type of file system and a different OS.

Unified Virtual Memory

Several systems use page caching to cache both process pages and file data. This approach is known as unified virtual memory.

Free-Space Management

Since disk space is limited, we need to reuse the space from deleted files for new files, if possible. To keep track of free disk space, the system maintains a free-space list, which records all free disk blocks (those not allocated to some file or directory). To create a file, we search the free-space list for the required amount of space and allocate that space to the new file. When a file is deleted, its disk space is added back to the free-space list.

Volume

A single storage area with a single file system.
In-memory structures:
1. An in-memory mount table that contains information about each mounted volume.
2. An in-memory directory-structure cache that holds the directory information of recently accessed directories.
3. A system-wide open-file table that contains a copy of the FCB of each open file.
4. A per-process open-file table that contains a pointer to the appropriate entry in the system-wide open-file table.
5. Buffers that hold file-system blocks while they are being read from disk or written to disk.

VFS in LINUX

The VFS architecture in Linux is made up of four main object types:
1. Inode object: represents an individual file.
2. File object: represents an open file.
3. Superblock object: represents an entire file system.
4. Dentry object: represents an individual directory entry.
For each of these object types, the VFS defines a set of operations that must be implemented. Every object of one of these types contains a pointer to a function table. The function table lists the addresses of the actual functions that implement the defined operations for that particular object.
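The function-table idea can be sketched in a few lines of Python. This is only an illustration of the dispatch pattern, not the Linux kernel interface: the names FileObject, vfs_read, ramfs_read, and shoutfs_read are hypothetical, and a dictionary stands in for the table of function addresses.

```python
class FileObject:
    """An open file; `ops` plays the role of the pointer to a function table."""

    def __init__(self, ops, data):
        self.ops = ops
        self.data = data
        self.pos = 0


def ramfs_read(file, count):
    """Read for a hypothetical in-memory file system."""
    chunk = file.data[file.pos:file.pos + count]
    file.pos += len(chunk)
    return chunk


def shoutfs_read(file, count):
    """Read for a second hypothetical file system that upper-cases its data."""
    return ramfs_read(file, count).upper()


# One function table per file-system type.
RAMFS_OPS = {"read": ramfs_read}
SHOUTFS_OPS = {"read": shoutfs_read}


def vfs_read(file, count):
    """Generic layer: look up the operation in the object's table and call it."""
    return file.ops["read"](file, count)


print(vfs_read(FileObject(RAMFS_OPS, b"hello"), 3))
print(vfs_read(FileObject(SHOUTFS_OPS, b"hello"), 5))
```

The caller of vfs_read never needs to know which file-system type it is dealing with; it only follows the table attached to the object.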

Asynchronous Writes

In an asynchronous write, the data is stored in the cache and control returns to the caller. Most writes are asynchronous, but metadata writes, among others, can be synchronous. Operating systems frequently include a flag in the open system call to allow a process to request that writes be performed synchronously.

Directory Implementation (Linear List)

The simplest method is a linear list of file names with pointers to the data blocks. It is simple to program but time-consuming to execute.
-To create a new file, we must first search the directory to be sure that no existing file has the same name. Then we add a new entry at the end of the directory.
-To delete a file, we search the directory for the named file and then release the space allocated to it.
-To reuse a directory entry, we can mark the entry as unused or attach it to a list of free directory entries.
Disadvantages of a linear list:
-Finding a file requires a linear search. Directory information is used frequently, so users will notice if access to it is slow.
-A cache can be used to store the most recently used directory information.
-A sorted list allows a binary search and decreases the average search time.

Seek Time

The time taken for a disk drive to locate the area where the data to be read is stored. The number of disk seeks required for accessing contiguously allocated files is minimal.

Types of Data in Inode

The types of data normally kept in a file's directory entry or inode also require consideration. They include fields such as the last write date and the last access date.
1. Because the last access date is kept, every time a file is read its directory entry must be read and written back as well. This requirement can be inefficient for frequently accessed files.
2. Another difficulty is the size of pointers, or of any fixed-size allocation in an OS: once a fixed-size table becomes full, no more processes or files can be stored in it.

Cooked partition

This is when a partition contains a file system

Raw Partition

This is when a partition contains no file system

File-Organization Module

This module knows about files and their logical blocks, as well as physical blocks. By knowing the type of file allocation used and the location of the file, the module can translate logical block addresses to physical block addresses for the basic file system to transfer. The file-organization module (FOM) also includes the free-space manager, which tracks unallocated blocks and provides them to the FOM when requested.

Contiguous Allocation

This requires that each file occupy a set of contiguous blocks on the disk.
1) Disk addresses define a linear ordering on the disk.
2) With this ordering, assuming that only one job is accessing the disk, accessing block b + 1 after block b normally requires no head movement. When head movement is needed (from the last sector of one cylinder to the first sector of the next cylinder), the head need only move from one track to the next.
Contiguous allocation of a file is defined by the disk address of the first block and the length (in block units). The directory entry for each file indicates the address of the starting block and the length of the area allocated for the file. Accessing a contiguously allocated file is easy. For sequential access, the file system remembers the disk address of the last block referenced and reads the next block. For direct access to block i of a file that starts at block b, we access block b + i. Both sequential and direct access are supported by contiguous allocation.
Problems:
1) One difficulty is finding space for a new file. We need to satisfy a request of size n from a list of free holes. First fit and best fit are the most common strategies; both do better than worst fit in terms of time and storage utilization, and first fit is generally faster. All of these algorithms suffer from external fragmentation: as files are allocated and deleted, the free disk space is broken into little pieces. It becomes a problem when the free space consists of a large number of small chunks, none of which is large enough to hold the data. One remedy is to copy an entire file system onto another disk or tape, which frees the original disk completely and creates one large contiguous free space, and then copy the files back onto the original disk. This compacts all free space into one contiguous space, but the cost in time can be severe for large hard disks.
2) Another problem is determining how much space is needed for a file. When the file is created, the total amount of space it will need must be found and allocated, and the size of an output file can be difficult to estimate. If we allocate too little space, the file cannot be extended in place. When this happens, the user program must be terminated, and the user must allocate more space and run the program again. Overestimating the amount of space needed results in wasted space. Even if the total amount of space is known in advance, preallocation may be inefficient: a file that grows slowly over a long period must be allocated enough space for its final size, even though much of that space is unused for a long time (resulting in internal fragmentation). To minimize these drawbacks, some operating systems use a modified contiguous-allocation scheme: a contiguous chunk of space is allocated initially, and if that amount proves not to be large enough, another chunk of contiguous space is added. Internal fragmentation can still be a problem if the chunks are too large, and external fragmentation can become a problem as chunks of varying sizes are allocated and deallocated.
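A minimal Python sketch of first-fit allocation from a list of free holes, together with the b + i address calculation. The hole list format, the function names, and the sample numbers are assumptions made for the example:

```python
def first_fit(holes, n):
    """Allocate n contiguous blocks from the first hole that is large enough.

    `holes` is a list of (start_block, length) pairs and is updated in place.
    Returns the starting block of the allocation, or None if no hole fits.
    """
    for index, (start, length) in enumerate(holes):
        if length >= n:
            if length == n:
                del holes[index]  # the hole is used up entirely
            else:
                holes[index] = (start + n, length - n)  # shrink the hole
            return start
    return None


def physical_block(start, length, i):
    """Map logical block i of a contiguous file to its disk block (b + i)."""
    if not 0 <= i < length:
        raise IndexError("logical block outside the file")
    return start + i


holes = [(0, 2), (5, 4), (20, 10)]
start = first_fit(holes, 3)
print(start, holes)
print(physical_block(start, 3, 2))
```

After the allocation, a one-block hole is left behind at block 8. Leftover fragments of this kind are how external fragmentation builds up over time.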

Creating a new file

To create a new file, the application program calls the logical file system. The logical file system knows the format of the directory structures and allocates a new FCB. The system then reads the appropriate directory into memory, updates it with the new file name and FCB, and writes it back to the disk. Typical FCB fields include file permissions, file owner, file size, file data blocks or pointers to them, and file dates (create, access, write). Some operating systems treat a directory like a file.
Once a file has been created, it can be used for I/O, but it must be opened first.
1. The open() call passes a file name to the logical file system.
2. The system first searches the system-wide open-file table to see whether the file is already in use by another process. If it is, a per-process open-file table entry is created pointing to the existing system-wide entry. If it is not, the directory structure is searched for the given file name.
3. Parts of the directory structure are usually cached in memory to speed up directory operations. Once the file is found, the FCB is copied into the system-wide open-file table, which stores the FCB and also tracks the number of processes that have the file open.
Next, an entry is made in the per-process open-file table, with a pointer to the entry in the system-wide open-file table and some other fields. The open() call returns a pointer to the appropriate entry in the per-process table. The file name may not be part of the open-file table, since the system has no use for it once the appropriate FCB has been located on disk, although it may be cached. The name of the entry is a file descriptor in UNIX; other systems call it a file handle.
When a process closes the file, the per-process table entry is removed and the system-wide entry's open count is decremented. When all users that have opened the file have closed it, any updated metadata is copied back to the disk-based directory structure and the system-wide open-file table entry is removed. File-system structures use caching to keep all the information about an open file in memory.
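The interplay between the two tables can be sketched as follows. This is a simplified Python model, not any real kernel's data structures: the class name OpenFiles, the use of a dictionary as the on-disk directory, and the use of a list index as the file descriptor are all assumptions made for illustration.

```python
class OpenFiles:
    """Toy model of the system-wide and per-process open-file tables."""

    def __init__(self, directory):
        self.directory = directory  # stands in for the on-disk directory: name -> FCB
        self.system_wide = {}       # name -> [in-memory copy of the FCB, open count]

    def open(self, process_table, name):
        """Open `name` for a process and return its descriptor (a table index)."""
        if name not in self.system_wide:
            # Not yet open anywhere: search the directory and copy the FCB in.
            self.system_wide[name] = [dict(self.directory[name]), 0]
        self.system_wide[name][1] += 1
        process_table.append(name)  # per-process entry points at the shared entry
        return len(process_table) - 1

    def close(self, process_table, fd):
        """Close a descriptor; drop the shared entry when the count reaches 0."""
        name = process_table[fd]
        process_table[fd] = None
        entry = self.system_wide[name]
        entry[1] -= 1
        if entry[1] == 0:
            self.directory[name] = entry[0]  # copy metadata back to "disk"
            del self.system_wide[name]


files = OpenFiles({"notes.txt": {"size": 3}})
proc_a, proc_b = [], []
fd_a = files.open(proc_a, "notes.txt")
fd_b = files.open(proc_b, "notes.txt")
print(files.system_wide["notes.txt"][1])  # two processes share one entry
files.close(proc_a, fd_a)
files.close(proc_b, fd_b)
print("notes.txt" in files.system_wide)
```

Keeping the open count in the shared entry is what lets the system defer writing metadata back, and freeing the in-memory FCB, until the last process closes the file.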

File-System Implementation

To let processes request access to file contents, the OS implements the open() and close() system calls. Several on-disk and in-memory structures are used to implement a file system.
On disk, the file system may contain information about how to boot an operating system stored there, the total number of blocks, the number and location of free blocks, the directory structure, and the individual files.
On-disk structures:
1) Boot-control block (per volume): can contain information needed by the system to boot an operating system from that volume. In the UNIX file system (UFS), this is called the boot block. In the New Technology File System (NTFS), it is called the partition boot sector.
2) Volume-control block: contains volume details such as the number of blocks in the partition, the size of the blocks, the free-block count and free-block pointers, and the free-FCB count and FCB pointers. In UFS, this is called a superblock. In NTFS, it is stored in the master file table.
3) Directory structure: used to organize the files. In UFS, this includes file names and associated inode numbers. In NTFS, it is stored in the master file table.
4) A per-file FCB: contains details about the file. It has a unique identifier number to allow association with a directory entry.
The in-memory information is loaded at mount time, updated during file-system operations, and discarded at dismount.

Unified Buffer Cache

A unified buffer cache (UBC) is fully integrated with the file system; it caches file-system data and can grow or shrink on demand. The UBC references the same physical pages as virtual memory and can use map operations rather than bcopy routines to access data, thereby increasing system performance. When a unified buffer cache is provided, both memory mapping and the read() and write() system calls use the same page cache. This avoids double caching and allows the virtual memory system to manage file-system data. Whether we are caching disk blocks or pages, LRU seems a reasonable general-purpose algorithm for block or page replacement.

Linked Allocation: File Allocation Table

Used in MS-DOS, this method works as follows: a section of disk at the beginning of each volume is set aside to contain a table. The table has one entry for each disk block and is indexed by block number.
-It is used much like a linked list: the directory entry contains the file name and the block number of the first block of the file.
-The table entry indexed by that block number contains the block number of the next block in the file.
-This chain continues until it reaches the last block, which has a special end-of-file value as its table entry.
An unused block is indicated by a table value of 0. Allocating a new block to a file is simple: find the first 0-valued table entry and replace the previous end-of-file value with the address of the new block.
The FAT allocation scheme can result in a significant number of disk-head seeks unless the FAT is cached. The disk head must move to the start of the volume to read the FAT and find the location of the block in question, and then move to the location of the block itself.
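The chain-following and allocation steps can be sketched in Python. The table is modeled as a list indexed by block number; the end-of-file marker value, the decision to treat block 0 as reserved, and the sample block numbers are assumptions made for illustration rather than details of any particular FAT variant:

```python
FREE = 0   # table value of an unused block
EOF = -1   # assumed end-of-file marker for this sketch


def file_blocks(fat, start):
    """Follow the chain from the file's first block and list its blocks."""
    blocks = []
    block = start
    while block != EOF:
        blocks.append(block)
        block = fat[block]  # the table entry names the next block of the file
    return blocks


def append_block(fat, last):
    """Allocate the first free block and link it after the file's last block."""
    # Block 0 is treated as reserved so that 0 can unambiguously mean "free".
    for candidate in range(1, len(fat)):
        if fat[candidate] == FREE:
            fat[candidate] = EOF   # the new block becomes the end of the file
            fat[last] = candidate  # the old end now points to the new block
            return candidate
    raise RuntimeError("no free blocks left on the volume")


fat = [FREE] * 700
fat[217], fat[618], fat[339] = 618, 339, EOF  # a three-block file
print(file_blocks(fat, 217))
append_block(fat, 339)
print(file_blocks(fat, 217))
```

Because the whole chain lives in the table, finding the ith block of a file needs only table lookups, which is cheap when the FAT is cached in memory.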

How to improve performance

Performance can be improved in several ways. Most disk controllers (the hardware that connects the disk to the rest of the system) include local memory that forms an on-board cache large enough to store entire tracks at a time. Once a seek is performed, the track is read into the disk cache starting at the sector under the disk head. Some systems maintain a separate section of main memory for a buffer cache, where blocks are kept on the assumption that they will be used again shortly. Other systems cache file data using a page cache, which uses virtual memory techniques to cache file data as pages rather than as file-system blocks. Caching file data using virtual addresses is more efficient than caching through physical disk blocks.

Root Partition

The root partition, which contains the operating-system kernel and other system files, is mounted at boot time.
-Other volumes can be mounted automatically at boot or mounted manually later.
-As part of a mount operation, the OS verifies that the device contains a valid file system by asking the device driver to read the device directory and verifying that the directory has the expected format.
If the format is invalid, the partition must have its consistency checked and corrected. Finally, the OS notes in its in-memory mount table that a file system is mounted.

Directory Implementation (Hash Table)

With this method, a linear list still stores the directory entries, but a hash data structure is also used. The hash table takes a value computed from the file name and returns a pointer to the file name in the linear list.
-It can greatly decrease the directory search time.
-Insertion and deletion are fairly straightforward, but we need to deal with collisions, where two file names hash to the same location.
The major difficulties with a hash table are its generally fixed size and the dependence of the hash function on that size. Alternatively, we can use a chained-overflow hash table: each hash entry is a linked list instead of an individual value, and we resolve collisions by adding the new entry to the linked list.
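A small Python sketch of a chained-overflow directory. The class name HashedDirectory, the bucket count, and the deliberately simple byte-sum hash (chosen so that collisions are easy to demonstrate) are assumptions for illustration:

```python
class HashedDirectory:
    """Linear list of entries plus a chained hash table that indexes into it."""

    def __init__(self, buckets=8):
        self.entries = []                           # linear list: (name, first_block)
        self.table = [[] for _ in range(buckets)]   # each bucket is a chain of indexes

    def _bucket(self, name):
        # Toy hash: sum of the name's bytes, reduced to the table size.
        return sum(name.encode()) % len(self.table)

    def lookup(self, name):
        """Return the first block of `name`, or None if it is not present."""
        for index in self.table[self._bucket(name)]:
            if self.entries[index][0] == name:
                return self.entries[index][1]
        return None

    def create(self, name, first_block):
        """Add a new entry; a collision just lengthens the bucket's chain."""
        if self.lookup(name) is not None:
            raise FileExistsError(name)
        self.entries.append((name, first_block))
        self.table[self._bucket(name)].append(len(self.entries) - 1)


d = HashedDirectory()
d.create("ab", 10)
d.create("ba", 20)  # same byte sum as "ab", so it lands in the same bucket
print(d.lookup("ab"), d.lookup("ba"), d.lookup("zz"))
```

Lookups stay correct under collisions because each chain is searched by name; they only get slower as chains grow, which is the trade-off of chaining against a fixed-size table.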

