computer architecture ch 6 memory hierarchy


Segmentation:

A variable-size address mapping scheme in which an address consists of two parts: a segment number, which is mapped to a physical address, and a segment offset.

Pitfall:

Ignoring memory system behavior when writing programs or when generating code in a compiler.

Seek:

The process of positioning a read/write head over the proper track on a disk; the time this takes is the seek time.

Swap space:

The space on the disk reserved for the full virtual memory space of a process.

Page table:

The table containing the virtual to physical address translations in a virtual memory system. The table, which is stored in memory, is typically indexed by the virtual page number; each entry in the table contains the physical page number for that virtual page if the page is currently in memory.

Capacity miss:

A cache miss that occurs because the cache, even with full associativity, cannot contain all the blocks needed to satisfy the request.

Three Cs model:

A cache model in which all cache misses are classified into one of three categories: compulsory misses, capacity misses, and conflict misses.

Fully associative cache:

A cache structure in which a block can be placed in any location in the cache. (more expensive to implement than direct-mapped or set-associative caches)

Nonblocking cache:

A cache that allows the processor to make references to the cache while the cache is handling an earlier miss.

Set-associative cache:

A cache that has a fixed number of locations (at least two) where each block can be placed; a cache with n such locations per set is called an n-way set-associative cache. (a middle ground between direct-mapped and fully associative)

Virtually addressed cache:

A cache that is accessed with a virtual address rather than a physical address.

Physically addressed cache:

A cache that is addressed by a physical address.

Translation-lookaside buffer (TLB):

A cache that keeps track of recently used address mappings to try to avoid an access to the page table.

Context switch:

A changing of the internal state of the processor to allow a different process to use the processor; it includes saving the state needed to return to the currently executing process.

Error detection code:

A code that enables the detection of an error in data, but not the precise location and, hence, correction of the error.

Next-state function:

A combinational function that, given the inputs and the current state, determines the next state of a finite-state machine.

Tag:

A field in a table used for a memory hierarchy that contains the address information required to identify whether the associated block in the hierarchy corresponds to a requested word.

Valid bit:

A field in the tables of a memory hierarchy that indicates that the associated block in the hierarchy contains valid data. (when the computer starts up, the caches aren't valid)

Multilevel cache:

A memory hierarchy with multiple levels of caches, rather than just a cache and main memory. (The levels work together: in particular, a two-level cache structure allows the primary cache to focus on minimizing hit time to yield a shorter clock cycle or fewer pipeline stages, while allowing the secondary cache to focus on miss rate to reduce the penalty of long memory access times.)

Write buffer:

A queue that holds data while the data are waiting to be written to memory.

Least recently used (LRU):

A replacement scheme in which the block replaced is the one that has been unused for the longest time.

Cache miss:

A request for data from the cache that cannot be filled because the data are not present in the cache.

Split cache:

A scheme in which a level of the memory hierarchy is composed of two independent caches that operate in parallel with each other, with one handling instructions and one handling data.

Write-through:

A scheme in which writes always update both the cache and the next lower level of the memory hierarchy, ensuring that data are always consistent between the two.

Write-back:

A scheme that handles writes by updating values only to the block in the cache, then writing the modified block to the lower level of the hierarchy when the block is replaced. Write-back schemes can improve performance, especially when processors can generate writes as fast or faster than the writes can be handled by main memory; a write-back scheme is, however, more complex to implement than write-through.

Finite-state machine:

A sequential logic function consisting of a set of inputs and outputs, a next-state function that maps the current state and the inputs to a new state, and an output function that maps the current state and possibly the inputs to a set of asserted outputs.

Protection:

A set of mechanisms for ensuring that multiple processes sharing the processor, memory, or I/O devices cannot interfere, intentionally or unintentionally, with one another by reading or writing each other's data. These mechanisms also isolate the operating system from a user process.

In a SRAM, as long as power is applied, the value can be kept indefinitely. In a dynamic RAM (DRAM), the value kept in a cell is stored as a charge in a capacitor.

A single transistor is then used to access this stored charge, either to read the value or to overwrite the charge stored there. Because DRAMs use only one transistor per bit of storage, they are much denser and cheaper per bit than SRAM. As DRAMs store the charge on a capacitor, the charge cannot be kept indefinitely and must periodically be refreshed. That is why this memory structure is called dynamic, in contrast to the static storage in an SRAM cell.

Aliasing:

A situation in which two addresses access the same object; it can occur in virtual memory when there are two virtual addresses for the same physical page.

System call:

A special instruction that transfers control from user mode to a dedicated location in supervisor code space, invoking the exception mechanism in the process.

Memory hierarchy:

A structure that uses multiple levels of memories; as the distance from the processor increases, the size of the memories and the access time both increase.

Prefetching:

A technique in which data blocks needed in the future are brought into the cache early by using special instructions that specify the address of the block.

Virtual memory:

A technique that uses main memory as a "cache" for secondary storage.

Although the concepts at work in virtual memory and in caches are the same, their differing historical roots have led to the use of different terminology.

A virtual memory block is called a page, and a virtual memory miss is called a page fault.

Address translation:

Also called address mapping. The process by which a virtual address is mapped to an address used to access memory.

Compulsory miss:

Also called cold-start miss. A cache miss caused by the first access to a block that has never been in the cache.

Conflict miss:

Also called collision miss. A cache miss that occurs in a set-associative or direct-mapped cache when multiple blocks compete for the same set; such misses are eliminated in a fully associative cache of the same size.

Exception enable:

Also called interrupt enable. A signal or action that controls whether the processor responds to an exception or not; necessary for preventing the occurrence of exceptions during intervals before the processor has safely saved the state needed to restart.

Implementing protection with virtual memory

Supervisor mode:

Also called kernel mode. A mode indicating that a running process is an operating system process.

Rotational latency:

Also called rotational delay. The time required for the desired sector of a disk to rotate under the read/write head; usually assumed to be half the rotation time.

Reference bit:

Also called use bit or access bit. A field that is set whenever a page is accessed and that is used to implement LRU or other replacement schemes.

A more serious problem associated with just increasing the block size is that the cost of a miss rises. The miss penalty is determined by the time required to fetch the block from the next lower level of the hierarchy and load it into the cache.

Although it is hard to do anything about the longer latency component of the miss penalty for large blocks, we may be able to hide some of the transfer time so that the miss penalty is effectively smaller.

Physical address:

An address in main memory.

Virtual address:

An address that corresponds to a location in virtual space and is translated by address mapping to a physical address when memory is accessed.

Page fault:

An event that occurs when an accessed page is not present in main memory.

Restartable instruction:

An instruction that can resume execution after an exception is resolved without the exception's affecting the result of the instruction.

Redundant arrays of inexpensive disks (RAID):

An organization of disks that uses an array of small and inexpensive disks so as to increase both performance and reliability.

The Big Picture

Caches, TLBs, and virtual memory may initially look very different, but they rely on the same two principles of locality, and they can be understood by their answers to four questions: Question 1: Where can a block be placed? Answer: One place (direct mapped), a few places (set associative), or any place (fully associative). Question 2: How is a block found? Answer: There are four methods: indexing (as in a direct-mapped cache), limited search (as in a set-associative cache), full search (as in a fully associative cache), and a separate lookup table (as in a page table). Question 3: What block is replaced on a miss? Answer: Typically, either the least recently used or a random block. Question 4: How are writes handled? Answer: Each level in the hierarchy can use either write-through or write-back.

The Big Picture

Caching is perhaps the most important example of the big idea of prediction. It relies on the principle of locality to try to find the desired data in the higher levels of the memory hierarchy, and provides mechanisms to ensure that when the prediction is wrong it finds and uses the proper data from the lower levels of the memory hierarchy. The hit rates of the cache prediction on modern computers are often above 95% (see COD Figure 5.46 (The L1, L2, and L3 data cache miss rates ...)).

Main memory is implemented from DRAM (dynamic random access memory), while levels closer to the processor (caches) use SRAM (static random access memory).

DRAM is less costly per bit than SRAM, although it is substantially slower. The price difference arises because DRAM uses significantly less area per bit of memory, and DRAMs thus have larger capacity for the same amount of silicon;

Unlike disks and DRAM, but like other EEPROM technologies, writes can wear out flash memory bits. To cope with such limits, most flash products include a controller to spread the writes by remapping blocks that have been written many times to less trodden blocks. This technique is called wear leveling.

Flash controllers that perform wear leveling can also improve yield by mapping out memory cells that were manufactured incorrectly.

While single or double bit errors are typical for memory systems, networks can have bursts of bit errors. One solution is called Cyclic Redundancy Check.

For a block of k data bits, the transmitter generates an (n - k)-bit frame check sequence, producing an n-bit frame that is exactly divisible by a predetermined number. The receiver divides the received frame by that same number; if there is no remainder, it assumes there was no error. If there is a remainder, the receiver rejects the message and asks the transmitter to send it again.
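As a rough sketch of the idea (the generator polynomial and frame contents below are arbitrary examples, not what any particular network standard uses), the division can be done with shifts and XORs:

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative CRC-8 over a byte buffer (generator polynomial x^8 + x^2 + x + 1,
 * i.e., 0x07, zero initial value). The transmitter appends the 8-bit remainder
 * as the frame check sequence; the receiver runs the same division over
 * data + check byte and expects a zero remainder. */
static uint8_t crc8(const uint8_t *data, int len) {
    uint8_t crc = 0;
    for (int i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x80) ? (uint8_t)((crc << 1) ^ 0x07) : (uint8_t)(crc << 1);
    }
    return crc;
}

int main(void) {
    uint8_t frame[] = { 0xDE, 0xAD, 0xBE, 0xEF };  /* arbitrary example data */
    printf("frame check sequence: 0x%02X\n", crc8(frame, 4));
    return 0;
}
```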

Handling TLB misses and page faults

Handling a TLB miss or a page fault requires using the exception mechanism to interrupt the active process, transferring control to the operating system, and later resuming execution of the interrupted process.

In earlier sections, we saw how caches provided fast access to recently-used portions of a program's code and data. Similarly, the main memory can act as a "cache" for the secondary storage, traditionally implemented with magnetic disks. This technique is called virtual memory.

Historically, there were two major motivations for virtual memory: to allow efficient and safe sharing of memory among several programs, such as for the memory needed by multiple virtual machines for Cloud computing, and to remove the programming burdens of a small, limited amount of main memory. Five decades after its invention, it's the former reason that reigns today.

When a miss occurs in a direct-mapped cache, the requested block can go in exactly one position, and the block occupying that position must be replaced. In an associative cache, we have a choice of where to place the requested block, and hence a choice of which block to replace.

In a fully associative cache, all blocks are candidates for replacement. In a set-associative cache, we must choose among the blocks in the selected set.

summary

In this section, we focused on four topics: cache performance, using associativity to reduce miss rates, the use of multilevel cache hierarchies to reduce miss penalties, and software optimizations to improve effectiveness of caches. The memory system has a significant effect on program execution time. The number of memory-stall cycles depends on both the miss rate and the miss penalty. The challenge is to reduce one of these factors without significantly affecting other critical factors in the memory hierarchy. To reduce the miss rate, we examined the use of associative placement schemes. Such schemes can reduce the miss rate of a cache by allowing more flexible placement of blocks within the cache. Fully associative schemes allow blocks to be placed anywhere, but also require that every block in the cache be searched to satisfy a request. The higher costs make large fully associative caches impractical. Set-associative caches are a practical alternative, since we need only search among the elements of a unique set that is chosen by indexing. Set-associative caches have higher miss rates but are faster to access. The amount of associativity that yields the best performance depends on both the technology and the details of the implementation. We looked at multilevel caches as a technique to reduce the miss penalty by allowing a larger secondary cache to handle misses to the primary cache. Second-level caches have become commonplace as designers find that limited silicon and the goals of high clock rates prevent primary caches from becoming large. The secondary cache, which is often 10 or more times larger than the primary cache, handles many accesses that miss in the primary cache. In such cases, the miss penalty is that of the access time to the secondary cache (typically < 10 processor cycles) versus the access time to memory (typically > 100 processor cycles). As with associativity, the design tradeoffs between the size of the secondary cache and its access time depend on a number of aspects of the implementation. Finally, given the importance of the memory hierarchy in performance, we looked at how to change algorithms to improve cache behavior, with blocking being an important technique when dealing with large arrays.
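As a hedged illustration of how miss rate and miss penalty combine (the cycle counts below are made-up example values, not figures from the text), the average memory access time for a two-level hierarchy can be estimated as hit time plus miss rate times miss penalty, applied at each level:

```c
#include <stdio.h>

/* Illustrative two-level AMAT calculation; all parameters are example values. */
int main(void) {
    double l1_hit_time  = 1.0;    /* L1 hit time, in cycles            */
    double l1_miss_rate = 0.05;   /* fraction of accesses that miss L1 */
    double l2_hit_time  = 10.0;   /* L2 access time, in cycles         */
    double l2_miss_rate = 0.20;   /* local miss rate of L2             */
    double mem_penalty  = 100.0;  /* main-memory access, in cycles     */

    /* AMAT = hit time + miss rate * miss penalty, at each level. */
    double amat = l1_hit_time
                + l1_miss_rate * (l2_hit_time + l2_miss_rate * mem_penalty);
    printf("average memory access time = %.2f cycles\n", amat);
    return 0;
}
```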

If the valid bit for a virtual page is off, a page fault occurs. The operating system must be given control. This transfer is done with the exception mechanism,

Once the operating system gets control, it must find the page in the next level of the hierarchy (usually flash memory or magnetic disk) and decide where to place the requested page in the main memory.

Sector:

One of the segments that make up a track on a magnetic disk; a sector is the smallest amount of information that is read or written on a disk. Sectors are typically 512 to 4096 bytes in size.

Track:

One of thousands of concentric circles that make up the surface of a magnetic disk.

P + Q redundancy (RAID 6)

Parity-based schemes protect against a single self-identifying failure. When a single failure correction is not sufficient, parity can be generalized to have a second calculation over the data and another check disk of information. This second check block allows recovery from a second failure. Thus, the storage overhead is twice that of RAID 5. The small write shortcut of COD Figure 5.11.2 (Small write update on RAID 4) works as well, except now there are six disk accesses instead of four to update both P and Q information.

the big picture

Programs exhibit both temporal locality, the tendency to reuse recently accessed data items, and spatial locality, the tendency to reference data items that are close to other recently accessed items. Memory hierarchies take advantage of temporal locality by keeping more recently accessed data items closer to the processor. Memory hierarchies take advantage of spatial locality by moving blocks consisting of multiple contiguous words in memory to upper levels of the hierarchy. The figure below shows that a memory hierarchy uses smaller and faster memory technologies close to the processor. Thus, accesses that hit in the highest level of the hierarchy can be processed quickly. Accesses that miss go to lower levels of the hierarchy, which are larger but slower. If the hit rate is high enough, the memory hierarchy has an effective access time close to that of the highest (and fastest) level and a size equal to that of the lowest (and largest) level. In most systems, the memory is a true hierarchy, meaning that data cannot be present in level i unless they are also present in level i + 1.
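A small C sketch of the two kinds of locality (the array size is an arbitrary example): the row-order loop walks consecutive addresses and so benefits from the multi-word blocks the hierarchy moves, while the column-order loop strides across memory and does not; the running sum is reused every iteration (temporal locality).

```c
#include <stdio.h>

#define N 1024
static double a[N][N];   /* example array; contents are zero-initialized */

/* Row-major traversal touches consecutive addresses, so each block fetched
 * on a miss supplies the next several elements (spatial locality). */
double sum_rows(void) {
    double sum = 0.0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += a[i][j];
    return sum;
}

/* Column-order traversal strides N*8 bytes between accesses, so it gets
 * little benefit from multi-word blocks. */
double sum_cols(void) {
    double sum = 0.0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += a[i][j];
    return sum;
}

int main(void) {
    printf("%f %f\n", sum_rows(), sum_cols());
    return 0;
}
```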

Distributed block-interleaved parity (RAID 5)

RAID 4 efficiently supports a mixture of large reads, large writes, and small reads, plus it allows small writes. One drawback to the system is that the parity disk must be updated on every write, so the parity disk is the bottleneck for back-to-back writes. To fix the parity-write bottleneck, the parity information can be spread throughout all the disks so that there is no single bottleneck for writes. The distributed parity organization is RAID 5.

Block-interleaved parity (RAID 4)

RAID 4 uses the same ratio of data disks and check disks as RAID 3, but they access data differently. The parity is stored as blocks and associated with a set of data blocks. In RAID 3, every access went to all disks. However, some applications prefer smaller accesses, allowing independent accesses to occur in parallel. That is the purpose of the RAID levels 4 to 7. Since error detection information in each sector is checked on reads to see if the data are correct, such "small reads" to each disk can occur independently as long as the minimum access is one sector. In the RAID context, a small access goes to just one disk in a protection group while a large access goes to all the disks in a protection group. Writes are another matter. It would seem that each small write would demand that all other disks be accessed to read the rest of the information needed to recalculate the new parity, as on the left in the figure below. A "small write" would require reading the old data and old parity, adding the new information, and then writing the new parity to the parity disk and the new data to the data disk.
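A minimal sketch of that small-write shortcut, assuming byte-sized "blocks" and arbitrary example values; parity here is the XOR (sum modulo two) used throughout this section, so the old data's contribution can be removed and the new data's added without touching the other disks.

```c
#include <stdint.h>
#include <stdio.h>

/* Small-write shortcut: recompute parity from only the old data, the old
 * parity, and the new data. The byte values are arbitrary examples. */
int main(void) {
    uint8_t old_data = 0x3C, old_parity = 0xA5, new_data = 0x7E;

    /* Remove the old data's contribution, then add the new data's. */
    uint8_t new_parity = old_parity ^ old_data ^ new_data;

    printf("new parity byte: 0x%02X\n", new_parity);
    return 0;
}
```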

RAID summary

RAID 1 and RAID 5 are widely used in servers; one estimate is that 80% of disks in servers are found in a RAID organization. One weakness of RAID systems is repair. First, to avoid making the data unavailable during repair, the array must be designed to allow the failed disks to be replaced without having to turn off the system. RAIDs have enough redundancy to allow continuous operation, but hot-swapping disks places demands on the physical and electrical design of the array and the disk interfaces. Second, another failure could occur during repair, so the repair time affects the chances of losing data: the longer the repair time, the greater the chances of another failure that will lose data. Rather than having to wait for the operator to bring in a good disk, some systems include standby spares so that the data can be reconstructed instantly upon discovery of the failure. The operator can then replace the failed disks in a more leisurely fashion. Note that a human operator ultimately determines which disks to remove. Operators are only human, so they occasionally remove the good disk instead of the broken disk, leading to an unrecoverable disk failure. Hot-swapping: Replacing a hardware component while the system is running. Standby spares: Reserve hardware resources that can immediately take the place of a failed component. In addition to designing the RAID system for repair, there are questions about how disk technology changes over time. Although disk manufacturers quote very high MTTF for their products, those numbers are under nominal conditions. If a particular disk array has been subject to temperature cycles due to, say, the failure of the air-conditioning system, or to shaking due to a poor rack design, construction, or installation, the failure rates can be three to six times higher (see the fallacy in COD Section 5.16 (Fallacies and pitfalls)). The calculation of RAID reliability assumes independence between disk failures, but disk failures could be correlated, because such damage due to the environment would likely happen to all the disks in the array. Another concern is that since disk bandwidth is growing more slowly than disk capacity, the time to repair a disk in a RAID system is increasing, which in turn increases the chances of a second failure. For example, a 3-TB disk could take almost 9 hours to read sequentially, assuming no interference. Given that the damaged RAID is likely to continue to serve data, reconstruction could be stretched considerably. Besides increasing that time, another concern is that reading much more data during reconstruction means increasing the chance of an uncorrectable read media failure, which would result in data loss. Other arguments for concern about simultaneous multiple failures are the increasing number of disks in arrays and the use of higher-capacity disks. Hence, these trends have led to a growing interest in protecting against more than one failure, and so RAID 6 is increasingly being offered as an option and being used in the field.

No redundancy (RAID 0)

Simply spreading data over multiple disks, called striping, automatically forces accesses to several disks. Striping across a set of disks makes the collection appear to software as a single large disk, which simplifies storage management. It also improves performance for large accesses, since many disks can operate at once. Video-editing systems, for example, frequently stripe their data and may not worry about dependability as much as, say, databases. Striping: Allocation of logically sequential blocks to separate disks to allow higher performance than a single disk can deliver. RAID 0 is something of a misnomer, as there is no redundancy. However, RAID levels are often left to the operator to set when creating a storage system, and RAID 0 is often listed as one of the options. Hence, the term RAID 0 has become widely used.

The Big Picture

The challenge in designing memory hierarchies is that every change that potentially improves the miss rate can also negatively affect overall performance, as the figure below summarizes. This combination of positive and negative effects is what makes the design of a memory hierarchy interesting.

Question 2: How is a block found?

The choice of how we locate a block depends on the block placement scheme, since that dictates the number of possible locations. We can summarize the schemes as follows: indexing for a direct-mapped cache, a limited search within a set for a set-associative cache, and a full search (or a separate lookup table, as with a page table) for fully associative placement. The choice among direct-mapped, set-associative, or fully associative mapping in any memory hierarchy will depend on the cost of a miss versus the cost of implementing associativity, both in time and in extra hardware. Virtual memory systems almost always use fully associative placement.
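A small sketch of the indexing case (the cache geometry here is an arbitrary example: 64-byte blocks, 1024 blocks): the block address is split into an index that selects the entry and a tag that is compared against the stored tag.

```c
#include <stdint.h>
#include <stdio.h>

/* Direct-mapped lookup arithmetic: the index selects the cache entry, and
 * the tag is compared against the tag stored with that entry. */
#define BLOCK_BYTES 64
#define NUM_BLOCKS  1024

int main(void) {
    uint32_t addr       = 0x12345678;            /* example byte address */
    uint32_t block_addr = addr / BLOCK_BYTES;
    uint32_t index      = block_addr % NUM_BLOCKS;   /* which cache entry  */
    uint32_t tag        = block_addr / NUM_BLOCKS;   /* upper address bits */
    printf("index = %u, tag = 0x%X\n", index, tag);
    return 0;
}
```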

Bit-interleaved parity (RAID 3)

The cost of higher availability can be reduced to 1/n, where n is the number of disks in a protection group. Rather than have a complete copy of the original data for each disk, we need only add enough redundant information to restore the lost information on a failure. Reads or writes go to all disks in the group, with one extra disk to hold the check information in case there is a failure. RAID 3 is popular in applications with large data sets, such as multimedia and some scientific codes. Protection group: The group of data disks or blocks that share a common check disk or block. Parity is one such scheme. Readers unfamiliar with parity can think of the redundant disk as having the sum of all the data in the other disks. When a disk fails, then you subtract all the data in the good disks from the parity disk; the remaining information must be the missing information. Parity is simply the sum modulo two. Unlike RAID 1, many disks must be read to determine the missing data. The assumption behind this technique is that taking longer to recover from failure but spending less on redundant storage is a good tradeoff.
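A minimal reconstruction sketch under the same sum-modulo-two view of parity (the disk contents and protection-group size are arbitrary examples): "subtracting" the surviving disks from the parity disk is an XOR, and what remains is the lost data.

```c
#include <stdint.h>
#include <stdio.h>

/* Reconstruction for a 4-data-disk protection group with one parity disk. */
int main(void) {
    uint8_t d[4]  = { 0x11, 0x22, 0x33, 0x44 };      /* example disk contents */
    uint8_t parity = d[0] ^ d[1] ^ d[2] ^ d[3];      /* check disk            */

    /* Suppose disk 2 fails: XOR the parity with all surviving disks. */
    uint8_t recovered = parity ^ d[0] ^ d[1] ^ d[3];
    printf("recovered 0x%02X (original 0x%02X)\n", recovered, d[2]);
    return 0;
}
```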

To improve the interface to processors further, DRAMs added clocks and are properly called synchronous DRAMs or SDRAMs. The advantage of SDRAMs is that the use of a clock eliminates the time for the memory and processor to synchronize.

The fastest version is called Double Data Rate (DDR) SDRAM. The name means data transfers on both the rising and falling edge of the clock,

Hit rate:

The fraction of memory accesses found in a level of the memory hierarchy.

Miss rate:

The fraction of memory accesses not found in a level of the memory hierarchy. 1 - hit rate

Global miss rate:

The fraction of references that miss in all levels of a multilevel cache.

Local miss rate:

The fraction of references to one level of a cache that miss; used in multilevel hierarchies. (miss rate of the secondary cache)

Temporal locality:

The locality principle stating that if a data location is referenced then it will tend to be referenced again soon.

Spatial locality:

The locality principle stating that if a data location is referenced, data locations with nearby addresses will tend to be referenced soon.

Block (or line):

The minimum unit of information that can be either present or not present in a cache.

Hit time:

The time required to access a level of the memory hierarchy, including the time needed to determine whether the access is a hit or a miss.

Miss penalty:

The time required to fetch a block into a level of the memory hierarchy from the lower level, including the time to access the block, transmit it from one level to the other, insert it in the level that experienced the miss, and then pass the block to the requestor.

The last component of a disk access, transfer time, is the time to transfer a block of bits.

The transfer time is a function of the sector size, the rotation speed, and the recording density of a track. Transfer rates in 2012 were between 100 and 200 MB/sec.
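A hedged worked example of a full disk access (the drive parameters below are illustrative values, not the text's figures): average access time is roughly seek time plus rotational latency plus transfer time plus controller overhead.

```c
#include <stdio.h>

/* Illustrative disk access time; every parameter is a made-up example. */
int main(void) {
    double seek_ms       = 5.0;                    /* average seek            */
    double rpm           = 7200.0;
    double rotational_ms = 0.5 * (60000.0 / rpm);  /* half a rotation         */
    double sector_kb     = 4.0;                    /* sector size             */
    double rate_mb_s     = 150.0;                  /* transfer rate           */
    double transfer_ms   = (sector_kb / 1024.0) / rate_mb_s * 1000.0;
    double controller_ms = 0.2;                    /* controller overhead     */

    printf("average access time = %.2f ms\n",
           seek_ms + rotational_ms + transfer_ms + controller_ms);
    return 0;
}
```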

Mirroring (RAID 1)

This traditional scheme for tolerating disk failure, called mirroring or shadowing, uses twice as many disks as does RAID 0. Whenever data are written to one disk, those data are also written to a redundant disk, so that there are always two copies of the information. If a disk fails, the system just goes to the "mirror" and reads its contents to get the desired information. Mirroring is the most expensive RAID solution, since it requires the most disks. Mirroring: Writing identical data to multiple disks to increase data availability.

Although personal mobile devices like the iPad (see COD Chapter 1 (Computer Abstractions and Technology)) use individual DRAMs, memory for servers is commonly sold on small boards called dual inline memory modules (DIMMs).

To avoid confusion with the internal DRAM names of row and banks, we use the term memory rank for such a subset of chips in a DIMM.

summary

Virtual memory is the name for the level of memory hierarchy that manages caching between the main memory and secondary memory. Virtual memory allows a single program to expand its address space beyond the limits of main memory. More importantly, virtual memory supports sharing of the main memory among multiple, simultaneously active processes, in a protected manner. Managing the memory hierarchy between main memory and disk is challenging because of the high cost of page faults. Several techniques are used to reduce the miss rate: Pages are made large to take advantage of spatial locality and to reduce the miss rate. The mapping between virtual addresses and physical addresses, which is implemented with a page table, is made fully associative so that a virtual page can be placed anywhere in main memory. The operating system uses techniques, such as LRU and a reference bit, to choose which pages to replace. Writes to secondary memory are expensive, so virtual memory uses a write-back scheme and also tracks whether a page has been modified (using a dirty bit) to avoid writing clean pages. The virtual memory mechanism provides address translation from a virtual address used by the program to the physical address space used for accessing memory. This address translation allows protected sharing of the main memory and provides several additional benefits, such as simplifying memory allocation. Ensuring that processes are protected from each other requires that only the operating system can change the address translations, which is implemented by preventing user programs from altering the page tables. Controlled sharing of pages between processes can be implemented with the help of the operating system and access bits in the page table that indicate whether the user program has read or write access to a page. If a processor had to access a page table resident in memory to translate every access, virtual memory would be too expensive, as caches would be pointless! Instead, a TLB acts as a cache for translations from the page table. Addresses are then translated from virtual to physical using the translations in the TLB. Caches, virtual memory, and TLBs all rely on a common set of principles and policies. The next section discusses this common framework.
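A minimal sketch of the address translation described above, assuming 4 KiB pages and a toy one-level page table; real hardware adds valid, dirty, and protection bits and consults a TLB before falling back to the page table in memory.

```c
#include <stdint.h>
#include <stdio.h>

#define PAGE_BITS 12                  /* 4 KiB pages */
#define PAGE_SIZE (1u << PAGE_BITS)

/* Toy page table indexed by virtual page number; entries are example
 * physical page numbers (a real entry would also carry a valid bit). */
static uint32_t page_table[16] = { [3] = 0x2A };   /* VPN 3 -> PPN 0x2A */

uint32_t translate(uint32_t vaddr) {
    uint32_t vpn    = vaddr >> PAGE_BITS;          /* virtual page number  */
    uint32_t offset = vaddr & (PAGE_SIZE - 1);     /* offset within page   */
    uint32_t ppn    = page_table[vpn];             /* page fault if invalid */
    return (ppn << PAGE_BITS) | offset;            /* physical address     */
}

int main(void) {
    printf("0x%X\n", translate(0x3ABC));           /* expect 0x2AABC */
    return 0;
}
```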

summary

We began the previous section by examining the simplest of caches: a direct-mapped cache with a one-word block. In such a cache, both hits and misses are simple, since a word can go in exactly one location and there is a separate tag for every word. To keep the cache and memory consistent, a write-through scheme can be used, so that every write into the cache also causes memory to be updated. The alternative to write-through is a write-back scheme that copies a block back to memory when it is replaced. To take advantage of spatial locality, a cache must have a block size larger than one word. The use of a bigger block decreases the miss rate and improves the efficiency of the cache by reducing the amount of tag storage relative to the amount of data storage in the cache. Although a larger block size decreases the miss rate, it can also increase the miss penalty. If the miss penalty increased linearly with the block size, larger blocks could easily lead to lower performance. To avoid performance loss, the bandwidth of main memory is increased to transfer cache blocks more efficiently. Common methods for increasing bandwidth external to the DRAM are making the memory wider and interleaving. DRAM designers have steadily improved the interface between the processor and memory to increase the bandwidth of burst mode transfers to reduce the cost of larger cache block sizes.

Question 4: What happens on a write?

We have already seen the two basic options: Write-through: The information is written to both the block in the cache and the block in the lower level of the memory hierarchy (main memory for a cache). Write-back: The information is written just to the block in the cache. The modified block is written to the lower level of the hierarchy only when it is replaced. Virtual memory systems always use write-back, for the reasons discussed in COD Section 5.7 (Virtual memory). Both write-back and write-through have their advantages. The key advantages of write-back are the following: Individual words can be written by the processor at the rate that the cache, rather than the memory, can accept them. Multiple writes within a block require only one write to the lower level in the hierarchy. When blocks are written back, the system can make effective use of a high-bandwidth transfer, since the entire block is written. Write-through has these advantages: Misses are simpler and cheaper because they never require a block to be written back to the lower level. Write-through is easier to implement than write-back, although to be realistic, a write-through cache will still need to use a write buffer.

Question 1: Where can a block be placed?

We have seen that block placement in the upper level of the hierarchy can use a range of schemes, from direct mapped to set associative to fully associative. As mentioned above, this entire range of schemes can be thought of as variations on a set-associative scheme where the number of sets and the number of blocks per set varies:

Question 3: Which block should be replaced on a cache miss?

When a miss occurs in an associative cache, we must decide which block to replace. In a fully associative cache, all blocks are candidates for replacement. If the cache is set associative, we must choose among the blocks in the set. There are two primary strategies for replacement in set-associative or fully associative caches: Random: Candidate blocks are randomly selected, possibly using some hardware assistance. Least recently used (LRU): The block replaced is the one that has been unused for the longest time. In practice, LRU is too costly to implement for hierarchies with more than a small degree of associativity (two to four, typically). In virtual memory, some form of LRU is always approximated.
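A minimal sketch of exact LRU bookkeeping for a single 4-way set (the timestamps are arbitrary example values); real hardware usually approximates this, because keeping exact use ordering becomes costly beyond a few ways.

```c
#include <stdio.h>

#define WAYS 4

/* Each way records the "time" of its last use; the victim is the way with
 * the smallest timestamp, i.e., the least recently used block. */
int pick_victim(const unsigned last_used[WAYS]) {
    int victim = 0;
    for (int w = 1; w < WAYS; w++)
        if (last_used[w] < last_used[victim]) victim = w;
    return victim;
}

int main(void) {
    unsigned last_used[WAYS] = { 40, 12, 33, 27 };   /* example access times */
    printf("replace way %d\n", pick_victim(last_used));  /* way 1 */
    return 0;
}
```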

False sharing:

When two unrelated shared variables are located in the same cache block and the full block is exchanged between processors even though the processors are accessing different variables.

Disk memory

Consists of a collection of platters that rotate rapidly on a spindle. The metal platters are covered with magnetic recording material on both sides. To read and write information on a hard disk, a movable arm containing a small electromagnetic coil called a read-write head is located just above each surface. The entire drive is permanently sealed to control the environment inside the drive, which, in turn, allows the disk heads to be much closer to the drive surface.

6.8 a common framework for memory hierarchy

The four questions (Where can a block be placed? How is a block found? Which block is replaced on a miss? What happens on a write?) give a decent summary of this common framework.

Given that a multicore multiprocessor means multiple processors on a single chip, these processors very likely share a common physical address space. Caching shared data introduces a new problem, because the view of memory held by two different processors is through their individual caches, which, without any additional precautions, could end up seeing two distinct values.

This is referred to as the cache coherence problem.

The principle of locality states that programs access a relatively small portion of their address space at any instant of time, just as you accessed a very small portion of the library's collection.

There are two types of locality: temporal locality and spatial locality (defined above).

What we call the Hamming distance is just the minimum number of bits that are different between any two correct bit patterns.

A simple error detection scheme uses a parity code for detection.
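A small sketch of the Hamming-distance computation (the bit patterns are arbitrary examples): XOR the two patterns and count the 1 bits. A parity code has minimum distance 2, so it detects any single-bit error because flipping one bit always changes the parity.

```c
#include <stdint.h>
#include <stdio.h>

/* XOR leaves a 1 in every bit position where the two patterns differ;
 * counting those 1s gives the Hamming distance. */
int hamming_distance(uint32_t a, uint32_t b) {
    uint32_t diff = a ^ b;
    int count = 0;
    while (diff) {
        count += diff & 1u;
        diff >>= 1;
    }
    return count;
}

int main(void) {
    printf("%d\n", hamming_distance(0xB, 0x9));   /* patterns differ in 1 bit */
    return 0;
}
```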

The other key aspect of writes is what occurs on a write miss. We first fetch the words of the block from memory. After the block is fetched and placed into the cache, we can overwrite the word that caused the miss into the cache block. We also write the word to main memory using the full address.

Writing to main memory on every store is very slow, though, so we use a write buffer to hold the data while it waits to be written to memory.

