Ecen 350 post-Exam1/pre-Exam2 quizzes


In the previous Question, the number of instructions that will produce a speedup of exactly 1.9 is [N] ____ (quiz 14, fill in ONE blank)

76 EXPLANATION:

Assume the word size is 32 bits. For a 16 KiB cache, Number of index bits = ___ Number of tag bits = ___ Total number of bits per word = ___ Total memory needed by cache = ___ Hint: Recall that the cache needs to store the tag and the validity bit for each word. Enter the total memory needed by the cache in KiB (do not write "KiB"). (quiz 20, fill in FOUR blanks)

12 18 51 25.5 EXPLANATION:
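
The four numbers above can be reproduced with a short calculation. This is a minimal sketch, assuming 32-bit byte addresses, a direct-mapped cache, and one-word blocks (the question does not state these explicitly); the function name is made up for illustration.

```python
def direct_mapped_cache_bits(cache_bytes, word_bits=32, addr_bits=32):
    """Return (index bits, tag bits, bits per word, total KiB).

    Assumes byte addressing, one-word blocks, and power-of-two sizes.
    """
    word_bytes = word_bits // 8
    num_words = cache_bytes // word_bytes
    index_bits = num_words.bit_length() - 1    # log2 of the number of entries
    offset_bits = word_bytes.bit_length() - 1  # byte offset within a word
    tag_bits = addr_bits - index_bits - offset_bits
    bits_per_word = word_bits + tag_bits + 1   # data + tag + valid bit
    total_kib = num_words * bits_per_word / 8 / 1024
    return index_bits, tag_bits, bits_per_word, total_kib


result = direct_mapped_cache_bits(16 * 1024)
print(result)  # (12, 18, 51, 25.5)
```

The 9.5 KiB of overhead beyond the 16 KiB of data comes entirely from the 19 extra bits (tag plus valid) stored with each 32-bit word.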

The STUR instruction uses every pipeline stage during its execution. True False (quiz 13, true or false)

False EXPLANATION: The STUR instruction does not use the WB stage.

Based on your results in the previous Question, which factor is the most important for performance, in this case: a) Miss rate b) Miss penalty (quiz 22, single selection)

b) Miss penalty EXPLANATION:

Which implementation requires the least hardware (exclude muxes, for definiteness). a) Single-cycle b) Multicycle nonpipelined c) Pipelined (quiz 18, single selection)

b) Multicycle nonpipelined EXPLANATION:

Pipelining improves throughput at the potential cost of individual instruction latency. True False (quiz 13, true or false)

True EXPLANATION:

Small caches work to cover large memories because of the principle of ________. (quiz 19, fill in ONE blank)

locality EXPLANATION:

The _____ effect is created by collision misses and is the biggest drawback of direct-mapped caches. It can be resolved by increasing either the _____ or the _____ of the cache. (quiz 21, fill in THREE blanks)

ping-pong associativity size EXPLANATION:

Forcing the compiler to deal with pipeline hazards can cause binary compatibility problems down the road, when new microarchitectures have different ________ lengths (quiz 16, fill in ONE blank)

pipeline

Caches work because programs tend to access the same data over and over within a given time period (temporal locality) and they tend to access nearby data with high probability (spatial locality). (quiz 19, true or false)

true EXPLANATION:

For all R-Type instructions, Read Data 2 is connected to the ALU. (quiz 11, true or false)

true EXPLANATION:

Having more states in a multicycle implementation allows for a faster clock rate. (quiz 18, true or false)

true EXPLANATION:

Stalling the pipeline is the same as inserting a NOP. (quiz 15, true or false)

true EXPLANATION:

The memory system hierarchy attempts to approximate the size of its largest component with the speed of its smallest. (quiz 19, true or false)

true EXPLANATION:

The primary reason that memory system hierarchies have multiple levels in modern machines is the difference in speed between large main memories and fast processors. (quiz 19, true or false)

true EXPLANATION:

A control hazard created by a branch can be seen as a data hazard on the PC register. (quiz 15, true or false)

true EXPLANATION: It amounts to reading the PC before it's written at a later stage by a previous branch instruction.

The compiler can be used to address data hazards. (quiz 16, true or false)

true EXPLANATION: True; however, it may not be as good as the hardware techniques we have discussed.

In a multi-cycle implementation, the time to complete one instruction may become worse than in a single-cycle implementation. True False (quiz 13, true or false)

true EXPLANATION: This is true in the unbalanced case.

For the next couple of questions, use the following code:

               ADD  X1, XZR, 4
    OuterLoop: ADD  X2, XZR, 3
    InnerLoop: ADD  X2, X2, -1
    BR1:       CBNZ X2, InnerLoop
               ADD  X1, X1, -1
    BR2:       CBNZ X1, OuterLoop

Calculate the prediction accuracy of a one-bit branch predictor for the CBNZ at BR1. Assume the predictor is initialized as taken (1). The answer should be formatted as a decimal, so 20% accuracy should be represented as .2. (quiz 17, fill in the blank)

0.4 EXPLANATION:

For the code in Question 2, calculate the prediction accuracy of a two-bit branch predictor for the CBNZ at BR1. Assume the predictor is initialized as weakly taken (10). The answer should be formatted as a decimal, so 20% accuracy should be represented as .2. (quiz 17, fill in the blank)

0.66 EXPLANATION:
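
Both predictor answers can be checked by replaying the branch outcomes at BR1. The sketch below assumes the usual two-bit saturating counter (00 strongly not taken, 01 weakly not taken, 10 weakly taken, 11 strongly taken); the function names are illustrative only.

```python
def br1_outcomes():
    """Taken/not-taken outcomes of the CBNZ at BR1 for the nested loop."""
    outcomes = []
    for _ in range(4):           # outer loop: X1 counts 4 down to 0
        x2 = 3
        while True:
            x2 -= 1              # ADD X2, X2, -1
            taken = x2 != 0      # CBNZ X2, InnerLoop
            outcomes.append(taken)
            if not taken:
                break
    return outcomes


def one_bit_accuracy(outcomes, state=True):
    """One-bit predictor: predict whatever the branch did last time."""
    correct = 0
    for taken in outcomes:
        correct += state == taken
        state = taken
    return correct / len(outcomes)


def two_bit_accuracy(outcomes, counter=2):
    """Two-bit saturating counter; values 2 and 3 predict taken."""
    correct = 0
    for taken in outcomes:
        correct += (counter >= 2) == taken
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
    return correct / len(outcomes)


outcomes = br1_outcomes()
one_bit = one_bit_accuracy(outcomes)   # 5 of 12 correct
two_bit = two_bit_accuracy(outcomes)   # 8 of 12 correct
print(round(one_bit, 4), round(two_bit, 4))
```

The branch pattern is taken, taken, not taken, repeated four times. The one-bit predictor mispredicts twice per inner-loop exit after the first pass (5/12, about 0.42), while the two-bit counter mispredicts only the exit itself (8/12, about 0.67). The answer key records these as 0.4 and 0.66.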

Assume that the clock cycle period of the single-cycle implementation in the lecture is 1000 ps, but that the stages are unbalanced, with IF = 200 ps, ID = 100 ps, ALU = 150 ps, MEM = 500 ps, WB = 50 ps. The clock rate of the single-cycle implementation is ____. The maximum clock rate of a multi-cycle implementation is ____. The maximum clock rate of a pipelined implementation is ____. Enter clock rates in GHz (do not write "GHz"). (quiz 14, fill in THREE blanks)

1 2 2 EXPLANATION:

The multicycle implementation discussed in class has 10 states, which are described below in RTL code. The states are not in order. Identify each state (as numbered in the current version of slide deck).
State ___: A <= Reg[IR[25:21]]; B <= Reg[IR[20:16]]; ALUOut <= PC + (sign-extend(IR[15:0]) << 2)
State ___: Memory[ALUOut] <= B
State ___: Reg[IR[20:16]] <= MDR
State ___: PC <= {PC[31:28], IR[25:0], 2'b00}
State ___: MDR <= Memory[ALUOut]
State ___: IR <= Memory[PC]; PC <= PC + 4
Stat

1 5 4 9 3 0 6 7 2 8 EXPLANATION:

Under the same conditions as in Question 3, what would be the speedup of the pipelined implementation over the single-cycle implementation if the code contained: 6 instructions: ____ 12 instructions: ____ 36 instructions: ____ A very large number of instructions: ____ Note: the formula derived in class cannot be used here. (You need to obtain a different formula.) (quiz 14, fill in FOUR blanks)

1.2 1.5 1.8 2 EXPLANATION:
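
The formula the note hints at can be sketched as follows, assuming the conditions are the unbalanced stages given elsewhere in this set (1000 ps single-cycle period, 500 ps slowest stage, 5 stages). With n instructions, the single-cycle time is 1000n ps and the pipelined time is 500(n + 4) ps, so the speedup is 2n / (n + 4).

```python
from fractions import Fraction

SINGLE_CYCLE_PS = 1000    # assumed single-cycle clock period
PIPELINE_CLOCK_PS = 500   # assumed: slowest stage (MEM) sets the pipelined clock
STAGES = 5


def pipelined_speedup(n):
    """Exact speedup of the pipeline over single-cycle for n instructions."""
    single = n * SINGLE_CYCLE_PS
    pipelined = (n + STAGES - 1) * PIPELINE_CLOCK_PS  # fill time + one per instruction
    return Fraction(single, pipelined)


for n in (6, 12, 36):
    print(n, float(pipelined_speedup(n)))  # 1.2, 1.5, 1.8

# Smallest instruction count whose speedup is exactly 1.9
n_for_1_9 = next(n for n in range(1, 1000)
                 if pipelined_speedup(n) == Fraction(19, 10))
print(n_for_1_9)  # 76
```

As n grows, 2n / (n + 4) approaches 2, which is the last blank. The same function also gives the count of 76 instructions for a speedup of exactly 1.9.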

Based on your answer in the previous question, what is the maximum clock rate that can be used in this implementation? Enter your answer in GHz (do not write "GHz"). Round it off to two decimals. (quiz 12, fill in ONE blank)

1.49 EXPLANATION: From the previous question, the slowest instruction is STUR at 670 ps. Its latency sets the minimum clock period and therefore the maximum clock rate. Converting the period to a frequency: f = 1 / (670 * 10^-12 s) = 1.4925 * 10^9 Hz = 1.4925 GHz, which rounds to 1.49 GHz.

A given program has the following instruction mix:
Instruction Type | Frequency
R-type | 35%
LDUR | 34%
STUR | 14%
CBZ | 15%
B | 2%
The fraction of all instructions that use the instruction memory is: _____ The fraction of all instructions that use the register file is: _____ The fraction of all instructions that use the sign-extend unit is: _____ The fraction of all instructions that use the ALU is: _____ The fraction of all instructions that use the data memory is: _____ Enter your answers in percent. (quiz 11, fill in FIVE blanks)

100% 98% (everything but B) 65% (everything but R-type) 98% (everything but B) 48% (LDUR+STUR only) EXPLANATION:

Multilevel caching is an important technique to overcome the limited amount of space that a first-level cache can provide while still maintaining its speed. Consider the following system parameters:
L1 cache hit time (clock cycles) | 1
L1 cache miss rate | 7%
L2 cache (direct mapped) hit time (clock cycles) | 12
L2 cache (direct mapped) miss rate | 3.5%
L2 cache (8-way set associative) hit time (clock cycles) | 28
L2 cache (8-way set associative) miss rate | 1.5%
Main memory access time (clock cycles) | 200
Find the performance of the following three cache hierarchies. (Notice that fractional clock cycles need to be rounded up to the next integer.) AMAT using only the L1 cache = AMAT using a hierarchy with the L2 direct mapped cache = AMAT using a hierarchy with the L2 8-way set associative cache = Which cache hierarchy is the best? (No need to enter a reply.) Repeat, supposing that a cheaper DRAM is used for the main memory, which has doubl

15 3 4 29 3 4 EXPLANATION:
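
The six answers follow from AMAT = hit time + miss rate x miss penalty, applied once per cache level. The sketch below assumes the truncated second half of the question means a main memory with double the access time (400 cycles), which is consistent with the recorded answers. Exact fractions are used so that rounding up is not disturbed by floating-point error.

```python
from fractions import Fraction
from math import ceil

L1_HIT = 1
L1_MISS_RATE = Fraction(7, 100)


def amat_l1_only(mem_cycles):
    """AMAT in cycles with only an L1 cache in front of main memory."""
    return ceil(L1_HIT + L1_MISS_RATE * mem_cycles)


def amat_with_l2(l2_hit, l2_miss_rate, mem_cycles):
    """AMAT in cycles with an L2 cache between L1 and main memory."""
    l1_miss_penalty = l2_hit + l2_miss_rate * mem_cycles
    return ceil(L1_HIT + L1_MISS_RATE * l1_miss_penalty)


answers = []
for mem in (200, 400):  # 400 is an assumed doubling of the access time
    answers += [
        amat_l1_only(mem),
        amat_with_l2(12, Fraction(35, 1000), mem),  # direct-mapped L2
        amat_with_l2(28, Fraction(15, 1000), mem),  # 8-way set-associative L2
    ]
print(answers)  # [15, 3, 4, 29, 3, 4]
```

In both cases the direct-mapped L2 wins: its lower hit time matters more than the 8-way cache's lower miss rate, because only 7% of accesses ever reach L2.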

In the following code

    ADD  X19, X0, X1
    LDUR X20, [X19, #10]
    STUR X0, [X21, #0]
    ADD  X21, X0, X1
    ORR  X22, X0, X1
    STUR X22, [X21, #20]
    SUB  X22, X1, X2
    ADD  X23, X22, 1
    STUR X22, [X22, #10]

there are ____ EX hazards and ____ MEM data hazards. (quiz 15, fill in TWO blanks)

2 2 EXPLANATION:

For the next few questions, examine the following code to be executed on the ARM 5-stage pipeline:

    ADD  X1, X2, X3
    SUB  X2, X1, X5
    LDUR X8, [X5, 0]
    ADD  X7, X8, X6

Assuming hardware hazard detection/stall but no forwarding hardware (need to insert NOPs to fix all hazards), what is the execution time for this code, if the clock rate is 5 GHz? ____ If, on the other hand, there is forwarding hardware, what is the execution time? ____ Enter the answer in nanoseconds, but do not write "ns" (quiz 16,

2.4 1.8 EXPLANATION:
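
Both times come from counting cycles as instructions + stall cycles + 4 (pipeline fill) at 0.2 ns per cycle. This sketch assumes the register file is written in the first half of a cycle and read in the second half, so each dependent pair needs two NOPs without forwarding; the function name is illustrative.

```python
CLOCK_GHZ = 5
STAGES = 5


def execution_time_ns(instructions, stall_cycles):
    """Total time for a straight-line sequence on a 5-stage pipeline."""
    cycles = instructions + stall_cycles + (STAGES - 1)
    return cycles / CLOCK_GHZ  # cycles divided by cycles-per-ns


# No forwarding: 2 NOPs after ADD (SUB needs X1), 2 NOPs after LDUR (ADD needs X8)
no_forwarding = execution_time_ns(4, 4)
# Forwarding: only the one-cycle load-use stall between LDUR and ADD remains
with_forwarding = execution_time_ns(4, 1)
print(no_forwarding, with_forwarding)
```

That is 12 cycles versus 9 cycles, matching the 2.4 and 1.8 recorded above.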

Consider a CPU that does not handle data hazards (i.e., the programmer is responsible for addressing data hazards by inserting NOP instructions where necessary). Consider the following code.

    ADD X1, X2, X19
    ADD X3, X1, X2
    ADD X4, X1, X20
    ADD X5, X1, X1

Assume that before execution, X1 = 11, X2 = 22, X19 = 5, X20 = 25. Suppose that the programmer was negligent and did not insert the required NOPs. The final value of register X3 would be ____. The final value of register X4 would be ____. The fin

33 36 54 EXPLANATION:

Assume the same unbalanced stages of the previous problem: IF = 200 ps, ID = 100 ps, ALU = 150 ps, MEM = 500 ps, WB = 50 ps. Consider the code

    LDUR X9, [X0, 0x4]
    STUR X10, [X1, 0x8]
    ADD  X11, X0, X1
    CBZ  X0, ADDR

The execution time for the single-cycle implementation is ____. The execution time for the multi-cycle implementation is ____. The execution time for the pipelined implementation is ____. The speedup of the multi-cycle implementation over the single-cycle implementation is ____. The speedu

4000 8000 4000 0.5 1 EXPLANATION:
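
These five numbers can be reproduced from the stage latencies. The sketch assumes the per-instruction cycle counts of a typical multicycle design (LDUR 5, STUR 4, R-type 4, CBZ 3) and a 1000 ps single-cycle period; those counts are not stated in the question.

```python
STAGE_PS = {"IF": 200, "ID": 100, "ALU": 150, "MEM": 500, "WB": 50}
SINGLE_CYCLE_PS = 1000                                         # assumed from the earlier question
MULTICYCLE_CYCLES = {"LDUR": 5, "STUR": 4, "R": 4, "CBZ": 3}   # assumed cycle counts

program = ["LDUR", "STUR", "R", "CBZ"]
clock_ps = max(STAGE_PS.values())  # slowest stage sets the multi-cycle/pipelined clock

single_ps = len(program) * SINGLE_CYCLE_PS
multi_ps = sum(MULTICYCLE_CYCLES[op] for op in program) * clock_ps
pipelined_ps = (len(program) + len(STAGE_PS) - 1) * clock_ps

print(single_ps, multi_ps, pipelined_ps)                   # 4000 8000 4000
print(single_ps / multi_ps, single_ps / pipelined_ps)      # 0.5 1.0
```

With only four instructions, the pipeline spends half its time filling, so it merely ties the single-cycle design; the multi-cycle design is slower because every cycle is stretched to the 500 ps MEM stage.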

For the previous question, enter below the maximum hit rate that can be obtained by using prefetching (prepopulating the cache). Enter the answer in percent (do not write "%"). (quiz 20, fill in ONE blank)

62.5 EXPLANATION:

Assume the word size is 64 bits but the address is 32 bits. For a four-way associative 16 KiB cache with 1-word blocks, Number of index bits = ____ Number of tag bits = ____ Total number of bits per set (including data, tags, and validity bits) = ____ Total memory needed by cache = ____ Enter the total memory needed by the cache in KiB (do not write "KiB"). (quiz 21, fill in FOUR blanks)

9 20 340 21.25 EXPLANATION:
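
The set-associative sizing can be checked the same way as a direct-mapped one, with the index selecting a set instead of a single block. A minimal sketch, assuming byte addressing and power-of-two sizes; the function name is made up for illustration.

```python
def set_associative_cache_bits(cache_bytes, ways, word_bits, addr_bits):
    """Return (index bits, tag bits, bits per set, total KiB) for 1-word blocks."""
    word_bytes = word_bits // 8
    num_blocks = cache_bytes // word_bytes
    num_sets = num_blocks // ways
    index_bits = num_sets.bit_length() - 1     # log2 of the number of sets
    offset_bits = word_bytes.bit_length() - 1  # byte offset within a word
    tag_bits = addr_bits - index_bits - offset_bits
    bits_per_set = ways * (word_bits + tag_bits + 1)  # data + tag + valid, per way
    total_kib = num_sets * bits_per_set / 8 / 1024
    return index_bits, tag_bits, bits_per_set, total_kib


sizing = set_associative_cache_bits(16 * 1024, ways=4, word_bits=64, addr_bits=32)
print(sizing)  # (9, 20, 340, 21.25)
```

The 64-bit word means 3 offset bits, and 2048 blocks split four ways give 512 sets, which leaves 32 - 9 - 3 = 20 tag bits.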

For the code in Question 6, assuming forwarding is available and that the compiler can reorder code (but only in ways that do not change the ultimate output), which of the following arrangements would achieve the best performance? (quiz 16)

A)
    ADD  X1, X2, X3
    LDUR X8, [X5,0]
    SUB  X2, X1, X5
    ADD  X7, X8, X6
B)
    ADD  X1, X2, X3
    LDUR X8, [X5,0]
    ADD  X7, X8, X6
    SUB  X2, X1, X5
C)
    ADD  X1, X2, X3
    ADD  X7, X8, X6
    SUB  X2, X1, X5
    LDUR X8, [X5,0]
D)
    SUB  X2, X1, X5
    LDUR X8, [X5,0]
    ADD  X1, X2, X3
    ADD  X7, X8, X6

A) ADD X1, X2, X3 LDUR X8, [X5,0] SUB X2, X1, X5 ADD X7, X8, X6 EXPLANATION:

When silicon chips are fabricated, defects in materials (e.g., silicon) and manufacturing errors can result in defective circuits. A very common defect is for one signal wire to get "broken" and always register a logical 0 or 1. This fault is called "stuck-at-0" or "stuck-at-1", respectively. Consider the ADD, LDUR, and CBZ instructions. The instruction that fails to operate correctly if the MemtoReg wire is stuck-at-1 is _____ The instruction that fails to operate correctly if the ALUSrc wire is stuck-at-0 is ____ The instruction that fails to operate correctly if the Reg2Loc wire is stuck-at-0 is ____ (quiz 13, fill in THREE blanks)

ADD LDUR CBZ EXPLANATION:

Consider the datapath with hazard-handling hardware. In this figure, if the InsertNOP control signal is 1, the mux selects the zero input, whereas if it is 0, the mux selects the normal control inputs. Consider the following code ADD X5, X2, X1 LDUR X3, [X5, #4] ORR X4, X5, X6 STUR X4, [X5, #0] Specify the control signals below. (Enter "x" for a don't care.) ADD in EX stage: PCWrite = ____. IFIDWrite = ____. InsertNOP = ____. ForwardA = ____. ForwardB = ____.

ADD in EX stage: PCWrite = 1, IFIDWrite = 1, InsertNOP = 0, ForwardA = 00, ForwardB = 00
LDUR in EX stage: PCWrite = 1, IFIDWrite = 1, InsertNOP = 0, ForwardA = 10, ForwardB = xx
ORR in EX stage: PCWrite = 1, IFIDWrite = 1, InsertNOP = 0, ForwardA = 01, ForwardB = 00
STUR in EX stage: PCWrite = 1, IFIDWrite = 1, InsertNOP = 0, ForwardA = 00, ForwardB = 10
EXPLANATION:

With the previous question as warmup, consider now a different ISA, in which the sign-extender must consider two types of branch instruction formats: Conditional branch: opcode | offset | opcode 1 bits | 5 bits | 2 bits Unconditional branch: opcode | offset 1 bits | 7 bits Let the input bits be X7, X6, ..., X0 and the output bits be Y7, Y6, ..., Y0, as before. The bit in the first opcode (X7) encodes the instruction format. If X7 = 0, it's a conditional branch, while if X7 = 1, it's an unconditional one. Complete the logic for each output Yi in terms of inputs Xj below. Enter answers in the format "X0 AND X1" or "X0 AND NOT(X1)". Y0 = _____ OR _____ Y1 = _____ OR _____ Y2 = _____ OR _____ Y3 = _____ OR _____ Y4 = _____ OR _____ Y5 = _____ OR _____ Y6 = _____ OR _____ Y7 = _____ OR _____ (quiz 11, fill in SIXTEEN blanks)

Y0: Answer 1: X0 AND X7 | Answer 2: X2 AND NOT(X7)
Y1: Answer 3: X1 AND X7 | Answer 4: X3 AND NOT(X7)
Y2: Answer 5: X2 AND X7 | Answer 6: X4 AND NOT(X7)
Y3: Answer 7: X3 AND X7 | Answer 8: X5 AND NOT(X7)
Y4: Answer 9: X4 AND X7 | Answer 10: X5 AND NOT(X7)
Y5: Answer 11: X5 AND X7 | Answer 12: X5 AND NOT(X7)
Y6: Answer 13: X6 AND X7 | Answer 14: X5 AND NOT(X7)
Y7: Answer 15: X6 AND X7 | Answer 16: X5 AND NOT(X7)
EXPLANATION:

Which of the following is not a way to resolve a structural hazard? A) Stalling the pipeline. B) Adding a mux. C) Duplicating hardware. D) Inserting a NOP. (quiz 15, select one)

B) Adding a mux. EXPLANATION: Adding a mux does not fix the problem, as it doesn't allow both instructions to access the hardware component at the same time.

Cache block size (number of words per block) can affect both miss rate and miss latency. Assume two caches A and B that have miss rates:
Block size | Miss rate
8 | 4%
16 | 3%
32 | 2%
64 | 1.5%
128 | 1%
However, caches A and B have different miss penalties.
Cache | Miss penalty (cycles)
A | 20 x block size
B | 20 + block size
Fill in the following table with the AMATs in clock cycles. Notice that fractional clock cycles need to be rounded up to the next integer. Also notice that the hit time in each case is 1.0 clock cycle.
Block size | AMAT Cache A | AMAT Cache B
8 | __ | __
16 | __ | __
32 | __ | __
64 | __ | __
128 | __ | __
Based on these results, what are the optimal block sizes? (In case of a tie, pick the smallest block size.) Optimal block size for Cache A = ____ Optimal block size for Cache B = ____ (quiz 22, fill in the table and 2 additional blanks, 12 total blanks)

Block size | AMAT Cache A | AMAT Cache B
8 | 8 | 3
16 | 11 | 3
32 | 14 | 3
64 | 21 | 3
128 | 27 | 3
Optimal block size for Cache A = 8
Optimal block size for Cache B = 8
EXPLANATION:
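
The table can be regenerated with AMAT = hit time + miss rate x miss penalty, rounded up. A minimal sketch using exact fractions so the rounding is not affected by floating-point error; the variable names are illustrative.

```python
from fractions import Fraction
from math import ceil

HIT_TIME = 1
MISS_RATE = {
    8: Fraction(4, 100),
    16: Fraction(3, 100),
    32: Fraction(2, 100),
    64: Fraction(15, 1000),
    128: Fraction(1, 100),
}


def amat(block_size, miss_penalty):
    """AMAT in whole clock cycles, rounded up."""
    return ceil(HIT_TIME + MISS_RATE[block_size] * miss_penalty)


# Cache A: penalty = 20 x block size; Cache B: penalty = 20 + block size
table = {b: (amat(b, 20 * b), amat(b, 20 + b)) for b in MISS_RATE}
for block_size, (a, b) in table.items():
    print(block_size, a, b)

# Ties are broken toward the smallest block size
best_a = min(table, key=lambda b: (table[b][0], b))
best_b = min(table, key=lambda b: (table[b][1], b))
print(best_a, best_b)  # 8 8
```

For Cache A the penalty grows faster than the miss rate falls, so AMAT rises with block size. For Cache B every unrounded AMAT lies between 2 and 3 cycles, so all five round up to 3 and the tie-break selects 8.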

A TLB is like a small _____ for page table entries with good locality. (quiz 22, fill in ONE blank)

cache

________ memory has high capacity because each cell consists of a single transistor and a capacitor. As a result it must be regularly refreshed to maintain its state. (quiz 19, fill in ONE blank)

DRAM EXPLANATION:

Assuming Fetch has placed an instruction on the bus called I[31:0] (in Verilog notation), which wires should be connected to "Read Addr 2" on the register file for the proper execution of R-Type instructions. (quiz 11, fill in the blank)

I[20:16] EXPLANATION:

Under the same conditions as in Question 3, if you could split one of the stages in two, the one you would pick to maximize performance is: ____. There are now six stages. The new execution time for the single-cycle implementation is ____. The new execution time for the multi-cycle implementation is ____. The new execution time for the pipelined implementation is ____. The new speedup of the multi-cycle implementation over the single-cycle implementation is ____. The new speedup of the pipelined

MEM 4000 4500 2250 0.89 1.78 EXPLANATION:

During a STUR instruction execution, fill in the blanks below for the values of each control signal: Reg2Loc = ____ Branch = ____ MemRead = ____ MemtoReg = ____ ALUop = ____ MemWrite = ____ ALUSrc = ____ RegWrite = ____ Enter "x" for a don't care. Do not forget that ALUop is a 2-bit bus. (quiz 12, fill in EIGHT blanks)

Reg2Loc = 1 Branch = 0 MemRead = 0 MemtoReg = X ALUop = 00 MemWrite = 1 ALUSrc = 1 RegWrite = 0 EXPLANATION:

Enter the control signals for an unconditional branch instruction. Reg2Loc = ____ Uncondbranch = ____ Branch = ____ MemRead = ____ MemtoReg = ____ ALUop = ____ MemWrite = ____ ALUSrc = ____ RegWrite = ____ (quiz 14, fill in NINE blanks)

Reg2Loc = X Uncondbranch = 1 Branch = X MemRead = 0 MemtoReg = X ALUop = XX MemWrite = 0 ALUSrc = X RegWrite = 0 EXPLANATION:

In a single clock-cycle implementation, the latency of an instruction is the time that it takes for all signals to become valid on the critical path, i.e., the longest path needed to be traversed through the datapath. The latency of a datapath component is the time necessary for its output to become valid after its inputs have become valid. The latency of an instruction is the sum of the latencies of the components on its critical path. Assume the following component latencies:
PC | 10 ps
Instruction Memory | 80 ps
Mux | 15 ps
Adder | 25 ps
Control Unit | 40 ps
Register File Read | 50 ps
Sign-Extend Unit | 30 ps
Shift-Left | 2 ps
AND Gate | 0 ps
ALU control | 20 ps
ALU | 60 ps
Memory Read | 300 ps
Memory Write | 400 ps
The latency of an R-format instruction is ____ The latency of an LDUR instruction is ____ The latency of an STUR instruction is ____ The latency of a CBZ instruction is ____ Enter your answers in ps (do not write "ps"). (quiz 12, fill in FOUR blanks)

The latency of an R-format instruction is 285 The latency of an LDUR instruction is 530 The latency of an STUR instruction is 670 The latency of a CBZ instruction is 285 EXPLANATION:

Given a 16KB, 4-way set associative cache with 16-byte (2-word) blocks and a LRU replacement policy, fill in the table below, where the memory accesses are sequential, from the top to the bottom of the table. Assume 32-bit addresses. Enter the tag and index in hex (do not write "0x"). Enter "H" for a hit and "M" for a miss. Enter the miss rate in percent (do not write "%").
Word Address | Index | Tag | Hit/Miss
0x12345004 | 0x__ | 0x__ | __
0x21345000 | 0x__ | 0x__ | __
0x12345008 | 0x__ | 0x__ | __
0x21345010 | 0x__ | 0x__ | __
0x12346004 | 0x__ | 0x__ | __
0x2222200E | 0x__ | 0x__ | __
0x111111003 | 0x__ | 0x__ | __
0x22222003 | 0x__ | 0x__ | __
The miss rate is ____. (quiz 22, fill in the table and one additional blank, 25 total blanks)

Word Address | Index | Tag | Hit/Miss
0x12345004 | 0x00 | 0x12345 | M
0x21345000 | 0x00 | 0x21345 | M
0x12345008 | 0x00 | 0x12345 | H
0x21345010 | 0x01 | 0x21345 | M
0x12346004 | 0x00 | 0x12346 | M
0x2222200E | 0x00 | 0x22222 | M
0x111111003 | 0x00 | 0x11111 | M
0x22222003 | 0x00 | 0x22222 | H
Miss rate = 75 (6 misses out of 8 accesses)
EXPLANATION:
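
The hit/miss column can be checked with a small LRU simulation. This sketch treats the addresses as byte addresses with 4 offset bits (16-byte blocks) and 8 index bits (16 KiB / 16 B / 4 ways = 256 sets). It also assumes the seventh address, which has nine hex digits in the table, is meant to be 0x11111003, matching its listed tag of 0x11111.

```python
from collections import OrderedDict

OFFSET_BITS = 4   # 16-byte blocks
INDEX_BITS = 8    # 256 sets
WAYS = 4


def simulate_lru(addresses):
    """Return (index, tag, 'H' or 'M') per access for a 4-way LRU cache."""
    sets = {}     # index -> OrderedDict of tags, least recently used first
    results = []
    for addr in addresses:
        index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
        tag = addr >> (OFFSET_BITS + INDEX_BITS)
        ways = sets.setdefault(index, OrderedDict())
        if tag in ways:
            ways.move_to_end(tag)          # mark as most recently used
            results.append((index, tag, "H"))
        else:
            if len(ways) == WAYS:
                ways.popitem(last=False)   # evict the least recently used tag
            ways[tag] = True
            results.append((index, tag, "M"))
    return results


trace = [0x12345004, 0x21345000, 0x12345008, 0x21345010,
         0x12346004, 0x2222200E, 0x11111003,  # assumed 8-digit form of the 7th address
         0x22222003]
results = simulate_lru(trace)
for index, tag, outcome in results:
    print(f"{index:02X} {tag:05X} {outcome}")
miss_rate = 100 * sum(r[2] == "M" for r in results) / len(results)
print(miss_rate)  # 75.0
```

The seventh access fills a full set 0 and evicts tag 0x21345, the least recently used entry, which is why 0x22222 is still present for the final hit.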

A cache has 16 entries of one-word blocks. Fill in the table below. Enter the tag and index in hex (do not write "0x"). Enter "H" for a hit and "M" for a miss. Enter the hit rate in percent (do not write "%").
Word Address | Index | Tag | Hit/Miss
0x1B4 | __ | __ | __
0x12F | __ | __ | __
0x337 | __ | __ | __
0xEE4 | __ | __ | __
0x12F | __ | __ | __
0x1B4 | __ | __ | __
0xEE4 | __ | __ | __
0x337 | __ | __ | __
The hit rate is ___. (quiz 20, fill in the table and ONE additional blank, 25 total blanks)

Word Address | Index | Tag | Hit/Miss
0x1B4 | 4 | 1B | M
0x12F | F | 12 | M
0x337 | 7 | 33 | M
0xEE4 | 4 | EE | M
0x12F | F | 12 | H
0x1B4 | 4 | 1B | M
0xEE4 | 4 | EE | M
0x337 | 7 | 33 | H
Hit rate = 25
EXPLANATION:
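
The table can be replayed in a few lines. The sketch assumes the cache is direct mapped (the 4-bit index in the answers implies this) and that the addresses are word addresses, so the low hex digit is the index and the remaining digits are the tag.

```python
def simulate_direct_mapped(addresses, num_entries=16):
    """Return (address, index, tag, 'H' or 'M') for each access."""
    cache = {}   # index -> tag currently stored there
    rows = []
    for addr in addresses:
        index = addr % num_entries
        tag = addr // num_entries
        hit = cache.get(index) == tag
        cache[index] = tag   # on a miss, the new block replaces the old one
        rows.append((addr, index, tag, "H" if hit else "M"))
    return rows


trace = [0x1B4, 0x12F, 0x337, 0xEE4, 0x12F, 0x1B4, 0xEE4, 0x337]
rows = simulate_direct_mapped(trace)
for addr, index, tag, outcome in rows:
    print(f"{addr:03X} {index:X} {tag:02X} {outcome}")
hit_rate = 100 * sum(r[3] == "H" for r in rows) / len(rows)
print(hit_rate)  # 25.0
```

Addresses 0x1B4 and 0xEE4 both map to index 4 and keep evicting each other, which is the ping-pong effect: every access to index 4 in this trace is a miss.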

Consider an ISA with a sign-extender unit that accepts an 8-bit branch instruction of the following format opcode | offset | opcode 2 bits | 4 bits | 2 bits and outputs an 8-bit sign-extended offset. Let the input bits be X7, X6, ..., X0 and the output bits be Y7, Y6, ..., Y0. Enter the value of each output Yi in terms of an input Xj. (Note: X0 and Y0 are the rightmost, least-significant bits.) Y0 = _____ Y1 = _____ Y2 = _____ Y3 = _____ Y4 = _____ Y5 = _____ Y6 = _____ Y7 = _____ (quiz 11, fill in EIGHT blanks)

X2 X3 X4 X5 X5 X5 X5 X5 EXPLANATION:
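
The wiring above is equivalent to extracting bits 5..2 and copying the sign bit X5 into the upper four output bits. A small sketch of that behavior; the function name is illustrative.

```python
def sign_extend_offset(x):
    """Map the 8-bit instruction x (bits X7..X0) to the 8-bit output Y7..Y0."""
    offset = (x >> 2) & 0xF   # Y3..Y0 = X5..X2
    if offset & 0x8:          # X5 is the sign bit of the 4-bit offset
        offset |= 0xF0        # Y7..Y4 = X5
    return offset


print(f"{sign_extend_offset(0b00100000):08b}")  # X5 = 1 -> 11111000
print(f"{sign_extend_offset(0b11011111):08b}")  # X5 = 0 -> 00000111
```

The opcode bits X7, X6, X1, and X0 never reach the output, which is why they do not appear in any of the eight answers.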

A microinstruction corresponds to: a) A state b) A machine cycle c) The instruction register d) A fast instruction (quiz 18, single selection)

a) A state EXPLANATION:

Which of the following is not a benefit of virtual memory? a) It improves memory system performance. b) It allows applications to be written as if they have the entire 2^64 bytes worth of memory available to them. c) It provides protection from one application interfering with others. d) It simplifies program loading by making relocation unnecessary in many cases. (quiz 22, single selection)

a) It improves memory system performance. EXPLANATION:

In the LDUR and STUR instructions, the ALU is used for the __________ calculation. (quiz 11, fill in the blank)

address EXPLANATION:

In the previous question, the hit rate cannot be made 100% due to: a) Compulsory misses b) Collision misses c) Capacity misses (quiz 20, single selection)

b) Collision misses EXPLANATION:

For a fixed-sized cache, as the block size increases, the miss ratio tends to a) always increase. b) always decrease. c) increase and then decrease. d) decrease and then increase. (quiz 21, single selection)

d) decrease and then increase. EXPLANATION:

For the pipeline in class where branches are resolved in the ID stage, statically predicting taken will produce better results than statically predicting not taken because taken branches are more common. (quiz 17, true or false)

false EXPLANATION:

SRAM memory must be refreshed to retain its state. (quiz 19, true or false)

false EXPLANATION:

The clock period of a multi-cycle (including pipelined) microarchitecture is set by the fastest stage. True False (quiz 13, true or false)

false EXPLANATION:

The register file does not produce any output on Read Data 1 and Read Data 2 for the B instruction. (quiz 11, true or false)

false EXPLANATION:

If there is only one memory for instructions and data, the programmer can avoid a structural hazard by inserting a NOP in the appropriate place in the code. (quiz 16, true or false)

false EXPLANATION: A structural hazard cannot be resolved this way, because even a NOP has to be read from the memory. This does not violate the principle that every hazard can be resolved by stalling, but in this case, the NOP needs to be inserted by the hardware.

A multicycle implementation includes an ALUOut register so that the next instruction is able to inspect the ALU result from the previous instruction. (quiz 18, true or false)

false EXPLANATION: ALUOut holds data for the current instruction and is invisible to the rest of the code.

Memory access is faster for bigger caches. (quiz 22, true or false)

false EXPLANATION: Bigger caches are slower to access.

DRAM and SRAM and CDROMs are random access memories. (quiz 19, true or false)

false EXPLANATION: CDROMs are block access.

The benefits of associativity grow with cache size. (quiz 22, true or false)

false EXPLANATION: Generally associativity has a greater benefit with smaller caches because there is greater aliasing of addresses to the same sets.

Increasing associativity of a cache decreases capacity misses. (quiz 22, true or false)

false EXPLANATION: It decreases conflict misses.

Memory access speed tends to be directly correlated to memory size. (quiz 19, true or false)

false EXPLANATION: It's inversely correlated with size; big memories are slower to access.

LRU is the best policy for replacement in associative caches. (quiz 22, true or false)

false EXPLANATION: LRU is generally better than random replacement, but not always.

LRU is the optimal policy for replacement in associative caches. (quiz 22, true or false)

false EXPLANATION: LRU is generally better than random replacement, but not always.

Instructions which do not use a given pipeline stage may skip past it in the pipeline. True False (quiz 13, true or false)

false EXPLANATION: The instruction cannot skip the stage because it would overlap with instructions ahead of it in the next stage.

Multi-level page tables reduce the time to access the page table versus a single unified page table. (quiz 22, true or false)

false EXPLANATION: They reduce the space, though they increase the access time because more memory accesses are required.

Physical memory is typically much larger than the program's virtual memory space. (quiz 22, true or false)

false EXPLANATION: Virtual memory can be as big as 2^(number of bits in the address), while physical memory is typically much smaller.

Hazards can always be resolved by stalling without loss of performance. (quiz 15, true or false)

false EXPLANATION: Stalling always leads to loss of performance.

A page _____ occurs when a given virtual page is not currently in physical memory, requiring OS intervention to pull it from disk. (quiz 22, fill in ONE blank)

fault

For the CBZ instruction, the ALU must be set to perform the "pass _________" operation. (quiz 11, fill in the blank)

input b EXPLANATION:

The next-state sequencer for multicycle control is often (especially in cases more complex than the one we discussed) implemented with a 2-step decoding process (recall the 2-step decoding process for ALU control). See the plot below. The main control unit generates a 2-bit control signal "AddrCtrl" to drive the mux, according to:
00 = next state is state 0 (new instruction)
01 = "dispatch" next state with ROM 1
02 = "dispatch" next state with ROM 2
03 = next state = state + 1
Each dispatch ROM (R

state | CtrlAddr
0000 | 03
0001 | 01
0010 | 02
0011 | 03
0100 | 00
0101 | 00
0110 | 03
0111 | 00
1000 | 00
1001 | 00

OpCode | ROM 1 Value | ROM 2 Value
LDUR | 0010 | 0011
STUR | 0010 | 0101
R-format | 0110 | xxxx
CBZ | 1000 | xxxx
B | 1001 | xxxx
EXPLANATION:

A page ______ is a structure used by the OS to maintain translations between virtual and physical pages. (quiz 22, fill in ONE blank)

table

Caches can work because programs tend to access the same data over and over within a given time period (temporal locality) and they tend to access nearby data with high probability (spatial locality). (quiz 22, true or false)

true EXPLANATION:

With virtual memory, main memory plays the role of a cache for the disk. (quiz 22, true or false)

true

A "NOP" instruction could be any instruction that does not affect the PC, register file, or the memory. (quiz 16, true or false)

true EXPLANATION:

A cache in which write misses are sent directly to the next level down in the hierarchy without a local store is a write-no-allocate cache. (quiz 22, true or false)

true EXPLANATION:

