Computer Architecture final study


A main program calls a Power procedure using the instruction: jal Power. That instruction is at address 1000. What happens to $ra?

$ra is set to 1004

In which register do you put the value of service code before executing a "syscall" ?

$v0

Two processors, P1 and P2, execute the same instructions. The clock rate and CPIs are as specified below: Processor P1 P2 Clock Rate 2 GHz 3 GHz CPI 1.0 2.5 For processor P2, we are trying to reduce the time by 40% but this leads to an increase of 20% in CPI. What clock rate is required to achieve this time reduction?

6GHz
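The algebra behind this answer can be checked with a small Python sketch. As in the question, it assumes the instruction count stays fixed, so CPU time = IC × CPI / clock rate. The helper name `required_clock_ghz` is made up for this illustration.

```python
def required_clock_ghz(clock_ghz, cpi, time_factor, cpi_factor):
    """Clock rate needed so the new time is time_factor * old time
    while CPI is multiplied by cpi_factor (instruction count unchanged)."""
    new_cpi = cpi * cpi_factor
    # IC * new_cpi / f_new = time_factor * IC * cpi / f_old  =>  solve for f_new
    return new_cpi * clock_ghz / (cpi * time_factor)

f = required_clock_ghz(3.0, 2.5, time_factor=0.6, cpi_factor=1.2)
print(f"{f:.1f} GHz")  # 6.0 GHz
```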

T/F The most common metric for measuring CPU performance is the time taken to execute a program (CPU time). An alternative metric is millions of instructions executed per second (MIPS), defined as MIPS = Instruction Count / (Execution Time × 10^6). MIPS is a reliable metric for comparing an Intel Pentium and a PowerPC processor, which have different instruction set architectures.

False

T/F The directives in assembly code, e.g. .text, .data, .word, are translated into corresponding machine code.

False

T/F The virtual address indicates the location of the page table in memory.

False

T/F There is a subtract immediate operation in MIPS ISA.

False

T/F You are given a single D flip-flop with data input d, clock input clk, and output q. d is 0, clk is 0, q is 0. clk changes to 1. Moments later, what is q? (True means 1, and False means 0)

False

T/F: A speedup of 20 on 50% of a program results in an overall speedup of at least 2 times.

False
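Amdahl's law makes the reason concrete: even an infinite speedup on half the program caps the overall speedup at 1 / 0.5 = 2, so a finite factor of 20 must land below 2. A minimal sketch; the function name `amdahl_speedup` is illustrative.

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of the run time is sped up by `factor`."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)

print(round(amdahl_speedup(0.5, 20), 3))        # 1.905
print(amdahl_speedup(0.5, float("inf")))        # 2.0, the upper bound
```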

For a given number of instructions, assume CPI is decreased by 20% and clock frequency is increased by 60%. How much faster will a program execute on the processor with the new specification?

The program runs 2 times faster: execution time scales by 0.8 / 1.6 = 0.5, so it is halved.

When deciding how to divide instruction execution into two pipeline stages, which of the following is the best strategy?

Divide the instruction execution into two stages such that each stage takes about the same amount of time.

A TLB should have _____ number of entries compared to the page table.

fewer

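The longest possible datapath in a single-cycle processor implementation always belongs to the

load instruction (lw), which uses the instruction memory, register file read, ALU, data memory, and register file write-back.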

Translate the C code snippet into MIPS code. Assume variable a is mapped to register $s5 and the base address of array x is in $s8. The array is of type integer. a = x[10]

lw  $t0, 40($s8)      # offset 40 = 10 elements × 4 bytes
add $s5, $zero, $t0

When a page fault occurs, the operating system must determine where to put the requested page in

main memory

When hardware is accessed by reading and writing to the specific memory locations, then it is called

memory mapped I/O

To implement the following loop in C, which two instructions, shown as instn1 and instn2, are missing from the MIPS translation? The base address of A is in $s2.
C code: for (i = 0; i < 100; i++) A[i] = i;
MIPS:
      add  $t0, $zero, $zero
      addi $t1, $zero, 100
LOOP: instn1
      instn2
      addi $t0, $t0, 1
      bne  $t0, $t1, LOOP

instn1: sw   $t0, 0($s2)
instn2: addi $s2, $s2, 4

The low order two bits of a jump address are always 00 because

the addresses are word aligned in MIPS.

The single-cycle datapath must have separate instruction and data memories, because

the processor operates in one cycle and cannot use a single-ported memory for two different accesses within that cycle

What is the minimum number of two-input NAND gates needed to implement a two-input OR gate?

three
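By De Morgan's law, A + B = NOT(NOT A · NOT B), and a NAND with both inputs tied together acts as an inverter, which accounts for the three gates. A small truth-table check in Python; the function names are illustrative.

```python
def nand(a, b):
    return 1 - (a & b)

def or_from_nands(a, b):
    not_a = nand(a, a)          # gate 1: inverter
    not_b = nand(b, b)          # gate 2: inverter
    return nand(not_a, not_b)   # gate 3: NAND of the inverted inputs = A OR B

for a in (0, 1):
    for b in (0, 1):
        print(a, b, or_from_nands(a, b))
```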

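A cache memory has an access time of 30 ns and main memory 150 ns. What is the average memory access time seen by the CPU (assume a hit ratio of 80%)?

60 ns, assuming a miss pays both the cache and the main memory access times: 0.8 × 30 + 0.2 × (30 + 150) = 60 ns.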
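The result depends on whether a miss is charged the cache probe as well as main memory. This sketch, with the hypothetical helper `amat_ns`, computes both conventions: sequential access (cache first, then memory) gives 60 ns, while parallel access, where a miss costs only the 150 ns memory time, gives 54 ns.

```python
def amat_ns(hit_ratio, cache_ns, memory_ns, sequential=True):
    """Average access time; `sequential` selects how a miss is charged."""
    miss_ratio = 1.0 - hit_ratio
    # Sequential: a miss pays the cache probe, then main memory.
    # Parallel: cache and memory start together, so a miss pays only memory.
    miss_ns = cache_ns + memory_ns if sequential else memory_ns
    return hit_ratio * cache_ns + miss_ratio * miss_ns

print(f"{amat_ns(0.8, 30, 150):.1f} ns")                    # 60.0 ns
print(f"{amat_ns(0.8, 30, 150, sequential=False):.1f} ns")  # 54.0 ns
```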

Computer A has an overall CPI of 1.3 and can be run at a clock rate of 600 MHz. Computer B has a CPI of 2.5 and can be run at a clock rate of 750 MHz. We have a particular program we wish to run. When compiled for Computer A, this program has exactly 100,000 instructions. How many instructions would the program need to have when compiled for Computer B in order for the two computers to have exactly the same execution time for this program?

65,000 instructions. CPU time = (IC × CPI) / clock rate, so (100,000 × 1.3) / 600 MHz = (IC × 2.5) / 750 MHz, giving IC = (100,000 × 1.3 × 750) / (600 × 2.5) = 65,000.
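Solving time_A = time_B for Computer B's instruction count can be checked exactly with rational arithmetic, which avoids floating-point rounding. The helper name `break_even_ic` is made up for this sketch; clock rates are in MHz, and the unit cancels.

```python
from fractions import Fraction

def break_even_ic(ic_a, cpi_a, clock_a, cpi_b, clock_b):
    """Instruction count on B that makes IC*CPI/clock equal on both machines."""
    time_a = ic_a * cpi_a / clock_a          # CPU time on A (in 1/MHz units)
    return time_a * clock_b / cpi_b          # solve IC_b * cpi_b / clock_b == time_a

ic_b = break_even_ic(100_000, Fraction("1.3"), 600, Fraction("2.5"), 750)
print(ic_b)  # 65000
```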

List all the integer values (shown in decimal) that can be represented using exactly 2 bits in twos complement format.

-2,-1,0,1
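An n-bit two's complement number covers -2^(n-1) through 2^(n-1) - 1. A one-line sketch (the function name is illustrative) reproduces the 2-bit list:

```python
def twos_complement_range(bits):
    """All integers representable in `bits`-bit two's complement."""
    return list(range(-(1 << (bits - 1)), 1 << (bits - 1)))

print(twos_complement_range(2))  # [-2, -1, 0, 1]
```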

Determine the cache index given the direct-mapped cache size and block address. Type the cache index as a binary value. Direct-mapped cache size: 8 one-word blocks Block address in binary format: 00011

011

What is the value of $t2 after the code below is executed?
ori  $t0, $0, 255
addi $t1, $t0, 1
sll  $t2, $t1, 2

0x0000 0400
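The three instructions can be traced with 32-bit masked integer arithmetic. This Python sketch only mimics the register values; it is not a MIPS simulator, and the name `trace` is hypothetical.

```python
MASK32 = 0xFFFF_FFFF  # MIPS registers are 32 bits wide

def trace():
    t0 = (0 | 255) & MASK32   # ori  $t0, $0, 255  -> 255
    t1 = (t0 + 1) & MASK32    # addi $t1, $t0, 1   -> 256
    t2 = (t1 << 2) & MASK32   # sll  $t2, $t1, 2   -> 1024 (256 * 4)
    return t2

print(f"0x{trace():08X}")  # 0x00000400
```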

Suppose we have a byte-addressable computer using fully associative mapping with 16-bit main memory addresses and 32 blocks of cache. If each block contains 16 bytes, determine the size of the tag field

12

You wrote a C application that takes 20 seconds using your desktop processor. An improved C compiler is released that requires only 0.5 as many instructions as the old compiler. The CPI, however, increases by a factor of 1.3. How fast will the application run when compiled with the new compiler?

13 seconds

You are given the following code snippet, which executes on a 5-stage pipelined processor. How many cycles does the code take to execute if no data bypassing is implemented?
addi $s1, $0, 10
lw   $t0, 4($s0)
srl  $t1, $t0, 1
add  $t2, $t1, $s1
sw   $t2, 4($s0)

14

You are given a computer system with a 32-bit virtual address space. The page size for memory management is 16 KB. The page table is designed to have 2 bytes per page table entry. What is the maximum size of the physical memory that can be addressed with this memory design?

1GB
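The 1 GB figure follows if all 16 bits of each 2-byte entry hold the physical frame number (no valid or protection bits), which this sketch assumes: 16 frame bits plus a 14-bit page offset give a 30-bit physical address. The function name is hypothetical.

```python
def max_physical_bytes(page_size_bytes, pte_bytes):
    # Assumption: every PTE bit is used for the physical frame number
    frame_bits = pte_bytes * 8
    offset_bits = page_size_bytes.bit_length() - 1  # log2 of a power-of-two page size
    return 2 ** (frame_bits + offset_bits)

size = max_physical_bytes(16 * 1024, 2)
print(size // 2**30, "GB")  # 1 GB
```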

Suppose we have a byte-addressable computer with a cache that holds 8 blocks of 4 bytes each. Assuming that each memory address has 8 bits, to which cache block would the hexadecimal address 0x09 map if the computer uses direct mapping?

2
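With 4-byte blocks the low 2 bits are the byte offset, and with 8 blocks the next 3 bits are the index: 0x09 = 0000 1001, so the index bits are 010, which is block 2. The same steps as a small Python sketch, with an illustrative function name:

```python
def direct_mapped_block(address, num_blocks, block_bytes):
    block_address = address // block_bytes   # drop the byte-offset bits
    return block_address % num_blocks        # index = block address mod number of blocks

print(direct_mapped_block(0x09, num_blocks=8, block_bytes=4))  # 2
```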

What is the value in register $s0 after executing the following two lines of MIPS code?
add $t0, $zero, 9
sll $s0, $t1, 2

2

The physical address of memory on a machine is 32 bits. The machine has a direct-mapped cache of size 512 KB with a block size of 8 bytes. What is the size of the tag field in bits?

13 (8-byte blocks give 3 offset bits; 512 KB / 8 B = 2^16 blocks give 16 index bits; 32 - 16 - 3 = 13)
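Splitting an address into tag, index, and offset for a direct-mapped cache takes only two base-2 logarithms. A sketch with hypothetical helper names; it assumes every size is a power of two.

```python
def log2_exact(n):
    assert n > 0 and n & (n - 1) == 0, "expected a power of two"
    return n.bit_length() - 1

def direct_mapped_fields(addr_bits, cache_bytes, block_bytes):
    """Return (tag, index, offset) widths in bits."""
    offset = log2_exact(block_bytes)
    index = log2_exact(cache_bytes // block_bytes)  # one index value per block
    tag = addr_bits - index - offset
    return tag, index, offset

print(direct_mapped_fields(32, 512 * 1024, 8))  # (13, 16, 3)
```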

Consider a machine with 64 MB of physical memory and a 32-bit virtual address space. If the page size is 4 KB, what is the number of entries in the page table?

2^20

A 3-input function's truth table has 5 1's. How many AND gates will exist in a circuit derived directly from the equation that is derived from that truth table? (You may assume that AND gates can have more than 2 inputs)

5

Assume an operation can be divided into 1, 10, 30, or 50 pipeline stages with no overhead and the pipeline can be kept full. What should be the number of pipeline stages that would provide the best throughput?

50

You wrote a C application that takes 50 seconds using your desktop processor. You want your program to run faster, so you downloaded a new version of the compiler from an open-source site. The new compiler requires only 0.7 as many instructions as the old compiler, but the CPI doubled. How long will the application take to run after compiling with the new compiler?

70 seconds

Our favorite program runs in 10 seconds on computer A, which has a 4 GHz clock. We are trying to help a computer designer build a new machine B that will run this program in 6 seconds. The designer can use new (or perhaps more expensive) technology to substantially increase the clock rate, but has informed us that this increase will affect the rest of the CPU design, causing machine B to require 1.2 times as many clock cycles as machine A for the same program. What clock rate should we tell the designer to target?

8 GHz

Suppose we have a byte-addressable computer using 2-way set associative mapping with 16-bit main memory addresses and 32 blocks of cache. If each block contains 8 bytes, determine the size of the tag field.

9
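For a set-associative cache the index selects a set rather than a block, so the set count is blocks / ways. A small sketch with an illustrative function name; it assumes power-of-two sizes.

```python
def set_assoc_tag_bits(addr_bits, num_blocks, ways, block_bytes):
    num_sets = num_blocks // ways
    set_bits = num_sets.bit_length() - 1        # log2(number of sets)
    offset_bits = block_bytes.bit_length() - 1  # log2(block size in bytes)
    return addr_bits - set_bits - offset_bits

print(set_assoc_tag_bits(16, 32, 2, 8))    # 9
print(set_assoc_tag_bits(16, 32, 32, 16))  # 12
```

The second call treats a fully associative cache as one set holding all 32 blocks; with 16-byte blocks it reproduces the 12-bit tag from the earlier fully associative question.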

Which of the following statements about performance are true?

A program consists of 50% multiplication operations, 30% add operations, and the rest memory operations. If the multiplication operation is made faster through better hardware design, the overall performance will improve.
If the operand for the next operation can be fetched from memory while a previous addition is in progress, performance will improve.
Parallelizing code improves performance.
Reduce the workload on a computer, and the power requirement will reduce proportionally.

You are given the following code snippet which executes on a 5-stage pipelined processor. How many cycles does the code take to execute, if no data bypassing is implemented ? addi $s1, $0, 10 lw $t0, 4($s0) srl $t1, $t0, 1 add $t2, $t1, $s1 sw $t2, 4($s0)

ALU sign-extension unit data memory shift-left unit

What types of information does the system bus carry?

Address, control and data information.

The performance of a program on a processor depends on which of the following factors.

Algorithm, Programming Language, Compiler, ISA

What type of pipeline hazard does the following code snippet face in a 5-stage pipeline?
add $1, $2, $3
lw  $4, 8($1)

Data Hazard

T/F Consider a rising clock edge that causes 3000 to be written into the PC. 3001 will be waiting at the PC's input to be written on the next rising clock edge.

False

T/F For a given number of instructions, assume CPI is increased by 20%, and clock cycle time is decreased by 10%. The program execution time decreases.

False

T/F Given the C statement "if (i == j) f = g + h", the instruction needed to complete the MIPS code correctly is "beq" (assume variables are mapped to registers as: f->$s0, g->$s1, h->$s2, i->$s3, j->$s4).
      _____ $s3, $s4, Exit
      add   $s0, $s1, $s2
Exit:

False: the add must be skipped when i != j, so the missing instruction is bne.

T/F Given the following logical expressions: G = (A · B · C') + (A · C · B') + (B · C · A') H = B · (A · C' + C · A') The expressions G and H are equivalent.

False
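Expanding H gives A·B·C' + A'·B·C, so it lacks G's A·B'·C term. An exhaustive truth-table comparison in Python confirms that the two differ only on that input; the function names simply mirror the expressions.

```python
from itertools import product

def G(a, b, c):
    return (a and b and not c) or (a and c and not b) or (b and c and not a)

def H(a, b, c):
    return b and ((a and not c) or (c and not a))

# Inputs (A, B, C) where the two expressions disagree
mismatches = [bits for bits in product((False, True), repeat=3) if G(*bits) != H(*bits)]
print(mismatches)  # [(True, False, True)]
```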

T/F If the latency to access the main memory is relatively small with respect to accessing the cache memory, then a designer should choose larger cache block for better performance.

False

T/F Pipelining decreases CPU instruction throughput but reduces the execution time of each individual instruction.

False

Given a direct-mapped data cache with this configuration: INDEX: 13 bits, OFFSET: 7 bits, find out how many words are in a block, and how many bytes of data the cache holds.

Each block holds 2^7 = 128 bytes = 32 words (4-byte words). The cache has 2^13 blocks, so it holds 2^13 × 2^7 = 2^20 bytes = 1 MB of data.
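The offset width fixes the block size and the index width fixes the number of blocks; this sketch assumes 4-byte MIPS words, and the helper name `cache_geometry` is made up.

```python
def cache_geometry(index_bits, offset_bits, word_bytes=4):
    block_bytes = 2 ** offset_bits                 # byte offset addresses one block
    words_per_block = block_bytes // word_bytes
    data_bytes = (2 ** index_bits) * block_bytes   # one block per index value
    return words_per_block, data_bytes

words, data = cache_geometry(13, 7)
print(f"{words} words/block, {data} bytes")  # 32 words/block, 1048576 bytes
```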

Which of the following is correct for a D-type flip-flop?

The Q output is either SET or RESET as soon as the D input goes HIGH or LOW

Compiler can affect which of the following metrics?

Instruction Count and Cycles per Instruction

Suppose a program runs in 100 seconds on a computer, with multiply operations responsible for 80 seconds of this time. How much do I have to improve the speed of multiplication if I want my program to run five times faster?

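It cannot be done: running five times faster means finishing in 20 seconds, but the non-multiply work alone already takes 20 seconds, so multiply operations would have to finish in 0 seconds.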
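Rearranging Amdahl's law gives the speedup the multiply unit would need, and shows where the target becomes unreachable. A sketch with a hypothetical function name; the 4× target is an extra illustration, not part of the question.

```python
def required_speedup(total_s, affected_s, target_speedup):
    """Speedup needed on the affected part, or None if the target is unreachable."""
    unaffected_s = total_s - affected_s
    budget_s = total_s / target_speedup - unaffected_s  # time left for the affected part
    if budget_s <= 0:
        return None  # even an infinitely fast unit cannot meet the target
    return affected_s / budget_s

print(required_speedup(100, 80, 5))  # None
print(required_speedup(100, 80, 4))  # 16.0
```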

Consider two different implementations of the same ISA. There are four classes of instructions: Arithmetic, Store, Load, and Branch. The clock rate and CPI of each implementation are given in the following table.

Processor  Clock rate  CPI-Arithmetic  CPI-Store  CPI-Load  CPI-Branch
P1         2.0 GHz     1               2          3         4
P2         2.5 GHz     2               2          2         2

Given a program with 10^6 instructions divided into classes as follows: 10% Arithmetic, 20% Store, 50% Load, and 20% Branch, which implementation is faster?

P2 is faster (P1: average CPI 2.8, 1.4 ms; P2: CPI 2.0, 0.8 ms)
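The comparison uses a weighted average CPI per implementation, followed by time = IC × CPI / clock rate. A sketch; the helper name and dictionary keys are illustrative.

```python
def exec_time_s(instructions, mix, cpis, clock_hz):
    avg_cpi = sum(mix[c] * cpis[c] for c in mix)   # CPI weighted by instruction mix
    return instructions * avg_cpi / clock_hz, avg_cpi

mix = {"arith": 0.10, "store": 0.20, "load": 0.50, "branch": 0.20}
p1_time, p1_cpi = exec_time_s(10**6, mix, {"arith": 1, "store": 2, "load": 3, "branch": 4}, 2.0e9)
p2_time, p2_cpi = exec_time_s(10**6, mix, {"arith": 2, "store": 2, "load": 2, "branch": 2}, 2.5e9)
print(f"P1: CPI {p1_cpi:.1f}, {p1_time * 1e3:.2f} ms")  # P1: CPI 2.8, 1.40 ms
print(f"P2: CPI {p2_cpi:.1f}, {p2_time * 1e3:.2f} ms")  # P2: CPI 2.0, 0.80 ms
```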

Consider the following processors (ns stands for nanoseconds). Assume that the registers between pipeline stages have zero latency. Which processor has the highest peak clock frequency?
P1: Four-stage pipeline with stage latencies 1 ns, 2 ns, 2 ns, 1 ns.
P2: Four-stage pipeline with stage latencies 1 ns, 1.5 ns, 1.5 ns, 1.5 ns.
P3: Five-stage pipeline with stage latencies 0.5 ns, 1 ns, 1 ns, 0.6 ns, 1 ns.
P4: Five-stage pipeline with stage latencies 0.5 ns, 0.5 ns, 1 ns, 1 ns, 1.1 ns.

P3 (its slowest stage takes 1 ns, allowing a 1 GHz clock)
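The clock period must accommodate the slowest stage, so peak frequency is 1 / max(stage latency). A short sketch over the four pipelines (latencies in ns, so the result is in GHz):

```python
stages_ns = {
    "P1": [1, 2, 2, 1],
    "P2": [1, 1.5, 1.5, 1.5],
    "P3": [0.5, 1, 1, 0.6, 1],
    "P4": [0.5, 0.5, 1, 1, 1.1],
}
# Register latency is assumed to be zero, as in the question
peak_ghz = {name: 1 / max(lat) for name, lat in stages_ns.items()}
best = max(peak_ghz, key=peak_ghz.get)
print(best, peak_ghz[best])  # P3 1.0
```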

The method of accessing the I/O devices by repeatedly checking the status flags in a register is

Programmed I/O

Assume a cache with 8 one-word blocks. Determine the cache position given the cache configuration and memory block. Cache configuration: Two-way set-associative Memory block: 15

Set #3
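With 8 blocks and 2 ways there are 4 sets, and memory block 15 maps to set 15 mod 4 = 3. The same calculation in Python, with an illustrative function name:

```python
def set_index(block_number, num_blocks, ways):
    num_sets = num_blocks // ways
    return block_number % num_sets   # set = block number mod number of sets

print(set_index(15, num_blocks=8, ways=2))  # 3
```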

The contention for the usage of a functional unit is called

Structural hazard

T/F A floating point number is stored using single precision biased notation. If the number is converted to double precision format, then the absolute value stored in the exponent part of the double precision representation will change.

True
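The stored exponent changes because the bias changes from 127 (8-bit field, single precision) to 1023 (11-bit field, double precision). Python's standard `struct` module can pack a value both ways and extract the raw exponent fields; the helper names are made up for this sketch.

```python
import struct

def single_exponent_field(x):
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return (bits >> 23) & 0xFF      # 8-bit biased exponent (bias 127)

def double_exponent_field(x):
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return (bits >> 52) & 0x7FF     # 11-bit biased exponent (bias 1023)

for x in (1.0, 6.5):
    print(x, single_exponent_field(x), double_exponent_field(x))
# 1.0 127 1023
# 6.5 129 1025
```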

T/F A small cache can benefit from a fully associative design compared to a larger cache.

True

T/F Branch prediction is an optimization technique implemented in a compiler or a processor pipeline designed to reduce or eliminate stalls due to control hazards.

True

T/F Combinational circuits are typically faster than sequential circuits.

True

T/F Every virtual memory reference from the processor can cause a maximum of two physical memory accesses.

True

T/F If the clock rate is increased without changing the memory system, the fraction of execution time due to cache misses increases relative to total execution time.

True

T/F The following addition of two signed numbers leads to an overflow.
  0 1 1 0 ...
+ 0 1 1 1 ...
-------------
  1 1 0 1 ...

True
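Signed overflow occurs when both operands have the same sign bit but the result's sign bit differs. This sketch treats the visible bits as a complete 4-bit number (an assumption, since the question elides the remaining bits): 6 + 7 = 13 does not fit in the range -8..7. The function name is illustrative.

```python
def add_with_overflow(a, b, width):
    mask = (1 << width) - 1
    sign = 1 << (width - 1)
    result = (a + b) & mask
    # Overflow: operands share a sign bit and the result's sign bit differs
    overflow = (a & sign) == (b & sign) and (result & sign) != (a & sign)
    return result, overflow

result, overflow = add_with_overflow(0b0110, 0b0111, width=4)
print(format(result, "04b"), overflow)  # 1101 True
```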

T/F: A RISC-based ISA is more likely to have instructions that take fewer clock cycles per instruction than a CISC-based ISA.

True

T/F: Replacing the processor in a computer with a faster processor decreases response time and increases throughput.

True

A main program will call a procedure Power for computing x^y. Currently, x is in $s0, y is in $s1. How might the program pass the parameter values to Power?

add $a0, $s0, $zero
add $a1, $s1, $zero

