CSC 320 Zybooks


Each load of laundry takes 4 × 30 = 120 minutes to wash, dry, fold, and store (30 minutes each). How many minutes are required to complete one load of laundry when multiple loads of laundry are done in a pipelined manner?

120 minutes

If laundry is done in a pipelined manner, execution is nearly 4 times faster than if done sequentially. Suppose doing 50 loads sequentially requires 6000 minutes. How long would those 50 loads take if done in a pipelined manner? Assume a 4 times speedup (ignore the fact that some stages are unused for the first few and last few loads).

1500 minutes
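
The laundry arithmetic above can be checked with a short sketch. The function names are hypothetical, and the exact pipelined formula (which counts the fill and drain time that the question says to ignore) is an assumption added for comparison with the idealized 4 times speedup.

```python
STAGE_MINUTES = 30   # each stage (wash, dry, fold, store) takes 30 minutes
NUM_STAGES = 4

def sequential_minutes(loads):
    """Every load runs all four stages before the next load starts."""
    return loads * NUM_STAGES * STAGE_MINUTES

def ideal_pipelined_minutes(loads):
    """Idealized estimate: assume a full speedup equal to the stage count."""
    return sequential_minutes(loads) // NUM_STAGES

def pipelined_minutes(loads):
    """Exact time: the first load needs all stages, and each later load
    finishes one stage time after the previous one."""
    if loads == 0:
        return 0
    return (NUM_STAGES + loads - 1) * STAGE_MINUTES

print(sequential_minutes(50))       # 6000
print(ideal_pipelined_minutes(50))  # 1500
print(pipelined_minutes(50))        # 1590
```

A single load still takes 120 minutes from start to finish in the pipelined case; pipelining improves throughput, not the latency of one load.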

ARMv7 and ARMv8 added new registers for NEON that can be viewed as 32 8-byte-wide registers or 16 _____-byte-wide registers to support subword parallelism.

16

The IEEE formed a committee in _____ to begin drafting a floating-point standard.

1977

The control unit sends _____ bits to the ALU control.

2

How many states are in the FSMs for controlling the following types of instructions? R-type completion

2 The first state in the FSM causes the ALU operation to occur and in a second state the result is written to the register file.

Each iteration of the division algorithm consists of _____ basic steps.

3

Each iteration of the multiplication algorithm consists of _____ basic steps.

3

The Multiplier register is _____ bits wide.

32

How many stages exist in the laundry analogy?

4

How many states are in the FSMs for controlling the following types of instructions? Memory reference instructions

4 The first state in the FSM is for memory address computation. Both read and write instructions require an additional memory access state in addition to the memory address computation. Memory read completion requires an additional step for memory read instructions.

The Multiplicand register is _____ bits wide.

64

The Product register is _____ bits wide.

64
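
The register widths and the three steps per iteration can be seen in a minimal sketch of the first-version shift-and-add multiplier. It assumes unsigned operands and the usual textbook sequence (test the Multiplier's low bit and add, shift the Multiplicand left, shift the Multiplier right); the function name and the `bits` parameter are hypothetical.

```python
def shift_add_multiply(multiplicand, multiplier, bits=32):
    """Unsigned shift-and-add multiply with a 2*bits-wide Multiplicand
    register, a bits-wide Multiplier register, and a 2*bits-wide Product."""
    wide_mask = (1 << (2 * bits)) - 1
    mcand = multiplicand & ((1 << bits) - 1)   # starts in the right half
    mplier = multiplier & ((1 << bits) - 1)
    product = 0
    for _ in range(bits):
        # Step 1: if the Multiplier's least significant bit is 1, add.
        if mplier & 1:
            product = (product + mcand) & wide_mask
        # Step 2: shift the Multiplicand register left by 1 bit.
        mcand = (mcand << 1) & wide_mask
        # Step 3: shift the Multiplier register right by 1 bit.
        mplier >>= 1
    return product

product = shift_add_multiply(0xFFFFFFFF, 0xFFFFFFFF)
hi, lo = product >> 32, product & 0xFFFFFFFF   # the Hi and Lo halves
print(hex(hi), hex(lo))   # 0xfffffffe 0x1
```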

If laundry is done sequentially, how many minutes do 60 loads take to wash, dry, fold, and put away?

7200 minutes

When MemToReg is 0, the data appearing at the register file's data input comes from the _____.

ALU's output

Which of the following instruction classifications occur most frequently in the execution of the SPEC CPU2006 floating-point benchmarks?

MIPS Core

Does the following cause a data hazard for the 5-stage MIPS pipeline? i1: add $s0, $s1, $s2 i2: add $s3, $s1, $s4

No

Does the following cause a data hazard for the 5-stage MIPS pipeline? i1: add $s3, $s3, $s4

No

The leading 1 indicates that the bit pattern below represents a negative number. 1010 ... 0010 1100

Not enough information to determine what the bit pattern represents

A calculation that leads to a number being too large to represent is called _____.

Overflow

In the above animation with just one state element, what happens just after a rising clock edge writes a new value to the element?

The logic calculates a new value that then waits at the state element's input for the next rising edge

A compiler for the C language never generates add, addi, or sub instructions. True or False?

True

A control hazard can be resolved via a stall.

True

A floating-point value represented in IEEE 754 is typically an approximation.

True

A parallel program containing floating-point arithmetic operations executing on 10 processors may produce a different result than the same program executing on 1,000 processors.

True

An inaccuracy in floating-point division cost Intel an estimated $500 million.

True

Forwarding resolves some data hazards. Reordering resolves some others. If neither can resolve a data hazard, a stall may become necessary.

True

In Row 1, RegWrite is 1, meaning the register is always written for an R-type instruction.

True

In an effort to achieve similar floating-point accuracy, programmers converting code from the IBM 7094 to the S/360 replaced single precision declarations with double precision declarations.

True

In beq's Row 4, MemToReg is X because the value appearing at the register file's Write data input is irrelevant.

True

Intermediate calculations of floating-point numbers append two extra bits to improve rounding accuracy.

True

Kahan was honored with the Turing Award in 1989 for the benefits conferred upon the computing industry through standardizing floating point.

True

Multiple floating-point operands can be packed into a 128-bit SSE2 register.

True

Reduced floating-point accuracy and the high cost of converting code from the IBM 7094 to the S/360 led to the formation of a subcommittee of IBM mainframe users to propose improvements to the S/360 floating point.

True

Rounding may require the result to be normalized again.

True

Shifting the bits of an unsigned integer right by n bits divides the integer by 2^n.

True

The "Increment or decrement" and "Shift left or right" hardware normalize the sum. True or False?

True

The 32-bit registers, called Hi and Lo, combine to form a 64-bit product register. True or False?

True

The MIPS add, addi, and sub instructions may result in an exception. True or False?

True

The MIPS addu, addiu, and subu instructions may result in an overflow. True or False?

True

The add immediate unsigned (addiu) instruction can be used to subtract a constant from a signed integer when a programmer doesn't care about overflow.

True A subtract immediate instruction is not available, thus adding a negative constant achieves the desired result. Ex: $s1 = $s2 + (-15) = $s2 - 15. The operands may be signed, so the immediate field is sign-extended.

The design can read from two registers and write to one register during the same clock cycle.

True

The division hardware supports signed division. True or False?

True

The improved version of the division hardware halves the width of the adder and divisor. True or False?

True

The multiplication hardware supports signed multiplication. True or False?

True

The refined multiplication hardware halves the width of the Multiplicand register from 64 bits to 32 bits. True or False?

True

The register file always outputs the two registers' values for the two input read addresses.

True

The vector add instruction, VADD, includes support for 8-bit, 16-bit, and 32-bit integers. Operands can be signed or unsigned.

True

mulss, mulps, mulsd, and mulpd are valid x86 instructions.

True

ulp is a measure of accuracy in floating point numbers.

True

Can these instructions be reordered to avoid a pipeline stall? i1: add $s0, $s1, $s2 i2: add $s3, $s4, $s5 i3: lw $t1, 0($t0) i4: add $t3, $t1, $t4

Yes

Does the following cause a data hazard for the 5-stage MIPS pipeline? i1: add $s0, $s1, $s2 i2: add $s3, $s0, $s4

Yes Register $s0 is written in i1's stage 5 (Register write). i2 would already be in stage 4, but would have needed $s0 in stage 2 (Register read).
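
The data hazard questions above all come down to checking whether a later instruction reads a register that an earlier instruction writes. A minimal sketch of that check follows; it assumes three-register instructions such as add and sub written in the `op $rd, $rs, $rt` form used in these cards, and the helper names are hypothetical.

```python
def parse(instr):
    """Split 'add $s0, $s1, $s2' into (op, destination, [sources])."""
    op, operands = instr.split(None, 1)
    regs = [r.strip() for r in operands.split(",")]
    return op, regs[0], regs[1:]

def has_raw_hazard(first, second):
    """True if 'second' reads the register that 'first' writes
    (a read-after-write dependence between adjacent instructions)."""
    _, dest, _ = parse(first)
    _, _, sources = parse(second)
    return dest in sources

print(has_raw_hazard("add $s0, $s1, $s2", "add $s3, $s1, $s4"))  # False
print(has_raw_hazard("add $s0, $s1, $s2", "add $s3, $s0, $s4"))  # True
```

Sharing a source register ($s1 in the first pair) is harmless; only a write followed by a read of the same register creates the hazard.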

Which MIPS arithmetic instructions should be generated for byte and halfword arithmetic operations?

add, sub, mult, div

Complete the outer loop of the optimized C version of DGEMM: for (int i = 0; i < n; _____ )

i+=8

A single-cycle implementation is uncommon today because a single-cycle implementation _____.

is slower than a multi-cycle design

Which MIPS load instructions should be generated for byte and halfword arithmetic operations?

lb, lh

Each step of the multiplication algorithm shifts the Multiplicand register 1 bit to the _____.

left

Which MIPS store instructions should be generated for byte and halfword arithmetic operations?

sb, sh

If the Remainder is negative, a _____ is shifted into the least significant bit of the Quotient register.

0

A rising clock edge refers to the clock changing from _____.

0 to 1

The control unit's Branch output will be 1 for a branch equal instruction. However, the branch's target address is only loaded into the PC if the ALU's Zero output is _____. Otherwise, PC is loaded with PC + 4.

1
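
The PC selection described above is a two-input multiplexer whose select line is Branch AND Zero. A minimal sketch, with a hypothetical function name and the branch target passed in as an already computed value:

```python
def next_pc(pc, branch, zero, branch_target):
    """Select the next PC: the branch target only when the instruction is a
    branch (Branch = 1) and the ALU reports equality (Zero = 1)."""
    if branch and zero:
        return branch_target
    return pc + 4

print(hex(next_pc(0x3000, 1, 1, 0x3020)))  # 0x3020 (beq taken)
print(hex(next_pc(0x3000, 1, 0, 0x3020)))  # 0x3004 (beq not taken)
print(hex(next_pc(0x3000, 0, 1, 0x3020)))  # 0x3004 (not a branch)
```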

How many states are in the FSMs for controlling the following types of instructions? Branch instructions

1 A branch instruction can be completed in a single step. The branch target address computed in the previous state is used if the result of the branch condition is true.

How many states are in the FSMs for controlling the following types of instructions? Jump instructions

1 A jump completion requires a single state. The new PC is formed by concatenating the upper 4 bits of the PC with the lower 26 bits of the IR shifted left by 2 bits.

Suppose all instructions could potentially execute with a 1 ns clock cycle, except a load instruction requiring 2 ns. Assuming each instruction runs one at a time, how long would 1 load instruction plus 39 other instructions take to execute in a single-cycle implementation using a 2 ns clock cycle?

80 ns
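
The 80 ns figure follows because a single-cycle design must size its clock for the slowest instruction. The sketch below reproduces it; the second function is a hypothetical comparison (not part of the question) showing the time if each instruction could take only as long as it needs.

```python
def single_cycle_ns(instruction_count, clock_ns):
    """Every instruction takes one full clock cycle, sized for the slowest."""
    return instruction_count * clock_ns

def variable_length_ns(loads, others, load_ns=2, other_ns=1):
    """Hypothetical ideal: each instruction takes only its own latency."""
    return loads * load_ns + others * other_ns

print(single_cycle_ns(1 + 39, 2))   # 80
print(variable_length_ns(1, 39))    # 41
```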

The ALU's top input always comes from the Read data 1 output of the register file. The ALU's bottom input can come from two possible places: the Read data 2 output of the register file, or the instruction's lower 16 bits, sign-extended to 32 bits. Which control unit output selects between those two places?

ALUSrc
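
A minimal sketch of the ALUSrc multiplexer follows. It assumes the usual single-cycle MIPS convention that ALUSrc = 1 selects the sign-extended immediate and ALUSrc = 0 selects Read data 2; the function names are hypothetical, and values are shown as Python signed integers rather than 32-bit patterns.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def alu_bottom_input(alu_src, read_data_2, instruction):
    """Mux in front of the ALU's bottom input (assumed: 1 selects immediate)."""
    if alu_src:
        return sign_extend16(instruction & 0xFFFF)
    return read_data_2

# 0x2231FFF1 encodes addi $s1, $s1, -15; its low 16 bits are 0xFFF1.
print(alu_bottom_input(1, 99, 0x2231FFF1))  # -15
print(alu_bottom_input(0, 99, 0x2231FFF1))  # 99
```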

In the above animations, state element 1 gets written with a new value on the first rising clock edge, causing the combinational logic to output a new value _____.

After some amount of time

A datapath element whose output values depend only on the present input values is called a _____ element.

Combinational

A _____ precision floating-point number is represented with two MIPS words.

Double

For a jump instruction, the control unit sets Jump to 1. The control unit must also set Branch to 0. True or False?

False

MIPS implementations tend to have numerous structural hazards.

False

The ALU adds the 64-bit Product and 32-bit Multiplicand, and then stores the result into the Product register. True or False?

False

The ALU is used for both branch on equal and jump instructions. True or False?

False

The multiply (mult) instruction ignores overflow, while the multiply unsigned (multu) instruction detects overflow. True or False?

False

The 3000 waits at the instruction memory input for the next rising clock edge, at which time the instruction at address 3000 is read out.

False Because the instruction memory only reads, the instruction memory is like combinational logic. So the read begins as soon as the new address arrives, without waiting for a rising clock edge.

Because the register file is both read and written on the same clock cycle, any MIPS datapath using edge-triggered writes must have more than one copy of the register file.

False Edge-triggered state elements make simultaneous reading and writing both possible and unambiguous.

Floating-point hardware became standardized by the late 1950s.

False Floating-point hardware was included in many computers by the late 1950s. However, each implementation was different, so floating-point operations behaved differently on each computer.

Numerical software in the late 1950s was portable because the software consists entirely of computer-independent mathematical formulas.

False Floating-point operations produced different results depending on the computer that executed the code. The burden fell upon software developers to implement clever tricks to deliver results correct in all but the last several bits.

Floating-point hardware was included in the first computers.

False John von Neumann refused to include floating-point hardware in the computer he built at Princeton. The loss of memory capacity and the increased complexity were arguments against the inclusion of floating-point hardware.

In Row 1, the last two bits, ALUOp, are 10, meaning the ALU will perform an add function.

False Separate control logic, known as ALU control, sees the 10 and examines the funct field to determine which ALU function to perform. The control unit merely needs to inform the ALU control (via that 10) that the current instruction is an R-type instruction, and then lets the ALU control determine the result. Such separation simplifies the control unit's design.
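
The split between the main control unit and the ALU control can be sketched as a small lookup. The ALUOp encodings (00 for load/store, 01 for beq, 10 for R-type), the funct codes, and the 4-bit ALU control values are the ones commonly used in the textbook single-cycle design; they are assumptions here, since these cards do not list them.

```python
# funct field -> 4-bit ALU control value (assumed textbook encoding)
FUNCT_TO_ALU = {
    0b100000: 0b0010,  # add
    0b100010: 0b0110,  # sub
    0b100100: 0b0000,  # and
    0b100101: 0b0001,  # or
    0b101010: 0b0111,  # slt
}

def alu_control(alu_op, funct):
    """Map the 2-bit ALUOp (plus funct for R-type) to an ALU function."""
    if alu_op == 0b00:          # lw / sw: add to form the address
        return 0b0010
    if alu_op == 0b01:          # beq: subtract to compare
        return 0b0110
    if alu_op == 0b10:          # R-type: the funct field decides
        return FUNCT_TO_ALU[funct & 0x3F]
    raise ValueError("unsupported ALUOp")

print(bin(alu_control(0b10, 0b100000)))  # 0b10  (add)
print(bin(alu_control(0b10, 0b100010)))  # 0b110 (subtract)
```

With ALUOp = 10 the same control-unit output produces different ALU functions depending on funct, which is why ALUOp = 10 does not by itself mean add.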

The MIPS addu, addiu, and subu instructions can only operate on unsigned operands. True or False?

False Some people say these instructions' names are misleading. Those instructions indeed can operate on positive and negative numbers; the instructions just ignore overflow.

IEEE 754, released in 1985, is the most recent floating point standard.

False Standards must be revisited periodically for updating. IEEE Std 754-2019, released in 2019, is the most recent standard for floating point. The new standard includes additional types such as half precision and quad precision floating point representations.

Faster division, like faster multiplication, can be achieved by increasing the number of ALUs. True or False?

False The Divisor is subtracted from the Remainder in each step of the algorithm. The sign of the difference is not known beforehand, and is needed to determine the next step of the algorithm. Instead, faster division is achieved through prediction techniques that attempt to produce more than one bit of the quotient per step of the algorithm.

The small ALU is used to add the significands. True or False?

False The Small ALU determines which operand has the larger exponent and by how much. The exponent difference specifies how to pass the operands to the Big ALU, as well as how many positions the smaller operand is shifted to the right.

The YMM extension was introduced to support floating-point numbers represented in a 256-bit representation.

False The YMM registers pack additional floating point values into a single register, either eight 32-bit or four 64-bit floating-point values. The extension supports SIMD, or single instruction, multiple data, which strives to perform a larger number of operations in parallel.

Shifting the bits of a signed integer right by n bits divides the integer by 2^n if the shifter extends the sign bit instead of shifting in 0s.

False The above example demonstrates that shifting -5 (base ten) two positions to the right produces -2 (base ten) rather than the expected value of -1 (base ten). In two's complement representation, the sign bit also contributes to a value's magnitude, so the same operator strength reduction techniques do not apply.
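
The difference between shifting and dividing can be reproduced in a few lines. Python's `>>` on integers is an arithmetic (sign-extending) shift, so it stands in for the sign-extending shifter; `trunc_div` is a hypothetical helper that mimics division truncating toward zero, which is the result the question treats as expected.

```python
def trunc_div(a, b):
    """Integer division that truncates toward zero."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q

# Unsigned (non-negative) case: shifting right by n divides by 2**n.
print(20 >> 2, trunc_div(20, 4))   # 5 5

# Signed case: the arithmetic shift rounds toward negative infinity.
print(-5 >> 2, trunc_div(-5, 4))   # -2 -1
```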

3001 will be waiting at the PC's input to be written on the next instruction fetch cycle.

False The adder adds 4, not 1, because each MIPS word is 4 bytes.

The 3000 waits at the adder input for the next rising clock edge.

False The adder is combinational logic, so the 3000 enters the adder logic without waiting for a rising clock edge.

The programmer must take care not to create a program that writes to a register during the same cycle that the same register is read.

False Edge-triggered writes make reading and writing the same register in one clock cycle unambiguous: the read returns the value written in an earlier cycle, and the newly written value becomes available to reads in a later cycle.

The improved version of the division hardware does not require a quotient to perform a divide operation. True or False?

False The quotient is still needed. The refined division hardware reduces the total hardware needed to perform the divide operation by combining the Quotient register with the right half of the Remainder.

The NEON multimedia instruction extension has little support for subword parallelism beyond data transfer instructions.

False The rising popularity of multimedia applications has led to support for data transfer, arithmetic, and logical/compare instructions. More than 100 instructions have been added to support subword parallelism.

The divide operations, div and divu, detect overflow and division by 0. True or False?

False The software must check the divisor to detect division by 0, as well as check the division operation for overflow.

The speedup comes from reducing the size of the registers and ALU. True or False?

False The speedup comes from shifting the operands and the quotient simultaneously with the subtraction.

The register file writes to one register on every rising clock edge.

False The write only occurs if the RegWrite input is 1.
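
The register file behavior described in these cards (two read ports that always output, one write port gated by RegWrite and a clock edge) can be modeled with a small class. This is a sketch with hypothetical names; keeping register 0 fixed at zero reflects the MIPS $zero convention rather than anything stated above.

```python
class RegisterFile:
    def __init__(self):
        self.regs = [0] * 32

    def read(self, read_reg_1, read_reg_2):
        """Reads are combinational: both outputs always follow the addresses."""
        return self.regs[read_reg_1], self.regs[read_reg_2]

    def clock_edge(self, reg_write, write_reg, write_data):
        """On a rising edge, write only if RegWrite is 1."""
        if reg_write and write_reg != 0:   # $zero stays 0 (MIPS convention)
            self.regs[write_reg] = write_data & 0xFFFFFFFF

rf = RegisterFile()
rf.clock_edge(0, 8, 5)      # RegWrite = 0: nothing is written
print(rf.read(8, 0))        # (0, 0)
before = rf.read(8, 0)      # a read in the same cycle sees the old value
rf.clock_edge(1, 8, 5)      # RegWrite = 1: register 8 becomes 5 at the edge
print(before, rf.read(8, 0))  # (0, 0) (5, 0)
```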

The MIPS addu, addiu, and subu instructions may result in an exception. True or False?

False These instructions specifically do not result in an exception in case of overflow. The computer just continues executing as normal.

The release of the IEEE 754 standard made the implementation of floating point hardware simple for manufacturers.

False To implement IEEE 754 correctly demands extraordinarily diligent attention to detail; to make it run fast demands extraordinarily competent ingenuity of design.

After the address 3000 is read into the PC, the 3000 only propagates to the adder.

False When the 3000 is written into the PC, the 3000 propagates simultaneously to both the instruction memory and the adder.

Increasing the size of the _____ used to represent a floating-point number impacts the number's precision.

Fraction

The control unit enables a write to the register file using the _____ signal.

RegWrite

Each step of the multiplication algorithm shifts the Multiplier register 1 bit to the _____.

Right

If forwarding cannot resolve a data hazard, the pipeline can be _____.

Stalled

A clock input is present on a _____ element.

State

A datapath element that has internal storage is called a _____ element.

State

A register is a _____ element.

State

A sequential element is another name for a _____ element.

State

If the Remainder is negative, then the original Remainder value is restored by adding _____ to the Remainder.

The Divisor
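
The division cards (three steps per iteration, a 0 shifted into the Quotient when the Remainder goes negative, the restore by adding the Divisor, and the Divisor shifting right) fit together as the restoring division algorithm. A minimal sketch for unsigned operands follows; it assumes the first-version hardware layout with the divisor starting in the left half of a double-width Divisor register and bits + 1 iterations, and the function name is hypothetical.

```python
def restoring_divide(dividend, divisor, bits=32):
    """Unsigned restoring division; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("software must check for a zero divisor")
    mask = (1 << bits) - 1
    remainder = dividend & mask      # Remainder register starts as the dividend
    div = (divisor & mask) << bits   # divisor placed in the left half
    quotient = 0
    for _ in range(bits + 1):
        # Step 1: subtract the Divisor from the Remainder.
        remainder -= div
        if remainder < 0:
            # Step 2b: restore by adding the Divisor back, shift in a 0.
            remainder += div
            quotient = (quotient << 1) & mask
        else:
            # Step 2a: keep the difference, shift in a 1.
            quotient = ((quotient << 1) | 1) & mask
        # Step 3: shift the Divisor register right by 1 bit.
        div >>= 1
    return quotient, remainder

print(restoring_divide(7, 2, bits=4))   # (3, 1)
print(restoring_divide(100, 7))         # (14, 2)
```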

The single-cycle datapath conceptually described in this section must have separate instruction and data memories, because _____.

The processor operates in one cycle

A graduate student in aeronautical engineering used the IBM 7090 to simulate a new _____. After many simulations, the student was disheartened because the simulation predicted an abrupt onset of stall.

Wing design

The following causes a data hazard for the 5-stage MIPS pipeline: i1: add $s0, $t0, $t1 i2: sub $t2, $s0, $t3. "Forwarding" can resolve the hazard by providing the ALU's output (for i1's stage 3) directly to the ALU's input (for i2's stage 3).

Yes i1's ALU result will still be written to $s0 in stage 5. But i2's ALU operation need not wait for that write, instead using i1's ALU result directly, known as forwarding or bypassing.

The _____ intrinsic uses the AVX instruction to load 8 double-precision floating-point values.

_mm512_load_pd()

An ALU is a _____ element.

combinational

Goal: $s6 / $s7 (unsigned division) _____ $s6, $s7

divu

Rewriting programs that contain _____ arithmetic operations to execute in parallel may affect the result.

floating-point

MemWrite is 1 for Row 3 (SW), but is 0 for Row 2. The reason is that while a store word instruction writes to the data memory, a _____ instruction does not.

load word

Goal: Place the remainder resulting from the divide operation into register $a3. div $a0, $a1 _____

mfhi $a3

Consider state element 2 in the above animation. A rising clock edge writes a new state element 1 value, serving as input to the combinational logic. If the logic requires 5 ns to output a new value, designers should ensure that rising clock edges are separated by _____.

more than 5 ns

Goal: $s6 × $s7 (signed multiplication) _____ $s6, $s7

mult

Floating-point addition is _____.

not associative
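
A short demonstration of why regrouping floating-point additions can change the result. The values are chosen for illustration: one operand is so large that adding 1.0 to it is lost to rounding. Python floats are IEEE 754 double precision.

```python
x, y, z = -1.5e38, 1.5e38, 1.0

left = (x + y) + z    # x + y is exactly 0.0, so the sum is 1.0
right = x + (y + z)   # y + z rounds back to 1.5e38, so the sum is 0.0

print(left, right)    # 1.0 0.0
```

This is also why a parallel program that splits a sum across a different number of processors can produce a different result: the additions are grouped differently.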

Subword parallelism takes advantage of byte- and halfword-sized data by _____.

partitioning the adder to perform multiple operations in parallel
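
Partitioning the adder means carries are not allowed to cross lane boundaries. The sketch below imitates that in software for four 8-bit lanes packed into one 32-bit word; the function name is hypothetical and the lane-by-lane loop is chosen for clarity, whereas real hardware performs all lanes at once.

```python
def packed_add8(a, b):
    """Add four independent 8-bit lanes packed in 32-bit words a and b.
    Each lane wraps around on overflow; no carry reaches the next lane."""
    result = 0
    for lane in range(4):
        shift = 8 * lane
        lane_sum = ((a >> shift) & 0xFF) + ((b >> shift) & 0xFF)
        result |= (lane_sum & 0xFF) << shift   # drop the carry out of the lane
    return result

# Lanes (high to low): 0x01+0x01, 0xFF+0x01, 0x7F+0x01, 0x10+0x01
print(hex(packed_add8(0x01FF7F10, 0x01010101)))  # 0x2008011
```

An ordinary 32-bit add of the same words would give 0x03008011, because the carry out of the 0xFF lane would spill into the lane above it.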

Compiling the optimized C code replaces most of the x86 floating-point instructions from a sd variation to a _____ variation of the instruction.

pd

For a jump instruction, the datapath forms the destination address by appending 00 to Instruction[25-0], and also _____ bits 31-28 of PC + 4.

prepending
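
The jump address formation described above can be sketched with a few bit operations. The function name is hypothetical, and the example encoding assumes the standard MIPS j opcode (000010) in the upper 6 bits.

```python
def jump_target(pc, instruction):
    """Form the jump address: bits 31-28 of PC + 4, then Instruction[25-0],
    then two 0 bits appended on the right."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    upper = pc_plus_4 & 0xF0000000            # bits 31-28 of PC + 4
    lower = (instruction & 0x03FFFFFF) << 2   # 26-bit field with 00 appended
    return upper | lower

# 0x08100004 encodes a jump whose 26-bit field is 0x0100004.
print(hex(jump_target(0x00400000, 0x08100004)))  # 0x400010
```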

The Multiplier register is removed and placed inside of the _____ register.

product

Each step of the division algorithm shifts the Divisor register 1 bit to the _____.

right

In processor X's pipeline, an add instruction in stage 3 should use the ALU. A branch instruction in stage 4 also should use the ALU. Both instructions cannot simultaneously use the ALU. Such a situation is a structural hazard.

true

