Computer Architecture

CPU execution time for a program (×) =

(CPU clock cycles for a program) × (Clock cycle time)

CPU execution time for a program (÷) =

(CPU clock cycles for a program) ÷ (Clock rate)

CPI =

(CPU clock cycles) ÷ (Instruction count)

Number of Cycles =

(Cycles per second) × seconds run

CPI =

(Execution time * clock rate) / (instruction count)
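
The formulas above are rearrangements of one CPU performance equation. Here is a minimal Python sketch, assuming times in seconds and clock rates in hertz. The function names are illustrative and do not come from any library.

```python
def cpu_clock_cycles(instruction_count: int, cpi: float) -> float:
    """CPU clock cycles = instruction count x average CPI."""
    return instruction_count * cpi


def cpu_time(instruction_count: int, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = (instruction count x CPI) / clock rate."""
    return cpu_clock_cycles(instruction_count, cpi) / clock_rate_hz


def cpi_from_time(execution_time_s: float, clock_rate_hz: float, instruction_count: int) -> float:
    """CPI = (execution time x clock rate) / instruction count."""
    return execution_time_s * clock_rate_hz / instruction_count


if __name__ == "__main__":
    # Hypothetical program: 2 billion instructions, CPI of 1.5, 2 GHz clock.
    t = cpu_time(2_000_000_000, 1.5, 2e9)
    print(f"CPU time: {t} s")  # CPU time: 1.5 s
    print(f"Recovered CPI: {cpi_from_time(t, 2e9, 2_000_000_000)}")  # Recovered CPI: 1.5
```

Dividing by the clock rate is the same as multiplying by the clock cycle time, so the (×) and (÷) versions of each formula are interchangeable.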

Of a doubleword's 64 bits, what is the leftmost bit numbered?

63. Bits are numbered starting from 0 at the rightmost bit, so the leftmost of the 64 bits is bit 63, not bit 64.

In the LEGv8 architecture, each register is _____ bits wide.

64

Workload

A set of programs run on a computer that is either the actual collection of applications run by a user or constructed from real programs to approximate such a mix. A typical workload specifies both the programs and the relative frequencies.

Control signal

A signal used for multiplexor selection or for directing the operation of a functional unit; contrasts with a data signal, which contains information that is operated on by a functional unit.

Two's Complement

A signed number representation where a leading 0 indicates a nonnegative number and a leading 1 indicates a negative number. The two's complement (negation) of a value is obtained by complementing each bit (0 → 1 or 1 → 0) and then adding one to the result.
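
The invert-then-add-one rule can be checked directly. This is a minimal Python sketch, assuming a 64-bit width to match a LEGv8 register. Python integers are unbounded, so the sketch masks results to 64 bits explicitly.

```python
BITS = 64                   # assumed width: one LEGv8 register / doubleword
MASK = (1 << BITS) - 1


def negate(pattern: int) -> int:
    """Two's complement negation of a 64-bit pattern: invert every bit, then add one."""
    return ((pattern ^ MASK) + 1) & MASK


def to_signed(pattern: int) -> int:
    """Interpret a 64-bit pattern as signed: a leading 1 means the value is negative."""
    return pattern - (1 << BITS) if pattern >> (BITS - 1) else pattern


if __name__ == "__main__":
    minus_two = negate(2)
    print(hex(minus_two))        # 0xfffffffffffffffe
    print(to_signed(minus_two))  # -2
```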

Overflow (floating-point)

A situation in which a positive exponent becomes too large to fit in the exponent field.

Load-use data hazard

A specific form of data hazard in which the data being loaded by a load instruction has not yet become available when it is needed by another instruction.

Register file

A state element that consists of a set of registers that can be read and written by supplying a register number to be accessed.

Branch address table

A table of addresses of alternative instruction sequences.

Address

A value used to delineate the location of a specific data element within a memory array.

Doubleword

Another natural unit of access in a computer, usually a group of 64 bits; corresponds to the size of a register in the LEGv8 architecture.

CPU execution time (CPU time)

The actual time the CPU spends computing for a specific task.

Branch target address

The address specified in a branch, which becomes the new program counter (PC) if the branch is taken. In the MIPS architecture, the branch target is the sum of the instruction's offset field (a word offset, multiplied by 4) and the address of the instruction following the branch. In LEGv8, the offset (also multiplied by 4) is added to the address of the branch instruction itself.

CPI (or Cycles per instruction)

The average number of clock cycles per instruction for a program or program fragment. It is the multiplicative inverse of instructions per cycle.

T-F: A clock rate of 1 GHz corresponds to a period of 1 nanosecond, which is 1 × 10⁻⁹ seconds.

True. The clock period is the inverse of the clock rate, so 1 / 10⁹ = 1 × 10⁻⁹ seconds, which is 1 nanosecond.

Opcode

The field that denotes the operation and format of an instruction.

Rn

The first register source operand

Clock rate

The frequency at which a chip, such as a central processing unit (CPU) or one core of a multi-core processor, runs; it is used as an indicator of the processor's speed. It is the inverse of the clock period.

Stored-program concept

The idea that instructions and data of many types can be stored in memory as numbers and thus be easy to change, leading to the stored-program computer.

Little Endian

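The least significant byte of the data is placed at the byte with the lowest address. The remaining bytes follow in order of increasing significance at the next higher addresses (the next three bytes for a 32-bit word, or the next seven for a 64-bit doubleword).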

most significant bit

The leftmost bit in a LEGv8 doubleword

Clock period

The length of each clock cycle.

Spatial Locality

The locality principle stating that if a data location is referenced, data locations with nearby addresses will tend to be referenced soon.

Big Endian

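The most significant byte of the data is placed at the byte with the lowest address. The remaining bytes follow in order of decreasing significance at the next higher addresses (the next three bytes for a 32-bit word, or the next seven for a 64-bit doubleword).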
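
The two byte orders can be compared side by side. This is a minimal Python sketch using the built-in `int.to_bytes`. Each list shows the bytes in the order they would occupy addresses base, base+1, and so on. The helper name `byte_layout` is hypothetical.

```python
from __future__ import annotations


def byte_layout(value: int, size: int, byteorder: str) -> list[str]:
    """Bytes of value (as hex) in the order they sit at addresses base, base+1, ..."""
    return [f"{b:02x}" for b in value.to_bytes(size, byteorder)]


if __name__ == "__main__":
    word = 0x0A0B0C0D  # a 32-bit word
    print("little:", byte_layout(word, 4, "little"))  # little: ['0d', '0c', '0b', '0a']
    print("big:   ", byte_layout(word, 4, "big"))     # big:    ['0a', '0b', '0c', '0d']
```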

Instruction count

The number of instructions executed by the program.

Latency (pipeline)

The number of stages in a pipeline or the number of stages between two instructions during execution.

Temporal Locality

The principle stating that if a data location is referenced then it will tend to be referenced again soon.

Program counter (PC)

The register containing the address of the instruction in the program being executed

Rd

The register destination operand. It gets the result of the operation.

Least Significant Bit

The rightmost bit in a LEGv8 doubleword.

Rm

The second register source operand

Clock cycle (tick, clock tick, clock period, clock, or cycle)

The time for one clock period, usually of the processor clock, which runs at a constant rate.

Response time (execution time)

The total time required for the computer to complete a task, including disk accesses, memory accesses, I/O activities, operating system overhead, CPU execution time, and so on.

Sign-extend

To increase the size of a data item by replicating the high-order sign bit of the original data item in the high-order bits of the larger, destination data item.
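
As an illustration, here is a minimal Python sketch that widens a short field to a 64-bit pattern by copying its sign bit into every new high-order bit. The 9-bit example mirrors the D-format address field. The function name and the default 64-bit destination width are assumptions made for illustration.

```python
def sign_extend(value: int, from_bits: int, to_bits: int = 64) -> int:
    """Replicate bit (from_bits - 1) of value into the upper bits of a to_bits-wide result."""
    value &= (1 << from_bits) - 1            # keep only the original field
    sign_bit = 1 << (from_bits - 1)
    if value & sign_bit:                     # negative: fill the new high-order bits with 1s
        value |= ((1 << to_bits) - 1) ^ ((1 << from_bits) - 1)
    return value                             # positive: high-order bits stay 0


if __name__ == "__main__":
    print(hex(sign_extend(0x1FC, 9)))  # 0xfffffffffffffffc  (-4 as a 9-bit field)
    print(hex(sign_extend(0x0FC, 9)))  # 0xfc                (+252 stays positive)
```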

WB

Write back

Does every byte in memory have a unique address?

Yes. Thus, each byte is addressable. However, programmers usually access memory by words and doublewords.

CPU time (÷) =

[(Instruction count) × (CPI)] ÷ Clock rate

ALUOp

a 2-bit control field that indicates whether the operation to be performed should be add (00) for loads and stores, pass input b (01) for CBZ, or be determined by the operation encoded in the opcode field (10).

Multiplexor

a device that selects one of several input signals, based on the value of its control (select) input, and passes it to its single output

base register

a register that holds an array's base address

sign and magnitude representation

a signed number representation where a single bit is used to represent the sign, and the remaining bits represent the magnitude.

datapath element

a unit used to operate on or hold data within a processor. In the LEGv8 implementation, the datapath elements include the instruction and data memories, the register file, the ALU, and adders

The PC is written once at the ( beginning | end ) of every clock cycle

end

(instructions per cycle) × (cycles per second) =

instructions per second

Number of Instructions =

number of cycles ÷ CPI

Dynamic branch prediction

prediction of branches at runtime using runtime information

shamt

shift amount

In a single-cycle implementation, the cycle length must be long enough to support the __________ instruction (typically the load).

slowest

base address

the starting address of an array in memory

Flush

to discard instructions in a pipeline, usually due to an unexpected event

In a CB instruction, the offset is in ________, not ___________.

words, bytes. Branch target address = PC + (sign-extended offset × 4)

In a D instruction, the offset is in ________, not ___________.

bytes, words. The 9-bit offset is a byte offset added to the base register: Address = X[Rn] + sign-extended offset (not multiplied by 4)

In a B instruction, the offset is in _________, not _________.

words, bytes. Branch target address = PC + (sign-extended offset × 4)
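
Because B and CB offsets count instructions (words), the hardware multiplies them by 4 before adding them to the PC. The Python sketch below assumes the PC is the address of the branch instruction itself. The helper names are hypothetical.

```python
def signed_offset(field: int, bits: int) -> int:
    """Interpret a bits-wide offset field as a signed (two's complement) integer."""
    field &= (1 << bits) - 1
    return field - (1 << bits) if field >> (bits - 1) else field


def branch_target(pc: int, offset_field: int, offset_bits: int) -> int:
    """Target = PC + sign-extended word offset x 4 (each LEGv8 instruction is 4 bytes)."""
    return pc + signed_offset(offset_field, offset_bits) * 4


if __name__ == "__main__":
    # B (26-bit offset): +3 instructions from address 1000
    print(branch_target(1000, 3, 26))              # 1012
    # CBZ (19-bit offset): -2 instructions, encoded as 2**19 - 2
    print(branch_target(1000, (1 << 19) - 2, 19))  # 992
```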

R-type instruction format

| 11-bit opcode | 5-bit Rm | 6-bit shamt | 5-bit Rn | 5-bit Rd | All operands are registers. Instructions: AND, ADD, ORR, SUB, LSL, ASR, EOR, etc.

D-type instruction format

| 11-bit opcode | 9-bit address | 2-bit op2 | 5-bit Rn | 5-bit Rt | Used by the data transfer instructions (loads and stores). Instructions: STUR, LDUR

B-type instruction format

| 6-bit opcode | 26-bit destination address | Instructions: B

I-type instruction format

| 10-bit opcode | 12-bit immediate | 5-bit Rn | 5-bit Rd | Two operands are registers and one is a constant held in the immediate field. Instructions: ADDI, SUBI

CB-type instruction format

| 8-bit opcode | 19-bit destination address | 5-bit Rt | Instructions: CBZ, CBNZ

IM-type instruction format

| 9-bit opcode | 2-bit quad | 16-bit immediate | 5-bit Rd | Instructions: MOVZ, MOVK
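
Every format above fills exactly 32 bits, reading from the most significant bit to the least significant. This is a minimal Python sketch that packs and unpacks fields using those widths. The layout table and helper names are hypothetical. The example ADD encoding (opcode 1112, i.e. 0b10001011000) follows the textbook's LEGv8 encoding of ADD X9, X20, X21.

```python
from __future__ import annotations

# Field layouts from most- to least-significant bit; each format totals 32 bits.
FORMATS: dict[str, list[tuple[str, int]]] = {
    "R":  [("opcode", 11), ("Rm", 5), ("shamt", 6), ("Rn", 5), ("Rd", 5)],
    "I":  [("opcode", 10), ("immediate", 12), ("Rn", 5), ("Rd", 5)],
    "D":  [("opcode", 11), ("address", 9), ("op2", 2), ("Rn", 5), ("Rt", 5)],
    "B":  [("opcode", 6), ("address", 26)],
    "CB": [("opcode", 8), ("address", 19), ("Rt", 5)],
    "IM": [("opcode", 9), ("quad", 2), ("immediate", 16), ("Rd", 5)],
}


def encode(fmt: str, **values: int) -> int:
    """Pack field values (missing ones default to 0) into a 32-bit instruction word."""
    word = 0
    for name, width in FORMATS[fmt]:
        word = (word << width) | (values.get(name, 0) & ((1 << width) - 1))
    return word


def decode(word: int, fmt: str) -> dict[str, int]:
    """Split a 32-bit instruction word into the named fields of the given format."""
    fields: dict[str, int] = {}
    shift = 32
    for name, width in FORMATS[fmt]:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields


if __name__ == "__main__":
    add = encode("R", opcode=0b10001011000, Rm=21, shamt=0, Rn=20, Rd=9)
    print(f"{add:032b}")
    print(decode(add, "R"))
```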

Overall effective CPI =

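[∑ᵢ₌₁ⁿ (CPIᵢ × ICᵢ)] ÷ (Total instruction count), where CPIᵢ and ICᵢ are the CPI and instruction count of instruction class i. The numerator alone, ∑ᵢ₌₁ⁿ (CPIᵢ × ICᵢ), is the total number of CPU clock cycles.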
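
A weighted-average calculation shows how the per-class terms combine. In this minimal Python sketch, the instruction mix is hypothetical: three classes with CPIs 1, 2, and 3.

```python
from __future__ import annotations


def effective_cpi(mix: dict[str, tuple[int, float]]) -> float:
    """mix maps instruction class -> (instruction count, CPI of that class)."""
    total_cycles = sum(count * cpi for count, cpi in mix.values())
    total_instructions = sum(count for count, _ in mix.values())
    return total_cycles / total_instructions


if __name__ == "__main__":
    mix = {"A": (2, 1.0), "B": (1, 2.0), "C": (2, 3.0)}
    print(effective_cpi(mix))  # 2.0  (10 cycles / 5 instructions)
```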

MEM

Data memory access

Program counter =

PC + (sign-extended branch offset × 4), for a taken PC-relative branch

Combinational element

An operational element, such as an AND gate or an ALU.

Execution time =

(Instruction count x CPI) / (Clock Rate)

CPU time (×) =

(Instruction count) × (CPI) × (Clock cycle time)

CPU clock cycles =

(Instructions for a program) × (Average clock cycles per instruction)

CPU time =

(Number of clock cycles) ÷ (clock rate)

3 types of hazards

-Structural -Data -Control

The two units needed to implement R-format ALU operations:

-register file -ALU

The four units needed to implement loads and stores:

-register file -ALU -data memory unit -sign extension unit (Wires/muxes are also needed)

The five units needed for a branch:

-register file -ALU -sign extension unit -shift-left-2 unit -adder (computes the branch target address from the PC and the sign-extended offset shifted left by 2) (Wires/muxes are also needed)

IM

-represents the instruction memory and the PC in the instruction fetch stage

REG

-stands for the register file and sign extender in the instruction decode/register file read stage (ID)

1 nanosecond (10⁻⁹) clock cycle => (in clock rate units)

1 GHz (10⁹) clock rate

The control unit's Branch output will be 1 for a compare and branch on zero instruction. However, the branch's target address is only loaded into the PC if the ALU's Zero output is _____. Otherwise, PC is loaded with PC + 4.

1. To branch when equal to zero, the register must be equal to 0. If it is, Zero is 1, meaning the value is equal to 0 and so the branch SHOULD be taken.

The input to the control unit is the __________________ field from the instruction

6- to 11-bit opcode

five steps to execute load instruction:

1. An instruction is fetched from the instruction memory, and the PC is incremented.
2. A register (X2) value is read from the register file.
3. The ALU computes the sum of the value read from the register file and the sign-extended 9 bits of the instruction (offset).
4. The sum from the ALU is used as the address for the data memory.
5. The data from the memory unit is written into the register file (X1).

four steps to execute cb-type instruction:

1. An instruction is fetched from the instruction memory, and the PC is incremented.
2. The register X1 is read from the register file using bits 4:0 of the instruction (Rt).
3. The ALU passes the data value read from the register file. Meanwhile, the sign-extended 19-bit offset from the instruction is shifted left by two and added to the PC; the result is the branch target address.
4. The Zero status information from the ALU is used to decide which adder result to store in the PC.

Typical 5 steps taken by LEGv8 instructions:

1. Fetch instruction from memory.
2. Read registers and decode the instruction.
3. Execute the operation or calculate an address.
4. Access an operand in data memory (if necessary).
5. Write the result into a register (if necessary).

four steps to execute r-type instruction:

1. The instruction is fetched, and the PC is incremented.
2. Two registers, X2 and X3, are read from the register file; also, the main control unit computes the setting of the control lines during this step.
3. The ALU operates on the data read from the register file, using portions of the opcode to generate the ALU function.
4. The result from the ALU is written into the destination register (X1) in the register file.

For all R-type instructions, ALUOp is _______

10

Data hazards occur when (specific examples) ____

1a. EX/MEM.RegisterRd = ID/EX.RegisterRs
1b. EX/MEM.RegisterRd = ID/EX.RegisterRt
2a. MEM/WB.RegisterRd = ID/EX.RegisterRs
2b. MEM/WB.RegisterRd = ID/EX.RegisterRt
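
A forwarding unit acts on these conditions only when the earlier instruction actually writes a register (its RegWrite bit is 1). It also ignores a destination of register 31 (XZR in LEGv8). When both EX/MEM and MEM/WB match, the more recent EX/MEM result takes priority. The Python sketch below follows the textbook's forwarding-unit description. The `PipeRegWrite` record and the function name are hypothetical. The returned strings are the ForwardA/ForwardB mux codes.

```python
from dataclasses import dataclass

XZR = 31  # LEGv8 register 31 used as XZR always reads 0, so it is never forwarded


@dataclass
class PipeRegWrite:
    reg_write: bool  # RegWrite control bit carried in the pipeline register
    rd: int          # destination register number carried in the pipeline register


def forward_select(src: int, ex_mem: PipeRegWrite, mem_wb: PipeRegWrite) -> str:
    """Choose the source of one ALU input in EX, given source register number src."""
    # EX hazard (conditions 1a/1b): the most recent result wins.
    if ex_mem.reg_write and ex_mem.rd != XZR and ex_mem.rd == src:
        return "10"  # forward the ALU result held in EX/MEM
    # MEM hazard (conditions 2a/2b), checked only if no EX hazard matched.
    if mem_wb.reg_write and mem_wb.rd != XZR and mem_wb.rd == src:
        return "01"  # forward the value held in MEM/WB
    return "00"      # no hazard: use the value read from the register file
```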

The control unit sends _____ bits to the ALU control.

2

Each doubleword consists of _____ bytes.

8. Each LEGv8 doubleword has 64 bits, which is 8 bytes (a byte is 8 bits).

Branch not taken

A branch where the branch condition is false and the program counter (PC) becomes the address of the instruction that sequentially follows the branch.

Branch taken

A branch where the branch condition is satisfied and the program counter (PC) becomes the branch target. All unconditional jumps are taken branches.

Edge-triggered clocking

A clocking scheme in which all state changes occur on a clock edge.

Data transfer instruction

A command that moves data between memory and registers.

Double precision

A floating-point value represented in a 64-bit doubleword.

Single precision

A floating-point value represented in a single 32-bit word.

Instruction format

A form of representation of an instruction composed of fields of binary numbers.

Instruction mix

A measure of the dynamic frequency of instructions across one or many programs.

State element

A memory element, such as a register or a memory.

Branch prediction

A method of resolving a branch hazard that assumes a given outcome for the branch and proceeds from that assumption rather than waiting to ascertain the actual outcome.

Word

A natural unit of access in a computer, usually a group of 32 bits.

Normalized

A number in floating-point notation that has no leading 0s.

Benchmark

A program selected for use in comparing computer performance

Amdahl's Law

A rule stating that the performance enhancement possible with a given improvement is limited by the amount that the improved feature is used. It is a quantitative version of the law of diminishing returns.
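
Amdahl's Law is often written as: execution time after improvement = (execution time affected by the improvement ÷ amount of improvement) + execution time unaffected. This is a minimal Python sketch of that formula and the equivalent overall speedup. The 100-second program, with 80 seconds spent in the improved part, is a hypothetical example.

```python
def time_after_improvement(affected: float, unaffected: float, improvement: float) -> float:
    """Only the affected portion of the execution time gets faster."""
    return affected / improvement + unaffected


def overall_speedup(fraction_affected: float, improvement: float) -> float:
    """Speedup of the whole task when fraction_affected of its time is sped up."""
    return 1.0 / ((1.0 - fraction_affected) + fraction_affected / improvement)


if __name__ == "__main__":
    # 100 s program, 80 s of it in the part made 4x faster.
    print(time_after_improvement(80.0, 20.0, 4.0))  # 40.0
    print(overall_speedup(0.8, 4.0))
```

Even an arbitrarily large improvement to the 80% portion cannot beat 1 / (1 - 0.8) = 5x overall, because the unaffected 20 seconds remain.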

Basic block

A sequence of instructions without branches (except possibly at the end) and without branch targets or branch labels (except possibly at the beginning).

When MemToReg is 0, the data appearing at the register file's data input comes from the _____.

ALU's output. The mux controlled by MemToReg passes the ALU's output when MemToReg is 0. When MemToReg is 1, the data memory's output is passed instead.

Throughput

Also called bandwidth. Another measure of performance, it is the number of tasks completed per unit time

Control hazard

Also called branch hazard. When the proper instruction cannot execute in the proper pipeline clock cycle because the instruction that was fetched is not the one that is needed; that is, the flow of instruction addresses is not what the pipeline expected.

Branch prediction buffer

Also called branch history table. A small memory that is indexed by the lower portion of the address of the branch instruction and that contains one or more bits indicating whether the branch was recently taken or not.
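
One common way to fill such a buffer is with 2-bit saturating counters. This is a minimal Python sketch of that scheme. The table size, indexing by the low-order bits of the word address, and the "strongly not taken" starting state are all assumptions made for illustration.

```python
class TwoBitPredictor:
    """Branch prediction buffer of 2-bit saturating counters.

    Counter values 0-1 predict not taken; 2-3 predict taken. From a strongly
    biased state (0 or 3), a prediction must miss twice in a row before it flips.
    """

    def __init__(self, index_bits: int = 10) -> None:
        self.mask = (1 << index_bits) - 1
        self.counters = [0] * (1 << index_bits)  # assumed start: strongly not taken

    def _index(self, branch_address: int) -> int:
        # Low-order bits of the word address (the 2 byte-offset bits are dropped).
        return (branch_address >> 2) & self.mask

    def predict(self, branch_address: int) -> bool:
        return self.counters[self._index(branch_address)] >= 2

    def update(self, branch_address: int, taken: bool) -> None:
        i = self._index(branch_address)
        if taken:
            self.counters[i] = min(self.counters[i] + 1, 3)
        else:
            self.counters[i] = max(self.counters[i] - 1, 0)


if __name__ == "__main__":
    bp = TwoBitPredictor()
    outcomes = [True] * 9 + [False]  # a loop branch: taken 9 times, then falls through
    correct = 0
    for taken in outcomes:
        correct += bp.predict(0x400) == taken
        bp.update(0x400, taken)
    print(f"{correct}/{len(outcomes)} correct")  # 7/10 correct
```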

Pipeline stall

Also called bubble. A stall initiated in order to resolve a hazard.

Forwarding

Also called bypassing. A method of resolving a data hazard by retrieving the missing data element from internal buffers rather than waiting for it to arrive from programmer-visible registers or memory.

Exception

Also called interrupt. An unscheduled event that disrupts program execution; used to detect overflow

PC-relative addressing

An addressing regime in which the address is the sum of the program counter (PC) and a constant in the instruction.

Von Neumann Architecture

An architecture for an electronic digital computer with these components: -A processing unit that contains an arithmetic logic unit and processor registers -A control unit that contains an instruction register and program counter -Memory that stores data and instructions -External mass storage -Input and output mechanisms

Harvard Architecture

An architecture with physically separate storage and signal pathways for instructions and data. These early machines had data storage entirely contained within the central processing unit, and provided no access to the instruction storage as data.

Don't-care term

An element of a logical function in which the output does not depend on the values of all the inputs. Don't care terms may be specified in different ways.

Interrupt

An exception that comes from outside of the processor. (Some architectures use the term interrupt for all exceptions.)

Pipelining

An implementation technique in which multiple instructions are overlapped in execution, much like an assembly line. It increases the number of simultaneously executing instructions and the rate at which instructions are started and completed

Clock cycles per instruction (CPI) =

Average number of clock cycles per instruction for a program or program fragment.

Bias for an 11-bit exponent (IEEE 754)

1023

Bias for an 8-bit exponent (IEEE 754)

127
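
The stored exponent is the true exponent plus the bias. This is a minimal Python sketch that uses the standard `struct` module to read the raw IEEE 754 bits and recover both values. The helper names are hypothetical.

```python
from __future__ import annotations

import struct


def double_exponent(x: float) -> tuple[int, int]:
    """(stored 11-bit exponent, true exponent) of x as an IEEE 754 double; bias 1023."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    stored = (bits >> 52) & 0x7FF
    return stored, stored - 1023


def single_exponent(x: float) -> tuple[int, int]:
    """(stored 8-bit exponent, true exponent) of x as an IEEE 754 single; bias 127."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    stored = (bits >> 23) & 0xFF
    return stored, stored - 127


if __name__ == "__main__":
    # -0.75 = -1.1 (binary) x 2^-1
    print(double_exponent(-0.75))  # (1022, -1)
    print(single_exponent(-0.75))  # (126, -1)
```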

Machine Language

Binary representation used for communication within a computer system

Replacing a processor in a computer with a faster processor has what effect?

Both: it decreases response time and increases throughput. A faster processor certainly means decreased response time. Plus, more tasks could be executed per minute (or other unit time), so throughput increases too. Decreasing response time almost always also increases throughput.

Adding additional processors to a system that uses multiple processors for separate tasks -- for example, searching the web -- has what effect? Assume that before adding processors, tasks often must wait to execute due to another task executing (tasks "queue up").

Both: throughput increases and response time improves. If the demand for processing is almost as large as the throughput, the system might force requests to queue up. In this case, increasing the throughput could also improve response time, since it would reduce the waiting time in the queue. Thus, in many real computer systems, changing either execution time or throughput often affects the other.

B meaning

Branch

CB meaning

Conditional Branch

D meaning

Data Transfer

EX

Execution or address calculation

T-F: Two instructions could possibly need to use the ALU simultaneously.

False. In the pipelined LEGv8 datapath, an instruction uses the ALU only in its third clock cycle (the EX stage). Since only one instruction starts per clock cycle, no two instructions can use the ALU in the same clock cycle.

T-F: The LEGv8 pipelined control approach determines all control line values during an instruction's 1st clock cycle, the instruction fetch stage.

False. The control signals are used only in the instruction's 3rd, 4th, and 5th cycles, so they are generated in the instruction's 2nd cycle (instruction decode), which avoids storing those bits in the IF/ID register.

T-F: The store and load instructions behave similarly in stage 3 (EX: execute).

False. The store instruction differs from the load instruction by writing the register file's second read register's value into the EX/MEM register.

T-F: A smart branch predictor assumes the branch is not taken.

False. A smart predictor uses branching history to make better predictions.

T-F: For a given number of instructions, assume CPI is increased by 20%, and clock cycle time is decreased by 10%. The program execution time decreases.

False. 1 × (1 + 0.20) × (1 - 0.10) = 1.08. If instruction count remains the same, CPI increases by 20%, and clock cycle time decreases by 10%, the program execution time increases by 8%.

T-F: Consider a rising clock edge that causes 3000 to be written into the PC. 3001 will be waiting at the PC's input to be written on the next rising clock edge.

False. Each LEGv8 instruction is 4 bytes long, so the adder produces PC + 4 = 3004, and 3004 (not 3001) will be waiting at the PC's input to be written on the next rising clock edge.

T-F: Consider a rising clock edge that causes 3000 to be written into the PC. The 3000 waits at the instruction memory input for the next rising clock edge, at which time the instruction at address 3000 is read out.

False. Because the instruction memory only reads, the instruction memory is like combinational logic. So the read begins as soon as the new address arrives, without waiting for a rising clock edge.

T-F: Overflow does not occur in unsigned integers.

False. Overflow can occur in unsigned integers, but since unsigned integers are commonly used for memory addresses, such overflows are often ignored.

T-F: Consider a rising clock edge that causes 3000 to be written into the PC. The 3000 waits at the adder input for the next rising clock edge.

False. The adder is combinational logic, so the 3000 enters the adder logic without waiting for a rising clock edge.

T-F: An instruction set is a particular program provided in the language of a computer.

False. The instruction set is the language itself, not a particular program. The language is like English, whereas a program is like a particular book written in English.

T-F: The register file writes to one register on every rising clock edge.

False. The write only occurs if the RegWrite input is 1.

T-F: Consider a rising clock edge that causes 3000 to be written into the PC. After the address 3000 is read into the PC, the 3000 only propagates to the adder.

False. When the 3000 is written into the PC, the 3000 propagates simultaneously to both the instruction memory and the adder.

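For a double-precision number, the exponent is stored in the range _________

1 to 2046 (11 bits) for normalized numbers; the stored values 0 and 2047 are reserved

For a single-precision number, the exponent is stored in the range ________

1 to 254 (8 bits) for normalized numbers; the stored values 0 and 255 are reserved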

How many bits are used to hold a condition code?

Four. The four condition-code bits (N, Z, C, and V: negative, zero, carry, and overflow) record what happened during an instruction's execution.

Arithmetic Logic Unit (ALU)

Hardware that performs addition, subtraction, and usually logical operations such as AND and OR.

Load-use hazards occur when (specific examples) ____

ID/EX.MemRead and ((ID/EX.RegisterRt = IF/ID.RegisterRs) or (ID/EX.RegisterRt = IF/ID.RegisterRt))
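
When the condition above holds, the pipeline stalls for one cycle. The PC and the IF/ID register are held, and the control signals entering ID/EX are zeroed so that a bubble moves down the pipeline. This is a minimal Python sketch of that hazard detection unit, using the card's Rs/Rt field names. The `HazardAction` record and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HazardAction:
    pc_write: bool       # False: hold the PC so the same instruction is fetched again
    if_id_write: bool    # False: hold IF/ID so ID decodes the same instruction again
    insert_bubble: bool  # True: zero the control signals entering ID/EX (a nop)


def hazard_detection(id_ex_mem_read: bool, id_ex_rt: int,
                     if_id_rs: int, if_id_rt: int) -> HazardAction:
    """Stall one cycle if the load in EX writes a register the instruction in ID reads."""
    stall = id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt)
    return HazardAction(pc_write=not stall, if_id_write=not stall, insert_bubble=stall)
```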

Five stages of instruction execution

IF: Instruction fetch ID: Instruction decode and register file read EX: Execution or address calculation MEM: Data memory access WB: Write back

I meaning

Immediate

Adding additional processors to a system that uses multiple processors for separate tasks -- for example, searching the web -- has what effect? Assume that before adding processors, tasks do not wait to execute (tasks do not "queue up").

Increases throughput. More tasks can be completed per unit time, because the additional processor can execute additional tasks.

IF

Instruction fetch

ID

Instruction decode and register file read

Relative Performance formula: If α is n times faster at something than β, then ______________

Performance(α) ÷ Performance(β) = n

The stored-program concept means:

Programs are stored in memory along with data

The control unit enables a write to the register file using the _____ signal.

RegWrite. When RegWrite is 1, a register will be written on the next rising clock edge. The data will either come from the ALU or data memory as determined by MemToReg.

R meaning

Register

User CPU time

The CPU time spent in a program itself.

System CPU time

The CPU time spent in the operating system performing tasks on behalf of the program.

T-F: A branch predictor assumes the branch is not taken.

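True for the simplest static scheme, which always predicts that a branch is not taken and keeps fetching sequentially. Smarter dynamic predictors use branch history instead.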

T-F: Two instructions could possibly access the register file simultaneously.

True. An instruction might read the register file in the instruction's second clock cycle, and might write in the fifth clock cycle. Thus, two instructions might simultaneously access the register file, one reading, one writing. Such simultaneous reading and writing of the register file is supported.

T-F: Assume CPI and clock cycle time remain constant. Reducing the instruction count will reduce the program's execution time.

True. Execution time is directly proportional to instruction count, so instruction count and program execution time decrease together.

T-F: The most reliable method to evaluate performance is execution time.

True. Individually, performance metrics such as instruction count, CPI, and clock frequency can be flawed. Performance can be accurately reported only when all metrics are combined into execution time.

T-F: Instructions, as well as data, can be stored in memory as numbers.

True. Instructions can be represented as just 0's and 1's, leading to the basic concept of the "stored-program": Both instructions and data can be stored in memory as numbers.

T-F: A particular processor has a clock rate of 1 GHz. The clock thus ticks one billion times per second.

True. The G is short for Giga, meaning billion, or 10⁹.

T-F: The register file always outputs the two registers' values for the two input read addresses.

True. The register file does not wait for a rising clock edge, nor any special control signals, to output those two registers' values.

Data race

Two memory accesses form a data race if they are from different threads to the same location, at least one is a write, and they occur one after another.

Given the importance of registers, what is the rate of increase in the number of registers in a chip over time?

Very slow. Since programs are usually distributed in the language of the computer, there is inertia in instruction set architecture, and so the number of registers increases only as fast as new instruction sets become viable.

Data hazard

When a planned instruction cannot execute in the proper clock cycle because data that is needed to execute the instruction is not yet available.

Structural hazard

When a planned instruction cannot execute in the proper clock cycle because the hardware does not support the combination of instructions that are set to execute.

IM meaning

Wide Immediate

For a load or store instruction, the ALU should perform a(n) ______ to determine the memory address

add

SPEC (System Performance Evaluation Cooperative)

an effort funded and supported by a number of computer vendors to create standard sets of benchmarks for modern computer systems; the organization is now called the Standard Performance Evaluation Corporation

