CSC258 Midterm2

Memory Access

a. MEM (Memory Access) b. This stage reads data from memory using the address in the EX/MEM pipeline register and loads the data into the MEM/WB register. c. EX/MEM register, MEM/WB register

Execute or address calculation

a. EX (Execute or address calculation) b. This stage executes the instruction: the ALU operates on the values from the ID/EX register (for example, adding them to compute an address) and places the result into the EX/MEM register. c. ID/EX register, EX/MEM register, ALU

Write back

a. WB (Write Back) b. This stage reads the data from the MEM/WB register and writes it back to the register file. c. MEM/WB register, register file

Cache Performance

# of misses = # of instructions * (misses / instruction)

memory stalls

# of misses * miss penalty
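The two cache formulas above can be checked with a few lines of Python. This is only a sketch: the function name and the workload numbers (instruction count, miss rate, miss penalty) are made up for illustration and do not come from the course material.

```python
def memory_stall_cycles(instructions, misses_per_instruction, miss_penalty):
    """Return (number of misses, memory stall cycles) for a program."""
    # misses = instructions * (misses / instruction)
    misses = instructions * misses_per_instruction
    # memory stall cycles = misses * miss penalty
    return misses, misses * miss_penalty


# Hypothetical workload: 1,000,000 instructions, 2 misses per 100
# instructions, and a 100-cycle miss penalty.
misses, stalls = memory_stall_cycles(1_000_000, 0.02, 100)
print(misses, stalls)
```

The stall cycles are then added to the base execution cycles to get the total cycle count.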

Cache Sizes

* Each cache has a finite size: it stores some maximum number of blocks and maps each block to the location where it is stored.

Exception

* When an exception is detected in the execution stage, flush the instructions in the IF, ID, and EX stages - save the address of the offending instruction in SEPC and load the address of the exception handler into the PC

Whole System

* Includes main memory (RAM) + a series of caches * L1 -> closest to the processor - L2 -> farther away and a bit bigger - L3 -> if there is one, off chip and even larger * The farther a cache is from the processor, the more it stores; the size of the cache block shrinks as you get closer to the processor.

Increase the potential amount of instruction level parallelism

* Increase the depth of the pipeline to overlap more instructions * Replicate the internal components of the computer to launch multiple instructions in every pipeline stage (multiple issue). Launching multiple instructions per stage allows the instruction execution rate to exceed the clock rate, i.e., CPI < 1.

Examples of Locality

* Iterating over an array exhibits both temporal + spatial locality * Executing code exhibits temporal + spatial locality * Accessing items from a dictionary does not: the items in a dict may not be close to each other in memory

Forwarding (Bypassing)

* No need to wait for values to be written back * Create additional connections in the datapath to allow recently computed values to be used. The value from the execution stage of the first instruction is passed to the stage after ID (the EX stage) of the second instruction.

Memory I/O Devices

* Portions of the address space are assigned to I/O devices * Writes to those addresses are interpreted as commands to the I/O device * Reads at those addresses let the processor receive input from the device; in the project, we checked these addresses as we needed them. When I/O is sporadic, an interrupt is sent to inform the OS that input has been received and is ready to be read * This triggers the same exception handler as before; the data in the SCAUSE register allows the handler to identify which device needs attention

tag

* The rest of the address; used to check whether a block in the cache is the right block

vectored interrupts

- A vector table holds, for each interrupt/exception, the starting address of the handler that should be executed -> the cause register is used to index into the table

Hazard Cost

- Insert stalls, which keep the pipeline from being full and can tank our performance

2's complement

- invert the bits and add 1
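A minimal sketch of the invert-and-add-1 rule, assuming a fixed register width (8 bits by default). The function name and the width parameter are illustrative choices, not part of the course notes.

```python
def twos_complement_negate(value, bits=8):
    """Negate `value` within a fixed width: invert every bit, then add 1."""
    mask = (1 << bits) - 1          # e.g. 0xFF for 8 bits
    inverted = ~value & mask        # invert the bits, keep only `bits` of them
    return (inverted + 1) & mask    # add 1, discarding any carry out


# 5 is 00000101; its negation is printed as an 8-bit pattern.
print(format(twos_complement_negate(0b00000101), "08b"))
```

Applying the function twice gives back the original bit pattern, which is a quick sanity check on the rule.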

Addresses + Caches

- Every load of a value at an address fetches an entire cache block, not just a single value * The size of the cache block depends on the cache - A block is a set of words with closely related data

Pipelining improves throughput. Doesn't decrease the time to complete one load of laundry, but with many loads this improvement results in less time to complete the work

- The more loads, the more apparent the performance increase

Ecall Exception

- The address of the instruction that triggered the exception is first saved in the supervisor exception program counter (SEPC). - Then the processor is placed in supervisor mode (control transfers from user mode to a dedicated location in supervisor code space). - To return to user mode from the exception, use the supervisor exception return instruction, sret, which resets the processor to user mode and jumps to the address in SEPC.

Taken/not taken 2-bit predictor

0 (predict not taken) -> 1 not taken -> taken -> not taken

ALUSrc

0: the second ALU operand is a register value 1: the second ALU operand is the immediate * selects between the register value and the immediate

PCSrc

0: other instructions, 1: for branch instructions

Steps to handle exceptions in RISCV

1. Pause + save the current location in the running user process (like a function call: store the PC in the SEPC register and save registers that will be modified on the stack) 2. Store the cause of the exception/interrupt (we use the SCAUSE register) 3. Invoke a handler that deals with the issue (a single handler is used to handle all exceptions)

How to Handle an Exception

1. Pause and save the current location in the running user process • Like a function call! Store the PC (SEPC register) and save registers that will be modified (stack) 2. Store the cause of the exception/interrupt (we use an SCAUSE register) 3. Invoke a handler that will deal with the issue (a single handler is used to handle all exceptions)

Single Handler Duties

1. Set up any resources required by the handler (or subsequent code) 2. Read the SCAUSE register 3. If the cause allows the user program to restart ... • Handle the error or transfer to code that can • Use the SEPC to return to the program 4. Else ... • Terminate the program and report the error

example

A cache with 16 blocks of 32 bytes each: 16 = 2^4 blocks, so 4 bits of the address are the index. Block size = 32 bytes = 2^5, so 5 bits are the block offset. Tag = 32 - 4 - 5 = 23 bits.
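The field widths in this example can be turned into a small address splitter. This is a sketch that assumes a 32-bit address and a direct-mapped layout (tag, then index, then block offset); the function name and the sample address are made up.

```python
def split_address(address, index_bits=4, offset_bits=5, address_bits=32):
    """Split an address into (tag, index, block offset, tag width in bits)."""
    tag_bits = address_bits - index_bits - offset_bits
    offset = address & ((1 << offset_bits) - 1)                 # low 5 bits
    index = (address >> offset_bits) & ((1 << index_bits) - 1)  # next 4 bits
    tag = address >> (offset_bits + index_bits)                 # remaining bits
    return tag, index, offset, tag_bits


tag, index, offset, tag_bits = split_address(0x12345678)
print(hex(tag), index, offset, tag_bits)
```

Shifting the three fields back into place reproduces the original address, so no bits are lost or counted twice.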

Types of Data Hazard

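1a. EX/MEM.RegisterRd = ID/EX.RegisterRs1 1b. EX/MEM.RegisterRd = ID/EX.RegisterRs2 2a. MEM/WB.RegisterRd = ID/EX.RegisterRs1 2b. MEM/WB.RegisterRd = ID/EX.RegisterRs2 * ex. addi x7, x3, 42; sub x6, x3, x2; add x7, x7, x6 -> (2a): when the add is in EX, its rs1 (x7) matches the rd of the addi in MEM/WB (its rs2, x6, also matches the rd of the sub in EX/MEM, which is case 1b)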
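The four conditions can be sketched as a small checker. The function name is hypothetical, and two details are assumptions added for the sketch rather than taken from the card: `None` marks a stage whose instruction writes no register, and a destination of x0 is ignored.

```python
def forwarding_hazards(ex_mem_rd, mem_wb_rd, id_ex_rs1, id_ex_rs2):
    """Return the labels (1a, 1b, 2a, 2b) of every matching condition.

    Register numbers are ints. Pass None for a stage whose instruction
    does not write a register (an assumption of this sketch).
    """
    checks = [
        ("1a", ex_mem_rd, id_ex_rs1),
        ("1b", ex_mem_rd, id_ex_rs2),
        ("2a", mem_wb_rd, id_ex_rs1),
        ("2b", mem_wb_rd, id_ex_rs2),
    ]
    # A hazard exists when a real destination register (not None, not x0)
    # equals a source register of the instruction currently in EX.
    return [label for label, rd, rs in checks
            if rd is not None and rd != 0 and rd == rs]


# add x7, x7, x6 is in EX; sub (rd = x6) is in MEM; addi (rd = x7) is in WB.
print(forwarding_hazards(ex_mem_rd=6, mem_wb_rd=7, id_ex_rs1=7, id_ex_rs2=6))
```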

SEPC

A 64-bit register used to hold the address of the affected instruction (such a register is needed even when exceptions are vectored) - SCAUSE: a register used to record the cause of the exception.

Control Signals for the instruction type R-format

(ALUSrc, MemToReg, RegWrite, MemRead, MemWrite) = (0, 0, 1, 0, 0). Register set up -> instruction memory -> register file -> mux (selects register or immediate) -> ALU -> another mux -> register file setup

beq

(ALUSrc, MemToReg, RegWrite, MemRead, MemWrite) = (0, x, 0, 0, 0) - fetch the instruction from instruction memory (PC) - read the source operands (rs1, rs2) from the RF (register file) * ALUSrc = 0 - the ALU subtracts the two register values - if the branch is taken: PC + immediate - if the branch is not taken: PC + 4 * Register set up -> instruction memory -> register file -> mux -> ALU -> adder -> mux -> PC register

lw

(ALUSrc, MemToReg, RegWrite, MemRead, MemWrite) = (1, 1, 1, 1, 0) - fetch the instruction from instruction memory (PC) - read the source operand (rs1) from the RF (register file) - extend the immediate (ALUSrc = 1) - compute the memory address (ALU) - read the data at that address from data memory - write the loaded data back to the register file - PC + 4 for the next instruction. Register set up -> instruction memory -> register file -> mux (selects the value from register 2 or the immediate) -> ALU -> data memory -> mux -> register file setup (write back)

sw

(ALUSrc, MemToReg, RegWrite, MemRead, MemWrite) = (1, x, 0, 0, 1). sw x6, 8(x9) -> in the MEM stage, the value in x6 is written to memory at address x9 + 8. Register set up -> instruction memory -> register file -> mux -> ALU -> data memory
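The control-signal tuples on the R-format, beq, lw, and sw cards can be collected into one lookup table. This is only a sketch: the names `Control`, `CONTROL_SIGNALS`, and the two helper functions are made up, and `None` is used here to stand for the "x" (don't care) entries.

```python
from collections import namedtuple

Control = namedtuple("Control", "ALUSrc MemToReg RegWrite MemRead MemWrite")

# None stands for the "x" (don't care) entries on the cards.
CONTROL_SIGNALS = {
    "R-format": Control(0, 0, 1, 0, 0),
    "lw": Control(1, 1, 1, 1, 0),
    "sw": Control(1, None, 0, 0, 1),
    "beq": Control(0, None, 0, 0, 0),
}


def writes_register(instruction):
    """True when the instruction writes a result to the register file."""
    return CONTROL_SIGNALS[instruction].RegWrite == 1


def uses_immediate(instruction):
    """True when the second ALU operand is the immediate (ALUSrc = 1)."""
    return CONTROL_SIGNALS[instruction].ALUSrc == 1


for name, signals in CONTROL_SIGNALS.items():
    print(name, signals)
```

Reading the table by column shows the pattern: only lw and sw take the immediate, and only R-format and lw write the register file.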

Single Cycle processor time

Add up the delays of all stages

Elements in Single Cycle processor's datapath

Adder, immediate generation unit, instruction memory, data memory, multiplexer, program counter register, ALU, registers/register file

Ch.5

Data Path

Interrupt

An exception that comes from outside of the processor. (Some architectures use the term interrupt for all exceptions.)

Exceptions

An unscheduled event that disrupts program execution; used to detect undefined instructions

ALU

Arithmetic and Logic Unit - does all mathematical calculations and makes all logical decisions

Code Scheduling

Code scheduling to avoid stalls • The code can be reordered to avoid using a load result in the very next instruction • Example: C code for a = b + e; c = b + f

Exceptions: a Control Hazard

Consider a malfunction on an add in the EX stage: add x1, x2, x1 • Must prevent x1 from being clobbered • Must complete previous instructions • To do so, flush the add and subsequent instructions, but keep the previous ones • The steps required are similar to those for a mispredicted branch

Control Hazard

Deciding on a control action depends on a previous instruction. This happens with conditional branches: once the branch instruction is fetched, we do not know the next instruction to execute until we get the outcome of the branch (PC + increment). Example: add x4, x5, x6; beq x1, x0, 40; or x7, x8, x9 * stall between the beq and the or (one nop) to execute correctly, so that the fetch stage of the third instruction lines up with the ALU stage of the second instruction.

Datapath with Hazard Detection

The hazard detection unit is placed in the ID stage. There, it can easily introduce a bubble by zeroing out the ID/EX registers

CH.08

EXCEPTIONS

Exceptions and Interrupts

Exceptions are "unexpected" (unpredictable) events requiring non-user code to be run. • Different ISAs use the terms exception and interrupt differently • Exceptions: generally come from within the CPU (syscall, floating point error, ...) • Interrupts: generally generated by an external device • Handling exceptions without sacrificing performance is challenging!

Block Offset

Finds the right location inside the cache block

Sign and Magnitude

The first bit is a sign bit: 0 for positive, 1 for negative

How to stall the pipeline

Force control values in the ID/EX register to 0 • Prevent update of the PC and IF/ID register • This results in the current instruction being decoded again in the next cycle

Which of the following are events which may cause an exception as RISC-V defines it? Request from an I/O device; Hardware error; System reset; ecall instruction; Illegal (undefined) instruction

Hardware error; System reset; ecall instruction; Illegal (undefined) instruction

RISC-Instruction stages

IF: Instruction fetch from memory. ID: Instruction decode & register read. EX: Execute operation or calculate address MEM: Access memory operand WB: Write result back to register

Spatial Locality

If one object in memory is accessed, objects close to it are likely to be accessed as well

Temporal Locality

If one object is accessed, it (and the objects around it) will likely be accessed again soon

Interrupts, saved addresses

In memory-mapped I/O, portions of the address space are assigned to I/O devices. Writes to those addresses are interpreted as commands to the I/O device (recall that writing to specific addresses in your project turned LEDs on or off!). Reads at those addresses allow the processor to receive input from the device. In our project, we checked these addresses as we needed them, but when I/O is sporadic, an interrupt is sent to inform the operating system that input has been received and is ready to be read. This triggers the same exception handler as before, but the data in the SCAUSE register allows the handler to identify which device needs attention.

Branch Prediction

In a deep pipeline, the stall penalty is too high • Branches are common instructions! • So ... let's predict the outcome of the branch • If the prediction is wrong, then it's no worse than stalling • And if the prediction is correct, there is no penalty • Easiest prediction: not taken • Just fetch the instruction after the branch, with no delay • The compiler can make not taken work pretty well • Code can be built to use only unconditional jumps and branches that are likely to not be taken • But the cost of a missed prediction is very high on a deep pipeline • Modern pipelines are typically 10-14 stages (with a max of 31!) • So there has been a lot of work on improving branch prediction.

Handling exceptions

In a single-cycle processor, an exception is fairly easy to handle. The current instruction is cancelled, appropriate exception data is stored, and the next instruction to be executed is the start of the exception handler. In a pipelined implementation, however, exceptions are a form of control hazard. When an exception is received, all instructions after the offending instruction must be flushed. Then, the PC that is saved is not the "current" PC but rather the address of the first instruction that must be re-executed.

Data Hazard

Need to wait for a previous instruction to complete its data read/write. Ex.: add x19, x0, x1; sub x2, x19, x3 needs two nops in between, so that the ID cycle of the second instruction lines up with the WB cycle of the first (the first half of the WB cycle writes the data back; the second half of the ID cycle of instruction 2 reads that data).

An ILP Analogy

Pipelined laundry: overlapping execution improves performance. Completing a load of laundry requires four steps, and each step uses different hardware. Key insight: we can run four loads at once if the hardware is fully utilized. Four loads: speedup = 8/3.5 = 2.3. Non-stop (n loads): speedup = 2n/(0.5n + 1.5) ≈ 4 for large n, which equals the number of stages. The pipeline approaches 4 times faster as n approaches infinity.
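The laundry numbers can be reproduced with a short sketch, assuming four stages of 0.5 hours each (so one load takes 2 hours alone, and each extra pipelined load adds 0.5 hours). The function name and parameters are illustrative.

```python
def laundry_speedup(n, stages=4, stage_time=0.5):
    """Speedup of pipelined over sequential laundry for n loads."""
    # Sequential: every load runs all stages before the next load starts.
    sequential = n * stages * stage_time
    # Pipelined: the first load takes all stages, each later load adds one
    # stage time (0.5n + 1.5 hours for the default values).
    pipelined = stages * stage_time + (n - 1) * stage_time
    return sequential / pipelined


for n in (1, 4, 100, 1_000_000):
    print(n, round(laundry_speedup(n), 3))
```

With one load there is no speedup at all, and the ratio only creeps toward the number of stages as the number of loads grows.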

CH6.

Pipelining

One Handler vs. Many

RISC-V uses a single exception handler, but that's not the only possibility • Other systems use vectored interrupts • Here, the handler address is determined by the cause • To call the correct function, a vector table is maintained, where a handler is registered for every category of exception • The cause register is used to index into this table to invoke the correct handler.
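The difference between the two schemes can be sketched in Python. Everything here is illustrative: the cause codes are made-up small integers (not the real RISC-V SCAUSE encodings), and the handler names and return strings are placeholders.

```python
# Hypothetical cause codes for illustration only; these are NOT the real
# RISC-V SCAUSE encodings.
CAUSE_ILLEGAL_INSTRUCTION = 0
CAUSE_ECALL = 1


def handle_illegal_instruction():
    return "terminate program and report the error"


def handle_ecall():
    return "service the request, then return to the program"


def single_handler(cause):
    """One entry point: inspect the cause, then branch to the right code."""
    if cause == CAUSE_ILLEGAL_INSTRUCTION:
        return handle_illegal_instruction()
    if cause == CAUSE_ECALL:
        return handle_ecall()
    raise ValueError(f"unknown cause {cause}")


# Vectored scheme: one registered handler per category, indexed by cause.
VECTOR_TABLE = [handle_illegal_instruction, handle_ecall]


def vectored_dispatch(cause):
    """The cause indexes straight into the table of handlers."""
    return VECTOR_TABLE[cause]()


for cause in (CAUSE_ILLEGAL_INSTRUCTION, CAUSE_ECALL):
    print(cause, vectored_dispatch(cause))
```

Both paths end up running the same code for a given cause; the vectored version just replaces the chain of comparisons with a table lookup.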

Tag

The rest of the address; used to check whether a block in the cache is the right block

Caching

A solution that makes memory appear closer than it is. * Stores the value that was loaded and the values near it, in case they are needed soon.

Memory System

Sources of delay: size + distance. Memory is large and it is too far away

The Bottom Line: Stalls and Performance

Stalls reduce performance, so we avoid them at all costs • The addition of hardware to detect hazards and forward data can help • In some situations, we rely on the compiler (or even more complex hardware) to rearrange the instruction stream

linker

Takes all independently assembled machine language programs and "stitches" them together to create an executable program. 1. Place the code and data modules symbolically in memory 2. Determine the addresses of data and instruction labels 3. Patch both the internal and external references

Control Unit

Takes the instruction to be executed as input. Used to determine how to set the control lines for the functional units (register file, ALU, and memories) and two of the multiplexors. The third multiplexor (top) is driven by a combination of the control unit and the Zero output of the ALU, which performs the comparison for the beq instruction and determines whether the next instruction is at PC + 4 or PC + label (for branches)

Pipelining

Technique where multiple instructions are overlapped during execution

Forwarding Paths

The forwarding unit detects a hazard condition. It emits a control signal to change the value selected by the multiplexer.

Hardware provides two execution modes

The hardware must provide at least two execution modes: one for user-level execution and the other for supervisor-level execution. To switch between the two modes, an ecall exception is performed by invoking the ecall assembly instruction. This exception causes the current PC to be saved in the SEPC register, the processor to be placed in supervisor mode, and the PC to be set to the start of the exception handler.

Handling an exception

To handle an exception, the operating system must know why the exception occurred. RISC-V uses an SCAUSE register. When the exception occurs, a code indicating the source of the exception is stored in the SCAUSE register. The same exception handler code starts each time; it inspects the SCAUSE register to determine what action to take. Other systems use vectored interrupts. In this scheme, different exception handlers are invoked depending on the cause of the exception.

Dynamic branch prediction

Track historical branch behaviour and predict based on that history • Use counters to track "taken" vs. "not taken"
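One common way to use such a counter is a 2-bit saturating counter. The sketch below assumes the usual encoding (states 0 and 1 predict not taken, states 2 and 3 predict taken); the class name, the helper function, and the loop pattern are made up for illustration.

```python
class TwoBitPredictor:
    """Saturating counter: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, state=0):
        self.state = state

    def predict(self):
        return self.state >= 2  # True means "predict taken"

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_mispredictions(outcomes, predictor):
    """Run the predictor over a list of real outcomes (True = taken)."""
    wrong = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            wrong += 1
        predictor.update(taken)
    return wrong


# Hypothetical loop branch: taken 9 times, then falls through once.
loop = [True] * 9 + [False]
print(count_mispredictions(loop * 2, TwoBitPredictor()))
```

Because a single not-taken outcome only moves the counter one step, the predictor still guesses "taken" when the loop is entered again.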

If multiple exceptions happen at the same time

We could have multiple exceptions at once ... • A pipelined processor has more than one instruction in flight • Or an external interrupt could occur while an exception is occurring.

Structural Hazard

When a planned instruction cannot execute in the proper clock cycle because the hardware does not support the combination of instructions that are set to execute. * A required resource is busy • In a RISC-V pipeline with a single memory • Load/store requires data access • Instruction fetch would have to stall for that cycle • Hence, pipelined datapaths require separate ports for instruction/data access • Might be implemented as instruction/data caches

Instruction decode and Register Read

a. ID (Instruction decode and Register Read) b. This stage decodes the instruction and reads the register values. c. IF/ID pipeline register, ID/EX register (stores PC values and register values), register file

Fetching instruction (pipelined)

a. IF (fetching the instruction) b. This stage fetches the instruction, and the PC address is saved in the IF/ID register. c. PC, IF/ID pipeline register

Multiplexor

Alternate data sources are used for different instructions - a multiplexor is a device that receives multiple input signals and conveys one selected input to a single output signal

Compiler Tool Chain

C program -> compiler -> assembly language program -> assembler -> object: machine language module (plus object: library routine in machine language) -> linker -> executable: machine language program -> loader -> memory. Converts a high-level program into an executable - the compiler converts the C program into an assembly program - the assembler converts the assembly program into a machine language module - multiple pieces of machine code are composed using a linker/link editor to create an executable program

Ideal speed up

clock cycle (pipelined) = clock cycle (non-pipelined) / number of stages - the speedup is due to increased throughput; latency (the time for each instruction) does not decrease - the number of instructions we can execute in a unit of time does increase

Assembler

convert assembly program into machine language code

Single Cycle data path

does an instruction in one clock cycle

Loader

Places the executable in memory - reads the executable file header to determine the size of the text and data segments - creates an address space large enough (allocation) - copies instructions and data from the executable file into memory - copies parameters, if any, to the main program onto the stack - initializes the processor registers and sets the stack pointer to the first free location. Range of unsigned bits

Set

The index in the cache where a block is placed

ALUOp

Control code that tells the ALU which specific operation to perform

I type

Register set up -> instruction memory -> register file -> mux -> ALU -> mux -> write back to register file

Pipeline five stage processor cycle time

take the longest stage
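The two cycle-time rules (add all stages up for single cycle, take the longest stage for the pipeline) can be compared directly. The stage delays below are hypothetical values chosen for illustration, and the function names are made up.

```python
# Hypothetical stage delays in picoseconds (not measurements of real hardware).
STAGE_DELAYS_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}


def single_cycle_time(delays):
    """Single-cycle processor: the clock must cover every stage in sequence."""
    return sum(delays.values())


def pipelined_cycle_time(delays):
    """Five-stage pipeline: the clock must cover the slowest stage."""
    return max(delays.values())


single = single_cycle_time(STAGE_DELAYS_PS)
pipelined = pipelined_cycle_time(STAGE_DELAYS_PS)
print(single, pipelined, single / pipelined)
```

With unbalanced stages like these, the ratio of the two cycle times is lower than the number of stages, because the faster stages sit idle for part of each pipelined cycle.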

Instruction Level Parallelism

the set of techniques and designs that enable parallel execution of instructions in an architecture. simultaneous execution of instructions from a single thread of execution in a program. the opportunity to execute multiple instructions in a program simultaneously due to a lack of dependence between the instructions.

Bit Mask

value can be used to turn specific bits in a bit vector on or off
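A short sketch of turning bits on and off with a mask, using bitwise OR and AND-with-complement. The function names and sample values are illustrative.

```python
def set_bits(value, mask):
    """Turn ON every bit of `value` that is 1 in `mask`."""
    return value | mask


def clear_bits(value, mask):
    """Turn OFF every bit of `value` that is 1 in `mask`."""
    return value & ~mask


def has_bits(value, mask):
    """True when every bit selected by `mask` is already on in `value`."""
    return (value & mask) == mask


flags = 0b1000
flags = set_bits(flags, 0b0011)     # turn on the two low bits
print(format(flags, "04b"))
flags = clear_bits(flags, 0b1000)   # turn off the high bit
print(format(flags, "04b"))
```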

Pipelining and ISA Design

• All instructions are 32 bits • Easier to fetch and decode in one cycle • Few and regular instruction formats • Can decode and read registers in one step • Load/store addressing • Can calculate the address in the 3rd stage, access memory in the 4th stage

Control Hazards

• Branches change the next instruction to execute • As a result, the pipeline can't always fetch the correct instruction ... or even know what the correct instruction to fetch will be • We could just stall ... but we can't determine the next instruction until AFTER the execution stage ... so it's better to compute the target as early as possible and to make an educated guess about the next instruction.

Reducing Branch Delay

• First, move the hardware that determines the outcome to the ID stage • Target address adder and/or a memory to store previously computed targets (the "branch target buffer") • Register comparator • Second, add hardware to choose whether to load the target address or PC + 4

Instruction Execution

• PC -> instruction memory, fetch instruction • Register numbers -> register file, read registers • Use ALU to calculate • Arithmetic result • Memory address for load/store • Branch comparison • Access data memory for load/store • PC <- target address or PC + 4

Static branch prediction

• Predict backward branches taken (e.g., the end of a loop body) • Predict forward branches not taken (e.g., if statements)

