Hazards
Moving the branch decision up requires two actions to occur earlier
1. Computing the branch target address 2. Evaluating the branch decision.
correlating predictor
A branch predictor that combines local behavior of a particular branch and global information about the behavior of some recent number of executed branches.
tournament branch predictor
A branch predictor with multiple predictions for each branch and a selection mechanism that chooses which predictor to enable for a given branch.
forwarding/bypassing
A method of resolving a data hazard by retrieving the missing data element from internal buffers rather than waiting for it to arrive from programmer-visible registers or memory.
Load-use data hazard
A specific form of data hazard in which the data being loaded by a load instruction have not yet become available when they are needed by another instruction.
Pipeline stall/bubble.
A stall initiated in order to resolve a hazard.
ADD X19, X0, X1 followed by SUB X2, X19, X3
The add instruction doesn't write its result until the fifth stage, meaning we would have to waste three clock cycles in the pipeline.
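The cycle counting behind that figure can be sketched as below. This is only an illustration under stated assumptions: a five-stage pipeline (IF, ID, EX, MEM, WB), a register file whose write in WB is not readable until the following cycle (which is what gives three wasted cycles; a register file that writes in the first half of a cycle and reads in the second half would need one fewer), and a hypothetical helper name stall_cycles.

```python
def stall_cycles(distance=1, forwarding=False, producer_is_load=False):
    """Bubbles needed before a consumer issued `distance` instructions after its producer.

    The producer occupies stage k (IF=1 ... WB=5) in cycle k; an unstalled
    consumer occupies stage k in cycle distance + k.
    """
    if forwarding:
        # Value is produced in EX (cycle 3) for ALU ops, in MEM (cycle 4) for loads,
        # and the consumer needs it at the start of its own EX stage.
        produced_in = 4 if producer_is_load else 3
        consumer_needs_in = distance + 3
    else:
        # Assumption: the value is written in WB (cycle 5) and the consumer
        # must read it in ID in a later cycle.
        produced_in = 5
        consumer_needs_in = distance + 2
    return max(0, produced_in + 1 - consumer_needs_in)


print(stall_cycles(distance=1, forwarding=False))                        # ADD then SUB, no forwarding
print(stall_cycles(distance=1, forwarding=True))                         # ADD then SUB, with forwarding
print(stall_cycles(distance=1, forwarding=True, producer_is_load=True))  # load then use
```

Under these assumptions the three calls give 3, 0, and 1 bubbles: the ADD/SUB pair wastes three cycles without forwarding, none with it, and a load followed immediately by its use still costs one.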
To discard instructions
We merely change the original control values to 0's, much as we did to stall for a load-use data hazard.
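A minimal sketch of that idea, assuming the control values travel through the pipeline as a small dictionary. The signal names and the helper make_bubble are illustrative, not taken from any particular datapath.

```python
def make_bubble(control):
    """Return a copy of the control values with every signal deasserted.

    With RegWrite and MemWrite at 0 the instruction cannot change any
    register or memory location, so it behaves like a no-op.
    """
    return {name: 0 for name in control}


# Hypothetical control values for a load instruction.
load_control = {"RegWrite": 1, "MemRead": 1, "MemWrite": 0, "Branch": 0}
print(make_bubble(load_control))
```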
Structural Hazard
When a planned instruction cannot execute in the proper clock cycle because the hardware does not support the combination of instructions that are set to execute.
Data Hazard
When a planned instruction cannot execute in the proper clock cycle because data that are needed to execute the instruction are not yet available.
Control Hazard/Branch Hazard
When the proper instruction cannot execute in the proper pipeline clock cycle because the instruction that was fetched is not the one that was needed; that is, the flow of instruction addresses is not what the pipeline expected.
Example of Structural Hazard
In the same clock cycle, the first instruction is accessing data from memory while the fourth instruction is fetching an instruction from that same memory. Without two memories, our pipeline could have a structural hazard.
branch target buffer
a structure that caches the destination PC or destination instruction for a branch. It is usually organized as a cache with tags, making it more costly than a simple prediction buffer.
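The tagged organization can be sketched as a small direct-mapped table. This is an illustrative model only: the class name, the entry count, and the use of the word address for index and tag are all assumptions.

```python
class BranchTargetBuffer:
    """Direct-mapped table mapping a branch's PC to its cached target PC."""

    def __init__(self, entries=8):
        self.entries = entries
        self.table = [None] * entries  # each slot holds (tag, target_pc) or None

    def _split(self, pc):
        word = pc >> 2  # instructions are 4 bytes, so drop the low two bits
        return word % self.entries, word // self.entries  # (index, tag)

    def lookup(self, pc):
        """Return the cached target, or None on a miss or tag mismatch."""
        index, tag = self._split(pc)
        slot = self.table[index]
        if slot is not None and slot[0] == tag:
            return slot[1]
        return None

    def record(self, pc, target_pc):
        """Store (or overwrite) the target for the branch at `pc`."""
        index, tag = self._split(pc)
        self.table[index] = (tag, target_pc)


btb = BranchTargetBuffer()
btb.record(0x100, 0x200)
print(btb.lookup(0x100))  # hit: the branch was recorded
print(btb.lookup(0x120))  # same index, different tag: miss
```

The tag comparison is what a simple prediction buffer lacks: without it, the branch at 0x120 would wrongly reuse the target stored for 0x100.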
Moving the branch test to the ID stage implies..
additional forwarding and hazard detection hardware since a branch dependent on a result still in the pipeline must still work properly with the optimization
Why is misprediction on the first iteration inevitable?
because the bit is flipped on prior execution of the last iteration of the loop, since the branch was not taken on that exiting iteration.
How to solve load-use data hazard
even with forwarding, we would have to stall one stage.
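The condition that triggers this one-cycle stall can be sketched as follows: a load is in the EX stage and its destination register matches a source register of the instruction in ID. The dictionary layout and the function name load_use_stall are assumptions made for illustration.

```python
def load_use_stall(id_ex, if_id):
    """True when the instruction in ID must wait one cycle for a load in EX."""
    return bool(id_ex["MemRead"]) and id_ex["Rd"] in (if_id["Rn"], if_id["Rm"])


# LDUR X1, [X2, #0] is in EX; the next instruction is in ID.
load_in_ex = {"MemRead": 1, "Rd": 1}
print(load_use_stall(load_in_ex, {"Rn": 1, "Rm": 4}))  # ADD X3, X1, X4 uses X1
print(load_use_stall(load_in_ex, {"Rn": 5, "Rm": 4}))  # ADD X3, X5, X4 does not
```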
How do data hazards arise?
from the dependence of one instruction on an earlier one that is still in the pipeline.
methods to avoid load-use pipeline stalls
hardware that detects the hazard and stalls, or software that reorders code
What is the harder part of reducing the delay of branches?
is the branch decision itself.
When will the steady-state prediction behavior mispredict?
on the first and last loop iterations
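A small simulation makes this concrete. It is a sketch assuming a 1-bit prediction scheme (predict whatever the branch did last time) and a hypothetical loop whose branch is taken nine times and then falls through once.

```python
def one_bit_mispredictions(outcomes, initial_prediction):
    """Run a 1-bit predictor over `outcomes`; return (missed iterations, final bit)."""
    prediction = initial_prediction
    missed = []
    for i, taken in enumerate(outcomes):
        if prediction != taken:
            missed.append(i)
        prediction = taken  # the single bit simply remembers the last outcome
    return missed, prediction


loop = [True] * 9 + [False]  # taken nine times, then the loop exits

# First pass warms up the predictor; the exit leaves the bit at "not taken".
_, bit = one_bit_mispredictions(loop, initial_prediction=True)

# Second pass is the steady state.
missed, _ = one_bit_mispredictions(loop, initial_prediction=bit)
print(missed)  # [0, 9]
```

The steady-state pass misses on iteration 0 (the bit was flipped by the previous exit) and on iteration 9 (the bit says taken, but the loop exits).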
A typical tournament predictor might contain two predictions for each branch index
one based on local information and one based on global branch behavior.
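One way such a predictor could be organized is sketched below. It is a simplified model under several assumptions that are not part of the definition above: 2-bit saturating counters for both predictions and for the selector, a global table indexed purely by recent branch history, and made-up table sizes and class names.

```python
class TwoBitCounter:
    """Saturating counter in 0..3; values 2 and 3 count as 'high'."""

    def __init__(self, value=1):
        self.value = value

    def high(self):
        return self.value >= 2

    def update(self, up):
        self.value = min(3, self.value + 1) if up else max(0, self.value - 1)


class TournamentPredictor:
    def __init__(self, entries=16, history_bits=4):
        self.entries = entries
        self.history_mask = (1 << history_bits) - 1
        self.history = 0  # outcomes of the most recent branches, newest in bit 0
        self.local = [TwoBitCounter() for _ in range(entries)]
        self.global_ = [TwoBitCounter() for _ in range(1 << history_bits)]
        self.chooser = [TwoBitCounter() for _ in range(entries)]  # high: trust global

    def _index(self, pc):
        return (pc >> 2) % self.entries

    def predict(self, pc):
        i = self._index(pc)
        local_pred = self.local[i].high()
        global_pred = self.global_[self.history].high()
        return global_pred if self.chooser[i].high() else local_pred

    def update(self, pc, taken):
        i = self._index(pc)
        local_ok = self.local[i].high() == taken
        global_ok = self.global_[self.history].high() == taken
        if local_ok != global_ok:
            # Move the selector toward whichever predictor was right.
            self.chooser[i].update(global_ok)
        self.local[i].update(taken)
        self.global_[self.history].update(taken)
        self.history = ((self.history << 1) | int(taken)) & self.history_mask


predictor = TournamentPredictor()
for _ in range(20):
    predictor.update(0x40, True)
print(predictor.predict(0x40))
```

The selector only moves when the two predictions disagree, so it learns per branch index which source of information has been more reliable.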
Why is misprediction on the last iteration inevitable?
since the prediction bit will indicate taken, because the branch was taken on every earlier iteration of the loop.
Move the branch adder from..
the EX stage to the ID stage; of course, the address calculation for branch targets will be performed for all instructions, but only used when needed.
Flush
to discard instructions in a pipeline, usually due to an unexpected event.
one way to improve conditional branch performance is
to reduce the cost of the taken branch.
for compare and branch zero..
we would compare a register read during the ID stage to see if it is zero.
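Both early actions, computing the branch target and evaluating the decision, can be sketched together for a compare-and-branch-on-zero instruction. The function name and arguments are hypothetical; the offset is assumed to be a signed count of 4-byte instructions relative to the branch's own PC.

```python
def cbz_next_pc(pc, reg_value, offset_words):
    """Next PC for a compare-and-branch-on-zero resolved in the ID stage."""
    # Target address: computed for every instruction, used only if needed.
    target = pc + (offset_words << 2)
    # Branch decision: a zero test on the register read in ID.
    taken = reg_value == 0
    return target if taken else pc + 4


print(hex(cbz_next_pc(0x1000, 0, 5)))   # register is zero: branch to the target
print(hex(cbz_next_pc(0x1000, 7, 5)))   # register is nonzero: fall through
```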