CPEN 4700 Exam Two Review
Register Selection Field
The second and fourth fields. They each determine which CPU registers will be used by the instruction. The number of bits determines the number of registers that the machine can have. In this case, three bits are used to identify each register, so the machine can have at most 2^3 = 8 registers (at least of the type used by this instruction).
Mode Selection Field
The third and fifth fields. They determine which addressing modes will be used by the instruction to locate operands (in conjunction with the associated registers). The number of bits determines the number of addressing modes that can be used to identify operands for the machine instructions. • In this case, three bits are used to identify the mode, so the machine can have at most 2^3 = 8 addressing modes for operands.
control transfers
The most basic situation that can cause delays in pipelined instruction processing is that instruction execution is not always sequential. Programs can include branches and other control transfer instructions.
Operation Code Details
The number of op code bits determines the number of different machine language instructions the computer can have. In this case, four bits are used for the op code, so the machine can have at most 2^4 = 16 different instructions. More machine instructions would need more op code bits.
Instruction Set Architecture (ISA)
The part of the computer architecture related to programming, including the native data types, instructions, registers, addressing modes, memory architecture, interrupt and exception handling, and external I/O. Computer systems that share an ISA are compatible - meaning they can execute the same programs
Memory indirect - OPERAND is memory location specified by memory location in instruction
The pointer to the memory operand is itself located in memory. Advantage: can have a virtually unlimited number of active pointers. • Allows us to easily work with multi-element data structures in memory (such as strings and arrays). Disadvantage: access to the operand is delayed due to the time taken to get the pointer from memory; it also complicates the CPU design.
WAW - write after write
The relationship between I1 and I3 is known as an output dependence, and represents a __________ hazard.
Floating point - sign, mantissa, and exponent
The sign, the mantissa, and the exponent are separate parts of the notation that can be varied independently from each other. The number of digits allocated for the mantissa determines precision; the range of exponents controls the range of magnitude for the number.
Single precision exponents are expressed in excess-127 notation. That is to say...
The stored exponent is 127 greater than the actual exponent, such that all stored exponents appear to be positive
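The excess-127 idea can be checked on real bit patterns. The sketch below is illustrative only: the helper name `decode_single` is made up for this example, and it assumes the standard IEEE 754 single-precision layout (1 sign bit, 8 exponent bits, 23 fraction bits). It applies to normalized values; stored exponents of 0 and 255 are reserved for special cases.

```python
import struct

def decode_single(x):
    """Return (sign, stored_exponent, actual_exponent, fraction_bits) for a float
    encoded as IEEE 754 single precision. Hypothetical helper for illustration."""
    # Pack as a big-endian 32-bit float, then reinterpret the bytes as an integer.
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                    # 1 bit
    stored_exp = (bits >> 23) & 0xFF     # 8 bits, excess-127
    fraction = bits & 0x7FFFFF           # 23 bits
    return sign, stored_exp, stored_exp - 127, fraction

print(decode_single(1.0))    # 1.0  = 1.0   x 2^0
print(decode_single(-6.5))   # -6.5 = -1.101 (binary) x 2^2
```

For 1.0 the stored exponent is 127 (actual exponent 0); for -6.5 it is 129 (actual exponent 2).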
ideal speedup factor (Ideal throughput one instruction per cycle)
The task can be equally divided into s subtasks that each take exactly 1/s of the time. • There is no overhead incurred in implementing the pipeline.
Bubble
The typical solution is a hardware solution: having the control unit recognize the dependency relation and stall the dependent instruction in the pipeline for as long as it takes the previous instruction to complete. • Example using a 5 stage pipeline. The term _______ is used for a stall because it is like an air bubble in a water pipe.
Superpipelining
The use of a very deep, high-speed pipeline for instruction processing in a microprocessor is called ______
Register Renaming
The use of dynamically generated tags to identify operands, rather than the static CPU register numbers generated by the programmer (or compiler), is known as _______. Dramatically reduces the need to access CPU registers for operands. • Operands do not have to be read from the register set if they are coming directly from a functional unit. • Only the last of a series of writes to a register actually needs to be committed to it. (The intermediate values can just be sent directly to the appropriate reservation station.)
Top current ISA in CISC- one way in which each has taken some ideas from the RISC Philosophy
The x86-64 architecture is a CISC architecture used in most modern desktop and server computers. It has evolved to include several RISC-inspired features, especially in its microarchitecture. Some of these features include: Complex Instructions Broken Down: x86-64 processors often break down complex CISC instructions into simpler RISC-like micro-operations to be executed efficiently
RAW - read after write
This situation, in which an instruction is dependent on the result being computed by an instruction that comes before it, is called a true data dependence or _______ (RAW) hazard. In this case I2, which reads R3, comes after I1, which writes R3. To avoid the hazard we must make sure the write actually happens before the read.
Input/Output instructions type
Those that enable the CPU to send or receive information to/from devices that connect the computer to the outside world (human users or other machines)
Control transfer instructions type
Those which have the potential to alter the normally sequential flow of instruction execution (by altering the program counter or instruction pointer).
Data Forwarding
To minimize the performance penalty (lost clock cycle[s]) associated with stalling the pipeline to avoid a RAW hazard, some processor designs use _______. _______ provides a direct connection between the output of the circuit computing a result and the input of the circuit that needs that data to perform the next computation.
System instructions Type
Typically those that facilitate control of the system environment by the OS - things we generally don't want user/application programs to be able to do: • Enabling/disabling interrupts • Switching between privilege levels • Cache and MMU control • etc.
*Find the data hazard in a short series of instructions
Use 4.3 slides involving input and output
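As a study aid, the sketch below scans a short instruction sequence and reports RAW, WAR, and WAW relationships between every pair of instructions. The three-instruction program is hypothetical; it was chosen so that I1 writes R3, I2 reads R3, and I3 writes R3 again, matching the I1/I2/I3 relationships described in the RAW, WAR, and WAW cards. The check is deliberately conservative: it compares every pair and ignores intervening writes.

```python
def find_hazards(program):
    """program: list of (dest, sources) tuples in program order.
    Returns (kind, earlier, later, register) tuples, instructions numbered from 1."""
    hazards = []
    for i, (dest_i, srcs_i) in enumerate(program):
        for j in range(i + 1, len(program)):
            dest_j, srcs_j = program[j]
            if dest_i in srcs_j:       # later instruction reads what earlier one writes
                hazards.append(("RAW", i + 1, j + 1, dest_i))
            if dest_j in srcs_i:       # later instruction writes what earlier one reads
                hazards.append(("WAR", i + 1, j + 1, dest_j))
            if dest_i == dest_j:       # both write the same register
                hazards.append(("WAW", i + 1, j + 1, dest_i))
    return hazards

# Hypothetical program (not taken from the slides):
program = [
    ("R3", ("R1", "R2")),   # I1: R3 = R1 + R2
    ("R5", ("R3", "R4")),   # I2: R5 = R3 + R4
    ("R3", ("R6", "R7")),   # I3: R3 = R6 + R7
]
for hazard in find_hazards(program):
    print(hazard)
```

This reports a RAW hazard between I1 and I2, a WAW hazard between I1 and I3, and a WAR hazard between I2 and I3, all on R3.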
Factors that limit number of Registers in CPU core
Used to be due to lack of physical space on chip. Now: • Number of bits available in the instruction format to specify a register • The overhead of saving/restoring registers on a context switch • The need to maintain compatibility with earlier machines • The ability of a compiler to schedule use of the registers
Arithmetic pipelines (Look at 4.2 slides for more clarifying steps)
Used to speed up a particular mathematical operation that must be done repetitively (e.g. in a special-purpose machine such as a vector computer).
Instruction unit pipelines
Used to speed up the processing of a mix of machine language instructions in a general-purpose computer.
bit fields
Within a machine language instruction format of a given number of bits, the bits are divided into this. Each part has a specific meaning to the control unit. EX IMAGE (16 bits divided into a total of five bit fields)
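The 16-bit, five-field format described in these notes (4-bit op code, then a 3-bit register field and a 3-bit mode field for each of two operands) can be decoded with shifts and masks. This is a minimal sketch; the field names and the sample instruction word are made up for illustration.

```python
# Field widths, leftmost field first. Names are hypothetical labels.
FIELDS = [("opcode", 4), ("reg1", 3), ("mode1", 3), ("reg2", 3), ("mode2", 3)]

def decode(word):
    """Split a 16-bit instruction word into its bit fields."""
    assert 0 <= word < (1 << 16)
    fields = {}
    shift = 16
    for name, width in FIELDS:
        shift -= width                                   # move to this field's low bit
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields

# Arbitrary example word: 1010 | 001 | 010 | 111 | 000
print(decode(0b1010_001_010_111_000))
```

With 4 op code bits the decoded value is always in 0..15, and each 3-bit register or mode field is in 0..7, which is where the limits of 16 instructions, 8 registers, and 8 addressing modes come from.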
*Compute penalty without and/or with branch prediction - equations provided
Without: $C_{avg} = 1 + p_b p_t b$. With: $C_{avg} = 1 + p_b b - p_b p_c b + p_b p_t p_c c$. $H = 1/C_{avg}$.
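A quick way to practice with these equations is to plug in numbers. The notes do not define the symbols, so the sketch below assumes: p_b = fraction of instructions that are branches, p_t = probability a branch is taken, b = branch penalty in cycles, p_c = probability the prediction is correct, c = reduced penalty for a correctly predicted taken branch, and H = throughput in instructions per cycle. Check these against the slides before relying on them. The sample values are arbitrary.

```python
def cavg_no_prediction(p_b, p_t, b):
    """Average cycles per instruction without branch prediction: 1 + p_b*p_t*b."""
    return 1 + p_b * p_t * b

def cavg_with_prediction(p_b, p_t, p_c, b, c):
    """Average cycles per instruction with branch prediction:
    1 + p_b*b - p_b*p_c*b + p_b*p_t*p_c*c."""
    return 1 + p_b * b - p_b * p_c * b + p_b * p_t * p_c * c

def throughput(c_avg):
    """H = 1 / C_avg."""
    return 1 / c_avg

# Arbitrary sample values (assumptions, not from the course).
without = cavg_no_prediction(p_b=0.2, p_t=0.5, b=3)
with_bp = cavg_with_prediction(p_b=0.2, p_t=0.5, p_c=0.9, b=3, c=1)
print(round(without, 4), round(throughput(without), 4))
print(round(with_bp, 4), round(throughput(with_bp), 4))
```

With these sample values the average drops from about 1.30 to about 1.15 cycles per instruction.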
Data Dependencies
Yet another thing that can cause problems in a pipelined processor is the presence of dependency relations between instructions. • In other words, the control unit of the processor needs to be "on the lookout" for situations where one instruction needs to use data that are being generated/manipulated by another instruction that might be in the pipeline at the same time.
Tomasulo's method
_____ is essentially a refinement of the scoreboard approach with additional features and capabilities designed to increase concurrency (the ability to perform multiple operations at the same time).
Superthreading
_______ schedules multiple threads onto a processor core on a rotating basis (instructions from a new thread execute each successive clock cycle). • During any given clock cycle, instructions from only one thread are running. This can mitigate the effect of data dependency hazards, since instructions from one thread will not depend on the execution of instructions from other threads.
superscalar
__________ processors take the approach of increasing spatial parallelism (using up more space on the chip by creating additional instruction processing pipelines) rather than temporal parallelism
Hyperthreading
__________ (as used in Intel and IBM POWER processors) attempts to make better use of CPU resources by scheduling multiple threads onto a core without rotating among threads. During any given clock cycle, instructions from more than one thread could be running.
Instruction-level parallelism
although program instructions are written sequentially, in many cases there are multiple instructions that are not dependent on one another, and thus can be executed simultaneously if parallel hardware is available
WAR - write after read
Anytime there are multiple pipelines (or multiple stages within the same pipeline) that can write a result, and instructions can execute out of program order, it is possible for ______ and WAW hazards to cause incorrect results to be computed. The relationship between I2 and I3 is known as an anti-dependence, and represents a ________ hazard.
Registers
are used to hold operands and/or memory addresses for operands
Carry Save Adder tree
Can be used to add together four 6-bit numbers directly (rather than adding them two at a time). • If we only care about the overall sum and don't need the partial sums, this is a more efficient approach. • Only the final stage (that combines the S and C bits to produce the final answer) needs to be a CLA. All the previous stages are simple, cheap CSA circuits.
control unit
Controls and coordinates not only all the activities of the other parts of the CPU core, but ultimately, the operations of the entire computer system. Carries out the steps of the von Neumann machine cycle.
scoreboard
Developed for the CDC 6600. It is a collection of registers and logic that monitors the status of all data registers and functional units in the machine. Its purpose is to schedule the use of hardware functional units by machine language instructions.
Functional units
Different parts of the ALU to which instructions can be assigned by type.
concurrency
doing more than one operation at the same time
Booth's algorithm Continued
Examines the multiplier (second operand) for strings of 0s and 1s. • If we're in the middle of a string of 0s, do nothing but shift (these partial products will all be zero). • Anytime there is a string of 1s, this can be treated as multiplying by (L - R), where L is the weight of the 0 before the 1 on the left end, and R is the weight of the rightmost 1.
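The (L - R) rule can be tried out in software. The sketch below is a behavioral model only, not the shift-register hardware: it scans the multiplier from the least significant bit, subtracts the multiplicand at the right end of each string of 1s (the R term), and adds it at the 0 just to the left of the string (the L term). The function name and the `n_bits` parameter are assumptions for this example; the multiplier is treated as an `n_bits` two's complement value.

```python
def booth_multiply(multiplicand, multiplier, n_bits):
    """Signed multiply using Booth's string-of-1s rule (behavioral sketch)."""
    assert -(1 << (n_bits - 1)) <= multiplier < (1 << (n_bits - 1))
    q = multiplier & ((1 << n_bits) - 1)   # two's complement bit pattern
    product = 0
    prev = 0                               # implied 0 to the right of bit 0
    for i in range(n_bits):
        cur = (q >> i) & 1
        if cur == 1 and prev == 0:
            product -= multiplicand << i   # rightmost 1 of a string: subtract R
        elif cur == 0 and prev == 1:
            product += multiplicand << i   # 0 to the left of a string: add L
        # otherwise we are inside a string of 0s or 1s: just move on (shift)
        prev = cur
    return product

print(booth_multiply(7, 6, 4))     # 6 = 0110 is treated as 8 - 2
print(booth_multiply(-5, -3, 4))   # negative operands also work
```

For the multiplier 0110 the only work done is one subtraction (weight 2) and one addition (weight 8), so 7 x 6 is computed as 7 x 8 - 7 x 2 = 42.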
Sweeney, Robertson, and Tocher (SRT) division (USES LOOKUP TABLE)
Generates two quotient bits at a time instead of one, using a lookup table. The SRT algorithm is used in many microprocessors (including Intel x86).
Instruction types --might be asked to give an example (can describe an action, doesn't necessarily have to be a real instruction mnemonic), or match an example with an instruction type
Data transfer, Computational, Control Transfer, I/O, System control, Miscellaneous
Von Neumann execution cycle
Design of a machine that carries out the steps. Involves: Fetching, decoding, executing, and writing back instructions (FDEW)
Branch target buffer
Dynamic branch prediction schemes in modern microprocessors often make use of a __________ (a.k.a. branch target cache or target instruction cache). This may hold information such as: • Addresses of branch instructions • Addresses of corresponding branch target instructions • Information about the past behavior of the branch (times taken/not taken)
prefetch queue
many processors have an instruction _______ into which anticipated future instructions are placed, in order to funnel them into the pipeline.
Advantage of Program Counter Relative (PC relative)
Memory accesses can be made position-independent. If code and data are relocated to a different area of memory but the relative offset between the PC and the target location is unchanged, the access will still succeed (without needing to recompile/reassemble the code as would be necessary if absolute addresses were used).
Restoring Division
most basic algorithm for unsigned integer division
Miscellaneous instructions Type
Instructions that don't clearly fall into any of the above categories
Conditional branches
see 4.3 slides
Hardwired Control Unit
the control unit is designed as a finite state machine using combinational and sequential logic design techniques
Microprogrammed control (Section 3.3.3)
the control unit is designed through a software methodology using a "computer within a computer" approach.
The chosen operation is specified by...
the op code
Register indirect - OPERAND is in memory location specified in register
The pointer to the memory operand is located in a CPU register. Advantage: speed. • Disadvantage: limited number of registers to use to hold pointers.
Carry lookahead adder
this type of addition circuit develops all carries in logic, directly from the inputs, rather than waiting for them to propagate from less significant bit positions
Ripple Carry Adder
A half adder and (n-1) full adders are cascaded together. The carry out of each adder is connected to the carry in of the next adder to the left.
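The cascade can be modeled bit by bit. This is a minimal sketch (function names are made up for the example) with a half adder in the least significant position and full adders in the remaining n-1 positions; the carry variable plays the role of the wire that ripples from each adder to the next one on its left.

```python
def half_adder(a, b):
    """Return (sum, carry_out) for two input bits."""
    return a ^ b, a & b

def full_adder(a, b, cin):
    """Return (sum, carry_out) for two input bits plus a carry in."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def ripple_carry_add(x, y, n_bits):
    """Add two n_bits-wide unsigned values; return (n_bits sum, final carry out)."""
    s, carry = half_adder(x & 1, y & 1)          # bit 0 needs no carry in
    result = s
    for i in range(1, n_bits):                   # bits 1..n-1 use full adders
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry

print(ripple_carry_add(0b1011, 0b0110, 4))   # 11 + 6 = 17 -> sum bits 0001, carry out 1
```

Because each full adder must wait for the carry from the one before it, the loop mirrors why the total delay grows with the number of bits.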
Real Numbers
Have both an integer part and a fractional part (e.g. 2.875) • Can be very large (e.g. number of stars in a galaxy), very small (e.g. diameter of an atomic nucleus) or anywhere in between • Do not necessarily terminate within a finite number of digits (e.g. 1/3 = 0.333333... infinitely repeats; π = 3.141592653589793... never terminates or repeats)
nanoprogramming
Instead of containing control words, microinstructions contain pointers to locations in an even lower-level control store (nano-memory)
Binary Coded Decimal (BCD)
Now we can build circuits to add things besides "plain old" binary numbers. The BCD adder corrects for the six invalid codes (1010 through 1111) by adding six to the result anytime it exceeds 9 (1001).
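The add-six correction can be seen on a single decimal digit. The sketch below is a behavioral model with a made-up function name; it adds two BCD digits plus a carry in, and adds six whenever the binary result exceeds 9 so that the low four bits land on a valid BCD code and a carry is produced for the next digit.

```python
def bcd_digit_add(a, b, carry_in=0):
    """Add two BCD digits (0-9). Return (BCD digit, carry out)."""
    assert 0 <= a <= 9 and 0 <= b <= 9 and carry_in in (0, 1)
    total = a + b + carry_in      # plain binary addition
    if total > 9:                 # result is an invalid code or overflowed 4 bits
        total += 6                # skip over the six invalid codes 1010-1111
    return total & 0xF, total >> 4

print(bcd_digit_add(7, 5))   # 7 + 5 = 12 -> digit 2, carry 1
print(bcd_digit_add(4, 3))   # 4 + 3 = 7  -> digit 7, carry 0
```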
Wallace Tree
Once we have the partial products PP0 through PP3, we can use a tree of carry save adders to sum them together to get the product. This design can easily be expanded to multiply larger numbers.
Memory-Register Architecture
One in which the operands for computational instructions can be held in either CPU registers or memory locations. Example: Intel x86. It allows either or both operands to reside in registers, and either (but not both) of the operands to reside in memory locations. So, where EAX and EBX are registers and var1 and var2 are variables in memory, we have:
vertical Microprogramming
One method that can be used to reduce the width of the control store (and thus its overall size)
Pipelining
One of the primary techniques used to improve efficiency and increase performance of modern processors. The basic idea of pipelining is to divide a task into subtasks, and then overlap performance of the subtasks for multiple iterations of the task.
Register Addressing Example
Operand is in one of the CPU registers as specified in the instruction Example (The bit field in blue tells us the operand is currently stored in R1 (00001))
Stack addressing - register indirect with autoincrement
Operands are located within a stack (Last In, First Out data structure) in memory. • A push operation is used to store an item at the top of the stack. • A pop (sometimes called pull) operation removes the item currently at the top of the stack.
Advantage of a Memory-Register ISA
Operands don't have to be loaded into registers before they can be used - they can be operated on directly in memory. This reduces the need for copying of values between memory and the register set.
RCA (Ripple Carry Adder) Optimization
Optimized for low implementation cost, at the sacrifice of being slow (total delay is the sum of the delays of all the full adders).
CLA (Carry Lookahead Adder) Optimization
Optimized for speed, at the sacrifice of an extremely high implementation cost.
Branch prediction
Tries to guess whether or not each branch will be taken and provides this information to the instruction fetching hardware so it will fetch the instructions most likely to be executed. Static branch prediction is done before the program actually runs (usually by the compiler). Dynamic branch prediction is done by the CPU's control logic "on the fly" (while the program is actually running), typically by the control unit keeping track of some amount of history on the behavior of branch instructions in the code.
speedup factor - approaches number of pipeline stages (Look at 4.1 slides for clarifying notes)
The speedup from using the s-stage pipeline to perform n iterations of the task is (n * s)/(n + s - 1), which approaches s as a limit as n becomes large
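Plugging numbers into (n * s)/(n + s - 1) shows how the speedup climbs toward s. This is a small sketch; the choice of a 5-stage pipeline and the iteration counts are arbitrary.

```python
def speedup(n, s):
    """Speedup of an s-stage pipeline over n iterations: (n * s) / (n + s - 1)."""
    return (n * s) / (n + s - 1)

for n in (1, 10, 100, 10_000):
    print(n, round(speedup(n, 5), 3))
```

With a single iteration there is no overlap, so the speedup is exactly 1; as n grows, the s - 1 cycles needed to fill the pipeline matter less and the value approaches 5.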
Ideal number of operands per instruction
usually three: two "source" operands that provide data for a computation, plus one "destination" operand that receives the result of the computation
Best CASE for five CLA gate propagation delays *Compute delay or possibly fan-in for a CLA given restrictions
• 1 to compute all the P and G functions (all done in parallel) • 1 to AND all the terms together (in parallel) • 1 to OR the ANDed terms together to get the carries (in parallel) • 2 to pass through the full adders (all in parallel)
RISC Characteristics (WILL need to compare between this and CISC)
1. Fixed-length instructions are used to simplify instruction fetching. 2. Few instruction formats in order to simplify instruction decoding. 3. A load-store instruction set architecture is used to decouple memory accesses from computations so that each can be optimized independently. 4. Instructions have simple functionality, which helps keep the control unit design simple. 5. A hardwired control unit optimizes the machine for speed. 6. The architecture is designed for pipelined implementation, again to optimize for speed of execution. 7. Only a few, simple addressing modes are provided because complex ones may slow down the machine and are rarely used by compilers.
ARM
16 general purpose registers Some Registers are duplicated so that the CPU can respond to interrupts more quickly. When the mode of the CPU switches, some registers are banked and become inaccessible to the current mode.
Top current ISA in RISC- one way in which each has taken some ideas from the CISC Philosophy
ARM (Acorn RISC Machine) is a well-known RISC architecture that is commonly used in mobile devices, embedded systems, and increasingly in desktop and server applications. While ARM is fundamentally a RISC architecture, it has incorporated some CISC-inspired features in recent iterations, such as the ARMv8-A architecture used in 64-bit processors. These features include: Atomic Operations- Allow for more complex atomic operations like compare-and-swap, load-linked/store-conditional, and synchronization primitives
Stalls
Accessing main memory for an operand can result in a delay of several clock cycles - during which time the pipeline ______
Advantages/Disadvantages of Direct addressing
Advantage: most modern computers have very large amounts of RAM main memory, so we can have essentially as many direct-addressed variables as we like. Disadvantages: 1. Access to variables stored in main memory will be slower than access to register variables. (Even variables residing in cache aren't as fast as register access.) 2. If the memory space is large, the bit field required to hold a memory address is large (e.g. if we have 4 GB of memory space, it takes 32 bits to hold an address). 3. Address in the instruction is fixed. (No self-modifying code.) Like register addressing, direct addressing is good for working with scalar (individual) variables, but is rather clumsy for working with strings, arrays, etc.
Advantages/Disadvantages of Register Addressing
Advantages: 1. Operands kept in registers can be accessed very quickly (registers are part of the CPU core and connect directly to the ALU). 2. We usually only need a small bit field to identify a CPU register (3 bits if there are 8 registers, 4 bits if there are 16 registers, 5 bits if there are 32). Disadvantages: 1. There are usually only a limited number of CPU registers (see previous bullet), and some of them may be needed for other uses. (We can keep only the most frequently used variables in registers.) 2. Registers are fine for scalar (individual) variables, but cannot generally hold large, multi-element data structures such as strings or arrays. (In a modern 64-bit processor, most of the general purpose registers are 64 bits.)
Advantages/Disadvantages of Load-store architecture
Advantages: Machine instructions in a load-store ISA are usually shorter and more likely to fit into a fixed number of bits. • This contributes to ease of instruction unit pipelining (Section 4.3). Disadvantage: May take more machine instructions to carry out the same task (vs. a memory-register architecture). • It is also more work for a compiler to manage the larger register set, plus saving and restoring more registers on a context switch or interrupt can take longer.
Memory-Memory Architecture.
Allows all of the operands for computational instructions to reside in main memory
CPU's Three Major Components
• An Arithmetic/Logic Unit (ALU) that performs computations on binary data • Registers that are used to hold operands and/or memory addresses for operands • A control unit that controls and sequences the behavior of the other components (and rest of system) based on programmed instructions
delayed loads
An alternative approach would be to document that load operations are delayed (similar to a delayed control transfer instruction) and thus do not take effect until after the following instruction. In other words, the following sequence would use the "old" value in R5 (from before the LOAD) ... ◦ LOAD VALUE, R5 ◦ ADD R5, R4, R3
Booth's algorithm
Another approach to building a signed multiplication circuit. (iterative solution) The basic idea behind it is that every binary number is comprised of strings of 0s and/or 1s.
Delayed control transfers
Another approach that can be used to minimize the performance penalty associated with nonsequential code execution in a pipelined processor. In conventional instruction set architectures, control transfer instructions (jumps, calls, conditional branches for which the condition is true, etc.) take effect immediately - the next instruction executed is the one at the target location.
CISC Characteristics (WILL need to compare between this and RISC)
CISC architectures generally have the opposite characteristics vs. RISC, for example: • Often use microprogrammed control units, which provide more flexibility in executing complex instructions but can be slower than hardwired control • Typically have fewer general-purpose registers than RISC architectures, which can lead to more memory accesses for temporary data storage • Variable-length instructions rather than fixed-length • A memory-memory or memory-register instruction set rather than load/store • Many and/or complex addressing modes rather than just a few simple ones
Datapath
The CPU's internal circuitry to store and manipulate binary values; it carries out the execution of the machine language instructions. It is comprised of: • Register set • Functional hardware: ALU, shifter, etc.
Register Set
Collection of D flip-flops with appropriate hardware (multiplexers, demultiplexers, decoders, etc.) for routing data to/from them, enabling reading/writing of data.
indexed (or displacement) addressing. - memory location + immediate constant
Combines a pointer in a CPU register with a constant offset (displacement) encoded into the instruction. The sum of those two values determines the location of the operand in memory. Similar to array indexing.
Computational Instructions type
Computational instructions produce a numerical result(s) based on the operands and the operation performed on them. 1. Integer arithmetic (ex. addition, multiplication) 2. Real-number arithmetic (if needed for the intended applications) (ex. also addition, multiplication) 3. Boolean logic (ex. AND, OR) 4. Bit shifting (moving bits left or right) 5. Comparisons (ex. <, >, =)
Data Transfer Instruction Type
Copy data from one place to another within the machine (without doing any actual computation) Memory to register. Register to memory. Register to register. Memory to memory (in some machines). Constant to register or memory
Tomasulo's method Bit and Tag, reservation stations, and
Each data register has a busy bit and a tag field associated with it. • The busy bit is set when an instruction specifies that register as its destination, and cleared when the register receives the result of that instruction. • The tag field identifies which functional unit will compute the result being sent to that register. Each functional unit has reservation stations, which are input registers that can hold operands for computations to be performed by that unit. All the functional units and the data registers are connected together by a common data bus (CDB). The CDB feeds results computed by the functional units back to the data registers (and also to the reservation stations that serve as inputs to the functional units).
Horizontal Microprogramming
Each microword will have a bit position for every control signal that the machine needs to operate
Immediate Addressing -- OPERAND is in Instruction
Embed the operand itself (in binary) into the machine language instruction. In other words, the instruction specifies explicitly what the operand is. Primary advantage of immediate addressing is speed of operation. Since the operand is embedded in the code, and since most architectures do not allow for self-modifying code, it's good for constant operands only.
*Given an instruction in assembly/binary and some information about it, show the opposite and describe the operands and what it does
Find Example in 3.1 Slideshow
VLIW Advantages
The instruction format for a VLIW architecture contains many more bits than the instruction format for a typical RISC or CISC processor. Each "very long" instruction contains enough bits to specify the equivalent of several (three, four, or even more) machine instructions in a conventional architecture. • Ideally, the number of "slots" (independently programmable operations) in one VLIW instruction matches the number of parallel functional units that are available for executing instructions. They can achieve higher performance than similar superscalar CPUs.
Program Counter Relative (PC relative)
Instead of adding the displacement to the contents of a general-purpose CPU register, it is added to the program counter register (the one that keeps track of execution of program code).
Single and double precision (32 and 64 bit)
IEEE 754 defined two principal floating-point number formats: single precision (32 bits) and double precision (64 bits). • The Intel x87 floating-point registers (that hold real number values in x86 processors) are 80 bits wide, corresponding to the double extended precision. • Typical equivalent variable types in high-level languages include: • float = IEEE single precision (32 bits) • double = IEEE double precision (64 bits) • long double (not supported by all compilers) = IEEE double extended precision (80 bits)
nop
If no independent instruction can be found, the compiler (or human assembly language programmer) can place a ____ (no operation) in the delay slot
flushed from pipeline
If the pipeline proceeds to fetch instructions from the correct path, execution can proceed normally; but if instructions from the incorrect path are fetched and proceed some number of stages down the pipeline, they must be _______ from the pipeline to avoid incorrect execution.
Instruction Addressing Modes Summary
• Immediate addressing tells us what the operand is. Good for constant data; no good for variables. • Register addressing tells us where the operand is in the register set. • Direct addressing tells us where the operand is in main memory.
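The addressing modes on these cards differ only in how many lookups stand between the instruction field and the operand. The sketch below models that with a toy register file and memory held in dictionaries. Everything here (function name, mode names as strings, register numbers, addresses, and contents) is hypothetical and chosen only to make the differences visible.

```python
def fetch_operand(mode, field, regs, mem, offset=0):
    """Return the operand selected by an instruction field under a given mode."""
    if mode == "immediate":
        return field                     # the field IS the operand
    if mode == "register":
        return regs[field]               # operand is in a register
    if mode == "direct":
        return mem[field]                # field is the operand's memory address
    if mode == "register_indirect":
        return mem[regs[field]]          # register holds a pointer to the operand
    if mode == "memory_indirect":
        return mem[mem[field]]           # memory location holds the pointer
    if mode == "indexed":
        return mem[regs[field] + offset] # register pointer plus constant displacement
    raise ValueError(f"unknown mode: {mode}")

# Toy machine state (made-up values).
regs = {1: 100}
mem = {100: 42, 104: 7, 200: 100}

print(fetch_operand("immediate", 5, regs, mem))          # 5
print(fetch_operand("register", 1, regs, mem))           # 100
print(fetch_operand("direct", 100, regs, mem))           # 42
print(fetch_operand("register_indirect", 1, regs, mem))  # 42
print(fetch_operand("memory_indirect", 200, regs, mem))  # 42
print(fetch_operand("indexed", 1, regs, mem, offset=4))  # 7
```

Memory indirect needs two memory reads where register indirect needs one, which is the delay listed as its disadvantage.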
Disadvantage of Memory-Register ISA
Implementing this type of ISA usually requires variable-length instructions (which also may take a variable amount of time to execute). • Control unit design may be more complex • It may be harder to pipeline instruction execution (see Chapter 4) • Instructions may not execute as quickly
Carry Save Adder (CSA)
In a CSA, the carries are not cascaded from one bit position to the next. Instead, each full adder takes three inputs and produces two outputs. This doesn't do much good all by itself, but the CSA stages can be cascaded together in a tree structure to add together more than two binary numbers.
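A carry save stage can be modeled with bitwise operations: every bit position acts as an independent full adder that turns three input bits into a sum bit and a carry bit, with no carry passed sideways. The sketch below (function names made up for the example) chains two such stages to add four numbers, then does one ordinary carry-propagating add at the end, which is the role the CLA plays in the CSA tree described in these notes.

```python
def carry_save_add(a, b, c):
    """One CSA stage: three numbers in, (sum vector, carry vector) out."""
    sum_bits = a ^ b ^ c                            # per-bit sum, no carry propagation
    carry_bits = ((a & b) | (a & c) | (b & c)) << 1 # per-bit carry, weighted one place left
    return sum_bits, carry_bits

def add_four(w, x, y, z):
    """Add four numbers with two CSA stages and one final propagating add."""
    s1, c1 = carry_save_add(w, x, y)
    s2, c2 = carry_save_add(s1, c1, z)
    return s2 + c2          # final stage: the only place carries ripple or look ahead

print(add_four(5, 9, 17, 33))     # four 6-bit values
print(add_four(63, 63, 63, 63))   # largest 6-bit inputs
```

The key invariant is that a + b + c always equals the sum vector plus the carry vector, so no information is lost by deferring carry propagation.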
Load-store architecture
In such an ISA, the only instructions that access memory for operands are load (transfer data from memory to a register) and store (transfer data from a register to memory).
Advantages of wider Registers
In terms of storing values, more bits means a greater range of integer values and greater range/more precision for real values. In terms of memory addressing, more bits enables a larger address space: 16 bits can only address 64 KB; 32 bits can address 4 GB; 64 bits allows us to address up to 16 EB (2^64 bytes).
SPARC's unique Organization structure
Instead of grouping registers by what they hold (data vs. addresses), they are grouped by their scope within a program: • Global registers are visible within all procedures • Local registers contain data private to the currently running procedure • Out registers are used to pass arguments to a called procedure • In registers are used to receive parameters from the calling procedure
memory data register (MDR)
It is a bidirectional register (inputs and outputs on both sides) that holds the data being read from (or written to) memory.
Restoring Division Continued
It is analogous to long division by hand. We iteratively attempt to see if the divisor "goes into" the partial dividend (by subtracting it), moving one bit to the right each time. • If we get a positive or zero partial remainder, then we put a 1 in the corresponding quotient bit. • If we get a negative partial remainder, we put a 0 in the quotient and add back (restore) the divisor to the dividend. • The algorithm continues until the entire dividend is processed.
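The steps above can be sketched in a few lines. This is a behavioral model for unsigned operands, not a register-level design; the function name and the `n_bits` parameter are assumptions for the example.

```python
def restoring_divide(dividend, divisor, n_bits):
    """Unsigned restoring division. Returns (quotient, remainder)."""
    assert divisor > 0 and 0 <= dividend < (1 << n_bits)
    remainder = 0
    quotient = 0
    for i in range(n_bits - 1, -1, -1):
        # Bring down the next dividend bit (move one bit to the right).
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        remainder -= divisor                 # trial subtraction
        if remainder >= 0:
            quotient = (quotient << 1) | 1   # divisor "goes into" it: quotient bit 1
        else:
            remainder += divisor             # restore the partial remainder
            quotient = quotient << 1         # quotient bit 0
    return quotient, remainder

print(restoring_divide(13, 3, 4))   # 13 / 3 = 4 remainder 1
```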
instruction register (IR)
It is part of the control unit. It holds the current machine language instruction so it can be decoded - which will tell the control unit what needs to be done.
Memory Address Register (MAR)
It is used to provide an address to the memory system whenever we need to read or write a memory location.
program counter register (PC)
It keeps track of where in memory the current instruction is located. As part of the execution of the current instruction, it is incremented to point to the next instruction.
Memory-register or Load-Store Oriented
Most of today's popular computer architectures have two- or three-operand instructions and a reasonably large collection of internal registers. This is how we classify them.
Nonrestoring Division
Removes the need to restore the divisor back to the partial dividend. It generates quotient bits that represent 1 and -1 instead of 1 and 0; the result has to be resolved back into the correct binary value at the end of the computation.
ALU Function
Should at least be able to do binary addition and subtraction, a set of Boolean operations, and preferably have some bit shifting capability.
Integer Multiplication
Since the logical AND function implements one-bit binary multiplication, we can generate the partial products with four AND gates each (16 total).
Important Aspects of Register Set
Size and logical organization. Size: • Width of each register (in bits) • Number of registers provided. Width: register width in microprocessors has increased over the years from 8 to 16 to 32 and now 64 bits.
Ring Counter
The control step counter could be implemented using a ring counter or a binary counter plus a decoder.
Data Hazards
The combination of register renaming and data forwarding helps to minimize the occurrence of stalls caused by _____
Control Store
The control unit itself is programmable; it contains a special memory called ____, from which it fetches microinstructions in much the same manner that the CPU as a whole fetches machine instructions from main memory.
Branch Penalty
The default behavior is usually to fetch instructions from the sequential execution path. If the branch condition is false, this will be correct. But if the branch condition turns out to be true, fetching the incorrect instructions will incur a _______ (one or more wasted clock cycles due to discarding wrongly fetched instructions)
radix point
The dot that separates the whole part from the fractional part in a real number in any base
IEEE 754 standard
The group came up with a set of specifications for representing and operating on floating-point numbers. IEEE assigned standard number 754 to the document when it was finalized in 1985.
Multithreading Purpose and Meaning
The idea here is that by running multiple threads, we can: 1. Help avoid stalls due to data and control hazards 2. Help cover up memory system latency 3. Make better use of CPU resources (keep hardware functional units busy more of the time)... particularly when individual threads have limited ILP
delay slot
The instruction(s) following the delayed control transfer instruction are said to be in its _____. Most architectures that implement this feature have only one delay slot behind each control transfer.
Operation Code (Op code)
The leftmost field (4 bits). It determines the function of the instruction: what will it do?
Overhead from buffer registers (Look at 4.1 slides for clarifying notes)
The logic for the pipeline stages has propagation delay. The pipeline registers introduce delays also!
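A rough way to see the effect of register overhead is to compute the clock period as the slowest stage delay plus the pipeline register delay. The sketch below uses hypothetical delay values and a simplified model in which the unpipelined datapath takes the sum of all stage delays; neither the numbers nor the function names come from the course slides.

```python
def pipeline_clock_period(stage_delays_ns, register_delay_ns):
    """Clock period: the slowest stage plus the pipeline register delay."""
    return max(stage_delays_ns) + register_delay_ns


def pipeline_speedup(stage_delays_ns, register_delay_ns):
    """Steady-state speedup over an unpipelined datapath (simplified model)."""
    unpipelined_ns = sum(stage_delays_ns)
    return unpipelined_ns / pipeline_clock_period(stage_delays_ns, register_delay_ns)


stages = [10, 10, 10, 10]                 # hypothetical stage logic delays (ns)
print(pipeline_clock_period(stages, 2))   # 12
print(pipeline_speedup(stages, 0))        # 4.0 with ideal (zero-delay) registers
print(pipeline_speedup(stages, 2))        # less than 4 once register delay counts
```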
Normalization
The mantissa is always expressed with the radix (binary) point in the same position with respect to the significant digits.
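As an illustration, the sketch below normalizes a nonzero value so the binary point sits just after the leading 1 (the 1.xxx form used by IEEE 754 for normalized numbers). The function name `normalize` is an assumption for the example; `math.frexp` is the standard-library call that returns a mantissa in [0.5, 1) and an exponent, which is then rescaled.

```python
import math


def normalize(x: float):
    """Return (mantissa, exponent) with 1 <= |mantissa| < 2 and x == mantissa * 2**exponent."""
    if x == 0:
        raise ValueError("zero has no normalized form")
    m, e = math.frexp(x)      # x == m * 2**e with 0.5 <= |m| < 1
    return m * 2, e - 1       # move the binary point just after the leading 1


print(normalize(6.5))      # (1.625, 2): 6.5 = 1.101 (binary) x 2^2
print(normalize(0.15625))  # (1.25, -3)
```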
Addressing Modes Definition
The means provided by the architecture to specify the value(s) and/or location(s) of instruction operands.
Direct Addressing - OPERAND is in specified memory location (in instruction)
Uses a bit field within the instruction to specify which main memory location contains an operand (and/or receives the result). • Instead of op code followed by the operand (as in immediate addressing), we have op code followed by the memory address of the operand
Register Addressing
Uses a bit field within the instruction to specify which of the several CPU registers contains an operand (and/or receives the result computed by the ALU). Advantages: 1. Operands kept in registers can be accessed very quickly (registers are part of the CPU core and connect directly to the ALU). 2. We usually only need a small bit field to identify a CPU register (3 bits if there are 8 registers, 4 bits if there are 16 registers, 5 bits if there are 32).
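The field widths quoted above follow from needing enough bits to give each register a distinct number. A small Python check, with `register_field_bits` as a hypothetical helper name:

```python
def register_field_bits(num_registers: int) -> int:
    """Smallest field width (in bits) that can name every register."""
    if num_registers < 1:
        raise ValueError("need at least one register")
    # The highest register number is num_registers - 1; count its bits.
    return max(1, (num_registers - 1).bit_length())


for n in (8, 16, 32):
    print(n, register_field_bits(n))  # 8 -> 3, 16 -> 4, 32 -> 5
```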
Fixed-Point Notation - Disadvantage
Using a finite number of bits, we are trading off range vs. precision. To get more precision we have to sacrifice range, and vice versa. Works reasonably well when all values fall within a limited range, but is extremely inefficient and unwieldy when we have to work with both very large and very small numbers.
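The tradeoff can be made concrete with a short Python sketch for an unsigned fixed-point format. The function name `fixed_point_range` and the 8-bit word size are assumptions chosen for illustration; moving bits from the integer part to the fraction part shrinks the largest representable value while making the step size finer.

```python
def fixed_point_range(total_bits: int, frac_bits: int):
    """Unsigned fixed point: return (largest value, resolution) for a given split."""
    if not 0 <= frac_bits <= total_bits:
        raise ValueError("frac_bits must be between 0 and total_bits")
    resolution = 2.0 ** -frac_bits                  # smallest step between values
    largest = (2 ** total_bits - 1) * resolution    # all bits set to 1
    return largest, resolution


for frac in (0, 4, 8):
    print(frac, fixed_point_range(8, frac))
# 0 fraction bits: largest 255.0, step 1.0
# 4 fraction bits: largest 15.9375, step 0.0625
# 8 fraction bits: largest 0.99609375, step 0.00390625
```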
VLIW Disadvantages
VLIW architectures are not compatible with previous architectures, and computer users value compatibility. Other disadvantages of VLIW include: 1. Need for a very complex and specialized compiler 2. Large code size (poor code density) compared to conventional architectures (have to encode hundreds of bits per instruction whether or not all of them specify active operations) 3. Need for a very large register set 4. High memory bandwidth is needed to support transfers of "very long" instructions and data 5. Poor performance on branch-intensive code
VLIW - what it's trying to do
Very Long Instruction Word (VLIW) architectures are an alternative approach that moves the instruction scheduling burden from hardware into software (the compiler).
Arithmetic hardware
What puts the "compute" in a computer. Some applications need to work with real numbers. All applications need to be able to do arithmetic (plus Boolean logic, shifting, etc.) on integer values, both signed and unsigned.