CPEN 4700 Exam Two Review

Register Selection Field

The second and fourth fields. Each determines which CPU register will be used by the instruction. The number of bits determines the number of registers that the machine can have. In this case, three bits are used to identify each register, so the machine can have at most 2^3 = 8 registers (at least of the type used by this instruction).

Mode Selection Field

The third and fifth fields. They determine which addressing modes the instruction will use to locate its operands (in conjunction with the associated registers). The number of bits determines the number of addressing modes that can be used to identify operands for machine instructions. In this case, three bits identify the mode, so the machine can have at most 2^3 = 8 addressing modes for operands.

control transfers

The most basic situation that can cause delays in pipelined instruction processing is that instruction execution is not always sequential. Programs can include branches and other control transfer instructions.

Operation Code Details

The number of op code bits determines the number of different machine language instructions the computer can have. In this case, four bits are used for the op code, so the machine can have at most 2^4 = 16 different instructions. More machine instructions would require more op code bits.

Instruction Set Architecture (ISA)

The part of the computer architecture related to programming, including the native data types, instructions, registers, addressing modes, memory architecture, interrupt and exception handling, and external I/O. Computer systems that share an ISA are compatible - meaning they can execute the same programs

Memory indirect - OPERAND is memory location specified by memory location in instruction

The pointer to the memory operand is itself located in memory. Advantage: can have a virtually unlimited number of active pointers. • Allows us to easily work with multi-element data structures in memory (such as strings and arrays). Disadvantage: access to the operand is delayed by the time taken to get the pointer from memory; it also complicates the CPU design.

WAW - write after write

The relationship between I1 and I3 is known as an output dependence, and represents a __________ hazard.

Floating point - sign, mantissa, and exponent

The sign, the mantissa, and the exponent are separate parts of the notation that can be varied independently from each other. The number of digits allocated for the mantissa determines precision; the range of exponents controls the range of magnitude for the number.

Single precision exponents are expressed in excess-127 notation. That is to say...

The stored exponent is 127 greater than the actual exponent, such that all stored exponents appear to be positive
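
As a quick illustration of the excess-127 idea, the sketch below pulls the three fields out of a single-precision value using Python's standard `struct` module. The function name `decode_single` is made up for this example, and the "actual exponent" it reports is only meaningful for normalized numbers (stored exponent 1 through 254).

```python
import struct

def decode_single(x: float):
    """Split an IEEE 754 single-precision value into its bit fields.

    Returns (sign, stored_exponent, actual_exponent, fraction_bits).
    The actual exponent assumes a normalized number.
    """
    # Pack as a 32-bit float, then reinterpret the same 4 bytes as an unsigned int.
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                    # 1 bit
    stored_exp = (bits >> 23) & 0xFF     # 8 bits, excess-127
    fraction = bits & 0x7FFFFF           # 23 bits
    return sign, stored_exp, stored_exp - 127, fraction

# 2.875 = 1.0111 (binary) x 2^1, so the stored exponent is 1 + 127 = 128.
print(decode_single(2.875))
```

Because the bias is added before storing, a negative actual exponent such as -1 (for 0.5) is stored as the positive value 126.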

ideal speedup factor (Ideal throughput one instruction per cycle)

The task can be equally divided into s subtasks that each take exactly 1/s of the time. • There is no overhead incurred in implementing the pipeline.

Bubble

The typical solution is a hardware solution: having the control unit recognize the dependency relation and stall the dependent instruction in the pipeline for as long as it takes the previous instruction to complete. • Example using a 5 stage pipeline. The term _______ for a stall is like an air bubble in a water pipe.

Superpipelining

The use of a very deep, high-speed pipeline for instruction processing in a microprocessor is called ______

Register Renaming

The use of dynamically generated tags to identify operands, rather than the static CPU register numbers generated by the programmer (or compiler), is known as _______. Dramatically reduces the need to access CPU registers for operands. • Operands do not have to be read from the register set if they are coming directly from a functional unit. • Only the last of a series of writes to a register actually needs to be committed to it. (The intermediate values can just be sent directly to the appropriate reservation station.)

Top current ISA in CISC- one way in which each has taken some ideas from the RISC Philosophy

The x86-64 architecture is a CISC architecture used in most modern desktop and server computers. It has evolved to include several RISC-inspired features, especially in its microarchitecture. Some of these features include: Complex Instructions Broken Down: x86-64 processors often break down complex CISC instructions into simpler RISC-like micro-operations to be executed efficiently

RAW - read after write

This situation, in which an instruction is dependent on the result being computed by an instruction that comes before it, is called a true data dependence or _______ (RAW) hazard. In this case I2, which reads R3, comes after I1, which writes R3. To avoid the hazard we must make sure the write actually happens before the read.

Input/Output instructions type

Those that enable the CPU to send or receive information to/from devices that connect the computer to the outside world (human users or other machines)

Control transfer instructions type

Those which have the potential to alter the normally sequential flow of instruction execution (by altering the program counter or instruction pointer).

Data Forwarding

To minimize the performance penalty (lost clock cycle[s]) associated with stalling the pipeline to avoid a RAW hazard, some processor designs use _______. __________ provides a direct connection between the output of the circuit computing a result and the input of the circuit that needs that data to perform the next computation.

System instructions Type

Typically those that facilitate control of the system environment by the OS - things we generally don't want user/application programs to be able to do: • Enabling/disabling interrupts • Switching between privilege levels • Cache and MMU control • etc.

*Find the data hazard in a short series of instructions

Use 4.3 slides involving input and output

Factors that limit number of Registers in CPU core

Used to be due to lack of physical space on chip. Now... Number of bits available in the instruction format to specify a register The overhead of saving/restoring registers on a context switch The need to maintain compatibility with earlier machines The ability of a compiler to schedule use of the registers

Arithmetic pipelines (Look at 4.2 slides for more clarifying steps)

Used to speed up a particular mathematical operation that must be done repetitively (e.g. in a special-purpose machine such as a vector computer).

Instruction unit pipelines

Used to speed up the processing of a mix of machine language instructions in a general-purpose computer.

bit fields

Within a machine language instruction format of a given number of bits, the bits are divided into these. Each part has a specific meaning to the control unit. Example: 16 bits divided into a total of five bit fields.
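
A minimal sketch of how a 16-bit word could be split into five bit fields follows. The layout is an assumption pieced together from the cards in this set: a 4-bit op code on the left, then a 3-bit register field, a 3-bit mode field, a second 3-bit register field, and a second 3-bit mode field. The field names and the sample word are hypothetical.

```python
def decode_fields(word: int) -> dict:
    """Split a 16-bit instruction word into five bit fields.

    Assumed layout (left to right): opcode[4] reg_a[3] mode_a[3] reg_b[3] mode_b[3].
    """
    if not 0 <= word < (1 << 16):
        raise ValueError("instruction word must fit in 16 bits")
    return {
        "opcode": (word >> 12) & 0xF,  # 4 bits -> up to 2^4 = 16 instructions
        "reg_a":  (word >> 9) & 0x7,   # 3 bits -> up to 2^3 = 8 registers
        "mode_a": (word >> 6) & 0x7,   # 3 bits -> up to 2^3 = 8 addressing modes
        "reg_b":  (word >> 3) & 0x7,
        "mode_b": word & 0x7,
    }

# Hypothetical instruction word, written with the field boundaries marked.
print(decode_fields(0b1010_011_001_110_000))
```

Shifting moves the wanted field to the low end of the word, and the mask (0xF or 0x7) keeps only as many bits as the field is wide.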

*Compute penalty without and/or with branch prediction - equations provided

Without: C_avg = 1 + p_b p_t b. With: C_avg = 1 + p_b b - p_b p_c b + p_b p_t p_c c. H = 1/C_avg.
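
The two penalty equations can be evaluated with a few lines of code. The card does not define the symbols, so the readings used here are assumptions: p_b is the probability that an instruction is a branch, p_t the probability that a branch is taken, p_c the probability that a prediction is correct, b the full branch penalty in cycles, and c the reduced penalty for a correctly predicted taken branch. The sample numbers are made up.

```python
def cavg_without_prediction(p_b: float, p_t: float, b: float) -> float:
    """Average cycles per instruction: C_avg = 1 + p_b * p_t * b."""
    return 1 + p_b * p_t * b

def cavg_with_prediction(p_b: float, p_t: float, p_c: float, b: float, c: float) -> float:
    """Average cycles: C_avg = 1 + p_b*b - p_b*p_c*b + p_b*p_t*p_c*c."""
    return 1 + p_b * b - p_b * p_c * b + p_b * p_t * p_c * c

def throughput(c_avg: float) -> float:
    """H = 1 / C_avg (instructions completed per cycle)."""
    return 1 / c_avg

# Made-up workload: 20% branches, 60% of them taken, 3-cycle penalty.
without = cavg_without_prediction(p_b=0.2, p_t=0.6, b=3)
# Same workload with a 90%-accurate predictor and a 1-cycle reduced penalty.
with_pred = cavg_with_prediction(p_b=0.2, p_t=0.6, p_c=0.9, b=3, c=1)
print(without, throughput(without))
print(with_pred, throughput(with_pred))
```

A useful sanity check: with p_c = 0 every prediction is wrong and the second formula reduces to 1 + p_b * b, the cost of paying the full penalty on every branch.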

Data Dependencies

Yet another thing that can cause problems in a pipelined processor is the presence of dependency relations between instructions.• In other words, the control unit of the processor needs to be "on the lookout" for situations where one instruction needs to use data that are being generated/manipulated by another instruction that might be in the pipeline at the same time

Tomasulo's method

_____ is essentially a refinement of the scoreboard approach with additional features and capabilities designed to increase concurrency (the ability to perform multiple operations at the same time).

Superthreading

_______ schedules multiple threads onto a processor core on a rotating basis (instructions from a new thread execute each successive clock cycle). • During any given clock cycle, instructions from only one thread are running. This can mitigate the effect of data dependency hazards, since instructions from one thread will not depend on the execution of instructions from other threads.

superscalar

__________ processors take the approach of increasing spatial parallelism (using up more space on the chip by creating additional instruction processing pipelines) rather than temporal parallelism

Hyperthreading

__________ (as used in Intel and IBM POWER processors) attempts to make better use of CPU resources by scheduling multiple threads onto a core without rotating among threads. During any given clock cycle, instructions from more than one thread could be running.

Instruction-level parallelism

although program instructions are written sequentially, in many cases there are multiple instructions that are not dependent on one another, and thus can be executed simultaneously if parallel hardware is available

WAR - write after read

anytime there are multiple pipelines (or multiple stages within the same pipeline) that can write a result, and when it is possible for instructions to execute out of program order, it is possible for ______ and WAW hazards to cause incorrect results to be computed. The relationship between I2 and I3 is known as an anti-dependence, and represents a ________ hazard.
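
For the "find the data hazard" style of question, a small checker like the one below can help confirm an answer. The three instructions are hypothetical, chosen only so that they match the relationships named on the RAW, WAW, and WAR cards (I1 and I3 write R3; I2 reads R3); they are not taken from the course slides.

```python
from typing import NamedTuple

class Instr(NamedTuple):
    name: str
    dest: str       # register written
    srcs: tuple     # registers read

def hazards(earlier: Instr, later: Instr) -> set:
    """Name the data hazards between two instructions in program order."""
    found = set()
    if earlier.dest in later.srcs:
        found.add("RAW")   # true dependence: later reads what earlier writes
    if later.dest in earlier.srcs:
        found.add("WAR")   # anti-dependence: later overwrites what earlier reads
    if earlier.dest == later.dest:
        found.add("WAW")   # output dependence: both write the same register
    return found

# Hypothetical sequence (destination listed first).
i1 = Instr("I1: ADD R3, R1, R2", "R3", ("R1", "R2"))
i2 = Instr("I2: SUB R5, R3, R4", "R5", ("R3", "R4"))
i3 = Instr("I3: MUL R3, R6, R7", "R3", ("R6", "R7"))

for a, b in [(i1, i2), (i2, i3), (i1, i3)]:
    print(a.name, "|", b.name, "->", hazards(a, b))
```

The checker only compares register names, so it reports potential hazards; whether one actually causes a wrong result depends on the pipeline and on whether instructions can complete out of order.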

Registers

are used to hold operands and/or memory addresses for operands

Carry Save Adder tree

can be used to add together four 6-bit numbers directly (rather than adding them two at a time). • If we only care about the overall sum and don't need the partial sums, this is a more efficient approach. • Only the final stage (that combines the S and C bits to produce the final answer) needs to be a CLA. All the previous stages are simple, cheap CSA circuits.

control unit

controls and coordinates not only all the activities of the other parts of the CPU core, but ultimately, the operations of the entire computer system. Carry out the steps of the von Neumann machine cycle

scoreboard

developed for the CDC 6600, it is a collection of registers and logic that monitors the status of all data registers and functional units in the machine. Its purpose is to schedule the use of hardware functional units by machine language instructions.

Functional units

different parts of the ALU to which instructions can be assigned by type.

concurrency

doing more than one operation at the same time

Booth's algorithm Continued

Examines the multiplier (second operand) for strings of 0s and 1s. • If we're in the middle of a string of 0s, do nothing but shift (these partial products will all be zero). • Anytime there is a string of 1s, this can be treated as multiplying by (L - R), where L is the weight of the 0 before the 1 on the left end, and R is the weight of the rightmost 1.
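
The (L - R) rule can be sketched in software as follows. This is a behavioral model only, not a description of the hardware datapath: it scans the multiplier from right to left, subtracts the shifted multiplicand where a string of 1s begins (weight R), and adds it just past where the string ends (weight L). The function name and the 8-bit default width are choices made for this example.

```python
def booth_multiply(multiplicand: int, multiplier: int, bits: int = 8) -> int:
    """Signed multiply that scans the multiplier the way Booth's algorithm does."""
    if not -(1 << (bits - 1)) <= multiplier < (1 << (bits - 1)):
        raise ValueError("multiplier does not fit in the given width")
    pattern = multiplier & ((1 << bits) - 1)  # two's complement bit pattern
    product = 0
    prev = 0  # implied 0 to the right of the least significant bit
    for i in range(bits):
        cur = (pattern >> i) & 1
        if cur == 1 and prev == 0:
            product -= multiplicand << i   # right end of a string of 1s: subtract R
        elif cur == 0 and prev == 1:
            product += multiplicand << i   # first 0 left of the string: add L
        # otherwise we are inside a string of 0s or 1s: shift only
        prev = cur
    return product

# 14 = 00001110: one string of 1s with R = 2 and L = 16, so 7 * 14 = 7 * (16 - 2).
print(booth_multiply(7, 14))
print(booth_multiply(-5, -3))
```

For a negative multiplier the string of 1s runs through the sign bit and is never closed off with an addition, which is what makes the method work for two's complement values.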

Sweeney, Robertson, and Tocher (SRT) division (USES LOOKUP TABLE)

generates two quotient bits at a time instead of one, using a lookup table. The SRT algorithm is used in many microprocessors (including Intel x86).

Instruction types --might be asked to give an example (can describe an action, doesn't necessarily have to be a real instruction mnemonic), or match an example with an instruction type

Data transfer, Computational, Control Transfer, I/O, System control, Miscellaneous

Von Neumann execution cycle

Design of a machine that carries out the steps. Involves: Fetching, decoding, executing, and writing back instructions (FDEW)

Branch target buffer

Dynamic branch prediction schemes in modern microprocessors often make use of a __________ (a.k.a. branch target cache or target instruction cache). This may hold information such as: • Addresses of branch instructions • Addresses of corresponding branch target instructions • Information about the past behavior of the branch (times taken/not taken)

prefetch queue

many processors have an instruction _______ into which anticipated future instructions are placed, in order to funnel them into the pipeline.

Advantage of Program Counter Relative (PC relative)

memory accesses can be made position-independent. If code and data are relocated to a different area of memory but the relative offset between the PC and the target location is unchanged, the access will still succeed (without needing to recompile/reassemble the code as would be necessary if absolute addresses were used).

Restoring Division

most basic algorithm for unsigned integer division

Miscellaneous instructions Type

Instructions that don't clearly fall into any of the above categories.

Conditional branches

see 4.3 slides

Hardwired Control Unit

the control unit is designed as a finite state machine using combinational and sequential logic design techniques

Microprogrammed control (Section 3.3.3)

the control unit is designed through a software methodology using a "computer within a computer" approach.

The chosen operation is specified by...

the op code

Register indirect - OPERAND is in memory location specified in register

the pointer to the memory operand is located in a CPU register. Advantage: speed. • Disadvantage: limited number of registers to use to hold pointers.

Carry lookahead adder

this type of addition circuit develops all carries in logic, directly from the inputs, rather than waiting for them to propagate from less significant bit positions

Ripple Carry Adder

A half adder and (n-1) full adders are cascaded together. The carry out of each adder is connected to the carry in of the next adder to the left.
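
The cascade can be modeled bit by bit, as in this sketch: one half adder for bit 0, then a full adder per remaining bit, with each carry out feeding the next carry in. The function names and the 4-bit default width are choices made for this example.

```python
def half_adder(a: int, b: int):
    """Return (sum, carry_out) for two 1-bit inputs."""
    return a ^ b, a & b

def full_adder(a: int, b: int, carry_in: int):
    """Return (sum, carry_out) for two 1-bit inputs plus a carry in."""
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return s, carry_out

def ripple_carry_add(x: int, y: int, n: int = 4):
    """Add two n-bit unsigned values; return (n-bit sum, final carry out)."""
    def bit(value: int, i: int) -> int:
        return (value >> i) & 1

    s, carry = half_adder(bit(x, 0), bit(y, 0))   # least significant bit
    total = s
    for i in range(1, n):                         # the (n-1) full adders
        s, carry = full_adder(bit(x, i), bit(y, i), carry)
        total |= s << i
    return total, carry

print(ripple_carry_add(0b0110, 0b0111))  # 6 + 7
print(ripple_carry_add(0b1111, 0b0001))  # 15 + 1 overflows 4 bits
```

The loop makes the delay problem visible: bit i cannot be computed until the carry from bit i-1 is known, so the carry "ripples" through all n positions.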

Real Numbers

Have both an integer part and a fractional part (e.g. 2.875). • Can be very large (e.g. number of stars in a galaxy), very small (e.g. diameter of an atomic nucleus) or anywhere in between. • Do not necessarily terminate within a finite number of digits (e.g. 1/3 = 0.333333... infinitely repeats; π = 3.141592653589793... never terminates or repeats).

nanoprogramming

Instead of containing control words, microinstructions contain pointers to locations in an even lower-level control store (nano-memory)

Binary Coded Decimal (BCD)

Now we can build circuits to add things besides "plain old" binary numbers. The BCD adder corrects for the six invalid codes (1010 through 1111) by adding six to the result anytime it exceeds 9 (1001).
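
The add-six correction for a single BCD digit can be sketched like this. The function name is made up for the example, and it models only one decimal digit position; a multi-digit adder would chain the carry out of one position into the next.

```python
def bcd_digit_add(a: int, b: int, carry_in: int = 0):
    """Add two BCD digits (0-9); return (BCD sum digit, carry out)."""
    if not (0 <= a <= 9 and 0 <= b <= 9 and carry_in in (0, 1)):
        raise ValueError("inputs must be valid BCD digits and a 1-bit carry")
    raw = a + b + carry_in      # plain binary addition
    if raw > 9:                 # result is an invalid code or overflowed 4 bits
        raw += 6                # skip over the six unused codes 1010..1111
    return raw & 0xF, raw >> 4  # low 4 bits are the digit, bit 4 is the carry

print(bcd_digit_add(4, 3))      # no correction needed
print(bcd_digit_add(7, 5))      # 12 -> 12 + 6 = 18 = 1 0010 -> digit 2, carry 1
```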

Wallace Tree

Once we have the partial products PP0 through PP3, we can use a tree of carry save adders to sum them together to get the product. This design can easily be expanded to multiply larger numbers.

Memory-Register Architecture

One in which the operands for computational instructions can be held in either CPU registers or memory locations. Example: Intel x86. It allows either or both operands to reside in registers, and either (but not both) of the operands to reside in memory locations. So, where EAX and EBX are registers and var1 and var2 are variables in memory, we have:

vertical Microprogramming

One method that can be used to reduce the width of the control store (and thus its overall size)

Pipelining

One of the primary techniques used to improve efficiency and increase performance of modern processors The basic idea of pipelining is to divide a task into subtasks, and then overlap performance of the subtasks for multiple iterations of the task.

Register Addressing Example

Operand is in one of the CPU registers as specified in the instruction Example (The bit field in blue tells us the operand is currently stored in R1 (00001))

Stack addressing - register indirect with autoincrement

Operands are located within a stack (Last In, First Out data structure) in memory. • A push operation is used to store an item at the top of the stack. • A pop (sometimes called pull) operation removes the item currently at the top of the stack.

Advantage of a Memory-Register ISA

Operands don't have to be loaded into registers before they can be used - they can be operated on directly in memory. This reduces the need for copying of values between memory and the register set.

RCA (Ripple Carry Adder) Optimization

Optimized for low implementation cost, at the sacrifice of being slow (total delay is the sum of the delays of all the full adders).

CLA (Carry Lookahead Adder) Optimization

Optimized for speed, at the sacrifice of an extremely high implementation cost.

Branch prediction

tries to guess whether or not each branch will be taken and provides this information to the instruction fetching hardware so it will fetch the instructions most likely to be executed. Static branch prediction is done before the program actually runs (usually by the compiler). Dynamic branch prediction is done by the CPU's control logic "on the fly" (while the program is actually running), typically by the control unit keeping track of some amount of history on the behavior of branch instructions in the code.

speedup factor - approaches number of pipeline stages (Look at 4.1 slides for clarifying notes)

The speedup from using the s-stage pipeline to perform n iterations of the task is (n * s)/(n + s - 1), which approaches s as a limit as n becomes large.
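
Plugging a few values into the formula shows the limit behavior. This is a minimal sketch; the function name and the sample values of n and s are chosen for illustration.

```python
def pipeline_speedup(n: int, s: int) -> float:
    """Speedup of an s-stage pipeline over n iterations: (n * s) / (n + s - 1)."""
    if n < 1 or s < 1:
        raise ValueError("n and s must be positive")
    return (n * s) / (n + s - 1)

# With a 5-stage pipeline, the speedup climbs toward 5 as n grows.
for n in (1, 5, 100, 1_000_000):
    print(n, pipeline_speedup(n, 5))
```

The denominator n + s - 1 is the pipelined cycle count: s cycles to fill the pipeline for the first iteration, then one more cycle for each of the remaining n - 1. A single iteration (n = 1) therefore gains nothing.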

Ideal number of operands per instruction

usually three: two "source" operands that provide data for a computation, plus one "destination" operand that receives the result of the computation

Best CASE for five CLA gate propagation delays *Compute delay or possibly fan-in for a CLA given restrictions

1 to compute all the P and G functions (all done in parallel); 1 to AND all the terms together (in parallel); 1 to OR the ANDed terms together to get the carries (in parallel); 2 to pass through the full adders (all in parallel).

RISC Characteristics (WILL need to compare between this and CISC)

1. Fixed-length instructions are used to simplify instruction fetching. 2. Few instruction formats in order to simplify instruction decoding. 3. A load-store instruction set architecture is used to decouple memory accesses from computations so that each can be optimized independently. 4. Instructions have simple functionality, which helps keep the control unit design simple. 5. A hardwired control unit optimizes the machine for speed. 6. The architecture is designed for pipelined implementation, again to optimize for speed of execution. 7. Only a few, simple addressing modes are provided because complex ones may slow down the machine and are rarely used by compilers.

ARM

16 general purpose registers. Some registers are duplicated so that the CPU can respond to interrupts more quickly. When the mode of the CPU switches, some registers are banked and become inaccessible to the current mode.

Top current ISA in RISC- one way in which each has taken some ideas from the CISC Philosophy

ARM (Acorn RISC Machine) is a well-known RISC architecture that is commonly used in mobile devices, embedded systems, and increasingly in desktop and server applications. While ARM is fundamentally a RISC architecture, it has incorporated some CISC-inspired features in recent iterations, such as the ARMv8-A architecture used in 64-bit processors. These features include: Atomic Operations- Allow for more complex atomic operations like compare-and-swap, load-linked/store-conditional, and synchronization primitives

Stalls

Accessing main memory for an operand can result in a delay of several clock cycles - during which time the pipeline ______

Advantages/Disadvantages of Direct addressing

Advantage: most modern computers have very large amounts of RAM main memory, so we can have essentially as many direct-addressed variables as we like. Disadvantages: 1. Access to variables stored in main memory will be slower than access to register variables. (Even variables residing in cache aren't as fast as register access.) 2. If the memory space is large, the bit field required to hold a memory address is large (e.g. if we have 4 GB of memory space, it takes 32 bits to hold an address). 3. Address in the instruction is fixed. (No self-modifying code.) Like register addressing, direct addressing is good for working with scalar (individual) variables, but is rather clumsy for working with strings, arrays, etc.

Advantages/Disadvantages of Register Addressing

Advantages: 1. Operands kept in registers can be accessed very quickly (registers are part of the CPU core and connect directly to the ALU). 2. We usually only need a small bit field to identify a CPU register (3 bits if there are 8 registers, 4 bits if there are 16 registers, 5 bits if there are 32). Disadvantages: 1. There are usually only a limited number of CPU registers (see previous bullet), and some of them may be needed for other uses. (We can keep only the most frequently used variables in registers.) 2. Registers are fine for scalar (individual) variables, but cannot generally hold large, multi-element data structures such as strings or arrays. (In a modern 64-bit processor, most of the general purpose registers are 64 bits.)

Advantages/Disadvantages of Load-store architecture

Advantages: Machine instructions in a load-store ISA are usually shorter and more likely to fit into a fixed number of bits. • This contributes to ease of instruction unit pipelining (Section 4.3). Disadvantage: May take more machine instructions to carry out the same task (vs. a memory-register architecture). • It is also more work for a compiler to manage the larger register set, plus saving and restoring more registers on a context switch or interrupt can take longer.

Memory-Memory Architecture.

Allows all of the operands for computational instructions to reside in main memory

CPU's Three Major Components

An Arithmetic/Logic Unit (ALU) that performs computations on binary data. Registers that are used to hold operands and/or memory addresses for operands A control unit that controls and sequences the behavior of the other components (and rest of system) based on programmed instructions

delayed loads

An alternative approach would be to document that load operations are delayed (similar to a delayed control transfer instruction) and thus do not take effect until after the following instruction. In other words, the following sequence would use the "old" value in R5 (from before the LOAD): ◦ LOAD VALUE, R5 ◦ ADD R5, R4, R3

Booth's algorithm

Another approach to building a signed multiplication circuit. (iterative solution) The basic idea behind it is that every binary number is comprised of strings of 0s and/or 1s.

Delayed control transfers

Are another approach that can be used to minimize the performance penalty associated with nonsequential code execution in a pipelined processor. In conventional instruction set architectures, control transfer instructions (jumps, calls, conditional branches for which the condition is true, etc.) take effect immediately - the next instruction executed is the one at the target location.

CISC Characteristics(WILL need to compare between this and RISC)

CISC architectures generally have the opposite characteristics vs. RISC. For example, they often use microprogrammed control units, which provide more flexibility in executing complex instructions but can be slower than hardwired control, and they typically have fewer general-purpose registers than RISC architectures, which can lead to more memory accesses for temporary data storage. • Variable-length instructions rather than fixed-length • A memory-memory or memory-register instruction set rather than load/store • Many and/or complex addressing modes rather than just a few simple ones

Datapath

Carries out the execution of the machine language instructions which is comprised of: Register set Functional Hardware: ALU, shifter, etc. CPU's Internal Circuitry to store and manipulate binary values

Register Set

Collection of D flip-flops with appropriate hardware (multiplexers, demultiplexers, decoders, etc.) for routing data to/from them, enabling reading/writing of data.

indexed (or displacement) addressing. - memory location + immediate constant

Combines a pointer in a CPU register with a constant offset (displacement) encoded into the instruction. The sum of those two values determines the location of the operand in memory. Similar to array indexing.

Computational Instructions type

Computational instructions produce a numerical result(s) based on the operands and the operation performed on them. 1. Integer arithmetic (ex. addition, multiplication) 2. Real-number arithmetic (if needed for the intended applications) (ex. also addition, multiplication) 3. Boolean logic (ex. AND, OR) 4. Bit shifting (moving bits left or right) 5. Comparisons (ex. <, >, =)

Data Transfer Instruction Type

Copy data from one place to another within the machine (without doing any actual computation) Memory to register. Register to memory. Register to register. Memory to memory (in some machines). Constant to register or memory

Tomasulo's method Bit and Tag, reservation stations, and

Each data register has a busy bit and a tag field associated with it. • The busy bit is set when an instruction specifies that register as its destination, and cleared when the register receives the result of that instruction. • The tag field identifies which functional unit will compute the result being sent to that register. Each functional unit has reservation stations, which are input registers that can hold operands for computations to be performed by that unit. All the functional units and the data registers are connected together by a common data bus (CDB). The CDB feeds results computed by the functional units back to the data registers (and also to the reservation stations that serve as inputs to the functional units).

Horizontal Microprogramming

Each microword will have a bit position for every control signal that the machine needs to operate

Immediate Addressing -- OPERAND is in Instruction

Embed the operand itself (in binary) into the machine language instruction. In other words, the instruction specifies explicitly what the operand is. Primary advantage of immediate addressing is speed of operation. Since the operand is embedded in the code, and since most architectures do not allow for self-modifying code, it's good for constant operands only.

*Given an instruction in assembly/binary and some information about it, show the opposite and describe the operands and what it does

Find Example in 3.1 Slideshow

VLIW Advantages

The instruction format for a VLIW architecture contains many more bits than the instruction format for a typical RISC or CISC processor. Each "very long" instruction contains enough bits to specify the equivalent of several (three, four, or even more) machine instructions in a conventional architecture. • Ideally, the number of "slots" (independently programmable operations) in one VLIW instruction matches up to the number of parallel functional units that are available for executing instructions. VLIW machines can achieve higher performance than similar superscalar CPUs.

Program Counter Relative (PC relative)

Instead of adding the displacement to the contents of a general-purpose CPU register, it is added to the program counter register (the one that keeps track of execution of program code).

Single and double precision (32 and 64 bit)

IEEE 754 defined two principal floating-point number formats: single precision (32 bits) and double precision (64 bits). • The Intel x87 floating-point registers (that hold real number values in x86 processors) are 80 bits wide, corresponding to the double extended precision format. • Typical equivalent variable types in high-level languages include: • float = IEEE single precision (32 bits) • double = IEEE double precision (64 bits) • long double (not supported by all compilers) = IEEE double extended precision (80 bits)

nop

If no independent instruction can be found, the compiler (or human assembly language programmer) can place a ____ (no operation) in the delay slot

flushed from pipeline

If the pipeline proceeds to fetch instructions from the correct path, execution can proceed normally; but if instructions from the incorrect path are fetched and proceed some number of stages down the pipeline, they must be _______ from the pipeline to avoid incorrect execution.

Instruction Addressing Modes Summary

Immediate addressing tells us what the operand is. • Good for constant data; no good for variables. Register addressing tells us where the operand is in the register set. • Direct addressing tells us where the operand is in main memory.
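
To compare the addressing modes covered in these cards side by side, here is a toy model in which each mode is just a different way of turning an instruction field into an operand. The register names, memory contents, and the `fetch_operand` function are all invented for this sketch and do not correspond to any real machine.

```python
# Toy machine state (hypothetical values).
registers = {"R1": 2, "R2": 7}
memory = {0: 10, 1: 20, 2: 30, 3: 40, 4: 2}

def fetch_operand(mode: str, field, offset: int = 0) -> int:
    """Return the operand selected by an addressing mode and an instruction field."""
    if mode == "immediate":
        return field                              # the field IS the operand
    if mode == "register":
        return registers[field]                   # operand is in a register
    if mode == "direct":
        return memory[field]                      # field is the memory address
    if mode == "register_indirect":
        return memory[registers[field]]           # register holds the pointer
    if mode == "memory_indirect":
        return memory[memory[field]]              # memory holds the pointer
    if mode == "indexed":
        return memory[registers[field] + offset]  # pointer plus constant offset
    raise ValueError(f"unknown addressing mode: {mode}")

print(fetch_operand("immediate", 5))
print(fetch_operand("register", "R2"))
print(fetch_operand("direct", 1))
print(fetch_operand("register_indirect", "R1"))
print(fetch_operand("memory_indirect", 4))
print(fetch_operand("indexed", "R1", offset=1))
```

Counting the dictionary lookups in each branch mirrors the speed tradeoffs in the cards: immediate needs none, memory indirect needs two memory accesses.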

Disadvantage of Memory-Register ISA

Implementing this type of ISA usually requires variable-length instructions (which also may take a variable amount of time to execute). • Control unit design may be more complex. • It may be harder to pipeline instruction execution (see Chapter 4). • Instructions may not execute as quickly.

Carry Save Adder (CSA)

In a CSA, the carries are not cascaded from one bit position to the next. Instead, each full adder takes three inputs and produces two outputs. This doesn't do much good all by itself, but the CSA stages can be cascaded together in a tree structure to add together more than two binary numbers.
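
A behavioral sketch of the idea: each CSA stage reduces three numbers to a sum word and a carry word without propagating any carries, and only the last step does a real carry-propagating addition. In this example Python's `+` stands in for the final CLA, and the function names are made up.

```python
def carry_save_add(x: int, y: int, z: int):
    """One CSA stage: reduce three numbers to (sum_word, carry_word).

    Every bit position is an independent full adder; no carry ripples sideways.
    """
    sum_word = x ^ y ^ z                                # per-bit sum outputs
    carry_word = ((x & y) | (x & z) | (y & z)) << 1     # per-bit carries, weighted x2
    return sum_word, carry_word

def add_four(a: int, b: int, c: int, d: int) -> int:
    """Add four numbers with two CSA stages and one final carry-propagating add."""
    s1, c1 = carry_save_add(a, b, c)
    s2, c2 = carry_save_add(s1, c1, d)
    return s2 + c2   # stand-in for the final CLA stage

print(add_four(13, 25, 42, 63))
```

The invariant at every stage is sum_word + carry_word == x + y + z, which is why the partial results can stay in "carry save" form until the very end.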

Load-store architecture

In such an ISA, the only instructions that access memory for operands are load (transfer data from memory to a register) and store (transfer data from a register to memory).

Advantages of wider Registers

In terms of storing values, more bits means a greater range of integer values and greater range/more precision for real values. In terms of memory addressing, more bits enables a larger address space: 16 bits can only address 64 KB; 32 bits can address 4 GB; 64 bits allows us to address up to 16 EB (2^64 bytes).
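
The address-space figures follow directly from 2^bits, as this short check shows. It assumes byte-addressable memory and binary units (1 KB = 1024 bytes), which is how the card's numbers work out; the function name is made up.

```python
def address_space_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses reachable with the given address width."""
    return 1 << address_bits   # 2 ** address_bits

KB, GB, EB = 1024, 1024**3, 1024**6

print(address_space_bytes(16) // KB, "KB")
print(address_space_bytes(32) // GB, "GB")
print(address_space_bytes(64) // EB, "EB")
```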

SPARC's unique Organization structure

Instead of grouping registers by what they hold (data vs. addresses), SPARC groups them by their scope within a program: Global registers are visible within all procedures. Local registers contain data private to the currently running procedure. Out registers are used to pass arguments to a called procedure. In registers are used to receive parameters from the calling procedure.

memory data register (MDR)

It is a bidirectional register (inputs and outputs on both sides) that holds the data being read from (or written to) memory.

Restoring Division Continued

It is analogous to long division by hand. We iteratively attempt to see if the divisor "goes into" the partial dividend (by subtracting it), moving one bit to the right each time. • If we get a positive or zero partial remainder, then we put a 1 in the corresponding quotient bit. • If we get a negative partial remainder, we put a 0 in the quotient and add back (restore) the divisor to the dividend. • The algorithm continues until the entire dividend is processed.
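
The steps above can be sketched as a loop over the dividend bits, most significant first. This is a software model of the procedure for unsigned operands, not a register-level description of any particular divider; the function name and the 8-bit default width are choices made for this example.

```python
def restoring_divide(dividend: int, divisor: int, bits: int = 8):
    """Unsigned restoring division; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    if not (0 <= dividend < (1 << bits) and 0 < divisor < (1 << bits)):
        raise ValueError("operands must be unsigned values that fit in the width")
    remainder = 0
    quotient = 0
    for i in range(bits - 1, -1, -1):
        # Bring down the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        remainder -= divisor            # trial subtraction
        if remainder >= 0:
            quotient |= 1 << i          # divisor "goes into" it: quotient bit is 1
        else:
            remainder += divisor        # negative: quotient bit is 0, restore
    return quotient, remainder

print(restoring_divide(13, 3))
print(restoring_divide(200, 7))
```

The restore step is the extra addition that nonrestoring division is designed to avoid.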

instruction register (IR)

It is part of the control unit. It holds the current machine language instruction so it can be decoded - which will tell the control unit what needs to be done.

Memory Address Register (MAR)

It is used to provide an address to the memory system whenever we need to read or write a memory location.

program counter register (PC)

It keeps track of where in memory the current instruction is located. As part of the execution of the current instruction, it is incremented to point to the next instruction.

Memory-register or Load-Store Oriented

Most of today's popular computer architectures have two- or three-operand instructions and a reasonably large collection of internal registers. this is what we classify them as.

Nonrestoring Division

Removes the need to restore the divisor back to the partial dividend. It generates quotient bits that represent 1 and -1 instead of 1 and 0; the result has to be resolved back into the correct binary value at the end of the computation.

ALU Function

Should at least be able to do binary addition and subtraction, a set of Boolean operations, and preferably have some bit shifting capability.

Integer Multiplication

Since the logical AND function implements one-bit binary multiplication, we can generate the partial products with four AND gates each (16 total).

Important Aspects of Register Set

Size and logical organization. Size: width of each register (in bits) and number of registers provided. Width: register width in microprocessors has increased over the years from 8 to 16 to 32 and now 64 bits.

Ring Counter

The control step counter could be implemented using a ring counter or a binary counter plus a decoder.
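
The two options can be compared with a small Python model. This is only a behavioral sketch; the function names and the one-hot integer encoding of the control step lines are assumptions made for the example.

```python
def ring_counter_states(n_steps, n_clocks):
    """One-hot states of an n_steps-bit ring counter over n_clocks clock ticks."""
    mask = (1 << n_steps) - 1
    state = 1                          # exactly one bit set: step 0 active
    states = []
    for _ in range(n_clocks):
        states.append(state)
        # Rotate left by one; the top bit wraps around to bit 0.
        state = ((state << 1) | (state >> (n_steps - 1))) & mask
    return states


def counter_plus_decoder_states(n_steps, n_clocks):
    """Same one-hot outputs, produced by a binary counter followed by a decoder."""
    states = []
    count = 0
    for _ in range(n_clocks):
        states.append(1 << count)      # decoder: binary count -> one-hot line
        count = (count + 1) % n_steps  # binary counter wraps after the last step
    return states
```

Both functions produce the same sequence, for example `[1, 2, 4, 8, 1]` for four steps and five clocks. The ring counter needs one flip-flop per step but no decoder, while the binary counter needs fewer flip-flops plus decoding logic.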

Data Hazards

The combination of register renaming and data forwarding helps to minimize the occurrence of stalls caused by _____

Control Store

The control unit itself is programmable; it contains a special memory called ____, from which it fetches microinstructions in much the same manner that the CPU as a whole fetches machine instructions from main memory.

Branch Penalty

The default behavior is usually to fetch instructions from the sequential execution path. If the branch condition is false, this will be correct. But if the branch condition turns out to be true, fetching the incorrect instructions will incur a _______ (one or more wasted clock cycles due to discarding wrongly fetched instructions)

radix point

The dot that separates the whole part from the fractional part in a real number in any base

IEEE 754 standard

The group came up with a set of specifications for representing and operating on floating-point numbers. IEEE assigned standard number 754 to the document when it was finalized in 1985.

Multithreading Purpose and Meaning

The idea here is that by running multiple threads, we can: 1. Help avoid stalls due to data and control hazards 2. Help cover up memory system latency 3. Make better use of CPU resources (keep hardware functional units busy more of the time)... particularly when individual threads have limited ILP

delay slot

The instruction(s) following the delayed control transfer instruction are said to be in its _____. Most architectures that implement this feature have only one delay slot behind each control transfer.

Operation Code (Op code)

The leftmost field (4 bits). It determines the function of the instruction--What will it do?
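
A small Python sketch of how the fields of this instruction format could be pulled apart. It assumes a 16-bit instruction word laid out as a 4-bit op code followed by register, mode, register, and mode fields of 3 bits each, as the related cards describe; the function name `decode` and the field names in the result are hypothetical.

```python
def decode(instruction):
    """Split an assumed 16-bit instruction word into its five fields."""
    return {
        "opcode": (instruction >> 12) & 0xF,  # field 1: leftmost 4 bits
        "reg1":   (instruction >> 9) & 0x7,   # field 2: 3-bit register select
        "mode1":  (instruction >> 6) & 0x7,   # field 3: 3-bit mode select
        "reg2":   (instruction >> 3) & 0x7,   # field 4: 3-bit register select
        "mode2":  instruction & 0x7,          # field 5: 3-bit mode select
    }
```

For instance, `decode(0b1010_011_001_111_000)` yields op code 10, registers 3 and 7, and modes 1 and 0.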

Overhead from buffer registers (Look at 4.1 slides for clarifying notes)

The logic for the pipeline stages has propagation delay. The pipeline registers introduce delays also!
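
The effect of register overhead can be sketched with a simple timing model, assuming the clock period must cover the slowest stage plus one pipeline register delay. The function names and the delay values used in the note below are hypothetical, and the course slides may use a different formulation.

```python
def pipeline_cycle_time(stage_delays, register_delay):
    """Clock period: slowest stage's logic delay plus the buffer register delay."""
    return max(stage_delays) + register_delay


def pipeline_speedup(stage_delays, register_delay):
    """Steady-state speedup over an unpipelined unit with the same logic."""
    unpipelined_time = sum(stage_delays)
    return unpipelined_time / pipeline_cycle_time(stage_delays, register_delay)
```

With four stages of 200 ps each and a 20 ps register delay, the cycle time is 220 ps, so the speedup is 800 / 220, a little under the ideal factor of 4.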

Normalization

The mantissa is always expressed with the radix (binary) point in the same position with respect to the significant digits.
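
As an illustration, the sketch below normalizes a value to the form 1.f x 2^e, the convention IEEE 754 uses for normal numbers. That choice of radix-point position is an assumption here (other formats place the point differently), and the function name `normalize` is hypothetical.

```python
import math


def normalize(value):
    """Return (mantissa, exponent) with 1 <= |mantissa| < 2 and value == mantissa * 2**exponent."""
    if value == 0:
        raise ValueError("zero has no normalized form")
    # math.frexp gives value == fraction * 2**exponent with 0.5 <= |fraction| < 1.
    fraction, exponent = math.frexp(value)
    # Move the binary point one place right so the leading digit is 1.
    return fraction * 2, exponent - 1
```

For example, 6.0 is 1.5 x 2^2 and 0.15625 is 1.25 x 2^-3; in both cases the binary point sits immediately after the leading 1.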

Addressing Modes Definition

The means provided by the architecture to specify the value(s) and/or location(s) of instruction operands.

Direct Addressing - OPERAND is in specified memory location (in instruction)

Uses a bit field within the instruction to specify which main memory location contains an operand (and/or receives the result). • Instead of op code followed by the operand (as in immediate addressing), we have op code followed by the memory address of the operand

Register Addressing

Uses a bit field within the instruction to specify which of the several CPU registers contains an operand (and/or receives the result computed by the ALU). Advantages: 1. Operands kept in registers can be accessed very quickly (registers are part of the CPU core and connect directly to the ALU). 2. We usually only need a small bit field to identify a CPU register (3 bits if there are 8 registers, 4 bits if there are 16 registers, 5 bits if there are 32).
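
The field widths quoted above follow from taking the base-2 logarithm of the register count and rounding up. A short Python check, with the hypothetical helper name `register_field_bits`:

```python
def register_field_bits(num_registers):
    """Smallest number of bits that can give each register a distinct code."""
    if num_registers < 1:
        raise ValueError("need at least one register")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1.
    return (num_registers - 1).bit_length()
```

This gives 3 bits for 8 registers, 4 bits for 16, and 5 bits for 32.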

Fixed-Point Notation - Disadvantage

Using a finite number of bits, we are trading off range vs. precision. To get more precision we have to sacrifice range, and vice versa. Works reasonably well when all values fall within a limited range, but is extremely inefficient and unwieldy when we have to work with both very large and very small numbers.
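
The tradeoff can be made concrete with a short Python sketch for an unsigned fixed-point format. The function name `fixed_point_limits` and the 16-bit word size in the note below are assumptions chosen for illustration.

```python
def fixed_point_limits(total_bits, fraction_bits):
    """Return (largest value, resolution) for an unsigned fixed-point format."""
    scale = 1 << fraction_bits                    # 2**fraction_bits
    largest = ((1 << total_bits) - 1) / scale     # all bits set
    resolution = 1 / scale                        # weight of the least significant bit
    return largest, resolution
```

With 16 bits and no fraction bits the format reaches 65535 in steps of 1. Moving the radix point to give 8 fraction bits improves the step size to 1/256 but cuts the largest value to just under 256.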

VLIW Disadvantages

VLIW architectures are not compatible with previous architectures, and computer users value compatibility. Other disadvantages of VLIW include: 1. Need for a very complex and specialized compiler 2. Large code size (poor code density) compared to conventional architectures (have to encode hundreds of bits per instruction whether or not all of them specify active operations) 3. Need for a very large register set 4. High memory bandwidth is needed to support transfers of "very long" instructions and data 5. Poor performance on branch-intensive code

VLIW - what it's trying to do

Very Long Instruction Word (VLIW) architectures are an alternative approach that moves the instruction scheduling burden from hardware into software (the compiler).

Arithmetic hardware

What puts the "compute" in a computer. Some applications need to work with real numbers. All applications need to be able to do arithmetic (plus Boolean logic, shifting, etc.) on integer values, both signed and unsigned.

