Module 1
Register set
(also called register file) serves as internal storage within the CPU to hold operands and intermediate values during instruction fetch and execution. The major subsystems within the CPU include the control unit (CU), which selects and interprets machine instructions and coordinates the various parts of the computer in executing these instructions.
Performance
= 1 / Execution time
Performance X / Performance Y = Execution time Y / Execution time X = n (X is n times as fast as Y)
Can be improved by:
• Reducing the number of clock cycles
• Increasing the clock rate
The hardware designer must often trade off clock rate against cycle count.
Depends on (IC = instruction count, Tc = clock cycle time):
• Algorithm: affects IC, possibly CPI
• Programming language: affects IC, CPI
• Compiler: affects IC, CPI
• Instruction set architecture: affects IC, CPI, Tc
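The ratio above can be sketched in a few lines of Python. The function name and the timings are illustrative assumptions, not values from these notes.

```python
def relative_performance(exec_time_x: float, exec_time_y: float) -> float:
    """Return n such that machine X is n times as fast as machine Y."""
    # Performance = 1 / execution time, so the ratio of performances
    # is the inverse of the ratio of execution times.
    performance_x = 1.0 / exec_time_x
    performance_y = 1.0 / exec_time_y
    return performance_x / performance_y


# Hypothetical timings: the same program takes 10 s on X and 15 s on Y.
n = relative_performance(10.0, 15.0)
print(f"X is {n:.2f} times as fast as Y")
```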
Clock Cycles
= Instruction Count × Cycles per Instruction
Harvard Architecture
The distinguishing feature is the presence of separate instruction and data memories. This allows one instruction to be fetched while another stores or reads an operand. (The I/O system allows data to be brought into the system and results to be sent out of the system.)
CPI
Cycles per Instruction
Architecture of a Computer
Defines the view of the computer from the perspective of an assembly language or machine language programmer.
MIPS R3000 processor
• Fixed-size machine instructions (32 bits)
• Only load and store instructions can reference memory (load/store architecture)
• Uses 32-bit addresses
• Pipelined functional units
• Only three formats for machine instructions, all of which are 32 bits
• Only four addressing modes
• 32-bit registers ($0, $1, ..., $31, Hi and Lo)
• CPU contains 32 registers, each 32 bits wide
• Registers may be used for general purposes
• The MIPS calling convention should be adhered to if the user program is to be compatible with library routines and system software
Special registers:
• Register $0 ($zero) is hardwired to zero (reads as 0 and cannot be overwritten). It serves as a convenient source of a zero constant. Example use: add $3,$2,$0 has the same effect as move $3,$2, which copies $2 into $3
• Register $1 ($at) is used by the assembler to implement pseudo-instructions (i.e., synthetic instructions)
Register usage convention:
• $v0, $v1: result values (registers 2 and 3)
• $a0 - $a3: arguments (registers 4 - 7)
• $t0 - $t7 (registers 8 - 15) and $t8 - $t9 (registers 24 - 25): temporaries, not preserved across function calls (can be overwritten by callee)
• $s0 - $s7 (registers 16 - 23): saved registers, must be saved and restored by the callee if used within a function
• $gp: global pointer for static data (register 28)
• $sp: stack pointer (register 29)
• $fp: frame pointer (register 30)
• $ra: return address (register 31)
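The register naming convention listed in this card can be captured as a lookup table. This is a minimal sketch: the function name `build_register_map` and the `CALLEE_SAVED` set are illustrative assumptions, and only the registers named in these notes are included (registers 26 and 27 are left out because the notes do not list them).

```python
def build_register_map() -> dict:
    """Map conventional MIPS register names to register numbers."""
    registers = {
        "$zero": 0, "$at": 1,
        "$v0": 2, "$v1": 3,
        "$gp": 28, "$sp": 29, "$fp": 30, "$ra": 31,
    }
    for i in range(4):                    # $a0 - $a3 -> 4 - 7
        registers[f"$a{i}"] = 4 + i
    for i in range(8):                    # $t0 - $t7 -> 8 - 15
        registers[f"$t{i}"] = 8 + i
    for i in range(8):                    # $s0 - $s7 -> 16 - 23
        registers[f"$s{i}"] = 16 + i
    registers["$t8"] = 24                 # $t8 - $t9 -> 24 - 25
    registers["$t9"] = 25
    return registers


REGISTERS = build_register_map()

# Saved registers that a callee must save and restore if it uses them.
CALLEE_SAVED = {name for name in REGISTERS if name.startswith("$s") and name != "$sp"}

print(REGISTERS["$t9"], REGISTERS["$s7"], sorted(CALLEE_SAVED))
```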
IR (instruction register)
Holds the instruction while the control unit processes it
Response time
How long it takes to do a task
Amdahl's Law
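The overall improvement from speeding up one aspect of a computer is limited by the fraction of time that aspect is used. It is a pitfall to improve one aspect and expect a proportional improvement in overall performance. T improved = (T affected / Improvement factor) + T unaffected. Corollary: make the common case fast.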
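A short sketch of the formula, using hypothetical timings (100 s total, of which 80 s is affected by the improvement). The function names are assumptions for illustration.

```python
def amdahl_improved_time(t_affected: float, t_unaffected: float,
                         improvement_factor: float) -> float:
    """T improved = (T affected / improvement factor) + T unaffected."""
    return t_affected / improvement_factor + t_unaffected


def overall_speedup(t_affected: float, t_unaffected: float,
                    improvement_factor: float) -> float:
    """Ratio of the original total time to the improved total time."""
    original = t_affected + t_unaffected
    return original / amdahl_improved_time(t_affected, t_unaffected,
                                           improvement_factor)


# Hypothetical program: 80 s affected by the improvement, 20 s unaffected.
for factor in (2, 4, 16, 1_000_000):
    speedup = overall_speedup(80.0, 20.0, factor)
    print(f"improvement factor {factor:>9}: overall speedup {speedup:.2f}")
```

However large the improvement factor becomes, the overall speedup in this example stays below 100 / 20 = 5, because the unaffected 20 s remains.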
Instruction Set Architecture (ISA)
Layer between hardware and software. This saves on development time and cost: manufacturers can innovate and fine-tune the hardware for performance without breaking the existing software base.
The Computer Hierarchy
Level 4: Assembly Language Level
- Acts upon assembly language produced from Level 5, as well as instructions programmed directly at this level.
Level 3: System Software Level
- Controls executing processes on the system.
- Protects system resources.
- Assembly language instructions often pass through Level 3 without modification.
Level 2: Machine Level
- Also known as the Instruction Set Architecture (ISA) Level.
- Consists of instructions that are particular to the architecture of the machine.
- Programs written in machine language need no compilers, interpreters, or assemblers.
Level 1: Control Level
- A control unit decodes and executes instructions and moves data through the system.
- Control units can be microprogrammed or hardwired.
- A microprogram is a program written in a low-level language that is implemented by the hardware.
- Hardwired control units consist of hardware that directly executes machine instructions.
Level 0: Digital Logic Level
- This level is where we find digital circuits (the chips).
- Digital circuits consist of gates and wires.
- These components implement the mathematical logic of all other levels.
Native MIPS
Millions of Instructions Per Second
MIPS = Instruction count / (Execution time × 10^6) = Clock rate / (CPI × 10^6)
Doesn't account for:
• Differences in ISAs between computers
• Differences in complexity between instructions
Can be misleading:
• Varies between programs on the same machine
• Does not reflect the instruction set complexity
• May vary inversely with performance
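The sketch below evaluates both forms of the native MIPS formula and shows how the rating may vary inversely with performance. The two program versions, their instruction counts and CPI values, and the 4 GHz clock are hypothetical.

```python
def native_mips(instruction_count: float, execution_time_s: float) -> float:
    """MIPS = instruction count / (execution time x 10^6)."""
    return instruction_count / (execution_time_s * 1e6)


def native_mips_from_cpi(clock_rate_hz: float, cpi: float) -> float:
    """MIPS = clock rate / (CPI x 10^6)."""
    return clock_rate_hz / (cpi * 1e6)


CLOCK_RATE_HZ = 4e9  # hypothetical 4 GHz machine

# Two hypothetical compiled versions of the same program:
# (instruction count, average CPI)
versions = {"A": (10e9, 1.0), "B": (3e9, 2.0)}

for name, (instruction_count, cpi) in versions.items():
    time_s = instruction_count * cpi / CLOCK_RATE_HZ
    rating = native_mips(instruction_count, time_s)
    print(f"version {name}: {time_s:.2f} s, {rating:.0f} MIPS")
```

Version B finishes sooner (1.5 s against 2.5 s) yet receives the lower MIPS rating, since it executes fewer but slower instructions.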
Von Neumann Machines
Most modern computers are based on this design, developed by John von Neumann at Princeton in the 1940s.
These machines have 3 major components:
• a CPU
• a main-memory system
• an I/O system
Characteristics:
• Both programs and data are stored in a single memory; program instructions can be manipulated like data
• The program counter is used to fetch instructions; data operands are fetched during the execute cycle based on the operand addressing mode
• Instructions execute sequentially; flow of control may be altered by a branch-type instruction
• The CPU accesses memory over a single path, which is a potential bottleneck called the "von Neumann bottleneck"
MIPS Instruction Types
• R-type employs register operands
• I-type contains a literal (immediate value)
• J-type contains part of the jump address
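As a sketch of how the three formats pack into a 32-bit word, the functions below assume the standard MIPS32 field layout, which these notes do not spell out: R-type is opcode(6) rs(5) rt(5) rd(5) shamt(5) funct(6), I-type is opcode(6) rs(5) rt(5) immediate(16), and J-type is opcode(6) target(26). The function names are illustrative.

```python
def encode_r(rs: int, rt: int, rd: int, shamt: int, funct: int,
             opcode: int = 0) -> int:
    """R-type: all operands are registers; funct selects the operation."""
    return ((opcode << 26) | (rs << 21) | (rt << 16)
            | (rd << 11) | (shamt << 6) | funct)


def encode_i(opcode: int, rs: int, rt: int, immediate: int) -> int:
    """I-type: carries a 16-bit literal (immediate value)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (immediate & 0xFFFF)


def encode_j(opcode: int, address: int) -> int:
    """J-type: carries 26 bits of the word-aligned jump address."""
    return (opcode << 26) | ((address >> 2) & 0x03FFFFFF)


# add $3,$2,$0  (opcode 0, funct 0x20)
print(f"{encode_r(rs=2, rt=0, rd=3, shamt=0, funct=0x20):08x}")
# addi $8,$0,5  (opcode 8)
print(f"{encode_i(opcode=8, rs=0, rt=8, immediate=5):08x}")
# j 0x00400000  (opcode 2)
print(f"{encode_j(opcode=2, address=0x00400000):08x}")
```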
Stack Architecture
Some classes of computers use mostly implicit stack operands. These are known as stack-based machines. They use zero-address instructions: the instructions make no explicit reference to operands.
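A toy interpreter can illustrate zero-address instructions. This is a sketch of a hypothetical stack machine, not of any real instruction set: the opcode names are invented, and `push` is the only instruction that names an operand.

```python
def run_stack_program(program):
    """Execute a list of instruction tuples on an operand stack."""
    stack = []
    for instruction in program:
        opcode = instruction[0]
        if opcode == "push":
            # push is the exception: it carries the value to place on the stack
            stack.append(instruction[1])
        elif opcode in ("add", "sub", "mul"):
            # Zero-address instructions: both operands are implicit.
            # They are popped from the stack and the result is pushed back.
            right = stack.pop()
            left = stack.pop()
            if opcode == "add":
                stack.append(left + right)
            elif opcode == "sub":
                stack.append(left - right)
            else:
                stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return stack[-1]


# (2 + 3) * 4 written in postfix order for the stack machine
program = [("push", 2), ("push", 3), ("add",), ("push", 4), ("mul",)]
print(run_stack_program(program))
```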
MIPS Addressing Modes
MIPS supports only the following addressing modes:
1. Register mode (operands in registers)
2. Immediate mode (literal contained in the instruction)
3. Base relative mode (offset + contents of base register give the memory address of the operand)
4. PC-relative mode (PC + 4 × displacement gives the branch target address, where PC is the address of the instruction following the branch)
5. Pseudo-direct mode (the rightmost 26 bits of the machine instruction are multiplied by 4 and concatenated with the upper 4 bits of the PC to yield the jump address)
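The three address computations (modes 3 to 5) can be sketched as follows. Two assumptions are made that go beyond the notes: the 16-bit offset and displacement are sign-extended, and "PC" means the already incremented program counter (address of the branch or jump plus 4), as in standard MIPS32. The function names and example addresses are illustrative.

```python
MASK32 = 0xFFFFFFFF


def sign_extend_16(value: int) -> int:
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def base_relative_address(base_register_value: int, offset16: int) -> int:
    """Base relative: offset + contents of base register."""
    return (base_register_value + sign_extend_16(offset16)) & MASK32


def branch_target(branch_address: int, displacement16: int) -> int:
    """PC-relative: (incremented PC) + 4 * displacement."""
    incremented_pc = branch_address + 4
    return (incremented_pc + 4 * sign_extend_16(displacement16)) & MASK32


def jump_target(jump_address: int, target26: int) -> int:
    """Pseudo-direct: upper 4 bits of PC concatenated with target * 4."""
    incremented_pc = jump_address + 4
    return (incremented_pc & 0xF0000000) | ((target26 & 0x03FFFFFF) << 2)


print(hex(base_relative_address(0x10010000, 0xFFFC)))  # offset of -4
print(hex(branch_target(0x00400010, 3)))
print(hex(jump_target(0x00400010, 0x0100000)))
```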
Throughput
Total work done per unit time
Clock frequency (rate)
cycles per second e.g., 4.0GHz = 4000MHz = 4.0×10^9 Hz
Instruction Set Architecture (ISA)
defines the software/hardware boundary. It includes:
• the instruction set
• the machine instruction formats
• the available addressing techniques
• the operational register set
• the format of the available data types
Complex instruction set computer (CISC)
The desire was to close the "semantic gap": the difference between the way operations are specified in an expressive high-level language such as Java or C++ and their hardware implementation.
• Have intricate addressing modes to ease processing of high-level data structures
• Using complex instructions made programs smaller
• Instructions may use multiple memory operands; each operand may be referenced using a different addressing mode
• The instructions support a large and flexible set of operations
• Instructions can vary in size from one byte to 15 or more bytes; they must be partly decoded to know how many additional bytes make up the instruction, which complicates pre-fetching instructions to speed processing
• The instructions are difficult to implement in hardware and require microprogramming
• Microcode slows down the system: after fetching each machine instruction, a series of micro-instructions must be retrieved and executed to interpret the machine instruction
• Compiler writers often avoid using CISC instructions that are unique to one machine in favor of generating groups of more standard simple instructions to perform the same task
• May have 8 or fewer registers
Clock period
duration of a clock cycle e.g., 250ps = 0.25ns = 250×10^-12 s
Control unit
generates signals that direct operations, for example, which registers to use, what ALU operation to perform, etc. The ALU actually performs the operations (such as addition, multiplication, shifting, etc.).
STATUS register
holds condition codes (negative, zero, positive, a carry, etc.)
MAR (Memory address register)
holds the address of the item in memory to be accessed.
PC (program counter register)
holds the address of the next instruction to fetch from memory
System Bus
is a collection of parallel wires that carry address, data, and control signals. External buses connect the CPU to memory and I/O devices. Internal buses carry signals between registers, the ALU and the control unit
Computer Organization
is also called the microarchitecture. It describes how the capability defined by the architecture is implemented. For example, the architecture may define 32-bit memory word transfers while the organization internally performs two 16-bit transfers. Computer models that share a common architecture may have different microarchitectures (i.e., organizations). A machine code program can run without change on different machines with the same architecture, even though organizational details may differ.
Opcode
is always the leftmost 6 bits. When the opcode is 0, the operation is determined by the function code field in the rightmost 6 bits. When the opcode is 1, the operation is determined by the rt field.
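A sketch of how a decoder might pick the field that identifies the operation, assuming the standard MIPS32 encoding: opcode 0 defers to the function code in the rightmost 6 bits, and opcode 1 defers to the rt field. The function names and the sample instruction words are illustrative.

```python
def decode_fields(word: int) -> dict:
    """Split a 32-bit MIPS instruction word into its bit fields."""
    return {
        "opcode": (word >> 26) & 0x3F,  # leftmost 6 bits
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,           # rightmost 6 bits
    }


def operation_selector(word: int) -> tuple:
    """Return (field name, value) of the field that determines the operation."""
    fields = decode_fields(word)
    if fields["opcode"] == 0:
        return ("funct", fields["funct"])
    if fields["opcode"] == 1:
        return ("rt", fields["rt"])
    return ("opcode", fields["opcode"])


print(operation_selector(0x00401820))  # add $3,$2,$0
print(operation_selector(0x05010002))  # a branch encoded with opcode 1
print(operation_selector(0x20080005))  # addi $8,$0,5
```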
Relative MIPS
is based on a standard (reference) machine. MIPS relative = (Time reference / Time unrated) × MIPS reference
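A one-function sketch of the relative MIPS formula. The timings and the reference rating below are hypothetical values chosen for illustration.

```python
def relative_mips(time_reference_s: float, time_unrated_s: float,
                  mips_reference: float) -> float:
    """MIPS relative = (time on reference / time on unrated) * MIPS reference."""
    return (time_reference_s / time_unrated_s) * mips_reference


# Hypothetical: the benchmark takes 50 s on a reference machine rated at
# 2 MIPS and 10 s on the machine being rated.
print(relative_mips(50.0, 10.0, 2.0))
```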
Peak MIPS
is based on minimum CPI. Assumes all instructions in the program have the minimum CPI. Minimum possible CPI = 1: all instructions require at least 1 clock cycle.
Cycle time
is the duration of a clock cycle (clock period). Instructions that take fewer cycles execute faster. The shorter the cycle time, the faster the execution.
Clock rate
is the number of cycles per second (Hz). Clock rate = 1 / cycle time
CPU time
o Time spent processing a given job
o Discounts I/O time, other jobs' shares
o Includes user CPU time and system CPU time
o Different programs are affected differently by CPU and system performance
CPU time = CPU Clock Cycles × Clock Cycle Time
= CPU Clock Cycles / Clock Rate
= Instruction Count × CPI × Clock Cycle Time
= Instruction Count × CPI / Clock Rate
= (Instructions / Program) × (Clock cycles / Instruction) × (Seconds / Clock cycle)
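The equivalent forms of the CPU time equation can be checked numerically. The instruction count, CPI, and clock rate below are hypothetical, and the function names are illustrative.

```python
def cpu_time_from_rate(instruction_count: float, cpi: float,
                       clock_rate_hz: float) -> float:
    """CPU time = instruction count x CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


def cpu_time_from_period(instruction_count: float, cpi: float,
                         clock_cycle_time_s: float) -> float:
    """CPU time = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * clock_cycle_time_s


# Hypothetical program: 10 billion instructions, average CPI of 2.0,
# on a 4 GHz clock (cycle time = 1 / clock rate = 250 ps).
instruction_count = 10e9
cpi = 2.0
clock_rate_hz = 4e9
clock_cycle_time_s = 1.0 / clock_rate_hz

clock_cycles = instruction_count * cpi  # Clock cycles = IC x CPI
print(f"clock cycles: {clock_cycles:.3g}")
print(f"CPU time (rate form):   {cpu_time_from_rate(instruction_count, cpi, clock_rate_hz):.2f} s")
print(f"CPU time (period form): {cpu_time_from_period(instruction_count, cpi, clock_cycle_time_s):.2f} s")
```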
Elapsed time
o Total response time, including all aspects
o Processing, I/O, OS overhead, idle time
o Determines system performance
Arithmetic Logic Unit (ALU)
performs arithmetic, logical, and shift operations on the operands and generates the results.
Microprogramming
Provides the more complex features of an instruction set.
• Microinstructions carry out the many intricate steps; this is easier to implement than hardwired logic
• A machine instruction is interpreted by a microprogram, which may consist of as many as a dozen or more microinstructions that direct the hardware in performing the required actions
• Microprograms are stored in an internal control memory on the same chip as the CPU
• Microprograms take additional time to process compared to hardwired logic
MDR (Memory data register)
receives the item read from memory or holds the item to be written into memory
Code expansion
refers to the increase in size that you get when you take a program that had been compiled for a CISC machine and re-compile it for a RISC machine. The amount of expansion depends primarily on the compiler (higher quality compilers generate less code), though the nature of the machine's instruction set also has an effect. Typically, code expansion can range up to 100% or more.
Reduced instruction set computers (RISC)
Require that most instructions use only register operands; only the load and store instructions use memory operands. That is, they employ a load/store architecture.
Characteristics:
• Instruction sets that include only simple, straightforward instructions
• Simple addressing modes
• Fixed-size machine instructions that can be fetched with one memory access
• Instruction pipelines that overlap the execution of instructions and complete one instruction per clock cycle
• Restricting the use of memory operands to only the load and store type instructions
• Hardwired logic implementation as opposed to microprogramming
• The use of optimizing compilers to generate the code to perform more complex tasks
• May have large instruction sets, but the instructions are simple in nature
• Require multiple instructions to accomplish what CISC machines perform in a single complex instruction. The compiler (or assembly language programmer) has to generate a sequence of simple instructions that perform the same actions as the individual CISC instructions. This can result in code expansion, which can be a problem.
• Tend to have 32 or more registers
Examples: MIPS, SPARC, ARM, PowerPC
Load/Store Architecture
• Compilers that generate code for RISC processors try to maximize and optimize the use of registers
• Only the load and store instructions can access memory operands
• Registers are faster to access than memory
• Less frequently used variables are spilled to memory
• This improves performance and makes the common case fast
Questions answered by the Organization
• are the instructions pipelined?
• is a single-cycle or multi-cycle datapath used?
• how many cache levels are employed?
• is secondary storage provided by magnetic disks or by flash?
• are there multiple buses?
• is the control unit microprogrammed or is hardwired logic used?
• are multiple execution units included?
• how many memory accesses are used to retrieve data or instructions?
• are vector type instructions provided by an array processor or by a vector processor?
Questions answered by the Architecture
• what types of operations are available? (integer, floating point, etc.)
• which instructions are allowed to reference memory?
• do operands have to reside in registers, or can one or more reside in memory?
• are vector type instructions and vector registers available?
• how many bits do the instruction operands require?
• is a segmented or flat memory model used?
• how many operands can instructions employ?