Digital VLSI - Hardware Section


What are the different techniques used to improve performance of instruction fetching from memory?

1) Instruction Cache and Prefetch: An instruction cache together with a prefetch algorithm keeps fetching instructions ahead of the actual decode and execute phases, which hides the memory latency of the instruction fetch stage. 2) Branch Prediction and Branch Target Prediction: A branch predictor predicts whether a conditional branch will be taken based on its history, and a branch target predictor predicts the branch target address before the processor computes it. This minimizes instruction fetch stalls, since the fetch logic can keep fetching instructions along the predicted path.
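
As a rough illustration of how prediction based on history can work, the following is a minimal sketch of a 2-bit saturating-counter branch predictor in C. The table size and the use of low PC bits as an index are assumptions made for the example, not a description of any particular processor.

#include <stdint.h>
#include <stdbool.h>

#define PRED_ENTRIES 1024                       /* hypothetical table size */

static uint8_t counters[PRED_ENTRIES];          /* 0,1 = predict not-taken; 2,3 = predict taken */

static bool predict_taken(uint32_t pc)
{
    return counters[(pc >> 2) % PRED_ENTRIES] >= 2;
}

static void update_predictor(uint32_t pc, bool taken)
{
    uint8_t *c = &counters[(pc >> 2) % PRED_ENTRIES];
    if (taken) { if (*c < 3) (*c)++; }          /* saturate at 3 (strongly taken) */
    else       { if (*c > 0) (*c)--; }          /* saturate at 0 (strongly not-taken) */
}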

What are the different types of registers implemented in a CPU?

1) Program Counter (PC): A register that holds the address of the instruction currently being executed. 2) Instruction Register (IR): A register that holds the instruction currently being executed (the value fetched from the address pointed to by the PC). 3) Accumulator: A register that holds the intermediate results of arithmetic and logic operations inside the processor. 4) General Purpose Registers: Registers that can store any transient data required by a program. The number of general purpose registers is defined by the architecture, and software (the assembler/compiler) uses them to hold temporary data during program execution; the more general purpose registers available, the fewer memory accesses are needed and the faster the execution. 5) Stack Pointer Register (SP): A special purpose register that stores the address of the most recent entry pushed onto the stack. The most typical use of a stack is to store the return address of a subroutine call, and the SP register keeps track of the top of the stack.

What are different kinds of memories in a system?

1) Register 2) Cache 3) Main Memory/Primary Memory 4) Secondary Memory (Magnetic/Optical)

What is a TLB (Translation lookaside buffer)?

A TLB is a cache that stores recent translations of virtual memory addresses to physical memory addresses so they can be retrieved faster later. If a program requests a virtual address and a match is found in the TLB, the physical address is retrieved from the TLB quickly (like a cache hit) and main memory need not be accessed. Only if the translation is not present in the TLB does a memory access need to be performed to walk the page tables for the address translation, which takes several cycles to complete. In other words, when the translation is found in the TLB, the physical address is available directly without going through the page table translation process.
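
The lookup can be pictured with a small sketch. This is a simplified direct-mapped TLB with made-up sizes (4 KB pages, 64 entries); real TLBs are typically fully or highly associative.

#include <stdint.h>
#include <stdbool.h>

#define PAGE_SHIFT   12                          /* 4 KB pages (assumption) */
#define TLB_ENTRIES  64                          /* assumption */

struct tlb_entry { bool valid; uint64_t vpn; uint64_t pfn; };
static struct tlb_entry tlb[TLB_ENTRIES];

/* Returns true on a TLB hit and fills *paddr; on a miss the caller would
   fall back to a page-table walk (not shown). */
bool tlb_lookup(uint64_t vaddr, uint64_t *paddr)
{
    uint64_t vpn = vaddr >> PAGE_SHIFT;          /* virtual page number */
    struct tlb_entry *e = &tlb[vpn % TLB_ENTRIES];
    if (e->valid && e->vpn == vpn) {
        *paddr = (e->pfn << PAGE_SHIFT) | (vaddr & ((1u << PAGE_SHIFT) - 1));
        return true;                             /* hit: no page-table walk needed */
    }
    return false;                                /* miss */
}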

What is the difference between a conditional branch and unconditional branch instruction?

A branch instruction is used to switch program flow from the current instruction to a different instruction sequence. A branch instruction can be a conditional branch instruction or an unconditional branch instruction. Unconditional Branch Instruction: A branch instruction is called unconditional if the instruction always results in branching. Example: Jump <offset> is an unconditional branch, as the result of execution will always cause the instruction sequence to start from the <offset> address. Conditional Branch Instruction: A branch instruction is called conditional if it may or may not cause branching, depending on some condition. Example: beq ra, rb, <offset> is a conditional branch instruction that checks if two source registers (ra and rb) are equal; if they are equal, it jumps to the <offset> address. If they are not equal, the instruction sequence continues in order following the branch instruction.

What is branch prediction and branch target prediction?

A branch predictor is a design block that tries to predict the outcome of a branch so that the correct instruction sequence can be pre-fetched into the instruction cache and instruction execution does not stall after encountering a branch instruction in the program. A branch predictor predicts whether a conditional branch will be taken or not taken. A branch target predictor is different: it predicts the target of a taken conditional branch or an unconditional branch instruction before the target has been computed by the execution unit of the processor.

What's the disadvantage of having a cache with more associativity?

A cache with more associativity needs more tag comparators (one per way) to compare an incoming address against the tags stored in all the ways of the set, along with a wider multiplexer to select the data from the matching way. This means more power consumption, more hardware, and potentially a longer access time.

Will there be a difference in the performance of a program which searches a value in a linked list vs a vector on a machine that has cache memory present?

A linked list is a data structure that stores its elements in non-contiguous memory locations, while a vector is a data structure that stores elements in contiguous locations. For a design with a cache memory: if one memory location is present in the cache, it is highly likely that the following (contiguous) bytes are also present in the cache, because any fetch from main memory into the cache is done in units of cache lines (generally 64 or 128 bytes). Because of this, searching through a vector is usually faster than searching through a linked list on a machine that has a cache memory.
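
A small C sketch of the two searches makes the access pattern difference explicit; the node layout is illustrative.

#include <stddef.h>

struct node { int value; struct node *next; };

/* Contiguous scan: consecutive elements share cache lines (cache friendly). */
int search_array(const int *a, size_t n, int key)
{
    for (size_t i = 0; i < n; i++)
        if (a[i] == key) return (int)i;
    return -1;
}

/* Pointer chasing: each node may live on a different cache line (cache unfriendly). */
const struct node *search_list(const struct node *head, int key)
{
    for (const struct node *p = head; p != NULL; p = p->next)
        if (p->value == key) return p;
    return NULL;
}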

What is a pipeline hazard? What are the different types of hazards in a pipelined microprocessor design?

A pipeline hazard is a situation where the next instruction in a program cannot be executed in the following cycle for some reason. There are three types of hazards in a pipelined microprocessor: 1) Structural Hazards: These arise because of resource conflicts that prevent overlapped execution. For example, if the design has a single floating point execution unit and each execution takes 2 clock cycles, then back to back floating point instructions in the program will cause the pipeline to stall. Another resource that can conflict is memory/cache access. 2) Data Hazards: These arise when an instruction depends on the result of a previous instruction in a way exposed by the overlapped execution of the pipeline. There are three types of data hazards: a) Read after Write (RAW) - an instruction needs a source that is written by a previous instruction. b) Write after Write (WAW) - an instruction writes to a register that is also written by a previous instruction. c) Write after Read (WAR) - an instruction writes to a register that is a source for a previous instruction. 3) Control Hazards: These arise because of branch and jump instructions that change the sequence of program execution.

What is meant by a superscalar pipelined processor?

A superscalar pipelined design uses instruction level parallelism to enhance processor performance. Using this technique, a processor can execute more than one instruction per clock cycle by simultaneously dispatching multiple instructions to different execution units on the processor. If the processor can execute "N" instructions in parallel in a cycle, it is called N-way superscalar.

What is a vectored interrupt?

A vectored interrupt is a type of interrupt in which the interrupting device directs the processor to the correct interrupt service routine using a code that is unique to the interrupt and is sent by the interrupting device to the processor along with the interrupt. For non-vectored interrupts, the first level interrupt service routine needs to read interrupt status registers to decode which of the possible interrupt sources caused the interrupt, and accordingly decide which specific interrupt service routine to execute.

What is the concept of paging?

All virtual memory implementations divide the virtual address space into pages, which are blocks of contiguous virtual memory addresses. A page is the minimum granularity at which memory is moved between secondary storage and physical memory when managing virtual memory. Pages on most computer systems are usually at least 4 kilobytes in size; some architectures also support large page sizes (such as 1 MB or 4 MB) when much larger regions of memory need to be mapped. Page tables are used to translate the virtual addresses seen by the application into physical addresses. A page table is a data structure that stores the translation details from virtual addresses to physical addresses for the pages in memory.
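
A minimal single-level translation sketch in C illustrates the page number/offset split; real systems use multi-level page tables, and the names and 4 KB page size here are assumptions.

#include <stdint.h>
#include <stdbool.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1u << PAGE_SHIFT)            /* 4 KB pages (assumption) */

struct pte { bool present; uint64_t pfn; };      /* one page-table entry */

/* Translate vaddr using a flat page table; returns false on a page fault. */
bool translate(const struct pte *page_table, uint64_t vaddr, uint64_t *paddr)
{
    uint64_t vpn    = vaddr >> PAGE_SHIFT;       /* virtual page number    */
    uint64_t offset = vaddr & (PAGE_SIZE - 1);   /* offset within the page */
    if (!page_table[vpn].present)
        return false;                            /* would raise a page fault */
    *paddr = (page_table[vpn].pfn << PAGE_SHIFT) | offset;
    return true;
}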

What is a cache?

Cache is a small amount of fast memory. It sits between the main memory and the CPU. It may also be located on a CPU chip/module.

What is the difference between a SRAM and a DRAM?

DRAM stands for Dynamic Random Access Memory. It is a type of memory in which data is stored in the form of charge. Each memory cell in a DRAM is made of a transistor and a capacitor, and the data is stored in the capacitor. DRAMs are volatile devices because the capacitor can lose its charge due to leakage; hence, to keep the data in memory, the device must be refreshed regularly. SRAM, on the other hand, is a static memory that retains a value as long as power is supplied. SRAM is typically faster than DRAM, partly because it does not need refresh cycles. Each SRAM memory cell is made up of 6 transistors (unlike a DRAM memory cell, which is made of 1 transistor and 1 capacitor), so the cost per memory cell is higher for SRAM. In terms of usage, SRAMs are used in caches because of their higher speed, and DRAMs are used for main memory in a PC because of their higher density.

A pipelined machine has 10 stages. Each stage takes 1 ns to process a data element. Assuming there are no hazards, calculate the time taken to process 100 data elements by the machine.

Each stage of pipeline takes 1ns to process a data element. Since there are 10 stages, the first element takes 10 * 1ns to come out of the pipeline and by that time, the pipeline would be full and all other 99 elements would only take 1ns each. Hence total time taken = (10+99) ns = 109 ns
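
In general, for a k-stage pipeline with a per-stage delay of t and n data elements, and assuming no stalls, the total time is (k + n - 1) * t. For this question that gives (10 + 100 - 1) * 1 ns = 109 ns, matching the answer above.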

Explain the concept of Little Endian and Big Endian formats in terms of memory storage?

Endianness refers to the order in which the bytes of a multi-byte value are stored in memory (it also applies to digital transmission systems, where it describes the byte order used for transmission). Memory is normally byte addressable, but the majority of computer architectures operate on 32-bit (word size, 4 byte) operands. Hence, for storing a word into a byte addressable memory there are two options: 1) Store the most significant byte of the word at the smaller address; this is the Big Endian format. 2) Store the least significant byte of the word at the smaller address; this is the Little Endian format. For example, if a CPU writes the word 0xDDCCBBAA to an address starting at 0x1000 (address range 0x1000 to 0x1003), the bytes are stored as follows. Big Endian: 0x1000=0xDD, 0x1001=0xCC, 0x1002=0xBB, 0x1003=0xAA. Little Endian: 0x1000=0xAA, 0x1001=0xBB, 0x1002=0xCC, 0x1003=0xDD.
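
The same example can be checked on a real machine with a short C program; the printed order depends on the endianness of the host it is run on.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t word = 0xDDCCBBAAu;
    const uint8_t *bytes = (const uint8_t *)&word;

    /* On a little-endian host this prints AA BB CC DD (LSB at the lowest
       address); on a big-endian host it prints DD CC BB AA. */
    for (int i = 0; i < 4; i++)
        printf("byte %d: 0x%02X\n", i, bytes[i]);
    return 0;
}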

What are the different algorithms used for cache line replacement in a set-way associative cache?

Following are some of the algorithms that can be implemented for cache line replacement: 1) LRU (Least Recently Used): Keeps track of when each cache line was used by associating "age bits" with the line and discards the least recently used line when a replacement is needed. 2) MRU (Most Recently Used): The opposite of LRU; the line that was most recently used is the one that gets replaced. 3) PLRU (Pseudo LRU): Similar to LRU, except that instead of full age bits (which are costly for larger, highly associative caches), only one or two bits per line are used to approximate the usage order. 4) LFU (Least Frequently Used): Keeps track of how often a line is accessed and replaces the line that has been used the fewest times. 5) Random replacement: No usage information is stored; a random line is picked when a replacement is needed.
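
As an illustration of the first policy, here is a minimal LRU sketch for a single set of a 4-way cache, using per-way age counters; the structures and the 4-way width are assumptions made for the example.

#include <stdint.h>
#include <stdbool.h>

#define WAYS 4                                    /* assumption: 4-way set */

struct way { bool valid; uint32_t tag; uint32_t age; };   /* larger age = older */

/* Pick a victim way: an invalid way first, otherwise the oldest one. */
int pick_lru_victim(const struct way set[WAYS])
{
    int victim = 0;
    for (int w = 0; w < WAYS; w++) {
        if (!set[w].valid) return w;
        if (set[w].age > set[victim].age) victim = w;
    }
    return victim;
}

/* On an access that hits 'hit_way', reset its age and age all other ways. */
void touch_way(struct way set[WAYS], int hit_way)
{
    for (int w = 0; w < WAYS; w++) set[w].age++;
    set[hit_way].age = 0;
}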

What are the different types of addressing modes for an instruction?

Following are some of the most commonly used addressing modes (several other modes may also be supported by particular architectures): 1) Immediate mode: The operand is part of the instruction itself as a constant. Example: add r0, r1, 0x12 (add the constant 0x12 to the contents of r1 and store the result in r0). 2) Direct addressing mode: The address of the operand is specified directly in the instruction. Example: load r0, 0x10000 (load data from address 0x10000 into register r0). 3) Register addressing mode: The operands are placed in registers and the register names are specified as part of the instruction. Example: mul r0, r1, r2 (multiply the contents of r1 and r2 and store the result in r0). 4) Indexed addressing mode: The contents of an index register are added to an offset (which is part of the instruction) to form the effective address. Example: load r0, r1, offset (here r1 contains the base address, and "r1 + offset" gives the address of the memory location from which data is read and stored into r0).

What techniques can be used to avoid each of the 3 types of pipeline hazards - Structural, Data and Control Hazards?

Following are some of the techniques that are used to avoid each of the pipeline hazards: 1) Structural Hazards: a) Duplicating resources to enable parallel execution - separating instruction and data caches, having multiple execution units for integer and floating point operations, separate load and store units, etc. 2) Data Hazards: a) Out of order execution - Instructions which are not dependent on each other can execute while the dependent instruction stalls. b) Data forwarding - For RAW hazards, the write from an instruction can be forwarded to the next dependent instruction to eliminate the hazard. 3) Control Hazards: a) Use branch prediction algorithms to make predictions about branch outcome so that correct set of instructions can be fetched following the branch.

What is the difference between snoop based and directory based cache coherency protocol?

Following is the difference between the two types of cache coherency protocols: 1) Snoop based coherence protocol: A request for data from a processor is sent to all other processors that are part of the shared system. Every other processor snoops this request, checks whether it has a copy of the data, and responds accordingly. In this way every processor helps maintain a coherent view of the memory. 2) Directory based coherence protocol: A directory is used to track which processors are accessing and caching which addresses. A processor making a new request checks this directory to find out whether any other agent holds a copy, and can then send a point to point request to that agent to get the latest copy of the data. In terms of trade-offs, snoop based protocols are simpler and have low latency for small systems, but the broadcast traffic scales poorly as the number of processors grows; directory based protocols scale better because requests are point to point, but they add the latency of a directory lookup and the storage overhead of the directory itself.

What is the problem of cache coherency?

In symmetric multiprocessor (SMP) systems where multiple processors have their own caches, multiple copies of the same data (the same address) can exist in different caches simultaneously. If each processor is allowed to update its cache freely, this can result in an inconsistent view of memory. This is known as the cache coherency problem. For example, if two processors are allowed to write values to the same address, then reads of that address on different processors might see different values.

What is the difference between Von-Neumann and Harvard Architecture and which would you prefer?

In the Von Neumann architecture, a single memory holds both data and instructions. Typically this means there is a single bus from the CPU to memory that carries both data and instruction accesses, and a unified cache for both. In the Harvard architecture, memory is separate for data and instructions, so there can be two separate buses that access data and instruction memory simultaneously, along with separate instruction and data caches. The pure Von Neumann model is older; most modern processors use a modified Harvard approach, with separate instruction and data caches close to the core backed by a unified main memory, which gives the bandwidth benefit of Harvard while keeping the flexibility of a single memory.

What's the advantage of using one-hot coding in design?

In one-hot encoding, exactly one flip-flop is set for each state, so on any state transition only two bits change: one is cleared and one is set. The advantage is that no decode logic is needed to know which state you are in. It uses more flip-flops but less combinational logic, and in timing-critical logic, not having the decode logic can make the difference.

What is the difference between in-order and out-of-order execution?

In-Order Execution: In this model, instructions are fetched, executed and completed in the order in which they appear in the program. In this mode of execution, if one instruction stalls, all the instructions behind it also stall. Out-of-Order Execution: In this model, instructions are fetched in program order, their execution can happen in any order, and their completion (retirement) again happens in program order. The advantage of this model is that if one instruction stalls, independent instructions behind it can still execute, speeding up the overall execution of the program.

What is the difference between an inclusive and exclusive cache?

Inclusive and exclusive properties of caches apply to designs with multiple levels of caches (for example L1, L2 and L3 caches). If all the addresses present in the L1 (Level 1) cache are guaranteed to also be present in the L2 (Level 2) cache, the L2 cache is said to be strictly inclusive of L1. If an address is guaranteed to be in at most one of the L1 and L2 caches and never in both, the caches are called exclusive. One advantage of exclusive caches is that together the levels can store more data. One advantage of an inclusive cache is that in a multiprocessor system, when checking whether a cache line needs to be removed from another processor's caches, only the L2 cache has to be checked, whereas with exclusive caches both L1 and L2 must be checked.

What are interrupts and exceptions and how are they different?

An interrupt is an asynchronous event that is typically generated by external hardware (an I/O device or other peripheral) and is not synchronized with instruction execution boundaries. For example, an interrupt can come from a keyboard, a storage device or a USB port. Interrupts are serviced after the current instruction execution is over, and the CPU then jumps to the interrupt service routine. Exceptions are synchronous events generated when the processor detects a predefined condition while executing instructions; for example, a divide by zero or an undefined instruction generates an exception. Exceptions are further divided into three types, and how the program flow is altered depends on the type: 1) Faults: Faults are detected and serviced by the processor before the faulting instruction completes. 2) Traps: Traps are serviced after the instruction causing the trap; the most common trap is a user defined interrupt used for debugging. 3) Aborts: Aborts are used only to signal severe system problems, when execution cannot continue any longer.

What is the principle of spatial and temporal locality of reference?

Locality of reference is the principle that when a program accesses a memory location, the same location or nearby locations are likely to be accessed again soon. There are two types of locality of reference: 1) Temporal locality: If a particular memory location is referenced at one point in time, it is likely that the same location will be referenced again in the near future. 2) Spatial locality: If a particular memory location is referenced at a particular time, it is likely that nearby memory locations will be referenced in the near future.
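
A simple loop shows both kinds of locality at once; the array size is arbitrary.

#define N 1024

int sum_array(const int a[N])
{
    int sum = 0;                   /* 'sum' and 'i' are reused on every
                                      iteration: temporal locality        */
    for (int i = 0; i < N; i++)
        sum += a[i];               /* a[0], a[1], a[2], ... are adjacent
                                      in memory: spatial locality          */
    return sum;
}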

What is meant by memory mapped I/O?

Memory Mapped I/O (MMIO) is a method of performing input/output (I/O) between the CPU and I/O or peripheral devices. The CPU uses the same address bus to access both memory and I/O devices (the registers inside an I/O device or any memory inside the device). In the system address map, a memory region is reserved for the I/O device, and when an address in this region is accessed by the CPU, the I/O device that monitors the address bus responds to the access. For example, if a CPU has a 32-bit address bus, it can access addresses from 0 to 2^32 - 1, and within this range some addresses (say 0 to 2^10 - 1) can be reserved for one or more I/O devices.
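
From software, a memory-mapped device register is typically accessed through a volatile pointer. The base address and register layout below are made up for illustration; the real values come from the system's address map and the device's documentation.

#include <stdint.h>

#define UART_BASE   0x00000400u                       /* hypothetical device address */
#define UART_STATUS (*(volatile uint32_t *)(UART_BASE + 0x0))
#define UART_TXDATA (*(volatile uint32_t *)(UART_BASE + 0x4))

void uart_putc(char c)
{
    while ((UART_STATUS & 0x1u) == 0)                 /* poll a hypothetical TX-ready bit */
        ;
    UART_TXDATA = (uint32_t)c;                        /* the store goes to the device, not DRAM */
}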

Explain the concept of pipelining in computer architecture?

Pipelining is a technique that implements a form of parallelism called instruction-level parallelism within a single processor. The basic instruction cycle is broken up into a series of steps called a pipeline. Rather than processing each instruction sequentially (finishing one instruction before starting the next), each instruction is split up into a sequence of steps so different steps can be executed in parallel and instructions can be processed concurrently (starting one instruction before finishing the previous one). Pipelining increases instruction throughput by performing multiple operations at the same time, but does not reduce instruction latency, which is the time to complete a single instruction from start to finish, as it still must go through all steps. For example, an instruction life cycle can be divided into five stages - Fetch, Decode, Execute, Memory access, and Write back. This allows the processor to work on several instructions in parallel.

What is a RFO?

RFO stands for Read for Ownership. It is an operation in cache coherency protocol that combines a read and invalidate broadcast. It is issued by a processor trying to write into a cache line that is in the Shared or Invalid states. This causes all other processors to set the state of that cache line to I. A read for ownership transaction is a read operation with intent to write to that memory address. Hence, this operation is exclusive. It brings data to the cache and invalidates all other processor caches that hold this memory address.

A byte addressable CPU with a 16-bit address bus has a cache with the following characteristics: a) it is direct-mapped with a block size of 1 byte, and b) the cache index is 4 bits wide. How many blocks does the cache hold? How many tag bits need to be stored as part of each cache block?

Since the index for blocks in cache is 4 bits, there will be a total of 16 blocks in the cache. Given a 16 bit address and block size of 1 byte, address [3:0] will be used to index into the 16 blocks in cache and remaining bits address[15:4] will be used as tag bits.
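
The bit split can be written out explicitly as a small sketch for this example (16-bit address, 1-byte blocks, 4-bit index).

#include <stdint.h>

/* Split a 16-bit address into cache index and tag for the example above. */
void split_address(uint16_t addr, uint16_t *index, uint16_t *tag)
{
    *index = addr & 0xF;           /* address[3:0]  -> one of 16 blocks */
    *tag   = addr >> 4;            /* address[15:4] -> 12 tag bits      */
}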

A computer has a memory of 256 Kilobytes. How many address bits are needed if each byte location needs to be addressed?

Since the total memory size is 256 KB (2^8 * 2^10 = 2^18 bytes), 18 address bits are needed to address each byte location.

What is a MESI protocol?

The MESI protocol is the most commonly used protocol for cache coherency in designs with multiple write-back caches. MESI stands for the states that are tracked per cache line in all the caches and are used to respond to snoop requests. The states are: 1) M (Modified): The cache line data is modified (dirty) with respect to the data in main memory. 2) E (Exclusive): The cache line data is clean with respect to memory but is present only in this cache. The exclusive property allows the processor owning this cache to write to the line without notifying other caches. 3) S (Shared): The cache line data is shared by multiple caches with the same value and is clean with respect to memory. Since the line may be present in other caches, the protocol does not allow a write while the line is in this state. 4) I (Invalid): The cache line is invalid and does not hold any valid data. A cache can service a read request when the cache line is in any state other than Invalid. A cache can service a write request only when the cache line is in the Modified or Exclusive state.
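
The state transitions for a few common events can be sketched as below. This is a deliberately simplified illustration (bus transactions, write-backs and data movement are omitted), not a complete description of the protocol.

typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } mesi_t;

/* Local read: on a miss, fetch the line; E if no other cache has it, else S. */
mesi_t on_local_read(mesi_t s, int other_caches_have_copy)
{
    if (s == INVALID)
        return other_caches_have_copy ? SHARED : EXCLUSIVE;
    return s;                                   /* M/E/S can all service reads */
}

/* Local write: S and I must first gain ownership (e.g. via an RFO), then modify. */
mesi_t on_local_write(mesi_t s)
{
    (void)s;
    return MODIFIED;
}

/* Another cache reads the line: M and E drop to S (M also writes data back). */
mesi_t on_snoop_read(mesi_t s)
{
    return (s == INVALID) ? INVALID : SHARED;
}

/* Another cache wants to write the line: this copy is invalidated. */
mesi_t on_snoop_write(mesi_t s)
{
    (void)s;
    return INVALID;
}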

What is the difference between a virtual memory address and a physical memory address?

The address used by a software program or process to access memory locations in its address space is known as a virtual address. The operating system, along with hardware, translates this into another address that is used to actually access the main memory (DRAM) location; this address is known as the physical address. The address translation is done using the concept of paging, and if the main memory does not contain the requested location, data is moved from secondary memory (such as disk) into main memory with OS assistance.

If a CPU is busy executing a task, how can we stop it and run another task?

The program execution on a CPU can be interrupted using an external interrupt source (for example, a timer interrupt). When the interrupt is taken, the CPU saves the current context and jumps to an interrupt service routine, from which an operating system scheduler can switch execution to another task.

What are the different mechanisms used for mapping memory to caches? Compare the advantages and disadvantages of each.

There are three main mapping techniques used for mapping main memory to caches. In each of them, main memory and the cache are divided into blocks (also known as cache lines, generally 64 bytes), which is the minimum granularity of the mapping. For illustration, assume a cache of 128 blocks, a main memory of 4096 blocks, a 64-byte block size and a 32-bit address.
1) Direct Mapping: There is a fixed one-to-one mapping between a block of main memory and a block of cache memory. Block 0 of main memory always maps to Block 0 of the cache, Block 1 to Block 1, ..., and Block 127 to Block 127. Block 128 again maps to Block 0, Block 129 to Block 1, and so on; in general, block "k" of main memory maps to block "k modulo 128" of the cache. With a 64-byte block and a 32-bit address, address [5:0] indexes a byte within the block, address [12:6] identifies which cache block the address maps to, and the remaining bits, address [31:13], are stored as tag bits along with the data. This is the simplest mapping: the possible cache location is computed directly from the address, and a single tag comparison tells whether the access is a hit or a miss. The disadvantage is that even when the cache is not full, an access pattern whose addresses map to the same block causes repeated evictions and is inefficient.
2) Fully Associative Mapping: Any block of memory can be mapped to any block of the cache. Using the same example, address [5:0] indexes inside the block and all remaining bits, address [31:6], are stored as tag bits along with the data. A lookup must compare address [31:6] against the tags of every cache block, which requires a large comparator structure that consumes more power. The advantage is that the cache can be fully utilized, since any memory block can map to any cache block.
3) Set Associative Mapping: The cache blocks are grouped into sets. For example, the same 128-block cache can be organized as 64 sets of 2 blocks each, which is called a 2-way set associative cache. A main memory block is direct mapped to a set and can then be placed in any block (way) within that set. With the same 32-bit address, address [5:0] indexes a byte within the block, address [11:6] selects one of the 64 sets, and the remaining bits, address [31:12], are stored as tag bits along with each cache line.
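
The address breakdown for the 2-way set-associative example can be written out explicitly; the field widths follow directly from the 64-byte line size and 64 sets.

#include <stdint.h>

void split_set_assoc(uint32_t addr,
                     uint32_t *byte_offset, uint32_t *set, uint32_t *tag)
{
    *byte_offset = addr & 0x3F;          /* address[5:0]  : byte within the 64 B line */
    *set         = (addr >> 6) & 0x3F;   /* address[11:6] : one of 64 sets            */
    *tag         = addr >> 12;           /* address[31:12]: stored as tag bits        */
}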

What are MESIF and MOESIF protocols?

These states appear in extensions of the MESI protocol: MESIF adds the "F" state and MOESI adds the "O" state (MOESIF has both). 1) F (Forward): The F state is a specialized form of the S state and indicates that this cache should act as the designated responder, forwarding data for any request for the given line. If multiple caches in the system hold the same line in the S state, one of them is designated to hold it in the F state and forwards data for new requests from other processors. The protocol ensures that if any caches hold a line in the S state, at most one cache holds it in the F state. This state helps reduce traffic to memory, because without the F state, even if a cache line is in the S state in multiple caches, none of them can forward data to a different processor requesting a read or a write (an S state line can only service its own processor's reads). 2) O (Owned): The O state was introduced to move modified (dirty) data around different caches in the system without having to write it back to memory. A line can transition from the M state to the O state when the line becomes shared with other caches, which hold it in the S state. The O state helps defer writing the modified data back to memory until it is really needed.

A 4-way set associative cache has a total size of 256KB. If each cache line is of size 64 bytes, how many sets will be present in the cache? How many address bits are needed as tag bits? Assume address size as 32 bits.

Total number of blocks in cache = 256K/64 = 4096. Since the cache is 4 way set associative, number of sets = 4096/4 = 1024 sets. Given a 32 bit address and 64 byte cache line, address [5:0] is used to index into cache line, address [15:6] is used to find out which set the address maps to (10 bits) and remaining address bits [31:16] are used as tag bits.

What is the concept of Virtual memory?

Virtual memory is a memory management technique that allows a processor to see a contiguous virtual address space even if the actual physical memory is smaller. The operating system manages the virtual address spaces and the assignment of memory from the secondary device (such as disk) to physical main memory. Address translation hardware in the CPU, often referred to as the memory management unit or MMU, translates virtual addresses to physical addresses. This translation uses the concept of paging, where a contiguous block of memory addresses (known as a page) is the unit of mapping between virtual memory and physical memory.

What is meant by page fault?

When a program accesses a memory page that is mapped into its virtual address space but is not currently loaded into main memory, the hardware (the Memory Management Unit, MMU) raises an exception. This exception is called a page fault, and the operating system services it by bringing the required page into main memory.

What is a cache miss or hit condition?

When an address is looked up in the cache and the cache contains that memory location, it is known as a cache hit. When the address looked up is not found in the cache, it is known as a cache miss.

Give an overview of Cache Operation. What's the principle behind functioning of cache?

Whenever the CPU requests the contents of a memory location, the cache is checked for this data first. If the data is present in the cache, the CPU gets it directly from the cache; this is faster, as the CPU does not need to go to main memory for the data. If the data is not present in the cache, a block of memory is read from main memory into the cache and is then delivered from the cache to the CPU in words. The cache includes tags to identify which block of main memory is held in each cache slot.

What is difference between write-thru and write-back caches? What are the advantages and disadvantages?

Write-Thru Cache: In a write-thru cache, every write to the cache is also written to main memory. This is simple to design, and memory is always up to date with respect to the cache, but it has the drawback that memory bandwidth is consumed on every write. Write-Back Cache: In a write-back cache, every write is made only to the cache; the write to main memory is deferred until the cache line is evicted or discarded from the cache. Write-back caches are better in terms of memory bandwidth, since data is written back only when needed. The complexity lies in maintaining coherence when there are multiple caches in the system that can cache the same address, because memory may not always have the latest data.
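
The two policies can be contrasted with a small sketch for a single cache line; the structures are illustrative and main_memory is just a placeholder backing store.

#include <stdint.h>
#include <stdbool.h>
#include <string.h>

#define LINE_SIZE 64

struct line { bool valid, dirty; uint8_t data[LINE_SIZE]; };
extern uint8_t main_memory[];                    /* placeholder backing store */

void write_through(struct line *l, uint64_t addr, uint8_t byte)
{
    l->data[addr % LINE_SIZE] = byte;
    main_memory[addr] = byte;                    /* memory updated on every write   */
}

void write_back(struct line *l, uint64_t addr, uint8_t byte)
{
    l->data[addr % LINE_SIZE] = byte;
    l->dirty = true;                             /* memory updated only on eviction */
}

void evict(struct line *l, uint64_t line_base_addr)
{
    if (l->dirty)                                /* write-back case: flush dirty data */
        memcpy(&main_memory[line_base_addr], l->data, LINE_SIZE);
    l->valid = false;
    l->dirty = false;
}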

What is the difference between a RISC and CISC architecture?

● RISC architectures have a smaller number of instructions, and these instructions are simple (fixed-length instructions, fewer addressing modes). CISC architectures have a larger number of instructions, and these instructions are complex in nature (variable-length instructions, more addressing modes).
● The RISC approach is to have simpler instructions and less complex hardware, whereas the CISC approach is to have more complex hardware that can decode and break down complex instructions. Hence, in RISC architectures the emphasis is more on software, whereas in CISC architectures the emphasis is more on hardware.
● Since CISC has complex hardware, programs need fewer instructions and hence less RAM to store them. As RISC has less complex hardware, RISC programs use more instructions and hence more RAM to store them.
● Instructions in a RISC architecture usually take one clock cycle to finish, whereas instructions in a CISC architecture can take multiple clock cycles depending on their complexity. This makes pipelining much easier in RISC architectures.
● RISC architectures aim to improve performance by reducing the number of cycles per instruction, whereas CISC architectures attempt to improve performance by minimizing the number of instructions per program. CISC architectures support single instructions that can read from memory, perform an operation and store the result back to memory (memory-to-memory operations). RISC architectures instead need multiple instructions: 1) load the values from memory into registers, 2) perform the intended operation, and 3) write the register result back to memory.
Example: to multiply two numbers stored at memory locations M1 and M2 and store the result back in M1, a single CISC instruction may suffice: MULT M1, M2. A RISC machine would need a sequence such as: LOAD A, M1; LOAD B, M2; PROD A, B; STORE M1, A.
Having listed the differences above, it is worth pointing out that with modern micro-architectures, many CISC implementations internally translate complex instructions into simpler micro-operations first.

