Computer Architecture Final
CH 5 - 14. What is an address mode?
Addressing modes are used to identify the instruction operand. An addressing mode can specify a constant, a register or a location in memory.
CH 7 - 1. State Amdahl's Law in words.
Amdahl's law states that the overall speedup of a computer system depends on both the speedup of a particular component and how much the system uses that component (the fraction of time the component is in use).
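Amdahl's law is usually written as S = 1 / ((1 − f) + f/k), where f is the fraction of time the system uses the component and k is that component's speedup. The following is a minimal Python sketch of that formula. The function name `amdahl_speedup` is hypothetical, and the example inputs come from the CH7 - 8 disk-upgrade question in these notes.

```python
def amdahl_speedup(f: float, k: float) -> float:
    """Overall speedup when a fraction f of the work is made k times faster."""
    if not 0.0 <= f <= 1.0 or k <= 0:
        raise ValueError("f must be in [0, 1] and k must be positive")
    # The unimproved part (1 - f) still takes its full time;
    # the improved part f now takes f / k of its original time.
    return 1.0 / ((1.0 - f) + f / k)


# Example: 40% of the time is disk activity and the disks become 2.5x faster.
print(round(amdahl_speedup(0.40, 2.5), 4))  # 1.3158
```

No matter how large k becomes, the speedup can never exceed 1 / (1 − f). This is why the amount a component is used matters as much as how much faster it gets.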
CH 7 - 9. What is a bus master?
A bus master is the device that drives the address bus and the bus control signals; it initiates and controls transfers on the bus.
CH 7 - 8. How does direct memory access (DMA) work?
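Direct memory access (DMA) is a feature that allows hardware subsystems in a computer to access system memory independently of the CPU. The CPU sets up the transfer by giving the DMA controller the memory address, the device, and the number of bytes to move. The DMA controller then carries out the transfer on its own and signals the CPU with an interrupt when it finishes, which leaves the CPU free for other work in the meantime.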
CH 6 - 27. Explain the differences among L1, L2, and L3 cache.
L1 is the smallest and fastest cache, and it is searched first. If the required data is not found in L1, the L2 cache is searched next. L3 is an extra cache, normally located between the processor and main memory.
CH 7 - 30. Magnetic disks store bytes by changing the polarity of a magnetic medium. How do optical disks store bytes?
Optical disks store binary data as microscopic differences in the height of the disk surface, called pits and lands. Pits reflect the laser light differently than lands do, and the drive translates this difference in reflection into bits (1s and 0s). These bits are then grouped into the bytes that make up the disk's data.
CH 7 - 6. What is polling?
Polling is the continuous, active checking of an external device's status by a client program. In programmed I/O, the CPU continually monitors the control register associated with each I/O port. This regular inspection of the I/O module by the CPU is called polling.
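The following toy sketch shows a polling loop. It uses a hypothetical `FakeDevice` whose status register sets a READY bit after a few reads. The class, the bit value, and the delay are all invented for illustration and do not model any real hardware interface. The CPU reads the status over and over until the device is ready, and does no useful work in between.

```python
import itertools


class FakeDevice:
    """Hypothetical device whose status register reports READY after a few reads."""

    READY = 0x01  # assumed bit position of the "data ready" flag

    def __init__(self, ready_after: int, data: int):
        self._countdown = ready_after
        self._data = data

    def read_status(self) -> int:
        # Each status read models one poll; the device becomes ready after a delay.
        if self._countdown > 0:
            self._countdown -= 1
            return 0
        return self.READY

    def read_data(self) -> int:
        return self._data


def poll_and_read(device: FakeDevice) -> tuple[int, int]:
    """Busy-wait on the status register, then read the data (programmed I/O)."""
    for polls in itertools.count(1):
        if device.read_status() & FakeDevice.READY:
            return device.read_data(), polls
        # Not ready yet: the CPU does no useful work and simply checks again.


value, polls = poll_and_read(FakeDevice(ready_after=3, data=42))
print(value, polls)  # 42 4
```

Interrupt-driven I/O removes this busy-wait by letting the device signal the CPU instead.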
CH7 -21. Define the terms seek time, rotational delay, and transfer time. Explain their relationship.
Seek time is the time it takes for the disk arm to position itself over the required track. Rotational delay is the time it takes for the required sector to rotate into position under the read/write head. Together they make up the access time: access time = seek time + rotational delay. Transfer time is the access time plus the time it takes to actually read the data from the disk: transfer time = access time + read time.
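The following sketch puts these relationships into code. It borrows the 5 ms seek time and 5,000 rpm figures from the CH7 - 28 drive question in these notes and assumes a read of one sector out of 128 per track. It also assumes the average rotational delay is half a revolution. All function names are hypothetical.

```python
def ms_per_revolution(rpm: float) -> float:
    return 60_000 / rpm  # 60,000 ms per minute


def rotational_delay_ms(rpm: float) -> float:
    # Assumption: on average the sector is half a revolution away from the head.
    return ms_per_revolution(rpm) / 2


def access_time_ms(seek_ms: float, rpm: float) -> float:
    # access time = seek time + rotational delay
    return seek_ms + rotational_delay_ms(rpm)


def transfer_time_ms(seek_ms: float, rpm: float, sectors: int, sectors_per_track: int) -> float:
    # transfer time = access time + time for the requested sectors to pass under the head
    read_ms = ms_per_revolution(rpm) * sectors / sectors_per_track
    return access_time_ms(seek_ms, rpm) + read_ms


print(access_time_ms(5, 5000))            # 11.0
print(transfer_time_ms(5, 5000, 1, 128))  # 11.09375
```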
CH 6 - 2. What are the advantages of using DRAM for main memory?
DRAM is the preferred choice for main memory because it consumes less power, generates less heat, and can store more bits per chip than SRAM.
CH 7 - 24. By how much is an SSD faster than a magnetic disk?
Solid State Drives (SSDs) are typically 1000 times faster than the traditional magnetic disks.
CH 5 - 4. If a byte-addressable machine with 32-bit words stores the hex value 98765432, indicate how this value would be stored on a little endian machine and on a big endian machine. Why does "endian-ness" matter?
The term endian-ness refers to a computer architecture's "byte order", or the way the computer stores the bytes of a multiple-byte data element. Little endian machines store the least significant byte first (at the lowest address), so the hexadecimal value 98765432 would be stored as: 32 54 76 98. Big endian machines store the most significant byte first, so the same value would be stored as: 98 76 54 32. Endian-ness matters whenever multibyte data is exchanged between machines (for example, in binary files or network messages) or accessed one byte at a time. If the byte order is not known and converted, the value is misinterpreted.
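Python's built-in `int.to_bytes` makes it easy to compare the two byte orders. In this sketch the helper name `byte_layout` is hypothetical, and list position 0 corresponds to the lowest address (offset +0). The second value is the 0x14148888 example from the CH 5 - 2 question.

```python
def byte_layout(value: int, order: str) -> list[str]:
    """Return the bytes of a 32-bit value as hex strings, lowest address first."""
    return [f"{b:02X}" for b in value.to_bytes(4, byteorder=order)]


for word in (0x98765432, 0x14148888):
    print(hex(word), "little:", byte_layout(word, "little"), "big:", byte_layout(word, "big"))
```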
CH 7 - 27. What is wear leveling, and why is it needed for SSDs?
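Wear leveling is a technique used to extend the life of erasable storage media such as solid-state drives (SSDs). It distributes data and erase/write cycles evenly over the entire disk. It is needed because the flash memory that SSDs use to store data can endure only a limited number of erase/write cycles per block. Without wear leveling, frequently rewritten blocks would wear out long before the rest of the drive.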
CH 9 - 10. Suppose that a RISC machine uses 5 register windows. d) Now suppose one more procedure is called. How many register windows need to be stored in memory?
When one more procedure is called, the input, local and output values have to be saved.
CH 9 - 10. Suppose that a RISC machine uses 5 register windows. c) Now suppose that the most recently called procedure returns. Explain what occurs.
When the most recently called procedure returns the previously saved registers are restored.
CH 6 - 18. What is the worst-case cache behavior that can develop using LRU and FIFO cache replacement policies?
LRU and FIFO are reliable in most cases, but both share one significant problem: they can degenerate and cause the system to thrash, meaning certain blocks are constantly brought into the cache and then expelled again.
CH 5 - 28, a) The memory unit of a computer has 256K words of 32 bits each. The computer has an instruction format with four fields: an opcode field; a mode field to specify one of seven addressing modes; a register address field to specify one of 60 registers; and a memory address field. Assume an instruction is 32 bits long. Answer the following: a) How large must the mode field be?
a) There are 7 modes, and each must be uniquely identified, so the mode field needs ceil(log2(7)) = 3 bits (3 bits give 2^3 = 8 codes, which is at least 7).
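The ceiling-of-log2 rule can be computed exactly with integer arithmetic. Writing k − 1 in binary takes exactly ceil(log2(k)) bits when k ≥ 2, so this approach avoids floating-point logarithms. The helper name `field_bits` is hypothetical. The same calculation applied to the 60-register field of this question gives 6 bits.

```python
def field_bits(choices: int) -> int:
    """Minimum number of bits that gives each of `choices` items a unique code.

    Equals ceil(log2(choices)) for choices >= 2, computed with exact integer math.
    """
    if choices < 2:
        raise ValueError("need at least two choices")
    return (choices - 1).bit_length()


print(field_bits(7))   # mode field: 3
print(field_bits(60))  # register field: 6
```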
CH 5 - 14, a) Convert the following expressions from reverse Polish notation to infix notation. a) W X Y Z - + ×
a. W * (X + Y - Z)
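Converting from reverse Polish to infix notation uses a stack. Operands are pushed onto the stack. When an operator arrives, it pops its right operand and then its left operand, and pushes the parenthesized result. The following is a minimal sketch that assumes space-separated tokens and only binary operators, with × written as *. The function name is hypothetical, and the sketch produces fully parenthesized output rather than the minimal parentheses used in the answers here.

```python
OPERATORS = {"+", "-", "*", "/"}


def postfix_to_infix(tokens: list[str]) -> str:
    """Convert a reverse Polish (postfix) token list to a fully parenthesized infix string."""
    stack: list[str] = []
    for tok in tokens:
        if tok in OPERATORS:
            right = stack.pop()  # top of the stack is the right-hand operand
            left = stack.pop()
            stack.append(f"({left} {tok} {right})")
        else:
            stack.append(tok)  # operands are pushed unchanged
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]


print(postfix_to_infix("W X Y Z - + *".split()))          # (W * (X + (Y - Z)))
print(postfix_to_infix("U V W X Y Z + * + * +".split()))  # (U + (V * (W + (X * (Y + Z)))))
```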
CH 5 - 12, a) Convert the following expressions from infix to reverse Polish (postfix) notation. a) X × Y + W × Z + V × U
a. X Y * W Z * + V U * +
CH 6 - 14. Direct mapped cache is a special case of set associative cache where the set size is 1. So fully associative cache is a special case of set associative cache where the set size is ___.
The total number of blocks in the cache (the entire cache forms a single set).
CH 5 - 12, b) Convert the following expressions from infix to reverse Polish (postfix) notation. b) W × X + W × (U × V + Z)
b. W X * W U V * Z + * +
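Infix-to-postfix conversions like these follow the shunting-yard algorithm. The following is a minimal sketch that assumes space-separated tokens, well-formed parentheses, and only the binary operators + - * /, all left-associative, with × written as *. The function name is hypothetical.

```python
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def infix_to_postfix(tokens: list[str]) -> list[str]:
    """Shunting-yard conversion for left-associative binary operators and parentheses."""
    output: list[str] = []
    ops: list[str] = []
    for tok in tokens:
        if tok in PRECEDENCE:
            # Pop operators of greater or equal precedence (left associativity).
            while ops and ops[-1] != "(" and PRECEDENCE[ops[-1]] >= PRECEDENCE[tok]:
                output.append(ops.pop())
            ops.append(tok)
        elif tok == "(":
            ops.append(tok)
        elif tok == ")":
            while ops[-1] != "(":
                output.append(ops.pop())
            ops.pop()  # discard the matching "("
        else:
            output.append(tok)  # operands go straight to the output
    while ops:
        output.append(ops.pop())
    return output


print(" ".join(infix_to_postfix("X * Y + W * Z + V * U".split())))      # X Y * W Z * + V U * +
print(" ".join(infix_to_postfix("W * X + W * ( U * V + Z )".split())))  # W X * W U V * Z + * +
```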
CH 5 - 17. Why do we need so many different addressing modes?
The addressing modes allow us to specify where the instruction operands are located. The various addressing modes allow us to specify a much larger range of locations than if we were limited to using one or two modes.
CH 9 - 24. Given the following Omega network, which allows 8 CPUs (P0 through P7) to access 8 memory modules (M0 through M7): a) Show how the following connections through the network are achieved (explain how each switch must be set). Refer to the switches as 1A, 2B, etc.: i) P0 → M2 ii) P4 → M4 iii) P6 → M3
1 - P0 → M2: the connection is made through switches 1A, 2A, and 3B, all set to the through state. 2 - P4 → M4: the connection is made through switches 1A, 2B, and 3C, all set to the through state. 3 - P6 → M3: the connection is made through switches 1C, 2A, and 3B; switches 1C and 3B are set to the cross state, and 2A is set to the through state.
CH 9 - 18. Describe briefly and compare the VLIW and superscalar models with respect to instruction-level parallelism.
18. Both models exploit instruction-level parallelism (ILP). VLIW processors do not rely on hardware to find independent instructions in the instruction stream. Instead, the compiler finds the operations that can run in parallel and packs them into one very long instruction word, and the hardware executes them at the same time. Superscalar processors find ILP in hardware at run time. They fetch multiple instructions per cycle, detect which ones are independent of prior instructions, and distribute them among different functional units so that they execute at the same time.
CH 9 - 22. What is the difference between UMA and NUMA?
22. UMA (uniform memory access) is a shared memory architecture used in parallel computing. The CPUs communicate with RAM over a bus, and the access time is the same no matter which processor makes the request. NUMA (nonuniform memory access) is a shared memory design used in multiprocessing. It has a single address space available to all CPUs, but each CPU can access its own local memory faster than it can access non-local memory.
CH 6 - 5. Suppose a computer using fully associative cache has 2^24 bytes of byte-addressable main memory and a cache of 128 blocks, where each cache block contains 64 bytes. a) How many blocks of main memory are there?
2^24 / 2^6 = 2^18 blocks
CH 6 - 33. What is a page fault?
A page fault occurs when a program attempts to access a block of memory that is not stored in the physical memory, or RAM.
CH 9 - 10. Suppose that a RISC machine uses 5 register windows. a) How deep can the procedure calls go before registers must be saved in memory? (That is, what is the maximum number of "active" procedure calls that can be made before we need to save any registers in memory?)
Because of the circular nature of the windows, the output registers that are in the last window are shared as the input registers of the first window, so only 4 procedures can be active without saving the registers in memory.
CH 6 - 28. Explain the differences between inclusive and exclusive cache.
An inclusive cache allows the same data to be present at multiple levels of the cache hierarchy at the same time (for example, a block in L1 is also kept in L2). An exclusive cache works the opposite way: a block can be present at only one level at a time.
CH 7 -12. How is channel I/O different from interrupt-driven I/O?
Channel I/O: One or more I/O processors control various I/O pathways called channel paths. A single controller can manage several slow devices, such as terminals and printers, by combining (multiplexing) the channel paths of these devices. Interrupt-Driven I/O: Here, the interrupt system is used so that the CPU does not have to watch and wait for the I/O module. Instead of the CPU continually asking its attached devices whether they have any input, the devices tell the CPU when they have data to send.
CH 9 - 1. Why was the RISC architecture concept proposed?
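RISC was proposed to address several factors: the cost of memory, program complexity, the length of the clock cycle, efficiency, and cost efficiency. Simpler instructions can each execute in fewer cycles with a shorter clock cycle, which makes the processor more efficient and cheaper to build.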
CH 7 - 18. Explain the relationship among disk platters, tracks, sectors, and clusters.
Disks contain control circuitry and one or more metal or glass disks, called platters, to which a thin film of magnetizable material is bonded. Tracks are the concentric rings on the surface of each platter. Every track on the disk contains exactly the same number of sectors, and each sector contains the same number of bytes. Each sector has a unique address and can be accessed independently. A cluster is a group of sectors; operating systems address sectors in groups (clusters) to make file management simpler.
CH7 -11. Name the four types of I/O architectures. Where are each of these typically used, and why are they used there? 12. A CPU with interrupt-driven I/O is busy servicing a disk request. While the CPU is midway through the disk-service routine, another I/O interrupt occurs. c) If not, why not? If so, what can be done about it?
It need not be a problem, because interrupt-driven I/O can handle multiple interrupts. This is done by providing multiple interrupt lines, each with its own priority, so that competing interrupts are serviced in priority order.
CH 5 - 2, c) Show how the following values would be stored by byte-addressable machines with 32-bit words, using little endian and then big endian format. Assume that each value starts at address 0x10. Draw a diagram of memory for each, placing the appropriate values in the correct (and labeled) memory locations. c) 0x14148888
Address:        0x10  0x11  0x12  0x13
Little endian:  88    88    14    14
Big endian:     14    14    88    88
CH 9 - 24. Given the following Omega network, which allows 8 CPUs (P0 through P7) to access 8 memory modules (M0 through M7): c) List a processor-to-memory access that conflicts (is blocked) by the access P0 → M2 and is not listed in part (a).
P3 → M2 is one example. An access to M2 from any other processor is blocked while the P0 → M2 connection is in use.
CH 9 - 22. What is the fundamental computing element of a neural network?
Neural networks are based on the parallel architecture of the human brain; they attempt to implement a simplified version of a biological neural network. The fundamental computing elements of a neural network are processing elements (PEs). A processing element multiplies its inputs by a set of weights and ultimately yields a single output value. The true power of a neural network comes from the parallel processing of its interconnected PEs and the adaptive nature of their sets of weights. Processing elements play the same role that neurons play in the human brain. All artificial neural networks are built from this basic building block, which is called either a processing element or an artificial neuron.
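The following is a minimal sketch of a single processing element: it takes the weighted sum of its inputs and produces one output. The step (threshold) activation and the threshold value are assumptions chosen for illustration, since real networks often use other activation functions. The sketch also omits training, which is how the weights adapt. With weights of 1.0 and a threshold of 1.5, this PE behaves like a logical AND.

```python
def processing_element(inputs: list[float], weights: list[float], threshold: float = 0.0) -> int:
    """One artificial-neuron PE: weighted sum of the inputs, then a step activation."""
    if len(inputs) != len(weights):
        raise ValueError("each input needs exactly one weight")
    total = sum(x * w for x, w in zip(inputs, weights))  # multiply inputs by weights
    return 1 if total > threshold else 0  # assumed step (threshold) activation


for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, processing_element([a, b], [1.0, 1.0], threshold=1.5))
```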
CH 9 - 24. Given the following Omega network, which allows 8 CPUs (P0 through P7) to access 8 memory modules (M0 through M7): b) Can these connections occur simultaneously, or do they conflict? Explain.
No, the connections cannot occur simultaneously. The P6 → M3 connection is blocked by the P0 → M2 connection, because the states they require for switches 2A and 2B do not match.
CH 7 - 28. What is the name for robotic optical disk library devices?
Optical jukeboxes
CH7 - 48. a) Which of the RAID systems described in this chapter cannot tolerate a single disk failure?
RAID-0 cannot tolerate a single disk failure. It stripes data across the disks without any redundancy, so if one disk fails, the whole array is affected.
CH 6 - 3. Name three different applications where ROMs are often used.
ROMs are used in automobiles, toys, and other appliances where information must be retained even without a continuous power supply.
CH 7 - 16. What is settle time, and what can be done about it?
Settle time is the time taken by the signal levels to stabilize, or "settle down" in the I/O transfers.
CH 6 - 6. What are the three forms of locality?
Temporal locality: items that have been accessed recently are likely to be accessed again in the near future. Spatial locality: accesses tend to cluster, so data at locations adjacent to a recently accessed location is likely to be accessed next. Sequential locality: instructions tend to be executed in the sequence in which they are stored.
CH 7 - 41. Which RAID levels offer the best economy while providing adequate redundancy?
RAID 5 offers the best economy while still providing adequate redundancy. Compared with the other RAID levels, it offers the best protection for the least cost.
CH7 - 28. Suppose a disk drive has the following characteristics: • 4 surfaces • 1,024 tracks per surface • 128 sectors per track • 512 bytes/sector • Track-to-track seek time of 5ms • Rotational speed of 5,000 rpm a) What is the capacity of the drive?
The capacity of the drive is surfaces × tracks per surface × sectors per track × bytes per sector = 4 × 1,024 × 128 × 512 = 268,435,456 bytes = 256 MB (2^28 bytes).
CH7 -8. Suppose the daytime processing load consists of 60% CPU activity and 40% disk activity. Your customers are complaining that the system is slow. After doing some research, you learn that you can upgrade your disks for $8,000 to make them 2.5 times as fast as they are currently. You have also learned that you can upgrade your CPU to make it 1.4 times as fast for $5,000. c) What is the break-even point for the upgrades? That is, what price would we need to charge for the CPU (or the disk—change only one) so the result was the same cost per 1% increase for both?
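Measured as a percentage increase in performance, the disk upgrade gives a speedup of 1 / (0.6 + 0.4/2.5) = 1.3158. That is a 31.58% increase, at $8,000 / 31.58 = $253.33 per 1%. The CPU upgrade gives a speedup of 1 / (0.4 + 0.6/1.4) = 1.2069. That is a 20.69% increase, at $5,000 / 20.69 = $241.67 per 1%. Using unrounded values, the disk upgrade would have to cost about $7,631.58 (that is, $241.67 per 1% × 31.58%) for the two upgrades to break even. Alternatively, changing only the CPU price, the CPU upgrade would have to cost about $5,241.38 (that is, $253.33 per 1% × 20.69%).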
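The arithmetic above can be checked with a short script. It applies Amdahl's law and interprets "percent increase" as (speedup − 1) × 100, which is a reading of the question's wording. The function names are hypothetical.

```python
def speedup(f: float, k: float) -> float:
    # Amdahl's law: a fraction f of the work is made k times faster.
    return 1.0 / ((1.0 - f) + f / k)


def improvement_pct(f: float, k: float) -> float:
    # Percentage increase in performance, e.g. a 1.3158x speedup is a 31.58% increase.
    return (speedup(f, k) - 1.0) * 100


disk_pct = improvement_pct(0.40, 2.5)  # disk: 40% of the load, made 2.5x faster
cpu_pct = improvement_pct(0.60, 1.4)   # CPU: 60% of the load, made 1.4x faster

disk_cost = 8000 / disk_pct  # dollars per 1% increase
cpu_cost = 5000 / cpu_pct

# Break-even: price one upgrade at the other upgrade's cost per 1%.
disk_break_even = cpu_cost * disk_pct
cpu_break_even = disk_cost * cpu_pct

print(f"disk: {disk_pct:.2f}% at ${disk_cost:.2f} per 1%")
print(f"CPU:  {cpu_pct:.2f}% at ${cpu_cost:.2f} per 1%")
print(f"break-even disk price ${disk_break_even:.2f}, or CPU price ${cpu_break_even:.2f}")
```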
CH 6 - 31. What is the objective of paging?
Virtual memory is generally implemented using a technique known as paging. The objective of paging is to allocate physical memory to processes in uniform, fixed-size chunks (page frames), so that only the pages a process is currently using need to be in main memory. The page file is the space on the hard drive used to store the portions of a process's memory that are not currently in main memory.
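The following is a minimal sketch of how paging translates a virtual address. The address is split into a page number and an offset, and the page table maps the page to a frame. The sketch assumes a 4 KiB page size and a hypothetical dictionary-based page table. A page that is missing from the table raises a page fault, matching the CH 6 - 33 definition in these notes.

```python
PAGE_SIZE = 4096  # assumed page size (4 KiB); real systems vary


class PageFault(Exception):
    """Raised when the referenced page is not in physical memory."""


def translate(virtual_address: int, page_table: dict[int, int]) -> int:
    """Map a virtual address to a physical address using a (hypothetical) page table."""
    page_number, offset = divmod(virtual_address, PAGE_SIZE)
    if page_number not in page_table:
        # The page is on disk (or unmapped): the OS must load it, then retry.
        raise PageFault(f"page {page_number} is not resident")
    frame = page_table[page_number]
    return frame * PAGE_SIZE + offset  # the offset is unchanged by translation


page_table = {0: 2, 1: 5}  # hypothetical mapping: page -> frame
print(hex(translate(0x1234, page_table)))  # 0x5234
```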
CH 9 - 15. There are three types of shared memory organizations. What are they?
The three types of shared memory organizations are: 1. Global Shared Memory: A shared memory system achieves interprocessor communication through a global memory. This global memory is shared equally among all processors, and they exchange information through it. These are generally server systems that communicate through a bus and a cache memory controller. They provide a general and convenient programming model in which data is shared through the simple mechanism of reading and writing shared structures in the common memory. 2. Distributed Shared Memory: In this system there are multiple independent processing nodes, with local memory modules at each node of the interconnection network. There is no global memory; instead, data is exchanged between the local memories by means of message passing. These systems are scalable, which makes very high computing power possible. Communication between processes residing on different nodes requires explicit use of send and receive primitives. 3.
CH 6 - 8. Which of L1 or L2 cache is faster? Which is smaller? Why is it smaller?
There is a substantial variation in the size of the cache memory. The Second Level (L2) cache of a computer can hold anywhere between 256K and 512 K of data. The First Level (L1) cache is smaller and has a capacity of just around 8K to 16K. The cache that's local to the processor is the L1 cache while the L2 cache is located somewhere between the main memory and the central processing unit. Due to this arrangement, the L1 cache is much faster than the L2.
CH 7 - 36. Name the three methods for recording WORM disks.
Three methods for recording WORM disks are: 1. Ablative: A reflective metal coating is contained between the protective layers of the disk, and a high-powered laser is used to melt a pit in the coating. 2. Bimetallic Alloy: Two metallic layers encased between protective coatings on the surfaces of the disk are fused using laser light, which causes a reflectance change in the lower metallic layer. 3. Bubble-Forming: A high-powered laser hits a single layer of thermally sensitive material pressed between two plastic layers. This forms bubbles in the material, causing a reflectance change.
CH 9 - 19. What is ubiquitous computing?
Ubiquitous computing is an advanced computing concept in which computing is available everywhere and anywhere, while the computers themselves are virtually invisible.
CH 7 - 11. What does it mean when someone refers to I/O as bursty?
It means that I/O traffic arrives in bursts. Data is sent in blocks or clusters rather than as a steady stream, so the bus alternates between periods of heavy traffic and idle periods.
CH 6 - 2. Suppose a computer using direct mapped cache has 2^32 bytes of byte-addressable main memory and a cache of 1024 blocks, where each cache block contains 32 bytes. c) To which cache block will the memory address 0x000063FA map?
0x000063FA = 0000 0000 0000 0000 0110 0011 1111 1010 in binary. 1100011111 is 799 in decimal. Therefore, 0x000063FA will map to block 799.
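The field split above can be reproduced with shifts and masks. This sketch uses 5 offset bits (for 32-byte blocks) and 10 block bits (for 1,024 cache blocks). The same hypothetical helper also covers a set associative cache, where the middle field selects a set. For a fully associative cache, pass index_bits=0.

```python
def split_address(address: int, offset_bits: int, index_bits: int) -> tuple[int, int, int]:
    """Split a main-memory address into (tag, index, offset) fields.

    The index selects a cache block (direct mapped) or a set (set associative);
    a fully associative cache has no index field, so pass index_bits=0.
    """
    offset = address & ((1 << offset_bits) - 1)
    index = (address >> offset_bits) & ((1 << index_bits) - 1)
    tag = address >> (offset_bits + index_bits)
    return tag, index, offset


# 32-byte blocks -> 5 offset bits; 1024 cache blocks -> 10 block bits.
tag, block, offset = split_address(0x000063FA, offset_bits=5, index_bits=10)
print(tag, block, offset)  # 0 799 26
```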
CH 5 - 12. What is the difference between an arithmetic shift and a logical shift?
1) Arithmetic Shift: These are commonly used to multiply or divide by 2, treat data as signed two's complement numbers, and do not shift the leftmost bit, since this represents the sign of the number. 2) Logical Shift: These instructions simply shift the bits to either the left or the right by a specified number of bits, shifting in zeros on the opposite end.
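The difference between the two shifts shows up on a negative number. This sketch assumes an 8-bit register (the width is chosen only for illustration) and shifts the pattern 10010110, which is −106 in two's complement, right by one bit. The arithmetic shift copies the sign bit into the vacated position, while the logical shift brings in a zero. All names are hypothetical.

```python
BITS = 8  # assumed register width for the example
MASK = (1 << BITS) - 1


def to_signed(pattern: int) -> int:
    """Interpret an 8-bit pattern as a two's complement number."""
    return pattern - (1 << BITS) if pattern & (1 << (BITS - 1)) else pattern


def arithmetic_shift_right(pattern: int, n: int) -> int:
    # Python's >> on a negative int replicates the sign bit,
    # so shift the signed value and mask back to 8 bits.
    return (to_signed(pattern) >> n) & MASK


def logical_shift_right(pattern: int, n: int) -> int:
    # Zeros are shifted in from the left, regardless of the sign bit.
    return (pattern & MASK) >> n


x = 0b10010110  # -106 in 8-bit two's complement
print(f"{arithmetic_shift_right(x, 1):08b}", to_signed(arithmetic_shift_right(x, 1)))  # 11001011 -53
print(f"{logical_shift_right(x, 1):08b}", logical_shift_right(x, 1))  # 01001011 75
```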
CH 7 - 48. How does CNT storage work?
CNT stands for carbon nanotube, and CNT storage is a storage technology from the field of nanotechnology. • CNTs are cylindrical in shape and made of elemental carbon; the walls of the cylinder are one atom thick. • The nanotube is suspended over a conductor, and this position represents a zero state. • To change the state from 0 to 1, the voltage needed to attract the nanotube is applied to the gate. • The tube stays in place until a release voltage is applied. • Its main feature is that it does not consume any power at all until something is written to or read from it.
CH 6 - 19. What, exactly, is effective access time (EAT)?
Effective access time (EAT) is essentially a metric used to determine the performance of hierarchical memory.
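For a cache in front of main memory, EAT is commonly computed as a weighted average: EAT = H × AccessC + (1 − H) × AccessMM, where H is the cache hit rate. This form assumes the cache and main memory are accessed in parallel. If the cache is checked first, a miss costs AccessC + AccessMM instead. The following is a minimal sketch of both forms. The 99% hit rate, 10 ns cache, and 200 ns memory figures are illustrative assumptions.

```python
def effective_access_time(hit_rate: float, cache_ns: float, memory_ns: float,
                          sequential: bool = False) -> float:
    """Weighted average access time of a cache + main memory hierarchy.

    By default the cache and main memory are assumed to be accessed in parallel,
    so a miss costs only the main-memory time. With sequential=True the cache is
    checked first, so a miss costs the cache time plus the main-memory time.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit rate must be between 0 and 1")
    miss_ns = cache_ns + memory_ns if sequential else memory_ns
    return hit_rate * cache_ns + (1.0 - hit_rate) * miss_ns


print(f"{effective_access_time(0.99, 10, 200):.1f} ns")                   # 11.9 ns
print(f"{effective_access_time(0.99, 10, 200, sequential=True):.1f} ns")  # 12.0 ns
```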
CH 7 -14. What is multiplexing?
In telecommunications and computer networks, multiplexing (sometimes contracted to muxing) is a method by which multiple analog or digital signals are combined into one signal over a shared medium. The aim is to share a scarce resource.
CH 6 - 1. Which is faster, SRAM or DRAM?
SRAM is faster than DRAM as SRAM can store the data as long as power is available to SRAM and does not require refreshing.
CH 6 - 4. Explain the concept of a memory hierarchy. Why did your authors choose to represent it as a pyramid?
The memory hierarchy can be visualized as a pyramid which helps depict the categorization based on size/capacity. While the types of memory closer to the peak have a higher performance and hence, cost, they are also smaller in size.
CH 6 - 2. Suppose a computer using direct mapped cache has 2^32 bytes of byte-addressable main memory and a cache of 1024 blocks, where each cache block contains 32 bytes. a) How many blocks of main memory are there?
The number of blocks in main memory are 2^32/2^5, or 2^27 blocks.
CH 6 - 38. When would a system ever need to page its page table?
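A system pages its page table when the page table is too large to keep entirely in main memory, as with a very large virtual address space. The page table's own pages are then brought into memory when they are needed to obtain the physical address for a virtual memory access.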
CH 9 - 6. We propose adding a level to Flynn's taxonomy. What is the distinguishing characteristic of computers at this higher level?
https://www.chegg.com/homework-help/the-essentials-of-computer-organization-and-architecture-3rd-edition-chapter-9-problem-6retc-solution-9781449600068?trackid=d99e94d24b89&strackid=c56582326b2f&ii=1
CH 5 - 14, b) Convert the following expressions from reverse Polish notation to infix notation. b) U V W X Y Z + × + × +
b. U + V * (W + X * (Y + Z))
CH 5 - 19. What is the theoretical speedup for a 4-stage pipeline with a 20ns clock cycle if it is processing 100 tasks?
https://www.chegg.com/homework-help/the-essentials-of-computer-organization-and-architecture-3rd-edition-chapter-5-problem-19retc-solution-9781449600068?trackid=bb69133872a6&strackid=7e4e1587f91d&ii=1
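The theoretical speedup of a k-stage pipeline processing n tasks with clock cycle tp can be worked out with a common formula, S = (n × k × tp) / ((k + n − 1) × tp). This formula assumes a task takes k × tp without pipelining. The clock cycle cancels out, so for 4 stages and 100 tasks, S = 400 / 103 ≈ 3.88. The following sketch uses a hypothetical function name.

```python
def pipeline_speedup(stages: int, tasks: int, cycle_ns: float) -> float:
    # Without pipelining each task is assumed to take stages * cycle_ns.
    unpipelined = tasks * stages * cycle_ns
    # With pipelining the first task needs `stages` cycles; each later task adds one cycle.
    pipelined = (stages + tasks - 1) * cycle_ns
    return unpipelined / pipelined


print(round(pipeline_speedup(stages=4, tasks=100, cycle_ns=20), 2))  # 3.88
```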
CH 5 - 6. How do memory-memory, register-memory, and load-store architectures differ? How are they the same?
1 - If the architecture allows an instruction to perform an operation even if there is no operand stored in a register, sometimes allowing up to two and three operands in memory, then the architecture is said to be Memory-memory. 2 - In the case where at least one operand is in a register and one in memory, we can identify the architecture as a Register-memory. 3 - When it is mandatory that, for any operations on data to be performed, it first needs to be moved into registers then the architecture is of the Load-store type.
CH 5 - 15. Give examples of immediate, direct, register, indirect, register indirect, and indexed addressing.
1 - Immediate Addressing: Operation code in the instruction is immediately followed by the value to be referenced. 2 - Direct Addressing: Here, the memory address of value to be referenced is specified in the instruction. 3 - Register Addressing: Instead of memory location, a register is used to specify the operand. 4 - Indirect Addressing: The contents of the address field specify a pointer (address) to another location. 5 - Indexed Addressing: A specific register is designated to store an increment or difference, which is then summed with the argument, thereby obtaining the effective address of the data.
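The addressing modes named in the question can be compared by fetching the operand for the same address field, 800, under each mode. The memory contents and the register R1 in this sketch are hypothetical values chosen for illustration, and the function name is also hypothetical.

```python
# Hypothetical memory contents (address -> value) and register file.
MEMORY = {800: 900, 900: 1000, 1000: 500, 1600: 700}
REGISTERS = {"R1": 800}


def operand(mode: str, field: int, register: str = "R1") -> int:
    """Return the operand an instruction would use under the given addressing mode."""
    if mode == "immediate":
        return field                                # the field is the operand itself
    if mode == "direct":
        return MEMORY[field]                        # the field is the operand's address
    if mode == "register":
        return REGISTERS[register]                  # the operand is in a register
    if mode == "indirect":
        return MEMORY[MEMORY[field]]                # the field is the address of a pointer
    if mode == "register_indirect":
        return MEMORY[REGISTERS[register]]          # the register holds the operand's address
    if mode == "indexed":
        return MEMORY[field + REGISTERS[register]]  # effective address = field + index register
    raise ValueError(f"unknown mode: {mode}")


for mode in ("immediate", "direct", "register", "indirect", "register_indirect", "indexed"):
    print(mode, operand(mode, 800))
# immediate 800, direct 900, register 800, indirect 1000, register_indirect 900, indexed 700
```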
CH 5 - 20. What are the pipeline conflicts that can cause a slowdown in the pipeline?
1 - Resource conflicts: Parallel execution of instructions is one area where the effects of these are felt most. 2 - Data dependencies: These arise when the result of one instruction, not yet available, is to be used as an operand to a following instruction. 3 - Conditional branch statements: Branch instructions which allow alteration in the flow of executions in a program cause major problems in terms of pipelining.
CH 5 - 22. Explain superscalar, superpipelining, and VLIW architectures.
1 - Superscalar architectures: Superscalar architectures perform multiple operations at the same time by employing parallel pipelines. 2 - Superpipelining: Superpipelining architectures combine superscalar concepts with pipelining by dividing the pipeline stages into smaller pieces. 3 - VLIW architecture: The IA-64 architecture is a VLIW architecture, which means each instruction can specify multiple scalar operations. Superscalar and VLIW machines fetch and execute more than one instruction per cycle.
CH 6 - 7. Give two noncomputer examples of the concept of cache.
1 - Suppose we have a huge, heavy box of tools, and we need to do some carpentry on a block of wood some distance away. We obviously cannot haul the entire toolbox all the way, and it would not be practical to run back and forth between the work and the toolbox each time we needed another tool. The sensible thing to do, then, is to pack a case with the tools we are most likely to need. This is one example of a cache. 2 - Grocery shopping is another example. We neither buy out the entire grocery store nor buy one item at a time. Instead, we purchase and stock a reasonable amount of items depending on factors such as the size of the household and the weather. In this case, the house acts as the cache, while the grocery store is analogous to main memory.
CH 5 - 2. Several design decisions exist with regard to instruction sets. Name four and explain.
1) Complexity: how much effort is required to decode an instruction, how complex the instruction is, and how much time decoding will take. 2) Space: the total amount of space the instructions require must be calculated, because everything comes at a cost and the design should keep that cost as low as possible. 3) Length: designers must decide between fixed-length and variable-length instructions. Before executing an instruction, the CPU must first determine its size, and each choice comes at a price. 4) Number: the number of instructions must be chosen carefully, because more instructions make the design more complex and decoding more time consuming.
CH 5 - 20. What is the difference between using direct and indirect addressing? Give an example.
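1) Direct addressing: the required value is accessed by stating its memory address directly in the instruction. 2) Indirect addressing: the address field contains the memory address of a pointer, and the pointer holds the address of the value. Because it provides significant flexibility, indirect addressing is a very powerful mode of addressing. Example: suppose memory location 800 contains 900. Under direct addressing, an instruction such as Load 800 loads the value 900. Under indirect addressing, the same instruction loads the value stored at address 900.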
CH 5 - 16. How does indexed addressing differ from based addressing?
1) Index addressing - The index register basically stores an offset, which is added to the operand of the instruction, resulting in the effective address of the data. 2) The difference with the based addressing mode is that a base address register rather than an index register is used.
CH 7 - 4. Name three types of durable storage.
1) Magnetic Disks: A non-volatile memory where data is saved on a magnetized medium. 2) Optical Disks: The data is stored on an optical medium. 3) Magnetic Tapes: The data is stored as digital information on the magnetic tape as digital recording.
CH 6 - 16. Explain the four cache replacement policies presented in this chapter.
1) Optimal algorithm: replaces the block that will not be needed for the longest time in the future. It requires the ability to predict future access patterns, so it is ideal rather than practical and serves mainly as a benchmark. 2) Least Recently Used (LRU) algorithm: discards the block whose last access is the oldest, on the assumption that a block not used recently will not be needed soon. 3) First In First Out (FIFO) algorithm: removes the block that has been in the cache the longest (the one brought in before all the others present). 4) Random: chooses a block at random for replacement. Its main advantage over the other schemes is that it never causes the system to thrash, that is, to constantly alternate between bringing certain blocks into the cache and expelling them.
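LRU and FIFO can be compared by replaying a trace of block references. This sketch uses Python's `collections.OrderedDict`, where the front of the dictionary is the next block to be evicted. Under LRU a hit moves the block to the back, while under FIFO a hit leaves the order unchanged. The traces and the 2-block cache size are hypothetical. The cyclic trace A B C A B C shows the thrashing worst case: neither policy ever hits.

```python
from collections import OrderedDict


def simulate(trace: str, capacity: int, policy: str) -> int:
    """Return the number of cache hits for a sequence of block references."""
    if policy not in ("LRU", "FIFO"):
        raise ValueError(f"unknown policy: {policy}")
    cache: OrderedDict[str, bool] = OrderedDict()  # order = eviction priority, front first
    hits = 0
    for block in trace:
        if block in cache:
            hits += 1
            if policy == "LRU":
                cache.move_to_end(block)  # refresh recency; FIFO keeps insertion order
        else:
            if len(cache) == capacity:
                cache.popitem(last=False)  # evict least recently used (LRU) or oldest (FIFO)
            cache[block] = True
    return hits


print(simulate("ABACA", 2, "LRU"), simulate("ABACA", 2, "FIFO"))    # 2 1
print(simulate("ABCABC", 2, "LRU"), simulate("ABCABC", 2, "FIFO"))  # 0 0
```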
CH 7 - 3. What is a protocol, and why is it important in I/O bus technology?
1) Protocol is the form of the signals exchanged between a sender and a receiver. It is considered as the system of digital rules for the exchange of messages between the sender and receiver. 2) Protocols work like the traffic directors in I/O bus technology. Protocols include the command signals, such as "Printer reset"; status signals, such as "Tape ready"; or data-passing signals, such as "Here are the bytes you requested". In most of the protocols, the commands and data sent must be acknowledged by the receiver or it should indicate that it is ready to receive data.
CH 5 - 25. Give an example of a current stack-based architecture and a current GPR-based architecture. How do they differ?
1) Stack architecture: this type of ISA uses a stack to hold the operands for executing instructions, and operands are taken from the top of the stack. This design makes expression evaluation simple and has good code density, but we cannot pick an arbitrary operand off the stack whenever we want. The Java Virtual Machine is an example of a stack-based architecture. 2) GPR architecture: a set of general-purpose registers stores the operands, which are fetched from the registers whenever they are needed. Most current processors, such as x86 and ARM, are GPR-based.
CH 5 - 5. We can design stack architectures, accumulator architectures, or general-purpose register architectures. Explain the differences between these choices and give some situations where one might be better than another.
1) Stack architectures use the operands on top of the stack to execute instructions. The code density of stack-based machines is good and the model for evaluating expressions is simple, but a stack cannot be accessed at random. 2) An accumulator architecture has one operand implicitly in the accumulator. This allows for shorter instructions and minimizes the internal complexity of the machine. However, because the accumulator is only temporary storage, there is a lot of memory traffic. 3) General-purpose register architectures are the most popular models for machine architectures today. Registers are faster than memory, and they are easy for compilers to work with, which supports their effective and efficient use.
CH 6 - 23. Describe the advantages and disadvantages of the two cache write policies.
1) Write-through: With this policy, the cache and main memory are updated at the same time. The advantage of this policy is that consistency is maintained at all times. The downside is that every write operation requires a main memory access, which greatly reduces the speed of the system. 2) Write-back: Blocks in main memory are updated only when the block is removed from the cache. This saves time because the number of writes to memory is greatly reduced. The downside is that main memory and the cache are not always in sync. There is also a danger that the data in the cache may be lost if the process terminates before the write-back to memory.
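The difference in memory traffic between the two policies can be shown by counting main-memory writes for a sequence of CPU writes. This toy sketch simplifies heavily. It assumes every written block is already in the cache. It also assumes that under write-back each modified block is written to memory exactly once, when it is eventually evicted (treated here as the end of the trace). The function name and trace are hypothetical.

```python
def memory_writes(writes: list[int], policy: str) -> int:
    """Count main-memory writes for a sequence of CPU writes (block numbers)."""
    if policy == "write-through":
        return len(writes)       # every write also goes to main memory
    if policy == "write-back":
        return len(set(writes))  # one write per distinct modified block, on eviction
    raise ValueError(f"unknown policy: {policy}")


trace = [7, 7, 7, 7, 3, 3]  # repeated writes to blocks 7 and 3
print(memory_writes(trace, "write-through"), memory_writes(trace, "write-back"))  # 6 2
```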
CH 5 - 11. Name the seven types of data instructions and explain each.
1) The seven types are data movement, arithmetic, Boolean, bit manipulation, I/O, transfer of control, and special purpose. 2) Data movement: These are the most frequently used instructions. Data is moved from memory into registers, from registers to registers, and from registers to memory, and many machines provide different instructions depending on the source and destination. Examples of this type are MOVE, LOAD, STORE, PUSH, POP, etc. 3) Arithmetic Operations: These include instructions that use integers and floating point numbers. Many instruction sets provide different arithmetic instructions for various data sizes. As with data movement instructions, there are sometimes different instructions for various combinations of register and memory accesses in different modes. This class of instructions also affects the flag registers. Examples are ADD, SUBTRACT...its, setting specific bits, and toggling specific bits. 4) Input/Output Instructions: These vary greatly among architectures. The input instruction transfers data from a device or port to either memory or a register. The output instruction transfers data from a register or memory to a specific port or device. The basic schemes for handling I/O are programmed I/O, interrupt-driven I/O, and DMA devices. 5) Instructions for Transfer of Control: Control instructions are used to alter the normal sequence of program execution. These include branches, skips, procedure calls, returns, and program termination. Skip instructions are essentially branches with implied addresses. Procedure calls are special branch instructions that automatically save the return address. 6) Special purpose Instructions: These include string processing, high-level language support, protection, flag control, word/byte conversions, cache management, and any other instructions that do not fit into the other categories.
CH 9 - 8. Define superpipelining.
1. Superpipelining is a pipelining technique that increases the depth of the pipeline so that each stage does less work, which reduces the latency of each stage and allows a higher clock speed. 2. As a result, the stages of a superpipeline require less than half a clock cycle to complete. 3. Logical functions that would take a long time as a single stage, such as arithmetic operations and cache or memory accesses, are broken down into more than one stage, so other instructions can be executed while the slower operations are being processed. This saves the time that would otherwise be wasted during the shorter stages.
CH 9 - 4. Flynn's taxonomy classifies computer architectures based on two properties. What are they?
1. The number of instruction streams 2. The number of data streams flowing into the processor
CH 6 - 15. What are the three fields in a set associative cache address, and how are they used to access a location in cache?
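1. Tag 2. Set 3. Offset. The set field selects the set in which the block must reside. The tag field is then compared with the tags of all blocks in that set; on a match, the offset field selects the byte (or word) within the block.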
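The lookup described above can be sketched in Python. This assumes a hypothetical cache with 16-byte blocks (4 offset bits) and 16 sets (4 set bits), with each set modeled as a list of (tag, block) pairs; the number of ways per set is not enforced here.

```python
OFFSET_BITS = 4   # assumed: 16-byte blocks
SET_BITS = 4      # assumed: 16 sets


def split_address(addr):
    """Return (tag, set_index, offset) for a byte address."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    set_index = (addr >> OFFSET_BITS) & ((1 << SET_BITS) - 1)
    tag = addr >> (OFFSET_BITS + SET_BITS)
    return tag, set_index, offset


def lookup(cache_sets, addr):
    """cache_sets[i] is a list of (tag, block_bytes) pairs, one per way.
    Returns the requested byte on a hit, or None on a miss."""
    tag, set_index, offset = split_address(addr)
    for stored_tag, block in cache_sets[set_index]:  # compare tag with every way in the set
        if stored_tag == tag:
            return block[offset]                     # offset picks the byte in the block
    return None


cache = [[] for _ in range(1 << SET_BITS)]
cache[0x5].append((0x3, bytes(range(16))))   # one block loaded: tag 0x3 in set 0x5
print(split_address(0x357))   # (3, 5, 7)
print(lookup(cache, 0x357))   # 7
print(lookup(cache, 0x457))   # None (tag 4 is not in set 5)
```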
CH 5 - 21. What are the two types of ILP, and how do they differ?
1. The first type decomposes an instruction into stages and overlaps these stages. This is essentially pipelining. 2. The second type of ILP allows whole individual instructions to overlap, so that several instructions execute at the same time.
CH 9 - 16. Describe one of the cache consistency protocols discussed in this chapter.
1. Write-through: In this method, the main memory contents are updated at the same time the cache contents are modified. This is done by broadcasting the modified data to all the processor modules in the system, including the other caches that may be using the same data. Each module that receives the broadcast updates the contents of the corresponding cache block if it also holds that data. 2. Write-back: In this method, main memory is not updated when the cache contents are modified. Instead, the modified block is marked as updated by setting a flag, and when this block is removed or replaced by other data in the cache, main memory is updated if the flag is set.
CH 6 - 2. Suppose a computer using direct mapped cache has 2^32 bytes of byte-addressable main memory and a cache of 1024 blocks, where each cache block contains 32 bytes. b) What is the format of a memory address as seen by the cache; that is, what are the sizes of the tag, block, and offset fields?
1024 blocks is 10 bits for the block field. 32 bytes per block is 5 bits for the offset field and 32 - 10 - 5 = 17 bits for the tag field.
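The field-size arithmetic can be checked with a short Python sketch; the function name is hypothetical, and it assumes the number of blocks and the block size are powers of two.

```python
def direct_mapped_fields(address_bits, num_blocks, block_size):
    """Return (tag, block, offset) field widths in bits; sizes must be powers of two."""
    offset_bits = block_size.bit_length() - 1   # log2(block_size): byte within the block
    block_bits = num_blocks.bit_length() - 1    # log2(num_blocks): which cache block
    tag_bits = address_bits - block_bits - offset_bits
    return tag_bits, block_bits, offset_bits


print(direct_mapped_fields(32, 1024, 32))   # (17, 10, 5)
```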
CH 9 - 12. Flynn's taxonomy consists of four primary models of computation. Briefly describe each of the categories and give an example of a high-level problem for which each of these models might be used.
The four primary models of computation in Flynn's taxonomy are SISD (Single Instruction, Single Data stream), SIMD (Single Instruction, Multiple Data streams), MISD (Multiple Instruction, Single Data stream), and MIMD (Multiple Instruction, Multiple Data streams). In SISD, each arithmetic instruction performs a single operation on a single data item taken from a single stream of data elements; von Neumann machines are an example. SIMD has a single control unit that fetches instructions from the instruction store and broadcasts each one to many processing elements that operate on different data, as in an array processor or GPU. In MISD, the same data stream flows through a linear array of processors, as in a systolic array. MIMD involves multiple separate processors that are interconnected, such as the BBN Butterfly and Sequent computers.
CH 9 - 16. What is the difference between SIMD and SPMD?
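SPMD (single program, multiple data) is a kind of MIMD parallelism in which every processor runs the same program on its own data, but the processors run independently and need not execute the same instruction at the same time. SIMD (single instruction, multiple data) performs the same instruction simultaneously on many data items, which helps achieve high-speed performance.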
CH 9 - 2. Which characteristics of RISC systems could be directly implemented in CISC systems? Which characteristics of RISC machines could not be implemented in CISC machines (based on the defining characteristics of both architectures as listed in Table 9.1)?
The characteristics of RISC systems that could be implemented directly in CISC systems include a reduced number of instructions, instructions with three register operands, hardwired control, fixed-length instructions, and a small number of addressing modes. CISC machines could also adopt the RISC approach of executing each instruction in a single cycle rather than multiple cycles, which is what makes pipelining possible. The characteristics of RISC systems that could not be implemented in CISC systems are: multiple register sets, which require additional hardware; parameter passing through on-chip register windows; and heavy pipelining (CISC machines use less pipelining). In addition, the RISC approach of placing complexity in the compiler while keeping the microcode simple is difficult to implement in CISC. In CISC machines many instructions can access memory, whereas in RISC machines only load and store instructions can access memory. Finally, CISC machines are microprogram-controlled, so these RISC characteristics cannot be carried over.
CH 6 - 5. Suppose a computer using fully associative cache has 2^24 bytes of byte-addressable main memory and a cache of 128 blocks, where each cache block contains 64 bytes. b) What is the format of a memory address as seen by the cache; that is, what are the sizes of the tag and offset fields?
The address is 24 bits. Each block holds 64 = 2^6 bytes, so the offset (word) field is 6 bits, leaving 24 - 6 = 18 bits for the tag field.
CH 6 - 25. A system implements a paged virtual address space for each process using a one-level page table. The maximum size of virtual address space is 16MB. The page table for the running process includes the following valid entries (the → notation indicates that a virtual page maps to the given page frame; that is, it is located in that frame): The page size is 1024 bytes and the maximum physical memory size of the machine is 2MB. d) To which physical address will the virtual address 0x5F4 translate?
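The virtual address space is 16MB = 2^24 bytes and pages are 1024 = 2^10 bytes, so a virtual address has a 14-bit page number and a 10-bit offset. Physical memory is 2MB = 2^21 bytes, so a physical address has an 11-bit frame number and the same 10-bit offset. The virtual address 0x5F4 equals 1524 in decimal and appears as: Page number: 0000 0000 0000 01 (page 1) Offset: 01 1111 0100. Page 1 maps to frame 2, so the main memory address is: Page frame: 0000 0000 010 Offset: 01 1111 0100, which is 0x9F4.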
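A minimal Python sketch of the translation, assuming a one-level page table modeled as a dictionary. The question's full table of valid entries is not reproduced here, so the sketch contains only the entry implied by the answer's frame number 0000 0000 010 (page 1 in frame 2).

```python
PAGE_SIZE = 1024   # bytes, from the question
OFFSET_BITS = 10   # log2(PAGE_SIZE)

# Hypothetical page-table fragment: only virtual page 1 -> frame 2.
page_table = {1: 2}


def translate(vaddr):
    page = vaddr >> OFFSET_BITS            # upper bits: virtual page number
    offset = vaddr & (PAGE_SIZE - 1)       # lower 10 bits are copied unchanged
    frame = page_table[page]               # a KeyError here would model a page fault
    return (frame << OFFSET_BITS) | offset


print(hex(translate(0x5F4)))   # 0x9f4
```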
CH 6 - 22. What is a dirty block?
A dirty block is a block (or page) whose contents have been written to since it was brought into the cache (or memory). It is identified by a modify bit, generally referred to as the dirty bit: even if only a single word or byte is written, the dirty bit is set. This bit is examined when the block is selected for replacement, since modified contents must be written back to main memory to maintain consistency.
CH 9 - 26. How does a quantum computer differ from a classical computer? What are the obstacles that must be overcome in quantum computing?
A quantum computer is a computation device that directly uses quantum-mechanical phenomena, such as superposition and entanglement, to perform operations on data. Classical digital computers encode data as binary digits (bits), whereas quantum computers use quantum properties to represent data and to perform operations on it. Quantum computers share some strategies with non-deterministic and probabilistic computers; an example theoretical model is the quantum Turing machine. Advantages of quantum computers: it is expected that larger quantum computers will solve certain complex problems much faster than classical computers running the best known algorithms, for example integer factorization using Shor's algorithm or simulations of many-body systems. Some quantum algorithms, such as Simon's algorithm, run exponentially faster than any probabilistic classical algorithm.
CH 7 - 35. Explain why Blu-Ray discs hold so much more data than regular DVDs.
Blu-ray discs store more data than regular DVDs because: • Multiple layers (up to six) can be stacked on a Blu-ray disc, compared with at most two on a DVD; a single-layer Blu-ray disc alone holds 25 GB. • Blu-ray discs use a 405 nanometre (nm) laser, while DVDs use a 650 nm laser. This means the feature size is much smaller on a Blu-ray disc, so the linear space occupied by a single bit is shorter. • The shortest pit length on a Blu-ray track is 0.13 micron, much less than that of a DVD, so more bits fit along each track. • The track pitch on a Blu-ray disc is 0.32 micron, much less than that of a DVD, so the tracks can be placed much closer together, resulting in a longer spiral track.
CH7 - 28. Suppose a disk drive has the following characteristics: • 4 surfaces • 1,024 tracks per surface • 128 sectors per track • 512 bytes/sector • Track-to-track seek time of 5ms • Rotational speed of 5,000 rpm b) What is the access time?
Access time is seek time + rotational delay (latency). The average rotational delay is half a revolution: (60 sec / 5,000 rpm × 1000 ms/sec) / 2 = 12 ms / 2 = 6 ms. Therefore the total access time is 5 ms + 6 ms = 11 ms.
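The same calculation as a small Python function (the name is hypothetical), assuming, as the answer does, that the track-to-track seek time is used as the seek time and that the average rotational delay is half a revolution.

```python
def access_time_ms(seek_ms, rpm):
    """Access time = seek time + average rotational delay (half a revolution)."""
    ms_per_rev = 60_000 / rpm          # 60 s/min * 1000 ms/s
    rotational_delay = ms_per_rev / 2
    return seek_ms + rotational_delay


print(access_time_ms(5, 5000))   # 11.0
```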
CH 7 - 15. What distinguishes an asynchronous bus from a synchronous bus?
In a synchronous bus, the sender and receiver share a common clock for timing: the control lines include a clock, and communication follows a fixed protocol relative to that clock. One example is the memory bus, where the memory is expected to respond within a fixed time and should never be unavailable. An asynchronous bus has no common clock; instead, the sender and receiver coordinate each transfer using handshaking signals on the control lines.
CH 5 - 18. Explain the concept behind instruction pipelining.
A computer's clock generates pulses that control each step in the sequence. Some computers decompose the fetch-decode-execute cycle into smaller steps, some of which can be executed simultaneously. This overlapping speeds up execution. This method is called pipelining.
CH 6 - 11. How does associative memory differ from regular memory? Which is more expensive and why?
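Regular memory is accessed by address, whereas associative memory is accessed by content: a search key is compared against all stored entries at the same time, which makes searching much faster. To do this, each memory location has both a storage element and its own comparator. This extra hardware makes associative memory much more expensive than regular memory.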
CH 7 - 13. How is channel I/O similar to DMA?
Channel I/O and DMA • Most large computer systems use an intelligent type of DMA (direct memory access) interface known as an I/O channel. • Just as DMA uses device interfaces to communicate with the devices connected to it, channel I/O uses an I/O processor (IOP) as the interface for its devices. The IOP steals CPU memory cycles like a standalone DMA controller, and channel I/O systems are equipped with separate I/O buses. • Channel I/O is an extension of DMA. DMA is a block-oriented mechanism that interrupts the CPU only after it finishes transferring a group of bytes. • A DMA controller carries out transfers that the CPU sets up for it, whereas an I/O channel can execute programs of its own that include arithmetic-logic and branching instructions. • When it completes its work, the channel places completion information in memory and sends an interrupt to the CPU.
CH 7 - 32. Why are CDs especially useful for long-term data storage?
Compact discs are especially useful for long-term data storage because they offer tamper-resistant storage for virtually unlimited quantities of documents and data. Optical storage CDs can be stored for 100 years without noticeable degradation, so they are used for long-term data archiving.
CH 7 - 29. What is the acronym for computer output that is written directly to optical media rather than paper or microfiche?
Computer Output to Laser Disc (COLD). COLD is computer output that is written directly to optical storage rather than to paper or microfiche, for long-term archival storage of data. Text reports and graphical reports can be stored in a COLD system, which holds the information of different records.
CH 5 - 23. A nonpipelined system takes 200ns to process a task. The same task can be processed in a five-segment pipeline with a clock cycle of 40ns. Determine the speedup ratio of the pipeline for 200 tasks. What is the maximum speedup that could be achieved with the pipeline unit over the nonpipelined unit?
Given: time for the non-pipelined system to process one task, $t_n = 200$ ns; number of pipeline segments, $k = 5$; pipeline clock cycle, $t_p = 40$ ns; number of tasks, $n = 200$. With a k-stage pipeline, the first task takes $k t_p$ to complete, and the remaining $n - 1$ tasks emerge from the pipeline one per cycle, taking $(n - 1) t_p$. Therefore, completing n tasks with a k-stage pipeline requires $(k + n - 1) t_p = (5 + 199)(40\text{ ns}) = 8160$ ns. Without a pipeline, each task takes $t_n$, so n tasks require $n t_n = 200 \times 200\text{ ns} = 40000$ ns. Speedup: the speedup is the ratio of the time taken without the pipeline to the time taken with the 5-segment pipeline: $S = \frac{n t_n}{(k + n - 1) t_p} = \frac{40000}{8160} \approx 4.90$. As n approaches infinity, S approaches $t_n / t_p = 200 / 40 = 5$, which here equals k; this is the theoretical speedup of the pipeline. Hence, the maximum speedup is 5.
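The speedup formula can be checked numerically with a short Python sketch (the function name is hypothetical); it uses the standard timing model above, in which the first task takes k cycles and each later task completes one cycle after the previous one.

```python
def pipeline_speedup(t_n, k, t_p, n):
    """Speedup of a k-stage pipeline (cycle t_p) over a non-pipelined unit (t_n per task)."""
    non_pipelined = n * t_n
    pipelined = (k + n - 1) * t_p   # k cycles for the first task, then one per task
    return non_pipelined / pipelined


print(round(pipeline_speedup(200, 5, 40, 200), 2))   # 4.9
print(200 / 40)                                      # limit as n grows: 5.0
```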
CH 6 - 25. A system implements a paged virtual address space for each process using a one-level page table. The maximum size of virtual address space is 16MB. The page table for the running process includes the following valid entries (the → notation indicates that a virtual page maps to the given page frame; that is, it is located in that frame): The page size is 1024 bytes and the maximum physical memory size of the machine is 2MB. c) What is the maximum number of entries in a page table?
The page table needs one entry per virtual page. Since the page size is 1KB and the virtual address space is 16MB, the number of pages is: Number of pages = virtual memory size / page size = 16MB / 1KB = 2^24 / 2^10 = 2^14 = 16,384. So the page table has at most 2^14 entries.
CH 7 - 10. Why does DMA require cycle stealing?
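Cycle stealing in direct memory access (DMA): DMA allows I/O controllers to access memory without the intervention of the CPU, but the DMA controller and the CPU share the same memory bus. When the DMA controller needs to transfer data, it takes control of the bus for a memory cycle, "stealing" that cycle from the CPU, which must wait until the cycle is finished.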
CH 7 - 34. How do DVDs store so much more data than regular CDs?
DVDs store more data than regular CDs for the following reasons: • Unlike CDs, DVDs can be single-sided or double-sided, and single-layer or double-layer. A single-layer, single-sided DVD can store 4.78 GB and a double-layer, double-sided DVD can store 17 GB of data, compared with the maximum CD capacity of 742 MB. • DVDs use a 650 nanometre (nm) laser, while CDs use a 780 nm laser. This means the feature size is much smaller on a DVD, so the linear space occupied by a single bit is shorter. • The shortest pit length on a DVD is 0.4 micron, compared with 0.83 micron on a CD, so more bits fit along each track. • The track pitch on a DVD is 0.74 micron, compared with 1.6 microns for a CD, so DVD tracks can be placed much closer together, resulting in a longer spiral track. • Unwound from its spiral, a CD track is about 5 miles (8 km) long, while an unwound DVD track is 7.35 miles (11.8 km) long.
CH 7 - 22. What is the sum of rotational delay and seek time called?
Definition of Access Time The sum of the rotational delay and seek time is known as the access time, where rotational delay is the time taken by the sector of a track to position itself under a read/write head and seek time is the time taken by disk arm to position itself over the required track in the disk drive. It is the time a hard disk controller takes to locate a specific piece of stored data.
CH 9 - 5. What is the difference between MPP and SMP processors?
Differences between SMP and MPP. SMP stands for symmetric multiprocessor and MPP for massively parallel processor. Sharing of memory: SMP processors share memory, while MPP processors do not. In SMP, processors exchange information through their central shared memory, achieving inter-processor coordination through a global memory shared by all processors. In MPP, processors exchange information through their interconnection network, and each node of the network combines a processor with its own local memory. 1. MPP machines hold thousands of CPUs in a large cabinet connected to hundreds of gigabytes of memory. 2. Another significant difference is that in MPP each CPU has its own memory, which helps prevent the possible slowdown that SMP can experience when all the CPUs attempt to access memory at once. (Figure: MPP and SMP organization; legend: MB = memory bus, P/C = microprocessor and cache.)
CH 6 - 36. What is a TLB, and how does it improve EAT?
Effective access time (EAT) is a metric used to measure the performance of hierarchical memory. A translation look-aside buffer (TLB) is a small cache that stores the most recent page-table lookups (virtual page to physical frame). When a translation is found in the TLB, the page table in main memory does not have to be read, so the page lookup is faster and the EAT is reduced.
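A minimal Python sketch of how a TLB lowers EAT. It assumes a one-level page table held in main memory, so a TLB miss costs one extra memory access; the timings (10 ns TLB, 100 ns memory) and the 90% hit ratio are illustrative values, not figures from the question.

```python
def eat_with_tlb(hit_ratio, t_tlb, t_mem):
    """Effective access time with a TLB and a one-level page table (assumed model)."""
    hit = t_tlb + t_mem          # translation found in the TLB, then the data access
    miss = t_tlb + 2 * t_mem     # read the page-table entry, then the data
    return hit_ratio * hit + (1 - hit_ratio) * miss


def eat_without_tlb(t_mem):
    return 2 * t_mem             # every reference needs a page-table read plus the data read


print(round(eat_with_tlb(0.9, 10, 100), 1))   # 120.0
print(eat_without_tlb(100))                   # 200
```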
CH 7 - 31. How is the format of a CD that stores music different from the format of a CD that stores data? How are the formats alike?
Format of a CD (compact disc). Music is stored on a compact disc in Mode 0 or Mode 2, which are intended for recording music and have no error-correction capability: if the laser misreads a location of data on the track, nothing corrects it. Data, in contrast, is stored in Mode 1, which provides two levels of error detection and correction. The total capacity is 650MB in Mode 1 and 742MB in Mode 0 and Mode 2. The formats are alike in that, in every mode, data is stored in 2352-byte chunks called sectors that lie along the length of the track. A Mode 2 sector is identical to a Mode 1 sector except for the 8 blank bytes in Mode 1. The Mode 2 format also contains the synchronization, header, and sub-header bytes, along with the 4 EDC bytes that appear in both Mode 1 and Mode 2.
CH 5 - 7. What kinds of problems do you think endian-ness can cause if you wished to transfer data from a big endian machine to a little endian machine? Explain.
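Multibyte values (such as integers and floating-point numbers) are stored with their bytes in opposite orders on the two machines. If data is transferred from a big-endian machine to a little-endian machine without conversion, these values are misinterpreted: for example, 0x12345678 would be read as 0x78563412. Single-byte data, such as ASCII text, is not affected.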
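The misinterpretation can be demonstrated with Python's standard `struct` module, where the format prefix `>` means big endian and `<` means little endian. This is a sketch of the problem and of the usual fix (agreeing on a byte order and converting explicitly), not a model of any particular transfer protocol.

```python
import struct

value = 0x12345678
# The big-endian machine writes the 4 bytes most significant first: 12 34 56 78.
wire_bytes = struct.pack(">I", value)
# A little-endian machine that interprets those bytes natively gets the reverse order.
misread = struct.unpack("<I", wire_bytes)[0]
print(hex(misread))    # 0x78563412
# Fix: agree on a byte order and convert explicitly when reading.
correct = struct.unpack(">I", wire_bytes)[0]
print(hex(correct))    # 0x12345678
```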
CH 9 - 3. Describe how register windowing makes procedure calls more efficient.
If the register windows overlap, they considerably reduce the work of parameter passing between procedures to just shifting from one register set to another. The two sets overlap in the registers that must be shared to perform parameter passing, so parameters do not have to be copied.
CH 7 - 46. Explain how holographic storage works.
Holographic data storage is a potential computer storage technology that uses laser beams to store computer-generated data in three dimensions. How holographic data storage works: • A laser beam is split into two separate beams, an object beam and a reference beam. • The object beam is passed through a modulator, which generates the coded data pattern. • The object beam from the modulator then intersects with the reference beam. • The result is an interference pattern in the polymer recording medium. • To recover the data, the reference beam is used to illuminate the medium.
CH 7 - 43. What are hybrid RAID systems?
Hybrid RAID systems are RAID systems formed by combining two or more RAID schemes. For example, RAID 10 stripes data (RAID 0) across mirrored pairs of disks (RAID 1).
CH 7 - 38. Explain how serpentine recording differs from helical scan recording.
In serpentine recording, bits are stored serially on the tape. Also, whereas in the nine-track format the bytes are stored at right angles to the length of the tape, in serpentine recording they are stored parallel to its length. Data is written along a track until the end of the tape is reached, and writing then continues on the next track in the opposite direction. Systems that use this method of recording include Quarter-Inch Cartridge (QIC) and Digital Linear Tape (DLT), which have 50 or more tracks per tape. Helical scan recording is similar to a tape recorder. In most recording systems, the tape is passed straight across the magnetic head. In helical scan recording, however, the tape is passed over a rotating drum (capstan) that is tilted at 40 degrees to it. The capstan holds both read and write heads, which perform the data transfer operations.
CH 7 - 2. What is speedup?
In computer architecture, speedup is a number that measures the relative performance of two systems processing the same problem. More technically, it is the improvement in speed of execution of a task executed on two similar architectures with different resources.
CH 9 - 23. Describe how neural networks "learn."
In neural network architecture, learning is an essential part of the process, so the choice of learning algorithm is an important issue in network development. A key concept in learning is the perceptron, which produces outputs for specific inputs according to how the network has been trained. Perceptrons are trained using either supervised learning or unsupervised learning. 1. Supervised learning: This assumes prior knowledge of the correct results, which are fed to the neural network during the training phase. During learning, the network is told whether its final state is correct. If it is incorrect, the input weights are modified to produce the desired results. 2. Unsupervised learning: Unlike supervised learning, the correct solution is not provided during training. The network adapts itself in response to its inputs, learning to recognize patterns and structures within the input sets. Because a neural network is only as good as its training data, great care is taken to give it a sufficient number of correct examples. For example, a child can only recognize a chicken if he or she has seen a chicken before. In the same way, a neural network must see a sufficient number of examples to learn the correct characteristics.
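Supervised learning can be sketched with the classic perceptron learning rule in Python. The training set (logical AND), the learning rate of 1, and the epoch limit are illustrative choices; the point is that the weights change only when the network's output is wrong, as described above.

```python
def train_perceptron(samples, epochs=20, lr=1):
    """Supervised perceptron training: weights change only when the output is wrong."""
    w = [0, 0]
    b = 0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            output = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            error = target - output        # 0 when correct, +1 or -1 when wrong
            w[0] += lr * error * x1
            w[1] += lr * error * x2
            b += lr * error
    return w, b


def predict(w, b, x1, x2):
    return 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(AND)
print([predict(w, b, x1, x2) for (x1, x2), _ in AND])   # [0, 0, 0, 1]
```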
CH 6 - 30. What is the difference between a virtual memory address and a physical memory address?
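A virtual address is an address generated by a program. It refers to virtual memory, which is in effect an imaginary memory space in which the operating system handles addressing issues. A physical address is the actual location of the data in main memory, and virtual addresses must be translated into physical addresses (for example, through a page table). The virtual address space is usually larger than the physical address space because it has a significantly higher capacity in terms of the number of pages, the units in which the capacity of virtual memory and main memory is compared.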
CH 9 - 25. What kinds of problems are suitable for solution by systolic arrays?
Systolic arrays provide a high level of parallelism, which they achieve through pipelining, and they can be used to achieve and sustain high throughput. Their connections are typically short and their designs are kept simple, so they can be robust, highly compact, efficient, and cheap to produce. For example, in a linear systolic array the processors are arranged in pairs and a polynomial is evaluated using Horner's rule. Systolic arrays are most suitable for repetitive tasks such as Fourier transforms, image processing, data compression, shortest-path problems, sorting, signal processing, and various matrix computations.
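The Horner's-rule example can be sketched in Python as a chain of (multiply, add) cell pairs. This is a sequential model of the data flow only, not a cycle-accurate simulation; the function name and the sample polynomial are hypothetical. In hardware, a new x could enter the first pair every cycle, which is where the throughput comes from.

```python
def horner_systolic(coeffs, x):
    """Model a linear array of (multiply, add) cell pairs evaluating
    p(x) = coeffs[0]*x**n + coeffs[1]*x**(n-1) + ... + coeffs[n] by Horner's rule."""
    acc = coeffs[0]            # value entering the array
    for a in coeffs[1:]:
        acc = acc * x          # multiply cell: scales the incoming value by x
        acc = acc + a          # add cell: adds its stored coefficient
    return acc


# p(x) = 2x^3 - 6x^2 + 2x - 1 at x = 3
print(horner_systolic([2, -6, 2, -1], 3))   # 5
```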
CH 7 - 7. How are address vectors used in interrupt-driven I/O?
In interrupt-driven I/O, each interrupt is associated with an address vector that points to its service routine. Because the vectors for different hardware are kept in the same locations in systems running the same type and level of operating system, they can easily be changed to point to vendor-specific code, which lets the service routines accommodate hardware changes. For example, if there is a new type of disk drive that is not supported by the operating system, the disk's manufacturer may provide a specialized device driver program to be kept in memory along with the code for standard devices. Installing the device driver code involves updating the disk I/O vector to point to the code particular to that disk.
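A minimal Python sketch of vectored dispatch, assuming the vector table is modeled as a dictionary from vector number to service routine; the vector number and handler names are hypothetical. Installing the vendor's driver changes only the vector entry, not the code that raises the interrupt.

```python
def default_disk_handler():
    return "standard OS disk routine"


def vendor_disk_handler():
    return "vendor-supplied disk driver"


DISK_VECTOR = 14   # hypothetical vector number, for illustration only
vector_table = {DISK_VECTOR: default_disk_handler}


def raise_interrupt(vector):
    # Look up the service routine stored at the vector and run it.
    return vector_table[vector]()


print(raise_interrupt(DISK_VECTOR))   # standard OS disk routine
# Installing the new device driver only rewrites the vector entry.
vector_table[DISK_VECTOR] = vendor_disk_handler
print(raise_interrupt(DISK_VECTOR))   # vendor-supplied disk driver
```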
CH 6 - 17. Why is the optimal cache replacement policy important?
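When blocks conflict for occupation of a cache location, the new block must replace an existing block. In direct mapping there is no question of which block to replace, since that is already predetermined. In other mapping schemes, however, we need to determine which block is replaced by an incoming block, and the replacement policy dictates how this is done. The optimal policy replaces the block that will not be needed for the longest time in the future. It cannot be implemented in practice because it requires knowledge of future references, but it is important as a benchmark against which practical policies such as LRU and FIFO can be measured.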
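A small Python sketch comparing the optimal policy with LRU, assuming a fully associative cache of 3 blocks and a commonly used textbook reference string (both are illustrative choices). The optimal policy can be simulated here only because the whole reference string is known in advance.

```python
def count_misses(refs, capacity, policy):
    """Fully associative cache of `capacity` blocks; policy is 'LRU' or 'OPT'."""
    cache = []      # for LRU, ordered from least to most recently used
    misses = 0
    for i, block in enumerate(refs):
        if block in cache:
            if policy == "LRU":
                cache.remove(block)
                cache.append(block)
            continue
        misses += 1
        if len(cache) == capacity:
            if policy == "LRU":
                cache.pop(0)                        # evict least recently used
            else:
                future = refs[i + 1:]
                # OPT: evict the block whose next use is farthest away (or never).
                victim = max(cache, key=lambda b: future.index(b) if b in future else len(future))
                cache.remove(victim)
        cache.append(block)
    return misses


refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(count_misses(refs, 3, "LRU"))   # 10
print(count_misses(refs, 3, "OPT"))   # 7
```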
CH 6 - 5. Explain the concept of locality of reference, and state its importance to memory systems.
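Locality of reference is the clustering of memory references: a program tends to reuse data and instructions it has used recently (temporal locality) and to access addresses near those it has accessed recently (spatial locality). Memory systems exploit it by structuring memory as a hierarchy. When a miss is encountered, not just the requested data but the entire block of memory containing it is transferred to the next higher level, so later references to nearby data are likely to be hits.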
CH 5 - 23. List several ways in which the Intel and MIPS ISAs differ. Name several ways in which they are the same.
Intel versus MIPS ISAs. Intel: • Byte storage is little endian. • The addressing architecture is two-address. • It uses variable-length instructions. • Pipelining: The Pentium I had two 5-stage pipelines, with stages Prefetch, Instruction Decode, Address Generation, Execute, and Write Back. The Pentium II had a 12-stage pipeline, the Pentium III a 14-stage pipeline, and the Pentium IV a 24-stage pipeline. • Addressing modes: Intel processors allow the basic addressing modes as well as certain combinations of those; in all, 17 addressing modes are provided. MIPS: • Byte storage is little endian. • The addressing architecture is word-addressable and three-address. • It uses fixed-length instructions. • Pipelining: The R2000 and R3000 have 5-stage pipelines. The R4000 and R4400 have 8-stage superpipelines. In the R10000, the number of pipeline stages depends on the functional unit through which the instructio...erscalar. • ISA: Five types of instructions are allowed: simple arithmetic, data movement, control, multicycle, and miscellaneous. Three instruction formats are available: immediate, register, and jump. • Addressing modes: A variety of modes such as immediate, register, direct, register indirect, base, and indexed are allowed. However, only the base addressing mode can be used explicitly. The MIPS ISA differs from the Intel ISA partly because the design philosophies behind the two are so different. Intel created its ISA for the 8086 when memory was very expensive, which meant designing an instruction set that would allow extremely compact code. This is the main reason Intel uses variable-length instructions. The small set of registers in the 8086 did not allow much data to be stored in registers; hence the two-operand instructions (as opposed to three in MIPS). When Intel moved to the IA-32 ISA, backward compatibility was a requirement for its large customer base.
CH 6 - 5. Suppose a computer using fully associative cache has 2^24 bytes of byte-addressable main memory and a cache of 128 blocks, where each cache block contains 64 bytes. c) To which cache block will the memory address 0x01D872 map?
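It is a fully associative cache, so the address can map to any of the 128 cache blocks. Its 18-bit tag is compared with the tags of all blocks at the same time to find it.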
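The address fields can be computed with a short Python sketch using the field sizes from part b) (24-bit address, 6-bit offset, 18-bit tag); the function name is hypothetical. Because there is no block or set field, the address itself does not select a cache block.

```python
ADDRESS_BITS = 24
OFFSET_BITS = 6                          # 64-byte blocks
TAG_BITS = ADDRESS_BITS - OFFSET_BITS    # 18


def split_fully_associative(addr):
    """Return (tag, offset); a fully associative cache has no block or set field."""
    return addr >> OFFSET_BITS, addr & ((1 << OFFSET_BITS) - 1)


tag, offset = split_fully_associative(0x01D872)
print(hex(tag), hex(offset))   # 0x761 0x32
```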
CH 6 - 39. What causes external fragmentation, and how can it be fixed?
Just as paging suffers from fragmentation, so does segmentation. In paging, fragmentation is internal: free space within a page is wasted. In segmentation, however, segments occupy variable-sized chunks, so the vacant chunks left for storing segments become spread across memory. Eventually there are many small vacant chunks, none of them large enough to accommodate a segment. This is known as external fragmentation. It can be fixed by compaction (garbage collection): the segments in memory are moved together so that the small free chunks are combined into one large free chunk.
CH 5 - 2, b) Show how the following values would be stored by byte-addressable machines with 32-bit words, using little endian and then big endian format. Assume that each value starts at address 10₁₆ (0x10). Draw a diagram of memory for each, placing the appropriate values in the correct (and labeled) memory locations. b) 0x0000058A
Address: 0x10 0x11 0x12 0x13 | Little endian: 8A 05 00 00 | Big endian: 00 00 05 8A
CH 5 - 2, a) Show how the following values would be stored by byte-addressable machines with 32-bit words, using little endian and then big endian format. Assume that each value starts at address 10₁₆ (0x10). Draw a diagram of memory for each, placing the appropriate values in the correct (and labeled) memory locations. a) 0x456789A1
Address: 0x10 0x11 0x12 0x13 | Little endian: A1 89 67 45 | Big endian: 45 67 89 A1
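Both memory diagrams can be generated with Python's `struct` module, assuming each 32-bit value starts at address 0x10 (10 in hexadecimal). The hypothetical helper below maps each byte address to its contents in both byte orders.

```python
import struct


def byte_layout(value, base=0x10):
    """Map each byte address to its contents for a 32-bit word stored at `base`."""
    little = struct.pack("<I", value)
    big = struct.pack(">I", value)
    return ({base + i: f"{b:02X}" for i, b in enumerate(little)},
            {base + i: f"{b:02X}" for i, b in enumerate(big)})


for v in (0x456789A1, 0x0000058A):
    little, big = byte_layout(v)
    print(f"{v:#010x} little: {list(little.values())} big: {list(big.values())}")
```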
CH 7 - 47. What is the general idea behind MEMS storage?
MEMS (micro-electro-mechanical systems) storage is an emerging technology that promises significant performance at low cost compared with EEPROMs. • It provides higher performance than current disk drives. • It is a non-volatile storage technology. • In MEMS storage, magnetic recording material is combined with thousands of recording heads, providing storage capacity of up to 10 GB in a square centimetre, with an access time of less than one millisecond and a bandwidth of over 50 MB per second. • MEMS devices are built using photolithographic IC processes, like those used for standard CMOS. • Its per-byte cost is expected to be much lower than that of DRAM. • A notable feature of MEMS-based storage is that it can combine storage and processing on a single chip. • Because it is CMOS-based, several microprocessors or computational engines can be integrated with the storage device, which increases performance and reduces cost and power consumption. Since it promises to provide both non-volatile and volatile storage, it could become a single computing "brick" with processing on the chip. Although it is not yet a commercial technology, research is advancing quickly.
CH 7 - 37. Why is magnetic tape a popular storage medium?
Magnetic tape is a popular storage medium for the following reasons: • Tape technology has evolved remarkably over the years, with manufacturers packing more and more bytes onto each linear inch of tape. • Higher-density tapes are economical to purchase and store, and they allow backups to be made more quickly. • There have been many innovations in tape technology, with many standards and proprietary techniques emerging. • Tapes support a variety of track densities and use various recording methods.
CH 7 - 17. Why are magnetic disks called direct access devices?
Magnetic disks are called Direct Access Devices because each unit of storage in it, called sector has a unique address that can be accessed independently of the sectors around it.
CH 7 - 49. What is a memristor, and how does it store data?
Memristor memory is a type of non-volatile RAM that may replace flash memory. • A memristor combines the properties of a resistor with memory. • Its high-resistance and low-resistance states can store bits. • The states are controlled by applying threshold currents that change the physical properties of the underlying semiconductor material, shifting the device between the high-resistance and low-resistance states. • This technology may help solve problems related to "big data".
CH 7 - 5. Explain how programmed I/O is different from interrupt-driven I/O.
Programmed I/O: Data transfer is initiated by CPU to access memory or registers under the control of driver software. Interrupt-Driven I/O: It is the converse of programmed I/O. Instead of CPU continually asking its attached devices whether they have any input, the devices tell the CPU when they have data to send.
CH 9 - 7. Do all programming problems lend themselves to parallel execution? What is the limiting factor?
No. 1. Control dependencies and hazards: instruction streams depend on the results of branches. 2. Instruction-level parallelism: processors can achieve significant speedups on a wide variety of programs by executing instructions in parallel, but their maximum performance improvement is limited by dependencies between instructions. 3. A processor's ability to exploit instruction-level parallelism may be limited by two things. One is the number and type of execution units present in the processor. The other is restrictions on which instructions in the program can be examined to locate operations that can be performed in parallel. 4. RAW (read after write) dependencies limit performance by requiring that instructions be executed sequentially to produce correct results, which limits the amount of instruction-level parallelism available to programs. 5. WAW (write after write) dependencies also obstruct parallelism. This type of data hazard occurs when statements that write the same location are executed concurrently.
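The dependencies in points 4 and 5 can be detected mechanically. This simplified sketch compares every pair of instructions and reports RAW, WAR, and WAW dependencies. The three-instruction program and its register names are hypothetical. A real scheduler would also account for intervening writes that break a dependency.

```python
def find_hazards(program):
    """Report RAW, WAR, and WAW dependencies between every pair of
    instructions (simplified: intervening writes are not considered)."""
    hazards = []
    for i, (name_i, dst_i, srcs_i) in enumerate(program):
        for name_j, dst_j, srcs_j in program[i + 1:]:
            if dst_i in srcs_j:
                hazards.append(("RAW", name_i, name_j))  # j reads what i writes
            if dst_j in srcs_i:
                hazards.append(("WAR", name_i, name_j))  # j overwrites what i reads
            if dst_i == dst_j:
                hazards.append(("WAW", name_i, name_j))  # both write the same register
    return hazards


# Each instruction: (name, destination register, source registers)
program = [
    ("I1", "R1", ("R2", "R3")),  # R1 = R2 + R3
    ("I2", "R4", ("R1", "R5")),  # R4 = R1 * R5  (needs I1's result)
    ("I3", "R1", ("R6", "R7")),  # R1 = R6 - R7  (rewrites R1)
]
for hazard in find_hazards(program):
    print(hazard)
# ('RAW', 'I1', 'I2')
# ('WAW', 'I1', 'I3')
# ('WAR', 'I2', 'I3')
```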
CH 6 - 21. When does caching behave badly?
Caching behaves badly when a program does not exhibit good locality of reference. One example is object-oriented programming. When the program encounters an object, it may break away from the main flow of the program to the portion of memory where the object is defined, and that location can be far from where the object is accessed. Another example is array access. Arrays in languages such as C are stored in row-major order. If an array is larger than the cache, a portion of it will always fall outside the cache. Accessing the array in row-major order causes periodic (but acceptable) misses, roughly one per cache block. Accessing it in column-major order can cause a miss on nearly every access.
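The row-major versus column-major effect can be reproduced with a small direct mapped cache model. The sizes are assumptions: a 64 x 64 array of 8-byte elements and 32 cache lines of 64 bytes. They were chosen so that the 32 KB array is much larger than the 2 KB cache.

```python
def count_misses(addresses, num_lines=32, block_size=64):
    """Count misses in a direct mapped cache (toy model: reads only)."""
    lines = [None] * num_lines          # memory block currently held by each line
    misses = 0
    for addr in addresses:
        block = addr // block_size
        line = block % num_lines
        if lines[line] != block:        # empty line or a different block: miss
            misses += 1
            lines[line] = block
    return misses


N, ELEM = 64, 8                         # 64 x 64 array of 8-byte elements


def addr(r, c):
    return (r * N + c) * ELEM           # row-major layout, as in C


row_major = [addr(r, c) for r in range(N) for c in range(N)]
col_major = [addr(r, c) for c in range(N) for r in range(N)]
print(count_misses(row_major), count_misses(col_major))  # 512 4096
```

In row-major order each 64-byte block is loaded once and then hit 7 more times. In column-major order consecutive accesses are 512 bytes apart, so each access lands in a different block that conflicts with recently loaded ones.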
CH 5 - 9. Which is likely to be longer (have more instructions): a program written for a zero-address architecture, a program written for a one-address architecture, or a program written for a two-address architecture? Why?
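A program written for a zero-address architecture is likely to be the longest. A stack machine must push every operand onto the stack with a separate instruction before an operation can use it, and then pop the result back to memory. In a one-address (accumulator) machine, arithmetic, logical, and comparison instructions contain the address of only one operand. The other operand is implied to be in the accumulator, and the result is placed in the accumulator, so fewer instructions are needed. A two-address machine with several registers needs fewer still. For example, Z = X × Y + W × U takes 8 instructions on a zero-address machine (PUSH X, PUSH Y, MULT, PUSH W, PUSH U, MULT, ADD, POP Z). It takes 7 on a one-address machine (LOAD X, MULT Y, STORE TEMP, LOAD W, MULT U, ADD TEMP, STORE Z). It takes 6 on a two-address machine (LOAD R1, X; MULT R1, Y; LOAD R2, W; MULT R2, U; ADD R1, R2; STORE Z, R1).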
CH 7 - 42. Which RAID level uses a mirror (shadow) set?
RAID Level 1 uses a mirror (shadow) set. It provides the best protection against data loss because every piece of data is also written to a duplicate set of drives. Writes are slower, since each write must be performed on both drives. This is offset by faster reads, because data can be read from whichever disk's arm is currently closer to the target sector.
CH 5 - 1. Explain the difference between register-to-register, register-to-memory, and memory-to-memory instructions.
Register-to-register: arguments involve only registers, so data moves only among registers. Execution is the fastest and the bus connecting the registers is the shortest. Register-to-memory: arguments involve a register and a memory location, so data moves between a register and a location in memory. Execution is slower and the bus is longer. Memory-to-memory: arguments involve only memory locations, so data moves between two locations in memory. Execution is the slowest and the bus is longer.
CH 7 - 19. What are the major physical components of a rigid disk drive?
The major physical components of a rigid disk drive are: • Platters: the metal or glass disks inside the drive, to which a thin film of magnetizable material is bonded. • Spindle motor: spins the platters stacked on the spindle. • Read/write heads: the interface between the magnetic medium where data is stored and the drive's electronic components. • Actuator arm: the read/write heads are mounted on a rotating actuator arm, which is moved into position by magnetic fields induced in coils surrounding the arm's axis. • Disk packs: drives whose disks are removable.
CH 9 - 18. What is SETI, and how does it use the distributed computing model?
SETI is a radio astronomy research effort. • The name stands for the Search for Extra-Terrestrial Intelligence. • It is a collective effort to search for and monitor signs of extraterrestrial life and activity. • It is a purely scientific endeavor that analyzes data from radio telescopes. • SETI uses a NOW (network of workstations) architecture. Its SETI@home project splits radio telescope data into small work units and sends them over the Internet to volunteers' computers. Those computers analyze the data while otherwise idle and return the results. This approach accumulated half a million years of CPU time in just 18 months. • Institutions that have undertaken SETI projects include Harvard University, the University of California, Berkeley, and the SETI Institute.
CH 7 - 21. What is seek time?
Seek time is the time taken by disk arm to position itself over the required track in the disk drive. It is the time taken by a hard disk controller to locate a specific piece of stored data. Seek time does not include the time that it takes for the head to read the disk directory.
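Seek time is one component of the total time needed to reach data. The sketch below follows the definitions used earlier in this set: access time = seek time + rotational delay, and transfer time = access time + time to read the data. It assumes the average rotational delay is half a revolution. The drive parameters are hypothetical.

```python
def disk_times_ms(avg_seek_ms, rpm, nbytes, bytes_per_sec):
    """Return (access time, transfer time) in milliseconds."""
    rotational_delay = 0.5 * 60_000 / rpm      # half a revolution, on average
    access = avg_seek_ms + rotational_delay
    read = nbytes / bytes_per_sec * 1000       # time to read the data itself
    return access, access + read


# Hypothetical drive: 9 ms average seek, 7200 RPM, 100 MB/s media rate, 4 KB read
access, transfer = disk_times_ms(9.0, 7200, 4096, 100e6)
print(f"access {access:.2f} ms, transfer {transfer:.2f} ms")
# access 13.17 ms, transfer 13.21 ms
```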
CH 7 - 39. What are two popular tape formats that use serpentine recording?
Two popular tape formats that use serpentine recording are: 1. Digital Linear Tape (DLT), and 2. Quarter-Inch Cartridge (QIC). DLT is a magnetic tape format used for computer data storage, and QIC is used for desktop data backup.
CH 6 - 13. Explain how set associative cache combines the ideas of direct and fully associative cache.
Set associative cache mapping resembles the direct mapped scheme in that every address maps to a definite set of several cache blocks. The set is first located directly from the address, after which the blocks within that set are searched associatively. Hence, selecting the set uses direct mapping, but selecting a block within the set uses associative mapping.
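A small simulator shows both halves of the scheme. The set index is computed directly from the address, and the tag is then searched associatively within that set. This is a sketch under assumptions: LRU replacement and the cache sizes are illustrative choices, not part of the original answer. With `ways=1` it reduces to a direct mapped cache.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """N-way set associative cache with LRU replacement (toy model)."""

    def __init__(self, num_blocks, ways, block_size):
        if num_blocks % ways:
            raise ValueError("num_blocks must be a multiple of ways")
        self.num_sets = num_blocks // ways
        self.ways = ways
        self.block_size = block_size
        self.sets = [OrderedDict() for _ in range(self.num_sets)]

    def access(self, address):
        """Return True on a hit, False on a miss (the block is then loaded)."""
        block = address // self.block_size
        set_index = block % self.num_sets     # direct mapping chooses the set
        tag = block // self.num_sets
        lines = self.sets[set_index]
        if tag in lines:                      # associative search within the set
            lines.move_to_end(tag)            # mark as most recently used
            return True
        if len(lines) == self.ways:
            lines.popitem(last=False)         # evict the least recently used block
        lines[tag] = None
        return False


# Addresses 0 and 256 map to the same set in both configurations below.
pattern = [0, 256, 0, 256]
for ways in (1, 2):
    cache = SetAssociativeCache(num_blocks=32, ways=ways, block_size=8)
    print(ways, [cache.access(a) for a in pattern])
# 1 [False, False, False, False]
# 2 [False, False, True, True]
```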
CH 7 - 25. What is short stroking, and how does it affect the relative cost per gigabyte of SSDs?
Short stroking is the practice of formatting a disk drive so that data is written only to the outer sectors of the platters. It reduces latency and increases performance by reducing the time the actuator spends seeking across the platter. However, only part of the disk's capacity is used, so short stroking raises the cost per gigabyte of the hard drive. This narrows the cost gap between hard drives and SSDs and makes SSDs relatively more cost-competitive.
CH 7 - 44. What is the significance of the superparamagnetic limit?
Disk storage densities have increased exponentially over the years because of advances in technology. But as data densities increase, fewer magnetic grains are available within the boundaries of each bit cell. The smallest possible bit cell area is reached when the thermal properties of the disk cause encoded magnetic grains to spontaneously change their polarity, turning a 1 into a 0 or a 0 into a 1. This behavior is known as superparamagnetism, and the bit density at which it occurs is called the superparamagnetic limit. It is thought to be between 150GB/in² and 200GB/in². Because densities are already approaching this magnitude, the greatest increases in magnetic data density have likely already been achieved. Future exponential gains in data densities will almost certainly be realized by using entirely new storage technologies.
CH 9 - 11. What are the similarities and differences between EPIC and VLIW?
Similarities between EPIC and VLIW: VLIW: • Very Long Instruction Word • Uses instruction-level parallelism • VLIW bundles its instructions for delivery to various execution units • VLIW shifts the complexity of instruction scheduling from the CPU hardware to the compiler. This removes the need for complex scheduling circuitry in the CPU, making the CPU more space and power efficient and leaving more space and power for other functions. EPIC: • Explicitly Parallel Instruction Computing • Uses instruction-level parallelism • EPIC also bundles its instructions for delivery to various execution units • EPIC shares the goal of relying on software (compiler) capabilities rather than on increased hardware complexity, making processors more cost and space efficient. Differences between EPIC and VLIW: VLIW: • The bundles of instructions sent to the various execution units are all the same length. • There are no delimiters to indicate the start or end of any particular group of instructions. • VLIW instruction sets are not backward compatible when implemented on wider machines (machines with more execution units). Code written for newer, wider implementations is not supported by older, narrower ones. • VLIW does not use prefetch instructions. EPIC: • The groups of instructions sent to the various execution units need not be the same length. • A special delimiter indicates the beginning and end of each group of instructions. • EPIC is backward compatible between implementations. • EPIC provides a software prefetch instruction for data. This improves the chance of cache hits, and the prefetch can indicate the expected temporal locality of the data at the various levels of the cache.
CH 6 - 18. Suppose a process page table contains the entries shown below. Using the format shown in Figure 6.17a, indicate where the process pages are located in memory. Page table (virtual page: frame, valid bit): 0: 1, 1 | 1: -, 0 | 2: 0, 1 | 3: 3, 1 | 4: -, 0 | 5: -, 0 | 6: 2, 1 | 7: -, 0
Virtual page 0 has valid bit 1, so it is mapped to frame 1 in physical memory. Virtual page 1 has valid bit 0, so it is not currently in physical memory. Virtual page 2 has valid bit 1, so it is mapped to frame 0. Virtual page 3 has valid bit 1, so it is mapped to frame 3. Virtual pages 4 and 5 have valid bit 0, so neither is currently in physical memory. Virtual page 6 has valid bit 1, so it is mapped to frame 2. Virtual page 7 has valid bit 0, so it is not currently in physical memory.
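The table lookup can be expressed directly in code. The page table values come from the exercise. The exercise gives no page size, so the 1024-byte page size and the `PageFault` exception are assumptions for illustration.

```python
# (frame, valid bit) for virtual pages 0-7, from the exercise's table
PAGE_TABLE = [(1, 1), (None, 0), (0, 1), (3, 1),
              (None, 0), (None, 0), (2, 1), (None, 0)]

PAGE_SIZE = 1024  # hypothetical: the exercise does not specify a page size


class PageFault(Exception):
    """Raised when the referenced page is not in physical memory."""


def translate(vaddr):
    page, offset = divmod(vaddr, PAGE_SIZE)
    frame, valid = PAGE_TABLE[page]
    if not valid:
        raise PageFault(f"virtual page {page} is not in physical memory")
    return frame * PAGE_SIZE + offset


resident = {p: f for p, (f, v) in enumerate(PAGE_TABLE) if v}
print(resident)                       # {0: 1, 2: 0, 3: 3, 6: 2}
print(translate(2 * PAGE_SIZE + 5))   # page 2 -> frame 0, offset 5 -> 5
```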
CH 6 - 12. Explain how fully associative cache is different from direct mapped cache.
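In a direct mapped cache, every block in main memory maps to one specific, designated block in the cache. Since no searching is involved, a direct mapped cache is relatively economical. In a fully associative cache, a main memory block can be placed anywhere in the cache. On each access, every tag must be searched (in parallel, using associative memory) to find the block. This makes a fully associative cache more flexible but more expensive to build.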
CH 7 - 45. What does the superparamagnetic limit mean for disk drives?
The superparamagnetic limit is the limit that superparamagnetism imposes on a magnetic disk: the maximum number of bits per square inch that can be stored reliably and is commercially feasible. • For a disk drive, it is the maximum areal density beyond which the drive can no longer reliably store data. • It was once thought to be as low as 20GB to 40GB per square inch. • Some disk drives already support densities above 40GB per square inch. • Research continues, and densities may someday reach 150GB to 200GB per square inch.
CH 9 - 9. How is a superscalar design different from a superpipelined design?
Super-pipelining: • A pipelining technique that raises the clock speed and reduces the latency of each individual stage by increasing the depth of the pipeline. • As a result, each stage can take as little as half a clock cycle. • Logical functions that take a long time, such as arithmetic operations and cache or memory accesses, are broken into more than one stage. This allows other instructions to proceed while the slower operations are being processed, saving time that would otherwise be wasted during the shorter stages. • With pipelining, several instructions can execute at once, but they must be in different pipeline stages at any given time. • Examples include the Pentium Pro (P6), a 3-degree superscalar with a 12-stage "super pipeline", and the Intel Pentium 4 processor. • ...f pipelining, and in addition it can execute several instructions simultaneously in the same pipeline stage. • This introduces a new level of parallelism, called instruction-level parallelism. • By definition, it requires duplicated hardware. • An example is the PowerPC 620: 4-degree superscalar, 4/6-stage pipeline.
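The difference can be quantified with idealized timing formulas for a k-stage pipeline executing n instructions. These formulas assume no hazards, stalls, or branch penalties. A plain pipeline needs k + (n - 1) cycles. A superscalar of degree m issues m instructions per cycle. A superpipeline of degree m splits each stage into m sub-stages, each taking 1/m of a base cycle. The parameter values below are hypothetical.

```python
import math
from fractions import Fraction


def pipelined(k, n):
    """Base cycles for n instructions on a k-stage pipeline."""
    return k + (n - 1)


def superscalar(k, n, m):
    """m instructions issue together in every cycle."""
    return k + math.ceil(n / m) - 1


def superpipelined(k, n, m):
    """Each stage split into m sub-stages; result in base clock cycles."""
    return k + Fraction(n - 1, m)


k, n, m = 5, 12, 2  # hypothetical: 5 stages, 12 instructions, degree 2
print(pipelined(k, n), superscalar(k, n, m), superpipelined(k, n, m))  # 16 10 21/2
```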
CH7 - 8. Suppose the daytime processing load consists of 60% CPU activity and 40% disk activity. Your customers are complaining that the system is slow. After doing some research, you learn that you can upgrade your disks for $8,000 to make them 2.5 times as fast as they are currently. You have also learned that you can upgrade your CPU to make it 1.4 times as fast for $5,000. a) Which would you choose to yield the best performance improvement for the least amount of money?
The CPU accounts for a fraction f1 = 60% = 60/100 = 0.6 of the work; the upgraded CPU would be k1 = 1.4 times as fast, at a cost of $5,000. The disk accounts for f2 = 40% = 40/100 = 0.4 of the work; the upgraded disk would be k2 = 2.5 times as fast, at a cost of $8,000. By Amdahl's law, the overall speedup is \( S = \frac{1}{(1 - f) + f/k} \). For the CPU upgrade, \( S_{CPU} = \frac{1}{(1 - 0.6) + 0.6/1.4} = \frac{1}{0.8286} \approx 1.2069 \), a 20.69% improvement. For the disk upgrade, \( S_{Disk} = \frac{1}{(1 - 0.4) + 0.4/2.5} = \frac{1}{0.76} \approx 1.3158 \), a 31.58% improvement. The cost per percentage point of improvement is $5,000 / 20.69 ≈ $241.7 for the CPU and $8,000 / 31.58 ≈ $253.3 for the disk. Therefore, the CPU upgrade provides the best performance improvement for the least amount of money.
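The same comparison can be checked in a few lines. The fractions, speedup factors, and costs are those given in the exercise. Measuring value as cost per percentage point of improvement is one reasonable metric, not the only one.

```python
def amdahl_speedup(f, k):
    """Overall speedup when a fraction f of the work is made k times faster."""
    return 1 / ((1 - f) + f / k)


options = {"CPU": (0.6, 1.4, 5000), "disk": (0.4, 2.5, 8000)}
for name, (f, k, cost) in options.items():
    s = amdahl_speedup(f, k)
    pct = (s - 1) * 100                  # percentage improvement
    print(f"{name}: speedup {s:.4f}, {pct:.2f}% faster, ${cost / pct:.2f} per 1%")
# CPU: speedup 1.2069, 20.69% faster, $241.67 per 1%
# disk: speedup 1.3158, 31.58% faster, $253.33 per 1%
```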
CH 7 - 40. Which RAID levels offer the best performance?
RAID-0 offers the best performance of all RAID configurations, because it stripes data across multiple disks and carries no redundancy overhead. Performance is best when each disk has its own controller and cache.
CH 5 - 7. What are the pros and cons of fixed-length and variable-length instructions? Which is currently more popular?
The advantage of fixed-length instructions is that they perform better in an instruction pipeline and are simpler to decode. The downside is that they are not storage efficient. In contrast, variable-length instructions take more effort to decode but use storage more efficiently.
CH 6 - 29. What is the advantage to a nonblocking cache?
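The advantage of a nonblocking cache is that it can continue servicing other requests while a miss is being processed. Multiple (up to four) requests can be handled simultaneously, instead of the processor stalling on each miss.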
CH7 - 33. What are the advantages and disadvantages of having a small number of sectors per disk cluster? (Hint: You may want to think about retrieval time and the required lifetime of the archives.)
The advantages of having a small number of sectors per cluster are that less disk space is wasted and the disk is used more efficiently. The disadvantages are that bookkeeping overhead, the size of the disk directory, and fragmentation all increase. Overall reading and writing speed decreases, and accessing information in the disk directory slows down.
CH 6 - 10. What are the three fields in a direct mapped cache address? How are they used to access a word located in cache?
The cache is divided into uniform-size chunks called blocks, which hold the actual data. A direct mapped cache address has three fields: tag, block, and offset. The offset field identifies the particular word within a block. Its size depends on the block size, and it must have enough bits to address every word in the block. The block field selects a unique block of the cache. Its size depends on the number of blocks in the cache, and it must have enough bits to select any block. The remaining bits make up the tag field. To access a word, the block field first selects a cache block. The tag stored in that block is then compared with the address's tag field; a match on a valid block is a hit. Finally, the offset field selects the word within the block.
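The field extraction and the three-step lookup can be sketched as follows. The field widths are hypothetical: 3 offset bits and 5 block bits, leaving an 8-bit tag in a 16-bit address, for a cache of 32 blocks of 8 bytes. Each cache line is modeled as a `(valid, tag, data)` tuple.

```python
def split_direct_mapped(addr, offset_bits, block_bits):
    """Split an address into (tag, block, offset) fields."""
    offset = addr & ((1 << offset_bits) - 1)
    block = (addr >> offset_bits) & ((1 << block_bits) - 1)
    tag = addr >> (offset_bits + block_bits)
    return tag, block, offset


def lookup(cache, addr, offset_bits=3, block_bits=5):
    """Return the cached byte on a hit, or None on a miss."""
    tag, block, offset = split_direct_mapped(addr, offset_bits, block_bits)
    valid, stored_tag, data = cache[block]   # block field selects the line
    if valid and stored_tag == tag:          # tag field confirms a hit
        return data[offset]                  # offset field selects the byte
    return None                              # miss: would fetch from memory


cache = [(False, 0, bytes(8)) for _ in range(32)]
cache[5] = (True, 0x1A, bytes(range(8)))     # pretend this block is loaded
print(split_direct_mapped(0x1A2B, 3, 5))     # (26, 5, 3)
print(lookup(cache, 0x1A2B))                 # 3 (hit)
print(lookup(cache, 0x0A2B))                 # None (same line, different tag)
```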
CH 6 - 24. Explain the difference between a unified cache and a Harvard cache.
A cache that holds both data and instructions is called an integrated or unified cache. Some contemporary systems instead have separate caches for instructions and data; this arrangement is known as a Harvard cache.
CH 6 - 25. A system implements a paged virtual address space for each process using a one-level page table. The maximum size of virtual address space is 16MB. The page table for the running process includes the following valid entries (the → notation indicates that a virtual page maps to the given page frame; that is, it is located in that frame): The page size is 1024 bytes and the maximum physical memory size of the machine is 2MB. e) Which virtual address will translate to physical address 0x400?
The hexadecimal value 0x400 is 1024 in decimal. With a 1024-byte page size, physical address 1024 is at offset 0 in frame 1. Since virtual page 0 maps to frame 1, virtual address 0 translates to physical address 0x400.
CH 6 - 32. Discuss the pros and cons of paging.
The disadvantages include: • Access time increases because an extra memory reference (to the page table) is added to every access. • A translation look-aside buffer can reduce this penalty, but it adds another level of hardware, and a TLB miss still requires the page table access. • Extra memory is consumed by the page tables, which can sometimes occupy a significant amount of memory. • Besides the extra memory required, paging may also need a tailored operating system and specialized hardware. The advantag...e process include: • Programs are no longer restricted by the amount of physical memory available. Even if the physical address space is smaller than the virtual address space, programs can still run. • This gives the programmer more flexibility, allowing a focus on functionality rather than on memory management. Also, because each program needs less physical memory, more programs can run at once. • Processes whose sizes exceed that of physical memory can share the machine, which increases the overall efficiency and utilization of the system. • Because pages divide memory into fixed-size chunks, memory allocation is simplified. • Paging also supports protection: unauthorized access can be forbidden, and levels of access can be defined for each user. The advantages far outweigh the disadvantages, which is why paging is used by virtually all general-purpose operating systems today.
CH 6 - 37. What are the advantages and disadvantages of virtual memory?
Virtual memory is usually implemented with paging, so its advantages and disadvantages are the same as those listed for paging in CH 6 - 32.
CH7 -11. Name the four types of I/O architectures. Where are each of these typically used, and why are they used there?
The four types of I/O architectures are programmed I/O, interrupt-driven I/O, direct memory access (DMA), and channel I/O. Programmed I/O is used in embedded systems and automated teller machines. In programmed I/O the CPU sits in a repeated "busy wait" loop, which is acceptable in these systems because the CPU has little other work to do. Interrupt-driven I/O is used in small, single-user systems because it handles data one byte at a time. DMA is used in disk controllers, graphics cards, network cards, and sound cards, because the CPU cannot keep up with their data transfer rates. Channel I/O is suited to large, multiuser systems because these systems perform many complex I/O tasks.
CH 5 - 9. There are reasons for machine designers to want all instructions to be the same length. Why is this not a good idea on a stack machine?
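On a stack machine, instructions differ widely in how many operands they need. PUSH and POP need a memory address, while arithmetic instructions take their operands implicitly from the stack and need none. Forcing every instruction to the same length would waste space in the many short instructions, so instruction lengths naturally vary with the instruction format.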
CH 6 - 25. What are the advantages of a Harvard cache?
The main advantage of a Harvard cache over a unified cache is that instruction and data accesses are separated, so each cache sees more uniform, less sporadic access patterns. A Harvard cache can deliver performance comparable to that of a much larger unified cache. It also removes a limitation of the unified cache: conflicts between data and instruction accesses that arise from a single port shared by both. The Harvard cache has a separate port for each.
CH 6 - 25. A system implements a paged virtual address space for each process using a one-level page table. The maximum size of virtual address space is 16MB. The page table for the running process includes the following valid entries (the → notation indicates that a virtual page maps to the given page frame; that is, it is located in that frame): The page size is 1024 bytes and the maximum physical memory size of the machine is 2MB. b) How many bits are required for each physical address?
The maximum physical memory size is 2MB, so the number of bits required for a physical address is: log2(Physical Memory Size) = log2(2 MB) = log2(2^1 × 2^20) = log2(2^21) = 21 bits
CH 6 - 25. A system implements a paged virtual address space for each process using a one-level page table. The maximum size of virtual address space is 16MB. The page table for the running process includes the following valid entries (the → notation indicates that a virtual page maps to the given page frame; that is, it is located in that frame): The page size is 1024 bytes and the maximum physical memory size of the machine is 2MB. a) How many bits are required for each virtual address?
The maximum size of the virtual address space is 16MB, so the number of bits required for a virtual address is: log2(Virtual Memory Size) = log2(16 MB) = log2(2^4 × 2^20) = log2(2^24) = 24 bits
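Both calculations, and the translation used in part (e), can be checked with a few lines of code. Only the page 0 → frame 1 mapping stated in part (e) is used, because the exercise's full page table is not reproduced here.

```python
VIRTUAL_SPACE = 16 * 2**20   # 16 MB
PHYSICAL_MEM = 2 * 2**20     # 2 MB
PAGE_SIZE = 1024             # bytes


def bits_needed(size):
    """Address bits needed to address `size` bytes (size a power of two)."""
    return (size - 1).bit_length()


va_bits = bits_needed(VIRTUAL_SPACE)   # 24
pa_bits = bits_needed(PHYSICAL_MEM)    # 21
offset_bits = bits_needed(PAGE_SIZE)   # 10
print(va_bits, pa_bits, offset_bits)                 # 24 21 10
print(va_bits - offset_bits, pa_bits - offset_bits)  # 14 page bits, 11 frame bits

page_table = {0: 1}  # only the mapping given in part (e)


def translate(vaddr):
    page, offset = divmod(vaddr, PAGE_SIZE)
    return page_table[page] * PAGE_SIZE + offset


print(hex(translate(0)))  # 0x400
```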
CH7 - 12. A CPU with interrupt-driven I/O is busy servicing a disk request. While the CPU is midway through the disk-service routine, another I/O interrupt occurs. a) What happens next? b) Is it a problem?
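a) The interrupt controller checks the priorities of both interrupt lines and determines which one takes precedence. If the new interrupt has higher priority and the system allows nested interrupts, the CPU saves its state and services the new interrupt before resuming the disk-service routine. Otherwise, the new interrupt is held pending until the disk routine finishes. b) It is not a problem, because the priority scheme ensures that both interrupts are serviced in an orderly way.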
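The priority decision in part (a) can be modeled in a few lines. This toy model assumes that a lower number means higher priority, that the CPU allows nested interrupts, and that a nested routine runs to completion. The interrupt names are hypothetical.

```python
import heapq


def service_order(current, arrivals):
    """Order in which interrupt service routines finish.

    current:  (priority, name) of the routine already running
    arrivals: interrupts raised while `current` is running
    Lower number = higher priority (an assumption of this model).
    """
    finished, pending = [], []
    for irq in arrivals:
        if irq[0] < current[0]:
            # Higher priority: CPU saves state, runs this routine, then resumes.
            finished.append(irq[1])
        else:
            # Equal or lower priority: held pending until the current routine ends.
            heapq.heappush(pending, irq)
    finished.append(current[1])                     # interrupted routine completes
    while pending:
        finished.append(heapq.heappop(pending)[1])  # highest priority first
    return finished


print(service_order((2, "disk"), [(1, "timer"), (3, "keyboard")]))
# ['timer', 'disk', 'keyboard']
```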
CH 9 - 10. Suppose that a RISC machine uses 5 register windows. b) Suppose two more calls are made after the maximum value from part (a) is reached. How many register windows must be saved to memory as a result?
The register windows that must be saved to memory are the input registers from the first window and the input, local and output registers on the second call.
CH 6 - 9. Suppose a byte-addressable computer using set associative cache has 2^16 bytes of main memory and a cache of 32 blocks, and each cache block contains 8 bytes. b) If this cache is 4-way set associative, what is the format of a memory address as seen by the cache?
Main memory holds 2^16 bytes, so 16 address bits are required. The cache contains 32 blocks and each set holds 4 of them, so there are 32 / 4 = 8 = 2^3 sets, requiring 3 bits for the set field. Each block contains 8 = 2^3 bytes, so 3 bits are required for the offset (word) field. The tag field gets the remaining 16 - 3 - 3 = 10 bits. Therefore the tag field is 10 bits, the set field is 3 bits, and the offset field is 3 bits.
CH 6 - 9. Suppose a byte-addressable computer using set associative cache has 2^16 bytes of main memory and a cache of 32 blocks, and each cache block contains 8 bytes. a) If this cache is 2-way set associative, what is the format of a memory address as seen by the cache; that is, what are the sizes of the tag, set, and offset fields?
Main memory holds 2^16 bytes, so 16 address bits are required. The cache contains 32 blocks and each set holds 2 of them, so there are 32 / 2 = 16 = 2^4 sets, requiring 4 bits for the set field. Each block contains 8 = 2^3 bytes, so 3 bits are required for the offset (word) field. The tag field gets the remaining 16 - 4 - 3 = 9 bits. Therefore the tag field is 9 bits, the set field is 4 bits, and the offset field is 3 bits.
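The field sizes for both parts can be computed with one function. As in the exercise, it assumes that memory size, block size, and number of sets are all powers of two. Setting `ways=1` gives the direct mapped format for comparison.

```python
def field_widths(mem_bytes, num_blocks, block_bytes, ways):
    """Return (tag, set, offset) bit widths for a set associative cache."""
    addr_bits = (mem_bytes - 1).bit_length()
    offset_bits = (block_bytes - 1).bit_length()
    set_bits = (num_blocks // ways - 1).bit_length()
    tag_bits = addr_bits - set_bits - offset_bits
    return tag_bits, set_bits, offset_bits


print(field_widths(2**16, 32, 8, ways=2))  # (9, 4, 3)
print(field_widths(2**16, 32, 8, ways=4))  # (10, 3, 3)
print(field_widths(2**16, 32, 8, ways=1))  # (8, 5, 3), direct mapped
```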
CH 6 - 34. What causes internal fragmentation?
Virtual memory is divided into pages that are the same size as the frames of main memory. A process is broken up into these equal-sized pages, so the process's last page may not be fully occupied. The unused space in the frame holding that page cannot be given to another process and therefore goes to waste. This unusable space within a frame is called internal fragmentation.
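Internal fragmentation is simply the unused part of the last page, so it is easy to compute. The process and page sizes below are hypothetical.

```python
def internal_fragmentation(process_bytes, page_bytes):
    """Return (pages needed, bytes wasted in the last page)."""
    pages = -(-process_bytes // page_bytes)       # ceiling division
    wasted = pages * page_bytes - process_bytes   # unused tail of the last page
    return pages, wasted


print(internal_fragmentation(5000, 1024))  # (5, 120)
print(internal_fragmentation(4096, 1024))  # (4, 0): exact fit, no waste
```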
CH 9 - 13. Give two reasons for the efficiency of vector processors.
The two reasons why vector instructions are more efficient are: • First, one vector instruction operates on many data elements, so fewer instructions need to be fetched. This is useful in several ways: o less decoding is needed o there is less control unit overhead o less instruction bandwidth is consumed o debugging is easier • Second, the processor knows it will access a contiguous set of data, so it can begin prefetching the corresponding pairs of values. If memory interleaving is used, one pair of data values can arrive per clock cycle.
CH7 - 48. b) Which can tolerate more than one simultaneous disk failure?
The RAID levels that can tolerate more than one simultaneous disk failure are RAID-1, RAID-2, and RAID-6. RAID-1 can survive multiple failures as long as both disks of the same mirrored pair do not fail. RAID-2 stripes data across the disks together with error-checking and correction (Hamming code) information. RAID-6 adds a second set of parity information, which allows the array to keep functioning even if two disks fail at the same time.
CH7 -8. Suppose the daytime processing load consists of 60% CPU activity and 40% disk activity. Your customers are complaining that the system is slow. After doing some research, you learn that you can upgrade your disks for $8,000 to make them 2.5 times as fast as they are currently. You have also learned that you can upgrade your CPU to make it 1.4 times as fast for $5,000. b) Which option would you choose if you don't care about the money, but want a faster system?
The disk upgrade gives an overall speedup of about 1.3158 (a 31.58% improvement), compared with about 1.2069 (20.69%) for the CPU upgrade. If money is no concern, the disk upgrade is the better choice because it gives the larger performance improvement.
CH 5 - 8. How does an architecture based on zero operands ever get any data values from memory?
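A zero-address (stack) architecture still provides two instructions that take a memory address as their operand. PUSH copies a value from memory onto the top of the stack. POP removes the value on top of the stack and stores it in memory. All other instructions take their operands implicitly from the stack and leave their results there.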
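A tiny interpreter illustrates this. It is a hypothetical stack machine with only PUSH, POP, ADD, and MULT. It evaluates Z = X × Y + W × U, with the variable values chosen arbitrarily. Only PUSH and POP touch memory.

```python
def run(program, memory):
    """Execute a zero-address program; memory maps names to values."""
    stack = []
    for op, *args in program:
        if op == "PUSH":
            stack.append(memory[args[0]])   # only PUSH reads memory
        elif op == "POP":
            memory[args[0]] = stack.pop()   # only POP writes memory
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MULT":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode {op}")
    return memory


program = [("PUSH", "X"), ("PUSH", "Y"), ("MULT",), ("PUSH", "W"),
           ("PUSH", "U"), ("MULT",), ("ADD",), ("POP", "Z")]
mem = run(program, {"X": 2, "Y": 3, "W": 4, "U": 5})
print(mem["Z"])  # 26
```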
CH 6 - 35. What are the components (fields) of a virtual address?
The virtual address is bifurcated into two fields: page and offset.
CH 5 - 24. Explain Java bytecodes.
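Java source code is compiled into bytecodes, a platform-independent instruction set for the Java Virtual Machine (JVM). The JVM accepts these bytecodes and translates them into the host machine's native instructions, acting as an interpreter. Many JVMs also compile frequently executed bytecode to native code at run time (just-in-time compilation).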
CH 6 - 26. Why would a system contain a victim cache? A trace cache?
Some systems also have a victim cache, which holds blocks that were evicted from the cache because of conflicts. The rationale is that if a recently evicted block is needed again soon, it can be retrieved from the victim cache in far less time than from main memory. A trace cache is a variant of the instruction cache that stores previously decoded instructions, so decoding does not need to be repeated if those instructions are executed again shortly. This is a great advantage when processing program blocks that involve branches. Because it stores traces of the dynamic instruction stream, the trace cache makes instructions appear contiguous even when they are not.
CH 9 - 21. What is reentrant code?
Reentrant code can be executed by several users or processes at the same time, or re-entered before a previous execution finishes, with a different set of data each time. A reentrant program is written so that it never modifies its own instructions or any variables shared between executions; each execution keeps its own data. Reentrant code is generally used in operating systems and in applications intended to be shared by multiple users. Architectures based on dynamic tagging rely on this property. Data values are tagged with unique identifiers that specify the current iteration level, and the tags let the system select the right data for each iteration. Loops are a good example, since the same code is repeated a number of times with different data.
CH 5 - 3. What is an expanding opcode?
An expanding opcode is an instruction format in which the opcode length varies. Instructions that need many operands use short opcodes, and instructions that need fewer operands use longer opcodes that take over the bits the operands would otherwise occupy. This involves a trade-off between a rich set of opcodes and short, concise opcodes. A short opcode leaves more space for operands. An instruction that needs no operands can use all of its bits for the opcode, which allows a greater number of such instructions. Hence, the longer the opcode, the fewer the operands, and vice versa.
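A decoder makes the trade-off concrete. The sketch assumes a common textbook-style scheme, treated as hypothetical here: 16-bit instructions with 4-bit address fields. That scheme yields 15 three-address, 14 two-address, 31 one-address, and 16 zero-address instructions. The all-ones patterns in the leading fields act as escape codes to longer opcodes.

```python
from collections import defaultdict


def decode(instr):
    """Classify a 16-bit word under the assumed expanding-opcode scheme.
    Returns (format, opcode, operand fields)."""
    n = [(instr >> shift) & 0xF for shift in (12, 8, 4, 0)]  # four 4-bit fields
    if n[0] != 0xF:                           # 0000-1110: 3-address
        return "3-address", n[0], tuple(n[1:])
    if n[1] <= 0xD:                           # 1111 0000-1101: 2-address
        return "2-address", instr >> 8, tuple(n[2:])
    if not (n[1] == 0xF and n[2] == 0xF):     # 1111 1110 xxxx, 1111 1111 0000-1110
        return "1-address", instr >> 4, (n[3],)
    return "0-address", instr, ()            # 1111 1111 1111 xxxx


# Count the distinct opcodes each format provides.
opcodes = defaultdict(set)
for word in range(1 << 16):
    kind, opcode, _ = decode(word)
    opcodes[kind].add(opcode)
print({kind: len(ops) for kind, ops in opcodes.items()})
# {'3-address': 15, '2-address': 14, '1-address': 31, '0-address': 16}
```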
CH 5 - 13. Explain what it means for an instruction set to be orthogonal.
An instruction set is orthogonal if each instruction performs a unique function that does not duplicate any other instruction, and if the instructions are consistent with one another. Orthogonality means that operands and addressing modes are available uniformly across operations. The operation an instruction performs should not restrict which operands or addressing modes it can use.
CH 6 - 20. Explain how to derive an effective access time formula.
To derive the effective access time (EAT), we express how it depends on the hit rate and on the access time of each level of memory. For a single cache in front of main memory, assume the cache and main memory are accessed in parallel. A hit then costs the cache access time and a miss costs the main memory access time, so \( EAT = H \times Access_C + (1 - H) \times Access_{MM} \). Here H = hit rate of the cache, Access_C = access time of the cache, and Access_MM = access time of main memory. If the cache must be checked before main memory is accessed, the miss term becomes (1 - H) × (Access_C + Access_MM). The formula can be extended to multiple levels of memory. A more general formula includes every level of memory access: \( EAT = \sum_i H_i \times Access_i \), where H_i = hit rate of memory level i and Access_i = access time of memory level i.
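The formulas translate directly into code. The hit rates and access times below are hypothetical. The multilevel version treats each H_i as the fraction of all accesses satisfied at level i, so those fractions should sum to 1.

```python
def eat_two_level(hit_rate, cache_time, memory_time):
    """EAT = H * Access_C + (1 - H) * Access_MM (cache and memory accessed in parallel)."""
    return hit_rate * cache_time + (1 - hit_rate) * memory_time


def eat_multilevel(levels):
    """EAT = sum of H_i * Access_i, where each H_i is the fraction of all
    accesses satisfied at level i (the fractions should sum to 1)."""
    return sum(h * t for h, t in levels)


print(eat_two_level(0.99, 10, 200))                          # about 11.9 (ns)
print(eat_multilevel([(0.95, 1), (0.04, 10), (0.01, 100)]))  # about 2.35 (ns)
```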
CH 9 - 2. Why is a RISC processor easier to pipeline than a CISC processor?
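CISC instructions vary in length, in format, and in the number of clock cycles they take, so CISC processors rely on microcode to manage this complexity. That makes instruction pipelining difficult. The microcode interprets each instruction as it is fetched from memory, and because instructions spend different amounts of time in each stage, they do not move through the pipeline in lockstep. RISC instructions, by contrast, are simple and fixed-length and are designed so that each pipeline stage completes in one clock cycle. A new instruction can therefore enter the pipeline every cycle.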
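A simplified timing model shows why uniform instructions pipeline well. The model is an assumption for illustration: in-order execution, one instruction per stage at a time, unlimited buffering between stages, and no hazards other than stages being busy. When every stage takes one cycle, $n$ instructions in a $k$-stage pipeline finish in $k + n - 1$ cycles. When some instructions occupy a stage for several cycles, which is typical of complex CISC-style instructions, every later instruction is held up.

```python
def pipeline_cycles(latencies):
    """Return the cycle at which the last instruction leaves the pipeline.

    latencies[i][s] = number of cycles instruction i spends in stage s.
    Simplified model (assumption): in-order, one instruction per stage at a time,
    unlimited buffering between stages.
    """
    n_stages = len(latencies[0])
    prev_finish = [0] * n_stages  # when the previous instruction left each stage
    for lat in latencies:
        finish = []
        ready = 0  # when this instruction left the previous stage
        for s in range(n_stages):
            start = max(ready, prev_finish[s])  # stage must be free and input ready
            ready = start + lat[s]
            finish.append(ready)
        prev_finish = finish
    return prev_finish[-1]


uniform = [[1, 1, 1, 1, 1] for _ in range(4)]  # RISC-like: 1 cycle per stage
varied = [[1, 1, 1, 1, 1], [1, 1, 4, 1, 1], [1, 1, 1, 1, 1], [1, 1, 3, 1, 1]]
print(pipeline_cycles(uniform))  # k + n - 1 = 5 + 4 - 1
print(pipeline_cycles(varied))   # longer: slow stages stall the instructions behind them
```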
CH 9 - 12. Explain the limitation inherent in a register-register vector processing architecture.
Vector registers are special-purpose registers that hold multiple vector elements at once. The contents of a vector register are sent to the vector pipeline one element at a time, and the outputs of the pipeline are received back into vector registers one element at a time. The registers therefore work in FIFO order, which lets them handle many values. Vector processors have many of these registers. Their instruction sets include instructions for loading data into the registers, performing operations on the register data, and storing the results either in registers or in memory. Vector processors are mainly of two types:
• Register-register vector processors
• Memory-memory vector processors
Register-register vector processors use vector registers both to hold the operands and to hold the final results after an operation (opcode) executes. This design has certain disadvantages:
• The registers cannot hold long vectors all at once, so long vectors must be broken down into smaller fixed-length segments that fit the register size.
• Breaking the data into segments requires careful management, and the segments must be handled carefully while they are being operated on.
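The segmenting described above is commonly called strip mining. Here is a sketch of it for an element-wise vector add, assuming a hypothetical register length (maximum vector length) of 64 elements. Each slice stands in for loading one register-sized segment, and each pass of the loop stands in for one vector instruction over that segment.

```python
def strip_mined_add(a, b, mvl=64):
    """Add two long vectors in register-sized segments.

    mvl: assumed maximum vector length (elements per vector register).
    Returns (result, number_of_segments).
    """
    if len(a) != len(b):
        raise ValueError("vectors must have equal length")
    result, segments = [], 0
    for start in range(0, len(a), mvl):
        seg_a = a[start:start + mvl]  # models loading one segment into a vector register
        seg_b = b[start:start + mvl]
        result.extend(x + y for x, y in zip(seg_a, seg_b))  # one vector add per segment
        segments += 1
    return result, segments


result, segments = strip_mined_add(list(range(150)), [1] * 150)
print(segments)  # 150 elements need 3 segments of at most 64
```

The bookkeeping loop (segment count, start offsets, and a shorter final segment) is the extra management burden that the register-register design pushes onto the program.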
CH 6 - 9. Cache is accessed by its ________, whereas main memory is accessed by its _______.
Cache is accessed by its content, whereas main memory is accessed by its address. Main memory is accessed directly with a memory address, but the cache entries must first be checked to see whether they contain the requested value. Thus, the cache is accessed by content.
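The contrast can be sketched as follows, assuming a fully associative cache with one-word blocks so that each line's tag is simply the full address. The `CacheLine` type and the sample contents are hypothetical. Main memory is read directly by indexing with the address. The cache must instead compare the request against the stored tag of every line (hardware does these comparisons in parallel).

```python
from dataclasses import dataclass


@dataclass
class CacheLine:
    valid: bool
    tag: int   # with one-word blocks (assumption), the tag is the full address
    data: int


def cache_lookup(lines, address):
    """Search by content: compare the requested address with each line's tag."""
    for line in lines:
        if line.valid and line.tag == address:
            return line.data  # hit
    return None  # miss


main_memory = [0] * 16
main_memory[5] = 42                      # accessed by address: direct indexing
cache = [CacheLine(True, 5, 42), CacheLine(False, 0, 0)]

print(main_memory[5])          # by address
print(cache_lookup(cache, 5))  # by content: hit
print(cache_lookup(cache, 3))  # by content: miss
```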
CH 9 - 24. Given the following Omega network, which allows 8 CPUs (P0 through P7) to access 8 memory modules (M0 through M7): d) List a processor-to-memory access that is not blocked by the access P0 → M2 and is not listed in part (a).
P2 → M6 is a processor-to-memory access that is not blocked by the access P0 → M2.
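The answer can be checked with a sketch of destination-tag routing. This assumes the standard 8×8 Omega wiring: a perfect shuffle (rotate the 3-bit line number left by one) before each stage of 2×2 switches, with each switch routing on the next destination bit, most significant bit first. The figure in the question is not reproduced here, so this wiring is an assumption. Because a 2×2 switch can pass any two inputs to two different outputs, two accesses block each other exactly when they need the same link at the same stage.

```python
def omega_route(src, dst, n_bits=3):
    """Return the (stage, output_link) pairs used by src -> dst (assumed standard Omega wiring)."""
    mask = (1 << n_bits) - 1
    addr = src
    links = []
    for stage in range(n_bits):
        addr = ((addr << 1) | (addr >> (n_bits - 1))) & mask  # perfect shuffle
        bit = (dst >> (n_bits - 1 - stage)) & 1                # destination bit, MSB first
        addr = (addr & ~1) | bit                               # switch picks upper/lower output
        links.append((stage, addr))
    assert addr == dst
    return links


def blocked(access_a, access_b):
    """True if the two (src, dst) accesses need the same link at the same stage."""
    return bool(set(omega_route(*access_a)) & set(omega_route(*access_b)))


print(blocked((0, 2), (2, 6)))  # P2 -> M6 does not conflict with P0 -> M2
print(blocked((0, 2), (4, 0)))  # P4 -> M0 shares P0's first-stage link
```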
CH 7 - 33. Do CDs that store data use recording sessions?
Yes, CDs that store data use recording sessions. Originally, songs were recorded on audio CDs in sessions. When CDs began to be used for data storage, the idea of a music "recording session" was extended to include data recording sessions. Compact discs offer tamper-resistant storage for documents and data.
CH 7 - 20. What is zoned-bit recording?
Zoned-bit recording is a disk-drive technique that stores more sectors per track on the outer tracks than on the inner tracks. On disks without zoning, every track contains exactly the same number of sectors and each sector holds the same number of bytes. As a result, bits are packed more densely near the centre of the disk than at the outer edge, and the longer outer tracks hold no more data than the inner ones. To pack more data onto the disk, zoned-bit recording groups tracks into zones and places more sectors on the tracks of the outer zones than on those of the inner zones.
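The capacity gain can be shown with a small calculation. The zone layout below (three zones of 100 tracks holding 200, 160, and 120 sectors per track, outer to inner) is hypothetical. Without zoning, every track is limited to what the innermost, shortest track can hold. With zoning, each zone uses the sector count its tracks can actually support.

```python
def zoned_capacity(zones):
    """Total sectors when each zone uses its own sectors-per-track count.

    zones: list of (number_of_tracks, sectors_per_track).
    """
    return sum(tracks * sectors for tracks, sectors in zones)


def uniform_capacity(zones):
    """Total sectors when every track holds only as many sectors as the innermost track."""
    total_tracks = sum(tracks for tracks, _ in zones)
    innermost = min(sectors for _, sectors in zones)
    return total_tracks * innermost


zones = [(100, 200), (100, 160), (100, 120)]  # hypothetical geometry, outer to inner
print(uniform_capacity(zones))  # 300 tracks * 120 sectors
print(zoned_capacity(zones))    # 20000 + 16000 + 12000 sectors
```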
CH 5 - 28, b) The memory unit of a computer has 256K words of 32 bits each. The computer has an instruction format with four fields: an opcode field; a mode field to specify one of seven addressing modes; a register address field to specify one of 60 registers; and a memory address field. Assume an instruction is 32 bits long. Answer the following: b) How large must the register field be?
b) Since there are 60 registers in all, the register field needs Ceiling(log2(60)) = 6 bits: 5 bits can distinguish only 2^5 = 32 registers, while 6 bits can distinguish 2^6 = 64 ≥ 60. Since each register must be uniquely identified, the register field must be 6 bits wide.
CH 5 - 28, c) The memory unit of a computer has 256K words of 32 bits each. The computer has an instruction format with four fields: an opcode field; a mode field to specify one of seven addressing modes; a register address field to specify one of 60 registers; and a memory address field. Assume an instruction is 32 bits long. Answer the following: c) How large must the address field be?
c) We assume that the memory is word-addressable. The memory holds 256K = 2^18 words, so uniquely identifying each word requires Ceiling(log2(256K)) = 18 bits. The address field must therefore be 18 bits wide.
CH 5 - 14, c) Convert the following expressions from reverse Polish notation to infix notation. c) X Y Z + V W - × Z + +
c. X + ((Y + Z) * (V - W) + Z), which, because addition is associative, can be written as X + (Y + Z) * (V - W) + Z
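The conversion above can be done with a stack: operands are pushed, and each operator pops its right and then its left operand and pushes the parenthesized combination. The sketch below parenthesizes every operation, so it shows the exact grouping before any simplification. It assumes space-separated tokens and accepts both `*` and `×` for multiplication.

```python
OPERATORS = {"+", "-", "*", "/", "×"}


def rpn_to_infix(expression):
    """Convert a space-separated RPN expression to fully parenthesized infix."""
    stack = []
    for tok in expression.split():
        if tok in OPERATORS:
            right = stack.pop()  # top of stack is the right operand
            left = stack.pop()
            stack.append(f"({left} {tok} {right})")
        else:
            stack.append(tok)
    if len(stack) != 1:
        raise ValueError("malformed RPN expression")
    return stack[0]


print(rpn_to_infix("X Y Z + V W - × Z + +"))
```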
CH 5 - 12, c) Convert the following expressions from infix to reverse Polish (postfix) notation. c) (W × (X + Y × (U × V)))/(U × (X + Y))
c. W X Y U V * * + * U X Y + * /
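The infix-to-RPN direction can be automated with Dijkstra's shunting-yard algorithm. This sketch assumes single-character operands and only the left-associative operators `+ - * /` (treating `×` as `*`). Operands go straight to the output, operators wait on a stack until an operator of lower precedence or a parenthesis releases them, and `)` pops everything back to its matching `(`.

```python
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def to_postfix(expression):
    """Shunting-yard conversion of infix to space-separated RPN (single-character operands)."""
    output, stack = [], []
    for tok in expression.replace("×", "*"):
        if tok.isspace():
            continue
        if tok.isalnum():
            output.append(tok)  # operands go straight to the output
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()  # discard the matching "("
        else:
            # pop operators of equal or higher precedence (left associativity)
            while stack and stack[-1] != "(" and PRECEDENCE[stack[-1]] >= PRECEDENCE[tok]:
                output.append(stack.pop())
            stack.append(tok)
    while stack:
        output.append(stack.pop())
    return " ".join(output)


print(to_postfix("(W × (X + Y × (U × V)))/(U × (X + Y))"))
```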
CH 5 - 28, d) The memory unit of a computer has 256K words of 32 bits each. The computer has an instruction format with four fields: an opcode field; a mode field to specify one of seven addressing modes; a register address field to specify one of 60 registers; and a memory address field. Assume an instruction is 32 bits long. d) How large is the opcode field?
d) Since the total length of an instruction is 32 bits and since the mode, register and memory address fields take up 3, 6 and 18 bits respectively, i.e. 27 bits in all, there would be 32-27=5 bits left for the opcode.
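The field widths from parts b), c), and d) can be recomputed together. The helper `bits_needed` is a hypothetical name for the Ceiling(log2(n)) calculation, written with integer arithmetic to avoid floating-point rounding. Like part c), it assumes a word-addressable memory of 256K words.

```python
def bits_needed(n):
    """Smallest b with 2**b >= n, i.e. Ceiling(log2(n)), for n >= 2."""
    return (n - 1).bit_length()


INSTRUCTION_BITS = 32
mode_bits = bits_needed(7)               # seven addressing modes
register_bits = bits_needed(60)          # 60 registers
address_bits = bits_needed(256 * 1024)   # 256K words, word-addressable
opcode_bits = INSTRUCTION_BITS - (mode_bits + register_bits + address_bits)

print(mode_bits, register_bits, address_bits, opcode_bits)
```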
CH 7 - 23. Explain the differences between an SSD and a magnetic disk.
https://www.chegg.com/homework-help/essentials-of-computer-organization-and-architecture-4th-edition-chapter-7-problem-23ret-solution-9781284074482?trackid=7be415d1f6f7&strackid=dcbe52fdc9ce&ii=1
CH 7 - 26. How do enterprise SSDs differ from SSDs intended for laptop computers?
https://www.chegg.com/homework-help/essentials-of-computer-organization-and-architecture-4th-edition-chapter-7-problem-26ret-solution-9781284074482?trackid=e897ed615f32&strackid=4201bd35bec0&ii=1
CH 9 - 10. In what way does a VLIW design differ from a superpipelined design?
https://www.chegg.com/homework-help/essentials-of-computer-organization-and-architecture-4th-edition-chapter-9-problem-10ret-solution-9781284074482?trackid=c2f18ca88844&strackid=b2275d60b405&ii=1
CH 9 - 14. Draw pictures of the six principal interconnection network topologies.
https://www.chegg.com/homework-help/essentials-of-computer-organization-and-architecture-4th-edition-chapter-9-problem-14ret-solution-9781284074482?trackid=81fefaffdbf2&strackid=a2f9f66b1ecf&ii=1
CH 9 - 17. Describe grid computing and some applications for which it is suitable.
https://www.chegg.com/homework-help/essentials-of-computer-organization-and-architecture-5th-edition-chapter-9-problem-17ret-solution-9781284123043?trackid=1c2c2025d3c3&strackid=ee7558a193a5&ii=1
CH 9 - 20. What differentiates dataflow architectures from "traditional" computational architectures?
https://www.chegg.com/homework-help/essentials-of-computer-organization-and-architecture-5th-edition-chapter-9-problem-20ret-solution-9781284123036?trackid=b4042221c713&strackid=241e1f187b9a&ii=1
CH 9 - 24. Through what metaphor do systolic arrays get their name? Why is the metaphor fairly accurate?
https://www.chegg.com/homework-help/the-essentials-of-computer-organization-and-architecture-3rd-edition-chapter-9-problem-24retc-solution-9781449600068?trackid=0f44deb7e1e2&strackid=26644448cf6a&ii=1
CH 5 - 10. Why might stack architectures represent arithmetic expressions in reverse Polish notation?
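Stack architectures represent arithmetic expressions in reverse Polish notation because a stack organisation is used to evaluate arithmetic expressions. In RPN, each operand is pushed onto the stack, and each operator pops its operands from the top of the stack and pushes the result. The expression can therefore be evaluated in a single left-to-right pass without parentheses or precedence rules.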
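The push/pop evaluation described above can be sketched as a minimal RPN evaluator, assuming space-separated numeric tokens and the four basic operators. Every operator pops exactly two values, so no precedence table or parentheses handling is needed.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def eval_rpn(expression):
    """Evaluate a space-separated RPN expression using an explicit stack."""
    stack = []
    for tok in expression.split():
        if tok in OPS:
            b = stack.pop()  # right operand is on top
            a = stack.pop()
            stack.append(OPS[tok](a, b))
        else:
            stack.append(float(tok))  # operand: push
    if len(stack) != 1:
        raise ValueError("malformed RPN expression")
    return stack[0]


print(eval_rpn("2 3 4 * +"))            # 2 + 3 * 4
print(eval_rpn("5 1 2 + 4 * + 3 -"))    # 5 + (1 + 2) * 4 - 3
```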