Comp Org - More Theory
Why are immediate versions of instructions (such as ADDI) included in various instruction sets?
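Immediate instructions carry a constant operand inside the instruction itself, so the constant does not have to be fetched from memory with a separate LOAD instruction. Because constant operands occur frequently, this makes the common case faster and uses less energy.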
Which are the five classic components of a computer system and what is their function?
Input: brings information from the user into the computer (keyboard, microphone, scanner). Output: presents results to the user (monitor, printer, audio). Memory: storage that holds the programs and data needed while a program runs. Datapath: the part of the CPU that performs arithmetic and other operations on data. Control: the part of the CPU that directs the datapath, memory, and I/O according to the instructions of the program.
Explain the purpose of the branch-and-link (BL) instruction.
It branches to an address and simultaneously saves the address of the following instruction in register LR (X30). The link portion of the name means that an address or link is formed that points to the calling site to allow the procedure to return to the proper address. This "link," stored in register LR (register 30), is called the return address.
Personal computers (PCs)
A computer designed for use by a single user, usually at low cost, with a keyboard, mouse, and graphics display attached. It is general purpose and runs a wide variety of software.
Servers
A computer used to run programs for multiple users, often simultaneously, and typically accessed only over a network. Used for file storage, small business applications, and simple web serving. It offers high capacity and performance.
Performance via Prediction
It can be faster on average to guess and start working than to wait until you know for sure, assuming that the mechanism to recover from a misprediction is not too expensive and your prediction is relatively accurate.
Why is it more efficient to add two values stored in CPU registers than to add two values stored in the main memory? Do these operations require the same number of assembly instructions?
Adding two values that are already in CPU registers takes a single ADD instruction. Adding two values stored in main memory takes more instructions: two LOAD operations to bring the values into registers, then the ADD, and usually a STORE to write the result back. So the two cases do not require the same number of assembly instructions. Registers are also faster to access than memory, have higher throughput, and use less energy, which is why the compiler tries to keep the most frequently used variables in registers.
What is the role of linker in the program compilation procedure?
The linker is a systems program that combines independently assembled machine language programs and resolves all undefined labels, producing an executable file. It works in three steps: 1. Place code and data modules symbolically in memory, which determines the memory locations each module will occupy. 2. Determine the addresses of data and instruction labels. 3. Patch both the internal and external references: the linker finds the old addresses and replaces them with the new addresses. When a module is placed in memory, all absolute references, that is, memory addresses that are not relative to a register, must be relocated to reflect its true location. The resulting executable file can be run on a computer. Typically, this file has the same format as an object file, except that it contains no unresolved references.
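The three linker steps can be illustrated with a toy model. This is only a sketch under stated assumptions: the module layout (`size`, `defines`, `refs`), the function name `link`, and the base address `0x400000` are hypothetical and do not correspond to any real object-file format.

```python
# Toy model of the three linker steps. The dictionary layout and the
# base address are illustrative assumptions, not a real object format.
modules = [
    {"name": "main", "size": 0x100,
     "defines": {"main": 0x00},          # label -> offset inside module
     "refs": [("helper", 0x20)]},        # (label used, offset of the use)
    {"name": "lib", "size": 0x80,
     "defines": {"helper": 0x10},
     "refs": []},
]

def link(modules, base=0x400000):
    # Step 1: place the modules one after another in memory.
    placement = {}
    address = base
    for module in modules:
        placement[module["name"]] = address
        address += module["size"]

    # Step 2: determine the absolute address of every label.
    symbols = {}
    for module in modules:
        for label, offset in module["defines"].items():
            symbols[label] = placement[module["name"]] + offset

    # Step 3: patch each reference with the address of its label.
    patches = {}
    for module in modules:
        for label, offset in module["refs"]:
            if label not in symbols:
                raise KeyError(f"unresolved reference: {label}")
            patches[placement[module["name"]] + offset] = symbols[label]
    return symbols, patches

symbols, patches = link(modules)
print({name: hex(addr) for name, addr in symbols.items()})
print({hex(site): hex(target) for site, target in patches.items()})
```

A reference to a label that no module defines raises an error here, which mirrors the requirement that the executable contains no unresolved references.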
Explain the purpose of LDURB instruction, and how it differs from LDURSB.
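Load byte (LDURB) loads one byte from memory into the rightmost 8 bits of a register. It treats the byte as an unsigned number and thus zero-extends to fill the leftmost 56 bits of the register. Load byte signed (LDURSB) treats the byte as a signed integer and sign-extends it, copying the byte's sign bit into the leftmost 56 bits.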
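The difference between zero extension and sign extension can be sketched in a few lines. The helper names `zero_extend` and `sign_extend` are made up for this illustration, and the code models only the arithmetic on a 64-bit register, not the memory access itself.

```python
MASK64 = (1 << 64) - 1  # a LEGv8 register is 64 bits wide

def zero_extend(value: int, bits: int) -> int:
    """Keep the low `bits` bits and fill the rest with zeros."""
    return value & ((1 << bits) - 1)

def sign_extend(value: int, bits: int) -> int:
    """Keep the low `bits` bits and copy the sign bit into the rest."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    # Flipping and then subtracting the sign bit yields the signed value;
    # masking shows it as a 64-bit two's complement pattern.
    return ((value ^ sign_bit) - sign_bit) & MASK64

byte = 0xF0  # top bit set: 240 if unsigned, -16 if signed
print(hex(zero_extend(byte, 8)))  # 0xf0
print(hex(sign_extend(byte, 8)))  # 0xfffffffffffffff0
```

For a byte whose top bit is clear, such as 0x7F, both functions return the same value.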
Explain the phenomenon of power wall and its implications on microprocessor progress.
Lowering the voltage further appears to make the transistors too leaky. The power wall therefore comes from two facts: the voltage cannot be reduced much further, and the heat cannot be removed economically. To address the power problem, designers have already attached large devices to increase cooling, and they turn off parts of the chip that are not used in a given clock cycle. Power is a challenge for integrated circuits for two reasons. First, power must be brought in and distributed around the chip; modern microprocessors use hundreds of pins just for power and ground, and multiple levels of chip interconnect are used solely for power and ground distribution to portions of the chip. Second, power is dissipated as heat and must be removed. Server chips can burn more than 100 watts, and cooling the chip and the surrounding system is a major expense in warehouse-scale computers. As a consequence, clock rates have stopped climbing quickly, and designers have turned to multiple cores per chip to improve performance.
We could represent numbers as strings of ASCII digits instead of as integers. How much does storage increase if the number 1 billion is represented in ASCII versus a 32-bit integer?
One billion is 1,000,000,000, so it would take 10 ASCII digits, each 8 bits long. Thus the storage expansion would be (10 × 8)/32 or 2.5. Beyond the expansion in storage, the hardware to add, subtract, multiply, and divide such decimal numbers is difficult and would consume more energy. Such difficulties explain why computing professionals are raised to believe that binary is natural and that the occasional decimal computer is bizarre.
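The arithmetic above can be checked directly. This is a minimal sketch; `ascii_bits` is a hypothetical helper that assumes one 8-bit character per decimal digit and no sign or separator characters.

```python
def ascii_bits(n: int) -> int:
    """Bits needed to store a non-negative integer as 8-bit ASCII digits."""
    return len(str(n)) * 8

one_billion = 1_000_000_000
ratio = ascii_bits(one_billion) / 32   # compare with a 32-bit integer

print(len(str(one_billion)))  # 10 digits
print(ratio)                  # 2.5
```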
What are addressing modes? Describe at least three different addressing modes supported in LEGv8.
An addressing mode is one of several addressing regimes delimited by their varied use of operands and/or addresses. The addressing modes of the LEGv8 instructions are the following: 1. Immediate addressing, where the operand is a constant within the instruction itself. 2. Register addressing, where the operand is a register. 3. Base or displacement addressing, where the operand is at the memory location whose address is the sum of a register and a constant in the instruction. 4. PC-relative addressing, where the branch address is the sum of the program counter (PC) and a constant in the instruction.
State the advantages of a high-level programming language over assembly and machine languages.
The programmer thinks and writes code in a notation close to English and simple algebra, which is familiar. Languages can be designed for their intended purpose. Programs take less time to write and need fewer lines of code, so programmers are more productive (conciseness). Programs are also independent of the computer on which they were developed, because compilers and assemblers can translate them for any computer.
Supercomputers
They are servers with the highest performance and cost. They are used for scientific and engineering calculations and large-scale problems (for example, weather forecasting).
Make the common case fast
This tends to enhance performance more than optimizing the rare case does. Ironically, the common case is often simpler than the rare case and hence is usually easier to improve. This common-sense advice implies that you know what the common case is, which is only possible with careful experimentation and measurement.
Explain the difference between branch register (BR) and unconditional branch (B). Give one situation where BR is necessary.
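An unconditional branch (B) always jumps to a target that is fixed in the instruction itself as a constant offset from the PC. Branch register (BR) is also unconditional, but it jumps to the address held in a register, so the target can be decided at run time. BR is necessary to return from a procedure: because a procedure can be called from several points in a program, the return address is only known at run time, and BR LR branches to the address that BL saved in register LR. BR is also useful for case statements implemented with a table of addresses. The PC (program counter) is the register containing the address of the instruction in the program being executed.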
What are the steps followed by a program to execute a procedure call?
1. Put parameters in a place where the procedure can access them. 2. Transfer control to the procedure. 3. Acquire the storage resources needed for the procedure. 4. Perform the desired task. 5. Put the result value in a place where the calling program can access it. 6. Return control to the point of origin, since a procedure can be called from several points in a program. LEGv8 register conventions for calls: X0-X7 are eight parameter registers in which to pass parameters or return values; LR (X30) is the return address register used to return to the point of origin; X9-X17 are temporary registers that are not preserved by the callee (called procedure) on a procedure call; X19-X28 are saved registers that must be preserved on a procedure call (if used, the callee saves and restores them). Registers are saved on the stack: the stack pointer (SP), which is just one of the 32 registers, is adjusted by one doubleword for each register that is saved or restored. Stacks are so popular that they have their own buzzwords for transferring data to and from the stack: placing data onto the stack is called a push, and removing data from the stack is called a pop.
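Push and pop can be modeled in a few lines. This is a sketch under stated assumptions: the class name `Stack`, the dictionary used as memory, and the starting address are hypothetical. It assumes the stack grows toward lower addresses and that SP moves by one doubleword (8 bytes) per saved register.

```python
DOUBLEWORD = 8  # bytes per saved 64-bit register

class Stack:
    """Toy model of a downward-growing stack addressed by SP."""

    def __init__(self, top: int = 0x7FFFFFF000):  # hypothetical start address
        self.sp = top
        self.memory = {}  # address -> stored doubleword

    def push(self, value: int) -> None:
        # Make room first, then store the register value at the new SP.
        self.sp -= DOUBLEWORD
        self.memory[self.sp] = value

    def pop(self) -> int:
        # Read the value at SP, then release the space.
        value = self.memory.pop(self.sp)
        self.sp += DOUBLEWORD
        return value

stack = Stack(top=1000)
stack.push(11)         # for example, the callee saves X19
stack.push(22)         # then saves X20
print(stack.sp)        # 984: two doublewords below the start
print(stack.pop())     # 22: registers come back in reverse order
print(stack.pop())     # 11
print(stack.sp)        # 1000: SP is back where it started
```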
Embedded computer
A computer built into another device and used to run one predetermined application or collection of software.
Good design demands good compromises
A problem occurs when an instruction needs longer fields than the R-format provides. For example, the load register instruction must specify two registers and a constant. If the constant had to fit in one of the 5-bit register fields, the largest constant within the load register instruction would be limited to only 2^5 − 1, or 31. The compromise chosen by the LEGv8 designers is to keep all instructions the same length, thereby requiring distinct instruction formats for different kinds of instructions (R, D, and I formats).
Why are LDURB, STURB, LDURH, and STURH instructions preferred in some situations over the LDUR and STUR instructions?
A series of instructions can extract a byte from a doubleword, so load register and store register are sufficient for transferring bytes as well as words. Because of the popularity of text in some programs, however, LEGv8 provides instructions to move bytes directly, which replaces that series with a single instruction. Load byte (LDURB) loads a byte from memory, placing it in the rightmost 8 bits of a register. Store byte (STURB) takes a byte from the rightmost 8 bits of a register and writes it to memory. LEGv8 also has explicit instructions to load and store 16-bit quantities, called halfwords. Load half (LDURH) loads a halfword from memory, placing it in the rightmost 16 bits of a register. Like load byte, load half (LDURH) treats the halfword as an unsigned number and thus zero-extends to fill the 48 leftmost bits of the register. Store half (STURH) takes a halfword from the rightmost 16 bits of a register and writes it to memory.
Use abstraction to simplify design
Abstraction is one of the techniques that makes hardware and software designers more productive. It characterizes the design at different levels of representation: lower-level details are hidden to offer a simpler model at higher levels. E.g.: there are several layers between an application and the primitive instructions the hardware executes, and the instruction set architecture is the abstraction between the hardware and the lowest-level software.
Performance via Pipelining
An example of parallelism. The technique of allowing the steps in the machine cycle to overlap. In particular, while one instruction is being executed, the next instruction can be fetched, which means that more than one instruction can be in "the pipe" at any one time, each at a different stage of being processed.
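The benefit of overlapping stages can be estimated with a simple cycle count. This is an idealized sketch: it assumes every stage takes exactly one clock cycle and that there are no stalls or hazards. The function names are made up for this illustration.

```python
def cycles_unpipelined(instructions: int, stages: int) -> int:
    """Each instruction finishes all its stages before the next one starts."""
    return instructions * stages

def cycles_pipelined(instructions: int, stages: int) -> int:
    """The first instruction needs `stages` cycles to fill the pipe;
    after that, one instruction completes every cycle."""
    if instructions == 0:
        return 0
    return stages + (instructions - 1)

n, k = 1000, 5
print(cycles_unpipelined(n, k))  # 5000
print(cycles_pipelined(n, k))    # 1004
```

For long instruction sequences the speedup approaches the number of stages, here 5, although real pipelines fall short of that because of stalls.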
State the primary differences between CISC and RISC ISAs and provide at least one example ISA for each category.
CISC: complex, non-uniform machine language instructions that do more work in fewer lines of assembly. E.g.: Intel x86. RISC: simpler instructions and simpler hardware design that do less per instruction and need more lines of assembly. RISC ISAs typically have fixed instruction lengths, a load-store design (only load and store instructions access memory), limited addressing modes, and a limited set of operations. E.g.: ARM, MIPS.
Define CPU execution time, user CPU time and system CPU time.
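CPU execution time is the time the CPU actually spends computing for a task, not counting time spent waiting for I/O or running other programs. It is divided into two parts: user CPU time is the CPU time spent in the program itself, and system CPU time is the CPU time spent in the operating system performing tasks on behalf of the program.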
Performance via Parallelism
Getting more performance by computing many operations in parallel. The challenges of parallel execution include scheduling, load balancing, time for synchronization, and overhead for communication between the parties.
What is the difference between conditional and unconditional branches? Illustrate with an example.
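A conditional branch jumps to a labeled instruction only if its condition is true; otherwise execution continues with the next instruction. An unconditional branch always jumps to the labeled instruction, without checking any condition. E.g.: CBZ X9, Exit branches to Exit only when register X9 is zero, while B Exit always branches to Exit.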
Make the common case fast
Constant operands occur frequently, and by including constants inside arithmetic instructions, operations are much faster and use less energy than if constants were loaded from memory. The constant zero has another role, which is to simplify the instruction set by offering useful variations. For example, the move operation is just an add instruction where one operand is zero. Hence, LEGv8 dedicates a register XZR to be hard-wired to the value zero. (It corresponds to register number 31.) Using frequency to justify the inclusion of constants is another example of the great idea of making the common case fast.
Dependability via Redundancy
Computers need to be dependable as well as fast. Since any physical device can fail, systems are made dependable by including redundant components that can take over when a failure occurs and that help detect failures.
Compare and contrast characteristics of RAM and magnetic storage.
RAM is volatile: it loses its contents when the computer loses power. It holds programs and their data while they are running, and it is faster but more expensive per bit. Magnetic storage is nonvolatile: it keeps its contents without power, so it stores programs and data between runs, and it is slower but cheaper per bit.
Explain the technique of program interpretation through the use of virtual machines.
Rather than compile to the assembly language of a target computer, Java is compiled first to instructions that are easy to interpret: The Java bytecode instruction set. A software interpreter, called a Java Virtual Machine (JVM), can execute Java bytecodes. To preserve portability and improve execution speed, the next phase of Java's development was compilers that translated while the program was running. Such Just In Time compilers (JIT) typically profile the running program to find where the "hot" methods are and then compile them into the native instruction set on which the virtual machine is running. The compiled portion is saved for the next time the program is run, so that it can run faster each time it is run. This balance of interpretation and compilation evolves over time, so that frequently run Java programs suffer little of the overhead of interpretation.
How are LSL and LSR used as shortcuts to perform multiplication and division in LEGv8?
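Shifting left by i bits (LSL) gives the same result as multiplying by 2^i, as long as no significant bits are shifted out of the register. Shifting right by i bits (LSR) gives the same result as dividing an unsigned number by 2^i and discarding the remainder.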
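The equivalence between shifts and powers of two can be checked with a small model of a 64-bit register. The function names `lsl` and `lsr` mirror the instruction names but are just illustrative Python helpers operating on unsigned values.

```python
WIDTH = 64
MASK = (1 << WIDTH) - 1

def lsl(value: int, i: int) -> int:
    """Logical shift left: bits pushed past bit 63 are lost."""
    return (value << i) & MASK

def lsr(value: int, i: int) -> int:
    """Logical shift right: zeros enter from the left."""
    return (value & MASK) >> i

print(lsl(9, 4), 9 * 2**4)       # 144 144
print(lsr(144, 4), 144 // 2**4)  # 9 9
print(lsr(145, 4), 145 // 2**4)  # 9 9 (the remainder is discarded)
```

The shortcut breaks down when a left shift pushes significant bits out of the register: `lsl(1 << 63, 1)` is 0, not 2^64.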
What are pseudoinstructions? Give two examples of pseudoinstructions and show how they are translated to instructions supported by the ISA.
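Since assembly language is an interface to higher-level software, the assembler can also treat common variations of machine language instructions as if they were instructions in their own right. The hardware need not implement these instructions; however, their appearance in assembly language simplifies translation and programming. Such instructions are called pseudoinstructions. E.g.: the assembler accepts MOV X9, X10 (copy X10 into X9) and translates it to ORR X9, XZR, X10; it accepts CMP X9, X10 (compare two registers and set the condition flags) and translates it to SUBS XZR, X9, X10.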
Design for Moore's Law
Moore's Law states that integrated circuit resources double every 18 to 24 months. Because a design can take years to complete, architects must anticipate where the technology will be when the design finishes rather than design for where it starts.
What is Moore's Law and how does it apply to the computer industry?
Moore's Law states that integrated circuit resources double every 18 to 24 months. Because a design can take years to complete, architects must anticipate where the technology will be when the design finishes rather than design for where it starts.
Hierarchy of Memories
The fastest, smallest, and most expensive memory per bit is at the top of the hierarchy, and the slowest, largest, and cheapest per bit is at the bottom. Caches give the programmer the illusion that main memory is almost as fast as the top of the hierarchy and as large and cheap as the bottom.
What is the stored-program concept and why is it considered powerful in computer architecture?
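The stored-program concept is the idea that instructions and data of many types can be stored in memory as binary numbers and thus be easy to change. It is powerful because programs become data: a compiler or editor can operate on them like any other data, and the same memory can hold a program, its source code, and the data it works on. Programs can also be distributed as binary files, so the same program runs on any computer with a compatible instruction set.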
What are dynamically loaded libraries? What are their advantages?
With traditional static linking, the library routines become part of the executable code, which has several disadvantages. If a new version of the library is released that fixes bugs or supports new hardware devices, the statically linked program keeps using the old version. The executable also includes all routines in the library that are called anywhere in the program, even if those calls are never executed. The library can be large relative to the program; for example, the standard C library is 2.5 MB. These disadvantages lead to dynamically linked libraries (DLLs), where the library routines are not linked and loaded until the program is run. Both the program and library routines keep extra information on the location of nonlocal procedures and their names. In the original version of DLLs, the loader ran a dynamic linker, using the extra information in the file to find the appropriate libraries and to update all external references. In summary, DLLs require additional space for the information needed for dynamic linking, but do not require that whole libraries be copied or linked. They pay a good deal of overhead the first time a routine is called, but only a single indirect branch thereafter. Note that the return from the library pays no extra overhead. Microsoft's Windows relies extensively on dynamically linked libraries, and it is also the default when executing programs on UNIX systems today.
How does the choice of a particular algorithm affect program performance?
The algorithm determines the number of source program instructions executed and hence the number of processor instructions executed; other things being equal, the more instructions that must be executed, the longer the execution time and the worse the performance. The algorithm may also affect the CPI (clock cycles per instruction) by favoring slower or faster instructions. For example, if the algorithm uses more divides, it will tend to have a higher CPI.
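Both effects show up in the classic CPU performance equation, CPU time = instruction count × CPI / clock rate. The sketch below uses made-up numbers for two hypothetical algorithms to show that the one executing fewer instructions is not automatically faster.

```python
def cpu_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time in seconds = instruction count * CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz

CLOCK = 2e9  # 2 GHz, an assumed clock rate

# Hypothetical algorithm A: more instructions, all of them cheap.
time_a = cpu_time(2.0e9, 1.0, CLOCK)
# Hypothetical algorithm B: fewer instructions, but more divides raise the CPI.
time_b = cpu_time(1.5e9, 1.6, CLOCK)

print(f"A: {time_a:.2f} s")  # A: 1.00 s
print(f"B: {time_b:.2f} s")  # B: 1.20 s
```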
Simplicity favors regularity
The natural number of operands for an operation like addition is three: the two numbers being added together and a place to put the sum. Requiring every instruction to have exactly three operands, no more and no less, conforms to the philosophy of keeping the hardware simple: hardware for a variable number of operands is more complicated than hardware for a fixed number.
What is the difference between response time and throughput? Illustrate with an example.
Response time (execution time) is the total time the computer needs to finish a certain task. Throughput (bandwidth) is the amount of work done in a given time. E.g.: If you were running a program on two different desktop computers, you'd say that the faster one is the desktop computer that gets the job done first. If you were running a datacenter that had several servers running jobs submitted by many users, you'd say that the faster computer was the one that completed the most jobs during a day.
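A small calculation shows how the two measures can rank machines differently. The numbers and the helper name `jobs_per_minute` are invented for illustration, and the model assumes jobs run fully in parallel without slowing each other down.

```python
def jobs_per_minute(seconds_per_job: float, jobs_in_parallel: int) -> float:
    """Throughput of a machine that runs several jobs at once."""
    return jobs_in_parallel * 60 / seconds_per_job

# Hypothetical desktop: 10 s response time, one job at a time.
desktop = jobs_per_minute(seconds_per_job=10, jobs_in_parallel=1)
# Hypothetical server: 15 s response time, four jobs at a time.
server = jobs_per_minute(seconds_per_job=15, jobs_in_parallel=4)

print(desktop)  # 6.0
print(server)   # 16.0
```

The desktop wins on response time (10 s against 15 s per job), while the server wins on throughput.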
Explain how the LEGv8 instruction format and field sizes limit: the maximum number of CPU general purpose registers.
Each register operand is encoded in a 5-bit field, so an instruction can name at most 2^5 = 32 registers; the three operands of LEGv8 arithmetic instructions must each be chosen from one of these 32 64-bit registers. Using more than 32 registers would take more bits per operand in the fixed-length instruction format. A very large number of registers may also increase the clock cycle time, simply because electronic signals take longer when they must travel farther.
Smaller is faster
LEGv8 restricts the three operands of its arithmetic instructions to one of the 32 64-bit registers. A very large number of registers may increase the clock cycle time, simply because electronic signals take longer when they must travel farther. Another reason for not using more than 32 is the number of bits it would take in the instruction format.
Define the term "word" and explain why it is significant in computer architecture.
A word is the natural unit of access in a computer. In LEGv8 a word is 32 bits, which is also the length of every instruction. A doubleword (64 bits) is the size of a register in the LEGv8 architecture.
Why is the use of MIPS (Millions of Instructions Per Second) not a reliable measure of computer performance?
There are three problems with using MIPS as a measure for comparing computers. First, MIPS specifies the instruction execution rate but does not take into account the capabilities of the instructions. We cannot compare computers with different instruction sets using MIPS, since the instruction counts will certainly differ. Second, MIPS varies between programs on the same computer; thus, a computer cannot have a single MIPS rating. Finally, and most importantly, if a new program executes more instructions but each instruction is faster, MIPS can vary independently from performance.
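The last point can be illustrated with MIPS = instruction count / (execution time × 10^6). The two program versions below use invented numbers; the sketch shows a version with the higher MIPS rating that is nevertheless slower.

```python
def mips(instruction_count: float, seconds: float) -> float:
    """Millions of instructions per second."""
    return instruction_count / (seconds * 1e6)

# Hypothetical old version: 1 billion instructions, finishes in 1.0 s.
old_mips = mips(1e9, 1.0)
# Hypothetical new version: 3 billion simpler instructions, finishes in 1.5 s.
new_mips = mips(3e9, 1.5)

print(f"old: {old_mips:.0f} MIPS in 1.0 s")  # old: 1000 MIPS in 1.0 s
print(f"new: {new_mips:.0f} MIPS in 1.5 s")  # new: 2000 MIPS in 1.5 s
```

The new version doubles the MIPS rating while taking 50% longer to finish, so MIPS and performance move in opposite directions here.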
Explain how the LEGv8 instruction format and field sizes limit: the maximum value of immediate constants.
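Immediate instructions such as ADDI use the I-format, whose immediate field is 12 bits wide, so a constant must fit in 12 bits to be encoded in a single instruction; larger constants have to be built up with additional instructions. While the D-format could have been used, since it has a 9-bit field holding a constant, the ARMv8 architects decided it would be useful to have a larger immediate field for these instructions, even shaving a bit from the opcode field to make a 12-bit immediate.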
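The effect of field width on the values an instruction can encode is easy to tabulate. This is a generic sketch: the helper names are made up, and whether a particular LEGv8 field is interpreted as unsigned or as two's complement is an assumption to check against the instruction reference.

```python
def unsigned_range(bits: int) -> tuple[int, int]:
    """Smallest and largest value of an unsigned field."""
    return 0, (1 << bits) - 1

def signed_range(bits: int) -> tuple[int, int]:
    """Smallest and largest value of a two's complement field."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1

print(unsigned_range(5))   # (0, 31): a 5-bit register field names 32 registers
print(unsigned_range(12))  # (0, 4095): a 12-bit immediate read as unsigned
print(signed_range(9))     # (-256, 255): a 9-bit field read as two's complement
```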