CS 2231 Comp Org Final

What is the basic execution loop for a 3 stage pipeline?

- Fetch - instruction is brought from memory or cache to the processor. Program counter is incremented.
- Decode - instruction is decoded
- Execute - instruction is executed

What is the basic execution loop for a 5 stage pipeline?

- Fetch - processor reads the instruction from instruction memory.
- Decode - processor decodes the instruction and reads operands from the register file.
- Execute - processor performs the operation with the ALU.
- Memory - processor reads or writes data from/to memory.
- Writeback - processor updates the register file when applicable.

What are the main computer components?

- Input
- Output
- Memory
- Datapath
- Control
The datapath and control are together known as the CPU.

What are the common addressing modes for ARM and how do you use them?

- Register to register: ex. MOV R0, R1
- Immediate: ex. MOV R0, #15
- Register Indirect Addressing: ex. LDR R2, [R0]
- Register Indirect Addressing with an Offset: ex. LDR R0, [R1, #20]

How is performance evaluated and why are benchmarks used?

- Short response time for a given piece of work
- High throughput (rate of processing work)
- Low utilization of computing resource(s)
- High availability of the computing system or application
- Fast (or highly compact) data compression and decompression
- High bandwidth
- Short data transmission time
Benchmarks are used to provide a more objective scale, because evaluating any single category on its own can be deceptive.

How does virtual memory work?

- Virtual memory maps memory addresses used by a program, called virtual addresses, into physical addresses in computer memory (e.g. into RAM or onto the disk) - The operating system manages virtual address spaces and the assignment of real memory to virtual memory. Address translation hardware in the CPU, often referred to as a memory management unit or MMU, automatically translates virtual addresses to physical addresses.
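
The translation step described above can be sketched in Python. This is a minimal illustration, not how any particular MMU works: it assumes 4 KiB pages and a hypothetical single-level page table held in a dict, and the names PAGE_SIZE, page_table, and translate are made up for the example. Real hardware typically uses multi-level tables and a TLB.

```python
PAGE_SIZE = 4096  # assumed page size (4 KiB)

# Hypothetical page table: virtual page number -> physical frame number.
# A missing entry stands for a page that is not currently in RAM.
page_table = {0: 5, 1: 9, 2: 1}


def translate(vaddr, table=page_table):
    """Translate a virtual address to a physical address, or raise on a page fault."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)  # split into page number and offset
    if vpn not in table:
        raise LookupError(f"page fault at virtual page {vpn}")
    return table[vpn] * PAGE_SIZE + offset  # same offset, different page frame
```

The offset within the page is unchanged by translation; only the page number is replaced by a frame number.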

How does the stack work? What do push and pop do? Why do we have the stack?

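- the stack manages memory in last-in, first-out order
- push stores a value in memory at the top of the stack and moves the stack pointer
- pop retrieves the most recently pushed value from memory and moves the stack pointer back
- we have the stack in order to save more data than fits in the temporary registers (r0-r3) between function calls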
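
A toy Python model of push and pop, assuming a full-descending stack (the stack pointer points at the last pushed word and moves toward lower addresses) with 4-byte words. The class name, the initial address 0x2000, and the dict used as memory are assumptions for illustration only.

```python
class Stack:
    """Toy full-descending stack: sp points at the last pushed word."""

    def __init__(self, top=0x2000):
        self.memory = {}  # address -> 32-bit word
        self.sp = top     # assumed initial stack pointer

    def push(self, value):
        self.sp -= 4                               # move sp down first
        self.memory[self.sp] = value & 0xFFFFFFFF  # then store the word

    def pop(self):
        value = self.memory.pop(self.sp)  # read the word at sp
        self.sp += 4                      # then move sp back up
        return value
```

Values come back in the reverse of the order they were pushed, which is why a callee that pushes registers on entry pops them in the opposite order before returning.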

Assembly instructions to have in r0 the value 'B' 'C' 'D' 'E' in this order

        .DATA
w1:     .ASCIZ  "ABCDEFGH"
        .TEXT
start:  ldr   r2, =w1         @ pointer to the 8-byte string
        add   r2, r2, #1      @ point at the 2nd byte ('B')
        mov   r3, #4          @ do the following for the next 4 bytes
loop:   mov   r0, r0, lsl #8  @ make room for the next byte
        ldrb  r1, [r2], #1    @ get the next byte and advance the pointer
        orr   r0, r0, r1      @ merge the byte into the result in r0
        subs  r3, r3, #1      @ one byte fewer to go
        bne   loop            @ repeat until all 4 bytes are packed
        .END
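
The effect of the loop can be checked with a short Python sketch that mirrors it step by step: shift the accumulator left by 8 bits, then OR in the next byte. The function name and parameters are hypothetical.

```python
def pack_four_bytes(data, start=1, count=4):
    """Mirror the assembly loop: shift r0 left by 8 bits, then OR in the next byte."""
    r0 = 0
    for byte in data[start:start + count]:
        r0 = ((r0 << 8) | byte) & 0xFFFFFFFF  # keep r0 within 32 bits
    return r0
```

Called on b"ABCDEFGH", this packs the bytes 'B', 'C', 'D', 'E' (0x42, 0x43, 0x44, 0x45) with 'B' in the most significant byte.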

How does the assembler deal with ldr rd,=immediate value?

1. If the immediate value can be constructed with a mov or mvn instruction (and the barrel shifter), then the assembler generates the appropriate instruction.
2. Else the assembler places the value in a "literal pool" (portion of memory to hold constant values) and generates a ldr instruction with a PC-relative address that reads the constant from the literal pool.
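
The decision can be sketched in Python, assuming the classic 32-bit ARM data-processing encoding, where an immediate is an 8-bit value rotated right by an even number of bits. The function names are hypothetical, and a real assembler may apply further rules (for example in Thumb mode).

```python
MASK32 = 0xFFFFFFFF


def is_arm_immediate(value):
    """True if value is an 8-bit constant rotated right by an even amount."""
    value &= MASK32
    for rot in range(0, 32, 2):
        # Rotating left by rot undoes a rotate-right by rot.
        rotated = ((value << rot) | (value >> (32 - rot))) & MASK32
        if rotated < 256:
            return True
    return False


def choose_encoding(value):
    """Return which option the assembler would pick for ldr rd, =value."""
    if is_arm_immediate(value):
        return "mov"
    if is_arm_immediate(~value):  # mvn loads the bitwise complement
        return "mvn"
    return "literal pool"
```

For instance, 0xFF000000 fits a mov, 0xFFFFFFFF fits a mvn (its complement is 0), and 0x12345678 has to go to the literal pool.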

How is a 32-bit processor different from a 64-bit processor?

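32-bit processors have 32-bit-wide registers and work with data and addresses that are 32 bits wide, while 64-bit processors have 64-bit-wide registers and work with 64-bit-wide data and addresses. A 64-bit processor can therefore address far more memory than the 2^32 bytes (4 GB) reachable with a 32-bit address.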

What is a cache line?

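A fixed-size block of adjacent memory cells held inside a cache. It is the unit of data transferred between the cache and main memory.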

What is translation (compilation)?

A compiler is a computer program (or a set of programs) that transforms source code written in a programming language (the source language) into another computer language (the target language), with the latter often having a binary form known as object code.

What is a cache?

A component that stores data so future requests for that data can be served faster; the data stored in a cache might be the result of an earlier computation, or the duplicate of data stored elsewhere. A cache hit occurs when the requested data can be found in a cache, while a cache miss occurs when it cannot.

How can you see if a floating point representation IEEE754 is normalized or not?

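A floating-point number is normalized if its exponent field is not zero (and not all ones, which encodes infinity or NaN). An exponent field of all zeros means the number is zero or denormalized.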
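
A minimal Python sketch of this check for single precision (1 sign bit, 8 exponent bits, 23 fraction bits), using the standard struct module to get at the raw bits. The function name is hypothetical.

```python
import struct


def classify_float32(x):
    """Classify x by the exponent and fraction fields of its IEEE 754 single-precision encoding."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]  # raw 32-bit pattern
    exponent = (bits >> 23) & 0xFF   # 8-bit exponent field
    fraction = bits & 0x7FFFFF       # 23-bit fraction field
    if exponent == 0:
        return "zero" if fraction == 0 else "denormalized"
    if exponent == 0xFF:
        return "infinity" if fraction == 0 else "NaN"
    return "normalized"
```

Only the exponent field decides whether a finite, non-zero value is normalized; the fraction is needed just to tell zero from denormalized and infinity from NaN.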

What is a register?

A special, high-speed storage area. All data must be represented in a register before it can be processed.

What are stalls?

A stall is a delay in execution of an instruction

What are the ALU, CPU, etc.?

ALU: arithmetic logic unit
- digital circuit used to perform arithmetic and logic operations
- it is the fundamental building block of the CPU of a computer
CPU: central processing unit
- the brains of the computer, where most calculations take place
- contains the ALU and the CU
CU: control unit
- directs operation of the processor
- tells the computer's memory, arithmetic/logic unit and input and output devices how to respond to a program's instructions
FPU: floating point unit
- where floating point values are calculated and stored
MMU: memory management unit
- translates virtual memory addresses to physical addresses

What is an ISA?

An instruction set architecture is an interface between software and hardware

What is interpretation?

An interpreter is a computer program that directly executes, i.e. performs, instructions written in a programming or scripting language, without previously compiling them into a machine language program.

What is a Von Neumann architecture?

Any stored-program computer in which an instruction fetch and a data operation cannot occur at the same time because they share a common bus.

How are assembly programs converted (translated) to machine language (binary) that the processor can understand? Why are they converted this way? Hint: why are multiple steps needed?

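The assembler works in 2 passes:
Pass 1: builds the symbol table
Pass 2: creates machine instructions, using the symbol table to calculate addresses.
Multiple passes are needed because an instruction can refer to a label that is defined later in the program, so its address is not known until the whole program has been scanned once.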
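
The two passes can be illustrated with a toy Python resolver. It is a sketch under strong assumptions: every instruction occupies 4 bytes, a label is a line ending in a colon, and "assembling" just means replacing a label operand with its address. The function name and the input format are hypothetical.

```python
def assemble(lines):
    """Resolve labels in a toy program where each instruction is assumed to take 4 bytes."""
    # Pass 1: build the symbol table (label -> address).
    symbols, address = {}, 0
    for line in lines:
        if line.endswith(":"):
            symbols[line[:-1]] = address
        else:
            address += 4
    # Pass 2: emit instructions, replacing label operands with addresses.
    output = []
    for line in lines:
        if line.endswith(":"):
            continue
        op, _, operand = line.partition(" ")
        if operand in symbols:
            operand = hex(symbols[operand])
        output.append(f"{op} {operand}".strip())
    return symbols, output
```

With the input ["b end", "mov r0, #1", "end:", "mov r0, #2"], the branch on the first line refers to a label that only appears two lines later, which is exactly the forward reference a single pass could not resolve.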

Explain how the following ARM program can be used to determine if a computer is big-endian or little-endian: mov r0, #0x1000 ldr r1, = 0xABCD876 str r1, [r0] ldrb r2, [r0, #1]

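Run the program and inspect r2. The program stores the 32-bit word in r1 at address 0x1000, then loads the single byte at address 0x1001 into r2. On a little-endian machine, the least significant byte goes in the smallest address. On a big-endian machine, the most significant byte goes in the smallest address. Taking the constant as written (0x0ABCD876 as a 32-bit word), memory from 0x1000 holds 76 D8 BC 0A on a little-endian machine, so r2 = 0xD8, and 0A BC D8 76 on a big-endian machine, so r2 = 0xBC.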
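
The byte layout can be reproduced in Python with int.to_bytes. This sketch assumes the constant is taken exactly as written in the question (0xABCD876, i.e. 0x0ABCD876 as a 32-bit word); the function name is hypothetical.

```python
VALUE = 0x0ABCD876  # constant from the question, padded to 32 bits


def byte_at_offset_1(byteorder):
    """Return the byte that ldrb r2, [r0, #1] would read for the given byte order."""
    memory = VALUE.to_bytes(4, byteorder)  # what str r1, [r0] leaves in memory
    return memory[1]                       # the byte at address r0 + 1
```

Passing "little" or "big" as byteorder gives the two possible values of r2, so comparing r2 against them identifies the machine.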

What are the pros/cons for the different architectures of a cache?

Direct-mapped cache (simplest, but worst miss rate):
- each location in main memory can go in only one entry in the cache
- if two locations map to the same entry, they may continually knock each other out
- needs to be much larger than an associative cache to give comparable performance, and its behavior is less predictable
Set-associative cache:
- hybrid between a fully associative cache and a direct-mapped cache
- slots are grouped into sets, and each address maps to one set
- an N-way set-associative scheme gains flexibility by allowing a block to go in any of the N cache lines of its set
Fully associative cache:
- best miss rates
- only practical for a small number of entries
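
The "knocking each other out" effect can be seen in a small Python simulation. It is a sketch that assumes LRU replacement and takes block addresses directly (no byte offsets); the function name and parameters are hypothetical. Setting ways=1 gives a direct-mapped cache, and num_sets=1 gives a fully associative one.

```python
from collections import OrderedDict


def count_misses(block_addresses, num_sets, ways):
    """Count misses for a cache with num_sets sets of `ways` lines each, using LRU."""
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for block in block_addresses:
        lines = sets[block % num_sets]  # the set this block maps to
        if block in lines:
            lines.move_to_end(block)    # hit: mark as most recently used
        else:
            misses += 1
            if len(lines) == ways:      # set is full: evict the LRU line
                lines.popitem(last=False)
            lines[block] = True
    return misses
```

For the access pattern 0, 4, 0, 4, 0, 4 and four lines in total, the direct-mapped organization (4 sets, 1 way) misses on every access, while the 2-way (2 sets) and fully associative (1 set, 4 ways) organizations miss only on the first access to each block.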

What is DRAM and how does it store data?

Dynamic Random Access Memory - stores data as a charge in a capacitor - bits are organized as a rectangular array

You want to calculate the total number of bits required to implement a 32KB cache with 2 words blocks. How many lines of data are in the cache?

Each line of data is two 64-bit words = 2 x 64 bits = 16 bytes = 2^4 bytes
The cache (data part) is 32KB = 2^15 bytes
There are 2^15 / 2^4 = 2^11 lines of data in the cache

How do computers compute?

Fetch, decode, execute, memory, write back (5 stage pipeline)

What are the techniques to prevent or limit stalling?

- Forwarding - not waiting for the result to be put into the register, but connecting the value as soon as it can possibly be used via extra connections in the datapath
- Rearranging the code (compilers do this)

What are hazards?

Hazards are problems with the instruction pipeline in CPU microarchitectures when the next instruction cannot execute in the following clock cycle, and can potentially lead to incorrect computation results. Three common types of hazards are data hazards, structural hazards, and control flow hazards (branching hazards).

For a processor with at least one level of cache, does it take more or less time to access data that is in the cache than to access that same data from main memory? Why? About how much is the difference in access time?

If a processor has one level of cache, it takes approximately 1 cycle to get data from the cache to the processor, compared to 50-100 cycles to get it from DRAM. The cache is faster because the technology is not the same (static RAM for the cache versus dynamic RAM for main memory).

If it takes 1 second for a processor to execute an instruction that does not require memory access, can you give an order of magnitude for how much it takes if it needs a DRAM access? How long - roughly - does it take to access the disk?

If it takes 1 second for a processor to access the register file, it will take a few seconds to access data in the cache, a few minutes to access data in memory, and several months to access data on disk.

Why do we use floating point numbers?

Infinite precision is not possible because memory is limited. Floating point notation (a significand scaled by an exponent, as in scientific notation) covers a much wider range of magnitudes than a fixed-point value of the same size, and gives accuracy where it is needed in the number.

What is the formula for mean access time of the cache?

Mean access time: cache access time + (1 - hit ratio) * main mem. access time = C + (1 - H) * M
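
The formula as a one-line Python function, with C, H, and M as defined above. The numbers used in the usage note are made up for illustration, not measurements.

```python
def mean_access_time(cache_time, hit_ratio, memory_time):
    """Mean access time = C + (1 - H) * M."""
    return cache_time + (1 - hit_ratio) * memory_time
```

With an assumed 1-cycle cache, a 75% hit ratio, and a 100-cycle main memory, the mean access time is 1 + 0.25 * 100 = 26 cycles.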

What is RISC vs. CISC?

RISC: reduced instruction set computing
- Emphasis on software
- Single-clock, reduced instructions only
- Register to register: "LOAD" and "STORE" are independent instructions
- Generally favored over CISC today, because RAM has become cheaper and the emphasis on software has proven effective
CISC: complex instruction set computing
- Emphasis on hardware
- Includes multi-clock complex instructions

How does a cache work?

Recently accessed data can be stored in the cache in order to increase speed of data retrieval.

Where are registers located and how many are there?

Registers are in the CPU. The number of registers depends on the architecture:
- ARM 32-bit has 16 registers
- x86-64 has 16 general-purpose registers
- ARM 64-bit has 31 general-purpose registers
- MIPS has 32 registers, of which register 0 always reads as zero

What is the difference between a register and memory?

Registers are temporary storage in the CPU that hold the data the processor is currently working on, while memory holds the program instructions and the data the program requires.

You want to calculate the total number of bits required to implement a 32KB cache with 2 words blocks. In a 64 bit address, how many bits are used for the offset in a word?

Since the memory is byte addressable, we need to be able to access one byte in a word. Each word contains 8 = 2^3 bytes - so we need 3 bits as byte offset within a word. To say it another way, when we look at the last 3 bits of an address, if the bits are 000 this is the address of a word (and the address of the byte 0 of this word), if the bits are 001 this is the address of the second byte of this word...etc. if the bits are 111 this is the address of the 8th and last byte of this word.

What are spatial locality and temporal locality and how are these exploited by caches?

Spatial locality: the use of data elements within relatively close storage locations.
Temporal locality: the reuse of specific data and/or resources within a relatively small time duration.
Caches exploit spatial locality by loading a whole cache line at a time, so the neighbors of an accessed address are already in the cache when they are needed. They exploit temporal locality by keeping recently used data and instructions in the cache, since these are expected to be used again soon.

How do stalls relate to the pipeline?

Stalls allow an instruction pipeline to resolve a hazard.

What is a TLB and what is it used for?

TLB: Translation lookaside buffer
- used to reduce the time taken to access a user memory location
- stores the recent translations of virtual memory to physical memory and can be called an address-translation cache
- may reside between the CPU and the CPU cache, between the CPU cache and the main memory, or between the different levels of a multi-level cache

What is the dirty bit?

The bit that is associated with a block of computer memory and indicates whether or not the corresponding block of memory has been modified. The dirty bit is set when the processor writes to (modifies) this memory. The bit indicates that its associated block of memory has been modified and has not yet been saved to storage. When a block of memory is to be replaced, its corresponding dirty bit is checked to see if the block needs to be written back to secondary memory before being replaced or if it can simply be removed.

You want to calculate the total number of bits required to implement a 32KB cache with 2 words blocks. In a 64 bit address, how many bits are used for the block index?

The block index is used to identify the line within the cache. Since there are 2^11 lines, we need 11 bits to indicate the block index.

What is the architectural type of the ARM processor? Does this mean program and data memory share the same address space that can be modified during program execution?

Early ARM processors are Von Neumann architectures, with data and instructions in the same memory address space (and competing for resources like the bus). Modern ones follow the Harvard architecture, with separate paths for data and instructions, typically in the form of separate instruction and data caches.

You want to calculate the total number of bits required to implement a 32KB cache with 2 words blocks. In a 64 bit address, how many bits are used for the tag?

The remaining bits of an address are used for the tag: 64 - 11 - 1 - 3 = 49 bits
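
The field layout worked out in these cards (3 byte-offset bits, 1 block-offset bit, 11 index bits, and the remaining 49 bits as tag) can be sketched as a Python function that splits a 64-bit address. The function and constant names are hypothetical.

```python
BYTE_BITS = 3    # byte within a 64-bit word
WORD_BITS = 1    # word within a 2-word block
INDEX_BITS = 11  # one of 2^11 cache lines


def split_address(addr):
    """Split a 64-bit address into (tag, index, word_offset, byte_offset)."""
    byte_offset = addr & ((1 << BYTE_BITS) - 1)
    word_offset = (addr >> BYTE_BITS) & ((1 << WORD_BITS) - 1)
    index = (addr >> (BYTE_BITS + WORD_BITS)) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (BYTE_BITS + WORD_BITS + INDEX_BITS)
    return tag, index, word_offset, byte_offset
```

The fields are read from the least significant end of the address: byte offset first, then the offset within the block, then the index, with everything left over forming the tag.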

You want to calculate the total number of bits required to implement a 32KB cache with 2 words blocks. What is the total bit size needed for the cache?

The volume of metadata is [49 (tag) + 1 (valid bit)] x 2^11 = 50 x 2^11 bits = 6.25 x 2^11 bytes = 12800 bytes
Total volume for cache = 2^15 (data) + 6.25 x 2^11 (metadata) = 32768 + 12800 bytes = 45568 bytes = 364544 bits
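
The whole calculation can be replayed in Python as a check. The function name and parameter names are hypothetical; the defaults are the figures from the question (32 KB of data, 2-word blocks, 64-bit words and addresses, one valid bit per line).

```python
def cache_size_bits(data_bytes=32 * 1024, words_per_block=2,
                    word_bytes=8, address_bits=64):
    """Return (lines, tag_bits, total_bits) for a direct-mapped cache."""
    block_bytes = words_per_block * word_bytes      # 16 bytes per line
    lines = data_bytes // block_bytes               # 2^11 lines
    index_bits = lines.bit_length() - 1             # log2(lines)
    offset_bits = block_bytes.bit_length() - 1      # block + byte offset bits
    tag_bits = address_bits - index_bits - offset_bits
    bits_per_line = block_bytes * 8 + tag_bits + 1  # data + tag + valid bit
    return lines, tag_bits, lines * bits_per_line
```

With the defaults this gives 2048 lines, a 49-bit tag, and 364544 bits in total, which is 45568 bytes.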

How are hazards addressed in some ways?

There are several methods used to deal with hazards, including pipeline stalls/pipeline bubbling, operand forwarding, and in the case of out-of-order execution, the scoreboarding method and the Tomasulo algorithm.

How are translation and interpretation different?

Translation understands code in one language and simply produces code in a target language. It does no execution of instructions. Interpretation is the actual execution of commands.

What is the memory hierarchy and why does it exist?

We can have superfast expensive memory. We can also have slow cheap memory. We want lots of superfast cheap memory. We can get close to that goal using a memory hierarchy.
- At the top, a little bit of superfast expensive memory: CPU registers.
- Each next level is somewhat slower, and somewhat cheaper.
Top level - smallest and fastest
1. registers
2. cache
3. main memory
4. magnetic / solid state disk
5. tape / optical disk
Bottom level - biggest and slowest

You want to calculate the total number of bits required to implement a 32KB cache with 2 words blocks. In a 64 bit address, how many bits are used for the block offset?

Each block holds 2 words, so one bit is enough to tell us which word in the block is referred to.

Why do we use translators and/or interpreters?

We use translators in order to be able to write code in high-level languages. We use interpreters to execute command languages since each operator executed in command language is usually an invocation of a complex routine such as an editor or compiler.

If a multiply instruction is not available, how can it be created using loops and addition?

Start with a result of zero, then loop, adding the first number (the multiplicand) to the result on each pass, until the addition has been executed as many times as the second number (the multiplier).
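
A minimal Python sketch of the idea, assuming both operands are non-negative integers. The function name is hypothetical.

```python
def multiply(a, b):
    """Multiply non-negative integers by adding a to an accumulator b times."""
    result = 0
    for _ in range(b):  # loop once per unit of the multiplier
        result += a     # the only arithmetic used is addition
    return result
```

The loop runs b times, so looping over the smaller of the two operands keeps the number of additions down.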

What are the conventions when calling a subroutine for registers r0-r15?

- r0-r3: Should not expect their values to be saved. Could expect return values.
- r4-r12: Caller expects them to be left intact. Callee pushes to the stack those used in the subroutine and pops them when returning.
- r13 (sp): Caller expects it to be used wisely. Callee should use it wisely (e.g. pop after push).
- r14 (lr): Caller does nothing; the return address is set automatically. Callee uses good programming habits to push/pop it on the stack.
- r15 (pc): Caller expects it to be used wisely. Callee uses it for returning.

What is the minimum number of simultaneous supported reads and writes needed for a register file to work with the ARM ISA ?

reads: 2 writes: 1

