iCS Qs

how many cycles will the following instruction sequence require to complete on a single-cycle processor? addi $s1, $zero, 4 addi $s2, $zero, 10 addi $t0, $s1, 6 bne $t0, $s2, label lw $t0, 0($s1) label: lw $t1, 4($s1)

6 - in a single-cycle processor, each instruction takes 1 clock cycle to complete. The branch is not taken ($t0 = 4 + 6 = 10 = $s2), so all 6 instructions execute: 6 cycles.

both CPU and DMA controller need to access memory. how is their access managed so that attempts at simultaneous accesses are avoided (bc both accesses take place over memory bus)? explain the signals involved in this management

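bus arbiter - the CPU and the DMA controller each assert a bus-request signal when they need the memory bus; the arbiter resolves conflicts and asserts a bus-grant signal to exactly one of them at a time. CPU has priority if requests come in simultaneously from CPU and DMA controller.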

if (p+i) points to a[i], then *(p+i) is equivalent to...

a[i]. so (p+i) -> (*(p+i) == a[i])
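A minimal C sketch of the equivalence, assuming a plain int array; the helper name pointer_matches_index is hypothetical and only checks that (p+i) is the address of a[i] and *(p+i) is its value:

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if *(p + i) is a[i] for every i < n. */
int pointer_matches_index(const int *a, size_t n) {
    const int *p = a;  /* same as p = &a[0] */
    for (size_t i = 0; i < n; i++) {
        /* (p + i) is the address of a[i], so *(p + i) is a[i] itself */
        if ((p + i) != &a[i] || *(p + i) != a[i])
            return 0;
    }
    return 1;
}
```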

translate this MIPS assembly language into machine language: A[300] = h + A[300]

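assign $t1 = base of array A, $s2 = h. A[300] is at byte offset 300 x 4 = 1200.
lw $t0, 1200($t1) # I-type (op, rs, rt, address): op = 35, rs = 9 ($t1), rt = 8 ($t0), address = 1200
add $t0, $s2, $t0 # R-type (op, rs, rt, rd, shamt, funct): op = 0, rs = 18 ($s2), rt = 8 ($t0), rd = 8 ($t0), shamt = 0, funct = 32
sw $t0, 1200($t1) # I-type: op = 43, rs = 9 ($t1), rt = 8 ($t0), address = 1200
machine code (binary):
100011 01001 01000 0000 0100 1011 0000
000000 10010 01000 01000 00000 100000
101011 01001 01000 0000 0100 1011 0000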
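A small C sketch that packs the R-type and I-type fields into 32-bit words, useful for checking hand translations like the one above; encode_r, encode_i and the REG_ constants are hypothetical names, and the register numbers are the standard MIPS ones ($t0 = 8, $t1 = 9, $s2 = 18):

```c
#include <stdint.h>

/* MIPS register numbers used in this card */
enum { REG_T0 = 8, REG_T1 = 9, REG_S2 = 18 };

/* R-type: op(6) | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6) */
uint32_t encode_r(uint32_t op, uint32_t rs, uint32_t rt, uint32_t rd,
                  uint32_t shamt, uint32_t funct) {
    return ((op & 0x3Fu) << 26) | ((rs & 0x1Fu) << 21) | ((rt & 0x1Fu) << 16) |
           ((rd & 0x1Fu) << 11) | ((shamt & 0x1Fu) << 6) | (funct & 0x3Fu);
}

/* I-type: op(6) | rs(5) | rt(5) | 16-bit immediate (2's complement) */
uint32_t encode_i(uint32_t op, uint32_t rs, uint32_t rt, int32_t imm) {
    return ((op & 0x3Fu) << 26) | ((rs & 0x1Fu) << 21) | ((rt & 0x1Fu) << 16) |
           ((uint32_t)imm & 0xFFFFu);
}
```

For example, encode_i(35, REG_T1, REG_T0, 1200) gives 0x8D2804B0, the hex form of the lw line.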

write this in MIPS (compile this C assignment using registers): f = (g+h)-(i+j)

assign f = $s0, g = $s1, h = $s2, i = $s3, j = $s4
add $t0, $s1, $s2 # $t0 = g + h
add $t1, $s3, $s4 # $t1 = i + j
sub $s0, $t0, $t1 # f = $t0 - $t1

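C code: int *a = 10; *a = 20; What is the value (in hex) assigned to the pointer?

0x0000000A - int *a = 10 assigns the value 10 (= 0xA) to the pointer a, making it point to address 0x0000000A in memory. *a = 20 then tries to store 20 at address 0x0000000A; that memory is not allocated to the program, so the store will typically crash it.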

A processor with one level of cache executes a load word (lw) instruction and experiences a cache miss. Which of the following most accurately describes what happens next? (a) A cache block containing the requested word is moved from memory to cache (so only the cache has the block) (b) A cache block containing the requested word is copied from memory to cache (both cache and memory have the block) (c) Only the word requested by the load instruction is delivered to the processor (only memory has the block) (d) It depends on the programmer's intent (e) None of the above

b - block is copied from mem to cache

in a write-back cache, why must we first write the block back to memory if the data in the cache is modified and we have a cache miss?

because in a write-back cache the modified (dirty) block is the only up-to-date copy: earlier stores updated only the cache, not the next lower level of the memory hierarchy. If we just overwrite the block with the new one on a miss (as we could in a write-through cache, where memory always holds a current copy), we destroy the modified contents, which are not backed up anywhere else.

compile the following if-then-else into conditional branches: if f, g, h, i, and j correspond to $s0-4, find MIPS code for: if (i == j), f = g+h; else f = g-h

bne $s3, $s4, Else # go to Else if i != j
add $s0, $s1, $s2 # f = g + h (skipped if i != j)
j Exit # skip the else part
Else: sub $s0, $s1, $s2 # f = g - h
Exit:

Which of the following is not part of a MIPS processor's exception handling mechanism? (a) vectored interrupt (b) interrupt mask (c) conditional return instruction (cret) (d) Cause register (e) all of the above are part of the exception handling mechanism

c - conditional return (cret) does not exist in MIPS

what does hardware move memory between?

caches and main memory (SRAM and DRAM) - caches are managed by hardware and are transparent to software, so programmers generally don't manage them directly

how to calculate propagation delay?

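find the longest (critical) path from any input to any output and add up the propagation delays of the gates along it; the circuit's propagation delay is that critical-path delay, not the sum over all gates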

write a C program with double pointers to make a strTable

char **strTable;
strTable = malloc(n*sizeof(char *)); // one pointer per string
for (i = 0; i < n; i++) {
  *(strTable+i) = malloc(l*sizeof(char)); // l must be at least strlen(s)+1, leaving room for '\0'
  strcpy(strTable[i], s);
}
// strTable[i][j] == *(*(strTable+i)+j)
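A self-contained C sketch of the same double-pointer pattern, assuming every row should hold a copy of one string s; make_str_table and free_str_table are hypothetical names, and l is computed as strlen(s) + 1 so the terminator fits:

```c
#include <stdlib.h>
#include <string.h>

/* Builds n copies of s; returns NULL on allocation failure. */
char **make_str_table(size_t n, const char *s) {
    size_t l = strlen(s) + 1;                      /* +1 for the '\0' terminator */
    char **strTable = malloc(n * sizeof(char *));  /* one pointer per row */
    if (strTable == NULL) return NULL;
    for (size_t i = 0; i < n; i++) {
        *(strTable + i) = malloc(l * sizeof(char)); /* same as strTable[i] */
        if (strTable[i] == NULL) {
            while (i > 0) free(strTable[--i]);      /* undo earlier rows */
            free(strTable);
            return NULL;
        }
        strcpy(strTable[i], s);
    }
    return strTable;
}

/* Frees the rows first, then the table of row pointers. */
void free_str_table(char **strTable, size_t n) {
    for (size_t i = 0; i < n; i++) free(strTable[i]);
    free(strTable);
}
```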

what does the software (compiler and OS) move memory between?

compiler: registers and main memory (DRAM); OS: main memory (DRAM) and disk/SSD

how can you clear the nth bit (from the RHS) of a binary word w?

create a mask m where only the required bit is cleared, with all other bits set, then bitwise AND w with m (e.g. 1110 1111 AND w always sets the 5th bit to 0). In general: bitwise AND with a mask word resets (sets to 0) the selected bits in a word

how can you set the nth bit (from the RHS) of a binary word w?

create a mask m where only the required bit is set, with all other bits cleared, then bitwise OR w with m (e.g. 0001 0000 OR w always sets the 5th bit to 1)

how can you flip the nth bit (from the RHS) of a binary word w?

create a mask m where only the required bit is set, with all other bits cleared, then bitwise XOR w with m (e.g. 0001 0000 XOR w always inverts the 5th bit). In general: bitwise XOR with a mask word inverts the selected bits in a word (e.g. XOR with 0110 flips the 2nd and 3rd bits)
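The three mask techniques above, sketched in C; the function names are hypothetical, and n is assumed to count from 0 at the rightmost bit (so the "5th bit" is n = 4) with n < 32:

```c
#include <stdint.h>

/* Mask with only bit n set: OR sets it, AND with the complement clears it, XOR flips it. */
uint32_t set_bit(uint32_t w, unsigned n)   { return w |  (UINT32_C(1) << n); }
uint32_t clear_bit(uint32_t w, unsigned n) { return w & ~(UINT32_C(1) << n); }
uint32_t flip_bit(uint32_t w, unsigned n)  { return w ^  (UINT32_C(1) << n); }
```

For n = 4, the clear mask ~(1 << 4) has low byte 1110 1111 and the set/flip mask is 0001 0000, matching the examples in these cards.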

Which of the following is a feature of 2's complement representation? (a) Add/subtract operations do not depend on the sign of the operands (b) Sign-extension is trivial to perform (c) Range of positive and negative values is symmetric (d) (a) and (b) (e) (a), (b) and (c)

d - 2's complement numbers have an asymmetric range of pos/neg values

Which one of the following is sufficient to construct a ripple-carry adder? (a) Clock (b) AND gate (c) OR gate (d) AND gate and OR gate (e) NAND gate

e - A ripple-carry adder requires AND + OR + NOT (e.g. XOR for each sum bit), meaning the gate set must be functionally complete. Only NAND provides that on its own in the answer choices given.

what is the heap used for in C programming?

memory allocated dynamically at run time (e.g. via a call to malloc) resides here until it is freed

should software or hardware be responsible for moving data between levels of the memory hierarchy?

depends; there is a trade off between ease of programming, hardware complexity, and performance

Which of the following might limit the performance of a multi-cycle processor? (a) Number of cycles needed to complete an instruction (b) Clock frequency of the processor (c) Propagation delay incurred in each cycle (d) (a) and (b) (e) (a), (b) and (c)

e - all 3 are important. Execution time = instruction count x cycles per instruction x clock cycle time (i.e. instruction count x CPI / clock frequency), and the propagation delay incurred in each cycle dictates the clock frequency.

Which of the following will not trigger a MIPS processor exception? (a) TLB miss (b) disk I/O completing (c) syscall instruction (d) user clicking a mouse button (e) all of the above will trigger an exception

e - all will trigger an exception

(tut 4 ans) discuss JR instruction in single-cycle datapath

fetch and decode the instruction as usual. The jump target is in register rs, whose value is read out through the register file's first output (Read data 1). A new mux in the datapath selects between the branch/PC+4 path and this register value as the new PC, and a new control signal, Jump, is set to 1 to choose the register input.

explain how MIPS handles exceptions/interrupts. describe which tasks are handled by hardware and which by software.

first, hardware: essentially inserts a special type of jump into the program flow. It saves the current PC (the address of the interrupted instruction) in the EPC register, switches the processor status to kernel mode (dual-mode architecture), records the reason for the exception in the Cause register, and transfers control to the OS exception handler. second, software: the OS takes control (kernel mode) and handles the exception. When finished, depending on what the exception was for, the original program can be resumed (or a context switch may happen if the time allocated to that process has elapsed). finally: the OS executes eret (exception return), which returns to user mode and sets the PC to the appropriate return address.

what is the stack used for in the C programming language?

for a function call, the following can be placed on/reside on the stack: - input parameters (placed) - local variables (reside) - return PC (placed)

in choosing fixed-size instructions, MIPS sacrifices what in order to have what?

in choosing fixed-size instructions, MIPS sacrifices memory used in order to have simple hardware

what does *++p mean?

increment p, then use value pointed by p

what does (*p)++ mean?

increment value pointed by p, but keep p unchanged
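A short C sketch contrasting the two expressions, assuming a 3-element array; pointer_ops_demo is a hypothetical name and reports the values read and where p ends up:

```c
/* Demonstrates *++p versus (*p)++ on a small array. */
void pointer_ops_demo(int a[3], int *out_pre, int *out_post, int **out_p) {
    int *p = a;           /* p points to a[0] */
    *out_pre = *++p;      /* p moves to a[1] first, then a[1] is read */
    *out_post = (*p)++;   /* reads a[1], then increments a[1]; p stays at a[1] */
    *out_p = p;
}
```

With a = {10, 20, 30}, both reads return 20, a[1] becomes 21, and p is left pointing at a[1].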

given a 4KB, 4-way set-associative cache with 4-byte blocks and 32-bit addresses, how many tag, index and offset bits does the address decompose into?

index: log2(num_sets) = log2(size_cache/size_block/n) = log2(4KB/4B/4) = log2(256) = 8 bits; offset: log2(size_block) = log2(4) = 2 bits; tag: 32 - 8 - 2 = 22 bits
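A C sketch of the same decomposition, assuming all sizes are powers of two; cache_addr_bits and AddrBits are hypothetical names, and ways = 1 gives the direct-mapped case:

```c
#include <stdint.h>

typedef struct { unsigned tag, index, offset; } AddrBits;

/* Integer log2 for exact powers of two. */
static unsigned log2u(uint64_t x) {
    unsigned r = 0;
    while (x > 1) { x >>= 1; r++; }
    return r;
}

/* sets = cache size / block size / associativity (ways). */
AddrBits cache_addr_bits(uint64_t cache_bytes, uint64_t block_bytes,
                         uint64_t ways, unsigned addr_bits) {
    AddrBits b;
    uint64_t sets = cache_bytes / block_bytes / ways;
    b.offset = log2u(block_bytes);             /* byte within a block */
    b.index  = log2u(sets);                    /* which set */
    b.tag    = addr_bits - b.index - b.offset; /* everything left over */
    return b;
}
```

cache_addr_bits(4096, 4, 4, 32) yields 22/8/2 for this card; with ways = 1 the same call reproduces the direct-mapped 20/10/2 split.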

what is the equation for execution time?

instruction count x cycles per instruction x cycle time

what will the following piece of C code do and why? int *a = 10; *a = 100;

it will make the program crash because int *a = 10; sets the pointer to point to address 0x0000000A in memory (most compilers also warn about converting an int to a pointer without a cast). HOWEVER, this memory hasn't been allocated (afaik) and is normally not accessible by the user's program. The *a = 100; statement tries to write the value 100 to that piece of memory, which typically crashes the program (e.g. a segmentation fault).

what does the following piece of C code do? int x[10]; ... // initialize x[] with valid values int *p = &x[0]; int i; for (i = 0; i < 10; i++) { *p = (*p)--; p++; }

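intended answer: it iterates over an array of integers starting at the beginning, decrementing the value of each array element by 1. However, *p = (*p)--; modifies *p twice (the assignment and the post-decrement) without a sequence point between them, which is undefined behaviour in C; with some compilers each element is left unchanged, because the old value returned by (*p)-- is what gets stored. Writing (*p)--; or *p = *p - 1; decrements each element reliably.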

what denotes the end of a string in C?

'\0' (the null character, a char with value 0)

what is the output of: 1) slt $t0, $s0, $s1 2) sltu $t1, $s0, $s1 where $s0 contains bin num: 1111 1111 1111 1111 1111 1111 1111 1111 and $s1 contains bin num: 0000 0000 0000 0000 0000 0000 0000 0000

$t0 = 1 (signed: $s0 = -1 < 0 = $s1) $t1 = 0 (unsigned: $s0 = 2^32 - 1 is not < 0)
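A C sketch of the two comparisons, assuming 32-bit register values; slt and sltu here are hypothetical helper names modelling the instructions. The signed cast relies on the usual 2's complement conversion (implementation-defined before C23, but universal on mainstream compilers):

```c
#include <stdint.h>

/* slt: compare as signed 2's complement values. */
uint32_t slt(uint32_t rs, uint32_t rt)  { return (int32_t)rs < (int32_t)rt; }

/* sltu: compare the same bit patterns as unsigned values. */
uint32_t sltu(uint32_t rs, uint32_t rt) { return rs < rt; }
```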

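what does this do? lw $t0, 8($s3) add $s1, $s2, $t0 (base of array A in $s3, h in $s2, g in $s1)

$t0 = A[2] (offset 8 bytes / 4 bytes per word = index 2), then g = h + A[2]. (To load A[8], the offset would have to be 8 x 4 = 32.)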

if $t1 contains 0000 0000 0000 0000 0011 1100 0000 0000 and $t3 contains $zero, what is result of: nor $t0, $t1, $t3

$t0 will contain: 1111 1111 1111 1111 1100 0011 1111 1111_two

if $t0 contains 0000 0000 0000 0000 0000 1101 1100 0000 and $t1 contains 0000 0000 0000 0000 0011 1100 0000 0000, what is the result of: and $t2, $t1, $t0 ?

$t2 will contain: 0000 0000 0000 0000 0000 1100 0000 0000_two

if $t0 contains 0000 0000 0000 0000 0000 1101 1100 0000 and $t1 contains 0000 0000 0000 0000 0011 1100 0000 0000, what is the result of: or $t2, $t1, $t0 ?

$t2 will contain: 0000 0000 0000 0000 0011 1101 1100 0000_two
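A C sketch of the three logical operations from these cards on 32-bit values; mips_and, mips_or and mips_nor are hypothetical names (MIPS nor is the complement of or):

```c
#include <stdint.h>

/* Bitwise MIPS logical operations on 32-bit register values. */
uint32_t mips_and(uint32_t a, uint32_t b) { return a & b; }
uint32_t mips_or(uint32_t a, uint32_t b)  { return a | b; }
uint32_t mips_nor(uint32_t a, uint32_t b) { return ~(a | b); }
```

With $t0 = 0x00000DC0 and $t1 = 0x00003C00, these give 0x00000C00 (and), 0x00003DC0 (or) and, for nor $t1 with $zero, 0xFFFFC3FF.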

what is &a[1] equivalent to?

&a[0]+1

Assume that a register contains the 32-bit representation of the number -128 in 2s complement form. That register is then stored via a store word instruction starting at address A on a machine that is big endian. The byte stored at address A in memory in hex is:

0xFF, because the 32-bit signed representation of -128 is 11111111 11111111 11111111 10000000 = 0xFFFFFF80. Big endian stores the most significant byte first, so address A holds FF, address A+1 holds FF, address A+2 holds FF and address A+3 holds 80.
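A C sketch of a big-endian store, written with shifts so it gives the same byte layout on any host; store_big_endian is a hypothetical name and out[0] stands for address A:

```c
#include <stdint.h>

/* Writes a 32-bit word into out[0..3], most significant byte at the lowest address. */
void store_big_endian(int32_t value, uint8_t out[4]) {
    uint32_t u = (uint32_t)value;   /* -128 becomes 0xFFFFFF80 */
    out[0] = (uint8_t)(u >> 24);    /* address A     */
    out[1] = (uint8_t)(u >> 16);    /* address A + 1 */
    out[2] = (uint8_t)(u >> 8);     /* address A + 2 */
    out[3] = (uint8_t)u;            /* address A + 3 */
}
```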

what is 0 in ASCII (hex)

0x30

what is A in ASCII (hex)

0x41

what is the result of this as a hex no.? 0x5e AND 0x30

0x5e = 0101 1110, 0x30 = 0011 0000; ANDing the two numbers gives 0001 0000 = 0x10

what is a in ASCII (hex)

0x61

(lecture) discuss the steps in executing the lw instruction in the single-cycle datapath

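1/ fetch the instruction; rs (bits 25-21) -> Read register 1, rt (bits 20-16) -> Write register 2/ the ALU adds the value of rs (read data 1) to the sign-extended 16-bit immediate to form the memory address from which to read 3/ data memory is read at that address (rs + immediate), and the data is written into register rt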

Show the IEEE Standard 754 single-precision and double-precision binary representation of floating-point no. -0.75_ten

1) decimal: -0.75_ten = -3/4_ten = -3/(2^2)_ten
2) binary fraction: -11_two/2^2 = -0.11_two
3) normalised scientific notation: -1.1_two x 2^(-1)
4) general representation: (-1)^s x (1 + fraction) x 2^(exponent - bias); here s = 1 because the number is negative, and the leading 1 is implicit (not stored)
5) single precision (bias 127): exponent = -1 + 127 = 126 = 0111 1110_two; fraction = 1000...0 (23 bits)
-> 1 01111110 1000...0 (1 + 8 + 23 = 32 bits) = 1011 1111 0100 0000 0000 0000 0000 0000 = 0xBF400000
6) double precision (bias 1023): exponent = -1 + 1023 = 1022 = 011 1111 1110_two (11 bits); fraction = 1000...0 (52 bits)
-> 1 01111111110 1000...0 (1 + 11 + 52 = 64 bits) = 1011 1111 1110 1000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 = 0xBFE8000000000000
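A C sketch for checking hand-derived IEEE 754 encodings, assuming float and double are IEEE 754 binary32/binary64 (true on mainstream platforms); float_bits and double_bits are hypothetical names that copy the raw bits out with memcpy:

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a float without changing them. */
uint32_t float_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Same for a double. */
uint64_t double_bits(double d) {
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}
```

float_bits(-0.75f) should equal 0xBF400000 and double_bits(-0.75) should equal 0xBFE8000000000000, matching the derivation above.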

tut5 mcp discuss the steps in executing the add instruction in the m-c datapath (4 cycles)

1) during the first cycle, we fetch the instruction from the memory location indicated by the current PC into the instruction register, and increment PC by 4. 2) during the 2nd cycle, we read the register file registers addressed by bits 25-21 (rs) and bits 20-16 (rt) of the instruction into registers A and B respectively. We also use the ALU to speculatively compute a branch destination address (extra step). 3) ADD is a register-register operation, so in the 3rd cycle, the ALU adds the contents of register A and register B. 4) In the 4th cycle, the computed value is written into the register indicated by bits 15-11 (rd) of the instruction register.

using 8 bits, represent the number +13 and -4, find both 2's complement and sign-magnitude binary representation. follow up q, performing common binary addition of the number in both representations, which is right and which is wrong?

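+13 = 0000 1101 (both reps); -4 = 1000 0100 (sign-magnitude), 1111 1100 (2's complement). Adding in sign-magnitude: 0000 1101 + 1000 0100 = 1001 0001 = -17 (incorrect). Adding in 2's complement: 0000 1101 + 1111 1100 = (1) 0000 1001 = 9 (correct; the carry out is discarded). Plain binary addition and subtraction work in 2's complement, but NOT in sign-magnitude.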
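A C sketch of the same experiment: add the two 8-bit patterns with ordinary binary addition, then read the result under each representation; the helper names are hypothetical:

```c
#include <stdint.h>

/* Bit 7 = sign, bits 6..0 = magnitude. */
int sign_magnitude_value(uint8_t bits) {
    int mag = bits & 0x7F;
    return (bits & 0x80) ? -mag : mag;
}

/* Bit 7 has weight -128. */
int twos_complement_value(uint8_t bits) {
    return (bits & 0x80) ? (int)bits - 256 : (int)bits;
}

/* Plain binary addition; the carry out of bit 7 is discarded. */
uint8_t add8(uint8_t a, uint8_t b) {
    return (uint8_t)(a + b);
}
```

add8(0x0D, 0x84) read as sign-magnitude gives -17, while add8(0x0D, 0xFC) read as 2's complement gives 9.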

in which ways can a compiler influence (by affecting...) processor performance of a multi-cycle processor?

- by affecting the number of instructions in a program - by affecting the type of instructions used in a program - by affecting the number of clock cycles a given program takes to execute

tut5 mcp discuss the steps in executing the lw instruction in the m-c datapath (5 cycles)

1) during the first cycle, we fetch the instruction from the memory location indicated by the current PC into the instruction register, and increment PC by 4. 2) during the 2nd cycle, we read the register file registers addressed by bits 25-21 (rs) and bits 20-16 (rt) of the instruction into registers A and B respectively. We also use the ALU to speculatively compute a branch destination address (extra step). 3) LW is a memory instruction, so in the 3rd cycle, we form the effective address, using the ALU to add the A register contents to the sign-extended immediate value from the instruction register. 4) In the 4th cycle, we access memory using the address computed in the 3rd cycle; the loaded data is stored in the Memory Data Register (MDR). 5) In the 5th cycle, the value in the MDR is written into the destination register indicated by bits 20-16 (rt) of the instruction register.

tut5 mcp discuss the steps in executing the jal instruction in the m-c datapath (3 cycles)

1) during the first cycle, we fetch the instruction from the memory location indicated by the current PC into the instruction register, and increment PC by 4. 2) during the 2nd cycle, we read the register file registers addressed by bits 25-21 (rs) and bits 20-16 (rt) of the instruction into registers A and B respectively. We also use the ALU to speculatively compute a branch destination address (extra step). 3) in the 3rd cycle, save the current PC (PC+4, where PC is the address of the JAL instruction) to register 31; this requires extending the write register mux to take 31 as an extra input and the write data mux to take the current value of the PC. At the same time, compute the jump location by shifting the lower 26 bits of the instruction register left by 2 (padding with two 0s) and concatenating the result with the upper 4 bits of the current PC. This new value (4 bits from the current PC + 28 bits from the shifted field) is loaded into the PC at the end of the 3rd cycle. NOTE: a memory operation costs more than an access to the register file or the ALU, so a more cost-efficient design uses a cache to provide fast access for the vast majority of memory operations.
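A C sketch of the 3rd-cycle computations for jal, assuming a 32-bit PC; jump_target and jal_link_value are hypothetical names:

```c
#include <stdint.h>

/* Jump target: upper 4 bits of PC+4, then the 26-bit field shifted left by 2. */
uint32_t jump_target(uint32_t pc_of_jal, uint32_t instr) {
    uint32_t pc_plus_4 = pc_of_jal + 4;
    uint32_t target26  = instr & 0x03FFFFFFu;   /* lower 26 bits of the IR */
    return (pc_plus_4 & 0xF0000000u) | (target26 << 2);
}

/* Value written to register 31 ($ra) by jal. */
uint32_t jal_link_value(uint32_t pc_of_jal) { return pc_of_jal + 4; }
```

For a jal word 0x0C100000 (op = 3, target field 0x0100000) at PC 0x00400000, the target is 0x00400000 and $ra receives 0x00400004.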

What decimal no. is presented in the single precision float? 1100 0000 1010 0000 0000 0000 0000 0000

1) s = 1 (negative) 2) exponent field = 1000 0001_two = 129; fraction = .01_two = 2^-2 = 0.25 3) conversion: ((-1)^s)(1+Fraction)(2^(Exponent-Bias)) = (-1)(1+(2^-2))(2^(129-127)) = -1(1.25)(4) = -5
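A C sketch of the decoding formula applied by hand to the raw bits, assuming a normalised number (denormals, infinities and NaNs are not handled); decode_single is a hypothetical name, and powers of two are applied with a loop so no math library is needed:

```c
#include <stdint.h>

/* (-1)^s x (1 + fraction) x 2^(exponent - 127) for normalised values only. */
double decode_single(uint32_t bits) {
    uint32_t s = bits >> 31;                       /* sign bit */
    int exponent = (int)((bits >> 23) & 0xFFu);    /* biased exponent */
    uint32_t frac_field = bits & 0x7FFFFFu;        /* 23 fraction bits */
    double value = 1.0 + frac_field / 8388608.0;   /* 1 + fraction, 2^23 = 8388608 */
    for (int e = exponent - 127; e > 0; e--) value *= 2.0;
    for (int e = exponent - 127; e < 0; e++) value /= 2.0;
    return s ? -value : value;
}
```

decode_single(0xC0A00000) returns -5.0; the same function handles 1 10000001 00100000000000000000000 (0xC0900000, i.e. -4.5).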

list the 4 steps in the MIPS exception handling mechanism

1) save address of current instruction into EPC 2) transfer control to OS at known address 3) handle exception 4) return to user program execution (restore user regs, jump to EPC via eret)

tut 4 ans: discuss the steps in executing the sw instruction in the single-cycle datapath

1/ reg 1 is added with sign-extended immediate to form memory address in which to write 2/ meanwhile reg 2 (write value) skips mux and goes from the register file directly to the write data port of data memory 3/ mem write control signal is set to enable the memory to perform the write

translate this hex to binary: eca8 6420_hex

1110 1100 1010 1000 0110 0100 0010 0000_two

translate this binary to hex: 0001 0011 0101 0111 1001 1011 1101 1111_two

1357 9bdf_hex

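what is the size of the page table given a 32-bit virtual address space, 4KB physical pages, and 1GB main memory?

offset: log2(page size) = log2(4KB) = 12 bits; the offset is NOT translated. The page table has one entry (PTE) per virtual page: 2^(32-12) = 2^20 entries. 1GB of main memory = 2^30 bytes, so physical addresses are 30 bits and each PTE holds a physical page number (PPN) of 30 - 12 = 18 bits (plus status bits such as valid/dirty). Page table size >= 2^20 x 18 bits = 18 Mbit = 2.25 MB (4 MB if each entry is rounded up to a 32-bit word).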
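A C sketch of the calculation for a flat (single-level) page table, ignoring status bits so the result is a lower bound; page_table_bits is a hypothetical name:

```c
#include <stdint.h>

/* One PTE per virtual page; each PTE stores a physical page number. */
uint64_t page_table_bits(unsigned va_bits, unsigned pa_bits, unsigned page_offset_bits) {
    uint64_t entries  = UINT64_C(1) << (va_bits - page_offset_bits); /* 2^20 here */
    uint64_t ppn_bits = pa_bits - page_offset_bits;                  /* 18 here */
    return entries * ppn_bits;
}
```

page_table_bits(32, 30, 12) gives 18,874,368 bits, i.e. 2,359,296 bytes (2.25 MB).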

Consider that you have a direct-mapped TLB with 64 entries and 4KB physical pages. Comment on how the max amount of memory the application can use to ensure 0% TLB miss rate would compare with that from a fully-associative TLB.

A direct-mapped TLB functions similarly to a direct-mapped cache. Page table entries map to different locations in the TLB based on their index. Two page table entries with the same index will map to the same TLB location, and the latter can cause the eviction of the former (a conflict). Note that the eviction takes place even if there are empty/unused locations in the TLB. Therefore, the amount of application memory with 0% TLB miss rate allowed with a direct-mapped TLB <= the amount allowed with a fully-associative TLB (64 x 4KB = 256KB); the two are equal only if every page the application uses maps to a different TLB index (e.g. 64 consecutive virtual pages).

compile this C while loop in MIPS: while (save[i] == k) { i += 1; } (where $s3 contains i, $s5 contains k, and $s6 contains the base of the save array)

Loop: sll $t1, $s3, 2 # $t1 = i * 4 (byte offset, since each word is 4 bytes)
add $t1, $t1, $s6 # $t1 = address of save[i]
lw $t0, 0($t1) # $t0 = save[i]
bne $t0, $s5, Exit # leave the loop if save[i] != k
addi $s3, $s3, 1 # i += 1
j Loop
Exit:

which gate is sufficient to construct a ripple-carry adder?

NOR

compile this C procedure in MIPS (it doesn't call another procedure): int leaf_example (int g, int h, int i, int j) { int f; f = (g+h) - (i+j); return f; }

Parameter variables g, h, i and j correspond to $a0, $a1, $a2, $a3; f corresponds to $s0.
leaf_example:
addi $sp, $sp, -12 # adjust stack to make room for 3 items
sw $t1, 8($sp) # save the registers used by the procedure
sw $t0, 4($sp)
sw $s0, 0($sp)
add $t0, $a0, $a1 # $t0 = g + h
add $t1, $a2, $a3 # $t1 = i + j
sub $s0, $t0, $t1 # f = $t0 - $t1
add $v0, $s0, $zero # return f in $v0
lw $s0, 0($sp) # restore the saved registers
lw $t0, 4($sp)
lw $t1, 8($sp)
addi $sp, $sp, 12 # adjust stack to delete 3 items
jr $ra # end the procedure: jump register using the return address

Corporation M wants to improve the performance of a multi-cycle processor. Jane says that decreasing the average number of cycles that an instruction will take to complete is the way to go. Meanwhile, Jack says that increasing the average number of cycles per instruction is a better option. Succinctly explain how each of these options could be valid in improving the performance of M's multi-cycle processor. Treat each option independently of the other one and clearly state under what conditions the option will be beneficial.

Performance is measured by execution time (which we want as short as possible), a function of the cycle count and the cycle time. Jane: decreasing the average cycles per instruction reduces the cycle count; this helps only if it is NOT accompanied by a proportional (or larger) increase in cycle time. Jack: increasing the average cycles per instruction can be useful to reduce the number of gates per stage, hence reducing the propagation delay per cycle and hence the cycle time; the reduction in cycle time must offset the increase in average cycle count.
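A worked example with hypothetical numbers (10^9 instructions, a baseline CPI of 4 and a 1 ns cycle time; none of these figures come from the question) shows when each option pays off:

```latex
\begin{align*}
T_{\text{exec}} &= \text{IC} \times \text{CPI} \times T_c \\
\text{Baseline:}\quad T_{\text{exec}} &= 10^9 \times 4 \times 1\,\text{ns} = 4\,\text{s} \\
\text{Jane, } T_c \text{ unchanged:}\quad T_{\text{exec}} &= 10^9 \times 3 \times 1\,\text{ns} = 3\,\text{s} \\
\text{Jane, } T_c \to 1.4\,\text{ns:}\quad T_{\text{exec}} &= 10^9 \times 3 \times 1.4\,\text{ns} = 4.2\,\text{s} \\
\text{Jack, } T_c \to 0.7\,\text{ns:}\quad T_{\text{exec}} &= 10^9 \times 5 \times 0.7\,\text{ns} = 3.5\,\text{s}
\end{align*}
```

Jane's option loses once the cycle time grows more than proportionally (3 x 1.4 > 4 x 1), and Jack's wins here only because 5 x 0.7 < 4 x 1.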

You have a fully associative TLB with 64 entries deployed in a system that uses 4KB physical pages. Consider a latency-critical application where TLB misses cannot be afforded. What is the maximum amount of memory the app can use to ensure a 0% TLB miss rate?

The TLB can store 64 translations. Each physical page is 4KB. Total memory for which translations can be stored in the TLB = 64 * 4K = 256KB. If the application allocates and uses 256KB of memory, there will be no TLB misses as all translations can be stored in TLB.

A person may open a particular door (1 of 2) if they have a card containing the corresponding code and enter an authorized keypad code for that card. The outputs from the card reader are as follows:
Status | A | B
No card inserted | 0 | 0
Valid code for door 1 | 0 | 1
Valid code for door 2 | 1 | 1
Invalid card code | 1 | 0
To unlock a door, they must hold down the proper keys on the keypad and, then, insert the card in the reader. The authorized keypad codes for door 1 are 101 and 110, and the authorized keypad codes for door 2 are 101 and 011. If the card has an invalid code or if the wrong keypad code is entered, the alarm will ring when the card is inserted. Design the security system control unit logic (i.e. the logic functions). The control unit's inputs will consist of a card code AB, and a keypad code CDE. Outputs X, Y, Z = door 1, door 2, alarm sounding, respectively. (tut 3)

The control logic comprises three outputs, X, Y and Z, whose logic functions can be derived using a truth table. Using sum of products (SOP) notation:
X = ~A · B · C · ~D · E + ~A · B · C · D · ~E
Y = A · B · ~C · D · E + A · B · C · ~D · E
Z = (A + B) · ~X · ~Y (the alarm sounds whenever a card is inserted but neither door opens; there is no alarm when AB = 00)
Truth table: A | B | C | D | E || X | Y | Z, set up in four groups by card code AB (00, 01, 10, 11), with CDE running through all 8 combinations within each group.
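A C sketch of the three output functions, assuming 0/1 inputs; door_logic and DoorOutputs are hypothetical names, and Z follows the rule that the alarm rings only when a card is inserted and neither door unlocks:

```c
/* Outputs: x = unlock door 1, y = unlock door 2, z = alarm. */
typedef struct { int x, y, z; } DoorOutputs;

DoorOutputs door_logic(int A, int B, int C, int D, int E) {
    DoorOutputs o;
    /* door 1: card code AB = 01 and keypad 101 or 110 */
    o.x = !A && B && ((C && !D && E) || (C && D && !E));
    /* door 2: card code AB = 11 and keypad 011 or 101 */
    o.y = A && B && ((!C && D && E) || (C && !D && E));
    /* alarm: a card is inserted (AB != 00) but neither door unlocks */
    o.z = (A || B) && !o.x && !o.y;
    return o;
}
```

Looping A..E over all 32 combinations and printing x, y, z reproduces the truth table described above.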

what is the dual mode protection mechanism found in MIPS and other modern processors? Why is it used and when does the processor mode change?

This mechanism exists so that user programs cannot access resources (e.g. peripherals, memory allocated to other processes, etc.) directly without calling the operating system. This ensures that programs with bugs (or malicious programs) cannot affect other processes executing on the machine. The hardware helps to enforce this by providing two processor operating modes: user and kernel. Instructions that perform I/O, instructions that change the values held in special registers (such as those that mask interrupts), and instructions that change the processor's operating mode are privileged and can only be executed in kernel mode. Only the operating system can use them, and the only way to call the operating system is to force an exception. Therefore, each time an exception happens, the processor enters kernel mode. After the exception is handled, the processor switches back into user mode by executing eret, a privileged instruction.

The job of the Operating System is to protect which of the following system resources (a) The processor's TLB (b) The processor's cache (c) The processor's ALU (d) (a) and (b) (e) (a), (b) and (c)

a - Only the TLB needs protection. The OS is not aware of the cache.

Which of the following bits is not required to be maintained in a page table? (a) "T" bit (indicates that the translation is present in a TLB) (b) "A" bit (indicates whether the page has been recently accessed) (c) "M" bit (indicates whether the page has been modified after it was loaded to memory) (d) "R" bit (indicates that the page is resident in memory) (e) All of the above are maintained in a page table

a - The bit indicating whether a translation is present in a TLB is not required.

write code in C which will return the length of a string

a string is an array of characters (accessed through a pointer):
int strlen(char *s) {
  char *p = s;
  while (*s++ != '\0');
  return s-p-1;
}
NOTE 1: argument/variable s is local, so we can modify it. NOTE 2: pointer increment, dereference and comparison all happen in one expression; the loop body is empty, because when the null terminator is found (and there WILL always be one), the loop ends and we return the length. NOTE 3: pointer subtraction in the return statement; s ends one past the '\0', hence the -1. NOTE 4: initialising p = s makes p point to the first character of the string (s[0]); p keeps track of the starting point of the string.
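A compilable version of the same pointer technique; it is renamed my_strlen (a hypothetical name) to avoid clashing with the standard library's strlen, and uses const char * and size_t:

```c
#include <stddef.h>

size_t my_strlen(const char *s) {
    const char *p = s;          /* remember where the string starts */
    while (*s++ != '\0')
        ;                       /* empty body: the condition does all the work */
    return (size_t)(s - p - 1); /* s ends one past the '\0', so subtract 1 */
}
```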

The C statement int *** could be used to declare (a) a pointer to a 3-dimensional array of int (b) a pointer to a 4-dimensional array of int (c) a 3-dimensional array of pointer to int (d) a 4-dimensional array of pointer to int (e) This is an invalid declaration

a) a pointer to a 3D array of int

consider that you have a fully-associative TLB with 64 entries deployed in a system which supports all 3 page sizes (4KB, 2MB, 1GB). The TLB is capable of storing translations for each page size and the 64 TLB entries are divided into: - 4 entries for 1GB pages - 16 entries for 2MB pages - 44 entries for 4KB pages a) comment on when this support for multiple page sizes, more specifically the larger page sizes of 2MB and 1GB, would be beneficial for TLB performance. b) what potential problems could arise from this TLB configuration to support multiple page sizes?

a) large page sizes (2MB and 1GB) are helpful when a program allocates large chunks of contiguous memory. E.g. if a program allocates 2MB of memory, 2MB page size support allows a single translation entry for this chunk of memory, whereas with only 4KB page size support, 512 translations would be required (2MB / 4KB = 512). This improves TLB performance by improving the TLB hit rate. Support for multiple page sizes is good if the program allocates both small and large chunks of contiguous memory. b) if a program exclusively allocates small chunks of contiguous memory, only 4KB pages would be used. This would render the 4 1GB and 16 2MB TLB entries useless and reduce the usable TLB capacity to 44 entries. The TLB would, thus, capture fewer 4KB page translations than it would if the whole TLB were dedicated to 4KB entries, and would have a lower hit rate. The 1GB and 2MB entries cannot be re-purposed to store translations for 4KB pages: different page sizes require different numbers of bits to store the translation (physical page number/tag). This creates implementation challenges if we want to make all entries available for translations regardless of page size, because of the differences in tag size required for the various page sizes. Alternatively, we can partition the TLB so that different entries only store translations for a given page size (as in this configuration), but that wastes TLB capacity if not all 3 page sizes are used by a given application.
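A C sketch of TLB reach (the total memory the TLB can map without a miss), covering both the 64 x 4KB fully-associative case and this partitioned configuration; tlb_reach and the KB/MB/GB macros are hypothetical names:

```c
#include <stdint.h>

#define KB (UINT64_C(1) << 10)
#define MB (UINT64_C(1) << 20)
#define GB (UINT64_C(1) << 30)

/* Reach = sum over entry classes of (entries in use x page size). */
uint64_t tlb_reach(uint64_t n4k, uint64_t n2m, uint64_t n1g) {
    return n4k * 4 * KB + n2m * 2 * MB + n1g * GB;
}
```

tlb_reach(64, 0, 0) is 256KB, while a program using only 4KB pages on the partitioned TLB gets tlb_reach(44, 0, 0) = 176KB.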

which of the following is not an advantage of having an ISA? a) Simplifies the transition from CISC to RISC b) Hides hardware details from the programmer c) Enables multiple microarchitectures of the same ISA d) Defines set of allowed instructions e) all of the above are advantages of having an ISA

a) simplifies the transition from CISC to RISC - CISC and RISC are 2 different ISA design approaches, so having an ISA does not make moving from one to the other easier

you wish to add a new instruction jmt (jump to memory target), which reads the value stored in the specified memory address and then jumps to it. The instruction format is: jmt n(r1), where n is the offset in bytes and register r1 stores the memory address. Explain if any changes would be required to the existing data and control path of the mcp as shown in figure 1. Discuss the steps involved in executing the jmt instruction.

jmt has the same format as the lw instruction. However, the rt field in the instruction is unused, because the data (jump address) read from memory is put directly into the PC instead of into a specified register. The mux connecting to the PC gets the value from the MDR as an extra input, and its select signal (PCSource) is modified to select this input whenever a jmt instruction is encountered. The first 4 cycles are the same as the first 4 cycles of lw: 1) during the first cycle, we fetch the instruction from the memory location indicated by the current PC into the instruction register, and increment PC by 4. 2) during the 2nd cycle, we read the register file registers addressed by bits 25-21 (rs) and bits 20-16 (rt) of the instruction into registers A and B respectively, and use the ALU to speculatively compute a branch destination address (extra step). 3) in the 3rd cycle, we form the effective address, using the ALU to add the A register contents to the sign-extended immediate (n) from the instruction register. 4) in the 4th cycle, we access memory using the address computed in the 3rd cycle; the loaded data (the jump target) is stored in the Memory Data Register (MDR). Unlike lw, no register is written. THEN 5) in the final cycle, the modified multiplexer selects the MDR output with PCSource = 11, and the value is loaded into the PC register with PCWrite = 1.

which instruction takes the most time to complete in a single-cycle processor? lw, beq or sw?

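lw. In a single-cycle processor every instruction completes in one clock cycle, but lw has the longest path: instruction memory, register file read, ALU, data memory and register file write. sw does not write the register file and beq does not access data memory, so both have shorter paths.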

if there is something wrong with this code (C): int *a = 10; *a = 100; fix it [2 ways]

make the pointer point to memory the program owns, either on the heap through malloc(), or by using an additional integer variable and assigning its address to a:
1) int *a = (int *) malloc(sizeof(int)); // or int *a = malloc(sizeof(int)); then *a = 100; and later free(a);
2) int x; int *a = &x; // a now points to x, then *a = 100;
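Both fixes as compilable C; heap_int and store_through_pointer are hypothetical wrapper names used only so each fix can be exercised on its own:

```c
#include <stdlib.h>

/* Fix 1: a points to heap memory from malloc (the caller must free it). */
int *heap_int(int value) {
    int *a = malloc(sizeof *a);
    if (a != NULL)
        *a = value;             /* safe: this memory belongs to the program */
    return a;
}

/* Fix 2: a points to an existing int variable (here, one owned by the caller). */
void store_through_pointer(int *x, int value) {
    int *a = x;                 /* like int x; int *a = &x; */
    *a = value;
}
```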

what is the decimal number represented by this single precision float? 1 10000001 00100000000000000000000

negative (sign bit = 1); value = (1.mantissa) x (2^(exponent - 127)); exponent = 10000001_two = 129; mantissa = .001_two = 1 x 2^-3 = 0.125; result: -1 x 1.125 x 2^2 = -4.5

compute the equivalent normalised binary number and the corresponding IEEE 754 32-bit (single precision) rep for 61_10

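notes: no fractional part; binary = 111101_two = 1.11101_two x 2^5; mantissa (fraction) = 11101 followed by zeros; exponent in excess-127 notation: 127 + 5 = 132_ten = 10000100_two. 61 = 0 10000100 11101000...0 (23-bit fraction, 32 bits in total) = 0100 0010 0111 0100 0000 0000 0000 0000 = 0x42740000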

given a 4KB direct-mapped cache with 4-byte blocks and 32-bit addresses, how many tag, index and offset bits does the address decompose into?

offset: log_2(block_size) = log_2(4) = 2 bits; index: log_2(num_blocks) = log_2(4KB/4B) = log_2(1K) = 10 bits; tag = leftover = total_bits_in_address - offset - index = 32 - 2 - 10 = 20 bits

mask for setting or flipping a bit

one bit set, rest cleared (for clearing, use the complement: one bit cleared, rest set)

what is the point in interrupt masking?

while an interrupt is being serviced, other interrupts should not be visible to the processor, because they would overwrite the EPC and Cause registers of the first interrupt.

when initialised like so (in C): int a[10]; int *p; what does p = a; signify?

p points to a[0]. equivalent to p = &a[0].

when initialised like so (in C): int a[10]; int *p; p = a; what does p+1; point to? why?

p+1 points to a[1]: pointer arithmetic is scaled by the size of the pointed-to type, so the compiler adds 1 x sizeof(int) bytes to the address in p.

state the action taken by the OS in response to each of the following events. Assume a 32bit processor with virtual memory system. page fault, I/O stall, load accesses unsafe address

page fault - bring in the page from disk and install it in memory, update the page table, update the TLB; I/O stall - context-switch to another process; load accesses unsafe address - kill the process

Explain how parity can be used to detect a bit error in a word of data. Can parity be used to detect multi-bit errors?

parity is a bit added to a word of data to ensure that the total number of 1s in the word is even or odd. parity can detect multi-bit errors, but only if the no. of errors is odd
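A C sketch of even parity that shows why an odd number of bit errors is detected but an even number is not; even_parity_bit and parity_error are hypothetical names:

```c
#include <stdint.h>

/* Even parity: the bit that makes the total number of 1s (data + parity) even. */
unsigned even_parity_bit(uint32_t w) {
    unsigned ones = 0;
    while (w) { ones += w & 1u; w >>= 1; }
    return ones & 1u;            /* 1 if the data has an odd number of 1s */
}

/* 1 if the stored word and its parity bit no longer agree (error detected). */
int parity_error(uint32_t w, unsigned parity) {
    return even_parity_bit(w) != parity;
}
```

For 1011 the parity bit is 1. Flipping one or three bits changes the count of 1s by an odd number and is caught; flipping two bits leaves the parity unchanged and goes unnoticed.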

write a function in MIPS assembly that will perform a copy of a block of given words from one memory location to another. the function input parameters are the initial (lowest) source address, the initial target address and the number of words to copy. part a: simpler version - ignore the case of overlapping blocks part b: don't ignore that case (use part a) part c: suppose we want to change the granularity of the memory copy function from words to bytes. How can the previous program be converted to do this efficiently?

part a: We assume $a0 is the source address, $a1 is the destination address, and $a2 is the length in words.
sll $a2, $a2, 2 # multiply by 4 to convert length to bytes
add $t1, $a0, $a2 # address of the end of the block
loop: slt $t0, $a0, $t1 # $t0 is 1 if there is still copying to be done
beq $t0, $zero, done
lw $t2, 0($a0) # read from source
sw $t2, 0($a1) # write to destination
addi $a0, $a0, 4 # increment source address
addi $a1, $a1, 4 # increment destination address
j loop
done: jr $ra # return to caller
part b: If the source address is lower than the destination address, copy words from the highest address down to the lowest, so overlapping source words are read before they are overwritten. If the source address is higher than the destination address, it's safe to copy from the lowest address up to the highest.
sll $a2, $a2, 2 # convert length to bytes
slt $t0, $a0, $a1 # $t0 is 1 if source lower than destination
beq $t0, $zero, src_high
# source lower, so copy from high address down
add $t1, $a0, $a2 # point to last word + 1
addi $t1, $t1, -4 # $t1 is last word of source
add $t2, $a1, $a2
addi $t2, $t2, -4 # $t2 is last word of destination
addi $t3, $a0, -4 # address to stop looping
li $t4, -4 # will use $t4 to increment; negative since we're moving down
j loop
src_high: add $t1, $a0, $zero
add $t2, $a1, $zero
add $t3, $a0, $a2 # address to stop looping
li $t4, 4 # positive increment in this case
loop: beq $t3, $t1, end
lw $t5, 0($t1)
sw $t5, 0($t2)
add $t1, $t1, $t4
add $t2, $t2, $t4
j loop
end: jr $ra # return to caller
part c: use the code above with load/store byte instructions (lb and sb) instead of load word and store word, drop the sll (the length is already in bytes), AND step the pointers by 1, not 4 (in part b, -1 instead of -4 for the increment, the "last element" adjustment and the stop address).

compare the 3 main input/output methods: polling-based, interrupt-based, direct memory access (DMA). what are their characteristics, pros and cons?

polling-based - the user process calls the OS at regular intervals to check the status of the requested I/O operation; time-consuming (especially for large I/Os) and inflexible; mostly found in embedded systems; not good for infrequent events (e.g. a key click), since most checks find nothing to do
interrupt-based - the I/O controller interrupts the user process to signal an I/O event; used in general-purpose systems; e.g. one interrupt for every 512-byte sector that needs to be written; hurts system performance if interrupts are too frequent
DMA - most flexible; the controller accesses the memory bus directly and can independently transfer data from/to memory to/from disk; special registers hold I/O-specific information (e.g. memory address, disk address, transfer length); only one interrupt, to signal the end of the transfer

given the following declaration: int **array; allocate a triangular array of n rows in C; then deallocate

row 0 should have 1 column, row 1 should have 2 columns, etc. Each cell should have an initial value equal to its row (i.e. array[0][0] = 0, array[1][0] = array[1][1] = 1, etc.)
int n;
int **array;
n = ...;
int i, j;
array = (int **) malloc(n * sizeof(int *));   // one pointer per row (malloc/free need <stdlib.h>)
for (i = 0; i < n; i++) {
    array[i] = (int *) malloc((i + 1) * sizeof(int));   // row i has i+1 columns
    for (j = 0; j <= i; j++) {
        array[i][j] = i;   // or *(*(array+i)+j) = i
    }
}
deallocate in inverse allocation order:
for (i = 0; i < n; i++) {
    free(array[i]);   // de-allocate each row
}
free(array);   // de-allocate the table of row pointers

(lecture) discuss steps in executing r-type instructions in single-cycle datapath

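rs (bits 25:21) and rt (bits 20:16) select read register 1 and read register 2, and the register file outputs their values. The ALU combines the two values using the operation given by funct (bits 5:0), e.g. a sum for add. The result passes through the MemtoReg mux (ALU result rather than memory data) and is written to register rd (bits 15:11, selected by the RegDst mux) with RegWrite asserted. Meanwhile pc = pc + 4, moving to the next word.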
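The R-type steps can be modelled in C as a sketch. Assumptions: decode_rtype and execute_rtype are hypothetical names, only add (funct 0x20) and sub (funct 0x22) are modelled, and arithmetic wraps around instead of trapping on overflow as the real add does.

```c
#include <stdint.h>

/* Fields of a MIPS R-type instruction: op 31:26, rs 25:21, rt 20:16,
 * rd 15:11, shamt 10:6, funct 5:0. */
typedef struct {
    uint32_t op, rs, rt, rd, shamt, funct;
} rtype_fields;

rtype_fields decode_rtype(uint32_t instr)
{
    rtype_fields f;
    f.op    = (instr >> 26) & 0x3F;
    f.rs    = (instr >> 21) & 0x1F;
    f.rt    = (instr >> 16) & 0x1F;
    f.rd    = (instr >> 11) & 0x1F;
    f.shamt = (instr >> 6) & 0x1F;
    f.funct = instr & 0x3F;
    return f;
}

/* One single-cycle R-type step. Returns 0 on success, -1 for an unsupported funct. */
int execute_rtype(uint32_t instr, uint32_t regs[32], uint32_t *pc)
{
    rtype_fields f = decode_rtype(instr);
    uint32_t a = regs[f.rs];   /* read register 1 */
    uint32_t b = regs[f.rt];   /* read register 2 */
    uint32_t result;
    switch (f.funct) {         /* ALU operation chosen by funct */
    case 0x20: result = a + b; break;
    case 0x22: result = a - b; break;
    default:   return -1;
    }
    if (f.rd != 0)             /* $zero always stays 0 */
        regs[f.rd] = result;   /* write back to rd */
    *pc += 4;                  /* pc = pc + 4 */
    return 0;
}
```

For instance, add $t0, $s1, $s2 encodes as 0x02324020 (rs = 17, rt = 18, rd = 8, funct = 0x20).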

discuss the steps in executing a beq (taken) (ie branch taken == True) in a single cycle datapath (lecture)

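rs and rt are read as read register 1 and read register 2. The offset field (bits 15:0) is sign-extended and multiplied by 4 (shifted left 2), then added to pc + 4 to form the branch target. The ALU subtracts reg 2 from reg 1; a zero result means they are equal, so the branch is taken. Since the branch is taken, the PCSrc mux selects the newly calculated address (pc + 4 + shifted offset) instead of pc + 4 (if not taken, move to the next word at pc + 4).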
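A small C sketch of the beq next-PC logic, with hypothetical names beq_target and beq_next_pc. It assumes the usual two's-complement conversion when casting the 16-bit offset to int16_t, which mainstream compilers provide.

```c
#include <stdint.h>

/* Branch target: pc + 4 + (sign-extended 16-bit offset << 2). */
uint32_t beq_target(uint32_t pc, uint32_t instr)
{
    int32_t offset = (int16_t)(instr & 0xFFFF);   /* sign-extend bits 15:0 */
    return pc + 4 + ((uint32_t)offset << 2);       /* word offset -> byte offset */
}

/* Next pc for beq: the ALU subtracts rt from rs; a zero result means taken. */
uint32_t beq_next_pc(uint32_t pc, uint32_t instr, const uint32_t regs[32])
{
    uint32_t rs = (instr >> 21) & 0x1F;
    uint32_t rt = (instr >> 16) & 0x1F;
    int zero = (regs[rs] - regs[rt]) == 0;         /* ALU Zero flag */
    return zero ? beq_target(pc, instr) : pc + 4;  /* PCSrc mux */
}
```

An offset of -1 (0xFFFF) branches back to the beq itself, since pc + 4 - 4 = pc.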

what about nested methods?

see desktop file

design a synchronous Mod-6 counter that will be able to count up to 5 clock positive edges. When it reaches 5, it resets to 0 and it starts the whole process again. You must derive the logic functions and then use them to sketch out the logic circuit. You can use D flip-flops and any 1, 2 or 3 input gate you like.

see desktop/onenote tut4 solution
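The tut4 solution isn't reproduced here, but one possible derivation can be checked in C. Assumptions: state bits Q2 Q1 Q0 hold the count in binary, states 110 and 111 are don't-cares, and K-map minimisation gives D0 = ¬Q0, D1 = Q1.¬Q0 + ¬Q2.¬Q1.Q0, D2 = Q2.¬Q0 + Q1.Q0 (at most 3-input gates). The function mod6_next is a hypothetical name; it simulates one rising clock edge.

```c
/* Next state of the mod-6 counter from the D flip-flop input equations.
 * State bits: q2 (MSB), q1, q0 (LSB). */
unsigned mod6_next(unsigned state)
{
    unsigned q2 = (state >> 2) & 1, q1 = (state >> 1) & 1, q0 = state & 1;
    unsigned d0 = !q0;                              /* D0 = ¬Q0 */
    unsigned d1 = (q1 & !q0) | (!q2 & !q1 & q0);    /* D1 = Q1.¬Q0 + ¬Q2.¬Q1.Q0 */
    unsigned d2 = (q2 & !q0) | (q1 & q0);           /* D2 = Q2.¬Q0 + Q1.Q0 */
    return (d2 << 2) | (d1 << 1) | d0;              /* flip-flops capture D on the rising edge */
}
```

With these equations the unused states also recover: 110 goes to 111, which goes to 100, back inside the 0 to 5 cycle.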

Given this MIPS assembly program, where:
- integer variables x and y are assigned to registers $s0 and $s1 respectively
- integer arrays A and B have their base addresses stored in $s2 and $s3 respectively
Describe in simple terms what the program computes:
sll $t0, $s0, 2
add $t0, $s2, $t0
sll $t1, $s1, 2
add $t1, $s3, $t1
lw $s0, 0($t0)
addi $t2, $t0, 4
lw $t0, 0($t2)
add $t0, $t0, $s0
sw $t0, 0($t1)

sll $t0, $s0, 2     # $t0 = x * 4
add $t0, $s2, $t0   # $t0 = &A[x], the address of A at index x
sll $t1, $s1, 2     # $t1 = y * 4
add $t1, $s3, $t1   # $t1 = &B[y], the address of B at index y
lw $s0, 0($t0)      # x = A[x]
addi $t2, $t0, 4    # $t2 = &A[x + 1], the address of A at index x+1
lw $t0, 0($t2)      # $t0 = A[x + 1]
add $t0, $t0, $s0   # $t0 = A[x + 1] + A[x]
sw $t0, 0($t1)      # B[y] = A[x] + A[x + 1]
In simple terms: B[y] = A[x] + A[x + 1] (using the original x), and x is overwritten with A[x].
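A C equivalent of the program, as a sketch: run_program is a hypothetical name, and x is passed by pointer because the program overwrites $s0.

```c
/* C equivalent of the MIPS program: B[y] = A[x] + A[x + 1], then x = A[x]. */
void run_program(int *x, int y, const int A[], int B[])
{
    int i = *x;              /* original index; $t0 holds &A[i] */
    *x = A[i];               /* lw $s0, 0($t0): x is overwritten with A[x] */
    B[y] = A[i + 1] + *x;    /* sw: B[y] = A[x + 1] + A[x] (original x) */
}
```

With A = {10, 20, 30, 40}, x = 1 and y = 2, this sets B[2] = 50 and x = 20.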

what MIPS assembly code does the following piece of C code correspond to? if (s1 >= s2) s3 = s1+55; else s3 = s1+s2;

slt $t0, $s1, $s2     # $t0 = 1 if s1 < s2
bne $t0, $zero, l1    # s1 < s2: take the else branch
addi $s3, $s1, 55     # s1 >= s2: s3 = s1 + 55
j l2
l1: add $s3, $s1, $s2 # else: s3 = s1 + s2
l2:
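One way to lower s1 >= s2 is to compute s1 < s2 with slt and branch to the else part when that flag is 1, so the equality case s1 == s2 falls through to the then part. A C sketch using goto to mirror that instruction sequence (compute_s3 is a hypothetical name):

```c
/* Mirrors: slt $t0,$s1,$s2 / bne $t0,$zero,l1 / addi / j l2 / l1: add / l2: */
int compute_s3(int s1, int s2)
{
    int s3;
    int t0 = s1 < s2;          /* slt $t0, $s1, $s2 */
    if (t0 != 0) goto l1;      /* bne $t0, $zero, l1 */
    s3 = s1 + 55;              /* addi $s3, $s1, 55 */
    goto l2;                   /* j l2 */
l1:
    s3 = s1 + s2;              /* add $s3, $s1, $s2 */
l2:
    return s3;
}
```

Testing with s1 == s2 is the quickest way to catch a swapped slt operand order: compute_s3(5, 5) must take the then part.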

explain what a DMA controller is and how it works

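a hardware controller with its own state (registers) that sits on the memory bus and can transfer data between memory and an I/O device (e.g. disk) independently of the processor. The processor sets it up for each transfer by writing memory-mapped registers inside the DMA controller (e.g. memory address, device address, transfer length); the controller then performs the transfer and interrupts the processor once when it is done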
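A software simulation of the set-up/transfer/complete sequence, as a sketch only. Assumptions: the register layout and the names dma_regs, dma_start and dma_step are hypothetical, a 256-byte array stands in for main memory, and the status field stands in for the completion interrupt.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical DMA controller register block. */
typedef struct {
    uint32_t src;      /* source address in (simulated) memory */
    uint32_t dst;      /* destination address */
    uint32_t len;      /* transfer length in bytes */
    uint32_t control;  /* processor writes 1 here to start a transfer */
    uint32_t status;   /* controller sets 1 when done, 2 on a bad request */
} dma_regs;

static uint8_t memory[256];  /* stands in for main memory */

/* Processor side: program the registers for one transfer, then start it. */
void dma_start(volatile dma_regs *dma, uint32_t src, uint32_t dst, uint32_t len)
{
    dma->src = src;
    dma->dst = dst;
    dma->len = len;
    dma->status = 0;
    dma->control = 1;
}

/* Controller side: moves the whole block without the processor, then reports
 * completion once (where real hardware would raise a single interrupt). */
void dma_step(volatile dma_regs *dma)
{
    if (dma->control != 1)
        return;
    uint32_t src = dma->src, dst = dma->dst, len = dma->len;
    if (src > sizeof memory || len > sizeof memory - src ||
        dst > sizeof memory || len > sizeof memory - dst) {
        dma->status = 2;                       /* request out of range */
    } else {
        memmove(&memory[dst], &memory[src], len);
        dma->status = 1;                       /* transfer complete */
    }
    dma->control = 0;
}
```

The volatile qualifier reflects that, on real hardware, these registers would be device registers the compiler must not cache.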

write a swap function in MIPS

swap: # inputs: $a0 - array base (v), $a1 - index (idx)
      # compute the address of v[idx]
      sll $t0, $a1, 2      # $t0 = idx * 4
      add $t0, $a0, $t0    # $t0 = v + (idx * 4), the address of v[idx]
      # load the 2 values to be swapped
      lw  $t1, 0($t0)      # $t1 = v[idx]
      lw  $t2, 4($t0)      # $t2 = v[idx+1]
      # store the swapped values back to memory
      sw  $t2, 0($t0)      # v[idx] = $t2
      sw  $t1, 4($t0)      # v[idx+1] = $t1
      jr  $ra              # return to caller

assume that the propagation delay in each gate in a 32-bit ripple carry adder is 100ps. how fast can a 32-bit addition be performed? assuming that the adder delay is the major limiting factor on the clock speed, how can we clock the processor?

the longest path is from the carry-in at bit 0 to sum bit 31. For the full adders at bit positions 0-30, the carry propagates through 2 gates: one AND and one OR (cout = ab + ac + bc). For the last full adder (bit 31), the slowest path is to the sum output, not the carry output. The delay is 3 gates: inverter, AND, OR (s = ¬a.¬b.c + ¬a.b.¬c + a.¬b.¬c + a.b.c). So the total is 31 full adders * 2 gates each + 3 gates for the last adder = 65 gate delays * 100 ps = 6.5 ns.
if the carry-in for bit 0 is always 0, an optimisation is to replace the bit 0 full adder with a half-adder (cout = ab, s = ¬a.b + a.¬b). In this case the bit 0 carry takes 1 gate instead of 2, so the longest delay is 64 gate delays = 6.4 ns.
in standard synchronous digital logic designs, the clock period must be longer than the max delay through the combinational logic, to ensure that the clocked sequential elements following the combinational logic always capture the expected values. THEREFORE the clock period must be at least 6.5 ns, so the max clock frequency is 1/6.5 ns ≈ 154 MHz (≈ 156 MHz with the half-adder optimisation).
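The arithmetic above can be reproduced in C as a quick check. Assumptions: ripple_gate_delays and max_clock_mhz are hypothetical names, the gate counts per adder are exactly those used in the answer, and n is at least 2.

```c
/* Critical-path gate delays of an n-bit ripple-carry adder (n >= 2):
 * the first n-1 adders each add 2 gates (AND, OR) on the carry path,
 * the last adder adds 3 gates (inverter, AND, OR) on its sum path.
 * A half-adder at bit 0 saves one gate (its carry is a single AND). */
unsigned ripple_gate_delays(unsigned n, int half_adder_bit0)
{
    unsigned gates = 2 * (n - 1) + 3;
    if (half_adder_bit0)
        gates -= 1;
    return gates;
}

/* Maximum clock frequency in MHz when the adder is the critical path. */
double max_clock_mhz(unsigned gates, double gate_delay_ps)
{
    double period_ns = gates * gate_delay_ps / 1000.0;
    return 1000.0 / period_ns;
}
```

For 32 bits and 100 ps gates this gives 65 gates (about 153.8 MHz), or 64 gates (156.25 MHz) with the half-adder.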

what does *p++ mean?

use the value pointed to by p, then make p point to the next element (postfix ++ binds tighter than *, so *p++ means *(p++))
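A short C illustration of *p++ walking an array, contrasted with (*p)++; sum_ints and bump_first are hypothetical helper names.

```c
#include <stddef.h>

/* Sum count ints starting at p, using *p++ to read and advance. */
int sum_ints(const int *p, size_t count)
{
    int total = 0;
    while (count-- > 0)
        total += *p++;   /* use *p, then p = p + 1 */
    return total;
}

/* Contrast: (*p)++ increments the pointed-to value and leaves p unchanged. */
void bump_first(int *p)
{
    (*p)++;
}
```

With int a[] = {1, 2, 3} and p = a, the expression *p++ yields 1 and leaves p pointing at a[1].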

write a swap function for int *a and int *b, which are now outside of the function (stored in memory stack) in C

void swap(int *a, int *b) {
    int t = *a;
    *a = *b;
    *b = t;
}
[function call:] swap(&x, &y);
or (array + index version, swapping v[idx] and v[idx+1] like the MIPS swap):
void swap(int v[], int idx) {
    int temp = v[idx];
    v[idx] = v[idx+1];
    v[idx+1] = temp;
}

what is memory-mapped i/o?

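I/O where peripheral (device) registers are assigned addresses in the memory address space, so the processor transfers data to/from peripherals with ordinary load/store instructions, even though those addresses do not refer to actual memory devices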

B -> KB -> MB -> GB -> TB -> PB

x 1000 each time in decimal (SI) units; in binary units (KiB, MiB, GiB, TiB, PiB) each step is x 1024 = 2^10

