CSC-231 final
What are the 32-bit values in $t1 and $t2 after running the following code (little-endian memory assumed)? la $t0, val; lb $t1, 0($t0); lbu $t2, 0($t0); val: .word 0xB330C49C
$t1 = 0xFFFFFF9C (lb sign-extends the byte 0x9C); $t2 = 0x0000009C (lbu zero-extends it)
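A minimal Python sketch of the lb/lbu card above, assuming little-endian memory (as on typical MARS/SPIM setups). The helper name `load_byte` is made up for illustration; it only mimics the sign-extension and zero-extension behavior.

```python
import struct

def load_byte(word, offset, signed):
    """Mimic lb (signed=True) or lbu (signed=False) on one stored word."""
    memory = struct.pack("<I", word)   # little-endian byte order (assumption)
    byte = memory[offset]              # raw byte value, 0..255
    if signed and byte & 0x80:         # lb: the byte's top bit is the sign bit
        byte -= 0x100                  # make it negative, i.e. sign-extend
    return byte & 0xFFFFFFFF           # view the result as a 32-bit register

t1 = load_byte(0xB330C49C, 0, signed=True)    # lb  $t1, 0($t0)
t2 = load_byte(0xB330C49C, 0, signed=False)   # lbu $t2, 0($t0)
print(f"$t1 = 0x{t1:08X}, $t2 = 0x{t2:08X}")
```

On a big-endian machine the byte at offset 0 would be 0xB3 instead, so the answer depends on the byte order.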
How many bits are in a MIPS word?
32 bits
Mercury thermometers and hydraulic brakes are examples of
Analog computers
part of a computer that is the main processing unit
CPU
(T or F) All computers use a 32-bit word
False
the choice NOT to have a subi operation is best supported by which design principle?
Good design demands good compromise
Which is accurate concerning Moore's Law?
Refers to the number of transistors on a chip doubling about every 18 months
Amdahl's Law
The performance enhancement possible with a given improvement is limited by the amount that the improved feature is used.
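As a rough illustration of Amdahl's Law, the usual speedup formula can be sketched in Python. The function name and the sample numbers are assumptions chosen for the example, not values from the course.

```python
def amdahl_speedup(fraction_improved, improvement_factor):
    """Overall speedup when only part of the execution time gets faster."""
    unaffected = 1.0 - fraction_improved
    return 1.0 / (unaffected + fraction_improved / improvement_factor)

# Hypothetical case: half of the run time is sped up by a factor of 2.
print(amdahl_speedup(0.5, 2.0))
# Even a huge improvement to that half cannot do better than about 2x overall.
print(amdahl_speedup(0.5, 1e12))
```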
(T or F) Compiler parsing comes before semantic analysis
True
Control hazards are improved by: (any)
Using a multi-bit prediction scheme, Using a branch history table
computer floating point representations suffer from which issues?
overflow, cannot represent all fractions(within range), cannot represent all whole numbers
If 2^30 is about one billion, then 2^31 is?
About two billion (2^31 bytes is two gigabytes)
According to De Morgan's Law, rewrite !A or !B using an AND operation
!(A and B), i.e. ~(A ∧ B)
to optimize a CPU you should:
Compromise among all of the above (make a deeper pipeline, add more cache, add more functional units to the CPU, make the CPU clock faster)
A clock frequency of 2 giga hertz yields a cycle time of :
0.5 ns
If you had an array w[] of widgets(that are 4 words each) - with a base address of 0x1000 0008 what is the proper final address of w[4]?
0x1000 0048
What is the actual memory address of str[3] for a string of characters, if str is at 0x10010003?
0x10010006
What is the actual memory address of arr[1] for an array of words. if arr is at 0x1001000C.
0x10010010
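The three address cards above all use the same rule: element address = base + index * element size in bytes. A small Python sketch, with `element_address` as an illustrative helper name and a 4-byte MIPS word assumed:

```python
WORD_BYTES = 4  # one MIPS word

def element_address(base, index, element_bytes):
    """Address of element `index` in an array that starts at `base`."""
    return base + index * element_bytes

w4 = element_address(0x10000008, 4, 4 * WORD_BYTES)   # widgets are 4 words each
str3 = element_address(0x10010003, 3, 1)              # characters are 1 byte each
arr1 = element_address(0x1001000C, 1, WORD_BYTES)     # words are 4 bytes each

print(hex(w4), hex(str3), hex(arr1))
```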
What is the hex value for positive infinity in IEEE single-precision?
0x7F800000
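One way to check this bit pattern is to pack positive infinity as a single-precision float and read the bytes back as an integer. This is only a sketch using Python's standard `struct` module; the field names are labels chosen for the example.

```python
import math
import struct

# Pack +infinity as a big-endian 32-bit float, then reinterpret as an integer.
bits = struct.unpack(">I", struct.pack(">f", math.inf))[0]

sign = bits >> 31                 # 1 bit
exponent = (bits >> 23) & 0xFF    # 8 bits, all ones for infinity
fraction = bits & 0x7FFFFF        # 23 bits, all zeros for infinity

print(f"0x{bits:08X}", sign, exponent, fraction)
```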
A clock frequency of 1 Giga hertz yields a cycle time of:
1 ns
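The clock-rate cards follow from cycle time = 1 / frequency. A short Python sketch, where `cycle_time_ns` is an illustrative name:

```python
def cycle_time_ns(frequency_hz):
    """Clock period in nanoseconds for a given clock frequency in hertz."""
    return 1e9 / frequency_hz

print(cycle_time_ns(2e9))   # 2 GHz clock
print(cycle_time_ns(1e9))   # 1 GHz clock
```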
Which is larger: 1,000 KB, 0.1 GB, 1,000 MB, or are they the same?
1,000 MB
Which is faster: 1,000 ns, 0.1 ms, or are they the same?
1,000 ns
Which is faster: 100 ns, 0.1 ms, 10 microseconds, or are they the same?
100 ns
write 5.25 as a binary number(no normalizing)
101.01
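The conversion can be done by hand: convert the whole part as usual, then repeatedly double the fraction and peel off the integer bit. A Python sketch of that procedure for non-negative values; `to_binary` and its `max_fraction_bits` cutoff are assumptions made for this example.

```python
def to_binary(value, max_fraction_bits=8):
    """Binary string for a non-negative number, without normalizing."""
    whole = int(value)
    fraction = value - whole
    whole_digits = bin(whole)[2:]
    fraction_digits = ""
    # Doubling the fraction shifts its next binary digit into the integer part.
    while fraction > 0 and len(fraction_digits) < max_fraction_bits:
        fraction *= 2
        bit = int(fraction)
        fraction_digits += str(bit)
        fraction -= bit
    return whole_digits + ("." + fraction_digits if fraction_digits else "")

print(to_binary(5.25))
```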
What is the "bias" for IEEE double-precision floating point exponents?
1023
In IEEE double precision floating point representation, how many bits are given to the exponent?
11 bits
A binary number consisting of 7 bits will provide how many unique values?
128
A microprocessor that can address 8k words needs only this many address bits?
13
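Both counting cards above come from powers of two: n bits give 2^n patterns, and addressing N locations needs the smallest n with 2^n >= N. A Python sketch with illustrative helper names:

```python
def unique_values(bits):
    """Number of distinct patterns an n-bit binary number can hold."""
    return 2 ** bits

def address_bits(locations):
    """Smallest number of address bits that can name every location."""
    return (locations - 1).bit_length()

print(unique_values(7))          # 7-bit number
print(address_bits(8 * 1024))    # 8K words
```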
If you were using base 7, how would the decimal value 12 be written?
15
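Base conversion by repeated division can be sketched as follows. The helper `to_base` is illustrative and assumes a base of 10 or less so that every digit is a single decimal character.

```python
def to_base(number, base):
    """Digits of a non-negative integer in the given base (base <= 10)."""
    if number == 0:
        return "0"
    digits = []
    while number > 0:
        digits.append(str(number % base))   # remainder is the next low digit
        number //= base
    return "".join(reversed(digits))

print(to_base(12, 7))   # 1 * 7 + 5 = 12
```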
How many hex digits represent one byte?
2 digits
In IEEE single precision floating point representation, how many bits of precision are available in the significand?
24 bits
how many bits are in a byte?
8 bits
Underflow
A situation in which a negative exponent becomes too large to fit in the exponent field.
Which of the following are considered combinational datapath elements?
ALU, Multiplexor
Pipelines always perform best when:
All stages are balanced, instructions are regular sized
Charles Babbage and Ada Lovelace collaborated to design this early computing machine:
Analytical Engine
BNF
Backus-Naur Form, a notation for context-free grammars. It uses terminals and non-terminals to derive particular strings.
A MIPS branch instruction can
Branch ±2^15 instructions (the offset is a signed 16-bit word count)
Branch prediction technique that tracks the (partial) address of multiple branches:
Branch history table
a processor architecture that uses a large, powerful instruction set
CISC
Which of these(pick all that apply) can be affected by the compiler?
CPI and Instruction count
Write the performance equation:
CPU time = Instruction count * CPI * Clock cycle time
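A quick Python sketch of the performance equation. The function name and the sample workload (one billion instructions, CPI of 2, 2 GHz clock) are hypothetical numbers picked for illustration.

```python
def cpu_time_seconds(instruction_count, cpi, clock_cycle_time_s):
    """CPU time = instruction count * CPI * clock cycle time."""
    return instruction_count * cpi * clock_cycle_time_s

# Hypothetical program: 1e9 instructions, CPI 2.0, 0.5 ns cycle (2 GHz clock).
seconds = cpu_time_seconds(1e9, 2.0, 0.5e-9)
print(seconds)
```

Halving any one of the three factors halves the CPU time, which is why no single factor (such as clock speed) is the most important on its own.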
bne rs, rt, offset means: if register rs != rt...
Change the PC such that PC = PC + 4 + (offset * 4); otherwise execution continues at PC + 4
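The branch-target rule can be sketched in Python. The function name and the example PC value are assumptions for illustration; the offset is a signed count of instructions relative to PC + 4.

```python
def branch_target(pc, offset, taken):
    """Next PC after a MIPS conditional branch located at address `pc`."""
    next_pc = pc + 4                      # address of the following instruction
    if taken:
        return next_pc + offset * 4       # offset counts words, so scale by 4
    return next_pc

print(hex(branch_target(0x00400010, 3, taken=True)))    # forward branch
print(hex(branch_target(0x00400010, -2, taken=True)))   # backward branch
print(hex(branch_target(0x00400010, 3, taken=False)))   # fall through
```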
List one way that compilers can assist pipeline performance
Compilers can reorder instructions to avoid stalls, for example by placing independent instructions between a load and the instruction that uses its result.
Define the acronym; CPI
Cycles per instruction; the average number of clock cycles it takes to complete an instruction.
Describe specifics and all actions required for a memory "hit" scenario in a set associative cache:
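The index bits of the address select one set; the tag bits are compared in parallel against the tag of every way in that set; a hit occurs when a way is valid and its tag matches. The block offset then selects the requested word from that block, which is returned to the CPU without accessing the next memory level.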
(T or F) 32-bit multiplication results in a 32-bit result
False
(T or F) Booth's algorithm is a technique for integer division
False (it is a technique for signed integer multiplication)
(T or F) Overflow cannot occur in floating-point math operations
False
(T or F) Pipelining will usually improve instruction latency
False
(T or F) RISC processors are faster than CISC processors
False
(T or F) System clock speed is the most important factor in the performance equation
False
(T or F) Translated languages are typically faster than compiled languages
False
Which pipeline stage is most affected by variable length instructions(explain)
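Fetch, because the hardware cannot know how many bytes to fetch, or where the next instruction starts, until the length of the current instruction is determined.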
Instruction latency
How long it takes for a single instruction to complete, from start to finish
The _____ is the abstract interface between hardware and the lowest level of software encompassing everything needed to write a program that will run correctly. includes instructions, registers, memory access, I/O
ISA
Define the term instruction set architecture
Interface between hardware and lowest-level software.
He is credited with formalizing the major functional components of all computers: input-output-storage-processing
John Von Neumann
Beyond pipelines, modern CPUs are using techniques:
Multiple issue, out-of-order execution, multicore processors, speculation (prediction)
Making the multiply instruction a pipelined instruction is best done by making:
Multiply cannot easily be pipelined
Which operations belong to the compiler "Back End"?
Optimization, code generation
an add instruction uses which instruction format:
R format
The kind of computer memory where user programs and data reside is
RAM
Given $t0 holds the address of a string, write the single-line of MIPS code to store the byte value in $t5 as the third character of that string.
sb $t5, 2($t0)
For the MIPS integer division operation, explain how the results are obtained:
After div, the quotient is in the LO register and the remainder is in the HI register. Use mflo $t0 to read the quotient and mfhi $t1 to read the remainder.
Larger cache blocks take advantage of:
Spatial locality
Context frame(or context block)
Stack mentality
The major advantage of direct mapping of a cache is its simplicity. the main disadvantage of this organization is that
The cache hit rate is degraded if two or more blocks alternately map onto the same block in the cache
delayed branching
The effect is to execute one or more instructions following the conditional branch before the branch is taken.
In multi-level caches, the closer to RAM:
The larger the access time
"load-use" data hazards can be fixed: (best)
Through forwarding and a stall
(T or F) A 'linker' is responsible for connecting your object code to pre-compiled library code
True
(T or F) A signed integer with the hex value 0x90983812 is a negative number?
True
(T or F) Exceptions are similar to interrupts but indicate some error condition
True
(T or F) Precision errors do not occur in integer math operations
True
(T or F) The 'heap' is computer memory that stores a process's dynamically allocated memory
True
(T or F) The k0 and k1 registers are used by interrupt handlers
True
Describe the "write-back" scheme for a cache:
Writes update only the cache block and set its dirty bit. The modified block is written to the next level only when it is replaced.
wide vs narrow bus
A wide bus transfers more words at a time, while a narrow bus transfers fewer
As machines move toward wider addressing (64-bit addressing and beyond), what necessary effect will this have on caches?
Caches will need to store more overhead bits because the tags are larger
if the binary 8-bit pattern 11010011 represents an unsigned number it is:
a large positive number
What best describes a computer variable?
a reference to a value in memory
move $s0, $t1 is an example of a pseudo-instruction processed by the MIPS assembler. Write the actual MIPS instruction that is needed to execute this.
add $s0, $t1, $zero
What is the difference between the addi and addiu instructions?
Both sign-extend the 16-bit immediate. addi raises an exception on signed overflow; addiu never does (the "unsigned" in the name only means overflow is ignored).
program that translates symbolic instructions to binary machine instructions
assembler
A single binary digit ( value 0 or 1)
bit
The _______ is the physical connection(wires) that connects the processor to memory.
bus
program that translates statements in high-level language to assembly language
compiler
ALU
does all mathematical calculations and makes all logical decisions
A computer used to run one fixed(i.e. very limited) application
embedded computer
Exceptions use this instruction to return to the point of exception:
eret
In our original, simple 5-stage pipeline, after which stage is the result of a bne instruction actually known?
execute
what are the main steps in a computer instruction cycle
fetch, decode, execute
In a pipelined datapath, ________ is the process of bypassing the normal data path to provide updated register values to later instructions that depend on the updated value.
forwarding
flushing a pipeline
Discarding the instructions currently in the pipeline (for example, after a mispredicted branch) so they have no effect
2^30 bytes is a(n)
gigabyte
What is the equation for average memory access time (AMAT)?
AMAT = hit time + (miss rate * miss penalty)
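The average access time formula (hit time plus miss rate times miss penalty) in a short Python sketch. The function name and the sample cache numbers are hypothetical.

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """Average memory access time, in the same unit as hit_time and miss_penalty."""
    return hit_time + miss_rate * miss_penalty

# Hypothetical cache: 1-cycle hit, 5% miss rate, 100-cycle miss penalty.
amat = average_access_time(1.0, 0.05, 100.0)
print(amat)
```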
What pipelining stages does a jr instruction need to be active/productive?
if, id
What pipelining stages does a bne instruction need to be active/productive?
if, id, ex
What pipelining stages does a sub instruction need to be active/productive?
if, id, ex, wb (the result must be written back to the register file; mem is idle)
Memory scheme that accesses 'wide' memory in parallel, yet transfers the data from the memory over a narrow bus. this is called ____ memory
interleaved
Which of these(pick all that apply) are components of performance?
instruction count, clock rate, CPI
Interrupt
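An event from outside the processor (for example an I/O device signaling a mouse click or key press) that makes the CPU suspend the current program, run a handler, and then resume where it left off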
Discuss compiler optimizations: algebraic simplification
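The compiler replaces an expression with a simpler, equivalent one (for example, x + 0 or x * 1 becomes x). A closely related optimization, strength reduction, uses a cheaper instruction in place of a more complex and time-consuming one, such as shifting left once to multiply by 2.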
Explain the exact bit format(all 32-bits) of the MIPS instruction beq $t1, $t2, label.
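I-format: a 6-bit opcode (000100 for beq), 5-bit rs ($t1 = register 9), 5-bit rt ($t2 = register 10), and a 16-bit signed offset, counted in words from PC + 4 to label; 6 + 5 + 5 + 16 = 32 bits.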
1024 bytes is a(n)
kilobyte
_______ is the principle used in caches that data used recently will be used again soon.
locality
Write an example, and briefly explain the MIPS lw syntax and how it works
lw $t1, 0($t0): loads the word at the address held in $t0 (plus offset 0) into $t1
binary instructions that processor can understand
machine language
The inclusion of the lui operation is best supported by which design principle?
Make the common case fast
A compiler can optimize by "loop unrolling". This would help by:
minimizing control hazards
Extra time incurred because the memory value needed is not in the cache and must be fetched from slower memory is called _____
miss penalty
If you could only implement a pipelined CPU -or- a multi-level cache, which would you pick- based on performance gains?
multi-level cache; it is much faster
microprocessors containing several processors in the same chip
multicore processors
datapath organization that allows for pipelining to occur _________
multicycle
having several instructions executing in parallel by duplicating the entire pipeline hardware_______
multiple issue
RAM access times are measured in:
nanoseconds
A 3-way set associative cache is
one in which each main memory word can be stored at any of 3 cache locations
A 4-way set associative cache is:
one in which each main memory word can be stored at any of 4 cache locations
Explain the exact instruction format (all 32-bits) of the MIPS instruction : addi $t1, $t2, -1
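I-format: a 6-bit opcode (001000 for addi), 5-bit rs ($t2 = register 10, the source), 5-bit rt ($t1 = register 9, the destination), and a 16-bit immediate (-1 = 0xFFFF); 6 + 5 + 5 + 16 = 32 bits.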
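To see the I-format fields packed into one 32-bit word, here is a Python sketch. The helper `encode_i_format` is an illustrative name; the opcode (8 for addi) and register numbers ($t1 = 9, $t2 = 10) follow the standard MIPS encoding.

```python
def encode_i_format(opcode, rs, rt, immediate):
    """Pack opcode(6) | rs(5) | rt(5) | immediate(16) into a 32-bit word."""
    return (
        ((opcode & 0x3F) << 26)
        | ((rs & 0x1F) << 21)
        | ((rt & 0x1F) << 16)
        | (immediate & 0xFFFF)      # -1 becomes 0xFFFF in 16-bit two's complement
    )

T1, T2 = 9, 10       # register numbers of $t1 and $t2
ADDI_OPCODE = 8

# addi $t1, $t2, -1: rs is the source ($t2), rt is the destination ($t1).
word = encode_i_format(ADDI_OPCODE, rs=T2, rt=T1, immediate=-1)
print(f"0x{word:08X}")
```

The same helper covers beq (opcode 4), where the 16-bit field holds the word offset to the label instead of a constant.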
performs a variety of services and supervision functions on a computer
operating systems
Amdahl's Law (pipelining)
performance enhancements are limited by amount improved feature is used
The primary motivation for utilizing cache:_____
performance/Speed
what is word-size as it relates to general computing?
refers to the number of bits processed by a computer's CPU in one go
Draw the typical memory hierarchy:
registers -> cache -> RAM (main memory) -> disk, from fastest and smallest to slowest and largest
Very limited number of very high-speed memory directly on the processor
registers
Extra bits used during floating-point calculations to improve rounding accuracy - list two of the three: ____
round, sticky, guard
RISC processors are unique for their:
simplicity of instructions
The MIPS stack and $sp register:
starts in high memory and grows down, useful for spilling registers
List two (different) components of modern computer architecture championed by John von Neumann
stored program (storage), input, output, processing
Basic Block
straight-line code with no branches in except at the entry and no branches out except at the exit
In a pipeline, when an instruction cannot be executed in parallel due to resource conflicts, this is called a _____ hazard
structural
In order to print "hello world" using syscall (print string, $v0 = 4), what must be in register $a0?
the address of the 'h' (the first character of the string)
If 'D' is represented by ASCII as the decimal number 68, what is the decimal value for 'J'?
the decimal value is 74
the binary number system is used in computers because:
the hardware technology is binary in nature.
The MIPS jump instruction uses 6-bits for the op code and 26 for the address. since memory addresses needs 32-bits, how do we get 32 from 26?
the remaining 6 bits: the low 2 bits are 00 because of word alignment (the 26-bit field is shifted left by 2), and the top 4 bits come from the PC
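The 26-to-32-bit expansion for a jump can be sketched in Python. The function name and the example addresses are made up for illustration; the top 4 bits are taken from PC + 4, which is what the hardware uses.

```python
def jump_target(pc, address_field):
    """Full 32-bit target of a MIPS j instruction located at address `pc`."""
    upper_four = (pc + 4) & 0xF0000000                 # 4 bits from the PC
    lower_28 = (address_field & 0x03FFFFFF) << 2       # 26 bits plus two 0 bits
    return upper_four | lower_28

# Hypothetical jump at 0x00400000 whose 26-bit field is 0x0100004.
print(hex(jump_target(0x00400000, 0x0100004)))
```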
Short-circuit boolean
the semantics of some Boolean operators in some programming languages in which the second argument is executed or evaluated only if the first argument does not suffice to determine the value of the expression
Read and writes make up 35% of the instructions of a typical program. Therefore the maximum amount of improvement by improving the speed of read and write is:
unable to tell from this information
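Why don't we use two's complement for the exponent value in IEEE floating point?
With two's complement, negative exponents would look like large values, so floating-point numbers could not be ordered with a simple integer comparison. A biased exponent keeps the ordering and lets the all-zero bit pattern represent zero.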
Normalized floating point number
A number written so that the integer part of its mantissa is exactly 1, with the fraction part free to be whatever is needed and the scale carried by the exponent. For example, 13.25 is 1101.01 in binary; moving the binary point left three places gives the normalized form 1.10101 x 2^3.
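Normalizing the 13.25 example can be checked with Python's standard `math.frexp`, which returns a mantissa in [0.5, 1); doubling it gives the IEEE-style form with a leading 1. The wrapper `normalize` is an illustrative name and assumes a positive, nonzero input.

```python
import math

def normalize(value):
    """Return (significand, exponent) with 1 <= significand < 2."""
    mantissa, exponent = math.frexp(value)   # value == mantissa * 2**exponent
    return mantissa * 2, exponent - 1        # shift the point so it reads 1.xxxx

significand, exponent = normalize(13.25)
print(significand, exponent)   # 1.65625 is 1.10101 in binary
```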
requiring whole words to begin at addresses ending with 0x0, 0x4, 0x8, or 0xC
word alignment
What approximate performance improvement is most likely to be achieved by switching from a single-cycle to a 5-stage pipelined implementation?
about 3x better