C952 Final Set
LEGv8
32 registers (X0...X30 and XZR), each 64 bits wide; memory: 2^62 doublewords {memory[0]...}
RAID 4
*Block-Interleaved* Parity, network appliance for large data sets that need to be accessed in sequential blocks
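A minimal Python sketch of the parity idea behind block-interleaved parity (RAID 5 uses the same XOR math, with the parity blocks rotated across disks). The parity block is the bytewise XOR of the data blocks in a stripe, so any single lost block can be rebuilt from the others. The function names and the tiny two-byte blocks are hypothetical, chosen only for illustration.

```python
from functools import reduce

def parity(blocks):
    """Parity block: bytewise XOR of every block in the stripe."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

def reconstruct(surviving, parity_block):
    """Rebuild the single missing data block: XOR of the survivors and the parity."""
    return parity(list(surviving) + [parity_block])

def small_write_parity(old_parity, old_data, new_data):
    """New parity for a one-block write without reading the other data disks."""
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))

stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]           # three data disks
p = parity(stripe)                                          # stored on the check disk
print(reconstruct([stripe[0], stripe[2]], p) == stripe[1])  # True
```

The small-write update touches the check disk on every write, which is why a single dedicated parity disk can become a bottleneck in RAID 4.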
RAID 3
*Bit-interleaved* parity; network appliance; multimedia, scientific codes, large data sets. The protection group's parity spans all disks; no blocking.
sequential doubleword addresses
8 bytes apart, so a variable index must be multiplied by 8 before it is added to the array's base address
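A small Python sketch of that address arithmetic, assuming a hypothetical array of doublewords that starts at byte address 0x1000. In LEGv8 code the scaling is typically done with a left shift by 3 (LSL) before the ADD to the base register; the helper name here is illustrative.

```python
DOUBLEWORD_BYTES = 8  # one LEGv8 doubleword = 64 bits = 8 bytes

def element_address(base, index):
    """Byte address of A[index] in an array of doublewords starting at `base`."""
    # index * 8 is the same as index << 3 (a left shift by 3 bits)
    return base + index * DOUBLEWORD_BYTES

print([hex(element_address(0x1000, i)) for i in range(4)])
# ['0x1000', '0x1008', '0x1010', '0x1018']
```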
coarse-grained multithreading
A version of hardware multithreading that switches between threads only *after significant events*, such as a last-level cache miss.
fine-grained multithreading
A version of hardware multithreading that switches between threads *after every instruction*.
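A toy Python model contrasting the coarse-grained and fine-grained switching policies. Each thread is a hypothetical list of instruction labels, and the label "miss" stands in for a significant event such as a last-level cache miss; stall latency is not modeled, only the order in which instructions are issued.

```python
def fine_grained(threads):
    """Switch threads after every instruction (round-robin over ready threads)."""
    queues = [list(t) for t in threads]
    order = []
    while any(queues):
        for tid, queue in enumerate(queues):
            if queue:
                order.append((tid, queue.pop(0)))
    return order

def coarse_grained(threads):
    """Stay on one thread until it hits a significant event (or runs out), then switch."""
    queues = [list(t) for t in threads]
    order, tid = [], 0
    while any(queues):
        queue = queues[tid]
        while queue:
            instr = queue.pop(0)
            order.append((tid, instr))
            if instr == "miss":  # stand-in for a last-level cache miss
                break
        tid = (tid + 1) % len(queues)
    return order

threads = [["a1", "miss", "a2"], ["b1", "b2"]]
print(fine_grained(threads))
# [(0, 'a1'), (1, 'b1'), (0, 'miss'), (1, 'b2'), (0, 'a2')]
print(coarse_grained(threads))
# [(0, 'a1'), (0, 'miss'), (1, 'b1'), (1, 'b2'), (0, 'a2')]
```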
Simultaneous multithreading (SMT)
A version of multithreading that lowers the cost of multithreading by utilizing the resources needed for a multiple-issue, dynamically scheduled microarchitecture.
2 ARM processors
A5 package has chip containing ____ ____ ____.
CPI (clock cycles per instruction)
Average number of clock cycles per instruction for a program or program fragment
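The average CPI is total clock cycles divided by total instruction count, so each instruction class is weighted by how often it executes. A Python sketch with hypothetical instruction counts for three classes whose CPIs are 1, 2, and 3:

```python
def average_cpi(classes):
    """classes: (instruction_count, cpi) pairs, one per instruction class."""
    total_cycles = sum(count * cpi for count, cpi in classes)
    total_instructions = sum(count for count, _ in classes)
    return total_cycles / total_instructions

print(average_cpi([(2, 1), (1, 2), (2, 3)]))  # 10 cycles / 5 instructions = 2.0
print(average_cpi([(4, 1), (1, 2), (1, 3)]))  # 9 cycles / 6 instructions = 1.5
```

The second mix executes more instructions yet has the lower CPI, which is why CPI alone does not rank programs.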
very slow
Given the importance of registers, what is the rate of increase in the number of registers in a chip over time?
All 0s
How is 0 represented in two's complement?
\0
In C, the two-character sequence that represents the null character, used to check whether the end of a string has been reached
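A Python sketch of how C code uses the terminator: strlen-style code walks the bytes until it reaches the null character (written '\0' in C, byte value 0). Python bytes stand in for a C char array here; a buffer with no terminator would raise IndexError instead of reading past the array as C code might. The function name is illustrative.

```python
def c_strlen(buf):
    """Length of a C-style string: count bytes until the null character (value 0)."""
    length = 0
    while buf[length] != 0:  # stop at the terminator
        length += 1
    return length

print(c_strlen(b"cat\0garbage"))  # 3: bytes after the terminator are ignored
```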
procedure
In C, this must restore appropriate values before returning.
type
In contrast to C, Java calls the appropriate method (procedure) based on the _______ of the calling object.
upper bound
In contrast to C, Java explicitly stores an extra item in every array indicating the array's ____ ______.
positive
In two's complement, is 0 considered positive or negative? Hint: this leaves one fewer value available for the other _____s.
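A Python sketch of 64-bit two's complement showing that 0 is all 0s with sign bit 0, so it is grouped with the positives: that leaves 2^63 - 1 positive values against 2^63 negative ones, and the most negative number has no positive counterpart. The helper names are illustrative.

```python
BITS = 64
MASK = (1 << BITS) - 1

def to_pattern(value):
    """64-bit two's complement bit pattern of value, as an unsigned integer."""
    return value & MASK

def from_pattern(pattern):
    """Signed value of a 64-bit pattern: bit 63 (the sign bit) has weight -2**63."""
    return pattern - (1 << BITS) if pattern >> (BITS - 1) else pattern

MOST_NEGATIVE = -(1 << (BITS - 1))      # -2**63
MOST_POSITIVE = (1 << (BITS - 1)) - 1   # 2**63 - 1

print(to_pattern(0) == 0)                    # True: zero is all 0s
print(-MOST_NEGATIVE == MOST_POSITIVE + 1)   # True: one extra negative value
```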
11, 5, 6, 5, 5 = 32
LEGv8 R-format field widths in bits, left to right (total 32): opcode (11), Rm (second register operand, 5), shamt (shift amount, 6), Rn (first register operand, 5), Rd (destination register, 5)
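A Python sketch that packs the five R-format fields left to right into one 32-bit word and unpacks them again. The sample instruction is ADD X9, X20, X21 (opcode 1112, i.e. 0x458, from the LEGv8 reference card; Rm = 21, shamt = 0, Rn = 20, Rd = 9); the helper names are illustrative.

```python
# Field names and widths, left to right (most significant bits first).
R_FORMAT = [("opcode", 11), ("Rm", 5), ("shamt", 6), ("Rn", 5), ("Rd", 5)]

def encode_r(opcode, rm, shamt, rn, rd):
    """Pack the five R-format fields into one 32-bit instruction word."""
    word = 0
    for value, (name, width) in zip((opcode, rm, shamt, rn, rd), R_FORMAT):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word

def decode_r(word):
    """Split a 32-bit word back into its R-format fields."""
    fields = {}
    for name, width in reversed(R_FORMAT):  # peel fields off from the right
        fields[name] = word & ((1 << width) - 1)
        word >>= width
    return fields

add = encode_r(1112, 21, 0, 20, 9)  # ADD X9, X20, X21
print(hex(add))                     # 0x8b150289
print(decode_r(add))
```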
RAID 1
Mirroring: file blocks are duplicated between physical drives. High disk space overhead (only half the raw capacity holds unique data); high redundancy; minimum of 2 drives.
63
Of a doubleword's 64 bits, what is the leftmost bit numbered?
RAID 6
P+Q redundancy, recently popular
weak scaling
Speed-up achieved on a multiprocessor *while* increasing the size of the problem *proportionally* to the increase in the number of *processors*. NOT BOUND BY AMDAHL'S LAW.
TLB (Translation Lookaside Buffer)
A cache that keeps track of recently used address mappings to try to avoid an access to the page table. Each entry includes the physical page address, a tag, and status bits. It holds fewer entries than the page table, and an entry is replaced using a write-back scheme.
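A simplified Python sketch of address translation through a TLB, assuming hypothetical 4 KiB pages and dictionaries standing in for the TLB and the page table. It ignores the TLB's limited size, replacement, and status bits; a missing page-table entry (a page fault) would simply raise KeyError here.

```python
PAGE_SIZE = 4096  # assumed page size (4 KiB)

def translate(vaddr, tlb, page_table):
    """Return (physical address, TLB hit?) for a virtual address."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)  # virtual page number, page offset
    if vpn in tlb:
        ppn, hit = tlb[vpn], True           # TLB hit: no page-table access
    else:
        ppn, hit = page_table[vpn], False   # TLB miss: consult the page table
        tlb[vpn] = ppn                      # keep the mapping for next time
    return ppn * PAGE_SIZE + offset, hit

page_table = {0: 5, 1: 9}  # virtual page -> physical page
tlb = {}
print(translate(4100, tlb, page_table))  # (36868, False)
print(translate(4100, tlb, page_table))  # (36868, True)
```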
state element
A memory element, such as a register or a memory; a datapath element that has internal storage. Also called a sequential element.
normalized scientific notation
a number in _________ _________ _________ does not have leading 0s
write-through
A scheme in which writes always update both the cache and the next lower level of the memory hierarchy, ensuring that data are always consistent between the two. (1) A value is read from the cache and modified; (2) the modified value is written to both the cache and the corresponding memory location.
write-back
a scheme that handles writes by updating values only to the block in the cache, then writing the modified block to the lower level of the hierarchy *when the block is replaced*
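A toy Python model comparing how many writes reach the next lower level under write-through and write-back. It assumes a hypothetical cache holding a single block and write-allocate on misses (both assumptions, not from the cards); blocks still resident at the end are not flushed. The `dirty` flag plays the role of the dirty bit.

```python
def lower_level_writes(policy, accesses):
    """Count writes to the next lower level for a one-block cache.

    policy: "write-through" or "write-back"
    accesses: sequence of (op, block) with op in {"read", "write"}
    """
    if policy not in ("write-through", "write-back"):
        raise ValueError(policy)
    cached, dirty, writes = None, False, 0
    for op, block in accesses:
        if block != cached:                  # miss: replace the resident block
            if policy == "write-back" and dirty:
                writes += 1                  # write the modified block back
            cached, dirty = block, False
        if op == "write":
            if policy == "write-through":
                writes += 1                  # update cache and memory together
            else:
                dirty = True                 # update only the cached copy
    return writes

trace = [("write", 0), ("write", 0), ("write", 0), ("read", 1)]
print(lower_level_writes("write-through", trace))  # 3
print(lower_level_writes("write-back", trace))     # 1
```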
Linker (link editor)
A systems program that combines independently assembled (*pre-compiled*) machine language programs and resolves all undefined labels, producing an executable file.
DRAM
Access time about 50 ns; most expensive per GB (about $5); capacity quadrupled every 3 years until the 1990s.
multiply
all _____ instructions ignore overflow.
don't-care term
an element of a logical function in which the output does not depend on the values of all the inputs
combinational element
An operational element, such as an AND gate or an ALU; a datapath element whose output values depend only on the present input values.
write buffer
a *queue* that holds data while the data are waiting to be written to memory
L1
a cache for a cache
Main Memory
A cache for disks, using the virtual memory technique.
L2
a cache for main memory
TLB
A cache for page table entries; stores recent address mappings to avoid page table accesses.
direct mapped cache
a cache structure in which each memory location is mapped to exactly one location in the cache
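A Python sketch of the direct-mapped rule: the cache index is the block address modulo the number of blocks, and the remaining upper bits form the tag. The example assumes a hypothetical 8-block cache with one-word blocks and addresses given in words (so `block_size` is 1).

```python
def direct_mapped_slot(address, block_size, num_blocks):
    """Return (index, tag) for an address in a direct-mapped cache."""
    block_address = address // block_size
    index = block_address % num_blocks   # the only slot this block can occupy
    tag = block_address // num_blocks    # tells apart blocks that share the slot
    return index, tag

print(direct_mapped_slot(22, 1, 8))  # (6, 2): 10110 -> index 110, tag 10
print(direct_mapped_slot(30, 1, 8))  # (6, 3): same slot as 22, so they conflict
```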
Implementation
hardware that obeys the architecture abstraction
2 chips
iPad2 consists of how many chips?
executable file
Loaded into memory by the loader and then executed; a functional program in the format of an object file that contains no unresolved references. May contain symbol tables and debugging information (a "stripped executable" does not have these).
one
in base two, multiplying by 2 is like shifting left _ place
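A quick Python check of the rule, assuming a 64-bit register so that bits shifted past bit 63 are discarded; the helper name is illustrative.

```python
MASK64 = (1 << 64) - 1

def lsl(value, places):
    """Logical shift left within a 64-bit register; bits shifted out are lost."""
    return (value << places) & MASK64

print(bin(lsl(0b0101, 1)))  # 0b1010: 5 * 2 = 10
print(lsl(5, 3))            # 40: shifting left k places multiplies by 2**k
```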
thread
Includes a PC, register state, and a stack; a lightweight process. Threads of a process share a single address space.
dirty bit
is set when any word in a page (VM) is written and indicates if a page needs to be copied back when the page is replaced
MOVK
Move wide with keep: leaves the remaining bits unchanged. Works on 16-bit chunks, selected with LSL 0, 16, 32, or 48.
accumulator architecture
single register architecture
strong scaling
speed-up achieved on a multiprocessor *without* increasing the size of the problem
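Because strong scaling keeps the problem size fixed, its speed-up is limited by the part of the work that stays sequential (Amdahl's Law). A Python sketch, using a hypothetical workload of 10 sequential time units plus 100 units that divide evenly across processors:

```python
def amdahl_speedup(parallel_fraction, processors):
    """Speed-up of a fixed-size problem when only parallel_fraction can be spread out."""
    sequential = 1 - parallel_fraction
    return 1 / (sequential + parallel_fraction / processors)

p = 100 / 110                            # 100 of 110 time units are parallel
print(round(amdahl_speedup(p, 10), 2))   # 5.5
print(round(amdahl_speedup(p, 100), 2))  # 10.0
```

Ten times more processors buys less than twice the speed-up, and no processor count can exceed 110 / 10 = 11 here.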
adder
Subword parallelism takes advantage of byte- and halfword-sized data by partitioning this unit (its carry chains) to perform multiple operations in parallel.
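A software analogy in Python (often called SWAR, "SIMD within a register"): eight unsigned bytes are packed into one 64-bit word and added lane by lane, with no carry crossing a byte boundary. Hardware gets this by partitioning the adder's carry chain; the masks below are just one illustrative way to get the same lane-wise results in software, and the helper names are hypothetical.

```python
LOW7 = 0x7F7F7F7F7F7F7F7F  # low 7 bits of every byte lane
HIGH = 0x8080808080808080  # top bit of every byte lane

def pack(lanes):
    """Pack eight byte values (lane 0 = least significant byte) into a 64-bit int."""
    return int.from_bytes(bytes(lanes), "little")

def unpack(word):
    """Split a 64-bit int back into its eight byte lanes."""
    return list(word.to_bytes(8, "little"))

def add_bytes_partitioned(a, b):
    """Add eight byte lanes independently (each wraps mod 256)."""
    partial = (a & LOW7) + (b & LOW7)  # each lane sums to at most 0xFE: no carry-out
    return partial ^ ((a ^ b) & HIGH)  # top bit of each lane = a7 XOR b7 XOR carry-in

a = pack([250, 1, 128, 127, 0, 255, 100, 200])
b = pack([10, 2, 128, 1, 0, 1, 100, 100])
print(unpack(add_bytes_partitioned(a, b)))  # [4, 3, 0, 128, 0, 0, 200, 44]
```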
PC-relative addressing mode
the branch address is [PC + constant in instruction]
protection group
the group of data disks or blocks that share a common check disk or block
Immediate Addressing mode
the operand is a constant within the instruction itself
register addressing mode
the operand is a register
performance
1/execution time
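Since performance is the reciprocal of execution time, the ratio of two computers' performance is the inverse ratio of their execution times. A Python sketch with hypothetical times (A takes 10 s and B takes 15 s for the same program):

```python
def times_faster(time_a, time_b):
    """How many times faster A is than B: Perf_A / Perf_B = Time_B / Time_A."""
    performance_a = 1 / time_a
    performance_b = 1 / time_b
    return performance_a / performance_b

print(round(times_faster(10, 15), 2))  # 1.5: A is 1.5 times as fast as B
```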
context switch
A changing of the internal state of the processor to allow a different process to use the processor that includes saving the state needed to return to the currently executing process.
OpenMP
An API for shared-memory multiprocessing in C, C++, or Fortran that runs on UNIX and Microsoft platforms. Includes compiler directives, a library, and runtime directives.
CBNZ
ARM: if (i == 0) f = g + h
    ____ X3, Else
    ADD ....
    B Exit
Else: SUB ....
32
ARMv8 and MIPS have __-bit instructions
Compiler
Affects IC and CPI. IC: its efficiency and translation speed. CPI: varies.
programming language
Affects IC and CPI in program performance. IC: how its statements are translated into processor instructions determines IC. CPI: its features, such as indirect calls.
algorithm
Affects IC and possibly CPI in program performance. IC: determines the number of source program instructions executed, and hence the number of processor instructions executed. CPI: may favor slower or faster instructions.
Instruction Set Architecture
Affects: IC, Clock Rate, CPI
striping
Allocation of logically sequential blocks to separate disks to allow higher performance than a single disk can deliver.
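A Python sketch of round-robin striping across a hypothetical 4-disk array, the layout RAID 0 uses: logically sequential blocks land on different disks, so a large sequential transfer can keep all disks busy at once.

```python
def stripe_location(logical_block, num_disks):
    """Map a logical block number to (disk, block offset on that disk)."""
    return logical_block % num_disks, logical_block // num_disks

print([stripe_location(b, 4) for b in range(6)])
# [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1)]
```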
CPU execution time
Also called CPU time. The actual time the CPU spends computing for a specific task. *seconds for the program*
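CPU time follows from the instruction count, CPI, and clock rate: CPU time = instruction count × CPI / clock rate. A Python sketch with hypothetical numbers (10^9 instructions, CPI of 2.0, 4 GHz clock):

```python
def cpu_time_seconds(instruction_count, cpi, clock_rate_hz):
    """CPU time = clock cycles / clock rate, where clock cycles = IC x CPI."""
    clock_cycles = instruction_count * cpi
    return clock_cycles / clock_rate_hz

print(cpu_time_seconds(1_000_000_000, 2.0, 4_000_000_000))  # 0.5
```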
LR (X30)
BL instruction stores return addresses in register __?
RAID 2
Bit-level striping with dedicated Hamming-code parity. OBSOLETE. error detection and correction code
transistors, conductors, and insulators
Blank wafers undergo 20 to 40 chemical processing steps to create (3)
Lisp
Created by John McCarthy in 1958, Lisp equates programming to list manipulation. Lisp is commonly used by people working in artificial intelligence research. Short for LISt Processing.
clock edge
DRAMs enable fast access to data by transferring bits in bursts. Successive bits are transferred on each _______.
RAID 5
Distributed block-interleaved parity, widely used
decreases response time, increases throughput
Replacing a processor in a computer with a faster processor has what effect? (2)
SSE
Streaming SIMD Extensions, introduced in 1999; version 5 (2007): 170 instructions
User CPU Time
The CPU time spent in a program itself. aka CPU performance
System CPU Time
The CPU time spent in the operating system performing tasks on behalf of the program.
ingot
The chip manufacturing process starts with a silicon ______.
overflow
When positive and negative operands are added in binary addition ___ is impossible.
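A Python sketch of the rule for 64-bit two's complement addition: the sum is wrapped to 64 bits the way a register would hold it, and overflow is flagged only when both operands have the same sign but the result's sign differs. Operands with opposite signs never trigger it. The helper names are illustrative.

```python
BITS = 64
MASK = (1 << BITS) - 1

def wrap(value):
    """Interpret the low 64 bits of value as a signed two's complement number."""
    value &= MASK
    return value - (1 << BITS) if value >> (BITS - 1) else value

def add_overflows(a, b):
    """True if a + b overflows a 64-bit signed register."""
    result = wrap(a + b)
    same_sign_operands = (a < 0) == (b < 0)
    return same_sign_operands and (result < 0) != (a < 0)

MAX, MIN = (1 << 63) - 1, -(1 << 63)
print(add_overflows(MAX, 1))    # True: positive + positive gave a negative result
print(add_overflows(MAX, MIN))  # False: mixed signs cannot overflow
```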
LDURB, LDURH
Which LEGv8 load instructions should be generated for byte and halfword arithmetic operations? Hint: These load instructions sign extend the most significant bits, which preserves the variable's value.
ABI (Application Binary Interface)
The user portion of the instruction set plus the operating system interfaces used by application programmers; defines a standard for binary portability across computers.
Instruction
command that computer hardware understands and obeys
processor
control + datapath
code generator
converts the optimized intermediate representation into a processor's machine instructions the last compiler phase
Stack
A data structure for spilling registers, organized as a last-in-first-out queue; a region of memory whose top is pointed to by SP.
clocking methodology
defines when signals can be read and when they can be written; timing def: the approach used to determine when data is valid and stable relative to the clock
Fortran
Developed for IBM's 704 computer in 1954; one of the first widely used programming languages.
Cobol
Developed for use in business settings; employs English vocabulary and punctuation.
system performance
Elapsed time on an unloaded system is also called ______ ______.
secondary memory
memory layer used to store programs and data between runs *nonvolatile*
coherence
Memory system behavior that defines *what* values can be returned by a read. A popular protocol for enforcing it: snooping.
consistency
memory system behavior that defines *when* written values will be returned by a read
energy
most critical resource of microprocessor design
execution time
most reliable method to evaluate performance
RAID 0
no redundancy, widely used
stack architecture
no registers architecture
base or displacement addressing mode
operand is at the memory location whose address is the sum of a register and a constant in the instruction
datapath
performs arithmetic computations where data is transformed via computations like addition or subtraction
random, LRU
primary replacement schemes (2) for fully-associative caches
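A Python sketch of LRU replacement in a fully associative cache, using `collections.OrderedDict` to keep blocks in recency order. The block reference sequence and cache sizes are hypothetical; random replacement would instead evict an arbitrary resident block.

```python
from collections import OrderedDict

def lru_misses(capacity, block_refs):
    """Count misses in a fully associative cache with LRU replacement."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    misses = 0
    for block in block_refs:
        if block in cache:
            cache.move_to_end(block)       # hit: now the most recently used
        else:
            misses += 1
            if len(cache) == capacity:
                cache.popitem(last=False)  # evict the least recently used block
            cache[block] = None
    return misses

refs = [0, 8, 0, 6, 8]
print(lru_misses(4, refs))  # 3
print(lru_misses(2, refs))  # 4
```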
LDURB, STURB
Read byte from source, write byte to destination; used for ASCII. *Remember: 64-bit = doubleword = 8 bytes
output
reads data from memory
edge-triggered
these state elements make simultaneous reading and writing both possible and unambiguous
advance to the next address
to increment a pointer to an integer (add 8 to the pointer address)
ELR, ESR
two of the special ARMv8 control registers to help with page faults, TLB misses, and exceptions.
GPR (General Purpose Register)
type of register found in ARMv8 architecture can be used for addresses or data with virtually any instruction
datapath element
Unit used to operate on or hold data within a processor. In the LEGv8 implementation these include (5): the instruction and data memories, the register file, the ALU, and adders.
primitives
used to solve the data race problem
MIPS
uses comparison instructions and branches based on the results of the comparisons
Algol
was invented as a way to express algorithms more naturally than its predecessors
entire page
what a virtual memory system writes to disk
Input
writes data to memory
flash memory
writes to the same location in a __________ can wear out memory bits.
MOVZ
Move wide with zero: zeros the rest of the bits of the register. Works on 16-bit chunks, selected with LSL 0, 16, 32, or 48.
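A Python model of how MOVZ and MOVK together build a wide constant 16 bits at a time, with shift amounts of LSL 0, 16, 32, or 48. The example loads 4,000,000 (0x003D0900), in LEGv8-style syntax MOVZ X19, 61, LSL 16 followed by MOVK X19, 2304, LSL 0; the function names are illustrative and the register is just a Python integer.

```python
MASK64 = (1 << 64) - 1
CHUNK = 0xFFFF

def _check(imm16, shift):
    if shift not in (0, 16, 32, 48) or not 0 <= imm16 <= CHUNK:
        raise ValueError("need a 16-bit immediate and LSL 0, 16, 32, or 48")

def movz(imm16, shift):
    """MOVZ: the immediate goes into one 16-bit chunk; every other bit becomes 0."""
    _check(imm16, shift)
    return imm16 << shift

def movk(register, imm16, shift):
    """MOVK: replace one 16-bit chunk and keep the remaining bits unchanged."""
    _check(imm16, shift)
    kept = register & ~(CHUNK << shift) & MASK64
    return kept | (imm16 << shift)

x19 = movz(61, 16)        # 0x003D0000
x19 = movk(x19, 2304, 0)  # 0x003D0900
print(x19)                # 4000000
```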