CMSC 411 Final


Calculate Binary Floating Point Representation by Hand

Given 3.625: the sign bit is 0 because the number is positive.
3 in binary is 11.
0.625 in binary is found by repeatedly multiplying by 2:
0.625 x 2 = 1.250 -> 1 (subtract the 1 from the result)
0.250 x 2 = 0.500 -> 0
0.500 x 2 = 1.000 -> 1 (subtract the 1 from the result)
0 -> stop
So 0.625 in binary is 0.101, and 3.625 in binary is 11.101.
Normalize: 11.101 = 1.1101 x 2^1.
The stored exponent is offset by a bias: 127 for 32-bit, 1023 for 64-bit.
32-bit: 127 + 1 = 128 -> 10000000
64-bit: 1023 + 1 = 1024 -> 10000000000
The mantissa is everything after the leading 1, padded with zeros to fill the field (23 bits for single precision, 52 for double): 1101 000...0
So the single-precision representation is 0 10000000 11010000000000000000000.
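A quick way to check a hand conversion is to unpack the IEEE-754 bits directly; a minimal Python sketch (the helper name float_bits is ours):

    import struct

    def float_bits(x):
        # Pack as IEEE-754 single precision, then read back the raw 32 bits
        (n,) = struct.unpack(">I", struct.pack(">f", x))
        return format(n, "032b")

    b = float_bits(3.625)
    print(b[0], b[1:9], b[9:])  # -> 0 10000000 11010000000000000000000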

MTTF

Mean Time To Failure Improved by Fault Avoidance, Fault Tolerance, Fault Forecasting

Two's Complement

find the one's complement and add 1 -Ex: 10101010 becomes 01010110
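A minimal sketch of the rule for 8-bit strings (the helper name twos_complement is ours):

    def twos_complement(bits):
        # One's complement (flip every bit), then add 1, keeping 8 bits
        ones = "".join("1" if b == "0" else "0" for b in bits)
        return format((int(ones, 2) + 1) & 0xFF, "08b")

    print(twos_complement("10101010"))  # -> 01010110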

Loader

loads programs and libraries

Binary Division

long division, exactly as in decimal; the remainder is always smaller than the divisor -Ex: 1001010 / 1000 = 1001 remainder 10 (74 / 8 = 9 r 2)
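To sanity-check a hand division, Python's divmod accepts binary literals (a quick sketch):

    q, r = divmod(0b1001010, 0b1000)       # 74 / 8
    print(format(q, "b"), format(r, "b"))  # -> 1001 10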

Binary Multiplication

long multiplication: one partial product per multiplier bit, each shifted one position further left
-Ex: 1000 x 1001 (8 x 9):
     1000
   x 1001
   ------
     1000    (x 1)
    0000     (x 0)
   0000      (x 0)
  1000       (x 1)
 --------
 01001000    (= 72)
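The hardware algorithm in miniature, shift-and-add (a sketch; the function name mul is ours):

    def mul(a, b):
        product = 0
        while b:
            if b & 1:          # low multiplier bit set?
                product += a   # add the (shifted) multiplicand
            a <<= 1            # shift multiplicand left
            b >>= 1            # shift multiplier right
        return product

    print(format(mul(0b1000, 0b1001), "b"))  # -> 1001000 (= 72)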

Direct mapped

(1-way associative) • One choice for placement (Block address modulo number of blocks)
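The placement rule in one line (the cache size here is an assumed example):

    num_blocks = 8  # assumed number of blocks in the cache
    for block_addr in (0, 5, 13, 21):
        print(block_addr, "-> block", block_addr % num_blocks)
    # 5, 13, and 21 all map to block 5, so they conflict with each other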

MTBF

(Mean Time Between Failures) - is the predicted elapsed time between inherent failures of a system during operation.

MTTR

(Mean Time to Repair) - average time required to repair a failed component or device.

NUMA

(Non-Uniform Memory Access) - each physical processor has its own memory controller and, in turn, its own bank of memory

UMA

(Uniform Memory Access) - all of the processors in the system send their requests to a single memory controller

Cache Placement

Determined by associativity

MIPS v. MFLOPS

*Both are bad measurements of performance*
MIPS (Million Instructions Per Second)
-The most popular performance metric
-MIPS = Clock Rate / (CPI x 10^6)
-Bad because it doesn't account for differences in ISAs between computers or differences in complexity between instructions
MFLOPS (Million Floating-Point Operations Per Second)
-Fairer than MIPS because it is based on actual operations, not instructions
-Bad because the set of FP operations is not the same across machines, and the rating changes not just with the mix of FP and non-FP operations but also with the mix of fast and slow FP operations
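Plugging assumed numbers into the MIPS formula (a sketch):

    clock_rate = 2e9  # assumed 2 GHz clock
    cpi = 1.5         # assumed average cycles per instruction
    print(clock_rate / (cpi * 1e6))  # -> ~1333 MIPS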

Binary Logical Operators

*Know what they look like*
INV (invert) -> if x = 1, the output is 0, and vice versa
OR -> output is 1 if at least one input is 1
AND -> output is 1 only if all inputs are 1
XOR (exclusive OR) -> output is 1 if only x or only y is 1
NOR (not OR) -> output is 1 only if neither input is 1
NAND (not AND) -> inverted output of an AND
XNOR (exclusive not OR) -> inverted output of an XOR
MUX (multiplexer) -> a select input chooses which one of the data inputs drives the single output (a 2-to-1 MUX picks one of two inputs)
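The two-input truth tables, generated with Python's bitwise operators (a sketch):

    # columns: x y OR AND XOR NOR NAND XNOR
    for x in (0, 1):
        for y in (0, 1):
            print(x, y, x | y, x & y, x ^ y,
                  1 - (x | y), 1 - (x & y), 1 - (x ^ y))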

Pipeline Operation

*Pipelining is a form of computer organization in which successive steps of an instruction sequence are executed in turn by a sequence of modules able to operate concurrently, so that another instruction can begin before the previous one is finished*
IF (Instruction Fetch) - fetch the instruction from memory
ID (Instruction Decode) - decode the instruction and read the registers
EX (Execute) - execute the operation or calculate an address
MEM (Memory) - access a memory operand
WB (Write Back) - write the result back to a register

The Mill

- Operations execute in program order - The compiler controls when ops issue and retire - Short pipeline - Is not yet silicon. - No rename registers to eliminate hazards - Has no general registers since transient data lives on the Belt which is a FIFO

Servers

-Design geared toward reliability Ex: HP Itanium

Desktops

-Device design is driven by cost Ex: Mac Pro

Embedded Computers

-Example: DSP -Cheap little mini-computers that do one task (driven by cost and unique application)

Custom

-Example: GPU -Generally geared towards one task

Supercomputers

-raw computation power -price is not a concern Ex: Watson

RAID Level

0 - striping with no redundancy
1 - fault-tolerant configuration known as "disk mirroring": data is copied seamlessly and simultaneously from one disk to another, creating a replica, or mirror; if one disk gets fried, the other can keep working
5 - the most common RAID configuration: data and parity (additional data used for recovery) are striped across three or more disks; if a disk gets an error or starts to fail, data is recreated from the distributed data and parity blocks, seamlessly and automatically

Binary Addition

Add the numbers column by column, carrying into the next column as in decimal
-Ex: 7 + 6:
  000111  (7)
+ 000110  (6)
--------
  001101  (13)
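Checking with Python (a sketch):

    print(format(0b000111 + 0b000110, "06b"))  # -> 001101 (= 13)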

Binary Subtraction

Add the two's complement of the second operand
-Ex: 7 - 6 = 7 + (-6):
  00000111  (7)
+ 11111010  (-6)
----------
  00000001  (1; the carry out of the top bit is discarded)
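The same computation, kept to 8 bits by masking off the discarded carry (a sketch):

    print(format((0b00000111 + 0b11111010) & 0xFF, "08b"))  # -> 00000001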

Fully associative

Any location

ALU

Arithmetic Logic Unit -> Does arithmetic and logic

GFLOPS

Billion Floating Point Operations per Second

Who Has the Most Powerful Computer:

CHINA

Valid Flag

Indicates that the cache block has been loaded with valid data

CPU

Central Processing Unit (Has an ALU, Memory, Registers, and Program Counter)

Memory Misses (3 Cs)

Compulsory or cold-start: the first access to a block cannot hit, so the block must be brought into the cache
Capacity: the cache cannot contain all the blocks needed during execution of a program
Conflict or collision: competition for entries in a set; doesn't occur in a fully associative cache

DMA

Direct Memory Access a method that allows an input/output (I/O) device to send or receive data directly to or from the main memory, bypassing the CPU to speed up memory operations.

DRAM

Dynamic RAM - stores each bit of data in a separate capacitor within an integrated circuit.

GPU

Graphics Processing Unit - these are processors optimized for 2D and 3D graphics, video, visual computing, and display. They allow these tasks to be separated from the CPU and are designed specifically to perform them. GPUs have a highly parallel structure that is extremely efficient at manipulating computer graphics. Examples of GPUs are add-on cards made by NVidia and AMD

Study CPU, Clock Cycles, Clock Time and CPI

CPU Time = Instruction Count x CPI x Clock Cycle Time = (Instruction Count x CPI) / Clock Rate
Clock Cycle Time = 1 / Clock Rate
CPI = CPU Clock Cycles / Instruction Count
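A quick numeric check with assumed values (a sketch):

    instruction_count = 1e9  # assumed
    cpi = 2.0                # assumed
    clock_rate = 4e9         # assumed 4 GHz
    print(instruction_count * cpi / clock_rate, "seconds")  # -> 0.5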

Carry Look Ahead

Improves speed by reducing the time it takes to determine carry bits. Calculates one or more carry bits before the sum, which reduces the wait time.
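The lookahead logic computes generate (g_i = a_i AND b_i) and propagate (p_i = a_i OR b_i) signals, then derives each carry as c_{i+1} = g_i + p_i * c_i. A 4-bit sketch (the function name cla_carries is ours; real hardware expands the recurrence so all carries arrive in about two gate delays, while this loop computes the same values sequentially for clarity):

    def cla_carries(a, b, c0=0):
        g = [(a >> i & 1) & (b >> i & 1) for i in range(4)]  # generate
        p = [(a >> i & 1) | (b >> i & 1) for i in range(4)]  # propagate
        c = [c0]
        for i in range(4):
            c.append(g[i] | (p[i] & c[i]))  # c_{i+1} = g_i + p_i * c_i
        return c

    print(cla_carries(0b1011, 0b0110))  # -> [0, 0, 1, 1, 1]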

Unsupervised Training:

In unsupervised training, the network is provided with inputs but not with desired outputs. The system itself must then decide what features it will use to group the input data. Example architectures: Kohonen, ART

Caches Hierarchy

L1 - smallest and fastest level; block size smaller than the L2 cache's
L2 - focuses on a low miss rate to avoid going to main memory
L3, L4 - larger, slower levels between L2 and main memory
*All cache levels are faster than RAM*
SRAM - fastest memory access speeds; smaller, lower power, more expensive
DRAM - bits are stored as capacitor charge and must be refreshed periodically
DISK - slowest memory access speeds
Ideal - the speed of SRAM at the cost of DISK

LEGv8 Structure:

All LEGv8 instructions are 32 bits wide and there are 32 registers, but LEGv8 is a 64-bit machine: registers, datapath, PC & memory addresses are all 64 bits. LEGv8 is the teaching subset of ARMv8, which is likewise a 64-bit system with a 64-bit datapath and memory.

Amdahl's Law:

Law of Diminishing Returns Improving an aspect of a computer does not necessarily give a proportional improvement to overall performance
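As a formula (f is the fraction of execution time that benefits, s the speedup of that fraction; a minimal sketch):

    def amdahl_speedup(f, s):
        return 1 / ((1 - f) + f / s)

    print(amdahl_speedup(0.5, 10))  # speeding up half the program 10x -> only ~1.82x overall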

LRU Replacement

Least Recently Used - replace the block in the cache that has gone unused the longest. Exact LRU is too hard to implement beyond 4-way set associativity.
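A minimal simulation with Python's OrderedDict (the cache size and access trace are assumed):

    from collections import OrderedDict

    cache, capacity = OrderedDict(), 4
    for block in [1, 2, 3, 4, 1, 5]:    # assumed block access trace
        if block in cache:
            cache.move_to_end(block)    # hit: now the most recently used
        elif len(cache) == capacity:
            cache.popitem(last=False)   # miss with full cache: evict the LRU block
        cache[block] = True

    print(list(cache))  # -> [3, 4, 1, 5]; block 2 was evicted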

LSB

Least Significant Bit - is the bit position in a binary integer giving the units value, that is, determining whether the number is even or odd. Has the lowest value

Translation Look-aside Buffer (TLB)

Lists the physical address page number associated with each virtual address page number

Availability

MTTF / (MTTF + MTTR)
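For example (numbers assumed): MTTF = 1000 hours and MTTR = 1 hour gives 1000 / 1001 ≈ 0.999, i.e. 99.9% availability.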

MMU

Memory Management Unit - a computer hardware unit through which all memory references pass, primarily performing the translation of virtual memory addresses to physical addresses

MSB

Most Significant Bit - the bit position in a binary number with the highest numerical value

MIMD

Multiple Instruction streams, Multiple Data streams. This is a multiprocessor where multiple instructions are applied to many data streams. Intel Xeon e5345 is an example.

MISD

Multiple instruction, single data. This is a type of parallel computing architecture where many functional units perform different operations on the same data. Pipeline architectures belong to this category. Not many instances of this architecture exist, but one prominent example of MISD is the Space Shuttle flight control computers.

North Bridge vs South Bridge

NB - connected directly to the CPU via the front-side bus (FSB), and thus responsible for the tasks that require the highest performance
SB - typically implements the slower capabilities of the motherboard in a northbridge/southbridge chipset architecture

Who Has the Largest Server Farm

NSA *Data is more valuable than power*

PC

Program Counter - holds the address of the next instruction and increments as each instruction is fetched

PLA

Programmable Logic Array - used to implement combinational logic circuits

Instruction Formats:

R: opcode (11) | Rm (5) | shamt (6) | Rn (5) | Rd (5) -> register format (arithmetic instructions)
I: opcode (10) | immediate (12) | Rn (5) | Rd (5) -> immediate format (arithmetic with a constant operand)
D: opcode (11) | address (9) | op2 (2) | Rn (5) | Rt (5) -> data transfer format (gets data out of memory or puts it into memory)
B: opcode (6) | address (26) -> unconditional branch (branch to another location)
CB: opcode (8) | address (19) | Rt (5) -> conditional branch (branch to another location if a condition is met)
(field widths in bits; every instruction is 32 bits)

RISC v. CISC

RISC (Reduced Instruction Set Computing) - simpler instructions to provide better performance
-Ex: MIPS, ARM
CISC (Complex Instruction Set Computing) - single instructions can execute several low-level operations (such as a load from memory, an arithmetic operation, and a memory store) or are capable of multi-step operations or addressing modes
-Ex: Intel

RAM

Random Access Memory - is a form of computer data storage which stores frequently used program instructions to increase the general speed of a system. A random-access memory device allows data items to be read or written in almost the same amount of time irrespective of the physical location of data inside the memory.

SIMD

Single Instruction Multiple Data (GPUs are an example) -has multiple instances of data on which it performs same operation

SIMD

Single Instruction stream, Multiple Data streams. This is a multiprocessor, where the same instruction is applied to many data streams, as in a vector processor or array processor. An example of this type of implementation is the SSE instructions of the x86.

SISD

Single Instruction stream, Single Data stream. This is a conventional uniprocessor where a single processor executes a single instruction stream to operate on data stored in a single memory. An example of this type of processor is the Intel Pentium 4.

Binary Floating Point Representation

Single Precision (32-bit): Sign (1 bit) | Exponent (8 bits) | Mantissa (23 bits)
Double Precision (64-bit): Sign (1 bit) | Exponent (11 bits) | Mantissa (52 bits)

Layers of Code (Abstraction)

Software application layer - ex. Matlab, web browsers, office products
Programming language - ex. C, C++, Fortran
Assembly
Binary (machine code)
Microcode
Nanocode

SRAM

Static RAM - a type of semiconductor memory that uses bistable latching circuitry (flip-flop) to store each bit.

Strong Scaling vs. Weak Scaling:

Strong - how execution time varies with the number of processors for a fixed total problem size
Weak - how execution time varies with the number of processors for a fixed problem size per processor

Pipeline Hazards

Structural Hazard - a required resource is busy; solved by stalling (and punching the sys. engineer)
Data Hazard - need to wait for a previous instruction to complete its data read/write; solved by bubbles, data forwarding, and stalls
Control Hazard - deciding on a control action depends on a previous instruction; solved by branch prediction and stalls

Types of Computers

Supercomputers Servers Embedded Computers Desktops Custom

Moore's Law

The number of transistors on an integrated circuit doubles roughly every 18 months to 2 years

TFLOPS

Trillion Floating Point Operations per Second

Pipeline

The stages of data processing elements connected in series, where the output of one element is the input of the next one. The elements of a pipeline are often executed in parallel.

Quantum Computers

Use quantum bits, or qubits, which instead of being just 1 or 0 can be 1, 0, or both at once; this is called superposition.

VHDL

VHSIC Hardware Description Language a hardware description language used to design programmable gate arrays and integrated circuits

Flags

Valid Flag - cache block loaded with valid data
Dirty Flag - cache block changed since it was read from main memory
Reference Flag - LRU bits

VHSIC

Very High Speed Integrated Circuit

VLIW

Very Long Instruction Word - a VLIW processor allows programs to explicitly specify instructions to execute at the same time, concurrently, in parallel

VM

Virtual Memory- memory that appears to exist in main storage although most of it is supported by data held in secondary storage

Assembler

a program for converting instructions written in low-level symbolic code into machine code.

Linker

a program used with a compiler or assembler to provide links to the libraries needed for an executable program.

Microcode

a very low-level instruction set that is stored permanently in a computer or peripheral controller and controls the operation of the device.

Tag

the high-order bits of the block address, stored with a cache entry to identify which block it holds

Supervised Training:

both the inputs and the outputs are provided. The network then processes the inputs and compares its resulting outputs against the desired outputs. Errors are then propagated back through the system, causing the system to adjust the weights which control the network. This process occurs over and over as the weights are continually tweaked. Example architectures: Multilayer perceptrons

One's Complement

change all 1's to 0's and all 0's to 1's -Ex: 10101010 becomes 01010101

Index

determines which cache set the address should reside in

Page Faults

a type of interrupt, called a trap, raised by computer hardware when a running program accesses a memory page that is mapped into the virtual address space but not actually loaded into main memory.

Simultaneous Multithreading (SMT)

instructions from multiple threads are issued during a single clock cycle. The goal of SMT is to use the resources of a multiple-issue, dynamically scheduled processor to exploit thread-level parallelism at the same time it exploits instruction-level parallelism. It is the best form of multithreading, but only applicable to superscalar CPUs

Coarse-Grained Multithreading

an alternative to fine-grained multithreading; switches threads only when a costly stall is encountered. The advantage of this scheme is that individual threads are not slowed down, since instructions from other threads are executed only when the current thread encounters a costly stall.

Set associative

n choices within a set (Block address modulo number of sets in cache)

Higher associativity

reduces miss rate - Increases complexity, cost, and access time

Page Table

the data structure used by a virtual memory system in a computer operating system to store the mapping between virtual addresses and physical addresses

Fine-Grained Multithreading

the processor switches between threads on each instruction, resulting in multiple threads being interleaved. A major advantage of fine-grained multithreading is that throughput losses from both short and long stalls can be hidden since instructions from other threads can be executed when a thread stalls

Faster Multiplication

use multiple adders
-a cost/performance tradeoff: sacrifice area (cost) for speed
-can be pipelined

