CPSC 474 Midterm
The fraction of ____ __________ satisfied by the cache is called the _____ ___ _____ of the computation on the system
Data references, cache hit ratio
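The definition above can be checked on a small trace. The sketch below is only an illustration: it assumes a fully associative cache with least-recently-used replacement, and the names cache_hit_ratio, references, and cache_lines are hypothetical, not from the course material.

```python
from collections import OrderedDict


def cache_hit_ratio(references, cache_lines):
    """Fraction of data references satisfied by a small LRU cache.

    Assumption: fully associative cache holding `cache_lines` addresses,
    with least-recently-used replacement.
    """
    if cache_lines < 1:
        raise ValueError("cache_lines must be at least 1")
    cache = OrderedDict()  # keys are cached addresses, oldest first
    hits = 0
    for address in references:
        if address in cache:
            hits += 1
            cache.move_to_end(address)  # mark as most recently used
        else:
            if len(cache) >= cache_lines:
                cache.popitem(last=False)  # evict the least recently used
            cache[address] = True
    return hits / len(references) if references else 0.0
```

Repeating a small working set raises the ratio, which is the effect that temporal locality is meant to exploit.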
What are some tradeoffs of multithreading and prefetching? (3)
1) Bandwidth requirements of a multithreaded system may increase very significantly because of the smaller cache residency of each thread. 2) Multithreaded systems become bandwidth bound instead of latency bound. 3) Multithreading and prefetching also require significantly more hardware resources in the form of storage.
What two parameters capture memory system performance? (2)
1) Latency - time from issue of a memory request to the time the data is available at the processor 2) Bandwidth - rate at which data can be pumped to the processor by the memory system
What are some methods for minimizing interaction overheads? (4)
1) Maximize data locality 2) Minimize volume of data exchange 3) Minimize frequency of interactions 4) Minimize contention and hotspots
What are some pros of using OpenMP? (5)
1) No need for major changes in serial code 2) Portable 3) Scalability 4) Data decomposition handled automatically 5) Does not need to deal with message passing
What are two approaches for hiding memory latency? (2)
1) Prefetching 2) Multithreading
While there is no single recipe that works for all problems, we use a set of commonly used decomposition techniques that apply to broad classes of problems. Name any two decomposition techniques.
1) Recursive - divide and conquer 2) Speculative - when dependencies between tasks are not known beforehand
What are the four decomposition techniques? (4)
1) Recursive decomposition 2) Data decomposition (can be input, output, or both) 3) Exploratory decomposition 4) Speculative decomposition
What are some cons of using OpenMP? (4)
1) Requires a compiler that supports OpenMP 2) Currently only runs efficiently on shared-memory multiprocessor architectures 3) Scalability is limited by the computer's architecture 4) Historically could not be used on GPUs (device offloading was only added in OpenMP 4.0)
What are two examples of control structures for parallel programs? (2)
1) SIMD - Single Instruction Multiple Data 2) MIMD - Multiple Instruction Multiple Data
What are some relevant task characteristics? (3)
1) Task generation 2) Task sizes 3) Size of data associated with tasks.
Consider a processor operating at 1 GHz (1 ns clock) connected to a DRAM with a latency of 100 ns (no caches). Assume that the processor has two multiply-add units and is capable of executing four instructions in each cycle of 1 ns. Two observations follow: the peak processor rating is 4 GFLOPS, and every time a memory request is made, the processor must wait 100 cycles before it can process the data. Now consider computing the dot product of two vectors, which performs one multiply-add on a single pair of vector elements, so each floating-point operation requires one data fetch. Calculate the peak speed of this computation.
10 MFLOPS (one floating-point operation per 100 ns memory access)
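The arithmetic behind this answer can be sketched as follows. The function name and parameters are hypothetical; words_per_fetch is an added assumption that models a memory block delivering several words per access, which is why larger blocks raise the effective rate.

```python
def memory_bound_mflops(latency_ns, words_per_fetch=1, flops_per_word=1):
    """Peak speed in MFLOPS when every memory access costs `latency_ns`.

    Assumptions: no overlap of memory accesses, and each fetched word
    feeds `flops_per_word` floating-point operations.
    One operation per nanosecond equals 1000 MFLOPS.
    """
    return 1000.0 * words_per_fetch * flops_per_word / latency_ns


# Values from the card above: 100 ns DRAM latency, 4 GFLOPS peak rating.
dot_product_mflops = memory_bound_mflops(latency_ns=100)
fraction_of_peak = dot_product_mflops / 4000.0
```

With these numbers the computation reaches only a quarter of a percent of the 4 GFLOPS rating, which is why memory latency rather than the processor limits performance here.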
Go over diameter, bisection width, and cost of a Crossbar, Hypercube, and Omega Network
Crossbar: diameter 1, bisection width p, cost p^2. Hypercube: diameter log p, bisection width p/2, cost (p log p)/2. Omega network: diameter log p, bisection width p/2, cost p/2.
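To get a feel for how these figures grow, the formulas on this card can be evaluated for a concrete p. This is a minimal sketch that assumes p is a power of two and takes log to base 2; network_metrics and log2_exact are hypothetical helper names, and the costs are the ones listed on the card.

```python
def log2_exact(p):
    """Return log2(p) for a power of two p >= 2, else raise ValueError."""
    if p < 2 or p & (p - 1):
        raise ValueError("p must be a power of two and at least 2")
    return p.bit_length() - 1


def network_metrics(p):
    """Diameter, bisection width, and cost for p processors,
    using the formulas from the card above."""
    d = log2_exact(p)
    return {
        "crossbar": {"diameter": 1, "bisection_width": p, "cost": p * p},
        "hypercube": {"diameter": d, "bisection_width": p // 2,
                      "cost": p * d // 2},
        "omega": {"diameter": d, "bisection_width": p // 2,
                  "cost": p // 2},
    }
```

For example, network_metrics(8)["hypercube"] gives a diameter of 3, a bisection width of 4, and a cost of 12 links, while the crossbar for the same p costs 64.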
What is the Data Communication Argument?
As the network evolves, the vision of the Internet as one large computing platform has emerged (e.g., SETI@home, Folding@home). With such large volumes of data to be analyzed, parallel techniques must be employed.
Why is it important to learn about master slave threads?
Before any calculations are done, the work has to be divided among the processors. To increase performance, one thread (the master) is in charge of distributing the work and eventually gathering the results, while the rest of the threads do the calculations.
The number of tasks that can be executed in parallel is the ______ __ ___________ of a decomposition
Degree of concurrency
Which of the following can be used to minimize interaction overheads? A) Maximize data locality B) Minimize volume of data exchange C) Minimize frequency of interactions D) Minimize contention and hotspots E) All of the above
E
Memory layouts and organizing computation appropriately cannot make a significant impact on the spatial and temporal locality (T/F)
False. It CAN make a significant impact.
Memory bandwidth cannot be improved by increasing the size of memory blocks (T/F)
False. Memory bandwidth CAN be improved by increasing the size of memory blocks
A ______ decomposition combines more than one of the mentioned decomposition techniques.
Hybrid
What is the difference between Message Passing and Shared Address Space Platforms?
Message passing requires little hardware support, other than a network. Shared address space platforms can easily emulate message passing. The opposite is more difficult to do.
What is the computational power argument?
The argument based on Moore's Law, which states that circuit complexity doubles every year (later revised to every eighteen months).
The physical complexity of an Ideal Parallel computer is _____, which is (realizable/unrealizable)
O(mp) where m is the number of words and p is the number of processors, and unrealizable
KNOW Assignments 1 and 2
Ok
Know commands for pragmas, reductions, private, etc. in OMP
Ok
Know slides 88 and 89
Ok
Know various outputs of HelloWorld.c in OMP
Ok
What is pipelining?
Overlapping the various stages of instruction execution to improve performance.
The _____ ________ ____ generally states that the process assigned to a particular data item is responsible for all computation associated with it
Owner Computes Rule
As microprocessor clock speeds improve, the question arises of how best to utilize the available hardware resources. Current processors use these resources in multiple functional units and execute multiple instructions in the same cycle. This is done by __________ and ___________ _________
Pipelining, superscalar execution
In OpenMP all the threads can access ______ ______ ____. _______ ______ ____ can only be accessed by the thread that owns it.
Shared memory data, private memory data
What is the difference between Shared-Address-Space and Shared Memory Machines?
Shared-Address-Space is a programming abstraction and Shared Memory Machines are a physical machine attribute
Task generation can be ______ or _______
Static, dynamic
Task interactions may be ______ or _______, as well as _______ or _________
Static, dynamic, regular, irregular
A directed graph with nodes corresponding to tasks and edges indicating that the result of one task is required for processing is called a ____ __________ _____
Task dependency graph
What is the critical path length?
The length of the longest directed path in the task dependency graph. It determines the shortest time in which the program can be executed in parallel.
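A minimal sketch of computing the critical path length on a small task dependency graph. The graph, the task names, and the work values are made up for illustration, and the graph is assumed to be acyclic.

```python
from functools import lru_cache


def critical_path_length(work, depends_on):
    """Length of the longest weighted path in a task dependency graph.

    work:       dict mapping each task to its amount of work
    depends_on: dict mapping a task to the tasks whose results it needs
    Assumption: the graph has no cycles.
    """
    @lru_cache(maxsize=None)
    def finish_time(task):
        # A task can start only after all of its prerequisites finish.
        preds = depends_on.get(task, ())
        return work[task] + max((finish_time(p) for p in preds), default=0)

    return max((finish_time(t) for t in work), default=0)


# Hypothetical example: four independent tasks feed two combining
# tasks, which feed a final task.
work = {"a": 10, "b": 10, "c": 10, "d": 10, "e": 6, "f": 9, "g": 8}
depends_on = {"e": ("a", "b"), "f": ("c", "d"), "g": ("e", "f")}
length = critical_path_length(work, depends_on)
```

In this example the total work is 63 units, but no schedule can finish in fewer than 27 units, however many processors are available.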
What is the maximum degree of concurrency?
The maximum number of tasks that can be executed in parallel at any point during execution
Exploiting spatial and temporal locality in applications is critical for amortizing memory latency and increasing effective memory bandwidth (T/F)
True
It is better to use group communications instead of point to point primitives (T/F)
True
Memory bandwidth is determined by the bandwidth of the memory bus as well as the memory units (T/F)
True
The ratio of the number of operations to number of memory accesses is a good indicator of anticipated tolerance to memory bandwidth (T/F)
True
In general, the number of tasks in a decomposition exceeds the number of processing elements available. (T/F)
True.
Task sizes can be _______ or __________
Uniform, nonuniform
What is a fork-and-join model?
When a program starts to run, it consists of just one thread, the master thread. When the master thread reaches an OpenMP parallel directive, it forks a team of threads that execute the parallel section. When all the threads finish executing their instructions, they synchronize (join) and the master thread continues through to completion.
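The same fork-and-join shape can be sketched outside OpenMP. The example below is a Python analogy using a thread pool, not OpenMP itself; fork_join_sum and its parameters are hypothetical names chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def fork_join_sum(values, num_workers=4):
    """Sum a list with a fork-and-join structure.

    The calling (master) thread splits the data, forks workers that
    each sum one chunk, waits for all of them, then combines results.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    # Ceiling division so every element lands in exactly one chunk.
    chunk = max(1, -(-len(values) // num_workers))
    chunks = [values[i:i + chunk] for i in range(0, len(values), chunk)]
    # Fork: each chunk is handed to a worker thread.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_sums = list(pool.map(sum, chunks))
    # Join: leaving the with-block waits for all workers; the master
    # then continues alone and combines the partial results.
    return sum(partial_sums)
```

Combining the partial sums at the end plays the role that a reduction plays in an OpenMP parallel loop.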
What is race condition?
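When multiple threads access a shared variable at the same time without synchronization and at least one of them changes its value, so the result depends on the timing of the threads.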
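A small sketch of why this matters. The first function replays one unlucky interleaving of two increments by hand, so the lost update is deterministic rather than timing-dependent; the second shows one common fix, a lock around the update. Both function names are hypothetical, and the code is in Python rather than OpenMP.

```python
import threading


def lost_update():
    """Replay an interleaving in which two increments collide."""
    counter = 0
    seen_by_a = counter      # thread A reads 0
    seen_by_b = counter      # thread B reads 0 before A writes back
    counter = seen_by_a + 1  # A writes 1
    counter = seen_by_b + 1  # B also writes 1, so A's update is lost
    return counter


def locked_count(num_threads=4, increments=1000):
    """Increment a shared counter from several threads under a lock."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            with lock:  # only one thread may read-modify-write at a time
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

Two increments should give 2, but the replayed interleaving ends at 1. With the lock, every increment is applied, so the final count equals the number of threads times the increments per thread.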
What is the memory/disk speed argument?
While processor clock rates have increased greatly, DRAM access times have improved much more slowly, which causes performance bottlenecks. Parallel platforms provide increased bandwidth to the memory system.
Review slide 36
ok
Know slide 21
okay
Mapping techniques may be ______ or _______
static, dynamic