CPSC 474 Midterm

The fraction of ____ __________ satisfied by the cache is called the _____ ___ _____ of the computation on the system

Data references, cache hit ratio

What are some tradeoffs of multithreading and prefetching? (3)

1) Bandwidth requirements of a multithreaded system may increase very significantly because of the smaller cache residency of each thread. 2) Multithreaded systems become bandwidth bound instead of latency bound. 3) Multithreading and prefetching also require significantly more hardware resources in the form of storage.

What two parameters capture memory system performance? (2)

1) Latency - time from issue of a memory request to the time the data is available at the processor 2) Bandwidth - rate at which data can be pumped to the processor by the memory system

What are some methods for minimizing interaction overheads? (4)

1) Maximize data locality 2) Minimize volume of data exchange 3) Minimize frequency of interactions 4) Minimize contention and hotspots

What are some pros of using OpenMP? (5)

1) No need for major changes to the serial code 2) Portable 3) Scalable 4) Data decomposition handled automatically 5) No need to deal with message passing

What are two approaches for hiding memory latency? (2)

1) Prefetching 2) Multithreading
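
As a concrete illustration of prefetching, here is a minimal sketch assuming GCC's __builtin_prefetch intrinsic and a made-up prefetch distance (compilers and hardware prefetchers often do this automatically):

    /* Sum an array while prefetching an element 16 iterations ahead,
       so the DRAM access overlaps with useful computation. */
    #include <stddef.h>

    double sum_with_prefetch(const double *a, size_t n) {
        double sum = 0.0;
        for (size_t i = 0; i < n; i++) {
            if (i + 16 < n)
                __builtin_prefetch(&a[i + 16], 0, 1); /* 0 = read, 1 = low temporal locality */
            sum += a[i];
        }
        return sum;
    }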

While there is no single recipe that works for all problems, we use a set of commonly used decomposition techniques that apply to broad classes of problems. Name any two decomposition techniques.

1) Recursive - divide and conquer 2) Speculative - when dependencies between tasks are not known beforehand

What are the four decomposition techniques? (4)

1) Recursive decomposition 2) Data decomposition (can be input, output, or both) 3) Exploratory decomposition 4) Speculative decomposition

What are some cons of using OpenMP (4)

1) Requires a compiler that supports OpenMP 2) Currently runs efficiently only on shared-memory multiprocessor architectures 3) Scalability is limited by the computer's architecture 4) Cannot be used on GPUs

What are two examples of control structures for parallel programs? (2)

1) SIMD - Single Instruction Multiple Data 2) MIMD - Multiple Instruction Multiple Data

What are some relevant task characteristics? (3)

1) Task generation 2) Task sizes 3) Size of data associated with tasks.

Consider a processor operating at 1 GHz (1 ns clock) connected to a DRAM with a latency of 100 ns (no caches). Assume the processor has two multiply-add units and is capable of executing four instructions in each cycle of 1 ns, so its peak rating is 4 GFLOPS. Every time a memory request is made, the processor must wait 100 cycles before it can process the data. Now consider computing the dot-product of two vectors, where each step performs one multiply-add on a single pair of vector elements. Calculate the peak achievable performance of this dot-product computation.

10 MFLOPS
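
To see where the 10 MFLOPS comes from: each multiply-add needs one element from each vector, i.e., two memory fetches at 100 ns each (200 ns total), and it performs two floating-point operations (a multiply and an add). That gives 2 FLOPs / 200 ns = 10 MFLOPS, a small fraction of the 4 GFLOPS peak rating.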

Go over diameter, bisection width, and cost of a Crossbar, Hypercube, and Omega Network

Crossbar: diameter 1, bisection width p, cost p^2
Hypercube: diameter log p, bisection width p/2, cost (p log p)/2
Omega network: diameter log p, bisection width p/2, cost p/2

What is the Data Communication Argument?

As networks have evolved, the vision of the Internet as one large computing platform has emerged (e.g., SETI@home, Folding@home). With such large volumes of data to be analyzed, parallel techniques must be employed.

Why is it important to learn about master-slave threads?

Before calculations can be done, the work has to be divided among the processors. To increase performance, one thread (the master) is in charge of distributing the work and eventually gathering the results, while the remaining threads (the slaves) perform the calculations.
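
A minimal OpenMP-flavoured sketch of that pattern (hypothetical work array and problem size, not code from the course):

    #include <stdio.h>
    #include <omp.h>

    #define N 1000

    int main(void) {
        double work[N];
        double result = 0.0;

        #pragma omp parallel
        {
            /* The master thread prepares (distributes) the work. */
            #pragma omp master
            {
                for (int i = 0; i < N; i++)
                    work[i] = i * 0.5;
            }
            #pragma omp barrier          /* wait until the work is ready */

            /* All threads share the calculations. */
            #pragma omp for reduction(+:result)
            for (int i = 0; i < N; i++)
                result += work[i];

            /* The master gathers/reports the result. */
            #pragma omp master
            printf("result = %f\n", result);
        }
        return 0;
    }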

The number of tasks that can be executed in parallel is the ______ __ ___________ of a decomposition

Degree of concurrency

Which of the following can be used to minimize interaction overheads? A) Maximize data locality B) Minimize volume of data exchange C) Minimize frequency of interactions D) Minimize contention and hotspots E) All of the above

E

Memory layouts and organizing computation appropriately cannot make a significant impact on the spatial and temporal locality (T/F)

False. It CAN make a significant impact.

Memory bandwidth cannot be improved by increasing the size of memory blocks (T/F)

False. Memory bandwidth CAN be improved by increasing the size of memory blocks

A ______ decomposition combines more than one of the mentioned decomposition techniques.

Hybrid

What is the difference between Message Passing and Shared Address Space Platforms?

Message passing requires little hardware support, other than a network. Shared address space platforms can easily emulate message passing. The opposite is more difficult to do.

What is the computational power argument?

Moore's Law, which states that circuit complexity doubles every year. Later revised to every eighteen months.

The physical complexity of an Ideal Parallel computer is _____, which is (realizable/unrealizable)

O(mp) where m is the number of words and p is the number of processors, and unrealizable

KNOW Assignments 1 and 2

Ok

Know commands for pragmas, reductions, private, etc. in OMP

Ok
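
For reference, a small self-contained example exercising the parallel for pragma with private and reduction clauses (variable names are made up):

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        int n = 1000;
        double sum = 0.0, tmp;

        /* Each thread gets its own copy of tmp (private); the per-thread
           partial sums are combined into sum at the end (reduction). */
        #pragma omp parallel for private(tmp) reduction(+:sum)
        for (int i = 0; i < n; i++) {
            tmp = i * 0.5;
            sum += tmp;
        }

        printf("sum = %f with %d max threads\n", sum, omp_get_max_threads());
        return 0;
    }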

Know slides 88 and 89

Ok

Know various outputs of HelloWorld.c in OMP

Ok
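
A typical HelloWorld.c looks roughly like the following (the exact file from class may differ):

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        #pragma omp parallel
        {
            /* Every thread in the team executes this block. */
            printf("Hello World from thread %d of %d\n",
                   omp_get_thread_num(), omp_get_num_threads());
        }
        return 0;
    }

Because the threads run concurrently, the greeting lines can come out in any order and the interleaving can change from run to run, which is why several different outputs are all valid.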

What is pipelining?

Overlapping various stages of instruction execution to improve performance.

The _____ ________ ____ generally states that the process assigned to a particular data item is responsible for all computation associated with it

Owner Computes Rule

As microprocessor clock speeds improve, the question arises of how best to utilize the available resources. Current processors use these resources to provide multiple functional units and to execute multiple instructions in the same cycle. This is done by __________ and ___________ _________

Pipelining, superscalar execution

In OpenMP all the threads can access ______ ______ ____. _______ ______ ____ can only be accessed by the thread that owns it.

Shared memory data, private memory data
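
A small illustration of the two kinds of data (hypothetical variables, not course code):

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        int counter = 0;   /* shared: one copy visible to every thread */
        int scratch;       /* made private below: one copy per thread  */

        #pragma omp parallel shared(counter) private(scratch)
        {
            scratch = omp_get_thread_num();  /* safe: per-thread copy        */
            #pragma omp atomic
            counter += scratch;              /* shared: update needs atomic  */
        }

        printf("counter = %d\n", counter);
        return 0;
    }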

What is the difference between Shared-Address-Space and Shared Memory Machines?

Shared-Address-Space is a programming abstraction and Shared Memory Machines are a physical machine attribute

Task generation can be ______ or _______

Static, dynamic

Task interactions may be ______ or _______, as well as _______ or _________

Static, dynamic, regular, irregular

A directed graph with nodes corresponding to tasks and edges indicating that the result of one task is required for processing is called a ____ __________ _____

Task dependency graph

What is the critical path length?

The longest path that determines the shortest time in which the program can be executed in parallel

What is the maximum degree of concurrency?

The number of tasks that can be executed in parallel

Exploiting spatial and temporal locality in applications is critical for amortizing memory latency and increasing effective memory bandwidth (T/F)

True

It is better to use group communications instead of point to point primitives (T/F)

True

Memory bandwidth is determined by the bandwidth of the memory bus as well as the memory units (T/F)

True

The ratio of the number of operations to number of memory accesses is a good indicator of anticipated tolerance to memory bandwidth (T/F)

True

In general, the number of tasks in a decomposition exceeds the number of processing elements available. (T/F)

True.

Task sizes can be _______ or __________

Uniform, nonuniform

What is a fork-and-join model?

When a program starts running, it consists of just one thread. The master thread then uses OpenMP directives to fork additional threads that execute the parallel section. When all the threads finish executing their instructions, the results are synchronized (joined) and the program continues to completion.

What is race condition?

When multiple threads read and modify the same variable at the same time without synchronization, so the final value depends on the order in which the threads happen to run.
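
A classic example of a race in OpenMP, together with one way to remove it (a sketch, not code from the assignments):

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        int count = 0;

        /* RACE: each thread does an unsynchronized read-modify-write of
           count, so updates can be lost and the result varies between runs. */
        #pragma omp parallel num_threads(2)
        {
            for (int i = 0; i < 100000; i++)
                count++;
        }
        printf("racy count   = %d (expected 200000)\n", count);

        count = 0;
        #pragma omp parallel num_threads(2)
        {
            for (int i = 0; i < 100000; i++) {
                #pragma omp atomic
                count++;        /* atomic update removes the race */
            }
        }
        printf("atomic count = %d\n", count);
        return 0;
    }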

What is the memory/disk speed argument?

While processor clock rates have increased rapidly, DRAM access times have improved much more slowly, which causes performance bottlenecks. Parallel platforms provide increased bandwidth to the memory system.

Review slide 36

ok

Know slide 21

okay

Mapping techniques may be ______ or _______

static, dynamic

