Parallel Programming Exam 1


When running an MPI program with 10 processes that call MPI_Gather using the default communicator, how many processes will receive the data?

1

When running an MPI program with 10 processes that call MPI_Scatter using the default communicator where the source process scatters an array of 40 elements in total, how many elements does each destination process receive?

4

When running an MPI program with 10 processes that call MPI_Bcast using the default communicator where the source process sends an array of 5 elements, how many elements does each destination process receive?

5

When running an MPI program with 10 processes that call MPI_Reduce using the default communicator where each source process sends an array of 5 elements, how many elements does the destination process receive?

5

When running an MPI program with 10 processes that call MPI_Gather using the default communicator where each source process sends an array of 5 elements, how many elements does the destination process receive?

50
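
A minimal C/MPI sketch of the last question above (buffer names are my own, not from the exam): with 10 ranks each contributing 5 elements, the root's receive buffer must hold 10 * 5 = 50 elements, while the per-rank send and receive counts stay at 5.

    #include <mpi.h>
    #include <stdlib.h>

    int main(void)
    {
        MPI_Init(NULL, NULL);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);    /* 10 when launched with 10 processes */

        double local[5] = {0};                   /* 5 elements contributed per rank */
        double *all = NULL;
        if (rank == 0)
            all = malloc(size * 5 * sizeof(double));  /* 10 * 5 = 50 elements at the root */

        /* sendcount and recvcount are both the per-rank count (5), not the total */
        MPI_Gather(local, 5, MPI_DOUBLE, all, 5, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        if (rank == 0)
            free(all);
        MPI_Finalize();
        return 0;
    }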

A mutex guarantees fairness whereas a lock does not.

False

A speedup below 1.0 implies a parallelism (i.e., correctness) bug.

False

Acquiring a lock by one thread before accessing a shared memory location prevents other threads from being able to access the same shared memory location, even if the other threads do not acquire a lock.

False

Assigning an equal number of array elements to each thread guarantees that there will be load balance.

False

Barriers and locks perform the same synchronization operation.

False

Data races involve at least two write operations.

False

Embarrassingly parallel code may include barriers.

False

Every parallel program requires explicit synchronization.

False

Finding a maximal independent set is NP-hard.

False

GPUs are good at exploiting task parallelism.

False

Graphs always have at least as many edges as they have vertices.

False

In all collective communication MPI functions discussed in class, all ranks are both senders and receivers.

False

In the call MPI_Reduce(a, b, c, MPI_DOUBLE, MPI_MAX, 2, MPI_COMM_WORLD), each process contributes 2 elements to the reduction.

False

In the call MPI_Reduce(a, b, c, MPI_DOUBLE, MPI_MAX, 2, MPI_COMM_WORLD), process b is the destination.

False

In the call MPI_Reduce(a, b, c, MPI_DOUBLE, MPI_MAX, 2, MPI_COMM_WORLD), the result is written into the "c" array.

False
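
For reference, a compilable sketch of the MPI_Reduce call from the three questions above, with the parameter roles spelled out (the count of 3 and the array contents are made up; run with at least 3 ranks so that root rank 2 exists):

    #include <mpi.h>

    int main(void)
    {
        MPI_Init(NULL, NULL);

        double a[3] = {1.0, 2.0, 3.0};   /* send buffer: each rank's local contribution */
        double b[3];                     /* receive buffer: the result lands here (at the root only) */
        int    c = 3;                    /* count: number of elements EACH rank contributes */

        /* MPI_MAX: element-wise maximum across all ranks; 2 is the root (destination) rank */
        MPI_Reduce(a, b, c, MPI_DOUBLE, MPI_MAX, 2, MPI_COMM_WORLD);

        MPI_Finalize();
        return 0;
    }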

In the compressed adjacency-list representation (CSR), one of the arrays has exactly as many elements as there are vertices in the graph.

False

Indeterminacy is a form of synchronization.

False

Indeterminacy is a guaranteed indication of a parallel programming bug.

False

It is possible to have a data race on a private variable (assuming no pointer variables).

False

It is possible to have a data race on a private variable.

False

MPI programs cannot suffer from indeterminacy.

False

MPI programs must be run with at least two processes.

False

MPI_Allgather performs one-to-many communication.

False

MPI_Allreduce performs one-to-many communication.

False

MPI_Gather performs one-to-many communication.

False

MPI_Recv may return before the message has actually been received.

False

MPI_Reduce implies a barrier.

False

MPI_Reduce performs one-to-many communication.

False

MPI_Send performs one-to-many communication.

False

MPI_Ssend may return before the receiver has started receiving the message.

False

Most programs can automatically be efficiently parallelized.

False

Mutexes and locks guarantee fairness.

False

Programs running on pure distributed-memory systems can suffer from data races.

False

Reductions always compute a sum.

False

Reductions are embarrassingly parallel.

False

Returning from an MPI_Reduce call by any process implies that the process receiving the reduced result has already reached its MPI_Reduce call.

False

Shared-memory systems require explicit communication to transfer data between threads.

False

Task parallelism tends to scale more than data parallelism.

False

The adjacency matrix of a graph typically requires O(n) memory, where n is the number of vertices.

False

The call MPI_Reduce(a, b, c, MPI_DOUBLE, MPI_MAX, 2, MPI_COMM_WORLD) performs a sum reduction.

False

The receive buffer size parameter in MPI_Recv calls specifies the exact length of the message to be received (in number of elements).

False
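
A small sketch of why this is false (the message size and rank assignments are my own): the count passed to MPI_Recv is only the capacity of the buffer; MPI_Get_count on the returned status reports how many elements actually arrived. Run with at least 2 processes.

    #include <mpi.h>
    #include <stdio.h>

    int main(void)
    {
        MPI_Init(NULL, NULL);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 1) {
            int msg[7] = {0};
            MPI_Send(msg, 7, MPI_INT, 0, 0, MPI_COMM_WORLD);        /* actually send 7 ints */
        } else if (rank == 0) {
            int buf[100];                                           /* capacity for up to 100 ints */
            MPI_Status status;
            MPI_Recv(buf, 100, MPI_INT, 1, 0, MPI_COMM_WORLD, &status);
            int received;
            MPI_Get_count(&status, MPI_INT, &received);             /* reports 7, the actual length */
            printf("received %d ints\n", received);
        }

        MPI_Finalize();
        return 0;
    }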

The serial fractal code from the projects contains a loop-carried data dependence in the innermost loop (that iterates over the columns).

False

The spanning tree of a graph is always unique.

False

Two adjacent vertices can be part of the same MIS.

False

When joining a thread, that thread is immediately terminated.

False

When protecting a critical section with a lock, the threads are guaranteed to enter the critical section in the order in which they first tried to acquire the lock.

False

After which of the following MPI calls does the sender know that the receiver has started receiving the message? Check all that apply.

MPI_Ssend

Efficiency Formula

Speedup / # of threads
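For example, a speedup of 6 on 8 threads gives an efficiency of 6 / 8 = 0.75.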

A cyclic distribution of the elements in an array is useful for load balancing when the amount of work per element decreases with increasing array indices.

True

A cyclic distribution of the elements in an array is useful for load balancing when the amount of work per element increases with increasing array indices.

True
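
A minimal sketch of a cyclic distribution covering both cases above (the function and names are illustrative, not from the projects): worker rank handles elements rank, rank + nworkers, rank + 2*nworkers, and so on, so monotonically increasing or decreasing per-element work gets interleaved evenly across the workers.

    #include <stdio.h>

    /* hypothetical per-element work */
    static void process(int i) { printf("element %d\n", i); }

    /* cyclic (round-robin) assignment of n elements to nworkers workers */
    static void do_my_share(int rank, int nworkers, int n)
    {
        for (int i = rank; i < n; i += nworkers)
            process(i);
    }

    int main(void)
    {
        do_my_share(0, 4, 16);   /* worker 0 of 4 handles elements 0, 4, 8, 12 */
        return 0;
    }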

A disconnected graph can contain cycles.

True

A lock is a synchronization primitive.

True

A maximal independent set (MIS) can contain all the vertices of a graph.

True

A path through a graph can contain more elements than the total number of vertices in the graph.

True

A single call to MPI_Reduce by each process suffices to reduce local histograms with many buckets into a global histogram.

True
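
A sketch of that single call (the bucket count of 64 is made up): with count set to the number of buckets and MPI_SUM as the operator, MPI_Reduce adds the local histograms element-wise into a global histogram at the root.

    #include <mpi.h>

    #define NUM_BUCKETS 64

    int main(void)
    {
        MPI_Init(NULL, NULL);

        int local_hist[NUM_BUCKETS] = {0};    /* filled by each process's local counting */
        int global_hist[NUM_BUCKETS];

        /* one call reduces all buckets at once; only rank 0 receives global_hist */
        MPI_Reduce(local_hist, global_hist, NUM_BUCKETS, MPI_INT, MPI_SUM,
                   0, MPI_COMM_WORLD);

        MPI_Finalize();
        return 0;
    }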

All collective communication MPI functions we have discussed in the lectures imply some synchronization.

True

Data parallelism is a good match for SIMD machines.

True

Data races involve at least two threads.

True

Deadlock is a parallelism bug.

True

Distributed-memory systems generally scale to more CPU cores than shared-memory systems.

True

Embarrassingly parallel programs can suffer from load imbalance.

True

Frequency scaling was replaced by core scaling due to power density concerns.

True

If each process calls MPI_Reduce k > 1 times with the same destination, operator, and communicator, the reductions are matched based on the order in which they are called.

True

In MPI_Gather, every process must pass a parameter for the destination buffer, even processes that will not receive the result of the gather.

True

In MPI_Gather, rank 0 always contributes the first chunk of the result.

True

In the compressed adjacency-list representation (CSR), one of the arrays has exactly as many elements as there are edges in the graph.

True
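
A small CSR sketch (the array names and example graph are my own): for a directed graph with 4 vertices and the 5 edges 0->1, 0->2, 1->2, 2->0, 3->2, the offsets array has one entry per vertex plus a final sentinel (vertices + 1 elements, which is why the earlier question about the vertex count is False), while the neighbor array has exactly one entry per edge.

    /* offsets: 4 vertices + 1 sentinel; neighbors: exactly 5 edges */
    int offsets[5]   = {0, 2, 3, 4, 5};
    int neighbors[5] = {1, 2, 2, 0, 2};

    /* the adjacency list of vertex v is neighbors[offsets[v] .. offsets[v + 1] - 1] */
    void visit_neighbors(int v)
    {
        for (int i = offsets[v]; i < offsets[v + 1]; i++) {
            int w = neighbors[i];
            (void)w;   /* visit neighbor w here */
        }
    }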

In this course, we should always call MPI_Init with two NULL parameters.

True

Indeterminacy can result in garbled program output.

True

It is possible to construct a dense graph that contains only three edges.

True

Load imbalance is a situation where some threads/processes execute longer than others.

True

Luby's MIS algorithm is likely to finish in O(log n) parallel steps, where n is the number of vertices.

True

MPI_Aint denotes an integer type that is large enough to hold an address.

True

MPI_Allgather can be emulated with MPI_Gather followed by MPI_Bcast.

True
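
A sketch of that emulation (gathering one int per rank is my own choice of payload): the gather is many-to-one to rank 0, and the broadcast then makes the concatenated result available on every rank, which is what MPI_Allgather delivers in a single call.

    #include <mpi.h>
    #include <stdlib.h>

    int main(void)
    {
        MPI_Init(NULL, NULL);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int local = rank;
        int *all = malloc(size * sizeof(int));

        /* same end result as MPI_Allgather(&local, 1, MPI_INT, all, 1, MPI_INT, MPI_COMM_WORLD) */
        MPI_Gather(&local, 1, MPI_INT, all, 1, MPI_INT, 0, MPI_COMM_WORLD);  /* many-to-one */
        MPI_Bcast(all, size, MPI_INT, 0, MPI_COMM_WORLD);                    /* one-to-many */

        free(all);
        MPI_Finalize();
        return 0;
    }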

MPI_Allgather has a similar communication pattern (sends and receives of messages) as MPI_Allreduce.

True

MPI_Allgather implies a barrier.

True

MPI_Allreduce is typically faster than calling MPI_Reduce followed by MPI_Bcast.

True

MPI_Bcast performs one-to-many communication.

True

MPI_Gather may be non-blocking on more than one of the involved processes.

True

MPI_Reduce has a similar communication pattern (sends and receives of messages) as MPI_Gather.

True

MPI_Scatter performs one-to-many communication.

True

Matrix-vector multiplication is embarrassingly parallel.

True

Mutexes and locks are the same kind of synchronization primitive.

True

Parallelism can be used to speed up a computation or to reduce its energy consumption.

True

Programs with data races may produce indeterminate output.

True

Reduction operations can be implemented using a reduction tree.

True
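
A sketch of a tree-based sum reduction built from point-to-point messages (my own illustration, not necessarily how any particular MPI library implements MPI_Reduce): partial sums are combined pairwise, so the reduction finishes in about log2(p) communication rounds.

    #include <mpi.h>
    #include <stdio.h>

    int main(void)
    {
        MPI_Init(NULL, NULL);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int partial = rank + 1;   /* each rank's local value */

        for (int step = 1; step < size; step *= 2) {
            if (rank % (2 * step) == step) {
                /* hand the partial sum to the parent in the reduction tree */
                MPI_Send(&partial, 1, MPI_INT, rank - step, 0, MPI_COMM_WORLD);
                break;
            } else if (rank % (2 * step) == 0 && rank + step < size) {
                int incoming;
                MPI_Recv(&incoming, 1, MPI_INT, rank + step, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                partial += incoming;   /* combine the two subtree results */
            }
        }

        if (rank == 0)
            printf("sum = %d\n", partial);   /* 1 + 2 + ... + size */

        MPI_Finalize();
        return 0;
    }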

Reduction operations tend to be more frequent in parallel programs than in serial programs.

True

SPMD typically includes task parallelism.

True

Sending one long message in MPI is typically more efficient than sending multiple short messages with the same total amount of data.

True

The MPI_Gather function concatenates the data from all involved processes by rank order.

True

The adjacency list of a vertex can be empty.

True

The busy-waiting code from the slides contains a data race.

True

The communication patterns (one-to-one, one-to-many, many-to-one, or many-to-many) of MPI_Scatter and MPI_Bcast are identical.

True

Undirected graphs always have a symmetric adjacency matrix.

True

When a thread attempts to acquire a lock that is already taken, it is blocked until it obtains the lock.

True

When parallelized, the Collatz code from the projects is likely to suffer from load imbalance.

True

