CS 4380 Midterm

In an MPI program with 12 processes, what is the smallest rank that any process will have?

0

When running an MPI program with 12 processes that call MPI_Gather using the default communicator, how many processes will receive the data?

1

Given a parallel runtime of 20s on 5 threads and a serial runtime of 50s, what is the runtime in seconds on 10 threads assuming the same efficiency (do not include any units in the answer)?

10
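
One way to check this: efficiency is E = T_serial / (p * T_parallel) = 50 / (5 * 20) = 0.5. Keeping E = 0.5 with p = 10 gives T_parallel = T_serial / (E * p) = 50 / (0.5 * 10) = 10.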

In an MPI program with 12 processes, what is the largest rank that any process will have?

11
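
For reference, ranks in the default communicator always run from 0 to size - 1, so 12 processes have ranks 0 through 11. A minimal sketch of how a program obtains these values (the printf is just for illustration):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char* argv[])
{
  MPI_Init(NULL, NULL);  /* initialized with two NULL parameters, as in this course */

  int size, rank;
  MPI_Comm_size(MPI_COMM_WORLD, &size);  /* number of processes, e.g., 12 */
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's rank: 0 .. size - 1 */

  printf("process %d of %d\n", rank, size);

  MPI_Finalize();
  return 0;
}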

When running an MPI program with 12 processes that call MPI_Scatter using the default communicator where the source process scatters an array of 24 elements in total, how many elements does each destination process receive?

2
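
A sketch of that scenario (root rank and buffer types are illustrative): the 24 elements are split evenly, so each of the 12 processes ends up with 24 / 12 = 2 elements.

int sendbuf[24];                   /* significant only at the root */
int recvbuf[2];
MPI_Scatter(sendbuf, 2, MPI_INT,   /* sendcount is the per-process chunk size */
            recvbuf, 2, MPI_INT,
            0, MPI_COMM_WORLD);    /* rank 0 is the source */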

When running an MPI program with 12 processes that call MPI_Scatter using the default communicator, how many processes will receive a chunk of the data?

12

Assuming a parallel runtime of 20s on 5 threads and a serial runtime of 50s, and a fixed overhead, what is the expected runtime in seconds with 10 threads running on 10 cores (do not include any units in the answer)?

15
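
One way to check this, assuming the runtime model T(p) = T_serial / p + overhead: 50 / 5 + overhead = 20 gives an overhead of 10 s, so T(10) = 50 / 10 + 10 = 15.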

Given a parallel runtime of 20s on 5 threads and a serial runtime of 50s, what is the speedup?

2.5
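
Speedup is the serial runtime divided by the parallel runtime: 50 / 20 = 2.5.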

When running an MPI program with 12 processes that call MPI_Bcast using the default communicator where the source process sends an array of 4 elements, how many elements does each destination process receive?

4

When running an MPI program with 12 processes that call MPI_Reduce using the default communicator where each source process sends an array of 4 elements, how many elements does the destination process receive?

4

When running an MPI program with 12 processes that call MPI_Gather using the default communicator where each source process sends an array of 4 elements, how many elements does the destination process receive?

48
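
A sketch of the gather (root rank and element type are illustrative): each of the 12 processes contributes 4 elements, so the root's receive buffer holds 12 * 4 = 48 elements.

double senddata[4];                  /* 4 elements per process */
double gathered[48];                 /* needed only at the root: 12 * 4 */
MPI_Gather(senddata, 4, MPI_DOUBLE,  /* each process sends 4 elements */
           gathered, 4, MPI_DOUBLE,  /* recvcount is per process, not the total */
           0, MPI_COMM_WORLD);       /* rank 0 receives everything */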

According to Amdahl's law, what is the upper bound on the achievable speedup when 20% of the code is not parallelized?

5
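
By Amdahl's law, with a serial fraction of f = 0.2 the speedup is bounded by 1 / f = 1 / 0.2 = 5, no matter how many cores are used.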

What is the speedup when 10% of the code is not parallelized and the rest of the code is perfectly parallelized (i.e., achieves linear speedup) and executed on 9 cores?

5
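
With a serial fraction of f = 0.1 on p = 9 cores, Amdahl's law gives S = 1 / (f + (1 - f) / p) = 1 / (0.1 + 0.9 / 9) = 1 / 0.2 = 5.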

Given a parallel runtime of 20s on 5 threads and a serial runtime of 50s, what is the efficiency in percent (use a whole number and do not include the "%" symbol in the answer)?

50
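
Efficiency is the speedup divided by the number of threads: (50 / 20) / 5 = 0.5, i.e., 50%.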

Given a parallel runtime of 24s on 20 threads and a serial runtime of 100s, what is the percentage of the computation that was parallelized assuming the parallel section is perfectly parallelized (use a whole number and do not include the "%" symbol in the answer)?

80
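
One way to derive this: with parallel fraction f, the runtime is 100 * (1 - f) + 100 * f / 20 = 24, i.e., 100 - 95 f = 24, so f = 0.8, i.e., 80%.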

A mutex guarantees fairness whereas a lock does not.

False

A speedup below 1.0 implies a parallelism bug.

False

Acquiring a lock by one thread before accessing a shared memory location prevents other threads from being able to access the same shared memory location, even if the other threads do not acquire a lock.

False

All reductions compute a sum.

False

Assigning the same number of array elements to each thread guarantees that there will be no load imbalance.

False

Data races involve at least two write operations.

False

Embarrassingly parallel code may contain mutexes.

False

Every parallel program requires explicit synchronization.

False

Graphs always have at least as many edges as they have vertices.

False

In the call MPI_Reduce(a, z, n, MPI_DOUBLE, MPI_MIN, 4, MPI_COMM_WORLD), each process contributes 4 elements to the reduction.

False

In the call MPI_Reduce(a, z, n, MPI_DOUBLE, MPI_MIN, 4, MPI_COMM_WORLD), process n is the destination.

False

In the call MPI_Reduce(a, z, n, MPI_DOUBLE, MPI_MIN, 4, MPI_COMM_WORLD), the result is written into the "a" array.

False
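
To keep the arguments of that call straight, here it is annotated with the parameter roles from the standard MPI_Reduce signature:

MPI_Reduce(a,               /* send buffer: every process contributes from "a" */
           z,               /* receive buffer: the result is written into "z" at the root */
           n,               /* number of elements contributed by EACH process */
           MPI_DOUBLE,      /* element type */
           MPI_MIN,         /* reduction operator: element-wise minimum, not a sum */
           4,               /* root: the process with rank 4 receives the result */
           MPI_COMM_WORLD); /* communicator */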

Indeterminacy is a form of synchronization.

False

Indeterminacy is a guaranteed indication of a parallel programming bug.

False

MPI programs have to be run with more than one process.

False

MPI_Allgather can be emulated with MPI_Gather followed by MPI_Scatter.

False

MPI_Allgather performs many-to-one communication.

False

MPI_Allreduce performs many-to-one communication.

False

MPI_Bcast implies a barrier.

False

MPI_Bcast performs many-to-one communication.

False

MPI_Recv may return before the message has actually been received.

False

MPI_Recv performs many-to-one communication.

False

MPI_Scatter performs many-to-one communication.

False

MPI_Send implies some synchronization.

False

MPI_Send performs many-to-one communication.

False

MPI_Ssend performs many-to-one communication.

False

Most programs can automatically be efficiently parallelized.

False

Mutexes guarantee fairness but locks do not.

False

Returning from an MPI_Reduce call by any process implies that the process receiving the reduced result has already reached its MPI_Reduce call.

False

Task parallelism is a good match for SIMD machines.

False

The APSP code from the projects is likely to suffer from load imbalance.

False

The MPI_Scatter function concatenates the data from all involved processes.

False

The call MPI_Reduce(a, z, n, MPI_DOUBLE, MPI_MIN, 4, MPI_COMM_WORLD) performs a sum reduction.

False

The receive buffer size parameter in MPI_Recv calls specifies the exact length of the message to be received (in number of elements).

False
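
The count passed to MPI_Recv is only the capacity of the receive buffer; the message may be shorter, and the actual length can be queried from the status. A minimal sketch (source and tag values are arbitrary):

double buf[1024];                 /* capacity, not the exact message length */
MPI_Status status;
MPI_Recv(buf, 1024, MPI_DOUBLE, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &status);

int count;
MPI_Get_count(&status, MPI_DOUBLE, &count);  /* actual number of elements received */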

When joining a thread, that thread is killed.

False

When protecting a critical section with a lock, the threads are guaranteed to enter the critical section in the order in which they first tried to acquire the lock.

False

A barrier is a synchronization primitive.

True

A cyclic distribution of the elements in an array is useful for load balancing when the amount of work per element decreases with decreasing array indices.

True

A cyclic distribution of the elements in an array is useful for load balancing when the amount of work per element decreases with increasing array indices.

True
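
A cyclic distribution interleaves the elements across the threads/processes, which balances the load whenever the per-element work varies systematically with the index (in either direction). A minimal sketch, assuming nprocs processes, n elements, and a hypothetical process_element function:

/* cyclic: rank r handles elements r, r + nprocs, r + 2 * nprocs, ... */
for (int i = rank; i < n; i += nprocs) {
  process_element(i);  /* hypothetical per-element work */
}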

A single call to MPI_Reduce by each process suffices to reduce local histograms with many buckets into a global histogram.

True

Data parallelism tends to scale much more than task parallelism.

True

Data races always involve at least two threads.

True

Data races are a form of indeterminacy.

True

Deadlock is a parallelism bug.

True

Distributed-memory systems generally scale to more CPU cores than shared-memory systems.

True

Distributed-memory systems require explicit communication to transfer data between compute nodes.

True

Embarrassingly parallel programs can suffer from load imbalance.

True

Frequency scaling was replaced by core scaling due to power density concerns.

True

GPUs are good at exploiting data parallelism.

True

If each process calls MPI_Reduce k > 1 times, the reductions are matched based on the order in which they are called.

True

In MPI_Allgather, rank 0 always contributes the first chunk of the result.

True

In MPI_Reduce, every process has to pass a parameter for the destination buffer, even processes that will not receive the result of the reduction.

True

In this course, we should always call MPI_Init with two NULL parameters.

True

Indeterminacy can result in garbled program output.

True

It is impossible to have a data race on a private variable (assuming no pointer variables).

True

It is possible to have a data race on a shared variable.

True

Load imbalance is a situation where some threads/processes execute longer than others.

True

MPI programs can suffer from indeterminacy.

True

MPI_Aint denotes an integer type that is large enough to hold an address.

True

MPI_Allgather implies a barrier.

True

MPI_Allgather is typically faster than calling MPI_Gather followed by MPI_Bcast.

True
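
MPI_Allgather delivers the complete gathered result to every process, which is what an MPI_Gather followed by an MPI_Bcast of the gathered buffer produces (an MPI_Scatter afterwards would instead hand each process only one chunk). A rough sketch of the equivalence, assuming nprocs processes and k elements per process:

/* one call ... */
MPI_Allgather(senddata, k, MPI_DOUBLE,
              alldata,  k, MPI_DOUBLE, MPI_COMM_WORLD);

/* ... has the same effect as (but is typically faster than) */
MPI_Gather(senddata, k, MPI_DOUBLE,
           alldata,  k, MPI_DOUBLE, 0, MPI_COMM_WORLD);
MPI_Bcast(alldata, k * nprocs, MPI_DOUBLE, 0, MPI_COMM_WORLD);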

MPI_Allreduce has a similar communication pattern (sends and receives of messages) as MPI_Allgather.

True

MPI_Gather performs many-to-one communication.

True

MPI_Reduce has a similar communication pattern (sends and receives of messages) as MPI_Gather.

True

MPI_Reduce may be non-blocking on more than one of the involved processes.

True

MPI_Reduce performs many-to-one communication.

True

MPI_Send may return before the message has actually been sent.

True

Matrix-vector multiplication is embarrassingly parallel.

True

Multidimensional C/C++ arrays are stored row by row in main memory.

True
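
For example, for an array declared as double a[ROWS][COLS], element a[i][j] is stored at flat offset i * COLS + j, so each row occupies one contiguous block of memory.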

Mutexes and locks are the same kind of synchronization primitive.

True

Parallelism can be used to speed up a computation or to reduce its energy consumption.

True

Programs running on pure distributed-memory systems cannot suffer from data races.

True

Reduction operations can be implemented using a reduction tree.

True

Reduction operations tend to be more frequent in parallel programs than in serial programs.

True

SPMD typically includes task parallelism.

True

Sending one long message in MPI is typically more efficient than sending multiple short messages with the same total amount of data.

True

The Collatz code from the projects is likely to suffer from load imbalance.

True

The MPI_Barrier call requires a parameter (to be passed to the function).

True
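
The parameter is the communicator, e.g., MPI_Barrier(MPI_COMM_WORLD);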

The busy-waiting code from the slides contains a data race.

True

The communication patterns (one-to-one, one-to-many, many-to-one, or many-to-many) of MPI_Bcast and MPI_Scatter are identical.

True

Vector addition is embarrassingly parallel.

True

When a thread attempts to acquire a lock that is already taken, it is blocked until it obtains the lock.

True
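
A minimal sketch of a lock-protected critical section using POSIX threads (the threading API used in the course may differ):

#include <pthread.h>

pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
long counter = 0;               /* shared variable */

void* worker(void* arg)
{
  pthread_mutex_lock(&lock);    /* blocks until the lock becomes available */
  counter++;                    /* critical section */
  pthread_mutex_unlock(&lock);
  return NULL;
}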

