Principles of Parallel Computing - Final Exam 2
n^2 (the doubly nested loops perform roughly n * n = n^2 multiply-adds)
Consider the following: for (i=0; i<n; i++) { y[i] = 0.0; for (j=0; j<n; j++) { y[i] += A[i * n+j] * x[j]; } } In this program, the time in serial (Tserial) would be approximately equal to what?
buffer
Once a message has been assembled, the sending process can ______, meaning that the MPI system will place the message into its own internal storage, and the call to MPI_Send will return.
weakly
Programs that can maintain a constant efficiency if the problem size increases at the same rate as the number of processes are ______ scalable.
strongly
Programs that can maintain a constant efficiency without increasing the problem size are ________ scalable.
5
Suppose that each process calls MPI_Reduce with the operator MPI_SUM and destination process 0. What will be the value of d after these processes are executed?
recv_type, send_type, recv_buf_sz, send_buf_sz
The message sent by q can successfully be received by r if _________ = _________ and _____________ is greater than or equal to ___________.
False (smaller values of p usually get closer to linear speedup)
True or False: The speedup of a program, with n input size and p processes, is closer to reaching linear speedup the greater the value of p.
True
True or False: There is no wildcard for communicator arguments, both senders and receivers must always specify communicators.
aliased
Two arguments are _______ if they refer to the same block of memory.
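MPI prohibits aliasing the input and output buffers of a collective call (see the related true/false items below). A minimal sketch of the illegal aliased call and MPI's sanctioned alternative, MPI_IN_PLACE (the variable x and the use of MPI_SUM here are only illustrative):

#include <mpi.h>
#include <stdio.h>

int main(void) {
    int my_rank;
    double x;

    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    x = my_rank + 1.0;   /* hypothetical local value */

    /* Illegal -- the input and output arguments are aliased:
       MPI_Reduce(&x, &x, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);   */

    if (my_rank == 0)    /* sanctioned in-place form on the root */
        MPI_Reduce(MPI_IN_PLACE, &x, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    else                 /* recv buffer is ignored on non-root processes */
        MPI_Reduce(&x, NULL, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (my_rank == 0) printf("global sum = %f\n", x);
    MPI_Finalize();
    return 0;
}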
committed
Before using an MPI_Datatype object in a communication function, it first must be _________.
point-to-point
Functions such as MPI_Send and MPI_Recv are examples of ______________ communications.
gcc -g -Wall -I/home/peter/my_include -o helloworld helloworld.c
(P&P) Consider that a file you wrote called helloworld.c includes the header file "timer.h", but your local directory does not contain "timer.h." Instead, it's in a directory called /home/peter/my_include. Write the syntax to compile helloworld.c.
for (i = 0; i < m; i++) { /* Form dot product of ith row with x */ y[i] = 0.0; for (j = 0; j < n; j++) y[i] += A[i][j]*x[j]; }
(P&P): Consider the following formula that multiplies a matrix by a vector: y_i = a_i0*x_0 + a_i1*x_1 + a_i2*x_2 + ... + a_i,n-1*x_n-1, where a is a matrix and x is a vector. Write pseudocode for this matrix-vector multiplication.
Greetings from process 0 of 4! Greetings from process 1 of 4! Greetings from process 2 of 4! Greetings from process 3 of 4!
(P&P): Consider the following: /* Start */ #include <everything> const int MAX_STRING = 100; int main(void) { char greeting[MAX_STRING]; int comm_sz; int my_rank; MPI_Init(NULL, NULL); MPI_Comm_size(MPI_COMM_WORLD, &comm_sz); MPI_Comm_rank(MPI_COMM_WORLD, &my_rank); if (my_rank != 0) { sprintf(greeting, "Greetings from process %d of %d!", my_rank, comm_sz); MPI_Send(greeting, strlen(greeting)+1, MPI_CHAR, 0, 0, MPI_COMM_WORLD); } else { printf("Greetings from process %d of %d!\n", my_rank, comm_sz); for (int q = 1; q < comm_sz; q++) { MPI_Recv(greeting, MAX_STRING, MPI_CHAR, q, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE); printf("%s\n", greeting); } } MPI_Finalize(); return 0; } /* End */ After compilation, if we type the line: mpiexec -n 4 ./mpihello, what will be the output?
void Get_input(int my_rank, int comm_sz, double * a_p, double * b_p, int * n_p) { int dest; if (my_rank == 0) { printf("Enter a, b, and n\n"); scanf("%lf %lf %d", a_p, b_p, n_p); for (dest = 1; dest < comm_sz; dest++) { MPI_Send(a_p, 1, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD); MPI_Send(b_p, 1, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD); MPI_Send(n_p, 1, MPI_INT, dest, 0, MPI_COMM_WORLD); } } else { MPI_Recv(a_p, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE); MPI_Recv(b_p, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE); MPI_Recv(n_p, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE); } }
(P&P): Consider the following: MPI_Comm_rank(MPI_COMM_WORLD, &my_rank); MPI_Comm_size(MPI_COMM_WORLD, &comm_sz); Get_input(my_rank, comm_sz, &a, &b, &n); h = (b-a)/n; The Get_input() function is meant to get values for a, b, and n so that the program can then apply the trapezoidal rule. Write a parallelized Get_input() function.
Tparallel(n,p) = Tserial(n)/p + Tallgather
(P&P): Consider the following: Tparallel(n,p) = Tserial(n)/p If the parallel program calls MPI_Allgather(), how do we change this formula?
double local_start, local_finish, local_elapsed, elapsed; MPI_Barrier(comm); local_start = MPI_Wtime(); /* Code to be timed */ local_finish = MPI_Wtime(); local_elapsed = local_finish - local_start; MPI_Reduce(&local_elapsed, &elapsed, 1, MPI_DOUBLE, MPI_MAX, 0, comm); if (my_rank == 0) { printf("Elapsed time = %e seconds\n", elapsed); }
(P&P): Consider the following: double start, finish; start = MPI_Wtime(); /* Code to be timed */ finish = MPI_Wtime(); printf("Proc %d > elapsed time = %e seconds\n", my_rank, finish-start); Parallelize this program in such a way that it reports a single elapsed time and uses the MPI_Barrier() and MPI_Reduce() functions.
void Get_input(int my_rank, int comm_sz, double * a_p, double * b_p, int * n_p) { MPI_Datatype input_mpi_t; Build_mpi_type(a_p, b_p, n_p, &input_mpi_t); if (my_rank == 0) { printf("Enter a, b, and n\n"); scanf("%lf %lf %d", a_p, b_p, n_p); } MPI_Bcast(a_p, 1, input_mpi_t, 0, MPI_COMM_WORLD); MPI_Type_free(&input_mpi_t); }
(P&P): Consider the following: void Build_mpi_type(double * a_p, double * b_p, int * n_p, MPI_Datatype * input_mpi_t_p) { int array_of_blocklengths[3] = {1, 1, 1}; MPI_Datatype array_of_types[3] = {MPI_DOUBLE, MPI_DOUBLE, MPI_INT}; MPI_Aint a_addr, b_addr, n_addr; MPI_Aint array_of_displacements[3] = {0}; MPI_Get_address(a_p, &a_addr); MPI_Get_address(b_p, &b_addr); MPI_Get_address(n_p, &n_addr); array_of_displacements[1] = b_addr-a_addr; array_of_displacements[2] = n_addr-a_addr; MPI_Type_create_struct(3, array_of_blocklengths, array_of_displacements, array_of_types, input_mpi_t_p); MPI_Type_commit(input_mpi_t_p); } Write a function that gets the user input for values of a, b, and n called Get_input() that takes arguments int my_rank, int comm_sz, double * a_p, double * b_p, and int * n_p. The Get_input() function should call this function to create a new MPI_Datatype, get input values for a_p, b_p, and n_p, broadcast a_p, and then free the new Datatype.
void Get_input(int my_rank, int comm_sz, double * a_p, double * b_p, int * n_p) { if (my_rank == 0) { printf("Enter a, b, and n\n"); scanf("%lf %lf %d", a_p, b_p, n_p); } MPI_Bcast(a_p, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD); MPI_Bcast(b_p, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD); MPI_Bcast(n_p, 1, MPI_INT, 0, MPI_COMM_WORLD); }
(P&P): Consider the following: void Get_input(int my_rank, int comm_sz, double * a_p, double * b_p, int * n_p) { int dest; if (my_rank == 0) { printf("Enter a, b, and n\n"); scanf("%lf %lf %d", a_p, b_p, n_p); for (dest = 1; dest < comm_sz; dest++) { MPI_Send(a_p, 1, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD); MPI_Send(b_p, 1, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD); MPI_Send(n_p, 1, MPI_INT, dest, 0, MPI_COMM_WORLD); } } else { MPI_Recv(a_p, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE); MPI_Recv(b_p, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE); MPI_Recv(n_p, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE); } } Modify this program so that it broadcasts the values of a_p, b_p, and n_p.
void Mat_vect_mult(double local_A[], double local_x[], double local_y[], int local_m, int n, int local_n, MPI_Comm comm) { double * x; int local_i, j; x = malloc(n * sizeof(double)); MPI_Allgather(local_x, local_n, MPI_DOUBLE, x, local_n, MPI_DOUBLE, comm); for (local_i = 0; local_i < local_m; local_i++) { local_y[local_i] = 0.0; for (j = 0; j < n; j++) { local_y[local_i] += local_A[local_i * n + j] * x[j]; } } free(x); }
(P&P): Consider the following: void Mat_vect_mult(double A[], double x[], double y[], int m, int n) { int i, j; for (i=0; i<m; i++) { y[i] = 0.0; for (j=0; j<n; j++) { y[i] += A[i * n+j] * x[j]; } } } Parallelize this program by implementing the MPI_Allgather() function.
xsub(i) = a + i*h, for i = 0, 1, ..., n (equivalently, xsub(i) = b - (n-i)*h)
(P&P): If n subintervals all have the same length such as in this diagram, the vertical lines bounding the full region are x = a and x = b, and the length of each subinterval is h, what formula can we use to find the value of any endpoint xsub(i)?
area ≈ h[f(xsub(0))/2 + f(xsub(1)) + f(xsub(2)) + ... + f(xsub(n-1)) + f(xsub(n))/2]
(P&P): If n subintervals all have the same length such as in this diagram, we call the leftmost endpoint xsub(0) and the rightmost endpoint xsub(n), and the length of each subinterval is h, what formula can we use to approximate the area in graph a (i.e. what formula do we use to find the sum of the areas of the trapezoids in graph b)?
recv_comm = send_comm, recv_tag = send_tag, dest = r, and src = q
(P&P): If process q calls MPI_Send() with MPI_Send(send_buf_p, send_buf_sz, send_type, dest, send_tag, send_comm); and process r calls MPI_Recv() with MPI_Recv(recv_buf_p, recv_buf_sz, recv_type, src, recv_tag, recv_comm, &status); Then the message sent by q with the command MPI_Send() can be received by r with the command MPI_Recv() if (4 conditions)...
status.MPI_SOURCE, status.MPI_TAG
(P&P): If we call: MPI_Status status; and then we call: MPI_Recv(..., &status); What two members of status can we examine to determine the sender and the tag of the received message?
MPI_Get_count(&status, recv_type, &count)
(P&P): If we call: MPI_Status status; MPI_Datatype recv_type; int count; and then we call: MPI_Recv(..., recv_type, ..., &status); What can we call to get the amount of data that's been received?
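A minimal sketch tying the last two cards together (the tag value 7 and the message contents are arbitrary): process 0 receives a message whose length, sender, and tag it does not know in advance, then inspects the status object and calls MPI_Get_count.

#include <mpi.h>
#include <stdio.h>
#include <string.h>

#define MAX_STRING 100

int main(void) {
    char buf[MAX_STRING];
    char msg[] = "hello";
    int my_rank, comm_sz, count;
    MPI_Status status;

    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

    if (my_rank == 0 && comm_sz > 1) {
        /* Length, sender, and tag are unknown until the message arrives */
        MPI_Recv(buf, MAX_STRING, MPI_CHAR, MPI_ANY_SOURCE, MPI_ANY_TAG,
                 MPI_COMM_WORLD, &status);
        MPI_Get_count(&status, MPI_CHAR, &count);
        printf("received %d chars from process %d with tag %d\n",
               count, status.MPI_SOURCE, status.MPI_TAG);
    } else if (my_rank == 1) {
        MPI_Send(msg, strlen(msg) + 1, MPI_CHAR, 0, 7, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}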
h = (b-a)/n
(P&P): Since n subintervals all have the same length in this diagram, and the vertical lines bounding the full region are x = a and x = b, what is the formula for the length of each subinterval (h)?
void Print_vector(double local_b[], int local_n, int n, char title[], int my_rank, MPI_Comm comm) { double * b = NULL; int i; if (my_rank == 0) { b = malloc(n * sizeof(double)); MPI_Gather(local_b, local_n, MPI_DOUBLE, b, local_n, MPI_DOUBLE, 0, comm); printf("%s\n", title); for (i = 0; i < n; i++) { printf("%f ", b[i]); } printf("\n"); free(b); } else { MPI_Gather(local_b, local_n, MPI_DOUBLE, b, local_n, MPI_DOUBLE, 0, comm); } }
(P&P): Using the MPI_Gather() function write a function that prints a distributed vector called Print_vector() that takes arguments double local_b[], int local_n, int n, char title[], int my_rank, and MPI_Comm comm.
void Read_vector(double local_a[], int local_n, int n, char vec_name[], int my_rank, MPI_Comm comm) { double * a = NULL; int i; if (my_rank == 0) { a = malloc(n * sizeof(double)); printf("Enter the vector %s\n", vec_name); for (i = 0; i < n; i++) { scanf("%lf", &a[i]); } MPI_Scatter(a, local_n, MPI_DOUBLE, local_a, local_n, MPI_DOUBLE, 0, comm); free(a); } else { MPI_Scatter(a, local_n, MPI_DOUBLE, local_a, local_n, MPI_DOUBLE, 0, comm); } }
(P&P): Using the MPI_Scatter() function, write a function that reads and distributes a vector called Read_vector() that takes arguments double local_a[], int local_n, int n, char vec_name[], int my_rank, and MPI_Comm comm.
area = (h/2)(f(xsub(i)) + f(xsub(i+1)))
(P&P): With the base of a trapezoid representing a subinterval, if the endpoints of the subinterval are xsub(i) and xsub(i+1), then the length of each subinterval is h = xsub(i+1) - xsub(i). If the lengths of the two vertical segments are f(xsub(i)) and f(xsub(i+1)), what is the formula for the area of the trapezoid?
double Trap(double left_endpt, double right_endpt, int trap_count, double base_len) { double estimate, x; int i; estimate = (f(left_endpt) + f(right_endpt)) / 2.0; for (i = 1; i < trap_count; i++) { x = left_endpt + i * base_len; estimate += f(x); } estimate *= base_len; return estimate; }
(P&P): Write a C program for the Trap() function with the f(x) function pre-defined that takes the arguments double left_endpt, double right_endpt, int trap_count, double base_len.
Tparallel(n,p) = Tserial(n)/p + Toverhead
(P&P): Write a formula for Tparallel(n,p) where n is the input size and p is the number of processes, using Tserial and Toverhead.
E(n,p) = S(n,p)/p
(P&P): Write a formula for the efficiency of a program with a size of input n and p processes.
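A worked example with hypothetical numbers: if Tserial(n) = 80 ms, p = 10, and Tparallel(n,p) = 10 ms, then S(n,p) = 80/10 = 8 and E(n,p) = S(n,p)/p = 8/10 = 0.8, i.e., the parallel program is 80% efficient.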
void Build_mpi_type(double * a_p, double * b_p, int * n_p, MPI_Datatype * input_mpi_t_p) { int array_of_blocklengths[3] = {1, 1, 1}; MPI_Datatype array_of_types[3] = {MPI_DOUBLE, MPI_DOUBLE, MPI_INT}; MPI_Aint a_addr, b_addr, n_addr; MPI_Aint array_of_displacements[3] = {0}; MPI_Get_address(a_p, &a_addr); MPI_Get_address(b_p, &b_addr); MPI_Get_address(n_p, &n_addr); array_of_displacements[1] = b_addr-a_addr; array_of_displacements[2] = n_addr-a_addr; MPI_Type_create_struct(3, array_of_blocklengths, array_of_displacements, array_of_types, input_mpi_t_p); MPI_Type_commit(input_mpi_t_p); }
(P&P): Write a function that creates an MPI struct called Build_mpi_type, that takes arguments double * a_p, double * b_p, int * n_p, and MPI_Datatype * input_mpi_t_p. The struct must have three elements, each with blocklengths of one. The first and second elements are doubles and the third is an int.
void Mat_vect_mult(double A[], double x[], double y[], int m, int n) { int i, j; for (i=0; i<m; i++) { y[i] = 0.0; for (j=0; j<n; j++) { y[i] += A[i * n+j] * x[j]; } } }
(P&P): Write a serial function called Mat_vect_mult() that takes arguments double A[], double x[], double y[], int m, and int n, and multiplies the m x n matrix A (stored as a one-dimensional array) by the vector x[] to produce y[].
#include "timer.h" double start, finish; GET_TIME(start); /* Code to be timed */ GET_TIME(finish);
(P&P): Write a short program that times a block using the GET_TIME() function and then prints the elapsed time.
double start, finish; start = MPI_Wtime(); /* Code to be timed */ finish = MPI_Wtime(); printf("Proc %d > elapsed time = %e seconds\n", my_rank, finish-start);
(P&P): Write a short program that times a block using the MPI_Wtime() function and then prints the elapsed time.
double local_x[N], sum[N]; /* N is the predefined array length */ MPI_Reduce(local_x, sum, N, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
(P&P): Write a short program that uses the MPI_Reduce() function to create a global_sum function that operates on arrays.
Get a, b, n; h = (b-a)/n; local_n = n/comm_sz; local_a = a + my_rank * local_n * h; local_b = local_a + local_n * h; local_integral = Trap(local_a, local_b, local_n, h); if (my_rank != 0) { Send local_integral to process 0; } else { total_integral = local_integral; for (proc = 1; proc < comm_sz; proc++) { Receive local_integral from proc; total_integral += local_integral; } print total_integral; }
(P&P): Write pseudocode that takes inputs a, b, and n, where a is the starting point and b is the endpoint of an interval, and n is the number of subintervals, for each core in a parallel program that estimates the area of the trapezoids in its own subinterval and has process 0 sum and print the total.
/* Input: a, b, n */ h = (b-a)/n; approx = (f(a) + f(b)) / 2.0; for (i = 1; i <= n-1; i++) { x_i = a + i * h; approx += f(x_i); } approx = h * approx;
(P&P): Write serial pseudocode for finding the sum of the areas of the trapezoids.
#include <stdio.h> #include <mpi.h> int main(void) { int my_rank, comm_sz, n = 1024, local_n; double a = 0.0, b = 3.0, h, local_a, local_b; double local_int, total_int; int source; MPI_Init(NULL, NULL); MPI_Comm_rank(MPI_COMM_WORLD, &my_rank); MPI_Comm_size(MPI_COMM_WORLD, &comm_sz); h = (b - a)/n; local_n = n/comm_sz; local_a = a + my_rank * local_n * h; local_b = local_a + local_n * h; local_int = Trap(local_a, local_b, local_n, h); if (my_rank != 0) { MPI_Send(&local_int, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD); } else { total_int = local_int; for (source = 1; source < comm_sz; source++) { MPI_Recv(&local_int, 1, MPI_DOUBLE, source, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE); total_int += local_int; } printf("With n = %d trapezoids, our estimate\n", n); printf("of the integral from %f to %f = %.15e\n", a, b, total_int); } MPI_Finalize(); return 0; }
(P&P): Write the C program for the first version of the MPI trapezoidal rule, with the Trap() function pre-defined.
S(n,p) = Tserial(n) / Tparallel(n,p)
(P&P): Write the formula for the speedup of a program with an input of size n that is parallelized into p processes.
int MPI_Finalize(void);
(P&P): Write the syntax for the MPI_Finalize() function.
int MPI_Init( int* argc_p /* pointer to argc in main */, char*** argv_p /* pointer to argv in main */);
(P&P): Write the syntax for the MPI_Init() function.
mpicc -g -Wall -o mpihello mpihello.c
(P&P): Write the syntax for compiling a program called "mpihello.c"
mpiexec -n <number of processes> ./mpi_hello
(P&P): Write the syntax for running a compiled MPI program called "mpi_hello" with a chosen number of processes.
int MPI_Allgather( void * send_buf_p /* an array of data that will be sent to be gathered */, int send_count /* the amount of data in send_buf_p */, MPI_Datatype send_type /* the type of elements in send_buf_p */, void * recv_buf_p /* an array of data that receives send_buf_p */, int recv_count /* the amount of elements being received from each process */, MPI_Datatype recv_type /* the type of data being received */, MPI_Comm comm /* the communicator in which the processes reside */);
(P&P): Write the syntax for the MPI_Allgather() function.
int MPI_Allreduce( void * input_data_p /* an array of elements of type datatype that each process wants to reduce */, void * output_data_p /* an array that contains the reduced result */, int count /* the size of output_data_p/sizeof(datatype) */, MPI_Datatype datatype /* the type of the elements in input_data_p */, MPI_Op operator /* the operation to which the data is applied */, MPI_Comm comm /* Communicator where the message is sent/received */);
(P&P): Write the syntax for the MPI_Allreduce() function.
int MPI_Barrier( MPI_Comm comm /* The communicator where the barrier is taking place */);
(P&P): Write the syntax for the MPI_Barrier() function.
int MPI_Bcast( void * data_p /* the data being broadcasted */, int count /* the size of data_p/sizeof(datatype) */, MPI_Datatype datatype /* the type of elements in data_p */, int source_proc /* the rank that sends the contents of the memory referenced by data_p */, MPI_Comm comm /* Communicator where the message is sent/received */);
(P&P): Write the syntax for the MPI_Bcast() function.
int MPI_Comm_rank( MPI_Comm comm /* communicator */, int* my_rank_p /* process's rank in communicator */);
(P&P): Write the syntax for the MPI_Comm_rank() function.
int MPI_Comm_size( MPI_Comm comm /* communicator */, int* comm_sz_p /* number of processes in the communicator */);
(P&P): Write the syntax for the MPI_Comm_size() function.
int MPI_Gather ( void * send_buf_p /* an array of data that will be sent to the root process to be gathered */, int send_count /* The amount of elements being sent to the root process */, MPI_Datatype send_type /* The type of elements being sent to the root process */, void * recv_buf_p /* an array of data that will receive send_buf_p */, int recv_count /* The amount of elements being received at the root process */, MPI_Datatype recv_type /* The type of elements being received by the root process */, int root /* The process at which the data is gathered */, MPI_Comm comm /* The communicator in which the processes reside */);
(P&P): Write the syntax for the MPI_Gather() function.
int MPI_Get_address( void * location_p /* the variable whose address we are getting */, MPI_Aint * address_p /* pointer in which the address of the variable is returned */);
(P&P): Write the syntax for the MPI_Get_address() function.
int MPI_Get_count( MPI_Status* status_p /* The status that contains the information about the message */, MPI_Datatype type /* type of the elements in the message */, int* count_p /* the number of elements */);
(P&P): Write the syntax for the MPI_Get_count() function.
int MPI_Recv( void* msg_buf_p /* block of memory */, int buf_size /* number of objects that can be stored in the block */, MPI_Datatype buf_type /* type of the objects */, int source /* the process from which the message should be received */, int tag /* nonnegative int used to distinguish messages (e.g., whether a message should be printed or used in computation) */, MPI_Comm communicator /* Communicator in which the message is sent/received */, MPI_Status* status_p /* The status can allow the receiver to know the amount of data in the message, the sender of the message, or the tag of the message, if necessary */);
(P&P): Write the syntax for the MPI_Recv() function.
int MPI_Reduce( void * input_data_p /* an array of elements of type datatype that each process wants to reduce */, void * output_data_p /* an array that contains the reduced result */, int count /* the size of output_data_p/sizeof(datatype) */, MPI_Datatype datatype /* the type of the elements in input_data_p */, MPI_Op operator /* the operation to which the data is applied */, int dest_process /* the rank at which output_data_p is relevant */, MPI_Comm comm /* Communicator where the message is sent/received */);
(P&P): Write the syntax for the MPI_Reduce() function.
int MPI_Scatter( void * send_buf_p /* an array of data that resides in the root process */, int send_count /* The number of elements each process will get */, MPI_Datatype send_type /* The type of elements being sent */, void * recv_buf_p /* a buffer of data that receives send_buf_p */, int recv_count /* the amount of elements recv_buf_p can hold */, MPI_Datatype recv_type /* the type of elements recv_buf_p receives */, int src_proc /* the process that scatters the array */, MPI_Comm comm /* the communicator in which the processes reside. */);
(P&P): Write the syntax for the MPI_Scatter() function.
int MPI_Send( void* msg_buf_p /* a pointer to the block of memory containing the contents of the message */, int msg_size /* number of elements in the message (for a C string, strlen()+1 to include the terminating '\0') */, MPI_Datatype msg_type /* Datatype of the contents in the message */, int dest /* the rank of the process that should receive the message */, int tag /* nonnegative int used to distinguish messages (e.g., whether a message should be printed or used in computation) */, MPI_Comm communicator /* Communicator in which the message is sent/received */);
(P&P): Write the syntax for the MPI_Send() function.
int MPI_Type_commit( MPI_Datatype * new_mpi_t_p /* pointer to the datatype being committed */);
(P&P): Write the syntax for the MPI_Type_commit() function.
int MPI_Type_free( MPI_Datatype * old_mpi_t_p /* pointer to the datatype whose storage is being freed */);
(P&P): Write the syntax for the MPI_Type_free() function.
double MPI_Wtime(void);
(P&P): Write the syntax for the MPI_Wtime() function.
int MPI_Type_create_struct( int count /* The number of elements in the datatype */, int array_of_blocklengths[] /* an array that allows for the possibility that the individual data items might be arrays or subarrays */, MPI_Aint array_of_displacements[] /* specifies the displacement of each element, in bytes, from the start of the message */, MPI_Datatype array_of_types[] /* array that stores the datatypes of the elements */, MPI_Datatype * new_type_p /* the new datatype being output */);
(P&P): Write the syntax for the MPI_Type_create_struct() function.
nondeterministic
A program is ________________ if its output varies from one run to the next.
scalable
A program is this if the problem size can be increased at a rate so that the efficiency doesn't decrease as the number of processes increases.
the amount of data in the message, the sender of the message, or the tag of the message
A receiver can receive a message without knowing these three things.
macro
Because GET_TIME() is a _____, the code that defines it is inserted directly into the source code by the preprocessor. Therefore, the argument is a double and not a pointer to a double.
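A minimal sketch of how such a macro might be defined (the exact contents of timer.h are an assumption here). Because the preprocessor pastes the block into the caller, now is assigned directly, which is why the argument is a double rather than a pointer to a double:

#include <sys/time.h>

/* Hypothetical definition in the spirit of timer.h */
#define GET_TIME(now) {                           \
    struct timeval t;                             \
    gettimeofday(&t, NULL);                       \
    now = t.tv_sec + t.tv_usec / 1000000.0;       \
}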
process 0: 0, 1, 2, 3 process 1: 4, 5, 6, 7 process 2: 8, 9, 10, 11
Consider a 12-component vector among 3 processes. In a block partition, what components would each process have?
process 0: 0, 1, 6, 7 process 1: 2, 3, 8, 9 process 2: 4, 5, 10, 11
Consider a 12-component vector among 3 processes. In a block-cyclic partition where blocksize = 2, what components would each process have?
process 0: 0, 3, 6, 9 process 1: 1, 4, 7, 10 process 2: 2, 5, 8, 11
Consider a 12-component vector among 3 processes. In a cyclic partition, what components would each process have?
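A minimal sketch that prints which process owns each component under the three partitions above (n = 12, p = 3, and blocksize b = 2 are the values used in these cards):

#include <stdio.h>

int main(void) {
    int n = 12, p = 3, b = 2;
    for (int i = 0; i < n; i++) {
        int block_owner  = i / (n / p);   /* block partition        */
        int cyclic_owner = i % p;         /* cyclic partition       */
        int bc_owner     = (i / b) % p;   /* block-cyclic partition */
        printf("component %2d: block->proc %d, cyclic->proc %d, block-cyclic->proc %d\n",
               i, block_owner, cyclic_owner, bc_owner);
    }
    return 0;
}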
True (the order in which the processes print is nondeterministic, so any interleaving of the five output lines is possible)
Consider the following: #include <everything> int main(void) { int my_rank, comm_sz; MPI_Init(NULL, NULL); MPI_Comm_size(MPI_COMM_WORLD, &comm_sz); MPI_Comm_rank(MPI_COMM_WORLD, &my_rank); printf("Proc %d of %d > Does anyone have a toothpick?\n", my_rank, comm_sz); MPI_Finalize(); return 0; } True or False: If comm_sz = 5, this would be a possible output of the program: Proc 1 of 5 > Does anyone have a toothpick? Proc 2 of 5 > Does anyone have a toothpick? Proc 0 of 5 > Does anyone have a toothpick? Proc 4 of 5 > Does anyone have a toothpick? Proc 3 of 5 > Does anyone have a toothpick?
True
Consider the following: #include <everything> int main(void) { int my_rank, comm_sz; MPI_Init(NULL, NULL); MPI_Comm_size(MPI_COMM_WORLD, &comm_sz); MPI_Comm_rank(MPI_COMM_WORLD, &my_rank); printf("Proc %d of %d > Does anyone have a toothpick?\n", my_rank, comm_sz); MPI_Finalize(); return 0; } True or False: If comm_sz = 6, this would be a possible output of the program: Proc 1 of 6 > Does anyone have a toothpick? Proc 2 of 6 > Does anyone have a toothpick? Proc 0 of 6 > Does anyone have a toothpick? Proc 4 of 6 > Does anyone have a toothpick? Proc 3 of 6 > Does anyone have a toothpick? Proc 5 of 6 > Does anyone have a toothpick?
derived
Consider the following: void Get_input(int my_rank, int comm_sz, double * a_p, double * b_p, int * n_p) { if (my_rank == 0) { printf("Enter a, b, and n\n"); scanf("%lf %lf %d", a_p, b_p, n_p); } MPI_Bcast(a_p, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD); MPI_Bcast(b_p, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD); MPI_Bcast(n_p, 1, MPI_INT, 0, MPI_COMM_WORLD); } As an alternative to calling MPI_Bcast three times, we can also build a single _______ datatype that consists of two doubles and one int.
trapezoidal rule
Graph a shows the area to be estimated and graph b shows the approximate area using trapezoids. What rule is being demonstrated in this picture?
hang
If a process tries to receive a message and there is no matching send, the process will ____, meaning that it will block forever.
MPI_ANY_SOURCE
If process 0 doles out work for the rest of the processes and the processes are sending their results back to process 0, process 0 uses the MPI_Recv() function to receive the results. If process 0 wants to receive messages in the order of which ranks completed their work first, rather than in rank order, what MPI constant can MPI_Recv() use for the int <source> argument?
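A minimal sketch of that pattern (the per-worker "result" is a stand-in for real doled-out work): process 0 accepts results in completion order with MPI_ANY_SOURCE and identifies each sender afterward from status.MPI_SOURCE.

#include <mpi.h>
#include <stdio.h>

int main(void) {
    int my_rank, comm_sz;
    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

    if (my_rank == 0) {
        double result;
        MPI_Status status;
        /* Accept one result per worker, in whatever order they finish */
        for (int done = 1; done < comm_sz; done++) {
            MPI_Recv(&result, 1, MPI_DOUBLE, MPI_ANY_SOURCE, 0,
                     MPI_COMM_WORLD, &status);
            printf("Got %f from process %d\n", result, status.MPI_SOURCE);
        }
    } else {
        double result = my_rank * my_rank;   /* stand-in for real work */
        MPI_Send(&result, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}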
linear
If the speedup of a program with size of input n, that is parallelized into p processes is equal to p, then the program has ______ speedup.
MPI_STATUS_IGNORE
If the status_p argument does not need to be used in the MPI_Recv() function, then we use this MPI constant as the parameter.
derived datatype
In MPI, this kind of datatype can be used to represent any collection of data items in memory by storing both the types of the items and their relative location in memory.
subinterval
In graph b, each trapezoid represents a ___________ of the graph.
MPI_SUM
In order to use the MPI_Reduce() function to create a global-sum function, what MPI operator should we use for the MPI_Op argument?
c, e, b, f, d, a
Looking at this diagram of a tree-structured global sum, put these steps in the order the program executes them: a) Process 0 adds the received value to its newest value. b) Processes 2 and 6 send their new values to processes 0 and 4, respectively. c) Processes 1, 3, 5, and 7 send their values to processes 0, 2, 4, and 6, respectively. d) Process 4 sends its newest value to process 0. e) Processes 0, 2, 4, and 6 add their received values to their original values. f) Processes 0 and 4 add the received values into their new values.
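A minimal point-to-point sketch of that tree-structured sum, assuming comm_sz is a power of two such as 8 (each loop iteration corresponds to one level of the tree: first steps c/e, then b/f, then d/a):

#include <mpi.h>
#include <stdio.h>

int main(void) {
    int my_rank, comm_sz;
    double my_val, received;

    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);
    my_val = my_rank + 1.0;   /* hypothetical local value */

    /* At each stage, a process whose rank is an odd multiple of step sends to
       (rank - step) and drops out; the receiver adds the value to its sum. */
    for (int step = 1; step < comm_sz; step *= 2) {
        if (my_rank % (2 * step) != 0) {
            MPI_Send(&my_val, 1, MPI_DOUBLE, my_rank - step, 0, MPI_COMM_WORLD);
            break;
        } else if (my_rank + step < comm_sz) {
            MPI_Recv(&received, 1, MPI_DOUBLE, my_rank + step, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            my_val += received;
        }
    }
    if (my_rank == 0)
        printf("tree-structured global sum = %f\n", my_val);

    MPI_Finalize();
    return 0;
}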
the count argument to the various communication functions, derived datatypes, and MPI_Pack/Unpack.
MPI provides these three basic approaches to consolidating data that might otherwise require multiple messages.
nonovertaking
MPI requires that messages be _____________, meaning that if process q sends two messages to process r, then the first message sent by q must be available to r before the second message.
collective
Functions such as MPI_Bcast and MPI_Reduce are examples of __________ communications.
wrapper
mpicc is a script that's a _______ for the C compiler.
block
Once a message has been assembled, the sending process can _____, meaning that the MPI system will wait until it can begin transmitting the message, and the call to MPI_Send may not return immediately.
MPI_ANY_TAG
One process may be receiving multiple messages with different tags from another process, and the receiving process doesn't know the order in which messages will be sent. MPI solves this problem by having this constant that can be used for the int <tag> argument in the MPI_Recv() function.
void Vector_sum(double x[], double y[], double z[], int n) { int i; for (i=0; i<n; i++) { z[i] = x[i] + y[i]; } }
(P&P): Write a serial function called Vector_sum that takes arguments double x[], double y[], double z[], and int n, and computes x + y = (x_0, x_1, ..., x_n-1) + (y_0, y_1, ..., y_n-1) = (x_0+y_0, x_1+y_1, ..., x_n-1+y_n-1) = (z_0, z_1, ..., z_n-1) = z
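Under a block distribution, each process can run the same loop on its own pieces of the vectors; a minimal per-process sketch (the Parallel_vector_sum name and the local_n = n/comm_sz convention are assumptions):

/* Each process adds its local blocks of x and y into its local block of z. */
void Parallel_vector_sum(double local_x[], double local_y[],
                         double local_z[], int local_n) {
    int local_i;
    for (local_i = 0; local_i < local_n; local_i++)
        local_z[local_i] = local_x[local_i] + local_y[local_i];
}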
4
Suppose that each process calls MPI_Reduce with the operator MPI_SUM and destination process 0. What will be the value of b after these processes are executed?
MPI_COMM_WORLD
The MPI communicator, defined by the MPI_Init() function, that consists of all of the processes started by the user when he/she started the program.
MPI_SOURCE, MPI_TAG, and MPI_ERROR
The MPI type MPI_Status is a struct with at least these three members.
message-passing programs
These kinds of programs have two processes that can communicate by calling functions: one process calling a send function and the other calling a receive function.
MPI_Barrier()
This MPI collective communication function ensures that no process will return from calling it until every process in the communicator has started calling it.
MPI_Type_commit()
This MPI function commits an MPI_Datatype.
MPI_Wtime()
This MPI function returns the number of seconds that have elapsed since some time in the past.
MPI_MAX
This MPI operator finds the largest of the input arguments.
MPI_Aint
This MPI type is an integer type that is big enough to store an address on the system.
broadcast
This diagram is an example of a tree-structured _________.
MPI_Get_address()
This function gets the address of a specific variable.
GET_TIME(double now)
This function is a C macro defined in the header file timer.h, that gets the current time in seconds.
shared-memory system
This kind of system consists of a collection of cores connected to a globally accessible memory, in which each core can have access to any memory location.
distributed-memory system
This kind of system consists of a collection of core-memory pairs connected by a network, and the memory associated with a core is directly accessible only to that core.
tree
The communication represented in this diagram finds the global sum and then reverses the process to distribute the result back to every process. What kind of communication structure is this?
True
True or False: Collective communications differ from point-to-point communications in that all the processes in the communicator must call the same collective function.
False (They don't use tags; they're matched solely on the basis of the communicator and the order in which they're called).
True or False: Collective communications differ from point-to-point communications in that collective communications use tags and communicators to match.
False (Arguments in collective communications must be compatible in order for them to execute properly)
True or False: Collective communications differ from point-to-point communications in that the arguments passed by each process to an MPI collective communication can pass unrelated or incompatible arguments without blocking or hanging.
True
True or False: Collective communications differ from point-to-point communications in that the output_data_p argument is only used on dest_process.
False (Doubling the number of processes if n is small, has very little effect on runtime)
True or False: For runtimes of parallel matrix-multiplication programs, if the order of the matrix n is small, doubling the number of processes in the program will halve the runtime.
True
True or False: If the input of a parallel program (n) is large and the number of processes (p) is small, doubling p roughly halves the runtime.
False (like p, n does affect the runtime: increasing n increases the runtime, even if p stays constant)
True or False: If the input of a parallel program (n) is large and the number of processes (p) is small, increasing n does not substantially impact the program's runtime so long as p remains constant.
True
True or False: If the input of a parallel program (n) is small and the number of processes (p) is large, and the function contains a reference to MPI_Allgather(), the time it takes to run MPI_Allgather() takes up a significant portion of the program's runtime.
True
True or False: In virtually all distributed-memory systems, communication can be much more expensive than local computation.
False (it prohibits aliasing in this case)
True or False: MPI allows aliasing arguments if one is the input and the other is the output.
False (MPI uses "push" rather than "pull")
True or False: MPI uses a "pull" communication mechanism rather than a "push" mechanism.
True
True or False: Only a receiver can use a wildcard argument (such as MPI_ANY_SOURCE or MPI_ANY_TAG).
True
True or False: Senders must specify a process rank and a nonnegative tag.
True
True or False: The MPI_Reduce() function will likely produce an unpredictable result if the same buffer is used for arguments input_data_p and output_data_p.
True
True or False: The speedup of a program, with n input size and p processes, is closer to reaching linear speedup the greater the value of n.
stdout, stderr (either order)
Virtually all MPI implementations allow all the processes in MPI_COMM_WORLD full access to ______ and ______, so most MPI implementations allow all processes to execute printf and fprintf.
butterfly
What type of communication, used here to find a global sum, is represented in this diagram?
Matrix-Vector Multiplication
What does this diagram display?
distributed-memory system
What kind of system does this diagram represent?
shared-memory system
What kind of system does this diagram represent?
MPI_Type_free()
When we're done using a new MPI_Datatype, we can free any additional storage used with this function.
communicator
a collection of processes that can send messages to each other.
broadcast
a collective communication in which data belonging to a single process is sent to all of the processes in the communicator.
butterfly
a communication pattern in which the processes exchange partial results instead of using one-way communication.
process
a program running on one core-memory pair in a message-passing program.
wrapper script
a script whose main purpose is to run some program.
single program, multiple data (SPMD).
an approach to parallel programming in which a single program is written so that different processes carry out different actions by having the processes branch on the basis of their process rank.
Message-Passing Interface (MPI)
an implementation of message-passing that defines a library of functions that can be called from C, C++, and Fortran programs.
1. Partition the problem solution into tasks. 2. Identify the communication channels between the tasks. 3. Aggregate the tasks into composite tasks. 4. Map the composite tasks to cores.
the four basic steps for parallelizing a serial program
collective communications
global communication functions that can involve more than two processes.
efficiency
speedup per process
speedup
the ratio of the serial runtime to the parallel runtime.
local variables
variables whose contents are significant only on the process that's using them.
global variables
variables whose contents are significant to all the processes.