Computer System third edition Exam practice problem


For a run of ones starting at bit position n down to bit position m (n ≥ m), we saw that we can generate two forms of code, A and B. How should the compiler decide which form to use?

Assuming that addition and subtraction have the same performance, the rule is to choose form A when n = m, either form when n = m + 1, and form B when n > m + 1. The justification for this rule is as follows. Assume first that m > 0. When n = m, form A requires only a single shift, while form B requires two shifts and a subtraction. When n = m + 1, both forms require two shifts and either an addition or a subtraction. When n > m + 1, form B requires only two shifts and one subtraction, while form A requires n - m + 1 > 2 shifts and n - m > 1 additions. For the case of m = 0, we get one fewer shift for both forms A and B, and so the same rules apply for choosing between the two.
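The rule above can be sketched in C. The function names and the use of unsigned long (so that every shift is well defined) are assumptions made for illustration; the card itself gives no code.

```c
/* Sketch: multiply x by a constant K whose binary form is a single run of
 * ones from bit n down to bit m (n >= m >= 0).
 * Assumes n + 1 is smaller than the width of unsigned long. */

/* Form A: (x<<n) + (x<<(n-1)) + ... + (x<<m) */
unsigned long mul_run_form_a(unsigned long x, int n, int m) {
    unsigned long sum = 0;
    for (int i = n; i >= m; i--)
        sum += x << i;
    return sum;
}

/* Form B: (x<<(n+1)) - (x<<m) */
unsigned long mul_run_form_b(unsigned long x, int n, int m) {
    return (x << (n + 1)) - (x << m);
}

/* Selection rule: form A when n == m, form B when n > m + 1.
 * When n == m + 1 either form costs the same; form B is picked here. */
unsigned long mul_run(unsigned long x, int n, int m) {
    return (n == m) ? mul_run_form_a(x, n, m) : mul_run_form_b(x, n, m);
}
```

For example, K = 14 is [1110], a run from bit 3 down to bit 1, so mul_run(3, 3, 1) computes 3 * 14 with two shifts and one subtraction.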

Binary 1001101110011110110101 to hexadecimal:

Binary 10 0110 1110 0111 1011 0101 Hexadecimal 2 6 E 7 B 5

Write C expressions, in terms of variable x, for the following values. Your code should work for any word size w ≥ 8. For reference, we show the result of evaluating the expressions for x = 0x87654321, with w = 32. A. The least significant byte of x, with all other bits set to 0. [0x00000021] B. All but the least significant byte of x complemented, with the least significant byte left unchanged. [0x789ABC21] C. The least significant byte set to all ones, and all other bytes of x left unchanged. [0x876543FF]

Here are the expressions: A. x & 0xFF B. x ^ ~0xFF C. x | 0xFF
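As a quick check, the three expressions can be wrapped in small functions. The function names are hypothetical, and the expected values in the note below assume a 32-bit unsigned int.

```c
/* A. Keep only the least significant byte. */
unsigned low_byte_only(unsigned x) {
    return x & 0xFF;
}

/* B. Complement everything except the least significant byte. */
unsigned complement_upper_bytes(unsigned x) {
    return x ^ ~0xFF;
}

/* C. Set the least significant byte to all ones. */
unsigned set_low_byte(unsigned x) {
    return x | 0xFF;
}
```

With x = 0x87654321 these return 0x00000021, 0x789ABC21, and 0x876543FF, matching the reference values in the question.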

Expression Type Evaluation -2147483647-1 == 2147483648U __________ __________ -2147483647-1 < 2147483647 __________ __________ -2147483647-1U < 2147483647 __________ __________ -2147483647-1 < -2147483647 __________ __________ -2147483647-1U < -2147483647 _________ __________

Expression Type Evaluation -2147483647-1 == 2147483648U Unsigned 1 -2147483647-1 < 2147483647 Signed 1 -2147483647-1U < 2147483647 Unsigned 0 -2147483647-1 < -2147483647 Signed 1 -2147483647-1U < -2147483647 Unsigned 1

Address Value Register Value 0x100 0xFF %rax 0x100 0x108 0xAB %rcx 0x1 0x110 0x13 %rdx 0x3 0x118 0x11 Fill in the following table showing the effects of the following instructions, in terms of both the register or memory location that will be updated and the resulting value: Instruction Destination Value addq %rcx,(%rax) __________ __________ subq %rdx,8(%rax) __________ __________ imulq $16,( %rax,%rdx,8) __________ __________ incq 16(%rax) __________ __________ decq %rcx __________ __________ subq %rdx,%rax __________ __________

Instruction Destination Value addq %rcx,(%rax) 0x100 0x100 subq %rdx,8(%rax) 0x108 0xA8 imulq $16,(%rax,%rdx,8) 0x118 0x110 incq 16(%rax) 0x110 0x14 decq %rcx %rcx 0x0 subq %rdx,%rax %rax 0xFD

The disassembled code for two functions first and last is shown below, along with the code for a call of first by function main:
Disassembly of last(long u, long v), u in %rdi, v in %rsi
0000000000400540 <last>:
  400540: 48 89 f8          mov    %rdi,%rax          L1: u
  400543: 48 0f af c6       imul   %rsi,%rax          L2: u*v
  400547: c3                retq                      L3: Return
Disassembly of first(long x), x in %rdi
0000000000400548 <first>:
  400548: 48 8d 77 01       lea    0x1(%rdi),%rsi     F1: x+1
  40054c: 48 83 ef 01       sub    $0x1,%rdi          F2: x-1
  400550: e8 eb ff ff ff    callq  400540 <last>      F3: Call last(x-1,x+1)
  400555: f3 c3             repz retq                 F4: Return
  ⋮
  400560: e8 e3 ff ff ff    callq  400548 <first>     M1: Call first(10)
  400565: 48 89 c2          mov    %rax,%rdx          M2: Resume
Each of these instructions is given a label, similar to those in Figure 3.27(a). Starting with the calling of first(10) by main, fill in the following table to trace instruction execution through to the point where the program returns back to main. State values are those at the beginning of each instruction.
Label  PC         Instruction  %rdi   %rsi   %rax   %rsp            *%rsp   Description
M1     0x400560   callq        10     -      -      0x7fffffffe820  -       Call first(10)
F1     ________   ________     ____   ____   ____   ________        ____    ________
F2     ________   ________     ____   ____   ____   ________        ____    ________
F3     ________   ________     ____   ____   ____   ________        ____    ________
L1     ________   ________     ____   ____   ____   ________        ____    ________
L2     ________   ________     ____   ____   ____   ________        ____    ________
L3     ________   ________     ____   ____   ____   ________        ____    ________
F4     ________   ________     ____   ____   ____   ________        ____    ________
M2     ________   ________     ____   ____   ____   ________        ____    ________

State values are those at the beginning of each instruction.
Label  PC         Instruction  %rdi   %rsi   %rax   %rsp            *%rsp      Description
M1     0x400560   callq        10     -      -      0x7fffffffe820  -          Call first(10)
F1     0x400548   lea          10     -      -      0x7fffffffe818  0x400565   Entry of first
F2     0x40054c   sub          10     11     -      0x7fffffffe818  0x400565
F3     0x400550   callq        9      11     -      0x7fffffffe818  0x400565   Call last(9, 11)
L1     0x400540   mov          9      11     -      0x7fffffffe810  0x400555   Entry of last
L2     0x400543   imul         9      11     9      0x7fffffffe810  0x400555
L3     0x400547   retq         9      11     99     0x7fffffffe810  0x400555   Return 99 from last
F4     0x400555   repz retq    9      11     99     0x7fffffffe818  0x400565   Return 99 from first
M2     0x400565   mov          9      11     99     0x7fffffffe820  -          Resume main

For each of the following values of K, find ways to express x * K using only the specified number of operations, where we consider both additions and subtractions to have comparable cost. You may need to use some tricks beyond the simple form A and B rules we have considered so far. K Shifts Add/Subs Expression 6 2 1 __________ 31 1 1 __________ -6 2 1 __________ 55 2 2 __________

K Shifts Add/Subs Expression 6 2 1 (x<<2) + (x<<1) 31 1 1 (x<<5) - x -6 2 1 (x<<1) - (x<<3) 55 2 2 (x<<6) - (x<<3) - x
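The four expressions can be checked directly against ordinary multiplication. The function names below are hypothetical, and the sketch assumes nonnegative x small enough that no shift overflows (left-shifting a negative signed value is undefined in C).

```c
/* K = 6:  4x + 2x */
long mul6(long x)     { return (x << 2) + (x << 1); }

/* K = 31: 32x - x */
long mul31(long x)    { return (x << 5) - x; }

/* K = -6: 2x - 8x */
long mul_neg6(long x) { return (x << 1) - (x << 3); }

/* K = 55: 64x - 8x - x */
long mul55(long x)    { return (x << 6) - (x << 3) - x; }
```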

Fill in the table for 4-bit unsigned negation, written -u4(x):
x (Hex)  x (Decimal)  -u4(x) (Decimal)  -u4(x) (Hex)
0        ___________  ___________       ___________
5        ___________  ___________       ___________
8        ___________  ___________       ___________
D        ___________  ___________       ___________
F        ___________  ___________       ___________

For x > 0, the 4-bit unsigned negation satisfies -u4(x) + x = 16; for x = 0, -u4(x) = 0.
x (Hex)  x (Decimal)  -u4(x) (Decimal)  -u4(x) (Hex)
0        0            0                 0
5        5            11                B
8        8            8                 8
D        13           3                 3
F        15           1                 1
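The table can be reproduced with a one-line function. The name neg_u4 is hypothetical; the function returns the value y in the range 0 to 15 for which (x + y) mod 16 is 0.

```c
/* 4-bit unsigned negation: 16 - x for x in 1..15, and 0 for x == 0.
 * The final mask folds the x == 0 case (16) back to 0. */
unsigned neg_u4(unsigned x) {
    return (16u - (x & 0xFu)) & 0xFu;
}
```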

Show how the following binary fractional values would be rounded to the nearest half (1 bit to the right of the binary point), according to the round-to-even rule. In each case, show the numeric values, both before and after rounding. (10.010) 2 (10.011)2 (10.110)2 (11.001) 2

(10.010)2 2 1/4 10.0 2 (10.011)2 2 3/8 10.1 2 1/2 (10.110)2 2 3/4 11.0 3 (11.001)2 3 1/8 11.0 3

Using the table you filled in when solving Problem 2.17, fill in the following table describing the function T2U4: x T2U4(x) -8 __________ -3 __________ -2 __________ -1 __________ 0 __________ 5 __________

(hex) x T2U4(x) 0x8 -8 8 0xD -3 13 0xE -2 14 0xF -1 15 0x0 0 0 0x5 5 5
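A minimal sketch of the conversion, assuming the input is already in the 4-bit two's-complement range -8 to 7 (the function name is hypothetical):

```c
/* T2U for w = 4: negative values map to x + 2^4, others are unchanged. */
int t2u4(int x) {
    return x < 0 ? x + 16 : x;
}
```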

Fill in the missing information in the following table:
Fractional value  Binary representation  Decimal representation
1/8               0.001                  0.125
3/4               __________             __________
25/16             __________             __________
__________        10.1011                __________
__________        1.001                  __________
__________        __________             5.875
__________        __________             3.1875

Fractional value  Binary representation  Decimal representation
1/8               0.001                  0.125
3/4               0.11                   0.75
25/16             1.1001                 1.5625
43/16             10.1011                2.6875
9/8               1.001                  1.125
47/8              101.111                5.875
51/16             11.0011                3.1875
One simple way to think about fractional binary representations is to represent a number as a fraction of the form x / 2^k. We can write this in binary using the binary representation of x, with the binary point inserted k positions from the right. As an example, for 25/16, we have 25 (decimal) = 11001 (binary). We then put the binary point four positions from the right to get 1.1001 (binary).
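The x / 2^k view translates directly into code. The sketch below (hypothetical function name, assuming 0 <= k < 31) returns the decimal value of the binary string for x with the binary point placed k positions from the right.

```c
/* Value of x / 2^k, e.g. x = 25 (11001 in binary), k = 4 gives 1.1001 in binary. */
double binary_fraction_value(unsigned x, int k) {
    return (double)x / (double)(1u << k);
}
```

These values are exactly representable as doubles, so the results can be compared with == against the decimal column of the table.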

Executing a continue statement in C causes the program to jump to the end of the current loop iteration. The stated rule for translating a for loop into a while loop needs some refinement when dealing with continue statements. For example, consider the following code: /* Example of for loop containing a continue statement */ /* Sum even numbers between 0 and 9 */ long sum = 0; long i; for (i = 0; i < 10; i++) { if (i & 1) continue; sum += i; } A. What would we get if we naively applied our rule for translating the for loop into a while loop? What would be wrong with this code? B. How could you replace the continue statement with a goto statement to ensure that the while loop correctly duplicates the behavior of the for loop?

A. Applying our translation rule would yield the following code: /* Naive translation of for loop into while loop */ /* WARNING: This is buggy code */ long sum = 0; long i = 0; while (i < 10) { if (i & 1) /* This will cause an infinite loop */ continue; sum += i; i++; } This code has an infinite loop, since the continue statement would prevent index variable i from being updated. B. The general solution is to replace the continue statement with a goto statement that skips the rest of the loop body and goes directly to the update portion: /* Correct translation of for loop into while loop */ long sum = 0; long i = 0; while (i < 10) { if (i & 1) goto update; sum += i; update: i++; }

You decide to write code that will reverse the elements of an array by swapping elements from opposite ends of the array, working toward the middle. You arrive at the following function: 1 void reverse_array(int a[], int cnt) { 2 int first, last; 3 for (first = 0, last = cnt-1; 4 first <= last; 5 first++,last--) 6 inplace_swap(&a[first], &a[last]); 7 } When you apply your function to an array containing elements 1, 2, 3, and 4, you find the array now has, as expected, elements 4, 3, 2, and 1. When you try it on an array with elements 1, 2, 3, 4, and 5, however, you are surprised to see that the array now has elements 5, 4, 0, 2, and 1. In fact, you discover that the code always works correctly on arrays of even length, but it sets the middle element to 0 whenever the array has odd length. A. For an array of odd length cnt = 2k + 1, what are the values of variables first and last in the final iteration of function reverse_array? B. Why does this call to function inplace_swap set the array element to 0? C. What simple modification to the code for reverse_array would eliminate this problem?

A. Both first and last have value k, so we are attempting to swap the middle element with itself. B. In this case, arguments x and y to inplace_swap both point to the same location. When we compute *x ^ *y, we get 0. We then store 0 as the middle element of the array, and the subsequent steps keep setting this element to 0. We can see that our reasoning in Problem 2.10 implicitly assumed that x and y denote different locations. C. Simply replace the test in line 4 of reverse_array to be first < last, since there is no need to swap the middle element with itself.
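Putting the fix from part C together gives the sketch below. The body of inplace_swap is not shown on this card; the XOR-based version here is an assumption consistent with the description in part B.

```c
/* XOR-based swap (assumed implementation). It zeroes the value when
 * x and y point to the same location, which is the bug discussed above. */
void inplace_swap(int *x, int *y) {
    *y = *x ^ *y;
    *x = *x ^ *y;
    *y = *x ^ *y;
}

/* Corrected reversal: the test is first < last, so the middle element of
 * an odd-length array is never swapped with itself. */
void reverse_array(int a[], int cnt) {
    int first, last;
    for (first = 0, last = cnt - 1; first < last; first++, last--)
        inplace_swap(&a[first], &a[last]);
}
```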

An alternate rule for translating if statements into goto code is as follows: t = test-expr; if (t) goto true; else-statement goto done; true: then-statement done: A. Rewrite the goto version of absdiff_se based on this alternate rule. B. Can you think of any reasons for choosing one rule over the other?

A. Converting to this alternate form involves only switching around a few lines of the code: long gotodiff_se_alt(long x, long y) { long result; if (x < y) goto x_lt_y; ge_cnt++; result = x - y; return result; x_lt_y: lt_cnt++; result = y - x; return result; } B. In most respects, the choice is arbitrary. But the original rule works better for the common case where there is no else statement. For this case, we can simply modify the translation rule to be as follows: t = test-expr; if (!t) goto done; then-statement done: A translation based on the alternate rule is more cumbersome.

You are given the assignment of writing a function that determines whether one string is longer than another. You decide to make use of the string library function strlen having the following declaration: /* Prototype for library function strlen */ size_t strlen(const char *s); Here is your first attempt at the function: /* Determine whether string s is longer than string t */ /* WARNING: This function is buggy */ int strlonger(char *s, char *t) { return strlen(s) - strlen(t) > 0; } When you test this on some sample data, things do not seem to work quite right. You investigate further and determine that, when compiled as a 32-bit program, data type size_t is defined (via typedef) in header file stdio.h to be unsigned. A. For what cases will this function produce an incorrect result? B. Explain how this incorrect result comes about. C. Show how to fix the code so that it will work reliably.

A. For what cases will this function produce an incorrect result? The function will incorrectly return 1 when s is shorter than t. B. Explain how this incorrect result comes about. Since strlen is defined to yield an unsigned result, the difference and the comparison are both computed using unsigned arithmetic. When s is shorter than t, the difference strlen(s) - strlen(t) should be negative, but instead becomes a large, unsigned number, which is greater than 0. C. Show how to fix the code so that it will work reliably. Replace the test with the following: return strlen(s) > strlen(t);
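The buggy and fixed versions can be placed side by side to see the difference. The name strlonger_buggy and the const qualifiers are additions made for this illustration.

```c
#include <string.h>

/* Buggy: the subtraction is done in unsigned arithmetic, so the result is
 * greater than 0 whenever the two lengths differ. */
int strlonger_buggy(const char *s, const char *t) {
    return strlen(s) - strlen(t) > 0;
}

/* Fixed: compare the two unsigned lengths directly. */
int strlonger(const char *s, const char *t) {
    return strlen(s) > strlen(t);
}
```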

When given the C code void cond(long a, long *p) { if (p && a > *p) *p = a; } gcc generates the following assembly code: void cond(long a, long *p) a in %rdi, p in %rsi cond: testq %rsi, %rsi je .L1 cmpq %rdi, (%rsi) jge .L1 movq %rdi, (%rsi) .L1: rep; ret A. Write a goto version in C that performs the same computation and mimics the control flow of the assembly code, in the style shown in Figure 3.16(b). You might find it helpful to first annotate the assembly code as we have done in our examples. B. Explain why the assembly code contains two conditional branches, even though the C code has only one if statement.

A. Here is the C code: void goto_cond(long a, long *p) { if (p == 0) goto done; if (*p >= a) goto done; *p = a; done: return; } B.The first conditional branch is part of the implementation of the && expression. If the test for p being non-null fails, the code will skip the test of a > *p.

A. What is the maximum value of n for which we can represent n! with a 32-bit int? B. What about for a 64-bit long?

A. If we build up a table of factorials computed with data type int, we get the following: n n! OK? 1 1 Y 2 2 Y 3 6 Y 4 24 Y 5 120 Y 6 720 Y 7 5,040 Y 8 40,320 Y 9 362,880 Y 10 3,628,800 Y 11 39,916,800 Y 12 479,001,600 Y 13 1,932,053,504 N We can see that the computation of 13! has overflowed. As we learned in Problem 2.35, when we get value x while attempting to compute n!, we can test for overflow by computing x/n and seeing whether it equals (n - 1)! (assuming that we have already ensured that the computation of (n - 1)! did not overflow). In this case we get 1,932,053,504/13 ≈ 148,619,500.3, which does not equal 12! = 479,001,600. As a second test, we can see that any factorial beyond 10! must be a multiple of 100 and therefore have zeros for the last two digits. The correct value of 13! is 6,227,020,800. B. Doing the computation with data type long lets us go up to 20!, yielding 2,432,902,008,176,640,000.
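The limits 12 and 20 can also be found programmatically. Note that the sketch below does not use the divide-after-multiply test described above; it checks before each multiplication instead, to avoid signed overflow, which is undefined in C. The function name is hypothetical and limit is assumed to be at least 1.

```c
#include <stdint.h>

/* Largest n such that n! <= limit.
 * f * (n + 1) fits within limit exactly when f <= limit / (n + 1). */
int max_factorial_arg(int64_t limit) {
    int64_t f = 1;   /* holds n! */
    int n = 1;
    while (f <= limit / (n + 1)) {
        n++;
        f *= n;
    }
    return n;
}
```

Calling it with INT32_MAX and INT64_MAX gives the answers to parts A and B for a 32-bit int and a 64-bit long.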

We saw in Problem 2.46 that the Patriot missile software approximated 0.1 as x = 0.00011001100110011001100 (binary). Suppose instead that they had used IEEE round-to-even mode to determine an approximation x′ to 0.1 with 23 bits to the right of the binary point. A. What is the binary representation of x′? B. What is the approximate decimal value of x′ - 0.1? C. How far off would the computed clock have been after 100 hours of operation? D. How far off would the program's prediction of the position of the Scud missile have been?

A. Looking at the nonterminating sequence for 1/10, we see that the 2 bits to the right of the rounding position are 1, so a better approximation to 1/10 would be obtained by incrementing x to get x′ = 0.00011001100110011001101 (binary), which is larger than 0.1. B. We can see that x′ - 0.1 has binary representation 0.0000000000000000000000000[1100]. Comparing this to the binary representation of 1/10, we can see that it is 2^−22 × 1/10, which is around 2.38 × 10^−8. C. 2.38 × 10^−8 × 100 × 60 × 60 × 10 ≈ 0.086 seconds, a factor of 4 less than the error in the Patriot system. D. 0.086 × 2,000 ≈ 171 meters.

In the C function that follows, we have omitted the body of the switch statement. In the C code, the case labels did not span a contiguous range, and some cases had multiple labels. void switch2 (long x, long *dest) { long val = 0; switch (x) { ⋮ Body of switch statement omitted } *dest = val; } In compiling the function, gcc generates the assembly code that follows for the initial part of the procedure, with variable x in %rdi: void switch2(long x, long *dest) x in %rdi 1 switch2: 2 addq $1, %rdi 3 cmpq $8, %rdi 4 ja .L2 5 jmp *.L4(,%rdi,8) It generates the following code for the jump table: 1 .L4: 2 .quad .L9 3 .quad .L5 4 .quad .L6 5 .quad .L7 6 .quad .L2 7 .quad .L7 8 .quad .L8 9 .quad .L2 10 .quad .L5 Based on this information, answer the following questions: A. What were the values of the case labels in the switch statement? B. What cases had multiple labels in the C code?

A. The case labels in the switch statement body have values -1, 0, 1, 2, 4, 5, and 7. B. The case with destination .L5 has labels 0 and 7. C. The case with destination .L7 has labels 2 and 4.

In the following excerpts from a disassembled binary, some of the information has been replaced by X's. Answer the following questions about these instructions. A. What is the target of the je instruction below? (You do not need to know anything about the callq instruction here.) 4003fa: 74 02 je XXXXXX 4003fc: ff d0 callq *%rax B. What is the target of the je instruction below? 40042f: 74 f4 je XXXXXX 400431: 5d pop %rbp C. What is the address of the ja and pop instructions? XXXXXX: 77 02 ja 400547 XXXXXX: 5d pop %rbp D. In the code that follows, the jump target is encoded in PC-relative form as a 4-byte two's-complement number. The bytes are listed from least significant to most, reflecting the little-endian byte ordering of x86-64. What is the address of the jump target? 4005e8: e9 73 ff ff ff jmpq XXXXXXX 4005ed: 90 nop

A. The je instruction has as its target 0x4003fc + 0x02. As the original disassembled code shows, this is 0x4003fe: 4003fa: 74 02 je 4003fe 4003fc: ff d0 callq *%rax B. The je instruction has as its target 0x400431 - 12 (since 0xf4 is the 1-byte two's-complement representation of -12). As the original disassembled code shows, this is 0x400425: 40042f: 74 f4 je 400425 400431: 5d pop %rbp C. According to the annotation produced by the disassembler, the jump target is at absolute address 0x400547. According to the byte encoding, this must be at an address 0x2 bytes beyond that of the pop instruction. Subtracting these gives address 0x400545. Noting that the encoding of the ja instruction requires 2 bytes, it must be located at address 0x400543. These are confirmed by examining the original disassembly: 400543: 77 02 ja 400547 400545: 5d pop %rbp D. Reading the bytes in reverse order, we can see that the target offset is 0xffffff73, or decimal -141. Adding this to 0x4005ed (the address of the nop instruction) gives address 0x400560: 4005e8: e9 73 ff ff ff jmpq 400560 4005ed: 90 nop
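All four answers use the same computation: the target is the address of the instruction after the jump plus the signed offset. A minimal sketch, with hypothetical helper names:

```c
#include <stdint.h>

/* Sign-extend a 1-byte two's-complement offset, e.g. 0xf4 -> -12. */
int64_t offset_from_byte(uint8_t b) {
    return b >= 0x80 ? (int64_t)b - 256 : (int64_t)b;
}

/* PC-relative target: address of the following instruction plus the offset. */
uint64_t jump_target(uint64_t next_instr_addr, int64_t offset) {
    return next_instr_addr + (uint64_t)offset;
}
```

For part C the same relation is run backward: the pop instruction sits at target minus offset, and the 2-byte ja instruction sits 2 bytes before that.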

In the following C function, we have left the definition of operation OP incomplete: #define OP __________/* Unknown operator */ long arith(long x) { return x OP 8; } When compiled, gcc generates the following assembly code: long arith(long x) x in %rdi arith: leaq 7(%rdi), %rax testq %rdi, %rdi cmovns %rdi, %rax sarq $3, %rax ret A. What operation is OP? B. Annotate the code to explain how it works.

A. The operator is `/'. We see this is an example of dividing by a power of 2 by right shifting (see Section 2.3.7). Before shifting by k=3, we must add a bias of 2^k −1=7 when the dividend is negative. B. Here is an annotated version of the assembly code: long arith(long x) x in %rdi arith: leaq 7(%rdi), %rax temp = x+7 testq %rdi, %rdi Test x cmovns %rdi, %rax If x>= 0, temp = x sarq $3, %rax result = temp >> 3 (= x/8) ret The program creates a temporary value equal to x+7, in anticipation of x being negative and therefore requiring biasing. The cmovns instruction conditionally changes this number to x when x≥0, and then it is shifted by 3 to generate x/8.
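The same logic can be written back in C as a sketch. It assumes that >> on a negative long is an arithmetic shift, which holds for gcc on x86-64 but is implementation-defined in the C standard; the function name is hypothetical.

```c
/* x / 8 using a shift: add a bias of 2^3 - 1 = 7 only when x is negative,
 * so that the shift rounds toward zero as C division does. */
long div8_by_shift(long x) {
    long temp = (x < 0) ? x + 7 : x;
    return temp >> 3;
}
```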

The C code int comp(data_t a, data_t b) { return a COMP b; } shows a general comparison between arguments a and b, where data_t, the data type of the arguments, is defined (via typedef) to be one of the integer data types listed in Figure 3.1 and either signed or unsigned. The comparison COMP is defined via #define. Suppose a is in some portion of %rdi while b is in some portion of %rsi. For each of the following instruction sequences, determine which data types data_t and which comparisons COMP could cause the compiler to generate this code. (There can be multiple correct answers; you should list them all.) A. cmpl %esi, %edi setl %al B. cmpw %si, %di setge %al C. cmpb %sil, %dil setbe %al D. cmpq %rsi, %rdi setne %al

A. The suffix `l' and the register identifiers indicate 32-bit operands, while the comparison is for a two's-complement <. We can infer that data_t must be int. B. The suffix `w' and the register identifiers indicate 16-bit operands, while the comparison is for a two's-complement >=. We can infer that data_t must be short. C. The suffix `b' and the register identifiers indicate 8-bit operands, while the comparison is for an unsigned <=. We can infer that data_t must be unsigned char. D. The suffix `q' and the register identifiers indicate 64-bit operands, while the comparison is for !=, which is the same whether the arguments are signed, unsigned, or pointers. We can infer that data_t could be either long, unsigned long, or some form of pointer.

Consider the following code:
1 /* Illustration of code vulnerability similar to that found in
2  * Sun's XDR library.
3  */
4 void* copy_elements(void *ele_src[], int ele_cnt, size_t ele_size) {
5     /*
6      * Allocate buffer for ele_cnt objects, each of ele_size bytes
7      * and copy from locations designated by ele_src
8      */
9     void *result = malloc(ele_cnt * ele_size);
10     if (result == NULL)
11         /* malloc failed */
12         return NULL;
13     void *next = result;
14     int i;
15     for (i = 0; i < ele_cnt; i++) {
16         /* Copy object i to destination */
17         memcpy(next, ele_src[i], ele_size);
18         /* Move pointer to next memory region */
19         next += ele_size;
20     }
21     return result;
22 }
A similar vulnerability existed in many implementations of the library function calloc. These have since been patched. Unfortunately, many programmers call allocation functions, such as malloc, using arithmetic expressions as arguments, without checking these expressions for overflow. Writing a reliable version of calloc is left as an exercise (Problem 2.76). Suppose you try to patch the vulnerability, for the case where size_t is 32 bits, by replacing the original call to malloc (line 9) with the following: uint64_t asize = ele_cnt * (uint64_t) ele_size; void *result = malloc(asize); Recall that the argument to malloc has type size_t. A. Does your code provide any improvement over the original? B. How would you change the code to eliminate the vulnerability?

A. This change does not help at all. Even though the computation of asize will be accurate, the call to malloc will cause this value to be converted to a 32-bit unsigned number, and so the same overflow conditions will occur. B. With malloc having a 32-bit unsigned number as its argument, it cannot possibly allocate a block of more than 2^32 bytes, and so there is no point attempting to allocate or copy this much memory. Instead, the function should abort and return NULL, as illustrated by the following replacement to the original call to malloc (line 9): uint64_t required_size = ele_cnt * (uint64_t) ele_size; size_t request_size = (size_t) required_size; if (required_size != request_size) /* Overflow must have occurred. Abort operation */ return NULL; void *result = malloc(request_size); if (result == NULL) /* malloc failed */ return NULL;

Using show_int and show_float, we determine that the integer 3510593 has hexadecimal representation 0x00359141, while the floating-point number 3510593.0 has hexadecimal representation 0x4A564504. A. Write the binary representations of these two hexadecimal values. B. Shift these two strings relative to one another to maximize the number of matching bits. How many bits match? C. What parts of the strings do not match?

A. Using the notation of the example in the text, we write the two strings as follows: 0x00359141 = 00000000001101011001000101000001 and 0x4A564504 = 01001010010101100100010100000100. B. With the second word shifted two positions to the right relative to the first, we find a sequence with 21 matching bits. C. We find all bits of the integer embedded in the floating-point number, except for the most significant bit having value 1. Such is the case for the example in the text as well. In addition, the floating-point number has some nonzero high-order bits that do not match those of the integer.

To determine the time in seconds, the program would multiply the value of this counter by a 24-bit quantity that was a fractional binary approximation to 1/10. The binary representation of 1/10 is the nonterminating sequence 0.000110011[0011]... (binary), where the portion in brackets is repeated indefinitely. The program approximated 0.1, as a value x, by considering just the first 23 bits of the sequence to the right of the binary point: x = 0.00011001100110011001100. (See Problem 2.51 for a discussion of how they could have approximated 0.1 more precisely.) A. What is the binary representation of 0.1 - x? B. What is the approximate decimal value of 0.1 - x? C. The clock starts at 0 when the system is first powered up and keeps counting up from there. In this case, the system had been running for around 100 hours. What was the difference between the actual time and the time computed by the software? D. The system predicts where an incoming missile will appear based on its velocity and the time of the last radar detection. Given that a Scud travels at around 2,000 meters per second, how far off was its prediction?

A. We can see that 0.1 - x has the binary representation 0.000000000000000000000001100[1100]... (binary). B. Comparing this to the binary representation of 1/10, we can see that it is simply 2^−20 × 1/10, which is around 9.54 × 10^−8. C. 9.54 × 10^−8 × 100 × 60 × 60 × 10 ≈ 0.343 seconds. D. 0.343 × 2,000 ≈ 687 meters.
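The arithmetic can be reproduced numerically. In the sketch below the 23-bit approximation is written as numerator / 2^23; the truncated value x corresponds to numerator 838860 (0xCCCCC), and rounding up by one unit in the last place gives 838861. The function names are hypothetical, and double precision is assumed to be accurate enough for these magnitudes.

```c
/* Absolute error of approximating 0.1 by numerator / 2^23. */
double approx_error(long numerator) {
    double err = 0.1 - (double)numerator / 8388608.0;   /* 8388608 = 2^23 */
    return err < 0 ? -err : err;
}

/* Accumulated clock error in seconds after the given number of hours,
 * with the counter advancing 10 times per second. */
double clock_drift(double err_per_tick, double hours) {
    return err_per_tick * hours * 60.0 * 60.0 * 10.0;
}

/* Position error in meters for a target moving at the given speed. */
double position_error(double drift_seconds, double meters_per_second) {
    return drift_seconds * meters_per_second;
}
```

With numerator 838860, 100 hours, and 2,000 meters per second, the three functions give roughly 9.54 × 10^−8, 0.343 seconds, and 687 meters.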

Consider a function P, which generates local values, named a0-a8. It then calls function Q using these generated values as arguments. Gcc produces the following code for the first part of P: long P(long x) x in %rdi 1 P: 2 pushq %r15 3 pushq %r14 4 pushq %r13 5 pushq %r12 6 pushq %rbp 7 pushq %rbx 8 subq $24, %rsp 9 movq %rdi, %rbx 10 leaq 1(%rdi), %r15 11 leaq 2(%rdi), %r14 12 leaq 3(%rdi), %r13 13 leaq 4(%rdi), %r12 14 leaq 5(%rdi), %rbp 15 leaq 6(%rdi), %rax 16 movq %rax, (%rsp) 17 leaq 7(%rdi), %rdx 18 movq %rdx, 8(%rsp) 19 movl $0, %eax 20 call Q ... A.Identify which local values get stored in callee-saved registers. B. Identify which local values get stored on the stack. C. Explain why the program could not store all of the local values in callee-saved registers.

A. We can see that lines 9-14 save local values a0-a5 into callee-saved registers %rbx, %r15, %r14, %r13, %r12, and %rbp, respectively. B. Local values a6 and a7 are stored on the stack at offsets 0 and 8 relative to the stack pointer (lines 16 and 18). C. After storing six local variables, the program has used up the supply of callee-saved registers. It stores the remaining two local values on the stack.

A function fun_a has the following overall structure: long fun_a(unsigned long x) { long val = 0; while (...){ ⋮ } return ...; } The gcc C compiler generates the following assembly code: long fun_a(unsigned long x) x in %rdi 1 fun_a: 2 movl $0, %eax 3 jmp .L5 4 .L6: 5 xorq %rdi, %rax 6 shrq %rdi Shift right by 1 7 .L5: 8 testq %rdi, %rdi 9 jne .L6 10 andl $1, %eax 11 ret Reverse engineer the operation of this code and then do the following: A. Determine what loop translation method was used. B. Use the assembly-code version to fill in the missing parts of the C code. C. Describe in English what this function computes.

A. We can see that the code uses the jump-to-middle translation, using the jmp instruction on line 3. B. Here is the original C code: long fun_a(unsigned long x) { long val = 0; while (x) { val ^= x; x >>= 1; } return val & 0x1; } C. This code computes the parity of argument x. That is, it returns 1 if there is an odd number of ones in x and 0 if there is an even number.

You are given the assignment to develop code for a function tmult_ok that will determine whether two arguments can be multiplied without causing overflow. Here is your solution: /* Determine whether arguments can be multiplied without overflow */ int tmult_ok(int x, int y) { int p = x*y; /* Either x is zero, or dividing p by x gives y */ return !x || p/x == y; } You test this code for a number of values of x and y, and it seems to work properly. Your coworker challenges you, saying, "If I can't use subtraction to test whether addition has overflowed (see Problem 2.31), then how can you use division to test whether multiplication has overflowed?" Devise a mathematical justification of your approach, along the following lines. First, argue that the case x = 0 is handled correctly. Otherwise, consider w-bit numbers x (x ≠ 0), y, p, and q, where p is the result of performing two's-complement multiplication on x and y, and q is the result of dividing p by x. A. Show that x · y, the integer product of x and y, can be written in the form x · y = p + t2^w, where t ≠ 0 if and only if the computation of p overflows. B. Show that p can be written in the form p = x · q + r, where |r| < |x|. C. Show that q = y if and only if r = t = 0.

A. We know that x · y can be written as a 2w-bit two's-complement number. Let u denote the unsigned number represented by the lower w bits, and v denote the two's-complement number represented by the upper w bits. Then, based on Equation 2.3, we can see that x · y = v2^w + u. We also know that u = T2U_w(p), since they are unsigned and two's-complement numbers arising from the same bit pattern, and so by Equation 2.6, we can write u = p + p_{w-1}2^w, where p_{w-1} is the most significant bit of p. Letting t = v + p_{w-1}, we have x · y = p + t2^w. When t = 0, we have x · y = p; the multiplication does not overflow. When t ≠ 0, we have x · y ≠ p; the multiplication does overflow. B. By definition of integer division, dividing p by nonzero x gives a quotient q and a remainder r such that p = x · q + r, and |r| < |x|. (We use absolute values here, because the signs of x and r may differ. For example, dividing -7 by 2 gives quotient -3 and remainder -1.) C. Suppose q = y. Then we have x · y = x · y + r + t2^w. From this, we can see that r + t2^w = 0. But |r| < |x| ≤ 2^w, and so this identity can hold only if t = 0, in which case r = 0. Suppose r = t = 0. Then we will have x · y = x · q, implying that y = q.

[1011] [11011] [111011]

A. [1011] -2^3 + 2^1 + 2^0 = -8 + 2 + 1 = -5 B. [11011] -2^4 + 2^3 + 2^1 + 2^0 = -16 + 8 + 2 + 1 = -5 C. [111011] -2^5 + 2^4 + 2^3 + 2^1 + 2^0 = -32 + 16 + 8 + 2 + 1 = -5
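The three sums above can be checked with a short routine that reads a bit string as a two's-complement number. The function name is hypothetical, and the sketch assumes a nonempty string of '0' and '1' characters shorter than the width of int.

```c
/* Two's-complement value of a bit string such as "1011".
 * The leading bit starts the value at -1 or 0; each later bit doubles the
 * running value and adds the bit, which gives the leading bit a final
 * weight of -2^(w-1). */
int b2t(const char *bits) {
    int value = 0;
    for (int i = 0; bits[i] != '\0'; i++) {
        int bit = bits[i] - '0';
        if (i == 0)
            value = -bit;
        else
            value = value * 2 + bit;
    }
    return value;
}
```

All three vectors give the same value because prepending copies of the sign bit (sign extension) does not change a two's-complement number.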

A. Fill in the following table showing the effect of these functions for several example arguments. You will find it more convenient to work with a hexadecimal representation. Just remember that hex digits 8 through F have their most significant bits equal to 1. w fun1(w) fun2(w) 0x00000076 _________ _________ 0x87654321 _________ _________ 0x000000C9 _________ _________ 0xEDCBA987 _________ _________ B. Describe in words the useful computation each of these functions performs.

A. w fun1(w) fun2(w) 0x00000076 0x00000076 0x00000076 0x87654321 0x00000021 0x00000021 0x000000C9 0x000000C9 0xFFFFFFC9 0xEDCBA987 0x00000087 0xFFFFFF87 B. Function fun1 extracts a value from the low-order 8 bits of the argument, giving an integer ranging between 0 and 255. Function fun2 extracts a value from the low-order 8 bits of the argument, but it also performs sign extension. The result will be a number between -128 and 127.

Assume variables x, f, and d are of type int, float, and double, respectively. Their values are arbitrary, except that neither f nor d equals +∞, -∞, or NaN. For each of the following C expressions, either argue that it will always be true (i.e., evaluate to 1) or give a value for the variables such that it is not true (i.e., evaluates to 0). A. x == (int)(double) x B. x == (int)(float) x C. d == (double)(float) d D. f == (float)(double) f E. f == -(-f) F. 1.0/2 == 1/2.0 G. d*d >= 0.0 H. (f+d)-f == d

A. x == (int)(double) x Yes, since double has greater precision and range than int.
B. x == (int)(float) x No. For example, when x is TMax.
C. d == (double)(float) d No. For example, when d is 1e40, we will get +∞ on the right.
D. f == (float)(double) f Yes, since double has greater precision and range than float.
E. f == -(-f) Yes, since a floating-point number is negated by simply inverting its sign bit.
F. 1.0/2 == 1/2.0 Yes, the numerators and denominators will both be converted to floating-point representations before the division is performed.
G. d*d >= 0.0 Yes, although it may overflow to +∞.
H. (f+d)-f == d No. For example, when f is 1.0e20 and d is 1.0, the expression f+d will be rounded to 1.0e20, and so the expression on the left-hand side will evaluate to 0.0, while the right-hand side will be 1.0.

xorq %rdx,%rdx in code that was generated from C where no exclusive-or operations were present. A. Explain the effect of this particular exclusive-or instruction and what useful operation it implements. B. What would be the more straightforward way to express this operation in assembly code? C. Compare the number of bytes to encode these two different implementations of the same operation.

A. This instruction is used to set register %rdx to zero, exploiting the property that x ^ x = 0 for any x. It corresponds to the C statement x = 0. B. A more direct way of setting register %rdx to zero is with the instruction movq $0,%rdx. C. Assembling and disassembling this code, however, we find that the version with xorq requires only 3 bytes, while the version with movq requires 7. Other ways to set %rdx to zero rely on the property that any instruction that updates the lower 4 bytes will cause the high-order bytes to be set to zero. Thus, we could use either xorl %edx,%edx (2 bytes) or movl $0,%edx (5 bytes).

Running on an older processor model, our code required around 16 cycles when the branching pattern was highly predictable, and around 31 cycles when the pattern was random. A. What is the approximate miss penalty? B. How many cycles would the function require when the branch is mispredicted?

A. We can apply our formula directly to get T_MP = 2(31 - 16) = 30. B. When misprediction occurs, the function will require around 16 + 30 = 46 cycles.

Binary 1100100101111011 to hexadecimal:

Binary 1100 1001 0111 1011 Hexadecimal C 9 7 B

Format A            Format B
Bits      Value     Bits      Value
011 0000  1         0111 000  1
101 1110  ________  ________  ________
010 1001  ________  ________  ________
110 1111  ________  ________  ________
000 0001  ________  ________  ________

Format A            Format B
Bits      Value     Bits      Value   Comments
011 0000  1         0111 000  1
101 1110  15/2      1001 111  15/2
010 1001  25/32     0110 100  3/4     Round down
110 1111  31/2      1011 000  16      Round up
000 0001  1/64      0001 000  1/64    Denorm → norm

Each of the following lines of code generates an error message when we invoke the assembler. Explain what is wrong with each line. movb $0xF, (%ebx) movl %rax, (%rsp) movw (%rax),4(%rsp) movb %al,%sl movq %rax,$0x123 movl %eax,%rdx movb %si, 8(%rbp)

Here is the code with explanations of the errors:
movb $0xF, (%ebx)      Cannot use %ebx as address register
movl %rax, (%rsp)      Mismatch between instruction suffix and register ID
movw (%rax),4(%rsp)    Cannot have both source and destination be memory references
movb %al,%sl           No register named %sl
movq %rax,$0x123       Cannot have immediate as destination
movl %eax,%rdx         Destination operand incorrect size
movb %si, 8(%rbp)      Mismatch between instruction suffix and register ID

As we will see in Chapter 3, the lea instruction can perform computations of the form (a<<k) + b, where k is either 0, 1, 2, or 3, and b is either 0 or some program value. The compiler often uses this instruction to perform multiplications by constant factors. For example, we can compute 3*a as (a<<1) + a. Considering cases where b is either 0 or equal to a, and all possible values of k, what multiples of a can be computed with a single lea instruction?

In Chapter 3, we will see many examples of the lea instruction in action. The instruction is provided to support pointer arithmetic, but the C compiler often uses it as a way to perform multiplication by small constants. For each value of k, we can compute two multiples: 2^k (when b is 0) and 2^k + 1 (when b is a). Thus, we can compute multiples 1, 2, 3, 4, 5, 8, and 9.
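The enumeration in this answer can be checked mechanically. The sketch below uses a hypothetical helper, lea_multiples, that walks k = 0..3 with b equal to 0 or a and collects the distinct multiples:

```c
#include <assert.h>

/* Collect the distinct multiples of a reachable as (a<<k) + b with
   k in {0,1,2,3} and b in {0, a}. Writes them to out in ascending
   order and returns how many there are; out must have room for 8.
   lea_multiples is a hypothetical name used only for illustration. */
int lea_multiples(int out[8]) {
    int seen[10] = {0};              /* seen[m] != 0 when multiple m is reachable */
    for (int k = 0; k <= 3; k++) {
        seen[1 << k] = 1;            /* b == 0: multiple 2^k     */
        seen[(1 << k) + 1] = 1;      /* b == a: multiple 2^k + 1 */
    }
    int count = 0;
    for (int m = 1; m <= 9; m++)
        if (seen[m])
            out[count++] = m;
    assert(count <= 8);
    return count;
}
```

The multiple 2 appears twice (k = 1 with b = 0, and k = 0 with b = a), which is why eight combinations give only seven distinct values.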

Fill in the following macro definitions to generate the double-precision values +∞, -∞, and -0: #define POS_INFINITY #define NEG_INFINITY #define NEG_ZERO You cannot use any include files (such as math.h), but you can make use of the fact that the largest finite number that can be represented with double precision is around 1.8 × 10^308.

In general, it is better to use a library macro rather than inventing your own code. This code seems to work on a variety of machines, however. We assume that the value 1e400 overflows to infinity. #define POS_INFINITY 1e400 #define NEG_INFINITY (-POS_INFINITY) #define NEG_ZERO (-1.0/POS_INFINITY)

What would be printed as a result of the following call to show_bytes? const char *s = "abcdef"; show_bytes((byte_pointer) s, strlen(s)); Note that letters 'a' through 'z' have ASCII codes 0x61 through 0x7A.

It prints 61 62 63 64 65 66. Recall also that the library routine strlen does not count the terminating null character, and so show_bytes printed only through the character 'f'.
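The routine show_bytes itself is not listed here. As a rough sketch of what it does, the variant below (a hypothetical format_bytes, assumed to mirror a " %.2x" per-byte format) writes the hex dump into a buffer so the result can be inspected:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef unsigned char *byte_pointer;

/* Variant of show_bytes that formats into buf instead of printing.
   Each byte becomes " %.2x", so buf needs at least 3*len + 1 characters.
   format_bytes is a hypothetical name; this is not the original routine. */
void format_bytes(byte_pointer start, size_t len, char *buf) {
    assert(start != NULL && buf != NULL);
    buf[0] = '\0';                            /* handles len == 0 */
    for (size_t i = 0; i < len; i++)
        sprintf(buf + 3 * i, " %.2x", start[i]);
}
```

Passing strlen(s) as the length is what keeps the terminating null byte (00) out of the output.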

Mode x y x · y Truncated x · y Unsigned ___________[100] ___________[101] ___________ ___________ ___________ ___________ Two's complement ___________[100] ___________[101] ___________ _________ _________ ____________ Unsigned ___________[010] ___________[111] ___________ ___________ ___________ ___________ Two's complement ___________[010] ___________[111] ____________ ___________ _________ __________ Unsigned ___________[110] ___________[110] ___________ ___________ ___________ ___________ Two's complement ___________[110] ___________[110] ___________ ___________ _________ ___________

Mode              x         y         x · y         Truncated x · y
Unsigned          4 [100]   5 [101]   20 [010100]   4 [100]
Two's complement  -4 [100]  -3 [101]  12 [001100]   -4 [100]
Unsigned          2 [010]   7 [111]   14 [001110]   6 [110]
Two's complement  2 [010]   -1 [111]  -2 [111110]   -2 [110]
Unsigned          6 [110]   6 [110]   36 [100100]   4 [100]
Two's complement  -2 [110]  -2 [110]  4 [000100]    -4 [100]
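The truncated column can be reproduced by keeping the low 3 bits of the full product and then reading them either as unsigned or as two's complement. A small sketch, with hypothetical helper names umult3 and tmult3 and the assumption that negative ints use two's-complement representation:

```c
#include <assert.h>

/* Truncate a product to w = 3 bits and read the result as unsigned. */
unsigned umult3(unsigned x, unsigned y) {
    assert(x < 8 && y < 8);
    return (x * y) & 0x7u;                   /* keep the low 3 bits */
}

/* Same low 3 bits, read as two's complement (range -4..3).
   Assumes negative ints are stored in two's-complement form. */
int tmult3(int x, int y) {
    assert(x >= -4 && x <= 3 && y >= -4 && y <= 3);
    int low = (x * y) & 0x7;                 /* low 3 bits of the full product */
    return (low & 0x4) ? low - 8 : low;      /* bit 2 acts as the sign bit */
}
```

Both functions produce the same 3-bit pattern for the same input bits; only the interpretation of that pattern differs.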

Operation Result a [01101001] b [01010101] ~a __________ ~b __________ a & b __________ a | b __________ a ^ b __________

Operation Result a [01101001] b [01010101] ~a [10010110] ~b [10101010] a & b [01000001] a | b [01111101] a ^ b [00111100]

You are given the following information. A function with prototype void decode1(long *xp, long *yp, long *zp); is compiled into assembly code, yielding the following: void decode1(long *xp, long *yp, long *zp) xp in %rdi, yp in %rsi, zp in %rdx decode1: movq (%rdi), %r8 movq (%rsi), %rcx movq (%rdx), %rax movq %r8, (%rsi) movq %rcx, (%rdx) movq %rax, (%rdi) ret Parameters xp, yp, and zp are stored in registers %rdi, %rsi, and %rdx, respectively. Write C code for decode1 that will have an effect equivalent to the assembly code shown.

Reverse engineering is a good way to understand systems. In this case, we want to reverse the effect of the C compiler to determine what C code gave rise to this assembly code. The best way is to run a "simulation," starting with values x, y, and z at the locations designated by pointers xp, yp, and zp, respectively. We would then get the following behavior: void decode1(long *xp, long *yp, long *zp) xp in %rdi, yp in %rsi, zp in %rdx decode1: movq (%rdi), %r8 Get x = *xp movq (%rsi), %rcx Get y = *yp movq (%rdx), %rax Get z = *zp movq %r8, (%rsi) Store x at yp movq %rcx, (%rdx) Store y at zp movq %rax, (%rdi) Store z at xp ret From this, we can generate the following C code: void decode1(long *xp, long *yp, long *zp) { long x = *xp; long y = *yp; long z = *zp; *yp = x; *zp = y; *xp = z; }

For the C code long dw_loop(long x) { long y = x*x; long *p = &x; long n = 2*x; do { x += y; (*p)++; n--; } while (n > 0); return x; } gcc generates the following assembly code: long dw_loop(long x) x initially in %rdi 1 dw_loop: 2 movq %rdi, %rax 3 movq %rdi, %rcx 4 imulq %rdi, %rcx 5 leaq (%rdi,%rdi), %rdx 6 .L2: 7 leaq 1(%rcx,%rax), %rax 8 subq $1, %rdx 9 testq %rdx, %rdx 10 jg .L2 11 rep; ret A. Which registers are used to hold program values x, y, and n? B. How has the compiler eliminated the need for pointer variable p and the pointer dereferencing implied by the expression (*p)++? C. Add annotations to the assembly code describing the operation of the program, similar to those shown in Figure 3.19(c).

The code generated when compiling loops can be tricky to analyze, because the compiler can perform many different optimizations on loop code, and because it can be difficult to match program variables with registers. This particular example demonstrates several places where the assembly code is not just a direct translation of the C code. A. Although parameter x is passed to the function in register %rdi, we can see that the register is never referenced once the loop is entered. Instead, we can see that registers %rax, %rcx, and %rdx are initialized in lines 2-5 to x, x*x, and x+x. We can conclude, therefore, that these registers contain the program variables. B. The compiler determines that pointer p always points to x, and hence the expression (*p)++ simply increments x. It combines this incrementing by 1 with the increment by y, via the leaq instruction of line 7. C. The annotated code is as follows: long dw_loop(long x) x initially in %rdi 1 dw_loop: 2 movq %rdi, %rax Copy x to %rax 3 movq %rdi, %rcx 4 imulq %rdi, %rcx Compute y = x*x 5 leaq (%rdi,%rdi), %rdx Compute n = 2*x 6 .L2: loop: 7 leaq 1(%rcx,%rax), %rax Compute x += y + 1 8 subq $1, %rdx Decrement n 9 testq %rdx, %rdx Test n 10 jg .L2 If > 0, goto loop 11 rep; ret Return

Using only bit-level and logical operations, write a C expression that is equivalent to x == y. In other words, it will return 1 when x and y are equal and 0 otherwise.

The expression is ! (x ^ y). That is, x^y will be zero if and only if every bit of x matches the corresponding bit of y. We then exploit the ability of ! to determine whether a word contains any nonzero bit. There is no real reason to use this expression rather than simply writing x == y, but it demonstrates some of the nuances of bit-level and logical operations.

How could we modify the expression for form B for the case where bit position n is the most significant bit?

The expression simply becomes -(x<<m). To see this, let the word size be w so that n = w - 1. Form B states that we should compute (x<<w) - (x<<m), but shifting x to the left by w will yield the value 0.

Write a function div16 that returns the value x/16 for integer argument x. Your function should not use division, modulus, multiplication, any conditionals (if or ?:), any comparison operators (e.g., <, >, or ==), or any loops. You may assume that data type int is 32 bits long and uses a two's-complement representation, and that right shifts are performed arithmetically.

The only challenge here is to compute the bias without any testing or conditional operations. We use the trick that the expression x >> 31 generates a word with all ones if x is negative, and all zeros otherwise. By masking off the appropriate bits, we get the desired bias value. int div16(int x) { /* Compute bias to be either 0 (x >= 0) or 15 (x < 0) */ int bias = (x >> 31) & 0xF; return (x + bias) >> 4; }

The Digital Equipment VAX computer was a very popular machine from the late 1970s until the late 1980s. Rather than instructions for Boolean operations and and or, it had instructions bis (bit set) and bic (bit clear). Both instructions take a data word x and a mask word m. They generate a result z consisting of the bits of x modified according to the bits of m. With bis, the modification involves setting z to 1 at each bit position where m is 1. With bic, the modification involves setting z to 0 at each bit position where m is 1. To see how these operations relate to the C bit-level operations, assume we have functions bis and bic implementing the bit set and bit clear operations, and that we want to use these to implement functions computing bitwise operations | and ^, without using any other C operations. Fill in the missing code below. Hint: Write C expressions for the operations bis and bic. /* Declarations of functions implementing operations bis and bic */ int bis(int x, int m); int bic(int x, int m); /* Compute x|y using only calls to functions bis and bic */ int bool_or(int x, int y) { int result = ___________; return result; } /* Compute x^y using only calls to functions bis and bic */ int bool_xor(int x, int y) { int result = ___________; return result; }

These problems help you think about the relation between Boolean operations and typical ways that programmers apply masking operations. Here is the code: /* Declarations of functions implementing operations bis and bic */ int bis(int x, int m); int bic(int x, int m); /* Compute x|y using only calls to functions bis and bic */ int bool_or(int x, int y) { int result = bis(x,y); return result; } /* Compute x^y using only calls to functions bis and bic */ int bool_xor(int x, int y) { int result = bis(bic(x,y), bic(y,x)); return result; } The bis operation is equivalent to Boolean or: a bit is set in z if either this bit is set in x or it is set in m. On the other hand, bic(x, m) is equivalent to x & ~m; we want the result to equal 1 only when the corresponding bit of x is 1 and of m is 0. Given that, we can implement | with a single call to bis. To implement ^, we take advantage of the property x ^ y = (x & ~y) | (~x & y).
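To try the answer out, bis and bic need bodies. The sketch below supplies them from the description given in this answer (bis as x | m, bic as x & ~m); these two bodies are assumptions for testing, since the problem only declares the functions:

```c
#include <assert.h>

/* Assumed C models of the VAX operations, following the description
   in the text: bis sets and bic clears the bits selected by mask m. */
int bis(int x, int m) { return x | m; }      /* set bits where m is 1   */
int bic(int x, int m) { return x & ~m; }     /* clear bits where m is 1 */

/* Compute x|y using only calls to bis and bic. */
int bool_or(int x, int y) {
    return bis(x, y);
}

/* Compute x^y: bits set in x but not y, or set in y but not x. */
int bool_xor(int x, int y) {
    return bis(bic(x, y), bic(y, x));
}
```

With x = 0x69 and y = 0x55, bool_or gives 0x7D and bool_xor gives 0x3C, the same results as the built-in | and ^ operators.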

Assume data type int is 32 bits long and uses a two's-complement representation for signed values. Right shifts are performed arithmetically for signed values and logically for unsigned values. The variables are declared and initialized as follows: int x = foo(); /* Arbitrary value */ int y = bar(); /* Arbitrary value */ unsigned ux = x; unsigned uy = y; For each of the following C expressions, either (1) argue that it is true (evaluates to 1) for all values of x and y, or (2) give values of x and y for which it is false (evaluates to 0):
A. (x > 0) || (x-1 < 0)
B. (x & 7) != 7 || (x<<29 < 0)
C. (x * x) >= 0
D. x < 0 || -x <= 0
E. x > 0 || -x >= 0
F. x+y == uy+ux
G. x*~y + uy*ux == -x

These "C puzzle" problems provide a clear demonstration that programmers must understand the properties of computer arithmetic:
A. (x > 0) || (x-1 < 0) False. Let x be -2,147,483,648 (TMin32). We will then have x-1 equal to 2,147,483,647 (TMax32).
B. (x & 7) != 7 || (x<<29 < 0) True. If (x & 7) != 7 evaluates to 0, then we must have bit x_2 equal to 1. When shifted left by 29, this will become the sign bit.
C. (x * x) >= 0 False. When x is 65,535 (0xFFFF), x*x is -131,071 (0xFFFE0001).
D. x < 0 || -x <= 0 True. If x is nonnegative, then -x is nonpositive.
E. x > 0 || -x >= 0 False. Let x be -2,147,483,648 (TMin32). Then both x and -x are negative.
F. x+y == uy+ux True. Two's-complement and unsigned addition have the same bit-level behavior, and they are commutative.
G. x*~y + uy*ux == -x True. ~y equals -y-1. uy*ux equals x*y. Thus, the left-hand side is equivalent to x*(-y) - x + x*y, which is -x.

A. For a floating-point format with an n-bit fraction, give a formula for the smallest positive integer that cannot be represented exactly (because it would require an (n + 1)-bit fraction to be exact). Assume the exponent field size k is large enough that the range of representable exponents does not provide a limitation for this problem. B. What is the numeric value of this integer for single-precision format (n = 23)?

This exercise helps you think about what numbers cannot be represented exactly in floating point. A. The number has binary representation 1, followed by n zeros, followed by 1, giving value 2^(n+1) + 1. B. When n = 23, the value is 2^24 + 1 = 16,777,217.
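The formula and the single-precision case can both be checked in code. This is a minimal sketch with two hypothetical helpers; it assumes float is IEEE single precision:

```c
#include <assert.h>

/* Smallest positive integer that is not exactly representable with an
   n-bit fraction: 2^(n+1) + 1. smallest_inexact is a hypothetical name. */
unsigned long long smallest_inexact(int n) {
    assert(n >= 0 && n <= 61);               /* keep the result within 64 bits */
    return (1ULL << (n + 1)) + 1;
}

/* Return 1 if v survives a round trip through float unchanged.
   Assumes float is IEEE single precision and |v| is far below 2^63. */
int float_is_exact(long long v) {
    volatile float f = (float)v;             /* volatile: force rounding to float */
    return (long long)f == v;
}
```

Note that 16,777,218 is representable again: past 2^24 the representable integers are spaced 2 apart, so only the odd ones are lost at first.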

You are assigned the task of writing code for a function tsub_ok, with arguments x and y, that will return 1 if computing x-y does not cause overflow. Having just written the code for Problem 2.30, you write the following: /* Determine whether arguments can be subtracted without overflow */ /* WARNING: This code is buggy. */ int tsub_ok(int x, int y) { return tadd_ok(x, -y); } For what values of x and y will this function give incorrect results? Writing a correct version of this function is left as an exercise

This function will give correct values, except when y is TMin. In this case, we will have -y also equal to TMin, and so the call to function tadd_ok will indicate overflow when x is negative and no overflow when x is nonnegative. In fact, the opposite is true: x - TMin equals x + 2^31, which fits when x is negative and overflows when x is nonnegative, so tsub_ok(x, TMin) should yield 1 when x is negative and 0 when it is nonnegative. One lesson to be learned from this exercise is that TMin should be included as one of the cases in any test procedure for a function.
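One way to sidestep the TMin problem is to avoid negating y at all. The sketch below is one possible correct version, not an official solution; it assumes long long is wider than int so the exact difference can be formed and range-checked:

```c
#include <assert.h>
#include <limits.h>

/* Return 1 if x - y does not overflow an int. The subtraction is done
   in a wider type, so y == INT_MIN needs no special handling.
   Assumes long long is wider than int. */
int tsub_ok(int x, int y) {
    long long diff = (long long)x - (long long)y;
    return diff >= INT_MIN && diff <= INT_MAX;
}
```

With y equal to INT_MIN, the result is 1 for x = -1 (the difference is INT_MAX) and 0 for x = 0 (the difference is one past INT_MAX).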

Fill in the blank entries in the following table, giving the decimal and hexadecimal representations of different powers of 2:
n     2^n (decimal)   2^n (hexadecimal)
9     512             0x200
19    ________        ________
____  16,384          ________
____  ________        0x10000
17    ________        ________
____  32              ________
____  ________        0x80

This problem gives you a chance to think about powers of 2 and their hexadecimal representations.
n     2^n (decimal)   2^n (hexadecimal)
9     512             0x200
19    524,288         0x80000
14    16,384          0x4000
16    65,536          0x10000
17    131,072         0x20000
5     32              0x20
7     128             0x80

A single byte can be represented by 2 hexadecimal digits. Fill in the missing entries in the following table, giving the decimal, binary, and hexadecimal values of different byte patterns: Decimal Binary Hexadecimal 0 0000 0000 0x00 167 __________ __________ 62 __________ __________ 188 __________ __________ __________ 0011 0111 __________ __________ 1000 1000 __________ __________ 1111 0011 __________

This problem gives you a chance to try out conversions between hexadecimal and decimal representations for some smaller numbers. For larger ones, it becomes much more convenient and reliable to use a calculator or conversion program. Decimal Binary Hexadecimal 0 0000 0000 0x00 167 = 10 · 16 + 7 1010 0111 0xA7 62 = 3 · 16 + 14 0011 1110 0x3E 188 = 11 · 16 + 12 1011 1100 0xBC 3 · 16 + 7 = 55 0011 0111 0x37 8 · 16 + 8 = 136 1000 1000 0x88 15 · 16 + 3 = 243 1111 0011 0xF3 5 · 16 + 2 = 82 0101 0010 0x52 10 · 16 + 12 = 172 1010 1100 0xAC 14 · 16 + 7 = 231 1110 0111 0xE7

Suppose that x and y have byte values 0x66 and 0x39, respectively. Fill in the following table indicating the byte values of the different C expressions: Expression Value Expression Value x & y _________ x && y __________ x | y __________ x || y __________ ~x | ~y __________ !x || !y __________ x & !y __________ x && ~y __________

This problem highlights the relation between bit-level Boolean operations and logical operations in C. A common programming error is to use a bit-level operation when a logical one is intended, or vice versa. Expression Value Expression Value x&y 0x20 x && y 0x01 x | y 0x7F x || y 0x01 ~x | ~y 0xDF !x || !y 0x00 x & !y 0x00 x && ~y 0x01

Each of these colors can be represented as a bit vector of length 3, and we can apply Boolean operations to them. The complement of a color is formed by turning off the lights that are on and turning on the lights that are off. What would be the complement of each of the eight colors listed above? Describe the effect of applying Boolean operations on the following colors: Blue | Green =__________ Yellow & Cyan =__________ Red ^ Magenta =__________

This problem illustrates how Boolean algebra can be used to describe and reason about real-world systems. We can see that this color algebra is identical to the Boolean algebra over bit vectors of length 3. A. Colors are complemented by complementing the values of R, G, and B. From this, we can see that white is the complement of black, yellow is the complement of blue, magenta is the complement of green, and cyan is the complement of red. B. We perform Boolean operations based on a bit-vector representation of the colors. From this we get the following: Blue (001) | Green (010) = Cyan (011) Yellow (110) & Cyan (011) = Green (010) Red (100) ^ Magenta (101) = Blue (001)

Fill in the table below showing the effects of the different shift operations on single-byte quantities. The best way to think about shift operations is to work with binary representations. Convert the initial values to binary, perform the shifts, and then convert back to hexadecimal. Each of the answers should be 8 binary digits or 2 hexadecimal digits. x x << 3 Logical x >> 2 Arithmetic x >> 2 Hex Binary Binary Hex Binary Hex Binary Hex 0xC3 __________ __________ __________ __________ __________ __________ __________ 0x75 __________ __________ __________ __________ __________ __________ __________ 0x87 __________ __________ __________ __________ __________ __________ __________ 0x66 __________ __________ __________ __________ __________ __________ __________

This problem is a drill to help you understand the different shift operations.
x                 x << 3            Logical x >> 2    Arithmetic x >> 2
Hex   Binary      Binary      Hex   Binary      Hex   Binary      Hex
0xC3  [11000011]  [00011000]  0x18  [00110000]  0x30  [11110000]  0xF0
0x75  [01110101]  [10101000]  0xA8  [00011101]  0x1D  [00011101]  0x1D
0x87  [10000111]  [00111000]  0x38  [00100001]  0x21  [11100001]  0xE1
0x66  [01100110]  [00110000]  0x30  [00011001]  0x19  [00011001]  0x19
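The three kinds of shift can be written out for single bytes. In this sketch the helper names are hypothetical, and the arithmetic shift is built by hand from the logical one so that it does not rely on how a given compiler shifts negative values:

```c
#include <assert.h>
#include <stdint.h>

/* Left shift of a byte, keeping only the low 8 bits. */
uint8_t shl8(uint8_t x, int k) {
    assert(k >= 0 && k < 8);
    return (uint8_t)(x << k);
}

/* Logical right shift: vacated high bits are filled with zeros. */
uint8_t shr8_logical(uint8_t x, int k) {
    assert(k >= 0 && k < 8);
    return (uint8_t)(x >> k);
}

/* Arithmetic right shift: vacated high bits copy the sign bit (bit 7). */
uint8_t shr8_arith(uint8_t x, int k) {
    assert(k >= 0 && k < 8);
    uint8_t fill = (x & 0x80) ? (uint8_t)(0xFF << (8 - k)) : 0;
    return (uint8_t)((x >> k) | fill);
}
```

The logical and arithmetic results differ only for 0xC3 and 0x87, the two inputs whose most significant bit is 1.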

Consider the following code that attempts to sum the elements of an array a, where the number of elements is given by parameter length: 1 /* WARNING: This is buggy code */ 2 float sum_elements(float a[], unsigned length) { 3 int i; 4 float result = 0; 5 6 for (i = 0; i <= length-1; i++) 7 result += a[i]; 8 return result; 9 } When run with argument length equal to 0, this code should return 0.0. Instead, it encounters a memory error. Explain why this happens. Show how this code can be corrected.

This problem is designed to demonstrate how easily bugs can arise due to the implicit casting from signed to unsigned. It seems quite natural to pass parameter length as an unsigned, since one would never want to use a negative length. The stopping criterion i <= length-1 also seems quite natural. But combining these two yields an unexpected outcome! Since parameter length is unsigned, the computation 0 - 1 is performed using unsigned arithmetic, which is equivalent to modular addition. The result is then UMax. The ≤ comparison is also performed using an unsigned comparison, and since any number is less than or equal to UMax, the comparison always holds! Thus, the code attempts to access invalid elements of array a. The code can be fixed either by declaring length to be an int or by changing the test of the for loop to be i < length.
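Applying the second fix described above (testing i < length) gives the following sketch. Declaring the index as unsigned and the array as const are extra choices made here, not part of the original code:

```c
#include <assert.h>

/* Corrected version: the test i < length never computes length - 1,
   so length == 0 simply skips the loop. The index is unsigned here to
   match the type of length (a choice made for this sketch). */
float sum_elements(const float a[], unsigned length) {
    assert(a != 0 || length == 0);
    float result = 0;
    for (unsigned i = 0; i < length; i++)
        result += a[i];
    return result;
}
```

With length equal to 0, the comparison 0 < 0 fails immediately and the function returns 0.0 without touching the array.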

1 void inplace_swap(int *x, int *y) { 2 *y = *x ^ *y; /* Step 1 */ 3 *x = *x ^ *y; /* Step 2 */ 4 *y = *x ^ *y; /* Step 3 */ 5 } As the name implies, we claim that the effect of this procedure is to swap the values stored at the locations denoted by pointer variables x and y. Note that unlike the usual technique for swapping two values, we do not need a third location to temporarily store one value while we are moving the other. There is no performance advantage to this way of swapping; it is merely an intellectual amusement. Starting with values a and b in the locations pointed to by x and y, respectively, fill in the table that follows, giving the values stored at the two locations after each step of the procedure. Use the properties of ^ to show that the desired effect is achieved. Recall that every element is its own additive inverse (that is, a ^ a = 0). Step *x *y Initially a b Step 1 __________ __________ Step 2 __________ __________ Step 3 __________ __________

This procedure relies on the fact that exclusive-or is commutative and associative, and that a ^ a = 0 for any a. Step *x *y (Initially) a b (Step 1) a a ^ b (Step 2) a ^ (a ^ b) = (a ^ a) ^ b = b a ^ b (Step 3) b b ^ (a ^ b) = (b ^ b) ^ a = a

In the following code, we have omitted the definitions of constants M and N: #define M /* Mystery number 1 */ #define N /* Mystery number 2 */ int arith(int x, int y) { int result = 0; result = x*M + y/N; /* M and N are mystery numbers. */ return result; } We compiled this code for particular values of M and N. The compiler optimized the multiplication and division using the methods we have discussed. The following is a translation of the generated machine code back into C: /* Translation of assembly code for arith */ int optarith(int x, int y) { int t = x; x <<= 5; x-=t; if (y < 0) y += 7; y >>= 3; /* Arithmetic shift */ return x+y; } What are the values of M and N?

We have found that people have difficulty with this exercise when working directly with assembly code. It becomes more clear when put in the form shown in optarith. We can see that M is 31; x*M is computed as (x<<5)-x. We can see that N is 8; a bias value of 7 is added when y is negative, and the right shift is by 3.
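Plugging M = 31 and N = 8 back in lets the two versions be compared directly. This is a sketch under two assumptions that C itself does not guarantee: x is nonnegative and small enough that x << 5 does not overflow, and >> on a negative int is an arithmetic shift:

```c
#include <assert.h>

#define M 31   /* recovered from (x << 5) - x */
#define N 8    /* recovered from the bias of 7 and the shift by 3 */

int arith(int x, int y) {
    return x * M + y / N;
}

/* Same structure as the translated machine code. Assumes x >= 0 and
   small, and that >> on a negative int shifts arithmetically. */
int optarith(int x, int y) {
    assert(x >= 0 && x < (1 << 25));
    int t = x;
    x <<= 5;                 /* x * 32 */
    x -= t;                  /* x * 31 */
    if (y < 0)
        y += 7;              /* bias so that division rounds toward zero */
    y >>= 3;                 /* y / 8 */
    return x + y;
}
```

The bias matters for negative y: without it, -9 >> 3 would give -2, whereas -9 / 8 in C is -1.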

Without converting the numbers to decimal or binary, try to solve the following arithmetic problems, giving the answers in hexadecimal. Hint: Just modify the methods you use for performing decimal addition and subtraction to use base 16. 0x503c + 0x8 = __________ 0x503c - 0x40 = __________ 0x503c + 64 = __________ 0x50ea - 0x503c = __________

When you begin debugging machine-level programs, you will find many cases where some simple hexadecimal arithmetic would be useful. You can always convert numbers to decimal, perform the arithmetic, and convert them back, but being able to work directly in hexadecimal is more efficient and informative. 0x503c + 0x8 = 0x5044. Adding 8 to hex c gives 4 with a carry of 1. 0x503c - 0x40 = 0x4ffc. Subtracting 4 from 3 in the second digit position requires a borrow from the third. Since this digit is 0, we must also borrow from the fourth position. 0x503c + 64 = 0x507c. Decimal 64 (2^6) equals hexadecimal 0x40. 0x50ea - 0x503c = 0xae. To subtract hex c (decimal 12) from hex a (decimal 10), we borrow 16 from the second digit, giving hex e (decimal 14). In the second digit, we now subtract 3 from hex d (decimal 13), giving hex a (decimal 10).

For the case where data type int has 32 bits, devise a version of tmult_ok (Problem 2.35) that uses the 64-bit precision of data type int64_t, without using division.

With 64 bits, we can perform the multiplication without overflowing. We then test whether casting the product to 32 bits changes the value: 1 /* Determine whether the arguments can be multiplied 2 without overflow */ 3 int tmult_ok(int x, int y) { 4 /* Compute product without overflow */ 5 int64_t pll = (int64_t) x*y; 6 /* See if casting to int preserves value */ 7 return pll == (int) pll; 8 } Note that the casting on the right-hand side of line 5 is critical. If we instead wrote the line as int64_t pll = x*y; the product would be computed as a 32-bit value (possibly overflowing) and then sign extended to 64 bits.
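A compilable variant of the same idea is sketched below. It differs from the listing in one respect, chosen here for illustration: it compares the 64-bit product against INT_MIN and INT_MAX rather than casting it back to int. It assumes int is 32 bits:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Return 1 if x * y fits in an int. The cast to int64_t is applied to x
   before the multiplication, so the product is formed in 64 bits.
   Assumes int is 32 bits, so the 64-bit product cannot overflow. */
int tmult_ok(int x, int y) {
    int64_t pll = (int64_t)x * y;
    return pll >= INT_MIN && pll <= INT_MAX;
}
```

The boundary sits between 46340 and 46341: the square of the first fits in 32 bits, the square of the second does not.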

For each of the following lines of assembly language, determine the appropriate instruction suffix based on the operands. (For example, mov can be rewritten as movb, movw, movl, or movq.) mov___ %eax, (%rsp) mov___ (%rax), %dx mov___ $0xFF, %bl mov___ (%rsp,%rdx,4), %dl mov___ (%rdx), %rax mov___ %dx, (%rax)

movl %eax, (%rsp) movw (%rax), %dx movb $0xFF, %bl movb (%rsp,%rdx,4), %dl movq (%rdx), %rax movw %dx, (%rax)

For a C function switcher with the general structure void switcher(long a, long b, long c, long *dest) { long val; switch(a) { case __________: /* CaseA*/ c= __________; /* Fall through */ case __________: /* Case B */ val= __________; break; case __________: /* Case C */ case __________: /* Case D */ val = __________; break; case __________: /* Case E */ val = __________; break; default: val = __________; } *dest = val; } gcc generates the assembly code and jump table shown in Figure 3.24. Fill in the missing parts of the C code. Except for the ordering of case labels C and D, there is only one way to fit the different cases into the template.

void switcher(long a, long b, long c, long *dest) { long val; switch(a) { case 5: c = b ^ 15; /* Fall through */ case 0: val = c + 112; break; case 2: case 7: val = (c + b) << 2; break; case 4: val = a; break; default: val = b; } *dest = val; }

You hear on the news that Montana has just abolished its speed limit, which constitutes 1,500 km of the trip. Your truck can travel at 150 km/hr. What will be your speedup for the trip?

Here we have α = 0.6 and k = 1.5. More directly, traveling the 1,500 kilometers through Montana will require 10 hours, and the rest of the trip also requires 10 hours. This will give a speedup of 25/(10 + 10) = 1.25×.

You can buy a new turbocharger for your truck at www.fasttrucks.com. They stock a variety of models, but the faster you want to go, the more it will cost. How fast must you travel through Montana to get an overall speedup for your trip of 1.67×?

Here we have α = 0.6, and we require S = 1.67, from which we can solve for k. More directly, to speed up the trip by 1.67×, we must decrease the overall time to 15 hours. The parts outside of Montana will still require 10 hours, so we must drive through Montana in 5 hours. This requires traveling at 300 km/hr, which is pretty fast for a truck!
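Both truck answers use α, k, and S in the sense of Amdahl's law, S = 1 / ((1 - α) + α/k). The formula is not restated in these answers, so treat its use here as an assumption about the surrounding text. A sketch of the formula and its inverse, with hypothetical function names:

```c
#include <assert.h>

/* Amdahl's law: overall speedup when a fraction alpha of the original
   time is sped up by a factor k. */
double speedup(double alpha, double k) {
    assert(alpha >= 0.0 && alpha <= 1.0 && k > 0.0);
    return 1.0 / ((1.0 - alpha) + alpha / k);
}

/* Inverse: the factor k needed on the fraction alpha to reach overall
   speedup s. Only meaningful when 1/s > 1 - alpha. */
double required_k(double alpha, double s) {
    double remaining = 1.0 / s - (1.0 - alpha);
    assert(remaining > 0.0);
    return alpha / remaining;
}
```

For the turbocharger question, taking S as 5/3 gives k = 3, which is three times the original speed through Montana and matches the 300 km/hr figure.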

x y x + y x+(t 5)y Case _____________ _____________ _____________ _____________ _____________ [10100] [10001] _____________ _____________ _____________ _____________ _____________ _____________ _____________ _____________ [11000] [11000] _____________ _____________ _____________ _____________ _____________ _____________ _____________ _____________ [10111] [01000] _____________ _____________ _____________ _____________ _____________ _____________ _____________ _____________ [00010] [00101] _____________ _____________ _____________ _____________ _____________ _____________ _____________ _____________ [01100] [00100] _____________ _____________ _____________ _____________ _____________ _____________ _____________ _____________

x        y        x + y     x +^t_5 y   Case
-12      -15      -27       5           1
[10100]  [10001]  [100101]  [00101]
-8       -8       -16       -16         2
[11000]  [11000]  [110000]  [10000]
-9       8        -1        -1          2
[10111]  [01000]  [111111]  [11111]
2        5        7         7           3
[00010]  [00101]  [000111]  [00111]
12       4        16        -16         4
[01100]  [00100]  [010000]  [10000]
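The table can be regenerated with a small function that adds exactly and then wraps into the 5-bit range. The name tadd5 is hypothetical, and the case numbers are read off the table (1 negative overflow, 2 negative result, 3 nonnegative result, 4 positive overflow):

```c
#include <assert.h>

/* 5-bit two's-complement addition: add exactly, then wrap into -16..15.
   *kind receives the case number: 1 negative overflow, 2 negative result
   without overflow, 3 nonnegative result without overflow, 4 positive
   overflow. */
int tadd5(int x, int y, int *kind) {
    assert(x >= -16 && x <= 15 && y >= -16 && y <= 15);
    int sum = x + y;                          /* exact sum, in -32..30 */
    if (sum < -16) { *kind = 1; return sum + 32; }
    if (sum < 0)   { *kind = 2; return sum; }
    if (sum <= 15) { *kind = 3; return sum; }
    *kind = 4;
    return sum - 32;
}
```

Overflow in either direction shifts the result by 2^5 = 32, which is why -27 becomes 5 and 16 becomes -16.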

Write a function with the following prototype: /* Determine whether arguments can be added without overflow */ int uadd_ok(unsigned x, unsigned y);

/* Determine whether arguments can be added without overflow */ int uadd_ok(unsigned x, unsigned y) { unsigned sum = x+y; return sum >= x; }

Write a function with the following prototype: /* Determine whether arguments can be added without overflow */ int tadd_ok(int x, int y); This function should return 1 if arguments x and y can be added without causing overflow.

/* Determine whether arguments can be added without overflow */ int tadd_ok(int x, int y) { int sum = x+y; int neg_over = x < 0 && y < 0 && sum >= 0; int pos_over = x >= 0 && y >= 0 && sum < 0; return !neg_over && !pos_over; }

In Chapter 3, we will look at listings generated by a disassembler, a program that converts an executable program file back to a more readable ASCII form. These files contain many hexadecimal numbers, typically representing values in two's-complement form. Being able to recognize these numbers and understand their significance (for example, whether they are negative or positive) is an important skill. For the lines labeled A-I (on the right) in the following listing, convert the hexadecimal values (in 32-bit two's-complement form) shown to the right of the instruction names (sub, mov, and add) into their decimal equivalents:
4004d0: 48 81 ec e0 02 00 00   sub $0x2e0,%rsp               A.
4004d7: 48 8b 44 24 a8         mov -0x58(%rsp),%rax          B.
4004dc: 48 03 47 28            add 0x28(%rdi),%rax           C.
4004e0: 48 89 44 24 d0         mov %rax,-0x30(%rsp)          D.
4004e5: 48 8b 44 24 78         mov 0x78(%rsp),%rax           E.
4004ea: 48 89 87 88 00 00 00   mov %rax,0x88(%rdi)           F.
4004f1: 48 8b 84 24 f8 01 00   mov 0x1f8(%rsp),%rax          G.
4004f8: 00
4004f9: 48 03 44 24 08         add 0x8(%rsp),%rax
4004fe: 48 89 84 24 c0 00 00   mov %rax,0xc0(%rsp)           H.
400505: 00
400506: 48 8b 44 d4 b8         mov -0x48(%rsp,%rdx,8),%rax   I.

For a 32-bit word, any value consisting of 8 hexadecimal digits beginning with one of the digits 8 through f represents a negative number. It is quite common to see numbers beginning with a string of f's, since the leading bits of a negative number are all ones. You must look carefully, though. For example, the number 0x8048337 has only 7 digits. Filling this out with a leading zero gives 0x08048337, a positive number.
4004d0: 48 81 ec e0 02 00 00   sub $0x2e0,%rsp               A. 736
4004d7: 48 8b 44 24 a8         mov -0x58(%rsp),%rax          B. -88
4004dc: 48 03 47 28            add 0x28(%rdi),%rax           C. 40
4004e0: 48 89 44 24 d0         mov %rax,-0x30(%rsp)          D. -48
4004e5: 48 8b 44 24 78         mov 0x78(%rsp),%rax           E. 120
4004ea: 48 89 87 88 00 00 00   mov %rax,0x88(%rdi)           F. 136
4004f1: 48 8b 84 24 f8 01 00   mov 0x1f8(%rsp),%rax          G. 504
4004f8: 00
4004f9: 48 03 44 24 08         add 0x8(%rsp),%rax
4004fe: 48 89 84 24 c0 00 00   mov %rax,0xc0(%rsp)           H. 192
400505: 00
400506: 48 8b 44 d4 b8         mov -0x48(%rsp,%rdx,8),%rax   I. -72

Picture

For the first four entries, the values of x are negative and T2U_4(x) = x + 2^4. For the remaining two entries, the values of x are nonnegative and T2U_4(x) = x.
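As a rough illustration of this two-case rule, here is a small C sketch. The function name t2u4 is an assumption made for this example, and it only makes sense for x in the 4-bit two's-complement range -8 to 7.

```c
/* Hypothetical helper: T2U_4 for x in the range [-8, 7]. */
unsigned t2u4(int x)
{
    /* Negative values gain 2^4 = 16; nonnegative values are unchanged. */
    return (unsigned)(x < 0 ? x + 16 : x);
}
```

For instance, t2u4(-1) yields 15 and t2u4(5) yields 5.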

We can represent a bit pattern of length w = 4 with a single hex digit. For a two's-complement interpretation of these digits, fill in the following table to determine the additive inverses of the digits shown:

      x                  -^t_4 x
Hex   Decimal      Decimal      Hex
0     __________   __________   __________
5     __________   __________   __________
8     __________   __________   __________
D     __________   __________   __________
F     __________   __________   __________

For w = 4, we have TMin_4 = -8. So -8 is its own additive inverse, while other values are negated by integer negation.

      x                  -^t_4 x
Hex   Decimal      Decimal      Hex
0     0            0            0
5     5            -5           B
8     -8           -8           8
D     -3           3            3
F     -1           1            1
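The Hex column of this table can be reproduced with modular arithmetic. The sketch below is illustrative only; neg4 is a hypothetical name. It negates in unsigned arithmetic (which wraps by definition in C) and keeps the low 4 bits.

```c
/* Hypothetical helper: 4-bit two's-complement negation of a hex digit. */
unsigned neg4(unsigned digit)
{
    /* Unsigned subtraction wraps modulo 2^N; the mask keeps the low 4 bits. */
    return (0u - digit) & 0xFu;
}
```

Note that neg4(0x8) returns 0x8, matching the observation that TMin_4 is its own additive inverse.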

        Hex                   Unsigned               Two's complement
Original   Truncated    Original   Truncated    Original   Truncated
0          0            0          __________   0          __________
2          2            2          __________   2          __________
9          1            9          __________   -7         __________
B          3            11         __________   -5         __________
F          7            15         __________   -1         __________

        Hex                   Unsigned               Two's complement
Original   Truncated    Original   Truncated    Original   Truncated
0          0            0          0            0          0
2          2            2          2            2          2
9          1            9          1            -7         1
B          3            11         3            -5         3
F          7            15         7            -1         -1
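The truncated columns follow from keeping the low 3 bits and then reinterpreting them. A minimal C sketch, with hypothetical function names trunc3_unsigned and trunc3_signed:

```c
/* Truncate a 4-bit pattern to 3 bits, unsigned interpretation. */
unsigned trunc3_unsigned(unsigned digit)
{
    return digit & 0x7u;
}

/* Same truncation, two's-complement interpretation: bit 2 becomes the sign bit. */
int trunc3_signed(unsigned digit)
{
    unsigned low = digit & 0x7u;
    return (low & 0x4u) ? (int)low - 8 : (int)low;
}
```

For the pattern 0xF, the unsigned result is 7 and the two's-complement result is -1, as in the last row.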

As mentioned in Problem 2.6, the integer 3,510,593 has hexadecimal representation 0x00359141, while the single-precision floating-point number 3,510,593.0 has hexadecimal representation 0x4A564504. Derive this floating-point representation and explain the correlation between the bits of the integer and floating-point representations.

Hexadecimal 0x359141 is equivalent to binary [1101011001000101000001]. Shifting this right 21 places gives 1.101011001000101000001_2 × 2^21. We form the fraction field by dropping the leading 1 and appending two zeros, giving [10101100100010100000100]. The exponent field is formed by adding the bias 127 to 21, giving 148 (binary [10010100]). We combine these with a sign field of 0 to give the binary representation [01001010010101100100010100000100]. The matching bits in the two representations are the low-order 21 bits of the integer (everything below its most significant 1 bit), which equal the high-order 21 bits of the fraction.
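One way to confirm this derivation is to inspect the bits of the float directly. The sketch below assumes that float is a 32-bit IEEE 754 single-precision value on the target machine. The helper name float_bits is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the raw bit pattern of a float.
   Assumes float is 32-bit IEEE 754 single precision. */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);   /* copy bytes; avoids pointer-aliasing problems */
    return u;
}
```

With that assumption, float_bits(3510593.0f) is 0x4A564504; bits 23 through 30 hold the exponent field 148, and the fraction field shifted right by 2 equals the low 21 bits of 0x359141.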

0x39A7F8 to binary:

Hexadecimal  3    9    A    7    F    8
Binary       0011 1001 1010 0111 1111 1000

Assuming w = 4, we can assign a numeric value to each possible hexadecimal digit, assuming either an unsigned or a two's-complement interpretation. Fill in the following table according to these interpretations by writing out the nonzero powers of 2 in the summations shown in Equations 2.1 and 2.3:

Hexadecimal   Binary       B2U_4(x)                 B2T_4(x)
0xE           [1110]       2^3 + 2^2 + 2^1 = 14     -2^3 + 2^2 + 2^1 = -2
0x0           __________   __________               __________
0x5           __________   __________               __________
0x8           __________   __________               __________
0xD           __________   __________               __________
0xF           __________   __________               __________

Hexadecimal   Binary   B2U_4(x)                       B2T_4(x)
0xE           [1110]   2^3 + 2^2 + 2^1 = 14           -2^3 + 2^2 + 2^1 = -2
0x0           [0000]   0                              0
0x5           [0101]   2^2 + 2^0 = 5                  2^2 + 2^0 = 5
0x8           [1000]   2^3 = 8                        -2^3 = -8
0xD           [1101]   2^3 + 2^2 + 2^0 = 13           -2^3 + 2^2 + 2^0 = -3
0xF           [1111]   2^3 + 2^2 + 2^1 + 2^0 = 15     -2^3 + 2^2 + 2^1 + 2^0 = -1

0xD5E4C to binary:

Hexadecimal  D    5    E    4    C
Binary       1101 0101 1110 0100 1100

Instruction                   Result
leaq 6(%rax), %rdx            __________
leaq (%rax,%rcx), %rdx        __________
leaq (%rax,%rcx,4), %rdx      __________
leaq 7(%rax,%rax,8), %rdx     __________
leaq 0xA(,%rcx,4), %rdx       __________
leaq 9(%rax,%rcx,2), %rdx     __________

Instruction                   Result
leaq 6(%rax), %rdx            6 + x
leaq (%rax,%rcx), %rdx        x + y
leaq (%rax,%rcx,4), %rdx      x + 4y
leaq 7(%rax,%rax,8), %rdx     7 + 9x
leaq 0xA(,%rcx,4), %rdx       10 + 4y
leaq 9(%rax,%rcx,2), %rdx     9 + x + 2y
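Each row is an instance of the address form Imm(base, index, scale), which leaq evaluates as plain arithmetic without touching memory. The C sketch below models that arithmetic; the function name lea and its parameter order are assumptions made for illustration, with x standing for %rax and y for %rcx as in the answers above.

```c
/* Hypothetical model of leaq Imm(base,index,scale): pure arithmetic, no memory access.
   Pass 0 for a missing base or index. */
long lea(long imm, long base, long index, long scale)
{
    return imm + base + index * scale;
}
```

With x = 10 and y = 3, lea(7, x, x, 8) gives 7 + 9x = 97 and lea(0xA, 0, y, 4) gives 10 + 4y = 22.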

Consider the following three calls to show_bytes:

int val = 0x87654321;
byte_pointer valp = (byte_pointer) &val;
show_bytes(valp, 1); /* A. */
show_bytes(valp, 2); /* B. */
show_bytes(valp, 3); /* C. */

This problem tests your understanding of the byte representation of data and the two different byte orderings.

A. Little endian: 21          Big endian: 87
B. Little endian: 21 43       Big endian: 87 65
C. Little endian: 21 43 65    Big endian: 87 65 43
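To try this on a real machine, a sketch of show_bytes along the lines the problem assumes is given below, together with a hypothetical helper is_little_endian that reports the host byte order. The example assumes a 32-bit unsigned type.

```c
#include <stdio.h>
#include <stddef.h>

typedef unsigned char *byte_pointer;

/* Print len bytes starting at start, lowest address first. */
void show_bytes(byte_pointer start, size_t len)
{
    for (size_t i = 0; i < len; i++)
        printf(" %.2x", start[i]);
    printf("\n");
}

/* Hypothetical helper: 1 if the least significant byte is stored at the lowest address. */
int is_little_endian(void)
{
    unsigned probe = 1;
    return *(unsigned char *)&probe == 1;
}
```

On a little-endian host, calling show_bytes on the address of 0x87654321 with length 2 should print 21 43; on a big-endian host it should print 87 65.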

The marketing department at your company has promised your customers that the next software release will show a 2× performance improvement. You have been assigned the task of delivering on that promise. You have determined that only 80% of the system can be improved. How much (i.e., what value of k) would you need to improve this part to meet the overall performance target?

You are given S = 2 and α = 0.8, and you must then solve for k in Amdahl's law, S = 1 / ((1 - α) + α/k):

2 = 1 / ((1 - 0.8) + 0.8/k)
0.4 + 1.6/k = 1.0
k = 2.67
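Amdahl's law states S = 1 / ((1 - α) + α/k), and rearranging gives k = α / (1/S - (1 - α)). A minimal C sketch of both directions follows; the function names are hypothetical, and the second function is only meaningful when S(1 - α) < 1, since otherwise the target speedup is unreachable.

```c
/* Amdahl's law: overall speedup when a fraction alpha is sped up by a factor k. */
double amdahl_speedup(double alpha, double k)
{
    return 1.0 / ((1.0 - alpha) + alpha / k);
}

/* Solve S = 1 / ((1 - alpha) + alpha / k) for k.
   Only valid when s * (1 - alpha) < 1. */
double amdahl_required_k(double alpha, double s)
{
    return alpha / (1.0 / s - (1.0 - alpha));
}
```

For α = 0.8 and S = 2 this evaluates to 0.8 / 0.3, roughly 2.67.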

Your coworker gets impatient with your analysis of the overflow conditions for two's-complement addition and presents you with the following implementation of tadd_ok:

/* Determine whether arguments can be added without overflow */
/* WARNING: This code is buggy. */
int tadd_ok(int x, int y) {
    int sum = x+y;
    return (sum-x == y) && (sum-y == x);
}

You look at the code and laugh. Explain why.

Your coworker could have learned, by studying Section 2.3.2, that two's-complement addition forms an abelian group, and so the expression (x+y)-x will evaluate to y regardless of whether or not the addition overflows, and that (x+y)-y will always evaluate to x.
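The point can be demonstrated in C, with one caveat: signed overflow is undefined behavior in C, so the sketch below models two's-complement wraparound with unsigned arithmetic. The conversion back to int is implementation-defined, and the sketch assumes it wraps modulo 2^w, as common compilers do. The names wrap_add, wrap_sub, and tadd_ok_buggy are hypothetical. The second function is a sign-based check offered as one possible correct test, not necessarily the only one.

```c
#include <limits.h>

/* Wraparound add and subtract, modeled with unsigned arithmetic.
   Assumes the unsigned-to-int conversion wraps modulo 2^w. */
int wrap_add(int x, int y) { return (int)((unsigned)x + (unsigned)y); }
int wrap_sub(int x, int y) { return (int)((unsigned)x - (unsigned)y); }

/* The coworker's test with wraparound made explicit: it returns 1 for all inputs. */
int tadd_ok_buggy(int x, int y)
{
    int sum = wrap_add(x, y);
    return (wrap_sub(sum, x) == y) && (wrap_sub(sum, y) == x);
}

/* Sign-based check: overflow occurs only when both operands share a sign
   that the wrapped sum does not have. */
int tadd_ok(int x, int y)
{
    int sum = wrap_add(x, y);
    int neg_over = x < 0 && y < 0 && sum >= 0;
    int pos_over = x >= 0 && y >= 0 && sum < 0;
    return !neg_over && !pos_over;
}
```

With these definitions, tadd_ok_buggy(INT_MAX, 1) still reports success, while tadd_ok(INT_MAX, 1) reports overflow.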

Write goto code for fact_for based on first transforming it to a while loop and then applying the guarded-do transformation. We see from this presentation that all three forms of loops in C—do-while, while, and for—can be translated by a simple strategy, generating code that contains one or more conditional branches. Conditional transfer of control provides the basic mechanism for translating loops into machine code.

long fact_for_gd_goto(long n)
{
    long i = 2;
    long result = 1;
    if (n <= 1)
        goto done;
loop:
    result *= i;
    i++;
    if (i <= n)
        goto loop;
done:
    return result;
}

long scale2(long x, long y, long z) {
    long t = __________;
    return t;
}

Compiling the actual function with gcc yields the following assembly code:

long scale2(long x, long y, long z)
x in %rdi, y in %rsi, z in %rdx
scale2:
    leaq (%rdi,%rdi,4), %rax
    leaq (%rax,%rsi,2), %rax
    leaq (%rax,%rdx,8), %rax
    ret

long scale2(long x, long y, long z)
x in %rdi, y in %rsi, z in %rdx
scale2:
    leaq (%rdi,%rdi,4), %rax     5*x
    leaq (%rax,%rsi,2), %rax     5*x+2*y
    leaq (%rax,%rdx,8), %rax     5*x+2*y+8*z
    ret

From this, it is easy to generate the missing expression:

long t = 5 * x + 2 * y + 8 * z;

long arith2(long x, long y, long z) {
    long t1 = __________;
    long t2 = __________;
    long t3 = __________;
    long t4 = __________;
    return t4;
}

The portion of the generated assembly code implementing these expressions is as follows:

long arith2(long x, long y, long z)
x in %rdi, y in %rsi, z in %rdx
arith2:
    orq  %rsi, %rdi
    sarq $3, %rdi
    notq %rdi
    movq %rdx, %rax
    subq %rdi, %rax
    ret

long t1 = x | y;
long t2 = t1 >> 3;
long t3 = ~t2;
long t4 = z-t3;

Starting with C code of the form

long test(long x, long y) {
    long val = __________;
    if (__________) {
        if (__________)
            val = __________;
        else
            val = __________;
    } else if (__________)
        val = __________;
    return val;
}

gcc generates the following assembly code:

long test(long x, long y)
x in %rdi, y in %rsi
test:
    leaq   0(,%rdi,8), %rax
    testq  %rsi, %rsi
    jle    .L2
    movq   %rsi, %rax
    subq   %rdi, %rax
    movq   %rdi, %rdx
    andq   %rsi, %rdx
    cmpq   %rsi, %rdi
    cmovge %rdx, %rax
    ret
.L2:
    addq   %rsi, %rdi
    cmpq   $-2, %rsi
    cmovle %rdi, %rax
    ret

Fill in the missing expressions in the C code.

long test(long x, long y) {
    long val = 8*x;
    if (y > 0) {
        if (x < y)
            val = y-x;
        else
            val = x&y;
    } else if (y <= -2)
        val = x+y;
    return val;
}

Starting with C code of the form

long test(long x, long y, long z) {
    long val = __________;
    if (__________) {
        if (__________)
            val = __________;
        else
            val = __________;
    } else if (__________)
        val = __________;
    return val;
}

gcc generates the following assembly code:

long test(long x, long y, long z)
x in %rdi, y in %rsi, z in %rdx
test:
    leaq  (%rdi,%rsi), %rax
    addq  %rdx, %rax
    cmpq  $-3, %rdi
    jge   .L2
    cmpq  %rdx, %rsi
    jge   .L3
    movq  %rdi, %rax
    imulq %rsi, %rax
    ret
.L3:
    movq  %rsi, %rax
    imulq %rdx, %rax
    ret
.L2:
    cmpq  $2, %rdi
    jle   .L4
    movq  %rdi, %rax
    imulq %rdx, %rax
.L4:
    rep; ret

Fill in the missing expressions in the C code.

long test(long x, long y, long z) {
    long val = x+y+z;
    if (x < -3) {
        if (y < z)
            val = x*y;
        else
            val = y*z;
    } else if (x > 2)
        val = x*z;
    return val;
}

Address   Value      Register   Value
0x100     0xFF       %rax       0x100
0x104     0xAB       %rcx       0x1
0x108     0x13       %rdx       0x3
0x10C     0x11

Operand            Value
%rax               __________
0x104              __________
$0x108             __________
(%rax)             __________
4(%rax)            __________
9(%rax,%rdx)       __________
260(%rcx,%rdx)     __________
0xFC(,%rcx,4)      __________
(%rax,%rdx,4)      __________

Operand            Value    Comment
%rax               0x100    Register
0x104              0xAB     Absolute address
$0x108             0x108    Immediate
(%rax)             0xFF     Address 0x100
4(%rax)            0xAB     Address 0x104
9(%rax,%rdx)       0x11     Address 0x10C
260(%rcx,%rdx)     0x13     Address 0x108
0xFC(,%rcx,4)      0xFF     Address 0x100
(%rax,%rdx,4)      0x11     Address 0x10C
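The memory operands in this table can be checked with a toy model: compute the effective address Imm + base + index × scale, then look it up in the four memory cells given in the problem. The C sketch below is illustrative only; effective_address and mem_read are hypothetical names, and mem_read returns -1 for any address the problem does not list.

```c
/* Effective address for the operand form Imm(base,index,scale).
   Pass 0 for a missing base or index. */
long effective_address(long imm, long base, long index, long scale)
{
    return imm + base + index * scale;
}

/* Toy memory holding the four cells from the problem; -1 means "not listed". */
long mem_read(long addr)
{
    switch (addr) {
    case 0x100: return 0xFF;
    case 0x104: return 0xAB;
    case 0x108: return 0x13;
    case 0x10C: return 0x11;
    default:    return -1;
    }
}
```

With %rax = 0x100, %rcx = 0x1, and %rdx = 0x3, the operand 260(%rcx,%rdx) resolves to address 260 + 1 + 3 = 0x108, whose value is 0x13.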

src_t            dest_t           Instruction
long             long             movq (%rdi), %rax
                                  movq %rax, (%rsi)
char             int              __________
                                  __________
char             unsigned         __________
                                  __________
unsigned char    long             __________
                                  __________
int              char             __________
                                  __________
unsigned         unsigned char    __________
                                  __________
char             short            __________
                                  __________

src_t            dest_t           Instruction             Comments
long             long             movq (%rdi), %rax       Read 8 bytes
                                  movq %rax, (%rsi)       Store 8 bytes
char             int              movsbl (%rdi), %eax     Convert char to int
                                  movl %eax, (%rsi)       Store 4 bytes
char             unsigned         movsbl (%rdi), %eax     Convert char to int
                                  movl %eax, (%rsi)       Store 4 bytes
unsigned char    long             movzbl (%rdi), %eax     Read byte and zero-extend
                                  movq %rax, (%rsi)       Store 8 bytes
int              char             movl (%rdi), %eax       Read 4 bytes
                                  movb %al, (%rsi)        Store low-order byte
unsigned         unsigned char    movl (%rdi), %eax       Read 4 bytes
                                  movb %al, (%rsi)        Store low-order byte
char             short            movsbw (%rdi), %ax      Read byte and sign-extend
                                  movw %ax, (%rsi)        Store 2 bytes
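The same conversions can be observed at the C level. The sketch below uses signed char explicitly, because whether plain char is signed is implementation-defined; the table above corresponds to a platform where it is signed. The function names are hypothetical.

```c
#include <limits.h>

/* Signed byte to unsigned: the value is sign-extended first (compare movsbl). */
unsigned schar_to_unsigned(signed char c) { return (unsigned)c; }

/* Unsigned byte to long: the value is zero-extended (compare movzbl). */
long uchar_to_long(unsigned char c) { return (long)c; }

/* Wider unsigned to byte: only the low-order byte is kept (compare movb). */
unsigned char unsigned_to_uchar(unsigned u) { return (unsigned char)u; }
```

For example, schar_to_unsigned(-1) yields UINT_MAX, whereas uchar_to_long(0xFF) yields 255.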

