C952 Computer Architecture Chapter 4
Arithmetic Logic Unit (ALU)
Hardware that performs addition, subtraction, and usually logical operations such as AND and OR.
The leading 1 indicates that the bit pattern below represents a negative number. 1010 ... 0010 1100 True False Not enough information to determine what the bit pattern represents.
Not enough: The operation performed on the bit pattern must be known to determine the meaning of the bit pattern.
A calculation that leads to a number being too large to represent is called _____. overflow underflow a fraction
overflow: Floating-point numbers are represented with a fixed number of bits, thus can only represent a fixed range of numbers. Floating-point arithmetic can lead to numbers that are too large to represent given the number of bits available.
Goal: X6 × X7 (signed multiply high): _____ X5, X6, X7
SMULH: The SMULH instruction computes X6 × X7 and puts the upper 64 bits of the 128-bit signed product in register X5.
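The following is a minimal Python sketch of what MUL, SMULH, and UMULH compute. It is not LEGv8 code. It models 64-bit registers as Python integers holding bit patterns, and the helper names `to_signed64`, `mul`, `smulh`, and `umulh` are hypothetical. MUL keeps the lower 64 bits of the product, and those bits are the same for signed and unsigned operands. SMULH and UMULH return the upper 64 bits, and these differ because the operands' signs matter.

```python
MASK64 = (1 << 64) - 1  # registers are modeled as 64-bit patterns

def to_signed64(x: int) -> int:
    """Interpret a 64-bit register pattern as a two's complement integer."""
    return x - (1 << 64) if x & (1 << 63) else x

def mul(x: int, y: int) -> int:
    """Lower 64 bits of the product (identical for signed and unsigned)."""
    return (x * y) & MASK64

def smulh(x: int, y: int) -> int:
    """Upper 64 bits of the 128-bit signed product."""
    product = to_signed64(x) * to_signed64(y)  # exact product of the signed values
    return (product >> 64) & MASK64            # >> on Python ints keeps the sign

def umulh(x: int, y: int) -> int:
    """Upper 64 bits of the 128-bit unsigned product."""
    return (x * y) >> 64

# X6 = all ones (-1 when signed), X7 = 1
print(hex(smulh(MASK64, 1)), hex(umulh(MASK64, 1)))
```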
The add instruction, ADD, includes support for 8-bit, 16-bit, 32-bit, 64-bit, and 128-bit numbers. Operands can be integers, but not floating point numbers.
True: Check marks in COD Figure 3.18 indicate that ADD supports 8-bit, 16-bit, 32-bit, 64-bit, and 128-bit integer operands, but not floating-point numbers.
A floating-point value represented in IEEE 754 is typically an approximation.
True: An infinite variety of numbers cannot be represented by a fixed number of digits. The IEEE floating-point representation strives to get close to the actual number, while balancing the precision and range of a number that can be represented.
ulp is a measure of accuracy in floating point numbers.
True: ulp refers to units in the last place, and is a measure of the number of bits in error in the least significant bits of the significand.
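Here is a short Python illustration of approximation and ulp. It assumes Python 3.9+ for `math.ulp`, which reports the gap to the next double as a value rather than as a count of bits. The code checks two things about the double nearest 0.1: it is not exactly one tenth, and it still lies within half an ulp of one tenth, as round-to-nearest guarantees.

```python
import math
from fractions import Fraction

x = 0.1                      # the double nearest to one tenth
exact = Fraction(x)          # the exact binary value actually stored in x
error = abs(exact - Fraction(1, 10))
ulp = Fraction(math.ulp(x))  # spacing between x and the next representable double

print(error > 0)             # the stored value is only an approximation
print(error <= ulp / 2)      # round-to-nearest keeps the error within half an ulp
```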
Each step of the multiplication algorithm shifts the Multiplier register 1 bit to the _____.
right: The least significant bit of the multiplier (Multiplier0) determines whether the multiplicand is added to the Product register. Shifting the Multiplier 1 bit to the right in each step ensures the n-th bit is placed in Multiplier0.
IEEE 754 standard floating-point representation
(-1)^S × (1 + Fraction) × 2^(Exponent - Bias), where S is the sign bit
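The sketch below evaluates the formula above from raw bits. It handles normalized values only and rejects zero, denormals, infinities, and NaNs. `float_bits` and `decode` are hypothetical helpers. Python's `struct` module supplies two bit patterns: single precision (8-bit exponent, 23-bit fraction, bias 127) and double precision (11-bit exponent, 52-bit fraction, bias 1023).

```python
import math
import struct

def float_bits(x: float, fmt: str) -> int:
    """Return the IEEE 754 bit pattern of x as single ('f') or double ('d') precision."""
    int_fmt = {"f": ">I", "d": ">Q"}[fmt]
    return struct.unpack(int_fmt, struct.pack(">" + fmt, x))[0]

def decode(bits: int, exp_bits: int, frac_bits: int) -> float:
    """Evaluate (-1)^S x (1 + Fraction) x 2^(Exponent - Bias) for a normalized value."""
    sign = bits >> (exp_bits + frac_bits)
    exponent = (bits >> frac_bits) & ((1 << exp_bits) - 1)
    fraction = bits & ((1 << frac_bits) - 1)
    bias = (1 << (exp_bits - 1)) - 1              # 127 for single, 1023 for double
    if exponent == 0 or exponent == (1 << exp_bits) - 1:
        raise ValueError("zero, denormals, infinities, and NaNs are not handled here")
    significand = 1 + fraction / (1 << frac_bits)  # restore the hidden leading 1
    magnitude = math.ldexp(significand, exponent - bias)
    return -magnitude if sign else magnitude

print(decode(float_bits(-0.75, "f"), 8, 23))  # single: 1 + 8 + 23 bits
print(decode(float_bits(0.1, "d"), 11, 52))   # double: 1 + 11 + 52 bits
```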
0...0011 + 0...0010 = ?
0...0101: In other words, 3 + 2 is 5.
At the end of iteration 2, the 4-bit value of Multiplier is _____.
0000: Iteration 2, step 3 shifts the Multiplier right 1 bit. The blue text shows the Multiplier's value after the shift. The circle indicates the bit examined in the next iteration.
Iteration 1, step 2b restores the Remainder's value to _____.
00000111: The negative remainder indicates that the divisor is larger than the dividend, so the algorithm restores the original remainder value by adding the divisor back to the remainder.
At the end of iteration 2, the 8-bit value of Multiplicand is _____.
00001000: Iteration 2, step 2 shifts the Multiplicand left 1 bit. The blue text shows the Multiplicand's value after the shift.
The value placed in the quotient is _____. ? 1011/1100111
0: The divisor 1011 is larger than the first three bits of the dividend, 110, so a 0 is placed in the quotient.
Iteration 1, step 1 subtracts the Divisor from the Remainder and places the result in the Remainder. The Remainder's updated value is _____.
11100111: The result of the subtraction is negative, as indicated by a 1 in the leftmost bit.
The Multiplicand register is _____-bits wide. 64 128
128: Over 64 steps, the 64-bit multiplicand is shifted 64 bits to the left to mimic the paper-and-pencil method of shifting the intermediate product one digit to the left of the earlier intermediate product. A 128-bit register ensures the original 64-bits of the multiplicand are retained so that the intermediate product can be added with the Product register in each step.
The Product register is _____-bits wide. 64 128 256
128: The multiplication of two 64-bit values yields a 64 + 64-bit, or 128-bit, product.
Number of ARMv8 arithmetic core instructions
15: The arithmetic core instructions are highlighted in bold in the image above.
Fused multiply add
A floating-point instruction that performs both a multiply and an add, but rounds only once after the add.
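What distinguishes fused multiply add is that it rounds only once. The Python sketch below emulates this with exact `fractions.Fraction` arithmetic rather than a hardware FMA instruction, and the function names are hypothetical. The operands are chosen so that a*a contains a 2^-60 term that a double cannot hold. The unfused version therefore loses that term when it rounds the product.

```python
from fractions import Fraction

def mul_then_add(a: float, b: float, c: float) -> float:
    """Two roundings: the product is rounded, then the sum is rounded."""
    return a * b + c

def fused_mul_add(a: float, b: float, c: float) -> float:
    """One rounding: compute a*b + c exactly, then round once to a double."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)  # converting a Fraction rounds to the nearest double

a = 1.0 + 2.0 ** -30
c = -(1.0 + 2.0 ** -29)
print(mul_then_add(a, a, c))   # the 2**-60 term was lost when a*a was rounded
print(fused_mul_add(a, a, c))  # the single rounding keeps 2**-60
```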
Single precision
A floating-point value represented in a 32-bit word.
Double precision
A floating-point value represented in a 64-bit doubleword.
scientific notation
A notation that renders numbers with a single digit to the left of the decimal point.
Dividend
A number being divided.
Normalized
A number in floating-point notation that has no leading 0s.
Underflow (floating-point)
A situation in which a negative exponent becomes too large to fit in the exponent field.
Overflow (Floating-point)
A situation in which a positive exponent becomes too large to fit in the exponent field.
Exception
Also called interrupt. An unscheduled event that disrupts program execution; used to detect overflow.
Which LEGv8 arithmetic instructions should be generated for byte and halfword arithmetic operations? ADD, SUB, MUL, DIV ADD, SUB, MUL, DIV, using AND to mask result to 8 or 16 bits after each operation
ADD, SUB, MUL, DIV: The load instructions create a 64-bit value, so these arithmetic operations can be used as normal.
Interrupt
An exception that comes from outside of the processor. (Some architectures use the term interrupt for all exceptions.)
Floating-point hardware was included in the first computers.
False: John von Neumann refused to include floating-point hardware in the computer he built at Princeton. The loss of memory capacity and the increased complexity were arguments against the inclusion of floating-point hardware.
Floating point
Computer arithmetic that represents numbers in which the binary point is not fixed.
Overflow does not occur in unsigned integers.
False: Overflow can occur in unsigned integers, but since unsigned integers are commonly used for memory addresses, such overflows are often ignored.
Conditional compare that only compares if the initial condition is true
FCCMP: FCCMP (quiet compare) doesn't cause an exception whenever one of the operands is NaN. FCCMPE (signaling compare) does cause an exception when one of the operands is NaN.
Floating-point multiply instruction that produces a negative product
FNMUL: FNMUL is known as the floating-point scalar multiply-negate instruction.
The multiply (MUL) instruction ignores overflow, while the signed multiply high (SMULH) and unsigned multiply high (UMULH) instructions detect overflow. True False
False: All multiply instructions ignore overflow, so the software must detect overflow.
IEEE 754, released in 1985, is the most recent floating point standard.
False: Standards must be revisited periodically for updating. IEEE Std 754-2008, released in 2008, revised the 1985 standard and added types such as half precision and quad precision floating-point representations; it has since been superseded by IEEE Std 754-2019.
Compilers for the C and Fortran programming languages need to identify overflow values and notify the current program.
False: The C language (like many other languages) ignores overflow and simply continues executing. Fortran, however, does require that the program be notified of overflow.
Exponent
In the numerical representation system of floating-point arithmetic, the value that is placed in the exponent field.
Which LEGv8 load instructions should be generated for byte and halfword arithmetic operations? STUR, STURW LDURB, LDURH
LDURB, LDURH: These load instructions sign extend the most significant bits, which preserves the variable's value.
Which of the following instruction classifications occur most frequently in the execution of the SPEC CPU2006 floating-point benchmarks? LEGv8 core Remaining ARMv8 Pseudo ARMv8
LEGv8 core: The LEGv8 core and arithmetic core instructions account for 97% of instructions executed by the SPEC CPU2006 floating-point benchmarks. The LEGv8 core and arithmetic core similarly dominate the SPEC CPU2006 integer benchmarks. Thus, the material concentrates on these core instructions.
Which SIMD instruction version means the width of the elements in the destination register is twice the width of the elements in all source registers? Wide Long
Long: Conversely, wide means the width of the elements in the destination register and the first source register is twice the width of the elements in the second source register.
Move wide with zero
MOVZ: As noted in COD Section 2.10 (LEGv8 addressing for wide immediates ...), the move wide with zero instruction moves a 16-bit constant value into a register and zero extends the rest of the register's bits.
1 1 0 0 ... + 1 1 0 1 ... ------------- 1 0 0 1 ... -Overflow -No overflow
No Overflow: The numbers being added are negative, and the result is negative (as indicated by the 1 in the leftmost bit), as expected. No overflow has occurred.
0 0 0 1 ... + 1 0 0 1 ... ------------- 1 0 1 0 ... -Overflow -No overflow
No overflow: When positive and negative operands are added, overflow is impossible, because the result's magnitude will surely be less than the larger of the two operands. Ex: 200 + -999 = -799.
Indicate if the following number is in normalized scientific notation: 0.0514ten × 10^-6
No: A number in normalized scientific notation has no leading 0s. The number can be rewritten in normalized scientific notation by moving the decimal point to the right until a nonzero digit appears to the left of the decimal point, decrementing the exponent once for each position the decimal point moves: 5.14ten × 10^-8
0 1 1 0 ... + 0 1 1 1 ... ------------- 1 1 0 1 ... -Overflow -No overflow
Overflow: The numbers being added are positive, but the result is negative (as indicated by the 1 in the leftmost bit). Such a result is nonsense; overflow has occurred.
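The sign rules in the three examples above can be written as a small check. This Python sketch assumes the four bits shown in each example make up a complete 4-bit word. The names `add_overflows` and `unsigned_add_overflows` are hypothetical. For comparison, the sketch also detects unsigned overflow as a carry out of the most significant bit.

```python
def add_overflows(a: int, b: int, bits: int = 4) -> bool:
    """Signed overflow check for a two's complement add of two `bits`-wide patterns."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    total = (a + b) & mask  # the hardware keeps only `bits` bits of the sum
    # Overflow only when both operands share a sign and the sum's sign differs.
    return (a & sign) == (b & sign) and (total & sign) != (a & sign)

def unsigned_add_overflows(a: int, b: int, bits: int = 4) -> bool:
    """Unsigned overflow is a carry out of the most significant bit."""
    return (a + b) >> bits != 0

print(add_overflows(0b1100, 0b1101))  # two negatives, negative sum
print(add_overflows(0b0001, 0b1001))  # mixed signs
print(add_overflows(0b0110, 0b0111))  # two positives, negative sum
```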
The Multiplier register is removed and placed inside of the _____ register. Product Multiplicand
Product: The multiplier is placed in the right half of the Product register. As the Product register is shifted to the right to allocate room for the accumulated sum of intermediate products, the least significant bit of the multiplier is no longer needed and can be shifted out of the register.
Each step of the division algorithm shifts the Divisor register 1 bit to the _____. right left
Right: The divisor is placed in the left half of the 128-bit Divisor register to align the divisor with the dividend. The Divisor register is shifted right 1 bit in each step of the algorithm.
Signed divide
SDIV: The signed divide instruction performs signed integer division.
Subword parallelism
The AVX version is 3.85 times as fast, which is very close to the factor of 4.0 increase that you might hope for from performing four times as many operations at a time by using subword parallelism
Using 4-bit numbers to save space, multiply 2ten × 3ten, or 0010two × 0011two.
The figure below shows the value of each register for each of the steps labeled according to COD Figure 3.4 (The first multiplication algorithm ...), with the final value of 0000 0110two or 6ten. Color is used to indicate the register values that change on that step, and the bit circled is the one examined to determine the operation of the next step.
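Below is a Python sketch of the first multiplication algorithm. It assumes an n-bit multiplier and 2n-bit Multiplicand and Product registers, with n = 4 to match the 2ten × 3ten example. `multiply_v1` is a hypothetical name. The function records Product, Multiplicand, and Multiplier after each iteration, so the values can be compared with the iteration questions in this set.

```python
def multiply_v1(multiplicand: int, multiplier: int, n: int = 4):
    """First multiplication algorithm: 2n-bit Multiplicand and Product, n-bit Multiplier.

    Returns the product and a per-iteration trace of (Product, Multiplicand, Multiplier).
    """
    mask2n = (1 << (2 * n)) - 1
    product = 0
    trace = []
    for _ in range(n):
        if multiplier & 1:                               # step 1: test Multiplier0
            product = (product + multiplicand) & mask2n  # step 1a: add to Product
        multiplicand = (multiplicand << 1) & mask2n      # step 2: shift Multiplicand left
        multiplier >>= 1                                 # step 3: shift Multiplier right
        trace.append((product, multiplicand, multiplier))
    return product, trace

product, trace = multiply_v1(0b0010, 0b0011)
for i, (p, mcand, mplier) in enumerate(trace, start=1):
    print(f"iteration {i}: Product={p:08b} Multiplicand={mcand:08b} Multiplier={mplier:04b}")
```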
Guard
The first of two extra bits kept on the right during intermediate calculations of floating-point numbers; used to improve rounding accuracy.
Divisor
The number by which another number is divided.
Units in the last place (ulp)
The number of bits in error in the least significant bits of the significand between the actual number and the number that can be represented.
Quotient
The primary result of a division; a number that when multiplied by the divisor and added to the remainder produces the dividend.
Remainder
The secondary result of a division; a number that when added to the product of the quotient and the divisor produces the dividend.
Fraction
The value, generally between 0 and 1, placed in the fraction field. The fraction is also called the mantissa.
Goal: X6 / X7 (unsigned division): _____ X5, X6, X7
UDIV: The UDIV instruction computes X6 / X7, and puts the quotient in register X5.
Indicate if the following numbers are in scientific notation. 1.1101two × 2^3
Yes: The notation contains a single digit to the left of the binary point. The standard form for representing binary numbers in scientific notation is 1.xxxxxxxxxtwo × 2^yyyy, where the base is 2 and the "." is referred to as a binary point.
Indicate if the following numbers are in scientific notation. 9.8ten × 10^12
Yes: The notation contains a single digit to the left of the decimal point. Scientific notation provides a method to write numbers that are too large to be conveniently written in decimal form.
Indicate if the following numbers are in scientific notation. 3.56ten × 10^-6
Yes: The notation contains a single digit to the left of the decimal point. Scientific notation provides a method to write numbers that are too small to be conveniently written in decimal form.
The prefixes U, S, and F refer to _____. data types operand sizes
data types: The prefixes are for instructions that use unsigned (U), signed (S), and floating-point (F) data types.
Rewriting programs that contain _____ arithmetic operations to execute in parallel may affect the result. two's complement integer floating-point
floating-point: Floating-point addition is not associative. The parallel version of the program likely executes arithmetic operations in a different order than the sequential version, so the two versions may produce different results.
Increasing the size of the _____ used to represent a floating-point number impacts the number's precision. fraction exponent
fraction: Floating-point numbers are represented using a fixed number of bits, so compromise is needed between the size of the fraction and the size of the exponent.
Each step of the multiplication algorithm shifts the Multiplicand register 1 bit to the _____. right left
left: The left shift mimics the paper-and-pencil method of shifting the intermediate product one digit to the left of the earlier intermediate product. The intermediate product in the Multiplicand register is then added to the sum being accumulated in the Product register.
Floating-point addition is _____. associative not associative
not associative: Adding a small number to a much larger number results in an approximated sum due to limited precision. Ex: 1.5ten × 10^38 + 1.0 = 1.5ten × 10^38, where the 1.0 is lost due to the limited number of bits available to represent the sum.
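The example above can be reproduced with Python doubles instead of single precision. The constant 1.0 is still far below the precision of a double near 1.5 × 10^38, so grouping the same three operands differently changes the result:

```python
x, y, z = -1.5e38, 1.5e38, 1.0

left = (x + y) + z   # (-1.5e38 + 1.5e38) + 1.0 = 0.0 + 1.0
right = x + (y + z)  # 1.5e38 + 1.0 rounds back to 1.5e38, so the 1.0 is lost

print(left, right)
```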
The dgemm() function operates on single dimensional arrays A[n], B[n], and C[n].
True: DGEMM typically performs the matrix multiply of C = C + A * B where A, B, and C are square matrices. The dgemm() function instead represents the square matrices as single dimensional arrays to improve the performance.
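Here is a Python sketch of the unoptimized DGEMM loop nest. It assumes column-major storage, so element (i, j) of each n × n matrix sits at index i + j * n of its one-dimensional array. This is a transliteration for illustration, not the book's C listing.

```python
def dgemm(n: int, A: list, B: list, C: list) -> None:
    """C = C + A * B for n x n matrices stored as 1-D lists in column-major order."""
    for i in range(n):
        for j in range(n):
            cij = C[i + j * n]                      # cij = C[i][j]
            for k in range(n):
                cij += A[i + k * n] * B[k + j * n]  # cij += A[i][k] * B[k][j]
            C[i + j * n] = cij                      # C[i][j] = cij

# 2 x 2 example: A = [[1, 2], [3, 4]], B = [[5, 6], [7, 8]], C starts at zero
A = [1.0, 3.0, 2.0, 4.0]  # column-major
B = [5.0, 7.0, 6.0, 8.0]
C = [0.0] * 4
dgemm(2, A, B, C)
print(C)
```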
Multiple floating-point operands can be packed into a 128-bit SSE2 register.
True: Each 128-bit SSE2 register can hold four single precision or two double precision floating-point values. The packed format enables a single data transfer instruction to move multiple operands, and a single arithmetic instruction to perform multiple operations.
Each iteration of the resulting x86 assembly language calculates C = C + A * B for one element of C.
True: Each xmm register contains one operand. Line 6 multiplies one element of A with one element of B. Line 9 adds the product of A * B to one element of C, then line 12 stores the sum back into the C element.
Reduced floating-point accuracy and the high cost of converting code from the IBM 7094 to the S/360 led to the formation of a subcommittee of IBM mainframe users to propose improvements to the S/360 floating point.
True: IBM updated the System/360 floating point based on many of the subcommittee's recommendations, then retrofitted existing System/360s in the field at considerable expense.
Intermediate calculations of floating-point numbers append two extra bits to improve rounding accuracy.
True: IEEE 754 uses two extra bits on the right during intermediate additions. The first extra bit is called the guard digit, and the rightmost extra bit is called the round digit.
Integers can be represented as bits.
True: Information is kept in computer hardware as a series of high and low electronic signals, so a base 2 representation is a natural way to represent integers. Ex: 12ten can be represented as 1100two.
The ARM ADD, ADDI, and SUB instructions may result in overflow.
True: Overflow results when a sum or difference cannot be represented in 64-bits, thus addition and subtraction instructions can result in overflow.
Rounding may require the result to be normalized again.
True: Rounding the significand to a fixed number of digits may yield a non-normalized result. Ex: if rounding 9.999 × 10^4 adds 1 to its last digit, the carry ripples through the string of 9s and yields 10.00 × 10^4. The result must then be normalized again (to 1.000 × 10^5) and, if necessary, rounded again.
Shifting the bits of an unsigned integer right by n bits divides the integer by 2^n.
True: Shifting is typically faster than dividing. Replacing a divide operation with a shift operation is a type of optimization known as operator strength reduction.
In an effort to achieve similar floating-point accuracy, programmers converting code from the IBM 7094 to the S/360 replaced single precision declarations with double precision declarations.
True: The IBM S/360's larger memory, narrower single precision words, and cruder arithmetic produced inaccurate results when compared to the IBM 7094. The fastest method to recover accuracy was to move from a single precision to a double precision representation.
Kahan was honored with the Turing Award in 1989 for the benefits conferred upon the computing industry through standardizing floating point.
True: The Turing Award is a prestigious award given to individuals who have made significant technical contributions of lasting importance in the field of computing. Kahan's contributions have provided a considerable degree of fidelity to floating point arithmetic across many computers, enabling the portability of software. (A.M. Turing Award recipients (ACM))
A _____ precision floating-point number is represented with one LEGv8 doubleword. single double
double: A double precision floating-point number is represented with one LEGv8 doubleword, or 64-bits. The exponent is increased to 11-bits to enable representation of a larger range of values; the fraction is increased to 52-bits to enable greater precision.
Complete the outer loop of the optimized C version of DGEMM: for (int i = 0; i < n; _____ ) ++i i+=4
i+=4: Each iteration of the outer loop calculates and stores 4 elements of matrix C, thus the optimized C version increments i by 4 instead of by 1.
IEEE 754 is a _____ for floating-point representation and computation.
standard: The IEEE 754 standard specifies floating-point representations, computation, rounding, and exceptions, among other things. The standard establishes specifications and procedures to improve the ease of porting floating-point programs and the quality of computer arithmetic across different computing platforms.
Each SIMD version is denoted by a _____. prefix suffix
suffix: The suffixes for wide, long, and narrow are W, L, and N, respectively.
If the Remainder is negative, then the original Remainder value is restored by adding _____ to the Remainder. Control the Divisor
the Divisor: The Divisor was subtracted from Remainder in a previous step, which resulted in a negative value. Adding the Divisor back to Remainder will restore Remainder's original value.
The initial 8-bit value of Product is _____.
00000000: The last column of the above table shows the value of Product after each step within an iteration. In iteration 0, Product is initialized to 0000 0000.
The divide operation results in a 4-bit Quotient of 0011 and an 8-bit Remainder of _____.
00000001: The last row of the table shows the final values of the Quotient and Remainder registers. The result can be checked as follows: Dividend = Quotient × Divisor + Remainder, or 7 = 3 × 2 + 1.
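The division walkthrough can be sketched in Python as the first (restoring) division algorithm. The sketch assumes 4-bit operands with 8-bit Divisor and Remainder registers. `divide_v1` is a hypothetical name. It detects a negative remainder with Python's signed integers instead of inspecting the leftmost bit. Dividing 0111two by 0010two takes n + 1 = 5 iterations.

```python
def divide_v1(dividend: int, divisor: int, n: int = 4):
    """First (restoring) division algorithm for n-bit unsigned operands.

    Remainder and Divisor are 2n bits wide; the divisor starts in the left half.
    Returns (quotient, remainder, trace), where trace holds (Quotient, Divisor,
    Remainder) after each of the n + 1 iterations.
    """
    remainder = dividend
    divisor_reg = divisor << n
    quotient = 0
    trace = []
    for _ in range(n + 1):
        remainder -= divisor_reg              # step 1: Remainder = Remainder - Divisor
        if remainder < 0:                     # negative (leftmost bit would be 1)
            remainder += divisor_reg          # step 2b: restore the Remainder
            quotient = quotient << 1          #          and shift a 0 into Quotient
        else:
            quotient = (quotient << 1) | 1    # step 2a: shift a 1 into Quotient
        divisor_reg >>= 1                     # step 3: shift Divisor right 1 bit
        trace.append((quotient & ((1 << n) - 1), divisor_reg, remainder))
    return quotient & ((1 << n) - 1), remainder, trace

q, r, trace = divide_v1(0b0111, 0b0010)  # 7 / 2
print(f"Quotient={q:04b} Remainder={r:08b}")
```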
At the end of iteration 1, the 8-bit value of Product is _____.
00000010: Multiplier0 = 1, so iteration 1 first executes step 1a. The multiplicand and product are added, and the sum is placed into the Product register: 0000 0010 + 0000 0000 = 0000 0010
The initial 8-bit value of Remainder is _____.
00000111: The Remainder register is initialized with the dividend, or 0000 0111.
At the end of iteration 4, the 4-bit value of Quotient is _____.
0001: Remainder ≥ 0, so Step 2a shifts the Quotient left and sets the rightmost bit to 1. The table shows the updated value of Quotient in blue.
The initial 8-bit value of Divisor is _____
00100000: The divisor is placed in the left half of the Divisor register. The first row shows that Divisor is initialized to 0010 0000.
If the Remainder is negative, a _____ is shifted into the least significant bit of the Quotient register. 0 1
0: A negative remainder indicates that the divisor did not go into the dividend, so a 0 is placed in the quotient.
1.000 × 2^3 + 0.011 × 2^5 = ? 1.010 × 2^4 0.101 × 2^5 10.001 × 2^5
1.010 × 2^4: 1.000 × 2^3 becomes 0.010 × 2^5. Then 0.010 × 2^5 + 0.011 × 2^5 = 0.101 × 2^5. Normalizing the result yields 1.010 × 2^4.
1.001 × 2^-4 + 1.000 × 2^-6 = ? 10.001 × 2^-4 1.011 × 2^-4
1.011 × 2^-4: 1.000 × 2^-6 becomes 0.010 × 2^-4 to match the larger exponent. Then the significands are added; the base and the common exponent carry over unchanged.
1.010 × 2^-3 + 0.011 × 2^-3 = ? 1.101 1.101 x 2^-6 1.101 x 2^-3
1.101 × 2^-3: The significands are added; the base and the common exponent carry over unchanged.
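The three sums above all follow the same steps: align, add, then normalize. This Python sketch assumes positive operands with 1.xxx significands stored as integers scaled by 2^3, so 1.010two is 0b1010. `fp_add` is a hypothetical name. Bits shifted off the right are dropped rather than rounded.

```python
FRAC_BITS = 3  # the examples use 1.xxx significands: three fraction bits

def fp_add(sig_a: int, exp_a: int, sig_b: int, exp_b: int):
    """Add two positive binary numbers given as (significand, exponent) pairs.

    Bits shifted off the right are simply dropped (no guard/round/sticky rounding).
    """
    # Step 1: shift the operand with the smaller exponent right to align binary points.
    if exp_a < exp_b:
        sig_a, exp_a, sig_b, exp_b = sig_b, exp_b, sig_a, exp_a
    sig_b >>= exp_a - exp_b
    # Step 2: add the significands; the larger exponent carries over.
    sig, exp = sig_a + sig_b, exp_a
    # Step 3: normalize so exactly one 1 sits left of the binary point.
    while sig >= 2 << FRAC_BITS:         # 1x.xxx: shift right, increment exponent
        sig >>= 1
        exp += 1
    while sig and sig < 1 << FRAC_BITS:  # 0.xxx: shift left, decrement exponent
        sig <<= 1
        exp -= 1
    return sig, exp

sig, exp = fp_add(0b1000, 3, 0b0011, 5)  # 1.000 x 2^3 + 0.011 x 2^5
print(f"{sig >> FRAC_BITS}.{sig & 0b111:03b} x 2^{exp}")
```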
ARMv8 added _____-bit registers to support subword parallelism. 64 128
128: ARMv8 added 32 new 128-bit registers, or 512 bytes of new registers.
The IEEE formed a committee in _____ to begin drafting a floating-point standard. 1976 1977 1985
1977: The diversity in floating-point arithmetic implementations for mainframes and microprocessors was disquieting. Robert G. Stewart, with the support of industry, formed a subcommittee to begin standardizing floating-point arithmetic to improve accuracy and portability.
The value placed in the quotient is _____. ? 1011/1100111
1: The divisor 1011 is smaller than the first four bits of the dividend, 1100, so a 1 is placed in the quotient.
ARMv8 has _____ assembly-language SIMD instructions. 245 63
245: ARMv8 has 245 assembly-language SIMD instructions.
Each iteration of the multiplication algorithm consists of _____ basic steps. 3 7 64
3: The first step checks Multiplier0 to determine whether the multiplicand is added to the Product register. The second step shifts the Multiplicand register left 1 bit. The third step shifts the Multiplier register right 1 bit.
Each iteration of the division algorithm consists of _____ basic steps. 3 8 65
3: The first step determines if the divisor is smaller than the dividend. The second step sets the rightmost bit of Quotient to 1 or 0 and restores the Remainder register if Remainder was negative. The third step shifts the Divisor register right 1 bit.
The resulting x86 assembly language output contains _____ floating-point instructions. 5 12
5: The 5 floating-point instructions start with a v and use the sd variation of the instruction. sd stands for Scalar Double precision floating point, and indicates that one 64-bit operand is stored in a 128-bit xmm register.
Number of assembly-language integer arithmetic and floating-point instructions in ARMv8
63: The 63 instructions are divided into 10 categories in the image above.
The Multiplier register is _____-bits wide. 64 128
64: Step 1 of the multiplication algorithm checks the value of Multiplier0 to determine the next step. Once Multiplier0 is checked, that bit is no longer needed and can be shifted out of the register without impacting the multiplication.
Sticky bit
A bit used in rounding in addition to guard and round that is set whenever there are nonzero bits to the right of the round bit.
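The sketch below shows how guard, round, and sticky bits drive round-to-nearest-even, the IEEE 754 default rounding mode. It assumes the significand and its extra bits are held together in one Python integer, and `round_nearest_even` is a hypothetical name.

```python
def round_nearest_even(value: int, extra: int) -> int:
    """Drop the low `extra` bits of `value`, rounding to nearest with ties to even.

    Guard = first dropped bit, round = second dropped bit,
    sticky = OR of every dropped bit to the right of the round bit.
    """
    if extra < 2:
        raise ValueError("need at least guard and round bits")
    kept = value >> extra
    guard = (value >> (extra - 1)) & 1
    round_bit = (value >> (extra - 2)) & 1
    sticky = int((value & ((1 << (extra - 2)) - 1)) != 0)
    # Round up if past halfway, or exactly halfway and the kept value is odd.
    if guard and (round_bit or sticky or kept & 1):
        kept += 1
    return kept  # a carry out of the top bit would require renormalizing

print(f"{round_nearest_even(0b1011100, 3):04b}")  # exactly halfway, odd: rounds up
```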
Floating-point hardware became standardized by the late 1950s.
False: Floating-point hardware was included in many computers by the late 1950s. However, each implementation was different, so floating-point operations behaved differently on each computer.
Numerical software in the late 1950s was portable because the software consisted entirely of computer-independent mathematical formulas.
False: Floating-point operations produced different results depending on the computer that executed the code. The burden fell upon software developers to implement clever tricks to deliver results correct in all but the last several bits.
The ALU adds the 128-bit Product and 64-bit Multiplicand, and then stores the result into the Product register.
False: The ALU adds the upper 64-bits of the Product with the 64-bit Multiplicand. The result is then stored in the upper 64-bits of the Product register. The Product register is then shifted right 1 bit before the next step.
Faster division, like faster multiplication, can be achieved by increasing the number of ALUs.
False: The Divisor is subtracted from the Remainder in each step of the algorithm. The sign of the difference is not known beforehand, and is needed to determine the next step of the algorithm. Instead, faster division is achieved through prediction techniques that attempt to produce more than one bit of the quotient per step of the algorithm.
The small ALU is used to add the significands.
False: The Small ALU determines which operand has the larger exponent and by how much. The exponent difference specifies how to pass the operands to the Big ALU, as well as how many positions the smaller operand is shifted to the right.
The YMM extension was introduced to support floating-point numbers represented in a 256-bit representation.
False: The YMM registers pack additional floating point values into a single register, either eight 32-bit or four 64-bit floating-point values. The extension supports SIMD, or single instruction, multiple data, which strives to perform a larger number of operations in parallel.
Shifting the bits of a signed integer right by n bits divides the integer by 2^n if the shifter extends the sign bit instead of shifting in 0s.
False: The above example demonstrates shifting -5ten two positions to the right produces -2ten rather than the expected value of -1ten. In two's complement representation, the sign bit also contributes to a value's magnitude, so the same operator strength reduction techniques do not apply.
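Python's `>>` on integers behaves like a shifter that extends the sign bit, so it reproduces both cards. For nonnegative values, the shift matches division by 2^n. For -5, it rounds toward negative infinity instead of toward zero. Both function names are hypothetical, and `truncating_divide` stands in for integer division that truncates toward zero, as C's does.

```python
def shift_divide(x: int, n: int) -> int:
    """Arithmetic shift right by n (Python's >> on ints extends the sign)."""
    return x >> n

def truncating_divide(x: int, n: int) -> int:
    """Division by 2**n rounding toward zero, as C integer division does."""
    q = abs(x) // (1 << n)
    return q if x >= 0 else -q

for x in (20, 5, -5):
    print(x, shift_divide(x, 2), truncating_divide(x, 2))
```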
The following statements produce the same results on the CDC 6600 computer: if (x == 0.0) y = 17.0 else y = z/x if (1.0 * x == 0.0) y = 17.0 else y = z/x True False
False: The primary goal of CDC was to produce a fast computer, so accuracy was sacrificed for speed. The above statements produce different results due to optimizations to the underlying implementation. The latter statement is a trick used by programmers to deal with the variations in how the adder, multiplier, and divider hardware handled tiny numbers.
The improved version of the division hardware does not require a quotient to perform a divide operation.
False: The quotient is still needed. The improved division hardware reduces the total hardware needed to perform the divide operation by combining the Quotient register with the right half of the Remainder register.
ARMv8 has little support for subword parallelism beyond data transfer instructions.
False: The rising popularity of multimedia applications has led to support for data transfer, arithmetic, and logical/compare instructions. More than 500 instructions have been added to support subword parallelism.
The unsigned multiply high (UMULH) instruction can return the upper 64 bits of a 128-bit product.
False: The signed multiply high (SMULH) and unsigned multiply high (UMULH) instructions can both return the upper 64 bits of a 128-bit product depending on the types of multiplier and multiplicand used by the programmer.
The divide operations, SDIV and UDIV, detect overflow and division by 0.
False: The software must check the divisor to detect division by 0, as well as check the division operation for overflow.
The speedup comes from reducing the size of the registers and ALU.
False: The speedup comes from shifting the operands and the quotient simultaneously with the subtraction.
The release of the IEEE 754 standard made the implementation of floating point hardware simple for manufacturers.
False: To implement IEEE 754 correctly demands extraordinarily diligent attention to detail; to make it run fast demands extraordinarily competent ingenuity of design.
Round
Method to make the intermediate floating-point result fit the floating-point format; the goal is typically to find the nearest number that can be represented in the format. It is also the name of the second of two extra bits kept on the right during intermediate floating-point calculations, which improves rounding accuracy.
Indicate if the following numbers are in scientific notation. 100.10two × 2^-6
No: The notation contains three digits to the left of the binary point. The binary number can be rewritten in scientific notation by moving the binary point two positions to the left and adding two to the exponent: 1.0010two × 2^-4
Indicate if the following numbers are in scientific notation. 25.11ten × 10^-8
No: The notation contains two digits to the left of the decimal point. The number can be rewritten in scientific notation by moving the decimal point one position to the left and incrementing the exponent: 2.511ten × 10^-7
Which LEGv8 store instructions should be generated for byte and halfword arithmetic operations? SBU, SHU STURB, STURH
STURB, STURH: The load and arithmetic operations preserve the byte and halfword values, so the values are simply stored from registers to memory.
Subtract immediate
SUBI: The subtract immediate instruction subtracts a constant value from another value in a register.
Instructions can be represented as bits.
True: The binary language that a machine understands is called the machine language. An assembler converts a symbolic version of an instruction into the binary version. Ex: ADD XZR, X19, X20 becomes 10001011000101000000001001111111
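Here is a Python sketch of how an assembler might pack the fields of a LEGv8 R-format instruction. It assumes the field layout opcode (11 bits), Rm (5), shamt (6), Rn (5), Rd (5), and the ADD opcode 10001011000two (1112ten). `encode_r_format` is a hypothetical helper and is not part of any real assembler.

```python
def encode_r_format(opcode: int, rm: int, shamt: int, rn: int, rd: int) -> int:
    """Pack LEGv8 R-format fields: opcode(11) | Rm(5) | shamt(6) | Rn(5) | Rd(5)."""
    if not (0 <= opcode < 1 << 11 and 0 <= shamt < 1 << 6):
        raise ValueError("opcode or shamt out of range")
    if not all(0 <= r < 32 for r in (rm, rn, rd)):
        raise ValueError("register numbers must be 0-31")
    return (opcode << 21) | (rm << 16) | (shamt << 10) | (rn << 5) | rd

ADD_OPCODE = 0b10001011000  # 1112 decimal
XZR = 31                    # the zero register is register 31

# ADD XZR, X19, X20  ->  Rd = XZR, Rn = X19, Rm = X20
word = encode_r_format(ADD_OPCODE, rm=20, shamt=0, rn=19, rd=XZR)
print(f"{word:032b}")
```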
The improved version of the division hardware halves the width of the adder and divisor.
True: The improved hardware reduces unused portions of registers and adders by reducing the Divisor register and ALU to 64 bits. The division algorithm is updated to subtract the Divisor from the left half of the Remainder and place the result in the left half of the Remainder.
The multiplication hardware supports signed multiplication.
True: The multiplier and multiplicand are first converted to positive numbers, and then multiplied using the same multiplication hardware. The product is negated if the multiplier and multiplicand signs disagree.
The "Increment or decrement" and "Shift left or right" hardware normalize the sum.
True: The normalization step shifts the sum left or right until a single non-zero value appears to the left of the binary point. Shifting the sum right increments the exponent, whereas shifting left decrements the exponent.
The division hardware supports signed division.
True: The operands are first converted to positive numbers, and then divided using the same division hardware. The quotient is negated if the operands' signs disagree. The sign of the non-zero remainder is set to match the dividend.
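The signed multiply and signed divide cards follow the same pattern: operate on magnitudes, then fix the signs. In this Python sketch, `abs`, `*`, and `divmod` on nonnegative values stand in for the unsigned hardware. The function names are hypothetical.

```python
def signed_multiply(a: int, b: int) -> int:
    """Multiply magnitudes, then negate if the operand signs disagree."""
    product = abs(a) * abs(b)  # stands in for the unsigned multiply hardware
    return -product if (a < 0) != (b < 0) else product

def signed_divide(dividend: int, divisor: int) -> tuple:
    """Divide magnitudes, then fix the signs of the quotient and remainder."""
    if divisor == 0:
        raise ZeroDivisionError("software must check for division by 0")
    q, r = divmod(abs(dividend), abs(divisor))  # stands in for the unsigned divider
    if (dividend < 0) != (divisor < 0):
        q = -q  # quotient is negated when the operand signs disagree
    if dividend < 0:
        r = -r  # a nonzero remainder takes the dividend's sign
    return q, r

for a, b in ((7, 2), (-7, 2), (7, -2), (-7, -2)):
    print(a, b, signed_divide(a, b))
```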
An inaccuracy in floating-point division cost Intel an estimated $500 million.
True: The processor flaw led to a large amount of negative publicity, prompting Intel to offer consumers a replacement processor in which the floating-point divide flaw was corrected.
A parallel program containing floating-point arithmetic operations executing on 10 processors may produce a different result than the same program executing on 1,000 processors.
True: The program's floating-point operations will be scheduled differently to execute on 1,000 processors, compared to 10 processors, to take advantage of the availability of additional resources. Because floating-point addition is not associative, changing the order of floating-point operations may yield different results.
The refined multiplication hardware halves the width of the Multiplicand register from 128-bits to 64-bits.
True: The refined multiplication hardware shifts the Product register right 1 bit in each step instead of shifting the Multiplicand register left 1 bit in each step. Because the Multiplicand is no longer shifted left, the register width can be reduced.
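Below is a Python sketch of the refined multiplication hardware. It assumes an n-bit Multiplicand and ALU, plus a 2n-bit Product register whose right half starts out holding the multiplier. `multiply_refined` is a hypothetical name. The carry out of the n-bit add is kept as an extra bit above the Product register, so the next right shift moves it into the left half.

```python
def multiply_refined(multiplicand: int, multiplier: int, n: int = 4) -> int:
    """Refined multiplication: n-bit Multiplicand and ALU, 2n-bit Product register.

    The multiplier starts in the right half of Product; each step adds the
    Multiplicand to the left half when Product0 is 1, then shifts Product right.
    """
    mask_n = (1 << n) - 1
    product = multiplier & mask_n               # right half holds the multiplier
    for _ in range(n):
        if product & 1:                         # test the multiplier bit now in Product0
            upper = (product >> n) + multiplicand  # n-bit add; carry lands in bit n
            product = (upper << n) | (product & mask_n)
        product >>= 1                           # shift Product right 1 bit
    return product

print(f"{multiply_refined(0b0010, 0b0011):08b}")
```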
MULSS, MULPS, MULSD, and MULPD are valid x86 instructions.
True: The table denotes variations of the MUL instruction with curly brackets {}. The variations indicate the size and number of floating-point values within a 128-bit SSE2 register.
_____ is a C intrinsic. _mm256_load_pd() dgemm() vbroadcastsd
_mm256_load_pd(): The optimized C version uses the __m256d data type, which tells the compiler the variable will hold 4 double-precision floating-point values. The _mm256_load_pd() intrinsic takes advantage of the packed floating-point values and tells the compiler to load 4 double-precision floating-point numbers in parallel.
The field of _____ examines how to solve mathematical problems using imprecision and limited representation of data.
numerical analysis: Parallelized code containing floating-point operations may yield varied results compared to sequential code. Moreover, the same floating-point code may yield varied results given a different number of processors available to execute the code. Numerical analysis is an important field that has developed methods to determine if results produced by the code are credible.
Subword parallelism takes advantage of byte- and halfword-sized data by _____. -turning off the unused bits of a 128-bit adder -splitting the add operation to execute across multiple cycles -partitioning the adder to perform multiple operations in parallel
partitioning the adder to perform multiple operations in parallel: The adder can be configured to add sixteen 8-bit operands, eight 16-bit operands, four 32-bit operands, or two 64-bit operands simultaneously in one instruction.
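Software can imitate the same idea by stopping carries at lane boundaries. This Python sketch assumes a 128-bit word split into 8-bit lanes by default, and `packed_add`, `pack`, and `unpack` are hypothetical helpers. It adds the low bits of every lane at once and computes each lane's top bit separately with XOR, so no carry crosses into the next lane.

```python
def packed_add(a: int, b: int, lane_bits: int = 8, width: int = 128) -> int:
    """Add every lane of a and b independently, wrapping within each lane."""
    full = (1 << width) - 1
    top = 0
    for lane in range(width // lane_bits):
        top |= 1 << (lane * lane_bits + lane_bits - 1)  # top bit of every lane
    low_sum = (a & ~top & full) + (b & ~top & full)     # cannot carry across lanes
    return (low_sum ^ ((a ^ b) & top)) & full           # fix up each lane's top bit

def pack(values, lane_bits=8):
    """Place values[i] in lane i (lane 0 is the least significant)."""
    return sum((v & ((1 << lane_bits) - 1)) << (i * lane_bits) for i, v in enumerate(values))

def unpack(word, lanes, lane_bits=8):
    """Split a packed word back into a list of lane values."""
    return [(word >> (i * lane_bits)) & ((1 << lane_bits) - 1) for i in range(lanes)]

a = pack([250, 1, 128, 7] + [0] * 12)
b = pack([10, 2, 128, 8] + [0] * 12)
print(unpack(packed_add(a, b), 16)[:4])  # each lane wraps modulo 256 on its own
```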
Compiling the optimized C code replaces most of the x86 floating-point instructions from a sd variation to a _____ variation of the instruction. pd ymm __m256d
pd: The pd, or parallel double, variation utilizes a wider 256-bit YMM register that enables four 64-bit floating-point operations in parallel.
A situation where an operation yields a number that is too small to represent with the fixed number of bits available is called _____.
underflow: Similarly, overflow occurs when an operation yields a number that is too big to represent.
A graduate student in aeronautical engineering used the IBM 7090 to simulate a new _____. After many simulations, the student was disheartened because the simulation predicted an abrupt onset of stall. wing design logarithm program
wing design: Even after repeating the simulation in double precision, the results indicated the new wing design would fail. Kahan's replacement for IBM's logarithm program (ALOG) helped to expose the inaccuracies in the simulation, and the student was able to successfully build and deploy the new wing design.