C952 Computer Architecture Chapter 4

Arithmetic Logic Unit (ALU)

Hardware that performs addition, subtraction, and usually logical operations such as AND and OR.

The leading 1 indicates that the bit pattern below represents a negative number. 1010 ... 0010 1100 True False Not enough information to determine what the bit pattern represents.

Not enough: The operation performed on the bit pattern must be known to determine the meaning of the bit pattern.
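
To make the point concrete, here is a minimal Python sketch that reads one 32-bit pattern three different ways. The pattern `0xC0000000` and the helper name `interpretations` are illustrative assumptions, not taken from the card (whose pattern is elided).

```python
import struct

def interpretations(bits32: int) -> dict:
    """Read one 32-bit pattern as unsigned, two's complement, and IEEE 754 single."""
    raw = bits32.to_bytes(4, "big")
    return {
        "unsigned": int.from_bytes(raw, "big", signed=False),
        "signed": int.from_bytes(raw, "big", signed=True),
        "float32": struct.unpack(">f", raw)[0],
    }

# Hypothetical pattern with a leading 1: 1100 0000 ... 0000
views = interpretations(0xC0000000)
print(views)
```

The same bits yield a large positive number, a negative integer, or a negative floating-point value, depending only on which interpretation the operation applies.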

A calculation that leads to a number being too large to represent is called _____. overflow underflow a fraction

overflow: Floating-point numbers are represented with a fixed number of bits, thus can only represent a fixed range of numbers. Floating-point arithmetic can lead to numbers that are too large to represent given the number of bits available.

Goal: X6 × X7 (signed multiply high) _____ X5, X6, X7

SMULH: The SMULH instruction computes X6 × X7, and puts the upper 64-bits of 128-bit signed product in register X5.

The add instruction, ADD, includes support for 8-bit, 16-bit, 32-bit, 64-bit, and 128-bit numbers. Operands can be integers, but not floating point numbers.

True: Check marks in COD Figure 3.18 indicate that ADD supports 8-bit, 16-bit, 32-bit, 64-bit, and 128-bit integer operands, but not floating-point numbers.

A floating-point value represented in IEEE 754 is typically an approximation.

True: An infinite variety of numbers cannot be represented by a fixed number of digits. The IEEE floating-point representation strives to get close to the actual number, while balancing the precision and range of a number that can be represented.

ulp is a measure of accuracy in floating point numbers.

True: ulp refers to units in the last place, and is a measure of the number of bits in error in the least significant bits of the significand.

Each step of the multiplication algorithm shifts the Multiplier register 1 bit to the _____.

right: The least significant bit of the multiplier (Multiplier0) determines whether the multiplicand is added to the Product register. Shifting the Multiplier 1 bit to the right in each step ensures the n-th bit is placed in Multiplier0.

IEEE 754 standard floating-point representation

(-1)^S × (1 + Fraction) × 2^(Exponent - Bias), where S is the sign bit
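
The formula can be checked against a real bit pattern. The sketch below assumes single precision (1 sign bit, 8 exponent bits, 23 fraction bits, bias 127) and handles only normalized numbers; the function names are hypothetical.

```python
import struct

def decode_float32(bits: int) -> float:
    """Apply sign, fraction, and biased exponent to a normalized single."""
    sign = (bits >> 31) & 0x1
    exponent = (bits >> 23) & 0xFF
    fraction = (bits & 0x7FFFFF) / (1 << 23)   # value in [0, 1)
    bias = 127
    if exponent in (0, 255):
        # Exponent fields 0 and 255 are reserved (zero, denormals, infinity, NaN)
        raise ValueError("not a normalized number")
    return (-1) ** sign * (1 + fraction) * 2.0 ** (exponent - bias)

def float32_bits(value: float) -> int:
    """Return the 32-bit pattern that stores `value` in single precision."""
    return struct.unpack(">I", struct.pack(">f", value))[0]

print(decode_float32(0xBF400000))   # sign 1, exponent 126, fraction 0.5
```

For `0xBF400000` the fields give (-1)^1 × (1 + 0.5) × 2^(126 - 127) = -0.75.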

0...0011 + 0...0010 = ?

0...0101: In other words, 3 + 2 is 5.

Iteration 1, step 2b restores the Remainder's value to _____.

00000111: The negative remainder indicates that the divisor is larger than the dividend, so the algorithm restores the original remainder value by adding the divisor back to the remainder.

At the end of iteration 2, the 8-bit value of Multiplicand is _____.

00001000: Iteration 2, step 2 shifts the Multiplicand left 1 bit. The blue text shows the Multiplicand's value after the shift.

The value placed in the quotient is _____. ? 1011/1100111

0: The divisor 1011 is larger than the dividend 110, so a 0 is placed in the quotient.

The Multiplicand register is _____-bits wide. 64 128

128: Over 64 steps, the 64-bit multiplicand is shifted 64 bits to the left to mimic the paper-and-pencil method of shifting the intermediate product one digit to the left of the earlier intermediate product. A 128-bit register ensures the original 64-bits of the multiplicand are retained so that the intermediate product can be added with the Product register in each step.

The Product register is _____-bits wide. 64 128 256

128: The multiplication of two 64-bit values yields a 64 + 64-bit, or 128-bit, product.

Number of ARMv8 arithmetic core instructions

15: The arithmetic core instructions are highlighted in bold in the image above.

Single precision

A floating-point value represented in a 32-bit word.

scientific notation

A notation that renders numbers with a single digit to the left of the decimal point.

Dividend

A number being divided.

Normalized

A number in floating-point notation that has no leading 0s.

Underflow (floating-point)

A situation in which a negative exponent becomes too large to fit in the exponent field.

Overflow (Floating-point)

A situation in which a positive exponent becomes too large to fit in the exponent field.

Exception

Also called interrupt. An unscheduled event that disrupts program execution; used to detect overflow.

Which LEGv8 arithmetic instructions should be generated for byte and halfword arithmetic operations? ADD, SUB, MUL, DIV ADD, SUB, MUL, DIV, using AND to mask result to 8 or 16 bits after each operation

ADD, SUB, MUL, DIV: The load instructions create a 64-bit value, so these arithmetic operations can be used as normal.

Interrupt

An exception that comes from outside of the processor. (Some architectures use the term interrupt for all exceptions.)

Floating-point hardware was included in the first computers.

False: John von Neumann refused to include floating-point hardware in the computer he built at Princeton. The loss of memory capacity and the increased complexity were arguments against the inclusion of floating-point hardware.

Floating point

Computer arithmetic that represents numbers in which the binary point is not fixed.

Overflow does not occur in unsigned integers.

False: Overflow can occur in unsigned integers, but since unsigned integers are commonly used for memory addresses, such overflows are often ignored.

Conditional compare that only compares if the initial condition is true

FCCMP: FCCMP (quiet compare) doesn't cause an exception whenever one of the operands is NaN. FCCMPE (signaling compare) does cause an exception when one of the operands is NaN.

Floating-point multiply instruction that produces a negative product

FNMUL: FNMUL is known as the floating-point scalar multiply-negate instruction.

The multiply (MUL) instruction ignores overflow, while the signed multiply high (SMULH) and unsigned multiply high (UMULH) instructions detect overflow. True False

False: All multiply instructions ignore overflow, so the software must detect overflow.
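
One way software can detect the overflow is to compare the high and low halves of the 128-bit product. The Python sketch below models 64-bit registers with masking; `mul`, `smulh`, and `signed_mul_overflows` are illustrative helpers named after the instructions, not a real API.

```python
MASK64 = (1 << 64) - 1

def to_signed64(x: int) -> int:
    """Interpret the low 64 bits of x as a two's complement value."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x

def mul(a: int, b: int) -> int:
    """Model of MUL: keep only the lower 64 bits of the product."""
    return to_signed64(to_signed64(a) * to_signed64(b))

def smulh(a: int, b: int) -> int:
    """Model of SMULH: upper 64 bits of the 128-bit signed product."""
    product = to_signed64(a) * to_signed64(b)   # Python ints do not overflow
    return to_signed64(product >> 64)           # >> keeps the sign

def signed_mul_overflows(a: int, b: int) -> bool:
    """The product fits in 64 bits only if the high half is pure sign extension."""
    low = mul(a, b)
    high = smulh(a, b)
    return high != (-1 if low < 0 else 0)

print(signed_mul_overflows(1 << 62, 4))   # 2^64 does not fit in 64 bits
```

The check mirrors what a compiler or library would have to emit after the multiply, since the hardware raises no exception on its own.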

IEEE 754, released in 1985, is the most recent floating point standard.

False: Standards must be revisited periodically for updating. The 1985 standard was superseded by IEEE Std 754-2008, which added types such as half precision and quad precision floating point representations; a further revision, IEEE Std 754-2019, followed.

Compilers for the C and Fortran programming languages need to identify overflow values and notify the current program.

False: The C language (and many other languages) ignores overflow and just continues executing. Fortran does need to notify the current program of overflow values.

Exponent

In the numerical representation system of floating-point arithmetic, the value that is placed in the exponent field.

Which LEGv8 load instructions should be generated for byte and halfword arithmetic operations? STUR, STURW LDURB, LDURH

LDURB, LDURH: These load instructions sign extend the most significant bits, which preserves the variable's value.

Which of the following instruction classifications occur most frequently in the execution of the SPEC CPU2006 floating-point benchmarks? LEGv8 core Remaining ARMv8 Pseudo ARMv8

LEGv8 core: The LEGv8 core and arithmetic core instructions account for 97% of instructions executed by the SPEC CPU2006 floating-point benchmarks. The LEGv8 core and arithmetic core similarly dominate the SPEC CPU2006 integer benchmarks. Thus, the material concentrates on these core instructions.

Which SIMD instruction version means the width of the elements in the destination register is twice the width of the elements in all source registers? Wide Long

Long: Conversely, wide means the width of the elements in the destination register and the first source registers is twice the width of the elements in the second source register.

Move wide with zero

MOVZ: As noted in COD Section 2.10 (LEGv8 addressing for wide immediates ...), the move wide with zero instruction moves a 16-bit constant value into a register and zero extends the rest of the register's bits.

  1 1 0 0 ...
+ 1 1 0 1 ...
-------------
  1 0 0 1 ...
-Overflow -No overflow

No Overflow: The numbers being added are negative, and the result is negative (as indicated by the 1 in the leftmost bit), as expected. No overflow has occurred.

  0 0 0 1 ...
+ 1 0 0 1 ...
-------------
  1 0 1 0 ...
-Overflow -No overflow

No overflow: When positive and negative operands are added, overflow is impossible, because the result's magnitude will surely be less than the larger of the two operands. Ex: 200 + -999 = -799.

Indicate if the following number is in normalized scientific notation: 0.0514ten × 10^-6

No: A number in normalized scientific notation does not have leading zeros. The number can be rewritten in normalized scientific notation by moving the decimal point to the right until a non-zero digit appears to the left of the decimal point. The exponent is decremented for each position the decimal point moves: 5.14ten × 10^-8

  0 1 1 0 ...
+ 0 1 1 1 ...
-------------
  1 1 0 1 ...
-Overflow -No overflow

Overflow: The numbers being added are positive, but the result is negative (as indicated by the 1 in the leftmost bit). Such a result is nonsense; overflow has occurred.
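
The sign rule used in the three addition cards can be sketched in Python. The code assumes the patterns are complete 4-bit two's complement values (the cards elide the lower bits), and `add_overflows` is a hypothetical helper name.

```python
def add_overflows(a: int, b: int, bits: int = 4) -> bool:
    """True if adding two `bits`-wide two's complement patterns overflows."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    total = (a + b) & mask
    # Overflow needs operands of the same sign and a result of the other sign
    same_operand_sign = ~(a ^ b) & sign
    result_sign_flipped = (a ^ total) & sign
    return bool(same_operand_sign and result_sign_flipped)

print(add_overflows(0b0110, 0b0111))   # positive + positive gives a negative pattern
```

Operands with different signs can never trigger the condition, which matches the card on adding positive and negative operands.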

The Multiplier register is removed and placed inside of the _____ register. Product Multiplicand

Product: The multiplier is placed in the right half of the Product register. As the Product register is shifted to the right to allocate room for the accumulated sum of intermediate products, the least significant bit of the multiplier is no longer needed and can be shifted out of the register.

Each step of the division algorithm shifts the Divisor register 1 bit to the _____. right left

Right: The divisor is placed in the left half of the 128-bit Divisor register to align the divisor with the dividend. The Divisor register is shifted right 1 bit in each step of the algorithm.

Signed divide

SDIV: The signed divide instruction performs signed integer division.

Subword parallelism

The AVX version is 3.85 times as fast, which is very close to the factor of 4.0 increase that you might hope for from performing four times as many operations at a time by using subword parallelism

Using 4-bit numbers to save space, multiply 2ten × 3ten, or 0010two × 0011two.

The figure below shows the value of each register for each of the steps labeled according to COD Figure 3.4 (The first multiplication algorithm ...), with the final value of 0000 0110two or 6ten. Color is used to indicate the register values that change on that step, and the bit circled is the one examined to determine the operation of the next step.

Guard

The first of two extra bits kept on the right during intermediate calculations of floating-point numbers; used to improve rounding accuracy.

Divisor

The number by which another number is divided.

Units in the last place (ulp)

The number of bits in error in the least significant bits of the significand between the actual number and the number that can be represented.
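
As a rough illustration, Python 3.9 and later expose the size of one unit in the last place through `math.ulp`. The sketch below measures an error in ulps for double precision values; `error_in_ulps` is a hypothetical helper, and the reference value is the double nearest 0.3 rather than the exact decimal.

```python
import math

def error_in_ulps(computed: float, reference: float) -> float:
    """Distance between two doubles, measured in ulps of the reference."""
    return abs(computed - reference) / math.ulp(reference)

# One ulp at 1.0 is the weight of the last of the 52 fraction bits
print(math.ulp(1.0) == 2.0 ** -52)
print(error_in_ulps(0.1 + 0.2, 0.3))
```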

Quotient

The primary result of a division; a number that when multiplied by the divisor and added to the remainder produces the dividend.

Remainder

The secondary result of a division; a number that when added to the product of the quotient and the divisor produces the dividend.

Fraction

The value, generally between 0 and 1, placed in the fraction field. The fraction is also called the mantissa.

Goal: X6 / X7 (unsigned division) _____ X5, X6, X7

UDIV: The UDIV instruction computes X6 / X7, and puts the quotient in register X5.

The prefixes U, S, and F refer to _____. data types operand sizes

data types: The prefixes are for instructions that use unsigned (U), signed (S), and floating-point (F) data types.

Rewriting programs that contain _____ arithmetic operations to execute in parallel may affect the result. two's complement integer floating-point

floating-point: Floating-point addition is not associative. The parallel version of the program likely executes arithmetic operations in a different order than the sequential version, thus the result may vary between programs.

Floating-point addition is _____. associative not associative

not associative: Adding a much larger number to a smaller number results in an approximated sum due to limited precision. Ex: 1.5ten × 10^38 + 1.0 = 1.5ten × 10^38, where the 1.0 is lost due to the limited number of bits available to represent the sum.
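
The effect is easy to reproduce. The Python sketch below uses double precision (Python's `float`), not the single precision implied by the card, but the 1.0 is lost in the same way; the variable names are illustrative.

```python
big = 1.5e38

# Same three operands, two groupings
left_first = (big + -big) + 1.0    # big cancels first, so the 1.0 survives
right_first = big + (-big + 1.0)   # 1.0 is absorbed into -big before cancelling

print(left_first, right_first)
print(big + 1.0 == big)
```

A parallel reduction that regroups the additions can therefore return a different sum than a sequential loop over the same data.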

At the end of iteration 2, the 4-bit value of Multiplier is _____.

0000: Iteration 2, step 3 shifts the Multiplier right 1 bit. The blue text shows the Multiplier's value after the shift. The circle indicates the bit examined in the next iteration.

The divide operation results in a 4-bit Quotient of 0011 and an 8-bit Remainder of _____.

00000001: The last row of the table shows the final value of the Quotient and Remainder registers. The result can be checked as follows: Dividend = Quotient × Divisor + Remainder

At the end of iteration 1, the 8-bit value of Product is _____.

00000010: Multiplier0 = 1, so iteration 1 first executes step 1a. The multiplicand and product are added, and then placed into the Product register. 0000 0010 + 0000 0000 = 0000 0010

At the end of iteration 4, the 4-bit value of Quotient is _____.

0001: Remainder ≥ 0, so Step 2a shifts the Quotient left and sets the rightmost bit to 1. The table shows the updated value of Quotient in blue.

The initial 8-bit value of Divisor is _____

00100000: The divisor is placed in the left half of the Divisor register. The first row shows that Divisor is initialized to 0010 0000.

If the Remainder is negative, a _____ is shifted into the least significant bit of the Quotient register. 0 1

0: A negative remainder indicates that the divisor did not go into the dividend, so a 0 is placed in the quotient.

1.000 × 2^3 + 0.011 × 2^5 = ? 1.010 × 2^4 0.101 × 2^5 10.001 × 2^5

1.010 × 2^4: 1.000 × 2^3 becomes 0.010 × 2^5. Then 0.010 × 2^5 + 0.011 × 2^5 = 0.101 × 2^5. Normalizing the result yields 1.010 × 2^4.

1.001 × 2^-4 + 1.000 × 2^-6 = ? 10.001 × 2^-4 1.011 × 2^-4

1.011 × 2^-4: 1.000 × 2^-6 becomes 0.010 × 2^-4 to match the larger exponent. Then, the significands are added, and the base and exponents are copied to match the operands.

1.010 × 2^-3 + 0.011 × 2^-3 = ? 1.101 1.101 x 2^-6 1.101 x 2^-3

1.101 × 2^-3: The significands are added, then the base and exponent are copied to match the operands.
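
The align, add, and normalize steps used in the three addition cards can be sketched in Python. Significands are held as integers with 3 fraction bits (so 1.010 is `0b1010`); the function `fp_add` and this encoding are assumptions for illustration, and bits shifted out are simply dropped rather than rounded.

```python
def fp_add(sig_a: int, exp_a: int, sig_b: int, exp_b: int, frac_bits: int = 3):
    """Add two positive binary floating-point values given as (significand, exponent)."""
    # Step 1: align the operand with the smaller exponent to the larger one
    if exp_a < exp_b:
        sig_a, exp_a, sig_b, exp_b = sig_b, exp_b, sig_a, exp_a
    sig_b >>= exp_a - exp_b
    # Step 2: add the significands
    total, exp = sig_a + sig_b, exp_a
    # Step 3: normalize so exactly one nonzero bit sits left of the binary point
    one = 1 << frac_bits
    while total >= 2 * one:
        total >>= 1
        exp += 1
    while total and total < one:
        total <<= 1
        exp -= 1
    return total, exp

sig, exp = fp_add(0b1000, 3, 0b0011, 5)
print(format(sig, "04b"), exp)
```

For 1.000 × 2^3 + 0.011 × 2^5 the sketch returns significand 1010 with exponent 4, matching the card.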

Iteration 1, step 1 subtracts the Divisor from the Remainder and places the result in the Remainder. The Remainder's updated value is _____.

11100111: The result of the subtraction is negative, as indicated by a 1 in the leftmost bit.

ARMv8 added _____-bit registers to support subword parallelism. 64 128

128: ARMv8 added 32 new 128-bit registers, or 512 bytes of new registers.

Each iteration of the multiplication algorithm consists of _____ basic steps. 3 7 64

3: The first step checks Multiplier0 to determine whether the multiplicand is added to the Product register. The second step shifts the Multiplicand register left 1 bit. The third step shifts the Multiplier register right 1 bit.
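
The three steps can be traced with a short Python sketch of the first multiplication algorithm, using 4-bit operands as in the 2 × 3 example. The function `multiply_steps` and its trace format are illustrative assumptions.

```python
def multiply_steps(multiplicand: int, multiplier: int, n: int = 4):
    """Shift-and-add multiply; returns the product and a per-iteration trace."""
    mask_2n = (1 << (2 * n)) - 1
    product = 0
    trace = []
    for _ in range(n):
        if multiplier & 1:                                # step 1: test Multiplier0
            product = (product + multiplicand) & mask_2n  # step 1a: add to Product
        multiplicand = (multiplicand << 1) & mask_2n      # step 2: shift left
        multiplier >>= 1                                  # step 3: shift right
        trace.append((product, multiplicand, multiplier))
    return product, trace

product, trace = multiply_steps(0b0010, 0b0011)
for p, mcand, mplier in trace:
    print(format(p, "08b"), format(mcand, "08b"), format(mplier, "04b"))
```

Each trace row holds the Product, Multiplicand, and Multiplier values at the end of one iteration, so the rows can be compared against the register tables in the cards.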

Each iteration of the division algorithm consists of _____ basic steps. 3 8 65

3: The first step determines if the divisor is smaller than the dividend. The second step sets the rightmost bit of Quotient to 1 or 0 and restores the Remainder register if Remainder was negative. The third step shifts the Divisor register right 1 bit.
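
A minimal Python sketch of these steps, using the 4-bit example of 7 divided by 2 from the cards. It assumes non-negative operands and a non-zero divisor; `divide_steps` and its trace format are hypothetical.

```python
def divide_steps(dividend: int, divisor: int, n: int = 4):
    """Restoring division; returns quotient, remainder, and a per-iteration trace."""
    mask_2n = (1 << (2 * n)) - 1
    mask_n = (1 << n) - 1
    remainder = dividend          # 2n-bit Remainder starts as the dividend
    div = divisor << n            # divisor starts in the left half of Divisor
    quotient = 0
    trace = []
    for _ in range(n + 1):        # n + 1 iterations
        remainder -= div                                # step 1: subtract
        after_subtract = remainder & mask_2n            # two's complement view
        if remainder >= 0:
            quotient = ((quotient << 1) | 1) & mask_n   # step 2a: shift in a 1
        else:
            remainder += div                            # step 2b: restore
            quotient = (quotient << 1) & mask_n         # and shift in a 0
        div >>= 1                                       # step 3: shift right
        trace.append((after_subtract, remainder, quotient, div))
    return quotient, remainder, trace

quotient, remainder, trace = divide_steps(0b0111, 0b0010)
print(format(quotient, "04b"), format(remainder, "08b"))
```

The first trace entry records 1110 0111 after the subtraction in iteration 1, and the restore step brings the Remainder back to 0000 0111.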

The Multiplier register is _____-bits wide. 64 128

64: Step 1 of the multiplication algorithm checks the value of Multiplier0 to determine the next step. Once Multiplier0 is checked, that bit is no longer needed and can be shifted out of the register without impacting the multiplication.

Sticky bit

A bit used in rounding in addition to guard and round that is set whenever there are nonzero bits to the right of the round bit.

Fused multiply add

A floating-point instruction that performs both a multiply and an add, but rounds only once after the add.

Double precision

A floating-point value represented in a 64-bit doubleword.

Only the unsigned multiply high (UMULH) instruction can return the upper 64 bits of a 128-bit product.

False: The signed multiply high (SMULH) and unsigned multiply high (UMULH) instructions can both return the upper 64 bits of a 128-bit product depending on the types of multiplier and multiplicand used by the programmer.

The divide operations, SDIV and UDIV, detect overflow and division by 0.

False: The software must check the divisor to detect division by 0, as well as check the division operation for overflow.

The speedup comes from reducing the size of the registers and ALU.

False: The speedup comes from shifting the operands and the quotient simultaneously with the subtraction.

Round

Method to make the intermediate floating-point result fit the floating-point format; the goal is typically to find the nearest number that can be represented in the format. It is also the name of the second of two extra bits kept on the right during intermediate floating-point calculations, which improves rounding accuracy.

Indicate if the following numbers are in scientific notation. 100.10two × 2^-6

No: The notation contains three digits to the left of the binary point. The binary number can be rewritten in scientific notation by moving the binary point two positions to the left and adding two to the exponent: 1.0010two × 2^-4

Indicate if the following numbers are in scientific notation. 25.11ten × 10^-8

No: The notation contains two digits to the left of the decimal point. The number can be rewritten in scientific notation by moving the decimal point one position to the left and incrementing the exponent: 2.511ten × 10^-7

Which LEGv8 store instructions should be generated for byte and halfword arithmetic operations? SBU, SHU STURB, STURH

STURB, STURH: The load and arithmetic operations preserve the byte and halfword values, so the values are simply stored from registers to memory.

Multiple floating-point operands can be packed into a 128-bit SSE2 register.

True: Each register can hold four single precision or two double precision floating point values. The packed format enables a single data transfer instruction to move multiple operands, and a single arithmetic instruction to perform multiple operations.

Intermediate calculations of floating-point numbers append two extra bits to improve rounding accuracy.

True: IEEE 754 uses two extra bits on the right during intermediate additions. The first extra bit is called the guard digit, and the rightmost extra bit is called the round digit.

The ARM ADD, ADDI, and SUB instructions may result in overflow.

True: Overflow results when a sum or difference cannot be represented in 64-bits, thus addition and subtraction instructions can result in overflow.

Rounding may require the result to be normalized again.

True: Rounding the significand to a fixed number of digits may yield a non-normalized result. Ex: 9.999 × 10^4 rounded to three significant digits adds a 1 to a string of 9s, and results in 10.00 × 10^4. The result is normalized and rounded again, giving 1.00 × 10^5.

Instructions can be represented as bits.

True: The binary language that a machine understands is called the machine language. An assembler converts a symbolic version of an instruction into the binary version. Ex: ADD XZR, X19, X20 becomes 00000010001100100100000000100000

The improved version of the division hardware halves the width of the adder and divisor.

True: The improved hardware reduces unused portions of registers and adders by reducing the Divisor register and ALU to 64 bits. The division algorithm is updated to subtract the Divisor from the left half of the Remainder and place the result in the left half of the Remainder.

The multiplication hardware supports signed multiplication.

True: The multiplier and multiplicand are first converted to positive numbers, and then multiplied using the same multiplication hardware. The product is negated if the multiplier and multiplicand signs disagree.

The "Increment or decrement" and "Shift left or right" hardware normalize the sum.

True: The normalization step shifts the sum left or right until a single non-zero value appears to the left of the binary point. Shifting the sum right increments the exponent, whereas shifting left decrements the exponent.

The division hardware supports signed division.

True: The operands are first converted to positive numbers, and then divided using the same division hardware. The quotient is negated if the operands' signs disagree. The sign of the non-zero remainder is set to match the dividend.
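
The sign rules can be sketched in Python on top of an unsigned divide. Here `signed_divide` is a hypothetical helper, Python's `//` and `%` on non-negative values stand in for the unsigned hardware, and a non-zero divisor is assumed.

```python
def signed_divide(dividend: int, divisor: int):
    """Divide magnitudes, then fix the signs of quotient and remainder."""
    quotient = abs(dividend) // abs(divisor)
    remainder = abs(dividend) % abs(divisor)
    if (dividend < 0) != (divisor < 0):   # signs disagree: negate the quotient
        quotient = -quotient
    if dividend < 0:                      # remainder takes the dividend's sign
        remainder = -remainder
    return quotient, remainder

for a, b in [(7, 2), (-7, 2), (7, -2), (-7, -2)]:
    q, r = signed_divide(a, b)
    print(a, b, q, r, a == q * b + r)
```

With these rules the identity Dividend = Quotient × Divisor + Remainder holds for every sign combination.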

The refined multiplication hardware halves the width of the Multiplicand register from 128-bits to 64-bits.

True: The refined multiplication hardware shifts the Product register right 1 bit in each step instead of shifting the Multiplicand register left 1 bit in each step. Because the Multiplicand is no longer shifted left, the register width can be reduced.

MULSS, MULPS, MULSD, and MULPD are valid x86 instructions.

True: The table denotes variations of the MUL instruction with curly brackets {}. The variations indicate the size and number of floating-point values within a 128-bit SSE2 register.

Indicate if the following numbers are in scientific notation. 1.1101two × 2^3

Yes: The notation contains a single digit to the left of the binary point. The standard form for representing binary numbers in scientific notation is 1.xxxxxxxxxtwo × 2^yyyy, where the base is 2 and the "." is referred to as a binary point.

Indicate if the following numbers are in scientific notation. 9.8ten × 10^12

Yes: The notation contains a single digit to the left of the decimal point. Scientific notation provides a method to write numbers that are too large to be conveniently written in decimal form.

Indicate if the following numbers are in scientific notation. 3.56ten × 10^-6

Yes: The notation contains a single digit to the left of the decimal point. Scientific notation provides a method to write numbers that are too small to be conveniently written in decimal form.

A _____ precision floating-point number is represented with one LEGv8 doubleword. single double

double: A double precision floating-point number is represented with one LEGv8 doubleword, or 64-bits. The exponent is increased to 11-bits to enable representation of a larger range of values; the fraction is increased to 52-bits to enable greater precision.

Increasing the size of the _____ used to represent a floating-point number impacts the number's precision. fraction exponent

fraction: Floating-point numbers are represented using a fixed number of bits, so compromise is needed between the size of the fraction and the size of the exponent.

Each step of the multiplication algorithm shifts the Multiplicand register 1 bit to the _____. right left

left: The left shift mimics the paper-and-pencil method of shifting the intermediate product one digit to the left of the earlier intermediate product. The intermediate product in the Multiplicand register is then added to the sum being accumulated in the Product register.

Subword parallelism takes advantage of byte- and halfword-sized data by _____. -turning off the unused bits of a 128-bit adder -splitting the add operation to execute across multiple cycles -partitioning the adder to perform multiple operations in parallel

partitioning the adder to perform multiple operations in parallel: The adder can be configured to add sixteen 8-bit operands, eight 16-bit operands, four 32-bit operands, or two 64-bit operands simultaneously in one instruction.
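
A software analogy of the partitioned adder, as a hedged Python sketch: sixteen unsigned bytes are packed into one 128-bit integer, and the carry chain is cut at each byte boundary by masking. The names `HIGH`, `LOW7`, and `packed_add_u8` are illustrative, and the code does not model any particular SIMD instruction.

```python
LANES = 16
HIGH = int.from_bytes(b"\x80" * LANES, "big")   # top bit of every byte lane
LOW7 = int.from_bytes(b"\x7f" * LANES, "big")   # low 7 bits of every byte lane

def packed_add_u8(a: int, b: int) -> int:
    """Add sixteen unsigned bytes lane by lane; each lane wraps modulo 256."""
    # Adding only the low 7 bits cannot carry out of a byte lane
    partial = (a & LOW7) + (b & LOW7)
    # Fold each lane's top bits back in without propagating a carry
    return partial ^ ((a ^ b) & HIGH)

# Low lane wraps (0xFF + 0x01 = 0x00) without disturbing the next lane
print(hex(packed_add_u8(0x01FF, 0x0101)))
```

An ordinary 128-bit add of the same inputs would carry from the low byte into its neighbor, which is exactly what the partitioning prevents.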

The dgemm() function operates on single dimensional arrays A[n], B[n], and C[n].

True: DGEMM typically performs the matrix multiply of C = C + A * B where A, B, and C are square matrices. The dgemm() function instead represents the square matrices as single dimensional arrays to improve the performance.
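
A minimal Python sketch of the idea, assuming the n × n matrices are stored column by column in flat lists so that element (i, j) lives at index `i + j * n`. That layout and the function signature are assumptions for illustration.

```python
def dgemm(n: int, A: list, B: list, C: list) -> None:
    """Compute C = C + A * B in place for n x n matrices stored as flat lists."""
    for i in range(n):
        for j in range(n):
            cij = C[i + j * n]                      # running sum for C[i][j]
            for k in range(n):
                cij += A[i + k * n] * B[k + j * n]  # A[i][k] * B[k][j]
            C[i + j * n] = cij

# Hypothetical 2 x 2 example in column-major order
A = [1.0, 2.0, 3.0, 4.0]
B = [1.0, 2.0, 3.0, 4.0]
C = [0.0, 0.0, 0.0, 0.0]
dgemm(2, A, B, C)
print(C)
```

The index arithmetic replaces two-dimensional subscripts, which is the same trick the flat-array version relies on.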

Each iteration of the resulting x86 assembly language calculates C = C + A * B for one element of C.

True: Each xmm register contains one operand. Line 6 multiplies one element of A with one element of B. Line 9 adds the product of A * B to one element of C, then line 12 stores the sum back into the C element.

Reduced floating-point accuracy and the high cost of converting code from the IBM 7094 to the S/360 led to the formation of a subcommittee of IBM mainframe users to propose improvements to the S/360 floating point.

True: IBM updated the System/360 floating point based on many of the subcommittee's recommendations, then retrofitted existing System/360s in the field at considerable expense.

Integers can be represented as bits.

True: Information is kept in computer hardware as a series of high and low electronic signals, so a base 2 representation is a natural way to represent integers. Ex: 12ten can be represented as 1100two.

Shifting the bits of an unsigned integer right by n bits divides the integer by 2^n.

True: Shifting is typically faster than dividing. Replacing a divide operation with a shift operation is a type of optimization known as operator strength reduction.

In an effort to achieve similar floating-point accuracy, programmers converting code from the IBM 7094 to the S/360 replaced single precision declarations with double precision declarations.

True: The IBM S/360's larger memory, narrower single precision words, and cruder arithmetic produced inaccurate results when compared to the IBM 7094. The fastest method to recover accuracy was to move from a single precision to a double precision representation.

Kahan was honored with the Turing Award in 1989 for the benefits conferred upon the computing industry through standardizing floating point.

True: The Turing Award is a prestigious award given to individuals who have made significant technical contributions of lasting importance in the field of computing. Kahan's contributions have provided a considerable degree of fidelity to floating point arithmetic across many computers, enabling the portability of software. (A.M. Turing Award recipients (ACM))

Complete the outer loop of the optimized C version of DGEMM: for (int i = 0; i < n; _____ ) ++i i+=4

i+=4: Each iteration of the outer loop calculates and stores 4 elements of matrix C, thus the optimized C version increments i by 4 instead of by 1.

IEEE 754 is a _____ for floating-point representation and computation.

standard: The IEEE 754 standard specifies floating-point representations, computation, rounding, exceptions, among others. The standard establishes specifications and procedures to improve the ease of porting floating-point programs and the quality of computer arithmetic across different computing platforms.

Each SIMD version is denoted by a _____. prefix suffix

suffix: The suffixes for wide, long, and narrow are W, L, and N, respectively.

If the Remainder is negative, then the original Remainder value is restored by adding _____ to the Remainder. Control the Divisor

the Divisor: The Divisor was subtracted from Remainder in a previous step, which resulted in a negative value. Adding the Divisor back to Remainder will restore Remainder's original value.

The initial 8-bit value of Product is _____.

00000000: The last column of the above table shows the value of Product after each step within an iteration. In iteration 0, Product is initialized to 0000 0000.

The initial 8-bit value of Remainder is _____.

00000111: The Remainder register is initialized with the dividend, or 0000 0111.

The IEEE formed a committee in _____ to begin drafting a floating-point standard. 1976 1977 1985

1977: The diversity in floating-point arithmetic implementations for mainframes and microprocessors was disquieting. Robert G. Stewart, with the support of industry, formed a subcommittee to begin standardizing floating-point arithmetic to improve accuracy and portability.

The value placed in the quotient is _____. ? 1011/1100111

1: The divisor 1011 is smaller than the dividend 1100, so a 1 is placed in the quotient.

ARMv8 has _____ assembly-language SIMD instructions. 245 63

245: ARMv8 has 245 assembly-language SIMD instructions.

The resulting x86 assembly language output contains _____ floating-point instructions. 5 12

5: The 5 floating-point instructions start with a v, and use the sd variation of the instruction. sd stands for Scalar Double precision floating point, and indicates that one 64-bit operand is stored in a 128-bit xmm register.

Number of assembly-language integer arithmetic and floating-point instructions in ARMv8

63: The 63 instructions are divided into 10 categories in the image above.

Floating-point hardware became standardized by the late 1950s.

False: Floating-point hardware was included in many computers by the late 1950s. However, each implementation was different, so floating-point operations behaved differently on each computer.

Numerical software in the late 1950s was portable because the software consists entirely of computer-independent mathematical formulas.

False: Floating-point operations produced different results depending on the computer that executed the code. The burden fell upon software developers to implement clever tricks to deliver results correct in all but the last several bits.

The ALU adds the 128-bit Product and 64-bit Multiplicand, and then stores the result into the Product register.

False: The ALU adds the upper 64-bits of the Product with the 64-bit Multiplicand. The result is then stored in the upper 64-bits of the Product register. The Product register is then shifted right 1 bit before the next step.
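
The refined hardware can be sketched in Python with a single Product value whose right half initially holds the multiplier. The function `refined_multiply` is a hypothetical model; Python's unbounded integers stand in for the extra carry bit that the hardware keeps on the left of the Product register.

```python
def refined_multiply(multiplicand: int, multiplier: int, n: int = 4) -> int:
    """Multiply two n-bit unsigned values with one shared Product register."""
    product = multiplier                      # multiplier sits in the right half
    for _ in range(n):
        if product & 1:                       # test the current multiplier bit
            product += multiplicand << n      # add into the upper half only
        product >>= 1                         # shift Product right 1 bit
    return product

print(refined_multiply(0b0010, 0b0011))
```

After n steps every multiplier bit has been shifted out and the register holds the full 2n-bit product.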

Faster division, like faster multiplication, can be achieved by increasing the number of ALUs.

False: The Divisor is subtracted from the Remainder in each step of the algorithm. The sign of the difference is not known beforehand, and is needed to determine the next step of the algorithm. Instead, faster division is achieved through prediction techniques that attempt to produce more than one bit of the quotient per step of the algorithm.

The small ALU is used to add the significands.

False: The Small ALU determines which operand has the larger exponent and by how much. The exponent difference specifies how to pass the operands to the Big ALU, as well as how many positions the smaller operand is shifted to the right.

The YMM extension was introduced to support floating-point numbers represented in a 256-bit representation.

False: The YMM registers pack additional floating point values into a single register, either eight 32-bit or four 64-bit floating-point values. The extension supports SIMD, or single instruction, multiple data, which strives to perform a larger number of operations in parallel.

Shifting the bits of a signed integer right by n bits divides the integer by 2^n if the shifter extends the sign bit instead of shifting in 0s.

False: The above example demonstrates shifting -5ten two positions to the right produces -2ten rather than the expected value of -1ten. In two's complement representation, the sign bit also contributes to a value's magnitude, so the same operator strength reduction techniques do not apply.
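
The mismatch can be reproduced in Python, whose `>>` on negative integers extends the sign. The truncating division is written as `int(a / b)` here, which is adequate for small values only; the variable names are illustrative.

```python
# Arithmetic right shift rounds toward negative infinity
shifted = -5 >> 2

# Integer division as most ISAs define it truncates toward zero
divided = int(-5 / 4)

# For non-negative values the two agree
unsigned_case = (20 >> 2, 20 // 4)

print(shifted, divided, unsigned_case)
```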

The following statements produce the same results on the CDC 6600 computer: if (x == 0.0) y = 17.0 else y = z/x if (1.0 * x == 0.0) y = 17.0 else y = z/x True False

False: The primary goal of CDC was to produce a fast computer, so accuracy was sacrificed for speed. The above statements produce different results due to optimizations to the underlying implementation. The latter statement is a trick used by programmers to deal with the variations in how the adder, multiplier, and divider hardware handled tiny numbers.

The improved version of the division hardware does not require a quotient to perform a divide operation.

False: The quotient is still needed. The refined division hardware reduces the total hardware needed to perform the divide operation by combining the Quotient register with the right half of the Remainder.

ARMv8 has little support for subword parallelism beyond data transfer instructions.

False: The rising popularity of multimedia applications has led to support for data transfer, arithmetic, and logical/compare instructions. More than 500 instructions have been added to support subword parallelism.

The release of the IEEE 754 standard made the implementation of floating point hardware simple for manufacturers.

False: To implement IEEE 754 correctly demands extraordinarily diligent attention to detail; to make it run fast demands extraordinarily competent ingenuity of design.

Subtract immediate

SUBI: The subtract immediate instruction subtracts a constant value from another value in a register.

An inaccuracy in floating-point division cost Intel an estimated $500 million.

True: The processor flaw led to a large amount of negative publicity and prompted Intel to offer consumers a replacement processor in which the floating-point divide flaw was corrected.

A parallel program containing floating-point arithmetic operations executing on 10 processors may produce a different result than the same program executing on 1,000 processors.

True: The program's floating-point operations will be scheduled differently to execute on 1,000 processors, compared to 10 processors, to take advantage of the availability of additional resources. Because floating-point addition is not associative, changing the order of floating-point operations may yield different results.
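The effect can be imitated without real parallel hardware. In the sketch below, `chunked_sum` is a hypothetical stand-in for a parallel reduction: it splits the data into one chunk per worker, sums each chunk, then sums the partial results. Explicit loops are used so that the order of the additions is exactly what the code shows.

```python
def sequential_sum(values):
    """Add values strictly left to right."""
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values, workers):
    """Sum per-worker chunks first, then combine the partial sums."""
    size = (len(values) + workers - 1) // workers
    partials = [sequential_sum(values[i:i + size])
                for i in range(0, len(values), size)]
    return sequential_sum(partials)


data = [1e20, 1.0, -1e20, 1.0]
print(chunked_sum(data, 1))  # 1.0
print(chunked_sum(data, 2))  # 0.0
```

With two workers, each 1.0 is absorbed by a value of magnitude 1e20 before the large terms cancel, so the grouping alone changes the answer.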

_____ is a C intrinsic. _mm256_load_pd() dgemm() vbroadcastsd

_mm256_load_pd(): The optimized C version utilizes the __m256d data type, which tells the compiler the variable will hold 4 double-precision floating-point values. The _mm256_load_pd() intrinsic takes advantage of the packed floating-point values and tells the compiler to load 4 double-precision floating-point numbers in parallel.

The field of _____ examines how to solve mathematical problems using imprecision and limited representation of data.

numerical analysis: Parallelized code containing floating-point operations may yield varied results compared to sequential code. Moreover, the same floating-point code may yield varied results given a different number of processors available to execute the code. Numerical analysis is an important field that has developed methods to determine if results produced by the code are credible.

Compiling the optimized C code replaces most of the x86 floating-point instructions from a sd variation to a _____ variation of the instruction. pd ymm __m256d

pd: The pd, or parallel double, variation utilizes a wider 256-bit YMM register that enables four 64-bit floating-point operations in parallel.

A situation where an operation yields a number that is too small to represent with the fixed number of bits available is called _____.

underflow: Similarly, overflow occurs when an operation yields a number that is too big to represent.
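Both conditions can be observed with ordinary double-precision values. This sketch assumes Python floats are IEEE 754 doubles, which holds on mainstream platforms; the variable names are illustrative.

```python
import math
import sys

# Overflow: doubling the largest finite double exceeds the representable range.
largest = sys.float_info.max
overflowed = largest * 2.0
print(math.isinf(overflowed))  # True

# Underflow: halving the smallest positive (subnormal) double rounds to zero.
smallest = 5e-324
underflowed = smallest / 2.0
print(underflowed == 0.0)  # True
```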

A graduate student in aeronautical engineering used the IBM 7090 to simulate a new _____. After many simulations, the student was disheartened because the simulation predicted an abrupt onset of stall. wing design logarithm program

wing design: Even after repeating the simulation in double precision, the results indicated the new wing design would fail. Kahan's replacement for IBM's logarithm program (ALOG) helped to expose the inaccuracies in the simulation, and the student was able to successfully build and deploy the new wing design.
