CSCI 3301 Computer Organization - Hoque S16 - Test 1

2. What did the transistor replace?

The transistor replaced the vacuum tube in 1951.

Levels of Program Code: Tell me about -High Level Language -Assembly Language -Hardware Representation

*-High Level Language* .Level of abstraction closer to problem domain .Provides for productivity and portability *-Assembly Language* .Textual representation of instructions *-Hardware Representation* .Binary digits (bits) .Encoded instructions and data

1.1: Aside from smart phones, list 4 other computer types

*Personal Computer*: Good performance to single users at low cost. Use 3rd Party Software. *Personal mobile device (PMD), (tablets included)*: PMD's are battery operated & wireless. No keyboard/mouse, usually touchscreen or speech input. *Server*: Large problem runner, accessed via network. *Warehouse scale computer*: Thousands of processors in a cluster. *Supercomputer:* Hundreds and thousands of CPU & GPU & terabytes of memory. *Embedded computer*: One application, or set of related applications on single system.

1.6: Consider this, there's two implementations of the same instruction set architecture. The instructions can be divided into four classes according to CPI (A,B,C,D). P1 clock rate = 2.5GHz CPIs = 1,2,3,3 P2 clock rate = 3GHz CPIs = 2,2,2,2 Given a program with dynamic instruction count of 1.0x10⁶ instructions divided into classes as follows: 10% class A 20% class B 50% class C 20% class D Which implementation is faster? a) What is the global CPI for each implementation? b) Find the clock cycles required in both cases.

*a.:* Class A: 10⁵ instr. (since 10% of 10⁶ = 10⁵) Class B: 2x10⁵ instr. (since 20% of 10⁶ = 2x10⁵) Class C: 5x10⁵ instr. Class D: 2x10⁵ instr.
Time = (# instr x CPI) / clock rate
Total time P1 = (10⁵ x 1 + 2x10⁵ x 2 + 5x10⁵ x 3 + 2x10⁵ x 3) / (2.5x10⁹) = 10.4x10⁻⁴ s
Total time P2 = (10⁵ x 2 + 2x10⁵ x 2 + 5x10⁵ x 2 + 2x10⁵ x 2) / (3x10⁹) = 6.67x10⁻⁴ s, so P2 is faster.
CPI(P1) = 10.4x10⁻⁴ x 2.5x10⁹ / 10⁶ = 2.6 CPI(P2) = 6.67x10⁻⁴ x 3x10⁹ / 10⁶ = 2.0 [Since Time = Cycles x Cycle-time, Cycles = Time x Clock-rate, and CPI = Cycles / # instr]
*b.:* CPI x # instr = Cycles, therefore, clock cycles(P1) = 10⁵ x 1 + 2x10⁵ x 2 + 5x10⁵ x 3 + 2x10⁵ x 3 = 26x10⁵ clock cycles(P2) = 10⁵ x 2 + 2x10⁵ x 2 + 5x10⁵ x 2 + 2x10⁵ x 2 = 20x10⁵
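
The arithmetic for exercise 1.6 can be checked with a short script. This is only a sketch: the names `MIX`, `total_cycles`, `p1_cpi`, and `p2_cpi` are made up for illustration, and the numbers are taken from the problem statement.

```python
# Instruction mix and per-class CPIs from exercise 1.6 (names are illustrative).
INSTRUCTIONS = 1.0e6
MIX = {"A": 0.10, "B": 0.20, "C": 0.50, "D": 0.20}


def total_cycles(cpi_by_class):
    """Clock cycles = sum over classes of (instructions in class x class CPI)."""
    return sum(INSTRUCTIONS * MIX[c] * cpi_by_class[c] for c in MIX)


p1_cpi = {"A": 1, "B": 2, "C": 3, "D": 3}
p2_cpi = {"A": 2, "B": 2, "C": 2, "D": 2}

p1_cycles = total_cycles(p1_cpi)
p2_cycles = total_cycles(p2_cpi)

# Global CPI = cycles / instruction count; time = cycles / clock rate.
p1_global_cpi = p1_cycles / INSTRUCTIONS
p2_global_cpi = p2_cycles / INSTRUCTIONS
p1_time = p1_cycles / 2.5e9
p2_time = p2_cycles / 3.0e9

print(p1_global_cpi, p2_global_cpi)
print(p1_time, p2_time)
```

The lower execution time belongs to P2, even though both processors run the same instruction count.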

Relative Performance -Define Performance = 1/_____ Time -"X is n times faster than __" -Ex: time taken to run a program: --10s on A, 15s on B --_____ Time_B / _____ Time_A = 15s / 10s = 1.5 --So A is 1.5 times faster than B

-Define Performance = 1/*Execution* Time -"X is n times faster than *Y*" -Ex: time taken to run a program: --10s on A, 15s on B --*Execution* Time_B / *Execution* Time_A = 15s / 10s = 1.5 --So A is 1.5 times faster than B

Touchscreens: Tell me about different types of touchscreens

-PostPC device -Supersedes keyboard and mouse *Resistive* -Older, lower-quality smartphones *Capacitive* -Most tablets, smartphones -Capacitive allows multi touch

1.8.1: The Pentium 4 Prescott processor, released in 2004, had a clock rate of 3.6 GHz & voltage of 1.25V. Assume that, on avg, it consumes 10W of static power and 90W of dynamic power. The Core i5 Ivy Bridge, released in 2012, has a clock rate of 3.4GHz and voltage 0.9V. Assume that, on average, it consumed 30W of static power and 40W of dynamic power. 1.8.1: For each processor, find the avg. capacitive loads

1.8.1: (Dynamic) Power [DP] = 1/2 x Capacitive-load [C] x (Voltage [V])² x Frequency [F] ... (1) [Note: Eq (1) is for a single transition. For a pulse or double transitions, the "1/2" is removed]
Pentium 4: C = 2 x DP / (V² x F) = (2 x 90) / (1.25² x 3.6x10⁹) = 3.2x10⁻⁸ Farad [Note: Farad is the unit used for the capacitive load]
Similarly, Core i5 Ivy Bridge: C = (2 x 40) / (0.9² x 3.4x10⁹) = 2.9x10⁻⁸ Farad
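
As a quick check of the 1.8.1 numbers, the dynamic power equation can be solved for C in a few lines. The function name `capacitive_load` is a hypothetical helper, not something from the course material.

```python
def capacitive_load(dynamic_power_w, voltage_v, frequency_hz):
    """Solve DP = 1/2 x C x V^2 x F for C (result in farads)."""
    return 2 * dynamic_power_w / (voltage_v ** 2 * frequency_hz)


# Values from the problem statement.
c_p4 = capacitive_load(90, 1.25, 3.6e9)   # Pentium 4 Prescott
c_i5 = capacitive_load(40, 0.9, 3.4e9)    # Core i5 Ivy Bridge

print(f"{c_p4:.2e} F, {c_i5:.2e} F")
```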

1.8.2: The Pentium 4 Prescott processor, released in 2004, had a clock rate of 3.6 GHz & voltage of 1.25V. Assume that, on avg, it consumes 10W of static power and 90W of dynamic power. The Core i5 Ivy Bridge, released in 2012, has a clock rate of 3.4GHz and voltage 0.9V. Assume that, on average, it consumed 30W of static power and 40W of dynamic power. 1.8.2: Find the percentage of the total dissipated power comprised by static power and the ratio of static power to dynamic power for each technology.

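1.8.2: Static share of total power: Pentium 4: 10/100 = 10% Core i5 Ivy Bridge: 30/70 = 42.9% Static-to-dynamic ratio: Pentium 4: 10/90 = 0.11 Core i5 Ivy Bridge: 30/40 = 0.75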

Questions from the Slides:

14,15,16,18,24,42,44,47,49,51,54,66

CPI Example: Alternative compiled code sequences using instructions in classes A, B, C
Class: A, B, C
CPI for class: 1, 2, 3
IC in seq. 1: 2, 1, 2
IC in seq. 2: 4, 1, 1
Sequence 1: IC = 5 Clock Cycles: = 2 x 1 + 1 x 2 + 2 x 3 = 10 Avg. CPI = 10/5 = 2.0
*Sequence 2:* IC = _ Clock Cycles: = _ x _ + _ x _ + _ x _ = _ Avg. CPI = _/_ = _._

Alternative compiled code sequences using instructions in classes A, B, C
Class: A, B, C
CPI for class: 1, 2, 3
IC in seq. 1: 2, 1, 2
IC in seq. 2: 4, 1, 1
Sequence 1: IC = 5 Clock Cycles: = 2 x 1 + 1 x 2 + 2 x 3 = 10 Avg. CPI = 10/5 = 2.0
*Sequence 2: IC = 6* *Clock Cycles:* *= 4 x 1 + 1 x 2 + 1 x 3 = 9* *Avg. CPI = 9/6 = 1.5*
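
The two code sequences can be compared with a small helper. This is a sketch; `CLASS_CPI` and `average_cpi` are illustrative names.

```python
# CPI of each instruction class, from the card above.
CLASS_CPI = {"A": 1, "B": 2, "C": 3}


def average_cpi(counts):
    """Return (instruction count, clock cycles, average CPI) for one sequence."""
    instructions = sum(counts.values())
    cycles = sum(counts[c] * CLASS_CPI[c] for c in counts)
    return instructions, cycles, cycles / instructions


seq1 = average_cpi({"A": 2, "B": 1, "C": 2})
seq2 = average_cpi({"A": 4, "B": 1, "C": 1})

print(seq1)  # (5, 10, 2.0)
print(seq2)  # (6, 9, 1.5)
```

Sequence 2 executes more instructions but needs fewer cycles, because it leans on the cheap class A instructions.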

*3. State Amdahl's Law as it applies to Number 3.

Amdahl's Law: (Execution time *after* improvement) = ((Execution time *affected* by improvement)/(*Amount of* improvement))+(Execution time *unaffected*) So, *After= (Affected/Amount Improved)+Unaffected*

*3a. Amdahl's Law: Suppose you have a machine that executes a program with 50% floating point multiply, 20% floating point divide, and 30% from other instructions. Management wants to run 4 times faster. You can make the divide run 3 times faster and multiply 8 times faster. Can you meet management's goal by making only one improvement? Which one?

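Assuming initially that the floating point multiply, divide, and other instructions had the same CPI: T_old = 50 (FP multiply) + 20 (FP divide) + 30 (other) = 100, so running 4 times faster means T_new = 100/4 = 25. i) Divide Improvement: (20)/3 + (50+30) = *86.67* ii) Multiply Improvement: (50)/8 + (20+30) = *56.25* Neither result reaches 25, so no single improvement meets management's goal.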
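
A minimal sketch of the Amdahl's Law calculation for 3a, assuming the original run time is normalized to 100 units. The names `time_after`, `parts`, and `target` are made up for this example.

```python
def time_after(parts, speedups):
    """Amdahl's Law: divide each affected part by its speedup, leave the rest alone.

    parts    -- dict mapping a part name to its share of the original time
    speedups -- dict mapping a part name to its improvement factor
    """
    return sum(t / speedups.get(name, 1.0) for name, t in parts.items())


parts = {"multiply": 50.0, "divide": 20.0, "other": 30.0}
target = sum(parts.values()) / 4          # 4x faster means 25 time units

divide_only = time_after(parts, {"divide": 3})
multiply_only = time_after(parts, {"multiply": 8})

print(round(divide_only, 2), multiply_only, target)
print(divide_only <= target, multiply_only <= target)
```

Both comparisons come out false, which matches the conclusion that a single improvement is not enough.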

Below your program: Application software: Written in _____-_____ _____ System software: _____: translates HLL code to machine code. _____ _____: service code -Handling in/output -Manage mem./storage -Schedule task & share resources. Hardware: _____ , _____ , _/_ _____

Below your program: Application software: Written in *high-level language* System software: *Compiler*: translates HLL code to machine code. *Operating system*: service code -Handling in/output -Manage mem./storage -Schedule task & share resources. Hardware: *Processor , memory, I/O controllers*

1.5.c: Consider three different processors P1, P2, P3 executing the same instruction set. P1 = 3 GHz clock rate, CPI = 1.5. P2 = 2.5GHz clock rate, CPI = 1.0. P3 = 4.0GHz clock rate, CPI = 2.2. c. We are trying to reduce the execution time by 30%, but this leads to an increase of 20% in the CPI. What clock rate should we have to get this time reduction?

CPI_new = CPI_old x 1.2, then CPI(P1) = 1.8, CPI(P2) = 1.2, CPI(P3) = 2.64. Clock rate f = # instr x CPI / time (using instructions per second from part a and 0.7 of the original time), then f(P1) = (2x10⁹ x 1.8)/(0.7) = 5.14GHz f(P2) = (2.5x10⁹ x 1.2)/(0.7) = 4.29GHz f(P3) = (1.82x10⁹ x 2.64)/(0.7) = 6.86GHz

CPU Clocking: -Operation of digital hardware governed by a constant-rate clock Clock period: ________ __ _ _____ _____ e.g. 250ps = 0.25ns = 250x10⁻¹²s Clock frequency (rate): _____ ___ ______ e.g. 4.0GHz = 4000MHz = 4.0x10⁹Hz

CPU Clocking: -Operation of digital hardware governed by a constant-rate clock Clock period: *duration of a clock cycle* e.g. 250ps = 0.25ns = 250x10⁻¹²s Clock frequency (rate): *cycles per second* e.g. 4.0GHz = 4000MHz = 4.0x10⁹Hz

CPI Example: -Computer A: Cycle Time = 250ps, CPI = 2.0 -Computer B: Cycle Time = 500ps, CPI = 1.2 -Same ISA Which is faster and by how much?

CPU Time = Instruction Count x CPI x Cycle Time CPU_A = I x 2.0 x 250ps = I x 500ps CPU_B = I x 1.2 x 500ps = I x 600ps CPU_B / CPU_A = (I x 600ps) / (I x 500ps) = 1.2, so A is faster by a factor of 1.2.
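
Because both computers run the same ISA, the instruction count cancels and only CPI x cycle time matters. A short sketch (the function name `time_per_instruction_ps` is illustrative):

```python
def time_per_instruction_ps(cpi, cycle_time_ps):
    """Average time per instruction = CPI x cycle time (picoseconds)."""
    return cpi * cycle_time_ps


time_a = time_per_instruction_ps(2.0, 250)   # Computer A
time_b = time_per_instruction_ps(1.2, 500)   # Computer B

# Ratio > 1 means A is faster than B by that factor.
ratio = time_b / time_a
print(time_a, time_b, ratio)
```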

Eight great ideas. Name em' Design for _______ _______ Use _______ to simplify design. Make the _______ _______ _______ Performance via _______ Performance via _______ Performance via _______ _______ of memories. _______ via redundancy.

Design for *Moore's Law* Use *abstraction* to simplify design. Make the *common case fast* Performance via *Parallelism* Performance via *Pipelining* Performance via *Prediction* *Hierarchy* of memories. *Dependability* via redundancy.

*3b. Hoque removed the company's managers. If you can make both multiply and divide improvements, what's the new machine's speed relative to the old one?

Execution time after improvement = (50)/8 + (20)/3 + (30) = 42.92 Relative to original machine = (100)/(42.92) = 2.33

2. How did the development of the transistor affect computers?

It had a tremendous impact on bringing personal computers into the home and into our hands. The number of transistors that can be packed on a chip grows exponentially, as described by Moore's Law.

Reducing Power: Suppose a new CPU has- 85% of capacitive load of old CPU 15% voltage and 15% frequency reduction P_new/P_old = (C_old x 0.85 x (V_old x 0.85)² x F_old x 0.85) / (C_old x V_old² x F_old) = 0.85⁴ = 0.52 The power wall- We can't reduce voltage further We can't remove more heat OK... Uh, for this one, do that equation above except 70% capacitive load and 15% voltage and 20% frequency reduction.

P_new/P_old = (C_old x 0.70 x (V_old x 0.85)² x F_old x 0.80) / (C_old x V_old² x F_old) = 0.70 x 0.85² x 0.80 = 0.40
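
The power ratio depends only on the three scale factors, since C_old, V_old, and F_old cancel. The sketch below reads the prompt as 70% of the old capacitive load, a 15% voltage reduction (factor 0.85), and a 20% frequency reduction (factor 0.80); `power_ratio` is a hypothetical helper name.

```python
def power_ratio(cap_scale, volt_scale, freq_scale):
    """P_new / P_old for dynamic power P = C x V^2 x F."""
    return cap_scale * volt_scale ** 2 * freq_scale


# Slide example: 85% capacitive load, 15% voltage and 15% frequency reduction.
slide_example = power_ratio(0.85, 0.85, 0.85)

# Variant from this card: 70% load, 15% voltage cut, 20% frequency cut.
variant = power_ratio(0.70, 0.85, 0.80)

print(round(slide_example, 2), round(variant, 2))  # 0.52 0.4
```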

1.5.a: Consider three different processors P1, P2, P3 executing the same instruction set. P1 = 3 GHz clock rate, CPI = 1.5. P2 = 2.5GHz clock rate, CPI = 1.0. P3 = 4.0GHz clock rate, CPI = 2.2. a. Which processor has the highest performance expressed in instructions per second?

Performance of P1 (inst/sec) = 3x10⁹ / 1.5 = 2x10⁹ Performance of P2 (inst/sec) = 2.5x10⁹ / 1.0 = 2.5x10⁹ Performance of P3 (inst/sec) = 4x10⁹ / 2.2 = 1.8x10⁹ So P2 has the highest performance.

1.3: Describe steps that transform a program written in high-level language like C into a representation that's directly executed by a CPU

Program is compiled into assembly language program, then assembled into machine language program.

Memory Hierarchy Tell me about different levels of memory hierarchy, roughly guess their access time and typical capacity

Registers - 1ns - <1KB
L1/L2 Cache - 2ns - 32KB/4MB
Main Memory - 10ns - 1-16GB
Secondary Storage - 10ms - 128-2000GB
Magnetic Tape - 100sec - 500-1000GB

Pitfall: Amdahl's Law: Improving an aspect of a computer and expecting a proportional improvement in overall performance. T_improved = T_affected / improvement factor + T_unaffected Example: multiply operation accounts for 80s of 100s in a program -How much improvement in multiply performance to get 5x overall? __ = ( __/__ ) + __

T_improved = T_affected / improvement factor + T_unaffected 20 = 80/n + 20, which would require 80/n = 0. Can't be done! Corollary: make the common case fast.
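
The same equation can be solved for the improvement factor n to see when a target is reachable. This is a sketch with a made-up helper name, `required_factor`; the 4x case is an extra illustration, not part of the slide.

```python
def required_factor(affected, unaffected, target_time):
    """Solve target_time = affected / n + unaffected for n.

    Returns None when the target is unreachable, i.e. when the unaffected
    time alone already uses up the whole target.
    """
    remaining = target_time - unaffected
    if remaining <= 0:
        return None
    return affected / remaining


# Slide example: 80s of multiply in a 100s program.
five_x = required_factor(80, 20, 100 / 5)   # target 20s: impossible
four_x = required_factor(80, 20, 100 / 4)   # target 25s: needs n = 16

print(five_x, four_x)  # None 16.0
```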

1.4.a: Assume color display using 8 bits for each primary color (RGB) per pixel and frame size of 1280x1024 a) What's minimum size in bytes of frame buffer to store a frame?

a) What's minimum size in bytes of frame buffer to store a frame? 1280x1024 px = 1,310,720 => 1,310,720 x 3 = 3,932,160 bytes/frame.

1.2: Match the eight ideas from computer science that are similar to ideas in other disciplines "Design for Moore's Law" "Use Abstraction to simplify design" "Make the Common Case Fast" "Performance via Parallelism" "Performance via Pipelining" "Performance via Prediction" "Hierarchy of Memories" "Dependability via Redundancy" a. Assembly lines in auto manf. b. Suspension bridge cables c. Aircraft/marine nav w/ wind info d. Express elevators e. Library reference desk f. Increase gate area on CMOS transistor to decrease switching time g. Adding electromagnetic aircraft catapults, allowed by increased power gen offered by new reactor tech h. Build/retrofit self driving cars relying partially on sensor systems installed on the base vehicle such as lane departure and smart cruise control

a. Assembly lines in auto manf. *"Performance via Pipelining"* b. Suspension bridge cables *"Dependability via Redundancy"* c. Aircraft/marine nav w/ wind info *"Performance via Prediction"* d. Express elevators *"Make the Common Case Fast"* e. Library reference desk *"Hierarchy of Memories"* f. Increase gate area on CMOS transistor to decrease switching time *"Performance via Parallelism"* g. Adding electromagnetic aircraft catapults, allowed by increased power gen offered by new reactor tech *"Design for Moore's Law"* h. Build/retrofit self driving cars relying partially on sensor systems installed on the base vehicle such as lane departure and smart cruise control *"Use Abstraction to simplify design"*

1.7.a: Compilers can have a profound impact on the performance of an application. Assume that for a program, compiler A results in a dynamic instruction count 1.0x10⁹ and has an execution time of 1.1s, while compiler B results in a dynamic instruction count of 1.2x10⁹ and execution time of 1.5s. a. Find average CPI for each program given the processor has a clock cycle time of 1ns

a. T_exec = Cycles x Cycle-time = (CPI x I) x Cycle-time , thus , CPI = T_exec / (I x Cycle Time) Compiler A CPI = (1.1s/10⁹x1ns) = (1.1x10⁹ns / 10⁹x1ns) = 1.1 Similarly, compiler B CPI = (1.5sec / 1.2x10⁹x1ns) = (1.5x10⁹ns / 1.2x10⁹x1ns) = 1.25
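
The CPI formula above can be checked numerically. A minimal sketch, with `average_cpi_from_time` as an illustrative name and all times in seconds:

```python
def average_cpi_from_time(exec_time_s, instruction_count, cycle_time_s):
    """CPI = T_exec / (I x cycle time)."""
    return exec_time_s / (instruction_count * cycle_time_s)


CYCLE_TIME_S = 1e-9  # 1 ns clock cycle, from the problem statement

cpi_a = average_cpi_from_time(1.1, 1.0e9, CYCLE_TIME_S)
cpi_b = average_cpi_from_time(1.5, 1.2e9, CYCLE_TIME_S)

print(round(cpi_a, 2), round(cpi_b, 2))  # 1.1 1.25
```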

1.4.b: Assume color display using 8 bits for each primary color (RGB) per pixel and frame size of 1280x1024 b) How long would it take, at a minimum, for the frame to be sent over a 100Mbit/s network?

b) How long would it take, at a minimum, for the frame to be sent over a 100Mbit/s network? 3,932,160 bytes x (8bits/byte) / 100E6 bits/sec = 0.31sec
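
Both parts of exercise 1.4 reduce to a couple of multiplications. The sketch below assumes 3 bytes per pixel (8 bits for each of R, G, B) and a link that carries exactly 100x10⁶ bits per second; the variable names are illustrative.

```python
WIDTH, HEIGHT = 1280, 1024
BYTES_PER_PIXEL = 3            # 8 bits each for red, green, blue
LINK_BITS_PER_S = 100e6        # 100 Mbit/s network

# Part a: minimum frame buffer size in bytes.
frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL

# Part b: minimum time to send one frame over the link.
transfer_s = frame_bytes * 8 / LINK_BITS_PER_S

print(frame_bytes, round(transfer_s, 2))  # 3932160 0.31
```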

1.7.b: Compilers can have a profound impact on the performance of an application. Assume that for a program, compiler A results in a dynamic instruction count 1.0x10⁹ and has an execution time of 1.1s, while compiler B results in a dynamic instruction count of 1.2x10⁹ and execution time of 1.5s. b. Assume the compiled programs run on two different processors. If the execution times on the two processors are the same, how much faster is the clock of the processor running compiler A's code versus the clock of the processor running compiler B's code?

b. f_B / f_A = (# instr(B) x CPI(B)) / (# instr(A) x CPI(A)) = (1.2x10⁹ x 1.25) / (1.0x10⁹ x 1.1) = 1.36

1.7.c: Compilers can have a profound impact on the performance of an application. Assume that for a program, compiler A results in a dynamic instruction count 1.0x10⁹ and has an execution time of 1.1s, while compiler B results in a dynamic instruction count of 1.2x10⁹ and execution time of 1.5s. c. A new compiler is developed that uses only 6.0x10⁸ instructions and has an average CPI of 1.1. What is the speedup of using this new compiler versus using compiler A or B on the original processor?

c. T_A / T_new = (10⁹x1.1 / 6.0x10⁸x1.1) = 1.67 T_B / T_new = (10⁹x1.5 / 6.0x10⁸x1.1) = 2.27

1.5.b: Consider three different processors P1, P2, P3 executing the same instruction set. P1 = 3 GHz clock rate, CPI = 1.5. P2 = 2.5GHz clock rate, CPI = 1.0. P3 = 4.0GHz clock rate, CPI = 2.2. b. If the processors each execute a program in 10 seconds, find the number of instructions

cycles(P1) = 10 x 3.0x10⁹ = 30x10⁹ cycles cycles(P2) = 10 x 2.5x10⁹ = 25x10⁹ cycles cycles(P3) = 10 x 4.0x10⁹ = 40x10⁹ cycles
# instructions(P1) = 30x10⁹/1.5 = 20x10⁹ # instructions(P2) = 25x10⁹/1.0 = 25x10⁹ # instructions(P3) = 40x10⁹/2.2 = 18.18x10⁹
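
The cycle and instruction counts for exercise 1.5.b follow the same pattern for each processor, so a loop keeps the labels straight. This is only a sketch; `PROCESSORS` and `instructions_executed` are made-up names.

```python
# (clock rate in Hz, CPI) for each processor, from the problem statement.
PROCESSORS = {"P1": (3.0e9, 1.5), "P2": (2.5e9, 1.0), "P3": (4.0e9, 2.2)}
RUN_TIME_S = 10


def instructions_executed(clock_hz, cpi, seconds):
    """cycles = time x clock rate; instructions = cycles / CPI."""
    cycles = seconds * clock_hz
    return cycles, cycles / cpi


results = {
    name: instructions_executed(clock, cpi, RUN_TIME_S)
    for name, (clock, cpi) in PROCESSORS.items()
}

for name, (cycles, instructions) in results.items():
    print(f"{name}: {cycles:.3e} cycles, {instructions:.3e} instructions")
```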

