Computer Architecture and Assembly Language

High-Level Language

(such as C++, COBOL, Fortran). Source code is translated by a compiler (or executed directly by an interpreter, as with BASH or REXX) into the assembly language of the target architecture, which is then assembled into machine language.

Assembly Language

(such as IBM Basic Assembly and Intel x86 Assembly). A low-level programming language in which primitive mnemonic codes represent machine language instructions. The mnemonics make the code far easier to read and remember than raw binary ones and zeroes. Assembly code is still tedious for humans to read and write, but once translated by an assembler into machine language it runs extremely fast.

Word Length and Data Alignment

8 bits constitute a byte. Each byte begins on every 8th bit, that is at 0, 8, 16, 24, and so on; the byte is the conventional architectural boundary. A Word of data is 4 bytes aligned on a byte boundary whose address is divisible by 4. A Half-Word is 2 bytes aligned on a byte boundary whose address is divisible by 2. A Double-Word is 8 bytes aligned on a byte boundary whose address is divisible by 8. An architectural "even" byte boundary is any location divisible by 2, such as 0, 2, 4, 6, 8. Early architectures required that all instructions begin on an even boundary. Except for bit location zero, bit locations 1 through 7 are not checked for instructions or data, for simplicity: doing so would require a much more complex and expensive CPU and associated logic. Since all data other than single bits must begin on a byte boundary, the CPU can ignore most of the bit locations of every byte when looking for an address or the start of a data field. If instructions could begin on an odd-numbered boundary, extra work would be required of the CPU. With each fetch the CPU loads the number of bytes that its largest instruction format can occupy. On IBM systems, bytes containing instructions must be aligned on an even-numbered boundary, which relieves the CPU from looking for instructions at half of all possible addressable memory locations.
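
Because the boundaries above are all powers of two, alignment can be tested with a single bitmask. A minimal sketch in C, assuming byte-addressable memory (the function name is illustrative, not from any architecture manual):

```c
#include <stdint.h>
#include <stdio.h>

/* An address is aligned on an n-byte boundary (n a power of two)
   exactly when its low-order log2(n) bits are all zero. */
static int is_aligned(uintptr_t addr, uintptr_t n)
{
    return (addr & (n - 1)) == 0;
}

int main(void)
{
    printf("%d\n", is_aligned(24, 4));  /* 1: 24 is word-aligned          */
    printf("%d\n", is_aligned(10, 4));  /* 0: 10 is not word-aligned      */
    printf("%d\n", is_aligned(10, 2));  /* 1: 10 is half-word-aligned     */
    return 0;
}
```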

Cache Structures

A cache structure in which each memory block is mapped to exactly one block in the cache is called a direct-mapped cache. As you will see, this is the scheme used for demand paging. A fully associative cache is one in which a memory block can be placed in any location in the cache; all blocks may have to be searched to find the correct one. A set-associative cache is one in which each block has a fixed number of locations where it can be placed, so each block has a single neighborhood in which it can be found. This is a hybrid of the direct-mapped and fully associative methods: you need only search one neighborhood (set) to determine whether the block you need is present in the cache.
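
The usual mapping is a simple modulo on the block number. A hedged C sketch of the two placement rules, with invented cache parameters:

```c
#include <stdio.h>

/* Illustrative parameters, not from any particular machine. */
#define CACHE_BLOCKS 256   /* total blocks in the cache      */
#define WAYS           4   /* blocks per set (associativity) */
#define SETS (CACHE_BLOCKS / WAYS)

/* Direct-mapped: exactly one possible cache slot per memory block. */
static unsigned direct_mapped_index(unsigned block) { return block % CACHE_BLOCKS; }

/* Set-associative: the block's "neighborhood" is one set of WAYS slots;
   only that set needs to be searched on a lookup. */
static unsigned set_index(unsigned block) { return block % SETS; }

int main(void)
{
    unsigned block = 12345;
    printf("direct-mapped slot: %u\n", direct_mapped_index(block));
    printf("set (of %d ways):   %u\n", WAYS, set_index(block));
    return 0;
}
```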

Logic Elements

A combinational element is an operational element, such as an AND or XOR gate, in the Arithmetic Logic Unit (ALU). This element determines how the information in the two data elements (the operands) is processed or combined by the operation. A state element is a contiguous set of bits at a memory location or in a register: the data that represents the contents or results of a logic operation. These are the elements that must be saved and restored when context-switching between two tasks (interrupting, saving, restoring, and re-dispatching). The state element is somewhat like a data variable in a high-level language; it holds the state, or current value, of an operand used by the instruction. A datapath element is a unit used to operate on or hold transient data within a processor. This may include intermediate results from instructions, the registers, the ALU, the adders and the data memory accesses.

Logic Gates

A gate is a device that performs a basic operation on electrical signals. The gates in the circuitry hardware are sometimes referred to as logic gates because each performs just a single logical function. That is, each gate accepts one or more input values and produces a single output value. Because we are dealing with binary information, each input and output value is either a zero (0), corresponding to a low-voltage signal, or a one (1), corresponding to a higher-voltage signal. The type of gate and the input values determine the output value. Gates are combined into circuits to perform more complicated tasks. These logic operations were designed into circuitry hardware to manipulate bits efficiently for the operations used in arithmetic.
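
As one small illustration of gates combined into an arithmetic circuit: a half adder produces a sum bit with an XOR gate and a carry bit with an AND gate. A sketch of its truth table in C:

```c
#include <stdio.h>

/* One-bit half adder built from two gates:
   sum = a XOR b, carry = a AND b. */
int main(void)
{
    for (int a = 0; a <= 1; a++)
        for (int b = 0; b <= 1; b++)
            printf("a=%d b=%d  sum=%d carry=%d\n",
                   a, b, a ^ b, a & b);
    return 0;
}
```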

Addition and Subtraction

A maximum of 65 bits may be required to add two 64-bit numbers. Overflow occurs when the result of an operation cannot be represented in the hardware registers: arithmetic operations can generate results that do not fit in the architecture's fixed-size (32- or 64-bit) word. Overflow can occur only when two operands with the same sign are added together; it cannot occur when the signs differ, because the magnitude of the result can be no larger than the larger operand. The hardware therefore "knows" what the sign of the result should be prior to performing the computation. For signed numbers, the high-order bit indicates whether the contents are positive or negative. During overflow from adding, carries set the high-order sign bit with a value other than the proper sign of the result, which allows the hardware to detect an exception: when adding two positive numbers the sign indicates negative if overflow occurs, and when adding two negative numbers the sign indicates positive. If a number is to be subtracted from another number, the subtrahend is converted via a two's complement manipulation and then added to the minuend; any carry past the high-order position is ignored. Note that "adding" is performed by XOR operations with possible carry.
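
The rule "the sign bit ends up wrong only on overflow" can be checked directly in C. A minimal sketch, assuming 32-bit signed operands (the unsigned cast only makes the wraparound well defined in C):

```c
#include <stdio.h>
#include <stdint.h>

/* Signed overflow occurs only when both operands share a sign and
   the result's sign bit differs from theirs, as described above. */
static int add_overflows(int32_t a, int32_t b)
{
    int32_t sum = (int32_t)((uint32_t)a + (uint32_t)b); /* wraparound add */
    return ((a ^ sum) & (b ^ sum)) < 0; /* sign flipped vs. both inputs */
}

int main(void)
{
    printf("%d\n", add_overflows(INT32_MAX, 1)); /* 1: pos + pos -> neg */
    printf("%d\n", add_overflows(-5, 3));        /* 0: signs differ     */
    return 0;
}
```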

Memory Hierarchy

A memory hierarchy is an application of the locality principles that employs a structure of multiple levels of memories. As the distance of the data (and instructions) from the processor increases, both the size of the memories and the access time to fetch data at that distance increase. The closer memories are generally smaller and faster, and therefore more expensive per byte; the memories further out toward the base of the pyramid decrease in expense as spatial or temporal distance grows.

Cache Hits and Misses

A memory hierarchy will consist of multiple levels, and data is copied between only two adjacent levels at a time. If the data being requested by the processor is present in some block in the upper level of the hierarchy, we have a "hit". If the data being requested is not found in the upper level, we have a "miss". The hit ratio is the percentage of memory accesses that found their target data in the upper level; it is used as a measure of the performance of the memory hierarchy. The miss ratio is the percentage of memory accesses that failed to find their data in the upper level. The hit time is the time required to access a level of the memory hierarchy, including the time needed to determine whether the access is in fact a hit. The miss penalty is the time required to fetch a block into a level of the memory hierarchy from the lower level, including the time to access the block, transmit it from one level to the next, and insert it into the upper level that experienced the miss.
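
The standard way to combine these quantities into one figure of merit is the average memory access time, AMAT = hit time + miss ratio x miss penalty. A small worked sketch in C; the numbers are invented for illustration:

```c
#include <stdio.h>

/* Average memory access time from the quantities defined above:
   AMAT = hit time + miss ratio * miss penalty. */
int main(void)
{
    double hit_time     = 1.0;   /* cycles to access the upper level */
    double miss_ratio   = 0.05;  /* fraction of accesses that miss   */
    double miss_penalty = 100.0; /* cycles to fetch from lower level */

    printf("AMAT = %.1f cycles\n", hit_time + miss_ratio * miss_penalty);
    return 0;
}
```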

Binary and Hexadecimal Number Representation

All data, computer programs, instructions and numbers are represented in the computer as BINARY digits (called bits). The entire operating system (OS) and its application programs (APPS) with their data can be represented as a single string of ones and zeros. The bit is a digit in the base-2 numbering system and the fundamental unit of information. Bits are physically grouped into strings of 8 units called bytes. Each byte is addressable by the computer architecture (bits are NOT addressable). The byte (8 bits of data) has a corresponding hex code in the EBCDIC or ASCII code tables, which provides its binary code equivalent.

Binary is base 2, functional for machines. BINARY: digits 0 and 1 times 2**i. Example: 0111 = 7, 1000 = 8, 1001 = 9.
Decimal is base 10, comfortable for humans. DECIMAL: digits 0 through 9 times 10**i. Example: 456 = 4x100 + 5x10 + 6x1.
Hexadecimal is base 16, which helps humans interpret binary representation. HEXADECIMAL: digits 0-9 and A-F times 16**i. Example: x15 = 21, x2F = 47, xFF = 255, and x2AF = 2x256 + 10x16 + 15 = 512 + 160 + 15 = 687.

Binary code is easily converted to hexadecimal code because both bases are powers of 2. Hex code is used by engineers because it is far less challenging for humans to read, whereas raw binary is extremely difficult to interpret.
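
Each hex digit covers exactly four bits, which is why the two notations convert so easily. A minimal C sketch that prints the examples above in both notations (and confirms xFF = 255):

```c
#include <stdio.h>

/* Print a value in hex and binary side by side. */
static void show(unsigned v)
{
    printf("hex 0x%02X = decimal %3u = binary ", v, v);
    for (int i = 7; i >= 0; i--)        /* walk the 8 bits, high to low */
        putchar(((v >> i) & 1) ? '1' : '0');
    putchar('\n');
}

int main(void)
{
    show(0x15);  /* 21 decimal  */
    show(0x2F);  /* 47 decimal  */
    show(0xFF);  /* 255 decimal */
    return 0;
}
```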

Fixed Storage Locations

Areas in computer memory where the hardware places status information regarding external and internal exception events, and where Program Status is stored and loaded when an interruption occurs. The first 4096 bytes of real memory are known as the "Prefixed Storage Area" or "Page Zero". The manufacturers of operating systems must always adhere to this interface so that the hardware can correctly communicate with the OS. The owner of Page Zero is recognized by the hardware as the owner of the computer.

Memory Allocation

Bytes
Kilobytes   2**10 or ~10**3
Megabytes   2**20 or ~10**6
Gigabytes   2**30 or ~10**9
Terabytes   2**40 or ~10**12
Petabytes   2**50 or ~10**15
Exabytes    2**60 or ~10**18

The Data Pyramid

The data pyramid, from fastest and closest to the CPU down to slowest and farthest:
CPU buffers and registers
L1/Lx cache
Real memory
Auxiliary storage: DASD internal cache
DASD (hard drive) storage
Emulated-virtual tape storage
Physical magnetic tape storage
The Internet cloud servers

The goal is to present the user with as much memory as is available using the least expensive technology, while providing access speed as close as possible to that offered by the fastest memory. By implementing the memory system as a hierarchy, the user is presented with the illusion of a memory that is as large as the largest level of the hierarchy, but that can be accessed as if it were all built from the fastest memory closest to the CPU. Benefits: memory hierarchies take advantage of temporal locality by keeping more recently accessed data physically closer to the CPU, and of spatial locality by moving multiple contiguous blocks in memory to upper levels of the hierarchy. The price of main memory on the IBM 360 Model 65 was about $1.60 per byte; the price on the IBM 370 Model 168 was about $0.20 per byte. Today the cost per byte is much less than $0.0001 -- "less than dirt cheap".

Program Performance Measures and Factors

CPU Time: Supervisory or system use of CPU time is time consumed by the architecture on behalf of the operating system, also known as computing "overhead". Problem or user application use of CPU time is time consumed by the architecture executing applications, or by the OS on behalf of an application. Wall Clock Time is a measure of elapsed real time. CPU performance: clock cycles are the number of "clock ticks" needed to execute some instruction; a faster clock ticks more often and therefore allows a greater number of instruction executions per unit time. Instruction performance: the number of clock cycles required for each type of instruction. Execution Time = Secs/Prog, where Secs/Prog = (Instructions/Prog) x (Clock cycles/Instruction) x (Secs/Clock cycle). I/O execution time vs. CPU execution time: time spent moving data from place to place (I/O) vs. time spent consuming CPU cycles in computation (Add, Multiply, Shift, etc.).
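
A worked instance of the performance equation above, as a small C sketch; the workload numbers are invented for illustration:

```c
#include <stdio.h>

/* seconds/program = (instructions/program)
                   * (clock cycles/instruction)
                   * (seconds/clock cycle)      */
int main(void)
{
    double instructions = 2e9;  /* instructions per program        */
    double cpi          = 1.5;  /* average clock cycles/instruction */
    double clock_hz     = 3e9;  /* 3 GHz clock: secs/cycle = 1/f   */

    printf("CPU time = %.2f seconds\n", instructions * cpi / clock_hz);
    return 0;
}
```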

Central Processing Unit (CPU)

Central Processing Unit. The CPU is where instructions are obtained, validated and executed. In terms of computing power, the CPU is the most important element of a computer. Two typical components of a CPU are: (1) the Arithmetic Logic Unit (ALU), which performs arithmetic and logical operations such as addition, division, comparisons, ANDing and ORing (DATAPATH); and (2) the Control Unit (CU), which extracts instructions from real memory, then decodes and executes them, calling on the ALU when necessary (CONTROL). Decoding consumes the greatest portion of the clock cycles used by the CPU.

Clocking

Clocking is the rhythmic ticking of the computer's internal timepiece. Clocking is measured in Hertz (cycles per second) and is currently on the order of billions of cycles per second. Clocking defines when signals can be read and when they can be written, as well as the interval in which a unit of work must be executed. Synchronization is important so that all work units execute in lock step. If operations were not synchronized to a tight tolerance, a signal could be read at the same time it is being written, causing unpredictability and loss of data integrity. A combinational element has the interval of time from one clock edge to the next to complete a unit of work; finishing late would be a catastrophe. Any values updated in a sequential logic element (a register or a variable in memory) are updated at a clock edge and maintained for the entire clock interval.

Machine Code

Code that is understandable by the hardware: binary strings of ones and zeroes. Machine code is printed in groups of 4 bits, each written as a hexadecimal digit, so that humans have a fighting chance at reading it.

Translators

Compilers
Assemblers
Linkers
Loaders
Interpreters

Signals

Control Signals are used to select or direct the operation of a functional unit. These signals are set up on the basis of the format of the executing instruction: for example, whether the instruction reads or writes a register or memory, which string of logic gates is used to process data, or whether to load the address of a calculation or the contents of the calculation. The setting of the control lines is completely determined by the operation code (opcode) fields of the instruction. The CPU is configured for this instruction format, somewhat like a car assembly line being reconfigured for a particular model with each car produced. A Data Signal is used to provide content information (the operands) to the instruction that is operated on by a functional unit, such as decode. That is, it identifies to the later pipe stages the source of the inputs (register, condition code, memory location, immediate data) and the target of the outputs. This information is encoded in the instruction format.

Overflow Exception Events during Arithmetic Operations

Detection of overflow generates an exception, which in turn causes an interrupt on many computer architectures. An interrupt is an asynchronous hardware event that alters the sequence in which the processor executes instructions; it is typically generated by an I/O device, a timer, or operator intervention. An exception is a synchronous, unscheduled software event that disrupts normal program execution and leads to an interruption. The application or operating system's flow of control is interrupted and the program path is immediately changed so that its prior thread of execution is suspended. A transfer of control to an interrupt handler takes place for corrective or recovery action. Status, context and registers are saved in a unique recovery area of memory by the operating system.

Input Devices

Devices that accept information and transfer that information to memory.

Output Devices

Devices that pass information from memory to the external environment.

Division

Dividend = (Quotient x Divisor) + Remainder. The remainder will always be smaller than the divisor. The difficulty when dividing decimal numbers is that you must know how many times the divisor goes into the portion of the dividend you are currently working with; with binary numbers the divisor goes into the intermediate portion of the dividend either 0 times or 1 time. If the intermediate dividend is greater than or equal to the divisor, you know that the divisor can be subtracted from it again. Division steps: first, the remainder register is initialized with the dividend. In each step the divisor is subtracted (using two's complement addition) from the high-order portion of the remainder. If the result is positive, the divisor was smaller than or equal to that portion, so a 1 is generated in the quotient and the remainder is reduced. If the result is negative, the next step is to restore the original value by adding the divisor back to the remainder, and a 0 is generated in the quotient. The divisor is then shifted right and the process iterates until complete. The remainder and quotient end up in their respective registers.
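
A hedged C sketch of this restoring-division loop, assuming a 32-bit dividend held in a double-width remainder register and a divisor that starts in the high half, in the style described above:

```c
#include <stdio.h>
#include <stdint.h>

/* Restoring division: trial-subtract the divisor from the running
   remainder; on a negative result, restore (add back) and emit a 0
   quotient bit, otherwise emit a 1. */
static void divide(uint32_t dividend, uint32_t divisor,
                   uint32_t *q, uint32_t *r)
{
    uint64_t rem  = dividend;               /* remainder register = dividend */
    uint64_t div  = (uint64_t)divisor << 32;/* divisor starts in high half   */
    uint32_t quot = 0;

    for (int i = 0; i < 33; i++) {
        rem -= div;                /* trial subtraction (two's complement) */
        quot <<= 1;
        if ((int64_t)rem < 0)
            rem += div;            /* negative: restore, quotient bit = 0 */
        else
            quot |= 1;             /* divisor "went in" once: bit = 1     */
        div >>= 1;                 /* shift divisor right and iterate     */
    }
    *q = quot;
    *r = (uint32_t)rem;
}

int main(void)
{
    uint32_t q, r;
    divide(77, 6, &q, &r);
    printf("77 / 6 = %u remainder %u\n", q, r);  /* 12 remainder 5 */
    return 0;
}
```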

Endianness

Each byte of data in memory has its own address. There are two different conventions for ordering bytes within a larger data object. Big-endian systems (such as IBM) store the most significant byte of a word at the lowest address and the least significant byte at the highest address. Take the hexadecimal number x'06971A2F'. This value can be represented in memory as a set of 4 bytes called a "word", written here in left-to-right positional hexadecimal notation, occupying the four memory locations with addresses a, a+1, a+2 and a+3. In a big-endian system, byte x'06' is stored at a, x'97' at a+1, x'1A' at a+2 and x'2F' at a+3: the most significant part of the number is stored at the lowest address, reading as 06 97 1A 2F. Little-endian systems (such as Intel), in contrast, store the least significant byte at the lowest address: the order is reversed, with x'2F' stored at address a, x'1A' at a+1, x'97' at a+2 and x'06' at a+3, reading as 2F 1A 97 06.
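
A small C sketch that walks the same a, a+1, a+2, a+3 addresses for the host it runs on, using the example value above:

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Inspect how x'06971A2F' is laid out byte-by-byte in memory. */
int main(void)
{
    uint32_t word = 0x06971A2F;
    unsigned char bytes[4];
    memcpy(bytes, &word, 4);   /* view the word as raw bytes */

    for (int i = 0; i < 4; i++)
        printf("a+%d: %02X\n", i, bytes[i]);
    /* Big-endian host:    06 97 1A 2F
       Little-endian host: 2F 1A 97 06 */
    return 0;
}
```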

Addressability

Each individually accessible unit (byte) of information in storage is selected using its numerical memory address (location). In modern computers, location-addressable storage is limited to primary real storage. The ability to reach a particular location in memory is limited by the size of a number (displacement) that can be accommodated by a register: how high can the CPU count and still fit the result in a register? Examples: 16-bit registers allow access to approximately 65 thousand bytes; 32-bit registers expand that addressability to the neighborhood of 4 billion bytes.

Representing Negative Numbers with Two's Complement

Engineers needed a way to represent the number scale in binary, balanced between negative and positive numbers, with a single zero. One's complement representation takes each bit of the binary number and inverts it; that is, the bits are sent through a NOT gate. Example: 7 = 0111; the one's complement of 0111 is 1000. Two's complement representation takes the one's complement and then adds one to it (with carries). Example: 7 = 0111; the one's complement of 0111 is 1000; the two's complement is 1001. Two's complement numbers are a way to encode negative numbers into ordinary binary such that you can add the two's complemented number to another number and in effect cause a subtraction. Adding -1 + 1 should always equal 0: if such an addition produces a carry out to a position to the left of the high-order digit of the register, the hardware ignores the carry (it drops away). Two's complement successfully splits the number range between positive and negative numbers, and it ensures that there is only one zero configuration, namely 0000...0 propagated throughout the entire register. The smallest-magnitude positive number is 0000...1 and the greatest positive number is 0111...1; note that a zero is in the high-order position. The smallest-magnitude negative number is 1111...1111 (that is, -1) and the greatest-magnitude negative number is 1000...0000; note that a one is in the high-order position.
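
The invert-then-add-one recipe, and the dropped carry, can both be seen in a few lines of C, here using an 8-bit register for readability:

```c
#include <stdio.h>
#include <stdint.h>

/* Two's complement as described: invert every bit (one's complement),
   then add one; carries out of the register simply drop away. */
int main(void)
{
    uint8_t seven = 0x07;            /* 0000 0111                    */
    uint8_t ones  = (uint8_t)~seven; /* 1111 1000 : one's complement */
    uint8_t twos  = ones + 1;        /* 1111 1001 : two's complement */

    printf("-7 encodes as 0x%02X\n", twos);               /* 0xF9 */
    printf("-7 + 7 = 0x%02X\n", (uint8_t)(twos + seven)); /* 0x00: carry dropped */
    return 0;
}
```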

Exceptions

Exceptions were initially designed to handle unexpected events from within the processor, like arithmetic overflow or divide by zero: any unexpected change in the control flow from internal causes. When an exception occurs, the processor must save the address of the offending instruction and then transfer control to the proper operating system interrupt handler. The operating system can then take the appropriate action in response to the exception, likely stopping the execution of the program and posting an error. Note that any unfinished instructions in the pipeline ahead of the instruction causing the exception must be allowed to run to completion, and any instructions in the pipeline following the offending instruction must be flushed from the system as though they never happened. From the programmer's point of view, the computer architecture must give assurance that the instructions were processed as though executed serially, one after another in time. The pipelined implementation treats exceptions as another form of control hazard: the hardware stops the offending instruction in midstream, lets all prior instructions run to completion, and flushes all the instructions following it.

Steps in Instruction Execution

1) Fetch the instruction from memory. Examine the address of the next instruction via the Program Counter (PC) or the Program Status Word (PSW). Go to the memory location containing the next instruction to be executed and bring the instruction and operands from main memory into the CPU hardware buffer. This generally consists of the maximum amount of data needed to populate the longest instruction format, starting at the memory address to which the PSW points. 2) Decode the instruction. Using the instruction set list and the instruction formats, determine the operation code (opcode) and read the contents of the register(s). The opcode field denotes the operation to be performed and the format of the instruction. Set the control and data signals based upon the format of the particular instruction; each format has a unique requirement for processing, and these signals effectively reconfigure the CPU on the fly. Compute the address of the next instruction to be fetched and update the Program Counter (PSW). 3) Execute the instruction. This takes place in the Arithmetic Logic Unit (ALU), using the control and data signals set by the decode stage as a guide to the work to perform and the data to act on. This stage may calculate a next address in memory to be branched (jumped) to: if the next inline, adjacent address does not contain the instruction to be fetched next, override the next-instruction address in the PSW that was calculated in the decode stage. 4) Data memory access. Ensure that the operands' memory addresses can be reached, that is, that they are addressable by the architecture; that the address computed from the contents of the registers and displacements exists; and that any needed register(s) can be accessed. Set the condition code in the PSW. 5) Write the result back to memory. Place the data results of computation into the targeted memory location or into the result register, and/or set the condition code in the PSW.
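
The fetch-update-decode-execute rhythm can be seen in miniature below. This is a hedged sketch of a toy machine invented for illustration (one-byte opcodes, one register), nothing like a real instruction format:

```c
#include <stdio.h>
#include <stdint.h>

/* A toy fetch-decode-execute loop for an invented 1-byte-opcode machine. */
enum { OP_HALT = 0, OP_INC = 1, OP_DEC = 2 };

int main(void)
{
    uint8_t memory[] = { OP_INC, OP_INC, OP_DEC, OP_HALT };
    unsigned pc = 0;   /* program counter: address of next instruction */
    int reg = 0;       /* a single general-purpose register            */

    for (;;) {
        uint8_t opcode = memory[pc];  /* fetch                         */
        pc += 1;                      /* update PC past what was fetched */
        switch (opcode) {             /* decode, then execute          */
        case OP_INC:  reg++; break;
        case OP_DEC:  reg--; break;
        case OP_HALT: printf("reg = %d\n", reg); return 0;  /* 1 */
        }
    }
}
```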

Signed and Unsigned Number Representations

In all bases, the high-order digits represent the greater magnitude and the low-order digits the least significant part of the number. All numbers are based upon the following format: d times base**i. Example: (1 * 2**24) and (987 * 10**8). Positive and negative numbers are represented by sign and magnitude: the negative sign is represented by a 1 bit in the high-order position (the leftmost digit), whereas a positive number has a zero as the high-order digit. Note that the high-order digit cannot be used in determining the magnitude of a number, which reduces the greatest possible number by half. Engineers decided that it is more efficient to ADD binary numbers than to SUBTRACT them, so when performing subtraction the number to be subtracted is converted to TWO'S COMPLEMENT form: each bit is inverted and 1 is added to the result. This new binary number is then added to the first number and a correct subtraction results.

JAVA

JAVA is compiled into an architecture-neutral code (known as byte-codes). An interpreter (called a Java Virtual Machine) on each of the different architectures then translates the byte-codes "on the fly" at execution time into the machine language of the hardware environment.

The Loader

Locates the executable program that resides as a file in auxiliary storage, allocates memory sufficient for its size, then loads the program into memory, and finally makes the program addressable by providing the caller with the entry point to its executable code.

Memory Management

Logical (Virtual) vs. Physical (Real) Addressing: A virtual address is the address computed by the CPU, with the aid of the OS, based upon the "view" of the address space in which the program executes. A physical or absolute address is a real location on the memory chip; it must be between 0 and the highest addressable byte of real memory. Address Binding: A symbolic address is an address as viewed from the perspective of the programmer. A relocatable address is an address in a format that can be altered dynamically at execution time. Address Mapping: Run-time mapping between virtual and physical addresses is performed by a hardware mechanism called dynamic address translation or the memory-management unit (MMU). The relocation is applied on every memory access: the value of the relocation register is added to each accessed address. The application program never sees the real physical addresses. So we have logical addresses in the range 0 to "any logical address", and physical addresses in the range real 0 to "end physical address". Single Contiguous Memory Allocation: A simple address space that could accommodate a single application. The entire program must fit into the address space; each instruction and data address is mapped directly to a real address in the address space. The unused portion of storage was wasted, and if the entire program could not fit in the address space it could not be run, or it had to be segmented by the programmer. Partitioned Allocation: Static partitioned allocation provided an address space that could accommodate a fixed number of partitions and therefore a fixed number of applications. An operating system was needed; it was loaded beginning at memory location zero. Each application partition, aside from the first, was offset by a fixed displacement from memory location zero. Since programs are written with the assumption of running relative to address zero, each memory access was adjusted by an offset representing the beginning partition address so that it addressed the proper physical location relative to the partition start boundary. Remaining unused portions of each partition, called fragments, could not be used by other programs. Dynamic partitioned allocation had all the features of static partitioning except that the address space could accommodate a variable number of partitions and applications; the number was set by the operator at machine startup time. Both static and dynamic partitioning required that each program fit in the partition allocated to it.
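
A minimal sketch of the relocation-register idea in C, with a limit check standing in for the MMU's protection; the base and limit values are invented:

```c
#include <stdio.h>
#include <stdint.h>

/* Dynamic address translation in miniature: every logical address is
   checked against a limit and offset by a relocation (base) register. */
static const uint32_t base  = 0x40000;  /* partition start (relocation register) */
static const uint32_t limit = 0x10000;  /* partition length                      */

static int translate(uint32_t logical, uint32_t *physical)
{
    if (logical >= limit)
        return 0;                   /* outside the partition: addressing error */
    *physical = base + logical;     /* relocation added on every access        */
    return 1;
}

int main(void)
{
    uint32_t phys;
    if (translate(0x1234, &phys))
        printf("logical 0x1234 -> physical 0x%X\n", phys);   /* 0x41234 */
    if (!translate(0x20000, &phys))
        printf("logical 0x20000 -> outside the partition\n");
    return 0;
}
```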

Classes of Computing Equipment

Low-end to high-end performance and capacity:
Handheld (smart phones, tablets, pads, watches, etc.)
Laptop and desktop
Servers (Intel or AMD based)
Mainframe or Enterprise Server (370, 390, ESA, IBM z/Architecture)
Supercomputers (Cray)

Programming Languages

Microcode or Firmware
Machine
Assembly
High-Level
JAVA

Memory Types

Paper
Magnetic
Optical
Electronic and Solid State
Steel and Vinyl

Single Cycle Implementation

Performing the above steps in single-cycle fashion works correctly, but it is not used in modern designs because it is inefficient. The clock cycle must be the same length for every instruction, so it is determined by the longest possible instruction path, the worst-case delay, to be completely processed by the CPU. Each instruction is executed serially (one at a time) to completion; the next instruction does not begin processing until the prior one has vacated the CPU.

Application (Application Tasks)

Permitted by the hardware to execute only a subset of the instruction set. This subset consists of non-privileged instructions. Since privileged instructions could adversely affect the OS and other application tasks, only the operating system is permitted to execute privileged instructions. (Synonyms: user, problem, un-privileged, application, app)

Pipelined Datapath and Control

Pipelining is a technique that exploits parallelism among multiple instructions in a sequential instruction stream: the instructions are overlapped in execution. Dividing an instruction into five stages means a five-stage pipeline, which in turn means that up to five instructions (theoretically) will be in execution simultaneously during any single clock cycle. Pipelining increases the number of simultaneously processing instructions and the rate at which instructions are started and completed; it does not reduce the time it takes to complete an individual instruction. As each step of an instruction finishes, the first step of the next instruction is fetched, the second step of the (next-1) instruction is decoded, the third step of the (next-2) instruction is executed, the fourth step of the (next-3) instruction accesses data memory, and the last step of the (next-4) instruction writes its result. Pipelining does NOT change the order in which instructions are executed; the programmer can consider the instruction order as set in stone. Pipeline (ideal) performance: if stages are perfectly balanced, under ideal processing conditions, Time between instructions = Time between non-piped instructions / Number of pipe stages (see the sketch after this entry). Pipeline vs. single-cycle instruction performance: just as the single-cycle design must take the worst-case number of clock cycles to process the complete instruction, the pipeline need only consider the worst-case amount of time needed to complete the slowest step in the pipeline. Outside of the memory system, the effective operation of the pipeline is usually the most important factor in determining the clock cycles per instruction of the processor and hence its performance. Designing for PIPELINES: 1) Make each instruction format, of the entire instruction set, the same length. 2) Have very few differing instruction formats. 3) Provide the capability to operate on operands that are in main memory. 4) Ensure that instruction operands reside in the same page of storage as the instruction itself. PIPELINE considerations: given a large number of instructions, the increase in speed is about equal to the number of pipe stages. Pipelining improves performance by increasing instruction throughput; it does not decrease the execution time of any single instruction. Pipeline hazards: a pipeline hazard is a situation where the next instruction cannot execute due to conditions set up by the instructions currently preceding it in the pipeline. A structural hazard is one in which the planned instruction cannot execute in the same clock cycle because the hardware does not support the combination of instructions that are set to execute; there exists a conflict, such as a common resource that is needed by two operations and cannot be shared. Examples: an instruction executes a compare operation and has not yet set the condition code in the PSW while the following instruction needs to act on the basis of that condition code; or two instructions need access to the same memory location, the first to write (store) and the second to read (load). A data hazard occurs when a planned instruction cannot execute in the same clock cycle because the input data needed to execute the instruction has not yet been made available: it is still in the process of being computed, or its computation has not yet begun. This hazard can lead to a "stall". Example: a SUBTRACT instruction needs the result of a prior ADD instruction which is still in the pipeline.
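
A worked instance of the ideal-pipeline relation quoted above, as a tiny C sketch; the timings are invented for illustration:

```c
#include <stdio.h>

/* Ideal pipelining: with perfectly balanced stages, the time between
   instruction completions divides by the number of pipe stages. */
int main(void)
{
    double single_cycle_ns = 800.0;  /* non-pipelined time per instruction */
    int    stages          = 5;

    printf("ideal time between completions: %.0f ns\n",
           single_cycle_ns / stages);

    /* The latency of any one instruction is unchanged; only throughput rises. */
    printf("latency of a single instruction: %.0f ns\n", single_cycle_ns);
    return 0;
}
```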

Microcode or Firmware

Programming code that instructs the hardware on how to carry out a very low-level task, such as opening a circuit gate, or ORing, ANDing and NEGating current flow.

Uni-processing vs. Multi-Processing

Single CPU vs. multiple CPUs that process in a shared storage environment. Multiprocessing allows simultaneous processing of multiple tasks across more than one CPU. Special, sophisticated programming must be designed to operate efficiently on a single problem. A problem that arises when employing multiple processors (CPUs) is that they may both access a shared resource at the same time, so some interlock mechanism must provide serialization and synchronization when sharing a common (the same) resource. If more than one processor accesses the same resource at the same time, the likely corruption of the data value is called a data race. However, if the computations for a given problem can be made disjoint from other computations, parallel processing can reduce the time to result enormously.
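
A hedged sketch of such an interlock in C, using POSIX threads as a stand-in for multiple CPUs sharing storage: the mutex serializes updates so the shared counter cannot race.

```c
#include <stdio.h>
#include <pthread.h>

static long counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000000; i++) {
        pthread_mutex_lock(&lock);    /* acquire before shared access */
        counter++;
        pthread_mutex_unlock(&lock);  /* release for the other "CPU"  */
    }
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, worker, NULL);
    pthread_create(&b, NULL, worker, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    printf("counter = %ld\n", counter);  /* always 2000000 with the lock;
                                            unpredictable without it */
    return 0;
}
```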

Registers

Special, high-speed storage areas packaged with the CPU hardware. These are the physically closest storage areas to the CPU, and accessibility is virtually instantaneous. All data must be represented in a register before it can be processed. For example, if two numbers are to be added, both numbers must be in registers, and the result is also placed in a register or is pointed to by an address contained in a register. ("Pointed to" means that the register contains the address of a memory location where the data resides, rather than the actual value of the data itself.) The number of registers that a CPU has and the size of each register (their length in bits) help determine the architecture of the computer. For example, a 32-bit machine is one in which each register is 32 bits wide; therefore, each instruction can accommodate 32 bits of data.

Computer Types

Specific Design -- Machines that are built for a single specialized task and are not reprogrammable. General Purpose -- Machines that can be re-programmed for multiple applications.

Operating System

Supervisory Tasks -- Permitted by hardware checks to execute the full instruction set. This is also called privileged mode. (Synonyms: system, supervisor, privileged)

Floating Point Operations

Support for fractions and real numbers (numbers expressed as p/q). Numbers are expressed in scientific notation, with a single digit to the left of the radix point. A floating point number is normalized if it has no leading zeroes. Floating point numbers are represented by a sign, a fraction (significand) and a signed exponent (characteristic): the sign is the first bit, followed by the exponent, then the fraction. Overflow means that the exponent is too large; underflow means that the exponent is too small. A "double" denotes a floating point number of twice the bit length: both the exponent and the fraction contain an increased number of significant bits, giving greater precision and magnitude. Double layout: sign, 1 bit; characteristic (also known as the biased exponent), 11 bits; mantissa (also known as the significand), 1+52 bits (an implied leading 1 plus 52 stored fraction bits).
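
The 1/11/52 layout can be pulled apart with shifts and masks. A minimal C sketch, assuming the host uses the standard IEEE 754 double format:

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Decompose a double into its sign bit, 11 biased-exponent bits,
   and 52 stored fraction bits (the leading 1 is implied). */
int main(void)
{
    double d = -6.25;   /* -1.1001 binary x 2**2 */
    uint64_t bits;
    memcpy(&bits, &d, 8);  /* reinterpret the double as raw bits */

    uint64_t sign     = bits >> 63;
    uint64_t exponent = (bits >> 52) & 0x7FF;      /* biased by 1023 */
    uint64_t fraction = bits & 0xFFFFFFFFFFFFFULL; /* low 52 bits    */

    printf("sign=%llu exp=%llu (unbiased %lld) frac=0x%llX\n",
           (unsigned long long)sign,
           (unsigned long long)exponent,
           (long long)exponent - 1023,
           (unsigned long long)fraction);
    return 0;
}
```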

Program Status Word (PSW) or Program Control Register

The PSW is a hardware register which contains information about program state used by the operating system and the underlying hardware. It contains a pointer (address) to the next instruction to be executed by the CPU. The PSW also contains the information required for proper program execution: an error status field, the condition code set by the prior instruction execution, an interrupt enable/disable masking bit, and a supervisor/problem state mode bit. In general, the PSW is used to control instruction sequencing and to hold and indicate the status of the system in relation to the program currently being executed. The active or controlling PSW is called the CURRENT PSW. By storing the current PSW during an interruption, the status of the CPU can be preserved for subsequent inspection. By loading a NEW PSW or modifying the contents of some part of a PSW, the state of the CPU can be initialized or changed.

Locality

The Principle of Locality states that programs access a relatively small portion of their address space during any interval of time. Temporal Locality is the principle that once a location containing data or instructions is referenced, it is likely to be referenced repeatedly in the very near future, as in a programming loop. Spatial Locality is the principle that if a data location is referenced, subsequent references are likely to be to addresses in the same physical neighborhood. Locality defines the working-set size of a program, that is, the set of instructions and data that make up the normal operating core of a program and tend to be executed or reused over and over. A program's working set consists of those pages that the operating system considers to exhibit the greatest locality. In general, it is desirable to develop programs with a high degree of locality. Locality helps keep the CPU busy and not waiting for instructions and data to be fetched from non-local areas (such as auxiliary storage volumes). We rely on the operating system and other mechanisms to track and determine which instructions and data exhibit high degrees of locality.

Instruction Set

The abstract programming interface between the hardware and the lowest-level software that encompasses all the information necessary to write a machine language program that will run correctly, and that can be re-purposed for different tasks. The instruction set comprises the operations that are the smallest programmed units the CPU can interpret as functional guidelines to accomplish some operation or manipulative task. You can think of an instruction as a reliable function or procedure executed by the hardware; an instruction is realized in engineered electrical circuits. The instruction set layer hides the details of the function of the underlying hardware. This is where software engineering ends and electrical engineering begins.

Instruction Set

The alphabet of the computer is ASCII or EBCDIC, the code of symbols understood by the operating environment. ASCII = American Standard Code for Information Interchange; EBCDIC = Extended Binary Coded Decimal Interchange Code. The language of the computer architecture is the instruction set: the set of binary strings representing instructions that is understood by the particular architecture of the machine on which it operates. Each instruction represents an item in the vocabulary of the computer language. Both instructions and data are stored in the computer memory as binary strings. This is known as the stored-program concept; first formalized by John von Neumann, it allowed computers to be easily reprogrammed for different applications.

Architecture

The computer architecture is the conceptual design of the machine organization that specifies the fundamental operational structure of a computing system. It describes the requirements and design implementation for both the hardware and the native operating software to perform instruction execution and memory access.

Multiplication

The multiplicand is multiplied by the multiplier, resulting in a product. Each intermediate (partial) product is shifted 1 bit further to the left. The product of an n-bit multiplicand and an m-bit multiplier can require n+m bits; that is, n+m bits are needed to accommodate all possible products, so multiply instructions are generally provided with a product area two registers in length. Hardware multiplication is simply a series of shifts and ANDs applied to the bits. A maximum of 128 bits may be required to multiply (or divide) two 64-bit numbers. Overflow cannot occur because two registers are used to contain the result, and the hardware "knows" the sign of the result prior to performing the computation. Note that "multiplying" a digit in the multiplier by a digit in the multiplicand is performed by AND operations. Multiplication steps: the low-order multiplier bit is ANDed with each bit in the multiplicand, with the result stored as a row. The next multiplier bit to the left does likewise, with its result stored in the next row but shifted 1 bit to the left. Each row is added (XOR with carry) to the prior rows to produce an intermediate result. The process continues until the multiplier bits are exhausted for the architectural number of bits in a register; the last result becomes the product. Note that shifting bits 1 bit to the left has the same effect as multiplying by 2, with more efficient hardware performance. Faster multiplication is accomplished by a pipelined series of adders, one for each bit of the multiplier: one input is the multiplicand ANDed with the multiplier bit, and the other is the output of the prior adder. Pipelining means connecting the outputs of one adder to the inputs of the next. For signed multiplication, simply recall the signs of the multiplier and multiplicand and follow the simple arithmetic rule: if they differ, the result must be negative; otherwise the result is positive.
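
A minimal C sketch of the shift-and-add scheme described above, accumulating the partial products into a double-width result:

```c
#include <stdio.h>
#include <stdint.h>

/* Shift-and-add multiplication: each multiplier bit ANDs the whole
   multiplicand into a row, each successive row is shifted one more
   position to the left, and the rows are summed. */
static uint64_t multiply(uint32_t multiplicand, uint32_t multiplier)
{
    uint64_t product = 0;
    for (int i = 0; i < 32; i++) {
        if ((multiplier >> i) & 1)                   /* AND selects the row */
            product += (uint64_t)multiplicand << i;  /* row shifted left    */
    }
    return product;   /* up to n+m bits: two registers' worth */
}

int main(void)
{
    /* 10^12 needs more than 32 bits: no overflow, double-width result. */
    printf("%llu\n", (unsigned long long)multiply(1000000, 1000000));
    return 0;
}
```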

Instruction Operands

The operands of an instruction may be: one or more registers containing a numerical data value, or the address of data in real memory (i.e. a pointer), or the length of some data, or a literal data value acting as an immediate operand. All operations must take place in the CPU. The location or effective address in memory identifies the place in real memory where data is loaded from or stored into. This location is calculated from the contents of the operands, which consist of the register(s) contents and optionally a displacement value: Location (or effective address) = contents of register(s) + [some positive displacement in memory]. While a register may contain a numerical value, it generally points to a location in memory that contains a value. The effective address is the sum of the following: ( Base register + [Index register] + [Displacement] ). The amount of data acted upon is an architected fixed amount such as a bit, byte or word, or can be represented by a length code. Immediate operands do not have a register or memory location associated with them, so there is no need to fetch them from memory: immediate operands are supplied in the program code itself. They are used to indicate a literal data value or a bit pattern (mask) to be applied directly to the computation from the instruction itself. For example, the OR IMMEDIATE instruction (OI) alters the bit contents of real storage at some displacement from the location pointed to by a given register. For each instruction defined in the computer architecture the CPU must be able to resolve: 1) what service the instruction is to provide, 2) where the operands are located (in memory, in registers, or immediately in the program code), 3) the length of the data to be manipulated, 4) and finally where to place the result. A load data transfer operation copies data from real memory to a register: L R7,1024(R2,RA) loads register 7 with the data from the memory location at the effective address of register 2 + register 10 + 1024, encoded as 5872A400 in hex. A store data transfer operation copies data from a register to real memory: ST R7,2048(R2,RA) stores the contents of register 7 into the memory location at the effective address of register 2 + register 10 + 2048, encoded as 5072A800 in hex. Instructions must be aligned in memory, generally on architectural boundaries divisible by 2. This way the CPU need only concern itself with the even-numbered memory locations when looking for the start of an instruction; furthermore, the CPU can discount all bit locations other than the one that begins the byte at each even-addressed location. Alignment permits a much easier interpretation and facilitates speed. Most instructions set a condition code that represents the success or failure (and other statuses) of the last executed instruction. Since the CPU cannot "remember" what operation it last performed, this status information is stored in the Program Status Word (or Program Control Register) for interpretation by the next instruction.
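
Effective-address formation as in the L/ST examples above, sketched in C; the register-file contents are invented for illustration:

```c
#include <stdio.h>
#include <stdint.h>

/* Base register + index register + displacement,
   as in "L R7,1024(R2,RA)" where RA is register 10 (hex A). */
int main(void)
{
    uint32_t regs[16] = {0};
    regs[2]  = 0x005000;   /* R2: index register (value invented) */
    regs[10] = 0x010000;   /* RA (R10): base register (invented)  */

    uint32_t displacement = 1024;  /* 0x400, as in the load example */
    uint32_t effective = regs[10] + regs[2] + displacement;

    printf("effective address = 0x%X\n", effective);  /* 0x15400 */
    return 0;
}
```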

Registers

The operands of many instructions are restricted to a limited number of special locations built into the CPU hardware called registers. These primitive features differ from memory locations in that their number is limited; they are physically close to the CPU; access is extremely fast; their addressing is simple, requiring only 4 bits (0-F); and they can never be paged out. The size of a register (its width in bits) is limited to that of the underlying architecture (16, 32, 64-bit). Fewer and narrower registers result in faster access to, and manipulation of, the data contained in them. Registers are of several types: General Purpose -- used for numeric data and address arithmetic. Control -- used for controlling reconfiguration of the CPU functions. I/O -- used for managing input and output operations. Program Status -- used for maintaining the state of the processor, the next instruction, the results of the last instruction, and interrupt masking.

The Von Neumann Bottleneck

The shared bus between the program memory and data memory leads to the von Neumann bottleneck: the limited throughput (data transfer rate) between the CPU and memory compared to the amount of memory. Because instructions and data must travel between memory and the CPU across a bus, throughput is lower than the rate at which the CPU can work. This seriously limits the effective processing speed when the CPU is required to perform minimal processing on large amounts of data, because the CPU is continually forced to wait for needed data to be transferred to or from memory. Since CPU speed and memory size have increased much faster than the throughput between them, the bottleneck has become more of a problem, one whose severity increases with every newer generation of CPU.

Handling Writes

To ensure consistency of data, both memory and cache may have to be updated on writes. Write-through -- the information is written both to the memory block in the cache and to the memory block in the lower level of memory, via a write buffer. If the write buffers cannot be emptied fast enough, the CPU incurs a stall condition until a write buffer becomes available. Write-back -- the information is written only to the memory block in the cache; the changed memory block is written to the lower level only when the block is replaced. If the machine were to lose power or the OS were to abnormally end, the cache contents would be lost and the data would be left in an inconsistent state.

Interpreters

Translate human readable source programming into machine code at execution time. This is performed "on the fly" as each statement is read from the source code. The CPU interprets the binary machine language code as instructions, then executes them.

Handling Cache Replacements

When a miss occurs in a direct-mapped cache there is only one position to consider. With an associative cache we have a number of choices, and that choice is usually based upon a Least Recently Used (LRU) algorithm: the block replaced is the one that has gone unused for the greatest amount of time, that is, the block that exhibits the least locality.

Cache

is a smaller, faster memory which stores copies of instructions most recently executed and the data accessed from the most frequently used memory locations.

Linkers

or Linkage-Editors, will take all of the independently assembled machine language object programs and bind them together into a single executable program. The linker resolves references to all the other programming in your object code to stitch together a program ready to run. The final product is sometimes called an executable, a binary, or a load module.

Compilers

translate human readable source programming to assembly language. The first compiler was invented by Grace Hopper, whose work led to the COBOL language.

Assemblers

translate mnemonic code programming to binary machine language (object code) consisting of strings of ones and zeroes.

