Assembly Language Parts 1-3
Registers are prefixed with ?
%
RSP VS RBP
%rsp and %rbp are stack management registers and should not be used for any other purpose. RSP holds the address of the top of the stack: it points to the last value pushed onto the stack. RBP is the frame pointer: it points to the bottom (base) of the stack frame of the function that is currently executing. A function must set rbp at its beginning, after first preserving the caller's value of rbp.
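The role of RSP can be sketched with a toy model in C. This is only an illustration under stated assumptions: the `Machine` struct, the 64-byte stack size, and the function names are invented for the sketch, and it does no bounds checking. It relies on the x86-64 behavior that the stack grows toward lower addresses in 8-byte steps.

```c
#include <stdint.h>

/* Toy model of the runtime stack: 8-byte slots, growing toward lower
   addresses. STACK_BYTES and the struct layout are assumptions made
   for this sketch, not part of any real ABI. */
#define STACK_BYTES 64

typedef struct {
    uint64_t mem[STACK_BYTES / 8]; /* backing storage for the stack */
    uint64_t rsp;                  /* byte offset of the last value pushed */
} Machine;

/* Empty stack: rsp sits just past the highest slot. */
void machine_init(Machine *m) { m->rsp = STACK_BYTES; }

/* pushq: decrement rsp by 8, then store the value at the new top. */
void pushq(Machine *m, uint64_t value) {
    m->rsp -= 8;
    m->mem[m->rsp / 8] = value;
}

/* popq: load the value at the top, then increment rsp by 8. */
uint64_t popq(Machine *m) {
    uint64_t value = m->mem[m->rsp / 8];
    m->rsp += 8;
    return value;
}
```

After two pushes, rsp is 16 bytes lower than where it started, and the pops return the values in reverse order.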
register operand
- second type of operand - denotes contents of a register - one of the 8-, 4-, 2-, or 1- byte low-order portions of the registers for operands having 64, 32, 16, or 8 bits respectively
SF
- sign flag - bit = 7 - set if most sig bit is 1
%RSP
- stack pointer - points to last item pushed onto the runtime stack
memory operand
- third type of operand - accesses a memory location according to a computed address - this address is called the effective address, computed as: Imm + R[r_b] + R[r_i] * s, where r_b is the base register, r_i is the index register, and the scale factor s is 1, 2, 4, or 8
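The effective-address formula can be written out as a small C function. This is a sketch: the function name and parameter names are hypothetical, and the register contents are passed in as plain integers.

```c
#include <stdint.h>

/* Effective address for the general memory operand Imm(r_b, r_i, s):
   Imm + R[r_b] + R[r_i] * s. The scale s is expected to be 1, 2, 4, or 8. */
uint64_t effective_address(int64_t imm, uint64_t base, uint64_t index,
                           unsigned scale) {
    return (uint64_t)imm + base + index * scale;
}
```

For example, with Imm = 8, a base register holding 0x1000, an index register holding 3, and s = 4, the address is 8 + 0x1000 + 12 = 0x1014.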
ZF
- zero flag - bit = 6 - set if result was zero
CF
- carry flag - bit = 0 - set if the operation generated a carry out of (or a borrow into) the most significant bit, which indicates overflow for an unsigned operation
3 types of data movement in computer languages
1. Memory to function variables (registers in assembler) 2. Function variables (register) to function variables (register) 3. Function variables (register) to memory
3 categories of x86 instructions
1. data movement 2. arithmetic/logical operators 3. control-flow
4 General Categories of Statements in Computer Languages
1. declarations 2. data movement: - Memory to function variables (registers in assembler) - Function variables (register) to function variables (register) - Function variables (register) to memory 3. Arithmetic/Logical Operation - compare - calculate 4. Control-Flow - procedure/function calls - looping - conditionals
3 types of operands
1. immediate - constant values - written with '$' followed by int in C notation 2. register - denotes contents of register - one of the 8-, 4-, 2-, or 1- byte low-order portions of the registers for operands having 64, 32, 16, or 8 bits respectively 3. memory - access some memory location according to a computed address, often called the effective address
The modern meaning of the term computer architecture covers three aspects of computer design:
1. instruction set architecture (ISA) 2. computer organization 3. computer hardware
1 word = __ bits
16
The X86‐64 CPU has _____ 64-bit registers
16
how many bits is a word?
16 (double word = 32, quad word = 64)
movb moves __ byte(s)
1 (movw moves 2, movl moves 4, movq moves 8)
how many bits is a double word? quad word?
32, 64 (word = 16)
hexadecimal representation of -1
FF...F where the number of F's is twice the number of bytes in the representation
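A quick way to check the all-F pattern is to build it for each operand size. The sketch below assumes a hypothetical helper name, `minus_one_pattern`, and returns the bit pattern as an unsigned 64-bit value.

```c
#include <stdint.h>

/* Bit pattern of -1 when stored in `bytes` bytes (1, 2, 4, or 8).
   Every bit is 1, so each byte prints as two hex F digits. */
uint64_t minus_one_pattern(unsigned bytes) {
    uint64_t all_ones = ~(uint64_t)0;                 /* 64 one-bits */
    return bytes < 8 ? all_ones >> (64 - 8 * bytes)   /* keep the low bytes */
                     : all_ones;
}
```

So a 1-byte -1 is 0xFF (two F's), a 4-byte -1 is 0xFFFFFFFF (eight F's), and so on.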
True/False: Since we will usually be working with quad words, omitting the opcode suffix is OK because the compiler assumes instruction is -q.
False: If you omit the suffix, the assembler will attempt to determine it from the operands involved, but this is dangerous (the assembler will not always generate a machine instruction that operates on data of the size you want). This can cause bugs which are extremely difficult to track down, so be sure to *always* include the operand size suffix!
True/False: For all three operands, zero-extension and sign-extension data movement instructions are supported for all size sources and destinations
False: for 64-bit destinations, moving with sign extension is supported for all three smaller source sizes (byte, word, double word), but moving with zero extension is supported only for the two smallest source sizes (byte and word)
True/False: movX destination, source is the correct format for MOV instructions, where X is accurate suffix
False: movX source, destination *SOURCE COMES FIRST*
True/False: movzlq is typically the most used instruction of the MOVZXX instruction class family
False: movzlq is *not* an instruction in the movzXX class. movl with a register destination already zeroes the upper 4 bytes of the register, so plain 'movl' does the job movzlq would do
True/False: MOV class is used to move and copy values
True: a mov copies the source value into the destination and leaves the source unchanged, so "move" really means "copy". The one restriction is that the source and destination cannot both be memory locations: copying from one memory location to another takes two instructions (memory to register, then register to memory).
First 8 general registers names vs last 8 names
First 8 - historical names: RAX, RBX, RCX, RDX, RBP, RSI, RDI, and RSP (RSP is the stack pointer, marking the end position of the run-time stack). Last 8 - added later: named R8 through R15.
_____ refers to the actual programmer visible machine interface such as instruction set, registers, memory organization, and exception (i.e. interrupt) handling
ISA
simplest of data movement instruction classes
MOV class - consists of 4 instructions: movb, movw, movl, movq - operates on data of different sizes: 1, 2, 4, and 8 bytes respectively - source operand = an immediate value, a register, or a memory location; an immediate source for movq must fit in 32 bits (two's complement) and is sign extended to 64 bits - destination operand = a location that is either a register or a memory address - source and destination cannot both be memory locations, so copying a value from one memory location to another takes two instructions (memory to register, then register to memory).
Memory is considered a _____-_____________ storage array
Memory: Consider it to be a byte‐addressable storage array. Words stored in little‐endian byte order
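Little-endian order can be made concrete by writing a value out byte by byte. The sketch below uses a hypothetical helper, `store_le32`, and uses shifts rather than pointer casts so the result does not depend on the machine it runs on.

```c
#include <stdint.h>

/* Store a 32-bit value the way x86-64 lays it out in memory:
   least significant byte at the lowest address. */
void store_le32(uint32_t value, uint8_t out[4]) {
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(value >> (8 * i)); /* byte i holds bits 8i..8i+7 */
}
```

Storing 0x12345678 gives the bytes 78 56 34 12 at increasing addresses.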
.type
Needed by the linker to identify the label as one associated with a function, as opposed to data
.size
Needed by the linker to identify the size of the text for the program
Memory Addressing Modes: Normal
Normal: (R) means Mem[Reg[R]]. Register R specifies the memory address. Aha! Pointer dereferencing in C. Example: movq (%rcx), %rax
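The C side of this card is a plain pointer dereference. The function name below is hypothetical, and the register mentioned in the comment is an assumption about how a compiler might pass the argument, not something the card specifies.

```c
/* Normal (indirect) addressing: the register holds a pointer and the
   instruction reads the memory it points to. A compiler could turn the
   body into something like movq (%rdi), %rax, assuming p arrives in %rdi. */
long load_through(const long *p) {
    return *p;
}
```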
Assembly operation characteristics
- Perform arithmetic function on register or memory data - Transfer data between memory and register: load data from memory into register, store register data into memory - Transfer control: unconditional jumps to/from procedures, conditional branches
How to access registers
RAX, RBX, RCX, RDX (R below stands for A, B, C, or D) - rRx: refers to the 64-bit register - eRx: refers to the lower 32 bits of rRx - Rx: refers to the lower (least significant) 16 bits of eRx - Rh: refers to the top (most significant) 8 bits of the 16-bit Rx register (buggy) - Rl: refers to the lower 8 bits of the Rx register. The least significant (lower) 16 bits of RDI and RSI can also be accessed in x86-64. RSP and RBP are used as stack management registers and should not be used for any other purpose.
All processors in smart phones and tablets are of ____ architecture.
RISC
2 types of ISA
RISC & CISC
RISC
RISC Architecture (reduced instruction set computer) - fixed encoding length - simple address modes - arithmetic and logical operations (ALU ops) work on data in registers - load-store architecture: the only instructions that can affect memory are load (from memory to register) & store (from register to memory) instructions - no condition registers - RISC processors = less power and heat - register-intensive procedural linkage: registers used for procedure arguments, return values and addresses - all processors in smart phones and tablets are of RISC architecture.
RISC VS CISC encoding length
RISC: fixed encoding length (all instructions same length) CISC: variable encoding length (instructions have different lengths)
RISC VS CISC condition registers
RISC: no condition registers CISC: condition codes hold side effects of instructions
RISC VS CISC procedural linkage
RISC: register-intensive procedural linkage (Registers used for procedure arguments, return values and addresses) CISC: stack-intensive procedural linkage (Stack used for procedure arguments, return values and addresses)
RISC VS CISC address modes
RISC: simple, base & displacement CISC: more modes, base, displacement, scale factors, index, registers, etc
MOVSXX
Sign-Extending data movement instructions (fill the remaining destination bytes with copies of the most significant bit of the source) movsXX where X's are suffixes, first is source second is destination. - register or memory location as source, register as destination 6 types: movsbw, movsbl, movswl, movsbq, movswq, movslq - cltq = movslq %eax, %rax; sign-extends 4 bytes to 8 bytes; no operand, always %eax to %rax movsbw: moves sign-extended byte to word movsbl: moves sign-extended byte to double word movswl: moves sign-extended word to double word movsbq: moves sign-extended byte to quad word movswq: moves sign-extended word to quad word movslq: moves sign-extended double word to quad word
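In C, sign extension is what happens when a signed value is converted to a wider signed type. The sketch below names each function after the instruction it imitates. These are ordinary C functions written for illustration, not the instructions themselves.

```c
#include <stdint.h>

/* Each conversion widens a signed value; the new upper bits are copies
   of the source's most significant (sign) bit. */
int16_t movsbw(int8_t src)  { return (int16_t)src; } /* byte -> word */
int32_t movsbl(int8_t src)  { return (int32_t)src; } /* byte -> double word */
int64_t movsbq(int8_t src)  { return (int64_t)src; } /* byte -> quad word */
int64_t movslq(int32_t src) { return (int64_t)src; } /* double word -> quad word */
```

For instance, the byte 0x80 (-128) becomes 0xFF80 as a word, while 0x7F stays 0x007F because its sign bit is 0.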
.string
Specifies that the characters enclosed in quotation marks are to be stored in memory, terminated by a null byte
.section
Makes the specified section the current section. Often used with .rodata, which specifies that the following data is to be placed in the read-only memory portion of the executable
True/False: 64-bit CPU has 16 general registers to store int and pointer data
True
True/False: It is the caller's responsibility to clean the stack after the call.
True
True/False: Cannot do memory‐memory transfer with a single instruction
True: use two MOV instructions (memory to register, then register to memory)
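The two-step copy looks like this in C. The function name is hypothetical, and the instructions in the comment are one plausible translation under an assumed register assignment.

```c
/* Copy one value from *src to *dst. On x86-64 this needs a load and a
   store, for example movq (%rsi), %rax followed by movq %rax, (%rdi)
   (the register choice here is an assumption). */
void copy_quad(long *dst, const long *src) {
    long tmp = *src;  /* load: memory -> register */
    *dst = tmp;       /* store: register -> memory */
}
```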
True/False: X86‐64 ISA uses 1-bit flags: Z, S, C and O in the RFLAGS register (EFLAGs). They signify that the result of the most recent ALU operation was 0, negative, carry out or resulted in overflow, respectively.
True. condition codes
True/False: Operands can be of different sizes: 1, 2, 4 or 8 bytes
True: Data Movement and Arithmetic/Logic operations in ATT format use an instruction opcode suffix (q, b, w, or l) to specify operand size.
True/False: When movl has a register as the destination, it will set the high-order 4 bytes of the register to 0
True: This is because of convention adopted in x86-64, that any instruction that generates a 32 bit value for a register also sets the high-order portion of the register to 0
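The difference between a 4-byte write and the smaller writes can be modeled on a 64-bit value. The function names below are invented for this sketch. Each one returns the new register contents after writing `v` into the low part of `reg`.

```c
#include <stdint.h>

/* movl to a register: low 4 bytes replaced, high 4 bytes set to 0. */
uint64_t write_l(uint64_t reg, uint32_t v) {
    (void)reg;                 /* old contents are discarded entirely */
    return (uint64_t)v;
}

/* movw to a register: low 2 bytes replaced, upper 6 bytes unchanged. */
uint64_t write_w(uint64_t reg, uint16_t v) {
    return (reg & ~(uint64_t)0xFFFF) | v;
}

/* movb to a register: low byte replaced, upper 7 bytes unchanged. */
uint64_t write_b(uint64_t reg, uint8_t v) {
    return (reg & ~(uint64_t)0xFF) | v;
}
```

Starting from 0x1122334455667788, a 4-byte write of 0xAABBCCDD leaves 0x00000000AABBCCDD, while a 2-byte write of 0xAABB leaves 0x112233445566AABB.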
true/false: operands have 3 types
True: immediate, register, and memory
True/False: only ALU operations set condition codes
True: ALU doesn't know if signed or unsigned
True/False: There are many different kinds of assembly languages
True: each different type of processor can have a different one. We're going to use x86-64 because it's the processor that stdlinux is based on.
True/False: Compiling assembly will get very different results on different machines
True: due to different versions of gcc and different compiler settings
RISC vs CISC Architecture
Two types of ISA: 1. RISC Architecture - reduced instruction set computer - fixed encoding length (all instructions same length) - simple address modes: base and displacement - arithmetic and logical operations (ALU ops) work on data in registers - load-store architecture: the only instructions that can affect memory are load (from memory to register) & store (from register to memory) instructions - no condition registers - RISC processors = less power and heat - large number of registers (typical is 32, 64, or 128) - register-intensive procedural linkage (registers used for procedure arguments, return values and addresses) - all processors in smart phones and tablets are of RISC architecture. 2. CISC Architecture - complex instruction set computer - variable encoding length (instruction length varies) - more addressing modes: base, displacement, index registers, scale factors, etc. - arithmetic and logical operations can be performed on registers or directly on memory - condition codes hold side effects of instructions - CISC processors = more power and heat - stack-intensive procedural linkage (stack is used for procedure arguments and return addresses/values) - this is the ISA for Intel IA-32 processors, and also the basis for the related 64-bit versions of the ISA (these processors are now found in over 90% of laptop and desktop computers)
4 condition codes (R/E FLAGS) that we care about for this class
ZF (zero flag), CF (carry flag), OF (overflow flag), and SF (sign flag)
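How the four flags come out of a single addition can be sketched for 8-bit operands. The `Flags` struct and `add8_flags` are hypothetical names. The function only models an add, so it shows the carry case for CF and not the borrow case.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct { bool zf, sf, cf, of; } Flags;

/* Flags an 8-bit add would produce for a + b. */
Flags add8_flags(uint8_t a, uint8_t b) {
    uint8_t r = (uint8_t)(a + b);      /* result truncated to 8 bits */
    Flags f;
    f.zf = (r == 0);                   /* result was zero */
    f.sf = (r & 0x80) != 0;            /* most significant bit is 1 */
    f.cf = r < a;                      /* wrapped around: unsigned overflow */
    /* signed overflow: both operands have the same sign bit and the
       result's sign bit differs from it */
    f.of = ((a ^ r) & (b ^ r) & 0x80) != 0;
    return f;
}
```

For example, 0x7F + 0x01 gives 0x80: SF and OF are set (127 + 1 does not fit in a signed byte) but CF is clear. 0xFF + 0x01 gives 0x00: ZF and CF are set but OF is clear (-1 + 1 = 0 is fine as a signed result).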
MOVZXX
Zero-Extending data movement instructions (fill the remaining bytes with 0's) movzXX where X's are suffixes, first is source second is destination. - register or memory location as source, register as destination 5 types: movzbw, movzbl, movzwl, movzbq, movzwq movzbw: moves zero-extended byte to word movzbl: moves zero-extended byte to double word movzwl: moves zero-extended word to double word movzbq: moves zero-extended byte to quad word movzwq: moves zero-extended word to quad word *notice movzlq is missing. movl already zero extends to higher destination, so movzlq can be just 'movl'
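Zero extension corresponds to widening an unsigned value in C. As a sketch, the functions below borrow the instruction names for readability. They are plain C conversions.

```c
#include <stdint.h>

/* Each conversion widens an unsigned value; the new upper bits are 0. */
uint16_t movzbw(uint8_t src)  { return (uint16_t)src; } /* byte -> word */
uint32_t movzbl(uint8_t src)  { return (uint32_t)src; } /* byte -> double word */
uint32_t movzwl(uint16_t src) { return (uint32_t)src; } /* word -> double word */
uint64_t movzbq(uint8_t src)  { return (uint64_t)src; } /* byte -> quad word */
uint64_t movzwq(uint16_t src) { return (uint64_t)src; } /* word -> quad word */
```

The byte 0x80 becomes 0x0080 as a word here, whereas sign extension of the same byte would give 0xFF80.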
Each different type of processor can have a different _______ ________
assembly language
.data
changes or sets the current section to the data section
.text
changes or sets the current section to the text (or code) section
char data type, suffix, size
data type: byte suffix: -b size (bytes): 1
int data type, suffix, size
data type: double word suffix: -l size(bytes): 4
long, char * data type, suffix, size
data type: quad word suffix: -q size(bytes): 8
short data type, suffix, size
data type: word suffix: -w size (bytes): 2
Assembly is often useful for _____________ code.
debugging/fixing
destination operand of MOV instruction
destination of MOV: a register or a location in memory (memory addresses are 64 bits)
float, double data types, assembly suffix, size (bytes)
float data type: single precision suffix: -s size: 4 double data type: double precision suffix: -l size: 8
access to the lower 16 bits is possible by ?
for RAX, RBX, RCX, and RDX, access to the lower 16 bits is possible by removing the initial R (AX for RAX), to the lower byte of these by switching the X for L (AL for AX), and to the higher byte of the low 16 bits by using an H (AH for AX).
RCX, RDX, R8, R9 are used for _____ and ______
ints and pointers
Intel stores bytes ______ endian
little
memory: words stored in ______-endian byte order
little
movsbw
movsbw: moves sign-extended byte to word fills remaining with copy of most significant bit in source byte b/c movs not movz
movzbl
movzbl: moves zero-extended byte to double word fills remaining with 0's b/c movz not movs
movzbw
movzbw: moves zero-extended byte to word fills remaining with 0's b/c movz not movs
_______ is the "name" of the instruction in assembly language, which does a certain kind of operation on the processor: for example, mov (which moves data), add, jmp, etc.
opcode
Most instructions have one or more ________ specifying the source values to use in performing an operation and the destination location into which to place the result.
operands
Data Size Assembler Directives .quad value .long value .word value .byte value
places the given value, (0x prefix for hex, no prefix for decimal) in memory, encoded in 8 bytes for .quad, 4 for .long, 2 for .word, and 1 for .byte
Assembler directives format
preceded with a '.', e.g. .globl, .quad value, .size, .rodata
pointers are stored as _____ ______?
quad words - 8 bytes (64 bits)
.rodata
read only data comes after .section
source operand of MOV instruction
source of a MOV instruction: an immediate value, a register, or a memory location. For movq, an immediate source is limited to a 32-bit two's complement number, which is sign extended to the 64-bit destination
store VS load in load-store architecture (RISC)
store: from register to memory load: from memory to register
In AT&T format, the size of the operands is specified with
suffixes, q, l, w, and b appended to the opcode q - quad word, or 64 bit operand l - long word, or 32 bit operand w - word, or 16 bit operand b - byte, or 8 bit operand
true/false: labels (for functions or data) in assembly language source code are followed by a colon
true
Program Counter (PC)
- %RIP - memory address of next instruction to be executed - in CPU
%RIP
- 64-bit instruction pointer - points to next instruction to be executed
cltq
- = movslq %eax, %rax - sign-extends 4 bytes to 8 bytes - no operand - source always %eax - destination always %rax
Load-Store Architecture
- a feature of some computer architectures where "operate" instructions do not have memory operands; their operands are found in CPU registers - in a RISC ISA: only instructions that can affect memory are load (from memory to register) & store (from register to memory) instructions
Immediate operand
- first type of operand - constant values - written as: $ followed by int in C notation
Instruction Set Architecture (ISA)
- format and behavior of a machine-level program is defined by the ISA. - programmer visible machine interface such as instruction set, registers, memory organization, and exception (i.e. interrupt) handling. - 2 types: RISC (e.g. smartphone and tablet processors) and CISC (e.g. Intel x86 laptop and desktop processors) - defines processor state - format of instructions - effect each instruction will have on state
register file
- heavily used program data - in CPU
condition code registers
- holds status info about most recently executed arithmetic or logical instruction - used to implement conditional branching or changes in the control or data flow, such as if and while statements
Assembly data types
- integer data (1,2,4, or 8 bytes): data values, pointers - floating point data (4,8, or 10 bytes)
OF
- overflow flag - bit = 11 - set if overflow on signed operation
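The bit positions on the flag cards (CF = 0, ZF = 6, SF = 7, OF = 11) can be used to pick individual flags out of an RFLAGS value. The enum and function names are hypothetical, and the RFLAGS value is just an integer passed in for the sketch.

```c
#include <stdint.h>

/* Bit positions of the four condition codes within RFLAGS. */
enum { CF_BIT = 0, ZF_BIT = 6, SF_BIT = 7, OF_BIT = 11 };

/* Returns 1 if the given flag bit is set in rflags, otherwise 0. */
int flag_is_set(uint64_t rflags, int bit) {
    return (int)((rflags >> bit) & 1);
}
```

An RFLAGS value of 0x41 has CF and ZF set. A value of 0x880 has SF and OF set.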
1 byte = ____ bits
8
.globl
A directive needed by the linker for symbol resolution: followed by name of function
Are we using AT&T or Intel syntax for x86 in this course
AT&T
.file
Allows a name to be assigned to the assembly language source code file.
By replacing the initial R with a(n) __ on the first eight registers, it is possible to access the lower 32 bits
By replacing the initial R with an *E* on the first eight registers, it is possible to access the lower 32 bits (EAX for RAX). Similarly, for RAX, RBX, RCX, and RDX, access to the lower 16 bits is possible by removing the initial R (AX for RAX), to the lower byte of these by switching the X for L (AL for AX), and to the higher byte of the low 16 bits by using an H (AH for AX).
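The register views amount to masking and shifting a 64-bit value. The helper names below are made up for this sketch. Each takes the full RAX contents and returns what the narrower name would read.

```c
#include <stdint.h>

uint32_t eax_of(uint64_t rax) { return (uint32_t)rax; }        /* low 32 bits */
uint16_t ax_of(uint64_t rax)  { return (uint16_t)rax; }        /* low 16 bits */
uint8_t  al_of(uint64_t rax)  { return (uint8_t)rax; }         /* low 8 bits */
uint8_t  ah_of(uint64_t rax)  { return (uint8_t)(rax >> 8); }  /* bits 8..15 */
```

With RAX = 0x1122334455667788, EAX reads 0x55667788, AX reads 0x7788, AL reads 0x88, and AH reads 0x77.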
opcode suffix
Data Movement and Arithmetic/Logic operations in AT&T format use an *instruction opcode suffix (q, b, w, or l)* to specify operand size. b = 1, w = 2, l = 4, q = 8
Memory Addressing Modes: Displacement
Displacement D(R) Mem[Reg[R]+D] R, register specifies start of memory region D, constant displacement specifies offset movq 8(%rbp), %rdx
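Displacement addressing is what array indexing with a constant index or a struct field access typically compiles to. The function name below is hypothetical, and the instruction in the comment assumes the pointer arrives in %rdi, which the card does not state.

```c
#include <stdint.h>

/* Displacement addressing D(R): address = Reg[R] + D.
   Each int64_t is 8 bytes, so p[1] lives 8 bytes past p; a compiler
   might emit movq 8(%rdi), %rax, assuming p is passed in %rdi. */
int64_t load_second(const int64_t *p) {
    return p[1];
}
```

Here the register supplies the start of the memory region (the array) and the constant 8 is the offset of the second element.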
