IS4483-001 Module 2
Data Structures
"Data structures describe how data are laid out" (Carrier 2005:24) • Space allocated for data (static or dynamic) • Dynamic data structures: offset for start of next data structure is calculated using length & start of current data structure • Data structures describe data types & lengths • Importance of data structures to digital forensics • Convert data into information and knowledge • Advanced tools "know" & interpret data structures • Experts must be able to test & explain tool interpretation • First step in reverse engineering application artifacts
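To make the "start + length = next start" rule for dynamic data structures concrete, here is a minimal sketch. The record layout (a 2-byte little-endian total length followed by a payload) is a hypothetical format invented for illustration, not a real file system structure.

```python
import struct


def walk_records(buf: bytes) -> list[tuple[int, bytes]]:
    """Walk a chain of hypothetical variable-length records.

    Assumed layout (illustration only): each record starts with a 2-byte
    little-endian total length (header included), followed by its payload.
    """
    records = []
    offset = 0
    while offset + 2 <= len(buf):
        (length,) = struct.unpack_from("<H", buf, offset)
        if length < 2 or offset + length > len(buf):
            break  # zero/invalid length or truncated record: stop walking
        records.append((offset, buf[offset + 2:offset + length]))
        offset += length  # next start = current start + current length
    return records


demo = b"\x05\x00abc" + b"\x04\x00hi" + b"\x00\x00"
print(walk_records(demo))  # [(0, b'abc'), (5, b'hi')]
```

A zero length is treated as the end of the chain here; real formats define their own terminators.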
ASCII
0x57 0x68 0x79 0x20 0x4b 0x61 0x77 0x68 0x69 • W h y K a w h i • 0x20 0x57 0x68 0x79 0x79 0x79 0x3f • W h y y y ?
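The translation can be checked with a short sketch. It assumes the bytes are plain ASCII and drops the 0x prefixes shown on the slide.

```python
def hex_to_ascii(hex_string: str) -> str:
    """Translate space-separated hex byte values into text via the ASCII table."""
    raw = bytes.fromhex(hex_string)  # spaces between byte pairs are ignored
    return raw.decode("ascii")


slide_bytes = "57 68 79 20 4b 61 77 68 69 20 57 68 79 79 79 3f"
print(hex_to_ascii(slide_bytes))  # Why Kawhi Whyyy?
```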
Here Are the Answers!
1. Byte size: 2 bytes (word) 2. Extra byte: 0x00 (null character in ASCII table) 3. Endian order: Hard to tell! Is the byte pair ".\" or "\."? ◦ But what if I told you this was from a Windows machine with an Intel processor? ◦ Then assume little endian unless told otherwise. ◦ So we know the byte pair must be "\." (0x5C 0x00), not ".\" 4. So, the literal text string would be "\.M.y. .D.o.c.u.m.e.n.t.s.", right? ◦ Wrong! The dots would be interpreted as 0x2E, since the "period" in the ASCII table is 0x2E. ◦ You would have to search via the literal hex string: 0x5C 00 4D 00 79 00 20 00 44 00 6F 00 63 00 75 00.... 5. 0x00 shows up as a period (dot) in hex editors.
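A minimal sketch of building that search string, assuming the text is stored as UTF-16 little endian without a byte-order mark (as on the Windows/Intel example above):

```python
def utf16le_search_hex(text: str) -> str:
    """Hex bytes to search for when text is stored as UTF-16 little endian (no BOM)."""
    return text.encode("utf-16-le").hex(" ").upper()


target = "\\My Documents"  # a single backslash followed by My Documents
print(utf16le_search_hex(target))
# 5C 00 4D 00 79 00 20 00 44 00 6F 00 63 00 75 00 6D 00 65 00 6E 00 74 00 73 00

# Searching for the dotted text seen in the hex editor's text pane matches the wrong bytes:
naive = "\\.M.y."
print(naive.encode("ascii").hex(" "))  # 5c 2e 4d 2e 79 2e  (0x2E periods, not 0x00)
```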
Boot Sequence
1. POST (power-on self test) ◦ Tests primary components (power supply, CPU, ROM, memory, keyboard, etc.) 2. System initialization, CMOS/BIOS check ◦ Date/time settings, hard-disk settings, what hardware to boot from, etc. 3. Load operating system ◦ Reads master boot record (MBR) (1st sector of the 1st, 2nd, etc. device in the BIOS boot order list) ◦ Loads OS or finds location of bootable partition (VBR) ◦ Goes to VBR and finds locations of boot files/binaries
Which of the following represents ONE NIBBLE? Select all that apply.
1111, 0001
How many 256's (16^2 place) would contribute to the decimal equivalent of \x3F2D?
15
How many 16's (16^1 place) would contribute to the decimal equivalent of \x3F2D?
2
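Both answers fall out of listing each hex digit with its place value. This is a small sketch; the function name is illustrative and it accepts either the "0x" or "\x" prefix.

```python
def place_values(hex_string: str) -> list[tuple[int, int, int]]:
    """Return (power, digit, contribution) for each hex digit, most significant first."""
    digits = hex_string.lower()
    for prefix in ("0x", "\\x"):
        digits = digits.removeprefix(prefix)
    result = []
    for i, ch in enumerate(digits):
        power = len(digits) - 1 - i   # rightmost digit sits in the 16^0 place
        digit = int(ch, 16)           # 'f' -> 15, 'd' -> 13, ...
        result.append((power, digit, digit * 16 ** power))
    return result


for power, digit, contribution in place_values("\\x3F2D"):
    print(f"16^{power}: {digit:2d} x {16 ** power:4d} = {contribution}")
```

Summing the contributions gives 12288 + 3840 + 32 + 13 = 16173, the decimal value of \x3F2D.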
If you needed to store the number \x42 in a little endian WORD data structure, which of the following would be correct (stored this way on disk, as if viewing in a hex editor)?
3234585C
Decimal (Base 10)
84 • 2018 • Any of the other numbers we're used to on a normal basis
What is the qword (little endian) at offset 0x58?
Answer: 0x000080945453494c
What is the word (little endian) at offset 0xb4?
Answer: 0x01e0
What is the dword (little endian) at offset 0x6c?
Answer: 0x73646976
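Reading these values can be sketched with Python's `struct` module. The 16-byte buffer below is made up (it is NOT the exercise's hex dump); it is arranged only so the three reads reproduce the answer values.

```python
import struct

# '<' = little endian with standard sizes; uppercase codes are unsigned.
FORMATS = {"word": ("<H", 2), "dword": ("<I", 4), "qword": ("<Q", 8)}


def read_le(buf: bytes, offset: int, size_name: str) -> str:
    """Read an unsigned little-endian value at offset and format it as zero-padded hex."""
    fmt, size = FORMATS[size_name]
    (value,) = struct.unpack_from(fmt, buf, offset)
    return f"0x{value:0{size * 2}x}"


# Made-up buffer arranged to reproduce the answer values above.
demo = bytes.fromhex("4c 49 53 54 94 80 00 00 e0 01 76 69 64 73 00 00")
print(read_le(demo, 0x0, "qword"))  # 0x000080945453494c
print(read_le(demo, 0x8, "word"))   # 0x01e0
print(read_le(demo, 0xA, "dword"))  # 0x73646976
```

Note how the bytes appear in reverse order on disk relative to the value written out with "0x".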
BIOS
BIOS = basic input/output system • Facilitates system bootstrap ("boot," start-up) • Firmware (software embedded in hardware device) ◦ CMOS (Complementary Metal Oxide Semiconductor) • Read-only memory (ROM) ◦ Can "flash" the hardware to change s/w on it ◦ "Flash" refers to electrically erasing and rewriting flash memory • Very low power draw from on-board battery • Provides an interface between operating system and system hardware (via library of interrupts)
Numbering Formats/Systems
Binary, hexadecimal, and decimal ◦ Base 2, base 16, and base 10 respectively ◦ "Base" refers to number of different "digits" (alphanumeric) in system ◦ Each "place" corresponds to a power of the base ◦ Examples: 100s, 10s, 1s, etc. ... 10^2, 10^1, 10^0, etc. • Bits and bytes ◦ Each binary digit = "bit" ◦ Eight bits = "byte" ◦ One byte = two "nibbles" = one hex pair
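The "one byte = two nibbles = one hex pair" relationship can be shown with two bit operations. This is a minimal sketch, and the helper name is illustrative.

```python
def split_nibbles(byte: int) -> tuple[int, int]:
    """Split one byte (0-255) into its high and low 4-bit nibbles."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("a byte holds values 0-255")
    high = byte >> 4    # upper 4 bits -> first hex digit
    low = byte & 0x0F   # lower 4 bits -> second hex digit
    return high, low


value = 0b00101011
high, low = split_nibbles(value)
print(f"{value:08b} -> {high:04b} {low:04b} -> 0x{high:x}{low:x}")  # 00101011 -> 0010 1011 -> 0x2b
```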
Hexadecimal (Base 16)
Convert the following Binary Nibble to Hex: 1. 1010 2. 1110 • Convert the following Binary Byte to Hex: 1. 11110111 2. 10101010
Hexadecimal (Base 16)
Convert the following Binary Nibble to Hex: 1. 1010 = 0xa 2. 1110 = 0xe • Convert the following Binary Byte to Hex: 1. 11110111 = 0xf7 2. 10101010 = 0xaa
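These answers come from translating each 4-bit group on its own using the 8-4-2-1 place values. A small sketch that does exactly that:

```python
HEX_DIGITS = "0123456789abcdef"


def binary_to_hex(bits: str) -> str:
    """Convert a binary string to hex one 4-bit group (nibble) at a time."""
    width = (len(bits) + 3) // 4 * 4
    bits = bits.zfill(width)  # left-pad so the length is a multiple of 4
    out = []
    for i in range(0, len(bits), 4):
        b3, b2, b1, b0 = (int(b) for b in bits[i:i + 4])
        out.append(HEX_DIGITS[b3 * 8 + b2 * 4 + b1 * 2 + b0])  # 8-4-2-1 place values
    return "0x" + "".join(out)


for bits in ("1010", "1110", "11110111", "10101010"):
    print(bits, "=", binary_to_hex(bits))
```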
Binary (Base 2)
Convert the following Binary to Decimal: 1. 0101 2. 1111 3. 00110011 4. 11111111
Binary (Base 2)
Convert the following Binary to Decimal: 1. 0101 = 5 2. 1111 = 15 3. 00110011 = 51 4. 11111111 = 255
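The same answers can be reached by adding up each bit times its power of 2, as in this short sketch:

```python
def binary_to_decimal(bits: str) -> int:
    """Add up each bit times its place value (2^0 for the rightmost bit)."""
    total = 0
    for position, bit in enumerate(reversed(bits)):
        if bit not in ("0", "1"):
            raise ValueError(f"not a binary digit: {bit!r}")
        total += int(bit) * 2 ** position
    return total


for bits in ("0101", "1111", "00110011", "11111111"):
    print(bits, "=", binary_to_decimal(bits))
```

For example, 00110011 = 32 + 16 + 2 + 1 = 51.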
Hexadecimal (Base 16)
Convert the following Hexadecimal to Decimal: 1. 0x0c 2. 0xd5 3. 0xff
Hexadecimal (Base 16)
Convert the following Hexadecimal to Decimal: 1. 0x0c = 12 2. 0xd5 = 213 3. 0xff = 255
Endianness (Endian Ordering)
Definition ◦ Byte order of multi-byte data structures • Big endian ◦ Read bytes left to right ◦ Most significant byte stored in low address (comes first) • Little endian ◦ Read bytes right to left ◦ Least significant byte stored in low address (comes first) ◦ So - why use little endian? ◦ Same value can be read at same address at different data structure lengths - low level programming optimization
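A quick sketch of both points: the same bytes decode to different numbers depending on byte order, and with little endian a word and a dword read from the same starting address yield the same value.

```python
# Four bytes exactly as a hex editor shows them, lowest address first.
raw = bytes([0x01, 0x04, 0x00, 0x00])

# The same two bytes give different numbers depending on byte order.
word_big = int.from_bytes(raw[:2], "big")        # 0x0104 -> 260
word_little = int.from_bytes(raw[:2], "little")  # 0x0401 -> 1025

# Little endian: reading a word or a dword at the same address yields the same value.
dword_little = int.from_bytes(raw[:4], "little")  # 0x00000401 -> 1025
# Big endian has no such property: the dword reads as 0x01040000.
dword_big = int.from_bytes(raw[:4], "big")        # -> 17039360

print(word_big, word_little, dword_little, dword_big)
```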
Data Sizes
Digital data are stored in allocated "space" • Bytes usually smallest space allocated • Bytes can be grouped (usually 2, 4, or 8 bytes) • Terminology for multi-byte data structures • 1 byte = byte • 2 bytes = word • 4 bytes = dword (double word) • 8 bytes = qword (quadruple word)
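The sizes map directly onto Python's `struct` codes. A minimal sketch that also shows the largest unsigned value each size can hold:

```python
import struct

# struct codes for unsigned integers; '<' forces standard sizes with no padding.
CODES = {"byte": "B", "word": "H", "dword": "I", "qword": "Q"}


def size_table() -> dict[str, tuple[int, int]]:
    """Map each data-structure name to (size in bytes, largest unsigned value)."""
    table = {}
    for name, code in CODES.items():
        size = struct.calcsize("<" + code)
        table[name] = (size, 2 ** (8 * size) - 1)
    return table


for name, (size, largest) in size_table().items():
    print(f"{name:5s} {size} byte(s), 0 to {largest:,}")
```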
We do not use unicode in US based computers because our alphabet is short and fits into single byte data structures via ASCII.
False
Which of the following could be used to hide data?
Host protected area (HPA) • LBA1-LBA62 on a DOS-based partitioned system • Device configuration overlay (DCO) space
Big vs. Little Endian: An Example
How can we represent the decimal 1025 as a hex number? ◦ 16^3=4096 ◦ 16^2=256 ◦ 16^1=16 ◦ 16^0=1 ◦ 1025/4096 = 0 (so the 16^3 digit is 0) ◦ 1025/256 = 4 with remainder 1 ◦ 1/16 = 0 with remainder 1 ◦ 1/1 = 1 ◦ So, 1025 = 0x0401
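The steps above can be sketched as a loop over descending powers of 16. The fixed width of four hex digits is an assumption chosen to match this example.

```python
def decimal_to_hex(n: int, width: int = 4) -> str:
    """Convert n to hex by dividing by descending powers of 16 (16^3, 16^2, 16^1, 16^0)."""
    if not 0 <= n < 16 ** width:
        raise ValueError(f"{n} does not fit in {width} hex digits")
    digits = "0123456789ABCDEF"
    out = []
    for power in range(width - 1, -1, -1):
        quotient, n = divmod(n, 16 ** power)  # how many of this place fit; keep the remainder
        out.append(digits[quotient])
    return "0x" + "".join(out)


print(decimal_to_hex(1025))  # 0x0401
```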
Binary (Base 2)
Nibble • 0010 • 0111 • Byte • 00101011 • Notice that a byte has twice as many digits (bits) as a nibble
ASCII
Notice that we did not say an ASCII printable character. • When we say ASCII, we are usually referring to the subset of the ASCII table that contains printable characters. • However, if you look at the media at the byte-by-byte level (i.e., in a hex editor), the "text" printed in a hex dump still shows ASCII symbols... but the underlying bytes do not necessarily represent ASCII text.
Importance of Endianness
Particularly important when a data structure stores a number (e.g., base 10: 789 ≠ 987) • Example: date/time stamps • Stored as number of time increments since epoch • Must know endian to read/convert number correctly • Epochs vary between OS/FS • UNIX: # of seconds since midnight, 1 Jan 1970 (UTC) • FAT: date and time relative to 1 Jan 1980, packed into two 16-bit fields (date, time) (local) • NTFS: # of 100 nanosec units from 1 Jan 1601 (UTC) • Mac (≤ v9): # seconds since midnight, 1 Jan 1904 (local) • Mac OS X: # of seconds since midnight, 1 Jan 1970 (UTC) • Don't forget to convert to local time zone! • Must know/remember time zone rules for area and year!
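A sketch of decoding several of these formats with Python's `datetime`. The FAT decoder assumes the common packing (date word: 7-bit year offset from 1980, 4-bit month, 5-bit day; time word: 5-bit hour, 6-bit minute, 5-bit seconds/2). Local-time formats are left timezone-naive on purpose.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
NTFS_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)
MAC_CLASSIC_EPOCH = datetime(1904, 1, 1)  # local time, so left timezone-naive


def from_unix(seconds: int) -> datetime:
    """UNIX: seconds since 1 Jan 1970 UTC."""
    return UNIX_EPOCH + timedelta(seconds=seconds)


def from_ntfs(filetime: int) -> datetime:
    """NTFS: 100-nanosecond units since 1 Jan 1601 UTC (truncated to microseconds)."""
    return NTFS_EPOCH + timedelta(microseconds=filetime // 10)


def from_mac_classic(seconds: int) -> datetime:
    """Mac OS 9 and earlier: seconds since 1 Jan 1904, local time."""
    return MAC_CLASSIC_EPOCH + timedelta(seconds=seconds)


def from_fat(date_word: int, time_word: int) -> datetime:
    """FAT: packed 16-bit date and time fields, local time, 2-second resolution."""
    year = ((date_word >> 9) & 0x7F) + 1980
    month = (date_word >> 5) & 0x0F
    day = date_word & 0x1F
    hour = (time_word >> 11) & 0x1F
    minute = (time_word >> 5) & 0x3F
    second = (time_word & 0x1F) * 2
    return datetime(year, month, day, hour, minute, second)


# An 8-byte NTFS timestamp as it would appear on disk (little endian).
raw = bytes.fromhex("00 80 3e d5 de b1 9d 01")
print(from_ntfs(int.from_bytes(raw, "little")))  # 1970-01-01 00:00:00+00:00
```

Reading those same 8 bytes as big endian would produce a wildly wrong date, which is why the endianness matters here.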
Note
See Carrier's comment midway down on pg 25: ◦ "A '.' exists where there is no printable ASCII char for the value." • This is misleading. • Hex editors may display some "non-printable" values as symbols rather than '.', and a '.' may be an actual period (0x2E) rather than a placeholder for a value such as 0x00 (as shown before with the My Documents example). Always check the hex value behind it.
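The ambiguity is easy to reproduce. The sketch below mimics a typical text pane (printable ASCII 0x20-0x7E shown as-is, everything else as '.'); real hex editors vary in exactly which values they substitute.

```python
def text_pane(data: bytes) -> str:
    """Render bytes like a typical hex editor text column: printable ASCII as-is, others as '.'."""
    return "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in data)


real_periods = bytes([0x4D, 0x2E, 0x79, 0x2E])  # "M.y." with actual periods (0x2E)
null_bytes = bytes([0x4D, 0x00, 0x79, 0x00])    # "My" in UTF-16LE, nulls (0x00)

print(text_pane(real_periods), real_periods.hex(" "))  # M.y. 4d 2e 79 2e
print(text_pane(null_bytes), null_bytes.hex(" "))      # M.y. 4d 00 79 00
```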
A Look at Big and Little Endian
This video provides an overview of big and little endian (Cote, 2011).
Unicode - Why Do We Care?
To search for non-English text strings • To search for filenames and directory names • Look at the hex table on the next slide and answer the following questions. 1. What byte size is used? 2. What is the extra byte filled with? 3. What is the endian ordering? 4. What would the literal string be that is searched for? 5. What does the null character (0x00) show up as in the hex editor?
Big and Little Endian: Another Look
Watch this video for some more examples of big and little endian (Delgado, 2016).
Back to ASCII
What does this say?? • 0x57 0x68 0x79 0x20 0x4b 0x61 0x77 0x68 0x69 0x20 0x57 0x68 0x79 0x79 0x79 0x3f
The Importance of Endianness (cont'd.)
Why 1601? ◦ 1601 is the first year of the 400-year Gregorian leap-year cycle that ran through 2000; a new 400-year cycle began in 2001. • Converting to local time zone ◦ (Note: Windows 98 and earlier stored dates/times as local time; later versions store them as Zulu.) ◦ The Windows registry (HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\TimeZoneInformation) holds the local time zone setting. ◦ Raw data is stored as Zulu (GMT, UTC), but the Windows interface interprets the times based on the registry setting.
The Importance of Endianness (cont'd.)
• BUT... this is only set/saved for the current year. We don't really know the setting in previous years. • Remember, some states don't participate in DST, and rules change now and then (e.g., 2006). • And, ultimately, Windows gets its time from the BIOS clock (unless you change it while running, of course), so getting the local time from the BIOS upon seizure and noting whether it's accurate or not is very important.
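A small sketch of why the offset matters. It uses hypothetical fixed offsets for US Central time; fixed offsets do not know which dates DST applied to. Python's `zoneinfo` module (3.9+) can apply historical rules from the tz database, but the examiner still has to pick the right zone.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets for US Central time, for illustration only: they do not encode
# which dates DST applied to, which varies by location and year.
CST = timezone(timedelta(hours=-6), "CST")
CDT = timezone(timedelta(hours=-5), "CDT")

# A hypothetical timestamp already decoded from disk as UTC (Zulu).
utc_stamp = datetime(2018, 7, 4, 17, 30, tzinfo=timezone.utc)

for zone in (CST, CDT):
    print(utc_stamp.astimezone(zone).strftime("%Y-%m-%d %H:%M %Z"))
# 2018-07-04 11:30 CST
# 2018-07-04 12:30 CDT
```

Picking the wrong offset shifts every reported time by an hour.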
Hard Disks
• Bit value is 1 or 0 • Rigid platter disk technology ◦ Magnetic polarity (+ or -) of tiny regions on the platter surface ◦ Platters are aluminum or glass, coated with a thin magnetic layer and a protective carbon overcoat • Solid state technology ◦ Semiconductor chips, not magnetic media ◦ Like memory, but non-volatile (data persists w/o power) ◦ Binary value determined by electrical charge trapped in transistor cells (e.g., floating gates) • Common types ◦ IDE (Integrated Drive Electronics) ◦ SCSI (Small Computer System Interface) ◦ SATA (Serial ATA; ATA = Advanced Technology Attachment) • Controller & cable needed for I/O
Physical Geometry
• CHS = cylinder, head, sector ◦ Head: r/w device; head #s refer to platter surfaces ◦ "Platter" here = one side of a physical platter ◦ Track: concentric circles on platters ◦ "Cylinder" = "column" of tracks (same track on multiple platters) ◦ Sector: smallest block of data on a track ◦ Typically 512 bytes (low-level format); newer Advanced Format drives use 4,096-byte physical sectors
Hexadecimal (Base 16)
• Characters are: 0-9, a-f • A = 10 • B = 11 • C = 12 • D = 13 • E = 14 • F = 15 • Examples: • 0x10 • 0x23 • 0x3a
Master Boot Record (MBR)
• Executable • Stored at beginning of 1st sector of device • Reads partition table to determine which partition is the boot partition • Transfers "control" to boot sector of boot partition
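To see how the partition table inside the MBR is laid out, here is a minimal parsing sketch. It assumes the classic layout (four 16-byte entries starting at offset 446, 0x55AA signature at offset 510) and builds a synthetic sector rather than reading a real disk.

```python
import struct


def parse_mbr(sector0: bytes) -> list[dict]:
    """Parse the four 16-byte partition table entries of a classic MBR sector."""
    if len(sector0) < 512 or sector0[510:512] != b"\x55\xaa":
        raise ValueError("not a 512-byte sector ending in the 0x55AA boot signature")
    partitions = []
    for i in range(4):
        start = 446 + 16 * i  # partition table begins at byte offset 446
        # Little endian: boot flag (1 byte), start CHS (3, skipped), type (1),
        # end CHS (3, skipped), starting LBA (4), number of sectors (4)
        flag, ptype, first_lba, count = struct.unpack_from("<B3xB3xII", sector0, start)
        if ptype != 0:  # type 0x00 marks an unused entry
            partitions.append({"bootable": flag == 0x80, "type": ptype,
                               "start_lba": first_lba, "sectors": count})
    return partitions


# Synthetic sector with one bootable partition (type 0x07) starting at LBA 2048.
sector = bytearray(512)
sector[446] = 0x80
sector[450] = 0x07
sector[454:462] = struct.pack("<II", 2048, 204800)
sector[510:512] = b"\x55\xaa"
print(parse_mbr(bytes(sector)))
```

The 0x80 flag is how the boot code tells which partition is the boot partition.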
Physical Geometry (cont'd.)
• Given the counting scheme here, what would be the address of the MBR? (000 or 001, depending...) ◦ CHS usually starts counting sectors with 1 ◦ LBA usually starts counting with 0 • Track & cylinder are closely related: a cylinder is the set of same-numbered tracks across all platter surfaces. • CHS data is like GPS or latitude/longitude coordinates.
Big vs. Little Endian: An Example (cont'd.)
• Here's another issue: ◦ What if the data structure is multi-byte in length, and longer than needed? Where do you pad the number with null (zeroes)? ◦ What if we stored it as a four-byte data structure (dword)? ◦ If it's big endian, then we add leading zeroes: 0x00 00 04 01 ◦ But if it's little endian, then we add trailing zeroes: 0x01 04 00 00
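Python's `int.to_bytes` shows where the padding lands for each byte order:

```python
value = 1025  # 0x0401

for size, name in ((2, "word"), (4, "dword")):
    big = value.to_bytes(size, "big")        # zero padding goes in front
    little = value.to_bytes(size, "little")  # zero padding goes at the end
    print(f"{name:5s} big: {big.hex(' ')}  little: {little.hex(' ')}")
# word  big: 04 01  little: 01 04
# dword big: 00 00 04 01  little: 01 04 00 00
```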
For ASCII Wars...
• I'll be providing screenshots of hex dumps with a theme • Try to work on translating the hex to ASCII and figure out the answer before going to the next slide, which will have the answer • Remember, google is your friend to find a Hex to ASCII chart! • Note: I'll mention the offset and how long it is... use this to your benefit.
Interfaces
• IDE ◦ Used in older systems (but not too old) • SATA ◦ Serial ◦ Most common interface today • SCSI ◦ Servers, non-PC systems ◦ Older systems
UEFI ... The Latest BIOS
• Modularity - platform with hardware independence ◦ Processor/OS independent device driver environment • No need for dedicated boot loader ◦ OS loader is an EFI application • EFI shell environment ◦ Can do a lot more without booting an OS • Flexibility ◦ Can accommodate technological advancements • Supports GPT (GUID Partition Table) partitioning, not just DOS-based partitioning ◦ See Volume & Partition Analysis Slides
Big vs. Little Endian: An Example (cont'd.)
• Now, how is 0x04 01 represented ON DISK? ◦ This is a two-byte hex number. ◦ So, it would be stored as 04 01, right? ◦ Computers SURELY must read left-to-right, like English, right?? ◦ NOT NECESSARILY!!! • A little-endian machine stores the least significant byte first, so the disk shows 01 04. If you read those bytes left to right (as if big endian), you get 0x0104 instead of 0x0401. ◦ \x104 = 260 decimal ◦ \x401 = 1025 decimal
Data Addressing
• Offset ◦ Absolute offset (physical; relative to start of disk) ◦ File offset (logical; relative to start of file) ◦ Offset (physical or logical; always ask what the reference point is) • Physical sector (PS, aka sector offset) ◦ Uses logical block addressing (LBA) ◦ "Streams" sectors in order, across the entire physical CHS geometry ◦ CHS 0,0,1 = LBA 0; CHS 0,0,2 = LBA 1; CHS 0,0,3 = LBA 2; and so on...
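The usual CHS-to-LBA conversion formula, sketched under the convention that CHS sectors count from 1 and LBA counts from 0 (see the Physical Geometry slides). The geometry in the example (16 heads, 63 sectors per track) is hypothetical.

```python
def chs_to_lba(c: int, h: int, s: int, heads_per_cylinder: int, sectors_per_track: int) -> int:
    """LBA = (C x heads + H) x sectors_per_track + (S - 1); CHS sectors start at 1."""
    if s < 1 or s > sectors_per_track:
        raise ValueError("CHS sector numbers run from 1 to sectors_per_track")
    return (c * heads_per_cylinder + h) * sectors_per_track + (s - 1)


# Hypothetical geometry: 16 heads, 63 sectors per track.
for chs in ((0, 0, 1), (0, 0, 63), (0, 1, 1), (1, 0, 1)):
    print(chs, "-> LBA", chs_to_lba(*chs, heads_per_cylinder=16, sectors_per_track=63))
```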
UEFI
• Puts a protective ("dummy") MBR at sector 0 for backwards compatibility ◦ Because of boot sector viruses and the continued use of MBR technology, some virus detection tools still check sector 0 for compliance with MBR data structure rules; when those rules aren't followed, the tool may overwrite the sector with a backup copy of the MBR or take other actions. ◦ The format of the first sector of a UEFI disk differs from the MBR format, so there is a risk of a virus detection system seeing a UEFI sector at sector 0 and overwriting it! • Sector 1 then holds the GPT header (functionally taking over the MBR's role), with sectors 2-? holding partition table info. • More to come on UEFI in the volume/partitions block....
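A heuristic sketch for spotting this layout. It assumes the GPT conventions: the protective MBR's first partition entry has type 0xEE, and the GPT header at sector 1 begins with the 8-byte signature "EFI PART". The 512-byte default sector size is an assumption (4Kn drives use 4,096).

```python
def looks_like_gpt(disk: bytes, sector_size: int = 512) -> bool:
    """Heuristic: protective MBR (partition type 0xEE) in sector 0 and a GPT header in sector 1."""
    if len(disk) < 2 * sector_size:
        return False
    sector0 = disk[:sector_size]
    sector1 = disk[sector_size:2 * sector_size]
    has_boot_signature = sector0[510:512] == b"\x55\xaa"
    protective_entry = sector0[446 + 4] == 0xEE  # type byte of the first partition entry
    gpt_signature = sector1[:8] == b"EFI PART"
    return has_boot_signature and protective_entry and gpt_signature


# Synthetic two-sector image with a protective MBR and a GPT header signature.
image = bytearray(1024)
image[450] = 0xEE
image[510:512] = b"\x55\xaa"
image[512:520] = b"EFI PART"
print(looks_like_gpt(bytes(image)))  # True
```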
Booting Up
• This video explains the BIOS and the boot process and discusses security issues as well (Dion, 2016).
String and Character Encoding
• Unicode ◦ Purpose is to support extended character sets ◦ e.g., languages other than American English • Unicode encoding schemes ◦ UTF-32: dword per character ◦ UTF-16: most characters stored in one word; less common characters are stored in a dword (two words, a "surrogate pair") ◦ UTF-8: uses 1, 2, 3, or 4 bytes per character as needed ◦ Variable-length schemes (UTF-8 & 16) save space but are more difficult to process ◦ ... and others
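Comparing encoded lengths makes the trade-offs visible. This sketch uses the little-endian codec names so no byte-order mark is added.

```python
SAMPLES = ["A", "é", "€", "😀"]


def encoded_lengths(ch: str) -> dict[str, int]:
    """Bytes needed to store one character in each Unicode encoding (no BOM)."""
    return {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}


for ch in SAMPLES:
    print(f"U+{ord(ch):04X}", encoded_lengths(ch))
```

ASCII-range text stays 1 byte per character in UTF-8, which is one reason UTF-8 is so common.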
How Does a Hard Drive Work?
• Watch Nick Parlante's video of how a hard drive stores binary. (Parlante, Stanford, 2012) • And below is how a typical laptop hard drive looks (Amos, 2013)
Numbering Formats/Systems
• Why do digital forensics investigators care? ◦ Computers use binary ... voltage level is high or low ◦ Hex is how you "see" & read physical level media