FIT1047 - Week 4, 6
Issues with ACLs
- Access control matrices / ACLs don't scale well - 1000 staff and 200 applications means 200,000 entries to be managed - Use groups or roles to manage privileges of large sets of users - Role-based access control (RBAC) - Example Healthcare: Doctor, Nurse, Administration etc - ACLs are easy if users own their files and can manage access rights for these files - Difficult in distributed systems. Large and dynamic sets of users are difficult to manage - Central authorities need to keep track of all resources that users have access to - Another option: use capabilities (rows in the matrix)
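A minimal sketch of the role-based idea, assuming made-up role, user and permission names (none of them come from the lecture). Permissions are attached to roles, and users only get roles, so adding a staff member means one role assignment rather than one entry per application.

```python
# Hypothetical roles and permissions, for illustration only.
ROLE_PERMISSIONS = {
    "doctor": {"read_record", "write_record", "prescribe"},
    "nurse": {"read_record", "update_vitals"},
    "administration": {"schedule_appointment"},
}

# Hypothetical users; each user holds one or more roles.
USER_ROLES = {
    "alice": {"doctor"},
    "bob": {"nurse"},
    "carol": {"administration", "nurse"},
}


def is_allowed(user, permission):
    """Return True if any of the user's roles grants the permission."""
    return any(
        permission in ROLE_PERMISSIONS.get(role, set())
        for role in USER_ROLES.get(user, set())
    )


print(is_allowed("alice", "prescribe"))
print(is_allowed("bob", "prescribe"))
```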
what are access controls
- Access control is basically about logging into a system and then having access to certain things. For example, Monash staff have access to different things compared to students. One of the main questions in cyber security is who (persons, processes, devices, etc.) has access to which resources in the system. Resources: read files, execute programs, change database content, share data with others, etc. This is all about ACCESS CONTROL
ACCESS CONTROL IN ENTERPRISE APPLICATIONS
- Can enforce protection properties - Controls access to resources, data-bases, transactions, etc. - Can be role-based (not just user-based)
ACCESS CONTROL MATRIX
- Columns are access control lists ACLs - Rows are capabilities
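To make the row/column distinction concrete, here is a small sketch with hypothetical subjects, objects and rights. Reading the matrix by subject gives a capability list; reading it by object gives an ACL.

```python
# Hypothetical access control matrix: subject -> object -> set of rights.
matrix = {
    "alice": {"payroll.db": {"read", "write"}, "report.txt": {"read"}},
    "bob": {"report.txt": {"read", "write"}},
}


def capabilities(subject):
    """One row of the matrix: everything a single subject may do."""
    return matrix.get(subject, {})


def acl(obj):
    """One column of the matrix: who may do what to a single object."""
    return {
        subject: rights[obj]
        for subject, rights in matrix.items()
        if obj in rights
    }


print(capabilities("bob"))
print(acl("report.txt"))
```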
ACCESS CONTROL ON OPERATING SYSTEM LEVEL
- Distinguish the users, groups of users - Controls access to files, ports, devices and other resources. - User authentication (e.g. password, fingerprint) - Allocate processes to users and enforce separation - OSs can support complex policies for individual programs (e.g. SELinux)
authentication of transactions
- E.g. for money transfer in banking - Plain transaction numbers (TANs) are not tied to a specific transaction - An SMS TAN can show info on the transaction. Two devices need to be manipulated by an attacker - A TAN generator reads a barcode from the screen and generates a TAN linked to the transaction
computers - what is a multi-user system
- A multi-user system lets several users share one computer, each with their own account. Typically there is an administrator account and one or more ordinary user accounts. Logging in as the admin gives you the power to install software and all those kinds of things, and logging in as a user gives you access to your own things such as your photos, email etc. The admin has access to these things too. So in a multi-user system the users are kept separate from each other, while the admin can see the data of all users.
additional security measures with access control
- Hard disk encryption - Virus protection - Backups - Security updates - Trusted computing (special security hardware)
How to authenticate a person
- Identify them at login - Authenticate particular transactions - The most common form of authenticating someone's identity is using passwords
single sign on
- Just log in once and access many services (e.g. Monash University authcate) - Very convenient. High usability - Single point of failure. Needs secure implementation and high level of control - Main goal of access control: limit the damage that can be done by users, groups of users - Privilege escalation is a goal for attacks - Many ways how access control can go wrong
BASIC FILE PERMISSIONS (Linux)
- Main actions are read, write, execute - Can be defined for owner, group, all users
what is multi-factor authentication
- Multi-factor authentication combines different ways of authentication. For example, you type your password and then also have to type in a code that is sent to you by text message.
what are the problems with passwords
- Password re-use - Weak passwords - Can be stolen through phishing/malware - Stored passwords - Difficult to remember / reset processes - Pre-configured passwords: devices such as routers sometimes ship with a password that is already set
ACCESS CONTROL ON APPLICATION LEVEL
- This is what the user can usually see (and also configure) - Often complex security policies - Enterprise applications: Staff with various roles, and fine-grained access to transactions - Social networks: Rather complex rules on who can see, copy, forward, search what data.
digital signatures/authenticity
- This time the secret (private) key is used to create the signature, and the public key is used to check it. The public key is distributed, the sender signs the message using the private key, and then sends the message together with the digital signature. The recipient can then use the public key to verify that the sender has signed it. As long as the sender keeps the private key private, this method is safe - However, to know that a public key really belongs to the right person, there are certificates
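A toy sketch of signing and verifying, assuming a tiny RSA key (n = 3233, e = 17, d = 2753) and a SHA-256 digest reduced modulo n. The key size, the function names and the digest reduction are all illustrative assumptions; real signature schemes use much larger keys and proper padding.

```python
import hashlib

# Toy RSA key, far too small to be secure (p = 61, q = 53).
N = 3233   # public modulus
E = 17     # public exponent
D = 2753   # private exponent


def digest(message: bytes) -> int:
    """Hash the message and squeeze the result into the range 0..N-1."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Only the holder of the private exponent D can compute this."""
    return pow(digest(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Anyone who knows the public key (E, N) can run this check."""
    return pow(signature, E, N) == digest(message)


message = b"transfer 10 dollars to Bob"
signature = sign(message)
print(verify(message, signature))
```

The message and the signature travel together; the recipient recomputes the digest and compares it with the value recovered from the signature using the public key.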
what can go wrong with access control
- Weakness in software, interfaces, protocols - Physical attacks - Race conditions, feature interaction problems - Connect devices (USB) - Social engineering
access rights
- r: permission to read
- w: permission to write
- x: permission to execute
- -: no permission at all
Example: rwxr-x--x someuser somegroup some-file
- Owner (someuser): rwx
- Group (somegroup): r-x
- All users: --x
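A small sketch of how a nine-character permission string can be read: three characters each for owner, group and all other users, with `-` marking a missing permission. The example string `rwxr-x--x` and the function names are illustrative assumptions.

```python
def parse_permissions(mode):
    """Split a string such as 'rwxr-x--x' into owner, group and others."""
    if len(mode) != 9:
        raise ValueError("expected exactly 9 permission characters")
    result = {}
    for index, who in enumerate(("owner", "group", "others")):
        triple = mode[3 * index: 3 * index + 3]
        result[who] = {
            "read": triple[0] == "r",
            "write": triple[1] == "w",
            "execute": triple[2] == "x",
        }
    return result


def to_octal(mode):
    """Convert the same string to the three-digit octal form used by chmod."""
    digits = []
    for index in range(3):
        triple = mode[3 * index: 3 * index + 3]
        value = 0
        if triple[0] == "r":
            value += 4
        if triple[1] == "w":
            value += 2
        if triple[2] == "x":
            value += 1
        digits.append(str(value))
    return "".join(digits)


print(parse_permissions("rwxr-x--x")["group"])
print(to_octal("rwxr-x--x"))
```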
RSA key generation
1. Generate two large prime numbers p and q.
2. Let n = p∗q and m = Φ(n) = (p-1)(q-1).
3. Choose a small number e co-prime to m, i.e. GCD(e,m)=1 and 1<e<m.
4. Find d such that e∗d mod m = 1.
5. Publish e and n as the public key and keep d and m secret (d is the private key; p and q must stay secret too).
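The steps can be followed with toy numbers. This is a sketch under the assumption that p = 61, q = 53 and e = 17 are acceptable for illustration; real keys use primes hundreds of digits long, and the function names are made up. The modular inverse uses Python's built-in `pow(e, -1, m)` (Python 3.8 or later).

```python
from math import gcd


def generate_keypair(p, q, e):
    """Follow the key generation steps for given primes p, q and exponent e."""
    n = p * q
    m = (p - 1) * (q - 1)
    if not (1 < e < m) or gcd(e, m) != 1:
        raise ValueError("e must satisfy 1 < e < m and GCD(e, m) = 1")
    d = pow(e, -1, m)  # d such that e * d mod m == 1
    return (e, n), (d, n)


def encrypt(message, public_key):
    e, n = public_key
    return pow(message, e, n)


def decrypt(ciphertext, private_key):
    d, n = private_key
    return pow(ciphertext, d, n)


public_key, private_key = generate_keypair(61, 53, 17)
ciphertext = encrypt(65, public_key)
print(public_key, private_key, decrypt(ciphertext, private_key))
```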
what is a good way to store a password
A first idea for storing a password is to use a hash function: take the password, hash it, and store only the hashed version on the computer. However, a plain hash is actually not a good way of storing passwords. This is because common passwords are easy to guess, and their hash values can be looked up: searching for a hash value on Google will often show you the password it belongs to. A better way is to use a salted hash. Take the password, concatenate a random number (the salt) to it to make it longer, hash the result, and store the salt together with the hash. If someone now wants to work out the password, lookup tables no longer help; they need to run through a dictionary of likely passwords (not an English dictionary) for every salt, and that takes a really long time. - The password is the most commonly used way of authentication
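A minimal sketch of salted hashing, assuming SHA-256 as the hash function and a 16-byte random salt appended to the password before hashing. The function names are made up, and production systems generally prefer deliberately slow password-hashing functions over a single hash call.

```python
import hashlib
import hmac
import secrets


def hash_password(password, salt=None):
    """Return (salt, digest); a fresh random salt is drawn if none is given."""
    if salt is None:
        salt = secrets.token_bytes(16)
    digest = hashlib.sha256(password.encode("utf-8") + salt).digest()
    return salt, digest


def check_password(password, salt, stored_digest):
    """Re-hash the attempt with the stored salt and compare the digests."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, stored_digest)


salt, stored = hash_password("correct horse")
print(check_password("correct horse", salt, stored))
print(check_password("wrong guess", salt, stored))
```

Only the salt and the digest are stored. Because every account gets its own salt, two users with the same password end up with different stored values.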
modern I/O
A modern computer (and this includes things like smartphones and washing machines) features many different I/O devices. Things that come to mind as typical input devices include keyboard, mouse, touch pad, touch screen, voice control, gestures, cameras, fingerprint sensors, iris scanners, accelerometers, barometers, or GPS. Output devices are things like screens, printers, audio, and robotic actuators. But other components such as external storage (hard disks) and network devices (WiFi, 4G, Bluetooth) are also classified as I/O. Today, most I/O devices communicate with the CPU via standardised interfaces. The most common ones are USB (for which you probably know quite a few applications), SATA (for connecting hard disks), DisplayPort and HDMI (for displays), or PCI Express (for expansion cards). What characterises a standard interface is • a standardised set of connectors (i.e., the physical dimensions of plugs and sockets) • a standardised electrical behaviour (defining the "meaning" of the wires in the plugs) • standardised software protocols (so that you can use the same device with any computer) I/O devices can be connected to the CPU internally (i.e., in the same case and possibly on the same printed circuit board), or externally (e.g. using a plug and cable). In either case, the device would use a standard interface (for example, the popular Raspberry Pi "single-board computer" ships with an Ethernet network interface that is soldered onto the same circuit board and connected via an internal USB interface, without using any cables and plugs).
what is pseudocode and how is it useful
Pseudocode is a program written in a "programming language" that doesn't exist. It is useful when you plan a program, since you can just write down the general structure without having to use the exact syntax of a particular programming language. In this case, the program consists of four steps that are supposed to be executed in this order.
Mandatory access control MAC:
A system (OS, Database management system) enforces pre-defined access policies
AddI X meaning
Add value pointed to by X to current value in AC
Add X meaning
Add value stored at location X to current value in AC
addressing memory locations
An address is simply an unsigned integer that references one unique memory location. Addresses are usually consecutive numbers starting at 0 and ranging up to the number of distinct memory locations (minus one). In most architectures, one memory location stores one byte. Consequently, each location, i.e. each byte, needs its own address. This is called byte-addressable memory. In some architectures, such as MARIE, one memory location stores one word. Each address therefore references a whole word, and we call this word-addressable memory. Recall that words in MARIE are 16 bits (or 2 bytes) wide, but other systems may use word sizes such as 32 bits or 64 bits.
register transfer language
Assembly language (and machine code) already look quite low-level. You may have the impression that they are the "atomic" operations that CPUs work with, i.e., that each instruction is executed by the CPU in a single step. That, however, is not the case, and we already got a glimpse of this idea in the module on CPU basics and History: The fetch-decode-execute cycle defines a sequence of even lower-level steps that each instruction can be broken down into. We will now introduce register transfer language (RTL), which helps us define these lower-level steps. Check notes for exact info.
other uses of public key cryptography
Based on the basic mechanisms, many cryptographic protocols and security applications have been developed. Some examples: • electronic cash • non-repudiation protocols • fair exchange protocols • electronic voting • multi-party key agreement
control signals
Before we move on to implementing this architecture using actual circuits, we will add one extra level of information to the RTL, the control signals that the control unit needs to generate in order to implement each RTL step. Each control signal is a particular set of "wires" of the control bus that can switch a component on or off or select its mode of operation (for example addition vs. subtraction in the ALU).
disadvantages of biometrics
Biometrics have high usability, but they are not really secret information; technically anyone can get hold of them. They can't be revoked or replaced, and there is no pseudonymous or anonymous access.
INPUT/OUTPUT DEVICES
Computers are completely useless without some form of input and output. We need input devices to get data and programs into the machine, and output devices to communicate the results of the computation back to us.
Jump X meaning
Continue execution at location X
JumpI X meaning
Continue execution at location pointed to by X
Discretionary access control DAC:
Depending on their rights, users can change ACLs and revoke or give rights to other users
what is public key cryptography
General idea: Based on a "hard" mathematical problem and a large random number, a key-pair is generated, such that the private key cannot be derived from the public key without solving the underlying mathematical problem. Every principal owns a unique pair of keys.
early I/O
In the earliest computers, input often consisted of hard-wiring the programs and data, or of simple switches, and soon punched paper tape or cards. In fact, punched tape and cards predate digital computers by more than two centuries! They were used to control automatic looms (the first records seem to be from around 1725, and the first fully automatic machine was the Jacquard Loom from 1801), and later to tabulate data such as census records. Teleprinters were then adapted as output devices for early computers as well. Their typewriter keyboards could also be used to create punched paper tape for input.
the marie architecture
Let us now make things much more concrete, and introduce a particular machine architecture called MARIE. Compared to real architectures, it is very, very simple: • Words are 16 bits wide • There are only 16 different instructions • Each instruction is one word (16 bits) wide, composed of a 4-bit opcode and a 12-bit address • There is a single general-purpose register
E.G. FOR ACCESS CONTROL POLICIES
Let's look at a very simplified bookkeeping system. It consists of: - Operating system - Accounts program - Accounting data - Audit trail
LoadI X meaning
Load from address pointed to by X into AC
Load X meaning
Load value from location X into AC
assembly language in MARIE
Machine code is obviously hard to write and read. Instead of dealing directly with the 4-bit opcodes, we introduce a mnemonic for each opcode that can be easily remembered and recognised. We also call these mnemonic opcodes assembly code, and an assembler is a tool that translates an assembly code program into real machine code (it is basically a very simple compiler!). Here's an overview of part of the MARIE instruction set. The X in the instructions stands for the address part.
IDEAL CRYPTOGRAPHIC HASH FUNCTIONS
Need to have the following properties: • Computing a hash value for a message needs to be fast and use low resources. • Given just a hash, it is infeasible to find the original message (except by trying all possible messages) • Hashes for similar messages should not be correlated (small change in message -> large change in hash) • Infeasible to find collisions (i.e. two messages with the same hash).
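The "small change in message, large change in hash" property can be observed directly. This sketch assumes SHA-256 from Python's `hashlib` as the hash function; the helper name and the example messages are made up.

```python
import hashlib


def differing_bits(a: bytes, b: bytes) -> int:
    """Count how many of the 256 output bits differ between two SHA-256 hashes."""
    hash_a = int.from_bytes(hashlib.sha256(a).digest(), "big")
    hash_b = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(hash_a ^ hash_b).count("1")


# The two messages differ in a single character.
print(hashlib.sha256(b"hello").hexdigest())
print(hashlib.sha256(b"hellp").hexdigest())
print(differing_bits(b"hello", b"hellp"))
```

For a well-behaved hash function, roughly half of the output bits are expected to flip even though the inputs are almost identical.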
hardware tokens
Not popular, but secure. It's another piece of hardware that is used for authentication. The computer can still be attacked though.
Output meaning
Output current value of AC
memory organisation in RAM etc
RAM modules are made up of multiple chips. Each chip has a fixed size L×W, where L is the number of locations, and W is the number of bits per location. For example, 2K × 8 means 2 × 2^10 locations of 8 bits each. RAM chips are combined in rows and columns to construct larger modules. For example, we can use thirty-two 2K × 8 chips to build a 64K × 8 memory module made up of 16 rows and two chips in each row. Let's now assume that we want to use this RAM in a computer that uses byte-addressable memory. Since we have 64 × 2^10 = 2^16 locations, we will need addresses that are 16 bits long. So how does the hardware know which RAM chip it should use when a given address is requested? We split up the addresses into two parts: • Use the "highest" (leftmost) 4 bits to select the row • Use the "low" (remaining) 12 bits to select the byte in the row This is called memory interleaving (and in particular, high-order interleaving if the highest bits are used to select the row, and low-order interleaving if the lowest bits are used). In modern architectures, interleaving can significantly improve memory performance, because it can allow the CPU to address several different memory chips at the same time (e.g. one for reading and another one for writing). In a word-addressable architecture, we can of course use the same technique. Assuming that our example 64K × 8 RAM module is accessed by an architecture that uses 16 bit words per location, we only need 15 bits for the addresses (since each address now represents twice the amount of memory!), and the leftmost 4 bits of an address still indicate the row while the remaining 11 bits indicate the word inside the row.
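The address split for high-order interleaving is just a shift and a mask. A small sketch, with a made-up function name, using the 4-bit row and 12-bit offset from the byte-addressable example and the 4-bit row and 11-bit offset from the word-addressable one:

```python
def split_address(address, row_bits=4, offset_bits=12):
    """Split an address into (row, offset) using high-order interleaving."""
    total_bits = row_bits + offset_bits
    if not 0 <= address < (1 << total_bits):
        raise ValueError("address does not fit into the given number of bits")
    row = address >> offset_bits                  # leftmost bits select the row
    offset = address & ((1 << offset_bits) - 1)   # remaining bits select the location
    return row, offset


# Byte-addressable: 16-bit addresses, 4 row bits + 12 offset bits.
print(split_address(0x3ABC))
# Word-addressable: 15-bit addresses, 4 row bits + 11 offset bits.
print(split_address(0x7FFF, row_bits=4, offset_bits=11))
```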
RAM
RAM stands for Random Access Memory, which emphasises that the CPU can access any memory location in RAM (i.e., read from them or write into them) in basically the same amount of time. This is different from, e.g., a hard disk, where it may be very fast to read the data that is currently under the read/write head, in a sequential fashion, but it can be very slow to read arbitrary pieces of data that are stored in different physical locations on the disk.
Input meaning
Read user input into AC
Clear meaning
Set AC to 0
instructions in MARIE
Since each location in MARIE memory can hold a 16-bit value, one instruction fits exactly into a memory location. Now we could simply make a list of all the instructions we need, and assign a 16-bit pattern to each individual instruction. But Instruction Set Architectures are typically constructed in a much more structured way, to make it easy to implement the Control Unit hardware (which is responsible for actually decoding the instructions). In the case of MARIE, the leftmost 4 bits in each instruction represent the opcode, which tells us what kind of instruction it is. The remaining 12 bits contain an address of a memory location that the instruction should work with. For example, the opcode 0001 means "Load the value stored at the address mentioned in the remaining 12 bits into the AC register". With that information, we can now try to understand the first value in the memory dump: the 0001000000000100 begins with the opcode 0001, so it is an instruction to load data from memory, and the address to load from is 000000000100. This is of course the binary number for decimal 4. When the CPU executes this instruction, it will therefore load the value currently stored at memory address 4, and put it into the AC register inside the CPU, so that further instructions can use it. So what will be the value of AC after executing this instruction?
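Splitting a 16-bit MARIE instruction into its opcode and address can be sketched with a shift and a mask. The function name is an assumption; the bit layout (4-bit opcode, 12-bit address) is the one described above.

```python
def decode(word):
    """Split a 16-bit instruction word into (opcode, address)."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("MARIE words are 16 bits wide")
    opcode = (word >> 12) & 0xF   # leftmost 4 bits
    address = word & 0xFFF        # remaining 12 bits
    return opcode, address


opcode, address = decode(0b0001000000000100)
print(opcode, address)
```

For the word from the memory dump this yields opcode 1 (binary 0001, Load) and address 4.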
Skipcond X meaning
Skip next instruction under certain condition (depends on X)
Halt meaning
Stop execution
Store X meaning
Store value from AC into location X
StoreI X meaning
Store AC into address pointed to by X
Subt X meaning
Subtract value stored at location X from current value in AC
disadvantages of symmetric cryptography
Symmetric cryptography is very efficient, but has a number of disadvantages: • Key distribution: somehow, one needs to establish a shared secret. An alternative secure channel for key distribution is necessary. • Scalability: Each pair of sender and receiver needs a unique secret key. The number of keys grows quadratically with the number of participants: n participants need n(n-1)/2 keys (12 participants need 66 keys, 1000 need 499,500 keys and a million participants need an unrealistic 499,999,500,000 keys) • Non-repudiation is not possible: as each side owns the key, it's hard to say which side created the message
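The key counts quoted above can be checked by counting pairs: with one key per pair of participants, n participants need n(n-1)/2 keys. The function name is made up.

```python
def keys_needed(participants):
    """Number of pairwise secret keys: one for every pair of participants."""
    return participants * (participants - 1) // 2


for n in (12, 1000, 1_000_000):
    print(n, keys_needed(n))
```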
what registers does the MARIE architecture contain
The MARIE architecture contains the following registers: • AC (accumulator): This is the only general-purpose register. • MAR (Memory Address Register): Holds a memory address of a word that needs to be read from or written to memory. • MBR (Memory Buffer Register): Holds the data read from or written to memory. • IR (Instruction Register): Contains the instruction that is currently being executed. • PC (Program Counter): Contains the address of the next instruction.
PC and IR registers
The PC and IR registers are used by the CU to keep track of which part of our program is currently executing. The IR contains the currently executing instruction, while the PC contains the address of the next instruction to be executed after the current one has finished. The data paths are just the general architecture, i.e., they show us how the different components are connected. The next section defines, for each individual instruction, in what order data and addresses need to be transferred between registers and memory.
MBR
The data bus can transport individual words of data between the memory, the registers and the ALU. What is not shown in the picture is that the only way to get a data word from memory into a register, or from the registers into memory, is via the memory buffer register MBR. So even for an instruction like Load 005 the CPU can't directly transfer the value at address 005 into the AC register, it first has to load it from memory into MBR, and then from there into AC.
data paths and its relation to MARIE
The data paths in a CPU describe how the different functional units, in particular the registers and the ALU, are connected. The hardware implementation of the data paths is the system bus (or simply bus), the set of "wires" in the CPU that connects all components. We will use the MARIE architecture as an example here, but most modern architectures are quite similar. The module on MARIE mentioned that the architecture has five registers. We have seen how the AC register is used as a temporary storage location for almost all instructions, so let's now explain the function of the remaining registers. The following figure illustrates the data paths in the MARIE architecture.
MAR
The green address bus connects the memory with the MAR, the memory address register. It is responsible for selecting the memory address that the CPU reads from or writes to. Let's take the example of Load 005 again: the CPU has to put the value 005 into MAR, which tells the memory to "activate" address 005.
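The path a value takes during Load 005 (address into MAR, memory into MBR, MBR into AC) can be sketched as a tiny model. The class and attribute names are assumptions for illustration; this is not a full MARIE simulator.

```python
class MarieCpu:
    """Tiny model of the registers involved in a MARIE Load instruction."""

    def __init__(self):
        self.memory = [0] * 4096  # 12-bit addresses give 4096 word locations
        self.ac = 0
        self.mar = 0
        self.mbr = 0

    def load(self, x):
        self.mar = x                      # MAR <- X      (select the address)
        self.mbr = self.memory[self.mar]  # MBR <- M[MAR] (memory to buffer)
        self.ac = self.mbr                # AC  <- MBR    (buffer to accumulator)


cpu = MarieCpu()
cpu.memory[0x005] = 42   # hypothetical value stored at address 005
cpu.load(0x005)
print(cpu.mar, cpu.mbr, cpu.ac)
```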
what is memory
The module on CPU basics and History already briefly explained the concept of memory: it's a sequence of locations, each of which has an address (consecutive integers, usually starting from 0), and each location can store one data value of a fixed width (i.e., a fixed number of bits). The CPU can read the value currently stored at a location, and overwrite it with a different value.
control bus
The red control bus doesn't transport addresses or data words. It is used by the Control Unit (CU) to select different modes of operation on the other components. For example, it is not enough for the memory to know that the address it should use is 005, it also needs to know whether it should transfer the current contents of 005 into the MBR (as for a Load instruction), or rather go the opposite direction and transfer the current data word from MBR into address 005. The control unit can therefore switch the memory from "read mode" into "write mode", and that switch is one of the signals on the control bus. The CU also needs to be able to tell the ALU which operation to perform, e.g., whether to add or subtract. This is also done via the control bus. Finally, the CU needs to select which register to read from and write to. E.g., in a Jump 102 instruction it would have to write the value 102 into the PC register, but in many other instructions, it has to read from or write to the AC register.
subroutines
There's only one MARIE instruction we haven't covered yet. It is called JnS X, which stands for "Jump and Store", and its main purpose is to enable writing subroutines. A subroutine, also known as a procedure, function or method in other programming languages, is a piece of code that • has a well-defined purpose or function • needs to be executed often • we can call from our code, passing arguments to it • returns to where it was called from after it has finished, possibly with a return value (or result) Examples for common subroutines in high-level programming languages are System.out.println in Java, which takes a string as its argument and prints it to the console, or math.log in Python, which computes the logarithm of its argument and returns it. Subroutines are probably the most important concept in programming: they allow us to structure a program, breaking it up into small parts. Of course, since all high-level languages are, in the end, executed by machine code instructions, most ISAs have instructions that make it easy to implement subroutines directly in machine code. In MARIE, JnS X stores the address of the next instruction (i.e., the one immediately after the JnS) into the memory location X. It then continues execution at address X+1. The actual subroutine code is then stored at X+1, and it concludes with the instruction JumpI X (an indirect jump) that jumps back to the address stored at X. Check notes for more info.
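The effect of JnS X and the matching JumpI X on the program counter can be sketched as two small functions. The names and the example addresses are hypothetical, and `pc` is assumed to already hold the address of the instruction after the JnS.

```python
def jns(memory, pc, x):
    """JnS X: store the return address at X, then continue at X + 1."""
    memory[x] = pc
    return x + 1


def jump_i(memory, x):
    """JumpI X: continue at the address stored in location X."""
    return memory[x]


memory = [0] * 4096

# Hypothetical layout: the JnS sits at address 0x004, so the next
# instruction is at 0x005; the subroutine's return slot is at 0x020.
pc = jns(memory, 0x005, 0x020)
print(hex(pc))            # first instruction of the subroutine
pc = jump_i(memory, 0x020)
print(hex(pc))            # back at the instruction after the JnS
```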
user mode vs. kernel mode
User mode is where your stuff (ordinary programs) is running, and kernel mode is where access to file systems and other controls is running. The two are kept separate because of access control. If a program in user mode wants to access a file, it needs to call on the kernel to find out if it's allowed. A system call is when a user program asks the system whether it can access a file. - Keep in mind that the computer needs to store the password (ideally as a salted hash) in a file so that it can check whether what you typed in was the correct password
indirect addressing
We've already covered nine out of the sixteen possible MARIE instructions (remember we only use four bits for the opcode). Most of the remaining instructions are just variants of the ones we've already seen, but they are different in an important way. Let's look again at the Load X instruction. It directly loads the value stored at address X into the AC register. What this means is that we can only use fixed, precomputed addresses that are hard-coded into the instructions. But a typical coding pattern is to store data in an array, i.e., a sequence of consecutive locations in memory, often without a fixed length. For example, for a Twitter application it may be enough to say that the text for a tweet is stored at memory locations 100-239 (since tweets are limited to 140 characters), but what about an email application? With the fixed-address instructions we've seen so far, there is no way to loop through all the characters in the email text (e.g. in order to print them onto the screen). The solution to this problem is to use indirect addressing. Instead of accessing the value stored at location X, we can use the value stored at X as the address at which the actual value we want to use is stored. That sounds complicated so let's look at an example. Here's what the current contents of our memory could look like starting from location 100:
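As a stand-in illustration, here is a sketch with made-up memory contents: location 0x100 holds a pointer rather than the data itself. The values, the zero terminator and the function names are all assumptions, not the example from the original module.

```python
# Hypothetical memory contents (address -> value).
memory = {
    0x100: 0x102,  # pointer: address of the first character
    0x101: 0x000,  # unused in this sketch
    0x102: 0x048,  # 'H'
    0x103: 0x069,  # 'i'
    0x104: 0x000,  # assumed terminator marking the end of the text
}


def load(x):
    """Direct addressing (Load X): the value stored at X."""
    return memory[x]


def load_i(x):
    """Indirect addressing (LoadI X): the value stored at the address held in X."""
    return memory[memory[x]]


def read_text(pointer_location):
    """Walk through the text by incrementing the pointer until a 0 is found."""
    chars = []
    while load_i(pointer_location) != 0:
        chars.append(chr(load_i(pointer_location)))
        memory[pointer_location] += 1  # move the pointer to the next character
    return "".join(chars)


print(hex(load(0x100)), hex(load_i(0x100)))
```

Because the address lives in memory rather than in the instruction, a loop can step through text of any length by changing the pointer, which is not possible with a fixed address.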
how does public key cryptography work
check notes
Draw an AS to client diagram
see notes
examples of biometrics and what are they
Using characteristics of your body to authenticate you: - Fingerprints - Voice recognition - Iris scans
CRYPTOGRAPHIC HASH FUNCTIONS
• A hash function maps input of arbitrary length to a fixed length output. • Cryptographic hash functions are infeasible to invert. • Used in digital signatures, for storing and comparing passwords, in message authentication codes, etc.
recommended key lengths
• AES (symmetric): Currently, 128 bit is considered secure. Long term recommendations (after 2030) go towards 256 bit. • RSA (public key): Currently, 2048 bits is considered secure. Some agencies/government bodies recommend 3072 bits after 2020, others after 2030. • Recommendations from NIST, NSA and the German BSI differ in details.
random numbers in public key cryptography
• All types of cryptography need random numbers for: key generation; use in protocols to mark messages as new; initialisation vectors. • Many attacks on cryptography have been based on bad random numbers.
what are current hash functions
• MD5 was widely used, but is not secure. Sometimes it is still used for integrity protection. • SHA1 is better, but attacking it is much easier than brute-force. Attacks get more efficient. Is no longer recommended for digital signatures. • Current recommendations are SHA-256, SHA-384 and SHA-512
E.G. OF ASYMMETRIC CRYPTOGRAPHY: RSA
• Private key d, public key (e, n). --> check pg. 3, wk. 6 notes for this bit • x mod n means the remainder of x divided by n
Meaning of skipcond 000, 400, 800
• SkipCond 000: If the value in AC is smaller than 0, then skip the next instruction. • SkipCond 400: If the value in AC is equal to 0, then skip the next instruction. • SkipCond 800: If the value in AC is greater than 0, then skip the next instruction.
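The three conditions can be written as a small decision function. The function name is made up, and the AC value is assumed to be given as an ordinary signed integer.

```python
def should_skip(condition, ac):
    """Return True if Skipcond with the given condition skips the next instruction."""
    if condition == 0x000:
        return ac < 0
    if condition == 0x400:
        return ac == 0
    if condition == 0x800:
        return ac > 0
    raise ValueError("unsupported Skipcond condition")


for ac in (-3, 0, 7):
    print(ac, should_skip(0x000, ac), should_skip(0x400, ac), should_skip(0x800, ac))
```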
