FIN 691 Technology Exam
Challenges to Applications in Finance
-Access to data -Bad data, Bias, Not representative -Bad analysis, e.g., "overfitting" a regression analysis -Lack of testing in real world
Factors promoting machine learning
-Big Data - sources and low cost processing and memory -Internet collaboration, APIs -Automation of more and more business processes -Advantages gained
Payments rails versus Currency
-Bitcoin could be much faster by simply making the block time shorter or making the blocks larger. Why don't they do this? -Because decentralization is more important than speed. Suggested changes that threaten decentralization are usually voted down. -Think of Bitcoin (big B)(software, payments network) like Visa and bitcoin (little b)(currency) like the dollar. Little b bitcoin could eventually be traded on other platforms. The critical thing is to preserve its core value.
Digital Signatures
-Digital signatures are a type of e-signature that uses cryptography -Not all e-signatures are digital signatures
Aadhar in India
-Government issued, biometric based Unique ID card -Over 1 billion people have signed up (~90% of the population) -Aadhar means foundation or base -Uses photographs, iris scan and fingerprints. -Individuals of any age. -Voluntary, but required for certain government benefits. -Free and valid for life. -Used by many governmental and private services
Other types of consensus algorithms
-Instead of proof of work (or computing power), why not use proof of something else that doesn't use so much power? -Proof of stake - nodes are chosen at random to validate blocks, with the odds weighted by how much of the coin they already own (their "stake"). -Delegated Proof of Stake - coin holders vote for a small number of nodes (for example 21) that provide the validation service. -Proof of work is believed to be the strongest.
Primary Problems
-Intrusive -Privacy -Discriminatory or inaccessible to those with disabilities -Setup and maintenance costs -Chance of counterfeiting -Susceptible to errors -Speed/throughput - high traffic areas like airports
Nudges
-Liberty-preserving approaches that steer people in particular directions, but that also allow them to go their own way. -To be distinguished from mandates or bans -Transparent -Not coercive -Requires testing and monitoring Examples: -Default rules (ex., auto enrollment) -Simplification -Use of social norms (emphasize what most people do) -Increase in ease and convenience -Disclosure -Warnings (ex., graphic cigarette warning) -Pre-commitment -Reminders -Eliciting implementation intentions ("do you plan to vote") -Informing people of past choices
The incentive to mine a block: bitcoin
-Miners must solve a hashing puzzle in order to create a new block. -The first miner to solve the puzzle can add a block to the chain and earn the "block reward". -At the time of these notes, the block reward was 6.25 BTC, or about $315,000.
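A toy version of the hashing puzzle can be sketched in Python. It assumes a simplified rule: the hex SHA-256 digest of (data + nonce) must start with a given number of zeros. Real Bitcoin mining double-hashes an 80-byte block header and compares the result to a numeric target. `mine` and its `difficulty` parameter are illustrative names, not a real Bitcoin interface.

```python
import hashlib


def mine(block_data: str, difficulty: int = 3) -> tuple[int, str]:
    """Toy proof of work: find a nonce whose hash starts with `difficulty` zeros.

    Simplified assumption: the "block" is just a string, and the puzzle is a
    hex-prefix check rather than Bitcoin's numeric target.
    """
    target_prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode("utf-8")).hexdigest()
        if digest.startswith(target_prefix):
            return nonce, digest
        nonce += 1  # try the next candidate nonce


nonce, digest = mine("Alice pays Bob 1 BTC", difficulty=3)
print(nonce, digest)
```

Each extra zero makes the search about 16 times longer on average. Anyone can still check the answer with a single hash, and that asymmetry is what makes the work "provable".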
Randomness
-Modern encryption requires the use of random numbers. -Computers running ordinary, deterministic algorithms can't generate a truly random number. "Random" numbers created this way are referred to as "pseudorandom." -Some applications use such things as hand movements to generate adequate randomness.
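Python's standard library shows the difference between pseudorandom and cryptographic randomness. `random` is a deterministic generator, so the same seed always replays the same sequence. `secrets` draws from the operating system's cryptographically secure source. This is only a small illustration; it does not describe how any particular encryption product gathers randomness.

```python
import random
import secrets

# Pseudorandom: a deterministic algorithm, so the same seed replays the same "random" numbers.
gen_a = random.Random(42)
gen_b = random.Random(42)
seq_a = [gen_a.randint(0, 99) for _ in range(5)]
seq_b = [gen_b.randint(0, 99) for _ in range(5)]
print(seq_a == seq_b)  # the two sequences are identical

# Suitable for keys: drawn from the operating system's secure random source.
key = secrets.token_bytes(16)  # 16 random bytes = 128 bits
print(key.hex())
```

This is why the `random` module should not be used to generate passwords or encryption keys.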
Bitcoin Step 1: Decentralized network
-Network of thousands of nodes all over the world transmitting and storing transaction data. -Public and transparent. -No single node is required for the system to work.
Digital Certifications - Public Key Infrastructure
-Public key cryptography has been used since the mid-90s to create trust online. -Standardized systems have developed - Public Key Infrastructure (PKI) -At the center of PKI are Digital Certificates issued by Certificate Authorities (CAs). -The CA is trusted to authenticate a user and issue a certificate that binds the user's identity to a public key; the user keeps the matching private key. -The public key is distributed to people that will interact with the user.
Near Field Communication (NFC)
-Purpose: convenience -A subset of RFID (Radio Frequency ID) -For payment, requires a reader to be installed at the Point of Sale (POS). This is a problem. -There is technology that does not require a special reader. See Loop.
Secure Sockets Layer (SSL)
-SSL (now superseded by TLS, Transport Layer Security) is an example of Public Key Infrastructure. -If a website wants to transmit data securely with encryption and display the HTTPS padlock symbol, it must get a digital certificate from a certificate authority, such as DigiCert, Comodo or Symantec. -Web browsers trust these CAs. The site's public/private key pair is used to securely exchange a symmetric key, and the data itself is then sent with symmetric encryption.
Biometric Identification
-Sample checked against database of references. No credential required upfront, so more convenient. Common to use iris scanning for this. -Green - match if single reference above threshold. -Amber - multiple references above threshold - human intervention needed -Red - no references above threshold. -Used for surveillance and by police to check people against watch lists
Biometric Verification
-Sample compared with only one person's reference - does it match or not. -Smart phones use this. Sample and reference may never leave device. -Airport security - matching face with passport image. -Voice recognition for customer service
Technology for creation and access
-Sensors - connected (Internet of things), remote (example power line sensor), video, microphone, etc. -Scanners - RFID, etc. -Web Scraping - Extracting data from web pages without the assistance of the owner of the web page. This is often done by automated web "crawlers." This is how search engines like Google work. -API (application programming interface) - These are tools created by the web page owner that allow others to access the data easily. Many APIs deliver data in a standardized format called JSON (JavaScript Object Notation). -Example: bank account aggregation
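Here is what receiving JSON from an API looks like: a sketch that parses a made-up account-aggregation response with Python's `json` module. The field names (`accounts`, `bank`, `balance`) are hypothetical and do not correspond to any real bank's API.

```python
import json

# Hypothetical JSON an aggregation API might return (field names invented for illustration).
response_text = """
{
  "accounts": [
    {"bank": "Bank A", "type": "checking", "balance": 1520.75},
    {"bank": "Bank B", "type": "savings", "balance": 8000.00}
  ]
}
"""


def total_balance(json_text: str) -> float:
    data = json.loads(json_text)  # JSON text -> Python dicts and lists
    return sum(account["balance"] for account in data["accounts"])


print(total_balance(response_text))
```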
How long to wait for confirmation?
-Starbucks? Maybe no blocks - accept immediately -Something important - many more -General rule of thumb is six blocks, which takes about one hour on Bitcoin.
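The six-block rule of thumb traces back to a calculation in the original Bitcoin whitepaper. It gives the probability that an attacker with a share q of the total hashing power ever catches up from z blocks behind. Below is a Python translation of that calculation. The value q = 0.1 (an attacker with 10% of the hashing power) is only an example.

```python
import math


def attacker_success_probability(q: float, z: int) -> float:
    """Chance an attacker with hash-power share q overtakes the honest chain from z blocks behind.

    Translated from the calculation in section 11 of the Bitcoin whitepaper.
    """
    p = 1.0 - q          # honest share of hashing power
    lam = z * (q / p)    # expected attacker progress while z honest blocks are mined
    total = 1.0
    for k in range(z + 1):
        poisson = math.exp(-lam) * lam**k / math.factorial(k)
        total -= poisson * (1 - (q / p) ** (z - k))
    return total


for z in range(7):
    print(z, round(attacker_success_probability(0.1, z), 7))
```

With q = 0.1 the probability drops to roughly 0.02% at six confirmations. A larger attacker share or a higher-value payment would justify waiting longer.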
Categories of Machine Learning
-Supervised learning -Unsupervised learning -Reinforcement learning
Assisted-GPS (A-GPS)
-The network of mobile phones and cell towers is used to help GPS with speed (TTFF) and/or accuracy. -This coordination was accelerated by the FCC's requirement that cell phone carriers deliver location data for 911 callers.
Byzantine Generals Problem
-The nodes don't know each other - no trust. -How do they agree on what transactions are valid? There are certainly dishonest nodes. How can the honest ones know that the chain is honest? -We need an algorithm that will guarantee an honest outcome so long as a certain percentage of the participants are honest.
Details
-The puzzle is to come up with a "nonce" (an extra number) that, when added to the other block information and hashed, creates a hash with certain characteristics - for example, it starts with 000. -The difficulty of the puzzle is adjusted to create blocks roughly every 10 minutes. It is adjusted every 2016 blocks (about two weeks). Block size was capped at 1 megabyte of data for many years. -The reward started at 50 bitcoin and halves every 210,000 blocks. There had been 618,774 blocks mined when these notes were written (so the reward had halved twice). It takes about 4 years to mine 210,000 blocks. The reward will go away around the year 2140; after that, miners will only be paid with transaction fees and no more bitcoin will be created, capping the supply at 21 million.
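The halving schedule and the 21 million cap can be checked with integer arithmetic in satoshis (1 BTC = 100,000,000 satoshis). The sketch assumes the reward is rounded down by a right shift at each halving, which is how Bitcoin Core computes the subsidy. Because of that rounding, the total lands just under 21 million.

```python
HALVING_INTERVAL = 210_000             # blocks between halvings
INITIAL_REWARD_SAT = 50 * 100_000_000  # 50 BTC expressed in satoshis


def block_reward(height: int) -> int:
    """Block subsidy in satoshis at a given block height."""
    halvings = height // HALVING_INTERVAL
    if halvings >= 64:  # shifting by 64+ bits is treated as zero
        return 0
    return INITIAL_REWARD_SAT >> halvings  # halve, rounding down to whole satoshis


def total_supply() -> int:
    """Sum every subsidy ever paid, one 210,000-block era at a time."""
    total, height = 0, 0
    while (reward := block_reward(height)) > 0:
        total += reward * HALVING_INTERVAL
        height += HALVING_INTERVAL
    return total


print(block_reward(618_774) / 100_000_000)  # reward in BTC at the block count quoted above
print(total_supply() / 100_000_000)         # total BTC ever issued by block rewards
```

The subsidy reaches zero at block 6,930,000 (33 halvings). At about 10 minutes per block, that is roughly 130 years after the 2009 launch, in line with the 2140 date.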
How GPS Works
29 satellites in space orbiting the earth at an altitude of about 20,000 km. -Every place on earth is visible to at least four satellites. -With signals from three satellites, "trilateration" can be used to find the location. The fourth satellite is used to correct timing (clock) error and fine-tune the location. -Any device with GPS can receive signals giving each satellite's position and the time the signal was sent; the device calculates its distance from each satellite from the signal's travel time. Trilateration is then used by the GPS device to determine the location. -The time it takes for the GPS system to determine the location is called the "time to first fix" or TTFF
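Trilateration can be sketched in two dimensions. Given three known points and the distance to each, subtracting the circle equations pairwise leaves two linear equations. This is a simplification: real GPS works in three dimensions and must also solve for the receiver's clock error, which is why a fourth satellite is needed. The coordinates and distances below are made up.

```python
import math


def trilaterate_2d(p1, r1, p2, r2, p3, r3):
    """Locate a point in 2D from three known points and the distance to each.

    Subtracting circle 1's equation (x - x1)^2 + (y - y1)^2 = r1^2 from circles 2 and 3
    cancels the squared terms, leaving two linear equations solved by Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    a1, b1 = 2 * (x2 - x1), 2 * (y2 - y1)
    c1 = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    a2, b2 = 2 * (x3 - x1), 2 * (y3 - y1)
    c2 = r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("reference points are in a straight line; position is ambiguous")
    return (c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det


# Hypothetical "satellites" at known positions, and measured distances to a receiver at (3, 4).
x, y = trilaterate_2d((0, 0), 5.0, (10, 0), math.sqrt(65), (0, 10), math.sqrt(45))
print(round(x, 6), round(y, 6))
```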
Pattern Matching
A good modality for authentication must be -Distinguishable -Repeatable A template is created with data points, called "minutiae", that help distinguish between different people. Machine learning is used to develop algorithms. The algorithms are adjusted to achieve a target error rate.
Four Steps of Big Data Management
Acquire: Create and/or access data from a variety of sources, potentially with different methods for each. Organize: Work with various data formats to parse the data how you want. Analyze: Queries, modeling, and building algorithms to find new insights. Decide: Make valuable decisions. Understand and verify outputs.
Reinforcement Learning
Algorithms that aim to maximize a reward/goal by evaluating the results of their actions. -Google DeepMind's Atari game playing -Google's AlphaZero chess
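A minimal reinforcement-learning sketch is the "multi-armed bandit" solved by an epsilon-greedy agent. This is a swapped-in textbook example, not how DeepMind's systems work. The agent mostly repeats the action with the best reward so far and occasionally explores at random. The fixed payouts per action are hypothetical and chosen so the outcome is easy to check.

```python
import random


def epsilon_greedy(payouts: list[float], steps: int = 1000, epsilon: float = 0.1, seed: int = 0):
    """Learn which action pays best purely from the rewards received.

    Assumption for illustration: each action ("arm") always pays a fixed,
    hypothetical amount; the agent does not know these values in advance.
    """
    rng = random.Random(seed)
    n = len(payouts)
    estimates = [0.0] * n  # agent's running estimate of each action's reward
    counts = [0] * n
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n)                           # explore: try something random
        else:
            action = max(range(n), key=lambda a: estimates[a])  # exploit: best so far
        reward = payouts[action]                                # environment's feedback
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]  # running average
    best = max(range(n), key=lambda a: estimates[a])
    return best, estimates


best, estimates = epsilon_greedy([1.0, 2.0, 5.0])
print(best, estimates)
```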
Supervised Learning
Algorithms that ingest labeled training data which is used to learn a general rule for predicting the labels of unseen data. The right answer is known and "tagged" by the people inputting the data. Examples: -Regression: Continuous labels. For example, income and education data. Used to predict income based on level of education. -Classification: Predicting a label. Is this a cat or a dog? 1 or 0. Fraud or not.
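The income/education regression example can be sketched as ordinary least squares on one input. The data below is made up and perfectly linear so the fitted line is easy to verify. Real income data would be noisy and would usually be fitted with a statistics library.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for one input: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))  # sum of cross-deviations
    sxx = sum((x - mean_x) ** 2 for x in xs)                        # sum of squared deviations
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical labeled training data: years of education -> income ($ thousands).
years = [12, 14, 16, 18, 20]
income = [46, 52, 58, 64, 70]
slope, intercept = fit_line(years, income)
print(f"predicted income for 15 years: {slope * 15 + intercept:.1f}k")
```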
Unsupervised Learning
Algorithms that ingest unlabeled data and discover organizing principles/structure in it. The right answer is not known. Helps "summarize" data by choosing representative groups or distinguishing features. Examples: -Clustering: Group data objects into clusters according to their similarity to one another. Google's PageRank is often mentioned here, though it ranks web pages by their link structure rather than clustering them. -Dimension reduction: Learn which are the important components of the data.
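k-means is one common clustering algorithm; it is swapped in here as an example, since the notes don't name one. It alternates between assigning each point to its nearest center and moving each center to the average of its points. No labels are given: the groups emerge from the data. The customer points are hypothetical, and the starting centers are supplied by hand so the result is reproducible.

```python
import math


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k])) for p in points]


def kmeans(points, centroids, iterations=20):
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        labels = assign(points, centroids)
        updated = []
        for k, c in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            # Move the centroid to the mean of its members (keep it in place if it has none).
            updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)) if members else c)
        if updated == centroids:  # nothing moved: converged
            break
        centroids = updated
    return assign(points, centroids), centroids


# Hypothetical customers as (monthly spend, monthly visits), with no labels attached.
points = [(1, 1), (1.5, 2), (2, 1.5), (8, 8), (8.5, 9), (9, 8.5)]
labels, centers = kmeans(points, [(0, 0), (10, 10)])
print(labels, centers)
```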
Artificial Intelligence (AI)
Artificial intelligence - The theory and development of computer systems able to perform tasks that normally require human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages. OR The design of an intelligent agent that perceives its environment and makes decisions to maximize the chances of achieving its goal. -AI can be narrow, with only one task, such as playing chess, or general (strong AI), which accomplishes many tasks and could potentially approach or exceed human capacities. -AI does not necessarily involve learning. AI can be rules based - for example, IBM's chess computer Deep Blue.
Size of bits
Bit = 1 bit; Nibble = 4 bits; Byte = 8 bits; Kilobyte = 1,024 bytes; Megabyte = 1,024 kilobytes; Gigabyte = 1,024 megabytes; Terabyte = 1,024 gigabytes; Petabyte = 1,024 terabytes; Exabyte = 1,024 petabytes; Zettabyte = 1,024 exabytes; Yottabyte = 1,024 zettabytes (binary convention; in decimal/SI usage each step is 1,000)
Ciphers
Cipher: The algorithm used to encrypt and decrypt a message. Simple Example - Substitution Cipher: For example, shift each letter of the alphabet two places to the right to encrypt and to the left to decrypt. So: hello class -> jgnnq encuu. Here the "key" is 2. Mathematical Ciphers: Math is used today to create ciphers that haven't yet been broken. Remember Kerckhoffs's principle: the ciphers are made public to be tested. Keys are kept private.
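The shift cipher can be written in a few lines. The key is the shift amount, and decrypting is just shifting by the negative key. Leaving spaces and punctuation unchanged is an assumption; the notes don't say how non-letters are handled.

```python
import string


def caesar(text: str, key: int) -> str:
    """Shift each letter `key` places (wrapping around); negative key decrypts."""
    result = []
    for ch in text:
        if ch in string.ascii_lowercase:
            result.append(chr((ord(ch) - ord("a") + key) % 26 + ord("a")))
        elif ch in string.ascii_uppercase:
            result.append(chr((ord(ch) - ord("A") + key) % 26 + ord("A")))
        else:
            result.append(ch)  # leave spaces/punctuation as-is
    return "".join(result)


secret = caesar("hello class", 2)
print(secret, "->", caesar(secret, -2))
```

With only 25 possible keys, an attacker can simply try them all. This is why modern ciphers rely on mathematics with enormous key spaces instead.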
Cryptography - Uses
Confidentiality: Messages can be sent over unsecure networks because meaning will not be known without a "key." Authentication: The ability to encrypt a message in a certain way can be evidence of identity. Non-repudiation: The sender can't deny sending the message if it is encrypted using a key only the sender knew. Integrity: Can make messaging "tamper evident"
Price of data
Data is now cheap to keep forever. Organizations now store information they used to delete. Bits per square inch of memory hardware have increased from 2 thousand in 1957 to 1 trillion today. Supposedly it would cost less than $30 million to store all US phone calls for one year. Note: All based on technology that creates storable 1s and 0s.
Security of cryptographic algorithms
Cryptographic algorithms are usually made public. In cryptography it is better to make the algorithm public so that it can be tested by everyone. If there is a "key", this should be kept secret, but not the algorithm. This is known as Kerckhoffs's principle. The most secure hashing algorithms are believed to be breakable only by brute-force computing power, which is not feasible with current computers. New algorithms may need to be put in place if computers become more powerful, for example with quantum computing.
Cryptography - General
Cryptography: "secret writing" With cryptography, secrets are transferred publicly but hidden through encryption. Compare "security through obscurity", where protection depends on keeping the method itself hidden. This is generally viewed as unworkable on a computer network.
Goals of Hashing Algorithm
-Deterministic: The output will always be the same with the same input. -Computationally efficient: A computer can execute the function quickly. -Pre-image resistant: It is infeasible to determine the input from the output. -Avalanche effect: A slight change in input results in a large change in output. -Collision resistant: It is infeasible to find two different inputs that result in the same output.
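Several of these properties can be observed directly with SHA-256 from Python's `hashlib`. Pre-image and collision resistance can't be demonstrated in a few lines. This sketch only shows determinism, fixed-size output, and the avalanche effect.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """SHA-256 digest of a UTF-8 string, as 64 hex characters (256 bits)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def bits_changed(hex_a: str, hex_b: str) -> int:
    """Count how many of the 256 output bits differ between two digests."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


short = sha256_hex("apple")
long_input = sha256_hex("apple " * 10_000)
print(len(short), len(long_input))  # same output length regardless of input size
print(sha256_hex("apple") == short)  # deterministic: same input, same output
print(bits_changed(short, sha256_hex("apply")), "of 256 bits changed")  # avalanche effect
```

Changing one letter typically flips about half of the 256 output bits.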
Electronic Signature Law
ESIGN Act in the United States (2000): An electronic signature is valid if four criteria are met: 1) Intent to sign 2) Consent to do business electronically 3) Association of signature with record 4) Record retention
Process
Enrollment: Information is scanned by a biometric reader and converted to binary information. Many common readers can be installed as a USB peripheral device. Others are built into devices or entry systems. - Sensor module: acquires the biometric sample. - Feature extraction module: records the significant information from the sample (unique features). Template: The biometric template is recorded in a database or user device. When the user wants to access a resource, a new scan is conducted to compare to the template. If they match to within a defined degree of tolerance, access is granted.
Transactional Platforms market segments
Facilitate payments or dApp transactions - Ripple
Types of errors
False negatives: Where a legitimate user is not recognized. Referred to as the False Rejection Rate (FRR) or Type 1 error. False positives: Where an interloper is accepted. Referred to as the False Acceptance Rate (FAR) or Type 2 error. Equal Error Rate (EER): The point where FAR = FRR. Biometric modalities are compared based on EER. False negatives cause inconvenience. False positives result in security breaches. For this reason, FAR is usually considered the more important metric, according to CompTIA.
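FAR, FRR and the EER can be computed from match scores. This sketch assumes, hypothetically, that the matcher outputs a similarity score where higher means "more alike", and that a sample is accepted when its score is at or above the threshold. The score lists are made up.

```python
def error_rates(genuine: list[float], impostor: list[float], threshold: float) -> tuple[float, float]:
    """Return (FAR, FRR) when scores >= threshold are accepted."""
    far = sum(s >= threshold for s in impostor) / len(impostor)  # interlopers let in
    frr = sum(s < threshold for s in genuine) / len(genuine)     # legitimate users rejected
    return far, frr


def equal_error_rate(genuine: list[float], impostor: list[float]) -> tuple[float, float]:
    """Scan candidate thresholds; return (threshold, rate) where FAR and FRR are closest."""
    def gap(t: float) -> float:
        far, frr = error_rates(genuine, impostor, t)
        return abs(far - frr)

    best = min(sorted(set(genuine) | set(impostor)), key=gap)
    far, frr = error_rates(genuine, impostor, best)
    return best, (far + frr) / 2


genuine = [0.9, 0.8, 0.75, 0.6, 0.95]   # hypothetical scores for the real user
impostor = [0.1, 0.3, 0.65, 0.2, 0.4]   # hypothetical scores for interlopers
print(error_rates(genuine, impostor, 0.7))
print(equal_error_rate(genuine, impostor))
```

Raising the threshold trades a lower FAR for a higher FRR. This is why a system that prioritizes security, as CompTIA suggests, sets its threshold above the EER point.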
Physical Biometrics
Fingerprint recognition: -Unique -Most widely used -Easy and cheap to implement -Possible to obtain a copy and create a mold to fool the scanner Retinal Scan - An infrared light is shone into the eye to identify the pattern of blood vessels. -Expensive and relatively intrusive -False negatives possible with disease, such as cataracts Iris Scan - Matches patterns on the surface of the eye. -Less intrusive. More suitable for mass adoption than retinal scan -Possible to fool with a high resolution picture. Facial recognition - records multiple indicators about the size and shape of the face. -Can be a lengthy process to get a good reading. -Possible to get a reading from a distance without permission, for example with CCTV (closed-circuit TV). -Currently used more for surveillance than authentication -Slaughterbots video: https://www.youtube.com/watch?v=O-2tpwW0kmU -What if machine learning is used to connect faces to features like good credit, intelligence, work ethic, etc.?
Asset Tokens
Form of ownership of traditional asset. - NFTs
Hexadecimal
Hexadecimal (base 16) is used to shorten numbers shown as bits: each hex digit (0-9, a-f) stands for 4 bits, so a 256-bit hash is written as 64 hex characters.
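Python's built-ins show how hex compresses a string of bits: each group of 4 bits becomes one hex digit. `bits_to_hex` is a hypothetical helper written for this illustration.

```python
def bits_to_hex(bits: str) -> str:
    """Rewrite a string of 1s and 0s in hexadecimal: each hex digit covers 4 bits."""
    width = (len(bits) + 3) // 4  # number of hex digits needed
    return format(int(bits, 2), f"0{width}x")


print(bits_to_hex("11111111"))      # 8 bits -> 2 hex digits
print(bits_to_hex("000100101010"))  # 12 bits -> 3 hex digits
print(int("ff", 16), hex(255))      # converting back and forth with built-ins
```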
Utility tokens market segments
Marketing/fundraising tools. Native to single organization - Many ICOs
Machine Learning
Machine Learning: "Field of study that gives computers the ability to learn without being explicitly programmed" - Arthur Samuel, 1959. Explicit program - Given input and program, the computer creates output. Machine learning - Given input and output, the computer creates the program. Why do we need it? Because often there is no exact answer, and/or it is too hard to know the exact algorithm to program.
Budgeting Envelopes
Many people benefit from tactics such as budgeting envelopes or "mental accounts", while others do not.
Unstructured Data
Much big data is unstructured. Unstructured data is data like emails that has no easily recognizable structure, such as patterns, relationships, normalization of names, etc.
Recent proliferation of data
Much more information is now digitized and broadly available. -Communication - cell phone, email - Social media -Wikipedia, blogs - Sensors - IoT -Video, photos -Biometrics -Private records - health, banking
Biometric Authentication
Purpose: To find or confirm the identity of individuals from intrinsic traits. Biometrics generally can't be forgotten, lost, shared or changed. Used with and without permission: -Access control -Surveillance. Methods, for example fingerprints, are called modalities.
EMV
Purpose: fraud prevention -A microprocessor or "chip" is read by the POS device instead of the magstripe. -The chip encrypts the data transmission and can serve as authentication that the card is real. -The addition of a PIN adds a second authentication factor to prove the identity of the holder of the card. -Can accommodate small offline transactions. -EMV stands for Europay, MasterCard and Visa (the largest card networks). Every major card network is now part of EMV. -Merchant adoption of EMV was accelerated in 2015 by new network terms that shifted liability for fraud to merchants that don't adopt EMV.
Bitcoin Solution: Proof of work
Rule: The longest chain is valid (strictly, the chain with the most accumulated proof of work). Why? The blocks are made by people (miners) expending computing power to earn bitcoin. The more computing power they use, the faster they can add the blocks. As long as most of the computing power is in the hands of honest people, the longest chain will be honest.
Security of the Process
Security of the template: -Not possible to use template to reconstruct the sample -Tamper-proof (or tamper-evident) - encrypted -Standard encryption can't be used because of the need for "fuzzy" pattern matching. -Distributed storage - on devices "Spoofing" - Faking the biometric
Global currencies market segments.
Seeking widescale merchant adoption as a currency - bitcoin
Debt Snowball
Some people pay off debt more quickly if they pay the smallest debts first. Others would pay the most expensive ones first. Is either right?
Authentication - Types of credentials
Something you know - ex: password Something you have - ex: card, one-time password device Something you are - ex: fingerprint - physical biometrics Something you do - ex: signature - behavioral biometrics Somewhere you are - ex: mobile phone location
Structured Data
Structured data is data that is easily manipulated by a computer - for example, lists, CSV files (comma-separated values), SQL databases (structured query language)/"relational" databases.
A/B Testing
Technology allows us to learn quickly how people behave in various situations.
Problem with Symmetric Encryption
The key must be distributed to all the parties. Creates risk of being intercepted. It is good at encrypting the message, but it doesn't help in authenticating who is on the other end. How do you encrypt online activity when you don't know who the other party is?
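Diffie-Hellman, listed among the asymmetric inventions in these notes, is one answer to the key-distribution problem. Two parties publish values computed from private numbers, and each side can then compute the same shared key without ever sending it. The numbers below are a classic toy example and far too small to be secure. Real systems use very large primes or elliptic curves. Plain Diffie-Hellman still doesn't tell you who is on the other end; certificates (PKI) add that authentication.

```python
def dh_public(g: int, private: int, p: int) -> int:
    """Value a party can publish openly: g^private mod p."""
    return pow(g, private, p)


def dh_shared(other_public: int, private: int, p: int) -> int:
    """Combine the other party's public value with your own private number."""
    return pow(other_public, private, p)


p, g = 23, 5                        # public parameters (toy-sized)
alice_private, bob_private = 4, 3   # never transmitted
A = dh_public(g, alice_private, p)  # Alice sends A over the open network
B = dh_public(g, bob_private, p)    # Bob sends B over the open network
alice_key = dh_shared(B, alice_private, p)
bob_key = dh_shared(A, bob_private, p)
print(A, B, alice_key, bob_key)     # both sides end up with the same symmetric key
```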
SHA 256
There are many variations. One of the most common is SHA-256, or "Secure Hash Algorithm" with a 256-bit output. SHA-256 was developed by the NSA.
Behavioral biometrics
These can be cheap to implement, but tend to produce more errors than physical biometrics. Voice: Obtaining the data is relatively easy because voice recognition functionality is already built into a lot of computers. However, creating an accurate template can be difficult and time-consuming. Subject to impersonation. Signature process: stroke, speed and pressure Typing: speed and pattern.
Bitcoin Step 2: Blockchain
Information about valid transactions is committed to a "block" in the form of a hash. Each block also includes the hash of the block before it, so the blocks form a single chain of records of all transactions. Information cannot be changed without it being immediately obvious to everyone in the network. Over time, as more blocks are added on top, a transaction becomes more and more immutable.
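Hash chaining can be sketched by storing each block's predecessor hash inside the next block. The sketch leaves out mining, signatures and Merkle trees, and the transactions are hypothetical. It shows only why editing an old record breaks every link after it. A change to the newest block would be caught instead by comparing its hash with other nodes' copies.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """Hash a block's contents (sorted keys so the same block always hashes the same way)."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode("utf-8")).hexdigest()


def build_chain(batches: list[list[str]]) -> list[dict]:
    chain, prev = [], "0" * 64  # the first block points at a dummy all-zero hash
    for transactions in batches:
        block = {"prev_hash": prev, "transactions": transactions}
        chain.append(block)
        prev = block_hash(block)  # the next block commits to this one
    return chain


def is_consistent(chain: list[dict]) -> bool:
    """Every block must store the correct hash of the block before it."""
    return all(later["prev_hash"] == block_hash(earlier)
               for earlier, later in zip(chain, chain[1:]))


chain = build_chain([["A pays B 5"], ["B pays C 2"], ["C pays A 1"]])
print(is_consistent(chain))
chain[0]["transactions"][0] = "A pays B 500"  # tamper with an old record
print(is_consistent(chain))
```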
Problem with digital money
Trusted intermediaries (banks) are needed to validate payments -Double spending problem = paying twice with the same money -Bitcoin's proof of work solves this problem
Endowment Effect
We ascribe value to things merely because we own them. Losses are weighted more heavily than gains.
Decoy Effect
We tend to have a specific change in preference between two options when also presented with a third option that is asymmetrically dominated.
Passwords use of Hash
Websites use hash functions to hide your password. Situation: Websites want to make sure users are who they say they are by using a password. Problem: If a site stores actual passwords in its database, a hacker will have all of the passwords if they break into the database. Solution: -The site hashes the passwords, storing only the hash in the database. No one knows the password unless they see you type it. A site wouldn't be able to tell you your password. -If a hacker steals the database, they will have the hashes and could check them against simple passwords, but they would not be able to find complex passwords. -Sites also add a random "salt" to each password before hashing. The salt is usually stored with the hash, so it doesn't protect a weak password from guessing, but it means identical passwords get different hashes and stops attackers from using precomputed tables of common password hashes. Diagram: User types password: "me123" Code on the site's server first adds a salt of (for example) "fp93ahg9hq4y98asyfx", so now the password is: "me123fp93ahg9hq4y98asyfx". This is hashed to: "5e1d6a459ba1b7b3d9fca779b85ccda4faa25609c6ac111938b02dbf59b4193d" and stored in the database. When you later use the password, the password you enter plus the same salt are hashed and compared to the hash stored in the database. Bad guys only see the hashed passwords in the database. They can try many passwords to see if they match the hashes, but this will be very hard if the password is complex, and harder still if they don't know the salt.
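The salt-then-hash flow can be sketched with Python's standard library. This swaps the single SHA-256 in the diagram for PBKDF2 (`hashlib.pbkdf2_hmac`), a deliberately slow key-derivation function often used for password storage. The iteration count is illustrative, and `hash_password`/`verify_password` are hypothetical helper names.

```python
import hashlib
import hmac
import secrets
from typing import Optional

ITERATIONS = 100_000  # illustrative work factor; slows down each guess an attacker makes


def hash_password(password: str, salt: Optional[bytes] = None) -> tuple[bytes, bytes]:
    """Return (salt, digest). A fresh random salt is generated per user."""
    if salt is None:
        salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, stored_digest: bytes) -> bool:
    """Re-hash the attempt with the stored salt and compare in constant time."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, stored_digest)


salt, stored = hash_password("me123")  # what the site keeps: salt + digest, never "me123"
print(verify_password("me123", salt, stored))
print(verify_password("me124", salt, stored))
```

Two users who both choose "me123" get different random salts, so their stored digests differ.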
binary, base 2, bits
What are bits: Binary digits 1 and 0 (a byte is 8 bits). Example of binary/base 2 counting: 0 = 0, 1 = 1, 10 = 2, 11 = 3, 100 = 4, 101 = 5, 1010 = 1 x 2^3 + 0 x 2^2 + 1 x 2^1 + 0 x 2^0 = 8 + 0 + 2 + 0 = 10. Converting number of bits to size (example): 256 bits means 256 1s or 0s. How many different ways can 256 1s and 0s be arranged? 2^256 ≈ 1.158 x 10^77. Compare: 7.5 x 10^18 grains of sand on earth.
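The counting and place-value arithmetic can be reproduced with Python's built-ins. `place_values` is a hypothetical helper that spells out a binary number as a sum of powers of 2, matching the 1010 example.

```python
def place_values(bits: str) -> str:
    """Spell out a binary number as a sum of powers of 2, e.g. '1010'."""
    n = len(bits)
    terms = [f"{b} x 2^{n - 1 - i}" for i, b in enumerate(bits)]
    return " + ".join(terms) + f" = {int(bits, 2)}"


print([format(k, "b") for k in range(6)])  # counting 0..5 in binary
print(place_values("1010"))
print(f"{2**256:.3e}")                     # number of distinct 256-bit values
```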
The effect of free
What are effects of something being free? -"irrational" demand? -Socially responsible behavior to not abuse? -Negative stigma?
Hash Function
What it does: Converts information of any size into a string of alphanumeric characters of fixed size. Example: The MD5 hash function converts: 1 - "apple" into "1f3870be274f6c49b3e31a0c6728957f" 2 - "Apple" into "9f6290f4436e5a2351f12e03b6433c3c" 3 - "I like apples" into "10fde6012f5f899be32761a875e9cacd" -The output is always the same size regardless of input -A small change in the input results in a major change in the output
Commitments use of Hash
You can use a hash to commit to something to be revealed later. Situation: We want to bet on the total points to be scored in the next UK game. Problem: -We don't want to tell each other our guesses because that might influence the other. -We aren't in the same place, so we don't have a simple way of revealing our guesses simultaneously. -We don't trust any third party to hold our guesses. Solution: We can hash our guesses and give the hashes to each other. After the game, we can recreate the hashes from our answers to prove the choices were made before the game. Because there are only a few hundred plausible scores, each guess should be hashed together with a random secret; otherwise the other person could simply hash every possible score and compare.
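A commit-and-reveal sketch using SHA-256 follows. It assumes each guess is combined with a random secret, as described above, so the small set of possible scores can't be tried one by one. `commit` and `check_reveal` are hypothetical helper names.

```python
import hashlib
import secrets


def commit(guess: str) -> tuple[str, str]:
    """Return (commitment, secret). Share the commitment now; keep the secret until the reveal."""
    secret = secrets.token_hex(16)  # random value so small guesses can't be brute-forced
    commitment = hashlib.sha256(f"{secret}:{guess}".encode("utf-8")).hexdigest()
    return commitment, secret


def check_reveal(commitment: str, guess: str, secret: str) -> bool:
    """After the game, anyone can recompute the hash to confirm the guess wasn't changed."""
    return hashlib.sha256(f"{secret}:{guess}".encode("utf-8")).hexdigest() == commitment


my_commitment, my_secret = commit("147")  # hypothetical guess of total points
print(my_commitment)                      # safe to send before the game
print(check_reveal(my_commitment, "147", my_secret))
print(check_reveal(my_commitment, "150", my_secret))
```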
Record keeping use of hash
You can use a hash to create an immutable record. Situation: You want to record a lot of important information that many people rely on and have access to. Problem: You don't trust all of the users of the system. Someone could change the records without anyone noticing for a long time. Solution: Hash the information. Let everyone that uses the information hash it whenever they want, to check that the hash has not changed. Even the slightest change will be immediately observable by everyone.
Smart contracts
computer program that directly controls the transfer of digital currencies or assets between parties under certain conditions
Typical problem with cryptography
eavesdroppers
Asymmetric (Public Key) Encryption
encryption technology in which a message is encrypted with one key and decrypted with another -Invented in the 1970s (Diffie-Hellman and Rivest-Shamir-Adleman (RSA)) -Very complicated math - the difficulty of factoring the product of two large prime numbers, elliptic curves, etc. (we're not going to try to learn this!). -Requires more computing power than symmetric encryption, so it is not used to encrypt large amounts of data. Instead it encrypts hashes or symmetric keys. -Trivia: Cryptography was classified as munitions by the U.S. government until 1996. -Note on quantum computing: This could jeopardize current algorithms. See https://www.wired.co.uk/article/quantum-computing-explained
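Textbook RSA with tiny primes shows "encrypt with one key, decrypt with another". The numbers are a common classroom example and trivially breakable. Real RSA uses keys of 2048 bits or more, adds padding, and signs a hash of the message rather than the message itself. Python 3.8+ is assumed for `pow(e, -1, phi)`.

```python
# Textbook RSA with tiny primes; insecure, for illustration only.
p, q = 61, 53
n = p * q                # public modulus (3233)
phi = (p - 1) * (q - 1)  # 3120
e = 17                   # public exponent, coprime with phi
d = pow(e, -1, phi)      # private exponent: modular inverse of e

message = 65                       # a message encoded as a number smaller than n
ciphertext = pow(message, e, n)    # anyone can encrypt with the public key (n, e)
recovered = pow(ciphertext, d, n)  # only the private key holder can decrypt

signature = pow(message, d, n)     # applying the private key acts as a signature
print(ciphertext, recovered, pow(signature, e, n) == message)  # anyone can verify with (n, e)
```

Breaking this key only requires factoring 3233 into 61 x 53. The security of real RSA rests on that factoring step being infeasible for huge numbers, which is also why quantum computing is a concern.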
NFTs - Non Fungible Tokens
it's unique and can't be replaced with something else - like a particular trading card or an NBA moment
Symmetric Encryption
the same key is used to encode and decode