IDS 200 Exam 1
Table
The set of records of a particular type, like photos or users
brute force attack
An attack on passwords or encryption that tries every possible password or encryption key.
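A minimal sketch of the idea, assuming a hypothetical lowercase password of at most three characters. The `check` function is an illustrative stand-in for whatever system is being attacked, not a real API.

```python
from itertools import product
import string

def brute_force(check, alphabet=string.ascii_lowercase, max_len=3):
    """Try every candidate up to max_len characters; return (match, attempts)."""
    attempts = 0
    for length in range(1, max_len + 1):
        for combo in product(alphabet, repeat=length):
            attempts += 1
            guess = "".join(combo)
            if check(guess):
                return guess, attempts
    return None, attempts

secret = "cab"  # hypothetical password, for illustration only
found, tries = brute_force(lambda guess: guess == secret)
print(found, tries)
```

Each extra character multiplies the number of candidates by the alphabet size, which is why longer passwords take so much longer to break.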
ROM
(Read-Only Memory) One of two basic types of memory. ROM contains only permanent information put there by the manufacturer. Information in ROM cannot be altered, nor can the memory be dynamically allocated by the computer or its operator.
RDBMS
(Relational Database Management System) A software application that contains tools to manage data, answer queries, create user-friendly forms for data entry, and generate printed reports. Pros: efficient when data is stable (fields are homogeneous for individual objects, new fields are not added); data can be stored to optimize query processing speed. Cons: not flexible, since it is difficult to change the database schema (i.e., add fields) while maintaining relationships between tables; data doesn't always neatly fit a relational schema; designed to run on single servers, which caps scale.
ACID & Performance
-As you might guess, strict adherence to ACID database properties can inhibit performance. -In a very large system with lumps of popular data, users will sometimes have to wait on operations: a user can't comment on a post until another user has finished their comment. -Deadlock: two users simultaneously attempt to post on each other's profiles, each creating their own profile entry. -How much does it matter if a user post on one server isn't immediately echoed to all others?
Security vs Usability
-Because perfect security is impossible, managers must balance usability against security. Costs to weigh: the cost of added inconvenience for system users (i.e., reduced usage, up to abandonment); the added system expenses of security; the expected damage caused by all attack attempts over time. At the ideal point, managers minimize the sum of security costs and attack damage.
Accuracy leads to Revenues
-Better user experience → more time using: more stuff you want to see, fewer posts from peripheral friends. -Pay-per-impression advertising: more time using → more ads viewed; more ads viewed → more $$$. -Pay-per-click advertising: more time using → more ads viewed; better ad-user match → more clicks per view; more clicks on ads → more $$$.
key field
A field in a record that uniquely identifies instances of that record so that it can be retrieved, updated, or sorted
Freshness and aggregation
-More recent data should be emphasized: it is a better reflection of current interests and interactions with friends, but smaller sample size → estimation errors. -Facebook isn't going to update the feed sequence formula for every new interaction: user histories can be aggregated into single variables for simpler & faster calculations, and the aggregate history is usable for ad content.
Encryption
-Storing passwords as plain text in databases is a clear and massive security risk: hackers could potentially download data batches containing associated {id, password} pairs. With encryption, they would only get the masked data.
Caching and Replication
-Worst case for Facebook's network: content is very popular and bandwidth-intensive (e.g., celebrity interviews, video clips from a major sporting event). -However, most viewers won't watch content as it happens but staggered over time. Two-part solution: Caching: make popular content fast to reach. Replication: copy video segments to servers around the world to prevent a bottleneck at the origin server.
Access
-Ensuring you can: view your data; update your data.
Security
-Preventing outsiders from: viewing your data; tampering with your data; blocking your access to your data.
tera
10^12 (trillion) typical hard drive
peta
10^15 (quadrillion) one day's Facebook traffic
exa
10^18 (quintillion) all data of a large business
zetta
10^21 (sextillion) total annual internet traffic
Kilo
10^3 (thousand) Short text file
Mega
10^6 (million) big image
giga
10^9 (billion) long video
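The prefixes above are all powers of ten, so converting between them is just exponent arithmetic. A small sketch follows; the dictionary and function names are illustrative, not from the course material.

```python
# Exponents of ten for each prefix listed in the cards above.
PREFIXES = {
    "kilo": 3, "mega": 6, "giga": 9, "tera": 12,
    "peta": 15, "exa": 18, "zetta": 21,
}

def to_base_units(amount, prefix):
    """Convert e.g. 2 tera-units into base units (2 * 10^12)."""
    return amount * 10 ** PREFIXES[prefix]

def ratio(bigger, smaller):
    """How many of the smaller unit fit into one of the bigger unit."""
    return 10 ** (PREFIXES[bigger] - PREFIXES[smaller])

print(to_base_units(2, "tera"))
print(ratio("peta", "tera"))
```

Each step up the list (kilo, mega, giga, tera, peta, exa, zetta) is a factor of one thousand.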
key-value database
A NoSQL database model that stores data as a collection of key-value pairs in which the value component is unintelligible to the DBMS. -Every value (record, object) has a key, just like the unique key field in an RDBMS table. -The database doesn't enforce any structure on records in terms of fields or formats. -Features: Flexible: changing an object definition has no effect on other objects. Scalable: can easily span storage devices, enabling very large databases. Speed: system can be optimized for non-ACID cases. Query limits: no table JOINs.
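A toy sketch of the key-value idea, assuming an in-memory dictionary in place of a real distributed store. The `KeyValueStore` class is hypothetical; it keeps values as opaque bytes to mimic a DBMS that cannot see inside them.

```python
import json

class KeyValueStore:
    """Maps keys to opaque byte strings; no fields or formats are enforced."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # The store only ever sees bytes, never the record's structure.
        self._data[key] = json.dumps(value).encode("utf-8")

    def get(self, key):
        return json.loads(self._data[key].decode("utf-8"))

store = KeyValueStore()
store.put("user:1", {"name": "Yvonne"})
store.put("photo:7", {"size_kb": 512, "tags": ["beach"]})  # different shape, no schema change
print(store.get("user:1"), store.get("photo:7"))
```

Lookups go through the key only, which is why this model has nothing comparable to a table JOIN.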
Hard Disk Drive
A non-volatile computer storage device containing magnetic disks or platters rotating at high speeds. Readable and writable.
strong password
A password that is difficult to break. Strong passwords should contain uppercase and lowercase letters, numbers, and punctuation symbols.
weak password
A password that is short in length (less than 15 characters) or that uses a common word (princess), a predictable sequence of characters (abc123), or personal information (Braden).
Responsive Design
A way to provide content so that it adapts appropriately to the size of the display on any device. Uses percentages to choose size of content.
Consistency
Actions are processed in sequence and according to any existing database rules
Phishing
An attack that sends an email or displays a Web announcement that falsely claims to be from a legitimate enterprise in an attempt to trick the user into surrendering private information
SQL Injection
An attack that targets SQL servers by injecting commands to be executed by the database.
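A small illustration using Python's built-in `sqlite3` module. The `users` table, the login functions, and the attack string are made up for this sketch. The unsafe version pastes user input into the SQL text; the safe version passes it as parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")

def login_unsafe(name, password):
    # Vulnerable: input becomes part of the SQL command itself.
    query = (
        "SELECT COUNT(*) FROM users "
        f"WHERE name = '{name}' AND password = '{password}'"
    )
    return conn.execute(query).fetchone()[0] > 0

def login_safe(name, password):
    # Parameterized: input is treated as data, never as SQL.
    query = "SELECT COUNT(*) FROM users WHERE name = ? AND password = ?"
    return conn.execute(query, (name, password)).fetchone()[0] > 0

attack = "' OR '1'='1"  # hypothetical injected "password"
print(login_unsafe("alice", attack), login_safe("alice", attack))
```

In the unsafe query the injected text turns the condition into one that is always true, so the login succeeds without the real password.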
Atomicity
Any set of related database operations either {all happen} or {all don't happen}, e.g., changing one data item plus all its backups.
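A sketch of all-or-nothing behavior using Python's built-in `sqlite3` module. The `accounts` table and the `transfer` function are invented for illustration; the point is that a failure in the second step also undoes the first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()

def transfer(src, dst, amount):
    """Credit dst and debit src as one unit; undo both if either fails."""
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst))
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

def balances():
    return dict(conn.execute("SELECT name, balance FROM accounts"))

# The debit would make the balance negative, so the credit is rolled back too.
print(transfer("a", "b", 500), balances())
```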
ACID
Atomicity, Consistency, Isolation, Durability
Auto Size
Automatic growth in depth or width of a text frame dependent on the amount of text it contains.
Dynamic RAM (DRAM)
Cheap, obsolescent
Thick Client
Client machines relatively powerful, like complete desktops
thin client
Client machines relatively weak, like terminals without permanent storage
Facebook feed
Content stream: Posts (friends & friends of friends) Ads (Facebook & others)
Big Data
Data quantities requiring special tools or systems to manage
Replication
Decentralization: multiple copies of data are stored at separate server groups worldwide. When necessary, requests can be forwarded to other server groups. Advantages of decentralization: less latency; locally important data can be made more accessible (e.g., local news or language); a system failure only affects the local system; smaller database size facilitates updates.
DASD
Direct Access Storage Device. Mix of sequential, indexed, and direct access. Examples: hard drives, optical drives.
Facebook Revenue streams
Direct sources: ads (per click or per impression); premium site content (e.g., original programming); fees from site-enabled transactions. Indirect sources: more users; higher engagement.
Edge Rank
EdgeRank is Facebook's algorithm that is used to determine the ranking of various fanpages and to what extent messages from those fanpages are shown (including with fans) in the timeline. The higher the EdgeRank of your page, the more people/fans will see your posts.
In-Memory Grids/Database Systems
Essentially, an in-memory grid is the same RAM as in your computer, but there's a lot more of it. The system can operate at electronic speeds instead of physical speeds: up to literally a million times faster. Backup: volatile memory is erased when power is lost, presenting a major reliability concern. Although NVRAM systems are in development, backup is usually done on hard drive systems.
Cloud System
Essentially, the operational requirements are broken into segments and allocated to computers. Cloud details are hidden from users for security and ease of use.
Synchronous RAM (SDRAM)
Faster than ordinary dynamic RAM. Much more expensive. DDR levels indicate speeds relative to DRAM.
Haystack
Initial problem: Facebook has lots of pictures and two options: hard disk storage (cheap but slow) or going through another CDN (expensive). Solution: restructure the retrieval process. Instead of multiple database calls per image, embed the metadata in the image URL and arrange photos in albums, which runs much faster.
non-volatile memory
Memory stored on a chip which does not lose data when the power is turned off, e.g., ROM.
Matching feed to user preferences
Like Netflix, Facebook relies primarily on preferences revealed via user actions. Many users will falsely claim preferences in line with an idealized version of themselves, so don't ask, just observe. With matching preferences, the basic rule is to provide content similar to past interactions: users whose posts you've commented on; if you watch videos, you'll get more videos. Also captured in profile dimensions.
Hybrid Database System
Most organizational data is historic and rarely used, so instant response time is less important. Rarely used data is stored on hard drives; frequently used data is stored in RAM for fast access. Backups may be synchronous or asynchronous. Example: Oracle In-Memory Database Cache, a hybrid system with RAM & hard drives that balances cost and speed.
Isolation
No database operations affect others directly; each is executed the same regardless of any other operations occurring in parallel
flash memory
Non-volatile random access memory (NVRAM). Very slow for performing computations. Used for storage & retrieval, e.g., USB drives.
Hadoop
Open-source software framework that enables distributed parallel processing of huge amounts of data across many inexpensive computers. A large-scale number-crunching system.
Security Threats
Physical: the physical facilities and devices on which data are stored. Personnel: the employees entrusted with maintaining adherence to security policies. Software: the computerized rules for providing system security. (Perfect security is impossible.)
Thick applications
Processing mostly happens on the client
Thin Application
Processing mostly happens on the server, results returned over the network
Static RAM (SRAM)
RAM chips that retain information without the need for refreshing, as long as the computer's power is on. More expensive than traditional DRAM. Doesn't require constant refreshing. Faster & lower power consumption than DRAM. Commonly used for high-performance CPU caches: L1/L2 for a particular CPU/core, L3 shared among CPUs/cores.
RAM
Random Access Memory; temporary memory. RAM is expandable and resides on the motherboard. Data is accessed in approximately the same time regardless of its location. Faster access but much more expensive. Examples: main memory, USB drives, solid-state drives.
Row (record)
Represents a particular thing of the table's type, like a specific photo or user
Query
Retrieve data (typically retrieve data more than write)
SAM
Sequential Access Memory; storage accessed sequentially, e.g., a cassette tape or a CD. Requires an initial "seek" delay to find the starting point. Used to be cheap but slow: used for storage, not computations. Surviving example: tape drives for extreme conditions.
Striping
Splits data, instructions, and information across multiple drives in the array. Increases speed for reads and writes
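A simplified sketch of striping, assuming byte strings stand in for drives and a made-up chunk size of 4 bytes. Real RAID controllers work on disk blocks; the `stripe` and `unstripe` names are illustrative.

```python
def stripe(data: bytes, n_drives: int, chunk: int = 4):
    """Deal fixed-size chunks of data across the drives in round-robin order."""
    drives = [bytearray() for _ in range(n_drives)]
    for i in range(0, len(data), chunk):
        drives[(i // chunk) % n_drives].extend(data[i:i + chunk])
    return [bytes(d) for d in drives]

def unstripe(drives, total_len: int, chunk: int = 4):
    """Rebuild the original data by reading chunks back in the same order."""
    out = bytearray()
    offsets = [0] * len(drives)
    i = 0
    while len(out) < total_len:
        d = i % len(drives)
        out.extend(drives[d][offsets[d]:offsets[d] + chunk])
        offsets[d] += chunk
        i += 1
    return bytes(out)

data = b"ABCDEFGHIJKLMNOP"
drives = stripe(data, n_drives=2)
print(drives, unstripe(drives, len(data)))
```

Because consecutive chunks live on different drives, they can be read or written at the same time, which is where the speed gain comes from.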
volatile memory
Storage (such as RAM chips) that is wiped clean when power is cut off from a device.
Map-Reduce
a technique for harnessing the power of thousands of computers working in parallel
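A word-count sketch of the map, shuffle, and reduce steps. It runs on one machine here; in a real system each phase would be spread across many computers, which is assumed rather than shown. The function names and sample documents are illustrative.

```python
from collections import defaultdict

def map_phase(document):
    """Map: turn one document into (word, 1) pairs."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine each key's values into one result."""
    return {key: sum(values) for key, values in groups.items()}

documents = ["big data big systems", "data systems scale"]
mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle(mapped))
print(counts)
```

The map step needs no coordination between documents, so each document could be handled by a different computer.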
Durability
The database is robust against failures
Asymmetric Encryption (Two-Key Encryption)
The essential idea is that Yvonne and Zooey each have two "keys" to lock & unlock messages. Public & private keys: the names refer to usage. A message encrypted with the public key is decrypted with the private key; a message encrypted with the private key is decrypted with the public key. RSA encryption algorithm: keys are derived from large primes. Each user shares the public key and keeps the private key private. This model is extensible to a pool of n users, requiring 2n total keys, rather than n^2 - n for unique pairs.
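A toy RSA sketch with deliberately tiny primes (61 and 53), chosen only so the numbers are readable. Real keys use primes hundreds of digits long, and real systems add padding that is omitted here. The last two functions restate the key counts from the card above.

```python
# Toy key generation (requires Python 3.8+ for the modular inverse via pow).
p, q = 61, 53                 # tiny primes, for illustration only
modulus = p * q               # 3233
phi = (p - 1) * (q - 1)       # 3120
e = 17                        # public exponent, coprime with phi
d = pow(e, -1, phi)           # private exponent: inverse of e modulo phi

public_key = (e, modulus)     # shared with everyone
private_key = (d, modulus)    # kept private

def apply_key(message: int, key) -> int:
    """Encrypt or decrypt a number (smaller than the modulus) with a key."""
    exponent, mod = key
    return pow(message, exponent, mod)

message = 65
locked = apply_key(message, public_key)       # anyone can lock
unlocked = apply_key(locked, private_key)     # only the owner can unlock
print(locked, unlocked)

def keys_with_public_private(users: int) -> int:
    return 2 * users                          # one pair per user

def keys_with_unique_pairs(users: int) -> int:
    return users ** 2 - users                 # count stated in the card

print(keys_with_public_private(10), keys_with_unique_pairs(10))
```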
Caching
The local storage of frequently needed files that would otherwise be obtained from an external source. Keep the stuff you use a lot where it's easy to reach. "Hot" data: used often/recently. "Cold" data: not used often/recently.
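One common way to keep hot data close is a least-recently-used (LRU) cache, sketched below. The `LRUCache` class and its `fetch` callback (standing in for the slow external source) are hypothetical; the card itself does not name a specific eviction policy.

```python
from collections import OrderedDict

class LRUCache:
    """Keeps the most recently used items; evicts the coldest when full."""

    def __init__(self, capacity, fetch):
        self.capacity = capacity
        self.fetch = fetch          # slow lookup from the external source
        self.items = OrderedDict()  # ordered from coldest to hottest
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.items:
            self.hits += 1
            self.items.move_to_end(key)      # mark as recently used
            return self.items[key]
        self.misses += 1
        value = self.fetch(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # drop the least recently used
        return value

cache = LRUCache(capacity=2, fetch=lambda key: f"content of {key}")
for key in ["a", "b", "a", "c", "a", "b"]:
    cache.get(key)
print(cache.hits, cache.misses, list(cache.items))
```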
Columns (fields)
The various attributes stored regarding each record in a table, such as: Photos: size, location, poster. User: name, password, profile picture.
Hot data
Used often/Recently
Server
a computer or computer program that manages access to a centralized resource or service in a network.
denial of service attack (DDoS)
a cyber attack in which an attacker sends a flood of data packets to the target computer, with the aim of overloading its resources
spear phishing
a phishing expedition in which the emails are carefully designed to target a particular person or organization
Adaptive Design
a process that adjusts content to the screen size of a device used to access a webpage
NoSQL
NoSQL databases aren't as strict as RDBs (a table might allow flexible fields for different records, or individual fields can be composite rather than atomic). They are typically more efficient for storage, but you also typically lose the ability to do cross-table queries. Flexibility: easy to add or remove data fields. Accommodates different data sizes or structures. Auto-sharding: distributed operation across servers.
Modify
change data within a record
Clickbait
content whose main purpose is to attract attention and encourage visitors to click on a link to a particular web page
encryption algorithm
Converts plain text to scrambled gibberish using a fixed pattern. Key point: reversible! Related: hashing (scrambling into groups, not necessarily reversible or with a 1-to-1 correspondence).
Metadata
data that describes other data
Cold Data
data that is rarely accessed and therefore stored on an organization's slowest storage option
Facebook ACID Solution
Facebook has in many cases chosen a policy of "eventual consistency": the system will ultimately be consistent, but if a person is commenting while you are watching a video, you may not be able to view the comment initially.
Content Management System (CMS)
information systems that support the management and delivery of documents including reports, web pages, and other expressions of employee knowledge
Redundant Arrays of Independent Disks (RAID)
Involves using parallel disks that contain redundant elements of data and applications. If one disk fails, the lost data are automatically reconstructed from the redundant components stored on the other disks. A group of physically independent hard drives with a single I/O interface, which allows operation as a single logical unit.
Mirroring
Making copies of files on different disks. Doesn't inherently add time, but it adds to the system load, so the RAID system can be much slower if busy.
parity
Including error-checking information for any file. Writes can be substantially slower when combined with mirroring.
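One common parity scheme is a bytewise XOR across disks, sketched below as an assumption about how the error-checking could work; the card does not specify a scheme. The disk contents and the `xor_blocks` helper are made up for illustration.

```python
def xor_blocks(*blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

disk_a = b"DATA"
disk_b = b"MORE"
parity = xor_blocks(disk_a, disk_b)   # stored on a third disk

# If disk_b fails, XOR of the surviving disk and the parity rebuilds it.
recovered_b = xor_blocks(disk_a, parity)
print(recovered_b)
```

Every write must also update the parity block, which is one reason writes are slower than reads in such a setup.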
read & write memory
memory that can be read and written.
Insert (table or record)
To add a new table to the database, or a new record to a table
multifactor authentication
the use of two or more types of authentication credentials in conjunction to achieve a greater level of security
delete (table or record)
To remove a table from the database, or a record from a table
Hashing
transforming plaintext of any length into a short code called a hash
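A short illustration using SHA-256 from Python's standard `hashlib` module, picked here as one example of a hash function. The `short_hash` helper is illustrative. Note that real password storage normally adds a salt and a deliberately slow hash, which this sketch leaves out.

```python
import hashlib

def short_hash(plaintext: str) -> str:
    """Return the SHA-256 hash of the text as 64 hexadecimal characters."""
    return hashlib.sha256(plaintext.encode("utf-8")).hexdigest()

short_text = "hi"
long_text = "a much longer piece of plaintext " * 100

# The hash length is fixed regardless of the input length.
print(len(short_hash(short_text)), len(short_hash(long_text)))

# The same input always gives the same hash; different inputs differ.
print(short_hash("password1") == short_hash("password1"))
print(short_hash("password1") == short_hash("password2"))
```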