SQL/databases/servers/OS/VM/MongoDB
operating system
the software that supports a computer's basic functions, such as scheduling tasks, executing applications, and controlling peripherals
delay-line memory
think of tube filled with mercury, send a pressure wave down (= 1), at any point there can be many pressure waves(data) memorized in the tube!
creating pattern to use with LIKE keyword
think regular expression! i.e. a% = start with a %a = end with a %or% = or in any position _r% r in second position a_% start with a and at least 2 characters in length
having clause
to filter the results grouped by a GROUP BY clause. Can only reference aggregate functions or column that are also specified in the GROUP BY clause.
union vs union all
union all allows duplicates
update
update certain entries, make sure to use where clause to stop ALL records in table from being updated UPDATE table_name SET column1 = value1, column2 = value2, etc. WHERE condition;
Count(*)
use with group by to just count number of entries grouped together
system call
way for programs to interact with kernel
page split
when you insert a record into a full page (especially an index page) and you have to split the records between a new page and the old page to balance the index again expensive!
Transact-SQL (T-SQL)
the primary means of programming and managing SQL Server. It exposes keywords so that you can create and manage databases and their components and monitor and manage the server itself. programming extension from Microsoft that adds features to SQL has local variables, mathematical operation functions, etc. passes turing completeness test ! SQL does not
common SQL commands
SELECT - extracts data from a database UPDATE - updates data in a database DELETE - deletes data from a database INSERT INTO - inserts new data into a database CREATE DATABASE - creates a new database ALTER DATABASE - modifies a database CREATE TABLE - creates a new table ALTER TABLE - modifies a table DROP TABLE - deletes a table CREATE INDEX - creates an index (search key) DROP INDEX - deletes an index **not case sensitive
union syntax
SELECT City FROM Customers UNION SELECT City FROM Suppliers combines results from two select statements
join example syntax
SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDateFROM Orders INNER JOIN Customers ON Orders.CustomerID=Customers.CustomerID;
EXISTS syntax
SELECT column_name(s) FROM table_name WHERE EXISTS(SELECT column_name FROM table_name WHERE condition);
a database can have multiple
tables, each with its own name
alias
temporary name for table/column use AS keyword
EXISTS keyword
tests for existence of any record in a subquery
where keyword
the keyword that specifies the criteria that records must match to be included in the results
operators in where clause
=, >, <, >=, <=, not equal, between(within a range), like, in(specify multiple values for a column, think tuple)
relational database
A group of database tables that is connected or linked by a defined relationship that ties the information together (i.e. rows are records, and columns are fields)...data is stored as TABLES You can imagine for a medical patient, their data is spread over multiple tables (one for phone #, allergies, etc.)
bind variable
A placeholder for SQL literal values, such as numbers or character strings (like if u want to accept inputs/parameters at run time)
database vs table
A table is an object inside a database. ... database is a collection of several components like tables, indexes, stored procedures and so on. A table is a two dimensional structure that contains several columns and rows. It is contains all the data in form of several records.
SQL wildcards
A wildcard character can be used to substitute for any other character(s) in a string. In SQL server... % (zero or more characters) _ (single character) [] (represents any single character within brackets) --> h[oa]t ^ (represents any character not in brackets) -> h[^oa]t - (range of characters, like c[a-b]t)
how to alter table in database
ALTER TABLE table_name OPERATION (i.e. ADD columnname datatype) or DROP COLUMN col_name
where conditional keywords
AND, OR, NOT (for further specifying what types of entries you want to select, etc.)
SQL join
An SQL command that temporarily joins data from more than one table in order to allow simultaneous access to both tables (INNER) JOIN: Returns records that have matching values in both tables LEFT (OUTER) JOIN: Returns all records from the left table, and the matched records from the right table RIGHT (OUTER) JOIN: Returns all records from the right table, and the matched records from the left table FULL (OUTER) JOIN: Returns all records when there is a match in either left or right table
SQL injection
An attack that targets SQL servers by injecting commands to be manipulated by the database (i.e. user input, you input 1=1 which is always true, which affects what values are selected if there is an OR statement) Also OR = is always true (may user input this) Also 105; DROP TABLE suppliers (creates a new batch, a group of SQL statements separated by semicolons...) use parameters to prevent injection (via @ symbol)
SQL contraints
Constraints can be specified when the table is created with the CREATE TABLE statement, or after the table is created with the ALTER TABLE statement. specifies rules for data that goes into table common ones include: NOT NULL - Ensures that a column cannot have a NULL value UNIQUE - Ensures that all values in a column are different PRIMARY KEY - A combination of a NOT NULL and UNIQUE. Uniquely identifies each row in a table FOREIGN KEY - Uniquely identifies a row/record in another table CHECK - Ensures that all values in a column satisfies a specific condition DEFAULT - Sets a default value for a column when no value is specified INDEX - Used to create and retrieve data from the database very quickly
CASE syntax
CASE WHEN condition 1 THEN result 1 WHEN condition2 THEN result2 WHEN conditionN THEN resultN ELSE result END;
stored procedure syntax
CREATE PROCEDURE procedure_name @param1_name data type, #param2_name datatype) AS operation (i.e. SELECT * FROM CUSTOMERS WHERE City = @City) GO; Then to run it use EXEC procedure_name @param1 = value1, @param2 = value2
CREATE UNIQUE INDEX
CREATE UNIQUE INDEX index_name ON table_name (column1, column2, ...); no duplicate values allowed
delete syntax
DELETE FROM table_name WHERE some_column=some_value; or just DELETE FROM table_name(deletes all rows without deleting table itself)
how to delete table in database
DROP TABLE tablename TRUNCATE TABLE tablename deletes data inside table
insert into select syntax
INSERT INTO table2SELECT * FROM table1WHERE condition;
SQL server ISNULL()
ISNULL(column_name, alternative value) allows you to return alternate value when expression is null
connection between server and database
If you want your web site to be able to store and retrieve data from a database, your web server should have access to a database-system that uses the SQL language.
returning only certain # of records
MySQL uses SELECT column_name(s) FROM table_name WHERE condition LIMIT number; versus SQL server syntax: SELECT TOP number|percent column_name(s) FROM table_name WHERE condition;
AS keyword
One way to name a column is by using the AS keyword, although it is optional, it enhances readability
steps to opening a db connection
Open a connection to the database using the database driver. Open a TCP socket for CRUD operations Perform CRUD operations over the socket. Close the connection. Close the socket very expensive if all these steps have to happen all the time!
INSERT INTO SELECT
SELECTs data from one table and INSERTs it INTO an existing table, rows in target table are unaffected
optical disk
Secondary storage device using lasers to read (and write) data to a reflective surface. For storing files to be distributed or transferred or for backup of important files (i.e. CD and DVD)
types of pooling
Session pooling: One of the connections in the pool is assigned to a client until the timeout is reached. Transaction pooling: Similar to session polling, it gets a connection from the pool. It keeps it until the transaction is done. If the same client wants to run another transaction, it has to wait until it gets another transaction assigned to it. Statement pooling: Connection is returned to the pool as soon as the first query is completed.
MySQL schema vs database
Synonymous But usually, schema=collection of tables while database =collection of schemas, ao database schema represents logical configuration of database
ALL operator
The ALL operator returns true if all of the subquery values meet the condition. SELECT column_name(s) FROM table_name WHERE column_name operator ALL(SELECT column_name FROM table_name WHERE condition); i.e.SELECT ProductName FROM Products WHERE ProductID = ALL (SELECT ProductID FROM OrderDetails WHERE Quantity > 99); so like a conditional return (may not always return anything, such as when ANY or ALL returns false)
ANY operator
The ANY operator returns true if any of the subquery values meet the condition. SELECT column_name(s)FROM table_name WHERE column_name operator ANY(SELECT column_name FROM table_name WHERE condition); so like a conditional return (may not always return anything, such as when ANY or ALL returns false)
DEFAULT
The DEFAULT constraint is used to provide a default value for a column. The default value will be added to all new records IF no other value is specified.
Stored procedure
basically a function, so you don't have to rewrite queries
virtual machine
a computer existing within another computer, physical computer is host, virtual is guest, VM uses host system resources
server
a computer or computer program that manages access to a centralized resource or service in a network It is a software or hardware device that receives requests over a network and responds, such as by sending requested info from a resource
floppy disk
a flexible removable magnetic disk, typically encased in hard plastic, used for storing data.
MS SQL Server
a microsoft database server that hosts relational databases...has a database engine (controls data storage) and server operating system (memory, I/O management, job scheduling, data locking...), also has network interface layer to facilitate request/response interactions with database servers
virtual memory
because finite amount of RAM, need way to prevent running out...uses hard disk space for additional memory allocation **INDIRECTION, MAPPING VIRTUAL TO PHYSICAL program has its own address space, broken into pages where each page is contiguous range of addresses pages mapped onto physical memory but some go to hard drive rather than RAM if not necessary...what's mapped to the hard drive changes (oldest thing on RAM is moved to hard drive when sth new needs RAM) if shared memory is needed, can map both program addresses to same RAM memory location! otherwise, program addresses are isolated (mapped to diff RAM spaces, not necessarily contiguous for any given program)
null field
blank field (use IS NULL or IS NOT NULL to test for nullness)
ACID
key properties of a relational db atomicity - all queries must succeed, if one fails all should rollback consistency isolation - can my inflight transaction see changes made by other transactions? durability
how to solve index fragmentation
1) rebuild it (will need more disk space as you are copying the original, and then deleting the original) 2) reorganize it (will require lots of log space due to logging, but won't require more disk space as the pages are reordered to be contiguous)
database connection pool
As scale of app increases, opening and closing a DB connection for every query really adds up and takes too much time, so a connection pool of opened connections is maintained and these connections get passed between operations, saving a lot of resources (though keeping connections open also takes up a lot of resources...) if each connection = its own thread, then Too few connections in the pool will result in high latency as each operation waits for a connection to open up. But too many connections to the pool can also result in high latency as each connection thread is being run concurrently by the system. The time it takes for many threads to complete concurrently is typically higher than the time it takes a smaller number of threads to run sequentially due to context switching while running concurrently When a new request to access data from the backend service comes in, the pool manager checks if the pool contains any unused connection and returns one if available. If all the connections in the pool are active, then a new connection is created and added to the pool by the pool manager. When the pool reaches its maximum size, all new connections are queued until a connection in the pool becomes available.
AUTO INCREMENT
Auto-increment allows a unique number to be generated automatically when a new record is inserted into a table. Often this is the primary key field that we would like to be created automatically every time a new record is inserted.
Non-Relational Databases
Developed to handle large data sets of data that is not easily organized into tables, columns, and rows i.e. image for medical patients, all their info is stored as a document about that patient vs many tables in relational databases Types: document database (stores data similar to JSON objects, each document contains pairs of fields/values) key-value databases (every item is a key-value) wide-column stores: stores data in tables/rows/dynamic columns, but better than relational cuz not every row has same # columns or even same column name for every row...like 2D key-value database graph database: store data in nodes/edges, edges are for relationships **mongoDB is a popular NoSQL database
guid vs pk
GUIDs can be considered as global primary keys. Local primary keys are used to uniquely identify records within a table. On the other hand, GUIDs can be used to uniquely identify records across tables, databases, and servers.
CREATE VIEW
In SQL, a view is a virtual table based on the result-set of an SQL statement. A view contains rows and columns, just like a real table. The fields in a view are fields from one or more real tables in the database. You can add SQL functions, WHERE, and JOIN statements to a view and present the data as if the data were coming from one single table. CREATE VIEW view_name AS SELECT column_name(s) FROM table_name WHERE condition i.e. easy visualization
ORM
Object Relational Mapping...allows u to interact with ur database using language of choice instead of SQL....so SQLAlchemy is an ORM... benefits: abstracts away the database system (MySQL vs PostgreSQL), leverages language fluency, more features cons: learning pain sometimes, configuration can be difficult
peripheral
Peripheral management controls peripheral devices by sending them commands in their own computer language (i.e. device connected to main computer)
hypervisor (virtual machine manager [VMM])
Software that runs on a physical computer and manages one or more virtual machine operating systems.
CASE statement
The CASE statement goes through conditions and returns a value when the first condition is met (like an IF-THEN-ELSE statement). So, once a condition is true, it will stop reading and return the result. If no conditions are true, it returns the value in the ELSE clause. If there is no ELSE part and no conditions are true, it returns NULL
helpful mathematical methods
The COUNT(col_name) function returns the number of rows that matches a specified criterion. The AVG(col_name) function returns the average value of a numeric column. The SUM(col_name) function returns the total sum of a numeric column.
group by
The GROUP BY statement groups rows that have the same values into summary rows, like "find the number of customers in each country". The GROUP BY statement is often used with aggregate functions (COUNT, MAX, MIN, SUM, AVG) to group the result-set by one or more columns. creates frequency distributions! i.e. SELECT COUNT(CustomerID), Country FROM Customers GROUP BY Country; will group records with the same country and count each of their customer ID's
insert into statement
The INSERT INTO statement is used to insert new records in a table. INSERT INTO table_name VALUES (value1,value2,value3,...); *inserts for all columns, make sure order of values match order of columns INSERT INTO table_name (column1,column2,column3,...) VALUES (value1,value2,value3,...); *inserts only some values for a given entry
order by keyword
The ORDER BY keyword is used to sort the result-set by one or more columns. The ORDER BY keyword sorts the records in ascending order by default. To sort the records in a descending order, you can use the DESC keyword. SELECT column_name, column_name FROM table_name ORDER BY column_name ASC|DESC.... will do alphabetical order by default for strings
SELECT INTO
The SELECT INTO statement selects data from one table and inserts it into a new table. SELECT * INTO newtable [IN externaldb] FROM table1; SELECT column_name(s) INTO newtable [IN externaldb] FROM table1;
dialect
The dialect of the database is simply a term that defines the specific features of the SQL language that are available when accessing that database...it is how you communicate with a specific type of database
primary key
The field selected as a unique identifier for the database (uniquely defines each record)...primary keys must contain unique values are cannot have null values
SQL (Structured Query Language)
a popular language to run operations on databases (inherently for relational databases: NoSQL is for non-relational)
random access memory
access any aspect of memory at any time (in contrast to cyclic or delay line memory, where you have to wait for ur data to show up in the cycle)
unique vs primary key constraint
both primary key and unique guarantee uniqueness primary key automatically has unique contraint can have many unique contraints per table, only 1 primary key per table
SQL keywords are not
case sensitive
how does splitting hashtables/databases across machines to save memory work?
certain keys go to certain machines which can make it unbalanced however, as long as load is balanced, you can easily parallelize while knowing exactly which data is on which machine
transaction
collection of queries one unit of work it is atomic either all operations will succeed or be rolled back start transaction and commit (remember INFO 2300)
mongoDB organization
collections (equivalent of relational tables), and each record is called a document documents use JSON like formatting ex. db.collection.insertOne({name: "sue", age: 25})
shrink database
compact database pages aka move records into empty spaces on pages
metadata
container holding many features of a database (including on tables and child objects)
CREATE DATABASE databasename
create a new database
CREATE INDEX
create index on table (like back of book) for easier searching (speeds up select and where statements, slows insert/updates) essentially creates copy of table but indexed/sorted as specified CREATE INDEX index_name ON table_name (column1, column2, ...);
CREATE TABLE table_name ( column_1 data_type, column_2 data_type, column_3 data_type);
creates a new table in database
DROP DATABASE databasename
delete database
isolation read phenomena
dirty read - you read a value that another transaction hasn't committed...issue with updates non-repeatable reads - you read the same value twice, the second time after an update was committed (or similar situation), so ur second read is different...then your reads were not repeatable phantom reads - reading new inserted values before you meant to lost updates!
database collation
how data is sorted and compared in a database. Collation provides the sorting rules, case, and accent sensitivity properties for the data in the database. For example, when you run a query using the ORDER BY clause, collation determines whether or not uppercase letters and lowercase letters are treated the same.
sql server fill factor
how full each index page will be upon creation too full and you'll definitely need to add pages when records get inserted too low and you risk wasting space, especially if you have very few additions to the index, also bad read performance because more pages to read mostly reads = higher fill factor is probably good mostly writes = lower fill factor is better
index fragmentation
kind of like external fragmentation: pages are out of order (logical order of index pages does not match physical order) sometimes used to refer to internal fragmentation: each page has lots of empty space (microsoft calls this page density issue instead) as you add records to indices, you would need to add more pages eventually, but they may not be contiguous with the original index pages or if you delete a record, you would leave empty space on a page (microsoft classifies this as page density not index fragmentation) this leads to poor performance because internal fragmentation leads to more pages than necessary (ideally have all pages be 100% full, not <100%), so this is more disk reads/writes! slower! also poor performance because external fragmentation = random disk access rather than sequential, which is often slower for many hard drives
how low page density (internal fragmentation, sometimes classified under index fragmentation, but not by microsoft) degrades performance
leads to more levels in tree index = longer traversal times less relevant records get cached, as only a certain number of pages can fit in a cache more page reads required may result in a different query plan being chosen as the disk I/O increases and the cost of this plan is worse and worse
construct
like a word/syntactical thingy
CHECK keyword
limit value range that can be in column
magnetic core memory
magnetic donuts(cores) with wire loop, magnetized in certain directions to store 1's and 0's...donuts arranged into grids
hard disk
magnetic surface, allowing read/write heads to operate on it (sequential memory access though cuz disks are spinning)
solid state
no moving parts, uses integrated circuits (tiny chip!)
kernel vs operating system
operating system = package of datasoftware managing resources of a system kernel is the important program in the OS, connection between hardware and software
sharded database
physical sharding = splitting a database across multiple physical instances logical sharding = splitting up a table into multiple tables horizontal sharding = moving rows from one table into multiple tables vertical sharding = storing different columns of the same logical table into different tables while sharding is complex and queries can be harder to manage, it can speed up queries bc u can search each shard in parallel, rather than searching all rows in one monolithic table...then merge the results!
mysqli object oriented vs procedural
prefer object oriented over procedural (series of steps) php is moving towards object oriented
isolation levels
read uncomitted = no isolation, any change from outside is visible to transaction read committed = each query only sees committed stuff repeatable read = each query in transaction only sees committed updates at start of transaction [reading SAME VERSION within a transaction] serializable
foreign key
relates two tables together foreign key is a field/collection of fields in one table that refers to primary key of another table table with foreign key = child table, table with candidate key = referenced or parent table The FOREIGN KEY constraint is used to prevent actions that would destroy links between tables. The FOREIGN KEY constraint also prevents invalid data from being inserted into the foreign key column, because it has to be one of the values contained in the table it points to.
MIN(col_name) and MAX(col_name)
returns smallest or biggest field in a column works for strings (A = smallest)
order by several columns
selects all customers from the table, sorted by the A and the B column. This means that it orders by A, but if some rows have the same A, it orders them by B
storing objects in SQL databases
serialize the object as JSON, XML, or even Pickle (depending on what database accepts), then send that to database (will be stored as a string, but then you can retrieve and rebuild object w/ the string)
server vs mainframe
server manages resources and responds to requests on the network meanwhile mainframe is a workhorse that controls and handles a large userbase (meant for large-scale data processing or data storage) >> high volume of transactions
SQL comments
single line -- multiline is /* and */
database schema
skeleton structure representing the logical view of entire database, showing how data is organized and relations among them
database driver
software that connects a given application program to a particular DBMS mySQLi is the new improved driver (i = improved) implements protocol for connecting to database
odbc
standard api for accessing database management services