2 - Database Architectures
What is a datamart?
This is basically a small scale datawarehouse which focuses on a particular subject matter(engine room) and commonly department wide and contains less complex queries.
Advantages of a datawarehouse?
- Consolidation - single query engine for multiple datasources - Isolation - It does not interfere with OLTP performance - History - it can store much larger datasets for longer period of time than operational oltp providing access to more historical data - Consistency - it has a single data model regardless of the data source model - Performance - Excellent Query responsiveness with no impact on production data sources - Value - Access to better information for better business intelligence.
what are the features of an OLTP?
- Constant transactions(queries and updates) accounting and online store databases - Used for operational data - stores data in the short term(2-6 months) - Mostly GB of storage - It has many users - It has robust, constant data protection through snapshots and fault tolerance.
Datawarehouse pros?
- Faster data retrieval - Resolve inconsistencies - Better Business intelligence
What are the features of wide columnar stores?
- For this type of nosql database the key-value pair is contained within a column - Columns are grouped into families which are similar to the standard relational tables however names and format can vary from row to row - Column families are further grouped into super column families - These stores provide high performance and can be easily scaled.
NoSQL Pros?
- Goes beyond SQL - Rapid development - Horizontal scaling - Better for large datasets
What are the features of Graph databases?
- It uses graph structure to store, map and query relationships - consists of nodes which represents items in a db and edges which represents relationships. - They provide index free adjacency so adjacent elements are linked without using an index
What are the types of NoSQL databases?
- Key - value stores - Document stores - Wide column stores
What are the features of OLAP?
- Periodic large updates and complex queries like daily or weekly updates - Used for business intelligence and decision support - Consolidates data - Usually involves long term retention of data 2-5 years - Terabyte to petabyte scale - Usually fewer users - Periodic protection and backup of data
RDBMS pros?
- Proven technology - Widely adopted - Less risk - Very good transactional latency - Outstanding for smaller datasets
how does RDBMS and NOSQL storage?
- RDBMS is Best on traditional storage - Nosql is best designed for distributed hierarchical storage platforms and preferably for large datasets.
how does RDBMS and NOSQL scale?
- Rdbms offers vertical scaling -Nosql offers horizontal scaling
how does RDBMS and NOSQL handle Queries?
- Rdbms uses powerful sql queries - nosql uses not so powerful unsql
RDBMS CONS?
- Scalability(increasing hardware capacity) - fixed schema - Hardware bound vertical scaling
What are the features of document stores?
- Similar to key value stores - Value is a single document and popular documents can be indexed for fast retrieval without knowing the keys - The structure is flexible
What are features of a key-value store
- They use key to access stored values - They are the most flexible nosql model - They are efficient and scalable - They are not effective as transactional databases
Datawarehouse Cons?
- Time consuming to construct - Additional security concerns - Complex to maintain
NoSQL cons?
- Variable consistency - Transactional latency - Immature tech
what are the different types of data sitting in a datawarehouse?
- metadata - Summary data (reporting) - Raw data(data mining)
where does a data warehouse gets all its data from?
A datawarehouse aggregates all its data from various sources using ETL.
What are flat files?
A flat-file database is a database stored in a file called a flat file. Records follow a uniform format, and there are no structures for indexing or recognizing relationships between records. The file is simple. A flat file can be a plain text file, or a binary file
What is a NoSQL database?
A term used to describe high-performance, non-relational databases. NoSQL databases use a variety of data models, including document, graph, key/value, and columnar.
What is Acid compliance?
Atomicity - Transactions are all or nothing no partial transactions Consistency - Only valid transactions are saved isolation - Transactions do not affect each other Durability - Written data will not be lost
Are all acid compliance needed in a rdbms?
Not all 4 are compulsory at the same time but can be designed as eventual compliance
How does RDBMS AND NOSQL handle transactional applications?
Relational - Best fit for high load, complex transactional applications Nosql - Best fit for lighter transactional loads but with greater requirements for flexibility and size.
How does RDMS and NOsql handle acid compliance?
Relational - Sql emphasizes acid compliance Nosql - Acid compliance is generally compromised in favor of flexibility and performance at scale. - acid is eventually achieved through design - it excels at brewer cap theorem(consistency, availability and partition theorem.
What is RDBMS?
Relational Data Base Management Systems (RDBMS) are database management systems that maintain data records and indices in tables. Relationships may be created and maintained across and among the data and tables. In a relational database, relationships between data items are expressed by means of tables. Interdependencies among these tables are expressed by data values rather than by pointers. This allows a high degree of data independence. An RDBMS has the capability to recombine the data items from different files, providing powerful tools for data usage. (Read more here)
What is DataWarehousing?
Subject-oriented, which means that the data in the database is organized so that all the data elements relating to the same real-world event or object are linked together; Time-variant, which means that the changes to the data in the database are tracked and recorded so that reports can be produced showing changes over time; Non-volatile, which means that data in the database is never over-written or deleted, once committed, the data is static, read-only, but retained for future reporting. Integrated, which means that the database contains data from most or all of an organization's operational applications, and that this data is made consistent.
what is brewer cap theorem?
The CAP theorem applies a similar type of logic to distributed systems—namely, that a distributed system can deliver only two of three desired characteristics: consistency, availability, and partition tolerance (the 'C,' 'A' and 'P' in CAP).