Database and data storage
What is Data Warehousing
A system used for reporting as well as data analysis. Data comes from multiple heterogeneous sources. Historical data.
Smart Business Applications
Embedded in the users workflow the application can alert when criteria set by business are crossed. Automate analysis results and distribution
What is a fact table
Fact table is central table found in star schema or snowflakes schema which is surrounded by dimension tables.Fact table contains numeric values that are known as measurements.Fact table has two types of columns: 1.Facts 2.Foreign key of dimension tables.
What is granularity
Granularity in a table represents the level of information stored. High granularity is at a transaction level, low granularity describes data not at a detailed level.
What is a snowflake schema
Snowflake schema is a form of dimensional modeling where dimensions are stored with multiple dimension tables. In a Snowflake schema dimensions are normalized. Normalization splits up data in to additional tables.
Why is NoSQL faster than RDBMS
The DB doesn't have to do all of these complex operations and locking across much of the dataset, so it's really easy to partition the thing across many servers/disks/whatever and have it work really fast.
What are the measurement types of a fact table
The measures in a fact table are of three types : 1.Additive : Measures that can be added across any dimension 2.Non-additive: Measures that can not be added across any dimension 3.Semi-additive: Measures that can be added across some dimensions.
What does RDBMS offer that NoSQL does not
Things like automatic enforcement of referential integrity, transactions, etc. ( ACID )
Federated database system
This is a type of meta-database management system (DBMS), which transparently maps multiple autonomous database systems into a single federated database. The constituent databases are interconnected via a computer network and may be geographically decentralized.
What is universe in Business Analytics
Universe is the interface layer between the client and the data warehouse. Universe defines the relationship between various tables in the data warehouse. Hides the tables structure from the user
Describe OLAP ( Online Analytical Processing)
.OLAP is used for complex calculations,Trends Analysis,sophisticated data modeling. Key purpose is to decrease query time and increase effectiveness
Four steps of Dimensional Modeling
1) Choose business process -- describe the business process 2) Declare grain -- a sentence describing the focus of the model. ex an individual line item from customer slip 3) Identify the dimensions -- nouns ( date, product, store, time ) 4) Identify the facts -- the numeric facts / measures that the users will need
What are 4 rules in dimensional modeling
1) Load detailed atomic data - better to have than not. aggregated/summerized data can be present for performance 2)structure dimensional models around business processes. Processes are measurable events. Find these and record at a fine grain. 3) Ensure each fact table has a date dimension 4) Ensure all fact tables have same grain / level of detail
Example of aggregates
1.Average 2.Count 3.MAX 4.MIN 5.Median 6.SUM
Stages of BI
1.Data Source 2.Data Analysis 3.Decision making support 4.Situation Awareness 5.Risk Management
What are benefits BI offers
1.Improving Decision Making 2.Accelerate Decision Making 3.Optimizing internal business process 4.Increase Operational Efficiency 5.Driving new revenues 6.Gaining competitive advantages over competitors
What is a schema?
Schema is logical description of whole database.Database schema is a skeleton or structure of the database which represents database logically.
The acronym ACID stands for ?
Atomicity, Consistency, Isolation, and Durability − commonly known as ACID properties − in order to ensure accuracy, completeness, and data integrity.
The acronym ACID stands for ?
Atomicity, Consistency, Isolation, and Durability − commonly known as ACID properties − in order to ensure accuracy, completeness, and data integrity. Atomicity : each transaction is single unit Consistency: a transaction will bring a DB to a valid state Isolation: State of DB is whether a trans in a group or not. Durability: states are saved even if system failure. Write to non-volatile disk
The acronym BASE stands for what
BASE (Basically Available, Soft State, Eventual consistency) principles.
Describe what scale up is
Buy a bigger box. problem: Pay four to five times what you paid for your previous solution, and only get twice the performance.
What are different Data Warehouse systems
Data Mart Online Analytical Processing (OLAP) Predictive Analysis
What is a Data Mart
Data Mart is a simplest set of Data warehouse which is used to focus on single functional area of the business
Describe Dimensional Modeling
Design method to create DW
What is a dimension table?
Dimension table is table which describes the business entities of an enterprise -- they describe the the objects in a fact table. Dimension table has primary key which uniquely identifies each dimension row.
What is a star schema
In a Star schema there is a Fact table at the center with dimension tables surrounding it .
Describe scale out
In a scale-out architecture a distributed set of nodes is used as the basic architecture. It provides highly elastic scaling capability, enabling you to add nodes to handle load on the fly.
Slowly changing dimensions
Is the technique used to manage attribute changes in a dimension over time. SCD type 1: Overwriting. New overwrites current. SCD type 2:Create another dimension row. Retains history of values. SCD type 3:holds both current and last value. SCD type 4:history table created of dimension SCD type 6: combine 1,2 & 3 ( 1+2+3=6). "overwrite" by marking the current row, add another row to history, thus curr and most recent are represented
Drawbacks of sharding
Need logical key to shard on. If cannot distribute your traffic evenly. Application logic that needs to operate across shards or maintain referential integrity has poor performance.
What is Business Intelligence
Provides the user with data and tools to answer questions which are important to run business or part of a business.
What is sharding
Scale database past the single-instance by sharding, or partitioning data across multiple servers.
What is immutable
doesn't change
describe ranking functions in SQL queries (rank, dense rank, ntitle, row_number)
functions that rank results using following syntax : SELECT Names, Grade, ROW_NUMBER () OVER (ORDER BY Grade DESC) as ROW_NUMBER, RANK () OVER (ORDER BY Grade DESC) as RANK, DENSE_RANK () OVER (ORDER BY Grade DESC) as DENSE_RANK, NTILE(3) OVER(ORDER BY Grade desc) AS NTILE FROM dbo.Grades row_number will list (1..n); rank = (1,2,2,4...); dense rank =(1,2,2,3...); ntitle puts into three groups ( 1,1,2,2,3...)
What is embedded analytics
the integration of analytic content and capabilities within business process applications. It provides relevant information and analytical tools designed for the task at hand so users can work smarter and more efficiently in the applications they use every day.
