BigData Mock Interview Training


What is the difference between batch data and stream data?

Batch data is a collection of data points that have been grouped together within a specific time interval (another term often used for this is a window of data) and are processed together. Stream data arrives continuously and is processed as it arrives; stream processing is key to turning big data into fast data.
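As a rough illustration of the distinction above, the sketch below totals the same events two ways: grouped into fixed time windows (batch) and one event at a time (stream). The event list, the 10-second window, and the function names are assumptions made up for this example.

```python
from collections import defaultdict

# Hypothetical events: (timestamp_in_seconds, value)
events = [(1, 10), (4, 20), (11, 5), (14, 15), (23, 30)]

def batch_totals(events, window_seconds):
    """Group events into fixed windows and total each window."""
    totals = defaultdict(int)
    for ts, value in events:
        window_start = (ts // window_seconds) * window_seconds
        totals[window_start] += value
    return dict(totals)

def stream_totals(events):
    """Handle events one at a time, yielding an updated running total."""
    running = 0
    for _, value in events:
        running += value
        yield running

print(batch_totals(events, 10))     # {0: 30, 10: 20, 20: 30}
print(list(stream_totals(events)))  # [10, 30, 35, 50, 80]
```

The batch version produces a result only once a window is complete, while the stream version has an up-to-date answer after every event.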

What is a fact table?

A fact table is the central table in a star schema of a data warehouse. A fact table stores quantitative information for analysis and is often denormalized. A fact table works with dimension tables.

What is a "factless" fact table?

A factless fact table is a fact table that does not have any measures. It is essentially an intersection of dimensions (it contains nothing but dimensional keys). ... For example, you can have a factless fact table to capture student attendance, creating a row each time a student attends a class.
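The attendance example can be sketched with Python's built-in sqlite3 module. The table names, column names, and sample rows are hypothetical; the point is only that the fact table holds keys and nothing to sum, so analysis is done by counting rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_student (student_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_class   (class_key   INTEGER PRIMARY KEY, title TEXT);
-- Factless fact table: only dimension keys, no numeric measures.
CREATE TABLE fact_attendance (
    student_key INTEGER REFERENCES dim_student(student_key),
    class_key   INTEGER REFERENCES dim_class(class_key),
    date_key    TEXT
);
INSERT INTO dim_student VALUES (1, 'Ana'), (2, 'Ben');
INSERT INTO dim_class VALUES (10, 'SQL Basics');
INSERT INTO fact_attendance VALUES
    (1, 10, '2024-01-08'), (2, 10, '2024-01-08'), (1, 10, '2024-01-15');
""")

# One row per attendance event, so COUNT(*) answers "how often did each student attend?"
attendance_counts = conn.execute("""
    SELECT s.name, COUNT(*)
    FROM fact_attendance f
    JOIN dim_student s ON s.student_key = f.student_key
    GROUP BY s.name
    ORDER BY s.name
""").fetchall()
print(attendance_counts)  # [('Ana', 2), ('Ben', 1)]
```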

What is "snowflaking"?

A method of normalizing the dimension tables in a star schema. In data warehousing, snowflaking is a form of dimensional modeling in which dimensions are stored in multiple related dimension tables. A snowflake schema is a variation of the star schema. Snowflaking reduces redundancy in the dimensions, usually at the cost of extra joins at query time.

What is the difference between primary key vs. foreign key?

A primary key is a column (or group of columns) that uniquely identifies each record in a relational database table, so its values must be unique. A foreign key is a column or group of columns in a relational database table that provides a link between data in two tables by referencing the primary key of a parent table. ... A value in the parent table cannot be deleted while a foreign key still references it.
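A minimal sketch of both constraints using Python's sqlite3 module follows. The department and employee tables are hypothetical, and note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`; other databases enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT,
        dept_id INTEGER REFERENCES department(dept_id)
    )
""")
conn.execute("INSERT INTO department VALUES (1, 'Analytics')")
conn.execute("INSERT INTO employee VALUES (100, 'Ana', 1)")

def try_sql(sql):
    """Run a statement and report whether a constraint rejected it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

# Primary key: a second employee with emp_id 100 is not allowed.
duplicate_pk = try_sql("INSERT INTO employee VALUES (100, 'Ben', 1)")
# Foreign key: dept_id 99 does not exist in the parent table.
orphan_fk = try_sql("INSERT INTO employee VALUES (101, 'Cy', 99)")
# Foreign key: the parent row is still referenced by employee 100.
delete_parent = try_sql("DELETE FROM department WHERE dept_id = 1")

print(duplicate_pk, orphan_fk, delete_parent)
```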

What is a schema design in data modeling?

A schema is an overall description of a database, and it is usually represented by the entity relationship diagram (ERD). There are many subschemas that represent external models and thus display external views of the data.

What is an aggregated table?

Aggregate tables are tables that summarize or "roll up" the data to a higher level than a base or derived table, for example daily rows rolled up to monthly rows. Besides sums, aggregate tables can also store other functions such as average, count, min, and max.
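One way to picture a roll-up is a daily sales table summarized into a monthly table. The sketch below uses sqlite3 with made-up table names and rows; real warehouses would usually refresh such a table on a schedule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_daily (sale_date TEXT, store TEXT, amount REAL);
INSERT INTO sales_daily VALUES
    ('2024-01-05', 'A', 100.0),
    ('2024-01-20', 'A', 50.0),
    ('2024-02-02', 'A', 70.0),
    ('2024-01-11', 'B', 30.0);

-- Aggregate table: daily rows rolled up to one row per month and store.
CREATE TABLE sales_monthly AS
SELECT substr(sale_date, 1, 7) AS month,
       store,
       SUM(amount) AS total_amount,
       COUNT(*)    AS sale_count
FROM sales_daily
GROUP BY substr(sale_date, 1, 7), store;
""")

monthly = conn.execute(
    "SELECT month, store, total_amount, sale_count "
    "FROM sales_monthly ORDER BY month, store"
).fetchall()
for row in monthly:
    print(row)
```

Queries that only need monthly figures can then read the small `sales_monthly` table instead of scanning every daily row.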

What is your understanding of denormalization in your own words?

Denormalization is a strategy used on a previously-normalized database to increase performance. In computing, denormalization is the process of trying to improve the read performance of a database, at the expense of losing some write performance, by adding redundant copies of data or by grouping data.

What is a snowflake schema?

In computing, a snowflake schema is a logical arrangement of tables in a multidimensional database such that the entity relationship diagram resembles a snowflake shape. The snowflake schema is represented by centralized fact tables which are connected to multiple dimensions. "Snowflaking" is a method of normalizing the dimension tables in a star schema. When it is completely normalized along all the dimension tables, the resultant structure resembles a snowflake with the fact table in the middle. The principle behind snowflaking is normalization of the dimension tables by removing low cardinality attributes and forming separate tables.

What is "check constraint"?

A rule on a column or table that restricts which values are accepted, such as an input/field range check, e.g., values between 0 and 9.
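The 0 to 9 range check can be tried out with sqlite3. The `rating` table and `score` column are hypothetical names chosen for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE rating (
        id    INTEGER PRIMARY KEY,
        score INTEGER CHECK (score BETWEEN 0 AND 9)
    )
""")

conn.execute("INSERT INTO rating (score) VALUES (7)")  # within range, accepted

try:
    conn.execute("INSERT INTO rating (score) VALUES (12)")  # outside 0-9
    out_of_range = "accepted"
except sqlite3.IntegrityError:
    out_of_range = "rejected"

row_count = conn.execute("SELECT COUNT(*) FROM rating").fetchone()[0]
print(out_of_range, row_count)  # rejected 1
```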

What is the difference between normalization & denormalization?

Normalization is used to remove redundant data from the database and to store non-redundant and consistent data into it. ... Denormalization is used to combine multiple table data into one so that it can be queried quickly.
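A small sketch with plain Python data structures may make the trade-off concrete. The customer and order records are invented for illustration, and `denormalize` is a hypothetical helper that plays the role of a pre-computed join.

```python
# Normalized: each customer's city is stored once and referenced by key.
customers = {
    1: {"name": "Ana", "city": "Lima"},
    2: {"name": "Ben", "city": "Oslo"},
}
orders = [
    {"order_id": 100, "customer_id": 1, "amount": 25},
    {"order_id": 101, "customer_id": 1, "amount": 40},
    {"order_id": 102, "customer_id": 2, "amount": 15},
]

def denormalize(orders, customers):
    """Copy customer attributes onto every order row (a pre-computed join)."""
    return [{**order, **customers[order["customer_id"]]} for order in orders]

orders_wide = denormalize(orders, customers)

# Reads on the wide table need no lookup, but "Lima" is now stored twice.
lima_total = sum(row["amount"] for row in orders_wide if row["city"] == "Lima")
city_copies = sum(1 for row in orders_wide if row["city"] == "Lima")
print(lima_total, city_copies)  # 65 2
```

If Ana moved to another city, the normalized form needs one update while the wide form needs one per order row, which is the write cost that denormalization accepts in exchange for faster reads.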

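What is the difference between OLTP vs. OLAP system?

OLTP (online transaction processing) systems handle many short transactions that insert and update current operational data. OLAP (online analytical processing) systems run analytical queries over historical data, typically aggregated and loaded in batches from OLTP sources. In short: OLAP = batch aggregate of OLTP.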

What is a datamart?

Portion of data warehouse that is department specific.

Give an example of an in-memory analytics platform that you have used.

Spark

What are the types of schema in dimensional modeling?

Star Schema: A star schema is one in which a central fact table is surrounded by denormalized dimension tables. A star schema can be simple or complex: a simple star schema consists of one fact table, whereas a complex star schema has more than one fact table.
Snowflake Schema: A snowflake schema is a variation of the star schema in which dimensions are normalized into additional related tables. Snowflake schemas are useful when there are low-cardinality attributes in the dimensions.
Galaxy Schema: A galaxy schema contains many fact tables with some common dimensions (conformed dimensions). This schema is a combination of many data marts.
Fact Constellation Schema: The dimensions in this schema are segregated into independent dimensions based on the levels of hierarchy. For example, if geography has five levels of hierarchy (territory, region, country, state, and city), a constellation schema would have five dimensions instead of one.

Difference between star vs. snowflake schemas?

Star and snowflake schemas are similar at heart: a central fact table surrounded by dimension tables. The difference is in the dimensions themselves. In a star schema each logical dimension is denormalized into one table, while in a snowflake, at least some of the dimensions are normalized.
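To see the difference in the dimensions, the sketch below models the same product dimension both ways in sqlite3 and runs the same report against each. All table names, columns, and rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star: one denormalized product dimension (category repeated per product).
CREATE TABLE dim_product_star (
    product_key INTEGER PRIMARY KEY, product TEXT, category TEXT);

-- Snowflake: category normalized out into its own table.
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_product_snow (
    product_key  INTEGER PRIMARY KEY, product TEXT,
    category_key INTEGER REFERENCES dim_category(category_key));

CREATE TABLE fact_sales (product_key INTEGER, amount REAL);

INSERT INTO dim_product_star VALUES (1, 'Pen', 'Office'), (2, 'Desk', 'Furniture');
INSERT INTO dim_category VALUES (1, 'Office'), (2, 'Furniture');
INSERT INTO dim_product_snow VALUES (1, 'Pen', 1), (2, 'Desk', 2);
INSERT INTO fact_sales VALUES (1, 2.0), (1, 3.0), (2, 150.0);
""")

# Star: one join from the fact table to the dimension.
star_result = conn.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product_star p ON p.product_key = f.product_key
    GROUP BY p.category ORDER BY p.category
""").fetchall()

# Snowflake: an extra join to reach the normalized category table.
snowflake_result = conn.execute("""
    SELECT c.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product_snow p ON p.product_key = f.product_key
    JOIN dim_category c ON c.category_key = p.category_key
    GROUP BY c.category ORDER BY c.category
""").fetchall()

print(star_result)       # [('Furniture', 150.0), ('Office', 5.0)]
print(snowflake_result)  # same rows, reached through one more join
```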

What is a star schema?

In computing, the star schema is the simplest style of data mart schema and is the approach most widely used to develop data warehouses and dimensional data marts. The star schema consists of one or more fact tables referencing any number of dimension tables.

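What is a lambda function in Python?

An anonymous inline function limited to a single expression; it can substitute for a short named function, while multi-line logic needs a named function defined with def.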
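A short sketch of a lambda next to its named equivalent; the word list and the `word_length` name are just illustrative.

```python
words = ["pear", "fig", "banana"]

# A lambda is a single expression, written inline where a function is expected.
by_length = sorted(words, key=lambda w: len(w))

# The equivalent named function; def is required once the body
# needs more than one expression or several statements.
def word_length(w):
    return len(w)

by_length_named = sorted(words, key=word_length)

print(by_length)                     # ['fig', 'pear', 'banana']
print(by_length == by_length_named)  # True
```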

What is "PYTHONPATH"?

An environment variable listing additional directories where Python looks for modules and packages.
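The effect can be observed by launching a child interpreter with PYTHONPATH set. In this sketch the module name `greet_demo` and its contents are hypothetical, and a temporary directory stands in for a real project folder.

```python
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as extra_dir:
    # Hypothetical module that lives outside the default search path.
    with open(os.path.join(extra_dir, "greet_demo.py"), "w") as f:
        f.write("MESSAGE = 'hello from PYTHONPATH'\n")

    # Start a new interpreter with PYTHONPATH pointing at that directory.
    env = dict(os.environ, PYTHONPATH=extra_dir)
    result = subprocess.run(
        [sys.executable, "-c", "import greet_demo; print(greet_demo.MESSAGE)"],
        env=env, capture_output=True, text=True, check=True,
    )
    imported_message = result.stdout.strip()

print(imported_message)
```

Directories from PYTHONPATH end up in `sys.path` of the new interpreter, which is why the import succeeds there even though the module is not installed.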

Which one is normalized vs. which one is denormalized?

Normalized data has low redundancy of information; denormalized data has high redundancy.

What is "first normal form"?

Every column holds only atomic (indivisible) values, and there are no repeating groups.
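As a sketch of what "atomic values, no repeating groups" means in practice, the code below takes rows whose `phones` field packs several values into one string and splits them into one value per row. The student records and the `to_first_normal_form` helper are hypothetical.

```python
# Not in 1NF: the "phones" field packs a repeating group into one value.
unnormalized = [
    {"student_id": 1, "name": "Ana", "phones": "555-0101, 555-0102"},
    {"student_id": 2, "name": "Ben", "phones": "555-0199"},
]

def to_first_normal_form(rows):
    """Emit one row per phone number so every field holds a single value."""
    atomic_rows = []
    for row in rows:
        for phone in row["phones"].split(","):
            atomic_rows.append({
                "student_id": row["student_id"],
                "name": row["name"],
                "phone": phone.strip(),
            })
    return atomic_rows

rows_1nf = to_first_normal_form(unnormalized)
for row in rows_1nf:
    print(row)
```

The repeated `name` value in the output is a redundancy that higher normal forms would remove; 1NF itself only requires that each field be atomic.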

