CHOI Final Exam

Réussis tes devoirs et examens dès maintenant avec Quizwiz!

For every dimension under consideration, two questions must be answered:

Question 1: Can the dimension table be useful for the analysis of the chosen subject? Question 2: Can the dimension table be created based on the existing data sources?

Data Manipulation Language: SELECT

Retrieve data based on conditions

Two tailed test:

Non-directional test which considers two possibilities: the purchase probability of treatment group could be more than or less than that of control group

Data manipulation FUNCTION

Retrieve data, Edit, Update, Add, delete

Data Manipulation Language terms

SELECT INSERT UPDATE DELETE

SELECT Statement:

SELECT * FROM Order_Table WHERE order_quantity >3;

IN(): example

SELECT * FROM coupon_usage WHERE discount_rate IN (10, 25);

Explicit JOIN notation

SELECT * FROM employee JOIN department ON employee.DepartmentID=department.DepartmentID SELECT LastName, CourseTitleFROM student JOIN course ON student.studentID= course.studentID

Implicit JOIN

SELECT * FROM employee, department WHERE employee.DepartmentID=department.DepartmentID

HAVING Statement example

SELECT discount_rate , COUNT( member_ID ) FROM coupon_usage GROUP BY discount_rate HAVING discount_rate >20;

SELECT Statement:

SELECT field_name_1, [field_name_2, ...,] FROM table_name_1, [table_name_2, ...,] WHERE Condition GROUP BY field_name HAVING Condition ORDER BY field_name [ASC or DESC];

Data manipulation TYPES

SELECT, INSERT, UPDATE, DELETE

HAVING Statements:

Similar to WHERE Statements, it confines the query into specific conditions Used after GROUP BY function USE HAVING (NOT WHERE) after GROUP BY function!!! • You CANNOT use WHERE after GROUP BY function!!!

DISTINCT Statement:

Syntax to use to select one record from duplicate records

Dimensional modeling

-A data design methodology used for designing subject-oriented analytical databases, such as data warehouses or data marts

Subject-oriented

-An operational database system is developed in order to support a specific business operation -A data warehouse is developed to analyze specific business subject areas

LIKE "%~%":

-Applies to string values -You can select the records which contain specific keywords

Outer Join

List the Customer ID number, registered date, and order number for all customers. Include all customers even if the customer does not have an order.

ORDER BY Statement:

-Returns the results sort by order -Syntax to control the order of output -Default: Ascending order (use DESC to sort them in decreasing order)

Outer join

-Rows that do not have matching values in common columns are still included in the result -Can use left outer join or right outer join

Data warehouse components

-Source systems -Extraction-transformation-load (ETL) infrastructure -Data warehouse -Front-end applications

Integrated

-The data warehouse integrates the analytically useful data from the various operational databases (and possibly other sources) -Integration refers to this process of bringing the data from multiple data sources into a singular data warehouse.

Structured repository

-The data warehouse is a database containing analytically useful information -Any database is a structured repository with its structure represented in its metadata

Inner/Natural join

-The query compares each row of table A with each row of table B to find all pairs of rows which satisfy the join condition. -Rows must have matching values in order to appear in the result.

Data Manipulation Language: INSERT

Insert values into the table

Data Manipulation Language: UPDATE

Modify stored data

Operational information (transactional information)

-the information collected and used in support of day to day operational needs in businesses and other organizations

data warehouse.

A ______________ is created within an organization as a separate data store whose primary purpose is data analysis.

retrieval of analytical information

A data warehouse is developed for the ________________, and it is not meant for direct data entry by the users

Null hypothesis:

A statement that no difference exists between the parameter and the statistic being compared to it.

ALTER TABLE statement:

ALTER TABLE table_name ADD field_name datatype <field conditions>; ALTER TABLE table_name ADD CONSTRAINT condition_name; ALTER TABLE table_name DROP field_name; ALTER TABLE table_name DROP CONSTRAINT condition_name;

Join

An operation that causes two or more tables with a common field to be combined into a single table or view

IN():

Applies to the discrete values

CREATE DATABASE example

CREATE DATABASE dbname

Data definition TYPES

CREATE, ALTER, DROP

Alter table

Change structure (i.e., adding new fields) or change constraints

Two approaches to hypothesis testing:

Classical statistics Bayesian statistics

Data Control Language (DCL)

Commands that control a database

Data Definition Language (DDL)

Commands that define a database

Data Manipulation Language (DML)

Commands that maintain and query a database

dimensional modeling

Commonly, ____________ is employed as a relational data modeling technique

Declaring a Primary Key

Create Table Broker (BrokerID Int not null primary key, LastName varchar(255) not null, FirstName varchar(25), GPA varchar(5));

Data Manipulation Language: DELETE

Delete the records in the data

star schema

Designing the _____________ involves considering which dimensions to use with the fact table representing the chosen subject

One-tailed test:

Directional test which considers only one possibility (either greater than or less than)

Data control TYPES

GRANT, REVOKE, COMMIT, ROLLBACK

Data definition FUNCTION

Generate data structure or delete

Data control FUNCTION

Grant or revoke permissions to database users and recover transactions

INSERT Statement

INSERT INTO Table_Name VALUES (value1, value2....);

INSERT Statement example

INSERT INTO Table_Name_1(Fields_Name_1) SELECT Fields_Name_2FROM Table_Name_2WHERE Conditions;

INSERT INTO

The __________ statement is used to insert new records in a table.

GROUP BY

The ____________ statement is often used with aggregate functions (COUNT, MAX, MIN, SUM, AVG) to group the result

INSERT INTO

The ____________ statement is used to insert new records in a table.

changes

The data in the data warehouse is not subject to _______.

non-volatile, static, read-only

The data in the data warehouse is referred to as____________ , ________, or __________

retrieval

The only functionality available to the users of the data warehouse is _________

Hypothesis Testing

The purpose of ___________ is determine the accuracy of your hypotheses by using statistics.

time variant

The term _____________ refers to the fact that a data warehouse contains slices or snapshots of data from different periods of time across its time horizo

Update Statement

UPDATE Book SET Stock = 10 WHERE Book_num = 5;

GROUP BY

When you use ___________ function, please make sure you select the fields which have the records unique to the ___________ level

the data warehouse =

a structured repository of integrated, subject-oriented, enterprise-wide, historical, and time-variant data. The purpose of the data warehouse is the retrieval of analytical information. A data warehouse can store detailed and/or summarized data.

Foreign keys

are identifiers that enable one table to refer to another table - Primary key from one table becomes the foreign key of the other - Foreign keys cannot have values that the primary key in other table does not have → Referential Integrity Rule

Primary keys

are unique identifiers of the table -This is how we can guarantee that all instances are unique - Use EntityNameID if not a "natural" primary key -Keys can be simple (a single attribute) or composite (more than one attribute). -No primary key attribute may be null. All primary key fields must have data → Entity Integrity Rule

Data Definition Language (DDL) examples

creating, altering, and dropping tables and establishing constraints

CREATE DATABASE

defines a database

CREATE TABLE

defines a new table and its columns

Dimensions Facts

dimensional modeling distinguishes two types of tables:

star schema

in the __________, the chosen subject of analysis is represented by a fact table

Data Control Language (DCL) examples

including administering privileges and committing data

GROUP BY Statement:

is often used with aggregate functions (COUNT, MAX, MIN, SUM, AVG) to group the result

Alternative hypothesis:

logical opposite of the null hypothesis.

ERD to SQL

look at slide 17 and 20 will be exactly like the exam... look through slide 18 - 20

Classical statistics:

represents an objective view of probability in which the decision making rests totally on an analysis of available sampling data. (the hypothesis is rejected or accepted based on the sample data collected)

Bayesian statistics:

represents an subjective probability estimates stated in terms of degrees of belief.

DISTINCT Statement example

select distinct(discount_rate) from coupon_usage;

CREATE TABLE example

tablename(column_name1 data_type ,column_name2 data_type, column_name3 data_type , ....)

Analytical information -

the information collected and used in support of analytical tasks -Analytical information is based on operational (transactional) information

Assumption for t-test:

•The observations must be independent—that is, the selection of any one case should not affect the chances for any other case to be included in the sample. •The observations should be drawn from normally distributed populations. •These populations should have equal variances.


Ensembles d'études connexes

Missed Test questions + Study Cards

View Set

Fitness Assessment/Prescription: EXSC U455

View Set

Ch 19 quiz: Appraisal of Property

View Set

Chapter 7 - Investment Vehicle Characteristics

View Set

Chapter 37 Respiratory adaptive quizzing

View Set

Chapter 7 - Access Control Lists

View Set