ISM 4212

Pataasin ang iyong marka sa homework at exams ngayon gamit ang Quizwiz!

Dimensional modeling

- A data design methodology used for designing subject-oriented analytical databases, such as data warehouses or data marts - Commonly, dimensional modeling is employed as a relational data modeling technique

Data Warehouse

- A typical organization maintains and utilizes a number of operational data sources. - The operational data sources include the databases and other data repositories which are used to support organization day-to-day operations - A data warehouse is created within an organization as a separate data store whose primary purpose is data analysis.

Subject-oriented

- An operational database system is developed in order to support a specific business operation - A data warehouse is developed to analyze specific business subject areas

ANOVA Test

- If we have k-independent-samples tests (i.e., three or more samples) - Uses the variance of observations

Star Schema

- In the star schema, the chosen subject of analysis is represented by a fact table - Designing the star schema involves considering which dimensions to use with the fact table representing the chosen subject

What does an Outer Join do?

- List the Customer ID number, registered date, and order number for all customers. Include all customers even if the customer does not have an order. - SELECT customer.customerid, registerdate, orderid FROM customer LEFT OUTER JOIN b2c_order ON customer.customerid = b2c_order.customerid

Structured repository

- The data warehouse is a database containing analytically useful information - Any database is a structured repository with its structure represented i in its metadata

Assumption for t-test

- The observations must be independent—that is, the selection of any one case should not affect the chances for any other case to be included in the sample. - The observations should be drawn from normally distributed populations. - These populations should have equal variances.

SELECT statement

- WHERE - GROUP BY - HAVING - ORDER BY

Foreign keys

- are identifiers that enable one table to refer to another table - cannot have values that the primary key in other table does not have

Primary keys

- are unique identifiers of the table - Simple (single attribute) or Composite (more than one attribute) - No PK can be null - Guarantee's that all instances are unique

Analytical Information

- the information collected and used in support of analytical tasks - based on operation (transnational) information

Data definition Types

-CREATE - ALTER - DROP

Data control Types

-GRANT - REVOKE - COMMIT - ROLLBACK

Data manipulation Types

-SELECT - INSERT - UPDATE - DELETE

Data warehouse components

-Source systems -Extraction-transformation-load (ETL) infrastructure -Data warehouse -Front-end applications

Alternative Hypothesis (H1)

logical opposite of the null hypothesis

Outer join

- Rows that do not have matching values in common columns are still included in the result - Can use left outer join or right outer join

Integrated

- The data warehouse integrates the analytically useful data from the various operational databases (and possibly other sources) - Integration refers to this process of bringing the data from multiple data sources into a singular data warehouse.

Retrieval of analytical information

A data warehouse is developed for the retrieval of analytical information, and it is not meant for direct data entry by the users

Null Hypothesis (H0)

A statement that no difference exists between the parameter and the statistic being compared to it.

Data Control Language (DCL)

Commands that control a database, including administering privileges and committing data

Data Definition Language (DDL)

Commands that define a database, including creating, altering, and dropping tables and establishing constraints

Data Manipulation Language (DML)

Commands that maintain and query a database

ALTER TABLE

Data Definition types

One-tailed test

Directional test which considers only one possibility (either greater than or less than)

Heteroscedasticity

The variances of population are equal

Left outer Join

left table and then things that correspond in the right table

Join

An operation that causes two or more tables with a common field to be combined into a single table or view

Independence of errors

Each sample group is independent from each other

Referential Integrity Rule

Foreign keys cannot have values that the primary key in other table does not have

Data definition

Generate data structure or delete

Data control

Grant or revoke permissions to database users and recover transactions

Entity Integrity Rule

No primary key attribute may be null. All primary key fields must have data

Two-tailed test

Non-directional test which considers two possibilities: the purchase probability of treatment group could be more than or less than that of control group

T-score < T- critical

Null hypothesis is not rejected

T-score > T- critical

Null hypothesis is rejected

Data manipulation

Retrieve data, Edit, Update, Add, delete

Explicit JOIN notation

SELECT * FROM employee JOIN department ON employee.DepartmentID = department.DepartmentID SELECT LastName, CourseTitle FROM student JOIN course ON student.studentID = course.studentID

Implicit JOIN statment

SELECT * FROM employee, department WHERE employee.DepartmentID = department.DepartmentID SELECT LastName, CourseTitle FROM student, course WHERE student.studentID = course.studentID

Selection Bias

Samples must be randomly selected from normal population

Data Warehouse Definition

The data warehouse is a structured repository of integrated, subject-oriented, enterprise-wide, historical, and time-variant data. The purpose of the data warehouse is the retrieval of analytical information. A data warehouse can store detailed and/or summarized data.

Hypothesis Testing

The purpose of hypothesis testing is to determine the accuracy of your hypotheses by using statistics.

Inner/Natural join

The query compares each row of table A with each row of table B to find all pairs of rows which satisfy the join condition. Rows must have matching values in order to appear in the result.

Time variant

The term time variant refers to the fact that a data warehouse contains slices or snapshots of data from different periods of time across its time horizon

Classical Statistics

represents an objective view of probability in which the decision making rests totally on an analysis of available sampling data. (the hypothesis is rejected or accepted based on the sample data collected)

Bayesian Statistics

represents an subjective probability estimates stated in terms of degrees of belief

Operational information (transactional information)

the information collected and used in support of day to day operational needs in businesses and other organizations


Kaugnay na mga set ng pag-aaral

postoperative medsurg chapter 20 n141

View Set

CHAPTER 9: DNA & the Molecular Structure of Chromosomes

View Set

PSYCHOLOGY - CHAPTER 7: LEARNING

View Set

Managerial communication test- chapter 7, 8

View Set

EMT Chapter 16: Cardiovascular Emergencies Quiz

View Set

Compensation and Benefits Chapter 5

View Set