ITP 487 Midterm 1

¡Supera tus tareas y exámenes ahora con Quizwiz!

Normalization (and the 3 anomalies it removes)

•"Normalization is the process of structuring a relational database ... in order to reduce data redundancy and improve data integrity." •Normalization eliminates modification anomalies 1. Insertion anomaly 2. Update anomaly 3. Deletion anomaly •Anomalies can be removed by splitting the relation into two or more relations; each with a different, single theme •By splitting the relation into several relations, you normalize the database by reducing redundancy. This leads to fewer anomalies. •However, breaking up a relation may create referential integrity constraints

candidate key

•A candidate key is a key that uniquely identifies a row in a relation

Data warehouse

•A data warehouse is a process and technology that brings data from multiple data sources to a target database that support's enterprise's decision-making processes. •"Single source of truth for the enterprise" physically holds the consolidated data from multiple sources in the format defined by the Warehouse Catalog. •Large companies performs business intelligence functions using data warehouses

If CIO reports to COO

-Can be either like CEO or CFO

3 data roles and jobs

1. Chief Data Officer 2. Data Scientist 3. Data Analyst

Multi-Dimensional Analysis - has all the various primary keys in one row. what is the Maximum number of rows if there are 50 products and 100 customers?

50*100 = 5000 rows

entity

An entity is some identifiable thing that users want to track: -Customers -Products -Sales Orders Vendors

Big Data vs Enterprise Data

Big Data •Large and complex data sets •Grows at a high rate (volume & velocity) •Various data types (variety) •Incomplete data (veracity) • •May require more than a standard database -Standard DB/RDBMS - relational database management system Enterprise Data •Data used by large companies •Typically, structured data •Stored in standard databases

Why is CIO important?

CIO is responsible for enterprise systems, and enterprise systems are the ones that enable cross-company changes. CIO sponsor huge consulting projects.

Give an example of a data movement process.

Copy and paste from excel to powerpoint Data Source: Excel Data Staging: Clipboard Data Target: Powerpoint ETL between each data area Download data from a web site and import it into Tableau.

Is an ERP system a transactional system?

ERP system is a transactional system

What is SAP HANA?

SAP's in-memory database that is integrated with SAP 4 HANA products. So HANA compatible only products. Uses SQL. supports database views (virtual tables)

Source System:

Source System: data sourcing or data collections / extracting data

Chief Data Officer

The CDO is the senior executive who bears responsibility on behalf of an organization for enhancing the quality, reliability and access of data and creating value from its data assets and from the external data ecosystem. Value creation comes through data exploitation, enabling all forms of business outcomes through analytics (including data and analytics governance) and enterprise information policies.

Why is SAP HANA highly performant?

because it is: - in-memory - uses columnar data storage - compression which avoids data redundnacy

Why do you need to have a data warehouse?

more efficient to analyze using one unified, central system instead of many disparate systems.

Master Data Tables

normalized with 3 key additional attributes: text, attributes, hierarchies

What if a customer ordered the same product from the same vendor twice? in sap BW

revenue would increase, but you wouldn't make a second row of the same customer, product, and vendor ID.

What is the data structure of the result of SQL select statement?

tabular

OLTP

takes in data from external sources. OLTP isthe data sources. tabular

Data Analytics Process (6 step process)

•Data analytics is a process that involves a)identifying the problem b)gathering relevant data that frequently are not in a usable form c)cleaning up the data to make them usable d)loading them into data storage models e)manipulating them to discover trends and patterns that leads to actionable insights making decisions based on those insights.

Deletion anomaly

•Deletion anomaly: deletion of a row loses information about two or more entities •Example: john hunt retires, deleting we lose information about NY office

Data Warehouse Objectives

•Integrated, company-wide view of high-quality information (from disparate databases) •Standardized structures and representation for all enterprise information, defined by warehouse catalog •Easy to use access, single point of access to all enterprise information •Self service, high quality business reporting and analysis on all levels •High performance environment fed from heterogeneous sources •Unburden systems and IT resources in the transactional environment (e.g. SAP ERP)

Data Scientist

•Intersection of computer science, statistics, and domain knowledge •Involves use of computers to acquire knowledge by analyzing large amounts of data using models and domain expertise

Data Storage Structures

•Tabular Structure: Relational databases à transactional systems •Multi-Dimensional Structure: Data warehouses à informational systems. Are data warehouses relational? •Can the two be combined into one data structure? •Read vs. write optimized •Performance

Data Structure: Tabular vs Multi-Dimensional (OLTP OR OLAP)

•Tabular •(OLTP) transactional -First Row Defines Class -Subsequent rows are instances of that data type -E.g. Spreadsheets, relational databases •Multi-Dimensional (OLAP) informational -Facts surrounded by dimensions for analysis -E.g. online analytical processing systems (e.g. data warehouses) Allows you to slice and dice data - e.g. Revenue by the dimensions

Dimensions (Dimension Tables)

•The fact table by itself does not have sufficient information about the customer, products, and other transaction details; such as name, description, and location... •Details regarding these master data are stored in separate tables called dimension tables. •A dimension table provides a more detailed view of the fact; for example, "What is the total revenue by customer including their name and address?" •Revenue is the fact; Customer number, customer name and address are stored in the customer dimension. •If we were to pull data from the fact table without the data in the dimension table, we would have only the Customer_Number and not the customer's name, which would not be particularly useful.

Third Normal Form (3NF)

•Third Normal Form (3NF) - In 2NF and all of its columns are not transitively dependent on the primary key.

Dependencies (and what does the tables look like*)

Time - a from and a to column Language - has a language key Currency - has a currency key UoM -

SAP HANA Technology Performance & Cost Limitations

used to lack the CPU's and RAM necessary as it was too costly. Also storage. These costs went down so now SAP HANA is possible

Composite Entity

(also known as linking tables)* •Avoid problems inherent to M:N relationship by creating a composite entity, gets rid of many to many relationship. -Includes as foreign keys the primary keys of tables to be linked

Compression techniques

- Dictionary Compression - Prefix Encoding Compression - Run Length Encoding - Indirect Encoding Compression - Sparse Encoding

What is the difference between Classical DB vs Column Data Storage?

- classical DB has column headers then rows of data. Often uses something like 85 characters for 1 row of data. A column data storage has a dictionary and tables filled with specific numbers. Instead of holding long character strings, it holds numbers like 1 or 0 which take up one character. So classical is 85 characters * 6 rows of data = 610 characters. Column data storage of same thing uses 3 characters * 6 rows = 18 characters plus dictionary, but dictionary is negligible amount once we have tons of data. So column data storage is way more efficient. In-memory databases are so expensive, so using column data storage is way more efficient.

Strategy Tool: Gartner - Technology Research

-Magic Quadrants - provides vendor landscape -Hype Cycle - technology maturity -Maturity Models - your organizations maturity on a specific domain -Research and recommendations

Chief Information Officer (CIO)

-Most senior executive responsible for the business systems -May report into the CEO, COO, or CFO

If CIO reports to CFO

-View as a cost savings functions -Continuous drive for efficiency and cost savings -Lower risk acceptance

If CIO reports to CEO...

-Viewed as a strategic asset to the company -Also responsible for product/innovation side of technology -Investments and experimentation to gain strategic advantages over competition -Higher risk acceptance

Explain the need for data analytics (2 causes)

1. Drivers of Business Intelligence Globalization Speeding of global connections Decentralization People price check all the time and find cheapest point Smaller margins Same reason as above Faster reaction to market Right when you put a product on market someone copies it Increasing importance of service 2. Explosion of data •More users •Socialization of everything •Entertainment shift •Mobility •Internet of Things -Sensors -Devices •Cloud and longevity of data •Low-cost storage

2 Advantages of a Star Schema

1. The star-like layout is easy to understand and implement. •A fact table surrounded by dimension tables is one of the simplest database schemas. •Similar to other relational databases, the one-to-many relationship between dimension tables and the fact table is straightforward. •The use of primary keys and foreign keys to link the tables is also familiar. •The star schema is easy for business users to visualize. 2. The star schema has only a single level of JOIN for any query. •The fact table is joined to one or more dimension tables, so the JOINs are easy to build and can be rapidly executed (depending on the number of rows, of course).

transitive dependency

A condition in which an attribute is dependent on another attribute that is not part of the primary key. A functional dependency is said to be transitive if it is indirectly formed by two functional dependencies. For e.g. X -> Z is a transitive dependency if the following three functional dependencies hold true: X->Y Y does not ->X Y->Z

Analytics Tools

Analytics Tools: where analysts perform business analysis

SAP Dates (present, beginning of time)*

Beginning of time is 1/1/1000 End of time (present) is 12/31/9999

Classic Star Schema vs SAP BW Schema Name Mapping

Classic Star Schema vs Sap BW Fact = Key Figure Fact Table = Fact Table Dimension Table = Dimension SAP BW has Master Data Tables with Text, Attributes, and Hierarchies Dimension Attribute = Characteristic we analyze by characteristics instead of dimension attribute

Data Mart (Cube)

Data Mart (Cube): small structures where data is stored

Data Staging

Data Staging: the process of data cleaning and preparation •Before data can be analyzed, they need to be organized in a form that can be accessed by the authorized users. •The process whereby data are organized and prepared for analysis is called data staging. The staging process ensures that data are consistent, relevant, and free of ambiguities and they can be analyzed in a reasonable amount of time

Data Science - what is it? what are the three disciplines? What is Data Analytics in the context of Data Science?

Data science •creates algorithms and models using 1. statistics, 2. computer science, and 3. domain knowledge (ie specific industry like healthcare) •Data Analytics - applies algorithms and models generated from data science

Data Warehouse (DW)

Data warehouse: the process of transforming and preparing data

Decision making

Decisions on major technology projects are typically made by a steering committee •Steering Committees consists of -Executive Sponsor -Key Stakeholders • •Committee Meetings -Kick-off -Updates Strategy (Initial Findings & Recommendations)

What component of strategy gives you legitimacy?

Discovery, which is: -Framework/Taxonomy -Current Landscape ("as is") so looking at competitors

Do OLTP systems use master data or transactional data? Do informational systems use master data or transactional data

OLTP and informational systems use both master and transactional data. In OLTP, customer and vendor tables would be master data, but new order in Order table would be transactional data. In informational systems (OLAP), the table contains master data like customer_id and vendor_id, and also transactional data like revenue and quantity.

5 components of a Data Warehouse

Source System: data sourcing or data collections / extracting data Data Staging: the process of data cleaning and preparation Data warehouse: the process of transforming and preparing data Data Mart (Cube): small structures where data is stored Analytics Tools: where analysts perform business analysis

OLAP components

Staging - where data is cleansed Warehouse catalog- defines the data that goes into the data warehouse ie USC student is defined as USC_ID Warehouse management, which is the way you take care of your laptop when you run out of space, ensures data moves along as you do maintenance. Warehouse management is where you figure out where to store data Data warehouse- •A data warehouse is a process and technology that brings data from multiple data sources to a target database that support's enterprise's decision-making processes. single source of truth for the enterprise. Data Marts - slice of data from the data warehouse for departmental analysis.

Star Schema vs. Snow Flake Schema

Star schemas are denormalized relational databases with one fact table and multiple dimension tables. Snowflake schemas resolve the problems associated with star schemas, and feature Master Data Tables which are Dimension Tables that have been further normalized. Unlike star schemas, snowflake schemas can support historization, which is changing dimensional data over time, as well as support loading of dimensional attributes from various data sources, by splitting dimension tables into more tables (Master Data Tables). This also enables the snowflake schema to support multiple languages, currencies, and units of measure, and also reduces data redundancy.

Unstructured vs Structured Data

Structured data could be interpreted directly by computer while unstructured data needs to be translated in some ways for computer to understand Structured data: Excel / Spreadsheet; Unstructured data: images, videos Structured data is typically organized/modeled, usually in columns and rows, while unstructured data is unorganized. Structured data features fields or cells with fixed-widths, while unstructured data can be differing lengths. Structured data features cells with values that are connected to an input such as name or sales dollars, while unstructured data possesses values that can be free format and unrelated to each other, such as suggestions and comments. Structured data is simple to examine while unstructured data can be difficult to examine. Computers can understand structured data but unstructured data must be translated for a computer to understand its content. Structured data includes dates, structured text, and numeric values while unstructured data can include video, graphics, pictures, audio, and text. CHECK SLIDE ON THIS, SLIDE 295

Transactional Systems vs Informational Systems (5 characteristics each)

Transactional systems: Process transactions quickly Good for quick writing Store current, detailed information Can be updated Support business processes Informational systems: Store summarized data Supports managerial decision making Periodically loaded For quick reading data is historical Transactional systems: Tabular structure Provide a current representation of the state of the business Allow for narrow, planned and simple updates and queries Have many, constant updates and queries on one or a few table rows Used by clerks, salespersons, and administrators Informational systems Characteristics of informational systems Multi-dimensional structure Used by managers, business analysts, and customers Enables complicated ad hoc queries and analysis, The type of data in these systems is historical point-in-time (snapshots) and predictions Periodic batch updates and queries requiring many or all rows

can you use Analytics Tools to go directly into the data warehouse without using data marts?

Yes, you can use Analytics Tools to go directly into the data warehouse without using data marts

What is a Dimension Inside SAP BW?

•A dimension is a group of characteristics which belong to the same business object (semantically). •Each characteristic may contain additional attributes which are relevant for data analysis - or not!!! •Display attributes are InfoObjects for which data analysis does not make sense (such as a house number or a family name) •Navigational attributes are InfoObjects which are used to perform data analysis. •Time-dependent attributes (example: price, payrate, etc.) •Time-independent attributes (example: product weight and volume)

foreign key

•A foreign key is the primary key of one relation that is placed in another relation to form a link between the relations: -A foreign key can be a single column or a composite key -The term refers to the fact that key values are foreign to the relation in which they appear as foreign key values

composite key

•A key is a combination of one or more columns that is used to identify rows in a relation •A composite key is a key that consists of two or more columns

primary key

•A primary key is a candidate key selected as the primary means of identifying rows in a relation: -There is one and only one primary key per relation -The primary key may be a composite key -The ideal primary key is short, numeric and never changes

referential integrity constraint (RIC)

•A referential integrity constraint (RIC) is a statement that limits the values of the foreign key to those already existing as primary key values in the corresponding relation e.g. invoices cannot exist without a customer* CUSTOMER(CustomerNumber, Language Key, Sales Organization, Location, Country, Customer Description) SALES TRANSACTIONS(Year, Month, Day, CustomerNumber, OrderNumber, OrderItem, ProductNumber, SalesQuantity, UnitOfMeasure, Currency, Revenue, Discount) RIC Where SALES TRANSACTIONS.CustomerNumber must exist in CUSTOMER.CustomerNumber

surrogate key

•A surrogate key as an artificial column added to a relation to serve as a primary key: -DBMS supplied -Short, numeric and never changes - an ideal primary key! -Has artificial values that are meaningless to users -Normally hidden in forms and reports

Data Analyst

•Applies the algorithms and models generated from data science •Concerned more with applying models than with creating them

6 Components of a Strategy

•Background & Objectives •Roles & Responsibilities -Executive Sponsor -Key Stakeholders -Subject Matter Experts •Discovery -Framework/Taxonomy -Current Landscape ("as is") •Key Questions/Requirements •Proposal -Decision Criteria -Proposed Landscape ("to be") •Roadmap & Budget

Cardinality

•Cardinality means "count," and is expressed as a number. •Maximum cardinality is the maximum number of entity instances that can participate in a relationship. •Minimum cardinality is the minimum number of entity instances that must participate in a relationship.

What is data analytics?

•Data analytics is a process that enables access to and analysis of information to improve and optimize decisions and performance

4 Shortcomings of Star Schemas

•Duplication of dimensional data (see next slide) •No support for multiple languages •No historization (changing dimensional data) •Hierarchies can cause anomalies •The customer dimension at the right contains two hierarchies -Sales_territory -Country

characteristics

•Each attribute in a dimension table is called a characteristic. • For example, in the Customer table, the customer name and customer address, are characteristics. •These characteristics may have additional attributes describing them in more detail. •The customer name could have salutation, middle initial, previous names etc. The customer address could have details such as county, time zone etc. •The attributes are stored in separate tables called attribute tables. •Some characteristics may have additional textual data describing them. These text oriented attributes are stored in text tables. •Some characteristics may have hierarchies associated with them. They are stored in hierarchy tables. •Some characteristics may have geospatial data associated with them. They are stored in geospatial tables.

Enterprise Data Analytics

•Enterprise Data Analytics is analytics for large companies •Large companies performs business intelligence functions using data warehouses

First Normal Form (1NF)

•First Normal Form (1NF) - Information is stored in a relational table with each column containing an atomic value. There are no repeating groups of columns.

The SQL keyword GROUP BY

•In general, place WHERE before GROUP BY. Some DBMS products do not require that placement, but to be safe, always put WHERE before GROUP BY. •The HAVING operator restricts the groups that are presented in the result. There is an ambiguity in statements that include both WHERE and HAVING clauses.The results can vary, so to eliminate this ambiguity SQL always applies WHERE before HAVING

Insertion anomaly

•Insertion anomaly: insertion of a fact in one entity cannot be done until a fact about another entity is added •We cannot record until we know information for the entire row. We can't record the SalesOffice until we know the salesperson

Master Data vs Transactional Data

•Master Data -Data used across the business processes that rarely changes -Same data is used across the enterprise •Examples -Customer -Product -Vendor -Property •Transactional Data -Data resulting from executing a business process, aka a transaction -Includes data such as dates, prices, delivery terms, etc -Uses Master Data ID (as foreign keys) •Examples -Sales Orders -Purchase Orders -Rental Booking Master data is used throughout the different business processes and rarely changes. This same data is used throughout the enterprise. An example would be product data. In contrast, transactional data is the data produced from executing a business process/transaction. Transactional data consists of data such as delivery items, dates, prices, and other data, and uses foreign keys that are Master Data IDs. An example of transactional data would be sales orders, and transactional data changes very frequently. Transactional data is data recorded as daily transaction activities while the master data records all the transactional data and other important information. Example: Amazon's order data is transactional data but the whole database that includes customers' information, employees' information, transactions, and others is the master data.

SAP BW Schema

•Master Data •Represents Dimensional data •Independent of any Fact table •Reusable in multiple Fact tables •Designed to support multi lingual systems •Designed to support time dependent data. •Designed to improve query performance •Fact table •Consists of Dimensional Keys and Key Figures (facts) •Values that can be added, subtracted, divided, multiplied, etc. •Dimension table •Acts as a link between Master data and Fact table records •Advanced DataStore Object (replacement for InfoCube) •Consists of Fact and Dimensional tables.

Maximum Cardinality

•Maximum cardinality is the maximum number of entity instances that can participate in a relationship. •There are three types of maximum cardinality: -One-to-One [1:1] (rare) -One-to-Many [1:N] (ideal relationship) Many-to-Many [N:M] (has to be resolved)

Minimum Cardinality

•Minimum cardinality is the minimum number of entity instances that must participate in a relationship. •Minimums are generally stated as either zero or one: -IF zero [0] THEN participation in the relationship by the entity is optional, and no entity instance must participate in the relationship. -IF one [1] THEN participation in the relationship by the entity is mandatory, and at least one entity instance must participate in the relationship. -Minimum cardinality of zero [0] indicating optional participation is indicated by placing an oval next to the optional entity. -Minimum cardinality of one [1] indicating mandatory (required) participation is indicated by placing a vertical hash mark next to the required entity. •Look toward the entity in question: -IF you see an oval THEN that entity is optional (minimum cardinality of zero [0]). -IF you see a vertical hash mark THEN that entity is mandatory (required) (minimum cardinality of one [ 1]).

Subqueries

•Often necessary to process data based on other processed data •Subquery is a query inside a query, normally inside parentheses •First query is the outer query •Inside query is the inner query •Inner query is executed first •Output of inner query is used as input for outer query •Sometimes referred to as a nested query

Physical vs. Virtual Data

•Physical (Persistent) -Data is physically stored locally -Accessed directly -E.g. Database table •Virtual -Data is stored elsewhere -Virtual has the instructions on how to access in real time -E.g. Database view

Relation

•Relational databases store data about entities in relations, which are a special type of table •A relation is a two-dimensional table that has the following characteristics: •Rows contain data about an entity •Columns contain data about attributes of the entity •All entries in a column are of the same kind •Each column has a unique name •Cells of the table hold a single value •The order of the columns is unimportant •The order of the rows is unimportant •No two rows may be identical A RELATION IS A TABLE NOT A RELATIONSHIP

SAP BW - Snowflake Model

•SAP BW uses a snowflake model instead of the star schema •Further normalization and expansion of the dimension tables in a star schema results in a snowflake design. •This reduces data redundancy which reduces storage space but increases the number of joins. •A dimension is said to be snowflaked when the low-cardinality columns in the dimension have been removed to separate normalized tables that then link back into the original dimension table.

Second Normal Form (2NF)

•Second Normal Form (2NF) - In 1NF and all columns depend on the table's primary key.

Star Schema

•Simplest style of data warehouse •are denormalized relational databases with one fact table and multiple dimension tables •Facts - numeric values that are being analyzed •Dimensions - master data that you will use to analyze (slice & dice) the fact(s) •Fact Table - stores facts (transactional data) with foreign keys to dimensions (master data) •Dimension Tables - stores dimensions (master data) also includes other dimensions

Information Analysis Structure

•Sources of Data -Corporate Data Sources - these are the data sources from within your organization -External Data Sources - currency exchange, weather, taxes, etc •Information Analysis Structure -Warehouse Catalog - this is the enterprise-wide definitions and characteristics of data •Student is uniquely defined by USC ID and is 10 numeric characters -Staging -automate and run extraction, transformation and loading of data -Data Warehouse - physically holds the consolidated data from multiple sources in the format defined by the Warehouse Catalog -Data Mart (Department Data Warehouse) - department specific data warehouse with a subset of data from the enterprise data warehouse -Warehouse Management Subsystem - it's the software and tools to manage the data

Strategy (definition)

•Strategy -Vehicle of change in people, process, and/or technology to meet objectives -Documents the "as is" state, "to be" state, and proposes a roadmap to get there -Strategies are implemented by one or more projects •Roadmaps -Sequencing and timing of projects within a strategy - •Portfolio Management -Decisions are taken in context with regards to existing people, process, and technologies

Structured Data

•Structured data is data that can be read and used by people and computers •String, numeric, or dates •Stored in cells or fields of fixed length -Examples: •Spreadsheets •Flat files Databases

Snow Flake Schema

•Style of data warehouse that addresses the shortcomings of Star Schema -Each vendor may have their own "variation" of this Schema -Dimension Tables are further normalized into Master Data Tables. DimIDs (Dimension Table) •Key Additional Compoents: -Master Data Tables •Attributes, Text, and Hierarchies The dimension tables are split into more tables. •Reduces data redundancy •Supports multiple languages, currencies & Unit of Measure •Supports historization •Support loading of dimensional attributes from different data sources Snowflake has fact tables that are the same as star schema, with transactional (facts/measures) and relationships to dimension (foreign key) Dimensions are relationships between fact table and master data tables. They have no values, but ID's to master data tables. Master data tables consist of attributes, text, and hierarchies.

OLTP vs OLAP: Data Structure

•Tabular (OLTP) -First Row Defines Class -Subsequent rows are instances of that data type -E.g. Spreadsheets, relational databases •Multi-Dimensional (OLAP) -Facts surrounded by dimensions for analysis E.g. online analytical processing systems (e.g. data warehouses Allows you to slice and dice data - e.g. Revenue by the dimensions

Data Warehouse Structure

•Takes large quantities of data and stores them so that we can easily access the data •Relational database that has been denormalized •That is, some redundant data is added back into the database to reduce the number of tables. This speeds up queries. Why? •Because data warehouses are "read only", modification anomalies are not relevant. Why? •Provides a "single source of truth" for analytics

How to snowflake

•The basic process of snowflaking involves: -Extracting attributes from the dimension tables -Placing them into new tables. •There are multiple snowflaking models, all of which split the dimension table into more tables. •The precise process for performing this division depends on the specific model (e.g. each vendor may have its own - SAP BW/4HANA)

Facts (Fact Table)

•The fact table contains facts about the business; in other words, it contains a list of historical transactions with data about key figures or measures. •These transactions originate in a transactional system and cascade (via ETL) to the data warehouse and ultimately into the star schema. •Two types of data are stored in the sales fact table. •Product_ID, Time_ID, SalesOrg_ID, Region_ID, and Customer_ID, are master data about the sales order; for example, who bought what, when, and where. •Quantity Sold, Price, and Revenue are transactional data. •The transactional data of the fact table are known as measures or key figures. •The fact table by itself does not have sufficient information about the customer, products, and other transaction details; such as name, description, and location... Details regarding these master data are stored in separate tables called dimension tables

Other Dimensions

•The time dimension contains details about the time of a transaction in the fact table: year, quarter, month, day, hour, and so on. •The unit dimension defines the unit of measure for quantity values. •There are generally two types of unit dimensions, currency and unit of measure (UoM). If any key figure is of type currency, then it must have a currency key. •The currency key links to the unit dimension that contains the description of the currency. The third dimension is the data package dimension. This dimension contains the details of the time when a record was loaded to the fact table. The loading number and time stamp are examples of columns in the data package dimension. This dimension is not used for analytics and is not shown on the figure

Type vs Instance

•Type (aka Class) -Defines the structure of instances of data -Each attribute type is defined (e.g. character, length 10) •E.g. Defines a USC student as -USC ID -Name -Address -Email Gender •Instance (aka Object) -"Real world thing" -Victor Qiu is an instance of a Student Type • •E.g. Tommie Trojan -Student ID: 626626626 -Name: Tommy Trojan -Address: San Marino -Email: [email protected] -Gender: F

Unstructured Data

•Unstructured data may not be understandable to a computer in its native form •Must be transformed to be understandable by a computer •Understanding means understanding the semantics of the data

Update anomaly:

•Update anomaly: update of a fact must done in all locations of the changed fact •When we have same information in several rows. Example: Must change office number in multiple locations

View

•View - (virtual table) -Data is stored elsewhere -Virtual has the instructions on how to access in real time -Can be stored in other databases E.g. Database view


Conjuntos de estudio relacionados

LearningCurve 6a. Basic Concepts of Sensation and Perception

View Set

BUS 19: Ch 4-Intro to Probability

View Set

6. Tétel: Ismertesse az állam gazdasági szerepét

View Set

Ch 15 Preventing Cardiovascular Disease

View Set