INFS 360 Test 1
dimension tables
Descriptions of the business, organization, or enterprise to which the subject of analysis belongs. Columns contain descriptive info that is often textual but can be numeric. Provide the basis for analysis of the subject. A typical dimension contains relatively static data. Typically have orders of magnitude fewer records than fact tables.
What is data visualization?
-Visually display measured quantities by means of the combined use of points, lines, a coordinate system, numbers, symbols, words, shading, and color -Finding artificial memory that best supports our natural means of perception -Transformation of the symbolic into the geometric -Depiction of information using spatial or graphical representations, to facilitate comparison, pattern recognition, change detection, and other cognitive skills by making use of the visual system
Why visualize data?
1. Record information (photos, maps, videos, blueprints) 2. Analyze data (solve problems graphically, discover patterns, explore data) 3. Communicate ideas (presentations, collaborations)
categorical data
Categorical items identify what the quantitative values measure. Categorical items can be organized according to their corresponding quantities: ranking (ordering them), ratio (often part to whole), and correlation (comparing two sets).
dimensional modeling
Data design methodology used for designing subject-oriented analytical databases, such as data warehouses and data marts. Commonly employed as a relational modeling technique. Distinguishes two types of tables: dimensions and facts.
preattentive attributes
Form: length, width, orientation, shape, size, enclosure. Color: hue, intensity. Position: 2D position. Motion.
law of pragnanz
Our eyes tend to find simplicity in complex shapes, which prevents us from being overwhelmed by information overload
surrogate keys
Non-composite, system-generated keys. Values are typically simple auto-increment integers. Have no meaning or purpose except to give each dimension a new column that serves as the primary key within the dimensional model instead of the operational key.
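A minimal sketch of a surrogate key, using Python's built-in sqlite3 module. The table and column names (customer_dim, customer_key, customer_id) and the sample rows are hypothetical, chosen only for illustration. The database generates customer_key on insert, while the operational key from the source system is kept as an ordinary column.

```python
import sqlite3

# In-memory database; all names and rows below are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer_dim (
        customer_key  INTEGER PRIMARY KEY,  -- surrogate key, generated by SQLite
        customer_id   TEXT NOT NULL,        -- operational key from the source system
        customer_name TEXT
    )
""")

# Rows arrive with only their operational keys; no surrogate value is supplied.
source_rows = [("C-1001", "Ana"), ("C-2040", "Ben"), ("C-0007", "Chi")]
conn.executemany(
    "INSERT INTO customer_dim (customer_id, customer_name) VALUES (?, ?)",
    source_rows,
)

rows = conn.execute(
    "SELECT customer_key, customer_id FROM customer_dim ORDER BY customer_key"
).fetchall()
print(rows)  # [(1, 'C-1001'), (2, 'C-2040'), (3, 'C-0007')]
```

The surrogate values 1, 2, 3 carry no business meaning; they only identify rows within the dimension.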
gestalt laws of grouping
Proximity: we perceive objects that are close to each other as belonging to a group. Similarity: we tend to group together objects that are similar (size, color, shape, orientation). Enclosure: we perceive objects as belonging to a group when they are enclosed in a way that appears to create a boundary around them. Closure: we perceive open structures as complete, closed, and regular if there is a way to interpret them that way. Continuity: we perceive objects as belonging together, as part of a single whole, if they are aligned or appear to form a continuation of one another. Connection: we perceive objects that are connected as part of the same group.
working memory
Analogous to RAM (main memory). Holds processed info from iconic memory. Information remains for a few seconds, or up to hours if rehearsed. Limited capacity (3-4 visual chunks): if you visualize 10 data sets, the viewer will need to refer to the legend constantly.
fact tables
Contain measures related to the subject of analysis and the foreign keys (associating fact tables with dimension tables). Measures are typically numeric and are intended for mathematical computation and quantitative analysis. In a typical fact table, records are added continually, and the table rapidly grows in size.
star schema
The result of dimensional modeling is a dimensional schema containing facts and dimensions, often referred to as the star schema. The chosen subject of analysis is a fact table. Each candidate dimension must answer two questions: Can the dimension table be useful for the analysis of the chosen subject? Can the dimension table be created based on the existing data sources?
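A small star schema can be sketched with Python's built-in sqlite3 module. Everything here is an assumption made for illustration: the subject of analysis (sales), the table names (sales_fact, product_dim, store_dim), the columns, and the sample rows. The fact table holds numeric measures plus foreign keys, and the dimensions supply the descriptive columns used to group the analysis.

```python
import sqlite3

# Hypothetical star schema: one fact table surrounded by two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_dim (
        product_key  INTEGER PRIMARY KEY,
        product_name TEXT,
        category     TEXT
    );
    CREATE TABLE store_dim (
        store_key INTEGER PRIMARY KEY,
        region    TEXT
    );
    CREATE TABLE sales_fact (
        product_key  INTEGER REFERENCES product_dim(product_key),
        store_key    INTEGER REFERENCES store_dim(store_key),
        units_sold   INTEGER,  -- measure
        dollars_sold REAL      -- measure
    );
    INSERT INTO product_dim VALUES (1, 'Tent', 'Camping'), (2, 'Boots', 'Footwear');
    INSERT INTO store_dim VALUES (1, 'East'), (2, 'West');
    INSERT INTO sales_fact VALUES
        (1, 1, 2, 200.0),
        (1, 2, 1, 100.0),
        (2, 1, 3, 150.0),
        (2, 2, 4, 200.0);
""")

# Analyze the measures by a descriptive dimension column (product category).
totals = conn.execute("""
    SELECT p.category, SUM(f.units_sold), SUM(f.dollars_sold)
    FROM sales_fact AS f
    JOIN product_dim AS p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(totals)  # [('Camping', 3, 300.0), ('Footwear', 7, 350.0)]
```

Swapping the join to store_dim and grouping by region would analyze the same measures along a different dimension.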
Tufte's Principles
Show the data; substance over methodology; do not distort; reveal structure; enable comparison
lie factor
Size of the effect shown in the chart / size of the actual effect in the data
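The ratio can be sketched in Python. This assumes each "size of effect" is measured as the relative change between two values, which is the usual way the lie factor is computed; the function names and the sample numbers are hypothetical.

```python
def relative_change(first, second):
    """Relative change from first to second (0.5 means a 50% increase)."""
    return (second - first) / first


def lie_factor(data_first, data_second, shown_first, shown_second):
    """Size of the effect shown in the chart / size of the effect in the data."""
    shown_effect = relative_change(shown_first, shown_second)
    actual_effect = relative_change(data_first, data_second)
    return shown_effect / actual_effect


# Hypothetical chart: the data rises from 100 to 150 (a 50% increase),
# but the bars are drawn 2.0 and 6.0 units tall (a 200% increase).
factor = lie_factor(100, 150, 2.0, 6.0)
print(factor)  # 4.0
```

A value of 1.0 means the graphic represents the change faithfully; here the chart overstates the effect by a factor of 4.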
Visual distortion and how one creates it
Sizes/representations do not match the numbers; arbitrary baselines; parts do not add up; inconsistent spacing; size vs. area/volume
law of figure and ground
Tend to segment our visual world into figure and ground
data visualization
The use of computer-supported, interactive, visual representation of data to amplify cognition
operational data
Time horizon: days/months. Data: detailed, current. Control of update: major issue. Technical: can be updated; small amounts used in a process; non-redundant; high frequency of access. Purpose: support daily operations; application oriented.
analytical data
Time horizon: years. Data: summarized (and/or detailed), values over time (snapshots). Control of update: no issue. Technical: read (and append) only; large amounts used in a process; redundancy not an issue; low/modest frequency of access. Purpose: support managerial needs; subject oriented.
iconic memory
Analogous to buffer memory. Pre-attentive processing: automatic/unconscious. We notice attributes without focusing on them. Extremely fast (less than 1 sec).
ordinal data
descriptive values with a prescribed order; typically shown in either ascending or descending order (years, scores in percentile form, Likert scale)
nominal data
discrete, descriptive values; no specific order
long term memory
Analogous to a hard disk (persistent storage). Some of the info in working memory makes it to permanent storage. Can be recalled quickly, but is difficult to store. A very complex process that is not fully understood.
memory types
iconic, working, long-term
best preattentive attributes to display quantity
length and 2D position are best; width, size, and intensity are of limited use
quantitative data
measures something, usually with numbers. Not all numbers are quantitative values (years, IDs, phone numbers, account numbers).
hierarchical data
multiple categories organized in a tree; parent-to-child connections (colleges/schools - majors; departments - business units)
interval data
ranges of quantitative values that form categorical items (GPA ranges, temperatures)
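As a rough illustration, quantitative values can be turned into interval categories by binning. The sketch below uses Python's standard bisect module; the GPA boundaries, range labels, and sample values are assumptions chosen for the example.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical range boundaries: below 2.0, 2.0 to under 3.0, 3.0 and above.
edges = [2.0, 3.0]
labels = ["below 2.0", "2.0 to under 3.0", "3.0 and above"]


def gpa_range(gpa):
    """Map a quantitative GPA to the label of the range it falls in."""
    return labels[bisect_right(edges, gpa)]


gpas = [1.8, 2.5, 3.0, 3.7, 2.9]
counts = Counter(gpa_range(g) for g in gpas)
print(counts["below 2.0"], counts["2.0 to under 3.0"], counts["3.0 and above"])  # 1 2 2
```

Each range now behaves like a categorical item, so the counts could be shown as bars in a histogram-style chart.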