HSCI 391 Final
data misrepresentation
1. unconscious bias; 2. taking advantage of glitzy software, which results in an inaccurate visualization; 3. a conscious desire to deceive the viewer of the visualization; 4. a poorly constructed visualization with insufficient attention to detail
relational
2 or more interrelated tables; related tables always have a key field common to both in order to relate the records in those tables
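A minimal sketch of how a common key field relates records in two tables, using Python's built-in sqlite3 module. The table names, field names, and values are hypothetical and invented for illustration only.

```python
import sqlite3

# Hypothetical tables and values, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patients (patient_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE visits (
        visit_id INTEGER PRIMARY KEY,
        patient_id INTEGER REFERENCES patients(patient_id),
        visit_date TEXT
    );
    INSERT INTO patients VALUES (1, 'Patient A'), (2, 'Patient B');
    INSERT INTO visits VALUES
        (10, 1, '2020-01-15'), (11, 1, '2020-02-03'), (12, 2, '2020-02-10');
""")

# patient_id is the key field common to both tables; the JOIN uses it
# to relate each visit record to its patient record.
rows = conn.execute("""
    SELECT p.name, v.visit_date
    FROM patients AS p
    JOIN visits AS v ON v.patient_id = p.patient_id
    ORDER BY v.visit_id
""").fetchall()
conn.close()

for name, visit_date in rows:
    print(name, visit_date)
```

Each patient is stored once, and any number of visits can point back to that patient through patient_id.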
databases versus spreadsheets
an RDBMS requires the structure of the database to be created before any data is entered into the system; spreadsheets are blank slates
line chart
a type of chart that displays information as a series of data points called 'markers' connected by straight line segments; one of the most versatile graph types; one of the best types for illustrating data in a time series, with time depicted left to right; depicted w/o clutter & requires little explanation
cluster column charts
a variation of a column chart that includes more than one category; typically needs a legend
user level security
access control to a file, printer, or other network resource based on username; provides greater security than share-level security because users are identified individually or within a group
temporal
refers to time; allows us to ID patterns that we would not see otherwise
input masks
allow the user to force future end-users to input data using a specified format; specify how data should be inputted
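A rough sketch of the idea behind an input mask, written as a regular-expression check in Python. This is not how Access implements masks; the phone-number format and the names PHONE_MASK and matches_mask are assumptions made for illustration.

```python
import re

# Hypothetical mask for a US-style phone number: (###) ###-####
PHONE_MASK = re.compile(r"\(\d{3}\) \d{3}-\d{4}")

def matches_mask(value: str) -> bool:
    """Return True only if the whole value follows the specified format."""
    return PHONE_MASK.fullmatch(value) is not None

print(matches_mask("(555) 123-4567"))  # follows the format
print(matches_mask("555-123-4567"))    # does not follow the format
```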
security methods for DBMS
anonymity; data partitioning/segmentation; encryption
text field
any alphanumeric character; sometimes called character or string fields; can enter #s, but no mathematical formulas allowed
consideration when creating a database
attention to initial design procedure to ensure consistency & integrity (hygiene, validation, rules); eliminate data redundancy; protect sensitive data from inappropriate access
data partitioning/segmentation
based on the process of splitting up a database into multiple partitions in order to improve performance or as a security precaution to individually secure the partitions and control database user access for each partition
data partitioning
based on the process of splitting up a database into multiple sections in order to improve performance or as a security precaution to individually secure and control database user access
relational database management systems
basic technology that can be used to support everything from accounting systems to electronic health records to a personal Christmas card list; Access allows users to create customized databases that contain custom-designed data entry screens, queries, reports, and user interface screens
logical fields
Boolean: Y/N, T/F, 1/0; used when a field can only have 2 possible values
column charts
can be used with time-series data; too many data points means too many bars, which makes the chart ineffective
memo field
can contain anything: documents, graphics, links
cell address
formed by combining column letter & row number; every cell is assigned a name, its cell address, which identifies its location in the spreadsheet; 2 different types of cell address: relative & constant
stacked bar or column charts
communicating the relative proportions of 2 variables over time
databases
composed of a collection of one or more tables
tables
composed of records (rows)
range
a contiguous group of cells; ex. B2:B247 specifies the cells from B2 through B247
data validation in Access
create a rule used to test data input into a field; an error message pops up if the user tries to enter something else
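A small sketch of a validation rule paired with an error message, in Python rather than Access. The rule (a whole-number age from 0 to 120) and the function name validate_age are hypothetical examples, not taken from the course material.

```python
def validate_age(value):
    """Test a value against a hypothetical rule: whole number from 0 to 120.

    Returns (ok, message); message is the error text shown to the user.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        return False, "Age must be a whole number."
    if not 0 <= value <= 120:
        return False, "Age must be between 0 and 120."
    return True, ""

print(validate_age(42))
print(validate_age(150))
print(validate_age("forty"))
```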
what is the basic concept of spreadsheets?
data values are separated from the formula logic
what are the building blocks of RDBMS?
database fields
under HIPAA Safe Harbor Method
de-identifying patients' data in fields
name
each field must be named; no 2 fields can have the same name
single-user desktop
easiest DB to deploy; DB only exists on 1 person's computer
logic errors
errors in formulas that execute properly but don't produce the desired results; difficult to identify; follow the formula's logic step-by-step to see where the disconnect happens between the expectation & the result
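A short illustration of a logic error, using Python arithmetic in place of a spreadsheet formula. The values are made up; the mistake mirrors writing =A1+A2+A3/3 when =(A1+A2+A3)/3 was intended.

```python
# Hypothetical cell values.
a1, a2, a3 = 10, 20, 30

# Intended: the average of the three cells.
# Logic error: only a3 is divided by 3, yet the formula still executes.
wrong = a1 + a2 + a3 / 3

# Corrected: parentheses make the sum happen before the division.
right = (a1 + a2 + a3) / 3

print(wrong)  # 40.0
print(right)  # 20.0
```

No error message appears for the first formula, which is why stepping through the logic is the way to find the problem.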
cells are _____
essentially variables
spreadsheet standard review board
establish recommended guidelines for organizational use of spreadsheets & best practices for spreadsheet modeling
size
a field assigned a particular size will reserve that amount of capacity; stated in terms of how many characters the field will be able to hold
circular references
a formula incorrectly references its own results, or 2 or more formulas reference each other's results
records
info concerning a place, thing, event, or person
shapefiles
a geospatial file format that stores information related to where someone or something is physically present; made popular by the Environmental Systems Research Institute (ESRI)
anecdotal data
information that doesn't necessarily have scientific validity
another type of data validation?
input masks
arguments
inputs & instructions needed for the function to do its job
indexing
keeps track of each record in a table & can change the order in which records appear; with an index key, even a massive change takes only about 30 seconds
what is the strongest data encryption?
key-exchange cryptography; not available in Access
best practices for spreadsheets
know the purpose; the purpose should be communicated to end users; use font colors (blue for constants/input cells, black for formulas); assumptions should never be located together w/ outputs; data validation techniques should be used whenever possible; complex formulas should be documented; use effective version control
clusters
larger than expected number of incidences of disease (crimes, car accidents, etc.) related by time & place
spreadsheets
limited # of rows (records); structure can be created while data is being inputted; can only support flat files; no structured report-writing capabilities; limited ability to create data queries
DBMS
limited only by amount of disk space & operating system; structure must be created first (logic), then data can be input; can support complex relational DB structures; ability to create complex data queries using Boolean logic; requires planning
types of geography
lines, points, & polygons
pattern recognition
looking at numbers analytically to determine what interesting patterns or trends may exist
projection
mathematical algorithms that adjust coordinates such as latitude & longitude for the earth's curvature
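As one concrete example of a projection, here is a sketch of the spherical (Web) Mercator formulas, which turn latitude and longitude into flat x/y coordinates. Mercator is chosen here only for illustration; the notes do not name a specific projection, and the function name web_mercator is an assumption.

```python
import math

R = 6378137.0  # sphere radius in metres conventionally used by Web Mercator

def web_mercator(lat_deg, lon_deg):
    """Project latitude/longitude in degrees onto a flat plane, in metres."""
    lam = math.radians(lon_deg)   # longitude in radians
    phi = math.radians(lat_deg)   # latitude in radians
    x = R * lam
    y = R * math.log(math.tan(math.pi / 4 + phi / 2))
    return x, y

print(web_mercator(0.0, 180.0))
print(web_mercator(45.0, -90.0))
```

The formula is undefined at the poles (latitude of exactly 90 or -90 degrees), so this sketch should only be used for latitudes strictly between them.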
Excel formulas
may be mathematical in nature or used to manipulate text, evaluate logical conditions, or perform other non-mathematical tasks
date fields
may be referred to as datetime formats; only hold dates; set up in chronological order
functions
mini computer programs that can perform very specific, specialized tasks within another program; input --> output; ex. =SUM()
selection process from preselected values
minimizes the amount of typing that must be done by the data operator; ex. zip code lookup table
syntax errors
mistakes in writing or typing the formulas and commands when creating a spreadsheet; often result in an error message; ex. ####, #DIV/0!, #VALUE!, #REF!, #NAME?, #N/A
required data entry
most DBMS allow fields to be specified as required for data entry; the data operator cannot skip over these fields
anatomy of a function
name(): the name describes the function; the () contain the arguments; commas separate the arguments
do text fields set up numbers in chronological order?
no
does Access have user-level security?
no
cell contents include
numbers (dollars, dates, a quantitative value); labels (words & symbols); formulas (mathematical, evaluate logical conditions)
circular reference
occurs when 1. a formula incorrectly references its own results or 2. two or more formulas reference each other's results
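A sketch of how both kinds of circular reference could be detected, assuming each formula has already been reduced to the list of cells it references. It uses a depth-first search; the function name and the input layout are assumptions, and this is not a description of how Excel itself checks.

```python
def find_circular_reference(formulas):
    """formulas maps a cell address to the cells its formula references.

    Returns a list of cells tracing one cycle, or None if there is none.
    """
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = {}
    path = []  # cells on the current chain of references

    def visit(cell):
        state[cell] = IN_PROGRESS
        path.append(cell)
        for ref in formulas.get(cell, []):
            ref_state = state.get(ref, UNSEEN)
            if ref_state == IN_PROGRESS:
                # ref is already on the current chain, so the chain loops.
                return path[path.index(ref):] + [ref]
            if ref_state == UNSEEN:
                cycle = visit(ref)
                if cycle:
                    return cycle
        path.pop()
        state[cell] = DONE
        return None

    for cell in formulas:
        if state.get(cell, UNSEEN) == UNSEEN:
            cycle = visit(cell)
            if cycle:
                return cycle
    return None

print(find_circular_reference({"A1": ["A1"]}))                # own result
print(find_circular_reference({"A1": ["B1"], "B1": ["A1"]}))  # each other
print(find_circular_reference({"C1": ["A1", "B1"]}))          # no cycle
```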
field or column
piece of info within a record
version control
a programmer keeps track of the different versions of his or her code that accumulate as repairs & modifications are made to a program
3D pie chart
provides a skewed view (visual inaccuracy) of the data, so stick to 2D designs; pie charts are best for illustrating relative proportions between various components; poor for time-series data; slices must add to 100%
text constants used as arguments must be enclosed w/ ______
quotation marks ex. =function("my name")
data visualization
refers to graphical or pictorial representation of data
attributes
refers to information about geographic features
spatial
refers to space; information related to where someone or something is physically present
spatial query
relating or integrating different tables based on spatial characteristics & using spatial statistics together with a database query to identify records meeting a criterion
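A simplified sketch of one kind of spatial query: selecting the records that fall within a given distance of a point. Distances use the haversine formula on a spherical earth, which is an approximation; the record layout (dicts with "lat" and "lon" keys) and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean earth radius, an approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def records_within(records, center, radius_km):
    """Return the records located within radius_km of center (lat, lon)."""
    lat0, lon0 = center
    return [r for r in records
            if haversine_km(lat0, lon0, r["lat"], r["lon"]) <= radius_km]

# Hypothetical records, invented for illustration only.
cases = [
    {"id": 1, "lat": 0.0, "lon": 0.01},
    {"id": 2, "lat": 1.0, "lon": 1.0},
]
print(records_within(cases, (0.0, 0.0), 5.0))
```

GIS software typically offers richer spatial criteria (containment in a polygon, adjacency, and so on); distance from a point is just the simplest case to show.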
anonymity
removing all info from a dataset that could make individuals identifiable (names, addresses, phone numbers); effective anonymity ensures all fields containing personal identifiers are removed from the database prior to the database being distributed
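A minimal sketch of removing identifying fields from records before a dataset is distributed. The field names and sample record are hypothetical; which fields count as identifiers depends on the dataset, and real de-identification standards list more than these three.

```python
# Hypothetical set of identifying field names.
IDENTIFYING_FIELDS = {"name", "address", "phone"}

def anonymize(records):
    """Return copies of the records with the identifying fields removed."""
    return [
        {field: value for field, value in record.items()
         if field not in IDENTIFYING_FIELDS}
        for record in records
    ]

# Invented sample record for illustration only.
patients = [
    {"name": "Patient A", "address": "1 Example St", "phone": "555-0100",
     "age_group": "40-49", "diagnosis": "asthma"},
]
print(anonymize(patients))
```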
geographic centroids
represent the geometric center of the region
population centroids
represent the point inside of a region containing the greatest population density
multi-user access through LAN
requires all computers needing access to the DB to be connected by cable or wireless connection to the same router; closed system with no internet; possible to provide access to the LAN via the internet through a VPN
multi-user access through WAN
similar to LAN; the server holding the software & database is directly accessible through the internet & accessed through a web browser; robust system security measures must be applied to ensure the database is accessible to users & protected from unauthorized access
database accessibility
single-user desktop; multi-user access through LAN; multi-user access through WAN
numeric fields
sometimes called numbers or values; can only hold #s
features
specific places on a map; can be roads, regions, cities, zip codes, etc.
since Access doesn't have key-exchange cryptography, what is another way to protect info?
Symantec, PGP, and GPG tools
2 types of errors
syntax & logic
####
syntax error; column too small to display results
#DIV/0!
syntax error; dividing by 0
#REF!
syntax error; invalid cell address; refers to a cell that no longer exists
#NAME?
syntax error; invalid cell address or name; cell address or name entered incorrectly in the formula
#VALUE!
syntax error; an invalid value is referenced within the formula
#N/A
syntax error; a result cannot be returned for the formula because there is no result
centroid
the coordinates representing the center of a region
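A sketch of computing the geometric centroid of a region, assuming the region is a simple polygon given as a list of (x, y) vertices in planar coordinates. It uses the standard area-weighted polygon centroid formula; the function name is an assumption, and latitude/longitude would need to be projected first.

```python
def polygon_centroid(vertices):
    """Centroid of a simple polygon given as [(x, y), ...] in order."""
    twice_area = 0.0
    cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the polygon
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon with zero area")
    # The formula divides by 6 * area, and area = twice_area / 2.
    return cx / (3 * twice_area), cy / (3 * twice_area)

# A 4-by-2 rectangle with one corner at the origin.
print(polygon_centroid([(0, 0), (4, 0), (4, 2), (0, 2)]))  # (2.0, 1.0)
```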
geocoding
the process of determining the geographic coordinates of a specific location based on street address or existence within a known region; always an estimate; included in most GIS software as a batch process
layer
transparent overlays of the map, each containing a different type of geography
type
type of data the field will hold ex. numeric field
what should you never do in Excel formulas?
use hard-coded numbers; always reference the cell address
planning a database
what tables will be needed and how they potentially relate to one another; what fields will be required & the characteristics assigned to each field
do databases require planning?
yes
encryption
a process by which digital information is converted into an unreadable state; based on the science of cryptography
data hygiene
the degree to which a computer database contains errors such as typos, data entry mistakes, transposed numbers, or outdated data elements
data validation
The process of ensuring that a program operates on clean, correct and useful data.
relative cells
a cell reference that can change; by default in Excel all cell references are relative unless specified otherwise
constant cell
a cell reference that never changes; ex. $A1, A$1, $A$1
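A sketch of what happens to a cell reference when a formula is copied some number of rows down and columns across: parts marked with $ stay fixed, unmarked parts shift. This imitates spreadsheet behavior in Python under simplifying assumptions (single references only, uppercase column letters); the function names are hypothetical.

```python
import re

# Optional $ before the column letters and before the row number.
REF = re.compile(r"(\$?)([A-Z]+)(\$?)(\d+)")

def col_to_num(col):
    """Convert column letters to a number: A -> 1, Z -> 26, AA -> 27."""
    n = 0
    for ch in col:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n

def num_to_col(n):
    """Convert a column number back to letters: 1 -> A, 27 -> AA."""
    col = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        col = chr(ord("A") + rem) + col
    return col

def copy_reference(ref, d_rows, d_cols):
    """Return ref as it would appear after copying d_rows down, d_cols right."""
    m = REF.fullmatch(ref)
    if m is None:
        raise ValueError(f"not a cell reference: {ref!r}")
    col_lock, col, row_lock, row = m.groups()
    if not col_lock:  # column is relative, so it shifts
        new_col = col_to_num(col) + d_cols
        if new_col < 1:
            raise ValueError("reference moved off the sheet")
        col = num_to_col(new_col)
    if not row_lock:  # row is relative, so it shifts
        new_row = int(row) + d_rows
        if new_row < 1:
            raise ValueError("reference moved off the sheet")
        row = str(new_row)
    return f"{col_lock}{col}{row_lock}{row}"

for ref in ["A1", "$A1", "A$1", "$A$1"]:
    print(ref, "->", copy_reference(ref, 1, 1))
```

In the output, only $A$1 is unchanged in both directions; $A1 and A$1 each keep one part fixed and shift the other.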
bar charts
a chart in which the bars are oriented horizontally (a column chart rotated 90 degrees); bars can either be clustered by a grouping variable or stacked; leaves long space for descriptions; inappropriate for time-series data
flat file
a database that consists entirely of 1 table; works best when the database is simple & narrowly defined
fileserver
a device that controls access to separately stored files as part of a multiuser system
thematic map
a map designed as a data visualization tool; describes one or more attributes of the features; a data visualization technique that allows spatial patterns to be communicated easily
open source
a method of licensing based on the premise that software licensed under this method is developed by a global community of programmers and is free for anyone to use
virtual private network (VPN)
a network that is constructed using public wires to connect to a private network, such as a company's internal network