MIS 0855 - Exam 1
Examples of hypotheses (testable, falsifiable, grounded in a rationale)
"iPhone users download more apps each month than Android users"; "There are NO vampires living in Louisiana"; "Students who attend class more often get better grades"
Data Visualization
A visual representation of data
Variety (Big Data)
Many different sources of data are combined together.
Theory
An idea that has not been proven with certainty but is accepted as true; a scientific idea that is supported by evidence.
Data Types
String: contains text; Integer: contains a whole number; Floating point: contains a number with decimals; Boolean: only two possible values (usually true/false or 1/0); Date/Time: relates to dates and times
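A minimal sketch of how these data types map onto Python's built-in types. The variable names and values are hypothetical, chosen only for illustration.

```python
from datetime import datetime

# Hypothetical values, one per data type on this card
song_title = "Example Song"        # String: text
play_count = 1200                  # Integer: whole number
duration_minutes = 3.5             # Floating point: number with decimals
is_explicit = False                # Boolean: only two possible values
released = datetime(2020, 1, 15)   # Date/Time

for label, value in [("String", song_title), ("Integer", play_count),
                     ("Floating Point", duration_minutes),
                     ("Boolean", is_explicit), ("Date/Time", released)]:
    # type(...).__name__ gives Python's name for each type (str, int, ...)
    print(f"{label:15} {type(value).__name__}")
```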
Knowledge
The application of data and information (e.g., knowing that students who come to class more often get better exam scores).
Confirmation Bias
A type of cognitive bias that involves favoring information that confirms previously existing beliefs or biases (e.g., believing that left-handed people are more creative than right-handed people).
Basic elements of data visualizations
Content; context (captions, legends, keys); construction (aspect ratio, color, size)
Information
Data that has been processed to be useful (e.g., mean, median).
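Going from data to information can be sketched with Python's standard `statistics` module: a list of raw exam scores (hypothetical numbers) is the data, and summaries such as the mean and median are information.

```python
import statistics

# Raw data: hypothetical exam scores (unorganized recorded observations)
exam_scores = [72, 85, 90, 66, 85, 78, 94]

# Information: the same data processed into useful summaries
mean_score = statistics.mean(exam_scores)
median_score = statistics.median(exam_scores)

print(f"Mean: {mean_score:.1f}, Median: {median_score}")  # Mean: 81.4, Median: 85
```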
Infographic
A data visualization that combines text and images to tell a story (it usually contains several data visualizations).
Intentional Bias
Bias introduced on purpose, e.g., fake reviews written to hurt someone.
Why is it good to have data dictionaries and metadata?
They improve clarity and communication, reduce errors, and keep everybody on the same page.
Visualizing comparative areas (two squares)
In the picture on the left, the box for b is 1/4 the size of the box for a, even though the data say b is half of a, so b's box should be bigger to represent the data accurately. The other picture represents the proportions accurately.
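The error on this card is a common one: if b is half of a and the side of b's square is also halved, the area drops to a quarter. A minimal sketch, assuming each value is drawn as a square whose area should be proportional to the value, so the side length must scale with the square root of the ratio. The function name `square_side` and the values are hypothetical.

```python
import math

def square_side(value: float, reference_value: float, reference_side: float) -> float:
    """Side length for a square whose AREA is proportional to value.

    Area scales with side**2, so the side scales with sqrt(value ratio).
    """
    return reference_side * math.sqrt(value / reference_value)

a, b = 100.0, 50.0   # hypothetical data: b is half of a
side_a = 10.0        # side of a's square, in arbitrary units

wrong_side_b = side_a * (b / a)           # scales the side, not the area
right_side_b = square_side(b, a, side_a)  # scales the area

print(f"Wrong: area ratio = {(wrong_side_b / side_a) ** 2:.2f}")  # 0.25
print(f"Right: area ratio = {(right_side_b / side_a) ** 2:.2f}")  # 0.50
```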
How can you make graphics that stand on their own?
Use the tips and tools discussed in class: context, patterns, and the picture superiority effect.
Data Science
The study of the generalizable extraction of knowledge from data; basically, taking data and getting insight from it.
A hypothesis must be
testable, falsifiable, grounded in a rationale
Survivorship Bias
The logical error of concentrating on the people or things that made it past some selection process and overlooking those that did not, typically because of their lack of visibility. We tend to focus only on successful individuals for reference.
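Survivorship bias can be illustrated with a small sketch: averaging only the cases that passed a selection step gives a rosier picture than averaging all of them. The startup returns below are hypothetical, and the selection rule (only startups that did not lose money stay visible) is an assumption made for illustration.

```python
import statistics

# Hypothetical yearly returns (%) for ten startups; -100 means it failed outright
all_returns = [35, -100, 12, -100, 8, -40, 50, -100, 20, -15]

# Assumed selection process: only startups that didn't lose money stay visible
survivors = [r for r in all_returns if r >= 0]

print("Average of survivors only:", statistics.mean(survivors))    # 25
print("Average of all startups:  ", statistics.mean(all_returns))  # -23
```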
Direction of Causality
Two variables have a relationship, and one may cause the other, but you often don't know which one causes which (correlation does not imply causation).
Spurious Correlations
When two things that have no sensible connection are associated with each other (e.g., dads buying beer and diapers).
Picture Superiority Effect
The tendency for pictures and images to be remembered and distinguished more easily than words.
How do you use color, size, scale, etc. to create a visualization?
Use color, size, scale, and ratio deliberately. Axes should include 0. Why? To display the differences in the data accurately; a bar chart with a truncated axis exaggerates them.
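Why axes should include 0 can be shown with a little arithmetic. A minimal sketch, assuming a bar chart whose bars are drawn up from the bottom of the value axis; the two values and the truncated axis start of 45 are hypothetical, as is the helper name `apparent_ratio`.

```python
def apparent_ratio(a: float, b: float, axis_start: float) -> float:
    """Ratio of drawn bar heights (b over a) when the value axis starts at axis_start."""
    return (b - axis_start) / (a - axis_start)

a, b = 50, 55  # hypothetical values: b is only 10% larger than a

print(f"Axis from 0:  b looks {apparent_ratio(a, b, 0):.2f}x a")   # 1.10x
print(f"Axis from 45: b looks {apparent_ratio(a, b, 45):.2f}x a")  # 2.00x
```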
Steps to tell a powerful story with data
1. My understanding of the business problem 2. How will I measure the business impact? 3. What's the available data? 4. The initial solution hypothesis 5. The solution 6. The business impact of the solution
What to avoid when telling a story with data
1. Technical terminology 2. Step-by-step descriptions 3. Complex statistics
What makes data so important today?
1. The massive availability of digital data suggests (and it has increasingly been shown) that there are huge opportunities to create valuable insights from data in every corner of the economy and society. 2. Newly developed techniques and tools for data analytics let us harness the potential of digital data in new ways. 3. Your competitors are going to harness data.
Metadata
1. Each piece of data can be described by: variable name (a dataset column label), variable description (in a data dictionary), data type (in a data dictionary), and value (the datum itself). 2. Metadata are data that describe other data. They are often stored as a data dictionary attached to a dataset. 3. Without explicit or inferred metadata, a dataset becomes useless.
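A data dictionary can be sketched in Python as a mapping from each variable name to its description and data type, which can then be used to check the values in a dataset. The variable names, descriptions, rows, and the helper `validate` are hypothetical, loosely based on the class-attendance example used elsewhere in these notes.

```python
# Hypothetical data dictionary: variable name -> (description, data type)
data_dictionary = {
    "student_id": ("Anonymized student identifier", str),
    "classes_attended": ("Number of classes attended this term", int),
    "exam_score": ("Exam 1 score out of 100", float),
}

# Hypothetical dataset: each row maps variable names to values (the data itself)
rows = [
    {"student_id": "S001", "classes_attended": 14, "exam_score": 88.5},
    {"student_id": "S002", "classes_attended": 9, "exam_score": 71.0},
]

def validate(row: dict) -> list[str]:
    """Return the problems found when checking one row against the dictionary."""
    problems = []
    for name, (description, expected_type) in data_dictionary.items():
        if name not in row:
            problems.append(f"missing {name} ({description})")
        elif not isinstance(row[name], expected_type):
            problems.append(f"{name} should be {expected_type.__name__}")
    return problems

for row in rows:
    print(row["student_id"], validate(row) or "OK")
```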
Biases in (Big) Data
Always remember that "data and datasets are creations of human design." Different types of biases: 1. Survivorship bias 2. Intentional bias 3. Confirmation bias
Think about a song on Spotify: you have the audio file, song name, duration, number of plays, and artist name. Which of these is NOT metadata?
Audio file (it is the data itself; the song name, duration, plays, and artist name describe it)
Data before vs. data today
Before: data had to be collected, and often produced, separately for each analytical purpose, which made data very expensive. Today: digital data are everywhere. Digital systems create data as a side effect of their operation, so data collection often costs very little; massive datasets are just a few clicks away!
Different types of visualizations
Charts, graphs, scatterplots, heat maps, etc.
Data
Data are usually understood as 'raw unorganized facts' - recorded observations about the world.
Velocity (Big Data)
Data can change very quickly. (However much Big Data you have, the data do not speak for themselves; new kinds of data sources also bring new problems.)
Standard View
Data-Information-Knowledge Hierarchy
Calculated Fields in Tableau
Review Tableau and the assignment (basic visualization: what happens when you drag a field somewhere or perform an action, and which visualization to choose). Categorical values: pie charts usually show percentages; bar charts usually compare numbers. Continuous values: line graphs show relationships between variables or change over time; histograms show how values are distributed.
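A Tableau calculated field creates a new field computed from existing ones. As a rough analogy only (this is plain Python, not Tableau's calculation language), the sketch below adds a derived `profit_ratio` field to each row, then computes each region's share of total sales, the kind of percentage a pie chart of a categorical field shows. The field names and values are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: "region" is categorical; "sales" and "profit" are continuous
rows = [
    {"region": "East", "sales": 200.0, "profit": 50.0},
    {"region": "West", "sales": 400.0, "profit": 40.0},
    {"region": "East", "sales": 100.0, "profit": 30.0},
    {"region": "South", "sales": 300.0, "profit": 60.0},
]

# Analogy to a calculated field: a new field derived row by row from existing fields
for row in rows:
    row["profit_ratio"] = row["profit"] / row["sales"]

# Categorical field: each region's percentage of total sales (pie-chart style)
sales_by_region = defaultdict(float)
for row in rows:
    sales_by_region[row["region"]] += row["sales"]
total_sales = sum(sales_by_region.values())
shares = {region: 100 * s / total_sales for region, s in sales_by_region.items()}

print([round(row["profit_ratio"], 2) for row in rows])  # [0.25, 0.1, 0.3, 0.2]
print(shares)  # {'East': 30.0, 'West': 40.0, 'South': 30.0}
```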
Open data
Open data are datasets that organizations make available for anyone to use freely for analysis and modification. Motivations for offering open data vary: Government: speed up economic development, legitimacy (the data were paid for by taxpayers); Companies: open innovation, PR, corporate citizenship; Academics: the ideal of science, replicability of results, publicity; Non-profits: publicity, cultural ideals. Data are a non-rival economic good: their consumption by one party does not preclude others from using them. This makes it difficult to capture value from data by offering it to third parties. However, organizations often do not want to share data because of competition and possible liabilities, and preparing open datasets can be laborious.
Volume (Big Data; the so-called 3Vs of Big Data give us an overall idea of the nature of new data resources)
The amount of data has exploded.
Telling stories with data
There are many articles on "telling stories with data" (just Google it), and they pretty much tell the same story: "Find the compelling narrative. Along with giving an account of the facts and establishing the connections between them, don't be boring." "Think about your audience. What does the audience know about the topic?" "Be objective and offer balance. A visualization should be devoid of bias. Even if it is arguing to influence, it should be based upon what the data says, not what you want it to say." "Don't censor. Don't be selective about the data you include or exclude, unless you're confident you're giving your audience the best representation of what the data 'says.'" "Finally, edit, edit, edit. Also, take care to really try to explain the data, not just decorate it."