Data Science - Quiz 1
What is metadata?
"data about data"
According to Hoven, what are Few's 8 core principles of data visualizations?
- simplify - compare - attend - explore - view diversity - ask why - be skeptical - respond
According to Davenport, the "essence" of analytical communication includes
- the data - the model - the relationships among the variables
What explanations did Whong give for the surprising finding of large gaps in a taxi's travels?
- Drivers going back to the base and switching shifts - Drivers refueling tanks - Drivers getting hungry and stopping for lunch break
What is the difference between Tableau and Excel?
- Excel is a spreadsheet tool, while Tableau is a data visualization tool - Excel worksheets require setting up processes and calculations by hand, while Tableau makes creating them somewhat more intuitive
What are some examples of data types?
- Integer - Floating point - Boolean - String - Date/time
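As a rough illustration, the five types listed above map onto Python's built-in types approximately as follows. The variable names and values are made up for this sketch and do not come from the course material.

```python
from datetime import datetime

# Hypothetical example values, one per data type from the list above.
trip_count = 42                             # integer: a whole number
fare = 12.75                                # floating point: a fractional value
is_paid = True                              # Boolean: one of two values
driver_note = "cash only"                   # string: a sequence of characters
pickup_time = datetime(2013, 6, 1, 8, 30)   # date/time: calendar date and time

# Print the Python type name next to each value.
for value in (trip_count, fare, is_paid, driver_note, pickup_time):
    print(type(value).__name__, value)
```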
Some characteristics of open data are...
- It must be available for free - It can come from any source - It can be redistributed to others
What are the rules for storytelling?
- Keep in mind what is interesting to the audience - Come up with the ending before figuring out the middle - Put it on paper - Find the essence of the story and the simplest way to tell it
According to Acohido, Microsoft uses all of these to combat cybercrime
- Malicious files - Early warning reports - Threat reports
What are the criteria of a good hypothesis?
- Needs to be testable - Needs to be rational - grounded in reality - Needs to be falsifiable
What are the different types of data visualizations?
- Scatterplot - Pie chart - Bubble chart - Line chart - Histogram - Bar chart - Time series
How do you create a calculated field in Tableau?
1. In a worksheet in Tableau, select Analysis > Create Calculated Field.
2. In the Calculation Editor that opens, give the calculated field a name.
3. In the Calculation Editor, enter a formula. Example: SUM([Profit])/SUM([Sales])
4. When finished, click OK.
5. The new calculated field is added to the Data pane. If the new field computes quantitative data, it is added to Measures. If it computes qualitative data, it is added to Dimensions.
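The example formula divides one aggregate by another, which is not the same as averaging a ratio computed row by row. A minimal Python sketch of that difference, using made-up rows (Python stands in for Tableau here; this is not Tableau's calculation language):

```python
# Made-up rows of (profit, sales); not taken from any real data set.
rows = [(10.0, 100.0), (30.0, 100.0), (5.0, 50.0)]

total_profit = sum(profit for profit, _ in rows)
total_sales = sum(sales for _, sales in rows)

# Same idea as SUM([Profit])/SUM([Sales]): aggregate first, then divide.
profit_ratio = total_profit / total_sales

# Averaging per-row ratios weights every row equally, so it can differ.
mean_row_ratio = sum(profit / sales for profit, sales in rows) / len(rows)

print(profit_ratio, mean_row_ratio)
```

With these rows the two numbers differ, which is why the choice between aggregating before or after dividing matters.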
According to Hayes, what percentage of business leaders do not trust the information they use to make decisions?
33%
How accurately was Stein able to eventually predict the gender of a caller?
80% of the time
Given the principles mentioned in Unwin's paper, to show the change in population growth over years, which type of graph should be adopted in terms of vertical and horizontal axes?
A graph with a long horizontal axis and a short vertical axis
According to Stein, basic data-mining does NOT involve what?
Always resulting in 100% accuracy
What is the difference between infographics and data visualization?
An infographic uses data visualizations, text, and images to tell a complete story, whereas a data visualization is just a visual representation of data.
According to Silver's article "What the Fox Knows," the "explanation" step involves...
Answering the questions "why" and "how"
What is a Boolean?
A binary value: one of two possible states, such as true or false
What is a date/time?
Calendar date and time
According to Krum, the fact that people remember messages with images more often than ones with just text is called:
The Picture Superiority Effect
What is the difference between metadata (data dictionary) and data?
Data is just a value whereas metadata includes a title, description, and data type.
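One way to picture the distinction is a data dictionary entry sitting next to the value it describes. The sketch below is only an illustration: the field name, description, and value are hypothetical, and the `TYPES` lookup is a helper invented for this example.

```python
# Hypothetical data dictionary entry: metadata describing one field.
metadata = {
    "title": "fare",
    "description": "Amount charged for the trip, in US dollars",
    "data_type": "float",
}

# The data itself is just the value.
data = 12.75

# Invented helper mapping type names in the dictionary to Python types.
TYPES = {"int": int, "float": float, "bool": bool, "str": str}

# The metadata lets us check that the value has the declared type.
matches = isinstance(data, TYPES[metadata["data_type"]])
print(matches)
```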
What is information?
Data that is processed to be useful
Explain the "attend" principle mentioned by Stephen Few.
Data visualization tools should let us see the data that's really important.
According to Unwin, one issue with map-based graphical visualizations is that
Distance is not directly related to similarity
What is a floating point?
Fractional values - seen as a decimal
True or False - According to Acohido, new data-visualization technologies do not yet have the capability to detect suspicious intrusion or theft behavior.
False
True or False - According to the Ashley Madison article, perfect privacy on the Internet is not impossible.
False
Stein developed a model that could determine the gender of a caller using...
His phone records from Google Voice
According to Unwin, a scale is 'really nice' if it:
Includes 0
What are the similarities between infographics and data visualization?
Infographics and data visualizations both use visual representations to display data.
According to Krum, the relationship between infographics and data visualizations is best described as:
Infographics can include data visualizations within them
According to Hayes, a benefit of large samples is that...
It minimizes sampling error
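To see why larger samples help, one can use the textbook standard error of a sample mean, sigma / sqrt(n). This formula is a standard statistics result and is not taken from Hayes's article; the standard deviation of 10 below is an arbitrary assumption.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard error of the sample mean for n independent observations."""
    return sigma / math.sqrt(n)


# Assumed population standard deviation; chosen only for illustration.
SIGMA = 10.0

# Quadrupling the sample size halves the standard error.
for n in (100, 400, 1600):
    print(n, standard_error(SIGMA, n))
```

Note that the error shrinks with the square root of the sample size, so it approaches zero slowly rather than disappearing.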
In the article by Weisberg, Eli Pariser argues that the Filter Bubble is caused by...
Personalization of web content
According to Crawford, a key problem of Boston's StreetBump app is that...
Low-income residents have less access to smartphones
According to Krum, good infographics should:
Make sure the relative sizes of chart elements are proportional to the data values
What does Crawford propose was the reason for Google's overestimation of flu outbreaks?
Media coverage of the flu season
What is a string?
Numeric and non-numeric characters
In Matlin's article, Whong states that his NYC Taxi Cab visualization is part of a larger movement for:
Open data and transparency
According to Davenport, what is an example of something that should NOT be included while storytelling with data:
Sequence of activities used in the analysis
What type of data did Whong use in developing the visualization of an NYC taxi cab driver's daily life?
Spatiotemporal data
The Ashley Madison hack is different from previous hacks in that...
The Ashley Madison hack resulted in more personal damage to users
According to Di Justo's article, "telephony metadata" includes...
The call's duration
According to Davenport, what made Florence Nightingale's work unique?
The way she effectively communicated her results using a pie chart
The three "V"s of big data?
Volume, Variety, and Velocity
What can the NSA NOT reveal using "telephony metadata"?
Whether the conversation involves illegal activity
What is an integer?
Whole numbers - values with no fractional part
FiveThirtyEight's search for America's best burrito began with data from...
Yelp
What is a theory?
a supposition or a system of ideas intended to explain something, especially one based on general principles independent of the thing to be explained.
What is the definition of a good hypothesis?
a testable prediction from an idea with an underlying rationale that makes sense
What is a hypothesis?
an educated guess
What is knowledge?
application of data and information
What is data vs. information?
data is unorganized whereas information is processed and organized
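A small sketch of the same idea: a list of raw readings is data, and a summary computed from it is information. The numbers are made up for illustration.

```python
from statistics import mean

# Raw, unorganized facts (made-up readings).
raw = [12, 7, 9, 15, 7]

# Processing and organizing the readings turns them into information.
summary = {
    "count": len(raw),
    "min": min(raw),
    "max": max(raw),
    "mean": mean(raw),
}

print(summary)
```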
What is Big Data?
extremely large data sets that may be analyzed computationally to reveal patterns, trends, and associations, especially relating to human behavior and interactions.
What is Velocity?
how quickly that data changes
What is data?
raw, unorganized facts
What is Volume?
the amount of data
What is Variety?
the different sources of data
What is open data?
the idea that some data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control.
What is data science?
the study of the generalizable extraction of knowledge from data.
According to Unwin, the reason for using graphic displays is:
to present or explore data