data science - exam 1

what are the three parameters of the IIF function?

- condition
- output if condition is true
- output if condition is false

what are the dangers of big data analytics?

- dirty data is everywhere
- direction of causation can be tricky
- easy to find what isn't really there

how do you use color, scale, size, etc. to make data visualizations?

- label data points with colors to distinguish different factors
- create graphs at a size where the viewer can clearly see and understand the visualization
- create a scale that is proportionate in size to the data in the visualization

what are the steps for communicating an analysis and what do they mean? how do they relate to data visualization?

- my understanding of the business problem
- how will I measure the business impact?
- what is the available data?
- initial solution hypothesis
- solution
- business impact of the solution

each of the steps is necessary to tell and complete the whole story behind the visualization

why use a scatter plot?

- shows each data point (state) and outliers
- red line makes trend stand out

bar charts should include what?

0 axis

what are the four steps of turning data into information?

1) collection
2) organization
3) explanation
4) generalization

reading: according to Hayes, what percentage of business leaders do NOT trust information they use to make decisions?

33% - 1 in 3 people

reading: FiveThirtyEight's search for America's Best Burrito began with ________ data.

Yelp

reading: according to Stein, basic data mining does NOT involve what?

always resulting in 100% accuracy

reading: according to Silver's article, "What the Fox Knows", what does the "explanation" step involve?

answering the questions "why?" and "how?"

what is knowledge?

application of data and information

which of the following is considered information rather than data? a) list of neighborhood property prices b) average return on investment of an advertising campaign c) food and drink menu at a restaurant

b) average return on investment of an advertising campaign

reading: which of the following was NOT one of the explanations Whong gave to the surprising finding of large gaps in taxi travels? a) drivers refueling tanks b) drivers switching from cabs to Uber c) drivers stopping for lunch breaks

b) drivers switching from cabs to Uber

what types of charts lend themselves to categorical values?

bar or pie charts

what is an inherent part of data and information?

bias

what are the similarities between data visualizations and infographics?

both tell a story using data

reading: according to Acohido, Microsoft uses all of the following except: a) malicious files b) early warning reports c) FBI watchlists d) threat reports

c) FBI watchlists

when presenting results to an average audience, which of the following should NOT be part of your presentation? a) hypothesis developed b) story surrounding the problem c) details on the statistical methods used d) visual aids like graphics, images, and videos

c) details on the statistical methods used

reading: characteristics of open data include ______________.

can come from any source, available for free, can be redistributed to others

what are the benefits of open data?

collaboration, everyone can benefit

how does big data facilitate the goal of data science?

it combats sources of error related to model misclassification and sampling bias, which leads to better prediction

where does data come from?

companies and organizations, services that compile data (Amazon, Yelp), government agencies

what is Moore's Law?

computing power roughly doubles every 18 months to two years, becoming more affordable and accessible for users

how does the "Filter Bubble" affect our trust of the data we consume on the web?

confirmation bias may close a person's mind to other perspectives

what are the basic elements of data visualization?

content, context, construction

what is the direction of causality?

two variables can be correlated without showing which one causes the other; correlation does NOT equal causation

what is the IIF function in Tableau?

creating a new variable/column using an existing column (IF and ELSE statements)

what is spatiotemporal data?

data set that has events happening with a location and time tied to them

what is the signaling problem in data?

data sets are created by humans - sampling is misleading, does not represent entire population

what is metadata?

data that describes data or data about data

what is information?

data that is processed to be useful, presented in a meaningful context

what is "dirty data"?

data that is wrong or has mistakes

reading: what is the explanation of Few's "attend" principle?

data visualization tools should let us see the data that is really important

reading: According to Davenport, what is the essence of analytical communication?

describing the problem and story behind it, the model, the data employed, and the relationships among variables in the analysis

how is data accessed?

direct access to databases and spreadsheets, reporting tools, user interfaces, APIs - application programming interfaces

reading: according to Unwin, one issue with map-based graphical visualizations is that __________.

distance is not directly related to similarity

why do we care about metadata?

documents what data means, enables navigation, facilitates understanding, eliminates guesswork

which of the following would NOT be considered an example of an album's metadata on Spotify? a) album name b) number of album songs c) average song rating d) song duration e) audio content of songs

e) audio content of songs

reading: true or false - according to the Ashley Madison article, perfect privacy on the Internet is possible.

false

what is a law?

generalized theory, not yet falsified

explain how to create a calculated field in Tableau.

generates a new variable: use the "Create Calculated Field" option, refer to the existing variables, and follow the syntax of the specific function

reading: Stein developed a model that could determine the gender of a caller using __________.

his phone records from Google Voice

what is a theory?

hypothesis that is supported by evidence, not yet falsified

reading: according to Unwin, a scale is "really nice" if it ________.

includes 0

reading: according to Krum, the relationship between infographics and data visualizations is best described as __________________.

infographics can include data visualizations within them

what are the differences between data visualizations and infographics?

infographics contain data visualizations, infographics contain more information that can be used for an audience that may not know much about the topic, infographics use text and images

what are some examples of data types?

integer (whole numbers), floating point (fractional values), boolean (binary values)
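
The three data types above can be seen in a short Python sketch. The variable names and values are made up for illustration, and other tools (Tableau, Excel, SQL) may name these types slightly differently.

```python
# Hypothetical values, chosen only to illustrate each data type.
count = 42        # integer: a whole number
price = 3.75      # floating point: a number with a fractional part
in_stock = True   # boolean: one of two values, True or False

# Print the type name next to each value.
for value in (count, price, in_stock):
    print(type(value).__name__, value)
```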

what types of charts lend themselves to continuous values?

line charts or histograms

what are different types of data visualizations?

line graphs, bar charts, map graphs, scatter plots, pie charts

reading: according to Crawford, a key problem of Boston's "StreetBump" app is __________.

low income residents do not have much access to smartphones

reading: according to Krum, good infographics should _______________________.

make sure the relative size of chart elements is proportional to data values

reading: what does Crawford propose was the reason for Google's overestimation of flu outbreaks?

media coverage of the flu season

reading: according to Hayes, a benefit of large samples is ___________.

minimizing sampling error
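
A small simulation can make this concrete. The sketch below is not from the Hayes reading; it assumes a made-up population of the numbers 1 to 10,000 and a hypothetical helper, `spread_of_sample_means`, that draws many random samples and measures how much their averages vary.

```python
import random
import statistics

def spread_of_sample_means(population, sample_size, repeats=200, seed=0):
    """Draw `repeats` random samples and return the standard deviation
    of their means, a rough stand-in for sampling error."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(repeats)
    ]
    return statistics.stdev(means)

# Made-up population: the whole numbers 1 through 10,000.
population = list(range(1, 10001))

small = spread_of_sample_means(population, sample_size=10)
large = spread_of_sample_means(population, sample_size=1000)
print("spread with n=10:  ", small)
print("spread with n=1000:", large)
```

The averages from samples of 1,000 cluster much more tightly around the true mean than the averages from samples of 10. Note that a large sample reduces sampling error only; it does not fix a biased sample.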

what is the "string" data type?

numeric and non-numeric characters, example - "Bob", "I want 2 burgers"

reading: in Matlin's article, Whong states that his NYC Taxi Cab visualization is part of a larger movement for ______________________.

open data and transparency

what are the detriments of open data?

organizations don't want to share, cleaning is intensive

reading: according to Krum, what is the Picture Superiority Effect?

people remember messages with images more often than ones with only text

reading: in the article by Weisberg, Eli Pariser argues that the "Filter Bubble" is caused by?

personalization of web content

what is data?

raw, unorganized facts

reading: according to Davenport, what made Florence Nightingale's work unique?

she effectively communicated her results using a pie chart

tweets mainly coming from the Manhattan area (but not so much from New Jersey due to a power outage) created a data issue called __________.

signaling problem

what are the possible issues in data?

signaling problem, dirty data measurement, filter bubble

what are the basic principles of data visualization?

simplify, compare, attend, explore, view diversely

reading: what are Few's 8 Core Principles?

simplify, compare, attend, explore, view diversely, ask why, be skeptical, respond

reading: what type of data did Whong use in developing the visualization of an NYC cab's daily life?

spatiotemporal data

reading: what is the Value Over Replacement Burrito? what were the adjustments to the burrito data?

statistic function of quantity and quality, varied data by geographic location

what is data science?

study of the generalized extraction of knowledge from data

what is the difference between Tableau and Excel?

Tableau allows for nicer data visualizations and can use Excel files as a data source

what should analysts avoid when communicating results in an analysis?

technical and complex terminology, step-by-step methodology

what is a hypothesis?

testable, falsifiable, grounded in rationale - testable prediction based on data

what is a source of bias in data?

the "Filter Bubble" - personalized web content

reading: the Ashley Madison hack is different from previous hacks in that ___________.

the Ashley Madison hack resulted in more personal damage to users

each piece of data can be described by what?

title, description, data type, value (not part of metadata)
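
One way to picture the split between metadata and value is a small Python record. This is only a sketch: the `DataPoint` class, its `metadata` method, and the song-duration example are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class DataPoint:
    title: str         # metadata: what the data is called
    description: str   # metadata: what the data means
    data_type: str     # metadata: how the value is stored
    value: object      # the data itself, not part of the metadata

    def metadata(self):
        """Return only the fields that describe the data."""
        return {
            "title": self.title,
            "description": self.description,
            "data_type": self.data_type,
        }

# Hypothetical example: the duration of one song.
song_duration = DataPoint(
    title="duration",
    description="length of the song in seconds",
    data_type="integer",
    value=215,
)
print(song_duration.metadata())
print(song_duration.value)
```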

what is the goal of data science?

to drive actionable knowledge and not just extract meaning, avoid wasting time and resources

reading: according to Unwin, the reasons for using graphic displays are ______________.

to present and explore data

what is an example of a boolean data type?

true/false

pie charts add up to what?

unity
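
"Unity" here means the slices sum to 1, or 100%. A minimal Python sketch, using made-up counts, shows how raw counts become pie-chart shares:

```python
import math

# Made-up counts of chart types used in a report.
counts = {"bar": 5, "pie": 3, "line": 2}

# Each slice is its count divided by the total, so the slices sum to 1.
total = sum(counts.values())
shares = {name: count / total for name, count in counts.items()}

print(shares)
print("sums to unity:", math.isclose(sum(shares.values()), 1.0))
```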

how can you make graphics stand on their own?

using color, size, scale, etc. to separate certain data from the visualization

what is an infographic?

visual representation of an analysis, uses visuals to communicate information and help explain a complex process

what is big data?

volume - amount of data, variety - different sources of data, velocity - how quickly data changes

explain an example of the IIF function in Tableau.

whether milk or soda has a higher price:
- assess milk-to-soda ratio of each country
- logic: if ratio is greater than 1, milk is more expensive; if ratio is less than 1, soda is more expensive
- IIF function: IIF([milk-to-soda ratio] > 1, "milk", "soda")
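
For comparison, the same logic can be sketched in Python. The `iif` helper below is a hypothetical stand-in that mirrors the three parameters of Tableau's IIF, and the country names and ratios are made up.

```python
def iif(condition, value_if_true, value_if_false):
    """Return value_if_true when condition holds, otherwise value_if_false."""
    return value_if_true if condition else value_if_false

# Hypothetical milk-to-soda price ratios; the numbers are invented.
ratios = {"Country A": 1.4, "Country B": 0.8}

# Build a new "column" from an existing one, as a calculated field would.
more_expensive = {
    country: iif(ratio > 1, "milk", "soda")
    for country, ratio in ratios.items()
}
print(more_expensive)
```

A ratio of exactly 1 falls into the "soda" branch, both here and in the Tableau expression, so a tie would need its own condition if it matters.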

reading: Using "telephony data", the NSA cannot reveal _______________.

whether the conversation involves illegal activity

