D204 LinkedIn Quizzes

Which are examples of models used in predictive analytics?

Regression and neural networks. Predictive analytics relies on a wide range of methods, from basic linear regression through deep learning neural networks.

What does the wisdom of crowds entail?

Taking many guesses and averaging them, which tends to cancel out the errors. The wisdom of crowds adds and averages the results, cancelling out the errors and ending with a composite estimate that's generally closer to the true value than any one single guess is.
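
The averaging effect can be sketched with a small simulation. This is only an illustration: the true value, the number of guessers, and the assumption that each guess is unbiased with independent noise are all made up for the example, and `crowd_estimate` is a hypothetical helper name.

```python
import random
import statistics

def crowd_estimate(true_value, n_guessers, noise_sd, seed=0):
    """Return (crowd average, mean absolute error of the individual guesses)."""
    rng = random.Random(seed)
    # Assumption: each guess is the true value plus independent, unbiased noise.
    guesses = [rng.gauss(true_value, noise_sd) for _ in range(n_guessers)]
    crowd_mean = statistics.fmean(guesses)
    typical_individual_error = statistics.fmean(
        [abs(g - true_value) for g in guesses]
    )
    return crowd_mean, typical_individual_error

crowd_mean, individual_error = crowd_estimate(
    true_value=1000, n_guessers=500, noise_sd=200
)
print(f"crowd error: {abs(crowd_mean - 1000):.1f}")
print(f"typical individual error: {individual_error:.1f}")
```

If the guesses share a common bias, averaging will not remove it; the cancellation only applies to errors that point in different directions.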

What is the purpose of model validation in data science?

To determine how well the statistical model works with data other than the data used for modeling. Model validation is a way of checking generalizability; that is, how well the model works with new and different data.
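
One common way to check generalizability is a holdout split, sketched below with a hand-written least-squares line. The synthetic data, the 75/25 split, and the helper names `fit_line` and `mean_squared_error` are assumptions for illustration, not something the course specifies.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def mean_squared_error(xs, ys, intercept, slope):
    """Average squared gap between actual and predicted y values."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(42)
# Synthetic data (hypothetical): y = 3 + 2x plus random noise.
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [3 + 2 * x + rng.gauss(0, 1) for x in xs]

# Hold out the last 25% of cases; the model never sees them during fitting.
split = int(len(xs) * 0.75)
train_x, test_x = xs[:split], xs[split:]
train_y, test_y = ys[:split], ys[split:]

intercept, slope = fit_line(train_x, train_y)
train_mse = mean_squared_error(train_x, train_y, intercept, slope)
test_mse = mean_squared_error(test_x, test_y, intercept, slope)
print(f"train MSE: {train_mse:.2f}, held-out MSE: {test_mse:.2f}")
```

A held-out error that is much larger than the training error is a sign that the model does not generalize well.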

When you have a data set of collected data and variables, why would you run a correlation?

To find out whether there are any relationships between the observations and the variables. The real purpose of having the data is to find relationships, and then determine whether the relationships are meaningful.

Why is it imperative that you conduct a kickoff meeting before you begin planning a project?

To gain agreement on a single definition among multiple stakeholders as to what they want the project to look like. Multiple stakeholders will have different expectations of what they want, so you have to arrive at a single definition for the project.

What are two ethical elements that are important when gathering data?

to obtain informed consent, and enforce privacy. When you are gathering data from people, they need to know what you want from them and what you are going to do with it so they can make an informed decision about whether they want to participate or not. Respecting their privacy is important as well.

When people make decisions based on data science analyses, what kind of factors should they focus on?

Factors that are controllable and practical. Decisions are most productive when they focus on factors that are controllable by the client and practical, or have good return-on-investment (ROI).

What is the best use for the rules that are developed in neural networks?

For automating decision-making processes such as classification. Because neural networks can produce very accurate classifications but do so in an opaque manner, they work best when they are able to implement the decision themselves, such as classifying photos.

At its core, what is the basis of being an ethical person?

Having a concern with human well-being. At its core, being an ethical person is valuing the well-being of others.

Your boss has asked you to take on a project for an important customer. What is the best way to respond to your boss?

"Let me plan the project with a Gantt chart." Using a Gantt chart as part of the planning process is the key to making the project stronger and successful.

What percentage of time is spent doing data preparation for a data science project?

80%. The rule of thumb is 80% of the time on any data science project is typically spent preparing the data.

What is meant by a "feature" in the context of feature reduction?

A feature is a variable or dimension in the data. In data science, the term "feature" is often used synonymously with "variable" or "dimension" in the dataset.

If you can only choose one number to describe a distribution, then you should choose a measure of center. But what should you choose if you can have a second number?

A measure of variability, such as the range, quartiles, variance, or standard deviation, is usually the best choice for a second number to describe a distribution.

What is a "posterior probability" in Bayes' Theorem?

A posterior probability is the probability of the cause, such as a disease, given the effect, such as a positive medical test for the disease. Bayes' Theorem combines the probability of a hypothesis (the "prior") with the likelihood of the data given the hypothesis and the overall probability of the data to get the posterior probability, or probability of the hypothesis given the data.
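
A minimal worked example of the calculation, using made-up numbers: a 1% base rate, a test that catches 99% of real cases, and a 5% false positive rate. The function name `posterior` and all three inputs are assumptions chosen for illustration.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(disease | positive test) via Bayes' Theorem."""
    # Share of the whole population that is sick AND tests positive.
    true_positives = sensitivity * prior
    # Share of the whole population that is healthy AND tests positive.
    false_positives = false_positive_rate * (1 - prior)
    return true_positives / (true_positives + false_positives)

# Assumed numbers: 1% of people have the disease, the test catches 99% of
# real cases, and it wrongly flags 5% of healthy people.
p = posterior(prior=0.01, sensitivity=0.99, false_positive_rate=0.05)
print(f"P(disease | positive) = {p:.3f}")
```

Under these assumptions the posterior is about 17%, far below 99%, because healthy people who test positive greatly outnumber sick people who test positive when the disease is rare.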

You are creating a list of people who are stakeholders for your project. Who would be the key stakeholder?

A project sponsor is a key stakeholder who provides resources for your project, and advocates for the project from initiation to closure.

You have decided to hold two kickoff meetings. The goal for the first meeting is to have everyone agree on what the project is going to entail. What is the goal for the second kickoff meeting?

Agree to the plan and commit to it. Yes - once the plan is agreed you can make a start on the project.

How can you avoid the subjectivity in data that might provide an untrue result, even if the "lie" was unintentional?

Ask questions before accepting conclusions. This can reveal any underlying bias and subjectivity in the data.

What is meant by "autocorrelation" in time-series data?

Autocorrelation means that each point in time is influenced by the points that came before it. In time-series data, autocorrelation refers to the influence of past observations on current observations, such as yesterday's stock prices on today's stock prices.
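
As a rough sketch, lag-1 autocorrelation can be computed by correlating a series with a copy of itself shifted by one step. The two series below are simulated (independent noise versus a random walk standing in for a price), and `autocorrelation` is a hypothetical helper, not a library function.

```python
import random

def autocorrelation(series, lag=1):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    covariance = sum(
        (series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n)
    )
    return covariance / variance

rng = random.Random(1)
# Independent noise: each value ignores the one before it.
noise = [rng.gauss(0, 1) for _ in range(2000)]
# Random walk (hypothetical "price"): each value is the previous value plus a change.
walk = [0.0]
for step in noise[1:]:
    walk.append(walk[-1] + step)

print(f"noise lag-1 autocorrelation: {autocorrelation(noise):.2f}")
print(f"walk  lag-1 autocorrelation: {autocorrelation(walk):.2f}")
```

The noise series should come out near zero, while the random walk should come out close to one, since each point carries almost all of the previous point with it.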

What kind of data can be accessed with APIs?

Both proprietary and open data can be accessed with APIs, although for proprietary data you will need to include your account information.

If you had to choose between having a high level of business acumen or being an expert statistician, which choice would provide more value to your company and why?

Business acumen will let you formulate questions because they align with business strategy. It is far more valuable to a company to have someone who understands the company's strategy, and can find answers to business problems.

Why is calculus important in data science?

Calculus finds a balance between supply and demand versus production and price. Calculus is involved any time maximization or minimization is used and when a balance between these disparate demands is needed.

Self-generated data refers to what practice?

a method for creating training data for machine learning models where computers engage themselves to generate data.

What is the name for a chart that shows "branches" or cases splitting from one, giant cluster, to individual clusters?

Coming from the Greek word for "branch," a dendrogram shows the hierarchical structure of clusters.

If your boss asks you to do a project within a borderline-impossibly tight time frame, which approach would you take in answering?

Create a Gantt chart and say, "Yes, but only with adjustments." There is usually a way to do it. Before you start is when you have the most strength, and the Gantt chart is a great arguing tool!

Sherry is planning her project and has brainstormed with her team, listing out all the tasks. What is the next logical step for her to organize the tasks?

Create a WBS. Yes, organize the tasks so that she can easily check for any that have been missed.

What practice does "data scraping" refer to?

Data scraping refers to the process of extracting data from formats that were not specifically designed for data sharing. Data scraping is the creative work in getting data from formats that were not designed for data sharing, such as heat maps or image PDFs.

When you are examining a data set pertaining to recidivism that contains information on an individual's past, which ethical question must you consider?

Does the prediction of recidivism affect a person's autonomy? It is both ethical and legitimate to analyze recidivism in the context of a societal good, but not at the expense of ethical sensitivity.

Which is a key characteristic of "tidy data"?

Each column represents a variable. In tidy data, columns are variables, rows are cases, and each file consists of one sheet at one level of observation.
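
To make the idea concrete, here is a sketch that reshapes a small "wide" table into tidy form using plain dictionaries. The store and sales figures are invented, and `to_tidy` is a hypothetical helper written for this example.

```python
# Wide layout (hypothetical data): one column per year, so the variable
# "year" is hidden in the column headers.
wide = [
    {"store": "A", "2022": 120, "2023": 135},
    {"store": "B", "2022": 90, "2023": 88},
]

def to_tidy(rows, id_column, variable_name, value_name):
    """Reshape wide rows so each column is a variable and each row is a case."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue
            tidy.append(
                {id_column: row[id_column], variable_name: column, value_name: value}
            )
    return tidy

tidy = to_tidy(wide, id_column="store", variable_name="year", value_name="sales")
for record in tidy:
    print(record)
```

After reshaping, "store", "year", and "sales" are each a column, and every row is one store in one year.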

If a medical test accurately diagnoses 99% of people with a disease, then anybody who tests positive will always have a 99% chance of having the disease.

FALSE. The probability of the diagnosis given the disease may be 99%, but the probability of the disease given the diagnosis depends on several other factors.

If a table or chart is publicly available, then it is also ethical to scrape the data for use in your data science project.

FALSE. Data that is published publicly can still be proprietary, as with a newspaper graphic. Permission would need to be obtained to scrape data from the graphic.

If data is available on the Internet, then it must be open data.

FALSE. Open data is specifically marked as being free from restrictions. Just because data is on the web does not mean that it is open data or can be legally used without permission.

Which two techniques are the most common choices for dimensionality reduction?

Factor analysis (FA) and principal component analysis (PCA) are frequently used to reduce the number of variables, features, or dimensions in a dataset.

You own a florist shop where your revenue depends on how many flowers each customer buys. If the average sale for roses is a dozen, and one standard deviation is three, what does this tell you?

For two-thirds of your sales, customers will buy between nine and fifteen roses. Both add and subtract the standard deviation from the mean to arrive at the range. Two-thirds of your sales will fall in the range.
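
The arithmetic can be checked in a few lines. The "two-thirds" figure holds if purchases are roughly normally distributed, which is an assumption of this sketch rather than something the question states.

```python
from statistics import NormalDist

mean_roses, sd_roses = 12, 3
# One standard deviation below and above the mean.
low, high = mean_roses - sd_roses, mean_roses + sd_roses

# Assumes purchases are roughly normally distributed.
sales = NormalDist(mu=mean_roses, sigma=sd_roses)
share_within_one_sd = sales.cdf(high) - sales.cdf(low)

print(f"range: {low} to {high} roses")
print(f"share of sales in range: {share_within_one_sd:.1%}")
```

This gives a range of 9 to 15 roses covering about 68% of sales, which also means roughly one-third of sales fall outside that range.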

Which characteristic makes fraud detection particularly difficult?

Fraud is relatively rare. Because fraud is rare, it leads to imbalanced distributions, which can make modeling more difficult.

What is currently the most significant privacy law within data science?

GDPR. The European Union's General Data Protection Regulation is considered one of the most significant privacy laws at the moment.

You hold a kickoff meeting in which the team members want to just start the project, and work out the details as the project moves along. What should be the purpose of this first meeting?

Get everyone to define the project. The project is defined in the first meeting; otherwise, it may cost you more and possibly take longer.

Marcus is having a hard time estimating the cost of his tasks. Which action would you recommend to help him?

He should break down the tasks until he can estimate the costs. Yes, if you break the tasks down you should reach parts that you CAN estimate.

How are implicit rules different from explicit rules in data analysis?

Implicit rules cannot be easily described to humans. Explicit rules are easier to understand and monitor. Implicit rules rely on features that humans can't even detect or they may be nonsensical to humans, whereas explicit rules are much easier to understand and track.

Why should you validate your data models?

It allows for the developer to iron out the issues that might present themselves after real data is input. The principle of validating your models in data science allows for the developer to iron out issues that might present themselves after real data is input. This can be helpful in the long run to avoid redeveloping the models in data science.

Your project involves building a small office building. Why should you never use the word "ongoing" regarding the plumbing for the office building?

It does not clearly define the timeline for the plumbing contractor to complete the work. If time is not clearly specified as dates required, the contractor will likely cram in all of the work toward the end of the project.

What does an expert system do?

It is designed to mimic the decision-making process of a human domain expert. An expert system is an approach to machine decision-making in which algorithms are designed that mimic the decision-making process of a human domain expert.

Your plan estimates a project will cost more and take longer than the customer specifies. Why should you not answer "maybe" when the customer asks whether you can do it on time and within budget?

It is not a definitive no, so most customers will hear maybe as a yes. This is simply a matter of human nature, which is why you should never tell your customer maybe.

What does the saying "data is for doing" mean?

It means that data is typically gathered and analyzed to help direct what a person or company does. The phrase "data is for doing" means that data can and should be used to make decisions.

What does the idea of dimension reduction represent?

It reduces the number of variables and the amount of data that you're dealing with. The idea of dimension reduction is to actually reduce the number of variables and the amount of data that you're dealing with.

TwentySomething, Inc. is implementing a data mart server environment. James, the PM, realizes the key driver for the project is quality. What should James refrain from doing with the key driver?

Ignoring time and cost and placing all focus on quality as the key driver. Yes, because although quality is number one he shouldn't completely ignore time and cost; they will have limits too!

How do expert systems mimic the decision-making of experts?

By explicitly listing decisions and outcomes in a logical chain like a flow chart. An expert system spells out every step in a decision tree like a flow chart.

Which method can be used for feature selection?

Lasso regression is a form of regression that is particularly well-suited for identifying important features without being overly influenced by the quirks in the current dataset.

Why do data and information systems come before laws?

Laws need to be driven by specific products, processes, or events that already took place. Laws are created after something takes place, to control parameters.

Computers frequently work with data in matrices that are arranged in rows and columns. What is the name for the version of algebra that works best with matrices?

Linear algebra is the form of algebra that deals with matrices. It is used in the algorithms that computers typically use for analyzing data.

What is an example of a data analysis technique that follows a set of rules?

Linear regression is a common and powerful data analysis technique that combines many variables in an equation to predict a single outcome which follows a set of pre-established rules.

What is the purpose of a "package" in a programming language like Python or R?

Packages are collections of code that give additional functionality to programming languages and simplify many common tasks. Packages improve and expand the capabilities of a language, making it possible to do things like advanced graphics or neural networks.

What is "machine-learning-as-a-service" or "MLaaS"?

MLaaS is a way of making machine learning easier and more accessible by hosting the software on the same cloud servers that store the data. MLaaS is cloud-hosted machine learning, where the software, often with a drag-and-drop interface, is hosted on the same servers that store the data and house the processors.

Which is not a way to deal with combinatorial explosions in data science?

Microsoft Word. Microsoft Word will not assist with combinatorial explosions in data science.

How can you analyze anomalies?

Once the anomaly has been detected you can use a regression analysis. Regression analysis is one way to analyze outliers in a data set.

What does it mean that project management is a transferable skill?

Once you have mastered the skills, you can manage any project, of any size, in any industry. Once you have mastered project management skills, you can run any project, in any industry, for the rest of your career.

Saundra is getting ready to list all the manageable tasks for her project. Which method specifically utilizes a work breakdown structure (WBS)?

Organize the tasks into categories. This method is used to organize the tasks in detail, using a WBS or task tree.

What is the principle of informed consent in research?

Potential research participants have to be given enough information about the goals, methods, and applications of the research project so they can decide whether they want to participate. Informed consent means that people need to know what they're getting into: the goals, methods, and applications of the project.

In addition to trying to predict what will literally happen in the future, predictive analytics is also used to describe what kinds of analyses?

Predictive analytics also refers to models that estimate what a human judge would do if given the same task, such as categorizing photos. Predictive analytics also refers to "alternative events" or attempts to estimate how a human judge might perform a task.

What is the purpose of interpretability in data science projects?

Results that can be interpreted by humans can be used to form general principles for decision making in new situations. The purpose of interpretability is to allow humans to understand the process by which algorithms process data so they can apply those principles to new situations.

Which statement is most accurate about "one standard deviation" in a normal distribution?

Roughly one-third of data points are beyond one standard deviation from the mean. Because roughly 68% of data points are within one standard deviation of the mean, this is just another way of saying it.

What does "sensitivity" mean in the context of classification?

Sensitivity is the true positive rate, or probability of a case being assigned to a category when it should be assigned to that category. Sensitivity, also known as the true positive rate or probability of detection, is the probability of a case being assigned to a category that it should be assigned to.
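
A small sketch of why sensitivity matters when the target category is rare, as with fraud. The labels below are invented (20 fraud cases in 1,000), and `sensitivity` and `accuracy` are hypothetical helper functions written for this example.

```python
def sensitivity(actual, predicted):
    """True positive rate: share of real positives that were flagged."""
    true_pos = sum(1 for a, p in zip(actual, predicted) if a and p)
    false_neg = sum(1 for a, p in zip(actual, predicted) if a and not p)
    return true_pos / (true_pos + false_neg)

def accuracy(actual, predicted):
    """Share of all cases that were labeled correctly."""
    return sum(1 for a, p in zip(actual, predicted) if a == p) / len(actual)

# Hypothetical labels: 1 = fraud, 0 = legitimate. Only 2% of cases are fraud.
actual = [1] * 20 + [0] * 980
always_legit = [0] * 1000                      # never flags anything
catches_most = [1] * 15 + [0] * 5 + [0] * 980  # flags 15 of the 20 frauds

print(accuracy(actual, always_legit), sensitivity(actual, always_legit))
print(accuracy(actual, catches_most), sensitivity(actual, catches_most))
```

The model that never flags anything still scores 98% accuracy but has a sensitivity of zero, while the second model reaches a sensitivity of 0.75. Accuracy alone can hide a complete failure to detect the rare category.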

Data science companies can be fined for violating the European Union's General Data Protection Regulation (GDPR) even if they complied with their own country's privacy laws.

TRUE. A company may comply with local privacy laws but still be held liable for GDPR violations; they only need interaction with residents of the EU.

A machine learning algorithm can implement complex results directly without having to understand the meaning of the data.

TRUE. In applications like natural language translation and recommendation engines, algorithms are able to send the results directly to users without having to understand the data or explain the predictive process.

Data analytics does not have to be complex and confusing. If you have a command of _____, you can help your company identify patterns that can be explored further.

The simple summary statistics of mean, median, mode, and standard deviation can reveal trends for someone who is not an expert statistician.
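
These four statistics are all available in Python's standard `statistics` module. The order counts below are invented for illustration.

```python
import statistics

# Hypothetical daily order counts, including one unusually busy day.
orders = [12, 15, 12, 18, 40, 14, 12, 16]

mean_orders = statistics.mean(orders)
median_orders = statistics.median(orders)
mode_orders = statistics.mode(orders)
stdev_orders = statistics.stdev(orders)  # sample standard deviation

print("mean:  ", mean_orders)
print("median:", median_orders)
print("mode:  ", mode_orders)
print("stdev: ", round(stdev_orders, 2))
```

Here the mean (17.375) sits well above the median (14.5) because the single 40-order day pulls it upward, which is the kind of pattern worth exploring further.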

Why is self-generated data important?

The algorithms in self-generated data can engage themselves to create the data they need for machine learning algorithms. Self-generated data is important because you need data to train your machine learning algorithms so they can determine the best way to proceed or how to categorize something.

What is potentially a major disadvantage of using in-house data?

The data that you need for your project may not already exist in your organization. In-house data can be enormously helpful when it exists, but the data you need may simply not be there.

What is most likely to cause a delay and adversely impact a company's deadline in performing a data analysis of a particular business problem?

The data you have access to is not clean. If the data is not clean it can cause an unforeseen delay.

Why is data preparation important in a data science project?

The information you're going to get from your analysis is only as good as the information that you put into it. Garbage in, garbage out. The information you're going to get from your analysis is only as good as the information that you put into it.

Along with having your project team develop a list of project tasks, why else should you have a brainstorming session?

The members of your team will feel they are involved, and will buy into the project at the start. Involving your project team in this way should invite them to share all of their ideas.

How does the "combinatorial explosion" make optimization difficult?

The number of possibilities increases so fast that it is often not possible to test all possible arrangements. Because combinations (and permutations) grow so fast, the possibilities can be overwhelming.
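
The growth is easy to see by counting. This sketch uses the standard `math` module; the choice of 50 candidate features and subsets of 10 is an arbitrary example.

```python
import math

# The number of possible orderings (permutations) of n items is n!.
for n in (5, 10, 15, 20):
    print(f"{n} items -> {math.factorial(n):,} orderings")

# Number of ways to pick 10 features out of 50 candidates.
subsets = math.comb(50, 10)
print(f"{subsets:,} possible 10-feature subsets")
```

Going from 10 items to 20 items takes the count of orderings from a few million to more than two quintillion, which is why exhaustive testing quickly becomes impractical.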

Why are the triple constraints of quality, cost, and time often referred to as the "Iron Triangle"?

The triple constraints are not negotiable. Because none of the triple constraints are negotiable, they are considered to be "iron."

What is the wisdom of crowds also known as?

The wisdom of crowds is also known as the Central Limit Theorem.

What is the position of US courts about extraterritorial reach of American law regarding privacy in data usage?

There is a presumption against extraterritorial application of US law. US law considers the physical boundaries of the United States, and will not allow the government to reach beyond these boundaries.

According to the video, why are spreadsheets so important to data science?

They are the "universal data container." Spreadsheets are installed on billions of computers and they serve as the starting or finishing point for innumerable data projects.

How are programming languages important in data science?

They give you immense control over your work and data science. Programming languages give you immense control over your work and data science, while allowing you to expand their functionality.

Which type of stakeholder mapping technique would best be used for managing change and communication?

This technique groups stakeholders by their direction of influence, and can be particularly helpful for managing change and communications.

According to the example calculation in the video, what information do you have to have in order to use calculus?

A function that describes the relationship between price and sales. In order to use calculus to find the best price for maximizing revenue, you must first have a formula that says how sales are related to price.
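
A sketch of the idea with an assumed linear demand curve. The coefficients (1,000 units at a price of zero, 20 fewer units per extra dollar) are invented, and the video's actual formula may differ.

```python
def units_sold(price, base_demand=1000.0, drop_per_dollar=20.0):
    """Assumed linear demand curve: sales fall as price rises."""
    return base_demand - drop_per_dollar * price

def revenue(price):
    """Revenue is price times the units sold at that price."""
    return price * units_sold(price)

# Revenue = p * (a - b*p), so d(revenue)/dp = a - 2*b*p.
# Setting the derivative to zero gives the revenue-maximizing price p = a / (2*b).
best_price = 1000.0 / (2 * 20.0)

print(f"best price: {best_price:.2f}, revenue: {revenue(best_price):,.2f}")
print(f"one dollar lower: {revenue(best_price - 1):,.2f}")
print(f"one dollar higher: {revenue(best_price + 1):,.2f}")
```

With these assumed numbers the best price is 25, and moving a dollar in either direction lowers revenue. Without the demand function there would be nothing to differentiate.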

While it is possible to gather vast amounts of data through passive collection, researchers still need to be concerned about representativeness. Why does this matter?

Without representative data from a wide range of respondents in diverse situations, the results will not generalize well. Representative data is generalizable data. If you intend to use your algorithm widely, then you need to make sure you have data from a wide range of people to make sure you cover the use cases.

Your customer has not specifically stated that money is their key driver, although you suspect that it is. Which question can you ask your customer to determine whether, in fact, money is their key driver?

Would you like us to spend a little bit more in order to add some great extra features? If you offer additional features but the customer declines, then this tells you the customer's key driver is money.

If you are a project manager and you crash a project, what are you doing?

You are taking steps to condense time if the projected time for the project is longer than acceptable. Crashing a project, or condensing the time required for completion, is a process of identifying ways that you can speed up the project.

Why is working on projects considered by some people as the best part of any job?

You can do something new every time. Most people enjoy doing something new, rather than doing the same tasks repeatedly.

What is an advantage of passive data collection?

You can obtain a lot of data. An advantage of passive data collection is that you can get enormous amounts of data very quickly by simply setting up the procedure and then letting it roll, either automatically, or outsourcing it to the people who use it.

What is potentially a major advantage of using in-house data?

You may be able to talk with the people who created the datasets. In addition to typically being the fastest way to start, the creators of in-house data may be available for consultation.

To properly use data analytics tools, which two things must you always do?

You must understand the data, and you must focus on the questions you want to have answered. Remember, the only reason you are using data analytics is to have data that answer a business question.

What is a major advantage of understanding the algebra behind data science procedures?

You will better understand how to diagnose problems and respond when things don't work as expected. Data doesn't always match the assumptions and requirements of algorithms, so things can go wrong. Understanding the algebra behind the algorithms can help you respond to problems intelligently.

What is human-in-the-loop decision making?

Algorithms that can make and implement their own decisions, as long as humans are ready to take over. A good example of this is Level 2 self-driving cars, which can make a lot of driving decisions but still need human involvement.

Too much granularity leads to _____.

An unwieldy plan. Yes, too much detail becomes too hard to see clearly.

What is a unique attribute of trend analysis?

Autocorrelation. The idea is that every value is influenced more or less by the previous values, which is a unique attribute of trend analysis.

Which is not a characteristic of machine learning?

Being very costly, which makes it less available to people. Machine learning programs are accessible to a majority of people and vary in cost.

What are the two values involved in being an ethical person?

Caring for one's own well-being and the well-being of others. These two values are at the heart of being an ethical person, because one needs to have the character of a good person.

Which type of analysis does not represent a type of cluster analysis?

Check point analysis. In check point analysis you are looking for a substantial qualitative change over time, which is a form of trend analysis, not cluster analysis.

What is one of the most important tasks that data science algorithms perform?

Classifying data. Classifying is one of the most important tasks that data science algorithms perform, and they do it on all kinds of data.

Andre is part of a data science team. He is good at creating visualizations and reports. Which role on the team is best for him?

Data analyst. This role is responsible for gathering and scrubbing data, and then displaying the data in visuals and reports.

What are the defining characteristics of open data?

data that is free to use with no cost and no restrictions

Perhaps the best data oath to fully apply to ethical obligations is one that includes a supererogatory claim. What does this mean?

Doing more with data work than a corporate bottom line. This supererogatory claim encourages people to do good with their data work, beyond corporate bottom lines.

In what area of responsibility does the project manager turn insight into something actionable?

Enforcing learning. This responsibility of the project manager includes taking what was learned from the team's insights and making them actionable.

Why is the critical path considered critical?

Any delay in the critical path delays the project finish date. Yes, it used to be called the Time Critical Path - so the tasks are critical to the finish time.
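
The critical path is the longest chain of dependent tasks, which can be sketched as below. The task names, durations, and the helper `critical_path` are all hypothetical and chosen only to illustrate the idea.

```python
# Hypothetical tasks: name -> (duration in days, prerequisite tasks).
tasks = {
    "collect data": (3, []),
    "clean data": (5, ["collect data"]),
    "build dashboard": (2, ["collect data"]),
    "train model": (4, ["clean data"]),
    "write report": (2, ["train model", "build dashboard"]),
}

def critical_path(tasks):
    """Return (project length, tasks on the longest dependency chain)."""
    finish = {}  # earliest finish time of each task
    via = {}     # the prerequisite that determines each task's start

    def earliest_finish(name):
        if name not in finish:
            duration, prereqs = tasks[name]
            start = 0
            via[name] = None
            for p in prereqs:
                # A task can only start after its slowest prerequisite ends.
                if earliest_finish(p) > start:
                    start, via[name] = finish[p], p
            finish[name] = start + duration
        return finish[name]

    # The task that finishes last marks the end of the critical path.
    last = max(tasks, key=earliest_finish)
    project_length = finish[last]
    path = []
    while last is not None:
        path.append(last)
        last = via[last]
    return project_length, path[::-1]

length, path = critical_path(tasks)
print(length, path)
```

In this example the project takes 14 days and the critical path runs through collecting data, cleaning data, training the model, and writing the report. "build dashboard" finishes on day 5 but is not needed until day 12, so it can slip by up to 7 days without moving the finish date, whereas any slip on the critical path moves it.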

Why is Application so important in the Issue, Rule, Application, Conclusion (IRAC) analysis?

Applying the law to a particular situation will lead to the final outcome of the situation. Only when you do this will you be able to reach an ultimate conclusion about whether or not a law has been violated.

According to the video, when processes are conducted by computers and shared directly with other machines, as with the Internet-of-Things, then the decisions can be described in what way?

Machine-centric. Decisions by machines and for machines, as with a smart thermostat or a city's smart grid, can be called "machine-centric."

For a project to have success, who must engage in the project work and outcomes?

The project stakeholders. Engaging stakeholders in a productive and purposeful way will ensure your project work is successful, and the project results bring value.

