IBM Data Science

Which of the following is an example of open source visualization and plotting tool or tools?

a. Matplotlib b. Pixiedust c. OpenCV d. All of the above

Consider the following diagram: [1 blue and 3 red in net] Given that red fish is relevant data (signal) and blue fish is irrelevant data (noise), what is the precision of this system?

0.75

Logistic regression looks like an S curve. Which of the following activation functions describes the S curve in a logistic regression?

Sigmoid operation
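The S curve named in this card can be sketched in a few lines. This is a minimal illustration, not course material: the function name `sigmoid` and the sample inputs are assumptions chosen for the example.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic (sigmoid) function: maps any real number into the range 0..1."""
    # Branch on the sign of z so math.exp never receives a large positive value.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Illustrative inputs only: values rise slowly, then steeply near 0, then flatten.
for z in (-4, -2, 0, 2, 4):
    print(z, round(sigmoid(z), 3))
```

The output climbs from near 0 to near 1 and crosses 0.5 at z = 0, which is the S shape the question refers to.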

Which of the following is one of the most fundamental characteristics of a data scientist?

b. Having a sense of curiosity about all things

The Profile view, under the Refinery tab of Watson Studio, is designed to present you with which of the following pieces of information?

Frequency and statistics

What makes a deep learning network "deep"?

It is a multilayer perceptron with many 'hidden' layers

When using Jupyter Notebooks, inevitably, you will need to import libraries such as NumPy and SciPy. Which of the following integration layers best describes this kind of an activity?

Scientific computing and statistics packages

If you are looking for a tool that is easy to learn and very flexible in what you can render, which of the following is the best fit for your needs?

Tableau

A particular machine learning model has detected 80 true positive signals plus 20 false positive signals (items it included as relevant data that are not actually relevant). What is the precision of the system?

a. 80%
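The arithmetic behind this answer is precision = true positives / (true positives + false positives). A minimal sketch follows; the function name `precision` is an assumption for illustration, and the inputs are the counts given in the precision cards of this set.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Share of the items flagged as relevant that really are relevant."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("precision is undefined when nothing was flagged")
    return true_positives / flagged

print(precision(80, 20))  # 80 true positives, 20 false positives -> 0.8
print(precision(3, 1))    # net with 3 red fish and 1 blue fish -> 0.75
print(precision(1, 0))    # net with 1 red fish only -> 1.0
```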

When training models, you would typically place your data into three buckets: train, test and hold out. What is the purpose of having hold out data?

a. A holdout sample is a part of the data you leave out of the model building so it can be used to evaluate the model afterward b. A holdout sample helps you compare models and ensures that you can generalize results to data that the model has not yet seen. c. Working with a holdout sample helps you pick the best-performing model d. All of the above are true.
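One way to picture the three buckets is a shuffled three-way split. This is only a sketch: the helper name `three_way_split`, the 60/20/20 proportions, and the fixed seed are assumptions for the example, not values from the course.

```python
import random

def three_way_split(rows, train_frac=0.6, test_frac=0.2, seed=0):
    """Shuffle rows and cut them into train, test, and holdout buckets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = round(len(shuffled) * train_frac)
    n_test = round(len(shuffled) * test_frac)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    holdout = shuffled[n_train + n_test:]  # never touched while building the model
    return train, test, holdout

train, test, holdout = three_way_split(range(100))
print(len(train), len(test), len(holdout))
```

The holdout bucket is set aside until the end, so it can stand in for data the model has not yet seen.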

Consider the following diagram: [1 red in net] Given that red fish is relevant data (signal) and blue fish is irrelevant data (noise), what is the precision of this system?

b. 100%

Which of the following are examples of unstructured data? Select all that apply.

b. Facebook images d. Twitter feeds

A spam collection engine has quarantined messages that were not spam, were not unsolicited, and were important to the user. How would you characterize those important yet automatically removed messages?

b. False positive

Supervised learning has many advantages. Which of the following may be a shortcoming of supervised learning?

b. Labeling the data is arduous and expensive.

Let's say you want to predict how much salary one would earn based on level of education. Your y-axis is salary and your x-axis is educational buckets (high school, bachelor's, master's, and so forth). Which of the following models is best suited to help you predict, given a certain salary, what the education level of the individual might be?

b. Linear regression

Which of the following best describes a Decision Tree Classifier?

b. Maps observations about an item (represented in the branches) to conclusions about the item's target value (represented in the leaves).

Linear regression tries to fit a line while ___________ the distance to each point. Fill in the blank.

b. Minimizing
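To make "minimizing the distance" concrete, here is a sketch of ordinary least squares. It is one common way to fit the line, and it minimizes the sum of squared vertical distances from the points to the line. The function name `fit_line` and the sample points are assumptions for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form solution that minimizes the sum of squared residuals.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up points that lie exactly on y = 2x + 1.
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)
```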

The Watson Jeopardy! game used _____________ machine learning. Fill in the blank.

b. Supervised

One of the fundamentals of visualization of data lies in the human psychology of how it is perceived such as: similarity, proximity and enclosure. Which of the following best describes the notion of proximity:

b. The human eye perceives elements to be related based on how close they are to one another.

The eight data science methodology approaches can be viewed as two larger groupings, the second grouping comprises: train, validate, deploy models and the feedback environment. How is this second grouping different in overall approach from the first grouping (business understanding, exploration, transformation and visualization of data)?

c. The second grouping addresses predictive and prescriptive analytics, whereas the first grouping addresses descriptive analytics.

Which of the following algorithms is used for supervised learning?

d. Support Vector Machines

A network graph displays nodes that are connected and positioned depending on their mutual relationship. What type of data is best suited for network graphs?

d. Multidimensional data

In October 2015, AlphaGo, an AI-powered system, beat Mr. Fan Hui, the reigning three-time European champion of the complex board game Go, by 5 games to 0. Which machine learning method did it use?

Reinforcement

The Communities tab of Watson Studio provides which of the following artifacts?

All

Data visualization comes in two broad categories. Which of the following depicts this distinction?

Exploratory versus explanatory visualization

When working with Data Refinery in Watson Studio, you are presented with three tabs: Data, Profile and Visualization. What is the purpose of the Profile view?

In the Profile view, the user can validate the data to see whether any features need further refinement in Data Refinery.

In Watson Studio, when you upload your csv file, you are presented with two data frame constructs that you can apply to your raw data. Which of the following depicts those data frames?

Pandas and SparkSession

There are many ideas as to why some data scientists prefer Python over R. Which of the following seems to be the prevailing argument that favors Python over R?

Python is a more generalized language versus R which is more statistics focused.

As a data journalist, which of the following skills is most germane to your role?

a. Communication skills

What is meant by 'pure subset' when working with decision trees? Select all that apply.

a. All attributes of a leaf had yes for answer. b. All attributes of a leaf had no for answer. d. The leaf cannot be divided any further.
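A pure subset can be checked by looking at the class labels that reach a leaf: if they are all the same, the leaf is pure and its entropy is 0. The sketch below uses "yes"/"no" labels to match the card; the function names `is_pure` and `entropy` are assumptions for the example.

```python
import math
from collections import Counter

def is_pure(labels) -> bool:
    """A leaf is pure when every example in it carries the same label."""
    return len(set(labels)) <= 1

def entropy(labels) -> float:
    """Shannon entropy in bits; 0 means pure, 1 means an even two-way mix."""
    total = len(labels)
    return sum(-(count / total) * math.log2(count / total)
               for count in Counter(labels).values())

print(is_pure(["yes", "yes", "yes"]), entropy(["yes", "yes", "yes"]))
print(is_pure(["yes", "no"]), entropy(["yes", "no"]))
```

A pure leaf gives the tree nothing further to split on, which is why it cannot be divided any further.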

Hadley Wickham is known for saying "Tidy datasets are all alike, but every messy dataset is messy in its own way." Which of the following statements supports this assertion? Select all that apply.

a. Avoid redundancy, logical errors, or issues with updates. b. Complement programming languages' ability to perform vectorized operations. c. Ensure Boolean values are encoded appropriately.

Consider the following scenario: you want to discover why certain employees leave and others stay. You have access to a CSV file that contains columns (features) for metrics such as distance from home and age, and other categorical information such as gender, level of education, marital status, and so forth. If you were to choose a model to study the problem of employee attrition, which of the following would be the best fit?

a. Binary classification

How is isotonic regression different from a linear regression?

a. By fitting a free-form line to the observations; and the fitted free-form line must be non-decreasing everywhere.
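The non-decreasing constraint can be illustrated with the pool adjacent violators algorithm, a standard way to compute a least-squares isotonic fit. This is a sketch under assumptions: the function name `isotonic_fit` is hypothetical, the observations are taken to be already ordered by x, and all points carry equal weight.

```python
def isotonic_fit(ys):
    """Least-squares non-decreasing fit to ys via pool adjacent violators."""
    blocks = []  # each block holds [sum of values, number of values]
    for y in ys:
        blocks.append([float(y), 1])
        # While the previous block's mean exceeds the last block's mean,
        # the order is violated: pool the two blocks into one.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)  # every point gets its block mean
    return fitted

print(isotonic_fit([1, 3, 2, 4]))  # [1.0, 2.5, 2.5, 4.0]
```

Unlike a straight line, the result is a step-like curve that only ever stays flat or rises.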

Data scientists and data engineers often access RDBMS databases to retrieve data. Which of the following specific tasks is an example of such tasks?

a. Data scientists access the data via SQL or language-specific libraries. b. Data engineers perform a task called ETL (Extract, Transform, Load) where they take data from one source and move it to another. c. Use of NoSQL, since it is best for high latency and JSON based storage d. All of the above.

Which of the following is a true statement?

a. Data scientists transform data into knowledge to solve business problems. b. Data journalists capture domain knowledge for successful business alignment. c. Data engineers architect how data is organized and ensure operability. d. All of the above

Should you choose a multiclass classification tree in Watson Studio, which of the following estimators (algorithms) are available to you?

a. Decision tree classifier b. Random forest classifier c. Naive Bayes d. All of the above

Sometimes we do not have access to the entire data set (population) and we have to infer our conclusions using sample data. Which of the following approaches addresses working with sample data to conclude about the population?

a. Inferential statistics

Descriptive tables share which of the following characteristics?

a. Measures of Central Tendency b. Measures of Dispersion c. Measures of Distribution d. All of the above

When transforming messy data to tidy data, which of the following is a good practice?

a. Multiple variables are stored in one column. b. Variables are stored in both rows and columns. c. Multiple types of observational units are stored in the same table. d. All of the above

Business understanding is the first part of your analytics journey. Which of the following come to mind when you are planning your business approach?

a. Perform demand planning and supply chain optimization for your offerings across different segments b. Reduce costs

If you had to describe a Naïve Bayes classifier, which of the following would apply? Select all that apply.

a. Prior probabilities are based on previous experience. b. The classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. c. It is particularly suited when the dimensionality of the inputs is high.

Select all that apply to the characteristics of data:

a. Volume b. Variety

You can flag missing observations using a machine learning (ML) model. Not all models address missing data equally. Which of the following statements is true regarding using ML models to flag missing data?

a. Regression models handle summary statistics better. b. Tree based models handle outliers better.

If you had to choose one overarching difference between these methodologies in Question 19, which of the following would best depict that difference in approach?

a. Unlike KDD and SEMMA, CRISP-DM considers business understanding.

The biggest risk of overfitting data is that the model will work well on training data but perform poorly on new data. What should be done to mitigate that problem? Select all that apply.

a. Use hold out data to evaluate the performance of the model on new data. b. Do not use hold out data to select model.

Decision trees, support vector machines, and naive Bayes are different techniques to solve a _____________ problem. Fill in the blank.

c. Classification

If you are building a deep learning ecosystem, which of the following two concerns should be your starting points?

c. Ensure that I have access to a robust platform as a service plus access to deep learning frameworks.

The data science methodology includes the following stages (fill in the missing stages): business understanding, data exploration and preparation, data representation and transformation, ________________, validate data models, ______________, and environment feedback.

c. Train data models, deploy data models

The Brunel project defines a highly succinct and novel language that defines interactive data visualizations. Which of the following statements is true?

d. All of the above.

