Tools For Data Science (Course 2)


True or false? RStudio supports development in Python.

True

Which statement is true about the RStudio IDE? a. RStudio is free and open source. b. RStudio is a commercial product of IBM.

a. RStudio is free and open source.

What does the "BI" in BI Tools stand for?

Business intelligence

What is a Jupyter Notebook kernel?

It is a wrapper running on the Jupyter server encapsulating the programming language interpreter.

Fill in the blank: ________________ is the heart of every organization.

Data

Fill in the blank: Auto Classification node can be used for data with ______________. a. No target variables b. A categorical target variable c. A continuous target variable d. Any target variable e. All of the above

b. A categorical target variable

Fill in the blank: When sharing a read only version of a notebook, you can choose to share __________________. a. A permalink b. All content, excluding sensitive code cells c. All content including code d. Only text and output e. All of the above

e. All of the above

Which command is used to install packages in R?

install.packages("package name")

Which statements are true about Open Source and Free Software? (Select all that apply.) a. Free Software and Open Source can be used interchangeably. b. Free Software can always be run, studied, modified and redistributed with or without changes. c. Most of Free Software licenses also qualify for Open Source. d. Open Source Software can be modified without sharing the modified source code depending on the Open Source license.

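Unconfirmed. A, B, and D together were marked incorrect. The likely answer is b, c, and d, since Free Software and Open Source are related but not interchangeable terms.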

Which tool unifies documentation, source code and data visualizations into a single document?

Jupyter Notebooks / JupyterLab

Which of the following are SQL databases? (Select all that apply.) a. MongoDB b. MariaDB c. MySQL d. PostgreSQL e. CouchDB f. Oracle

MariaDB, MySQL, PostgreSQL, and Oracle

Is it possible to use machine learning within a web browser with Javascript?

Yes

Fill in the blank: Auto Numeric node can be used for data with __________________. a. No target variables b. A categorical target variable c. A continuous target variable d. Any target variable e. All of the above

c. A continuous target variable

Classification models can be used to determine whether: a. An image contains a dog b. A video contains a specific sound c. An email is likely spam d. All of the above

d. All of the above

Fill in the blank: PMML, PFA, and ONNX are __________________. a. Robots that are planning to take over the planet. b. Codes for getting rid of undesired data or models c. Abbreviations for machine learning algorithm names d. Open standards for predictive model serialization, exchange, and deployment e. Passwords for some super-secret system

d. Open standards for predictive model serialization, exchange, and deployment

Predictive Model Markup Language (PMML) was created by which entity? a. IBM b. Oracle c. Microsoft d. The Data Mining Group e. SPSS

d. The Data Mining Group

Which feature in Watson Studio helps to keep track of and discover relevant Machine Learning assets? a. OpenScale b. AutoAI c. Modeler Flows d. Watson Knowledge Catalog e. All of the above

d. Watson Knowledge Catalog

Fill in the blank: On the environments tab you can define the _________________. a. Runtime configuration for flow editor b. Software configuration c. Runtime configuration for notebook editor d. Hardware size e. All of the above

e. All of the above

Which of the following are common tasks in data science? a. Data Management b. Model Monitoring and Assessment c. Data Integration and Transformation d. Model Building e. Data Visualization f. Model Deployment g. All of the above

g. All of the above

Which of the following languages can be used for data science? a. Julia b. Java c. Javascript d. SQL e. R f. Scala g. All of the above

g. All of the above

What does the acronym API stand for?

Application Programming Interface

Which statement about JupyterLab is correct? a. JupyterLab can run Python code only. b. JupyterLab can run R code only. c. JupyterLab can run R and Python code only. d. JupyterLab can run R and Python code in addition to other programming languages.

d. JupyterLab can run R and Python code in addition to other programming languages.

True or False: The Jupyter Notebook kernel must be installed on a local server.

False

What is the best process for contributing a bug fix to a foreign repository?

Fork the repository, update the fork and create a pull request.

Which of the following functions does RStudio unify? (Select all that apply.) a. Storing of data. b. Editing and execution of source code. c. Display of the R Console. d. Visualization of plots. e. Visualization of data in table form.

b. Editing and execution of source code. AND c. Display of the R Console. AND d. Visualization of plots. AND e. Visualization of data in table form.

Fill in the blank: If you'd like to schedule a notebook in Watson Studio to run at a different time you can create a(n) _____________.

job

What tool do most R developers use?

RStudio

Which are the two most used open source tools for data science?

RStudio and Jupyter Notebooks / JupyterLab

True or false: "R integrates well with other computer languages like C++, Java, C, .NET, and Python."

True

True or false? Jupyter Notebooks / JupyterLab support development in R.

True

Which statement is true about Jupyter Notebooks? a. Jupyter Notebooks are free and open source. b. Jupyter Notebooks are a commercial product of IBM.

a. Jupyter Notebooks are free and open source.

Which of the following are data management tools? (Select all that apply.) a. GitHub b. MySQL c. PostgreSQL d. KubeFlow e. PixieDust

b. MySQL and c. PostgreSQL

Which of the following is not a type of Machine Learning? a. Supervised learning b. Supervised teaching c. Reinforcement learning d. Unsupervised learning

b. Supervised teaching

Which node must be used in Modeler flows before any modeling node? a. Output node b. Type node c. Derive node d. Auto Numeric node e. All of the above

b. Type node

Fill in the blank: It's a best practice to remove or replace _____________ before publishing to GitHub.

credentials
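
One common way to keep credentials out of a notebook in the first place is to read them from environment variables rather than typing them into a cell. The following is a minimal sketch of that idea; the variable name `MY_SERVICE_API_KEY` and both helper functions are hypothetical and not part of the course material.

```python
import os


def load_api_key(var_name="MY_SERVICE_API_KEY"):
    """Read a secret from the environment instead of hard-coding it.

    The variable name is a hypothetical placeholder, not a real service setting.
    """
    key = os.environ.get(var_name)
    if key is None:
        raise RuntimeError(f"Set the {var_name} environment variable before running.")
    return key


def mask(secret, visible=4):
    """Return a masked form of a secret that is safe to show in notebook output."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]
```

With this pattern the published notebook contains only the variable name, never the secret itself.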

Which products (of those we covered) allow you to build data pipelines using graphical user interface and no coding? a. Only IBM SPSS Statistics b. Only IBM SPSS Modeler c. OpenScale d. IBM SPSS Modeler and Modeler Flows in Watson Studio e. All of the above

d. IBM SPSS Modeler and Modeler Flows in Watson Studio

IBM SPSS Modeler includes what kind of models? a. Classification models (for data with a categorical target) b. Regression models (for data with a continuous target) c. Clustering models (for data with no target variables) d. Other kinds of models e. All of the above

e. All of the above

Fill in the blank: In Watson Studio, a ____________ is how you organize your resources to achieve a particular goal. Resources can include data, collaborators, and analytic assets like notebooks and models.

project

True or False: Open data is always distributed under a Community Data License Agreement.

False

Fill in the blank: Before running a notebook, it's a best practice to ____________ to describe what the notebook does.

Insert a cell at the top of the notebook

What tool do most Python developers use?

Jupyter Notebooks or JupyterLab

OpenScale provides which of the following services? a. Creating SPSS syntax b. Automatic finding of optimal data preparation steps, model selection, and hyperparameter optimization c. Cataloging data and model assets d. Monitoring for fairness, bias, and model drift e. All of the above

d. Monitoring for fairness, bias, and model drift (options b and e were marked incorrect)

AutoAI provides which of the following services? a. Monitoring for fairness, bias, and model drift b. Automatic finding of optimal data preparation steps, model selection, and hyperparameter optimization c. Cataloging data and model assets d. Creating SPSS syntax e. All of the above

b. Automatic finding of optimal data preparation steps, model selection, and hyperparameter optimization

Which of these is not a machine learning or deep learning library for Python? a. Keras b. NumPy c. PyTorch d. Scikit-learn

b. NumPy

Which of these is a database query language? a. Julia b. SQL c. Python d. All of the above

b. SQL
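
To make the distinction concrete, here is a small sketch in which Python acts only as the host language and the query itself is written in SQL. It uses the standard-library `sqlite3` module with an in-memory database; the `cars` table and its rows are made-up illustration data.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cars (model TEXT, price REAL)")
conn.executemany(
    "INSERT INTO cars (model, price) VALUES (?, ?)",
    [("hatchback", 7500.0), ("sedan", 12000.0), ("wagon", 9800.0)],
)

# The SQL query: keep rows under a price limit and sort them by price.
rows = conn.execute(
    "SELECT model, price FROM cars WHERE price < ? ORDER BY price", (10000,)
).fetchall()
conn.close()
```

The `SELECT ... WHERE ... ORDER BY` statement is the database query; Python only sends it and collects the result.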

What feature of IBM SPSS Statistics allows easy saving and modifying of previous tasks? a. Charts b. Graphical user interface c. SPSS syntax d. SPSS Modeler streams e. All the above

c. SPSS syntax (options b, d, and e were marked incorrect)

Which are the three most used languages for data science?

R, Python, SQL

Fill in the blank: The MAX model-serving microservices expose a _________________ that applications use to consume a model.

REST API
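
As a rough illustration of what "consuming a model over a REST API" can look like from the application side, the sketch below builds a JSON POST request with the Python standard library without sending it. The base URL, the `/model/predict` path, and the payload shape are assumptions for illustration; check the documentation of the specific model for its actual endpoint and input format.

```python
import json
import urllib.request


def build_predict_request(base_url, payload):
    """Build (but do not send) a JSON POST request to a model-serving endpoint.

    The "/model/predict" path and the payload shape are assumptions for illustration.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/model/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_predict_request("http://localhost:5000", {"text": ["hello world"]})
# Sending it would be urllib.request.urlopen(req), which needs a running server.
```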

Fill in the blank: When working in a Jupyter Notebook, before returning to a project, it's important to ________________________.

Save your notebook

Which of the following functions does RStudio provide? a. Editing and execution of R code. b. Creating relationships between data tables. c. Documenting R code applications. d. Storing data in tables.

a. Editing and execution of R code.

Which of the following statements about Jupyter Notebooks is correct? a. Jupyter Notebooks support the Visualization of data in charts. b. Jupyter Notebooks are a commercial product of IBM. c. Jupyter Notebooks are only available if installed locally on your computer. d. Jupyter Notebooks provide storage of massive quantities of data in data lakes.

a. Jupyter Notebooks support the Visualization of data in charts.

Which statement about RStudio is correct? a. RStudio is the primary choice for development in the R programming language. b. RStudio is the primary choice for web development. c. RStudio is the primary choice for development in the Python programming language.

a. RStudio is the primary choice for development in the R programming language.

Which of the following are Data Integration and Transformation tools? (Select all that apply.) a. Cassandra b. Apache Kafka c. Apache Nifi d. Apache AirFlow e. Ceph

b. Apache Kafka, c. Apache Nifi, AND d. Apache AirFlow

Which of the following statements are true? a. Git is an integrated development environment for data science. b. Git is a system for version control of source code. c. Git is very useful for data science as well since data science often involves a lot of source code to be written and managed.

b. Git is a system for version control of source code AND c. Git is very useful for data science as well since data science often involves a lot of source code to be written and managed.

Which of the following is used to make Artificial intelligence and Machine Learning possible? (Select all that apply.) a. Oracle b. PyTorch c. TensorFlow.js d. Apache Spark e. GNU f. Caffe

b. PyTorch c. TensorFlow.js (possibly others?)

Which statement about R packages is correct? a. R doesn't require any packages to be installed since it contains all the functionality a data scientist ever requires. b. R currently supports more than 15,000 packages which can be installed to extend R's functionality.

b. R currently supports more than 15,000 packages which can be installed to extend R's functionality.

Modeler flows in Watson Studio always begin with which type of node? a. A modeling node b. A type node c. A data source node d. An output node e. All the above

c. A data source node

Which scientific computing library provides data structures and data analysis tools for Python? a. YumPies b. Seahorse c. Pandas d. TensorFlow

c. Pandas

Open Neural Network eXchange (ONNX) was originally created for what models? a. Deep learning models b. Regression models c. Support vector machines (SVM) d. Decision trees e. Clustering models

A. Deep learning models

Which of the following do you need to create in order to publish a notebook to your GitHub repository?

Access token

Generally speaking, which type of model is used to predict a numerical value, such as the potential sales price of a used car?

Regression model
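
A tiny worked example may help here. The sketch below fits a straight line by ordinary least squares in plain Python and uses it to predict a price; the ages and prices are invented numbers, and a real used-car model would use more features and a library such as scikit-learn.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up training data: car age in years and sale price.
ages = [1, 2, 3, 4, 5]
prices = [18000, 16000, 14000, 12000, 10000]

slope, intercept = fit_line(ages, prices)
predicted = slope * 6 + intercept  # predicted price of a 6-year-old car
```

The output is a continuous number rather than a category, which is what separates regression from classification.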

Fill in the blank: IBM Cloud uses ______________ as a way for you to organize your account resources in customizable groupings so that you can quickly assign users access to more than one resource at a time.

Resource groups

Comma Separated Values (CSV) is a commonly used format to store:

Tabular data
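
For instance, each line of a CSV file is one table row and the first line usually names the columns. A short sketch using Python's standard `csv` module, with an invented three-column sample held in a string instead of a file:

```python
import csv
import io

# Invented sample: a header line followed by two data rows.
raw = "model,year,price\nhatchback,2015,7500\nsedan,2018,12000\n"

# DictReader maps each row to a dict keyed by the header names.
reader = csv.DictReader(io.StringIO(raw))
rows = list(reader)

# Values are read as strings, so convert before doing arithmetic.
total = sum(int(r["price"]) for r in rows)
```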

Which of the following statements about repositories are correct? (Select all that apply.) a. The remote repository is only accessible by myself. b. The local repository is only accessible by myself. c. The staging is only accessible by myself. d. The remote repository is accessible by all contributors. e. The local repository is accessible by all contributors.

b. The local repository is only accessible by myself. AND c. The staging is only accessible by myself. AND d. The remote repository is accessible by all contributors.

Which statements about IBM Watson Studio and OpenScale are correct? (Select all that apply.) a. Watson Studio together with Watson OpenScale is a database management system. b. Watson Studio together with Watson OpenScale covers the complete development life cycle for all data science, machine learning and AI tasks. c. Watson Studio together with Watson OpenScale is available as a Cloud offering as well as a package running on top of Kubernetes/RedHat OpenShift in a local data center called IBM Cloud Pak for Data.

b. Watson Studio together with Watson OpenScale covers the complete development life cycle for all data science, machine learning and AI tasks. AND c. Watson Studio together with Watson OpenScale is available as a Cloud offering as well as a package running on top of Kubernetes/RedHat OpenShift in a local data center called IBM Cloud Pak for Data.

Which of the following is NOT a deep learning framework? a. TensorFlow b. PyTorch c. Tommy d. Keras

c. Tommy

Data Refinery provides which of the following services? a. Catalog the data assets b. Monitor for bias and model drift c. Visualize and prepare data d. Automatically build models e. All of the above

c. Visualize and prepare data.

Which of the following are true about Data Asset Management? a. A crucial part of data science at the enterprise level. b. To be done effectively data must be versioned and annotated with meta data c. Also known as data governance. d. All of the above

d. All of the above

Which of the following functions do Jupyter Notebooks unify? a. Editing and display of documentation b. Visualization of charts c. Editing and execution of source code d. All of the above

d. All of the above

Watson Knowledge Catalog provides what functionality? a. Catalog all books mentioning Dr. Watson and Sherlock Holmes b. Create data and apply models into population c. Build data and water pipelines d. Catalog data and ML assets, help to find relevant assets, keep track of asset lineage, enforce data governance e. Process data, build and deploy models

d. Catalog data and ML assets, help to find relevant assets, keep track of asset lineage, enforce data governance

IBM SPSS Modeler evolved from which product? a. Oracle b. Netezza c. IBM DB2 d. Clementine e. SPSS

d. Clementine

Fill in the blank: IBM SPSS Statistics syntax can be created using ___________. a. Watson Studio Modeler flows b. IBM SPSS Modeler streams c. AutoAI d. The graphical user interface of the IBM SPSS Statistics product or the syntax editor e. OpenScale

d. The graphical user interface of the IBM SPSS Statistics product or the syntax editor

Which features of Data Refinery help save hours and days of data preparation? a. Flexibility of using user interface and coding templates enabled with powerful operations to shipping clean data b. Data visualization and profiles to spot the difference and guide data preparation steps c. Incremental snapshots of the result allowing the user to gauge success with each interactive change d. Saving, editing and fixing the steps provided ability to iteratively fix the steps in the flow e. All of the above

e. All of the above

Which of the following statements is true? a. Keras, Scikit-learn, Matplotlib, Pandas, and TensorFlow are all built with Python. b. 80% of data scientists worldwide use Python. c. Python is the most popular language in data science. d. Python is useful for AI, machine learning, web development, and IoT. e. All of the above

e. All of the above

How does Data Refinery help build repeatable Data Pipelines for workloads of almost any size? a. Not supported b. Manually write APIs to provide automation c. Feature is only available in the UI, not API d. Only sixth workload size is supported e. Create a scheduled job and use a custom environment to run the data flow/pipeline on different workloads

e. Create a scheduled job and use a custom environment to run the data flow/pipeline on different workloads

Fill in the blank: In the _____________ tab you can define the hardware size and software configuration for the runtime associated with Watson Studio tools such as notebooks.

environments

