SQL and Python
Select all of the valid math operators in SQL (select all that apply).
- + (addition)
- * (multiplication)
- - (subtraction)
- / (division)
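As a quick check of these four operators, here is a minimal sketch that evaluates each one with Python's built-in sqlite3 module. SQLite is an assumption (the cards do not name a database engine), and the division uses 7.0 because integer division rules differ between engines.

```python
import sqlite3

# In-memory SQLite database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# One expression per operator: addition, subtraction, multiplication, division.
row = conn.execute("SELECT 7 + 2, 7 - 2, 7 * 2, 7.0 / 2").fetchone()
print(row)

conn.close()
```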
Which of the following are Data Integration and Transformation tools? (Select all that apply.)
- Apache Kafka - Apache NiFi - Apache Airflow
Select the statements below that ARE NOT true of the ORDER BY clause (select all that apply).
- Can be anywhere in the select statement
IBM SPSS Modeler includes what kind of models?
- Classification models (for data with a categorical target) - Regression models (for data with a continuous target) - Clustering models (for data with no target variables) - Other kinds of models
Which of the following is an aggregate function? (select all that apply)
- COUNT() - MIN() - MAX()
What are common tasks in data science?
- Data Management - Data Integration and Transformation - Data Visualization - Model Building - Model Deployment - Model Monitoring and Assessment
Which of the following is true of GROUP BY clauses? (Select all that apply.)
- Every column in your SELECT statement may be present in a GROUP BY clause, except for aggregated calculations. - GROUP BY clauses can contain multiple columns. - NULLs will be grouped together if your GROUP BY column contains NULLs.
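The NULL-grouping behaviour can be seen in a small sketch. The orders table, its columns, and the use of SQLite through Python's sqlite3 module are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("east", 10.0), ("east", 5.0), (None, 2.0), (None, 3.0), ("west", 7.0)],
)

# region is the only non-aggregated column, so it is the one in GROUP BY.
# Both rows with a NULL region fall into a single group.
groups = conn.execute(
    "SELECT region, COUNT(*), SUM(amount) FROM orders GROUP BY region"
).fetchall()

for region, row_count, total in groups:
    print(region, row_count, total)

conn.close()
```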
Profiling data is helpful for which of the following? (Select all that apply)
- Filter out unwanted data elements - Understanding your data
When learning data science what open source tools are the most used?
- Jupyter Notebooks / JupyterLab - RStudio
Which of the following are supported in SQL when dealing with strings?
- Lower - Concatenate - Substring - Trim - Upper
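A minimal sketch of these string operations, assuming SQLite (run through Python's sqlite3 module). Function names vary by engine: SQLite spells substring as SUBSTR and concatenates with the || operator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Lower, upper, trim, substring, and concatenation on literal strings.
row = conn.execute(
    "SELECT LOWER('SQL'), UPPER('sql'), TRIM('  sql  '), "
    "SUBSTR('database', 1, 4), 'data' || 'base'"
).fetchone()
print(row)

conn.close()
```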
Which statements are true about Open Source and Free Software? (Select all that apply.)
- Most Free Software licenses also qualify as Open Source. - Open Source Software can be modified without sharing the modified source code, depending on the Open Source license.
Examples of data management tools
- MySQL - PostgreSQL
Examples of SQL Databases
- MySQL - PostgreSQL - Oracle
Filtering data is used to do which of the following? (select all that apply)
- Narrows down the results of the data - Reduce the time it takes to run the query - Removes unwanted data in a calculation
Which of the following is used to make Artificial Intelligence and Machine Learning possible? (Select all that apply.)
- PyTorch - TensorFlow.js - Apache Spark
Which are the three most used languages for data science? (Select all that apply.)
- Python - R - SQL
Which of the following languages can be used for data science?
- R - Julia - Java - JavaScript - Scala - SQL
Which tool do most Python developers use?
- Jupyter Notebooks / JupyterLab
Case statements can only be used for which of the following statements (select all that apply)?
- Select - Insert
Which of the following statements are true of Entity Relationship (ER) Diagrams?
- They show you the relationships between tables. - They are usually a representation of a business process.
Which statements about IBM Watson Studio and OpenScale are correct? (Select all that apply.)
- Watson Studio together with Watson OpenScale covers the complete development life cycle for all data science, machine learning and AI tasks. - Watson Studio together with Watson OpenScale is available as a Cloud offering as well as a package running on top of Kubernetes/Red Hat OpenShift in a local data center called IBM Cloud Pak for Data.
Select all that are true regarding wildcards (Select all that apply.)
- Wildcards take longer to run compared to a logical operator - Wildcards at the end of search patterns take longer to run
Which of the following is true regarding Aliases? (Select all that apply.)
- an alias only exists for the duration of a query - aliases are often used to make column names more readable - SQL aliases are used to give a table, or a column in a table, a temporary name
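To illustrate that an alias is temporary, this sketch renames a column and a table inside one query and then inspects the stored schema. The customers table and its cust_nm column are hypothetical, and SQLite is assumed as the engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (cust_nm TEXT)")
conn.execute("INSERT INTO customers VALUES ('Ada')")

# 'c' aliases the table and 'customer_name' aliases the column for this query only.
cur = conn.execute("SELECT c.cust_nm AS customer_name FROM customers AS c")
result_columns = [col[0] for col in cur.description]

# The table definition itself still uses the original column name.
table_columns = [info[1] for info in conn.execute("PRAGMA table_info(customers)")]

print(result_columns, table_columns)
conn.close()
```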
Select which of the following statements are true regarding inner joins. (Select all that apply)
- performance will most likely worsen with the more joins you make - there is no limit to the number of tables you can join with an inner join
Which of the following statements about Unions is true? (select all that apply)
- the columns must also have similar data types - each SELECT statement within UNION must have the same number of columns - the UNION operator is used to combine the result-set of two or more SELECT statements
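A short sketch of UNION under these rules, assuming SQLite and two hypothetical single-column tables. Both SELECT statements return one text column, and UNION drops the duplicate row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE online_customers (name TEXT)")
conn.execute("CREATE TABLE store_customers (name TEXT)")
conn.executemany("INSERT INTO online_customers VALUES (?)", [("Ada",), ("Grace",)])
conn.executemany("INSERT INTO store_customers VALUES (?)", [("Grace",), ("Linus",)])

# Same number of columns and compatible types on both sides of UNION.
rows = conn.execute(
    "SELECT name FROM online_customers "
    "UNION "
    "SELECT name FROM store_customers "
    "ORDER BY name"
).fetchall()
names = [r[0] for r in rows]
print(names)

conn.close()
```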
What type of file format is a Jupyter Notebook?
.ipynb
What type of node is used to partition the data into a training and testing set in Modeler flows?
A partition node
What type of node is used to define metadata for features in Modeler flows?
A type node
Which of the following statements is true?
All of the above
What are some of the Data Refinery abilities when working with data?
Analyzes and transforms data quickly
What does the "BI" in BI Tools stand for?
Business Intelligence
What type of model would you use if you wanted to find the relationship between dependent and independent variables?
Regression model
How does Data Refinery help build repeatable Data Pipelines for workloads of almost any size?
Create a scheduled Job and use a custom environment to run the data flow/pipeline on different workloads.
What is data governance?
Creating processes and controls around the access of data
How do a data scientist and a DBA differ in how they use SQL?
DBAs manage the database for other users
Fill in the blank: ________________ is the heart of every organization.
Data
Open Neural Network eXchange (ONNX) was originally created for what models?
Deep learning models.
A null and a zero value effectively mean the same thing. True or false?
False
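The difference between NULL and zero shows up in aggregates and filters. This is a minimal sketch assuming SQLite and a hypothetical readings table holding one zero, one NULL, and one non-zero value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (value INTEGER)")
conn.executemany("INSERT INTO readings VALUES (?)", [(0,), (None,), (4,)])

# COUNT(*) counts every row; COUNT(value) and AVG(value) skip the NULL.
total_rows, non_null, average = conn.execute(
    "SELECT COUNT(*), COUNT(value), AVG(value) FROM readings"
).fetchone()

# A zero matches '= 0'; a NULL only matches 'IS NULL'.
zero_rows = conn.execute("SELECT COUNT(*) FROM readings WHERE value = 0").fetchone()[0]
null_rows = conn.execute("SELECT COUNT(*) FROM readings WHERE value IS NULL").fetchone()[0]

print(total_rows, non_null, average, zero_rows, null_rows)
conn.close()
```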
True or False: The Jupyter Notebook kernel must be installed on a local server.
False
You are only allowed to have one condition in a case statement. True or false?
False
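A CASE expression with more than one condition might look like the sketch below. The scores table and the grade thresholds are made up for illustration, and SQLite is assumed as the engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("Ada", 95), ("Grace", 84), ("Linus", 60)],
)

# Conditions are checked top to bottom; the first match wins, ELSE is the fallback.
grades = conn.execute(
    "SELECT name, "
    "CASE WHEN score >= 90 THEN 'A' "
    "     WHEN score >= 80 THEN 'B' "
    "     ELSE 'C' END AS grade "
    "FROM scores ORDER BY name"
).fetchall()
print(grades)

conn.close()
```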
_________ filters after the data is grouped
HAVING
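To contrast WHERE (filters rows before grouping) with HAVING (filters groups after aggregation), here is a small sketch. The sales table and thresholds are hypothetical, and SQLite is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10), ("east", 20), ("west", 5), ("west", 1), ("north", 50), ("north", 0)],
)

# WHERE drops the zero-amount row first; HAVING then drops groups totalling 10 or less.
totals = conn.execute(
    "SELECT region, SUM(amount) AS total "
    "FROM sales "
    "WHERE amount > 0 "
    "GROUP BY region "
    "HAVING SUM(amount) > 10 "
    "ORDER BY region"
).fetchall()
print(totals)

conn.close()
```

The clauses appear in the order select, from, where, group by, having, with ORDER BY last.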
What type of environment is RStudio?
Integrated Development Environment (IDE)
If you can accomplish the same outcome with a join or a subquery, which one should you always choose?
Joins are usually faster, but subqueries can be more reliable, so it depends on your situation.
Which tool unifies documentation, source code and data visualizations into a single document?
Jupyter Notebooks / JupyterLab
Which statement about JupyterLab is correct?
JupyterLab can run R and Python code in addition to other programming languages.
What type of assets does the Watson Knowledge Catalog let you discover in Watson Studio?
Machine Learning
PyTorch is what type of Python library?
Machine learning
Jupyter Notebooks is the tool most R developers use.
No
Storing data in tables is a function that RStudio provides.
No
What open source tool was developed and built by statisticians?
RStudio
What tool do most R developers use?
RStudio
Which statement about RStudio is correct?
RStudio is the primary choice for development in the R programming language.
SQL is what type of database management system?
Relational
Data scientists need to use joins in order to: (select the best answer)
Retrieve data from multiple tables.
In order to retrieve data from a table with SQL, every SQL statement must contain?
SELECT
Which of these is a database query language?
SQL
Which of these is a machine learning or deep learning library for Python?
Scikit-learn
When debugging a query, what should you always remember to do first?
Start simple and break it down first
Comma Separated Values (CSV) is a commonly used format to store:
Tabular data
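As a small illustration of CSV holding tabular data, this sketch parses two rows with Python's csv module. The column names and values are invented for the example.

```python
import csv
import io

# A header line followed by one line per table row.
text = "name,language\nAda,Python\nGrace,SQL\n"

# DictReader maps each row to the column names taken from the header.
rows = list(csv.DictReader(io.StringIO(text)))

for row in rows:
    print(row["name"], row["language"])
```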
What is the difference between a left join and a right join?
The only difference between a left and right join is the order in which the tables are related.
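The effect of table order can be sketched with two LEFT JOINs that swap the tables. The second query returns what a RIGHT JOIN of the first pair would. It is written as a LEFT JOIN because some SQLite versions do not support RIGHT JOIN. The tables and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, item TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(10, 1, "book"), (11, 3, "pen")])

# customers on the left: every customer is kept, even without an order.
customers_first = conn.execute(
    "SELECT c.name, o.item FROM customers AS c "
    "LEFT JOIN orders AS o ON o.customer_id = c.id ORDER BY c.id"
).fetchall()

# orders on the left: every order is kept, even without a matching customer.
orders_first = conn.execute(
    "SELECT c.name, o.item FROM orders AS o "
    "LEFT JOIN customers AS c ON c.id = o.customer_id ORDER BY o.id"
).fetchall()

print(customers_first)
print(orders_first)
conn.close()
```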
Is the following statement true or false: R integrates well with other computer languages like C++, Java, C, .Net and Python.
True
True or false? Jupyter Notebooks / JupyterLab support development in R.
True
True or false? RStudio supports development in Python.
True
What is the most important step before beginning to write queries?
Understanding your data
Data Refinery provides which of the following services?
Visualize and prepare data.
Is Keras a machine learning or deep learning library for Python?
Yes
Is it possible to use machine learning within a web browser with JavaScript?
Yes
RStudio is the tool most R developers use.
Yes
When using the "CREATE TABLE" command and creating new columns for that table, which of the following statements is true?
You must assign a data type to each column
Fill in the blank: It's a best practice to remove or replace _____________ before publishing to GitHub.
credentials
Fill in the blank: In the __________ tab you can define the hardware size and software configuration for the runtime associated with Watson Studio tools such as Notebook.
environments
Which command is used to install packages in R?
install.packages("package name")
Fill in the blank: If you'd like to schedule a notebook in Watson Studio to run at a different time, you can create a(n) ________.
job
Which is the correct order of occurrence in a SQL statement?
select, from, where, group by, having
_________ always process the innermost query first and then work outward
Subqueries
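A minimal sketch of inner-first evaluation, assuming SQLite and a hypothetical employees table: the inner query computes the average salary, and the outer query then compares each row against that single value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Ada", 120), ("Grace", 100), ("Linus", 80)],
)

# Inner query: average salary. Outer query: employees paid more than that average.
above_average = conn.execute(
    "SELECT name FROM employees "
    "WHERE salary > (SELECT AVG(salary) FROM employees) "
    "ORDER BY name"
).fetchall()
print(above_average)

conn.close()
```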