IBM + Data Tools

Standard SQL commands, such as:

"Select", "Insert", "Update", "Delete", "Create", and "Drop", can accomplish most of what you might need to do with a database.
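As a minimal sketch of these six commands in action, the following uses Python's built-in sqlite3 module with an in-memory database. The `claims` table, its columns, and the sample rows are hypothetical and chosen only for illustration; other database systems may differ in syntax details.

```python
import sqlite3

# In-memory database, so the sketch leaves nothing behind on disk.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# CREATE: define a hypothetical table.
cur.execute(
    "CREATE TABLE claims (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)

# INSERT: add a few illustrative records.
cur.executemany(
    "INSERT INTO claims (customer, amount) VALUES (?, ?)",
    [("Ana", 1200.0), ("Ben", 450.0), ("Cara", 980.0)],
)

# UPDATE: change one record.
cur.execute("UPDATE claims SET amount = ? WHERE customer = ?", (500.0, "Ben"))

# DELETE: remove one record.
cur.execute("DELETE FROM claims WHERE customer = ?", ("Cara",))

# SELECT: read back what is left.
rows = cur.execute("SELECT customer, amount FROM claims ORDER BY id").fetchall()
print(rows)

# DROP: remove the table entirely, then list the tables that remain.
cur.execute("DROP TABLE claims")
remaining = cur.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(remaining)

conn.close()
```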

A person states, "You can use open source however you like." How should you respond to this statement about open source?

"You cannot use open source however you like because it comes with a license that the users must follow."

You can use the percent sign (%) as a:

"wildcard" that matches any sequence of zero or more characters appearing before or after the characters specified.
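The percent sign can be tried out with a short sketch like the one below, which again assumes Python's built-in sqlite3 module. The `tools` table and the helper `names_like` are hypothetical names introduced for this example. Note that SQLite's LIKE is case-insensitive for ASCII letters by default; other databases may behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tools (name TEXT)")
conn.executemany(
    "INSERT INTO tools (name) VALUES (?)",
    [("Google Sheets",), ("GitHub",), ("Git",), ("Tableau",), ("Matplotlib",)],
)


def names_like(pattern):
    """Return the tool names that match a LIKE pattern, sorted by name."""
    query = "SELECT name FROM tools WHERE name LIKE ? ORDER BY name"
    return [row[0] for row in conn.execute(query, (pattern,))]


# % after the text: anything may follow "G".
starts_with_g = names_like("G%")

# % on both sides: "lib" may appear anywhere in the name.
contains_lib = names_like("%lib%")

# % before the text: anything may come before "Hub".
ends_with_hub = names_like("%Hub")

print(starts_with_g, contains_lib, ends_with_hub)
```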

Which of the following statements are correct about open source software? Select all that apply.

***Anyone can see the source code and understand how the software works.
***Open source software allows developers to share insights, ideas, and code.
Open source software is privately available for the registered community of contributors.
***Anyone can run open source software.

Which of the following good practices did you follow when cleaning the data set?

1. Kept columns that have date and time information because they could help reveal interesting information for predictions. 2. Removed identification-type data because it's not going to be predictable for your project.
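The cards describe doing this cleaning in a graphical tool. Purely as an illustration, the same two practices could be sketched in Python with the standard csv module. The column names and sample rows below are invented for the example and are not taken from the course data set.

```python
import csv
import io

# Hypothetical sample data; the real data set's columns are not shown here.
raw_csv = io.StringIO(
    "policy_number,incident_date,claim_amount,fraud_reported\n"
    "P-1001,2015-01-25,5200,Y\n"
    "P-1002,2015-02-17,780,N\n"
)

# Identification-type columns to remove (an assumption for this sketch).
ID_COLUMNS = {"policy_number"}

reader = csv.DictReader(raw_csv)

# Keep every column that is not an identifier, including date columns.
kept_columns = [name for name in reader.fieldnames if name not in ID_COLUMNS]
cleaned_rows = [{name: row[name] for name in kept_columns} for row in reader]

print(kept_columns)
print(cleaned_rows)
```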

Fill in the blank. Your original CSV data set contains 975 rows and __ columns.

38

4 Columns

975 unique occurrences

IBM Watson Studio gives data scientists:

A streamlined process for data projects; a collaborative data science and machine learning environment; easy-to-create visualizations; access to open source tools; and a way to develop, train, manage, and deploy AI-powered applications. IBM Watson Studio is a comprehensive, all-in-one environment to explore more of what's possible with data and artificial intelligence (AI).

With Tableau, you can:

1. Analyze large volumes of data. 2. Create different dashboards, charts, graphics, maps, stories, and more to help make business decisions. 3. Perform tasks without programming experience; it offers an intuitive interface. 4. Design interactive visualizations.

Open source communities can:

Be a resource for mentorship; provide opportunities for developers to interact and showcase their subject matter expertise; and potentially help people who are beginning a coding journey advance their careers.

From the perspective of a business, open source software provides many benefits:

Businesses can increase innovation using the community model. For example, the code a project might need might already have been written and is available in an open source community. There is a potential for cost savings when developing software and documenting, testing, and fixing bugs. Starting a project or contributing to an existing project can help businesses influence the direction of technological development. Open source software can help businesses increase software security and reliability and keep pace with their competitors.

3 Tabs at the top

Data, Profile, Visualization

What are the three tabs at the top that you work with to perform operations in the data refinery tool?

Data, Profile, and Visualizations

With SQL, you can:

Execute queries against a database; retrieve data from a database; insert records in a database; update records in a database; delete records from a database; create new databases; create new tables in a database; create stored procedures in a database; create views in a database; and set permissions on tables, procedures, and views.

Complete the sentence. _______________ is software for tracking changes and controlling versions in source code during development and _______________ is an online service for hosting source code to contribute and collaborate on projects.

Git, GitHub

Which data science tool is free to use without a license and allows you to perform tasks, such as entering, analyzing, and visualizing data?

Google Sheets

Google Sheets vs. MS Excel

Google Sheets is suitable for collaborating online because multiple people can edit the same sheet at once. However, many businesses use Microsoft Excel due to statistical and data analysis features and capabilities that are not available in Google Sheets.

Which of the following data science tools can a business use to collect and clean data?

Google Sheets, SQL, Excel

In GitHub you can:

Host your own open source project. To do this, you create an online repository and add files. Contribute to an existing open source project that's public. To do this, you access a copy of the project's repository, make updates, and request a review of the changes you want to contribute. You can use Git without using GitHub, but you cannot use GitHub without using Git.

When starting out, you visit the catalog of IBM products and services for AI / Machine Learning. Which block do you select for the product that you are using?

IBM Watson Studio

Which data science tool has a data refinery function that lets you prepare and transform large amounts of raw data into high quality data that can be analyzed using a graphical interface with built-in operations?

IBM Watson Studio

Which of the following is a good practice you learned when configuring your resource and selecting a name for the Service name field?

Keep "Watson Studio" because that's the service and then add a name that's relevant to your project at the end.

You can also use MS Excel to help visualize data sets and gain insights from data:

MS Excel offers different types of charts, such as pie, bar, line, and scatter plot. You can use built-in templates and MS Excel's recommendations for a chart based on your data. You can combine different chart types on one spreadsheet in MS Excel.

Common tools to analyze and visualize data include:

Microsoft Excel, Google Sheets, Structured Query Language (SQL), Python, IBM Watson Studio, Tableau, and Matplotlib.

What is NoSQL?

NoSQL is an abbreviation for "not only SQL". While SQL lets you work with structured data in relational databases, NoSQL lets you work with unstructured data in nonrelational databases. NoSQL is useful for big data and real-time web apps. For example, a company like Twitter that collects terabytes of user data every day might use NoSQL.

What is one of the benefits of open source software to a data science team?

Open source software provides a way to build skills and learn new software quickly. Open source software provides a way to access the latest technologies.

Which of the following is a general-purpose programming language to connect database systems and analyze big data?

Python

Which programming language allows you to use standard commands to perform functions, such as search, insert, update, delete, and create so you can communicate with a database?

SQL

What is SQL?

Structured Query Language (SQL) is a standard language to communicate with databases. As the name suggests, you use SQL when you have structured data in a relational database. SQL is a query language, not a programming language. The purpose is to ask questions of (or "query") a relational database or to modify its contents.

Tamika is a data analyst at a financial services company. She's analyzing a large amount of customer investment data for the past quarter to derive insights for her team. She's also going to create an interactive visualization for a quarterly meeting coming up. Which data science tool could Tamika use for these capabilities?

Tableau

IBM Watson Studio has a built-in data refinery tool:

The data refinery lets you prepare and transform large amounts of raw data into high quality data. You can visualize data using built-in charts and graphics to understand the distribution of your data. You can schedule jobs for the data to produce repeatable outcomes.

Four key aspects of open source software to keep in mind:

Use: Anyone can use and execute (or run) the software for any purpose, under the license. View: Anyone can view the source code to understand how the software works. Modify: Enhancements, bug fixes, and solutions can come from anyone. Share: Contributions are based on a common, shared purpose.

You can use MS Excel to enter, examine, and interpret data in a variety of ways:

You can manipulate and clean the data in rows and columns prior to analysis. MS Excel has built-in data analysis functions and features, such as filtering, formulas, and pivot tables.

All of the following statements about a GitHub repository are correct except one. Which statement is incorrect?

You can review the revision history for files in a repository.
***You can edit private and public repositories from contributors.
You can create your own public and private repositories.
You can add files like images, spreadsheets, and data sets to a repository.

Data scientists also use NoSQL to:

communicate with NoSQL databases. The purpose is to store and work with unstructured data like lots of images and text. Types of NoSQL databases include document databases, wide-column databases, and graph databases.

Data science tools and programming languages use:

computer science, statistics, predictive analytics, and more to dig deeper into data. These tools and programming languages can be used to collect, manipulate, and analyze business data to derive valuable insights. They help data scientists perform their complex tasks efficiently. In turn, data scientists can help businesses develop solutions to achieve success.

Python can be used to:

create web applications and to support data science projects. Python's design philosophy emphasizes code that's easy to read, and the language is notable for its use of whitespace. Python has syntax like the English language, with some influence from mathematics. You can use Python to connect to database systems and read and modify files. Python can handle big data and perform complex mathematics. You can pair Python with a data manipulation and analysis software library, like pandas. Python can help you obtain insights and create data visualizations.

Data science professionals use SQL to:

explore, maintain, and secure data so they can make better decisions.

The contribution guidelines describe

how to contribute and collaborate. They provide rules about how the community can participate in the open source project.

Google Sheets

is a free tool you can use to perform tasks like entering, analyzing, and visualizing data to make data-driven decisions.

Python

is a free, open source and general-purpose programming language that's available for everyone to use. Python was created by Guido van Rossum and released in 1991. It was designed with the intent of being easy and fun to use.

Tableau

is a popular data visualization and business intelligence software for deriving meaningful insights from data. Many businesses use Tableau for pictorial and graphical representations of data.

Standard Deviation

is a statistical calculation that tells you how dispersed the data is in relation to the mean. 1. The mean is the average of a data set. 2. A low standard deviation means that data is clustered around the mean. 3. A high standard deviation indicates that data is more spread out.
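A small sketch with Python's built-in statistics module can make the contrast concrete. The two lists are made-up values that share the same mean, and the sketch assumes the population standard deviation (`pstdev`); use `statistics.stdev` instead if the data is a sample.

```python
import statistics

# Two hypothetical data sets with the same mean but different spread.
clustered = [48, 49, 50, 51, 52]
spread_out = [10, 30, 50, 70, 90]

mean_clustered = statistics.mean(clustered)
mean_spread = statistics.mean(spread_out)

# Population standard deviation: the square root of the average
# squared distance from the mean.
sd_clustered = statistics.pstdev(clustered)
sd_spread = statistics.pstdev(spread_out)

print(f"clustered:  mean={mean_clustered}, sd={sd_clustered:.2f}")
print(f"spread out: mean={mean_spread}, sd={sd_spread:.2f}")
```

Both lists average 50, but the second has a much larger standard deviation because its values sit farther from the mean.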

Microsoft Excel

is a spreadsheet application created by Microsoft. MS Excel is one of the most-used tools for data analysis.

Git

is a version control system for tracking changes in source code during software development. It helps coordinate work among programmers. Git is open source software distributed under a license. It's installed locally on a computer. You can use Git without using GitHub, but you cannot use GitHub without using Git.

IBM Watson Studio

is an integrated development environment (IDE). Named after IBM's founder, IBM Watson Studio pulls together the most useful development and analytic tools, wrapping them in a development platform that is powerful enough to meet large-scale challenges, yet simple enough that developers can master it quickly. 1. It's a collaborative data science and machine learning environment. 2. IBM Watson Studio works with open source tools. 3. IBM Watson Studio offers a graphical interface with built-in operations. 4. You don't need to know how to code to use the tool.

GitHub

is an online service that provides a place to host source code as well as contribute and collaborate. It lets people work together on projects from anywhere. GitHub is a service, not software. It offers free, professional, and enterprise accounts.

Like

is an operator that allows you to select only rows that are "like" what you specify.

Community

is anyone related to an open source project. This is a group of people who receive some benefit from the project.

Repository

is like a folder for your project. You can have multiple public and private repositories in GitHub. Repositories can contain files, images, videos, spreadsheets, and data sets. GitHub provides the revision history for all files in a repository.

Open source software

is software with code that is published publicly and can be used by anyone. Open source software is collaborative, meaning it relies on a virtual community of people to review, change, and share source code with each other. Developers share insights, ideas, and code to create more innovative software solutions.

Contributor

is someone who contributes back to a project.

Committer

is someone who reviews and approves changes to project source code. A committer has write access (or write permission) to a source code repository. This means they are authorized to update the data.

IBM Watson Studio offers many tools, including the data refinery tool. It's beneficial because:

it saves data preparation time. It can quickly transform large amounts of raw data into consumable, high-quality information that's ready for analytics. With it, data scientists can: create a workflow to clean and shape data; understand the quality and distribution of data using dozens of built-in charts, graphs, and statistics; schedule data jobs for repeatable outcomes; and visualize data to discover insights. The data refinery tool is interactive and easy to use. You don't need coding skills. Over 100 built-in operations are available to help you transform data.

Matplotlib is a:

library for plots in Python. There are numerous libraries in Python, and Matplotlib is one of the most used. Matplotlib is a cross-platform library that provides various tools to create two-dimensional plots from data, in lists or arrays, in Python. Matplotlib is open source and a community project maintained for and by its users. 1. A Python Matplotlib script is structured so that, in most instances, a few lines of code can generate a visual data plot. 2. You can create different types of plots, such as scatterplots, histograms, bar charts, and more. 3. The visualizations can be static, animated, and interactive. 4. You can export to many different types of file formats.

Fill in the blank. Your data set is from auto insurance claims that your company approved. The goal of your project is to predict fraudulent claims. You must therefore examine the _________________ of the data in each column. It's up to you to decide this when cleaning the data.

predictability

The code of conduct

protects the community and provides guidelines about acceptable behavior. All projects should have a code of conduct.

You can use SQL queries to perform operations in a database, such as:

selecting, retrieving, updating, and deleting data.

Internet of Things (IoT) devices:

such as smart thermostats, appliances, and fitness trackers—to name a few. These types of IoT devices collect data from you to predict your behavior and further advance home automation.

Profile tab

to check out the descriptive statistics, such as standard deviation, that the data refinery tool provides. These statistics helped you decide if a column is interesting and could have predictive data that you should keep.

Natural Language Processing (NLP)

to understand and process human language in real time. NLP has the potential to make both business and consumer applications easier to use. When used in conjunction with artificial intelligence (AI), NLP could possibly help professionals solve global challenges, such as clean energy.

