Data Analytics Chapters 1 and 8
What values will the following statements display? for i in range(1, 11): print(i)
1 to 10
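A quick way to check the answer above, as a minimal sketch: range(1, 11) starts at 1 and stops before 11. The name `values` is introduced here only for illustration.

```python
# range(start, stop) includes start and excludes stop.
values = list(range(1, 11))

for i in values:
    print(i)
```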
Dendrogram
A chart for displaying cluster groupings
Which of the following attributes are commonly used to determine data quality?
Accuracy, Completeness, Conformity, and Consistency
To determine if an email is spam, data programmers would use:
Classification
One of the earliest data-mining and analytic tools was:
Excel
Data mining only applies to numeric data
False
Hot blistering is the process of representing categorical data with numeric values
False
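The statement above is false because the technique is generally called one-hot encoding, not "hot blistering". A minimal pure-Python sketch of the idea follows; the `one_hot` helper and the color values are hypothetical and only for illustration.

```python
def one_hot(values):
    """Return (categories, rows); each row has a single 1 marking its category."""
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows

# Made-up categorical data for illustration.
colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot(colors)

print(categories)
for color, row in zip(colors, encoded):
    print(color, row)
```

Each category becomes its own column, so no artificial ordering is implied among the categories.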
Like other strongly typed programming languages, such as Java and C#, in Python you must declare a variable before you use it
False
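A short sketch of why the statement above is false: in Python, assignment creates a variable, and no separate declaration is needed. The name `count` is an arbitrary example.

```python
# No declaration is needed: assignment creates the name and binds it to a value.
count = 10
print(type(count).__name__)

# The same name can later be bound to a value of a different type.
count = "ten"
print(type(count).__name__)
```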
There are three forms of machine learning: supervised, unsupervised, and hybrid
False
To perform data-mining operations, Python scripts make extensive use of datatable objects. You can think of a datatable as a two-dimensional table that holds values
False
To use a Python module, within your script, you use the module statement
False
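The statement above is false because Python loads modules with the import statement. A minimal sketch using two standard-library modules, chosen here only as examples:

```python
import math                      # import a whole module
from statistics import mean      # import a single name from a module

root = math.sqrt(16)
average = mean([2, 4, 6])

print(root)
print(average)
```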
A great source for sample data sets is:
Kaggle
Sklearn
Library that defines Python data structures and functions that support machine-learning and data-mining operations such as clustering, classification, and regression
Composition Charts
Represent how one or more values relate to a whole
Distribution Charts
Represent the frequency of values within a data set
Conformity
The degree to which the data values align with the company's business rules, such as "The company will measure and store sensor values on 1-second intervals."
Classification
The process of assigning data to matching groups (categories), such as a tumor being benign or malignant, email being valid or spam, or a transaction being legitimate or fraudulent
Association
The process of identifying key relationships between variables. One of the best-known data-association problems is market-basket analysis, which examines items in a customer's shopping cart to determine if the presence of one item in the cart (called the antecedent) influences the addition of a second item (called the consequent)
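The antecedent and consequent idea can be sketched with two standard rule measures, support and confidence. These measures are not named in the notes above, and the baskets and helper functions below are made up for illustration.

```python
# Hypothetical shopping baskets, each stored as a set of items.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Among baskets holding the antecedent, the fraction that also hold the consequent."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)

# Rule under test: bread (antecedent) -> butter (consequent).
rule_support = support({"bread", "butter"}, baskets)
rule_confidence = confidence({"bread"}, {"butter"}, baskets)

print(rule_support, rule_confidence)
```

Here two of the four baskets contain both items, and two of the three baskets with bread also contain butter.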
Data Mining
The process of identifying patterns in data
Clustering
The process of grouping related data-set items
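One common clustering technique is k-means, which the notes above do not name; it is used here only as an example. This is a minimal one-dimensional sketch with a hypothetical `kmeans_1d` helper, made-up points, and fixed starting centroids so the result is repeatable.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Assign each 1-D point to its nearest centroid, then recompute the centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Made-up data with two visibly separate groups.
points = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])

print(centroids)
print(clusters)
```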
Machine Learning
The use of data pattern-recognition algorithms, which allow a program to solve problems, such as clustering, categorization, predictive analysis, and data association without the need for explicit step-by-step programming instructions to tell the algorithm how to perform tasks
Business Intelligence
The use of tools (data mining, machine learning, and visualization) to convert data into actionable insights and recommendations.
One of the first uses of data collection and analytics was the 1890 census
True
Programmers make extensive use of R and Python to create machine-learning applications
True
Python is a case-sensitive programming language that treats uppercase and lowercase characters as different
True
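A small sketch of what this means in practice: two names that differ only in letter case are separate variables. The names below are arbitrary examples.

```python
total = 1
Total = 2   # a different variable: the names differ only in case

print(total, Total)
```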
Python is an interpreted language, as opposed to a compiled language; the Python interpreter executes one statement at a time
True
Python is one of the world's most popular programming languages and is used to create solutions ranging from websites to data mining, machine learning, visualization, and more
True
Unlike other programming languages that use braces { } to group related statements, Python instead relies on statement indentation to group statements
True
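A minimal sketch of indentation-based grouping, using a hypothetical `describe` function: the indented lines under each header form that statement's block, and returning to the outer indentation level ends the block.

```python
def describe(n):
    if n % 2 == 0:
        # This indented line is the body of the if branch.
        label = "even"
    else:
        label = "odd"
    # Back at the function's indentation level: runs after either branch.
    return f"{n} is {label}"

print(describe(4))
print(describe(7))
```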
Predictive Analytics
Attempts to forecast what will happen in the future
Orange and RapidMiner are examples of:
Visual programming environments