DATA MINING LEC
redundant rules
If two rules r1 and r2 have the same support and confidence, prune the more specific rule (r1).
min-Apriori
In this analysis, the data contains only continuous attributes of the same "type".
execution time
In discretization, if the range is partitioned into k intervals, O(k²) new items are created, increasing execution time.
Apriori Principle
In this principle, if an itemset is frequent, then all of its subsets must also be frequent.
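A minimal Python sketch (not from the lecture) of how the contrapositive of this principle prunes candidates: if any (k-1)-subset of a candidate k-itemset is infrequent, the candidate cannot be frequent. The function name and toy itemsets are illustrative assumptions.

```python
from itertools import combinations

def has_infrequent_subset(candidate, frequent_k_minus_1):
    # Contrapositive of the Apriori principle: a candidate k-itemset with
    # any infrequent (k-1)-subset can be pruned without counting its support.
    k = len(candidate)
    return any(frozenset(sub) not in frequent_k_minus_1
               for sub in combinations(candidate, k - 1))

# {A,B} and {B,C} are frequent 2-itemsets, {A,C} is not,
# so the candidate {A,B,C} is pruned.
frequent_2 = {frozenset("AB"), frozenset("BC")}
print(has_infrequent_subset(frozenset("ABC"), frequent_2))  # True -> prune
```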
neuron
It computes a weighted sum of its inputs; this sum is then passed through a nonlinear function, often called an activation function in an Artificial Neural Network.
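A minimal sketch of this computation in Python, assuming a sigmoid activation (the choice of activation and the numeric values are illustrative, not from the lecture):

```python
import math

def neuron(inputs, weights, bias):
    # Weighted sum of the inputs plus a bias term...
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # ...passed through a nonlinear (sigmoid) activation function.
    return 1.0 / (1.0 + math.exp(-z))

print(neuron([0.5, -1.0, 2.0], [0.4, 0.1, 0.7], bias=-0.3))  # ~0.77
```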
single layer network
It contains only input and output nodes.
minsup
It controls the minimum number of data cases that a rule must cover.
concept hierarchy
It defines a sequence of mappings from a set of low-level concepts to higher-level, more general concepts.
condensing
It determines a smaller set of objects that gives the same performance.
Frequent Subgraph Mining
It extends association analysis to finding frequent subgraphs.
Frequent Itemset Generation
It generates all itemsets whose support ≥ minsup.
Recurrent Neural Network
It handles sequences and is used to process speech and language.
Convolutional Neural Networks (CNN)
It handles two-dimensional gridded data and is used for image processing.
Dynamic variable discretizing
It involves discretizing all the variables at once (simultaneously), so you have to keep track of, and deal with, any interdependencies (interactions) between the variables.
itemset
It is a collection of one or more items.
hash table
It is a data structure that implements an associative array abstract data type, a structure that can map keys to values.
support count
It is a frequency of occurrence of an itemset.
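An illustrative Python sketch, assuming transactions are represented as sets of items; the market-basket data is made up:

```python
def support_count(itemset, transactions):
    # sigma(X): the number of transactions that contain every item in X.
    return sum(1 for t in transactions if itemset.issubset(t))

transactions = [{"bread", "milk"}, {"bread", "diapers", "beer"},
                {"milk", "diapers", "beer"}, {"bread", "milk", "diapers"}]
print(support_count({"bread", "milk"}, transactions))  # 2
```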
sequential pattern
It is a frequent subsequence existing in a single sequence or a set of sequences.
Topological equivalence
It is a reflexive, symmetric and transitive binary relation on the class of all topological spaces.
subsequence
A sequence <a1, a2, ..., an> is contained in another sequence <b1, b2, ..., bm> (m ≥ n) if there exist integers i1 < i2 < ... < in such that a1 ⊆ bi1, a2 ⊆ bi2, ..., an ⊆ bin.
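A small Python sketch of this containment test, assuming each sequence is a list of element sets; the function name and example sequences are illustrative:

```python
def is_subsequence(a, b):
    # True if a = <a1, ..., an> is contained in b = <b1, ..., bm>, i.e.,
    # there exist indices i1 < ... < in with each a_j a subset of b_{i_j}.
    i = 0
    for element in b:
        if i < len(a) and set(a[i]) <= set(element):
            i += 1
    return i == len(a)

print(is_subsequence([{2}, {3, 5}], [{2, 4}, {3, 5, 6}, {8}]))  # True
print(is_subsequence([{2}, {4}], [{2, 4}, {3, 5, 6}, {8}]))     # False
```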
timing constraints
In sequential pattern mining, these restrict the allowed timing between elements of a pattern, e.g., maximum-gap, minimum-gap, and window-size constraints in GSP.
deep learning
It is a subfield of machine learning concerned with algorithms inspired by the structure and function of the brain called artificial neural networks.
hash tree
In Apriori, it is a tree structure used to store candidate itemsets so that, during support counting, each transaction is matched only against the candidates in the relevant leaf nodes.
bias
It is an additional parameter in the Neural Network which is used to adjust the output along with the weighted sum of the inputs to the neuron.
Perceptron Learning
It is an algorithm for supervised learning of binary classifiers.
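A minimal sketch of the perceptron learning rule in Python, assuming a step activation, a learning rate of 0.1, and the logical AND function as toy data (all assumptions, not specifics from the lecture):

```python
def train_perceptron(X, y, epochs=10, lr=0.1):
    # Perceptron update rule: w <- w + lr * (y - y_hat) * x.
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            y_hat = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            error = target - y_hat
            w = [wi + lr * error * xi for wi, xi in zip(w, x)]
            b += lr * error
    return w, b

# Linearly separable toy problem: logical AND.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 0, 1]
print(train_perceptron(X, y))
```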
Graph isomorphism
It is an equivalence relation on graphs and as such it partitions the class of all graphs into equivalence classes.
discretization
It is an essential preprocessing technique used in many knowledge discovery and data mining tasks.
Frequent Itemset
It is an itemset whose support is greater than or equal to a minsup threshold.
sequence
It is an ordered list of elements (transactions). For example, purchase history of a given customer, history of events generated by a given sensor, browsing activity of a particular Web visitor, and so on.
coefficient
It is analogous to the correlation coefficient for continuous variables.
hash function
It is any function that can be used to map data of arbitrary size to fixed-size values.
computational complexity
Binarizing the data increases the number of items, which increases computational complexity.
confidence
It is calculated as the number of transactions that include both A and B, divided by the number of transactions that include A.
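A short Python sketch of this calculation, conf(A → B) = σ(A ∪ B) / σ(A), over made-up toy transactions:

```python
def confidence(antecedent, consequent, transactions):
    # conf(A -> B) = count of transactions containing both A and B,
    # divided by count of transactions containing A.
    both = sum(1 for t in transactions if antecedent | consequent <= t)
    ante = sum(1 for t in transactions if antecedent <= t)
    return both / ante

transactions = [{"A", "B"}, {"A", "B", "C"}, {"A"}, {"B", "C"}]
print(confidence({"A"}, {"B"}, transactions))  # 2/3
```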
pattern evaluation
It is defined as identifying the truly interesting patterns representing knowledge, based on given interestingness measures.
Confidence (c)
It is part of the Rule Evaluation Metrics that measures how often items in Y appear in transactions that contain X.
hidden layer
It is the intermediary layer between input & output layers.
Infrequent itemset
It is an itemset whose rate of occurrence falls below the minsup threshold.
Unsupervised discretization algorithm
These are the simplest algorithms to make use of, because the only parameter you specify is the number of intervals to use or, alternatively, how many values should be included in each interval.
Maximal Frequent Itemset
It occurs when the itemset is frequent and none of its immediate supersets is frequent.
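An illustrative Python check for this property, assuming the set of all frequent itemsets is already known (names and toy data are assumptions):

```python
def is_maximal(itemset, frequent_itemsets, all_items):
    # Frequent itself, and no immediate superset (one extra item) is frequent.
    if itemset not in frequent_itemsets:
        return False
    return all(itemset | {item} not in frequent_itemsets
               for item in all_items - itemset)

frequent = {frozenset("A"), frozenset("B"), frozenset("AB")}
print(is_maximal(frozenset("AB"), frequent, set("ABC")))  # True
print(is_maximal(frozenset("A"), frequent, set("ABC")))   # False: {A,B} is frequent
```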
categorical attribute
It represents discrete values which belong to a specific finite set of categories or classes. These are also often known as classes or labels in the context of attributes or variables which are to be predicted by a model.
support
It represents the popularity of an itemset, measured as the fraction of all transactions that contain it.
local classifiers
Nearest neighbor classifiers are _____________.
statistics-based methods
This method withholds the target attribute from the rest of the data.
Most approaches use a group of classifiers in multiple sets.
The following are details of handling irrelevant and redundant attributes except for ___________.
Reduce the number of null values (NV)
The following are the Frequent Itemset Generation Strategies except for _____________.
sequence of all events that happen all the time
The following are examples of a sequence except for ____________.
Limited number of queries
The following are the factors affecting the complexity of Apriori except for ___________.
infinite data generation
The following are the features of statistical hypothesis testing except for _________.
rules at higher levels may depend on the number of itemsets
The following are the reasons why we should incorporate a concept hierarchy except for ____________.
null datasets and attributes
The following are the things required in a Nearest-Neighbor Classifiers except for ______________.
continuous attribute
These are typically represented as floating-point variables.
Supervised discretization algorithm
These don't require specifying the number of bins; the discretization is run based on entropy and purity-based calculations.
support of a word
This can be determined by simply summing up the word's frequency across documents.
candidate pruning
This eliminates candidate k-itemsets that contain at least one infrequent (k-1)-subset.
Fk-1 x Fk-1 Method
This can merge two frequent (k-1)-itemsets if their first (k-2) items are identical.
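A minimal Python sketch of this merge step, assuming items within each itemset are kept in lexicographic order (the function name and example are illustrative):

```python
def merge_candidates(frequent_k_minus_1):
    # F_{k-1} x F_{k-1}: merge two frequent (k-1)-itemsets whose first
    # k-2 items (in sorted order) are identical.
    candidates = set()
    ordered = [tuple(sorted(s)) for s in frequent_k_minus_1]
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if a[:-1] == b[:-1]:  # first k-2 items match
                candidates.add(frozenset(a) | {b[-1]})
    return candidates

frequent_2 = [{"A", "B"}, {"A", "C"}, {"B", "C"}]
print(merge_candidates(frequent_2))  # {frozenset({'A', 'B', 'C'})}
```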
association rule algorithm
This can produce a large number of rules.
multi-layer neural network
This can solve any type of classification task involving nonlinear decision surfaces.
closed itemset
An itemset X is closed if none of its immediate supersets has the same support as X.
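A small Python sketch of this check, computing supports by brute force over made-up transactions (illustrative, not the lecture's algorithm):

```python
def is_closed(itemset, transactions, all_items):
    # X is closed if every immediate superset of X has strictly lower support.
    def count(s):
        return sum(1 for t in transactions if s <= t)
    sup = count(itemset)
    return all(count(itemset | {item}) < sup
               for item in all_items - itemset)

transactions = [{"A", "B"}, {"A", "B", "C"}, {"A", "B"}]
print(is_closed({"A"}, transactions, {"A", "B", "C"}))       # False: {A,B} has same support
print(is_closed({"A", "B"}, transactions, {"A", "B", "C"}))  # True
```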
Artificial Neural Networks
This model is an assembly of interconnected nodes and weighted links.
Proximity computations
This normally requires the presence of all attributes in a Nearest Neighbor Classification test set.
Support Counting
This will make a new pass over the sequence database D to find the support for these candidate sequences under Generalized Sequential Pattern (GSP).
Candidate Generation
This will merge pairs of frequent subsequences found in the (k-1) pass to generate candidate sequences that contain k items under Generalized Sequential Pattern (GSP).
minconf
When this is set higher, there will be fewer patterns, but mining may not be faster because many algorithms don't use minconf to prune the search space.
This learning can work 100% on small-scale datasets for predicting outcomes.
Which of the following is NOT a deep learning characteristic?
Initial unlimited weights and vectors
Which of the following is NOT a design issue of Artificial Neural Network?
Can handle identical nodes because weights are manually set
Which of the following is NOT one of the characteristics of ANN?
Permit computations manually
Which of the following is NOT part of the brute-force approach in association rule mining?
storage layer
Which of the following is NOT part of the general structure of ANN?
For nonlinearly separable problems, the perceptron learning algorithm will fail because no linear hyperplane can separate the data perfectly
Which of the following is true about Perceptron Learning Rule?
Association Rule Mining
Given a set of transactions, this will find rules that predict the occurrence of an item based on the occurrences of other items in the data.