DATA MINING LEC

redundant rules

If both rules have the same support and confidence, prune the more specific rule (r1).

min-Apriori

In this analysis, the data contains only continuous attributes of the same "type".

execution time

In this part of discretization, if the range is partitioned into k intervals, there are O(k²) new items.
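
A quick sanity check of that count in Python (illustrative, not from the lecture): intervals spanning base intervals i through j number k(k+1)/2, which is O(k²).

k = 5
intervals = [(i, j) for i in range(1, k + 1) for j in range(i, k + 1)]
print(len(intervals))        # 15
print(k * (k + 1) // 2)      # 15, matching the O(k^2) growth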

Apriori Principle

In this principle, if an itemset is frequent, then all of its subsets must also be frequent.
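
A minimal Python sketch of how this principle is used in reverse for pruning (names and data are illustrative): a candidate k-itemset is dropped if any of its (k-1)-subsets is infrequent.

from itertools import combinations

def has_infrequent_subset(candidate, frequent_prev):
    # Prune a candidate k-itemset if any (k-1)-subset is not frequent.
    k = len(candidate)
    return any(frozenset(s) not in frequent_prev
               for s in combinations(candidate, k - 1))

frequent_2 = {frozenset({'a', 'b'}), frozenset({'a', 'c'}), frozenset({'b', 'c'})}
print(has_infrequent_subset(('a', 'b', 'c'), frequent_2))  # False -> keep
print(has_infrequent_subset(('a', 'b', 'd'), frequent_2))  # True  -> prune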

neuron

It computes a weighted sum of its inputs, and this sum is passed through a nonlinear function, often called an activation function, in an Artificial Neural Network.
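
A minimal sketch of that computation, assuming a sigmoid activation (one common choice; the card does not specify which):

import math

def neuron(inputs, weights, bias):
    # Weighted sum of inputs plus bias, passed through a nonlinear activation.
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid activation

print(neuron([1.0, 0.5], [0.4, -0.2], bias=0.1))  # ~0.599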

single layer network

It contains only input and output nodes.

minsup

It controls the minimum number of data cases that a rule must cover.

concept hierarchy

It defines a sequence of mappings from a set of low-level concepts to higher-level, more general concepts.

condensing

It determines a smaller set of objects that gives the same performance.

Frequent Subgraph Mining

It extends association analysis to finding frequent subgraphs.

Frequent Itemset Generation

It generates all itemsets whose support ≥ minsup.
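
A brute-force Python sketch of this step on toy data (illustrative only; real implementations prune with the Apriori principle rather than enumerating every itemset):

from itertools import combinations

transactions = [{'bread', 'milk'},
                {'bread', 'diapers', 'beer'},
                {'milk', 'diapers', 'beer'},
                {'bread', 'milk', 'diapers'}]
minsup = 2  # minimum support count

items = sorted(set().union(*transactions))
frequent = {}
for k in range(1, len(items) + 1):
    for cand in combinations(items, k):
        count = sum(set(cand) <= t for t in transactions)
        if count >= minsup:
            frequent[cand] = count

print(frequent[('bread', 'milk')])  # 2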

Recurrent Neural Network

It handles sequences and is used to process speech and language.

Convolutional Neural Networks (CNN)

It handles two-dimensional gridded data and is used for image processing.

Dynamic variable discretizing

It involves discretizing all the variables at once, or simultaneously. In dynamic variable discretization you have to keep track of, and deal with, any interdependencies (interactions) between the variables.

itemset

It is a collection of one or more items.

hash table

It is a data structure that implements an associative array abstract data type, a structure that can map keys to values.
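
Python's built-in dict is exactly such a structure; a minimal illustration:

counts = {}                          # a hash table mapping keys to values
counts[frozenset({'bread'})] = 3     # frozenset keys are hashable
counts[frozenset({'milk'})] = 2
print(counts[frozenset({'bread'})])  # 3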

support count

It is the frequency of occurrence of an itemset.

sequential pattern

It is a frequent subsequence existing in a single sequence or a set of sequences.

Topological equivalence

It is a reflexive, symmetric and transitive binary relation on the class of all topological spaces.

subsequence

A sequence <a1, a2, ..., an> is contained in another sequence <b1, b2, ..., bm> (m ≥ n) if there exist integers i1 < i2 < ... < in such that a1 ⊆ bi1, a2 ⊆ bi2, ..., an ⊆ bin.
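
A minimal Python check matching that definition, where elements are itemsets and order must be preserved (data here is illustrative):

def is_subsequence(a, b):
    # Greedily match each element of a as a subset of a later element of b.
    i = 0
    for elem in b:
        if i < len(a) and a[i] <= elem:
            i += 1
    return i == len(a)

a = [{'2'}, {'3', '6'}, {'8'}]
b = [{'2', '4'}, {'3', '5', '6'}, {'8'}]
print(is_subsequence(a, b))  # True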

timing constraints

It is a series of constraints applied to a given set of paths or nets that dictate the desired performance of a design.

deep learning

It is a subfield of machine learning concerned with algorithms inspired by the structure and function of the brain called artificial neural networks.

hash tree

It is a tree of hashes in which the leaves are hashes of data blocks in, for instance, a file or set of files.

bias

It is an additional parameter in the Neural Network which is used to adjust the output along with the weighted sum of the inputs to the neuron.

Perceptron Learning

It is an algorithm for supervised learning of binary classifiers.
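
A minimal sketch of the perceptron update rule w <- w + lr * y * x applied on mistakes, with an assumed learning rate and toy AND-style data (not from the card):

def train_perceptron(data, epochs=10, lr=0.1):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:                                  # y is +1 or -1
            y_hat = 1 if x[0]*w[0] + x[1]*w[1] + b > 0 else -1
            if y_hat != y:                                 # update only on mistakes
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

data = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
print(train_perceptron(data))  # a separating hyperplane for AND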

Graph isomorphism

It is an equivalence relation on graphs and as such it partitions the class of all graphs into equivalence classes.

discretization

It is an essential preprocessing technique used in many knowledge discovery and data mining tasks.

Frequent Itemset

It is an itemset whose support is greater than or equal to a minsup threshold.

sequence

It is an ordered list of elements (transactions). For example, purchase history of a given customer, history of events generated by a given sensor, browsing activity of a particular Web visitor, and so on.

coefficient

It is analogous to the correlation coefficient for continuous variables.

hash function

It is any function that can be used to map data of arbitrary size to fixed-size values.

computational complexity

This grows because binarizing the data increases the number of items.

confidence

It is calculated as the number of transactions that include both A and B, divided by the number of transactions that include A.
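
A worked toy example of that ratio (illustrative numbers):

transactions = [{'A', 'B'}, {'A'}, {'A', 'B', 'C'}, {'B'}, {'A', 'C'}]
n_A  = sum({'A'} <= t for t in transactions)       # 4 transactions contain A
n_AB = sum({'A', 'B'} <= t for t in transactions)  # 2 contain both A and B
print(n_AB / n_A)  # confidence(A -> B) = 2/4 = 0.5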

pattern evaluation

It is defined as identifying the truly interesting patterns representing knowledge based on given measures.

Confidence (c)

It is part of the Rule Evaluation Metrics that measures how often items in Y appear in transactions that contain X.

hidden layer

It is the intermediary layer between input & output layers.

Infrequent itemset

It is an itemset whose rate of occurrence (support) falls below the threshold value.

Unsupervised discretization algorithm

These are the simplest algorithms to use, because the only parameter you specify is the number of intervals, or else how many values should be included in each interval.
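
A minimal equal-width binning sketch, one common unsupervised scheme where the interval count k is the only parameter (function name and data are illustrative):

def equal_width_bins(values, k):
    # Split the range [min, max] into k intervals of equal width.
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # Bin index 0..k-1; the maximum value falls into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]

print(equal_width_bins([1, 3, 5, 7, 9, 11], k=3))  # [0, 0, 1, 1, 2, 2]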

Maximal Frequent Itemset

It occurs when the itemset is frequent and none of its immediate supersets is frequent.
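
A minimal check against that definition, assuming the set of frequent itemsets is already known (illustrative data):

frequent = {frozenset({'a'}), frozenset({'b'}), frozenset({'c'}),
            frozenset({'a', 'b'}), frozenset({'a', 'c'})}
all_items = {'a', 'b', 'c'}

def is_maximal(itemset):
    # Frequent, and no immediate superset (one extra item) is frequent.
    supersets = (itemset | {x} for x in all_items - itemset)
    return itemset in frequent and not any(s in frequent for s in supersets)

print(is_maximal(frozenset({'a', 'b'})))  # True: {a, b, c} is not frequent
print(is_maximal(frozenset({'a'})))       # False: {a, b} is frequent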

categorical attribute

It represents discrete values which belong to a specific finite set of categories or classes. These are also often known as classes or labels in the context of attributes or variables which are to be predicted by a model.

support

It represents the popularity of a product: the fraction of all transactions that contain it.

local classifiers

Nearest neighbor classifiers are _____________.

statistics-based methods

The approach of this method withholds the target attribute from the rest of the data.

Most approaches use the group of classifiers in multiple sets.

The following are part of handling irrelevant and redundant attributes except for ___________.

Reduce the number of null values (NV)

The following are the Frequent Itemset Generation Strategies except for _____________.

sequence of all events that happen all the time

The following are the examples of a sequence except for ____________.

Limited number of queries

The following are the factors affecting the complexity of Apriori except for ___________.

infinite data generation

The following are the features of statistical hypothesis testing except for _________.

rules at higher levels may depend on the number of itemsets

The following are the reasons why we should incorporate concept hierarchy except for ____________.

null datasets and attributes

The following are the things required in a Nearest-Neighbor Classifier except for ______________.

continuous attribute

These are typically represented as floating-point variables.

Supervised discretization algorithm

These don't require specifying the number of bins; the discretization runs on entropy- and purity-based calculations.

support of a word

This can be determined by simply summing up the word's frequency across the documents.

candidate pruning

This eliminates candidate itemsets whose subsets are known to be infrequent (e.g., a candidate 4-itemset is pruned unless all of its 3-item subsets are frequent 3-itemsets).

Fk-1 x Fk-1 Method

This can merge two frequent (k-1)-itemsets if their first (k-2) items are identical.
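
A minimal sketch of that merge with itemsets kept as sorted tuples (illustrative):

def merge_fk1_fk1(a, b):
    # Merge two sorted (k-1)-itemsets whose first k-2 items are identical.
    if a[:-1] == b[:-1] and a[-1] < b[-1]:
        return a + (b[-1],)
    return None

print(merge_fk1_fk1(('bread', 'diapers'), ('bread', 'milk')))
# ('bread', 'diapers', 'milk')
print(merge_fk1_fk1(('beer', 'milk'), ('bread', 'milk')))  # None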

association rule algorithm

This can produce a large number of rules.

multi-layer neural network

This can solve any type of classification task involving nonlinear decision surfaces.

closed itemset

This happens to an itemset X if none of its immediate supersets has the same support as X.

Artificial Neural Networks

This model is an assembly of interconnected nodes and weighted links.

Proximity computations

This normally requires the presence of all attributes in a Nearest-Neighbor Classification test set.

Support Counting

This will make a new pass over the sequence database D to find the support for these candidate sequences under Generalized Sequential Pattern (GSP).

Candidate Generation

This will merge pairs of frequent subsequences found in the (k-1) pass to generate candidate sequences that contain k items under Generalized Sequential Pattern (GSP).

minconf

When this is set higher, there will be fewer patterns, but mining may not be faster, because many algorithms don't use minconf to prune the search space.

This learning can work 100% on small-scale datasets for predicting outcomes.

Which of the following is NOT a deep learning characteristic?

Initial unlimited weights and vectors

Which of the following is NOT a design issue of Artificial Neural Network?

Can handle identical nodes because weights are manually set

Which of the following is NOT one of the characteristics of ANN?

Permit computations manually

Which of the following is NOT part of the Bruteforce approach in association rule mining?

storage layer

Which of the following is NOT part of the general structure of ANN?

For nonlinearly separable problems, the perceptron learning algorithm will fail because no linear hyperplane can separate the data perfectly

Which of the following is true about Perceptron Learning Rule?

Association Rule Mining

Given a set of transactions, this will find rules that predict the occurrence of an item based on the occurrences of other items in the data.
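
A toy end-to-end illustration, generating rules from one frequent itemset with confidence = support(X ∪ Y) / support(X) (data and thresholds are assumed):

from itertools import combinations

transactions = [{'bread', 'milk', 'butter'},
                {'bread', 'butter'},
                {'milk', 'butter'},
                {'bread', 'milk', 'butter'}]

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

freq = frozenset({'bread', 'butter'})
for r in range(1, len(freq)):
    for lhs in combinations(freq, r):
        lhs = frozenset(lhs)
        conf = support(freq) / support(lhs)
        print(set(lhs), '->', set(freq - lhs), round(conf, 2))
# prints both rules, e.g. {'bread'} -> {'butter'} 1.0 and {'butter'} -> {'bread'} 0.75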

