Machine Learning


2. Draw the generic learning model used to learn from data, then define its main operations, indicating each operation and its related steps.

(see diagram; not reproduced in this set)

What are the disadvantages of K-means?

- Difficult to predict the number of clusters
- Initial seeds have a strong impact on the final results
- The order of the data has an impact on the final results
- Sensitive to scale: rescaling the data set will completely change the results

What are the disadvantages of k-NN?

- Classifying unknown records is relatively expensive: it requires computing the distance to the k nearest neighbors
- Computationally intensive when the size of the training set grows
- Accuracy can be severely degraded by the presence of noisy or irrelevant features
- k-NN classification expects the class-conditional probability to be locally constant

List all steps of the agglomerative (bottom-up) approach to hierarchical clustering.

1. Make each data point a single cluster.
2. Take the two closest clusters and merge them into one cluster.
3. Take the next two closest clusters and merge them into one cluster.
4. Repeat step 3 until only one cluster remains.

5. What are the advantages and disadvantages of HMMs?

Advantages:
· HMMs are very powerful modeling tools
· Statisticians are comfortable with the theory behind HMMs
· HMMs can be combined into larger HMMs
· Easy to read the model and make sense of it
· The model itself can help increase understanding
Disadvantages:
- State independence assumptions
- Not good for RNA-folding problems
- Overfitting
- Local maxima
- Speed

4. What are the pros and cons of the typical RNN architecture?

Advantages:
· Possibility of processing input of any length
· Model size does not increase with the size of the input
· Computation takes historical information into account
· Weights are shared across time
Disadvantages:
- Computation is slow
- Difficulty accessing information from long ago
- Cannot consider any future input for the current state

Advantages and disadvantages of content-based recommenders

Advantages: works when the product has no user reviews.
Disadvantages: needs descriptive data for every product you want to recommend; difficult to implement for many kinds of large product databases.

Advantages and disadvantages of Collaborative filtering

Advantages: does not require any knowledge about the products themselves.
Disadvantages:
· Cannot recommend a product without user reviews
· Difficult to make good recommendations for brand-new users
· Only works well for popular products

Define the two approaches to hierarchical clustering.

Agglomerative: a bottom-up strategy, where each data object starts as its own cluster, and these are then merged into larger and larger clusters.
Divisive: a top-down strategy, where initially all objects form one cluster, which is then subdivided into smaller and smaller clusters.

What are the different methods for changing the training data? List them, then illustrate the working mechanism of each method, supporting it with illustration diagrams.

Bagging (resample the training data) and boosting (reweight the training data); both are meta-learners.
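
As a rough illustration of both meta-learners in practice, here is a minimal sketch assuming scikit-learn and its built-in iris dataset (example code, not part of the original notes):

    # Bagging trains each base tree on a bootstrap resample of the training data;
    # AdaBoost (boosting) reweights the training examples so later trees focus on earlier mistakes.
    from sklearn.datasets import load_iris
    from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    for name, model in [("bagging", BaggingClassifier(n_estimators=50, random_state=0)),
                        ("boosting", AdaBoostClassifier(n_estimators=50, random_state=0))]:
        model.fit(X_train, y_train)
        print(name, accuracy_score(y_test, model.predict(X_test)))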

What is cluster analysis?

Cluster: a collection of data objects that are similar (or related) to one another within the same group and dissimilar (or unrelated) to the objects in other groups.
Cluster analysis: finding similarities between data according to the characteristics found in the data, and grouping similar data objects into clusters.

List BN components and importance.

Components:
· DAG (directed acyclic graph)
· Nodes: random variables (typically binary or discrete, but methods also exist to handle continuous variables)
· Arcs: indicate probabilistic dependences between nodes (lack of a link signifies conditional independence)
· CPD: conditional probability distribution; the conditional probabilities at each node, usually stored as a table
Importance:
· Handling of incomplete datasets
· Learning about causal networks
· Facilitating the combination of domain knowledge and data
· Efficient and principled approach for avoiding the overfitting of data
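
To make the components concrete, here is a minimal sketch of a hypothetical two-node network (Rain → WetGrass) with its CPDs written as plain NumPy arrays; the variables and numbers are illustrative, not from the notes:

    import numpy as np

    p_rain = np.array([0.8, 0.2])              # P(Rain = [no, yes]) -- prior at the root node
    p_wet_given_rain = np.array([[0.9, 0.1],   # P(WetGrass | Rain = no)
                                 [0.2, 0.8]])  # P(WetGrass | Rain = yes) -- CPD stored as a table

    # The arc Rain -> WetGrass means the joint factorizes as P(Rain) * P(WetGrass | Rain).
    joint = p_rain[:, None] * p_wet_given_rain

    # Simple inference by marginalization: P(Rain = yes | WetGrass = yes).
    p_rain_given_wet = joint[1, 1] / joint[:, 1].sum()
    print(round(p_rain_given_wet, 3))   # ≈ 0.667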

Types of recommendation systems

· Content-based recommenders
· Collaborative filtering recommenders
(see diagram)

How do we fill in ratings for users who have not rated any movies?

For users who have not rated any movies, use mean normalization: subtract each movie's mean rating before learning, so that a new user's predicted rating defaults to the movie's mean once the mean is added back.
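
A minimal NumPy sketch of mean normalization on a small, made-up ratings matrix (illustrative only):

    import numpy as np

    # Rows = movies, columns = users; np.nan marks a missing rating.
    Y = np.array([[5.0, 4.0, np.nan],
                  [1.0, np.nan, np.nan],
                  [4.0, 5.0, np.nan]])   # the last user has not rated anything

    mu = np.nanmean(Y, axis=1, keepdims=True)   # per-movie mean over observed ratings
    Y_norm = Y - mu                             # learn the model on the normalized ratings

    # A user with no ratings gets parameters near zero, so adding the mean back
    # makes their predicted rating for each movie default to that movie's mean.
    predictions_for_new_user = np.zeros((Y.shape[0], 1)) + mu
    print(predictions_for_new_user.ravel())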

3. What are the platforms for online machine learning algorithms?

· Hydrosphere.io
· PredictionIO
· Azure Machine Learning
· Amazon Machine Learning
· Google Prediction
· BigML
· DataRobot

5 applications of autoencoders

· Image coloring
· Feature variation
· Dimensionality reduction
· Image denoising
· Watermark removal

List the ensemble methods that minimize variance and bias.

Methods that minimize variance: bagging, random forests.
Methods that minimize bias: functional gradient descent (boosting), ensemble selection.

1. Define principal component analysis (PCA), then list 3 main uses and 3 example applications.

PCA: a technique used to reduce the dimensionality of a data set, e.g. down to 2D or 3D.
Used for:
· Reducing the number of dimensions in data
· Finding patterns in high-dimensional data
· Visualizing data of high dimensionality
Example applications:
- Face recognition
- Image compression
- Gene expression analysis
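
As a minimal usage sketch (assuming scikit-learn; the random data is just a stand-in):

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 10))        # 100 samples in 10 dimensions

    pca = PCA(n_components=2)             # keep the top 2 principal components
    X_2d = pca.fit_transform(X)           # project the data down to 2D, e.g. for visualization

    print(X_2d.shape)                     # (100, 2)
    print(pca.explained_variance_ratio_)  # fraction of variance captured by each component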

Write the pseudocode of the K-means algorithm.

Randomly initialize K centroids μ1, μ2, ..., μK ∈ ℝⁿ
Repeat {
    For i = 1 to m:
        c⁽ⁱ⁾ := index (from 1 to K) of the cluster centroid closest to x⁽ⁱ⁾
    For k = 1 to K:
        μk := average (mean) of the points assigned to cluster k
}
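
A minimal runnable NumPy version of the same loop, on synthetic 2-D data (illustrative; assumes no cluster ends up empty):

    import numpy as np

    def kmeans(X, K, n_iters=100, seed=0):
        rng = np.random.default_rng(seed)
        centroids = X[rng.choice(len(X), K, replace=False)]        # random initialization
        for _ in range(n_iters):
            # Assignment step: index of the closest centroid for each point.
            dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            # Update step: each centroid becomes the mean of its assigned points.
            centroids = np.array([X[labels == k].mean(axis=0) for k in range(K)])
        return centroids, labels

    X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])   # two synthetic blobs
    centroids, labels = kmeans(X, K=2)
    print(centroids)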

5. List the 8 types of autoencoders

Stacked, convolutional, deep, denoising, sparse, contractive, variational, generative adversarial network.

What are the typical applications of the cluster analysis?

Stand-alone tool to get insight into the data distribution; preprocessing step for other algorithms.

Why do we use recommender systems?

Value to the customer:
· Finds things that are interesting
· Narrows down their choices
· Helps them explore new options and discover new things
· Entertainment
Value to the provider:
· Helps personalize a service to the customer
· Increases trust and loyalty from the customer
· Opportunities for promotions and increased sales
· Obtain more knowledge about customers

2. What do we mean by variance and covariance?

Variance: a measure of the deviation from the mean for points in one dimension.
Covariance: a measure of how much each of the dimensions varies from the mean with respect to the other.
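
A minimal NumPy sketch computing both on made-up 2-D data:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=200)
    y = 2 * x + rng.normal(size=200)   # y co-varies with x

    print(np.var(x, ddof=1))           # sample variance of one dimension
    print(np.cov(x, y))                # 2x2 matrix: variances on the diagonal,
                                       # covariance of x and y off the diagonal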

Define dendrograms.

· A diagram that shows the hierarchical relation between objects.
· A binary tree that shows how clusters are merged/split hierarchically.
· Each node of the tree is a cluster; each leaf node is a singleton cluster.
· A clustering of the data objects is obtained by cutting the dendrogram at the desired level; each connected component then forms a cluster.

3. Define the hidden Markov model (HMM), then list and illustrate the components of an HMM.

· A hidden Markov model is a sequence of random variables Z_t such that the distribution of Z_t depends only on the hidden state X_t of an associated Markov chain.
Components:
· X: a finite set of states
· Z: a finite set of observations
· Transition probabilities
· Observation probabilities
· Prior probability distribution over the initial state
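
To illustrate the components, a minimal sketch of a hypothetical two-state weather HMM written as NumPy arrays (the states, observations, and numbers are made up):

    import numpy as np

    states = ["Rainy", "Sunny"]                 # X: finite set of hidden states
    observations = ["walk", "shop", "clean"]    # Z: finite set of observations

    prior = np.array([0.6, 0.4])                # prior distribution over the initial state
    transition = np.array([[0.7, 0.3],          # transition probabilities P(X_{t+1} | X_t)
                           [0.4, 0.6]])
    emission = np.array([[0.1, 0.4, 0.5],       # observation probabilities P(Z_t | X_t)
                         [0.6, 0.3, 0.1]])

    # Probability of observing "walk" at the first step: marginalize over the hidden state.
    p_walk = prior @ emission[:, observations.index("walk")]
    print(p_walk)   # 0.6*0.1 + 0.4*0.6 = 0.30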

What is machine learning? What are the main ML types? Which ML algorithms did you study after the midterm? Which is more important to you, model accuracy or model performance? Support your answer with an example.

· A set of methods that can automatically detect patterns in data, and then use the uncovered patterns to predict future data or to perform other kinds of decision making under uncertainty (such as planning how to collect more data).
· The four types are supervised learning, unsupervised learning, reinforcement learning, and semi-supervised learning.

Can a set of weak learners be combined into a single strong learner?

· A weak learner is defined as a classifier that only slightly correlates with the true classification, while a strong learner is a classifier that is arbitrarily well correlated with the true classification. So yes, a set of weak learners can be combined to create a strong learner.

5. What are the advantages and disadvantages of hierarchical clustering?

Advantages:
· Hierarchical clustering outputs a hierarchy, so it is easier to decide the number of clusters by looking at the dendrogram.
· Easy to implement.
Disadvantages:
- It is not possible to undo a previous step: once instances have been assigned to a cluster, they cannot be moved around.
- Time complexity: not suitable for large datasets.
- Initial seeds have a strong impact on the final results.
- The order of the data has an impact on the final results.
- Very sensitive to outliers.

Define RNNs and state whether RNNs are supervised or unsupervised learning. What is the major difference between RNNs and FNNs? Illustrate it.

· An RNN is a class of neural networks that allows previous outputs to be used as inputs while maintaining hidden states.
· Recurrent neural networks are supervised learning.
· In a feed-forward neural network (FNN), the connections between units do not form a cycle.
· In a recurrent neural network, the connections between units form a cyclic path.
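
A minimal NumPy sketch of the recurrent update that creates this cycle, with made-up sizes and random weights (illustrative only):

    import numpy as np

    rng = np.random.default_rng(0)
    W_xh = rng.normal(size=(8, 4))       # input  -> hidden weights
    W_hh = rng.normal(size=(8, 8))       # hidden -> hidden weights (the recurrent connection)
    b_h = np.zeros(8)

    h = np.zeros(8)                      # hidden state carried across time
    sequence = rng.normal(size=(5, 4))   # 5 time steps of 4-dimensional input

    for x_t in sequence:                 # the same shared weights are applied at every step
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
    print(h)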

2. What are the main differences between PCA and autoencoders?

· An autoencoder can learn non-linear transformations, using a non-linear activation function and multiple layers.
· Autoencoders do not have to use only dense layers; they can also use convolutional layers.
· It is more efficient to learn several small layers with an autoencoder than to learn one huge transformation with PCA.
· An autoencoder can make use of pre-trained layers from another model, applying transfer learning to enhance the encoder/decoder.

3. List the three main training approaches for RNNs.

· Backpropagation through time (BPTT): unfolding the RNN in time and using the extended version of backpropagation.
· Extended Kalman filter (EKF): a set of mathematical equations that provides an efficient computational means to estimate the state of a process, in a way that minimizes the mean squared error (cost function) on a linear system.
· Real-time recurrent learning (RTRL): computing the error gradient and updating the weights at each time step.

4. List 4 hyperparameters of autoencoders.

· Code size: the number of nodes in the middle layer; a smaller size results in more compression.
· Number of layers: the autoencoder can consist of as many layers as we want.
· Number of nodes per layer: the number of nodes per layer decreases with each subsequent layer of the encoder and increases back in the decoder; the decoder is symmetric to the encoder in terms of layer structure.
· Loss function: we use either mean squared error or binary cross-entropy; if the input values are in the range [0, 1] we typically use cross-entropy, otherwise we use mean squared error.

4. List, then explain, the 3 main properties of autoencoders.

· Data-specific: autoencoders are only able to compress data similar to what they have been trained on.
· Lossy: the decompressed outputs will be degraded compared to the original inputs.
· Learned automatically from examples: it is easy to train specialized instances of the algorithm that will perform well on a specific type of input.

2. List the types of probabilistic relationships, then provide 7 real-world Bayesian network applications.

Types of probabilistic relationships: direct cause, indirect cause, common cause, common effect.
Applications:
· Gene regulatory networks
· Medicine
· Biomonitoring
· Document classification
· Information retrieval
· Semantic search
· Image processing
· Spam filtering
· Turbo codes
· Systems biology
· Medical diagnosis
· Ventilator-associated pneumonia (diagnosis)
· ROC (receiver operating characteristic) analysis

List the advantages of K-means.

· Easy to implement
· With a large number of variables, K-means may be computationally faster
· K-means may produce tighter clusters than hierarchical clustering
· An instance can change cluster when the centroids are recomputed

Define ensemble learning, illustrate the key motivation for ensemble learning, then draw the general-idea diagram of ensemble learning.

· Ensemble learning is a machine learning paradigm where multiple learners are trained to solve the same problem. In traditional machine learning approaches you learn one hypothesis; in ensemble learning you take a set of hypotheses and combine them.
· The key motivation is to reduce the error rate.
· By combining multiple independent and diverse decisions, each of which is at least more accurate than random guessing, random errors cancel each other out and correct decisions are reinforced.

3. What are the two common distance metrics used for k-NN?

· Euclidean distance and Manhattan distance
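
A minimal sketch of both metrics on two made-up 2-D points:

    import numpy as np

    a = np.array([1.0, 2.0])
    b = np.array([4.0, 6.0])

    euclidean = np.sqrt(np.sum((a - b) ** 2))   # straight-line distance -> 5.0
    manhattan = np.sum(np.abs(a - b))           # city-block distance    -> 7.0
    print(euclidean, manhattan)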

What are the three things required to implement k-NN?

· Feature space (the training data)
· Distance metric (to compute the distance between instances)
· The value of K (the number of nearest neighbors to retrieve, from which to take the majority class)

2. List the main steps of K-Nearest Neighbors (k-NN).

· Select a reasonable distance measure.
· For a given instance T, get the top K dataset instances that are nearest to T.
· Inspect the category of these K instances and choose the category C that represents the most instances.
· Conclude that T belongs to category C.
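
A minimal from-scratch sketch of these steps in NumPy, using Euclidean distance and made-up data:

    import numpy as np
    from collections import Counter

    def knn_predict(X_train, y_train, x_new, k=3):
        dists = np.linalg.norm(X_train - x_new, axis=1)    # distance to every training instance
        nearest = np.argsort(dists)[:k]                    # indices of the k nearest neighbors
        label, _ = Counter(y_train[nearest]).most_common(1)[0]
        return label                                       # majority class among the neighbors

    X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]])
    y_train = np.array([0, 0, 1, 1])
    print(knn_predict(X_train, y_train, np.array([0.9, 1.1])))   # -> 0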

What are the hardware-based solutions that can be used for machine learning for big data?

· Hardware-based solutions: MapReduce and data parallelism.
· MapReduce uses multiple CPU cores or machines, with the dataset split among them.

What are the idea, algorithm, and types of instance-based learning?

· Idea: similar examples have similar labels; classify new examples like similar training examples.
· Algorithm: given a new example x for which we need to predict its class y, find the most similar training examples and classify x "like" these most similar examples.
· Types of instance-based learning: rote learner, nearest neighbor.

What are Bayesian networks (BNs)?

· A Bayesian network is a probabilistic graphical model that represents a set of random variables and their conditional dependencies via a directed acyclic graph (DAG). BNs encode the conditional independence relationships between the variables in the graph structure.

Define the common clustering algorithms.

· K-means clustering: partitions data into K distinct clusters based on the distance to the centroid of a cluster.
· Hierarchical clustering: builds a multilevel hierarchy of clusters by creating a cluster tree.
· Self-organizing maps: use neural networks that learn the topology and distribution of the data.
· Hidden Markov models: use observed data to recover the sequence of states.

3. List the key elements and components of autoencoders, then illustrate the components.

· Key elements: an unsupervised ML algorithm similar to PCA; minimizes the same objective function as PCA; is a neural network; its target output is its input.
Components:
· Encoder: this part of the network compresses the input into a latent-space representation. The encoder layer encodes the input image as a compressed representation in a reduced dimension. The compressed image is a distorted version of the original image.
· Code: this part of the network represents the compressed input which is fed to the decoder.
· Decoder: this layer decodes the encoded image back to the original dimension. The decoded image is a lossy reconstruction of the original image, and it is reconstructed from the latent-space representation.
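
A minimal encoder/code/decoder sketch, assuming TensorFlow/Keras; the layer sizes and the random stand-in data are illustrative, not from the notes:

    import numpy as np
    from tensorflow.keras import Input, Model, layers

    inputs = Input(shape=(784,))                              # e.g. flattened 28x28 images
    encoded = layers.Dense(128, activation="relu")(inputs)    # encoder
    code = layers.Dense(32, activation="relu")(encoded)       # code (latent representation)
    decoded = layers.Dense(128, activation="relu")(code)      # decoder mirrors the encoder
    outputs = layers.Dense(784, activation="sigmoid")(decoded)

    autoencoder = Model(inputs, outputs)
    autoencoder.compile(optimizer="adam", loss="binary_crossentropy")

    X = np.random.rand(256, 784).astype("float32")            # random stand-in for real images
    autoencoder.fit(X, X, epochs=1, batch_size=32)            # target output equals the input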

3. What are the key features and elements of reinforcement learning?

· The learner is not told what action to take.
· Trial and error.
· Possibility of delayed rewards: give up short-term reward for a bigger long-term reward.
· Explore and exploit.
· The agent interacts with an uncertain environment.

4. List the 3 types of reinforcement learning?

· Model-based: learn a model of the world, then plan using the model.
· Value-based: learn the state or state-action value; choose the best action based on value.
· Policy-based: learn the stochastic policy function that maps state to action; exploration is baked in.

What are the main features of the random forest method?

· Random forest is an ensemble of decision trees; it runs efficiently on large datasets and can handle large numbers of features.

Define reinforcement learning with a diagram, then compare between reinforcement learning and supervised learning.

· Reinforcement learning is learning by experience and supervised learning is learning by example.

4. List the advantages of k-NN.

· Simple technique that is easily implemented.
· Building the model is inexpensive.
· Extremely flexible classification scheme: does not involve pre-processing.
· Well suited for multi-modal classes and for records with multiple class labels.
· Can sometimes be the best method.

4. List, then define, the possible methods of merging clusters that depend on distance measures.

· Single link: the smallest distance between an element in one cluster and an element in the other.
· Complete link: the largest distance between an element in one cluster and an element in the other.
· Average link: the average distance between elements in one cluster and elements in the other.
· Centroid distance: the distance between two clusters is represented by the distance between the means of the clusters.
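
A minimal sketch of these four merge criteria, assuming SciPy's agglomerative clustering utilities and synthetic 2-D data:

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(6, 1, (20, 2))])   # two blobs

    # 'single', 'complete', 'average', and 'centroid' correspond to the criteria above.
    for method in ["single", "complete", "average", "centroid"]:
        Z = linkage(X, method=method)                      # full merge hierarchy (dendrogram data)
        labels = fcluster(Z, t=2, criterion="maxclust")    # cut the tree into 2 clusters
        print(method, np.bincount(labels)[1:])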

How does the K-means algorithm work?

· Specify the number of clusters K.
· Initialize the centroids by first shuffling the dataset and then randomly selecting K data points for the centroids without replacement.
· Keep iterating until there is no change to the centroids:
  · Compute the sum of the squared distances between the data points and all centroids.
  · Assign each data point to the closest cluster.
  · Compute the centroid for each cluster by taking the average of all data points that belong to it.

Illustrate the main tasks of the PCA process.

· Subtract the mean from each of the data dimensions: all the x values have the x-mean subtracted and all the y values have the y-mean subtracted from them.
· This produces a dataset whose mean is zero.
· Subtracting the mean makes the variance and covariance calculations easier by simplifying their equations.
· The variance and covariance values are not affected by the mean value.

List the four main machine learning types.

· Supervised, unsupervised, semi-supervised, and reinforcement learning

List the difference between the variance and covariance.

· The difference is that covariance measures how two random variables change together and is used to calculate the correlation between variables, whereas variance refers to the spread of a single data set.

4. How do we derive the new dataset in the PCA process (step 5)?

· The final data is RowFeatureVector × RowZeroMeanData.
· RowFeatureVector is the matrix with the eigenvectors in the columns, transposed so that the eigenvectors are now in the rows, with the most significant eigenvector at the top.
· RowZeroMeanData is the mean-adjusted data transposed, i.e. the data items are in each column, with each row holding a separate dimension.
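
A minimal NumPy sketch of this step on synthetic 2-D data; the variable names mirror the description above (illustrative only):

    import numpy as np

    rng = np.random.default_rng(0)
    data = rng.normal(size=(100, 2)) @ np.array([[3.0, 1.0], [1.0, 0.5]])   # correlated 2-D data

    zero_mean = data - data.mean(axis=0)           # subtract the mean from each dimension
    cov = np.cov(zero_mean, rowvar=False)          # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)         # eigenvectors/eigenvalues of the covariance

    order = np.argsort(eigvals)[::-1]              # most significant eigenvector first
    row_feature_vector = eigvecs[:, order].T       # eigenvectors as rows
    row_zero_mean_data = zero_mean.T               # each row holds one dimension

    final_data = row_feature_vector @ row_zero_mean_data   # the derived (transformed) dataset
    print(final_data.shape)                        # (2, 100)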

What are the two main steps of K-means Algorithms?

· The two main steps of K-means are: 1. Assign (assign each point to its closest centroid) and 2. Optimize (move the centroids to minimize the cost function).

List the types and architectures of RNNs.

· Types of RNN: one-to-one, one-to-many, many-to-one, many-to-many.
Architectures:
· Feed-forward neural network
· Simple recurrent neural network
· Fully connected recurrent neural network
· Traditional RNN

5. What makes reinforcement learning different from other machine learning paradigms?

· There is no supervisor, only a reward signal.
· Feedback is delayed, not instantaneous.
· Time really matters.
· The agent's actions affect the subsequent data it receives.

List the general types of autoencoders based on the size of the hidden layer?

· Undercomplete autoencoders: the hidden layer size is smaller than the input layer size; the dimension of the embedded space is lower than that of the input space, so the network cannot simply memorize the training instances.
· Overcomplete autoencoders: larger hidden layer sizes; regularize to avoid overfitting, e.g. by enforcing a sparsity constraint.

For what kinds of applications can supervised, semi-supervised, and unsupervised learning be used? What is the difference between them in terms of input and output samples?

· Semi-supervised learning applications: self-training, generative models, multi-view algorithms, graph-based algorithms.
· Supervised learning: uses known and labeled data as input; the number of classes is known; the output is reliable.
· Unsupervised learning: uses an unknown (unlabeled) dataset as input; the number of classes is not known; the output is moderately reliable.

How do we classify an unknown instance (sample) using k-NN?

· Compute the distance to the other training instances.
· Identify the K nearest neighbors.
· Use the class labels of the nearest neighbors to determine the class label of the unknown instance.

4. List, with illustration, the 4 main inference algorithms of the hidden Markov model.

· The forward algorithm
· The backward algorithm
· The forward-backward algorithm
· The Viterbi algorithm

1. Define recommendation systems, why do we use recommender systems?

· A recommendation system helps match users with items.
· RSs are software agents that elicit the interests and preferences of individual consumers [...] and make recommendations accordingly. They have the potential to support and improve the quality of the decisions consumers make while searching for and selecting products online.

What are autoencoders?

· An autoencoder neural network is an unsupervised machine learning algorithm that applies backpropagation, setting the target values to be equal to the inputs. Autoencoders are used to reduce the size of our inputs into a smaller representation.

