MM Info Process

Define semantic gap

The different meanings and purposes that the same syntax carries across two or more domains.

Description of Audio Media

- Speech - Music - Environment

What is overfitting in machine learning? A. When a model is too flexible and can fit any type of data B. When a model is too simple and fails to capture complex relationships in the data C. When a model is too complex and too closely fits the training data D. When a model is unable to generalize to new, unseen data

Answer: C. When a model is too complex and too closely fits the training data.

When should Stochastic Gradient Descent be used instead of traditional Gradient Descent? A. When the dataset is small and can fit into memory. B. When the cost function is convex. C. When the dataset is large and cannot fit into memory. D. When the cost function is linear.

Answer: C. When the dataset is large and cannot fit into memory.

T/F LDA analysis reveals the optimal subspace for separability

TRUE

T/F: Media retrieval is an iterative process that involves feature extraction and categorization.

TRUE

What is the main goal of logistic regression? a) To classify data into two or more categories b) To predict continuous numerical values c) To cluster data points into groups d) To find correlations between input and output variables

Answer: a) To classify data into two or more categories

What is the purpose of constructing the inter-class scatter matrix SB in LDA? a) To find the feature subspace that optimizes class separability b) To reduce the number of dimensions in the dataset c) To standardize the dataset d) To build the intra-class scatter matrix SW

Answer: a) To find the feature subspace that optimizes class separability

What is the main goal of PCA (Principal Component Analysis)? a. To reduce the dimensionality of the data b. To increase the dimensionality of the data c. To add noise to the data d. To remove outliers from the data

Answer: a. To reduce the dimensionality of the data

What is the typical maximum depth of gradient boosting trees? a) 1 to 3 b) 3 to 6 c) 6 to 9 d) 9 to 12

Answer: b) 3 to 6

What is the main advantage of gradient boosting over other ensemble methods? a. It is easier to implement b. It can reduce both bias and variance c. It only uses a single weak learner d. It can handle high-dimensional datasets

Answer: b. It can reduce both bias and variance

Which of the following algorithms is not used to construct decision trees? a) ID3 b) C4.5 c) K-means d) CART

Answer: c) K-means

A self-driving car that learns to drive based on feedback from the environment (e.g., avoiding collisions, staying within lane boundaries) is an example of: a) Supervised learning b) Unsupervised learning c) Reinforcement learning

Answer: c) Reinforcement learning

What is the fourth step in Principal Component Analysis? a) Construct a projection matrix W b) Transform c) Sort the eigenvalues in decreasing order d) Select k eigenvectors

Answer: c) Sort the eigenvalues in decreasing order

Which of the following is a disadvantage of KNN algorithm? a) It is sensitive to irrelevant features in the dataset b) It is sensitive to the choice of distance metric c) It is not suitable for imbalanced datasets d) All of the above

Answer: d) All of the above

What is the range of the output of the sigmoid function? a) (-1, 1) b) (0, 1) c) (-∞, ∞) d) [0, 1]

Answer: b) (0, 1). The sigmoid approaches 0 and 1 asymptotically but never reaches either value.

True or False: Decision Trees require the data to be normalized before training the model.

Answer: False (Decision Trees do not require normalization)

How can mapping feature descriptions to description spaces help in visualizing or exploring multimedia resources?

Mapping feature descriptions to description spaces can help in visualizing or exploring multimedia resources by mapping them to a lower-dimensional space where each dimension corresponds to a specific aspect or property of the resource.

Iterative Classification

Refers to the iterative process of categorizing media based on their features or content

Feedback

Refers to the process of obtaining user feedback on the retrieved media resources, such as their relevance, quality, or usefulness, and using this feedback to refine the search results or improve the categorization filters. User feedback can be obtained through various methods, such as user ratings, comments, or click-through rates

opposite of Category deficiency

dimensionality

What is majority voting in the context of ensemble methods? A) Selecting the class label that has been predicted by the minority of classifiers. B) Selecting the class label that has been predicted by the majority of classifiers. C) Selecting the class label that has the highest probability. D) Selecting the class label that has the lowest probability.

Answer: B) Selecting the class label that has been predicted by the majority of classifiers.

Can majority voting be used for multi-class settings? A) No, majority voting is only applicable to binary class settings. B) Yes, majority voting can be easily generalized to multi-class settings, which is known as plurality voting. C) Yes, majority voting can be used for multi-class settings, but it requires a different approach. D) No, plurality voting is only applicable to binary class settings.

Answer: B) Yes, majority voting can be easily generalized to multi-class settings, which is known as plurality voting.
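A minimal Python sketch of plurality voting, assuming each classifier in the ensemble has already produced one label for the same example. The function name `plurality_vote` is illustrative and not taken from any library.

```python
from collections import Counter


def plurality_vote(predicted_labels):
    """Return the label predicted most often by the ensemble members."""
    if not predicted_labels:
        raise ValueError("need at least one prediction")
    counts = Counter(predicted_labels)
    # most_common(1) returns [(label, count)]; ties go to the label seen first.
    label, _ = counts.most_common(1)[0]
    return label


# Binary case (majority voting): two of three classifiers say "spam".
print(plurality_vote(["spam", "ham", "spam"]))
# Multi-class case (plurality voting): "cat" wins with 2 of 5 votes.
print(plurality_vote(["cat", "dog", "cat", "bird", "fish"]))
```

In the multi-class case the winning label does not need more than half of the votes, only more votes than any other label.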

What is the alternative approach to feature selection for reducing the dimensionality of a dataset? A. Feature extraction B. Data compression C. Clustering D. Classification

Answer: A. Feature extraction

Which technique is used for supervised dimensionality reduction to maximize class separability? A. Linear discriminant analysis B. Principal component analysis C. Decision trees D. Neural networks

Answer: A. Linear discriminant analysis

What is bagging in ensemble learning? A) An ensemble learning technique that involves using the same training dataset to fit individual classifiers in the ensemble. B) An ensemble learning technique that involves drawing bootstrap samples from the initial training dataset to fit individual classifiers in the ensemble. C) An ensemble learning technique that involves combining the predictions of individual classifiers to make a final prediction. D) An ensemble learning technique that involves using only a single classifier to make predictions.

Answer: B) An ensemble learning technique that involves drawing bootstrap samples from the initial training dataset to fit individual classifiers in the ensemble.

What is the goal of ensemble methods? A) To create a single classifier that is better than all individual classifiers combined. B) To combine different classifiers into a meta-classifier with better generalization performance than each individual classifier alone. C) To reduce the size of the dataset by removing irrelevant features. D) To fine-tune hyperparameters for machine learning models.

Answer: B) To combine different classifiers into a meta-classifier with better generalization performance than each individual classifier alone.

What is the main idea behind gradient boosting? A) To train strong learners B) To train weak learners C) To reduce model variance D) To decrease model bias

Answer: B) To train weak learners

What is a bootstrap sample? A) A sample that is drawn randomly from the entire population. B) A sample that is drawn randomly without replacement from the entire population. C) A sample that is drawn randomly with replacement from the entire population. D) A sample that is drawn systematically from the entire population.

Answer: C) A sample that is drawn randomly with replacement from the entire population.
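As a rough illustration, a bootstrap sample can be drawn with the standard library alone. The helper name `bootstrap_sample` and the toy dataset are assumptions made for this sketch.

```python
import random


def bootstrap_sample(dataset, seed=None):
    """Draw len(dataset) items at random WITH replacement."""
    rng = random.Random(seed)
    # random.choices samples with replacement, so items can repeat.
    return rng.choices(dataset, k=len(dataset))


data = [1, 2, 3, 4, 5, 6, 7]
sample = bootstrap_sample(data, seed=42)
print(sample)

# Items that were never drawn are often called "out-of-bag" examples.
left_out = sorted(set(data) - set(sample))
print(left_out)
```

Because sampling is done with replacement, some examples usually appear more than once in the sample and others do not appear at all.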

Why is KNN algorithm prone to overfitting? A) Because it uses distance calculations to determine nearest neighbors B) Because it assigns a class label based on the k nearest neighbors C) Because it can't handle high-dimensional feature spaces well, leading to sparseness D) Because it uses a fixed internal structure to make predictions

Answer: C) Because it can't handle high-dimensional feature spaces well, leading to sparseness. This is often referred to as the "curse of dimensionality" and can cause the KNN algorithm to become very prone to overfitting.

What is the main difference between the original boosting procedure and AdaBoost? A) The number of weak learners used in the ensemble B) The type of weak learners used in the ensemble C) The use of reweighted training examples in each iteration D) The use of majority voting to combine the weak learners

Answer: C) The use of reweighted training examples in each iteration
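The reweighting step can be sketched as follows, assuming the common AdaBoost formulation with class labels encoded as +1 and -1. The function name `adaboost_reweight` and the ten-example toy data are illustrative only.

```python
import math


def adaboost_reweight(weights, y_true, y_pred):
    """One AdaBoost round: return (alpha, normalized new weights)."""
    # Weighted error rate: total weight of the misclassified examples.
    error = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    if not 0.0 < error < 1.0:
        raise ValueError("weighted error must lie strictly between 0 and 1")
    # Coefficient of this weak learner in the final ensemble.
    alpha = 0.5 * math.log((1.0 - error) / error)
    # t * p is +1 for a correct prediction and -1 for a wrong one, so
    # correct examples shrink and misclassified examples grow.
    updated = [w * math.exp(-alpha * t * p)
               for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(updated)
    return alpha, [w / total for w in updated]


y_true = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
y_pred = [1, 1, 1, -1, -1, -1, -1, -1, -1, -1]  # examples 6, 7, 8 are wrong
weights = [0.1] * 10

alpha, new_weights = adaboost_reweight(weights, y_true, y_pred)
print(round(alpha, 3))
print([round(w, 3) for w in new_weights])
```

After normalization the misclassified examples carry more weight than before, so the next weak learner is pushed to focus on them.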

What is t-distributed stochastic neighbor embedding (t-SNE) used for? A. Data preprocessing B. Feature selection C. Nonlinear dimensionality reduction D. Data compression

Answer: C. Nonlinear dimensionality reduction

Which technique is used for unsupervised data compression? A. Linear discriminant analysis B. K-means clustering C. Principal component analysis D. Support vector machines

Answer: C. Principal component analysis

T/F: Decision Trees choose paths that have the smallest change in the information.

Answer: False (the correct statement is "Decision Trees choose paths that have the largest change in the information")

Which type of model creates independent model class predictors using training sets? a) Parametric models b) Non-parametric models

Answer: a) Parametric models

Which type of model relies on a fixed internal structure? a) Parametric models b) Non-parametric models

Answer: a) Parametric models

What is the fifth step in LDA? a) Sort the eigenvalues in decreasing order b) Build the intra-class scatter matrix SW c) Construct the d-dimensional mean vector d) Build the inter-class scatter matrix SB

Answer: a) Sort the eigenvalues in decreasing order

What is pruning in the context of decision trees? a) The process of removing nodes from the tree to prevent overfitting b) The process of adding new nodes to the tree to increase accuracy c) The process of replacing nodes in the tree with more complex models d) The process of weighting the data points to emphasize certain features

Answer: a) The process of removing nodes from the tree to prevent overfitting

What is the goal of successive halving in model fine-tuning? a) To eliminate underperforming models b) To reduce the number of features c) To increase the number of training samples d) To increase the number of model parameters

Answer: a) To eliminate underperforming models

What is the goal of a decision tree algorithm? a) To maximize accuracy on the training data b) To minimize error on the testing data c) To build the simplest model possible d) To split the data into the smallest possible groups

Answer: a) To maximize accuracy on the training data

How are bootstrap samples generated in bagging? a) By randomly selecting examples from the initial training dataset without replacement b) By randomly selecting examples from the initial training dataset with replacement c) By fitting multiple decision tree classifiers to the same training dataset d) By using random feature subsets when fitting decision tree classifiers

Answer: b) By randomly selecting examples from the initial training dataset with replacement

Decision Trees are based on which approach? a) Gradient descent b) Information-theoretic c) Reinforcement learning d) None of the above

Answer: b) Information-theoretic

Which type of model depends heavily on the training sets? a) Parametric models b) Non-parametric models

Answer: b) Non-parametric models

Which preprocessing technique tends to result in better classification results in certain cases, according to A.M. Martinez? a) LDA b) PCA c) Both LDA and PCA d) Neither LDA nor PCA

Answer: b) PCA

What is the first step in linear discriminant analysis (LDA)? a) Construct the d-dimensional mean vector b) Standardize the d-dimension dataset c) Build the inter-class scatter matrix SB d) Build the intra-class scatter matrix SW

Answer: b) Standardize the d-dimension dataset

What is the most common implementation of boosting? a) Gradient Boosting b) XGBoost c) Adaptive Boosting (AdaBoost) d) Stochastic Gradient Boosting

Answer: c) Adaptive Boosting (AdaBoost)

What is the difference between bagging and random forests? a) Bagging draws bootstrap samples from the initial training dataset, while random forests use the same training dataset to fit individual classifiers b) Bagging combines different decision tree classifiers, while random forests use a single decision tree classifier c) Random forests use random feature subsets when fitting individual decision tree classifiers, while bagging does not d) Bagging and random forests are the same ensemble learning technique

Answer: c) Random forests use random feature subsets when fitting individual decision tree classifiers, while bagging does not

What is the key concept behind boosting? a) Fitting a complex model to the training data b) Focusing on training examples that are easy to classify c) Focusing on training examples that are hard to classify d) Ensembling different types of models

Answer: c) Focusing on training examples that are hard to classify

What techniques were used for fine-tuning the model in this chapter? a) Learning and validation curves b) Confusion matrices c) Grid search, randomized search, and successive halving d) Performance metrics

Answer: c) Grid search, randomized search, and successive halving

What is a common problem in many real-world applications? a) Underfitting b) Overfitting c) Imbalanced data d) Learning algorithms

Answer: c) Imbalanced data

What is a key feature of the KNN algorithm? a) It is a parametric model b) It uses a decision boundary to classify data points c) It calculates distances between data points d) It is not prone to overfitting

Answer: c) It calculates distances between data points

What is the first step in the KNN algorithm? a) Assign a class label b) Determine the nearest neighbors c) Choose the distance metric d) Choose the number of neighbors (k)

Answer: d) Choose the number of neighbors (k)
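The KNN steps named in the options (choose k and a distance metric, find the nearest neighbors, assign a label) can be sketched in a few lines. This is a minimal version that assumes Euclidean distance and a plain majority vote; `knn_predict` and the toy points are illustrative names and data.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by a majority vote among its k nearest training points."""
    # k and the metric (Euclidean, via math.dist) are fixed by the caller.
    # Compute the distance from the query to every training point and sort.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Keep the labels of the k closest points and take the most common one.
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (2, 2), k=3))
print(knn_predict(points, labels, (6, 5), k=3))
```

Every prediction recomputes distances to all stored training points, which is why KNN depends so heavily on the training set and on the chosen metric.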

What is the fifth step in Principal Component Analysis? a) Sort the eigenvalues in decreasing order b) Construct a projection matrix W c) Decompose the covariance matrix into its eigenvectors and eigenvalues d) Select k eigenvectors

Answer: d) Select k eigenvectors

T/F the maximum number of leaf nodes in gradient boosting trees is 32 to 64

FALSE: the maximum number of leaf nodes in gradient boosting trees is 8 to 64

What is compression?

Feature description and compression are related concepts in multimedia information processing, but they are not exactly the same thing. Compression is the process of reducing the size of a file or data stream by removing redundant or irrelevant information, while preserving its essential features or content. Compression algorithms typically exploit patterns and structures in the data to reduce its redundancy, such as by replacing repeated patterns with shorter symbols, or by using statistical models to predict the next symbol based on the previous symbols. Compression is often used to reduce storage requirements or transmission bandwidth for multimedia data, such as images, videos, or audio clips. However, many compression algorithms used for multimedia are lossy, meaning that some information is lost or degraded in the compression process, especially at high compression ratios. Lossless algorithms avoid this loss at the cost of lower compression ratios.

What is the goal of feature scaling?

The goal of feature scaling is to ensure that all the features have similar scales and ranges, so that they can be compared and weighted equally by machine learning algorithms.

Define flexibility

Refers to the ability of a machine learning model to adapt to different types of data and capture complex relationships between the input features and the output labels. A flexible model can adjust its parameters to fit a wide range of data, but too much flexibility makes it more prone to overfitting. In multimedia information processing, this can be important when working with diverse types of multimedia resources with varying features and characteristics.

Reinforcement Learning is a

type of machine learning where the algorithm learns by interacting with an environment and receiving feedback in the form of rewards or punishments. The goal of the algorithm is to learn a policy that maximizes the cumulative reward over time. Examples of reinforcement learning include game-playing agents, robotics, and recommendation systems.

What are weak learners in the context of boosting? a) Base classifiers with low bias b) Base classifiers with high variance c) Base classifiers with high bias d) Base classifiers with low variance

Answer: c) Base classifiers with high bias (weak learners, such as decision tree stumps, perform only slightly better than random guessing)

What is the approach to building an ensemble of classifiers using the same base classification algorithm but fitting different subsets of the training dataset? A) Random forest algorithm B) Majority voting principle C) Plurality voting principle D) None of the above

Answer: A) Random forest algorithm

In the boosting algorithm, what is the purpose of training subsequent weak learners? A) To learn from correctly classified training examples B) To add more noise to the model C) To predict on the testing dataset D) To learn from misclassified training examples

Answer: D) To learn from misclassified training examples

True or False: Decision Trees are prone to overfitting if the tree is too deep and complex.

Answer: True

T/F: Decision Trees can handle both categorical and numerical features.

Answer: True (Decision Trees can handle both types of features, but some implementations require categorical features to be encoded as numbers first)

T/F: Decision Trees can build linear boundaries between classes.

Answer: True (but they can also build complex, non-linear boundaries)

What are Audio Dimensions (General)

- Loudness - Duration - Pitch - Rhythm - Timbre

Enemies

- Polysemy: duplicative interpretation - Semantic gap: context appreciation - Category deficiency - Dimensionality - Noise, missing data - Computational performance

Audio Dimensions (Music)

- Tempo - Rhythm - Melody - Harmony - Timbre - Instrumentation

How are training examples selected in the initial formulation of the boosting algorithm? A) Without replacement from the training dataset B) With replacement from the training dataset C) Randomly from the testing dataset D) Based on their class probabilities

Answer: A) Without replacement from the training dataset

What is the relationship between bagging and random forests? A) Bagging is a special case of random forests. B) Random forests are a special case of bagging. C) Bagging and random forests are completely unrelated. D) Bagging and random forests are two different names for the same technique.

Answer: B) Random forests are a special case of bagging.

What is the purpose of majority voting in bagging? A) To fit multiple classifiers on bootstrap samples. B) To combine the predictions of multiple classifiers into a final prediction. C) To select the best classifier from a set of classifiers. D) To evaluate the performance of the ensemble classifier.

Answer: B) To combine the predictions of multiple classifiers into a final prediction.

Which of the following statements is true about Stochastic Gradient Descent (SGD)? A. SGD updates the model parameters using the gradient of the cost function computed over the entire training dataset. B. SGD updates the model parameters using the gradient of the cost function computed over a subset of the training dataset. C. SGD updates the model parameters using the gradient of the cost function computed over the entire validation dataset. D. SGD updates the model parameters using the gradient of the cost function computed over a subset of the validation dataset.

Answer: B. SGD updates the model parameters using the gradient of the cost function computed over a subset of the training dataset.
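To make the "subset of the training dataset" idea concrete, here is a hedged sketch of mini-batch SGD for a one-feature linear model with a mean squared error cost. The function name `sgd_linear_fit`, the learning rate, and the noise-free toy data are all assumptions chosen for the illustration.

```python
import random


def sgd_linear_fit(xs, ys, lr=0.05, epochs=300, batch_size=4, seed=0):
    """Fit y ~ w * x + b with mini-batch stochastic gradient descent."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the examples in a new random order
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the mean squared error over this batch only,
            # not over the entire training dataset.
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


xs = [i / 10 for i in range(20)]      # 0.0, 0.1, ..., 1.9
ys = [3.0 * x + 1.0 for x in xs]      # generated from w = 3, b = 1, no noise
w, b = sgd_linear_fit(xs, ys)
print(round(w, 3), round(b, 3))
```

Setting `batch_size` to the size of the dataset turns this loop back into ordinary (full-batch) gradient descent, since every update then uses all training examples.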

What is the "attack" time in the context of sound processing? A. The time it takes for a sound signal to decay to silence B. The time it takes for a sound signal to reach its maximum amplitude C. The time it takes for a sound signal to change in frequency D. The time it takes for a sound signal to change in duration

Answer: B. The time it takes for a sound signal to reach its maximum amplitude.

In multimedia information processing, how are feature descriptions typically represented? A. As text descriptions B. As audio recordings C. As vectors or arrays D. As color histograms

Answer: C. As vectors or arrays

Which of the following methods of feature scaling scales the values of the features to a range between 0 and 1? A. Standardization B. Normalization C. Min-max scaling D. None of the above

Answer: C. Min-max scaling
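For comparison, min-max scaling and standardization can be sketched on a single feature column. The helper names `min_max_scale` and `standardize` are made up for this example, and the standardization uses the population standard deviation as an assumption.

```python
def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: nothing to spread out
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Center values on mean 0 and scale them to unit standard deviation."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = variance ** 0.5
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


feature = [10, 20, 30]
print(min_max_scale(feature))
print([round(z, 3) for z in standardize(feature)])
```

Min-max scaling bounds the result to a fixed range, while standardization does not bound it but is usually less affected by a single extreme value.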

Which of the following is a method of feature scaling? A. Principal Component Analysis (PCA) B. Support Vector Machine (SVM) C. Normalization D. Decision Tree

Answer: C. Normalization

What is a common method to address overfitting in a machine learning model? A. Increasing the complexity of the model B. Reducing the amount of training data C. Regularization techniques such as L1 and L2 regularization D. Using only a single model instead of an ensemble E. Stopping the training process after the model has already overfit the data

Answer: C. Regularization techniques such as L1 and L2 regularization can be used to penalize the model for having large parameter values, which can help to prevent overfitting.
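The shrinking effect of an L2 penalty can be seen in a one-weight model without an intercept, where the penalized least-squares solution has a closed form. This is only a sketch under those simplifying assumptions; `ridge_slope` and `l2_strength` are names invented for the example.

```python
def ridge_slope(xs, ys, l2_strength):
    """Minimize sum((y - w * x)^2) + l2_strength * w^2 over a single weight w.

    Setting the derivative to zero gives w = sum(x * y) / (sum(x^2) + l2_strength).
    """
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    return sum_xy / (sum_xx + l2_strength)


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

print(ridge_slope(xs, ys, 0.0))    # no penalty: ordinary least squares
print(ridge_slope(xs, ys, 14.0))   # the penalty pulls the weight toward zero
```

A larger `l2_strength` always gives a weight closer to zero here, which is the sense in which the penalty discourages large parameter values.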

What is the purpose of mapping feature descriptions to a lower-dimensional description space? A. To create high-dimensional representations of the multimedia resource B. To highlight the most irrelevant or noisy features of the multimedia resource C. To minimize the redundancy or noise in the feature descriptions D. To create binary representations of the multimedia resource

Answer: C. To minimize the redundancy or noise in the feature descriptions

What is the purpose of data compression in machine learning? A. To increase the dimensionality of a dataset B. To reduce the amount of data collected C. To summarize the information content of a dataset by transforming it onto a new feature subspace of lower dimensionality than the original one D. To make data analysis more complicated

Answer: C. To summarize the information content of a dataset by transforming it onto a new feature subspace of lower dimensionality than the original one

What is the main difference between bagging and majority voting? A) Bagging uses the same training dataset to fit individual classifiers in the ensemble, while majority voting draws bootstrap samples. B) Bagging is used for regression problems, while majority voting is used for classification problems. C) Bagging involves using a single classifier, while majority voting involves using multiple classifiers. D) Bagging involves fitting individual classifiers on bootstrap samples, while majority voting involves combining the predictions of individual classifiers.

Answer: D) Bagging involves fitting individual classifiers on bootstrap samples, while majority voting involves combining the predictions of individual classifiers.

Which of the following is an example of a deep learning model that is commonly trained using Stochastic Gradient Descent? A. Naive Bayes Classifier B. Random Forest C. Decision Tree D. Convolutional Neural Network

Answer: D. Convolutional Neural Network

T/F: Decision Trees are prone to underfitting due to their simplicity.

Answer: False (the correct statement is "Decision Trees are prone to overfitting due to their ability to build extremely complex boundaries")

True or False: Decision Trees can handle both categorical and numerical data.

Answer: True

How can we address underfitting? a) By increasing the complexity of the model or adding more features. b) By using less training data or reducing the amount of regularization. c) By decreasing the complexity of the model or removing features. d) By using more testing data or increasing the amount of regularization.

Answer: a) By increasing the complexity of the model or adding more features

What is the third step in Principal Component Analysis? a) Decompose the covariance matrix into its eigenvectors and eigenvalues b) Sort the eigenvalues in decreasing order c) Transform d) Select k eigenvectors

Answer: a) Decompose the covariance matrix into its eigenvectors and eigenvalues

What is the main difference between PCA and LDA? a) PCA is an unsupervised algorithm while LDA is supervised b) LDA is an unsupervised algorithm while PCA is supervised c) PCA and LDA both aim to optimize class separability d) PCA and LDA both aim to find the orthogonal component axes of maximum variance

Answer: a) PCA is an unsupervised algorithm while LDA is supervised

Which function is used in logistic regression to map the output of the linear model to a probability value between 0 and 1? a) Sigmoid function b) Rectified Linear Unit (ReLU) function c) Hyperbolic Tangent (tanh) function d) Softmax function

Answer: a) Sigmoid function
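A small sketch of how the sigmoid turns the linear model's output into a probability. The weights, bias, and feature values below are arbitrary numbers picked for illustration, and `predict_proba` is a hypothetical helper rather than a library function.

```python
import math


def sigmoid(z):
    """Logistic sigmoid 1 / (1 + e^(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


def predict_proba(weights, bias, features):
    """Probability of the positive class for one example."""
    # Net input of the linear model: w . x + b
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


print(sigmoid(0.0))                                        # net input 0 maps to 0.5
print(round(predict_proba([0.8, -0.4], 0.1, [2.0, 1.0]), 3))
```

A common convention is to predict the positive class when the probability is at least 0.5, which corresponds to a net input of at least 0.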

A spam filter that learns from a labeled dataset of emails that are either spam or not spam is an example of: a) Supervised learning b) Unsupervised learning c) Reinforcement learning

Answer: a) Supervised learning

An image recognition algorithm that learns to classify objects in images based on labeled training data is an example of: a) Supervised learning b) Unsupervised learning c) Reinforcement learning

Answer: a) Supervised learning

Which of the following is a sign of overfitting in a machine learning model? a) The model has high accuracy on the training data and low accuracy on the test data b) The model has low accuracy on the training data and high accuracy on the test data c) The model has high accuracy on both the training and test data d) The model has low accuracy on both the training and test data

Answer: a) The model has high accuracy on the training data and low accuracy on the test data

How is the first principal component determined in PCA (Principal Component Analysis)? a. By finding the direction of maximum variance in the data b. By finding the direction of minimum variance in the data c. By finding the direction of zero variance in the data d. By randomly selecting a direction

Answer: a. By finding the direction of maximum variance in the data
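One way to find the direction of maximum variance without a linear algebra library is power iteration on the covariance matrix. The sketch below assumes the data are only mean-centered (not standardized) and that the largest eigenvalue is clearly separated from the rest; `first_principal_component` and the toy rows are illustrative.

```python
import random


def first_principal_component(rows, n_iter=200, seed=0):
    """Approximate the unit vector along which the data vary the most."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly multiply by the covariance matrix and
    # renormalize; the vector converges to the dominant eigenvector.
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(n_iter):
        v = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
    return v


rows = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.0)]
pc1 = first_principal_component(rows)
print([round(x, 3) for x in pc1])
```

The points lie close to the line y = x, so the result is close to the diagonal direction. Its sign is arbitrary, because a direction and its negation describe the same axis.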

What does PCA project the data into? a) A higher dimensional subspace b) A lower dimensional subspace c) A subspace of equal dimensionality d) A random subspace

Answer: b) A lower dimensional subspace

What is the main difference between AdaBoost and gradient boosting? a) AdaBoost uses deeper decision trees than gradient boosting b) AdaBoost uses the prediction errors to assign sample weights while gradient boosting uses them directly to form the target variable for fitting the next tree c) AdaBoost uses a global learning rate while gradient boosting has an individual weighting term for each tree d) AdaBoost does not use the prediction errors for assigning sample weights while gradient boosting does

Answer: b) AdaBoost uses the prediction errors to assign sample weights while gradient boosting uses them directly to form the target variable for fitting the next tree
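The gradient boosting half of this answer can be sketched for the simplest case: regression with a squared-error loss, where the "prediction errors" are plain residuals. The sketch assumes one numeric feature and depth-1 trees (stumps); `fit_stump`, `gradient_boost`, and the toy data are names and values chosen for the illustration.

```python
def fit_stump(xs, targets):
    """Find the threshold split of a 1-D feature that minimizes squared error."""
    best = None
    cut_points = sorted(set(xs))
    for lo, hi in zip(cut_points, cut_points[1:]):
        threshold = (lo + hi) / 2
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        error = (sum((t - left_value) ** 2 for t in left)
                 + sum((t - right_value) ** 2 for t in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    return best[1:]


def gradient_boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Start from the mean, then fit each new stump to the current residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        # The residuals become the target variable for the next tree.
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        predictions = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, predictions)
        ]
    return base, stumps, predictions


def mean_squared_error(ys, predictions):
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]
base, stumps, predictions = gradient_boost(xs, ys)
print(mean_squared_error(ys, [base] * len(ys)))  # constant starting model
print(mean_squared_error(ys, predictions))       # after boosting
```

No example weights appear anywhere in this loop, which is the contrast with AdaBoost: the errors are used directly as what the next tree has to predict.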

What is bagging? a) An ensemble learning technique that uses the same training dataset to fit individual classifiers b) An ensemble learning technique that draws bootstrap samples from the initial training dataset to fit individual classifiers c) An ensemble learning technique that uses a single decision tree classifier d) An ensemble learning technique that combines different decision tree classifiers

Answer: b) An ensemble learning technique that draws bootstrap samples from the initial training dataset to fit individual classifiers

What is PCA? a) A supervised learning algorithm b) An unsupervised learning algorithm c) A reinforcement learning algorithm d) A clustering algorithm

Answer: b) An unsupervised learning algorithm

Which ensemble learning technique involves training weak learners that subsequently learn from mistakes? a) Bagging b) Boosting c) Random Forest d) Decision Trees

Answer: b) Boosting

What is the second step in Principal Component Analysis? a) Select k eigenvectors b) Build the covariance matrix c) Construct a projection matrix W d) Transform

Answer: b) Build the Covariance matrix

How does a decision tree handle missing values? a) By ignoring the missing values b) By imputing the missing values with the mean or mode c) By randomly assigning a value to the missing data point d) By creating a new branch for each missing value

Answer: b) By imputing the missing values with the mean or mode

What is the difference between PCA and LDA? a) PCA is a supervised algorithm, whereas LDA is unsupervised. b) PCA attempts to find the orthogonal component axes of maximum variance in a dataset, whereas LDA attempts to find the feature subspace that optimizes class separability. c) PCA is used to increase computational efficiency, whereas LDA is used to reduce overfitting. d) PCA and LDA are identical techniques for feature extraction.

Answer: b) PCA attempts to find the orthogonal component axes of maximum variance in a dataset, whereas LDA attempts to find the feature subspace that optimizes class separability.

What does the analysis of PCA reveal? a) The direction of low variance b) The direction of high variance c) The direction of negative variance d) The direction of zero variance

Answer: b) The direction of high variance

What is the main cause of overfitting in a machine learning model? a) The model is too simple b) The model is too complex c) The training data is too small d) The optimization algorithm is too slow

Answer: b) The model is too complex

What is the curse of dimensionality? a) The tendency for models to become too simple and underfit the data b) The tendency for models to become too complex and overfit the data c) The tendency for high-dimensional data to become sparse and difficult to analyze d) The tendency for low-dimensional data to become noisy and difficult to analyze

Answer: c) The tendency for high-dimensional data to become sparse and difficult to analyze

Which of the following is a disadvantage of parametric models? a) They can be computationally expensive b) They are prone to overfitting c) They require a large amount of training data

Answer: b) They are prone to overfitting

What is the purpose of ensemble learning techniques? a) To highlight individual model strengths b) To cancel out individual model weaknesses c) To increase the training time of models d) To decrease the performance of models

Answer: b) To cancel out individual model weaknesses

What is the purpose of majority voting in bagging? a) To select the best classifier from the ensemble b) To combine predictions from the individual classifiers in the ensemble c) To remove noise from the training dataset d) To reduce the variance of the individual classifiers

Answer: b) To combine predictions from the individual classifiers in the ensemble

What is k-fold cross-validation used for? a) To chain different transformation techniques and classifiers b) To diagnose common problems of learning algorithms c) To evaluate and optimize a model's performance d) To deal with imbalanced data

Answer: c) To evaluate and optimize a model's performance

What is the main goal of LDA (Linear Discriminant Analysis)? a) To find the orthogonal component axes of maximum variance in a dataset b) To find the feature subspace that optimizes class separability c) To increase computational efficiency d) To reduce the degree of underfitting

Answer: b) To find the feature subspace that optimizes class separability

What is the final step in Principal Component Analysis? a) Construct a projection matrix W b) Transform c) Decompose the covariance matrix into its eigenvectors and eigenvalues d) Select k eigenvectors

Answer: b) Transform

A clustering algorithm that groups similar customer segments based on their purchasing behavior is an example of: a) Supervised learning b) Unsupervised learning c) Reinforcement learning

Answer: b) Unsupervised learning

An anomaly detection algorithm that learns to detect unusual patterns in a dataset without prior knowledge of what constitutes an anomaly is an example of: a) Supervised learning b) Unsupervised learning c) Reinforcement learning

Answer: b) Unsupervised learning

What is the effect of increasing the number of principal components in PCA (Principal Component Analysis)? a. It increases the dimensionality of the data b. It decreases the dimensionality of the data c. It has no effect on the dimensionality of the data d. It increases the amount of noise in the data

Answer: a. It increases the dimensionality of the data (each additional retained component adds one dimension to the reduced representation)

How does gradient boosting work? a. It trains weak learners on random subsets of the training dataset b. It trains weak learners on all the training dataset c. It uses random subsets of the training dataset to create an ensemble d. It combines the predictions of multiple weak learners using majority voting

Answer: b. It trains weak learners on all the training dataset

Which type of machine learning is used for anomaly detection? a. Supervised Learning b. Unsupervised Learning c. Reinforcement Learning

Answer: b. Unsupervised Learning

What is the first step in Principal Component Analysis? a) Decompose the covariance matrix into its eigenvectors and eigenvalues b) Sort the eigenvalues in decreasing order c) Standardize the d-dimension dataset d) Select k eigenvectors

Answer: c) Standardize the d-dimension dataset

In decision trees, what are the datasets organized into groups based on? a) The number of features b) The class labels c) The entropy/information gain d) The accuracy of the model

Answer: c) The entropy/information gain
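
To make the entropy/information gain criterion concrete, here is a small sketch that scores two candidate splits of the same parent node. The labels and splits are invented for the example, and entropy is measured in bits (log base 2).

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the child groups."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = ["yes"] * 4 + ["no"] * 4
perfect_split = [["yes"] * 4, ["no"] * 4]
useless_split = [["yes", "yes", "no", "no"], ["yes", "yes", "no", "no"]]

print(information_gain(parent, perfect_split))
print(information_gain(parent, useless_split))
```

A tree-building algorithm would prefer the first split, since it removes all of the label uncertainty, while the second removes none.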

What is the result of under-fitting on the training and test error? a) Both the training and test error are low. b) Both the training and test error are high. c) The training error is low and the test error is high. d) The training error is high and the test error is low.

Answer: b) Both the training and test error are high.

Which of the following is a disadvantage of non-parametric models? a) They cannot handle high-dimensional data b) They require a large amount of training data c) They can be computationally expensive

Answer: c) They can be computationally expensive

What is the purpose of the global learning rate in gradient boosting? a) To assign weights to individual decision trees b) To control the maximum depth of the decision trees c) To adjust the contribution of each decision tree to the final ensemble d) To compute sample weights in each round

Answer: c) To adjust the contribution of each decision tree to the final ensemble
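
The role of the learning rate can be sketched with a tiny gradient boosting loop. This is an illustration under several assumptions: squared-error loss, 1-D inputs, single-threshold regression stumps as the weak learners, and made-up data. All function names are hypothetical.

```python
def fit_stump(xs, residuals):
    """Best single-threshold regression stump for 1-D inputs (squared error)."""
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the largest value so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x <= t else right_mean


def gradient_boost(xs, ys, n_rounds, learning_rate):
    """Fit stumps to residuals; every round sees the whole training set."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]  # negative gradient of squared loss
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # The learning rate scales how much each stump contributes to the ensemble.
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]
mean_y = sum(ys) / len(ys)
baseline_mse = mse(lambda x: mean_y, xs, ys)
boosted_mse = mse(gradient_boost(xs, ys, n_rounds=20, learning_rate=0.5), xs, ys)
print(baseline_mse, boosted_mse)
```

A smaller learning rate shrinks each stump's contribution, which usually means more rounds are needed to reach the same training error.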

What is the purpose of using confusion matrices in model evaluation? a) To chain different transformation techniques and classifiers b) To diagnose common problems of learning algorithms c) To evaluate and optimize a model's performance d) To visualize data

Answer: c) To evaluate and optimize a model's performance
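
A short sketch of what a binary confusion matrix holds and how common scores are derived from it. The label vectors are invented, and treating label 1 as the positive class is an assumption of the example.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)

precision = tp / (tp + fp)
recall = tp / (tp + fn)
accuracy = (tp + tn) / len(y_true)
print(tp, fp, fn, tn, precision, recall, accuracy)
```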

What type of transformation does PCA perform? a) Non-linear transformation b) Supervised linear transformation c) Unsupervised linear transformation d) Clustering transformation

Answer: c) Unsupervised linear transformation

What is gradient boosting? a. A variant of decision tree b. A variant of random forest c. A variant of boosting d. A variant of bagging

Answer: c. A variant of boosting

Which popular machine learning algorithm is based on gradient boosting? a. Logistic Regression b. K-Nearest Neighbors c. XGBoost d. Support Vector Machines

Answer: c. XGBoost

Why is KNN prone to overfitting? a) It uses a decision boundary to classify data points b) It is a parametric model c) It doesn't use a distance metric d) The feature space can become sparse as the number of dimensions increases

Answer: d) The feature space can become sparse as the number of dimensions increases.

What is the purpose of bagging in ensemble learning? a) To increase the bias of a model b) To increase the variance of a model c) To reduce the bias of a model d) To reduce the variance of a model

Answer: d) To reduce the variance of a model

What is the last step in LDA? a) Decompose the matrix S_W^{-1} S_B into its eigenvectors and eigenvalues b) Select k eigenvectors c) Construct a projection matrix W d) Transform

Answer: d) Transform

What is the main cause of underfitting? a) Using a model that is too complex or has too many parameters. b) Using too much training data. c) Applying too little regularization. d) Using a model that is too simple or has too few parameters.

Answer: d) Using a model that is too simple or has too few parameters.

Which of the following can help prevent overfitting in a machine learning model? a) Using a small number of features b) Using a large number of features c) Using only one layer in a neural network d) Using dropout regularization

Answer: d) Using dropout regularization

What is the benefit of using PCA (Principal Component Analysis)? a. It can help with data visualization b. It can reduce the amount of noise in the data c. It can help with feature selection d. All of the above

Answer: d. All of the above

What are some benefits of mapping feature descriptions to lower-dimensional description spaces?

Benefits of mapping feature descriptions to lower-dimensional description spaces include efficient and effective comparisons and classifications of multimedia resources.

What is categorization?

refers to the process of assigning multimedia resources to predefined categories or classes based on their feature descriptions. This is typically done using machine learning algorithms that learn to map the feature vectors of multimedia resources to corresponding class labels.

types of data and media

- Audio - Videos - Bio - Time series - Text

What are the steps for media processing

- Feature Extraction - Descriptions - Information Filtering - Categorization - Classification

Why is Computational performance an enemy?

Because machine learning can demand more computing power than is available; the question is whether you can get enough to actually do the ML job.

True or False: SVM is a linear classifier that finds the hyperplane that maximizes the margin between two classes.

Answer: True, SVM is a linear classifier that finds the hyperplane that maximizes the margin between two classes.

How can mapping feature descriptions to description spaces help in comparing or classifying multimedia resources?

Mapping feature descriptions to description spaces can help in comparing or classifying multimedia resources by focusing on relevant or discriminative features and minimizing the impact of irrelevant or noisy features.

Which of the following statements is true about gradient descent? A. Gradient descent is used to maximize a function. B. Gradient descent always guarantees to find the global minimum of a function. C. Gradient descent works by iteratively adjusting the values of the parameters in the direction of the negative gradient of the cost function. D. Gradient descent is only used in linear regression.

The correct answer is C. Gradient descent works by iteratively adjusting the values of the parameters in the direction of the negative gradient of the cost function.
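
The update rule described in option C can be sketched in a few lines. The cost function J(w) = (w - 3)^2, the starting point, and the step size are assumptions chosen so the result is easy to check by hand.

```python
def gradient_descent(grad, w0, learning_rate=0.1, n_steps=100):
    """Repeatedly step against the gradient of the cost function."""
    w = w0
    for _ in range(n_steps):
        w -= learning_rate * grad(w)  # move in the direction of the negative gradient
    return w


# Hypothetical cost J(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w_min = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
print(w_min)
```

For this convex cost the iterates approach the minimum at w = 3; on non-convex costs the same loop may settle in a local minimum, which is why option B is false.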

T/F: Video is a sequence of images with motion over time

TRUE. Video is a sequence of images with motion over time.

email spam filtering is what type of ML

Supervised

Which of the following is a desirable property of a machine learning model? A. High flexibility B. High overfitting C. Low generalization D. Low bias

Answer: A. High flexibility.

What are signal statistics in the context of sound processing? A. Measures of the distribution of values within a sound signal B. Measures of the frequency content of a sound signal C. Measures of the duration of a sound signal D. Measures of the phase relationship between different frequency components

Answer: A. Measures of the distribution of values within a sound signal.

Which of the following is an advantage of Stochastic Gradient Descent over traditional Gradient Descent? A. SGD is more computationally efficient. B. SGD always finds the global minimum of the cost function. C. SGD is less prone to overfitting. D. SGD requires less data preprocessing.

Answer: A. SGD is more computationally efficient.

What is short-term energy in the context of sound processing? A. A measure of the frequency content of a sound signal B. A measure of the amplitude of a sound signal within a short time window C. A measure of the duration of a sound signal D. A measure of the phase relationship between different frequency components

Answer: B. A measure of the amplitude of a sound signal within a short time window.
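
A minimal sketch of short-term energy, assuming the common definition as the sum of squared sample values in each frame. The frame size, hop size, and toy signal are made up for illustration.

```python
def short_term_energy(signal, frame_size, hop_size):
    """Sum of squared samples in each frame of the signal."""
    return [
        sum(s * s for s in signal[start:start + frame_size])
        for start in range(0, len(signal) - frame_size + 1, hop_size)
    ]


# Toy signal: four samples of silence followed by four samples of a loud square wave.
signal = [0.0, 0.0, 0.0, 0.0, 1.0, -1.0, 1.0, -1.0]
energies = short_term_energy(signal, frame_size=4, hop_size=4)
print(energies)
```

The silent frame has zero energy and the loud frame has high energy, which is the kind of contrast used to separate silence from sound.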

Which of the following is an advantage of feature scaling? A. It can eliminate outliers in the dataset B. It can make the dataset more interpretable C. It can improve the convergence speed of the machine learning algorithm D. It can reduce the dimensionality of the dataset

Answer: C. It can improve the convergence speed of the machine learning algorithm

Which of the following is a disadvantage of Stochastic Gradient Descent? A. SGD is more prone to overfitting than traditional Gradient Descent. B. SGD converges slower than traditional Gradient Descent. C. SGD can be more noisy than traditional Gradient Descent. D. SGD requires more data preprocessing than traditional Gradient Descent.

Answer: C. SGD can be more noisy than traditional Gradient Descent.

Why do we need to use feature scaling in machine learning? A. To ensure that all the features have the same values B. To make the dataset more balanced C. To normalize the range of values of the features D. To increase the accuracy of the model

Answer: C. To normalize the range of values of the features
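
Two common ways to normalize a feature's range are min-max scaling and standardization. The sketch below applies both to one feature; the function names and values are illustrative, and it assumes the feature is not constant (otherwise the divisions would fail).

```python
import math


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]


scaled = min_max_scale([10, 20, 30])
standardized = standardize([2.0, 4.0, 6.0])
print(scaled, standardized)
```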

True or False: In Kernel SVM, the choice of kernel function has a significant impact on the performance of the model.

Answer: True, In Kernel SVM, the choice of kernel function has a significant impact on the performance of the model

What is meant by the term "iterative process" in the context of media retrieval, and how do feature extraction and categorization contribute to this process?

Iterative process refers to a cycle of repeated steps where the output of each cycle is used as the input for the next cycle, with the goal of refining the results over time. In the context of media retrieval, the iterative process involves breaking down a media resource into its constituent features, such as colors, shapes, textures, audio signals, etc., and categorizing the media based on these features using a set of predefined filters or classifiers. During the process, the initial set of filters or classifiers may not be accurate enough to fully capture the relevant features of the media resource, so the process may need to be repeated multiple times, with each iteration refining the filters and classifiers to achieve better results. The process may also involve user feedback to improve the categorization and ensure that the retrieved media meets the user's needs and preferences.

Compare and contrast compression and feature description:

While feature description and compression share some similarities in terms of reducing the size of multimedia data, they differ in their objectives and techniques. Compression aims to reduce the size of the data while preserving its essential content, while feature description aims to represent the data in a more compact and uniform form by extracting its essential features or characteristics.

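What is the main difference between Adaline and the perceptron?

The main difference between Adaline and the perceptron is the activation function: Adaline updates its weights based on a linear activation function, while the perceptron uses a unit step function.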

What is the purpose of the kernel function in Kernel SVM? a) To transform the data into a higher-dimensional space b) To reduce the dimensionality of the data c) To add noise to the data d) To normalize the data

Answer: a) To transform the data into a higher-dimensional space
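
A small worked example of what "transforming into a higher-dimensional space" means: for 2-D inputs, the degree-2 polynomial kernel k(x, z) = (x · z)^2 gives the same value as an ordinary dot product after an explicit mapping into 3-D. The kernel choice and the points are assumptions for illustration; a kernel SVM would evaluate only the kernel and never build the mapped vectors.

```python
import math


def poly_kernel(x, z):
    """Degree-2 polynomial kernel for 2-D inputs: (x . z)^2."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2


def phi(x):
    """Explicit feature map into 3-D that corresponds to poly_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


x, z = (1.0, 2.0), (3.0, 0.5)
kernel_value = poly_kernel(x, z)
explicit_value = dot(phi(x), phi(z))
print(kernel_value, explicit_value)
```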

How do you choose the optimal value of K in KNN? a) Use cross-validation to evaluate the performance of the model for different values of K b) Choose K based on the size of the training set c) Always use K=1 for optimal performance d) Choose K based on the dimensionality of the dataset

Answer: a) Use cross-validation to evaluate the performance of the model for different values of K

What is the main difference between Supervised and Unsupervised Learning? a. Supervised Learning uses labeled training data, while Unsupervised Learning uses unlabelled data. b. Supervised Learning involves making predictions on new data, while Unsupervised Learning involves identifying patterns in the data. c. Supervised Learning is used for image classification, while Unsupervised Learning is used for speech recognition.

Answer: a. Supervised Learning uses labeled training data, while Unsupervised Learning uses unlabelled data.

What is underfitting? a) A machine learning model that is too complex and overfits the training data. b) A machine learning model that is too simple and unable to capture the underlying patterns in the data. c) A machine learning model that has high variance and performs poorly on the training data. d) A machine learning model that has low bias and high variance.

Answer: b) A machine learning model that is too simple and unable to capture the underlying patterns in the data.

What is overfitting in machine learning? a) When a model is too simple and cannot capture the complexity of the data b) When a model is too complex and fits the training data too closely c) When a model is trained on a small dataset d) When a model is trained for too long

Answer: b) When a model is too complex and fits the training data too closely

What is the key feature used to estimate the fundamental frequency in the Autocorrelation approach? a. Peaks in the spectral envelope of the signal b. Peaks in the autocorrelation function of the signal c. Peaks in the pitch histogram of the signal

Answer: b. Peaks in the autocorrelation function of the signal
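
The autocorrelation approach can be sketched as follows: compute the autocorrelation for a range of candidate lags, take the lag with the highest peak as the period, and convert it to a frequency. The synthetic 200 Hz tone, the 8000 Hz sample rate, and the lag search range are assumptions for this example; real signals usually need windowing and a more careful peak search.

```python
import math


def estimate_f0(signal, sample_rate, min_lag, max_lag):
    """Estimate the fundamental frequency from the highest autocorrelation peak."""
    def autocorr(lag):
        return sum(signal[n] * signal[n + lag] for n in range(len(signal) - lag))

    # The lag with the strongest self-similarity is taken as the period in samples.
    best_lag = max(range(min_lag, max_lag + 1), key=autocorr)
    return sample_rate / best_lag


sample_rate = 8000
tone = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(800)]
f0 = estimate_f0(tone, sample_rate, min_lag=20, max_lag=60)
print(f0)
```

The lag range matters: allowing lags that span several periods can let a multiple of the true period win, which would give a frequency that is too low.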

What is the goal of the Pitch Histogram approach? a. To compute the autocorrelation function of an audio signal b. To estimate the fundamental frequency of an audio signal c. To model an audio signal as a linear combination of its past values

Answer: b. To estimate the fundamental frequency of an audio signal

Can you give an example of how mapping a feature description to a description space can help in visualizing or exploring a multimedia resource?

Mapping a feature description to a description space for an image, for example, can help in visualizing or exploring the image by highlighting the most relevant or distinctive features of the image, such as its color composition or texture patterns.

Feature extraction is the process of identifying and extracting key features or characteristics from a dataset. (True)

TRUE. Feature extraction is the process of identifying and extracting key features or characteristics from a dataset.

media Features

- Smoothness - Periodicity - Harmonics (Fourier analysis) - Symmetry

How are multimedia resources typically assigned to categories in the categorization process? A. Through a process known as clustering B. By mapping feature vectors to corresponding class labels C. By compressing feature vectors to reduce their dimensionality D. By extracting relevant features from multimedia resources and discarding irrelevant ones

Answer: B. By mapping feature vectors to corresponding class labels.

Which of the following is a potential drawback of feature scaling? A. It can introduce noise into the dataset B. It can make the dataset harder to interpret C. It can cause overfitting of the machine learning algorithm D. It can be computationally expensive for large datasets

Answer: B. It can make the dataset harder to interpret

What is the purpose of categorization in multimedia information processing? A. To create high-dimensional representations of multimedia resources B. To minimize the redundancy or noise in feature descriptions C. To assign multimedia resources to predefined categories based on their feature descriptions D. To create real-valued representations of multimedia resources

Answer: C. To assign multimedia resources to predefined categories based on their feature descriptions.

What does K in KNN algorithm refer to? a) The number of neighbors used to predict the class of a test sample b) The distance metric used to measure the similarity between samples c) The type of decision boundary used to classify samples d) The size of the training set

Answer: a) The number of neighbors used to predict the class of a test sample
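
A minimal KNN sketch showing where K enters the prediction. It assumes Euclidean distance and a simple majority vote among the K nearest neighbors; the training points and labels are invented.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k):
    """Predict the label of query from its k nearest training samples."""
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


train_points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_points, train_labels, query=(0.5, 0.5), k=3))
print(knn_predict(train_points, train_labels, query=(5.5, 5.5), k=3))
```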

What is the most commonly used loss function in logistic regression? a) Mean Squared Error (MSE) loss b) Mean Absolute Error (MAE) loss c) Binary Cross-Entropy loss d) Categorical Cross-Entropy loss

Answer: c) Binary Cross-Entropy loss
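
Binary cross-entropy can be computed directly from true labels and predicted probabilities. In the sketch below, the clipping constant `eps` is an assumption added to avoid taking log(0), and the example probabilities are invented.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy between labels (0 or 1) and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip so that log() stays finite
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


labels = [1, 0, 1, 0]
confident_and_right = binary_cross_entropy(labels, [0.9, 0.1, 0.8, 0.2])
confident_and_wrong = binary_cross_entropy(labels, [0.1, 0.9, 0.2, 0.8])
print(confident_and_right, confident_and_wrong)
```

The loss is small when confident predictions are correct and grows quickly when they are wrong, which is what the optimizer exploits when fitting the parameters.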

Which of the following is a common method to prevent overfitting in machine learning? a) Adding more layers to the neural network b) Increasing the learning rate of the optimization algorithm c) Decreasing the regularization parameter d) Adding more training data

Answer: d) Adding more training data

Which of the following is NOT an advantage of Support Vector Machines (SVMs)? a) Effective in high-dimensional spaces b) Memory efficient c) Effective when the number of features is greater than the number of samples d) Can handle non-linear decision boundaries

Answer: c) Effective when the number of features is greater than the number of samples

Which of the following is NOT a distance metric commonly used in KNN? a) Euclidean distance b) Manhattan distance c) Hamming distance d) Cosine similarity

Answer: c) Hamming distance (Hamming distance is used for comparing strings of equal length)

What is overfitting also known as? a) High-bias model b) Low-variance model c) High-variance model d) Low-bias model

Answer: c) High-variance model

In logistic regression, what is the role of the optimization algorithm (such as gradient descent or stochastic gradient descent)? a) To minimize the accuracy of the model b) To maximize the loss function c) To find the optimal parameters that minimize the loss function d) To increase the number of features in the model

Answer: c) To find the optimal parameters that minimize the loss function

Which type of machine learning involves learning a policy that maximizes cumulative reward over time? a. Supervised Learning b. Unsupervised Learning c. Reinforcement Learning

Answer: c. Reinforcement Learning

What is the main purpose of Linear Predictive Coding (LPC)? a. To compute a histogram of the pitch values within an audio signal b. To estimate the overall loudness of an audio signal c. To model an audio signal as a linear combination of its past values and estimate its spectral envelope

Answer: c. To model an audio signal as a linear combination of its past values and estimate its spectral envelope

Which of the following algorithms can be used to build Decision Trees? a) CART (Classification and Regression Trees) b) ID3 (Iterative Dichotomiser 3) c) C4.5 (Successor of ID3) d) All of the above

Answer: d) All of the above

Which of the following is an advantage of KNN algorithm? a) It is computationally efficient for large datasets b) It can handle missing values in the dataset c) It can capture complex nonlinear relationships between features d) It is simple to implement and interpret

Answer: d) It is simple to implement and interpret

What is feature description?

Feature description is the process of representing multimedia data in a more compact and uniform form by extracting its essential features or characteristics, while discarding redundant or irrelevant information. Feature description algorithms typically use statistical or analytical techniques to extract relevant features from the data, such as color histograms, texture patterns, or audio spectra, and represent them as numerical vectors or arrays. It can be used for various tasks in multimedia information processing, such as classification, clustering, or retrieval, and is often used in conjunction with compression algorithms to further reduce the size of the data without losing essential information.

Filtering

Refers to the process of selecting or prioritizing the retrieved media resources based on their relevance, quality, or other criteria. Filtering can be done manually by the user or automatically using predefined filters or classifiers based on the media features or user feedback.

T / F: Feature description reduces binary blobs to uniform descriptions

TRUE: In multimedia information processing, feature description is the process of extracting relevant features or characteristics from a media resource, such as an image, video, or audio clip, in order to represent it in a more compact and uniform form. This is often done using feature extraction algorithms that analyze the media content and identify relevant patterns or structures, such as color histograms, texture patterns, or audio spectra. The resulting features are typically represented as numerical vectors or arrays, which can be used for further processing and analysis, such as classification, clustering, or retrieval. However, these feature vectors can be quite large and complex, especially for high-dimensional or multi-modal media data, such as videos or audio recordings, making them difficult to compare or manipulate directly. To overcome this issue, feature description aims to reduce the complexity and dimensionality of the feature vectors by transforming them into more uniform and compact representations that capture the most relevant information while discarding redundant or irrelevant details. This transformation is often done using techniques such as dimensionality reduction, feature selection, or data normalization. The resulting "uniform descriptions" are typically much smaller and simpler than the original feature vectors, making them easier to compare, manipulate, and store. They can also be used to represent media resources in a more standardized and interoperable format, enabling efficient and effective sharing and exchange of multimedia data across different platforms and applications. In summary, feature description reduces binary blobs, or complex and high-dimensional feature vectors, to uniform descriptions, or simpler and more compact representations that capture the most relevant information about a media resource.

What is the relationship between the dimensions of a description space and the properties or aspects of a multimedia resource?

The dimensions of a description space correspond to the different properties or aspects of a multimedia resource that are used to describe it.

what is feature extraction?

is the process of identifying and extracting key features or characteristics from a dataset. In multimedia information processing, feature extraction involves analyzing media such as images, audio, or video to identify patterns, structures, or attributes that are relevant to a specific task, such as object recognition, speech recognition, or emotion detection. Feature extraction typically involves a combination of signal processing techniques, such as filtering and transformation, as well as machine learning algorithms to identify and extract the most relevant features. The resulting features are usually represented in a numerical format, such as a vector or a matrix, that can be used as input for further analysis or classification.

Supervised Learning is a type of

machine learning where the algorithm is provided with labeled training data, where each example is associated with a known output value. The algorithm then learns to map inputs to outputs, allowing it to make predictions on new, unseen data. Examples of supervised learning include image classification, speech recognition, and regression analysis.

Define Overfitting

occurs when a machine learning model is too complex or too closely fits the training data, which can lead to poor performance when applied to new, unseen data. This happens when the model is too specialized to the training data and fails to generalize well to new data. In multimedia information processing, this can occur when the model is trained on a limited dataset that doesn't fully capture the variability in the multimedia resources.

Unsupervised Learning is a type

of machine learning where the algorithm is given unlabelled data, and it must identify patterns or structure in the data without prior knowledge of the output. Examples of unsupervised learning include clustering, anomaly detection, and dimensionality reduction.

Querying

Refers to retrieving media resources based on their metadata, such as title, author, date, or keywords, or based on their content features, such as color, shape, texture, or audio signals.

