3. Feature engineering
What is the goal of feature engineering?
- Convert unstructured data into input for a learning algorithm
- Expose the structure of the concept to the learning algorithm
- Work well with the structure of the model
- Balance the number of features, the complexity of the concept, the complexity of the model, and the amount of data
Four types of supervised feature selection
- Filter
- Wrapper
- Embedded
- Hybrid
Benefits of feature engineering
- Improving the accuracy of the ML model
- Solving the overfitting problem
- Speeding up computation
- Making the ML process more understandable
What are the 2 ways of feature extraction?
- Manual feature extraction
- Automated feature extraction
Compare the filter and wrapper methods.
- Wrapper methods are computationally more expensive than filter methods due to the repeated learning steps and cross-validation
- However, wrapper methods are more accurate than filter methods
The objective of variable selection is threefold
- Improving the prediction performance of predictors
- Providing faster and more cost-effective predictors
- Providing a better understanding of the underlying process that generated the data
To keep 'relevant features only', we remove the features that are?
- Non-informative
- Non-discriminative
- Redundant
Supervised feature selection before training the model
- Statistical method: removing features with low variance
- Filter method: univariate feature selection
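A minimal scikit-learn sketch of both ideas; the iris dataset and the thresholds are illustrative assumptions:

```python
# Filter-style selection before any model is trained.
from sklearn.datasets import load_iris
from sklearn.feature_selection import VarianceThreshold, SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Statistical method: drop features whose variance falls below a threshold.
X_high_var = VarianceThreshold(threshold=0.2).fit_transform(X)

# Filter method: univariate selection, keeping the k features with the
# highest ANOVA F-score against the target.
X_best = SelectKBest(score_func=f_classif, k=2).fit_transform(X, y)

print(X_high_var.shape, X_best.shape)
```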
Before applying the wrapper feature selection, we must specify?
- What model type and which learning algorithms will be used
- How to evaluate model accuracy
Supervised feature selection while training the model
- Wrapper method: recursive feature elimination
- Embedded method: L1-based feature selection
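A minimal sketch of both, assuming scikit-learn; the estimator and hyperparameters are illustrative, not prescribed:

```python
# Selection that happens while a model is being trained.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE, SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)  # L1 and RFE are scale-sensitive

# Wrapper method: recursive feature elimination around a classifier.
rfe = RFE(LogisticRegression(max_iter=5000), n_features_to_select=5).fit(X, y)
print("RFE kept:", rfe.support_.sum(), "features")

# Embedded method: L1 regularization drives weak coefficients to zero;
# SelectFromModel keeps only the features with non-zero weights.
l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
print("L1 kept:", SelectFromModel(l1, prefit=True).transform(X).shape[1], "features")
```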
Filter supervised feature selection methodology
1. Create groups of the features per different criteria
2. Create a benchmark for each group
3. Test correlation inside the group against the benchmark
4. Keep only features less correlated to each other than to the group benchmark
The backward elimination technique begins by considering (BLANK) of the features and removes the least significant feature at each step
All
Feature
An individual measurable property or characteristic of a phenomenon being observed
Exhaustive feature selection is one of the best feature selection methods; it evaluates each feature subset in (BLANK) fashion
Brute-force
How can Information gain be used in feature selection?
Calculating the information gain of each variable with respect to the target variable
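A sketch of that ranking, assuming scikit-learn's mutual-information estimator as the measure of information gain:

```python
# Rank each variable by its information gain (mutual information)
# with respect to the target variable.
from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif

X, y = load_iris(return_X_y=True)
gains = mutual_info_classif(X, y, random_state=0)
for idx, gain in sorted(enumerate(gains), key=lambda p: p[1], reverse=True):
    print(f"feature {idx}: information gain ~ {gain:.3f}")
```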
The chi-square test is a technique to determine the relationship between (BLANK) variables
Categorical
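A sketch of a chi-square filter, assuming scikit-learn; note that chi2 requires non-negative feature values:

```python
# Chi-square filter: score each (non-negative) feature against the
# categorical target and keep the k highest-scoring ones.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)  # all feature values are non-negative
selector = SelectKBest(score_func=chi2, k=2).fit(X, y)
print("chi2 scores:", selector.scores_)
print("selected feature indices:", selector.get_support(indices=True))
```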
The performance of the wrapper method depends on the (BLANK)
Classifier
Features should be (BLANK) with the target but (BLANK) among themselves
Correlated, uncorrelated
If the correlation coefficient crosses a certain threshold value, we can?
Drop one of the features
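A sketch of that rule with pandas; the 0.9 threshold is an arbitrary assumption:

```python
# For each pair of features whose absolute correlation exceeds the
# threshold, drop one of the two.
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)

corr = df.corr().abs()
# Keep only the upper triangle so each pair is considered once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.9).any()]
df_reduced = df.drop(columns=to_drop)
print(f"dropped {len(to_drop)} of {df.shape[1]} features")
```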
Forward selection is an iterative process, which begins with a/n (BLANK) set of features
Empty
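A sketch covering this card and the backward-elimination card above, assuming scikit-learn's SequentialFeatureSelector; the estimator and target size are illustrative:

```python
# Forward selection grows from an empty feature set; backward
# elimination shrinks from the full set, dropping the least
# significant feature at each step.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=3)

forward = SequentialFeatureSelector(
    knn, n_features_to_select=2, direction="forward", cv=5).fit(X, y)
backward = SequentialFeatureSelector(
    knn, n_features_to_select=2, direction="backward", cv=5).fit(X, y)
print("forward kept:", forward.get_support(indices=True))
print("backward kept:", backward.get_support(indices=True))
```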
In wrapper supervised feature selection, a predictive model is used to (BLANK) and (BLANK)
Evaluate a combination of features, assign model performance scores
T or F: The wrapper method is less expensive than the filter method
False. Wrapper is computationally expensive
(BLANK) is the science and art of extracting information from raw data
Feature engineering
(BLANK) yields better results than applying ML directly to the raw data
Feature extraction
(BLANK) returns the rank of each variable on the (BLANK) criterion in descending order. Then we can select the variables with the largest scores
Fisher's score, Fisher's
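A NumPy sketch under the common definition of Fisher's criterion (between-class scatter over within-class scatter); this is an illustrative implementation, not a library call:

```python
# Fisher's score per feature: how far apart the class means are,
# relative to the variance within each class.
import numpy as np
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
overall_mean = X.mean(axis=0)

between = np.zeros(X.shape[1])
within = np.zeros(X.shape[1])
for c in np.unique(y):
    Xc = X[y == c]
    between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
    within += len(Xc) * Xc.var(axis=0)

fisher = between / within
# Rank variables on Fisher's criterion in descending order.
print("ranked feature indices:", np.argsort(fisher)[::-1])
```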
Recursive feature elimination is a recursive (BLANK) approach, where features are selected by recursively considering smaller and smaller subsets of features.
Greedy optimization
The wrapper method is not recommended for a (BLANK) number of features
High
(BLANK) determines the reduction in entropy while transforming the dataset
Information gain
The aim of feature selection is (BLANK)
Maximize relevance and minimize redundancy
(BLANK) is one example of wrapper and embedded feature selection
Random forest
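A sketch of the embedded use, assuming scikit-learn; the forest size and threshold are arbitrary assumptions:

```python
# A random forest is trained as usual, and its feature importances
# double as an embedded selection criterion.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = load_breast_cancer(return_X_y=True)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
selector = SelectFromModel(rf, threshold="median", prefit=True)
print("kept", selector.transform(X).shape[1], "of", X.shape[1], "features")
```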
What is a drawback of the low variance filter?
Relationships between features, or between a feature and the target variable, are not taken into account
Feature engineering is a (BLANK) problem
Representation
The wrapper methodology considers the (BLANK) sets as a search problem, where different combinations are prepared, evaluated, and compared to other combinations
Selection of feature
High correlation between two features means?
They have similar trends and are likely to carry similar information
In the missing value ratio method, a predefined (BLANK) may be set. For features with a low ratio of missing values, an (BLANK) technique may need to be applied
Threshold, imputation
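A sketch of both steps with pandas and scikit-learn; the toy data and the 0.4 threshold are arbitrary assumptions:

```python
# Drop features whose missing-value ratio exceeds the threshold,
# then impute the remaining low-ratio features instead of dropping them.
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0, np.nan],     # 40% missing
    "b": [np.nan, np.nan, np.nan, 4.0, 5.0],  # 60% missing
    "c": [1.0, 2.0, 3.0, np.nan, 5.0],        # 20% missing
})

ratio = df.isna().mean()
df = df.drop(columns=ratio[ratio > 0.4].index)
df[:] = SimpleImputer(strategy="mean").fit_transform(df)
print(df)
```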
T or F: Manual feature extraction can be impractical for huge datasets and may require a good understanding of the background or domain.
True
T or F: Filter supervised feature selection does not depend on the learning algorithm.
True
T or F: The best subset of features is selected based on the results of the classifier.
True
T or F: Too many variables lead to slow computation, which in turn requires more memory and hardware.
True
T or F: Wrapper methods perform better than filter methods.
True
Feature selection is also called (BLANK) or (BLANK) or (BLANK)
Variable selection, attribute selection, dimensionality reduction
Embedded feature selection performs better than wrapper and filter methods. Why?
Because it makes a collective decision
The classifier's performance will usually (BLANK) for a large number of features
degrade
The required number of samples to achieve the same accuracy grows (BLANK) with the number of variables
exponentially
In the embedded method, there are ensemble learning and (BLANK) learning methods for feature selection
hybrid
Manual feature extraction requires (BLANK) and (BLANK) the features that are relevant for a given problem and implementing a way to extract those features
identifying, describing
Removing redundant data variables helps to (BLANK)
improve accuracy
As a dimensionality reduction technique, feature selection aims to choose a small subset of relevant features from the original features by removing (BLANK)
irrelevant, redundant, or noisy features
Feature selection can lead to better ?
learning performance, higher learning accuracy, lower computational cost, and better model interpretability
Embedded feature selection is computationally (BLANK) than wrapper methods. However, this method has a drawback: the selection is specific to the learning model
less intensive
Missing value ratio removes the features which have high ratio of (BLANK)
missing values
What is the advantage of the filter method?
It needs little computational time and does not overfit the data
Too many variables might result in (BLANK), which means the model is not able to generalize the pattern
overfitting
Inclusion of a relevant variable has a (BLANK) effect on model accuracy
positive
The main priority in hybrid feature selection
Select the methods, then follow their processes
Automated feature extraction uses (BLANK) or (BLANK) to extract features automatically from signals or images without the need for human intervention
specialized algorithms or deep networks
In the filter method, features are selected using (BLANK)
statistical measures
Feature extraction
The process of transforming raw data into numerical features that can be processed while preserving the information in the original dataset
The process of creating hybrid feature selection methods depends on (BLANK)
what you choose to combine