Chapter 4

Explain Linear Discriminant Analysis (LDA)?

We are interested in maximizing the separability between two groups so that we can make the best decisions. LDA is like PCA, but it focuses on maximizing the separability among known categories. Example: we have a cancer drug that works great for some people but makes other people feel worse. How do we decide who to give the drug to? Maybe gene expression will help us decide. Given the expression levels of two genes, LDA uses the information from both genes to create a new axis and projects the data onto it in a way that maximizes the separability of the two categories.

What is a scree plot?

A scree plot is a graphical representation of the percentage of variation that each principal component (PC) accounts for.
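
As a rough illustration, the sketch below turns a list of eigenvalues into the percentages a scree plot would show and prints them as text bars. The eigenvalues and the helper name `variation_percentages` are made up for this example and do not come from the card.

```python
# Hypothetical eigenvalues for PC1..PC4 (illustrative numbers only).
eigenvalues = [6.0, 3.0, 0.75, 0.25]

def variation_percentages(values):
    """Return the percentage of total variation that each PC accounts for."""
    total = sum(values)
    return [100.0 * v / total for v in values]

percentages = variation_percentages(eigenvalues)

# A text-only scree plot: one bar per principal component.
for i, pct in enumerate(percentages, start=1):
    print(f"PC{i}: {'#' * round(pct / 2)} {pct:.1f}%")
```

With these numbers PC1 and PC2 together account for 90% of the variation, which is the kind of reading a scree plot is meant to make easy.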

What is an eigenvector?

An eigenvector is a vector whose direction remains unchanged when a linear transformation is applied to it. In the PCA example, the 1-unit-long vector consisting of 0.97 parts Gene 1 and 0.242 parts Gene 2 is called the "singular vector" or "eigenvector" for PC1. The proportions of each gene are called loading scores. The sum of squared distances for the best-fit line (measured from the projected points to the origin) is called the eigenvalue for PC1.
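
A minimal sketch of both ideas, assuming nothing beyond the standard library: it checks that the loading scores quoted on the card give a vector of roughly unit length, and it shows a matrix stretching an eigenvector without changing its direction. The matrix `A` and the helper `matvec` are hypothetical and chosen only for illustration.

```python
import math

# Loading scores quoted on the card for PC1 (Gene 1, Gene 2).
pc1 = (0.97, 0.242)
pc1_length = math.hypot(*pc1)  # close to 1, so PC1 is (nearly) a unit vector

def matvec(matrix, vector):
    """Multiply a 2x2 matrix (list of rows) by a 2-element vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

# Hypothetical symmetric matrix, standing in for a covariance matrix.
A = [[2.0, 1.0],
     [1.0, 2.0]]
v = [1 / math.sqrt(2), 1 / math.sqrt(2)]  # a unit-length eigenvector of A
Av = matvec(A, v)

# A only stretches v; the stretch factor is the eigenvalue.
eigenvalue = Av[0] / v[0]
print(pc1_length, eigenvalue)
```

Here `Av` points the same way as `v` and is about three times as long, so 3 is the eigenvalue that goes with this eigenvector.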

What do axes mean in Python (NumPy)?

Axes are defined for arrays with more than one dimension. A 2-dimensional array has two corresponding axes: the first runs vertically downwards across rows (axis = 0), and the second runs horizontally across columns (axis = 1).
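
The short sketch below, assuming NumPy is installed, shows what the two axes do in practice when summing a small 2-D array. The array values are arbitrary.

```python
import numpy as np

# A 2-D array with 2 rows and 3 columns.
arr = np.array([[1, 2, 3],
                [4, 5, 6]])

# axis=0 runs down the rows, so the sum collapses rows: one result per column.
column_sums = arr.sum(axis=0)   # [5, 7, 9]

# axis=1 runs across the columns, so the sum collapses columns: one result per row.
row_sums = arr.sum(axis=1)      # [6, 15]

print(arr.shape, column_sums, row_sums)
```

A handy way to remember it: the axis you pass is the one that disappears from the result's shape.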

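What is PC1?

PC1 (the first principal component) is a linear combination of the original variables, and it is the new axis that accounts for the most variation in the data.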

What are the differences between PCA and LDA?

Both PCA and LDA are linear transformation techniques. PCA is an unsupervised learning algorithm, while LDA is a supervised algorithm because it takes class labels into consideration. PCA finds the directions of maximum variance. Note: PCA does not select a subset of features and discard the others; instead, it derives new features from the existing ones that best describe the variation in the data. LDA attempts to find a feature subspace that maximizes class separability. It is a way of reducing dimensionality while preserving as much of the class-discrimination information as possible, and it helps you define boundaries around clusters of classes. PCA tends to perform better when the number of samples per class is small, whereas LDA works better with large datasets that have multiple classes, where class separability is an important factor in reducing dimensionality.

What are the similarities between PCA and LDA?

Both rank the new axes in order of importance.
- PC1 (the first new axis that PCA creates) accounts for the most variation in the data.
- LD1 (the first new axis that LDA creates) accounts for the most variation between categories.
Both let you dig in and see which genes are driving the new axes. In PCA, you do this by looking at the loading scores; in LDA, you look at which genes or variables correlate with the new axes.

Explain Factor Analysis?

Factor analysis assumes that common factors underlying the observations explain the interdependence among the variables. Example: the relationship between observed variables and an observable outcome, sales.
Observed variables: 1. Page views 2. Clicks 3. Minutes browsed 4. Add-to-carts
Underlying causes: 1. Selection 2. Marketing spend 3. Pricing
Effect/outcome: 1. Sales
There may be a few underlying causes, apart from the observed variables, that affect the outcome. Identifying these underlying causes helps in making accurate predictions.

What is feature extraction?

Feature extraction aims to reduce the number of features in a dataset by creating new features from the existing ones (and then discarding the original features). This new, reduced set of features should then be able to summarize most of the information contained in the original set of features.

What is the multicollinearity problem?

Interdependence among two or more explanatory variables may lead to an unreliable model. Performing factor analysis to extract the underlying cause of this interdependence can address the problem.
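
One simple way to spot interdependence between two explanatory variables is to check their correlation. The sketch below is only an illustration: the `pearson` helper and the page-view and click counts are made up, and a real check would usually look at all pairs of variables.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up explanatory variables that move almost in lockstep.
page_views = [100, 200, 300, 400, 500]
clicks = [11, 19, 32, 41, 49]

r = pearson(page_views, clicks)
print(f"correlation = {r:.3f}")  # a value near 1 flags interdependence
```

When two explanatory variables are this strongly correlated, a model has little basis for telling their individual effects apart.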

What is the use of StandardScaler?

It transforms the data so that each feature has a mean of zero and a standard deviation of 1. In short, it standardizes the data. Standardization also works for data that contains negative values. Note that it only shifts and rescales the data; it does not change the shape of the distribution or make the data normally distributed.
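
The arithmetic behind standardization can be sketched in a few lines of plain Python. This is not the library's implementation; the `standardize` helper is hypothetical, and dividing by the population standard deviation is an assumption of this sketch.

```python
import statistics

def standardize(values):
    """Rescale values to mean 0 and standard deviation 1 (population std)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # assumption: population standard deviation
    return [(v - mean) / std for v in values]

data = [2, 4, 4, 4, 5, 5, 7, 9]   # mean 5, population standard deviation 2
scaled = standardize(data)
print(scaled)
```

Each value is replaced by its distance from the mean, measured in standard deviations, so the ordering and the shape of the data stay the same.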

Explain Principal Component Analysis (PCA)?

PCA is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. It is a technique used to emphasize variation and bring out strong patterns in a dataset, and it is often used to make data easy to explore and visualize.
- Extracts hidden factors from the dataset.
- Describes your data using fewer components while still explaining the variance in the data.
- Reduces the computational complexity.
- Helps determine whether a new data point belongs with the group of data points in your training data.
PCA reduces dimensions by focusing on the directions with the most variation. This is useful for plotting data with many dimensions (many variables) on a simple X/Y plot.
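
A minimal sketch of PCA using NumPy's singular value decomposition, assuming NumPy is installed. The data values are made up, and this is one common way to compute principal components, not necessarily the method the card has in mind.

```python
import numpy as np

# Made-up data: 6 samples (rows) measured on 2 correlated variables (columns).
X = np.array([[10.0, 6.0], [11.0, 4.0], [8.0, 5.0],
              [3.0, 3.0], [2.0, 2.8], [1.0, 1.0]])

# Center each variable (column) on zero.
X_centered = X - X.mean(axis=0)

# SVD of the centered data; the rows of Vt are the principal directions.
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Variance along each PC and its share of the total variation.
explained_variance = S**2 / (X.shape[0] - 1)
explained_ratio = explained_variance / explained_variance.sum()

# Project the samples onto the PCs to get their new coordinates.
scores = X_centered @ Vt.T

print("loading scores for PC1:", Vt[0])
print("share of variation:", explained_ratio)
```

The columns of `scores` are linearly uncorrelated, and keeping only the first column would reduce the data to one dimension while retaining the largest share of the variation.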

PCA Process

PCA takes 4 or more variables and reduces them to a 2-D PCA plot (x, y). This plot shows which similar mice (or students) cluster together. PCA also tells us which gene or variable is most valuable for clustering the data, and how accurate the 2-D graph is. Step 1:

What are the techniques for extracting feature-based information?

a. Regression: describes the relationship among variables and quantifies it using a set of equations.
b. Factor analysis: common factors of the observations explain the interdependence among the variables.

How does LDA create a new axis?

The new axis is created according to two criteria, considered simultaneously: 1. Maximize the distance between the means of the two categories. 2. Minimize the variation (scatter) within each category. Together, these mean choosing the axis that maximizes the ratio
$$\frac{(\mu_1 - \mu_2)^2}{s_1^2 + s_2^2}$$
where $\mu_1$ and $\mu_2$ are the means of Category 1 and Category 2 along the new axis, and $s_1$ and $s_2$ are their scatters.
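
The ratio can be tried out on two candidate axes with a short sketch. Everything here is hypothetical: the sample points, the helper names, and the reading of squared scatter as the sum of squared deviations from the category mean are assumptions made for illustration.

```python
def mean(values):
    return sum(values) / len(values)

def scatter_squared(values):
    """Sum of squared deviations from the mean (assumed meaning of scatter**2)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def separation(cat1, cat2):
    """(mean1 - mean2)**2 / (scatter1**2 + scatter2**2) for projected data."""
    return (mean(cat1) - mean(cat2)) ** 2 / (
        scatter_squared(cat1) + scatter_squared(cat2))

def project(points, axis):
    """Project 2-D points onto a unit-length axis (dot product)."""
    return [x * axis[0] + y * axis[1] for x, y in points]

# Made-up 2-D samples for two categories.
category1 = [(1.0, 2.0), (2.0, 1.0), (3.0, 3.0)]
category2 = [(2.0, 6.0), (3.0, 5.0), (4.0, 7.0)]

along_x = separation(project(category1, (1, 0)), project(category2, (1, 0)))
along_y = separation(project(category1, (0, 1)), project(category2, (0, 1)))
print(along_x, along_y)  # 0.25 4.0
```

The second axis scores higher because the category means sit farther apart along it while the scatter stays the same, so it would be the better of these two candidates. LDA searches over all directions rather than just two.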

