Data Analytics - Test #2

Steps for logistic regression

1. Read the file.
2. Convert the result into a new column with a numerical value (use a lambda function):
   df['New Column'] = df['Column 1'].apply(lambda x: 1 if x == 'Accepted' else 0)
3. Standardize the data.
   0-1 (min-max) standardization:
   from sklearn.preprocessing import MinMaxScaler
   scaler = MinMaxScaler()
   Z-score standardization:
   from sklearn.preprocessing import StandardScaler
   scaler = StandardScaler()
   Then, with either scaler:
   df[['Column1s', 'Column2s']] = scaler.fit_transform(df[['Column1', 'Column2']])
4. Split the dataset:
   X = df[['IV1', 'IV2']]
   y = df['DV']
   from sklearn.model_selection import train_test_split
   X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 1, stratify = df['Word Result Column'])
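The steps on this card can be sketched end to end as follows. This is a minimal illustration, not a definitive workflow: the small DataFrame is made-up data standing in for the file from step 1, the column names are hypothetical, and the final model-fitting lines are an assumed next step that the card itself does not list.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Step 1 (stand-in): hypothetical data instead of a real file.
df = pd.DataFrame({
    "Result": ["Accepted", "Rejected"] * 10,
    "IV1": [float(i) for i in range(20)],
    "IV2": [float(i % 5) for i in range(20)],
})

# Step 2: encode the text outcome as 1 (Accepted) or 0 (anything else).
df["DV"] = df["Result"].apply(lambda x: 1 if x == "Accepted" else 0)

# Step 3: z-score standardization of the independent variables.
scaler = StandardScaler()
df[["IV1s", "IV2s"]] = scaler.fit_transform(df[["IV1", "IV2"]])

# Step 4: 70/30 split, stratified so both sets keep the class balance.
X = df[["IV1s", "IV2s"]]
y = df["DV"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

# Assumed next step (not on the card): fit the model and predict.
model = LogisticRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
```

Stratifying on y here is equivalent to stratifying on the original text column, since one is a direct encoding of the other. The sketch scales before splitting because the card lists the steps in that order; fitting the scaler on the training set only is a common alternative that keeps test data out of the scaling statistics.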

What are the 6 stages of CRISP-DM?

1. Business Understanding 2. Data Understanding 3. Data Preparation 4. Modeling 5. Evaluation 6. Deployment

Data Cleansing Functions

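1. df.rename(columns = {'Column A' : 'New Title'}, inplace = True) to rename a column
2. df.drop('column A', axis = 1, inplace = True) to drop a column
3. df.isnull() to find null values
4. df.isnull().sum() to count null values by column
5. df.dropna() to drop rows that contain null values
6. df.fillna(value) to replace null values with a given value
7. df.fillna(method='pad') to forward-fill null values (written as df.ffill() in recent pandas versions, where the method argument is deprecated)
8. df.drop_duplicates() to drop duplicate rows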
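A short sketch of these functions applied to a tiny, made-up DataFrame may help show what each one returns. The column names and values are hypothetical, and the example assigns results to new variables instead of using inplace = True so each step can be inspected separately.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value and one duplicated row.
df = pd.DataFrame({
    "Column A": [1.0, np.nan, 3.0, 3.0],
    "Column B": ["x", "y", "z", "z"],
    "Unused": [0, 0, 0, 0],
})

# Rename a column and drop one that is not needed.
df = df.rename(columns={"Column A": "New Title"})
df = df.drop("Unused", axis=1)

# Count null values per column.
null_counts = df.isnull().sum()

# Three ways to handle the missing value.
dropped = df.dropna()          # remove rows containing nulls
filled_zero = df.fillna(0)     # replace nulls with a fixed value
filled_forward = df.ffill()    # carry the previous row's value forward

# Remove rows that are exact duplicates of an earlier row.
deduplicated = df.drop_duplicates()
```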

Deployment

Deploy the model, make a report, next steps, etc.

Business Understanding

What purpose are we data mining for? What are the objectives?

Descriptive Analytics

analytics that describe what has happened or is happening, used for understanding and assessing a situation.

Predictive Analytics

analytics that predict what MAY happen, using historical data and statistical techniques to forecast potential scenarios

Business Intelligence (BI)

leverages software and services to transform data into actionable intelligence that informs business decisions

How to Read a CSV file

pd.read_csv('link/database.csv')
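As a self-contained illustration, the sketch below reads CSV text from an in-memory buffer instead of a file path, so it runs without any file on disk. The column names and values are made up; with a real file, the path string would be passed in place of the buffer.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = "Name,Score\nAna,90\nBen,85\n"

# read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Quick checks after loading: size and column types.
n_rows, n_cols = df.shape
column_types = df.dtypes
```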

How to Read an Excel File

pd.read_excel('link/database.xlsx')

Analytics

the result of systematic and computational data analysis

Diagnostic Analytics

uses data to determine the root causes of a particular outcome, such as a correlation between variables

Prescriptive Analytics

using data to determine an optimal course of action based on predictive models and business optics

What can you use to evaluate classification model performance?

Confusion matrix; performance metrics (Accuracy, Precision, Specificity, Sensitivity, TPR, FPR); Receiver Operating Characteristic (ROC) curve; Area Under the Curve (AUC)
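The metrics named on this card can all be derived from the four cells of a confusion matrix. The sketch below uses made-up labels and predicted scores, and an assumed 0.5 threshold, to show how each one is computed with scikit-learn.

```python
from sklearn.metrics import confusion_matrix, roc_auc_score

# Hypothetical true labels and predicted probabilities for 10 cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1]

# Assumed decision threshold of 0.5 to turn scores into class labels.
y_pred = [1 if s >= 0.5 else 0 for s in y_score]

# For binary labels, ravel() returns the cells in this order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
sensitivity = tp / (tp + fn)   # also called recall or TPR
specificity = tn / (tn + fp)
fpr = fp / (fp + tn)           # equals 1 - specificity

# AUC uses the scores, not the thresholded labels.
auc = roc_auc_score(y_true, y_score)
```

Sensitivity and TPR are the same quantity, and FPR is one minus specificity, so the ROC curve (TPR against FPR across thresholds) summarizes the same trade-off the confusion matrix shows at a single threshold.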

CRISP-DM (What does it stand for?)

Cross-Industry Standard Process for Data Mining

Data Understanding

Collecting Relevant Datasets

Evaluation

Evaluate the models for effectiveness. Does the model help solve the problems or meet the objectives set in the first phase?

Data Preparation

Preparing the initial data into the final data sets used for modeling. Very labor intensive.

Modeling

Select and Apply the appropriate modeling techniques

