Data Analytics - Test #2
Steps for logistic regression
1. Read the file.
2. Convert the result into a new column with a numerical value (use a lambda function):
   df['New Column'] = df['Column 1'].apply(lambda x: 1 if x == 'Accepted' else 0)
3. Standardize the data.
   0-1 (min-max) scaling:
   from sklearn.preprocessing import MinMaxScaler
   scaler = MinMaxScaler()
   Z-score:
   from sklearn.preprocessing import StandardScaler
   scaler = StandardScaler()
   Then, with either scaler:
   df[['Column1s', 'Column2s']] = scaler.fit_transform(df[['Column1', 'Column2']])
4. Split the dataset:
   X = df[['IV1','IV2']]
   y = df['DV']
   from sklearn.model_selection import train_test_split
   X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 1, stratify = df['Word Result Column'])
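The steps above can be sketched end to end on a small made-up table. This is only an illustration under stated assumptions: the column names (`GPA`, `Score`, `Result`) and the data are hypothetical stand-ins for the file read in step 1, and the final fit with `LogisticRegression` is the natural next step rather than something spelled out in the notes.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Step 1 (stand-in): hypothetical data instead of reading a real file.
df = pd.DataFrame({
    "GPA": [3.9, 2.1, 3.5, 2.8, 3.7, 2.4, 3.2, 2.0, 3.8, 2.6],
    "Score": [88, 52, 79, 60, 91, 55, 74, 48, 85, 63],
    "Result": ["Accepted", "Rejected", "Accepted", "Rejected", "Accepted",
               "Rejected", "Accepted", "Rejected", "Accepted", "Rejected"],
})

# Step 2: encode the text result as 1/0 with a lambda function.
df["Accepted"] = df["Result"].apply(lambda x: 1 if x == "Accepted" else 0)

# Step 3: z-score standardization of the independent variables.
scaler = StandardScaler()
df[["GPAs", "Scores"]] = scaler.fit_transform(df[["GPA", "Score"]])

# Step 4: split, stratifying on the original text result column.
X = df[["GPAs", "Scores"]]
y = df["Accepted"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=df["Result"]
)

# Assumed next step: fit the model and predict on the held-out rows.
model = LogisticRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
```

Stratifying keeps the share of accepted and rejected rows roughly the same in the training and test sets, which matters most when one class is rare.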
What are the 6 stages of CRISP-DM?
1. Business Understanding 2. Data Understanding 3. Data Preparation 4. Modeling 5. Evaluation 6. Deployment
Data Cleansing Functions
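1. df.rename(columns = {'Column A' : 'New Title'}, inplace = True) to rename a column
2. df.drop('column A', axis = 1, inplace = True) to drop a column
3. df.isnull() to find null values
4. df.isnull().sum() to count null values by column
5. df.dropna() to drop rows that contain null values
6. df.fillna() to replace null values with a given value
7. df.fillna(method='pad') to forward-fill null values from the previous row (newer pandas versions prefer df.ffill())
8. df.drop_duplicates() to drop duplicate rows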
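A short sketch of these functions on a tiny made-up DataFrame. The column names and values are hypothetical, chosen so that each call has a visible effect.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing value and one duplicated row.
df = pd.DataFrame({
    "Column A": [1.0, np.nan, 3.0, 3.0],
    "Column B": ["x", "y", "z", "z"],
    "Unused": [0, 0, 0, 0],
})

# Rename and drop columns in place.
df.rename(columns={"Column A": "Score"}, inplace=True)
df.drop("Unused", axis=1, inplace=True)

# Count null values per column.
null_counts = df.isnull().sum()

# Three ways to handle the remaining problems; each returns a new DataFrame.
filled = df.ffill()             # forward-fill nulls from the previous row
dropped = df.dropna()           # drop rows containing any null
deduped = df.drop_duplicates()  # drop repeated rows

print(null_counts)
print(filled)
```

Note that `dropna`, `ffill`, and `drop_duplicates` leave `df` unchanged unless the result is assigned back.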
Deployment
Deploy the model, make a report, next steps, etc.
Business Understanding
What purpose are we data mining for? What are the objectives?
Descriptive Analytics
Analytics that describe what has happened or is happening; used to understand and assess a situation.
Predictive Analytics
Analytics that predict what MAY happen, using historical data and statistical techniques to forecast potential scenarios.
Business Intelligence (BI)
leverages software and services to transform data into actionable intelligence that informs business decisions
How to Read a CSV file
pd.read_csv('link/database.csv')
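As a minimal sketch, `pd.read_csv` also accepts a file-like object, so it can be tried without a file on disk. The CSV text below is made up for illustration. In practice, the path to the file is passed as shown above.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file.
csv_text = "Name,Score\nAda,91\nGrace,88\n"

# read_csv accepts a path or a file-like object; the first row becomes the header.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())
print(df.shape)
```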
How to Read an Excel File
pd.read_excel('link/database.xlsx')
Analytics
the result of systematic and computational data analysis
Diagnostic Analytics
Uses data to determine the root causes of a particular outcome, for example by examining correlations between variables.
Prescriptive Analytics
Uses data to determine an optimal course of action based on predictive models and business optics.
What can you use to evaluate classification model performance?
1. Confusion Matrix 2. Performance Metrics (Accuracy, Precision, Specificity, Sensitivity, TPR, FPR) 3. Receiver Operating Characteristic (ROC) curve 4. Area Under the Curve (AUC)
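The metrics listed above all follow from the four confusion matrix counts. The sketch below computes them in plain Python for made-up labels and predictions. The function name `confusion_counts` is a hypothetical helper, not a library call.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded as 1 and 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


# Hypothetical actual labels and model predictions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)

accuracy = (tp + tn) / (tp + fp + tn + fn)  # share of all predictions that are correct
precision = tp / (tp + fp)                  # share of predicted positives that are correct
sensitivity = tp / (tp + fn)                # true positive rate (TPR), also called recall
specificity = tn / (tn + fp)                # true negative rate
fpr = fp / (fp + tn)                        # false positive rate, equal to 1 - specificity

print(tp, fp, tn, fn)
print(accuracy, precision, sensitivity, specificity, fpr)
```

The ROC curve plots TPR against FPR as the classification threshold varies, and AUC is the area under that curve, so both need predicted probabilities rather than the hard 0/1 predictions used here.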
CRISP-DM (What does it Stand For?)
Cross Industry Standard Process for Data Mining
Data Understanding
Collecting Relevant Datasets
Evaluation
Evaluate the models for effectiveness. Does this model help solve the problems or meet the objectives set in the first phase?
Data Preparation
Preparing the initial data into the final datasets used for modeling. Very labor intensive.
Modeling
Select and Apply the appropriate modeling techniques
