Python New Package Introduction (NPI)
AdaBoost Regressor
Boosting is an ensemble learning strategy that combines a group of weak learners into a single strong learner in order to reduce training error. The models are trained sequentially: AdaBoost fits a weak learner, increases the weight of the training samples it predicted poorly, and fits the next learner on the reweighted data, so each model attempts to compensate for the shortcomings of its predecessor. The final prediction combines the outputs of all weak learners, weighted by how well each one performed, into one strong prediction rule.
Scikit-learn:
    from sklearn.ensemble import AdaBoostRegressor
    adaboost_model = AdaBoostRegressor()  # numerous parameter options
    adaboost_model.fit(x_train, y_train)
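A minimal end-to-end sketch of the AdaBoost workflow described above. The synthetic dataset from make_regression, the train/test split, and the parameter values (n_estimators, learning_rate) are assumptions chosen for illustration, not recommendations from these notes.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption; substitute your own features and target).
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# n_estimators caps the number of boosting rounds; learning_rate scales
# how much each round contributes to the ensemble.
adaboost_model = AdaBoostRegressor(n_estimators=100, learning_rate=0.5, random_state=0)
adaboost_model.fit(x_train, y_train)

y_pred = adaboost_model.predict(x_test)
print("weak learners fitted:", len(adaboost_model.estimators_))
print("R^2 on held-out data:", adaboost_model.score(x_test, y_test))
```

Boosting may stop before n_estimators rounds, so the number of fitted learners can be smaller than the cap.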
Random Forest Regressor
Random forest is a supervised machine learning algorithm that is commonly used for classification and regression problems. It builds many decision trees, each on a different bootstrap sample of the training data, and combines them: a majority vote for classification and an average for regression.
Scikit-learn:
    from sklearn.ensemble import RandomForestRegressor
    rf_regressor = RandomForestRegressor()  # numerous parameter options
    rf_regressor.fit(x_train, y_train)
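The "average for regression" rule can be checked directly. The sketch below assumes synthetic data from make_regression and an arbitrary n_estimators=50; it compares the forest's prediction with the mean of the predictions of its individual trees.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption; substitute your own features and target).
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

rf_regressor = RandomForestRegressor(n_estimators=50, random_state=0)
rf_regressor.fit(x_train, y_train)

# Prediction of the whole forest.
forest_pred = rf_regressor.predict(x_test)

# Mean of the predictions of the individual fitted trees.
tree_mean = np.mean(
    [tree.predict(x_test) for tree in rf_regressor.estimators_], axis=0
)

print("forest equals mean of trees:", np.allclose(forest_pred, tree_mean))
```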
Seaborn
Seaborn is a Python library that specializes in creating visually appealing and informative statistical graphics. It is built on top of matplotlib and integrates with pandas data structures, allowing for efficient visualization and analysis of data stored in DataFrames.
    import seaborn as sns
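A small sketch of the pandas integration mentioned above: seaborn takes a DataFrame and column names and returns a matplotlib Axes. The DataFrame contents and column names are made up for illustration, and the Agg backend is selected only so the script runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs

import pandas as pd
import seaborn as sns

# Hypothetical data, invented for this example.
df = pd.DataFrame(
    {
        "hours_studied": [1, 2, 3, 4, 5, 6],
        "exam_score": [52, 58, 65, 70, 78, 85],
        "group": ["a", "b", "a", "b", "a", "b"],
    }
)

# Columns are referenced by name; hue colors the points by category.
ax = sns.scatterplot(data=df, x="hours_studied", y="exam_score", hue="group")
ax.set_title("Exam score vs. hours studied")

print(ax.get_xlabel(), "/", ax.get_ylabel())
```

In a notebook the plot renders inline; in a script, ax.figure.savefig("scatter.png") writes it to a file.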
Random Forest Classifier
Random forest is a supervised machine learning algorithm that is commonly used for classification and regression problems. It builds many decision trees, each on a different bootstrap sample of the training data, and combines them: a majority vote for classification and an average for regression.
Scikit-learn:
    from sklearn.ensemble import RandomForestClassifier
    rf_classifier = RandomForestClassifier()  # numerous parameter options
    rf_classifier.fit(x_train, y_train)
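A minimal classification sketch, assuming synthetic data from make_classification and illustrative values for n_estimators and max_depth. Besides hard class predictions, the fitted forest exposes class probabilities and per-feature importances.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (an assumption for illustration).
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=3, random_state=0
)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

rf_classifier = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=0)
rf_classifier.fit(x_train, y_train)

y_pred = rf_classifier.predict(x_test)        # predicted class per sample
proba = rf_classifier.predict_proba(x_test)   # one column per class

print("accuracy:", rf_classifier.score(x_test, y_test))
print("feature importances:", np.round(rf_classifier.feature_importances_, 3))
```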
Bagging Regressor
A Bagging regressor is an ensemble meta-estimator that fits base regressors each on random subsets of the original dataset and then aggregates their individual predictions (by averaging, in the regression case) to form a final prediction. Such a meta-estimator can typically be used as a way to reduce the variance of a black-box estimator (e.g., a decision tree), by introducing randomization into its construction procedure and then making an ensemble out of it.
Scikit-learn:
    from sklearn.ensemble import BaggingRegressor
    model = BaggingRegressor()  # numerous parameter options
    model.fit(x_train, y_train)
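A sketch of bagging a decision tree, assuming synthetic data and illustrative parameter values. The base estimator is passed positionally because the keyword name changed between scikit-learn releases (base_estimator in older versions, estimator in newer ones). A single tree is fitted alongside for comparison; no particular outcome is claimed.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (an assumption; substitute your own features and target).
X, y = make_regression(n_samples=300, n_features=5, noise=20.0, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Each of the 20 trees is trained on a random 80% sample of the training rows.
model = BaggingRegressor(
    DecisionTreeRegressor(), n_estimators=20, max_samples=0.8, random_state=0
)
model.fit(x_train, y_train)
y_pred = model.predict(x_test)

# A single unbagged tree, for comparison on the same split.
single_tree = DecisionTreeRegressor(random_state=0).fit(x_train, y_train)

print("bagged R^2:     ", model.score(x_test, y_test))
print("single-tree R^2:", single_tree.score(x_test, y_test))
```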
Gradient Boosting Regressor
Boosting is an ensemble learning strategy that combines a group of weak learners into a single strong learner in order to reduce training error. The models are trained sequentially, so each one attempts to compensate for the shortcomings of its predecessor. In gradient boosting, each new tree is fit to the errors of the ensemble built so far (the negative gradient of the loss), and its scaled prediction is added to the running total to form one strong prediction rule.
Scikit-learn:
    from sklearn.ensemble import GradientBoostingRegressor
    gradientboost_model = GradientBoostingRegressor()  # numerous parameter options
    gradientboost_model.fit(x_train, y_train)
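The sequential correction can be observed with staged_predict, which yields the ensemble's prediction after each boosting stage. This sketch assumes synthetic data and illustrative parameter values; it records the training error stage by stage.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption; substitute your own features and target).
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

gradientboost_model = GradientBoostingRegressor(
    n_estimators=100, learning_rate=0.1, max_depth=3, random_state=0
)
gradientboost_model.fit(x_train, y_train)

# Training MSE after each stage: one entry per fitted tree.
train_mse = [
    mean_squared_error(y_train, stage_pred)
    for stage_pred in gradientboost_model.staged_predict(x_train)
]

print("training MSE after stage 1:  ", train_mse[0])
print("training MSE after stage 100:", train_mse[-1])
print("R^2 on held-out data:", gradientboost_model.score(x_test, y_test))
```

Training error keeps falling as stages are added, so held-out data is the better guide for choosing n_estimators.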
Decision Tree Classification
A decision tree is a non-parametric supervised learning approach that can be used for classification or regression problems. It has a hierarchical tree structure that consists of a root node, branches, internal nodes, and leaf nodes.
Scikit-learn:
    from sklearn.tree import DecisionTreeClassifier
    dtree_classifier = DecisionTreeClassifier()  # numerous parameter options
    dtree_classifier.fit(x_train, y_train)
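The root, branches, and leaves described above can be printed as text with export_text. This sketch assumes the bundled iris dataset and a shallow max_depth=2 so the printed tree stays small.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# The iris dataset ships with scikit-learn (chosen here as an assumption).
iris = load_iris()
x_train, y_train = iris.data, iris.target

# max_depth=2 keeps the tree to at most two levels of splits.
dtree_classifier = DecisionTreeClassifier(max_depth=2, random_state=0)
dtree_classifier.fit(x_train, y_train)

# Text rendering: each indented "|---" line is a branch or a leaf.
rules = export_text(dtree_classifier, feature_names=list(iris.feature_names))
print(rules)
print("depth:", dtree_classifier.get_depth())
print("leaves:", dtree_classifier.get_n_leaves())
```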
Decision Tree Regression
A decision tree is a non-parametric supervised learning approach that can be used for classification or regression problems. It has a hierarchical tree structure that consists of a root node, branches, internal nodes, and leaf nodes.
Scikit-learn:
    from sklearn.tree import DecisionTreeRegressor
    dtree_regressor = DecisionTreeRegressor()  # numerous parameter options
    dtree_regressor.fit(x_train, y_train)
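For regression, each leaf predicts a single constant, so the fitted function is piecewise constant. The sketch below assumes a noisy sine curve as toy data and max_depth=3; it counts the distinct values the tree can output.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy one-dimensional data: a noisy sine curve (an assumption for illustration).
rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0, 6, size=80)).reshape(-1, 1)
y_train = np.sin(x_train).ravel() + rng.normal(scale=0.1, size=80)

# max_depth=3 allows at most 2**3 = 8 leaves.
dtree_regressor = DecisionTreeRegressor(max_depth=3, random_state=0)
dtree_regressor.fit(x_train, y_train)

# Predict on a dense grid; every point falls into one leaf.
x_grid = np.linspace(0, 6, 200).reshape(-1, 1)
y_grid = dtree_regressor.predict(x_grid)

print("leaves:", dtree_regressor.get_n_leaves())
print("distinct predicted values:", len(np.unique(y_grid)))
```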
Label Encoding
Introduced here in reference to decision trees, but useful elsewhere. Scikit-learn (sklearn) provides an efficient tool for encoding the levels of a categorical variable as numeric values. LabelEncoder encodes labels with values between 0 and n_classes-1, where n_classes is the number of distinct labels. Every occurrence of the same label receives the same value. Note that scikit-learn documents LabelEncoder as intended for target labels (y); OrdinalEncoder is its counterpart for input features.
    from sklearn.preprocessing import LabelEncoder
    ln = LabelEncoder()
    ln.fit_transform(data['column'])  # column should be any categorical variable
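A worked example of the encoding described above. The DataFrame and its "color" column are hypothetical; the integer codes follow the sorted order of the distinct labels.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical categorical column, invented for this example.
data = pd.DataFrame({"color": ["red", "green", "blue", "green", "red"]})

ln = LabelEncoder()
data["color_encoded"] = ln.fit_transform(data["color"])

# classes_ holds the distinct labels in sorted order; position = integer code.
print(list(ln.classes_))               # ['blue', 'green', 'red']
print(data["color_encoded"].tolist())  # [2, 1, 0, 1, 2]

# inverse_transform maps codes back to the original labels.
print(list(ln.inverse_transform([0, 1, 2])))  # ['blue', 'green', 'red']
```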
Regression Evaluation Metrics
Accuracy on the training data is worth checking, but what matters is how well the model performs on unseen data; otherwise the model is of little use. Below is an example of how to evaluate a regression model on test data with the metrics scikit-learn provides.
    from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error
    # making predictions on the test data using the trained model
    y_pred = model.predict(x_test)
    print("mse = ", mean_squared_error(y_test, y_pred))
    print("R^2 score = ", r2_score(y_test, y_pred))
    print("mae = ", mean_absolute_error(y_test, y_pred))
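To make the three metrics concrete, the sketch below computes them by hand with NumPy and compares the results against the scikit-learn functions. The four target and prediction values are made up for illustration.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical true values and predictions, invented for this example.
y_test = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 3.0, 8.0])

residuals = y_test - y_pred

# Mean squared error: average of the squared residuals.
mse = np.mean(residuals ** 2)

# Mean absolute error: average of the absolute residuals.
mae = np.mean(np.abs(residuals))

# R^2: 1 minus (residual sum of squares / total sum of squares around the mean).
r2 = 1.0 - np.sum(residuals ** 2) / np.sum((y_test - y_test.mean()) ** 2)

print("mse =", mse, "| sklearn:", mean_squared_error(y_test, y_pred))
print("mae =", mae, "| sklearn:", mean_absolute_error(y_test, y_pred))
print("R^2 =", r2, "| sklearn:", r2_score(y_test, y_pred))
```

Lower MSE and MAE are better; R^2 is at most 1, and a value of 1 means the predictions match the targets exactly.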
NumPy
NumPy is used for working with numerical data; its array type makes it easy to apply mathematical functions to many values at once.
    import numpy as np
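A short sketch of what applying mathematical functions to an array looks like in practice. The values are arbitrary; each operation runs over the whole array without an explicit loop.

```python
import numpy as np

values = np.array([1.0, 4.0, 9.0, 16.0])

# Element-wise square root of every entry.
roots = np.sqrt(values)

# Arithmetic is applied to every element at once.
scaled = values * 2 + 1

print(roots)          # [1. 2. 3. 4.]
print(scaled)         # [ 3.  9. 19. 33.]
print(values.mean())  # 7.5
```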
Pandas
Pandas is used for data analysis tasks in Python.
    import pandas as pd
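A small sketch of two common analysis steps, filtering rows and aggregating by group. The DataFrame contents are hypothetical.

```python
import pandas as pd

# Hypothetical sales records, invented for this example.
df = pd.DataFrame(
    {
        "city": ["Austin", "Boston", "Austin", "Boston"],
        "sales": [100, 150, 120, 130],
    }
)

# Boolean filtering: keep rows where sales exceed 110.
high_sales = df[df["sales"] > 110]

# Group by city and take the mean of each group.
summary = df.groupby("city")["sales"].mean()

print(high_sales)
print(summary)
```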
XGBoost Classification
XGBoost (Extreme Gradient Boosting) is an open-source, scalable, distributed gradient-boosted decision tree (GBDT) library that implements optimized machine learning algorithms under the gradient boosting framework. It provides parallel tree boosting and is widely used for regression, classification, and ranking problems.
    !pip install xgboost  # notebook syntax; in a shell, drop the leading "!"
    from xgboost import XGBClassifier
    xgb_classifier = XGBClassifier()  # numerous parameter options
    xgb_classifier.fit(x_train, y_train)
XGBoost Regression
XGBoost (Extreme Gradient Boosting) is an open-source, scalable, distributed gradient-boosted decision tree (GBDT) library that implements optimized machine learning algorithms under the gradient boosting framework. It provides parallel tree boosting and is widely used for regression, classification, and ranking problems.
    !pip install xgboost  # notebook syntax; in a shell, drop the leading "!"
    from xgboost import XGBRegressor
    xgb_regressor = XGBRegressor()  # numerous parameter options
    xgb_regressor.fit(x_train, y_train)