Module 13: Neural Networks

After the model makes its predictions, we can create a DataFrame that compares the data's predicted versus actual values. What does the code look like for this?

# Create a DataFrame to compare the predictions with the actual values
results = pd.DataFrame({"predictions": predictions.ravel(), "actual": new_y})
# Display sample data
results.head(10)

What is the code for creating an instance of OneHotEncoder?

# Create a OneHotEncoder instance
enc = OneHotEncoder(sparse=False)
(We set the parameter sparse=False so that the encoder returns a NumPy array rather than a sparse matrix. In scikit-learn 1.2 and later, this parameter is named sparse_output. We will use the resulting array to create a DataFrame. Later, we will use this DataFrame to fit the neural network model.)

What code can we use to check our neural network model structure?

# Display model summary
neuron.summary()

We can test the performance of the imported model on our TEST dataset by running the following code:

# Evaluate the model using the test data
model_loss, model_accuracy = nn_imported.evaluate(X_test_scaled, y_test, verbose=2)
# Display evaluation results
print(f"Loss: {model_loss}, Accuracy: {model_accuracy}")

Show example code of fitting the model for neural network

# Fitting the model
model = neuron.fit(X_train_scaled, y_train, epochs=100)

List examples of how neural networks help in the financial industry

-fraud detection -risk management -money laundering prevention -algorithmic trading

activation parameter

defines the activation function applied to the layer's neurons; in the first Dense layer, it processes the weighted values of the input features as they pass through the first hidden layer

metrics parameter

specifies additional metrics that assess the quality of a neural network model

compile() function

In Keras, configures a model for training by setting its loss function, optimizer, and evaluation metrics; it must be called before fit(). (This is not Python's built-in compile() function, which turns source code in string or AST form into a code object that exec() or eval() can run.)

When using MSE to assess a model, an MSE value closer to ______ indicates a better model.

zero

What are the 2 main evaluation metrics?

1. model predictive accuracy 2. model mean squared error (MSE)

A good rule of thumb is to start training a model with ______ epochs.

20. Then, we can plot the model's loss value and evaluation metric over the course of those 20 epochs. With these plots, we can verify whether the model's training loss decreases over the epochs, and whether its accuracy (for classification) increases, or its mean squared error (for regression) moves toward zero.

deep neural network

a neural network with multiple neuron layers between the model's input and output layers; it can discover nonlinear relationships among data, often more effectively than traditional machine learning algorithms

The total number of neurons across all hidden layers should be ______ the size of the input layer (size of input layer = number of features), plus the size of the output layer (size of output layer = number of neurons on the output layer). Alternatively, the total number of neurons across all hidden layers should be less than _______ the size of the number of features in the input layer.

⅔; twice
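
As a rough illustration, the two rules of thumb above can be turned into a small helper. The function name `hidden_neuron_budget` and the example of 11 features with 1 output neuron are assumptions made for this sketch; they do not come from the course material.

```python
def hidden_neuron_budget(n_features: int, n_outputs: int) -> dict:
    """Return both rule-of-thumb figures for the total number of hidden neurons."""
    # Rule 1: two-thirds of the input layer size, plus the output layer size.
    two_thirds_rule = (2 * n_features) / 3 + n_outputs
    # Rule 2: stay below twice the number of input features.
    upper_bound = 2 * n_features
    return {"two_thirds_rule": two_thirds_rule, "upper_bound": upper_bound}


# Hypothetical example: 11 input features and a single output neuron.
budget = hidden_neuron_budget(n_features=11, n_outputs=1)
print(budget)
```

These figures are only starting points; the number of neurons that works best is still found by experimenting.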

Name the 3 layers of a basic neural network

-An input layer of input values transformed by weight coefficients -A hidden layer that can contain a single neuron or multiple neurons -An output layer that reports the outcome of the value

Most neural networks will use which 5 activation functions?

-linear function: returns the sum of the weighted inputs without transformation
-sigmoid function: transforms the neuron's output to a range between 0 and 1, which is especially useful for predicting probabilities. A neural network that uses the sigmoid function will output a model with a characteristic S curve.
-tanh function: transforms the output to a range between −1 and 1, and the resulting model also forms a characteristic S curve. Primarily used for classification between two classes.
-rectified linear unit (ReLU) function: returns a value from 0 to infinity. This activation function transforms any negative input to 0. It is the most commonly used activation function in neural networks due to its faster learning and simplified output. However, it is not always appropriate for simpler models.
-leaky ReLU function: a "leaky" alternative to the ReLU function. Instead of transforming negative input values to 0, it scales them down to negative values of much smaller magnitude.
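
To make the five definitions concrete, here is a minimal sketch of each function in plain Python. The function names and the leaky ReLU slope `alpha=0.01` are assumptions chosen for illustration; Keras provides its own implementations, selected with strings such as "relu" or "sigmoid".

```python
import math


def linear(z: float) -> float:
    # Returns the weighted sum unchanged.
    return z


def sigmoid(z: float) -> float:
    # Squeezes any value into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-z))


def tanh(z: float) -> float:
    # Squeezes any value into the range (-1, 1).
    return math.tanh(z)


def relu(z: float) -> float:
    # Negative inputs become 0; positive inputs pass through.
    return max(0.0, z)


def leaky_relu(z: float, alpha: float = 0.01) -> float:
    # Negative inputs are scaled by a small slope instead of being zeroed.
    return z if z > 0 else alpha * z


for name, fn in [("linear", linear), ("sigmoid", sigmoid), ("tanh", tanh),
                 ("relu", relu), ("leaky_relu", leaky_relu)]:
    print(name, [round(fn(z), 4) for z in (-2.0, 0.0, 2.0)])
```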

What are the steps in general for creating deep learning models?

-read in the data -create the X features and y target sets -create training and testing datasets -create scaler instance, fit scaler, scale the features data -define the deep neural net model -create a sequential model, add the layers, add the output layer -compile the model and fit the model

Because the training results are stored on the model variable, we can access the dictionary by only adding what code?

.history
EXAMPLE:
# Create a DataFrame with the history dictionary
df = pd.DataFrame(model.history, index=range(1, len(model.history["loss"]) + 1))
# Plot the loss
df.plot(y="loss")
# Plot the accuracy
df.plot(y="accuracy")

The model-fit-predict process follows the same general steps across all of data science. Restate the 4 steps here as review:

1. Decide on a model, and create a model instance. 2. Split the dataset into training and testing sets, and preprocess the data. 3. Train/fit the training data to the model. (Note that "train" and "fit" are used interchangeably in Python libraries, as well as in the ML field.) 4. Use the model for predictions.

What are the four major components of the perceptron model?

1. Input values, typically labeled x or χ (chi, pronounced kaai, as in eye)
2. A weight coefficient for each input value, typically labeled w or ω (omega). The weight coefficient determines the input value's strength, that is, the impact the input value has on the network.
3. A constant value called bias, which is added to the inputs in order to help best fit the model for a given dataset. It is typically labeled w0. So, no matter how many inputs we have, there will always be an additional value to "stir the pot."
4. A net summary function that aggregates all weighted inputs
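
The four components can be sketched in a few lines of plain Python. The step activation (output 1 when the net sum is at least 0) and the hand-picked weights that imitate a logical AND are assumptions made for this example; a trained perceptron would learn its weights from data.

```python
def perceptron(inputs, weights, bias):
    """Single perceptron: weighted inputs plus bias, passed through a step function."""
    # Net summary function: aggregate all weighted inputs and add the bias (w0).
    net = bias + sum(w * x for w, x in zip(weights, inputs))
    # Step activation (an assumption for this sketch): fire when net is non-negative.
    return 1 if net >= 0 else 0


# Hypothetical weights and bias that make the perceptron behave like a logical AND.
weights = [1.0, 1.0]
bias = -1.5

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, perceptron(x, weights, bias))
```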

What three parameters do you pass to the compile() function?

1. Loss 2. Optimizer 3. Metrics
EXAMPLE:
neuron.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
When we later fit (train) a neural network model, we use the optimizer and loss functions to adjust the weights of each input during each epoch of the training cycle.

Remember that there are two main evaluation metrics, list the 2 as review

1. model predictive accuracy 2. model mean squared error (MSE) **We use accuracy for classification models and MSE for regression models.
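
Both metrics are simple to compute by hand, which can help when checking what Keras reports. The sketch below uses plain Python; the function names and the sample values are invented for illustration.

```python
def accuracy(actual, predicted):
    """Fraction of predictions that match the actual class labels."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Classification example (made-up labels): 3 of 4 predictions are correct.
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))

# Regression example (made-up values): squared errors are 0, 1, and 4.
print(mean_squared_error([5, 6, 7], [5, 7, 5]))
```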

After starting with 20 epochs, we can then continue training the model, increasing the number of epochs by ________ on each trial.

20. After each new trial, we'll again verify that the model's loss and evaluation metric move in the directions we want.

deep learning

A type of machine learning that uses multiple layers of interconnections among data to identify patterns and improve predicted results. Deep learning most often uses a set of techniques known as neural networks and is popularly applied in tasks like speech recognition, image recognition, and computer vision.

What advantage does adding layers to a neural network offer?

Each additional layer of neurons makes it possible to model more complex relationships and concepts.

How do you determine the optimal number of neurons for a hidden layer?

Find the mean of the number of input features and the number of neurons in the output layer ((number of input features + number of neurons in output layer) / 2). Use a number close to this mean for the number of neurons in the first hidden layer. Repeat this pattern for subsequent hidden layers ((number of neurons in the prior hidden layer + number of neurons in output layer) / 2). This rule normally works well for the first approximation.
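
The averaging rule above can be written as a short helper. The name `suggest_hidden_layers` and the choice to round each mean up are assumptions of this sketch; the rule itself only asks for "a number close to this mean".

```python
import math


def suggest_hidden_layers(n_features: int, n_outputs: int, n_hidden_layers: int) -> list:
    """Suggest a neuron count per hidden layer using the mean rule of thumb."""
    sizes = []
    prior = n_features  # The first hidden layer is compared with the input features.
    for _ in range(n_hidden_layers):
        # Mean of the prior layer size and the output layer size, rounded up.
        size = math.ceil((prior + n_outputs) / 2)
        sizes.append(size)
        prior = size  # Later layers repeat the rule against the prior hidden layer.
    return sizes


# Hypothetical example: 11 input features, 1 output neuron, 2 hidden layers.
print(suggest_hidden_layers(11, 1, 2))
```

As the card says, this only gives a first approximation; the final sizes still come from comparing trained models.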

What is the installation code for the TensorFlow 2.0 library and Keras (included in the TensorFlow 2.0 library)?

Install:
pip install --upgrade tensorflow
Verify:
python -c "import tensorflow as tf;print(tf.__version__)"
python -c "import tensorflow as tf;print(tf.keras.__version__)"

When dealing with data that is linearly separable, do you need any hidden layers?

No, you do not typically need any hidden layers.

List the pros and cons of neural networks

Pros
-can effectively detect complex relationships within data
-have greater tolerance for messy data. They can learn to ignore noisy characteristics within a large dataset.
Cons
-algorithms can be too complex for humans to dissect and understand (creating a black box problem)
-are prone to overfitting (characterizing the training data so well that the model does not effectively generalize to test data)

Once we have encoded the categorical variables, we can use the _________________________ to scale the numerical variables to a similar range

StandardScaler (Then the next step in the OneHotEncoder process is: the DataFrame with the encoded information needs to be concatenated with a version of the original DataFrame that has the categorical variable columns dropped.)

When designing a deep neural network, it is a best practice to start with __________ hidden layers. Then, what do you do from there?

Two. Then, continue adding layers until the model's performance no longer improves over the same number of epochs.

How does OneHotEncoder encode a categorical column?

To encode this column, OneHotEncoder creates two new columns, one for each category level—that is, one for each possible value in the original column. (Example: the encoder creates a new column called "Loan_Status_Fully_Paid", and a new column called "Loan_Status_Not_Paid". In each of these new columns, the value 1.0 represents the presence of a value, and 0.0 represents the absence of a value)
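
To show the idea without scikit-learn, here is a minimal sketch of one-hot encoding in plain Python. The helper `one_hot_encode` and the sample values are assumptions for illustration; the real OneHotEncoder offers more options, such as control over how unknown categories are handled.

```python
def one_hot_encode(values, column_name):
    """Return new column names and one row of 1.0/0.0 indicators per value."""
    # One new column for each category level, in sorted order.
    categories = sorted(set(values))
    names = [f"{column_name}_{category}" for category in categories]
    # 1.0 marks the presence of a category, 0.0 marks its absence.
    rows = [[1.0 if value == category else 0.0 for category in categories]
            for value in values]
    return names, rows


# Made-up sample of the loan status column from the example above.
names, rows = one_hot_encode(["Fully_Paid", "Not_Paid", "Fully_Paid"], "Loan_Status")
print(names)
for row in rows:
    print(row)
```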

_______________________ is the only way to determine how "deep" a deep learning model should be.

Trial and error

True/False: The data used to fit a neural network should always be numerical and normalized to the same scale. This is true regardless of how many hidden layers the neural network contains.

True

True/False: Adding layers does not always guarantee better model performance.

True (Depending on the input data's complexity, adding more hidden layers sometimes just increases the chance of overfitting the training data.)

To create the plots for loss and accuracy, what do we do first?

We first create a DataFrame using the history dictionary of the training results stored on the model variable.

gradient descent approach

When gradient descent works properly, the model learns the greatest amount in the early iterations. The amount learned declines with each iteration until the optimization algorithm approaches the local minimum value, or the point where it cannot learn anything additional. The number of model iterations required for the model to learn everything it can varies widely—and is often only discovered through trial and error.
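
The pattern of large early gains that shrink over time shows up even on a toy problem. The sketch below minimizes the made-up loss (w - 3)^2 with plain gradient descent; the loss function, starting point, and learning rate are all assumptions chosen for illustration, not anything Keras runs internally.

```python
def gradient_descent(start: float, learning_rate: float, steps: int):
    """Minimize the toy loss (w - 3)^2 and record the loss before each update."""
    w = start
    losses = []
    for _ in range(steps):
        losses.append((w - 3.0) ** 2)
        gradient = 2.0 * (w - 3.0)      # Derivative of the toy loss at w.
        w -= learning_rate * gradient   # Step against the gradient.
    return w, losses


final_w, losses = gradient_descent(start=0.0, learning_rate=0.1, steps=20)

# How much the loss improved between consecutive iterations.
improvements = [losses[i] - losses[i + 1] for i in range(len(losses) - 1)]
print("first improvements:", [round(x, 4) for x in improvements[:3]])
print("last improvements:", [round(x, 4) for x in improvements[-3:]])
print("final w:", round(final_w, 4))
```

In this run the largest improvements happen in the first few iterations, and each later iteration adds less, which mirrors the behavior described above.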

When dealing with nonlinear data, do you need any hidden layers?

Yes, you may need more than one hidden layer

activation function

a mathematical function applied to the end of each neuron (that is, each individual perceptron model). This function transforms each neuron's output into a quantitative value. The quantitative output value is then used as the input value for the next layer in the neural network model. Although activation functions can introduce both linear and nonlinear properties to a neural network, nonlinear activation functions are more common; a wide variety of activation functions exist, and each has a specific purpose

Keras

a popular deep learning framework that serves as a high-level API for TensorFlow; now included with TensorFlow 2.0

OneHotEncoder

a scikit-learn class (in sklearn.preprocessing) that encodes categorical columns as binary indicator columns and allows us to specify what happens when a new category appears in testing data (helps when a category isn't present in the training data but shows up in the testing data; works similarly to get_dummies(), but it deals better with new categories that may show up in testing data)

artificial neural networks (ANN)

a set of algorithms that are modeled after the human brain. They are an advanced form of machine learning that recognizes patterns and features of input data and provides a clear quantitative output

perceptron model

a single neural network unit. It mimics a biological neuron by receiving input data, weighting the information, and producing a clear output

epoch

a single pass of the entire training dataset through the model (sometimes an epoch is loosely defined as an iteration of a model)

To add a second layer to a neural network, all you have to do is include another call to the ________ function.

add
EXAMPLE:
# Define the model - deep neural net with two hidden layers
number_input_features = 11
hidden_nodes_layer1 = 8
hidden_nodes_layer2 = 4
# Create a sequential neural network model
nn = Sequential()
# Add the first hidden layer
nn.add(Dense(units=hidden_nodes_layer1, input_dim=number_input_features, activation="relu"))
# Add the second hidden layer
nn.add(Dense(units=hidden_nodes_layer2, activation="relu"))
# Add the output layer
nn.add(Dense(units=1, activation="linear"))

linear activation function

returns the weighted sum unchanged, so the output neuron can produce any continuous value instead of just a binary output (0, 1). Example: it allows the range of outputs we need for our 1-10 wine quality scale (rather than just a binary 0 or 1)

What does the Dense class from Keras do?

defines a fully connected layer, which allows us to add layers within a neural network

black box

an impenetrable system whose internal workings are not visible to the user, making it difficult to understand how the inputs lead to the resulting output

deep learning model

at a high level, a neural network with more than one hidden layer; deep learning models are often more effective than traditional machine learning algorithms at discovering nonlinear relationships among data

binary_crossentropy loss function is designed to deal with which type of problems?

binary classification problems Many different loss functions exist. Which one we use depends on the output we want from our neural network model
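
For intuition about what this loss measures, here is a minimal plain-Python sketch of binary cross-entropy. The function name and the clipping constant `eps` are assumptions of this example; Keras computes the loss internally when we pass loss="binary_crossentropy".

```python
import math


def binary_crossentropy(actual, predicted_probs, eps: float = 1e-7) -> float:
    """Average binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(actual, predicted_probs):
        # Clip probabilities away from 0 and 1 so that log() stays finite.
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(actual)


labels = [1, 0, 1]
# Confident, correct predictions give a small loss.
print(binary_crossentropy(labels, [0.9, 0.1, 0.8]))
# Confident, wrong predictions give a large loss.
print(binary_crossentropy(labels, [0.1, 0.9, 0.2]))
```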

Choose the loss function based on the type of problem you're solving. For binary classification, use ___________________. For multi-class classification, use ______________________ if you encode the variables using OneHotEncoder. Or, use ________________________________ if the labels are integers. Finally, use ________ for regression.

binary_crossentropy; categorical_crossentropy; sparse_categorical_crossentropy; mse

What does the Sequential class from Keras do?

builds a neural network as a linear stack of layers. Data flows sequentially from one layer to the next in this model, as in the neural network structure that we saw in the previous section

linearly separable data

can be separated by a straight line when it's plotted in two dimensions

We use model predictive accuracy (accuracy) for __________________ models, and we use MSE (mse) for ______________________ models

classification; regression

A deep neural network is the best choice for what types of data?

complex or unstructured data, such as images, text, and voice

input_dim parameter

defines the number of inputs

units parameter

defines the number of neurons in the layer being added (for the first Dense layer, the number of neurons in the first hidden layer)

Defining the neural network's structure is akin to ____________________________________, and compiling the model is like ______________________________.

designing construction plans; building the house

As the number of epochs increases, we want the loss to go _________ and the accuracy to go ______

down (it should tend toward 0); up (it should tend toward 1). (We can also use these loss and accuracy plots to compare how well different models perform when solving the same problem.)

Although we follow the model-fit-predict-evaluate pattern, it is common to ____________ a neural network model after it's fitted, but before using it to make predictions

evaluate (After the training cycle ends, that is, after it runs through all the epochs, we can visually evaluate the model by plotting the loss function and the accuracy across all epochs.)

Which function do we use to evaluate the model's performance?

evaluate function
EXAMPLE:
# Evaluate the model using testing data
model_loss, model_accuracy = neuron.evaluate(X_test_scaled, y_test, verbose=2)
# Display evaluation results
print(f"Loss: {model_loss}, Accuracy: {model_accuracy}")

What code is used to import the OneHotEncoder?

from sklearn.preprocessing import OneHotEncoder

What is the code to import the Keras libraries?

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

When we begin coding the model's architecture, our initial step will create both an __________ layer containing the number of inputs and a __________ layer containing the number of neurons.

input; hidden
To add these initial layers to our neural network, we use the add function and the Dense module, as the following code shows:
number_inputs = 2
number_hidden_nodes = 3
neuron.add(Dense(units=number_hidden_nodes, activation="relu", input_dim=number_inputs))

Within our Sequential model, we'll add three Dense layers that will act as our _____________, _____________, and ____________ layers

input, hidden, and output For each Dense layer, we'll define the number of neurons, as well as the activation function.

Since we can't always achieve perfect 100% accuracy, especially for complex datasets, it is important to do what?

it is important to establish performance thresholds before designing any machine learning model. (Depending on the type of data and the use case, we may have to re-create and retrain a model using different parameters, or different training/test data, to achieve our performance threshold. Or, we may have to consider using a different model entirely)

Once we have saved a neural network model, anyone can import the exact same trained model to their environment using the Keras _______________ function.

load_model (tf.keras.models.load_model)
EXAMPLE:
# Import the required libraries
from pathlib import Path
import tensorflow as tf
# Set the model's file path
file_path = Path("Resources/wine_quality.h5")
# Load the model to a new object
nn_imported = tf.keras.models.load_model(file_path)

If a model's output is continuous (e.g., wine quality on a 1-10 scale), we need to build a regression model and not a classification model. We used the ______________ loss function, which is designed for regression problems

mean_squared_error (We also can use the mean squared error (mse) metric to evaluate the quality of the model)

How do you define an instance of a model for neural networks?

neuron = Sequential()

a neural network contains layers of ___________ that perform individual computations

neurons

Before using a neural network, we must _________________ our data.

normalize or standardize (we can use scikit-learn's StandardScaler or MinMaxScaler). Neural networks typically perform better with all input features on the same scale. This makes it easier for the neural network to adjust the weights in the network.
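
Standardizing a column means subtracting its mean and dividing by its standard deviation. The plain-Python sketch below illustrates that arithmetic; the function name and sample values are assumptions, and in practice we would let StandardScaler do this on the training data.

```python
import statistics


def standardize(column):
    """Rescale a list of numbers to mean 0 and (population) standard deviation 1."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)  # Population standard deviation.
    if std == 0:
        # A constant column carries no spread; map every value to 0.
        return [0.0 for _ in column]
    return [(value - mean) / std for value in column]


# Two made-up features on very different scales.
income = [30_000, 52_000, 75_000, 110_000]
age = [22, 35, 47, 61]

print([round(v, 3) for v in standardize(income)])
print([round(v, 3) for v in standardize(age)])
```

After scaling, both features occupy a similar range, so neither dominates the weight updates simply because of its units.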

For classification models, the highest possible accuracy value is _____. However, for regression models, we want the MSE to reduce to ______.

one (the closer to 1, the more accurate it is) zero (the closer to 0, the more accurate it is)

What does the hidden layer do?

perform data transformations on the inputs that we enter into the network

What is currently the world's most-used activation function when it comes to training deep neural networks?

rectified linear unit (ReLU) function "relu", enables nonlinear relationships

For the activation function on a hidden layer, use __________

relu

optimization function

shapes and molds a neural network model while the model is trained on the data, to ensure that the model performs to the best of its ability. At a glance, an optimization function reduces the model's losses and provides the most accurate output possible. Which optimization function we choose depends on what we want to optimize for in our model; there are several optimizers available in TensorFlow's Keras library

For the activation function on an output layer, use ___________ for binary classification, __________ for multi-class classification, and _________ for regression.

sigmoid; softmax; linear
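
Softmax is not defined elsewhere in this set, so here is a minimal plain-Python sketch of what it does: it turns one raw score per class into probabilities that sum to 1. The function name and the sample scores are assumptions for illustration.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    # Subtracting the largest score keeps exp() numerically stable.
    largest = max(scores)
    exps = [math.exp(score - largest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


# Made-up raw outputs for a three-class problem.
probabilities = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probabilities])
print("predicted class index:", probabilities.index(max(probabilities)))
```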

loss parameter

specifies the loss function. When we train our neural network model on a dataset, we will pass our training dataset through the model multiple times. The loss function scores the performance of the model after each of these iterations by measuring how far the model's predictions are from the actual values. This allows us to see how the model's performance changes over each iteration. We may determine that the model reaches maximum performance after a particular number of iterations.

What function do we use to export an entire model—including the configuration of the model layers, the weights associated with each layer, the activation functions, the optimizer, and the set of losses and metrics—to a Hierarchical DataFormat HDF5 file?

the Keras Sequential model's "save" function
EXAMPLE:
from pathlib import Path
# Set the model's file path
file_path = Path("Resources/wine_quality.h5")
# Export your model to an HDF5 file
nn_1.save(file_path)

How are deep neural nets typically constructed?

the number of neurons on each successive layer is equal to or less than the number of neurons on the previous layer, with the output layer containing the fewest neurons

Using our neural network model, we can use the predict function to generate predictions on new data by supplying the function with the data and a _____________

threshold (the threshold is applied to the probabilities that the predict function returns)
Example: Threshold of 0.5. Any value under 0.5 is classified as 0 and anything over 0.5 is classified as 1.
from sklearn.datasets import make_blobs
# Create 10 new samples of dummy data
new_X, new_y = make_blobs(n_samples=10, centers=2, n_features=2, random_state=1)
# Make predictions
predictions = (neuron.predict(new_X) > 0.5).astype("int32")

OneHotEncoder's fit_transform() function

fits the encoder on the categorical variables and returns the encoded data (the categorical variables are the columns named in the categorical_variables list, which we need to convert to binary/numerical values)
EXAMPLE:
encoded_data = enc.fit_transform(df[categorical_variables])

get_feature_names() function

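returns the names of the encoded columns, which we use to set the DataFrame's column names. We use our list of categorical variables as the function's parameter. This way, in addition to the encoded values for the categorical variables, the encoder will also fetch the correct name for each column. (In scikit-learn 1.0 and later, this method is named get_feature_names_out(); get_feature_names() was removed in version 1.2.)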

adam optimizer

uses a gradient descent approach with an adaptive learning rate for each weight, which helps keep weaker classifying variables and features from confusing the model and causing it to return less accurate results.

algorithmic trading

uses technical indicators and conditional logic to identify signals for entering and exiting trades

To finish creating our neural network, we just need to add the output layer. How do we do that?

we'll use the Dense module to create a new layer in our Sequential model
EXAMPLE:
number_classes = 1
neuron.add(Dense(units=number_classes, activation="sigmoid"))
(For the units, we are building a classification model which will output a yes or no (1 or 0) binary decision for each input data point. So, we only need one output neuron.)

Overfitting

when a model gives undue importance to patterns within a particular dataset that are not found in other, similar datasets

sigmoid function

will transform the output to a range between 0 and 1 This allows the model to map the result to a probability that the input data point belongs to Class 1 (rather than Class 0). Alternatively, it would allow the model to perform a hard classification and identify each input data point as either Class 1 or Class 0. For this type of classification, the model would use a default threshold of 0.5. In other words, the model would classify any data point with an output greater than or equal to 0.5 as Class 1, and any data point with an output less than 0.5 as Class 0.
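
The hard classification step described above is a one-line comparison. The sketch below applies it in plain Python; the function name `classify` and the sample probabilities are assumptions for illustration.

```python
def classify(probabilities, threshold: float = 0.5):
    """Map sigmoid outputs to Class 1 (>= threshold) or Class 0 (< threshold)."""
    return [1 if p >= threshold else 0 for p in probabilities]


# Made-up sigmoid outputs for five data points.
outputs = [0.08, 0.49, 0.50, 0.73, 0.99]
print(classify(outputs))
```

Raising or lowering the threshold trades one kind of error for the other, so 0.5 is a default rather than a requirement.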

Do we choose an activation function for each hidden layer?

yes. We will choose an activation function for the first layer. This time, we will also use this same activation function for our second hidden layer. Often, developers experiment with many potential architectures in an effort to minimize the loss metric.

