Intro to TensorFlow
Training models on large datasets workflow
Don't forget to shuffle!
Scalar
0 dimensional tensor; just a standalone number
Vector
1 dimensional tensor; a list of numbers
Estimator API summary
1. Build models from prototype to production ready 2. Many pre-made estimators so you can experiment quickly 3. Build custom estimators 4. Plug in out-of-memory datasets for large training jobs 5. Monitor training performance using TensorBoard and tf.estimator.train_and_evaluate() 6. Distributed training 7. Exporters let you add production glue code that allows real-time, auto-scaling serving behind your API
TF API Hierarchy
1. Estimator: fully packaged, off-the-shelf ML algorithms 2. Layers, losses, metrics: tools for custom ML models 3. Core TF in Python: write custom TF ops in Python 4. Core TF in C++: low-level custom ops 5. Hardware: code which works with CPUs, GPUs, TPUs, and Android devices
Why is TF so popular?
1. Open source; you will always own what you write, and updates are made regularly 2. High-level APIs in Python 3. Production ready and scalable because it was built and used by Google, so it is popular with engineers 4. Researchers like their work to be used, and since it is popular with engineers, they like it too 5. Extensible: users can create their own TF operations in C++
Benefits of lazy execution
1. Optimization: TF can reduce processing time by combining operations and running some things in parallel 2. Distribution: TF can spread computation across many different machines. TF optimizes the process automatically.
CMLE Workflow
1. Pre-work: clean, split, engineer features, or preprocess features for data 2. Put training data in online source that CMLE can access, like cloud storage 3. Split logic into task.py and model.py 4. Do local test to make sure package structure is correct 5. Train/eval 6. Create microservice for making predictions
Debugging workflow
1. Read ENTIRE error message to understand the problem. Find the line where the problem is from the stack trace, then understand what your actual error is. 2. Isolate the method that's causing the problem. 3. Call the problematic method with hard coded dummy data
Common TF Errors
1. Tensor Shape 2. scalar/vector mismatch 3. data type mismatch
tf.estimator workflow
1. Arrange data in feature columns that your model can understand: feat_cols = [tf.feature_column.numeric_column("col_header")] 2. Instantiate a model depending on what you want to predict: model = tf.estimator.model_type(feat_cols), where model_type stands for an estimator class such as LinearRegressor 3. Train the model on the data: model.train(train_function, steps=100) 4. Make predictions on new examples: predictions = model.predict(pred_function)
Lazy Evaluation, or the two steps of processing tensor operations
1. Build the DAG using Python 2. Run the DAG in a session to receive output in the form of a numpy array. The DAG is somewhat like compiled Python which is easy for
TensorBoard on CloudShell workflow
1. tensorboard --port 8080 --logdir gs://${BUCKET}/${SUMMARY_DIR} 2. Preview on port 8080
tf.estimator() methods
1. tf.estimator.LinearRegressor(featcols)
How to debug entire TF programs?
1. tf.print() 2. tfdbg 3. TensorBoard 4. Change logging level
tf.get_variable()
Create variable, specifying shape, how it will be initialized, and if it can be trained
tf.Session()
Creates a session instance required to run the DAG
Matrix
2 dimensional tensor; a table of numbers with height and width
Tensor Variable
A tensor which is initialized, but whose value typically changes as the session runs. Created with tf.get_variable(init conditions)
Checkpoints
A way to save a partially trained model, so you can restart from a certain point if training fails. Specify a folder for saved checkpoints when instantiating the model: model = tf.estimator.mod_type(featcols, './folder'). To use a saved model, instantiate it from the same folder and then just run predict(). If you want to change the model and retrain, you have to delete the saved checkpoints, because estimators start from saved checkpoints by default
tf.global_variables_initializer().run()
Activates all initializers for all variables globally. Do this right before eval()
tf.placeholder()
Allows you to feed values into a session to be run, e.g. pass in a list or numpy array read from a text file. Takes datatype and shape as parameters
tf.constant()
Creates a constant tensor from a value such as a number, a list, or a numpy array. Constants can be combined with tf.placeholder() tensors in the same graph
Tensor
An n-dimensional array
feed_dict={placeholder: value}
Dictionary argument that maps placeholders to the data fed into them during .run() or .eval()
tf.reshape()
Change shape of tensors by reading entries left-right top-bottom and putting them in the new configuration
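The "read left-right, top-bottom, then refill" rule can be sketched in plain Python on nested lists. This is only an illustration of row-major reshaping, not TensorFlow's implementation; the helper names flatten and reshape are hypothetical.

```python
def flatten(nested):
    """Read entries left to right, top to bottom (row-major order)."""
    if not isinstance(nested, list):
        return [nested]
    flat = []
    for item in nested:
        flat.extend(flatten(item))
    return flat


def reshape(nested, shape):
    """Refill the flattened entries into the requested shape."""
    flat = flatten(nested)
    total = 1
    for dim in shape:
        total *= dim
    if total != len(flat):
        raise ValueError("new shape must hold the same number of entries")

    def build(values, dims):
        if len(dims) == 1:
            return list(values)
        step = len(values) // dims[0]
        return [build(values[i * step:(i + 1) * step], dims[1:])
                for i in range(dims[0])]

    return build(flat, list(shape))


x = [[1, 2, 3],
     [4, 5, 6]]          # shape (2, 3)
reshaped = reshape(x, (3, 2))
print(reshaped)          # [[1, 2], [3, 4], [5, 6]]
```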
tf.logging.set_verbosity(tf.logging.INFO)
Change verbosity of logging levels to give more information when debugging
tf.cast()
Changes data type. Arguments: tensor, desired data type
Variable creation/initialization workflow
1. Create variable with tf.get_variable() and set its scope 2. Set initializer 3. Initialize all variables 4. Evaluate the tensors with eval()
tf.matmul()
Computes the matrix product of two matrices (rows of the first multiplied by columns of the second)
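As a rough illustration of what a matrix product computes, here is a plain-Python sketch on nested lists. The function name matmul mirrors the TF op, but this is a hand-written stand-in, not the library call.

```python
def matmul(a, b):
    """Matrix product: entry (i, j) is row i of a dotted with column j of b."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


a = [[1, 2],
     [3, 4]]
b = [[5, 6],
     [7, 8]]
product = matmul(a, b)
print(product)  # [[19, 22], [43, 50]]
```

The inner-dimension check is the same condition that produces a tensor shape error when two incompatible matrices are multiplied.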
Shuffle and Queue Capacity when setting inputs
Data must be thoroughly shuffled during training. Setting the right size for the queue allows you to avoid replicating all the data in memory (needs clarification)
Rank
Dimensionality of a tensor
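A small plain-Python sketch of rank and shape, treating nested lists as tensors. The helpers shape_of and rank_of are hypothetical names used only for illustration.

```python
def shape_of(value):
    """Return the size of each dimension of a nested list."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        value = value[0] if value else None
    return tuple(shape)


def rank_of(value):
    """Rank is the number of dimensions."""
    return len(shape_of(value))


print(rank_of(5))                  # 0: scalar
print(rank_of([1, 2, 3]))          # 1: vector
print(rank_of([[1, 2], [3, 4]]))   # 2: matrix
```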
Slicing tensors
Dimensions are separated by commas in order of precedence (row, then column); within each dimension, normal slicing syntax applies
Composition of DAGs
Edges represent the flow of tensors to nodes, where numerical operations are performed on the tensors
One-hot encoding
Encoding strategy for working with categorical variables which turns each category into its own binary (0/1) indicator
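A minimal sketch of one-hot encoding in plain Python. The vocabulary and the helper name one_hot are made up for illustration.

```python
def one_hot(value, vocabulary):
    """Return a list with a 1 in the slot for value and 0 everywhere else."""
    if value not in vocabulary:
        raise ValueError("unknown category: %r" % (value,))
    return [1 if entry == value else 0 for entry in vocabulary]


vocabulary = ["red", "green", "blue"]   # hypothetical categories
encoded = [one_hot(color, vocabulary) for color in ["green", "blue", "red"]]
print(encoded)  # [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
```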
tf.slice()
Extracts a slice from a tensor. Parameters: tensor, begin (where to start in each dimension), size (how many entries to take in each dimension)
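Note that the third argument of tf.slice is a size, not a stop index. A plain-Python sketch of those begin/size semantics for a 2-D nested list follows; slice_2d is a hypothetical helper, not a TF function.

```python
def slice_2d(matrix, begin, size):
    """Take size[0] rows and size[1] columns starting at position begin."""
    row_start, col_start = begin
    num_rows, num_cols = size
    rows = matrix[row_start:row_start + num_rows]
    return [row[col_start:col_start + num_cols] for row in rows]


matrix = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]
piece = slice_2d(matrix, begin=(1, 0), size=(2, 2))
print(piece)  # [[4, 5], [7, 8]]
```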
Cloud Machine Learning Engine
GCP service for managing all tasks related to TF models in production
Monitoring Jobs
The GCP web console provides logs on each job, which can help diagnose technical problems. TensorBoard shows the performance of the model itself
Benefits of DAGs
Gives language and machine portability. Can be written in a high level language like python and then executed at high speed on any machine using the tensorflow execution engine (similar to java byte code and the JVM)
tf.estimator API
High-level TF API. 1. Test models quickly and interchangeably 2. Pause and save models with checkpointing 3. Work with out-of-memory datasets 4. Train, evaluate, and monitor models 5. Distribute training across many machines 6. Tune hyperparameters with ML Engine 7. Put into production by serving predictions from a trained model
tf.eager
Imperative execution of tf operations, combining steps of lazy evaluation. Typically used only for development, not production.
tf.expand_dims()
Inserts a dimension of size 1 into a tensor's shape. Parameters: tensor and the axis (index) at which to insert it
tfdbg
Interactive debugging tool that can be run from a local terminal and is attached to a local or remote TF session. Add "--debug" to any Python program using the CLI
What happens when we instantiate an estimator?
It forms a graph, but does not process data. Data will only be processed when the graph is run.
Training with in memory data
Locally stored data used for training models, in the form of numpy arrays or pandas DataFrames (the pandas version extracts column names automatically from the input). Parameters: x, y, batch_size, num_epochs, shuffle, queue_capacity. Functions: tf.estimator.inputs.numpy_input_fn() and tf.estimator.inputs.pandas_input_fn()
Batch Size
Number of examples the model processes in one training step. Use a "mini" batch during development to keep iteration time low.
Epochs
Number of complete passes over the training data.
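Batch size and epochs together determine how many training steps a run takes. The sketch below assumes each epoch is batched on its own and that a final partial batch still counts as one step; the helper name training_steps is hypothetical.

```python
import math


def training_steps(num_examples, batch_size, num_epochs):
    """Steps = batches per pass over the data, times the number of passes."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * num_epochs


# 1000 examples in batches of 100 is 10 steps per epoch; 5 epochs is 50 steps.
print(training_steps(num_examples=1000, batch_size=100, num_epochs=5))  # 50
```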
Directed Acyclic Graph (DAG)
One way graphs, this is how TF programs are represented graphically. Each node represents a different mathematical operation. Connected by edges, or the input/output of other operations. Arrays of data travel along these lines.
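The build-then-run idea behind a DAG can be sketched in a few lines of plain Python. This is a toy model of deferred execution, not how TensorFlow is implemented; Node, constant, add, multiply, and run are hypothetical names.

```python
class Node:
    """One operation in the graph: stores how to compute a value, not the value."""

    def __init__(self, op, inputs=()):
        self.op = op          # function that performs the operation
        self.inputs = inputs  # edges: the nodes whose outputs feed this one


def constant(value):
    return Node(lambda: value)


def add(a, b):
    return Node(lambda x, y: x + y, (a, b))


def multiply(a, b):
    return Node(lambda x, y: x * y, (a, b))


def run(node):
    """Evaluate a node by first evaluating every node it depends on."""
    args = [run(dependency) for dependency in node.inputs]
    return node.op(*args)


# Step 1: build the graph. Nothing is computed yet.
a = constant(3)
b = constant(4)
c = multiply(add(a, b), b)

# Step 2: run the graph to get an actual number.
result = run(c)
print(result)  # 28
```

Until run() is called, c is only a description of the computation, which is what leaves room for optimizing or distributing the graph before executing it.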
TensorFlow
Open source, high performance library that uses directed graphs for numerical computations (good for any numerical computations). Our data, in the form of TENSORS, FLOW through the DAG
Serving Input Function
Parses JSON and delivers features for model to make prediction. Called only once when model is instantiated, creates graph which connects from REST endpoint and parsed JSON to the model to make predictions
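The JSON-parsing half of that job can be illustrated in plain Python. The sketch assumes a request body with an "instances" list and two made-up feature names; a real serving input function builds graph nodes rather than parsing eagerly like this.

```python
import json

FEATURE_NAMES = ["distance", "passengers"]  # hypothetical feature names


def parse_request(body):
    """Turn a JSON request body into one list of values per feature."""
    payload = json.loads(body)
    instances = payload["instances"]  # assumed request layout
    return {name: [float(instance[name]) for instance in instances]
            for name in FEATURE_NAMES}


request_body = ('{"instances": [{"distance": 2.5, "passengers": 1}, '
                '{"distance": 6.0, "passengers": 3}]}')
features = parse_request(request_body)
print(features)  # {'distance': [2.5, 6.0], 'passengers': [1.0, 3.0]}
```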
tf.print()
Prints out the values of tensors when specific conditions are met. Used to debug rare errors.
TensorFlow Lite
Provides lightweight inference with on device ML for mobile and embedded systems. Models are trained in the cloud and then stored locally
tf.squeeze()
Removes dimensions of size 1 from a tensor's shape
How to fix tensor shape problems?
Reshape tensor with... 1. tf.reshape() 2. tf.expand_dims() 3. tf.slice() 4. tf.squeeze()
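To make the effect of adding and removing size-1 dimensions concrete, here is a plain-Python sketch on nested lists. The functions borrow the names expand_dims and squeeze but are hand-written stand-ins, not the TF ops.

```python
def expand_dims(nested, axis):
    """Insert a dimension of size 1 at the given axis."""
    if axis == 0:
        return [nested]
    return [expand_dims(item, axis - 1) for item in nested]


def squeeze(nested):
    """Remove every dimension of size 1."""
    if not isinstance(nested, list):
        return nested
    squeezed = [squeeze(item) for item in nested]
    if len(squeezed) == 1:
        return squeezed[0]
    return squeezed


vector = [1, 2, 3]                  # shape (3,)
print(expand_dims(vector, 0))       # [[1, 2, 3]]      shape (1, 3)
print(expand_dims(vector, 1))       # [[1], [2], [3]]  shape (3, 1)
print(squeeze([[1], [2], [3]]))     # [1, 2, 3]        shape (3,)
```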
tensor.get_shape()
Returns the shape of the tensor it is called on. Used for debugging
session.run(tensor)
Runs a session to process a tensor on the DAG
tensor.eval()
Shortcut for running the default session to process a tensor on the DAG
What do input functions return?
Set of tensorflow nodes with the labels and features expected by the model. It is hooked directly into the first layer of the model. NOTE: the input function is called only once, building the graph. It is the tensors themselves which keep pumping in new batch data each time the graph is run.
tf.variable_scope()
Set scope of variable, giving it a name and setting whether or not it can be saved and reused
tf.summary.FileWriter()
Uses a session to output a representation of a DAG that can be visualized with TensorBoard
Summaries and Tensorboard
Summarize model performance in real time to dissect issues
Session
TF object required for running DAGs; they are what allow TF to cache (for re-running computation) and distribute computations across devices
tf.feature_column API
TF works with a list of feature columns of different varieties, e.g.: 1. tf.feature_column.numeric_column() 2. tf.feature_column.categorical_column_with_vocabulary_list("col_header", ["type1", "type2"]) 3. many more
tf.truncated_normal_initializer()
Used for initializing tensors when building neural nets. The weights are drawn from a Gaussian distribution (mean 0 and standard deviation 1 by default), and values more than two standard deviations from the mean are discarded and redrawn (hence truncated)
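The truncation can be sketched with rejection sampling in plain Python: draw from a Gaussian and redraw anything that lands more than two standard deviations from the mean. The helper name truncated_normal and the seed are arbitrary choices for illustration.

```python
import random


def truncated_normal(mean=0.0, stddev=1.0, rng=random):
    """Draw from a Gaussian, redrawing values in the long tails."""
    while True:
        value = rng.gauss(mean, stddev)
        if abs(value - mean) <= 2 * stddev:
            return value


rng = random.Random(42)  # fixed seed so the run is repeatable
weights = [truncated_normal(rng=rng) for _ in range(1000)]
print(min(weights) >= -2.0 and max(weights) <= 2.0)  # True
```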
task.py
Used when sending jobs to CMLE, it parses command-line arguments and passes them to model.py
Tensor Shape Error Causes
Two tensors of different size being used in an operation. Frequently resulting from: 1. Batch size 2. scalar/vector mixup
Federated Learning
Updates and feedback from ML models being served to lots of users can be aggregated and used as a consensus mechanism to update the weights of a shared model that was trained in the cloud.
TensorBoard
Visualization tool accessible from Datalab that can display TensorFlow DAGs. TensorBoard can also be run on CloudShell
Serving
The stage in which the trained model receives an example and returns a prediction. Serving-time and training-time inputs are often very different
model.py
When using CMLE, it contains the core ML code. Called by task.py
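The split between the two files can be sketched as follows. Everything here is a hypothetical layout: the flag names, the bucket paths, and the train_and_evaluate stand-in are assumptions, and a real package would keep the two parts in separate files.

```python
import argparse


def train_and_evaluate(args):
    """Stand-in for the entry point that model.py would expose."""
    return "training on %s for %d steps, writing to %s" % (
        args.train_data_paths, args.train_steps, args.output_dir)


def parse_args(argv=None):
    """What task.py does: turn command-line flags into settings for the model."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--train_data_paths", required=True)
    parser.add_argument("--output_dir", required=True)
    parser.add_argument("--train_steps", type=int, default=1000)
    return parser.parse_args(argv)


# Simulate the command line that a training job would pass to task.py.
args = parse_args(["--train_data_paths", "gs://my-bucket/train.csv",
                   "--output_dir", "gs://my-bucket/output"])
summary = train_and_evaluate(args)
print(summary)
```

Keeping argument parsing out of the model code makes it possible to run the same model locally with small settings before submitting a full job.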
Method to import CSV into pandas dataframe
df = pd.read_csv('file.csv', header=None, names=CSV_COLUMNS)
Estimator API function for training real-world (i.e.: LARGE) datasets
tf.estimator.train_and_evaluate() handles distributing training across many machines to improve performance. All you need to do is: 1. run_config = tf.estimator.RunConfig(): choose the run configuration (output_dir, summary steps, checkpoints) 2. estimator = tf.estimator.LinearRegressor(): choose the estimator model (linear, DNN, etc) 3. train_spec = tf.estimator.TrainSpec(): tells the estimator how to get training data using an input_fn (use the Dataset API) 4. export_latest = tf.estimator.LatestExporter(): uses the serving input function to create a saved model with the most recent weights, defining how your model will serve requests. The serving input function parses JSON from the REST API and transforms it into the features the model expects. Passed to EvalSpec through its exporters parameter 5. eval_spec = tf.estimator.EvalSpec(): controls evaluation of the model and checkpoints using the eval_input_fn (number of batches, throttling, exporters) 6. tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec): alternates between training and evaluation so you can monitor progress in TensorBoard
Reading CSV file with TextLineDataset
model.train() launches the training loop. The training loop receives nodes from the input_fn(). The nodes return data every time they are run by the training loop. The dataset is shuffled 15 times, then batched into groups of 128. tf.data.TextLineDataset() reads examples from a CSV file line by line, and the map() function transforms them into features
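The read, parse, shuffle, repeat, and batch stages can be mimicked with a plain-Python generator. This sketches only the data flow and does not use the tf.data API; the column names, sample lines, and batch size are invented for the example.

```python
import random

CSV_COLUMNS = ["fare", "distance", "passengers"]  # hypothetical columns
LABEL_COLUMN = "fare"


def parse_line(line):
    """The map() step: turn one CSV line into (features, label)."""
    values = [float(field) for field in line.strip().split(",")]
    features = dict(zip(CSV_COLUMNS, values))
    label = features.pop(LABEL_COLUMN)
    return features, label


def input_batches(lines, batch_size, num_epochs, seed=0):
    """Shuffle each pass over the data, repeat for num_epochs, emit batches."""
    rng = random.Random(seed)
    batch = []
    for _ in range(num_epochs):
        epoch_lines = list(lines)
        rng.shuffle(epoch_lines)
        for line in epoch_lines:
            batch.append(parse_line(line))
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:  # final partial batch
        yield batch


lines = ["10.0,2.5,1", "7.5,1.2,2", "20.0,6.0,1", "5.0,0.8,3"]
batches = list(input_batches(lines, batch_size=3, num_epochs=3))
print(len(batches))  # 4 batches: 4 lines x 3 epochs = 12 examples
```

Because the generator yields one batch at a time, only a single mini-batch needs to be held in memory for each training step.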
Methods to execute a DAG
session_obj.run(tensor) OR tensor.eval(). You can evaluate multiple tensors at once with run(), and will get back a corresponding number of numpy arrays
Dataset API
tf.data.Dataset objects are used to build input functions for estimators. They divide large datasets into smaller pieces which can be loaded as you need them. One mini-batch is all you need for one training step. This allows you to work with datasets that are too large to fit in memory.
tf.estimator.DNNRegressor(feat_cols, hidden_units=[layer1, layer2],
tf.estimator.DNNRegressor(feature_columns=feat_cols, hidden_units=[layer1, layer2], activation_fn=tf.nn.relu, dropout=0.2, optimizer="Adam")
Syntax for running a session
with tf.Session() as sess: result = sess.run(ten)