TensorFlow End-to-End Example - MNIST

TensorFlow Keras example with SavedModel model saving


Tested with TensorFlow 2.4.0

Machine Learning on Hopsworks


The hops Python module

hops is a helper library for Hops that facilitates development by hiding the complexity of running applications and interacting with services.

Have a feature request or encountered an issue? Please let us know on GitHub.

Using the experiment module

To run your machine learning code on Hopsworks, the whole program needs to be placed inside a wrapper function: everything from importing libraries to reading data, defining the model and running the training loop goes inside that function.

The experiment module provides an API for running Python programs such as TensorFlow, Keras and PyTorch on a Hopsworks cluster, on any number of machines and GPUs.

An Experiment can be a single Python program, which we simply refer to as an Experiment.

Grid search or genetic hyperparameter optimization such as differential evolution, which runs several Experiments in parallel, is referred to as a Parallel Experiment.

ParameterServerStrategy, CollectiveAllReduceStrategy and MultiWorkerMirroredStrategy make multi-machine/multi-GPU training as simple as invoking a function for orchestration. This mode is referred to as Distributed Training.
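
A minimal sketch of the wrapper-function pattern is shown below; the function name, its body and the metric value are purely illustrative, while the experiment.launch call mirrors the one used later in this notebook.

def train_fn():
    # Everything, including imports, lives inside the wrapper function
    from hops import tensorboard

    # ... read data, build the model and train it, writing summaries and
    # checkpoints under tensorboard.logdir() ...

    accuracy = 0.0  # placeholder for the metric produced by training
    return {'accuracy': accuracy}

from hops import experiment
# Runs the wrapper function on the cluster and records the returned metric
experiment.launch(train_fn, name='example experiment', metric_key='accuracy')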

Using the tensorboard module

The tensorboard module allows us to get the log directory where summaries and checkpoints are written, and which is shown in the TensorBoard we will see in a bit. The only function we currently need to call is tensorboard.logdir(), which returns the path to the TensorBoard log directory. Furthermore, the contents of this directory will be stored as a Dataset in your project's Experiments folder.

The directory could in practice be used to store other data that should be accessible after the experiment is finished.

# Use this module to get the TensorBoard logdir
from hops import tensorboard
tensorboard_logdir = tensorboard.logdir()
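
As a sketch of storing other artifacts in the same directory (the file name below is illustrative, and this assumes local_logdir=True, as in the launch call later in this notebook, so that the path is on the local filesystem):

import os
from hops import tensorboard

# Any file written under the log directory is kept with the experiment output
notes_path = os.path.join(tensorboard.logdir(), 'notes.txt')
with open(notes_path, 'w') as f:
    f.write('free-form notes about this run')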

Using the hdfs module

The hdfs module provides a method to get the path in HopsFS where your data is stored, namely by calling hdfs.project_path(). The path resolves to the root path of your project, which is the view you see when you click Data Sets in Hopsworks. To point to where your actual data resides in the project, you need to append the path from the project root to your Dataset. For example, if you create a mnist folder in your Resources Dataset, the path to the mnist data would be hdfs.project_path() + 'Resources/mnist'.

# Use this module to get the path to your project in HopsFS, then append the path to your Dataset in your project
from hops import hdfs
project_path = hdfs.project_path()
# Downloading the mnist dataset to the current working directory
from hops import hdfs
mnist_hdfs_path = hdfs.project_path() + "Resources/mnist"
local_mnist_path = hdfs.copy_to_local(mnist_hdfs_path)

Documentation

See the following links to learn more about running experiments in Hopsworks

Managing experiments

The Experiments service provides a unified view of all the experiments run using the experiment module.
As demonstrated in the gif, it provides general information about each experiment and the resulting metric. Experiments can be visualized in TensorBoard during or after training.


def keras_mnist():
    
    import os
    import sys
    import uuid
    import random
    
    import numpy as np
    
    from tensorflow import keras
    import tensorflow as tf
    from tensorflow.keras.datasets import mnist
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense, Dropout, Flatten
    from tensorflow.keras.layers import Conv2D, MaxPooling2D
    from tensorflow.keras.callbacks import TensorBoard
    from tensorflow.keras import backend as K

    import math
    from hops import tensorboard

    from hops import model as hops_model
    from hops import hdfs

    import pydoop.hdfs as pydoop
    

    batch_size=32
    num_classes = 10

    
    # Provide path to train and validation datasets
    train_filenames = [hdfs.project_path() + "TourData/mnist/train/train.tfrecords"]
    validation_filenames = [hdfs.project_path() + "TourData/mnist/validation/validation.tfrecords"]
    
    # Define input function
    def data_input(filenames, batch_size=128, num_classes=10, shuffle=False, repeat=None):

        def parser(serialized_example):
            """Parses a single tf.Example into image and label tensors."""
            features = tf.io.parse_single_example(
                serialized_example,
                features={
                    'image_raw': tf.io.FixedLenFeature([], tf.string),
                    'label': tf.io.FixedLenFeature([], tf.int64),
                })
            image = tf.io.decode_raw(features['image_raw'], tf.uint8)
            image.set_shape([28 * 28])

            # Normalize the values of the image from the range [0, 255] to [-0.5, 0.5]
            image = tf.cast(image, tf.float32) / 255 - 0.5
            label = tf.cast(features['label'], tf.int32)
    
            # Create a one hot array for your labels
            label = tf.one_hot(label, num_classes)
            
            return image, label

        # Import MNIST data
        dataset = tf.data.TFRecordDataset(filenames)

        # Map the parser over dataset, and batch results by up to batch_size
        dataset = dataset.map(parser)
        if shuffle:
            dataset = dataset.shuffle(buffer_size=128)
        dataset = dataset.batch(batch_size, drop_remainder=True)
        dataset = dataset.repeat(repeat)
        return dataset

    # Define a Keras Model.
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Dense(128, activation='relu', input_shape=(784,)))
    model.add(tf.keras.layers.Dense(num_classes, activation='softmax'))

    # Compile the model.
    model.compile(loss=tf.keras.losses.categorical_crossentropy,
                  optimizer=tf.keras.optimizers.Adam(0.001),
                  metrics=['accuracy']
                 )
        
    callbacks = [
        tf.keras.callbacks.TensorBoard(log_dir=tensorboard.logdir()),
        tf.keras.callbacks.ModelCheckpoint(filepath=tensorboard.logdir()),
    ]
    model.fit(data_input(train_filenames, batch_size), 
        verbose=0,
        epochs=3, 
        steps_per_epoch=5,
        validation_data=data_input(validation_filenames, batch_size),
        validation_steps=1,                    
        callbacks=callbacks
    )
    
    score = model.evaluate(data_input(validation_filenames, batch_size), steps=1)

    # Export model

    export_path = os.getcwd() + '/model-' + str(uuid.uuid4())
    print('Exporting trained model to: {}'.format(export_path))
    
    tf.saved_model.save(model, export_path)

    print('Done exporting!')
    
    metrics = {'accuracy': score[1]}
    
    hops_model.export(export_path, "mnist", metrics=metrics)    
    
    return metrics
Starting Spark application

ID   YARN Application ID              Kind     State   Spark UI   Driver log
0    application_1614814942983_0001   pyspark  idle    Link       Link
SparkSession available as 'spark'.
from hops import experiment
from hops import hdfs

experiment.launch(keras_mnist, name='mnist model', local_logdir=True, metric_key='accuracy')
Finished Experiment 

('hdfs://rpc.namenode.service.consul:8020/Projects/demo_ml_meb10000/Experiments/application_1614814942983_0001_1', {'accuracy': 0.6875, 'log': 'Experiments/application_1614814942983_0001_1/output.log'})

Check Model Repository for best model based on accuracy


Query Model Repository for best mnist Model

from hops import model
from hops.model import Metric
MODEL_NAME="mnist"
EVALUATION_METRIC="accuracy"
best_model = model.get_best_model(MODEL_NAME, EVALUATION_METRIC, Metric.MAX)
print('Model name: ' + best_model['name'])
print('Model version: ' + str(best_model['version']))
print(best_model['metrics'])
Model name: mnist
Model version: 1
{'accuracy': '0.6875'}

Create Model Serving of Exported Model

from hops import serving
# Create serving
# Optionally, set the kfserving flag to deploy the model server with KFServing.
# If not set, the model is deployed with the default serving tool of the current Hopsworks installation (Docker or Kubernetes).
serving_name = MODEL_NAME
model_path="/Models/" + best_model['name']
response = serving.create_or_update(serving_name, model_path, model_version=best_model['version'],
                                    model_server="TENSORFLOW_SERVING", kfserving=False)
Creating a serving for model mnist ...
Serving for model mnist successfully created
# List all available servings in the project
for s in serving.get_all():
    print(s.name)
mnist
# Get serving status
serving.get_status(serving_name)
'Stopped'

Check Model Serving for active servings


Start Model Serving Server

if serving.get_status(serving_name) == 'Stopped':
    serving.start(serving_name)
Starting serving with name: mnist...
Serving with name: mnist successfully started
import time
while serving.get_status(serving_name) != "Running":
    time.sleep(5) # Let the serving start up correctly
time.sleep(5)

Send Prediction Requests to the Served Model using Hopsworks REST API

import json
import numpy as np

TOPIC_NAME = serving.get_kafka_topic(serving_name)
NUM_FEATURES = 784

for i in range(20):
    data = {
        "signature_name": "serving_default",
        "instances": [np.random.rand(NUM_FEATURES).tolist()]
    }
    response = serving.make_inference_request(serving_name, data)
    print(response)
{'predictions': [[0.0476838797, 0.173909977, 0.0915973857, 0.029981954, 0.219457656, 0.00450907415, 0.0543082468, 0.008187823, 0.304568678, 0.0657953]]}
{'predictions': [[0.0759346634, 0.0985039324, 0.116473138, 0.0433241054, 0.260845393, 0.00322122383, 0.0469626449, 0.0123444796, 0.282883972, 0.0595064238]]}
{'predictions': [[0.0667591169, 0.0377722494, 0.101670593, 0.083593972, 0.34185648, 0.00489190826, 0.0282242764, 0.0127474749, 0.271570325, 0.0509136431]]}
{'predictions': [[0.074885264, 0.0370866619, 0.106610455, 0.0885797, 0.252809554, 0.00820388552, 0.0418941565, 0.00708836969, 0.306697756, 0.0761441439]]}
{'predictions': [[0.0882823914, 0.0847160295, 0.142984703, 0.099473238, 0.299038827, 0.004494566, 0.0333742052, 0.00832830649, 0.20473294, 0.0345748328]]}
{'predictions': [[0.0606784932, 0.0592611134, 0.163883701, 0.0483892485, 0.270460546, 0.0075587458, 0.0167558268, 0.0162525065, 0.331456423, 0.0253034271]]}
{'predictions': [[0.0850376487, 0.0976636335, 0.141756833, 0.0906533897, 0.266124517, 0.00889984798, 0.0379593633, 0.012274202, 0.199083105, 0.0605473928]]}
{'predictions': [[0.139974594, 0.0209932979, 0.110152908, 0.135149419, 0.247445315, 0.00316160941, 0.043607194, 0.00824826676, 0.242398068, 0.0488693342]]}
{'predictions': [[0.080461, 0.0642398894, 0.180882946, 0.0274151582, 0.264869332, 0.00335265533, 0.0537998937, 0.0140668182, 0.239477515, 0.0714347139]]}
{'predictions': [[0.0993365571, 0.0876696929, 0.0337918811, 0.0376172587, 0.387955576, 0.00310366787, 0.0203587189, 0.00436903723, 0.276718169, 0.0490794666]]}
{'predictions': [[0.0745070204, 0.0974836648, 0.100005902, 0.0316812657, 0.19939211, 0.00821776502, 0.0282482784, 0.0193053782, 0.289186031, 0.151972607]]}
{'predictions': [[0.0761937499, 0.257911444, 0.0936932, 0.0466454849, 0.227876425, 0.00712384889, 0.0443606824, 0.0383024625, 0.171303749, 0.0365889035]]}
{'predictions': [[0.110165693, 0.0951228514, 0.0734109581, 0.0969071537, 0.269149631, 0.00360885891, 0.0675947368, 0.0100677507, 0.190484986, 0.083487317]]}
{'predictions': [[0.0469552875, 0.0283691157, 0.119177647, 0.106253795, 0.264144599, 0.00548252696, 0.0441412665, 0.00704256119, 0.329935819, 0.0484973118]]}
{'predictions': [[0.0429104455, 0.139211923, 0.127249, 0.113401599, 0.244302675, 0.00612696959, 0.0431880206, 0.0201609507, 0.220304772, 0.0431436747]]}
{'predictions': [[0.150555506, 0.0407862104, 0.0792004466, 0.0889020115, 0.360775769, 0.00644372217, 0.0381254964, 0.0161697268, 0.133853361, 0.0851876736]]}
{'predictions': [[0.0907203406, 0.0718188435, 0.143610761, 0.0750679821, 0.194054991, 0.00957031175, 0.0641007349, 0.0161506496, 0.301371098, 0.0335342288]]}
{'predictions': [[0.102563672, 0.109304965, 0.0859460756, 0.0511440225, 0.287603885, 0.00489214296, 0.0418436, 0.0135970907, 0.204372928, 0.0987316594]]}
{'predictions': [[0.0390603617, 0.11417643, 0.0949593857, 0.0468359254, 0.194509432, 0.00351499743, 0.029983459, 0.00911557488, 0.430569172, 0.0372752547]]}
{'predictions': [[0.0597339123, 0.0748178959, 0.131104514, 0.0772406086, 0.198730648, 0.0076195444, 0.0442764536, 0.0117876884, 0.329271764, 0.0654169545]]}