Maggy Distributed HPO, Ablation and Training with TensorFlow: an Example

Maggy is an open-source framework that simplifies writing and maintaining distributed machine learning programs. If you encapsulate your training logic in a function, the same code can run unchanged with plain Python on your laptop or be distributed with PySpark for hyperparameter tuning, data-parallel training, or model-parallel training. With the arrival of GPU support in Spark 3.0, PySpark can now be used to orchestrate distributed deep learning applications in frameworks such as TensorFlow and PyTorch.
We are pleased to announce that we have added support for Maggy on Databricks, so training machine learning models with many workers should be as easy as running a Python program on your laptop.

0. Spark Session

First, make sure you have a running Spark Session/Context available.

from pyspark.sql import SparkSession
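
On Databricks the session is already created and bound to the `spark` variable. If you run the notebook outside Databricks, a minimal sketch for obtaining a session looks like this (the application name is just an example):

# On Databricks `spark` already exists; elsewhere, create or retrieve a session.
spark = SparkSession.builder.appName("maggy-iris-example").getOrCreate()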

Make sure you have the right TensorFlow version installed.

%pip install tensorflow-cpu==2.4.1
%pip install scikit-optimize
import tensorflow as tf

Python interpreter will be restarted.
Collecting tensorflow-cpu==2.4.1 ... (full dependency-resolution output omitted)
ERROR: maggy 0.5.0 has requirement numpy==1.20.1, but you'll have numpy 1.19.5 which is incompatible.
Successfully installed absl-py-0.12.0 flatbuffers-1.12 grpcio-1.32.0 numpy-1.19.5 six-1.15.0 tensorboard-2.5.0 tensorboard-data-server-0.6.1 tensorflow-cpu-2.4.1 tensorflow-estimator-2.4.0 wheel-0.36.2 wrapt-1.12.1
Requirement already satisfied: scikit-optimize (0.7.4)
Python interpreter will be restarted.
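
Since %pip restarts the Python interpreter, it is worth re-running the import afterwards and confirming which version was picked up:

import tensorflow as tf
print(tf.__version__)  # expected: 2.4.1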

1. Model definition

Let’s define the model we want to train. The layers of the model have to be defined in the __init__ function.

Do not instantiate the class, otherwise you won’t be able to use Maggy.

from tensorflow import keras 
from tensorflow.keras.layers import Dense
from tensorflow.keras import Sequential
from tensorflow.keras.optimizers import Adam

# you can use keras.Sequential(); you just need to subclass it
# and define the layers in __init__()
class NeuralNetwork(Sequential):

    def __init__(self, nl=4):
        super().__init__()
        # input layer
        self.add(Dense(10, input_shape=(None, 4), activation='tanh'))
        # hidden layers
        if nl >= 4:
            for i in range(0, nl - 2):
                self.add(Dense(8, activation='tanh'))
        # output layer (3 iris classes)
        self.add(Dense(3, activation='softmax'))

# pass the class itself, not an instance
model = NeuralNetwork

2. Dataset creation

In this example, we are using the Iris dataset. Let’s download the dataset from https://www.kaggle.com/uciml/iris and upload it to your Databricks workspace.

You can process the dataset in the notebook and pass it to the configuration classes, or process it during the experiment. To do the latter, you have to wrap the processing logic in a function and pass it to the training configuration (this step is currently supported only by TfDistributedConfig).

Make sure the dataset path below is correct for your workspace.

display(dbutils.fs.ls("/FileStore/tables/Iris.csv"))

path                             name      size
dbfs:/FileStore/tables/Iris.csv  Iris.csv  5107

dataset_path = "dbfs:/FileStore/tables/Iris.csv"

train_set, test_set = (spark.read.format("csv")
                       .option("header", "true")
                       .option("inferSchema", "true")
                       .load(dataset_path)
                       .drop('_c0')  # drop a leftover unnamed column if present (no-op otherwise)
                       .randomSplit((0.80, 0.20), seed=0))


raw_train_set = train_set.toPandas().values
raw_test_set = test_set.toPandas().values

raw_train_set

Out[3]: array([[1, 5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
       [2, 4.9, 3.0, 1.4, 0.2, 'Iris-setosa'],
       [3, 4.7, 3.2, 1.3, 0.2, 'Iris-setosa'],
       ...,
       [149, 6.2, 3.4, 5.4, 2.3, 'Iris-virginica'],
       [150, 5.9, 3.0, 5.1, 1.8, 'Iris-virginica']], dtype=object)

We can also wrap the data processing in a function and pass it to the training configuration, as we’ll see later.

def process_data(train_set, test_set):

    import tensorflow as tf
    from sklearn.preprocessing import LabelBinarizer
    import numpy as np

    encoder = LabelBinarizer()

    # columns 1-4 are the features, column 5 is the species label
    X_train = np.asarray(train_set[:, 1:5]).astype('float32')
    y_train = encoder.fit_transform(train_set[:, 5])
    X_test = np.asarray(test_set[:, 1:5]).astype('float32')
    # reuse the encoder fitted on the training labels
    y_test = encoder.transform(test_set[:, 5])

    return (X_train, y_train), (X_test, y_test)

train_set, test_set = process_data(raw_train_set, raw_test_set)

test_set

Out[4]: (array([[5.4, 3.4, 1.7, 0.2], [5.1, 3.7, 1.5, 0.4], [5.2, 3.4, 1.4, 0.2], [4.7, 3.2, 1.6, 0.2], [4.8, 3.1, 1.6, 0.2], [5.5, 3.5, 1.3, 0.2], [4.9, 3.1, 1.5, 0.1], [5.1, 3.4, 1.5, 0.2], [5. , 3.3, 1.4, 0.2], [5.5, 2.3, 4. , 1.3], [5. , 2. , 3.5, 1. ], [5.9, 3. , 4.2, 1.5], [6. , 2.2, 4. , 1. ], [6.7, 3.1, 4.4, 1.4], [6.3, 2.5, 4.9, 1.5], [6.6, 3. , 4.4, 1.4], [6.3, 2.3, 4.4, 1.3], [5.5, 2.5, 4. , 1.3], [5.5, 2.6, 4.4, 1.2], [6.3, 3.3, 6. , 2.5], [6.5, 3. , 5.8, 2.2], [7.3, 2.9, 6.3, 1.8], [6.5, 3.2, 5.1, 2. ], [6.4, 2.7, 5.3, 1.9], [6. , 2.2, 5. , 1.5], [6.9, 3.2, 5.7, 2.3], [5.6, 2.8, 4.9, 2. ], [6.7, 3.3, 5.7, 2.1], [7.4, 2.8, 6.1, 1.9], [6.1, 2.6, 5.6, 1.4]], dtype=float32), array([[1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 1]]))
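
As a quick optional sanity check, the processed train set should contain four feature columns and three one-hot label columns:

X_train, y_train = train_set
print(X_train.shape, y_train.shape)  # e.g. (120, 4) (120, 3) with the 80/20 split above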

3. Wrap the training logic

The programming model is that you wrap the code containing the logic of your experiment in a function.

For HPO, we have to define a function that takes the hyperparameters to be optimized as parameters. Inside the function we simply put the training logic, just as if we were training our model on a single machine using TensorFlow.

def hpo_function(number_layers, reporter):

  model = NeuralNetwork(nl=number_layers)
  model.build()

  # compile the model, then fit and evaluate it
  model.compile(Adam(lr=0.04), 'categorical_crossentropy', metrics=['accuracy'])
  train_input, test_input = process_data(raw_train_set, raw_test_set)

  train_batch_size = 75
  test_batch_size = 15
  epochs = 10

  model.fit(x=train_input[0], y=train_input[1],
            batch_size=train_batch_size,
            epochs=epochs,
            verbose=1)

  score = model.evaluate(x=test_input[0], y=test_input[1],
                         batch_size=test_batch_size, verbose=1)

  print(f'Test loss: {score[0]}')
  print(f'Test accuracy: {score[1]}')

  # the returned value is the metric that Maggy optimizes
  return score[1]
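
Before launching the distributed search, you can optionally smoke-test the function on the driver. Since `reporter` is not used inside the body in this example, passing None is enough for a local run:

# Optional local smoke test of the HPO function (assumes the cells above have been run).
hpo_function(number_layers=4, reporter=None)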

We do the same for the training function, this time passing the model, train_set, test_set and hparams.

def training_function(model, train_set, test_set, hparams):

    # instantiate the model class with the hyperparameters found by the HPO
    model = model(nl=hparams['number_layers'])
    model.build()

    model.compile(Adam(lr=hparams['learning_rate']), 'categorical_crossentropy',
                  metrics=['accuracy'])

    model.fit(train_set, epochs=hparams['epochs'])

    # evaluate() returns the test loss and metrics
    accuracy = model.evaluate(test_set)

    return accuracy
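
Maggy feeds each worker its shard of the data (the batch-size parameters in step 5 refer to those shards). To validate the logic locally, a sketch that mimics this with in-memory tf.data datasets could look as follows; the batch size of 30 is just an illustrative choice:

# Optional local check (a sketch): build tf.data datasets from the processed
# tuples and run the training function once on the driver.
local_train = tf.data.Dataset.from_tensor_slices(train_set).batch(30)
local_test = tf.data.Dataset.from_tensor_slices(test_set).batch(30)

training_function(NeuralNetwork, local_train, local_test,
                  {'number_layers': 4, 'learning_rate': 0.04, 'epochs': 5})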

In the next step we have to create a configuration instance for Maggy. Since in this example we are using Maggy for hyperparameter optimization and distributed training using TensorFlow, we will use OptimizationConfig and TfDistributedConfig.

4. Configure and run distributed HPO

OptimizationConfig contains the information about the hyperparameter optimization. We need to define a Searchspace object that contains the hyperparameters we want to optimize. In this example we search for the optimal number of layers of the neural network, between 2 and 8.

OptimizationConfig takes the following parameters:

* num_trials: Controls how many separate runs are conducted during the hp search.
* optimizer: Optimizer type for searching the hp searchspace.
* searchspace: A Searchspace object configuring the names, types and ranges of hps.
* optimization_key: Name of the metric to use for hp search evaluation.
* direction: Direction of optimization.
* es_interval: Early stopping polling frequency during an experiment run.
* es_min: Minimum number of experiments to conduct before starting the early stopping mechanism. Useful to establish a baseline for performance estimates.
* es_policy: Early stopping policy which formulates a rule for triggering aborts.
* name: Experiment name.
* description: A description of the experiment.
* hb_interval: Heartbeat interval with which the server is polled.
* fixed_hp: Hyperparameters not to be tuned.

from maggy.experiment_config import OptimizationConfig
from maggy import Searchspace

# The searchspace can be instantiated with parameters
sp = Searchspace(number_layers=('INTEGER', [2, 8]))

hpo_config = OptimizationConfig(num_trials=4,
                                optimizer="randomsearch",
                                searchspace=sp,
                                direction="max",
                                es_interval=1,
                                es_min=5,
                                name="hp_tuning_test")

Hyperparameter added: number_layers
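
The searchspace is not limited to a single hyperparameter. As a sketch (the range below is hypothetical, and 'DOUBLE' denotes a continuous range in Maggy's Searchspace), several parameters can be declared at once, as long as the HPO function accepts them by name:

# Hypothetical example of a larger searchspace (not used further in this notebook).
sp_multi = Searchspace(number_layers=('INTEGER', [2, 8]),
                       learning_rate=('DOUBLE', [0.01, 0.1]))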

Our HPO function and configuration object are now ready, so we can go on and run the HPO experiment. To do that, we call the lagom function, passing our HPO function and the configuration object we instantiated in the last step. Lagom is a Swedish word meaning “just the right amount”.

from maggy import experiment

result = experiment.lagom(train_fn=hpo_function, config=hpo_config)

print(result)

You are running Maggy on Databricks.

—— RandomSearch Results —— direction(max)
BEST combination {"number_layers": 6} – metric 0.8999999761581421
WORST combination {"number_layers": 8} – metric 0.6333333253860474
AVERAGE metric – 0.7749999910593033
EARLY STOPPED Trials – 0
Total job time 0 hours, 0 minutes, 17 seconds

Finished experiment. {'best_id': 'ce3a082c9201f474', 'best_val': 0.8999999761581421, 'best_config': {'number_layers': 6}, 'worst_id': 'b334fd67693ed413', 'worst_val': 0.6333333253860474, 'worst_config': {'number_layers': 8}, 'avg': 0.7749999910593033, 'metric_list': [0.6666666865348816, 0.8999999761581421, 0.8999999761581421, 0.6333333253860474], 'num_trials': 4, 'early_stopped': 0, 'num_epochs': 0, 'trial_id': 'ef1c8b938213a74d'}

5. Configure and run distributed training

Now it’s time to run the final step of our ML program. Let’s initialize the configuration class for distributed training. First, we need to define our hyperparameters; we take the best values found during the HPO.

# define the model and training hyperparameters
model_params = {
    #train dataset entries / num_workers
    'train_batch_size': 75,
    #test dataset entries / num_workers
    'test_batch_size': 15,
    'learning_rate': 0.04,
    'epochs': 20,
    'number_layers': result['best_config']['number_layers'],
}

The TfDistributedConfig class takes the following parameters:

* name: the name of the experiment.
* model: the model class to be trained (defined in the first step of this guide).
* train_set: the train set as a tuple (x_train, y_train) or the train set path.
* test_set: the test set as a tuple (x_test, y_test) or the test set path.
* process_data: the function to process the data (if needed).
* hparams: the model and dataset parameters. In this case we also need to provide ‘train_batch_size’ and ‘test_batch_size’; these values represent the subset sizes of the sharded dataset. Their value is usually dataset_size / number_of_workers, but it can change depending on your needs.

from maggy.experiment_config.tf_distributed import TfDistributedConfig

training_config = TfDistributedConfig(name="tf_test",
                                      model=model,
                                      train_set=train_set,
                                      test_set=test_set,
                                      process_data=process_data,
                                      hparams=model_params)

Finally, we are ready to launch the Maggy experiment. We just need to pass two parameters: the training function and the configuration object we defined in the previous steps.

experiment.lagom(training_function, training_config)

Final average test loss: 0.346
Finished experiment. Total run time: 0 hours, 0 minutes, 17 seconds
Out[11]: {'test result': 0.34621101431548595}