Data Scientist Tutorial

This section takes a first-time user through the DKube workflow using a sample program and dataset. It uses the MNIST digit-classification model to provide a simple, successful initial experience.

General Workflow

The workflow demonstrated in this example is as follows:

  • Load the program folder as a DKube Project

  • Load the dataset folder as a DKube Dataset

  • Create a model placeholder for versioned output

  • Create and open a DKube JupyterLab Notebook

  • Create a Training Run

  • Test Model Inference

Create Project

Load the MNIST program folder from a GitHub repository into DKube from the Repo menu by selecting “+ Project”.

_images/Data_Scientist_Projects_Tutorial.png

Fill in the fields as follows, then select “Add Project”.

Field           Value
Name            mnist
Project Source  Git
url             https://github.com/oneconvergence/dkube-examples/tree/2.0/tensorflow/classification/mnist/digits/classifier/program
Branch          2.0

_images/Data_Scientist_Project_mnist.png
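The url value points at a folder inside the dkube-examples repository rather than at the repository root, and the path segment after /tree/ is the same 2.0 branch entered in the Branch field. The hypothetical helper below (not part of DKube) splits such a URL into its repository, branch, and folder parts, which can help confirm that the url and Branch fields agree. It assumes the branch name contains no "/" characters.

```python
from urllib.parse import urlparse


def split_github_tree_url(url: str) -> dict:
    """Split https://github.com/<owner>/<repo>/tree/<branch>/<path> into parts.

    Hypothetical helper for illustration; assumes the branch name has no '/'.
    """
    parsed = urlparse(url)
    if parsed.netloc != "github.com":
        raise ValueError(f"not a github.com URL: {url}")
    parts = parsed.path.strip("/").split("/")
    if len(parts) < 4 or parts[2] != "tree":
        raise ValueError(f"expected <owner>/<repo>/tree/<branch>/...: {url}")
    owner, repo, _, branch, *folder = parts
    return {
        "repo": f"https://github.com/{owner}/{repo}",
        "branch": branch,
        "folder": "/".join(folder),
    }


url = ("https://github.com/oneconvergence/dkube-examples/tree/2.0/"
       "tensorflow/classification/mnist/digits/classifier/program")
info = split_github_tree_url(url)
print(info["branch"])  # 2.0
print(info["folder"])  # tensorflow/classification/mnist/digits/classifier/program
```

The Dataset url in the next step splits the same way, with the folder ending in data instead of program.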

This will create the mnist Project.

_images/Data_Scientist_Project_Success.png

Create Dataset

Load the MNIST dataset folder from a GitHub repository into DKube from the Datasets menu by selecting “+ Dataset”.

_images/Data_Scientist_Datasets_Tutorial.png

Fill in the fields as follows, then select “Add Dataset”.

Field           Value
Name            mnist
Dataset Source  Git
url             https://github.com/oneconvergence/dkube-examples/tree/2.0/tensorflow/classification/mnist/digits/classifier/data
Branch          2.0

_images/Data_Scientist_Dataset_mnist.png

This will create the mnist Dataset.

_images/Data_Scientist_Dataset_Success.png

Create Model

Create a Model to hold the versioned output of the Training Run that is created later in this tutorial.

_images/Data_Scientist_Models_Tutorial.png

Fill in the fields as follows, then select “Add Model”.

Field         Value
Name          mnist
Versioning    DVS
Model Store   default
Model Source  None

_images/Data_Scientist_Models_mnist.png

Create Notebook

To experiment with the program, create a JupyterLab Notebook from the IDE menu by selecting “+ JupyterLab”.

_images/Data_Scientist_Notebooks_Tutorial.png

Fill in the fields as shown.

Basic Submission Screen

Field  Value
Name   mnist

_images/Data_Scientist_Notebook_mnist_Basic.png

Leave all other fields at their default values. Do not submit yet; select the “Repos” tab.

Repos Submission Screen

_images/Data_Scientist_Notebook_Project_Select.png

Field    Value
Project  mnist

_images/Data_Scientist_Notebook_mnist_Repo_Project.png

_images/Data_Scientist_Notebook_Dataset_Select.png

Field       Value
Dataset     mnist
Version     Select ver 1
Mount Path  /opt/dkube/input

_images/Data_Scientist_Notebook_mnist_Repo_Dataset.png

The mount path is the directory from which the program code reads the input dataset.

Leave all other fields at their default values, then select “Submit” to start the Notebook.

Note

The initial Notebook will take a few minutes to start. Follow-on Notebooks with the same framework version will start more quickly.

_images/Data_Scientist_Notebook_Success.png

While on this screen, also start the default DKube Notebook instance by selecting its “Start” icon. This Notebook is used later in the tutorial to test the trained model.

_images/Data_Scientist_Notebook_DKube.png

Open JupyterLab Notebook

Open the mnist Notebook in JupyterLab by selecting the Jupyter icon under “Actions” on the far right.

_images/Data_Scientist_Jupyter_mnist.png

Create Training Run

From the Runs menu, create a Training Run that trains the mnist program on the mnist dataset and produces a trained Model.

_images/Data_Scientist_Run_mnist.png

Fill in the fields as shown.

Basic Submission Screen

Field            Value
Name             mnist
Start-up script  python model.py

_images/Data_Scientist_Run_mnist_Basic.png

Leave all other fields at their default values, then select the “Repos” tab.

Repos Submission Screen

To submit a Training Run:

  • A Project and Dataset need to be selected for input

  • A Model needs to be selected for output

Input Selections

_images/Data_Scientist_Run_Project_Select.png

Field    Value
Project  mnist

_images/Data_Scientist_Run_mnist_Repo_Project.png

_images/Data_Scientist_Run_Dataset_Select.png

Field       Value
Dataset     mnist
Version     Select ver 1
Mount Path  /opt/dkube/input

The mount path is the directory from which the program code reads the input dataset.

_images/Data_Scientist_Run_mnist_Repo_Dataset.png
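Because the Dataset is mounted at /opt/dkube/input, the path typed here must match the one the program code reads from. The sketch below is a minimal illustration that assumes only that the dataset files appear under the mount path; the function name is hypothetical, and the actual file layout of the mnist data folder is not described in this tutorial.

```python
from __future__ import annotations

from pathlib import Path

# Mount Path entered for the Dataset in the Repos tab.
INPUT_MOUNT = Path("/opt/dkube/input")


def list_dataset_files(mount: Path = INPUT_MOUNT) -> list[str]:
    """Return every file under the mount path, relative to it, sorted.

    Hypothetical helper; raises if nothing is mounted at the path.
    """
    if not mount.is_dir():
        raise FileNotFoundError(f"dataset not mounted at {mount}")
    return sorted(
        str(p.relative_to(mount)) for p in mount.rglob("*") if p.is_file()
    )
```

Inside a Run, calling list_dataset_files() with no argument lists what was mounted from ver 1 of the mnist Dataset. A FileNotFoundError or an empty list suggests the Mount Path here and the path in the code disagree.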

Output Selection

A Model needs to be selected for the Training Run output.

_images/Data_Scientist_Run_Model_Select.png

Field       Value
Model       mnist
Version     Select ver 1
Mount Path  /opt/dkube/output

The mount path is the directory to which the program code writes the trained model. After completing the fields, select “Submit”.

_images/Data_Scientist_Run_mnist_Repo_Model.png
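On the output side, the program writes the trained model under /opt/dkube/output, and the tutorial's flow implies that whatever is written there becomes the new version of the mnist Model once the Run completes. The sketch below is hypothetical: the helper name and the model.bin file name are placeholders, and a real TensorFlow program would instead use the framework's own save call with a directory under this path.

```python
from pathlib import Path

# Mount Path entered for the output Model in the Repos tab.
OUTPUT_MOUNT = Path("/opt/dkube/output")


def save_model_artifact(data: bytes, name: str = "model.bin",
                        mount: Path = OUTPUT_MOUNT) -> Path:
    """Write serialized model bytes under the output mount path.

    Hypothetical helper; the directory is created if it does not exist.
    """
    mount.mkdir(parents=True, exist_ok=True)
    target = mount / name
    target.write_bytes(data)
    return target
```

Files written outside the output mount path are presumably not captured in the new Model version, so the path in the code and in this field should match.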

Note

The initial Run will take a few minutes to start. Follow-on Runs with the same framework version will start more quickly.

The Training Run will appear in the “All Runs” tab.

_images/Data_Scientist_Run_Success.png

Create Test Inference

When the Run status shows “Complete”, a trained Model has been created. The trained Model appears in the Models Repo.

_images/Data_Scientist_Models_Trained_mnist.png

Select the trained Model to view its details, including its versions.

_images/Data_Scientist_Models_mnist_Detail.png

  • Ver 1 is the initial blank version created earlier in this tutorial to set up versioning

  • Ver 2 is the new model that was created by the training run

_images/Data_Scientist_Models_mnist_Version.png

While on this screen, create a test inference for the model by selecting the “Test Inference” button. This opens a pop-up window in which you enter the inference name and choose whether to serve it on a CPU or GPU. Fill in the fields as follows:

Field             Value
Name              mnist
CPU/GPU           CPU
Preprocessing     Select
Docker Image url  ocdr/mnist-example-preprocess:2.0.4

_images/Data_Scientist_Models_mnist_Test_Inference.png
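Selecting Preprocessing attaches the ocdr/mnist-example-preprocess container to the inference. Its internals are not described in this tutorial; presumably it converts an uploaded image into the input the model expects. As a hedged illustration of what MNIST preprocessing commonly involves (not the contents of that container), the sketch below scales a 28x28 grayscale image from 0-255 to [0, 1] and flattens it to 784 values. Whether the real container flattens the image or keeps its 28x28 shape is unknown.

```python
from __future__ import annotations


def preprocess_mnist(pixels: list[list[int]]) -> list[float]:
    """Scale a 28x28 grayscale image (0-255) to [0, 1] and flatten it.

    Illustrative only; not the logic of ocdr/mnist-example-preprocess.
    """
    if len(pixels) != 28 or any(len(row) != 28 for row in pixels):
        raise ValueError("expected a 28x28 image")
    flat: list[float] = []
    for row in pixels:
        for value in row:
            if not 0 <= value <= 255:
                raise ValueError(f"pixel out of range: {value}")
            flat.append(value / 255.0)
    return flat
```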

The lineage of the model can be viewed by selecting the “Lineage” tab. It shows all of the inputs that were used to create the model.

_images/Data_Scientist_Models_mnist_Lineage.png

Test inferences are listed in the “Test Inferences” menu. Once the status of the test inference shows “Running”, the “Endpoint” column shows the URL of the API that is serving the model.

_images/Data_Scientist_Inferences_mnist.png

Test the Trained Model

The trained Model can be tested against its inference endpoint from the default DKube Notebook started earlier in this tutorial.

_images/Data_Scientist_Notebooks_DKube_Select.png

After opening JupyterLab for the DKube Notebook, navigate to the tools folder and open the dkube.html file. This displays a window with several applications.

Before choosing an application, select the “Trust HTML” button at the top of the JupyterLab window. The button toggles trust on and off; turning it on allows JupyterLab to open the applications.

Next, right-click the “DKube Inference” application and open it in a new tab.

_images/Data_Scientist_Infapp_Select.png

This opens a tab containing the test inference application. Fill in the fields as follows:

Field                Value
Model Serving url    Endpoint shown on the Test Inferences screen (see the previous section)
Authorization Token  OAuth token from the Developer Settings menu at the top right of the screen
Model Type           mnist
Upload Image         The test image described below

_images/Data_Scientist_Developer.png

Download the test image from https://oneconvergence.com/guide/downloads/3.png to your local workstation, then select it in the Upload Image field.
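The DKube Inference application presumably sends the uploaded image to the Model Serving url, authenticated with the Authorization Token. For readers who want to call the endpoint from their own code, the sketch below builds such a request with the Python standard library. It is not DKube's documented API: sending the token as a Bearer header and wrapping the image as base64 under a hypothetical "image" JSON key are both assumptions, and the real request format may differ.

```python
import base64
import json
import urllib.request


def build_inference_request(endpoint: str, token: str,
                            image_bytes: bytes) -> urllib.request.Request:
    """Build (but do not send) a POST request to a model-serving endpoint.

    Assumptions: Bearer-token auth and a JSON body with the image
    base64-encoded under a hypothetical "image" key.
    """
    body = json.dumps(
        {"image": base64.b64encode(image_bytes).decode("ascii")}
    ).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, read the downloaded 3.png in binary mode, pass its bytes together with the endpoint and token, and call urllib.request.urlopen(req) on the result.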

_images/Data_Scientist_Notebooks_Jupyter_Tools_Infapp_Screen.png