Merged
Changes from 2 commits
Binary file added docs/examples/application-pipelines.png
Binary file added docs/examples/basic-io.png
Binary file added docs/examples/display-cifar.png
Binary file added docs/examples/display-rand-img.png
Binary file added docs/examples/download.png
66 changes: 66 additions & 0 deletions docs/examples/redshift.rst
@@ -0,0 +1,66 @@
Redshift Example Project
========================

This project provides a small collection of generalized pipelines for training and using redshift estimation models. It is designed to be simple to use, requiring only that the configuration parameters of individual nodes be defined where necessary. The most involved alteration most users should need is the definition of additional architectures in the **Resources** tab. Note that any newly defined architecture should have an output length and input shape that match the *num_bins* and *input_shape* configuration parameters used in the various pipelines.

Pipeline list
-------------

* `Train Test Single`_
* `Train Test Compare`_
* `Download Train Evaluate`_
* `Train Predict`_
* `Predict Pretrained`_
* `Test Pretrained`_
* `Download SDSS`_
* `Download Train Predict`_

.. * `Visualize Predictions`_
.. * `Train Visualize`_

.. figure:: application-pipelines.png
   :align: center
   :width: 75%

Pipelines
---------

Train Test Single
~~~~~~~~~~~~~~~~~
Trains and evaluates a single CNN model. Uses predefined artifacts that contain the training and testing data. For this and all training pipelines, each artifact should contain a single numpy array. Input arrays should be 4D arrays of shape **(n, y, x, c)**, where n = number of images, y = image height, x = image width, and c = number of color channels. Output (label) arrays should be of shape **(n,)**.
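As a concrete illustration of the expected artifact shapes (the sizes here are hypothetical examples, not values required by the pipelines):

.. code-block:: python

    import numpy as np

    # Hypothetical sizes: 100 RGB images of 32x32 pixels.
    n, y, x, c = 100, 32, 32, 3

    # Input artifact: one 4D array of shape (n, y, x, c).
    train_imgs = np.random.rand(n, y, x, c)

    # Label artifact: one continuous redshift value per image, shape (n,).
    train_labels = np.random.rand(n)

    assert train_imgs.shape == (100, 32, 32, 3)
    assert train_labels.shape == (100,)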

.. Visualize Predictions
.. ~~~~~~~~~~~~~~~~~~~~~


Train Test Compare
~~~~~~~~~~~~~~~~~~
Trains and evaluates two CNN models and compares effectiveness of the models.

Download Train Evaluate
~~~~~~~~~~~~~~~~~~~~~~~
Downloads SDSS images, trains a model on the images, and evaluates the model on a separate set of downloaded images. Care should be taken when defining your own CasJobs query to ensure that all queried galaxies for training have a redshift value below the **Train** node’s *max_val* configuration parameter’s value.

Train Predict
~~~~~~~~~~~~~
Trains a single CNN model and uses the newly trained model to predict the redshift value of another set of galaxies.

Predict Pretrained
~~~~~~~~~~~~~~~~~~
Predicts the redshift value of a set of galaxies using a pre-existing model that is saved as an artifact.

Test Pretrained
~~~~~~~~~~~~~~~
Evaluates the performance of a pre-existing model that is saved as an artifact.

.. Train Visualize
.. ~~~~~~~~~~~~~~~


Download SDSS
~~~~~~~~~~~~~
Downloads SDSS images and saves them as artifacts. These can be used in conjunction with the other pipelines that rely on artifacts rather than images retrieved at execution time.

Download Train Predict
~~~~~~~~~~~~~~~~~~~~~~
Downloads SDSS images and uses some of them to train a model before using that model to predict the redshift values of the remaining galaxies.
163 changes: 163 additions & 0 deletions docs/examples/rs-tutorial.rst
@@ -0,0 +1,163 @@
Tutorial Project - Redshift
===========================

Pipeline list
-------------
1. `Basic Input/Output`_
2. `Display Random Image`_
3. `Display Random CIFAR-10`_
4. `Train CIFAR-10`_
5. `Train-Test`_
6. `Train-Test-Compare`_
7. `Download-Train-Evaluate`_

.. 6. `Visualize Predictions`_

Pipelines
---------


Basic Input/Output
~~~~~~~~~~~~~~~~~~
This pipeline provides one of the simplest examples of a pipeline possible in DeepForge. Its sole purpose is to create an array of numbers, pass the array from the first node to the second node, and print the array to the output console.

.. figure:: basic-io.png
   :align: center


.. code-block:: python

    import numpy

    class GenArray():
        def __init__(self, length=10):
            self.length = length
            return

        def execute(self):
            arr = list(numpy.random.rand(self.length))
            return arr


Display Random Image
~~~~~~~~~~~~~~~~~~~~
.. figure:: display-rand-img.png
   :align: center

This pipeline’s primary purpose is to show how graphics can be output and viewed. A random noise image is generated and displayed using matplotlib’s pyplot library. Any graphic displayed using the **plt.show()** function can be viewed in the executions tab.

.. code-block:: python

    from matplotlib import pyplot as plt
    from random import randint

    class DisplayImage():
        def execute(self, image):
            if len(image.shape) == 4:
                image = image[randint(0, image.shape[0] - 1)]
            plt.imshow(image)
            plt.show()

Display Random CIFAR-10
~~~~~~~~~~~~~~~~~~~~~~~
.. figure:: display-cifar.png
   :align: center

As with the previous pipeline, this pipeline simply displays a single image. The image from this pipeline, however, is more meaningful, as it is drawn from the commonly used CIFAR-10 dataset. This pipeline seeks to provide an example of the input being used in the next pipeline while providing an example of how the data can be obtained. This is important for users who seek to develop their own pipelines, as CIFAR-10 data generally serves as an effective baseline for testing and development of new CNN architectures or training processes.

Also note, as shown in the figure above, that it is not necessary to utilize all of the outputs of a given node. Unless specifically handled, however, it is generally inappropriate for an input to be left undefined.

.. code-block:: python

    from keras.datasets import cifar10

    class GetDataCifar():
        def execute(self):
            ((train_imgs, train_labels),
             (test_imgs, test_labels)) = cifar10.load_data()
            return train_imgs, train_labels, test_imgs, test_labels

Train CIFAR-10
~~~~~~~~~~~~~~
.. figure:: train-basic.png
   :align: center

This pipeline gives a very basic example of how to create, train, and evaluate a simple CNN. The primary takeaway should be the overall structure of a training pipeline, which in most cases follows these steps:

1. Load data
2. Define the loss, optimizer, and other metrics
3. Compile model, with loss, metrics, and optimizer, using the **compile()** method
4. Train model using the **fit()** method, which requires the training inputs and outputs
5. Output the trained model for serialization and/or utilization in subsequent nodes

.. code-block:: python

    import numpy as np
    import keras

    class TrainBasic():
        def __init__(self, model, epochs=20, batch_size=32, shuffle=True):
            self.model = model
            self.epochs = epochs
            self.batch_size = batch_size
            self.shuffle = shuffle
            return

        def execute(self, train_imgs, train_labels):
            opt = keras.optimizers.rmsprop(lr=0.001)
            self.model.compile(loss='sparse_categorical_crossentropy',
                               optimizer=opt,
                               metrics=['sparse_categorical_accuracy'])
            self.model.fit(train_imgs,
                           train_labels,
                           batch_size=self.batch_size,
                           epochs=self.epochs,
                           shuffle=self.shuffle,
                           verbose=2)
            model = self.model
            return model

.. code-block:: python

    class EvalBasic():
        def __init__(self):
            return

        def execute(self, model, test_imgs, test_labels):
            results = model.evaluate(test_imgs, test_labels, verbose=0)
            for i, metric in enumerate(model.metrics_names):
                print(metric, '-', results[i])
            return results

Train-Test
~~~~~~~~~~
.. figure:: train-basic.png
   :align: center

This pipeline provides an example of how one might train and evaluate a redshift estimation model. For the training process, there are two primary additions that should be noted.

First, the **Train** class has been given a function named **to_categorical**. Because we are using categorization models for redshift estimation in this tutorial, the keras model expects the output labels to be either one-hot vectors or a single integer where the position/value indicates the range in which the true redshift value falls. This function converts the continuous redshift values into the necessary discrete, categorical format.
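A minimal sketch of such a conversion (not the project's actual implementation; the *num_bins* and *max_val* defaults here are assumed for illustration) might bin the continuous values like this:

.. code-block:: python

    import numpy as np

    def to_categorical_sketch(z, num_bins=180, max_val=0.4):
        """Convert continuous redshifts into one-hot bin vectors."""
        # Bin edges spanning [0, max_val]; values at or above max_val
        # are clipped into the last bin.
        edges = np.linspace(0, max_val, num_bins + 1)
        idx = np.clip(np.digitize(z, edges) - 1, 0, num_bins - 1)
        onehot = np.zeros((len(z), num_bins))
        onehot[np.arange(len(z)), idx] = 1
        return onehot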

Second, a class has been provided to give an example of how researchers may define their own Sequence for training. Sequences are helpful in that they allow alterations to be made to the data during training. In the example given here, the **SdssSequence** class provides the ability to rotate or flip images before every epoch, which will hopefully improve the robustness of the final model.

The evaluation node has also been updated to provide metrics more in line with redshift estimation. Specifically, it calculates the fraction of outlier predictions, the model's prediction bias, the deviation in the MAD scores of the model output, and the average Continuous Ranked Probability Score (CRPS) of the output.
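A rough sketch of the first three of these metrics (the outlier threshold and the use of residuals normalized by 1 + z are assumptions for illustration, and CRPS is omitted for brevity):

.. code-block:: python

    import numpy as np

    def redshift_metrics(z_true, z_pred, outlier_thresh=0.05):
        # Residuals normalized by 1 + z, as is common in photo-z evaluation.
        dz = (z_pred - z_true) / (1 + z_true)
        bias = np.mean(dz)
        # NMAD: scaled median absolute deviation from the median residual.
        nmad = 1.4826 * np.median(np.abs(dz - np.median(dz)))
        outlier_frac = np.mean(np.abs(dz) > outlier_thresh)
        return bias, nmad, outlier_frac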


.. Visualize Predictions
.. ~~~~~~~~~~~~~~~~~~~~~


Train-Test-Compare
~~~~~~~~~~~~~~~~~~
.. figure:: train-compare.png
   :align: center

This pipeline gives a more complicated example of how to create visualizations that may be helpful for understanding the effectiveness of a model. The **EvalCompare** node provides a simple comparison visualization of two models.


Download-Train-Evaluate
~~~~~~~~~~~~~~~~~~~~~~~
.. figure:: download.png
   :align: center

This pipeline provides an example of how data can be retrieved and utilized in the same pipeline. The previous pipelines use manually uploaded artifacts. In many real cases, users may desire to retrieve novel data or more specific data using SciServer’s CasJobs API. In such cases, the **DownloadSDSS** node here makes downloading data relatively simple for users. It should be noted that the data downloaded is not in a form easily usable by our models and first requires moderate preprocessing, which is performed in the **Preprocessing** node. This general structure of download-process-train is a common pattern, as data is rarely supplied in a clean, immediately usable format.
Binary file added docs/examples/train-basic.png
Binary file added docs/examples/train-compare.png
Binary file added docs/examples/train-single.png
Binary file added docs/fundamentals/artifacts_tab.png
Binary file added docs/fundamentals/custom_serializer.png
Binary file added docs/fundamentals/custom_utils.png
Binary file added docs/fundamentals/execute_pipeline.png
Binary file added docs/fundamentals/execution_finished.png
Binary file added docs/fundamentals/executions_tab.png
Binary file added docs/fundamentals/import_artifact.png
105 changes: 105 additions & 0 deletions docs/fundamentals/interface.rst
@@ -0,0 +1,105 @@
DeepForge Interface
===================
The DeepForge editor interface is separated into six views for defining all of the necessary features of your desired project. Each view is described below. You can switch between views at any time by clicking the appropriate icon on the left side of the screen. In order, the tabs are:

+---------------+--------------------------+
| |tabs| | - Pipelines_ |
| | - Executions_ |
| | - Resources_ |
| | - Artifacts_ |
| | - `Custom Utils`_ |
| | - `Custom Serialization`_|
+---------------+--------------------------+

.. |tabs| image:: interface_tabs.png

Pipelines
---------
.. figure:: pipelines_tab.png
   :align: center
   :width: 75%

In the initial view, all pipelines that currently exist in the project are displayed. New pipelines can be created using the red plus symbol in the bottom right. From this screen, existing pipelines can also be opened for editing, deleted, or renamed. Pipelines in this list are arranged automatically by the system and cannot be manually reordered in the current implementation.

Pipeline editing
~~~~~~~~~~~~~~~~
.. figure:: pipeline_example.png
   :align: center
   :width: 50%

Pipelines are composed of a directed graph of nodes, where each node is an isolated python module. Nodes are added to a pipeline using the red plus button in the bottom right of the workspace. Any node that has previously been defined in the project can be added to the pipeline, or new operations can be created as needed. Arrows in the workspace indicate the passing of data between nodes. These arrows are created by clicking on the desired output (bottom circle) of the first node and then clicking on the desired input (top circle) of the second node. Clicking on a node also gives the options to delete (red X), edit (blue </>), or change its attributes. Information on editing nodes can be found in `Custom Operations <custom_operations.rst>`_.

Pipelines are executed by clicking the yellow play button in the bottom right of the workspace. In the window that appears, you can name the execution, select a computation platform, and select a storage platform. The computation platform can be either SciServer's Compute service or a WebGME platform. The available storage platforms are SciServer's Files service and Amazon's S3 service. The chosen storage option will be used for storing both the output objects defined in the pipeline and all files used in execution of the pipeline. Login credentials will be required for the SciServer computation service, either storage service, and each individual input node in the pipeline.

.. figure:: execute_pipeline.png
   :align: center
   :width: 75%

Executions
----------
.. figure:: executions_tab.png
   :align: center
   :width: 75%

This view allows the review of previous pipeline executions. Clicking on any execution will display any plotted data generated by the pipeline, and selecting multiple executions will display all of the selected plots together. Clicking the provided links will open either the associated pipeline or a trace of the execution (shown below). The blue icon in the top right of every node allows viewing the text output of that node. The execution trace can be viewed during execution to check the status of a running job. During execution, the color of a node indicates its current status. The possible statuses are:

- **Dark gray**: Awaiting initialization
- **Light gray**: Awaiting execution
- **Yellow**: Currently executing
- **Green**: Successfully finished execution
- **Red**: Execution failed

.. figure:: execution_finished.png
   :align: center
   :width: 50%

Resources
---------
.. figure:: resources_tab.png
   :align: center
   :width: 75%

This view shows the neural network resources available to your pipelines. From this view, resources can be created, deleted, and renamed. Resources are arranged by the DeepForge system and cannot be manually reordered.

.. figure:: neural_network.png
   :align: center
   :width: 50%

As with pipelines, the neural networks are depicted as directed graphs. Each node in the graph corresponds to a single layer or operation in the network (information on operations can be found on the `keras website <https://keras.io/api/>`_). Clicking on a node provides the ability to change the attributes of that layer, delete the layer, or add new layers before or after the current node. Many operations require that certain attributes be defined before use. The Conv2D node pictured above, for example, requires that the *filters* and *kernel_size* attributes be defined. If these are left as *<none>*, a visual indicator will show that there is an error to help prevent mistakes. In order to ease analysis and development, hovering over any connecting line will display the shape of the data as it moves between the given layers.

Artifacts
---------
.. figure:: artifacts_tab.png
   :align: center
   :width: 75%

In this view, you can see all artifacts that are available to your pipelines. These artifacts can be used in any pipeline through the inclusion of the built-in **Input** node. Artifacts are, by default, only supported in the form of either keras models (such as those created using the `keras.model.save_model <https://keras.io/api/models/model_saving_apis/#save_model-function>`_ function) or python pickle objects. Other artifact types can also be used, but require the definition of a `custom serialization <Custom Serialization_>`_. A new artifact can be created in one of three ways. First, artifacts are automatically created during the execution of any pipeline that includes the built-in **Output** node. Second, artifacts can be directly uploaded in this view using the red upload button in the bottom right of the workspace. Using this option will also upload the artifact to the storage platform specified in the popup window. Finally, artifacts that already exist in one of the storage platforms can be imported using the blue import button in the bottom right of the workspace.

|import| |upload|

.. |import| image:: import_artifact.png
   :width: 45%
.. |upload| image:: upload_artifact.png
   :width: 45%


Custom Utils
------------
.. figure:: custom_utils.png
   :align: center
   :width: 75%

This view allows the creation and editing of custom utility modules. Utilities created here can be imported into any pipeline node. For example, the *swarp_config_string* shown above can be printed out in a node using the following code:

.. code-block:: python

    import utils.swarp_string as ss
    print(ss.swarp_config_string)

Custom Serialization
--------------------
.. figure:: custom_serializer.png
   :align: center
   :width: 75%

In this view, you can create custom serialization protocols for the creation and use of artifacts that are neither python pickle objects nor keras models. To create a serialization, you will need to define two functions, one for serialization and one for deserialization. These functions must then be passed as arguments to the *deepforge.serialization.register* function as shown in the commented code above. The serializer and deserializer should have the same signatures as the **dump** and **load** functions, respectively, from python's `pickle module <https://docs.python.org/3/library/pickle.html>`_.
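As a hedged sketch, a serializer/deserializer pair mirroring the **pickle.dump** and **pickle.load** signatures could look like the following. The JSON format here is purely illustrative, and the registration call is left commented because its exact arguments should follow the commented template shown in the editor:

.. code-block:: python

    import json

    def save_json(obj, outfile):
        # Same signature as pickle.dump: (object, writable binary file).
        outfile.write(json.dumps(obj).encode('utf-8'))

    def load_json(infile):
        # Same signature as pickle.load: readable binary file -> object.
        return json.loads(infile.read().decode('utf-8'))

    # Registration would then follow the commented template in the editor,
    # e.g. deepforge.serialization.register(..., save_json, load_json)
    # (exact arguments per the in-editor example; assumed here).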
Binary file added docs/fundamentals/interface_tabs.png
Binary file added docs/fundamentals/neural_network.png
Binary file added docs/fundamentals/pipeline_example.png
Binary file added docs/fundamentals/pipelines_tab.png
Binary file added docs/fundamentals/resources_tab.png
Binary file added docs/fundamentals/upload_artifact.png
8 changes: 8 additions & 0 deletions docs/index.rst
@@ -17,6 +17,7 @@ Welcome to DeepForge's documentation!
   :maxdepth: 1
   :caption: Fundamentals

   fundamentals/interface.rst
   fundamentals/custom_operations.rst
   fundamentals/integration.rst

@@ -29,6 +30,13 @@ Welcome to DeepForge's documentation!
   deployment/overview.rst
   deployment/native.rst

.. toctree::
   :maxdepth: 1
   :caption: Example Projects

   examples/rs-tutorial.rst
   examples/redshift.rst

.. toctree::
   :maxdepth: 1
   :caption: Reference