mobie · constantinpape · Jun 2, 2022 · May 24, 2022 · May 24, 2022 · May 25, 2022
diff --git a/README.md b/README.md
@@ -28,8 +28,10 @@ $ pip install -e .
 
 ## Usage
 
-The library contains functionality to generate MoBIE projects and add data to it.
-Check out [the example notebook](https://github.com/mobie/mobie-utils-python/blob/master/examples/create_mobie_project.ipynb) to see how to generate a MoBIE project.
+The library contains functionality to generate MoBIE projects, add data to it and create complex views.
+For complete examples, please check out the [examples](https://github.com/mobie/mobie-utils-python/blob/master/examples):
+- [normal project creation](https://github.com/mobie/mobie-utils-python/blob/master/examples/create_mobie_project.ipynb): generate a MoBIE project for multi-modal data from a CLEM experiment
+- [htm project creation](https://github.com/mobie/mobie-utils-python/blob/master/examples/create_mobie_htm_project.ipynb): generate a MoBIE project for high-throughput microscopy from a imaging based SARS-CoV-2 antibody assay.
 
 Below is a short code snippet that shows how to use it in a python script.
 
@@ -74,4 +76,4 @@ Run `<COMMAND-NAME> --help` to get more information on how to use them.
 
 ## Citation
 
-If you use the MoBIE framework in your research, please cite [Whole-body integration of gene expression and single-cell morphology](https://www.biorxiv.org/content/10.1101/2020.02.26.961037v1).
+If you use the MoBIE framework in your research, please cite [the MoBIE bioRxiv preprint](https://www.biorxiv.org/content/10.1101/2022.05.27.493763v1).
diff --git a/examples/README.md b/examples/README.md
@@ -0,0 +1,6 @@
+# MoBIE python examples
+
+This folder contains notebooks that demonstrate the usage of the MoBIE python library.
+Currently, we have the following two example notebooks:
+- [create_mobie_project](https://github.com/mobie/mobie-utils-python/blob/master/examples/create_mobie_project.ipynb): generate a MoBIE project for multi-modal data from a CLEM experiment
+- [create_mobie_htm_project](https://github.com/mobie/mobie-utils-python/blob/master/examples/create_mobie_htm_project.ipynb): generate a MoBIE project for high-throughput microscopy from a imaging based SARS-CoV-2 antibody assay.
diff --git a/examples/create_mobie_htm_project.ipynb b/examples/create_mobie_htm_project.ipynb
@@ -0,0 +1,332 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "6da32cff",
+   "metadata": {},
+   "source": [
+    "# Create MoBIE HTM Project\n",
+    "\n",
+    "Create a MoBIE project for high-throughput-microscopy data. The test data for this example is available here: https://owncloud.gwdg.de/index.php/s/eu8JMlUFZ82ccHT. It contains 3 wells  of a plate from a immunofluorescence based SARS-CoV-2 antibody assay from https://onlinelibrary.wiley.com/doi/full/10.1002/bies.202000257."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "38524f13",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# general imports\n",
+    "import os\n",
+    "import string\n",
+    "from glob import glob\n",
+    "\n",
+    "import mobie\n",
+    "import mobie.htm as htm\n",
+    "import pandas as pd\n",
+    "\n",
+    "# the location of the data\n",
+    "# adapt these paths to your system and the input data you are using\n",
+    "\n",
+    "# location of the input data. \n",
+    "# the example data used in this notebook is available via this link:\n",
+    "# https://oc.embl.de/index.php/s/IV1709ZlcUB1k99\n",
+    "example_input_folder = \"/home/pape/Work/data/htm-test-data\"\n",
+    "\n",
+    "# the location of the mobie project that will be created\n",
+    "# we recommend that the mobie project folders have the structure <PROECJT_ROOT_FOLDER/data>\n",
+    "# the folder 'data' will contain the sub-folders for individual datasets\n",
+    "mobie_project_folder = \"/home/pape/Work/data/mobie/mobie_htm_project/data\"\n",
+    "\n",
+    "# name of the dataset that will be created.\n",
+    "# one project can contain multiple datasets\n",
+    "dataset_name = \"example-dataset\"\n",
+    "dataset_folder = os.path.join(mobie_project_folder, dataset_name)\n",
+    "\n",
+    "# the platform and number of jobs used for computation.\n",
+    "# choose 'local' to run computations on your machine.\n",
+    "# for large data, it is also possible to run computation on a cluster;\n",
+    "# for this purpose 'slurm' (for slurm cluster) and 'lsf' (for lsf cluster) are currently supported\n",
+    "target = \"local\"\n",
+    "max_jobs = 4"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "fe5a2a13",
+   "metadata": {},
+   "source": [
+    "## Adding image data\n",
+    "\n",
+    "First, we add all the image data for the 3 wells. Here, we have 3 channels:\n",
+    "- `serum`: showing the measured immunofluorescence of the human serum\n",
+    "- `marker`: showing a marker channel for viral RNA\n",
+    "- `nuclei`: showing the nuclei stained with DAPI\n",
+    "\n",
+    "The function `htm.add_images` will add sources to the dataset metadata for all `input_files` that are passed.\n",
+    "It **will not** add corresponding views to show the individual images. Instead, we will add a grid view below that recreates the plate layout and where all image (and segmentation) sources can be toggled on and off."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "aeb0deb1",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# the individual images are stored as h5 files in the folder with the example data.\n",
+    "# each hdf5 file contains multiple datasets, each corresponding to a different image channel (or segmentation)\n",
+    "input_files = glob(os.path.join(example_input_folder, \"*.h5\"))\n",
+    "input_files.sort()\n",
+    "\n",
+    "# the resolution in micron for this data, as well as the downscaling factors and chunks to be used in the data conversion\n",
+    "resolution = [0.65, 0.65]\n",
+    "scale_factors = 4 * [[2, 2]]\n",
+    "chunks = [512, 512]\n",
+    "\n",
+    "# the 3 image channels (each stored as dataset in the h5 file corresponding to the site)\n",
+    "channels = [\"serum\", \"marker\", \"nuclei\"]\n",
+    "for channel_name in channels:\n",
+    "    # image_names determines the names for the corresponding image sources in MoBIE\n",
+    "    image_names = [os.path.splitext(os.path.basename(im))[0] for im in input_files]\n",
+    "    image_names = [f\"{channel_name}-{name}\" for name in image_names]\n",
+    "\n",
+    "    htm.add_images(input_files, mobie_project_folder, dataset_name,\n",
+    "                   image_names, resolution, scale_factors, chunks, key=channel_name,\n",
+    "                   target=target, max_jobs=max_jobs, file_format=\"ome.zarr\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "dbfe819f",
+   "metadata": {},
+   "source": [
+    "## Add segmentation data\n",
+    "\n",
+    "Next, we add the segmentation data. Here, we have 2 segmentations per site:\n",
+    "- `cells`: the segmentation of individual cells\n",
+    "- `nuclei`: the segmentation of individual nuclei\n",
+    "\n",
+    "`htm.add_segmentations` works very similar to `htm.add_images`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "49d15bd5",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "segmentation_names = [\"cells\", \"nuclei\"]\n",
+    "for seg_name in segmentation_names:\n",
+    "    image_names = [os.path.splitext(os.path.basename(im))[0] for im in input_files]\n",
+    "    image_names = [f\"segmentation-{seg_name}-{name}\" for name in image_names]\n",
+    "    \n",
+    "    htm.add_segmentations(input_files, mobie_project_folder, dataset_name,\n",
+    "                          image_names, resolution, scale_factors, chunks, key=f\"segmentation/{seg_name}\",\n",
+    "                          target=target, max_jobs=max_jobs, file_format=\"ome.zarr\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5d7e7a6f",
+   "metadata": {},
+   "source": [
+    "## Add views to create plate layout\n",
+    "\n",
+    "Finally, we create the view with the plate layout and data, using MoBIE `grid` transformations and `regionDisplays`.\n",
+    "In addition to the layout, we can also add tables associated with wells, or with individual sites (=image positions). Here, we can use the example table for our test data from: https://owncloud.gwdg.de/index.php/s/m1ILROJc7Chnu9h"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "79793e3e",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# first, we need to define function that translate source names to site names, site_names to well names and \n",
+    "# that return the 2d grid position for a given well\n",
+    "\n",
+    "\n",
+    "# extract the site name (= Well name and position in well for an image)\n",
+    "# here, the site name comes in the source name after the source prefix, i.e.\n",
+    "# source_name = f\"{prefix}_{site_name}\"\n",
+    "def to_site_name(source_name, prefix):\n",
+    "    return source_name[(len(prefix) + 1):]\n",
+    "\n",
+    "\n",
+    "# extract the well name from the site name.\n",
+    "# here, the site name consists of well name and position in the well, i.e.\n",
+    "# source_name = f\"{well_name}_{position_in_well}\"\n",
+    "def to_well_name(site_name):\n",
+    "    return site_name.split(\"_\")[0]\n",
+    "\n",
+    "\n",
+    "# map the well name to its position in the 2d grid\n",
+    "# here, the Wells are called C01, C02, etc.\n",
+    "def to_position(well_name):\n",
+    "    r,c = well_name[0], well_name[1:]\n",
+    "    r = string.ascii_uppercase.index(r)\n",
+    "    c = int(c) - 1\n",
+    "    return [c, r]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "aa6e9438",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# all our source prefixes (= image channel / segmentation names)\n",
+    "# and the corresponding source types\n",
+    "source_prefixes = [\"nuclei\", \"serum\", \"marker\", \"segmentation-cells\", \"segmentation-nuclei\"]\n",
+    "source_types = [\"image\", \"image\", \"image\", \"segmentation\", \"segmentation\"]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "031d6629",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# compute the contrast limits for the image channels\n",
+    "# (this is not strictly necessaty, but usually very beneficial for htm data to obtain a reasonable visualization of the data)\n",
+    "clims_nuclei = htm.compute_contrast_limits(\"nuclei\", dataset_folder, lower_percentile=4, upper_percentile=96, n_threads=max_jobs)\n",
+    "clims_serum = htm.compute_contrast_limits(\"serum\", dataset_folder, lower_percentile=4, upper_percentile=96, n_threads=max_jobs)\n",
+    "clims_marker = htm.compute_contrast_limits(\"marker\", dataset_folder, lower_percentile=4, upper_percentile=96, n_threads=max_jobs)\n",
+    "\n",
+    "# specifiy the settings for all the sources\n",
+    "source_settings = [ \n",
+    "    # nucleus channel: color blue\n",
+    "    {\"color\": \"blue\", \"contrastLimits\": clims_nuclei, \"visible\": True},\n",
+    "    # serum channel: color green\n",
+    "    {\"color\": \"green\", \"contrastLimits\": clims_serum, \"visible\": False},\n",
+    "    # marker channel: color red\n",
+    "    {\"color\": \"red\", \"contrastLimits\": clims_marker, \"visible\": False},\n",
+    "    # the settings for the 2 segmentations\n",
+    "    {\"lut\": \"glasbey\", \"tables\": [\"default.tsv\"], \"visible\": False, \"showTable\": False},\n",
+    "    {\"lut\": \"glasbey\", \"tables\": [\"default.tsv\"], \"visible\": False, \"showTable\": False},\n",
+    "]  "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "85ea862d",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# create table for the sites (individual images)\n",
+    "\n",
+    "# adapt this to where this path is on your system\n",
+    "site_table_path = \"/home/pape/Work/data/htm-test-data/site-table.tsv\"\n",
+    "table = pd.read_csv(site_table_path, sep=\"\\t\")\n",
+    "\n",
+    "# the tables should be saved in a path relative to the dataset root folder,\n",
+    "# and are usually stored in the subfolder 'tables', with another sub-folder for each source or view with table(s)\n",
+    "table_out_path = os.path.join(dataset_folder, \"tables\", \"sites\")\n",
+    "os.makedirs(table_out_path, exist_ok=True)\n",
+    "table_out_path = os.path.join(table_out_path, \"default.tsv\")\n",
+    "\n",
+    "# we need to rename the site name from its representation in the table (C01-0001) to our representation (C01-1)\n",
+    "def rename_site(site_name):\n",
+    "    well, image_id = site_name.split(\"-\")\n",
+    "    image_id = int(image_id)\n",
+    "    return f\"{well}_{image_id}\"\n",
+    "\n",
+    "table[\"sites\"] = table[\"sites\"].apply(rename_site)\n",
+    "\n",
+    "# the first column in tables for a MoBIE region display (which is used internally by the grid view)\n",
+    "# has to be called \"regionId\"\n",
+    "table = table.rename(columns={\"sites\": \"region_id\"})\n",
+    "table.to_csv(table_out_path, sep=\"\\t\", index=False)\n",
+    "print(table)\n",
+    "\n",
+    "# this is the relative path to the table folder, in relation to the dataset folder\n",
+    "site_table_folder = os.path.split(os.path.relpath(table_out_path, dataset_folder))[0]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "829020cf",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# we can also create a table for the wells; the procedure here is similar to the case where the images were added\n",
+    "\n",
+    "well_table_path = \"/home/pape/Work/data/htm-test-data/well-table.tsv\"\n",
+    "table = pd.read_csv(well_table_path, sep=\"\\t\")\n",
+    "\n",
+    "table_out_path = os.path.join(dataset_folder, \"tables\", \"wells\")\n",
+    "os.makedirs(table_out_path, exist_ok=True)\n",
+    "table_out_path = os.path.join(table_out_path, \"default.tsv\")\n",
+    "\n",
+    "table = table.rename(columns={\"wells\": \"region_id\"})\n",
+    "table.to_csv(table_out_path, sep=\"\\t\", index=False)\n",
+    "print(table)\n",
+    "\n",
+    "well_table_folder = os.path.split(os.path.relpath(table_out_path, dataset_folder))[0]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f0b36b9b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# crate the plate grid view\n",
+    "dataset_folder = os.path.join(mobie_project_folder, dataset_name)\n",
+    "htm.add_plate_grid_view(dataset_folder, view_name=\"default\",\n",
+    "                        source_prefixes=source_prefixes, source_types=source_types, source_settings=source_settings,\n",
+    "                        source_name_to_site_name=to_site_name, site_name_to_well_name=to_well_name,\n",
+    "                        well_to_position=to_position, site_table=site_table_folder, well_table=well_table_folder,\n",
+    "                        sites_visible=False, menu_name=\"bookmark\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "1901c4d1",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "mobie.validation.validate_project(mobie_project_folder)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7d685231",
+   "metadata": {},
+   "source": [
+    "For adding the necessary metadata to share the project via s3, and options for uploading it to s3, please check out the last cells of the `ceate_mobie_project.ipynb` notebook (in the same folder on github as this one)."
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.9.7"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}