A Xenium spatial transcriptomics analysis pipeline built with Nextflow and Singularity/Apptainer. The pipeline uses OpenCV (cv2) to automatically identify tissue contours and bounding boxes, which makes it especially suited to tissue microarrays and multi-sample slides.
- Build a container, sopa.sif, using the definition file from this repository.
- Export two environment variables:
  - CONTAINERDIR → path to the container image.
  - PROJECTDIR → path to the project directory (mounted inside the container at runtime).
- Adjust resource settings in resources.config for each pipeline step.
- Create a run-specific config (e.g., run01.config) in the config directory, based on run.template.
- Create an environment with Nextflow installed using make env_create.
- Run the pipeline using make nf_run.
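Before launching a run, it can help to confirm that both environment variables are exported and point to paths that exist. The following is a minimal sketch of such a preflight check, not part of the pipeline itself; the function name check_environment is hypothetical, and the only assumption taken from the steps above is that CONTAINERDIR and PROJECTDIR must be set.

```python
import os
from pathlib import Path

# Variables named in the setup steps above.
REQUIRED_VARS = ("CONTAINERDIR", "PROJECTDIR")


def check_environment(env=None):
    """Return a list of setup problems; an empty list means both paths look usable.

    Hypothetical helper for illustration; the pipeline does not ship it.
    """
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED_VARS:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not Path(value).exists():
            problems.append(f"{name} points to a missing path: {value}")
    return problems


# Check the current shell environment and report anything that needs fixing.
for problem in check_environment():
    print("setup problem:", problem)
```

Passing a dictionary instead of relying on os.environ keeps the check easy to exercise without touching the real shell environment.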
The workflow processes raw Xenium output through to the identification of gene programs with cNMF.
- CONVERT_XENIUM converts raw data to a SpatialData-formatted Zarr archive for downstream processing.
- RESEGMENT_NUCLEI applies Cellpose or StarDist to resegment nuclei.
- RESEGMENT_CELLS applies Baysor or Proseg to refine segmentation based on transcript distributions.
- DETECT_TISSUE identifies tissue contours and bounding boxes, splitting multi-sample slides into independent samples (especially useful for tissue microarrays).
- SPLIT_SAMPLES creates one AnnData h5ad archive per sample based on the tissue contours.
- IDENTIFY_PROGRAMS runs cNMF on each sample. Note: The pipeline does not automatically select the optimal number of programs; this step is left to the user.
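The idea behind DETECT_TISSUE and SPLIT_SAMPLES can be illustrated with a small, dependency-free sketch. It is not the pipeline's implementation: the pipeline uses cv2 contours on real images, whereas this example substitutes a plain connected-component flood fill on a toy binary mask, then assigns cells to samples by testing whether each centroid falls inside a bounding box. The function names, the mask, and the cell coordinates are all made up for illustration.

```python
from collections import deque


def tissue_bounding_boxes(mask, min_area=1):
    """Return (x_min, y_min, x_max, y_max) for each 4-connected foreground region.

    Stand-in for contour detection; regions smaller than min_area pixels are dropped.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y0 in range(height):
        for x0 in range(width):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first flood fill over one connected tissue region.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            x_min = x_max = x0
            y_min = y_max = y0
            area = 0
            while queue:
                x, y = queue.popleft()
                area += 1
                x_min, x_max = min(x_min, x), max(x_max, x)
                y_min, y_max = min(y_min, y), max(y_max, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if area >= min_area:
                boxes.append((x_min, y_min, x_max, y_max))
    return boxes


def assign_cells(cells, boxes):
    """Map each cell id to the index of the first box containing its centroid, else None."""
    assignment = {}
    for cell_id, (cx, cy) in cells.items():
        assignment[cell_id] = next(
            (
                i
                for i, (x0, y0, x1, y1) in enumerate(boxes)
                if x0 <= cx <= x1 and y0 <= cy <= y1
            ),
            None,
        )
    return assignment


# Toy slide: two tissue cores separated by background (0 = background, 1 = tissue).
mask = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
]
boxes = tissue_bounding_boxes(mask)

# Hypothetical cell centroids as (x, y); cell_c lies outside every core.
cells = {"cell_a": (0.5, 0.5), "cell_b": (4.2, 1.8), "cell_c": (3.0, 3.0)}
samples = assign_cells(cells, boxes)
```

Each distinct box index corresponds to one sample, which is the grouping SPLIT_SAMPLES would write out as a separate h5ad file. Cells that fall outside every box map to None and would need to be dropped or handled explicitly.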