
Cue2: a deep learning framework for long-read SV discovery

Table of Contents

Overview
Installation
User Guide
Recommended workflow

Overview

Cue2 is an image-based deep learning framework for structural variant (SV) discovery from PacBio HiFi and ONT long reads. It is an extension of Cue and similarly formulates SV discovery as a multi-class keypoint localization task in custom multi-channel images derived from read alignments to a reference genome. Cue2 expands the set of alignment-derived statistics used to encode SV signatures, constructing 12-channel images that include new channels designed specifically for long reads. The latest Cue2 long-read model was trained on over one million images encompassing a wide range of SV types, sizes, genome contexts, and coverage regimes; it currently supports the discovery of deletions (DEL), duplications (DUP), and inversions (INV) larger than 50bp.

Installation

1. Clone the repository: $> git clone git@github.com:PopicLab/cue2.git

2. Navigate into the cue2 folder: $> cd cue2

3. Set up a Python virtual environment (recommended)

  • Create the virtual environment (in the env directory): $> python3.9 -m venv env
  • Activate the environment: $> source env/bin/activate

4. Install the framework: $> pip install .

5. Set the PYTHONPATH: $> export PYTHONPATH=${PYTHONPATH}:/path/to/cue2

6. Download the latest pre-trained Cue2 model from the project's Google Cloud Storage bucket

User Guide

Execution

  • To call structural variants: $> cue call --config </path/to/config>
  • To train a new model: $> cue train --config </path/to/config>
  • To generate a training dataset: $> cue generate --config </path/to/config>

Each cue command accepts a YAML file with configuration parameters. Template config files are provided in the docs/config_templates/ directory.

The key parameters for each cue command are listed below.

call:

  • bam [required] path to the alignments file (BAM/CRAM format)
  • fa [required] path to the reference FASTA file
  • chr_names [optional] list of chromosomes to process: null (all) or a specific list e.g. ["chr1", "chr21"] (default: null)
  • model_path [required] path to the pretrained Cue model (recommended: the latest available model)
  • gpu_ids [optional] list of GPU ids to use for calling (default: CPU(s) will be used if empty)
  • n_jobs_per_gpu [optional] number of parallel jobs to launch on the same GPU (default: 1)
  • n_cpus [optional] number of CPUs to use for calling if no GPUs are listed (default: 1)

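Putting these parameters together, a call config might look like the sketch below. All paths and values are illustrative placeholders, not shipped defaults; start from the template in docs/config_templates/ for the authoritative layout.

```yaml
# Illustrative call config -- paths are placeholders
bam: /data/sample.hifi.bam          # required: BAM/CRAM alignments
fa: /data/GRCh38.fa                 # required: reference FASTA
model_path: /models/cue2_latest.pt  # required: pretrained Cue2 model
chr_names: ["chr21", "chr22"]       # or null to process all chromosomes
gpu_ids: [0]                        # empty -> CPU(s) are used instead
n_jobs_per_gpu: 1
n_cpus: 4                           # used only when gpu_ids is empty
```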
train:

  • dataset_dirs [required] list of annotated imagesets to use for training
  • dataset_lens [required] list containing the number of images to select from each imageset listed in dataset_dirs
  • gpu_ids [optional] GPU id to use for training (default: a CPU will be used if empty)
  • report_interval [optional] frequency (in number of batches) for reporting training stats and image predictions (default: 50)
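A train config following the same pattern might look as follows; the imageset directories and image counts are invented placeholders.

```yaml
# Illustrative train config -- directories and counts are placeholders
dataset_dirs: ["/data/imagesets/set_a", "/data/imagesets/set_b"]
dataset_lens: [500000, 500000]  # images drawn from each dataset_dirs entry
gpu_ids: [0]                    # empty -> a CPU is used
report_interval: 50             # report stats every 50 batches
```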

generate:

  • bam [required] path to the alignments file (BAM/CRAM format)
  • vcf [required] path to the ground truth SV VCF file
  • fa [required] path to the reference FASTA file
  • n_cpus [optional] number of CPUs to use for image generation (parallelized by chromosome) (default: 1)
  • chr_names [optional] list of chromosomes to process: null (all) or a specific list e.g. ["chr1", "chr21"] (default: null)
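A generate config combines the alignment, truth-set, and reference inputs; again, paths below are placeholders.

```yaml
# Illustrative generate config -- paths are placeholders
bam: /data/sample.hifi.bam   # required: BAM/CRAM alignments
vcf: /data/truth_svs.vcf.gz  # required: ground-truth SV calls
fa: /data/GRCh38.fa          # required: reference FASTA
n_cpus: 8                    # generation is parallelized by chromosome
chr_names: null              # null -> all chromosomes
```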

Recommended workflow

  1. Create a new directory for the experiment.
  2. Place the YAML config file in this directory (see the provided templates).
  3. Populate the YAML config file with the parameters specific to this experiment.
  4. Execute the appropriate cue command, providing the path to the newly configured YAML file. cue automatically creates auxiliary directories with results in the folder where the config YAML file is located.
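The steps above can be sketched as the shell session below. The directory name and config values are illustrative assumptions; in practice, copy a template from docs/config_templates/ and fill in your own paths.

```shell
# 1. Create a new directory for the experiment
mkdir -p sv_experiment

# 2-3. Place and populate the YAML config (placeholder paths shown)
cat > sv_experiment/config.yaml <<'EOF'
bam: /data/sample.hifi.bam
fa: /data/GRCh38.fa
model_path: /models/cue2_latest.pt
chr_names: null
gpu_ids: []
EOF

# 4. Run the caller; result directories are created next to config.yaml:
#    cue call --config sv_experiment/config.yaml
```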
