
Cue2: a deep learning framework for long-read SV discovery

Table of Contents

Overview
Installation
User Guide
Recommended workflow

Overview

Cue2 is an image-based deep learning framework for structural variant (SV) discovery from PacBio HiFi and ONT long reads. It is an extension of Cue and similarly formulates SV discovery as a multi-class keypoint localization task in custom multi-channel images derived from read alignments to a reference genome. Cue2 expands the set of alignment-derived statistics used to encode SV signatures, constructing 12-channel images that include new channels designed specifically for long reads. The latest Cue2 long-read model was trained on over one million images encompassing a wide range of SV types, sizes, genome contexts, and coverage regimes; it currently supports the discovery of deletions (DEL), duplications (DUP), and inversions (INV) larger than 50bp.

Installation

1. Clone the repository: $> git clone git@github.com:PopicLab/cue2.git

2. Navigate into the cue2 folder: $> cd cue2

3. Set up a Python virtual environment (recommended)

  • Create the virtual environment (in the env directory): $> python3.9 -m venv env
  • Activate the environment: $> source env/bin/activate

4. Install the framework: $> pip install .

5. Set the PYTHONPATH: $> export PYTHONPATH=${PYTHONPATH}:/path/to/cue2

6. Download the latest pre-trained Cue2 model from the project's Google Cloud Storage bucket

User Guide

Execution

  • To call structural variants: $> cue call --config </path/to/config>
  • To train a new model: $> cue train --config </path/to/config>
  • To generate a training dataset: $> cue generate --config </path/to/config>

Each cue command accepts a YAML file with configuration parameters. Template config files are provided in the docs/config_templates/ directory.

The key parameters for each cue command are listed below.

call:

  • bam [required] path to the alignments file (BAM/CRAM format)
  • fa [required] path to the reference FASTA file
  • chr_names [optional] list of chromosomes to process: null (all) or a specific list e.g. ["chr1", "chr21"] (default: null)
  • model_path [required] path to the pretrained Cue model (recommended: the latest available model)
  • gpu_ids [optional] list of GPU ids to use for calling (default: CPU(s) will be used if empty)
  • n_jobs_per_gpu [optional] number of parallel jobs to launch on the same GPU (default: 1)
  • n_cpus [optional] number of CPUs to use for calling if no GPUs are listed (default: 1)

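Putting these parameters together, a call config might look like the sketch below. All paths and values are illustrative placeholders, not shipped defaults; start from the template in docs/config_templates/ for the authoritative layout.

```yaml
# Illustrative call config -- paths are placeholders
bam: /data/sample.hifi.bam          # required: BAM/CRAM alignments
fa: /data/GRCh38.fa                 # required: reference FASTA
model_path: /models/cue2_latest.pt  # required: pretrained Cue2 model
chr_names: ["chr21", "chr22"]       # or null to process all chromosomes
gpu_ids: [0]                        # empty -> CPU(s) are used instead
n_jobs_per_gpu: 1
n_cpus: 4                           # used only when gpu_ids is empty
```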
train:

  • dataset_dirs [required] list of annotated imagesets to use for training
  • dataset_lens [required] list containing the number of images to select from each imageset listed in dataset_dirs
  • gpu_ids [optional] GPU id to use for training (default: a CPU will be used if empty)
  • report_interval [optional] frequency (in number of batches) for reporting training stats and image predictions (default: 50)
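A train config following the same pattern might look as follows; the imageset directories and image counts are invented placeholders.

```yaml
# Illustrative train config -- directories and counts are placeholders
dataset_dirs: ["/data/imagesets/set_a", "/data/imagesets/set_b"]
dataset_lens: [500000, 500000]  # images drawn from each dataset_dirs entry
gpu_ids: [0]                    # empty -> a CPU is used
report_interval: 50             # report stats every 50 batches
```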

generate:

  • bam [required] path to the alignments file (BAM/CRAM format)
  • vcf [required] path to the ground truth SV VCF file
  • fa [required] path to the reference FASTA file
  • n_cpus [optional] number of CPUs to use for image generation (parallelized by chromosome) (default: 1)
  • chr_names [optional] list of chromosomes to process: null (all) or a specific list e.g. ["chr1", "chr21"] (default: null)
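A generate config combines the alignment, truth-set, and reference inputs; again, paths below are placeholders.

```yaml
# Illustrative generate config -- paths are placeholders
bam: /data/sample.hifi.bam   # required: BAM/CRAM alignments
vcf: /data/truth_svs.vcf.gz  # required: ground-truth SV calls
fa: /data/GRCh38.fa          # required: reference FASTA
n_cpus: 8                    # generation is parallelized by chromosome
chr_names: null              # null -> all chromosomes
```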

Recommended workflow

  1. Create a new directory for the experiment.
  2. Place the YAML config file in this directory (see the provided templates).
  3. Populate the YAML config file with the parameters specific to this experiment.
  4. Execute the appropriate cue command, providing the path to the newly configured YAML file. cue automatically creates auxiliary directories with results in the folder where the config YAML file is located.
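The steps above can be sketched as the shell session below. The directory name and config values are illustrative assumptions; in practice, copy a template from docs/config_templates/ and fill in your own paths.

```shell
# 1. Create a new directory for the experiment
mkdir -p sv_experiment

# 2-3. Place and populate the YAML config (placeholder paths shown)
cat > sv_experiment/config.yaml <<'EOF'
bam: /data/sample.hifi.bam
fa: /data/GRCh38.fa
model_path: /models/cue2_latest.pt
chr_names: null
gpu_ids: []
EOF

# 4. Run the caller; result directories are created next to config.yaml:
#    cue call --config sv_experiment/config.yaml
```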
