New Feature - Flexible analysis start #55

Merged
sguizard merged 34 commits into dev from feature-flexible_start on Apr 9, 2026

sguizard commented Nov 1, 2025

Collaborator

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs?
  • If necessary, also make a PR on the nf-core/isoseq branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

Iso-Seq providers deliver sequences in many different formats depending on the pre-processing they apply (subreads, CCS, full-length Iso-Seq). This is even more true with the new MAS-seq.
I had implemented the ability to deal with these formats through options. However, using them together with the option to skip Iso-Seq processing and align made the samplesheet and the usage of the pipeline complex.

In this PR, I changed the way input sequences are injected into the pipeline. It is now possible to start the analysis from ccs, lima, isoseq refine, or at the mapping step. The different types of inputs can even be mixed in the samplesheet.
This modification simplifies both the usage and the code.

It is no longer necessary to deal with the different entrypoints. The input files are injected at the right point in the main channel paths.
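
To illustrate the idea of mixed entry points, here is a minimal Python sketch. It is not the pipeline's Nextflow code. Only the `start_from` field name comes from this PR; the stage names, their order, the sample rows, and the assumption that a sample runs the stage it enters at plus every later one are all hypothetical.

```python
# Hypothetical, ordered list of pipeline stages (assumed for illustration,
# not taken from the pipeline source).
STAGES = ["ccs", "lima", "refine", "map"]


def stages_to_run(start_from):
    """Return the entry stage and every later stage (an assumed convention)."""
    if start_from not in STAGES:
        raise ValueError(f"unknown start_from value: {start_from!r}")
    return STAGES[STAGES.index(start_from):]


def group_by_entry(rows):
    """Group samplesheet rows by the stage at which they join the main path."""
    groups = {stage: [] for stage in STAGES}
    for row in rows:
        stages_to_run(row["start_from"])  # raises on an unknown value
        groups[row["start_from"]].append(row["sample"])
    return groups


# A mixed samplesheet: each (made-up) sample enters at a different stage.
rows = [
    {"sample": "s1", "start_from": "ccs"},
    {"sample": "s2", "start_from": "refine"},
    {"sample": "s3", "start_from": "map"},
]
groups = group_by_entry(rows)
```

Grouping by entry stage mirrors the description above: each group would be merged into the main path at its own stage, so no separate entrypoint workflow is needed per input type.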

github-actions Bot commented Nov 1, 2025

nf-core pipelines lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 370ffb4

  • ✅ 256 tests passed
  • ❗ 5 tests had warnings

❗ Test warnings:

  • pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
  • pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
  • pipeline_todos - TODO string in nextflow.config: Specify any additional parameters here

✅ Tests passed:

Run details

  • nf-core/tools version 3.4.1
  • Run at 2026-04-06 09:42:17

Comment thread assets/schema_input.json
"errorMessage": "PacBio Index file for BAM subreads cannot contain spaces and must have extension '.bam.pbi' or being empty"
},
"reads": {
"start_from": {

Member

I think we can find a better name for this field

sguizard commented Nov 3, 2025

Collaborator Author

I have been scratching my head over how to name this.
I used entrypoint, but I didn't feel it was very clear for users.
'start_from' might not be the best, but at least its meaning is very clear.

Do you have any suggestions?

Member

I use step in sarek, but it's a param, not in the samplesheet

sguizard commented Nov 3, 2025

Collaborator Author

I added a subworkflow to chunk the FASTA files (from the lima, isoseq refine and mapping start points) before the mapping steps.
Because these inputs bypass CCS, where the data is normally divided into chunks, the FASTA produced is massive and contains all sequences. To mitigate this problem, the CHUNKER subworkflow splits them into small chunks.
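
As a rough illustration of what chunking a FASTA file involves, here is a minimal Python sketch. The actual CHUNKER subworkflow is written in Nextflow and is not shown in this conversation, so the function names, the chunk size, and the example records below are assumptions.

```python
from itertools import islice


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def chunk_records(records, chunk_size):
    """Yield lists of at most chunk_size records, preserving input order."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    it = iter(records)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


# Made-up records; a real input would be read from a file handle.
fasta = """>r1
ACGT
>r2
GG
TT
>r3
A
""".splitlines()

# chunk_size=2 is a hypothetical value chosen to keep the example small.
chunks = list(chunk_records(read_fasta(fasta), chunk_size=2))
```

Splitting by record count keeps each sequence whole, so every chunk is itself a valid set of FASTA records that can be mapped independently.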

@sguizard

Collaborator Author

I updated the CHUNKER again so that it can be applied twice in the pipeline.

sguizard added the WIP (Work in progress) label on Nov 11, 2025

sguizard commented Apr 6, 2026

Collaborator Author

Hi @maxulysse, I'm returning to the isoseq analysis and would like to merge this pull request before applying the latest template updates.
Could you have a look and indicate whether the code needs any corrections?

sguizard removed the WIP (Work in progress) label on Apr 6, 2026
Comment thread nextflow.config
Comment on lines +192 to +204
test { includeConfig 'conf/test.config' }
test_minimap2 { includeConfig 'conf/test_minimap2.config' }
test_full { includeConfig 'conf/test_full.config' }
test_inputs_map { includeConfig 'conf/test_samplesheet_v2_map.config' }
test_inputs_lima { includeConfig 'conf/test_samplesheet_v2_lima.config' }
test_inputs_refine { includeConfig 'conf/test_samplesheet_v2_refine.config' }
test_inputs_ccs { includeConfig 'conf/test_samplesheet_v2_ccs.config' }
test_inputs_ccs_lima_refine_map_mergeAll { includeConfig 'conf/test_samplesheet_v2_ccs_lima_refine_map_mergeAll.config' }
test_inputs_ccs_lima_refine_map { includeConfig 'conf/test_samplesheet_v2_ccs_lima_refine_map.config' }
test_inputs_ccs_map { includeConfig 'conf/test_samplesheet_v2_ccs_map.config' }
test_inputs_lima_refine_map { includeConfig 'conf/test_samplesheet_v2_lima_refine_map.config' }
test_inputs_refine_map { includeConfig 'conf/test_samplesheet_v2_refine_map.config' }
test_inputs_multi_lib { includeConfig 'conf/test_samplesheet_v2_multi_lib.config' }

Member

I do tend to recommend fewer config files dedicated to tests, and putting more of the logic in nf-test, but fine with me for now

@maxulysse

Member

> Hi @maxulysse, I'm returning to the isoseq analysis and would like to merge this pull request before applying the latest template updates. Could you have a look and indicate whether the code needs any corrections?

I don't see anything weird, so fine by me

maxulysse left a comment

Member

LGTM

sguizard merged commit 6f77054 into dev on Apr 9, 2026
22 checks passed

sguizard commented Apr 9, 2026

Collaborator Author

@maxulysse Many thanks for the review!

sguizard deleted the feature-flexible_start branch on April 9, 2026 06:24