Description of the bug
I’ve noticed a concerning lack of reproducibility when running the pipeline. Repeated runs on the exact same input data with the same parameters produce significantly different results. In some cases, a gene's FDR jumped from 0.06 to 0.6 between two runs without a single change in the configuration.
I first tracked this down to the CRISPRCleanR module, whose output files differ on every run even though parameters such as min_reads and min_targeted_genes are kept constant. The inconsistency then propagates downstream, so the final MAGeCK rankings and statistics are not stable enough to rely on.
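In case it helps narrow this down, here is a rough sketch of how the outputs of two runs can be compared file by file to see where they first diverge. It is not part of the pipeline; the function names and the directory names `results_run1` and `results_run2` are placeholders I made up for illustration.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def digest_tree(root):
    """Map each file's path (relative to root) to its digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def differing_files(run_a, run_b):
    """List relative paths that are missing from one run or differ in content."""
    a, b = digest_tree(run_a), digest_tree(run_b)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))


if __name__ == "__main__":
    # Placeholder directory names: point these at two output folders
    # produced from identical input and parameters.
    for name in differing_files("results_run1", "results_run2"):
        print(name)
```

Files that embed timestamps, such as logs or execution reports, will likely differ regardless, so the interesting entries are the count and statistics tables.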
It seems that stochastic steps in the R scripts or in MAGeCK are not being run with a fixed seed. Is there a way to set a seed, or some other configuration, that makes the pipeline deterministic? I’d also like to know whether this is a known issue and whether there is a workaround to get consistent results across runs.
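To illustrate the behaviour I would expect, here is a toy permutation test in Python. It is only a sketch of the general idea and not the pipeline's or MAGeCK's actual code; the function name, the `seed` parameter and the example numbers are all invented for this illustration. With a fixed seed the estimated p-value is identical on every call, whereas with `seed=None` it can vary from call to call.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=1000, seed=None):
    """Toy two-sample permutation test on the absolute difference in means.

    A dedicated random.Random(seed) instance keeps the shuffling
    reproducible whenever an integer seed is supplied.
    """
    rng = random.Random(seed)
    n_a = len(group_a)

    def mean(values):
        return sum(values) / len(values)

    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    a = [2.1, 2.5, 1.9, 2.8]
    b = [1.2, 1.0, 1.6, 1.4]
    print(permutation_p_value(a, b, seed=42))
    print(permutation_p_value(a, b, seed=42))  # same value as the line above
```

Something equivalent in the pipeline, for example an option that is passed through to every stochastic step, would be enough for my use case.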
Command used and terminal output
Relevant files
No response
System information
Singularity
MAGeCK RRA & MLE