Lack of determinism in results across identical runs #269

@alvaropvr

Description of the bug

I’ve noticed a concerning lack of reproducibility when running the pipeline. Even when using the exact same input data and parameters across identical runs, the results change significantly. In some cases, I’ve seen a gene's FDR jump from 0.06 to 0.6 between two runs without a single change in the configuration.

I first traced this to the CRISPRCleanR module, whose output files differ on every run even though parameters such as min_reads and min_targeted_genes are kept constant. This inconsistency then propagates downstream, so the final MAGeCK rankings and statistics cannot be reproduced from one run to the next.
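One way to confirm which outputs actually change between runs is to hash every file in each run's output directory and compare the digests. This is a minimal sketch that assumes each run writes into its own directory (the paths in the usage note are hypothetical). Identical digests mean a step was byte-for-byte deterministic. A mismatch may also come from embedded timestamps or row ordering, so differing files still need their numeric content inspected.

```python
import hashlib
from pathlib import Path


def file_digests(run_dir: Path) -> dict[str, str]:
    """Map each file's path (relative to run_dir) to its SHA-256 hex digest."""
    digests = {}
    for path in sorted(run_dir.rglob("*")):
        if path.is_file():
            rel = path.relative_to(run_dir).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def differing_files(run_a: Path, run_b: Path) -> list[str]:
    """Return relative paths that are missing from one run or whose bytes differ."""
    a, b = file_digests(run_a), file_digests(run_b)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```

For example, `differing_files(Path("run1/crisprcleanr"), Path("run2/crisprcleanr"))` (hypothetical paths) returns the relative paths of the files that changed between the two runs.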

It seems like there are stochastic processes in the R scripts or the MAGeCK steps that aren't being pinned down with a fixed seed. Is there any way to set a seed or any specific configuration to ensure the pipeline behaves deterministically? I’d like to know if this is a known issue or if there's a workaround to get consistent results across runs.
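Permutation-based statistics are one common way for an unseeded random number generator to make p-values, and therefore FDRs, shift between otherwise identical runs. Whether that is the cause here is an assumption. The sketch below is a generic, hypothetical permutation test on guide log-fold-changes, not MAGeCK's RRA or MLE algorithm and not CRISPRCleanR's code. It only shows that passing a fixed `seed` makes the result repeatable, while `seed=None` does not. In the R scripts, the equivalent step would be calling `set.seed()` before any random draw.

```python
import random
from statistics import mean


def permutation_pvalue(gene_lfcs, all_lfcs, n_perm=1000, seed=None):
    """Empirical one-sided p-value: fraction of random guide sets of the same
    size whose mean log-fold-change is at least as low as the gene's.
    Illustrative only; this is NOT MAGeCK's RRA statistic."""
    # seed=None seeds from OS entropy, so repeated calls can give different p-values.
    rng = random.Random(seed)
    observed = mean(gene_lfcs)
    k = len(gene_lfcs)
    hits = 0
    for _ in range(n_perm):
        # Draw k guides without replacement from the whole library as the null.
        if mean(rng.sample(all_lfcs, k)) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)
```

Calling `permutation_pvalue(gene, all_lfcs, seed=42)` twice returns the same value. With `seed=None` the value can change from run to run, and that spread shrinks as `n_perm` grows.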

Command used and terminal output

Relevant files

No response

System information

Singularity
MAGeCK RRA & MLE

Metadata

Labels

bug (Something isn't working)
