joanapy

Joint continuous multi-Omics enrichment ANAlysis (JOANA) from MLO Lab.

Install joanapy

We recommend using conda for environment management and pip for installing joanapy as a Python package. Follow these steps to create and activate the environment:

conda create -n joana python=3.11
conda activate joana

Before installing joanapy, install Mono into the conda environment with the following command:

conda install conda-forge::mono

Make sure your working directory is the downloaded JOANA-main folder, then use pip to install joanapy into the conda environment:

pip install .

After installing joanapy in the joana environment, run JOANA with the run-joana command:

run-joana -o <omics1.txt> [-o2 <omics2.txt>] -p <pathwayfile.gmt> -d <output_directory> [-m <min_num_genes>]

-o <omics1.txt> 
Path to the primary omics input file.
The file must be a two-column file (tab-separated for .tsv, whitespace-separated for .txt):
1. Gene name
2. Significance score (e.g., q-value) corresponding to the gene

-o2 <omics2.txt> (optional)
Path to the second omics input file for multi-omics analysis.
Format must be the same as -o.

-p <pathwayfile.gmt>
Path to the pathway file in GMT format, containing biological pathways to be tested for enrichment.

-m <min_num_genes> (optional)
A value in the range [0, 1) (default: 0).
Defines the minimum proportion of genes in a pathway that must have measurements.
Example:
-m 0.5 → Only pathways where at least 50% of genes have measurements will be considered.

-d <output_directory>
Path to the directory where JOANA results will be saved.

Note: Paths can be absolute (e.g., /home/user/data/file.txt) or relative to your current working directory (e.g., data/file.txt). The -o2 parameter is needed only for multi-omics analysis.
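To make the -p and -m options concrete: a GMT file holds one pathway per line as tab-separated fields (pathway name, a description, then the member genes), and -m compares each pathway's fraction of measured genes with the threshold. The Python sketch below illustrates that rule under two assumptions: "at least" means a ≥ comparison, and coverage is computed over a pathway's distinct genes. `read_gmt`, `pathway_coverage`, `filter_pathways` and the toy pathways are hypothetical and are not joanapy internals.

```python
import io


def read_gmt(lines):
    """Parse GMT lines into {pathway_name: [genes]}.

    Assumed layout per line: name <TAB> description <TAB> gene1 <TAB> gene2 ...
    """
    pathways = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *genes = fields
        pathways[name] = [gene for gene in genes if gene]
    return pathways


def pathway_coverage(genes, measured):
    """Fraction of a pathway's distinct genes that appear in the omics file."""
    genes = set(genes)
    return len(genes & set(measured)) / len(genes) if genes else 0.0


def filter_pathways(pathways, measured, min_fraction=0.0):
    """Keep pathways with coverage >= min_fraction (assumed meaning of -m)."""
    return {
        name: genes
        for name, genes in pathways.items()
        if pathway_coverage(genes, measured) >= min_fraction
    }


# Hypothetical toy GMT content; real files come from MSigDB or similar sources.
gmt = io.StringIO(
    "TOY_PATHWAY_X\tna\tA2ML1\tAAAS\tAACS\tTP53\n"
    "TOY_PATHWAY_Y\tna\tAADAC\tBRCA1\tEGFR\tMYC\n"
)
measured = {"A2ML1", "AAAS", "AACS", "AADAC"}  # genes present in the -o file

pathways = read_gmt(gmt)
print({name: pathway_coverage(g, measured) for name, g in pathways.items()})
# {'TOY_PATHWAY_X': 0.75, 'TOY_PATHWAY_Y': 0.25}
print(sorted(filter_pathways(pathways, measured, min_fraction=0.5)))
# ['TOY_PATHWAY_X']
```

With -m 0 (the default), both toy pathways would be kept; with 0.5, only the pathway with 3 of 4 genes measured survives.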

Input file format (-o and -o2)

The input files specified by -o and -o2 must contain two columns with the following structure:

Gene identifier (e.g., gene symbol)
Numeric score (e.g., q-value or p-value)

⚠️ Important:

Files must not contain a header row
Only the first two columns are used
The second column must contain numeric values

Supported file types

JOANA supports the following formats:

.txt → whitespace-separated (spaces or tabs)
.tsv → tab-separated

Example (TXT / TSV format)

A2ML1  0.025202476125022
A3GALT2  0.878666355638669
A4GALT  0.983155339235838
A4GNT  0.971337673847852
AAAS  0.0863723498889275
AACS  0.230709278931887
AADAC  0.881216487254285
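An input file can be checked against these rules (supported extension, no header, two columns, numeric score) before running JOANA. A minimal Python sketch; `check_joana_input` is a hypothetical helper, and rejecting a third column follows the Troubleshooting advice below rather than any documented joanapy check.

```python
import io


def check_joana_input(lines, suffix):
    """Validate omics lines for JOANA and return [(gene, score), ...].

    suffix is the file extension: '.tsv' splits on tabs, '.txt' on any whitespace.
    Raises ValueError on the first problem found.
    """
    suffix = suffix.lower()
    if suffix not in {".txt", ".tsv"}:
        raise ValueError(f"unsupported file type {suffix!r}; use .txt or .tsv")
    rows = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.rstrip("\r\n").split("\t") if suffix == ".tsv" else line.split()
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 2 columns, saw {len(fields)}")
        gene, raw_score = fields
        try:
            score = float(raw_score)
        except ValueError:
            # A non-numeric score on line 1 usually means a header row is present.
            raise ValueError(f"line {lineno}: score {raw_score!r} is not numeric") from None
        rows.append((gene, score))
    return rows


example = io.StringIO("A2ML1  0.025202476125022\nAAAS  0.0863723498889275\n")
print(check_joana_input(example, ".txt"))
```

For a file on disk, pass the open file handle and its extension, e.g. `check_joana_input(fh, pathlib.Path(path).suffix)`.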

The GMT file can be downloaded from MSigDB, or you can use any other biological pathway collection in GMT format. For single-omics data, run:

run-joana -o /path/to/omics1.txt -p /path/to/pathway.gmt -d /path/to/dirOutputs/

To run JOANA on multi-omics data, use:

run-joana -o /path/to/omics1.txt -o2 /path/to/omics2.txt -p /path/to/pathway.gmt -m 0.7 -d /path/to/dirOutputs/

Note: In multi-omics analysis, the -o file serves as the reference, and missing values in the second modality (-o2) are handled based on the reference modality. Select the file with more gene measurements as the reference (-o), as this gives more complete data for the analysis.
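Choosing the reference amounts to counting the genes in each file. A minimal sketch, assuming whitespace-separated files with gene identifiers in the first column; `gene_count`, `rank_by_gene_count` and the file contents are hypothetical.

```python
def gene_count(lines):
    """Number of distinct gene identifiers in the first column."""
    return len({line.split()[0] for line in lines if line.strip()})


def rank_by_gene_count(files):
    """Order file names by gene count, largest first; the first is the suggested -o."""
    return sorted(files, key=lambda name: gene_count(files[name]), reverse=True)


# Made-up contents standing in for two modalities (file name -> lines).
files = {
    "prot.txt": ["A2ML1 0.41\n", "AAAS 0.02\n"],
    "rna.txt": ["A2ML1 0.03\n", "AAAS 0.09\n", "AACS 0.23\n"],
}
print(rank_by_gene_count(files))  # ['rna.txt', 'prot.txt']
```

With real data, fill the dictionary with the lines read from each file.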

Example with Sample Data

You can quickly test the tool using the included sample data:

run-joana -o ./sample_data/rna.txt -o2 ./sample_data/prot.txt -p ./sample_data/h.all.v6.2.symbols.gmt -m 0.7 -d ./dirOutputs/

Data Sources

For the evaluation of JOANA on real data, we used supplementary Excel files from the following studies:

Lung adenocarcinoma dataset (Table S2, Table S3)
Hot tumor dataset (Table S2)
Myeloma (single-cell transcriptomics dataset) (GSE117156)
Coding and non-coding mutations dataset (currently not accessible)
LDC mouse hepatocyte dataset (13072_2023_504_MOESM3_ESM.bed, 13072_2023_504_MOESM2_ESM.xlsx)

Data Preprocessing

We provide the scripts used for preprocessing each dataset in the scripts/ folder.

Due to licensing restrictions, we do not redistribute the original datasets.
Please download the data from the sources above and run the preprocessing scripts to reproduce the inputs used in JOANA.

Troubleshooting

If you encounter issues while installing or running JOANA, check the following common problems and solutions:

1. run-joana: command not found

Cause: The package was not installed correctly or the environment is not activated.

Solution:

Ensure your conda environment is activated:

conda activate joana

Reinstall the package from the JOANA-main directory:

pip install .

Verify installation:

pip list | grep joana

2. Error: mono not found

Cause: JOANA depends on Mono, which may not be installed properly.

Solution:

conda install -c conda-forge mono

Then verify:

mono --version
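Both checks can also be run from Python inside the activated environment. A minimal sketch using only the standard library; it assumes the distribution name joanapy (as used by pip uninstall joanapy) and the console command run-joana, and `environment_report` is a hypothetical helper.

```python
import shutil
from importlib import metadata


def environment_report():
    """Locate joanapy, the run-joana command and mono in the current environment.

    A value of None means the item was not found.
    """
    try:
        joanapy_version = metadata.version("joanapy")
    except metadata.PackageNotFoundError:
        joanapy_version = None  # reinstall with `pip install .`
    return {
        "joanapy": joanapy_version,
        "run-joana": shutil.which("run-joana"),  # None: entry point not on PATH
        "mono": shutil.which("mono"),  # None: conda install -c conda-forge mono
    }


for name, found in environment_report().items():
    print(f"{name:10} {found if found is not None else 'NOT FOUND'}")
```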

3. Input file format errors

Cause: Input file does not follow the required two-column structure.

Common error:

ValueError: Error reading file 'omics1.txt'. Ensure it is a properly formatted two-column file (geneSymbol, q-value) without extra columns.
Original error: Error tokenizing data. C error: Expected 2 fields in line 2, saw 3

Explanation:

JOANA expects exactly two columns:
1. Gene name
2. Numeric score
This error usually occurs when the file contains both a header row and a row-index column; together they produce a third column.
A file with only a header or only row names may still load, but the combination causes this failure.

Solution:

Ensure the file has exactly two columns
Remove any extra index column when exporting (e.g., from Excel or pandas)
Save as .txt or .tsv

Correct format example:

A2ML1  0.025202476125022
A3GALT2  0.878666355638669
A4GALT  0.983155339235838
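When the scores come from pandas, the extra column typically appears because `DataFrame.to_csv` writes both the header and the row index by default. A minimal sketch of the difference, assuming a DataFrame with hypothetical gene and qvalue columns.

```python
import pandas as pd

# Hypothetical scores table; column names are illustrative only.
scores = pd.DataFrame(
    {
        "gene": ["A2ML1", "A3GALT2", "A4GALT"],
        "qvalue": [0.025202476125022, 0.878666355638669, 0.983155339235838],
    }
)

# Default export: header row plus the row index -> three fields per line.
bad = scores.to_csv(sep="\t")

# JOANA-style export: no header, no index -> exactly two tab-separated fields.
good = scores.to_csv(sep="\t", header=False, index=False)

print(good)
# To write a file instead: scores.to_csv("omics1.tsv", sep="\t", header=False, index=False)
```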

4. Unsupported file type

Cause: Input file is not in a supported format (e.g., .csv).

Common error:

Unsupported file type '.csv' for file 'prot1.csv'. Only '.txt' and '.tsv' files are supported

Solution:

Convert .csv to .tsv or .txt
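One way to do the conversion with the standard library is sketched below. It assumes the gene is in the first CSV column and the score in the second, and optionally drops a header row; `csv_to_joana_tsv` is a hypothetical helper, not a joanapy function.

```python
import csv
import io


def csv_to_joana_tsv(src, dst, has_header=False):
    """Copy the first two CSV columns (gene, score) to dst as tab-separated lines."""
    reader = csv.reader(src)
    if has_header:
        next(reader, None)  # JOANA inputs must not contain a header row
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    for row in reader:
        if row:  # skip blank lines
            writer.writerow(row[:2])  # drop any extra columns (assumes gene, score order)


src = io.StringIO("gene,qvalue\nA2ML1,0.025202476125022\nAAAS,0.0863723498889275\n")
dst = io.StringIO()
csv_to_joana_tsv(src, dst, has_header=True)
print(dst.getvalue(), end="")
```

For files on disk, open both with `newline=""` (e.g. `open("prot1.csv", newline="")` and `open("prot1.tsv", "w", newline="")`) and pass the two handles.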

5. File/path errors (No such file or directory)

Cause: Incorrect file path or missing file (applies to the input files and the .gmt pathway file).

Solution:

Ensure the file exists and path is correct:
```
ls /path/to/pathway.gmt
```
Use absolute paths:
```
/home/user/data/file.txt
```
Or verify your current working directory for relative paths:
```
pwd
```
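The same checks can be applied to all input paths at once before calling run-joana. A minimal sketch; `preflight` is a hypothetical helper, and it reports absolute paths so that mistakes in relative paths become visible.

```python
import os


def preflight(omics1, pathway, omics2=None):
    """Return a list of problems with the -o, -o2 and -p paths (empty list means OK)."""
    checks = [("-o", omics1), ("-p", pathway)]
    if omics2 is not None:  # -o2 is optional
        checks.append(("-o2", omics2))
    problems = []
    for flag, path in checks:
        if path is None:
            problems.append(f"{flag}: not provided")
        elif not os.path.isfile(path):
            problems.append(f"{flag}: no such file: {os.path.abspath(path)}")
    return problems


print(preflight("omics1.txt", "pathway.gmt"))
```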

6. Empty output or pandas.errors.EmptyDataError

Cause:

-m parameter too strict (filters out all pathways)
Input data does not meet minimum gene coverage

Common error:

  pandas.errors.EmptyDataError: No columns to parse from file

Solution:

Try lowering the -m threshold:
```
-m 0.3
```
Check that enough genes in your dataset overlap with pathway genes
Ensure input files are not empty and properly formatted
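To see whether the data can pass a given -m at all, it helps to summarise the overlap between measured genes and pathway genes. A minimal sketch, assuming pathways are already parsed into a dict of name to gene list and that -m keeps pathways whose coverage is at or above the threshold; `coverage_summary` and the toy data are hypothetical.

```python
def coverage_summary(measured_genes, pathways):
    """Summarise overlap between measured genes and a pathway collection."""
    measured = set(measured_genes)
    coverages = [
        len(measured & set(genes)) / len(set(genes))
        for genes in pathways.values()
        if genes  # skip empty pathways
    ]
    pathway_genes = set().union(*pathways.values())
    return {
        "measured_genes": len(measured),
        "measured_in_any_pathway": len(measured & pathway_genes),
        # Under the >= assumption, no pathway survives if -m exceeds this value.
        "max_pathway_coverage": max(coverages, default=0.0),
    }


toy_pathways = {"TOY_P1": ["A2ML1", "TP53"], "TOY_P2": ["EGFR", "MYC"]}
print(coverage_summary(["A2ML1", "AAAS"], toy_pathways))
# {'measured_genes': 2, 'measured_in_any_pathway': 1, 'max_pathway_coverage': 0.5}
```

If measured_in_any_pathway is 0, the input file and the GMT file may not use the same type of gene identifier.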

7. Unrecognized arguments error

Cause: Incorrect command syntax (using -o1 instead of -o).

Common error:

run-joana: error: unrecognized arguments: omics1.txt

Explanation:

The correct flag for the primary omics file is -o, not -o1
Using -o1 causes the argument parser to misinterpret the command: -o1 is read as -o with the value 1, which leaves omics1.txt as an unrecognized argument

Solution:

Use the correct command format:

run-joana -o omics1.txt -o2 omics2.txt -p h.all.v6.2.symbols.gmt -d ./output

General syntax:

run-joana -o <omics1.txt> [-o2 <omics2.txt>] -p <pathway.gmt> -d <output_directory> [-m <min_num_genes>]
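The error message has the format produced by Python's argparse, so the behaviour can be reproduced with a stand-in parser. A minimal sketch, assuming run-joana uses argparse; the parser below is hypothetical and only mirrors the documented flags. argparse allows a single-dash option to carry its value attached (-ofile), so -o1 is taken as -o with the value 1.

```python
import argparse

# Hypothetical stand-in mirroring the documented run-joana flags (not JOANA's source).
parser = argparse.ArgumentParser(prog="run-joana")
for flag in ("-o", "-o2", "-p", "-d"):
    parser.add_argument(flag)
parser.add_argument("-m", type=float, default=0.0)

# "-o1" is not a defined flag, so argparse splits it into "-o" plus the attached value "1".
args, leftover = parser.parse_known_args(
    ["-o1", "omics1.txt", "-p", "h.all.v6.2.symbols.gmt", "-d", "./output"]
)
print(args.o, leftover)  # 1 ['omics1.txt']
# parser.parse_args(...) on the same list exits with:
# run-joana: error: unrecognized arguments: omics1.txt
```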

8. Missing primary omics file (-o)

Cause: The required -o argument is missing.

Common error:

TypeError: expected str, bytes or os.PathLike object, not NoneType

Explanation:

The -o parameter (primary omics file) is mandatory
Running JOANA with only -o2 is invalid
Internally, JOANA tries to process -o, but receives None, causing this error

Example of incorrect command:

run-joana -o2 omics2.txt -p h.all.v6.2.symbols.gmt -d ./output

Solution:

Always provide the primary omics file using -o:

run-joana -o omics1.txt -o2 omics2.txt -p h.all.v6.2.symbols.gmt -d ./output

For single-omics analysis:

run-joana -o omics2.txt -p h.all.v6.2.symbols.gmt -d ./output

9. TypeError: expected str, bytes or os.PathLike object, not NoneType

Cause: Missing required -d (output directory) argument.

Common error:

TypeError: expected str, bytes or os.PathLike object, not NoneType

Example of incorrect command:

run-joana -o omics1.txt -o2 omics2.txt -p h.all.v6.2.symbols.gmt

Solution:

Always include the -d argument:

run-joana -o omics1.txt -o2 omics2.txt -p h.all.v6.2.symbols.gmt -d ./output

Explanation:

The -d parameter (output directory) is required
If -d is not provided, JOANA receives None instead of a path, causing this error
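Both NoneType errors (items 8 and 9) follow the same pattern: an omitted flag defaults to None, and the first path operation on it fails. A minimal sketch with a hypothetical stand-in parser (assuming argparse-style defaults; not JOANA's code), plus a hypothetical `missing_required` check that names the absent flag instead.

```python
import argparse
import os

# Hypothetical parser mirroring the documented flags; none are declared required.
parser = argparse.ArgumentParser(prog="run-joana")
for flag in ("-o", "-o2", "-p", "-d"):
    parser.add_argument(flag)


def missing_required(args):
    """Return the flags shown as mandatory in the usage line that were not supplied."""
    required = (("-o", args.o), ("-p", args.p), ("-d", args.d))
    return [flag for flag, value in required if value is None]


# -d omitted, as in the incorrect command for item 9.
args = parser.parse_args(
    ["-o", "omics1.txt", "-o2", "omics2.txt", "-p", "h.all.v6.2.symbols.gmt"]
)
print(missing_required(args))  # ['-d']

error_message = None
try:
    os.fspath(args.d)  # path handling on None raises the same TypeError
except TypeError as err:
    error_message = str(err)
print(error_message)  # expected str, bytes or os.PathLike object, not NoneType
```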

10. Permission errors when writing output

Cause: No write access to output directory.

Solution:

Use a directory you own:
```
mkdir -p ./dirOutputs
```
Or change permissions:
```
chmod u+w /path/to/output_directory
```
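The same checks can be done from Python before a run. A minimal sketch using only the standard library; `ensure_writable_dir` is a hypothetical helper that creates the directory like mkdir -p and then tests write access with `os.access`.

```python
import os
import tempfile


def ensure_writable_dir(path):
    """Create path if needed (like `mkdir -p`) and confirm it is writable.

    os.makedirs itself raises PermissionError if a parent directory is not writable.
    """
    os.makedirs(path, exist_ok=True)
    if not os.access(path, os.W_OK):
        raise PermissionError(f"no write permission for {os.path.abspath(path)}")
    return os.path.abspath(path)


# Demonstration in a throwaway directory; use your real -d path in practice.
with tempfile.TemporaryDirectory() as tmp:
    print(ensure_writable_dir(os.path.join(tmp, "dirOutputs")))
```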

11. Python or dependency issues

Cause: Incompatible Python version or missing dependencies.

Solution:

Ensure Python 3.11 is used:
```
python --version
```

Recreate the environment, then reinstall Mono and joanapy as described in Install joanapy:

conda remove -n joana --all
conda create -n joana python=3.11
conda activate joana

12. Unexpected crashes or errors

If you encounter an issue not listed here:

Double-check all input formats and parameters

Run with minimal example:

run-joana -o ./sample_data/rna.txt -p ./sample_data/h.all.v6.2.symbols.gmt -d ./dirOutputs/

Ensure all dependencies are installed

Uninstall joanapy

The package can be uninstalled with the following command:

pip uninstall joanapy

Fitting a mixture of Beta distributions

Code was adapted from Schröder C, Rahmann S. A hybrid parameter estimation algorithm for beta mixtures and applications to methylation state classification. Algorithms Mol Biol. 2017 Aug 18;12:21. doi: 10.1186/s13015-017-0112-1. PMID: 28828033; PMCID: PMC5563068 (https://bitbucket.org/genomeinformatics/betamix/src/master/).
