This repository was archived by the owner on Jan 22, 2025. It is now read-only.

Commit 485f679

Changes to documentation to reflect pipeline changes.
1 parent 9ac7196 commit 485f679

1 file changed

Lines changed: 9 additions & 15 deletions

docs/guides/tools/xenocp.md

Lines changed: 9 additions & 15 deletions
@@ -17,29 +17,28 @@ XenoCP supports hg19 (GRCh37) and mm9 (MGSCv37).
1717

1818
XenoCP workflow contains the following five steps (see diagram below):
1919

20-
1. Split input human BAM file into given number of small pieces.
20+
1. Split input human BAM file into pieces by chromosome.
2121
2. Align mapped reads to mouse reference genome.
2222
3. Compare human and mouse alignments and identify read identity.
2323
4. Create lists of contamination reads and set them to unmapped in human BAMs.
2424
5. Merge the BAM pieces to a cleansed BAM.
2525

26-
Note that steps 2-4 run in parallel.
26+
Note that steps 1-4 run in parallel.
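
The five steps listed above can be sketched as a toy Python pipeline. This is only an illustration of the control flow, not XenoCP's implementation: reads are plain dictionaries, the function names (`split_by_chromosome`, `classify_read`, `cleanse_piece`, `run_workflow`) are hypothetical, the mouse alignment of step 2 is represented by a precomputed `mouse_score` field, and the score comparison rule is an assumption made for the example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_by_chromosome(reads):
    """Step 1: group reads into one piece per chromosome."""
    pieces = defaultdict(list)
    for read in reads:
        pieces[read["chrom"]].append(read)
    return dict(pieces)


def classify_read(read):
    """Steps 2-3 (toy rule, an assumption): compare the two alignment scores."""
    if read["mouse_score"] > read["human_score"]:
        return "contam"
    if read["mouse_score"] == read["human_score"]:
        return "tie"
    return "human"


def cleanse_piece(piece):
    """Step 4: mark contamination reads as unmapped; tie reads stay mapped."""
    cleansed = []
    for read in piece:
        mapped = classify_read(read) != "contam"
        cleansed.append({**read, "mapped": mapped})
    return cleansed


def run_workflow(reads):
    pieces = split_by_chromosome(reads)
    # Pieces are independent of each other, so they can be processed in parallel.
    with ThreadPoolExecutor() as pool:
        cleansed_pieces = list(pool.map(cleanse_piece, pieces.values()))
    # Step 5: merge the cleansed pieces back into a single list.
    return [read for piece in cleansed_pieces for read in piece]
```

Splitting by chromosome gives each worker an independent piece, which is what allows the per-piece steps to run side by side before the final merge.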
2727

2828
![](../../images/guides/tools/xenocp/xenocp_workflow2.png)
2929

3030
<h3 id="inputs">Inputs</h3>
3131

3232
| Name | Type | Description | Example |
3333
|--------------------------------|----------------|----------------------------------------------------------------------------------------------|-----------------------|
34-
| BAM | File | Input bam aligned to human reference genome. [required] |`test.bam`, `test.bam.bai`|
34+
| BAM | File | Input bam aligned to human reference genome. [required] |`test.bam` |
35+
| BAI | File | Bam index for input bam. [required] |`test.bai` |
3536
| [Reference DB Prefix] | String | Basename of the input human reference assembly. [required] | MGSCv37.fa |
3637
| Suffix Length | Integer | Length of read name suffixes to be trimmed. [default: 3] | 3 |
3738
| Keep Mates Together | Boolean | Whether to keep mates together [default: True] | True |
38-
| Bucket Number | Integer | Number of buckets to split original bam to for parallelism [default: 31] | 15 |
3939
| Validation Stringency | String | Validation stringency: STRICT, LENIENT, SILENT [default: SILENT] | SILENT |
4040
| Output Prefix | String | Prefix to append to output filenames [default: xenocp-] | xenocp- |
4141
| Output Extension | String | Output file extension [default: bam] | bam |
42-
| Sort Order | String | Read sort order [default: queryname] | queryname |
4342

4443
[Reference DB Prefix]: #db-prefix
4544

@@ -63,8 +62,8 @@ the tool already exists. Click "Launch Tool" to start a new analysis.
6362

6463
### Input configuration
6564

66-
XenoCP requires two inputs: a BAM file aligned to human reference genome and
67-
the basename of the input human reference assembly. All other inputs are optional.
65+
XenoCP requires three inputs: a BAM file aligned to the human reference genome, a BAM index file (BAI)
66+
corresponding to the input BAM file, and the basename of the contaminant organism reference assembly. All other inputs are optional.
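
Because the BAI is now a separate required input, it can help to locate it before launching a run. The sketch below is a hypothetical helper, not part of XenoCP. It assumes the index sits next to the BAM and uses one of the two naming styles that appear in this diff's examples (`test.bam.bai` and `test.bai`).

```python
from pathlib import Path


def candidate_bai_paths(bam_path):
    """Return the two index names that may accompany a BAM file."""
    bam = Path(bam_path)
    return [
        bam.with_name(bam.name + ".bai"),  # test.bam -> test.bam.bai
        bam.with_suffix(".bai"),           # test.bam -> test.bai
    ]


def find_bai(bam_path):
    """Return the first existing index file, or None if neither is present."""
    for candidate in candidate_bai_paths(bam_path):
        if candidate.exists():
            return candidate
    return None
```

A caller could stop early when `find_bai("test.bam")` returns `None`, rather than uploading a BAM without its index.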
6867

6968
Input files can be uploaded via the [data transfer application] or [command
7069
line].
@@ -79,8 +78,8 @@ Input BAM aligned to a human reference genome.
7978
<h4 id="db-prefix">Reference DB prefix</h4>
8079

8180
Basename of the input human reference assembly. For example, a prefix of
82-
MGSCv37.fa would assume the following files in the same directory exist: MGSCv37.fa,
83-
MGSCv37.fa.amb, MGSCv37.fa.ann, MGSCv37.fa.bwt, MGSCv37.fa.dict, MGSCv37.fa.fai,
81+
MGSCv37.fa would assume the following files in the same directory exist:
82+
MGSCv37.fa.amb, MGSCv37.fa.ann, MGSCv37.fa.bwt,
8483
MGSCv37.fa.pac, and MGSCv37.fa.sa.
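
The file list implied by a prefix can be checked with a short Python sketch. The helper names below are hypothetical and not part of XenoCP. The extension list is taken from the added lines of this hunk (`.amb`, `.ann`, `.bwt`, `.pac`, `.sa`).

```python
from pathlib import Path

# Index extensions named in the added lines of the hunk above.
INDEX_EXTENSIONS = (".amb", ".ann", ".bwt", ".pac", ".sa")


def expected_index_files(prefix):
    """List the index files expected to sit next to the given prefix."""
    return [Path(str(prefix) + ext) for ext in INDEX_EXTENSIONS]


def missing_index_files(prefix):
    """Return the expected index files that do not exist on disk."""
    return [path for path in expected_index_files(prefix) if not path.exists()]
```

For the prefix `MGSCv37.fa`, `expected_index_files` yields `MGSCv37.fa.amb` through `MGSCv37.fa.sa`, and an empty result from `missing_index_files` means all five files are present.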
8584

8685
<h4 id="output-prefix">Output prefix</h4>
@@ -94,11 +93,6 @@ _Output prefix_ is the prefix to append to the output contamination and tie file
9493
| xenocp- | `xenocp-000.contam.txt` |
9594
| xenocp- | `xenocp-000.tie.bam` |
9695

97-
<h4 id="disabled-vcf-column">Bucket Number</h4>
98-
99-
Number of small bam pieces that an input bam is split to. This should be less than the number of cores of the instance type. As
100-
the default instance type is azure:mem2_ssd1_x16, default bucket number is 15.
101-
10296
## Uploading input data files
10397
XenoCP requires at least one BAM along with its BAI files
10498
to be uploaded. These files can be uploaded via the [data transfer
@@ -116,7 +110,7 @@ Upon a successful run of XenoCP, a cleansed BAM file, a list of contamination re
116110

117111
<h4 id="cleansed-bam">Cleansed BAM</h4>
118112

119-
Cleansed BAM is the major output of XenoCP workflow. The mapped reads in this BAM file are of human origin (including reads in tie BAM) and are mapped to the human genome reference sequence. Any reads deemed to originate from mouse by XenoCP are set to 'unmapped'.
113+
Cleansed BAM is the major output of XenoCP workflow. The mapped reads in this BAM file are of human origin (including reads in tie BAM) and are mapped to the human genome reference sequence. Any reads deemed to originate from the contaminant organism by XenoCP are set to 'unmapped'.
120114

121115
<h4 id="contam-list">Contamination files</h4>
122116
