Jemultiplexer
Jemultiplexer is now Je and its new page can be found here
WARNING for serial copy-pasters: if you copy-paste command lines from this page directly into your terminal, make sure to replace all the dash characters, i.e. '-'. For some unknown reason, they are not always copy-pasted as a plain '-'.
News and help : subscribe to the ML !
Stay up to date and get help from the developers and the community: subscribe to the ML!
Mailing list : je@embl-heidelberg.de
Useful administrative commands for the jemultiplexer list
- For help and a description of available commands, send a message to:
- To subscribe to the list, send a message to:
- To remove your address from the list, just send a message to the address in the "List-Unsubscribe" header of any list message. If you haven't changed addresses since subscribing, you can also send a message to:
- or for the digest to:
- If you need to get in touch with the human owner of this list, please send a message to:
General Usage and executable
Jemultiplexer is written in Java and is available as an executable jar.
> java -Xmx4g -jar /g/funcgen/gbcs-tools/jemultiplexer/jemultiplexer.jar F1=file1.txt.gz BF=barcodes.txt [OPTIONS]
Simply add -h or --help to the command line to print help, i.e.
> java -Xmx4g -jar /g/funcgen/gbcs-tools/jemultiplexer/jemultiplexer.jar --help
Get executable
Latest public version is here (EMBL user, please see below) : Latest (1.0.6)
Older versions :
Note that the publicly available executable cannot connect to emBASE. EMBL users should use the executable available on our server.
Inside EMBL
The executable JAR can be found in /g/funcgen/gbcs-tools/jemultiplexer/.
This executable is available from all EMBL servers, including LSF cluster nodes.
The USE_EMBASE option is available from version 1.0.3 ; please see below.
Note that we have defined sensible defaults, so you should not need to pass any options other than your FASTQ file(s) and barcode file. The defaults are:
- Both for single-end and paired-end : MM=1 MMD=1 Q=10 GZ=true XT=1 ZT=0 O=`pwd` UN=true C=true ADD=true
- Only applicable to paired-end data (ignored in single-end mode) : BPOS=BOTH BRED=true BM=BOTH S=false, i.e. redundant barcodes at both ends
Why a new demultiplexer tool?
Well... there are not many solutions available out there when it comes to paired-end data. And the few I have heard of (e.g. fastq-multx) only deal with the "one barcode == one sample" situation.
But many more situations arise with paired-end data, especially when coupled with Illumina indexing. Also, having our own internal tool allows a tight integration with emBASE and a better traceability of datasets (see below).
The different barcoding scenarios (Paired-end/Illumina indexing)
The barcode is found in both reads (no Illumina index files) :
- Sub-case 1 : The two barcodes encode sample information
- Redundant barcodes : either of the two barcodes can be used to resolve the sample
- Non-redundant barcodes : both barcodes are needed to resolve the sample
- Sub-case 2 : A single barcode encodes sample information, the second one is a random one (to distinguish biological from PCR duplication)
Barcodes found in reads can also be coupled with standard Illumina indexing :
- Illumina index file(s) are used to encode the sample identity
- Additional barcoding sequences can be added in the reads (as above), these barcodes do NOT encode sample identity and are usually random sequences to encode molecular identity i.e. to tell apart PCR duplicates from real biological molecular abundance
=> Jemultiplexer deals with all these situations!
Setting the correct set of options (graphical decision trees)
Below are decision workflows to help you pick the right key options depending on your situation.
Without Illumina index files
With Illumina index files
Practical examples
Single End
Simply call (will use defaults) :
> java -Xmx4g -jar /g/funcgen/gbcs-tools/jemultiplexer/jemultiplexer.jar F1=file1.txt.gz BF=barcodes.txt
Paired End
Scenario 1: the same barcode is found in both reads
Maximize the number of reads: with S=false (the default), a read pair is assigned to a sample even if only one of the two barcodes resolves to that sample (pairs are still ignored if the two barcodes resolve to different samples)
> java -Xmx4g -jar /g/funcgen/gbcs-tools/jemultiplexer/jemultiplexer.jar F1=file1.txt.gz F2=file2.txt.gz BF=barcodes.txt
Keep reads only if both barcodes resolve to the same sample
> java -Xmx4g -jar /g/funcgen/gbcs-tools/jemultiplexer/jemultiplexer.jar F1=file1.txt.gz F2=file2.txt.gz BF=barcodes.txt S=true
Note that in both situations:
- Input FASTQ files are expected to be encoded using phred scale + 33 (V option)
- the output files are automatically gzipped (GZ=true), named after a pattern like samplename_barcode.txt.gz (can be overridden by providing file names in extra columns of the barcode file) and placed in the current dir (output dir can be adapted using O=/path/to/dir).
- Unassigned reads are saved (UN=true) in unassigned_1.txt and unassigned_2.txt files (current dir), and a summary file jemultiplexer_out_stats.txt is also written (all these file names and paths can be adapted if needed).
- the barcode lookup uses a maximum of 1 mismatch (MM=1), a minimum mismatch delta of 1 with the second best barcode match (MMD=1) and a minimum base quality of 10 (Q=10)
- the barcodes are removed from reads (C=true) with one extra base (XT=1) and added to the FASTQ header (ADD=true)
Keep reads only if both barcodes resolve to the same sample, adapt the output dir (all files will now be saved in this dir) and provide demultiplexed file names (unassigned read files will still be named automatically)
> java -Xmx4g -jar /g/funcgen/gbcs-tools/jemultiplexer/jemultiplexer.jar F1=file1.txt.gz F2=file2.txt.gz BF=barcodes.txt S=true O=/tmp/jemultiplexer_out/
with a barcode file like:
sample1	ATATAT	sample1.txt.gz
sample2	GACGAC	sample2.txt.gz
Scenario 2: the barcode is found in only one read
The barcode is in READ_2
> java -Xmx4g -jar /g/funcgen/gbcs-tools/jemultiplexer/jemultiplexer.jar F1=file1.txt.gz F2=file2.txt.gz BF=barcodes.txt O=/tmp/jemultiplexer_out/ BPOS=READ_2
N.B.: BM will be automatically set to READ_2, BRED and S will be ignored.
N.B.2: reads will have different lengths, as only one read will have the barcode sequence removed; see the example below to make sure reads end up with the same length.
The barcode is in READ_2 => make sure read 1 is of same resulting size by using unbalanced trimming options
We need to compensate 7 bases i.e. assuming barcode length is 6, READ_2 will be trimmed of 7 bases (XT=1 by default)
- Example 1: also remove 7 bases at start of READ_1 (XT=7:1)
> java -Xmx4g -jar /g/funcgen/gbcs-tools/jemultiplexer/jemultiplexer.jar F1=file1.txt.gz F2=file2.txt.gz BF=barcodes.txt O=/tmp/jemultiplexer_out/ BPOS=READ_2 XT=7:1
- Example 2: remove 7 bases at the end of READ_1 and none at the end of read 2 (ZT=7:0)
> java -Xmx4g -jar /g/funcgen/gbcs-tools/jemultiplexer/jemultiplexer.jar F1=file1.txt.gz F2=file2.txt.gz BF=barcodes.txt O=/tmp/jemultiplexer_out/ BPOS=READ_2 ZT=7:0
- Example 3: or use a mix ...
> java -Xmx4g -jar /g/funcgen/gbcs-tools/jemultiplexer/jemultiplexer.jar F1=file1.txt.gz F2=file2.txt.gz BF=barcodes.txt O=/tmp/jemultiplexer_out/ BPOS=READ_2 ZT=6:0 XT=1:1
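
The length bookkeeping behind these three examples can be checked with a few lines of Python. This is only an illustrative sketch, not Jemultiplexer code: the helper names (`trimmed_length`, `lengths_for`) and the 50-base raw read length are assumptions made for the example. It assumes, as described above, that the barcode plus XT bases are removed from the start of a read and ZT bases from its end, and that a single XT value is not applied to the read without barcode.

```python
def trimmed_length(read_len, barcode_len, xt, zt):
    """Length left after clipping the barcode, XT bases after it and
    ZT bases at the read end (hypothetical helper, illustration only)."""
    return read_len - barcode_len - xt - zt

READ_LEN = 50      # assumed raw read length
BARCODE_LEN = 6    # barcode length used in the examples above

# Per example: (XT read 1, XT read 2), (ZT read 1, ZT read 2).
# With BPOS=READ_2 and a single XT value, READ_1 gets no XT trim.
EXAMPLES = {
    "XT=7:1": ((7, 1), (0, 0)),
    "ZT=7:0 (default XT=1)": ((0, 1), (7, 0)),
    "ZT=6:0 XT=1:1": ((1, 1), (6, 0)),
}

def lengths_for(xt, zt):
    """Return the resulting (READ_1, READ_2) lengths; only READ_2 is barcoded."""
    read1 = trimmed_length(READ_LEN, 0, xt[0], zt[0])
    read2 = trimmed_length(READ_LEN, BARCODE_LEN, xt[1], zt[1])
    return read1, read2

for label, (xt, zt) in EXAMPLES.items():
    print(label, lengths_for(xt, zt))
```

Under these assumptions, each of the three option sets leaves both reads with the same length.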
Scenario 3: a different barcode is found in each read, with both barcodes being needed for sample lookup
The barcode file must be like:
sample1	ATATAT:CGCGCG
sample2	GACGAC:TACGTT
i.e. sample 1 has barcode ATATAT in READ_1 and CGCGCG in READ_2 ; and
sample 2 has barcode GACGAC in READ_1 and TACGTT in READ_2
Then :
> java -Xmx4g -jar /g/funcgen/gbcs-tools/jemultiplexer/jemultiplexer.jar F1=file1.txt.gz F2=file2.txt.gz BF=barcodes.txt BPOS=BOTH BM=BOTH BRED=false O=/tmp/jemultiplexer_out/
Note that you could use different barcode lookup parameters (if this makes sense!) using the syntax MM=1:2 MMD=2:1 Q=30:10
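
To make the barcode file syntax of this scenario concrete, here is a small Python sketch that splits one barcode file line into its parts. It is not the parser used by Jemultiplexer; the function name `parse_barcode_line` and the returned dictionary layout are assumptions for illustration. It follows the format described in the BARCODE_FILE option: tab-delimited columns, ':' separating the READ_1 and READ_2 barcodes, '|' separating alternative barcodes, and optional extra columns holding output file names.

```python
def parse_barcode_line(line):
    """Split one tab-delimited barcode file line (hypothetical helper).

    Returns the sample name, one list of alternative barcodes per read,
    and any output file names given in the extra columns.
    """
    fields = line.rstrip("\n").split("\t")
    sample, spec = fields[0], fields[1]
    # ':' separates READ_1 from READ_2 barcodes, '|' separates alternatives
    per_read = [part.split("|") for part in spec.split(":")]
    return {"sample": sample, "barcodes": per_read, "files": fields[2:]}

print(parse_barcode_line("sample1\tATATAT:CGCGCG"))
print(parse_barcode_line("sample1\tATAT|GAGG:CCAA|TGTG\tspl1_1.txt.gz\tspl1_2.txt.gz"))
```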
Scenario 4: a different barcode is found in each read, with only one barcode needed for sample lookup (e.g. READ_1) while the other barcode (in READ_2) is a random sequence of length 10
The barcode file must be like:
sample1	ATATAT
sample2	GACGAC
READ_1 loses 7 bases (6-base barcode + XT=1) while READ_2 loses 10 bases (10-base random barcode, XT=0), so we compensate the 3-base length difference with a ZT of 3 on READ_1 only:
> java -Xmx4g -jar /g/funcgen/gbcs-tools/jemultiplexer/jemultiplexer.jar F1=file1.txt.gz F2=file2.txt.gz BF=barcodes.txt BPOS=BOTH BM=READ_1 BRED=false LEN=6:10 XT=1:0 ZT=3:0 O=/tmp/jemultiplexer_out/
iCLIP barcodes and barcodes mixed with random sequence
For iCLIP experiments, barcodes contain random letters (A, T, G or C). For example, a barcode may look like NNNNATTCNN, where each N can be A, T, G, C or N in the read. Only the barcode letters at positions 5 to 8 are used to resolve the sample.
The barcode file may look like:
sample1	NNNNATATATNNN
sample2	NNNNGACGACNNN
Important: when barcodes (as defined in the barcode file) contain Ns, Jemultiplexer copies the sequence extracted from the read to the end of the header line (with the CLIP and ADD options set to true) instead of the barcode as found in the barcode file, which is the default behavior. Note that many tools (bowtie2, ...) will clip this information from the read header (because of the space separator). Since you might need this information further down your workflow (e.g. when dealing with duplicates), we suggest using the RCHAR option (e.g. add RCHAR=':' to the command line) so that all spaces are replaced with a ':'.
Note that in this case you can also output a barcode match reporting file with the BARCODE_DIAG_FILE option.
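
The idea of ignoring N positions during matching, while still keeping the bases actually read at those positions, can be sketched as follows. This Python snippet only illustrates the principle and is not how Jemultiplexer is implemented; the function names and the example read are made up for the illustration, and base qualities are left out for brevity.

```python
def masked_mismatches(pattern, read_seq):
    """Count mismatches between a barcode pattern and the read start,
    skipping positions flagged with 'N' in the pattern (hypothetical helper)."""
    prefix = read_seq[:len(pattern)]
    return sum(1 for p, b in zip(pattern, prefix) if p != "N" and p != b)

def extract_clipped(pattern, read_seq):
    """Return the read bases covered by the pattern, random positions
    included, i.e. the sequence one would copy to the read header."""
    return read_seq[:len(pattern)]

pattern = "NNNNATTCNN"          # barcode pattern from the text above
read = "ACGTATTCGGTTTTAAAA"     # made-up read: 10 barcode bases + insert

print(masked_mismatches(pattern, read))   # mismatches on positions 5 to 8 only
print(extract_clipped(pattern, read))     # sequence kept for the header
```

Keeping the extracted sequence rather than the pattern is what makes the random positions usable later, e.g. to tell PCR duplicates apart.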
Options overview
Options about barcoding configuration / sample resolution:
- Where are the barcodes (BPOS option) : READ_1, READ_2 or BOTH
- Which barcode should be used for sample look up (BM option) : READ_1, READ_2 or BOTH
- If BOTH, are the barcodes REDUNDANT i.e. do they both resolve to the same sample (BRED option)
- If BOTH and REDUNDANT, should we require both to resolve to same sample or not (S option)?
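
For redundant barcodes, the effect of the S option can be summarized with a short Python sketch. It is written from the description of the STRICT option in the detailed usage and is not taken from the Jemultiplexer source; the function name `resolve_redundant` and the use of `None` for "no barcode match" are assumptions.

```python
def resolve_redundant(sample_read1, sample_read2, strict):
    """Pick the sample of a read pair from the two per-read lookups.

    Each argument is the sample matched by one read's barcode, or None
    when that barcode had no match (hypothetical helper, illustration only).
    Returns the sample name, or None when the pair is ignored.
    """
    if sample_read1 is not None and sample_read2 is not None:
        # Both barcodes matched: keep the pair only if they agree
        return sample_read1 if sample_read1 == sample_read2 else None
    if strict:
        # S=true: a single match is not enough
        return None
    # S=false: a single match is enough
    return sample_read1 if sample_read1 is not None else sample_read2

print(resolve_redundant("sample1", None, strict=False))
print(resolve_redundant("sample1", None, strict=True))
print(resolve_redundant("sample1", "sample2", strict=False))
```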
Options about barcode matching:
- How many mismatches (MM option)
- Minimum quality for the base (Q option)
- Minimum mismatch delta with the second best match (when mismatches are present, MMD option)
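
How these three options interact can be illustrated with a minimal Python sketch. It assumes phred + 33 encoded qualities and the rules given in the option descriptions (a base below Q counts as a mismatch, the best barcode needs at most MM mismatches and must beat the second best by at least MMD); the function names are hypothetical and the actual Jemultiplexer implementation may differ in its details.

```python
def count_mismatches(barcode, bases, quals, min_q):
    """Count mismatching or low-quality positions (qualities are phred + 33)."""
    mismatches = 0
    for expected, base, qchar in zip(barcode, bases, quals):
        quality = ord(qchar) - 33
        if base != expected or quality < min_q:
            mismatches += 1
    return mismatches

def best_barcode(barcodes, bases, quals, mm=1, mmd=1, min_q=10):
    """Return the matching barcode, or None when no barcode qualifies."""
    scored = sorted((count_mismatches(bc, bases, quals, min_q), bc) for bc in barcodes)
    if not scored:
        return None
    best_count, best = scored[0]
    if best_count > mm:
        return None            # too many mismatches (MM)
    if len(scored) > 1 and scored[1][0] - best_count < mmd:
        return None            # second best barcode is too close (MMD)
    return best

barcodes = ["ATATAT", "GACGAC"]
print(best_barcode(barcodes, "ATATAA", "IIIIII"))   # one mismatch, high quality
print(best_barcode(barcodes, "ATATAT", "IIII##"))   # two low-quality bases
```

In the second call the bases match perfectly, but the two low-quality positions count as mismatches and push the barcode over the MM=1 limit.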
Options about read processing:
- Should the barcode be removed (C option)?
- Should extra bases at beginning (XT option) and/or end (ZT option) of the read be removed?
- => important to control read length (e.g. when the barcode is at one end only)
- => these values can be different for both ends using the syntax e.g. ZT=2:5 i.e. trim 2 and 5 bases from the end of read 1 and read 2, respectively.
Options about input and output:
- Allow the user to give all output file names and paths; these names/paths should be provided in the barcode file (extra columns)
- Allow to read and write in compressed (gzip) format to save space (default)
- MD5 file can be generated (CREATE_MD5_FILE=true)
Jemultiplexer USE_EMBASE run mode
This mode is available from version 1.0.3
In this running mode, simply call Jemultiplexer on the lane file(s) and Jemultiplexer will use information stored in emBASE to demultiplex and place the demultiplexed files directly in your group NGS library (an example of such a group library is shown below and described here).
For example:
> java -Xmx4g -jar jemultiplexer.jar F1=/path/file1.txt.gz USE_EMBASE=true
- No barcode file : barcodes are fetched from emBASE
- File location is used to look up emBASE
- User calling Jemultiplexer is used for authentication and rights
- Demultiplexed files are named according to a pre-defined naming scheme and placed directly where they belong => emBASE will automatically know about them
- Demultiplexing options right at GCBridge form validation
Comparison with other demultiplexers
Single end mode : fastx barcode splitter
- Matched options for comparison
- Exactly the same results
Paired-end mode : fastq-multx barcode splitter
- Matched options for comparison
- Can only compare in the ‘standard situation’ (one sample == one barcode)
- Exactly the same results
Detailed Usage
Jemultiplexer:
Jemultiplexer : A fastq files demultiplexer with many neat options. Input fastq files can be in gzip compressed format (end in .gz). By default output files are gzipped and have names following the pattern 'samplename_barcode-barcode2...-barcodeN_2.txt.gz' unless you gave file names to use within the barcode description file. See 'Usage' section below for all other option defaults
Usage:
java -Xmx2g -jar jemultiplexer.jar F1=fastq_1.txt.gz F2=fastq_2.txt.gz BF=barcodes.bs O=/Users/girardot/Desktop/test-jemultiplexer2/ BPOS=BOTH MM=1 MMD=1 Q=10 XT=1 ZT=0 DIAG=diags.txt UN=true GZ=true CREATE_MD5_FILE=true &
Version: 1.0.4
Option | Description |
---|---|
FASTQ_FILE1=File [ short name : F1 ] | Input fastq file (optionally gzipped) for single end data, or first read in paired end data.
Required. |
FASTQ_FILE2=File [ short name : F2 ] | Input fastq file (optionally gzipped) for the second read of paired end data.
Default value: null. |
INDEX_FILE1=File [ short name : I1 ] | Fastq file for index 1 (optionally gzipped) i.e. when barcodes are read with specific primers (the Illumina way basically).
When using INDEX_FILE, **sample encoding** barcodes cannot be found within the reads anymore (BM option set to NONE automatically, see below). Nevertheless, barcodes (for e.g. molecular barcoding or for any purpose other than sample encoding) can be still found in the read(s) and the BPOS option is then used to indicate the presence of such barcode(s). While the options Q, MM, MMD, S, BRED apply to the INDEX_FILE barcodes; the options XT, ZT, CLIP will always apply to barcodes found in reads only. Default value: null. |
INDEX_FILE2=File [ short name : I2 ] | Fastq file for index 2 (optionally gzipped) i.e. when barcodes are read with specific primers (the Illumina way basically).
A INDEX_FILE1 MUST be provided when INDEX_FILE2 is given. This situation corresponds to using 2 extra specific primers for index sequencing (not clear whether this even exists right now!). When using INDEX_FILE, **sample encoding** barcodes cannot be found within the reads anymore (BM option set to NONE automatically, see below). Nevertheless, barcodes (for e.g. molecular barcoding or for any purpose other than sample encoding) can be still found in the read(s) and the BPOS option is then used to indicate the presence of such barcode(s). While the options Q, MM, MMD, S, BRED apply to the INDEX_FILE barcodes; the options XT, ZT, CLIP will always apply to barcodes found in reads only. Default value: null. |
BARCODE_FILE=File [ short name : BF ] | Barcode file describing sequence list and sample names. Tab-delimited file with 2 columns, with the sample in col1 and the corresponding barcode in col2.
Simple barcode file format : 2 tab-delimited columns. If multiple barcodes map to the same sample, either line can be duplicated e.g. sample1 ATAT sample1 GAGG sample2 CCAA sample2 TGTG Or barcodes can be combined using the OR operator '|' i.e. the file above can be re-written like sample1 ATAT|GAGG sample2 CCAA|TGTG Finally, for the special situation of paired-end data in which barcodes differ at both ends (ie BPOS=BOTH BRED=false BM=BOTH , see BRED option description), barcodes for read_1 and read_2 can be distinguished using a ':' separator i.e. sample1 ATAT:GAGG sample2 CCAA:TGTG Here understand that sample 1 is encoded with ATAT barcode at read_1 AND GAGG barcode at read_2. Note that you can still combine barcodes using | e.g. sample1 ATAT|GAGG:CCAA|TGTG would mean that sample 1 is mapped by the combination of barcode: ATAT OR GAGG at read_1 AND CCAA OR TGTG at read_2. Extended barcode file format : 3 (single-end) or 4 (paired-end) tab-delimited columns. Same as the simple barcode file format but the extra columns contain the file name(s) to use to name output files : only one extra column for single-end and 2 extra columns for paired-end. In case lines are duplicated (multiple barcodes mapping the same sample), the same file name should be indicated in the third (and fourth) column(s). sample1 ATAT spl1_1.txt.gz spl1_2.txt.gz sample1 GAGG spl1_1.txt.gz spl1_2.txt.gz sample2 CCAA spl2_1.txt.gz spl2_2.txt.gz Or sample1 ATAT|GAGG:CCAA|TGTG spl1_1.txt.gz spl1_2.txt.gz Ns in barcode sequence are allowed and are used to flag positions that should be ignored in sample matching i.e. they will be clipped off the read sequence (like in iCLIP protocol). If you need to get access to the nature of these positions in downstream processing steps (e.g. PCR dups elimination), consider enabling the METRICS_FILE_NAME option. Required. Cannot be used in conjunction with option(s) USE_EMBASE (EM) |
USE_EMBASE=Boolean [ short name : EM ] | Tells Jemultiplexer to fetch information from emBASE and place demultiplexed files directly in emBASE repository structure.
This option is mutually exclusive with BARCODE_FILE. This option cannot be used when using INDEX_FILE. Note that using this option forces O=null GZ=true UN=true UF1=null UF2=null STATS_ONLY=false (all other user options supported). Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} Cannot be used in conjunction with option(s) BARCODE_FILE (BF) |
BARCODE_READ_POS=BarcodePosition [ short name : BPOS ] | For paired-end data, where to expect the barcode(s) : READ_1 (beginning of read from FASTQ_FILE_1), READ_2 (beginning of read from FASTQ_FILE_2), BOTH (beginning of both reads).
Automatically set to READ_1 in single end mode. If INDEX_FILE option(s) is/are used, BPOS is then used to indicate the presence of additional barcode(s) in the read(s). Note that by default, BPOS=NONE is assumed when INDEX_FILE option(s) is/are used. !! Importantly, these additional barcodes must not encode sample identity information (BM is forced to NONE) but are used for e.g. molecular barcoding or for any purpose other than sample encoding. Default value: BOTH. This option can be set to 'null' to clear the default value. Possible values: {READ_1, READ_2, BOTH, NONE} |
REDUNDANT_BARCODES=Boolean [ short name : BRED ] | For paired-end data with BARCODE_READ_POS == BOTH, or both INDEX_FILE1 and INDEX_FILE2 are provided ;
this option indicates if both read's barcodes encode redundant information, which is the usual situation (REDUNDANT_BARCODES=true) i.e. barcodes are supposed to be the same at both ends (or in both INDEX_FILE) or to resolve to the same sample (when a pool of barcodes has been used for each sample). When REDUNDANT_BARCODES=false (and INDEX_FILE are NOT used, see next section when using two INDEX_FILE), the 2 barcodes potentially encode different information. For example, only one of the barcodes encodes the sample the read belongs to while the second barcode might be a random barcode to tell apart PCR artefacts from real duplicates. Another example is when both barcodes should be used in a combined fashion to resolve the sample. In the first example, you should use BPOS=BOTH BRED=false BM=READ_1. In the second example, you should have BPOS=BOTH BRED=false BM=BOTH. Note that with BPOS=BOTH BRED=true BM=BOTH, the behavior would be different as Jemultiplexer would then check the STRICT option to perform sample resolution. Importantly, when BARCODE_READ_POS == BOTH AND REDUNDANT_BARCODES=false, BLEN, barcode matching options (MM, MMD, Q) and read trimming/clipping options (XT, ZT) accept different values for both barcodes in the form X:Z where X and Z are 2 integers. When using 2 INDEX_FILE and REDUNDANT_BARCODES=false, BPOS and BM are ignored (indeed, BPOS is used to indicate the presence of extra barcoding sequences in reads, and BM=NONE), either BRED=true and the S option will guide the sample lookup behavior or BRED=false meaning that both barcode should be combined prior to sample lookup. When BRED=false, barcode matching options (MM, MMD, Q) accept different values for both barcodes in the form X:Z where X and Z are 2 integers (like when not using INDEX_FILE). BUT BLEN, read trimming/clipping options (XT, ZT) then apply to extra barcodes (BPOS!=NONE) if present (or are irrelevant when BPOS=NONE). Default value: true. 
This option can be set to 'null' to clear the default value. Possible values: {true, false} |
BARCODE_FOR_SAMPLE_MATCHING=BarcodePosition [ short name : BM ] | Indicates which barcode(s) should be used for sample lookup -- this option is irrelevant when using INDEX_FILE option(s).
Automatically set to READ_1 in single end mode. For paired-end data and when BARCODE_READ_POS == BOTH, which barcode should be used to resolve sample : - use BM=READ_1 (beginning of read from FASTQ_FILE_1) if only this read should be used for sample matching, - use BM=READ_2 (beginning of read from FASTQ_FILE_2) if only this read should be used for sample matching, - use BM=BOTH (beginning of both reads) if both should be used ; when BM=BOTH, the behaviour of Jemultiplexer is different based on the value of REDUNDANT_BARCODES. If REDUNDANT_BARCODES=true, the two barcodes are considered to map to the same sample and Jemultiplexer uses the two barcodes according to the STRICT value. If REDUNDANT_BARCODES=false, the barcode file should map a couple of barcode to each sample (e.g. sample1 => AGAGTG:TTGATA) and Jemultiplexer needs both barcodes to find the relevant sample. Note that this is the only situation in which all barcode matching options (MM, MMD, Q) accept different values for both barcodes in the form X:Z where X and Z are 2 integers. !! Important !! . Default value: BOTH. This option can be set to 'null' to clear the default value. Possible values: {READ_1, READ_2, BOTH, NONE} |
STRICT=Boolean [ short name : S ] | For paired-end data and when BARCODE_READ_POS == BOTH and BM=BOTH and BRED=true (or when using 2 INDEX_FILE with BRED=true),
tells whether both barcodes should resolve to the same sample. When true and if only one of the two reads has a barcode match, the read pair is ignored. When false and if only one of the two reads has a barcode match, the read pair is assigned to the corresponding sample ; in cases where reads resolve to different samples, the read pair is ignored. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
BCLEN=String [ short name : LEN ] | Length of the barcode sequences, optional. Taken from barcode file when not given.
In situations where BARCODE_READ_POS == BOTH AND REDUNDANT_BARCODES=false, two distinct lengths can be provided using the syntax LEN=X:Z where X and Z are 2 integers representing the barcode length for read_1 and read_2 respectively. IMPORTANT : when using INDEX_FILE, this MUST be given if BPOS!=NONE as BLEN then indicates the length of the non sample encoding barcodes found in reads. If BPOS=NONE, this option is otherwise totally ignored when using sample index files. Default value: null. |
MAX_MISMATCHES=String [ short name : MM ] | Maximum mismatches for a barcode to be considered a match. MM=null is like MM=0
In situations where both barcodes are used for sample matching i.e. BPOS=BOTH BM=BOTH (or 2 INDEX_FILE given) (note that most likely BRED=false as it does not make great sense otherwise), two distinct values can be given here using the syntax MM=X:Z where X and Z are 2 integers to use for read_1 and read_2 respectively. Default value: 1. This option can be set to 'null' to clear the default value. |
MIN_MISMATCH_DELTA=String [ short name : MMD ] | Minimum difference between number of mismatches in the best and second best barcodes for a barcode to be considered a match. MMD=null is like MMD=0
In situations where both barcodes are used for sample matching i.e. BPOS=BOTH BM=BOTH (or 2 INDEX_FILE given) (note that most likely BRED=false as it does not make great sense otherwise), two distinct values can be given here using the syntax MMD=X:Z where X and Z are 2 integers to use for read_1 and read_2 respectively. Default value: 1. This option can be set to 'null' to clear the default value. |
MIN_BASE_QUALITY=String [ short name : Q ] | Minimum base quality. Any barcode bases falling below this quality will be considered a mismatch even if the bases match. Q=null is like Q=0.
In situations where both barcodes are used for sample matching i.e. BPOS=BOTH BM=BOTH (or 2 INDEX_FILE given) (note that most likely BRED=false as it does not make great sense otherwise), two distinct values can be given here using the syntax Q=X:Z where X and Z are 2 integers to use for read_1 and read_2 respectively. Default value: 10. This option can be set to 'null' to clear the default value. |
XTRIMLEN=String [ short name : XT ] | Extra number of bases to be trimmed right after the barcode (only used if CLIP_BARCODE=true). Default is 1 as an extra 'T' (or 'A' depending how you see it) is added for barcode ligation but this default will be adapted according to the rules below unless using INDEX_FILE with BPOS=NONE, in which case XT=0.
When running paired-end, two distinct values can be given using the syntax XT=X:Z where X and Z are 2 integers to use for read_1 and read_2 respectively. Note that even when BPOS=READ_1 or BPOS=READ_2, a X:Z syntax can be given to trim the read w/o barcode so as to end up with reads of the same length (note that this can also be done using ZT). If a unique value is given, e.g. XT=1, while running paired-end the following rule applies : (1) BPOS=READ_1 or BPOS=READ_2, no trim is applied at the read w/o barcode ; (2) BPOS=BOTH, the value is used for both reads. Note that XT=null is like XT=0. Default value: 1. This option can be set to 'null' to clear the default value. |
ZTRIMLEN=String [ short name : ZT ] | Extra number of bases to be trimmed from the read end i.e. 3' end. Pretty handy when a pipeline is set and you already know you'll trim read at a given size. ZT=null is like ZT=0.
When running paired-end, two distinct values can be given here using the syntax ZT=X:Z where X and Z are 2 integers to use for read_1 and read_2 respectively. Note that even when BPOS=READ_1 or BPOS=READ_2, a X:Z syntax can be given to trim the read w/o barcode so as to end up with reads of the same length (note that this can also be done using XT). Note that if a single value is passed, the value always applies to both reads in paired-end mode without further consideration. Default value: 0. This option can be set to 'null' to clear the default value. |
CLIP_BARCODE=Boolean [ short name : C ] | Remove barcode sequence from read, as well as XTRIMLEN (and ZTRIMLEN) bases if applicable, before writing to output file.
If false, reads are written without modification to output file. Apply to both barcodes when BPOS=BOTH. When using INDEX_FILE, only applies to barcodes found in read(s). Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
ADD_BARCODE_TO_HEADER=Boolean [ short name : ADD ] | Add matched barcode at the end of the read header. Apply to both barcodes when BPOS=BOTH.
If true, the string ':barcode' is added at the end of the read header with a ':' added only if current read header does not end with ':'. If both reads of the pair have a barcode (i.e. BARCODE_READ_POS == BOTH), then the second read also has its own matched barcode written. Else, the read without a barcode receives the barcode from the barcoded read. For example : '@D3FCO8P1:178:C1WLBACXX:7:1101:1836:1965 2:N:0:' becomes '@D3FCO8P1:178:C1WLBACXX:7:1101:1836:1965 2:N:0:BARCODE'
In the very special situation of barcodes containing random positions, i.e. 'N', (for example like in the iCLIP protocol), the added sequence is the clipped sequence and NOT the matched barcode. When using INDEX_FILE(s), the rule is the following: First the sample encoding barcodes from I1 (and I2 when relevant) are added to the barcode I1:I2 like '@D3FCO8P1:178:C1WLBACXX:7:1101:1836:1965 2:N:0:I1_BARCODE:I2_BARCODE' Then, if BPOS!=NONE, the sequence clipped from the read(s) are added to their own header, like '@D3FCO8P1:178:C1WLBACXX:7:1101:1836:1965 2:N:0:I1_BARCODE:I2_BARCODE:CLIPPED_SEQ_FROMREAD' Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
READ_NAME_REPLACE_CHAR=String [ short name : RCHAR ] | Replace spaces in read name/header using provided character. This is particularly handy when you need to retain ADDed barcode in read name/header during mapping (everything after space in read name is usually clipped in BAM files). For example, with RCHAR=':' :
'@D3FCO8P1:178:C1WLBACXX:7:1101:1836:1965 2:N:0:' becomes '@D3FCO8P1:178:C1WLBACXX:7:1101:1836:1965:2:N:0:BARCODE' Default value: null. |
QUALITY_FORMAT=FastqQualityFormat [ short name : V ] | A value describing how the quality values are encoded in the fastq. Either 'Solexa' for pre-pipeline 1.3 style scores (solexa scaling + 66), 'Illumina' for pipeline 1.3 and above (phred scaling + 64) or 'Standard' for phred scaled scores with a character shift of 33. If this value is not specified (or 'null' is given), the quality format will be detected automatically.
Default value: Standard. This option can be set to 'null' to clear the default value. Possible values: {Solexa, Illumina, Standard} |
METRICS_FILE_NAME=String [ short name : M ] | File name where to write demultiplexing statistics. Either a name (in which case the file will be created in the output dir) or full path.
Default value: jemultiplexer_out_stats.txt. This option can be set to 'null' to clear the default value. |
BARCODE_DIAG_FILE=String [ short name : DIAG ] | Name for a barcode match reporting file (not generated by default). Either a name (in which case the file will be created in the output dir) or full path. This file will contain a line per read pair with the barcode best matching the read subsequence or 'null' when no match is found according to matching parameters ; and the final selected sample. This file is useful for debugging or further processing in case both ends are barcoded.
Default value: null. |
OUTPUT_DIR=File [ short name : O ] | Where to write output files. By default, these are written in running directory (see default value).
Default value: /Users/girardot/Work/eclipse_ws/jemultiplexer. This option can be set to 'null' to clear the default value. |
FORCE=Boolean [ short name : ] | Tells Jemultiplexer to overwrite files when already existing.
Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
GZIP_OUTPUTS=Boolean [ short name : GZ ] | Compress output s_l_t_barcode.txt files using gzip and append a .gz extension to the filenames.
Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
KEEP_UNASSIGNED_READ=Boolean [ short name : UN ] | Should un-assigned reads be saved in files or simply ignored. File names will be automatically created or can be given using UF1 & UF2 options. Default names are unassigned_1.txt and unassigned_2.txt for F1 and F2 respectively.
Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
UNASSIGNED_FILE_NAME_1=String [ short name : UF1 ] | Name of the file in which to write unassigned reads from FILE1. Either a name (in which case the file will be created in the output dir) or full path.
Default value: unassigned_1.txt. This option can be set to 'null' to clear the default value. |
UNASSIGNED_FILE_NAME_2=String [ short name : UF2 ] | Name of the file in which to write unassigned reads from FILE2. Either a name (in which case the file will be created in the output dir) or full path.
Default value: unassigned_2.txt. This option can be set to 'null' to clear the default value. |
WRITER_FACTORY_USE_ASYNC_IO=Boolean [ short name : ASYNC ] | Use one thread per Fastq Writer.
Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
STATS_ONLY=Boolean [ short name : ] | Only produces metric and diagnostic reports i.e. no output fastq file produced.
Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} |
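
The header rewriting described for the ADD_BARCODE_TO_HEADER and READ_NAME_REPLACE_CHAR options can be mimicked in a few lines of Python. This is a sketch of the documented behavior for the simple case of a single barcode, not the code used by Jemultiplexer; the function name `add_barcode_to_header` is an assumption.

```python
def add_barcode_to_header(header, barcode, rchar=None):
    """Append ':barcode' to a FASTQ read header (hypothetical helper).

    A ':' is only added when the header does not already end with one.
    When rchar is given, every space in the header is replaced by it,
    as the RCHAR option is described to do.
    """
    if not header.endswith(":"):
        header += ":"
    header += barcode
    if rchar is not None:
        header = header.replace(" ", rchar)
    return header

header = "@D3FCO8P1:178:C1WLBACXX:7:1101:1836:1965 2:N:0:"
print(add_barcode_to_header(header, "BARCODE"))
print(add_barcode_to_header(header, "BARCODE", rchar=":"))
```

The two calls reproduce the two header examples given in the option descriptions above.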