BridgeIntegrateExternalData
This page explains how to use the GCBridge with data that was not generated by GeneCore.
Since GCBridge was written with GeneCore conventions in mind, the procedure presented here describes how to make the bridge treat the data as if it came from GeneCore.
1. Create a new directory "YYYY-MM-DD-FLOWCELLID" for the flowcell to be transferred
- this directory must be named "YYYY-MM-DD-FLOWCELLID", e.g. "2019-05-06-H5YCHBGXB"
- it must be placed in the "genecore_transfer" directory found in your NGS library folder, e.g. /g/
/NGS_Data_Repository/genecore_transfer/
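If you prepare several flowcells, the directory name can be built programmatically. The following is a minimal sketch, not part of GCBridge; the function name `flowcell_dir_name` is hypothetical, and this page does not say which date to use, so the date is simply passed in.

```python
from datetime import date


def flowcell_dir_name(day: date, flowcell_id: str) -> str:
    """Return the 'YYYY-MM-DD-FLOWCELLID' directory name for a flowcell."""
    # The date is zero-padded so that months and days always take two digits.
    return f"{day:%Y-%m-%d}-{flowcell_id}"


print(flowcell_dir_name(date(2019, 5, 6), "H5YCHBGXB"))  # 2019-05-06-H5YCHBGXB
```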
2. Move all the fastq files of the flowcell to be transferred into this new directory
It is important to process all the fastq files of all the lanes at once (for your group), so make sure your colleagues give you their files from this flowcell.
3. Rename all the fastq files according to our file naming convention
- For paired-end :
FLOWCELL-ID_SAMPLENAME_XXsXXX-1-1_OWNER_laneXZZZZZZZZ_1|2_sequence.txt.gz
- For single-end :
FLOWCELL-ID_SAMPLENAME_XXsXXX-1-1_OWNER_laneXZZZZZZZZ_sequence.txt.gz
with :
- FLOWCELL-ID :
is the flowcell ID, e.g. as found in the third slot of the fastq read header. The example below contains the sequencer serial number (M00724), the flowcell ID (000000000-BM4P9) and the lane number (1):
@M00724:53:000000000-BM4P9:1:1101:15471:1334 1:N:0:11
- SAMPLENAME :
is the sample (or assay) name, made of numbers, letters and '-' only (any other character will not work)
- XXsXXX-1-1 :
is the GeneCore sample ID that you need to fake, where X are numbers only, i.e. 2 numbers (the year) followed by 's' followed by 1 to n numbers, then '-', then one number, then '-', then one number; e.g. 19s0000-1-1 would be accepted
- OWNER :
is the user name (only letters); the best here is to use the unix username
- laneXZZZZZZZZ :
X is the lane number and ZZZZZZZZ is like SAMPLENAME; this is because when data was multiplexed, the SAMPLENAME contains the same "assay name" for all multiplexed samples in this lane, so one can use the ZZZZZZZZ to inject the sample name for this demultiplexed file
- 1|2 :
only for paired-end data, the file name should state either '_1' or '_2'
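Before transferring, it can help to check the renamed files against the convention. The sketch below is one possible reading of the rules above, not the check GCBridge itself performs. The names `FASTQ_NAME`, `parse_header` and `parse_fastq_name` are hypothetical. It assumes that the flowcell ID and the ZZZZZZZZ part use the same characters as SAMPLENAME (letters, numbers, '-') and that the lane number is a single digit.

```python
import re

# Assumed reading of the naming convention; GCBridge may be stricter or looser.
FASTQ_NAME = re.compile(
    r"(?P<flowcell>[A-Za-z0-9-]+)_"        # FLOWCELL-ID
    r"(?P<sample>[A-Za-z0-9-]+)_"          # SAMPLENAME: letters, numbers, '-'
    r"(?P<sample_id>\d{2}s\d+-\d-\d)_"     # faked GeneCore ID, e.g. 19s0000-1-1
    r"(?P<owner>[A-Za-z]+)_"               # OWNER: letters only
    r"lane(?P<lane>\d)(?P<suffix>[A-Za-z0-9-]+)"  # laneX followed by ZZZZZZZZ
    r"(?:_(?P<read>[12]))?"                # _1 or _2, paired-end only
    r"_sequence\.txt\.gz"
)


def parse_header(header_line):
    """Return (serial, flowcell_id, lane) from a fastq read header line."""
    # Drop the part after the space, then the leading '@', then split the slots.
    fields = header_line.split()[0].lstrip("@").split(":")
    return fields[0], fields[2], int(fields[3])


def parse_fastq_name(name):
    """Return the named parts of a file name, or None if it does not match."""
    match = FASTQ_NAME.fullmatch(name)
    return match.groupdict() if match else None
```

For a single-end file the `read` entry is `None`; a name containing a character outside the allowed set (for example an underscore inside the sample name) returns `None`.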
4. For each lane, create a lane information file "lane_info_X.txt"
- The name must be like "lane_info_X.txt" where X is the lane number e.g. "lane_info_1.txt"
- The text file must contain the following key=value lines :
- useremail=xx@embl.de (where xx is the unixname of the data owner)
- paired=0 or 1 (tells if the lane was run in paired-end mode, with 0 for false and 1 for true)
- multiplexed=0 or 1 (tells if the lane was multiplexed, with 0 for false and 1 for true)
- demultiplexed=0 or 1 (when multiplexed, tells if the lane has been demultiplexed already, with 0 for false and 1 for true)
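The lane information file can also be written by a small script. The following is a sketch only: the function `write_lane_info` is hypothetical, and it assumes one key=value pair per line with no spaces around '=', which this page does not state explicitly.

```python
from pathlib import Path


def write_lane_info(directory, lane, unixname, paired, multiplexed, demultiplexed):
    """Write lane_info_<lane>.txt with the four key=value lines and return its path."""
    lines = [
        f"useremail={unixname}@embl.de",
        f"paired={int(paired)}",                # True -> 1, False -> 0
        f"multiplexed={int(multiplexed)}",
        f"demultiplexed={int(demultiplexed)}",
    ]
    path = Path(directory) / f"lane_info_{lane}.txt"
    path.write_text("\n".join(lines) + "\n")
    return path
```

For example, `write_lane_info(".", 1, "jdoe", True, True, False)` creates "lane_info_1.txt" in the current directory for a paired-end, multiplexed lane that has not been demultiplexed yet ("jdoe" is a made-up unix name).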
5. Make sure the new directory and all its content have the unix permissions 777
- needed so our daemons can move and delete the data
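- you can do this with the unix command (run from the "genecore_transfer" directory, using the name of your new directory)
> chmod -R 777 YYYY-MM-DD-FLOWCELLID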
6. Make sure the sequencer exists in embase.embl.de
In case the sequencer is not available, please contact the embase admin (gbcs) with the following information :
- sequencer serial number (see example above)
- sequencer type e.g. Illumina HiSeq 4000
- the location of this sequencer (the service provider name)
7. Tell the bridge the folder is ready for transfer (touch dir_ready_for_gbc_transfer)
- create an empty file named "dir_ready_for_gbc_transfer"
- this can be done with the unix command
> touch dir_ready_for_gbc_transfer
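Steps 5 and 7 can be combined in a short script. This is a minimal sketch under stated assumptions: the function `mark_ready` is hypothetical, the marker file is assumed to belong inside the flowcell directory, and it is assumed that the marker file should get the same 777 permissions as everything else.

```python
from pathlib import Path

MARKER = "dir_ready_for_gbc_transfer"


def mark_ready(flowcell_dir):
    """Open up permissions on the directory tree, then create the marker file."""
    root = Path(flowcell_dir)
    # Step 5: same effect as `chmod -R 777` on the directory and all its content.
    root.chmod(0o777)
    for path in root.rglob("*"):
        path.chmod(0o777)
    # Step 7: same effect as `touch dir_ready_for_gbc_transfer`.
    marker = root / MARKER
    marker.touch()
    marker.chmod(0o777)  # assumption: the marker should also be open to the daemons
    return marker
```

Creating the marker last means the bridge only sees the directory as ready once the permissions are already in place.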