GCBridgeNextSteps
Your data has been successfully submitted through the GCBridge? You wonder what to do next?
Here is the page you are looking for!
Pick Your Situation ...
- I received an email saying "Demultiplexed files available in your group NGS library..."
- I received an email saying "Your NGS data has been processed and is available in emBASE"
- How can I easily find where my files are located in the NGS library?
I received an email saying "Demultiplexed files available in your group NGS library..."
1. Check the results of the demultiplexing
=> The number of reads assigned to each library should not be too unbalanced, and the number of unassigned reads should not be too high (generally below 10%, but this is only a rough guide). Keep reading to learn where to find these numbers...
Locate the line in the email stating "Base directory of demultiplexed files"; this is the base folder containing your demultiplexed files.
This folder contains a sub-folder named "stats" (see picture below). It holds two files:
- The first has a name ending with "...jemultiplexing_metrics.txt" (this is the file boxed in green in the picture below): it contains the number of reads processed, the number of reads assigned to each barcoded library and the number of reads that could not be assigned.
- The second has a name ending with "..._galaxy_barcodes.txt" (this is the file boxed in red in the picture below):
- it lists the barcodes used for demultiplexing, i.e. as provided by you (and colleagues) in the GCBridge form
- it also lists the exact path to the demultiplexed files.
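The check described in step 1 can be sketched in shell as follows. This is only an illustration: the function names are hypothetical, and because the layout of the metrics file is not described on this page, the two read counts are passed by hand after you have read them from the file. Only the two file-name endings come from the text above.

```shell
# Hypothetical helpers; only the two file-name endings come from this page.

# Print the report files found in the "stats" sub-folder of a base directory
# (the "Base directory of demultiplexed files" given in the email).
find_stats_files() {
    for f in "$1"/stats/*jemultiplexing_metrics.txt "$1"/stats/*_galaxy_barcodes.txt; do
        # An unmatched pattern stays as literal text, so keep existing files only.
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# Print the percentage of unassigned reads with one decimal.
# $1 = total number of reads processed, $2 = number of unassigned reads.
unassigned_pct() {
    awk -v total="$1" -v unassigned="$2" \
        'BEGIN { if (total <= 0) exit 1; printf "%.1f\n", 100 * unassigned / total }'
}
```

For example, `unassigned_pct 2000000 150000` prints 7.5, which is under the indicative 10% limit.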
2. What should I do if the demultiplexing numbers look weird?
- Check the barcodes you gave (these are listed in the red boxed file above). If you spot mistakes, here is what you must do:
- Log in to emBASE, navigate to the NGSAssay for this lane and edit each library that has a wrong barcode
- Send an email request to GBCS (gbcs_request@embl.de) re-stating the full path to the lane file that needs to be reprocessed (this path is found in the demultiplexing email you received, i.e. the one with a title like "Demultiplexed files available in your group NGS library..."; you can also simply forward this email in the request)
- Check the demultiplexing options used. If you spot a mistake or are unsure, come and talk to us. You can always ask us to run the demultiplexing again with new options by sending us a request (gbcs_request@embl.de) re-stating the full path to the lane file that needs to be reprocessed.
I received an email saying "Your NGS data has been processed and is available in emBASE"
You receive this email when the data has been loaded into emBASE (you can then see it when you log in) and into Galaxy (if you asked for it), i.e.:
- you can start working with the data either in Galaxy or using the command line
- you can log in to emBASE and start annotating your samples and linking protocols to your samples, extracts and NGS libraries.
Note that if you requested demultiplexing, the demultiplexed files have not been created yet, and you need to wait until you get an email entitled "Demultiplexed files available in your group NGS library...". In this situation, you should NOT re-demultiplex the fastq lane file(s) in Galaxy. Please wait until the demultiplexed files are computed; they will be placed in Galaxy automatically for you.
How can I easily find where my files are located in the NGS library?
- This is stated in the emails you received... ok, that does not help after some time
- You can always navigate through the folders of your lab NGS Library if you know the date... ok, that does not really help if your lab has tons of NGS data
- Using the emBASE web interface :
- each lane is represented as an NGS Assay in the NGS Sequencing section. Use the filters at the top of the page to narrow down to the dataset you are looking for.
- Alternatively, locate the Experiment that contains the data you are looking for and navigate back to the NGS Assay using the Raw data set tab.
- I am a bioinformatician, I want a command line solution !
- Please use one of the GetNGSxxx command line utilities of the JemBASE API. You can fetch data file descriptions (including all sample annotations) by lane file path, flowcell ID and lane number, project or experiment name. Piping stdout to awk is an easy way to create symlinks to the repository data files, right from your project folder! No data dups!
- A full description of the tools can be found on this page.
- Here is how you can create symlinks to all files from a lane you just received:
java -jar /g/funcgen/gbcs-tools/embase-cmdline/GetNGSByFlowcellLane.jar -f C3WEMACXX -l 6 --split -n -c RBAFastqFiles | awk 'NR!=1{system("ln -s " $1)}'
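In the one-liner above, `NR!=1` makes awk skip the first line of the output and `$1` is the first column of each remaining line, which is passed to `ln -s`. The sketch below wraps the same idea in a small shell function with an optional dry run, so you can check the links before creating them. It assumes, as the one-liner does, that the first line is a header and that the first column holds the file path without spaces; the function name `link_listed_files` and the `DRY_RUN` variable are hypothetical and not part of the JemBASE tools.

```shell
# Hypothetical helper, not part of the JemBASE tools.
# Reads a listing on stdin, skips the header line and symlinks the file named
# in column 1 into the current directory. With DRY_RUN=1 it only prints the
# ln commands instead of running them.
link_listed_files() {
    awk 'NR != 1 && $1 != "" { print $1 }' | while IFS= read -r path; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            printf 'ln -s %s\n' "$path"
        else
            # Quoting the path avoids passing it through a second shell.
            ln -s -- "$path" .
        fi
    done
}
```

Run from your project folder, it would be used like this: `java -jar /g/funcgen/gbcs-tools/embase-cmdline/GetNGSByFlowcellLane.jar -f C3WEMACXX -l 6 --split -n -c RBAFastqFiles | link_listed_files`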