Galaxy QC Workflow Report Generator
The power of Galaxy shows when you use Workflows (WF): you can process many samples in parallel and name every output in a unique and consistent fashion. NFSTransfer is a Galaxy tool that copies results accumulating during WF execution to a predefined location outside the Galaxy space (e.g. on your file server). Combined with it, Galaxy WFs let you gather analysis results for all your samples under a predefined folder, with each result stored in a specific sub-folder and named in a systematic fashion.
GalaxyQCReport is a command line tool that parses outputs generated by Galaxy Workflows into an HTML meta report presenting all-samples summary tables and links to individual sample reports.
We developed GalaxyQCReport with flexibility in mind, so that we can easily:
- add support for a new Galaxy Tool (or adapt to different tool versions)
- adapt to different naming strategies
- adapt to different storage organizations
There are of course a number of expectations that must be fulfilled for GalaxyQCReport to understand how your results are organized.
When building your workflow, follow these rules for naming your result datasets and deciding where to copy them:
- All results to be parsed and assembled must be found under a "result root folder", whose absolute path is given as a command line argument
  - e.g. "/g/furlong/wf_results/"
- All results generated by a given Galaxy Tool (i.e. for all samples, and for all workflow steps if applicable) must be found in a tool folder below the "result root folder"
  - the property result.dirname gives the name of the folder where all the tool's results are found
  - e.g. "result.dirname=fastqc" would indicate that all FastQC results are found under "/g/furlong/wf_results/fastqc/"
  - this can also be a sub-path of the form "SPP/strand-cross-correlation"
- Results for a specific sample, generated by a specific tool at a specific workflow step, are found as one unique file or one unique directory
  - the property onedirpersample (=0|1) indicates whether the results generated by a specific tool come as a file (=0) or as a directory (=1)
- The sample result file or directory must obey a systematic naming convention which, at the minimum, includes the name of the sample it is related to
- When a tool is executed at different workflow steps
  - either all results of a particular step are placed in a sub-folder whose name is called the 'category'
    - e.g. all FastQC results gathered on all reads are found under "/g/furlong/wf_results/fastqc/all-reads"
    - e.g. all FastQC results gathered on filtered reads are found under "/g/furlong/wf_results/fastqc/no-multi_nodups-reads"
  - or the category name must be found in the result file or directory name (see below)
  - the property onedirpercategory (=0|1) indicates whether the results for a specific step are stored in their own directory (=1)
- The systematic naming should be provided in the form of two regular expressions
  - the sample.regex, with exactly one match group, is matched against the sample result file/directory name to extract the sample name
    - an optional "<CATEGORYNAME>" placeholder can be placed in the regex; it is replaced by a category name right before matching
    - e.g. "algmnt_metrics_<CATEGORYNAME>_(.+)_bowtie2"
  - the category.regex, with exactly one match group, is matched against the sample result file/directory name to extract the category name
    - an optional sample-name placeholder can be placed in the regex; it is replaced by a sample name right before matching
    - e.g. "algmnt_metrics_(.+)_" followed by the sample-name placeholder and "_bowtie2"
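
To make the rules above more concrete, here is a minimal Python sketch of how a tool folder could be located and how a sample or category name could be extracted from a result name. It is an illustration only, not GalaxyQCReport's actual code. It assumes the properties are available as a plain dictionary of strings and that onedirpercategory=1 means one sub-folder per category. It uses "<SAMPLENAME>" as a hypothetical name for the sample placeholder and does a substring search rather than a full match. The function names, the sample name "sampleA" and the result name are invented for the example.

```python
import re
from pathlib import PurePosixPath

# "<CATEGORYNAME>" comes from the rules above; "<SAMPLENAME>" is a
# hypothetical name chosen for this sketch.
CATEGORY_PLACEHOLDER = "<CATEGORYNAME>"
SAMPLE_PLACEHOLDER = "<SAMPLENAME>"


def tool_folder(result_root, props, category=None):
    """Return the folder expected to hold a tool's results.

    Assumes onedirpercategory == "1" means one sub-folder per category.
    """
    folder = PurePosixPath(result_root) / props["result.dirname"]
    if category is not None and props.get("onedirpercategory") == "1":
        folder = folder / category
    return folder


def extract(regex, name, placeholder=None, value=None):
    """Fill in the placeholder (if present), then return the single match group.

    Returns None when the name does not match.
    """
    if placeholder is not None and placeholder in regex:
        if value is None:
            raise ValueError(f"regex needs a value for {placeholder}")
        # Escape the value so that it is matched literally.
        regex = regex.replace(placeholder, re.escape(value))
    match = re.search(regex, name)
    return match.group(1) if match else None


# Hypothetical properties and result name, for illustration only.
fastqc_props = {"result.dirname": "fastqc", "onedirpercategory": "1"}
metrics_props = {
    "sample.regex": "algmnt_metrics_<CATEGORYNAME>_(.+)_bowtie2",
    "category.regex": "algmnt_metrics_(.+)_<SAMPLENAME>_bowtie2",
}
result_name = "algmnt_metrics_all-reads_sampleA_bowtie2"

print(tool_folder("/g/furlong/wf_results/", fastqc_props, "all-reads"))
print(extract(metrics_props["sample.regex"], result_name,
              CATEGORY_PLACEHOLDER, "all-reads"))
print(extract(metrics_props["category.regex"], result_name,
              SAMPLE_PLACEHOLDER, "sampleA"))
```

With these invented inputs, the three calls print "/g/furlong/wf_results/fastqc/all-reads", "sampleA" and "all-reads". Escaping the substituted value keeps characters such as "-" or "." in a category or sample name from being read as regex syntax.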