NGS Ecosystem
We aim to provide the unit with an integrated solution to store, manage and analyze NGS data.
To achieve this goal, we develop and/or maintain several applications that together form a coherent ecosystem.
emBASE is a web-based database to store and describe your microarray and NGS experiments. It is fully MIAME compliant and is a bridge to EBI ArrayExpress data submission, which is mandatory upon publication. In emBASE, your data remains private until you decide otherwise. emBASE facilitates finding the right data, even long after collaborators have left the lab. Access emBASE Server (if you don't have an emBASE account, contact us first).
Galaxy is a web application that lets you run reproducible data analyses through a user-friendly graphical interface. To ensure scalability, these analyses run on high-performance servers and super-computers (the EMBL cluster maintained by EMBL IT Services). Access Galaxy server and log in with your usual unix account.
GCBridge is an in-house component that automatically transfers your NGS data from the GeneCore servers to our NGS ecosystem. The NGS files (fastq) are copied to your group's emBASE-managed NGS library (this folder structure lives on your group file server) and registered in emBASE and Galaxy. The whole process is fully automated: when new data is available, you receive an email containing a link to the transfer validation form. Once you have submitted the form, all you have to do is wait for the confirmation that the data transfer succeeded.
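The copy step described above can be illustrated with a minimal Python sketch. This is not GCBridge's actual code: the function names `sha256sum` and `transfer_fastq`, the checksum verification, and the refusal to overwrite an existing file are all assumptions made for illustration, and registration in emBASE and Galaxy is left out entirely.

```python
import hashlib
import shutil
from pathlib import Path


def sha256sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_fastq(source: Path, library_dir: Path) -> Path:
    """Copy one FASTQ file into a library folder and verify the copy.

    Hypothetical helper: the real transfer logic is not documented here.
    """
    library_dir.mkdir(parents=True, exist_ok=True)
    target = library_dir / source.name
    # Assumption: files already in the library are never overwritten.
    if target.exists():
        raise FileExistsError(f"{target} is already in the library")
    shutil.copy2(source, target)
    # Compare checksums so a truncated or corrupted copy is detected.
    if sha256sum(source) != sha256sum(target):
        target.unlink()
        raise IOError(f"checksum mismatch after copying {source}")
    return target
```

Comparing checksums after the copy is one common way to confirm that a transfer succeeded before a confirmation is sent; the real component may use a different mechanism.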
Galaxy Workflows: we develop analysis pipelines that you can use to perform standard data analyses (QC, read mapping and filtering, peak calling, ...). These workflows are published in Galaxy. Some pipelines are also available as shell scripts, for example IDR analysis for ChIP-seq.
An RStudio Server instance runs on spinoza. Simply log in with your EMBL credentials after reading the documentation!
Your group NGS Library is a folder architecture that hosts all your NGS raw data (lane files, demultiplexed lane files) in a controlled fashion. The data files stored in this library are fully protected from deletion or renaming, which is essential to ensure data traceability. All members of your group have full read access to this data. The library is fully managed by emBASE: it is where emBASE stores the NGS data files that belong to your group. It is also where GCBridge (see above) copies data. Because the library is located on your tier1 group space, it is visible from all servers, so the data never needs to be duplicated: emBASE, Galaxy, the RStudio Server instances, the EMBL cluster and the spinoza/schroedinger servers can all access your NGS data.
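Because the library is readable from every server, an analysis script can work on the files in place instead of copying them. The Python sketch below shows one way to do this. The helper names `find_fastq` and `count_reads` are hypothetical, the real folder layout of the library is not described here, and the read count assumes standard four-line FASTQ records.

```python
import gzip
from pathlib import Path
from typing import List

# File name endings treated as FASTQ (an assumption for this sketch).
FASTQ_SUFFIXES = (".fastq", ".fq", ".fastq.gz", ".fq.gz")


def find_fastq(library_root: Path) -> List[Path]:
    """Return all FASTQ files below library_root, sorted by path."""
    return sorted(
        path
        for path in library_root.rglob("*")
        if path.is_file() and path.name.endswith(FASTQ_SUFFIXES)
    )


def count_reads(fastq: Path) -> int:
    """Count reads in a plain or gzipped FASTQ file, opened read-only.

    Assumes four lines per record (header, sequence, '+', qualities).
    """
    opener = gzip.open if fastq.name.endswith(".gz") else open
    with opener(fastq, "rt") as handle:
        line_count = sum(1 for _ in handle)
    return line_count // 4
```

Both helpers only read from the library, so they work with read-only access and leave the protected files untouched.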
Watch how our ecosystem works
- Real-life example: launch an analysis pipeline on dozens of samples in a few clicks
Learn: slides, videos
This presentation gives an overview of the whole system and of each individual component. It also contains a section on the very first practical operations users have to perform: GCBridge Data Transfer Form validation and sample/protocol annotation in emBASE.
Even better, watch Charles present these slides (cut into sub-parts for convenient viewing):