NGS Ecosystem

We aim to provide the unit with an integrated solution to store, manage and analyze NGS data.

To achieve this goal, we develop and/or maintain different applications in a coherent ecosystem.

emBASE is a web-based database to store and describe your microarray and NGS experiments. It is fully MIAME compliant and serves as a bridge to EBI ArrayExpress data submission, which is mandatory upon publication. In emBASE, your data remains private until you decide otherwise. emBASE makes it easier to find the right data, even long after collaborators have left the lab. Access the emBASE server (if you don't have an emBASE account, contact us first).

Galaxy is a web application that lets you perform reproducible data analyses in a user-friendly graphical interface. To ensure scalability, these analyses run on high-performance servers and super-computers (the EMBL cluster maintained by EMBL IT Services). Access the Galaxy server and log in with your usual Unix account.

GCBridge is an in-house component used to automatically transfer your NGS data from the GeneCore servers to our NGS ecosystem. The NGS files (fastq) are copied to your group's emBASE-managed NGS library (this folder structure lives on your group file server) and registered in emBASE and Galaxy. The whole process is fully automated: when new data is available, you receive an email containing a link to the transfer validation form. Once you have submitted the form, all you have to do is wait for the confirmation that the data transfer succeeded.

Galaxy Workflows: We develop analysis pipelines that you can use to perform standard data analysis (QC, read mapping & filtering, peak calling, ...). These workflows are published in Galaxy. Note that we also have pipelines available as shell scripts, such as IDR analysis for ChIP-seq.

RStudio Server instances run on both spinoza and schroedinger, on port 8080. Simply log in with your EMBL credentials.

Your group NGS Library is a folder architecture that hosts all your NGS raw data (lane files, demultiplexed lane files) in a controlled fashion. The data files stored in this library are fully protected from deletion or renaming, which is essential to ensure data traceability. All members of your group have full read access to this data. The library is fully managed by emBASE: it is where emBASE stores the NGS data files that belong to your group. It is also where GCBridge (see above) copies data. Because this library is located on your tier1 group space, it is visible from all servers, so the data never has to be duplicated: emBASE, Galaxy, the RStudio Server instances, the EMBL cluster and the spinoza/schroedinger servers can all access your NGS data.
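Because every group member has read access to the library, its content can be inspected directly from any server that mounts the group space. The following is only a minimal sketch of such a read-only inventory: the function name, the list of file suffixes and the example path are assumptions made for illustration, not part of emBASE or GCBridge, and the actual folder layout of your library may differ.

```python
from pathlib import Path

# Assumed FASTQ file suffixes; adjust to the naming used in your library.
FASTQ_SUFFIXES = (".fastq", ".fastq.gz", ".fq", ".fq.gz")


def list_fastq_files(library_root):
    """Return all FASTQ files found under library_root, sorted by path.

    The function only reads directory listings; it never writes to,
    renames or deletes anything in the library.
    """
    root = Path(library_root)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.name.endswith(FASTQ_SUFFIXES)
    )


if __name__ == "__main__":
    # Hypothetical location; replace with the real path of your group NGS library.
    for fastq in list_fastq_files("/path/to/your/group/ngs_library"):
        print(fastq)
```

Run on a server where the group space is mounted, the script prints one FASTQ path per line. For a path that does not exist it prints nothing.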

 

Learn : Slides, videos

This presentation gives an overview of the whole system and of each individual component. It also contains a section on the very first practical operations users have to perform: GCBridge Data Transfer Form validation and sample/protocol annotations in emBASE.

Even better, watch Charles present these slides (split into parts for convenient viewing):

Part 1: The Big Picture

Part 2: What is the data?

Part 3: emBASE

Part 4: NGS Data Library

Part 5: GeneCore Bridge

Part 6: Practical Steps

Part 7: Galaxy