Compute Environment

File Servers

IT Services provides a centrally managed data storage infrastructure for all groups at EMBL. A tiered storage model has been implemented with three categories of storage: Tier-1, Tier-2 and a Large Volume Archiving tier. EMBL groups can choose space from these storage tiers based on their individual needs for performance, availability, pricing, etc. If your group needs central disk space, please contact IT. More information at IT Service-Data Storage.

GBCS Application Servers

GBCS has two machines to run specific applications developed and/or maintained by the GBCS:

  • GBCS (CentOS 6.3): the production server, running PostgreSQL and MySQL with the applications Galaxy, embase, cellbase, etc.
  • GBCS-DEV (CentOS 6.5): the development machine, mainly used for Galaxy-related development. It currently also supports Docker.

Compute Cluster / Big-mem GB Servers

GB Unit big-mem Interactive Servers

1. seneca:

  • log in from within EMBL with e.g. ssh seneca
  • 64CPUs, 2TB RAM
  • local space in /tmpdata/
  • registered as a SLURM submitter, i.e. you can submit jobs directly to the cluster
  • /scratch is also available from there

2. schroedinger:

  • log in from within EMBL with e.g. ssh schroedinger
  • 40CPUs, 1024GB RAM
  • 10 TB of local space in /tmpdata/
  • registered as a SLURM submitter, i.e. you can submit jobs directly to the cluster
  • /scratch is also available from there

3. spinoza:

  • log in from within EMBL with e.g. ssh spinoza
  • 40CPUs, 1024GB RAM, CentOS6
  • 10 TB of local space in /tmp/
  • registered as a SLURM submitter, i.e. you can submit jobs directly to the cluster
  • /scratch is also available from there

EMBL LSF Cluster

EMBL Cluster Nodes

If you use the cluster, or intend to, you should subscribe to the clusterng@embl.de mailing list (send an email to clusterng-subscribe@embl.de). More information at IT Service-Computing.

SLURM Submission System

  • SLURM is workload management software that schedules batch jobs and distributes them across the cluster
  • To run jobs on the EMBL cluster, log into submaster, seneca, schroedinger or spinoza and use the "sbatch" command to submit jobs
  • For more information, please check IT-LSF and EMBL-cluster
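
As a minimal sketch of the submission step above, the following writes a small batch script and submits it with sbatch when that command is available. The script file name, job name and resource values are illustrative assumptions, not site recommendations; check the IT documentation for the actual limits and partitions.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical file name for the batch script (written to the current directory).
job_script="example_job.sh"

# Write a tiny job script. All #SBATCH values below are assumed example values.
cat > "$job_script" <<'EOF'
#!/usr/bin/env bash
#SBATCH --job-name=example
#SBATCH --time=00:10:00
#SBATCH --mem=4G
#SBATCH --cpus-per-task=1
#SBATCH --output=example_%j.out

# %j in the output file name above expands to the job ID.
echo "Running on $(hostname)"
EOF

# Submit only where SLURM is installed, e.g. on seneca, schroedinger or spinoza.
if command -v sbatch >/dev/null 2>&1; then
    sbatch "$job_script"
else
    echo "sbatch not found; run this on a SLURM submit host"
fi
```

Once submitted, `squeue -u "$USER"` lists your pending and running jobs.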

Storage issues

When working with huge data files, the question of where to store your input/output files becomes an issue. The only file servers accessible from the cluster nodes are the ones on tier1. Other storage options are available, however, and you should pick the right one for the situation. You could store everything on tier1, but this would cost a lot, and you should avoid having many processes access the same file concurrently (e.g. an index file). So what else can you do?

  1. Use the /tmp space of the cluster node to store intermediate files that you won't need later. You can make sure you have enough tmp space by requesting a minimum amount of tmp space in your job submission options.
  2. Use the 100 TB of space available under /scratch (check this IT post). The big advantages of /scratch: it is available from all cluster nodes, it supports parallel file access (FHFS), and it is also accessible from the seneca, schroedinger and spinoza GB servers.
  3. Use tier2 space (4 times cheaper than tier1!) to store your files. Remember that tier2 is also safe: it uses RAID6 redundancy, but there is no backup. You can then stage in your files at the beginning of your job using scp, i.e. you scp the needed data to tier1 space, local /tmp or /scratch.

See? Plenty of solutions not to waste your expensive tier1 space!
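
The stage-in idea from option 3 could be sketched as below. The function name stage_in and the demo paths are hypothetical; in a real job the source would be a file on tier2 (given as host:path for scp) and the destination would be /scratch or the node's local /tmp.

```shell
#!/usr/bin/env bash
set -euo pipefail

# In a SLURM batch script, "#SBATCH --tmp=<size>" requests a minimum amount
# of temporary disk space on the node.

# stage_in SRC DEST_ROOT
# Copy SRC into a fresh working directory under DEST_ROOT and print the
# path of the staged copy. (Hypothetical helper, not an EMBL-provided tool.)
stage_in() {
    local src="$1" dest_root="$2" workdir name
    workdir="$(mktemp -d "${dest_root%/}/stage.XXXXXX")"
    name="${src##*[:/]}"                      # file name without host or directories
    case "$src" in
        *:*) scp -q "$src" "$workdir/" ;;     # remote source in host:path form
        *)   cp "$src" "$workdir/" ;;         # path readable on this machine
    esac
    printf '%s\n' "$workdir/$name"
}

# Demo with stand-in paths so the sketch runs anywhere.
demo_src="$(mktemp)"
echo "example input" > "$demo_src"
staged="$(stage_in "$demo_src" "${TMPDIR:-/tmp}")"
echo "staged copy: $staged"
```

Remember to delete the staged copy at the end of the job, since /tmp and /scratch are shared resources.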

Tools and scientific libraries

Easybuild module

Most of the tools we maintain are installed under easybuild for broad availability: these tools are then available EMBL-wide (on all servers and all cluster nodes) using the "module load" command.
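
A small sketch of how loading a tool might look in a script. The helper name load_tool and the tool name SomeTool are placeholders; run "module avail" to see the names actually installed.

```shell
#!/usr/bin/env bash

# load_tool NAME
# Load module NAME if the `module` command exists in this shell.
# (Hypothetical convenience wrapper around `module load`.)
load_tool() {
    local name="$1"
    if ! command -v module >/dev/null 2>&1; then
        echo "module command not available in this shell" >&2
        return 1
    fi
    module load "$name"
}

# Typical interactive use (SomeTool is a placeholder name):
#   module avail          # list installed modules
#   load_tool SomeTool    # same effect as: module load SomeTool
#   module list           # show what is currently loaded
```

The wrapper only adds a check that the module command is present, which can be missing in non-login shells.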

GBCS tools not in easybuild

It is not always straightforward to deploy tools in easybuild. We therefore also install tools on our file server (available from all servers): it is all in /g/funcgen/bin/

Simply check what's in there.

RStudio Server

RStudio Server is a Linux server application that provides a web browser-based interface to the version of R running on the server. We run RStudio Server on seneca. Simply point your browser to https://rstudio.embl.de and log in with your EMBL credentials. Access is restricted to the intranet (or VPN).