Compute Environment

File Servers

IT Services provides a centrally managed data storage infrastructure for all groups at EMBL. A tiered storage model has been implemented with three categories of storage: Tier-1 and Tier-2 storage, as well as a Large Volume Archiving tier. EMBL groups can choose space from these storage tiers based on their individual needs for performance, availability, pricing, etc. If your group needs central disk space, please contact IT. More information at IT Service-Data Storage.

GBCS Application Servers

GBCS has two machines to run specific applications developed and/or maintained by GBCS:

  • GBCS (CentOS 6.3): the production server, running PostgreSQL and MySQL together with the applications Galaxy, embase, cellbase, etc.
  • GBCS-DEV (CentOS 6.5): the development machine, mainly used for Galaxy-related development. Currently also supports Docker.

Compute Cluster / Big-mem GB Servers

GB Unit big-mem Interactive Servers

1. schroedinger:

  • log in from within EMBL with e.g. ssh schroedinger
  • 40 CPUs, 1024 GB RAM, CentOS 6
  • 10 TB of local space in /tmpdata/
  • registered as an LSF submitter, i.e. you can bsub to the cluster
  • /scratch is also available from there

2. spinoza:

  • log in from within EMBL with e.g. ssh spinoza
  • 40 CPUs, 1024 GB RAM, CentOS 6
  • 10 TB of local space in /tmp/
  • registered as an LSF submitter, i.e. you can bsub to the cluster
  • /scratch is also available from there

EMBL LSF Cluster

EMBL Cluster Nodes

  • 60 nodes comprising more than 700 CPU cores, including 8 nodes with 1 TB of RAM and 40 cores, and 52 nodes with 16 GB of RAM and 8 cores
  • Runs LSF 7.0.6 and CentOS 6.2
  • The default memory limit is 2 GB, but you can request more memory and CPUs when submitting jobs
  • More information about the EMBL cluster at EMBL-cluster
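Requesting more memory and CPUs than the 2 GB default can be sketched as a bsub call (the memory units for rusage depend on the local LSF configuration, and my_analysis.sh is a placeholder script, not a real tool):

```shell
# Request 8 job slots (-n) and reserve ~16 GB of memory (-R rusage);
# whether rusage mem is in MB or KB depends on the site's LSF setup.
bsub -n 8 -R "rusage[mem=16000]" \
     -o job.%J.out -e job.%J.err \
     ./my_analysis.sh input.bam
```

The -o and -e options capture the job's stdout and stderr into files named after the job ID (%J), which is handy when debugging failed runs.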

If you use the cluster, or intend to, you should subscribe to the clusterng@embl.de mailing list (send an email to clusterng-subscribe@embl.de). More information at IT Service-Computing

LSF Submission System

  • Software for managing batch workloads and distributing jobs across the cluster
  • To run jobs on the EMBL cluster, log into the submaster, schroedinger, or spinoza server and use the "bsub" command to submit jobs
  • For more information, please check IT-LSF and EMBL-cluster

If a job has problems, you can use "bjobs -l" to check detailed information about it; for example, it shows which computer (node) the job is executing on. You can then ssh to that node (such as compute036) through the submaster machine (not from schroedinger/spinoza) to check the job's running status in detail.
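The troubleshooting steps above can be sketched as follows (the job ID 123456 and the node name compute036 are placeholders; substitute the values bjobs reports for your own job):

```shell
bjobs -l 123456     # detailed job info, including the execution host
ssh submaster       # node access goes through the submaster machine
ssh compute036      # hop to the execution host reported by bjobs
top -u "$USER"      # inspect the job's processes on that node
```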

Storage issues

When working with huge data files, where to store your input/output files becomes an issue. The only file servers accessible from the cluster nodes are the tier1 ones. Other kinds of storage are, however, available from the cluster nodes, and you should choose the right one for the situation. You could store everything on tier1, but this would cost a lot, and you should avoid situations where many processes access the same file concurrently (e.g. an index file). So what else can you do?

  1. Use the /tmp space of the cluster node to store intermediate files that you won't need later. You can make sure to have enough tmp space by requesting a minimum amount of tmp space in the bsub options.
  2. Use the 100 TB of space available under /scratch (check this IT post). The huge advantages of /scratch: it is available from all cluster nodes, it supports parallel file access (FHFS), and it is also accessible from the spinoza and schroedinger GB servers.
  3. Use tier2 space (4 times cheaper than tier1!) to store your files. Remember that tier2 is also safe (RAID6 redundancy), but there is no backup. You can then stage in your files at the beginning of your job using scp, i.e. you scp the needed data to tier1 space, local /tmp, or /scratch.
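Combining options 1 and 3, a job script might stage data in from tier2 to node-local /tmp, work there, and stage the results back. This is only a sketch: the hostname, paths, and my_tool are placeholders, and the units of the tmp reservation depend on the local LSF configuration.

```shell
#!/bin/bash
#BSUB -n 4
#BSUB -R "rusage[tmp=50000]"   # reserve node-local tmp space (site-dependent units)

# Work in a private directory under the node's local /tmp.
WORKDIR=$(mktemp -d /tmp/myjob.XXXXXX)

# Stage in: copy input data from tier2 to the local disk.
scp fileserver:/g/mygroup/tier2/input.bam "$WORKDIR/"

# Run the analysis against the local copy (placeholder tool).
my_tool --in "$WORKDIR/input.bam" --out "$WORKDIR/result.txt"

# Stage out: copy results back to tier2, then clean up the local space.
scp "$WORKDIR/result.txt" fileserver:/g/mygroup/tier2/
rm -rf "$WORKDIR"
```

Staging in once per job avoids many jobs hammering the same file on a shared file server, which is exactly the concurrent-access situation described above.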

See? Plenty of solutions not to waste your expensive tier1 space!

Tools and scientific libraries


Most of the tools we maintain are installed under BCR/SEPP for broad availability: these tools are then available EMBL-wide (on all servers and all cluster nodes) under /g/software/bin/. For more information on how to use this software, please check SEPP-softwares. The BCR/SEPP system is an initiative of EMBL's Bio-IT community.
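If /g/software/bin is not already on your search path, a one-line addition makes the SEPP tools callable by name (a sketch for bash/zsh; adjust for your shell):

```shell
# Prepend the SEPP tool directory to the command search path.
export PATH=/g/software/bin:$PATH
```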

GBCS tools not in BCR/SEPP

It is not always straightforward to deploy tools in BCR/SEPP. We therefore also install tools on our file server (available from all servers) under /g/funcgen/gbcs-tools/. Simply check what's in there.

R Studio Server

RStudio Server is a Linux server application that provides a web-browser-based interface to the version of R running on the server. We run RStudio Server on both spinoza and schroedinger, on port 8787. Simply point your browser to http://spinoza:8787/ (or http://schroedinger:8787/) and log in with your EMBL credentials.