NGSDataLoading

 

On this page, you will learn what happens after you have validated an NGS data transfer with the GCBridge, and how to lock/unlock data files and delete fastq lane files.

Please note that in-depth information about locking/unlocking can be found in the emBASE locking/unlocking functionality page.

Sections:

  1. What happens to your NGS data files after validating the GCBridge form?
  2. Locking the demultiplexed files and deleting the lane files
  3. How does emBASE store NGS data already demultiplexed by GeneCore?
  4. Locking/Unlocking library files in the absence of lane files

 

What happens to your NGS data files after validating the GCBridge form?

When GeneCore sequences data for you, the data files are afterwards transferred to your group file server. You will also get an email (see screenshot below) with a link to the GCBridge to validate your experiment.

Image

After you have validated the data submission by completing the GCBridge web form linked in this email (see also the complete tutorial on how to fill in the GCBridge web form), the upload to emBASE happens automatically and you will get an email upon successful completion (see example email below).

Image

 

Now your "data" is in place, i.e.:

  • data files and associated metadata are registered in the emBASE database
  • data files have been moved to your NGS group library:
    • either at the lane level (when files are not demultiplexed yet); we then talk about sequence lane file(s)
    • or at the raw bioassay level; a raw bioassay represents the data file(s) of a single sample, i.e. the files obtained after demultiplexing in the case of multiplexed libraries
    • NB: in the case of a single-sample library (i.e. no multiplexing), the sequence lane file(s) and the raw bioassay file(s) are the same files.

Note that the library folder structure has also been created in the relevant place of your NGS group library (see picture below).

Image

This library folder structure lets you add/replace demultiplexed files by yourself (both fastq and bam files).

In the lane directory (i.e. the lane5 directory in the picture above), you will find one directory for each library, named like ‘LIBxx_RBAxx’, where the two xx are replaced by the internal emBASE IDs of the library and of the raw bio assay, respectively. Each of these library folders contains three directories: bam, fastq and stats. These directories are writable at first, so you can add files to load into emBASE and later also into Galaxy. When these directories are writable, we say they are unlocked.
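The naming convention and the writable/unlocked notion described above can be checked from a script. The following is a minimal sketch, not part of emBASE: the function names are hypothetical, and it assumes that "unlocked" corresponds to the owner or group write bit being set on the folder (POSIX permissions).

```python
import re
import stat
from pathlib import Path

# Assumed pattern for library folder names such as "LIB12_RBA1".
LIBRARY_DIR_PATTERN = re.compile(r"^LIB(\d+)_RBA(\d+)$")


def parse_library_dir(name):
    """Return (library_id, raw_bioassay_id), or None if the name does not match."""
    match = LIBRARY_DIR_PATTERN.match(name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_unlocked(folder):
    """Assumption: a folder is unlocked when the owner or group write bit is set."""
    mode = Path(folder).stat().st_mode
    return bool(mode & (stat.S_IWUSR | stat.S_IWGRP))


def lock_states(library_dir):
    """Map bam/fastq/stats to True (unlocked) or False (locked); skip missing folders."""
    states = {}
    for sub in ("bam", "fastq", "stats"):
        path = Path(library_dir) / sub
        if path.is_dir():
            states[sub] = is_unlocked(path)
    return states
```

For example, calling lock_states on a LIB12_RBA1 folder returns a dictionary with one entry per existing sub-directory, which makes it easy to spot which folders still accept new files.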

 

Locking the demultiplexed files and deleting the lane files

If at some point you want to remove the lane files, e.g. to save space, we first expect files to be loaded for each library. These files can be in bam or fastq format; a mixture of both is also possible.

Demultiplexed fastq/bam files can be loaded into emBASE in different ways:

  1. By ordering demultiplexing when validating the data transfer in the GCBridge. This is by far the preferred solution.
  2. By using the command line tool AddLibraryFilesToNGSRepository.jar from the JemBASE API. This is the way to go when you have list(s) of files to copy/replace.
  3. By manually copying files directly into your NGS group library (see previous section or this page for an introduction to this). This is the way to go to replace a few files.

Let's assume demultiplexed files are already available for now (i.e. situation 1).

emBASE will only allow you to trash a fastq lane file if all the folders containing the demultiplexed files are read-only; we then say they are locked.

The locking is performed on the NGS Assay page, by clicking "Lock all raw data set folders" (see image below).
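The rule above, combined with the requirement that files be loaded for each library, can be sketched as a pre-check on the lane directory. This is only an illustration of the logic, not how emBASE implements it: the function names are hypothetical, and it assumes that "locked" means neither the owner nor the group write bit is set on the folder.

```python
import stat
from pathlib import Path

DATA_SUBDIRS = ("bam", "fastq")  # folders that may hold demultiplexed files


def is_locked(folder):
    """Assumption: locked means no owner or group write bit on the folder."""
    mode = Path(folder).stat().st_mode
    return not (mode & (stat.S_IWUSR | stat.S_IWGRP))


def lane_files_removable(lane_dir):
    """True if every library has data files and all folders holding them are locked."""
    library_dirs = [d for d in Path(lane_dir).iterdir()
                    if d.is_dir() and d.name.startswith("LIB")]
    if not library_dirs:
        return False
    for lib in library_dirs:
        # Only bam/fastq folders that actually contain something are considered.
        holding = [lib / sub for sub in DATA_SUBDIRS
                   if (lib / sub).is_dir() and any((lib / sub).iterdir())]
        if not holding:  # no bam or fastq file for this library
            return False
        if not all(is_locked(folder) for folder in holding):
            return False
    return True
```

The check is deliberately conservative: a lane directory without any library folder, or a library without any bam or fastq file, is reported as not removable.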

Image

Once the raw bio assays are locked, you can unlock the lane itself by clicking the unlock icon ( Image ). You can then remove the lane file by clicking the ‘Remove lane files’ link at the top of the page (see images below).

 

Image

Image

 

To see the loading and locking/unlocking of demultiplexed files, please see the next section. 

 

How does emBASE store NGS data already demultiplexed by GeneCore?

After the validation with the GCBridge, your data folder should look like this:

Image

The ‘.info’ file is created for you from the information given to the GCBridge. This file is then used by emBASE to load your data. emBASE checks for new experiments every 15 minutes and loads them accordingly. You will get a confirmation email once the loading is done.

After the loading, your data folder should look like this:

Image

A lane folder is created for you, where you can store additional files. For every library (if not multiplexed there is only one) you get one folder with the internal emBASE library and rawbioassay IDs, e.g. LIB12_RBA1. In the library folder you find the bam, fastq and stats folders.

These folders are usually writable, i.e. unlocked, for the user's group. Only when GeneCore already provides the demultiplexed data is the folder of the loaded data read-only, i.e. locked. Indeed, in that case the demultiplexed files are the only copy of the data known by emBASE (i.e. there is no lane file).

In the given example, the permissions would look something like this:

Image

As you can see, the bam and stats folders are still writable, so adding files to these folders is still possible. But because no lane file was provided, the fastq folder is locked, so that we can ensure these data remain available.

 

Locking/Unlocking library files in the absence of lane files

Here we consider the situation where either GeneCore transferred already demultiplexed files, or the lane file(s) have been deleted after demultiplexed files were provided and locked in the system.

In other words, your assay would look something like this in emBASE. Note the absence of lane files (File(s) subsection) and the presence of files in the raw data sets table (bottom).

Image

Because the assay does not have any lane files associated, unlocking the raw data sets represents a data loss risk. Therefore, if you unlock such a folder, a copy of the folder is created. 

Image

As you can see, the fastq folder is writable again, but a one-to-one copy was created as fastq_backup, and this copy is not writable. On the interface you will see a message saying ‘No files uploaded’. This just indicates that the raw bio assay was unlocked. Locking it again would restore the original state. Clicking the link ‘Show not loaded files’ shows the old files that are still in place.
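The effect of unlocking on the file system can be pictured with the sketch below. It only mimics the observable result described here (a read-only ‘_backup’ copy next to a writable folder); it is not emBASE's actual implementation, and the function name and the exact permission values are assumptions.

```python
import os
import shutil
import stat
from pathlib import Path


def unlock_with_backup(fastq_dir):
    """Copy fastq_dir to a read-only sibling '<name>_backup', then make fastq_dir writable."""
    fastq_dir = Path(fastq_dir)
    backup_dir = fastq_dir.with_name(fastq_dir.name + "_backup")
    shutil.copytree(fastq_dir, backup_dir)  # raises an error if the backup already exists
    os.chmod(backup_dir, 0o555)             # backup: read and traverse only (assumed mode)
    os.chmod(fastq_dir, 0o775)              # original: writable for owner and group (assumed mode)
    return backup_dir
```

Refusing to overwrite an existing backup is a deliberate choice in this sketch: the backup is the safety net against data loss, so it should never be silently replaced.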

Image

Now you could remove the files within the fastq folder and load new files. Doing so and clicking ‘Show not loaded files’ will show the files that are new in the directory. 
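Conceptually, ‘Show not loaded files’ lists what is present in the directory but not registered as loaded. A minimal sketch of that comparison follows; the function name is hypothetical, and the list of loaded file names is assumed to be available from elsewhere (in emBASE it would come from the database).

```python
from pathlib import Path


def not_loaded_files(folder, loaded_names):
    """Return the sorted names of files in folder that are not in loaded_names."""
    present = {p.name for p in Path(folder).iterdir() if p.is_file()}
    return sorted(present - set(loaded_names))
```

With an empty list of loaded names, as right after unlocking, every file still in the folder is reported; once the old files are replaced, only the new ones appear.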

Image

After locking the raw bio assay again, the new files will be loaded and shown. 

Image

Note that you can reload lane file(s) later on (e.g. for archiving). To do so, simply place the file(s) somewhere known to emBASE (usually your group file server). Once the file is accessible, type the full path to the file in the corresponding text box and press ‘Upload’.

Image