The Sample Annotation System


In emBASE, a Sample Annotation is the value that a Sample Annotation Type takes for a given sample. Annotating a sample therefore simply means giving values to particular Sample Annotation Types. For example, you can annotate a sample with DevelopmentalStage: embryonic stage 11, where DevelopmentalStage is the sample annotation type and embryonic stage 11 is the sample annotation.

To annotate a sample:

  • Go to Biomaterials > Samples, select a sample (click on the sample name) and, on the sample view page, click Annotate Sample. This is convenient for editing or correcting a few annotations.
  • Most likely, you will want to batch annotate samples using the Annotate samples from file link on the sample list page (Biomaterials > Samples). More information is given later in this how-to.

Quote from MIAME:

MIAME describes the Minimum Information About a Microarray Experiment that is needed to enable the interpretation of the results of the experiment unambiguously and potentially to reproduce the experiment.

This means your samples must be annotated for your experiment to be MIAME compliant, and the same holds for MINI-SEQ.

To help you in this process, MIAME compliant/required Sample Annotation Types have been created in emBASE. Please consult the list of Sample Annotation Types (Biomaterials > Sample Annotations) and, for each type, check whether it is marked as a MIAME annotation and whether it is mandatory. There is currently no automated way in emBASE to check whether mandatory annotations have been provided, so this is up to you!

In addition to the MIAME compliant Sample Annotation Types, you can create new Sample Annotation Types. In this case, please check the MGED Ontology (or a more recent ontology used by public repositories) or contact us to find out whether standard types would fulfill your needs.

Why should I bother to annotate my samples?

Sample annotation is a very important step in annotating a microarray experiment. Sample annotations are used in several places in emBASE; the most relevant are:

  • At export time: the MAGE-TAB export used to deposit data in a public repository at publication time requires annotated samples. Otherwise, EBI's ArrayExpress will reject your submission.
  • In Factor Value definition, which is required at publication time: most of the time, experimental factor values match sample annotations one-to-one. In emBASE, factor values that match sample annotations are defined AUTOMATICALLY if your samples are annotated.
  • In analysis modules: some plugins use sample annotations to determine how samples group together.

These annotations are also extremely useful from a documentation point of view.

What annotations should I provide?

Obvious sample annotations are those that can be turned into experimental factors and values, i.e. experimental parameters or variables regarded as influencing the experimental results. Such annotations are usually biomaterial characteristics of the sample (age, disease state…), but not always. For example, if different operators prepared the samples used in the study, you might want to record this as an annotation: it may help explain the results and lets you include such confounding factors in the statistical analysis.

Available Sample Annotation Types in emBASE

What follows is a description of the Sample Annotation Types available in emBASE. Note that some annotations are coupled (i.e. if you provide one, you should also provide the other one).

MIAME Required Sample Annotation Types

  • Sex: select a value from the list. If not applicable, do not provide it; the default value “unknown_sex” will be used.
  • Organism: enter the organism name as defined by a standard such as the NCBI Taxonomy.
  • SampleType: select one of the suggestions given in the type description or enter a new one.
  • DiseaseState: if not applicable, do not provide it; it will default to “normal”.
  • DevelopmentalStage and/or Age and InitialTimePoint (Age always comes with InitialTimePoint).

Additional MIAME Sample Annotation Types

Use them if they apply to your samples.

  • GeneticModification and IndividualGeneticCharacteristics: these usually come as a pair. The first describes the kind of modification, while the latter details the genetic modification of the sample.
  • CellType: the type of cell used in the experiment, e.g. epithelial, glial.
  • CellLine: the identifier for the established culture of a metazoan cell.
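Since emBASE does not check mandatory annotations for you, a small local pre-check can help. The following is a minimal Python sketch, assuming you already know the annotation type names you plan to provide (for example, the column headers of a batch annotation file). The function `check_annotation_types` and its messages are hypothetical; the required and coupled types are copied from the lists above.

```python
# Hypothetical pre-check: emBASE does not verify mandatory annotations, so this
# sketch checks the annotation type names you plan to provide (e.g. the column
# headers of a batch annotation file) against the lists above.

# MIAME required types that have no default value. Sex and DiseaseState are
# left out because they default to "unknown_sex" and "normal" when omitted.
REQUIRED = {"Organism", "SampleType"}

# Coupled types: provide both or neither.
COUPLED = [
    ("Age", "InitialTimePoint"),
    ("GeneticModification", "IndividualGeneticCharacteristics"),
]


def check_annotation_types(types: set[str]) -> list[str]:
    """Return a list of problems found in a set of annotation type names."""
    problems = [f"missing required type {name}" for name in sorted(REQUIRED - types)]
    # Developmental information can be given as DevelopmentalStage and/or Age.
    if not types & {"DevelopmentalStage", "Age"}:
        problems.append("provide DevelopmentalStage and/or Age")
    for first, second in COUPLED:
        # Exactly one of the pair present means the coupling is broken.
        if (first in types) != (second in types):
            problems.append(f"{first} and {second} should be provided together")
    return problems


print(check_annotation_types({"Organism", "SampleType", "Age"}))
# ['Age and InitialTimePoint should be provided together']
```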

Custom Sample Annotation Types

Feel free to create new Sample Annotation Types. In this case, please check the MGED Ontology or contact the BASE administrator to find out whether standard types would fulfill your needs.

How to Batch Annotate Samples

Overview of the steps

  1. Go to the Biomaterials > Samples page.
  2. Select the samples you want to annotate.
  3. Scroll to the end of the page and click the Create Template Annotationfile button.
  4. Open the generated document in a spreadsheet program such as Excel:
    1. Remove the Annotation Types that don’t apply.
    2. Add annotation values. Please refer to the Annotation values section below to describe them correctly.
    3. Save your document as tab-separated “text-only”.
  5. Go back to emBASE (Biomaterials > Samples page) and follow the "Annotate sample from file" link at the top of the page.
  6. Fill in the batch annotation form and click “Continue (dry-run)”.
    • This does not save your data in emBASE; it only evaluates your input.
    • An editable online spreadsheet is built, in which you can correct mistakes or change values.
  7. When you are happy with the spreadsheet, click the "Annotate for real" button at the bottom of the page.
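If you prefer to prune the template with a script rather than in a spreadsheet, the sketch below removes annotation type columns that were left completely empty. It is a hypothetical helper (`drop_empty_columns` is not part of emBASE) and assumes the template follows the tab-separated layout described in the File format section: a header row of type names and a first column of sample names. A column filled with “NA” is kept, since NA marks a missing value rather than a type that does not apply.

```python
# Hypothetical helper for step 4: drop annotation type columns that were left
# completely empty. Assumes a tab-separated matrix with a header row of type
# names and a first column of sample names.


def drop_empty_columns(text: str) -> str:
    """Return the matrix without columns that have no value in any sample row."""
    # splitlines() also handles Windows (\r\n) line endings from spreadsheets.
    rows = [line.split("\t") for line in text.splitlines() if line.strip()]
    if not rows:
        return ""
    header, samples = rows[0], rows[1:]
    # Column 0 holds the sample names and is always kept; rows may be shorter
    # than the header if trailing empty cells were trimmed on save.
    keep = [0] + [
        i
        for i in range(1, len(header))
        if any(i < len(row) and row[i].strip() for row in samples)
    ]
    kept_rows = ["\t".join(row[i] if i < len(row) else "" for i in keep) for row in rows]
    return "\n".join(kept_rows) + "\n"


template = "\tOrganism\tSex\tAge\nS1\tDrosophila melanogaster\t\t2-4 hours\n"
print(drop_empty_columns(template), end="")
```

Removing a column this way is only a shortcut for the manual step: a type that applies but was simply forgotten would disappear too, so review the result before uploading.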

File format

The file format is rather rigid, and samples and annotation types must be identified by name. The file should be a tab-separated matrix of sample annotations, with one sample per row and one sample annotation type per column. The first row of the file should contain the names of the annotation types (case-sensitive), and the first column should contain the sample names (again, case-sensitive). The top-left element of the file may be anything. If a sample name or annotation type name does not uniquely identify one item (e.g. if you have two samples with the same name), an error will be generated.


                  Organism                  DiseaseState   Age           InitialTimePoint   ...
MySampleName_1    Drosophila melanogaster   normal         2-4 hours     egg_laying         ...
MySampleName_2    Drosophila melanogaster   normal         4-6 hours     egg_laying         ...
...               ...                       ...            ...           ...                ...
MySampleName_n    Drosophila melanogaster   normal         10-12 hours   egg_laying         ...
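The same layout can be generated from a script. This Python sketch writes the tab-separated matrix and rejects duplicate sample or annotation type names before upload; `write_annotation_file` and the file name `annotations.txt` are hypothetical. It only catches duplicates within the file: a name that is unique in the file but shared by two samples already in emBASE will still produce an error there.

```python
import io

# Hypothetical sketch: build the batch annotation file from Python.
# Values mirror the example layout above; replace them with your own.
annotation_types = ["Organism", "DiseaseState", "Age", "InitialTimePoint"]
rows = [
    ("MySampleName_1", ["Drosophila melanogaster", "normal", "2-4 hours", "egg_laying"]),
    ("MySampleName_2", ["Drosophila melanogaster", "normal", "4-6 hours", "egg_laying"]),
]


def write_annotation_file(out, types, rows):
    """Write one sample per row and one annotation type per column, tab-separated."""
    if len(set(types)) != len(types):
        raise ValueError("annotation type names must be unique")
    names = [name for name, _ in rows]
    if len(set(names)) != len(names):
        raise ValueError("sample names must be unique")
    # The top-left cell may contain anything; it is left empty here.
    lines = ["\t".join([""] + types)]
    for name, values in rows:
        if len(values) != len(types):
            raise ValueError(f"{name}: expected {len(types)} values, got {len(values)}")
        cells = [name] + values
        # A tab or line break inside a cell would shift the matrix.
        if any(ch in cell for cell in cells for ch in "\t\r\n"):
            raise ValueError(f"{name}: cells must not contain tabs or line breaks")
        lines.append("\t".join(cells))
    out.write("\n".join(lines) + "\n")


buffer = io.StringIO()
write_annotation_file(buffer, annotation_types, rows)
print(buffer.getvalue(), end="")
# To write a real file instead:
# with open("annotations.txt", "w", encoding="utf-8") as f:
#     write_annotation_file(f, annotation_types, rows)
```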

Additional remarks:

  1. You don’t need to include every Annotation Type; if one doesn’t apply, just leave it out.
  2. Some Annotation Types have default values, but these can change over time. It is therefore always better to include them in your file.

Annotation values

Before adding annotation values to an Annotation Type, always check the Annotation Type definition in emBASE. To do so, go to the Biomaterials > Sample Annotations page. A Sample Annotation Type can be one of:

  • Integer: the annotation value must be an integer
  • Float: the annotation value must be a float, e.g. 1.34
  • Text: the annotation value can be anything, i.e. free text
  • Enumeration: the annotation value must be one of the values in the list

When you annotate your samples using a file, you must follow these rules as well. If a list of values is associated with a type, your value must match one of them exactly (the match is case-sensitive)!
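The type rules above can be checked locally before uploading. This is a minimal sketch, assuming the type kind and, for enumerations, the value list are copied by hand from the Sample Annotation Type definition; `is_valid_value` is a hypothetical helper, not an emBASE API. It also accepts “NA”, the missing-value marker recommended in the remarks that follow.

```python
from typing import Optional

# Hypothetical sketch of the value rules above; the kind names mirror the list
# and are not an emBASE API. "NA" is accepted as the missing-value marker.
MISSING = "NA"


def is_valid_value(value: str, kind: str, allowed: Optional[list[str]] = None) -> bool:
    """Check one annotation value against its Sample Annotation Type definition."""
    if value == MISSING:
        return True
    if kind == "Integer":
        try:
            int(value)
        except ValueError:
            return False
        return True
    if kind == "Float":
        # Python's float() expects a dot as the decimal separator.
        try:
            float(value)
        except ValueError:
            return False
        return True
    if kind == "Text":
        return True
    if kind == "Enumeration":
        # Exact, case-sensitive match against the type's value list.
        return value in (allowed or [])
    raise ValueError(f"unknown kind: {kind}")


allowed_values = ["male", "female"]  # hypothetical list; check the real one in emBASE
print(is_valid_value("1.34", "Float"), is_valid_value("1,34", "Float"))  # True False
print(is_valid_value("Male", "Enumeration", allowed_values))  # False: case differs
```

Note that a comma decimal separator, as written by some localized spreadsheets, is rejected by this check.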

Additional remarks:

  • Use “NA” for missing values.
  • Do not forget to specify the InitialTimePoint if you filled in the Age annotation.
  • If you need to specify a range, e.g. for Age, use a dash “-” between the start and stop values of the range, e.g. 2-4 hours.
  • Use the usual abbreviations for units (time, distance, concentration, mass…). “micro” is usually represented by a lowercase ‘u’, e.g. ‘ul’ for microliter. Please see the associated Unit table.
  • Although emBASE stores special characters correctly, never use Greek letters or other special characters: they get lost along the way during data transfer.
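The remarks above lend themselves to a simple pre-upload check. The sketch below replaces a few common special characters (micro sign, Greek mu, en and em dashes), flags empty cells and any remaining non-ASCII characters, and checks that InitialTimePoint accompanies Age. The helper names and the replacement table are assumptions for illustration; units should still follow the Unit table.

```python
# Hypothetical sketch applying the remarks above to one sample's values
# ({annotation type name: value}); replacements and messages are illustrative.
MISSING = "NA"

REPLACEMENTS = {
    "\u00b5": "u",  # micro sign -> "u", e.g. "ul" for microliter
    "\u03bc": "u",  # Greek small letter mu
    "\u2013": "-",  # en dash -> plain dash for ranges such as "2-4 hours"
    "\u2014": "-",  # em dash
}


def normalize_value(value: str) -> str:
    """Replace common special characters and trim surrounding whitespace."""
    for old, new in REPLACEMENTS.items():
        value = value.replace(old, new)
    return value.strip()


def check_sample(values: dict[str, str]) -> list[str]:
    """Return problems found in one sample's annotation values."""
    cleaned = {name: normalize_value(v) for name, v in values.items()}
    problems = []
    for name, value in cleaned.items():
        if value == "":
            problems.append(f"{name}: empty cell, use {MISSING} for missing values")
        elif not value.isascii():
            problems.append(f"{name}: special character left in {value!r}")
    # Age must come with InitialTimePoint.
    age = cleaned.get("Age", MISSING)
    time_point = cleaned.get("InitialTimePoint", MISSING)
    if age not in ("", MISSING) and time_point in ("", MISSING):
        problems.append("Age is filled in but InitialTimePoint is not")
    return problems


print(normalize_value("10 \u00b5l"))  # 10 ul
print(check_sample({"Age": "2\u20134 hours"}))
# ['Age is filled in but InitialTimePoint is not']
```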

Important: Some Annotation Types propose possible values in their definition field but are defined as free-text annotations. This means you can provide any value, but you should use one of the proposed values whenever possible (again, case-sensitive). This helps when creating the MAGE-TAB document and will save you time at that point.