Clinical data is supported in five entities (File, Sample, Cohort, Individual and Family) through a field called annotationSets. All of these entities support the operations described below, in addition to their own specific features.

Throughout this document we refer to annotationSets and annotations. These are the field names OpenCGA uses to store clinical data or any other free, user-defined data model.

Clinical data ingestion

Create or remove a whole annotationSet

To add new clinical information to an entity, the user needs to call the main /update web service of that entity (files/{file}/update, samples/{sample}/update, etc.). These web services accept a list of annotationSets using the format described above. By default, whenever the user sends a list of annotationSets without any other parameter, those annotationSets are added to any annotationSets the entity already contains. This behaviour can be changed through the annotationSetsAction query parameter, which accepts 3 possible action values:
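A minimal sketch of how such an /update request could be assembled, assuming a hypothetical server URL and assuming the JSON body carries the list under an "annotationSets" key. It only builds the URL and body (no network call), and it leaves annotationSetsAction out unless the caller supplies one of the supported values, which keeps the default "add" behaviour described above.

```python
import json
from typing import Any, Dict, List, Optional, Tuple
from urllib.parse import quote, urlencode

# Hypothetical server URL: replace with your own OpenCGA REST endpoint.
BASE_URL = "https://opencga.example.org/webservices/rest"


def build_update_request(
    entity: str,
    entity_id: str,
    annotation_sets: List[Dict[str, Any]],
    annotation_sets_action: Optional[str] = None,
) -> Tuple[str, str]:
    """Return (url, json_body) for an {entity}/{id}/update call sending annotationSets."""
    url = f"{BASE_URL}/{entity}/{quote(entity_id, safe='')}/update"
    if annotation_sets_action is not None:
        # Only sent when the caller wants something other than the default (add).
        url += "?" + urlencode({"annotationSetsAction": annotation_sets_action})
    # Assumption: the annotationSets list travels under an "annotationSets" key.
    body = json.dumps({"annotationSets": annotation_sets})
    return url, body


url, body = build_update_request(
    "individuals",
    "ind1",  # hypothetical Individual id
    [{
        "id": "annotation_set_id",
        "variableSetId": "individual_private_details",
        "annotations": {"full_name": "John Smith", "age": 60, "gender": "MALE"},
    }],
)
print(url)
```

The request could then be sent with any HTTP client together with the usual authentication headers.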

Updating some values of already stored clinical data

To update only a few values (annotations) of an already stored annotationSet, users need to call the new web service .../annotationSets/{annotationSetId}/annotations/update, available for each of the 5 supported entities. This web service accepts a map of key-value pairs, generally containing the name of each annotation being updated and the new value to be stored. At the moment, the action query parameter supports 5 different actions:
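The sketch below shows the shape of an annotations update request. The server URL is hypothetical, and so is the resource prefix passed by the caller (the part elided as "..." above, for example "individuals/ind1"). The action value is deliberately left to the caller, who must pass one of the supported actions; "ACTION" below is only a placeholder.

```python
import json
from typing import Any, Dict, Tuple
from urllib.parse import quote, urlencode

# Hypothetical server URL: replace with your own OpenCGA REST endpoint.
BASE_URL = "https://opencga.example.org/webservices/rest"


def build_annotations_update_request(
    resource_prefix: str,
    annotation_set_id: str,
    updates: Dict[str, Any],
    action: str,
) -> Tuple[str, str]:
    """Return (url, json_body) for .../annotationSets/{annotationSetId}/annotations/update."""
    url = (
        f"{BASE_URL}/{resource_prefix}/annotationSets/"
        f"{quote(annotation_set_id, safe='')}/annotations/update?"
        + urlencode({"action": action})
    )
    # The body is a plain map: annotation name -> new value.
    return url, json.dumps(updates)


# "individuals/ind1" is a hypothetical prefix and "ACTION" a placeholder value.
url, body = build_annotations_update_request(
    "individuals/ind1", "annotation_set_id", {"age": 61}, "ACTION"
)
print(url)
```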

Querying by clinical data

Each of the 5 supported entities has its own /search web service. In addition to the entity-specific fields that can be queried, there is an annotation field for querying the stored clinical data. Three main kinds of filtering can be performed:

Let's imagine that, for the Individual data model described above, we want to find every Individual whose gender is FEMALE, who is older than 30 and who lives in London. The query could be written in either of the following ways:

1. annotation: individual_private_details:age>30;individual_private_details:gender=FEMALE;individual_private_details:address.city=London
or
2. annotation: age>30;gender=FEMALE;address.city=London
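Both forms above are just conditions joined with ";", with or without the variableSet id in front of each variable. A small helper that produces either string, and then percent-encodes it as the annotation query parameter, might look like this (the helper name is illustrative, not part of OpenCGA):

```python
from typing import List, Optional
from urllib.parse import urlencode


def build_annotation_filter(conditions: List[str], variable_set_id: Optional[str] = None) -> str:
    """Join annotation conditions with ';', optionally prefixing each with a variableSet id."""
    if variable_set_id is not None:
        conditions = [f"{variable_set_id}:{c}" for c in conditions]
    return ";".join(conditions)


conditions = ["age>30", "gender=FEMALE", "address.city=London"]

# Form 1: every condition qualified with its variableSet id.
qualified = build_annotation_filter(conditions, "individual_private_details")
# Form 2: bare variable names.
bare = build_annotation_filter(conditions)

# Percent-encode the value for use as the 'annotation' parameter of a /search call.
query = urlencode({"annotation": bare})
print(query)  # annotation=age%3E30%3Bgender%3DFEMALE%3Baddress.city%3DLondon
```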

Filtering by variableSet and annotationSet

Filtering by either of these fields can be tricky, depending on the number of annotationSets stored for a particular entry. The following example illustrates this. Let's say we have only 4 Individuals stored in OpenCGA, containing the annotationSets listed below, where annotationSets A, B, C and D belong to variableSets A, B, C and D respectively.

Individual 1: { A, B }
Individual 2: { B }
Individual 3: { C, D }
Individual 4: { }

In this case, the operators =, == and != are also supported, although they might give unexpected results. For this reason, the === and !== operators have also been added so that every possible query can be expressed. The table below shows the Individuals each operator returns when looking for B:

Operator   Value looked for   Individuals returned   Explanation
=, ==      B                  1, 2                   Fetch all the individuals containing annotationSet or variableSet B.
===        B                  2                      Fetch all the individuals that contain only annotationSet or variableSet B.
!=         B                  1, 3, 4                Fetch all the individuals that do not contain only annotationSet or variableSet B. Individuals containing B plus any other annotationSet or variableSet will be returned.
!==        B                  3, 4                   Fetch all the individuals that have never been annotated using annotationSet or variableSet B.
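The operator semantics in the table can be reproduced locally by treating each Individual as the set of annotationSets it holds. This is only an illustrative model of the described behaviour, not OpenCGA's implementation:

```python
from typing import Dict, List, Set

# The four Individuals from the example, mapped to the annotationSets they hold.
INDIVIDUALS: Dict[int, Set[str]] = {
    1: {"A", "B"},
    2: {"B"},
    3: {"C", "D"},
    4: set(),
}


def matches(sets: Set[str], operator: str, value: str) -> bool:
    if operator in ("=", "=="):
        return value in sets        # contains B (possibly among others)
    if operator == "===":
        return sets == {value}      # contains only B
    if operator == "!=":
        return sets != {value}      # does not contain only B
    if operator == "!==":
        return value not in sets    # never annotated with B
    raise ValueError(f"unsupported operator: {operator}")


def search(operator: str, value: str) -> List[int]:
    return sorted(i for i, sets in INDIVIDUALS.items() if matches(sets, operator, value))


for op in ("=", "==", "===", "!=", "!=="):
    print(op, search(op, "B"))
```

Seen this way, != and !== differ only in how they treat Individual 1, which holds B together with another annotationSet.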


Project the annotation fields to return

Annotations, like any other field of the data models, can be included in or excluded from the final JSON returned to the user. However, because annotations follow custom data models that are not completely under OpenCGA's control, a set of reserved prefixes has been defined, as shown below:

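Include only the full_name and hpo annotations:

      include: annotation.full_name,annotation.hpo

            or

      include: annotationSets.annotation.full_name,annotationSets.annotation.hpo

Include only annotationSets B and D:

      include: annotationSet.B,annotationSet.D

            or

      include: annotationSets.id.B,annotationSets.id.D

Include only the annotationSets belonging to variableSet B:

      include: variableSet.B

            or

      include: annotationSets.variableSetId.B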
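Each short reserved prefix is an alias of a longer path inside annotationSets. The sketch below encodes that correspondence and builds either form of the include value. The helper and its expand flag are hypothetical conveniences, not an OpenCGA client API.

```python
from typing import Iterable

# Short reserved prefix -> equivalent full path, as in the examples above.
FULL_PATH = {
    "annotation": "annotationSets.annotation",
    "annotationSet": "annotationSets.id",
    "variableSet": "annotationSets.variableSetId",
}


def build_include(prefix: str, names: Iterable[str], expand: bool = False) -> str:
    """Build an include value such as 'annotation.full_name,annotation.hpo'."""
    if prefix not in FULL_PATH:
        raise ValueError(f"unknown prefix: {prefix}")
    head = FULL_PATH[prefix] if expand else prefix
    return ",".join(f"{head}.{name}" for name in names)


print(build_include("annotation", ["full_name", "hpo"]))
print(build_include("annotationSet", ["B", "D"], expand=True))
```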

Flatten annotations

Additionally, the /info and /search web services have a new query parameter called flattenAnnotations, a boolean indicating whether the annotations should be returned flattened. Let's imagine we have the following annotationSet:

{
  "id": "annotation_set_id",
  "variableSetId": "individual_private_details",
  "annotations": {
    "full_name": "John Smith",
    "age": 60,
    "gender": "MALE",
    "address": {
      "city": "United States",
      "zip": "99501"
    }
  }
}

With flattenAnnotations set to true, the same annotationSet would be returned as:

{
  "id": "annotation_set_id",
  "variableSetId": "individual_private_details",
  "annotations": {
    "full_name": "John Smith",
    "age": 60,
    "gender": "MALE",
    "address.city": "United States",
    "address.zip": "99501"
  }
}
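The transformation between the two JSON documents above amounts to joining nested object keys with dots. A minimal client-side sketch of that flattening is shown below. It assumes lists are kept as-is, since the behaviour for arrays is not described here, and it is not OpenCGA's own code.

```python
from typing import Any, Dict


def flatten_annotations(annotations: Dict[str, Any], parent: str = "") -> Dict[str, Any]:
    """Flatten nested annotation objects into dot-separated keys (lists are left untouched)."""
    flat: Dict[str, Any] = {}
    for key, value in annotations.items():
        path = f"{parent}.{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten_annotations(value, path))
        else:
            flat[path] = value
    return flat


annotation_set = {
    "id": "annotation_set_id",
    "variableSetId": "individual_private_details",
    "annotations": {
        "full_name": "John Smith",
        "age": 60,
        "gender": "MALE",
        "address": {"city": "United States", "zip": "99501"},
    },
}

flattened = {**annotation_set, "annotations": flatten_annotations(annotation_set["annotations"])}
print(flattened["annotations"])
```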


GroupBy