Clinical data can be stored in the File, Sample, Cohort, Individual and Family entities, in a field called annotationSets. All five entities support the operations described below, in addition to their own entity-specific features.

This document refers to annotationSets and annotations, the field names OpenCGA uses to store clinical data or any other user-defined data model.

Clinical data ingestion

Create or remove a whole annotationSet

To add new clinical information to an entity, call the entity's main /update web service (files/{file}/update, samples/{sample}/update, etc.). These web services accept a list of annotationSets in the format described above. By default, when a list of annotationSets is sent without any other parameter, those annotationSets are added to any annotationSets the entity already contains. This behaviour can be changed through the annotationSetsAction query parameter, which accepts three possible action values.
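As a rough illustration of the call described above, the following Python sketch assembles the path, the optional query parameter and the JSON body of an update request. The helper name build_update_request, the wrapping of the list under an annotationSets key in the body, and the inner layout of each annotationSet (id, variableSetId, annotations) are assumptions made for this example; the action value is passed through unchanged, and leaving it out keeps the default behaviour.

```python
import json
from urllib.parse import urlencode


def build_update_request(entity, entity_id, annotation_sets, action=None):
    """Return (path_with_query, json_body) for an entity /update call.

    entity: path segment such as "files" or "samples" (taken from the text).
    annotation_sets: list of dicts; their inner layout is an assumption.
    action: optional value for the annotationSetsAction query parameter.
    """
    path = f"{entity}/{entity_id}/update"
    if action is not None:
        # Only sent when the caller wants to override the default behaviour.
        path += "?" + urlencode({"annotationSetsAction": action})
    body = json.dumps({"annotationSets": annotation_sets})
    return path, body


# Hypothetical annotationSet; identifiers and values are made up.
example_set = {
    "id": "private_details_1",
    "variableSetId": "individual_private_details",
    "annotations": {"age": 34, "gender": "FEMALE"},
}

path, body = build_update_request("samples", "sample1", [example_set])
print(path)
print(body)
```

The sketch only builds the request; sending it (host, authentication, HTTP method) depends on the installation and is left out on purpose.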

Updating some values of already stored clinical data

To update only a few values (annotations) of an already stored annotationSet, call the separate web service .../annotationSets/{annotationSetId}/annotations/update, which is available for each of the five supported entities. This web service accepts a map of key-value pairs, generally containing the name of each annotation being updated and the new value to store. At the moment, the action query parameter supports five different actions.


Querying by clinical data

Each of the five supported entities has its own /search web service. In addition to the entity-specific fields that can be queried, there is an annotation field for querying the stored clinical data. There are mainly three different kinds of filtering that can be performed:

Filter by annotation: Users can filter by any of the clinical data values. Example:
Suppose that, for the Individual data model described above, we want to find every Individual whose gender is defined as FEMALE, who is older than 30 and who lives in London. This query can be written in either of the following two forms:

annotation: individual_private_details:age>30;individual_private_details:gender=FEMALE;individual_private_details:address.city=London
annotation: age>30;gender=FEMALE;address.city=London

The first form, though longer, should never fail as long as a variableSet containing the queried variables exists in the study. It tells OpenCGA to look for any Individual matching those values and, at the same time, tells it where the queried variables have been defined (the variableSet that defines them). The general format of this query is {variableSetId}:{variable}{operator}{value}, where operator can be = or != for any data type, plus >, >=, <, <= for numeric variables.
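The format above can be sketched as a small Python helper that joins conditions with ";" (as in the examples) and rejects operators outside the documented set. The function name annotation_filter and the tuple-based input are assumptions made for illustration, not part of OpenCGA.

```python
NUMERIC_ONLY = {">", ">=", "<", "<="}
ANY_TYPE = {"=", "!="}


def annotation_filter(conditions, variable_set_id=None):
    """Build the value of the annotation query parameter.

    conditions: iterable of (variable, operator, value) tuples.
    variable_set_id: when given, every condition is prefixed with it,
    producing the longer {variableSetId}:{variable}{operator}{value} form.
    """
    prefix = f"{variable_set_id}:" if variable_set_id else ""
    parts = []
    for variable, operator, value in conditions:
        if operator not in ANY_TYPE | NUMERIC_ONLY:
            raise ValueError(f"unsupported operator: {operator!r}")
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if operator in NUMERIC_ONLY and not is_number:
            # Comparison operators are documented for numeric variables only.
            raise ValueError(f"{operator!r} requires a numeric value")
        parts.append(f"{prefix}{variable}{operator}{value}")
    return ";".join(parts)


conditions = [
    ("age", ">", 30),
    ("gender", "=", "FEMALE"),
    ("address.city", "=", "London"),
]
long_form = annotation_filter(conditions, "individual_private_details")
short_form = annotation_filter(conditions)
print(long_form)
print(short_form)
```

The two printed strings correspond to the two example queries shown earlier, without the leading "annotation:" label.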



Querying annotation sets

Creating or modifying annotationSets

Deleting annotationSets

Deleting annotations

GroupBy