Clinical data is supported in File, Sample, Cohort, Individual and Family in a field called annotationSets. All of these entities support the operations described below, in addition to their own entity-specific features.

In this document, we will be referring to annotationSets and annotations (the field names used in OpenCGA to store any clinical data or any other user-defined free data model).

Clinical data ingestion

Create or remove a whole annotationSet

To add new clinical information to an entity, the user calls the main /update web service of that entity (files/{file}/update, samples/{sample}/update, etc.). These web services accept a list of annotationSets using the format described above. By default, when a list of annotationSets is sent without any other parameter, those annotationSets are added to the ones the entity already contains (if any). This behaviour can be changed through the annotationSetsAction query parameter, which accepts three possible values:

ADD: This is the default behavior, even if the query parameter is not sent. Any annotationSets sent through the main body of the POST operation will be added to the already existing ones (if any).
SET: Replace the existing annotationSets (if any) of the entity being updated with the ones sent through the main body of the POST operation.
REMOVE: Remove the annotationSets listed in the main body of the POST operation if they already exist in the entity being updated. In this case, the only field that is required, and therefore the only one taken into account, is the annotationSet id.
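
As a rough illustration, the request for such an update could be assembled as shown here. This is a minimal sketch, not OpenCGA client code: the helper name, the base URL, the "annotationSets" wrapper key in the body, and the fields other than id inside each annotationSet are assumptions made for the example. Only the {entity}/{id}/update path pattern and the annotationSetsAction parameter with its three values come from the text above. Nothing is sent over the network.

```python
from urllib.parse import urlencode

# The three values accepted by annotationSetsAction, as listed above.
VALID_ACTIONS = {"ADD", "SET", "REMOVE"}


def build_update_request(base_url, entity, entity_id, annotation_sets, action="ADD"):
    """Return (url, body) for an /update call without sending it."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unsupported annotationSetsAction: {action}")
    if action == "REMOVE":
        # Only the annotationSet id is taken into account when removing.
        annotation_sets = [{"id": item["id"]} for item in annotation_sets]
    query = urlencode({"annotationSetsAction": action})
    url = f"{base_url}/{entity}/{entity_id}/update?{query}"
    # Assumption: the list travels under an "annotationSets" key in the body.
    body = {"annotationSets": annotation_sets}
    return url, body


url, body = build_update_request(
    "https://opencga.example.org/rest",  # hypothetical base URL
    "samples",
    "sample1",
    [{"id": "clinical1", "variableSetId": "tumor", "annotations": {"a": {"b": 4}}}],
    action="REMOVE",
)
print(url)
print(body)
```

With action="REMOVE" the helper strips every field except id, which mirrors the rule that only the annotationSet id matters for removal.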

Updating some values of already stored clinical data

To update only a few values (annotations) of an already stored annotationSet, users call a separate web service, .../annotationSets/{annotationSetId}/annotations/update, which is available for all five supported entities. This web service accepts a map of key-value pairs that generally contains the name of each annotation being updated and the new value to store. At the moment, the action query parameter supports five actions:

ADD: Default behavior if the query parameter is not provided. Adds the new value to the annotation. If it already existed, the value will be replaced.
SET: This action can be really harmful. It sets the annotations provided in the body of the POST operation and removes any other annotations stored in the annotationSet.
REPLACE: Replace the value of an already existing annotation. Similar to the ADD action but, in this case, if the annotation does not exist, the new value is not set.
REMOVE: Empty the values of some stored annotations. To perform this action, the map in the body needs to contain the key 'remove' and a comma-separated list of the annotations to be removed. Example: {"remove": "member.address,member.age"}
RESET: Reset the listed annotations to the default values defined in the variables of the variableSet. To perform this action, the map in the body needs to contain the key 'reset' and a comma-separated list of the annotations to be reset. Example: {"reset": "member.address"}
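
The difference between the five actions is easiest to see on a small in-memory model. The sketch below is only an illustration of the semantics described above, not how OpenCGA implements them. It assumes annotations are held in a flat dict keyed by dotted names, models REMOVE as dropping the key, and leaves an annotation untouched on RESET when no default is known. The function name and the sample values are hypothetical.

```python
def apply_annotation_action(stored, body, action="ADD", defaults=None):
    """Return a new dict with `action` applied to the `stored` annotations."""
    defaults = defaults or {}
    result = dict(stored)
    if action == "ADD":
        # Add new annotations; existing ones are overwritten.
        result.update(body)
    elif action == "SET":
        # Keep only what the body provides; everything else is dropped.
        result = dict(body)
    elif action == "REPLACE":
        # Overwrite existing annotations only; unknown keys are ignored.
        for key, value in body.items():
            if key in result:
                result[key] = value
    elif action == "REMOVE":
        for key in body["remove"].split(","):
            result.pop(key.strip(), None)
    elif action == "RESET":
        for key in body["reset"].split(","):
            key = key.strip()
            if key in defaults:
                result[key] = defaults[key]
    else:
        raise ValueError(f"unsupported action: {action}")
    return result


stored = {"member.age": 41, "member.address": "London"}
update = {"member.age": 42, "member.name": "Ana"}

added = apply_annotation_action(stored, update, "ADD")
replaced = apply_annotation_action(stored, update, "REPLACE")
set_only = apply_annotation_action(stored, {"member.age": 42}, "SET")
removed = apply_annotation_action(stored, {"remove": "member.address,member.age"}, "REMOVE")
reset = apply_annotation_action(
    stored, {"reset": "member.address"}, "RESET",
    defaults={"member.address": "unknown"},  # stand-in for variableSet defaults
)
```

Here ADD stores member.name while REPLACE silently skips it, and SET discards member.address because it was not in the body.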

Querying by clinical data

The 5 supported entities mentioned above have their own /search web service. Among all the different unique fields those entities can be queried for, there is an additional annotation field to perform queries over the clinical data stored. There are mainly three different kinds of filtering that can be performed:

Filter by variableSet: Users might want to filter all the entities that have been annotated (have values) using one user-defined variableSet. * Follow xxx to see the supported operations
Filter by annotationSet: Users might want to filter all the entities that have been annotated (have values) for one particular annotationSet. * Follow xxx to see the supported operations

Filter by annotation: Users are also allowed to filter by any of the clinical data values. Examples:
Let's imagine that for the above described Individual data model, we want to look for any Individual whose gender has been defined as FEMALE, older than 30 and living in London. To do this query, we would need to write something like:

annotation: individual_private_details:age>30;individual_private_details:gender=FEMALE;individual_private_details:address.city=London
annotation: age>30;gender=FEMALE;address.city=London

The first option, though longer, should never fail as long as there exists a variableSet in the study containing the variables that are being queried. Basically, we are telling OpenCGA to look for any Individual matching those values while also telling it where the variables the user wants to query have been defined (the variableSet that defines them). The general format of this query is {variableSetId}:{variable}{operator}{value}, where operator can be = or != for any data type, plus >, >=, <, <= for numeric variables.

However, OpenCGA also allows performing the query using the shorter form shown in the second line.
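
Both query strings shown above can be generated from the same list of conditions. The helper below is a small sketch for building the value of the annotation parameter; the function name is hypothetical, and it only formats the string following the {variableSetId}:{variable}{operator}{value} pattern, without checking that the variables exist in any study.

```python
# Operators named in the text: = and != for any type, the rest for numbers.
OPERATORS = {"=", "!=", ">", ">=", "<", "<="}


def annotation_filter(conditions, variable_set=None):
    """Join (variable, operator, value) triples into an annotation query string."""
    prefix = f"{variable_set}:" if variable_set else ""
    terms = []
    for variable, operator, value in conditions:
        if operator not in OPERATORS:
            raise ValueError(f"unsupported operator: {operator}")
        terms.append(f"{prefix}{variable}{operator}{value}")
    return ";".join(terms)


conditions = [
    ("age", ">", 30),
    ("gender", "=", "FEMALE"),
    ("address.city", "=", "London"),
]

long_form = annotation_filter(conditions, variable_set="individual_private_details")
short_form = annotation_filter(conditions)
print(long_form)
print(short_form)
```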

All the annotationset webservices have been deprecated.

Querying annotation sets

Querying by annotation sets is only possible through sample|individual|family|cohort/search. The variableSet and annotationSetName parameters have been deprecated. Instead, all queries should be done through the annotation query param, which can contain a ;-separated string combining any of the following:
- Filtering by an annotation: Considering a.b is a variable of a nested object we want to query from the variableSet "tumor", both "a.b=4" and "tumor:a.b=4" are supported. As long as the variable is only valid in one of the variable sets defined for the study, the variableSet part can be omitted.
- Filtering by a variableSet:
  - "variableSet=tumor" will return all the objects that have been annotated with that variableSet
  - "variableSet!=tumor" will return all the objects that have not been annotated with that variableSet
- Filtering by annotation set name:
  - "annotationSet=pepe" will return all the objects that have an AnnotationSet with the name "pepe".
  - "annotationSet!=pepe" will return all the objects that don't have an AnnotationSet with the name "pepe".
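
To make the three kinds of terms concrete, the following sketch splits an annotation query string on ; and classifies each term. It is an illustrative parser written for this page, not the one OpenCGA uses, and the function name and output layout are assumptions. It treats the reserved keys variableSet and annotationSet as their own kinds and everything else as an annotation filter with an optional variableSet prefix.

```python
import re

# Optional "<variableSet>:" prefix, then key, operator and value.
TERM = re.compile(
    r"^(?:(?P<variable_set>[^:=!<>]+):)?"
    r"(?P<key>[^:=!<>]+)"
    r"(?P<op>!=|>=|<=|=|>|<)"
    r"(?P<value>.*)$"
)


def parse_annotation_param(param):
    """Split an annotation query string into classified terms."""
    parsed = []
    for term in param.split(";"):
        match = TERM.match(term.strip())
        if match is None:
            raise ValueError(f"cannot parse term: {term!r}")
        key = match["key"]
        kind = key if key in ("variableSet", "annotationSet") else "annotation"
        parsed.append({
            "kind": kind,
            "variableSet": match["variable_set"],  # None when the prefix is omitted
            "key": key,
            "operator": match["op"],
            "value": match["value"],
        })
    return parsed


terms = parse_annotation_param("tumor:a.b=4;variableSet!=tumor;annotationSet=pepe")
for term in terms:
    print(term)
```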
Projections of annotationSets can be done using the typical include/exclude query params. In this case, we have special words to only include/exclude some concrete things:
- Projecting annotations: "annotationSets.annotations.a.b" or "annotation.a.b" will project only the a.b annotations.
- Projecting whole AnnotationSets: "annotationSets.name.pepe" or "annotationSet.pepe" will project the result of the whole AnnotationSet with name pepe
- Projecting AnnotationSets from VariableSets: "annotationSets.variableSet.tumor" or "variableSet.tumor" will project all the existing AnnotationSets annotating the VariableSet tumor.
The boolean "flattenAnnotations" will be used to flatten the annotations in one single level (true) or leave it as nested objects (default - false)
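
The effect of flattening can be pictured with a small local function. This is only a sketch of the idea of collapsing nested annotations into a single level. Joining the nested keys with a dot is an assumption made for the example, and the exact shape returned by OpenCGA when flattenAnnotations is true may differ.

```python
def flatten_annotations(annotations, parent=""):
    """Collapse nested annotation objects into one level of dotted keys."""
    flat = {}
    for key, value in annotations.items():
        path = f"{parent}.{key}" if parent else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the path so far.
            flat.update(flatten_annotations(value, path))
        else:
            flat[path] = value
    return flat


nested = {"a": {"b": 4}, "address": {"city": "London"}, "age": 31}
flat = flatten_annotations(nested)
print(flat)
```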

Creating or modifying annotationSets

AnnotationSets can be created when the entry that will contain them is being created, or by calling the /entry/{entry}/update webservice.
AnnotationSets can be updated by calling the /entry/{entry}/update webservice.

Deleting annotationSets

AnnotationSets can be deleted through the /entry/{entry}/update webservice with the new deleteAnnotationSet parameter.

Deleting annotations

Single annotations can be deleted through the /entry/{entry}/update webservice with the new deleteAnnotation parameter.

GroupBy

We can put something like the following in the 'fields' field to group by Population: annotation:29:pedigreeAnnotation:Population