Clinical data is supported in the File, Sample, Cohort, Individual and Family entities through a field called annotationSets. All of these entities support the operations described below, in addition to their own specific features.

In this document, we will refer to annotationSets and annotations (the field names used in OpenCGA to store clinical data or any other user-defined, free-form data model).

...

  • ADD: Default behavior if the query parameter is not provided. Adds the new value to the annotation. If the annotation already exists, its value is replaced.
  • SET: Use this action with care, as it can delete data. It sets the annotations provided in the body of the POST request and removes any other annotations the annotationSet had stored.
  • REPLACE: Replaces the value of an already existing annotation. Similar to ADD but, in this case, if the annotation does not exist, the new value is not set.
  • REMOVE: Empties the values of some stored annotations. To perform this action, the body map needs to contain the key 'remove' with a comma-separated list of the annotations to be removed. Example: {"remove": "member.address,member.age"}
  • RESET: Resets the given annotations to the default values defined in the variables of the variableSet. To perform this action, the body map needs to contain the key 'reset' with a comma-separated list of the annotations to be reset. Example: {"reset": "member.address"}
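The semantics of the five actions can be summarized in a few lines of code. The following JavaScript sketch is not OpenCGA's implementation: it assumes annotations are held as a flat map keyed by dotted paths (the notation used in the REMOVE and RESET examples), and the function name applyAnnotationAction, the defaults argument and the handling of a RESET without a declared default are hypothetical choices made for illustration.

```javascript
// Hypothetical sketch of the five update actions applied to a flat map of
// annotations keyed by dotted path (e.g. "member.age"). Not OpenCGA's code;
// `defaults` stands in for the default values declared in the variableSet.
function applyAnnotationAction(current, body, action = "ADD", defaults = {}) {
  const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);
  const keysFrom = (list) => list.split(",").map((key) => key.trim());
  const result = { ...current }; // work on a copy of the stored annotations

  switch (action) {
    case "ADD": // add new values; existing values are overwritten
      return { ...result, ...body };
    case "SET": // keep only the annotations sent in the body
      return { ...body };
    case "REPLACE": // overwrite existing annotations, never create new ones
      for (const [key, value] of Object.entries(body)) {
        if (has(result, key)) result[key] = value;
      }
      return result;
    case "REMOVE": // body: { remove: "member.address,member.age" }
      for (const key of keysFrom(body.remove)) delete result[key];
      return result;
    case "RESET": // body: { reset: "member.address" }
      for (const key of keysFrom(body.reset)) {
        // Assumption: an annotation without a declared default is removed.
        if (has(defaults, key)) result[key] = defaults[key];
        else delete result[key];
      }
      return result;
    default:
      throw new Error(`Unknown action: ${action}`);
  }
}

const stored = { "member.age": 40, "member.address": "London" };
console.log(applyAnnotationAction(stored, { "member.age": 41, "member.name": "Ann" }, "REPLACE"));
```

Here REPLACE updates member.age to 41 and ignores member.name, because that annotation was not stored yet; with the default ADD action, member.name would have been added as well.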

Querying by clinical data

...

  • Filter by variableSet: Users might want to filter all the entities that have been annotated (have values) using one user-defined variableSet. See the Filtering by variableSet and annotationSet section below for the supported operations.
  • Filter by annotationSet: Users might want to filter all the entities that have been annotated (have values) for one particular annotationSet. See the Filtering by variableSet and annotationSet section below for the supported operations.
  • Filter by annotation: Users are also allowed to filter by any of the clinical data values.

...

  • Example:

Let's imagine that, for the Individual data model described above, we want to find every Individual whose gender is FEMALE, who is older than 30 and who lives in London. This query can be written in either of the following two ways:

...

1. annotation: individual_private_details:age>30;individual_private_details:gender=FEMALE;individual_private_details:address.city=London
or
2. annotation: age>30;gender=FEMALE;address.city=London
  • The first option, though longer, should never fail as long as the study contains a variableSet defining the variables being queried. Basically, we are telling OpenCGA to look for any Individual matching those values while also telling it where the queried variables have been defined (the variableSet that defines them). In general, such a query has the format [[{annotationSetId}@]{variableSetId}:]{variable}{operator}{value}, where the operator can be any of =, == or != for any data type, plus >, >=, <, <= for numeric variables.

  • However, OpenCGA also allows performing the query using the shorter form, as seen in the second line.
  • All the annotationSet web services have been deprecated.
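The general filter format above is regular enough to check on the client side before sending a query. The following sketch is a hypothetical JavaScript parser, not part of OpenCGA: it splits a query into its ;-separated clauses, extracts the optional annotationSetId and variableSetId from each one, and, for the short form, applies the rule that a variable must be defined in exactly one variableSet. It covers only the annotation operators listed above (the === and !== operators used for variableSet/annotationSet filters are out of its scope), and it assumes identifiers never contain '@', ':', ';' or operator characters.

```javascript
// Hypothetical client-side parser for one annotation filter written as
//   [[{annotationSetId}@]{variableSetId}:]{variable}{operator}{value}
// Assumption: identifiers never contain '@', ':', ';' or operator characters.
const FILTER_PATTERN =
  /^(?:(?:([^@:;=!<>]+)@)?([^@:;=!<>]+):)?([^@:;=!<>]+)(==|!=|>=|<=|=|>|<)(.+)$/;

function parseAnnotationFilter(text) {
  const match = FILTER_PATTERN.exec(text.trim());
  if (match === null) throw new Error(`Malformed annotation filter: ${text}`);
  const [, annotationSetId, variableSetId, variable, operator, value] = match;
  return {
    annotationSetId: annotationSetId ?? null,
    variableSetId: variableSetId ?? null, // null: the short form was used
    variable,
    operator,
    value,
  };
}

// The clauses of a full query are ';'-separated and must all match (AND).
function parseAnnotationQuery(query) {
  return query
    .split(";")
    .filter((clause) => clause.trim() !== "")
    .map(parseAnnotationFilter);
}

// Short form: the variable must be defined in exactly one variableSet.
// `variableSets` maps each variableSetId to its (dotted) variable names.
function resolveVariableSet(clause, variableSets) {
  if (clause.variableSetId !== null) return clause.variableSetId;
  const candidates = Object.keys(variableSets).filter((id) =>
    variableSets[id].includes(clause.variable),
  );
  if (candidates.length !== 1) {
    throw new Error(`Ambiguous or unknown variable: ${clause.variable}`);
  }
  return candidates[0];
}

const shortForm = parseAnnotationQuery("age>30;gender=FEMALE;address.city=London");
const variableSets = { individual_private_details: ["age", "gender", "address.city"] };
console.log(shortForm.map((f) => `${resolveVariableSet(f, variableSets)}:${f.variable}`));
```

Throwing on an ambiguous or unknown variable mirrors the behavior described for the short form: without a unique variableSet, the scope of the query cannot be determined.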

Querying annotation sets

  • Querying by annotation sets is only possible through the sample|individual|family|cohort/search web services. The variableSet and annotationSetName parameters have been deprecated; instead, all queries should be done through the annotation query param, which accepts a ;-separated string containing any combination of the following:
    • Filtering by an annotation: If a.b is a variable of a nested object defined in the variableSet "tumor", both "a.b=4" and "tumor:a.b=4" are supported. As long as the variable is defined in only one of the variableSets of the study, the variableSet part can be omitted.
    • Filtering by a variableSet: 
      • "variableSet=tumor" will return all the objects that have been annotated with that variableSet
      • "variableSet!=tumor" will return all the objects that have not been annotated with that variableSet
    • Filtering by annotation set name:
      • "annotationSet=pepe" will return all the objects that have an AnnotationSet with the name "pepe".
      • "annotationSet!=pepe" will return all the objects that don't have an AnnotationSet with the name "pepe".
  • Projections of annotationSets can be done using the usual include/exclude query params. In this case, there are special keywords to include/exclude only specific parts:
    • Projecting annotations: "annotationSets.annotations.a.b" and "annotation.a.b" will project only the a.b annotations.
    • Projecting whole AnnotationSets: "annotationSets.name.pepe" or "annotationSet.pepe" will project the result of the whole AnnotationSet with name pepe
    • Projecting AnnotationSets from VariableSets: "annotationSets.variableSet.tumor" or "variableSet.tumor" will project all the existing AnnotationSets annotating the VariableSet tumor.
  • The boolean "flattenAnnotations" will be used to flatten the annotations in one single level (true) or leave it as nested objects (default - false)
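The annotation, include and flattenAnnotations parameters described above can be combined in a single /search request. As a hedged sketch, the helper below (buildSearchQuery is a hypothetical name) only assembles and URL-encodes the query string with the standard URLSearchParams class; the host, REST path and any other parameters a given deployment requires are deliberately left out.

```javascript
// Hypothetical helper that assembles the query parameters described above for
// a sample|individual|family|cohort /search call. Only the parameter names
// come from this page; the helper itself is illustrative.
function buildSearchQuery({ filters = [], include = [], flattenAnnotations = false } = {}) {
  const params = new URLSearchParams();
  if (filters.length > 0) params.set("annotation", filters.join(";")); // ';'-separated clauses
  if (include.length > 0) params.set("include", include.join(","));   // comma-separated fields
  if (flattenAnnotations) params.set("flattenAnnotations", "true");
  return params.toString(); // URL-encoded, ready to append after '?'
}

const query = buildSearchQuery({
  filters: ["variableSet=tumor", "tumor:a.b=4", "annotationSet!=pepe"],
  include: ["annotation.a.b"],
  flattenAnnotations: true,
});
console.log(query);
```

Letting URLSearchParams do the encoding avoids hand-escaping the ;, :, = and ! characters that the annotation syntax relies on.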

Creating or modifying annotationSets

  • AnnotationSets can be created together with the entry that will contain them, or afterwards by calling the /entry/{entry}/update web service.
  • AnnotationSets can be updated by calling the /entry/{entry}/update web service.

Deleting annotationSets

  • AnnotationSets can be deleted through the /entry/{entry}/update web service, using the new deleteAnnotationSet parameter.

Deleting annotations

...

  • With the second, shorter form, users can omit the variableSet where the variables were defined. In this case, OpenCGA will look for all the VariableSets that might define these variables and, as long as those variables have been defined in only one VariableSet, the query will be performed. Otherwise, OpenCGA will raise an error because it cannot determine the intended scope of the query.


Filtering by variableSet and annotationSet

Filtering by either of these fields can be tricky, depending on the number of annotationSets stored for a particular entry. This is best explained with an example. Let's say we have only 4 Individuals stored in OpenCGA, and they contain the following annotationSets, where A, B, C and D correspond to the variableSets A, B, C and D respectively.

Individual 1: { A, B }
Individual 2: { B }
Individual 3: { C, D }
Individual 4: { }

For these fields, the operators =, == and != are supported, although they might give unexpected results. For this reason, the === and !== operators have also been added to cover every possible query. The table below shows the results each operator returns for the example above:

| Operator | Value looked for | Individuals returned | Explanation |
|---|---|---|---|
| =, == | B | 1, 2 | Fetch all the individuals that contain annotationSet or variableSet B. |
| === | B | 2 | Fetch all the individuals that contain only annotationSet or variableSet B. |
| != | B | 1, 3, 4 | Fetch all the individuals that do not contain only annotationSet or variableSet B. Individuals containing B together with any other annotationSet or variableSet will be returned. |
| !== | B | 3, 4 | Fetch all the individuals that have never been annotated using annotationSet or variableSet B. |
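To make the table concrete, the following sketch models each Individual as the set of annotationSet (or variableSet) ids it holds and expresses each operator as a predicate over that set. It illustrates the semantics in the table and is not OpenCGA's implementation; OPERATORS and filterEntries are hypothetical names. Running it prints one line per operator, with the same Individuals as the table.

```javascript
// Sketch of the operator semantics in the table: each entry is modelled as
// the set of annotationSet/variableSet ids it holds. Not OpenCGA's code.
const OPERATORS = {
  "=": (ids, v) => ids.has(v),                       // contains v
  "==": (ids, v) => ids.has(v),                      // same as "="
  "===": (ids, v) => ids.size === 1 && ids.has(v),   // contains only v
  "!=": (ids, v) => !(ids.size === 1 && ids.has(v)), // anything but "only v"
  "!==": (ids, v) => !ids.has(v),                    // never annotated with v
};

function filterEntries(entries, operator, value) {
  const predicate = OPERATORS[operator];
  if (predicate === undefined) throw new Error(`Unsupported operator: ${operator}`);
  return Object.keys(entries).filter((id) => predicate(entries[id], value));
}

// The four Individuals from the example above.
const individuals = {
  1: new Set(["A", "B"]),
  2: new Set(["B"]),
  3: new Set(["C", "D"]),
  4: new Set(),
};

for (const op of ["=", "===", "!=", "!=="]) {
  console.log(op, filterEntries(individuals, op, "B").join(", "));
}
```

Written this way, != is the complement of === and !== is the complement of =, which is why Individual 1 (holding A and B) is returned by both = and !=.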


Project the annotation fields to return

Annotations, like any other field of the data models, can be included in or excluded from the final JSON returned to the user. However, because annotations contain custom data models that are not fully under OpenCGA's control, a set of reserved prefixes has been defined, as explained below:

  • Include/exclude specific annotations: To project only some specific annotations, users need to add the prefix "annotationSets.annotations" or "annotation" to the field to be projected. Example: to include only the full_name and hpo variables defined in the Individual VariableSet after running a query, users need to write

      include: annotation.full_name,annotation.hpo

            or

      include: annotationSets.annotations.full_name,annotationSets.annotations.hpo
  • Include/exclude specific annotationSets: Let's imagine that we have several annotationSets defined, as for Individual 1 and Individual 3 in the examples above. To project only the annotations of some specific annotationSets, users need to add the prefix "annotationSets.id" or "annotationSet" to the id of each annotationSet to be projected. Example: to include only the annotations from the annotationSets B and D, we need to write:
      include: annotationSets.id.B,annotationSets.id.D

            or

      include: annotationSet.B,annotationSet.D
  • Include/exclude specific variableSets: Let's say that, for some entries, the user has created several annotationSets using the same variableSet and wants to fetch only those, leaving out any other annotationSets. To do so, users need to use the prefix "annotationSets.variableSetId" or "variableSet". Example: let's imagine another Individual that contains 2 annotationSets (a and b) built from the template defined in the variableSet X, and another annotationSet (c) annotating the variableSet Y. If the user is only interested in the annotationSets "a" and "b", they need to write:
      include: annotationSets.variableSetId.X

            or

      include: variableSet.X
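The prefix rules above amount to a simple rewrite from the short to the long form. The following JavaScript sketch (normalizeProjection and keepsAnnotationSet are hypothetical names, not part of OpenCGA) normalizes an include list and then checks which annotationSets an annotationSet or variableSet projection keeps, using the Individual with annotationSets a, b and c from the last example; it should select a and b.

```javascript
// Hypothetical sketch: rewrites the short projection prefixes into their long
// equivalents and decides which annotationSets a projection keeps.
const SHORT_TO_LONG = [
  ["annotation.", "annotationSets.annotations."],
  ["annotationSet.", "annotationSets.id."],
  ["variableSet.", "annotationSets.variableSetId."],
];

function normalizeProjection(include) {
  return include.split(",").map((field) => {
    const trimmed = field.trim();
    for (const [short, long] of SHORT_TO_LONG) {
      if (trimmed.startsWith(short)) return long + trimmed.slice(short.length);
    }
    return trimmed; // already in long form, or an ordinary field
  });
}

// An annotationSet is kept if the projection names its id or its variableSet.
function keepsAnnotationSet(annotationSet, fields) {
  return fields.includes(`annotationSets.id.${annotationSet.id}`) ||
    fields.includes(`annotationSets.variableSetId.${annotationSet.variableSetId}`);
}

const annotationSets = [
  { id: "a", variableSetId: "X" },
  { id: "b", variableSetId: "X" },
  { id: "c", variableSetId: "Y" },
];
const fields = normalizeProjection("variableSet.X");
console.log(annotationSets.filter((set) => keepsAnnotationSet(set, fields)).map((set) => set.id));
```

Normalizing first means the short and long spellings of the same projection behave identically, which is the equivalence the "or" alternatives above describe.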

Flatten annotations

Additionally, the different /info and /search web services have a new query parameter called flattenAnnotations. That field is a simple boolean to indicate whether the annotations should be returned flattened or not. Let's imagine we have the following annotationSet:

```json
{
  "id": "annotation_set_id",
  "variableSetId": "individual_private_details",
  "annotations": {
    "full_name": "John Smith",
    "age": 60,
    "gender": "MALE",
    "address": {
      "city": "United States",
      "zip": "99501"
    }
  }
}
```

The same result with flattenAnnotations set to true would be:

```json
{
  "id": "annotation_set_id",
  "variableSetId": "individual_private_details",
  "annotations": {
    "full_name": "John Smith",
    "age": 60,
    "gender": "MALE",
    "address.city": "United States",
    "address.zip": "99501"
  }
}
```
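The flattening rule is a straightforward recursive walk over the annotations object. The following JavaScript sketch reproduces the example above; it is not OpenCGA's code, and keeping arrays and null values as leaves is an assumption, since the page only shows nested objects.

```javascript
// Sketch of the flattening shown above: nested objects become dot-separated
// keys. Assumption: arrays and null are kept as leaf values.
function flattenAnnotations(annotations, prefix = "", flat = {}) {
  for (const [key, value] of Object.entries(annotations)) {
    const path = prefix === "" ? key : `${prefix}.${key}`;
    if (value !== null && typeof value === "object" && !Array.isArray(value)) {
      flattenAnnotations(value, path, flat); // recurse into nested objects
    } else {
      flat[path] = value;
    }
  }
  return flat;
}

const nested = {
  full_name: "John Smith",
  age: 60,
  gender: "MALE",
  address: { city: "United States", zip: "99501" },
};
console.log(flattenAnnotations(nested));
```

The dotted keys produced here use the same notation as the annotation filters (for example address.city=London), so a flattened result can be matched directly against a query clause.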


GroupBy

  • To group by Population, put something like the following in the 'fields' field: annotation:29:pedigreeAnnotation:Population

...