Clinical Data in OpenCGA is managed through what we have called Variable Sets and Annotation Sets.
A Variable Set is a freely defined, user-specified data model. The fields of a Variable Set are explained below:
* Annotable: We consider an entry to be Annotable if the entry can have Annotation Sets. At this stage, only File, Sample, Individual, Cohort and Family are Annotable.
** Confidential: Explained in the Sharing and Permissions section.
A Variable Set is composed of a set of Variables. A Variable can be understood as a user-defined field that can be of any type (Boolean, String, Integer, Float, Object, List...). The different fields of a Variable are:
* Categorical: A Categorical Variable can be understood as an Enum whose possible values are known in advance. Examples of categorical Variables are month, which can only take values from January to December, and gender, which can only take the values MALE, FEMALE or UNKNOWN.
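The Enum analogy can be made concrete with a short Python sketch. This is only an illustration of the concept, not OpenCGA code: the `Gender` class and the `is_allowed` helper are hypothetical names introduced here.

```python
from enum import Enum


class Gender(Enum):
    """Hypothetical Enum mirroring the allowed values of a categorical Variable."""
    MALE = "MALE"
    FEMALE = "FEMALE"
    UNKNOWN = "UNKNOWN"


def is_allowed(value: str) -> bool:
    """Return True if value is one of the predefined Gender values."""
    return value in {member.value for member in Gender}
```

Any value outside the predefined set, for example `"OTHER"`, is rejected by `is_allowed`.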
We are going to create two different Variable Sets. Remember that Variable Sets are defined at the study level. The first one will be used to identify every Individual created in OpenCGA. The second one will be used to store additional metadata about the Samples extracted from those Individuals.
{
  "id": "individual_private_details",
  "unique": true,
  "confidential": true,
  "description": "Private details of the individual",
  "variables": [
    {
      "id": "full_name", "name": "Full name", "category": "Personal", "type": "TEXT",
      "defaultValue": "", "required": true, "multiValue": false, "allowedValues": [],
      "rank": 1, "dependsOn": "", "description": "Individual full name", "attributes": {}
    },
    {
      "id": "age", "name": "Age", "category": "Personal", "type": "INTEGER",
      "required": true, "multiValue": false, "allowedValues": [ "0:120" ],
      "rank": 2, "dependsOn": "", "description": "Individual age", "attributes": {}
    },
    {
      "id": "gender", "name": "Gender", "category": "Personal", "type": "CATEGORICAL",
      "defaultValue": "UNKNOWN", "required": true, "multiValue": false,
      "allowedValues": [ "MALE", "FEMALE", "UNKNOWN" ],
      "rank": 3, "dependsOn": "", "description": "Individual gender", "attributes": {}
    },
    {
      "id": "hpo", "name": "HPO phenotypes", "category": "Disease", "type": "TEXT",
      "defaultValue": "", "required": true, "multiValue": true, "allowedValues": [],
      "rank": 4, "dependsOn": "", "description": "Individual HPO terms", "attributes": {}
    },
    {
      "id": "address", "name": "Address", "category": "Personal", "type": "OBJECT",
      "required": false, "multiValue": false, "allowedValues": [],
      "rank": 5, "dependsOn": "", "description": "Individual country of birth", "attributes": {},
      "variableSet": [
        {
          "id": "city", "name": "City", "category": "Personal", "type": "TEXT",
          "defaultValue": "", "required": false, "multiValue": false, "allowedValues": [],
          "rank": 1, "dependsOn": "", "description": "Individual city", "attributes": {}
        },
        {
          "id": "zip", "name": "ZIP code", "category": "Personal", "type": "TEXT",
          "defaultValue": "UNKNOWN", "required": false, "multiValue": false, "allowedValues": [],
          "rank": 2, "dependsOn": "city", "description": "ZIP code", "attributes": {}
        }
      ]
    }
  ]
}
{
  "unique": true,
  "confidential": false,
  "id": "sample_metadata",
  "description": "Sample origin",
  "variables": [
    {
      "id": "tissue", "name": "Tissue", "category": "string", "type": "TEXT",
      "required": false, "multiValue": false, "allowedValues": [],
      "rank": 1, "dependsOn": "", "description": "Sample tissue", "attributes": {}
    },
    {
      "id": "cell_line", "name": "Cell line", "category": "string", "type": "TEXT",
      "required": false, "multiValue": false, "allowedValues": [],
      "rank": 2, "dependsOn": "", "description": "Sample cell line", "attributes": {}
    },
    {
      "id": "cell_type", "name": "Cell type", "category": "string", "type": "TEXT",
      "required": false, "multiValue": false, "allowedValues": [],
      "rank": 3, "dependsOn": "", "description": "Sample cell type", "attributes": {}
    },
    {
      "id": "preparation", "name": "Preparation", "category": "string", "type": "TEXT",
      "required": false, "multiValue": false, "allowedValues": [],
      "rank": 4, "dependsOn": "", "description": "Sample preparation", "attributes": {}
    }
  ]
}
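Because a Variable of type OBJECT can carry its own nested Variables (the `variableSet` key of the `address` Variable above), it can help to flatten a definition before inspecting it. The following Python sketch walks such a definition and lists every Variable with its type, ordered by `rank`. The function name `iter_variable_paths` and the dotted-path notation (`address.city`) are assumptions made for this illustration, not part of OpenCGA.

```python
from __future__ import annotations

from typing import Iterator


def iter_variable_paths(variables: list[dict], prefix: str = "") -> Iterator[tuple[str, str]]:
    """Yield (dotted_path, type) for every Variable, descending into nested ones."""
    for variable in sorted(variables, key=lambda v: v.get("rank", 0)):
        path = f"{prefix}{variable['id']}"
        yield path, variable["type"]
        # Nested Variables of an OBJECT are stored under "variableSet" in the example.
        yield from iter_variable_paths(variable.get("variableSet", []), prefix=f"{path}.")


# Trimmed-down version of the first Variable Set, keeping only the keys used here.
example = {
    "id": "individual_private_details",
    "variables": [
        {"id": "address", "type": "OBJECT", "rank": 5, "variableSet": [
            {"id": "zip", "type": "TEXT", "rank": 2},
            {"id": "city", "type": "TEXT", "rank": 1},
        ]},
        {"id": "full_name", "type": "TEXT", "rank": 1},
    ],
}

paths = list(iter_variable_paths(example["variables"]))
```

With the trimmed-down input, `paths` lists `full_name` first, then `address`, followed by its nested `address.city` and `address.zip`.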
An Annotation Set is the set of Annotations given for a concrete Annotable entry using a particular Variable Set template. The most important fields of an Annotation Set are:
Annotations are key-value objects: each key needs to match one of the Variables defined in the Variable Set (in the examples on this page, the keys are Variable ids such as full_name), and each value is the actual Annotation for that Variable.
Every time an annotation is made, OpenCGA will make, at least, the following checks:
An example Annotation Set for each of the two Variable Sets defined above is shown below:
{
  "id": "annotation_set_id",
  "variableSetId": "individual_private_details",
  "annotations": {
    "full_name": "John Smith",
    "age": 60,
    "gender": "MALE",
    "hpo": ["HP:0000118", "HP:0000220"]
  }
}
{
  "id": "annotation_set_id",
  "variableSetId": "sample_metadata",
  "annotations": {
    "tissue": "umbilical cord blood",
    "cell_type": "multipotent progenitor",
    "preparation": "100 (or less, if 100 were not available) highly purified Haematopoietic stem and progenitor cells..."
  }
}
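To illustrate how Annotations relate to their Variable Set, here is a minimal Python sketch of the kind of validation that could be run before submitting an Annotation Set. It is not OpenCGA's implementation, and the rules it applies are assumptions: keys must be known Variable ids, required Variables must be present, values must match the declared type, multi-valued Variables take a list, categorical values must appear in `allowedValues`, and an entry such as `"0:120"` is read as an inclusive numeric range. Nested OBJECT Variables are not descended into. The names `in_range` and `check_annotations` are hypothetical.

```python
PYTHON_TYPES = {"TEXT": str, "CATEGORICAL": str, "INTEGER": int, "OBJECT": dict}


def in_range(value, spec):
    """Read a spec such as "0:120" as an inclusive min:max range (an assumption)."""
    low, high = spec.split(":")
    return float(low) <= value <= float(high)


def check_annotations(variables, annotations):
    """Return a list of problems; an empty list means every sketch check passed."""
    problems = []
    by_id = {variable["id"]: variable for variable in variables}

    # Every annotation key must correspond to a Variable of the Variable Set.
    for key in annotations:
        if key not in by_id:
            problems.append(f"unknown variable: {key}")

    for var_id, variable in by_id.items():
        if var_id not in annotations:
            if variable.get("required"):
                problems.append(f"missing required variable: {var_id}")
            continue
        value = annotations[var_id]
        if variable.get("multiValue"):
            if not isinstance(value, list):
                problems.append(f"{var_id}: expected a list of values")
                continue
            items = value
        else:
            items = [value]
        expected = PYTHON_TYPES.get(variable["type"])
        allowed = variable.get("allowedValues", [])
        for item in items:
            if expected is not None and not isinstance(item, expected):
                problems.append(f"{var_id}: {item!r} is not of type {variable['type']}")
            elif variable["type"] == "CATEGORICAL" and allowed and item not in allowed:
                problems.append(f"{var_id}: {item!r} is not an allowed value")
            elif (variable["type"] == "INTEGER" and allowed
                  and not any(in_range(item, spec) for spec in allowed)):
                problems.append(f"{var_id}: {item!r} is outside the allowed range")
    return problems


# Trimmed-down Variables from individual_private_details (only the keys used here).
variables = [
    {"id": "full_name", "type": "TEXT", "required": True, "multiValue": False, "allowedValues": []},
    {"id": "age", "type": "INTEGER", "required": True, "multiValue": False, "allowedValues": ["0:120"]},
    {"id": "gender", "type": "CATEGORICAL", "required": True, "multiValue": False,
     "allowedValues": ["MALE", "FEMALE", "UNKNOWN"]},
    {"id": "hpo", "type": "TEXT", "required": True, "multiValue": True, "allowedValues": []},
]

good = {"full_name": "John Smith", "age": 60, "gender": "MALE",
        "hpo": ["HP:0000118", "HP:0000220"]}
bad = {"full_name": "John Smith", "age": 130, "gender": "OTHER", "nickname": "JS"}

good_problems = check_annotations(variables, good)
bad_problems = check_annotations(variables, bad)
```

The `good` annotations mirror the first example above and produce no problems, while `bad` triggers one problem for each assumed rule it breaks: an unknown key, an out-of-range age, a gender outside the allowed values, and a missing required `hpo`.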