Clinical Data in OpenCGA is managed through what we call Variable Sets and Annotation Sets.
A Variable Set is a freely defined data model. The fields of a Variable Set are explained below:
* Annotable: We consider an entry to be Annotable if the entry can have Annotation Sets. At this stage, only File, Sample, Individual, Cohort and Family are Annotable.
** Confidential: Explained in the Sharing and Permissions section.
A Variable Set is composed of a set of Variables. A Variable can be understood as a user-defined field that can be of any type (Boolean, String, Integer, Float, Object, List...). The different fields of a Variable are:
* Categorical: A Categorical variable can be understood as an Enum object where the possible values that can be assigned are already known. Examples of categorical Variables are month, which can only contain values from January to December, and gender, which can only contain the values MALE, FEMALE and UNKNOWN.
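As a rough illustration of the Enum-like behaviour described above, the following Python sketch checks a value against a categorical Variable. The helper `is_allowed_categorical` is hypothetical and is not part of OpenCGA; the field names `type` and `allowedValues` are taken from the Variable Set JSON examples on this page.

```python
# Hypothetical helper, not an OpenCGA API: illustrates the Enum-like check only.
def is_allowed_categorical(variable: dict, value: str) -> bool:
    """Return True if value is one of the variable's allowedValues."""
    if variable.get("type") != "CATEGORICAL":
        raise ValueError("variable is not CATEGORICAL")
    return value in variable.get("allowedValues", [])


# Trimmed copy of the "gender" Variable used in the examples on this page.
gender = {
    "id": "gender",
    "type": "CATEGORICAL",
    "allowedValues": ["MALE", "FEMALE", "UNKNOWN"],
}

print(is_allowed_categorical(gender, "MALE"))   # True
print(is_allowed_categorical(gender, "OTHER"))  # False
```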
We are going to create two different Variable Sets. Remember that Variable Sets are defined at study level. The first one will be used to properly identify every single Individual created in OpenCGA. The other one will be used to store some additional metadata from the Samples extracted from the Individuals.
{
  "id": "individual_private_details",
  "unique": true,
  "confidential": true,
  "description": "Private details of the individual",
  "variables": [
    {
      "id": "full_name",
      "name": "Full name",
      "category": "Personal",
      "type": "TEXT",
      "defaultValue": "",
      "required": true,
      "multiValue": false,
      "allowedValues": [],
      "rank": 1,
      "dependsOn": "",
      "description": "Individual full name",
      "attributes": {}
    },
    {
      "id": "age",
      "name": "Age",
      "category": "Personal",
      "type": "INTEGER",
      "required": true,
      "multiValue": false,
      "allowedValues": [ "0:120" ],
      "rank": 2,
      "dependsOn": "",
      "description": "Individual age",
      "attributes": {}
    },
    {
      "id": "gender",
      "name": "Gender",
      "category": "Personal",
      "type": "CATEGORICAL",
      "defaultValue": "UNKNOWN",
      "required": true,
      "multiValue": false,
      "allowedValues": [ "MALE", "FEMALE", "UNKNOWN" ],
      "rank": 3,
      "dependsOn": "",
      "description": "Individual gender",
      "attributes": {}
    },
    {
      "id": "hpo",
      "name": "HPO phenotypes",
      "category": "Disease",
      "type": "TEXT",
      "defaultValue": "",
      "required": true,
      "multiValue": true,
      "allowedValues": [],
      "rank": 4,
      "dependsOn": "",
      "description": "Individual HPO terms",
      "attributes": {}
    },
    {
      "id": "address",
      "name": "Address",
      "category": "Personal",
      "type": "OBJECT",
      "required": false,
      "multiValue": false,
      "allowedValues": [],
      "rank": 5,
      "dependsOn": "",
      "description": "Individual country of birth",
      "attributes": {},
      "variableSet": [
        {
          "id": "city",
          "name": "City",
          "category": "Personal",
          "type": "TEXT",
          "defaultValue": "",
          "required": false,
          "multiValue": false,
          "allowedValues": [],
          "rank": 1,
          "dependsOn": "",
          "description": "Individual city",
          "attributes": {}
        },
        {
          "id": "zip",
          "name": "ZIP code",
          "category": "Personal",
          "type": "TEXT",
          "defaultValue": "UNKNOWN",
          "required": false,
          "multiValue": false,
          "allowedValues": [],
          "rank": 2,
          "dependsOn": "city",
          "description": "ZIP code",
          "attributes": {}
        }
      ]
    }
  ]
}
{
  "unique": true,
  "confidential": false,
  "id": "sample_metadata",
  "description": "Sample origin",
  "variables": [
    {
      "id": "tissue",
      "name": "Tissue",
      "category": "string",
      "type": "TEXT",
      "required": false,
      "multiValue": false,
      "allowedValues": [],
      "rank": 1,
      "dependsOn": "",
      "description": "Sample tissue",
      "attributes": {}
    },
    {
      "id": "cell_line",
      "name": "Cell line",
      "category": "string",
      "type": "TEXT",
      "required": false,
      "multiValue": false,
      "allowedValues": [],
      "rank": 2,
      "dependsOn": "",
      "description": "Sample cell line",
      "attributes": {}
    },
    {
      "id": "cell_type",
      "name": "Cell type",
      "category": "string",
      "type": "TEXT",
      "required": false,
      "multiValue": false,
      "allowedValues": [],
      "rank": 3,
      "dependsOn": "",
      "description": "Sample cell type",
      "attributes": {}
    },
    {
      "id": "preparation",
      "name": "Preparation",
      "category": "string",
      "type": "TEXT",
      "required": false,
      "multiValue": false,
      "allowedValues": [],
      "rank": 4,
      "dependsOn": "",
      "description": "Sample preparation",
      "attributes": {}
    }
  ]
}
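To make the nesting of OBJECT Variables easier to see, here is a minimal Python sketch that walks a list of Variables and descends into the nested "variableSet" list used by the address Variable in the first example. The function `flatten_variables` and the dotted-path naming (for example `address.city`) are illustrative assumptions, not OpenCGA conventions.

```python
# Hypothetical helper, not an OpenCGA API. The dotted paths are an
# illustrative convention chosen for this sketch.
def flatten_variables(variables, prefix=""):
    """Yield (dotted_path, type) pairs, descending into nested "variableSet" lists."""
    for var in variables:
        path = prefix + var["id"]
        yield path, var["type"]
        # OBJECT variables carry their child variables under "variableSet".
        yield from flatten_variables(var.get("variableSet", []), path + ".")


# Trimmed copy of part of the individual_private_details Variable Set.
variables = [
    {"id": "full_name", "type": "TEXT"},
    {"id": "address", "type": "OBJECT", "variableSet": [
        {"id": "city", "type": "TEXT"},
        {"id": "zip", "type": "TEXT"},
    ]},
]

paths = list(flatten_variables(variables))
for path, var_type in paths:
    print(path, var_type)
```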
An Annotation Set is the set of Annotations given for a concrete Annotable entry using a particular Variable Set template. The most important fields of an Annotation Set are:
The Annotations are just key-value objects where each key needs to match the id of one of the Variables defined in the Variable Set, and each value corresponds to the actual Annotation of that Variable.
Every time an annotation is made, OpenCGA performs at least the following checks:
An example Annotation Set for each of the two example Variable Sets can be found below:
{
  "id": "annotation_set_id",
  "variableSetId": "individual_private_details",
  "annotations": {
    "full_name": "John Smith",
    "age": 60,
    "gender": "MALE",
    "hpo": ["HP:0000118", "HP:0000220"]
  }
}
{
  "id": "annotation_set_id",
  "variableSetId": "sample_metadata",
  "annotations": {
    "tissue": "umbilical cord blood",
    "cell_type": "multipotent progenitor",
    "preparation": "100 (or less, if 100 were not available) highly purified Haematopoietic stem and progenitor cells..."
  }
}
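The kind of validation mentioned above can be sketched in Python. This is not OpenCGA's implementation: the functions `in_range` and `validate`, the specific checks (unknown keys, missing required Variables, integer ranges, categorical values) and the reading of "0:120" as an inclusive low:high range are all assumptions made for illustration.

```python
# Hypothetical helpers, not OpenCGA APIs. The checks and the inclusive
# "low:high" range interpretation are assumptions made for this sketch.
def in_range(value, spec):
    """True if value lies in an inclusive 'low:high' range such as '0:120'."""
    low, high = (int(part) for part in spec.split(":"))
    return low <= value <= high


def validate(variable_set, annotations):
    """Return a list of problems found in annotations (empty if none)."""
    problems = []
    by_id = {var["id"]: var for var in variable_set["variables"]}
    # Every annotation key must correspond to a Variable id.
    for key in annotations:
        if key not in by_id:
            problems.append(f"unknown variable: {key}")
    for var_id, var in by_id.items():
        if var_id not in annotations:
            if var.get("required", False):
                problems.append(f"missing required variable: {var_id}")
            continue
        value = annotations[var_id]
        allowed = var.get("allowedValues", [])
        if var["type"] == "INTEGER":
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, int):
                problems.append(f"{var_id} is not an integer")
            elif allowed and not any(in_range(value, spec) for spec in allowed):
                problems.append(f"{var_id} is out of range")
        elif var["type"] == "CATEGORICAL":
            if allowed and value not in allowed:
                problems.append(f"{var_id} has a value that is not allowed")
    return problems


# Trimmed copy of the individual_private_details Variable Set.
variable_set = {
    "id": "individual_private_details",
    "variables": [
        {"id": "full_name", "type": "TEXT", "required": True, "allowedValues": []},
        {"id": "age", "type": "INTEGER", "required": True, "allowedValues": ["0:120"]},
        {"id": "gender", "type": "CATEGORICAL", "required": True,
         "allowedValues": ["MALE", "FEMALE", "UNKNOWN"]},
    ],
}

good = {"full_name": "John Smith", "age": 60, "gender": "MALE"}
bad = {"age": 130, "gender": "OTHER", "nickname": "J"}

print(validate(variable_set, good))
for problem in validate(variable_set, bad):
    print(problem)
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in an Annotation Set at once.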