One of the first steps in a clinical trial is selecting the candidate individuals for the study. The rules that an individual must satisfy to be selected as a candidate usually involve complex queries combining clinical metadata and variant data, expressed as a list of filters combined with intersections (AND), unions (OR) and complements (NOT).
A complex query can be defined as a tree where each node is either a query (a leaf) or the intersection, union or complement of its child nodes.
To resolve the query, the tree is traversed: the leaves are resolved first, and their results are then joined on the way back up.
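The traversal can be sketched as a short recursive function. This is only an illustration, not the OpenCGA implementation: it assumes node types are the strings "QUERY", "UNION", "INTERSECTION" and "COMPLEMENT", that every leaf resolves to a set of sample IDs, and that the complement is taken against a known set of all samples. The names resolve, run_query and universe, as well as the "name" query parameter, are hypothetical.

```python
from typing import Callable, Set


def resolve(node: dict,
            run_query: Callable[[dict], Set[str]],
            universe: Set[str]) -> Set[str]:
    """Resolve a query tree bottom-up into a set of sample IDs."""
    node_type = node["type"]
    if node_type == "QUERY":
        # Leaf: delegate to whatever actually executes the query.
        return set(run_query(node["query"]))

    # Inner node: resolve every child first, then join the results.
    children = [resolve(child, run_query, universe) for child in node["nodes"]]
    if node_type == "UNION":
        return set().union(*children)
    if node_type == "INTERSECTION":
        return set.intersection(*children) if children else set()
    if node_type == "COMPLEMENT":
        if len(children) != 1:
            raise ValueError("COMPLEMENT expects exactly one child node")
        return universe - children[0]
    raise ValueError(f"unknown node type: {node_type!r}")


# Toy data standing in for real query results (hypothetical).
ALL_SAMPLES = {"s1", "s2", "s3", "s4"}
TOY_RESULTS = {"a": {"s1", "s2", "s3"}, "b": {"s2"}}


def run_toy_query(query: dict) -> Set[str]:
    return TOY_RESULTS[query["name"]]


# a AND (NOT b)
tree = {
    "type": "INTERSECTION",
    "nodes": [
        {"type": "QUERY", "query": {"name": "a"}},
        {"type": "COMPLEMENT",
         "nodes": [{"type": "QUERY", "query": {"name": "b"}}]},
    ],
}

result = resolve(tree, run_toy_query, ALL_SAMPLES)
print(sorted(result))  # ['s1', 's3']
```

Working on sets keeps each join a single operation. A real implementation would probably avoid materialising the full complement and would push filters down into the leaf queries where possible.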
Standard variant query. It can be combined with sample and individual catalog queries by prepending the query params with "sample." or "individual.", respectively.
{ "type" : QUERY, "query" : { "<query param 1>" : "<value>", "<query param 2>" : "<value>", ... "<query param n>" : "<value>" } }
{ "type" : COMPLEMENT, "nodes" : [ <node> ] }
{ "type" : UNION, "nodes" : [ <node1>, <node2>, ... ] }
{ "type" : INTERSECTION, "nodes" : [ <node1>, <node2>, ... ] }
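As a rough illustration of these four node shapes, the sketch below checks that a tree is well formed before it is resolved. It is not part of OpenCGA. It assumes the type values are plain strings, that COMPLEMENT takes exactly one child (as in the shape above), and that UNION and INTERSECTION need at least one child. The function name validate and the "param" query key are hypothetical.

```python
NODE_TYPES = {"QUERY", "COMPLEMENT", "UNION", "INTERSECTION"}


def validate(node) -> None:
    """Raise ValueError if the node does not match one of the four shapes."""
    if not isinstance(node, dict):
        raise ValueError(f"node must be an object, got {type(node).__name__}")

    node_type = node.get("type")
    if node_type not in NODE_TYPES:
        raise ValueError(f"unknown node type: {node_type!r}")

    if node_type == "QUERY":
        # Leaf: carries the query params, has no children.
        if not isinstance(node.get("query"), dict):
            raise ValueError("QUERY node needs a 'query' object")
        return

    children = node.get("nodes")
    if not isinstance(children, list) or not children:
        raise ValueError(f"{node_type} node needs a non-empty 'nodes' list")
    if node_type == "COMPLEMENT" and len(children) != 1:
        raise ValueError("COMPLEMENT node takes exactly one child")

    # Inner node: every child must itself be a valid node.
    for child in children:
        validate(child)


# Passes silently: NOT (some query)
validate({
    "type": "COMPLEMENT",
    "nodes": [{"type": "QUERY", "query": {"param": "value"}}],
})
```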
Get all samples that have (a lof mutation in gene A and with HPO X) or ((a missense_variant mutation on gene B) and (a missense_variant on gene C) but not (a missense_variant on gene D)).
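This can be divided into 4 queries:
Q1: a lof mutation in gene A, with HPO X
Q2: a missense_variant mutation on gene B
Q3: a missense_variant mutation on gene C
Q4: a missense_variant mutation on gene D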
These queries are combined with AND/OR like this:
Q1 OR (Q2 AND Q3 AND (NOT Q4))
This can be expressed in a JSON query like this:
{
  "type" : UNION,
  "nodes" : [
    { "type" : QUERY, "query" : Q1 },
    {
      "type" : INTERSECTION,
      "nodes" : [
        { "type" : QUERY, "query" : Q2 },
        { "type" : QUERY, "query" : Q3 },
        { "type" : COMPLEMENT, "nodes" : [ { "type" : QUERY, "query" : Q4 } ] }
      ]
    }
  ]
}
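The same tree can also be assembled programmatically instead of being written by hand. The following is a minimal sketch, assuming the type values are serialised as JSON strings. The helper names (query, union, intersection, complement) are hypothetical, and Q1 to Q4 are stand-in placeholder objects, not real query params.

```python
import json


def query(params: dict) -> dict:
    return {"type": "QUERY", "query": params}


def union(*nodes: dict) -> dict:
    return {"type": "UNION", "nodes": list(nodes)}


def intersection(*nodes: dict) -> dict:
    return {"type": "INTERSECTION", "nodes": list(nodes)}


def complement(node: dict) -> dict:
    # COMPLEMENT wraps exactly one child node.
    return {"type": "COMPLEMENT", "nodes": [node]}


# Placeholders only: the real Q1..Q4 would hold the actual query params.
Q1, Q2, Q3, Q4 = ({"placeholder": name} for name in ("Q1", "Q2", "Q3", "Q4"))

# Q1 OR (Q2 AND Q3 AND (NOT Q4))
tree = union(
    query(Q1),
    intersection(query(Q2), query(Q3), complement(query(Q4))),
)

print(json.dumps(tree, indent=2))
```

Building the tree from small helpers makes the nesting mirror the boolean expression and leaves the commas and brackets to the JSON serialiser.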
Implementation details at opencga#1474.