In construction...

In OpenCGA, stats refers to the arrangement of search results into categories based on indexed fields. Results are presented as a list of buckets, where each bucket is composed of 1) the field value and 2) a count of how many matching documents were found for that value. In the literature, this concept is also known as facets or faceting. In fact, OpenCGA stats are based on Solr faceted search. In addition, stats support:

  • Ranges that allow users to count how many documents are in an interval of a numerical field.
  • Aggregation functions such as average, maximum, minimum, percentiles, etc.
  • Nested faceted search.
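The bucket idea described above (a field value paired with a count of matching documents) can be illustrated with a minimal Python sketch. The documents, field values and the `facet_counts` helper are hypothetical and only mimic the concept; they are not part of OpenCGA or Solr.

```python
from collections import Counter

# Hypothetical documents; field names and values are illustrative only.
docs = [
    {"chromosome": "1", "type": "SNV"},
    {"chromosome": "1", "type": "INDEL"},
    {"chromosome": "2", "type": "SNV"},
    {"chromosome": "1", "type": "SNV"},
]


def facet_counts(documents, field):
    """Return (value, count) buckets for one field, most frequent first."""
    counts = Counter(doc[field] for doc in documents if field in doc)
    return counts.most_common()


# One bucket per distinct value of the "type" field.
type_buckets = facet_counts(docs, "type")
print(type_buckets)
```

Each tuple in the result plays the role of one bucket: the first element is the field value and the second is the number of documents that carry it.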

The basic syntax for stats (or facets) is:

field_name[value1,value2,value3...]:limit

Parameters:

| Parameter | Description |
|---|---|
| field_name | The field name to produce buckets from. Mandatory. |
| value1,value2,value3... | Field values to restrict the counts to. Optional; must be enclosed in square brackets. |
| limit | Number of buckets to show. Optional. |
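As a rough illustration of the basic syntax, the sketch below splits a facet string into its three parts. The regular expression and the `parse_facet` helper are assumptions derived only from the syntax line above, not OpenCGA's actual parser, and the values `SNV` and `INDEL` are made up for the example.

```python
import re

# Assumed grammar, taken from the syntax shown above: field_name[v1,v2,...]:limit
# The bracketed values and the limit are both optional.
FACET_RE = re.compile(
    r"^(?P<field>\w+)(?:\[(?P<values>[^\]]*)\])?(?::(?P<limit>\d+))?$"
)


def parse_facet(spec):
    """Split a basic facet specification into field, values and limit."""
    match = FACET_RE.match(spec)
    if match is None:
        raise ValueError(f"not a basic facet specification: {spec!r}")
    values = match.group("values")
    limit = match.group("limit")
    return {
        "field": match.group("field"),
        "values": values.split(",") if values else [],
        "limit": int(limit) if limit else None,
    }


print(parse_facet("type[SNV,INDEL]:5"))
```

A bare field name such as `type` yields an empty value list and no limit, which matches the description of both parts as optional.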


Multiple facet fields are separated by semicolons, e.g.: studies;type. For nested facet fields use >>, e.g.: chromosome>>type;percentile(gerp)


Ranges

When asking for ranges, the result contains multiple buckets over a numeric field. You must specify the field name, the lower and upper bounds, and the step (bucket size).

field_name[start..end]:step

...

Range parameters:

| Parameter | Description |
|---|---|
| field_name | The numeric field name to produce range buckets from. Mandatory. |
| start | Lower bound of the ranges. Mandatory. |
| end | Upper bound of the ranges. Mandatory. |
| step | Size of each range bucket produced. |

E.g.: gerp[0..5]:0.2
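To make the range specification concrete, the following sketch expands a string such as gerp[0..5]:0.2 into its bucket boundaries. The `range_buckets` helper is hypothetical, and treating each bucket as a lower/upper pair clipped at the upper bound is an assumption; the actual boundary handling is decided by the server. `Decimal` is used so that repeated additions of 0.2 do not accumulate floating-point error.

```python
import re
from decimal import Decimal

# Assumed grammar, taken from the syntax shown above: field_name[start..end]:step
NUM = r"-?\d+(?:\.\d+)?"
RANGE_RE = re.compile(rf"^(\w+)\[({NUM})\.\.({NUM})\]:({NUM})$")


def range_buckets(spec):
    """Return the field name and a list of (lower, upper) bucket bounds."""
    match = RANGE_RE.match(spec)
    if match is None:
        raise ValueError(f"not a range specification: {spec!r}")
    field = match.group(1)
    start, end, step = (Decimal(g) for g in match.groups()[1:])
    if step <= 0 or end <= start:
        raise ValueError("step must be positive and end must exceed start")
    buckets = []
    lower = start
    while lower < end:
        # Assumption: the last bucket is clipped to the upper bound.
        upper = min(lower + step, end)
        buckets.append((float(lower), float(upper)))
        lower = upper
    return field, buckets


field, buckets = range_buckets("gerp[0..5]:0.2")
print(field, len(buckets), buckets[0], buckets[-1])
```

For the example above this produces 25 buckets of width 0.2 over the gerp field, from 0 to 5.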


Aggregation functions

Aggregation functions, also called facet functions, analytic functions, or metrics, compute a summary value over a domain (each facet bucket).

...

| Aggregation function | Description | Example |
|---|---|---|
| avg | Average of numeric values | avg(gerp) |
| min | Minimum value | min(sift) |
| max | Maximum value | max(caddScaled) |
| unique | Number of unique values | unique(biotypes) |
| hll | Distributed cardinality estimate via the HyperLogLog algorithm | hll(type) |
| percentile | Percentile estimates via the t-digest algorithm. Calculates the 1st, 10th, 25th, 50th, 75th, 90th and 99th percentiles. | percentile(gerp) |
| sumsq | Sum of squares of field or function | sumsq(caddRaw) |
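The exact aggregations in the table can be reproduced locally for a small list of values, which may help when checking results by hand. This is only a sketch in plain Python: the `aggregate` helper and the sample scores are hypothetical, and hll and percentile are left out because they rely on approximate algorithms (HyperLogLog and t-digest) that are not reimplemented here.

```python
def aggregate(values):
    """Compute the exact aggregations from the table for a list of numbers."""
    if not values:
        raise ValueError("cannot aggregate an empty list")
    return {
        "avg": sum(values) / len(values),     # average of numeric values
        "min": min(values),                   # minimum value
        "max": max(values),                   # maximum value
        "unique": len(set(values)),           # number of unique values
        "sumsq": sum(v * v for v in values),  # sum of squares
    }


# Hypothetical scores for the documents in one bucket.
scores = [1.0, 2.0, 2.0, 5.0]
print(aggregate(scores))
```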


Nested facets

Nested facets allow users to nest bucketing terms, ranges or aggregations. To specify nested facets, use the >> symbol between levels.
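The sketch below shows one way to read a facet string that combines semicolon-separated facets with >> nesting, and what a two-level nested count looks like. Both helpers and the sample documents are hypothetical; they illustrate the idea and do not reflect how OpenCGA or Solr implement it.

```python
from collections import Counter, defaultdict


def parse_facet_query(query):
    """Split on ';' into facets, then on '>>' into nesting levels."""
    return [
        [level.strip() for level in facet.split(">>")]
        for facet in query.split(";")
        if facet.strip()
    ]


def nested_counts(documents, outer, inner):
    """Count inner-field values inside each outer-field bucket."""
    result = defaultdict(Counter)
    for doc in documents:
        result[doc[outer]][doc[inner]] += 1
    return {value: dict(counts) for value, counts in result.items()}


# Hypothetical documents; field names and values are illustrative only.
docs = [
    {"chromosome": "1", "type": "SNV"},
    {"chromosome": "1", "type": "INDEL"},
    {"chromosome": "2", "type": "SNV"},
    {"chromosome": "1", "type": "SNV"},
]

print(parse_facet_query("chromosome>>type;percentile(gerp)"))
print(nested_counts(docs, "chromosome", "type"))
```

In the parsed result, each inner list is one facet and its elements are the nesting levels from outermost to innermost.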

...