Pre-requisites

To follow this guide, you need BioNetDB installed on your system. Please follow the steps in the installation guide to set it up.

Download test data

Download the test data from http://bioinfo.hpc.cam.ac.uk/downloads/bionetdb/bionetdb.dataset.tar.gz and extract the contents of the archive by running:

 tar xvfz bionetdb.dataset.tar.gz

The content of the archive is:

/tmp$ tar xvfz bionetdb.dataset.tar.gz 
bionetdb.dataset/
bionetdb.dataset/illumina_platinum.export.5k.json
bionetdb.dataset/mirna.csv
bionetdb.dataset/genes.json.gz
bionetdb.dataset/proteins.json.gz
bionetdb.dataset/illumina_platinum.export.5k.json.meta.json
bionetdb.dataset/Homo_sapiens.owl
bionetdb.dataset/10k.clinvar.json.gz


/tmp$ cd bionetdb.dataset/


/tmp/bionetdb.dataset$ ls -ltrh
total 475M
-rw-rw-r-- 1 jtarraga jtarraga  38M Jun 26 13:39 proteins.json.gz
-rw-rw-r-- 1 jtarraga jtarraga  78M Jun 26 13:39 genes.json.gz
-rw-rw-r-- 1 jtarraga jtarraga 1.2M Jun 26 13:39 mirna.csv
-rw-rw-r-- 1 jtarraga jtarraga  53K Jun 26 13:39 illumina_platinum.export.5k.json.meta.json
-rw-rw-r-- 1 jtarraga jtarraga  56M Jun 26 13:39 illumina_platinum.export.5k.json
-rw-rw-r-- 1 jtarraga jtarraga 215M Jun 26 13:39 Homo_sapiens.owl
-rw-rw-r-- 1 jtarraga jtarraga  89M Jun 26 13:39 10k.clinvar.json.gz
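
Before moving on to the import, it can be useful to confirm that the extraction produced every file shown in the listing above. The following is a minimal sketch; check_dataset is a hypothetical helper written for this guide, not a BioNetDB command, and the file names are taken from the archive listing.

```shell
#!/bin/sh
# check_dataset is a hypothetical helper, not part of BioNetDB.
# It checks that the given directory holds every file from the archive listing.
check_dataset() {
    dir="$1"
    missing=0
    for f in \
        illumina_platinum.export.5k.json \
        illumina_platinum.export.5k.json.meta.json \
        mirna.csv \
        genes.json.gz \
        proteins.json.gz \
        Homo_sapiens.owl \
        10k.clinvar.json.gz
    do
        if [ ! -f "$dir/$f" ]; then
            # Report each missing file on stderr and remember the failure.
            echo "missing: $f" >&2
            missing=1
        fi
    done
    # Exit status 0 when all files are present, 1 otherwise.
    return "$missing"
}
```

For example, after extracting in /tmp you could run: check_dataset /tmp/bionetdb.dataset && echo "dataset complete"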

Import genomic data

Before you can query the BioNetDB database, you have to populate it by importing your data into the Neo4j database. BioNetDB provides a command-line interface for this. The import takes two steps: first you prepare your data, and then you load it into the BioNetDB database.

Prepare your data, i.e., transform your genomic data files into Neo4j CSV files:

Create CSV files

./bionetdb.sh import -i <input-directory> -o <output-csv-directory> --create-csv-files

Load the created Neo4j CSV files into the database:

Load CSV files

./bionetdb.sh import -i <csv-directory>
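
The two steps can be chained so that the load only runs if the CSV creation succeeded. The sketch below uses only the bionetdb.sh options shown above; the names import_dataset and BIONETDB_SH are illustrative, and creating the output directory beforehand is an assumption (it may not be required).

```shell
#!/bin/sh
# BIONETDB_SH and import_dataset are illustrative names, not part of BioNetDB.
# BIONETDB_SH points at the launcher script; it defaults to the current directory.
BIONETDB_SH="${BIONETDB_SH:-./bionetdb.sh}"

import_dataset() {
    input_dir="$1"   # directory with the genomic data files
    csv_dir="$2"     # directory where the Neo4j CSV files are written

    # Assumption: make sure the output directory exists before the first step.
    mkdir -p "$csv_dir" || return 1

    # Step 1: transform the genomic data files into Neo4j CSV files.
    "$BIONETDB_SH" import -i "$input_dir" -o "$csv_dir" --create-csv-files || return 1

    # Step 2: load the created CSV files into the database.
    "$BIONETDB_SH" import -i "$csv_dir"
}
```

With the test data extracted in /tmp, a call might look like: import_dataset /tmp/bionetdb.dataset /tmp/bionetdb.csv (the CSV directory name here is just an example).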

Accessing BioNetDB from the Neo4j browser interface

You can access your BioNetDB database from the Neo4j browser interface. Open your web browser and go to http://localhost:7474.

Now that you can access the BioNetDB database, you can start working with your imported data using the Cypher query language. For a Cypher tutorial, please refer to Intro to Cypher by the Neo4j Team.

For example, here are some Cypher queries against the BioNetDB data model:

match (n:TRANSCRIPT) return n.id, n.name, n.biotype, n.chromosome, n.start, n.end, n.annotationFlags limit 10

n.id	n.name	n.biotype	n.chromosome	n.start	n.end	n.annotationFlags
"ENST00000553557"	"TSPYL2-003"	"retained_intron"	"X"	"53111549"	"53115595"	"-"
"ENST00000375442"	"TSPYL2-001"	"protein_coding"	"X"	"53111549"	"53117722"	"CCDS;basic"
"ENST00000579390"	"TSPYL2-005"	"protein_coding"	"X"	"53111563"	"53115300"	"mRNA_end_NF;cds_end_NF"
"ENST00000578306"	"TSPYL2-006"	"nonsense_mediated_decay"	"X"	"53112175"	"53115021"	"cds_start_NF;mRNA_start_NF"
"ENST00000556808"	"TSPYL2-004"	"retained_intron"	"X"	"53112305"	"53117721"	"-"
"ENST00000463525"	"TSPYL2-002"	"retained_intron"	"X"	"53113881"	"53115125"	"-"
"ENST00000314888"	"TLN1-001"	"protein_coding"	"9"	"35696945"	"35732392"	"CCDS;basic"
"ENST00000540444"	"TLN1-201"	"protein_coding"	"9"	"35697334"	"35732392"	"basic"
"ENST00000489255"	"TLN1-003"	"processed_transcript"	"9"	"35698041"	"35699325"	"-"
"ENST00000464379"	"TLN1-005"	"processed_transcript"	"9"	"35703556"	"35707871"	"-"

match (n:VARIANT) return count(n)

count(n)
9010279
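
The same kind of count query can also be run from a terminal instead of the browser. The sketch below assumes that Neo4j's cypher-shell client is installed and on your PATH, and that the database accepts the given credentials; count_nodes, CYPHER_SHELL, NEO4J_USER and NEO4J_PASSWORD are illustrative names, not part of BioNetDB.

```shell
#!/bin/sh
# CYPHER_SHELL, NEO4J_USER, NEO4J_PASSWORD and count_nodes are illustrative names.
# Assumption: Neo4j's cypher-shell client is available on the PATH.
CYPHER_SHELL="${CYPHER_SHELL:-cypher-shell}"

count_nodes() {
    label="$1"
    # Accept only simple identifiers, since the label is embedded in the query text.
    case "$label" in
        ''|*[!A-Za-z0-9_]*)
            echo "invalid label: $label" >&2
            return 2
            ;;
    esac
    # Send the count query for the given node label, e.g. VARIANT or TRANSCRIPT.
    "$CYPHER_SHELL" -u "${NEO4J_USER:-neo4j}" -p "$NEO4J_PASSWORD" --format plain \
        "match (n:$label) return count(n)"
}
```

For example: NEO4J_PASSWORD=yourpassword count_nodes VARIANT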
