Make all necessary imports and initialise the Python CVA client

When initialising the client, it is handy to set the log level to INFO so that you can see how each URL is built and how long each query takes.

We import some system libraries; the Pandas library, which pyark uses and which is explained later; some entities from the Genomics England models in the protocols package; and finally the pyark client.

In [1]:
import getpass
import logging
import os
import sys
import pandas as pd
from collections import defaultdict, OrderedDict
import pyark
from pyark.cva_client import CvaClient
from protocols.protocol_7_2.reports import Program, Tier, Assembly
from protocols.protocol_7_2.cva import ReportEventType

# sets logging messages so the URLs that are called get printed
logging.basicConfig(level=logging.INFO)

You need three things to initialise pyark: the CVA backend URL, your user name and your password. In this example these are loaded from environment variables.

The client obtains a token that carries your authorisation level and renews it automatically when necessary. It also retries failed requests.
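To illustrate what retrying a failed request can look like, here is a minimal sketch of a retry loop with exponential backoff. It is not pyark's implementation, whose retry policy is not described here; the helper name call_with_retries and its parameters are hypothetical.

```python
import time


def call_with_retries(func, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call func(), retrying on any exception with exponential backoff.

    Hypothetical helper for illustration only; not part of pyark.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            # Give up and re-raise once the last attempt has failed.
            if attempt == max_attempts:
                raise
            # The delay doubles after each failed attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

The sleep function is passed in as a parameter so that the waiting behaviour can be replaced when testing.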

In [2]:
# initialise CVA client and subclients
# every subclient provides access to different sets of data exposed in the API
user = os.environ.get("CVA_USER")
password = os.environ.get("CVA_PASSWORD")
url = os.environ.get("CVA_URL_BASE", "http://localhost:8090")
cva = CvaClient(url_base=url, user=user, password=password)
INFO:root:POST https://bio-test-cva.gel.zone/cva/api/0/authentication?
INFO:root:Response time : 16 ms
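If the environment variables are not set, user and password will be None. One possible way to handle this, sketched here as an assumption rather than something this notebook does, is to fall back to an interactive prompt using the getpass module imported in the first cell. The helper name read_setting is hypothetical.

```python
import getpass
import os


def read_setting(name, secret=False, env=None, prompt=None):
    """Return the value of an environment variable, prompting if it is unset.

    Hypothetical helper for illustration only; not part of pyark.
    """
    env = os.environ if env is None else env
    value = env.get(name)
    if value:
        return value
    # Fall back to an interactive prompt; getpass hides the typed input.
    if prompt is None:
        prompt = getpass.getpass if secret else input
    return prompt("{}: ".format(name))


# Usage with the variable names from the cell above:
# user = read_setting("CVA_USER")
# password = read_setting("CVA_PASSWORD", secret=True)
```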

Once the token is obtained, a number of subclients become available, each providing access to a different CVA entity or functionality.

In [3]:
cases_client = cva.cases()
pedigrees_client = cva.pedigrees()
entities_client = cva.entities()
variants_client = cva.variants()
report_events_client = cva.report_events()
transactions_client = cva.transactions()

Check the version of your client as follows.

In [4]:
print("pyark version {}".format(pyark.VERSION))
pyark version 4.0.4
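If your code depends on a minimum client version, the version string can be compared numerically. The sketch below assumes a purely numeric dotted version such as the one printed above; a version with a suffix (for example a pre-release tag) would need extra handling. The helper names are hypothetical and the minimum version in the usage comment is only illustrative.

```python
def version_tuple(version):
    """Turn a dotted version string such as '4.0.4' into (4, 0, 4)."""
    return tuple(int(part) for part in version.split("."))


def is_at_least(version, minimum):
    """Return True if version is the same as or newer than minimum."""
    return version_tuple(version) >= version_tuple(minimum)


# Usage, with an illustrative minimum version:
# assert is_at_least(pyark.VERSION, "4.0.0")
```

Comparing tuples of integers avoids the pitfall of comparing strings, where "4.10.0" would sort before "4.9.0".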

Count the number of primary elements in CVA

As the simplest usage example we can count the number of entities in CVA.

In [5]:
# we can count the total number of cases
cases_client.count()
INFO:root:GET https://bio-test-cva.gel.zone/cva/api/0/cases?count=True
INFO:root:Response time : 766 ms
Out[5]:
30601
In [6]:
# or we can count the number of cases given some criteria
cases_client.count(program=Program.rare_disease, panelNames='intellectual disability')
INFO:root:GET https://bio-test-cva.gel.zone/cva/api/0/cases?program=rare_disease&panelNames=intellectual disability&count=True
INFO:root:Response time : 93 ms
Out[6]:
9779
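The logged URL shows how the keyword arguments passed to count() end up as query parameters. The sketch below reproduces that mapping with the standard library only. It is not how pyark builds its requests internally, and the function name build_count_url is hypothetical.

```python
from urllib.parse import urlencode


def build_count_url(url_base, endpoint, **filters):
    """Build a count URL from a base URL, an endpoint and filter criteria.

    Hypothetical helper for illustration only; not part of pyark.
    """
    params = dict(filters)
    # The count flag is appended after the filters, as in the log above.
    params["count"] = True
    return "{}/{}?{}".format(url_base.rstrip("/"), endpoint, urlencode(params))


url = build_count_url(
    "https://bio-test-cva.gel.zone/cva/api/0",
    "cases",
    program="rare_disease",
    panelNames="intellectual disability",
)
print(url)
```

Note that urlencode escapes the space in the panel name as a plus sign, whereas the log line prints it unescaped.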
In [7]:
# count the total number of report events
report_events_client.count()
INFO:root:GET https://bio-test-cva.gel.zone/cva/api/0/report-events?count=True
INFO:root:Response time : 9718 ms
Out[7]:
36291592
In [8]:
# count the number of report events given some criteria
report_events_client.count(program=Program.rare_disease, type="questionnaire")
INFO:root:GET https://bio-test-cva.gel.zone/cva/api/0/report-events?program=rare_disease&type=questionnaire&count=True
INFO:root:Response time : 30 ms
Out[8]:
3088
In [9]:
# count the total number of variants
variants_client.count()
INFO:root:GET https://bio-test-cva.gel.zone/cva/api/0/variants?count=True
INFO:root:Response time : 869 ms
Out[9]:
3263870
In [10]:
# count the number of variants given some criteria
variants_client.count(assembly=Assembly.GRCh38, geneSymbols="BRCA2")
INFO:root:GET https://bio-test-cva.gel.zone/cva/api/0/variants?count=True&assembly=GRCh38&geneSymbols=BRCA2
INFO:root:Response time : 32 ms
Out[10]:
863
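To put the filtered counts in context, it can help to express each one as a share of the corresponding total. The sketch below does this with the numbers returned in the cells above, copied in by hand; in a live session you would use the return values of the count() calls instead. The helper name filtered_share is hypothetical.

```python
# Counts copied from the outputs of cells 5 to 10 above.
counts = {
    "cases": {"total": 30601, "filtered": 9779},
    "report events": {"total": 36291592, "filtered": 3088},
    "variants": {"total": 3263870, "filtered": 863},
}


def filtered_share(total, filtered):
    """Return the filtered count as a percentage of the total."""
    return 100.0 * filtered / total


for name, values in counts.items():
    share = filtered_share(values["total"], values["filtered"])
    print("{}: {} of {} ({:.3f}%)".format(
        name, values["filtered"], values["total"], share))
```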