The Clinical Variant Ark (CVA), https://github.com/genomicsengland/clinical_variant_ark, is a knowledge base for clinically relevant variants and their association with phenotypes, with fine-grained detail for every stage of interpretation, from automated variant prioritisation to manual classification. CVA holds the interpretation results of the 100K Genomes Project and, in the future, will hold the results of the National Genomic Informatics System (NGIS).
There are many motivations for such a knowledge base:
The above questions are answered by providing the following features:
The CVA ecosystem consists of a Java backend, backed by a MongoDB database, that exposes a REST API. A user can access CVA in three ways: 1) directly through the REST API, 2) through the CVA portal from a browser, or 3) through the Python client, pyark. The rest of this documentation focuses on accessing CVA with pyark, although it should also help in understanding what can be done through the REST API.
CVA is a component of the Genomics England interpretation platform. The components of this platform share a lingua franca described in the Genomics England data models, https://gelreportmodels.genomicsengland.co.uk/. The data models documentation is a useful reference for the data served by CVA.
CVA relies on three external systems:
The interpretation API sends four pieces of data to CVA:
Pyark is a Python client for the Clinical Variant Ark (CVA) REST API. It aims to provide easy and flexible access to the CVA dataset and to enable an analytical framework. The CVA portal covers one particular use case, helping to solve cases using the existing knowledge in CVA; the scope of pyark is broader and more open-ended, so this guide does not cover everything that can be done with it.
For a complete guide to the CVA REST API, please refer to the Swagger documentation available at http://your.cva.server/cva/docs.
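As a rough illustration of the first access route (talking to the REST API directly, without pyark), the sketch below builds the documentation URL mentioned above using only the Python standard library. The host `http://your.cva.server` is the placeholder used in this documentation, and the helper names `cva_url` and `fetch` are hypothetical, introduced here for illustration only; they are not part of CVA or pyark. No endpoint other than `/cva/docs` is assumed.

```python
from urllib.parse import urljoin
from urllib.request import Request, urlopen

# Placeholder host taken from this documentation; replace it with your own server.
CVA_SERVER = "http://your.cva.server"


def cva_url(server: str, *segments: str) -> str:
    """Join a server base URL and path segments into one URL (hypothetical helper)."""
    # Drop empty segments and stray slashes so the result has single separators.
    path = "/".join(s.strip("/") for s in segments if s.strip("/"))
    return urljoin(server.rstrip("/") + "/", path)


def fetch(url: str, timeout: float = 10.0) -> bytes:
    """Send a GET request and return the raw response body (hypothetical helper).

    This needs network access to a real CVA server, so it is not called here.
    """
    request = Request(url, headers={"Accept": "text/html,application/json"})
    with urlopen(request, timeout=timeout) as response:
        return response.read()


docs_url = cva_url(CVA_SERVER, "cva", "docs")
print(docs_url)
```

With a reachable server, `fetch(docs_url)` would return the raw documentation page. Authentication, if your deployment requires it, is not covered by this sketch.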