In [1]:
%run initialise_pyark.py
POST https://bio-test-cva.gel.zone/cva/api/0/authentication?
Response time : 30 ms
pyark version 4.1.2
In [2]:
# fetch a case
case = next(cases_client.get_cases(program=Program.rare_disease, max_results=1))
GET https://bio-test-cva.gel.zone/cva/api/0/cases?program=rare_disease&include=__all
Response time : 194 ms

Fetching the pedigree of a case

The pedigrees for rare disease cases can be fetched using the case identifier and version. All relevant information in the pedigree is reshaped into the case entity.

In [3]:
ped = cases_client.get_pedigree(
    identifier=case.get('identifier'), version=case.get('version'), as_data_frame=True)
GET https://bio-test-cva.gel.zone/cva/api/0/pedigrees/2168/2?
Response time : 4 ms

Fetching the clinical report of a case

The clinical reports for any case can be fetched using the case identifier and version. All relevant information in the clinical report is reshaped into the case entity and the report events.

In [4]:
cr = cases_client.get_clinical_report(
    identifier=case.get('identifier'), version=case.get('version'), as_data_frame=True)
GET https://bio-test-cva.gel.zone/cva/api/0/clinical-reports/2168/2?
Response time : 27 ms

Fetching the exit questionnaires of a case

The exit questionnaires for any rare disease case can be fetched using the case identifier and version. All relevant information in the exit questionnaire is reshaped into the case entity and the report events.

In [5]:
eq = cases_client.get_rd_exit_questionnaire(
    identifier=case.get('identifier'), version=case.get('version'), as_data_frame=True)
GET https://bio-test-cva.gel.zone/cva/api/0/rare-disease-exit-questionnaires/2168/2?
Response time : 17 ms
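
The three requests logged above share one URL shape: an entity path followed by the case identifier and the case version. A minimal sketch of that pattern, using only the standard library, is shown below. The names `BASE_URL`, `ENTITY_PATHS` and `entity_url` are hypothetical and are not part of pyark; the base URL and path segments are copied from the request log.

```python
from urllib.parse import quote

# Base URL copied from the request log above.
BASE_URL = "https://bio-test-cva.gel.zone/cva/api/0"

# Path segments as they appear in the logged GET requests.
ENTITY_PATHS = {
    "pedigree": "pedigrees",
    "clinical_report": "clinical-reports",
    "rd_exit_questionnaire": "rare-disease-exit-questionnaires",
}


def entity_url(entity, identifier, version):
    """Build the URL of a per-case entity (hypothetical helper, not pyark)."""
    path = ENTITY_PATHS[entity]
    # Percent-encode both parts so unusual identifiers cannot break the path.
    return "{}/{}/{}/{}".format(
        BASE_URL, path, quote(str(identifier), safe=""), quote(str(version), safe=""))


print(entity_url("pedigree", "2168", 2))
print(entity_url("clinical_report", "2168", 2))
```

In the notebook itself the client builds these URLs for us; the sketch only makes explicit that every secondary entity of a case is addressed by the pair (identifier, version).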

Other secondary entities

Secondary entities are mostly used to filter the main entities. CVA provides endpoints that support autocomplete features and return basic summaries of the distribution of these entities across different cohorts of cases.

Some of these are:

  • Panels of genes. A gene panel is a list of genes of specific interest for a given condition. Although each family may be analysed against multiple panels, our family analyses are panel-centric.
  • Disorders. The clinical indication reported with the clinical data.
  • Organisations. The owner of a given case.
  • Genes. The mapping between Ensembl identifiers and HGNC gene symbols.
  • Phenotypes. The HPO terms including identifier, name and synonyms (TODO)

Panels

Our source of panel data is PanelApp. However, CVA neither queries PanelApp nor receives data directly from it: everything CVA knows about panels is aggregated from the case data it receives.

In [6]:
# fetch the list of unique panel names
all_panels = entities_client.get_all_panels()
all_panels[0:5]
GET https://bio-test-cva.gel.zone/cva/api/0/panels?consider_versions=False
Response time : 664 ms
Out[6]:
intellectual disability                    intellectual disability
mitochondrial disorders                    mitochondrial disorders
undiagnosed metabolic disorders    undiagnosed metabolic disorders
epileptic encephalopathy                  epileptic encephalopathy
hereditary ataxia                                hereditary ataxia
dtype: object

We can fetch a summary of panels which gives the count of cases where each panel was applied.

In [7]:
# fetch a summary of panels across cases
entities_client.get_panels_summary(as_data_frame=True).head()
GET https://bio-test-cva.gel.zone/cva/api/0/panels?
Response time : 636 ms
Out[7]:
   countCases  panel.panelIdentifier  panel.panelName                  panel.panelVersion  panel.source
0  9779        None                   intellectual disability          None                None
1  5999        None                   mitochondrial disorders          None                None
2  4963        None                   undiagnosed metabolic disorders  None                None
3  2411        None                   epileptic encephalopathy         None                None
4  2192        None                   hereditary ataxia                None                None

We can fetch the summary of panels on a given cohort of cases. All filters for case cohort selection apply here.

In [8]:
entities_client.get_panels_summary(as_data_frame=True, hasPositiveDx=True).head()
GET https://bio-test-cva.gel.zone/cva/api/0/panels?hasPositiveDx=True
Response time : 184 ms
Out[8]:
   countCases  panel.panelIdentifier  panel.panelName                  panel.panelVersion  panel.source
0  517         None                   intellectual disability          None                None
1  259         None                   mitochondrial disorders          None                None
2  199         None                   undiagnosed metabolic disorders  None                None
3  179         None                   posterior segment abnormalities  None                None
4  137         None                   cystic kidney disease            None                None

We can disaggregate this information by panel version using the parameter considerVersions=True.

In [9]:
entities_client.get_panels_summary(as_data_frame=True, hasPositiveDx=True, considerVersions=True).head()
GET https://bio-test-cva.gel.zone/cva/api/0/panels?hasPositiveDx=True&considerVersions=True
Response time : 200 ms
Out[9]:
   countCases  panel.panelIdentifier  panel.panelName                  panel.panelVersion  panel.source
0  82          None                   intellectual disability          1.436               None
1  69          None                   intellectual disability          1.158               None
2  65          None                   mitochondrial disorders          1.66                None
3  58          None                   intellectual disability          1.2                 None
4  54          None                   undiagnosed metabolic disorders  1.72                None
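
Per-version counts can be collapsed back into per-panel counts on the client side. The sketch below does this with the five rows shown in Out[9]; the helper name `collapse_versions` is hypothetical. Because it only sees the first five rows, its totals are lower than the per-panel counts in Out[8].

```python
from collections import Counter

# (countCases, panelName, panelVersion) rows copied from Out[9] above.
rows = [
    (82, "intellectual disability", "1.436"),
    (69, "intellectual disability", "1.158"),
    (65, "mitochondrial disorders", "1.66"),
    (58, "intellectual disability", "1.2"),
    (54, "undiagnosed metabolic disorders", "1.72"),
]


def collapse_versions(rows):
    """Sum case counts over all versions of each panel (hypothetical helper)."""
    totals = Counter()
    for count, name, _version in rows:
        totals[name] += count
    return totals


totals = collapse_versions(rows)
for name, count in totals.most_common():
    print(count, name)
```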

We can search panels by regex.

In [15]:
entities_client.get_panels_by_regex(regex="dystrophy", as_data_frame=True).head()
GET https://bio-test-cva.gel.zone/cva/api/0/panels/search?regex=dystrophy
Response time : 422 ms
Out[15]:
   panelIdentifier           panelName                                          panelVersion  source
0  55b7a65322c1fc05fc7a1869  limb girdle muscular dystrophy                     1.0           panelapp
1  None                      limb girdle muscular dystrophy                     1.0           panelapp
2  55b2109c22c1fc7dd7ce411f  insulin resistance (including lipodystrophy)       1.9           panelapp
3  5811a8738f620323c5766a2b  xeroderma pigmentosum, trichothiodystrophy or ...  1.6           panelApp
4  None                      limb girdle muscular dystrophy                     1.2           PanelApp
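
The search results above contain near-duplicates: the same panel name and version can appear with and without an identifier, and the source is spelled with different capitalisation. A minimal sketch of how a client could tidy such results before showing them in an autocomplete list follows. The helper `deduplicate_panels` is hypothetical, and the row whose name is truncated in the output is left out rather than guessed.

```python
# (panelIdentifier, panelName, panelVersion, source) rows copied from Out[15].
rows = [
    ("55b7a65322c1fc05fc7a1869", "limb girdle muscular dystrophy", "1.0", "panelapp"),
    (None, "limb girdle muscular dystrophy", "1.0", "panelapp"),
    ("55b2109c22c1fc7dd7ce411f", "insulin resistance (including lipodystrophy)", "1.9", "panelapp"),
    (None, "limb girdle muscular dystrophy", "1.2", "PanelApp"),
]


def deduplicate_panels(rows):
    """Merge rows that share name, version and case-insensitive source."""
    best = {}
    for identifier, name, version, source in rows:
        key = (name, version, source.lower())
        # Keep the first row seen, unless a later row supplies a missing identifier.
        if key not in best or (best[key] is None and identifier is not None):
            best[key] = identifier
    return [(identifier, name, version, source)
            for (name, version, source), identifier in best.items()]


for panel in deduplicate_panels(rows):
    print(panel)
```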

Clinical indications

Clinical indications are defined in a hierarchy of three levels: disease group, disease subgroup and specific disease. CVA holds the clinical indications provided with the clinical data of each case. Cancer differs slightly, as its hierarchy has only two levels, but both are stored in the same data structure.

We can fetch all disease groups.

In [32]:
entities_client.get_all_disease_groups()[0:5]
GET https://bio-test-cva.gel.zone/cva/api/0/disorders?
Response time : 455 ms
Out[32]:
infantile enterocolitis monogenic inflammatory bowel disease                  infantile enterocolitis monogenic inflammatory...
congenital hypothyroidism or thyroid agenesis dilated cardiomyopathy (dcm)    congenital hypothyroidism or thyroid agenesis ...
intellectual disability                                                                                 intellectual disability
familial cerebral small vessel disease                                                   familial cerebral small vessel disease
familial breast cancer multiple endocrine tumours                             familial breast cancer multiple endocrine tumours
dtype: object

The same list can be restricted to any selected cohort of cases.

In [38]:
entities_client.get_all_disease_groups(filter="countParticipants gt 3")[0:5]
GET https://bio-test-cva.gel.zone/cva/api/0/disorders?filter=countParticipants gt 3
Response time : 186 ms
Out[38]:
cakut                                                                                        cakut
na                                                                                              na
intellectual disability                                                    intellectual disability
congenital hearing impairment (profound/severe)    congenital hearing impairment (profound/severe)
familial thoracic aortic aneurysm disease                familial thoracic aortic aneurysm disease
dtype: object
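
The filter expression `countParticipants gt 3` contains spaces, which the request log prints unencoded. As an illustration only, the standard library offers two common encodings for such a value; the source does not say which one pyark applies or which the server expects.

```python
from urllib.parse import quote, urlencode

params = {"filter": "countParticipants gt 3"}

# Form-style encoding: spaces become "+".
plus_encoded = urlencode(params)

# Percent-encoding: spaces become "%20".
percent_encoded = urlencode(params, quote_via=quote)

print(plus_encoded)
print(percent_encoded)
```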

We can also fetch disease subgroups.

In [40]:
entities_client.get_all_disease_subgroups(filter="countParticipants gt 3")[0:5]
GET https://bio-test-cva.gel.zone/cva/api/0/disorders?filter=countParticipants gt 3
Response time : 183 ms
Out[40]:
motor disorders of the cns      motor disorders of the cns
intellectual disability            intellectual disability
fetal disorders                            fetal disorders
dysmorphic disorders                  dysmorphic disorders
lysosomal storage disorders    lysosomal storage disorders
dtype: object

And specific diseases.

In [39]:
entities_client.get_all_specific_diseases(filter="countParticipants gt 3")[0:5]
GET https://bio-test-cva.gel.zone/cva/api/0/disorders?filter=countParticipants gt 3
Response time : 182 ms
Out[39]:
familial hemifacial microsomia                                                     familial hemifacial microsomia
unexplained monogenic fetal disorders                                       unexplained monogenic fetal disorders
infantile enterocolitis monogenic inflammatory bowel disease    infantile enterocolitis monogenic inflammatory...
non syndromic hypotrichosis                                                           non syndromic hypotrichosis
classical beckwith wiedemann syndrome                                       classical beckwith wiedemann syndrome
dtype: object

We can also fetch summaries of disorders across a given cohort of cases.

In [42]:
entities_client.get_disorders_summary(hasPositiveDx=True, as_data_frame=True).head()
GET https://bio-test-cva.gel.zone/cva/api/0/disorders?hasPositiveDx=True
Response time : 197 ms
Out[42]:
   countCases  disorder.ageOfOnset  disorder.diseaseGroup                       disorder.diseaseSubGroup                    disorder.specificDisease
0  265         None                 neurology and neurodevelopmental disorders  neurodevelopmental disorders                intellectual disability
1  121         None                 renal and urinary tract disorders           structural renal and urinary tract disease  cystic kidney disease
2  83          None                 None                                        None                                        intellectual disability
3  45          None                 ophthalmological disorders                  posterior segment abnormalities             rod cone dystrophy
4  43          None                 None                                        None                                        rod cone dystrophy
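
The summary rows map naturally onto the three-level hierarchy described at the start of this section. The sketch below nests the rows of Out[42] as group, subgroup and specific disease, and collects separately the rows that arrive without a group or subgroup. The helper `build_hierarchy` is hypothetical and is not part of pyark.

```python
# (countCases, diseaseGroup, diseaseSubGroup, specificDisease) from Out[42].
rows = [
    (265, "neurology and neurodevelopmental disorders",
     "neurodevelopmental disorders", "intellectual disability"),
    (121, "renal and urinary tract disorders",
     "structural renal and urinary tract disease", "cystic kidney disease"),
    (83, None, None, "intellectual disability"),
    (45, "ophthalmological disorders",
     "posterior segment abnormalities", "rod cone dystrophy"),
    (43, None, None, "rod cone dystrophy"),
]


def build_hierarchy(rows):
    """Nest counts as group -> subgroup -> disease; keep incomplete rows apart."""
    tree = {}
    unclassified = {}
    for count, group, subgroup, disease in rows:
        if group is None or subgroup is None:
            unclassified[disease] = unclassified.get(disease, 0) + count
            continue
        tree.setdefault(group, {}).setdefault(subgroup, {})[disease] = count
    return tree, unclassified


tree, unclassified = build_hierarchy(rows)
print(tree["ophthalmological disorders"])
print(unclassified)
```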

To support autocomplete, disorders can also be searched by regular expression.

In [46]:
entities_client.get_disorders_by_regex(regex="intellect", as_data_frame=True).head()
GET https://bio-test-cva.gel.zone/cva/api/0/disorders/search?regex=intellect
Response time : 414 ms
Out[46]:
   ageOfOnset  diseaseGroup                                diseaseSubGroup               specificDisease
0  0.000       endocrine disorders                         rare subtypes of diabetes     intellectual disability
1  0.000       gastroenterological disorders               liver disease                 intellectual disability
2  0.000       gastroenterological disorders               gastrointestinal disorders    intellectual disability
3  1.600       neurology and neurodevelopmental disorders  neurodevelopmental disorders  intellectual disability
4  1.200       neurology and neurodevelopmental disorders  neurodevelopmental disorders  intellectual disability

Organisations

Organisations are the owners of cases. In the 100K Genomes Project these are the different Genomic Medicine Centres (GMCs).

We can obtain a summary of organisations across a cohort of cases.

In [55]:
next(entities_client.get_organisations(hasPositiveDx=True, as_data_frame=True)).head()
GET https://bio-test-cva.gel.zone/cva/api/0/organisations?hasPositiveDx=True
Response time : 195 ms
Out[55]:
        countCases  organisation.gmc             organisation.ods  organisation.site
_index
0       183         North Thames                 RP4               Great Ormond Hospital for Children NHS FT
1       109         Northeast and North Cumbria  NCL               Pilot: Newcastle
2       89          Greater Manchester           RW3               Central Manchester Uni Hospital NHS FT
3       86          Greater Manchester           MAN               Pilot: Manchester
4       74          Southwest Peninsula          RH8               Royal Devon and Exeter NHS FT
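
The call above is wrapped in `next(...)`, which suggests that the client returns an iterator and that only its first item is consumed here. Assuming, without confirmation from the source, that each item is one page of results, the pattern can be sketched with a plain generator. The function `paginate` and the page size are hypothetical; the rows are copied from Out[55].

```python
def paginate(records, page_size):
    """Yield successive pages (lists) of at most page_size records."""
    for start in range(0, len(records), page_size):
        yield records[start:start + page_size]


# (countCases, gmc, ods) rows copied from Out[55] above.
organisations = [
    (183, "North Thames", "RP4"),
    (109, "Northeast and North Cumbria", "NCL"),
    (89, "Greater Manchester", "RW3"),
    (86, "Greater Manchester", "MAN"),
    (74, "Southwest Peninsula", "RH8"),
]

pages = paginate(organisations, page_size=2)

# next() consumes only the first page, as in the notebook cell.
first_page = next(pages)

# Iterating further drains the remaining pages.
remaining = [row for page in pages for row in page]

print(first_page)
print(len(remaining))
```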

Genes

We store the Cellbase annotations for every gene reported in any case in CVA. The Cellbase annotations include the Ensembl gene identifier, the HGNC gene symbol and cross references to several resources (e.g. UniProt, InterPro, Gene Ontology).

We can fetch the distribution of genes affected by a potential LoF variant across cases in any given cohort.

In [18]:
entities_client.get_genes_summary(hasPositiveDx=True, as_data_frame=True).head()
GET https://bio-test-cva.gel.zone/cva/api/0/genes?hasPositiveDx=True
Response time : 192 ms
Out[18]:
countCases ensemblId
0 48 ENSG00000008710
1 13 ENSG00000197912
2 9 ENSG00000123066
3 9 ENSG00000273079
4 9 ENSG00000108821

We can search genes by gene symbol, by cross reference, or by a regular expression over either. This endpoint may be useful for autocomplete purposes.

In [24]:
entities_client.get_genes(geneSymbols="BRCA2", as_data_frame=True).head()
GET https://bio-test-cva.gel.zone/cva/api/0/genes/search?geneSymbols=BRCA2
Response time : 6 ms
Out[24]:
ensemblId geneSymbol otherIds type
0 ENSG00000139618 BRCA2 [] gene
In [20]:
entities_client.get_genes(geneSymbolRegex="^BRC", as_data_frame=True).head()
GET https://bio-test-cva.gel.zone/cva/api/0/genes/search?geneSymbolRegex=^BRC&hasPositiveDx=True
Response time : 8 ms
Out[20]:
ensemblId geneSymbol otherIds type
0 ENSG00000012048 BRCA1 [] gene
1 ENSG00000139618 BRCA2 [] gene
2 ENSG00000185515 BRCC3 [] gene
3 ENSG00000251667 BRCC3P1 [] gene
In [26]:
entities_client.get_genes(xrefs=["GO:0006351", "GO:0003700"], as_data_frame=True).head()
GET https://bio-test-cva.gel.zone/cva/api/0/genes/search?xrefs=GO:0006351&xrefs=GO:0003700
Response time : 3679 ms
Out[26]:
ensemblId geneSymbol otherIds type
0 ENSG00000174306 ZHX3 [] gene
1 ENSG00000184492 FOXD4L1 [] gene
2 ENSG00000148513 ANKRD30A [] gene
3 ENSG00000102974 CTCF [] gene
4 ENSG00000157554 ERG [] gene
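
The `geneSymbolRegex="^BRC"` query above behaves like an anchored prefix match. The sketch below reproduces that behaviour locally with Python's `re` module over the gene symbols seen in the outputs above. It assumes Python regex semantics; the regex dialect used by the server is not stated in the source, and the helper `suggest` is hypothetical.

```python
import re

# Gene symbols copied from Out[20] and Out[26] above.
symbols = ["BRCA1", "BRCA2", "BRCC3", "BRCC3P1",
           "ZHX3", "FOXD4L1", "ANKRD30A", "CTCF", "ERG"]


def suggest(pattern, symbols, limit=10):
    """Return up to `limit` symbols matching the regular expression."""
    compiled = re.compile(pattern)
    return [symbol for symbol in symbols if compiled.search(symbol)][:limit]


print(suggest("^BRC", symbols))
print(suggest(r"^BRCA\d$", symbols))
```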

Phenotypes

Phenotypes for rare disease cases are represented as Human Phenotype Ontology (HPO) terms. The whole HPO dataset is integrated into CVA, which normalises terms from older versions and allows terms to be searched by several criteria. HPO terms are enriched with their information content, computed from the disease annotations provided by HPO.

We can fetch a single phenotype by identifier.

In [48]:
entities_client.get_hpo(identifier="HP:0012345", as_data_frame=True)
GET https://bio-test-cva.gel.zone/cva/api/0/hpos/HP:0012345?
Response time : 2 ms
Out[48]:
alternativeIdentifiers definition diseases hpoBottomUpAccumulatedInformationContent hpoInformationContent hpoTopDownAccumulatedInformationContent identifier internalInformationContent isA isObsolete mainIdentifier name synonyms xrefs
0 [] An anomaly of a glycosylation process, i.e., a... [PMID:26833332, ORPHA:370921, ORPHA:370924] 8.456 10.791 6.153 HP:0012345 None [HP:0011013] False HP:0012345 Abnormal glycosylation [] [UMLS:C4022946]

We can do a semantic search over phenotype names and synonyms.

In [51]:
next(entities_client.get_hpos(search="color blind", as_data_frame=True)).head()
GET https://bio-test-cva.gel.zone/cva/api/0/hpos/search?search=color blind
Response time : 22 ms
Out[51]:
alternativeIdentifiers definition diseases hpoBottomUpAccumulatedInformationContent hpoInformationContent hpoTopDownAccumulatedInformationContent identifier internalInformationContent isA isObsolete mainIdentifier name synonyms xrefs
_index
0 [] Difficulty distinguishing between yellow and b... [OMIM:190900, OMIM:611131, OMIM:125250, OMIM:6... 10.098 10.098 6.703 HP:0000552 None [HP:0011519] False HP:0000552 Tritanomaly [Blue yellow color blindness, Blue-yellow dysc... [MSH:D003117, SNOMEDCT_US:51886007, SNOMEDCT_U...
1 [] An abnormality of the pigmentation of the muco... [ORPHA:2869, ORPHA:2907] 10.504 11.197 6.731 HP:0100669 None [HP:0011830] False HP:0100669 Abnormal pigmentation of the oral mucosa [Abnormal color of the oral mucosa, Abnormal p... [UMLS:C4020959]
2 [] [] Infinity Infinity 6.802 HP:0030584 None [HP:0000551] False HP:0030584 Color vision test abnormality [] [UMLS:C4073057]
3 [] An anomolous earwax color. Earwax (cerumen) is... [] Infinity Infinity 7.521 HP:0030790 None [HP:0030787] False HP:0030790 Abnormal cerumen color [Abnormal cerumen colour, Abnormal cerumen pig... [UMLS:C4280768]
4 [HP:0002214, HP:0002294] A lesser degree of hair pigmentation than woul... [OMIM:601375, OMIM:269000, ORPHA:280651, OMIM:... 9.000 9.000 6.809 HP:0002286 None [HP:0011358] False HP:0002286 Fair hair [Blond hair, Fair hair, Fair hair color, Flaxe... [SNOMEDCT_US:297995004, UMLS:C0239801, UMLS:C1...
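
In the results above, the terms with an empty `diseases` list report an information content of `Infinity`. That is what a common definition of information content, the negative logarithm of a term's annotation frequency, yields for a term that annotates no disease. The sketch below illustrates this definition only. The formula, the base-2 logarithm and the function name are assumptions, since the source does not state how CVA computes these values, and the sketch does not attempt to reproduce them.

```python
import math


def information_content(annotated, total):
    """Return -log2(annotated / total); infinite when nothing is annotated.

    Assumed definition for illustration, not CVA's documented formula.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if annotated == 0:
        return math.inf
    return -math.log2(annotated / total)


# A term annotated to 1 disease in 8 is more informative than one annotated to 4 in 8.
print(information_content(1, 8))
print(information_content(4, 8))
print(information_content(0, 8))
```

Under this definition rarer terms score higher, which is why information content is often used to weight phenotype matches.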

Finally, we can also search for phenotypes by cross references.

In [52]:
next(entities_client.get_hpos(xrefs="SNOMEDCT_US:51886007", as_data_frame=True)).head()
GET https://bio-test-cva.gel.zone/cva/api/0/hpos/search?xrefs=SNOMEDCT_US:51886007
Response time : 49 ms
Out[52]:
alternativeIdentifiers definition diseases hpoBottomUpAccumulatedInformationContent hpoInformationContent hpoTopDownAccumulatedInformationContent identifier internalInformationContent isA isObsolete mainIdentifier name synonyms xrefs
_index
0 [] Difficulty distinguishing between yellow and b... [OMIM:190900, OMIM:611131, OMIM:125250, OMIM:6... 10.098 10.098 6.703 HP:0000552 None [HP:0011519] False HP:0000552 Tritanomaly [Blue yellow color blindness, Blue-yellow dysc... [MSH:D003117, SNOMEDCT_US:51886007, SNOMEDCT_U...