Fetching secondary entities

1 Fetching the pedigree of a case
2 Fetching the clinical report of a case
3 Fetching the exit questionnaires of a case
4 Other secondary entities
5 Get all data about a given case

%run initialise_pyark.py

POST https://bio-test-cva.gel.zone/cva/api/0/authentication?
Response time : 30 ms

pyark version 4.1.2

# fetch a case
case = next(cases_client.get_cases(program=Program.rare_disease, max_results=1))

GET https://bio-test-cva.gel.zone/cva/api/0/cases?program=rare_disease&include=__all
Response time : 194 ms
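The next(...) call above suggests that get_cases returns an iterator rather than a list. The following is a minimal sketch of that consumption pattern in plain Python. The function fake_get_cases is a hypothetical stand-in for the client call, not part of pyark, and the case values are only borrowed from the request URLs logged in this section.

```python
from typing import Dict, Iterator, Optional


def fake_get_cases(max_results: int) -> Iterator[Dict[str, object]]:
    """Hypothetical stand-in for a client call that yields cases lazily."""
    cases = [{"identifier": "2168", "version": 2}]
    for case in cases[:max_results]:
        yield case


# next() pulls only the first element; passing a default avoids a
# StopIteration error when the query matches nothing.
first_case: Optional[Dict[str, object]] = next(fake_get_cases(max_results=1), None)
no_case = next(fake_get_cases(max_results=0), None)

print(first_case)
print(no_case)
```

Passing a default to next() is a design choice that keeps a notebook cell from failing when a cohort filter returns no cases.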

Fetching the pedigree of a case

The pedigrees for rare disease cases can be fetched using the case identifier and version. All relevant information in the pedigree is reshaped into the case entity.

ped = cases_client.get_pedigree(
    identifier=case.get('identifier'), version=case.get('version'), as_data_frame=True)

GET https://bio-test-cva.gel.zone/cva/api/0/pedigrees/2168/2?
Response time : 4 ms

Fetching the clinical report of a case

The clinical report for any case can be fetched using the case identifier and version. All relevant information in the clinical report is reshaped into the case entity and the report events.

cr = cases_client.get_clinical_report(
    identifier=case.get('identifier'), version=case.get('version'), as_data_frame=True)

GET https://bio-test-cva.gel.zone/cva/api/0/clinical-reports/2168/2?
Response time : 27 ms

Fetching the exit questionnaires of a case

The rare disease exit questionnaire for any case can be fetched using the case identifier and version. All relevant information in the exit questionnaire is reshaped into the case entity and the report events.

eq = cases_client.get_rd_exit_questionnaire(
    identifier=case.get('identifier'), version=case.get('version'), as_data_frame=True)

GET https://bio-test-cva.gel.zone/cva/api/0/rare-disease-exit-questionnaires/2168/2?
Response time : 17 ms
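The three request logs above share one URL pattern: an entity path followed by the case identifier and version. The sketch below reproduces that pattern with a hypothetical helper, secondary_entity_url, which is not part of pyark. The base URL and path segments are copied from the logs, and the dictionary keys are names chosen only for this illustration.

```python
BASE_URL = "https://bio-test-cva.gel.zone/cva/api/0"

# Path segments as they appear in the request logs of this section.
ENDPOINTS = {
    "pedigree": "pedigrees",
    "clinical_report": "clinical-reports",
    "rd_exit_questionnaire": "rare-disease-exit-questionnaires",
}


def secondary_entity_url(entity: str, identifier: str, version: int) -> str:
    """Hypothetical helper: build the URL of one secondary entity of a case."""
    return f"{BASE_URL}/{ENDPOINTS[entity]}/{identifier}/{version}"


for entity in ENDPOINTS:
    print(secondary_entity_url(entity, "2168", 2))
```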

Other secondary entities

There are secondary entities that are usually used to filter the main entities. CVA provides endpoints that support autocomplete features and give a basic summary of the distribution of these entities across different cohorts of cases.

Some of these are:

Panels of genes. A gene panel is a list of genes of specific interest for a given condition. Although each family may be analysed against multiple panels, our family analyses are panel-centric.
Disorders. The clinical indication reported with the clinical data.
Organisations. The owner of a given case.
Genes. The mapping between Ensembl identifiers and HGNC gene symbols.
Phenotypes. The HPO terms, including identifier, name and synonyms (TODO).

Panels

Our panels data source is PanelApp. However, CVA neither queries PanelApp nor receives data directly from it; all the information it holds about panels is aggregated from the case data it receives.

# fetch the list of unique panel names
all_panels = entities_client.get_all_panels()
all_panels[0:5]

GET https://bio-test-cva.gel.zone/cva/api/0/panels?consider_versions=False
Response time : 664 ms

intellectual disability                    intellectual disability
mitochondrial disorders                    mitochondrial disorders
undiagnosed metabolic disorders    undiagnosed metabolic disorders
epileptic encephalopathy                  epileptic encephalopathy
hereditary ataxia                                hereditary ataxia
dtype: object

We can fetch a summary of panels which gives the count of cases where each panel was applied.

# fetch a summary of panels across cases
entities_client.get_panels_summary(as_data_frame=True).head()

GET https://bio-test-cva.gel.zone/cva/api/0/panels?
Response time : 636 ms

We can fetch the summary of panels on a given cohort of cases. All filters for case cohort selection apply here.

entities_client.get_panels_summary(as_data_frame=True, hasPositiveDx=True).head()

GET https://bio-test-cva.gel.zone/cva/api/0/panels?hasPositiveDx=True
Response time : 184 ms

We can disaggregate this information by panel version using the parameter considerVersions=True.

entities_client.get_panels_summary(as_data_frame=True, hasPositiveDx=True, considerVersions=True).head()

GET https://bio-test-cva.gel.zone/cva/api/0/panels?hasPositiveDx=True&considerVersions=True
Response time : 200 ms
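To make the effect of considerVersions concrete, the sketch below collapses a version-level summary back to one count per panel name. The rows are the first five rows of this notebook's versioned output, so the totals are partial. Summing also assumes each case is counted under a single version of a panel, which this section does not state. This illustrates the aggregation idea only and is not how the server computes it.

```python
from collections import Counter

# (panel name, panel version, case count): first five rows of the
# versioned summary shown in this notebook's output.
versioned_rows = [
    ("intellectual disability", "1.436", 82),
    ("intellectual disability", "1.158", 69),
    ("mitochondrial disorders", "1.66", 65),
    ("intellectual disability", "1.2", 58),
    ("undiagnosed metabolic disorders", "1.72", 54),
]

# Ignore the version and accumulate counts per panel name.
by_name = Counter()
for name, _version, count in versioned_rows:
    by_name[name] += count

print(by_name.most_common())
```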

We can search panels by regex.

entities_client.get_panels_by_regex(regex="dystrophy", as_data_frame=True).head()

GET https://bio-test-cva.gel.zone/cva/api/0/panels/search?regex=dystrophy
Response time : 422 ms
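The regex search above can be pictured as a pattern matched anywhere inside each panel name, which is why a query such as "dystrophy" also returns names containing "lipodystrophy". The sketch below reproduces that behaviour locally with Python's re module. The function search_panels is hypothetical, the panel names come from this notebook's output, and case-insensitive matching is an assumption about the server.

```python
import re
from typing import List

panel_names = [
    "intellectual disability",
    "mitochondrial disorders",
    "limb girdle muscular dystrophy",
    "insulin resistance (including lipodystrophy)",
    "hereditary ataxia",
]


def search_panels(regex: str, names: List[str]) -> List[str]:
    """Hypothetical local equivalent: keep names where the pattern occurs anywhere."""
    pattern = re.compile(regex, re.IGNORECASE)  # case-insensitivity is assumed
    return [name for name in names if pattern.search(name)]


matches = search_panels("dystrophy", panel_names)
print(matches)
```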

Clinical indications

The clinical indications are defined in a hierarchy of three levels: disease group, disease subgroup and specific disease. CVA holds the clinical indications provided with the clinical data of each case. Cancer is slightly different: its hierarchy has only two levels, but they are stored in the same data structure.
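The three-level hierarchy described above can be sketched as a small record type in which any level may be missing. The class ClinicalIndication and its snake_case field names are hypothetical; the API output uses diseaseGroup, diseaseSubGroup and specificDisease. The example values are taken from this notebook's disorder summary. A two-level hierarchy such as cancer could fit the same structure by leaving one level empty, but which level is left empty is an assumption not confirmed here.

```python
from typing import List, NamedTuple, Optional


class ClinicalIndication(NamedTuple):
    """Hypothetical record holding up to three hierarchy levels."""

    disease_group: Optional[str]
    disease_subgroup: Optional[str]
    specific_disease: Optional[str]

    def levels(self) -> List[str]:
        """Return only the levels that are actually populated."""
        return [level for level in self if level is not None]


full = ClinicalIndication(
    "neurology and neurodevelopmental disorders",
    "neurodevelopmental disorders",
    "intellectual disability",
)
# Some records in the summary output carry only the specific disease.
partial = ClinicalIndication(None, None, "intellectual disability")

print(full.levels())
print(partial.levels())
```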

We can fetch all disease groups.

entities_client.get_all_disease_groups()[0:5]

GET https://bio-test-cva.gel.zone/cva/api/0/disorders?
Response time : 455 ms

infantile enterocolitis monogenic inflammatory bowel disease                  infantile enterocolitis monogenic inflammatory...
congenital hypothyroidism or thyroid agenesis dilated cardiomyopathy (dcm)    congenital hypothyroidism or thyroid agenesis ...
intellectual disability                                                                                 intellectual disability
familial cerebral small vessel disease                                                   familial cerebral small vessel disease
familial breast cancer multiple endocrine tumours                             familial breast cancer multiple endocrine tumours
dtype: object

We can get the above over any selected cohort of cases.

entities_client.get_all_disease_groups(filter="countParticipants gt 3")[0:5]

GET https://bio-test-cva.gel.zone/cva/api/0/disorders?filter=countParticipants gt 3
Response time : 186 ms

cakut                                                                                        cakut
na                                                                                              na
intellectual disability                                                    intellectual disability
congenital hearing impairment (profound/severe)    congenital hearing impairment (profound/severe)
familial thoracic aortic aneurysm disease                familial thoracic aortic aneurysm disease
dtype: object
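The filter string used above has the shape "field operator value". The sketch below shows how such an expression could be parsed and applied to local records. It is not the server's implementation: only the gt operator appears in this section, so the other operator names, the functions parse_filter and apply_filter, and the case records are all assumptions made for illustration. The exact server-side meaning of countParticipants is also not specified here, so it is treated simply as a numeric field on each case.

```python
import operator
from typing import Callable, Dict, List, Tuple

# Only "gt" appears in this section; "lt" and "eq" are assumed names.
OPERATORS: Dict[str, Callable[[float, float], bool]] = {
    "gt": operator.gt,
    "lt": operator.lt,
    "eq": operator.eq,
}


def parse_filter(expression: str) -> Tuple[str, Callable[[float, float], bool], float]:
    """Split 'field op value' into its three parts."""
    field, op_name, raw_value = expression.split()
    return field, OPERATORS[op_name], float(raw_value)


def apply_filter(records: List[dict], expression: str) -> List[dict]:
    """Keep the records whose field satisfies the comparison."""
    field, compare, threshold = parse_filter(expression)
    return [record for record in records if compare(record[field], threshold)]


# Made-up case records, for illustration only.
cases = [
    {"identifier": "case-a", "countParticipants": 2},
    {"identifier": "case-b", "countParticipants": 4},
    {"identifier": "case-c", "countParticipants": 3},
    {"identifier": "case-d", "countParticipants": 5},
]

selected = apply_filter(cases, "countParticipants gt 3")
print([case["identifier"] for case in selected])
```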

We can also fetch disease subgroups.

entities_client.get_all_disease_subgroups(filter="countParticipants gt 3")[0:5]

GET https://bio-test-cva.gel.zone/cva/api/0/disorders?filter=countParticipants gt 3
Response time : 183 ms

motor disorders of the cns      motor disorders of the cns
intellectual disability            intellectual disability
fetal disorders                            fetal disorders
dysmorphic disorders                  dysmorphic disorders
lysosomal storage disorders    lysosomal storage disorders
dtype: object

And specific diseases.

entities_client.get_all_specific_diseases(filter="countParticipants gt 3")[0:5]

GET https://bio-test-cva.gel.zone/cva/api/0/disorders?filter=countParticipants gt 3
Response time : 182 ms

familial hemifacial microsomia                                                     familial hemifacial microsomia
unexplained monogenic fetal disorders                                       unexplained monogenic fetal disorders
infantile enterocolitis monogenic inflammatory bowel disease    infantile enterocolitis monogenic inflammatory...
non syndromic hypotrichosis                                                           non syndromic hypotrichosis
classical beckwith wiedemann syndrome                                       classical beckwith wiedemann syndrome
dtype: object

We can also fetch summaries of disorders across a given cohort of cases.

entities_client.get_disorders_summary(hasPositiveDx=True, as_data_frame=True).head()

GET https://bio-test-cva.gel.zone/cva/api/0/disorders?hasPositiveDx=True
Response time : 197 ms

To support autocomplete, disorders can also be searched by regular expression.

entities_client.get_disorders_by_regex(regex="intellect", as_data_frame=True).head()

GET https://bio-test-cva.gel.zone/cva/api/0/disorders/search?regex=intellect
Response time : 414 ms

Organisations

The organisations are the owners of cases. In the 100K Genomes Project these are the different Genomic Medicine Centres (GMCs).

We can obtain a summary of organisations across a cohort of cases.

next(entities_client.get_organisations(hasPositiveDx=True, as_data_frame=True)).head()

GET https://bio-test-cva.gel.zone/cva/api/0/organisations?hasPositiveDx=True
Response time : 195 ms

Genes

We store the Cellbase annotations for every gene reported in any case in CVA. The Cellbase annotations include the Ensembl gene identifier, the HGNC gene symbol and a number of cross references from several resources (e.g. UniProt, InterPro, Gene Ontology).

We can fetch the distribution of genes affected by a potential LoF variant across cases in any given cohort.

entities_client.get_genes_summary(hasPositiveDx=True, as_data_frame=True).head()

GET https://bio-test-cva.gel.zone/cva/api/0/genes?hasPositiveDx=True
Response time : 192 ms

We can search genes by gene symbol, by cross reference, or by a regular expression over gene symbols. This endpoint may be useful for autocomplete purposes.

entities_client.get_genes(geneSymbols="BRCA2", as_data_frame=True).head()

GET https://bio-test-cva.gel.zone/cva/api/0/genes/search?geneSymbols=BRCA2
Response time : 6 ms

entities_client.get_genes(geneSymbolRegex="^BRC", as_data_frame=True).head()

GET https://bio-test-cva.gel.zone/cva/api/0/genes/search?geneSymbolRegex=^BRC&hasPositiveDx=True
Response time : 8 ms

entities_client.get_genes(xrefs=["GO:0006351", "GO:0003700"], as_data_frame=True).head()

GET https://bio-test-cva.gel.zone/cva/api/0/genes/search?xrefs=GO:0006351&xrefs=GO:0003700
Response time : 3679 ms
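The logged URL above repeats the xrefs parameter once per list element. The sketch below shows how that query string can be produced with the standard library. Whether pyark builds its requests this way is an assumption; the snippet only reproduces the query string seen in the log.

```python
from urllib.parse import urlencode

params = {"xrefs": ["GO:0006351", "GO:0003700"]}

# doseq=True emits one key=value pair per list element;
# safe=":" keeps the colon in the GO identifiers unescaped.
query = urlencode(params, doseq=True, safe=":")
print(query)

# Without safe=":", the colons are percent-encoded instead.
escaped_query = urlencode(params, doseq=True)
print(escaped_query)
```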

Phenotypes

Phenotypes for rare disease cases are represented as Human Phenotype Ontology (HPO) terms. The whole HPO dataset is integrated into CVA, which normalises terms from older versions and allows terms to be searched by several criteria. HPO terms are annotated with their information content, computed from the disease annotations provided by HPO.
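A common definition of the information content of an ontology term is the negative logarithm of the fraction of annotated items (here, diseases) that carry the term. The sketch below uses that textbook definition. Whether CVA computes its values exactly this way, and with which logarithm base, is not stated in this section, so the function and its parameters are assumptions. It does agree with one observation from the notebook output: terms with no annotated diseases show an information content of Infinity.

```python
import math


def information_content(n_annotated: int, n_total: int) -> float:
    """Textbook IC: -ln(fraction of diseases annotated with the term).

    Assumed formula, not confirmed as CVA's implementation.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if n_annotated == 0:
        # A term never used in any annotation is infinitely informative.
        return math.inf
    return -math.log(n_annotated / n_total)


print(information_content(0, 1000))
print(information_content(1000, 1000))
print(information_content(10, 1000))
```

Rarer terms receive higher values, which is why information content is often used to weight phenotype matches.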

We can fetch a single phenotype by identifier.

entities_client.get_hpo(identifier="HP:0012345", as_data_frame=True)

GET https://bio-test-cva.gel.zone/cva/api/0/hpos/HP:0012345?
Response time : 2 ms

We can do a semantic search over phenotype names and synonyms.

next(entities_client.get_hpos(search="color blind", as_data_frame=True)).head()

GET https://bio-test-cva.gel.zone/cva/api/0/hpos/search?search=color blind
Response time : 22 ms

Finally, we can also search for phenotypes by cross references.

next(entities_client.get_hpos(xrefs="SNOMEDCT_US:51886007", as_data_frame=True)).head()

GET https://bio-test-cva.gel.zone/cva/api/0/hpos/search?xrefs=SNOMEDCT_US:51886007
Response time : 49 ms

	countCases	panel.panelIdentifier	panel.panelName	panel.panelVersion	panel.source
0	9779	None	intellectual disability	None	None
1	5999	None	mitochondrial disorders	None	None
2	4963	None	undiagnosed metabolic disorders	None	None
3	2411	None	epileptic encephalopathy	None	None
4	2192	None	hereditary ataxia	None	None

	countCases	panel.panelIdentifier	panel.panelName	panel.panelVersion	panel.source
0	517	None	intellectual disability	None	None
1	259	None	mitochondrial disorders	None	None
2	199	None	undiagnosed metabolic disorders	None	None
3	179	None	posterior segment abnormalities	None	None
4	137	None	cystic kidney disease	None	None

	countCases	panel.panelIdentifier	panel.panelName	panel.panelVersion	panel.source
0	82	None	intellectual disability	1.436	None
1	69	None	intellectual disability	1.158	None
2	65	None	mitochondrial disorders	1.66	None
3	58	None	intellectual disability	1.2	None
4	54	None	undiagnosed metabolic disorders	1.72	None

	panelIdentifier	panelName	panelVersion	source
0	55b7a65322c1fc05fc7a1869	limb girdle muscular dystrophy	1.0	panelapp
1	None	limb girdle muscular dystrophy	1.0	panelapp
2	55b2109c22c1fc7dd7ce411f	insulin resistance (including lipodystrophy)	1.9	panelapp
3	5811a8738f620323c5766a2b	xeroderma pigmentosum, trichothiodystrophy or ...	1.6	panelApp
4	None	limb girdle muscular dystrophy	1.2	PanelApp

	countCases	disorder.ageOfOnset	disorder.diseaseGroup	disorder.diseaseSubGroup	disorder.specificDisease
0	265	None	neurology and neurodevelopmental disorders	neurodevelopmental disorders	intellectual disability
1	121	None	renal and urinary tract disorders	structural renal and urinary tract disease	cystic kidney disease
2	83	None	None	None	intellectual disability
3	45	None	ophthalmological disorders	posterior segment abnormalities	rod cone dystrophy
4	43	None	None	None	rod cone dystrophy

	ageOfOnset	diseaseGroup	diseaseSubGroup	specificDisease
0	0.000	endocrine disorders	rare subtypes of diabetes	intellectual disability
1	0.000	gastroenterological disorders	liver disease	intellectual disability
2	0.000	gastroenterological disorders	gastrointestinal disorders	intellectual disability
3	1.600	neurology and neurodevelopmental disorders	neurodevelopmental disorders	intellectual disability
4	1.200	neurology and neurodevelopmental disorders	neurodevelopmental disorders	intellectual disability

	countCases	organisation.gmc	organisation.ods	organisation.site
_index
0	183	North Thames	RP4	Great Ormond Hospital for Children NHS FT
1	109	Northeast and North Cumbria	NCL	Pilot: Newcastle
2	89	Greater Manchester	RW3	Central Manchester Uni Hospital NHS FT
3	86	Greater Manchester	MAN	Pilot: Manchester
4	74	Southwest Peninsula	RH8	Royal Devon and Exeter NHS FT

	countCases	ensemblId
0	48	ENSG00000008710
1	13	ENSG00000197912
2	9	ENSG00000123066
3	9	ENSG00000273079
4	9	ENSG00000108821

	ensemblId	geneSymbol	otherIds	type
0	ENSG00000012048	BRCA1	[]	gene
1	ENSG00000139618	BRCA2	[]	gene
2	ENSG00000185515	BRCC3	[]	gene
3	ENSG00000251667	BRCC3P1	[]	gene

	ensemblId	geneSymbol	otherIds	type
0	ENSG00000174306	ZHX3	[]	gene
1	ENSG00000184492	FOXD4L1	[]	gene
2	ENSG00000148513	ANKRD30A	[]	gene
3	ENSG00000102974	CTCF	[]	gene
4	ENSG00000157554	ERG	[]	gene

	alternativeIdentifiers	definition	diseases	hpoBottomUpAccumulatedInformationContent	hpoInformationContent	hpoTopDownAccumulatedInformationContent	identifier	internalInformationContent	isA	isObsolete	mainIdentifier	name	synonyms	xrefs
_index
0	[]	Difficulty distinguishing between yellow and b...	[OMIM:190900, OMIM:611131, OMIM:125250, OMIM:6...	10.098	10.098	6.703	HP:0000552	None	[HP:0011519]	False	HP:0000552	Tritanomaly	[Blue yellow color blindness, Blue-yellow dysc...	[MSH:D003117, SNOMEDCT_US:51886007, SNOMEDCT_U...
1	[]	An abnormality of the pigmentation of the muco...	[ORPHA:2869, ORPHA:2907]	10.504	11.197	6.731	HP:0100669	None	[HP:0011830]	False	HP:0100669	Abnormal pigmentation of the oral mucosa	[Abnormal color of the oral mucosa, Abnormal p...	[UMLS:C4020959]
2	[]		[]	Infinity	Infinity	6.802	HP:0030584	None	[HP:0000551]	False	HP:0030584	Color vision test abnormality	[]	[UMLS:C4073057]
3	[]	An anomolous earwax color. Earwax (cerumen) is...	[]	Infinity	Infinity	7.521	HP:0030790	None	[HP:0030787]	False	HP:0030790	Abnormal cerumen color	[Abnormal cerumen colour, Abnormal cerumen pig...	[UMLS:C4280768]
4	[HP:0002214, HP:0002294]	A lesser degree of hair pigmentation than woul...	[OMIM:601375, OMIM:269000, ORPHA:280651, OMIM:...	9.000	9.000	6.809	HP:0002286	None	[HP:0011358]	False	HP:0002286	Fair hair	[Blond hair, Fair hair, Fair hair color, Flaxe...	[SNOMEDCT_US:297995004, UMLS:C0239801, UMLS:C1...