MitoLink is a scalable, modular, web-based workflow system developed to study genotype-phenotype correlations in human mitochondrial diseases. MitoLink integrates applications for assessing genomic variation and currently houses genome-wide datasets from the Genome Asia Pilot project (GAsP), gnomAD, ClinVar and DisGeNET. In this study, a reference set of nearly 3975 proteins (both nuclear and mitochondrial encoded) is mapped to disease-associated variants in GAsP and DisGeNET and evaluated for pathogenicity as defined by ClinVar. Shared genetic components in potential comorbidities are discussed on the basis of a gene-disease network for the Asian population. The platform itself is generic and can be applied to any population dataset.
Graphical Abstract
Overall architecture and components of MitoLink: details of the tools and data resources available from the platform. Tool descriptions are also provided in the supplementary file. Resources marked with an ‘n’ or ‘m’ superscript indicate genome variation datasets for nuclear-encoded and mitochondrial-encoded genes, respectively. More help on using MitoLink Galaxy can be found in the MitoLink User manual.

MitoLink is a customized workflow system that allows systematic storage, extraction, analysis and visualization of genomic variation to understand genotype-phenotype correlations in mitochondrial diseases. Because its tools and data are integrated as modules, MitoLink is a scalable system that can accommodate a diverse set of applications linked via standard data structures within the Galaxy framework. MitoLink is built on FAIR principles and supports the creation of reproducible workflows for understanding genotype-phenotype correlations across disease phenotypes globally.

ClinVar pathogenic Gene-Disease Network
Pathogenic variants Gene-Disease Network: Gene-disease network for mitochondrial genes with variant information in the GAsP dataset.

This network has two types of nodes, genes and diseases. There are 259 gene nodes and 488 disease nodes, and an edge connects a gene to a disease when their DisGeNET gene-disease association (GDA) score is >0.6. Genes are shown as green circles and diseases as blue squares. Of the 259 genes, 24 have loss-of-function (LoF) data in the gnomAD datasets, and these are shown as orange circles. The size of a gene node represents the number of its genomic variants annotated as pathogenic in ClinVar and ranges from 1 to 10. The size of a disease node represents its degree, that is, the number of genes mapping to it, and ranges from 1 to 6.
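The construction described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the MitoLink implementation: the function name `build_gene_disease_network`, the tuple layout of the input records and the toy gene and disease names are all assumptions made for the example. Only the >0.6 score cutoff and the two node-size rules come from the text.

```python
from collections import defaultdict

GDA_THRESHOLD = 0.6  # score cutoff stated in the text


def build_gene_disease_network(associations, pathogenic_counts,
                               threshold=GDA_THRESHOLD):
    """Build a bipartite gene-disease network (hypothetical helper).

    associations: iterable of (gene, disease, gda_score) tuples.
    pathogenic_counts: dict mapping gene -> number of ClinVar
        pathogenic variants observed for that gene.
    Returns (edges, gene_size, disease_degree).
    """
    edges = set()
    for gene, disease, score in associations:
        # Keep an edge only if the score passes the cutoff and the gene
        # carries at least one pathogenic variant.
        if score > threshold and pathogenic_counts.get(gene, 0) > 0:
            edges.add((gene, disease))

    # Collect the distinct genes attached to each disease.
    disease_genes = defaultdict(set)
    for gene, disease in edges:
        disease_genes[disease].add(gene)

    # Gene node size: number of pathogenic variants in that gene.
    gene_size = {gene: pathogenic_counts[gene] for gene, _ in edges}
    # Disease node size: degree, i.e. number of genes mapping to it.
    disease_degree = {d: len(genes) for d, genes in disease_genes.items()}
    return edges, gene_size, disease_degree


# Toy records with made-up identifiers, for illustration only.
associations = [
    ("GENE_A", "Disease X", 0.9),
    ("GENE_B", "Disease X", 0.7),
    ("GENE_B", "Disease Y", 0.5),  # dropped: score not above 0.6
    ("GENE_C", "Disease Y", 0.8),
]
pathogenic_counts = {"GENE_A": 3, "GENE_B": 1, "GENE_C": 2}

edges, gene_size, disease_degree = build_gene_disease_network(
    associations, pathogenic_counts
)
print(sorted(edges))
print(gene_size, disease_degree)
```

The two returned dictionaries correspond to the two node-size encodings in the figure. Colouring the nodes (for example, marking genes with gnomAD LoF data) would be a separate lookup on the gene set.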

Chromosome-wise gene and variant distribution
Chromosome-wise gene and variant distribution for Gene-Disease Network: Chromosome-wise distribution of ClinVar pathogenic and likely pathogenic variants in mitochondrial genes in GAsP.

The figure shows the chromosome-wise distribution of variants in these 259 genes. It is important to emphasize that each of these 259 genes has at least one variant annotated as pathogenic or likely pathogenic in ClinVar.
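A chromosome-wise tally of this kind can be computed with a simple filter-and-count step. The sketch below is a hypothetical illustration rather than the workflow used in MitoLink: the record layout `(chromosome, gene, clinical_significance)`, the function name and the toy records are assumptions, and real ClinVar significance strings can be more varied than the two labels matched here.

```python
from collections import defaultdict

# Simplifying assumption: only these two exact labels are counted.
PATHOGENIC_LABELS = {"pathogenic", "likely pathogenic"}


def chromosome_distribution(variants):
    """Count pathogenic/likely pathogenic variants and genes per chromosome.

    variants: iterable of (chromosome, gene, clinical_significance) tuples.
    Returns (variant_counts, gene_counts), both dicts keyed by chromosome.
    """
    variant_counts = defaultdict(int)
    genes_seen = defaultdict(set)
    for chrom, gene, significance in variants:
        if significance.strip().lower() in PATHOGENIC_LABELS:
            variant_counts[chrom] += 1
            genes_seen[chrom].add(gene)
    gene_counts = {chrom: len(genes) for chrom, genes in genes_seen.items()}
    return dict(variant_counts), gene_counts


# Toy records with made-up identifiers, for illustration only.
variants = [
    ("1", "GENE_A", "Pathogenic"),
    ("1", "GENE_A", "Likely pathogenic"),
    ("1", "GENE_B", "Benign"),
    ("X", "GENE_C", "Pathogenic"),
    ("MT", "GENE_D", "Uncertain significance"),
]

variant_counts, gene_counts = chromosome_distribution(variants)
print(variant_counts)
print(gene_counts)
```

Chromosomes with no qualifying variant simply do not appear in the result, so a plotting step would need to fill in zeros if every chromosome should be shown.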

The MitoLink Galaxy workflow system is also available as a standalone VirtualBox virtual machine appliance from the open-access repository Zenodo (https://doi.org/10.5281/zenodo.5167938).
