Information

This page provides a variant-centered integrative annotation framework, offering a comprehensive view of a single genetic variant by combining population genetics, functional consequence annotation, regulatory effect prediction, association evidence, and molecular impact modeling. By integrating both traditional annotation methods and modern deep learning–based predictions, this module supports systematic interpretation of variant effects at the DNA, RNA, and protein levels.

Annotation Framework: Variant annotations in this module are organized into two complementary layers: (i) classical coding variant annotations, which focus on protein-coding consequences, and (ii) deep learning–based predictive models, which extend interpretation to regulatory, splicing, and protein-level effects.

Classical Coding Variant Annotations: Classical annotation methods primarily target variants located in protein-coding regions and infer functional consequences based on gene models, evolutionary conservation, and protein sequence or structural features.

  • SnpEff (Cingolani et al., 2012): annotates variants by mapping them to gene models and transcript structures, classifying consequences such as synonymous, missense, or stop-gain mutations and assigning impact levels.
  • CooVar (Vergara et al., 2012): provides complementary classification of coding variants by evaluating codon changes and their potential effects on protein sequences.
  • SIFT (Ng and Henikoff, 2003): predicts whether amino acid substitutions are likely to affect protein function based on evolutionary conservation across homologous sequences.
  • PolyPhen-2 (Adzhubei et al., 2010): evaluates the potential impact of amino acid substitutions by integrating sequence conservation, structural context, and physicochemical properties.
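The consequence classes named above (synonymous, missense, stop-gain) can be illustrated with a minimal Python sketch that translates a reference and an alternative codon with the standard genetic code and labels the change. This is not the SnpEff or CooVar implementation; the function name `classify_codon_change` and the `IMPACT` table are hypothetical, and the impact levels are assumed to follow SnpEff's HIGH/MODERATE/LOW convention for these consequence types.

```python
# Standard genetic code (NCBI translation table 1), built in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Hypothetical lookup: impact level assumed for each consequence class.
IMPACT = {
    "stop_gained": "HIGH",
    "stop_lost": "HIGH",
    "missense_variant": "MODERATE",
    "synonymous_variant": "LOW",
}


def classify_codon_change(ref_codon: str, alt_codon: str) -> tuple[str, str]:
    """Return (consequence, impact) for a single-codon substitution."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        consequence = "synonymous_variant"
    elif alt_aa == "*":
        consequence = "stop_gained"
    elif ref_aa == "*":
        consequence = "stop_lost"
    else:
        consequence = "missense_variant"
    return consequence, IMPACT[consequence]


if __name__ == "__main__":
    for ref, alt in [("GAA", "GAG"), ("GAA", "GTA"), ("TGG", "TGA")]:
        print(ref, alt, classify_codon_change(ref, alt))
```

Conservation-based tools such as SIFT and PolyPhen-2 then refine the missense class, which a codon table alone cannot rank.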

Deep Learning–based Variant Effect Predictions: To extend variant interpretation beyond classical coding annotations, this module integrates deep learning models trained on large-scale functional genomics and molecular datasets.

Regulatory Effect Prediction (Non-coding Variants): Regulatory effects of non-coding variants are predicted using Basenji (Kelley et al., 2018), a deep learning sequence-to-signal model trained to predict chromatin accessibility from genomic sequence. For each variant, the reported score represents the predicted change in local chromatin accessibility within a ±1 kb window centered on the variant, calculated as ΔPCA = PCA_alt − PCA_ref, where PCA denotes Predicted Chromatin Accessibility, and PCA_alt and PCA_ref are the predictions for the alternative and reference alleles, respectively. Positive values indicate increased chromatin accessibility associated with the alternative allele, whereas negative values indicate reduced accessibility relative to the reference allele. Larger absolute values correspond to stronger predicted regulatory effects.
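As a rough illustration of the score defined above, the following Python sketch computes ΔPCA from two lists of per-bin accessibility predictions covering the same ±1 kb window. The function names are hypothetical, and summing the bins across the window is an assumption made for this example; the page does not state how the window is aggregated, and no Basenji API is called here.

```python
from typing import Sequence


def delta_pca(pca_ref: Sequence[float], pca_alt: Sequence[float]) -> float:
    """Return ΔPCA = PCA_alt - PCA_ref over one window.

    Assumption: each input holds per-bin predicted accessibility for the
    same window, and the window score is the sum of its bins.
    """
    if len(pca_ref) != len(pca_alt):
        raise ValueError("reference and alternative tracks must cover the same bins")
    return sum(pca_alt) - sum(pca_ref)


def describe(delta: float) -> str:
    """Translate the sign of ΔPCA into the interpretation used on this page."""
    if delta > 0:
        return "increased accessibility"
    if delta < 0:
        return "reduced accessibility"
    return "no predicted change"


if __name__ == "__main__":
    ref_track = [1.0, 2.0, 3.0]   # made-up predictions for the reference allele
    alt_track = [1.5, 2.5, 3.0]   # made-up predictions for the alternative allele
    score = delta_pca(ref_track, alt_track)
    print(score, describe(score))
```

Ranking variants by the absolute value of this score would surface the strongest predicted regulatory effects first.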

RNA Splicing Impact Prediction: Potential effects of variants on RNA splicing are predicted using DRANetSplicer (Liu et al., 2024).

Protein-level Impact Prediction:

  • Subcellular localization effects are predicted using DeepLoc (Thumuluri et al., 2022), assessing whether variants may alter protein targeting or intracellular distribution.
  • Protein stability and folding effects are evaluated using DDGun (Montanucci et al., 2022), PROSTATA (Umerenkov et al., 2023), and THPLM (Gong et al., 2023), which estimate variant-induced changes in protein stability.

GWAS Associations and External Evidence

  • Trait associations: GWAS results linking the variant to agronomic or metabolic traits are summarized, including associated populations and p-values.
  • Lead variant identification: Variants are flagged when they represent the strongest association signal (the smallest p-value) within a GWAS locus.
  • Literature evidence: Links to relevant publications are provided to support reported associations.
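The lead-variant flag described above can be sketched in Python as follows, assuming that "strongest association signal" means the smallest p-value among the variants of one locus for one trait. The `Association` record, its field names, and the example identifiers are hypothetical and do not reflect the database's actual schema.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Association:
    """One GWAS hit; all field names are illustrative assumptions."""
    variant_id: str
    locus_id: str
    trait: str
    p_value: float


def lead_variants(associations: Iterable[Association]) -> dict[tuple[str, str], str]:
    """Map each (trait, locus) pair to the variant with the smallest p-value."""
    best: dict[tuple[str, str], Association] = {}
    for assoc in associations:
        key = (assoc.trait, assoc.locus_id)
        # Keep the first record seen on ties, replace only on a strictly smaller p-value.
        if key not in best or assoc.p_value < best[key].p_value:
            best[key] = assoc
    return {key: assoc.variant_id for key, assoc in best.items()}


if __name__ == "__main__":
    hits = [
        Association("var1", "locusA", "trait1", 1e-8),
        Association("var2", "locusA", "trait1", 1e-12),
        Association("var3", "locusB", "trait1", 1e-6),
    ]
    print(lead_variants(hits))
```

Grouping by trait as well as locus keeps a variant from being flagged as the lead for a trait in which another variant at the same locus has the stronger signal.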