
Citation: Alaa Fathalla, Amal Abd el-mageed. Salt tolerance enhancement Of wheat (Triticum Asativium L) genotypes by selected plant growth promoting bacteria[J]. AIMS Microbiology, 2020, 6(3): 250-271. doi: 10.3934/microbiol.2020016
The G-quadruplex or G-tetrad (G4) is a thermodynamically stable structural element that forms, intra- or inter-molecularly, between clusters/stretches/tracts of Guanine (G) residues (|x|≥3) [1,2,3]. The intervening loops, where applicable, are composed of one or more nucleotides (N∈{A, U, T, G, C}) (Figure 1). G4 is found in DNA (telomeres, double-strand break sites, transcription start sites) and in the untranslated regions (5'-UTR, 3'-UTR, introns) of mRNA [4,5]. In vivo, G4 may function to preserve the telomeric ends of chromosomes, repress or promote transcription and regulate translation [4,5]. The generic representation of an intra-strand G4 may be described as follows:
$$\left(\left((G_{t,k})_{t\ge 3}\,(N_{h,k})_{h\ge 1}\right)_{k=3}\,\left((G_{t,k})_{t\ge 3}\right)_{k=1}\right)_{m=1} \qquad \text{(Def. 1)}$$
t := Number of Guanines per G-rich cluster
h := Number of loop-forming generic intervening nucleotides
k := Cluster index
m := Number of strands
G := Guanine
A := Adenine
T := Thymine
C := Cytosine
N := Any nucleotide
The high melting temperature (Tm ≈ 60 °C) of G4 implies that the mature quadruplex is stable and refractory to unfolding. This is partly due to stabilizing Hoogsteen (N7gu1-N2gu2; O6gu1-N1gu2) and reverse Hoogsteen (N7gu1-N1gu2; O6gu1-N2gu2) hydrogen bonding, as well as π-orbital stacking between the purine rings of non-contiguous guanine pairs (gu1, gu2) (Figure 1) [6,7]. Additionally, the presence of Adenine residues in the intervening loops, variable loop length (h ≈ 1–30 Mer) and loop permutation have all been shown to contribute to the stability, and hence the persistence, of the mature quadruplex [8,9,10,11].
$$T_m \propto \frac{\#\text{Adenine}}{h}, \qquad T_m = \tau\cdot\frac{\#\text{Adenine}}{h} \qquad (1)$$
Tm := Melting temperature
τ := Constant of proportionality
h := Length of intervening loops
Despite the wide range of methods available that can predict G4 formation in DNA/RNA, there is poor agreement between sequence-based motif locators and empirically derived biophysical data [12,13,14,15]. Motif-independent methods such as those that directly measure the GC-content or the GC-/AT-skew of a query sequence and utilize this data to train machine learning algorithms may address some of these discrepancies [16,17,18].
Investigations into transcribed RNA suggest that secondary and tertiary forms (5'- and 3'-UTRs) may not only coexist with stretches of unfolded ribonucleotides, but can also be read by the ribosomal machinery. Non-canonical translation is described as: a) translation from atypical start sites AUG→{CUG, GUG} or b) translation of short peptides (≤100 aa), i.e., short open reading frame (sORF)-encoded polypeptides (SEPs) and upstream open reading frames (uORFs) [19,20,21,22]. The latter are rarely silent and can function as modulators of metabolism (S-Adenosylmethionine decarboxylase, AMD1) or transcription (activating transcription factor, ATF4, H19; yeast AP-1 like, YAP1) and as generic transcription factors (general control protein, GCN4) [19]. G4 has also been observed in one or more exons of the prion protein (PRNP, exon 2), zinc finger protein (ZNF669, exon 1), β-amyloid secretase (BACE1, exon 3) and the estrogen receptor 1 (ESR1, exon 4), among several others [16,23,24,25,26,27,28,29].
Whilst the presence of segments of folded mRNA may have a significant influence on the yield of the protein product(s), the effect of such segments when they are part of the protein coding segment (PCS) is largely unknown [4,5,30,31,32,33]. Proteopathies are diseases that result directly from aggregates of truncated and misfolded proteins. These may occur secondary to a faulty translation machinery, such as a ribosome that has stalled on encountering a secondary or tertiary folded mRNA sub segment. Recent data suggest that ~45% of the human genome may code for proteins that are either intrinsically disordered (IDPs) or comprise one or more sub-segments that are disordered (IDRs) [34]. The absence of delineable structural features notwithstanding, disordered regions are characterized by short linear motifs (SLiMS) and/or molecular recognition features (MoRFs) [34,35]. Improper folding and heightened degradation rates could lead to perturbed proteostasis and thereby contribute to the pathogenesis of proteopathies [34,35]. Primary proteopathies are likely to result directly from mutations (point, chromosomal translocations) in the PCS of a gene. These include sickle cell disease (βE6→V6-mediated defective polymerization), amylin-based type Ⅱ Diabetes Mellitus, Cystic Fibrosis (cystic fibrosis transmembrane conductance regulator), Alzheimer's disease (Amyloid β-peptide) and Parkinson's disease (α-synuclein) [36,37]. Secondary proteopathies, in contrast, result from motif or molecular mimicry of a host protein(s) by a pathogen. These are further classified into acute and chronic variants depending on the onset, genesis and/or resolution of the resultant infection or infestation [34,35].
G4 is known to stall the ribosome during translation and the resultant protein is truncated and/or degraded at an accelerated rate. The manuscript subsumes ribosomal read-through of mRNA with a G-quadruplex and assesses influence of the translated product to proteostasis. Here, I present a mathematical model of a short G4 (20–60 Mer) in the PCS, i.e., translatable G-quadruplex (TG4), in the mRNA of a hypothetical gene. The mapping uses several novel indices to annotate, classify and select suitable Guanine-containing codons (α) and amino acids (β). A generic algorithm then computes and validates, as proof-of-principle, possible peptides (pTG4ij) that correspond to the modeled TG4 (pTG4ij∈PTG4~TG4). Co-occurrence, homology and the distribution of overlapping/shared amino acids between PTG4 and the disorder promoting SLiMS are used to infer probable mechanisms of TG4~PTG4 facilitated misfolding. Standard bioinformatics indices (accuracy, precision, recall, p-value) are used to arrive at these conclusions.
The objective of this investigation is to model a short G4 in an arbitrary PCS (TG4) which when translated will result in a set of peptides (PTG4) with an average length that is less than 100 amino acids. The hypothesis explored in this manuscript is that in the event of a ribosomal read-through, the translated mRNA, with its G4 will result in a modified protein product. This protein will then exhibit considerable propensity to misfold on account of the presence of one or more members of the PTG4.
2.1.1 Model of a translatable G-quadruplex (TG4)
SEPs-derived peptides with the lowest molecular weight (~2.5 kDa) and with lengths varying from ~7–20 aa were identified and used to define the boundaries of the peptides that comprise PTG4 [20,21]. The TG4 (m = 1) is therefore modeled as an intra-strand sub sequence of the mRNA of a hypothetical gene and has a length of ~20–60 Mer. This is represented (with symbols and variable names as explained in Def. 1) as follows:
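$$\mathrm{TG4} := \left(\left((G_{t,k})_{3\le t\le 9}\,(N_{h,k})_{2\le h\le 7}\right)_{k=3}\,\left((G_{t,k})_{3\le t\le 9}\right)_{k=1}\right)_{m=1} \qquad \text{(Def. 2)}$$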
Since the Guanine-rich clusters and loops are contiguous, the aforementioned model (Def.2) of the TG4 may be approximated with a sequence of codons and is as under:
$$\mathrm{TG4} := (\mathrm{COD}_q)^{L}_{q\in\mathbb{N}} \;\big|\; \mathrm{COD}\in\mathbf{COD} \qquad \text{(Def. 3)}$$
The algorithm to compute L, which is the number of codons needed to model TG4 is presented and is as follows:
N ← {u ∈ [20, 60)}
r ← N mod 3
e ← N − ((N mod 3)/3)
If e < (⌊e⌋ + ⌈e⌉)/2 then
    L = ⌈e/3⌉
else If e ≥ (⌊e⌋ + ⌈e⌉)/2 then
    L = ⌊e/3⌋
end If
N := Number of ribonucleotides required to model TG4
L := Total number of codons needed to model TG4 (7 ≤ L < 21)
q := qth codon
r := Remainder = {0, 1, 2}
e, u := Generic variables
COD := Set of vertebrate codons
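The codon-count procedure above can be sketched in Python as follows. This is a minimal illustration rather than the author's code: the function name `codon_count` is hypothetical, and exact rational arithmetic (`fractions.Fraction`) is an added choice to avoid floating-point rounding at the midpoint comparison.

```python
from fractions import Fraction
from math import ceil, floor


def codon_count(n: int) -> int:
    """Return L, the number of codons needed to model a TG4 of n ribonucleotides."""
    if not 20 <= n < 60:
        raise ValueError("n must lie in [20, 60)")
    r = n % 3                                    # remainder, one of {0, 1, 2}
    e = n - Fraction(r, 3)                       # e = N - (N mod 3)/3, kept exact
    midpoint = Fraction(floor(e) + ceil(e), 2)   # (floor(e) + ceil(e)) / 2
    if e < midpoint:
        return ceil(e / 3)
    return floor(e / 3)                          # covers the e >= midpoint branch


if __name__ == "__main__":
    for n in (20, 21, 22, 23, 59):
        print(n, codon_count(n))
```

Over the stated range the result stays within 7 ≤ L < 21: a remainder of 2 rounds the codon count up, whereas remainders of 0 and 1 round it down.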
The codons selected for modelling TG4 (Def. 3) comprised suitably scored Guanine-containing vertebrate codons (gCOD+n⊂COD) for the Guanine-rich clusters/stretches/tracks (3≤t≤9;Defs. 1 and 2) and generic/no-stop codons for the intervening loops (Figures 2 and 3). Briefly, a Guanine-containing codon (gCODn) is scored by considering its association with two similar flanking codons, i.e., gCODn-1, gCODn, gCODn+1 such that there is at least one occurrence of 'GGGG' (δ≥1.0) (Figures 2 and 3). This non-trivial case (4≤t≤9) is chosen since its trivial equivalent t = 3, is already subsumed (Defs. 1 and 2). Numerically,
$$\alpha^{codon}_{amino} = \gamma\cdot\theta\cdot\delta + \Omega \qquad (2)$$
γ := Probability of codon occurrence (γ = 1/64 ≈ 0.02)
θ := Probability of codon occurrence within a group (θ = {0.04, 0.11, 0.33, 1})
δ := Distinct occurrences of 'GGGG' (δ = {0, 1, 2, 6})
Ω := Number of adjacent codons with δ (Ω = {0, 1, 2})
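As a worked illustration of Eq. (2), the sketch below scores a codon from its Guanine pattern. The per-pattern values of θ, δ and Ω are transcribed from Table 1; the names `PATTERN_PARAMS`, `guanine_pattern` and `alpha` are hypothetical, and γ is taken as the rounded value 0.02 used in that table.

```python
GAMMA = 0.02  # 1/64, rounded as in Table 1

# (theta, delta, omega) for each Guanine pattern, transcribed from Table 1
PATTERN_PARAMS = {
    "GGG": (1.00, 6, 2),
    "GxG": (0.33, 1, 2),
    "xGG": (0.33, 2, 1),
    "GGx": (0.33, 2, 1),
    "xxG": (0.11, 1, 1),
    "Gxx": (0.11, 1, 1),
    "xGx": (0.11, 0, 0),
    "xxx": (0.04, 0, 0),
}


def guanine_pattern(codon: str) -> str:
    """Reduce a codon to its Guanine pattern, e.g. 'GUG' -> 'GxG'."""
    return "".join("G" if base == "G" else "x" for base in codon.upper())


def alpha(codon: str) -> float:
    """Score a codon with alpha = gamma * theta * delta + omega (Eq. 2)."""
    theta, delta, omega = PATTERN_PARAMS[guanine_pattern(codon)]
    return GAMMA * theta * delta + omega


if __name__ == "__main__":
    for codon in ("GGG", "GUG", "UGG", "AUG", "UGU", "AAA"):
        print(codon, round(alpha(codon), 4))
```

A lookup keyed on the pattern keeps the scoring rule separate from the tabulated constants, so the constants can be changed without touching the formula.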
Since the genetic code is degenerate, amino acids mapped from the selected codons are further scored and grouped (g1, g2, g3) (Figures 3 and 4).
$$\beta_{amino} = \frac{|gCOD^{+}_{amino}|}{|COD_{amino}|} \qquad (3)$$
gCOD+amino := Set of optimal codons for each amino acid (αaminocodon > 0.0000)
CODamino := Set of codons for each amino acid
Whilst amino acids from groups 1 and 2 (β > 0.00) (3) can represent the modeled G-rich clusters (y∈g1∪g2 = Y), no constraint was imposed on the amino acids (z) used to model the loops (z∈g1∪g2∪g3 = Z) (Figures 3 and 4). The peptidome (PTG4) evaluated by this study is a combinatorial association of peptides such that the molecular weight is ~0.8–2.3 kDa and the length of any arbitrary member is ~7–20 aa (Figure 4). This may be represented as follows:
$$pTG4_{ij} = \left(\left(\left((y_{i,k})_{1\le i\le 3}\,(z_{i,k})_{1\le i\le 2}\right)_{k=3}\,(y_i)_{1\le i\le 3}\right)(z_i)_{1\le i\le 2}\right)_{j} \qquad \text{(Def. 4)}$$
$$\mathbf{PTG4} = \bigcup_{i=7}^{20}\;\bigcup_{j=1}^{J} |pTG4_{ij}| \qquad \text{(Def. 5)}$$
PTG4 := Peptidome corresponding to TG4
pTG4ij := jth canonical amino acid form of PTG4 with "i" amino acids
i := Number of amino acids that comprise the modelled PTG4
J := Maximum number of canonical pTG4 for "i" amino acids
A dataset that comprises experimentally validated G4-forming mRNA segments of several genes (n = 99) was downloaded (http://scottgroup.med.usherbrooke.ca/G4RNA/) and used to investigate the distribution of G4 [16]. Genes that possess non-redundant RNA (R) sub sequences in the PCS were translated in 6 reading frames using an online tool (http://web.expasy.org/translate). The peptides generated were classified as those: ⅰ) with one or more uninterrupted stretches of N-terminal amino acids of length ≥7 aa (~A), ⅱ) with an in-frame termination signal designated as 'STOP' (~B) and ⅲ) without any termination signal, i.e., absence of a 'STOP' in their sequence (~C). The translated peptides were classified as "VALID" ((B∩A)∪(C∩A)) and then queried for matches with pTG4ij(7≤i≤20,j∈N). The PERL scripts required to parse and process the resulting data files were developed in house, and the pseudocode for the same is presented as additional information (Pseudocode, PS1: Supplementary Text 1).
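The six-frame translation and the "VALID" filter described above can be sketched as follows. This is not the in-house PERL code or the ExPASy tool. The function names are hypothetical, and criterion (~A) is interpreted here, as an assumption, to mean that at least 7 residues precede the first STOP in a frame.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... ('*' = STOP)
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def six_frames(rna: str) -> list:
    """Translate an RNA segment in three forward and three reverse-complement frames."""
    rna = rna.upper().replace("T", "U")
    reverse = rna.translate(COMPLEMENT)[::-1]
    peptides = []
    for strand in (rna, reverse):
        for offset in range(3):
            codons = (strand[i:i + 3] for i in range(offset, len(strand) - 2, 3))
            peptides.append("".join(CODON_TABLE[c] for c in codons))
    return peptides


def is_valid(peptide: str, min_len: int = 7) -> bool:
    """Assumed reading of (~A): >= min_len residues before the first STOP."""
    return len(peptide.split("*")[0]) >= min_len


def classify(rna: str) -> dict:
    """Count frames containing a STOP and frames that pass the VALID filter."""
    frames = six_frames(rna)
    return {
        "STOP": sum("*" in p for p in frames),
        "VALID": sum(is_valid(p) for p in frames),
    }


if __name__ == "__main__":
    print(classify("GGG" * 8))            # 24 nt, no stop codons
    print(classify("GGG" * 7 + "UAG"))    # in-frame amber codon at the 3' end
```

Under this reading, any segment shorter than 21 nt cannot yield a 7-residue peptide in any frame and is therefore never "VALID".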
The relevance of PTG4 to misfolding is assessed by examining its occurrence in amino acid/protein sequences of disordered regions (IDRs) and full-length proteins with disordered regions (IDPs). DisProt 7.0 (http://disprot.org) is a database of experimentally validated and non-redundant sequences of IDRs and IDPs [38]. The sequences that comprise these (|IDR| = 1445; |IDP| = 800) are queried for occurrences of pTG4ij(7≤i≤20,j∈N) (Supplementary Texts 2 and 3). A preliminary partitioning schema divides these datasets into two distinct subsets, i.e., #pTG4ij≥1 (PT+≡PPOS⊂{IDR, IDP}; (Def. 6)) and #pTG4ij = 0 (PT-≡PNEG⊂{IDR, IDP}; (Def. 7)). The extent of co-occurrence of one or more SLiMSw≡SL (w = {1, 2, 3}) with pTG4ij (SL±∈{PPOS, PNEG}) (Defs. 8 and 9) is then evaluated to infer the relevance of PTG4 to misfolding-induced proteostasis. The distribution of overlapping/shared sequences of amino acids ((zn)n≥2∈(pTG4ij∩SLiMSw); zn∈Z) (Def. 10) is examined in protein sequences from taxonomically diverse organisms with ScanProsite (https://prosite.expasy.org/scanprosite). The proof behind this rationale is presented:
$$(z_n)_{n\ge 2} \in (pTG4_{ij}\cap SLiMS_w) = \big((z_n)_{n\ge 2}\in pTG4_{ij}\big)\cap\big((z_n)_{n\ge 2}\in SLiMS_w\big)$$
Let $z_n = z'_n$ and $z_n = z''_n$. Rewriting,
$$= \big((z'_n)_{n\ge 2}\in pTG4_{ij}\big)\cap\big((z''_n)_{n\ge 2}\in SLiMS_w\big) = \big((z'_n)_{n\ge 2},\,(z''_n)_{n\ge 2}\big) = pTG4_{ij}\times SLiMS_w$$
Z := Set of amino acids (zn ∈ Z)
pTG4ij := Canonical amino acid form of PTG4
SLiMS := Set of short linear motifs (SLiMSw ∈ SLiMS)
i, j, n, w := Indices of members of Z, PTG4, SLiMS
The indices utilized by this study to establish the relevance of matched instances of various motifs/co-motifs in the peptide/protein sequences of interest include the accuracy (A), precision (P), recall (R) and the p-value. A 2×2 table which represents the categorized data (2.1.4) is constructed and used to compute these bioinformatics indices. This is outlined as under:
 | PT− | PT+ |
SL− | TN | FP |
SL+ | FN | TP |
TN := True negative (|SNEG ∩ PNEG| = |SNEG|) ≡ SL−PT− (Def. 11)
FP := False positive (|SNEG ∩ PPOS| = |SNEG|) ≡ SL−PT+ (Def. 12)
FN := False negative (|SPOS ∩ PNEG| = |SPOS|) ≡ SL+PT− (Def. 13)
TP := True positive (|SPOS ∩ PPOS| = |SPOS|) ≡ SL+PT+ (Def. 14)
The equations may then be written as:
$$A = \frac{TN+TP}{TN+FP+FN+TP}\times 100 \qquad (4)$$
$$P = \frac{TP}{FP+TP}\times 100 \qquad (5)$$
$$R = \frac{TP}{FN+TP}\times 100 \qquad (6)$$
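Equations (4)–(6) translate directly into code. The sketch below, with the hypothetical name `confusion_metrics`, computes the three indices as percentages and is exercised on the SLiMS2 row reported for the disordered regions in Table 4 (TN = 749, FP = 18, FN = 121, TP = 29).

```python
def confusion_metrics(tn: int, fp: int, fn: int, tp: int) -> dict:
    """Accuracy, precision and recall (in %) from the cells of a 2x2 table."""
    total = tn + fp + fn + tp
    return {
        "accuracy": 100.0 * (tn + tp) / total,   # Eq. (4)
        "precision": 100.0 * tp / (fp + tp),     # Eq. (5)
        "recall": 100.0 * tp / (fn + tp),        # Eq. (6)
    }


if __name__ == "__main__":
    # SLiMS2 row for the disordered regions (IDRs) in Table 4
    for name, value in confusion_metrics(tn=749, fp=18, fn=121, tp=29).items():
        print(f"{name}: {value:.2f}")
```

The function assumes non-empty denominators; a row with FP + TP = 0 or FN + TP = 0 would need separate handling.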
The p-values for these analyses are computed by comparing the frequency of occurrence of all pTG4ij in a test sequence (ϕpTG4ij) with the same in randomly-generated (v∈V) sequences of similar lengths (ϕpTG4vij), i.e., 7-50 aa (1≤v≤10000) and > 50 aa (1≤v≤100000) (Pseudocode, PS2: Supplementary Text 1):
$$p\text{-value} = \frac{\phi_{pTG4^{v}_{ij}}}{\phi_{pTG4_{ij}}} = \frac{\sum_{v=1}^{|V|}\sum_{i=7}^{21}\sum_{j=1}^{J} pTG4^{v}_{ij}}{\sum_{i=7}^{21}\sum_{j=1}^{J} pTG4_{ij}} \qquad (7)$$
The frequency of occurrence of overlapping sequences of amino acids ((zn)n≥2∈(pTG4ij∩SLiMSw); zn∈Z) in pre-compiled and curated protein sequences (ϕ(zn)) across taxa is compared with randomly chosen sequences of comparable lengths (ϕ(vzn); n = 5000). These are used to estimate statistical significance, i.e., p-value = ϕ(vzn)/ϕ(zn) (8).
The data presented discusses implementation of a model of short intra-strand TG4 for various values of α and β, populates PTG4 and establishes the equivalence TG4~PTG4. Co-occurrence and homology studies between PTG4 and the SLiMS in IDRs/IDPs and generic protein sequences across taxa are used to infer probable mechanisms of TG4~PTG4 facilitated misfolding-induced proteostasis.
An association-competent codon not only takes into account the presence of a Guanine residue, but also gives weightage to its position (Figures 1–3, Table 1). This schema partitions standard vertebrate codons into those with a high- (Ranks 1-4;α > 0.0000) or low- (Rank 5;α = 0.0000) propensity to form a contiguous cluster of Guanine residues (Figures 3 and 4, Table 1). Whilst, 'GGG' (Rank 1;α = 2.12) can associate with ({GGG, GxG, xGG, GGx, Gxx, xxG}) bilaterally (δ = 6;Ω = 2), 'GxG' (Rank 2; α = 2.0066) can do so only with 'GGG' (δ = 1;Ω = 2). On the other hand, the codon subsets 'GGx' and 'xGG' (Rank 3;α = 1.0132) can form two clusters of contiguous Guanine residues with 'GGG' and 'xGG'/ 'GGx' unilaterally (δ = 2;Ω = 1). Similarly, the subsets 'xxG' or 'Gxx' (Rank 4;α = 1.0022), can form contiguous Guanines with a single occurrence of 'GGG' (δ = 1;Ω = 1) (Figures 3 and 4, Table 1). Conversely, codons with either a single occurrence of a central Guanine residue 'xGx' or no Guanine residues 'xxx' (Rank 5;α = 0.0000) are unable to form the 'GGGG' and are excluded from this study (Figures 3 and 4, Table 1).
Rank | Codon set, Cardinality | Codon | γ | θ | δ | Ω | α=γ.θ.δ+Ω | aa |
1 | GGG, 1 | GGG | 0.02 | 1.00 | 6 | 2 | 2.1200 | Gly |
2 | GxG, 3 | GUG | 0.02 | 0.33 | 1 | 2 | 2.0066 | Val |
 | | GCG | 0.02 | 0.33 | 1 | 2 | 2.0066 | Ala |
 | | GAG | 0.02 | 0.33 | 1 | 2 | 2.0066 | Glu |
3 | xGG, 3 | UGG | 0.02 | 0.33 | 2 | 1 | 1.0132 | Trp |
 | | CGG | 0.02 | 0.33 | 2 | 1 | 1.0132 | Arg |
 | | AGG | 0.02 | 0.33 | 2 | 1 | 1.0132 | Arg |
3 | GGx, 3 | GGU | 0.02 | 0.33 | 2 | 1 | 1.0132 | Gly |
 | | GGC | 0.02 | 0.33 | 2 | 1 | 1.0132 | Gly |
 | | GGA | 0.02 | 0.33 | 2 | 1 | 1.0132 | Gly |
4 | xxG, 9 | UUG | 0.02 | 0.11 | 1 | 1 | 1.0022 | Leu |
 | | UCG | 0.02 | 0.11 | 1 | 1 | 1.0022 | Ser |
 | | UAG | 0.02 | 0.11 | 1 | 1 | 1.0022 | Ter |
 | | CUG | 0.02 | 0.11 | 1 | 1 | 1.0022 | Leu |
 | | CCG | 0.02 | 0.11 | 1 | 1 | 1.0022 | Pro |
 | | CAG | 0.02 | 0.11 | 1 | 1 | 1.0022 | Gln |
 | | AUG | 0.02 | 0.11 | 1 | 1 | 1.0022 | Met |
 | | ACG | 0.02 | 0.11 | 1 | 1 | 1.0022 | Thr |
 | | AAG | 0.02 | 0.11 | 1 | 1 | 1.0022 | Lys |
4 | Gxx, 9 | GUU | 0.02 | 0.11 | 1 | 1 | 1.0022 | Val |
 | | GCU | 0.02 | 0.11 | 1 | 1 | 1.0022 | Ala |
 | | GAU | 0.02 | 0.11 | 1 | 1 | 1.0022 | Asp |
 | | GUC | 0.02 | 0.11 | 1 | 1 | 1.0022 | Val |
 | | GCC | 0.02 | 0.11 | 1 | 1 | 1.0022 | Ala |
 | | GAC | 0.02 | 0.11 | 1 | 1 | 1.0022 | Asp |
 | | GUA | 0.02 | 0.11 | 1 | 1 | 1.0022 | Val |
 | | GCA | 0.02 | 0.11 | 1 | 1 | 1.0022 | Ala |
 | | GAA | 0.02 | 0.11 | 1 | 1 | 1.0022 | Glu |
5 | xGx, 9 | UGU | 0.02 | 0.11 | 0 | 0 | 0.0000 | Cys |
 | | UGC | 0.02 | 0.11 | 0 | 0 | 0.0000 | Cys |
 | | UGA | 0.02 | 0.11 | 0 | 0 | 0.0000 | Ter |
 | | CGU | 0.02 | 0.11 | 0 | 0 | 0.0000 | Arg |
 | | CGC | 0.02 | 0.11 | 0 | 0 | 0.0000 | Arg |
 | | CGA | 0.02 | 0.11 | 0 | 0 | 0.0000 | Arg |
 | | AGU | 0.02 | 0.11 | 0 | 0 | 0.0000 | Ser |
 | | AGC | 0.02 | 0.11 | 0 | 0 | 0.0000 | Ser |
 | | AGA | 0.02 | 0.11 | 0 | 0 | 0.0000 | Arg |
5 | xxx, 27 | UUU | 0.02 | 0.04 | 0 | 0 | 0.0000 | Phe |
 | | UCU | 0.02 | 0.04 | 0 | 0 | 0.0000 | Ser |
 | | UAU | 0.02 | 0.04 | 0 | 0 | 0.0000 | Tyr |
 | | UUC | 0.02 | 0.04 | 0 | 0 | 0.0000 | Phe |
 | | UCC | 0.02 | 0.04 | 0 | 0 | 0.0000 | Ser |
 | | UUA | 0.02 | 0.04 | 0 | 0 | 0.0000 | Leu |
 | | UCA | 0.02 | 0.04 | 0 | 0 | 0.0000 | Ser |
 | | UAA | 0.02 | 0.04 | 0 | 0 | 0.0000 | Ter |
 | | CUU | 0.02 | 0.04 | 0 | 0 | 0.0000 | Leu |
 | | CCU | 0.02 | 0.04 | 0 | 0 | 0.0000 | Pro |
 | | CAU | 0.02 | 0.04 | 0 | 0 | 0.0000 | His |
 | | CUC | 0.02 | 0.04 | 0 | 0 | 0.0000 | Leu |
 | | CCC | 0.02 | 0.04 | 0 | 0 | 0.0000 | Pro |
 | | CAC | 0.02 | 0.04 | 0 | 0 | 0.0000 | His |
 | | CUA | 0.02 | 0.04 | 0 | 0 | 0.0000 | Leu |
 | | CCA | 0.02 | 0.04 | 0 | 0 | 0.0000 | Pro |
 | | CAA | 0.02 | 0.04 | 0 | 0 | 0.0000 | Gln |
 | | AUU | 0.02 | 0.04 | 0 | 0 | 0.0000 | Ile |
 | | ACU | 0.02 | 0.04 | 0 | 0 | 0.0000 | Thr |
 | | AAU | 0.02 | 0.04 | 0 | 0 | 0.0000 | Asn |
 | | AUC | 0.02 | 0.04 | 0 | 0 | 0.0000 | Ile |
 | | ACC | 0.02 | 0.04 | 0 | 0 | 0.0000 | Thr |
 | | AAC | 0.02 | 0.04 | 0 | 0 | 0.0000 | Asn |
 | | AUA | 0.02 | 0.04 | 0 | 0 | 0.0000 | Ile |
 | | ACA | 0.02 | 0.04 | 0 | 0 | 0.0000 | Thr |
 | | AAA | 0.02 | 0.04 | 0 | 0 | 0.0000 | Lys |
Abbreviations
𝛾: General probability of a codon (𝛾 = 1/64 ≅ 0.02)
𝜃: Probability of codon within a group (𝜃 = {0.04, 0.33, 0.11, 1.00})
𝛿: Number of distinct codon sets that could complete 'GGGG' (𝛿 = {0, 1, 2, 6})
Ω: Number of adjacent positions that contain 𝛿 (Ω = {0, 1, 2})
𝛼: Threshold for selecting codons that may favour G-quadruplex formation
x: Codon specific generic ribonucleotide {𝐴, 𝐺, 𝑈, 𝐶}
aa: Amino acid
Ter: Stop codons {𝑈𝐴𝐺, 𝑈𝐺𝐴, 𝑈𝐴𝐴}
An estimate of the possible combinations of the simplest peptide, GlyzGlyzGlyzGly (i = length(pTG4ij) = 7 aa; z∈Z), gives J = (20)^3 = 8.00E+03 members (Figures 3 and 4, Table 2). This justifies usage of PTG4 (pTG4ij∈PTG4) as a generic representation of the putative peptidome encoded by the TG4. Approximately 12% (n = 11) of the in silico translated amino acid sequences from exon-derived TG4 possess one or more "STOP" signals; these include ESR1 and the longer RNA variants of PRNP (85 nt) and BCL2 (29 nt, 33 nt, 34 nt) (Table 3; Table 1, Supplementary Text 2). With the exceptions of KCNH2/ZNF669 and the shorter variants of PRNP (14 nt, 15 nt, 20 nt, 24 nt), "VALID" sub sequences are found for BACE1, BCL2, ESR1, PRNP (long) and TERF2 (Table 3; Tables 1A and 1C). Interestingly, all the genes considered possessed at least one occurrence of PTG4 (P = 100%, n = 6) (Table 3; Table 1B). This finding, despite the small sample size, is proof-of-principle that the TG4 can be mapped to definite peptide sequences, i.e., TG4~PTG4. Since this can occur only after a ribosomal read-through of the G4-containing mRNA, it raises the intriguing possibility that PTG4, when part of a larger protein, may increase its propensity to undergo misfolding. This notion is investigated in non-redundant sequences of IDRs (PTG4~10%, n = 145; 0.00≤p-value≤0.20) and IDPs (PTG4~34%, n = 269; 0.00≤p-value < 0.5) (Table 4; Tables 2 and 3).
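The count quoted for the simplest peptide in the paragraph above can be reproduced by direct enumeration. The sketch below assumes that each of the three loop positions z ranges independently over the 20 standard amino acids (one-letter codes); the generator name `simplest_peptides` is hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids, one-letter codes


def simplest_peptides():
    """Yield every 7-aa peptide of the form Gly-z-Gly-z-Gly-z-Gly."""
    for z1, z2, z3 in product(AMINO_ACIDS, repeat=3):
        yield f"G{z1}G{z2}G{z3}G"


if __name__ == "__main__":
    peptides = list(simplest_peptides())
    print(len(peptides))      # 20 ** 3 combinations
    print(peptides[:3])
```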
Group | aa | CODamino | gCOD+amino | β |
Group 1 (n=7) | Ala | 4 | 4 | 1.00 |
 | Val | 4 | 4 | 1.00 |
 | Asp | 2 | 2 | 1.00 |
 | Glu | 2 | 2 | 1.00 |
 | Trp | 1 | 1 | 1.00 |
 | Met | 1 | 1 | 1.00 |
 | Gly | 4 | 4 | 1.00 |
Group 2 (n=7) | Leu | 6 | 2 | 0.3333 |
 | Gln | 2 | 1 | 0.5 |
 | Arg | 6 | 2 | 0.3333 |
 | Lys | 2 | 1 | 0.5 |
 | Ser | 6 | 1 | 0.1667 |
 | Thr | 4 | 1 | 0.25 |
 | Pro | 4 | 1 | 0.25 |
Group 3 (n=6) | Cys | 2 | 0 | 0.00 |
 | Asn | 2 | 0 | 0.00 |
 | Ile | 3 | 0 | 0.00 |
 | His | 2 | 0 | 0.00 |
 | Phe | 2 | 0 | 0.00 |
 | Tyr | 2 | 0 | 0.00 |
Abbreviations
gCOD+amino : Guanine-containing optimal codons excluding STOP (UAG) (𝛼 > 0.000)
COD−amino : Non-optimal codon excluding STOP (UGA, UAA) (𝛼 = 0.000)
𝑪𝑶𝑫𝒂𝒎𝒊𝒏𝒐 = gCOD+amino + COD−amino : All codons for an amino acid
GENE | NAME | G4 (nt) | Ex | STOP (n=11) | VALID (n=59) | |PTG4| |
BACE1 | Beta-secretase 1 | 33 | 3 | n=0 | n=6 | n=2 |
BCL2 | B-cell lymphoma 2 | 33 | 2 | n=1 | n=6 | n=1 |
 | | 23 | | n=0 | n=6 | |
 | | 28 | | n=0 | n=6 | |
 | | 29 | | n=1 | n=5 | |
 | | 34 | | n=1 | n=5 | |
 | | 33 | 3 | n=2 | n=5 | |
ESR1 | Estrogen receptor alpha (ERα) | 36 | 4 | n=1 | n=5 | n=2 |
KCNH2 | Potassium Voltage-Gated Channel sub family H Member 2 | 18 | 12 | n=0 | n=0 | NA |
ZNF669 | Zinc Finger Protein 669 | | 1 | | | |
PRNP | Prion protein | 14 | 2 | n=0 | n=0 | n=1 |
 | | 15 | | n=0 | n=0 | |
 | | 20 | | n=0 | n=0 | |
 | | 24 | | n=0 | n=6 | |
 | | 85 | | n=6 | n=3 | |
TERF2 | Telomeric repeat-binding factor 2 | 55 | 1 | n=0 | n=6 | n=1 |
Disordered regions (IDRs; n=1445; 0.00≤p-value < 0.05)
Motif | SL-PT- | SL-PT+ | SL+PT- | SL+PT+ | R1T | R2T | C1T | C2T | A (%) | P (%) | R (%) |
SLiMS1 | 1078 | 64 | 58 | 9 | 1142 | 67 | 1136 | 73 | 89.90 | 12.32 | 13.43 |
SLiMS2 | 749 | 18 | 121 | 29 | 767 | 150 | 870 | 47 | 84.84 | 61.70 | 19.33 |
SLiMS3 | 1212 | 108 | 34 | 9 | 1320 | 43 | 1246 | 117 | 89.58 | 7.69 | 20.93 |
Proteins with disordered segments (IDPs; n=800; 0.00≤p-value < 0.05)
Motif | SL-PT- | SL-PT+ | SL+PT- | SL+PT+ | R1T | R2T | C1T | C2T | A (%) | P (%) | R (%) |
SLiMS1 | 86 | 12 | 28 | 18 | 98 | 46 | 114 | 30 | 72.22 | 60.00 | 39.10 |
SLiMS2 | 1 | 1 | 96 | 66 | 2 | 162 | 97 | 67 | 40.85 | 98.50 | 40.74 |
SLiMS3 | 250 | 57 | 26 | 25 | 307 | 51 | 276 | 82 | 76.81 | 30.48 | 49.01 |
Abbreviations
𝐼𝐷𝑅𝑠: Intrinsically disordered regions
𝐼𝐷𝑃s: Intrinsically disordered proteins
𝑧: Any amino acid
𝑆𝐿𝑖𝑀𝑆1: [𝑆𝑇]𝑃𝑧𝑅
𝑆𝐿𝑖𝑀𝑆2: [𝐸𝐷]𝑧𝑧[𝐷𝐸][𝐴𝐺𝑆]
𝑆𝐿𝑖𝑀𝑆3: [𝐾𝑅]𝑧𝑃𝑧𝑧𝑃
𝑆𝐿−𝑃𝑇−: |𝑺𝑵𝑬𝑮 ∩ 𝑷𝑵𝑬𝑮|
𝑆𝐿−𝑃𝑇+: |𝑺𝑵𝑬𝑮 ∩ 𝑷𝑷𝑶𝑺|
𝑆𝐿+𝑃𝑇−: |𝑺𝑷𝑶𝑺 ∩ 𝑷𝑵𝑬𝑮|
𝑆𝐿+𝑃𝑇+: |𝑺𝑷𝑶𝑺 ∩ 𝑷𝑷𝑶𝑺|
𝑅1𝑇: 𝑆𝐿−𝑃𝑇− + 𝑆𝐿−𝑃𝑇+
𝑅2𝑇: 𝑆𝐿+𝑃𝑇− + 𝑆𝐿+𝑃𝑇+
𝐶1𝑇: 𝑆𝐿−𝑃𝑇−+ 𝑆𝐿+𝑃𝑇−
𝐶2𝑇: 𝑆𝐿−𝑃𝑇+ + 𝑆𝐿+𝑃𝑇+
𝐴: Accuracy
𝑃: Precision
𝑅: Recall
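The three short linear motifs listed in the abbreviations above can be matched with regular expressions. The sketch below is an illustration only and does not reproduce the ScanProsite search: `z` is expanded to the 20 standard amino acids, a lookahead is used so that overlapping matches are reported, and the names `SLIMS` and `find_motif` are hypothetical.

```python
import re

ANY_RESIDUE = "[ACDEFGHIKLMNPQRSTVWY]"  # expansion of the wildcard 'z'

SLIMS = {
    "SLiMS1": "[ST]PzR",
    "SLiMS2": "[ED]zz[DE][AGS]",
    "SLiMS3": "[KR]zPzzP",
}


def find_motif(pattern: str, sequence: str) -> list:
    """Return (start, matched text) for every, possibly overlapping, occurrence."""
    regex = re.compile("(?=(" + pattern.replace("z", ANY_RESIDUE) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence.upper())]


if __name__ == "__main__":
    toy_sequence = "MSPARTPGREAADGKAPAAP"  # made-up sequence, for illustration only
    for name, pattern in SLIMS.items():
        print(name, find_motif(pattern, toy_sequence))
```

Start positions are zero-based. A sequence with at least one hit for a motif would count as SL+ for that motif, and one with none as SL−.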
The amino acids that comprise the peptide members of PTG4 and the short linear motifs (g1, g2 vs SLiMS) are well conserved. The co-occurrence of PTG4 with SLiMS in the IDRs (A~85-89%; 0.00 < p-value≤0.05) suggests that this association is non-trivial and may favor all purported mechanisms of misfolding (hyperphosphorylation, proteolytic cleavage, complex formation) (Table 4; Tables 2 and 3). However, the higher precision of PTG4 with the proteolytic-SLiMS suggests that this mechanism may predominate (Table 4; Tables 2 and 3). The data with the IDPs suggest a similar predilection for proteolytic cleavage (A~40-77%; P~99%; 0.00 < p-value < 0.05), although hyperphosphorylation (P~60%; 0.00 < p-value < 0.05) and complex-promotion (P~30%; 0.00 < p-value < 0.05) may constitute viable alternatives to the genesis of misfolding (Table 4; Tables 2 and 3). The presence of overlapping sequences of amino acids between PTG4 and the SLiMS, when examined in protein sequences from taxonomically diverse organisms, is degenerate for SLiMS1 (number of matches = 6251) and SLiMS3 (number of matches = 1480) (Table 5; Table 4). In contrast, the corresponding data for SLiMS2 (number of matches = 3759; 0.00 < p-value < 0.05) is statistically significant (Table 5). The taxonomic spread includes archaea (n = 150), bacteria (n = 1735), viruses (n = 84), green land plants (n = 199), fungi (n = 182), eukaryotic invertebrates (n = 43) and vertebrates (n = 700) (Table 4).
SLiMS | Sample | (zn)n≥2∈pTG4ij∩SLiMSw (p-value) | |
SLiMS1=[ST]PzR | Pz | PG (n=1) | PG (Degenerate) |
SLiMS2=[ED]zzD[AGS] | z[DE] | G[DE] (n=2) | [WGRVAELMKQSTP][AG][DE]z2EG[VADE] (p-value=0.00069) |
 | [DE]z | [DE]G (n=1) | |
 | zz[DE] | [LMKQSTP]G[DE] (n=14); [VAE]G[DE] (n=6); [WGR][AG][DE] (n=6) | |
 | [DE]zz[DE] | [VAE]G[DE]zzEG[VADE] (n=28); [WGR][AG][DE]zzEG[VADE] (n=24); [WGR][AG][DE]zzEG (n=6); GEzzEG[VADE] (n=4); GEzzEG (n=1) | |
SLiMS3=[KR]zPzzP | Pzz | PG[VADE] (n=4) | PGV (Degenerate); PGA (Degenerate); PGD (Degenerate); PGE (Degenerate) |
Abbreviations
𝑝𝑇𝐺4𝑖𝑗: Members of putative peptidome (𝑝𝑇𝐺4𝑖𝑗 ∈ 𝑷𝑻𝑮𝟒)
𝑆𝐿𝑖𝑀𝑆𝑤: Short linear motifs (𝑆𝐿𝑖𝑀𝑆𝑤 ∈ 𝑺𝑳𝒊𝑴𝑺)
𝑧𝑛: Shared sequence(s) of amino acids between 𝑷𝑻𝑮𝟒 and 𝑺𝑳𝒊𝑴𝑺
𝑖, 𝑗, 𝑤, 𝑛: Indices to characterize members of 𝑷𝑻𝑮𝟒, 𝑺𝑳𝒊𝑴𝑺, 𝒁
The significant association and homology between PTG4 and the SLiMS along with the equivalence data (PTG4~TG4) suggest that TG4 may influence proteostasis in a multitude of ways (Tables 1–5; Tables 1–4, Supplementary Text 2–4).
The short TG4 modeled in this study has an average loop length (h~2 Mer) which may contribute to thermodynamic stability by restricting the mobility of the participating strands (1) [8,9,10,11]. The physical presence of TG4 will result in a stalled ribosome and in translation that is prolonged, inefficient and incomplete [31,32,33]. Interestingly, this analysis also includes UAG (Amber; α > 0.0000), which, when present in-frame, will prematurely terminate translation and result in a truncated protein (Table 1) [39]. Whilst nonsense-mediated mRNA decay may be triggered if the stop codon is within ±50 Mer of the exon-junction complex (EJC), a read-through may occur nonetheless. The resulting protein sequences may be modified, which, in tandem with one or more occurrences of PTG4 and/or SLiMS, would predispose them to aggregate and result in a proteopathy [39,40].
Whilst the preponderance of Glycine (Gly) might impart heightened flexibility and limit the formation of stabilizing secondary structural elements in the hypothetical protein, Proline (Pro) confers rigidity and may retard proper folding. There is also remarkable conservation between the amino acids that comprise PTG4 and the SLiMS. These include the complex-promoting hydrophobic (Ala, Val, Met, Trp) and ionic (Asp, Glu, Lys, Arg) residues, along with nucleophile-favoring Serine and Threonine (Figures 3 and 4, Tables 2–5). Whilst the former may favor aggregation by non-covalent interactions, the latter may promote phosphorylation-mediated charge imbalance and hence misfolding. Interestingly, the loops of G4, when modeled by Adenine-containing codons (Axx), are translated to Lysine (K), Arginine (R), Serine (S), Threonine (T) and Isoleucine (I), all of which may also promote misfolding (Figures 3 and 4, Tables 2–5) [8,9,10,11,34,35]. The distribution of PTG4 amongst physiologically relevant proteins further suggests that the peptide-mediated misfolding may influence/regulate signal transduction, cytoskeleton organization, metabolism, synaptic transmission and transcription/translation (Table 6; Table 5).
Cellular function | Disordered regions of proteins | |
1. | Signal transduction | DP00274, DP00224, DP00141, DP00332, DP01063, DP00506, DP00418, DP00341, DP00435, DP00613, DP00463, DP00954, DP00959, DP01104, DP00611, DP00519, DP00086, DP00707, DP00712 |
2. | Endocytosis | DP01073, DP01065, DP01066, DP00225 |
3. | Calcium-calmodulin | DP00092, DP00132, DP00561, DP00118, DP00253 |
4. | Myofibril assembly | DP01090 |
5. | Cytoskeleton | DP01056, DP00240, DP01022, DP00169, DP00716, DP00717, DP01100, DP00122 |
6. | Nuclear pore | DP01075, DP01077, DP01079 |
7. | Phototransduction | DP00768, DP00347 |
8. | Targeting | DP00893, DP00609, DP00610, DP01058 |
9. | Transcription | DP00062, DP00177, DP00633, DP00348, DP00786, DP00049, DP00231, DP00873, DP00720, DP00217, DP00081 |
10. | Translation | DP00082, DP00164, DP00229, DP00949, DP00134 |
11. | Synaptic transmission | DP00943 |
12. | Supercoiling | DP00076 |
13. | Binding | DP00539, DP00854, DP01052, DP00659, DP00656 |
14. | Peptide bond formation | DP00944 |
15. | Enzymes | DP00557, DP00032, DP00095, DP00337, DP00379, DP00787, DP00427, DP00429 |
16. | Bacterial/parasitic virulence | |
Secreted toxins | DP00345, DP00591 | |
Cytoadherence | DP00025, DP00065, DP01096 | |
17. | Viral infectivity | |
Cyclophilin interaction | DP00615, DP01031 | |
Chaperones | DP00699, DP00700, DP00674 | |
Capsid assembly | DP00133, DP00876 | |
Membrane fusion | DP01043 | |
Latency | DP01060 | |
18. | Unknown | DP00119 |
Note: DP≔DisProt ID
The distribution of overlapping/shared amino acids in protein sequences of non-vertebrates suggests that PTG4 is either completely degenerate with the SLiMS or present in proportions that are statistically significant (Tables 5 and 6; Tables 4 and 5). These data imply that motif-mimicry, too, might constitute a probable cause (tropism, oncogenic potential, virulence) of infection/infestation-mediated acute/chronic proteopathies [34,35,41,42]. The contribution(s) of misfolding to the pathogenesis of secondary proteopathies is, however, debatable. Whilst there is evidence that mislocalization of proteins can precipitate misfolding, mimicry itself may result in exonuclease-mediated proteolytic cleavage and thereby trigger an infective proteopathy [43,44]. Additionally, the presence of sequences of amino acids such as Proline and Threonine in viral or fungal proteins may be responsible for creating and/or maintaining a milieu conducive to the genesis of infective/transmissible proteopathies, viz., a high charge density and imbalance of electrostatic interactions [43,44].
The coexistence of potentially translatable G-quadruplexes (TG4) with unfolded ribonucleotides in the PCS of an mRNA transcript may have important consequences for protein homeostasis. Here, I have investigated the contribution of a short intra-strand translatable G-quadruplex and its associated peptidome (TG4~PTG4) to the genesis of misfolding-induced proteostasis. The co-occurrence, homology and distribution of overlapping/shared amino acids of PTG4 with the SLiMS suggest that this may occur by truncation, complex formation, increased charge density and/or accelerated degradation. A further mechanism supported by these data is motif-mimicry by pathogens, which may trigger the development of infective proteopathies. The putative peptidome (~7–20 aa) that corresponds to the short translatable G-quadruplex delineated by this investigation may be utilized as a source of novel markers of both primary and secondary proteopathies.
SK outlined and designed the study, designed and conceptualized the algorithm(s) and formulae for prediction, wrote mathematical proofs to establish rigor, collated the data, constructed the models, formulated the filters, carried out the computational analysis, wrote all necessary code and the manuscript.
The author declares no conflict of interest.
[1] | Pocketbook FS (2015) World food and agriculture. FAO, Rome, Italy. |
[2] | Ashraf M, Athar HR, Harris PJC, et al. (2008) Some prospective strategies for improving crop salt tolerance. Adv Agron 97: 45-110. doi: 10.1016/S0065-2113(07)00002-8 |
[3] | Ali MA, Naveed M, Mustafa A, et al. (2017) The good, the bad, and the ugly of rhizosphere microbiome. Probiotics Plant Health Singapore: Springer, 253-290. doi: 10.1007/978-981-10-3473-2_11 |
[4] | Munns R, Tester M (2008) Mechanisms of salinity tolerance. Annu Rev Plant Biol 59: 651-681. doi: 10.1146/annurev.arplant.59.032607.092911 |
[5] | Hasegawa PM, Bressan RA, Zhu JK, et al. (2000) Plant cellular and molecular responses to high salinity. Annu Rev Plant Physiol Plant Mol Biol 51: 463-499. doi: 10.1146/annurev.arplant.51.1.463 |
[6] | Tabatabaei S, Ehsanzadeh P (2016) Photosynthetic pigments, ionic and antioxidative behaviour of hulled tetraploid wheat in response to NaCl. Photosynthetica 54: 340-350. doi: 10.1007/s11099-016-0083-3 |
[7] | Munns R (2002) Comparative physiology of salt and water stress. Plant Cell Environ 25: 239-250. doi: 10.1046/j.0016-8025.2001.00808.x |
[8] | Shrivastava P, Kumar R (2015) Soil salinity: A serious environmental issue and plant growth promoting bacteria as one of the tools for its alleviation. Saudi J Biol Sci 22: 123-131. doi: 10.1016/j.sjbs.2014.12.001 |
[9] | Turan M, Yildirim E, Kitir N, et al. (2017) Beneficial role of plant growth-promoting bacteria in vegetable production under abiotic stress. Microbial Strategies for Vegetable Production 151-166. doi: 10.1007/978-3-319-54401-4_7 |
[10] | Gouda S, Kerry RG, Das G, et al. (2018) Revitalization of plant growth promoting rhizobacteria for sustainable development in agriculture. Microbiol Res 206: 131-140. doi: 10.1016/j.micres.2017.08.016 |
[11] | Nagargade M, Tyagi V, Singh MK (2018) Plant growth-promoting rhizobacteria: a biological approach toward the production of sustainable agriculture. Role of Rhizospheric Microbes in Soil Singapore: Springer, 205-223. doi: 10.1007/978-981-10-8402-7_8 |
[12] | Bhattacharyya PN, Jha DK (2012) Plant growth-promoting rhizobacteria (PGPR): emergence in agriculture. World J Microbiol Biotechnol 28: 1327-1350. doi: 10.1007/s11274-011-0979-9 |
[13] | Egamberdieva D, Lugtenberg B (2014) Use of plant growth-promoting rhizobacteria to alleviate salinity stress in plants. Use of Microbes for the Alleviation of Soil Stresses New York: Springer, 73-96. doi: 10.1007/978-1-4614-9466-9_4 |
[14] | Shameer S, Prasad TNVKV (2018) Plant growth promoting rhizobacteria for sustainable agricultural practices with special reference to biotic and abiotic stresses. Plant Growth Regul 84: 603-615. doi: 10.1007/s10725-017-0365-1 |
[15] | Yao L, Wu Z, Zheng Y, et al. (2010) Growth promotion and protection against salt stress by Pseudomonas putida Rs-198 on cotton. Eur J Soil Biol 46: 49-54. doi: 10.1016/j.ejsobi.2009.11.002 |
[16] | Dasgupta D, Ghati A, Sarkar A, et al. (2015) Application of plant growth promoting rhizobacteria (PGPR) isolated from the rhizosphere of Sesbania bispinosa on the Growth of Chickpea (Cicer arietinum L.). Int J Curr Microbiol App Sci 4: 1033-1042. |
[17] | Singh RP, Shelke GM, Kumar A, et al. (2015) Biochemistry and genetics of ACC deaminase: a weapon to ‘stress ethylene’ produced in plants. Front Microbiol 6: 937. |
[18] | Bharti N, Barnawal D (2019) Amelioration of salinity stress by PGPR: ACC deaminase and ROS scavenging enzymes activity. PGPR Amelioration in Sustainable Agriculture Woodhead Publishing, 85-106. doi: 10.1016/B978-0-12-815879-1.00005-7 |
[19] | Ramadoss D, Lakkineni VK, Bose P, et al. (2013) Mitigation of salt stress in wheat seedlings by halotolerant bacteria isolated from saline habitats. SpringerPlus 2: 6. doi: 10.1186/2193-1801-2-6 |
[20] | Marques APGC, Pires C, Moreira H, et al. (2010) Assessment of the plant growth promotion abilities of six bacterial isolates using Zea mays as indicator plant. Soil Biol Biochem 42: 1229-1235. doi: 10.1016/j.soilbio.2010.04.014 |
[21] | Ji SH, Gururani MA, Chun SC (2014) Isolation and characterization of plant growth promoting endophytic diazotrophic bacteria from Korean rice cultivars. Microbiol Res 169: 83-98. doi: 10.1016/j.micres.2013.06.003 |
[22] | Dell'Amico E, Cavalca L, Andreoni V (2005) Analysis of rhizobacterial communities in perennial Graminaceae from polluted water meadow soil, and screening of metal-resistant, potentially plant growth-promoting bacteria. FEMS Microbiol Ecol 52: 153-162. doi: 10.1016/j.femsec.2004.11.005 |
[23] | Duan J, Müller KM, Charles TC, et al. (2009) 1-Aminocyclopropane-1-Carboxylate (ACC) Deaminase Genes in Rhizobia from Southern Saskatchewan. Microb Ecol 57: 423-436. doi: 10.1007/s00248-008-9407-6 |
[24] | Bric JM, Bostock RM, Silverstone SE (1991) Rapid in situ assay for indoleacetic acid production by bacteria immobilized on a nitrocellulose membrane. Appl Environ Microbiol 57: 535-538. doi: 10.1128/AEM.57.2.535-538.1991 |
[25] | Adler J (1966) Effect of amino acids and oxygen on chemotaxis in Escherichia coli. J Bacteriol 92: 121-129. doi: 10.1128/JB.92.1.121-129.1966 |
[26] | Al Shorouk City, Egypt Weather History|Weather Underground. Available from: https://www.wunderground.com/history/monthly/eg/al-shorouk-city/HECA/date/2019-5. |
[27] | Minolta K (1989) Chlorophyll meter SPAD-502 instruction manual. Available from: https://www.konicaminolta.com/instruments/download/catalog/color/pdf/spad502plus_catalog_eng.pdf. |
[28] | Liu X, Huang B (2000) Heat stress injury in relation to membrane lipid peroxidation in creeping bentgrass. Crop Sci 40: 503-510. doi: 10.2135/cropsci2000.402503x |
[29] | Goudarzi M, Pakniyat H (2008) Evaluation of wheat cultivars under salinity stress based on some agronomic and physiological traits. J Agri Soc Sci 4: 4. |
[30] | Steel RG (1997) Principles and procedures of statistics: a biometrical approach. |
[31] | Falconer DS (1989) Introduction to quantitative genetics, 3rd ed. Harlow: Longman Sci Tech. |
[32] | Nadarajan N (2005) Quantitative genetics and biometrical techniques in plant breeding Kalyani Publishers. |
[33] | Allard RW (1999) Principles of plant breeding John Wiley & Sons. |
[34] | Johnson H (1955) Estimates of genetic and environmental variability in soybeans. Agron J 47: 314-318. doi: 10.2134/agronj1955.00021962004700070009x |
[35] | Miles AG (1992) Biological Nitrogen Fixation Springer Science & Business Media. |
[36] | Anzai Y, Kim H, Park JY, et al. (2000) Phylogenetic affiliation of the pseudomonads based on 16S rRNA sequence. Int J Syst Evol Microbiol 50: 1563-1589. doi: 10.1099/00207713-50-4-1563 |
[37] | Solanki MK, Wang Z, Wang FY, et al. (2017) Intercropping in sugarcane cultivation influenced the soil properties and enhanced the diversity of vital diazotrophic bacteria. Sugar Tech 19: 136-147. doi: 10.1007/s12355-016-0445-y |
[38] | Santi C, Bogusz D, Franche C (2013) Biological nitrogen fixation in non-legume plants. Ann Bot 111: 743-767. doi: 10.1093/aob/mct048 |
[39] | Carvalho TLG, Ballesteros HGF, Thiebaut F, et al. (2016) Nice to meet you: genetic, epigenetic and metabolic controls of plant perception of beneficial associative and endophytic diazotrophic bacteria in non-leguminous plants. Plant Mol Biol 90: 561-574. doi: 10.1007/s11103-016-0435-1 |
[40] | Cassán F, Vanderleyden J, Spaepen S (2014) Physiological and agronomical aspects of phytohormone production by model plant-growth-promoting rhizobacteria (PGPR) belonging to the genus Azospirillum. J Plant Growth Regul 33: 440-459. doi: 10.1007/s00344-013-9362-4 |
[41] | Verma JP, Jaiswal DK, Krishna R, et al. (2018) Characterization and screening of thermophilic Bacillus strains for developing plant growth promoting consortium from hot spring of Leh and Ladakh Region of India. Front Microbiol 9: 1293. doi: 10.3389/fmicb.2018.01293 |
[42] | Hussain S, Khaliq A, Matloob A, et al. (2013) Germination and growth response of three wheat cultivars to NaCl salinity. Plant Soil Environ 31: 36-43. |
[43] | Wang C, Knill E, Glick BR, et al. (2000) Effect of transferring 1-aminocyclopropane-1-carboxylic acid (ACC) deaminase genes into Pseudomonas fluorescens strain CHA0 and its gacA derivative CHA96 on their growth-promoting and disease-suppressive capacities. Can J Microbiol 46: 898-907. doi: 10.1139/w00-071 |
[44] | Glick BR, Jacobson CB, Schwarze MMK, et al. (1994) 1-Aminocyclopropane-1-carboxylic acid deaminase mutants of the plant growth promoting rhizobacterium Pseudomonas putida GR12-2 do not stimulate canola root elongation. Can J Microbiol 40: 911-915. doi: 10.1139/m94-146 |
[45] | Mayak S, Tirosh T, Glick BR (2004) Plant growth-promoting bacteria confer resistance in tomato plants to salt stress. Plant Physiol Biochem 42: 565-572. doi: 10.1016/j.plaphy.2004.05.009 |
[46] | Kaya MD, Ipek A (2003) Effects of different soil salinity levels on germination and seedling growth of safflower (Carthamus tinctorius L.). Turk J Agric For 27: 221-227. |
[47] | El-Shraiy AM, Hegazi AM, Hikal MS (2016) Nodule formation, antioxidant enzymes activities and other biochemical changes in salt stressed faba bean plants treated with glycine betaine, arbuscular mycorrhiza fungi and yeast extract. Middle East J Appl Sci 6: 1076-1099. |
[48] | Egamberdieva D (2009) Alleviation of salt stress by plant growth regulators and IAA producing bacteria in wheat. Acta Physiol Plant 31: 861-864. doi: 10.1007/s11738-009-0297-0 |
[49] | Tiwari S, Singh P, Tiwari R, et al. (2011) Salt-tolerant rhizobacteria-mediated induced tolerance in wheat (Triticum aestivum) and chemical diversity in rhizosphere enhance plant growth. Biol Fertil Soils 47: 907. doi: 10.1007/s00374-011-0598-5 |
[50] | Ansari FA, Ahmad I (2018) Plant growth promoting attributes and alleviation of salinity stress to wheat by biofilm forming Brevibacterium sp. FAB3 isolated from rhizospheric soil. Saudi J Biol Sci. |
[51] | Gange AC, Gadhave KR (2018) Plant growth-promoting rhizobacteria promote plant size inequality. Sci Rep 8: 13828. doi: 10.1038/s41598-018-32111-z |
[52] | Glick BR, Penrose DM, Li J (1998) A model for the lowering of plant ethylene concentrations by plant growth-promoting bacteria. J Theor Biol 190: 63-68. doi: 10.1006/jtbi.1997.0532 |
[53] | Saravanakumar D, Samiyappan R (2007) ACC deaminase from Pseudomonas fluorescens mediated saline resistance in groundnut (Arachis hypogea) plants. J Appl Microbiol 102: 1283-1292. doi: 10.1111/j.1365-2672.2006.03179.x |
[54] | Etesami H, Beattie GA (2018) Mining halophytes for plant growth-promoting halotolerant bacteria to enhance the salinity tolerance of non-halophytic crops. Front Microbiol 9: 148. doi: 10.3389/fmicb.2018.00148 |
[55] | Pushpavalli R, Quealy J, Colmer TD, et al. (2016) Salt stress delayed flowering and reduced reproductive success of chickpea (Cicer arietinum L.), a response associated with Na+ accumulation in leaves. J Agron Crop Sci 202: 125-138. doi: 10.1111/jac.12128 |
[56] | Pirasteh-Anosheh H, Ranjbar G, Pakniyat H, et al. (2015) Physiological mechanisms of salt stress tolerance in plants. Plant-Environment Interaction John Wiley & Sons, Ltd, 141-160. doi: 10.1002/9781119081005.ch8 |
[57] | El-Hendawy SE, Hu Y, Schmidhalter U (2007) Assessing the suitability of various physiological traits to screen wheat genotypes for salt tolerance. J Integr Plant Biol 49: 1352-1360. doi: 10.1111/j.1744-7909.2007.00533.x |
[58] | Khan MA (2009) Role of proline, K/Na ratio and chlorophyll content in salt tolerance of wheat (Triticum aestivum L.). Pak J Bot 41: 633-638. |
[59] | Arkus KAJ, Cahoon EB, Jez JM (2005) Mechanistic analysis of wheat chlorophyllase. Arch Biochem Biophys 438: 146-155. doi: 10.1016/j.abb.2005.04.019 |
[60] | Mishra S, Tyagi A, Singh IV, et al. (2006) Changes in lipid profile during growth and senescence of Catharanthus roseus leaf. Braz J Plant Physiol 18: 447-454. doi: 10.1590/S1677-04202006000400002 |
[61] | Kaur P, Kaur J, Kaur S, et al. (2014) Salinity induced physiological and biochemical changes in chickpea (Cicer arietinum L.) genotypes. J Appl Nat Sci 6: 578-588. doi: 10.31018/jans.v6i2.500 |
[62] | Mahlooji M, Seyed Sharifi R, Razmjoo J, et al. (2018) Effect of salt stress on photosynthesis and physiological parameters of three contrasting barley genotypes. Photosynthetica 56: 549-556. doi: 10.1007/s11099-017-0699-y |
[63] | Harinasut P, Poonsopa D, Roengmongkol K, et al. (2003) Salinity effects on antioxidant enzymes in mulberry cultivar. Sci Asia 29: 109-113. doi: 10.2306/scienceasia1513-1874.2003.29.109 |
[64] | Katsuhara M, Otsuka T, Ezaki B (2005) Salt stress-induced lipid peroxidation is reduced by glutathione S-transferase, but this reduction of lipid peroxides is not enough for a recovery of root growth in Arabidopsis. Plant Sci 2: 369-373. doi: 10.1016/j.plantsci.2005.03.030 |
[65] | Aghaleh M, Niknam V (2009) Effect of salinity on some physiological and biochemical parameters in explants of two cultivars of soybean (Glycine max L.). J Phytol 1: 86-94. |
[66] | Sreenivasulu N, Ramanjulu S, Ramachandra-Kini K, et al. (1999) Total peroxidase activity and peroxidase isoforms as modified by salt stress in two cultivars of fox-tail millet with differential salt tolerance. Plant Sci 141: 1-9. doi: 10.1016/S0168-9452(98)00204-0 |
[67] | Nazar R, Iqbal N, Syeed S, et al. (2011) Salicylic acid alleviates decreases in photosynthesis under salt stress by enhancing nitrogen and sulfur assimilation and antioxidant metabolism differentially in two mungbean cultivars. J Plant Physiol 168: 807-815. doi: 10.1016/j.jplph.2010.11.001 |
[68] | Weisany W, Sohrabi Y, Heidari G, et al. (2012) Changes in antioxidant enzymes activity and plant performance by salinity stress and zinc application in soybean (Glycine max L.). Plant Omics J 5: 60-67. |
[69] | Zhang JL, Aziz M, Qiao Y, et al. (2014) Soil microbe Bacillus subtilis (GB03) induces biomass accumulation and salt tolerance with lower sodium accumulation in wheat. Crop Pasture Sci 65: 423-427. doi: 10.1071/CP13456 |
[70] | Fellahi Z, Hannachi A, Guendouz A, et al. (2013) Genetic variability, heritability and association studies in bread wheat (Triticum aestivum L.) genotypes. Electron J Plant Breed 4: 1161-1166. |
Rank | Codon set, Cardinality | Codon | γ | θ | δ | Ω | α = γ·θ·δ + Ω | aa |
1 | GGG, 1 | GGG | 0.02 | 1.00 | 6 | 2 | 2.1200 | Gly |
2 | GxG, 3 | GUG | 0.02 | 0.33 | 1 | 2 | 2.0066 | Val |
GCG | 0.02 | 0.33 | 1 | 2 | 2.0066 | Ala | ||
GAG | 0.02 | 0.33 | 1 | 2 | 2.0066 | Glu | ||
3 | xGG, 3 | UGG | 0.02 | 0.33 | 2 | 1 | 1.0132 | Trp |
CGG | 0.02 | 0.33 | 2 | 1 | 1.0132 | Arg | ||
AGG | 0.02 | 0.33 | 2 | 1 | 1.0132 | Arg | ||
3 | GGx, 3 | GGU | 0.02 | 0.33 | 2 | 1 | 1.0132 | Gly |
GGC | 0.02 | 0.33 | 2 | 1 | 1.0132 | Gly | ||
GGA | 0.02 | 0.33 | 2 | 1 | 1.0132 | Gly | ||
4 | xxG, 9 | UUG | 0.02 | 0.11 | 1 | 1 | 1.0022 | Leu |
UCG | 0.02 | 0.11 | 1 | 1 | 1.0022 | Ser | ||
UAG | 0.02 | 0.11 | 1 | 1 | 1.0022 | Ter | ||
CUG | 0.02 | 0.11 | 1 | 1 | 1.0022 | Leu | ||
CCG | 0.02 | 0.11 | 1 | 1 | 1.0022 | Pro | ||
CAG | 0.02 | 0.11 | 1 | 1 | 1.0022 | Gln | ||
AUG | 0.02 | 0.11 | 1 | 1 | 1.0022 | Met | ||
ACG | 0.02 | 0.11 | 1 | 1 | 1.0022 | Thr | ||
AAG | 0.02 | 0.11 | 1 | 1 | 1.0022 | Lys | ||
4 | Gxx, 9 | GUU | 0.02 | 0.11 | 1 | 1 | 1.0022 | Val |
GCU | 0.02 | 0.11 | 1 | 1 | 1.0022 | Ala | ||
GAU | 0.02 | 0.11 | 1 | 1 | 1.0022 | Asp | ||
GUC | 0.02 | 0.11 | 1 | 1 | 1.0022 | Val | ||
GCC | 0.02 | 0.11 | 1 | 1 | 1.0022 | Ala | ||
GAC | 0.02 | 0.11 | 1 | 1 | 1.0022 | Asp | ||
GUA | 0.02 | 0.11 | 1 | 1 | 1.0022 | Val | ||
GCA | 0.02 | 0.11 | 1 | 1 | 1.0022 | Ala | ||
GAA | 0.02 | 0.11 | 1 | 1 | 1.0022 | Glu | ||
5 | xGx, 9 | UGU | 0.02 | 0.11 | 0 | 0 | 0.0000 | Cys |
UGC | 0.02 | 0.11 | 0 | 0 | 0.0000 | Cys | ||
UGA | 0.02 | 0.11 | 0 | 0 | 0.0000 | Ter | ||
CGU | 0.02 | 0.11 | 0 | 0 | 0.0000 | Arg | ||
CGC | 0.02 | 0.11 | 0 | 0 | 0.0000 | Arg | ||
CGA | 0.02 | 0.11 | 0 | 0 | 0.0000 | Arg | ||
AGU | 0.02 | 0.11 | 0 | 0 | 0.0000 | Ser | ||
AGC | 0.02 | 0.11 | 0 | 0 | 0.0000 | Ser | ||
AGA | 0.02 | 0.11 | 0 | 0 | 0.0000 | Arg | ||
5 | xxx, 27 | UUU | 0.02 | 0.04 | 0 | 0 | 0.0000 | Phe |
UCU | 0.02 | 0.04 | 0 | 0 | 0.0000 | Ser | ||
UAU | 0.02 | 0.04 | 0 | 0 | 0.0000 | Tyr | ||
UUC | 0.02 | 0.04 | 0 | 0 | 0.0000 | Phe | ||
UCC | 0.02 | 0.04 | 0 | 0 | 0.0000 | Ser | ||
UAC | 0.02 | 0.04 | 0 | 0 | 0.0000 | Tyr | ||
UUA | 0.02 | 0.04 | 0 | 0 | 0.0000 | Leu | ||
UCA | 0.02 | 0.04 | 0 | 0 | 0.0000 | Ser | ||
UAA | 0.02 | 0.04 | 0 | 0 | 0.0000 | Ter | ||
CUU | 0.02 | 0.04 | 0 | 0 | 0.0000 | Leu | ||
CCU | 0.02 | 0.04 | 0 | 0 | 0.0000 | Pro | ||
CAU | 0.02 | 0.04 | 0 | 0 | 0.0000 | His | ||
CUC | 0.02 | 0.04 | 0 | 0 | 0.0000 | Leu | ||
CCC | 0.02 | 0.04 | 0 | 0 | 0.0000 | Pro | ||
CAC | 0.02 | 0.04 | 0 | 0 | 0.0000 | His | ||
CUA | 0.02 | 0.04 | 0 | 0 | 0.0000 | Leu | ||
CCA | 0.02 | 0.04 | 0 | 0 | 0.0000 | Pro | ||
CAA | 0.02 | 0.04 | 0 | 0 | 0.0000 | Gln | ||
AUU | 0.02 | 0.04 | 0 | 0 | 0.0000 | Ile | ||
ACU | 0.02 | 0.04 | 0 | 0 | 0.0000 | Thr | ||
AAU | 0.02 | 0.04 | 0 | 0 | 0.0000 | Asn | ||
AUC | 0.02 | 0.04 | 0 | 0 | 0.0000 | Ile | ||
ACC | 0.02 | 0.04 | 0 | 0 | 0.0000 | Thr | ||
AAC | 0.02 | 0.04 | 0 | 0 | 0.0000 | Asn | ||
AUA | 0.02 | 0.04 | 0 | 0 | 0.0000 | Ile | ||
ACA | 0.02 | 0.04 | 0 | 0 | 0.0000 | Thr | ||
AAA | 0.02 | 0.04 | 0 | 0 | 0.0000 | Lys |
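The scores in the table above can be reproduced with a short script. This is a minimal sketch, assuming that α = γ·θ·δ + Ω is applied per codon, that θ is the reciprocal of the codon-set cardinality rounded to two decimals, and that "x" denotes any non-G base; the function names (`g_pattern`, `cardinality`, `alpha`) are illustrative and not taken from the original analysis. The γ, δ and Ω values are copied from the table.

```python
from itertools import product

GAMMA = 0.02  # γ, as printed in the table

# (δ, Ω) for each G-pattern, copied from the table; "x" is any non-G base.
WEIGHTS = {
    "GGG": (6, 2), "GxG": (1, 2), "xGG": (2, 1), "GGx": (2, 1),
    "xxG": (1, 1), "Gxx": (1, 1), "xGx": (0, 0), "xxx": (0, 0),
}

def g_pattern(codon):
    """Map a codon to its G-pattern, e.g. 'GUG' -> 'GxG'."""
    return "".join("G" if base == "G" else "x" for base in codon)

def cardinality(pattern):
    """Number of codons sharing a pattern: three non-G choices per 'x'."""
    return 3 ** pattern.count("x")

def alpha(codon):
    """Score α = γ·θ·δ + Ω, with θ = 1/cardinality rounded to 2 decimals."""
    pattern = g_pattern(codon)
    delta, omega = WEIGHTS[pattern]
    theta = round(1 / cardinality(pattern), 2)
    return round(GAMMA * theta * delta + omega, 4)

# Score all 64 RNA codons.
scores = {"".join(c): alpha("".join(c)) for c in product("UCAG", repeat=3)}
```

Under these assumptions, `scores["GGG"]` evaluates to 2.12 and `scores["UGG"]` to 1.0132, in agreement with the tabulated values; the cardinalities 1, 3, 9 and 27 follow directly from the number of non-G positions.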
Group | aa | COD_amino | gCOD+_amino | β |
Group 1 (n=7) | Ala | 4 | 4 | 1.00 |
Val | 4 | 4 | 1.00 | |
Asp | 2 | 2 | 1.00 | |
Glu | 2 | 2 | 1.00 | |
Trp | 1 | 1 | 1.00 | |
Met | 1 | 1 | 1.00 | |
Gly | 4 | 4 | 1.00 | |
Group 2 (n=7) | Leu | 6 | 2 | 0.3333 |
Gln | 2 | 1 | 0.5 | |
Arg | 6 | 2 | 0.3333 | |
Lys | 2 | 1 | 0.5 | |
Ser | 6 | 1 | 0.1667 | |
Thr | 4 | 1 | 0.25 | |
Pro | 4 | 1 | 0.25 | |
Group 3 (n=6) | Cys | 2 | 0 | 0.00 |
Asn | 2 | 0 | 0.00 | |
Ile | 3 | 0 | 0.00 | |
His | 2 | 0 | 0.00 | |
Phe | 2 | 0 | 0.00 | |
Tyr | 2 | 0 | 0.00 |
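The grouping above is consistent with β being the fraction of an amino acid's codons that carry a non-zero score α, i.e. codons whose G-pattern is neither xGx nor xxx. The sketch below rests on that assumption and on the standard genetic code; the names `CODON_TABLE`, `ZERO_PATTERNS` and `beta_by_amino` are hypothetical, and amino acids are given in one-letter code.

```python
from itertools import product
from collections import defaultdict

BASES = "UCAG"
# Standard genetic code in UCAG order; '*' marks the three stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Patterns scored α = 0 in the codon table (assumption drawn from that table).
ZERO_PATTERNS = {"xGx", "xxx"}

def g_pattern(codon):
    """Map a codon to its G-pattern, e.g. 'CGG' -> 'xGG'."""
    return "".join("G" if base == "G" else "x" for base in codon)

def beta_by_amino():
    """Return {amino acid: (COD, gCOD+, β)} over the 61 sense codons."""
    total = defaultdict(int)
    scoring = defaultdict(int)
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # stop codons are not assigned to an amino acid
        total[aa] += 1
        if g_pattern(codon) not in ZERO_PATTERNS:
            scoring[aa] += 1
    return {aa: (total[aa], scoring[aa], scoring[aa] / total[aa]) for aa in total}

betas = beta_by_amino()
group1 = sorted(aa for aa, (_, _, b) in betas.items() if b == 1.0)
group2 = sorted(aa for aa, (_, _, b) in betas.items() if 0.0 < b < 1.0)
group3 = sorted(aa for aa, (_, _, b) in betas.items() if b == 0.0)
```

With this reading, arginine yields (6, 2, 0.333...) because only CGG and AGG match a scoring pattern, and the three groups contain 7, 7 and 6 amino acids respectively, as tabulated.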
GENE | NAME | G4 (nt) | Ex | STOP(n=11) | VALID(n=59) | |PTG4| |
BACE1 | Beta-secretase 1 | 33 | 3 | n=0 | n=6 | n=2 |
BCL2 | B-cell lymphoma 2 | 33 | 2 | n=1 | n=6 | n=1 |
23 | n=0 | n=6 | ||||
28 | n=0 | n=6 | ||||
29 | n=1 | n=5 | ||||
34 | n=1 | n=5 | ||||
33 | 3 | n=2 | n=5 | |||
ESR1 | Estrogen receptor alpha (ERα) | 36 | 4 | n=1 | n=5 | n=2 |
KCNH2 | Potassium Voltage-Gated Channel Subfamily H Member 2 | 18 | 12 | n=0 | n=0 | NA |
ZNF669 | Zinc Finger Protein 669 | | 1 | | | |
PRNP | Prion protein | 14 | 2 | n=0 | n=0 | n=1 |
15 | n=0 | n=0 | ||||
20 | n=0 | n=0 | ||||
24 | n=0 | n=6 | ||||
85 | n=6 | n=3 | ||||
TERF2 | Telomeric repeat-binding factor 2 | 55 | 1 | n=0 | n=6 | n=1 |
Disordered regions (IDRs; n=1445; 0.00 ≤ p-value < 0.05)
SLiMS | SL-PT- | SL-PT+ | SL+PT- | SL+PT+ | R1T | R2T | C1T | C2T | A (%) | P (%) | R (%) |
SLiMS1 | 1078 | 64 | 58 | 9 | 1142 | 67 | 1136 | 73 | 89.90 | 12.32 | 13.43 |
SLiMS2 | 749 | 18 | 121 | 29 | 767 | 150 | 870 | 47 | 84.84 | 61.70 | 19.33 |
SLiMS3 | 1212 | 108 | 34 | 9 | 1320 | 43 | 1246 | 117 | 89.58 | 7.69 | 20.93 |
Proteins with disordered segments (IDPs; n=800; 0.00 ≤ p-value < 0.05)
SLiMS | SL-PT- | SL-PT+ | SL+PT- | SL+PT+ | R1T | R2T | C1T | C2T | A (%) | P (%) | R (%) |
SLiMS1 | 86 | 12 | 28 | 18 | 98 | 46 | 114 | 30 | 72.22 | 60.00 | 39.10 |
SLiMS2 | 1 | 1 | 96 | 66 | 2 | 162 | 97 | 67 | 40.85 | 98.50 | 40.74 |
SLiMS3 | 250 | 57 | 26 | 25 | 307 | 51 | 276 | 82 | 76.81 | 30.48 | 49.01 |
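The row totals (R1T, R2T), column totals (C1T, C2T) and percentages above appear to follow from the four cell counts of a 2×2 contingency table. The sketch below assumes that A, P and R denote accuracy, precision and recall, with the presence of the SLiMS (SL+) taken as the reference and the presence of PTG4 (PT+) as the call; the function and argument names are illustrative only.

```python
def contingency_metrics(neither, pt_only, sl_only, both):
    """Derive totals and A/P/R (in %) from the four cells of a 2x2 table.

    neither = SL-PT-, pt_only = SL-PT+, sl_only = SL+PT-, both = SL+PT+.
    """
    r1t = neither + pt_only   # row total, SLiMS absent
    r2t = sl_only + both      # row total, SLiMS present
    c1t = neither + sl_only   # column total, PTG4 absent
    c2t = pt_only + both      # column total, PTG4 present
    total = r1t + r2t
    return {
        "R1T": r1t, "R2T": r2t, "C1T": c1t, "C2T": c2t,
        "A": 100 * (neither + both) / total,  # accuracy
        "P": 100 * both / c2t,                # precision
        "R": 100 * both / r2t,                # recall
    }

# SLiMS2 row of the IDR table.
idr_slims2 = contingency_metrics(749, 18, 121, 29)
```

For the SLiMS2 row of the IDR table this gives A ≈ 84.84, P ≈ 61.70 and R ≈ 19.33. Other rows agree with the printed values to within a few hundredths of a percentage point, which suggests that some entries were truncated rather than rounded.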
SLiMS | Sample | | (zn)n≥2 ∈ pTG4ij ∩ SLiMSw (p-value) |
SLiMS1=[ST]PzR | Pz | PG (n=1) | PG (Degenerate) |
SLiMS2=[ED]zzD[AGS] | z[DE] | G[DE] (n=2) | [WGRVAELMKQSTP][AG][DE]z2EG[VADE] (p-value=0.00069) |
| [DE]z | [DE]G (n=1) | |
| zz[DE] | [LMKQSTP]G[DE] (n=14); [VAE]G[DE] (n=6); [WGR][AG][DE] (n=6) | |
| [DE]zz[DE] | [VAE]G[DE]zzEG[VADE] (n=28); [WGR][AG][DE]zzEG[VADE] (n=24); [WGR][AG][DE]zzEG (n=6); GEzzEG[VADE] (n=4); GEzzEG (n=1) | |
SLiMS3=[KR]zPzzP | Pzz | PG[VADE] (n=4) | PGV (Degenerate); PGA (Degenerate); PGD (Degenerate); PGE (Degenerate) |
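The three motif definitions in the table translate directly into regular expressions if "z" is read as a wildcard for any single residue. That reading is an assumption, as are the helper names `to_regex` and `find_motifs` and the toy sequence below; the sketch only illustrates how a protein sequence could be scanned for the motifs, not how the original analysis was implemented.

```python
import re

# Motif definitions as written in the table; "z" is assumed to mean "any residue".
SLIMS = {
    "SLiMS1": "[ST]PzR",
    "SLiMS2": "[ED]zzD[AGS]",
    "SLiMS3": "[KR]zPzzP",
}

def to_regex(motif):
    """Convert a motif to a regex; the lookahead also reports overlapping hits."""
    return re.compile("(?=(" + motif.replace("z", ".") + "))")

def find_motifs(sequence):
    """Return {motif name: [(0-based start, matched residues), ...]}."""
    return {
        name: [(m.start(), m.group(1)) for m in to_regex(motif).finditer(sequence)]
        for name, motif in SLIMS.items()
    }

# Hypothetical sequence constructed to contain one instance of each motif.
hits = find_motifs("MSPGRAEGGDAKAPGVP")
```

On the toy sequence, the scan reports SPGR at position 1, EGGDA at position 6 and KAPGVP at position 11 (0-based).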