1.
Introduction
Escherichia coli is an abundant member of the gastrointestinal microbiota of mammals. Most E. coli strains are harmless organisms that efficiently colonize the mucous layer of the mammalian colon [1]. However, some E. coli clones have acquired the ability to cause diseases and adapt to new niches. The genetic information providing metabolic and pathogenic properties for adaptation to particular environmental conditions is frequently encoded on mobile genetic elements, such as bacteriophages, plasmids and transposons [1],[2].
According to the virulence factors acquired, pathogenic E. coli strains involved in diarrheal diseases have been classified into several pathotypes. The Shiga toxin-producing E. coli (STEC) pathotype represents the group of E. coli strains whose virulence hallmark is the production of Shiga toxins (Stx). These highly pathogenic strains can cause severe human diseases, such as bloody diarrhea (BD) and hemolytic-uremic syndrome (HUS) [3]. Stx toxins are among the most potent bacterial toxins known, with rRNA-N-glycosidase activity that inactivates 60S ribosomal subunits and disrupts protein synthesis in eukaryotic cells [4],[5]. Shiga toxins are classified into two major types: Stx1 and Stx2, with several subtypes. STEC strains harboring the stx2a subtype have shown the highest rates of HUS, hospitalization and BD [3].
The genes encoding the Stx toxins are carried by bacteriophages (Stx phages) integrated into the bacterial chromosome. These phages are usually called lambda-like because the analyses of the first genomes sequenced showed a similar genetic organization (comprising recombination, early regulation, replication, late regulation, lysis and structural gene regions) and many homologous genes to those of the bacteriophage lambda [6]–[8]. However, the Stx phage family is a diverse group of phages, variable in the stx subtype they carry and in genome size, genetic composition and virion morphology [7],[9],[10]. Furthermore, Stx phages carry many genes whose function is not yet well-known [8],[11]–[13].
The stx genes are located in the late transcribed region, downstream of the Q protein-encoding gene and upstream of the lysis cassette. Thus, stx expression is controlled by the anti-terminator Q protein, which allows transcription to continue into late genes [10],[14]. Under inducing conditions, transcription of stx and lysis genes is Q-activated, resulting in host cell lysis and toxin release [15].
Stx2a-encoding phages (Stx2a phages) are the most studied prophages of STEC, as they are located in the genomes of the strains most frequently associated with severe clinical outcomes. Studies performed on groups of Stx2a phages show a high level of genomic diversity also within the Stx2a phage family. For example, differences among Stx2a phages from O157:H7 STEC strains were observed in genes in the replication and early regulatory regions [9],[16], and researchers have identified distinct types among this group of phages [16]–[18]. Noteworthy, Yin et al. [16] and Krüger et al.[12] found that PST2, one of these phage types detected within O157:H7 strains associated with a high incidence of HUS, was closely related to Stx2a phages identified in some non-O157 STEC strains corresponding to serotypes like O103:H2, O104:H4 and O145:H-.
Phages, including Stx phages, have played a significant role in the evolution and diversity of STEC strains [17]–[19]. However, little is known about the impact of these phages on the fitness and virulence of the lysogens. As mentioned above, Stx phages are essential in regulating Shiga toxin production. In addition, various studies have demonstrated that lysogenic infection by specific Stx phages produced a significant impact on host gene expression. It was found that carriage of phage ΦMin27 (Δstx::cat) in E. coli K-12 MG1655 had a direct effect on the global expression of bacterial genes, and an increase of acid tolerance [20], integration of φ24B (Δstx::kan) into E. coli K-12 MC1061 increased the rates of respiration, cell proliferation and acid resistance [21],[22]. In addition, lysogenic infection by Stx2a phages φO104 and φPA8 introduced dramatic changes in carbon source utilization of E. coli MG1655 [23], and lysogenization of E. coli K-12 by Sp5 decreased cell motility under anaerobic conditions [24]. Moreover, it has been shown that NanS-p esterases encoded by Stx phages, but also by phages not containing stx genes, conferred a growth advantage to O157 EDL933 and O104 LB226692 on 5-N-acetyl-9-O-acetyl neuraminic acid as a carbon source [25],[26].
Considering that the Stx2a phage family is a group of diverse members, this study aimed to perform a comprehensive analysis of the stx flanking region of Stx2a phages belonging to several serotypes and origins, giving particular emphasis on the characterization of the predicted NanS-p esterase sequences.
2.
Material and methods
2.1. Analysis of stx flanking region of Stx2a phage genomes
DNA sequences of Stx2a phages and prophages were retrieved from the GenBank database using the BLASTN program (https://www.ncbi.nlm.nih.gov/blast/) [27] with the stx2a sequence from the phage 933W (X07865) as a query sequence. Sequences were downloaded either directly as complete phage genomes or retrieved from bacterial host genomes by analysis with PHAge Search Tool Enhanced Release (PHASTER) web server (http://phaster.ca) [28] and subsequently analyzed with the VirulenceFinder web server (https://cge.food.dtu.dk/services/VirulenceFinder/) [29]. This phage database of complete Stx2a phage genomes was created in August 2019.
The nucleotide sequences from anti-terminator Q to holin S encoding genes (named Q-S region) were extracted and aligned with the BLASTN program using the corresponding sequence of the phage 933W (AF125520: 20205-26045) as the query sequence. Insertion sequence elements were identified by comparing the sequences against the ISfinder database (https://isfinder.biotoul.fr/blast.php) [30]. The ORFs were determined by examination of annotations at GenBank and using ORFinder (https://www.ncbi.nlm.nih.gov/orffinder/). Protein domains were annotated using the Conserved Domain Search Service (CD-Search) (https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi) [31] and InterProScan (https://www.ebi.ac.uk/interpro/search/sequence/) [32]. A search for close homologs of the holin sequences was performed with TC-BLAST at the Transporter Classification Database (TCD) (https://www.tcdb.org/progs/blast.php) [33]. Distance trees were produced using BLAST pairwise alignment (https://www.ncbi.nlm.nih.gov/blast/) [27]. Alignments for Q-S region were generated with progressive MAUVE [34] using default settings. Whole phages were aligned with the exception of the integrated Stx2a phages where Q-S region was selected.
Sequence alignments of NanS-p and corresponding domains were performed using Muscle implemented in MEGA [35] . Sequences sharing 100% identity were represented by only one of them to simplify comparisons. C-terminal variants were classified based on an identity threshold of 80% after alignment with Clustal Omega using default parameters [36].
2.2. Relative quantification of nanS-p expression in stx2a-positive STEC strains
2.2.1. Sample collection, RNA extraction and cDNA synthesis
A group of 13 stx2a-positive STEC strains, whose genomes have been sequenced (Illumina MiSeq sequencing platform as described by [12]), were selected from our collection (Supplementary Table 1). The reference strain E. coli O157:H7 EDL933 was also used. Expression of nanS-p was quantified in mitomycin-C-induced STEC cultures relative to uninduced cultures using real-time qPCR. Bacterial growth conditions, total RNA extraction and cDNA synthesis were previously described for stx2a-expression of the same strains [37].
2.2.2. Primers and qPCR conditions
Primers DS-forward (5′-CCTTATGGTAGTGCGCTGATT-3′) and DS-reverse (5′-AGTCCCTCACCGTATGACA-3′) were designed to allow the amplification of the DUF-SASA region of nanS-p subtypes present mainly in Stx prophages but not in bacterial nanS. In order to check in silico if the region amplified by these primers could be present in other genomic regions beyond those present in Stx2a prophages, a search was done with the BLASTN program using the sequence of the PCR product for the DUF-SASA region of phage 933W against the genome sequences of the 13 STEC strains listed in Supplementary Table 1.
Each reaction contained 4 µL of 1/10 diluted cDNA sample, 10 µL of 2X SYBR Green Master Mix (FastStart Universal SYBR Green Master, Roche) and either the pair of primers DS-forward/DS-reverse (400 nM each) to amplify nanS-p, or primers TufAqR/TufAqF (300 nM each) to amplify the housekeeping gene tufA [38]. The amplification and detection of the specific products were carried out in an OneStep Plus Real-Time PCR System cycler, with the following amplification conditions: 2 min at 50 °C, 10 min at 95 °C and 40 cycles of 20 s at 95 °C and 60 s at 60 °C. Standard curves were made with serial dilutions of a pool of cDNA samples. The real-time RT-PCR efficiency for each gene was determined by a linear regression model according to the equation: E = 10[−1/slope]. The nanS-p expression levels for mitomycin C-induced cultures were calculated relative to non-induced ones by the ΔΔCT method [39] using the efficiency corresponding to each gene.
2.2.3. Characterization of NanS-p encoded by other Stx phages
In order to investigate if NanS-p is encoded by phages carrying stx subtypes other than stx2a, stx-related sequences were identified in the GenBank DNA database using the BLASTN program with reference sequences for stx1a, stx1c, stx2b, stx2c, stx2d, stx2e, stx2f and stx2g [40] as query sequences. Similar analyses to those conducted for Stx2a phages were performed to construct a database of phages encoding these subtypes and evaluate the characteristics of nanS-p.
3.
Results
3.1. Analysis of stx flanking region in Stx2a phage genomes
Fifty-nine DNA sequences of Stx2a (pro)phages were obtained for the study (Table 1). The available information on bacterial hosts indicates that they belonged to ten serogroups and were generally isolated from human samples in several countries.
The analysis of stx flanking regions confirmed that all Stx2a (pro)phages shared similar gene organization: Q, stx, nanS-p and S genes and showed 94 to 100% sequence identity with the corresponding region of 933W. Main differences were observed upstream of the operon stx and in the region between stx to nanS-p (including the last gene in some cases) (Figure 1). Presence of insertion sequence (IS) elements contributed to the differences in some sequences. The IS elements were located within the nanS-p gene or between stx and nanS-p genes (Figure 1). They corresponded to the IS3 family (significant alignments with IS1203 and IS629) except the IS in phage F451, which best matched with ISEc8 of the IS66 family.
The predicted sequences of Q proteins mostly clustered into two major clades (Figure 2). One group corresponded to 126-aa Q proteins and included the Q protein of the Stx2a phage of STEC strain 11128, previously described as QO111 and those from the prophages present in strains RM13516 and 155. The most extensive group comprised the Q protein of the phage 933W, named Q933, and other 144-aa Q proteins (some registered as 150-aa or 157-aa due to a larger ORF considered in the annotation). On the other hand, the predicted Q protein (273 aa) encoded by the Stx2 prophage present in STEC strain 00-3076 showed no significant similarity to the others.
Downstream Q, regions encoding for putative DNA methylases have been registered in some GenBank records. We identified similar sequences in all the Stx2a phages by tblastn analyses using annotated proteins against the genomic sequences of the phages. All sequences were clustered into three main groups. Two clusters corresponded to putative 50 aa-proteins and the third cluster grouped putative 352-aa proteins encoded by prophages of the strains 11128, RM13516, 155 and 00-3076.
The stx2a sequences shared high identity (99.3 to 100%) with that present in 933W. VirulenceFinder results showed that most stx2a sequences best matched with nucleotide sequences corresponding to the Stx2a-O157-EDL933 variant, except three that corresponded to Stx2a-O157-SF-258-98 and one to Stx2a-O113-TS17-08 (Table 1).
Comparative analyses of NanS-p identified 20 different amino acid sequences among Stx2a phages (Table 2). Sequences 1 and 16 (which only differ by the presence of an IS element in sequence 16) were the most frequent, and they were only identified in Stx2a phages carried by O157 strains. On the other hand, sequence 12 was detected in Stx2a phages present in O104 and O2 strains.
For a detailed analysis of the different NanS-p proteins, domains were identified in 933W NanS-p and used for subsequent sequence comparisons. Three sequences corresponding to the N-terminal domain (DUF1737, IPR013619/PF08410, amino acids 3-53 in 933W NanS-p) were recognized and named D1 to D3 (Table 2). D1 and D2 only differed in one amino acid and were detected in phages encoding the Stx2a-O157-EDL933 variant, while D3 was distinct and only present in the phages carried by the strains 11128, 155, RM13516 00-3076. Eleven sequences matching the catalytic esterase domain (SASA, IPR005181/PF03629, amino acids 83-276 in 933W NanS-p) were detected and named SASA1 to SASA11, this domain was the only to show homology in its whole extension across all the sequence analyzed indicating a common ancestor (Figure 3). The differences were concentrated in nine amino acid positions (Table 2). Moreover, sequence 20 (corresponding to the Stx2a prophage 86, accession NC_008464) showed lower sequence identity. The comparison of the C-terminal domain evidenced four groups of sequences (CTR1 to CTR4). Differences between CTR1 and CTR3 arose from an IS element in CTR3 that modifies the C-terminal end. All CTR1 sequences, not including CTR1b, had high similarity with the C-terminal domain of vb_24B_21 NanSp from phage Φ24B (crystallized portion, position 423-645 aa). Notably, variations were mostly concentrated in two amino acid positions (corresponding to 624 and 640 in the NanS-p sequence of vb_24B). On the other hand, the four aromatic amino acids identified in the proposed carbohydrate binding site of the C-terminal domain of the protein vb_24B_21 (Y510, Y515, Y540, F611) were highly conserved among CTR1 sequences.
NanS-p sequences shared with the bacterial NanS, from Escherichia coli str K-12 substr. MG1655 (NP_418729.1), the same variations in the motifs of the four blocks common to the SGNH superfamily of hydrolases (Figure 4). Only the NanS-p sequence encoded by the Stx2a prophage 86 (NanS-p sequence 20) showed two aa different in the block IV (RPTH instead of RSSH or RASH).
Between nanS-p and S genes, some annotated genomes showed other coding regions, like some encoding proteins of unknown function containing domains of the DUF826 or DUF1378 superfamilies.
Regarding predicted holins encoded by the S genes, most sequences obtained from the Stx2a phages were identical to that encoded by phage 933W or differed in one amino acid. Only one holin sequence, present in only one phage, differed 4 aa from that in phage 933W. According to the search we performed on the Transporter Classification Database (TCDB), the predicted holins for the Stx2a phages of the present study are pinholins. They showed a high identity (92.6 to 95.6%) with the bacteriophage H-19B pinholin, which belongs to the P21 holin S family (holin II superfamily).
3.2. Expression of nanS-p in strains belonging to different serotypes
To perform a more detailed study of NanS-p, we evaluated the expression of nanS-p in 14 stx2a-positive STEC strains belonging to O26:H11, O91:H21, O145:H- and O157:H7 serotypes. In silico analysis was used to assess the PCR sequence specificity, which showed that the amplicon sequence matched only once with each of the genomes of most strains in the vicinity of the stx2a gene. For three strains, the stx2a gene was located on short contigs that did not include the downstream flanking sequence. The qPCR results showed that all but one strain expressed nanS-p, and the expression increased by 1 to 2.4 logs under mitomycin C-treatment (Figure 5).
3.3. NanS-p sequences in phages encoding Stx subtypes other than Stx2a
Additionally, we carried out a genome analysis of twenty-four (pro)phages harboring stx subtypes different from stx2a. Nineteen phages encoding Stx1a, Stx1c, Stx2b, Stx2c or Stx2d carried the nanS-p gene downstream the stx operon. Interestingly, nanS-p could not be identified in Stx2e, Stx2f and Stx2g phages. The NanS-p sequences showed size variation (636 to 657 aa) (Supplementary Table 2). Although the three domains were detected, sequence variability was observed: seven different N-terminal sequences, twelve SASA-domain sequences and three C-terminal sequence groups. Remarkably, the Stx1a, Stx1c and Stx2c phages had higher identity in N-terminal sequences with the Stx2a phages than the Stx2b and Stx2d phages. In addition, differences in SASA sequences were observed in similar aa positions than those identified in Stx2a phages (Supplementary Table 2). All except one of these 19 NanS-p sequences had the four blocks of the SGNH family of hydrolases described for Stx2a phages.
4.
Discussion
Shiga toxins (Stx), the main virulence factor of STEC strains, are encoded in the genome of phages integrated into the bacterial chromosome (prophages). S.O.S induction of Stx prophages and consequent Stx expression have been demonstrated to be required for disease in animal models [41],[42]. Stx toxins and Stx phages comprise families of diverse members [6],[7]. Several studies indicate that the Shiga toxin subtype and the amount of toxin produced may be associated with severe illness, with Stx2a being a great concern for its association with severe diseases [5],[43],[44]. Conversely, less is known about other characteristics of Stx phages that could contribute to the pathogenicity of STEC strains.
Advances in sequencing technologies have provided an excellent opportunity to study Stx phages. However, the number of studies on Stx phage genomes is still limited [7]. Furthermore, despite the increased availability of genomic STEC sequences, Stx prophage sequences cannot always be fully resolved from bacterial genome sequencing if only short-read sequencing technologies have been used [45]–[47].
This study aimed to analyze the stx flanking region of Stx2a phages using publicly available complete sequences, including phages associated with STEC strains from different serotypes, sources and countries. As STEC strains belonging to serotype O157:H7 are most frequently isolated from humans, prophages corresponding to O157:H7 strains are found in a higher proportion in the databases. In addition, STEC genome sequences of some countries are overrepresented. Therefore, this study does not represent serotypes, sources and countries equally.
We retrieved 59 genomic sequences of Stx2a (pro)phages from databases and performed a comprehensive analysis of a region we named Q-S. All these genomes showed conservation of the gene order Q, stx2a, nanS-p and S. However, different gene alleles and allele combinations were detected. In addition, some genomes exhibited transposable insertion sequence elements (most of the IS3 family), contributing to the heterogeneity in the Q-S region.
The Q gene encodes an anti-terminator protein that participates in the mechanism that regulates the switch from the lysogenic to the lytic life cycle, influencing stx expression. Previous studies have identified Q gene variants, of which Q933, Q21 and QO111:H- are the best characterized [48]. However, associations between each Q allele and levels of Stx2 production are still not clear [37],[49]–[51]. The present study detected stx2a genes mainly linked to genes encoding proteins Q933 and QO111:H-, and none with Q21, similar to a previous study [50]. We observed specific associations between Q and Stx2a variants: Q933 with Stx2a_O157_EDL933, QO111 with Stx2aO157-SF-258-98 and an uncharacterized Q with Stx2a-O113-TS17-08. No strong relationships were observed between serogroups and Q variants in our study.
On the other hand, the sequences of the S gene, encoding putative holins, shared high identity. Holins are part of phageś lysis system and control endolysin-mediated degradation of the bacterial peptidoglycan layer. They can be canonical holins or pinholins [52], which differ in the size and number of lesions they cause on the bacterial membrane. In contrast to canonical holins, the pinholins form much smaller and more numerous holes [53]. They are associated with SAR endolysins, which acquire an enzymatically active conformation when they are released from the membrane [52]. The predicted holins for the Stx2a phages of the present study are all pinholins, in accordance with Pinto et al. (2021), who found that most Stx phages have a pinholin among their lysis genes. In addition, our results showed a high identity between the pinholins of the Stx2a phages we studied and the pinholin of Stx1 bacteriophage H-19B, and, consequently, they belong to the P21 holin S family.
All Stx2a phage genomes harbored the nanS-p gene downstream stx2a, as previously observed in the reference prophage 933W from Escherichia coli O157:H7 strain EDL933 (where it was initially identified as ORF L0105) and in other Stx prophages from some STEC strains [54],[55]. This gene has been recognized also in non-Stx prophage genomes integrated into some E. coli strains, including STEC, and received its name for the homology with the chromosomally encoded O-acetylesterase NanS. Characteristically, prophage-borne genes present N- and C-terminal domains flanking the catalytic SASA domain [56]. Among the sequences of the Stx2a phages from our study, the comparison analysis of NanS-p sequences revealed some diversity in the three domains (i.e., D, SASA and CTR domains). The combination of domains named D1, SASA1 and CTR1 or CTR3 (which differs from CTR1 by the presence of an IS) was the most frequently detected among Stx2a phages carried by O157 strains. On the other hand, Stx2a phages carried by O104:H4 strains harbored the domains D2, SASA3 and CTR1. The phages that encoded QO111:H- or an untypable Q also differed in their NanS-p sequences, being the only phages with the D3 domain.
The conserved genome localization of nanS-p suggests that its transcription is regulated by Q protein and prompted us to evaluate its expression under basal and mitomycin C-induced conditions in a set of stx2a-positive STEC strains of our collection. The assays showed that all cultures, except one, increased nanS-p expression under induced conditions similar to that obtained for stx2a in a previous study [37]. Furthermore, the exception was a strain that did not show nanS-p expression but had not demonstrated stx2a expression either. Our results agree with previous studies that observed an increment in nanS-p expression in other inducing conditions [57]–[59], and evidence coexpression of nanS-p and stx2a.
Our study also showed that nanS-p presence was not restricted to Stx phages that carry stx2a as it was also detected downstream of the operons encoding other Stx subtypes, supporting early findings [55]. However, nanS-p could not be identified in Stx2e, Stx2f and Stx2g phages. Coincidently, this gene was absent in the two Stx2f phages whose stx-flanking region was sequenced and analyzed by Unkmeir and Schmidt [55].
The sialate O-acetylesterase function has been demonstrated in Neu5,9Ac2 utilization for different NanS-p proteins encoded by prophages in the strains O157 EDL933 and O104 LB226692, suggesting that NanS-p could provide advantages for bacterial growth in the gut [25],[54],[60]. Analysis of NanS-p proteins in those two strains and two other O104 strains showed that NanS-p encoded by Stx1a or Stx2a phages were closely related [25],[26]. Our study provides further evidence for this, as it revealed that NanS-p sequences associated with the 59 Stx2a (pro)phages that were analyzed share close sequence similarity among them and also with those carried by phages encoding other Stx subtypes. Diversity in the catalytic domain was mainly restricted to a few amino acid positions, but higher variability was detected for the C-terminal domain. Remarkably, a study on the crystal structure of the C-terminal domain of vb_24B_21 NanS-p from phage Φ24B [61] revealed a lectin-like, jelly-roll β-sandwich fold that predictably acts as a non-catalytic carbohydrate-binding module that could assist esterase activity. Further studies are necessary to evaluate if the NanS-p subtypes differ in their activity or in binding specificity.
5.
Conclusions
The analysis of the Q-S region in Stx2a phage genomes indicated that all sequences included the Q, stx2a, nanS-p and S genes. However, different alleles of these genes were observed, as well as other sequence differences. Interestingly, we found that there were groups of Stx2a phages that shared specific combinations of variants of these genes.
We identified sequence differences in each of the three domains of NanS-p esterases encoded by Stx2a phages and other Stx phages. Therefore, future studies will be necessary to evaluate if NanS-p variants differ in their activity. Our findings also suggest that phages that encode some Stx subtypes do not carry the nanS-p gene, but as phage genomes associated with these subtypes are underrepresented in the databases, more sequences are required to support this conclusion.