Research article

Sequence data analysis and preprocessing for oligo probe design in microbial genomes

  • Received: 12 December 2016 Accepted: 26 December 2016 Published: 25 January 2017
  • A good oligo probe design in DNA microarray experiments is crucial to obtaining better gene expression results. However, sequence data from a very large microbial genome or pan-genome yield fewer oligos and lower design quality when processed directly by a probe designer. Gene redundancies and discrepancies across resources of the same species or strain, together with sequence similarity and homology, are responsible for the low number of oligos designed. We addressed these sequence problems and introduced the concept of open reading frame (ORF) sequence segmentation, from which quality oligos can be selected. Analysis and pre-processing of sequence data were performed using our Perl-based pipeline ORF-Purger 2.0. ORFs were purged of redundancy, discrepancy, invalidity, overlapping, similarity and, optionally, homology, such that the quantity and quality of oligos to be designed were drastically improved. Probe integrity was proposed as the first probe selection criterion, since the full physical availability of all possible probes corresponding to their targets in a nucleic acid sample is necessary for the best probe design.

    Citation: Ruming Li, Brian Fristensky, Guixue Wang. Sequence data analysis and preprocessing for oligo probe design in microbial genomes[J]. AIMS Bioengineering, 2017, 4(1): 28-45. doi: 10.3934/bioeng.2017.1.28



    Catalan's conjecture, one of the famous classical problems in number theory, states that the equation

    $$x^p - y^q = 1$$

    has no solutions in positive integers $x$ and $y$ other than $3^2 - 2^3 = 1$, where $p$ and $q$ are prime numbers. In 2004, Mihăilescu [1] proved Catalan's conjecture by using the theory of cyclotomic fields. On the other hand, some scholars (see [2] and [3,4,5]) studied the solutions to the general equation

    $$a^x - b^y = c, \tag{1.1}$$

    where $a$, $b$ and $c$ are fixed positive integers. In 1936, Pillai conjectured that the number of positive integer solutions $(a,b,x,y)$, with $x \ge 2$ and $y \ge 2$, to Eq (1.1) is finite. This conjecture, which is still open for all $c > 1$, amounts to saying that the distance between two consecutive terms in the sequence of all perfect powers tends to infinity. It is easy to see that Pillai's conjecture is closely related to the number of tuples of consecutive integers that are all perfect powers. In fact, Catalan's conjecture is equivalent to the statement that no two consecutive integers are perfect powers, other than $2^3$ and $3^2$. It is also easy to see that there are no four consecutive integers that are all perfect powers, since any set of four consecutive integers must contain an integer of the form $4n+2$, which cannot be a perfect power. Are there three consecutive integers that are all perfect powers? In 1962, Chao Ko [6], by supplying a necessary and sufficient condition for the equation $x^p - y^q = 1$ to be solvable in positive integers $x$ and $y$, showed that no three consecutive integers are powers of other positive integers.
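As a small numerical illustration of the statement that 8 and 9 are the only consecutive perfect powers, the following sketch enumerates perfect powers up to a bound and reports adjacent pairs. The function names and the bound `10**6` are illustrative assumptions, not taken from the paper, and a finite search of this kind is no substitute for the proof.

```python
from math import isqrt


def perfect_powers(limit):
    """Return the sorted perfect powers m**e (m >= 2, e >= 2) not exceeding limit."""
    powers = set()
    for base in range(2, isqrt(limit) + 1):
        value = base * base
        while value <= limit:
            powers.add(value)
            value *= base
    return sorted(powers)


def consecutive_pairs(limit):
    """Return all pairs (p, p + 1) of perfect powers with p + 1 <= limit."""
    powers = perfect_powers(limit)
    return [(p, q) for p, q in zip(powers, powers[1:]) if q - p == 1]


print(consecutive_pairs(10**6))
```

Bases start at 2, so the trivial power 1 is left out of the enumeration.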

    Let $k$ be an integer with $k \ge 2$. In this paper, we turn our attention to the number of $k$-tuples of consecutive integers such that each of them is the sum of two perfect powers. We call a positive integer $n$ a 22-STP number (STP means Sum of Two Powers) if it can be expressed in the form $2^x + y^2$ with $x$ and $y$ being nonnegative integers. Furthermore, we call a $k$-tuple $(a_1, a_2, \ldots, a_k)$ nice if $a_1, a_2, \ldots, a_k$ are consecutive integers in increasing order and each of them is a 22-STP number. The following question then arises naturally.
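The definition of a 22-STP number translates directly into a membership test. The sketch below is one straightforward way to write it, trying every power of two not exceeding $n$; the name `is_stp` is a hypothetical helper introduced here for illustration.

```python
from math import isqrt


def is_stp(n):
    """Return True if n = 2**x + y**2 for some nonnegative integers x and y."""
    power = 1  # 2**0
    while power <= n:
        rest = n - power
        root = isqrt(rest)
        if root * root == rest:  # rest is a perfect square, possibly 0
            return True
        power *= 2
    return False


print([n for n in range(1, 31) if is_stp(n)])
```

For example, `is_stp(12)` holds because 12 = 2^3 + 2^2, while `is_stp(7)` fails because none of 7 - 1, 7 - 2, 7 - 4 is a perfect square.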

    Question 1.1. For each integer $k \ge 2$, how many nice $k$-tuples are there?

    In this paper, we mainly study Question 1.1 by utilizing some elementary tools in number theory. In fact, we provide the following theorem, which gives the complete answer to Question 1.1.

    Theorem 1.2. Let $k$ be an integer with $k \ge 2$. Each of the following is true.

    (a) If $2 \le k \le 4$, then there exist infinitely many nice $k$-tuples.

    (b) If $k = 5$, then there are exactly six nice 5-tuples, namely:

    (1,2,3,4,5), (2,3,4,5,6), (8,9,10,11,12), (9,10,11,12,13),
    (288,289,290,291,292), and (289,290,291,292,293).

    (c) If $k = 6$, then there are exactly three nice 6-tuples, namely:

    (1,2,3,4,5,6), (8,9,10,11,12,13), and (288,289,290,291,292,293).

    (d) If $k \ge 7$, then there is no nice $k$-tuple at all.
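Parts (b) to (d) of Theorem 1.2 can be checked numerically on an initial segment of the integers. The sketch below is a brute-force search, not the method of the paper; the helper names and the bound `10**5` are assumptions made for illustration.

```python
from math import isqrt


def is_stp(n):
    """Return True if n = 2**x + y**2 for some nonnegative integers x and y."""
    power = 1
    while power <= n:
        rest = n - power
        root = isqrt(rest)
        if root * root == rest:
            return True
        power *= 2
    return False


def maximal_runs(limit):
    """Return maximal runs of consecutive STP integers in [1, limit] as (start, length)."""
    runs, start = [], None
    for n in range(1, limit + 2):  # one step past limit closes a trailing run
        if n <= limit and is_stp(n):
            if start is None:
                start = n
        elif start is not None:
            runs.append((start, n - start))
            start = None
    return runs


def nice_tuple_starts(k, limit):
    """Return the first entries of all nice k-tuples contained in [1, limit]."""
    return [s + i for s, length in maximal_runs(limit) for i in range(length - k + 1)]


for k in (5, 6, 7):
    print(k, nice_tuple_starts(k, 10**5))
```

If Theorem 1.2 holds, the starts reported for $k = 5$ are 1, 2, 8, 9, 288 and 289, those for $k = 6$ are 1, 8 and 288, and the list for $k = 7$ is empty, whatever bound is chosen.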

    This paper is organized as follows. In Section 2, we present the proof of Theorem 1.2 by using methods of elementary number theory, especially the tool of modulo cover. In Section 3, we propose a general question for readers who are interested in pursuing this topic further.

    In this section, we concentrate on the proof of Theorem 1.2.

    Proof of Theorem 1.2. First of all, note that Parts (c) and (d) of Theorem 1.2 follow immediately from Part (b). So it is sufficient to show that Parts (a) and (b) are true.

    For Part (a), we only need to prove that there exist infinitely many nice 4-tuples, since Part (a) for $k \in \{2,3\}$ then follows easily. Note that the integers $y^2 + 2^0$, $y^2 + 2^1$ and $y^2 + 2^2$ are 22-STP numbers for every nonnegative integer $y$. So the proof of Part (a) will be done if we show that $y^2 + 3$ is a 22-STP number infinitely often. Now consider the Diophantine equation $y^2 + 3 = 2^x + y'^2$. We claim that this equation has at least one solution $(y, y') \in \mathbb{N}^2$ for any integer $x \ge 2$. In fact, since

    $$y^2 + 3 = 2^x + y'^2 \iff y^2 - y'^2 = 2^x - 3 \iff (y + y')(y - y') = 2^x - 3,$$

    one can take $y - y' = 1$ and $y + y' = 2^x - 3$, that is, $(y, y') = (2^{x-1} - 1, 2^{x-1} - 2) \in \mathbb{N}^2$. This completes the proof of Part (a).
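The construction in the proof of Part (a) is explicit, so it can be replayed for concrete exponents. The sketch below builds, for a given integer $x \ge 2$, the 4-tuple $(y^2+1, y^2+2, y^2+3, y^2+4)$ with $y = 2^{x-1} - 1$ together with a representation of each entry; the function name and the witness format `(n, exponent, root)` are illustrative assumptions.

```python
def four_tuple_witnesses(x):
    """For an integer x >= 2, return [(n, e, r), ...] with n = 2**e + r**2,
    where n runs over y**2 + 1, ..., y**2 + 4 and y = 2**(x - 1) - 1."""
    if x < 2:
        raise ValueError("the construction needs x >= 2")
    y = 2 ** (x - 1) - 1
    y_prime = y - 1  # equals 2**(x - 1) - 2
    witnesses = [
        (y * y + 1, 0, y),
        (y * y + 2, 1, y),
        (y * y + 3, x, y_prime),  # the entry handled by the Diophantine equation
        (y * y + 4, 2, y),
    ]
    for n, e, r in witnesses:
        assert n == 2 ** e + r * r  # each claimed representation is checked
    return witnesses


for x in range(2, 7):
    print(x, [n for n, _, _ in four_tuple_witnesses(x)])
```

Distinct values of $x$ give distinct values of $y$, so the tuples produced are pairwise different, which is what makes the family infinite.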

    Now we turn our attention to Part (b). First, we make a key table as follows to show the values of $2^x + y^2 \pmod 8$.
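The residues of $2^x + y^2$ modulo 8 can be recomputed directly, since $2^x \bmod 8$ takes only the values 1, 2, 4, 0 (the last for every $x \ge 3$) and $y^2 \bmod 8$ depends only on $y \bmod 8$. The following sketch tabulates all combinations; the function name and the printed layout are assumptions for illustration and need not match the layout of the paper's table.

```python
def key_table():
    """Map each pair (2**x mod 8, y**2 mod 8) to (2**x + y**2) mod 8."""
    power_residues = sorted({pow(2, x, 8) for x in range(4)})  # x >= 3 always gives 0
    square_residues = sorted({(y * y) % 8 for y in range(8)})
    return {(p, q): (p + q) % 8 for p in power_residues for q in square_residues}


table = key_table()
for (p, q), r in sorted(table.items()):
    print(f"2^x = {p} (mod 8), y^2 = {q} (mod 8)  ->  2^x + y^2 = {r} (mod 8)")
print("missing residues:", sorted(set(range(8)) - set(table.values())))
```

Residue 3 arises only from $2^x \equiv 2$ and $y^2 \equiv 1$, that is, from $x = 1$ and odd $y$; residue 6 arises only from $2^x \equiv 2$ and $y^2 \equiv 4$. Both facts are used in the case analysis.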

    One sees from the key table that no 22-STP number is congruent to 7 modulo 8. It follows that any nice 5-tuple must contain an integer congruent to 3 modulo 8. Denote this integer by $A$. By the key table, $A = y_0^2 + 2$ for some odd positive integer $y_0$. So we can list all possible nice 5-tuples containing $A = y_0^2 + 2$ as follows:

    (ⅰ) $(y_0^2+1, y_0^2+2, y_0^2+3, y_0^2+4, y_0^2+5)$,

    (ⅱ) or $(y_0^2, y_0^2+1, y_0^2+2, y_0^2+3, y_0^2+4)$,

    (ⅲ) or $(y_0^2-1, y_0^2, y_0^2+1, y_0^2+2, y_0^2+3)$,

    since $y_0^2 + 2 \equiv 3 \pmod 8$ and no number in a nice 5-tuple is congruent to 7 modulo 8. Next, let us discuss the details case by case.

    CASE 1. Suppose the nice 5-tuple is $(y_0^2+1, y_0^2+2, y_0^2+3, y_0^2+4, y_0^2+5)$. Since $y_0^2 + 2 \equiv 3 \pmod 8$, we have $y_0^2 + 5 \equiv 6 \pmod 8$. It follows from the key table that there exists a positive integer $z$ with $z \equiv 2 \pmod 4$ such that $y_0^2 + 5 = z^2 + 2$. So $(z + y_0)(z - y_0) = 3$, which implies that $z + y_0 = 3$ and $z - y_0 = 1$, i.e., $z = 2$ and $y_0 = 1$. This gives the unique nice 5-tuple (2,3,4,5,6) of this case, since $2 = 2^0 + 1^2$, $3 = 2^1 + 1^2$, $4 = 2^2 + 0^2$, $5 = 2^2 + 1^2$ and $6 = 2^1 + 2^2$.

    CASE 2. Suppose the nice 5-tuple is $(y_0^2, y_0^2+1, y_0^2+2, y_0^2+3, y_0^2+4)$. First, since $y_0^2$ is a 22-STP number, there exist two nonnegative integers $s$ and $u$ such that $y_0^2 = 2^s + u^2$, which one writes as

    $$(y_0 - u)(y_0 + u) = 2^s. \tag{2.1}$$

    Now we solve Eq (2.1) by dividing the values of $s$ into two subcases.

    SUBCASE 2.1. Let $0 \le s \le 4$.

    ● If $s = 0$, then $y_0 = 1$ and $u = 0$, which gives the nice 5-tuple (1,2,3,4,5) since $1 = 2^0 + 0^2$.

    ● If $s = 1$, then it is easy to see that Eq (2.1) has no integer solution since $y_0 + u$ and $y_0 - u$ have the same parity.

    ● If $s = 2$, then Eq (2.1) is equivalent to the equations $y_0 + u = 2$ and $y_0 - u = 2$ since $y_0 + u$ and $y_0 - u$ have the same parity, which gives $y_0 = 2$, contradicting the fact that $y_0$ is odd.

    ● If $s = 3$, then Eq (2.1) implies that $y_0 + u = 4$ and $y_0 - u = 2$ since $y_0 + u$ and $y_0 - u$ have the same parity. So $y_0 = 3$. This gives the nice 5-tuple (9,10,11,12,13) since $9 = 2^3 + 1^2$, $10 = 2^0 + 3^2$, $11 = 2^1 + 3^2$, $12 = 2^3 + 2^2$ and $13 = 2^2 + 3^2$.

    ● If $s = 4$, then from Eq (2.1) we have that $y_0 + u = 4$ and $y_0 - u = 4$, or $y_0 + u = 8$ and $y_0 - u = 2$, since $y_0 + u$ and $y_0 - u$ have the same parity. So $y_0 = 4$ or $5$. Since $y_0$ is odd, $y_0 = 5$, which gives a possible nice 5-tuple (25,26,27,28,29). But it is easy to check that 28 is not a 22-STP number. So the 5-tuple (25,26,27,28,29) is not nice.
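The parity argument of Subcase 2.1 amounts to enumerating the ways of writing $2^s$ as a product of two powers of two whose sum is even. A minimal sketch of that enumeration follows; the function name `solve_eq_2_1` is a hypothetical label for this illustration.

```python
def solve_eq_2_1(s):
    """Return all (y0, u) with y0 > u >= 0 and (y0 - u) * (y0 + u) == 2**s."""
    solutions = []
    for i in range(s // 2 + 1):
        small, large = 2 ** i, 2 ** (s - i)  # candidates for y0 - u <= y0 + u
        if (small + large) % 2 == 0:  # same parity, so y0 and u are integers
            solutions.append(((small + large) // 2, (large - small) // 2))
    return solutions


for s in range(5):
    odd_only = [(y0, u) for y0, u in solve_eq_2_1(s) if y0 % 2 == 1]
    print(s, solve_eq_2_1(s), "odd y0:", odd_only)
```

Filtering for odd $y_0$ leaves the candidates $y_0 = 1$, $3$ and $5$ for $s = 0$, $3$ and $4$, matching the bullets above; the candidate $y_0 = 5$ is then discarded because 28 is not a 22-STP number.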

    SUBCASE 2.2. Suppose $s \ge 5$. First, note that $\gcd(y_0 + u, y_0 - u) \mid 2y_0$ and $y_0$ is odd. Then, by (2.1), we have that $y_0 - u = 2$ and $y_0 + u = 2^{s-1}$, which implies that $y_0 = 2^{s-2} + 1$. Let us now focus on the 22-STP number $y_0^2 + 3$. One can write

    $$y_0^2 + 3 = 2^t + w^2 \tag{2.2}$$

    for some nonnegative integers $t$ and $w$. Now we discuss the details for $t$ by splitting into the following cases.

    SUBCASE 2.2.1. Assume $0 \le t \le 2$.

    ★ If $t = 0$, then Eq (2.2) becomes $w^2 - y_0^2 = 2$, which has no integer solution.

    ★ If $t = 1$, then Eq (2.2) becomes $w^2 - y_0^2 = 1$, which forces $y_0 = 0$ and $w = 1$, a contradiction since $y_0$ is odd.

    ★ If $t = 2$, then Eq (2.2) is equivalent to $y_0 + w = 1$ and $y_0 - w = 1$, i.e., $y_0 = 1$ and $w = 0$. This gives the nice 5-tuple (1,2,3,4,5), which was obtained in Subcase 2.1.

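    SUBCASE 2.2.2. Assume $t \ge 3$. Recall that $y_0 = 2^{s-2} + 1$ with $s \ge 5$. By substituting $y_0 = 2^{s-2} + 1$ into (2.2) and dividing by 4, we have that

    $$2^{2s-6} + 2^{s-3} - 2^{t-2} = \left(\frac{w}{2}\right)^2 - 1. \tag{2.3}$$

    It follows that $w \equiv 2 \pmod 4$ since $s \ge 5$ and $t \ge 3$. In (2.3), for simplicity, let $s - 3 = c \ge 2$, $t - 2 = a \ge 1$ and $b = w/2$. Then (2.3) becomes

    $$2^{2c} + 2^c + 1 = 2^a + b^2. \tag{2.4}$$

    It then follows that

    $$(2^c + 1 - b)(2^c + 1 + b) = (2^c + 1)^2 - b^2 = 2^a + 2^c > 0,$$

    which implies that $b \le 2^c$. So

    $$2^a + 2^c = (2^c + 1)^2 - b^2 \ge (2^c + 1)^2 - (2^c)^2 = 2^{c+1} + 1 > 2^{c+1}.$$

    Then $2^a > 2^c$, i.e., $a > c$. Thus we rewrite (2.4) as

    $$(2^c + 1 - b)(2^c + 1 + b) = 2^c(2^{a-c} + 1). \tag{2.5}$$

    However, note that $\gcd(2^c + 1 - b, 2^c + 1 + b) \mid 2b$ and $b$ is odd. We deduce that one of $2^c + 1 - b$ and $2^c + 1 + b$ has at most one factor of 2. By (2.5), we then know that $2^{c-1}$ divides one of $2^c + 1 - b$ and $2^c + 1 + b$. Hence $b \equiv \pm 1 \pmod{2^{c-1}}$. This together with $1 \le b \le 2^c$ gives that $b \in \{1, 2^{c-1} - 1, 2^{c-1} + 1, 2^c - 1\}$.

    ★ If $b = 1$, then Eq (2.4) is equivalent to the equation $2^c + 1 = 2^{a-c}$, which has no integer solution since $c \ge 2$ and $a > c$.

    ★ If $b = 2^{c-1} - 1$, then Eq (2.4) turns out to be $3 \times 2^{c-2} = 2^{a-c} - 2$. It then follows that $a - c = 3$ and $c - 2 = 1$, i.e., $c = 3$ and $s = 6$. So $y_0 = 2^{s-2} + 1 = 17$. This gives us a new nice 5-tuple (289,290,291,292,293) since $289 = 2^6 + 15^2$, $290 = 2^0 + 17^2$, $291 = 2^1 + 17^2$, $292 = 2^8 + 6^2$ and $293 = 2^2 + 17^2$.

    ★ If $b = 2^{c-1} + 1$, then Eq (2.4) simplifies to $2^{c-1} + 2^{c-2} = 2^{a-c}$. Clearly, the latter equation has no integer solution.

    ★ If $b = 2^c - 1$, then the left-hand side of (2.5) equals $2 \cdot 2^{c+1}$, so Eq (2.5) reduces to $2^{c+2} = 2^c(2^{a-c} + 1)$, i.e., $2^{a-c} = 3$, a contradiction.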
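The conclusion of Subcase 2.2.2 is that Eq (2.4), read as $2^{2c} + 2^c + 1 = 2^a + b^2$ with $c \ge 2$ and $a \ge 1$, has a single solution. The sketch below checks this by exhaustive search over a range of $c$; the function name and the bound `c_max` are assumptions for illustration, and the search only corroborates the argument on that range.

```python
from math import isqrt


def solutions_eq_2_4(c_max):
    """Return all (c, a, b) with 2 <= c <= c_max, a >= 1, b >= 0 and
    2**(2*c) + 2**c + 1 == 2**a + b**2."""
    found = []
    for c in range(2, c_max + 1):
        lhs = 2 ** (2 * c) + 2 ** c + 1
        a = 1
        while 2 ** a <= lhs:
            rest = lhs - 2 ** a
            b = isqrt(rest)
            if b * b == rest:
                found.append((c, a, b))
            a += 1
    return found


print(solutions_eq_2_4(30))
```

The triple $(c, a, b) = (3, 6, 3)$ corresponds to $s = c + 3 = 6$, $t = a + 2 = 8$ and $w = 2b = 6$, that is, to $y_0 = 17$ and the representation $292 = 2^8 + 6^2$.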

    CASE 3. Suppose the nice 5-tuple is $(y_0^2-1, y_0^2, y_0^2+1, y_0^2+2, y_0^2+3)$. Then $(y_0^2, y_0^2+1, y_0^2+2, y_0^2+3, y_0^2+4)$ is also a nice 5-tuple since $y_0^2 + 4 = y_0^2 + 2^2$. It then follows from Case 2 that $y_0 \in \{1, 3, 5, 17\}$. But one checks that $(y_0^2-1, y_0^2, y_0^2+1, y_0^2+2, y_0^2+3)$ cannot be a nice 5-tuple for $y_0 \in \{1, 5\}$ since neither 0 nor 28 is a 22-STP number. On the other hand, $y_0 = 3$ and $y_0 = 17$ give the new nice 5-tuples (8,9,10,11,12) and (288,289,290,291,292), as $8 = 2^2 + 2^2$ and $288 = 2^5 + 16^2$.

    Hence, the above cases give all the nice 5-tuples, which are (1,2,3,4,5), (2,3,4,5,6), (8,9,10,11,12), (9,10,11,12,13), (288,289,290,291,292) and (289,290,291,292,293).

    This completes the proof of Part (b).

    Thus we finish the proof of Theorem 1.2.

    In this section, we propose one general question. Let $a, b$ be given integers with $a, b \ge 2$. Let $n$ be a positive integer. We call the integer $n$ an ab-STP number if it can be expressed in the form $a^x + y^b$ with $x$ and $y$ being nonnegative integers. Furthermore, we call a $k$-tuple $(a_1, a_2, \ldots, a_k)$ ab-nice if $a_1, a_2, \ldots, a_k$ are consecutive integers in increasing order and each of them is an ab-STP number. Now we propose the following general question, which seems a little hard.

    Question 3.1. Let $a$ and $b$ be given integers with $a, b \ge 2$. For each integer $k \ge 2$, find all ab-nice $k$-tuples.
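Question 3.1 can be explored experimentally before attempting a proof. The sketch below generalizes the membership test to $a^x + y^b$ and reports the longest run of consecutive ab-STP integers below a bound; all names, the sample pairs $(a, b)$ and the bound 2000 are illustrative assumptions, and the output is only evidence on a finite range.

```python
def is_bth_power(n, b):
    """Return True if n = y**b for some integer y >= 0 (simple linear search)."""
    y = 0
    while y ** b < n:
        y += 1
    return y ** b == n


def is_ab_stp(n, a, b):
    """Return True if n = a**x + y**b for some nonnegative integers x and y."""
    power = 1  # a**0
    while power <= n:
        if is_bth_power(n - power, b):
            return True
        power *= a  # a >= 2, so the loop terminates
    return False


def longest_run(a, b, limit):
    """Return the length of the longest run of consecutive ab-STP integers in [1, limit]."""
    best = current = 0
    for n in range(1, limit + 1):
        current = current + 1 if is_ab_stp(n, a, b) else 0
        best = max(best, current)
    return best


for a, b in [(2, 2), (2, 3), (3, 2)]:
    print((a, b), longest_run(a, b, 2000))
```

For $(a, b) = (2, 2)$ the longest run has length 6, in agreement with Theorem 1.2; the other pairs are included only as starting points for experimentation.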

    Gaps in integer sequences form a broad class of problems in number theory. The gap between consecutive primes $|p_n - p_{n+1}|$ is one of the most important topics in analytic number theory. In the field of Diophantine analysis, there are many open questions on the gaps between powers $|x^m - y^n|$. In this paper, we considered $k$-tuples of consecutive integers $(a_1, a_2, \ldots, a_k)$ such that each of them is the sum of two perfect powers. We used elementary methods in number theory to prove that there exist infinitely many 4-tuples with each element of the form $2^x + y^2$, that no such 7-tuple exists, and we listed all such quintuples and sextuples. At the end of the paper, a general question was proposed for interested readers.

    The authors would like to thank the referees for their helpful comments. This work was supported partially by China Scholarship Council Foundation # 201908510050 and the Research Initiation Fund for Young Teachers of China West Normal University # 412679.

    The authors declare that there is no conflict of interest.

    [1] Kane MD, Jatkoe TA, Stumpf CR, et al. (2000) Assessment of the sensitivity and specificity of oligonucleotide (50 mer) microarrays. Nucleic Acids Res 28: 4552–4557. doi: 10.1093/nar/28.22.4552
    [2] Rahmann S (2002) Rapid large-scale oligonucleotide selection for microarrays. In Proc IEEE Comput Soc Bioinform Conf 1: 54–63.
    [3] Russell R (2003) Designing microarray oligonucleotide probes. Brief Bioinform 4: 361–367. doi: 10.1093/bib/4.4.361
    [4] Reymond N, Charle H, Duret L, et al. (2004) ROSO: optimizing oligonucleotide probes for microarrays. Bioinformatics 20: 271–273. doi: 10.1093/bioinformatics/btg401
    [5] He Z, Wu L, Fields MW, et al. (2005) Use of microarrays with different probe sizes for monitoring gene expression. Appl Environ Microbiol 71: 5154–5162. doi: 10.1128/AEM.71.9.5154-5162.2005
    [6] Li X, He Z, Zhou J (2005) Selection of optimal oligonucleotide probes for microarrays using multiple criteria, global alignment and parameter estimation. Nucleic Acids Res 33: 6114–6123.
    [7] Li F and Stormo GD (2001) Selection of optimal DNA oligos for gene expression arrays. Bioinformatics 17: 1067–1076.
    [8] Nielsen HB, Knudsen S (2002) Avoiding cross hybridization by choosing nonredundant targets on cDNA arrays. Bioinformatics 18: 321–322. doi: 10.1093/bioinformatics/18.2.321
    [9] Krause A, Krautner M, Meier H (2003) Accurate method for fast design of diagnostic oligonucleotide probe sets for DNA microarrays. IPDPS: 1–9.
    [10] Letowski J, Brousseau R, Masson L (2004) Designing better probes: effect of probe size, mismatch position and number on hybridization in DNA oligonucleotide microarrays. J Microbiol Meth 57: 269–278.
    [11] Nordberg EK (2005) YODA: selecting signature oligonucleotides. Bioinformatics 21: 1365-1370.
    [12] Jourdren L, Duclos A, Brion C, et al. (2010) Teolenn: an efficient and customizable workflow to design high-quality probes for microarray experiments. Nucleic Acids Res 38: e117. doi: 10.1093/nar/gkq110
    [13] Rouillard JM, Herbert CJ, Zuker M (2002) OligoArray: genome-scale oligonucleotide design for microarrays. Bioinformatics 18: 486–487. doi: 10.1093/bioinformatics/18.3.486
    [14] Sung W, Lee W (2003) Fast and accurate probe selection algorithm for large genomes. In Proc IEEE Comput Soc Bioinform Conf 2: 65–74.
    [15] Hyyrö H, Juhola M, Vihinen M (2005) Genome-wide selection of unique and valid oligonucleotides. Nucl Acids Res 33: e115.
    [16] Markowitz VM, Chen IA, Palaniappan K, et al. (2010) The integrated microbial genomes system: an expanding comparative analysis resource. Nucleic Acids Res 38: D382–D390.
    [17] Oh S, Yoder-Himes DR, Tiedje J, et al. (2010) Evaluating the performance of oligonucleotide microarrays for bacterial strains with increasing genetic divergence from the reference strain. Appl Environ Microbiol 76: 2980–2988. doi: 10.1128/AEM.02826-09
    [18] Hug LA, Salehi M, Nuin P, et al. (2011) Design and verification of a pangenome microarray oligonucleotide probe set for dehalococcoides spp. Appl Environ Microbiol 77: 5361–5369.
    [19] Markowitz VM, Mavromatis K, Ivanova NN, et al. (2009) IMG ER: a system for microbial genome annotation expert review and curation. Bioinformatics 25: 2271–2278. doi: 10.1093/bioinformatics/btp393
    [20] Davidsen T, Beck E, Ganapathy A, et al. (2010) The comprehensive microbial resource. Nucl Acids Res 38: D340–D345.
    [21] Rimour S, Hill D, Militon C, et al. (2005) GoArrays: highly dynamic and efficient microarray probe design. Bioinformatics 21: 1094–1103.
    [22] Rouillard JM, Gulari E (2009) OligoArrayDb: pangenomic oligonucleotide microarray probe sets database. Nucleic Acids Res 37: D938–D941. doi: 10.1093/nar/gkn761
    [23] Hyatt D, Chen GL, LoCascio PF, et al. (2010) Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11: 119. doi: 10.1186/1471-2105-11-119
    [24] Wu C, Carta R, Zhang L (2005) Sequence dependence of cross-hybridization on short oligo microarrays. Nucleic Acids Res 33: e84. doi: 10.1093/nar/gni082
    [25] Hu G, Llinás M, Li J, et al. (2007) Selection of long oligonucleotides for gene expression microarrays using weighted rank-sum strategy. BMC Bioinformatics 8: 350. doi: 10.1186/1471-2105-8-350
    [26] Flikka K, Yadetie F, Laegreid A (2004) XHM: A system for detection of potential cross hybridizations in DNA microarrays. BMC Bioinformatics 5: 117. doi: 10.1186/1471-2105-5-117
    [27] SantaLucia J J (1998) A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics. Proc Natl Acad Sci USA 95: 1460–1465.
    [28] Binder H, Preibisch S, Kirsten T (2005) Base pair interactions and hybridization isotherms of matched and mismatched oligonucleotide probes on microarrays. Langmuir 21: 9287–9302.
    [29] Binder H, Kirsten T, Loeffler Met, et al. (2004) Sensitivity of microarray oligonucleotide probes: variability and effect of base composition. J Phys Chem B 108: 18003–18014.
    [30] Liebich J, Schadt CW, Chong SC, et al. (2006) Improvement of oligonucleotide probe design criteria for functional gene microarrays in environmental applications. Appl Environ Microbiol 72: 1688–1691.
  • © 2017 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)