Research article

Sequence data analysis and preprocessing for oligo probe design in microbial genomes

  • Received: 12 December 2016 Accepted: 26 December 2016 Published: 25 January 2017
  • Good oligo probe design is crucial to obtaining reliable gene expression results in DNA microarray experiments. However, when sequence data from a very large microbial genome or pan-genome are passed directly to a probe designer, fewer oligos are produced and design quality suffers. The low oligo yield stems from gene redundancies and discrepancies across resources for the same species or strain, together with sequence similarity and homology. We addressed these sequence problems and introduced the concept of open reading frame (ORF) sequence segmentation, from which quality oligos can be selected. Sequence data were analyzed and pre-processed with our Perl-based pipeline, ORF-Purger 2.0. ORFs were purged of redundancy, discrepancy, invalidity, overlap, similarity and, optionally, homology, which drastically improved the quantity and quality of the oligos that could be designed. Probe integrity was proposed as the first probe selection criterion, because the full physical availability of all possible probes corresponding to their targets in a nucleic acid sample is necessary for the best probe design.
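As a rough illustration of the simplest purging step named above (redundancy), the following Python sketch drops ORFs whose nucleotide sequence exactly duplicates one already seen. It is not ORF-Purger 2.0, which is Perl-based and whose internals are not described here; the function name `purge_redundant_orfs`, the `(id, sequence)` input format, and the case-insensitive exact-match rule are all assumptions made for this example.

```python
def purge_redundant_orfs(orfs):
    """Keep the first ORF for each distinct sequence.

    `orfs` is a list of (orf_id, sequence) pairs. Two ORFs count as
    redundant here only if their sequences are identical ignoring case;
    this rule is an assumption for illustration, not the published method.
    """
    seen = set()
    kept = []
    for orf_id, seq in orfs:
        key = seq.upper()
        if key in seen:
            continue  # an identical sequence was already kept
        seen.add(key)
        kept.append((orf_id, seq))
    return kept


# Hypothetical toy input: orf2 duplicates orf1 apart from letter case.
orfs = [
    ("orf1", "ATGAAATAA"),
    ("orf2", "atgaaataa"),
    ("orf3", "ATGCCCTAG"),
]
unique_orfs = purge_redundant_orfs(orfs)
print([orf_id for orf_id, _ in unique_orfs])
```

Purging discrepancy, overlap, similarity, or homology would need annotation comparison or sequence alignment and is beyond this sketch.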

    Citation: Ruming Li, Brian Fristensky, Guixue Wang. Sequence data analysis and preprocessing for oligo probe design in microbial genomes[J]. AIMS Bioengineering, 2017, 4(1): 28-45. doi: 10.3934/bioeng.2017.1.28

  • © 2017 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)