AIMS Bioengineering, 2017, 4(1): 28-45. doi: 10.3934/bioeng.2017.1.28.

Research article


Sequence data analysis and preprocessing for oligo probe design in microbial genomes

1 Bioinformatics Laboratory, University of Manitoba, Winnipeg, MB R3T 2N2, Canada
2 Key Laboratory for Biorheological Science and Technology of Ministry of Education, Bioengineering College of Chongqing University, Chongqing 400044, China

Good oligo probe design is crucial for obtaining better gene expression analysis results from DNA microarray experiments. However, when sequence data from a very large microbial genome or pan-genome are processed directly by a probe designer, fewer oligos are produced and design quality suffers. Gene redundancies and discrepancies across data resources for the same species or strain, together with sequence similarity and homology among genes, are responsible for the low number of oligos designed. We addressed these sequence problems and introduced the concept of open reading frame (ORF) sequence segmentation, from which high-quality oligos can be selected. Analysis and pre-processing of sequence data were performed using our Perl-based pipeline ORF-Purger 2.0. ORFs were purged of redundancy, discrepancy, invalidity, overlap, similarity and, optionally, homology, which drastically improved both the quantity and the quality of the oligos designed. Probe integrity was proposed as the first probe selection criterion, since the full physical availability of all possible probes corresponding to their targets in a nucleic acid sample is necessary for optimal probe design.
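A minimal Python sketch of what ORF purging and segmentation can look like is given below. It is not ORF-Purger 2.0, which is written in Perl and whose exact rules the abstract does not specify. Everything here is an illustrative assumption: the names `ORF`, `is_valid`, `purge` and `segment` are hypothetical; an ORF is treated as invalid if it contains non-ACGT characters, is not a whole number of codons, or lacks a start or stop codon; redundancy is reduced to identical sequences; overlap is judged on genomic coordinates alone (strand ignored), keeping the longer ORF; and segmentation cuts each retained ORF into fixed-length, non-overlapping pieces from which oligo candidates could then be picked. Discrepancy, similarity and homology filtering, which would need cross-resource comparison or sequence alignment, are left out.

```python
from dataclasses import dataclass

# Illustrative sketch only; not the ORF-Purger 2.0 algorithm.
VALID_BASES = set("ACGT")
START_CODONS = {"ATG", "GTG", "TTG"}  # common bacterial start codons (assumed)
STOP_CODONS = {"TAA", "TAG", "TGA"}


@dataclass(frozen=True)
class ORF:
    """Hypothetical ORF record: name, 1-based inclusive coordinates, sequence."""
    name: str
    start: int
    end: int
    seq: str


def is_valid(orf: ORF) -> bool:
    """Assumed validity rule: ACGT only, whole codons, start and stop codons present."""
    s = orf.seq.upper()
    return (
        len(s) >= 6
        and len(s) % 3 == 0
        and set(s) <= VALID_BASES
        and s[:3] in START_CODONS
        and s[-3:] in STOP_CODONS
    )


def purge(orfs: list[ORF]) -> list[ORF]:
    """Drop invalid ORFs, exact-duplicate sequences, then ORFs overlapping a longer kept ORF."""
    # 1. Invalidity: discard ORFs failing the assumed validity rule.
    valid = [o for o in orfs if is_valid(o)]

    # 2. Redundancy (assumed to mean identical sequence): keep the first copy only.
    seen: set[str] = set()
    unique: list[ORF] = []
    for o in valid:
        key = o.seq.upper()
        if key not in seen:
            seen.add(key)
            unique.append(o)

    # 3. Overlap: visit the longest ORFs first and reject any ORF whose
    #    coordinates intersect one already kept (strand is ignored here).
    kept: list[ORF] = []
    for o in sorted(unique, key=lambda x: len(x.seq), reverse=True):
        if all(o.end < k.start or o.start > k.end for k in kept):
            kept.append(o)

    # Return the survivors in genomic order.
    return sorted(kept, key=lambda x: x.start)


def segment(orf: ORF, seg_len: int = 100) -> list[str]:
    """Cut an ORF into consecutive, non-overlapping segments of seg_len bases.

    The last segment may be shorter; seg_len = 100 is an arbitrary default.
    """
    if seg_len <= 0:
        raise ValueError("seg_len must be positive")
    return [orf.seq[i:i + seg_len] for i in range(0, len(orf.seq), seg_len)]
```

For instance, when one ORF sequence duplicates another, `purge` keeps only the first copy, and when two retained ORFs overlap, only the longer one survives. Preferring the longer ORF is just one possible tie-breaking rule; a real pipeline might instead favour better-annotated entries or a particular data resource.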

Keywords: sequence data analysis; gene sequence pre-processing; ORF purging; ORF segmentation; DNA microarray; oligo probe design; oligo probe quantity; oligo probe quality

Citation: Ruming Li, Brian Fristensky, Guixue Wang. Sequence data analysis and preprocessing for oligo probe design in microbial genomes. AIMS Bioengineering, 2017, 4(1): 28-45. doi: 10.3934/bioeng.2017.1.28





Copyright Info: 2017, Ruming Li, et al., licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License.

