BIOPHYSICAL THEORY AND MODELING
Prokaryotic Gene Finding based on Physicochemical Characteristics of Codons calculated from Molecular Dynamics Simulations
Poonam Singhal 1, B Jayaram 1, Surjit B Dixit 2 and David L Beveridge 2*
1 Indian Institute of Technology, New Delhi, India
2 Wesleyan University
* To whom correspondence should be addressed. E-mail: dbeveridge{at}wesleyan.edu.
Submitted on June 29, 2007
Revised on July 28, 2007
Accepted on November 29, 2007
Abstract
An ab initio model for gene prediction in prokaryotic genomes is proposed based on physicochemical characteristics of codons calculated from molecular dynamics (MD) simulations. The model requires three calculated quantities for each codon: the double-helical trinucleotide base pairing energy, the base pair stacking energy, and an index of the propensity of a codon for protein-nucleic acid interactions. The base pairing and stacking energies for each codon are obtained from recently reported MD simulations on all unique tetranucleotide steps (Beveridge et al., Biophys. J., 87, 3799-3813, 2004; Dixit et al., Biophys. J., 89, 3721-3740, 2005), and the third parameter is assigned based on the conjugate rule previously proposed to account for the wobble hypothesis with respect to degeneracies in the genetic code (Jayaram, J. Mol. Evol., 45, 704-705, 1997). The values of this third parameter, the interaction propensity, correlate well with ab initio MD-calculated solvation energies and flexibility of codon sequences, as well as with codon usage in genes and amino acid composition frequencies in ~175000 protein sequences in the Swissprot database. Assigning these three parameters to each codon enables calculation of the magnitude and orientation of a cumulative three-dimensional vector for a DNA sequence of any length in each of the six genomic reading frames. Analysis of 372 genomes comprising ~350000 genes shows that the orientations of the gene and non-gene vectors are well differentiated, making a clear distinction between genic and non-genic sequences feasible at a level equivalent to or better than currently available knowledge-based models trained on empirical data. This provides strong support for the possibility of a unique and useful physicochemical characterization of DNA sequences from codons to genomes.
Key Words:
Gene prediction, Molecular dynamics simulation, Physicochemical model, Prokaryotic gene finding, Whole genome analysis
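
The cumulative three-dimensional vector described in the abstract can be illustrated with a minimal Python sketch. It assumes each codon maps to a triple (base pairing energy, stacking energy, interaction propensity) and that the vector for a sequence is the plain sum of those triples over the codons of a reading frame. The abstract does not give the parameter values or the exact summation and classification rules, so `placeholder_params` returns made-up numbers purely so the code runs, and all function names here are hypothetical.

```python
from itertools import product
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def placeholder_params():
    """Return a dummy (x, y, z) triple for each of the 64 codons.

    ASSUMPTION: these numbers are arbitrary placeholders, NOT the
    MD-derived energies or the conjugate-rule index from the paper.
    """
    params = {}
    codons = ("".join(p) for p in product(BASES, repeat=3))
    for i, codon in enumerate(codons):
        params[codon] = (float(i % 4), float((i // 4) % 4), float(i // 16))
    return params


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def cumulative_vector(seq, params, offset=0):
    """Sum the per-codon triples over one reading frame of seq.

    Codons containing symbols other than A, C, G, T are skipped.
    """
    x = y = z = 0.0
    for i in range(offset, len(seq) - 2, 3):
        triple = params.get(seq[i:i + 3])
        if triple is None:
            continue
        x += triple[0]
        y += triple[1]
        z += triple[2]
    return (x, y, z)


def six_frame_vectors(seq, params):
    """Cumulative vectors for the three forward and three reverse frames."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = cumulative_vector(seq, params, offset)
        frames[f"-{offset + 1}"] = cumulative_vector(rc, params, offset)
    return frames


def orientation(vec):
    """Unit vector giving the direction of vec (zero vector returned as-is)."""
    norm = math.sqrt(sum(c * c for c in vec))
    return tuple(c / norm for c in vec) if norm else vec


if __name__ == "__main__":
    params = placeholder_params()
    for frame, vec in six_frame_vectors("ATGAAACGTTAG", params).items():
        print(frame, vec, orientation(vec))
```

With real per-codon parameters substituted for the placeholders, the orientation of each frame's vector would be the quantity compared between genic and non-genic sequences; how that comparison is thresholded is not specified here and is left out of the sketch.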