Biophysical Journal 71: 148-155 (1996)
© 1996 the Biophysical Society

The Shannon information entropy of protein sequences.

B J Strait and T G Dewey

Department of Chemistry, University of Denver, Colorado 80208, USA.

ABSTRACT

A comprehensive database is analyzed to determine the Shannon information content of a protein sequence. This information entropy is estimated by three methods: a k-tuplet analysis, a generalized Zipf analysis, and a "Chou-Fasman gambler." The k-tuplet analysis is a "letter" analysis, based on conditional sequence probabilities. The generalized Zipf analysis demonstrates the statistical linguistic qualities of protein sequences and uses the "word" frequency to determine the Shannon entropy. The Zipf analysis and k-tuplet analysis give Shannon entropies of approximately 2.5 bits/amino acid. This entropy is much smaller than the value of 4.18 bits/amino acid obtained from the nonuniform composition of amino acids in proteins. The "Chou-Fasman gambler" is an algorithm based on the Chou-Fasman rules for protein structure. It uses both sequence and secondary structure information to guess the number of amino acids that could appropriately substitute at a position in a sequence. As is the case for the English language, the gambler algorithm gives significantly lower entropies than the k-tuplet analysis. Using these entropies, the number of most probable protein sequences can be calculated. The number of most probable protein sequences is much less than the number of possible sequences but is still much larger than the number of sequences thought to have existed throughout evolution. Implications of these results for mutagenesis experiments are discussed.
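
The k-tuplet estimate and the count of most probable sequences can be illustrated with a minimal sketch. It is not the authors' procedure: the function names `conditional_entropy` and `log10_typical_sequences` are hypothetical, the estimator is the plain plug-in (frequency-count) form with no correction for finite sample size, and the count of most probable sequences is assumed to follow the standard typical-set approximation, about 2^(N·H) for N residues at H bits per residue.

```python
import math
from collections import Counter


def conditional_entropy(seq: str, k: int) -> float:
    """Plug-in estimate, in bits, of the entropy of a residue given the
    k-1 residues before it, from overlapping k-tuplets in `seq`.
    For k = 1 this is the ordinary composition entropy.
    (Hypothetical helper; no finite-sample correction is applied.)
    """
    tuples = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if k < 1 or not tuples:
        raise ValueError("need 1 <= k <= len(seq)")
    tuple_counts = Counter(tuples)
    # Count each (k-1)-residue context from the same set of k-tuplets,
    # so the conditional probabilities are consistent and H >= 0.
    prefix_counts = Counter(t[:-1] for t in tuples)
    n = len(tuples)
    h = 0.0
    for t, c in tuple_counts.items():
        p_joint = c / n                      # P(context, residue)
        p_cond = c / prefix_counts[t[:-1]]   # P(residue | context)
        h -= p_joint * math.log2(p_cond)
    return h


def log10_typical_sequences(length: int, bits_per_residue: float) -> float:
    """log10 of the approximate number of most probable sequences,
    assuming that number is about 2 ** (length * bits_per_residue)."""
    return length * bits_per_residue * math.log10(2)


if __name__ == "__main__":
    # Toy input only; a meaningful estimate needs a large sequence database.
    toy = "MKVLAAGIVGLLLAQPSFAADKVLAAGIVG"
    for k in (1, 2, 3):
        print(k, round(conditional_entropy(toy, k), 3))
    # Illustrative chain length of 100 residues (an assumption, not from the text).
    for h in (math.log2(20), 4.18, 2.5):
        print(round(h, 2), round(log10_typical_sequences(100, h), 1))
```

For an illustrative 100-residue chain, 2.5 bits/amino acid corresponds to roughly 10^75 most probable sequences, against roughly 10^130 possible sequences when all 20 amino acids are equally likely. On short inputs the plug-in estimate for larger k is biased low, because most contexts are seen only once.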



