Advances in Bioinformatics 2011

Prediction of Enzyme Mutant Activity Using Computational Mutagenesis and Incremental Transduction

DOI: 10.1155/2011/958129

Nada Basit, Harry Wechsler

Abstract:

Wet laboratory mutagenesis to determine enzyme activity changes is expensive and time consuming. This paper expands on standard one-shot learning by proposing an incremental transductive method (T2bRF) for the prediction of enzyme mutant activity during mutagenesis using Delaunay tessellation and 4-body statistical potentials for representation. Incremental learning is in tune with both eScience and actual experimentation, as it accounts for cumulative annotation effects of enzyme mutant activity over time. The experimental results reported, using cross-validation, show that overall the incremental transductive method proposed, using random forest as base classifier, yields better results compared to one-shot learning methods. T2bRF is shown to yield 90% on T4 and LAC (and 86% on HIV-1). This is significantly better than state-of-the-art competing methods, whose performance yield is at 80% or less using the same datasets.

1. Introduction

A chain of amino acids in a given sequence forms the primary structure that makes up a protein and determines its functions. Proteins are necessary for virtually every activity in the human body [1]. There are twenty distinct amino acids that make up the polypeptides. They are known as proteinogenic or standard amino acids [1, 2]. The order of these amino acids in the chain, known as the primary sequence, is very important. Changes in even one amino acid (e.g., substituting one kind of amino acid, at a given location, with a different one) can affect the way the protein functions, that is, its activity. Such a substitution is an example of a mutation in the protein’s amino acid sequence and is characteristic of a single-site mutation. The interplay between mutations and their effect on protein function is the domain of bioinformatics, in general, and computational mutagenesis, in particular.
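
As a rough illustration of the single-site substitution described above, the following Python sketch replaces one residue in a primary sequence and enumerates every possible single-site mutant. It is not taken from the paper: the function names, the 1-based position convention, the wild-type/position/new-residue labels, and the three-residue toy sequence are all assumptions made for this example.

```python
# Illustrative sketch only; names and conventions below are assumptions,
# not the authors' implementation.

# One-letter codes of the 20 standard (proteinogenic) amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def substitute(sequence: str, position: int, new_residue: str) -> str:
    """Return a copy of `sequence` with the residue at the given
    1-based position replaced by `new_residue`."""
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if new_residue not in AMINO_ACIDS:
        raise ValueError(f"{new_residue!r} is not a standard amino acid")
    if sequence[position - 1] == new_residue:
        raise ValueError("replacement equals the wild-type residue")
    return sequence[: position - 1] + new_residue + sequence[position:]


def single_site_mutants(sequence: str):
    """Yield (label, mutant_sequence) for every single-site substitution.

    Labels follow the pattern <wild type><position><new residue>.
    """
    for position, wild_type in enumerate(sequence, start=1):
        for residue in AMINO_ACIDS:
            if residue != wild_type:
                yield f"{wild_type}{position}{residue}", substitute(
                    sequence, position, residue
                )


if __name__ == "__main__":
    toy_sequence = "MKW"  # hypothetical toy sequence, not a real protein
    mutants = dict(single_site_mutants(toy_sequence))
    print(len(mutants))    # 3 positions x 19 alternatives each
    print(mutants["W3C"])  # tryptophan at position 3 replaced by cysteine
```

Each position admits 19 alternatives to its wild-type residue, so a chain of n residues has 19n single-site mutants. That count gives a sense of why exhaustive wet laboratory testing quickly becomes costly.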
Mutagenesis can be described as creating a mutation in the protein (in the amino acid chain) by substituting an original (or wild-type) amino acid at a given position in the chain with one of the other 19 amino acid types, for example, substituting the amino acid tryptophan at position 10 with cysteine at that same location in a particular protein [3]. The resulting mutated protein’s activity may be different from its wild-type counterpart (remaining active or becoming inactive). Experiments using mutagenesis enable researchers to collect data about protein activity with respect to mutations. Since wet lab experimentation is very expensive, finding a less expensive method, by being able to predict a protein’s activity/function, is

References

[1]	H. Lodish, Molecular Cell Biology, W.H. Freeman, New York, NY, USA, 5th edition, 2004.
[2]	T. H. Creighton, Proteins: Structures and Molecular Properties, W.H. Freeman, San Francisco, Calif, USA, 1993.
[3]	J. Pevsner, Bioinformatics and Functional Genomics, Wiley-Blackwell, Hoboken, NJ, USA, 2nd edition, 2009.
[4]	A. Z. Machalek, Inside the Cell, U.S. Department of Health and Human Services, 2007, http://www.nigms.nih.gov.
[5]	M. Masso and I. Vaisman, “Accurate prediction of enzyme mutant activity based on a multibody statistical potential,” Bioinformatics, vol. 23, no. 23, pp. 3155–3161, 2007.
[6]	M. Masso and I. Vaisman, “Accurate prediction of stability changes in protein mutants by combining machine learning with structure based computational mutagenesis,” Bioinformatics, vol. 24, no. 18, pp. 2002–2009, 2008.
[7]	H. M. Berman, J. Westbrook, Z. Feng et al., “The protein data bank,” Nucleic Acids Research, vol. 28, no. 1, pp. 235–242, 2000.
[8]	O. S. Platt, D. J. Brambilla, W. F. Rosse et al., “Mortality in sickle cell disease. Life expectancy and risk factors for early death,” The New England Journal of Medicine, vol. 330, no. 23, pp. 1639–1644, 1994.
[9]	D. R. Bloch, Organic Chemistry Demystified, McGraw-Hill, New York, NY, USA, 2006.
[10]	D. L. Nelson and M. M. Cox, Lehninger's Principles of Biochemistry, W.H. Freeman, New York, NY, USA, 4th edition, 2005.
[11]	“The Twenty Amino Acids,” Birkbeck University, London, UK, 2010, http://www.cryst.bbk.ac.uk/education/AminoAcid/the_twenty.html.
[12]	I. Vaisman, A. Tropsha, and W. Zheng, “Compositional preferences in quadruplets of nearest neighbor residues in protein structures: statistical geometry analysis,” in Proceedings of the IEEE Symposium on Intelligent Systems, pp. 163–168, 1998.
[13]	M. Masso, K. Hijazi, N. Parvez, and I. Vaisman, “Computational mutagenesis of E. coli lac repressor: insight into structure-function relationships and accurate prediction of mutant activity,” in Lecture Notes in Bioinformatics, I. Mandoiu, R. Sunderraman, and A. Zelikovsky, Eds., vol. 4983, pp. 390–401, Springer, Berlin, Germany, 2008.
[14]	R. K. Singh, A. Tropsha, and I. Vaisman, “Delaunay tessellation of proteins: four body nearest-neighbor propensities of amino acid residues,” Journal of Computational Biology, vol. 3, no. 2, pp. 213–221, 1996.
[15]	C. B. Barber, D. P. Dobkin, and H. Huhdanpaa, “The quickhull algorithm for convex hulls,” ACM Transactions on Mathematical Software, vol. 22, no. 4, pp. 469–483, 1996.
[16]	I. Vaisman, “Statistical and computational geometry of biomolecular structure,” in Handbook of Computational Statistics, J. E. Gentle, W. Härdle, and Y. Mori, Eds., Springer, Berlin, Germany, 2004.
[17]	M. Masso and I. Vaisman, “Comprehensive mutagenesis of HIV-1 protease: a computational geometry approach,” Biochemical and Biophysical Research Communications, vol. 305, no. 2, pp. 322–326, 2003.
[18]	M. Masso, “Knowledge-based study of protein structure-function correlations using computational geometry,” in Proceedings of the IEEE International Conference on Bioinformatics & Biomedicine (BIBM '09) Tutorial, George Mason University, Washington, DC, USA, 2009.
[19]	V. Cherkassky and F. Mulier, Learning from Data: Concepts, Theory, and Methods, John Wiley & Sons, New York, NY, USA, 2nd edition, 2007.
[20]	X. Zhu, “Semi-supervised learning literature survey,” 2005, http://pages.cs.wisc.edu/~jerryzhu/pub/ssl_survey.pdf.
[21]	V. Vapnik, Estimation of Dependencies Based on Empirical Data, Springer, New York, NY, USA, 1982.
[22]	V. Vapnik, Statistical Learning Theory, John Wiley & Sons, New York, NY, USA, 1998.
[23]	O. Chapelle, B. Schölkopf, and A. Zien, Semi-Supervised Learning, MIT Press, 2006.
[24]	T. M. Cover and P. Hart, “Nearest neighbor pattern classification,” IEEE Transactions on Information Theory, vol. 13, pp. 21–27, 1967.
[25]	R. El-Yaniv and L. Gerzon, “Effective transductive learning via objective model selection,” Pattern Recognition Letters, vol. 26, no. 13, pp. 2104–2115, 2005.
[26]	M. Masso, Z. Lu, and I. Vaisman, “Computational mutagenesis studies of protein structure-function correlations,” Proteins, vol. 64, no. 1, pp. 234–245, 2006.
[27]	P. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining, Addison-Wesley, 2006.
[28]	S. Russell and P. Norvig, Artificial Intelligence—A Modern Approach, Prentice Hall, New York, NY, USA, 3rd edition, 2010.
[29]	R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee, “Boosting the margin: a new explanation for the effectiveness of voting methods,” Annals of Statistics, vol. 26, no. 5, pp. 1651–1686, 1998.
[30]	J. Friedman, T. Hastie, and R. Tibshirani, “Additive logistic regression: a statistical view of boosting,” Annals of Statistics, vol. 28, no. 2, pp. 337–407, 2000.
[31]	B. Boser, I. Guyon, and V. Vapnik, “A training algorithm for optimal margin classifiers,” Computational Learning Theory, vol. 5, pp. 144–152, 1992.
[32]	J. Weston, F. Pérez-Cruz, O. Bousquet, O. Chapelle, A. Elisseeff, and B. Schölkopf, “Feature selection and transduction for prediction of molecular bioactivity for drug design,” Bioinformatics, vol. 19, no. 6, pp. 764–771, 2003.
[33]	MATLAB version 6.5.0 / 7.10.0, http://www.mathworks.com.
[34]	WEKA version 3.7.1, http://www.cs.waikato.ac.nz/ml/weka.