Chinese Science Bulletin (科学通报), 2015

Big Data Problems in Drug Molecule Design

DOI: 10.1360/N972014-01144, pp. 558-565

Yan Xin, Ding Peng, Liu Zhihong, Wang Ling, Liao Chenzhong, Gu Qiong, Xu Jun (严鑫, 丁鹏, 刘志红, 王领, 廖晨钟, 顾琼, 徐峻)

Keywords: big data, drug design, bioinformatics, cheminformatics, high-performance computing


Abstract:

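Big data in drug innovation come mainly from four sources: high-throughput experiments, high-performance simulation and computation, informatization, and the scientific and patent literature. These data make it possible to observe, at the systems level, new phenomena and patterns in how drug molecules interact with many targets, and so to improve the efficiency of drug innovation. They also raise new challenges: storage, indexing/annotation and quality control, visualization, data mining, and computational complexity. These problems can be solved progressively by developing parallel computing methods supported by supercomputing and cloud services. Because such data are discrete, incomplete, and have a low signal-to-noise ratio, it is difficult to extract from them a continuous functional relationship between the structure of a substance and its activity; Bayesian learning machines, and their combination with support vector machines and decision trees, are therefore the direction in which big data mining is developing. Big data are both a result and a cause of high-throughput experimentation and the informatization of society, and solving the big data mining problem correctly is the key to improving the efficiency of drug innovation.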
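The abstract names Bayesian learning as the way forward for mining structure-activity data that are discrete, incomplete, and noisy. The following is a minimal sketch, not the authors' method: it assumes each compound is encoded as a binary substructure fingerprint in which None marks a bit that was never determined, and it implements a Bernoulli naive Bayes classifier with Laplace smoothing in plain Python. The class name `BernoulliNaiveBayes`, the `alpha` parameter, and the toy data are hypothetical illustrations.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Sequence

# Hypothetical encoding: each compound is a sequence of substructure flags
# (1 = present, 0 = absent, None = not determined / missing).
Fingerprint = Sequence[Optional[int]]


class BernoulliNaiveBayes:
    """Bernoulli naive Bayes with Laplace smoothing.

    Missing bits (None) are skipped during training and prediction,
    so incomplete records still contribute what they do contain.
    """

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha  # Laplace smoothing pseudo-count
        self.class_counts: Dict[str, int] = defaultdict(int)
        # counts[label][j] = [records with bit j == 0, records with bit j == 1]
        self.counts: Dict[str, List[List[int]]] = {}

    def fit(self, X: List[Fingerprint], y: List[str]) -> "BernoulliNaiveBayes":
        n_features = len(X[0])
        for x, label in zip(X, y):
            self.class_counts[label] += 1
            table = self.counts.setdefault(
                label, [[0, 0] for _ in range(n_features)]
            )
            for j, v in enumerate(x):
                if v is not None:  # only observed bits are counted
                    table[j][v] += 1
        return self

    def log_scores(self, x: Fingerprint) -> Dict[str, float]:
        """Unnormalized log posterior log P(class) + sum_j log P(x_j | class)."""
        total = sum(self.class_counts.values())
        scores: Dict[str, float] = {}
        for label, n in self.class_counts.items():
            score = math.log(n / total)  # log prior
            for j, v in enumerate(x):
                if v is None:
                    continue  # unobserved bit contributes nothing
                zeros, ones = self.counts[label][j]
                hits = ones if v == 1 else zeros
                # Smoothed P(bit j == v | class); the denominator uses only
                # records where bit j was actually observed.
                p = (hits + self.alpha) / (zeros + ones + 2 * self.alpha)
                score += math.log(p)
            scores[label] = score
        return scores

    def predict(self, x: Fingerprint) -> str:
        scores = self.log_scores(x)
        return max(scores, key=scores.get)


# Toy usage with made-up fingerprints (illustration only).
X = [
    [1, 1, 0],
    [1, None, 0],
    [0, 0, 1],
    [0, 1, 1],
]
y = ["active", "active", "inactive", "inactive"]
model = BernoulliNaiveBayes().fit(X, y)
print(model.predict([1, 0, None]))  # prints "active"
```

Skipping missing bits, rather than imputing them, is one simple way to let incomplete records be used, and working in log space avoids underflow when fingerprints have many bits. The combinations with support vector machines or decision trees that the abstract mentions are not shown here.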

