Consistency of VDJ Rearrangement and Substitution Parameters Enables Accurate B Cell Receptor Sequence Annotation

DOI: 10.1371/journal.pcbi.1004409


Abstract:

VDJ rearrangement and somatic hypermutation work together to produce antibody-coding B cell receptor (BCR) sequences for a remarkable diversity of antigens. It is now possible to sequence these BCRs in high throughput, and analysis of these sequences is bringing new insight into how antibodies develop, in particular for broadly neutralizing antibodies against HIV and influenza. A fundamental step in such sequence analysis is to annotate each base as coming from a specific one of the V, D, or J genes, or from an N-addition (also known as a non-templated insertion). Previous work has used simple parametric distributions to model transitions from state to state in a hidden Markov model (HMM) of VDJ recombination, and has assumed that mutations occur via the same process across sites. However, codon frame and other effects have been observed to violate these parametric assumptions for such coding sequences, suggesting that a non-parametric approach to modeling the recombination process could be useful. We find that large modern data sets indeed support a model that uses parameter-rich per-allele categorical distributions for HMM transition probabilities and per-allele-per-position mutation probabilities, and that using such a model for inference leads to significantly improved results. We present an accurate and efficient BCR sequence annotation software package that uses a novel HMM “factorization” strategy. This package, called partis (https://github.com/psathyrella/partis/), is built on a new general-purpose HMM compiler that can perform efficient inference given a simple text description of an HMM.
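To make the idea of per-position categorical parameters concrete, the following is a minimal illustrative sketch, not the partis implementation. It builds a toy HMM with one state per germline position for a single V gene and a single J gene (no D gene), a single N-insertion state, and then labels each base of a read with Viterbi decoding. The gene sequences, all probabilities, and the names `build_toy_hmm`, `viterbi`, `v_exit`, `j_entry`, `p_insert`, and `n_stay` are assumptions made up for this example. It also assumes the read starts at the first V position and ends at the last J position.

```python
import math

NEG_INF = float("-inf")

def log(p):
    """Log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF

def build_toy_hmm(v_gene, j_gene, v_mut, j_mut, v_exit, j_entry,
                  p_insert=0.8, n_stay=0.5):
    """Toy V-N-J HMM. Each germline position is its own state, so exit
    (deletion) and mutation probabilities can differ per position."""
    n_v, n_j = len(v_gene), len(j_gene)
    insert = ("N", 0)
    states = [("V", i) for i in range(n_v)] + [insert] + [("J", i) for i in range(n_j)]
    trans = {s: {} for s in states}
    emit = {insert: {b: 0.25 for b in "ACGT"}}  # uniform N-addition emissions

    def germline_emission(base, mut):
        # Germline base with probability 1 - mut, each other base mut / 3.
        return {b: (1 - mut) if b == base else mut / 3 for b in "ACGT"}

    for i in range(n_v):
        s = ("V", i)
        emit[s] = germline_emission(v_gene[i], v_mut[i])
        leave = 1.0 if i == n_v - 1 else v_exit[i]  # last V position must exit
        if i < n_v - 1:
            trans[s][("V", i + 1)] = 1 - leave
        trans[s][insert] = leave * p_insert
        for k in range(n_j):  # categorical choice of where J is entered
            trans[s][("J", k)] = leave * (1 - p_insert) * j_entry[k]

    trans[insert][insert] = n_stay
    for k in range(n_j):
        trans[insert][("J", k)] = (1 - n_stay) * j_entry[k]

    for i in range(n_j):
        s = ("J", i)
        emit[s] = germline_emission(j_gene[i], j_mut[i])
        if i < n_j - 1:
            trans[s][("J", i + 1)] = 1.0
    return states, trans, emit

def viterbi(seq, states, trans, emit, start, end):
    """Return the most probable region label (V, N, or J) for each base."""
    score = {s: NEG_INF for s in states}
    score[start] = log(emit[start][seq[0]])
    back = []
    for base in seq[1:]:
        new_score = {s: NEG_INF for s in states}
        pointers = {}
        for prev, prev_score in score.items():
            if prev_score == NEG_INF:
                continue
            for nxt, p in trans[prev].items():
                cand = prev_score + log(p) + log(emit[nxt][base])
                if cand > new_score[nxt]:
                    new_score[nxt] = cand
                    pointers[nxt] = prev
        score = new_score
        back.append(pointers)
    if score[end] == NEG_INF:
        raise ValueError("no valid path through the toy HMM")
    path = [end]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return "".join(region for region, _ in path)

# Hypothetical germline genes and parameters, chosen only for illustration.
v_gene, j_gene = "ACGTAC", "TTGG"
states, trans, emit = build_toy_hmm(
    v_gene, j_gene,
    v_mut=[0.05] * 6, j_mut=[0.05] * 4,
    v_exit=[0.05] * 6, j_entry=[0.7, 0.2, 0.1, 0.0],
)
labels = viterbi("ACGTACGATTGG", states, trans, emit,
                 start=("V", 0), end=("J", 3))
print(labels)
```

Because `v_mut`, `j_mut`, `v_exit`, and `j_entry` are plain per-position lists, they could in principle be filled with empirical frequencies from data rather than values from a parametric family, which is the contrast the abstract draws. A realistic model would also need D genes, multiple alleles per gene, and parameter estimation, all of which this sketch omits.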

