%0 Journal Article %T Finding biomarkers in non-model species: literature mining of transcription factors involved in bovine embryo development %A Turenne Nicolas %A Tiys Evgeniy %A Ivanisenko Vladimir %A Yudin Nikolay %J BioData Mining %D 2012 %I BioMed Central %R 10.1186/1756-0381-5-12 %X Background Since processes in well-known model organisms have specific features different from those in Bos taurus, the organism under study, a good way to describe gene regulation in ruminant embryos would be a species-specific consideration of closely related species to cattle, sheep and pig. However, as highlighted by a recent report, gene dictionaries in pig are smaller than in cattle, bringing a risk to reduce the gene resources to be mined (and so for sheep dictionaries). Bioinformatics approaches that allow an integration of available information on gene function in model organisms, taking into account their specificity, are thus needed. Besides these closely related and biologically relevant species, there is indeed much more knowledge of (i) trophoblast proliferation and differentiation or (ii) embryogenesis in human and mouse species, which provides opportunities for reconstructing proliferation and/or differentiation processes in other mammalian embryos, including ruminants. The necessary knowledge can be obtained partly from (i) stem cell or cancer research to supply useful information on molecular agents or molecular interactions at work in cell proliferation and (ii) mouse embryogenesis to supply useful information on embryo differentiation. However, the total number of publications for all these topics and species is great and their manual processing would be tedious and time consuming. This is why we used text mining for automated text analysis and automated knowledge extraction. To evaluate the quality of this ˇ°miningˇ±, we took advantage of studies that reported gene expression profiles during the elongation of bovine embryos and defined a list of transcription factors (or TF, n = 64) that we used as biological ˇ°gold standardˇ±. When successful, the ˇ°miningˇ± approach would identify them all, as well as novel ones. Methods To gain knowledge on molecular-genetic regulations in a non model organism, we offer an approach based on literature-mining and score arrangement of data from model organisms. This approach was applied to identify novel transcription factors during bovine blastocyst elongation, a process that is not observed in rodents and primates. As a result, searching through human and mouse corpuses, we identified numerous bovine homologs, among which 11 to 14% of transcription factors including the gold standard TF as well as novel TF potentially important to gene regulation in ruminant embryo development. The scripts of the workflow are written in Perl and available on demand. They require data input coming from all various databa %U http://www.biodatamining.org/content/5/1/12