Cite this article as:

Tverdokhlebov V. A., Kariakin D. A. Classification and Recognition of Structures of Genetic Sequences. Izv. Saratov Univ. (N. S.), Ser. Math. Mech. Inform., 2019, vol. 19, iss. 3, pp. 338-350. DOI: https://doi.org/10.18500/1816-9791-2019-19-3-338-350


Published online: 31.08.2019
Language: Russian
UDC: 501.1

Classification and Recognition of Structures of Genetic Sequences

Abstract: 

To study the relationships between the properties of organisms and the properties of the corresponding genetic sequences, we propose a classification of genetic sequences based on numerical indicators of recurrent and Z-recurrent shapes, which define the structure of functional relationships between the elements of a sequence. We introduce a method of classifying genetic sequences by these numerical indicators. For each sequence of a biological rank considered in the recognition problem, which has a meaningful interpretation in the application area, we compare a numerical characteristic that generalizes numerical values with a numerical characteristic of the recurrent or Z-recurrent shapes that determine the structure of the sequence. The recognition problem is considered from two points of view: determining whether a sequence belongs to a specific rank of sequences, and determining which group of sequences contains an experimental sequence. The main mathematical difficulties in solving these recognition problems are associated with searching for differences in the numerical representations of the recurrent and Z-recurrent shapes of experimental sequences. To overcome these difficulties, we construct a spectrum of numerical indicators of recurrent and Z-recurrent shapes. Classification and recognition of sequences are illustrated by an example with three ranks of genetic codes of organisms, each represented by 5 sequences. The Z-recurrent shape is introduced to define and extend the classification of sequences and to increase the efficiency of recognition methods.

References

1. Tverdokhlebov V. A. Geometric Shape Automaton Mappings, Recurrent and Z-recurrent Definition Sequences. Izv. Saratov Univ. (N.S.), Ser. Math. Mech. Inform., 2016, vol. 16, iss. 2, pp. 232–241 (in Russian). DOI: https://doi.org/10.18500/1816-9791-2016-16-2-232-241

2. Tverdokhlebov V. A. Z-recurrent definition sequences in the tasks of monitoring and diagnosing processes in systems. Reports of the Academy of Military Sciences, 2016, no. 2 (70), pp. 43–47 (in Russian).
3. Kariakin D. A. Analysis of genetic codes by indicators interposition of nucleotides. In: Komp’yuternye nauki i informatsionnye tekhnologii [Computer Science and Information Technology: Proc. Int. Sci. Conf.]. Saratov, Publ. Center “Nauka”, 2016, pp. 190–193 (in Russian).
4. Lewin B. Geny [Genes]. Moscow, BINOM, Laboratoriya znanij Publ., 2011. 896 p. (in Russian).
5. Watson D. Dvojnaya spiral’. Vospominaniya ob otkrytii struktury DNK [Double helix. Memories of the discovery of the structure of DNA]. Moscow. Mir, 1969. 152 p. (in Russian).
6. Hogeweg P. The Roots of Bioinformatics in Theoretical Biology. PLoS Computational Biology, 2011, vol. 7, iss. 3, art. ID e1002021. DOI: https://doi.org/10.1371/journal.pcbi.1002021
7. Wattam A. R., Abraham D., Dalay O., Disz T. L., Driscoll T., Gabbard J. L., Gillespie J. J., Gough R., Hix D., Kenyon R., Machi D., Mao C., Nordberg E. K., Olson R., Overbeek R., Pusch G. D., Shukla M., Schulman J., Stevens R. L., Sullivan D. E., Vonstein V., Warren A., Will R., Wilson M. J., Yoo H. S., Zhang C., Zhang Y., Sobral B. W. PATRIC, the bacterial bioinformatics database and analysis resource. Nucleic Acids Res., 2014, vol. 42, iss. D1, pp. D581–D591. DOI: https://doi.org/10.1093/nar/gkt1099
8. Barnett D. W., Garrison E. K., Quinlan A. R., Stromberg M. P., Marth G. T. BamTools: a C++ API and toolkit for analyzing and managing BAM files. Bioinformatics, 2011, vol. 27, iss. 12, pp. 1691–1692. DOI: https://doi.org/10.1093/bioinformatics/btr174
9. Plieskatt J., Rinaldi G., Brindley P. J., Jia X., Potriquet J., Bethony J., Mulvenna J. Bioclojure: a functional library for the manipulation of biological sequences. Bioinformatics, 2014, vol. 30, iss. 17, pp. 2537–2539. DOI: https://doi.org/10.1093/bioinformatics/btu311
10. Goto N., Prins P., Nakao M., Bonnal R., Aerts J., Katayama T. BioRuby: bioinformatics software for the Ruby programming language. Bioinformatics, 2010, vol. 26, iss. 20, pp. 2617–2619. DOI: https://doi.org/10.1093/bioinformatics/btq475
11. de Brevern A. G., Meyniel J. P., Fairhead C., Neuvéglise C., Malpertuy A. Trends in IT Innovation to Build a Next Generation Bioinformatics Solution to Manage and Analyse Biological Big Data Produced by NGS Technologies. BioMed Research International, vol. 2015, art. ID 904541, 15 p. DOI: http://dx.doi.org/10.1155/2015/904541
12. Schuster S. C. Next-generation sequencing transforms today’s biology. Nature Methods, 2008, vol. 5, iss. 1, pp. 16–18. DOI: https://doi.org/10.1038/nmeth1156
13. Singer M., Berg P. Geny i genomy [Genes and genomes]. Moscow, Mir, 1998. 391 p. (in Russian).
14. Berg J. M., Tymoczko J. L., Stryer L. DNA, RNA, and the Flow of Genetic Information. In: Berg J. M., Tymoczko J. L., Stryer L. Biochemistry. 5th. ed. New York, W. H. Freeman and Company, 2002. 1515 p.
15. NCBI Genome List. Available at: http://www.ncbi.nlm.nih.gov/genome/browse/ (accessed 18 December 2017).
