|
A top-down approach to classify enzyme functional classes and sub-classes using random forest

Abstract: Recent advances in sequencing technologies have led to exponential growth in the number of known protein sequences, bringing new metabolic pathways to light. For many of these newly found sequences, identifying the biological function is of prime interest to biologists. In a biology lab, scientists conduct expensive and time-consuming experiments to decipher the function of a sequence. One of the questions they often strive to address is whether the query protein is an enzyme or a non-enzyme. Enzymes catalyze biochemical reactions, but they do so through different mechanisms that depend on their biochemical properties. This has led to an interesting problem in bioinformatics: given a protein sequence, how well can we classify it as an enzyme and accurately predict its function?

In light of the key biological role of enzyme proteins, the Enzyme Commission (EC) of the International Union of Biochemistry and Molecular Biology (NC-IUBMB) has created a hierarchical classification scheme based on the functional mechanism of enzymes [1]. Each enzyme is assigned an EC number of the format X.Y.Z.W, where 'X', at the top of the scheme, represents one of the six main classes (1 to 6), and each main class is further sub-divided into three levels of the hierarchy (Y.Z.W). The six main classes are Oxidoreductases (1), Transferases (2), Hydrolases (3), Lyases (4), Isomerases (5), and Ligases (6). Given the costly experiments needed to determine an enzyme's mechanism, there is a need for an automated method that can reliably predict the EC function class and thus significantly expedite experimental investigation of the query enzyme.

Enzyme function classification has engaged bioinformaticians for a considerable time, resulting in a range of feature extraction methods for tackling this problem.
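As an illustration of the EC numbering scheme described above, the following minimal Python sketch splits an EC number of the form X.Y.Z.W into its four hierarchy levels and looks up the main class name. The function names `parse_ec_number` and `main_class_name` are hypothetical and chosen only for this example; the sketch assumes a fully specified, purely numeric EC number and the six main classes listed in the text.

```python
# Main classes as listed in the text; the dictionary name is illustrative.
EC_MAIN_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
}


def parse_ec_number(ec):
    """Split an EC number 'X.Y.Z.W' into a tuple of four integers.

    Assumption: all four levels are present and numeric. Partially
    specified numbers are rejected in this sketch.
    """
    # Tolerate surrounding whitespace and a trailing dot ("X.Y.Z.W.").
    parts = ec.strip().rstrip(".").split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four levels in EC number, got {ec!r}")
    try:
        x, y, z, w = (int(p) for p in parts)
    except ValueError:
        raise ValueError(f"non-numeric level in EC number {ec!r}") from None
    if x not in EC_MAIN_CLASSES:
        raise ValueError(f"unknown main class {x} in EC number {ec!r}")
    return x, y, z, w


def main_class_name(ec):
    """Return the name of the main class (level X) of an EC number."""
    return EC_MAIN_CLASSES[parse_ec_number(ec)[0]]


if __name__ == "__main__":
    print(parse_ec_number("3.4.21.4"))
    print(main_class_name("3.4.21.4"))
```

Keeping the four levels as separate integers mirrors the hierarchy: a classifier that works from the top of the scheme downward can first predict X and then refine the prediction at the lower levels.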
There are three prominent approaches that have been widely experimented with: first, using sequence similarity between enzymes belonging to the same f
|