Thursday, May 2, 2019
Efficiency of Clustering algorithms for mining large biological data Research Paper
Clustering algorithms are categorized into partitioning, hierarchical and graph-based techniques. The most widely used of the three are the graph-based and hierarchical techniques. Although partitioning techniques are employed in other disciplines, they are less used in sequence clustering, and as such there is no substantial body of theory on whether the partitioning methods are efficient. This study analyzes four clustering mining algorithms using four large protein sequence data sets. The analysis highlights the weaknesses and shortcomings of the four and proposes a new algorithm that addresses those shortcomings.

Introduction

Today, more than one million protein sequences are known (Sasson et al., 2002), and as such there is a need in bioinformatics to identify meaningful patterns for the purpose of understanding their functions. For a long time, protein and gene sequences have been analyzed, compared and grouped using alignment methods. According to Cai et al. (2000), alignment methods are algorithms constructed to arrange RNA, DNA and protein sequences so as to detect similarities that may result from evolutionary, functional or structural relationships between the sequences. Mount (2002) asserts that comparing and clustering sequences is done using the pair-wise alignment method, which is of two types: global and local. The local alignment algorithm proposed by Smith and Waterman (Bolten et al., 2001) is used to identify amino acid patterns that have been conserved in protein sequences. The global alignment algorithm proposed by Needleman and Wunsch (Bolten et al., 2001) attempts to align as many characters as possible across the entire length of the sequences. The pair-wise alignment method is expensive when it comes to comparing and clustering a large protein data set.
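The difference between global and local pair-wise alignment can be made concrete with a minimal sketch. It assumes a simple linear scoring scheme (match +1, mismatch -1, gap -2) chosen only for illustration; practical protein alignment would normally use a substitution matrix and affine gap penalties, and none of the names or values below come from the cited papers. The helper `all_pairs_scores` is likewise hypothetical and only shows the all-against-all comparison step.

```python
from itertools import combinations

# Illustrative scoring values (assumptions, not taken from the cited papers).
MATCH, MISMATCH, GAP = 1, -1, -2


def global_score(a, b):
    """Needleman-Wunsch style score: both sequences are aligned end to end."""
    # Row 0: aligning an empty prefix of `a` against j characters of `b`.
    prev = [j * GAP for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * GAP]  # Column 0: i characters of `a` against nothing.
        for j in range(1, len(b) + 1):
            sub = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            curr.append(max(prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                            prev[j] + GAP,       # gap in b
                            curr[j - 1] + GAP))  # gap in a
        prev = curr
    return prev[-1]


def local_score(a, b):
    """Smith-Waterman style score: best-scoring pair of substrings."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            sub = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            # Scores are floored at 0 so an alignment can restart anywhere.
            cell = max(0, prev[j - 1] + sub, prev[j] + GAP, curr[j - 1] + GAP)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def all_pairs_scores(seqs, score=local_score):
    """Score every unordered pair: n * (n - 1) / 2 alignments for n sequences."""
    return {(i, j): score(seqs[i], seqs[j])
            for i, j in combinations(range(len(seqs)), 2)}
```

Each single alignment fills a table of size len(a) by len(b), and `all_pairs_scores` repeats that work for every pair of sequences, so the total cost of the pair-wise approach grows quickly as the data set gets larger.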
This is because computation involves a very large number of comparisons: every protein in a data set is compared with every other protein in that data set (Bolten et al., 2001). This brings into question the efficiency of the pair-wise alignment methods in comparing and clustering large protein data sets. The pair-wise alignment methods, both local and global, do not take into consideration the size of the data set, especially data sets so large that they may overwhelm the computer's memory.

Han & Kamber (2000) argue that unsupervised learning aims to identify, within a data set, a sensible partition or an inherent pattern with the help of a distance function. Biology and the life sciences have extensively exploited clustering techniques in sequence analysis to classify similar sequences into either protein or gene families (Galperin & Koonin, 2001). Currently, protein sequences can be grouped by similar patterns using various readily available sequencing and clustering methods. As mentioned earlier, these methods can be grouped as graph-based, partitioning and hierarchical methods. These methods, especially the graph-based and hierarchical ones, have been used consecutively or together to complement each other, as argued by Sasson et al. (2002), Sperisen & Pagni (2005), Essoussi & Fayech (2007) and Enright & Ouzounis (2000). In the field of protein comparison and sequence clustering, there are very few instances in which partitioning techniques have been used. For instance, Guralnik & Karypis (2001) proposed an algorithm or sequencing method-on the