An Improved Exon-Intron Recognition via a Committee of Machines
CONTRIBUTORS:
JOURNAL:
YEAR: 2008
PUB TYPE: Journal Article
SUBJECT(S): intron; exon; committee machines; machine recognition
DISCIPLINE: Information Systems/Technology
HTTP: http://www.ics.uplb.edu.ph/node/282
LANGUAGE: English
PUB ID: 103-444-121 (Last edited on 2008/10/20 06:11:50 GMT-6)
SPONSOR(S):
ABSTRACT:
The human genome is a sequence of base pairs in which the segments of genes that code for proteins are called exons. Exons are bounded by subsequences, called introns, that are spliced out prior to translation. In RNA splicing, the procedure researchers currently follow to recognize these boundaries is the GU-AG heuristic, which has the following motif: exon/GU-intron-AG/exon. However, the dinucleotides GU and AG occur so frequently that a typical intron contains several of each, so many false boundaries are recognized. Other researchers have employed several methodologies to automate the recognition of these sites, such as support vector machines, hidden Markov models, and artificial neural networks (ANN), for which the reported maximum recognition accuracy on a production set is only 81%. A production set is a set of DNA sequences whose intron-exon boundaries are known but which were not used in the development of the model. A committee of machines is a computational methodology in which the outputs of multiple models are combined into a single output. The member models' outputs are combined using one of several methodologies, such as averaging, boosting, bagging, and simple majority voting. It has been shown, both theoretically and empirically, that the output of the committee machine is superior to those of its constituent member models. In this effort, we developed a committee of neural network classifiers trained to classify whether a given 60-bp-long DNA sequence is an intron-exon (IE) boundary (acceptor site), an exon-intron (EI) boundary (donor site), or neither (N). On the same production set used by other researchers, our committee machine correctly recognized 84% of the DNA sequences, improving the recognition rate by 3 percentage points.
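As an illustration only, the following Python sketch shows the two ideas the abstract describes: why the GU-AG heuristic yields many candidate boundaries, and how simple majority voting can combine member outputs. The sequence, the function names `candidate_sites` and `majority_vote`, and the rule that ties fall back to "N" are all assumptions made for this sketch; it does not reproduce the neural network committee developed in the paper.

```python
from collections import Counter


def candidate_sites(rna):
    """Return start positions of every GU (donor candidate) and AG (acceptor candidate)."""
    donors = [i for i in range(len(rna) - 1) if rna[i:i + 2] == "GU"]
    acceptors = [i for i in range(len(rna) - 1) if rna[i:i + 2] == "AG"]
    return donors, acceptors


def majority_vote(votes):
    """Combine member labels ("EI", "IE", "N") by simple majority.

    Assumption for this sketch: an empty vote list or a tie for first
    place is resolved as "N" (no boundary).
    """
    if not votes:
        return "N"
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return "N"
    return counts[0][0]


# Made-up RNA fragment, not taken from any data set.
seq = "AUGGUAAGUCAGGUUAGCAGGA"
donors, acceptors = candidate_sites(seq)
print("GU candidates:", donors)
print("AG candidates:", acceptors)

# Hypothetical labels from three committee members for one 60 bp window.
print("committee label:", majority_vote(["EI", "EI", "N"]))
```

Even in this 22-base fragment the heuristic flags several GU and AG positions, which is the source of the false boundaries; the vote shows how disagreeing members are reduced to a single label.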