Feature Selection for DNA Microarrays Using a Cuckoo Search
DOI:
https://doi.org/10.30973/progmat/2019.11.3/2
Keywords:
DNA microarrays, Preprocessing, Filter fusion, Selection, Classification
Abstract
This article proposes a hybrid method for the selection and classification of DNA microarray data. First, the method combines the subsets of relevant genes obtained from five filter methods. Next, an algorithm based on a cuckoo search combined with a support vector machine (SVM) classifier is applied. The hybrid algorithm explores the subset obtained in the previous stage and selects the genes that achieve high performance when training the classifier. In the experimental results, the algorithm attains a high classification rate while selecting a small number of genes. The results are compared with other methods reported in the literature.
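The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the union rule used to fuse the filters, the Lévy-flight step (Mantegna's algorithm), the sigmoid mapping to binary masks, and all parameter values are assumptions made for illustration. To stay dependency-free, a toy fitness function stands in for the cross-validated SVM accuracy that the method would use.

```python
import math
import random


def fuse_filters(rankings, top_k):
    """Stage 1: union of the top_k gene indices proposed by each filter."""
    fused = set()
    for ranking in rankings:
        fused.update(ranking[:top_k])
    return sorted(fused)


def levy_step(rng, beta=1.5):
    """One heavy-tailed step drawn with Mantegna's algorithm."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma = (num / den) ** (1 / beta)
    u = rng.gauss(0, sigma)
    v = rng.gauss(0, 1) or 1e-12  # avoid a division by zero
    return u / abs(v) ** (1 / beta)


def cuckoo_search(n_genes, fitness, n_nests=10, n_iter=40, pa=0.25, seed=0):
    """Stage 2: binary cuckoo search; each nest is a 0/1 mask over the genes."""
    rng = random.Random(seed)

    def random_mask():
        return [rng.random() < 0.5 for _ in range(n_genes)]

    def new_egg(mask):
        # Move each bit by a Levy step, then squash to a probability of being 1.
        egg = []
        for bit in mask:
            x = max(-50.0, min(50.0, float(bit) + levy_step(rng)))
            egg.append(rng.random() < 1 / (1 + math.exp(-x)))
        return egg

    nests = [random_mask() for _ in range(n_nests)]
    scores = [fitness(m) for m in nests]
    for _ in range(n_iter):
        for i in range(n_nests):
            egg = new_egg(nests[i])
            score = fitness(egg)
            j = rng.randrange(n_nests)  # compare with a randomly chosen nest
            if score > scores[j]:
                nests[j], scores[j] = egg, score
        # Abandon the worst fraction pa of the nests and rebuild them at random.
        order = sorted(range(n_nests), key=lambda k: scores[k])
        for k in order[: int(pa * n_nests)]:
            nests[k] = random_mask()
            scores[k] = fitness(nests[k])
    best = max(range(n_nests), key=lambda k: scores[k])
    return nests[best], scores[best]


# Hypothetical rankings from three filters (best gene first); the paper fuses five.
rankings = [[4, 7, 1, 9], [7, 2, 4, 0], [1, 7, 8, 3]]
candidates = fuse_filters(rankings, top_k=2)  # [1, 2, 4, 7]

# Toy stand-in for classifier accuracy: reward two "informative" genes
# (an invented set) and penalise large subsets.
informative = {4, 7}


def toy_fitness(mask):
    chosen = {g for g, keep in zip(candidates, mask) if keep}
    return len(chosen & informative) / len(informative) - 0.01 * len(chosen)


mask, score = cuckoo_search(len(candidates), toy_fitness)
selected = [g for g, keep in zip(candidates, mask) if keep]
print(selected, round(score, 2))
```

In a realistic setting, `toy_fitness` would be replaced by a function that trains an SVM on the selected genes and returns its cross-validated accuracy, optionally with a penalty on subset size so that small gene subsets are preferred.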
Copyright 2019 Programación Matemática y Software
This work is licensed under a Creative Commons Attribution 4.0 International license.
You are free to:
Share: copy and redistribute the published material in any medium or format.
Adapt: remix, transform, and build upon the material for any purpose, even commercially.
Under the following terms:
Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
No additional restrictions: You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.