Feature Selection for DNA Microarrays Using a Cuckoo Search

Authors

  • Luis Alberto Hernández Montiel Universidad del Istmo, Campus Ixtepec (UNISTMO), Ciudad Ixtepec, Oaxaca, México, 70110
  • Carlos Edgardo Cruz Pérez Universidad del Istmo, Campus Ixtepec (UNISTMO), Ciudad Ixtepec, Oaxaca, México, 70110
  • Luis David Hernández Huerta Universidad del Istmo, Campus Ixtepec (UNISTMO), Ciudad Ixtepec, Oaxaca, México, 70110

DOI:

https://doi.org/10.30973/progmat/2019.11.3/2

Keywords:

DNA microarrays, Preprocessing, Filter fusion, Selection, Classification

Abstract

In this article, we propose a hybrid method for the selection and classification of DNA microarray data. First, the method combines the subsets of relevant genes obtained from five filter methods. Next, an algorithm based on cuckoo search combined with a support vector machine (SVM) classifier is applied. This hybrid algorithm explores the subset obtained in the previous stage and selects the genes that achieve high performance when training the classifier. In the experimental results, the algorithm obtains a high classification rate while selecting a small number of genes; the results are compared with other methods reported in the literature.
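The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two filter scores (`mean_diff`, `signal_to_noise`), the flip-based binarization of the Lévy step, the size penalty in the fitness, the synthetic `toy_data`, and all parameter values are assumptions chosen for illustration. To stay dependency-free, a nearest-centroid classifier evaluated by leave-one-out stands in for the SVM.

```python
import math
import random

def mean(xs):
    return sum(xs) / len(xs)

def std(xs):
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def column(X, y, j, label):
    """Values of gene j over the samples of one class."""
    return [row[j] for row, c in zip(X, y) if c == label]

# Two illustrative filter scores (assumed; the five filters are not named above).
def mean_diff(X, y, j):
    return abs(mean(column(X, y, j, 0)) - mean(column(X, y, j, 1)))

def signal_to_noise(X, y, j):
    a, b = column(X, y, j, 0), column(X, y, j, 1)
    return abs(mean(a) - mean(b)) / (std(a) + std(b) + 1e-12)

def fuse_filters(X, y, filters, k):
    """Stage 1: union of the top-k genes ranked by each filter."""
    pool = set()
    for score in filters:
        ranked = sorted(range(len(X[0])), key=lambda j: score(X, y, j), reverse=True)
        pool.update(ranked[:k])
    return sorted(pool)

def loo_accuracy(X, y, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier (SVM stand-in)."""
    if not genes:
        return 0.0
    hits = 0
    for i, sample in enumerate(X):
        dist = {}
        for label in (0, 1):
            rows = [X[r] for r in range(len(X)) if r != i and y[r] == label]
            centroid = [mean([row[g] for row in rows]) for g in genes]
            dist[label] = sum((sample[g] - m) ** 2 for g, m in zip(genes, centroid))
        hits += min(dist, key=dist.get) == y[i]
    return hits / len(X)

def levy_step(rng, beta=1.5):
    """One heavy-tailed step drawn with Mantegna's algorithm."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    u = rng.gauss(0, (num / den) ** (1 / beta))
    v = rng.gauss(0, 1)
    return u / (abs(v) + 1e-12) ** (1 / beta)

def cuckoo_select(X, y, pool, n_nests=8, n_iter=30, pa=0.25, seed=0):
    """Stage 2: binary cuckoo search over the fused gene pool."""
    rng = random.Random(seed)

    def fitness(mask):
        genes = [g for g, bit in zip(pool, mask) if bit]
        # Accuracy dominates; a small penalty favors shorter gene subsets.
        return loo_accuracy(X, y, genes) - 0.01 * len(genes) / len(pool)

    def random_nest():
        return [rng.random() < 0.5 for _ in pool]

    nests = [random_nest() for _ in range(n_nests)]
    scores = [fitness(n) for n in nests]
    for _ in range(n_iter):
        for i in range(n_nests):
            # Flip each bit with a probability driven by a Levy step.
            egg = [bit != (rng.random() < math.tanh(abs(0.1 * levy_step(rng))))
                   for bit in nests[i]]
            s, j = fitness(egg), rng.randrange(n_nests)
            if s > scores[j]:  # the new egg replaces a randomly chosen nest
                nests[j], scores[j] = egg, s
        # Abandon the worst fraction pa of nests and rebuild them at random.
        for i in sorted(range(n_nests), key=scores.__getitem__)[:int(pa * n_nests)]:
            nests[i] = random_nest()
            scores[i] = fitness(nests[i])
    best = max(range(n_nests), key=scores.__getitem__)
    return [g for g, bit in zip(pool, nests[best]) if bit], scores[best]

def toy_data(seed=1, n_samples=20, n_genes=30):
    """Synthetic two-class data in which only genes 0 and 1 are informative."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n_samples)]
    X = [[rng.gauss(0, 1) + (3.0 if c and j < 2 else 0.0) for j in range(n_genes)]
         for c in y]
    return X, y

if __name__ == "__main__":
    X, y = toy_data()
    pool = fuse_filters(X, y, [mean_diff, signal_to_noise], k=5)
    genes, score = cuckoo_select(X, y, pool)
    print("fused pool:", pool)
    print("selected genes:", genes, "fitness:", round(score, 3))
```

Restricting the cuckoo search to the fused pool keeps the wrapper stage cheap, because each fitness evaluation retrains the classifier. In a real setting the stand-in classifier would be replaced by an SVM with cross-validation, and the search parameters would need tuning for each dataset.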

Author biographies

Luis Alberto Hernández Montiel, Universidad del Istmo, Campus Ixtepec (UNISTMO), Ciudad Ixtepec, Oaxaca, México, 70110

Luis Alberto Hernández Montiel received his bachelor's degree in Informatics in April 2011 from the Instituto Tecnológico de Apizaco and his master's degree in Computer Systems in November 2013 from the same institute, in Apizaco, Tlaxcala, Mexico. He is currently a professor-researcher in the Informatics bachelor's program at the Universidad del Istmo, Campus Ixtepec, Oaxaca, Mexico. His areas of interest are evolutionary algorithms, metaheuristics, optimization, and bioinformatics.

Carlos Edgardo Cruz Pérez, Universidad del Istmo, Campus Ixtepec (UNISTMO), Ciudad Ixtepec, Oaxaca, México, 70110

He received his degree in Electronics Engineering from the Instituto Tecnológico de Oaxaca and his master's degree in Information Technology Engineering from the Universidad Anáhuac. He holds a specialization in information security granted by INACIPE and certifications such as UBWA and Computer Forensics Expert. He is currently a full-time professor at the Universidad del Istmo (UNISTMO), Campus Ixtepec, where he has led projects on the evaluation and performance of computer networks and on expert systems focused on diabetes diagnosis. His research lines are routing in computer networks, VoIP, and information security and encryption.

Luis David Hernández Huerta, Universidad del Istmo, Campus Ixtepec (UNISTMO), Ciudad Ixtepec, Oaxaca, México, 70110

He graduated as a Computer Systems Engineer from the Instituto Tecnológico de Tehuacán, Mexico, in 2000. He received a master's degree in Computer Science from the Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE), Mexico, in 2007, and is a specialist in soft computing. He worked at the Mexican Navy's research and development institute. He is currently a professor-researcher at the Universidad del Istmo, Campus Ixtepec, Oaxaca, Mexico. His research concerns the application of evolutionary algorithms, fuzzy logic, neural networks, and speech processing.

Published

31-10-2019

How to cite

Hernández Montiel, L. A., Cruz Pérez, C. E., & Hernández Huerta, L. D. (2019). Selección de Características de Microarreglos de ADN Utilizando una Búsqueda Cuckoo. Programación matemática Y Software, 11(3), 12–31. https://doi.org/10.30973/progmat/2019.11.3/2

Section

Articles