Data mining classification techniques: an application to tobacco consumption in adolescents
Supporting agencies
- Plan Nacional sobre Drogas (INT/2012/2002)
Abstract
This study analyses the predictive power of a set of psychosocial and personality variables on nicotine use versus non-use in the adolescent population, using several classification techniques from the Data Mining methodology. Specifically, it examines artificial neural networks (ANN), namely Multilayer Perceptron (MLP), Radial Basis Function (RBF) and Probabilistic Neural Network (PNN) models, together with decision trees, the logistic regression model and discriminant analysis. The sample comprised 2666 adolescents, of whom 1378 do not use nicotine and 1288 do. The models correctly discriminated between the two types of subject in a range from 77.39% to 78.20%, reaching a sensitivity of 91.29% and a specificity of 74.32%. The study thus offers specialists in addictive behaviours a set of advanced statistical techniques that can handle a large number of variables and subjects simultaneously and learn complex patterns and relationships automatically, which makes them well suited to the prediction and prevention of addictive behaviour.
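The correct-classification rate, sensitivity and specificity reported above are standard summaries of a binary confusion matrix. The following is a minimal sketch of how such figures are computed, not the authors' code: the labels are invented toy values, the names `ConfusionMatrix` and `confusion_from_labels` are hypothetical, and coding consumers as the positive class (1) is an assumption.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier; 1 = consumer (assumed positive class)."""
    tp: int  # consumers classified as consumers
    fn: int  # consumers classified as non-consumers
    tn: int  # non-consumers classified as non-consumers
    fp: int  # non-consumers classified as consumers

    @property
    def sensitivity(self) -> float:
        # Share of actual consumers that the model identifies.
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Share of actual non-consumers that the model identifies.
        return self.tn / (self.tn + self.fp)

    @property
    def accuracy(self) -> float:
        # Overall share of correctly classified subjects.
        return (self.tp + self.tn) / (self.tp + self.fn + self.tn + self.fp)


def confusion_from_labels(y_true: Sequence[int], y_pred: Sequence[int]) -> ConfusionMatrix:
    """Tally a confusion matrix from observed and predicted 0/1 labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    return ConfusionMatrix(
        tp=sum(1 for t, p in pairs if t == 1 and p == 1),
        fn=sum(1 for t, p in pairs if t == 1 and p == 0),
        tn=sum(1 for t, p in pairs if t == 0 and p == 0),
        fp=sum(1 for t, p in pairs if t == 0 and p == 1),
    )


# Invented toy labels for illustration only; NOT data from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

cm = confusion_from_labels(y_true, y_pred)
print(f"accuracy={cm.accuracy:.2%}  "
      f"sensitivity={cm.sensitivity:.2%}  "
      f"specificity={cm.specificity:.2%}")
```

Any of the classifiers named in the abstract could supply `y_pred`; the three measures depend only on the observed and predicted labels, which is what makes them comparable across techniques.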
Works published in this journal are subject to the following terms:
1. The Servicio de Publicaciones de la Universidad de Murcia (the publisher) retains the economic rights (copyright) to the published works, and encourages and permits their reuse under the licence indicated in point 2.
© Servicio de Publicaciones, Universidad de Murcia, 2024
2. Works are published in the electronic edition of the journal under a Creative Commons Attribution-ShareAlike 4.0 International licence (legal text). They may be copied, used, disseminated, transmitted and publicly displayed, provided that: i) the authorship and the original source of publication are cited (journal, publisher and URL of the work); ii) they are not used for commercial purposes; iii) the existence and specifications of this licence are mentioned.
3. Self-archiving conditions. Authors are permitted and encouraged to disseminate electronically the pre-print (the version before it is reviewed and submitted to the journal) and/or post-print (the version reviewed and accepted for publication) versions of their works before publication, since this favours earlier circulation and dissemination and, with it, a possible increase in citations and reach within the academic community. RoMEO colour: green.