Data mining classification techniques: an application to tobacco consumption in teenagers
Abstract
This study analyses how well different psychosocial and personality variables predict the consumption or non-consumption of nicotine in a teenage population, using several classification techniques from the field of Data Mining. Specifically, we analyse artificial neural networks (ANNs), namely the Multilayer Perceptron (MLP), Radial Basis Function (RBF) networks and Probabilistic Neural Networks (PNNs), together with decision trees, the logistic regression model and discriminant analysis. We worked with a sample of 2666 teenagers, 1378 of whom do not consume nicotine and 1288 of whom do. The models discriminated correctly between the two types of subjects in 77.39% to 78.20% of cases, achieving 91.29% sensitivity and 74.32% specificity. With this study, we offer specialists in addictive behaviours a set of advanced statistical techniques that can process a large number of variables and subjects simultaneously and learn complex patterns and relationships automatically, which makes them well suited to predicting and preventing addictive behaviour.
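The three performance figures reported in the abstract (percentage correctly classified, sensitivity and specificity) can be illustrated with a minimal Python sketch. It assumes that nicotine consumers are treated as the positive class, which the abstract does not state, and the counts used are hypothetical values chosen for easy arithmetic, not results from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cell counts of a 2x2 confusion matrix (positive class assumed: consumer)."""
    tp: int  # consumers classified as consumers
    fn: int  # consumers classified as non-consumers
    tn: int  # non-consumers classified as non-consumers
    fp: int  # non-consumers classified as consumers


def sensitivity(c: ConfusionCounts) -> float:
    """Proportion of actual consumers that the model identifies."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Proportion of actual non-consumers that the model identifies."""
    return c.tn / (c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    """Proportion of all subjects classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


# Hypothetical counts for illustration only; they are NOT taken from the study.
example = ConfusionCounts(tp=90, fn=10, tn=75, fp=25)

if __name__ == "__main__":
    print(f"sensitivity = {sensitivity(example):.2%}")
    print(f"specificity = {specificity(example):.2%}")
    print(f"accuracy    = {accuracy(example):.2%}")
```

Reporting sensitivity and specificity alongside overall accuracy shows how the errors are split between the two groups, which a single accuracy figure hides.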
Copyright (c) 2014 Servicio de Publicaciones de la Universidad de Murcia
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.