Data mining classification techniques: an application to tobacco consumption in teenagers
Supporting Agencies
- National Plan on Drugs (INT/2012/2002)
Abstract
This study analyses how well different psychosocial and personality variables predict the consumption or non-consumption of nicotine in a teenage population, using several classification techniques from the field of Data Mining. More specifically, we analyse artificial neural networks (ANNs), namely the Multilayer Perceptron (MLP), Radial Basis Function (RBF) networks and Probabilistic Neural Networks (PNNs), together with decision trees, the logistic regression model and discriminant analysis. To this end, we worked with a sample of 2666 teenagers, 1378 of whom do not consume nicotine and 1288 of whom do. The models analysed discriminated correctly between the two types of subjects in 77.39% to 78.20% of cases, achieving 91.29% sensitivity and 74.32% specificity. With this study, we place at the disposal of specialists in addictive behaviours a set of advanced statistical techniques that can process a large number of variables and subjects simultaneously and learn complex patterns and relationships automatically, which makes them well suited to predicting and preventing addictive behaviour.
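As a reading aid for the figures reported in the abstract, the following minimal sketch shows how the percentage of correct classifications, sensitivity and specificity are usually derived from a two-class confusion matrix. The function name and the counts in the usage example are hypothetical and chosen only for illustration; they are not data from the study. Consumers are assumed to be the positive class.

```python
def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Compute accuracy, sensitivity and specificity from confusion-matrix counts.

    tp: consumers correctly classified as consumers (true positives)
    fn: consumers classified as non-consumers (false negatives)
    tn: non-consumers correctly classified as non-consumers (true negatives)
    fp: non-consumers classified as consumers (false positives)
    """
    positives = tp + fn  # all actual consumers
    negatives = tn + fp  # all actual non-consumers
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must contain at least one subject")
    return {
        # share of all subjects that were classified correctly
        "accuracy": (tp + tn) / (positives + negatives),
        # share of actual consumers that the model detects
        "sensitivity": tp / positives,
        # share of actual non-consumers that the model detects
        "specificity": tn / negatives,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not taken from the study).
    metrics = classification_metrics(tp=90, fn=10, tn=75, fp=25)
    for name, value in metrics.items():
        print(f"{name}: {value:.2%}")
```

Sensitivity and specificity are computed within each class, so they are not affected by the relative sizes of the two groups, whereas the overall percentage of correct classifications is a weighted mix of both.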