Statistical matching in practice – An application to the evaluation of the education system from PISA and TALIS
Abstract
Statistical matching methods aim to integrate information collected from multiple sources, usually surveys drawn from a target population. Unlike record linkage methods, which search for identical units, statistical matching searches for similar units in order to find statistical relations across databases. Methods: Statistical matching is feasible provided that the independent surveys share a common block of variables. One particular solution is based on imputation methods for missing data: first, the distinct files are concatenated (i.e. rows and columns are joined together to form a single file); next, the empty cells corresponding to non-observed values are treated as missing data and imputed from the observed data. Results: The fundamental concepts of statistical matching are presented, and the process is illustrated with Spain's data from the PISA (2012) and TALIS (2013) educational studies. Imputations are carried out using the mice package for the free R software. A first validation of the results is performed. Conclusions: Statistical matching offers high potential benefits for the social sciences, since it makes it possible to relate information from independent sources. These techniques can now be applied with relative ease thanks to the development of tools such as the R computing environment.
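The concatenate-then-impute procedure described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the article imputes with the mice package in R, whereas this sketch swaps in a simple nearest-neighbour donor imputation on the common variables, using only the Python standard library. The variable names (`ses`, `size`, `score`, `climate`), the toy values, and the helper functions are all hypothetical.

```python
import math

# Hypothetical toy data (not from PISA or TALIS). Both files share the
# common block "ses" and "size"; only file A observes "score" and only
# file B observes "climate".
file_a = [
    {"ses": 0.2, "size": 30, "score": 480.0},
    {"ses": 1.1, "size": 22, "score": 530.0},
    {"ses": -0.5, "size": 35, "score": 450.0},
]
file_b = [
    {"ses": 0.3, "size": 29, "climate": 3.1},
    {"ses": 1.0, "size": 21, "climate": 4.0},
    {"ses": -0.6, "size": 36, "climate": 2.4},
]

COMMON = ["ses", "size"]
ALL_VARS = ["ses", "size", "score", "climate"]


def concatenate(a, b, variables):
    """Stack both files; a variable a file never observed becomes None."""
    return [{v: row.get(v) for v in variables} for row in a + b]


def standardise(rows, variables):
    """Return one tuple of z-scores per row, computed on the given variables."""
    stats = {}
    for v in variables:
        values = [r[v] for r in rows]
        mean = sum(values) / len(values)
        sd = math.sqrt(sum((x - mean) ** 2 for x in values) / len(values)) or 1.0
        stats[v] = (mean, sd)
    return [tuple((r[v] - stats[v][0]) / stats[v][1] for v in variables) for r in rows]


def impute_nearest_donor(rows, common, target):
    """Fill each missing `target` with the value of the closest donor row,
    where closeness is Euclidean distance on the standardised common block."""
    z = standardise(rows, common)
    donors = [i for i, r in enumerate(rows) if r[target] is not None]
    completed = [dict(r) for r in rows]
    for i, r in enumerate(rows):
        if r[target] is None:
            nearest = min(donors, key=lambda j: math.dist(z[i], z[j]))
            completed[i][target] = rows[nearest][target]
    return completed


# Step 1: concatenate the files, leaving empty cells as missing data.
stacked = concatenate(file_a, file_b, ALL_VARS)
# Step 2: impute the missing cells from the observed data.
fused = impute_nearest_donor(stacked, COMMON, "score")
fused = impute_nearest_donor(fused, COMMON, "climate")
```

After both passes, every row of `fused` carries values for all four variables, so relations between `score` and `climate` can be explored even though no unit was observed on both. A single donor-based fill like this one understates imputation uncertainty; multiple imputation, as provided by mice, addresses that by generating several completed data sets.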
References
Breakspear, S. (2012). The policy impact of PISA: An Exploration of the Normative Effects of International Benchmarking in School System Performance. OECD Journals, 71, 1–32. http://dx.doi.org/10.1787/19939019
Choi, A., & Jerrim, J. (2016). The Use (and Misuse) of PISA in Guiding Policy Reform: The Case of Spain. Comparative Education, 56(2), 230–245. http://dx.doi.org/10.1080/03050068.2016.1142739
D’Orazio, M., Di Zio, M., & Scanu, M. (2006). Statistical Matching: Theory and Practice. NJ: Wiley.
D’Orazio, M. (2012). StatMatch: Statistical Matching. R package version 1.2.0. http://CRAN.R-project.org/package=StatMatch
D’Orazio, M. (2013). Statistical Matching: Methodological issues and practice with R-StatMatch (or. 69). XXVI. Seminario Internacional de Estadística. Eustat. http://www.eustat.es/prodserv/seminario_i.html#axzz2sF9JV1rV
Eurostat (2008). Recommendations on the use of methodologies for the integration of surveys and administrative data. http://www.cros-portal.eu/sites/default/files//Report_of_WP2.doc
Fernández-Díaz, M. J., Rodríguez-Mantilla, J. M., & Martínez-Zarzuelo, A. (2016). PISA y TALIS ¿congruencia o discrepancia? RELIEVE, 22(1), art. M6. http://dx.doi.org/10.7203/relieve.22.1.8247
González-Such, J., Sancho-Álvarez, C., & Sánchez-Delgado, P. (2016). Cuestionarios de contexto PISA: Un estudio sobre los indicadores de evaluación. RELIEVE, 22(1), art. M7. http://dx.doi.org/10.7203/relieve.22.1.8274
Gustafsson, J. E. (2003). What Do We Know About Effects of School Resources on Educational Results? Swedish Economic Policy Review, 10, 77–110.
Jolani, S., Frank, L. E., & van Buuren, S. (2014). Dual imputation model for incomplete longitudinal data. British Journal of Mathematical and Statistical Psychology, 67(2), 197–212. http://dx.doi.org/10.1111/bmsp.12021
Jong, R., van Buuren, S., & Spiess, M. (2014). Multiple imputation of predictor variables using generalized additive models. Communications in Statistics - Simulation and Computation, 45(3), 1–18. http://dx.doi.org/10.1080/03610918.2014.911894
Kaplan, D., & Turner, A. (2012). Statistical Matching of PISA 2009 and TALIS 2008 Data in Iceland (OECD Education Working Papers). Paris: Organisation for Economic Co-operation and Development. http://www.oecd-ilibrary.org/;jsessionid=2ah7v0n0eg9ce.x-oecd-live-02content/workingpaper/5k97g3zzvg30-en
Kaplan, D., & Turner, A. (2013). Data fusion with international large scale assessments: a case study using the OECD PISA and TALIS Surveys. Springer-Verlag. http://link.springer.com/article/10.1186%2F2196-0739-1-6/fulltext.html
Leulescu, A., & Agafitei, M. (2013). Statistical matching: A model based approach for data integration. Luxembourg: European Commission, Eurostat. Publications Office.
OECD (2012). PISA 2012. http://www.oecd.org/pisa/keyfindings/pisa-2012-results.htm
OECD (2013). TALIS 2013. http://www.oecd.org/edu/school/talis.htm
Rässler, S. (2002). Statistical matching. A frequentist theory, practical applications, and alternative Bayesian approaches. New York: Springer.
Rubin, D. B. (1987). An overview on multiple imputation. http://www.amstat.org/sections/srms/Proceedings/papers/1988_016.pdf
Taut, S., & Palacios, D. (2016). Interpretaciones no intencionadas e intencionadas y usos de los resultados de PISA: Una perspectiva de validez consecuencial. RELIEVE, 22(1), art. M8. http://dx.doi.org/10.7203/relieve22.1.8294
Van Buuren, S., & Groothuis-Oudshoorn, K. (2011). MICE: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3), 1–67.
Van Buuren, S. (2014). Mice. Imputation by random forests. http://www.inside-r.org/packages/cran/mice/docs/mice.impute.rf
Wheater, R. (2013). Achievement of 15 year olds in England: PISA 2012 national report. OECD Programme for International Student Assessment. https://www.nfer.ac.uk/publications/PQUK02/PQUK02.pdf
The articles and scientific documents published in RIE abide by the following conditions:
1. The Servicio de Publicaciones de la Universidad de Murcia (the publisher) holds the property rights (copyright) to all published documents and allows their reuse under the user license indicated in point 2.
2. All documents are published in the digital edition of RIE under a Creative Commons Reconocimiento-NoComercial-SinObraDerivada 3.0 España (legal document) license. These documents can be copied, used, distributed, communicated and explained publicly if: i) the author(s) and the original source of publication (journal, publisher and URL of the document) are cited; ii) they are not used for commercial purposes; iii) the existence and specifications of this license are mentioned.
3. Self-archiving conditions. The authors are allowed and encouraged to distribute digitally the pre-print version (a version before evaluation) and/or the post-print version (a version that has already been evaluated and accepted for publication). This promotes earlier circulation and distribution and can increase citations and impact within the academic community.