The small impact of p-hacking marginally significant results on the meta-analytic estimation of effect size

Authors

  • Juan Botella Universidad Autónoma de Madrid
  • Manuel Suero Universidad Autónoma de Madrid
  • Juan I. Durán
  • Desirée Blazquez
DOI: https://doi.org/10.6018/analesps.433051
Keywords: p-hacking, Effect size, Meta-analysis

Abstract

The label p-hacking (pH) refers to a set of opportunistic practices aimed at turning p values that should be non-significant into statistically significant ones. It has been argued that pH should be prevented and fought for several reasons, especially because of its potentially harmful effects on the assessment of both primary research results and their meta-analytic synthesis. We focus here on how one specific type of pH, applied to marginally significant results, affects the combined estimation of effect size in meta-analysis. We want to know how concerned we should be about its biasing effect when assessing the results of a meta-analysis. We calculated the bias in a range of situations that seem realistic in terms of the prevalence and the operational definition of pH. The results show that in most of the situations analyzed the bias is less than one hundredth (± 0.01) in terms of d or r. To reach a bias of five hundredths (± 0.05), this type of pH would have to be massively present, which seems rather unrealistic. There are many good reasons to fight against pH, but our main conclusion is that a large impact on the meta-analytic estimation of effect size is not one of them.
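As a rough illustration of the kind of calculation the abstract describes (a minimal sketch only, not the authors' actual procedure), the short Python simulation below p-hacks a fraction of marginally significant results and compares the fixed-effect combined d with and without that manipulation. The effect size delta, the sample and study counts, the prevalence value, the .05–.10 window, and the hack() retry mechanism are all illustrative assumptions.

```python
# Minimal Monte Carlo sketch (not the authors' exact procedure): simulate
# two-group studies, let a fraction of marginally significant results
# (here .05 < p < .10) be "hacked" into significance through opportunistic
# re-analysis, and measure the resulting bias in the fixed-effect combined d.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)


def one_study(delta, n):
    """Simulate one two-group study; return observed d and its two-sided p."""
    g1 = rng.normal(delta, 1.0, n)
    g2 = rng.normal(0.0, 1.0, n)
    sp = np.sqrt((g1.var(ddof=1) + g2.var(ddof=1)) / 2.0)
    d = (g1.mean() - g2.mean()) / sp
    p = stats.ttest_ind(g1, g2).pvalue
    return d, p


def hack(delta, n, d, p, max_tries=3):
    """Crude pH model: retry the 'analysis' a few times (a stand-in for trying
    covariates, outlier rules, etc.) and keep the first significant result."""
    for _ in range(max_tries):
        if p < 0.05:
            break
        d, p = one_study(delta, n)
    return d, p


def combined_d(ds, n):
    """Fixed-effect (inverse-variance weighted) estimate for equal-n studies."""
    var = 2.0 / n + np.square(ds) / (4.0 * n)   # approximate Var(d)
    w = 1.0 / var
    return float(np.sum(w * ds) / np.sum(w))


def average_bias(delta=0.3, n=40, k=30, prevalence=0.5, reps=1000):
    """Mean difference between the hacked and the honest combined estimates."""
    bias = []
    for _ in range(reps):
        honest, hacked = [], []
        for _ in range(k):
            d, p = one_study(delta, n)
            honest.append(d)
            if 0.05 < p < 0.10 and rng.random() < prevalence:
                d, p = hack(delta, n, d, p)
            hacked.append(d)
        bias.append(combined_d(np.array(hacked), n) -
                    combined_d(np.array(honest), n))
    return float(np.mean(bias))


print(f"average bias in the combined d: {average_bias():+.4f}")
```

Under these toy settings only a small share of studies ever falls in the .05–.10 window, which is why even a prevalence of 50% moves the combined estimate by only a fraction of a hundredth, in line with the pattern the abstract reports.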


Author Biography

Juan Botella, Universidad Autónoma de Madrid

Facultad de Psicología

Universidad Autónoma de Madrid


Published
01-01-2021
How to Cite
Botella, J., Suero, M., Durán, J. I., & Blazquez, D. (2021). The small impact of p-hacking marginally significant results on the meta-analytic estimation of effect size. Anales de Psicología / Annals of Psychology, 37(1), 178–187. https://doi.org/10.6018/analesps.433051
Issue
Vol. 37 No. 1 (2021)
Section
Methodology