The prevalence and problems of thoughtless statistical analysis
An ideal statistical analysis uses appropriate methods to draw insights from the data and inform the research questions. Unfortunately, many current statistical analyses are far from ideal: researchers often use the wrong methods, misinterpret the results, or fail to adequately check their assumptions. Many researchers may not have received adequate training in research methods, and approach statistics with trepidation or even in ignorance. Yet using the wrong statistical methods can cause real harm, and bad statistical practices are being used to abet weak science.
Even when the correct methods are used, many researchers fail to describe them adequately, making it difficult to reproduce the results. I will describe our team's research into how statistical methods are described and applied. I will show the prevalence of "boilerplate" statistical methods sections in papers, where virtually the same text is used regardless of the research question.
I will also examine researchers' use of linear regression, a widely used and useful method for uncovering associations between variables. Unfortunately, most researchers apply this method badly, focusing on the p-value rather than the strength of the association, and doing little or no checking of the key underlying assumptions. These results are based on papers published in the multi-disciplinary open access journal "PLOS ONE". Finally, I will discuss potential ways to reduce these problems, including education, journal policies and automation.