Code for scientific publications should be public

Written November 26, 2021. Contact me@spxtr.net with any questions or comments.

My friend, as part of her PhD in astrophysics, wanted to extend a model that was published in a peer-reviewed journal. She first needed to reproduce the results from the paper. No worries, it was a fairly straightforward set of differential equations. Immediately, however, her attempt at reproducing the result gave numbers two orders of magnitude off from what was originally published. She figured she must have done something wrong, and asked the original authors for their code so that she could use it to debug hers. They refused.

Over the next six months she spent many hours both poring over her own code and corresponding with the authors of the original paper. They suggested numerous checks and tests for her to perform, all of which came back as they expected. She spent literally hundreds of hours trying to figure out what was wrong.

After seven months, the original authors broke down and sent her their code. Within a couple of hours, she found the bug: they had simply defined a constant incorrectly. The main results of the paper were thus wrong. Rather than retracting the paper, the authors decided to add a note to a follow-up paper indicating the mistake. Of course, if some scientist happens to read the original paper, they will have no idea that it is outright wrong, but okay. The publication isn't actually important to anyone but the authors, anyway.

Should the reviewers have caught this? Because they did not have access to the code, the only way they could have caught the issue would have been to implement the model themselves. Reviewers are horrifically underpaid (that is, not paid at all), so that wasn't going to happen.

Scientists, journals, and funding agencies need to shift their perspective into the twenty-first century. It is still the case that the most important output of a scientific project is a collection of few-page papers, although now at least they are often accompanied by longer supplemental materials. It is still not standard to include the raw data and code with these papers. In some cases you can get the data or code by emailing the corresponding author, but in many cases you still cannot. This is unacceptable.

Scientists: when you submit a preprint, also upload the code to a public code repository, and cite the repository in the paper. Nobody cares if your code is ugly or uncommented or if it only runs on your computer and you don't know why. Upload it anyway.