It's hard to avoid, but there are steps we can make towards fixing it. I spent years in academi...

nippoo • today at 4:25 AM • 1 reply • view on HN

It's hard to avoid, but there are steps we can make towards fixing it. I spent years in academia building open-source data processing pipelines for neuroscience data and helping other researchers do the same. Most quantitative research goes through "lossy" steps between raw data and final results involving Excel spreadsheets, one-off MATLAB commands, copy pasting the results, etc.

In a lot of cases (where data is being collected by humans with a tape measure, say) there is room for error. But one of the things that's getting traction in some fields is open-source publication of both raw datasets and the evaluation/processing methods (in a Jupyter Notebook, say) in a way that lets other people run their analysis on your data, your analysis on their data, or at least re-run your start-to-finish pipeline and look for errors!

As is often the case, the holdups are mostly political: methods papers are less prestigious than the "real science" ones, and it takes journals / funders to mandate these things and provide funding/hosting for datasets for 10+ years, etc - researchers are a time-poor bunch and often won't do things unless there's an incentive to!

Replies

jiggunjer • today at 6:00 AM

Taking notebooks to a production environment isn't fun either. With ai there's no more excuse for using that coding crutch.

alt Hacker News

Replies