>Clean data is expensive--as in, it takes real human labor to obtain clean data.
Yes, data can contain subtle errors that are expensive and difficult to find. But the second error in the article was so obvious that a bright 10-year-old would probably have spotted it.
Agreed--and maybe they should have fixed it.
But sometimes the "provenance" of the data is important. I want to know whether I'm getting data straight from some source (even with errors) rather than having some intermediary make fixes that I don't know about.
For example, if the latitude and longitude may have been flipped, I don't want them to just automatically "fix" the data (especially not without disclosing it).
What they need to do is verify the outliers with the original gas station and get the data fixed at the source. But that's much more expensive.
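
To make the "flag it, don't silently fix it" idea concrete, here's a rough Python sketch. Everything in it is my own assumption rather than anything from the article: the `Bounds` class, the `flag_suspect_coordinates` function, the `lat`/`lon` field names, and the bounding box values are all hypothetical. The point is only that the raw values pass through untouched and the suspicion is recorded next to them, so someone can follow up with the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bounds:
    """Hypothetical bounding box for the region the data should cover."""
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def contains(self, lat, lon):
        return (self.min_lat <= lat <= self.max_lat
                and self.min_lon <= lon <= self.max_lon)


def flag_suspect_coordinates(records, bounds):
    """Return copies of the records with a 'suspect' flag and a note.

    The original lat/lon values are never modified; a possible swap is
    only noted so it can be verified with the source.
    """
    flagged = []
    for rec in records:
        lat, lon = rec["lat"], rec["lon"]
        if bounds.contains(lat, lon):
            note = None
        elif bounds.contains(lon, lat):
            # Swapping would land inside the region: a hint, not a fix.
            note = "outside region; lat/lon possibly swapped"
        else:
            note = "outside region"
        flagged.append({**rec, "suspect": note is not None, "note": note})
    return flagged


# Made-up example values, for illustration only.
region = Bounds(min_lat=47.0, max_lat=55.0, min_lon=5.0, max_lon=16.0)
stations = [
    {"id": "a", "lat": 52.5, "lon": 13.4},
    {"id": "b", "lat": 13.4, "lon": 52.5},
    {"id": "c", "lat": 0.0, "lon": 0.0},
]
for row in flag_suspect_coordinates(stations, region):
    print(row)
```

A consumer who wants the "fixed" view can still apply the swap themselves, but they do it knowingly, and the published data stays traceable to what the source actually reported.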