But the lab must publish at least the general category of data, and if that doesn't replicate, then the model only works on a more specific category than they claim (e.g. only their dataset).
Even with the exact same dataset and architecture, ML results aren't perfectly replicable due to random weight initialisation, training data order, and non-deterministic GPU operations. I've trained identical networks on identical data and gotten different final weights and performance metrics.
This doesn't mean the model only works on that specific dataset - it means ML training is inherently stochastic. The question isn't 'can you get identical results' but 'can you get comparable performance on similar data distributions.
Even with the exact same dataset and architecture, ML results aren't perfectly replicable due to random weight initialisation, training data order, and non-deterministic GPU operations. I've trained identical networks on identical data and gotten different final weights and performance metrics.
This doesn't mean the model only works on that specific dataset - it means ML training is inherently stochastic. The question isn't 'can you get identical results' but 'can you get comparable performance on similar data distributions.