For it to be evidence, you would need to know the number of Greptile comments made and how many of those comments were instead considered to be poor. You need to contrast false positive rate with true positive rate to simply plot a single point along a classifier curve. You would then need to contrast that with a control group of experts or a static linter which means you would need to modify the "conservativeness" of the classifier to produce multiple points along its ROC curve, then you could compare whether the classifier is better or worse than your control by comparing the ROC curves.
Sample number of true positives says more or less nothing on its own.