With AIs, it seems like there's never a comparison that's actually useful.
You can build evals. Look at Harbor or Inspect. It’s just more work than most are interested in doing right now.
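For anyone wondering what "build evals" means at the smallest possible scale, here's a rough sketch in plain Python. It does not use Harbor's or Inspect's actual APIs; the cases, the `model_a`/`model_b` stand-ins, and the exact-match scoring are all made up for illustration. In practice you'd swap the stand-ins for functions that call the real models.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical test cases: (prompt, expected answer). A real eval needs far more.
CASES: List[Tuple[str, str]] = [
    ("2 + 2 =", "4"),
    ("Capital of France?", "Paris"),
    ("Opposite of hot?", "cold"),
]

def score(model: Callable[[str], str], cases: List[Tuple[str, str]] = CASES) -> float:
    """Fraction of cases where the answer matches exactly (ignoring case and whitespace)."""
    hits = sum(model(prompt).strip().lower() == expected.lower()
               for prompt, expected in cases)
    return hits / len(cases)

# Stand-in "models" (assumptions): replace with functions that call real APIs.
def model_a(prompt: str) -> str:
    return {"2 + 2 =": "4", "Capital of France?": "Paris"}.get(prompt, "")

def model_b(prompt: str) -> str:
    return {"2 + 2 =": "4", "Capital of France?": "paris",
            "Opposite of hot?": "Cold"}.get(prompt, "")

def compare(models: Dict[str, Callable[[str], str]]) -> Dict[str, float]:
    """Run every model over the same cases so the numbers are comparable."""
    return {name: score(fn) for name, fn in models.items()}

if __name__ == "__main__":
    print(compare({"model_a": model_a, "model_b": model_b}))
```

The harness itself is the easy part. The work is in writing enough cases, and a scorer that's fairer than exact match, for the numbers to mean anything.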
Yup, it's all vibes. And Anthropic is still winning on those in my book.