Hacker News

mickdarling, yesterday at 4:36 PM

I use an LLM behavior test to check whether the responses LLMs give when using my MCP server semantically match what I expect. This goes beyond the regex tests: it checks whether the response is semantically appropriate. Sometimes an LLM kicks back an unusual response that is technically a no but effectively a yes. Different models can also behave differently at the semantic level.
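The comment doesn't say how the behavior test is built, so here is a minimal Python sketch of the idea, assuming a pluggable "judge" (in practice probably a second LLM call) that labels each reply's effective answer as yes or no. All names here (`regex_check`, `semantic_check`, `run_suite`, the toy models and toy judge) are hypothetical and not part of any MCP SDK. It shows how a reply that is technically a no but effectively a yes fails a literal regex yet passes the semantic check, with results kept per model so differences between models are visible.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical signatures: a "model" takes a prompt and returns its reply
# (e.g. a client that routes through the MCP server); a "judge" takes the
# question and a reply and returns the effective answer, "yes" or "no".
Model = Callable[[str], str]
Judge = Callable[[str, str], str]


def regex_check(reply: str, pattern: str) -> bool:
    """Surface-level test: does the reply literally match the pattern?"""
    return re.search(pattern, reply, re.IGNORECASE) is not None


def semantic_check(question: str, reply: str, expected: str, judge: Judge) -> bool:
    """Behavior test: does the judge read the reply as the expected answer?"""
    return judge(question, reply).strip().lower() == expected.lower()


def run_suite(
    cases: List[Tuple[str, str, str]],
    models: Dict[str, Model],
    judge: Judge,
) -> Dict[str, Dict[str, Tuple[bool, bool]]]:
    """Run each (question, regex, expected) case against every model.

    Returns {model_name: {question: (regex_passed, semantic_passed)}}, so
    per-model differences in behavior show up side by side.
    """
    results: Dict[str, Dict[str, Tuple[bool, bool]]] = {}
    for name, model in models.items():
        results[name] = {}
        for question, pattern, expected in cases:
            reply = model(question)
            results[name][question] = (
                regex_check(reply, pattern),
                semantic_check(question, reply, expected, judge),
            )
    return results


if __name__ == "__main__":
    # Toy stand-in for illustration only; a real judge would be another LLM call.
    def toy_judge(question: str, reply: str) -> str:
        text = reply.lower()
        return "yes" if text.startswith("yes") or "but you can" in text else "no"

    # Toy "models" returning canned replies in place of real LLM calls.
    models = {
        "model-a": lambda q: "Yes, the tool is available.",
        "model-b": lambda q: "No, not directly, but you can call it through the server.",
    }
    cases = [("Can I use the search tool?", r"^yes\b", "yes")]
    for name, by_question in run_suite(cases, models, toy_judge).items():
        print(name, by_question)
```

Running it prints `model-a {'Can I use the search tool?': (True, True)}` and `model-b {'Can I use the search tool?': (False, True)}`: the regex accepts only model-a, while the semantic check accepts both. Keeping the judge as a parameter means the same suite can swap in a stricter or different judge without touching the checks themselves.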

If I had a nice CI/CD workflow built into GitHub, rather than the one I rolled myself and run locally, it might make this a little more automatic and a little easier.