It's not. Maybe it would be if you used old versions of Stockfish that predate the neural net evaluation used by current versions. Otherwise you'd be comparing hand-rolled (by an LLM) position evaluation functions against an NNUE, and the result of that is a foregone conclusion: Stockfish will stomp it every time.
Maybe that's the result you want for some sort of rhetorical reason, but it would nonetheless not be an informative test.