aesthesia • today at 5:42 AM

Thinking shouldn't be too hard to deal with: just let the model generate freely until it hits a </think> token, then do constrained decoding, right?
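
For illustration, here is a rough sketch of that two-phase idea in Python. It is a toy, not any real library's API: `two_phase_decode`, `toy_scores`, and `yes_no_only` are hypothetical names, the "model" is a scripted scoring function over a seven-token vocabulary, and the constraint is a plain callback that returns the set of tokens allowed next.

```python
from typing import Callable, Dict, List, Set

END_THINK = "</think>"
EOS = "<eos>"
VOCAB = ["hmm", "maybe", END_THINK, "sure", "yes", "no", EOS]

Scores = Dict[str, float]


def two_phase_decode(
    next_scores: Callable[[List[str]], Scores],
    allowed_next: Callable[[List[str]], Set[str]],
    max_tokens: int = 64,
) -> List[str]:
    """Greedy decoding: unconstrained until </think>, constrained afterwards."""
    tokens: List[str] = []   # everything emitted so far
    answer: List[str] = []   # only the tokens emitted after </think>
    thinking = True
    for _ in range(max_tokens):
        scores = next_scores(tokens)
        if not thinking:
            # Phase 2: drop every token the constraint does not allow.
            allowed = allowed_next(answer)
            scores = {t: s for t, s in scores.items() if t in allowed}
            if not scores:
                break
        tok = max(scores, key=scores.get)
        tokens.append(tok)
        if thinking:
            # Phase 1 ends as soon as the model closes its thinking block.
            thinking = tok != END_THINK
        elif tok == EOS:
            break
        else:
            answer.append(tok)
    return tokens


def toy_scores(tokens: List[str]) -> Scores:
    """Hypothetical stand-in for a model: thinks for three tokens, then prefers 'sure'."""
    script = ["hmm", "maybe", END_THINK]
    if len(tokens) < len(script):
        return {t: (1.0 if t == script[len(tokens)] else 0.0) for t in VOCAB}
    prefs = {"sure": 1.0, "yes": 0.5, "no": 0.2, EOS: 0.1}
    return {t: prefs.get(t, 0.0) for t in VOCAB}


def yes_no_only(answer: List[str]) -> Set[str]:
    """Hypothetical constraint: the answer is exactly one of 'yes' or 'no', then EOS."""
    return {"yes", "no"} if not answer else {EOS}


print(two_phase_decode(toy_scores, yes_no_only))
```

Left alone, the toy model would answer "sure"; once masking starts after </think>, it is limited to "yes" or "no". A real implementation would mask logits inside the sampler and track grammar state rather than filter a dict, but the control flow would be roughly the same.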

stymaar • today at 8:03 AM

Sure, but does llama-cpp support that?
