Hacker News

dminik, today at 7:08 AM

You can now serve multiple models, with loading and unloading, using just the server binary.

https://github.com/ggml-org/llama.cpp/blob/master/tools/serv...


Replies

speedgoose, today at 8:34 AM

Then it only lacks automatic FIFO loading/unloading. Maybe that will arrive in a few weeks.