Hacker News

0xbadcafebee yesterday at 11:39 PM (4 replies)

I don't see an explanation of why they would make a model-specific inference engine vs just using llamacpp. There are already lots of people working on the llamacpp integration. This is a lot of effort spent on a single model which is likely to become obsolete when a different model comes out that does better. In some discussions, people are now making PRs against both the llamacpp branches and ds4, so it's taking a rare commodity (people investing development time in this model) and fragmenting it.


Replies

dilap today at 4:12 AM

way easier to work on a focussed c codebase you own than a mature unwieldy c++ codebase you don't. but it's fine, people will take that work and port to llamacpp and everyone wins.

(the ux of ds4 is fantastic too -- it's dead-easy to get a known-good model, great quant. llamacpp you're much more hacking in the wilderness, w/ many many knobs.)

flakiness today at 12:03 AM

I believe the assumption is: code is cheap; collaboration (e.g. upstreaming) is expensive.

Is it true? We'll see, in a few years.

zozbot234 today at 12:01 AM

The author has mentioned many times that the llama.cpp maintainers don't want code that's predominantly written by AI with no human revision. If anyone wants to try to get the support upstreamed into that project, they're quite free to do that: the code is MIT licensed.

fgfarben today at 12:58 AM

At a certain point the level of abstraction / genericization necessary for a big flexible project (like llama.cpp or Linux) blows things up into a huge number of files. Something newer and smaller can move faster.