I've done some basic testing of the CoreAI framework (using Apple's official 'llm-runner' and officially supported .coreai converted models) and saw no noticeable performance increase over standard MLX or over GGUF with llama.cpp. I'd love to see some thorough benchmarks from someone, though.