I've done some basic testing of the CoreAI framework (using Apple's official 'llm-runner' and officially supported .coreai converted models) and seen no noticeable performance increase over standard MLX or GGUF with llama.cpp. I'd love to see some thorough benchmarks from someone, though.