I'm not running it locally, just using cloud inference. The people I know who do run it locally use RTX 6000s, picking the quant based on how many cards they've got. Chained M3 Ultra setups are fine to play around with but too slow for actual dev work.