Just to piggyback on this comment: has anyone tried running several of these together? For example, a Python script in which one model orchestrates the others and offloads certain tasks to more powerful models, or even cloud models?
Yes, but offloading to cloud models defeats the purpose of 'local'.
And if you stay local, the hardware required to run multiple poor models would be better spent running better models.
I have tried orchestrating different models, loading and unloading them as needed, but the speed is not there. Without quick iteration, by the time mistakes are discovered the results are worthless unless the task is trivial.
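For anyone wondering what the orchestration being discussed might look like, here is a minimal sketch, not anyone's actual setup. It tries backends in order, cheapest first, and escalates to the next one when a check rejects the answer. Everything named here (`Backend`, `route`, `small_local`, `large_remote`, `non_empty`) is hypothetical, and the two "models" are stub functions standing in for real local or cloud calls, so it runs without any model installed.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Backend:
    name: str                        # hypothetical label, e.g. "small-local"
    generate: Callable[[str], str]   # stand-in for a real model call


def route(prompt: str,
          backends: List[Backend],
          accept: Callable[[str], bool]) -> Tuple[Optional[str], str]:
    """Try backends in order and escalate whenever accept() rejects the answer."""
    last = ""
    for backend in backends:
        last = backend.generate(prompt)
        if accept(last):
            return backend.name, last
    # No backend produced an acceptable answer; return the last attempt.
    return None, last


# Stub backends (assumptions): the "small" one gives up on longer prompts.
def small_local(prompt: str) -> str:
    return "" if len(prompt) > 20 else f"small: {prompt}"


def large_remote(prompt: str) -> str:
    return f"large: {prompt}"


BACKENDS = [
    Backend("small-local", small_local),
    Backend("large-remote", large_remote),
]


def non_empty(answer: str) -> bool:
    """Toy acceptance check; a real one would have to validate the content."""
    return bool(answer.strip())


if __name__ == "__main__":
    print(route("short task", BACKENDS, non_empty))
    print(route("a much longer and harder task", BACKENDS, non_empty))
```

The sketch also shows where the problems raised above come from: every escalation costs a full extra generation (plus a load/unload if the models share hardware), and the whole thing is only as good as the `accept` check, which is exactly the part that is hard to make fast and reliable.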