> Tool calling? The model emits JSON as it autocompletes the prompt, and that JSON is then parsed out and transformed into an HTTP call.
No. Code assistants determine which tool they can execute to meet a specific goal. They pick the tool, then execute it (meaning they build the command-line arguments, run the command-line app, analyze its output, and assess the outcome) as subtasks.
And they do it as part of ReAct loops. If the tool fails to run, code assistants can troubleshoot problems on the fly and adapt how they call the tool until they reach the goal.
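Here's a rough sketch of that loop (everything in it - the tool format, the prompt, the action schema - is made up for illustration, not any particular assistant's internals):

```python
import json
import subprocess

def run_tool(name: str, args: list[str]) -> tuple[bool, str]:
    # Tools here are just command-line programs; capture output whether they succeed or fail.
    proc = subprocess.run([name, *args], capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr

def react_loop(llm, goal: str, max_steps: int = 10) -> str:
    context = f"Goal: {goal}\n"
    for _ in range(max_steps):
        # "Picking the tool" is the model emitting text that we parse as an action.
        reply = llm(context + 'Next action as JSON {"tool": ..., "args": [...]} or FINISH:')
        if reply.strip().startswith("FINISH"):
            return reply
        action = json.loads(reply)
        ok, output = run_tool(action["tool"], action["args"])
        # Feed the observation back so the model can adapt the next call,
        # including after a failed run.
        context += f"Action: {reply}\nObservation (ok={ok}): {output}\n"
    return "gave up"
```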
> And they do it as part of ReAct loops. If the tool fails to run, code assistants can troubleshoot problems on the fly and adapt how they call the tool until they reach the goal.
Yeah, but fundamentally all of this is implemented as next token prediction, given the context (which includes the tool results).
Honestly, it's pretty amazing how much we can do with next token prediction, but that's essentially all that's happening here.
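To put it concretely, the model's entire interface can be as thin as this (a greedy-decoding sketch; `next_token_logits` is a stand-in for whatever model you like):

```python
def generate(next_token_logits, context: list[int], stop_id: int, max_new: int = 256) -> list[int]:
    # The model is nothing but a function from a token sequence to a score for
    # every candidate next token; everything else lives in the harness.
    out: list[int] = []
    for _ in range(max_new):
        logits = next_token_logits(context + out)
        tok = max(range(len(logits)), key=logits.__getitem__)  # greedy pick
        out.append(tok)
        if tok == stop_id:
            break
    return out

# Tool results never reach the model through a special channel: the harness
# tokenizes them and appends them to `context` before calling generate() again.
```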
> Code assistants determine which tool they can execute to meet a specific goal. They pick the tool, then execute it (meaning they build the command-line arguments, run the command-line app, analyze its output, and assess the outcome) as subtasks.
And they do this - wait for it - by emitting tokens, which are then parsed into a function call.
You’re just mistaking a harness around an LLM for something more. At the core, the LLM takes input tokens and outputs the most likely next tokens. Those tokens might be interpreted as a tool call or anything else, but it’s still just token prediction.
If you disagree, explain what the actual difference is. I claim that LLMs “use” tools by emitting tokens, which are then parsed and passed into a tool call. If you disagree, how does it actually work?
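To make the claim concrete, here's a toy harness (the tool names and the JSON shape are invented for illustration, not any real assistant's API):

```python
import json
import urllib.request

# The model's output is just text; if it parses as a tool call, the harness
# invokes an ordinary function with those arguments and feeds the result back.
TOOLS = {
    "read_file": lambda path: open(path, encoding="utf-8").read(),
    "http_get":  lambda url: urllib.request.urlopen(url).read().decode(),
}

def dispatch(model_output: str) -> str:
    try:
        call = json.loads(model_output)   # e.g. {"tool": "http_get", "args": {"url": "https://example.com"}}
    except json.JSONDecodeError:
        return model_output               # not a tool call; it's just the answer
    result = TOOLS[call["tool"]](**call["args"])
    return f"Tool result:\n{result}"      # goes back into the prompt as more input tokens
```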