The inference code is not part of an LLM, and there can be multiple different implementations of it. The model, the code to train the model, and the code to run the model are different things.
While that is true in the majority of cases, it's not universal. Recently, model providers have worked with inference libraries to support their models at launch, but in transformers, for example, a model repository can include the code for a new architecture, and if you load it with "trust_remote_code=True" it will still work even though the library has no built-in support for that architecture. That code can modify the forward pass or do whatever else you want. In that sense, code can be part of a model.
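For anyone who hasn't seen it, here is a rough sketch of what that looks like in practice. It assumes the repository ships its own modeling code next to the weights and is set up as a causal LM; the helper name `load_with_custom_code` and the repo id in the usage comment are placeholders I made up, not real artifacts.

```python
def load_with_custom_code(repo_id: str):
    """Load a model whose architecture code ships inside its own repository."""
    # Imported lazily so this sketch can be imported without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # trust_remote_code=True lets transformers download and execute the Python
    # files stored alongside the weights, so only use it for repos you trust.
    tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)
    return tokenizer, model


# Hypothetical usage; "some-org/custom-arch-model" is a placeholder, not a real repo.
# tokenizer, model = load_with_custom_code("some-org/custom-arch-model")
```

The point is that the forward pass being run here comes from the model repository, not from the library, which is why the flag is opt-in.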