Makes sense - This is very similar to fine tuning a down stream task in encoder-decoder architecture (~Bert style)